Imperative Programming
Lesson 8
Binary search
Today we will…
• Program the binary search algorithm
• Learn to use binary search.
• Experiment it with arrays of strings.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
2
Why do we sort?
To better find!
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
3
Binary search, common sense
• Linear search (function find) computes the
position of the first occurrence of a given value.
• It is an exhaustive search: if the given value is
not present in the array, we only find that out
after having checked all the elements in the
array.
• If the array is sorted, we know, by common
sense, that we may find the given value, or
conclude that it is not present in the array,
without having to check all the elements.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
4
Binary search
• There is a sorted array where we want to find a
given value. We split the array in two equal sized
halves, and look at the middle element. There are
three possibilities:
The middle element is greater than the given
value; in this case, we search again but consider
only the first half.
The middle element is less than the given value;
in this case, we search again but consider only
the first half.
The middle element is equal to the given value:
we found it!
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
5
Function bfind
int bfind (int *a, int n, int x)
{
Variable i represents the left
int i = 0;
Only works if the
boundary of the search
array is sorted.
int j = n-1;
interval; variable j, the right
boundary.
while (i <= j)
{
Variable m represents the
int m = (i+j) / 2;
subscript of the middle
if (a[m] > x)
element.
j = m-1;
The middle element is greater
else if (a[m] < x)
than the given value:
i = m+1;
The middle element is less
else
than the given value:
return m;
Bingo!
}
The given element does not
return -1;
exist in the array.
21 March 2011
Imperative Programming -}8 © Pedro Guerreiro
6
Playing with bfind
void test_bfind (void)
{
int a [1000];
int n = numbers_read (stdin, a);
int x;
assert (is_sorted (a, n));
while (scanf ("%d", &x) != EOF)
{
int z = bfind (a, n, x);
printf ("%d\n", z);
}
}
21 March 2011
4 7 12 23 45
61 62 79 88 89
90 93
^Z
70
-1
45
4
4
0
3 3 3 5 5 5 5 5 5 8 8
3
^Z
-1
4
93
-1
11
100 3
Note that if the
2
-1
given value
5
88
5
occurs more than
8
8
once, bfind does
^Z
9
not necessarily
100
find the first
-1
occurrence.
^Z
Imperative Programming - 8 © Pedro Guerreiro
7
Tracing bfind
int trace_bfind (int *a, int n, int x)
{
int i = 0;
Marks position i and j
int j = n-1;
with a vertical bar.
while (i <= j)
{
char s [256];
int m = (i+j) / 2;
ints_putfln ("%3d", a, n);
printf ("%*s%*s\n", 3 * (i+1), "|", 3 * (j-i), "|");
printf ("%*s", 3 * (m+1), a[m] < x ? "<" : a[m] > x ? ">" : "=");
gets(s);
…
}
Marks the middle element with “<”, “>” or “=”,
return -1;
depending on whether it is less, it is greater
}
or it is equal to the given value.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
8
Watching the trace
50 does
not exist.
10 exists.
21 March 2011
12 exists.
6 18 12 91 93 54 23 73 84 27 46 3 23 85 17 5 98 19 10
^Z
6 18 12 91 93 54 23 73 84 27 46 3 23 85 17 5 98 19 10
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 98
50
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
|
<
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
|
>
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
|
|
<
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
| |
>
-1
10
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
|
>
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
|
|
>
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
|
|
<
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
| |
<
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
||
=
3
12
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
|
>
3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93
|
|
Imperative Programming
- 8 © Pedro Guerreiro
=
4
98
|
98
|
98
98
98
|
98
98
98
98
98
|
98
9
Logarithmic complexity
• With binary search, the search interval is
halved at each step.
• For example, if the array has 1000
elements, on the second step we only
consider 500, and then 250, 125, 62, 31,
15, 7, 3, 1, approximately.
• Thus, the number of steps is proportional
to the logarithm (in base 2) of the number
of element of the vector.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
10
Linear search vs. binary search
• Consider an array with N elements. In order to find a element
with a given value, knowing that such an element exists, using
linear search we need N/2 comparisons, on the average. If we
do not know whether the element exists or not, we need N
comparisons to decide that it does not.
• If the array is sorted, and we use binary search, at each step,
the search interval is halved. This implies that with about
log2N comparisons (one in each step inside the command
while) we will decide whether the element exists or not.
• Thus, binary search is much more efficient than linear search.
On the other hand, linear search is more general, because
binary search can only be used when the array is sorted by
the criterion whose value we are looking for.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
11
Is Gambelas a Portuguese city?
• So far, we have been using only arrays of
numbers.
• It’s time to experiment with arrays of
strings.
• Let us consider a problem: we have the list
of the Portuguese cities, stored in a text
file. Using this list, we want find out if a
given town has the dignity of being a city.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
12
Are you a city?
• We want to write an interactive program that
accept names of towns and replies “sim”, if the
town is a city, and “não” if it is not.
• The program read the list of city names from a
file and stores them in an array. The names are
lower case only, without any diacritical marks.
• The list is sorted, so the array will be sorted and
we can use binary search.
• But, as a warm up, we will also try linear search.
The array is rather small, and we will not be able
to tell the difference in execution time between
the two algorithms.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
13
Portuguese cities
This is the 2010 official
list, with 156 cities.
abrantes, agualva-cacem, agueda, albufeira, alcacer do sal, alcobaca, almada, almeirim,
alverca do ribatejo, amadora, amarante, amora, anadia, angra do heroismo, aveiro, barcelos,
barreiro, beja, borba, braga, braganca, caldas da rainha, camara de lobos, canico,
cantanhede, cartaxo, castelo branco, chaves, coimbra, costa da caparica, covilha, elvas,
entroncamento, ermesinde, esmoriz, espinho, esposende, estarreja, estremoz, evora, fafe,
faro, fatima, felgueiras, figueira da foz, fiaes, freamunde, funchal, fundao, gafanha da nazare,
gandra, gondomar, gouveia, guarda, guimaraes, horta, ilhavo, lagoa, lagos, lamego, leiria,
lisboa, lixa, loule, loures, lourosa, macedo de cavaleiros, machico, maia, mangualde, marco
de canaveses, marinha grande, matosinhos, mealhada, meda, miranda do douro, mirandela,
montemor-o-novo, montijo, moura, odivelas, olhao da restauracao, oliveira de azemeis,
oliveira do bairro, oliveira do hospital, ourem, ovar, pacos de ferreira, paredes, penafiel,
peniche, peso da regua, pinhel, pombal, ponta delgada, ponte de sor, portalegre, portimao,
porto, povoa de santa iria, povoa de varzim, praia da vitoria, quarteira, queluz, rebordosa,
reguengos de monsaraz, ribeira grande, rio maior, rio tinto, sabugal, sacavem, samora
correia,santa comba dao, santa cruz, santa maria da feira, santana, santarem, santiago do
cacem, santo tirso, sao joao da madeira, sao mamede de infesta, sao pedro do sul,sao
salvador de lordelo, seia, seixal, senhora da hora, serpa, setubal, silves, sines, tarouca,
tavira, tomar, tondela, torres novas, torres vedras, trancoso, trofa, valbom, vale de cambra,
valenca, valongo, valpacos, vendas novas, viana do castelo, vila baleira, vila do conde, vila
franca de xira, vila nova de famalicao, vila nova de foz coa, vila nova de gaia, vila nova de
santo andre, vila real, vila real de santo antonio, viseu, vizela.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
14
Global declarations
• Watch how we declare an array of strings,
where all the string have the same given
capacity:
The capacity of the array
#define MAX_CITIES 200
#define MAX_LEN 64
of strings is MAX_CITIES.
The capacity of each string
is MAX_LEN – 1 (because
one position is used by the
string terminator).
char cities[MAX_CITIES][MAX_LEN];
int n_cities;
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
15
Reading strings by line
• We read line by line, with gets:
void read_cities (void)
{
n_cities = 0;
while (gets (cities [n_cities]) != NULL)
n_cities++;
}
At the end of data,
function gets returns
NULL.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
16
Writing an array of strings
• This is just normal:
void write_cities ()
{
int i;
for (i = 0; i < n_cities; i++)
printf ("%s\n", cities[i]);
}
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
17
Searching strings, linearly
• We use the linear search algorithm, noting
the equality of string is not represented by
the operator ==.
• In fact, equality of strings is computed by
library function strcmp.
Pronounce “string
• How is strcmp used?
compare”:
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
18
strcmp
• strcmp is the function used for comparing strings. It
returns an integer.
• These are the rules:
strcmp (x, y) returns a negative number if x, as a
string, would be placed before y in an alphabetical
list of strings.
strcmp (x, y) returns zero if the value of x as a
string, is equal to the value of y.
strcmp (x, y) returns a positive number if x, as a
string, would be placed after y in an alphabetical
list of strings.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
19
Examples with strcmp
• Watch:
void test_strcmp (void)
{
printf ("%d\n", strcmp ("lisboa", "porto"));
printf ("%d\n", strcmp ("portalegre", "faro"));
printf ("%d\n", strcmp ("porto", "portalegre"));
printf ("%d\n", strcmp ("faro", "faro"));
printf ("%d\n", strcmp ("lisboa", "castelo branco"));
printf ("%d\n", strcmp ("", "loule"));
printf ("%d\n", strcmp ("alcoutim", ""));
printf ("%d\n", strcmp ("", ""));
}
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
-1
1
1
0
1
-1
1
0
20
find_city
• Watch how strcmp is used:
int find_city (char *x)
{
int i;
for (i = 0; i < n_cities; i++)
if (strcmp (cities[i], x) == 0)
return i;
return -1;
}
21 March 2011
Linear search in the
global array cities.
This means: if cities[i]
and x are equal, etc.
Imperative Programming - 8 © Pedro Guerreiro
21
Binary search in int bfind_city (char *x)
{
an array of strings int i = 0;
int j = n_cities-1;
while (i <= j)
{
int m = (i+j) / 2;
int z = strcmp (cities[m], x);
if (z > 0)
j = m-1;
else if (z < 0)
i = m+1;
else
return m;
}
return -1;
• It is the same
algorithm, adapted,
for strings and
string comparisons:
Variable z is used to
avoid calling strcmp
twice.
}
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
22
trofa
valbom
vale de cambra
valenca
valongo
valpacos
vendas novas
viana do castelo
vila baleira
vila do conde
vila franca de xira
vila nova de famalicao
vila nova de foz coa
void test_find_city (void)
vila nova de gaia
vila nova de santo andre
{
vila real
vila real de santo antonio
char name [MAX_LEN];
viseu
vizela
read_cities ();
^Z
while (gets (name) != NULL)
faro
End of data, entered
sim
{
by copy-paste on the
sim
console. The lines
monchique
int r = find_city (name);
não
above are the last
printf ("%s\n", r != -1 ? "sim" : "não");
não
lines of the input (not
almada
r = bfind_city (name);
sim
the complete input).
sim
printf ("%s\n", r != -1 ? "sim" : "não");
vila real
sim
}
sim
}
gambelas
não
21 March 2011
Imperative Programming - 8 © Pedronão
Guerreiro
23
^Z
Testing
• We test both search
functions at once:
Exercises
• Program a bubblesort function to sort the array cities.
(Hint: do not use = to assign strings; use function strcpy
instead).
• Program a function to compute the position of the first
city whose name starts by a given letter, or -1 if there is
none. Consider linear search and binary search.
• Program a function to compute the position of the city
with the longest name.
• Program a function that write the names of all cities, with
all cities starting with the same letter on the same line,
separated with commas.
• Program a function to print the binary logarithms of the
powers of 10.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
24
Control
• Which is more efficient: linear search or
binary search?
• What is the main difference between
searching int arrays and string arrays?
• What are the rules of strcmp?
• How do we declare an array of strings?
• What does gets return at the end of file?
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
25
Tomorrow
• We will start studying pointers.
• Pointers are an important ingredient in C
programming.
• C is the language for pointers ☺
• Newer languages avoid pointers. In C, we
use pointers everywhere.
21 March 2011
Imperative Programming - 8 © Pedro Guerreiro
26
Download

Imperative Programming