Imperative Programming Lesson 8 Binary search Today we will… • Program the binary search algorithm • Learn to use binary search. • Experiment it with arrays of strings. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 2 Why do we sort? To better find! 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 3 Binary search, common sense • Linear search (function find) computes the position of the first occurrence of a given value. • It is an exhaustive search: if the given value is not present in the array, we only find that out after having checked all the elements in the array. • If the array is sorted, we know, by common sense, that we may find the given value, or conclude that it is not present in the array, without having to check all the elements. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 4 Binary search • There is a sorted array where we want to find a given value. We split the array in two equal sized halves, and look at the middle element. There are three possibilities: The middle element is greater than the given value; in this case, we search again but consider only the first half. The middle element is less than the given value; in this case, we search again but consider only the first half. The middle element is equal to the given value: we found it! 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 5 Function bfind int bfind (int *a, int n, int x) { Variable i represents the left int i = 0; Only works if the boundary of the search array is sorted. int j = n-1; interval; variable j, the right boundary. while (i <= j) { Variable m represents the int m = (i+j) / 2; subscript of the middle if (a[m] > x) element. j = m-1; The middle element is greater else if (a[m] < x) than the given value: i = m+1; The middle element is less else than the given value: return m; Bingo! } The given element does not return -1; exist in the array. 21 March 2011 Imperative Programming -}8 © Pedro Guerreiro 6 Playing with bfind void test_bfind (void) { int a [1000]; int n = numbers_read (stdin, a); int x; assert (is_sorted (a, n)); while (scanf ("%d", &x) != EOF) { int z = bfind (a, n, x); printf ("%d\n", z); } } 21 March 2011 4 7 12 23 45 61 62 79 88 89 90 93 ^Z 70 -1 45 4 4 0 3 3 3 5 5 5 5 5 5 8 8 3 ^Z -1 4 93 -1 11 100 3 Note that if the 2 -1 given value 5 88 5 occurs more than 8 8 once, bfind does ^Z 9 not necessarily 100 find the first -1 occurrence. ^Z Imperative Programming - 8 © Pedro Guerreiro 7 Tracing bfind int trace_bfind (int *a, int n, int x) { int i = 0; Marks position i and j int j = n-1; with a vertical bar. while (i <= j) { char s [256]; int m = (i+j) / 2; ints_putfln ("%3d", a, n); printf ("%*s%*s\n", 3 * (i+1), "|", 3 * (j-i), "|"); printf ("%*s", 3 * (m+1), a[m] < x ? "<" : a[m] > x ? ">" : "="); gets(s); … } Marks the middle element with “<”, “>” or “=”, return -1; depending on whether it is less, it is greater } or it is equal to the given value. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 8 Watching the trace 50 does not exist. 10 exists. 21 March 2011 12 exists. 6 18 12 91 93 54 23 73 84 27 46 3 23 85 17 5 98 19 10 ^Z 6 18 12 91 93 54 23 73 84 27 46 3 23 85 17 5 98 19 10 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 98 50 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | < 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | > 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | | < 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | | > -1 10 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | > 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | | > 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | | < 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | | < 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 || = 3 12 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | > 3 5 6 10 12 17 18 19 23 23 27 46 54 73 84 85 91 93 | | Imperative Programming - 8 © Pedro Guerreiro = 4 98 | 98 | 98 98 98 | 98 98 98 98 98 | 98 9 Logarithmic complexity • With binary search, the search interval is halved at each step. • For example, if the array has 1000 elements, on the second step we only consider 500, and then 250, 125, 62, 31, 15, 7, 3, 1, approximately. • Thus, the number of steps is proportional to the logarithm (in base 2) of the number of element of the vector. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 10 Linear search vs. binary search • Consider an array with N elements. In order to find a element with a given value, knowing that such an element exists, using linear search we need N/2 comparisons, on the average. If we do not know whether the element exists or not, we need N comparisons to decide that it does not. • If the array is sorted, and we use binary search, at each step, the search interval is halved. This implies that with about log2N comparisons (one in each step inside the command while) we will decide whether the element exists or not. • Thus, binary search is much more efficient than linear search. On the other hand, linear search is more general, because binary search can only be used when the array is sorted by the criterion whose value we are looking for. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 11 Is Gambelas a Portuguese city? • So far, we have been using only arrays of numbers. • It’s time to experiment with arrays of strings. • Let us consider a problem: we have the list of the Portuguese cities, stored in a text file. Using this list, we want find out if a given town has the dignity of being a city. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 12 Are you a city? • We want to write an interactive program that accept names of towns and replies “sim”, if the town is a city, and “não” if it is not. • The program read the list of city names from a file and stores them in an array. The names are lower case only, without any diacritical marks. • The list is sorted, so the array will be sorted and we can use binary search. • But, as a warm up, we will also try linear search. The array is rather small, and we will not be able to tell the difference in execution time between the two algorithms. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 13 Portuguese cities This is the 2010 official list, with 156 cities. abrantes, agualva-cacem, agueda, albufeira, alcacer do sal, alcobaca, almada, almeirim, alverca do ribatejo, amadora, amarante, amora, anadia, angra do heroismo, aveiro, barcelos, barreiro, beja, borba, braga, braganca, caldas da rainha, camara de lobos, canico, cantanhede, cartaxo, castelo branco, chaves, coimbra, costa da caparica, covilha, elvas, entroncamento, ermesinde, esmoriz, espinho, esposende, estarreja, estremoz, evora, fafe, faro, fatima, felgueiras, figueira da foz, fiaes, freamunde, funchal, fundao, gafanha da nazare, gandra, gondomar, gouveia, guarda, guimaraes, horta, ilhavo, lagoa, lagos, lamego, leiria, lisboa, lixa, loule, loures, lourosa, macedo de cavaleiros, machico, maia, mangualde, marco de canaveses, marinha grande, matosinhos, mealhada, meda, miranda do douro, mirandela, montemor-o-novo, montijo, moura, odivelas, olhao da restauracao, oliveira de azemeis, oliveira do bairro, oliveira do hospital, ourem, ovar, pacos de ferreira, paredes, penafiel, peniche, peso da regua, pinhel, pombal, ponta delgada, ponte de sor, portalegre, portimao, porto, povoa de santa iria, povoa de varzim, praia da vitoria, quarteira, queluz, rebordosa, reguengos de monsaraz, ribeira grande, rio maior, rio tinto, sabugal, sacavem, samora correia,santa comba dao, santa cruz, santa maria da feira, santana, santarem, santiago do cacem, santo tirso, sao joao da madeira, sao mamede de infesta, sao pedro do sul,sao salvador de lordelo, seia, seixal, senhora da hora, serpa, setubal, silves, sines, tarouca, tavira, tomar, tondela, torres novas, torres vedras, trancoso, trofa, valbom, vale de cambra, valenca, valongo, valpacos, vendas novas, viana do castelo, vila baleira, vila do conde, vila franca de xira, vila nova de famalicao, vila nova de foz coa, vila nova de gaia, vila nova de santo andre, vila real, vila real de santo antonio, viseu, vizela. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 14 Global declarations • Watch how we declare an array of strings, where all the string have the same given capacity: The capacity of the array #define MAX_CITIES 200 #define MAX_LEN 64 of strings is MAX_CITIES. The capacity of each string is MAX_LEN – 1 (because one position is used by the string terminator). char cities[MAX_CITIES][MAX_LEN]; int n_cities; 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 15 Reading strings by line • We read line by line, with gets: void read_cities (void) { n_cities = 0; while (gets (cities [n_cities]) != NULL) n_cities++; } At the end of data, function gets returns NULL. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 16 Writing an array of strings • This is just normal: void write_cities () { int i; for (i = 0; i < n_cities; i++) printf ("%s\n", cities[i]); } 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 17 Searching strings, linearly • We use the linear search algorithm, noting the equality of string is not represented by the operator ==. • In fact, equality of strings is computed by library function strcmp. Pronounce “string • How is strcmp used? compare”: 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 18 strcmp • strcmp is the function used for comparing strings. It returns an integer. • These are the rules: strcmp (x, y) returns a negative number if x, as a string, would be placed before y in an alphabetical list of strings. strcmp (x, y) returns zero if the value of x as a string, is equal to the value of y. strcmp (x, y) returns a positive number if x, as a string, would be placed after y in an alphabetical list of strings. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 19 Examples with strcmp • Watch: void test_strcmp (void) { printf ("%d\n", strcmp ("lisboa", "porto")); printf ("%d\n", strcmp ("portalegre", "faro")); printf ("%d\n", strcmp ("porto", "portalegre")); printf ("%d\n", strcmp ("faro", "faro")); printf ("%d\n", strcmp ("lisboa", "castelo branco")); printf ("%d\n", strcmp ("", "loule")); printf ("%d\n", strcmp ("alcoutim", "")); printf ("%d\n", strcmp ("", "")); } 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro -1 1 1 0 1 -1 1 0 20 find_city • Watch how strcmp is used: int find_city (char *x) { int i; for (i = 0; i < n_cities; i++) if (strcmp (cities[i], x) == 0) return i; return -1; } 21 March 2011 Linear search in the global array cities. This means: if cities[i] and x are equal, etc. Imperative Programming - 8 © Pedro Guerreiro 21 Binary search in int bfind_city (char *x) { an array of strings int i = 0; int j = n_cities-1; while (i <= j) { int m = (i+j) / 2; int z = strcmp (cities[m], x); if (z > 0) j = m-1; else if (z < 0) i = m+1; else return m; } return -1; • It is the same algorithm, adapted, for strings and string comparisons: Variable z is used to avoid calling strcmp twice. } 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 22 trofa valbom vale de cambra valenca valongo valpacos vendas novas viana do castelo vila baleira vila do conde vila franca de xira vila nova de famalicao vila nova de foz coa void test_find_city (void) vila nova de gaia vila nova de santo andre { vila real vila real de santo antonio char name [MAX_LEN]; viseu vizela read_cities (); ^Z while (gets (name) != NULL) faro End of data, entered sim { by copy-paste on the sim console. The lines monchique int r = find_city (name); não above are the last printf ("%s\n", r != -1 ? "sim" : "não"); não lines of the input (not almada r = bfind_city (name); sim the complete input). sim printf ("%s\n", r != -1 ? "sim" : "não"); vila real sim } sim } gambelas não 21 March 2011 Imperative Programming - 8 © Pedronão Guerreiro 23 ^Z Testing • We test both search functions at once: Exercises • Program a bubblesort function to sort the array cities. (Hint: do not use = to assign strings; use function strcpy instead). • Program a function to compute the position of the first city whose name starts by a given letter, or -1 if there is none. Consider linear search and binary search. • Program a function to compute the position of the city with the longest name. • Program a function that write the names of all cities, with all cities starting with the same letter on the same line, separated with commas. • Program a function to print the binary logarithms of the powers of 10. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 24 Control • Which is more efficient: linear search or binary search? • What is the main difference between searching int arrays and string arrays? • What are the rules of strcmp? • How do we declare an array of strings? • What does gets return at the end of file? 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 25 Tomorrow • We will start studying pointers. • Pointers are an important ingredient in C programming. • C is the language for pointers ☺ • Newer languages avoid pointers. In C, we use pointers everywhere. 21 March 2011 Imperative Programming - 8 © Pedro Guerreiro 26