Class 10 Slides

Solution to Dutch National Flag problem
DNF Solution, continued
The solution in code
General sorting; Selection sort
Selection sort implementation
Insertion sort
Insertion sort, both versions implemented for comparison
Searching and sorting with files
The merge operation
Merge code
MergeSort split operation
MergeSort merge operation
MergeSort analysis
MergeSort code
APStack template class (AB only)
Using a stack for evaluating an arithmetic expression
Another sample stack program: converting to binary; Lab
APQueue template class (AB only)
Lab: Palindrome checking with stacks and queues

--------------------------------------------------------------------------------
Solution to Dutch National Flag problem

Start:  B R W B W W R B R
Finish: R R R W W W B B B

We need three markers, one for each color. The reds will accumulate rightwards from the left end, the blues leftwards from the right end. And the whites will accumulate rightwards from the right end of the reds!

Here's the invariant diagrammatically:

    R R R W W W ? ? ? ? ? B B B
          ^     ^       ^
          |     |       blueMarker
          |     whiteMarker
          redMarker

And here is a more formal logical description of the invariant:

    for 0 <= i < redMarker,           flag[i] = RED
    for redMarker <= i < whiteMarker, flag[i] = WHITE
    for blueMarker < i < N,           flag[i] = BLUE

--------------------------------------------------------------------------------
DNF Solution, continued

In order to "make progress towards termination", whiteMarker and blueMarker have to move at least one position closer to each other each time through our main program loop.

Notice that redMarker always points either to a WHITE, or to the same position as whiteMarker. So redMarker isn't going to play a role in determining what action to take. In fact, we can always make progress based only on the value pointed to by whiteMarker.

Here are the three possibilities; what action should we take in each case?

    R R R W W W W ? ? ? ? B B B
          ^     ^       ^
          |     |       blueMarker
          |     whiteMarker
          redMarker

    R R R W W W B ? ? ? ? B B B
          ^     ^       ^
          |     |       blueMarker
          |     whiteMarker
          redMarker

    R R R W W W R ? ? ? ? B B B
          ^     ^       ^
          |     |       blueMarker
          |     whiteMarker
          redMarker

One last question: when do we stop?

--------------------------------------------------------------------------------
The solution in code

    enum Colors { RED, WHITE, BLUE };

    int redMarker, whiteMarker, blueMarker;
    apvector<Colors> Flag(N);

    redMarker = whiteMarker = 0;
    blueMarker = N - 1;

    while (whiteMarker <= blueMarker)
    {
        switch (Flag[whiteMarker])
        {
            case WHITE:
                whiteMarker++;
                break;
            case RED:
                swap(Flag[redMarker], Flag[whiteMarker]);
                redMarker++;
                whiteMarker++;
                break;
            case BLUE:
                swap(Flag[whiteMarker], Flag[blueMarker]);
                blueMarker--;
                break;
        }
    }

How many times will the loop be executed? What is the order of the algorithm?

--------------------------------------------------------------------------------
General sorting

We'll look at three algorithms today (and one later when we do recursion).

Selection sort and insertion sort are best when you are sorting an array and want to use a minimal amount of extra space. They both do rearrangement primarily by swapping values, as with the Dutch flag problem. They are both O(N^2) in the worst case.

The third sort, MergeSort, can also be performed on an array, but is most common and appropriate when sorting values stored in files.

Selection sort

Given: an array of N unordered elements. We'll use integer values.

Strategy: find the lowest value, put it in the first position. Find the next lowest, put it in the second position. Find the next lowest, put it in the third position. ...

    Start:              18 35 22 97 84 55 61 10 72
    Intermediate state: 10 18 22 ?  ?  55 ?  ?  ?
                                 ^
                                 pos

Invariant: for 0 < i < pos, A[i-1] <= A[i]
(that is, the elements to the left of pos are in non-decreasing order)

Every time we go through our loop, pos gets moved one position to the right, so we make progress, and can stop when it reaches the right end.
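
To make the intermediate state and the invariant concrete, here is a minimal sketch that runs only the first few passes of the strategy above and then checks the invariant. It uses std::vector rather than apvector so that it compiles on its own, and the names selectionPasses and prefixSorted are hypothetical helpers invented for this illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: performs only the first `passes` passes of
// selection sort on a copy of `a` and returns the intermediate array.
std::vector<int> selectionPasses(std::vector<int> a, std::size_t passes) {
    for (std::size_t pos = 0; pos < passes && pos < a.size(); ++pos) {
        std::size_t lowest = pos;
        for (std::size_t i = pos + 1; i < a.size(); ++i)
            if (a[i] < a[lowest]) lowest = i;
        std::swap(a[pos], a[lowest]);  // lowest remaining value goes to pos
    }
    return a;
}

// Hypothetical helper: checks the invariant
// "for 0 < i < pos, a[i-1] <= a[i]".
bool prefixSorted(const std::vector<int>& a, std::size_t pos) {
    for (std::size_t i = 1; i < pos && i < a.size(); ++i)
        if (a[i - 1] > a[i]) return false;
    return true;
}
```

Calling selectionPasses on the Start array with passes = 3 gives 10 18 22 97 84 55 61 35 72, which agrees with the known entries of the intermediate state shown above.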
--------------------------------------------------------------------------------
Selection sort implementation

I'm assuming a swap function has been defined for integers. I write one helper function, which will make the main selection sort function very brief.

    int PosOfLowest(const apvector<int>& A, int left, int right)
    // Precondition: A is an initialized array, left & right are valid
    //    positions in the array (0 <= left < A.length(), same for right)
    // Postcondition: returns the position of the lowest value found
    //    between A[left] and A[right], inclusive
    {
        int pos, lowestPos = left;

        for (pos = left + 1; pos <= right; pos++)
            if (A[pos] < A[lowestPos])
                lowestPos = pos;
        return lowestPos;
    }

    void SelectionSort(apvector<int>& A)
    {
        int pos, N = A.length();

        for (pos = 0; pos < N; pos++)
            swap(A[pos], A[PosOfLowest(A, pos, N - 1)]);
    }

There's a small simplification that can be made. What is it?

Why is this sort O(N^2)? Are there any cases where it does better than O(N^2)?

--------------------------------------------------------------------------------
Insertion sort

Suppose you have an array that's only partially filled with values which are already in order, and you're asked to add just one more value, in its proper place in order.

    Start:  2  3  5  13 21 23 ?  ?  ?
                     ^        ^
                     |        pos
                     insertPos
    Finish: 2  3  5  10 13 21 23 ?  ?

Steps performed:
  - Determine where the new value belongs
  - Adjust array to make room for it
  - Put new value in its place

The values to be inserted are actually stored in the unknown, unsorted part of the array. Our invariant will be the same as in selection sort: all the values to the left of pos are sorted, and pos gets moved one position to the right each time through our loop.

We could search from left to right, starting from position 0 and ending when the proper value for insertPos is found. Then we'd have to shift all the elements from insertPos up to pos - 1 over one to the right. This would take two loops to accomplish, one for the search and one for the shifting.
It's more efficient, though, to search from right to left, starting from pos - 1, and shifting as we go, until the proper value for insertPos is found, at which point all the shifting would have already been done. This takes only one loop.

--------------------------------------------------------------------------------
Insertion sort, both versions implemented for comparison

Left-to-right search, right-to-left shift

    void InsertionSort(apvector<int>& A)
    {
        int N = A.length();
        int pos, insertPos, shiftPos;

        for (pos = 1; pos < N; pos++)
        {
            for (insertPos = 0; insertPos < pos; insertPos++)
                if (A[insertPos] >= A[pos])
                    break;
            for (shiftPos = pos; shiftPos > insertPos; shiftPos--)
                swap(A[shiftPos], A[shiftPos - 1]);
        }
    }

Right-to-left search & shift combined

    void InsertionSort(apvector<int>& A)
    {
        int N = A.length();
        int pos, insertPos;

        for (pos = 1; pos < N; pos++)
        {
            for (insertPos = pos; insertPos > 0; insertPos--)
            {
                if (A[insertPos - 1] <= A[insertPos])
                    break;
                swap(A[insertPos], A[insertPos - 1]);
            }
        }
    }

Why is this sort O(N^2)? Are there any cases where it does better than O(N^2)? What's the best case?

--------------------------------------------------------------------------------
Searching and sorting with files

Assume the file contains values one per line.

To search for a particular value, there's not much choice if the file is sequential access; you have to do linear search, whether the values are sorted or not. If there's a random access mechanism and the values are sorted, you can use the binary search algorithm.

How to sort the values in a file? One possibility is to read them into an array, sort them there, then write them back out to the file. That's assuming you know (or can determine) the number of values in the file. If it's a very large file, that means a very large array.

Another possibility is to use MergeSort, which uses the original file plus two temporary files for storage, and no arrays. On top of that, it runs in O(N log N), which is better than O(N^2)!
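
As a reminder of what binary search over sorted, randomly accessible values looks like, here is a minimal sketch. It assumes the values have already been made available by position, modeled here as a std::vector of strings rather than an actual random-access file, and the function name binarySearch is chosen just for this illustration.

```cpp
#include <string>
#include <vector>

// Returns the index of `target` in the sorted vector `values`,
// or -1 if it is not present. The vector stands in for any storage
// that lets us jump directly to the value at a given position.
int binarySearch(const std::vector<std::string>& values,
                 const std::string& target) {
    int low = 0;
    int high = static_cast<int>(values.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // middle of the remaining range
        if (values[mid] == target) return mid;
        if (values[mid] < target)
            low = mid + 1;                  // target can only be to the right
        else
            high = mid - 1;                 // target can only be to the left
    }
    return -1;
}
```

Each pass through the loop discards half of the remaining range, which is why the search is O(log N) instead of the O(N) of linear search.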
--------------------------------------------------------------------------------
The merge operation

One component of MergeSort is based on the operation of merging two lists of sorted values. Suppose I have two files, File1 and File2, containing sorted lists of names. To merge them, I follow these steps:

  - Start at the beginnings of both files.
  - Examine one value from File1 and File2.
  - Write the smaller value to the output file and get the next value from the file that value came from.
  - Continue until you reach the end of one of the files.
  - Write the remaining values from the non-empty file into the output file.

For instance:

    File1     File2     Outfile
    Bill      Aaron
    Eric      Dave
    Justin    Lori
    Mark      Martin
    Mike      Matt
    Scott     Stan
    Tony

--------------------------------------------------------------------------------
Merge code

Here is a function which takes as parameters two ifstreams and one ofstream, which it assumes are already open, and merges the values in the two input files.

    void MergeFiles(ifstream& infile1, ifstream& infile2, ofstream& outfile)
    {
        apstring name1, name2;

        getline(infile1, name1);
        getline(infile2, name2);

        // Both files still have values: write the smaller one
        while (name1 != "" && name2 != "")
        {
            if (name1 <= name2)
            {
                outfile << name1 << endl;
                getline(infile1, name1);
            }
            else
            {
                outfile << name2 << endl;
                getline(infile2, name2);
            }
        }

        // Copy whatever remains in the file that isn't exhausted
        while (name1 != "")
        {
            outfile << name1 << endl;
            getline(infile1, name1);
        }
        while (name2 != "")
        {
            outfile << name2 << endl;
            getline(infile2, name2);
        }
    }

--------------------------------------------------------------------------------
MergeSort split operation

In MergeSort, we start with a single file. In order to do a merge, we have to split it into two files. One way would be just to write values alternately to two files.

A better way, though, is to alternately write runs of values to the two files. A run is an ordered sequence of elements in a file.

The picture below shows the performance of a split operation, with the runs marked.
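
Because a split is easiest to follow on concrete data, here is a minimal sketch of splitting by runs. It is an illustration under stated assumptions rather than the course's file-based code: std::vector<int> stands in for the three files, and the function name splitRuns is invented for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sends alternate runs of `input` to two outputs and returns both.
// A new run starts whenever a value is smaller than the one before it.
std::pair<std::vector<int>, std::vector<int> >
splitRuns(const std::vector<int>& input) {
    std::vector<int> out1, out2;
    std::vector<int>* current = &out1;      // output receiving the current run
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (i > 0 && input[i] < input[i - 1])
            current = (current == &out1) ? &out2 : &out1;  // run ended: switch
        current->push_back(input[i]);
    }
    return std::make_pair(out1, out2);
}
```

For the input 18 35 22 97 84 55 61 10 72 the runs are (18 35), (22 97), (84), (55 61), (10 72), so the first output receives 18 35 84 10 72 and the second receives 22 97 55 61. Notice that 18 35 and 84 happen to join into a single longer run in the first output, which can only help the merge that follows.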
--------------------------------------------------------------------------------
MergeSort merge operation

The merge operation behaves similarly to merging two sorted files, but it merges pairs of runs. Any unpaired runs left over when one file is exhausted just get copied into the output file.

Here's a merge operation:

--------------------------------------------------------------------------------
MergeSort analysis

Eventually, the original file will consist of just one run; that is, it will be sorted!

I already told you this one's O(N log N). Here's why:

  - Each split & merge cuts the number of runs in the file at least in half.
  - There are no more than N runs (where N is the number of values) in the original file.
  - The split and merge operations are both O(N).

Halving at most N runs down to a single run takes at most about log2(N) split & merge passes, and each pass is O(N). So the whole algorithm is O(N log N) in the worst case!

Why would this algorithm be more awkward to implement with arrays?

--------------------------------------------------------------------------------
MergeSort code

    #include "iostream.h"
    #include "fstream.h"
    #include "bool.h"
    #include "apstring.h"
    #include "stdio.h"

    const apstring directory = "C:\\temp\\";
    const apstring filename = "test.txt";
    const apstring filename1 = "test1.txt";
    const apstring filename2 = "test2.txt";

    void MergeSort(const apstring& filename);
    void Split(const apstring& inName, const apstring& outName1,
               const apstring& outName2);
    int MergeRuns(const apstring& outName, const apstring& inName1,
                  const apstring& inName2);
    void CopyRun(ifstream& inFile, ofstream& outFile, apstring& inBuf);
    bool CopyOne(ifstream& inFile, ofstream& outFile, apstring& inBuf);
    bool EmptyBuf(const apstring& buffer);
    void ShowFile(const apstring& filename);

    int main()
    {
        MergeSort(directory + filename);
        return 0;
    }

    void MergeSort(const apstring& filename)
    // Precondition: filename is the name of an existing file containing
    //    words to be sorted, one per line
    // Postcondition: the words in the file are sorted
    {
        apstring tempname1, tempname2;

        tempname1 = directory + tmpnam(NULL);
        tempname2 = directory + tmpnam(NULL);

        while (true)
        {
            cout << "Split:" << endl;
            Split(filename, tempname1, tempname2);
            ShowFile(tempname1);
            ShowFile(tempname2);

            cout << "MergeRuns:" << endl;
            // One run (or none, for an empty file) means the file is sorted
            if (MergeRuns(filename, tempname1, tempname2) <= 1)
                break;
            ShowFile(filename);
        }

        remove(tempname1.c_str());
        remove(tempname2.c_str());
        ShowFile(filename);
    }

    void Split(const apstring& inName, const apstring& outName1,
               const apstring& outName2)
    // Precondition: inName is the name of an existing file
    // Postcondition: The files named by outName1 and outName2 contain
    //    alternate sorted runs of the elements in the file inName.
    //    All files are closed before the function returns.
    {
        ifstream inFile;
        ofstream outFile1, outFile2;
        apstring inBuf;

        inFile.open(inName.c_str());
        outFile1.open(outName1.c_str());
        outFile2.open(outName2.c_str());

        getline(inFile, inBuf);
        while (true)
        {
            CopyRun(inFile, outFile1, inBuf);
            if (EmptyBuf(inBuf))
                break;
            CopyRun(inFile, outFile2, inBuf);
            if (EmptyBuf(inBuf))
                break;
        }

        inFile.close();
        outFile1.close();
        outFile2.close();
    }

    int MergeRuns(const apstring& outName, const apstring& inName1,
                  const apstring& inName2)
    // Precondition: inName1 and inName2 are the names of existing files
    // Postcondition: Pairs of runs from the two input files are merged
    //    into the file outName. Returns the number of merged runs written.
    {
        ofstream outFile;
        ifstream inFile1, inFile2;
        bool endRun1, endRun2;
        int numOfRuns = 0;
        apstring inBuf1, inBuf2;

        inFile1.open(inName1.c_str());
        inFile2.open(inName2.c_str());
        outFile.open(outName.c_str());

        getline(inFile1, inBuf1);
        getline(inFile2, inBuf2);

        while (!EmptyBuf(inBuf1) || !EmptyBuf(inBuf2))
        {
            // Merge one run from each file until one of the runs ends
            endRun1 = endRun2 = false;
            while (!endRun1 && !endRun2)
            {
                if (inBuf1 < inBuf2)
                    endRun1 = CopyOne(inFile1, outFile, inBuf1);
                else
                    endRun2 = CopyOne(inFile2, outFile, inBuf2);
            }
            // Copy the rest of the run that hasn't ended yet
            if (endRun1)
                CopyRun(inFile2, outFile, inBuf2);
            if (endRun2)
                CopyRun(inFile1, outFile, inBuf1);
            numOfRuns++;
        }

        while (!EmptyBuf(inBuf1))
        {
            CopyRun(inFile1, outFile, inBuf1);
            numOfRuns++;
        }
        while (!EmptyBuf(inBuf2))
        {
            CopyRun(inFile2, outFile, inBuf2);
            numOfRuns++;
        }

        inFile1.close();
        inFile2.close();
        outFile.close();
        return numOfRuns;
    }

    void CopyRun(ifstream& inFile, ofstream& outFile, apstring& inBuf)
    // Precondition: inFile and outFile are already opened, and inBuf
    //    contains the next entry to be processed from inFile.
    // Postcondition: Copies one run from inFile to outFile.
    {
        while (!(CopyOne(inFile, outFile, inBuf)))
            ;
    }

    bool CopyOne(ifstream& inFile, ofstream& outFile, apstring& inBuf)
    // Precondition: inFile and outFile are already opened, and inBuf
    //    contains the next entry to be processed from inFile.
    // Postcondition: If inBuf is not empty, copies its contents to outFile
    //    and reads the next line from inFile into inBuf. Return value is
    //    true if inBuf is empty, if the end of inFile has been reached, or
    //    if the entry copied is greater than the next entry from inFile
    //    (that is, the end of the run has been detected).
    {
        if (EmptyBuf(inBuf))
            return true;

        outFile << inBuf << endl;
        if (inFile.eof())
        {
            inBuf = "";
            return true;
        }
        else
        {
            apstring previousBuf = inBuf;
            getline(inFile, inBuf);
            return (previousBuf > inBuf);
        }
    }

    bool EmptyBuf(const apstring& buffer)
    // Precondition: buffer is an apstring
    // Postcondition: Returns true if buffer is the empty string.
    {
        return (buffer == "");
    }

--------------------------------------------------------------------------------
APStack template class (AB only)

Switching gears... There are two more AP classes to cover, both template classes, and both included only in the AB curriculum. The classes are apstack and apqueue.

A stack is a data structure which behaves similarly to a stack of cafeteria trays.

  - Trays can get placed on the top of the stack ("push").
  - Trays get taken off the top of the stack ("pop").
  - Only the top tray is easily accessible.

A stack is called a "last-in-first-out", or LIFO, data structure.
The apstack class offers the following methods:

    apstack();                                  // construct empty stack
    apstack(const apstack& s);                  // copy constructor
    ~apstack();                                 // destructor
    const apstack& operator = (const apstack& rhs);
    const itemType& top() const;                // return top element, without popping
    bool isEmpty() const;                       // return true if no elements on the stack
    int length() const;                         // number of elements on stack
    void push(const itemType& item);            // push item on top of stack
    void pop();                                 // pop top item (and discard)
    void pop(itemType& item);                   // pop top item, store in output param
    void makeEmpty();                           // remove all items from stack

--------------------------------------------------------------------------------
Using a stack for evaluating an arithmetic expression

The following code uses a stack to calculate the value of this arithmetic expression:

    (1 + 5) * (8 - (4 - 1))

We're used to writing arithmetic expressions in what's called infix notation, where a binary operator is written between its two operands. A compiler will have to translate this into postfix form, where the operator appears after the two operands. It also has to do some work to account for parentheses and operator precedence. The postfix form of the above expression is:

    1 5 + 8 4 1 - - *

The expression is evaluated from left to right. If the next value is a number, it gets pushed on the stack. If it's an operator, we pop two values from the stack, apply the operator to them, and push the answer back on the stack. What's left when we've exhausted the expression is the answer as the single value on the stack.
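
The left-to-right rule just described can also be written as a general routine instead of being spelled out by hand for one expression. The following is a minimal sketch under some assumptions: it uses std::stack in place of apstack so that it is self-contained, it expects tokens separated by spaces, it handles only integer operands with + - * /, and it does no error checking. The name evalPostfix is invented for this example.

```cpp
#include <sstream>
#include <stack>
#include <string>

// Evaluates a space-separated postfix expression of integers.
// Assumes the expression is well formed (no error checking).
int evalPostfix(const std::string& expr) {
    std::stack<int> numbers;
    std::istringstream in(expr);
    std::string token;
    while (in >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            int y = numbers.top(); numbers.pop();  // right operand is on top
            int x = numbers.top(); numbers.pop();  // left operand is below it
            if (token == "+")      numbers.push(x + y);
            else if (token == "-") numbers.push(x - y);
            else if (token == "*") numbers.push(x * y);
            else                   numbers.push(x / y);
        } else {
            numbers.push(std::stoi(token));        // a number: push it
        }
    }
    return numbers.top();                          // the single value left
}
```

For example, evalPostfix("1 5 + 8 4 1 - - *") returns 30, the value of (1 + 5) * (8 - (4 - 1)). Note the order of the two pops: the right operand comes off first, which matters for subtraction and division.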
Here is that evaluation written out step by step with apstack:

    #include "iostream.h"
    #include "apstack.h"

    int main()
    {
        apstack<int> numbers;
        int x, y;

        numbers.push(1);
        numbers.push(5);
        numbers.pop(y);            // +
        numbers.pop(x);
        numbers.push(x + y);
        numbers.push(8);
        numbers.push(4);
        numbers.push(1);
        numbers.pop(y);            // -
        numbers.pop(x);
        numbers.push(x - y);
        numbers.pop(y);            // -
        numbers.pop(x);
        numbers.push(x - y);
        numbers.pop(y);            // *
        numbers.pop(x);
        numbers.push(x * y);

        cout << "The answer is " << numbers.top() << endl;
        return 0;
    }

--------------------------------------------------------------------------------
Another sample stack program: converting to binary

This program uses a stack to convert a decimal number into binary. This can be done by successively dividing by 2, taking the remainders each time and stringing them together.

The only problem is that the bits in the answer are generated in the opposite order from the one in which they should be printed. The last bit to be calculated should be the first displayed (far left of binary number), the first bit calculated should be the last displayed (far right of number). A stack is appropriate here because you take the elements off of it in the opposite order they were put on.

The algorithm goes like this:

    Create an empty stack
    Read in the Number to be converted
    While Number not equal to 0
        find the remainder when Number is divided by 2
        push remainder on the stack
        divide Number by 2
    While stack isn't empty
        pop top remainder off and display

You can check this for the number 26, whose binary equivalent is 11010 (base 2).

Lab: Implement the decimal-to-binary conversion algorithm. Allow the user to input a number to be converted.

--------------------------------------------------------------------------------
APQueue template class (AB only)

A queue is a data structure which behaves similarly to a supermarket check-out line.

  - You join the line by standing at the end ("enqueue").
  - The cashier helps the person at the front of the line ("dequeue").

A queue is called a "first-in-first-out", or FIFO, data structure.
The apqueue class offers the following methods:

    apqueue();                                  // construct empty queue
    apqueue(const apqueue& s);                  // copy constructor
    ~apqueue();                                 // destructor
    const apqueue& operator = (const apqueue& rhs);
    const itemType& front() const;              // return front element, no dequeueing
    bool isEmpty() const;                       // return true if no elements in the queue
    int length() const;                         // number of elements in queue
    void enqueue(const itemType& item);         // enqueue item at end of queue
    void dequeue();                             // dequeue front item (and discard)
    void dequeue(itemType& item);               // dequeue front item, store in output param
    void makeEmpty();                           // remove all items from queue

--------------------------------------------------------------------------------
Lab: Use a stack and a queue to implement a palindrome checker (not a very efficient one, but that's not the point).

Input a line from the user, character by character (use get()). Push each character on the stack, and enqueue it in the queue. When you reach the end of the line, the stack will have the last character on top, and the queue will have the first character at the front. Compare the top character of the stack and the front character on the queue, then pop/dequeue them, until you find two that don't match (not a palindrome) or you empty either the stack or queue (a palindrome).

The apstack and apqueue classes use arrays as the internal data structure representation. Next week we will look at another way to implement stacks and queues, using pointers and linked lists.
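
For checking your lab work, the comparison loop described in the palindrome lab can be sketched as follows. This is only one possible shape, written under some assumptions: it uses std::stack and std::queue in place of apstack and apqueue (so enqueue and dequeue appear as push and pop), it takes the whole line as a string instead of reading it with get(), and it compares characters exactly, so case and spaces matter. The name isPalindrome is invented for this example.

```cpp
#include <cstddef>
#include <queue>
#include <stack>
#include <string>

// Returns true if `line` reads the same forwards and backwards.
bool isPalindrome(const std::string& line) {
    std::stack<char> s;
    std::queue<char> q;
    for (std::size_t i = 0; i < line.size(); ++i) {
        s.push(line[i]);   // last character ends up on top of the stack
        q.push(line[i]);   // first character stays at the front of the queue
    }
    while (!s.empty()) {
        if (s.top() != q.front())
            return false;  // mismatch: not a palindrome
        s.pop();
        q.pop();
    }
    return true;           // both emptied without a mismatch
}
```

The stack hands the characters back in reverse order while the queue hands them back in their original order, so the loop is really comparing the line with its own reversal.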