**Due Date: Friday 10/15 11:59PM.**

## Navigation

- A. Intro
- B. Estimating Efficiency by Timing Functions
- C. Practice with Asymptotics and Maps
- D. Submission

## A. Intro

In lecture you have been learning about complexity theory and how to determine the asymptotic bounds of functions. This practice has involved computing theoretical bounds, but it can be useful to tie this theory into real examples. This is exactly what you will be doing in this lab!

We consider some alternative ways of measuring the efficiency of a given code segment. Given a function `f`

, we want to find out how fast that function runs. One way of doing this is to take out a stopwatch and measure exactly how much time it takes for `f`

to run on some input. We will look at some plots of runtime to understand how some functions' runtimes change with the size of their input. However, we will also see that there are problems with using elapsed time for this purpose. These empirical measures are not meant to replace the better models for analyzing runtime complexity from lecture and discussion, but merely provide another perspective on the runtime of functions. You will conduct several experiments, discuss the results with your partner, and then record some of the answers from your discussion.

The next part of the lab will take a look at different implementations of maps. We have provided a complete implementation to a map which uses a linked list to store key-value pairs. You will be asked you to complete the implementation of a faster map which uses a binary search trees (BST) to store the key-value pairs. After implementing your tree map, you should be able to compare its performance to `java.util`

's implementation of a BST map, `TreeMap`

and hopefully see that the performance is similar (at least on randomly inserted points).

To get started pull the files for lab 8 from the skeleton.

```
git fetch shared
git merge shared/lab8 -m "Start lab8"
git push
```

## B. Estimating Efficiency by Timing Functions

### Algorithms

An *algorithm* is an abstract notion that describes an approach for solving a problem. The code we write in this class, our programs, are implementations of algorithms. For example, consider the problem of sorting a list of numbers. There are many different algorithms that will allow us to sort a list of numbers, each of which might have its own runtime. We will learn more about many of these sorting algorithms later in the class, but we will introduce a few of them briefly here.

One algorithm we might use to solve this problem is called *bubble sort*. Bubble sort tells us we can sort a list by repeatedly looping through the list and swapping adjacent items if they are out of order, until the entire sorting is complete. Another algorithm we might use to solve this problem is called *insertion sort*. Insertion sort says to sort a list by repeatedly looping through our list, removing the minimum element, and putting it into a new list in the correct order.

Several websites like VisuAlgo, Sorting.at, Sorting Algorithms, and USF have developed some animations that can help us visualize these sorting algorithms. Spend a little time playing around with these demos to get an understanding of how much time it takes for bubble sort or insertion sort to completely sort a list. Again we'll revisit sorting in more detail later in this course, but for now, try to get a feeling of how long each algorithm takes to sort a list. How many comparisons does each sort need? And how many swaps? Can you guess the overall runtime? Don't spend too much time on this, we will revisit sorting in much more detail later in the class.

Since each comparison and each swap takes time, we want to know which is the faster algorithm. The runtime bounds that we compute using asymptotic analysis might not tell us this. Asymptotic bounds might tell us that two functions should run in time proportional to one another, but they would not tell us which of the two is actually faster. This is one case where it might be better to turn to empirical measures, despite their many potential problems.

### Time Estimates

To start, it seems that the most reasonable way to estimate the time an algorithm takes is to measure the time directly. Each computer has an internal clock that keeps track of time, usually in the number of fractions of a second that have elapsed since a given base date. The Java method that accesses the clock is `System.nanoTime()`

. We have provided the `timing.Timer`

class which implements functions that behave like a stopwatch which we'll use to time different operations. You should not have to interact directly with the `timing.Timer`

class, but it might be useful to understand how it is implemented and what functionality it has. Spend a few minutes familiarizing yourself with the file and discuss what each part of this class seems to be doing with your lab partner.

### Exercise: Timing Sorting Algorithms

The file `timing.Sorter.java`

contains multiple implementations of various sorting algorithms, including the bubble sort and insertion sort algorithms mentioned earlier. Each Sorter contains the following methods (some of which might be inherited from the abstract class `Sorter`

).

`int[] getRandomArray(int numElems)`

: Returns an array of the specified size filled with randomly generated values.`void sort(int[] arr)`

: Sort the given input array`arr`

.`List<Double> score(int by, int ntrials, int nrepeats)`

: Conduct a timing experiment. This will involve sorting`ntrials`

different size arrays starting at 5 and increasing in size by`by`

each time. The experiments will then be ran`nrepeats`

times.

Again you do not understand completely what each of these sorters is doing, but it might be interesting to see how each of them works. Pay close attention to the asymptotic runtimes of each of these different algorithms. The full list of algorithms contained and their asymptotic runtimes are as follows:

`BubbleSorter`

: This algorithm is as described above. On each iteration a forward pass is taken through the array or numbers, and any adjacent out of order elements are swapped. This continues until a forward pass is made and there are no swaps necessary. The time complexity for bubble sort is worst case $\Theta(N^2)$ where $N$ is the length of the array.`WipingBubbleSorter`

: This algorithm is similar to the bubble sort described above, but`WipingBubbleSorter`

will alternate between taking forward passes and backwards passes through the array to swap elements. This however does not change the time complexity which for the worst case is $\Theta(N^2)$ where $N$ is the length of the array.`InsertionSorter`

: This algorithm will build a progressively larger sublist of sorted numbers, by continually inserting the next value in sorted sublist. The worst case runtime for insertion sort is also $\Theta(N^2)$ where $N$ is the length of the array.`JavaSorter`

: This algorithm is Java's native sorting algorithm for arrays. The actual implementation of this algorithm is quicksort for primitive values and mergesort for Object values. You do not need to know what either of these algorithms do at this points, but the resulting time complexity is expected to be $\Theta(N \cdot \log(N))$ where $N$ is the length of the array.`CountingSorter`

: This algorithm (at least as it is coded here) does not allow for all values of integers. We override the`getRandomArray`

function to only contain values of a fixed size. This sort then counts each of the possible values, and fills in an array corresponding to the correct counts of each of the possible values. The time complexity for the worst case is $\Theta(N)$ where $N$ is the length of the array.

We would like to compare these sorters, and much as before with asymptotic analysis we are mainly concerned with how the speed of the algorithm changes as a function of the input size. The input size here will be the length of the array to be sorted. In order to do this we will be running a series of calls to each of the different sorting algorithms and then plotting the results in order to visualize how each of them appears to grow. *Optional: If you are interested in seeing how we are plotting these in Java, feel free to check out the timing.GraphUtil.java file. We will not be expecting you to understand the code here, but again it might be interesting to look at.*

The file `timing.SortTiming`

contains code to generate a plot of the runtimes of the different sorting implementations when sorting different sizes of arrays. Again feel free to examine this code, but you are not expected to understand everything that is going on here. You should be able to treat this class as a black box that accepts three arguments (or sets them by default) corresponding to the number of different array sizes it should test (`ntrials`

), the interval by which it should increase the array size between trials (`by`

), and the number of times it should repeat each size array per sorting implementation (`nrepeats`

). By default, the minimum size array is 5. When you run this program it will take some time to run all of the experiments then a nice plot should be opened on your computer.

For example, the commands

```
javac timing/SortTiming.java
java timing.SortTiming 100 50 10
```

uses the command-line arguments to plot all sorters' runtimes for 100 trials, each increasing by 50 elements, and repeating each trial 10 times (and averaging the result).

Try running the program, either with IntelliJ or over the command line. To run with IntelliJ, you can run the main method of `SortTiming`

and alter the `ntrials`

, `by`

, and `nrepeats`

values. If you're running it on the command line, run the above commands from the lab8 directory. We have made this setup flexible, so please play with the values of `ntrials`

, `by`

, and `nrepeats`

to see how the graphs change!

### Discussion: Timing Results

Once you have the visualization up and running, answer each of these questions with your lab partner!

- Is one sorting algorithm
*always*faster than another? - Above we said that
`BubbleSort`

,`WipingBubbleSort`

, and`InsertionSort`

each had the same $\Theta(N^2)$ asymptotic time complexity. How can you explain the differences in the plots for these three algorithms? - What information can we gain from empirical analysis of algorithms which might not be as noticeable in asymptotical bounds?
- For any given sorting algorithm, does increasing the array size
*always*mean the sorting takes longer? - How does changing
`nrepeats`

change the plot? - Is your plot the
*exact*same as your partner's plot, even with the same values of`ntrials`

,`by`

, and`nrepeats`

? *Optional*: Look at the source code for`BubbleSorter`

to`WipingBubbleSorter`

. After looking at the plots, can you intuitively explain why`WipingBubbleSorter`

is usually 2x as fast as`BubbleSorter`

? (Hint: Consider the immobility of some elements when the swapping passes are single directional (i.e. only going forward), and how this "Wiping" strategy helps deal with that issue.) Can you come up with an example that shows the difference in runtime?

We will be asking you to record a brief summary of your responses to these questions to show that you have put some thought into discussing them. Again this is supposed to be a collaborative exercise and some of these questions might have many reasonable answers. If you have questions you can also ask a TA/AI! Once you feel satisfied with your responses, put a summary into the `sort_timing.txt`

file in your lab8 directory.

*Note: We will not be directly grading your responses and you will be graded on effort. You should write at least one sentence for each question. No responses are required for the optional questions.*

### Amortization

In lecture you should have also learned about amortization. If you have not watched lecture or reviewed the slides please take some time to go through slides 8-13 or the rest of this section will not make as much sense. Still we will briefly explain amortization here to recap the main points.

If we look at a sequence of $N$ operations to be carried out for a specific algorithm (which might be an operation on a data structure), it is possible that each one of those operations differs in time complexity. Amortized time means that instead of considering the worst case time complexity for a given operation, we will instead look at the "average" time for each one of the operation. For certain instances, this can be more insightful than just assuming that each of the operations will run in worst case time. In this next exercise we will examine different schemas of resizeable arrays to see what the amortized performance of each of them is.

### Exercise: Timing Amortized Lists

The file `timing.GrowList.java`

contains multiple implementations of a list that can grow as more elements are inserted into it. Each `GrowList`

contains the following methods (some of which might be inherited from the abstract class `GrowList`

).

`void add(int e)`

: Inserts an`int`

element into the given list. Each`GrowList`

's behavior is described below.`GrowList newList()`

: Creates a new`GrowList`

, this is a method that is included to allow for easier creation of lists during tests.`List<Double> score(int maxSize, int nLists)`

: Conduct a timing experiment by inserting`maxSize`

elements into`nLists`

different lists. The output of this will be a list such that`i`

th item is the time for the`i`

th insertion. We use`nLists`

different lists and average the performance over all of them to hopefully reduce noise in our data collection.`List<Double> accumScore(int maxSize, int nLists)`

: Conduct a timing experiment as specified above. The output of this will be a list such that`i`

th item is the cumulative total of the average times of the first`i`

insertions.

Unlike the sorters from above, you should be able to understand all of the code for each of the `GrowList`

implementations that we have provided. After reading the below descriptions, spend some time with your partner to look at each of their implementations and discuss how each of them work. Once again, pay close attention to the asymptotic runtimes of each of these variants. The full list of implementations and their asymptotic runtimes are as follows:

`GeomGrowList`

: This is an array-backed list that resizes by doubling the size of the array when the array is fully filled up. The runtime for $N$ insertions into a`GeomGrowList`

is $\Theta(N)$, and the amortized runtime of a single addition is $\Theta(1)$.`ArithGrowList`

: This is an array-backed list that resizes by increasing the size of the array by one when the array is fully filled up. The runtime for $N$ insertions into a`ArithGrowList`

is $\Theta(N^2)$, and the amortized runtime of a single addition is $\Theta(N)$.`JavaGrowList`

: This is a wrapper to Java's`ArrayList`

which is array-backed list that resizes by doubling the size of the array when the array is fully filled up. The runtime for $N$ insertions into a`JavaGrowList`

is $\Theta(N)$, and the amortized runtime of a single addition is $\Theta(1)$.

The file `timing.AmortizationTiming`

contains code to generate plots of the runtimes of the different `GrowList`

implementations as many elements are added to each of them. Again feel free to examine this code, but you can again treat this class as a black box. The three arguments accepted (or set by default) are `maxSize`

which specifies the number of elements to add, `nLists`

which specifies the number of lists to test on, and `accumulate`

which will specify whether you would like to visualize the accumulated runtimes or the per operation runtimes (`accumulate`

should be passed as `-accum`

or `-noaccum`

, see below for more information). When you run this program it will take some time to run all of the experiments, and then a nice plot should be opened on your computer.

For example, the commands

```
javac timing/AmortizationTiming.java
java timing.AmortizationTiming 1024 1000 -noaccum
```

uses the command-line arguments to plot all lists' runtimes per operation for 1024 additions into 1000 different lists (averaging the result). On the other hand the commands

```
javac timing/AmortizationTiming.java
java timing.AmortizationTiming 1024 1000 -accum
```

uses the command-line arguments to plot the accumulated runtimes for the first `i`

insertion for the same 1024 additions into the same 1000 lists.

Just like before try running the program, either with IntelliJ or over the command line. To run with IntelliJ, you can run the main method of `AmortizationTiming`

and alter the `maxSize`

, `nLists`

, and `accumulate`

values. If you're running it on the command line, run the above commands from the lab8 directory. We have made this setup flexible, so please play with the values of `maxSize`

, `nLists`

, and `accumulate`

to see how the graphs change!

### Discussion: Amortization Results

Once you have the visualization up and running, answer each of these questions with your lab partner!

- Is one
`GrowList`

implementation always better than the others? - Why is the runtime for
`N`

insertions into a geometrically resizing list a $\Theta(N)$ operation? - Why is the runtime for
`N`

insertions into an arithmetically resizing list a $\Theta(N^2)$ operation? - How does the runtime per operation for the
`ArithGrowList`

compare to that of`GeomGrowList`

and`JavaGrowList`

? Specifically look at the non-accumulated plots and describe the trends for how long each operation takes as a function of how many elements have already been inserted in the list. - When are there spikes in the per operation runtime graphs for each of the implementations? Do these make sense to you?
*Hint: some of these should and others might not. Empirical runtime can be quite messy and depends on machine specifics which will be revealed in other subsequent classes like CS61C.* *Optional*: Try changing the code for`GeomGrowList`

to resize by a different factor. How does this affect the theoretical asymptotic runtime? How does this effect the plotted runtime?*Optional*: Try changing the code for`ArithGrowList`

to resize by adding a different fixed number of spots in the array. How does this affect the theoretical asymptotic runtime? How does this effect the plotted runtime?

We will be asking you to record a brief summary of your responses to these questions to show that you have put some thought into discussing them. Again this is supposed to be a collaborative exercise and some of these questions might have many reasonable answers. If you have questions you can also ask a TA/AI! Once you feel satisfied with your responses, put a summary into the `amortized_timing.txt`

file in your lab8 directory.

*Note: We will not be directly grading your responses and you will be graded on effort. You should write at least one sentence for each question. No responses are required for the optional questions.*

## C. Practice with Asymptotics and Maps

### Simple Map Introduction

For the next part of the lab we will shift gears and look at different implementations of the `SimpleMap`

interface we have provided for you. A map is an abstract data type which maintains pairs of keys and associated values. You probably have used them before for many tasks in 61B this semester or previously in other classes (a Python dictionary is a map). The `SimpleMap`

interface contains the following two following methods:

`void put(K key, V value)`

: Inserts a new key-value pair into the map. If the key already exists, then the associated value is updated to the new value.`V get(K key)`

: Returns the value associated with a particular key.

### LinkedListMap

As the name suggests this implementation of `SimpleMap`

is backed by a linked list. Unlike the nodes in previous linked lists we have looked at (`IntList`

and `IntDList`

), the `MapNode`

elements of this linked list will contain both a key and a value. *Optional: You will not be implementing any of the functionality of this class, but familiarize yourself with the code. Discuss with your partner what the best and worst case time is for the put and get operations are.*

### Exercise: TreeMap

The `TreeMap`

implementation of the `SimpleMap`

class will be an implementation of the `SimpleMap`

interface that is instead backed by a binary search tree. If you would like a refresher on binary search trees, please see the slides from lecture. We have provided the basic structure of this class for you which we will describe in more detail here. Similar to the `LinkedListMap`

there will be a `TreeMapNode`

inner class which will be the underlying recursive binary search tree structure. The `TreeMap`

class has a `_root`

variable which points to the root of the tree. The `TreeMapNode`

class contains the following variables:

`TreeMapNode _left`

: Pointer to the left child of the`TreeMapNode`

.`TreeMapNode _right`

: Pointer to the right child of the`TreeMapNode`

.`K _key`

: The key for the key-value pair represented by the`TreeMapNode`

.`V _value`

: The value for the key-value pair represented by the`TreeMapNode`

.

We have provided the `put`

and the `get`

functions for you and you should not need to edit these functions (although you may if you so choose as long as the tests still pass). The structure presented here is a common design pattern for classes which are wrapper classes for tree recursive data structures. The `get`

and `put`

functions themselves are not recursive as they operate on a `TreeMap`

(which itself is not a recursive data structure). These methods call `putHelper`

and `getHelper`

which are recursive functions that take in `TreeMapNode`

s and return `TreeMapNodes`

.

Your task for this lab will be to fill in the recursive `putHelper`

and `getHelper`

functions which handle most of the tree related operations. The goal will be to complete these methods such that your completed `TreeMap`

implementation will perform similarly to that of Java's `java.util.TreeMap`

. Before beginning take a look at the code provided in the `TreeMap`

and make sure you understand the structure. Next implement the two methods by filling in the `// FIXME`

s in the code. Finally discuss with your partner what the worst case and best case runtimes for each of these methods will be.

*Hint: The keys of the map will be Comparable<K> which means that you should use the compareTo(K key) function in order to compare keys. This will be useful for traversing through the tree correctly.*

### MapTest

We have provided a small set of unit tests to test both the `LinkedListMap`

and the `TreeMap`

that you will be implementing. There are three helper methods `smallTestMap`

, `fuzzTestMap`

(this test is a fuzz test), and `timedFuzzTestMap`

. More information can be found in the comments of these methods. The `fuzzTestMap`

and the `timedFuzzTestMap`

might be different than other testing functions that you have seen before. *Optional: Spend some time looking at each of these functions and try to understand what they are doing, discuss this with your partner. You might be able to take ideas from these tests to write new kinds of test for other code in your future!*

The actual tests that will be ran on the autograder are as follows (all just call one of the above three testing functions).

`smallTestLinkedListMap`

: Runs`smallTestMap`

with a`LinkedListMap`

.`smallTestTreeMap`

: Runs`smallTestMap`

with a`TreeMap`

.`fuzzTestLinkedListMap`

: Runs`fuzzTestMap`

with a`LinkedListMap`

.`fuzzTestTreetMap`

: Runs`fuzzTestMap`

with a`TreeMap`

.`timedFuzzTestTreeMap`

: Runs`timedFuzzTestMap`

with a`TreeMap`

.

By default you should pass the `LinkedListMap`

tests and fail the `TreeMap`

tests. In order to get full credit for this section of the lab you must pass all 5 of these tests.

*Optional: It might be fun to see in practice how much better we can do when we use a binary search tree over a linked list. Try writing a new test, or modifying an existing one to see what the ratio between the speed of the LinkedListMap and the java.util.TreeMap is.*

*Optional: The TreeMap that you have created here probably offers a good speedup over the LinkedListMap, and might even be close to that of the java.util.TreeMap however it might not be quite as good as it seems. In the tests that we have provided we insert random key-value pairs which has empirically been shown to result in roughly balanced tree structures. If we instead insert a large number of key-value pairs into our tree with keys that are in sorted order, then the performance of our tree might be decreased significantly. Discuss with your partner why this might be, and then see if you can verify this empirically by writing a new test or refactoring the existing code.*

## D. Submission and Grading

### Deliverables

For full credit on the lab you must:

- Put sufficient effort into the responses in
`sort_timing.txt`

. - Put sufficient effort into the responses in
`ammortized_timing.txt`

. - Complete the
`putHelper`

and`getHelper`

functions in`map.TreeMap.java`

. In order to get full credit for this part of the lab all of the provided tests in`map.MapTest.java`

must pass. Also remember to complete and submit the

`partner.txt`

if you worked with a partner. If you did not, you do not need to edit this file.- You can earn .4 points for each of the Amortized and Sort Timing question tests. You can earn .4 points for passing 3 of the Map tests, .8 points for passing 4, and 1.2 points for passing all 5.

### Submission

You can submit the same as usual:

```
<git add / commit>
git tag lab8-0 # or the next highest submission number
git push
git push --tags
```