Last Updated: 2019-09-20 Fri 14:43

CSCI 2021 Assignment 1: C Programming

CODE DISTRIBUTION: a1-code.zip

CHANGELOG:

Fri 20 Sep 2019 02:40:44 PM CDT

Several Piazza posts identified that the documentation string for hashmap_show_structure() should read as follows

- Uses format specifier "%3d : " to print the table indices

to pass tests instead of the existing %2d. This has now been fixed. Thanks to those who identified the bug and reported it to me.

Wed 11 Sep 2019 08:58:38 AM CDT
A minor bug in the test results was identified in Post 41 which has now been corrected. The bug is associated with Problem 3 and the hash_main command hashcode <key> which should print hashcodes as a 64-bit long using the format string "%ld". The tests contained wrong answers for some of these which appear on Test #2 for Problem 3. This has been fixed in the zip and those working on the project should download a corrected copy of test_hash_main_data.sh into their code directory.
Tue 10 Sep 2019 09:45:49 AM CDT
As described in Post 24, the original a1-code.zip was missing the test_check_valgrid.sh testing file. It has been added to the zip. If you already downloaded the assignment code and started adding code, download the missing file here and copy it into your a1-code/ directory: https://www-users.cs.umn.edu/~kauffman/2021/test_check_valgrind.sh

1 Introduction

Basic application programming in C is an essential step down towards the lower levels of computing. This project explores fundamental aspects of getting work done in C:

  • Dynamic memory management with malloc()/free()
  • Reading data from files in both text and binary format
  • Displaying information to the screen
  • Reading commands from users in interactive programs
  • Dealing with structs and the data structures they can build

The assignment is divided into several problems utilizing many of the above techniques.

  • Problem 1 deals with reading data from files into dynamically allocated arrays
  • Problem 2 builds a simple display routine which can be used to display the data read using Problem 1 code
  • Problem 3 builds a simple hash table application

Each problem is fairly independent from the others so may be worked on in any order though to complete Problem 2, code from Problem 1 must be finished.

1.1 Grading Criteria

Credit for this assignment will be given based on two categories.

  • Manual Inspection Criteria (~50%): Each problem has a checklist of things that graders will look for. The checklist is in the spec and often contains hints on what to do. Make sure you have a look at these.
  • Automated Testing (~50%): Each problem has tests associated with it along with instructions on how to run those tests from the command line. Tests require that code compiles and runs according to the descriptions given so make sure you verify that these work.

1.2 Getting Started

Take the following steps to get started

  1. Download the code associated with the project linked at the top of the spec. Unzip it and examine some of the provided code.
  2. Examine the overview of the files provided listed in the Download and Setup section. This gives brief descriptions of files that already exist and those that you must create.
  3. Pick a problem and read. There is a lot of information and many examples provided for each problem. Reading this will help you write the correct code earlier rather than later.
  4. Ask questions: if its not clear how to proceed, put up a Piazza post or visit an office hour.
  5. Get coding: don't wait to start for too long as this will greatly increase your stress level an potentially result in late submissions.
  6. Familiarize yourself with the late submission policy for assignments so you are not caught off guard. No submissions will be accepted more than 48 hours after the deadline.

1.3 Makefile

A Makefile is provided as part of this project. Building programs in C is a bit tedious and most folks use build systems of which make is the oldest. The instructions and dependencies to create programs are written in a Makefile which is then interpreted by the make program which will run gcc and other commands to create programs.

Use this Makefile by issuing commands like make deltas_main

> make deltas_main                 # build problem 1 main program
gcc -Wall -g -c deltas_main.c
gcc -Wall -g -c read_deltas.c
gcc -Wall -g -o deltas_main deltas_main.o read_deltas.o

> make clean                       # remove all programs/binary object files
rm -f save_deltas deltas_main print_graph_demo graph_file hash_main  *.o

> make hash_main                   # build problem 3 main program
gcc -Wall -g -c hash_main.c
gcc -Wall -g -c hash_funcs.c
gcc -Wall -g -o hash_main hash_main.o hash_funcs.o

> make clean                       # remove all programs/binary object files
rm -f save_deltas deltas_main print_graph_demo graph_file hash_main  *.o

> make                             # build all programs/objects for the assignment
gcc -Wall -g -c save_deltas.c
gcc -Wall -g -o save_deltas save_deltas.o deltas.h
gcc -Wall -g -c deltas_main.c
gcc -Wall -g -c read_deltas.c
gcc -Wall -g -o deltas_main deltas_main.o read_deltas.o
gcc -Wall -g -c print_graph_demo.c
gcc -Wall -g -c print_graph.c
gcc -Wall -g -o print_graph_demo print_graph_demo.o print_graph.o
gcc -Wall -g -c graph_file.c
gcc -Wall -g -o graph_file graph_file.o print_graph.o read_deltas.o
gcc -Wall -g -c hash_main.c
gcc -Wall -g -c hash_funcs.c
gcc -Wall -g -o hash_main hash_main.o hash_funcs.o

You are not required to understand all that is the Makefile (yet) but it is a very useful tool and may be worth your while to inspect.

1.4 Automated Tests

Automated tests can be downloaded in a zip linked at the top of the spec. These tests are known to work on lab machines only. They may work on other machines/environments but this is not guaranteed or supported.

The provided Makefile allows automated tests to be run via calls like make test-p1 and make test-p2. See the transcript below.

> make test-p1                         # run tests for problem 1
gcc -Wall -g -c test_read_deltas.c
gcc -Wall -g -c read_deltas.c
gcc -Wall -g -o test_read_deltas test_read_deltas.o read_deltas.o
===TESTS for P1===
Running binary tests for read_deltas
./test_read_deltas
Test  0       text-5 :   read_text_deltas()  len=  5 : OK
Test  1     text-128 :   read_text_deltas()  len= 32 : OK
Test  2     text-one :   read_text_deltas()  len=  1 : OK
...
========================================
10 / 10 tests passed


> make test-p2                         # run tests for problem 2
gcc -Wall -g -c print_graph_demo.c
gcc -Wall -g -c print_graph.c
...
===TESTS for P2===
Testing print_graph via print_graph_demo
./test_print_graph.sh
Output OK
Valgrind OK

Testing graph_file
./test_graph_file.sh
Loading tests... 10 tests loaded
Running 10 tests

RUNNING NORMAL TESTS
TEST  1 text-ten-5         : OK
TEST  2 text-ten-20        : OK
TEST  3 text-21-5          : OK
TEST  4 text-21-12         : OK
TEST  5 int-21-7           : OK
...
=====================================
OVERALL:
10 / 10 Normal correct
10 / 10 Valgrind correct


> make test                         # run tests for all problems
...

Each problem describes specifically how tests can be run and how credit will be assigned.

Note that in some cases, testing scripts can run single tests at a time which is useful for debugging the test.

While grading, graders will typically download student code on a lab machine and invoke

make test-pX

where X is a problem number. If the Makefile is missing or incomplete or if code does not compile, zero credit will be assigned for the testing grade.

2 Download Code and Setup

Download the code pack linked at the top of the page. Unzip this which will create a project folder. Create new files in this folder. Ultimately you will re-zip this folder to submit it.

File State Notes
Makefile Provided Build file to compile all programs
save_deltas.c Provided Problem 1 sample to create delta files
deltas_main.c Provided Problem 1 main() function
deltas.h Provided Problem 1 header file
read_deltas.c Create Problem 1/2 functions to write
     
data/short.txt Data Problem 1 data files
data/short.int Data  
data/short.4bit Data  
data/wave.txt Data  
data/wave.int Data  
data/wave.4bit Data  
print_graph_demo.c Provided Problem 2 demo program
print_graph.c Create Problem 2 function to write
graph_file.c Create Problem 2 main function to write
hashmap.h Provided Problem 3 header file
hash_funcs.c Create Problem 3 functions to write
hash_main.c Create Problem 3 main function to write
     
data/hash-demo.script Data Problem 3 sample input script to main program
data/stranger.hm Data Problem 3 sample hash map saved to file
data/other.script Data Problem 3 sample input script to main program
data/other.hm Data Problem 3 sample hash map saved to file
data/big.script Data Problem 3 sample input script to main program
data/big.hm Data Problem 3 sample hash map saved to file
TESTING    
test-data/ Testing Directory in which temporary testing files are written
test_read_deltas.c Testing Problem 1 tests for read_deltas.c
test_print_graph.sh Testing Problem 2 Test script for print_graph_demo
test_print_graph_data.sh Testing Problem 2 Test data for print_graph_demo
test_graph_file.sh Testing Problem 2 Test script for graph_file
test_graph_file_data.sh Testing Problem 2 Test data for graph_file
test_hash_main.sh Testing Problem 3 Test script for hash_main
test_hash_main_data.sh Testing Problem 3 Test data for hash_main

3 Problem 1: Reading Text and File Data

3.1 Overview: Delta Arrays

This problem centers around a simple task: read integers from a file into a dynamically allocated array. The caveat to this is that the integers in the file are stored in a special format: an initial value followed by deltas or changes from the previous element. An example is below: comments are on the right side

data/short.txt: 
20   # start with 20
-2   # subtract 2
-4   # subtract 4
-4   # subtract 4
-3   # subtract 3
-5   # subtract 5
 1   # add 1
-2   # subtract 2
 4   # add 4 
 4   # etc...
...

The deltas_main program will read this file and produce the following output

> ./deltas_main text data/short.txt 
Reading text format
data_len: 21
   # read
   0   20
   1   18
   2   14
   3   10
   4    7
   5    2
   6    3
   7    1
   8    5
   9    9
  10    5
  11    7
  12    7
  13    5
  14    5
  15    5
  16    7
  17    8
  18    3
  19    7
  20    4

Notice the array contents are not exactly what is in the file but rather compute an element by adding on the delta from the file to the previous total, sometimes referred to as a cumulative sum.

Creating Delta files

The data/ directory contains several delta files. These have extensions indicating the format of the data like .txt for textual and .int for binary int data.

The file save_deltas.c is provided and allows you to create your own delta files for testing. It is demonstrated below.

> make save_deltas                   # builds the save_deltas program
gcc -Wall -g -c save_deltas.c
gcc -Wall -g -o save_deltas save_deltas.o deltas.h

> ./save_deltas                      # show usage of program
usage: ./save_deltas <format> <filename>
reads input from stdin and outputs in given format to <filename>
if typing numbers in, press Ctrl-D to end input
 <format> is one of
 text : text ints are in the given filename
 int  : binary ints are in the given filename
 4bit : 4bit binary ints are in the given filename

> ./save_deltas text deltas.txt      # save typed deltas in text format in file deltas.txt
10
12
17
15
10
5
-1
3                                    # press Ctrl-D to end typed input
wrote 8 ints to deltas.txt in text format

> cat deltas.txt                     # show contents of file deltas.txt on screen
10 2 5 -2 -5 -5 -6 4 

> ./save_deltas int binary.int       # save typed deltas in binary format in file binary.int
1
5
-1
4
wrote 4 ints to binary.int in int format

> cat binary.int                     # show binary data: it's a mess
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

Read Delta Files: Code to Write

The file deltas_main.c is provided and shows how required functions are called to read data. The main work to be done is in read_deltas.c which will implement several functions for reading data in delta files in different formats.

All code for this problem should be written in read_deltas.c to fill in the definition of the functions described below. As the functions are completed, one can compile the deltas_main.c program and test it on the provided input files.

3.2 Reading Text Integers

In read_deltas.c, define the following function.

int *read_text_deltas(char *fname, int *len);
// Reads integers in text delta format from the file named by fname
// and returns an array of them. The first integer in the file gives
// the starting point and subsequent integers are changes from the
// previous total.
// 
// Opens the file with fopen() and scans through its contents using
// fscanf() counting how many text integers exist in it.  Then
// allocates an array of appropriate size using malloc(). Uses
// rewind() to go back to the beginning of the file then reads
// integers into the allocated array. Closes the file after reading
// all ints.  Returns a pointer to the allocated array.
// 
// The argument len is a pointer to an integer which is set to the
// length of the array that is allocated by the function.
//
// If the file cannot be opened with fopen() or if there are no
// integers in the file, sets len to -1 and returns NULL.

Below are some implementation notes explaining some details.

Text Delta Format

As described above, the format of files for this function is text delta which will have the .txt extension. You can look at these files with a text editor. The first number is the starting point and subsequent numbers are changes from the last number. Some examples:

| delta file:  | 0 | 2 | 2 | 2 | 2 |  2 |  1 |  2 |  1 |  2 |  1 |
| array value: | 0 | 2 | 4 | 6 | 8 | 10 | 11 | 13 | 14 | 16 | 17 |
| array index: | 0 | 1 | 2 | 3 | 4 |  5 |  6 |  7 |  8 |  9 | 10 |

| delta file:  | 20 | -2 | -4 | -4 | -3 | -5 | 1 | -2 | 4 | 4 | -4 |
| array value: | 20 | 18 | 14 | 10 |  7 |  2 | 3 |  1 | 5 | 9 |  5 |
| array index: |  0 |  1 |  2 |  3 |  4 |  5 | 6 |  7 | 8 | 9 | 10 |

Note that the contents of delta text files can occur on one line, each on a separate line, or several per line. All numbers will be separated by one or more whitespace characters. Make use of the scanning techniques below to handle this without trouble.

Opening Text Files

Make use of the standard C file I/O function fopen() to open a named file. Do some research to figure out the arguments to this, particularly

  • What arguments are passed to fopen() to open a file for reading
  • What the function returns on success
  • What the function returns on failure which indicates a file could not be opened.

Reading Data from Text Files

Make use of the C function fscanf() which is similar to scanf() to read data. Using a format specifier like %d will skip any whitespace to find the next integer available in the input stream. This allows one to handle data spread across multiple lines easily.

Since the number of integers in the file is not known in advance, use the following two-pass strategy to determine the size of the array required.

  1. Use a loop with fscanf() just to count how many integers are in the file. Each time you attempt a read, capture the return value of fscanf(). If the return value is EOF (end of file) then the read failed due to reaching the end of the file.
  2. Allocate an array of integers big enough to hold the contents of the file in it. Use malloc() for this.
  3. Use the standard C rewind() function to move back to the beginning of the file.
  4. Use another loop with fscanf() to read through the file again. This time, each read should store an integer in the array.

Compiling and Running with deltas_main.c

When you have completed this function, you may wish to test it with deltas_main.c. You will likely need to take the following steps:

  • Comment out the parts of deltas_main.c which use later functions like read_int_deltas() so there are no undefined references.
  • Compile the main program. While you can compile the program with gcc manually as in

    > gcc -o deltas_main deltas_main.c read_deltas.c
    
    

    it is likely that you will want to use the provided Makefile which automatically builds for you as in

      > make deltas_main
      gcc -Wall -g -c deltas_main.c
      gcc -Wall -g -c read_deltas.c
      gcc -Wall -g -o deltas_main deltas_main.o read_deltas.o
    

    You may begin examining Makefile as it is an incredibly useful tool for building complex projects.

  • Run the program with some data files such as

      > ./deltas_main text data/short.txt
      Reading text format
      data_len: 21
         # read
         0   20
         1   18
         2   14
         3   10
         ...
    

    Note that the program needs to be told the format text as the 2nd argument to run properly.

3.3 Reading Binary Integers

In read_deltas.c, define the following function.

int *read_int_deltas(char *fname, int *len);
// Reads integers in binary delta format from the file named by fname
// and returns an array of them.  The first integer in the file gives
// the starting point and subsequent integers are changes from the
// previous total.
// 
// Integers in the file are in binary format so the size of the file
// in bytes indicates the quantity of integers. Uses the stat() system
// call to determine the file size in bytes which then allows an array
// of appropriate size to be allocated. DOES NOT scan through the file
// to count its size as this is not needed.
// 
// Opens the file with fopen() and uses repeated calls to fread() to
// read binary integers into the allocated array. Does delta
// computations as integers are read. Closes the file after reading
// all ints.
// 
// The argument len is a pointer to an integer which is set to
// the length of the array that is allocated by the function.
//
// If the file cannot be opened with fopen() or file is smaller than
// the size of 1 int, sets len to -1 and returns NULL.

Binary Delta Format

The file format expected by this file is binary rather than textual so it is likely not possible to examine the data in text format. The data/ directory contains some example files

  • data/short.int : 84 bytes : 21 integers
  • data/wave.int : 508 bytes : 127 integers

The data in these are identical to the .txt equivalents:

  • The first number is the starting point
  • Subsequent numbers are changes of the previous number

However, rather than use ASCII characters that are human readable, each integer is stored directly as 4 bytes in the file.

Determining File Size / Array Elements

Since each integer occupies 4 bytes directly in the file, the size of the file corresponds to the number of integers in the file. This allows one to allocate an appropriately sized array based on the file size. This means that no scanning is required to determine how many ints are in the file.

The stat() system call in Unix provides information about a file by filling in values of a struct. The file size in bytes is one such piece of information. Use a calling sequence like the following to determine the binary file size.

  struct stat sb;                                    // struct to hold
  int result = stat(fname, &sb);                     // unix system call to determine size of named file
  if(result==-1 || sb.st_size < sizeof(int)){        // if something went wrong or bail if file is too small
    return NULL;
  }
  int total_bytes = sb.st_size;                      // size of file in bytes

Note the use of the struct stat sb, on old convention for declaring a struct as a stack variable. Its memory address is passed to stat() which fills in information. The field st_size is the number of bytes in the file. This can then be used to determine how many ints are in the file (4 bytes per int) and allocate an array with malloc() prior to reading through the file.

Note that this technique only works with binary files. Text files use a variable number of characters/bytes per integer and include formatting characters such as spaces and newlines. This means the number of bytes in the file does not correspond to the number of integers in it.

Reading Binary Data

Files with binary data can be opened identically to the text data but reading requires use of the fread() function. This function differs greatly from fscanf() as it has no format string. Instead, it is meant to read raw bits/bytes from a file into a specified memory address. It is likely that you will use a call similar to the following.

fread(&values[i], sizeof(int),    1,     fin);
      Address to | bytes for   |quantity |file handle  
      store data | one element |to read  |to read from

This call reads a single binary integer into the an array element from an open file handle.

Make sure as you read integers into the array you perform the delta computation of adding on the previous value to the recently read delta.

3.4 OPTIONAL: Reading 4-bit Signed Integers

For those looking for a bit more challenge, optionally implement the following function (pride only: no bonus credit).

int *read_4bit_deltas(char *fname, int *len);
// Reads integers in 4bit delta format from the file named by fname
// and returns an array of them.  The first integer in the file gives
// the starting point and subsequent integers are changes from the
// previous total.
// 
// Integers in the file are in 4bit format so the size of the file in
// bytes indicates the quantity of integers: 1 4byte integer appears
// at the beginning and subsequently, each byte as two 4bit signed
// integers packed into it. Uses the stat() system call to determine
// the file size in bytes which then allows an array of appropriate
// size to be allocated. DOES NOT scan through the file to count its
// size as this is not needed.
// 
// Opens the file with fopen() and uses repeated calls to fread() to
// read binary integers into the allocated array. Does delta
// computations as integers are read. Closes the file after reading
// all ints.
// 
// The argument len is a pointer to an integer which is set to
// the length of the array that is allocated by the function.
//
// If the file cannot be opened with fopen() or file is smaller than
// the size of 1 int, sets len to -1 and returns NULL.

4-bit Delta Files

In many applications such as measuring temperature changes every minute, one does not expect large changes. This means a full 32-bit signed integer to represent the change is a waste of space. Instead, a 4-bit signed integer is sufficient to represent deltas between -8 and +7. The 4-bit delta format is as follows.

  • The first 4 bytes are a 32 bit integer which is the starting value. This is identical to the previous file formats
  • Subsequently, each byte (8 bits) contains TWO 4 bit signed integers which are changes from the previous value.

Like the binary int format above, this format has a direct correspondence of the size of the file to the number of integers

  • data/short.int : 21 integers = 4 byte int + (20 ints)/2 ints per byte = 14 bytes.
  • data/wave.int : 127 integers = 4 byte int + (126 ints)/2 ints per byte = 67 bytes

This is a significantly compressed size over the sizes of other formats which is useful if space is tight (ex: a remotely located temperatures sensor on top of Mt. Everest).

The trade-off for the space savings is that the 4-bit format requires significantly more skill to decode into a standard array of integer values.

Decoding 4-bit Integers

Note that reading 4-bit ints can be done with fread() as it binary was read in the previous problem. However, one will need to read 1 byte at time and process it to become two integer deltas.

As an example, consider the a file starting in the following way:

Byte 1    Byte 2    Byte 3    Byte 4
0000 0111 0000 0000 0000 0000 0000 0000   // little endian 32 bit int = 7
Byte 5
0011 0101 // delta1: right 4 bits = +3, delta2: left 4 bits = +5
Byte 6
1111 0001 // delta1: right 4 bits = -3, delta2: left 4 bits = +1
Byte 7
0111 1000 // delta1: right 4 bits = -8, delta2: left 4 bits = +7

To handle this file format, one would take the following approach.

  • Read the initial value as normal (4 bytes for an int)
  • Read 1 byte
  • Mask the low 4 bits for the first 4-bit delta
  • Shift and mask upper 4 bits for the second 4-bit delta
  • Shift both deltas left then right to carry the sign bit (0 positive, 1 negative) across the number.

The following C binary operators are useful for this

int result = 0;
// initializ result: 32 bits all 0

result = x & 0b1111;   
// result is bitwise and of x and the binary constant of zeroes with 4
// low-order ones

result = x >> 4;            
// result is x shifted 4 bits to the right

result = (x << 28) >> 28;
// result is x shifted left 28, the right 28 to carry its sign bit
// across 32 bits

Combining these bitwise operations will allow one to convert a 4-bit signed integer to a 32-bit signed integer.

3.5 Grading Criteria for P1   grading

The following criteria will be checked in read_deltas.c

Weight Criteria
  AUTOMATED TESTS make test-p1
10 Provides 10 tests for read_text_deltas() and read_int_deltas()
  Compile and run using make test-p1
  1 point per test passed
   
5 Valgrind Memory Checks of test_read_deltas.c
  Compile and run using make test-p1
  5 points awarded for Valgrind identifying no memory errors
  Deductions made based on severity of memory errors identified by Valgrind
  MANUAL INSPECTION
10 read_text_deltas()
  Clear commenting indicating which part of the count / allocate / read algorithm is being implemented
  Clear loop to count number of elements in file prior to allocation
  Correctly reading data and checking for end of file with fscanf()
  Proper use of malloc() to allocate an appropriately sized array
  Proper use of dereference to set the len parameter
  File closed prior to return
  Clear efforts to check for file open failures or no integers to return NULL
   
10 read_int_deltas()
  Clear commenting indicating which part of the check size / allocate / read algorithm is being implemented
  Use of stat() system call to calculate file size
  Use of malloc() to allocate correctly sized array
  Clear loop using fread() to read binary data into the array
  Proper use of dereference to set the len parameter
  File closed prior to return
  Clear efforts to check for file open failures or no integers to return NULL

4 Problem 2: Graphing Data

4.1 Overview

Prior to advent of graphical displays, computers were mostly operated via text terminals. This meant any graphical displays had to be rendered with character data in a grid. Anyone thinking this format limited the creativity of computer folks is encouraged to open a terminal and run

telnet towel.blinkenlights.nl

which will likely change their tune (allow ~2 hours of viewing time for the full effect).

The purpose of this problem is somewhat more mundane: create a simple plotting routine for the text terminal. This print_graph() function which resides in the print_graph.c file will display an array of numbers on the screen in a graph-like fashion as per the examples below.

> make print_graph_demo               # build the provided demo program
gcc -Wall -g -c print_graph_demo.c
gcc -Wall -g -c print_graph.c
gcc -Wall -g -o print_graph_demo print_graph_demo.o print_graph.o

> ./print_graph_demo                  # run the provided demo program
DEMO DATA 1
===========
length: 22
min: 0
max: 20
range: 20
max_height: 10
units_per_height: 2.00
     +----+----+----+----+-
 20 |                    X 
 18 |                   XX 
 16 |                  XXX 
 14 |                 XXXX 
 12 |                XXXXX 
 10 |          X    XXXXXXX
  8 |        XXX   XXXXXXXX
  6 |      XXXXX  XXXXXXXXX
  4 |    XXXXXXX XXXXXXXXXX
  2 |  XXXXXXXXXXXXXXXXXXXX
  0 |XXXXXXXXXXXXXXXXXXXXXX
     +----+----+----+----+-
     0    5    10   15   20   

DEMO DATA 2
===========
length: 22
min: 0
max: 20
range: 20
max_height: 20
units_per_height: 1.00
     +----+----+----+----+-
 20 |                    X 
 19 |                    X 
 18 |                   XX 
 17 |                   XX 
 16 |                  XXX 
 15 |                  XXX 
 14 |                 XXXX 
 13 |                 XXXX 
 12 |                XXXXX 
 11 |                XXXXX 
 10 |          X    XXXXXXX
  9 |         XX    XXXXXXX
  8 |        XXX   XXXXXXXX
  7 |       XXXX   XXXXXXXX
  6 |      XXXXX  XXXXXXXXX
  5 |     XXXXXX  XXXXXXXXX
  4 |    XXXXXXX XXXXXXXXXX
  3 |   XXXXXXXX XXXXXXXXXX
  2 |  XXXXXXXXXXXXXXXXXXXX
  1 | XXXXXXXXXXXXXXXXXXXXX
  0 |XXXXXXXXXXXXXXXXXXXXXX
     +----+----+----+----+-
     0    5    10   15   20   

DEMO DATA 3
===========
length: 50
min: 13
max: 996
range: 983
max_height: 10
units_per_height: 98.30
     +----+----+----+----+----+----+----+----+----+----
996 |                X                                 
897 |       X        X X            X                  
799 |  X    X X   X  X X    X       X                X 
701 |  XX   X X   X  XXX    X      XX   XXX    X   X XX
602 |  XX   X X  XX  XXX X  X      XX  XXXX    XX  X XX
504 |  XX   XXX  XX  XXX XX X      XXX XXXX XX XX  X XX
406 |  XX X XXX XXXX XXX XX X  XXX XXX XXXXXXXXXX  X XX
307 | XXX X XXX XXXXXXXXXXX X XXXX XXXXXXXXXXXXXXX X XX
209 | XXX XXXXXXXXXXXXXXXXX XXXXXX XXXXXXXXXXXXXXXXX XX
111 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 13 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     +----+----+----+----+----+----+----+----+----+----
     0    5    10   15   20   25   30   35   40   45   

DEMO DATA 4
===========
length: 50
min: 13
max: 996
range: 983
max_height: 18
units_per_height: 54.61
     +----+----+----+----+----+----+----+----+----+----
996 |                X                                 
941 |                X              X                  
886 |       X        X X            X                  
832 |       X X      X X            X                X 
777 |  X    X X   X  XXX    X       X   XXX          XX
722 |  XX   X X   X  XXX    X       X   XXX          XX
668 |  XX   X X   X  XXX    X      XX  XXXX    X   X XX
613 |  XX   X X  XX  XXX    X      XX  XXXX    XX  X XX
559 |  XX   XXX  XX  XXX XX X      XX  XXXX  X XX  X XX
504 |  XX   XXX  XX  XXX XX X      XXX XXXX XX XX  X XX
449 |  XX X XXX XXXX XXX XX X   X  XXX XXXX XX XX  X XX
395 | XXX X XXX XXXX XXX XX X  XXX XXX XXXXXXXXXX  X XX
340 | XXX X XXX XXXXXXXXXXX X  XXX XXXXXXXXXXXXXXX X XX
286 | XXX X XXX XXXXXXXXXXX X XXXX XXXXXXXXXXXXXXX X XX
231 | XXX X XXXXXXXXXXXXXXX XXXXXX XXXXXXXXXXXXXXXXX XX
176 | XXX XXXXXXXXXXXXXXXXX XXXXXX XXXXXXXXXXXXXXXXXXXX
122 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 67 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 13 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     +----+----+----+----+----+----+----+----+----+----
     0    5    10   15   20   25   30   35   40   45   

4.2 A Function For Graphing Data

In the file print_graph.c, write a the following C function.

void print_graph(int *data, int len, int max_height);
// Prints a graph of the values in data which is an array of integers
// that has len elements. The max_height argument is the height in
// character rows that the maximum number data[] should be.  A sample
// graph is as follows:
//
// length: 50
// min: 13
// max: 996
// range: 983
// max_height: 10
// units_per_height: 98.30
//      +----+----+----+----+----+----+----+----+----+----
// 996 |                X                                 
// 897 |       X        X X            X                  
// 799 |  X    X X   X  X X    X       X                X 
// 701 |  XX   X X   X  XXX    X      XX   XXX    X   X XX
// 602 |  XX   X X  XX  XXX X  X      XX  XXXX    XX  X XX
// 504 |  XX   XXX  XX  XXX XX X      XXX XXXX XX XX  X XX
// 406 |  XX X XXX XXXX XXX XX X  XXX XXX XXXXXXXXXX  X XX
// 307 | XXX X XXX XXXXXXXXXXX X XXXX XXXXXXXXXXXXXXX X XX
// 209 | XXX XXXXXXXXXXXXXXXXX XXXXXX XXXXXXXXXXXXXXXXX XX
// 111 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
//  13 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
//      +----+----+----+----+----+----+----+----+----+----
//      0    5    10   15   20   25   30   35   40   45   

Computing Range Statistics

The first step to creating the graph is to compute the minimum and maximum values in the data array. This allows one to compute the range of the data. The max_height argument can then be used to compute how many "units" a data value must have to get an X character at the given heights. All this information is printed prior to the main body of the graph as in the following.

length: 50
min: 13
max: 996
range: 983
max_height: 10
units_per_height: 98.30
                     ^^
                     2 decimal places

Note that the units_per_height value is a floating point value and it is printed with 2 decimal places of accuracy.

Top and Bottom Axes

The top and bottom axes are simply dashed - lines with a + every 5 steps similar to the following:

      +----+----+----+----+----+----+----+----+----+----

The left part of the axis is padded with 5 spaces to allow room for vertical axis numbering.

Below the bottom axis, print an index reference every 5 units. This allows particular data elements (data[11] for example) to be located easily. This is easiest to do with printf() and a format specifier like %-5d which will left justify the integers in a field of 5 characters. The bottom axis and numbering should look like

     +----+----+----+----+----+----+----+----+----+----
     0    5    10   15   20   25   30   35   40   45   
^^^^^     ^^^^^
5 spaces  5 chars wide, left justified

Suggestion: To print the axis, use a single loop for the axis line which prints a dash - except on each 5th iteration when it prints a plus + instead.

Main plot Body

There are a variety of ways to create the plate body BUT below is the preferred algorithm which uses integer variables to control the loop. This avoids common problems associated with floating point numbers in loops.

  1. Loop from max_height down to 0. Each iteration will draw a "level" of the plot from top to bottom.
  2. At the start of each loop, compute the cutoff at the iteration's level using arithmetic like

    min + level * units_per_height
    
    

    Make sure that to cast this to an integer.

  3. Print the vertical axis reference number first so it appears in the left margin. Use a format specifier like %3d | which will print the level number reference right justified in a field of 3 characters followed by a space and a vertical bar.
  4. Loop through each element of the data array. If the data element is larger than or equal to the level's cutoff, print an X. Otherwise, print a space.
  5. After printing X/space for each data, print a newline to move down one level.

Your code should use a doubly nested loop to get the printing correct.

Demoing with print_graph_demo.c

A correct implementation of print_graph() should compile against print_graph_demo.c and produce the output shown at the top of this section.

4.3 Main Function for Graphing

The print_graph() function should work seamlessly to print the data array produced by the functions in read_deltas.c. All that remains is a main function to glue these two functions together.

In the file graph_file.c, create a main() function which does the following:

  • Accepts 3 command line arguments
    1. A format of a file
    2. A filename
    3. A number which is the maximum height of the graph in characters
  • Uses the atoi(str) function to convert the maximum height command line argument from a string to an integer.
  • Uses an appropriate function from read_deltas.c to read data from the given file.
  • Plots the data using print_graph()
  • Frees the data array prior to exiting.

Note that this main() function is VERY similar to the main() in deltas_main.c which means you can lift a lot of code from there to get the job done.

A demonstration of how this program should work is below. follows.

You DO NOT need implement the 4bit file format if you did not do the optional problem on this.

> make graph_file                       # build using Makefile
gcc -Wall -g -c graph_file.c
gcc -Wall -g -c print_graph.c
gcc -Wall -g -c read_deltas.c
gcc -Wall -g -o graph_file graph_file.o print_graph.o read_deltas.o

> ./graph_file                          # no arguments gives usage message
usage: graph_file <format> <filename> <height>
 <format> is one of
 text : text ints are in the given filename
 int  : binary ints are in the given filename
 4bit : 4bit binary ints are in the given filename

> ./graph_file text data/short.txt 5    # graph the text short.txt file, height 5
Reading text format
length: 21
min: 1
max: 20
range: 19
max_height: 5
units_per_height: 3.80
     +----+----+----+----+
 20 |X                    
 16 |XX                   
 12 |XXX                  
  8 |XXXX     X       X   
  4 |XXXXX   XXXXXXXXXX XX
  1 |XXXXXXXXXXXXXXXXXXXXX
     +----+----+----+----+
     0    5    10   15   20   

> ./graph_file text data/short.txt 20   # graph the text short.txt file, height 20
Reading text format
length: 21
min: 1
max: 20
range: 19
max_height: 20
units_per_height: 0.95
     +----+----+----+----+
 20 |X                    
 19 |X                    
 18 |XX                   
 17 |XX                   
 16 |XX                   
 15 |XX                   
 14 |XXX                  
 13 |XXX                  
 12 |XXX                  
 11 |XXX                  
 10 |XXXX                 
  9 |XXXX     X           
  8 |XXXX     X       X   
  7 |XXXXX    X XX   XX X 
  6 |XXXXX    X XX   XX X 
  5 |XXXXX   XXXXXXXXXX X 
  4 |XXXXX   XXXXXXXXXX XX
  3 |XXXXX X XXXXXXXXXXXXX
  2 |XXXXXXX XXXXXXXXXXXXX
  1 |XXXXXXXXXXXXXXXXXXXXX
  1 |XXXXXXXXXXXXXXXXXXXXX
     +----+----+----+----+
     0    5    10   15   20   

> ./graph_file int data/short.int 10    # graph the binary short.int file, height 10
Reading binary int format
length: 21
min: 1
max: 20
range: 19
max_height: 10
units_per_height: 1.90
     +----+----+----+----+
 20 |X                    
 18 |XX                   
 16 |XX                   
 14 |XXX                  
 12 |XXX                  
 10 |XXXX                 
  8 |XXXX     X       X   
  6 |XXXXX    X XX   XX X 
  4 |XXXXX   XXXXXXXXXX XX
  2 |XXXXXXX XXXXXXXXXXXXX
  1 |XXXXXXXXXXXXXXXXXXXXX
     +----+----+----+----+
     0    5    10   15   20   

> ./graph_file int data/wave.int 15     # graph the binary wave.int file, heigh 15
Reading binary int format
length: 127
min: -20
max: 20
range: 40
max_height: 15
units_per_height: 2.67
     +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+-
 20 |              XXXX                                                           XXXX                                              
 17 |          XXXXXXXXXXXX                                                   XXXXXXXXXXXX                                          
 14 |        XXXXXXXXXXXXXXXXX                                              XXXXXXXXXXXXXXXXX                                       
 12 |       XXXXXXXXXXXXXXXXXXX                                            XXXXXXXXXXXXXXXXXXX                                      
  9 |     XXXXXXXXXXXXXXXXXXXXXXX                                        XXXXXXXXXXXXXXXXXXXXXXX                                    
  6 |   XXXXXXXXXXXXXXXXXXXXXXXXXX                                     XXXXXXXXXXXXXXXXXXXXXXXXXX                                   
  4 |  XXXXXXXXXXXXXXXXXXXXXXXXXXXX                                   XXXXXXXXXXXXXXXXXXXXXXXXXXXX                                  
  1 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                                XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                                
 -1 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                              XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                              X
 -4 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                           XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                           XXX
 -6 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                         XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                         XXXX
 -9 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                     XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                     XXXXXX
-12 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX                  XXXXXXX
-14 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX               XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX               XXXXXXXXX
-17 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX          XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX          XXXXXXXXXXX
-20 |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+-
     0    5    10   15   20   25   30   35   40   45   50   55   60   65   70   75   80   85   90   95   100  105  110  115  120  125  

4.4 Grading Criteria for P2   grading

The test_graph_file.sh script creates outputs in the test-data/ directory such as test-data/actual.txt which is the output on test which may be useful to inspect. Running a single test as in

./test_graph_file.sh 4

ensures that files in the test-data/ directory correspond to the results of that single test.

Weight Criteria
  AUTOMATED TESTS make test-p2
5 test_print_graph.sh
  Script which checks output of print_graph_demo
  Compile and run using make test-p2 OR
  Run individually by doing ./test_print_graph.sh
  0-4 points assigned based on matching expected output
  1 point assigned for avoiding memory problems identified by Valgrind
   
10 test_graph_file.sh
  10 tests for graph_file executable
  Compile and run using make test-p2 OR
  Run script using ./test_graph_file.sh
  1 point per test passed
  0-3 point deduction if Valgrind identifies memory problems
  NOTE: individual tests can be run by doing
  ./test_graph_file.sh 4 # runs test 4
  MANUAL INSPECTION
5 print_graph.c
  Clear commenting indicating what part of the graph is being printed throughout
  Block which prints the initial statistics such as min/max/range
  Clearly marked blocks for printing top and bottom axis
  Clear doubly-nested loop structure to print main graph body
  Effective use of printf() with character field widths specified to format numbers padded by spaces.
   
5 graph_file.c
  Clear set of cases which selects the appropriate file format/read function
  Both text and binary int formats are handled
  Memory allocated by the read function is free()'d in main()

5 Problem 3: Hash Tables in C

5.1 Overview

This problem implements a rudimentary hash table application in C along with a program that uses it. The architecture is similar to a lab problem involving linked lists so reviewing that code can provide some insight.

Basic Hash Tables are covered in most introductory data structure classes such as CSCI 1913 and 1933. Should you need to review them, the following resources may prove useful.

5.2 hash_main Demonstration

The intent of this problem is to develop a small application which behaves as the following demo indicates.

> make hash_main                        # build the hash_main application
gcc -Wall -g -lm -c hash_main.c
gcc -Wall -g -lm -c hash_funcs.c
gcc -Wall -g -lm -o hash_main hash_main.o hash_funcs.o

> ./hash_main                           # run the application
Hashmap Demo
Commands:
  hashcode <key>   : prints out the numeric hash code for the given key (does not change the hash map)
  put <key> <val>  : inserts the given key/val into the hash map, overwrites existing values if present
  get <key>        : prints the value associated with the given key or NOT FOUND
  print            : shows contents of the hashmap ordered by how they appear in the table
  structure        : prints detailed structure of the hash map
  clear            : reinitializes hash map to be empty with default size
  save <file>      : writes the contents of the hash map the given file
  load <file>      : clears the current hash map and loads the one in the given file
  next_prime <int> : if <int> is prime, prints it, otherwise finds the next prime and prints it
  expand           : expands memory size of hashmap to reduce its load factor
  quit             : exit the program

HM> put Lucas brash                     # put key/vals into the hash map
HM> put Mike DM
HM> put Dustin corny
HM> put Will lost

HM> print                               # print the contents of the hash map
        Mike : DM                       # shows up out of order from insertions
      Dustin : corny                    # due to the randomness of hash codes
        Will : lost
       Lucas : brash

HM> get Dustin                          # lookup based on key
FOUND: corny                            # success: value is returned
HM> get El
NOT FOUND                               # failure: not in hash map
HM> get Lucas                           
FOUND: brash                            # success
HM> get Jim
NOT FOUND                               # failure

HM> put El weird                        # add more key/values to the hash map
HM> put Steve hairy
HM> put Nancy torn
HM> put Barb ran-away?
HM> print
        Mike : DM
       Nancy : torn
        Barb : ran-away?
      Dustin : corny
          El : weird
        Will : lost
       Lucas : brash
       Steve : hairy

HM> hashcode Lucas                      # show hash codes for various keys, 
1633908044                              # calculated from the string
HM> hashcode Barb                       # PRINT USING "%ld" for 64-bit longs
1651663170                              # much larger than the table size

HM> structure                           # show the structure of the hash table
item_count: 8                           # how many key/val pairs in the map
table_size: 5                           # size of the table array
load_factor: 1.6000                     # item_count / table_size
  0 : {(1701538125) Mike : DM} {(521359221070) Nancy : torn} {(1651663170) Barb : ran-away?} 
  1 : {(121399204345156) Dustin : corny} 
  2 : {(27717) El : weird} 
  3 : {(1819044183) Will : lost} 
  4 : {(495555147084) Lucas : brash} {(435778057299) Steve : hairy} 

# above shows hash codes for key, value, linked lists in each bucket
# 0 and 5 are in bucket 0, 1 and 6 are in bucket 1, etc  

HM> expand                              # expand the hash map's table array
HM> structure                           # show the structure
item_count: 8                           # same # of key/vals
table_size: 11                          # larger array, always a prime number
load_factor: 0.7273                     # reduced load factor
  0 : {(1819044183) Will : lost} 
  1 : {(1701538125) Mike : DM} {(121399204345156) Dustin : corny} 
  2 : {(435778057299) Steve : hairy} 
  3 : {(1651663170) Barb : ran-away?} 
  4 : 
  5 : 
  6 : {(521359221070) Nancy : torn} 
  7 : 
  8 : {(27717) El : weird} {(495555147084) Lucas : brash} 
  9 : 
 10 : 

# all elements rehashed to new locations based on new table size 11

HM> save stranger.hm                    # save hash map to a file
HM> clear                               # clear current hashmap
HM> print                               # print: nothing there
HM> structure                           # structure: empty after 'clear'
item_count: 0
table_size: 5
load_factor: 0.0000
  0 : 
  1 : 
  2 : 
  3 : 
  4 : 

HM> put Jim cop                         # put new values in the hashmap
HM> put Joyce worried
HM> print                               # print new hash map
       Joyce : worried
         Jim : cop

HM> structure                           # show structure
item_count: 2                           # 2 items only
table_size: 5
load_factor: 0.4000
  0 : 
  1 : {(435460599626) Joyce : worried} 
  2 : 
  3 : {(7170378) Jim : cop} 
  4 : 

HM> get Dustin                          # old elements were cleared out
NOT FOUND
HM> get Joyce                           # new elements present
FOUND: worried

HM> load stranger.hm                    # load from file; clears current hash table
HM> print                               # old elements are restored
        Will : lost
        Mike : DM
      Dustin : corny
       Steve : hairy
        Barb : ran-away?
       Nancy : torn
          El : weird
       Lucas : brash

HM> structure                           # identical structure from the 'load'
item_count: 8
table_size: 11
load_factor: 0.7273
  0 : {(1819044183) Will : lost} 
  1 : {(1701538125) Mike : DM} {(121399204345156) Dustin : corny} 
  2 : {(435778057299) Steve : hairy} 
  3 : {(1651663170) Barb : ran-away?} 
  4 : 
  5 : 
  6 : {(521359221070) Nancy : torn} 
  7 : 
  8 : {(27717) El : weird} {(495555147084) Lucas : brash} 
  9 : 
 10 : 

HM> get Dustin                          # now found
FOUND: corny
HM> get Joyce                           # cleared after the 'load'
NOT FOUND

HM> put Will found                      # overwrite values associated with existing keys
Overwriting previous key/val
HM> put Nancy decided
Overwriting previous key/val
HM> put Mike girl-crazy
Overwriting previous key/val

HM> print                               
        Will : found                    # new value
        Mike : girl-crazy               # new value
      Dustin : corny
       Steve : hairy
        Barb : ran-away?
       Nancy : decided                  # new value
          El : weird
       Lucas : brash


HM> next_prime 10                       # calculates next larger prime  
11
HM> next_prime 25
29
HM> next_prime 31                       # already prime
31

HM> structure                           # show structure
item_count: 8
table_size: 11
load_factor: 0.7273
  0 : {(1819044183) Will : found} 
  1 : {(1701538125) Mike : girl-crazy} {(121399204345156) Dustin : corny} 
  2 : {(435778057299) Steve : hairy} 
  3 : {(1651663170) Barb : ran-away?} 
  4 : 
  5 : 
  6 : {(521359221070) Nancy : decided} 
  7 : 
  8 : {(27717) El : weird} {(495555147084) Lucas : brash} 
  9 : 
 10 : 

HM> expand                              # expand again to next_prime(table_size*2+1)
HM> structure
item_count: 8                           # same as before
table_size: 23                          # larger, always a prime number
load_factor: 0.3478                     # reduced load factor
  0 : 
  1 : 
  2 : {(27717) El : weird} 
  3 : 
  4 : {(1651663170) Barb : ran-away?} {(495555147084) Lucas : brash} 
  5 : 
  6 : 
  7 : 
  8 : {(121399204345156) Dustin : corny} 
  9 : 
 10 : {(521359221070) Nancy : decided} 
 11 : {(1701538125) Mike : girl-crazy} {(435778057299) Steve : hairy} 
 12 : {(1819044183) Will : found} 
 13 : 
 14 : 
 15 : 
 16 : 
 17 : 
 18 : 
 19 : 
 20 : 
 21 : 
 22 : 

HM> quit                                # quit the program

5.3 hash_funcs.c: Hash Table Functions

We will favor separately chained hash maps which have the following features.

  • A hash map stores key/value pairs allowing fast lookup based on keys. It is a data structure that is often used to implement 'dictionaries'.
  • The hash map has a 'table' which is a fixed size array. Each array element is a pointer to a linked list node or NULL.
  • Items in the hash table are stored in linked list nodes including the lookup key and value associated with the key. Items that hash to the same table location are stored in the list at that location
  • A hashcode() function is provided which generates an integer from a string and is the first step to determining where key/value is located.

Examine the header file hashmap.h to see the two C structs which will be used to represent hash tables.

// Type for linked list nodes in hash map
typedef struct hashnode {
  char key[128];                // string key for items in the map
  char val[128];                // string value for items in the map
  struct hashnode *next;        // pointer to next node, NULL if last node
} hashnode_t;

// Type of hash table
typedef struct {
  int item_count;               // how many key/val pairs in the table
  int table_size;               // how big is the table array
  hashnode_t **table;           // array of pointers to nodes which contain key/val pairs
} hashmap_t;

Standard operations are supported by the hash map.

  • Adding key/val pairs
  • Searching for the value associated with a key
  • Altering the value associated with a key
  • Clearing the entire hash map
  • Printing the key/value pairs in the hash map
  • Expanding the memory used by the hash map to improve lookup performance.

These are broken down into the following C functions which you will need to implement in hash_funcs.c except for the first hashcode() function which is provided.

// hash_funcs.c: utility functions for operating on hash maps. Most
// functions are used in hash_main.c which provides an application to
// work with the functions.

long hashcode(char key[]);
// PROVIDED: Compute a simple hash code for the given character
// string. The code is "computed" by casting the first 8 characters of
// the string to a long and returning it. The empty string has hash
// code 0. If the string is a single character, this will be the ASCII
// code for the character (e.g. "A" hashes to 65).  Longer strings
// will have numbers which are the integer interpretation of up to
// their first 8 bytes.  ADVANTAGE: constant time to compute the hash
// code. DISADVANTAGE: poor distribution for long strings; all strings
// with same first 8 chars hash to the same location.

void hashmap_init(hashmap_t *hm, int table_size);
// Initialize the hash map 'hm' to have given size and item_count
// 0. Ensures that the 'table' field is initialized to an array of
// size 'table_size' and filled with NULLs. 

int hashmap_put(hashmap_t *hm, char key[], char val[]);
// Adds given key/val to the hash map. 'hashcode(key) modulo
// table_size' is used to calculate the position to insert the
// key/val.  Searches the entire list at the insertion location for
// the given key. If key is not present, a new node is added. If key
// is already present, the current value is altered in place to the
// given value "val" (no duplicate keys are every introduced).  If new
// nodes are added, increments field "item_count".  Makes use of
// standard string.h functions like strcmp() to compare strings and
// strcpy() to copy strings. Lists in the hash map are arbitrarily
// ordered (not sorted); new items are always appended to the end of
// the list.  Returns 1 if a new node is added (new key) and 0 if an
// existing key has its value modified.

char *hashmap_get(hashmap_t *hm, char key[]);
// Looks up value associated with given key in the hashmap. Uses
// hashcode() and field "table_size" to determine which index in table
// to search.  Iterates through the list at that index using strcmp()
// to check for matching key. If found, returns a pointer to the
// associated value.  Otherwise returns NULL to indicate no associated
// key is present.

void hashmap_free_table(hashmap_t *hm);
// De-allocates the hashmap's "table" field. Iterates through the
// "table" array and its lists de-allocating all nodes present
// there. Subsequently de-allocates the "table" field and sets all
// fields to 0 / NULL. Does NOT attempt to free 'hm' as it may be
// stack allocated.

void hashmap_show_structure(hashmap_t *hm);
// Displays detailed structure of the hash map. Shows stats for the
// hash map as below including the load factor (item count divided
// by table_size) to 4 digits of accuracy.  Then shows each table
// array index ("bucket") on its own line with the linked list of
// key/value pairs on the same line. The hash code for keys is appears
// prior to each key.  EXAMPLE:
// 
// item_count: 6
// table_size: 5
// load_factor: 1.2000
//   0 : {(65) A : 1} 
//   1 : 
//   2 : {(122) z : 3} {(26212) df : 6} 
//   3 : {(98) b : 2} {(25443) cc : 5} 
//   4 : {(74) J : 4} 
//
// NOTES:
// - Uses format specifier "%3d : " to print the table indices
// - Shows the following information for each key/val pair
//   {(25443) cc : 5}
//     |      |    |
//     |      |    +-> value
//     |      +-> key
//     +-> hashcode("cc"), print using format "%ld" for 64-bit longs

void hashmap_write_items(hashmap_t *hm, FILE *out);
// Outputs all elements of the hash table according to the order they
// appear in "table". The format is
// 
//       Peach : 3.75
//      Banana : 0.89
//  Clementine : 2.95
// DragonFruit : 10.65
//       Apple : 2.25
// 
// with each key/val on a separate line. The format specifier
//   "%12s : %s\n"
// 
// is used to achieve the correct spacing. Output is done to the file
// stream 'out' which is standard out for printing to the screen or an
// open file stream for writing to a file as in hashmap_save().

void hashmap_save(hashmap_t *hm, char *filename);
// Writes the given hash map to the given 'filename' so that it can be
// loaded later.  Opens the file and writes its 'table_size' and
// 'item_count' to the file. Then uses the hashmap_write_items()
// function to output all items in the hash map into the file.
// EXAMPLE FILE:
// 
// 5 6
//            A : 2
//            Z : 2
//            B : 3
//            R : 6
//           TI : 89
//            T : 7
// 
// First two numbers are the 'table_size' and 'item_count' field and
// remaining text are key/val pairs.

int hashmap_load(hashmap_t *hm, char *filename);
// Loads a hash map file created with hashmap_save(). If the file
// cannot be opened, prints the message
// 
// ERROR: could not open file 'somefile.hm'
//
// and returns 0 without changing anything. Otherwise clears out the
// current hash map 'hm', initializes a new one based on the size
// present in the file, and adds all elements to the hash map. Returns
// 0 on successful loading. This function does no error checking of
// the contents of the file so if they are corrupted, it may cause an
// application to crash or loop infinitely.

int next_prime(int num);
// If 'num' is a prime number, returns 'num'. Otherwise, returns the
// first prime that is larger than 'num'. Uses a simple algorithm to
// calculate primeness: check if any number between 2 and (num/2)
// divide num. If none do, it is prime. If not, tries next odd number
// above num. Loops this approach until a prime number is located and
// returns this. Used to ensure that hash table_size stays prime which
// theoretically distributes elements better among the array indices
// of the table.

void hashmap_expand(hashmap_t *hm);
// Allocates a new, larger area of memory for the "table" field and
// re-adds all items currently in the hash table to it. The size of
// the new table is next_prime(2*table_size+1) which keeps the size
// prime.  After allocating the new table, all entries are initialized
// to NULL then the old table is iterated through and all items are
// added to the new table according to their hash code. The memory for
// the old table is de-allocated and the new table assigned to the
// hashmap fields "table" and "table_size".  This function increases
// "table_size" while keeping "item_count" the same thereby reducing
// the load of the hash table. Ensures that all memory associated with
// the old table is free()'d (linked nodes and array). Cleverly makes
// use of existing functions like hashmap_init(), hashmap_put(),
// and hashmap_free_table() to avoid re-writing algorithms
// implemented in those functions.

5.4 Provided hash_code() Function

The code for hash_code() is provided below. All hash tables require a means of converting keys into a non-unique number. For consistency, we will use the below code which employs several interesting C tricks that will be discussed of the progression of the class. Some of the advantages/disadvantages of the hash function are mentioned in the comments.

// PROVIDED: Compute a simple hash code for the given character
// string. The code is "computed" by casting the first 8 characters of
// the string to a long and returning it. The empty string has hash
// code 0. If the string is a single character, this will be the ASCII
// code for the character (e.g. "A" hashes to 65).  Longer strings
// will have numbers which are the integer interpretation of up to
// their first 8 bytes.  ADVANTAGE: constant time to compute the hash
// code. DISADVANTAGE: poor distribution for long strings; all strings
// with same first 8 chars hash to the same location.
long hashcode(char key[]){
  union {
    char str[8];
    long num;
  } strnum;
  strnum.num = 0;

  for(int i=0; i<8; i++){
    if(key[i] == '\0'){
      break;
    }
    strnum.str[i] = key[i];
  }
  return strnum.num;
}

5.5 hash_main.c: main function / application

In hash_main.c implement an interactive program which allows users to type commands to manipulate the hash map.

The provided Makefile should compile the hash_main program as follows.

> make hash_main                # Compile with Makefile
gcc -Wall -g -lm -c hash_main.c
gcc -Wall -g -lm -c hash_funcs.c
gcc -Wall -g -lm -o hash_main hash_main.o hash_funcs.o

> ./hash_main                   # Run hash_main application
Hashmap Demo
Commands:
  hashcode <key>   : prints out the numeric hash code for the given key (does not change the hash map)
  put <key> <val>  : inserts the given key/val into the hash map
  get <key>        : prints the value associated with the given key or NOT FOUND
  print            : shows contents of the hashmap ordered by how they appear in the table
  structure        : prints detailed structure of the hash map
  clear            : reinitializes hash map to be empty with default size
  save <file>      : writes the contents of the hash map the given file
  load <file>      : clears the current hash map and loads the one in the given file
  next_prime <int> : if <int> is prime, prints it, otherwise finds the next prime and prints it
  expand           : expands memory size of hashmap to reduce its load factor
  quit             : exit the program
HM> 

The following sections provide some implementation details.

Read commands, Execute

The basic flow of hash_main.c follows the same pattern that code for a lab exercise demonstrates. A good way to get started on the main application is to copy over the lab solution and begin modifying it.

  • Create a hashmap_t variable, likely on the stack as a local variable in main()
  • Start a loop that terminates when the user exits or there is no more input
  • Each time the user enters a string, read it and check for one of the built-in commands
  • On identifying the command, potentially read another string if needed (commands like put and get)
  • Call an appropriate hashmap_XXX() function to handle the command

Supported Commands

To indicate to users of the program the supported commands, use the following code to print out the initial option list.

  printf("Hashmap Demo\n");
  printf("Commands:\n");
  printf("  hashcode <key>   : prints out the numeric hash code for the given key (does not change the hash map)\n");
  printf("  put <key> <val>  : inserts the given key/val into the hash map, overwrites existing values if present\n");
  printf("  get <key>        : prints the value associated with the given key or NOT FOUND\n");
  printf("  print            : shows contents of the hashmap ordered by how they appear in the table\n");
  printf("  structure        : prints detailed structure of the hash map\n");
  printf("  clear            : reinitializes hash map to be empty with default size\n");
  printf("  save <file>      : writes the contents of the hash map the given file\n");
  printf("  load <file>      : clears the current hash map and loads the one in the given file\n");
  printf("  next_prime <int> : if <int> is prime, prints it, otherwise finds the next prime and prints it\n");
  printf("  expand           : expands memory size of hashmap to reduce its load factor\n");
  printf("  quit             : exit the program\n");

Clearing Hashmaps

When issuing the clear command, the contents of the current hash map should be eliminated and the hash map reinitialized to the default size. This is most easily done via.

  1. Using the hashmap_free_table() function
  2. Using hashmap_init() with the HASHMAP_DEFAULT_TABLE_SIZE constant defined in hashmap.h.

Paths to Files for Saving and Loading

Saving and loading data involves writing to files. The names associated with the files must be specified as a path so that if files are to be saved or loaded from subdirectories, include this as part of the path. For example, the stranger.hm file is provided in the data/ directory and once loading functionality is code, it can be loaded as follows.

a1-code> ./hash_main
..
HM> load data/stranger.hm                 # load the provided hash map from a subdirectory
HM> print                                 # show contents of hash map
        Will : lost
        Mike : DM
      Dustin : corny
       Steve : hairy
        Barb : ran-away?
       Nancy : torn
          El : weird
       Lucas : brash

HM> structure                             # show internal structure of the hash map
item_count: 8
table_size: 11
load_factor: 0.7273
  0 : {(1819044183) Will : lost} 
  1 : {(1701538125) Mike : DM} {(121399204345156) Dustin : corny} 
  2 : {(435778057299) Steve : hairy} 
  3 : {(1651663170) Barb : ran-away?} 
  4 : 
  5 : 
  6 : {(521359221070) Nancy : torn} 
  7 : 
  8 : {(27717) El : weird} {(495555147084) Lucas : brash} 
  9 : 
 10 : 

HM> put Demigorgon scary                  # add more data
HM> save cur.hm                           # save in the current directory
HM> clear                                 # clear hash map
HM> print                                 # show contents are empty
HM> load cur.hm                           # reload from file in current directory
HM> print                                 # show contents are restored from save
        Will : lost
        Mike : DM
      Dustin : corny
       Steve : hairy
        Barb : ran-away?
       Nancy : torn
  Demigorgon : scary
          El : weird
       Lucas : brash

HM> put Jim cop                           # add more data
HM> save data/new.hm                      # save in a copy in 'data' subdirectory
HM> quit

Failing to Load Files

If a file cannot be opened with the load 'file.hm' command, the underlying hashmap_load() should print an error of the form

ERROR: could not open file 'somefile.hm'

and the hash_main.c main loop should NOT change the hash map and print the message

load failed

in response to the error. Below is a demonstration of this from the automated tests.

...
HM> print                            # hash map has contents
           A : 1
           C : 3
           D : 4
           E : 2

HM> load test-data/not-there.tmp     # attempt to load a non-existent file
ERROR: could not open file 'test-data/not-there.tmp'
load failed
HM> print                            # hash map is unchanged
           A : 1
           C : 3
           D : 4
           E : 2

Command Echoing: -echo option to hash_main

Some users may wish to "script" this the main program: allow it to process input commands from a file rather than from the user typing. This notion is discussed in a lab and should be reviewed if it is unfamiliar.

An example of an input script is in data/hash-demo.script which looks like a lot of user commands:

put Lucas brash
put Mike DM
put Dustin corny
put Will lost
print
get Dustin
get El
get Lucas
get Jim
put El weird
put Steve hairy
put Nancy torn
put Barb ran-away?
print
hashcode Lucas
hashcode Barb
structure
expand
structure
save stranger.hm
clear
print
structure
put Jim cop
put Joyce worried
print
structure
get Dustin
get Joyce
load stranger.hm
print
structure
get Dustin
get Joyce
put Will found
put Nancy decided
put Mike girl-crazy
print
structure
next_prime 10
next_prime 25
expand
structure
save data/other.hm
quit 

The file can be fed directly to the program without needing type it using Unix pipes as per the following:

> cat data/hash-demo.script | ./hash_main -echo

Notice the use of a command line argument for hash_main: the -echo option. This is a REQUIRED feature which prints commands typed by users to the screen. To implement this, do the following.

Model the structure of command echoing after what is shown in related Lab work.

  • Use a variable in main() to indicate whether command echoing is on or off; set it to off by default
  • Check when main() starts whether command line argument 1 is set to -echo in which case turn echoing on. A condition like the following is useful.

      if(argc > 1 && strcmp("-echo",argv[1])==0) {
    
  • Each command should check for echoing and print the command being run along with any arguments to it. This technique is demonstrated in related lab work.

It will take some work to get the exact placement of echoes correct but will ultimately lead to nice results that involve LITTLE typing like the example below.

> cat data/hash-demo.script | ./hash_main -echo
Hashmap Demo
Commands:
  hashcode <key>   : prints out the numeric hash code for the given key (does not change the hash map)
  put <key> <val>  : inserts the given key/val into the hash map, overwrites existing values if present
  get <key>        : prints the value associated with the given key or NOT FOUND
  print            : shows contents of the hashmap ordered by how they appear in the table
  structure        : prints detailed structure of the hash map
  clear            : reinitializes hash map to be empty with default size
  save <file>      : writes the contents of the hash map the given file
  load <file>      : clears the current hash map and loads the one in the given file
  next_prime <int> : if <int> is prime, prints it, otherwise finds the next prime and prints it
  expand           : expands memory size of hashmap to reduce its load factor
  quit             : exit the program
HM> put Lucas brash
HM> put Mike DM
HM> put Dustin corny
HM> put Will lost
HM> print
        Mike : DM
      Dustin : corny
        Will : lost
       Lucas : brash
HM> get Dustin
FOUND: corny
HM> get El
NOT FOUND
HM> get Lucas
FOUND: brash
HM> get Jim
NOT FOUND
HM> put El weird
HM> put Steve hairy
HM> put Nancy torn
HM> put Barb ran-away?
HM> print
        Mike : DM
       Nancy : torn
        Barb : ran-away?
      Dustin : corny
          El : weird
        Will : lost
       Lucas : brash
       Steve : hairy
HM> hashcode Lucas
1633908044
HM> hashcode Barb
1651663170
HM> structure
item_count: 8
table_size: 5
load_factor: 1.6000
  0 : {(1701538125) Mike : DM} {(521359221070) Nancy : torn} {(1651663170) Barb : ran-away?} 
  1 : {(121399204345156) Dustin : corny} 
  2 : {(27717) El : weird} 
  3 : {(1819044183) Will : lost} 
  4 : {(495555147084) Lucas : brash} {(435778057299) Steve : hairy} 
HM> expand
HM> structure
item_count: 8
table_size: 11
load_factor: 0.7273
  0 : {(1819044183) Will : lost} 
  1 : {(1701538125) Mike : DM} {(121399204345156) Dustin : corny} 
  2 : {(435778057299) Steve : hairy} 
  3 : {(1651663170) Barb : ran-away?} 
  4 : 
  5 : 
  6 : {(521359221070) Nancy : torn} 
  7 : 
  8 : {(27717) El : weird} {(495555147084) Lucas : brash} 
  9 : 
 10 : 
HM> save stranger.hm
HM> clear
HM> print
HM> structure
item_count: 0
table_size: 5
load_factor: 0.0000
  0 : 
  1 : 
  2 : 
  3 : 
  4 : 
HM> put Jim cop
HM> put Joyce worried
HM> print
       Joyce : worried
         Jim : cop
HM> structure
item_count: 2
table_size: 5
load_factor: 0.4000
  0 : 
  1 : {(435460599626) Joyce : worried} 
  2 : 
  3 : {(7170378) Jim : cop} 
  4 : 
HM> get Dustin
NOT FOUND
HM> get Joyce
FOUND: worried
HM> load stranger.hm
HM> print
        Will : lost
        Mike : DM
      Dustin : corny
       Steve : hairy
        Barb : ran-away?
       Nancy : torn
          El : weird
       Lucas : brash
HM> structure
item_count: 8
table_size: 11
load_factor: 0.7273
  0 : {(1819044183) Will : lost} 
  1 : {(1701538125) Mike : DM} {(121399204345156) Dustin : corny} 
  2 : {(435778057299) Steve : hairy} 
  3 : {(1651663170) Barb : ran-away?} 
  4 : 
  5 : 
  6 : {(521359221070) Nancy : torn} 
  7 : 
  8 : {(27717) El : weird} {(495555147084) Lucas : brash} 
  9 : 
 10 : 
HM> get Dustin
FOUND: corny
HM> get Joyce
NOT FOUND
HM> put Will found
Overwriting previous key/val
HM> put Nancy decided
Overwriting previous key/val
HM> put Mike girl-crazy
Overwriting previous key/val
HM> print
        Will : found
        Mike : girl-crazy
      Dustin : corny
       Steve : hairy
        Barb : ran-away?
       Nancy : decided
          El : weird
       Lucas : brash
HM> structure
item_count: 8
table_size: 11
load_factor: 0.7273
  0 : {(1819044183) Will : found} 
  1 : {(1701538125) Mike : girl-crazy} {(121399204345156) Dustin : corny} 
  2 : {(435778057299) Steve : hairy} 
  3 : {(1651663170) Barb : ran-away?} 
  4 : 
  5 : 
  6 : {(521359221070) Nancy : decided} 
  7 : 
  8 : {(27717) El : weird} {(495555147084) Lucas : brash} 
  9 : 
 10 : 
HM> next_prime 10
11
HM> next_prime 25
29
HM> expand
HM> structure
item_count: 8
table_size: 23
load_factor: 0.3478
  0 : 
  1 : 
  2 : {(27717) El : weird} 
  3 : 
  4 : {(1651663170) Barb : ran-away?} {(495555147084) Lucas : brash} 
  5 : 
  6 : 
  7 : 
  8 : {(121399204345156) Dustin : corny} 
  9 : 
 10 : {(521359221070) Nancy : decided} 
 11 : {(1701538125) Mike : girl-crazy} {(435778057299) Steve : hairy} 
 12 : {(1819044183) Will : found} 
 13 : 
 14 : 
 15 : 
 16 : 
 17 : 
 18 : 
 19 : 
 20 : 
 21 : 
 22 : 
HM> save data/other.hm
HM> quit

5.6 Grading Criteria for P3   grading

The following criteria will be checked in this problem.

Weight Criteria
  AUTOMATED TESTS make test-p3
15 test_hash_main.sh Testing script
  15 tests for hash_main executable, exercises functions in hash_funcs.c
  Compile and run using make test-p3 OR
  Run script using ./test_hash_main.sh
  1 point per test passed
  NOTE: individual tests can be run by doing
  ./test_hash_main.sh 4 # runs test 4
   
5 Full credit if Valgrind in test_hash_main.sh does not find memory problems
  Deductions based on severity of memory problems identified
  MANUAL INSPECTION
15 hash_funcs.c
  Clear commenting indicating intent during functions "calculate table index" or "iterate through list"
  Relatively short functions: nothing needs to be too long (50 lines tops)
  Clearly written iteration loops to traverse linked lists in hash map functions
  Use of strcmp() to check keys during put/get operations
  Concise and commented code for hashmap_put() which calculates table position and inserts into the list there
  hashmap_free_table() frees all list elements and the table array, assigns 0/NULL to fields
  hashmap_save() makes use of hashmap_write() with an open file handle to avoid redundant code
  hashmap_load() checks that files open successfully and uses hashmap_free_table() to de-allocate memory
  File closed after finished during saving and loading
  next_prime() is clearly written and easy to follow; the algorithm used is explained in comments
  hashmap_expand() uses other functions in such as next_prime(), hashmap_init() and hashmap_free_table() to avoid repeating code.
   
5 hash_main.c
  Comments indicating what high-level command is being implemented in each part of main
  Clear structure of main loop, reading commands, selecting actions
  Use of string functions to identify which command to execute
  Clear efforts to honor -echo option and echo commands
  Clear effort to free all memory prior to exit by clearing hashmap on exit

6 Assignment Submission

6.1 Submit to Gradescope

Some of the pictures below reference a tree problem which is now a hash problem. The process of uploading assignments is the same in both cases.

  1. In a terminal, change to your assignment code directory and type make zip which will create a zip file of your code. A session should look like this:

       > cd Desktop/2021/a1-code      # location of assignment code
    
       > ls 
       Makefile    graph_file.c  print_graph.c  hash_main.c 
       ...
    
       > make zip                     # create a zip file using Makefile target
       rm -f a1-code.zip
       cd .. && zip "solution-a1-2021/a1-code.zip" -r "solution-a1-2021"
         adding: solution-a1-2021/ (stored 0%)
         adding: solution-a1-2021/Makefile.student (deflated 68%)
         adding: solution-a1-2021/test_graph_file.sh (deflated 69%)
         adding: solution-a1-2021/hash_main.c (deflated 71%)
         ...
       Zip created in a1-code.zip
    
       > ls a1-code.zip
       a1-code.zip
    
  2. Log into Gradescope and locate and click 'Assignment 1' which will open up submission

    gradescope01.png

  3. Click on the 'Drag and Drop' text which will open a file selection dialog; locate and choose your a1-code.zip file

    gradescope02.png

  4. This will show the contents of the Zip file and should include your C source files along with testing files and directories.

    gradescope03.png

  5. Click 'Upload' which will show progress uploading files. It may take a few seconds before this dialog closes to indicate that the upload is successful. Note: there is a limit of 256 files per upload; normal submissions are not likely to have problems with this but you may want to make sure that nothing has gone wrong such as infinite loops creating many files or incredibly large files.

    gradescope04.png

  6. Once files have successfully uploaded, the Autograder will begin running the command line tests and recording results. These are the same tests that are run via make test.

    gradescope05.png

  7. When the tests have completed, results will be displayed summarizing scores along with output for each batch of tests.

    gradescope06.png

6.2 Late Policies

You may wish to review the policy on late project submission which will cost you late tokens to submit late or credit if you run out of tokens. No projects will be accepted more than 48 hours after the deadline.

https://www-users.cs.umn.edu/~kauffman/2021/syllabus.html#orga3dad00


Author: Chris Kauffman (kauffman@umn.edu)
Date: 2019-09-20 Fri 14:43