Last Updated: 2023-04-25 Tue 16:58

CSCI 2021 HW14: Wrap-up and Review

CODE DISTRIBUTION: hw14-code.zip

  • Download the code distribution every HW
  • See further setup instructions below

CHANGELOG: Empty

1 Rationale

This HW reviews concepts from earlier assignments and lecture to prepare for the final exam. It is optional but provides some useful practice.

Associated Reading / Preparation

Review the following sections from Bryant and O'Hallaron:

  • Ch 5 on code optimizations
  • Ch 6 on the memory hierarchy especially writing cache friendly code
  • Ch 9 on Virtual Memory and Paging
  • Ch 7 on Linking and the ELF Format
  • Previous chapters mentioned on the schedule to prepare for the comprehensive portion of the final exam.

Grading Policy

  • There is no material to submit for this HW and it is not worth additional credit.
  • There is an Exit Survey on Canvas which is linked under the Assignments section. This is an anonymous survey. Completing it will earn 1 additional Engagemen Point
  • Students are also encouraged to complete the Student Rating of Teaching for the course here: https://srt.umn.edu/blue. A sufficient response rate will trigger a final exam question to be revealed during the last lecture.

2 Review Problem 1

Consider the following C program which runs a function and reports the addresses of several entities before pausing.

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int important_func(int arg1, long arg2){
  printf("%p : important_func()\n",&important_func);
  printf("%p : arg1\n",&arg1);
  printf("%p : arg2\n",&arg2);
  printf("program pid is %d\n",getpid());
  printf("press any key to continue\n");
  getchar();
}

int main(int argc, char *argv[]){
  int x = 1;
  long y = 2;
  important_func(x, y);
  return 0;
}

Running the program creates the following output.

> gcc demoprog.c 
> ./a.out
0x5575f7169169 : important_func()
0x7ffea79aa9bc : arg1
0x7ffea79aa9b0 : arg2
program pid is 11045
press any key to continue

While the program is paused, pmap is invoked in another terminal and produces the following. Certain lines numbered in comments on the right.

> pmap 11045
11045:   a.out                                 #Line
00005575f7168000      4K r---- a.out           # 1 
00005575f7169000      4K r-x-- a.out           # 2 
00005575f716a000      4K r---- a.out           # 3 
00005575f716b000      4K r---- a.out           # 4 
00005575f716c000      4K rw--- a.out           # 5 
00005575f8117000    132K rw---   [ anon ]      # 6 
00007fcdd5146000    136K r---- libc-2.29.so    # 7 
00007fcdd5168000   1328K r-x-- libc-2.29.so    # 8 
00007fcdd52b4000    304K r---- libc-2.29.so    # 9 
00007fcdd5300000      4K ----- libc-2.29.so    #10 
00007fcdd5301000     16K r---- libc-2.29.so    #11 
00007fcdd5305000      8K rw--- libc-2.29.so    #12 
00007fcdd5307000     24K rw---   [ anon ]      #13 
00007fcdd5340000      8K r---- ld-2.29.so      #14 
00007fcdd5342000    124K r-x-- ld-2.29.so      #15 
00007fcdd5361000     32K r---- ld-2.29.so      #16 
00007fcdd536a000      4K r---- ld-2.29.so      #17 
00007fcdd536b000      4K rw--- ld-2.29.so      #18 
00007fcdd536c000      4K rw---   [ anon ]      #19 
00007ffea798b000    132K rw---   [ stack ]     #20 
00007ffea79cf000     12K r----   [ anon ]      #21 
00007ffea79d2000      4K r-x--   [ anon ]      #22 
 total             2296K                        

Answer the following questions.

  1. Which specific lines of pmap output correspond to which entities in the program? Aside from the address correspondence, what information that pmap prints supports this correspondence?
  2. Despite the entities important_func() and arg1 appearing on the same line of C code, they are assigned very different memory addresses. Why is this the case?
  3. We learned in our discussion of assembly programming that the first 6 arguments of functions are passed in registers and registers do not have main memory addresses. However, the program prints out an address for arg1 and arg2. Explain what is going on and how this discrepancy is resolved.

SOLUTION

3 Review Problem 2

The following is a compilation session that goes wrong. Study it and the errors that ensue.

> cat do_math.c                      # show code for program
// Demo program using math operations.
#include <stdio.h>
#include <math.h>
#include <unistd.h>

int main(int argc, char *argv[]){
  double e = M_E;
  double cos_e = cos(e);
  double sin_e = sin(e);
  double e_squared = pow(e,2.0);

  printf("E is %.3f\n",e);
  printf("cos(E) is %.3f\n",cos_e);
  printf("sin(E) is %.3f\n",sin_e);
  printf("E^2 is %.3f\n",e_squared);

  printf("program pid is %d\n",getpid());
  printf("press any key to continue\n");
  getchar();

  return 0;
}

> gcc -c do_math.c                   # compile the code with -c option

> file do_math.o
do_math.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

> gcc -o math_prog do_math.c         # compile the program to an executable
/usr/bin/ld: /tmp/ccNWf9Yu.o: in function `main':
do_math.c:(.text+0x26): undefined reference to `cos'
/usr/bin/ld: do_math.c:(.text+0x3d): undefined reference to `sin'
/usr/bin/ld: do_math.c:(.text+0x60): undefined reference to `pow'
collect2: error: ld returned 1 exit status
  1. Why is it that the first gcc -c compilation succeeds while the second gcc -o fails? What is the difference between these two invocations?
  2. What role does the math.h header file play in this compilation process? What would happen if it were removed?
  3. What kind of error is reported in the second set of failures? Does it require code changes to the do_math.c file?
  4. How does one "fix" this problem to produce the ultimate executable.

SOLUTION

4 Review Problem 3

Continuing the previous problem with the do_math.c program, while running it the program, one can invoke pmap to gather information on the working memory of the program. The shows two simultaneous runs of math_prog along with its corresponding pmap information.

#### SHELL 1 ######                              #### SHELL 2 ######      
> ./math_prog                                    > ./math_prog            
E is 2.718                                       E is 2.718               
cos(E) is -0.912                                 cos(E) is -0.912         
sin(E) is 0.411                                  sin(E) is 0.411          
E^2 is 7.389                                     E^2 is 7.389             
program pid is 3334355                           program pid is 3335206   
press any key to continue                        press any key to continue

#### SHELL 3 ######                              #### SHELL 4 ######
> pmap 3334355                                   > pmap 3335206                              
3334355:   ./math_prog                           3335206:   ./math_prog                      
000055eb074de000      4K r---- math_prog         000055d42f543000      4K r---- math_prog    
000055eb074df000      4K r-x-- math_prog         000055d42f544000      4K r-x-- math_prog    
000055eb074e0000      4K r---- math_prog         000055d42f545000      4K r---- math_prog    
000055eb074e1000      4K r---- math_prog         000055d42f546000      4K r---- math_prog    
000055eb074e2000      4K rw--- math_prog         000055d42f547000      4K rw--- math_prog    
000055eb09132000    132K rw---   [ anon ]        000055d42f7fe000    132K rw---   [ anon ]   
00007f8a8f301000     12K rw---   [ anon ]        00007f465f83c000     12K rw---   [ anon ]   
00007f8a8f304000    148K r---- libc-2.30.so      00007f465f83f000    148K r---- libc-2.30.so 
00007f8a8f329000   1332K r-x-- libc-2.30.so      00007f465f864000   1332K r-x-- libc-2.30.so 
00007f8a8f476000    296K r---- libc-2.30.so      00007f465f9b1000    296K r---- libc-2.30.so 
00007f8a8f4c0000      4K ----- libc-2.30.so      00007f465f9fb000      4K ----- libc-2.30.so 
00007f8a8f4c1000     12K r---- libc-2.30.so      00007f465f9fc000     12K r---- libc-2.30.so 
00007f8a8f4c4000     12K rw--- libc-2.30.so      00007f465f9ff000     12K rw--- libc-2.30.so 
00007f8a8f4c7000     16K rw---   [ anon ]        00007f465fa02000     16K rw---   [ anon ]   
00007f8a8f4cb000     60K r---- libm-2.30.so      00007f465fa06000     60K r---- libm-2.30.so 
00007f8a8f4da000    624K r-x-- libm-2.30.so      00007f465fa15000    624K r-x-- libm-2.30.so 
00007f8a8f576000    612K r---- libm-2.30.so      00007f465fab1000    612K r---- libm-2.30.so 
00007f8a8f60f000      4K r---- libm-2.30.so      00007f465fb4a000      4K r---- libm-2.30.so 
00007f8a8f610000      4K rw--- libm-2.30.so      00007f465fb4b000      4K rw--- libm-2.30.so 
00007f8a8f611000      8K rw---   [ anon ]        00007f465fb4c000      8K rw---   [ anon ]   
00007f8a8f645000      8K r---- ld-2.30.so        00007f465fb80000      8K r---- ld-2.30.so   
00007f8a8f647000    124K r-x-- ld-2.30.so        00007f465fb82000    124K r-x-- ld-2.30.so   
00007f8a8f666000     32K r---- ld-2.30.so        00007f465fba1000     32K r---- ld-2.30.so   
00007f8a8f66f000      4K r---- ld-2.30.so        00007f465fbaa000      4K r---- ld-2.30.so   
00007f8a8f670000      4K rw--- ld-2.30.so        00007f465fbab000      4K rw--- ld-2.30.so   
00007f8a8f671000      4K rw---   [ anon ]        00007f465fbac000      4K rw---   [ anon ]   
00007ffcf2746000    132K rw---   [ stack ]       00007fff6782d000    132K rw---   [ stack ]  
00007ffcf27fc000     12K r----   [ anon ]        00007fff67876000     12K r----   [ anon ]   
00007ffcf27ff000      4K r-x--   [ anon ]        00007fff67879000      4K r-x--   [ anon ]   
ffffffffff600000      4K --x--   [ anon ]        ffffffffff600000      4K --x--   [ anon ]   
 total             3624K                          total             3624K                    
  1. The pmap listings show entries for libc-2.30.so and libm-2.30.so. What are these entities and are they present in the do_math program?
  2. Identify several parts of the working memory of the two math_prog processes that are distinct from each other. Also identify several parts that are shared and describe how this sharing is possible.
  3. Both processes report using 3624K of memory. Do you expect that once one process finishes, there will be 3264K of memory made available? Describe why or why not.

SOLUTION

5 Review Problem 4

Below are tables showing values in Main Memory and the state of a small Cache. Beneath this is a short code fragment that sums elements from main memory. The code has a user parameter stride. For strides of 1,2,3,4, work through the code changing cache and counting how many cache hits/misses occur with the given stride. Show the final state of cache in each case.

Main Memory

| Addr | Addr Bits      | Value |
|------+----------------+-------|
|   20 | 00  10  0000   |     9 |
|   24 | 00  10  0100   |    10 |
|   28 | 00  10  1000   |    11 |
|   2C | 00  10  1100   |    12 |
|   30 | 00  11  0000   |    13 |
|   34 | 00  11  0100   |    14 |
|   38 | 00  11  1000   |    15 |
|   3C | 00  11  1100   |    16 |
|   40 | 01  00  0000   |    17 |
|   44 | 01  00  0100   |    18 |
|   48 | 01  00  1000   |    19 |
|   4C | 01  00  1100   |    20 |
|   50 | 01  01  0000   |    21 |
|   54 | 01  01  0100   |    22 |
|   58 | 01  01  1000   |    23 |
|   5C | 01  01  1100   |    24 |
|   60 | 01  10  0000   |    25 |
|   64 | 01  10  0100   |    26 |
|   68 | 01  10  1000   |    27 |
|   6C | 01  10  1100   |    28 |
|------+----------------+-------|
|      | Tag Set Offset |       |

Cache

  • Direct-mapped: 1 line per set
  • 16-byte lines = 4-bit offset
  • 4 Sets = 2-bit index
  • 8-bit Address = 2-bit tag
  • Total Cache Size = 64 bytes 4 sets * 16 bytes
  • Mostly cold with some data from an earlier part of the code
|     |   |     | Blocks/Line           |
| Set | V | Tag | 0-3  4-7  8-11  12-15 |
|-----+---+-----+-----------------------|
|  00 | 1 | 00  | 1    2    3     4     |
|  01 | 0 | -   | -                     |
|  10 | 0 | -   | -                     |
|  11 | 0 | -   | -                     |
|-----+---+-----+-----------------------|
| Set | V | Tag | 0-3  4-7  8-11  12-15 |

Code

int *data = (int *) 0x20;           // address of start of the array
int stride; scanf("%d", &stride);   // user enters 1, 2, 3, or other values
int sum = 0;
for(int i=0; i<20; i += stride){
  sum += data[i];
}

Stats for Strides:

Show sum, number of hits/misses, and final cache state for strides of 1,2,3,4.

  • Stride 1: sum, misses/hits, final cache state
  • Stride 2: sum, misses/hits, final cache state
  • Stride 3: sum, misses/hits, final cache state
  • Stride 4: sum, misses/hits, final cache state

SOLUTION

6 Review Problem 5

The code in reversal_benchmark.c implements two functions which reverse the elements of an array. They take markedly different approaches.

void reverse_arr1(int *arr, long size){
  int *tmp = malloc(sizeof(int)*size);
  for(long i=0; i<size; i++){
    tmp[i] = arr[size-i-1];
  }
  for(long i=0; i<size; i++){
    arr[i] = tmp[i];
  }
  free(tmp);
}

void reverse_arr2(int *arr, long size){
  for(long i=0; i<size/2; i++){
    int tmp = arr[i];
    arr[i] = arr[size-i-1];
    arr[size-i-1] = tmp;
  }
}

(A) Predictions

Predict which of these two functions will run faster based on their code characteristics. Justify your answer by contrasting the features of the two reversal functions.

SOLUTION

(B) Timing

Modify the provided reversal_benchmark.c file to perform timing calculations on the two functions as they are called on different sizes of arrays. Print runtimes in a tabular format.

Make sure to compile/run with a convention like the following:

> gcc -Og reversal_benchmark.c    # compile with minimal optimizations

> ./a.out                         # show usage
usage: ./a.out <min_pow2> <max_pow2> <repeats>

> ./a.out 10 20 100               # run benchmark, size 2^10 to 2^20, 100 repeats per
      SIZE       rev1       rev2
      1024       ???        ???
      2048       ???        ???
      4096       ???        ???
      8192       ???        ???
     16384       ???        ???
     32768       ???        ???
     65536       ???        ???
    131072       ???        ???
    262144       ???        ???
    524288       ???        ???
   1048576       ???        ???

Paste the C code you wrote below to show you how you timed the function runs.

SOLUTION

(C) Analysis

Paste the table of numbers your code produced for timing the two reversal functions. Describe if the one you predicted would be faster actually was measured to be faster.

SOLUTION

(D) Optimized Reversal

After determining the faster reversal function, modify it further in an attempt to further improve its performance.

SOLUTION


Author: Chris Kauffman (kauffman@umn.edu)
Date: 2023-04-25 Tue 16:58