Last Updated: 2023-04-21 Fri 09:19

CSCI 2021 HW13: pmap and Linking

CODE DISTRIBUTION: hw13-code.zip

CHANGELOG: Empty

1 Rationale

On modern computing systems, virtual memory creates the illusion that every program has a linear address space from 0 to some large address. Mostly this happens behind the scenes and is managed by the operating system, but knowing that virtual addresses are present provides insight into many aspects of practical programming. One can inspect some of the OS information on a program's virtual address space using utilities such as pmap.

The linker is a little-discussed portion of a typical compiler toolchain, but it is a frequent source of frustrating compilation errors when dealing with code libraries. This lab covers the basics of linking to system libraries and how this affects the virtual memory image of the resulting program.

Associated Reading / Preparation

Bryant and O'Hallaron: Ch 9 on Virtual Memory is informative for Problem 1. The mmap() function is discussed in section 9.8.4. Bryant and O'Hallaron Ch 7 on Linking and ELF formats is pertinent to Problem 2.

Grading Policy

Credit for this HW is earned by taking the associated HW Quiz, which is linked under Gradescope. The quiz will ask questions similar to those in the QUESTIONS.txt file, and students who complete all answers in QUESTIONS.txt should have no trouble with the quiz.

Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.

See the full policies in the course syllabus.

2 Codepack

The codepack for the HW contains the following files:

File            State     Description
--------------  --------  -------------------------------------
QUESTIONS.txt   EDIT      Questions to answer
Makefile        Provided  Makefile to build programs for the HW
memory_parts.c  Provided  Problem 1 program to analyze
gettysburg.txt  Provided  Problem 1 data file
do_math.c       Provided  Problem 2 program to compile and link
do_pthreads.c   Provided  Problem 2 program to compile and link

3 What to Understand

Ensure that you understand

  • That program regions like the stack and heap consist of virtual addresses that the OS maps to physical locations
  • Basic compiler options to link against standard libraries like the math library
  • How the nm command can show defined and undefined symbols in an executable
  • How to use ldd to show what libraries an executable is dynamically dependent upon

4 Questions

Analyze the files in the provided codepack and answer the questions given in QUESTIONS.txt.

                           _________________

                            HW 13 QUESTIONS
                           _________________


- Name: (FILL THIS in)
- NetID: (THE kauf0095 IN kauf0095@umn.edu)

Write your answers to the questions below directly in this text file.
Submit the whole text file while taking the associated quiz.


PROBLEM 1: Virtual Memory and pmap
==================================

(A) memory_parts memory areas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Examine the source code for the provided `memory_parts.c'
  program. Identify what region of program memory you expect the
  following variables to be allocated into:
  - global_arr[]
  - stack_arr[]
  - heap_arr


(B) Running memory_parts and pmap
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Compile the `memory_parts' program using the provided Makefile.
  ,----
  | > make memory_parts
  `----
  Run the program and note that it prints several pieces of information:
  - The addresses of several of the variables allocated
  - Its Process ID (PID) which is a unique number used to identify the
    running program. This is an integer.
  For example, the output might be
  ,----
  | > ./memory_parts
  | 0x5605a7c271e9 : main()
  | 0x5605a7c2a0c0 : global_arr
  | 0x7ffe5ff7d600 : stack_arr
  | 0x5605a92442a0 : heap_arr
  | 0x7f1fa7303000 : mmap'd file
  | 0x600000000000 : mmap'd block1
  | 0x600000001000 : mmap'd block2
  | my pid is 8406
  | press any key to continue
  `----
  so the program's PID is 8406.

  The program will also stop at this point until a key is pressed. DO
  NOT PRESS A KEY YET.

  Open another terminal and type the following command in that new
  terminal.
  ,----
  | > pmap THE-PID-NUMBER-THAT-WAS-PRINTED-EARLIER
  `----

  Paste the output of pmap below.


(C) Program Addresses vs Mapped Addresses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  pmap prints out the virtual address space table for the program. The
  leftmost column is a virtual address mapped by the OS for the program
  to some physical location. The next column is the size of the area of
  memory associated with that starting address. The 3rd column contains
  the permissions the program has for the memory area: r for read, w for
  write, x for execute. The final column contains any identifying
  information about the memory area that pmap can discern.

  Compare the addresses of variables and functions from the paused
  program to the pmap output. Try to determine the virtual address area
  in which each variable resides and what region of program memory that
  area must belong to (stack, heap, globals, text). In some cases, the
  identifying information provided by pmap may make this obvious.


(D) Min Size of Mapped Areas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  The minimum size of any virtual area of memory appears to be 4K. Why
  is this the case?


(E) Additional Observations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Notice that in addition to the "normal" variables that are mapped,
  there is also an entry for the mmap()'d file 'gettysburg.txt' in the
  virtual address table. The mmap() function is discussed in section
  9.8.4 of Bryant and O'Hallaron; note its calling sequence, which
  involves a couple of system calls:
  1. `open()', a low-level file opening call which returns a numeric
     file descriptor.
  2. `fstat()', which obtains information such as size for an open file
     based on its numeric file descriptor. The `stat() / fstat()' system
     calls are used to ask the Unix operating system for information
     about files such as their size, modification times, and access
     permissions. This system call is studied more in Operating System
     courses.

  Finally, there are additional calls to `mmap()' which allocate memory
  to the program at a specific virtual address. Code similar to this is
  often used in implementations of `malloc()' to allocate and expand the
  heap area of memory.


PROBLEM 2: Linking to System Libraries
======================================

(A)
~~~

  The file `do_math.c' contains some basic usage of the C library math
  functions like `pow()'.  Compile this program using the command line
  ,----
  | > gcc do_math.c
  `----

  and show the results below, which should be problematic. Describe why
  the linker complains about functions like `cos' and `pow'.

  *Note*: problems will arise on Linux systems with gcc; other
  OS/compiler combinations may not cause any problems.


(B)
~~~

  In order to fix this problem, one must link the program against the
  math library, typically called `libm'. This can be done with the option
  `-l' for "library" followed by `m' for the math library, as shown:
  ,----
  | > gcc do_math.c -lm
  `----


  Show a run of the resulting executable after a successful compile
  below.


(C)
~~~

  After successfully compiling `do_math.c', use the `ldd' command to
  examine which dynamically linked libraries it requires to
  run. Assuming the executable is named `a.out', invoke the command like
  this:
  ,----
  | > ldd a.out
  `----

  Show the output for this command and note anything related to the math
  library that is reported.


(D)
~~~

  Run the program, which should report its Process ID (pid) before
  pausing. In a separate terminal, while the program is still running,
  execute the pmap command to see the virtual address space for the
  program (command `pmap <pid>'). Paste the results below and describe
  any relation to the math library that is apparent.


(E)
~~~

  Repeat the general steps above with the C file `do_pthreads.c' which
  will require linking to the PThreads library with `-lpthread'.
  - Compile to show error messages
  - Compile successfully with proper linking and show output
  - Call `ldd' on the executable
  - While the program is paused, run `pmap' to see its virtual address
    space

  Show the output of these commands below.

Author: Chris Kauffman (kauffman@umn.edu)
Date: 2023-04-21 Fri 09:19