CSCI 4061 INTRODUCTION TO OPERATING SYSTEMS (last revised 8/11/03)

OVERALL DESCRIPTION:
4061 is a required course for computer science and computer engineering majors. 4061 introduces operating systems principles through practical systems programming. Students will learn about fundamental operating systems principles: definition and comparison of operating systems, hardware services for system protection and memory management, the kernel and shell, file I/O and buffering, the file system, process and thread control, signals, interprocess communication, and synchronization. They will be able to write Unix system programs that utilize system services, and they will understand the benefits and complications of buffering, asynchronous events, and synchronized communicating processes or threads.

CATALOG DESCRIPTION:
Foundations of operating systems. History and evolution of operating systems, shells, tools, memory organization, file system overview, I/O, concurrent processes, and interprocess communication.

CONTENT:

Week 1 DEFINITION OF AN OPERATING SYSTEM
Comparison of several operating systems students know with respect to multi-user, multi-tasking, multi-processor, stability, and protection capabilities. Kernel, shell, and application layers. The users', systems programmers', and OS designers' points of view. Operating system services: file I/O, file systems, process control, interprocess communication. Process vs. program. Hardware services to support the OS: kernel and user modes, I/O and clock interrupts, traps, virtual memory. Cooperative vs. pre-emptive multi-tasking.

Week 2 SHELL PROGRAMMING
Difference between the OS kernel, shell, and utilities. Reading documentation: man pages, man sections, searching man pages. Shell command line syntax: standard input/output/error, pipes, background processes, job control, wildcards. Shell scripts: running a shell script, variables, arguments, quoting, command substitution, flow control, signal handling.
Filters and utilities: grep, sed, sort, cut, and so on. Shell programming assignment.

Week 3 SYSTEM INPUT AND OUTPUT
File structure as a stream of bytes. Pathnames, file types, file ownership, and file permissions. Opening a file and internal OS representation as a file descriptor. Read and write modes. Reading and writing. Difference between raw binary data and formatted ASCII data. File position and seeking; seeking beyond the end of a file and files with "holes". Example of file copy with and without buffering; performance benefits of buffering; performance penalty of many system calls. Kernel disk cache and synchronized writing.

Week 4 ATOMIC OPERATIONS AND FILE SHARING
Problem of simultaneous processes doing read, modify, write. Atomic operations. Implementing atomic operations with the test-and-set machine instruction. Unix lock files with O_CREAT|O_EXCL. Multiple processes sharing a file; Unix file descriptor table, file table, and in-memory i-node table. Dup system call. Transparency of the Unix disk cache. System I/O programming assignment.

Week 5 FILE SYSTEM AND DIRECTORIES
File protection mechanism: process user id, file owner id, user, group, and other read, write, and execute permissions. Meaning of read, write, and execute permissions for directories. Set-uid programs to access protected files in a constrained way. Unix file system organization: partitions, blocks, super block, i-nodes, i-list, directories, hard links, link counts. When a file is really deleted; unlinking open files. System calls to get i-node information, link and unlink files, read directories, and make directories.

Week 6 STANDARD INPUT AND OUTPUT
Standard I/O buffering for efficiency. Formatted input and output. Standard I/O library routines for opening, closing, reading, writing, seeking, and testing for error or EOF conditions. File pointers and the underlying structure. Efficiency comparison with system I/O. Fully buffered and line-buffered I/O.
Problems of I/O buffers in user space--inconsistent views of the file. Flushing buffers explicitly and on process termination.

Week 7 ENVIRONMENT OF A SINGLE PROCESS
OS services for process control. How a program is loaded and started as a process. Passing arguments to the process--argc and argv. Passing environment variables. Terminating a process: normal and abnormal termination, return from main, exit, _exit, and termination by a signal. Exit value of a process. Memory layout of a process: text, data, bss, heap, and stack segments. Stack frames for each procedure call: arguments, return value, local variables. Dynamic memory allocation on the heap with malloc.

Week 8 MULTI-TASKING PROCESSES
Multi-tasking. Context of a process and a context switch. Process table information about each process. Process states: running in user mode, running in kernel mode, ready, sleeping, and terminated. Process scheduling--non-deterministic as far as systems programmers are concerned. Sleeping/waiting on an event. Heavyweight processes vs. lightweight threads; applications and advantages of threads; process-level threads vs. kernel-level threads. (Start fork, exec, and wait from week 9.)

Week 9 FORK, EXEC, AND WAIT
Creating a process with fork; fork is called once and returns twice. What a child copies and inherits from its parent and how it differs from the parent. Tree of parent-child relationships. Windows NT process model--no parent-child relationship. Exec system call to replace the program of a process; called once and never returns on success. What changes and what remains across an exec. Unix exec system calls; PATH environment variable. Advantages and disadvantages of the fork/exec model: inefficiency, copy on write, vfork. Open files across fork and exec. Wait to get the exit status from a child and synchronize with the child; join. Wait and waitpid system calls. Zombies and orphans. Programming assignment to write a simple shell.

Week 10 SIGNALS
Software interrupts.
Hardware and software generation of signals. Asynchronous nature of signals. Signal and kill system calls. Unix signals. Handling, ignoring, or taking default action on a signal. Signals across fork and exec. Pause, alarm, and sleep system calls. Simple example of sleep implementation. Problems with the simple example of sleep--race conditions. What can be safely done in a signal handler: volatile, atomic types, re-entrant system calls. Using setjmp and longjmp to implement sleep to avoid race conditions--still has problems. Reliable signals. Generated, pending, blocked, and delivered signals. System calls to atomically unblock signals and wait for a signal. Final correct implementation of sleep.

Week 11 NON-BLOCKING I/O, FILE LOCKING, PIPES, FIFOs
(Finish signals from week 10.) Blocking system calls, "slow" system calls, interrupted system calls, setting and using non-blocking I/O. Pipes: stream of data between two processes, kernel buffer, blocking when writing to a full buffer or reading from an empty buffer. Pipe system call; forking to create communicating processes. Reading from a buffer whose write end is closed, writing to a buffer whose read end is closed. Pipes require a common ancestor that created the pipe. FIFOs: named pipes, file system pathname used to name the pipe, unrelated processes can communicate over a FIFO. Programming assignment on interprocess communication.

Week 12 SOCKETS
Interprocess communication mechanism, both local (Unix domain sockets) and networked (Internet domain sockets). Client-server model. Concurrent servers fork a child for each connection. Addressing problem; the 5-tuple of protocol, source machine, source port, destination machine, and destination port identifies a connection and distinguishes multiple connections to the same machine. Stream vs. datagram, connection-oriented vs. connectionless, reliable vs. unreliable communication. Berkeley socket system calls: socket, bind, listen, accept.
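As one illustration of the socket, bind, listen, accept sequence listed for Week 12, here is a minimal sketch in C. It is not taken from the course materials. It assumes a Unix domain stream socket, and the names echo_once and make_addr, as well as the socket pathname supplied by the caller, are hypothetical. A forked child plays the client so that no second program is needed; error handling is deliberately thin.

```c
/* Sketch only: one server, one forked client, one echoed message.
   echo_once() and make_addr() are hypothetical names for illustration. */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill in a Unix domain socket address for the given pathname. */
static void make_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
}

/* Read exactly len bytes; returns 0 on success, -1 on EOF or error. */
static int read_all(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0) return -1;
        got += (size_t)n;
    }
    return 0;
}

/* Server: socket, bind, listen, accept, then echo one message back.
   Returns 0 if the client got back the bytes it sent, -1 otherwise. */
int echo_once(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    struct sockaddr_un addr;
    char buf[128];
    size_t len = strlen(msg);
    if (len == 0 || len > sizeof buf) return -1;

    make_addr(&addr, path);
    unlink(path);                        /* remove a stale socket file */

    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0) return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0) {
        close(lfd);
        return -1;
    }

    pid_t pid = fork();                  /* listening before fork avoids a race */
    if (pid < 0) { close(lfd); unlink(path); return -1; }
    if (pid == 0) {                      /* child: the client */
        char reply[128];
        close(lfd);
        int cfd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (cfd < 0) _exit(1);
        if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0) _exit(1);
        if (write(cfd, msg, len) != (ssize_t)len) _exit(1);
        if (read_all(cfd, reply, len) < 0) _exit(1);
        _exit(memcmp(reply, msg, len) == 0 ? 0 : 1);
    }

    int ok = -1;                         /* parent: the server */
    int fd = accept(lfd, NULL, NULL);    /* blocks until the client connects */
    if (fd >= 0) {
        if (read_all(fd, buf, len) == 0 &&
            write(fd, buf, len) == (ssize_t)len)
            ok = 0;
        close(fd);
    }
    close(lfd);
    unlink(path);

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return ok;
}
```

A caller might invoke echo_once("/tmp/demo.sock", "hello") and check for a return value of 0. A concurrent server of the kind described above would instead loop on accept and fork a child for each connection.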
Week 13 THREADS
Attributes shared by threads (global memory, code, open files, ...) and attributes unique to each thread (flow of control, stack, registers). Efficiency of context switching between threads compared to processes. Interprocess communication through shared memory. Synchronization issues with shared memory. POSIX thread system calls and semaphores. Example chat programs with non-blocking I/O compared to threads.

Week 14 INTERPROCESS COMMUNICATION
Various IPC mechanisms: shared file, signals, pipes, FIFOs, messages, shared memory, semaphores, sockets. Messages: discrete nature of messages (not a stream), addressing problem, queue size, permissions, multiple readers or writers, reliable or unreliable. Shared memory: fast but with synchronization problems. Critical section problem: mutual exclusion, progress, bounded waiting. Example solutions of the critical section problem that satisfy some but not all three properties. Atomic operations vs. critical sections. Semaphores: solution to the critical section problem, kernel-level implementation that satisfies mutual exclusion, progress, and bounded waiting. Deadlock and starvation.

Note: the last programming assignment could use pipes or sockets, threads or heavyweight processes. Depending on the assignment, the order of the last classes can be changed to cover material needed for the programming assignment early.

WHY THIS CLASS IS IMPORTANT, AND ITS ROLE IN THE CURRICULUM:
Many upper-level courses expect fundamental knowledge of operating systems concepts, and many programming assignments and projects require familiarity with systems programming. This course gives students this knowledge and skill. More generally, it exposes them to complex computing systems, which increases their maturity. Synchronization issues are complex and difficult for students to understand, but important in systems, networking, and languages.
Students must learn to think non-deterministically to understand race conditions and synchronization operations. In many ways this course serves as an advanced programming course; the programming projects require solid programming and debugging skills. The systems programming skills taught in this course are also of value in the work world. Many students have commented that this was the most helpful course in their computing jobs.

PREREQUISITES, AND RATIONALE:
CS 2021, Machine Architecture and Organization, is the prerequisite for CS 4061. Students need to understand basic hardware and assembly language programming concepts for this course. Students need to understand data representation and number systems to understand the difference between raw binary I/O and formatted I/O. They need to understand bit flags and bitwise operations to understand the arguments to many system calls. They need to understand subroutine calls and the hardware stack to understand the process memory organization. They need basic hardware concepts to understand I/O, interrupts, and virtual memory. And they need to understand assembly language programming to understand the test-and-set operation for synchronization.

CLASSES HAVING 4061 AS A PREREQUISITE, AND RATIONALE:
CS 4061 is a prerequisite for CS 5103 Operating Systems. This course is essential preparation for the 5103 Operating Systems course; students must understand operating systems services and the systems programmers' view of an operating system from the outside before they can understand operating systems design from the inside. Also, programming assignments in the Operating Systems course require the systems programming skills developed in CS 4061.
CS 4061 is a prerequisite for CS 4970 Advanced Project Laboratory. The Advanced Project Laboratory expects students to develop a complex software application; many such projects could involve systems programming.
CS 4061 is a prerequisite for CS 5211 Data Communications and Computer Networks. The networks course requires an understanding of interprocess communication generally; the concept of complex layered software systems developed in 4061 is also useful in understanding network architectures. The coverage of sockets in 4061 gives students practical experience using networking services and introduces them to issues like connection-oriented stream services and addressing. This makes the more abstract treatment of these issues in 5211 more understandable.

CLASS FORMAT:
4 credits, lectures and four independent programming assignments.

PROBABLE TEXT, IF ANY:
Robbins, "Unix Systems Programming," 2nd edition; Sarwar and Al-Saqabi, "Linux and Unix Programming Tools."

OUTCOMES:
Upon successful completion of the course, students should understand basic operating systems concepts and be able to write systems programs that use operating systems services. The concepts and system calls are described above.