INTRODUCTION TO PARALLEL COMPUTING: DESIGN AND ANALYSIS OF ALGORITHMS Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis TABLE OF CONTENTS 1 Introduction 1.1 What is Parallel Computing? 1.2 Scope of Parallel Processing 1.3 Issues in Parallel Computing 2 Models of Parallel Computers 2.1 Taxonomy of Parallel Architectures 2.2 An Ideal Parallel Computer 2.3 Interconnection Networks for Shared Memory Computers 2.4 Interconnection Networks for Distributed Memory Computers 2.5 Embedding Other Networks into a Hypercube 2.6 Convergence of Shared Memory and Distributed Memory Architectures 2.7 Routing Mechanisms in Communication Networks 2.8 Cost-Performance Analysis of Static Communication Networks 3 Basic Communication Operations 3.1 Simple Message Transfer Between Two Nodes 3.2 One-To-All Broadcast 3.3 All-To-All Broadcast 3.4 One-To-All Personalized Communication 3.5 All-To-All Personalized Communication 3.6 Circular Shift 3.7 Faster Methods for Some Communication Operations 3.8 Summary 4 Performance and Scalability of Parallel Systems 4.1 Performance Metrics for Parallel Systems 4.2 Effect of Granularity Data-Mapping on Performance 4.3 Scalability of Parallel Systems 4.4 The Isoefficiency Metric of Scalability 4.5 Sources of Parallel Overhead 4.6 Minimum Execution Time and Minimum Cost-Optimal Execution Time 4.7 Other Scalability Metrics 5 Dense Matrix Algorithms 5.1 Mapping Matrices on Processors 5.2 Matrix Transpose 5.3 Matrix-Vector Multiplication 5.4 Matrix Multiplication 5.5 Solving a System of Linear Equations 6 Sorting 6.1 Issues in Sorting on Parallel Computers 6.2 Sorting Networks 6.3 Bubble Sort and its Variants 6.4 Quicksort 6.5 Other Sorting Algorithms 7 Graph Algorithms 7.1 Definitions and Representation 7.2 Minimum Spanning Tree 7.3 Single-Source Shortest Paths 7.4 All-Pairs Shortest Paths 7.5 Connected Components 7.6 Algorithms for Sparse Graphs 8 Search Algorithms for Discrete Optimization 8.1 Sequential Algorithms for Discrete Optimization Problems 8.2 Search Overhead 8.3 Depth-First Search Algorithms 8.4 Parallel Best-First Search 8.5 Speedup Anomalies in Parallel Search Algorithms 9 Dynamic Programming 9.1 Serial Monadic DP Formulations 9.2 Non-Serial Monadic DP Formulations 9.3 Serial Polyadic DP Formulations 9.4 Non-Serial Polyadic DP Formulations 9.5 Summary and Discussion 10 Fast Fourier Transform 10.1 Serial Algorithm 10.2 Binary Exchange Algorithm 10.3 The Transpose Algorithm 10.4 Cost-Effectiveness of Meshes and Hypercubes for FFT 10.5 Computation 10.6 Other FFTs 11 Solving Sparse Systems of Linear Equations 11.1 Basic Operations on Sparse Matrices 11.2 Iterative Methods for Sparse Linear Systems 11.3 Finite Element Method 11.4 Direct Methods for Sparse Linear Systems 11.5 Multigrid Methods 12 Systolic Algorithms and Their Mapping on Parallel Computers 12.1 Examples of Systolic Systems 12.2 Mapping Systolic Systems to Parallel Computers 12.3 Mapping One-Dimensional Systolic Arrays 12.4 Mapping Two-Dimensional Systolic Arrays 13 Parallel Programming Languages 13.1 Parallel Programming Paradigms 13.2 Message Passing Extensions to Programming Languages 13.3 Data Parallel Languages 13.4 Programming Shared Memory Computers 13.5 Fortran D Index Glossary Most chapters conclude with a bibliography.