S1: Csci 8701 - Week 8
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Problem Formulation, Contributions
  • SQL Query Optimization Problem
    • Given: A SQL query Q over N tables
    • ... Space of semantically equivalent Execution Plans
    • Find most efficient plan tree.
    • Objective: Minimize time to answer query
    • Constraints: simple queries (N is small)
  • Major Contributions
    • Define Space of execution plans
    • Cost estimation models for plans
    • Systematic Search strategy for cheapest plan
    • ... Dynamic Programming


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: Key Concepts : Scope of Plan Tree
  • Terms: SQL Query vs. Query Block
  • Recall SQL Query Block
    SELECT [DISTINCT] < list of columns >
    FROM < list of tables >
    [WHERE < list of "Boolean Factors" (predicates in CNF) > ]
    [GROUP BY < list of columns >
    [HAVING < list of Boolean Factors > ]]
    [ORDER BY < list of columns > ];
  • Meaning
    • Plan Tree would
    • ... Take product of tables in FROM clause
    • ... Project onto columns relevant to other clauses
    • ... WHERE clause = > apply all filters in it
    • Postprocessing would
    • ... GROUP BY clause = > form groups on the result
    • ... HAVING clause = > filter groups with it
    • ... ORDER BY clause = > ensure right order of output
    • ... DISTINCT modifier = > remove duplicates


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: Key Concepts: Plan Space
  • SQL query -- > a set of SQL blocks
  • Plan space for single block query
    • sequence of 1 table or 2 table operations
    • only left-deep plan trees
    • postpone Cartesian products
  • Q? Which of the following were not considered by System R?
  • Access Methods (strategies) for select, project, join
    • Scans: sequential & index (clustered/unclustered)
    • NL-join, sort-merge join, hash join
    • sorting & hash-based grouping
  • Q? Which join-methods was not considered by System R?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: Key Concepts: Cost Estimation
  • Parameters of cost models (pp. 143)
    • # of tuples & pages
    • # of values per indexed column
    • # of RSS calls
  • Cost Model for an operator (Tables 1-2, pp. 144-5)
    • Components: I/O and CPU costs
    • Based on size of base tables
    • Size of intermediate result is based on
    • ... estimate of predicate selectivities
    • Q? Are there any surprises in Table 1 or 2?
  • Q? Where are cost models for join strategies?
    • Do they show the dominance zones of 2 methods?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Key Concepts: Searching the Plan Space
  • Algorithm structure - Dynamic Programming
    • Only n! left-deep join-orders
    • many share common prefixes
    • Do not enumerate all join-orders
  • Execution Trace
    • Find all plans for accessing each base relation
    • For each table, save cheapest plan, and
    • Try joining pairs of 1-table plans saved so far.
    • Save cheapest 2-table plans
    • Try combining a 2-table plan with a 1-table plan.
    • Save cheapest 3-way plans.
    • ...
    • Combining k-way and 1-way plans
    • ... until there is a full plan trees


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Validation Methodology
  • Evaluation:
    • Computational complexity = n * exp(2, n-1)
    • Memory complexity = choose(n, n/2)
    • Not scalable beyond 10-15 joins
  • Paper presents a detailed example
    • Illustrates the key concepts
    • Shows the usefulness of proposed approach
  • Quotes previous studies on join strategies
    • ... motivates the approach
  • Conceptually better than competition at the time
    • ... - Ingres, Oracle 6
  • Other
    • Group designing a prototype relational DBMS
    • Methods has stood test of time
    • Minor revisions over 25 years


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Assumptions
  • Cost Components
    • Cost of query optimization is negligible
    • Note: search space size is exponential
    • Cost Models are available
  • Algebraic optimization has occurred already
    • Push selections, projects below joins
  • Cost Models - an inexact science!
    • Uniform distribution of key values in a table
    • No difference b/w Random and Sequential IO
    • Independence across predicates
  • Search space explorations
    • Only left-deep trees are interesting
  • Other assumptions - Sequential execution


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Rewrite today
  • Reduce Assumptions
    • User defined data types and cost models
    • Optimizing for parallel executions
  • Update literature survey
    • Semantic Query Optimization
    • SQL Query Rewrite (IBM)
    • Processing correlated nested queries (Decorrelation)
  • Cost Estimation - New approaches:
    • Sampling base relations
    • Histograms of value distributions
  • Q? What is cost of maintaining the estimates?
    • Refresh estimates periodically
    • Uniform distribution assumptions
  • New Search methods
    • DP-variations, Randomized, Job Scheduling


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)

The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.