S1: Ch. 15: Query Processing and Optimization
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Query Processing
  • Process is similar to compilation (Fig. 15.1, pp 494)
    • SQL Query -- > [scanner, parser] -- > query tree/graph
    • ...-- > [OPTIMIZER] -- > efficient execution plan
    • ...-- > [code generator] -- > code to execute the plan
    • ...-- > [run-time support] -- > query result
  • We will FOCUS on the optimizer module
    • Output:: Execution plan = set of selected access routines
    • ...from possible access routines for each relational operator
    • Input: Query tree or query graph
    • Algorithms: Systematic Enumeration/ Dynamic Programming
    • ...Algebraic optimization, Other heuristics
    • ...Semantic Optimization, Source to Source transforms
  • Access Routines for relational operations & cost models
    • Select (Point, Range): scan, binary search, index-search
    • Join: nested loop, sort-merge, hybrid hash


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: 15.8.1: Cost Models for Access Routines
  • Cost Metrics = #block accesses (OR response time in seconds)
    • focus on I/O cost, not CPU/communication
  • I/O Cost DEPENDS ON
    • Search criteria: point/range query on ordering /other fields
    • File Structures: heap, sorted, hashed.
    • Index-type: primary, clustering, secondary, B+tree, multi-level,...
    • Other factors, e.g. buffering, disk placement, materialization,
    • ...Overflow / free-space mgmt, spanned/unspanned blocking, ...
  • Parameters of Cost Models
    • s = selectivity = fraction of tuples selected = |result| / |table|
    • S(field) = selection cardinality = r(d) / (#distinct values of field)
    • B, bd, bi(j) = block size, #blocks in datafile/ at level j of index
    • r(d) = #records in data file
    • u, bfr(d), bfr(i) = utilization, blocking factors for datafile/index
    • fo, d = fanout and depth of index
    • R, P, Pr = record size, tree pointersize , data record pointer size
    • e = Pr[a record in overflow area] * E[overflow chain length]
  • Q? Compare these parameters to those in Oracle system catalog.


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: 15.3: Access Routines for Relational Operators
  • Access routine - is an execution strategy for a relational operator
    • May use available file-structures and indexes
    • Specifies algorithms , and applicability
  • Point Query: (i) Linear, (ii) Binary, (iii) Hash, (iv) Indexed search
  • Range Query: (1) Linear search , (2) Binary search then scan
    • ...3. Find 1st record in, then scan datafile (Primary/Clustered index)
    • ...4. Find 1st record in, then scan index leafs (Secondary index/B+tree)
  • Conjunctive selection - 1. Use composite index
    • ...2. Process one condition first followed by scan of result
    • ...3. Intersect (record pointers retrieved by different conditions)
  • Q? How do we choose an appropriate plan?
    • Rule based models
    • Cost based models
  • Ex. Provide algebraic formula for access routines
    • Point Search (UNIQUE field): bd/2 ; log(bd); 1+ e; d + 1;
    • ...not unique = > + S/bfr(d) to sorted/hashed file; bd for heap
    • Range Search: bd or bd/2 + s*bd; log(bd) + s*bd; d + 1 + s*bd;
    • ...secondary index : d + s*[bi(leaf) + r(d)] worst case


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: 15.3.2 + 15.8.4: JOIN strategies, Cost models
  • Focus on two way equi-join:: join(R, S, R.A = S.B)
  • Join-selectivity= js(R.A=S.B)=|join result|/(|R|*|S|); |X|=#records in X
    • = > Join result has b(RS)= (js*|R|*|S| / bfr(RS)) blocks
  • (1)Nested loop: For each block in R, (a) Read matching record in S
    • ...using scan or indexed search, and (b) Write partial result
    • I/O Cost = bR + (bR * bS) + b(RS)
    • ...Indexed search cost= f(available index(S.B)), see Page 499
    • ...= bR + [|R|*(d(S.B) + S(S.B))] + b(RS), secondary index(S.B)
  • (2) Sort-Merge: (i) If Needed, Sort R on R.A, Sort S on S.B
    • ...(ii) Scan R and S block by block [See Fig. 15.3(a), pp 503]
    • Very efficient if R and S already sorted, Cost = bR + bS + b(RS)
    • Sorting = > Add k*{bR * log(bR) + bS * log(bS)}
  • (3) Hash: (i) hash records of R, S into a common hash file (CHF)
    • ...(ii) scan CHF to compute and write results
    • Cost= 3*(bR+bS) + b(RS), w/ buffering (hybrid hash, pp507, para 3)


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Dominance Analysis of Access Routines
  • Ex. Compare scan and secondary-index for Range queries?
    • Heap file Scan: bd; < = Secondary index: d + s*[bi(leaf) + r(d)]
    • ...OR selectivity(s) > = (bd-d) / [bi(leaf)+rd] > bd/rd = (1/bfr(d))
    • example: bfr(d) = 10; selectivity = 0.15 ;
  • Ex.Nested loop: Which relation should be in outer loop ?
    • Smaller one = > reduces cost
  • Q? Which strategy wins for join(R,S), IF (bR > bS) and
    • (1) primary index on R.A in R, S.B in S (depth = x)
    • ...Let #record/distinct value of join field = a
    • ...(bR = bS) = > sort merge if bR = bS
    • ...(bR > a*x*bS) = > nested loop with index-scan
    • (2) Heap Files , bR = bS :: hash-join (log(bR) > 10)
    • (3) Hashed Files R on R.A, S on S.B,
    • ...(bR = bS) = > hash join if bR = bS , log(bR) > 10
    • ...(bR > 4*bfr(inner loop)*bS) = > nested loop with index-scan ?


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Dominance Analysis of Access Routines (Contd.)
  • Q? Which strategy wins for join(R,S), IF (bR > bS) and
    • (4) primary index on R.A, S hashed on primary key S.B,
    • ...bS = bR, bfr(R) = 3, index-depth = 1, (Hint: unique matches)
    • ...Nested loop (inner = indexed search on R) cost = 4bR + b(RS)
    • (5)bS < |memory| << bR ; cost = bR + bS + B(RS) for all 3
    • ...1. buffer + index/sort/hash S in memory;
    • ...2. For each block in R, bring the block and write result
  • Q? Which strategies can be used for non-equijoins?
    • ...for select-project-equijoin?
  • Summary comparison of join strategies
    • Nested loop is a dark horse, watch out!
    • ...e.g. (bR/bS) >> 1, or bfr >> 1
    • ...OR only one file has good search structure
    • Both files sorted, sort-merge can win
    • ...unless (bR/bS) >> 1, or bfr >> 1
    • Both files have No good search structure, Hybrid hash can win
    • ...unless (bR/bS) >> 1, or bfr >> 1


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: 15.7: Heuristic Query Transformations
  • Output: Sequence or Set of access routines
  • Input:(i) Query Tree: syntax tree for eqv. rel. algebra query
    • ...Leafs = input relations :: Non-leafs = rel. algebra operations
    • Initial canonical tree: Easy to generate from parser (Fig. 15.5(a))
    • ...cartesian product of all rel.+ 1 select + 1 project
    • ...Ex. Fig. 15.5 (b) pp 516 with Q (pp 515)
  • (ii)Query graph: constraints in tuple calculus exp. (Fig. 15.4/pp 513)
    • Nodes = relations (tuple variables) OR constants (double circle)
    • ...Edges = join conditions
    • We will not use it. Came from Ingres/QUEL optimizer.
  • ALGORITHM 1: Source to Source Transform (Ch 15.7.2, pp 520-1)
    • Semantic transformation rules, e.g. integrity constraints (IC)
    • ...Ex. Q(pp 533): Find Emps who earn more than their supervisors
    • ...IC: No employee earns more than his/her supervisor.
    • Implemented by Thm. prover = > exponential in #applicable rules
    • Alternative is hand recoding


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Query Optimization Algorithms (Contd.)
  • ALGORITHM 2: Algebraic optimization by transforming query trees
    • Start w/ Initial canonical tree
    • Use laws of relational algebra (Ex. rule 4, pp 519)
    • General steps (pp 520): apply those operations first
    • ...which reduce the size of intermediate results
  • Ex. Trace: Fig. 15.5 (pp 516-518)
    • Fig. 15.5(b) move selects down the tree (close to table)
    • Fig. 15.5(c) more restrictive select 1st (pname = 'Aquarium')
    • ...commute cartesian products, rearrange select conditions
    • Fig. 15.5(d): Combine (product, select) into joins
    • Fig. 15.5(e): Move projects down the tree, add project
    • ...to drop fields not needed for result (and join/selects)
  • ALGORITHM 3: Dynamic Programming w/ cost models
    • 1. Choose an ordering of join operation (e.g. left to right)
    • 2. Given ordering, enumerate combination of access routines
    • ...choose low-cost combination, 1 access routine per rel. op.
    • Used in commercial systems, Step 2 is cubic in #relations,
    • ...Step 1. is exponential and optimizers scrimp here!


Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)

The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.