Skip to main content

Query Optimization

Execution Tree:

  • Leaf Nodes: base relations
  • Inner Nodes: operators
  • Tuples from the base relations flow into the parent operator nodes
  • Output of an opterator nodes flows into the parent node
  • Output of the root flow as result to the client

Execution Tree

Example
SELECT GM.gid, count(*), AVG(U.age)
FROM Users U, GroupMembers GM
WHERE U.uid = GM.uid AND U.experience = 10
GROUP BY GM.gid
HAVING AVG(U.age) <25

A corresponding example execution tree:

Pipelining

  • Definition 1: As soon as an output tuple of an operator is found, it is forwarded to the next operator

  • Definition 2: As soon as one ouput frame in the memory buffer is full, it is processed by the next operator

  • Enables parallel execution of operators

  • No write to disk costs for intermediate relations if it can be avoided

Example
SELECT *
FROM Users U, GroupMembers GM
WHERE U.uid = GM.uid
AND experience = 10

Assumptions:

  • 50 buffer frames available
  • 40,000 tuples on 500 UserPages
  • 1,000 GroupMember pages
  • 1% of users have experience = 10 i.e. 400 users
  • Each User tuple has on average 2.5 GM tuples
  • In the execution tree, selection is done first, and the resulting tuples are pipelined directory to the join

Which join is best?

Block Nested Loop

Cost:

UserPages+UserPages×0.0150×GroupMemberPages=500+1×1000=1,500\text{UserPages} + \frac{\text{UserPages} \times 0.01}{50} \times \text{GroupMemberPages} = 500 + 1 \times 1000 = 1,500 Index Nested Loop

Cost:

UserPages+Users×0.01×Cost finding GroupMembers=500+400×(1+2.5)=500+1400=1900\text{UserPages} + |\text{Users}| \times 0.01 \times \text{Cost finding GroupMembers} = 500 + 400 \times (1 + 2.5) = 500 + 1400 = 1900 Sort Merge Join

Cost:

UserPages+3×GroupPages=3500\text{UserPages} + 3 \times \text{GroupPages} = 3500

Optimization Techniques

Two types of optimization techniques:

  • Cost-Based Optimization:

    • Calculate and compare the I/O cost of possible implementations of operators.
  • Algebraic Optimization:

    • Prioritize operations that eliminate a lot of tuples
    • push down selections and projections
    • Order joins to have intermediate results with minimum number of tuples and tuple size

    Algebraic Optimization

Push down projections, selections and joins to their respective relations

Note: need to be careful that projecting does not remove attributes that might be needed later on in.

Example
SELECT u.name, u.experience
FROM Users U, GroupMembers g
WHERE u.uid = g.uid
AND g.gid = "G1"
AND u.age > 20

This query can be expressed in algebraic form as:

πuname, experience(σgid=G1age>20(UsersGroupMembers))\pi_{\text{uname, experience}}(\sigma_{\text{gid} = 'G1' \land \text{age} > 20} (\text{Users} \bowtie \text{GroupMembers}))

where the order of operators is first join, then selection, then projection.

Naive Execution Tree

Optimized Execution Tree

Pushing projections down the execution tree wont reduce the number of tuples but it will reduce the memory size of intermediate results as attributes are trimmed from each tuple.

General rules of thumb

  • If the application has certain kinds of queries that will be run frequently, create appropriate indexes to speed up these queries
  • If the application has many updates and inserts, be conservative with creating indexes for the relevent relations since each INSERT/UPDATE will also change the index tree.
  • SQL EXPLAIN explains how a query is executed internally.