Administração e Optimização de Bases de Dados Administração e Optimização de Bases de Dados Expression Equivalence Rules Departamento de Engenharia Informática Instituto Superior Técnico 2nd Semester 2009/2010 Slides baseados nos slides oficiais do livro “Database System Concepts” c Silberschatz, Korth and Sudarshan. Equivalence Rules I Administração e Optimização de Bases de Dados 1) Conjunctive selection operations can be deconstructed into a sequence of individual selections σθ1 ∧θ2 (E ) = σθ1 (σθ2 (E )) 2) Selection operations are commutative σθ1 (σθ2 (E )) = σθ2 (σθ1 (E )) 3) Only the last in a sequence of projection operations is needed, the others can be omitted πL1 (πL2 (. . . (πLn (E )) . . . )) = πL1 (E ) Equivalence Rules II Administração e Optimização de Bases de Dados 4) Selections can be combined with Cartesian products and theta joins σθ (E1 × E2 ) = E1 ⊲⊳θ E2 σθ1 (E1 ⊲⊳θ2 E2 ) = E1 ⊲⊳θ1 ∧θ2 E2 5) Theta-join operations and natural joins are commutative E1 ⊲⊳θ E2 = E2 ⊲⊳θ E1 Equivalence Rules III Administração e Optimização de Bases de Dados 6) Join operations are associative (a) Natural join operations are associative (E1 ⊲⊳ E2 ) ⊲⊳ E3 = E1 ⊲⊳ (E2 ⊲⊳ E3 ) (b) Theta joins are associative in the following manner (E1 ⊲⊳θ1 E2 ) ⊲⊳θ2 ∧θ3 E3 = E1 ⊲⊳θ1 ∧θ3 (E2 ⊲⊳θ2 E3 ) where θ2 involves attributes from only E2 and E3 Equivalence Rules IV Administração e Optimização de Bases de Dados Example (customer ⊲⊳c.c name=d.c name depositor ) ⊲⊳d.a number =a.a number ∧balance>1000 account = customer ⊲⊳c.c name=d.c name∧balance>1000 (depositor ⊲⊳d.a number =a.a number account) Equivalence Rules V Administração e Optimização de Bases de Dados 7) The selection operation distributes over the theta join operation under the following two conditions: (a) When all the attributes in θ0 involve only the attributes of one of the expressions (E1 ) being joined σθ0 (E1 ⊲⊳θ E2 ) = (σθ0 (E1 )) ⊲⊳θ E2 (b) When θ1 involves only the attributes of E1 and θ2 involves only the attributes of E2 σθ1 ∧θ2 (E1 ⊲⊳θ E2 ) = (σθ1 (E1 )) ⊲⊳θ (σθ2 (E2 )) Equivalence Rules VI Administração e Optimização de Bases de Dados Example σc name=Smith∧balance>1000 (depositor ⊲⊳ account) = σc name=Smith (depositor ) ⊲⊳ σbalance>1000 (account) Equivalence Rules VII Administração e Optimização de Bases de Dados 8) The projections operation distributes over the theta join operation as follows: (a) Let L1 and L2 be attributes from E1 and E2 . If the join involves only attributes in L1 ∪ L2 πL1 ∪L2 (E1 ⊲⊳θ E2 ) = πL1 (E1 ) ⊲⊳θ πL2 (E2 ) (b) Let L3 be attributes of E1 that are involved in the join condition, but are not in L1 ∪ L2 , and let L4 be attributes of E2 that are involved in the join condition, but are not in L1 ∪ L2 πL1 ∪L2 (E1 ⊲⊳θ E2 ) = πL1 ∪L2 ((πL1 ∪L3 (E1 )) ⊲⊳θ (πL2 ∪L4 (E2 ))) Equivalence Rules VIII Administração e Optimização de Bases de Dados Example πb name,c name (account ⊲⊳a.a number =d.a number depositor ) = πb name,c name ((πb name,a number (E1 )) ⊲⊳a.a number =d.a (πc number name,a number (E2 ))) Equivalence Rules IX Administração e Optimização de Bases de Dados 9) The set operations union and intersection are commutative E1 ∪ E2 = E2 ∪ E1 E1 ∩ E2 = E2 ∩ E1 10) Set union and intersection are associative (E1 ∪ E2 ) ∪ E3 = E1 ∪ (E2 ∪ E3 ) (E1 ∩ E2 ) ∩ E3 = E1 ∩ (E2 ∩ E3 ) Equivalence Rules X Administração e Optimização de Bases de Dados 11) The selection operation distributes over ∪, ∩ and − σθ (E1 − E2 ) = σθ (E1 ) − σθ (E2 ) and similarly for ∪ and ∩. Also σθ (E1 − E2 ) = σθ (E1 ) − E2 and similarly for ∪ and ∩ 12) The projection operation distributes over union πL (E1 ∪ E2 ) = πL (E1 ) ∪ πL (E2 ) Other rules also exist for outer joins, aggregations, ... Transformation Example Administração e Optimização de Bases de Dados Query: Find the names of all customers who have an account at some branch located in Brooklyn πcustomer name (σbranch city =Brooklyn (branch ⊲⊳ (account ⊲⊳ depositor ))) Transformation using rule 7(a) πcustomer name ((σbranch city =Brooklyn (branch)) ⊲⊳ (account ⊲⊳ depositor )) Performing the selection as early as possible reduces the size of the relation to be joined Multiple Transformations Administração e Optimização de Bases de Dados Query: Find the names of all customers with an account at a Brooklyn branch whose account balance is over $1000 πcustomer name ((σbranch city=Brooklyn∧balance>1000 (branch ⊲⊳ (account ⊲⊳ depositor ))) Transformation using join associatively (rule 6(a)): πcustomer name ((σbranch city=Brooklyn∧balance>1000 (branch ⊲⊳ account) ⊲⊳ depositor )) Second form provides an opportunity to apply the “perform selections early” rule, resulting in the expression πcustomer name ((σbranch city=Brooklyn (branch) ⊲⊳ σbalance>1000 (account)) ⊲⊳ depositor ) Multiple Transformations (cont.) Administração e Optimização de Bases de Dados πcustomer πcustomer name σbranch city = Brooklyn ∧ name ⊲⊳ balance > 1000 ⊲⊳ branch account ⊲⊳ σbranch depositor ⊲⊳ depositor city =Brooklyn σbalance>1000 branch account Projection Operation Example Administração e Optimização de Bases de Dados Consider the query πcustomer name ((σbranch city =Brooklyn (branch) ⊲⊳ account) ⊲⊳ depositor ) When we compute σbranch city =Brooklyn (branch) ⊲⊳ account, we get a relation whose schema is (branch name, branch city , assets, account number , balance) Push projections using equivalence rules 8(a) and 8(b) eliminate unneeded attributes from intermediate results πcustomer name (πaccount number (σbranch city =Brooklyn (branch) ⊲⊳ account) ⊲⊳ depositor ) Join Ordering Example Administração e Optimização de Bases de Dados Consider the expression πcustomer name ((σbranch city =Brooklyn (branch)) ⊲⊳ (account ⊲⊳ depositor )) Could compute (account ⊲⊳ depositor ) first, and join result with σbranch city =Brooklyn (branch) However, (account ⊲⊳ depositor ) is likely to be a large relation... ... and only a small fraction of the bank’s customers are likely to have accounts in branches located in Brooklyn Thus, it is better to compute first σbranch city =Brooklyn (branch) ⊲⊳ account Administração e Optimização de Bases de Dados The End