Tatooine Conjunctive Plan Optimization
Conjunctive Plan
I propose to see plan optimizations on the following query on 1M triples BSBM dataset:
Q05<$offer, $vendor, $offerURL, $vendorName, $vendorHomepage> :- triple($offer, <bsbm:vendor>, $vendor), triple($offer, <bsbm:offerWebpage>, $offerURL), triple($vendor, <rdfs:label>, $vendorName), triple($vendor, <foaf:homepage>, $vendorHomepage);
Its Tatooine execution plan is (you have to zoom on it):
The query has 10 conjunctive rewritings according to the mappings. The execution plan computes the answers of the union of all rewritings. The query has 56200 answers which are computed in more than 10s with the above plan. The plan has the following structure:
- at the root, there is a distinct block composed of two Uniq operators followed by MemorySort,
- a Union operator, each of whose subplans computes answers of one rewriting,
- for optimization, each rewriting plan starts with andistinct block,
- a Project operator induced by answer variables of the rewriting,
- HashJoin operators representing the joins in the conjunction of the rewritings,
- Selection operators induced by constants in query,
- a Skolem function block composed of a BindAccessEval operator with FunctionCallBindAccess operator, transforming the output of the left input of BindAccessEval,
- an SQLEval operator evaluating the SQL view on the datasource.
Optimizations
I brought the following optimizations:
- Pushing an operator into its SQLEval operator child or their SQLEval operators children. Only Selection, HashJoin and Projection can be pushed
- Inverting a Skolem function block with its parent. Only Selection, HashJoin and Projection can be inverted in this way.
- Removing useless operators or blocks appearing during optimization process.
Inverting Selection and Skolem function block
The inversion is done as follows:
Selection Skolem + Function | ===> Block v + Skolem | Function v Block Selection
Inverting HashJoin and Skolem function blocks
The inversion is done as follow:
HashJoin Skolem + function +-----+-----+ block | | ===> + v v | Skolem Skolem v function function HashJoin block block + +-------+ | |
The inversion is not possible, when the join is made on columns having different Skolem functions. In this case, we know that the join will not produce any answer, so replace it sub plan by an empty operator.
Removing of Empty Sub-plans
Inverting Projection with a Skolem Function Block
Removal of Useless Union Operators and Distinct Blocks
Pushing Projection in SQLEval
Replacing Sort-Based Distinct Block by Hash-Based Distinct Operator
Inverting Distinct with a Skolem Function Block
Pushing Distinct in SQLEval
Inverting Union with Skolem Function Blocks
Inverting Union with Skolem function blocks can be done by:
- grouping same Skolem function blocks under the union,
- add a projection to each input operator of Skolem function group to normalized its inputs (reordering and repeating columns),
- create a union operator for the input of each group of functions
- create a Skolem function blocks on top of each union of inputs
- create union of each Skolem function blocks
I propose to see plan optimizations on the following query on 1M triples BSBM dataset:
Qt<$X> :- triple($X, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType1>);
This query have 2810 answers and 96 rewritings, but all of this rewritings have use the same skolem function.
Pushing Union In SQLEval
TODO Product
During rewriting product may appear when a join between atoms is "removed" by instantiating a query variable using constant of mappings head.