$$%% examples \newcommand{\exGraph}{\graph_{\mathrm{ex}}} \newcommand{\exOnto}{\onto_{\mathrm{ex}}} \newcommand{\exMappings}{\mappings_{\mathrm{ex}}} \newcommand{\exExtensions}{\extensions_{\mathrm{ex}}} \newcommand{\exRule}{r_{\mathrm{ex}}} \newcommand{\RDFSrules}{\rules_{\mathrm{RDFS}}} %% RDF \newcommand{\triple}[3]{(#1, #2, #3)} \newcommand{\tuple}[1]{\langle #1 \rangle} \newcommand{\subject}{\mathtt{s}} \newcommand{\prop}{\mathtt{p}} \newcommand{\object}{\mathtt{o}} \newcommand{\blank}{\_{:}b} \newcommand{\blankn}[1]{\_{:}#1} \newcommand{\irin}[1]{{:}\mathrm{#1}} \newcommand{\class}{\mathtt{c}} \newcommand{\nsrdf}{\mathrm{rdf{:}}} \newcommand{\nsrdfs}{\mathrm{rdfs{:}}} \newcommand{\rdftype}{\mathrm{rdf{:}type}} \newcommand{\rdfLiteral}{\mathrm{rdf{:}Literal}} \newcommand{\rdfssubClassOf}{\mathrm{rdfs{:}subClassOf}} \newcommand{\rdfssubPropertyOf}{\mathrm{rdfs{:}subPropertyOf}} \newcommand{\rdfsdomain}{\mathrm{rdfs{:}domain}} \newcommand{\rdfsrange}{\mathrm{rdfs{:}range}} \newcommand{\rdfsClass}{\mathrm{rdfs{:}Class}} \newcommand{\rdfProperty}{\mathrm{rdf{:}Property}} \newcommand{\xsdint}{\mathrm{xsd{:}int}} %% \newcommand{\type}{\tau} \newcommand{\subclass}{\prec_{sc}} \newcommand{\subproperty}{\prec_{sp}} \newcommand{\domain}{\hookleftarrow_{d}} \newcommand{\range}{\hookrightarrow_{r}} \newcommand{\rdfentailment}{\vdash_{^\mathrm{RDF}}} \newcommand{\RDFS}[1]{\mathrm{RDFS}(#1)} \newcommand{\aka}{a.k.a.~} \newcommand{\etc}{etc} \newcommand{\wrt}{w.r.t.~} \newcommand{\st}{s.t.~} \newcommand{\ie}{i.e.,~} \newcommand{\eg}{e.g.,~} \newcommand{\graph}{G} \newcommand{\rules}{\mathcal{R}} \newcommand{\sources}{\mathcal{S}} \newcommand{\views}{\mathcal{V}} \newcommand{\extensions}{\mathcal{E}} \newcommand{\onto}{\mathcal{O}} \newcommand{\mappings}{\mathcal{M}} \newcommand{\modelsrdf}{\models_\rules} \newcommand{\bgp}{P} \newcommand{\Bl}[1]{\mathrm{Bl}(#1)} \newcommand{\Val}[1]{\mathrm{Val}(#1)} \newcommand{\Var}[1]{\mathrm{Var(#1)}} \newcommand{\ext}[1]{\mathrm{ext}(#1)} \newcommand{\cert}{\mathrm{cert}} \newcommand{\ans}{\mathrm{ans}} \newcommand{\query}{\leftarrow} \newcommand{\body}[1]{\textrm{body}(#1)} \newcommand{\head}[1]{\textrm{head}(#1)} \newcommand{\cs}{\mathrm{cs}} \newcommand{\lcs}{\mathrm{lcs}} \newcommand{\cl}{\mathrm{cl}} \newcommand{\lua}{\mathrm{lua}} \newcommand{\lur}{\mathrm{lur}} \newtheorem{lemma}{Lemma} \newtheorem{definition}{Definition} \newtheorem{problem}{Problem} \newtheorem{property}{Property} \newtheorem{corollary}{Corollary} \newtheorem{example}{Example} \newtheorem{theorem}{Theorem} \newcommand{\URIs}{\mathscr U} \newcommand{\IRIs}{\mathscr I} \newcommand{\BNodes}{\mathscr B} \newcommand{\Literals}{\mathscr L} \newcommand{\Variables}{\mathscr V} % DB \newcommand{\CQ}{\ensuremath{\mathtt{CQ}}\xspace} \newcommand{\UCQ}{\ensuremath{\mathtt{UCQ}}\xspace} \newcommand{\SQL}{\ensuremath{\mathtt{SQL}}\xspace} \newcommand{\rel}[1]{\mathsf{#1}} % Cost model \newcommand{\cans}[1]{|#1|_t} \newcommand{\cref}[1]{|#1|_r} \newcommand{\db}{\mathtt{db}} % DL \newcommand{\cn}{\ensuremath{N_{C}}\xspace} \newcommand{\rn}{\ensuremath{N_{R}}\xspace} \newcommand{\inds}{\ensuremath{N_{I}}\xspace} \newcommand{\ainds}{\ensuremath{\mathrm{Ind}}\xspace} \newcommand{\funct}{\mathit{funct} \ } \newcommand{\KB}{\mathcal{K}\xspace} \newcommand{\dlr}{DL-Lite$_{\mathcal{R}}$\xspace} % Logics \newcommand{\FOL}{\ensuremath{\mathtt{FOL}}\xspace} \newcommand{\datalog}{\ensuremath{\mathtt{Datalog}}\xspace} \newcommand{\dllite}{DL-Lite\xspace} \newcommand{\true}{\mathrm{true}} \newcommand{\false}{\mathrm{false}} \newcommand{\dis}{\mathtt{dis}} \newcommand{\vars}[1]{\ensuremath{\mathrm{vars}(#1)}} %\newcommand{\terms}[1]{\ensuremath{\mathrm{terms}(#1)}} %math \renewcommand{\phi}{\varphi} \newcommand\eqdef{\stackrel{\mathclap{\normalfont\mbox{def}}}{=}} \newcommand\restr[2]{#1_{|#2}} \newcommand{\ontoBody}[1]{\mathrm{body}_\onto(#1)} %proof of the rewriting theorem \newcommand{\rdfGraph}{\graph^{\mappings}_{\extensions}} \newcommand\systemGraph{\graph^{\mappings \cup \mappings^{\text{STD}}_\onto}_{\extensions \cup \extensions_\onto}} \newcommand\viewsGraph{\graph^{\mappings^{\rules,\onto} \cup \mappings^{\text{STD}}_\onto}_{\extensions \cup \extensions_\onto}} \newcommand{\standMappings}{\mappings^{\text{STD}}_\onto} \newcommand{\reminder}[1]{[\vadjust{\vbox to0pt{\vss\hbox to0pt{\hss{\Large $\Longrightarrow$}}}}{{\textsf{\small #1}}}]} %\newcommand{\FG}[1]{\textcolor{blue}{\reminder{FG:~#1}}} \newcommand{\extVersion}{false} \newcommand{\printIfExtVersion}[2] { \ifthenelse{\equal{\extVersion}{true}}{#1}{} \ifthenelse{\equal{\extVersion}{false}}{#2}{} } \newcommand{\bda}{\true} \newcommand{\ifBDA}[2]% {% \ifthenelse{\equal{\bda}{true}}{#1}{}% \ifthenelse{\equal{\bda}{false}}{#2}{}% } %%% Local Variables: %%% TeX-master: "paper" %%% End: $$

Tatooine Conjunctive Plan Optimization

Conjunctive Plan

I propose to see plan optimizations on the following query on 1M triples BSBM dataset:

Q05<$offer, $vendor, $offerURL, $vendorName, $vendorHomepage> :- 
            triple($offer, <bsbm:vendor>, $vendor),
            triple($offer, <bsbm:offerWebpage>, $offerURL),
            triple($vendor, <rdfs:label>, $vendorName),
            triple($vendor, <foaf:homepage>, $vendorHomepage);

Its Tatooine execution plan is (you have to zoom on it):

Q05.png

Figure 1: Query plan without optimization

The query has 10 conjunctive rewritings according to the mappings. The execution plan computes the answers of the union of all rewritings. The query has 56200 answers which are computed in more than 10s with the above plan. The plan has the following structure:

  1. at the root, there is a distinct block composed of two Uniq operators followed by MemorySort,
  2. a Union operator, each of whose subplans computes answers of one rewriting,
  3. for optimization, each rewriting plan starts with andistinct block,
  4. a Project operator induced by answer variables of the rewriting,
  5. HashJoin operators representing the joins in the conjunction of the rewritings,
  6. Selection operators induced by constants in query,
  7. a Skolem function block composed of a BindAccessEval operator with FunctionCallBindAccess operator, transforming the output of the left input of BindAccessEval,
  8. an SQLEval operator evaluating the SQL view on the datasource.

Optimizations

I brought the following optimizations:

  1. Pushing an operator into its SQLEval operator child or their SQLEval operators children. Only Selection, HashJoin and Projection can be pushed
  2. Inverting a Skolem function block with its parent. Only Selection, HashJoin and Projection can be inverted in this way.
  3. Removing useless operators or blocks appearing during optimization process.

Inverting Selection and Skolem function block

The inversion is done as follows:

Selection          Skolem
    +             Function
    |      ===>    Block
    v                +
  Skolem             |
 Function            v
  Block          Selection

Inverting HashJoin and Skolem function blocks

The inversion is done as follow:

      HashJoin                  Skolem
         +                     function
   +-----+-----+                block
   |           |        ===>      +
   v           v                  |
 Skolem      Skolem               v
function    function           HashJoin
 block       block                +
                              +-------+
                              |       |

The inversion is not possible, when the join is made on columns having different Skolem functions. In this case, we know that the join will not produce any answer, so replace it sub plan by an empty operator.

Q05.png

Figure 2: Query plan where Selection and HashJoin have been inverted wrt the Skolem function block and pushed in SQLEval. The black boxes represent empty operators due to impossible inversions. The query plan is executed in less than 4s.

Removing of Empty Sub-plans

The introduced empty operators (joins over results of different Skolem functions – these will never match) lead to empty sub-plans that can be removed.

Q05.png

Figure 3: Query plan with its empty sub-plans removed. Its execution takes less than 3s.

Inverting Projection with a Skolem Function Block

The inversion is done as follows:

Projection         Skolem
    +             Function
    |      ===>    Block
    v                +
  Skolem             |
 Function            v
  Block          Projection

Q05.png

Figure 4: Query plan where projection has been inverted with a Skolem function block. Its execution takes less than 2s.

Removal of Useless Union Operators and Distinct Blocks

We remove the union having one child and consecutive distinct blocks.

Q05.png

Figure 5: Query plan whitout useless operators and blocks. Its execution takes less than 1,5s.

Pushing Projection in SQLEval

We change an SQLEval preceded by Projection, in order to push the projection in the SQL query.

Q05.png

Figure 6: Query plan where projections are pushed into SQLEval operators. The execution takes less than 1s.

Replacing Sort-Based Distinct Block by Hash-Based Distinct Operator

Q05.png

Figure 7: Query plan where distinct block is replaced by HashDistinct operator. The execution takes around 300ms.

Inverting Distinct with a Skolem Function Block

The inversion is done as follows:

Distinct          Skolem
   +             Function
   |      ===>    Block
   v                +
 Skolem             |
Function            v
 Block           Distinct

Q05.png

Figure 8: Query plan where distinct operator has been inverted with Skolem function block. Its execution takes around 300ms.

Pushing Distinct in SQLEval

Q05.png

Figure 9: Query plan where distinct operator has been inverted with Skolem function block. Its execution takes less than 300ms.

Inverting Union with Skolem Function Blocks

Inverting Union with Skolem function blocks can be done by:

  1. grouping same Skolem function blocks under the union,
  2. add a projection to each input operator of Skolem function group to normalized its inputs (reordering and repeating columns),
  3. create a union operator for the input of each group of functions
  4. create a Skolem function blocks on top of each union of inputs
  5. create union of each Skolem function blocks

I propose to see plan optimizations on the following query on 1M triples BSBM dataset:

Qt<$X> :-
           triple($X, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType1>);

This query have 2810 answers and 96 rewritings, but all of this rewritings have use the same skolem function.

Qt.png

Figure 10: Query plan where union operator has been inverted with Skolem function block. Its execution takes around 500ms.

Pushing Union In SQLEval

Qt.png

Figure 11: Query plan where union operator has been pushed with SQLEval. Its execution takes less than 200ms.

TODO Product

During rewriting product may appear when a join between atoms is "removed" by instantiating a query variable using constant of mappings head.