Source Target Rewriting Optimizations

During source-target rewriting, many optimizations can be performed using source constraints and the properties of Skolem functions.

Skolem Function Encoding

We assume that distinct Skolem functions have disjoint ranges and that each Skolem function is bijective onto its range (in particular, injective).

Example

Assume the following mappings, where \(f\) and \(g\) are Skolem functions:

\(\textrm{Product}(id, l, c) \rightarrow \triple{f(id)}{\textrm{ex:label}}{l}, \triple{f(id)}{\textrm{ex:comment}}{c}\).
\(\textrm{Vendor}(id, l) \rightarrow \triple{g(id)}{\textrm{ex:label}}{l}\).

Consider the following query: \[q(x, y, z) \leftarrow \triple{x}{\textrm{ex:label}}{y}, \triple{x}{\textrm{ex:comment}}{z}\]

Current Rewriting

Skolem functions are currently encoded with a two-layer integration:

\(\textrm{Product}(id, l, c) \rightarrow \textrm{V}_{1}(f(id),l,c)\)
\(\textrm{V}_{1}(p, l, c) \rightarrow \triple{p}{\textrm{ex:label}}{l}, \triple{p}{\textrm{ex:comment}}{c}\)
\(\textrm{Vendor}(id, l) \rightarrow \textrm{V}_{2}(g(id), l)\)
\(\textrm{V}_{2}(v, l) \rightarrow \triple{v}{\textrm{ex:label}}{l}\)

Currently, \(q\) is rewritten into queries over the views \(V_{1}\) and \(V_{2}\) only, which yields:

\(Q_{1}(x, y, z) \leftarrow \textrm{V}_{1}(x, y, c), \textrm{V}_{1}(x,l,z)\)
\(Q_{2}(x, y, z) \leftarrow \textrm{V}_{2}(x, y), \textrm{V}_{1}(x,l,z)\)

The rewriting \(Q_{2}\) has no answers: unfolding the view definitions turns the join on the first position of \(V_{1}\) and \(V_{2}\) into the equality \(f(id_{1}) = g(id_{2})\), which can never hold because distinct Skolem functions have disjoint ranges.
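A minimal Python sketch of why \(Q_{2}\) is empty, under the assumption stated above that Skolem functions are injective with pairwise disjoint ranges: unifying two Skolem terms fails outright when their function symbols differ, and otherwise reduces to unifying their arguments. The `Var` and `Skolem` classes and the `unify` function are hypothetical illustrations, not part of an existing system; no occurs check is performed, which is harmless for the flat terms used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Skolem:
    fn: str      # Skolem function symbol, e.g. "f" or "g"
    args: tuple  # argument terms


def walk(t, subst):
    # Follow variable bindings until reaching an unbound variable or a Skolem term.
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def unify(s, t):
    """Return a substitution (dict Var -> term) equating s and t, or None.

    Assumes distinct Skolem functions have disjoint ranges (f(..) = g(..) fails)
    and each Skolem function is injective (f(a) = f(b) reduces to a = b).
    No occurs check is performed.
    """
    subst = {}
    stack = [(s, t)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if isinstance(a, Var):
            subst[a] = b
        elif isinstance(b, Var):
            subst[b] = a
        elif a.fn != b.fn or len(a.args) != len(b.args):
            return None  # disjoint ranges: no value is both f(..) and g(..)
        else:
            stack.extend(zip(a.args, b.args))  # injectivity: equal outputs, equal inputs
    return subst


id1, id2 = Var("id1"), Var("id2")
# Q2 joins a g(.) term (from V2) with an f(.) term (from V1).
print(unify(Skolem("g", (id1,)), Skolem("f", (id2,))))  # None
# Q1 joins V1 with itself: f(id1) = f(id2) forces id1 = id2.
print(unify(Skolem("f", (id1,)), Skolem("f", (id2,))))  # {Var(name='id1'): Var(name='id2')}
```

The same unification that proves \(Q_{2}\) empty also shows why the two \(V_{1}\) atoms of \(Q_{1}\) must come from the same \(\textrm{Product}\) identifier.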

Skolem Predicates

The two-layer mappings can be translated into one-layer mappings as follows:

\(\textrm{Product}(id, l, c), F(id, p) \rightarrow \triple{p}{\textrm{ex:label}}{l}, \triple{p}{\textrm{ex:comment}}{c}\).
\(\textrm{Vendor}(id, l), G(id, v) \rightarrow \triple{v}{\textrm{ex:label}}{l}\).

where the Skolem predicates \(F\) and \(G\) represent the Skolem functions \(f\) and \(g\), respectively. The first argument of each predicate is the function's input and the second argument is its output. Because these new predicates represent Skolem functions, the following constraints hold on them (stated here for \(F\); the same holds for \(G\)):

function predicate: \(\forall x,y_{1},y_{2}~ F(x, y_{1}), F(x, y_{2}) \rightarrow y_{1} = y_{2}\),
injective function: \(\forall x_{1}, x_{2},y~ F(x_{1}, y), F(x_{2}, y) \rightarrow x_{1} = x_{2}\),
disjoint ranges: for two distinct Skolem predicates \(F\) and \(G\), \(\forall x_{1}, x_{2}, y~ F(x_{1}, y), G(x_{2}, y) \rightarrow \perp\).
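The translation from two-layer mappings to one-layer mappings with Skolem predicates can be sketched as follows. The encoding is a hypothetical choice for illustration: atoms are `(predicate, args)` pairs, a Skolem term \(f(\bar x)\) is the tuple `("f", args)`, the Skolem predicate name is the upper-cased function name, and fresh output variables are named `_sk0`, `_sk1`, and so on (assumed not to clash with existing variables).

```python
def flatten(body, head):
    """Turn a mapping body -> head whose head triples contain Skolem terms
    into a one-layer mapping with Skolem predicates.

    Each distinct Skolem term (fn, args) is replaced by a fresh variable v,
    and the atom FN(*args, v) is added to the body (inputs first, output last).
    """
    fresh = {}              # Skolem term -> its fresh output variable
    new_body = list(body)

    def replace(term):
        if isinstance(term, tuple):          # assumed encoding of a Skolem term
            if term not in fresh:
                fn, args = term
                fresh[term] = f"_sk{len(fresh)}"
                new_body.append((fn.upper(), (*args, fresh[term])))
            return fresh[term]
        return term                          # variable or IRI, kept as-is

    new_head = [tuple(replace(t) for t in triple) for triple in head]
    return new_body, new_head


# Product(id, l, c) -> (f(id), ex:label, l), (f(id), ex:comment, c)
body = [("Product", ("id", "l", "c"))]
head = [(("f", ("id",)), "ex:label", "l"), (("f", ("id",)), "ex:comment", "c")]
print(flatten(body, head))
# ([('Product', ('id', 'l', 'c')), ('F', ('id', '_sk0'))], [('_sk0', 'ex:label', 'l'), ('_sk0', 'ex:comment', 'c')])
```

Both occurrences of \(f(id)\) share one output variable, which is what lets the one-layer mapping express the join on the subject.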

Rewriting \(q\) with respect to the mappings that include Skolem predicates yields:

\(Q_{1}(x, y, z) \leftarrow \textrm{Product}(id_{1}, y, c), F(id_{1}, x), \textrm{Product}(id_{2}, l, z), F(id_{2}, x)\)
\(Q_{2}(x, y, z) \leftarrow \textrm{Vendor}(id_{1}, y), G(id_{1}, x), \textrm{Product}(id_{2}, l, z), F(id_{2}, x)\)

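By the disjoint-ranges property, the atoms \(G(id_{1}, x)\) and \(F(id_{2}, x)\) of \(Q_{2}\) cannot both hold, so \(Q_{2}\) has no answers. In \(Q_{1}\), the injectivity of \(F\) applied to \(F(id_{1}, x)\) and \(F(id_{2}, x)\) gives \(id_{1} = id_{2}\); renaming both to \(id\) and merging the two \(F\) atoms yields the following rewriting of \(q\):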

\[Q_{r}(x, y, z) \leftarrow \textrm{Product}(id, y, c), \textrm{Product}(id, l, z), F(id, x)\]
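The reasoning above can be mechanized by applying the three Skolem-predicate constraints to a query body until nothing changes. This is a hypothetical sketch that assumes all terms are variables (plain strings) and that every Skolem predicate is binary, with the input first and the output second; `find_step` and `simplify` are illustrative names.

```python
def find_step(body, skolem_preds):
    """Return the first constraint application found on the Skolem atoms:
    "clash" for disjoint ranges, or a pair (old, new) of variables to merge."""
    sk = sorted(a for a in body if a[0] in skolem_preds)
    for p, (x1, y1) in sk:
        for q, (x2, y2) in sk:
            if (p, x1, y1) == (q, x2, y2):
                continue                 # same atom
            if p != q and y1 == y2:
                return "clash"           # disjoint ranges: F(x1, y), G(x2, y) -> bottom
            if p == q and y1 == y2:
                return (x2, x1)          # injective function: x1 = x2
            if p == q and x1 == x2:
                return (y2, y1)          # function predicate: y1 = y2
    return None


def simplify(head, body, skolem_preds):
    """Apply the Skolem-predicate constraints to the CQ head <- body until no
    constraint applies. Returns (head, body), or None if there are no answers."""
    body = set(body)
    while (step := find_step(body, skolem_preds)) is not None:
        if step == "clash":
            return None
        old, new = step

        def rename(t):
            return new if t == old else t

        # Rename in the head too, since merged variables may be answer variables.
        head = tuple(map(rename, head))
        body = {(p, tuple(map(rename, args))) for p, args in body}
    return head, body


Q1 = [("Product", ("id1", "y", "c")), ("F", ("id1", "x")),
      ("Product", ("id2", "l", "z")), ("F", ("id2", "x"))]
Q2 = [("Vendor", ("id1", "y")), ("G", ("id1", "x")),
      ("Product", ("id2", "l", "z")), ("F", ("id2", "x"))]
print(simplify(("x", "y", "z"), Q2, {"F", "G"}))  # None
```

On \(Q_{1}\), the sketch renames \(id_{2}\) to \(id_{1}\) and returns the body \(\{\textrm{Product}(id_{1}, y, c), \textrm{Product}(id_{1}, l, z), F(id_{1}, x)\}\), which is \(Q_{r}\) with \(id_{1}\) in place of \(id\).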

Primary Key

If, in addition, the first column of \(\textrm{Product}\) is a primary key, we obtain the following constraint: \[\forall x, y_{1}, y_{2}, z_{1}, z_{2} ~ \textrm{Product}(x, y_{1}, z_{1}), \textrm{Product}(x, y_{2}, z_{2}) \rightarrow y_{1} = y_{2}, z_{1} = z_{2}\]

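Applying it to the two \(\textrm{Product}\) atoms of \(Q_{r}\), which share \(id\), forces \(l = y\) and \(c = z\), so the rewriting of \(q\) becomes: \[Q_{r}(x, y, z) \leftarrow \textrm{Product}(id, y, z), F(id, x)\]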

Applying a Primary Key Constraint on a CQ

A primary key constraint is a pair \((P, I)\) where:

\(P\) is a predicate;
\(I\) is a set of argument positions (indexes) of \(P\).

Let \(F\) be a conjunction of atoms. A primary key constraint \((P, I)\) is applicable to \(F\) if there exist two distinct atoms \(P(\bar t)\) and \(P(\bar u)\) in \(F\) such that \(\forall i \in I~ \bar t_{i} = \bar u_{i}\). We then say that \((P, I)\) is applicable to \(F\) via the atoms \(P(\bar t)\) and \(P(\bar u)\).
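Applicability can be checked directly from this definition. A minimal sketch, assuming atoms are `(predicate, args)` pairs and that the hypothetical `PrimaryKey` class stores 0-based argument positions rather than the 1-based indexes used in the text:

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PrimaryKey:
    pred: str            # the predicate P
    indexes: frozenset   # the key positions I (0-based)


def applicable_pairs(pk, atoms):
    """Yield the pairs of distinct atoms P(t), P(u) of `atoms` that agree on
    every key position, i.e. the pairs via which pk is applicable."""
    candidates = sorted(a for a in set(atoms) if a[0] == pk.pred)
    for (p, t), (_, u) in combinations(candidates, 2):
        if all(t[i] == u[i] for i in pk.indexes):
            yield (p, t), (p, u)


# Q_r from the example above, with the key on the first column of Product.
qr = [("Product", ("id", "y", "c")), ("Product", ("id", "l", "z")), ("F", ("id", "x"))]
print(list(applicable_pairs(PrimaryKey("Product", frozenset({0})), qr)))
# [(('Product', ('id', 'l', 'z')), ('Product', ('id', 'y', 'c')))]
```

Converting `atoms` to a set first guarantees that the two atoms of each pair are distinct, as the definition requires.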

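We define the result of applying this primary key constraint to \(F\) via these two atoms as \(h(F)\), where \(h\) is the substitution \[\{\bar u_{k} \mapsto \bar t_{k} \mid 1 \leq k \leq \mathrm{arity}(P)\},\] that is, the most general unifier of \(P(\bar u)\) and \(P(\bar t)\) (when \(\bar u_{k}\) is a constant and \(\bar t_{k}\) a variable, the binding is taken in the other direction, \(\bar t_{k} \mapsto \bar u_{k}\)). The substitution \(h\) is not well defined when unification would equate two distinct constants, for instance when there exists \(k\) such that \(\bar t_{k}\) and \(\bar u_{k}\) are distinct constants; we then say that there is a clash during the application.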

Since conjunctions of atoms are represented as sets of atoms, applying a primary key constraint to a conjunction of atoms via two atoms, without clash, amounts to:

remove one of the two atoms;
replace, everywhere in \(F\), each variable of the removed atom by the corresponding term of the remaining one (and, symmetrically, each variable of the remaining atom that faces a constant in the removed one by that constant).

A clash during the application of a primary key constraint means that the conjunction of atoms \(F\) cannot be satisfied together with the primary key constraint. When \(F\) is the body of a conjunctive query, we can therefore deduce that the query has no answers under the primary key constraint; when there is no clash, \(h\) must also be applied to the query's answer variables.
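Putting application, clash and atom removal together, a primary key constraint can be applied to a conjunctive query repeatedly until it is no longer applicable. This sketch is hypothetical: variables are plain strings, constants are wrapped in a `Const` class, key positions are 0-based, and the unifier prefers answer variables as representatives so that the head keeps its names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Const:
    value: str  # constants are wrapped; plain str terms are variables


def is_var(term):
    return isinstance(term, str)


def resolve(term, h):
    # Follow variable bindings in h until reaching an unbound term.
    while is_var(term) and term in h:
        term = h[term]
    return term


def unifier(t, u, keep):
    """Most general unifier of the argument tuples t and u, or None on a clash.

    By default u_k is mapped to t_k; variables in `keep` (the answer
    variables) are preferred as representatives so the head keeps its names.
    """
    h = {}
    for a, b in zip(t, u):
        a, b = resolve(a, h), resolve(b, h)
        if a == b:
            continue
        if is_var(b) and not (b in keep and is_var(a) and a not in keep):
            h[b] = a
        elif is_var(a):
            h[a] = b
        else:
            return None  # two distinct constants must be equal: clash
    return h


def apply_pk(pred, key, head, body):
    """Apply the primary key (pred, key) to the CQ head <- body until it is no
    longer applicable. `key` holds 0-based positions. Returns (head, body), or
    None if a clash shows that the query has no answers."""
    body = set(body)
    while True:
        atoms = [a for a in body if a[0] == pred]
        pair = next(((t, u) for (_, t), (_, u) in combinations(atoms, 2)
                     if all(t[i] == u[i] for i in key)), None)
        if pair is None:
            return head, body
        h = unifier(*pair, keep=set(head))
        if h is None:
            return None
        # Applying h makes the two atoms identical, so the set keeps only one.
        head = tuple(resolve(x, h) for x in head)
        body = {(p, tuple(resolve(x, h) for x in args)) for p, args in body}


qr = [("Product", ("id", "y", "c")), ("Product", ("id", "l", "z")), ("F", ("id", "x"))]
print(apply_pk("Product", {0}, ("x", "y", "z"), qr))
```

Applied to \(Q_{r}\) with the key on the first column of \(\textrm{Product}\), `apply_pk` binds \(l \mapsto y\) and \(c \mapsto z\) and returns the head \((x, y, z)\) with the body \(\{\textrm{Product}(id, y, z), F(id, x)\}\) (the printed set order may vary). Two \(\textrm{Product}\) atoms with the same key but distinct constants in another column yield `None`.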

Translating SQL Views into Conjunctive Queries over Tables

When possible, it is worth translating SQL view definitions into conjunctive views over tables. This enables the use of source constraints, such as primary keys, and the computation of query cores.
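As an illustration of such a translation in the simplest case, a select-project-join view whose WHERE clause is a conjunction of equalities, here is a sketch. It does not parse SQL: the view is assumed to be already parsed into hypothetical `from_`, `where` and `select` lists, the column names of `Product` and `Vendor` are assumed, and each column `alias.col` becomes the variable `alias_col`. Equated columns share a variable, and columns equated to a constant are replaced by that constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: str


def spj_to_cq(schema, from_, where, select):
    """Translate an already-parsed select-project-join view into a CQ.

    schema: {table: [column, ...]}; from_: [(alias, table)];
    where: [(col, col_or_const)] equalities; select: [col], where a col is an
    (alias, column) pair. Returns (head, body), or None if WHERE equates two
    distinct constants.
    """
    parent = {}  # union-find over columns and constants

    def find(x):
        while parent.get(x, x) != x:
            x = parent[x]
        return x

    for left, right in where:
        a, b = find(left), find(right)
        if a == b:
            continue
        if isinstance(a, Const) and isinstance(b, Const):
            return None          # e.g. p.label = 'a' AND p.label = 'b'
        if isinstance(a, Const):
            a, b = b, a
        parent[a] = b            # constants always stay representatives

    def term(col):
        r = find(col)
        return r if isinstance(r, Const) else f"{r[0]}_{r[1]}"

    body = [(table, tuple(term((alias, c)) for c in schema[table]))
            for alias, table in from_]
    head = tuple(term(col) for col in select)
    return head, body


# Hypothetical view: SELECT p.id, v.id FROM Product p, Vendor v WHERE p.label = v.label
schema = {"Product": ["id", "label", "comment"], "Vendor": ["id", "label"]}
head, body = spj_to_cq(schema, [("p", "Product"), ("v", "Vendor")],
                       [(("p", "label"), ("v", "label"))], [("p", "id"), ("v", "id")])
print(head, body)
# ('p_id', 'v_id') [('Product', ('p_id', 'v_label', 'p_comment')), ('Vendor', ('v_id', 'v_label'))]
```

Once in this form, the view's body can be combined with the primary key and Skolem-predicate simplifications described above.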