$$%% examples \newcommand{\exGraph}{\graph_{\mathrm{ex}}} \newcommand{\exOnto}{\onto_{\mathrm{ex}}} \newcommand{\exMappings}{\mappings_{\mathrm{ex}}} \newcommand{\exExtensions}{\extensions_{\mathrm{ex}}} \newcommand{\exRule}{r_{\mathrm{ex}}} \newcommand{\RDFSrules}{\rules_{\mathrm{RDFS}}} %% RDF \newcommand{\triple}[3]{(#1, #2, #3)} \newcommand{\tuple}[1]{\langle #1 \rangle} \newcommand{\subject}{\mathtt{s}} \newcommand{\prop}{\mathtt{p}} \newcommand{\object}{\mathtt{o}} \newcommand{\blank}{\_{:}b} \newcommand{\blankn}[1]{\_{:}#1} \newcommand{\irin}[1]{{:}\mathrm{#1}} \newcommand{\class}{\mathtt{c}} \newcommand{\nsrdf}{\mathrm{rdf{:}}} \newcommand{\nsrdfs}{\mathrm{rdfs{:}}} \newcommand{\rdftype}{\mathrm{rdf{:}type}} \newcommand{\rdfLiteral}{\mathrm{rdf{:}Literal}} \newcommand{\rdfssubClassOf}{\mathrm{rdfs{:}subClassOf}} \newcommand{\rdfssubPropertyOf}{\mathrm{rdfs{:}subPropertyOf}} \newcommand{\rdfsdomain}{\mathrm{rdfs{:}domain}} \newcommand{\rdfsrange}{\mathrm{rdfs{:}range}} \newcommand{\rdfsClass}{\mathrm{rdfs{:}Class}} \newcommand{\rdfProperty}{\mathrm{rdf{:}Property}} \newcommand{\xsdint}{\mathrm{xsd{:}int}} %% \newcommand{\type}{\tau} \newcommand{\subclass}{\prec_{sc}} \newcommand{\subproperty}{\prec_{sp}} \newcommand{\domain}{\hookleftarrow_{d}} \newcommand{\range}{\hookrightarrow_{r}} \newcommand{\rdfentailment}{\vdash_{^\mathrm{RDF}}} \newcommand{\RDFS}[1]{\mathrm{RDFS}(#1)} \newcommand{\aka}{a.k.a.~} \newcommand{\etc}{etc} \newcommand{\wrt}{w.r.t.~} \newcommand{\st}{s.t.~} \newcommand{\ie}{i.e.,~} \newcommand{\eg}{e.g.,~} \newcommand{\graph}{G} \newcommand{\rules}{\mathcal{R}} \newcommand{\sources}{\mathcal{S}} \newcommand{\views}{\mathcal{V}} \newcommand{\extensions}{\mathcal{E}} \newcommand{\onto}{\mathcal{O}} \newcommand{\mappings}{\mathcal{M}} \newcommand{\modelsrdf}{\models_\rules} \newcommand{\bgp}{P} \newcommand{\Bl}[1]{\mathrm{Bl}(#1)} \newcommand{\Val}[1]{\mathrm{Val}(#1)} \newcommand{\Var}[1]{\mathrm{Var(#1)}} \newcommand{\ext}[1]{\mathrm{ext}(#1)} \newcommand{\cert}{\mathrm{cert}} \newcommand{\ans}{\mathrm{ans}} \newcommand{\query}{\leftarrow} \newcommand{\body}[1]{\textrm{body}(#1)} \newcommand{\head}[1]{\textrm{head}(#1)} \newcommand{\cs}{\mathrm{cs}} \newcommand{\lcs}{\mathrm{lcs}} \newcommand{\cl}{\mathrm{cl}} \newcommand{\lua}{\mathrm{lua}} \newcommand{\lur}{\mathrm{lur}} \newtheorem{lemma}{Lemma} \newtheorem{definition}{Definition} \newtheorem{problem}{Problem} \newtheorem{property}{Property} \newtheorem{corollary}{Corollary} \newtheorem{example}{Example} \newtheorem{theorem}{Theorem} \newcommand{\URIs}{\mathscr U} \newcommand{\IRIs}{\mathscr I} \newcommand{\BNodes}{\mathscr B} \newcommand{\Literals}{\mathscr L} \newcommand{\Variables}{\mathscr V} % DB \newcommand{\CQ}{\ensuremath{\mathtt{CQ}}\xspace} \newcommand{\UCQ}{\ensuremath{\mathtt{UCQ}}\xspace} \newcommand{\SQL}{\ensuremath{\mathtt{SQL}}\xspace} \newcommand{\rel}[1]{\mathsf{#1}} % Cost model \newcommand{\cans}[1]{|#1|_t} \newcommand{\cref}[1]{|#1|_r} \newcommand{\db}{\mathtt{db}} % DL \newcommand{\cn}{\ensuremath{N_{C}}\xspace} \newcommand{\rn}{\ensuremath{N_{R}}\xspace} \newcommand{\inds}{\ensuremath{N_{I}}\xspace} \newcommand{\ainds}{\ensuremath{\mathrm{Ind}}\xspace} \newcommand{\funct}{\mathit{funct} \ } \newcommand{\KB}{\mathcal{K}\xspace} \newcommand{\dlr}{DL-Lite$_{\mathcal{R}}$\xspace} % Logics \newcommand{\FOL}{\ensuremath{\mathtt{FOL}}\xspace} \newcommand{\datalog}{\ensuremath{\mathtt{Datalog}}\xspace} \newcommand{\dllite}{DL-Lite\xspace} \newcommand{\true}{\mathrm{true}} \newcommand{\false}{\mathrm{false}} \newcommand{\dis}{\mathtt{dis}} \newcommand{\vars}[1]{\ensuremath{\mathrm{vars}(#1)}} %\newcommand{\terms}[1]{\ensuremath{\mathrm{terms}(#1)}} %math \renewcommand{\phi}{\varphi} \newcommand\eqdef{\stackrel{\mathclap{\normalfont\mbox{def}}}{=}} \newcommand\restr[2]{#1_{|#2}} \newcommand{\ontoBody}[1]{\mathrm{body}_\onto(#1)} %proof of the rewriting theorem \newcommand{\rdfGraph}{\graph^{\mappings}_{\extensions}} \newcommand\systemGraph{\graph^{\mappings \cup \mappings^{\text{STD}}_\onto}_{\extensions \cup \extensions_\onto}} \newcommand\viewsGraph{\graph^{\mappings^{\rules,\onto} \cup \mappings^{\text{STD}}_\onto}_{\extensions \cup \extensions_\onto}} \newcommand{\standMappings}{\mappings^{\text{STD}}_\onto} \newcommand{\reminder}[1]{[\vadjust{\vbox to0pt{\vss\hbox to0pt{\hss{\Large $\Longrightarrow$}}}}{{\textsf{\small #1}}}]} %\newcommand{\FG}[1]{\textcolor{blue}{\reminder{FG:~#1}}} \newcommand{\extVersion}{false} \newcommand{\printIfExtVersion}[2] { \ifthenelse{\equal{\extVersion}{true}}{#1}{} \ifthenelse{\equal{\extVersion}{false}}{#2}{} } \newcommand{\bda}{\true} \newcommand{\ifBDA}[2]% {% \ifthenelse{\equal{\bda}{true}}{#1}{}% \ifthenelse{\equal{\bda}{false}}{#2}{}% } %%% Local Variables: %%% TeX-master: "paper" %%% End: $$

Which is the most probable class ?

Theoretical result

We consider the case of a random variable \(X\) with a finite range \(\{v_{1}, \dots, v_{m}\}\) and there exists an minimal integer \(Z\) such that \(P(X=v_{i}) = \frac{u_{i}}{Z}\). So, we have \(\sum_{1\leq i \leq m} u_{i} = Z\).

We perform \(n\) independent draws of \(X\), the probability of obtaining \(k_{i}\) times the values \(v_{i}\) with \(\sum_{1\leq i \leq m} k_{i} = n\) is: \[\binom{n}{k_{1}} (\frac{u_{1}}{Z})^{k_{1}} \times \binom{n - k_{1}}{k_{2}} (\frac{u_{2}}{Z})^{k_{2}} \dots \times \binom{n - k_{1} \dots - k_{m-1}}{k_{m}} (\frac{u_{m}}{Z})^{k_{m}}\]

We can simply the formula to obtain: \[\frac{n!}{Z^{n}} \prod_{1\leq i \leq m} \frac{u_{i}^{k_{i}}}{k_{i}!}\]

Finding the set of values \(k_{i}\) that maximize the above formula is equivalent to find for each \(i\) the \(k_{i}\) maximizing \(\frac{u_{i}^{k_{i}}}{k_{i}!}\). According the following section, the maximum is reached when \(k_{i} = u_{i}\). However the additional constraint \(\sum_{1\leq i \leq m} k_{i} = n\) ensures that the choice \(k_{i} = u_{i}\) is possible iff \(n\) is a multiple of \(Z\).

Analyze of uk/k!

In the following, we observe that the maximum of \(\frac{u^{k}}{k!}\) for fixed \(u\) seems to be reached when \(k=u\).

import matplotlib.pyplot as plt
import numpy as np
n = 50

k = np.arange(0, n)
k[0] = 1 # fact(0)
u = np.repeat(np.arange(0, n), n).reshape((n, n))
uoverk = np.divide(u, k)
uoverk[:,0] = 1 # u^0 =1
res = np.cumprod(uoverk, axis=1)
normalized_res = res/res.max(axis=1)[:,None]
fig, axis = plt.subplots() # il me semble que c'est une bonne habitude de faire supbplots
heatmap = axis.pcolor(normalized_res, cmap=plt.cm.Blues) # heatmap contient les valeurs
plt.colorbar(heatmap)
plt.xlabel("k")
plt.ylabel("u", rotation=0)
plt.title("u^k/k! normalized for fixed u")
plt.show()

e16c9d053b3952bf48300ded8b9cfea3a1e6e881.png

We just have to write the following equation :

\[\frac{u^n}{n!} = \frac{u}{1} \times \frac{u}{2} \dots \times \frac{u}{n}\]

6k draws of a dice

We choose for a given \(k\), \(n=6k\), \(Z=6k\), \(m=6\), \(u_i=k\), so the \(k_i\) have to be equal to \(k\) to maximize the probability and the formula becomes :

\[\frac{(6k)!}{(6k)^{6k}} \prod_{1\leq i \leq 6} \frac{k^k}{k!} = \frac{(6k)!}{6^{6k} (k!)^6}\]

def most_prob_class(k):
    prob = 1
    for i in range(1, 6*k +1):
        d =  i % k if (i % k) != 0 else k 
        prob = prob * (i/(6*d))
    return prob

print([most_prob_class(1), most_prob_class(10), most_prob_class(100), most_prob_class(300)])
[0.015432098765432098, 7.456270054665195e-05, 2.4632858255234786e-07, 1.5853278892898133e-08]

TODO Comparison of expected value and best answer

In general, a PDB is a triplet \((\mathcal D, \mathcal W, P)\) where \(\mathcal D\) is the possibly infinite set of possible tuples, \(\mathcal W\) is a \(\sigma\) algebra on \(\mathcal D\), it represents the set of the possible database instances, so every member of \(\mathcal W\) is a finite set and \(P\) is a probability over \(\mathcal W\).

How to define the union or intersection of two instances in \(\mathcal W\) with the bag semantic ?

How to define an independent block PDB as a PDB from the probabilities of the values in each block ? It should be easy. Is the order of the tuples taken into account ?

Finally, how to relate the previous results with the probabilities of BIDPDB ?

For \(Q\) a given numerical query and a \((\mathcal D, \mathcal W, P)\) a PDB, the expected value of \(Q\) on \((\mathcal D, \mathcal W, P)\) is defined by: \[E(Q(D)) = \int_{D \in \mathcal W} Q(D) dP\]