Databases with missing data
The aim of this project is to study query answering over databases with missing data, where the missingness is described by a missingness graph.
Query answering over block dependent probabilistic databases
We consider several notions of query answering for a numerical query q (including Boolean queries, whose answer is 0 or 1) over a BIPDB D:
- the expected value, defined by \(E(q(D))\)
- a most probable answer is a possible answer having the highest probability
- a best answer is an answer on a most probable distribution of the tuples. In this case, the possible worlds that have the same distribution of tuples are considered equivalent: we say that they form a class.
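The three notions above can be compared by brute force on a toy instance. The sketch below is only an illustration under stated assumptions, not the project's method: every row has one missing value drawn independently from the same distribution, the query is the sum of the values, and a class is identified with the sorted tuple of values in a world. The names `DIST`, `N_ROWS`, `q`, and the three functions are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical toy instance: 2 rows, each with one missing value drawn
# independently from the same distribution (value -> probability).
DIST = {0: Fraction(2, 5), 1: Fraction(7, 20), 2: Fraction(1, 4)}
N_ROWS = 2


def q(world):
    """Assumed numerical query: the sum of the values in a possible world."""
    return sum(world)


def possible_worlds(dist, n_rows):
    """Yield (world, probability) for every assignment of values to rows."""
    for world in product(dist, repeat=n_rows):
        prob = Fraction(1)
        for value in world:
            prob *= dist[value]
        yield world, prob


def expected_value(dist, n_rows):
    """E(q(D)): the answers weighted by the probability of each world."""
    return sum(prob * q(world) for world, prob in possible_worlds(dist, n_rows))


def most_probable_answer(dist, n_rows):
    """The answer whose total probability over all worlds is highest."""
    answer_prob = defaultdict(Fraction)
    for world, prob in possible_worlds(dist, n_rows):
        answer_prob[q(world)] += prob
    return max(answer_prob, key=answer_prob.get)


def best_answer(dist, n_rows):
    """The answer on the most probable class of worlds.

    Worlds with the same multiset of values are grouped into one class,
    represented here by the sorted tuple of values.
    """
    class_prob = defaultdict(Fraction)
    for world, prob in possible_worlds(dist, n_rows):
        class_prob[tuple(sorted(world))] += prob
    best_class = max(class_prob, key=class_prob.get)
    return q(best_class)
```

On this instance the expected value is 17/10, the most probable answer is 2 (probability 129/400), and the best answer is 1 (the class {0, 1} has probability 112/400), so the three notions give three different results. Evaluating the query on a class representative is only meaningful because the sum does not depend on the order of the rows.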
The answer to a conjunctive query (CQ) over a BIPDB should be another BIPDB.
Open questions
- Comparing the best answer with the most probable answer on a small example shows that the two notions are different.
- Are the best answer and the expected value the same notion when the number of rows in D is such that, for every probability p, \(|D| \times p\) is an integer? This leads to another question: in this case, is the class of possible worlds whose tuples comply with the distribution the most probable class? I started to work on those questions here.
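Both questions can be explored numerically before attempting a proof. The sketch below is a brute-force check on a single hypothetical instance, assuming independent rows with a common distribution and a sum query; it proves nothing in general. A class is represented by its vector of counts, and its probability is the multinomial probability of that vector. The names `PROBS`, `VALUES`, `N`, and the helper functions are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial


def compositions(n, k):
    """Yield every tuple of k non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def class_probability(counts, probs):
    """Multinomial probability of the class with the given value counts."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # exact: partial quotients stay integers
    prob = Fraction(coeff)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob


def most_probable_classes(n, probs):
    """Return all count vectors that reach the maximal class probability."""
    scored = {c: class_probability(c, probs) for c in compositions(n, len(probs))}
    top = max(scored.values())
    return [c for c, pr in scored.items() if pr == top]


# Hypothetical instance where |D| * p is an integer for every p.
VALUES = (0, 1, 2)
PROBS = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))
N = 4

# Class whose counts follow the distribution exactly: (2, 1, 1).
compliant = tuple(int(N * p) for p in PROBS)

# Sum query evaluated on the compliant class, and its expected value.
answer_on_compliant = sum(c * v for c, v in zip(compliant, VALUES))
expected = N * sum(p * v for p, v in zip(PROBS, VALUES))
```

On this instance the compliant class (2, 1, 1) is the unique most probable class, with probability 3/16, and the sum query evaluated on it equals the expected value, 3. Whether either property holds for other queries or other distributions is exactly what the open question asks.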