Designing a benchmark for probabilistic databases.

N. Zandbergen


Keywords:
Benchmark, Probabilistic Database, DuBio, MayBMS, Deduplication, Product Matching, Entity Resolution.

Abstract:
As increasing volumes of uncertain data are produced every day, the need for a mature probabilistic database management system grows. Various probabilistic database systems have been developed over the years, but none seems robust enough to function in a real-world environment. To aid the development of a robust system, The QuestionMark Benchmark for Probabilistic Databases has been developed. QuestionMark is a benchmark specifically designed for real-world stress testing of probabilistic databases. It covers a wide range of functionality, so that any application area can be tested. To validate the benchmark, the state-of-the-art probabilistic database MayBMS and the novel probabilistic database DuBio are run through it to evaluate their effectiveness, efficiency and appeal. Empirical evaluation shows that QuestionMark is promising and can fulfil its purpose.
