Title
A formal approach to querying big data (Research)
Abstract
In recent years "big data" has become a prominent buzzword in the research community, in the technology sector, and even in the mainstream media. The term "big data" generally refers to a context where Gigabytes constitute the unit size for measuring data volumes, where Terabytes are commonly encountered, and where many Web companies, scientific institutions, and financial institutions must deal with Petabytes of information. In response to pressing practical needs, a variety of systems have arisen for handling big data, such as MapReduce as introduced by Google, together with numerous proposals for extensions. While these systems differ in their characteristics, they all depart from traditional database systems in that they employ parallelization and data distribution as key components for handling ever-growing data collections. While progress in database research has led to a deep understanding of traditional non-distributed data models and sequential querying, a similar understanding for big data computation is missing, and initial models are only scarcely developed. Given the number of competing systems and their diversity, it remains unclear which system is best suited for which kind of queries. This PhD study therefore aims to develop and study computational models for big data, to provide insight into the use of existing systems and to formulate possible improvements.
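To make the MapReduce programming model mentioned above concrete, the following is a minimal, single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. The function names are illustrative only and do not correspond to any real MapReduce API; a production framework would distribute these stages across many machines.

```python
from collections import defaultdict

def map_phase(documents):
    # Map stage: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle stage: group intermediate values by key, as the framework
    # would do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce stage: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))

print(word_count(["big data", "big queries"]))
# → {'big': 2, 'data': 1, 'queries': 1}
```

Because each map call and each reduce call depends only on its own input, the stages can be parallelized and distributed, which is precisely the property the systems discussed in this project exploit.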
Period of project
01 October 2016 - 01 April 2018