Abstract:
The production of huge amounts of data and the emergence of cloud computing have introduced new requirements for data management. Many applications need to interact with several heterogeneous data stores, depending on the type of data they manage: relational stores and NoSQL (i.e., document, graph, key-value, and column) stores. Because these data models are accessed through different APIs and query languages, developing applications that span multiple data stores is challenging. Indeed, complex queries over heterogeneous data models cannot currently be expressed and executed declaratively, as they can be in a single-data-store application, and therefore require extra implementation effort. In this paper, we propose a mediation-based component, referred to as the virtual data store (VDS), to optimize and execute complex queries over multiple data stores in cloud environments. The key ingredients of our solution are (1) a simple global schema describing the different data sources and their relationships, (2) a cost model to evaluate the cost of operations, (3) an inter-data-store parallel execution model, and (4) a dynamic-programming-based approach to generate an optimal execution plan. Quantitative and qualitative experiments are conducted to validate our approach.
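To make ingredients (2) to (4) concrete, the following is a minimal sketch of how a dynamic-programming plan generator can combine a cost model with inter-data-store parallelism. It is not the VDS optimizer itself. It uses a generic bottom-up enumeration over subsets of collections in the style of System R, together with a toy cost model of our own. The constants `TRANSFER_COST` and `JOIN_COST`, the `Plan` class, the `best_plan` function, and the example collections and selectivities are all hypothetical. Parallelism is modeled by assuming that two independent sub-plans on different stores run concurrently, so the slower one dominates (`max`). The smaller input is then shipped to the mediator and joined.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical cost constants for this toy model (not taken from the paper).
TRANSFER_COST = 0.01  # cost per row shipped out of a data store
JOIN_COST = 0.001     # cost per input row of a join performed by the mediator


@dataclass(frozen=True)
class Plan:
    relations: frozenset  # base collections covered by this sub-plan
    cost: float           # estimated response time of the sub-plan
    rows: float           # estimated output cardinality
    expr: str             # readable plan, e.g. "(a@store ⋈ b@store)"


def best_plan(scans, selectivity):
    """Bottom-up dynamic programming over subsets of collections.

    scans: {name: (store, scan_cost, rows)}
    selectivity: {frozenset({a, b}): s}; a missing pair means a cross product.
    """
    names = list(scans)
    best = {}
    # Base case: one access plan per collection, executed in its own store.
    for name, (store, cost, rows) in scans.items():
        key = frozenset([name])
        best[key] = Plan(key, cost, rows, f"{name}@{store}")
    # Larger subsets are built only from already-optimal smaller subsets.
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            whole = frozenset(subset)
            for k in range(1, size):
                for left in combinations(subset, k):
                    lp = best[frozenset(left)]
                    rp = best[whole - frozenset(left)]
                    # Combined selectivity of all predicates linking the two sides.
                    sel = 1.0
                    for a in lp.relations:
                        for b in rp.relations:
                            sel *= selectivity.get(frozenset((a, b)), 1.0)
                    # Both sides run in parallel, so the slower one dominates;
                    # then the smaller input is transferred and the join is done.
                    cost = (max(lp.cost, rp.cost)
                            + TRANSFER_COST * min(lp.rows, rp.rows)
                            + JOIN_COST * (lp.rows + rp.rows))
                    if whole not in best or cost < best[whole].cost:
                        best[whole] = Plan(whole, cost, lp.rows * rp.rows * sel,
                                           f"({lp.expr} ⋈ {rp.expr})")
    return best[frozenset(names)]


# Hypothetical example: three collections living in three different stores.
scans = {
    "customers": ("relational", 10.0, 1_000),
    "orders": ("document", 50.0, 10_000),
    "reviews": ("key-value", 30.0, 5_000),
}
selectivity = {
    frozenset({"customers", "orders"}): 0.001,
    frozenset({"orders", "reviews"}): 0.0001,
}
plan = best_plan(scans, selectivity)
print(plan.expr, round(plan.cost, 2))
```

Under these assumed numbers, the optimizer prefers joining `orders` with `reviews` first and then joining the result with `customers`. It avoids the cross product between `customers` and `reviews`. Memoizing only the cheapest plan per subset keeps the search exponential in the number of collections rather than factorial in the number of join orders. This is the usual reason to use dynamic programming here.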