Project R-12626


Vector embeddings as database views (Research)


Over the past decade, vector embedding methods have been developed as a means of enabling machine learning over structured data such as graphs or, more generally, relational databases. While the empirical effectiveness of vector embeddings for specific learning tasks and application domains is well researched, exactly which information from the structured data is encoded in the embeddings is less well understood. In this project, we postulate that by looking at embeddings through the lens of database research, we can gain more insight into what information embeddings contain. Concretely, we propose to design query languages in which vector embeddings can be expressed naturally. In this setting, questions about the kind of information encoded in the embedded vectors can be phrased as instances of the problem of query rewriting using views, which we will study. Furthermore, by taking the structural properties of embedding queries into account, we open the door to a transfer of methods from databases to vector embeddings, and back. In particular, database methods for incremental query evaluation and query sampling can be applied to learn embedding parameters efficiently, while, conversely, embeddings can be exploited for database indexing.
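To make the idea of an embedding expressed as a query concrete, the following is a minimal illustrative sketch and not the project's actual query language. It assumes a hypothetical schema with an edge(src, dst) relation and a feature(node, x) relation, and defines a toy two-dimensional node embedding (a node's own feature and the sum of its out-neighbours' features) as an ordinary SQL view in SQLite. All table, column, and function names are assumptions made for illustration.

```python
import sqlite3


def build_embedding_view(edges, features):
    """Load a small graph into SQLite and define a toy embedding as a view.

    Hypothetical schema (an assumption, not from the project description):
      edge(src, dst)    directed edges
      feature(node, x)  one scalar input feature per node
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.execute("CREATE TABLE feature (node INTEGER PRIMARY KEY, x REAL)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    conn.executemany("INSERT INTO feature VALUES (?, ?)", features)

    # The "embedding" of a node is the pair (own feature, sum of the
    # features of its out-neighbours): one round of neighbour aggregation
    # written as a join followed by a grouped SUM.
    conn.execute(
        """
        CREATE VIEW embedding AS
        SELECT f.node AS node,
               f.x AS own,
               COALESCE(SUM(g.x), 0.0) AS neighbours
        FROM feature AS f
        LEFT JOIN edge AS e ON e.src = f.node
        LEFT JOIN feature AS g ON g.node = e.dst
        GROUP BY f.node, f.x
        """
    )
    return conn


edges = [(1, 2), (1, 3), (2, 3)]
features = [(1, 1.0), (2, 2.0), (3, 4.0)]
conn = build_embedding_view(edges, features)
rows = conn.execute(
    "SELECT node, own, neighbours FROM embedding ORDER BY node"
).fetchall()
for row in rows:
    print(row)
```

Once the embedding is a view, asking whether some other query over the original tables can be answered from the view alone becomes a question about rewriting that query in terms of the view, which is the kind of question the abstract refers to.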

Period of project

01 January 2022 - 31 December 2025