Title
Vector embeddings as database views (Research)
Abstract
Over the past decade, vector embedding methods have been
developed as a means of enabling machine learning over structured
data such as graphs or, more generally, relational databases. While
the empirical effectiveness of vector embeddings for focused learning
tasks and application domains is well researched, exactly what
information about the structured data is encoded in embeddings is less
well understood. In this project, we postulate that by looking at
embeddings through the lens of database research, we can gain
more insight into what information embeddings contain. Concretely, we
propose to design query languages in which vector embeddings can
naturally be expressed. In this setting, questions concerning the kind
of information that is encoded in the embedded vectors can naturally
be phrased as a problem of query rewriting using views, which we will
study. Furthermore, by taking into account structural properties of
embedding queries, we open the door to a transfer of methods from
databases to vector embeddings, and back. In particular, database
methods for incremental query evaluation and query sampling can be
applied for the efficient learning of embedding parameters, while,
conversely, embeddings can be exploited for database indexing.
Period of project
01 January 2022 - 31 December 2025