ChromaSymbolEmbeddingVectorDatabase

ChromaSymbolEmbeddingVectorDatabase is a concrete implementation of a vector database that saves into a Chroma database. It extends the functionality of ChromaVectorDatabase, allowing storage, retrieval, and manipulation of SymbolEmbedding instances.

Overview

ChromaSymbolEmbeddingVectorDatabase provides a variety of methods to manage entries in the chroma database including adding single or batches of entries (add() and batch_add()), retrieving entries by their keys (get(), batch_get()) or all entries in a sorted order (get_ordered_entries(), get_ordered_keys()), and updating single or multiple entries (update_entry(), batch_update()). In addition, it also offers functionality to generate a hashable key from a SymbolEmbedding instance with entry_to_key() method.

Example

This is a simplified usage example of ChromaSymbolEmbeddingVectorDatabase:

from automata.symbol_embedding.base import SymbolEmbedding
from automata.symbol_embedding.vector_databases import ChromaSymbolEmbeddingVectorDatabase

factory = SymbolEmbedding
collection_name = "test_collection"

# Instantiate ChromaSymbolEmbeddingVectorDatabase
database = ChromaSymbolEmbeddingVectorDatabase(collection_name, factory)

# Add an entry
entry = factory(symbol=Symbol, document="some document", vector=np.array([1, 2, 3]))
database.add(entry)

# Retrieve entry
retrieved = database.get(database.entry_to_key(entry))

# Update entry
entry.vector = np.array([4, 5, 6])
database.update_entry(entry)

# Delete entry
database.discard(database.entry_to_key(entry))

Note

This class does not check if the chroma database instance used is connected to a database. It’s the user’s responsibility to manage the chroma database connection.

Follow-up Questions:

  • How is this class handling connection errors to the Chroma Database?

  • Is there a way to manage the database connection from within this class?

  • Are there any limitations regarding the size of the symbol vectors that can be stored in the database?