SymbolEmbeddingHandler
Overview
SymbolEmbeddingHandler is an abstract base class that provides
handling of symbol embeddings. This class is designed to filter, fetch,
and process symbol embeddings that are required to build and retrieve
complex vector representations (embeddings) of symbols. A Symbol
represents distinct logic, such as a python class, method or local
variable, along with a unique URI, typically in software applications.
SymbolEmbeddingHandler, in its derived classes, handles embedding of
symbol documents and symbol code, using a vector database to store and
retrieve these embeddings.
Example
The following example demonstrates how to use SymbolEmbeddingHandler:
As it’s an abstract base class, you cannot make an instance of this class directly. But, you can subclass it and provide concrete methods.
from automata.symbol.base import Symbol
from automata.core.base.database.vector import VectorDatabaseProvider
from automata.embedding.base import EmbeddingBuilder
from automata.symbol_embedding.handler import SymbolEmbeddingHandler
class MySymbolEmbeddingHandler(SymbolEmbeddingHandler):
def process_embedding(self, symbol: Symbol):
# add your embedding process here
pass
# create instances of necessary classes
symbol = Symbol.from_string("scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.tools.base`/ToolNotFoundError#__init__().")
vector_database = VectorDatabaseProvider() # replace with a concrete implementaion of VectorDatabaseProvider
embedding_builder = EmbeddingBuilder() # replace with a concrete implementaion of EmbeddingBuilder
handler = MySymbolEmbeddingHandler(vector_database, embedding_builder)
handler.process_embedding(symbol) # this will call your own process_embedding implementation
Limitations
The primary limitation comes from the abstract nature of this class. As it lacks concrete implementation, it must be subclassed and its abstract methods must be implemented before use. It also relies on the presence of ordered entries in the embedding database, which means the embedding database must support such functionality.
Further, it assumes that every symbol has a dotpath representation that can be used to fetch the symbol embedding from the database. If a symbol doesn’t have a dotpath representation or the database doesn’t have the corresponding entry, it won’t return the correct embedding.
Follow-up Questions:
What needs to be done to accommodate symbols that do not have a dotpath representation, or to populate the database with symbols that lack corresponding entries?
How robust is the symbol filtering mechanism? Could there be performance or accuracy issues when handling extensive or complex symbol batches?
Is it possible to handle symbols and their embeddings that don’t conform to the typical structure, such as those with multiple or nested dotpaths?
In the base
EmbeddingHandler, what kind of expectations does the class have on the structure and format of vector representations for embeddings?