SymbolDocEmbeddingBuilder
SymbolDocEmbeddingBuilder is a class that creates documentation
embeddings for a given Symbol. It lives in the
automata.symbol_embedding.builders package and is used to build the
context surrounding primary symbols in the code.
Overview
SymbolDocEmbeddingBuilder is used to build an embedding for a
symbol’s documentation. It generates a search list for related context,
accumulates documentation from retrieval sources, and then creates an
embedding from the final document. This class works in combination with
other classes such as EmbeddingVectorProvider,
LLMChatCompletionProvider, SymbolSearch, and
PyContextRetriever.
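The three steps described above (search list, document accumulation, embedding) can be illustrated with a small self-contained sketch. Every name in it (build_search_list, accumulate_document, embed_document, the token-overlap ranking, and the toy character-histogram embedding) is a hypothetical stand-in chosen for illustration. None of it is the automata API or the way SymbolDocEmbeddingBuilder is actually implemented.

```python
from typing import Callable, List, Sequence

# NOTE: all functions below are hypothetical stand-ins, not the automata API.


def build_search_list(
    symbol_name: str, known_symbols: Sequence[str], limit: int = 3
) -> List[str]:
    """Step 1: pick related symbols to use as context (here: shared-token count)."""
    tokens = set(symbol_name.lower().split("_"))
    scored = [
        (len(tokens & set(other.lower().split("_"))), other)
        for other in known_symbols
        if other != symbol_name
    ]
    # Keep only symbols sharing at least one token; best matches first,
    # ties broken alphabetically so the result is deterministic.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:limit]]


def accumulate_document(
    symbol_name: str,
    source_code: str,
    search_list: Sequence[str],
    retrieve: Callable[[str], str],
) -> str:
    """Step 2: combine the symbol's source with context retrieved for related symbols."""
    parts = [f"Symbol: {symbol_name}", source_code.strip()]
    for related in search_list:
        parts.append(f"Related: {related}\n{retrieve(related)}")
    return "\n\n".join(parts)


def embed_document(document: str, dim: int = 8) -> List[float]:
    """Step 3: toy embedding (normalized character-code histogram)."""
    vec = [0.0] * dim
    for ch in document:
        vec[ord(ch) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


# Usage: run the three steps in order for one symbol.
search_list = build_search_list(
    "load_config", ["load_data", "save_config", "render_page", "load_config"]
)
document = accumulate_document(
    "load_config",
    "def load_config(): ...",
    search_list,
    retrieve=lambda name: f"docs for {name}",
)
vector = embed_document(document)
```

In the real class, the corresponding roles are filled by SymbolSearch, PyContextRetriever, LLMChatCompletionProvider, and EmbeddingVectorProvider. The sketch only shows how the output of each step feeds the next.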
Examples
The following basic example shows how SymbolDocEmbeddingBuilder can be
used to create a documentation embedding for a symbol.
from automata.symbol_embedding.builders import SymbolDocEmbeddingBuilder
# The My... imports are placeholders; substitute your concrete implementations.
from automata.embedding.base.EmbeddingVectorProvider import MyEmbeddingVectorProvider
from automata.llm.foundation import MyLLMChatCompletionProvider
from automata.experimental.search.symbol_search import MySymbolSearch
from automata.retrievers.py.context import PyContextRetriever
# Symbol must also be imported; the original example omits this import.

# Instantiate the collaborators (constructor arguments elided).
embedding_provider = MyEmbeddingVectorProvider(...)
completion_provider = MyLLMChatCompletionProvider(...)
symbol_search = MySymbolSearch(...)
retriever = PyContextRetriever(...)

builder = SymbolDocEmbeddingBuilder(
    embedding_provider=embedding_provider,
    completion_provider=completion_provider,
    symbol_search=symbol_search,
    retriever=retriever,
)

# Source code of the symbol to document.
source_code = """
def my_func():
    \"\"\"This is a sample function.\"\"\"
    return 5
"""

symbol = Symbol.from_string(...)
result = builder.build(source_code, symbol)
Here, the My... names are placeholders for concrete implementations of the corresponding types.
The actual class names and constructor arguments depend on the specific embedding provider, completion provider, symbol search, and retriever that you use.
Limitations
The quality and accuracy of the embeddings produced by
SymbolDocEmbeddingBuilder depend heavily on the underlying
EmbeddingVectorProvider and LLMChatCompletionProvider, so results
should be evaluated with the specific providers in use.
Follow-up Questions:
What is the expected format and content of the Symbol object?
How does the SymbolSearch class affect the output of SymbolDocEmbeddingBuilder?
How can we optimize the source code to XML transformation in PyContextRetriever for better results?