SymbolCodeEmbedding
SymbolCodeEmbedding is a concrete class used for creating embeddings
for source code symbols. It is part of the
automata.symbol_embedding.base package.
Overview
SymbolCodeEmbedding is used to embed source code symbols in an
N-dimensional space. It inherits from SymbolEmbedding, which is an
abstract class for symbol code embeddings. The class mainly has a
constructor for initialization and a __str__ method which returns a
string representation of the object.
Example
For creating an instance of SymbolCodeEmbedding, you first need to
have a Symbol and a source_code associated with it. You also
need a numpy array for the vector representation. Below is an example:
from automata.symbol.base import Symbol
from automata.symbol_embedding.base import SymbolCodeEmbedding
import numpy as np
symbol = Symbol("URIsymbol")
source_code = "def test():\n\treturn"
vector = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
# Initialize SymbolCodeEmbedding
embed = SymbolCodeEmbedding(symbol, source_code, vector)
# Print the SymbolCodeEmbedding
print(embed)
Limitations
There might limitations around the dimensionality of the numpy vector as the complexity and size of source code symbols can impose storage and computation issues.
Follow-up Questions:
What is the dimension of the numpy vectors which are standard in this context?
How are the numpy vectors generated or trained from the source code?
Are there any implicit limitations or assumptions on the type of symbol or source code that can be embedded?