pombredanne changed the title from "Consider using token replacement and abstraction" to "AI-GCS: Design and implement "Code Stemming", e.g., token replacement and abstraction" on Dec 26, 2024
I have a small prototype that uses pygments to create a specific formatter that splits code into comments, code, and literals. It works on a stream of tokens, and we could use the same principle to replace token values with a generic name for variables. Let me push that to a branch to highlight the approach.
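To make the idea concrete, here is a minimal sketch of splitting a token stream into comments, code, and literals while replacing variable names with a generic one. It is not the prototype mentioned above: it swaps in Python's standard-library `tokenize` module in place of pygments to stay dependency-free, so it only handles Python source. The function name `split_and_abstract` and the placeholder `ID` are assumptions made for illustration.

```python
import io
import keyword
import tokenize


def split_and_abstract(source):
    """Yield (category, text) pairs for Python source code.

    Categories are "comment", "literal" and "code". Identifiers that are
    not keywords are replaced by the generic placeholder "ID" (an
    illustrative choice, not something taken from the prototype).
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.COMMENT:
            yield "comment", tok.string
        elif tok.type in (tokenize.STRING, tokenize.NUMBER):
            yield "literal", tok.string
        elif tok.type == tokenize.NAME:
            # Keep keywords as-is, abstract every other name.
            text = tok.string if keyword.iskeyword(tok.string) else "ID"
            yield "code", text
        elif tok.type == tokenize.OP:
            yield "code", tok.string
        # Layout tokens (NEWLINE, INDENT, ENDMARKER, ...) are skipped.


first = list(split_and_abstract("total = price * 2  # double it\n"))
second = list(split_and_abstract("amount = cost * 2  # double it\n"))
```

With this abstraction, two snippets that differ only in variable names yield the same sequence of `(category, text)` pairs, which is the property the replacement is meant to provide.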
This is done, for instance, by the similarity tester at https://dickgrune.com/Programs/similarity_tester/: each code token is assigned a normalized replacement (there, a single letter), and this replacement is what is used afterwards.
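As a rough illustration of the single-letter normalization, the sketch below maps each token of a C-like snippet to one character and joins the result into a compact string. The regular-expression tokenizer, the tiny keyword set, and the letter assignments (`k`, `i`, `n`, `s`) are all assumptions made for this example; they do not reproduce the actual mapping used by the similarity tester.

```python
import re

# Assumed, deliberately tiny keyword set for the example.
KEYWORDS = {"if", "else", "for", "while", "return"}

# A simplistic tokenizer: strings, numbers, words, single-character operators.
TOKEN_RE = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<op>[^\s\w])
""", re.VERBOSE)

# Hypothetical letter assignments for each token kind.
LETTERS = {"string": "s", "number": "n", "word": "i"}


def stem(code):
    """Return a compact string with one character per token."""
    out = []
    for match in TOKEN_RE.finditer(code):
        kind, text = match.lastgroup, match.group()
        if kind == "word" and text in KEYWORDS:
            out.append("k")      # any keyword
        elif kind == "op":
            out.append(text)     # operators and punctuation stay as-is
        else:
            out.append(LETTERS[kind])
    return "".join(out)


a = stem("if (count > 10) return total;")
b = stem("if (n > 3) return sum;")
```

Both calls produce the same normalized string, so later comparison or indexing can work on the normalized form and ignore naming differences.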
Eventually, the goal is to implement "code stemming" as documented in https://ai-gen-code-search.readthedocs.io/en/latest/approximate-matching-design-3.html and https://github.com/aboutcode-org/ai-gen-code-search/blob/main/docs/source/approximate-matching-design-3.rst