Add basic autocompletion #77

Open
wants to merge 1 commit into
base: main

FlorianKothmeier

Resolves #5

This should work for most exposed Java functionality. Getting code completion for Python is more complicated due to dynamic typing.

I first tried to port the existing Jython code from the Ghidra repository. This ultimately failed because the Ghidra Python 2 code just didn't work, even after porting it to valid Python 3. Furthermore, the Java introspection side relied on Jython features, which were also hard to port over.

So I gave up and wrote a new implementation. IMHO this code is more readable and robust: it relies on the Python standard library for most of the work, avoids Python magic, and doesn't use eval() everywhere. It mostly traverses the AST generated by Python. The only case where this code uses eval is for getting properties from the local scope.

Still, a lot of the boilerplate on the Java side was copied over (and adapted) from the Ghidra implementation, such as the code completion colors.

How does this work? (Not a complete overview; only the important points)

The code is split into two parts. The Python side, complete.py:

  • finds the longest suffix of the input that parses as valid Python via ast.parse
  • then traverses the AST and tries to resolve the result of the expression. It uses these rules:
    • For variable names, we just resolve the object with eval
    • For property access, we use inspect.getmembers for a list of all members and find the one we want
    • For function calls this is more challenging. We first get the function object via the above rules and then handle two cases:
      1. If it's a Java function (determined by a heuristic), we invoke the Java method PythonCodeCompletionFactory.getReturnType, which returns a faked Java object of the return type. More on that below.
      2. If it's a Python function, it may have a signature with an annotated return type. In that case we return that type. The class exposes all the methods of its instances, which works well for code completion. However, it gives no hint about member variables that are only set on creation. If the function has no annotated return type, we just give up.
  • Once an object has been determined, we get all its members via inspect.getmembers and turn them into code completions. If a member is partially typed out (e.g. currentProgram().getM), we filter this list by that prefix.
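The Python-side steps above can be sketched roughly as follows. This is a simplified illustration, not the code in complete.py: it uses isinstance checks instead of match statements, looks names up in a plain dict instead of calling eval, skips the Java branch entirely, and only handles member completion after a dot. The names `longest_parsable_suffix`, `resolve`, `complete`, and the `Program` class are hypothetical.

```python
import ast
import inspect


def longest_parsable_suffix(text):
    """Return the AST of the longest suffix of `text` that parses as an expression."""
    for start in range(len(text)):
        try:
            return ast.parse(text[start:], mode="eval")
        except (SyntaxError, ValueError):
            continue  # drop one more leading character and retry
    return None


def resolve(node, namespace):
    """Resolve a node to an object, or to a class for calls; None if unknown."""
    if isinstance(node, ast.Expression):
        return resolve(node.body, namespace)
    if isinstance(node, ast.Name):
        # Assumption: a dict lookup stands in for eval on the interpreter scope.
        return namespace.get(node.id)
    if isinstance(node, ast.Attribute):
        owner = resolve(node.value, namespace)
        if owner is None:
            return None
        return dict(inspect.getmembers(owner)).get(node.attr)
    if isinstance(node, ast.Call):
        func = resolve(node.func, namespace)
        if func is None:
            return None
        if inspect.isclass(func):
            return func  # calling a class yields an instance of that class
        try:
            annotation = inspect.signature(func).return_annotation
        except (TypeError, ValueError):
            return None
        # Only a real class annotation is usable; otherwise give up.
        return annotation if isinstance(annotation, type) else None
    return None


def complete(text, namespace):
    """Return sorted member names matching the prefix typed after the last dot."""
    expr_text, dot, prefix = text.rpartition(".")
    if not dot:
        return []  # bare-name completion is out of scope for this sketch
    tree = longest_parsable_suffix(expr_text)
    target = resolve(tree, namespace) if tree is not None else None
    if target is None:
        return []
    return sorted(
        name for name, _ in inspect.getmembers(target) if name.startswith(prefix)
    )


# Hypothetical stand-ins for the objects exposed in the interpreter.
class Program:
    def getMemory(self) -> bytes: ...
    def getName(self) -> str: ...


def currentProgram() -> Program:
    return Program()


scope = {"currentProgram": currentProgram}
print(complete("foo(currentProgram().getM", scope))
```

Because the unclosed `foo(` never parses, the suffix search discards it and resolves only `currentProgram()`, whose return annotation supplies the class whose members are listed.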

The Java side mostly revolves around implementing the "faked" objects returned by PythonCodeCompletionFactory.getReturnType.

We use a private inner class InspectableJavaObject<T>, which holds a reference to the java.lang.Class of the object we want to fake. We can then get a property list via getProperties(), which returns either more InspectableJavaObjects for fields or another type, InspectableJavaMethod, for functions defined in that class. The InspectableJavaMethod needs a special case in the getReturnType implementation, which instead resolves the Java Class of the method's return type, i.e. the type of the object we want to fake.

Additionally, due to a Jep limitation, an interpreter may only be used from the thread it was created on. Therefore I had to move the interpreter to its own Java thread. Any action to run on the interpreter is now submitted as a Future, and the caller waits for its completion.
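The thread-confinement pattern itself is language independent. As a rough Python analogue of the Java design (not the PR's actual code), a single-worker executor guarantees that every submitted action runs on the same thread. The class name `InterpreterThread` is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InterpreterThread:
    """Runs every submitted action on one dedicated worker thread."""

    def __init__(self):
        # A single worker means all actions share the same thread.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, action, *args):
        """Queue an action and return a Future for its result."""
        return self._executor.submit(action, *args)

    def close(self):
        self._executor.shutdown(wait=True)


worker = InterpreterThread()
# Each action reports the id of the thread it ran on.
first = worker.submit(threading.get_ident).result()
second = worker.submit(threading.get_ident).result()
worker.close()
print(first == second, first != threading.get_ident())
```

Callers block on `result()` only when they need the value, which mirrors submitting a Future and waiting for its completion.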

I also got rid of the exec call in jepwrappers.py as this made it hard to get the type information into the signature of the wrapper functions.

Limitations

complete.py uses a lot of match statements for traversing the AST. Unfortunately, match statements are a relatively new addition to Python; they were only added in 3.10. However, typing this out with if ... else chains is a nightmare. If this is a problem, we could fall back to no autocompletion in case the module cannot be loaded.
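One possible shape for that fallback is a guarded import: on interpreters older than 3.10, importing a module that contains match statements raises SyntaxError, which can be caught alongside ImportError. This is a minimal sketch; `load_optional_module` is a hypothetical helper, and it assumes the completion module is importable under the name `complete`.

```python
import importlib


def load_optional_module(module_name):
    """Return the imported module, or None if it cannot be loaded."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, SyntaxError):
        # SyntaxError covers match statements on Python < 3.10.
        return None


completer = load_optional_module("complete")
if completer is None:
    print("autocompletion disabled")
```

The caller then only needs a single `None` check before offering completions.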

Furthermore, there is a special case for the str type. Jep converts between some Java and Python types automatically. This is mostly observed with java.lang.String <-> str.

So when the Java side reports that a function returns java.lang.String, the Python side has to fix the type to str, or else the completions are wrong. While this isn't pretty, I'm not sure if there is a better solution.

The same problem applies to other types as well. However, I fear the other cases are not as straightforward as this one. Additionally, the string problem is the most common one, so this should work fine for most cases.
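A small lookup table is one way to express this fix-up. In the sketch below only the java.lang.String to str entry comes from the description above; the table name `CONVERTED_TYPES` and the helper `fix_return_type` are hypothetical, and further entries would need to be checked against what Jep actually converts.

```python
# Java class names whose values arrive in Python as a different type.
# Only the String entry is taken from the PR description.
CONVERTED_TYPES = {
    "java.lang.String": str,
}


def fix_return_type(java_class_name, faked_object):
    """Return the Python type for converted classes, else the faked Java object."""
    return CONVERTED_TYPES.get(java_class_name, faked_object)


placeholder = object()  # stands in for a faked Java object
print(fix_return_type("java.lang.String", placeholder) is str)
```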

The Python locals() function cannot be evaluated directly from the Java side, as it must be run from inside Python. Unfortunately, it is needed to get the local variables. The current workaround is to evaluate it in Python and assign the result to a variable that can be retrieved later. This is ugly because it pollutes the script environment (and needs a special case so that it does not break the autocompletion).
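The workaround can be illustrated in pure Python, with a dict standing in for the interpreter's script environment. This is a sketch of the idea only; the variable name `__completion_locals__` and the helper `snapshot_locals` are hypothetical, not the names used in the PR.

```python
HIDDEN_NAME = "__completion_locals__"  # hypothetical helper variable name


def snapshot_locals(namespace):
    """Store locals() in a helper variable inside `namespace` and read it back."""
    # Run inside the script environment, as the Java side would ask Python to do.
    exec(f"{HIDDEN_NAME} = dict(locals())", namespace)
    snapshot = namespace[HIDDEN_NAME]
    # Special case: hide the helper variable (and exec's __builtins__ entry)
    # so they do not show up as completions.
    return {
        name: value
        for name, value in snapshot.items()
        if name not in (HIDDEN_NAME, "__builtins__")
    }


environment = {"x": 1}
print(snapshot_locals(environment))
print(HIDDEN_NAME in environment)  # the helper variable stays behind
```

The second print shows the pollution mentioned above: the helper variable remains in the environment after the snapshot is taken.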

As we try to parse the command character by character, the completion runs in O(n^2) time, which could be a problem for longer inputs. To test this, I ran completion on the string "foo("*500 + "currentProgram().getM" and the results still appeared instantly, so this hopefully shouldn't be a problem. This string is already too long for the interpreter console anyway.

Completions on the Python side are limited mostly by the signatures of Python functions. Therefore it is quite important that the return values of the builtin wrapper functions are correctly typed, or else code completion will not work on them.
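To illustrate why the annotation matters, the sketch below compares a wrapper with a return annotation against one without. The wrapper names and the `Program` class are hypothetical placeholders, not the actual functions in jepwrappers.py.

```python
import inspect


class Program:
    """Hypothetical type returned by a wrapper function."""


def typed_wrapper(*args, **kwargs) -> Program:
    return Program()


def untyped_wrapper(*args, **kwargs):
    return Program()


def annotated_return_type(func):
    """Return the annotated return type, or None when there is no annotation."""
    annotation = inspect.signature(func).return_annotation
    return None if annotation is inspect.Signature.empty else annotation


print(annotated_return_type(typed_wrapper))
print(annotated_return_type(untyped_wrapper))
```

Without the annotation there is nothing to resolve short of calling the function, so completion on the call result has to give up.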

Additionally, this code does not give any hints on the argument types or the number of arguments. Adding this may be nontrivial, especially since the wrapper functions have no meaningful parameter list and simply accept any arguments.

Furthermore, this only covers the basic completion cases. More challenging cases such as array subscripts and import autocompletions are not covered at all.

@google-cla

google-cla bot commented Sep 9, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@mike-hunhoff
Collaborator

@FlorianKothmeier thank you for your contribution! Would you please confirm that you intend to sign the CLA? Unfortunately we are unable to accept contributions without the CLA.

Assuming that you sign the CLA, I'll be able to give your PR a thorough review early next week - I look forward to taking a closer look!

@FlorianKothmeier
Author

@mike-hunhoff I thought I already signed the cla. At least the cla check in the CI passes now.

Can you point me in the right direction what I'm missing?

@mike-hunhoff
Collaborator

> @mike-hunhoff I thought I already signed the cla. At least the cla check in the CI passes now.
>
> Can you point me in the right direction what I'm missing?

Yes you are correct, sorry for missing that. CLA check is passing now so you're good to go, thank you!

Successfully merging this pull request may close these issues.

User Story: Code Completion / Discoverability
2 participants