Idempotent callbacks in Python #539
Comments
You could try to write your rules so that the functions are called less often. Maybe store their results in a predicate and use that predicate throughout the program. |
Thank you for the hints. I was able to reduce the overhead of callbacks substantially by storing state in predicates and making the external functions pure. That way I could cache the results in a patched _pyclingo_ground_callback(). That is a little late, as Python has already been called, but since FFI seems to use a receiver-makes-right policy, it is possible to avoid much of the (un)marshalling when caching. I also used FFI functions to unpack and to initialise data instead of Python loops/comprehensions, and that speeds things up significantly as well. My cache uses function name plus arguments as a key, ignoring the context the external function is defined on. (Just for safety, I clear the cache when the context changes.) This works for me, for now at least. But it raises the question of what exactly the context means for the semantics of the function call. It is possible to ground different parts of a program with different contexts. But what does it mean to ground two parts into the same control using different contexts? It seems to me that one should not do that. Either the different contexts do exactly the same thing and the result is an incrementally grounded program, or the contexts do different things, yielding a mess of different parts that is asking for problems. Or is it? I believe using multiple contexts is not very wise and, as a consequence, it is safe to ignore the context in my cache. Regarding the formal semantics of external functions and their context, is anything specified somewhere? |
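The cache described in this comment (keyed on function name plus arguments, cleared when the context changes) could be sketched roughly as follows. This is plain Python and does not reproduce the patched _pyclingo_ground_callback() or any FFI-level data; the names ExternalCallCache, Context and double are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, Hashable, Optional, Tuple


class ExternalCallCache:
    """Caches results of pure external functions by (name, arguments)."""

    def __init__(self) -> None:
        self._context: Optional[object] = None
        self._results: Dict[Tuple[str, Tuple[Hashable, ...]], Any] = {}

    def call(self, context: object, name: str, *args: Hashable) -> Any:
        # Purity is only assumed within one context, so drop all cached
        # results as soon as a different context object shows up.
        if context is not self._context:
            self._results.clear()
            self._context = context
        key = (name, args)
        if key not in self._results:
            # First call with these arguments: really invoke the function.
            self._results[key] = getattr(context, name)(*args)
        return self._results[key]


class Context:
    """Hypothetical context object with one pure external function."""

    def __init__(self) -> None:
        self.calls = 0

    def double(self, x: int) -> int:
        self.calls += 1  # bookkeeping only, to show how often we are called
        return 2 * x


cache = ExternalCallCache()
ctx = Context()
results = [cache.call(ctx, "double", 21) for _ in range(1000)]
```

Here the function body runs once for the 1000 identical requests. Passing a second Context instance to cache.call() empties the cache first, which matches the "clear on context change" safety measure.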
Depending on your project's requirements, you might want to consider PyPy. Its Just-In-Time (JIT) compiler can significantly improve performance, especially with modules like clingo. Regarding your cache, directly accessing the ground callback could reduce the number of function calls, which can be expensive in standard Python due to the interpreter's limitations. PyPy could offer different results due to its JIT compiler. Note that JIT support is also in development for standard Python. For cleaner code, have a look at the caching decorators from Python's functools library. |
(Interesting to know that CPython also gets a JIT compiler. I wasn't aware of that. I surely need to take a look. I am a big fan of CPython (notwithstanding the GIL) because of its API allowing one to do next to everything.) My project spends 90% of its time in grounding, of which quite a lot is in _pyclingo_ground_callback, particularly in the FFI marshalling code. I replaced Python loops with calls to ffi.unpack etc. As these are C++, they are unfortunately not the target of the JIT. I initially did try functools.cache, but since that is a general-purpose cache, it requires Python data structures, precisely the ones I avoided unmarshalling. Also, it requires an extra intermediate function. So I cache on lower-level data. But I could still give functools another try, since it sure is cleaner. What are your thoughts on the exact calling semantics of external functions? Is considering them purely functional something you would do? |
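For comparison, the functools route mentioned here might look like the sketch below. It assumes the arguments arrive as hashable Python objects (plain integers stand in for clingo symbols), which is exactly the unmarshalling cost the comment tries to avoid. The class Context and the method distance are hypothetical; functools.lru_cache(maxsize=None) is used, which behaves like functools.cache but is also available on older Python versions.

```python
import functools


class Context:
    """Hypothetical context whose method plays the role of an external function."""

    def __init__(self) -> None:
        self.calls = 0

    # The cache key includes `self`, so each context object gets its own
    # entries. Note that the cache keeps every context instance alive.
    @functools.lru_cache(maxsize=None)
    def distance(self, a: int, b: int) -> int:
        self.calls += 1  # bookkeeping only, to count real invocations
        return abs(a - b)


ctx = Context()
values = [ctx.distance(1, 5) for _ in range(100)]
info = Context.distance.cache_info()
```

Because the context object is part of the key, two different contexts never share results, so no explicit reset is needed; the price is that the decorator only sees fully converted Python arguments.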
Clingo assumes external functions are pure. It has to, because there is no guaranteed order in which functions are called or how often they are called. Do you mean that clingo itself should cache external function calls? This would be possible. Whether this is beneficial depends on the program. If a function is only ever called once with the same arguments, there is nothing to gain here. In your use case, speedups are to be expected. |
OK, that is clear. Pure functions it is. But, it is possible to call ground() more than once on the same control, while passing different contexts. Is that something one should never do, or is it by design and does it give distinct possibilities I am not aware of? If the latter is the case: what are the implications for the identification of external functions? Are external functions with the same name (and arity) considered equal even though they are on different context objects or classes? |
Clingo requires that external functions are pure within one ground call. Whether a context object is changed for the next ground call or the functions in the context object change, does not matter for correct functioning of clingo. I would call it a possible use case to do this. I typically, use |
OK, so if there are no requirements regarding the results of external functions across contexts, that means the grounder must call external functions again. Because, if it didn't, it would have an idempotency or purity requirement across contexts. This might result in otherwise indistinguishable literals, for example
So one could implement some sort of enumeration, because previous facts are not removed between grounds. As for the calling semantics of external functions: they will be called at least once (for each combination of arguments) during each ground. As for my cache: resetting it when the context changes seems like a necessary thing to do. Is my logic right? |
I think you can do anything here that your application requires. Just follow the minimum requirements clingo needs. |
(I am very sorry this issue's topic is drifting a bit, but this came up while implementing a proper cache.) Well, I am doing "anything", but since I cannot be sure if and when external functions are called, I cannot rely on behaviour that goes beyond purely functional. At the moment, it seems that external functions are called on each ground, but not all of them, because optimisation comes first, and functions called in a first ground may not be called in a second ground, or vice versa. My workaround to make sure external functions are called is to use aggregates (where not needed logically), because those inhibit certain optimisations. But it is not really a good solution, as rewriting a piece of logic might unexpectedly lead to external functions no longer being called, or being called where they were not before. Also, #499 will change the calling of external functions. Whether #499 is an improvement or not does not matter. The point is that it will change whether external functions are called. That means the calling semantics of external functions are basically: zero or more times. That isn't really workable, is it? I am not asking for a change here, but I am trying to surface the hidden specs. |
Semantics of externals
Caching
Implementing a cache at the C++ level would be ideal if this feature is desired for Clingo. For now, using Python's functools or a custom implementation is a good workaround for applications needing caching.
Ensuring external functions are called
This requirement seems specific to your application. Without more context, I cannot say more here. It does not seem to have anything to do with caching. |
Thank you very much for writing this down. Is it new or from some documentation I am not aware of? It is clear to me now how I can implement a cache in a consistent way. Although I still have some questions about the exact meaning of:
I'll ponder it and maybe write up some examples to see for myself how it works. Thanks for taking the effort! |
Here is an example:

    from clingo import Number

    # Maps each distinct argument to a consecutive number, in order of first use.
    map = {}

    def remap(x):
        return map.setdefault(x, Number(len(map)))

It will work fine with caching (probably only meaningful at the C++ level, to avoid the Python overhead, because the function as such is not expensive). |
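To see why caching does not change what remap returns, here is a small self-contained sketch. It is an illustration only: plain integers replace clingo's Number so that it runs without clingo, the dictionary is renamed to table to avoid shadowing the built-in map, and the calls counter is added purely for bookkeeping.

```python
import functools

table = {}
calls = 0


def remap(x):
    """Assign each distinct argument a consecutive number on first use."""
    global calls
    calls += 1  # bookkeeping only: counts real invocations
    return table.setdefault(x, len(table))


# Wrap the function in a general-purpose cache.
cached_remap = functools.lru_cache(maxsize=None)(remap)

args = ["a", "b", "a", "c", "b", "a"]
cached = [cached_remap(x) for x in args]
calls_with_cache = calls

# Calling the uncached function afterwards yields the same numbers,
# because every argument keeps the number it received on first use.
direct = [remap(x) for x in args]
```

The function is only invoked for the three distinct arguments when cached, and the repeated calls return the value assigned on first use either way.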
L.S.,
Although the User Guide warns that external functions need to be idempotent, clingo still calls them repeatedly with the exact same arguments.
This wasn't an issue until my model became larger; it now spends more time in callbacks than in grounding itself. It is also an issue because the marshalling between C++ and Python takes a lot of time. With C++ callbacks, I likely wouldn't have noticed.
I found that a handful of functions, each producing <1000 unique results from the same arguments, get called about 100,000 times.
Investigation shows that the vast majority of the calls are repeated with the same arguments. I did not expect that, given the prerequisite of idempotency.
Is there a way to avoid this?
Best regards,
Erik
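One way to obtain the kind of numbers quoted in this post is to wrap each external function in a counting decorator. The sketch below is an assumption about how such a measurement could be done, not the method actually used; counted, call_counts and succ are hypothetical names, and the loop merely simulates a grounder that repeats calls.

```python
import functools
from collections import Counter

# Number of invocations per (function name, argument tuple).
call_counts = Counter()


def counted(func):
    """Record every call of func together with its arguments."""
    @functools.wraps(func)
    def wrapper(*args):
        call_counts[(func.__name__, args)] += 1
        return func(*args)
    return wrapper


@counted
def succ(x):
    return x + 1


# Simulate many calls that use only ten distinct arguments.
for i in range(100_000):
    succ(i % 10)

total = sum(call_counts.values())   # all calls
unique = len(call_counts)           # distinct (name, arguments) pairs
```

Comparing total with unique shows how many calls a cache could save at most.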