Idempotent callbacks in Python #539

ejgroene · 2025-02-24T14:47:53Z

Dear reader,

Although the User Guide warns that external functions need to be idempotent, clingo still calls them repeatedly with exactly the same arguments.

This wasn't an issue until my model grew larger; it now spends more time in callbacks than in grounding itself. It is also an issue because the marshalling between C++ and Python (in both directions) takes a lot of time. With C++ callbacks, I likely wouldn't have noticed.

I found that a handful of functions, each producing fewer than 1,000 unique results, are called about 100,000 times.

From investigation it follows that the vast majority of the calls are repeated with the same arguments. I did not expect that given the prerequisite for idempotency.
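A quick way to measure this from the Python side is to wrap each external function in a counting decorator before handing it to clingo. The sketch below is plain Python; `count_calls` and `classify` are hypothetical names, and integers stand in for the clingo symbols a real external function would receive.

```python
from collections import Counter
from functools import wraps


def count_calls(counter):
    """Wrap a function so every call records (name, args) in `counter`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            counter[(func.__name__, args)] += 1
            return func(*args)
        return wrapper
    return decorator


calls = Counter()


@count_calls(calls)
def classify(x):
    # Hypothetical stand-in for an external function.
    return x % 3


for value in [1, 2, 1, 1, 4, 2]:
    classify(value)

total = sum(calls.values())
unique = len(calls)
print(f"{total} calls, {unique} distinct argument tuples, {total - unique} repeats")
# prints: 6 calls, 3 distinct argument tuples, 3 repeats
```

The number of repeats is an upper bound on what any cache could save for that function.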

Is there a way to avoid this?

Best regards,
Erik

rkaminsk · 2025-02-24T15:05:31Z

You could try to write your rules so that the functions are called less often. For example, store their results in a predicate and use that predicate throughout the program.

ejgroene · 2025-02-25T08:21:32Z

Thank you for the hints.

I was able to reduce the overhead of callbacks substantially by storing state in predicates and making the external functions pure functions. That way I could cache the results in a patched _pyclingo_ground_callback().

That is a little late, as Python has already been called, but since the FFI seems to use a receiver-makes-right policy, it is possible to avoid much of the (un)marshalling when caching.

I also used FFI functions to unpack and initialise data instead of Python loops/comprehensions, and that speeds things up significantly as well.

My cache uses the function name plus the arguments as a key, ignoring the context the external function is defined in/on. (Just for safety, I clear the cache when the context changes.) This works for me, for now at least. But it raises the question of what exactly the context means for the semantics of the function call.

It is possible to ground different parts of a program with different contexts. But what does it mean to ground two parts into the same control using different contexts? It seems to me that one should not do that. Either the different contexts do exactly the same thing and the result is an incrementally ground program, or the contexts do different things, yielding a mess of inconsistent parts that is asking for trouble. Or am I missing something?

I believe using multiple contexts is not very wise, and as a consequence, it is safe to ignore the context in my cache.
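The caching policy described above (key = function name plus arguments, cleared whenever a different context object is used) can be sketched independently of clingo's internals. This is not the patched `_pyclingo_ground_callback()` itself; `ExternalCallCache`, `Context` and `square` are hypothetical names, and plain integers again stand in for symbols.

```python
class ExternalCallCache:
    """Caches results of pure external functions keyed on (name, args).

    The whole cache is dropped as soon as a different context object is
    used, mirroring the "clear on context change" rule.
    """

    def __init__(self):
        self._context = None
        self._results = {}
        self.hits = 0
        self.misses = 0

    def call(self, context, name, args):
        if context is not self._context:
            # Different context: earlier results may not apply any more.
            # Holding a reference keeps `is` from matching a recycled object.
            self._context = context
            self._results.clear()
        key = (name, args)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = getattr(context, name)(*args)
        self._results[key] = result
        return result


class Context:
    def square(self, x):
        # Hypothetical pure external function.
        return x * x


cache = ExternalCallCache()
ctx = Context()
first = cache.call(ctx, "square", (3,))        # miss: computed
second = cache.call(ctx, "square", (3,))       # hit: served from the cache
third = cache.call(Context(), "square", (3,))  # new context: cleared, miss
print(first, second, third, cache.hits, cache.misses)  # 9 9 9 1 2
```

Clearing on every context switch is the conservative choice: it never serves a result computed under another context, at the price of losing the cache whenever contexts alternate.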

Regarding the formal semantics of the external functions and their context, is anything specified somewhere?

rkaminsk · 2025-02-25T08:47:39Z

Depending on your project's requirements, you might want to consider PyPy. Its Just-In-Time (JIT) compiler can significantly improve performance, especially with modules like clingo. Regarding your cache, directly accessing the ground callback could reduce the number of function calls, which can be expensive in standard Python due to the interpreter's limitations. With PyPy, the trade-offs could look different because of its JIT compiler. Note that JIT support is also in development for standard Python. For cleaner code, have a look at the decorators in Python's functools module.

ejgroene · 2025-02-25T09:44:12Z

(Interesting to know that CPython is also getting a JIT compiler. I wasn't aware of that and will certainly take a look. I am a big fan of CPython (notwithstanding the GIL) because its API allows one to do next to everything.)

My project spends 90% of its time in grounding, quite a lot of which goes to _pyclingo_ground_callback, particularly the FFI marshalling code. I replaced Python loops with calls to ffi.unpack etc. As these are implemented in native code, they are unfortunately not a target of the JIT.

I initially did try functools.cache, but since that is a general-purpose cache, it requires Python data structures, precisely the ones I was trying to avoid unmarshalling. Also, it requires an extra intermediate function.

So I cache on lower level data. But I could still give functools another try, since it sure is cleaner.
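For comparison, the `functools` route looks like this. A minimal sketch, assuming Python 3.9 or later (for `functools.cache`) and hashable arguments; `successor` is a hypothetical external function. Because it caches on Python objects, the unmarshalling cost discussed above is still paid on every call; only the function body is skipped.

```python
from functools import cache


@cache
def successor(n):
    # Hypothetical pure external function; arguments must be hashable.
    return n + 1


for n in [1, 2, 1, 1]:
    successor(n)

print(successor.cache_info())
# CacheInfo(hits=2, misses=2, maxsize=None, currsize=2)
```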

What are your thoughts on the exact calling semantics of external functions? Is considering them purely functional something you would do?

rkaminsk · 2025-02-25T10:12:00Z

> What are your thoughts on the exact calling semantics of external functions? Is considering them purely functional something you would do?

Clingo assumes external functions are pure. It has to because there is no guaranteed order in which functions are called or how often they are called. Do you mean that clingo itself should cache external functions calls? This would be possible. Whether this is beneficial depends. If a function is only ever called with the same arguments once, there is nothing to gain here. In your use case, speed ups are to be expected.
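Whether caching pays off can be checked by replaying the same argument sequence through a fresh cache and looking at the hit ratio. A minimal sketch with `functools.lru_cache`; `hit_ratio` and `pure` are hypothetical names. With all-distinct arguments every lookup is a miss, so the cache only adds overhead.

```python
from functools import lru_cache


def hit_ratio(func, calls):
    """Replay argument tuples through a fresh cache and return the hit ratio."""
    cached = lru_cache(maxsize=None)(func)
    for args in calls:
        cached(*args)
    info = cached.cache_info()
    return info.hits / (info.hits + info.misses)


def pure(x):
    return 2 * x


unique_calls = [(i,) for i in range(100)]         # every argument seen once
repeated_calls = [(i % 10,) for i in range(100)]  # only 10 distinct arguments

print(hit_ratio(pure, unique_calls))    # 0.0
print(hit_ratio(pure, repeated_calls))  # 0.9
```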

ejgroene · 2025-02-26T08:14:51Z

OK, that is clear. Pure functions it is.

But, it is possible to call ground() more than once on the same control, while passing different contexts. Is that something one should never do, or is it by design and does it give distinct possibilities I am not aware of?

If the latter is the case: what are the implications for the identification of external functions? Are external functions with the same name (and arity) considered equal even though they are on different context objects or classes?

rkaminsk · 2025-02-26T08:41:13Z

> OK, that is clear. Pure functions it is.
>
> But, it is possible to call ground() more than once on the same control, while passing different contexts. Is that something one should never do, or is it by design and does it give distinct possibilities I am not aware of?
>
> If the latter is the case: what are the implications for the identification of external functions? Are external functions with the same name (and arity) considered equal even though they are on different context objects or classes?

Clingo requires that external functions are pure within one ground call. Whether a context object is changed for the next ground call, or the functions in the context object change, does not matter for the correct functioning of clingo. I would call doing this a possible use case, although I typically use #program directives to inject step parameters. In short, the system imposes no requirements on how functions behave across grounding steps.

ejgroene · 2025-02-26T13:18:23Z

OK, so if there are no requirements regarding the results of external functions across contexts, the grounder has to call external functions again.

If it didn't, that would amount to an idempotency (purity) requirement across contexts.

This might result in otherwise indistinguishable literals, for example predicate_a(@func_a()), being stored multiple times with different values, such as:

predicate_a(1).
predicate_a(2).

So one could implement some sort of enumeration, because previous facts are not removed between grounds.
That in fact does mean that Clingo supports external functions with side effects (or contexts with state if you will), but only across grounds.
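A sketch of such a context, assuming the application calls a hypothetical `next_step()` method between two ground calls and never during one. Within a step, `func_a` is pure; across steps it changes, which would produce predicate_a(1) and later predicate_a(2). A real external function would return `clingo.Number(...)` rather than a plain `int`, and clingo does not promise that every call is re-evaluated in each step.

```python
class EnumeratingContext:
    """Hypothetical context: func_a is constant within one grounding step
    but yields a new value after next_step() is called between steps."""

    def __init__(self):
        self._step = 1

    def next_step(self):
        # Call this between two ground calls, never during one.
        self._step += 1

    def func_a(self):
        # Pure within a step: every call during one ground returns the same value.
        return self._step


ctx = EnumeratingContext()
first = [ctx.func_a() for _ in range(3)]   # all calls within step 1
ctx.next_step()
second = [ctx.func_a() for _ in range(3)]  # all calls within step 2
print(first, second)  # [1, 1, 1] [2, 2, 2]
```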

As for the calling semantics of external functions: they will be called at least once (for each combination of arguments) during each ground.

As for my cache: resetting it when the context changes seems like a necessary thing to do.

Is my logic right?

rkaminsk · 2025-02-26T14:01:21Z

I think you can do anything here that your application requires. Just follow the minimum requirements clingo needs.

ejgroene · 2025-02-26T20:45:42Z

(I am very sorry this issue's topic is skewing a bit, but while implementing a proper cache, this came up.)

Well, I am doing "anything", but since I cannot be sure if and when external functions are called, I cannot rely on behaviour that goes beyond pure functions. At the moment, it seems that external functions are called on each ground, but not all of them, because optimisations take precedence: functions called in a first ground may not be called in a second ground, or vice versa.

My workaround to make sure external functions are called is to use aggregates (where they are not needed logically), because those inhibit certain optimisations. But it is not really a good solution, as rewriting a piece of logic might unexpectedly lead to external functions no longer being called, or being called where they were not before.

Also, #499 will change how external functions are called. Whether #499 is an improvement or not does not matter. The point is that it will change whether external functions are called.

That means the calling semantics of external functions are basically: zero or more times. That isn't really workable, is it?

I am not asking for a change here, but I am trying to surface the hidden specs.

rkaminsk · 2025-02-26T21:49:04Z

Semantics of externals

Consistency with Pure Functions: When using pure external functions, you can always expect the same answer sets, regardless of the ASP system being used. This ensures portability and reproducibility across different systems.
Flexibility with Non-Pure Functions: While there's no strict requirement to use pure functions, doing so ensures consistency. If you choose to use non-pure functions, where the order of calls affects results, it's crucial that they still produce the same output for the same inputs. However, this approach might lead to differences in answer sets between ASP systems. Despite this, it can be useful in certain applications.
Avoiding Inconsistent Results: If functions return different results when called multiple times with the same arguments, it can lead to unpredictable behavior. This should generally be avoided to maintain reliability.
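The third point can be spot-checked by calling a function several times with the same arguments and comparing the results. A minimal sketch; `is_consistent`, `double`, `remap` and `fresh` are hypothetical examples of a pure, an order-dependent but consistent, and an inconsistent function. Passing such a check does not prove consistency, but failing it exposes a problem.

```python
from itertools import count


def is_consistent(func, args, repeats=3):
    """True if repeated calls with the same args all return one result."""
    results = {func(*args) for _ in range(repeats)}
    return len(results) == 1


def double(x):
    # Pure: the output depends only on the input.
    return 2 * x


_seen = {}


def remap(x):
    # Order-dependent (first calls decide the numbering), but consistent:
    # the same x always maps to the same value once it has been seen.
    return _seen.setdefault(x, len(_seen))


_counter = count()


def fresh(x):
    # Inconsistent: a new value on every call, even for the same x.
    return next(_counter)


print(is_consistent(double, (5,)))  # True
print(is_consistent(remap, (5,)))   # True
print(is_consistent(fresh, (5,)))   # False
```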

Caching

Implementing a cache at the C++ level would be ideal if this feature is desired for Clingo. For now, using Python's functools or a custom implementation is a good workaround for applications needing caching.

Ensuring external functions are called

This requirement seems specific to your application. Without more context, I cannot say more here. It does not seem to have anything to do with caching.

ejgroene · 2025-03-04T07:51:29Z

Thank you very much for writing this down. Is it new or from some documentation I am not aware of?

It is clear to me now how I can implement a cache in a consistent way, although I still have some questions about the exact meaning of:

> If you choose to use non-pure functions, where the order of calls affects results, it's crucial that they still produce the same output for the same inputs.

I'll be pondering about it and maybe I'll write up some examples to see for myself how it works.

Thanks for taking the effort!

rkaminsk · 2025-03-04T10:29:45Z

> If you choose to use non-pure functions, where the order of calls affects results, it's crucial that they still produce the same output for the same inputs.
>
> I'll be pondering about it and maybe I'll write up some examples to see for myself how it works.

Here is an example:

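```python
from clingo import Number

mapping = {}  # argument -> Number, assigned in order of first call

def remap(x):
    return mapping.setdefault(x, Number(len(mapping)))
```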

It will work fine with caching (which is probably only meaningful at the C++ level, to avoid the Python overhead, because the function as such is not expensive).
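Why caching is safe here can be checked directly: because `remap` always returns the same value for an argument it has already seen, a cached wrapper yields exactly the same sequence as the uncached function. A plain-Python sketch, with `int` in place of `clingo.Number` and `functools.cache` (Python 3.9+) as the cache; `cached_remap` is a hypothetical name.

```python
from functools import cache

mapping = {}


def remap(x):
    # Same logic as the example above, with plain ints instead of Number.
    return mapping.setdefault(x, len(mapping))


cached_remap = cache(remap)

inputs = ["b", "a", "b", "c", "a"]
uncached = [remap(x) for x in inputs]
mapping.clear()  # start the cached run from the same empty state
cached = [cached_remap(x) for x in inputs]

print(uncached)  # [0, 1, 0, 2, 1]
print(cached)    # [0, 1, 0, 2, 1]
```

The two repeated arguments ("b" and "a") are served from the cache, yet the output is unchanged, which is exactly the property the semantics above ask of non-pure functions.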

rkaminsk added the question label Feb 26, 2025
