Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: remove leak of JNI weak references in JniWrapperCache #1648

Merged
merged 5 commits into from
Feb 3, 2025

Conversation

pwrobeldev
Copy link
Contributor

@pwrobeldev pwrobeldev commented Jan 29, 2025

----- Motivation -----
According to Java documentation of 'PhantomReference' class, which is used to
schedule post-mortem cleanup of resources of C++ objects tied with Java objects
and cleanup of the JniWrapperCache entry, there can be a time gap between clearing
PhantomReference of Java object and putting PhantomReference object in the registered
queue.

During this time window the user of JniWrapperCache should not be capable of getting
Java object from weak reference handle stored in wrapper cache map, because JNI weak
reference returns null -- it is treated like PhantomReference and also cleared by GC.

Sadly, the function, which adds to JniWrapperCache does not check if there is an entry for a given
native pointer. As a result, we may override existing weak reference handle -- therefore, we would
never destroy it. This results in a weak reference handle leak.

The situation may happen if we query the same object multiple times. For instance when we have a
C++ class, which has a member of object type that can be obtained via getter function. Calling this
getter multiple times may cause weak reference handles leak.

----- Implemented changes -----

  • A new internal debug flag, which enables JNI weak reference check in the function that adds to cache.
  • A new Java functional test used to reproduce the problem.
  • The fix in JniWrapperCache, which ensures that if object was garbage collected it is removed from the map.

@pwrobeldev pwrobeldev changed the title Java: implement test which triggers leak of JNI weak references Java: remove leak of JNI weak references in JniWrapperCache Jan 29, 2025
@pwrobeldev
Copy link
Contributor Author

Note to reviewers: the CI failed for the first commit, which introduced the stress test. This shows, that weak reference handles were leaked. Then a fix was added to PR.

@pwrobeldev pwrobeldev marked this pull request as ready for review January 29, 2025 11:59
@pwrobeldev pwrobeldev requested a review from Hsilgos January 29, 2025 12:07
@@ -19,6 +19,10 @@ cmake_minimum_required(VERSION 3.6)

project(functional)

if (INTERNAL_GLUECODIUM_JNI_WEAK_REF_CHECK)
add_definitions(-DINTERNAL_GLUECODIUM_JNI_WEAK_REF_CHECK)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Better use GLUECODIUM_ prefix: GLUECODIUM_INTERNAL_JNI_WEAK_REF_CHECK
  2. Perhaps it can be more general name to combine it with more debug checks in the future.
  3. add_definition is obsolete function. target_compile_definitions should be used instead.
  4. By good this option must be part of gluecodium-cmake scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed. Added a new optional property called GLUECODIUM_ENABLE_INTERNAL_DEBUG_CHECKS.


#ifdef INTERNAL_GLUECODIUM_JNI_WEAK_REF_CHECK
if (s_wrapper_cache.count(obj_ptr) > 0) {
jenv->ThrowNew(jenv->FindClass("java/lang/Exception"), "Weak reference leaked!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may call throw_new_runtime_exception(jenv, "Weak reference leaked!") ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍


#ifdef INTERNAL_GLUECODIUM_JNI_WEAK_REF_CHECK
if (iter == s_wrapper_cache.end()) {
jenv->ThrowNew(jenv->FindClass("java/lang/Exception"), "Invalid removal of cache entry");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

@@ -34,9 +36,17 @@ namespace jni

static std::mutex s_mutex;
static std::unordered_map<const void*, jobject> s_wrapper_cache;
static std::unordered_map<const void*, int> s_skip_removals;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be instead of 2nd global container we may keep jobject and "scheduled removals" in struct in s_wrapper_cache?

struct CachedObject {
  jobject obj = nullptr;
  unsigned int scheduled_removals = 0;
};

static std::unordered_map<const void*, CachedObject> s_wrapper_cache;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added 👍

@pwrobeldev pwrobeldev force-pushed the pwrobeldev/jni-weak-references-leak branch from cc8665f to a0278da Compare February 3, 2025 09:11
According to Java documentation of 'PhantomReference' class, which is used to
schedule post-mortem cleanup of resources of C++ objects tied with Java objects
and cleanup of the JniWrapperCache entry, there can be a time gap between clearing
PhantomReference of Java object and putting PhantomReference object in the registered
queue.

During this time window the user of JniWrapperCache should not be capable of getting
Java object from weak reference handle stored in wrapper cache map, because JNI weak
reference returns null -- it is treated like PhantomReference and also cleared by GC.

Sadly, the function, which adds to JniWrapperCache does not check if there is an entry for a given
native pointer. As a result, we may override existing weak reference handle -- therefore, we would
never destroy it. This results in a weak reference handle leak.

The situation may happen if we query the same object multiple times. For instance when we have a
C++ class, which has a member of object type that can be obtained via getter function. Calling this
getter multiple times may cause weak reference handles leak.

This commit brings a new functional test, which triggers the leak.

Signed-off-by: Patryk Wrobel <[email protected]>
When trying to obtain a Java object from cache
we may encounter a situation in which we cannot
create local reference, because the Java object
was garbage collected. In such a case obtained
local reference points to NULL.

We should delete weak reference and the entry from
cache, because if the check is performed in the
'convert_to_jni()' function we create a new Java object
and try to cache it in JniWrapperCache. The function which
adds to cache does not check if it contains any record for
given pointer.

We may encounter the situation in which we override handle
stored in cache without deleting it. This would lead to weak
reference handle leak.

Signed-off-by: Patryk Wrobel <[email protected]>
Signed-off-by: Patryk Wrobel <[email protected]>
This change extends debug checks in JniWrapperCache class
to show the problem related to double dispose.

When we remove an entry from JniWrapperCache in 'get_cached_wrapper_impl()'
because the reference was invalidated we must anticipate, that the cleanup
will be scheduled for this non-existent object. It may cause problems, because
the cleanup of weak reference entry for the old object may remove the entry
for the new entry.

This commit extends checking and again makes 'stressTestJniWeakReferenceCache()'
test to fail.

In next commit the fix will be implemented to prevent such double-dispose.

Signed-off-by: Patryk Wrobel <[email protected]>
When garbage collector clears the reference and we identify it
in 'JniWrapperCache::cache_wrapper_impl()' we remove the entry
from cache. This ensures, that references are not leaked in
'convert_to_jni()' function.

However, this removal of entry from cache had very bad consequence.
For the already cleared object disposal of cached entry can be
scheduled after new entry for the same pointer is added to cache.
In such situation the disposal scheduled for old object removed
entry related to new object.

In order to prevent that a counter was introduced. The last scheduled
removal from cache erases the entry -- the remaining ones do not do
anything.

Signed-off-by: Patryk Wrobel <[email protected]>
@pwrobeldev pwrobeldev force-pushed the pwrobeldev/jni-weak-references-leak branch from a0278da to 116cdc4 Compare February 3, 2025 09:15
@pwrobeldev pwrobeldev merged commit a18a561 into master Feb 3, 2025
17 checks passed
@pwrobeldev pwrobeldev deleted the pwrobeldev/jni-weak-references-leak branch February 3, 2025 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants