[JVM] Necromance a cheaper (de)serialization #777

Kaiepi · 2022-08-04T06:50:21Z

This started as means of fixing rakudo/rakudo#4952 for the next stretch of #776, but is kind of its own rabbit hole. The hs_err log produced by the failure in CORE.d.setting points to this line as the source of the error from within nqp:

nqp/src/vm/jvm/runtime/org/raku/nqp/runtime/LibraryLoader.java

Line 71 in 4f4d4ef

CompilationUnit cu = (CompilationUnit)c.newInstance();

While looking around here, I noticed we are very lackadaisical with how we handle serialized bytecode within memory. Namely, when LibraryLoad.load is invoked, we can wind up copying entire files in memory multiple times over, and their decompressed classfile and bytecode files persist in memory as long as their classes do. I haven't quite managed to even provoke a change in the CORE.d.setting error produced, but I have managed to convince CORE.c.setting to build in 90% of the time it does on master by reducing the copying to just one instance so far. This can be taken further to eliminate copying of files in bulk entirely, but will take time.

Passes make test with OpenJDK 19-ea, I guess.

We use Java 9 now, so we can get rid of some of the locking involved here.

We've been depending largely on byte *arrays* of entire files so far. Now we prefer to mmap in small chunks when access to a file is given, doing a heap allocation given raw byte input instead, waiting until the last minute to dump the entire thing in memory in either case. We introduce a ByteBufferedInputStream to help avoid repetitively reading files into (potentially huge!) chunks of memory on its own. Its `copy` method should *not* exist, but this class is a whole tangent on its own, so just do the bare minimum of an implementation of an input stream given a byte buffer.

Despite a JAR doing ZIP compression, it doesn't do so hot with Raku's bytecode. XZ can help reduce the size of a JAR produced by us to some degree (50957 bytes for NQPCORE.setting with XZ vs. 53286 bytes with ZIP), and *hopefully* doesn't interfere with precomp times.

Seems to offer an improvement in build times and allows for better handling of memory with large workloads anyways.

This isn't a very good idea because it will lead to larger bytecode when we already have bytecode size problems, but neither is spinning up instances with caches we never wind up using. We'll deal with this properly once ByteBufferedInputStream and IgnoreNameClassLoader get more attention.

We have a shiny new serialized XZ type of file to read, so we have no worries as to versioning of how its read; let's exploit that. Java enjoys having names for its defined classes, so put the name of the class in the XZ's JAR entry comment field and use it when redefining it on read. We wind up with null if none can be found anyways, giving us prior behaviour for uncompressed serializations too. We have a loadClass-ish method that doesn't follow Java's spec per se. We have the choice of linking with resolveClass after building one with a findClass-ish method; do so. Slap a synchronized modifier on our new-ish loadByteClass method and now there's nothing preventing us from having parallel class loading. Mark our new SerialClassLoader as being parallel-capable and set -XX:+AllowParallelDefineClass now that we have a somewhat more modern JDK and now that there shouldn't be anything else preventing us from doing so.

Maybe this'll return as a config thing if this carries a significant enough improvement.

Kaiepi · 2022-08-04T07:40:06Z

Turns out -XX:+UseTransparentHugePages is a detriment! CORE.c.setting can now build 23% faster than on master.

ClassLoader.getClassLoadingLock would throw before, but I couldn't figure out why: it can't handle null names because it keeps a map of defined strings. Override ClassLoader.getClassLoadingLock; our unique object to lock on can just be ourself for now.

We *should be* able to InputStream our way out of keeping files in memory, with the exception of when other parts of nqp do so beforehand, thus the existence of the MemoryClassLoader. The SerialClassLoader wraps an input stream to read from, which JarFileClassLoader extends with the preliminary file loading. A similar FileClassLoader exists, which only cares about decompressed classfiles. Regardless, Rakudo builds are broken because xz's XZInputStream has its bugs. Womp womp.

xz now breaks Rakudo builds by its minimal, quirky InputStream implementation in the form of XZInputStream, which doesn't play nice with the default readAllBytes implementation, which we kinda need in some form. lz4, in comparison to xz and zlib, is more inefficient in terms of compressed output size (NQPCORE.setting.jar is now 57KB compared to 52KB w/ ZIP DEFLATE), but is nevertheless more efficient to compress and decompress. Because precomp files are apt to exist locally, hopefully this tradeoff is worth it...

Given it has been refactored to death.

All of its subtypes are oriented towards files in some way, MemoryClassLoader handling cases where the file's already in memory anyway.

ByteClassLoader tracks which classes are defined, while GlobalContext tracks which are loaded. The LibraryLoader depends on the former, but what if the latter already defined a class to be loaded (circling back to itself via require, perhaps)? ByteClassLoader can keep track of all classes loaded or defined by ourselves if it acts as a parent for LibraryLoader's various classloaders, eliminating such a duplicate definition. If we give StreamClassLoader and MemoryClassLoader a new SerialClassLoader parent that takes care of class caching, we get to dedup some logic there as well.

We only need them once, while we still have access the original reference to any given classfile. Also, name it such rather than the vague "bytes" or "serial class".

This helps reduce the size of precompiled JARs, and is information not typically included in a release to begin with.

This is a matter of picking a compression level and keeping a static compressor and decompressor around.

Kaiepi · 2022-08-09T15:55:07Z

There's more that can be done in this vein in the SerializationReader whereabouts, but I'd rather not mess with both the library loader and serialization in general at the same time. CI seems to enjoy these changes as-is.

- Actually use any cached class, read or defined... - We need to use getName when name is null in either case, not getCanonicalName, so we avoid name clashing with static classes.

usev6 · 2022-09-19T18:34:31Z

This looks/sounds very promising.

I don't feel qualified to give it a real review, but I'll at least try to test the branch. (But that will take some time.)

patrickbkr · 2022-10-21T15:12:48Z

Given this unblocks Kaiepi, what hinders merging this as is? It doesn't seem to touch MoarVM things, so can't break thinks too badly.

usev6

As I noted earlier, I don't feel qualified for a real review. But I tried my best and raised a few questions. @Kaiepi maybe you could have a look at those?

Apart from those comments this still looks very promising to me. I didn't come around to give it a real test, but will do that now.

tools/build/install-jvm-runner.pl.in

src/vm/jvm/runtime/org/raku/nqp/runtime/GlobalContext.java

src/vm/jvm/runtime/org/raku/nqp/runtime/LibraryLoader.java

usev6++

usev6 · 2022-10-23T06:44:42Z

A spectest looked good as well.

usev6 · 2022-10-24T19:06:39Z

Given this unblocks Kaiepi, what hinders merging this as is? It doesn't seem to touch MoarVM things, so can't break thinks too badly.

From me it's a +1 for merging.

If no-one else chimes in I'll hit the merge button in a couple of days.

Kaiepi added 6 commits August 4, 2022 06:33

[JVM] Do thread-safe mutable state in precomp atomically

66a2e89

We use Java 9 now, so we can get rid of some of the locking involved here.

[JVM] Set UseTransparentHugePages and AggressiveHeap java flags

d16e247

Seems to offer an improvement in build times and allows for better handling of memory with large workloads anyways.

Kaiepi mentioned this pull request Aug 4, 2022

[JVM] Tune Java in response to nqp's (de)serialization rewrite rakudo/rakudo#5016

Merged

[JVM] Remove non-portable UseTransparentHugePages flag

4984e94

Maybe this'll return as a config thing if this carries a significant enough improvement.

Kaiepi force-pushed the jvm-necromancy branch from 08d509b to 4984e94 Compare August 4, 2022 07:02

Kaiepi force-pushed the jvm-necromancy branch from f3aeebe to 4724757 Compare August 5, 2022 18:35

Kaiepi added 3 commits August 7, 2022 08:19

[JVM] Protect against IOException on read in the deserialize op

49d6df0

Kaiepi force-pushed the jvm-necromancy branch 2 times, most recently from 0d208d7 to 49d6df0 Compare August 7, 2022 21:00

Kaiepi added 8 commits August 8, 2022 07:59

[JVM] Remove stray imports, debug prints, etc. from LibraryLoader

fc14289

Given it has been refactored to death.

[JVM] StreamClassLoader can have its filename

bc82158

All of its subtypes are oriented towards files in some way, MemoryClassLoader handling cases where the file's already in memory anyway.

[JVM] Finalize MemoryClassLoader buffer fields

04514ad

[JVM] Keep weak references to buffered classfiles in LibraryLoader

eaac9e3

We only need them once, while we still have access the original reference to any given classfile. Also, name it such rather than the vague "bytes" or "serial class".

[JVM] Do not produce debug symbols

8a1e692

This helps reduce the size of precompiled JARs, and is information not typically included in a release to begin with.

[JVM] Tune LZ4 compression for performance

f14c571

This is a matter of picking a compression level and keeping a static compressor and decompressor around.

[JVM] Don't throw bare exceptions from (de)serialization ops

3abee01

Kaiepi marked this pull request as ready for review August 9, 2022 15:54

[JVM] Fix caching of defined classes by the ByteClassLoader

6b009db

- Actually use any cached class, read or defined... - We need to use getName when name is null in either case, not getCanonicalName, so we avoid name clashing with static classes.

Kaiepi force-pushed the jvm-necromancy branch from 7e8c859 to 6b009db Compare August 9, 2022 21:29

usev6 reviewed Oct 22, 2022

View reviewed changes

Kaiepi added 2 commits October 22, 2022 18:06

[JVM] Eliminate references to xz from the move to lz4

dff4850

usev6++

[JVM] Remove some unused imports from GlobalContext

f4177ab

usev6++

usev6 merged commit ccf2d60 into Raku:master Oct 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[JVM] Necromance a cheaper (de)serialization #777

[JVM] Necromance a cheaper (de)serialization #777

Kaiepi commented Aug 4, 2022 •

edited

Loading

Kaiepi commented Aug 4, 2022

Kaiepi commented Aug 9, 2022

usev6 commented Sep 19, 2022

patrickbkr commented Oct 21, 2022

usev6 left a comment

usev6 commented Oct 23, 2022

usev6 commented Oct 24, 2022

[JVM] Necromance a cheaper (de)serialization #777

[JVM] Necromance a cheaper (de)serialization #777

Conversation

Kaiepi commented Aug 4, 2022 • edited Loading

Kaiepi commented Aug 4, 2022

Kaiepi commented Aug 9, 2022

usev6 commented Sep 19, 2022

patrickbkr commented Oct 21, 2022

usev6 left a comment

Choose a reason for hiding this comment

usev6 commented Oct 23, 2022

usev6 commented Oct 24, 2022

Kaiepi commented Aug 4, 2022 •

edited

Loading