Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve nogc error when out of space #1175

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

eileencodes
Copy link
Contributor

When using NoGC if the machine runs out of memory, the user will see an error that says garbage collection failed:

ERROR: An MMTk GC thread panicked.  This is a bug.
panicked at /Users/eileencodes/src/github.com/open_source/mmtk-core/src/plan/nogc/global.rs:74:9:
internal error: entered unreachable code: GC triggered in nogc
Backtrace is disabled.
run with `RUST_BACKTRACE=1` environment variable to display a backtrace
running file: test/zlib/test_zlib.rb

I added a check in poll to see if the plan being used supports GC and if not, raise an error. I could have updated the schedule_collection function to return this error, but I felt that since that function is regarding scheduling, it was better to raise if any plan does not support GC than change the purpose of schedule_collection. We still want to know if a GC ended up being triggered in schedule_collection.

The new error looks like this:

[2024-07-24T18:43:45Z INFO  mmtk::memory_manager] Initialized MMTk with NoGC (FixedHeapSize(1073741824))
thread '<unnamed>' panicked at /Users/eileencodes/src/github.com/open_source/mmtk-core/src/util/heap/gc_trigger.rs:80:17:
internal error: entered unreachable code: User ran out of space and the plan does not support collecting garbage.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
running file: test/zlib/test_zlib.rb

Note: I've written very little rust so while I know this works (tested in CRuby) I don't know if from a design perspective this is correct.

When using `NoGC` if the machine runs out of memory, the user will see
an error that says garbage collection failed:

```
ERROR: An MMTk GC thread panicked.  This is a bug.
panicked at /Users/eileencodes/src/github.com/open_source/mmtk-core/src/plan/nogc/global.rs:74:9:
internal error: entered unreachable code: GC triggered in nogc
Backtrace is disabled.
run with `RUST_BACKTRACE=1` environment variable to display a backtrace
running file: test/zlib/test_zlib.rb
```

I added a check in `poll` to see if the plan being used supports GC and
if not, raise an error. I could have updated the `schedule_collection`
function to return this error, but I felt that since that function is
regarding scheduling, it was better to raise if any plan does not
support GC than change the purpose of `schedule_collection`. We still
want to know if a GC ended up being triggered in `schedule_collection`.

The new error looks like this:

```
[2024-07-24T18:43:45Z INFO  mmtk::memory_manager] Initialized MMTk with NoGC (FixedHeapSize(1073741824))
thread '<unnamed>' panicked at /Users/eileencodes/src/github.com/open_source/mmtk-core/src/util/heap/gc_trigger.rs:80:17:
internal error: entered unreachable code: User ran out of space and the plan does not support collecting garbage.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
running file: test/zlib/test_zlib.rb
```
@eileencodes eileencodes changed the title Improve nogc error Improve nogc error when out of space Jul 24, 2024
@wks
Copy link
Collaborator

wks commented Jul 25, 2024

Yes. It makes sense to let it exit immediately when a mutator thread has an allocation failure. The previous code relies on the unreachable!() macro in schedule_collection executed on a GC thread, and the error message appeared as if it were a bug in MMTk while it is just normal for NoGC.

I made some slight changes. I retained the info! log in poll so that when we see the number of pages it prints is too small, we know we need to increase the heap size. I also changed unreachable! to panic! because it is technically not "unreachable code". unreachable! is intended to guard code paths that shouldn't be executed unless there is a bug in the code, but running out of memory is a run-time error.

Copy link
Member

@qinsoon qinsoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wks
Copy link
Collaborator

wks commented Jul 25, 2024

There are two additional problems.

  1. We currently do mock testing using MockVM, and it hooks the Collection::block_for_gc call back function to identify if the mutator triggered GC in a code snippet. If we panic early in GCTrigger::poll, it will not reach block_for_gc.
  2. We actually have an out-of-memory handler Collection::out_of_memory. Its default behavior is panicking, but it allows the VM binding to override it in order to handle the event of out-of-memory in a VM-specific manner. For example, JVM should throw OutOfMemoryError. Ruby doesn't seem to have a hard heap size limit, but only a heap growth factor, and the operating system will start killing innocent processes after a Ruby program allocates too many large objects (which have buffers allocated by malloc).

I think we need to think more about this topic, i.e. out-of-memory handling in NoGC. Intuitively, if we run out of memory in NoGC, then Collection::out_of_memory should be called, which defaults to panicking.

@qinsoon
Copy link
Member

qinsoon commented Jul 25, 2024

We can just rewrite related tests. I am also okay if you would like to call out_of_memory for NoGC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants