Two step space creation #1004

Open · wants to merge 16 commits into master
Conversation

@wks (Collaborator) commented Oct 30, 2023

We let plans specify the requirement of all spaces, and then compute the start and extent of all spaces at the same time. By doing this, when a space (including discontiguous spaces) is created, we know its start and extent, and will no longer need to mutate the space later. This removes the global vec of unsafe mutable references to FreeListPageResource in Map32.

Concretely, the changes include:

  1. Splitting VMRequest into two structures, VMRequest and VMResponse.
    • VMRequest specifies the requirements, like the old VMRequest, but Discontiguous is changed to Unrestricted. The plan no longer requests a "discontiguous" space which will later be changed to "contiguous" on 64-bit machines.
    • VMResponse represents the placement decision (contiguousness, start and extent). If the VMRequest is "Unrestricted", the VMResponse may be contiguous or discontiguous.
  2. The procedure of creating spaces is split into a specification phase and a creation phase.
    • The plan first specifies all spaces, providing their requirements.
    • The HeapMeta type determines the contiguousness and address range of each space.
    • The plan then creates contiguous or discontiguous spaces at those ranges.
  3. The shared_fl_map: Vec<Option<NonNull<CommonFreeListPageResource>>> field in Map32 is removed. That global map was used to record all FreeListPageResource instances so that their start addresses could be fixed up after spaces were created, once the start address of the discontiguous range had been computed. Since we now know the discontiguous range before creating any space, we no longer need that vec.
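The request/response split described above can be sketched as follows. This is a hypothetical illustration, not the actual mmtk-core API: the names VMRequest and VMResponse follow the PR, but the variants shown and the toy bump-cursor resolver are assumptions made for this example.

```rust
/// What a plan asks for when specifying a space (illustrative variants).
#[derive(Debug, Clone)]
enum VMRequest {
    /// A contiguous region of `extent` bytes.
    Extent { extent: usize },
    /// No constraint; the placement may end up contiguous or discontiguous.
    Unrestricted,
}

/// The placement decision, computed before any space is created.
#[derive(Debug, Clone, PartialEq)]
enum VMResponse {
    Contiguous { start: usize, extent: usize },
    Discontiguous,
}

/// Toy resolver: lay contiguous requests out one after another from `base`,
/// and send unrestricted requests to a shared discontiguous range (as a
/// 32-bit layout might).
fn resolve(base: usize, requests: &[VMRequest]) -> Vec<VMResponse> {
    let mut cursor = base;
    requests
        .iter()
        .map(|req| match req {
            VMRequest::Extent { extent } => {
                let start = cursor;
                cursor += extent;
                VMResponse::Contiguous { start, extent: *extent }
            }
            VMRequest::Unrestricted => VMResponse::Discontiguous,
        })
        .collect()
}
```

Because every response is computed before any space is constructed, a space receives its final start and extent at creation time and never needs to be mutated afterwards.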

Related issues and PRs

This PR is part of the greater goal of removing unsafe operations related to page resources and VMMap, described in #853.

An older PR #953 also removes the unsafe Map32::shared_fl_map, but it does so by letting plans provide a method for_each_space_mut to mutate spaces after creation. It doesn't fix the root of the problem because it still mutates spaces after creation.

Things not fixed in this PR

Map64 also holds global mutable references to FreeListPageResource and mutates their start addresses, but for a completely different reason (see #853). Since fixing that problem will also require non-trivial changes, I'll postpone it to another PR, possibly a shrunk version of #953 that focuses on Map64 alone.

This PR does not remove the UnsafeCell<Map32Inner> in Map32. It will be removed later.

@wks requested review from qinsoon and wenyuzhao, October 30, 2023 07:54
@wks (Collaborator, Author) commented Oct 30, 2023

Should we rename HeapMeta?

I reused the struct of the same name, and extended its responsibility. The HeapMeta type did not exist in JikesRVM. In JikesRVM, heapCursor and heapLimit were static fields of the abstract class Space. HeapMeta was introduced in 32158cd so that heap_cursor and heap_limit can be shared between spaces.

I think we can rename this type to more accurately describe its role. I am thinking about "SpacePlacer" because that's what it does -- deciding the places of spaces.

And there is "SpaceMeta"

So the placement decision for each space is represented by a "SpaceMeta". I just named it after HeapMeta. Maybe there is a better name.

Space "allocation" or "placement"?

I kept using the phrase "space placement" to refer to the action of deciding the start and extent of spaces. I feel that we have always been using the word "allocation" to refer to allocating memory for things, and "space allocation" may be interpreted as "doing mmap so that the space is backed by actual memory". Let's discuss if there is a better phrase for "space placement".

@qinsoon (Member) left a comment

The PR illustrates that we can remove Map32::shared_fl_map by introducing explicit two-step space creation. I think it is a good idea. However, implementation-wise, the PR is too complex at the moment.

I suggest we keep using the name VMRequest. I see in the PR that some enum variants need to be renamed, but I don't see a good reason we should rename the enum, and the new name does not seem to be clearer. I think we can just keep the name, and it will be helpful when we need to cross-reference Rust and Java MMTk.

The concepts of asynchronous programming (future/promise/resolution) are not really applicable to our context in my opinion, and the implementation is too complex. We can just ask each plan to provide a list of VMRequests when we create HeapMeta. HeapMeta resolves the VMRequests when it is created, and stores the results. Then we pass HeapMeta to new() for each plan in the same way as it is now, and the plan can get the corresponding SpaceMeta and use it to create the spaces. Concretely, we could also ask for a HashMap<&str, VMRequest> (&str is the space name), and provide fn get_space_meta(name: &str) -> SpaceMeta on HeapMeta.
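The suggestion above can be sketched as follows. All names, fields, and the bump-pointer resolution are assumptions made for illustration; only the overall shape (named VMRequests resolved up front, then fetched by name via get_space_meta) follows the comment.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
enum VMRequest {
    Extent(usize),
    Unrestricted,
}

#[derive(Debug, Clone, PartialEq)]
struct SpaceMeta {
    start: usize,
    extent: usize,
    contiguous: bool,
}

struct HeapMeta {
    resolved: HashMap<&'static str, SpaceMeta>,
}

impl HeapMeta {
    /// Resolve every request up front, in the order given.
    fn new(base: usize, requests: Vec<(&'static str, VMRequest)>) -> Self {
        let mut cursor = base;
        let mut resolved = HashMap::new();
        for (name, req) in requests {
            let meta = match req {
                VMRequest::Extent(extent) => {
                    let start = cursor;
                    cursor += extent;
                    SpaceMeta { start, extent, contiguous: true }
                }
                // Stand-in for "goes to the shared discontiguous range".
                VMRequest::Unrestricted => SpaceMeta { start: 0, extent: 0, contiguous: false },
            };
            resolved.insert(name, meta);
        }
        HeapMeta { resolved }
    }

    /// Each plan fetches the placement decision for its own spaces by name.
    fn get_space_meta(&self, name: &str) -> &SpaceMeta {
        self.resolved
            .get(name)
            .expect("no VMRequest was registered for this space")
    }
}
```

The requests are taken as an ordered list here (rather than a HashMap) so that the layout is deterministic regardless of hash order.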

@qinsoon (Member) commented Oct 30, 2023

> Should we rename HeapMeta?
>
> I reused the struct of the same name, and extended its responsibility. The HeapMeta type did not exist in JikesRVM. In JikesRVM, heapCursor and heapLimit were static fields of the abstract class Space. HeapMeta was introduced in 32158cd so that heap_cursor and heap_limit can be shared between spaces.
>
> I think we can rename this type to more accurately describe its role. I am thinking about "SpacePlacer" because that's what it does -- deciding the places of spaces.

We can rename HeapMeta if we have a good name. This does not need to be done in this PR (but you can include it in the PR if you think it is appropriate).

What HeapMeta is actually doing is laying out the virtual memory for the spaces. My suggestion is to rename the current VMLayout to VMLayoutParameters and to rename HeapMeta to VMLayout.

@wks (Collaborator, Author) commented Oct 30, 2023

> I suggest we keep using the name VMRequest. I see in the PR that some enum variants need to be renamed, but I don't see a good reason we should rename the enum, and the new name does not seem to be clearer. I think we can just keep the name, and it will be helpful when we need to cross-reference Rust and Java MMTk.

Yes. On second thought, I think we should keep the name VMRequest. We can also rename SpaceMeta to VMResponse to pair it with VMRequest.

> The concepts of asynchronous programming (future/promise/resolution) are not really applicable to our context in my opinion, and the implementation is too complex. We can just ask each plan to provide a list of VMRequests when we create HeapMeta. HeapMeta resolves the VMRequests when it is created, and stores the results. Then we pass HeapMeta to new() for each plan in the same way as it is now, and the plan can get the corresponding SpaceMeta and use it to create the spaces. Concretely, we could also ask for a HashMap<&str, VMRequest> (&str is the space name), and provide fn get_space_meta(name: &str) -> SpaceMeta on HeapMeta.

I designed it this way so that the SpaceMeta and the constructor of concrete spaces are in the same scope. For example, for SemiSpace, we define both copyspace0_meta and copyspace1_meta in the same function SemiSpace::new(). In this way, we never miss the specification of any space. If we miss a VMRequest, or if we consume the same VMResponse twice, it will be a compile error.
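The compile-time guarantee described above can be sketched as follows, under assumed types: PendingVMResponse (standing in for the PR's FutureSpaceMeta) is neither Copy nor Clone, so each pending placement must be consumed by move exactly once. Everything here is illustrative; in particular, the sketch collapses the PR's separate specification and creation phases into one call for brevity.

```rust
struct VMResponse {
    start: usize,
    extent: usize,
}

/// A single-use token wrapping the eventual placement decision.
/// Not Copy/Clone: consuming it twice is a move error at compile time.
struct PendingVMResponse(VMResponse);

impl PendingVMResponse {
    /// Consumes the token, yielding the placement.
    fn resolve(self) -> VMResponse {
        self.0
    }
}

/// Toy HeapMeta that hands out placements with a bump cursor.
struct HeapMeta {
    cursor: usize,
}

impl HeapMeta {
    fn specify_space(&mut self, extent: usize) -> PendingVMResponse {
        let start = self.cursor;
        self.cursor += extent;
        PendingVMResponse(VMResponse { start, extent })
    }
}

struct CopySpace {
    start: usize,
    extent: usize,
}

impl CopySpace {
    fn new(resp: VMResponse) -> Self {
        CopySpace { start: resp.start, extent: resp.extent }
    }
}

/// Analogue of SemiSpace::new(): both pending responses live in one scope,
/// so consuming one twice fails to compile, and never consuming one trips
/// the unused-variable lint.
fn make_semispace(heap: &mut HeapMeta) -> (CopySpace, CopySpace) {
    let copyspace0_meta = heap.specify_space(0x100_0000);
    let copyspace1_meta = heap.specify_space(0x100_0000);
    (
        CopySpace::new(copyspace0_meta.resolve()),
        CopySpace::new(copyspace1_meta.resolve()),
    )
}
```

Adding `CopySpace::new(copyspace0_meta.resolve())` a second time would be rejected by the move checker, which is the static guarantee the comment refers to.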

But if we use HashMap<&str, VMRequest> and require the plan to provide a list of VMRequest instances, we will have to write one function to provide a list of (name, VMRequest) pairs, and another function new() to actually instantiate the Plan and the spaces. That increases the chance of making mistakes because

  1. The space names in both functions must match, and
  2. The number of spaces must match, and
  3. The compiler cannot check if they match at compile time.

If the idea of future/promise makes it hard to understand, we can rename FutureSpaceMeta to PendingSpaceMeta or PendingVMResponse.

@qinsoon (Member) commented Oct 30, 2023

> The concepts of asynchronous programming (future/promise/resolution) are not really applicable to our context in my opinion, and the implementation is too complex. We can just ask each plan to provide a list of VMRequests when we create HeapMeta. HeapMeta resolves the VMRequests when it is created, and stores the results. Then we pass HeapMeta to new() for each plan in the same way as it is now, and the plan can get the corresponding SpaceMeta and use it to create the spaces. Concretely, we could also ask for a HashMap<&str, VMRequest> (&str is the space name), and provide fn get_space_meta(name: &str) -> SpaceMeta on HeapMeta.
>
> I designed it this way so that the SpaceMeta and the constructor of concrete spaces are in the same scope. For example, for SemiSpace, we define both copyspace0_meta and copyspace1_meta in the same function SemiSpace::new(). In this way, we never miss the specification of any space. If we miss a VMRequest, or if we consume the same VMResponse twice, it will be a compile error.

We can achieve the same goal with a hash map. We can make sure that a space meta is fetched exactly once (we remove an entry from the result hashmap once fetched), and that after plan creation the result hashmap is empty (i.e. the fetched space metas match the VMRequests). Fetching a wrong space meta will panic.
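The run-time checks proposed above can be sketched as follows, with illustrative types: fetching removes the entry (so a double fetch panics), fetching an unknown name panics, and an emptiness check after plan creation catches requests that were never fetched.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct SpaceMeta {
    start: usize,
    extent: usize,
}

struct HeapMeta {
    metas: HashMap<&'static str, SpaceMeta>,
}

impl HeapMeta {
    /// Fetch the placement for a space; each entry can be fetched only once
    /// because it is removed from the map on fetch.
    fn get_space_meta(&mut self, name: &str) -> SpaceMeta {
        self.metas
            .remove(name)
            .unwrap_or_else(|| panic!("no space meta for {:?}: wrong name, or fetched twice", name))
    }

    /// Call after plan creation: every request must have been fetched.
    fn assert_all_fetched(&self) {
        assert!(
            self.metas.is_empty(),
            "space metas never fetched: {:?}",
            self.metas.keys().collect::<Vec<_>>()
        );
    }
}
```

This enforces the same "exactly once" discipline as the ownership-based design, but as panics at run time rather than errors at compile time.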

> But if we use HashMap<&str, VMRequest> and require the plan to provide a list of VMRequest instances, we will have to write one function to provide a list of (name, VMRequest) pairs, and another function new() to actually instantiate the Plan and the spaces. That increases the chance of making mistakes because
>
> 1. The space names in both functions must match, and
> 2. The number of spaces must match, and
> 3. The compiler cannot check if they match at compile time.
>
> If the idea of future/promise makes it hard to understand, we can rename FutureSpaceMeta to PendingSpaceMeta or PendingVMResponse.

  1. Using a constant for space names would solve this.
  2. This can be checked easily.
  3. Compile-time checking is an advantage, but it is not decisive.

These are the two main reasons why I suggested providing a list of VMRequests:

1. It is structurally and logically simpler

This is what the current code does. The plan needs to make sure the ordering is correct: creating its own VMRequests, letting the parent plan create its VMRequests, and then resolving its own space creation.

Plan: VMRequest1
ParentPlan: VMRequest2
ParentParentPlan: VMRequest3
ParentParentPlan: SpaceMeta3
ParentPlan: SpaceMeta2
Plan: SpaceMeta1

What I propose: all the requests are known beforehand, and are resolved. During plan creation, each plan just needs to get the space meta for its own spaces.

MMTK: [VMRequest1, VMRequest2, VMRequest3]
HeapMeta: [SpaceMeta1, SpaceMeta2, SpaceMeta3]
Plan: Fetch SpaceMeta1
ParentPlan: Fetch SpaceMeta2
ParentParentPlan: Fetch SpaceMeta3

2. It matches the design of "plan as configuration"

We have a general goal of being able to provide a configuration file for plans. We currently ask the plan for a configuration of allocators, allocation/copy semantics mapping, etc. Asking the plan to provide a configuration of VMRequests fits the design.


I am happy to discuss further on the issue.

@wks (Collaborator, Author) commented Oct 31, 2023

> We can achieve the same goal with a hash map. We can make sure that a space meta is fetched exactly once (we remove an entry from the result hashmap once fetched), and that after plan creation the result hashmap is empty (i.e. the fetched space metas match the VMRequests). Fetching a wrong space meta will panic.

It is still run-time checking instead of compile-time checking.

> 1. Using a constant for space names would solve this.
> 2. This can be checked easily.
> 3. Compile-time checking is an advantage, but it is not decisive.

That mitigates the problem, but I still believe that compile-time checking (with the help of the ownership and move checker) is a better solution because it completely prevents those mistakes.

> These are the two main reasons why I suggested providing a list of VMRequests:
>
> 1. It is structurally and logically simpler
>
> This is what the current code does. The plan needs to make sure the ordering is correct: creating its own VMRequests, letting the parent plan create its VMRequests, and then resolving its own space creation.
>
> Plan: VMRequest1
> ParentPlan: VMRequest2
> ParentParentPlan: VMRequest3
> ParentParentPlan: SpaceMeta3
> ParentPlan: SpaceMeta2
> Plan: SpaceMeta1
>
> What I propose: all the requests are known beforehand, and are resolved. During plan creation, each plan just needs to get the space meta for its own spaces.
>
> MMTK: [VMRequest1, VMRequest2, VMRequest3]
> HeapMeta: [SpaceMeta1, SpaceMeta2, SpaceMeta3]
> Plan: Fetch SpaceMeta1
> ParentPlan: Fetch SpaceMeta2
> ParentParentPlan: Fetch SpaceMeta3

I don't like the approach of making all VMRequest instances in MMTK. The plan determines the set of spaces. If we put all VMRequest objects into MMTK, it will be cluttered with all the spaces from all implemented plans, and will soon become chaotic. For example, what if we have two generational plans, but their ideal nursery sizes are different? We may end up having nursery_for_plan1: VMRequest, nursery_for_plan2: VMRequest, ...

There is one alternative that I actually considered: letting each concrete plan implement a function that returns a vec (or map) of (space name, VMRequest) pairs.

```rust
impl SemiSpace {
    pub fn specify_spaces() -> Vec<(&'static str, VMRequest)> {
        let mut result = vec![
            ("copyspace0", VMRequest::Unrestricted),
            ("copyspace1", VMRequest::Unrestricted),
        ];
        result.append(&mut CommonSpace::specify_spaces());
        result
    }
}
```

But I didn't adopt that approach, in favour of the approach I used in this PR, mainly due to the ability to check statically.

> 2. It matches the design of "plan as configuration"
>
> We have a general goal of being able to provide a configuration file for plans. We currently ask the plan for a configuration of allocators, allocation/copy semantics mapping, etc. Asking the plan to provide a configuration of VMRequests fits the design.

Yes, I agree. Ideally, spaces should be specified declaratively, and MMTk should be able to instantiate them in the right order, i.e. knowing their locations and contiguousness before instantiating spaces. Maybe it is possible to make some kind of DSL (maybe using macros) and hide some of the details from plan implementations.
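One way such a macro-based DSL could look is sketched below. This is purely illustrative: the specify_spaces! macro, the VMRequest variants, and the semispace_requests function are all hypothetical, not part of this PR.

```rust
#[derive(Debug, Clone, PartialEq)]
enum VMRequest {
    Extent(usize),
    Unrestricted,
}

/// Expands `"name" => request` pairs into a vec of (name, request) pairs.
macro_rules! specify_spaces {
    ( $( $name:literal => $req:expr ),* $(,)? ) => {
        vec![ $( ($name, $req) ),* ]
    };
}

/// A plan's space declaration, written declaratively.
fn semispace_requests() -> Vec<(&'static str, VMRequest)> {
    specify_spaces! {
        "copyspace0" => VMRequest::Unrestricted,
        "copyspace1" => VMRequest::Unrestricted,
    }
}
```

A fuller DSL could also generate the struct fields and the creation code from the same declaration, which would remove the name/count mismatch problem discussed earlier.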

I'll try making another version of this PR later to try some alternative approaches, e.g. HashMap and macros.

@steveblackburn (Contributor) commented
This looks interesting.

However, I would like to flag this for deep discussion before we go any further with it. The ideas at the heart of this issue are foundational ones.

Experience with ProcessEdgesWork suggests that well-intentioned refactoring can have profound impacts on usability which undermine key elements of the MMTk design.

My suggestion is that we do not make any such change until:

  1. We address all high-priority issues,
  2. We get CI to the point that we have confidence in it, and
  3. We fix ProcessEdgesWork and associated changes so that, in that example, simple changes to tracing loops can be made by any competent coder.

We now have a situation where the code has become so complex that this is not possible, even for highly experienced team members. The design ethos of MMTk was to allow inexperienced people to come and rapidly explore GC designs. It is extremely important that we don't erode this principle.

3 participants