Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change underlying storage to an actual union #72

Open
schveiguy opened this issue Aug 29, 2022 · 10 comments
Open

Change underlying storage to an actual union #72

schveiguy opened this issue Aug 29, 2022 · 10 comments

Comments

@schveiguy
Copy link
Contributor

SumType does this and I think because of this, it has more accurate semantics than TaggedUnion, and also needs less machinery around the lifetime issues.

I'm putting this issue report here, because I think it's a good goal to have that I think unlocks some other issues that need fixing, specifically #63 and #44.

@s-ludwig
Copy link
Owner

Do you have specific semantics in mind or lifetime management that could be simplified? Not having GC-scanned void[] storage in the case that none of the types contain pointers is the only thing that I can think of right now.

But it would probably make the code a bit easier to read in any case. There had been issues with unused union parts not getting overwritten that produced false pointers, but those appeared to have gone, judging by vibe.d's Json implementation, which already switched back to union.

@schveiguy
Copy link
Contributor Author

Instead of using memory tricks, and having to remember to call things that the compiler does naturally, we can let the compiler just do its thing. I have a suspicion that there are some shortcuts that might not be correct, but I haven't proven it yet. I found it odd that when I just added the full union to the Dummy union, it started complaining about how e.g. opAssign couldn't be @nogc.

It also paves the way for CT usage.

@s-ludwig
Copy link
Owner

But all the lifetime management (construction, destruction, copy construction/postblit) still needs to be done manually. And opAssign needs a valid value to overwrite, so either a bit-copy of T.init needs to be done in advance, or we need an approach similar to the current emplace one.

(Anyway, I don't really argue against this, we should definitely do it)

@schveiguy
Copy link
Contributor Author

Looking at std.sumtype, @pbackus is destroying the current value, and then assigning the union by constructing a new one (which I assume blits the proper .init value first). I don't think he's calling any postblit/etc manually:
https://github.com/dlang/phobos/blob/9e42570bc28be1452995d288f9667f310df0becd/std/sumtype.d#L620

The ctors look better too:
https://github.com/dlang/phobos/blob/9e42570bc28be1452995d288f9667f310df0becd/std/sumtype.d#L358

In general, the machinery just looks much more straightforward and simple.

I'd like to see a similar approach here. Let the compiler do the work it knows how to do. It also gives the compiler more info about what is going on. When the compiler sees void[N] I think it just ignores all lifetime issues.

@s-ludwig
Copy link
Owner

s-ludwig commented Sep 2, 2022

I don't know, maybe its because I know the code, but I really think the main difference comes down to certain simplifications, such as always doing destroy+blit assign instead of using the copy constructor/postblit of the target type, and not so much to how the storage is defined. I don't see much in terms of actual simplification or compiler assistance (struct alignment definitely is, though).

BTW, I just discovered that I started a WIP branch for this already: https://github.com/s-ludwig/taggedalgebraic/tree/union_storage

@schveiguy
Copy link
Contributor Author

Simplification should be a goal. If TU is storing a union and not void[N], it should utilize the compiler's in-built abilities to ensure lifetime is done correctly. It may not be automatic, it's still a union, but it should be utilized.

I have no objection to using postblit/copy vs. destroy+blit, so long as it's not a manual process.

I'm reminded of my newaa project where I reimplemented the built in associative arrays as a library type. I don't do any special mechanisms for the Entry struct to simulate what the compiler does naturally. Compare to the actual array runtime, which is insanely complex with all kinds of casts and machinery to replace what the compiler should be doing.

I'm not saying this is a requirement, nor that I have a specific vision in mind. But I'd like to have that kind of "aha, I get how this works" feeling when looking at the code, instead of "I have no idea how this works... this looks like black magic... this looks like a compiler workaround".

BTW, I just discovered that I started a WIP branch for this already

Nice! I'll take a look when I have some time.

@s-ludwig
Copy link
Owner

s-ludwig commented Sep 2, 2022

I have no objection to using postblit/copy vs. destroy+blit, so long as it's not a manual process.

But it is manual in both cases. I feel like I'm obviously missing something here, but unless I'm terribly mistaken, unions are not performing any lifetime functionality on their own. The differences AFAICS are just automatic alignment, automatic avoidance of GC scanning, as well as direct member access vs. a cast.

@schveiguy
Copy link
Contributor Author

It has to be done via direct member access. But e.g. this kind of stuff is hard to comprehend why it's needed:

package void rawEmplace(T)(void[] dst, ref T src)
{
T[] tdst = () @trusted { return cast(T[])dst[0 .. T.sizeof]; } ();
static if (is(T == class)) {
tdst[0] = src;
} else {
import std.conv : emplace;
emplace!T(&tdst[0]);
rawSwap(tdst[0], src);
}
}
// std.algorithm.mutation.swap sometimes fails to compile due to
// internal errors in hasElaborateAssign!T/isAssignable!T. This is probably
// caused by cyclic dependencies. However, there is no reason to do these
// checks in this context, so we just directly move the raw memory.
package void rawSwap(T)(ref T a, ref T b)
@trusted {
void[T.sizeof] tmp = void;
void[] ab = (cast(void*)&a)[0 .. T.sizeof];
void[] bb = (cast(void*)&b)[0 .. T.sizeof];
tmp[] = ab[];
ab[] = bb[];
bb[] = tmp[];
}

This is where I started trying to fix #63 and where it started complaining about @nogc no longer being valid.

Maybe I'm misunderstanding the complexity. The whole thing is somewhat confusing to me. Why in some cases it's a void[N], and in others it's a union between the first member and void[N]? This makes weird problems happen when your first member is of a certain type (but I suspect there are similar issues with SumType).

It would be nice if the storage is U fields; Kind tag; and that's it.

@s-ludwig
Copy link
Owner

s-ludwig commented Sep 2, 2022

Okay, but using m_data.trustedGet!T = ...; should be more or less exactly the same as m_data.field_0 = ...; in the union case. The rawEmplace was just a workaround for issues with non-copyable types and bogus complaints about internal pointers by swap (I don't remember the details right now). But generally, this could be replaced by a very similar approach to the the SumType code.

edit: CTFE is another valid point though!

@schveiguy
Copy link
Contributor Author

Okay, but using m_data.trustedGet!T = ...; should be more or less exactly the same as m_data.field_0 = ...;

But that's not what happens, these are the paths for all construction and opAssign:


m_data.rawEmplace(value);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants