Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf2go: ergonomic enums #1636

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

bpf2go: ergonomic enums #1636

wants to merge 2 commits into from

Conversation

mejedi
Copy link
Contributor

@mejedi mejedi commented Dec 20, 2024

Constant names emitted for enum elements end up quite unwieldy. They are
prefixed with enum name because in C struct, union and enum names are in
a separate namespace, allowing for overlaps with identifiers, e.g:

enum E { E };

While such overlaps are possible in theory, people usually don't do
it. If a typicall C naming convention is followed, we get

enum something {
    SOMETHING_FOO, SOMETHING_BAR
};

generating <STEM>SomethingSOMETHING_FOO and <STEM>SomethingSOMETHING_BAR.

In addition to "safe" long names, generate shorter ones if the
respective name is not taken. <STEM>SOMETHING_FOO and
<STEM>SOMETHING_BAR are much nicer to work with.

@mejedi mejedi changed the title Ergonomic enums bpf2go: Ergonomic enums Dec 20, 2024
@mejedi mejedi requested a review from ti-mo December 20, 2024 13:05
bpf2go orders generated types by name to ensure the output is stable.
Adjust data structures such that we can rely on implicit sorting in
text/template, removing sortTypes alltogether.

The added reservedNames hash is handy for rejecting "reserved" type names
such as "<STEM>Specs", and for the upcoming "ergonomic" enum feature
(generate short names for enum members if not yet taken).

Signed-off-by: Nick Zavaritsky <[email protected]>
Constant names emitted for enum elements end up quite unwieldy. They are
prefixed with enum name because in C struct, union and enum names are in
a separate namespace, allowing for overlaps with identifiers, e.g:

enum E { E };

While such overlaps are possible in *theory*, people usually don't do
it. If a typicall C naming convention is followed, we get

enum something {
    SOMETHING_FOO, SOMETHING_BAR
};

generating <STEM>SomethingSOMETHING_FOO and <STEM>SomethingSOMETHING_BAR.

In addition to "safe" long names, generate shorter ones if the
respective name is not taken. <STEM>SOMETHING_FOO and
<STEM>SOMETHING_BAR are much nicer to work with.

Signed-off-by: Nick Zavaritsky <[email protected]>
@mejedi mejedi changed the title bpf2go: Ergonomic enums bpf2go: ergonomic enums Dec 20, 2024
Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While yes, this makes for shorter identifiers, it also makes it less deterministic who will get the prefix-less one in case of conflicts. The first-declared one? The one appearing first alphabetically? Is it random? This needs to be clearly defined.

I also kind of don't really see the problem, could you give a real-world example of an identifier that gets unwieldy because of this? There's a case to be made for anonymous enums, which I'm not sure we currently support; at least I couldn't find any examples.

I'm also not sure about adding GoFormatter.ShortEnumIdentifier, we already have EnumIdentifier that could probably be extended.

@@ -116,3 +65,46 @@ func TestObjects(t *testing.T) {
qt.Assert(t, qt.StringContains(str, "Var1 *ebpf.Variable `ebpf:\"var_1\"`"))
qt.Assert(t, qt.StringContains(str, "ProgFoo1 *ebpf.Program `ebpf:\"prog_foo_1\"`"))
}

func TestEnums(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to write this test without the range over conflict true/false? I'm not a big fan of treating invariants like a list of test cases, because you inevitably end up sprinkling if <invariant> around anyway.

These are technically separate tests; perhaps you could extract the series of qt.Assert calls into a test() closure within the function to avoid repetition.

}
}

func wsSeparated(terms ...string) *regexp.Regexp {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this strictly required? IMO it's harder to read wsSeparated("barEnumNameV1", "barEnumName", "=", "1") than simply "barEnumNameV1 barEnumName = 1". Or is the amount of whitespace not deterministic?

Copy link
Contributor Author

@mejedi mejedi Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the amount of whitespace not deterministic?

The amount of whitespace depends on the length of other identifiers in the set since formatter inserts extra spaces for alignment. It doesn't improve readability indeed, but makes it easier to update test code in the future.

types, err := sortTypes(typeNames)
if err != nil {
return err
typeByName := map[string]btf.Type{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use make() for consistency?

// sortTypes returns a list of types sorted by their (generated) Go type name.
//
// Duplicate Go type names are rejected.
func sortTypes(typeNames map[btf.Type]string) ([]btf.Type, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the output no longer need to be sorted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://pkg.go.dev/text/template#hdr-Actions

{{range pipeline}} T1 {{end}}
The value of the pipeline must be an array, slice, map, or channel.
If the value of the pipeline has length zero, nothing is output;
otherwise, dot is set to the successive elements of the array,
slice, or map and T1 is executed. If the value is a map and the
keys are of basic type with a defined order, the elements will be
visited in sorted key order.

args.ObjectFile,
}

var buf bytes.Buffer
if err := commonTemplate.Execute(&buf, &ctx); err != nil {
return fmt.Errorf("can't generate types: %s", err)
return fmt.Errorf("can't generate types: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why %v over %s? This change feels a little arbitrary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually meant %w.

Identifier: args.Identifier,
ShortEnumIdentifier: func(_, element string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be made a part of EnumIdentifier instead? Not sure why this deserves its own hook, it's just another step during enum identifier generation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this deserves its own hook, it's just another step during enum identifier generation.

I propose generating BOTH long and short identifiers to maintain backwards compatibility.

@@ -12,58 +13,6 @@ import (
"github.com/cilium/ebpf/cmd/bpf2go/internal"
)

func TestOrderTypes(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically same question as above. Do we have another way of ensuring stability in the generated output? Was this always unnecessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we range over a map in text/template, items are first ordered by key.

// NB: This also deduplicates types.
typeNames[typ] = args.Stem + args.Identifier(typ.TypeName())
tn := templateName(args.Stem)
reservedNames := map[string]struct{}{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first sight, I'd model this as a property to be specified during construction of the GoFormatter. Everytime the formatter generates an identifier (struct or otherwise), it can check uniqueness and push it to the set of seen names. I think it's a good feature to have in the formatter to prevent it from emitting invalid code, it doesn't need to be b2g-specific.

This way, you also avoid having to close over reservedNames and typeByName to get it into an ShortEnumIdentifier function.

@mejedi
Copy link
Contributor Author

mejedi commented Jan 8, 2025

@ti-mo

I also kind of don't really see the problem, could you give a real-world example of an identifier that gets unwieldy because of this?

enum reassm_rc {
  REASSM_RC_SUCCESS,
  REASSM_RC_INTERNAL_ERROR,
  REASSM_RC_TOO_MANY_REASSEMBLIES,
  ...
};

yields

const (
    frReassmRcREASSM_RC_SUCCESS               frReassmRc = 0
    frReassmRcREASSM_RC_INTERNAL_ERROR        frReassmRc = 1
    frReassmRcREASSM_RC_TOO_MANY_REASSEMBLIES frReassmRc = 2
    ....

This code is meant to be reused across several projects, therefore I can't simply have SUCCESS, INTERNAL_ERROR and TOO_MANY_REASSEMBLIES.

@mejedi
Copy link
Contributor Author

mejedi commented Jan 9, 2025

@ti-mo

it also makes it less deterministic who will get the prefix-less one in case of conflicts. The first-declared one? The one appearing first alphabetically? Is it random? This needs to be clearly defined.

Let's talk about conflicts. Assuming a C compiler, the only conflict possible is with a type name. E.g.

struct foo {};
enum E {foo};

Enum item name MUST be unique. The following program fails to compile:

enum A {foo};
enum B {foo}; // error: redefinition of enumerator 'foo'

In case of conflict, a type name wins.


In theory, the input object file could have duplicate enum identifiers. It is possible with non-C languages or multiple C compilation units linked together with bpftool. It is a niche use case as of today.

Types are visited in alphabetical order, items in a enum are visited in BTF order which typically matches declaration order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants