Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add buf.registry.descriptor.v1 package #102

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

bufdev
Copy link
Member

@bufdev bufdev commented Mar 31, 2024

This adds the buf.registry.descriptor.v1 package, specifically containing the FileDescriptorProtoService. This service is meant to supplant the following constructs:

Some design decisions made:

  • This continues to lean into the ResourceRef model, letting users easily reference modules, labels, or commits in a single RPC. This prevents some difficulty with caching, but no more than the DownloadService, which serves a similar mechanism at a source-file level. We decided earlier on that it's better to make this a one-RPC operation than worry about caching here, although we can revisit that (quickly) for both the FileDescriptorProtos service and DownloadService if we want (I think in practice, the current shape serves us best).
  • This uses a FieldMask for the first time, in what I think is a clean usage of it. We don't like FieldMask in general, but in lieu of a better option, this feels like what we should go with. We could discuss having our own tag-based idea of field mask, but we can also always just deprecate this field anyways.
  • This leans into the google.protobuf types as opposed to continuing down the path that buf.alpha.image.v1.Image started with quasi-extending FileDescriptorSet/FileDescriptorProto in an unsupported way. It returns the same information via a sideband type FileDescriptorProtoExtension that is included on GetFileDescriptorProtosResponse, which feels more in line with the rest of our APIs in registry-proto.
  • This returns repeated FileDescriptorProto instead of FileDescriptorSet. The primary motivator for this was the FieldMask field, however it likely just makes more sense - FileDescriptorProtos can be put into i.e. CodeGeneratorRequests, and I haven't seen a usage of FileDescriptorSet in the wild that i.e. adds fields to FileDescriptorSet (which of course, outside of what we do with buf.alpha.v1.Image, is impossible as it is, and we didn't even do it in buf.alpha.v1.Image). Making this repeated FileDescriptorProto` feels more appropriate.
  • The FileDescriptorProtoExtension type uses commit_id instead of Commit. This is in line with Graph, where only a commit_id is returned, as opposed to Download where a Commit is returned. With FileDescriptorProtos as well as Graphs, the Commit may be duplicated 1000s of times, so an ID feels more proper here to prevent duplicated information in the common case (on Download, Commits could in theory be duplicated if two or more ResourceRefs point to the same Commit, however this is an edge case).
  • This uses the symbols terminology from reflect-proto instead of types. This one is more debatable.
  • This uses exclude_source_code_info instead of exclude_source_info as the equivalent flag is named on protoc and buf. The actual FileDescriptorProto field is named source_code_info, so this feels more proper. Back when I was writing buf, I debated naming the flag --exclude-source-code-info to be honest, but didn't so as to have more parity with protoc.

Note that I did contemplate whether we wanted to split out DownloadService, UploadService into a new file package, and potentially GraphService into a graph package (the latter being less desirable, as we could have graphs of other things i.e. plugins). That's a conversation I still want to briefly have - I feel the current setup leans us toward the module package being a mega-package, and you could make an argument that descriptor should fit within it as well.

@bufdev bufdev requested a review from nicksnyder March 31, 2024 20:29
@Alfus
Copy link

Alfus commented Mar 31, 2024

(tag/value based field_mask for thrift: https://github.com/facebook/fbthrift/blob/main/thrift/lib/thrift/field_mask.thrift)

// As opposed to syntax on a FileDescriptorProto, this field will be empty if no syntax was
// explicitly specified, and "proto2" if "proto2" was explicitly specified. Otherwise, this
// field matches the semantics of the syntax field on FileDescriptorProtos.
string specified_syntax = 3;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional design note - this was a bool syntax_specified in image.proto, but it seems clearer to make this the actual specified syntax, if it exists.

//
// SourceCodeInfo is optional additional information that is not required to describe a set
// of Protobuf APIs.
bool exclude_source_code_info = 3;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question whether this should be include or exclude - we decided for the HTTP GET endpoint to make this include.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, the decision on whether to call this include or exclude depends on whatever you want to be returned by default. If this should be returned by default, then exclude is right (opt-out). If it shouldn't be returned by default, then include is right (opt-in). To decide what the default would be, I would think about the anticipated use cases and whether a majority of those use cases would need or not need this info. (similarly for the other fields)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing tools against descriptors, I find myself constantly forgetting that the info is not returned by default and I need to set a flag somewhere in the chain of tools to ensure the info is available. I'd rather need to ask for it to be stripped as an optimization than have to remember to include it, personally.

This goes for the other opt-in/out questions below, as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including by default is more convenient for use cases like the CLI. Excluding by default is better for constrained environments, where things suddenly stop working because a giant module with many comments shows up.

Giant modules have other downsides. Including by default is the least surprising behavior in my book.

//
// With the new Editions work, options are either runtime-retention or source-retention.
// Setting this option will exclude the latter.
bool exclude_source_retention_options = 4;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question whether this should be include or exclude - we decided for the HTTP GET endpoint to make this include.

// FileDescriptorProtoExtensions contain additional information about FileDescriptorProtos.
// We include this information in a separate message as FileDescriptorProtos are not
// extendable, and we do not want to modify the FileDescriptorProto shape.
bool exclude_file_descriptor_proto_extensions = 5;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question whether this should be include or exclude.

Comment on lines +59 to +68
// The explicitly-specified syntax within the file.
//
// If the syntax field is not set within a file, this is interpreted as the syntax "proto2".
// The protoc compiler does not set the syntax field if the syntax was "proto2", regardless of if
// the syntax was explicitly specified or not. This prevents programs (such as linters) from
// determining whether or not a syntax was explicitly specified within a file
//
// As opposed to syntax on a FileDescriptorProto, this field will be empty if no syntax was
// explicitly specified, and "proto2" if "proto2" was explicitly specified. Otherwise, this
// field matches the semantics of the syntax field on FileDescriptorProtos.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the middle paragraph here a bit of a tangent that was confusing on first read. Wondering if we can resequence this documentation so the behavior is documented first, and then the "why" is last (if necessary at all).

Suggested change
// The explicitly-specified syntax within the file.
//
// If the syntax field is not set within a file, this is interpreted as the syntax "proto2".
// The protoc compiler does not set the syntax field if the syntax was "proto2", regardless of if
// the syntax was explicitly specified or not. This prevents programs (such as linters) from
// determining whether or not a syntax was explicitly specified within a file
//
// As opposed to syntax on a FileDescriptorProto, this field will be empty if no syntax was
// explicitly specified, and "proto2" if "proto2" was explicitly specified. Otherwise, this
// field matches the semantics of the syntax field on FileDescriptorProtos.
// The explicitly-specified syntax within the file.
// If the syntax field is not set within a file, this is interpreted as the syntax "proto2".
//
// As opposed to syntax on a FileDescriptorProto, this field will be empty if no syntax was
// explicitly specified, and "proto2" if "proto2" was explicitly specified. This enables programs
// (such as linters) to determine whether or not a syntax was explicitly specified within a file.
// Otherwise, this field matches the semantics of the syntax field on FileDescriptorProtos.

//
// SourceCodeInfo is optional additional information that is not required to describe a set
// of Protobuf APIs.
bool exclude_source_code_info = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, the decision on whether to call this include or exclude depends on whatever you want to be returned by default. If this should be returned by default, then exclude is right (opt-out). If it shouldn't be returned by default, then include is right (opt-in). To decide what the default would be, I would think about the anticipated use cases and whether a majority of those use cases would need or not need this info. (similarly for the other fields)

bool exclude_source_code_info = 3;
// Exclude source-retention options from the returned FileDescriptorProtos.
//
// With the new Editions work, options are either runtime-retention or source-retention.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// With the new Editions work, options are either runtime-retention or source-retention.
// With Protobuf Editions, options are either runtime-retention or source-retention.

Copy link
Member

@jhump jhump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/bufbuild/reflect-proto/blob/66bfea7a86e10b3cb421c4b90201b74becda519f/buf/reflect/v1beta1/file_descriptor_set.proto#L32 - this one is more debatable, but we never invested in this as an independent mechanism, so buf.registry.descriptor.v1 serves as the BSR-specific version of this (with additional features).

By supplant this, you mean we plan to remove the existing reflection endpoint? We had a blog post about it, and it's used by the prototransform module, the stand-alone knit gateway, and the vanguard-envoy filter. I guess that last one isn't public so definitely not used. The others may not be used much, but it seems likely to be non-zero.

Comment on lines +128 to +129
// This will match the semantics used within CodeGeneratorRequests.
google.protobuf.compiler.Version compiler_version = 3 [(buf.validate.field).required = true];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply we're doing something new here? I thought we used to always set the version to nil/unset since version of the Buf CLI weren't really comparable to versions of protoc.

// As opposed to syntax on a FileDescriptorProto, this field will be empty if no syntax was
// explicitly specified, and "proto2" if "proto2" was explicitly specified. Otherwise, this
// field matches the semantics of the syntax field on FileDescriptorProtos.
string specified_syntax = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to validate this with an in constraint?

//
// SourceCodeInfo is optional additional information that is not required to describe a set
// of Protobuf APIs.
bool exclude_source_code_info = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing tools against descriptors, I find myself constantly forgetting that the info is not returned by default and I need to set a flag somewhere in the chain of tools to ensure the info is available. I'd rather need to ask for it to be stripped as an optimization than have to remember to include it, personally.

This goes for the other opt-in/out questions below, as well.

// A symbol is a package, message, enum, service, method, or extension.
//
// Symbols are referenced by their fully-qualified name. For example, if "Foo" is a message within
// the package "foo.v1", the fully-qualified name is "foo.v1.Foo". Fully-qualfied names should
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the package "foo.v1", the fully-qualified name is "foo.v1.Foo". Fully-qualfied names should
// the package "foo.v1", the fully-qualified name is "foo.v1.Foo". Fully-qualified names should

// As opposed to syntax on a FileDescriptorProto, this field will be empty if no syntax was
// explicitly specified, and "proto2" if "proto2" was explicitly specified. Otherwise, this
// field matches the semantics of the syntax field on FileDescriptorProtos.
string specified_syntax = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for a proto file with syntax="proto2", or without a syntax specified, FileDescriptorProto.syntax is blank, and this field is "proto2".

What happens for a file with edition="2023"? FileDescriptorProto.syntax will be "editions" in this case, but syntax wasn't really specified. Is this field "", or "editions"?

// Get compiled FileDescriptorProtos for the given Module, Label, or Commit.
rpc GetFileDescriptorProtos(GetFileDescriptorProtosRequest) returns (GetFileDescriptorProtosResponse) {
option idempotency_level = NO_SIDE_EFFECTS;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late to the party, but this is the most important thing to me: the RPC should be side effect free, so we can expose it as an HTTP GET using Connect.

Copy link

The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed❌ failed (1)✅ passed✅ passedSep 13, 2024, 4:06 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants