Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] unique repeated messages by field #92

Open
derekperkins opened this issue Sep 18, 2023 · 6 comments
Open

[Feature Request] unique repeated messages by field #92

derekperkins opened this issue Sep 18, 2023 · 6 comments
Labels
Feature New feature or request

Comments

@derekperkins
Copy link

Feature description:
There's already a built in rule for scalar uniqueness in repeated fields. Most repeated messages have an id field that defines uniqueness, that could be used.
https://github.com/bufbuild/protovalidate/blob/main/proto/protovalidate/buf/validate/validate.proto#L3199-L3213

Proposed implementation or solution:

This isn't valid CEL, just a rough description of what I'm expecting

  optional bool unique = 3 [(priv.field).cel = {
    id: "repeated.unique_by_field"
    message: "repeated value must contain unique items, identified by an id field"
    expression: "this.unique(this.unique_id)"
  }];

Contribution:
Minimal time to contribute currently

@derekperkins derekperkins added the Feature New feature or request label Sep 18, 2023
@RoxKilly
Copy link

RoxKilly commented Oct 18, 2023

@derekperkins what you're proposing is meant to support this right? (copied from protoc-gen-validate issue)

Suppose I have an array of messages with field id.

message Student {
  string id = 1;
}

message Class {
 // Could your suggestion be used to validate that students[n].id is not repeated within this array?
  repeated Student students = 1; 
}

Have you found a way to get this behavior with the current API?

@derekperkins
Copy link
Author

Yes, this is exactly the use case I'm suggesting. No, I haven't solved this today with protovalidate, but I'm interested to hear back if you find a good workaround

@rodaine
Copy link
Member

rodaine commented Oct 30, 2023

You may be able to get away with this using what's available in CEL today on Class.students:

this.map(student, student.id).unique()

This uses the e.map CEL standard macro and our unique custom function.

@matthewpi
Copy link

You may be able to get away with this using what's available in CEL today on Class.students:

this.map(student, student.id).unique()

This uses the e.map CEL standard macro and our unique custom function.

I was able to get this to work by specifying the CEL expression on a message, rather than the repeated field. Here is a code sample:

message Environment {
	option (buf.validate.message).cel = {
		id: "repeated.unique_by_field"
		message: "repeated value must contain unique items, identified by the name field"
		expression: "this.variables.map(v, v.name).unique()"
	};

	// Variables to use within the environment. Each variable must have a unique `name` field.
	repeated EnvVar variables = 1;
//	[(buf.validate.field).cel = {
//		id: "repeated.unique_by_field"
//		message: "repeated value must contain unique items, identified by the name field"
//		expression: "this.map(v, v.name).unique()"
//	}];
}

message EnvVar {
	// Name of the environment variable. Must be unique.
	string name = 1;
	// Value of the environment variable.
	string value = 2;
}

The buf.validate.message CEL expression works like a dream, returning the following violation.

"violations": [
	{
		"fieldPath": "environment",
		"constraintId": "repeated.unique_by_field",
		"message": "repeated value must contain unique items, identified by the name field"
	}
]

However, as you might have noticed. The fieldPath is not what is wanted in this scenario. While the expression is being ran on environment, that expression only ever refers to the variables field, making it more difficult to properly match validation errors to the exact field they are related to.

If we use the buf.validate.field CEL rule instead of the one on the message, we hit an issue with the way CEL expressions are processed on repeated fields. It seems that when you manually specify a CEL expression on a repeated field, it is processed on each item within the associated array.

compilation error: failed to compile embedded type Environment for UpdateRequest.environment: compilation error: failed to compile expression repeated.unique_by_field: ERROR: \u003cinput\u003e:1:1: expression of type 'EnvVar' cannot be range of a comprehension (must be list, map, or dynamic)

| this.map(v, v.name).unique()
| ^

Assuming that the issue preventing the expression from being evaluated on the repeated field itself is resolved, there is just one other issue. Which is handling what field needs to be unique. The constraint ID can be used for localization, but right now it is just a generic repeated.unique_by_field which doesn't indicate any information regarding what field was used for the unique validation. There are two issues here, first is the fieldPath not being specific enough on what indexes failed the unique check (and the associated field potentially), and the constraintId not having a proper way of communicating what field the unique check is being ran on. Right now the best solution is to either have your message contain the field name, or if you need proper localization, to add the field name to the constraint key manually. But ideally there should be a proper solution for handling things like this in the future.

@mivanovcpd
Copy link

mivanovcpd commented Nov 16, 2023

After looking at different ways of doing this, could we perhaps have an annotation to mark unique fields:

message Student {
  string first_name = 1;
  string last_name = 2;
  option (buf.validate.message).unique.keys = ["first_name","last_name"];
}
message StudentList {
  repeated Student students = 1 [(buf.validate.field).repeated.unique = true];
}

We could set a restriction on the types of the fields in unique.keys. Say it would be simplest if we only allow:

  • allow scalar(optional too), Message(provided it has (buf.validate.message).unique.keys option)
    Note: it's very important to allow optional here as it's crucial for many validations where you need to differentiate between missing and default value
  • do not allow: oneof, repeated, Message without (buf.validate.message).unique.keys option

Do you think this could work?

@rodaine
Copy link
Member

rodaine commented Nov 16, 2023

compilation error: failed to compile embedded type Environment for UpdateRequest.environment: compilation error: failed to compile expression repeated.unique_by_field: ERROR: \u003cinput\u003e:1:1: expression of type 'EnvVar' cannot be range of a comprehension (must be list, map, or dynamic)

@matthewpi, I think this is a bug on protovalidate-go's part (likely in the type checking). This should have worked. I'll take a look and see if I can repro and add some tests for it. [Edit: protovalidate-go#80]


There are two issues here, first is the fieldPath not being specific enough on what indexes failed the unique check (and the associated field potentially), and the constraintId not having a proper way of communicating what field the unique check is being ran on.

@matthewpi, there's #17 and #125 which aim to make it easier to get at the underlying info of a violation error. That said, with a custom CEL expression, you have control over the id and error message and can even construct it within the expression to use string formatting against the variables provided:

repeated EnvVar variables = 1 [(buf.validate.field).cel = {
  id: "env.vars.unique", // the ID can be very specific to the field in question
  expression: "this.map(v, v.name).unique() ? '' : 'variables are expected to be unique, got %s'.format([this.map(v, v.name)])"
}

We intentionally do not include any data in the standard constraint error messages or as part of the violation error as there are security and PII concerns with echoing potentially sensitive data. You are free to do so in custom expressions however, and we're planning to address the linked issues above in the near term.


We could set a restriction on the types of the fields in unique.keys.

@mivanovcpd, a multi-field uniqueness comparison can be problematic for languages that don't have equivalents of Java's equals and hashCode overrides. For protobufs in Go, for instance, you cannot use msg1 == msg2 as Go only performs address comparison for pointer types and further the messages contain internal state that may be different between messages even if their fields are identical (so the protobuf library includes a proto.Equal which is used for comparison). Likewise, Go does not allow overriding the hash function of a type. This means we cannot use a map to make the uniqueness check O(n) as we'll need to explicitly proto.Equal each member of the repeated field to the others (which would run in O(n^2)). Alternatively, especially with a subset of fields, we'd need to write a custom hashing utility explicitly for this purpose if we wanted to maintain an O(n) runtime.

That said, if we can determine a way to make this universally efficient and not a performance landmine, either in the case of uniqueness across the entire message or a subset of fields (using a FieldMask WKT for this would offer the most flexibility), I'm open to adding such a standard constraint.

rodaine added a commit to bufbuild/protovalidate-go that referenced this issue Nov 17, 2023
The way we were resolving the CEL types for type-checking expressions
was inconsistent between custom and standard constraints. Resulting in
compilation errors (particularly around repeated fields). Logic shared
between the two (mostly lookups) was moved into the `expression`
internal package and used uniformly for both environments.

This also improves on a previously discovered bug around the Any WKT
where custom expressions against such a field would fail with a runtime
error if its underlying type was not known to CEL (CEL treats Any's as
the underlying type, instead of the Any message itself). The standard
constraints on Any do not have this limitation. We are populating the
root CEL environment with `protoregistry.GlobalFiles` for now, but will
likely make this configurable in the long-run.

Context:
bufbuild/protovalidate#92 (comment)
(h/t @matthewpi)
igor-tsiglyar pushed a commit to igor-tsiglyar/protovalidate that referenced this issue Apr 16, 2024
For repeated items, CEL types were being miscalculated for variables in
the expression resulting in type-check/compilation errors

Fixes bufbuild#91
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants