Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add named role support #202

Merged
merged 11 commits into from
Jan 19, 2024
Merged

Conversation

mikemrm
Copy link
Contributor

@mikemrm mikemrm commented Nov 27, 2023

This adds support for giving roles names, along with tracking additional data such as the creator, created at and updated at.

To support this, a small database package has been created to handle interacting with the backend database and simplifying changes.

In addition to the database package, the http api has changed to support these new features.

  • All endpoints which return a role will now return the new information if such details exist.
  • POST /api/v1/resources/:id/roles now requires a name to be provided.
  • PATCH /api/v1/roles/:id has been added with support for updating an existing roles name and actions.

@mikemrm mikemrm force-pushed the named-roles-database branch 6 times, most recently from 3d57e8a to 80dde1f Compare November 30, 2023 15:11
@mikemrm mikemrm changed the title add permissions-api database and migrations for named roles add named role support Dec 1, 2023
@mikemrm mikemrm force-pushed the named-roles-database branch from ed155e2 to d9d8992 Compare December 1, 2023 21:03
@mikemrm mikemrm marked this pull request as ready for review December 1, 2023 21:23
@mikemrm mikemrm requested review from a team as code owners December 1, 2023 21:23
Copy link
Contributor

@jnschaeffer jnschaeffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping some early comments here regarding the Helm chart. I'll get to the code next.

@@ -0,0 +1,70 @@
{{- if has .Values.config.crdb.migrateHook (list "pre-sync" "manual") }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we anticipate needing or wanting to support these three kinds of migration workflows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manual is perhaps not needed. but pre-sync and init are useful.

pre-sync can't be used during the initial deployment as it will rely on configs and secrets that have to be synced first, and therefore won't be available. so you must use the init method. but after using the init method, you can switch to the pre-sync method which will ensure migrations only execute once per sync.

if having these options is not preferred we could get rid of it and just use init containers.

{{- else }}
generateName: migrate-database-
annotations:
argocd.argoproj.io/hook: PreSync
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think it's good to not assume any user of this is using Argo specifically.

Comment on lines +50 to +54
# migrateHook sets when to run database migrations. one of: pre-sync, init, manual
# - pre-sync: hook runs as a job before any other changes are synced.
# - init: is run as an init container to the server deployment and may run multiple times if replica count is high.
# - manual: a migrate-database job will be available to triggered manually
migrateHook: "init"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment re: Argo. I do like the concept, though!

Comment on lines +61 to +62
# password is the auth password to the database
password: ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something we shouldn't even let users set manually, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually like to expose all available configuration options, which this is an option that can be set in the crdbx config which is why I added it here.

Comment on lines +65 to +66
# uri is the raw uri connection string
uri: ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re: sensitive data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re same as above

Copy link
Contributor

@jnschaeffer jnschaeffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts.

db/migrations.go Outdated Show resolved Hide resolved
db/migrations.go Outdated Show resolved Hide resolved
internal/types/types.go Outdated Show resolved Hide resolved
internal/database/db.go Outdated Show resolved Hide resolved
internal/database/db.go Outdated Show resolved Hide resolved
internal/query/relations.go Outdated Show resolved Hide resolved
internal/query/relations.go Outdated Show resolved Hide resolved
internal/query/relations.go Outdated Show resolved Hide resolved
internal/query/relations.go Outdated Show resolved Hide resolved
Comment on lines 1051 to 1106
dbTx, err := e.db.DeleteRoleTransaction(ctx, roleResource.ID)
if err != nil {
// If the role doesn't exist, simply ignore.
if !errors.Is(err, database.ErrNoRoleFound) {
return err
}
} else {
// Setup rollback in case an error occurs before we commit.
defer dbTx.Rollback() //nolint:errcheck // error is logged in function
}

for _, filter := range filters {
if err = e.deleteRelationships(ctx, filter); err != nil {
return fmt.Errorf("failed to delete role action %s: %w", filter.OptionalResourceId, err)
err = fmt.Errorf("failed to delete role action %s: %w", filter.OptionalResourceId, err)

span.RecordError(err)
span.SetStatus(codes.Error, err.Error())

return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I attempt to update a role with new actions while it's being deleted? Reading this code, it looks possible that I wind up with relationships in SpiceDB but no corresponding CRDB entries. This might also, ironically, require a SELECT ... FOR UPDATE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now handled properly as we always have a transaction open before on the database. if two changes came in at the same time, one would have to wait until the database released the row.

@mikemrm mikemrm force-pushed the named-roles-database branch 3 times, most recently from 1f15a23 to 0a6b4ca Compare December 7, 2023 17:09
@mikemrm mikemrm requested a review from jnschaeffer December 7, 2023 17:16
Copy link
Contributor

@jnschaeffer jnschaeffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the tests yet but this looks broadly good. Some minor points below.

// check on the role resource.
resource, err := r.engine.GetRoleResource(ctx, roleResource)
if err != nil {
return echo.NewHTTPError(http.StatusBadRequest, "error getting resource").SetInternal(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sense as a 400? If the resource wasn't found, we probably want to return a 404, and if something else failed then to return a 500.

Actions []string `json:"actions"`

ResourceID gidx.PrefixedID `json:"resource_id,omitempty"`
Creator gidx.PrefixedID `json:"creator"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the earlier comment around this got resolved but it looks like I missed this one. As a side note, we could also make this CreatedBy/created_by instead; Creator/creator just feels off.

We also probably want a field denoting who last updated the role too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, good catch.

}

return role, nil
}

// UpdateRole returns nothing but satisfies the Engine interface.
func (e *Engine) UpdateRole(ctx context.Context, actor, roleResource types.Resource, newName string, newActions []string) (types.Role, error) {
return types.Role{}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm increasingly wary of mock interfaces (even though I'm pretty sure I wrote this one), but for the sake of completeness it likely makes sense to return the updated fields here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah fair nuff. i'll update this to use the mock support for returning values.

internal/query/relations.go Show resolved Hide resolved
Comment on lines 370 to 373
// If the name is the same, then clear the changed name.
if role.Name == newName {
newName = ""
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably just keep the name and pass it right through. It saves us a little complexity and ultimately almost all of the time is going to be spent in the DB regardless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. i'll update it to instead, if no name is provided (for example, when just updating actions) it will populate the name from the existing record so we can keep the query simple.

Comment on lines 429 to 455
if err := e.store.CommitContext(dbCtx); err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())

logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))

return types.Role{}, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here re: rolling back SpiceDB relationships. I'm not saying we need to roll them back, but behavior here should be defined, preferably as comments.

internal/storage/context.go Show resolved Hide resolved

func pqIsRoleAlreadyExistsError(err error) bool {
if pqErr, ok := err.(*pq.Error); ok {
return pqErr.Code.Name() == "unique_violation" && pqErr.Constraint == "roles_pkey"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little suspicious of tying our error detection logic to constraint names in the DB. Is there a better way to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of a better way to handle error detection. Since we own the database and migrations, we know the names of the indexes. We could make these more configurable or make them discovered on start but I'm not sure that's necessary at this point.

Alternatively we could just return less descriptive errors but I would like to avoid that.


func pqIsRoleNameTakenError(err error) bool {
if pqErr, ok := err.(*pq.Error); ok {
return pqErr.Code.Name() == "unique_violation" && pqErr.Constraint == "roles_resource_id_name"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re: constraints.

Comment on lines 204 to 216
// If no name is provided, only update the updated_at timestamp.
if name == "" {
row = tx.QueryRowContext(ctx, `
UPDATE roles SET updated_at = now() WHERE id = $1
RETURNING id, name, resource_id, creator_id, created_at, updated_at
`, roleID.String())
} else {
row = tx.QueryRowContext(ctx, `
UPDATE roles SET name = $1, updated_at = now() WHERE id = $2
RETURNING id, name, resource_id, creator_id, created_at, updated_at
`, name, roleID.String(),
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before, we can probably make this less complex by just accepting whatever role name we get.

@mikemrm mikemrm force-pushed the named-roles-database branch from 11fa1d6 to 291ad8e Compare December 13, 2023 12:47
@mikemrm mikemrm requested a review from jnschaeffer December 13, 2023 12:55

logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))

// TODO: add spicedb rollback logic.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens without this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The role actions are persisted (and will be visible to the user), but role metadata changes are not. While we could attempt to roll back the changes to SpiceDB, that operation itself could fail, meaning we would be in the same situation regardless.

We should probably spell that out here in the code, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that clarification, I'll get a version of this added to the code.

Copy link
Contributor

@jnschaeffer jnschaeffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good - mostly some notes on documentation and tests.

Comment on lines 333 to 334
old := make(map[string]bool, len(oldActions))
new := make(map[string]bool, len(newActions))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change this, but if you want to save a few bytes on the heap, struct{} can be used here along with the _, ok := myMap["some_key"] pattern.


logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))

// TODO: add spicedb rollback logic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The role actions are persisted (and will be visible to the user), but role metadata changes are not. While we could attempt to roll back the changes to SpiceDB, that operation itself could fail, meaning we would be in the same situation regardless.

We should probably spell that out here in the code, though.

relationships, err := e.readRelationships(ctx, filter)
if err != nil {
return nil, err
}

out := relationshipsToRoles(relationships)

rolesByID := make(map[string]storage.Role, len(dbRoles))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a silly question, but why not just have a map[gidx.PrefixedID]storage.Role if we just convert every key to a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 that's a good question... i'm not sure why I did that... It definitely doesn't need to be converted. Thanks!

span.RecordError(err)
span.SetStatus(codes.Error, err.Error())

logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably document what happens here too if this operation fails. The delete from the DB will fail and be rolled back, but the role will have no associated actions, which is probably fine.

That said, it looks like maybe roles can be deleted even if they have assignments? If that's the case, that's not great, but let's solve that in a follow-up PR rather than make this one bigger since it's not new behavior.

Name: "UpdateMissingRole",
Input: gidx.MustNewID(RolePrefix),
CheckFn: func(ctx context.Context, t *testing.T, res testingx.TestResult[types.Role]) {
assert.Error(t, res.Err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to check what the error is here, and not just that an error occurred.

Input: role.ID,
CheckFn: func(ctx context.Context, t *testing.T, res testingx.TestResult[types.Role]) {
assert.NoError(t, res.Err)
require.Equal(t, "test2", res.Success.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be good to also check the other attributes here: update time, actions, and so on.

Comment on lines +34 to +36
if !testCase.Sync {
t.Parallel()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We almost certainly want to ensure our test cases are run concurrently. This is a good way to detect race conditions in Go code.

Copy link

@alexhall alexhall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to store the role actions in CRDB and anoint that as a source of truth for role definitions? If I were the one on PagerDuty rotation for this system I'd be pretty nervous about the brittleness of coordinating transactions between different databases and the risk of things getting into an inconsistent state, and having a single source of truth at least gives you a starting point for restoring the system to a known-good state when the need arises.

-- create "roles" table
CREATE TABLE "roles" (
"id" character varying NOT NULL,
"name" character varying(64) NOT NULL,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for setting this limit at the storage level? In my experience, doing this is just setting yourself up for running future database migrations to increase the limit - 64 characters is pretty low for something you intend to expose to users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64 characters seems to be a common limit. This field is part of a unique index and needs to have a limit to ensure performance.

@jnschaeffer
Copy link
Contributor

Would it make sense to store the role actions in CRDB and anoint that as a source of truth for role definitions? If I were the one on PagerDuty rotation for this system I'd be pretty nervous about the brittleness of coordinating transactions between different databases and the risk of things getting into an inconsistent state, and having a single source of truth at least gives you a starting point for restoring the system to a known-good state when the need arises.

@alexhall I missed this before, sorry. I might be wrong here, but if we have actions stored in CRDB it doesn't seem like that fixes the problems that come from coordinating transactions between both systems. If a commit fails in CRDB but we already persisted actions to SpiceDB, for example, we'll now be in an inconsistent state (i.e., SpiceDB will have a different view of actions in a role from what CRDB shows). As written, we at least keep the actions in only one place.

@alexhall
Copy link

alexhall commented Jan 9, 2024

Would it make sense to store the role actions in CRDB and anoint that as a source of truth for role definitions? If I were the one on PagerDuty rotation for this system I'd be pretty nervous about the brittleness of coordinating transactions between different databases and the risk of things getting into an inconsistent state, and having a single source of truth at least gives you a starting point for restoring the system to a known-good state when the need arises.

@alexhall I missed this before, sorry. I might be wrong here, but if we have actions stored in CRDB it doesn't seem like that fixes the problems that come from coordinating transactions between both systems. If a commit fails in CRDB but we already persisted actions to SpiceDB, for example, we'll now be in an inconsistent state (i.e., SpiceDB will have a different view of actions in a role from what CRDB shows). As written, we at least keep the actions in only one place.

In this situation, I wouldn't be writing to SpiceDB until after the write to CRDB succeeds. If the write to SpiceDB fails, that can be retried, and possibly even reconciled in the background, but at least I'd know that I have persisted in CRDB a consistent view of how the roles and actions should be. In other words, treat SpiceDB as a query-optimized view used for evaluating access checks, accept that data will get out of sync between CRDB and SpiceDB, and optimize for your ability to recover from that situation. I don't know how well that aligns with your existing architecture, but that's what jumps out to me at first glance when looking at this PR.

@mikemrm mikemrm requested a review from jnschaeffer January 11, 2024 21:42
@mikemrm mikemrm force-pushed the named-roles-database branch 2 times, most recently from efd9772 to 784860c Compare January 11, 2024 21:59
@jnschaeffer
Copy link
Contributor

Would it make sense to store the role actions in CRDB and anoint that as a source of truth for role definitions? If I were the one on PagerDuty rotation for this system I'd be pretty nervous about the brittleness of coordinating transactions between different databases and the risk of things getting into an inconsistent state, and having a single source of truth at least gives you a starting point for restoring the system to a known-good state when the need arises.

@alexhall I missed this before, sorry. I might be wrong here, but if we have actions stored in CRDB it doesn't seem like that fixes the problems that come from coordinating transactions between both systems. If a commit fails in CRDB but we already persisted actions to SpiceDB, for example, we'll now be in an inconsistent state (i.e., SpiceDB will have a different view of actions in a role from what CRDB shows). As written, we at least keep the actions in only one place.

In this situation, I wouldn't be writing to SpiceDB until after the write to CRDB succeeds. If the write to SpiceDB fails, that can be retried, and possibly even reconciled in the background, but at least I'd know that I have persisted in CRDB a consistent view of how the roles and actions should be. In other words, treat SpiceDB as a query-optimized view used for evaluating access checks, accept that data will get out of sync between CRDB and SpiceDB, and optimize for your ability to recover from that situation. I don't know how well that aligns with your existing architecture, but that's what jumps out to me at first glance when looking at this PR.

I think this idea has merit, though perhaps for different reasons: storing the actions in CRDB lets us have more flexibility around how we present them (i.e., changing the action names, adding descriptions, etc).

That said, would it be worth putting this in a separate PR? This one is already at ~2k LOC and we probably want to give more consideration to how we define actions in general for reasons beyond DB transactions.

Database package along with migrations and migrate command giving
permissions-api it's own database to store details into.

Initial support is for Roles Get, Create, Update and Delete.

Signed-off-by: Mike Mason <[email protected]>
This integrates the new database which contains role metadata into the
query engine as well as updates the http api to expose this new
information.

Signed-off-by: Mike Mason <[email protected]>
Updated CreateRole, UpdateRole and DeleteRole to CreateRoleTransaction,
UpdateRoleTransaction and DeleteRoleTransaction to make it more clear
that a transaction is being started.

Additionally, the comments on these methods have been updated to include
statements that Commit or Rollback must be called to ensure the database
lifts all locks on rows which are affected.

Previously, the method names made it appear as though the action was
taken and completed. However this could lead to hung connections and
rows.

The new names make it clear that a new transaction is being started.
The returning struct only has two methods Commit and Rollback in
addition to the Record attribute resulting in a simple structure.

Signed-off-by: Mike Mason <[email protected]>
@mikemrm mikemrm force-pushed the named-roles-database branch from 784860c to ecd30fe Compare January 16, 2024 19:49
Since we're working with multiple backends, this allows us to place a
lock early and ensure a separate request doesn't conflict with an
in-flight change.

Signed-off-by: Mike Mason <[email protected]>
@mikemrm mikemrm force-pushed the named-roles-database branch 6 times, most recently from de5f8a2 to da03164 Compare January 18, 2024 16:13
@mikemrm mikemrm force-pushed the named-roles-database branch from da03164 to f2d105f Compare January 18, 2024 19:18
Copy link
Contributor

@jnschaeffer jnschaeffer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - at this point I think we're at least in a good enough state to end-to-end test this and see how it shakes out before cutting a release.

@mikemrm mikemrm merged commit 28ffea5 into infratographer:main Jan 19, 2024
4 checks passed
@mikemrm mikemrm deleted the named-roles-database branch January 19, 2024 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants