feat: filemanager multiple same-key queries #762

Merged
merged 4 commits into from
Dec 9, 2024
2,445 changes: 986 additions & 1,459 deletions lib/workload/stateless/stacks/filemanager/Cargo.lock

Large diffs are not rendered by default.

22 changes: 17 additions & 5 deletions lib/workload/stateless/stacks/filemanager/docs/API_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -141,14 +141,26 @@ curl --get -H "Authorization: Bearer $TOKEN" --data-urlencode "attributes[portal

## Multiple keys

The API supports querying using multiple keys with the same name. This represents an `or` condition in the SQL query, where
The API supports querying using multiple keys with the same name. This represents an `or` condition in the SQL query by default, where
records are fetched if any of the keys match. For example, the following finds records where the bucket is either `bucket1`
or `bucket2`:

```sh
curl -H "Authorization: Bearer $TOKEN" "https://file.dev.umccr.org/api/v1/s3?bucket[]=bucket1&bucket[]=bucket2" | jq
```

To be more explicit, pass in `or` as a keyword when querying. For example, the following is equivalent:

```sh
curl -H "Authorization: Bearer $TOKEN" "https://file.dev.umccr.org/api/v1/s3?bucket[or][]=bucket1&bucket[or][]=bucket2" | jq
```

To express an `and` condition in the SQL query instead, use the `and` keyword:

```sh
curl -H "Authorization: Bearer $TOKEN" "https://file.dev.umccr.org/api/v1/s3?bucket[and][]=bucket1&bucket[and][]=bucket2" | jq
```

Multiple keys are also supported on attributes. For example, the following finds records where the `portalRunId` is
either `20240521aecb782` or `20240521aecb783`:

Expand All @@ -159,9 +171,10 @@ curl --get -H "Authorization: Bearer $TOKEN" \
"https://file.dev.umccr.org/api/v1/s3" | jq
```

Note that the extra `[]` is required in the query parameters to specify multiple keys with the same name. Specifying
multiple of the same key without `[]` results in an error. It is also an error to specify some keys with `[]` and some
without for keys with the same name.
Note that the extra `[]` is required in the query parameters to specify multiple keys with the same name, including
when `or` or `and` is given explicitly. Repeating a key without `[]` results in an error, as does using `[]` on some
occurrences of a key but not on others.
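
As a rough illustration of the parameter shapes described above, the following Rust sketch builds the query string for a repeated key. `Join` and `same_key_query` are hypothetical names made up for this example and are not part of the filemanager codebase. Values are assumed to need no percent-encoding.

```rust
/// How values for a repeated key are combined. `Join` and `same_key_query`
/// are hypothetical names for this sketch, not part of the filemanager API.
#[derive(Clone, Copy)]
enum Join {
    Or,
    And,
}

/// Build the query-string fragment for one key with several values.
/// `None` leaves the keyword out, which the guide describes as `or` by default.
/// Assumption: values are already URL-safe, so no percent-encoding is applied.
fn same_key_query(key: &str, join: Option<Join>, values: &[&str]) -> String {
    let keyword = match join {
        Some(Join::Or) => "[or]",
        Some(Join::And) => "[and]",
        None => "",
    };
    values
        .iter()
        // The trailing `[]` is always present, with or without a keyword.
        .map(|value| format!("{key}{keyword}[]={value}"))
        .collect::<Vec<_>>()
        .join("&")
}
```

Under these assumptions, `same_key_query("bucket", Some(Join::And), &["bucket1", "bucket2"])` yields `bucket[and][]=bucket1&bucket[and][]=bucket2`, and passing `None` yields the default form `bucket[]=bucket1&bucket[]=bucket2`.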

## Updating records

Expand Down Expand Up @@ -226,7 +239,6 @@ curl -H "Authorization: Bearer $TOKEN" "https://file.dev.umccr.org/api/v1/s3/pre
There are some missing features in the query API which are planned, namely:

* There is no way to compare values with `>`, `>=`, `<`, `<=`.
* There is no way to express `and` or `or` conditions in the API (except for multiple keys representing `or` conditions).

There are also some features missing for attribute linking. For example, there is no way
to capture matching wildcard groups which can later be used in the JSON patch body.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,5 @@ tracing = { version = "0.1" }
axum = "0.7"

lambda_http = "0.13"
lambda_runtime = "0.13"

filemanager = { path = "../filemanager" }
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,6 @@ axum = "0.7"
dotenvy = "0.15"
http = "1"
clap = { version = "4", features = ["derive", "env"] }
sea-orm = { version = "1.1.0-rc.1", default-features = false, features = ["sqlx-postgres", "runtime-tokio-rustls"] }
sea-orm = { version = "1.1.2", default-features = false, features = ["sqlx-postgres", "runtime-tokio-rustls"] }

filemanager = { path = "../filemanager", features = ["migrate"] }
Original file line number Diff line number Diff line change
Expand Up @@ -7,14 +7,13 @@ authors.workspace = true
rust-version.workspace = true

[dependencies]
thiserror = "1"
thiserror = "2"
clap_builder = "4"
clap = "4"
dotenvy = "0.15"
sea-orm-cli = { version = "1.1.0-rc.1", default-features = false, features = ["cli", "codegen", "runtime-tokio-rustls"] }
tokio = { version = "1", features = ["macros", "rt-multi-thread", "process"] }
miette = { version = "7", features = ["fancy"] }
serde = { version = "1", features = ["derive"] }
quote = "1"
syn = { version = "2", features = ["full", "extra-traits", "parsing", "visit-mut"] }
prettyplease = "0.2"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,31 +10,44 @@ use crate::Result;
use heck::AsPascalCase;
use prettyplease::unparse;
use quote::format_ident;
use std::collections::HashMap;
use std::fs::{read_dir, read_to_string, write};
use std::path::Path;
use syn::visit_mut::VisitMut;
use syn::{parse_file, parse_quote, Ident, ItemStruct};
use syn::{parse_file, parse_quote, Ident, ItemStruct, Type};
use tokio::process::Command;

/// OpenAPI definition generator implementing `VisitMut`.
#[derive(Debug)]
pub struct GenerateOpenAPI<'a> {
model_ident: &'a Ident,
override_types: &'a HashMap<Type, Type>,
name: &'a str,
}

impl<'a> VisitMut for GenerateOpenAPI<'a> {
impl VisitMut for GenerateOpenAPI<'_> {
fn visit_item_struct_mut(&mut self, i: &mut ItemStruct) {
if &i.ident == self.model_ident {
let path_ident: Ident = format_ident!("{}", self.name);
i.attrs.push(parse_quote! { #[schema(as = #path_ident)] });
}

i.fields.iter_mut().for_each(|field| {
if self.override_types.contains_key(&field.ty) {
field.ty = self.override_types[&field.ty].clone();
}
})
}
}

/// Generate OpenAPI utoipa definitions on top of the sea-orm entities.
pub async fn generate_openapi(out_dir: &Path) -> Result<()> {
let model_ident: Ident = parse_quote! { Model };
let override_types: HashMap<Type, Type> = HashMap::from_iter(vec![(
parse_quote! { Option<DateTimeWithTimeZone> },
parse_quote! { Option<chrono::DateTime<chrono::FixedOffset>> },
)]);

for path in read_dir(out_dir)? {
let path = path?.path();

Expand All @@ -54,6 +67,7 @@ pub async fn generate_openapi(out_dir: &Path) -> Result<()> {

GenerateOpenAPI {
model_ident: &model_ident,
override_types: &override_types,
name,
}
.visit_file_mut(&mut tokens);
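
The field-type override added to `GenerateOpenAPI` above can be modelled without `syn`. The sketch below is a simplified stand-in, not the generator itself: types are plain strings rather than `syn::Type` values, and `apply_overrides` and `sample_overrides` are hypothetical helpers. It shows that the lookup is an exact match on the whole type, and that a single `get` can replace the `contains_key` plus index pair.

```rust
use std::collections::HashMap;

/// Replace each field type that exactly matches a key in `overrides`.
/// Types are plain strings here; the real code compares `syn::Type` values.
fn apply_overrides(fields: &mut [String], overrides: &HashMap<String, String>) {
    for ty in fields.iter_mut() {
        // One lookup with `get` instead of `contains_key` followed by indexing.
        if let Some(replacement) = overrides.get(ty.as_str()) {
            *ty = replacement.clone();
        }
    }
}

/// The single override from the diff, written as strings for this sketch.
fn sample_overrides() -> HashMap<String, String> {
    HashMap::from([(
        "Option<DateTimeWithTimeZone>".to_string(),
        "Option<chrono::DateTime<chrono::FixedOffset>>".to_string(),
    )])
}
```

In this model a field typed as bare `DateTimeWithTimeZone` is left untouched, because only the full `Option<DateTimeWithTimeZone>` key is present in the map.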
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@ rust-version.workspace = true

[dependencies]
tokio = { version = "1", features = ["macros"] }
tracing = { version = "0.1" }

aws_lambda_events = "0.15"
lambda_runtime = "0.13"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,6 @@ rust-version.workspace = true
[dependencies]
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["macros"] }
tracing = { version = "0.1" }

lambda_runtime = "0.13"

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,6 @@ authors.workspace = true
rust-version.workspace = true

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"

tokio = { version = "1", features = ["macros"] }
tracing = { version = "0.1" }

Expand All @@ -18,7 +15,3 @@ aws-sdk-cloudformation = "1"
lambda_runtime = "0.13"

filemanager = { path = "../filemanager", features = ["migrate"] }

[dev-dependencies]

serde_json = "1.0"
28 changes: 12 additions & 16 deletions lib/workload/stateless/stacks/filemanager/filemanager/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ tracing-subscriber = { version = "0.3", default-features = false, features = ["f

# Database
sqlx = { version = "0.8", default-features = false, features = ["postgres", "runtime-tokio", "tls-rustls", "chrono", "uuid", "macros"] }
sea-orm = { version = "1.1.0-rc.1", default-features = false, features = [
sea-orm = { version = "1.1", default-features = false, features = [
"sqlx-postgres",
"runtime-tokio-rustls",
"macros",
Expand All @@ -39,36 +39,36 @@ strum = { version = "0.26", features = ["derive"] }
# Query server
axum = "0.7"
axum-extra = "0.9"
utoipa = { version = "4", features = ["axum_extras", "chrono", "uuid", "url"] }
utoipa-swagger-ui = { version = "7", features = ["axum", "debug-embed", "url"] }
tower = "0.4"
tower-http = { version = "0.5", features = ["trace", "cors"] }
utoipa = { version = "5", features = ["axum_extras", "chrono", "uuid", "url"] }
utoipa-swagger-ui = { version = "8", features = ["axum", "debug-embed", "url"] }
tower = { version = "0.5", features = ["util"] }
tower-http = { version = "0.6", features = ["trace", "cors"] }
serde_qs = { version = "0.13", features = ["axum"] }
json-patch = "2"
json-patch = "3"

# General
chrono = { version = "0.4", features = ["serde"] }
thiserror = "1"
thiserror = "2"
uuid = { version = "1", features = ["v7"] }
mockall = "0.13"
mockall_double = "0.3"
itertools = "0.13"
url = { version = "2", features = ["serde"] }
bytes = "1.6"
envy = "0.4"
rand = "0.8"
parse-size = "1"
humantime = "2"
percent-encoding = "2"

# Inventory
csv = "1"
flate2 = "1"
md5 = "0.7"
hex = "0.4"
parquet = { version = "52", features = ["async"] }
arrow = { version = "52", features = ["chrono-tz"] }
arrow-json = "52"
orc-rust = "0.3"
parquet = { version = "53", features = ["async"] }
arrow = { version = "53", features = ["chrono-tz"] }
arrow-json = "53"
orc-rust = "0.5"

# AWS
aws-sdk-sqs = "1"
Expand All @@ -81,7 +81,6 @@ aws_lambda_events = "0.15"

[dev-dependencies]
lazy_static = "1"
percent-encoding = "2"

aws-smithy-runtime-api = "1"
aws-smithy-mocks-experimental = "0.2"
Expand All @@ -91,7 +90,4 @@ aws-sdk-s3 = { version = "1", features = ["test-util"] }
filemanager = { path = ".", features = ["migrate"] }

[build-dependencies]
filemanager-build = { path = "../filemanager-build" }
miette = { version = "7", features = ["fancy"] }
tokio = { version = "1", features = ["macros"] }
dotenvy = "0.15"
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.0-rc.1
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.2
pub mod prelude;
pub mod s3_object;
pub mod sea_orm_active_enums;
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.0-rc.1
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.2
pub use super::s3_object::Entity as S3Object;
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.0-rc.1
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.2
use super::sea_orm_active_enums::EventType;
use super::sea_orm_active_enums::StorageClass;
use sea_orm::entity::prelude::*;
Expand All @@ -19,11 +19,11 @@ pub struct Model {
pub key: String,
#[sea_orm(column_type = "Text")]
pub version_id: String,
pub event_time: Option<DateTimeWithTimeZone>,
pub event_time: Option<chrono::DateTime<chrono::FixedOffset>>,
pub size: Option<i64>,
#[sea_orm(column_type = "Text", nullable)]
pub sha256: Option<String>,
pub last_modified_date: Option<DateTimeWithTimeZone>,
pub last_modified_date: Option<chrono::DateTime<chrono::FixedOffset>>,
#[sea_orm(column_type = "Text", nullable)]
pub e_tag: Option<String>,
pub storage_class: Option<StorageClass>,
Expand All @@ -33,7 +33,7 @@ pub struct Model {
pub number_duplicate_events: i64,
#[sea_orm(column_type = "JsonBinary", nullable)]
pub attributes: Option<Json>,
pub deleted_date: Option<DateTimeWithTimeZone>,
pub deleted_date: Option<chrono::DateTime<chrono::FixedOffset>>,
#[sea_orm(column_type = "Text", nullable)]
pub deleted_sequencer: Option<String>,
pub number_reordered: i64,
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.0-rc.1
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.2
use sea_orm::entity::prelude::*;
use serde::{Deserialize, Serialize};
#[derive(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -303,7 +303,7 @@ impl<'a> Collecter<'a> {

// Get the attributes from the old record to update the new record with.
let filter = S3ObjectsFilter {
ingest_id: vec![ingest_id],
ingest_id: vec![ingest_id].into(),
..Default::default()
};
let moved_object = ListQueryBuilder::new(database_client.connection_ref())
Expand Down Expand Up @@ -362,7 +362,7 @@ impl From<BuildError> for Error {
}

#[async_trait]
impl<'a> Collect for Collecter<'a> {
impl Collect for Collecter<'_> {
async fn collect(mut self) -> Result<EventSource> {
let (client, database_client, events, config) = self.into_inner();
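
The `vec![ingest_id].into()` change in the hunk above suggests that the filter fields now hold a wrapper type that can be built from a plain `Vec`. The definition of that type is not shown in this diff, so the following is only a sketch under assumptions: `JoinedFilter` and `ExampleFilter` are hypothetical names, `u32` stands in for the real ingest id type, and matching is by exact equality.

```rust
/// Hypothetical filter wrapper: the values for one field plus how to combine them.
#[derive(Debug, Clone, PartialEq)]
enum JoinedFilter<T> {
    Or(Vec<T>),
    And(Vec<T>),
}

impl<T> Default for JoinedFilter<T> {
    fn default() -> Self {
        JoinedFilter::Or(Vec::new())
    }
}

/// A plain `Vec` converts to the default `or` behaviour, so call sites
/// can keep writing `vec![value].into()`.
impl<T> From<Vec<T>> for JoinedFilter<T> {
    fn from(values: Vec<T>) -> Self {
        JoinedFilter::Or(values)
    }
}

impl<T: PartialEq> JoinedFilter<T> {
    /// Exact-match check against one candidate. An empty filter matches everything.
    fn matches(&self, candidate: &T) -> bool {
        match self {
            JoinedFilter::Or(values) => {
                values.is_empty() || values.iter().any(|v| v == candidate)
            }
            JoinedFilter::And(values) => values.iter().all(|v| v == candidate),
        }
    }
}

/// Stand-in for a filter struct with defaulted fields (assumed shape).
#[derive(Debug, Default)]
struct ExampleFilter {
    ingest_id: JoinedFilter<u32>,
    bucket: JoinedFilter<String>,
}
```

With exact equality, an `And` over two distinct values can never match a single record, so the `and` keyword is presumably aimed at values that can overlap, such as wildcard patterns, which this sketch does not model.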

Expand Down