Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tasks based ingestion #11

Merged
merged 35 commits into from
Dec 14, 2023
Merged

Conversation

vzayts
Copy link

@vzayts vzayts commented Dec 6, 2023

This pull request adds 2 new API services:

  • Tasks service for druid ingestion.
  • Metadata service for queries to druid metadata (methods added are helper methods for integrations tests, taken from our proprietary code).

Tasks Service adds a different mode of ingestion - task-based, which is different from supervisor-based that runs ingestion tasks periodically. Three APIs added there for the tasks: Submit, GetStatus and Shutdown. Some of the ingestion spec types and functions were initially implemented for our integration tests by @tomasz-h2o. Now they are moved here and extended.

Among other changes: extracting common ingestion specification data into common_spec_types.go.

I'd like to review this one against https://github.com/h2oai/go-druid/tree/tomasz/implement-suspend-resume that extends supervisor APIs implementation, and then have both features reviewed/merged/consumed for our migration data needs.

@vzayts vzayts added the area/core Core code related issue label Dec 6, 2023
@vzayts vzayts marked this pull request as ready for review December 8, 2023 20:34
Copy link
Collaborator

@zoido zoido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two main issues that make me to request changes:

  1. potential SQL injection
  2. no timeout for test await functions

See inline comments for details.

common_spec_types.go Outdated Show resolved Hide resolved
common_spec_types.go Outdated Show resolved Hide resolved
common_spec_types.go Outdated Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
tasks_test.go Outdated Show resolved Hide resolved
tasks_test.go Outdated Show resolved Hide resolved
tasks_test.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@tomasz-h2o tomasz-h2o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, couple of comments for your consideration.

Side question: do we really need this vast amount of structures introduced?
I understand that it covers thee parts of druid API we're using, though we're not really using big part of those fields, are we?

I'm not asking to remove them, though maybe make sure that we implement bare minumum for client to function.

common_spec_types.go Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
tasks_test.go Outdated Show resolved Hide resolved
tasks_test.go Outdated Show resolved Hide resolved
@vzayts vzayts force-pushed the vzaytsev/add-tasks-based-ingestion branch from 76de2f7 to bc46dc3 Compare December 11, 2023 22:27
@vzayts vzayts self-assigned this Dec 12, 2023
@vzayts
Copy link
Author

vzayts commented Dec 12, 2023

Side question: do we really need this vast amount of structures introduced? I understand that it covers thee parts of druid API we're using, though we're not really using big part of those fields, are we?

I'm not asking to remove them, though maybe make sure that we implement bare minumum for client to function.

Most of these types were implemented previously, and in this PR they were moved from supervisor_types.go to common_spec.go. I probably should have emphasized that on the initial note of the PR.

Copy link
Collaborator

@zoido zoido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! I feel we need to polish them.

druid.go Outdated Show resolved Hide resolved
druid.go Outdated Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@tomasz-h2o tomasz-h2o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of minor comments from my side.

sql/datasource_available.sql Outdated Show resolved Hide resolved
tasks_test.go Outdated Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
Base automatically changed from tomasz/implement-suspend-resume to master December 12, 2023 13:11
@vzayts vzayts requested review from zoido and tomasz-h2o December 12, 2023 16:42
metadata.go Outdated Show resolved Hide resolved
metadata.go Outdated Show resolved Hide resolved
@zoido
Copy link
Collaborator

zoido commented Dec 12, 2023

@vzaytsev1981 I just spotted the time.After issue. I'll take a look at the rest tomorrow.

@vzayts vzayts requested a review from zoido December 12, 2023 18:34
Copy link
Collaborator

@zoido zoido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments.

Apologies. As the PR is on the bigger side I keep spotting things that I did not spotted before as my attention was lead in different parts of the code.

I'll unblock as the changes I requested were done.

I still have doubts regarding:

  • whether the metadata service fits the scope of the client/repo
  • how to approach the default task spec

sql/datasource_available.sql Outdated Show resolved Hide resolved
metadata.go Outdated
var datasourceAvailableQuery string

func fillDataSourceName(in string, ds string) string {
return strings.Replace(in, "${{ datasource }}", ds, 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still on the fence with this. I see SQL, I see potential injection and I feel uneasy about it.

Seeing it it opens more questions:

  • Why do we use SQL and not native API?
  • Does Metadata service even belongs here?
    • The fact that it itself uses SQL and implements methods that feel more like application level helpers makes me doubt it.

@tomasz-h2o What i your taker on this? We have several iteration regarding some details that were outstanding enough that only now I see the forest.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I un-exported the metadata service and its APIs, as well as other task-running helper functions. I also added the note on the intention for how metadata service is supposed to be used. Will that be OK @zoido, @tomasz-h2o ?

If that's not enough, I could refactor deeper to get rid of the service completely and move all the functionality to standalone test-helper functions.

Copy link
Collaborator

@tomasz-h2o tomasz-h2o Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO that would be OK.

We could optionally add _test post-fix to filename, so that would sort of staple this in test only scope.

task_types.go Show resolved Hide resolved
zoido
zoido previously approved these changes Dec 13, 2023
@zoido zoido dismissed their stale review December 13, 2023 09:06

wrong button

@vzayts vzayts requested a review from zoido December 13, 2023 16:33
Copy link
Collaborator

@tomasz-h2o tomasz-h2o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@zoido zoido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -43,10 +43,6 @@ func defaultTaskIngestionSpec() *TaskIngestionSpec {
Spec: &IngestionSpecData{
DataSchema: &DataSchema{
DataSource: "some_datasource",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, this can also be confusing. What about "unspecified" or something similar?

@vzayts vzayts merged commit e0e1f01 into master Dec 14, 2023
3 checks passed
@vzayts vzayts deleted the vzaytsev/add-tasks-based-ingestion branch December 14, 2023 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core Core code related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants