-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tasks based ingestion #11
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two main issues that make me to request changes:
- potential SQL injection
- no timeout for test await functions
See inline comments for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, couple of comments for your consideration.
Side question: do we really need this vast amount of structures introduced?
I understand that it covers thee parts of druid API we're using, though we're not really using big part of those fields, are we?
I'm not asking to remove them, though maybe make sure that we implement bare minumum for client to function.
76de2f7
to
bc46dc3
Compare
Most of these types were implemented previously, and in this PR they were moved from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! I feel we need to polish them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor comments from my side.
@vzaytsev1981 I just spotted the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my comments.
Apologies. As the PR is on the bigger side I keep spotting things that I did not spotted before as my attention was lead in different parts of the code.
I'll unblock as the changes I requested were done.
I still have doubts regarding:
- whether the metadata service fits the scope of the client/repo
- how to approach the default task spec
metadata.go
Outdated
var datasourceAvailableQuery string | ||
|
||
func fillDataSourceName(in string, ds string) string { | ||
return strings.Replace(in, "${{ datasource }}", ds, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still on the fence with this. I see SQL, I see potential injection and I feel uneasy about it.
Seeing it it opens more questions:
- Why do we use SQL and not native API?
- Does Metadata service even belongs here?
- The fact that it itself uses SQL and implements methods that feel more like application level helpers makes me doubt it.
@tomasz-h2o What i your taker on this? We have several iteration regarding some details that were outstanding enough that only now I see the forest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I un-exported the metadata service and its APIs, as well as other task-running helper functions. I also added the note on the intention for how metadata service is supposed to be used. Will that be OK @zoido, @tomasz-h2o ?
If that's not enough, I could refactor deeper to get rid of the service completely and move all the functionality to standalone test-helper functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO that would be OK.
We could optionally add _test
post-fix to filename, so that would sort of staple this in test only scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -43,10 +43,6 @@ func defaultTaskIngestionSpec() *TaskIngestionSpec { | |||
Spec: &IngestionSpecData{ | |||
DataSchema: &DataSchema{ | |||
DataSource: "some_datasource", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, this can also be confusing. What about "unspecified"
or something similar?
This pull request adds 2 new API services:
Tasks Service adds a different mode of ingestion - task-based, which is different from supervisor-based that runs ingestion tasks periodically. Three APIs added there for the tasks:
Submit
,GetStatus
andShutdown
. Some of the ingestion spec types and functions were initially implemented for our integration tests by @tomasz-h2o. Now they are moved here and extended.Among other changes: extracting common ingestion specification data into
common_spec_types.go
.I'd like to review this one against https://github.com/h2oai/go-druid/tree/tomasz/implement-suspend-resume that extends supervisor APIs implementation, and then have both features reviewed/merged/consumed for our migration data needs.