Redesign katip-elasticsearch internals #72
Comments
I'm taking a crack at this but don't let it stop you investigating separately.
Here's an initial benchmark for the current state of it:
However, this includes the creation of the HTTP manager, scribe, and log environment. Will amend in a moment. I'm also going to knock the final volume down from 10,000 to 1,000. Update: I'm getting the wrong number of documents for the changed version that doesn't spin up the scribe for each benchmark. Here's my branch: https://github.com/bitemyapp/katip/tree/bitemyapp/bulk-elasticsearch-scribe @MichaelXavier any ideas what I might be doing wrong?
I'm getting output like that, but my document count only increases by a couple of documents, even after forcing an index refresh.
Hmm, normally if docs aren't showing up I'd assume it was due to not closing scribes, but if I'm following your bench correctly it looks like you're doing that. Although I wonder if too eagerly: Back to the question of doc counts not increasing as expected, this is still strange. Every time you close the scribe, I'd expect it to delay for some time to shuttle the single log message through to the end. The ES scribe should shut down and then the thread managing that scribe should shut down. My next step would probably be to toss some traces into the ES scribe and see what's actually happening, probably restricting it to the smallest test size that exhibits the behavior.
I did that before and it also exhibited weird behavior. I'll try to narrow it down.
With that code, I seem to only be getting the first benchmark's 10 documents in a single batch or something.
@MichaelXavier Fixed it. I was running close scribes in the
After a single run I got:
With a total of 6,035 docs inserted, which shouldn't be possible even with Criterion re-runs, because 5 documents is not a whole insertion run. But it's closer.
I'm assuming the load shedding is kicking in at this point. Update: increasing the queue size means that
I think the benchmark is good to go now. I'll get started on the bulk API version now.
Nice! I've got to imagine that the 1000 messages benchmark is going to get demolished.
Discussed a bit in other channels. Tagging in @bitemyapp
katip-elasticsearch is extremely naive and has some bad UX to boot. Right now it has a configurable pool of worker threads. They pull from a TBMQueue and then do individual indexDocument calls. This is not ideal because: indexDocument is slow and hard on the elasticsearch server. It should be bulk indexing.
I think the right way to go is:
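
For illustration only, and not the proposal referred to above: a minimal sketch of how a worker might pull batches from a bounded queue so that each batch becomes one bulk request instead of many individual indexDocument calls. It makes several assumptions. It uses `TBQueue` from the `stm` package as a stand-in for the `TBMQueue` mentioned above, so it does not handle a closed queue. `readBatch`, `bulkWorker`, and `demo` are hypothetical names, and `sendBulk` is a placeholder for whatever bulk-indexing call the scribe would actually make.

```haskell
import Control.Concurrent.STM
import Control.Monad (forever)

-- | Block until at least one item is queued, then take up to @n@ items
-- without waiting for more to arrive. Always returns at least one item,
-- so a batch size of 1 or more is assumed.
readBatch :: Int -> TBQueue a -> STM [a]
readBatch n q = do
  first <- readTBQueue q          -- retries (blocks) while the queue is empty
  rest  <- drain (n - 1)
  pure (first : rest)
  where
    drain k
      | k <= 0    = pure []
      | otherwise = do
          next <- tryReadTBQueue q
          case next of
            Nothing -> pure []    -- queue is empty for now; ship what we have
            Just x  -> (x :) <$> drain (k - 1)

-- | Worker loop: one call to the (hypothetical) bulk sender per batch.
-- Runs until the thread is killed; shutdown handling is left out.
bulkWorker :: Int -> ([doc] -> IO ()) -> TBQueue doc -> IO ()
bulkWorker batchSize sendBulk q =
  forever (atomically (readBatch batchSize q) >>= sendBulk)

-- | Small demonstration: five queued items read in batches of at most three.
demo :: IO ([Int], [Int])
demo = do
  q <- newTBQueueIO 10
  atomically (mapM_ (writeTBQueue q) [1 .. 5])
  a <- atomically (readBatch 3 q)
  b <- atomically (readBatch 3 q)
  pure (a, b)                     -- ([1,2,3],[4,5])
```

Taking whatever is available, rather than waiting for a full batch, keeps latency low when logging is sparse while still amortizing requests under load. A real scribe would also need a close signal and a final flush, which is where something like TBMQueue's close semantics would come in.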