
[Discussion] Handle Neptune ConcurrentModificationExceptions #89

Open
justinlittman opened this issue Jan 17, 2019 · 7 comments

@justinlittman
Contributor

While running a load in production, Neptune is returning a significant number of ConcurrentModificationExceptions ("Operation failed due to conflicting concurrent operations (please retry)"). This is making loading slow and unreliable.

I haven't been able to track down much information on this exception. The general sense I get is that Neptune is being overwhelmed by the number of inserts being performed, and that this is to be expected.

Note that I'm fairly certain that our ETL Sparql load is being performed with only a single thread.

AWS documentation suggests temporarily creating a larger db instance for the load and then deleting that instance. (Sigh.)

Options may include:

  • Increasing the size of our db instance.
  • Throttling the RIALTO ETL Sparql loader (a rough pacing sketch follows this list).
  • Adding retries to the Sparql Loader lambda.
  • Switching to a different Neptune load mechanism.

Discuss.

@justinlittman
Contributor Author

Verified that our ETL Sparql load is performed with a single thread.

In production, we were concurrently running a retro publication load, a current publication load, and a grants load. Stopping the current publication load and the grants load seemed to eliminate the exceptions, suggesting that the maximum number of concurrent loads is 1.

@justinlittman
Contributor Author

The Neptune Loader command isn't an option since it supports inserting data only, not deleting.
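
For reference, the kind of operation the bulk loader can't express is a delete-plus-insert against the cluster's SPARQL endpoint. A rough illustration of such an update over HTTP; the endpoint URL, graph, and triples are made up, and authentication is omitted:

```python
import requests

# Hypothetical endpoint; the real cluster endpoint and port come from the
# RIALTO infrastructure configuration.
NEPTUNE_SPARQL_ENDPOINT = "https://example-neptune-cluster:8182/sparql"

# A delete-then-insert update for a single entity graph. The Neptune bulk
# loader can only add data, so replacing existing triples like this has to
# go through the SPARQL endpoint instead.
update = """
DELETE { GRAPH <http://example.org/graph/person1> { ?s ?p ?o } }
WHERE  { GRAPH <http://example.org/graph/person1> { ?s ?p ?o } } ;
INSERT DATA { GRAPH <http://example.org/graph/person1> {
  <http://example.org/person1> <http://www.w3.org/2000/01/rdf-schema#label> "Example Person" .
} }
"""

# SPARQL 1.1 Protocol: the update is sent as a form-encoded 'update' parameter.
response = requests.post(NEPTUNE_SPARQL_ENDPOINT, data={"update": update})
response.raise_for_status()
```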

@justinlittman
Contributor Author

Another option is to increase the number of retries in the RIALTO ETL Sparql loader to avoid the load failing entirely. (Currently, it is set to 5 retries, which is usually sufficient until it isn't.)
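
For what it's worth, a minimal sketch of what retry-with-backoff around the update call could look like. The `post_sparql_update` function and the string match on the exception message are assumptions for illustration; the actual loader's retry handling may differ:

```python
import time

MAX_RETRIES = 10          # up from the current 5
BASE_DELAY_SECONDS = 1.0  # doubled on each retry

def post_with_retries(post_sparql_update, update_body):
    """Retry the (assumed) post_sparql_update call when Neptune reports a
    conflicting concurrent operation, backing off exponentially between tries."""
    for attempt in range(MAX_RETRIES):
        try:
            return post_sparql_update(update_body)
        except Exception as err:
            # Neptune surfaces the conflict as a ConcurrentModificationException
            # with the message "Operation failed due to conflicting concurrent
            # operations (please retry)".
            if "ConcurrentModificationException" not in str(err) or attempt == MAX_RETRIES - 1:
                raise
            time.sleep(BASE_DELAY_SECONDS * (2 ** attempt))
```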

@justinlittman
Contributor Author

The decision is to go with increased retries for now.

@aaron-collier
Contributor

@justinlittman If I read that correctly (the AWS doc, that is), it's implying that a WRITE instance is only available during that load, and otherwise only READ instances are available?

@justinlittman
Contributor Author

Not clear if only READ instances remain or if one of the READ instances gets promoted to a WRITE instance.

@aaron-collier
Contributor

Just ran a test where I created a Neptune cluster with a writer and 3 readers. When deleting the writer, it randomly selected one of the readers to promote to writer (not the first one, a random one). But confirmed: one is promoted to writer.
