
[Discussion] Handle Neptune ConcurrentModificationExceptions #89

Open
justinlittman opened this issue Jan 17, 2019 · 7 comments

@justinlittman
Contributor

While running a load in production, Neptune is returning a significant number of ConcurrentModificationExceptions ("Operation failed due to conflicting concurrent operations (please retry)"). This is making loading slow and unreliable.

I haven't been able to track down much information on this exception. The general sense I get is that Neptune is being overwhelmed by the number of inserts being performed, and that this is to be expected.

Note that I'm fairly certain that our ETL Sparql load is being performed with only a single thread.

AWS documentation suggests temporarily creating a larger db instance for the load and then deleting that instance. (Sigh.)

Options may include:

  • Increasing the size of our db instance.
  • Throttling the RIALTO ETL Sparql loader (a rough pacing sketch follows this list).
  • Adding retries to the Sparql Loader lambda.
  • Switching to a different Neptune load mechanism.

Discuss.

@justinlittman
Contributor Author

Verified that our ETL Sparql load is performed with a single thread.

In production, we were concurrently running a retro publication load, a current publication load, and a grants load. Stopping the current publication load and the grants load seemed to eliminate the exceptions, suggesting that the maximum number of concurrent loads is 1.

@justinlittman
Contributor Author

The Neptune Loader command isn't an option since it supports inserting data only, not deleting.
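
For reference, the kind of operation the bulk loader can't express is a delete-plus-insert against the cluster's SPARQL endpoint. A rough illustration of such an update over HTTP; the endpoint URL, graph, and triples are made up, and authentication is omitted:

```python
import requests

# Hypothetical endpoint; the real cluster endpoint and port come from the
# RIALTO infrastructure configuration.
NEPTUNE_SPARQL_ENDPOINT = "https://example-neptune-cluster:8182/sparql"

# A delete-then-insert update for a single entity graph. The Neptune bulk
# loader can only add data, so replacing existing triples like this has to
# go through the SPARQL endpoint instead.
update = """
DELETE { GRAPH <http://example.org/graph/person1> { ?s ?p ?o } }
WHERE  { GRAPH <http://example.org/graph/person1> { ?s ?p ?o } } ;
INSERT DATA { GRAPH <http://example.org/graph/person1> {
  <http://example.org/person1> <http://www.w3.org/2000/01/rdf-schema#label> "Example Person" .
} }
"""

# SPARQL 1.1 Protocol: the update is sent as a form-encoded 'update' parameter.
response = requests.post(NEPTUNE_SPARQL_ENDPOINT, data={"update": update})
response.raise_for_status()
```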

@justinlittman
Contributor Author

Another option is to increase the number of retries in the RIALTO ETL Sparql loader to avoid the load failing entirely. (Currently, it is set to 5 retries, which is usually sufficient until it isn't.)
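
For what it's worth, a minimal sketch of what retry-with-backoff around the update call could look like. The `post_sparql_update` function and the string match on the exception message are assumptions for illustration; the actual loader's retry handling may differ:

```python
import time

MAX_RETRIES = 10          # up from the current 5
BASE_DELAY_SECONDS = 1.0  # doubled on each retry

def post_with_retries(post_sparql_update, update_body):
    """Retry the (assumed) post_sparql_update call when Neptune reports a
    conflicting concurrent operation, backing off exponentially between tries."""
    for attempt in range(MAX_RETRIES):
        try:
            return post_sparql_update(update_body)
        except Exception as err:
            # Neptune surfaces the conflict as a ConcurrentModificationException
            # with the message "Operation failed due to conflicting concurrent
            # operations (please retry)".
            if "ConcurrentModificationException" not in str(err) or attempt == MAX_RETRIES - 1:
                raise
            time.sleep(BASE_DELAY_SECONDS * (2 ** attempt))
```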

@justinlittman
Contributor Author

The decision is to go with increased retries for now.

@aaron-collier
Contributor

@justinlittman If I read that correctly (the AWS doc, that is), it's implying that a WRITE instance is only available during that load, and otherwise only READ instances are available?

@justinlittman
Contributor Author

Not clear if only READ instances remain or if one of the READ instances gets promoted to a WRITE instance.

@aaron-collier
Contributor

Just ran a test where I created a Neptune cluster with a writer and 3 readers. When deleting the writer, it randomly selected one of the readers to promote to writer (not the first one, a random one). But confirmed: one is promoted to writer.
