Investigate the performance issue in DataONE indexer #34

taojing2002 · 2022-12-07T01:28:03Z

I deployed a DataONE Indexer instance on the dev cluster and installed a Metacat instance with RabbitMQ support on test.arcticdata.io. I created a simple package with a single metadata object and a single data object. Indexing took more than 14 seconds to finish; the annotation processor alone took about eight seconds.

Matt suggested that we compare the performance of the DataONE indexer with that of the current Metacat indexer. We can also test it on the production cluster.

taojing2002 · 2022-12-20T22:12:06Z

The initialize method in the OntologyModelService class takes a long time to read the ontologies from disk into an in-memory Jena model. We moved the call to initialize into the index worker's start-up process, which improved performance during object indexing.
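The change described above amounts to paying the model-loading cost once at worker start-up instead of inside an indexing task. A minimal sketch of that pattern, using only the Java standard library: the class `OntologyServiceSketch`, its methods, and the `ex:` concept names are hypothetical stand-ins, not the actual OntologyModelService code.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a service whose model is costly to load. */
final class OntologyServiceSketch {
    private static final AtomicInteger LOADS = new AtomicInteger();
    private static volatile OntologyServiceSketch instance;

    // Stand-in for the in-memory model: concept -> itself plus broader concepts.
    private final Map<String, Set<String>> model;

    private OntologyServiceSketch() {
        LOADS.incrementAndGet();
        // A real service would read ontology files from disk here; this is made-up data.
        model = Map.of("ex:Leaf", Set.of("ex:Leaf", "ex:Branch", "ex:Root"));
    }

    /** Called once from worker start-up so that no indexing task pays the load cost. */
    static synchronized void initialize() {
        if (instance == null) {
            instance = new OntologyServiceSketch();
        }
    }

    static OntologyServiceSketch getInstance() {
        if (instance == null) {
            throw new IllegalStateException("initialize() must run during worker start-up");
        }
        return instance;
    }

    /** Returns the known expansion of a concept, or just the concept itself. */
    Set<String> expand(String concept) {
        return model.getOrDefault(concept, Set.of(concept));
    }

    static int loadCount() {
        return LOADS.get();
    }

    public static void main(String[] args) {
        initialize();                      // worker start-up
        for (int i = 0; i < 3; i++) {      // three indexing tasks reuse the loaded model
            getInstance().expand("ex:Leaf");
        }
        System.out.println("model loads: " + loadCount()); // prints "model loads: 1"
    }
}
```

Failing fast in `getInstance()` when start-up was skipped is one design choice; falling back to a lazy load would hide the cost inside the first indexing task again.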

taojing2002 · 2022-12-20T22:18:33Z

Now we have two issues:

Iterating over the SPARQL query results in OntologyModelService takes a long time (about four seconds). For details, see this ticket: jena.query.ResultSet.hasNext takes a long time in OntologyModelService.expandConcepts #43
Sending the processed Solr document to the Solr server and getting the response takes a long time (1.5 seconds). In my local stand-alone Java dataone-indexer, it takes about 0.1 seconds.
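To compare the cluster and local numbers stage by stage, each step can be wrapped in a small timer. This is a minimal sketch under stated assumptions: `StageTimerSketch` and the stage names `expandConcepts` and `sendToSolr` are hypothetical labels chosen for illustration, and the lambdas stand in for the real work.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Minimal per-stage stopwatch; the stage names used below are illustrative only. */
final class StageTimerSketch {
    // Insertion-ordered so stages print in the order they first ran.
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    /** Runs the stage, adds its elapsed time under the given name, and returns its result. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    Map<String, Long> nanosByStage() {
        return nanosByStage;
    }

    public static void main(String[] args) {
        StageTimerSketch timer = new StageTimerSketch();
        // Placeholder work; a real run would call the indexing steps here.
        String doc = timer.time("expandConcepts", () -> "expanded");
        timer.time("sendToSolr", () -> doc.length());
        timer.nanosByStage().forEach((stage, nanos) ->
                System.out.printf("%s: %.3f ms%n", stage, nanos / 1_000_000.0));
    }
}
```

Recording the same stage names in both environments makes it easier to see which stage accounts for the difference.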

artntek · 2024-02-06T18:24:42Z

From: #43
jena.query.ResultSet.hasNext takes a long time in OntologyModelService.expandConcepts #43
(dupe now closed)

In the dev cluster, the jena.query.ResultSet.hasNext method takes about four seconds to finish. However, when the same document is inserted a second time, it takes almost no time. There appears to be some kind of caching involved. The code looks like this:

        Query query = QueryFactory.create(q);
        QueryExecution qexec = QueryExecutionFactory.create(query, ontModel);
        ResultSet results = qexec.execSelect();
        String name = field.getName();
        Set<String> values = new HashSet<String>();
        // results.hasNext() takes a long time
        while (results.hasNext()) {
          QuerySolution solution = results.next();
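
One possible reading of this timing, offered as an assumption rather than a finding: if the result set is produced lazily, the query is actually evaluated when hasNext() is first called, so the cost shows up there rather than at the call that created the result set. The sketch below illustrates only that general pattern with the Java standard library; `LazyRowsSketch` is a hypothetical class, not Jena code, and it says nothing about why the second insert is fast.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/** Iterator that defers an expensive computation until a row is first requested. */
final class LazyRowsSketch implements Iterator<String> {
    private final Supplier<List<String>> evaluate;
    private Iterator<String> rows;   // stays null until the work has been done

    LazyRowsSketch(Supplier<List<String>> evaluate) {
        this.evaluate = evaluate;
    }

    boolean evaluated() {
        return rows != null;
    }

    @Override
    public boolean hasNext() {
        if (rows == null) {
            rows = evaluate.get().iterator();  // the whole cost lands on this call
        }
        return rows.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return rows.next();
    }

    public static void main(String[] args) {
        // Creating the iterator does no work yet.
        LazyRowsSketch results = new LazyRowsSketch(() -> List.of("ex:Branch", "ex:Root"));
        System.out.println("evaluated after creation: " + results.evaluated());
        boolean any = results.hasNext();
        System.out.println("evaluated after hasNext(): " + results.evaluated() + ", any rows: " + any);
    }
}
```

If this pattern applies, timing hasNext() effectively measures the query evaluation itself, so the query and the model it runs against are the places to look.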

taojing2002 added this to the 3.0.0 milestone Dec 7, 2022

taojing2002 self-assigned this Dec 7, 2022

artntek modified the milestones: 3.0.0, 3.1.0 Feb 6, 2024

artntek mentioned this issue Feb 6, 2024

jena.query.ResultSet.hasNext takes a long time in OntologyModelService.expandConcepts #43

Closed

artntek modified the milestones: 3.1.0, 3.2.0 Oct 15, 2024
