Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance increase idea: graphs to ignore to graphs to look into list #27

Open
pietercolpaert opened this issue Apr 2, 2024 · 5 comments

Comments

@pietercolpaert
Copy link
Member

The first step when getting all the graphs to ignore, is making sure that your CBD and Shape Templates algorithm only further looks into the graphs it doesn’t need to ignore.

@pietercolpaert
Copy link
Member Author

at this moment we do for example this in Path.ts

    let quads = (
      inverse
        ? store.getQuads(null, this.predicate, focusNode, null)
        : store.getQuads(focusNode, this.predicate, null, null)
    ).filter((q) => !graphsToIgnore.includes(q.graph.value));

If however we instead of passing graphsToIgnore, pass the graphs to look into, we can change this piece of code into:

let quads = [];
for (let graph of graphs) {
     quads = quads.concat( (inverse? store.getQuads(null, this.predicate, focusNode, graph) : store.getQuads(focusNode, this.predicate, null, graph) )
}

Which I believe could go a lot faster thanks to the graph index in the RDFStore.

Getting to the list of graphs is easy: we just get all graphs that are being used, and remove all graphs that currently we put in the ignorelist (the other member IRIs)

@pietercolpaert
Copy link
Member Author

pietercolpaert commented May 3, 2024

Status update:

We forked rdf-stores and added a termCardinalities index. We pull requested this upstream but this is not going to be accepted. rubensworks/rdf-stores.js#8

For the time being we could use our experimental fork, or we could instead of forking it, also provide an extended RDF-store with termCardinality support by creating our own interface with the precise functionality we need. If an RDF Store is provided nonetheless, we can then still add the index we need to make it work for our use case.

interface RDFConnectQuadStore {
   getSubjects(predicate, object, graph); // gets a set of subjects
   getObjects(subject, predicate, graph); // gets a set of objects
   getGraphs(); //gets a set of graphs
   store: RdfStore; // Exposing the main store itself and use as-is
}

@ajuvercr
Copy link
Collaborator

ajuvercr commented May 3, 2024

I would like to find a path where end users don't have to worry about which store they pass along.
So maybe even change RdfStore to RDF/JS: Dataset and try to build efficient things around that.

The main problem is that detecting which graphs are present in the datastore is a O(n) operation if it is not supported by the store implementation and cannot be cached because that store instance might change between extract operations.

There are of course options in the more darker parts of javascript, but let's not sacrifice our sanity in the pursuit of performance.

@pietercolpaert
Copy link
Member Author

Yeah, I also don’t see it as optimal, but I do think it’s a thing that works for now in a better way than using an experimental rdf-stores fork

@pietercolpaert
Copy link
Member Author

I think it makes sense to make this store something reusable within RDF Connect, and also export a store interface for members, that would allow for sorting on timestampPath for example?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants