Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a tool to get graph statistics #556

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

GioLomia
Copy link
Contributor

This PR is meant to add an executable script that will allow us to inspect our memory performance and benchmark and changes we make the underlying memory structure.

@GioLomia GioLomia marked this pull request as draft September 22, 2021 21:20
@GioLomia GioLomia force-pushed the graph-memory-analytics branch 4 times, most recently from ba310c9 to f619e39 Compare September 28, 2021 16:30
@GioLomia GioLomia marked this pull request as ready for review September 29, 2021 18:54
@ddn0 ddn0 changed the title Adding a tool to get memory analytics on the property graphs. Add a tool to get graph statistics Sep 29, 2021
@ddn0
Copy link
Contributor

ddn0 commented Sep 29, 2021

I thought the goal was the unify our tools into a common framework. This seems to just create more tools that will get out of date.

Also, can you add tests?

@GioLomia
Copy link
Contributor Author

This is a separate PR. The one to unify things is still in draft. This tool was needed to unblock some work for other people.

@ddn0
Copy link
Contributor

ddn0 commented Sep 29, 2021

This is a separate PR. The one to unify things is still in draft. This tool was needed to unblock some work for other people.

I'd prefer we start with writing the library API and adding tests. If people want to run code that doesn't have tests, that is up to them and they can just take your branch. But we should try to uphold some quality for things in master.

@GioLomia
Copy link
Contributor Author

I can definitely write tests for this.

@GioLomia GioLomia marked this pull request as draft September 30, 2021 22:01
@GioLomia GioLomia force-pushed the graph-memory-analytics branch 2 times, most recently from bd94e80 to f00ae8c Compare October 4, 2021 21:59
@GioLomia GioLomia marked this pull request as ready for review October 5, 2021 15:40
@GioLomia GioLomia force-pushed the graph-memory-analytics branch 3 times, most recently from dca3a72 to cc9837f Compare October 6, 2021 15:24
@GioLomia GioLomia force-pushed the graph-memory-analytics branch from 706c0ff to d4b8999 Compare October 7, 2021 14:31
Copy link
Contributor

@witchel witchel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much!
I think a unified tool to collect and print information on graphs will be really useful.
I do agree with Donald that we need a unification rather than another tool.
I also agree with him that tests are necessary. There are somewhat easy ways of building graphs in memory, you can see TestCreate in ingest.cpp. That starts from an empty in-memory graph and adds nodes, edges and properties.

tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
Comment on lines +68 to +69
real_used_space += sizeof(mainType);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this calculation correct for Boolean types? Is their sizeof == 8?
We shouldn't have any Booleans in V2 RDGs, but might as well be correct.

tools/graph-stats/graph-memory-stats.cpp Show resolved Hide resolved
tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
Comment on lines +212 to +186
basic_raw_stats.insert(std::pair("Node-Schema-Size", total_num_node_props));
basic_raw_stats.insert(std::pair("Edge-Schema-Size", total_num_edge_props));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Time.h:BytesToStr (in libsupport) for some good formatting options for sizes and times.

tools/graph-stats/graph-memory-stats.cpp Show resolved Hide resolved
} else {
prop_field = g->GetEdgeProperty(prop_name).value()->chunk(0);
}
auto bit_width = arrow::bit_width(dtype->id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe this makes your space calculation for Boolean types work?

tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@amberhassaan amberhassaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few important comments.

tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
tools/graph-stats/graph-memory-stats.cpp Outdated Show resolved Hide resolved
@GioLomia GioLomia force-pushed the graph-memory-analytics branch from 96030d9 to 148c34d Compare October 8, 2021 19:05
@GioLomia GioLomia force-pushed the graph-memory-analytics branch from ad7631c to 9ceb2fe Compare October 11, 2021 21:54
@amberhassaan
Copy link
Contributor

@GioLomia : my changes have diverged quite a bit from current state of the code, so you shouldn't push to this PR for now. We can revisit and decide what to do with the tool once I'm finished with my changes (and analysis of grouping memory usage).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants