-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a tool to get graph statistics #556
base: master
Are you sure you want to change the base?
Conversation
ba310c9
to
f619e39
Compare
I thought the goal was the unify our tools into a common framework. This seems to just create more tools that will get out of date. Also, can you add tests? |
This is a separate PR. The one to unify things is still in draft. This tool was needed to unblock some work for other people. |
I'd prefer we start with writing the library API and adding tests. If people want to run code that doesn't have tests, that is up to them and they can just take your branch. But we should try to uphold some quality for things in master. |
I can definitely write tests for this. |
bd94e80
to
f00ae8c
Compare
dca3a72
to
cc9837f
Compare
706c0ff
to
d4b8999
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much!
I think a unified tool to collect and print information on graphs will be really useful.
I do agree with Donald that we need a unification rather than another tool.
I also agree with him that tests are necessary. There are somewhat easy ways of building graphs in memory, you can see TestCreate
in ingest.cpp
. That starts from an empty in-memory graph and adds nodes, edges and properties.
real_used_space += sizeof(mainType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this calculation correct for Boolean
types? Is their sizeof
== 8?
We shouldn't have any Booleans in V2 RDGs, but might as well be correct.
basic_raw_stats.insert(std::pair("Node-Schema-Size", total_num_node_props)); | ||
basic_raw_stats.insert(std::pair("Edge-Schema-Size", total_num_edge_props)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Time.h:BytesToStr
(in libsupport) for some good formatting options for sizes and times.
} else { | ||
prop_field = g->GetEdgeProperty(prop_name).value()->chunk(0); | ||
} | ||
auto bit_width = arrow::bit_width(dtype->id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, maybe this makes your space calculation for Boolean types work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few important comments.
96030d9
to
148c34d
Compare
ad7631c
to
9ceb2fe
Compare
@GioLomia : my changes have diverged quite a bit from current state of the code, so you shouldn't push to this PR for now. We can revisit and decide what to do with the tool once I'm finished with my changes (and analysis of grouping memory usage). |
This PR is meant to add an executable script that will allow us to inspect our memory performance and benchmark and changes we make the underlying memory structure.