
BUGFIX: [graph] Increase count of related elements fetched by Correlation Graph to 500000 #9240

Open

wants to merge 4 commits into master
Conversation

ckane
Contributor

@ckane ckane commented Dec 5, 2024

The 500 upper limit is very low and results in many correlation graphs displaying no correlations in the chart, even though the main page of the Graph says correlations exist. In some cases this limit was just 20. This fixes many of the reported problems where related reporting is visible on the Overview page of an analysis/case/etc. but the Correlation Graph is blank (because it isn't fetching all nodes, and the correlations don't appear in the first 500 fetched). In some scenarios the graph does show some items, but is missing other expected ones.

Proposed changes

  • Set `first: 500000` when fetching first-degree and second-degree related nodes for Correlation Graphs
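For illustration, the change boils down to raising the `first` argument on the GraphQL connection fields the correlation graph queries. A minimal sketch, with hypothetical query and field names (not the exact OpenCTI schema); only the `first` argument is the point of the change:

```typescript
// Hypothetical sketch of the kind of connection query the correlation
// graph issues. Query and field names are illustrative; the fix raises
// the `first` argument (previously 500, in some cases 20) to 500000.
const relatedContainersQuery = `
  query CorrelationRelatedContainers($id: String!) {
    stixCoreObject(id: $id) {
      containers(first: 500000) {
        edges {
          node { id entity_type name }
        }
      }
    }
  }
`;
```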

Related issues

Checklist

  • I consider the submitted work as finished
  • I tested the code for its functionality
  • I wrote test cases for the relevant use cases (coverage and e2e)
  • I added/updated the relevant documentation (either on GitHub or on Notion)
  • Where necessary I refactored code to improve the overall quality

@ckane ckane mentioned this pull request Dec 5, 2024

codecov bot commented Dec 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.26%. Comparing base (f8d757f) to head (ebcd2af).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9240      +/-   ##
==========================================
- Coverage   65.26%   65.26%   -0.01%     
==========================================
  Files         630      630              
  Lines       60311    60311              
  Branches     6777     6771       -6     
==========================================
- Hits        39361    39359       -2     
- Misses      20950    20952       +2     

☔ View full report in Codecov by Sentry.

@labo-flg
Member

labo-flg commented Dec 5, 2024

Thanks for your contributions!
I totally agree this limit makes the correlation views inconsistent; however, it's a bit delicate to entirely remove the limit, for performance reasons.

Imagine you have a widely used entity like "APT28" or "USA", present in thousands of reports. GQL will wait for the complete resolution of this huge graph, which will take a long time. Display would suffer a lot too.

@ckane
Contributor Author

ckane commented Dec 5, 2024

> Thanks for your contributions! I totally agree this limit makes the correlation views inconsistent; however, it's a bit delicate to entirely remove the limit, for performance reasons.
>
> Imagine you have a widely used entity like "APT28" or "USA", present in thousands of reports. GQL will wait for the complete resolution of this huge graph, which will take a long time. Display would suffer a lot too.

I agree, though in my experience that behavior will probably hit a browser or network timeout, which a user can still escape from if they don't like it. So with the low upper limit, those "big reports" are unusable, but a bunch of medium-sized reports are unusable too. With this change, the big reports may still be unusable, but medium and small reports become more usable and accurate.

The graphing library seems fairly performant; here's an example of a graph with thousands of items that rendered fine for me:
[screenshot]

@ckane
Contributor Author

ckane commented Dec 5, 2024

"Big reports" like the "Operation SMN (Novetta)" report available via MISP Feeds are still unusable, and eventually display an error. However, the normal entity graph for that report also doesn't work, and times out with an error while trying to fetch its content. This report is my favorite use case when testing and evaluating "large report" problems.
[screenshot]

In my opinion, the "too big" failure mode occurs in a much more limited set of circumstances than the current "missing data" behavior does today. Between the two, I would prefer to tolerate the failure mode where a small number of very large or highly-related reports might error out with a render failure.

@ckane
Contributor Author

ckane commented Dec 5, 2024

If you try to render a graph that's too big, it shows this error after about 15 seconds, so IMO that behavior is preferable:
[screenshot]

@ckane
Contributor Author

ckane commented Dec 6, 2024

I took this work a step further and have the correlation traversal algorithm traverse all related nodes of type cases, groupings, and reports regardless of which entity type is the "current view". This allows for the exploration of related OSINT reporting from within an Incident Response or Grouping container's correlation graph, and vice-versa.

Example view from within an RFI object I created:
[screenshot]

@labo-flg labo-flg added the community (use to identify PR from community) and graph (linked to graph display and manipulation) labels Dec 6, 2024
@ckane ckane force-pushed the corr-graph-missing-items-issue9198 branch 4 times, most recently from ed0ab53 to 1ee9da9 (December 12, 2024 04:12)
@labo-flg
Member

> I took this work a step further and have the correlation traversal algorithm traverse all related nodes of type cases, groupings, and reports regardless of which entity type is the "current view". This allows for the exploration of related OSINT reporting from within an Incident Response or Grouping container's correlation graph, and vice-versa.

That's a good point. Part of our work on #3227 for the next milestone addressed this issue; it has been merged in #9175.

Note that in the details view, correlated containers are detected through common observables and indicators; for consistency, we aligned the graph view the same way.
In that PR we also added the ability to switch the graph view from "observables and indicators" to "all entities" in case the user needs extended correlations.

Now, concerning the limit and the performance issues: this raised a lot of discussion.
We need to be careful about the impact of such changes: in some conditions it might generate such a huge amount of data that your backend process would just eat all the memory and die.
See this issue: #9293

We think the best solution is a heavy refactoring of the graph code to fetch data bit by bit, paginating results and loading the graph incrementally.
I've drafted an issue for that: #9359

@ckane
Contributor Author

ckane commented Dec 16, 2024

> > I took this work a step further and have the correlation traversal algorithm traverse all related nodes of type cases, groupings, and reports regardless of which entity type is the "current view". This allows for the exploration of related OSINT reporting from within an Incident Response or Grouping container's correlation graph, and vice-versa.
>
> That's a good point. Part of our work on #3227 for the next milestone addressed this issue; it has been merged in #9175.
>
> Note that in the details view, correlated containers are detected through common observables and indicators; for consistency, we aligned the graph view the same way. In that PR we also added the ability to switch the graph view from "observables and indicators" to "all entities" in case the user needs extended correlations.
>
> Now, concerning the limit and the performance issues: this raised a lot of discussion. We need to be careful about the impact of such changes: in some conditions it might generate such a huge amount of data that your backend process would just eat all the memory and die. See this issue: #9293
>
> We think the best solution is a heavy refactoring of the graph code to fetch data bit by bit, paginating results and loading the graph incrementally. I've drafted an issue for that: #9359

Thanks! I agree that the best implementation would be a paginated walk that incrementally builds the graph with smaller requests that don't overload the back-end. I don't understand well enough how to do that for these 2nd-degree related items, however (hence my solution of just setting the limits very high).

I do think it would be helpful for the front-end to query the "total set" size ahead of time, so it can render a progress meter during the incremental graph build-out and end users know when data is still being retrieved. The partial graph could be shown in real time, with additions displayed as they're fetched from the server.

Additionally, a "stop" and/or "pause" button that allows the data fetch/graph build to be halted would also be helpful. For larger graphs, if the end user sees the data they're interested in at some point during the paginated ingest, they could stop or pause the fetch and then explore the portion of the graph built up to that point.
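The incremental approach described above could be sketched roughly as follows. This is a minimal, synchronous sketch assuming a cursor-based page shape (`nodes`/`endCursor`/`hasNextPage`, as in a GraphQL connection); real code would be asynchronous, all names here are illustrative, and the `shouldStop` callback stands in for the proposed stop/pause button:

```typescript
// Minimal sketch of incremental, cursor-paginated graph loading.
// Page shape mirrors a GraphQL connection; names are illustrative.
interface Page<T> {
  nodes: T[];
  endCursor: string | null;
  hasNextPage: boolean;
}

function loadGraphIncrementally<T>(
  fetchPage: (cursor: string | null) => Page<T>, // one paginated request
  onBatch: (nodes: T[]) => void,                 // render partial graph
  shouldStop: () => boolean = () => false,       // "stop"/"pause" hook
): T[] {
  const all: T[] = [];
  let cursor: string | null = null;
  let hasNextPage = true;
  while (hasNextPage && !shouldStop()) {
    const page = fetchPage(cursor);
    all.push(...page.nodes);
    onBatch(page.nodes); // caller can draw these nodes immediately
    cursor = page.endCursor;
    hasNextPage = page.hasNextPage;
  }
  return all;
}
```

The `onBatch` callback is where the UI would add nodes to the rendered graph as each page arrives, and checking `shouldStop` between pages is what lets a user freeze the build once the data they care about is visible.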

@ckane ckane force-pushed the corr-graph-missing-items-issue9198 branch from 2bd60d5 to 76acd0f (December 18, 2024 03:29)
@labo-flg
Member

That's where we want to go next; we're aligned on that :)

I'm thinking step-by-step queries, all properly paginated: the frontend would query the main container, then its content, then for each one the related containers, etc.
The graph would load bit by bit on screen, with a proper loading bar.

Knowing the size of the full graph in advance is challenging, but at least we can give visual feedback with all the info we can get, as soon as we get it.

@ckane ckane force-pushed the corr-graph-missing-items-issue9198 branch from 76acd0f to f89fbb0 (December 24, 2024 22:59)
@richard-julien richard-julien force-pushed the master branch 5 times, most recently from b414944 to c7f4cb7 (January 13, 2025 09:27)
ckane added 4 commits January 27, 2025 19:34
The 500 upper limit is very low and results in a lot of correlation
graphs displaying no correlations in the chart, while the main page of
the Graph says there exist correlations. In some cases this limit was
just 20.
For correlation graphs, include any cases, groupings, or reports as
correlations, regardless of the current container type
This limit was still at 20, resulting in empty graphs when correlations
were expected.
@ckane ckane force-pushed the corr-graph-missing-items-issue9198 branch from f89fbb0 to ebcd2af (January 28, 2025 00:34)