Fix/reduce memory usage #3629

Merged
merged 10 commits into from
Sep 17, 2024
badGarnet (Collaborator)

This PR fixes the high memory usage when computing intersection areas:

  • coordinates are now converted to half-precision floating point numbers (float16) instead of double precision (float64)
  • some intermediate variables are removed to free up memory
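
To illustrate the two changes above, here is a minimal sketch of a pairwise intersection-area computation that stays in half precision and reuses buffers instead of allocating new intermediates. It is an illustration only, not the code in this PR: `pairwise_intersection_area` is a hypothetical name, and boxes are assumed to be rows of (x1, y1, x2, y2).

```python
import numpy as np


def pairwise_intersection_area(boxes_a, boxes_b, dtype=np.float16):
    """Hypothetical sketch: (n, m) matrix of intersection areas.

    Assumes each row is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    a = np.asarray(boxes_a, dtype=dtype)
    b = np.asarray(boxes_b, dtype=dtype)

    # Overlap width for every pair, broadcast to shape (n, m).
    width = np.minimum(a[:, None, 2], b[None, :, 2]) - np.maximum(
        a[:, None, 0], b[None, :, 0]
    )
    # Clamp negative overlaps to zero in place (no extra (n, m) array).
    np.clip(width, 0, None, out=width)

    # Overlap height for every pair, same shape.
    height = np.minimum(a[:, None, 3], b[None, :, 3]) - np.maximum(
        a[:, None, 1], b[None, :, 1]
    )
    np.clip(height, 0, None, out=height)

    # Reuse the width buffer for the result instead of allocating a third one.
    width *= height
    return width


boxes_a = [[0, 0, 2, 2]]
boxes_b = [[1, 1, 3, 3], [5, 5, 6, 6]]
areas = pairwise_intersection_area(boxes_a, boxes_b)
print(areas, areas.dtype)
```

Note that float16 has an 11-bit significand, so integers above 2048 are not all exactly representable; the memory saving comes at the cost of some coordinate precision.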

Test

Using a memory profiler such as memory_profiler in IPython:

## cell 1
from unstructured.partition.pdf_image.pdfminer_processing import areas_of_boxes_and_intersection_area
import numpy as np
%load_ext memory_profiler

## cell 2
%%memit
coords = np.random.rand(40000).reshape((10000,4)).astype(np.float16)

## cell 3
%%memit
inter_area, boxa_area, boxb_area = areas_of_boxes_and_intersection_area(coords, coords)

The peak memory and memory increment reported for cell 3 should be close to

peak memory: 730.55 MiB, increment: 573.22 MiB

On the main branch the coordinates are double precision, so running the same code with

coords = np.random.rand(40000).reshape((10000,4)).astype(np.float64)

results in a peak memory usage of more than 4 GiB.
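
As a rough sanity check on these numbers, the arithmetic below estimates the size of a single 10000 x 10000 pairwise matrix at each precision. This is a back-of-the-envelope sketch, not a measurement: `pairwise_matrix_mib` is a hypothetical helper, and how many such matrices are alive at once depends on the implementation.

```python
def pairwise_matrix_mib(n_boxes, bytes_per_value):
    """Size in MiB of one dense (n_boxes, n_boxes) matrix."""
    return n_boxes * n_boxes * bytes_per_value / 2**20


N = 10_000  # number of boxes used in the test above

half = pairwise_matrix_mib(N, 2)    # float16 is 2 bytes per value
double = pairwise_matrix_mib(N, 8)  # float64 is 8 bytes per value

print(f"float16: {half:.2f} MiB per matrix")
print(f"float64: {double:.2f} MiB per matrix")
```

Each float64 matrix is four times the size of its float16 counterpart, so a handful of full-size intermediates is enough to exceed 4 GiB in double precision. The reported 573.22 MiB increment is roughly the size of three float16 matrices.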

This pull request includes updated ingest test fixtures.
Please review and merge if appropriate.

Co-authored-by: christinestraub <[email protected]>
christinestraub (Collaborator) left a comment

There are some unusual changes in the ingest test fixtures. I'm investigating this issue.

christinestraub (Collaborator) left a comment

LGTM!

@badGarnet badGarnet enabled auto-merge September 16, 2024 23:36
@badGarnet badGarnet added this pull request to the merge queue Sep 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 17, 2024
@christinestraub christinestraub added this pull request to the merge queue Sep 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 17, 2024
@christinestraub christinestraub added this pull request to the merge queue Sep 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 17, 2024
@badGarnet badGarnet added this pull request to the merge queue Sep 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 17, 2024
@badGarnet badGarnet merged commit 2d3cd45 into main Sep 17, 2024
50 checks passed
@badGarnet badGarnet deleted the fix/reduce-memory-usage branch September 17, 2024 16:00