Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TDL-22683] Use Date Windowing Pagination #113

Merged
merged 70 commits into from
May 13, 2024

Conversation

shantanu73
Copy link
Contributor

@shantanu73 shantanu73 commented Feb 20, 2024

Description of change

  1. Implemented date windowing for all multi-threaded streams to optimize the process of fetching stream data and resolve infinite loop issues in extraction runs.
  2. Added a limit to buffer size of fetched records to release so that memory consumption can be optimized.
  3. Removed performance metrics calculation which was adding additional overhead on tap performance.
  4. Fixed data loss issue in interrupted sync scenario.

Manual QA steps

  • Ran sync runs, and python memory profiling to determine tap memory consumption and stream performance.

Risks

Rollback steps

  • revert this branch

cosimon and others added 30 commits December 20, 2023 18:14
1) Fixed Date windowing logic for multithreaded_bookmark_generator.
2) Added date_windowing instance variable as true in the constructor of activities stream.
1) Removed endpoint params from activities generator.
2) Overrode modify_reques_params method for Loan Transactions generator.
3) Removed the method _all_fetch_batch_steps from multithreaded bookmark generator.
4) Modified the implementation for _all_fetch_batch_steps method in multithreaded offset generator to include date windowing.
5) Added new method modify_reques_params in multithreaded offset generator.
1) Set max threads back to 20.
2) Fixed bug in Date windowing implementation.
1) Changed number of threads from 5 to default 20 for gl_journal_entries stream.
2) Added date window implementation for gl_journal_entries stream.
1) Removed redundant initialization of date_windowing for Activities, GL Journal entries & Loan transactions streams.
2) Changed max threads for Deposit transactions stream to default value.
@RushiT0122 RushiT0122 force-pushed the TDL-22683-date-windowing-pagination branch from bbeec12 to c4232d6 Compare May 3, 2024 13:19
@RushiT0122 RushiT0122 force-pushed the TDL-22683-date-windowing-pagination branch from c4232d6 to 5b691a9 Compare May 3, 2024 13:21
self.__username = username
self.__password = password
self.__subdomain = subdomain
base_url = "https://{}.mambu.com/api".format(subdomain)
self.base_url = base_url
self.page_size = page_size
self.window_size=window_size

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maintain the spacing as on previous lines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed already.

@@ -1,35 +0,0 @@
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to remove these tests completely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it because it was increasing execution time and not adding much value to the test suite.

@@ -16,6 +16,7 @@ def name():

def untestable_streams(self):
return set([
"clients", # Stream does not have enough records to test pagination

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there only a few records, or do we have no records at all? If we do have a small number of records, we can reduce the page limit and test this specific stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

window_size implementation fixed the issue, removing clients from untestable_streams().

self.__username = username
self.__password = password
self.__subdomain = subdomain
base_url = "https://{}.mambu.com/api".format(subdomain)
self.base_url = base_url
self.page_size = page_size
try:
self.window_size = int(float(window_size)) if window_size else DEFAULT_DATE_WINDOW_SIZE

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we do integer typecasting directly instead of, converting to float first?

Copy link
Contributor

@RushiT0122 RushiT0122 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> int("10.0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '10.0'

@@ -1,10 +1,11 @@
import time

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the unused import time

@RushiT0122 RushiT0122 merged commit 79a9963 into master May 13, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants