feat(core-clp): Postpone GlobalMetadataDB updates until archive writing finishes (fixes #685). #705

Open
wants to merge 16 commits into
base: main

davemarco
Contributor

@davemarco davemarco commented Jan 30, 2025

Description

See #685 for high level PR goals.

re: implementation

Removed all writes to the global metadata DB during serialization.

Instead, file metadata is kept in a new list (previously it was deleted when a segment was closed), and all of it is written to the global metadata DB once the archive is finished.

I explored other options, but in the end, this seemed to be the cleanest.
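To make the approach concrete, here is a minimal sketch of the deferral idea, not the actual CLP code. `FakeGlobalMetadataDb`, `SketchArchive`, and `m_pending_file_metadata` are hypothetical names, and file metadata is reduced to a path string for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the global metadata DB; it only records what it is asked to do.
struct FakeGlobalMetadataDb {
    std::size_t num_open_calls{0};
    std::vector<std::string> file_paths_written;

    void open() { ++num_open_calls; }

    void update_metadata_for_files(std::vector<std::string> const& paths) {
        file_paths_written.insert(file_paths_written.end(), paths.begin(), paths.end());
    }

    void close() {}
};

// Hypothetical, heavily simplified archive writer.
class SketchArchive {
public:
    explicit SketchArchive(FakeGlobalMetadataDb& db) : m_db{db} {}

    // Called once per segment: the metadata is kept in memory, and the DB is not touched.
    void close_segment(std::vector<std::string> const& segment_file_paths) {
        for (auto const& path : segment_file_paths) {
            m_pending_file_metadata.emplace_back(path);
        }
    }

    // Called once when the archive is finished: the only place that talks to the DB.
    void close() {
        m_db.open();
        m_db.update_metadata_for_files(m_pending_file_metadata);
        m_db.close();
        m_pending_file_metadata.clear();
    }

private:
    FakeGlobalMetadataDb& m_db;
    std::vector<std::string> m_pending_file_metadata;
};
```

In this sketch, closing any number of segments leaves the fake DB untouched; it is opened exactly once, in `close()`.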

Validation performed

  • Compared the contents of the global metadata DB after compressing loghub/Hadoop (lots of files) with the previous commit and with this PR, and validated that the contents were the same.
  • Tested compression/decompression of an archive
  • Ran unit tests
  • Ran clg
  • Added an assertion (since removed from the PR) to check that file data was actually deallocated before the file is added to the list.

Summary by CodeRabbit

  • New Features

    • Introduced a unified approach to handling metadata updates during file archiving, enhancing overall consistency and reliability.
    • Added a new method for updating global metadata related to archived files.
  • Refactor

    • Streamlined the archival workflow by consolidating metadata operations, resulting in a clearer and more efficient process for updating file information.
    • Enhanced resource management by ensuring proper deletion of written file metadata after updates.

Contributor

coderabbitai bot commented Jan 30, 2025

Walkthrough

This pull request centralizes the handling of global metadata operations in the Archive class. The modifications remove direct calls to the global metadata database in various methods and introduce a new method, update_global_metadata, which encapsulates opening the database, checking for local metadata, updating archive and file metadata, and cleaning up written files. Additionally, a new member variable has been added to store metadata for written files. These changes postpone metadata updates until the archive writing is complete.

Changes

File(s) and change summary:

components/core/src/clp/streaming_archive/writer/Archive.cpp
  • Removed direct calls to m_global_metadata_db in open, close, persist_file_metadata, and update_metadata.
  • Added call to new method update_global_metadata at the end of close.
components/core/src/clp/streaming_archive/writer/Archive.hpp
  • Added the declaration of update_global_metadata.
  • Added new member variable std::vector<File*> m_file_metadata_for_global_update for handling file metadata post-write.

Sequence Diagram(s)

sequenceDiagram
    participant A as Archive
    participant DB as GlobalMetadataDB

    A->>A: close() invoked
    A->>A: Persist file metadata (without global DB update)
    A->>A: update_global_metadata() invoked
    A->>DB: Open global metadata database
    A->>A: Check for local metadata presence
    A->>DB: Add archive metadata
    A->>A: Update metadata for written files
    A->>A: Delete entries from m_file_metadata_for_global_update
    A->>DB: Close global metadata database
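
The call order in the diagram can be checked with a small recording fake. This is only a sketch under assumptions: `RecordingDb` is hypothetical, the local metadata is modelled as an optional string, and `std::runtime_error` stands in for the project's `OperationFailed`.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fake that records the order in which it is called.
struct RecordingDb {
    std::vector<std::string> calls;

    void open() { calls.emplace_back("open"); }
    void add_archive(std::string const& /*archive_id*/) { calls.emplace_back("add_archive"); }
    void update_metadata_for_files(std::string const& /*archive_id*/) {
        calls.emplace_back("update_metadata_for_files");
    }
    void close() { calls.emplace_back("close"); }
};

// Same step order as the diagram: open, check local metadata, add archive, update files, close.
auto update_global_metadata(
        RecordingDb& db,
        std::string const& archive_id,
        std::optional<std::string> const& local_metadata
) -> void {
    db.open();
    if (false == local_metadata.has_value()) {
        // Stand-in for the project's exception type.
        throw std::runtime_error("local metadata missing");
    }
    db.add_archive(archive_id);
    db.update_metadata_for_files(archive_id);
    db.close();
}
```

Note that in this sketch, when the local metadata is missing, the only recorded call is `open`, so the fake DB is never closed.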

Suggested reviewers

  • kirkrodrigues
  • haiqi96


@davemarco davemarco changed the title feat(clp): Postpone GlobalMetadataDB updates until archive writing finishes (fixes #685). feat(core-clp): Postpone GlobalMetadataDB updates until archive writing finishes (fixes #685). Jan 31, 2025
@davemarco davemarco marked this pull request as ready for review February 4, 2025 15:41
@davemarco davemarco requested a review from haiqi96 February 4, 2025 15:42
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
components/core/src/clp/streaming_archive/writer/Archive.hpp (2)

282-286: Expand documentation to note file pointer deallocation.

The docstring only mentions global metadata updates. Since the method also deletes the file pointers, consider clarifying this side effect in the comment to prevent surprises for future maintainers.

 /**
  * Updates metadata in the global metadata database.
+ * This method also deletes all file pointers stored in m_files_written.
 */

324-327: Prefer using smart pointers or storing only file metadata.

Maintaining raw pointers in a vector can be error-prone. Consider using std::unique_ptr<File> for safer ownership semantics or storing just the necessary file metadata to avoid potential pointer misuse.
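
As a rough illustration of the smart-pointer suggestion, the sketch below stores `std::unique_ptr` elements so that clearing the vector frees the files without an explicit `delete` loop. `SketchFile`, `SketchArchive`, and the live-instance counter are hypothetical and only serve to make deallocation observable; whether this fits the real `File` ownership model is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for File; it bumps an external counter so deallocation is observable.
class SketchFile {
public:
    explicit SketchFile(int& live_count) : m_live_count{live_count} { ++m_live_count; }
    ~SketchFile() { --m_live_count; }

    SketchFile(SketchFile const&) = delete;
    auto operator=(SketchFile const&) -> SketchFile& = delete;

private:
    int& m_live_count;
};

class SketchArchive {
public:
    // Ownership moves into the archive; the caller's pointer is left empty.
    void add_written_file(std::unique_ptr<SketchFile> file) {
        m_files_written.emplace_back(std::move(file));
    }

    // Clearing the vector destroys every owned file.
    void release_written_files() { m_files_written.clear(); }

    auto get_num_written_files() const -> std::size_t { return m_files_written.size(); }

private:
    std::vector<std::unique_ptr<SketchFile>> m_files_written;
};
```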

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebe30e1 and 363cd56.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (3 hunks)
  • components/core/src/clp/streaming_archive/writer/Archive.hpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/streaming_archive/writer/Archive.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
🔇 Additional comments (3)
components/core/src/clp/streaming_archive/writer/Archive.cpp (3)

237-238: Verify exception handling and final state if metadata update fails.

If update_global_metadata() throws an exception here, the archive is partially closed. Ensure that this case is handled gracefully, so that the system does not remain in an invalid or half-closed state.


592-593: Confirm safe usage of raw pointers in m_files_written.

Adding raw pointers to m_files_written implies that other parts of the code must not reference these File* objects after they are deleted. Please confirm that no other references remain to avoid use-after-free behaviour.


640-651: Consider robust error handling in update_global_metadata.

At present, if any database operation fails, the subsequent file deletion and database closure are skipped. A structured error handling flow (e.g., try-finally) would protect against partial updates and resource leaks.
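
C++ has no try-finally, so the usual way to get the flow suggested here is an RAII guard whose destructor closes the database. The sketch below assumes a hypothetical `FakeDb` and `DbSession`, and assumes that closing the DB does not itself throw.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical fake DB that can be told to fail while adding an archive.
struct FakeDb {
    bool is_open{false};
    bool fail_on_add{false};

    void open() { is_open = true; }
    void add_archive() {
        if (fail_on_add) {
            throw std::runtime_error("add_archive failed");
        }
    }
    void close() { is_open = false; }
};

// RAII guard: opens the DB on construction and closes it on every exit path.
class DbSession {
public:
    explicit DbSession(FakeDb& db) : m_db{db} { m_db.open(); }
    ~DbSession() { m_db.close(); }  // Assumes close() does not throw.

    DbSession(DbSession const&) = delete;
    auto operator=(DbSession const&) -> DbSession& = delete;

private:
    FakeDb& m_db;
};

auto update_global_metadata(FakeDb& db) -> void {
    DbSession session{db};
    db.add_archive();  // If this throws, ~DbSession still closes the DB.
}

// Helper for demonstration: reports whether the update threw.
auto update_throws(FakeDb& db) -> bool {
    try {
        update_global_metadata(db);
    } catch (std::runtime_error const&) {
        return true;
    }
    return false;
}
```

The guard only covers closing the connection; rolling back a partially applied update would need transaction support from the database, which this sketch does not model.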

@@ -593,16 +589,12 @@ void Archive::close_segment_and_persist_file_metadata(

    for (auto file : files) {
        file->mark_as_in_committed_segment();
        m_files_written.emplace_back(file);
Contributor

just wonder, can we use something like m_files_written.insert(files.begin(), files.end())?

Contributor Author

We could, but I figured the loop was already there. I'm not sure whether it is better to use the existing loop or do what you proposed.

Contributor

Can't say for sure which one is better, so I am fine with keeping the emplace.
I do notice that emplace constructs the object in place while insert copies the object, but in our case I guess emplace will also have to copy via the copy constructor.
Anyway, I don't think the performance really matters here.

@LinZhihao-723 do you have any insights?

Contributor

actually, can we directly just do m_files_written = files?

Contributor Author

actually, can we directly just do m_files_written = files?

No, this will be executed multiple times per archive, so the assignment would overwrite the files stored by the previous call.

Contributor

True, you are right. My bad.
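
The difference discussed in this thread can be shown with a small sketch. `SketchFile` and both helper functions are hypothetical. Two facts it relies on: the range overload of `std::vector::insert` takes a position iterator as its first argument, and for a vector of raw pointers `emplace_back(file)` simply copies the pointer value, so the choice between the loop and `insert` is mostly about readability.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for File.
struct SketchFile {
    int id;
};

// Appends the segment's files; files from earlier segments are kept.
auto append_with_insert(
        std::vector<SketchFile*>& written,
        std::vector<SketchFile*> const& files
) -> void {
    written.insert(written.end(), files.begin(), files.end());
}

// Assignment replaces the whole vector; files from earlier segments are lost.
auto overwrite_with_assignment(
        std::vector<SketchFile*>& written,
        std::vector<SketchFile*> const& files
) -> void {
    written = files;
}
```

With two segments of two files and one file, the appended vector ends up with three entries, while the assigned vector only holds the single file from the last segment.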

Comment on lines 647 to 649
for (auto* file : m_files_written) {
    delete file;
}
Contributor

Just looking at the function name, one may not expect the function to call delete on m_files_written.

I feel it would be clearer to move these 3 lines outside of the function, or maybe to update the function name to include the "delete" part.

Contributor Author

I moved the delete out of the function

@@ -279,6 +279,11 @@ class Archive {
     */
    void update_metadata();

    /**
     * Updates metadata in the global metadata database.
Contributor

Same comment as above about deleting m_files_written.

You might also need to mention exceptions if m_global_metadata_db's API throws any.

Contributor Author

Moved the delete out of the function. None of the functions I am calling have any documented exceptions in the .hpp file.

Comment on lines 324 to 325
// Data for all files in this collection has been deallocated, and should only
// contain metadata.
Contributor

Without reading the rest of the code outside this PR, the comment is not very easy to follow.
Can you point me to where the data for all files is deallocated?

Contributor Author

@davemarco davemarco Feb 6, 2025

The function File::append_to_segment has the following code in it. I believe the members m_timestamps, m_logtypes, and m_variables are memory intensive.

    m_is_written_out = true;
    m_timestamps.reset(nullptr);
    m_logtypes.reset(nullptr);
    m_variables.reset(nullptr);

File::append_to_segment should be called before the files are added to m_files_written. In terms of the call trace, Archive::append_file_contents_to_segment calls File::append_to_segment before calling Archive::close_segment_and_persist_file_metadata, which is the function that adds files to m_files_written. So all the files added should have their data cleared. Note that to test this, I added an assertion that m_is_written_out is true before adding to the list. If you want, I can add it back to the source, but it should never actually trigger.

If you want, I can modify the comment to say:

Files in this collection only hold metadata. Files are added to this collection after 
`file->append_to_segment()` is called, which deallocates memory for timestamp,
logtype, and variable fields. 

We could also just remove the comment...

Contributor

The comment makes sense to me. maybe we can rename the variables?

something like m_file_metadata_for_global_update.
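
The "metadata-only" state discussed in this thread can be sketched as follows. This is a simplified, hypothetical `SketchFile` with a single bulk buffer and two metadata fields. Only the `reset(nullptr)` pattern is taken from the snippet quoted above; everything else is an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified File: one bulk buffer plus some small metadata fields.
class SketchFile {
public:
    explicit SketchFile(std::string path)
            : m_path{std::move(path)},
              m_timestamps{std::make_unique<std::vector<std::int64_t>>()} {}

    void append_message(std::int64_t timestamp) {
        // Appending after the buffers were released would dereference a null pointer.
        assert(false == m_is_written_out);
        m_timestamps->push_back(timestamp);
        ++m_num_messages;
    }

    // Mirrors the quoted snippet: mark the file as written out and release the bulk buffer.
    void append_to_segment() {
        m_is_written_out = true;
        m_timestamps.reset(nullptr);
    }

    auto has_timestamp_buffer() const -> bool { return nullptr != m_timestamps; }
    auto get_path() const -> std::string const& { return m_path; }
    auto get_num_messages() const -> std::size_t { return m_num_messages; }

private:
    std::string m_path;
    std::unique_ptr<std::vector<std::int64_t>> m_timestamps;
    std::size_t m_num_messages{0};
    bool m_is_written_out{false};
};
```

After `append_to_segment()`, the buffer is gone but the path and message count remain readable, which is the state the files in the list are expected to be in.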

Contributor Author

@davemarco davemarco left a comment

Answered review questions


@davemarco davemarco requested a review from haiqi96 February 6, 2025 15:40
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/core/src/clp/streaming_archive/writer/Archive.cpp (1)

644-652: Consider using early return pattern for better readability.

The validation check could be simplified using an early return pattern, making the code more maintainable.

 auto Archive::update_global_metadata() -> void {
     m_global_metadata_db->open();
-    if (false == m_local_metadata.has_value()) {
+    if (!m_local_metadata.has_value()) {
         throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
     }
     m_global_metadata_db->add_archive(m_id_as_string, m_local_metadata.value());
     m_global_metadata_db->update_metadata_for_files(m_id_as_string, m_files_written);
     m_global_metadata_db->close();
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 363cd56 and 48b0779.

📒 Files selected for processing (1)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/streaming_archive/writer/Archive.cpp
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build-macos (macos-13, true)
🔇 Additional comments (3)
components/core/src/clp/streaming_archive/writer/Archive.cpp (3)

237-238: LGTM! Good placement of global metadata update.

The update is correctly placed after all operations are complete and before cleaning up resources.


240-243: LGTM! Proper cleanup of resources.

Files are correctly deleted after metadata is updated, ensuring no data loss.


596-596: Consider using insert() for better performance.

For better performance when adding multiple files, consider using insert() instead of multiple emplace_back() calls.

-        m_files_written.emplace_back(file);
+        m_files_written.insert(m_files_written.end(), files.begin(), files.end());

@haiqi96
Contributor

haiqi96 commented Feb 6, 2025

Looking at the change, I feel it's acceptable as a quick fix. It could be confusing for the File Object to be in a state where it holds no valid data but valid metadata, but with properly commented code I don't think it will be a big deal. Did you discuss this design with Kirk?

Just brainstorming the possible ways to properly do it. One possible way is to factor out all File level metadata as a separate class / struct, and each File maintains a pointer to the metadata instance. After appending files to segment, it can then transfer the metadata instance's ownership to archive.cpp, but this could require a bunch of work...

Another hackier way is to set a flag in File.cpp to indicate if data in the file is valid. I don't think this is clean either but at least it gives reader a sense that File object could enter a status where only the metadata is valid.
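
For the first idea, a rough sketch of what factoring the metadata out might look like is given below. `FileMetadata`, `SketchFile`, `SketchArchive`, and `release_metadata` are hypothetical names, and the metadata is reduced to two fields. This is not a proposal for the real interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical struct holding only file-level metadata.
struct FileMetadata {
    std::string path;
    std::size_t num_messages{0};
};

// Hypothetical File that owns its metadata through a pointer.
class SketchFile {
public:
    explicit SketchFile(std::string path) : m_metadata{std::make_unique<FileMetadata>()} {
        m_metadata->path = std::move(path);
    }

    // Precondition: the metadata has not been released yet.
    void append_message() { ++m_metadata->num_messages; }

    // Hands the metadata to the caller; afterwards this file has none.
    auto release_metadata() -> std::unique_ptr<FileMetadata> { return std::move(m_metadata); }

    auto has_metadata() const -> bool { return nullptr != m_metadata; }

private:
    std::unique_ptr<FileMetadata> m_metadata;
};

// Hypothetical archive that collects metadata for the later global update.
class SketchArchive {
public:
    void take_metadata_from(SketchFile& file) {
        m_file_metadata_for_global_update.emplace_back(file.release_metadata());
    }

    auto get_pending_metadata() const -> std::vector<std::unique_ptr<FileMetadata>> const& {
        return m_file_metadata_for_global_update;
    }

private:
    std::vector<std::unique_ptr<FileMetadata>> m_file_metadata_for_global_update;
};
```

After the transfer, the file must not touch its metadata any more, so any operation that reads it would need a `has_metadata()` style check first.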

@davemarco
Contributor Author

Looking at the change, I feel it's acceptable as a quick fix. It could be confusing for the File Object to be in a state where it holds no valid data but valid metadata, but with properly commented code I don't think it will be a big deal. Did you discuss this design with Kirk?

I did not discuss the design with Kirk.

Just brainstorming the possible ways to properly do it. One possible way is to factor out all File level metadata as a separate class / struct, and each File maintains a pointer to the metadata instance. After appending files to segment, it can then transfer the metadata instance's ownership to archive.cpp, but this could require a bunch of work...

This could work, but it would take work to refactor the many interfaces that access the file so that they access file->metadata instead. There could also be some awkwardness where existing operations have to check whether file->metadata exists before accessing the metadata.

Another hackier way is to set a flag in File.cpp to indicate if data in the file is valid. I don't think this is clean either but at least it gives reader a sense that File object could enter a status where only the metadata is valid.

We already have the flag m_is_written_out which is set when data is cleared. I could maybe add a public getter function for this variable that also explains that only metadata is valid?
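
A minimal sketch of the getter idea follows. The getter name `is_metadata_only` and the free function `add_for_global_update` are hypothetical, and `std::logic_error` stands in for whatever the project would actually throw or assert.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical File reduced to the flag under discussion.
class SketchFile {
public:
    void append_to_segment() { m_is_written_out = true; }

    /**
     * @return true once the file's message data has been written out and released, meaning
     * only its metadata is still valid.
     */
    auto is_metadata_only() const -> bool { return m_is_written_out; }

private:
    bool m_is_written_out{false};
};

// Refuses files that still hold message data, making the invariant explicit at the call site.
auto add_for_global_update(std::vector<SketchFile const*>& pending, SketchFile const& file)
        -> void {
    if (false == file.is_metadata_only()) {
        throw std::logic_error("file still holds message data");
    }
    pending.push_back(&file);
}
```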

@haiqi96 haiqi96 closed this Feb 6, 2025
@haiqi96 haiqi96 reopened this Feb 6, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b0779 and 846062b.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (3 hunks)
  • components/core/src/clp/streaming_archive/writer/Archive.hpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_archive/writer/Archive.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/streaming_archive/writer/Archive.cpp
🪛 GitHub Actions: clp-lint
components/core/src/clp/streaming_archive/writer/Archive.cpp

[error] 650-650: code should be clang-formatted



⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (2)
components/core/src/clp/streaming_archive/writer/Archive.cpp (2)

237-244: LGTM! Proper sequencing and cleanup.

The changes ensure that global metadata is updated before nulling the database pointer, followed by proper cleanup of file metadata.


594-597: LGTM! Efficient metadata handling.

The file is properly marked as in committed segment before being added to the list for global metadata update. Using emplace_back is efficient for insertion.

Comment on lines 644 to 652
auto Archive::update_global_metadata() -> void {
    m_global_metadata_db->open();
    if (false == m_local_metadata.has_value()) {
        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
    }
    m_global_metadata_db->add_archive(m_id_as_string, m_local_metadata.value());
    m_global_metadata_db->update_metadata_for_files(m_id_as_string, m_file_metadata_for_global_update);
    m_global_metadata_db->close();
}
Contributor

⚠️ Potential issue

Fix formatting issues and improve error handling.

The pipeline reports formatting issues. Additionally, consider these improvements:

  1. Use more idiomatic condition check
  2. Add descriptive error message

Apply this diff to address the issues:

 auto Archive::update_global_metadata() -> void {
     m_global_metadata_db->open();
-    if (false == m_local_metadata.has_value()) {
-        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    if (!m_local_metadata) {
+        throw OperationFailed(
+            ErrorCode_Failure,
+            __FILENAME__,
+            __LINE__,
+            "Local metadata not initialized before updating global metadata"
+        );
     }
     m_global_metadata_db->add_archive(m_id_as_string, m_local_metadata.value());
     m_global_metadata_db->update_metadata_for_files(m_id_as_string,
             m_file_metadata_for_global_update);
     m_global_metadata_db->close();
 }

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Actions: clp-lint

[error] 650-650: code should be clang-formatted



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 846062b and 920cfb8.

📒 Files selected for processing (1)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/streaming_archive/writer/Archive.cpp
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (3)
components/core/src/clp/streaming_archive/writer/Archive.cpp (3)

237-238: LGTM!

The order of operations is correct: updating global metadata before nulling the database pointer.


240-243: LGTM!

Proper cleanup of file metadata after the global update is complete.


596-596: LGTM!

Correctly storing file metadata for later global update after marking the file as in a committed segment.

Comment on lines +644 to +655
auto Archive::update_global_metadata() -> void {
    m_global_metadata_db->open();
    if (false == m_local_metadata.has_value()) {
        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
    }
    m_global_metadata_db->add_archive(m_id_as_string, m_local_metadata.value());
    m_global_metadata_db->update_metadata_for_files(
            m_id_as_string,
            m_file_metadata_for_global_update
    );
    m_global_metadata_db->close();
}
Contributor

🛠️ Refactor suggestion

Improve error handling and condition check.

The implementation is correct, but there are a few improvements that can be made:

  1. Use idiomatic condition check
  2. Add descriptive error message

Apply this diff to improve the code:

 auto Archive::update_global_metadata() -> void {
     m_global_metadata_db->open();
-    if (false == m_local_metadata.has_value()) {
-        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    if (!m_local_metadata) {
+        throw OperationFailed(
+            ErrorCode_Failure,
+            __FILENAME__,
+            __LINE__,
+            "Local metadata not initialized before updating global metadata"
+        );
     }
     m_global_metadata_db->add_archive(m_id_as_string, m_local_metadata.value());
     m_global_metadata_db->update_metadata_for_files(
             m_id_as_string,
             m_file_metadata_for_global_update
     );
     m_global_metadata_db->close();
 }

Committable suggestion skipped: line range outside the PR's diff.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
components/core/src/clp/streaming_archive/writer/Archive.cpp (2)

596-600: Enhance comment clarity about memory deallocation.

The comment should be more explicit about memory deallocation to prevent confusion.

-        // Files in this collection only hold metadata. Emplaced files have called method
-        // `File::append_to_segment()` which deallocates memory for timestamp,
-        // logtype, and variable fields.
+        // Files in this collection only hold metadata as `File::append_to_segment()` has been called,
+        // which deallocates memory for timestamp, logtype, and variable fields to reduce memory usage.
+        // The metadata is preserved for updating the global metadata database.

647-658: Improve error handling and condition check.

The implementation is correct, but could benefit from these improvements:

  1. Use idiomatic condition check
  2. Add descriptive error message
 auto Archive::update_global_metadata() -> void {
     m_global_metadata_db->open();
-    if (false == m_local_metadata.has_value()) {
-        throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+    if (!m_local_metadata) {
+        throw OperationFailed(
+            ErrorCode_Failure,
+            __FILENAME__,
+            __LINE__,
+            "Local metadata not initialized before updating global metadata"
+        );
     }
     m_global_metadata_db->add_archive(m_id_as_string, m_local_metadata.value());
     m_global_metadata_db->update_metadata_for_files(
             m_id_as_string,
             m_file_metadata_for_global_update
     );
     m_global_metadata_db->close();
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 920cfb8 and 0628636.

📒 Files selected for processing (2)
  • components/core/src/clp/streaming_archive/writer/Archive.cpp (3 hunks)
  • components/core/src/clp/streaming_archive/writer/Archive.hpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/streaming_archive/writer/Archive.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/streaming_archive/writer/Archive.cpp
⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build (macos-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (2)
components/core/src/clp/streaming_archive/writer/Archive.cpp (2)

237-238: LGTM!

The placement of the global metadata update ensures all file operations are complete before updating the global metadata.


240-244: LGTM!

The cleanup ensures proper resource management by deleting file pointers after they are no longer needed.


2 participants