Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CRC64 checksum check #18682

Merged
merged 5 commits into from
Sep 5, 2024
Merged

Add CRC64 checksum check #18682

merged 5 commits into from
Sep 5, 2024

Conversation

elega
Copy link
Contributor

@elega elega commented Sep 4, 2024

What changes are proposed in this pull request?

image

Why are the changes needed?

To check the file integrity

Does this PR introduce any user facing changes?

Add a crc64 check cli

GetBlockChecksumRequest request) {
// Non-batched mode
if (request.getBlockIdsCount() < mChecksumBatchSize) {
long startTs = CommonUtils.getCurrentMs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the client just send requests without doing logical processing? Would it be better to put this part of the logic processing on a higher level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i moved the logic to the util

* Heap buffer pool.
*/
public class NioHeapBufferPool {
private static final TreeMap<Integer, LinkedList<ByteBuffer>> BUF_POOL = new TreeMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current memory pool is only used by CRC, and the application will not be recycled. Will it occupy core business resources? I think CRC verification is a low-frequency non-core business that does not require high efficiency. Should I directly apply for heap memory temporarily? That’s it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added a non-pooled option

import java.nio.ByteBuffer;
import java.util.zip.Checksum;

public class CRC64 implements Checksum {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can crc64 not be used as a standard library function? Because different ufs crcs have different algorithms. For example, oss and cos are crc64, s3 is crc32, and hdfs has its own verification. Can we directly connect the verification algorithm of each ufs? Which one is better?

@@ -946,6 +946,10 @@ public FileInfo getFileInfo(long fileId)
@Override
public FileInfo getFileInfo(AlluxioURI path, GetStatusContext context)
throws FileDoesNotExistException, InvalidPathException, AccessControlException, IOException {
if (context.getOptions().hasDirectUfsAccess() && context.getOptions().getDirectUfsAccess()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 2.x, there is still no link from the master to the ufs request. Does this break the current architecture? Regardless of worker or proxy, they all get the mount id and attributes from the master and request ufs independently.

}
crc64.update(bf.array(), Math.toIntExact(bytesRead));
int permits = Math.toIntExact(Math.max(1, bytesRead / 1024));
if (mChecksumCalculationRateLimiter.isPresent()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to obtain the token before reading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's hard to know how many bytes read there so i decide to obtain the token after that.

@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • First word of title ("CRC64") is not an imperative verb. Please use one of the valid words
      • Supported title prefixes are: [WIP], [SMALLFIX], [DOCFIX]
  • Commits associated with Github account: PASS

Some checks failed. Please fix the reported issues and reply
alluxio-bot, check this please
to re-run checks.

@elega elega changed the title [DRAFT] CRC64 checksum Add CRC64 checksum check Sep 5, 2024
@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: PASS

All checks passed!

@elega
Copy link
Contributor Author

elega commented Sep 5, 2024

alluxio-bot, merge this please.

@alluxio-bot
Copy link
Contributor

merge failed:
Merge refused because pull request does not have label start with type-

@elega elega added the type-feature This issue is a feature request label Sep 5, 2024
@elega
Copy link
Contributor Author

elega commented Sep 5, 2024

alluxio-bot, merge this please.

@alluxio-bot alluxio-bot merged commit 1a3fbc9 into Alluxio:master-2.x Sep 5, 2024
19 checks passed
@elega
Copy link
Contributor Author

elega commented Sep 5, 2024

alluxio-bot, cherry-pick this to branch-2.10 please.

@alluxio-bot
Copy link
Contributor

Auto cherry-pick unsuccessful Failed to setup local git for auto cherry-pick 1a3fbc9 cannot be auto cherry-picked to branch branch-2.10. Please cherry-pick this manually.

elega added a commit to elega/alluxio that referenced this pull request Sep 5, 2024
![image](https://github.com/user-attachments/assets/9230d1fb-c522-4d21-b382-9e06dc9d83ee)

To check the file integrity

Add a crc64 check cli
			pr-link: Alluxio#18682
			change-id: cid-4eb62e603591f3235b92e4191c5b511b534f6fc3
@elega elega mentioned this pull request Sep 5, 2024
elega added a commit to elega/alluxio that referenced this pull request Sep 5, 2024
![image](https://github.com/user-attachments/assets/9230d1fb-c522-4d21-b382-9e06dc9d83ee)

To check the file integrity

Add a crc64 check cli
			pr-link: Alluxio#18682
			change-id: cid-4eb62e603591f3235b92e4191c5b511b534f6fc3
alluxio-bot pushed a commit that referenced this pull request Sep 5, 2024
Cherry-pick of existing commit.
orig-pr: #18682
orig-commit: 1a3fbc9
orig-commit-author: elega <[email protected]>

			pr-link: #18688
			change-id: cid-4eb62e603591f3235b92e4191c5b511b534f6fc3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature This issue is a feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants