-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CRC64 checksum check #18682
Add CRC64 checksum check #18682
Conversation
GetBlockChecksumRequest request) { | ||
// Non-batched mode | ||
if (request.getBlockIdsCount() < mChecksumBatchSize) { | ||
long startTs = CommonUtils.getCurrentMs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the client just send requests without doing logical processing? Would it be better to put this part of the logic processing on a higher level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i moved the logic to the util
* Heap buffer pool. | ||
*/ | ||
public class NioHeapBufferPool { | ||
private static final TreeMap<Integer, LinkedList<ByteBuffer>> BUF_POOL = new TreeMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current memory pool is only used by CRC, and the application will not be recycled. Will it occupy core business resources? I think CRC verification is a low-frequency non-core business that does not require high efficiency. Should I directly apply for heap memory temporarily? That’s it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added a non-pooled option
import java.nio.ByteBuffer; | ||
import java.util.zip.Checksum; | ||
|
||
public class CRC64 implements Checksum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can crc64 not be used as a standard library function? Because different ufs crcs have different algorithms. For example, oss and cos are crc64, s3 is crc32, and hdfs has its own verification. Can we directly connect the verification algorithm of each ufs? Which one is better?
@@ -946,6 +946,10 @@ public FileInfo getFileInfo(long fileId) | |||
@Override | |||
public FileInfo getFileInfo(AlluxioURI path, GetStatusContext context) | |||
throws FileDoesNotExistException, InvalidPathException, AccessControlException, IOException { | |||
if (context.getOptions().hasDirectUfsAccess() && context.getOptions().getDirectUfsAccess()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 2.x, there is still no link from the master to the ufs request. Does this break the current architecture? Regardless of worker or proxy, they all get the mount id and attributes from the master and request ufs independently.
} | ||
crc64.update(bf.array(), Math.toIntExact(bytesRead)); | ||
int permits = Math.toIntExact(Math.max(1, bytesRead / 1024)); | ||
if (mChecksumCalculationRateLimiter.isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to obtain the token before reading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's hard to know how many bytes read there so i decide to obtain the token after that.
Automated checks report:
Some checks failed. Please fix the reported issues and reply |
Automated checks report:
All checks passed! |
alluxio-bot, merge this please. |
merge failed: |
alluxio-bot, merge this please. |
alluxio-bot, cherry-pick this to branch-2.10 please. |
Auto cherry-pick unsuccessful Failed to setup local git for auto cherry-pick 1a3fbc9 cannot be auto cherry-picked to branch branch-2.10. Please cherry-pick this manually. |
![image](https://github.com/user-attachments/assets/9230d1fb-c522-4d21-b382-9e06dc9d83ee) To check the file integrity Add a crc64 check cli pr-link: Alluxio#18682 change-id: cid-4eb62e603591f3235b92e4191c5b511b534f6fc3
![image](https://github.com/user-attachments/assets/9230d1fb-c522-4d21-b382-9e06dc9d83ee) To check the file integrity Add a crc64 check cli pr-link: Alluxio#18682 change-id: cid-4eb62e603591f3235b92e4191c5b511b534f6fc3
Cherry-pick of existing commit. orig-pr: #18682 orig-commit: 1a3fbc9 orig-commit-author: elega <[email protected]> pr-link: #18688 change-id: cid-4eb62e603591f3235b92e4191c5b511b534f6fc3
What changes are proposed in this pull request?
Why are the changes needed?
To check the file integrity
Does this PR introduce any user facing changes?
Add a crc64 check cli