-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support use default block size to replace the Hdfs ufs #14324
base: master-2.x
Are you sure you want to change the base?
Support use default block size to replace the Hdfs ufs #14324
Conversation
Codecov Report
@@ Coverage Diff @@
## master #14324 +/- ##
============================================
- Coverage 43.00% 34.62% -8.38%
+ Complexity 9296 7436 -1860
============================================
Files 1448 1479 +31
Lines 84205 86344 +2139
Branches 10178 10339 +161
============================================
- Hits 36209 29895 -6314
- Misses 44985 53925 +8940
+ Partials 3011 2524 -487
Continue to review full report at Codecov.
|
@@ -395,8 +395,12 @@ public UfsFileStatus getFileStatus(String path) throws IOException { | |||
FileStatus fs = hdfs.getFileStatus(tPath); | |||
String contentHash = | |||
UnderFileSystemUtils.approximateContentHash(fs.getLen(), fs.getModificationTime()); | |||
long blockSize = fs.getBlockSize(); | |||
if (mUfsConf.getBoolean(PropertyKey.USER_BLOCK_SIZE_ENABLED)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this flag enabled, getFileStatus()
will start returning wrong block size. So I don't think this the right place to patch up.
What's the exact reason you want to override UFS block-size with Alluxio block-size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reference the ObjectUnderFileSystem#getFileStatus method.
The exact reason for customize the block size is that in our alluxio cluster, cephfs is our underfilesystem, the blockSize from cephfs is 4MB, so a 1 GB big file will cost 1GB/4MB blocks, i want to customize the alluxio block size, for example 64MB, it would be better to reduce the block count.
@ggezer Thanks for the previous review, I move the customize block size logic to high level. PTAL. |
/ping @ggezer |
fyi @HelloHorizon @beinan |
@@ -880,6 +880,9 @@ static void loadFileMetadataInternal(RpcContext rpcContext, LockedInodePath inod | |||
} | |||
ufsLength = ((UfsFileStatus) context.getUfsStatus()).getContentLength(); | |||
long blockSize = ((UfsFileStatus) context.getUfsStatus()).getBlockSize(); | |||
if (ServerConfiguration.getBoolean(PropertyKey.USER_BLOCK_SIZE_ENABLED)) { | |||
blockSize = ServerConfiguration.getBytes(PropertyKey.USER_BLOCK_SIZE_BYTES_DEFAULT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 if you have different block sizes for different paths, you still need to update them with path config anyways so no need to create a new property for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the request! This setting makes sense to me. But it needs to respect the user-side config and path config in order to be really useful.
@@ -880,6 +880,9 @@ static void loadFileMetadataInternal(RpcContext rpcContext, LockedInodePath inod | |||
} | |||
ufsLength = ((UfsFileStatus) context.getUfsStatus()).getContentLength(); | |||
long blockSize = ((UfsFileStatus) context.getUfsStatus()).getBlockSize(); | |||
if (ServerConfiguration.getBoolean(PropertyKey.USER_BLOCK_SIZE_ENABLED)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this only reads the server-side default, so it doesn't take your path config or configuration from your client command. This defeats the purpose of having this property.
@beinan @yuzhu Do you know how to make this respect both the client-side config and path config? I'm afraid making this respect user-side config needs this property value to be in message FileSystemCommonPOptions
. Getting the path config is easier than modifying RPC, but I forgot how :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to pick up the correct path-specific setting, we will need to resolve the config for the path. The path configs are returned from master
alluxio/core/server/master/src/main/java/alluxio/master/meta/DefaultMetaMaster.java
Line 430 in ef6b7ef
PathPropertiesView pathProperties = mPathProperties.snapshot(); |
to the client
return new SpecificPathConfiguration(getClientContext().getClusterConf(), |
Then the client will resolve the path config. This is how you set
sync.interval
using alluxio pathConf add /path
then how that takes effect.
The annoying thing is, in the sync process the master is actually the client that needs to do the path resolution, and currently there's no API you can directly call. @maobaolong I will take over this part and add one API for you to use. So in your PR here it should be just an easy call.
After a few discussions on this feature, we decided to still enable users to configure this per path, because at first it's hard to set the block size correctly without need to override in the future. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions. |
@jiacheliu3 We have a use case about this PR recently, lets discuss whether we need this PR in os repo. |
ack. let's discuss this on our next community meeting |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions. |
please do not mark this stale |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions. |
Co-authored-by: Jiacheng Liu <[email protected]>
Co-authored-by: Jiacheng Liu <[email protected]>
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions. |
What changes are proposed in this pull request?
Now the cephfs, ozone and cosn are all implemented based on hdfs ufs, but we don‘t want alluxio use the ufs block size, for example, our block size of ceph is 4MB, which is too small for alluxio, we should supply a way to customize hdfs-like ufs block size for alluxio.
Why are the changes needed?
Add a propertyKey to enable the user default block size.
Does this PR introduce any user facing changes?
set
alluxio.user.block.size.enabled
to true to enable using the user block size for cephfs, ozone, hdfs, cosn.