Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support use default block size to replace the Hdfs ufs #14324

Open
wants to merge 4 commits into
base: master-2.x
Choose a base branch
from

Conversation

maobaolong
Copy link
Contributor

What changes are proposed in this pull request?

Now the cephfs, ozone and cosn are all implemented based on hdfs ufs, but we don‘t want alluxio use the ufs block size, for example, our block size of ceph is 4MB, which is too small for alluxio, we should supply a way to customize hdfs-like ufs block size for alluxio.

Why are the changes needed?

Add a propertyKey to enable the user default block size.

Does this PR introduce any user facing changes?

set alluxio.user.block.size.enabled to true to enable using the user block size for cephfs, ozone, hdfs, cosn.

@alluxio-bot alluxio-bot added the API Change Changes covering public API label Oct 28, 2021
@codecov-commenter
Copy link

codecov-commenter commented Oct 28, 2021

Codecov Report

Merging #14324 (e1e2060) into master (17e0542) will decrease coverage by 8.37%.
The diff coverage is 60.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #14324      +/-   ##
============================================
- Coverage     43.00%   34.62%   -8.38%     
+ Complexity     9296     7436    -1860     
============================================
  Files          1448     1479      +31     
  Lines         84205    86344    +2139     
  Branches      10178    10339     +161     
============================================
- Hits          36209    29895    -6314     
- Misses        44985    53925    +8940     
+ Partials       3011     2524     -487     
Impacted Files Coverage Δ
...java/alluxio/underfs/hdfs/HdfsUnderFileSystem.java 13.86% <0.00%> (-0.13%) ⬇️
...common/src/main/java/alluxio/conf/PropertyKey.java 99.43% <100.00%> (+0.01%) ⬆️
.../main/java/alluxio/master/file/meta/InodeView.java 0.00% <0.00%> (-100.00%) ⬇️
...main/java/alluxio/master/metastore/BlockStore.java 0.00% <0.00%> (-100.00%) ⬇️
...ain/java/alluxio/master/table/CatalogProperty.java 0.00% <0.00%> (-100.00%) ⬇️
...ain/java/alluxio/master/table/PartitionScheme.java 0.00% <0.00%> (-100.00%) ⬇️
...java/alluxio/master/file/BlockDeletionContext.java 0.00% <0.00%> (-100.00%) ⬇️
...java/alluxio/master/file/contexts/CallTracker.java 0.00% <0.00%> (-100.00%) ⬇️
.../alluxio/master/file/meta/LockedInodePathList.java 0.00% <0.00%> (-100.00%) ⬇️
...a/alluxio/master/table/PartitionedTableScheme.java 0.00% <0.00%> (-100.00%) ⬇️
... and 205 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17e0542...e1e2060. Read the comment docs.

@LuQQiu LuQQiu requested review from jiacheliu3 and ggezer November 2, 2021 02:01
@@ -395,8 +395,12 @@ public UfsFileStatus getFileStatus(String path) throws IOException {
FileStatus fs = hdfs.getFileStatus(tPath);
String contentHash =
UnderFileSystemUtils.approximateContentHash(fs.getLen(), fs.getModificationTime());
long blockSize = fs.getBlockSize();
if (mUfsConf.getBoolean(PropertyKey.USER_BLOCK_SIZE_ENABLED)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this flag enabled, getFileStatus() will start returning wrong block size. So I don't think this the right place to patch up.

What's the exact reason you want to override UFS block-size with Alluxio block-size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

I just reference the ObjectUnderFileSystem#getFileStatus method.

The exact reason for customize the block size is that in our alluxio cluster, cephfs is our underfilesystem, the blockSize from cephfs is 4MB, so a 1 GB big file will cost 1GB/4MB blocks, i want to customize the alluxio block size, for example 64MB, it would be better to reduce the block count.

@maobaolong
Copy link
Contributor Author

@ggezer Thanks for the previous review, I move the customize block size logic to high level. PTAL.

@maobaolong maobaolong requested a review from ggezer December 22, 2021 03:17
@jiacheliu3
Copy link
Contributor

/ping @ggezer

@jiacheliu3
Copy link
Contributor

fyi @HelloHorizon @beinan

@@ -880,6 +880,9 @@ static void loadFileMetadataInternal(RpcContext rpcContext, LockedInodePath inod
}
ufsLength = ((UfsFileStatus) context.getUfsStatus()).getContentLength();
long blockSize = ((UfsFileStatus) context.getUfsStatus()).getBlockSize();
if (ServerConfiguration.getBoolean(PropertyKey.USER_BLOCK_SIZE_ENABLED)) {
blockSize = ServerConfiguration.getBytes(PropertyKey.USER_BLOCK_SIZE_BYTES_DEFAULT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggezer @yuzhu I have the feeling that the in-Alluxio block size should always be the Alluxio configuration block size, instead of the UFS one. If the user is not happy with the size config then he/she can use Alluxio path config to update that. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 if you have different block sizes for different paths, you still need to update them with path config anyways so no need to create a new property for this.

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the request! This setting makes sense to me. But it needs to respect the user-side config and path config in order to be really useful.

core/common/src/main/java/alluxio/conf/PropertyKey.java Outdated Show resolved Hide resolved
core/common/src/main/java/alluxio/conf/PropertyKey.java Outdated Show resolved Hide resolved
@@ -880,6 +880,9 @@ static void loadFileMetadataInternal(RpcContext rpcContext, LockedInodePath inod
}
ufsLength = ((UfsFileStatus) context.getUfsStatus()).getContentLength();
long blockSize = ((UfsFileStatus) context.getUfsStatus()).getBlockSize();
if (ServerConfiguration.getBoolean(PropertyKey.USER_BLOCK_SIZE_ENABLED)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only reads the server-side default, so it doesn't take your path config or configuration from your client command. This defeats the purpose of having this property.

@beinan @yuzhu Do you know how to make this respect both the client-side config and path config? I'm afraid making this respect user-side config needs this property value to be in message FileSystemCommonPOptions. Getting the path config is easier than modifying RPC, but I forgot how :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to pick up the correct path-specific setting, we will need to resolve the config for the path. The path configs are returned from master

PathPropertiesView pathProperties = mPathProperties.snapshot();

to the client
return new SpecificPathConfiguration(getClientContext().getClusterConf(),

Then the client will resolve the path config. This is how you set sync.interval using alluxio pathConf add /path then how that takes effect.

The annoying thing is, in the sync process the master is actually the client that needs to do the path resolution, and currently there's no API you can directly call. @maobaolong I will take over this part and add one API for you to use. So in your PR here it should be just an easy call.

@jiacheliu3
Copy link
Contributor

After a few discussions on this feature, we decided to still enable users to configure this per path, because at first it's hard to set the block size correctly without need to override in the future.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The PR/Issue does not have recent activities and will be closed automatically label Jan 27, 2023
@maobaolong
Copy link
Contributor Author

@jiacheliu3 We have a use case about this PR recently, lets discuss whether we need this PR in os repo.

@github-actions github-actions bot removed the stale The PR/Issue does not have recent activities and will be closed automatically label Mar 7, 2023
@elega
Copy link
Contributor

elega commented Mar 8, 2023

ack. let's discuss this on our next community meeting

@github-actions
Copy link

github-actions bot commented Apr 8, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The PR/Issue does not have recent activities and will be closed automatically label Apr 8, 2023
@maobaolong
Copy link
Contributor Author

please do not mark this stale

@github-actions github-actions bot removed the stale The PR/Issue does not have recent activities and will be closed automatically label Apr 9, 2023
@github-actions
Copy link

github-actions bot commented Jun 5, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The PR/Issue does not have recent activities and will be closed automatically label Jun 5, 2023
@github-actions github-actions bot removed the stale The PR/Issue does not have recent activities and will be closed automatically label Jun 12, 2024
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The PR/Issue does not have recent activities and will be closed automatically label Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Change Changes covering public API stale The PR/Issue does not have recent activities and will be closed automatically
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants