-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make alluxio client return block location count configurable #18448
base: master-2.x
Are you sure you want to change the base?
Make alluxio client return block location count configurable #18448
Conversation
@maobaolong @jiacheliu3 @dbw9580 Could you help take a look at the pull request, thanks in advance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM with some comments, thanks for the improvement
@@ -6843,6 +6843,15 @@ public String toString() { | |||
.setConsistencyCheckLevel(ConsistencyCheckLevel.WARN) | |||
.setScope(Scope.CLIENT) | |||
.build(); | |||
public static final PropertyKey USER_UFS_BLOCK_LOCATION_RETURN_COUNT = | |||
intBuilder(Name.USER_UFS_BLOCK_LOCATION_RETURN_COUNT) | |||
.setDefaultValue(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just set default to Integer.MAX_VALUE and that means no limit
@@ -288,8 +288,14 @@ public List<BlockLocationInfo> getBlockLocations(URIStatus status) | |||
if (locations.isEmpty() && mFsContext.getPathConf(new AlluxioURI(status.getPath())) | |||
.getBoolean(PropertyKey.USER_UFS_BLOCK_LOCATION_ALL_FALLBACK_ENABLED)) { | |||
// Case 2: Fallback to add all workers to locations so some apps (Impala) won't panic. | |||
locations.addAll(getHostWorkerMap().values()); | |||
Collections.shuffle(locations); | |||
List<WorkerNetAddress> addresses = new ArrayList<>(getHostWorkerMap().values()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is copy No.1
int count = mFsContext.getClusterConf().getInt( | ||
PropertyKey.USER_UFS_BLOCK_LOCATION_RETURN_COUNT); | ||
count = count >= 0 ? count : Integer.MAX_VALUE; | ||
addresses = addresses.subList(0, Math.min(addresses.size(), count)); | ||
locations.addAll(addresses); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this is copy No.2
Can you make sure there's only 1 copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -6843,6 +6843,15 @@ public String toString() { | |||
.setConsistencyCheckLevel(ConsistencyCheckLevel.WARN) | |||
.setScope(Scope.CLIENT) | |||
.build(); | |||
public static final PropertyKey USER_UFS_BLOCK_LOCATION_RETURN_COUNT = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static final PropertyKey USER_UFS_BLOCK_LOCATION_RETURN_COUNT = | |
public static final PropertyKey USER_UFS_BLOCK_LOCATION_RETURN_LIMIT = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Automated checks report:
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks. |
@jiacheliu3 Thanks a lot for reviewing the pull request! PTAL |
685323e
to
6a9e7c0
Compare
6a9e7c0
to
940bbe6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work!
@@ -288,15 +288,27 @@ public List<BlockLocationInfo> getBlockLocations(URIStatus status) | |||
if (locations.isEmpty() && mFsContext.getPathConf(new AlluxioURI(status.getPath())) | |||
.getBoolean(PropertyKey.USER_UFS_BLOCK_LOCATION_ALL_FALLBACK_ENABLED)) { | |||
// Case 2: Fallback to add all workers to locations so some apps (Impala) won't panic. | |||
locations.addAll(getHostWorkerMap().values()); | |||
Collections.shuffle(locations); | |||
PropertyKey locKey = PropertyKey.USER_UFS_BLOCK_LOCATION_RETURN_LIMIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PropertyKey locKey = PropertyKey.USER_UFS_BLOCK_LOCATION_RETURN_LIMIT; | |
PropertyKey locKey = PropertyKey.USER_UFS_BLOCK_LOCATION_FALLBACK_RETURN_LIMIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's rename this property key name so it better reflects what it does in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
+ " should not be set to a negative number"); | ||
} | ||
List<WorkerNetAddress> addresses = getShuffleWorkerAddressList(); | ||
locations.addAll(addresses.subList(0, Math.min(addresses.size(), count))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line has many copies, you can just write an easy for loop
for (i = 0; i < count; i++) {
locations.add(addresses.get(i));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
locations.addAll(getHostWorkerMap().values()); | ||
Collections.shuffle(locations); | ||
PropertyKey locKey = PropertyKey.USER_UFS_BLOCK_LOCATION_RETURN_LIMIT; | ||
int count = mFsContext.getClusterConf().getInt(locKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use path conf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
List<BlockWorkerInfo> workers = mFsContext.getCachedWorkers(); | ||
Collections.shuffle(workers); | ||
return workers.stream().map(BlockWorkerInfo::getNetAddress).collect(toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not keep using getHostWorkerMap()
? This new code may not keep the same behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove the method and use getHostWorkerMap()
@@ -288,8 +288,21 @@ public List<BlockLocationInfo> getBlockLocations(URIStatus status) | |||
if (locations.isEmpty() && mFsContext.getPathConf(new AlluxioURI(status.getPath())) | |||
.getBoolean(PropertyKey.USER_UFS_BLOCK_LOCATION_ALL_FALLBACK_ENABLED)) { | |||
// Case 2: Fallback to add all workers to locations so some apps (Impala) won't panic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this situation occurs, can you directly return the default value defined here? This is more friendly to Impala's file handle cache and data cache. This is because Impala will use consistent hash to schedule data scan fragment, and random block location returns will reduce the cache hit rate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. I don't quite understand the change method you are talking about, or is this what you are talking about?
What changes are proposed in this pull request?
make alluxio client return block location count configurable
Why are the changes needed?
When accessing alluxio in the hive + yarn + mr scenario, the metadata will be accessed and split before the job is submitted, which will generate the job.splitmetainfo file. This file should not be too large. When accessing alluxio, if the data does not exist in alluxio, all worker address information will be returned, which will expand the size of the generated job.splitmetainfo file and eventually result in an error as below. This PR makes the number of returned worker addresses configurable
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Split metadata size exceeded 50000000. Aborting job job_1698927225178_1330717
Does this PR introduce any user facing changes?
No