Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make alluxio client return block location count configurable #18448

Open
wants to merge 4 commits into
base: master-2.x
Choose a base branch
from

Conversation

codings-dan
Copy link
Contributor

What changes are proposed in this pull request?

make alluxio client return block location count configurable

Why are the changes needed?

When accessing alluxio in the hive + yarn + mr scenario, the metadata will be accessed and split before the job is submitted, which will generate the job.splitmetainfo file. This file should not be too large. When accessing alluxio, if the data does not exist in alluxio, all worker address information will be returned, which will expand the size of the generated job.splitmetainfo file and eventually result in an error as below. This PR makes the number of returned worker addresses configurable

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Split metadata size exceeded 50000000. Aborting job job_1698927225178_1330717

Does this PR introduce any user facing changes?

No

@codings-dan
Copy link
Contributor Author

@maobaolong @jiacheliu3 @dbw9580 Could you help take a look at the pull request, thanks in advance!

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM with some comments, thanks for the improvement

@@ -6843,6 +6843,15 @@ public String toString() {
.setConsistencyCheckLevel(ConsistencyCheckLevel.WARN)
.setScope(Scope.CLIENT)
.build();
public static final PropertyKey USER_UFS_BLOCK_LOCATION_RETURN_COUNT =
intBuilder(Name.USER_UFS_BLOCK_LOCATION_RETURN_COUNT)
.setDefaultValue(-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just set default to Integer.MAX_VALUE and that means no limit

@@ -288,8 +288,14 @@ public List<BlockLocationInfo> getBlockLocations(URIStatus status)
if (locations.isEmpty() && mFsContext.getPathConf(new AlluxioURI(status.getPath()))
.getBoolean(PropertyKey.USER_UFS_BLOCK_LOCATION_ALL_FALLBACK_ENABLED)) {
// Case 2: Fallback to add all workers to locations so some apps (Impala) won't panic.
locations.addAll(getHostWorkerMap().values());
Collections.shuffle(locations);
List<WorkerNetAddress> addresses = new ArrayList<>(getHostWorkerMap().values());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is copy No.1

Comment on lines 294 to 298
int count = mFsContext.getClusterConf().getInt(
PropertyKey.USER_UFS_BLOCK_LOCATION_RETURN_COUNT);
count = count >= 0 ? count : Integer.MAX_VALUE;
addresses = addresses.subList(0, Math.min(addresses.size(), count));
locations.addAll(addresses);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this is copy No.2

Can you make sure there's only 1 copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -6843,6 +6843,15 @@ public String toString() {
.setConsistencyCheckLevel(ConsistencyCheckLevel.WARN)
.setScope(Scope.CLIENT)
.build();
public static final PropertyKey USER_UFS_BLOCK_LOCATION_RETURN_COUNT =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static final PropertyKey USER_UFS_BLOCK_LOCATION_RETURN_COUNT =
public static final PropertyKey USER_UFS_BLOCK_LOCATION_RETURN_LIMIT =

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: FAIL
    • It looks like your commits can't be linked to a valid Github account.
      Your commits are made with the email [email protected], which does not allow your contribution to be tracked by Github.
      See this link for possible reasons this might be happening.
      To change the author email address that your most recent commit was made under, you can run:
      git -c user.name="Name" -c user.email="Email" commit --amend --reset-author
      See this answer for more details about how to change commit email addresses.
      Once the author email address has been updated, update the pull request by running:
      git push --force https://github.com/codings-dan/alluxio.git alluxio-community-client-block-loc-configurable

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@codings-dan
Copy link
Contributor Author

@jiacheliu3 Thanks a lot for reviewing the pull request! PTAL

@codings-dan codings-dan force-pushed the alluxio-community-client-block-loc-configurable branch 2 times, most recently from 685323e to 6a9e7c0 Compare December 13, 2023 11:17
@codings-dan codings-dan force-pushed the alluxio-community-client-block-loc-configurable branch from 6a9e7c0 to 940bbe6 Compare December 13, 2023 11:20
Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work!

@@ -288,15 +288,27 @@ public List<BlockLocationInfo> getBlockLocations(URIStatus status)
if (locations.isEmpty() && mFsContext.getPathConf(new AlluxioURI(status.getPath()))
.getBoolean(PropertyKey.USER_UFS_BLOCK_LOCATION_ALL_FALLBACK_ENABLED)) {
// Case 2: Fallback to add all workers to locations so some apps (Impala) won't panic.
locations.addAll(getHostWorkerMap().values());
Collections.shuffle(locations);
PropertyKey locKey = PropertyKey.USER_UFS_BLOCK_LOCATION_RETURN_LIMIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PropertyKey locKey = PropertyKey.USER_UFS_BLOCK_LOCATION_RETURN_LIMIT;
PropertyKey locKey = PropertyKey.USER_UFS_BLOCK_LOCATION_FALLBACK_RETURN_LIMIT;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's rename this property key name so it better reflects what it does in the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

+ " should not be set to a negative number");
}
List<WorkerNetAddress> addresses = getShuffleWorkerAddressList();
locations.addAll(addresses.subList(0, Math.min(addresses.size(), count)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line has many copies, you can just write an easy for loop

for (i = 0; i < count; i++) {
  locations.add(addresses.get(i));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

locations.addAll(getHostWorkerMap().values());
Collections.shuffle(locations);
PropertyKey locKey = PropertyKey.USER_UFS_BLOCK_LOCATION_RETURN_LIMIT;
int count = mFsContext.getClusterConf().getInt(locKey);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use path conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 307 to 309
List<BlockWorkerInfo> workers = mFsContext.getCachedWorkers();
Collections.shuffle(workers);
return workers.stream().map(BlockWorkerInfo::getNetAddress).collect(toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not keep using getHostWorkerMap()? This new code may not keep the same behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove the method and use getHostWorkerMap()

@@ -288,8 +288,21 @@ public List<BlockLocationInfo> getBlockLocations(URIStatus status)
if (locations.isEmpty() && mFsContext.getPathConf(new AlluxioURI(status.getPath()))
.getBoolean(PropertyKey.USER_UFS_BLOCK_LOCATION_ALL_FALLBACK_ENABLED)) {
// Case 2: Fallback to add all workers to locations so some apps (Impala) won't panic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this situation occurs, can you directly return the default value defined here? This is more friendly to Impala's file handle cache and data cache. This is because Impala will use consistent hash to schedule data scan fragment, and random block location returns will reduce the cache hit rate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I don't quite understand the change method you are talking about, or is this what you are talking about?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants