Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid get block info when positioned read for first retry #18417

Merged
merged 12 commits into from
Jan 12, 2024

Conversation

maobaolong
Copy link
Contributor

@maobaolong maobaolong commented Nov 15, 2023

What changes are proposed in this pull request?

Reduce talk to master to improve the performance for small file scenario.

I guess this is introduced by ed53d54

Why are the changes needed?

The block info can be get from file status, so we should not ask it from master every time.

Does this PR introduce any user facing changes?

No

@maobaolong
Copy link
Contributor Author

@jiacheliu3 @dbw9580 Would you like to take a look at this PR?

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Comment on lines 305 to 306
BlockInfo blockInfo = lastException == null
? mStatus.getBlockInfo(blockId) : mBlockStore.getInfo(blockId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbw9580 do you think it's possible that we can use mStatus.getBlockInfo(blockId) even if lastException != null?

@@ -302,11 +302,15 @@ private int positionedReadInternal(long pos, byte[] b, int off, int len) throws
try {
// Positioned read may be called multiple times for the same block. Caching the in-stream
// allows us to avoid the block store rpc to open a new stream for each call.
BlockInfo blockInfo = lastException == null
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a threshold of outdate time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateStream check outdate time

@maobaolong
Copy link
Contributor Author

This looks like flaky test.

2023-12-08T01:07:46.0662458Z 01:07:46.062 [ERROR] Tests run: 5, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 20.478 s <<< FAILURE! - in alluxio.client.fs.TtlIntegrationTest
2023-12-08T01:07:46.0678588Z 01:07:46.065 [ERROR] alluxio.client.fs.TtlIntegrationTest.expireManyAfterFree  Time elapsed: 3.654 s  <<< FAILURE!
2023-12-08T01:07:46.0715684Z java.lang.AssertionError: expected:<0> but was:<100>
2023-12-08T01:07:46.0719774Z 	at org.junit.Assert.fail(Assert.java:89)
2023-12-08T01:07:46.0720851Z 	at org.junit.Assert.failNotEquals(Assert.java:835)
2023-12-08T01:07:46.0722218Z 	at org.junit.Assert.assertEquals(Assert.java:647)
2023-12-08T01:07:46.0738580Z 	at org.junit.Assert.assertEquals(Assert.java:633)
2023-12-08T01:07:46.0739849Z 	at alluxio.client.fs.TtlIntegrationTest.expireManyAfterFree(TtlIntegrationTest.java:156)
2023-12-08T01:07:46.0741365Z 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2023-12-08T01:07:46.0742908Z 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2023-12-08T01:07:46.0744595Z 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-12-08T01:07:46.0754912Z 	at java.lang.reflect.Method.invoke(Method.java:498)
2023-12-08T01:07:46.0756028Z 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
2023-12-08T01:07:46.0758258Z 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2023-12-08T01:07:46.0759605Z 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
2023-12-08T01:07:46.0760947Z 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2023-12-08T01:07:46.0762341Z 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2023-12-08T01:07:46.0763542Z 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
2023-12-08T01:07:46.0764830Z 	at alluxio.testutils.LocalAlluxioClusterResource$1.evaluate(LocalAlluxioClusterResource.java:202)
2023-12-08T01:07:46.0766601Z 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
2023-12-08T01:07:46.0768171Z 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
2023-12-08T01:07:46.0769442Z 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2023-12-08T01:07:46.0770172Z 	at java.lang.Thread.run(Thread.java:748)
2023-12-08T01:07:46.0770552Z 
2023-12-08T01:07:46.0771437Z 01:07:46.070 [ERROR] alluxio.client.fs.TtlIntegrationTest.expireManyAfterFree  Time elapsed: 2.302 s  <<< FAILURE!
2023-12-08T01:07:46.0772885Z java.lang.AssertionError: expected:<0> but was:<100>
2023-12-08T01:07:46.0773595Z 	at org.junit.Assert.fail(Assert.java:89)
2023-12-08T01:07:46.0774081Z 	at org.junit.Assert.failNotEquals(Assert.java:835)
2023-12-08T01:07:46.0774688Z 	at org.junit.Assert.assertEquals(Assert.java:647)
2023-12-08T01:07:46.0775158Z 	at org.junit.Assert.assertEquals(Assert.java:633)
2023-12-08T01:07:46.0775834Z 	at alluxio.client.fs.TtlIntegrationTest.expireManyAfterFree(TtlIntegrationTest.java:156)
2023-12-08T01:07:46.0776575Z 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2023-12-08T01:07:46.0777389Z 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2023-12-08T01:07:46.0778215Z 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-12-08T01:07:46.0778893Z 	at java.lang.reflect.Method.invoke(Method.java:498)
2023-12-08T01:07:46.0779557Z 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
2023-12-08T01:07:46.0780470Z 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2023-12-08T01:07:46.0781323Z 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
2023-12-08T01:07:46.0782190Z 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2023-12-08T01:07:46.0783009Z 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2023-12-08T01:07:46.0783742Z 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
2023-12-08T01:07:46.0784555Z 	at alluxio.testutils.LocalAlluxioClusterResource$1.evaluate(LocalAlluxioClusterResource.java:202)
2023-12-08T01:07:46.0785513Z 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
2023-12-08T01:07:46.0786480Z 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
2023-12-08T01:07:46.0787246Z 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2023-12-08T01:07:46.0787728Z 	at java.lang.Thread.run(Thread.java:748)
2023-12-08T01:07:46.0787973Z 

Copy link
Contributor

@dbw9580 dbw9580 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM. Can you add a new unit test to cover this new configuration and its behavior?

core/common/src/main/java/alluxio/conf/PropertyKey.java Outdated Show resolved Hide resolved
Copy link
Contributor

@dbw9580 dbw9580 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jiacheliu3 jiacheliu3 added the type-performance This issue is about performance related issues or feature request label Dec 22, 2023
@jiacheliu3
Copy link
Contributor

good to merge after all checks are green

@jiacheliu3
Copy link
Contributor

the new UT doesn't work?

2023-12-25T02:32:20.8789964Z 02:32:20.836 [ERROR] alluxio.client.file.AlluxioFileInStreamTest.testStatusOutdated[3]  Time elapsed: 0.009 s  <<< FAILURE!
2023-12-25T02:32:20.8790724Z java.lang.AssertionError
2023-12-25T02:32:20.8791066Z 	at org.junit.Assert.fail(Assert.java:87)
2023-12-25T02:32:20.8791505Z 	at org.junit.Assert.assertTrue(Assert.java:42)
2023-12-25T02:32:20.8792939Z 	at org.junit.Assert.assertTrue(Assert.java:53)
2023-12-25T02:32:20.8794200Z 	at alluxio.client.file.AlluxioFileInStreamTest.testStatusOutdated(AlluxioFileInStreamTest.java:974)
2023-12-25T02:32:20.8795530Z 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2023-12-25T02:32:20.8796290Z 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2023-12-25T02:32:20.8797135Z 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-12-25T02:32:20.8797830Z 	at java.lang.reflect.Method.invoke(Method.java:498)
2023-12-25T02:32:20.8798513Z 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
2023-12-25T02:32:20.8799370Z 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2023-12-25T02:32:20.8800242Z 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
2023-12-25T02:32:20.8801105Z 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2023-12-25T02:32:20.8801933Z 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2023-12-25T02:32:20.8802741Z 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2023-12-25T02:32:20.8803469Z 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2023-12-25T02:32:20.8804431Z 	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
2023-12-25T02:32:20.8805176Z 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
2023-12-25T02:32:20.8805941Z 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
2023-12-25T02:32:20.8806781Z 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
2023-12-25T02:32:20.8807503Z 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2023-12-25T02:32:20.8808287Z 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
2023-12-25T02:32:20.8808966Z 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
2023-12-25T02:32:20.8809629Z 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
2023-12-25T02:32:20.8810264Z 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
2023-12-25T02:32:20.8810868Z 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
2023-12-25T02:32:20.8811415Z 	at org.junit.runners.Suite.runChild(Suite.java:128)
2023-12-25T02:32:20.8811917Z 	at org.junit.runners.Suite.runChild(Suite.java:27)
2023-12-25T02:32:20.8812465Z 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2023-12-25T02:32:20.8813081Z 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
2023-12-25T02:32:20.8813732Z 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
2023-12-25T02:32:20.8814377Z 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
2023-12-25T02:32:20.8815024Z 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
2023-12-25T02:32:20.8815666Z 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2023-12-25T02:32:20.8816274Z 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
2023-12-25T02:32:20.8817174Z 	at org.powermock.modules.junit4.internal.impl.DelegatingPowerMockRunner$2.call(DelegatingPowerMockRunner.java:149)
2023-12-25T02:32:20.8818344Z 	at org.powermock.modules.junit4.internal.impl.DelegatingPowerMockRunner$2.call(DelegatingPowerMockRunner.java:141)
2023-12-25T02:32:20.8819636Z 	at org.powermock.modules.junit4.internal.impl.DelegatingPowerMockRunner.withContextClassLoader(DelegatingPowerMockRunner.java:132)
2023-12-25T02:32:20.8820918Z 	at org.powermock.modules.junit4.internal.impl.DelegatingPowerMockRunner.run(DelegatingPowerMockRunner.java:141)
2023-12-25T02:32:20.8822144Z 	at org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:117)
2023-12-25T02:32:20.8823464Z 	at org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:57)
2023-12-25T02:32:20.8824504Z 	at org.powermock.modules.junit4.PowerMockRunner.run(PowerMockRunner.java:59)
2023-12-25T02:32:20.8825320Z 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:364)
2023-12-25T02:32:20.8826219Z 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:289)
2023-12-25T02:32:20.8827149Z 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:237)
2023-12-25T02:32:20.8828022Z 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:158)
2023-12-25T02:32:20.8828895Z 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
2023-12-25T02:32:20.8829748Z 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
2023-12-25T02:32:20.8830530Z 	at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
2023-12-25T02:32:20.8831289Z 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)

@maobaolong
Copy link
Contributor Author

@jiacheliu3 Glad to say the ci checks are all passed, Please talk a look whether this could be merged.

Copy link
Contributor

@jiacheliu3 jiacheliu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment. Good to merge once added. Thanks a lot for the work!

@@ -6307,6 +6307,11 @@ public String toString() {
.setConsistencyCheckLevel(ConsistencyCheckLevel.WARN)
.setScope(Scope.CLIENT)
.build();
public static final PropertyKey USER_FILE_IN_STREAM_STATUS_EXPIRATION_TIME =
durationBuilder(Name.USER_FILE_IN_STREAM_STATUS_EXPIRATION_TIME)
.setDefaultValue("5min")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whenever a property key is added, remember to add description:

setDescription("Specifies how long the file metadata can be cached and reused during the FileInStream. Once the specified expiration time has elapsed, the file metadata will be reloaded from the Alluxio master. The cache reduces the number of metadata requests to the Master. The default is 5 minutes.")

@jiacheliu3
Copy link
Contributor

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit e939a39 into Alluxio:master-2.x Jan 12, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-performance This issue is about performance related issues or feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants