[Fix](catalog)Fixes query failures for Paimon tables stored in Kerberized HDFS #47192

Open. CalvinKirs wants to merge 1 commit into master.

@CalvinKirs (Member) commented Jan 18, 2025

What problem does this PR solve?

Queries that read Paimon tables directly through JNI can fail when the Paimon storage is on HDFS with Kerberos authentication enabled.

Reproduction Steps:

  • Create a Paimon catalog whose storage is an HDFS cluster with Kerberos authentication enabled.
  • Run SET force_jni_scanner=true; to force reads through the JNI scanner.
  • Restart the BE (Backend) service to ensure a clean environment.
  • Run any query against a table in the catalog.

The query fails and the following error is logged:

2025-01-18 09:25:06  WARN Thread-13 org.apache.doris.paimon.PaimonJniScanner.open(PaimonJniScanner.java:126) - Failed to open paimon_scanner: java.io.UncheckedIOException: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
com.google.common.util.concurrent.UncheckedExecutionException: java.io.UncheckedIOException: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)
        at com.google.common.cache.LocalCache.get(LocalCache.java:4017)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4040)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4989)
        at org.apache.doris.paimon.PaimonTableCache.getTable(PaimonTableCache.java:84)
        at org.apache.doris.paimon.PaimonJniScanner.initTable(PaimonJniScanner.java:237)
        at org.apache.doris.paimon.PaimonJniScanner.open(PaimonJniScanner.java:122)
Caused by: java.io.UncheckedIOException: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at org.apache.paimon.hive.HiveCatalog.createHiveCatalog(HiveCatalog.java:708)
        at org.apache.paimon.hive.HiveCatalogFactory.create(HiveCatalogFactory.java:48)
        at org.apache.paimon.catalog.CatalogFactory.createCatalog(CatalogFactory.java:76)
        at org.apache.paimon.catalog.CatalogFactory.createCatalog(CatalogFactory.java:66)
        at org.apache.doris.paimon.PaimonTableCache.createCatalog(PaimonTableCache.java:75)
        at org.apache.doris.paimon.PaimonTableCache.loadTable(PaimonTableCache.java:58)
        at org.apache.doris.paimon.PaimonTableCache.access$000(PaimonTableCache.java:38)
        at org.apache.doris.paimon.PaimonTableCache$1.load(PaimonTableCache.java:51)
        at org.apache.doris.paimon.PaimonTableCache$1.load(PaimonTableCache.java:48)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3574)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2189)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2079)
        ... 6 more

Changes

This PR fixes query failures that occur when Paimon tables are read directly through JNI and the tables are stored on HDFS with Kerberos authentication enabled. The stack trace above shows the failure arising while the catalog is created in PaimonTableCache.createCatalog: the client attempts SIMPLE authentication, which the cluster rejects because only TOKEN and KERBEROS are available. The root cause is that the JNI implementation does not handle Kerberos authentication.
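
The general shape of such a fix can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: `KerberosScanSketch`, `Authenticator`, `FakeKerberosAuthenticator`, `AUTH_MODE`, `createCatalog`, and `openCatalog` are all hypothetical names, and the Kerberos login is simulated with a thread-local value so the example runs without a Hadoop cluster. In a real Hadoop client, the authenticated scope would more likely come from `UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)` followed by `ugi.doAs(action)`.

```java
import java.security.PrivilegedExceptionAction;

/** Illustrative only: these types are hypothetical and do not exist in Doris or Paimon. */
final class KerberosScanSketch {

    /** Stand-in for the calling thread's security context; "SIMPLE" means no Kerberos login. */
    static final ThreadLocal<String> AUTH_MODE = ThreadLocal.withInitial(() -> "SIMPLE");

    /** Hypothetical abstraction: run an action as an authenticated user. */
    interface Authenticator {
        <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception;
    }

    /**
     * Simulated Kerberos login. A real implementation would delegate to
     * Hadoop's UserGroupInformation instead of setting a thread-local.
     */
    static final class FakeKerberosAuthenticator implements Authenticator {
        private final String principal;

        FakeKerberosAuthenticator(String principal) {
            this.principal = principal;
        }

        @Override
        public <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception {
            String previous = AUTH_MODE.get();
            AUTH_MODE.set("KERBEROS:" + principal);
            try {
                return action.run();
            } finally {
                // Restore the previous context so the login does not leak past the call.
                AUTH_MODE.set(previous);
            }
        }
    }

    /** Stand-in for catalog creation, which contacts secured storage and rejects SIMPLE auth. */
    static String createCatalog() {
        String mode = AUTH_MODE.get();
        if ("SIMPLE".equals(mode)) {
            throw new IllegalStateException(
                    "SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]");
        }
        return "catalog opened as " + mode;
    }

    /** Wraps catalog creation in the authenticated scope instead of calling it directly. */
    static String openCatalog(Authenticator auth) {
        try {
            return auth.doAs(KerberosScanSketch::createCatalog);
        } catch (Exception e) {
            throw new IllegalStateException("catalog creation failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            createCatalog(); // no login: fails like the reported error
        } catch (IllegalStateException e) {
            System.out.println("without login: " + e.getMessage());
        }
        System.out.println(openCatalog(new FakeKerberosAuthenticator("doris@EXAMPLE.COM")));
    }
}
```

The point of the sketch is only the wrapping: the call that touches secured storage runs inside the authenticated scope, and the previous context is restored afterwards. The principal `doris@EXAMPLE.COM` is a placeholder.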

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
      After updating the code, repeat the reproduction steps outlined above.
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas (Contributor) commented Jan 18, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32270 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b2b35eb2a419afe78b8e012607f5aff0a382f3a7, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17572	5435	5324	5324
q2	2053	302	176	176
q3	10583	1238	713	713
q4	10205	964	509	509
q5	7551	2339	2167	2167
q6	193	163	132	132
q7	896	740	602	602
q8	9227	1326	1105	1105
q9	5163	4893	4874	4874
q10	6860	2336	1897	1897
q11	476	259	252	252
q12	348	360	214	214
q13	17782	3686	3118	3118
q14	234	222	215	215
q15	518	481	473	473
q16	626	611	587	587
q17	569	864	338	338
q18	6981	6665	6612	6612
q19	3000	928	513	513
q20	303	311	193	193
q21	2665	2127	1964	1964
q22	364	329	292	292
Total cold run time: 104169 ms
Total hot run time: 32270 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5611	5470	5455	5455
q2	232	332	240	240
q3	2289	2662	2353	2353
q4	1430	1795	1366	1366
q5	4314	4739	4657	4657
q6	171	159	130	130
q7	2077	1989	1804	1804
q8	2629	2829	2686	2686
q9	7376	7249	7359	7249
q10	3028	3260	2772	2772
q11	561	516	473	473
q12	657	819	672	672
q13	3539	3900	3238	3238
q14	294	312	277	277
q15	510	485	449	449
q16	641	709	662	662
q17	1202	1725	1261	1261
q18	7568	7523	7276	7276
q19	772	1183	1068	1068
q20	2040	2027	1883	1883
q21	5726	5145	4908	4908
q22	615	628	563	563
Total cold run time: 53282 ms
Total hot run time: 51442 ms

@doris-robot

TPC-DS: Total hot run time: 193964 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b2b35eb2a419afe78b8e012607f5aff0a382f3a7, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1312	960	970	960
query2	6297	2018	2019	2018
query3	11077	4742	4528	4528
query4	33315	23414	22843	22843
query5	3544	610	462	462
query6	300	200	178	178
query7	3989	482	319	319
query8	303	245	247	245
query9	9314	2543	2553	2543
query10	466	298	249	249
query11	17614	15244	15003	15003
query12	158	104	102	102
query13	1560	499	400	400
query14	9752	7075	6802	6802
query15	248	212	193	193
query16	7789	674	526	526
query17	1592	788	579	579
query18	2085	396	310	310
query19	208	196	163	163
query20	120	127	119	119
query21	204	132	109	109
query22	4651	4397	4320	4320
query23	34920	33643	33531	33531
query24	6525	2378	2288	2288
query25	500	450	391	391
query26	849	251	165	165
query27	2083	473	336	336
query28	5329	2444	2455	2444
query29	560	536	420	420
query30	215	186	168	168
query31	964	885	846	846
query32	70	64	57	57
query33	481	357	314	314
query34	739	894	509	509
query35	845	853	740	740
query36	997	1015	952	952
query37	121	93	82	82
query38	4268	4442	4194	4194
query39	1539	1415	1436	1415
query40	209	123	101	101
query41	50	53	63	53
query42	120	99	94	94
query43	527	534	501	501
query44	1317	811	803	803
query45	182	175	170	170
query46	870	1064	653	653
query47	1901	1912	1856	1856
query48	389	416	317	317
query49	725	494	410	410
query50	635	666	407	407
query51	7185	7081	7110	7081
query52	103	104	92	92
query53	228	251	184	184
query54	498	496	430	430
query55	85	77	83	77
query56	270	260	249	249
query57	1137	1158	1137	1137
query58	245	229	235	229
query59	3216	3154	3240	3154
query60	285	293	295	293
query61	141	136	198	136
query62	727	702	662	662
query63	224	185	186	185
query64	3402	1019	644	644
query65	3254	3157	3145	3145
query66	924	403	296	296
query67	15622	15672	15624	15624
query68	5054	836	519	519
query69	496	295	263	263
query70	1206	1154	1148	1148
query71	376	284	250	250
query72	5850	3957	3870	3870
query73	620	742	351	351
query74	10086	9115	8811	8811
query75	3162	3127	2698	2698
query76	3047	1165	749	749
query77	457	363	267	267
query78	9997	9883	9274	9274
query79	3098	786	587	587
query80	669	526	440	440
query81	489	270	240	240
query82	558	151	132	132
query83	170	167	165	165
query84	233	92	82	82
query85	787	425	303	303
query86	394	311	281	281
query87	4406	4436	4685	4436
query88	4969	2104	2083	2083
query89	395	318	291	291
query90	1789	191	187	187
query91	141	135	109	109
query92	64	54	53	53
query93	2587	886	525	525
query94	697	401	312	312
query95	351	262	253	253
query96	497	601	284	284
query97	2783	2838	2733	2733
query98	226	208	198	198
query99	1276	1348	1264	1264
Total cold run time: 288304 ms
Total hot run time: 193964 ms

@doris-robot

ClickBench: Total hot run time: 30.91 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b2b35eb2a419afe78b8e012607f5aff0a382f3a7, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.03	0.03
query3	0.23	0.07	0.06
query4	1.63	0.10	0.11
query5	0.42	0.43	0.41
query6	1.14	0.65	0.65
query7	0.02	0.02	0.01
query8	0.04	0.04	0.03
query9	0.59	0.50	0.50
query10	0.55	0.57	0.57
query11	0.14	0.10	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.61
query14	2.71	2.78	2.71
query15	0.90	0.83	0.82
query16	0.39	0.37	0.38
query17	0.99	1.04	1.05
query18	0.23	0.22	0.22
query19	1.97	1.86	1.97
query20	0.01	0.01	0.01
query21	15.36	0.94	0.58
query22	0.75	0.85	0.64
query23	15.25	1.43	0.60
query24	3.19	1.10	1.58
query25	0.15	0.26	0.16
query26	0.26	0.16	0.14
query27	0.06	0.05	0.06
query28	14.04	0.96	0.44
query29	12.61	3.95	3.25
query30	0.24	0.08	0.06
query31	2.84	0.62	0.39
query32	3.24	0.55	0.46
query33	3.03	3.04	3.02
query34	16.71	5.22	4.48
query35	4.64	4.51	4.54
query36	0.66	0.49	0.50
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.13	0.14
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 106.35 s
Total hot run time: 30.91 s
