Description of the bug | 错误描述
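I ran MinerU on an internal company document (similar to a PPT in format, but with a lot of dense text), so I cannot easily send you the file. The run ended with this error: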
[03/09 12:19:59 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /...//AImodels/opendatalab/PDF-Ext
[03/09 12:19:59 fvcore.common.checkpoint]: [Checkpointer] Loading from /...//AImodels/opendatalab/PDF-Extract-Kit-1___0/mode
2025-03-09 12:20:00.168 | INFO | magic_pdf.model.pdf_extract_kit:init:174 - DocAnalysis init done!
2025-03-09 12:20:00.168 | INFO | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:130 - model init cost: 4.70296573638916
2025-03-09 12:20:00.168 | INFO | main:setup:172 - mineru 模型初始化完毕
2025-03-09 12:20:00.168 | INFO | main:setup:178 - paddleocr 模型开始加载
2025-03-09 12:20:00.411685506 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution provider
2025-03-09 12:20:00.411713547 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show n
2025-03-09 12:20:00.486 | INFO | main:setup:182 - paddleocr 模型初始化完毕
2025-03-09 12:20:00.486 | INFO | main:setup:185 - 表格结构解析模型 模型开始加载
2025-03-09 12:20:00.557 | INFO | main:setup:188 - 表格结构解析模型 模型初始化完毕
2025-03-09 12:20:00.658 | INFO | main:task101_pdf_parse:224 - do parse , theoutput_dir:/...//AMinerU_SaveTemp/164620
2025-03-09 12:20:00.658 | INFO | main:task101_pdf_parse:225 - 当前 do parse的进程号:3271185
2025-03-09 12:20:00.658 | INFO | main:task101_pdf_parse:226 - 当前输入PDF文件字节大小:9.616772999999998 MB
2025-03-09 12:20:00.658 | INFO | main:task101_pdf_parse:227 - 开始解析PDF
2025-03-09 12:20:00.780 | INFO | magic_pdf.data.dataset:init:156 - lang: None
2025-03-09 12:20:00.781 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:186 - gpu_memory: 24 GB, batch_ratio: 16
2025-03-09 12:20:42.444 | INFO | magic_pdf.model.batch_analyze:call:74 - layout time: 34.65, image num: 89
2025-03-09 12:23:55.439 | INFO | magic_pdf.model.batch_analyze:call:195 - det time: 192.16, image num: 1099
2025-03-09 12:23:55.885 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:241 - gc time: 0.44
2025-03-09 12:23:55.885 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:245 - doc analyze time: 235.1, speed: 0.38 pages/second
2025-03-09 12:23:57.995 | INFO | magic_pdf.pdf_parse_union_core_v2:pdf_parse_union:938 - 需解析页数:89
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 191, bottom: 206, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 206, bottom: 221, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 221, bottom: 236, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 236, bottom: 251, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 251, bottom: 266, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 266, bottom: 281, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 281, bottom: 296, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 296, bottom: 311, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 311, bottom: 326, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 326, bottom: 341, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 341, bottom: 356, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 356, bottom: 371, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 371, bottom: 386, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 386, bottom: 401, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 401, bottom: 416, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 416, bottom: 431, page_w
2025-03-09 12:24:01.371 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 431, bottom: 446, page_w
2025-03-09 12:24:01.372 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 446, bottom: 461, page_w
2025-03-09 12:24:01.372 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 552, right: 752, top: 461, bottom: 476, page_w
2025-03-09 12:24:02.420 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:547 - bottom > page_h, left: 32, right: 238, top: 477, bottom: 485, page_w
2025-03-09 12:24:02.933 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:547 - bottom > page_h, left: 3, right: 340, top: 469.0, bottom: 485.0, pag
2025-03-09 12:24:03.565 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 2, right: 752, top: 107, bottom: 231.666666666
2025-03-09 12:24:03.565 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 2, right: 752, top: 231.66666666666669, bottom
2025-03-09 12:24:03.565 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 2, right: 752, top: 356.33333333333337, bottom
2025-03-09 12:24:04.086 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 0, right: 752, top: 92, bottom: 219.3333333333
2025-03-09 12:24:04.086 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 0, right: 752, top: 219.33333333333331, bottom
2025-03-09 12:24:04.087 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 0, right: 752, top: 346.66666666666663, bottom
2025-03-09 12:24:06.321 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:547 - bottom > page_h, left: 9, right: 369, top: 477.5, bottom: 494.0, pag
2025-03-09 12:24:07.407 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 7, right: 752, top: 350, bottom: 389.333333333
2025-03-09 12:24:07.407 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 7, right: 752, top: 389.3333333333333, bottom:
2025-03-09 12:24:07.407 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:537 - right > page_w, left: 7, right: 752, top: 428.66666666666663, bottom
2025-03-09 12:24:08.991 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:547 - bottom > page_h, left: 438, right: 677, top: 469, bottom: 483, page_w: 751.2000122070312, page_h: 482.0400085449219
2025-03-09 12:24:08.992 | WARNING | magic_pdf.pdf_parse_union_core_v2:sort_lines_by_model:547 - bottom > page_h, left: 438, right: 677, top: 483, bottom: 497, page_w: 751.2000122070312, page_h: 482.0400085449219
2025-03-09 12:24:08.992 | INFO | main:task101_pdf_parse:371 - 高精度PDF解析出错: Invalid box. right: 901, left: 583, bottom: 1000, top: 1002
Traceback (most recent call last):
  File "/...//ai_servers/server01_mineruBase_Litserve.py", line 229, in task101_pdf_parse
    out = do_parse(
  File "/home/kemove/miniconda3/envs/minerU120/lib/python3.10/site-packages/magic_pdf/tools/common.py", line 151, in do_parse
    pipe_result = infer_result.pipe_txt_mode(
  File "/home/kemove/miniconda3/envs/minerU120/lib/python3.10/site-packages/magic_pdf/operators/models.py", line 102, in pipe_txt_mode
    res = self.apply(
  File "/home/kemove/miniconda3/envs/minerU120/lib/python3.10/site-packages/magic_pdf/operators/models.py", line 70, in apply
    return proc(copy.deepcopy(self._infer_res), *args, **kwargs)
  File "/home/kemove/miniconda3/envs/minerU120/lib/python3.10/site-packages/magic_pdf/operators/models.py", line 95, in proc
    res = pdf_parse_union(*args, **kwargs)
  File "/home/kemove/miniconda3/envs/minerU120/lib/python3.10/site-packages/magic_pdf/pdf_parse_union_core_v2.py", line 951, in pdf_parse_union
    page_info = parse_page_core(
  File "/home/kemove/miniconda3/envs/minerU120/lib/python3.10/site-packages/magic_pdf/pdf_parse_union_core_v2.py", line 869, in parse_page_core
    sorted_bboxes = sort_lines_by_model(fix_blocks, page_w, page_h, line_height)
  File "/home/kemove/miniconda3/envs/minerU120/lib/python3.10/site-packages/magic_pdf/pdf_parse_union_core_v2.py", line 557, in sort_lines_by_model
    1000 >= right >= left >= 0 and 1000 >= bottom >= top >= 0
AssertionError: Invalid box. right: 901, left: 583, bottom: 1000, top: 1002
You may need to check how the detection boxes are validated against the PDF page size.
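The numbers in the log seem to support this. The last warning before the failure reports a line with left: 438, right: 677, top: 483, bottom: 497 on a page with page_h: 482.04, so the whole line lies below the page edge. Below is a minimal sketch of how that box could become the one in the assertion message. It assumes (this is a guess from the log, not MinerU's actual code) that `right` and `bottom` are capped at the page edge while `left` and `top` are not, and that coordinates are then scaled to a 0..1000 grid and rounded. `scale_box`, `is_valid`, and `clamp_all` are hypothetical names used only for illustration.

```python
def scale_box(left, top, right, bottom, page_w, page_h, clamp_all=False):
    """Scale a line box from page units to a 0..1000 grid.

    Hypothetical helper for illustration; not taken from magic_pdf.
    """
    # Assumed behaviour behind the "right > page_w" / "bottom > page_h"
    # warnings: only the far edges are capped at the page size.
    right = min(right, page_w)
    bottom = min(bottom, page_h)
    if clamp_all:
        # Possible fix: also keep the near edges inside the capped box.
        left = min(max(left, 0), right)
        top = min(max(top, 0), bottom)
    return (
        round(left * 1000 / page_w),
        round(top * 1000 / page_h),
        round(right * 1000 / page_w),
        round(bottom * 1000 / page_h),
    )


def is_valid(box):
    """Same condition as the assertion shown in the traceback."""
    left, top, right, bottom = box
    return 1000 >= right >= left >= 0 and 1000 >= bottom >= top >= 0


# Values copied from the last warning in the log.
page_w, page_h = 751.2000122070312, 482.0400085449219
raw = (438, 483, 677, 497)  # left, top, right, bottom

bad = scale_box(*raw, page_w, page_h)
good = scale_box(*raw, page_w, page_h, clamp_all=True)
print(bad, is_valid(bad))
print(good, is_valid(good))
```

Under these assumptions the unclamped `top` (483, which is greater than page_h) scales to 1002 while the capped `bottom` scales to 1000, which reproduces the reported box (right: 901, left: 583, bottom: 1000, top: 1002). Clamping `top` and `left` as well yields a zero-height box that passes the check; skipping such out-of-page lines entirely would be another option.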
How to reproduce the bug | 如何复现
Sorry, I cannot provide the PDF file to reproduce this. mineru = 1.2.0
Operating system | 操作系统
Linux
Python version | Python 版本
3.10
Software version | 软件版本 (magic-pdf --version)
No response
Device mode | 设备模式
cuda