fix: improve false-positive Title elements on Chinese text <- Ingest test fixtures update (#3838)

This pull request includes updated ingest test fixtures.
Please review and merge if appropriate.

Co-authored-by: scanny <[email protected]>
ryannikolaidis and scanny authored Dec 17, 2024
1 parent e5a3459 commit 3f5ab19
Showing 8 changed files with 270 additions and 1,276 deletions.
@@ -23,7 +23,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "5209312022a75a31d95385fdccff68fa",
"text": "CHAPTER 1",
"metadata": {
@@ -51,7 +51,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "22a23e29022f32945965002cd734a8f0",
"text": "INTRODUCTION",
"metadata": {
@@ -79,7 +79,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "4c175cf543957acc4420221de28d3fca",
"text": "CHAPTER 1 \u2013 INTRODUCTION",
"metadata": {
@@ -101,7 +101,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "77022a5264f552b223538977cd40f640",
"text": "A.\tPURPOSE",
"metadata": {
@@ -189,7 +189,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "e341ffc123dd2827638aba18149c4175",
"text": "B.\tROLE OF THE UNITED STATES TRUSTEE",
"metadata": {
@@ -255,7 +255,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "1b11ebe52652656e0ed8c12e5969de9b",
"text": "C.\tSTATUTORY DUTIES OF A STANDING TRUSTEE\t",
"metadata": {
@@ -23,7 +23,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "5209312022a75a31d95385fdccff68fa",
"text": "CHAPTER 1",
"metadata": {
@@ -51,7 +51,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "22a23e29022f32945965002cd734a8f0",
"text": "INTRODUCTION",
"metadata": {
@@ -79,7 +79,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "4c175cf543957acc4420221de28d3fca",
"text": "CHAPTER 1 \u2013 INTRODUCTION",
"metadata": {
@@ -101,7 +101,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "77022a5264f552b223538977cd40f640",
"text": "A.\tPURPOSE",
"metadata": {
@@ -189,7 +189,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "e341ffc123dd2827638aba18149c4175",
"text": "B.\tROLE OF THE UNITED STATES TRUSTEE",
"metadata": {
@@ -255,7 +255,7 @@
}
},
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "1b11ebe52652656e0ed8c12e5969de9b",
"text": "C.\tSTATUTORY DUTIES OF A STANDING TRUSTEE\t",
"metadata": {
@@ -1,6 +1,6 @@
[
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "56d531394823d81787d77a04462ed096",
"text": "Lorem ipsum dolor sit amet.",
"metadata": {
@@ -1,6 +1,6 @@
[
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "56d531394823d81787d77a04462ed096",
"text": "Lorem ipsum dolor sit amet.",
"metadata": {
@@ -1,6 +1,6 @@
[
{
"type": "Title",
"type": "UncategorizedText",
"element_id": "cc23ac9998df1db62b795ec4e5133ab0",
"text": "Title",
"metadata": {
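In the hunks above, each affected element keeps its `element_id` and `text`, while two `type` values appear back to back. Read as a diff, the first (`Title`) is the removed value and the second (`UncategorizedText`) is the added one. A minimal sketch of how such reclassifications could be listed from two versions of a fixture, assuming both are loaded as lists of element dicts; the function name `type_changes` and the third sample element are hypothetical, not part of the repository:

```python
def type_changes(before, after):
    """Return (element_id, old_type, new_type, text) for elements whose type differs.

    Elements are matched by element_id, which stays stable in the hunks shown here.
    """
    old_types = {el["element_id"]: el["type"] for el in before}
    changes = []
    for el in after:
        old = old_types.get(el["element_id"])
        if old is not None and old != el["type"]:
            changes.append((el["element_id"], old, el["type"], el["text"]))
    return changes


# Sample data: the first two entries are taken from the diff, the third is made up.
before = [
    {"type": "Title", "element_id": "5209312022a75a31d95385fdccff68fa", "text": "CHAPTER 1"},
    {"type": "Title", "element_id": "22a23e29022f32945965002cd734a8f0", "text": "INTRODUCTION"},
    {"type": "NarrativeText", "element_id": "hypothetical-unchanged-id", "text": "Unchanged."},
]
after = [
    {"type": "UncategorizedText", "element_id": "5209312022a75a31d95385fdccff68fa", "text": "CHAPTER 1"},
    {"type": "UncategorizedText", "element_id": "22a23e29022f32945965002cd734a8f0", "text": "INTRODUCTION"},
    {"type": "NarrativeText", "element_id": "hypothetical-unchanged-id", "text": "Unchanged."},
]

for element_id, old, new, text in type_changes(before, after):
    print(f"{element_id}: {old} -> {new} ({text!r})")
```

Matching on `element_id` rather than list position keeps the comparison valid even if elements are reordered between versions.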
@@ -27,7 +27,7 @@
"eng"
],
"page_number": 1,
"orig_elements": "eJztlU1v2jAYx79K5PMgifNm71Z1leilIAqnFkWO/RiiJnbkOBsd4rvPhtK1E4dt0qQx7RQ/74//P1l52CFooAVly1qgjwFKODBKuIRIZAXJc8oqwDxNMkllFgmGPgSoBcsEs8zl75A/lL0eDIeD3YFp676vterLl6SHHWq18OEkiQnZr1wPA1wbUTaaM6vNsZLZjV8h3OgWQjMoBSb8os1TOKjemoHbwYB4b8CWtV0DI6F5H26YEpXWT6O4GzvHFu33bpKsG7DPnR+PWNc1tZvotgs/KzHWHaht20htWmb7kZay5uBKB6/I2M0WndEc3HXUum3Gp4gXYQNMgCml1tZ9TgM6U7fMPPuEhqn1wNbQewUQqDVa+XUsbK3PXN4HCzP0FiCYvOztq06NJof2yFX8SEgKSgkTaSogo1lW0IRgUeTuUFQVJvzyCEHbbVhffwVRenVKrpV19z0Kdz25mi1u5kGMVmdSLVsf06pD+A+xPoPSeTvnKdXQVuDVid/C/b70G6SL2jZwjmhFRERFlnIBmJBIFEwkNM8hYhHnVVX9Y0Rv7xbz6afl9eJ2endJUN/t/VNckyzPCEtlgYXDWUkc5RyqnCcYA5X48rj+LShe31fwOOAoToJfZsNzwAWN3W8uTomgKc6KIk5S8GYS0/Q/m99lczV+tLPlfDa9vzlDYvUNQ4PhWg=="
"orig_elements": "eJztlV1r2zAUhv+K0fWS+NvW7kpXSG+akCZXbTD6OHJMLcnI8pYu5L9PSpqtHYGxjcEydmUdnffovDoPwg87BC1IULZqOHofoIQBwSUTEPKsKPMcEwoxS5NMYJGFnKB3AZJgCSeWOP0O+UXV68EwOMQdGNn0faNVX72IHnZIau7TSRKV5X7tzjDAtOFVqxmx2hwrid14C5ONljAxg1JgJp+0eZoMqrdmYHYwwN8GsCWya2HENesnG6I41fppFHVjt7FF+73rJJoW7HPn2yPSdW3jOjp3k4+Kj3UHaitboY0kth9pIRoGrnTwExm73rwzmoG7jqplOz5l/BA2QDiYSmht3efUoDONJObZC1qi6oHU0PsJIFA1Wns7FrbWK1f3wdIMvQUIpi++fdXpoOnheOQqvickOMYl4WnKIcNZVuCkjHmRu0VBaVyyyyMEstuQvvkMvPLTqZhW1t33OLjr6dV8ebMIIrQ+I7WkPsroIf2HWJ9B6XY7t1OpQVLw04lew/1m+hXSlXJuoNbG21965Rm6tOQh5lnKOMRlGfKC8ATnOYQkZIxS+o/Rvb1bLmYfVtfL29ndJQF+4/unGSdZnpUkFUXMHVoq4jBnQHOWxDFgEV8e478Fy9d3FzwOcRglwW9xYjnEBY7crzBKS47TOCuKKEnBh0mE0/+cfpXT1fjRzleL+ez+5gdU1l8AKnj1Ng=="
}
},
{
@@ -383,7 +383,7 @@
"eng"
],
"page_number": 2,
"orig_elements": "eJxVkMtOwzAQRX8l8pompGmlhB2IIJAQldp0VarIsSdpVNtj+QFBVf4dG+iC3bzPvXO4EBAgQbl25OQuIauCccp51RXLjlf5ii0L3lddWZXlmt2uC3KTEAmOcupomL+QGLQWvWHwk2swcrR2RGXbv6HDhUjksV0UeVnOx3DDAEPDW4GMOjS/m9SdooTshBIy45UCk32iOWdeWWc8c94A/5/ARKUWsODIbHaiineI50Wu01CYyDwHUj8KcF864gnVWoyBGNRlH4qnqEFNUvRoJHV2gX0/MgirPn4kDWyuDTIIdtQgRXrtxCcIqgZPB7DRIAE1kGhLh0qrvOwgelpGvoPJRfZD+u62m9c62TwlzXOd7N9emvox2TX3Tb1Lmu1+19R1vHwV24xOAJmP32EBjeE="
"orig_elements": "eJxVkMtOwzAQRX8l8pomtGmlhB2IIJAQldp0VarIsSdpVNtj+QGBKv+ODe2C3bzPvbM/ExAgQblm4OQuIcucccp52eaLlpfzJVvkvCvboiyKFbtd5eQmIRIc5dTRMH8mMWgsesPgN9dg5GDtgMo2l6H9mUjksZ3n86KYDuGGAYaGNwIZdWj+Nqk7RgnZESVkxisFJvtEc8q8ss545rwB/j+BkUotYMaR2exIFW8RT7O5TkNhJNMUSN0gwH3piCdUazEEYlCXfSieogY1StGhkdTZGXbdwCCs+viRNLC5Nsgg2FG9FOm1E58gqOo97cFGgwRUT6ItHSqN8rKF6GkR+Q5GF9kP6bvbrF+rZP2U1M9Vsnt7qavHZFvf19U2qTe7bV1V8fJV7E4FpdCjGb6B1/HKdPgBKuqS2A=="
}
},
{
@@ -572,7 +572,7 @@
"eng"
],
"page_number": 2,
"orig_elements": "eJxVkMtuwjAQRX8l8rokDZRHukOFVmxAgrCoAEVOPAkRtsfyo02F8u+1oSy689yZ8bl3DlcCHARIW7SMvEbkJcvSYZqNR1NaZiU8jwHGFDJIKWMTOp2Qp4gIsJRRS/38lYRHYdDpCm61Ai1aY1qUpvgbOlyJQBbao1E6m/Un/4eGCjUrOFbUor5vUnsOFpIzCki0kxJ08o36kjhprHaVdRrY/wI6KhSHAcPKJGcqWYl4GaQq9kJH+t6T6paD/VEBT6hSvPVE7y75kixGBbITvEYtqDUDrOu2Ar/qwkViz2ZKYwU+jmwEjx+dcAROZeNoAyYEJCAbEmIprxTSiRJCpmHgW+hsYL/FR7vL5/k+32w/o8U+Xy130eY9mkdeXS9W648o3+53+XJ5vAEenvPWciD96RdNTpBK"
"orig_elements": "eJxVkMtuwjAQRX8l8rokDZRHukOFVmxAgmRRAYqceBIibI/lR5sW8e+1aVl0N+9z7+wvBDgIkLbsGHmOyFOWpcM0G4+mtMoqeBwDjClkkFLGJnQ6IQ8REWApo5b6+QsJQWnQ6RpuuQItOmM6lKb8G9pfiEAW2qNROptdj/6Ghho1KznW1KL+3aT2FCQkJxSQaCcl6OQT9Tlx0ljtaus0sP8J9FQoDgOGtUlOVLIK8TxIVewLPblePanpONgvFfCEKsU7T/Tqkg/JYlQge8Eb1IJaM8Cm6Wrwqy58JPZspjTW4O3IVvD43glP4FS2jrZggkECsiXBlvKVUjpRQfA0DHwLvQ3sl/hgd/k8L/LN9j1aFPlquYs2r9E88tX1YrV+i/JtscuXy8MNcNdcSC8YWtTdN7A8HLsefwA0I5VB"
}
},
{
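The `orig_elements` values in the last file are opaque strings, so the diff does not show what changed inside them. Their `eJ` prefix is consistent with zlib-compressed data encoded as base64, although this page does not state the encoding. A minimal sketch for inspecting such a value under that assumption; the helper names and the sample element are hypothetical:

```python
import base64
import json
import zlib


def encode_orig_elements(elements):
    """Serialize a list of element dicts to JSON, zlib-compress it, then base64-encode it."""
    payload = json.dumps(elements).encode("utf-8")
    return base64.b64encode(zlib.compress(payload)).decode("utf-8")


def decode_orig_elements(blob):
    """Reverse the assumed encoding: base64-decode, zlib-decompress, parse JSON."""
    raw = zlib.decompress(base64.b64decode(blob))
    return json.loads(raw.decode("utf-8"))


# Round trip on a made-up element, to check that the two helpers are inverses.
sample = [{"type": "UncategorizedText", "text": "CHAPTER 1"}]
blob = encode_orig_elements(sample)
print(blob[:2])                    # zlib's default header encodes to "eJ"
print(decode_orig_elements(blob))  # the original list of dicts
```

If the assumption holds, passing an `orig_elements` string from the fixture to `decode_orig_elements` would yield the pre-chunking elements, so the old and new versions can be compared field by field.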