feat(pacer): Refine multi-document page handling logic #402
This commit introduces a helper function that encapsulates the logic to check whether a specific document within a combined PDF page is available in the recap archive.
Ensures that the `docsToCases` mapping is correctly populated when processing attachment pages.
Adds a new utility function to retrieve the `DocToCases` mapping from storage.
Introduces a new function to determine if a particular document within a multi-doc page is available in the recap archive.
This commit introduces a new utility function to efficiently extract data from receipt tables, addressing the limitation of multi-document pages. This enhancement improves the extension's ability to accurately process documents.
@mlissner in my last commit, I implemented a validation. Upon further investigation, I discovered that the extension was sending the HTML page containing the error message to the CL API (not great). With the validation in place, we can prevent the upload of the invalid HTML content.

[GIFs showing the error message in different browsers]
Key changes:
Refines the `handleCombinedPdfPageView` (appellate) and `handleCombinedPDFView` (district) methods to accurately identify multi-document pages containing only one PDF file. By analyzing the HTML structure, I noticed that receipt tables are enclosed within center divs, and the number of these divs corresponds to the number of files in the combined PDF. Both methods now check for the presence of center nodes to determine if a warning should be displayed. In appellate pages, an additional filter was implemented to ensure accurate counting, as center divs may also be used to wrap the page's main content.
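
The counting idea can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names are hypothetical, the wrappers are assumed to be `<center>` elements, and the appellate filter is assumed to keep only wrappers that have a `<table>` as a direct child.

```javascript
// Hypothetical helper, not the PR's implementation.
// Assumption: each file in the combined PDF has one <center> wrapper
// whose direct child is the receipt <table>.
function countReceiptWrappers(root) {
  const centers = Array.from(root.querySelectorAll('center'));
  // Assumed filter: skip <center> nodes that only wrap other content
  // (for example the page's main content) and have no direct <table> child.
  return centers.filter((node) =>
    Array.from(node.children).some((child) => child.tagName === 'TABLE')
  ).length;
}

// Hypothetical decision rule: warn only when more than one file is present.
function shouldWarnAboutMultipleFiles(root) {
  return countReceiptWrappers(root) > 1;
}
```

In a content script, `root` would typically be `document`. Any object that exposes `querySelectorAll` works, which keeps the sketch easy to exercise outside a browser.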

In both district and appellate courts, the document ID is often not directly accessible within the HTML structure of the page. While some courts use the document ID as the entry number, this is not a consistent practice across all jurisdictions. To address this challenge, this PR introduces two helper methods that use the URL of the PACER page and the existing `DocToCases` mapping stored in our local storage:

- District court URLs frequently contain a query parameter named `exclude_attachments`. This parameter is a comma-separated list of shortened document IDs that are not included in the combined PDF. By parsing this list and comparing it to the `DocToCases` mapping, we can identify the missing document ID. This PR introduces the `getPacerDocIdFromExcludeList` helper function, which takes a list of excluded document IDs as input and returns the corresponding document ID based on the `DocToCases` mapping.
- Appellate court URLs often include a query parameter named `dls`. This parameter is a comma-separated list of shortened document IDs that are included in the combined PDF. By filtering the `DocToCases` mapping based on this list, we can determine the document ID. The `getPacerDocIdFromPartialId` method implements this filtering process, taking the partial ID as input and returning the extracted document ID.

Introduces a new utility function, `parseDataFromReceiptTable`, to extract data from receipt tables in appellate courts. While parsing the title alone is often enough for single-document pages, it lacks the information needed to identify the document in multi-document pages. To address this limitation, this function extracts data directly from the receipt table, providing a more reliable and comprehensive approach.

Integrates all helper functions into the `handleCombinedPdfPageView` (appellate) and `handleCombinedPDFView` (district) methods. This enables us to insert banners for available documents and upload the PDFs to the recap archive.

[GIFs showing how the extension works in appellate and district courts]
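
For reference, the two URL-based lookups described above might look something like the sketch below. It is not the code in this PR. The function names are hypothetical, the mapping is assumed to be a plain object keyed by full document ID, a "shortened" ID is assumed to be a trailing portion of the full ID, and the example host is made up.

```javascript
// Reads a query parameter from a page URL (URL is a standard global).
function getQueryParam(url, name) {
  return new URL(url).searchParams.get(name);
}

// Splits a comma-separated parameter value into trimmed, non-empty IDs.
function splitIds(param) {
  return (param || '')
    .split(',')
    .map((id) => id.trim())
    .filter(Boolean);
}

// District sketch: `exclude_attachments` lists documents NOT in the PDF,
// so the candidate is the mapped ID that no excluded partial ID matches.
// Returns null unless exactly one candidate remains.
function findDocIdNotExcluded(docsToCases, excludeParam) {
  const excluded = splitIds(excludeParam);
  const remaining = Object.keys(docsToCases).filter(
    (fullId) => !excluded.some((partial) => fullId.endsWith(partial))
  );
  return remaining.length === 1 ? remaining[0] : null;
}

// Appellate sketch: `dls` lists documents that ARE in the PDF, so a
// partial ID is resolved by finding the mapped ID that ends with it.
function findDocIdByPartial(docsToCases, partialId) {
  const match = Object.keys(docsToCases).find((fullId) =>
    fullId.endsWith(partialId)
  );
  return match || null;
}

// Usage with a made-up URL and mapping (all values are illustrative only).
const exampleUrl =
  'https://ecf.example.gov/doc1?exclude_attachments=1111111,3333333';
const exampleMapping = {
  '03501111111': '100',
  '03502222222': '100',
  '03503333333': '100',
};
const excludeParam = getQueryParam(exampleUrl, 'exclude_attachments');
const districtDocId = findDocIdNotExcluded(exampleMapping, excludeParam);
```

Returning `null` when the match is ambiguous is a deliberate choice in this sketch: it lets the caller skip the banner and upload rather than act on a guessed document ID.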
Fixes freelawproject/recap#349