PDF Handling
There are two types of PDFs that hold user-submitted data:
- FilledPDF objects, which are pregenerated for San Francisco from their private intake PDF form.
- Printouts, which are PDFs generated on the fly for any user with appropriate permissions.
User-submitted data is stored in the FormSubmission model. All data are stored as strings, integers, booleans, or datetimes. Raw user input is stored without modification. In other words, malicious code entered into a text field would remain as a string with malicious potential.
FilledPDF objects are filled out using a Python wrapper, in intake/pdfparser.py, that calls a .jar on the command line.
The _fill() method is the point at which Python makes a call to the command line.
_fill() uses filepaths to refer to the PDFs. pdf_path is the input PDF form (this does not contain user-generated data). output_path will be the filepath for the filled PDF (a tempfile path in all our existing use cases).
The user-submitted data is stored in the answers input variable. At this stage, user-submitted values that are stored as strings (such as text inputs) are unchanged, while values stored as integers or other types are converted into strings. In other words, all the user-submitted values are coerced to strings at this stage. The user data is then serialized to a JSON string and passed to the command line.
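As a rough illustration of the coercion and serialization described above, the following sketch converts every answer to a string and serializes the result to JSON. The helper names, the jar filename, and the argument order are hypothetical; they are not taken from intake/pdfparser.py.

```python
import json
from datetime import datetime


def coerce_answers(answers):
    """Return a copy of `answers` in which every value is a string.

    Strings pass through unchanged; other types are converted with str().
    """
    return {
        key: value if isinstance(value, str) else str(value)
        for key, value in answers.items()
    }


def build_fill_command(jar_path, pdf_path, output_path, answers):
    """Build an argument list for a hypothetical `java -jar` fill command.

    The argument order is an assumption made for illustration only.
    """
    payload = json.dumps(coerce_answers(answers))
    return ["java", "-jar", jar_path, pdf_path, payload, output_path]


answers = {
    "FirstName": "Ada",
    "dependents": 2,
    "is_veteran": False,
    "submitted": datetime(2017, 1, 2, 3, 4, 5),
}
# "pdfparser.jar" and the paths below are placeholder values.
command = build_fill_command(
    "pdfparser.jar", "form.pdf", "/tmp/filled.pdf", answers)
```

If a list like this were handed to subprocess.run without shell=True, the JSON payload would reach the process as a single argument and would not be interpreted by a shell.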
After the pdfparser creates a filled PDF tempfile with user-submitted answers, it returns the bytes of the created tempfile (also see _get_file_contents()).
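The tempfile round trip can be sketched as follows. This is a minimal sketch, not the pdfparser implementation: fill_to_bytes and fake_fill are hypothetical names, the fill step is replaced by a stand-in that writes placeholder bytes, and deleting the tempfile afterwards is an assumption.

```python
import os
import tempfile


def get_file_contents(path):
    """Read a file and return its contents as bytes."""
    with open(path, "rb") as f:
        return f.read()


def fill_to_bytes(fill, pdf_path, answers):
    """Run `fill` against a temporary output path and return the bytes.

    `fill` is a stand-in callable taking (pdf_path, output_path, answers).
    """
    fd, output_path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)  # only the path is needed; the fill step writes the file
    try:
        fill(pdf_path, output_path, answers)
        return get_file_contents(output_path)
    finally:
        # Assumption: the tempfile is removed once its bytes are in memory.
        os.remove(output_path)


def fake_fill(pdf_path, output_path, answers):
    # Stand-in for the real fill step: writes placeholder bytes.
    with open(output_path, "wb") as f:
        f.write(b"%PDF-placeholder")


pdf_bytes = fill_to_bytes(fake_fill, "form.pdf", {"FirstName": "Ada"})
```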
In intake.services.pdf_service.fill_pdf_for_application, the resulting bytes of the pdfparser's fill operation are returned and passed to the create_with_pdf_bytes class method on intake.models.FilledPDF.
The create_with_pdf_bytes class method on intake.models.FilledPDF creates a SimpleUploadedFile instance from the bytes returned by pdfparser, then instantiates a new FilledPDF instance, setting the pdf field to the SimpleUploadedFile. The resulting file is written to S3 via django-storages.
In intake.views.admin_views.FilledPDF, the get method returns the stored file directly, wrapping it in an HttpResponse.
- Create the new fillable PDF
Here are the field names the PDF needs in order to be compatible with filling (code reference here):
Page One: We have not successfully recreated this form from scratch. Instead, we edited specific text on this page (the name of the Public Defender) and uploaded the revised version of the form.
Page Two:
- Last Name: LastName
- First Name: FirstName
- DOB: DOB
- SF Number: SSN
- Upload the PDF in the admin interface:
  - Log into CMR admin
  - Visit the Django admin
  - Under the "Intake" header, click on "Fillable PDFs"
  - Click on "Clean Slate SF Combined"
  - Upload the new version of the fillable PDF, click "Save"
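Before uploading, it may help to confirm that the new PDF exposes the Page Two field names listed above. The sketch below assumes the PDF's field names have already been extracted into a list by some other tool; REQUIRED_PAGE_TWO_FIELDS and missing_field_names are hypothetical names, not part of the codebase.

```python
# Field names from the Page Two list, keyed by the label on the form.
REQUIRED_PAGE_TWO_FIELDS = {
    "Last Name": "LastName",
    "First Name": "FirstName",
    "DOB": "DOB",
    "SF Number": "SSN",
}


def missing_field_names(pdf_field_names):
    """Return the required field names absent from `pdf_field_names`, sorted."""
    present = set(pdf_field_names)
    return sorted(
        name for name in REQUIRED_PAGE_TWO_FIELDS.values()
        if name not in present
    )


# Example: a PDF whose "SF Number" field was not named SSN.
missing = missing_field_names(["LastName", "FirstName", "DOB"])
```

An empty result would indicate that every required field name is present; extra fields in the PDF are ignored.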
Printouts are created in memory, using the reportlab library. The user-submitted values of fields are written to PDFs using the draw_field_value() method in printing/pdf_form_display.py. The user-submitted values are pulled by calling get_display_value() on each field. get_display_value() is a method on formation.field_base.Field objects that is overridden for formatting purposes on certain types of fields: WholeDollarField, DateTimeField, PhoneField, ChoiceField, MultipleChoiceField, Counties, and DateOfBirthField.
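The override pattern can be sketched as below. The classes are simplified stand-ins that reuse the names Field and WholeDollarField for readability; their constructors and the dollar formatting are assumptions, not the formation implementation.

```python
class Field:
    """Hypothetical stand-in for a form field base class."""

    def __init__(self, value):
        self.value = value

    def get_display_value(self):
        # Default: show the stored value as a string, or '' if empty.
        return "" if self.value is None else str(self.value)


class WholeDollarField(Field):
    """Hypothetical override that formats integers as whole dollars."""

    def get_display_value(self):
        if self.value is None:
            return ""
        return "${:,}".format(self.value)


fields = [Field("Ada"), WholeDollarField(1200), WholeDollarField(None)]
display_values = [field.get_display_value() for field in fields]
# display_values is ["Ada", "$1,200", ""]
```

Code that draws the printout only calls get_display_value(), so each field type can control its own formatting without the drawing code knowing about it.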
In the .draw_paragraph() method, the value is coerced to a string and drawn into the PDF using reportlab's API.
    # `text` contains the user-submitted value
    if not text:
        text = ''
    if not isinstance(text, str):
        text = str(text)
    text = text.strip(string.whitespace)
    text = text.replace('\n', "<br/>")
    p = Paragraph(text, style)
    ...
    p.drawOn(self.canvas, self.cursor.x, self.cursor.y - used_height)
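To see what these coercion steps do to different inputs, the same logic can be pulled out into a standalone function. normalize_paragraph_text is a hypothetical name used only for this sketch, and the reportlab drawing calls are left out.

```python
import string


def normalize_paragraph_text(text):
    """Mirror the coercion steps from the snippet above as a plain function."""
    if not text:
        # Any falsy value (None, '', 0, False) becomes an empty string.
        text = ''
    if not isinstance(text, str):
        text = str(text)
    text = text.strip(string.whitespace)
    # Newlines become <br/> tags so the paragraph keeps its line breaks.
    return text.replace('\n', "<br/>")


examples = [None, 0, 42, "  line one\nline two \n", "<b>bold?</b>"]
normalized = [normalize_paragraph_text(value) for value in examples]
# normalized is ['', '', '42', 'line one<br/>line two', '<b>bold?</b>']
```

Two details stand out: a value of 0 is rendered as an empty string because it is falsy, and tag-like text in user input passes through unescaped, alongside the inserted <br/> markup.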
Printouts are returned from reportlab's canvas rendering as file objects. The resulting file object is returned to intake.services.pdf_service.get_printout_for_submission, which then returns the bytes contained in the file object (using .read()).
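The file-object-to-bytes step can be sketched with an in-memory buffer. This sketch does not use reportlab: render_printout is a hypothetical stand-in that writes placeholder content, and get_printout_bytes is likewise an illustrative name.

```python
import io


def render_printout(lines):
    """Stand-in renderer: writes placeholder content into an in-memory file."""
    buffer = io.BytesIO()
    for line in lines:
        buffer.write(line.encode("utf-8") + b"\n")
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


def get_printout_bytes(lines):
    """Return the raw bytes of the rendered file object via .read()."""
    file_obj = render_printout(lines)
    return file_obj.read()


printout_bytes = get_printout_bytes(["Name: Ada", "DOB: 1815-12-10"])
```

Because everything stays in memory, nothing is written to disk or to storage in this flow.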
Printouts are not stored. The generated file objects are served directly.
In intake.views.printout_views.CasePrintoutPDFView, the bytes of the printout file object are wrapped in an HttpResponse object and returned.
Both FilledPDFs and printouts are served in multi-page bundles.
In the case of FilledPDFs, pdfparser is used to join multiple FilledPDF files, and the resulting file is stored in a PreBuiltPDFBundle object (also see here). PreBuiltPDFBundles are served by intake.views.prebuilt_pdf_bundle_views.PrebuiltPDFBundleFileView.
In the case of printouts, intake.services.pdf_service.concatenated_printout is used to create one PDF with multiple printouts, which is then served by intake.views.printout_views.PrintoutForApplicationsView.