This repository has been archived by the owner on Jan 23, 2024. It is now read-only.

PDF Handling

Christa Hartsock edited this page Jan 9, 2020 · 6 revisions

There are two types of PDFs that hold user-submitted data:

  1. FilledPDF objects, which are pregenerated for San Francisco from their private intake PDF form.
  2. Printouts, which are PDFs generated on the fly for any user with appropriate permissions.

User data entered into the application.

User-submitted data is stored in the FormSubmission model. All data are stored as strings, integers, booleans, or datetimes. Raw user input is stored without modification. In other words, malicious code entered into a text field would remain as a string with malicious potential.

FilledPDFs

How user data is given to FilledPDFs

FilledPDF objects are filled out by a Python wrapper in intake/pdfparser.py that calls a .jar on the command line.

The _fill() method is the point at which Python makes the call to the command line.

_fill() refers to the PDFs by filepath. pdf_path is the input PDF form (this does not contain user-generated data). output_path is the filepath for the filled PDF (a tempfile path in all our existing use cases).

The user-submitted data is passed in through the answers input variable. At this stage, user-submitted values that are stored as strings (such as text inputs) are unchanged, while values stored as integers or other types are converted to strings. In other words, every user-submitted value is a string from this point on. The user data is then serialized to a JSON string and passed to the command line.
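The coercion and serialization described above can be sketched roughly as follows. This is a minimal illustration, not the code in intake/pdfparser.py: the helper names coerce_to_strings and serialize_answers and the sample keys are hypothetical, and how the JSON string is handed to the .jar is deliberately left out.

```python
import json


def coerce_to_strings(answers):
    """Return a copy of `answers` in which every value is a string.

    Strings pass through unchanged; other types go through str().
    """
    return {
        key: value if isinstance(value, str) else str(value)
        for key, value in answers.items()
    }


def serialize_answers(answers):
    """Serialize the coerced answers to a single JSON string."""
    return json.dumps(coerce_to_strings(answers))


# Hypothetical sample data; the keys are illustrative only.
sample = {"first_name": "Ada", "monthly_income": 1200, "employed": True}
payload = serialize_answers(sample)
```

Note that JSON serialization escapes quotes and backslashes but does not otherwise alter the text, so a string with malicious content is still carried through as data.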

Returning PDFs containing user data to Python

After the pdfparser creates a filled pdf tempfile with user-submitted answers, it returns the bytes of the created tempfile (also see _get_file_contents()). In intake.services.pdf_service.fill_pdf_for_application, the resulting bytes of the pdfparser's fill operation are returned and passed to the create_with_pdf_bytes class method on intake.models.FilledPDF.
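As a rough sketch of the tempfile-to-bytes step, assuming the fill operation writes its output to a path it is given: write_filled_pdf below is a hypothetical stand-in for the real fill call, and the cleanup in the finally block is an assumption rather than a description of what pdfparser does.

```python
import os
import tempfile


def get_file_contents(path):
    """Read the whole file at `path` and return its bytes."""
    with open(path, "rb") as f:
        return f.read()


def fill_to_bytes(write_filled_pdf):
    """Run a fill step against a temporary output path and return the bytes.

    `write_filled_pdf(output_path)` is a hypothetical callable standing in
    for the real fill operation.
    """
    fd, output_path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)  # only the path is needed; the fill step writes the file
    try:
        write_filled_pdf(output_path)
        return get_file_contents(output_path)
    finally:
        # Assumption: the tempfile is not needed once its bytes are read.
        os.remove(output_path)
```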

Storage

The create_with_pdf_bytes class method on intake.models.FilledPDF creates a SimpleUploadedFile instance from the bytes returned by pdfparser, then instantiates a new FilledPDF instance, setting the pdf field to the SimpleUploadedFile. The resulting file is written to S3 via django-storages.

Serving the PDF response

In intake.views.admin_views.FilledPDF, the get method returns the stored file directly, wrapping it in an HttpResponse.

Updating the Fillable PDF

  1. Create the new fillable PDF

The new PDF must use the field names below to be compatible with filling (code reference here). Each entry gives the label on the form, followed by the PDF field name:

Page One

  • Date: Date
  • Last Name: Last Name
  • First Name: First Name
  • MI: MI
  • SSN: Social Security Number
  • Date of Birth: Date of Birth
  • US Citizen?: US Citizen (Radio buttons)
  • Mailing Address: Street: Address Street
  • Mailing Address: City: Address City
  • Mailing Address: State: Address State
  • Zip: Address Zip
  • May we send you mail at this address: May we send mail here (Radio buttons)
  • Phone Number/s: Cell: Cell phone number
  • Phone Number/s: Home: Home phone number
  • Phone Number/s: Work: Work phone number
  • Phone Number/s: Other: Other phone number
  • May we leave voice messages about your case at these numbers: May we leave voicemail (Radio buttons)
  • Email Address: Email Address
  • On probation or parole?: On probation or parole (Radio buttons)
  • Serving a sentence?: Serving a sentence (Radio buttons)
  • Charged with a crime?: Charged with a crime (Radio buttons)
  • If you are on probation, where and until when?: If probation where and when?
  • Have you EVER been arrested or convicted of a crime OUTSIDE of San Francisco?: Arrested outside SF (Radio buttons)
  • If yes, list all the dates: Dates arrested outside SF
  • Are you currently employed: Employed (Radio buttons)
  • What is your monthly income?: What is your monthly income
  • What is your total monthly expense for essential needs?: Monthly expenses
  • How did you hear about the Clean Slate Program?: How did you hear about the Clear My Record Program

Page Two

  • Last Name: LastName
  • First Name: FirstName
  • DOB: DOB
  • SF Number: SSN
  2. Upload the PDF in the admin interface:
  • Log into CMR admin
  • Visit the Django admin
  • Under the "Intake" header, click on "Fillable PDFs"
  • Click on "Clean Slate SF Combined"
  • Upload the new version of the fillable PDF, click "Save"
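Before uploading, it can help to check a new PDF's field names against the list in step 1. The sketch below assumes the field names have already been extracted from the PDF by some other tool; missing_fields is a hypothetical helper, not part of the codebase. The names are copied from the list above, reading the last two Page One entries as "Monthly expenses" and "How did you hear about the Clear My Record Program".

```python
REQUIRED_FIELD_NAMES = {
    # Page One
    "Date", "Last Name", "First Name", "MI", "Social Security Number",
    "Date of Birth", "US Citizen", "Address Street", "Address City",
    "Address State", "Address Zip", "May we send mail here",
    "Cell phone number", "Home phone number", "Work phone number",
    "Other phone number", "May we leave voicemail", "Email Address",
    "On probation or parole", "Serving a sentence", "Charged with a crime",
    "If probation where and when?", "Arrested outside SF",
    "Dates arrested outside SF", "Employed", "What is your monthly income",
    "Monthly expenses", "How did you hear about the Clear My Record Program",
    # Page Two
    "LastName", "FirstName", "DOB", "SSN",
}


def missing_fields(pdf_field_names):
    """Return the required field names absent from the PDF, sorted."""
    return sorted(REQUIRED_FIELD_NAMES - set(pdf_field_names))
```

An empty result means every expected field name is present; extra fields in the PDF are ignored by this check.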

PDF Printouts

How user data is given to PDF Printouts

Printouts are created in memory, using the reportlab library. The user-submitted values of fields are written to PDFs using the draw_field_value() method in printing/pdf_form_display.py. The user-submitted values are pulled by calling get_display_value() on each field. get_display_value() is a method on formation.field_base.Field objects that is overridden for formatting purposes on certain types of fields: WholeDollarField, DateTimeField, PhoneField, ChoiceField, MultipleChoiceField, Counties, and DateOfBirthField.
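The override pattern can be illustrated with a small standalone sketch. The classes below are hypothetical stand-ins, not the real formation fields, and the dollar formatting shown is an assumption chosen only to make the override visible.

```python
class SketchField:
    """Hypothetical stand-in for a base field holding one parsed value."""

    def __init__(self, value):
        self.value = value

    def get_display_value(self):
        # Default behavior: show the value as plain text.
        return "" if self.value is None else str(self.value)


class SketchWholeDollarField(SketchField):
    """Hypothetical subclass that overrides the display formatting."""

    def get_display_value(self):
        if self.value is None:
            return ""
        # Assumed format for illustration: dollar sign, thousands separator.
        return "${:,}".format(self.value)


def display_values(fields):
    """Collect the display value of each field, as a printout would."""
    return [field.get_display_value() for field in fields]
```

The caller only ever asks for get_display_value(), so field-specific formatting stays inside each field class.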

In the draw_paragraph() method, the value is coerced to a string and drawn into the PDF using reportlab's API.

# `text` contains the user-submitted value
if not text:
    text = ''
if not isinstance(text, str):
    text = str(text)
text = text.strip(string.whitespace)
text = text.replace('\n', "<br/>")
p = Paragraph(text, style)
...
p.drawOn(self.canvas, self.cursor.x, self.cursor.y - used_height)
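
The normalization steps in the excerpt above can be exercised on their own, without reportlab. The function name normalize_for_paragraph is hypothetical; the body mirrors the excerpt line for line and stops before the Paragraph is built.

```python
import string


def normalize_for_paragraph(text):
    """Apply the same coercion and cleanup as the excerpt above."""
    if not text:
        # Any falsy value (None, '', 0, False) becomes an empty string.
        text = ''
    if not isinstance(text, str):
        text = str(text)
    text = text.strip(string.whitespace)
    # Newlines become <br/> tags for reportlab's paragraph markup.
    text = text.replace('\n', "<br/>")
    return text
```

Two behaviors are worth noting. A falsy value such as 0 or False is drawn as an empty string, and markup-like characters in user input (for example `<b>`) pass through these steps unchanged, so they reach Paragraph exactly as entered.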

Returning Printouts containing user data to Python

Printouts are returned from reportlab's canvas rendering as file objects. The resulting file object is returned to intake.services.pdf_service.get_printout_for_submission, which then returns the bytes contained in the file object (using .read()).
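A minimal sketch of the file-object-to-bytes step, using an in-memory buffer in place of reportlab's output. Both function names and the placeholder content are hypothetical; the point is only that the file object must be positioned at its start before .read() returns the full contents.

```python
import io


def render_to_file_object(page_text):
    """Hypothetical stand-in for rendering: returns an in-memory file object."""
    buffer = io.BytesIO()
    buffer.write(b"%PDF-placeholder\n")  # not a real PDF, illustration only
    buffer.write(page_text.encode("utf-8"))
    buffer.seek(0)  # rewind so a later .read() starts from the beginning
    return buffer


def get_printout_bytes(page_text):
    """Return the bytes held by the rendered file object."""
    return render_to_file_object(page_text).read()
```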

Storage

Printouts are not stored. The generated file objects are served directly.

Serving the Printout response

In intake.views.printout_views.CasePrintoutPDFView, the bytes of the printout file object are wrapped in an HttpResponse object and returned.

Bundles

Both FilledPDFs and printouts are served in multi-page bundles.

In the case of FilledPDFs, pdfparser joins multiple FilledPDF files, and the resulting file is stored in a PreBuiltPDFBundle object (also see here). PreBuiltPDFBundles are served by intake.views.prebuilt_pdf_bundle_views.PrebuiltPDFBundleFileView.

In the case of printouts, intake.services.pdf_service.concatenated_printout is used to create one pdf with multiple printouts, and is then served by intake.views.printout_views.PrintoutForApplicationsView.
