Fix NY: parse historical PDFs #476

Open
chriszs opened this issue Apr 1, 2022 · 1 comment
Labels
data quality (Bullet-proofing the data), enhancement, medium (A slightly harder task)

Comments

chriszs commented Apr 1, 2022

After negotiating an expanded timeframe with the NY WARN coordinator (going back to 2006), at her request I filed a FOIL for the data in electronic spreadsheet format. Today the FOIL office gave me back these:

FL-22-0165 Release Letter .pdf
FL-22-0165 Records for Release_Part1.pdf
FL-22-0165 Records for Release_Part2.pdf
FL-22-0165 Records for Release_Part3.pdf

So I should either ask whether they'll give me spreadsheets instead, or resign myself to parsing these well-structured PDFs. I also plan to update the docs for NY.

chriszs commented May 6, 2022

I'm going to leave it at PDFs, but I'm not able to write a parser just yet. Open for anyone to pick up.

chriszs changed the title from "Fix NY: demand spreadsheets for historical data or parse PDFs" to "Fix NY: parse historical PDFs" on May 16, 2022
jsvine added a commit to jsvine/warn-scraper that referenced this issue May 16, 2022
Responding to the call-out here:
biglocalnews#476

This being my first commit to the project, and not knowing how the
maintainers would like to handle the overlap between the data sources, I
tried to take the least destructive approach.
palewire added this to the Scraper repair shop milestone on May 25, 2022
palewire added the enhancement, data quality, and medium labels on May 25, 2022