Exercise 2: OCR Data Processing

  • Condition: We have a scanned document from the 1930s. We ran OCR on it to recognize the text, and the OCR produced a "txt" file (i.e., a plain text file with no structure). We want to organize this messy text into a structured, table-like format that can be opened in Excel. We can do this with Python scripts.

  • Executing the code

    1. Go to python/ex2/notebook/
    2. Type jupyter notebook and press Enter.
    3. The code is already there. Execute it block by block using Shift + Enter.
    4. The output files will be saved in python/ex2/data/
    5. Check the CSV file using Excel.
  • Exporting the code to an HTML file with Markdown-styled text

    1. Position your cursor in a block.
    2. Insert a new block from the Insert menu.
    3. Change the new block's mode to Markdown.
    4. Type some Markdown-formatted text.
    5. In the menu bar, click File -> Download as -> HTML.
    6. Open the downloaded file in your browser. It is a pure HTML file generated automatically by Jupyter Notebook. In this way, you can generate a styled, Python-based document for the web.
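
To give a rough idea of the kind of processing described under Condition, here is a minimal sketch of turning unstructured OCR text into CSV rows. It is not the code from the notebook. The sample text, the column names, and the assumption that fields are separated by runs of two or more spaces are all hypothetical, since the layout of the real OCR output is not shown here.

```python
import csv
import io
import re

# Hypothetical OCR output; the real file's layout is not shown in this README.
SAMPLE_OCR_TEXT = """\
Smith, John    Farmer     1931
Doe, Jane      Teacher    1934

Brown, Tom     Clerk      1937
"""

# Assumed column names, chosen only for this illustration.
FIELDNAMES = ["name", "occupation", "year"]


def parse_ocr_text(text):
    """Split each non-empty line into fields on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left by the OCR step
        fields = re.split(r"\s{2,}", line)
        if len(fields) != len(FIELDNAMES):
            continue  # skip lines that do not match the assumed layout
        rows.append(dict(zip(FIELDNAMES, fields)))
    return rows


def rows_to_csv(rows):
    """Serialize the parsed rows to CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_ocr_text(SAMPLE_OCR_TEXT)
csv_text = rows_to_csv(rows)
print(csv_text)
```

The csv module quotes any field that contains a comma (such as a "Last, First" name), so the resulting file still opens in Excel with one value per cell. To save the result to disk instead of printing it, write csv_text to a file with a .csv extension.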