- Condition: we have a scanned document from the 1930s and used OCR to recognize the text on it. The OCR output is a "txt" file (i.e., a plain text file with no structure). We want to organize the messy text into a structured, table-like format that can be opened in Excel. We can do this using Python scripts.
- Executing the code
  - Go to `python/ex2/notebook/`
  - Type `jupyter notebook` and hit Enter.
  - The code is already there. Execute it block by block using `Shift + Enter`.
  - The output files will be saved in `python/ex2/data/`
  - Check the CSV file using Excel.
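To give a feel for what the structuring step involves, here is a minimal sketch of turning unstructured OCR text into CSV rows. It is not the notebook's actual code: the sample text, the rule that two or more spaces mark a column break, and the helper names `parse_ocr_text` and `rows_to_csv` are all assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical OCR output: one record per line, columns separated by runs of
# two or more spaces. The real text file in this exercise may look different.
SAMPLE_OCR_TEXT = """\
Name            Year   City
John Smith      1931   Chicago

Mary Jones      1934   Boston
   Ann Lee      1938   Denver
"""


def parse_ocr_text(text):
    """Split messy OCR text into a list of rows, each a list of fields."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the OCR step
        # Assumption: two or more whitespace characters mark a column break.
        rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_ocr_text(SAMPLE_OCR_TEXT)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Real OCR output is rarely this regular, so the splitting rule usually needs to be adapted to the document at hand (for example, by matching known patterns such as years with a regular expression).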
- Exporting the code to an HTML file with Markdown-styled text
  - Place your cursor in a block.
  - Insert a new block from the `Insert` menu.
  - Change the new block's mode to `Markdown`.
  - Type some Markdown text.
  - In the menu bar, click `File -> Download as -> HTML`
  - Open the downloaded file in your browser. It is a pure HTML file generated automatically by Jupyter Notebook. This way, you can produce a styled, Python-based document for the web.
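The same export can also be done without the menu, using the `jupyter nbconvert` command. The sketch below wraps that command in Python; the helper names and the notebook file name in the example call are placeholders, not the actual names used in this exercise, and it assumes `jupyter` and `nbconvert` are installed and on your `PATH`.

```python
import subprocess


def build_nbconvert_command(notebook_path):
    """Return the argument list for `jupyter nbconvert --to html <notebook>`."""
    return ["jupyter", "nbconvert", "--to", "html", str(notebook_path)]


def export_notebook_to_html(notebook_path):
    """Run nbconvert to produce an HTML version of the notebook."""
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_nbconvert_command(notebook_path), check=True)


# Example call (the notebook name is a placeholder, not the real file name):
# export_notebook_to_html("python/ex2/notebook/example.ipynb")
```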