Commit

Add streamlit app

VikParuchuri committed Feb 10, 2024
1 parent 9d3e906 commit 3d0e487
Showing 13 changed files with 544 additions and 77 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,6 +8,7 @@ wandb
notebooks
results
data
slices

# Byte-compiled / optimized / DLL files
__pycache__/
56 changes: 28 additions & 28 deletions README.md
@@ -32,7 +32,7 @@ Surya is named for the [Hindu sun god](https://en.wikipedia.org/wiki/Surya), who
| Presentation | [Image](static/images/pres.png) | [Image](static/images/pres_text.png) |
| Scientific Paper | [Image](static/images/paper.png) | [Image](static/images/paper_text.png) |
| Scanned Document | [Image](static/images/scanned.png) | [Image](static/images/scanned_text.png) |
| Scanned Form | [Image](static/images/funsd.png) | |
| Scanned Old Form | [Image](static/images/funsd.png) | [Image](static/images/funsd_text.jpg) |

# Installation

@@ -51,6 +51,15 @@ Model weights will automatically download the first time you run surya. Note th
- Inspect the settings in `surya/settings.py`. You can override any settings with environment variables.
- Your torch device will be automatically detected, but you can override this. For example, `TORCH_DEVICE=cuda`. For text detection, the `mps` device has a bug (on the [Apple side](https://github.com/pytorch/pytorch/issues/84936)) that may prevent it from working properly.

## Interactive App

I've included a streamlit app that lets you interactively try Surya on images or PDF files. Run it with:

```
pip install streamlit
surya_gui
```

## OCR (text recognition)

You can detect text in an image, pdf, or folder of images/pdfs with the following command. This will write out a json file with the detected text and bboxes, and optionally save images of the reconstructed page.
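
The command itself is collapsed out of this hunk; a typical invocation presumably looks like the following (the `surya_ocr` entry point and flag names are assumptions based on the `surya_detect` example further down):

```
surya_ocr DATA_PATH --images --langs en
```

Here `DATA_PATH` is an image, pdf, or folder, `--langs` lists the languages to recognize, and `--images` saves the reconstructed page images alongside `results.json`.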
@@ -78,10 +87,7 @@ The `results.json` file will contain these keys for each page of the input docum

**Performance tips**

Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `40MB` of VRAM, so very high batch sizes are possible. The default is a batch size `256`, which will use about 10GB of VRAM.

Depending on your CPU core count, `RECOGNITION_BATCH_SIZE` might make a difference there too - the default CPU batch size is `32`.

Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `40MB` of VRAM, so very high batch sizes are possible. The default is a batch size `256`, which will use about 10GB of VRAM. Depending on your CPU core count, it may help, too - the default CPU batch size is `32`.
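
For example, to run recognition with a larger batch on a roomy GPU (assuming the same `surya_ocr` entry point as above):

```
RECOGNITION_BATCH_SIZE=512 surya_ocr DATA_PATH
```

At `40MB` per batch item, a batch size of `512` needs roughly 20GB of VRAM.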

### From python

@@ -94,20 +100,15 @@ from surya.model.recognition.processor import load_processor as load_rec_process
```python
image = Image.open(IMAGE_PATH)
langs = ["en"] # Replace with your languages
det_processor = load_det_processor()
det_model = load_det_model()
rec_model = load_rec_model()
rec_processor = load_rec_processor()
det_processor, det_model = load_det_processor(), load_det_model()
rec_model, rec_processor = load_rec_model(), load_rec_processor()
predictions = run_ocr([image], [langs], det_model, det_processor, rec_model, rec_processor)
```
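
For reference, here is a self-contained version of the snippet above. The import lines sit in the collapsed part of the hunk, so the aliases are assumptions modeled on the imports in `ocr_app.py` from this same commit:

```python
from PIL import Image

from surya.ocr import run_ocr
from surya.model.detection.segformer import load_model as load_det_model, load_processor as load_det_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor

IMAGE_PATH = "page.png"  # any image of a document page
image = Image.open(IMAGE_PATH)
langs = ["en"]  # languages expected in this image

det_processor, det_model = load_det_processor(), load_det_model()
rec_model, rec_processor = load_rec_model(), load_rec_processor()

# One language list per input image; each result dict carries "text_lines" and "bboxes".
predictions = run_ocr([image], [langs], det_model, det_processor, rec_model, rec_processor)
print(predictions[0]["text_lines"])
```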


## Text line detection

You can detect text lines in an image, pdf, or folder of images/pdfs with the following command. This will write out a json file with the detected bboxes, and optionally save images of the pages with the bboxes.
You can detect text lines in an image, pdf, or folder of images/pdfs with the following command. This will write out a json file with the detected bboxes.

```
surya_detect DATA_PATH --images
```
@@ -128,12 +129,7 @@ The `results.json` file will contain these keys for each page of the input docum

**Performance tips**

Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `280MB` of VRAM, so very high batch sizes are possible. The default is a batch size `32`, which will use about 9GB of VRAM.

Depending on your CPU core count, `DETECTOR_BATCH_SIZE` might make a difference there too - the default CPU batch size is `2`.

You can adjust `DETECTOR_NMS_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results. Try lowering them to detect more text, and vice versa.

Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `280MB` of VRAM, so very high batch sizes are possible. The default is a batch size `32`, which will use about 9GB of VRAM. Depending on your CPU core count, it might help, too - the default CPU batch size is `2`.

### From python

@@ -149,9 +145,20 @@ model, processor = load_model(), load_processor()
```python
predictions = batch_detection([image], model, processor)
```
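
As above, a runnable version with the collapsed imports filled in (module paths taken from `ocr_app.py` in this commit):

```python
from PIL import Image

from surya.detection import batch_detection
from surya.model.detection.segformer import load_model, load_processor

image = Image.open("page.png")  # any image of a document page
model, processor = load_model(), load_processor()

# One result dict per input image; the demo app draws result["polygons"] on the page.
predictions = batch_detection([image], model, processor)
print(predictions[0]["polygons"])
```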

## Table and chart detection
# Limitations

- This is specialized for document OCR. It will likely not work on photos or other images.
- It is for printed text, not handwriting (though it may work on some handwriting).
- The model has trained itself to ignore advertisements.
- You can find language support for OCR in `surya/languages.py`. Text detection should work with any language.

## Troubleshooting

If OCR isn't working properly:

- If the lines aren't detected properly, try increasing the resolution of the image if its width is below `896px`, and decreasing it if the width is much higher. Very wide images don't work well with the detector.
- You can adjust `DETECTOR_BLANK_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results. `DETECTOR_BLANK_THRESHOLD` controls the space between lines - any prediction below this number will be considered blank space. `DETECTOR_TEXT_THRESHOLD` controls how text is joined - any number above this is considered text. `DETECTOR_TEXT_THRESHOLD` should always be higher than `DETECTOR_BLANK_THRESHOLD`, and both should be in the 0-1 range. Looking at the heatmap from the debug output of the detector can tell you how to adjust these (if you see faint things that look like boxes, lower the thresholds, and if you see bboxes being joined together, raise the thresholds).
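
For example, a hypothetical override to try while debugging faint detections (the values are illustrative; like the other settings, these should be overridable via environment variables):

```
DETECTOR_BLANK_THRESHOLD=0.25 DETECTOR_TEXT_THRESHOLD=0.4 surya_detect DATA_PATH --images
```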

Coming soon.

# Manual install

@@ -162,13 +169,6 @@ If you want to develop surya, you can install it manually:
- `poetry install` - installs main and dev dependencies
- `poetry shell` - activates the virtual environment

# Limitations

- This is specialized for document OCR. It will likely not work on photos or other images.
- It is for printed text, not handwriting (though it may work on some handwriting).
- The model has trained itself to ignore advertisements.
- You can find language support for OCR in `surya/languages.py`. Text detection should work with any language.

# Benchmarks

## OCR
38 changes: 0 additions & 38 deletions demo_app.py

This file was deleted.

119 changes: 119 additions & 0 deletions ocr_app.py
@@ -0,0 +1,119 @@
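# Streamlit demo for Surya: upload an image or PDF page, then run text detection or full OCR on it.
# Run with `streamlit run ocr_app.py`; the `surya_gui` command mentioned in the README presumably wraps this script.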
import io

import pypdfium2
import streamlit as st
from surya.detection import batch_detection
from surya.model.detection.segformer import load_model, load_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor
from surya.postprocessing.heatmap import draw_polys_on_image
from surya.ocr import run_ocr
from surya.postprocessing.text import draw_text_on_image
from PIL import Image
from surya.languages import CODE_TO_LANGUAGE
from surya.input.langs import replace_lang_with_code


@st.cache_resource()
def load_det_cached():
    return load_model(), load_processor()


@st.cache_resource()
def load_rec_cached():
    return load_rec_model(), load_rec_processor()


def text_detection(img):
    preds = batch_detection([img], det_model, det_processor)[0]
    det_img = draw_polys_on_image(preds["polygons"], img.copy())
    return det_img, preds


# Function for OCR
def ocr(img, langs):
    replace_lang_with_code(langs)
    pred = run_ocr([img], [langs], det_model, det_processor, rec_model, rec_processor)[0]
    rec_img = draw_text_on_image(pred["bboxes"], pred["text_lines"], img.size)
    return rec_img, pred


def open_pdf(pdf_file):
    stream = io.BytesIO(pdf_file.getvalue())
    return pypdfium2.PdfDocument(stream)


@st.cache_data()
def get_page_image(pdf_file, page_num, dpi=96):
    doc = open_pdf(pdf_file)
    # pdfium render scale is relative to 72 dpi, so scale = dpi / 72 gives the requested resolution
    renderer = doc.render(
        pypdfium2.PdfBitmap.to_pil,
        page_indices=[page_num - 1],
        scale=dpi / 72,
    )
    png = list(renderer)[0]
    png_image = png.convert("RGB")
    return png_image


@st.cache_data()
def page_count(pdf_file):
    doc = open_pdf(pdf_file)
    return len(doc)


st.set_page_config(layout="wide")
col1, col2 = st.columns([.5, .5])

det_model, det_processor = load_det_cached()
rec_model, rec_processor = load_rec_cached()


st.markdown("""
# Surya OCR Demo
This app will let you try surya, a multilingual OCR model. It supports text detection in any language, and text recognition in 90+ languages.
Notes:
- This works best on documents with printed text.
- Try to keep the image width around 896, especially if you have large text.
- This supports 90+ languages, see [here](https://github.com/VikParuchuri/surya/tree/master/surya/languages.py) for a full list of codes.
Find the project [here](https://github.com/VikParuchuri/surya).
""")

in_file = st.sidebar.file_uploader("PDF file or image:", type=["pdf", "png", "jpg", "jpeg", "gif", "webp"])
languages = st.sidebar.multiselect("Languages", sorted(list(CODE_TO_LANGUAGE.values())), default=["English"], max_selections=4)

if in_file is None:
    st.stop()

filetype = in_file.type
whole_image = False
if "pdf" in filetype:
    page_count = page_count(in_file)
    page_number = st.sidebar.number_input(f"Page number out of {page_count}:", min_value=1, value=1, max_value=page_count)

    pil_image = get_page_image(in_file, page_number)
else:
    pil_image = Image.open(in_file).convert("RGB")

text_det = st.sidebar.button("Run Text Detection")
text_rec = st.sidebar.button("Run OCR")

# Run Text Detection
if text_det and pil_image is not None:
    det_img, preds = text_detection(pil_image)
    with col1:
        st.image(det_img, caption="Detected Text", use_column_width=True)
        st.json(preds)

# Run OCR
if text_rec and pil_image is not None:
    rec_img, pred = ocr(pil_image, languages)
    with col1:
        st.image(rec_img, caption="OCR Result", use_column_width=True)
        st.json(pred)

with col2:
    st.image(pil_image, caption="Uploaded Image", use_column_width=True)
