textract

Overview

textract is a command-line tool for recognising text in images using macOS's built-in Vision framework.

macOS has capable built-in text recognition, and I think many users would like to have it available as a command-line application.

This application consists of a single main.swift file. Compiling the executable requires only swiftc, which ships with the Xcode Command Line Tools (installed via xcode-select). You do not need the full Xcode application, an .xcodeproj project, or anything else.
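
Before building, it can help to confirm that the compiler is actually on your PATH. The following is a minimal sketch, not part of textract: `have_tool` is a hypothetical helper name, and the suggestion to run `xcode-select --install` assumes the Command Line Tools are simply not installed yet.

```shell
#!/bin/sh
# Hypothetical helper (not part of textract): succeed if the named
# command can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool swiftc; then
  echo "swiftc found at: $(command -v swiftc)"
else
  # Assumption: the Command Line Tools are missing and can be installed
  # with Apple's installer prompt.
  echo "swiftc not found; try: xcode-select --install"
fi
```

The script only reports what it finds; it does not install anything itself.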

System Requirements

Text recognition on macOS, iOS, and other Apple platforms is provided by the Vision framework, which requires macOS 10.13 or later.
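
If you are unsure which macOS version a machine runs, a check along the following lines may help. This is a sketch under assumptions: `version_at_least` is a hypothetical helper that compares only the major and minor fields, and the 10.13 threshold is the one stated above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 is at least version $2,
# comparing only the major and minor fields.
version_at_least() {
  # Split each version on dots; missing fields are treated as 0 below.
  IFS=. read -r have_major have_minor have_rest <<EOF
$1
EOF
  IFS=. read -r want_major want_minor want_rest <<EOF
$2
EOF
  if [ "$have_major" -gt "$want_major" ]; then return 0; fi
  if [ "$have_major" -lt "$want_major" ]; then return 1; fi
  [ "${have_minor:-0}" -ge "${want_minor:-0}" ]
}

# sw_vers exists only on macOS, so the check is skipped elsewhere.
if command -v sw_vers >/dev/null 2>&1; then
  if version_at_least "$(sw_vers -productVersion)" 10.13; then
    echo "macOS version is sufficient for the Vision framework"
  else
    echo "macOS 10.13 or later is required"
  fi
fi
```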

Installation

To build on your own machine, do the following:

Clone this repository:
```
git clone https://github.com/sth-v/textract.git
```
Go to the root of the cloned repository:
```
cd textract
```
Compile the executable:
```
swiftc -O main.swift -o textract
```
See swiftc --help for other compilation options.

After compilation, an executable file named textract appears in the root folder of the repository. This is the application.

Run:
```
./textract --help
```
Output:
```
textract: A Swift command line tool for recognising text in images using macOS's built-in Vision framework.

Usage:
textract <path> [options]
textract --base64-input <base64 image> [options]

Arguments:
<path> The path to an image file or a directory containing image files.
<base64 image> A base64-encoded string representing an image.

Options:
--file-output Save recognised text to .txt files with the same base names as the images.
--print-report Print lists of processed and skipped files at the end.
-h, --help, ? Display this help section.

Examples:
1. Process a directory and output recognised text to stdout:
    ./textract /path/to/your/images

2. Process a single image file and output recognised text to stdout:
    ./textract /path/to/your/image/file

3. Process a directory and save recognised text to .txt files:
    ./textract /path/to/your/images --file-output

4. Process a directory, save recognised text to .txt files, and print a report:
    ./textract /path/to/your/images --file-output --print-report

5. Process a single image file, output recognized text to stdout, and print a report:
     ./textract /path/to/your/image/file --print-report

6. Process a single pdf file, output will be in markdown format, will include headings, paragraphs and lists:
     ./textract /path/to/your/image/file.pdf

7. Process a base64-encoded image string:
     ./textract --base64-input <base64 image>
```

Examples

Process a directory and output recognised text to stdout:
```
./textract /path/to/your/images
```
Process a single image file and output recognised text to stdout:
```
./textract /path/to/your/image/file
```
Process a directory and save recognised text to .txt files:
```
./textract /path/to/your/images --file-output
```
Process a directory, save recognised text to .txt files, and print a report:
```
./textract /path/to/your/images --file-output --print-report
```
Process a single image file, output recognised text to stdout, and print a report:
```
./textract /path/to/your/image/file --print-report
```
Process a single PDF file. The output is in Markdown format and includes headings, paragraphs, and lists:
```
./textract /path/to/your/image/file.pdf
```
If you run textract on a macOS server, in a macOS virtual machine, or in a Docker container, you may want to call it over a network, for example through a REST API. For this purpose, textract can read an image from a base64-encoded string, passed directly in the following form:
```
./textract --base64-input <base64 string>
```
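
To produce the base64 string from an image file, something like the following sketch may be useful. `encode_image` is a hypothetical helper, not part of textract; it assumes a `base64` tool that reads from standard input, as the versions shipped with macOS and GNU coreutils do.

```shell
#!/bin/sh
# Hypothetical helper (not part of textract): print a file's contents as a
# single-line base64 string, suitable for use as one command-line argument.
encode_image() {
  # Reading from stdin avoids the differing file-argument flags of the
  # macOS and GNU base64 tools; tr strips any line wrapping.
  base64 < "$1" | tr -d '\n'
}
```

With the function defined, a call might look like `./textract --base64-input "$(encode_image /path/to/your/image/file)"`. Note that the whole string travels as a single argument, so very large images may exceed the operating system's limit on argument length.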
