Add support for nougat models (`image-to-text`) #391

xenova · 2023-11-13T18:34:44Z

Closes #353

Example usage

Code adapted from here.

Example image

Pipeline API

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
let pipe = await pipeline('image-to-text', 'Xenova/nougat-small');
let output = await pipe(url, {
  min_length: 1,
  max_new_tokens: 40,
  bad_words_ids: [[pipe.tokenizer.unk_token_id]],
});
// [{ generated_text: "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: lblecher@meta.com\n\nGuillem Cucur" }]

AutoModel

import { AutoProcessor, AutoTokenizer, AutoModelForVision2Seq, RawImage } from '@xenova/transformers';

// Choose model to use
const model_id = 'Xenova/nougat-small';

// Load model, tokenizer, and processor
const model = await AutoModelForVision2Seq.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);

// Prepare PDF image for the model
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
const image = await RawImage.read(url);
const image_inputs = await processor(image);

// Generate text (here we only generate 30 tokens)
const output = await model.generate(image_inputs.pixel_values, {
  min_length: 1,
  max_new_tokens: 30,
  bad_words_ids: [[tokenizer.unk_token_id]],
});

// Decode output
const decoded = tokenizer.batch_decode(output, {
  skip_special_tokens: true
})[0];
// "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: lblecher@"

xenova added 17 commits November 10, 2023 03:59

Add NougatTokenizer

0eefe8b

Add nougat unit tests

12b0241

Add support for NougatImageProcessor

5f21ec9

Add crop function to RawImage

5131e00

Fix RawImage save function

5fd5b03

OffscreenCanvas does not have `toDataURL` function
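
To illustrate the note above: `OffscreenCanvas` provides a promise-based `convertToBlob()` method, whereas `HTMLCanvasElement` provides `toDataURL()` and a callback-based `toBlob()`. The sketch below shows one way to obtain a Blob from either canvas type. The helper name `canvasToBlob` is hypothetical and this is not the change made in the commit, only an assumption about how such a difference can be handled.

```javascript
// Hypothetical helper (not the PR's actual code): obtain a Blob from either
// canvas type without relying on toDataURL.
async function canvasToBlob(canvas, type = 'image/png') {
  if (typeof canvas.convertToBlob === 'function') {
    // OffscreenCanvas: promise-based API
    return await canvas.convertToBlob({ type });
  }
  if (typeof canvas.toBlob === 'function') {
    // HTMLCanvasElement: callback-based API, wrapped in a promise
    return await new Promise((resolve, reject) => {
      canvas.toBlob(
        (blob) => (blob ? resolve(blob) : reject(new Error('Canvas could not be encoded'))),
        type,
      );
    });
  }
  throw new TypeError('Unsupported canvas object');
}
```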

Add listed support for nougat models

68fd066

Fix min/max function typing

9e1b356

Add unknown token to tokenizer class

c673998

Implement NoBadWordsLogitsProcessor

5f562db

Use NoBadWordsLogitsProcessor in generate

1469b47
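
As a rough illustration of what a "no bad words" logits processor does, the sketch below sets the score of a banned token to `-Infinity` whenever the tokens generated so far end with the rest of a banned sequence. The function name `suppressBadWords` and the plain-array inputs are assumptions made for this example; the class added in this PR may be structured differently.

```javascript
// Minimal sketch (hypothetical function, not the library's implementation).
// inputIds:    token ids generated so far
// logits:      scores indexed by token id (modified in place)
// badWordsIds: array of banned token-id sequences
function suppressBadWords(inputIds, logits, badWordsIds) {
  for (const seq of badWordsIds) {
    if (seq.length === 0) continue; // nothing to ban
    const prefix = seq.slice(0, -1);
    const last = seq[seq.length - 1];
    if (prefix.length > inputIds.length) continue; // prefix cannot match yet

    // The final token is banned only if the preceding tokens of the sequence
    // match the tail of what has been generated so far.
    const tail = prefix.length === 0 ? [] : inputIds.slice(-prefix.length);
    const matches = prefix.every((id, i) => id === tail[i]);
    if (matches) logits[last] = -Infinity;
  }
  return logits;
}
```

With `bad_words_ids: [[tokenizer.unk_token_id]]`, as in the usage examples, each banned sequence has a single token, so the prefix is empty and the unknown token is suppressed at every step.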

Fix regex group substitutions

ce4fc97

Python uses \1, \2, etc. for group substitutions, but JavaScript uses $1, $2, etc.
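
The difference can be shown with a short sketch. In a JavaScript replacement string, a backslash has no special meaning, so a Python-style template such as `\2\1` is inserted literally. The helper `pythonToJsReplacement` below is hypothetical (it is not taken from this PR) and handles only numbered `\N` references, not Python's `\g<N>` form.

```javascript
// Hypothetical helper: convert a Python-style replacement template
// (e.g. "\1-\2") into the JavaScript form ("$1-$2").
function pythonToJsReplacement(template) {
  return template
    // Escape literal "$" so JavaScript does not treat it as special ("$$" yields "$").
    .replace(/\$/g, '$$$$')
    // Rewrite \N backreferences as $N.
    .replace(/\\(\d+)/g, '$$$1');
}

// Python-style template used directly: the backslashes are kept as-is.
const literal = 'ab'.replace(/(a)(b)/, '\\2\\1');

// Converted template: the groups are substituted as intended.
const swapped = 'ab'.replace(/(a)(b)/, pythonToJsReplacement('\\2\\1'));

console.log(literal); // \2\1
console.log(swapped); // ba
```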

Create regexSplit helper function to split but keep delimiter

5216add
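
A split that keeps its delimiters can be sketched as follows. This is an illustration under assumptions, not the `regexSplit` helper from the commit: the name `splitKeepDelimiter` and the behavior for empty matches are choices made for this example.

```javascript
// Hypothetical sketch: split `text` on `regex`, keeping each matched
// delimiter as its own element in the result.
function splitKeepDelimiter(text, regex) {
  // matchAll requires the global flag, so add it if it is missing.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const pattern = new RegExp(regex.source, flags);

  const result = [];
  let prev = 0;
  for (const match of text.matchAll(pattern)) {
    if (match[0].length === 0) continue; // ignore empty matches
    if (match.index > prev) result.push(text.slice(prev, match.index));
    result.push(match[0]); // keep the delimiter itself
    prev = match.index + match[0].length;
  }
  if (prev < text.length) result.push(text.slice(prev));
  return result;
}

console.log(splitKeepDelimiter('a, b;c', /[,;]\s*/));
// [ 'a', ', ', 'b', ';', 'c' ]
```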

Fix splitting for String pattern types

9a5450b

Fix docstring

1a1c159

Merge branch 'main' into add-nougat

34bb56c

Merge branch 'main' into add-nougat

177c476

Merge branch 'main' into add-nougat

26097b0

xenova merged commit 5ddc472 into main Nov 20, 2023

xenova deleted the add-nougat branch December 1, 2023 16:01
