Add support for nougat models (image-to-text) #391

Merged: 17 commits, Nov 20, 2023

@xenova xenova commented Nov 13, 2023

Closes #353

Example usage

Code adapted from here.

Example image

image

Pipeline API

import { pipeline } from '@xenova/transformers';

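// Create an image-to-text pipeline backed by the Nougat model
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
const pipe = await pipeline('image-to-text', 'Xenova/nougat-small');

// Generate text (here we only generate 40 tokens)
const output = await pipe(url, {
  min_length: 1,
  max_new_tokens: 40,
  bad_words_ids: [[pipe.tokenizer.unk_token_id]], // never emit the unknown token
});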
// [{ generated_text: "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: lblecher@meta.com\n\nGuillem Cucur" }]
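
For callers who only want the generated string, the pipeline call above could be wrapped in a small helper. This is a minimal sketch, not part of this PR or of the library: the name `imageToText` and its `maxNewTokens` parameter are hypothetical, and it assumes `pipe` is the function returned by `pipeline('image-to-text', 'Xenova/nougat-small')` and that it resolves to an array of `{ generated_text }` objects, as in the output shown above.

```javascript
// Hypothetical helper (not part of transformers.js): runs an image-to-text
// pipeline on a single image and returns the generated text as a string.
// `pipe` is assumed to be the function returned by
// `pipeline('image-to-text', 'Xenova/nougat-small')`.
async function imageToText(pipe, image, maxNewTokens = 40) {
  const options = {
    min_length: 1,
    max_new_tokens: maxNewTokens,
  };

  // Ban the unknown token when the tokenizer defines one,
  // mirroring the `bad_words_ids` option used in the example above.
  const unk = pipe.tokenizer?.unk_token_id;
  if (unk !== undefined && unk !== null) {
    options.bad_words_ids = [[unk]];
  }

  const output = await pipe(image, options);

  // Assumed result shape: [{ generated_text: '...' }]
  return output[0].generated_text;
}
```

With a real pipeline this would be called as `await imageToText(pipe, url)`, or with a larger `maxNewTokens` when more than the first few lines of the page are wanted.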

AutoModel

import { AutoProcessor, AutoTokenizer, AutoModelForVision2Seq, RawImage } from '@xenova/transformers';

// Choose model to use
const model_id = 'Xenova/nougat-small';

// Load model, tokenizer, and processor
const model = await AutoModelForVision2Seq.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);

// Prepare PDF image for the model
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
const image = await RawImage.read(url);
const image_inputs = await processor(image);

// Generate text (here we only generate 30 tokens)
const output = await model.generate(image_inputs.pixel_values, {
  min_length: 1,
  max_new_tokens: 30,
  bad_words_ids: [[tokenizer.unk_token_id]],
});

// Decode output
const decoded = tokenizer.batch_decode(output, {
  skip_special_tokens: true
})[0];
// "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: lblecher@"
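
Both examples pass `bad_words_ids: [[unk_token_id]]`, which asks generation never to produce the unknown token. The effect of a single-token entry can be pictured as masking that token's score before the next token is chosen. The following is a simplified illustration under that assumption, not the transformers.js implementation; `maskBadWords`, `argmax`, and the toy scores are made up for the example.

```javascript
// Simplified illustration only; this is not the transformers.js implementation.
// Returns a copy of `logits` in which every banned single-token sequence
// has its score set to -Infinity, so greedy selection can never pick it.
function maskBadWords(logits, badWordsIds) {
  const masked = logits.slice();
  for (const sequence of badWordsIds) {
    // Only single-token bans are handled here; longer sequences would
    // also need to look at the tokens generated so far.
    if (sequence.length === 1) {
      masked[sequence[0]] = -Infinity;
    }
  }
  return masked;
}

// Greedy choice: index of the highest score.
function argmax(scores) {
  let best = 0;
  for (let i = 1; i < scores.length; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Toy vocabulary of 4 tokens, where id 3 plays the role of the unknown token.
const logits = [0.1, 1.2, 0.3, 2.5];
const unkTokenId = 3;

const withoutMask = argmax(logits);
const withMask = argmax(maskBadWords(logits, [[unkTokenId]]));
console.log(withoutMask, withMask); // 3 1
```

Without the mask the highest-scoring token is the unknown one (id 3); with it, the next best token (id 1) is selected instead.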

@xenova xenova merged commit 5ddc472 into main Nov 20, 2023
@xenova xenova deleted the add-nougat branch December 1, 2023 16:01
Linked issue: [Feature request] Nougat