
FastText subword vector support in spaCy 3 #8454

The fastText-like code in spaCy doesn't really work the way fastText does, and it isn't used by the statistical models, so I wouldn't recommend using it.

Static fastText vectors work fine in spaCy models as static vectors, so that's not a reason to pick, for instance, GloVe over fastText. But no, spaCy doesn't use the subword information to produce vectors for unknown tokens.
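To make the distinction concrete, here is a minimal sketch of what "static vectors" means. It's plain Python, with a dict standing in for the vector table; `lookup_static` and `table` are hypothetical names, and this doesn't reproduce spaCy's internals. A word that isn't in the table gets no learned vector (all zeros here), no matter how many character n-grams it shares with known words.

```python
# Minimal sketch, assuming a plain dict stands in for a static vector table.
# Lookup is by whole word only: subword information is never consulted.
from typing import Dict, List

Vector = List[float]


def lookup_static(table: Dict[str, Vector], word: str, width: int) -> Vector:
    """Return the stored vector for `word`, or a zero vector if it's out of vocabulary."""
    return table.get(word, [0.0] * width)


# Hypothetical toy vectors, width 3.
table = {
    "apple": [0.1, 0.3, -0.2],
    "apples": [0.12, 0.28, -0.19],
}

print(lookup_static(table, "apple", 3))     # in vocabulary: stored vector
print(lookup_static(table, "appleish", 3))  # unknown: [0.0, 0.0, 0.0]
```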

One idea I've been working on a bit is an option to replace the HashEmbed layer with a custom version of fastText vectors, where every token gets a vector computed from its character n-gram substrings, but stored in a compact embedding table instead of fastText's default single-hash table of 2M n-gram buckets. The fastText part is implemented, but the thinc/spaCy part is not…
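A rough sketch of that idea, under stated assumptions: n-grams are fastText-style (the word wrapped in `<` and `>`, lengths 3 to 6, fastText's defaults for unsupervised training), and each n-gram is hashed with several seeds into a small table whose selected rows are summed, similar in spirit to HashEmbed. The class name `CompactSubwordTable`, the table size, the number of hashes and the use of BLAKE2 for hashing are all hypothetical choices for illustration, not how fastText, thinc or spaCy actually implement this.

```python
# Hypothetical sketch: fastText-style subword vectors backed by a compact,
# multi-hash embedding table. Sizes and hashing scheme are illustrative only.
import hashlib
import random
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    w = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(w) - n + 1):
            grams.append(w[i : i + n])
    return grams


class CompactSubwordTable:
    def __init__(self, rows: int = 1000, width: int = 4, n_hashes: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.rows = rows
        self.width = width
        self.n_hashes = n_hashes
        # Far fewer rows than a 2M-bucket table; values are random placeholders.
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(rows)]

    def _row_ids(self, gram: str) -> List[int]:
        # One deterministic hash per seed, so each n-gram maps to n_hashes rows.
        ids = []
        for k in range(self.n_hashes):
            digest = hashlib.blake2b(f"{k}:{gram}".encode("utf8"), digest_size=8).digest()
            ids.append(int.from_bytes(digest, "little") % self.rows)
        return ids

    def ngram_vector(self, gram: str) -> List[float]:
        # Sum the rows picked by the different hashes.
        vec = [0.0] * self.width
        for row in self._row_ids(gram):
            for d in range(self.width):
                vec[d] += self.table[row][d]
        return vec

    def word_vector(self, word: str) -> List[float]:
        # Average the n-gram vectors, so known and unknown tokens alike get a vector.
        grams = char_ngrams(word)
        if not grams:
            return [0.0] * self.width
        vec = [0.0] * self.width
        for gram in grams:
            for d, x in enumerate(self.ngram_vector(gram)):
                vec[d] += x
        return [x / len(grams) for x in vec]


print(char_ngrams("cat", 3, 4))  # ['<ca', 'cat', 'at>', '<cat', 'cat>']
emb = CompactSubwordTable()
print(emb.word_vector("appleish"))  # an unseen word still gets a vector
```

Because every vector is built from n-grams, `appleish` shares n-gram rows with `apple`, and memory is bounded by `rows` rather than by the vocabulary size. Using several hashes per n-gram means two n-grams only end up with identical embeddings if they collide on every hash.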

@adrianeboyd
Answer selected by adrianeboyd
Labels: feat / vectors (Feature: Word vectors and similarity)