Look into how we might use SPECTER to improve our labeller #41
According to the SPECTER paper, SPECTER outperforms SciBERT (even fine-tuned SciBERT) at text classification.
Thanks, I'm looking into this and trying their code. Will let you know.
I looked into SPECTER and it seems buggy. I filed an issue on their GitHub repository: allenai/specter#27. @hschilling @bruffridge @ARalevski @CkUnsworth There are some bugs in SPECTER. I'll try to get training to work, but we may need to contact the authors to find out how they fixed the issue above.
Thanks for looking into that.
They talked a lot about using SPECTER embeddings with nearest-neighbor search. That approach relied on S2ORC (https://allenai.org/data/s2orc), which has 100,000,000 papers. If they used only the papers with PubMed identifiers, it would be "only" 20,000,000. On top of these embeddings, they were thinking of using simple classification methods such as logistic regression.
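A minimal sketch of the nearest-neighbor idea, assuming each paper has already been turned into a SPECTER embedding (for example with the allenai/specter model on Hugging Face). The three-dimensional vectors and the "biology"/"engineering" labels are hypothetical stand-ins for real SPECTER embeddings and our labeller's categories. `cosine_similarity` and `knn_label` are illustrative helpers, not part of SPECTER.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_label(query, labelled, k=3):
    """Predict a label for `query` by majority vote over its k most similar labelled embeddings.

    `labelled` is a list of (embedding, label) pairs. Ties in the vote go to the
    label that appears first among the nearest neighbors.
    """
    ranked = sorted(labelled, key=lambda item: cosine_similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data standing in for (SPECTER embedding, label) pairs.
labelled = [
    ([0.9, 0.1, 0.0], "biology"),
    ([0.8, 0.2, 0.1], "biology"),
    ([0.1, 0.9, 0.0], "engineering"),
    ([0.0, 0.8, 0.3], "engineering"),
    ([0.2, 0.7, 0.1], "engineering"),
]

print(knn_label([0.85, 0.15, 0.05], labelled, k=3))  # prints "biology"
```

Replacing the k-NN vote with logistic regression (for example scikit-learn's `LogisticRegression` fit on the same embedding/label pairs) is the other simple option discussed above. At S2ORC scale, the brute-force sort here would need to be swapped for an approximate nearest-neighbor index.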
https://arxiv.org/pdf/2004.07180.pdf
https://github.com/allenai/specter
https://huggingface.co/allenai/specter