Abstract writing encoding error #2

Open
bluetyson opened this issue Aug 5, 2020 · 4 comments


@bluetyson
Contributor

Downloading preprint 1 of 1582
Downloading preprint 2 of 1582
Downloading preprint 3 of 1582


UnicodeEncodeError                        Traceback (most recent call last)
in
     74 # write the abstract to a file
     75 abstF = open(localAbstract, 'w')
---> 76 abstF.write( preprint.description )
     77 abstF.close()
     78

~\miniconda3\envs\avant2\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\u1e9f' in position 541: character maps to <undefined>

@bluetyson
Contributor Author

This is on Windows 10, where files opened without an explicit encoding default to cp1252 (as the traceback shows), so this is often a problem.
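For anyone hitting the same error, here is a minimal sketch of why the write fails. It assumes Python 3 without UTF-8 mode enabled, where `open()` in text mode falls back to the locale's preferred encoding. The helper `can_encode` is a hypothetical name used only for illustration; it is not part of this repository.

```python
import locale

# Encoding that open() typically falls back to when no encoding= is given.
# On many Windows installs this is cp1252; on Linux/macOS it is usually UTF-8.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

char = "\u1e9f"  # the character named in the traceback


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 is a single-byte code page and has no slot for U+1E9F,
# whereas UTF-8 can represent every Unicode code point.
print("cp1252:", can_encode(char, "cp1252"))
print("utf-8: ", can_encode(char, "utf-8"))
```

If the first line printed on your machine is cp1252, any abstract containing a character outside that code page will trigger the same `UnicodeEncodeError`.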

@bluetyson
Contributor Author

Will see if this works:

# write the abstract to a file
abstF = open(localAbstract, 'w', encoding='utf8')
abstF.write( preprint.description )
abstF.close()
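A slightly more defensive variant of the same idea, as a sketch only: it wraps the write in a `with` block so the file is closed even if the write raises. The function name `write_abstract`, the temporary path, and the sample text are assumptions standing in for `localAbstract` and `preprint.description`; they are not taken from the repository.

```python
import os
import tempfile


def write_abstract(path, text):
    """Write `text` to `path` as UTF-8, regardless of the platform default."""
    # The explicit encoding avoids the cp1252 fallback on Windows;
    # the with-block closes the file even if write() raises.
    with open(path, "w", encoding="utf-8") as abst_file:
        abst_file.write(text)


# Hypothetical stand-ins for localAbstract and preprint.description.
local_abstract = os.path.join(tempfile.mkdtemp(), "abstract.txt")
description = "Sample abstract containing \u1e9f."

write_abstract(local_abstract, description)

# Read the file back with the same encoding to confirm nothing was lost.
with open(local_abstract, encoding="utf-8") as abst_file:
    round_trip = abst_file.read()
```

Anything that later reads these files (for example the text extraction step) would need to pass `encoding="utf-8"` as well, otherwise the same platform default applies on the way back in.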

Also, thanks for doing this - now I just have to do the text extraction piece.

@narock
Contributor

narock commented Aug 6, 2020

@bluetyson I wanted to make sure you were aware of EarthArXiv's pending move to the California Digital Library (CDL). We will be leaving the Center for Open Science at the end of August. There are a lot of benefits to moving to CDL; however, the downside is that there will be a new API and this code will no longer work.

CDL hasn't released the specs for the new API yet. If you're using the current API for text extraction/analysis, please have everything you need from the API by Friday, August 21.

EarthArXiv will likely be offline for a few weeks after that. I'll update this repository with a new version as soon as we know more about the new API.

@bluetyson
Contributor Author

Thanks, I did read that. Yes, I have downloaded everything I need for my test now. That also answers the question of whether a pull request is useful.
