TypeError: can not serialize 'TextBlob' object #22
Comments
Can you share the error you get? I did some testing. The details are below, but the short version is that I am not sure if it is a spacytextblob issue. I could not get My understanding is that even with just using

Test 1 - running the code from AltfunsMA

    import spacy
    from spacytextblob.spacytextblob import SpacyTextBlob

    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe('spacytextblob')
    l = ['This is great. But this is horrible', 'The answer to everything is 42. What did you believe?']
    docs = nlp.pipe(l, n_process = 2)
    for doc in docs:
        for s in doc.sents:
            print(s._.blob.polarity)
            print(s._.blob.subjectivity)

For me, the program just hangs here...
Test 2 - running the code without spacytextblob

    import spacy

    nlp = spacy.load("en_core_web_sm")
    l = ['This is great. But this is horrible', 'The answer to everything is 42. What did you believe?']
    docs = nlp.pipe(l, n_process = 2)
    for doc in docs:
        print(doc)

I get the same error as above.
System info

OS

Python

    $ python --version
    Python 3.10.4

Packages

    $ pip freeze
blis==0.7.7
catalogue==2.0.7
certifi==2022.5.18.1
charset-normalizer==2.0.12
click==8.1.3
cymem==2.0.6
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl
idna==3.3
Jinja2==3.1.2
joblib==1.1.0
langcodes==3.3.0
MarkupSafe==2.1.1
murmurhash==1.0.7
nltk==3.7
numpy==1.22.4
packaging==21.3
pathy==0.6.1
preshed==3.0.6
pydantic==1.8.2
pyparsing==3.0.9
regex==2022.6.2
requests==2.28.0
smart-open==5.2.1
spacy==3.3.1
spacy-legacy==3.0.9
spacy-loggers==1.0.2
spacytextblob==4.0.0
srsly==2.4.3
textblob==0.15.3
thinc==8.0.17
tqdm==4.64.0
typer==0.4.1
typing_extensions==4.2.0
urllib3==1.26.9
wasabi==0.9.1
I did some more digging. I followed the issue here from the spaCy repo: explosion/spaCy#8654. Now here is what I get; this is probably the error you were referring to, based on your title.

    import spacy
    from spacytextblob.spacytextblob import SpacyTextBlob

    def main():
        nlp = spacy.load("en_core_web_sm")
        nlp.add_pipe('spacytextblob')
        l = ['This is great. But this is horrible', 'The answer to everything is 42. What did you believe?']
        docs = nlp.pipe(l, n_process=2)
        for doc in docs:
            for s in doc.sents:
                print(s._.blob.polarity)
                print(s._.blob.subjectivity)

    if __name__ == '__main__':
        main()

Here is the output
Same issue as above. I haven't really dug into why, but it appears that spaCy's .pipe can't handle the class objects.
I think that is correct, @getorca. When it runs with more than one process, spaCy's nlp.pipe wants to serialize the data, and it is not possible to do that with a TextBlob object. I will leave this issue open because it would be helpful if we could find a way to make this work.
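To make the serialization point concrete, here is a minimal sketch that uses only the standard library. It is an analogy, not spaCy's actual code path: `json` stands in for whatever serializer spaCy uses internally, and `FakeBlob` is a hypothetical stand-in for a rich result object such as a TextBlob. The idea it illustrates is that a serializer which only understands plain data types rejects an arbitrary object, while the same numbers stored as plain floats pass through.

```python
import json


class FakeBlob:
    """Hypothetical stand-in for a rich result object such as a TextBlob."""

    def __init__(self, polarity, subjectivity):
        self.polarity = polarity
        self.subjectivity = subjectivity


def try_serialize(value):
    """Return the serialized string, or the error text if serialization fails."""
    try:
        return json.dumps(value)
    except TypeError as err:
        return f"TypeError: {err}"


blob = FakeBlob(polarity=0.8, subjectivity=0.75)

# Storing the object itself fails, in the same spirit as the error in the title.
as_object = try_serialize({"blob": blob})

# Storing only plain floats taken from the object works.
as_plain = try_serialize(
    {"polarity": blob.polarity, "subjectivity": blob.subjectivity}
)

print(as_object)
print(as_plain)
```

If the same holds for spaCy's serializer, keeping only plain values on the doc would sidestep the error, at the cost of losing the rest of the TextBlob API on the result.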
The only thing I could think of when I was looking at it was actually returning the response from TextBlob on all the ents, sentences, etc. But that seems less than ideal because of the high overhead, especially on longer docs.
One approach could be to return a dict or just attributes instead of the
As mentioned above, I think that will add significant overhead, as well as eating a lot of memory. I've been experiencing a lot of overhead from serialisation recently in relation to multiprocessing. The best thing I can suggest is to recommend that users write a custom pipeline component that returns only the data they need from TextBlob.
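A rough way to see the overhead argument is to compare the size of what would have to be sent between processes. The sketch below is standard-library only and rests on assumptions: `analyse` is a hypothetical function that returns a made-up record with several fields (the scores are placeholders, not real sentiment values), and `json` is used just to measure payload size, not because spaCy uses it.

```python
import json


def analyse(sentence):
    """Hypothetical per-sentence result with several fields; values are made up."""
    words = sentence.split()
    return {
        "text": sentence,
        "words": words,
        "word_count": len(words),
        "polarity": 0.0,      # placeholder, not a real sentiment score
        "subjectivity": 0.0,  # placeholder, not a real sentiment score
    }


def payload_size(records):
    """Number of bytes needed to serialize the records as JSON."""
    return len(json.dumps(records).encode("utf-8"))


# Sample sentences borrowed from the test code earlier in the thread.
sentences = ["This is great.", "But this is horrible."] * 1000

full = [analyse(s) for s in sentences]               # everything for every sentence
slim = [{"polarity": r["polarity"]} for r in full]   # only the field that is needed

full_size = payload_size(full)
slim_size = payload_size(slim)

print("full payload bytes:", full_size)
print("slim payload bytes:", slim_size)
```

The gap grows with document length and with the number of fields kept, which is the case for returning only what a given pipeline actually uses.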
Just trying this expansion, but it doesn't seem like you can use the pipe with more than one process, which makes it far less attractive.
The following yields the error in the subject