4.0.0 TextBlob Extension Support (#18)
- New custom attributes `doc._.blob`, `span._.blob`, and `token._.blob`.
- Support for TextBlob extensions (https://textblob.readthedocs.io/en/dev/extensions.html#extensions).
- Docs are built using Material for MkDocs (https://squidfunk.github.io/mkdocs-material/) instead of Docusaurus.
1 parent f51b5ff, commit ac5789f
Showing 75 changed files with 1,530 additions and 1,768 deletions.
@@ -0,0 +1,65 @@
# API Reference

## Custom attributes

When you add *spacytextblob* to your spaCy pipeline, it exposes a custom attribute, `._.blob`, on spaCy's `Doc`, `Span`, and `Token` classes:

- `Doc._.blob`
- `Span._.blob`
- `Token._.blob`

The sections below outline commonly accessed `._.blob` attributes and methods. See the [TextBlob docs](https://textblob.readthedocs.io/en/dev/api_reference.html#textblob.blob.TextBlob) for the complete list of attributes and methods available through `._.blob`.

### Attributes

| Name | Type | Description |
|------|------|-------------|
| `doc._.blob.polarity` | `float` | The polarity of the document, a float within the range [-1.0, 1.0]. |
| `doc._.blob.subjectivity` | `float` | The subjectivity of the document, a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. |
| `doc._.blob.sentiment_assessments.assessments` | `list` | The per-phrase assessments behind the scores. `doc._.blob.sentiment_assessments` returns a named tuple of the form `(polarity, subjectivity, assessments)`, where polarity and subjectivity use the ranges above and `assessments` lists one `(tokens, polarity, subjectivity, label)` entry for each assessed group of tokens. |

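The document-level scores in the English example script in this commit (polarity `-0.125`, subjectivity `0.9`) can be recovered from its `assessments` list: for that text they equal the plain average of the per-assessment scores. The sketch below assumes that unweighted averaging (our reading of TextBlob's default analyzer, not something this page states) and uses only the standard library. The assessment tuples are copied from the example output; `summarize_assessments` is a hypothetical helper, not part of spacytextblob or TextBlob.

```python
from statistics import mean

# Assessment tuples copied from the English example output in this commit:
# (assessed tokens, polarity, subjectivity, label)
ASSESSMENTS = [
    (["really", "horrible"], -1.0, 1.0, None),
    (["worst", "!"], -1.0, 1.0, None),
    (["really", "good"], 0.7, 0.6000000000000001, None),
    (["happy"], 0.8, 1.0, None),
]


def summarize_assessments(assessments):
    """Hypothetical helper: average per-assessment scores into document scores.

    Assumes the document score is an unweighted mean of the assessment
    scores; returns (0.0, 0.0) when nothing was assessed.
    """
    if not assessments:
        return 0.0, 0.0
    polarity = mean(p for _, p, _, _ in assessments)
    subjectivity = mean(s for _, _, s, _ in assessments)
    return polarity, subjectivity


polarity, subjectivity = summarize_assessments(ASSESSMENTS)
print(round(polarity, 3), round(subjectivity, 3))  # -0.125 0.9
```

Reading the scores this way also explains why a single strongly worded phrase can dominate a short document: each assessed phrase contributes equally, regardless of how many plain words surround it.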
### Methods

**`doc._.blob.ngrams(n=3)`**

| Name | Type | Description |
|------|------|-------------|
| `n` | `int` | The number of words in each n-gram. Defaults to `3`. |
| RETURNS | `List[WordList]` | One `WordList` per n-gram, in the order the n-grams appear in the text. |

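`ngrams(n)` slides a window of `n` words across the blob's words one position at a time, which is why the English example output starts with `['I', 'had', 'a']`, continues with `['had', 'a', 'really']`, and contains no punctuation. The stdlib-only sketch below reproduces that behaviour. The regex tokenizer is a simplifying assumption standing in for TextBlob's word tokenizer, and plain lists stand in for `WordList`.

```python
import re


def words(text):
    # Assumption: a crude stand-in for TextBlob's tokenizer that keeps runs of
    # letters and apostrophes and drops punctuation such as "." and "!".
    return re.findall(r"[A-Za-z']+", text)


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens (a sliding window of width n)."""
    if n <= 0:
        return []
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


text = ("I had a really horrible day. It was the worst day ever! "
        "But every now and then I have a really good day that makes me happy.")
grams = ngrams(words(text))
print(len(grams))           # 25
print(grams[0], grams[-1])  # ['I', 'had', 'a'] ['makes', 'me', 'happy']
```

A text of `k` words yields `k - n + 1` n-grams (27 words give 25 trigrams here), and an empty list when `n` exceeds the word count.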
## Config

When adding *spacytextblob* to your spaCy pipeline, you can optionally pass the following settings through the `config` parameter of `nlp.add_pipe`:

| Name | Type | Description |
|------|------|-------------|
| `blob_only` | `bool` | If `True`, *spacytextblob* only exposes `._.blob` and does not attempt to expose `._.polarity`, `._.subjectivity`, or `._.assessments`. Always set this to `True` when using a TextBlob extension. Defaults to `False`. |
| `custom_blob` | `Dict[str, str]` | A registry reference to the function whose return value replaces `textblob.TextBlob` (for example, `TextBlobDE` from the German extension). The dictionary's key is `"@misc"`, which tells spaCy to look in the `misc` section of its registry; the value is the name under which you registered that function with spaCy. See the [TextBlob extensions](tutorial/textblob_extensions.md) section for more details. |

```python
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # makes the "spacytextblob" component available

nlp = spacy.load("de_core_news_sm")

nlp.add_pipe("spacytextblob", config={
    "blob_only": ...,  # bool
    "custom_blob": ...  # Dict[str, str]
})
```

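The `custom_blob` value is a reference rather than the class itself: spaCy resolves `{"@misc": "<name>"}` by looking up `<name>` in its `misc` registry, calling the registered function, and using what it returns (for the German example, `TextBlobDE`). The toy model below imitates that lookup with a plain dictionary so the moving parts are visible. `MISC_REGISTRY`, `register_misc`, `resolve`, and `FakeBlob` are hypothetical stand-ins, not spaCy's actual `spacy.registry` or spacytextblob's code.

```python
from typing import Callable, Dict

# Hypothetical stand-in for spaCy's "misc" registry: name -> factory function.
MISC_REGISTRY: Dict[str, Callable[[], type]] = {}


def register_misc(name):
    """Decorator that records a factory under `name` (mimics @spacy.registry.misc)."""
    def decorator(func):
        MISC_REGISTRY[name] = func
        return func
    return decorator


class FakeBlob:
    """Hypothetical placeholder for an extension class such as TextBlobDE."""
    def __init__(self, text):
        self.text = text


@register_misc("spacytextblob.de_blob")
def create_de_blob():
    return FakeBlob


def resolve(reference):
    """Turn {"@misc": name} into the object returned by the registered factory."""
    (section, name), = reference.items()  # expects exactly one key
    if section != "@misc":
        raise ValueError(f"unsupported registry section: {section!r}")
    return MISC_REGISTRY[name]()


config = {"blob_only": True, "custom_blob": {"@misc": "spacytextblob.de_blob"}}
blob_class = resolve(config["custom_blob"])
print(blob_class.__name__)  # FakeBlob
```

Passing a registry name instead of the class keeps the pipeline config serializable as plain strings, which is the usual reason spaCy configs reference functions this way.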
### Examples

Using *spacytextblob* without an extension:

```python
{! docs/static/reference_code/spacytextblob_example.py !}
```

Using *spacytextblob* with a TextBlob extension:

```python
{! docs/static/reference_code/textblob_de_example.py !}
```
@@ -0,0 +1,19 @@
# Changelog

## 4.0.0 (TBD)

- New custom attributes `doc._.blob`, `span._.blob`, and `token._.blob`.
- Support for TextBlob extensions (https://textblob.readthedocs.io/en/dev/extensions.html#extensions).
- Docs are built using Material for MkDocs (https://squidfunk.github.io/mkdocs-material/) instead of Docusaurus.

## 3.0.1 (2021-05-05)

- Update the README on PyPI.

## 3.0 (2021-04-02)

- Dropped support for the spaCy 2.0 API.

## 0.1.0 to 0.1.7

- Supports the spaCy 2.0 API.
@@ -0,0 +1 @@
{! CONTRIBUTING.md !}
@@ -0,0 +1 @@
{! README.md !}
File renamed without changes
@@ -0,0 +1,19 @@
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load('en_core_web_sm')
text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
nlp.add_pipe("spacytextblob")
doc = nlp(text)

print(doc._.blob.polarity)
# -0.125

print(doc._.blob.subjectivity)
# 0.9

print(doc._.blob.sentiment_assessments.assessments)
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]

print(doc._.blob.ngrams())
# [WordList(['I', 'had', 'a']), WordList(['had', 'a', 'really']), WordList(['a', 'really', 'horrible']), WordList(['really', 'horrible', 'day']), WordList(['horrible', 'day', 'It']), WordList(['day', 'It', 'was']), WordList(['It', 'was', 'the']), WordList(['was', 'the', 'worst']), WordList(['the', 'worst', 'day']), WordList(['worst', 'day', 'ever']), WordList(['day', 'ever', 'But']), WordList(['ever', 'But', 'every']), WordList(['But', 'every', 'now']), WordList(['every', 'now', 'and']), WordList(['now', 'and', 'then']), WordList(['and', 'then', 'I']), WordList(['then', 'I', 'have']), WordList(['I', 'have', 'a']), WordList(['have', 'a', 'really']), WordList(['a', 'really', 'good']), WordList(['really', 'good', 'day']), WordList(['good', 'day', 'that']), WordList(['day', 'that', 'makes']), WordList(['that', 'makes', 'me']), WordList(['makes', 'me', 'happy'])]
@@ -0,0 +1,32 @@
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from textblob_de import TextBlobDE


text = '''
Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag. Ich muss
unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
habe ich nur noch EUR 3.50 in meiner Brieftasche.
'''

@spacy.registry.misc("spacytextblob.de_blob")
def create_de_blob():
    return TextBlobDE

config = {
    "blob_only": True,
    "custom_blob": {"@misc": "spacytextblob.de_blob"}
}

nlp = spacy.load("de_core_news_sm")
nlp.add_pipe("spacytextblob", config=config)
doc = nlp(text)

print(doc._.blob.sentences)
# [Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."), Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."), Sentence("Aber leider habe ich nur noch EUR 3.50 in meiner Brieftasche.")]

print(doc._.blob.sentiment)
# Sentiment(polarity=0.0, subjectivity=0.0)

print(doc._.blob.tags)
# [('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'), ('2014', 'CD'), ('und', 'CC'), ('Dr.', 'NN'), ('Meier', 'NN'), ('feiert', 'NN'), ('seinen', 'PRP$'), ('43.', 'CD'), ('Geburtstag', 'NN'), ('Ich', 'PRP'), ('muss', 'VB'), ('unbedingt', 'RB'), ('daran', 'RB'), ('denken', 'VB'), ('Mehl', 'NN'), ('usw.', 'IN'), ('für', 'IN'), ('einen', 'DT'), ('Kuchen', 'JJ'), ('einzukaufen', 'NN'), ('Aber', 'CC'), ('leider', 'VBN'), ('habe', 'VB'), ('ich', 'PRP'), ('nur', 'RB'), ('noch', 'IN'), ('EUR', 'NN'), ('3.50', 'CD'), ('in', 'IN'), ('meiner', 'JJ'), ('Brieftasche', 'NN')]