4.0.0 TextBlob Extension Support (#18)
- New custom attribute `doc._.blob`, `span._.blob`, `token._.blob`.
- Support for TextBlob extensions (https://textblob.readthedocs.io/en/dev/extensions.html#extensions).
- Docs are built using Material for MkDocs (https://squidfunk.github.io/mkdocs-material/) instead of Docusaurus.
SamEdwardes authored Feb 19, 2022
1 parent f51b5ff commit ac5789f
Showing 75 changed files with 1,530 additions and 1,768 deletions.
6 changes: 5 additions & 1 deletion .github/workflows/pytest.yml
@@ -16,7 +16,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.6", "3.7", "3.8", "3.9"]
python-version: ["3.7", "3.8", "3.9"]

steps:
- uses: actions/checkout@v2
@@ -33,6 +33,10 @@ jobs:
pip install -r requirements.txt
python -m textblob.download_corpora
python -m spacy download en_core_web_sm
pip install textblob-de
python -m spacy download de_core_news_sm
pip install textblob-fr
python -m spacy download fr_core_news_sm
pip install pytest
- name: Test with pytest
run: |
30 changes: 23 additions & 7 deletions CONTRIBUTING.md
@@ -1,11 +1,30 @@
# Contiributing
# Contributing

spaCyTextBlob is happy to accept contributions from the community. Please review the guidelines below.
*spacytextblob* is happy to accept contributions from the community. Please review the guidelines below.

## Development environment

### poetry

`poetry` is used to manage Python dependencies. See the docs for how to install poetry: [https://python-poetry.org/](https://python-poetry.org/). To activate the poetry virtual environment, run the following commands:

```bash
poetry install
poetry shell
```

### just

`just` is used to run scripts. See the just docs for instructions on how to install: [https://github.com/casey/just](https://github.com/casey/just).

## Code formatting

Please use [black](https://black.readthedocs.io/en/stable/) to format code before submitting a PR.

```bash
black spacytextblob
```

## Testing

Please validate that all tests pass before submitting a PR by running:
@@ -16,11 +35,8 @@ pytest

## Docs

To build the docs please run:
To build and visually inspect the docs, please run:

```bash
bash scripts/build_docs.sh
just docs
```

If you add new documentation using a jupyter notebook please make sure to update [scripts/build_docs.sh](scripts/build_docs.sh) to include the new notebook.

11 changes: 0 additions & 11 deletions Makefile

This file was deleted.

101 changes: 32 additions & 69 deletions README.md
@@ -1,19 +1,14 @@
# spaCyTextBlob <a href='https://spacytextblob.netlify.app/'><img src='website/static/img/logo-thumb-circle-250x250.png' align="right" height="139" /></a>
# spacytextblob

[![PyPI version](https://badge.fury.io/py/spacytextblob.svg)](https://badge.fury.io/py/spacytextblob)
[![pytest](https://github.com/SamEdwardes/spaCyTextBlob/actions/workflows/pytest.yml/badge.svg)](https://github.com/SamEdwardes/spaCyTextBlob/actions/workflows/pytest.yml)
[![pytest](https://github.com/SamEdwardes/spacytextblob/actions/workflows/pytest.yml/badge.svg)](https://github.com/SamEdwardes/spacytextblob/actions/workflows/pytest.yml)
![PyPI - Downloads](https://img.shields.io/pypi/dm/spacytextblob?label=PyPi%20Downloads)
[![Netlify Status](https://api.netlify.com/api/v1/badges/e2f2caac-7239-45a2-b145-a00205c3befb/deploy-status)](https://app.netlify.com/sites/spacytextblob/deploys)


A TextBlob sentiment analysis pipeline compponent for spaCy.

Version 3.0 is a major version update providing support for spaCy 3.0's new interface for adding pipeline components. As a result, it is not backwards compatible with previous versions of spaCyTextBlob. For compatability with spaCy 2.0 please use `pip install spacytextblob==0.1.7`.

*Note that version 1.0, and 2.0 have been skipped. The numbering has been aligned with spaCy's version numbering in the hopes of making it easier to compar.*
A TextBlob sentiment analysis pipeline component for spaCy.

- [Docs](https://spacytextblob.netlify.app/)
- [GitHub](https://github.com/SamEdwardes/spaCyTextBlob)
- [GitHub](https://github.com/SamEdwardes/spacytextblob)
- [PyPi](https://pypi.org/project/spacytextblob/)

## Table of Contents
@@ -25,119 +20,87 @@ Version 3.0 is a major version update providing support for spaCy 3.0's new inte

## Install

Install spaCyTextBlob from pypi.
Install *spacytextblob* from PyPi.

```bash
pip install spacytextblob
```

TextBlob also requires some data to be downloaded before getting started.
TextBlob requires additional data to be downloaded before getting started.

```bash
python -m textblob.download_corpora
```

spaCy requires that you download a model to get started.
spaCy also requires that you download a model to get started.

```bash
python -m spacy download en_core_web_sm
```

## Quick Start

spaCyTextBlob allows you to access all of the attributes created by TextBlob sentiment method but within the spaCy framework. The code below will demonstrate how to use spaCyTextBlob on a simple string.


```python
text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
```

Using `spaCyTextBlob`:

*spacytextblob* allows you to access all of the attributes of the `textblob.TextBlob` class from within the spaCy framework. The code below demonstrates how to use *spacytextblob* on a simple string.

```python
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load('en_core_web_sm')
text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
nlp.add_pipe("spacytextblob")
doc = nlp(text)
```

print(doc._.blob.polarity)
# -0.125

```python
print('Polarity:', doc._.polarity)
```

Polarity: -0.125
print(doc._.blob.subjectivity)
# 0.9



```python
print('Sujectivity:', doc._.subjectivity)
print(doc._.blob.sentiment_assessments.assessments)
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
```

Sujectivity: 0.9



```python
print('Assessments:', doc._.assessments)
```

Assessments: [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]


Using `TextBlob`:

In comparison, here is how the same code would look using `TextBlob`:

```python
from textblob import TextBlob
blob = TextBlob(text)
```

text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
blob = TextBlob(text)

```python
print(blob.sentiment_assessments.polarity)
```

-0.125
# -0.125



```python
print(blob.sentiment_assessments.subjectivity)
```

0.9

# 0.9


```python
print(blob.sentiment_assessments.assessments)
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]
```

[(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]


## Quick Reference

spaCyTextBlob performs sentiment analysis using the [TextBlob](https://textblob.readthedocs.io/en/dev/quickstart.html) library. Adding spaCyTextBlob to a spaCy nlp pipeline provides access to three new extension attributes.
*spacytextblob* performs sentiment analysis using the [TextBlob](https://textblob.readthedocs.io/en/dev/quickstart.html) library. Adding *spacytextblob* to a spaCy nlp pipeline creates a new extension attribute for the `Doc`, `Span`, and `Token` classes from spaCy.

- `Doc._.blob`
- `Span._.blob`
- `Token._.blob`

- `._.polarity`
- `._.subjectivity`
- `._.assessments`
The `._.blob` attribute contains all of the methods and attributes that belong to the `textblob.TextBlob` class. Some of the common methods and attributes include:

These extension attributes can be accessed at the `Doc`, `Span`, or `Token` level.
- **`._.blob.polarity`**: a float within the range [-1.0, 1.0].
- **`._.blob.subjectivity`**: a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.
- **`._.blob.sentiment_assessments.assessments`**: a list of polarity and subjectivity scores for the assessed tokens.

Polarity is a float within the range [-1.0, 1.0], subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective, and assessments is a list of polarity and subjectivity scores for the assessed tokens.
See the [textblob docs](https://textblob.readthedocs.io/en/dev/api_reference.html#textblob.blob.TextBlob) for the complete listing of all attributes and methods that are available in `._.blob`.

## Reference and Attribution

- TextBlob
- [https://github.com/sloria/TextBlob](https://github.com/sloria/TextBlob)
- [https://textblob.readthedocs.io/en/latest/](https://textblob.readthedocs.io/en/latest/)
- negspaCy (for inpiration in writing pipeline and organizing repo)
- negspaCy (for inspiration in writing pipeline and organizing repo)
- [https://github.com/jenojp/negspacy](https://github.com/jenojp/negspacy)
- spaCy custom components
- [https://spacy.io/usage/processing-pipelines#custom-components](https://spacy.io/usage/processing-pipelines#custom-components)
65 changes: 65 additions & 0 deletions docs/api.md
@@ -0,0 +1,65 @@
# API Reference

## Custom attributes

When you add *spacytextblob* to your spaCy pipeline, it exposes a custom attribute `._.blob`. This attribute is available for the `Doc`, `Span`, and `Token` classes from spaCy.

- `Doc._.blob`
- `Span._.blob`
- `Token._.blob`

The section below outlines commonly accessed `._.blob` attributes and methods. See the [textblob docs](https://textblob.readthedocs.io/en/dev/api_reference.html#textblob.blob.TextBlob) for the complete listing of all attributes and methods that are available in `._.blob`.

### Attributes

| Name | Type | Description |
|------|------|-------------|
| `doc._.blob.polarity` | `float` | The polarity of the document. The polarity score is a float within the range [-1.0, 1.0]. |
| `doc._.blob.subjectivity` | `float` | The subjectivity of the document. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective. |
| `doc._.blob.sentiment_assessments.assessments` | `list` | A list of the assessed tokens and their scores. Each entry is a tuple holding the assessed words, their polarity (a float within the range [-1.0, 1.0]), their subjectivity (a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective), and a label (`None` in the reference example). |
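
As a rough illustration of how these attributes relate, the sketch below recomputes the document-level scores from the assessments list printed in the reference example. It assumes, based only on that sample output, that polarity and subjectivity are the mean of the per-assessment values. `mean_scores` is a hypothetical helper, not part of *spacytextblob* or TextBlob.

```python
from statistics import mean

# Assessments copied from the reference example output.
# Each entry: (assessed words, polarity, subjectivity, label).
assessments = [
    (["really", "horrible"], -1.0, 1.0, None),
    (["worst", "!"], -1.0, 1.0, None),
    (["really", "good"], 0.7, 0.6000000000000001, None),
    (["happy"], 0.8, 1.0, None),
]


def mean_scores(items):
    """Hypothetical helper: average the polarity and subjectivity columns."""
    polarity = mean(p for _, p, _, _ in items)
    subjectivity = mean(s for _, _, s, _ in items)
    return polarity, subjectivity


polarity, subjectivity = mean_scores(assessments)
print(round(polarity, 3), round(subjectivity, 3))
# -0.125 0.9
```

The rounded values agree with the `doc._.blob.polarity` and `doc._.blob.subjectivity` output in the reference example, which is consistent with the averaging assumption but does not prove it for every input.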

### Methods

**`doc._.blob.ngrams`**

| Name | Type | Description |
|------|------|-------------|
| n | `int` | The number of words to include in the ngram. By default `3`. |
| RETURNS | `List[WordList]` | A list of `WordList` objects, one per n-gram. |
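
To make the `n` parameter concrete, here is a minimal plain-Python sketch of a word n-gram sliding window. `word_ngrams` is a hypothetical helper written for illustration. It is not TextBlob's implementation, and it takes an already tokenized list instead of raw text.

```python
from typing import List


def word_ngrams(tokens: List[str], n: int = 3) -> List[List[str]]:
    """Return every run of n consecutive tokens, in order."""
    if n <= 0:
        return []
    # When there are fewer than n tokens the range is empty, so the result is [].
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


tokens = ["I", "had", "a", "really", "horrible", "day"]
print(word_ngrams(tokens))
# [['I', 'had', 'a'], ['had', 'a', 'really'], ['a', 'really', 'horrible'], ['really', 'horrible', 'day']]
```

With the default `n=3`, these windows line up with the first `WordList` entries printed by `doc._.blob.ngrams()` in the reference example.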


## Config

When adding *spacytextblob* to your spaCy pipeline, you can optionally pass additional settings through the `config` parameter:

| Name | Type | Description |
|------|------|-------------|
| `blob_only` | `bool` | If True, *spacytextblob* will only expose `._.blob` and not attempt to expose `._.polarity`, `._.subjectivity`, or `._.assessments`. This should always be set to True when using TextBlob extensions. By default `False`. |
| `custom_blob` | `Dict[str, str]` | A dictionary that tells spaCy which function to use in place of `textblob.TextBlob`, for example one that returns `TextBlobDE`. The key of the dictionary is `"@misc"`, which tells spaCy to look in the misc section of the spaCy registry. The value should be the string name of a function that you have registered with spaCy. See the [TextBlob extensions](tutorial/textblob_extensions.md) section for more details. |


```python
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load("de_core_news_sm")

nlp.add_pipe(
    "spacytextblob",
    config={
        "blob_only": ...,  # bool
        "custom_blob": ...,  # Dict[str, str]
    },
)
```
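
The `custom_blob` lookup can be pictured with a plain-Python stand-in for a registry. This is only a sketch of the pattern (a registered factory function that returns a class, resolved by name from a config dictionary). `MISC_REGISTRY`, `register_misc`, `resolve_blob_class`, and `FakeBlob` are hypothetical names, and spaCy's real registry differs in its details.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the "misc" section of a registry.
MISC_REGISTRY: Dict[str, Callable[[], type]] = {}


def register_misc(name: str):
    """Decorator that stores a factory function under a string name."""
    def decorator(func):
        MISC_REGISTRY[name] = func
        return func
    return decorator


class FakeBlob:
    """Hypothetical replacement for textblob.TextBlob."""
    def __init__(self, text: str):
        self.text = text


@register_misc("example.fake_blob")
def create_fake_blob():
    # The factory returns the class itself, not an instance.
    return FakeBlob


def resolve_blob_class(config: Dict[str, str]) -> type:
    """Look up the factory named under "@misc" and call it to get the class."""
    return MISC_REGISTRY[config["@misc"]]()


blob_cls = resolve_blob_class({"@misc": "example.fake_blob"})
blob = blob_cls("Heute ist ein guter Tag.")
print(type(blob).__name__)
# FakeBlob
```

The point of the indirection is that the config only carries a string name, so it stays serializable while the class itself is supplied by whatever function was registered under that name.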

### Examples

Using *spacytextblob* without an extension:

```python
{! docs/static/reference_code/spacytextblob_example.py !}
```

Using *spacytextblob* with an extension:

```python
{! docs/static/reference_code/textblob_de_example.py !}
```
19 changes: 19 additions & 0 deletions docs/changelog.md
@@ -0,0 +1,19 @@
# Changelog

## 4.0.0 (TBD)

- New custom attribute `doc._.blob`, `span._.blob`, `token._.blob`.
- Support for TextBlob extensions (https://textblob.readthedocs.io/en/dev/extensions.html#extensions).
- Docs are built using Material for MkDocs (https://squidfunk.github.io/mkdocs-material/) instead of Docusaurus.

## 3.0.1 (2021-05-05)

- Update the README on PyPi.

## 3.0 (2021-04-02)

- Dropped support for spaCy 2.0 API.

## 0.1.0 to 0.1.7

- Supports spaCy 2.0 API.
1 change: 1 addition & 0 deletions docs/contributing.md
@@ -0,0 +1 @@
{! CONTRIBUTING.md !}
1 change: 1 addition & 0 deletions docs/index.md
@@ -0,0 +1 @@
{! README.md !}
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Binary file added docs/static/img/logo_transparent_computer.png
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
19 changes: 19 additions & 0 deletions docs/static/reference_code/spacytextblob_example.py
@@ -0,0 +1,19 @@
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load('en_core_web_sm')
text = "I had a really horrible day. It was the worst day ever! But every now and then I have a really good day that makes me happy."
nlp.add_pipe("spacytextblob")
doc = nlp(text)

print(doc._.blob.polarity)
# -0.125

print(doc._.blob.subjectivity)
# 0.9

print(doc._.blob.sentiment_assessments.assessments)
# [(['really', 'horrible'], -1.0, 1.0, None), (['worst', '!'], -1.0, 1.0, None), (['really', 'good'], 0.7, 0.6000000000000001, None), (['happy'], 0.8, 1.0, None)]

print(doc._.blob.ngrams())
# [WordList(['I', 'had', 'a']), WordList(['had', 'a', 'really']), WordList(['a', 'really', 'horrible']), WordList(['really', 'horrible', 'day']), WordList(['horrible', 'day', 'It']), WordList(['day', 'It', 'was']), WordList(['It', 'was', 'the']), WordList(['was', 'the', 'worst']), WordList(['the', 'worst', 'day']), WordList(['worst', 'day', 'ever']), WordList(['day', 'ever', 'But']), WordList(['ever', 'But', 'every']), WordList(['But', 'every', 'now']), WordList(['every', 'now', 'and']), WordList(['now', 'and', 'then']), WordList(['and', 'then', 'I']), WordList(['then', 'I', 'have']), WordList(['I', 'have', 'a']), WordList(['have', 'a', 'really']), WordList(['a', 'really', 'good']), WordList(['really', 'good', 'day']), WordList(['good', 'day', 'that']), WordList(['day', 'that', 'makes']), WordList(['that', 'makes', 'me']), WordList(['makes', 'me', 'happy'])]
32 changes: 32 additions & 0 deletions docs/static/reference_code/textblob_de_example.py
@@ -0,0 +1,32 @@
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from textblob_de import TextBlobDE


text = '''
Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag. Ich muss
unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
habe ich nur noch EUR 3.50 in meiner Brieftasche.
'''

@spacy.registry.misc("spacytextblob.de_blob")
def create_de_blob():
    # Return the class itself; spacytextblob uses it in place of textblob.TextBlob.
    return TextBlobDE

config = {
    "blob_only": True,
    "custom_blob": {"@misc": "spacytextblob.de_blob"}
}

nlp = spacy.load("de_core_news_sm")
nlp.add_pipe("spacytextblob", config=config)
doc = nlp(text)

print(doc._.blob.sentences)
# [Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."), Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."), Sentence("Aber leider habe ich nur noch EUR 3.50 in meiner Brieftasche.")]

print(doc._.blob.sentiment)
# Sentiment(polarity=0.0, subjectivity=0.0)

print(doc._.blob.tags)
# [('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'), ('2014', 'CD'), ('und', 'CC'), ('Dr.', 'NN'), ('Meier', 'NN'), ('feiert', 'NN'), ('seinen', 'PRP$'), ('43.', 'CD'), ('Geburtstag', 'NN'), ('Ich', 'PRP'), ('muss', 'VB'), ('unbedingt', 'RB'), ('daran', 'RB'), ('denken', 'VB'), ('Mehl', 'NN'), ('usw.', 'IN'), ('für', 'IN'), ('einen', 'DT'), ('Kuchen', 'JJ'), ('einzukaufen', 'NN'), ('Aber', 'CC'), ('leider', 'VBN'), ('habe', 'VB'), ('ich', 'PRP'), ('nur', 'RB'), ('noch', 'IN'), ('EUR', 'NN'), ('3.50', 'CD'), ('in', 'IN'), ('meiner', 'JJ'), ('Brieftasche', 'NN')]