Commit

Add support for empty levels #minor (#37)
* Add support for empty levels

* Refactor predict method for local classifier per parent node

* Refactor predict() method for local classifier per level

* Add black linting and badge
mirand863 authored Jul 1, 2022
1 parent 233701e commit 5f62ca1
Showing 19 changed files with 568 additions and 373 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/deploy-pypi.yml
@@ -30,7 +30,7 @@ jobs:
python -m pip install .
- name: Test with pytest
run: |
pytest -v
pytest -v --flake8 --pydocstyle --cov=hiclass --cov-fail-under=90 --cov-report html
coverage xml
- name: Upload Coverage to Codecov
if: matrix.os == 'ubuntu-latest'
9 changes: 7 additions & 2 deletions .github/workflows/test-pr.yml
@@ -6,7 +6,12 @@ on:
- main

jobs:
build:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: psf/black@stable
test:
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
@@ -29,4 +34,4 @@ jobs:
python -m pip install .
- name: Test with pytest
run: |
pytest -v
pytest -v --flake8 --pydocstyle --cov=hiclass --cov-fail-under=90 --cov-report html
16 changes: 9 additions & 7 deletions README.md
@@ -2,7 +2,7 @@

HiClass is an open-source Python library for hierarchical classification compatible with scikit-learn.

[![Deploy PyPI](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml/badge.svg?event=push)](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml) [![Documentation Status](https://readthedocs.org/projects/hiclass/badge/?version=latest)](https://hiclass.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/mirand863/hiclass/branch/main/graph/badge.svg?token=PR8VLBMMNR)](https://codecov.io/gh/mirand863/hiclass) [![Downloads PyPI](https://static.pepy.tech/personalized-badge/hiclass?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=pypi)](https://pypi.org/project/hiclass/) [![Downloads Conda](https://img.shields.io/conda/dn/conda-forge/hiclass?label=conda)](https://anaconda.org/conda-forge/hiclass) [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![Deploy PyPI](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml/badge.svg?event=push)](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml) [![Documentation Status](https://readthedocs.org/projects/hiclass/badge/?version=latest)](https://hiclass.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/mirand863/hiclass/branch/main/graph/badge.svg?token=PR8VLBMMNR)](https://codecov.io/gh/mirand863/hiclass) [![Downloads PyPI](https://static.pepy.tech/personalized-badge/hiclass?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=pypi)](https://pypi.org/project/hiclass/) [![Downloads Conda](https://img.shields.io/conda/dn/conda-forge/hiclass?label=conda)](https://anaconda.org/conda-forge/hiclass) [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

✨ Here is a **demo** that shows HiClass in action on hierarchical data:

@@ -16,7 +16,7 @@ HiClass is an open-source Python library for hierarchical classification compati
- [Who is using HiClass?](#who-is-using-hiclass)
- [Install](#install)
- [Quick start](#quick-start)
- [Step-by-step- walk-through](#step-by-step-walk-through)
- [Step-by-step walk-through](#step-by-step-walk-through)
- [API documentation](#api-documentation)
- [FAQ](#faq)
- [Support](#support)
Expand All @@ -34,7 +34,7 @@ HiClass is an open-source Python library for hierarchical classification compati
- **Hierarchical metrics:** HiClass supports the computation of hierarchical precision, recall and f-score, which are more appropriate for hierarchical data than traditional metrics.
- **Compatible with pickle:** Easily store trained models on disk for future use.

**Don't see a feature on this list?** Search our [issue tracker](https://github.com/mirand863/hiclass/issues) if someone has already requested it and add a comment to it explaining your use-case, or open a new issue if not. We prioritize our roadmap based on user feedback, so we'd love to hear from you.
**Any feature missing on this list?** Search our [issue tracker](https://github.com/mirand863/hiclass/issues) to see if someone has already requested it and add a comment to it explaining your use-case. Otherwise, please open a new issue describing the requested feature and possible use-case scenario. We prioritize our roadmap based on user feedback, so we would love to hear from you.

## Benchmarks

@@ -85,7 +85,7 @@ We would love to benchmark with larger datasets, if we can find them in the publ

Here is our public roadmap: https://github.com/mirand863/hiclass/projects/1.

We do Just-In-Time planning, and we tend to reprioritize based on your feedback. Hence, items you see on this roadmap are subject to change. We prioritize features based on the number of people asking for it, features/fixes that are small enough and can be addressed while we work on other related features, features/fixes that help improve stability & relevance and features that address interesting use cases that excite us! If you'd like to have a request prioritized, we ask that you add a detailed use-case for it, either as a comment on an existing issue (besides a thumbs-up) or in a new issue. The detailed context helps.
We do Just-In-Time planning, and we tend to reprioritize based on your feedback. Hence, items you see on this roadmap are subject to change. We prioritize features based on the number of people asking for it, features/fixes that are small enough and can be addressed while we work on other related features, features/fixes that help improve stability & relevance and features that address interesting use cases that excite us! If you would like to have a request prioritized, we ask that you add a detailed use-case for it, either as a comment on an existing issue (besides a thumbs-up) or in a new issue. The detailed context helps.


## Who is using HiClass?
@@ -123,7 +123,7 @@ Here's a quick example showcasing how you can train and predict using a local cl
from hiclass import LocalClassifierPerNode
from sklearn.ensemble import RandomForestClassifier

# define data
# Define data
X_train = [[1], [2], [3], [4]]
X_test = [[4], [3], [2], [1]]
Y_train = [
@@ -152,7 +152,7 @@ from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# define data
# Define data
X_train = [
'Struggling to repay loan',
'Unable to get annual report',
@@ -220,7 +220,9 @@ Please reach out to [email protected].

## Contributing

We are a small team on a mission to democratize hierarchical classification, and we'll take all the help we can get! If you'd like to get involved, here's information on [contribution guidelines and how to test the code locally](https://github.com/mirand863/hiclass/blob/main/CONTRIBUTING.md).
We are a small team on a mission to democratize hierarchical classification, and we will take all the help we can get! If you would like to get involved, here is information on [contribution guidelines and how to test the code locally](https://github.com/mirand863/hiclass/blob/main/CONTRIBUTING.md).

You can contribute in multiple ways, e.g., reporting bugs, writing or translating documentation, reviewing or refactoring code, requesting or implementing new features, etc.

## Getting the latest updates

5 changes: 4 additions & 1 deletion docs/examples/README.rst
@@ -1,4 +1,7 @@
Gallery of Examples
===================

These examples illustrate the main features of HiClass.
These examples illustrate the main features of HiClass.

.. toctree::
:hidden:
38 changes: 38 additions & 0 deletions docs/examples/plot_empty_levels.py
@@ -0,0 +1,38 @@
# -*- coding: utf-8 -*-
"""
==========================
Different Number of Levels
==========================
HiClass supports hierarchies whose paths have different numbers of levels.
For this example, we will train a local classifier per node
with a hierarchy similar to the following image:
.. figure:: ../algorithms/local_classifier_per_node.svg
:align: center
"""
from sklearn.linear_model import LogisticRegression

from hiclass import LocalClassifierPerNode

# Define data
X_train = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
X_test = [[9, 10], [7, 8], [5, 6], [3, 4], [1, 2]]
Y_train = [
["Bird"],
["Reptile", "Snake"],
["Reptile", "Lizard"],
["Mammal", "Cat"],
["Mammal", "Wolf", "Dog"],
]

# Use logistic regression classifiers for every node
rf = LogisticRegression()
classifier = LocalClassifierPerNode(local_classifier=rf)

# Train local classifier per node
classifier.fit(X_train, Y_train)

# Predict
predictions = classifier.predict(X_test)
print(predictions)
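
The example above passes label paths of different lengths (`["Bird"]` next to `["Mammal", "Wolf", "Dog"]`). One way to picture "empty levels" is as padding that makes every path the same length. The sketch below is only an illustration under that assumption: the helper `pad_labels` and the empty-string placeholder are hypothetical and are not taken from the HiClass source.

```python
def pad_labels(y, placeholder=""):
    """Pad every label path to the length of the longest path.

    Hypothetical helper for illustration only; the placeholder value
    is an assumption, not a documented HiClass convention.
    """
    width = max(len(path) for path in y)
    return [list(path) + [placeholder] * (width - len(path)) for path in y]


Y_train = [
    ["Bird"],
    ["Reptile", "Snake"],
    ["Reptile", "Lizard"],
    ["Mammal", "Cat"],
    ["Mammal", "Wolf", "Dog"],
]

padded = pad_labels(Y_train)
for row in padded:
    print(row)
```

After padding, every row has three entries, so the labels can be handled as a rectangular matrix in which trailing placeholders mark the levels a sample does not reach.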
31 changes: 9 additions & 22 deletions docs/examples/plot_parallel_training.py
@@ -7,8 +7,9 @@
Larger datasets require more time for training.
While by default the models in HiClass are trained using a single core,
it is possible to train each local classifier in parallel by leveraging the library Ray [1]_.
In this example, we demonstrate how to train a hierarchical classifier in parallel,
using all the cores available, on a mock dataset from Kaggle [2]_.
In this example, we demonstrate how to train a hierarchical classifier in parallel by
setting the parameter :literal:`n_jobs` to use all the cores available. Training
is performed on a mock dataset from Kaggle [2]_.
.. [1] https://www.ray.io/
.. [2] https://www.kaggle.com/datasets/kashnitsky/hierarchical-text-classification
@@ -25,29 +26,15 @@
from hiclass import LocalClassifierPerParentNode


def download(url: str, path: str) -> None:
"""
Download a file from the internet.
Parameters
----------
url : str
The address of the file to be downloaded.
path : str
The path to store the downloaded file.
"""
response = requests.get(url)
with open(path, "wb") as file:
file.write(response.content)


# Download training data
training_data_url = "https://zenodo.org/record/6657410/files/train_40k.csv?download=1"
training_data_path = "train_40k.csv"
download(training_data_url, training_data_path)
url = "https://zenodo.org/record/6657410/files/train_40k.csv?download=1"
path = "train_40k.csv"
response = requests.get(url)
with open(path, "wb") as file:
file.write(response.content)

# Load training data into pandas dataframe
training_data = pd.read_csv(training_data_path).fillna(" ")
training_data = pd.read_csv(path).fillna(" ")

# We will use logistic regression classifiers for every parent node
lr = LogisticRegression(max_iter=1000)
32 changes: 17 additions & 15 deletions docs/source/conf.py
@@ -13,17 +13,18 @@
#
import os
import sys
sys.path.insert(0, os.path.abspath('./../..'))
sys.path.insert(0, os.path.abspath('./../../hiclass'))

sys.path.insert(0, os.path.abspath("./../.."))
sys.path.insert(0, os.path.abspath("./../../hiclass"))
print(sys.path)

import sphinx_code_tabs

# -- Project information -----------------------------------------------------

project = 'hiclass'
copyright = '2022, Fabio Malcher Miranda, Niklas Köhnecke'
author = 'Fabio Malcher Miranda, Niklas Köhnecke'
project = "hiclass"
copyright = "2022, Fabio Malcher Miranda, Niklas Köhnecke"
author = "Fabio Malcher Miranda, Niklas Köhnecke"


# -- General configuration ---------------------------------------------------
@@ -32,15 +33,15 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.autosectionlabel',
'sphinx_code_tabs',
'sphinx_gallery.gen_gallery',
"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.autosectionlabel",
"sphinx_code_tabs",
"sphinx_gallery.gen_gallery",
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -55,12 +56,13 @@
use_rtd_scheme = False
try:
import sphinx_rtd_theme

extensions.extend(["sphinx_rtd_theme"])
use_rtd_scheme = True
except ImportError:
print("sphinx_rtd_theme was not installed, using alabaster as fallback!")

html_theme = 'sphinx_rtd_theme' if use_rtd_scheme else 'alabaster'
html_theme = "sphinx_rtd_theme" if use_rtd_scheme else "alabaster"


# Add any paths that contain custom static files (such as style sheets) here,
@@ -76,6 +78,6 @@
html_theme_options["sidebar_width"] = "230px"

sphinx_gallery_conf = {
'examples_dirs': '../examples',
'gallery_dirs': 'auto_examples',
}
"examples_dirs": "../examples",
"gallery_dirs": "auto_examples",
}
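
The `conf.py` changes in this file keep the existing pattern of trying to import `sphinx_rtd_theme` and falling back to `alabaster` when the import fails. As a rough sketch of the same idea, the check can be done without importing the theme by using the standard library's `importlib.util.find_spec`. The function name `pick_theme` is hypothetical and is not part of this commit.

```python
import importlib.util


def pick_theme(preferred="sphinx_rtd_theme", fallback="alabaster"):
    """Return the preferred theme name if its module is installed, else the fallback.

    Hypothetical helper for illustration; the commit itself uses try/except ImportError.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


html_theme = pick_theme()
print(html_theme)
```

Unlike the try/except version, this variant does not register the theme in `extensions`, so that step would still need to be done separately.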
12 changes: 6 additions & 6 deletions docs/source/index.rst
@@ -30,15 +30,15 @@ Welcome to hiclass' documentation!
:target: https://opensource.org/licenses/BSD-3-Clause
:alt: License

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/psf/black

.. toctree::
:titlesonly:
:includehidden:
:maxdepth: 3

introduction/index
get_started/index
auto_examples/index
algorithms/index

.. toctree::
:maxdepth: 3

api/index
api/index