Commit 5e5fe7c (0 parents): "upload again all the files for extraction"

20 changed files with 5,171 additions and 0 deletions.

@@ -0,0 +1,133 @@
# Contributing to `dataset-extractor-lotus`

Contributions are welcome, and they are greatly appreciated!
Every little bit helps, and credit will always be given.

You can contribute in many ways:

# Types of Contributions

## Report Bugs

Report bugs at https://github.com/commons-research/dataset-extractor-lotus/issues

If you are reporting a bug, please include:

- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.

## Fix Bugs

Look through the GitHub issues for bugs.
Anything tagged with "bug" and "help wanted" is open to whoever wants to implement a fix for it.

## Implement Features

Look through the GitHub issues for features.
Anything tagged with "enhancement" and "help wanted" is open to whoever wants to implement it.

## Write Documentation

`dataset-extractor-lotus` could always use more documentation, whether as part of the official docs, in docstrings, or even on the web in blog posts, articles, and such.

## Submit Feedback

The best way to send feedback is to file an issue at https://github.com/commons-research/dataset-extractor-lotus/issues.

If you are proposing a new feature:

- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project, and that contributions are welcome :)

# Get Started!

Ready to contribute? Here's how to set up `dataset-extractor-lotus` for local development.
Please note that this documentation assumes you already have `poetry` and `git` installed and ready to go.

1. Fork the `dataset-extractor-lotus` repo on GitHub.

2. Clone your fork locally:

```bash
cd <directory_in_which_repo_should_be_created>
git clone git@github.com:<your_github_username>/dataset-extractor-lotus.git
```

3. Now we need to install the environment. Navigate into the directory:

```bash
cd dataset-extractor-lotus
```

If you are using `pyenv`, select a version to use locally (list the installed versions with `pyenv versions`):

```bash
pyenv local <x.y.z>
```

Then, install and activate the environment with:

```bash
poetry install
poetry shell
```

Note that Poetry 2.x no longer ships `poetry shell` by default; there it requires the `poetry-plugin-shell` plugin, or you can prefix commands with `poetry run` instead.

4. Install pre-commit to run linters/formatters at commit time:

```bash
poetry run pre-commit install
```

5. Create a branch for local development:

```bash
git checkout -b name-of-your-bugfix-or-feature
```

Now you can make your changes locally.

6. Don't forget to add test cases for your added functionality to the `tests` directory.

7. When you're done making changes, check that they pass the code quality checks (Poetry lock-file consistency, the pre-commit linters/formatters, and `mypy` type checking):

```bash
make check
```

8. Validate that all unit tests pass (`make test` runs `pytest --doctest-modules`, so examples in docstrings are checked as well):

```bash
make test
```

9. Before raising a pull request, you should also run tox.
This will run the tests across different versions of Python:

```bash
tox
```

This requires you to have multiple versions of Python installed.
This step is also triggered in the CI/CD pipeline, so you may choose to skip it locally.

10. Commit your changes and push your branch to GitHub:

```bash
git add .
git commit -m "Your detailed description of your changes."
git push origin name-of-your-bugfix-or-feature
```

11. Submit a pull request through the GitHub website.

# Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

1. The pull request should include tests.

2. If the pull request adds functionality, the docs should be updated.
Put your new functionality into a function with a docstring, and add the feature to the list in `README.md`.
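
To make guidelines 1 and 2 concrete, here is a minimal sketch of the expected shape of a contribution. Because `make test` runs `pytest --doctest-modules`, an example written in a docstring doubles as a test. The helper `count_members` and the test function are hypothetical and not part of `dataset-extractor-lotus`; only the pattern (a documented function with a doctest, plus a pytest-style test for the `tests` directory) is the point.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def count_members(values: Iterable[str]) -> dict[str, int]:
    """Count how many rows belong to each taxonomy member.

    Hypothetical helper for illustration only. Empty strings (rows without
    taxonomy information) are skipped.

    >>> count_members(["Anthoathecata", "", "Anthoathecata", "Alcyonacea"])
    {'Anthoathecata': 2, 'Alcyonacea': 1}
    """
    return dict(Counter(v for v in values if v))


# A pytest test like this one would live in a file under `tests/`
# (for example tests/test_count_members.py, a hypothetical name).
def test_count_members_skips_empty_values() -> None:
    assert count_members(["a", "", "b", "a"]) == {"a": 2, "b": 1}
```

In a real contribution, the function would sit in a module of the `dataset_extractor_lotus` package and the test in `tests/`, so that `make test` can collect both the doctest and the test function.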

@@ -0,0 +1,33 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Extract specific elements from the LOTUS dataset.
Copyright (C) 2024 Pascal Amrein

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<http://www.gnu.org/licenses/>.

The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.

@@ -0,0 +1,35 @@
.PHONY: install
install: ## Install the poetry environment and install the pre-commit hooks
	@echo "🚀 Creating virtual environment using pyenv and poetry"
	@poetry install
	@poetry run pre-commit install
	@poetry shell

.PHONY: check
check: ## Run code quality tools.
	@echo "🚀 Checking Poetry lock file consistency with 'pyproject.toml': Running poetry check --lock"
	@poetry check --lock
	@echo "🚀 Linting code: Running pre-commit"
	@poetry run pre-commit run -a
	@echo "🚀 Static type checking: Running mypy"
	@poetry run mypy

.PHONY: test
test: ## Test the code with pytest
	@echo "🚀 Testing code: Running pytest"
	@poetry run pytest --doctest-modules

.PHONY: build
build: clean-build ## Build wheel file using poetry
	@echo "🚀 Creating wheel file"
	@poetry build

.PHONY: clean-build
clean-build: ## Clean build artifacts
	@rm -rf dist

.PHONY: help
help:
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-20s\033[0m %s\n", $$1, $$2}'

.DEFAULT_GOAL := help

@@ -0,0 +1,93 @@
# dataset extractor LOTUS

[![Release](https://img.shields.io/github/v/release/commons-research/dataset-extractor-lotus)](https://img.shields.io/github/v/release/commons-research/dataset-extractor-lotus)
[![Build status](https://img.shields.io/github/actions/workflow/status/commons-research/dataset-extractor-lotus/main.yml?branch=main)](https://github.com/commons-research/dataset-extractor-lotus/actions/workflows/main.yml?query=branch%3Amain)
[![codecov](https://codecov.io/gh/commons-research/dataset-extractor-lotus/branch/main/graph/badge.svg)](https://codecov.io/gh/commons-research/dataset-extractor-lotus)
[![Commit activity](https://img.shields.io/github/commit-activity/m/commons-research/dataset-extractor-lotus)](https://img.shields.io/github/commit-activity/m/commons-research/dataset-extractor-lotus)
[![License](https://img.shields.io/github/license/commons-research/dataset-extractor-lotus)](https://img.shields.io/github/license/commons-research/dataset-extractor-lotus)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7534071.svg)](https://zenodo.org/records/7534071)

Extract specific elements/rows from the LOTUS dataset.

## How to run the LOTUS extractor
This is one possible way to run the script:

```bash
# navigate to the folder where you want to put the project
cd ./path/to/folder/

# clone the repository and enter the folder
git clone git@github.com:commons-research/dataset-extractor-lotus.git
cd dataset-extractor-lotus

# install the environment with poetry
## if poetry is not already installed, install it first with pipx
pip install pipx
# make pipx-installed tools available on your PATH (you may need to open a new shell afterwards)
pipx ensurepath
pipx install poetry

# install from the poetry.lock file (run this in the folder that contains poetry.lock)
poetry install

# run the script with poetry and show the help page
poetry run python dataset_extractor_lotus/main.py -h

# run the script in interactive mode
poetry run python dataset_extractor_lotus/main.py
```

## Interactive Mode
Example of how to proceed:

```bash
Start interactive mode.
? Welcome to the "toydataset extractor" for the LOTUS datasets.
Chose your action:
sampling
❯ download
exit (Ctrl+C)
```

### Download dataset

```bash
Start interactive mode.
? Welcome to the "toydataset extractor" for the LOTUS datasets.
Chose your action: download
The datasets will be searched from the internet. One moment please...
? Please choose your dataset to download (sorted from newest to oldest): 220916_frozen_metadata.csv.gz (record: 7085063)
? Enter path to download: data/
The dataset will be downloaded to data//220916_frozen_metadata.csv.gz (record: 7085063).
Download complete: 220916_frozen_metadata.csv.gz
```
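
The download step above offers the available files newest first and saves the chosen one under the entered folder. The sketch below shows one way to do both; it is hypothetical and not the extractor's code. It assumes that every file name starts with a `YYMMDD` date, as `220916_frozen_metadata.csv.gz` does, and it builds the destination with `pathlib`, which would also avoid the doubled slash in `data//220916_frozen_metadata.csv.gz` printed in the transcript. The function names are made up, and apart from the 220916 file and its record number, the listing in the example is invented.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def newest_first(files: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Sort (file_name, record_id) pairs by their YYMMDD prefix, newest first.

    Assumption: every file name starts with a six-digit date such as "220916_".
    """
    return sorted(
        files,
        key=lambda item: datetime.strptime(item[0][:6], "%y%m%d"),
        reverse=True,
    )


def destination(folder: str, file_name: str) -> Path:
    """Join a user-entered folder and a file name; a trailing slash is harmless."""
    return Path(folder) / file_name


# Hypothetical listing: only the 220916 entry appears in the transcript above.
files = [
    ("210101_frozen_metadata.csv.gz", 1111111),
    ("220916_frozen_metadata.csv.gz", 7085063),
]
print(newest_first(files)[0])  # ('220916_frozen_metadata.csv.gz', 7085063)
print(destination("data/", "220916_frozen_metadata.csv.gz"))  # data/220916_frozen_metadata.csv.gz on POSIX
```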

### Sampling from dataset

```bash
Start interactive mode.
? Welcome to the "toydataset extractor" for the LOTUS datasets.
Chose your action: sampling
? Enter the filepath to sample from: data/220916_frozen_metadata.csv.gz
Before type: String
After type: Int32
? Please choose the taxonomy level to sample from: organism_taxonomy_05order
? Choose from which member to sample (522 options): Anthoathecata
? Please enter the amount of members to sample (max. 77): 32
? Please choose the output format: full
? Enter the output file name or existing filename to append: test_sampling.csv
File test_sampling.csv does not exist. Creating new file.
```
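
The sampling dialogue above amounts to: read the gzipped CSV, pick a taxonomy column, list its distinct members (522 for `organism_taxonomy_05order` in this run), keep the rows of the chosen member (the prompt's maximum of 77 is presumably that row count), draw the requested number at random, and create the output file or append to it. A minimal standard-library sketch under those assumptions follows. It is not the extractor's implementation (the "Before type / After type" lines suggest a dataframe library is used), the `full` output format is not modelled, and `list_members` and `sample_member` are hypothetical names.

```python
from __future__ import annotations

import csv
import gzip
import os
import random


def list_members(in_path: str, level: str) -> list[str]:
    """Return the sorted distinct non-empty values of a taxonomy column."""
    with gzip.open(in_path, mode="rt", newline="", encoding="utf-8") as fh:
        return sorted({row[level] for row in csv.DictReader(fh) if row.get(level)})


def sample_member(
    in_path: str,
    level: str,
    member: str,
    n: int,
    out_path: str,
    seed: int | None = None,
) -> list[dict[str, str]]:
    """Randomly sample n rows whose `level` column equals `member`.

    The rows are appended to out_path; the header is written only when the
    file does not exist yet. Hypothetical helper, not the project's API.
    """
    with gzip.open(in_path, mode="rt", newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        # Stream the file and keep only the rows of the chosen member.
        matching = [row for row in reader if row.get(level) == member]

    if n > len(matching):
        raise ValueError(f"only {len(matching)} rows belong to {member!r}")
    sample = random.Random(seed).sample(matching, n)

    is_new = not os.path.exists(out_path)
    with open(out_path, mode="a", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(sample)
    return sample


# Usage with the values from the transcript above:
# list_members("data/220916_frozen_metadata.csv.gz", "organism_taxonomy_05order")
# sample_member("data/220916_frozen_metadata.csv.gz", "organism_taxonomy_05order",
#               "Anthoathecata", 32, "test_sampling.csv")
```

Filtering while streaming keeps only the matching rows in memory, which matters for a large gzipped CSV. Appending without rewriting the header assumes that an existing output file has the same columns as the input.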

## Documentation
- **GitHub repository**: <https://github.com/commons-research/dataset-extractor-lotus.git>
- **Documentation with mkdocs**: <https://commons-research.github.io/dataset-extractor-lotus/>

## Feedback
Please follow these [guidelines](CONTRIBUTING.md) when reporting a bug.
For other issues, please use <https://github.com/commons-research/dataset-extractor-lotus/issues>.

## Release history
- 21.03.2024: v1.1 released (interactive mode)
- 07.03.2024: v1.0 released (basic random sampling)

@@ -0,0 +1,2 @@
*
!.gitignore