Skip to content

Commit

Permalink
Merge pull request #365 from BioKIC/taxonomy
Browse files Browse the repository at this point in the history
Additions to instructions for bulk importing taxonomy
  • Loading branch information
ljwalker authored Jun 22, 2023
2 parents c3d56c2 + 7e76c7e commit c4ac193
Show file tree
Hide file tree
Showing 4 changed files with 29,048 additions and 9 deletions.
68 changes: 59 additions & 9 deletions content/portal_manager/taxonomy/batch_load/_index.en.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,73 @@
---
title: "Batch Loading Taxonomic Data"
date: 2022-07-06
lastmod: 2022-07-27
lastmod: 2023-06-21
authors: ["Cat Chapman"]
editors: ["Katie Pearson"]
editors: ["Lindsay Walker, Katie Pearson"]
draft: false
weight: 10
keywords: ["taxonomy","taxonomic thesaurus"]
---

{{< notice info >}}
This page describes how taxonomy can be added to a Symbiota portal in bulk.
{{</ notice >}}

{{< notice note >}}
A user must have Super Administrator or Taxonomic Administrator permissions to add to or edit the thesaurus.
{{</ notice >}}

Taxanomic names can be batch loaded into the taxonomic thesaurus from a flat, structured comma- or tab-delimited text file. To upload an Excel file, first save as a CSV (comma delimited) text file. The following fields can be included in your CSV file:
* **Scientific name** (required)
* Taxon author
* Parent taxon name
* Family
* Accepted taxon name (if different then scientific name)
## Input File Format
Taxonomic names can be batch loaded into the thesaurus from a flat, structured comma- or tab-delimited text file. To upload an Excel file, first save as a CSV (comma delimited) text file. Formatting this file correctly is important to ensuring your taxonomy is successfully imported. **At minimum, the file must include a column for _ScientificName_ and the parent taxon** (i.e. the taxon that nests immediately above the scientificName in your taxonomic hierarchy).

For example, to import the name _Buccinum polaris_ Gray, 1839 your input file could include the following fields:
| Taxon Rank| Parent | ScientificName | Family | Author | RankID |
| - | - | - | - | - | - |
| species | Buccinum | Buccinum polaris | Buccinidae | Gray, 1839 | 220 |

However, if the higher taxonomy does not already exist in your portal, the following rows should also be included in your file:
| Taxon Rank| Parent | ScientificName | Family | Author | RankID |
| - | - | - | - | - | - |
| division | Animalia | Mollusca | | | 30 |
| class | Mollusca | Gastropoda | | | 60 |
| order | Gastropoda | Neogastropoda | | | 100 |
| superfamily | Neogastropoda | Buccinoidea | | | 130 |
| family | Buccinoidea | Buccinidae | Buccinidae | | 140 |
| genus | Buccinidae | Buccinum | Buccinidae | | 180 |
| species | Buccinum | Buccinum polaris | Buccinidae | Gray, 1839 | 220 |


{{< button href="../../../documents/Fresh_Symbiota_Install_Animalia_Sample_Taxonomy_2023-06-21.csv" text="Download example CSV" >}}
{{< button href="https://github.com/BioKIC/symbiota-docs/blob/master/static/documents/Fresh_Symbiota_Install_Animalia_Sample_Taxonomy_2023-06-21.csv" text="View example as a CSV" >}}

### Tips for preparing your input file
- **Parent taxa must already exist in the thesaurus or be included in the input file.** As an alternative to a parent taxon column, as shown above, the input file can contain columns for the core hierarchy that is defined within the taxonunits table (e.g. kingdom, division, class, subclass, order, family, etc).
- Values for *_rankid_* are assigned on the backend of your database in the table _taxonunits_. This value may vary between Symbiota portals.
- If a taxon's acceptance status is not defined (0 = not accepted, 1 = accepted ), all taxa will be assumed to be accepted. Include a separate column in your input file to denote acceptance status, if desired.
- In the absence of a hierarchical definition (e.g. the parent taxa), infraspecific, specific, and genus linkage will be determined from scientificName and linked to family. If the Family value does not yet exist in the thesaurus _and_ hierarchy above the specific family is not defined in upload field, the Family value will be linked directly to Kingdom.
- You can only batch input taxonomy for one Kingdom at a time (Animalia, Fungi, etc.)
- Make sure not to map more than one source column to the same target, and that source column names are unique from one another.
- The batch loader will import new taxonomy in the order it appears in the file, starting with the first row. It is therefore best to organize the input file such that highest ranks appear first, followed by lower ranking taxa.
- Contact the Hub or someone with backend access to add new ranks. Ranks are added on a per-kingdom basis and may require community discussion in order to be inserted into the hierarchy.

## Using the Batch Loader
Once your file is prepared, if your user account has the appropriate permissions, you can batch upload taxonomy from a spreadsheet (CSV) or text file by navigating to _Sitemap > Taxonomy > Batch Upload a Taxonomic Data File > Taxa Upload_.
![Taxonomy Batch Upload Tool](/symbiota-docs/images/taxonomybatchloader.png)

1) Select your formatted input file.
2) Select the Target Thesaurus. (Your portal may only have one option.)
3) Select the Kingdom.

4) Once you select _Map Input File_, you will be prompted to map the fields in your input file to target fields in the database. Note that the parent taxon = _parentstr_ and scientificName = _scinameinput_.
![Taxonomy Batch Upload Tool](/symbiota-docs/images/taxonomybatchloader2.png)

5) Once the fields are mapped, select _Verify Mapping_.

6) Once the file is processed, you will be given the option to review the new names as they were imported into the database's internal staging tables. The is your last chance to preview changes before they are made permanent in the thesaurus; do so by selecting the link, _Download CSV Taxa File_. Look for an _errorstatus_ column in the CSV explaining any records that failed to import.

7) If you are satisfied with the import, select _Activate Taxa_ to transfer the new taxonomy to the portal's thesaurus. If not, close the window and start over.

Parent taxa need to be already in the system or be included in the input file. As an alternative to a parent taxon column, the input file can contain columns for the core hierarchy that is defined within the taxonunits table (e.g. kingdom, division, class, subclass, order, family, etc). After selecting the upload file, it will be necessary to map the input columns to the Symbiota structure. Scientific name, with or without author imbedded, must be mapped to "scinameinput". If acceptance (0 = not accepted, 1 = accepted ) is not defined, all taxa will be assumed to be accepted. In the absence of a hierarchical definition (e.g. parent taxa), infraspecific, specific, and genus linkage will be determined from scientific name and linked to family. If family does not yet exist in thesaurus and hierarchy above family is not defined in upload field, family will be linked directly to kingdom. Make sure not to map more than one source column to the same target. Finally, make sure that source column names are unique from one another.
### Tips for batch importing taxonomy
- If you are new to this process, complete a few test imports to be sure your new taxon records are ingesting correctly. You can review new records in the Taxonomy Explorer (_Sitemap > Additional Resources > Taxonomy Explorer_).
- For especially complex taxonomic trees, import higher taxonomy first, then import new genera and species-level names.
- When batch importing synonyms, import all accepted names (and, if needed, their higher taxonomy) first. Then import the synonyms and their associated higher taxonomy. A column containing the accepted name for each synonym can be included in your input file.
Loading

0 comments on commit c4ac193

Please sign in to comment.