This repo is a template repository for the files needed to migrate Inventory data from a source system into FOLIO.
TL;DR: Create a new private repository based on this template. Clone it and then run create_folder_structure.sh.
- Supported migration tasks
- FOLIO Inventory data migration process
- Mapping files
- Example Records
- Perform a test migration
- Before you begin
- Transform bibs
- Post transformed Instances and SRS records
- Transform MFHD records to holdings and SRS holdings
- Post transformed MFHDs and Holdingsrecords to FOLIO
- Transform CSV/TSV files into Holdingsrecords
- Post transformed Holdingsrecords to FOLIO
- Transform CSV/TSV files into Items
- Post transformed Items to FOLIO
- Transform CSV/TSV files into FOLIO users
- Post transformed users to FOLIO
- Batch Poster (BatchPoster) - Post generated objects to FOLIO
- Bibs Transformer (BibsTransformer) - Transform MARC21 Bib records to FOLIO Instances and SRS records
- Holdings CSV Transformer (HoldingsCsvTransformer) - Creates FOLIO holdingsrecords from a TSV or CSV File
- Holdings MARC transformer (HoldingsMarcTransformer) - Transforms MARC21 MFHD records into FOLIO Holdings and SRS records
- Items Transformer (ItemsTransformer) - Creates FOLIO Items from a TSV or CSV File
- User Transformer (UserTransformer) - Creates FOLIO Users from a TSV or CSV File
This repository template plays a vital part in a process together with other repos allowing you to perform data migrations from a legacy ILS into FOLIO.
The program you need to run the process is FOLIO Migration Tools. It is a Python program that is not yet on PyPI, so you will need to clone it.
The toolkit requires you to run the transformation and data loading steps in sequence. Each step relies on the results of previous migration steps, such as the existence of a map file with legacy system IDs and their FOLIO equivalents. The picture below shows the proposed migration steps for legacy objects into FOLIO:
The repo contains the following mapping files in the Mapping files folder. A web tool that helps you create the mapping files for certain objects is available at https://data-mapping-file-creator.folio.ebsco.com/data_mapping_creation
File\Process | Bibs->Instances | Holdings (from MARC/MFHD) | Holdings (from item tsv/csv) | Items | Open Loans | Users |
---|---|---|---|---|---|---|
marc-instance-mapping-rules.json | yes | no | no | no | no | no |
mfhd_rules.json | no | yes | no | no | no | no |
item_mapping.json | no | no | no | yes | no | no |
holdings_mapping.json | no | no | yes | no | no | no |
locations.tsv | no | yes | yes | yes | no | no |
temp_locations.tsv | no | no | no | optional | no | no |
material_types.tsv | no | no | no | yes | no | no |
loan_types.tsv | no | no | no | yes | no | no |
temp_loan_types.tsv | no | no | no | optional | no | no |
call_number_type_mapping.tsv | no | no | optional | optional | no | no |
statcodes.tsv | no | no | optional | optional | no | no |
item_statuses.tsv | no | no | no | optional | no | no |
post_loan_migration_statuses.tsv | no | no | no | no | optional | no |
patron_types.tsv | no | no | no | no | no | yes |
user_mapping.json | no | no | no | no | no | yes |
department_mapping.tsv | no | no | no | no | no | yes |
These are the mapping rules from MARC21 bib records to FOLIO instances. The rules are stored in the tenant, but it is good practice to keep them under version control so you can maintain the customizations as the mapping rules evolve. For more information on syntax etc., read the documentation.
This file is built according to the mapping rules for bibs. The conditions are different, and not well documented at this point. Look at the example file and refer to the mapping rules documentation.
Just like item_mapping.json and the user mapping files, these files are easiest to create using the data-mapping-file-creator tool. You base the mapping on the same item export as you use for the items.
This is a mapping file for the items. The process assumes you have the item data in CSV/TSV format. The structure of the file depends on the column names in the TSV file. For example, if you have a file that looks like this:
... | Z30_BARCODE | Z30_CALL_NO | Z30_DESCRIPTION | ... |
---|---|---|---|---|
... | 123456790 | Some call number | some note | ... |
Your map should look like this:
...
{
"folio_field": "barcode",
"legacy_field": "Z30_BARCODE",
"value":"",
"description": ""
},
{
"folio_field": "itemLevelCallNumber",
"legacy_field": "Z30_CALL_NO",
"value":"",
"description": ""
},
{
"folio_field": "notes[0].itemNoteTypeId",
"legacy_field": "Z30_DESCRIPTION",
"value": "c7bc292c-a318-43d3-9b03-7a40dfba046a",
"description": ""
},
{
"folio_field": "notes[0].staffOnly",
"legacy_field": "Z30_DESCRIPTION",
"value": false,
"description": ""
},
{
"folio_field": "notes[0].note",
"legacy_field": "Z30_DESCRIPTION",
"value": "",
"description": ""
},
...
The resulting FOLIO Item would look like this:
{
...
"barcode": "123456790",
"itemLevelCallNumber": "Some call number",
"notes":[{
"staffOnly": false,
"note": "some note",
"itemNoteTypeId": "c7bc292c-a318-43d3-9b03-7a40dfba046a"
}],
...
}
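To make the relationship between the map and the resulting Item concrete, here is a minimal sketch of how such a map could be applied to one row of the TSV file. It is not the FOLIO Migration Tools implementation. The helper names `set_path` and `apply_mapping` are hypothetical, and the rule that a non-empty `value` takes precedence over the legacy column is an assumption made for this illustration.

```python
import re


def set_path(target, path, value):
    """Set a value in nested dicts/lists using a path such as 'notes[0].note'."""
    parts = path.split(".")
    for position, part in enumerate(parts):
        is_last = position == len(parts) - 1
        match = re.fullmatch(r"(\w+)\[(\d+)\]", part)
        if match:
            key, index = match.group(1), int(match.group(2))
            items = target.setdefault(key, [])
            while len(items) <= index:
                items.append({})
            if is_last:
                items[index] = value
            else:
                target = items[index]
        elif is_last:
            target[part] = value
        else:
            target = target.setdefault(part, {})


def apply_mapping(mapping, legacy_row):
    """Build a FOLIO-like item dict from one legacy row (assumed rule, see above)."""
    item = {}
    for entry in mapping:
        value = entry["value"]
        if value == "":  # assumption: an empty value means "read the legacy column"
            value = legacy_row.get(entry["legacy_field"], "")
        set_path(item, entry["folio_field"], value)
    return item


NOTE_TYPE_ID = "c7bc292c-a318-43d3-9b03-7a40dfba046a"
mapping = [
    {"folio_field": "barcode", "legacy_field": "Z30_BARCODE", "value": "", "description": ""},
    {"folio_field": "itemLevelCallNumber", "legacy_field": "Z30_CALL_NO", "value": "", "description": ""},
    {"folio_field": "notes[0].itemNoteTypeId", "legacy_field": "Z30_DESCRIPTION", "value": NOTE_TYPE_ID, "description": ""},
    {"folio_field": "notes[0].staffOnly", "legacy_field": "Z30_DESCRIPTION", "value": False, "description": ""},
    {"folio_field": "notes[0].note", "legacy_field": "Z30_DESCRIPTION", "value": "", "description": ""},
]
legacy_row = {
    "Z30_BARCODE": "123456790",
    "Z30_CALL_NO": "Some call number",
    "Z30_DESCRIPTION": "some note",
}
item = apply_mapping(mapping, legacy_row)
print(item)
```

In this sketch the three `notes[0]` entries end up in the same note object because they share the same index in the `folio_field` path.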
All mapping files (locations.tsv, material_types.tsv, loan_types.tsv etc.) have a mechanism that allows you to add * to the legacy fields in a row and put the fallback value from FOLIO in the folio_code/folio_name column. If the mapping fails, the script will assign this value to the records created. It is good practice to use a migration-specific value as the fallback so that you can locate these records in FOLIO.
These mappings allow for some complexity. These are the mappings of the legacy and FOLIO locations. The file must be structured like this:
folio_code | legacy_code | Z30_COLLECTION |
---|---|---|
AFA | AFAS | AFAS |
AFA | * | * |
The legacy_code column is needed for both Holdings migrations. For Item migration, the source fields can be used (Z30_COLLECTION in this case). You can add as many source fields as you like for the Items.
These mappings allow for some complexity. The first column name is fixed, since that is the target material type in FOLIO. Then you add the column names from the Item export TSV. For each column added, the values in them must match. At least one value per column must match. See loan_types.tsv for complex examples.
folio_name | Z30_MATERIAL |
---|---|
Audiocassette | ACASS |
Audiocassette | * |
These mappings allow for some complexity. The first column name is fixed, since that is the target loan type in FOLIO. Then you add the column names from the Item export TSV. For each column added, the values in them must match. At least one value per column must match.
folio_name | Z30_SUB_LIBRARY | Z30_ITEM_STATUS |
---|---|---|
Non-circulating | UMDUB | 02 |
Non-circulating | * | * |
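The multi-column matching with a `*` fallback row can be sketched as follows. This is an illustration under assumptions, not the toolkit's own code: the function name `find_folio_value` is hypothetical, a row is treated as a match only when every legacy column equals the value in the record, and the fallback label "Migrated (unmapped)" is invented sample data.

```python
import csv
import io

# Sample mapping file; the second and third rows are hypothetical.
LOAN_TYPES_TSV = (
    "folio_name\tZ30_SUB_LIBRARY\tZ30_ITEM_STATUS\n"
    "Non-circulating\tUMDUB\t02\n"
    "Migrated (unmapped)\t*\t*\n"
)


def load_mapping(tsv_text):
    """Parse a tab-separated mapping file into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def find_folio_value(mapping_rows, legacy_record, folio_column="folio_name"):
    """Return the FOLIO value of the first matching row, else the * fallback."""
    legacy_columns = [name for name in mapping_rows[0] if name != folio_column]
    fallback = None
    for row in mapping_rows:
        if all(row[name] == "*" for name in legacy_columns):
            fallback = row[folio_column]  # remember the wildcard row
        elif all(row[name] == legacy_record.get(name) for name in legacy_columns):
            return row[folio_column]
    return fallback


rows = load_mapping(LOAN_TYPES_TSV)
matched = find_folio_value(rows, {"Z30_SUB_LIBRARY": "UMDUB", "Z30_ITEM_STATUS": "02"})
unmatched = find_folio_value(rows, {"Z30_SUB_LIBRARY": "OTHER", "Z30_ITEM_STATUS": "99"})
print(matched, "|", unmatched)
```

Using a distinct, migration-specific fallback name as in the sample makes the records that fell through the mapping easy to find afterwards.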
These mappings allow for some complexity even though it is not needed.
folio_name | Z30_CALL_NO_TYPE |
---|---|
Dewey Decimal classification | 8 |
Unmapped | * |
To map a statistical code to its FOLIO UUID, you need this map, and the field must be mapped in item_mapping.json. These mappings allow for some complexity even though it is not needed. This mapping does not allow for default values. Any record without the field will not get one assigned.
folio_code | Z30_STAT_CODE |
---|---|
married_with_children | 8 |
happily_ever_after | 9 |
The handling of Item statuses is a bit of a project of its own, since not all statuses in legacy systems have equivalents in FOLIO. This mapping allows you to point one legacy status to a FOLIO status. If no status map is supplied, the status will be set to Available.
legacy_code | folio_name |
---|---|
checked_out | Checked out |
available | Available |
lost | Aged to lost |
This is not yet a mapping file per se, but it is used to substitute the values in the next_item_status column in the legacy open loans file. Leave the statuses you do not want the loans migration process to migrate empty and replace the legacy statuses you want to apply with the correct FOLIO ones.
In the example records folder, you will find example source records and example results from a transformation.
The following table outlines the result records and their use and role:
File | Content | Use for |
---|---|---|
folio_holdings.json | FOLIO Holdings records in json format. One per row in the file | To be loaded into FOLIO using the batch APIs |
folio_instances.json | FOLIO Instance records in json format. One per row in the file | To be loaded into FOLIO using the batch APIs |
folio_items.json | FOLIO Item records in json format. One per row in the file | To be loaded into FOLIO using the batch APIs |
holdings_id_map.json | A json map from legacy Holdings Id to the ID of the created FOLIO Holdings record | To be used in subsequent transformation steps |
holdings_transformation_report.md | A file containing various breakdowns of the transformation. Also contains errors to be fixed by the library | Create list of cleaning tasks, mapping refinement |
instance_id_map.json | A json map from legacy Bib Id to the ID of the created FOLIO Instance record. Relies on the "ILS Flavour" parameter in the main_bibs.py scripts | To be used in subsequent transformation steps |
instance_transformation_report.md | A file containing various breakdowns of the transformation. Also contains errors to be fixed by the library | Create list of cleaning tasks, mapping refinement |
item_id_map.json | A json map from legacy Item Id to the ID of the created FOLIO Item record | To be used in subsequent transformation steps |
item_transform_errors.tsv | A TSV file with errors and data issues together with the row number or id for the Item | To be used in fixing of data issues |
items_transformation_report.md | A file containing various breakdowns of the transformation. Also contains errors to be fixed by the library | Create list of cleaning tasks, mapping refinement |
marc_xml_dump.xml | A MARCXML dump of the bib records, with the proper 001 and 999 fields added | For pre-loading a Discovery system. |
srs.json | FOLIO SRS records in json format. One per row in the file | To be loaded into FOLIO using the batch APIs |
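Since the result files hold one JSON object per row rather than a single JSON array, they are read line by line. The following is a minimal sketch of that; the function name `read_records` and the sample content are hypothetical, and an in-memory file stands in for a real results file such as folio_instances.json.

```python
import io
import json


def read_records(lines):
    """Yield one parsed record per non-empty line of a results file."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as error:
            raise ValueError(f"Line {line_number} is not valid JSON: {error}") from error


# Hypothetical sample standing in for open("folio_instances.json").
sample_file = io.StringIO(
    '{"id": "a1", "title": "First title"}\n'
    '{"id": "b2", "title": "Second title"}\n'
)
records = list(read_records(sample_file))
print(len(records), [record["id"] for record in records])
```

Reading the file as a stream of lines keeps memory use flat even for large result sets, because only one record is parsed at a time.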
Download the HRID handling settings from the tenant. If there is HRID handling in the mapping rules:
- The HRID is set on the Instance
- The 001 in the MARC21 record (bound for SRS) is replaced with this HRID.
If the mapping rules specify no HRID handling, or the field designated for HRID contains no value:
- The HRID is constructed from the HRID settings
- The number in the HRID settings is padded so that its length is 11
- A new 035 field is created and populated with the value from 001
- The 001 in the MARC21 record (bound for SRS) is replaced with this HRID.
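The second case above (no HRID handling in the rules, or an empty HRID field) can be sketched like this. It is a simplified illustration, not the toolkit's code: the MARC record is modelled as a plain dict, and the function names, the `prefix` and `next_number` parameters are assumptions standing in for the values found in the tenant's HRID settings.

```python
def build_hrid(prefix, number):
    """Combine the prefix with the number, zero-padded to a length of 11."""
    return f"{prefix}{str(number).zfill(11)}"


def apply_generated_hrid(marc_fields, instance, prefix, next_number):
    """Set a generated HRID on the instance and in the 001; keep the old 001 in a 035.

    marc_fields is a simplified, hypothetical stand-in for a MARC record:
    {"001": "value", "035": ["value", ...]}.
    Returns the number to use for the next record.
    """
    hrid = build_hrid(prefix, next_number)
    legacy_001 = marc_fields.get("001")
    if legacy_001:
        # A new 035 is created and populated with the value from the 001.
        marc_fields.setdefault("035", []).append(legacy_001)
    marc_fields["001"] = hrid
    instance["hrid"] = hrid
    return next_number + 1


marc_fields = {"001": "legacy-4711"}
instance = {}
following_number = apply_generated_hrid(marc_fields, instance, prefix="in", next_number=42)
print(instance["hrid"], marc_fields)
```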
- Instance Metadata Elements
- Recommended MARC mapping to Inventory Instances
- Recommended MFHD to Inventory Holdings mapping
- Holdingsrecord JSON Schema
- FOLIO Instance storage JSON Schema
- FOLIO Instance (BL) JSON Schema
- Inventory elements - Beta
- MARC Mappings Information
The mapping files and example data in this repo will enable you to perform a migration against the latest FOLIO bugfest environment. Everything is configured except for the FOLIO user password, which is missing. This step-by-step guide takes you through the steps involved. If the guide ends before the migration is complete, we are still working on those example records.
- Move everything under the example_data folder into the data folder.
- Set up pipenv using either the Pipfile or the requirements.txt
This configuration piece in the configuration file determines the behaviour
{
"name": "transform_bibs",
"migrationTaskType": "BibsTransformer",
"useTenantMappingRules": true,
"ilsFlavour": "tag001",
"tags_to_delete": [
"841",
"852"
],
"files": [
{
"file_name": "bibs.mrc",
"suppressed": false
}
]
}
Parameter | Possible values | Explanation |
---|---|---|
name | Any string | The name of this task. Created files will have this as part of their names. |
migrationTaskType | Any of the available migration tasks | The type of migration task you want to run |
useTenantMappingRules | true | Placeholder for option to use an external rules file |
ilsFlavour | any of "aleph", "voyager", "sierra", "millennium", "koha", "tag907y", "tag001", "tagf990a" | Used to point scripts to the correct legacy identifier and other ILS-specific things |
tags_to_delete | A list of MARC tags as strings | Fields with these tags will be deleted (after transformation) and not stored in SRS |
files | Objects with file_name and a suppressed boolean | Name of the MARC21 file in the data/instances folder. The suppressed flag tells the script to mark records as suppressedFromDiscovery |
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json transform_bibs --base_folder PATH_TO_migration_repo_template/
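Before running the task it can be useful to sanity-check the configuration piece against the table above. The sketch below does that for the bibs task. It is a hypothetical helper, not part of FOLIO Migration Tools, and the choice of which keys to treat as required is an assumption.

```python
# Allowed values taken from the parameter table above.
ILS_FLAVOURS = {
    "aleph", "voyager", "sierra", "millennium",
    "koha", "tag907y", "tag001", "tagf990a",
}
# Assumption: these keys are treated as mandatory for this check.
REQUIRED_KEYS = ("name", "migrationTaskType", "ilsFlavour", "files")


def check_bibs_task(task):
    """Return a list of human-readable problems found in a bibs task dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in task]
    flavour = task.get("ilsFlavour")
    if flavour is not None and flavour not in ILS_FLAVOURS:
        problems.append(f"unknown ilsFlavour: {flavour}")
    for file_definition in task.get("files", []):
        if "file_name" not in file_definition:
            problems.append("file entry without file_name")
    return problems


task = {
    "name": "transform_bibs",
    "migrationTaskType": "BibsTransformer",
    "useTenantMappingRules": True,
    "ilsFlavour": "tag001",
    "tags_to_delete": ["841", "852"],
    "files": [{"file_name": "bibs.mrc", "suppressed": False}],
}
print(check_bibs_task(task))
```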
These configuration pieces in the configuration file determine the behaviour
{
"name": "post_bibs",
"migrationTaskType": "BatchPoster",
"objectType": "Instances",
"batchSize": 250,
"file": {
"file_name": "folio_instances_test_run_transform_bibs.json"
}
},
{
"name": "post_srs_bibs",
"migrationTaskType": "BatchPoster",
"objectType": "SRS",
"batchSize": 250,
"file": {
"file_name": "folio_srs_instances_test_run_transform_bibs.json"
}
}
Parameter | Possible values | Explanation |
---|---|---|
name | Any string | The name of this task. Created files will have this as part of their names. |
migrationTaskType | Any of the available migration tasks | The type of migration task you want to run |
objectType | Any of "Extradata", "Items", "Holdings", "Instances", "SRS", "Users" | Type of object to post |
batchSize | integer | The number of records per batch to post. If the API does not allow batch posting, this number will be ignored |
file.file_name | Any string | Name of file to post, located in the results folder |
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json post_bibs --base_folder PATH_TO_migration_repo_template/
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json post_srs_bibs --base_folder PATH_TO_migration_repo_template/
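The effect of `batchSize` can be pictured as splitting the records of the results file into fixed-size chunks, each of which would go into one request. The sketch below shows only the chunking; it performs no HTTP calls, and the function name `batches` is hypothetical rather than taken from the toolkit.

```python
from itertools import islice


def batches(records, batch_size):
    """Yield lists of at most batch_size records, preserving order."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 600 hypothetical records posted with the batchSize of 250 used above.
records = [{"id": str(number)} for number in range(600)]
batch_lengths = [len(batch) for batch in batches(records, 250)]
print(batch_lengths)
```

The last batch is simply shorter than the others when the number of records is not a multiple of the batch size.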
This configuration piece in the configuration file determines the behaviour
{
"name": "transform_mfhd",
"migrationTaskType": "HoldingsMarcTransformer",
"legacyIdMarcPath": "001",
"mfhdMappingFileName": "mfhd_rules.json",
"locationMapFileName": "locations.tsv",
"defaultCallNumberTypeName": "Library of Congress classification",
"fallbackHoldingsTypeId": "03c9c400-b9e3-4a07-ac0e-05ab470233ed",
"useTenantMappingRules": false,
"hridHandling": "default",
"createSourceRecords": true,
"files": [
{
"file_name": "holding.mrc",
"suppressed": false
}
]
}
Parameter | Possible values | Explanation |
---|---|---|
name | Any string | The name of this task. Created files will have this as part of their names. |
migrationTaskType | Any of the available migration tasks | The type of migration task you want to run |
legacyIdMarcPath | A MARC field followed by an optional subfield delimited by a $ | Used to locate the legacy identifier for this record. Examples: "001", "951$c" |
mfhdMappingFileName | Any string | location of the MFHD rules in the mapping_files folder |
locationMapFileName | Any string | Location of the Location mapping file in the mapping_files folder |
defaultCallNumberTypeName | Any call number name from FOLIO | Used for fallback mapping for callnumbers |
fallbackHoldingsTypeId | A uuid | Fallback holdings type if mapping does not work |
useTenantMappingRules | false | boolean (true/false) NOT YET IMPLEMENTED. |
hridHandling | "default" or "preserve001" | If default, HRIDs will be generated according to the FOLIO settings. If preserve001, the 001s will be used as hrids if possible or fallback to default settings |
createSourceRecords | boolean (true/false) | |
files | Objects with file_name and a suppressed boolean | Name of the MARC21 file in the data/instances folder. The suppressed flag tells the script to mark records as suppressedFromDiscovery |
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json transform_mfhd --base_folder PATH_TO_migration_repo_template/
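The `legacyIdMarcPath` format (a field tag, optionally followed by `$` and a subfield code) can be sketched as below. This is an illustration only: the function names are hypothetical, and the record is a simplified dict rather than a real MARC object, with control fields stored as strings and data fields as dicts of subfields.

```python
def parse_marc_path(path):
    """Split a path such as '001' or '951$c' into (tag, subfield or None)."""
    tag, _, subfield = path.partition("$")
    if len(tag) != 3:
        raise ValueError(f"Expected a three-character MARC tag, got {tag!r}")
    return tag, subfield or None


def get_legacy_id(record, path):
    """Look up the legacy identifier in a simplified, hypothetical record dict."""
    tag, subfield = parse_marc_path(path)
    field = record.get(tag)
    if field is None:
        return None
    if subfield is None:
        return field  # control field: the value is the field itself
    return field.get(subfield)


record = {"001": "000012345", "951": {"c": "h-778"}}
print(get_legacy_id(record, "001"), get_legacy_id(record, "951$c"))
```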
These configuration pieces in the configuration file determine the behaviour
{
"name": "post_holdingsrecords_from_mfhd",
"migrationTaskType": "BatchPoster",
"objectType": "Holdings",
"batchSize": 250,
"file": {
"file_name": "folio_holdings_test_run_transform_mfhd.json"
}
},
{
"name": "post_srs_mfhds",
"migrationTaskType": "BatchPoster",
"objectType": "SRS",
"batchSize": 250,
"file": {
"file_name": "folio_srs_holdings_test_run_transform_mfhd.json"
}
}
Parameter | Possible values | Explanation |
---|---|---|
name | Any string | The name of this task. Created files will have this as part of their names. |
migrationTaskType | Any of the available migration tasks | The type of migration task you want to run |
objectType | Any of "Extradata", "Items", "Holdings", "Instances", "SRS", "Users" | Type of object to post |
batchSize | integer | The number of records per batch to post. If the API does not allow batch posting, this number will be ignored |
file.file_name | Any string | Name of file to post, located in the results folder |
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json post_holdingsrecords_from_mfhd --base_folder PATH_TO_migration_repo_template/
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json post_srs_mfhds --base_folder PATH_TO_migration_repo_template/
This configuration piece in the configuration file determines the behaviour
{
"name": "transform_csv_holdings",
"migrationTaskType": "HoldingsCsvTransformer",
"holdingsMapFileName": "holdingsrecord_mapping.json",
"locationMapFileName": "locations.tsv",
"defaultCallNumberTypeName": "Library of Congress classification",
"callNumberTypeMapFileName": "call_number_type_mapping.tsv",
"previouslyGeneratedHoldingsFiles": [
"folio_holdings_test_run_transform_mfhd"
],
"holdingsMergeCriteria": [
"instanceId",
"permanentLocationId",
"callNumber"
],
"fallbackHoldingsTypeId": "03c9c400-b9e3-4a07-ac0e-05ab470233ed",
"files": [
{
"file_name": "csv_items.tsv"
}
]
}
Parameter | Possible values | Explanation |
---|---|---|
name | Any string | The name of this task. Created files will have this as part of their names. |
migrationTaskType | Any of the available migration tasks | The type of migration task you want to run |
holdingsMapFileName | Any string | location of the mapping file in the mapping_files folder |
locationMapFileName | Any string | Location of the Location mapping file in the mapping_files folder |
defaultCallNumberTypeName | any string | Name of callnumber in FOLIO used as a fallback |
callNumberTypeMapFileName | Any string | location of the mapping file in the mapping_files folder |
previouslyGeneratedHoldingsFiles | A list of strings | Names of holdings files generated by earlier tasks, such as folio_holdings_test_run_transform_mfhd in the example above |
holdingsMergeCriteria | A list of strings with the names of holdingsrecord properties (on the same level) | Used to group individual rows into Holdings records. Proposed setting is ["instanceId", "permanentLocationId", "callNumber"] |
fallbackHoldingsTypeId | uuid string | The fallback/default holdingstype UUID |
createSourceRecords | boolean (true/false) | |
files | Objects with a file_name | Name of the legacy CSV/TSV file to transform |
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json transform_csv_holdings --base_folder PATH_TO_migration_repo_template/
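The role of `holdingsMergeCriteria` can be illustrated with a small grouping sketch: rows that share the same values for all listed properties end up in the same Holdings record. The function name `group_by_criteria` and the sample rows are hypothetical, and the actual merge logic in the toolkit may differ.

```python
from collections import defaultdict

MERGE_CRITERIA = ["instanceId", "permanentLocationId", "callNumber"]


def group_by_criteria(candidates, merge_criteria):
    """Group holdings candidates by the tuple of their merge-criteria values."""
    groups = defaultdict(list)
    for candidate in candidates:
        key = tuple(candidate.get(name) for name in merge_criteria)
        groups[key].append(candidate)
    return dict(groups)


# Hypothetical holdings candidates, one per row in the item export.
candidates = [
    {"instanceId": "i1", "permanentLocationId": "loc-a", "callNumber": "QA 76", "row": 1},
    {"instanceId": "i1", "permanentLocationId": "loc-a", "callNumber": "QA 76", "row": 2},
    {"instanceId": "i1", "permanentLocationId": "loc-b", "callNumber": "QA 76", "row": 3},
]
groups = group_by_criteria(candidates, MERGE_CRITERIA)
for key, members in groups.items():
    print(key, [member["row"] for member in members])
```

With the proposed criteria, rows 1 and 2 would share one Holdings record, while row 3 gets its own because its location differs.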
See documentation for posting above.
This configuration piece in the configuration file determines the behaviour
{
"name": "transform_csv_items",
"migrationTaskType": "ItemsTransformer",
"itemsMappingFileName": "item_mapping_for_csv_items.json",
"locationMapFileName": "locations.tsv",
"callNumberTypeMapFileName": "call_number_type_mapping.tsv",
"materialTypesMapFileName": "material_types_csv.tsv",
"loanTypesMapFileName": "loan_types_csv.tsv",
"itemStatusesMapFileName": "item_statuses.tsv",
"files": [
{
"file_name": "csv_items.tsv"
}
]
}
Parameter | Possible values | Explanation |
---|---|---|
name | Any string | The name of this task. Created files will have this as part of their names. |
migrationTaskType | Any of the available migration tasks | The type of migration task you want to run |
itemsMappingFileName | Any string | location of the mapping file in the mapping_files folder |
locationMapFileName | Any string | Location of the Location mapping file in the mapping_files folder |
callNumberTypeMapFileName | Any string | location of the mapping file in the mapping_files folder |
materialTypesMapFileName | Any string | location of the mapping file in the mapping_files folder |
loanTypesMapFileName | Any string | location of the mapping file in the mapping_files folder |
itemStatusesMapFileName | Any string | location of the mapping file in the mapping_files folder |
files | Objects with a file_name | Name of the legacy CSV/TSV item file to transform |
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json transform_csv_items --base_folder PATH_TO_migration_repo_template/
See documentation for posting above.
This configuration piece in the configuration file determines the behaviour
{
"name": "user_transform",
"migrationTaskType": "UserTransformer",
"groupMapPath": "user_groups.tsv",
"userMappingFileName": "user_mapping.json",
"useGroupMap": true,
"userFile": {
"file_name": "staff.tsv"
}
}
Parameter | Possible values | Explanation |
---|---|---|
name | Any string | The name of this task. Created files will have this as part of their names. |
migrationTaskType | Any of the available migration tasks | The type of migration task you want to run |
userMappingFileName | Any string | location of the mapping file in the mapping_files folder |
groupMapPath | Any string | Location of the user group mapping file in the mapping_files folder |
useGroupMap | boolean | Use the above group map file or use code-to-code direct mapping |
userFile.file_name | Any string | name of csv/tsv file of legacy users in the data/users folder |
pipenv run python main.py PATH_TO_migration_repo_template/mapping_files/exampleConfiguration.json user_transform --base_folder PATH_TO_migration_repo_template/
See documentation for posting above.
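Because the steps have to run in sequence, it can help to generate the commands from a single ordered list. The sketch below only builds and prints the command lines and does not execute anything. The task names are the ones used in the configuration examples in this guide, the order is an assumption based on the order of the sections, and posting tasks for CSV holdings, items and users are left out because their names are not given here.

```python
import shlex

# Task names from the configuration examples, in the order of this guide.
TASKS_IN_ORDER = [
    "transform_bibs",
    "post_bibs",
    "post_srs_bibs",
    "transform_mfhd",
    "post_holdingsrecords_from_mfhd",
    "post_srs_mfhds",
    "transform_csv_holdings",
    "transform_csv_items",
    "user_transform",
]


def build_command(repo_path, task_name):
    """Return the argument list for running one task, mirroring the commands above."""
    config_path = f"{repo_path}/mapping_files/exampleConfiguration.json"
    return [
        "pipenv", "run", "python", "main.py",
        config_path, task_name,
        "--base_folder", f"{repo_path}/",
    ]


for task_name in TASKS_IN_ORDER:
    print(shlex.join(build_command("PATH_TO_migration_repo_template", task_name)))
```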