digipres/digipres-practice-index

An experiment in gathering together sources of information about digital preservation practices

The initial plan is to experiment with using Python to gather useful information sources, starting with iPRES, and then see whether the results can usefully be transformed into something searchable using Datasette or Datasette Lite.

This originally relied on a tool called DVC. Why DVC? Because I wanted to manage how the data is compiled, and I liked the way it handles checking data dependencies. Very #DigiPres... It also offers features such as remote storage integration for data sets on Google Drive.

However, having tried both DVC and Snakemake, I found them very difficult to work with: lots of complex dependencies that don't always install easily, and over-engineered for this use case. So, instead, build pipelines are managed the old-fashioned way, using Make. There are lots of tutorials for Make (e.g.), and the Turing Way book has a really good section called Reproducibility with Make.
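The dependency checking that made DVC appealing is exactly what Make provides: a target is rebuilt only when one of its prerequisites is newer than it. The following is a minimal, self-contained sketch of that behaviour in a throwaway directory. It is not this repository's Makefile; the file names `sources.txt` and `summary.txt` and the `wc -l` recipe are hypothetical stand-ins for a real gathering step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory, removed on exit, so nothing in the repo is touched.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical input data standing in for gathered source metadata.
printf 'iPRES\nPHAIDRA\nOSF\nIDEALS\n' > sources.txt

# One-rule Makefile: summary.txt is derived from sources.txt.
# Recipe lines must begin with a tab, hence the \t.
printf 'summary.txt: sources.txt\n\twc -l < sources.txt > summary.txt\n' > Makefile

make                                   # first run builds summary.txt
make -q && echo "summary.txt is up to date; nothing to rebuild"

sleep 1                                # guarantee a newer modification time
touch sources.txt
make -q || echo "sources.txt changed; summary.txt is stale"
make                                   # rebuilds only the stale target
```

`make -q` runs nothing and only reports, via its exit status, whether the target is up to date (0) or needs rebuilding (1), which makes it a handy way to check a data pipeline's state.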

Development Setup

You need Python 3 and Make.

Clone this repo. Set up a Python 3 virtual env, e.g.

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install .

Optionally, install NLP data required for some analysis/processing (not in production use):

python -m spacy download en_core_web_lg

Local Usage

Build the data:

make

Try the Datasette view:

datasette serve practice.db --setting truncate_cells_html 120

You should then be able to browse to, e.g., http://127.0.0.1:8001/practice/publications?_facet=type&_searchmode=raw&_facet=year&_facet_array=creators&_facet_array=institutions&_facet_size=10&_sort=year
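That URL packs several Datasette table options into its query string. Going by the Datasette documentation, `_facet=COLUMN` facets on a column's values, `_facet_array=COLUMN` facets on a column holding a JSON array (so `creators` and `institutions` appear to be stored as arrays, each entry counted separately), `_facet_size` caps how many values each facet shows, `_sort` orders rows by a column, and `_searchmode=raw` passes full-text search queries through without escaping. This small sketch simply splits the query string into one parameter per line so each option can be read on its own:

```shell
#!/usr/bin/env bash
set -euo pipefail

# The example URL from above, copied verbatim.
url='http://127.0.0.1:8001/practice/publications?_facet=type&_searchmode=raw&_facet=year&_facet_array=creators&_facet_array=institutions&_facet_size=10&_sort=year'

# Drop everything up to and including the '?', then print one parameter per line.
query="${url#*\?}"
printf '%s\n' "$query" | tr '&' '\n'
```

Editing or removing individual lines and re-joining them with `&` is an easy way to experiment with other facet combinations.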

Other build targets generate other derivatives. Check the Makefile for details.

Sources of Practice

iPRES

Where are the papers and metadata? The links on https://iPRES-conference.org/ are not complete.

It may make more sense to use JSON to store this data, and to use JSON Schema in VSCode to make it easier to edit. That data could then be consumed by the gathering scripts as well as being used to generate tabular forms like this.

The information about each iPRES conference is now stored as a set of Markdown+metadata files in the publications repository, and is summarised at http://www.digipres.org/publications/ipres/
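A gathering script has to separate the metadata from the Markdown body of each file. The sketch below assumes the common YAML front-matter convention (a block delimited by `---` lines at the top of the file); that layout, the field names, and the helper name `extract_front_matter` are all hypothetical and may not match the actual files in the publications repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical example entry; real files may use different fields and layout.
entry="$(mktemp)"
trap 'rm -f "$entry"' EXIT
cat > "$entry" <<'EOF'
---
title: Example iPRES conference entry
year: 2019
---
Free-text notes about where the papers and metadata can be found.
EOF

# Hypothetical helper: print only the metadata block, i.e. the lines
# between an opening '---' on line 1 and the next '---'.
extract_front_matter() {
  awk 'NR == 1 && $0 == "---" { inside = 1; next }
       inside && $0 == "---"  { exit }
       inside                 { print }' "$1"
}

extract_front_matter "$entry"
```

In practice the extracted block would be handed to a YAML parser rather than processed line by line; the point here is only how the two parts of each file can be told apart.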

PHAIDRA

OSF

  • More recent conferences appear in OSF, which has a much more complicated structure, but allows more types of materials to be stored.

IDEALS
