Releases: hackoregon/data-science-pet-containers
More bug fixes
Bug fixes
Refactored Jupyter image
- The Jupyter image was over 4 gigabytes. This is unworkable, so I've dropped back to just the Jupyter notebook server on the image. The data science stack can be installed by executing a script the first time you start the service.
- All of the examples have been refactored to be runnable from a Linux host command line, including WSL Ubuntu on Windows 10 Pro.
- @nam20485 enhanced the documentation for the Windows 10 Pro WSL operations. Many thanks!!
- Miscellaneous bug fixes.
Tooling for database backup wrangling
- I've "upgraded" the users in the PostGIS container defined by `DB_USERS_TO_CREATE`. They are now Linux users and they are now database superusers with "trust" database authentication. This is the traditional modus operandi for a Debian / Ubuntu desktop.
- I've added an example, `reowning_a_database`, that demonstrates how to automate some common database operations with Docker's host-container interface tools, `docker cp` and `docker exec`.
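As a rough illustration of that `docker cp` / `docker exec` pattern, here is a minimal Python sketch that builds the command lines for copying a SQL dump into a container, loading it, and changing the database owner. It is not the `reowning_a_database` example itself: the container name, file name, database name, role name, and the `/tmp/dump.sql` scratch path are all hypothetical, and the sketch assumes the target database and role already exist.

```python
import shlex
import subprocess


def docker_cp_cmd(host_path, container, container_path):
    """Build a `docker cp` command that copies a host file into a container."""
    return ["docker", "cp", host_path, f"{container}:{container_path}"]


def docker_exec_cmd(container, user, *command):
    """Build a `docker exec` command that runs `command` in `container` as `user`."""
    return ["docker", "exec", "--user", user, container, *command]


def reown_plan(container, host_sql, database, new_owner, user="dbsuper"):
    """Return the commands to load a SQL dump and then change the database owner.

    Assumes `database` and the role `new_owner` already exist in the container.
    """
    dest = "/tmp/dump.sql"  # hypothetical scratch path inside the container
    alter = f'ALTER DATABASE "{database}" OWNER TO "{new_owner}";'
    return [
        docker_cp_cmd(host_sql, container, dest),
        docker_exec_cmd(container, user, "psql", "-d", database, "-f", dest),
        docker_exec_cmd(container, user, "psql", "-d", "postgres", "-c", alter),
    ]


def run(plan, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical container, file, database, and role names.
    run(reown_plan("my_postgis", "crash_data.sql", "odot_crash_data", "transportation"))
```

By default `run` only prints the commands, so the plan can be reviewed before anything touches a container. Pass `dry_run=False` to execute it against a running container.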
Bug fixes, documentation cleanup
- Emacs Speaks Statistics (ESS) is now installed from source on both the `rstats` and `postgis` images (issue #54).
- The user creation step of the automatic restores is cleaned up and the documentation in the README enhanced (issue #53).
- Linux Mint scripts are now available for both PostgreSQL and Docker hosting, and there are scripts for both Linux Mint 18 and 17 (issues #51 and #52).
- There is now a `.Rprofile` for the `dbsuper` Linux user. This fixes a bug in the ODOT crash data migration example (issue #50).
- Issue #49 is fixed.
Many new features!
New features:
- PostGIS:
- For compatibility with our AWS deployment, I've dropped back from PostgreSQL 10.3 to 9.6. The other components, PostGIS and pgRouting, are still the latest stable versions.
- The PostGIS command line now includes R and Emacs Speaks Statistics (ESS). And, of course, Emacs.
- In addition to `pg_dump` "custom" format files (`.backup`), automatic restores now work for `.sql` and `.sql.gz` files (plain text and compressed plain text). In fact, this is now the preferred format for database dumps given that we've been unable to restore a `.backup` to the AWS server.
- The Hack Oregon database users defined in our AWS server are now available in the `postgis` server.
- Raw data can be loaded onto the `postgis` image at build time.
- Jupyter (formerly Miniconda and before that Jupyter)
- This is the name that makes sense. Jupyter is the tool; Miniconda is how it gets there.
- Cookiecutter is now installed by default.
- Analysis packages now include:
- geopandas,
- jupyter,
- matplotlib,
- pandas,
- psycopg2,
- requests,
- seaborn,
- statsmodels,
- cookiecutter, and
- osmnx.
- Rstats (formerly RStudio)
- Name change: we're planning to push these images to a Docker repository to cut down on how long it takes to bring up the containers. If this is a public repository, there's a potential issue with the RStudio trademark. To avoid that I've renamed the image.
- I've added Emacs Speaks Statistics (ESS). And, of course, Emacs.
- Amazon: I've put this image in the collection primarily for backup file restore testing. It looks mostly like the `postgis` image but not much of the infrastructure is duplicated. There's just enough to bring up a database with the Hack Oregon database users and verify that databases can be restored.
- Windows 10 Pro Windows Subsystem for Linux Docker interface: It works! It really does! There were a couple of bugs but I've fixed them and for all practical purposes it looks just like running Docker Community Edition from an Ubuntu 16.04 LTS "Xenial Xerus" terminal.
- Examples: I've added the raw-data-to-database-backup scripts for the Transportation Systems passenger census and ODOT crash data as examples. They run inside a `postgis` container.
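The three dump formats mentioned under PostGIS above each need a different restore command. The following Python sketch shows one way to pick the command from the file extension. It is an illustration only, not the restore logic shipped in the images: the function name `restore_command` is hypothetical, and the sketch assumes the target database already exists and that `psql`, `pg_restore`, and `gzip` are on the path.

```python
import shlex


def restore_command(dump_path, database):
    """Return a shell command line that restores `dump_path` into `database`.

    Assumes the target database already exists.
    """
    path = shlex.quote(dump_path)
    db = shlex.quote(database)
    if dump_path.endswith(".sql.gz"):
        # Compressed plain text: decompress to stdout and pipe into psql.
        return f"gzip -dc {path} | psql -d {db}"
    if dump_path.endswith(".sql"):
        # Plain text: psql reads the file directly.
        return f"psql -d {db} -f {path}"
    if dump_path.endswith(".backup"):
        # pg_dump "custom" format: only pg_restore can read it.
        return f"pg_restore -d {db} {path}"
    raise ValueError(f"unrecognized dump format: {dump_path}")


if __name__ == "__main__":
    # Hypothetical file and database names.
    for name in ("passenger_census.sql.gz", "odot_crash_data.sql", "legacy.backup"):
        print(restore_command(name, "scratch"))
```

The plain-text formats go through `psql`, which may be part of why they travel more easily between servers than the "custom" format, which requires a compatible `pg_restore`.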
More Miniconda packages, more TriMet data
- Closed issues: 26 - 29
- The Miniconda image now includes `pandas`, `geopandas`, `requests`, `psycopg2`, `statsmodels` and `seaborn`.
- The PostGIS image has a new user, `dbsuper`, which is both a Linux user and a PostgreSQL database superuser. Use this instead of logging in as `postgres`.
- The PostGIS image now includes `virtualenvwrapper`.
- The OpenStreetMap example now includes all TriMet public GIS data, not just the service area boundary.
- All the images now have a `clone-me.bash` script that clones this repository! So you can build a Docker network, clone this repository and run the examples. I'm open to suggestions for more examples as long as they use publicly-downloadable data only.
v1.1.0 - OpenStreetMap example and WSL usage notes
New features:
- The OpenStreetMap pgRouting import example is working - if you have enough RAM! https://github.com/hackoregon/data-science-pet-containers/blob/master/examples/OpenStreetMap/README.md
- I've added some instructions for running Docker on Windows 10 Pro from a Windows Subsystem for Linux (WSL) Ubuntu terminal. https://github.com/hackoregon/data-science-pet-containers/blob/master/win10pro-wsl-ubuntu-tools/README.md
v1.0.0 - Initial release
A Docker environment for data science, including GIS
- PostgreSQL 10, PostGIS 2.4 and pgRouting 2.5, plus numerous command-line ETL and GIS utilities,
- A Jupyter notebook server, and
- An RStudio server.
Found a bug? Documentation unclear? Want a new feature? File an issue at https://github.com/hackoregon/data-science-pet-containers/issues/new