Commit

docs updated - ready for final tests
znmeb committed Apr 3, 2018
1 parent eaecf3a commit f9585c8
Showing 7 changed files with 55 additions and 43 deletions.
27 changes: 15 additions & 12 deletions README.Rmd
@@ -30,7 +30,7 @@ As the name implies, the software is distributed via Docker. The user simply clo
Why do it this way?

* Provide a standardized common working environment for data scientists and DevOps engineers at Hack Oregon. We want to build using the same tools we'll use for deployment as much as possible.
-* Deliver advanced open source technologies to Windows and MacOS desktops and laptops. While there are "native" installers for most of these tools, some are readily available for and only heavily tested on Linux.
+* Deliver advanced open source technologies to Windows and MacOS desktops and laptops. While there are "native" installers for most of these tools, some are readily available for and only extensively tested on Linux.
* Isolation: for the most part, software running in containers is contained. It interacts with the desktop / laptop user through well-defined mechanisms, often as a web server.

### About the examples
@@ -61,7 +61,7 @@ I've coded up some examples of how I've used this toolset for Hack Oregon. They'
Type `CTL-C` to stop following the container log.
-6. Connect to the container from the host: user name is `postgres`, host is `localhost`, port is the value of `HOST_POSTGRES_PORT`, usually 5439, and password is the value of `POSTGRES_PASSWORD`. You can connect with any client that uses the PostgreSQL protocol including pgAdmin and QGIS.
+6. Connect to the container from the host: user name is `postgres`, host is `localhost`, port is the value of `HOST_POSTGRES_PORT`, usually 5439, and password is the value of `POSTGRES_PASSWORD`. You can connect with any client that uses the PostgreSQL protocol including pgAdmin and QGIS. You can also connect with Jupyter notebooks and RStudio Desktop.
To stop the service, type `docker-compose -f postgis.yml stop`. To start it back up again, `docker-compose -f postgis.yml start`.
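The connection settings in step 6 can be sketched as a libpq-style connection string. This is a hypothetical illustration: the `"changeme"` password is a placeholder, not a value from this repository.

```python
# Hypothetical sketch of the step-6 connection settings as a libpq
# keyword/value connection string. HOST_POSTGRES_PORT and
# POSTGRES_PASSWORD come from the compose environment; "changeme" is a
# placeholder, not a real credential.
host_postgres_port = 5439        # usual value of HOST_POSTGRES_PORT
postgres_password = "changeme"   # value of POSTGRES_PASSWORD (placeholder)

dsn = (
    f"host=localhost port={host_postgres_port} "
    f"user=postgres password={postgres_password}"
)
print(dsn)

# Any libpq-based client accepts the same parameters, e.g. with
# psycopg2 installed:
#   import psycopg2
#   conn = psycopg2.connect(dsn)
```

The same host/port/user/password values are what pgAdmin, QGIS, or an RStudio connection dialog would ask for.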
@@ -121,7 +121,7 @@ The `postgis` service is based on the official PostgreSQL image from the Docker
All the images except `amazon` acquire PostgreSQL and its accomplices from the official PostgreSQL Global Development Group (PGDG) Debian repositories: <https://www.postgresql.org/download/linux/debian/>.
### Using the command line
-I've tried to provide a comprehensive command line experience. `Git`, `curl`, `wget`, `lynx`, `nano` and `vim` are there, as is most of the command-line GIS stack (`gdal`, `proj`, `spatialite`, `rasterlite`, `geotiff`, `osm2pgsql` and `osm2pgrouting`), and of course `psql`.
+I've tried to provide a comprehensive command line experience. `Git`, `curl`, `wget`, `lynx`, `nano`, `emacs` and `vim` are there, as is most of the command-line GIS stack (`gdal`, `proj`, `spatialite`, `rasterlite`, `geotiff`, `osm2pgsql` and `osm2pgrouting`), and of course `psql`.
I've also included `python3-csvkit` for Excel, CSV and other text files, `unixodbc` for ODBC connections and `mdbtools` for Microsoft Access files. If you want to extend this image further, it is based on Debian `jessie`.
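`csvkit` does this kind of column work from the shell (`csvcut`, `csvlook`, `in2csv`); as a rough illustration of the same idea, here is a minimal plain-Python sketch. The precinct rows are invented sample data, not from this repository.

```python
import csv
import io

# Minimal sketch of the column-extraction task csvkit's csvcut does on
# the command line. The sample CSV rows below are invented for
# illustration only.
raw = io.StringIO(
    "precinct,county,votes\n"
    "101,Multnomah,420\n"
    "102,Washington,311\n"
)
selected = [
    {"precinct": row["precinct"], "votes": int(row["votes"])}
    for row in csv.DictReader(raw)
]
print(selected)
```

For real work inside the container you would point `csvcut` or `in2csv` at the file directly rather than writing Python.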
@@ -134,10 +134,10 @@ For Linux hosts, including Windows 10 Pro Windows Subsystem for Linux Ubuntu (<h
* `login2postgis.bash`: Type `./login2postgis.bash` and you'll be logged into the `containers_postgis_1` container as `dbsuper`. You can also log in as one of the users in `DB_USERS_TO_CREATE`, for example, `./login2postgis.bash local-elections`.
* `login2amazon.bash`: Log in to `containers_amazon_1` as `dbsuper`.
-* `pull-postgis-backups.bash`: This script does `docker cp containers_postgis_1:/home/dbsuper/Backups`. That is, it copies all the files from `/home/dbsuper/Backups` into `Backups` in your current directory, creating `Backups` if it doesn't exist.
+* `pull-postgis-raw.bash`: This script does `docker cp containers_postgis_1:/home/dbsuper/Raw`. That is, it copies all the files from `/home/dbsuper/Raw` into `Raw` in your current directory, creating `Raw` if it doesn't exist.
-I use this for transferring backup files for testing; I'll create a database in the PostGIS container, create a backup in `/home/dbsuper/Backups` and copy it out to the host with this script.
-* `push-amazon-backups.bash`: This is the next step - it copies `Backups` to `/home/dbsuper/Backups` in `containers_amazon_1` for restore testing.
+I use this for transferring backup files for testing; I'll create a database in the PostGIS container, create a backup in `/home/dbsuper/Raw` and copy it out to the host with this script. I create the backups in `Raw` instead of `Backups` so they don't get automatically restored.
+* `push-amazon-raw.bash`: This is the next step - it copies `Raw` to `/home/dbsuper/Raw` in `containers_amazon_1` for restore testing.
### Virtualenvwrapper
You can use the Python `virtualenvwrapper` utility. See <https://virtualenvwrapper.readthedocs.io/en/latest/#> for the documentation.
@@ -210,18 +210,21 @@ If you want to load raw data onto the `postgis` image, copy the files to the `da
## Jupyter
This service is based on the Anaconda, Inc. (formerly Continuum) `miniconda3` image: <https://hub.docker.com/r/continuumio/miniconda3/>. I've added a non-root user `jupyter` to avoid the security issues associated with running Jupyter notebooks as "root".
-The `jupyter` user has a Conda environment, also called `jupyter`. In addition to `jupyter`, the environment has
+The `jupyter` user has a Conda environment, also called `jupyter`. To keep the image size manageable, I have ***not*** installed any data science tools up front.
+To install a (large) data science stack, run the script `/home/jupyter/kitchen-sink.bash` as the `jupyter` user. This will install
+* cookiecutter,
+* geopandas,
+* jupyter,
* gwr,
* matplotlib,
+* osmnx,
* pandas,
* psycopg2,
* pysal,
* requests,
-* seaborn,
-* statsmodels,
-* cookiecutter, and
-* osmnx.
+* seaborn, and
+* statsmodels.
By default the Jupyter notebook server starts when Docker brings up the service. Type `docker logs containers_jupyter_1`. You'll see something like this:
52 changes: 30 additions & 22 deletions README.md
@@ -35,7 +35,7 @@
Data Science Pet Containers
===========================

-M. Edward (Ed) Borasky <[email protected]>, 2018-03-26
+M. Edward (Ed) Borasky <[email protected]>, 2018-04-02

Overview
--------
@@ -62,7 +62,7 @@ Why do it this way?
using the same tools we’ll use for deployment as much as possible.
- Deliver advanced open source technologies to Windows and MacOS
desktops and laptops. While there are “native” installers for most
-    of these tools, some are readily available for and only heavily
+    of these tools, some are readily available for and only extensively
tested on Linux.
- Isolation: for the most part, software running in containers is
contained. It interacts with the desktop / laptop user through
@@ -112,7 +112,8 @@ Quick start
host is `localhost`, port is the value of `HOST_POSTGRES_PORT`,
usually 5439, and password is the value of `POSTGRES_PASSWORD`. You
can connect with any client that uses the PostgreSQL protocol
-    including pgAdmin and QGIS.
+    including pgAdmin and QGIS. You can also connect with Jupyter
+    notebooks and RStudio Desktop.

To stop the service, type `docker-compose -f postgis.yml stop`. To start
it back up again, `docker-compose -f postgis.yml start`.
@@ -227,9 +228,10 @@ repositories: <https://www.postgresql.org/download/linux/debian/>.
### Using the command line

I’ve tried to provide a comprehensive command line experience. `Git`,
-`curl`, `wget`, `lynx`, `nano` and `vim` are there, as is most of the
-command-line GIS stack (`gdal`, `proj`, `spatialite`, `rasterlite`,
-`geotiff`, `osm2pgsql` and `osm2pgrouting`), and of course `psql`.
+`curl`, `wget`, `lynx`, `nano`, `emacs` and `vim` are there, as is most
+of the command-line GIS stack (`gdal`, `proj`, `spatialite`,
+`rasterlite`, `geotiff`, `osm2pgsql` and `osm2pgrouting`), and of course
+`psql`.

I’ve also included `python3-csvkit` for Excel, CSV and other text files,
`unixodbc` for ODBC connections and `mdbtools` for Microsoft Access
@@ -256,18 +258,18 @@ I’ve created some convenience scripts:
can also log in as one of the users in `DB_USERS_TO_CREATE`, for
example, `./login2postgis.bash local-elections`.
- `login2amazon.bash`: Log in to `containers_amazon_1` as `dbsuper`.
-- `pull-postgis-backups.bash`: This script does
-  `docker cp containers_postgis_1:/home/dbsuper/Backups`. That is, it
-  copies all the files from `/home/dbsuper/Backups` into `Backups` in
-  your current directory, creating `Backups` if it doesn’t exist.
+- `pull-postgis-raw.bash`: This script does
+  `docker cp containers_postgis_1:/home/dbsuper/Raw`. That is, it
+  copies all the files from `/home/dbsuper/Raw` into `Raw` in your
+  current directory, creating `Raw` if it doesn’t exist.

I use this for transferring backup files for testing; I’ll create a
database in the PostGIS container, create a backup in
-    `/home/dbsuper/Backups` and copy it out to the host with this
-    script.
-- `push-amazon-backups.bash`: This is the next step - it copies
-    `Backups` to `/home/dbsuper/Backups` in `containers_amazon_1` for
-    restore testing.
+    `/home/dbsuper/Raw` and copy it out to the host with this script. I
+    create the backups in `Raw` instead of `Backups` so they don’t get
+    automatically restored.
+- `push-amazon-raw.bash`: This is the next step - it copies `Raw` to
+    `/home/dbsuper/Raw` in `containers_amazon_1` for restore testing.

### Virtualenvwrapper

@@ -391,19 +393,25 @@ This service is based on the Anaconda, Inc. (formerly Continuum)
I’ve added a non-root user `jupyter` to avoid the security issues
associated with running Jupyter notebooks as “root”.

-The `jupyter` user has a Conda environment, also called `jupyter`. In
-addition to `jupyter`, the environment has
+The `jupyter` user has a Conda environment, also called `jupyter`. To
+keep the image size manageable, I have ***not*** installed any data
+science tools up front.

+To install a (large) data science stack, run the script
+`/home/jupyter/kitchen-sink.bash` as the `jupyter` user. This will
+install

+- cookiecutter,
+- geopandas,
+- jupyter,
- gwr,
- matplotlib,
+- osmnx,
- pandas,
- psycopg2,
- pysal,
- requests,
-- seaborn,
-- statsmodels,
-- cookiecutter, and
-- osmnx.
+- seaborn, and
+- statsmodels.

By default the Jupyter notebook server starts when Docker brings up the
service. Type `docker logs containers_jupyter_1`. You’ll see something
1 change: 1 addition & 0 deletions containers/jupyter-scripts/make-cookiecutter-project.bash
@@ -2,5 +2,6 @@

echo "Creating a new Cookiecutter project in $HOME/Projects"
source activate jupyter
+conda install --yes --quiet -c conda-forge cookiecutter
cd $HOME/Projects
cookiecutter https://github.com/drivendata/cookiecutter-data-science
5 changes: 0 additions & 5 deletions containers/pull-postgis-backups.bash

This file was deleted.

5 changes: 5 additions & 0 deletions containers/pull-postgis-raw.bash
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+ls -Altr Raw
+docker cp containers_postgis_1:/home/dbsuper/Raw .
+ls -Altr Raw
4 changes: 0 additions & 4 deletions containers/push-amazon-backups.bash

This file was deleted.

4 changes: 4 additions & 0 deletions containers/push-amazon-raw.bash
@@ -0,0 +1,4 @@
+#! /bin/bash
+
+ls -Altr Raw
+docker cp Raw containers_amazon_1:/home/dbsuper
