A multi-class sentiment analysis notebook that classifies texts into five emotion categories (joy, sadness, anger, fear, neutral) using transfer learning with BERT (TensorFlow Keras).
Usage:
Make sure the code runs normally in your local environment. Take note of the libraries it uses; in my case: pandas, numpy, and ktrain (which uses Keras).
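One way to take note of the libraries is to read them out of the notebook file itself. The following is a minimal sketch, not part of the original project: it assumes the notebook is the `bert.ipynb` named in the Dockerfile and uses a hypothetical helper, `notebook_imports`, that collects the top-level modules imported in the code cells. It only sees plain `import` statements, so libraries installed through `!pip` cells or loaded dynamically would be missed.

```python
import ast
import json
from pathlib import Path


def notebook_imports(notebook):
    """Return the top-level module names imported by a notebook's code cells."""
    modules = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files store source as a list of lines
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [ln for ln in source.splitlines()
                 if not ln.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still do not parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


if __name__ == "__main__":
    path = Path("bert.ipynb")  # notebook name taken from the Dockerfile
    if path.exists():
        notebook = json.loads(path.read_text(encoding="utf-8"))
        print(sorted(notebook_imports(notebook)))
```

The printed names are import names, which usually but not always match the package names to put in requirements.txt.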
$ docker version
There are several Jupyter-notebook-related base images. Since our notebook uses pandas, numpy, and ktrain (which uses Keras, running on top of TensorFlow), tensorflow-notebook is our choice: it includes everything in jupyter/scipy-notebook (which already has pandas and numpy).
$ touch requirements.txt
In the same folder as our notebook, we create our Dockerfile:
FROM jupyter/tensorflow-notebook
COPY requirements.txt ./requirements.txt
COPY bert.ipynb ./bert.ipynb
COPY data ./data
COPY bert_model ./bert_model
RUN pip install -r requirements.txt
This uses the base image, copies all the files into the image, and installs the libraries listed in requirements.txt.
$ docker build -t textemotionotebook .
$ docker images
Running with admin privileges:
$ docker run -it -p 8888:8888 textemotionotebook
Use the last link, e.g. http://127.0.0.1:8888/lab?token=...
There should be a warning indicating that a library is missing: ktrain.
Update requirements.txt to include the required library:
ktrain==0.32.3
Or simply,
ktrain
After rebuilding the Docker image and rerunning the container, the notebook runs successfully.
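To find missing libraries before opening the notebook, the entries in requirements.txt can be checked against what is installed inside the container. This is a rough sketch under stated assumptions: the helper names `requirement_name` and `missing_requirements` are hypothetical, and the parsing only handles simple lines such as `ktrain` or `ktrain==0.32.3`, not extras, URLs, or environment markers.

```python
from importlib import metadata
from pathlib import Path


def requirement_name(line):
    """Return the package name from a simple requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
    if not line:
        return None
    for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
        line = line.split(sep, 1)[0]
    return line.strip() or None


def missing_requirements(lines):
    """Return the names of listed packages that are not installed."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        print(missing_requirements(req_file.read_text(encoding="utf-8").splitlines()))
```

An empty list means every listed package is installed. It does not detect libraries the notebook imports but requirements.txt never mentions.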
First, find the image ID by running docker images:
$ docker tag d3e2d3a63640 astolo/textemotionotebook:first
$ docker push astolo/textemotionotebook:first
You should be able to see the image under My Profile on Docker Hub.
$ docker pull astolo/textemotionotebook:first
Running as administrator:
$ docker run -it -p 8888:8888 astolo/textemotionotebook:first
Use the bottom link; the notebook should run successfully.
Running as administrator in PowerShell:
$ docker pull astolo/textemotionotebook:first
$ docker run -it -p 8888:8888 astolo/textemotionotebook:first
Use the bottom link, e.g. http://127.0.0.1:8888/lab?token=...
If you do not want to wait 10 hours to retrain the model, start from the "RUN FROM HERE" tag in the notebook and use the pretrained model.
I first used an audio-related notebook, which improvises a piece of jazz music. After going through a large number of tutorials looking for a way to listen to my generated MIDI file inside Docker, I found that playing audio inside Docker is not trivial.
I also tried to shrink the image by using more than one Dockerfile to separate the build and package stages, and by using docker-compose, but this soon made the files ugly and unreadable, which I believe was not the intention of this assignment: to dockerize a simple application. Also, most tutorials focus on Node.js, C, or Java projects, which break down into stages more naturally than Jupyter notebook projects.
If you git clone this repo, the notebook has to be run from the top. Because of GitHub's file size limit, I was unable to upload the .h5 file of the pretrained model, but if you run
$ docker pull astolo/textemotionotebook:first
$ docker run -it -p 8888:8888 astolo/textemotionotebook:first
there should not be a problem running from the tag "RUN FROM HERE".