Name	Name	Last commit message	Last commit date
parent directory ..
constants	constants	Add blank space characters to license headers.	Oct 10, 2018
preprocessing	preprocessing	Add blank space characters to license headers.	Oct 10, 2018
trainer	trainer	Add blank space characters to license headers.	Oct 10, 2018
utils	utils	Add blank space characters to license headers.	Oct 10, 2018
README.md	README.md	Update README.md	Jun 12, 2019
__init__.py	__init__.py	Added sentiment analysis tutorial code.	Sep 28, 2018
config.yaml	config.yaml	Added sentiment analysis tutorial code.	Sep 28, 2018
config_hp_tuning.yaml	config_hp_tuning.yaml	Added sentiment analysis tutorial code.	Sep 28, 2018
requirements.txt	requirements.txt	Update requirements.txt	Jun 11, 2019
run_preprocessing.py	run_preprocessing.py	Add blank space characters to license headers.	Oct 10, 2018
scoring.py	scoring.py	Add blank space characters to license headers.	Oct 10, 2018
setup.py	setup.py	Add blank space characters to license headers.	Oct 10, 2018

Name

Last commit message

Last commit date

Add blank space characters to license headers.

Oct 10, 2018

preprocessing

Add blank space characters to license headers.

Oct 10, 2018

trainer

Add blank space characters to license headers.

Oct 10, 2018

utils

Add blank space characters to license headers.

Oct 10, 2018

README.md

Update README.md

Jun 12, 2019

__init__.py

Added sentiment analysis tutorial code.

Sep 28, 2018

config.yaml

Added sentiment analysis tutorial code.

Sep 28, 2018

config_hp_tuning.yaml

Added sentiment analysis tutorial code.

Sep 28, 2018

requirements.txt

Update requirements.txt

Jun 11, 2019

run_preprocessing.py

Add blank space characters to license headers.

Oct 10, 2018

scoring.py

Add blank space characters to license headers.

Oct 10, 2018

setup.py

Add blank space characters to license headers.

Oct 10, 2018

Sentiment analysis using TensorFlow RNNEstimator on Google Cloud Platform.

Overview.

This code aims at providing a simple example of how to train a RNN model using TensorFlow RNNEstimator on Google Cloud Platform. The model is designed to handle raw text files in input without preprocessing needed. A more detailed guide can be found here.

Problem and data.

The problem is a text classification example where we categorize the movie reviews into positive or negative sentiment. We base this example on the IMDb dataset provided from this website: http://ai.stanford.edu/~amaas/data/sentiment/

Set-up environment.

PROJECT_NAME=sentiment_analysis
git clone https://github.com/GoogleCloudPlatform/professional-services.git
cd professional-services/examples/cloudml-sentiment-analysis
python -m virtualenv env
source env/bin/activate
python -m pip install -U pip
python -m pip install -r requirements.txt

Download data.

DATA_PATH=data
INPUT_DATA=${DATA_PATH}/aclImdb/train
TRAINING_INPUT_DATA=${DATA_PATH}/training_data
wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz -P $DATA_PATH
tar -xzf ${DATA_PATH}/aclImdb_v1.tar.gz -C $DATA_PATH

Configure GCP.

PROJECT_ID=<...>
BUCKET_PATH=<...>
gcloud config set project $PROJECT_ID

Move data to GCP.

gsutil -m cp -r $DATA_PATH/aclImdb $BUCKET_PATH
GCP_INPUT_DATA=$BUCKET_PATH/aclImdb/train

Preprocess data.

JOB_NAME=training-$(date +"%Y%m%d-%H%M%S")
PROCESSED_DATA=$BUCKET_PATH/processed_data/$JOB_NAME
python run_preprocessing.py \
  --input_dir=$GCP_INPUT_DATA \
  --output_dir=$PROCESSED_DATA \
  --gcp=True \
  --project_id=$PROJECT_ID \
  --job_name=$JOB_NAME \
  --num_workers=8 \
  --worker_machine_type=n1-highcpu-4 \
  --region=us-central1

Train model locally.

MODEL_NAME=${PROJECT_NAME}_$(date +"%Y%m%d_%H%M%S")
TRAINING_OUTPUT_DIR=models/$MODEL_NAME
python -m trainer.task \
  --input_dir=$PROCESSED_DATA \
  --model_dir=$TRAINING_OUTPUT_DIR

Train model on GCP.

MODEL_NAME=${PROJECT_NAME}_$(date +"%Y%m%d_%H%M%S")
TRAINING_OUTPUT_DIR=${BUCKET_PATH}/$MODEL_NAME
gcloud ml-engine jobs submit training $MODEL_NAME \
  --module-name trainer.task \
  --staging-bucket $BUCKET_PATH \
  --package-path $PWD/trainer \
  --region=us-central1 \
  --runtime-version 1.12 \
  --config=config_hp_tuning.yaml \
  --stream-logs \
  -- \
  --input_dir $PROCESSED_DATA \
  --model_dir $TRAINING_OUTPUT_DIR

Train model locally with gcloud.

MODEL_NAME=${PROJECT_NAME}_$(date +"%Y%m%d_%H%M%S")
TRAINING_OUTPUT_DIR=models/$MODEL_NAME
gcloud ml-engine local train \
  --module-name=trainer.task \
  --package-path=$PWD/trainer \
  -- \
  --input_dir=$PROCESSED_DATA \
  --model_dir=$TRAINING_OUTPUT_DIR

Monitor with tensorboard.

tensorboard --logdir=$TRAINING_OUTPUT_DIR

Save model in GCP.

With HP tuning:

TRIAL_NUMBER=''
MODEL_SAVED_NAME=$(gsutil ls ${TRAINING_OUTPUT_DIR}/${TRIAL_NUMBER}/export/exporter/ | tail -1)

Without HP tuning:

MODEL_SAVED_NAME=$(gsutil ls ${TRAINING_OUTPUT_DIR}/export/exporter/ | tail -1)

gcloud ml-engine models create $PROJECT_NAME \
  --regions us-central1
gcloud ml-engine versions create $MODEL_NAME \
  --model $PROJECT_NAME \
  --origin $MODEL_SAVED_NAME \
  --runtime-version 1.12

Make local online predictions.

gcloud ml-engine local predict \
  --model-dir=${TRAINING_OUTPUT_DIR}/export/exporter/$(ls ${TRAINING_OUTPUT_DIR}/export/exporter/ | tail -1) \
  --text-instances=${DATA_PATH}/aclImdb/test/*/*.txt

Make online predictions with GCP.

gcloud ml-engine predict \
  --model=$PROJECT_NAME \
  --version=$MODEL_NAME \
  --text-instances=$DATA_PATH/aclImdb/test/neg/0_2.txt

Move out of sample data to GCS.

PREDICTION_DATA_PATH=${BUCKET_PATH}/prediction_data
gsutil -m cp -r ${DATA_PATH}/aclImdb/test/ $PREDICTION_DATA_PATH

Make batch predictions with GCP.

JOB_NAME=${PROJECT_NAME}_predict_$(date +"%Y%m%d_%H%M%S")
PREDICTIONS_OUTPUT_PATH=${BUCKET_PATH}/predictions/$JOB_NAME
gcloud ml-engine jobs submit prediction $JOB_NAME \
  --model $PROJECT_NAME \
  --input-paths $PREDICTION_DATA_PATH/neg/* \
  --output-path $PREDICTIONS_OUTPUT_PATH \
  --region us-central1 \
  --data-format TEXT \
  --version $MODEL_NAME

Scoring.

python scoring.py \
  --project_name=$PROJECT_ID \
  --model_name=$PROJECT_NAME \
  --input_path=$DATA_PATH/aclImdb/test \
  --size=1000 \
  --batch_size=20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

cloudml-sentiment-analysis

cloudml-sentiment-analysis

README.md

Sentiment analysis using TensorFlow RNNEstimator on Google Cloud Platform.

Overview.

Problem and data.

Set-up environment.

Download data.

Configure GCP.

Move data to GCP.

Preprocess data.

Train model locally.

Train model on GCP.

Train model locally with gcloud.

Monitor with tensorboard.

Save model in GCP.

Make local online predictions.

Make online predictions with GCP.

Move out of sample data to GCS.

Make batch predictions with GCP.

Scoring.

Files

cloudml-sentiment-analysis

Directory actions

More options

Directory actions

More options

Latest commit

History

cloudml-sentiment-analysis

Folders and files

parent directory

README.md

Sentiment analysis using TensorFlow RNNEstimator on Google Cloud Platform.

Overview.

Problem and data.

Set-up environment.

Download data.

Configure GCP.

Move data to GCP.

Preprocess data.

Train model locally.

Train model on GCP.

Train model locally with gcloud.

Monitor with tensorboard.

Save model in GCP.

Make local online predictions.

Make online predictions with GCP.

Move out of sample data to GCS.

Make batch predictions with GCP.

Scoring.