Merge pull request #128 from rh-aiservices-bu/feature/per-user-db
claim processing review - step 1
guimou authored Jan 17, 2024
2 parents f6f9f7d + a253b11 commit bd94451
Showing 20 changed files with 505 additions and 22 deletions.
23 changes: 14 additions & 9 deletions content/modules/ROOT/pages/05-05-process-claims.adoc
@@ -2,30 +2,35 @@
 include::_attributes.adoc[]
 
 == What will the pipeline do?
-Now that we have the web app deployed, we need some way to process the claims in the web app. For that, we will use a pipeline that either can run ad-hoc or be scheduled just like the sanity check pipeline. +
+Now that we have the web app deployed, we can see that some claims are still unprocessed. Of course, we want a way to do this processing, and it's even better if it can be fully automated!
+
+For that, we will use a pipeline that can either be run ad-hoc or scheduled, just like the sanity check pipeline.
 
 This pipeline is also a good starting point for creating an ArgoCD or Tekton pipeline which can be automatically triggered.
 
 == What's inside the pipeline?
-If you navigate to `lab-materials/05/05-05/` you can see a variety of files. +
-Just like before, we have both an Elyra version and a yaml version of the pipeline. This time, we will use the yaml file of the pipeline, which has been slightly customized to be able to run independently of Elyra. +
+If you navigate to `insurance-claim-processing/lab-materials/05/05-05` you can see a variety of files. +
+Just like before, we have both an Elyra version and a yaml version of the pipeline. This time, we will use the yaml definition of the pipeline, which has been slightly customized to be able to run independently of Elyra. +
 Here are the main files of the pipeline and what they do:
 
-* *get_claims* - Will connect to the database, fetch any unprocessed claims, and add them to a list that will be passed to the other tasks through a file `claims.json`.
-* The following will go through all the claims and use the full body of the text to try and find some important feature, then push that to the database:
+* *get_claims* - Will connect to the database, fetch any unprocessed claims, and add them to a list that will be passed to the other tasks through a file: `claims.json`.
+* The following will go through all the claims that need to be processed, and use the full body of the text to try and find some important features, then push the results to the database:
 ** *get_location* - Finds the location of the accident.
 ** *get_accident_time* - Finds the time of the accident.
 ** *summarize_text* - Makes a short summary of the text.
 ** *get_sentiment* - Gets the sentiment of the text.
-* *detect_objects* - Downloads the images of the claims and uses the served object-detection model to find damages in the image.
+* *detect_objects* - Downloads the images of the claim and uses the served object-detection model to classify the damages in the image.
 
 == Create a new PVC
-Before we can run the pipeline, we need to create a PVC it can use to store file and results in. +
-Go to the OpenShift Console and navigate to Storage -> PersistantStorageClaims.
+Before we can run the pipeline, we need to create a PVC that will be used to store intermediate files and results. +
+Go to the OpenShift Console and navigate to Storage -> PersistentVolumeClaims.
 
 [.bordershadow]
 image::05/05-PVC.png[go to PVC]
 
-Make sure you are in the right project (your username) and then press `Create PersistantVolumeClaim`.
+Make sure you are in the right project (your username) and then press `Create PersistentVolumeClaim`.
 
 [.bordershadow]
 image::05/05-create-pvc.png[Create PVC]
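
To make the hand-off between the tasks listed above concrete, here is a sketch of the kind of content `claims.json` could carry from *get_claims* to the other tasks. The exact fields are not shown in this commit, so every key below is an assumption (rendered as YAML for readability; the real file is JSON):

# hypothetical shape of claims.json -- field names are illustrative, not from the commit
- id: 1                                          # claim id as stored in the claims table (assumed)
  subject: "Rear-end collision on Main Street"   # assumed field
  body: "I was stopped at the light when ..."    # full claim text that the LLM tasks analyze

Each downstream task (*get_location*, *get_accident_time*, *summarize_text*, *get_sentiment*) would read this file, query the model with the claim body, and write its result back to the database.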
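
For reference, the PVC created through the console form in the section above is equivalent to a small manifest. A minimal sketch, with an assumed name and size (use the values shown in the course screenshots):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: processing-pipeline-storage   # hypothetical name -- enter the one the lab instructions give
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi                    # hypothetical size; intermediate files and results are small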
4 changes: 2 additions & 2 deletions lab-materials/05/05-05/process_claims.yaml
@@ -1,8 +1,8 @@
 apiVersion: tekton.dev/v1beta1
 kind: PipelineRun
 metadata:
-  # name: process-claims
-  generateName: training-pipeline-
+  name: process-claims
+  #generateName: training-pipeline-
   annotations:
     tekton.dev/output_artifacts: '{"run-a-file": [{"key": "artifacts/$PIPELINERUN/run-a-file/mlpipeline-metrics.tgz",
       "name": "mlpipeline-metrics", "path": "/tmp/mlpipeline-metrics.json"}, {"key":
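
The switch from `generateName` to a fixed `name` changes how this PipelineRun is created. The contrast below is a sketch of standard Kubernetes naming semantics, not something spelled out in the commit:

metadata:
  generateName: training-pipeline-   # every `create` makes a new run with a random suffix
---
metadata:
  name: process-claims               # one predictable run name; re-creating it requires deleting
                                     # the previous PipelineRun first (AlreadyExists otherwise)

A fixed name makes the run easy to reference from the lab instructions, at the cost of a cleanup step between runs.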
59 changes: 59 additions & 0 deletions lab-materials/05/app/db-init-job.yaml
@@ -0,0 +1,59 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: db-init-job
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  template:
    spec:
      initContainers:
      - name: wait-for-db
        image: busybox:1.28
        command: ['sh', '-c', 'until nc -z -v -w30 $POSTGRESQL_DATABASE 5432; do echo "Waiting for database connection..."; sleep 2; done;']
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POSTGRESQL_DATABASE
          value: claimdb.$(NAMESPACE).svc.cluster.local
      containers:
      - name: postgresql
        image: registry.redhat.io/rhel9/postgresql-13:latest
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POSTGRESQL_DATABASE
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-name
        - name: POSTGRESQL_USER
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-user
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-password
        - name: POSTGRESQL_DATABASE_HOST
          value: claimdb.$(NAMESPACE).svc.cluster.local
        command: ["/bin/bash", "-c"]
        args:
        - |
          echo "Running SQL script"
          psql -h $POSTGRESQL_DATABASE_HOST -p 5432 -U $POSTGRESQL_USER -d $POSTGRESQL_DATABASE -f /sql-script/script.sql
        volumeMounts:
        - name: sql-script-volume
          mountPath: /sql-script
      restartPolicy: Never
      volumes:
      - name: sql-script-volume
        configMap:
          name: sql-script-configmap
  backoffLimit: 4
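
This job mounts a ConfigMap named `sql-script-configmap` whose contents are not displayed in this diff (it is listed in `kustomization.yaml` below). A sketch of its expected shape -- the `script.sql` key matches the `/sql-script/script.sql` path the job executes, but the SQL itself is purely hypothetical:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sql-script-configmap
data:
  script.sql: |
    -- hypothetical contents; the real initialization script lives in the repo, not in this diff
    CREATE TABLE IF NOT EXISTS claims (id SERIAL PRIMARY KEY, subject TEXT, body TEXT);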
lab-materials/05/app/deployment-app.yaml
@@ -4,7 +4,7 @@ kind: Deployment
 metadata:
   name: ic-app
   annotations:
-    argocd.argoproj.io/sync-wave: "1"
+    argocd.argoproj.io/sync-wave: "2"
 spec:
   replicas: 1
   selector:
@@ -24,6 +24,10 @@ spec:
         - containerPort: 5000
           protocol: TCP
         env:
+        - name: NAMESPACE
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.namespace
         - name: INFERENCE_SERVER_URL
           value: http://llm.ic-shared-llm.svc.cluster.local:3000/
         - name: MAX_NEW_TOKENS
@@ -39,7 +43,7 @@ spec:
         - name: REPETITION_PENALTY
           value: '1.03'
         - name: POSTGRES_HOST
-          value: claimdb.ic-shared-db.svc.cluster.local
+          value: claimdb.$(NAMESPACE).svc.cluster.local
         - name: POSTGRES_DB
           valueFrom:
             secretKeyRef:
@@ -60,16 +64,16 @@ spec:
         - name: S3_ENDPOINT_URL
           value: http://minio.ic-shared-minio.svc.cluster.local:9000
         - name: IMAGES_BUCKET
-          value: claim-images
+          value: $(NAMESPACE)
         - name: AWS_ACCESS_KEY_ID
           valueFrom:
             secretKeyRef:
-              name: miniocreds
+              name: secret-minio
               key: aws_access_key_id
         - name: AWS_SECRET_ACCESS_KEY
           valueFrom:
             secretKeyRef:
-              name: miniocreds
+              name: secret-minio
               key: aws_secret_access_key
         resources: {}
         terminationMessagePath: /dev/termination-log
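
The recurring `claimdb.$(NAMESPACE).svc.cluster.local` value combines two standard Kubernetes mechanisms: the Downward API injects the pod's own namespace, and `$(VAR)` references expand to env vars declared earlier in the same list. A minimal sketch of the pattern in isolation:

env:
- name: NAMESPACE                # injected from the pod's own metadata (Downward API)
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: POSTGRES_HOST            # $(NAMESPACE) expands because NAMESPACE is declared above it
  value: claimdb.$(NAMESPACE).svc.cluster.local

Declaration order matters: `$(VAR)` only resolves variables defined earlier in the list, which is why `NAMESPACE` always comes first. This is what points each user's app at the database in their own project instead of the shared `ic-shared-db` one -- the point of this per-user-db change.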
75 changes: 75 additions & 0 deletions lab-materials/05/app/deployment-db.yaml
@@ -0,0 +1,75 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: claimdb
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  selector:
    matchLabels:
      app: claimdb
  replicas: 1
  template:
    metadata:
      labels:
        app: claimdb
    spec:
      containers:
      - name: postgresql
        image: registry.redhat.io/rhel9/postgresql-13:latest
        resources:
          limits:
            memory: 512Mi
        readinessProbe:
          exec:
            command:
            - /usr/libexec/check-container
          initialDelaySeconds: 5
          timeoutSeconds: 1
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          exec:
            command:
            - /usr/libexec/check-container
            - '--live'
          initialDelaySeconds: 120
          timeoutSeconds: 10
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
        env:
        - name: POSTGRESQL_USER
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-user
        - name: POSTGRESQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-password
        - name: POSTGRESQL_DATABASE
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-name
        securityContext:
          capabilities: {}
          privileged: false
        ports:
        - containerPort: 5432
          protocol: TCP
        imagePullPolicy: IfNotPresent
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: claimdb-data
          mountPath: /var/lib/pgsql/data
      volumes:
      - name: claimdb-data
        persistentVolumeClaim:
          claimName: claimdb
  strategy:
    type: Recreate
12 changes: 9 additions & 3 deletions lab-materials/05/app/kustomization.yaml
@@ -9,8 +9,14 @@ resources:
 # wave 0
 - secret-db.yaml
 - secret-minio.yaml
+- pvc-db.yaml
+- sql-script-configmap.yaml
 # wave 1
-- deployment.yaml
-- service.yaml
-- route.yaml
+- deployment-db.yaml
+- service-db.yaml
+- db-init-job.yaml
+- populate-images.yaml
+# wave 2
+- deployment-app.yaml
+- service-app.yaml
+- route-app.yaml
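
The `# wave N` comments mirror the `argocd.argoproj.io/sync-wave` annotations on the manifests: ArgoCD applies lower waves first and waits for them to be healthy before moving on. Here that means the secrets, the PVC, and the SQL ConfigMap (wave 0) exist before the database and the two jobs (wave 1), which in turn settle before the web app, its service, and its route (wave 2). The annotation pattern, as used throughout these files:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # synced only after all wave 0 resources are healthy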
68 changes: 68 additions & 0 deletions lab-materials/05/app/populate-images.yaml
@@ -0,0 +1,68 @@
---
apiVersion: batch/v1
kind: Job
metadata:
  name: populate-images
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  backoffLimit: 4
  template:
    spec:
      initContainers:
      - name: wait-for-minio
        image: busybox:1.28
        command: ['sh', '-c', 'until nc -z -v -w30 $MINIO_ENDPOINT 9000; do echo "Waiting for Minio connection..."; sleep 2; done;']
        env:
        - name: MINIO_ENDPOINT
          value: minio.ic-shared-minio.svc.cluster.local
      containers:
      - name: add-images-to-bucket
        image: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/s2i-generic-data-science-notebook:1.2
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash"]
        args:
        - -ec
        - |-
          git clone https://github.com/rh-aiservices-bu/insurance-claim-processing.git
          cat << 'EOF' | python3
          import boto3, os, botocore
          s3 = boto3.client("s3",
                            endpoint_url=os.getenv("AWS_S3_ENDPOINT"),
                            aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
                            aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"))
          # Set bucket
          bucket_name = os.getenv("NAMESPACE")
          # Upload original images to minio
          for filename in os.listdir("insurance-claim-processing/bootstrap/ic-shared-database/images/original_images"):
              with open(f"insurance-claim-processing/bootstrap/ic-shared-database/images/original_images/{filename}", "rb") as f:
                  s3.upload_fileobj(f, bucket_name, f"original_images/{filename}")
          # Upload processed images to minio
          for filename in os.listdir("insurance-claim-processing/bootstrap/ic-shared-database/images/processed_images"):
              with open(f"insurance-claim-processing/bootstrap/ic-shared-database/images/processed_images/{filename}", "rb") as f:
                  s3.upload_fileobj(f, bucket_name, f"processed_images/{filename}")
          EOF
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: AWS_S3_ENDPOINT
          value: http://minio.ic-shared-minio.svc.cluster.local:9000
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: secret-minio
              key: aws_access_key_id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: secret-minio
              key: aws_secret_access_key
      restartPolicy: Never
15 changes: 15 additions & 0 deletions lab-materials/05/app/pvc-db.yaml
@@ -0,0 +1,15 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claimdb
  labels:
    app: claimdb
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
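
A detail that ties `pvc-db.yaml` to `deployment-db.yaml` above: the volume is `ReadWriteOnce`, so the database deployment uses a `Recreate` strategy. A rolling update would briefly schedule two postgresql pods contending for the same RWO volume and data directory; `Recreate` stops the old pod before starting the new one:

spec:
  strategy:
    type: Recreate   # old pod is stopped before the new pod mounts the RWO volume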
lab-materials/05/app/route-app.yaml
@@ -5,6 +5,8 @@ metadata:
   name: ic-app
   labels:
     app: ic-app
+  annotations:
+    argocd.argoproj.io/sync-wave: "2"
 spec:
   to:
     kind: Service
2 changes: 1 addition & 1 deletion lab-materials/05/app/secret-db.yaml
@@ -4,7 +4,7 @@ apiVersion: v1
 metadata:
   name: claimdb
   labels:
-    app: ic-app-db
+    app: claimdb
   annotations:
     argocd.argoproj.io/sync-wave: "0"
 stringData:
2 changes: 1 addition & 1 deletion lab-materials/05/app/secret-minio.yaml
@@ -2,7 +2,7 @@
 kind: Secret
 apiVersion: v1
 metadata:
-  name: miniocreds
+  name: secret-minio
   labels:
     app: ic-app-minio
   annotations:
lab-materials/05/app/service-app.yaml
@@ -6,7 +6,7 @@ metadata:
   labels:
     app: ic-app
   annotations:
-    argocd.argoproj.io/sync-wave: "1"
+    argocd.argoproj.io/sync-wave: "2"
 spec:
   ports:
     - name: http