Merge pull request #128 from rh-aiservices-bu/feature/per-user-db
claim processing review - step 1
guimou authored Jan 17, 2024
2 parents f6f9f7d + a253b11 commit bd94451
Showing 20 changed files with 505 additions and 22 deletions.
23 changes: 14 additions & 9 deletions content/modules/ROOT/pages/05-05-process-claims.adoc
@@ -2,30 +2,35 @@
 include::_attributes.adoc[]
 
 == What will the pipeline do?
-Now that we have the web app deployed, we need some way to process the claims in the web app. For that, we will use a pipeline that either can run ad-hoc or be scheduled just like the sanity check pipeline. +
+Now that we have the web app deployed, we can see that some claims are still unprocessed. Of course, we want a way to do this processing, and it's even better if it can be fully automated!
+
+For that, we will use a pipeline that can either be run ad-hoc or scheduled, just like the sanity check pipeline.
 
 This pipeline is also a good starting point for creating an ArgoCD or Tekton pipeline which can be automatically triggered.
 
 == What's inside the pipeline?
-If you navigate to `lab-materials/05/05-05/` you can see a variety of files. +
-Just like before, we have both an Elyra version and a yaml version of the pipeline. This time, we will use the yaml file of the pipeline, which has been slightly customized to be able to run independently of Elyra. +
+If you navigate to `insurance-claim-processing/lab-materials/05/05-05` you can see a variety of files. +
+Just like before, we have both an Elyra version and a yaml version of the pipeline. This time, we will use the yaml definition of the pipeline, which has been slightly customized to be able to run independently of Elyra. +
 Here are the main files of the pipeline and what they do:
 
-* *get_claims* - Will connect to the database, fetch any unprocessed claims, and add them to a list that will be passed to the other tasks through a file `claims.json`.
-* The following will go through all the claims and use the full body of the text to try and find some important feature, then push that to the database:
+* *get_claims* - Will connect to the database, fetch any unprocessed claims, and add them to a list that will be passed to the other tasks through a file: `claims.json`.
+* The following will go through all the claims that need to be processed, and use the full body of the text to try and find some important features, then push the results to the database:
 ** *get_location* - Finds the location of the accident.
 ** *get_accident_time* - Finds the time of the accident.
 ** *summarize_text* - Makes a short summary of the text.
 ** *get_sentiment* - Gets the sentiment of the text.
-* *detect_objects* - Downloads the images of the claims and uses the served object-detection model to find damages in the image.
+* *detect_objects* - Downloads the images of the claim and uses the served object-detection model to classify the damages in the image.
 
 == Create a new PVC
-Before we can run the pipeline, we need to create a PVC it can use to store file and results in. +
-Go to the OpenShift Console and navigate to Storage -> PersistantStorageClaims.
+Before we can run the pipeline, we need to create a PVC that will be used to store intermediate files and results. +
+Go to the OpenShift Console and navigate to Storage -> PersistentVolumeClaims.
 
 [.bordershadow]
 image::05/05-PVC.png[go to PVC]
 
-Make sure you are in the right project (your username) and then press `Create PersistantVolumeClaim`.
+Make sure you are in the right project (your username) and then press `Create PersistentVolumeClaim`.
 
 [.bordershadow]
 image::05/05-create-pvc.png[Create PVC]
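
To make the hand-off between the tasks listed above concrete, here is a sketch of the kind of content `claims.json` could carry from *get_claims* to the other tasks. The exact fields are not shown in this commit, so every key below is an assumption (rendered as YAML for readability; the real file is JSON):

# hypothetical shape of claims.json -- field names are illustrative, not from the commit
- id: 1                                          # claim id as stored in the claims table (assumed)
  subject: "Rear-end collision on Main Street"   # assumed field
  body: "I was stopped at the light when ..."    # full claim text that the LLM tasks analyze

Each downstream task (*get_location*, *get_accident_time*, *summarize_text*, *get_sentiment*) would read this file, query the model with the claim body, and write its result back to the database.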
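
For reference, the PVC created through the console form in the section above is equivalent to a small manifest. A minimal sketch, with an assumed name and size (use the values shown in the course screenshots):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: processing-pipeline-storage   # hypothetical name -- enter the one the lab instructions give
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi                    # hypothetical size; intermediate files and results are small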
4 changes: 2 additions & 2 deletions lab-materials/05/05-05/process_claims.yaml
@@ -1,8 +1,8 @@
 apiVersion: tekton.dev/v1beta1
 kind: PipelineRun
 metadata:
-  # name: process-claims
-  generateName: training-pipeline-
+  name: process-claims
+  #generateName: training-pipeline-
   annotations:
     tekton.dev/output_artifacts: '{"run-a-file": [{"key": "artifacts/$PIPELINERUN/run-a-file/mlpipeline-metrics.tgz",
       "name": "mlpipeline-metrics", "path": "/tmp/mlpipeline-metrics.json"}, {"key":
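
The switch from `generateName` to a fixed `name` changes how this PipelineRun is created. The contrast below is a sketch of standard Kubernetes naming semantics, not something spelled out in the commit:

metadata:
  generateName: training-pipeline-   # every `create` makes a new run with a random suffix
---
metadata:
  name: process-claims               # one predictable run name; re-creating it requires deleting
                                     # the previous PipelineRun first (AlreadyExists otherwise)

A fixed name makes the run easy to reference from the lab instructions, at the cost of a cleanup step between runs.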
59 changes: 59 additions & 0 deletions lab-materials/05/app/db-init-job.yaml
@@ -0,0 +1,59 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: db-init-job
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  template:
    spec:
      initContainers:
      - name: wait-for-db
        image: busybox:1.28
        command: ['sh', '-c', 'until nc -z -v -w30 $POSTGRESQL_DATABASE 5432; do echo "Waiting for database connection..."; sleep 2; done;']
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POSTGRESQL_DATABASE
          value: claimdb.$(NAMESPACE).svc.cluster.local
      containers:
      - name: postgresql
        image: registry.redhat.io/rhel9/postgresql-13:latest
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POSTGRESQL_DATABASE
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-name
        - name: POSTGRESQL_USER
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-user
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-password
        - name: POSTGRESQL_DATABASE_HOST
          value: claimdb.$(NAMESPACE).svc.cluster.local
        command: ["/bin/bash", "-c"]
        args:
        - |
          echo "Running SQL script"
          psql -h $POSTGRESQL_DATABASE_HOST -p 5432 -U $POSTGRESQL_USER -d $POSTGRESQL_DATABASE -f /sql-script/script.sql
        volumeMounts:
        - name: sql-script-volume
          mountPath: /sql-script
      restartPolicy: Never
      volumes:
      - name: sql-script-volume
        configMap:
          name: sql-script-configmap
  backoffLimit: 4
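
This job mounts a ConfigMap named `sql-script-configmap` whose contents are not displayed in this diff (it is listed in `kustomization.yaml` below). A sketch of its expected shape -- the `script.sql` key matches the `/sql-script/script.sql` path the job executes, but the SQL itself is purely hypothetical:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sql-script-configmap
data:
  script.sql: |
    -- hypothetical contents; the real initialization script lives in the repo, not in this diff
    CREATE TABLE IF NOT EXISTS claims (id SERIAL PRIMARY KEY, subject TEXT, body TEXT);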
lab-materials/05/app/deployment-app.yaml
@@ -4,7 +4,7 @@ kind: Deployment
 metadata:
   name: ic-app
   annotations:
-    argocd.argoproj.io/sync-wave: "1"
+    argocd.argoproj.io/sync-wave: "2"
 spec:
   replicas: 1
   selector:
@@ -24,6 +24,10 @@ spec:
         - containerPort: 5000
           protocol: TCP
         env:
+        - name: NAMESPACE
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.namespace
         - name: INFERENCE_SERVER_URL
           value: http://llm.ic-shared-llm.svc.cluster.local:3000/
         - name: MAX_NEW_TOKENS
@@ -39,7 +43,7 @@ spec:
         - name: REPETITION_PENALTY
           value: '1.03'
         - name: POSTGRES_HOST
-          value: claimdb.ic-shared-db.svc.cluster.local
+          value: claimdb.$(NAMESPACE).svc.cluster.local
         - name: POSTGRES_DB
           valueFrom:
             secretKeyRef:
@@ -60,16 +64,16 @@ spec:
         - name: S3_ENDPOINT_URL
           value: http://minio.ic-shared-minio.svc.cluster.local:9000
         - name: IMAGES_BUCKET
-          value: claim-images
+          value: $(NAMESPACE)
         - name: AWS_ACCESS_KEY_ID
           valueFrom:
             secretKeyRef:
-              name: miniocreds
+              name: secret-minio
               key: aws_access_key_id
         - name: AWS_SECRET_ACCESS_KEY
           valueFrom:
             secretKeyRef:
-              name: miniocreds
+              name: secret-minio
               key: aws_secret_access_key
         resources: {}
         terminationMessagePath: /dev/termination-log
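
The recurring `claimdb.$(NAMESPACE).svc.cluster.local` value combines two standard Kubernetes mechanisms: the Downward API injects the pod's own namespace, and `$(VAR)` references expand to env vars declared earlier in the same list. A minimal sketch of the pattern in isolation:

env:
- name: NAMESPACE                # injected from the pod's own metadata (Downward API)
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: POSTGRES_HOST            # $(NAMESPACE) expands because NAMESPACE is declared above it
  value: claimdb.$(NAMESPACE).svc.cluster.local

Declaration order matters: `$(VAR)` only resolves variables defined earlier in the list, which is why `NAMESPACE` always comes first. This is what points each user's app at the database in their own project instead of the shared `ic-shared-db` one -- the point of this per-user-db change.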
75 changes: 75 additions & 0 deletions lab-materials/05/app/deployment-db.yaml
@@ -0,0 +1,75 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: claimdb
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  selector:
    matchLabels:
      app: claimdb
  replicas: 1
  template:
    metadata:
      labels:
        app: claimdb
    spec:
      containers:
      - name: postgresql
        image: registry.redhat.io/rhel9/postgresql-13:latest
        resources:
          limits:
            memory: 512Mi
        readinessProbe:
          exec:
            command:
            - /usr/libexec/check-container
          initialDelaySeconds: 5
          timeoutSeconds: 1
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          exec:
            command:
            - /usr/libexec/check-container
            - '--live'
          initialDelaySeconds: 120
          timeoutSeconds: 10
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
        env:
        - name: POSTGRESQL_USER
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-user
        - name: POSTGRESQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-password
        - name: POSTGRESQL_DATABASE
          valueFrom:
            secretKeyRef:
              name: claimdb
              key: database-name
        securityContext:
          capabilities: {}
          privileged: false
        ports:
        - containerPort: 5432
          protocol: TCP
        imagePullPolicy: IfNotPresent
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: claimdb-data
          mountPath: /var/lib/pgsql/data
      volumes:
      - name: claimdb-data
        persistentVolumeClaim:
          claimName: claimdb
  strategy:
    type: Recreate
12 changes: 9 additions & 3 deletions lab-materials/05/app/kustomization.yaml
@@ -9,8 +9,14 @@ resources:
 # wave 0
 - secret-db.yaml
 - secret-minio.yaml
+- pvc-db.yaml
+- sql-script-configmap.yaml
 # wave 1
-- deployment.yaml
-- service.yaml
-- route.yaml
+- deployment-db.yaml
+- service-db.yaml
+- db-init-job.yaml
+- populate-images.yaml
+# wave 2
+- deployment-app.yaml
+- service-app.yaml
+- route-app.yaml
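
The `# wave N` comments mirror the `argocd.argoproj.io/sync-wave` annotations on the manifests: ArgoCD applies lower waves first and waits for them to be healthy before moving on. Here that means the secrets, the PVC, and the SQL ConfigMap (wave 0) exist before the database and the two jobs (wave 1), which in turn settle before the web app, its service, and its route (wave 2). The annotation pattern, as used throughout these files:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # synced only after all wave 0 resources are healthy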
68 changes: 68 additions & 0 deletions lab-materials/05/app/populate-images.yaml
@@ -0,0 +1,68 @@
---
apiVersion: batch/v1
kind: Job
metadata:
  name: populate-images
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  backoffLimit: 4
  template:
    spec:
      initContainers:
      - name: wait-for-minio
        image: busybox:1.28
        command: ['sh', '-c', 'until nc -z -v -w30 $MINIO_ENDPOINT 9000; do echo "Waiting for Minio connection..."; sleep 2; done;']
        env:
        - name: MINIO_ENDPOINT
          value: minio.ic-shared-minio.svc.cluster.local
      containers:
      - name: add-images-to-bucket
        image: image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/s2i-generic-data-science-notebook:1.2
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash"]
        args:
        - -ec
        - |-
          git clone https://github.com/rh-aiservices-bu/insurance-claim-processing.git
          cat << 'EOF' | python3
          import boto3, os, botocore
          s3 = boto3.client("s3",
                            endpoint_url=os.getenv("AWS_S3_ENDPOINT"),
                            aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
                            aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"))
          # Set bucket
          bucket_name = os.getenv("NAMESPACE")
          # Upload original images to minio
          for filename in os.listdir("insurance-claim-processing/bootstrap/ic-shared-database/images/original_images"):
              with open(f"insurance-claim-processing/bootstrap/ic-shared-database/images/original_images/{filename}", "rb") as f:
                  s3.upload_fileobj(f, bucket_name, f"original_images/{filename}")
          # Upload processed images to minio
          for filename in os.listdir("insurance-claim-processing/bootstrap/ic-shared-database/images/processed_images"):
              with open(f"insurance-claim-processing/bootstrap/ic-shared-database/images/processed_images/{filename}", "rb") as f:
                  s3.upload_fileobj(f, bucket_name, f"processed_images/{filename}")
          EOF
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: AWS_S3_ENDPOINT
          value: http://minio.ic-shared-minio.svc.cluster.local:9000
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: secret-minio
              key: aws_access_key_id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: secret-minio
              key: aws_secret_access_key
      restartPolicy: Never
15 changes: 15 additions & 0 deletions lab-materials/05/app/pvc-db.yaml
@@ -0,0 +1,15 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claimdb
  labels:
    app: claimdb
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
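
A detail that ties `pvc-db.yaml` to `deployment-db.yaml` above: the volume is `ReadWriteOnce`, so the database deployment uses a `Recreate` strategy. A rolling update would briefly schedule two postgresql pods contending for the same RWO volume and data directory; `Recreate` stops the old pod before starting the new one:

spec:
  strategy:
    type: Recreate   # old pod is stopped before the new pod mounts the RWO volume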
lab-materials/05/app/route-app.yaml
@@ -5,6 +5,8 @@ metadata:
   name: ic-app
   labels:
     app: ic-app
+  annotations:
+    argocd.argoproj.io/sync-wave: "2"
 spec:
   to:
     kind: Service
2 changes: 1 addition & 1 deletion lab-materials/05/app/secret-db.yaml
@@ -4,7 +4,7 @@ apiVersion: v1
 metadata:
   name: claimdb
   labels:
-    app: ic-app-db
+    app: claimdb
   annotations:
     argocd.argoproj.io/sync-wave: "0"
 stringData:
2 changes: 1 addition & 1 deletion lab-materials/05/app/secret-minio.yaml
@@ -2,7 +2,7 @@
 kind: Secret
 apiVersion: v1
 metadata:
-  name: miniocreds
+  name: secret-minio
   labels:
     app: ic-app-minio
   annotations:
lab-materials/05/app/service-app.yaml
@@ -6,7 +6,7 @@ metadata:
   labels:
     app: ic-app
   annotations:
-    argocd.argoproj.io/sync-wave: "1"
+    argocd.argoproj.io/sync-wave: "2"
 spec:
   ports:
     - name: http