Commit 492966f

Merge branch 'main' into mfioramo-patch-3

AlexanderHodicke authored Dec 2, 2024
2 parents 05f588e + 478e536 commit 492966f
Showing 21 changed files with 377 additions and 10 deletions.
2 changes: 1 addition & 1 deletion ai/generative-ai-service/README.md
@@ -58,7 +58,7 @@ Reviewed: 13.11.2024

## Reusable Assets Overview
- [Podcast Generator](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/ai-speech/podcast-generator)

- [Decode Images and Videos with OCI GenAI](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/generative-ai-service/decode-Images-and-Videos-with-OCI-GenAI)

# Useful Links

@@ -0,0 +1,98 @@

# Decode Images and Videos with OCI GenAI

This is an AI-powered application designed to unlock insights hidden in media files using Oracle Cloud Infrastructure (OCI) Generative AI services. It analyzes images and videos and generates detailed summaries in multiple languages. Whether you are a content creator, researcher, or media enthusiast, this app helps you interpret visual content with ease.

<img src="./image.png" />

---

## Features

### 🌍 **Multi-Language Support**
- Receive summaries in your preferred language, including:
- English, French, Arabic, Spanish, Italian, German, Portuguese, Japanese, Korean, and Chinese.

### 🎥 **Customizable Frame Processing for Videos**
- Extract video frames at user-defined intervals.
- Analyze specific frame ranges to tailor your results for precision.
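
  The sampling rule can be sketched as follows: one frame is kept every `fps × interval` frames (a hypothetical helper; the app itself does this inside its `extract_frames` function with OpenCV).

  ```python
  def sampled_frame_indices(total_frames: int, fps: int, interval_s: int) -> list[int]:
      """Return indices of frames kept when sampling one frame every interval_s seconds."""
      step = fps * interval_s
      return [i for i in range(total_frames) if i % step == 0]

  # A 10-second clip at 30 fps, sampled every 2 seconds, keeps 5 frames.
  print(sampled_frame_indices(300, 30, 2))  # → [0, 60, 120, 180, 240]
  ```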

### ⚡ **Parallel Processing**
- Analyzes extracted frames concurrently with a thread pool, significantly reducing video processing time.
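
  The per-frame model calls are independent, so they can run concurrently; a minimal sketch with a stand-in for the model request:

  ```python
  from concurrent.futures import ThreadPoolExecutor

  def describe_frame(frame_id: int) -> str:
      # Stand-in for the per-frame vision-model request.
      return f"summary of frame {frame_id}"

  frame_ids = [0, 1, 2, 3]
  with ThreadPoolExecutor() as executor:
      # map() preserves input order even though calls run concurrently.
      summaries = list(executor.map(describe_frame, frame_ids))
  print(summaries[0])  # → summary of frame 0
  ```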

### 🖼️ **Image Analysis**
- Upload images to generate detailed summaries based on your input prompt.
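
  Under the hood the app sends images to the model as base64 `data:` URLs; a minimal sketch of that encoding (the three bytes below stand in for a real JPEG file's contents):

  ```python
  import base64

  jpeg_bytes = b"\xff\xd8\xff"  # placeholder for real JPEG data
  encoded = base64.b64encode(jpeg_bytes).decode("utf-8")
  data_url = f"data:image/jpeg;base64,{encoded}"
  print(data_url)  # → data:image/jpeg;base64,/9j/
  ```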

### 🧠 **Cohesive Summaries**
- Combines individual frame insights to create a seamless, cohesive summary of the video’s overall theme, events, and key details.
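
  A sketch of the combining step, with hypothetical frame summaries (the app does this in its `generate_final_summary` function, handing the joined text to a second model):

  ```python
  frame_summaries = [
      "A cyclist rides along a coastal road.",
      "The cyclist stops at a lighthouse.",
  ]
  # Join the per-frame notes and prepend a summarization instruction.
  final_prompt = (
      "Combine these frame summaries into one cohesive video summary:\n\n"
      + "\n".join(frame_summaries)
  )
  print(final_prompt.splitlines()[-1])  # → The cyclist stops at a lighthouse.
  ```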

---

## Technologies Used
- **[Streamlit](https://streamlit.io/):** For building an interactive user interface.
- **[Oracle Cloud Infrastructure (OCI) Generative AI](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm):** For powerful image and video content analysis.
- **[OpenCV](https://opencv.org/):** For video frame extraction and processing.
- **[Pillow (PIL)](https://pillow.readthedocs.io/):** For image handling and processing.
- **[tqdm](https://tqdm.github.io/):** For progress visualization in parallel processing.

---

## Installation

1. **Clone the repository:**
   Run `git clone https://github.com/oracle-devrel/technology-engineering.git`, then change into `ai/generative-ai-service/decode-Images-and-Videos-with-OCI-GenAI`.

2. **Install dependencies:**
Make sure you have Python 3.8+ installed. Then, install the required libraries:
```bash
pip install -r requirements.txt
```

3. **Configure OCI:**
- Set up your OCI configuration by creating or updating the `~/.oci/config` file with your credentials and profile.
- Replace placeholders like `compartmentId`, `llm_service_endpoint`, and `visionModel` in the code with your actual values.
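
   A minimal `~/.oci/config` sketch (every value below is a placeholder; the region matches the Chicago inference endpoint used in the code):

   ```ini
   [DEFAULT]
   user=ocid1.user.oc1..<your-user-ocid>
   fingerprint=<your-api-key-fingerprint>
   key_file=~/.oci/oci_api_key.pem
   tenancy=ocid1.tenancy.oc1..<your-tenancy-ocid>
   region=us-chicago-1
   ```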

---

## Usage

1. **Run the application:**
```bash
streamlit run app.py
```

2. **Upload a file:**
- Use the sidebar to upload an image (`.png`, `.jpg`, `.jpeg`) or a video (`.mp4`, `.avi`, `.mov`).

3. **Set parameters:**
- For videos, adjust the frame extraction interval and select specific frame ranges for analysis.

4. **Analyze and summarize:**
- Enter a custom prompt to guide the AI in generating a meaningful summary.
- Choose the output language from the sidebar.

5. **Get results:**
- View detailed image summaries or cohesive video summaries directly in the app.

---

## Screenshots
### Image Analysis
<img src="./image2.png" />

### Video Analysis
<img src="./image3.png" />

---


## Acknowledgments
- Oracle Cloud Infrastructure Generative AI for enabling state-of-the-art visual content analysis.
- Open-source libraries like OpenCV, Pillow, and Streamlit for providing powerful tools to build this application.

---

## Contact
If you have questions or feedback, feel free to reach out via [[email protected]](mailto:[email protected]).
@@ -0,0 +1,203 @@
# Author: Ansh
import streamlit as st
import oci
import base64
import cv2
from PIL import Image
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# OCI Configuration
compartmentId = "ocid1.compartment.oc1..XXXXXXXXXXXXXxxxxxxxxxxxxxxxxxxxxxxxxxxx"
llm_service_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
CONFIG_PROFILE = "DEFAULT"
visionModel = "meta.llama-3.2-90b-vision-instruct"
summarizeModel = "cohere.command-r-plus-08-2024"
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)
llm_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=llm_service_endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240)
)

# Functions for Image Analysis
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Functions for Video Analysis
def encode_cv2_image(frame):
    _, buffer = cv2.imencode('.jpg', frame)
    return base64.b64encode(buffer).decode("utf-8")

# Common Functions
def get_message(encoded_image=None, user_prompt=None):
    content1 = oci.generative_ai_inference.models.TextContent()
    content1.text = user_prompt

    message = oci.generative_ai_inference.models.UserMessage()
    message.content = [content1]

    if encoded_image:
        content2 = oci.generative_ai_inference.models.ImageContent()
        image_url = oci.generative_ai_inference.models.ImageUrl()
        image_url.url = f"data:image/jpeg;base64,{encoded_image}"
        content2.image_url = image_url
        message.content.append(content2)
    return message

def get_chat_request(encoded_image=None, user_prompt=None):
    chat_request = oci.generative_ai_inference.models.GenericChatRequest()
    chat_request.messages = [get_message(encoded_image, user_prompt)]
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC
    chat_request.num_generations = 1
    chat_request.is_stream = False
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = -1
    chat_request.frequency_penalty = 1.0
    return chat_request

def cohere_chat_request(encoded_image=None, user_prompt=None):
    chat_request = oci.generative_ai_inference.models.CohereChatRequest()
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_COHERE
    message = get_message(encoded_image, user_prompt)
    chat_request.message = message.content[0].text
    chat_request.is_stream = False
    # lang_type is the output language selected in the sidebar (module-level global).
    chat_request.preamble_override = "Make sure you answer only in " + lang_type
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    chat_request.top_p = 0.7
    chat_request.top_k = 0
    chat_request.frequency_penalty = 1.0
    return chat_request


def get_chat_detail(chat_request, model):
    chat_detail = oci.generative_ai_inference.models.ChatDetails()
    chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id=model)
    chat_detail.compartment_id = compartmentId
    chat_detail.chat_request = chat_request
    return chat_detail

def extract_frames(video_path, interval=1):
    frames = []
    cap = cv2.VideoCapture(video_path)
    frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
    success, frame = cap.read()
    count = 0

    while success:
        # Keep one frame every `interval` seconds of video.
        if count % (frame_rate * interval) == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        success, frame = cap.read()
        count += 1
    cap.release()
    return frames

def process_frame(llm_client, frame, prompt):
    encoded_image = encode_cv2_image(frame)
    try:
        llm_request = get_chat_request(encoded_image, prompt)
        llm_payload = get_chat_detail(llm_request, visionModel)
        llm_response = llm_client.chat(llm_payload)
        return llm_response.data.chat_response.choices[0].message.content[0].text
    except Exception as e:
        return f"Error processing frame: {str(e)}"

def process_frames_parallel(llm_client, frames, prompt):
    with ThreadPoolExecutor() as executor:
        results = list(tqdm(
            executor.map(lambda frame: process_frame(llm_client, frame, prompt), frames),
            total=len(frames),
            desc="Processing frames"
        ))
    return results

def generate_final_summary(llm_client, frame_summaries):
    combined_summaries = "\n".join(frame_summaries)
    final_prompt = (
        "You are a video content summarizer. Below are summaries of individual frames extracted from a video. "
        "Using these frame summaries, create a cohesive and concise summary that describes the content of the video as a whole. "
        "Focus on providing insights about the overall theme, events, or key details present in the video, and avoid referring to individual frames or images explicitly.\n\n"
        f"{combined_summaries}"
    )
    try:
        llm_request = cohere_chat_request(user_prompt=final_prompt)
        llm_payload = get_chat_detail(llm_request, summarizeModel)
        llm_response = llm_client.chat(llm_payload)
        return llm_response.data.chat_response.text
    except Exception as e:
        return f"Error generating final summary: {str(e)}"

# Streamlit UI
st.title("Decode Images and Videos with OCI GenAI")
uploaded_file = st.sidebar.file_uploader("Upload an image or video", type=["png", "jpg", "jpeg", "mp4", "avi", "mov"])
user_prompt = st.text_input("Enter your prompt for analysis:", value="Describe the content of this image.")
lang_type = st.sidebar.selectbox("Output Language", ["English", "French", "Arabic", "Spanish", "Italian", "German", "Portuguese", "Japanese", "Korean", "Chinese"])

if uploaded_file:
    if uploaded_file.name.split('.')[-1].lower() in ["png", "jpg", "jpeg"]:
        # Image Analysis
        temp_image_path = "temp_uploaded_image.jpg"
        with open(temp_image_path, "wb") as f:
            f.write(uploaded_file.getbuffer())

        st.image(temp_image_path, caption="Uploaded Image", width=500)

        if st.button("Generate Image Summary"):
            with st.spinner("Analyzing the image..."):
                try:
                    encoded_image = encode_image(temp_image_path)
                    llm_request = get_chat_request(encoded_image, user_prompt)
                    llm_payload = get_chat_detail(llm_request, visionModel)
                    llm_response = llm_client.chat(llm_payload)
                    llm_text = llm_response.data.chat_response.choices[0].message.content[0].text
                    st.success("OCI GenAI Response:")
                    st.write(llm_text)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")
    elif uploaded_file.name.split('.')[-1].lower() in ["mp4", "avi", "mov"]:
        # Video Analysis
        temp_video_path = "temp_uploaded_video.mp4"
        # Save the upload first; reading the file before writing it
        # would fail on the first run.
        with open(temp_video_path, "wb") as f:
            f.write(uploaded_file.getbuffer())

        video_html = f"""
        <video width="600" height="300" controls>
            <source src="data:video/mp4;base64,{base64.b64encode(open(temp_video_path, 'rb').read()).decode()}" type="video/mp4">
            Your browser does not support the video tag.
        </video>
        """
        st.markdown(video_html, unsafe_allow_html=True)
        st.write("Processing the video...")

        frame_interval = st.sidebar.slider("Frame extraction interval (seconds)", 1, 10, 1)
        frames = extract_frames(temp_video_path, interval=frame_interval)
        num_frames = len(frames)
        st.write(f"Total frames extracted: {num_frames}")

        frame_range = st.sidebar.slider("Select frame range for analysis", 0, num_frames - 1, (0, num_frames - 1))

        if st.button("Generate Video Summary"):
            with st.spinner("Analyzing selected frames..."):
                try:
                    selected_frames = frames[frame_range[0]:frame_range[1] + 1]
                    waiting_message = st.empty()
                    waiting_message.write(f"Selected {len(selected_frames)} frames for processing.")
                    frame_summaries = process_frames_parallel(llm_client, selected_frames, user_prompt)
                    waiting_message.write("Generating final video summary...")
                    final_summary = generate_final_summary(llm_client, frame_summaries)
                    waiting_message.empty()
                    st.success("Video Summary:")
                    st.write(final_summary)
                except Exception as e:
                    st.error(f"An error occurred: {str(e)}")
@@ -0,0 +1,5 @@
streamlit==1.33.0
oci==3.50.1
Pillow
opencv-python-headless==4.10.0.84
tqdm==4.66.1
@@ -0,0 +1,35 @@
Copyright (c) 2024 Oracle and/or its affiliates.

The Universal Permissive License (UPL), Version 1.0

Subject to the condition set forth below, permission is hereby granted to any
person obtaining a copy of this software, associated documentation and/or data
(collectively the "Software"), free of charge and under any and all copyright
rights in the Software, and any and all patent rights owned or freely
licensable by each licensor hereunder covering either (i) the unmodified
Software as contributed to or provided by such licensor, or (ii) the Larger
Works (as defined below), to deal in both

(a) the Software, and
(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
one is included with the Software (each a "Larger Work" to which the Software
is contributed by such licensors),

without restriction, including without limitation the rights to copy, create
derivative works of, display, perform, and distribute the Software and make,
use, sell, offer for sale, import, export, have made, and have sold the
Software and the Larger Work(s), and to sublicense the foregoing rights on
either these or other terms.

This license is subject to the following condition:
The above copyright notice and either this complete permission notice or at
a minimum a reference to the UPL must be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,26 @@
# OIC Adapters clickthrough presentations

Assets that contain OIC adapter configuration and implementation practices for the following adapters:
- EPM Adapter
- ERP Adapter
- HCM Adapter
- Kafka Adapter
- EBS Adapter

Review Date: 28.11.2024

# When to use these assets?

Use these assets whenever you design integration solutions involving the applications and resources listed above.

# How to use these assets?

The information is generic in nature and not tailored to a particular customer.

# License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.