YouTube transcript support; Style changes; dependabot
elegiggle committed Apr 16, 2024
1 parent d244e58 commit d82b0df
Showing 5 changed files with 114 additions and 30 deletions.
19 changes: 19 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,19 @@
version: 2
updates:
  - package-ecosystem: github-actions
    directory: "/"
    schedule:
      interval: daily
    open-pull-requests-limit: 10

  - package-ecosystem: docker
    directory: "/"
    schedule:
      interval: daily
    open-pull-requests-limit: 10

  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: daily
    open-pull-requests-limit: 10
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
FROM python:3.12.2-slim
FROM python:3.12.3-slim

WORKDIR /app

40 changes: 20 additions & 20 deletions README.md
@@ -1,11 +1,10 @@
<p align="center">
<a href="https://github.com/Elehiggle/Claude3MattermostChatbot/stargazers"><img src="https://img.shields.io/github/stars/Elehiggle/Claude3MattermostChatbot?style=flat-square" alt="GitHub Repo stars"></a>
<a href="https://github.com/Elehiggle/Claude3MattermostChatbot/actions/workflows/docker-publish.yml"><img src="https://img.shields.io/github/actions/workflow/status/Elehiggle/Claude3MattermostChatbot/docker-publish.yml?branch=master&label=build&logo=github&style=flat-square" alt="GitHub Actions Workflow Status"></a>
<a href="https://hub.docker.com/r/elehiggle/claude3mattermostchatbot"><img src="https://img.shields.io/docker/stars/elehiggle/claude3mattermostchatbot.svg?style=flat-square&logo=docker" alt="Docker Stars"></a>
<a href="https://hub.docker.com/r/elehiggle/claude3mattermostchatbot"><img src="https://img.shields.io/docker/pulls/elehiggle/claude3mattermostchatbot.svg?style=flat-square&logo=docker" alt="Docker Pulls"></a>
<a href="https://github.com/Elehiggle/Claude3MattermostChatbot/commits/master"><img src="https://img.shields.io/github/last-commit/Elehiggle/Claude3MattermostChatbot?style=flat-square" alt="GitHub last commit"></a>
<a href="https://github.com/Elehiggle/Claude3MattermostChatbot/blob/master/LICENSE"><img src="https://img.shields.io/github/license/Elehiggle/Claude3MattermostChatbot?style=flat-square" alt="GitHub License"></a>
</p>
[![GitHub Repo stars](https://img.shields.io/github/stars/Elehiggle/Claude3MattermostChatbot?style=flat-square)](https://github.com/Elehiggle/Claude3MattermostChatbot/stargazers)
[![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/Elehiggle/Claude3MattermostChatbot/docker-publish.yml?branch=master&label=build&logo=github&style=flat-square)](https://github.com/Elehiggle/Claude3MattermostChatbot/actions/workflows/docker-publish.yml)
[![Docker Stars](https://img.shields.io/docker/stars/elehiggle/claude3mattermostchatbot.svg?style=flat-square&logo=docker)](https://hub.docker.com/r/elehiggle/claude3mattermostchatbot)
[![Docker Pulls](https://img.shields.io/docker/pulls/elehiggle/claude3mattermostchatbot.svg?style=flat-square&logo=docker)](https://hub.docker.com/r/elehiggle/claude3mattermostchatbot)
[![GitHub last commit](https://img.shields.io/github/last-commit/Elehiggle/Claude3MattermostChatbot?style=flat-square)](https://github.com/Elehiggle/Claude3MattermostChatbot/commits/master)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/2a60f2fb1c0d4e53922aa79f7204dac4)](https://app.codacy.com/gh/Elehiggle/Claude3MattermostChatbot/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[![GitHub License](https://img.shields.io/github/license/Elehiggle/Claude3MattermostChatbot?style=flat-square)](https://github.com/Elehiggle/Claude3MattermostChatbot/blob/master/LICENSE)

# Claude3MattermostChatbot

@@ -18,6 +17,7 @@ This project is a chatbot for Mattermost that integrates with the Anthropic API
- Responds to messages mentioning "@chatbot" (or rather the chatbot's username) or direct messages
- Extracts text content from links shared in the messages
- Supports the **Vision API** for describing images provided as URLs within the chat message
- Gets transcripts of YouTube videos for easy tl;dw summarizations
- Maintains context of the conversation within a thread
- Sends typing indicators to show that the chatbot is processing the message
- Utilizes a thread pool to handle multiple requests concurrently (due to `mattermostdriver-asyncio` being outdated)
@@ -35,20 +35,20 @@ This project is a chatbot for Mattermost that integrates with the Anthropic API

1. Clone the repository:

```bash
git clone https://github.com/Elehiggle/Claude3MattermostChatbot.git
cd Claude3MattermostChatbot
```
```bash
git clone https://github.com/Elehiggle/Claude3MattermostChatbot.git
cd Claude3MattermostChatbot
```

2. Install the required dependencies:

```bash
pip3 install -r requirements.txt
```
_or alternatively:_
```bash
python3.12 -m pip install anthropic mattermostdriver ssl certifi beautifulsoup4 pillow httpx
```
```bash
pip3 install -r requirements.txt
```
_or alternatively:_
```bash
python3.12 -m pip install anthropic mattermostdriver ssl certifi beautifulsoup4 pillow httpx youtube-transcript-api
```

3. Set the following environment variables with your own values (most are optional):

@@ -62,7 +62,7 @@ python3.12 -m pip install anthropic mattermostdriver ssl certifi beautifulsoup4
| `MATTERMOST_PASSWORD` | Required if not using token. The password of the dedicated Mattermost user account for the chatbot (if using username/password login) |
| `MATTERMOST_MFA_TOKEN` | The MFA token of the dedicated Mattermost user account for the chatbot (if using MFA) |
#### Extended optional configuration variables:
### Extended optional configuration variables:
| Parameter | Description |
|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
80 changes: 72 additions & 8 deletions chatbot.py
@@ -16,6 +16,7 @@
import httpx
from io import BytesIO
from PIL import Image
from youtube_transcript_api import YouTubeTranscriptApi

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@@ -43,7 +44,7 @@ def cdc(*args, **kwargs):
temperature = float(os.getenv("TEMPERATURE", "0.15"))
system_prompt_unformatted = os.getenv(
"AI_SYSTEM_PROMPT",
"You are a helpful assistant. The current UTC time is {current_time}. Whenever users asks you for help you will provide them with succinct answers formatted using Markdown; do not unnecessarily greet people with their name. Do not be apologetic. You know the user's name as it is provided within [CONTEXT, from:username] bracket at the beginning of a user-role message. Never add any CONTEXT bracket to your replies (eg. [CONTEXT, from:{chatbot_username}]). The CONTEXT bracket may also include grabbed text from a website if a user adds a link to his question.",
"You are a helpful assistant. The current UTC time is {current_time}. Whenever users asks you for help you will provide them with succinct answers formatted using Markdown; do not unnecessarily greet people with their name. Do not be apologetic. You know the user's name as it is provided within [CONTEXT, from:username] bracket at the beginning of a user-role message. Never add any CONTEXT bracket to your replies (eg. [CONTEXT, from:{chatbot_username}]). The CONTEXT bracket may also include grabbed text from a website if a user adds a link to his question. Users may post YouTube links for which you will get the transcript, in your answer DO NOT don't contain the link to the video the user just provided to you as he already knows it.",
)

# Mattermost server details
@@ -81,7 +82,7 @@ def cdc(*args, **kwargs):

# Chatbot account username, automatically fetched
chatbot_username = ""
chatbot_usernameAt = ""
chatbot_username_at = ""

# Create an AI client instance
ai_client = Anthropic(api_key=api_key, base_url=ai_api_baseurl)
@@ -94,7 +95,6 @@ def get_system_instructions():
current_time = datetime.datetime.now(datetime.UTC).strftime("%Y-%m-%d %H:%M:%S.%f")[
:-3
]
global chatbot_username
return system_prompt_unformatted.format(
current_time=current_time, chatbot_username=chatbot_username
)
@@ -183,7 +183,7 @@ def split_message(msg, max_length=4000):
return [msg]

if len(msg) > 40000:
raise Exception(f"Message too long.")
raise Exception("Message too long.")

current_chunk = "" # Holds the current message chunk
chunks = [] # Collects all message chunks
@@ -274,6 +274,9 @@ def handle_text_generation(
response_text = response_text.replace(link, "")
response_text = response_text.strip()

# Failsafe: Remove all empty Markdown links
response_text = re.sub(r"\[.*?]\(\)", "", response_text).strip()

# Split the response into multiple messages if necessary
response_parts = split_message(response_text)
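
Review note on the failsafe above: the lazy pattern `\[.*?]\(\)` starts matching at the first `[` it can, so when a valid link precedes an empty one on the same line, the match may swallow both. A minimal sketch of the behavior follows. `EMPTY_LINK_STRICT` and `remove_empty_markdown_links` are hypothetical names that do not appear in the commit, and the stricter pattern is only one possible alternative, not the author's method.

```python
import re

# Pattern as used in the commit: lazy match between "[" and "]()".
EMPTY_LINK_LAZY = re.compile(r"\[.*?]\(\)")

# Hypothetical stricter variant (not in the commit):
# the link text itself may not contain "]".
EMPTY_LINK_STRICT = re.compile(r"\[[^\]]*]\(\)")


def remove_empty_markdown_links(text, pattern=EMPTY_LINK_STRICT):
    """Drop Markdown links whose target is empty, such as "[video]()"."""
    return pattern.sub("", text).strip()


if __name__ == "__main__":
    sample = "[docs](http://x) and [video]()"
    # Compare how the two patterns treat a valid link followed by an empty one.
    print(repr(remove_empty_markdown_links(sample, EMPTY_LINK_LAZY)))
    print(repr(remove_empty_markdown_links(sample, EMPTY_LINK_STRICT)))
```

Both patterns behave the same when the empty link is the only bracketed text on the line; they differ only when earlier brackets are present.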

@@ -335,7 +338,7 @@ async def message_handler(event):
return

# Remove the "@chatbot" mention from the message
message = post["message"].replace(chatbot_usernameAt, "").strip()
message = post["message"].replace(chatbot_username_at, "").strip()
channel_id = post["channel_id"]
sender_name = sanitize_username(event_data["data"]["sender_name"])
root_id = post["root_id"] # Get the root_id of the thread
@@ -383,7 +386,7 @@ async def message_handler(event):

# Add the current message to the messages array if "@chatbot" is mentioned, the chatbot has already been invoked in the thread or its a DM
if (
chatbot_usernameAt in post["message"]
chatbot_username_at in post["message"]
or chatbot_invoked
or channel_display_name.startswith("@")
):
@@ -400,6 +403,11 @@ async def message_handler(event):
logging.info(f"Skipping local URL: {link}")
continue
try:
if yt_is_valid_url(link):
transcript_text = yt_get_transcript(link)
extracted_text += transcript_text
continue

with client.stream(
"GET", link, timeout=4, follow_redirects=True
) as response:
@@ -576,13 +584,69 @@ async def message_handler(event):
logging.error(f"Error message_handler: {str(e)} {traceback.format_exc()}")


def yt_find_preferred_transcript(video_id):
    transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

    # Define the preferred order of transcript types and languages
    preferred_order = [
        ("manual", "en"),
        ("manual", None),
        ("generated", "en"),
        ("generated", None),
    ]

    # Convert the TranscriptList to a regular list
    transcripts = list(transcript_list)

    # Sort the transcripts based on the preferred order
    transcripts.sort(
        key=lambda t: (
            preferred_order.index((t.is_generated, t.language_code))
            if (t.is_generated, t.language_code) in preferred_order
            else len(preferred_order)
        )
    )

    # Return the first transcript in the sorted list
    return transcripts[0] if transcripts else None
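
Review note on the sort key above: `preferred_order` holds string tags (`"manual"`, `"generated"`), while the key builds tuples from `t.is_generated`, which is a boolean in `youtube-transcript-api`. The tuples therefore never match an entry, every transcript gets the same rank, and the sort leaves the list in its original order. The sketch below shows one way to express the intended preference (manual before generated, English before other languages) with a plain tuple key. `FakeTranscript`, `preference_rank`, and `pick_preferred` are hypothetical names used for illustration; the stand-in class only mimics the two attributes the key reads.

```python
from dataclasses import dataclass


@dataclass
class FakeTranscript:
    """Stand-in exposing only the two attributes the sort key reads."""

    is_generated: bool
    language_code: str


def preference_rank(transcript, preferred_language="en"):
    # False sorts before True, so manual transcripts come first,
    # and within each group the preferred language comes first.
    return (
        transcript.is_generated,
        transcript.language_code != preferred_language,
    )


def pick_preferred(transcripts):
    """Return the best-ranked transcript, or None for an empty input."""
    return min(transcripts, key=preference_rank, default=None)


if __name__ == "__main__":
    candidates = [
        FakeTranscript(is_generated=True, language_code="en"),
        FakeTranscript(is_generated=False, language_code="de"),
        FakeTranscript(is_generated=False, language_code="en"),
    ]
    print(pick_preferred(candidates))
```

Using `min` avoids sorting the whole list when only the first element is needed.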


def yt_extract_video_id(url):
    pattern = r"(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/|youtube\.com/shorts/)([^\"&?/\s]{11})"
    match = re.search(pattern, url)
    return match.group(1) if match else None


def yt_get_transcript(url):
    try:
        video_id = yt_extract_video_id(url)
        preferred_transcript = yt_find_preferred_transcript(video_id)

        if preferred_transcript:
            transcript = preferred_transcript.fetch()
            return str(transcript)
    except Exception as e:
        logging.info(f"YouTube Transcript Exception: {str(e)}")

    return (
        "*COULD NOT FETCH THE VIDEO TRANSCRIPT FOR THE CHATBOT, WARN THE CHATBOT USER*"
    )


def yt_is_valid_url(url):
    # Pattern to match various YouTube URL formats including video IDs
    pattern = r"(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/|youtube\.com/shorts/)([^\"&?/\s]{11})"
    match = re.search(pattern, url)
    return bool(match)  # True if match found, False otherwise
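
Review note: `yt_extract_video_id` and `yt_is_valid_url` repeat the same regular expression. A possible consolidation, sketched here under the assumption that both should always agree, compiles the pattern once and derives the validity check from the extraction. `YT_URL_PATTERN`, `extract_video_id`, and `is_valid_url` are hypothetical names; the pattern itself is copied unchanged from the commit.

```python
import re

# Same pattern as in the commit, compiled once and shared.
YT_URL_PATTERN = re.compile(
    r"(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)"
    r"|youtu\.be/|youtube\.com/shorts/)([^\"&?/\s]{11})"
)


def extract_video_id(url):
    """Return the 11-character video ID, or None if the URL does not match."""
    match = YT_URL_PATTERN.search(url)
    return match.group(1) if match else None


def is_valid_url(url):
    """A URL counts as valid exactly when a video ID can be extracted."""
    return extract_video_id(url) is not None


if __name__ == "__main__":
    for url in (
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        "https://youtu.be/dQw4w9WgXcQ",
        "https://www.youtube.com/shorts/dQw4w9WgXcQ",
        "https://example.com/",
    ):
        print(url, extract_video_id(url))
```

Deriving the check from the extraction keeps the two functions from drifting apart if the pattern is later extended.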


def main():
try:
# Log in to the Mattermost server
driver.login()
global chatbot_username, chatbot_usernameAt
global chatbot_username, chatbot_username_at
chatbot_username = driver.client.username
chatbot_usernameAt = f"@{chatbot_username}"
chatbot_username_at = f"@{chatbot_username}"

logging.info(f"SYSTEM PROMPT: {get_system_instructions()}")

3 changes: 2 additions & 1 deletion requirements.txt
@@ -3,4 +3,5 @@ mattermostdriver
certifi
beautifulsoup4
pillow
httpx
httpx
youtube-transcript-api
