Zendesk Ticket Embedding and Similarity Search

Overview

This application analyzes and searches Zendesk support tickets by meaning rather than by exact wording. It generates a vector embedding for each ticket's content and stores it in Google BigQuery, which enables semantic similarity searches that go beyond simple keyword matching.

Key Features

- Generate embeddings for Zendesk ticket content using a local LLM (large language model)
- Store and manage embeddings in BigQuery
- Perform semantic similarity searches across the ticket database
- Support both batch processing and real-time embedding generation
- Deploy flexibly, either as a local development server or as a cloud function

Why This Matters

Support teams often need to find similar past tickets to:

- Identify recurring issues
- Apply previously successful solutions
- Understand patterns in customer inquiries
- Improve response consistency and efficiency

Traditional keyword-based searches can miss contextually similar tickets. This system uses embeddings to capture the semantic meaning of ticket content, enabling more intelligent and relevant search results.

Project Structure

BQ_Embeddings/
├── .env                        # Environment variables configuration
├── .gitignore                 # Git ignore file
├── bq_embedding.py            # Batch processing for BigQuery ticket data
├── embed_cloud_fn.py          # Cloud Function for embedding generation
├── get_embedding.py           # Core embedding generation utility
├── get_embeddings_local_llm.py # Local LLM integration for embeddings
├── main.py                    # Flask application for local development
├── similarity_search.py       # Similarity search implementation
├── looker-tickets_zendesk_schema.md # BigQuery schema documentation
└── archive/                   # Archived documentation and legacy code

Core Components

Embedding Generation
- get_embedding.py: Core utility for generating embeddings
- embed_cloud_fn.py: Cloud Function implementation
- main.py: Local development server
- Uses LM Studio with the All-MiniLM-L6-v2-Embedding-GGUF model

Data Processing
- bq_embedding.py: Batch processes tickets in BigQuery
- get_embeddings_local_llm.py: Local LLM integration
- Handles missing embeddings and updates
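
The "missing embeddings" step can be pictured as a filter followed by batching. The sketch below is only an illustration, not the logic in bq_embedding.py: the `TicketRow` shape, the helper names, and the batch size are all assumptions, and the rows are hard-coded instead of being read from BigQuery.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, TypedDict


class TicketRow(TypedDict):
    """Hypothetical row shape with a ticket ID and an optional embedding."""
    ticket_id: int
    embeddings: Optional[List[float]]


def tickets_missing_embeddings(rows: Iterable[TicketRow]) -> List[int]:
    """Return the IDs of tickets whose embedding is None or an empty list."""
    return [row["ticket_id"] for row in rows if not row["embeddings"]]


def batched(items: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hard-coded sample rows (assumption: real rows would come from BigQuery).
rows: List[TicketRow] = [
    {"ticket_id": 101, "embeddings": [0.1, 0.2]},
    {"ticket_id": 102, "embeddings": None},
    {"ticket_id": 103, "embeddings": []},
    {"ticket_id": 104, "embeddings": None},
]

pending = tickets_missing_embeddings(rows)
for batch in batched(pending, batch_size=2):
    print(batch)
```

Working in fixed-size batches keeps each round trip to the embedding server small, which makes a failed batch cheap to retry.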

Similarity Search
- similarity_search.py: Implements cosine similarity search
- Uses BigQuery for efficient vector comparisons
- Configurable similarity thresholds
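
To make the metric concrete, here is a minimal pure-Python sketch of cosine similarity with a threshold filter. It illustrates the math only; the project itself runs the comparison in BigQuery, and the function names and the default threshold of 0.8 are assumptions chosen for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    candidates: Dict[int, Sequence[float]],
    threshold: float = 0.8,  # assumed default, not taken from the project
) -> List[Tuple[int, float]]:
    """Return (ticket_id, score) pairs at or above threshold, best first."""
    scored = [(tid, cosine_similarity(query, vec)) for tid, vec in candidates.items()]
    matches = [(tid, score) for tid, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

Lowering the threshold returns more, looser matches; raising it keeps only tickets whose embeddings point in nearly the same direction as the query.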

BigQuery Architecture

The system uses the following BigQuery structure:

Project: looker-tickets
└── Dataset: zendesk
    ├── conversations_complete    # Stores ticket content and embeddings
    ├── similarity_search_results # Temporary results storage
    └── Additional tables for analytics and processing

Prerequisites

Python Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install openai google-cloud-bigquery flask requests

LM Studio
- Download and install LM Studio from the official website
- Load the "All-MiniLM-L6-v2-Embedding-GGUF" model
- Start the local server (default port: 1234)
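
Once the server is up, an embedding can be requested over HTTP. The sketch below assumes LM Studio's OpenAI-compatible `/v1/embeddings` route and uses only the standard library; the project's own scripts may use the `openai` client instead. The model identifier is also an assumption: use whatever name LM Studio displays for the loaded model.

```python
import json
import urllib.request
from typing import Any, Dict, List

LM_STUDIO_URL = "http://localhost:1234"  # default port from the step above
MODEL_NAME = "All-MiniLM-L6-v2-Embedding-GGUF"  # assumption: match LM Studio's identifier


def build_embedding_request(
    text: str, base_url: str = LM_STUDIO_URL, model: str = MODEL_NAME
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style embeddings route."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding_response(payload: Dict[str, Any]) -> List[float]:
    """Extract the first embedding vector from an OpenAI-style response."""
    return [float(value) for value in payload["data"][0]["embedding"]]


def get_embedding(text: str, timeout: float = 30.0) -> List[float]:
    """Send the request to the local server and return the embedding."""
    request = build_embedding_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embedding_response(json.load(response))
```

With the server running, `get_embedding("Customer cannot log in")` should return a list of floats.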

ngrok
- Install ngrok
- Set up a tunnel to LM Studio:
```
ngrok http 1234
```
- Note the generated URL (e.g., https://your-tunnel.ngrok.io)

Google Cloud Setup

Install the Google Cloud SDK, then authenticate:
```
gcloud auth application-default login
```

Set your project:

gcloud config set project your-project-id

BigQuery Setup

Create Dataset

CREATE SCHEMA IF NOT EXISTS `your-project.zendesk`;

Create Tables

-- Table for storing conversations with embeddings
CREATE TABLE IF NOT EXISTS `your-project.zendesk.conversations_complete` (
  ticket_id INT64,
  embeddings ARRAY<FLOAT64>
);

-- Table for similarity search results (created automatically by the script)
-- your-project.zendesk.similarity_search_results

Configuration

Environment Variables

Create a .env file:

GOOGLE_CLOUD_PROJECT=your-project-id
LM_STUDIO_URL=http://localhost:1234
NGROK_URL=your-ngrok-url
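
A script could read these values along the following lines. This is a hedged sketch using only the standard library, not the loader the project uses: the helper names are hypothetical, real environment variables are assumed to take precedence over the file, and only `LM_STUDIO_URL` is given a default (the port stated earlier).

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = ("GOOGLE_CLOUD_PROJECT", "NGROK_URL")
DEFAULTS = {"LM_STUDIO_URL": "http://localhost:1234"}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(
    env_path: str = ".env", environ: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Merge defaults, the .env file, and the process environment (highest priority)."""
    settings = dict(DEFAULTS)
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            settings.update(parse_env_text(handle.read()))
    for key in (*REQUIRED_KEYS, *DEFAULTS):
        if key in environ:
            settings[key] = environ[key]
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return settings
```

Failing fast on a missing key surfaces configuration mistakes at startup instead of at the first BigQuery or ngrok call.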

Running the Application

1. Generate Embeddings

Start Local Services

# Start LM Studio and ensure it's running on port 1234
# Start ngrok tunnel
ngrok http 1234

Update URLs
- Copy your ngrok URL
- Update embed_cloud_fn.py and main.py with the new URL

Run Embedding Generation

# For local development
python main.py

# For cloud function (if deployed)
curl -X POST https://your-cloud-function-url/get-embedding \
  -H "Content-Type: application/json" \
  -d '{"text": "your text here"}'
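
The same request can be sent from Python. The sketch below mirrors the curl call using only the standard library; the URL is the placeholder from above, the function names are hypothetical, and the shape of the JSON response is not documented here, so the parsed body is returned as-is.

```python
import json
import urllib.request
from typing import Any

# Placeholder taken from the curl example; replace with the deployed URL.
FUNCTION_URL = "https://your-cloud-function-url/get-embedding"


def build_function_request(text: str, url: str = FUNCTION_URL) -> urllib.request.Request:
    """Build the JSON POST request that the curl example sends."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_embedding_function(text: str, url: str = FUNCTION_URL, timeout: float = 30.0) -> Any:
    """POST the text to the function and return the decoded JSON response."""
    with urllib.request.urlopen(build_function_request(text, url), timeout=timeout) as response:
        return json.load(response)
```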

2. Execute Similarity Search

Start a New Search
```
python similarity_search.py
```
This will:
- Generate an embedding for your input text
- Create or update the similarity_search_results table
- Start the comparison job

Check Results
```
python similarity_search.py check
```
This will show:
- Total matching tickets
- Average similarity score
- Minimum and maximum scores
- List of all matching tickets
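
The summary figures listed above are simple aggregates over the matching rows. As a rough illustration, assuming each match is a hypothetical `(ticket_id, score)` pair, they could be computed like this:

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def summarize_matches(matches: List[Tuple[int, float]]) -> Dict[str, Optional[float]]:
    """Return the count plus average, minimum, and maximum similarity score."""
    if not matches:
        return {"total": 0, "average": None, "minimum": None, "maximum": None}
    scores = [score for _, score in matches]
    return {
        "total": len(scores),
        "average": mean(scores),
        "minimum": min(scores),
        "maximum": max(scores),
    }


# Made-up sample scores, for illustration only.
summary = summarize_matches([(101, 1.0), (102, 0.75), (103, 0.5)])
print(summary)
```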

Troubleshooting

LM Studio Connection Issues
- Verify LM Studio is running (http://localhost:1234)
- Check ngrok tunnel status
- Verify URLs in configuration

BigQuery Issues
- Verify Google Cloud authentication
- Check project and dataset permissions
- Validate table schemas

Common Error Messages
- "Failed to get embeddings": Check LM Studio and ngrok connection
- "Permission denied": Verify Google Cloud credentials
- "Table not found": Ensure BigQuery tables are created
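
A quick way to rule out connection problems is to test whether anything is listening at the LM Studio or ngrok address. The helper below is a hypothetical sketch: it confirms only that a TCP connection can be opened, not that a model is loaded or that the endpoint answers correctly.

```python
import socket
from urllib.parse import urlparse


def is_service_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's standard port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `is_service_reachable("http://localhost:1234")` returning `False` suggests the LM Studio server has not been started.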

Monitoring

Check Logs

# For cloud function
gcloud functions logs read

# For local server: check the terminal output

Monitor BigQuery Usage
- Visit Google Cloud Console > BigQuery
- Check query history and job status

Best Practices

Performance
- Use batch processing for large datasets
- Monitor memory usage when generating multiple embeddings
- Consider implementing rate limiting for API endpoints
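
For the rate-limiting suggestion, one common approach is a token bucket. The class below is a minimal single-process sketch, not part of the project: the name, parameters, and injectable clock are assumptions, and it is not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts up to capacity, refilled at a steady rate."""

    def __init__(
        self,
        rate_per_second: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if rate_per_second <= 0 or capacity < 1:
            raise ValueError("rate_per_second must be > 0 and capacity >= 1")
        self.rate = rate_per_second
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the call may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(float(self.capacity), self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A request handler could call `allow()` before generating an embedding and return HTTP 429 when it yields `False`.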

Security
- Keep credentials secure
- Use environment variables for sensitive data
- Regularly rotate API keys

Maintenance
- Regularly update dependencies
- Monitor ngrok tunnel stability
- Back up important data and configurations

Repository: wrenchchatrepo/BigQuery_Embeddings_App
