This repository contains the materials and code for the JARVIS project, a personal assistant inspired by Marvel’s J.A.R.V.I.S. Over the course of the semester, we will integrate voice recognition (speech-to-text), text-to-speech (TTS), natural language processing (NLP), and various task automation features.
The project is split into two phases:
- API-Powered Assistant (Weeks 1–4): Utilize existing APIs (OpenAI, Whisper, etc.) to quickly develop JARVIS’ core functionality: speech recognition, TTS, and dynamic response generation.
- Offline Assistant (Weeks 5–10): Replace cloud APIs with locally hosted models (using tools like Ollama and Hugging Face) to make JARVIS fully offline, handling everything from speech recognition and TTS to NLP on your own machine.
Along the way, you will gain experience with:
- Voice Recognition and TTS
- Natural Language Processing (NLP)
- Offline Model Hosting (e.g., Ollama, Hugging Face)
- Working with Python libraries, asynchronous code, and APIs
- Automated Task Handling / Command System (see the sketch after this list)
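In its simplest form, the command system is just a keyword-to-function mapping that can later fall back to an LLM for anything unmatched. The sketch below is illustrative only; every name in it is hypothetical, not part of the starter code:

```python
# A toy command dispatcher: map keywords to handler functions.
# All names here are hypothetical, not the project's actual interface.
from datetime import datetime

def tell_time() -> str:
    return datetime.now().strftime("It is %H:%M.")

def greet() -> str:
    return "At your service."

COMMANDS = {"time": tell_time, "hello": greet}

def handle(command: str) -> str:
    for keyword, handler in COMMANDS.items():
        if keyword in command.lower():
            return handler()
    return "I don't know that one yet."  # later: route unmatched input to an LLM

print(handle("JARVIS, what time is it?"))
```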
Slides for supplementary learning can be found here (UMich login required).
- Solid Python Skills: You must be very comfortable with Python (object-oriented concepts, asynchronous programming, etc.).
- Operating System: macOS 11 or later, Windows 10 or later, or any modern Linux distribution.
- API Familiarity: You should have a strong understanding of how APIs work and how to integrate them.
- (Optional) ML Experience: Helpful but not required. We will cover essential ML topics as needed.
Week | Date | Topic | Objective |
---|---|---|---|
1 | 1/26 | Introduction, Setup, Voice Input + TTS | |
2 | 2/2 | Basic Command Handling System with LangChain | |
3 | 2/9 | OpenAI API for Dynamic Response Generation | |
4 | 2/16 | Ollama for Local Hosting | |
5 | 2/23 | Hugging Face Crash Course | Project Checkpoint |
- | - | Spring Break | |
- | - | Spring Break | |
6 | 3/16 | Offline NLP Pipeline | |
7 | 3/23 | Integrating Offline Speech Recognition, TTS, NLP | |
8 | 3/30 | Development Time | |
9 | 4/6 | Development Time | |
10 | 4/13 | Final Expo Prep | Final Deliverable Due |
- | 4/19 | Final Project Exposition 🎉 | Presentation Due |
Note: Weeks 1–4 focus on creating JARVIS with cloud APIs. Weeks 5–10 focus on transitioning to an offline solution.
If setting up your local environment proves challenging, use a cloud notebook like Google Colab or Kaggle.
Clone the repository:

```bash
git clone https://github.com/MichiganDataScienceTeam/W25-JARVIS.git
cd W25-JARVIS
```
We recommend using a virtual environment (requires Python 3.9 or later):
```bash
python3 -m venv env
source env/bin/activate   # Mac/Linux
env\Scripts\activate      # Windows
pip install -r requirements.txt
```
In the initial phase, you will need an OpenAI API key (for GPT), plus any other keys for TTS or speech recognition if not handled locally.
Create a file named .env in the project’s root directory:
```
# .env
OPENAI_API_KEY=your_api_key
OTHER_API_KEYS=...
```
Then, load it in your scripts:
```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads key/value pairs from .env into the environment
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```
IMPORTANT: Never commit API keys or `.env` files to Git; add `.env` to your `.gitignore`.
When using the `openai` library directly, setting the API key should look like this:

```python
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")
```
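From there, a single request makes a useful smoke test. This is a minimal sketch, assuming `openai>=1.0` and a key loaded from `.env`; the model name is a placeholder, not a course requirement:

```python
# A minimal sketch, assuming openai>=1.0 and OPENAI_API_KEY loaded from .env.
import os

import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use any chat model your key can access
    messages=[{"role": "user", "content": "Introduce yourself as JARVIS in one sentence."}],
)
print(response.choices[0].message.content)
```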
We will make use of LangChain to handle LLM-driven workflows. Once your `.env` is set and loaded, LangChain will automatically pick up environment variables (e.g., `OPENAI_API_KEY`). If not, read the key manually and pass it where the integration expects it:

```python
import os

X_API_KEY = os.getenv("API_KEY_NAME")
# then pass X_API_KEY explicitly where necessary
```
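As an illustration, a minimal LangChain call could look like the sketch below (assuming the `langchain-openai` package is installed; the model name is again a placeholder):

```python
# A minimal sketch, assuming langchain-openai is installed and OPENAI_API_KEY
# is already in the environment (e.g., via load_dotenv()).
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # reads OPENAI_API_KEY automatically
response = llm.invoke("Give me a one-line status report, JARVIS.")
print(response.content)
```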
After learning to integrate cloud APIs, we will shift towards offline hosting. Tools we’ll be using include the following (a minimal example appears after the list):
- Ollama for local large language models.
- Hugging Face Transformers for offline NLP.
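For a first taste of offline inference, a Hugging Face pipeline is the quickest route. This is a sketch only, assuming the `transformers` package and `distilgpt2` as a small stand-in model; later weeks use more capable local models, and Ollama offers a similar local workflow:

```python
# A minimal offline-generation sketch, assuming transformers is installed.
# distilgpt2 is a tiny stand-in model; its weights are fetched once, then cached.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("JARVIS, report status:", max_new_tokens=25)
print(result[0]["generated_text"])
```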
By the end of the project, you should have:
- A speech-to-text pipeline that captures voice commands (a capture-and-speak sketch follows this list).
- A text-to-speech engine that vocalizes JARVIS’ responses.
- A command handling system capable of both basic (hard-coded) commands and dynamic commands powered by large language models (cloud or local).
- An offline setup that relies on local models, culminating in a personal assistant that can handle open-ended queries, schedule reminders, and more — entirely on-device.
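To make the first two deliverables concrete, here is a minimal capture-and-speak loop. It is a sketch under stated assumptions (the SpeechRecognition and pyttsx3 packages, PyAudio for microphone access, and an internet connection for the free Google STT endpoint), not the project's final pipeline:

```python
# A minimal voice loop: listen for one utterance, then echo it back with TTS.
# Assumes SpeechRecognition, pyttsx3, and PyAudio are installed and a mic is available.
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
engine = pyttsx3.init()

with sr.Microphone() as source:
    print("Listening...")
    audio = recognizer.listen(source)

try:
    command = recognizer.recognize_google(audio)  # free Google STT; Whisper also works
    print(f"Heard: {command}")
    engine.say(f"You said: {command}")
    engine.runAndWait()
except sr.UnknownValueError:
    print("Sorry, I didn't catch that.")
```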
- Python Basics: Official Python Documentation
- Speech Recognition: SpeechRecognition Library, OpenAI Whisper
- TTS: pyttsx3, gTTS (for cloud-based TTS)
- Hugging Face: Transformers Documentation
- Ollama: Ollama
- OpenAI Whisper: OpenAI Whisper Repo
- Aarushi Shah – [email protected]
- Muhammad (Abubakar) Siddiq – [email protected]
- Alexander Devine
- Kajal Patel
- Luke Davey
- Naveen Premkumar
- Pear Seraypheap