Commit
feat: support operator-browserbase (#132)
* feat: support operator-browserbase
* chore: md
* chore: test
* chore: types
* release: publish beta packages
* chore: docs
* chore: typo
* refactor: pnpm
* chore: reset gui agent
* chore: types predictionParsed Close #136
* chore: types
* fix: onData end trigger twice
* chore: typo
* chore: mouse speed
* chore: copyright
* chore: type
* chore: publish
Showing 75 changed files with 7,997 additions and 272 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
--- | ||
'@ui-tars/operator-browserbase': patch | ||
'@ui-tars/operator-nut-js': patch | ||
'@ui-tars/shared': patch | ||
'@ui-tars/cli': patch | ||
'@ui-tars/sdk': patch | ||
--- | ||
|
||
chore: open-operator |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
--- | ||
'@ui-tars/operator-browserbase': patch | ||
'@ui-tars/operator-nut-js': patch | ||
'@ui-tars/sdk': patch | ||
--- | ||
|
||
chore: types |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# OpenAI API Configuration | ||
UI_TARS_BASE_URL=your_ui_tars_base_url_here | ||
UI_TARS_API_KEY=your_ui_tars_api_key_here | ||
UI_TARS_MODEL=your_ui_tars_model_here | ||
|
||
# Browserbase Configuration | ||
BROWSERBASE_API_KEY=your_browserbase_api_key_here | ||
BROWSERBASE_PROJECT_ID=your_browserbase_project_id_here |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
# Compiled source # | ||
################### | ||
*.com | ||
*.class | ||
*.dll | ||
*.exe | ||
*.o | ||
*.so | ||
|
||
# Packages # | ||
############ | ||
# it's better to unpack these files and commit the raw source | ||
# git has its own built in compression methods | ||
*.7z | ||
*.dmg | ||
*.gz | ||
*.iso | ||
*.jar | ||
*.rar | ||
*.tar | ||
*.zip | ||
|
||
# Logs and databases # | ||
###################### | ||
*.log | ||
*.sql | ||
*.sqlite | ||
|
||
# OS generated files # | ||
###################### | ||
.DS_Store | ||
.DS_Store? | ||
._* | ||
.Spotlight-V100 | ||
.Trashes | ||
ehthumbs.db | ||
Thumbs.db | ||
|
||
# IDE and Editor folders # | ||
########################## | ||
.idea/ | ||
.vscode/ | ||
*.swp | ||
*.swo | ||
*~ | ||
|
||
# Node.js # | ||
########### | ||
node_modules/ | ||
npm-debug.log | ||
.next | ||
|
||
# Python # | ||
########## | ||
*.py[cod] | ||
__pycache__/ | ||
*.so | ||
|
||
# Java # | ||
######## | ||
*.class | ||
*.jar | ||
*.war | ||
*.ear | ||
|
||
# Gradle # | ||
########## | ||
.gradle | ||
/build/ | ||
|
||
# Maven # | ||
######### | ||
target/ | ||
|
||
# Miscellaneous # | ||
################# | ||
*.bak | ||
*.tmp | ||
*.temp | ||
.env | ||
.env.local |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
# Open Operator | ||
|
||
> [!WARNING] | ||
> This is simply a proof of concept. | ||
> Browserbase aims not to compete with web agents, but rather to provide all the necessary tools for anybody to build their own web agent. We strongly recommend you check out both [Browserbase](https://www.browserbase.com) and our open source project [Stagehand](https://www.stagehand.dev) to build your own web agent. | ||
[Deploy with Vercel](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fbrowserbase%2Fopen-operator&env=OPENAI_API_KEY,BROWSERBASE_API_KEY,BROWSERBASE_PROJECT_ID&envDescription=API%20keys%20needed%20to%20run%20Open%20Operator&envLink=https%3A%2F%2Fgithub.com%2Fbrowserbase%2Fopen-operator%23environment-variables) | ||
|
||
https://github.com/user-attachments/assets/354c3b8b-681f-4ad0-9ab9-365dbde894af | ||
|
||
## Getting Started | ||
|
||
First, install the dependencies for this repository. This requires [pnpm](https://pnpm.io/installation#using-other-package-managers). | ||
|
||
<!-- This doesn't work with NPM, haven't tested with yarn --> | ||
|
||
```bash | ||
pnpm install | ||
``` | ||
|
||
Next, copy the example environment variables: | ||
|
||
```bash | ||
cp .env.example .env.local | ||
``` | ||
|
||
You'll need to set up your API keys: | ||
|
||
1. Get your UI-TARS service details (base URL, API key, and model name) from [UI-TARS](https://github.com/bytedance/UI-TARS) | ||
2. Get your Browserbase API key and project ID from [Browserbase](https://www.browserbase.com) | ||
|
||
Update `.env.local` with your API keys: | ||
|
||
- `UI_TARS_BASE_URL`: Your UI-TARS base URL | ||
- `UI_TARS_API_KEY`: Your UI-TARS API key | ||
- `UI_TARS_MODEL`: Your UI-TARS model name | ||
- `BROWSERBASE_API_KEY`: Your Browserbase API key | ||
- `BROWSERBASE_PROJECT_ID`: Your Browserbase project ID | ||
|
||
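Because a missing or placeholder value tends to surface only as a confusing runtime failure, it can help to validate these five variables at startup. The following is a minimal sketch, not code from this repository: `loadOperatorEnv` and `REQUIRED_ENV_VARS` are hypothetical names, and the placeholder check assumes the `your_..._here` values shipped in `.env.example`.

```typescript
// Hypothetical helper for illustration; not part of the Open Operator codebase.
// The variable names come from .env.example.
const REQUIRED_ENV_VARS = [
  "UI_TARS_BASE_URL",
  "UI_TARS_API_KEY",
  "UI_TARS_MODEL",
  "BROWSERBASE_API_KEY",
  "BROWSERBASE_PROJECT_ID",
] as const;

type EnvName = (typeof REQUIRED_ENV_VARS)[number];
type OperatorEnv = Record<EnvName, string>;

// Matches the untouched placeholder values from .env.example,
// e.g. "your_browserbase_api_key_here".
const PLACEHOLDER = /^your_.*_here$/;

function loadOperatorEnv(
  source: Record<string, string | undefined>,
): OperatorEnv {
  // Collect every variable that is unset, blank, or still a placeholder.
  const missing = REQUIRED_ENV_VARS.filter((name) => {
    const value = source[name];
    return (
      value === undefined || value.trim() === "" || PLACEHOLDER.test(value)
    );
  });

  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // At this point every required variable is known to be a non-empty string.
  const env = {} as OperatorEnv;
  for (const name of REQUIRED_ENV_VARS) {
    env[name] = source[name] as string;
  }
  return env;
}
```

In a Node.js or Next.js server context this would typically be called once as `loadOperatorEnv(process.env)`, so that all missing variables are reported together instead of one at a time.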
Then, run the development server: | ||
|
||
<!-- This doesn't work with NPM, haven't tested with yarn --> | ||
|
||
```bash | ||
pnpm dev | ||
``` | ||
|
||
Open [http://localhost:3000](http://localhost:3000) with your browser to see Open Operator in action. | ||
|
||
## How It Works | ||
|
||
Building a web agent is a complex task. You need to understand the user's intent, convert it into headless browser operations, and execute actions, each of which can be incredibly complex on its own. | ||
|
||
[Image: public/agent_mess.png] | ||
|
||
Stagehand is a tool that helps you build web agents. It allows you to convert natural language into headless browser operations, execute actions on the browser, and extract results back into structured data. | ||
|
||
[Image: public/stagehand_clean.png] | ||
|
||
Under the hood, we have a very simple agent loop that just calls Stagehand to convert the user's intent into headless browser operations, and then calls Browserbase to execute those operations. | ||
|
||
[Image: public/agent_loop.png] | ||
|
||
Stagehand uses Browserbase to execute actions on the browser, and OpenAI to understand the user's intent. | ||
|
||
For more on this, check out the code at [this commit](https://github.com/browserbase/open-operator/blob/6f2fba55b3d271be61819dc11e64b1ada52646ac/index.ts). | ||
|
||
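The agent loop described in this section can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the project's implementation: `BrowserAction`, `Planner`, `Executor`, and `runAgentLoop` are hypothetical names, and the two interfaces merely stand in for the roles played by Stagehand (turning intent into the next browser operation) and Browserbase (executing it). Neither interface mirrors a real API of those libraries.

```typescript
// Hypothetical types for illustration; these are NOT Stagehand or Browserbase APIs.
type BrowserAction =
  | { kind: "goto"; url: string }
  | { kind: "act"; instruction: string }
  | { kind: "done"; summary: string };

interface Planner {
  // Stand-in for the component that converts the user's intent,
  // plus what has happened so far, into the next browser operation.
  nextAction(goal: string, history: string[]): Promise<BrowserAction>;
}

interface Executor {
  // Stand-in for the remote browser session. Returns a short
  // observation describing the result of the operation.
  execute(action: BrowserAction): Promise<string>;
}

async function runAgentLoop(
  goal: string,
  planner: Planner,
  executor: Executor,
  maxSteps = 10,
): Promise<string> {
  const history: string[] = [];

  for (let step = 0; step < maxSteps; step++) {
    // 1. Ask the planner for the next operation.
    const action = await planner.nextAction(goal, history);

    // 2. Stop as soon as the planner reports that the goal is reached.
    if (action.kind === "done") {
      return action.summary;
    }

    // 3. Execute the operation and record what happened, so the
    //    planner can take it into account on the next iteration.
    const observation = await executor.execute(action);
    history.push(`${action.kind}: ${observation}`);
  }

  // Bound the loop so a planner that never finishes cannot run forever.
  throw new Error(`Goal not reached within ${maxSteps} steps`);
}
```

Keeping the planner and executor behind small interfaces is a design choice of this sketch: it lets the loop be exercised with in-memory fakes before any model or remote browser is involved.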
### Key Technologies | ||
|
||
- **[Browserbase](https://www.browserbase.com)**: Powers the core browser automation and interaction capabilities | ||
- **[Stagehand](https://www.stagehand.dev)**: Handles precise DOM manipulation and state management | ||
- **[Next.js](https://nextjs.org)**: Provides the modern web framework foundation | ||
- **[OpenAI](https://openai.com)**: Enables natural language understanding and decision making | ||
|
||
## Contributing | ||
|
||
We welcome contributions! Whether it's: | ||
|
||
- Adding new features | ||
- Improving documentation | ||
- Reporting bugs | ||
- Suggesting enhancements | ||
|
||
Please feel free to open issues and pull requests. | ||
|
||
## License | ||
|
||
Open Operator is open source software licensed under the MIT license. | ||
|
||
## Acknowledgments | ||
|
||
This project is inspired by OpenAI's Operator feature and builds upon various open source technologies including Next.js, React, Browserbase, and Stagehand. |