-
Notifications
You must be signed in to change notification settings - Fork 207
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* chore: version 0.0.5 * feat: setting preset (close: #57) (#61) * feat: implement preset setting (close: #57) * fix(renderer): setting page cannot scroll * refactor(renderer): setting layout and scroll behavior * refactor(renderer): remote preset management ui * refactor(renderer): setting input should be disabled when remote preset management take effect * fix(renderer): reset to manual does not work * chore(renderer): dispatch get setting after import preset * feat(renderer): clear setting * refactor(renderer): naming * chore: tweaks * chore: add setting store change log * refactor: enhance impl of `setting:resetPreset` * refactor: enhance impl of `setting:clear` * chore: tweaks * refactor: using self-implemented setting state management * chore: tweaks * fix(renderer): only fallback to download if upload was not configured or failed (#95) * refactor(ipc-bridge): replace zutron to trpc-like ipc bridge (#94) * refactor(ipc-bridge): replace zutron to trpc-like ipc bridge * chore: replace window * chore: ipc get permissions * refactor: remove all dispatch * docs: init setting docs (#96) * docs: init setting docs * Update docs/setting.md Co-authored-by: Charles <[email protected]> --------- Co-authored-by: Charles <[email protected]> * chore: make language setting optional (#99) * fix(windows): windows fails to handle events when the dock bar is opened after being closed (#100) * chore: tool * chore: canvas capture frame 30 to 60 * chore: upgrade electron 3.1.1 --------- Co-authored-by: ULIVZ <[email protected]>
- Loading branch information
Showing
64 changed files
with
10,660 additions
and
7,580 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# Preset Management Guide | ||
|
||
> [!IMPORTANT] | ||
> Currently, **UI-TARS Desktop** does not directly provide server-side capabilities, so we do not provide a Preset for the open source community. welcome community developers to contribute your presets [here](../examples/presets/). | ||
A **preset** is a collection of [settings](./setting.md) (_Introduced at [#61](https://github.com/bytedance/UI-TARS-desktop/pull/61)_), **UI-TARS Desktop** supports import presets via `files` or `URLs`: | ||
|
||
```mermaid | ||
graph TD | ||
A[Import Preset] --> B{Preset Type} | ||
B -->|File| C[YAML File] | ||
B -->|URL| D[URL Endpoint] | ||
C --> E[Manual Updates 🔧] | ||
D --> F[Auto Sync ⚡] | ||
``` | ||
|
||
<br> | ||
|
||
|
||
## Preset Types Comparison | ||
|
||
| Feature | Local Presets | Remote Presets | | ||
|-----------------------|------------------------|------------------------| | ||
| **Storage** | Device-local | Cloud-hosted | | ||
| **Update Mechanism** | Manual | Automatic | | ||
| **Access Control** | Read/Write | Read-Only | | ||
| **Versioning** | Manual | Git-integrated | | ||
|
||
|
||
|
||
<br> | ||
|
||
|
||
## Examples | ||
|
||
### Import from file | ||
|
||
**UI-TARS Desktop** supports importing presets from files. Once the file is parsed successfully, the settings will be automatically updated. | ||
|
||
| Function | Snapshot | | ||
| --- | ---| | ||
| Open Setting |<img width="320" alt="image" src="https://github.com/user-attachments/assets/1d2ae27c-9b2e-4896-96a6-04832f850907" /> | | ||
| Import Success | <img width="320" alt="image" src="https://github.com/user-attachments/assets/38f77101-7388-4363-ab27-668180f51aaa" />| | ||
| Exception: Invalid Content | <img width="320" alt="image" src="https://github.com/user-attachments/assets/5ebec2b2-12f6-4d1a-84a7-8202ef651223" /> | | ||
|
||
|
||
<br> | ||
|
||
|
||
### Import from URL | ||
|
||
**UI-TARS Desktop** also supports importing presets from URLs. If automatic updates are set, presets will be automatically pulled every time the application is started. | ||
|
||
| Function | Snapshot | | ||
| --- | ---| | ||
| Open Setting | <img width="320" alt="image" src="https://github.com/user-attachments/assets/d446da0e-3bb4-4ca5-bc95-4f235d979fd0" /> | | ||
| Import Success (Default) | <img width="320" alt="image" src="https://github.com/user-attachments/assets/a6470ed4-80ac-45a1-aaba-39e598d5af0f" /> | | ||
| Import Success (Auto Update) | <img width="320" alt="image" src="https://github.com/user-attachments/assets/b5364d66-6654-401b-969e-f85baeedbda0" />| | ||
|
||
|
||
<br> | ||
|
||
|
||
### Preset Example | ||
|
||
```yaml | ||
name: UI TARS Desktop Example Preset | ||
language: en | ||
vlmProvider: Hugging Face | ||
vlmBaseUrl: https://your-endpoint.huggingface.cloud/v1 | ||
vlmApiKey: your_api_key | ||
vlmModelName: your_model_name | ||
reportStorageBaseUrl: https://your-report-storage-endpoint.com/upload | ||
utioBaseUrl: https://your-utio-endpoint.com/collect | ||
``` | ||
See all [example presets](../examples/presets). | ||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,283 @@ | ||
# Settings Configuration Guide | ||
|
||
## Overview | ||
|
||
**UI-TARS Desktop** offers granular control over application behavior through its settings system. This document provides comprehensive guidance on configuration options, preset management, and operational best practices. | ||
|
||
<p align="center"> | ||
<img src="../images/setting.png" alt="Settings Interface Overview" width="650"> | ||
<br> | ||
<em>Main Settings Interface</em> | ||
</p> | ||
|
||
|
||
<br> | ||
|
||
|
||
## Configuration Options | ||
|
||
### Language | ||
|
||
Controls localization settings for VLM. | ||
|
||
| Property | Details | | ||
| ----------- | ------------------------------ | | ||
| **Type** | `string` | | ||
| **Options** | `en` (English), `zh` (Chinese) | | ||
| **Default** | `en` | | ||
|
||
> [!NOTE] | ||
> Changing the settings will **only** affect the output of VLM, not the language of the desktop app itself. Regarding the i18n of the App itself, welcome to contribute PR. | ||
|
||
<br> | ||
|
||
|
||
### VLM Provider | ||
|
||
Selects the backend VLM provider for make GUI action decisions. | ||
|
||
| Property | Details | | ||
| ----------- | ---------------------- | | ||
| **Type** | `string` | | ||
| **Options** | `Hugging Face`, `vLLM` | | ||
| **Default** | `Hugging Face` | | ||
|
||
> [!NOTE] | ||
> This is an interface reserved for different VLM providers. | ||
|
||
<br> | ||
|
||
|
||
|
||
### VLM Base URL | ||
|
||
Specify the base url of the VLM that needs to be requested. | ||
|
||
| Property | Details | | ||
| ------------ | -------- | | ||
| **Type** | `string` | | ||
| **Required** | `true` | | ||
|
||
> [!NOTE] | ||
> VLM Base URL should be OpenAI compatible API endpoints (see [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details). | ||
|
||
<br> | ||
|
||
|
||
|
||
### VLM Model Name | ||
|
||
Specify the requested module name. | ||
|
||
| Property | Details | | ||
| ------------ | -------- | | ||
| **Type** | `string` | | ||
| **Required** | `true` | | ||
|
||
|
||
<br> | ||
|
||
|
||
### Report Storage Base URL | ||
|
||
Defines the base URL for uploading report file. By default, when this option is not set, when the user clicks **Export as HTML** (a.k.a. <b>Share</b>), it will automatically trigger the download of the report file: | ||
|
||
<p align="center"> | ||
<img src="../images/download-report.png" alt="Download report" width="320"> | ||
<br> | ||
</p> | ||
|
||
Once it's set, when user click **Export as HTML**, report file will firstly be uploaded to the Report Storage Server, which returns a publicly accessible URL for the persistent file. | ||
|
||
<p align="center"> | ||
<img src="../images/upload-report-success.png" alt="Download report" width="320"> | ||
<br> | ||
</p> | ||
|
||
#### Report Storage Server Interface | ||
|
||
The Report Storage Server should implement the following HTTP API endpoint: | ||
|
||
| Property | Details | | ||
| ------------ | ------------------------------------------------------------------------------------------------------------ | | ||
| **Endpoint** | `POST /your-storage-enpoint` | | ||
| **Headers** | Content-Type: `multipart/form-data` <br> <!-- - Authorization: Bearer \<access_token\> (Not Supported) --> | | ||
|
||
#### Request Body | ||
|
||
The request should be sent as `multipart/form-data` with the following field: | ||
|
||
| Field | Type | Required | Description | Constraints | | ||
| ----- | ---- | -------- | ---------------- | ---------------------------------- | | ||
| file | File | Yes | HTML report file | - Format: HTML<br>- Max size: 30MB | | ||
|
||
#### Response | ||
|
||
**Success Response (200 OK)** | ||
```json | ||
{ | ||
"url": "https://example.com/reports/xxx.html" | ||
} | ||
``` | ||
|
||
The response should return a JSON object containing a publicly accessible URL where the report can be accessed. | ||
|
||
> [!NOTE] | ||
> Currently, there is no authentication designed for Report Storage Server. If you have any requirements, please submit an [issue](https://github.com/bytedance/UI-TARS-desktop/issues). | ||
|
||
<br> | ||
|
||
|
||
### UTIO Base URL | ||
|
||
**UTIO** (_UI-TARS Insights and Observation_) is a data collection mechanism for insights into **UI-TARS Desktop** (_Introduced at [#60](https://github.com/bytedance/UI-TARS-desktop/pull/60)_). The design of UTIO is also related to sharing. The overall process is as follows: | ||
|
||
<p align="center"> | ||
<img src="../images/utio-flow.png" alt="UTIO Flow" width="800"> | ||
<br> | ||
<em>UTIO Flow</em> | ||
</p> | ||
|
||
This option defines the base URL for the **UTIO** server that handles application events and instructions. | ||
|
||
|
||
#### Server Interface Specification | ||
|
||
The UTIO server accepts events through HTTP POST requests and supports three types of events: | ||
|
||
| Property | Details | | ||
| ------------ | -------------------------------- | | ||
| **Endpoint** | `POST /your-utio-endpoint` | | ||
| **Headers** | Content-Type: `application/json` | | ||
|
||
##### Event Types | ||
|
||
The server handles three types of events: | ||
|
||
###### **Application Launch** | ||
```typescript | ||
interface AppLaunchedEvent { | ||
type: 'appLaunched'; | ||
platform: 'iOS' | 'Android' | 'Web'; | ||
osVersion: string; | ||
screenWidth: number; | ||
screenHeight: number; | ||
} | ||
``` | ||
|
||
###### **Send Instruction** | ||
```typescript | ||
interface SendInstructionEvent { | ||
type: 'sendInstruction'; | ||
instruction: string; | ||
} | ||
``` | ||
|
||
###### **Share Report** | ||
```typescript | ||
interface ShareReportEvent { | ||
type: 'shareReport'; | ||
lastScreenshot?: string; | ||
report?: string; | ||
instruction: string; | ||
} | ||
``` | ||
|
||
##### Request Example | ||
|
||
```json | ||
{ | ||
"type": "appLaunched", | ||
"platform": "iOS", | ||
"osVersion": "16.0.0", | ||
"screenWidth": 390, | ||
"screenHeight": 844 | ||
} | ||
``` | ||
|
||
##### Response | ||
|
||
**Success Response (200 OK)** | ||
```json | ||
{ | ||
"success": true | ||
} | ||
``` | ||
|
||
> [!NOTE] | ||
> All events are processed asynchronously. The server should respond promptly to acknowledge receipt of the event. | ||
|
||
##### Server Example | ||
|
||
###### Node.js | ||
|
||
```js | ||
const express = require('express'); | ||
const cors = require('cors'); | ||
const app = express(); | ||
const port = 3000; | ||
|
||
app.use(cors()); | ||
app.use(express.json()); | ||
|
||
app.post('/your-utio-endpoint', (req, res) => { | ||
const event = req.body; | ||
|
||
if (!event || !event.type) { | ||
return res.status(400).json({ error: 'Missing event type' }); | ||
} | ||
|
||
switch (event.type) { | ||
case 'appLaunched': | ||
return handleAppLaunch(event, res); | ||
case 'sendInstruction': | ||
return handleSendInstruction(event, res); | ||
case 'shareReport': | ||
return handleShareReport(event, res); | ||
default: | ||
return res.status(400).json({ error: 'Unsupported event type' }); | ||
} | ||
}); | ||
|
||
app.listen(port, () => { | ||
console.log(`Server listening on port ${port}`); | ||
}); | ||
``` | ||
|
||
###### Python | ||
|
||
```python | ||
from flask import Flask, request, jsonify | ||
from flask_cors import CORS | ||
import re | ||
|
||
app = Flask(__name__) | ||
CORS(app) | ||
|
||
@app.route('/events', methods=['POST']) | ||
def handle_event(): | ||
data = request.get_json() | ||
|
||
if not data or 'type' not in data: | ||
return jsonify({'error': 'Missing event type'}), 400 | ||
|
||
event_type = data['type'] | ||
|
||
if event_type == 'appLaunched': | ||
return handle_app_launch(data) | ||
elif event_type == 'sendInstruction': | ||
return handle_send_instruction(data) | ||
elif event_type == 'shareReport': | ||
return handle_share_report(data) | ||
else: | ||
return jsonify({'error': 'Unsupported event type'}), 400 | ||
|
||
if __name__ == '__main__': | ||
app.run(port=3000) | ||
``` | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
name: UI TARS Desktop Example Preset | ||
language: en | ||
vlmProvider: Hugging Face | ||
vlmBaseUrl: https://your-endpoint.huggingface.cloud/v1 | ||
vlmApiKey: your_api_key | ||
vlmModelName: your_model_name | ||
reportStorageBaseUrl: https://your-report-storage-endpoint.com/upload | ||
utioBaseUrl: https://your-utio-endpoint.com/collect |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Oops, something went wrong.