release: version 0.0.5 (#91)
* chore: version 0.0.5

* feat: setting preset (close: #57) (#61)

* feat: implement preset setting (close: #57)

* fix(renderer): setting page cannot scroll

* refactor(renderer): setting layout and scroll behavior

* refactor(renderer): remote preset management ui

* refactor(renderer): setting input should be disabled when remote preset management take effect

* fix(renderer): reset to manual does not work

* chore(renderer): dispatch get setting after import preset

* feat(renderer): clear setting

* refactor(renderer): naming

* chore: tweaks

* chore: add setting store change log

* refactor: enhance impl of `setting:resetPreset`

* refactor: enhance impl of `setting:clear`

* chore: tweaks

* refactor: using self-implemented setting state management

* chore: tweaks

* fix(renderer): only fallback to download if upload was not configured or failed (#95)

* refactor(ipc-bridge): replace zutron to trpc-like ipc bridge (#94)

* refactor(ipc-bridge): replace zutron to trpc-like ipc bridge

* chore: replace window

* chore: ipc get permissions

* refactor: remove all dispatch

* docs: init setting docs (#96)

* docs: init setting docs

* Update docs/setting.md

Co-authored-by: Charles <[email protected]>

---------

Co-authored-by: Charles <[email protected]>

* chore: make language setting optional (#99)

* fix(windows): windows fails to handle events when the dock bar is opened after being closed (#100)

* chore: tool

* chore: canvas capture frame 30 to 60

* chore: upgrade electron 3.1.1

---------

Co-authored-by: ULIVZ <[email protected]>
ycjcl868 and ulivz authored Feb 12, 2025
1 parent 22303b0 commit dbd09de
Showing 64 changed files with 10,660 additions and 7,580 deletions.
78 changes: 78 additions & 0 deletions docs/preset.md
@@ -0,0 +1,78 @@
# Preset Management Guide

> [!IMPORTANT]
> Currently, **UI-TARS Desktop** does not provide server-side capabilities itself, so we do not ship a preset for the open source community. Community developers are welcome to contribute presets [here](../examples/presets/).

A **preset** is a collection of [settings](./setting.md) (_introduced in [#61](https://github.com/bytedance/UI-TARS-desktop/pull/61)_). **UI-TARS Desktop** supports importing presets from `files` or `URLs`:

```mermaid
graph TD
A[Import Preset] --> B{Preset Type}
B -->|File| C[YAML File]
B -->|URL| D[URL Endpoint]
C --> E[Manual Updates 🔧]
D --> F[Auto Sync ⚡]
```

<br>


## Preset Types Comparison

| Feature | Local Presets | Remote Presets |
|-----------------------|------------------------|------------------------|
| **Storage** | Device-local | Cloud-hosted |
| **Update Mechanism** | Manual | Automatic |
| **Access Control** | Read/Write | Read-Only |
| **Versioning** | Manual | Git-integrated |



<br>


## Examples

### Import from file

**UI-TARS Desktop** supports importing presets from files. Once the file is parsed successfully, the settings will be automatically updated.

| Function | Snapshot |
| --- | ---|
| Open Setting |<img width="320" alt="image" src="https://github.com/user-attachments/assets/1d2ae27c-9b2e-4896-96a6-04832f850907" /> |
| Import Success | <img width="320" alt="image" src="https://github.com/user-attachments/assets/38f77101-7388-4363-ab27-668180f51aaa" />|
| Exception: Invalid Content | <img width="320" alt="image" src="https://github.com/user-attachments/assets/5ebec2b2-12f6-4d1a-84a7-8202ef651223" /> |


<br>


### Import from URL

**UI-TARS Desktop** also supports importing presets from URLs. If auto update is enabled, the preset is pulled again each time the application starts.

| Function | Snapshot |
| --- | ---|
| Open Setting | <img width="320" alt="image" src="https://github.com/user-attachments/assets/d446da0e-3bb4-4ca5-bc95-4f235d979fd0" /> |
| Import Success (Default) | <img width="320" alt="image" src="https://github.com/user-attachments/assets/a6470ed4-80ac-45a1-aaba-39e598d5af0f" /> |
| Import Success (Auto Update) | <img width="320" alt="image" src="https://github.com/user-attachments/assets/b5364d66-6654-401b-969e-f85baeedbda0" />|


<br>
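
The difference between the two import paths can be modelled as a small tagged union. The sketch below is only an illustration: `PresetSource`, `isReadOnly`, and `shouldSyncOnStartup` are hypothetical names, not the application's actual types, and the rules encode only what this guide states (remote presets are read-only, and a preset is re-pulled at startup only when auto update is enabled).

```typescript
// Hypothetical model of where the active preset came from.
type PresetSource =
  | { type: 'local' }
  | { type: 'remote'; url: string; autoUpdate: boolean };

// Remote presets are read-only, so settings inputs would be disabled.
function isReadOnly(source: PresetSource): boolean {
  return source.type === 'remote';
}

// Only a remote preset with auto update enabled is fetched again at startup.
function shouldSyncOnStartup(source: PresetSource): boolean {
  return source.type === 'remote' && source.autoUpdate;
}
```

A preset imported from a file maps to `{ type: 'local' }` and is never synced automatically.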


### Preset Example

```yaml
name: UI TARS Desktop Example Preset
language: en
vlmProvider: Hugging Face
vlmBaseUrl: https://your-endpoint.huggingface.cloud/v1
vlmApiKey: your_api_key
vlmModelName: your_model_name
reportStorageBaseUrl: https://your-report-storage-endpoint.com/upload
utioBaseUrl: https://your-utio-endpoint.com/collect
```
See all [example presets](../examples/presets).
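
Before publishing a preset, it can help to check the parsed YAML against the fields the settings guide documents. The following is a minimal sketch, assuming the YAML has already been parsed into a plain object by a parser of your choice. `validatePreset` is a hypothetical helper and its rules are assumptions drawn from the settings guide (required `vlmBaseUrl` and `vlmModelName`, optional `language` and `vlmProvider`); it is not the application's actual schema.

```typescript
// True for strings that look like an http(s) URL.
const isHttpUrl = (value: unknown): boolean =>
  typeof value === 'string' && /^https?:\/\/\S+$/.test(value);

// Returns a list of problems; an empty list means the preset looks usable.
function validatePreset(raw: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const vlmBaseUrl = raw['vlmBaseUrl'];
  const vlmModelName = raw['vlmModelName'];
  const language = raw['language'];
  const vlmProvider = raw['vlmProvider'];

  if (!isHttpUrl(vlmBaseUrl)) {
    errors.push('vlmBaseUrl must be an http(s) URL');
  }
  if (typeof vlmModelName !== 'string' || vlmModelName.trim() === '') {
    errors.push('vlmModelName is required');
  }
  // language and vlmProvider are optional, but limited to documented values.
  if (language !== undefined && language !== 'en' && language !== 'zh') {
    errors.push('language must be "en" or "zh"');
  }
  if (vlmProvider !== undefined && vlmProvider !== 'Hugging Face' && vlmProvider !== 'vLLM') {
    errors.push('vlmProvider must be "Hugging Face" or "vLLM"');
  }
  // Optional endpoints must be URLs when present.
  for (const key of ['reportStorageBaseUrl', 'utioBaseUrl']) {
    if (raw[key] !== undefined && !isHttpUrl(raw[key])) {
      errors.push(`${key} must be an http(s) URL`);
    }
  }
  return errors;
}
```

Returning every problem at once, rather than failing on the first, makes it easier to fix a preset in a single pass.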
283 changes: 283 additions & 0 deletions docs/setting.md
@@ -0,0 +1,283 @@
# Settings Configuration Guide

## Overview

**UI-TARS Desktop** offers granular control over application behavior through its settings system. This document provides comprehensive guidance on configuration options, preset management, and operational best practices.

<p align="center">
<img src="../images/setting.png" alt="Settings Interface Overview" width="650">
<br>
<em>Main Settings Interface</em>
</p>


<br>


## Configuration Options

### Language

Controls the language of the VLM output.

| Property | Details |
| ----------- | ------------------------------ |
| **Type** | `string` |
| **Options** | `en` (English), `zh` (Chinese) |
| **Default** | `en` |

> [!NOTE]
> Changing this setting **only** affects the output of the VLM, not the language of the desktop app itself. For i18n of the app itself, PRs are welcome.

<br>


### VLM Provider

Selects the backend VLM provider used to make GUI action decisions.

| Property | Details |
| ----------- | ---------------------- |
| **Type** | `string` |
| **Options** | `Hugging Face`, `vLLM` |
| **Default** | `Hugging Face` |

> [!NOTE]
> This is an interface reserved for different VLM providers.

<br>



### VLM Base URL

Specifies the base URL of the VLM service that requests are sent to.

| Property | Details |
| ------------ | -------- |
| **Type** | `string` |
| **Required** | `true` |

> [!NOTE]
> The VLM Base URL should point to an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).

<br>
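
To make "OpenAI-compatible" concrete, the sketch below builds a chat completions request that carries an instruction and a base64-encoded screenshot. It is an illustration under stated assumptions: `buildVlmRequest` is a hypothetical helper, the `/chat/completions` path and message shape follow the public OpenAI chat completions format, and the PNG data URL is an assumption. How UI-TARS Desktop actually builds its prompts is not shown in this guide.

```typescript
// Subset of the OpenAI chat completions request shape used in this sketch.
interface ChatCompletionRequest {
  model: string;
  messages: Array<{
    role: 'system' | 'user' | 'assistant';
    content:
      | string
      | Array<
          | { type: 'text'; text: string }
          | { type: 'image_url'; image_url: { url: string } }
        >;
  }>;
}

// Hypothetical helper: returns the URL to POST to and the JSON body to send.
function buildVlmRequest(
  vlmBaseUrl: string,
  vlmModelName: string,
  instruction: string,
  screenshotBase64: string,
): { url: string; body: ChatCompletionRequest } {
  // Strip trailing slashes so the path is not doubled.
  const url = `${vlmBaseUrl.replace(/\/+$/, '')}/chat/completions`;
  return {
    url,
    body: {
      model: vlmModelName,
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: instruction },
            // Assumes a PNG screenshot; adjust the MIME type if needed.
            { type: 'image_url', image_url: { url: `data:image/png;base64,${screenshotBase64}` } },
          ],
        },
      ],
    },
  };
}
```

If an endpoint accepts a request of this shape and returns a standard chat completion, it should be usable as a VLM Base URL.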



### VLM Model Name

Specifies the name of the model to request.

| Property | Details |
| ------------ | -------- |
| **Type** | `string` |
| **Required** | `true` |


<br>


### Report Storage Base URL

Defines the base URL for uploading report files. When this option is not set (the default), clicking **Export as HTML** (a.k.a. <b>Share</b>) automatically triggers a download of the report file:

<p align="center">
<img src="../images/download-report.png" alt="Download report" width="320">
<br>
</p>

Once it is set, clicking **Export as HTML** first uploads the report file to the Report Storage Server, which returns a publicly accessible URL for the persisted file.

<p align="center">
<img src="../images/upload-report-success.png" alt="Upload report success" width="320">
<br>
</p>

#### Report Storage Server Interface

The Report Storage Server should implement the following HTTP API endpoint:

| Property | Details |
| ------------ | ------------------------------------------------------------------------------------------------------------ |
| **Endpoint** | `POST /your-storage-endpoint` |
| **Headers** | Content-Type: `multipart/form-data` <br> <!-- - Authorization: Bearer \<access_token\> (Not Supported) --> |

#### Request Body

The request should be sent as `multipart/form-data` with the following field:

| Field | Type | Required | Description | Constraints |
| ----- | ---- | -------- | ---------------- | ---------------------------------- |
| file | File | Yes | HTML report file | - Format: HTML<br>- Max size: 30MB |

#### Response

**Success Response (200 OK)**
```json
{
  "url": "https://example.com/reports/xxx.html"
}
```

The response should return a JSON object containing a publicly accessible URL where the report can be accessed.
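
On the client side, the export flow reduces to a small decision: share the returned URL when an upload was configured and succeeded, otherwise fall back to a download. The sketch below illustrates that rule. `parseUploadResponse` and `decideExport` are hypothetical names, and passing `undefined` as the response to represent a failed upload is an assumption made for this example.

```typescript
type ExportAction =
  | { kind: 'share'; url: string }
  | { kind: 'download' };

// Extracts `url` from a parsed JSON response, or null if it is missing or malformed.
function parseUploadResponse(body: unknown): string | null {
  if (typeof body !== 'object' || body === null) return null;
  const url = (body as { url?: unknown }).url;
  return typeof url === 'string' && /^https?:\/\//.test(url) ? url : null;
}

// Falls back to download only when upload is not configured or did not succeed.
function decideExport(
  reportStorageBaseUrl: string | undefined,
  uploadResponse: unknown,
): ExportAction {
  if (!reportStorageBaseUrl) return { kind: 'download' };
  const url = parseUploadResponse(uploadResponse);
  return url !== null ? { kind: 'share', url } : { kind: 'download' };
}
```

Keeping the response check separate from the decision makes it straightforward to test a server implementation against the `{ "url": ... }` contract.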

> [!NOTE]
> Currently, the Report Storage Server has no authentication mechanism. If you need one, please submit an [issue](https://github.com/bytedance/UI-TARS-desktop/issues).

<br>


### UTIO Base URL

**UTIO** (_UI-TARS Insights and Observation_) is a data collection mechanism for insights into **UI-TARS Desktop** (_Introduced at [#60](https://github.com/bytedance/UI-TARS-desktop/pull/60)_). The design of UTIO is also related to sharing. The overall process is as follows:

<p align="center">
<img src="../images/utio-flow.png" alt="UTIO Flow" width="800">
<br>
<em>UTIO Flow</em>
</p>

This option defines the base URL for the **UTIO** server that handles application events and instructions.


#### Server Interface Specification

The UTIO server accepts events through HTTP POST requests:

| Property | Details |
| ------------ | -------------------------------- |
| **Endpoint** | `POST /your-utio-endpoint` |
| **Headers** | Content-Type: `application/json` |

##### Event Types

The server handles three types of events:

###### **Application Launch**
```typescript
interface AppLaunchedEvent {
  type: 'appLaunched';
  platform: 'iOS' | 'Android' | 'Web';
  osVersion: string;
  screenWidth: number;
  screenHeight: number;
}
```

###### **Send Instruction**
```typescript
interface SendInstructionEvent {
  type: 'sendInstruction';
  instruction: string;
}
```

###### **Share Report**
```typescript
interface ShareReportEvent {
  type: 'shareReport';
  lastScreenshot?: string;
  report?: string;
  instruction: string;
}
```
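
Because all three events share a `type` field, a server can treat them as one discriminated union and validate incoming JSON with a single type guard. The sketch below repeats the interfaces so it is self-contained. `UtioEvent` and `isUtioEvent` are hypothetical names introduced for illustration, not part of the application's code.

```typescript
interface AppLaunchedEvent {
  type: 'appLaunched';
  platform: 'iOS' | 'Android' | 'Web';
  osVersion: string;
  screenWidth: number;
  screenHeight: number;
}
interface SendInstructionEvent {
  type: 'sendInstruction';
  instruction: string;
}
interface ShareReportEvent {
  type: 'shareReport';
  lastScreenshot?: string;
  report?: string;
  instruction: string;
}
type UtioEvent = AppLaunchedEvent | SendInstructionEvent | ShareReportEvent;

const isOptionalString = (v: unknown): boolean =>
  v === undefined || typeof v === 'string';

// Runtime check that an unknown JSON value matches one of the three event shapes.
function isUtioEvent(value: unknown): value is UtioEvent {
  if (typeof value !== 'object' || value === null) return false;
  const e = value as Record<string, unknown>;
  switch (e['type']) {
    case 'appLaunched':
      return (
        (e['platform'] === 'iOS' || e['platform'] === 'Android' || e['platform'] === 'Web') &&
        typeof e['osVersion'] === 'string' &&
        typeof e['screenWidth'] === 'number' &&
        typeof e['screenHeight'] === 'number'
      );
    case 'sendInstruction':
      return typeof e['instruction'] === 'string';
    case 'shareReport':
      return (
        typeof e['instruction'] === 'string' &&
        isOptionalString(e['lastScreenshot']) &&
        isOptionalString(e['report'])
      );
    default:
      return false;
  }
}
```

After `isUtioEvent(body)` returns `true`, switching on `body.type` narrows the value to the matching interface.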

##### Request Example

```json
{
  "type": "appLaunched",
  "platform": "iOS",
  "osVersion": "16.0.0",
  "screenWidth": 390,
  "screenHeight": 844
}
```

##### Response

**Success Response (200 OK)**
```json
{
  "success": true
}
```

> [!NOTE]
> All events are processed asynchronously. The server should respond promptly to acknowledge receipt of the event.
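
One way for a server to respond promptly is to queue each event, acknowledge it immediately, and do the real work later. The sketch below shows that pattern with an in-memory queue. `EventQueue`, `accept`, and `drain` are hypothetical names, and a production server would more likely use a durable queue; this is only an illustration of acknowledging before processing.

```typescript
type Ack = { success: true };

class EventQueue<T> {
  private pending: T[] = [];

  // Called from the request handler: store the event and acknowledge at once.
  accept(event: T): Ack {
    this.pending.push(event);
    return { success: true };
  }

  // Called later (for example from a timer): handle everything queued so far.
  // Returns the number of events handled.
  drain(handle: (event: T) => void): number {
    const batch = this.pending;
    this.pending = [];
    batch.forEach(handle);
    return batch.length;
  }
}
```

The request handler returns the result of `accept` as the response body, while slow work such as storing screenshots happens inside `drain`.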

##### Server Example

###### Node.js

```js
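const express = require('express');
const cors = require('cors');

const app = express();
const port = 3000;

app.use(cors());
app.use(express.json());

// Placeholder handler: log the event type and acknowledge receipt.
// Replace it with your own storage or analytics logic.
const acknowledge = (event, res) => {
  console.log(`[UTIO] received ${event.type}`);
  return res.json({ success: true });
};
const handleAppLaunch = acknowledge;
const handleSendInstruction = acknowledge;
const handleShareReport = acknowledge;

app.post('/your-utio-endpoint', (req, res) => {
  const event = req.body;

  if (!event || !event.type) {
    return res.status(400).json({ error: 'Missing event type' });
  }

  // Dispatch on the event type defined in the interface specification.
  switch (event.type) {
    case 'appLaunched':
      return handleAppLaunch(event, res);
    case 'sendInstruction':
      return handleSendInstruction(event, res);
    case 'shareReport':
      return handleShareReport(event, res);
    default:
      return res.status(400).json({ error: 'Unsupported event type' });
  }
});

app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});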
```

###### Python

```python
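from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)


def acknowledge(data):
    """Placeholder handler: log the event type and acknowledge receipt."""
    app.logger.info('UTIO event received: %s', data['type'])
    return jsonify({'success': True})


# Replace these with your own storage or analytics logic.
handle_app_launch = acknowledge
handle_send_instruction = acknowledge
handle_share_report = acknowledge


@app.route('/your-utio-endpoint', methods=['POST'])
def handle_event():
    # silent=True returns None instead of raising on a malformed body.
    data = request.get_json(silent=True)

    if not isinstance(data, dict) or 'type' not in data:
        return jsonify({'error': 'Missing event type'}), 400

    event_type = data['type']

    if event_type == 'appLaunched':
        return handle_app_launch(data)
    elif event_type == 'sendInstruction':
        return handle_send_instruction(data)
    elif event_type == 'shareReport':
        return handle_share_report(data)
    else:
        return jsonify({'error': 'Unsupported event type'}), 400


if __name__ == '__main__':
    app.run(port=3000)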
```

8 changes: 8 additions & 0 deletions examples/presets/default.yaml
@@ -0,0 +1,8 @@
name: UI TARS Desktop Example Preset
language: en
vlmProvider: Hugging Face
vlmBaseUrl: https://your-endpoint.huggingface.cloud/v1
vlmApiKey: your_api_key
vlmModelName: your_model_name
reportStorageBaseUrl: https://your-report-storage-endpoint.com/upload
utioBaseUrl: https://your-utio-endpoint.com/collect
9 changes: 6 additions & 3 deletions forge.config.ts
@@ -25,6 +25,11 @@ const skipDevDependencies = new Set([
 const keepLanguages = new Set(['en', 'en_GB', 'en-US', 'en_US']);
 // const ignorePattern = new RegExp(`^/node_modules/(${[...devDependencies].join("|")})`)
 
+const enableOsxSign =
+  process.env.APPLE_ID &&
+  process.env.APPLE_PASSWORD &&
+  process.env.APPLE_TEAM_ID;
+
 // remove folders & files not to be included in the app
 async function cleanSources(
   buildPath,
@@ -100,9 +105,7 @@ const config: ForgeConfig = {
     prune: true,
     executableName: 'UI-TARS',
     extraResource: ['./resources/app-update.yml'],
-    ...(process.env.APPLE_ID &&
-    process.env.APPLE_PASSWORD &&
-    process.env.APPLE_TEAM_ID
+    ...(enableOsxSign
       ? {
           osxSign: {
             keychain: process.env.KEYCHAIN_PATH,
Binary file added images/download-report.png
Binary file added images/import-preset-from-local.png
Binary file added images/import-preset-from-remote.png
Binary file modified images/setting.png
Binary file added images/upload-report-success.png
Binary file added images/utio-flow.png