release: version 0.0.5 (#91)

* chore: version 0.0.5 * feat: setting preset (close: #57) (#61) * feat: implement preset setting (close: #57) * fix(renderer): setting page cannot scroll * refactor(renderer): setting layout and scroll behavior * refactor(renderer): remote preset management ui * refactor(renderer): setting input should be disabled when remote preset management take effect * fix(renderer): reset to manual does not work * chore(renderer): dispatch get setting after import preset * feat(renderer): clear setting * refactor(renderer): naming * chore: tweaks * chore: add setting store change log * refactor: enhance impl of `setting:resetPreset` * refactor: enhance impl of `setting:clear` * chore: tweaks * refactor: using self-implemented setting state management * chore: tweaks * fix(renderer): only fallback to download if upload was not configured or failed (#95) * refactor(ipc-bridge): replace zutron to trpc-like ipc bridge (#94) * refactor(ipc-bridge): replace zutron to trpc-like ipc bridge * chore: replace window * chore: ipc get permissions * refactor: remove all dispatch * docs: init setting docs (#96) * docs: init setting docs * Update docs/setting.md Co-authored-by: Charles <[email protected]> --------- Co-authored-by: Charles <[email protected]> * chore: make language setting optional (#99) * fix(windows): windows fails to handle events when the dock bar is opened after being closed (#100) * chore: tool * chore: canvas capture frame 30 to 60 * chore: upgrade electron 3.1.1 --------- Co-authored-by: ULIVZ <[email protected]>
bytedance · Feb 12, 2025 · dbd09de · dbd09de
1 parent 22303b0
commit dbd09de
Show file tree

Hide file tree

Showing 64 changed files with 10,660 additions and 7,580 deletions.
diff --git a/docs/preset.md b/docs/preset.md
@@ -0,0 +1,78 @@
+# Preset Management Guide
+
+> [!IMPORTANT]  
+> Currently, **UI-TARS Desktop** does not directly provide server-side capabilities, so we do not provide a Preset for the open source community. welcome community developers to contribute your presets [here](../examples/presets/).
+
+A **preset** is a collection of [settings](./setting.md)  (_Introduced at [#61](https://github.com/bytedance/UI-TARS-desktop/pull/61)_), **UI-TARS Desktop** supports import presets via `files` or `URLs`:
+
+```mermaid
+graph TD
+    A[Import Preset] --> B{Preset Type}
+    B -->|File| C[YAML File]
+    B -->|URL| D[URL Endpoint]
+    C --> E[Manual Updates 🔧]
+    D --> F[Auto Sync ⚡]
+```
+
+<br>
+
+
+## Preset Types Comparison
+
+| Feature               | Local Presets          | Remote Presets         |
+|-----------------------|------------------------|------------------------|
+| **Storage**           | Device-local           | Cloud-hosted          |
+| **Update Mechanism**  | Manual                 | Automatic             |
+| **Access Control**    | Read/Write             | Read-Only             |
+| **Versioning**        | Manual                 | Git-integrated        |
+
+
+
+<br>
+
+
+## Examples
+
+### Import from file
+
+**UI-TARS Desktop** supports importing presets from files. Once the file is parsed successfully, the settings will be automatically updated.
+
+| Function | Snapshot |
+| --- | ---|
+| Open Setting |<img width="320" alt="image" src="https://github.com/user-attachments/assets/1d2ae27c-9b2e-4896-96a6-04832f850907" /> |
+| Import Success | <img width="320" alt="image" src="https://github.com/user-attachments/assets/38f77101-7388-4363-ab27-668180f51aaa" />|
+| Exception: Invalid Content | <img width="320" alt="image" src="https://github.com/user-attachments/assets/5ebec2b2-12f6-4d1a-84a7-8202ef651223" /> |
+
+
+<br>
+
+
+### Import from URL
+
+**UI-TARS Desktop** also supports importing presets from URLs. If automatic updates are set, presets will be automatically pulled every time the application is started.
+
+| Function | Snapshot |
+| --- | ---|
+| Open Setting | <img width="320" alt="image" src="https://github.com/user-attachments/assets/d446da0e-3bb4-4ca5-bc95-4f235d979fd0" /> |
+| Import Success (Default) | <img width="320" alt="image" src="https://github.com/user-attachments/assets/a6470ed4-80ac-45a1-aaba-39e598d5af0f" /> |
+| Import Success (Auto Update) | <img width="320" alt="image" src="https://github.com/user-attachments/assets/b5364d66-6654-401b-969e-f85baeedbda0" />|
+
+
+<br>
+
+
+### Preset Example
+
+```yaml
+name: UI TARS Desktop Example Preset
+language: en
+vlmProvider: Hugging Face
+vlmBaseUrl: https://your-endpoint.huggingface.cloud/v1
+vlmApiKey: your_api_key
+vlmModelName: your_model_name
+reportStorageBaseUrl: https://your-report-storage-endpoint.com/upload
+utioBaseUrl: https://your-utio-endpoint.com/collect
+```
+
+See all [example presets](../examples/presets).
+
diff --git a/docs/setting.md b/docs/setting.md
@@ -0,0 +1,283 @@
+# Settings Configuration Guide
+
+## Overview
+
+**UI-TARS Desktop** offers granular control over application behavior through its settings system. This document provides comprehensive guidance on configuration options, preset management, and operational best practices.
+
+<p align="center">
+  <img src="../images/setting.png" alt="Settings Interface Overview" width="650">
+  <br>
+  <em>Main Settings Interface</em>
+</p>
+
+
+<br>
+
+
+## Configuration Options
+
+### Language
+
+Controls localization settings for VLM.
+
+| Property    | Details                        |
+| ----------- | ------------------------------ |
+| **Type**    | `string`                       |
+| **Options** | `en` (English), `zh` (Chinese) |
+| **Default** | `en`                           |
+
+> [!NOTE]
+> Changing the settings will **only** affect the output of VLM, not the language of the desktop app itself. Regarding the i18n of the App itself, welcome to contribute PR.
+
+
+<br>
+
+
+### VLM Provider
+
+Selects the backend VLM provider for make GUI action decisions.
+
+| Property    | Details                |
+| ----------- | ---------------------- |
+| **Type**    | `string`               |
+| **Options** | `Hugging Face`, `vLLM` |
+| **Default** | `Hugging Face`         |
+
+> [!NOTE]
+> This is an interface reserved for different VLM providers.
+
+
+<br>
+
+
+
+### VLM Base URL
+
+Specify the base url of the VLM that needs to be requested.
+
+| Property     | Details  |
+| ------------ | -------- |
+| **Type**     | `string` |
+| **Required** | `true`   |
+
+> [!NOTE]
+> VLM Base URL should be OpenAI compatible API endpoints (see [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
+
+
+<br>
+
+
+
+### VLM Model Name
+
+Specify the requested module name.
+
+| Property     | Details  |
+| ------------ | -------- |
+| **Type**     | `string` |
+| **Required** | `true`   |
+
+
+<br>
+
+
+### Report Storage Base URL
+
+Defines the base URL for uploading report file. By default, when this option is not set, when the user clicks **Export as HTML** (a.k.a. <b>Share</b>), it will automatically trigger the download of the report file:
+
+<p align="center">
+  <img src="../images/download-report.png" alt="Download report" width="320">
+  <br>
+</p>
+
+Once it's set, when user click **Export as HTML**, report file will firstly be uploaded to the Report Storage Server, which returns a publicly accessible URL for the persistent file.
+
+<p align="center">
+  <img src="../images/upload-report-success.png" alt="Download report" width="320">
+  <br>
+</p>
+
+#### Report Storage Server Interface
+
+The Report Storage Server should implement the following HTTP API endpoint:
+
+| Property     | Details                                                                                                      |
+| ------------ | ------------------------------------------------------------------------------------------------------------ |
+| **Endpoint** | `POST /your-storage-enpoint`                                                                                 |
+| **Headers**  | Content-Type: `multipart/form-data` <br> <!-- - Authorization: Bearer \<access_token\> (Not Supported) --> |
+
+#### Request Body
+
+The request should be sent as `multipart/form-data` with the following field:
+
+| Field | Type | Required | Description      | Constraints                        |
+| ----- | ---- | -------- | ---------------- | ---------------------------------- |
+| file  | File | Yes      | HTML report file | - Format: HTML<br>- Max size: 30MB |
+
+#### Response
+
+**Success Response (200 OK)**
+```json
+{
+  "url": "https://example.com/reports/xxx.html"
+}
+```
+
+The response should return a JSON object containing a publicly accessible URL where the report can be accessed.
+
+> [!NOTE]
+> Currently, there is no authentication designed for Report Storage Server. If you have any requirements, please submit an [issue](https://github.com/bytedance/UI-TARS-desktop/issues).
+
+
+<br>
+
+
+### UTIO Base URL
+
+**UTIO** (_UI-TARS Insights and Observation_) is a data collection mechanism for insights into **UI-TARS Desktop** (_Introduced at [#60](https://github.com/bytedance/UI-TARS-desktop/pull/60)_). The design of UTIO is also related to sharing. The overall process is as follows:
+
+<p align="center">
+  <img src="../images/utio-flow.png" alt="UTIO Flow" width="800">
+  <br>
+  <em>UTIO Flow</em>
+</p>
+
+This option defines the base URL for the **UTIO** server that handles application events and instructions.
+
+
+#### Server Interface Specification
+
+The UTIO server accepts events through HTTP POST requests and supports three types of events:
+
+| Property     | Details                          |
+| ------------ | -------------------------------- |
+| **Endpoint** | `POST /your-utio-endpoint`       |
+| **Headers**  | Content-Type: `application/json` |
+
+##### Event Types
+
+The server handles three types of events:
+
+###### **Application Launch**
+```typescript
+interface AppLaunchedEvent {
+  type: 'appLaunched';
+  platform: 'iOS' | 'Android' | 'Web';
+  osVersion: string;
+  screenWidth: number;
+  screenHeight: number;
+}
+```
+
+###### **Send Instruction**
+```typescript
+interface SendInstructionEvent {
+  type: 'sendInstruction';
+  instruction: string;
+}
+```
+
+###### **Share Report**
+```typescript
+interface ShareReportEvent {
+  type: 'shareReport';
+  lastScreenshot?: string;
+  report?: string;
+  instruction: string;
+}
+```
+
+##### Request Example
+
+```json
+{
+  "type": "appLaunched",
+  "platform": "iOS",
+  "osVersion": "16.0.0",
+  "screenWidth": 390,
+  "screenHeight": 844
+}
+```
+
+##### Response
+
+**Success Response (200 OK)**
+```json
+{
+  "success": true
+}
+```
+
+> [!NOTE]
+> All events are processed asynchronously. The server should respond promptly to acknowledge receipt of the event.
+
+
+##### Server Example
+
+###### Node.js
+
+```js
+const express = require('express');
+const cors = require('cors');
+const app = express();
+const port = 3000;
+
+app.use(cors());
+app.use(express.json());
+
+app.post('/your-utio-endpoint', (req, res) => {
+  const event = req.body;
+
+  if (!event || !event.type) {
+    return res.status(400).json({ error: 'Missing event type' });
+  }
+
+  switch (event.type) {
+    case 'appLaunched':
+      return handleAppLaunch(event, res);
+    case 'sendInstruction':
+      return handleSendInstruction(event, res);
+    case 'shareReport':
+      return handleShareReport(event, res);
+    default:
+      return res.status(400).json({ error: 'Unsupported event type' });
+  }
+});
+
+app.listen(port, () => {
+  console.log(`Server listening on port ${port}`);
+});
+```
+
+###### Python
+
+```python
+from flask import Flask, request, jsonify
+from flask_cors import CORS
+import re
+
+app = Flask(__name__)
+CORS(app)
+
+@app.route('/events', methods=['POST'])
+def handle_event():
+    data = request.get_json()
+
+    if not data or 'type' not in data:
+        return jsonify({'error': 'Missing event type'}), 400
+
+    event_type = data['type']
+
+    if event_type == 'appLaunched':
+        return handle_app_launch(data)
+    elif event_type == 'sendInstruction':
+        return handle_send_instruction(data)
+    elif event_type == 'shareReport':
+        return handle_share_report(data)
+    else:
+        return jsonify({'error': 'Unsupported event type'}), 400
+
+if __name__ == '__main__':
+    app.run(port=3000)
+```
+
diff --git a/examples/presets/default.yaml b/examples/presets/default.yaml
@@ -0,0 +1,8 @@
+name: UI TARS Desktop Example Preset
+language: en
+vlmProvider: Hugging Face
+vlmBaseUrl: https://your-endpoint.huggingface.cloud/v1
+vlmApiKey: your_api_key
+vlmModelName: your_model_name
+reportStorageBaseUrl: https://your-report-storage-endpoint.com/upload
+utioBaseUrl: https://your-utio-endpoint.com/collect
diff --git a/forge.config.ts b/forge.config.ts
@@ -25,6 +25,11 @@ const skipDevDependencies = new Set([
 const keepLanguages = new Set(['en', 'en_GB', 'en-US', 'en_US']);
 // const ignorePattern = new RegExp(`^/node_modules/(${[...devDependencies].join("|")})`)
 
+const enableOsxSign =
+  process.env.APPLE_ID &&
+  process.env.APPLE_PASSWORD &&
+  process.env.APPLE_TEAM_ID;
+
 // remove folders & files not to be included in the app
 async function cleanSources(
   buildPath,
@@ -100,9 +105,7 @@ const config: ForgeConfig = {
     prune: true,
     executableName: 'UI-TARS',
     extraResource: ['./resources/app-update.yml'],
-    ...(process.env.APPLE_ID &&
-    process.env.APPLE_PASSWORD &&
-    process.env.APPLE_TEAM_ID
+    ...(enableOsxSign
       ? {
           osxSign: {
             keychain: process.env.KEYCHAIN_PATH,

diff --git a/images/download-report.png b/images/download-report.png
diff --git a/images/import-preset-from-local.png b/images/import-preset-from-local.png
diff --git a/images/import-preset-from-remote.png b/images/import-preset-from-remote.png
diff --git a/images/setting.png b/images/setting.png
diff --git a/images/upload-report-success.png b/images/upload-report-success.png
diff --git a/images/utio-flow.png b/images/utio-flow.png