Home
If you want to dive into making your own smart glasses app, check out How To Write a Smart Glasses App in 30 Minutes.
If you need help, reach out to the team: TOSG Discord
This is an overview of working with the whole system. It describes the main SmartGlassesManager app, smart glasses thin clients (like the AndroidSmartGlasses client, ActiveLook, etc.), and third party apps (actual use cases/applications, i.e. the thing you want to use the SmartGlassesManager to make).
The smart glasses industry: "I am a hardware company, I made smart glasses that use an MCU. I want developers to be able to write apps on a smart phone that send data to the glasses. That's easy for 1 app at a time, but there's no way to do that and have multiple apps at the same time, which is completely necessary. We also can't build all the apps into the same APK, because that limits developers, reinvents the wheel, and goes against App Store rules. Even if all of the apps could talk to the glasses at the same time, who handles user intention to run specific apps? Who decides which app gets to display if 2 apps try at the same time? And how do we make app development easy while solving those problems?"
Us: "No problem, use the SmartGlassesManager as a middleware, and all those problems are solved. Plus, all the apps that already work with SmartGlassesManager will immediately work on your glasses. 😎"
- Android native smart phone server (WIS fork)
- A native Android library that third party apps call to send data to smart glasses through the ASP server
- Android native smart glasses thin client (WIS fork)
- MCU C/C++ smart glasses thin client (OSSG fork)
The thin client and server apps need a HARD line differentiation between the use cases and the SmartGlassesManager. All displays should be generic (ReferenceCard, ScrollView, etc.) and there should never be a reference to any specific use case.
- https://github.com/ActiveLook/Activelook-API-Documentation/blob/main/ActiveLook_API.md
- Pebble/Rebble
- Connect smart phone to smart glasses
  - likely BLE, maybe WiFi
  - Android smart glasses
  - microcontroller smart glasses
- Receive audio from glasses
- Transcribe audio
- Share transcription with other apps on the same device
- Receive data from other apps on the same device, send this data to be displayed on smart glasses
- UI, voice command (TBD how this works with everything else)
SGM - Smart glasses manager/middleware
ASP - Android smart phone
SGD - smart glasses device
3PA - third party app
WW - wake word
There are 2 core interfaces:
3PA <-> SGM
SGM <-> SGD
A 3PA needs to do 3 things:
- Receive data from the sensors on the SGD
- Receive computed data from the SGM or from other 3PAs through the SGM. Examples of this type of data are live transcription or an object recognition output.
- Trigger actions on the smart glasses (display something, play audio, take a picture, etc.)
The SGD needs to:
- Stream sensor data to the SGM
- Receive actions from the SGM and complete those actions
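The two interfaces and their responsibilities can be sketched as a pair of Java types. This is only an illustration under assumptions, not the SGM's actual API: every name here (`DataListener`, `GlassesDevice`, `SgmSketch`, `FakeGlasses`) is hypothetical, and data is reduced to plain strings to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the real SGM API.

/** Receives sensor data or computed data as (type, payload) pairs. */
interface DataListener {
    void onData(String type, String payload);
}

/** SGM <-> SGD: the glasses stream sensor data up and carry out actions sent down. */
interface GlassesDevice {
    void setDataListener(DataListener listener);   // where the SGD streams sensor data
    void perform(String action, String payload);   // e.g. "display", "play_audio"
}

/** 3PA <-> SGM: covers the three things a 3PA needs to do. */
class SgmSketch implements DataListener {
    private final GlassesDevice device;
    private final List<DataListener> subscribers = new ArrayList<>();

    SgmSketch(GlassesDevice device) {
        this.device = device;
        device.setDataListener(this);
    }

    // A 3PA subscribes to receive both sensor data and computed data.
    void subscribe(DataListener thirdPartyApp) { subscribers.add(thirdPartyApp); }

    // Computed data (e.g. a transcription) published by the SGM or another 3PA.
    void publishComputed(String type, String payload) { onData(type, payload); }

    // A 3PA triggers an action on the glasses.
    void triggerAction(String action, String payload) { device.perform(action, payload); }

    // Sensor data arriving from the SGD is fanned out to every subscriber.
    @Override
    public void onData(String type, String payload) {
        for (DataListener s : subscribers) s.onData(type, payload);
    }
}

/** In-memory stand-in for real glasses, useful for trying the sketch out. */
class FakeGlasses implements GlassesDevice {
    final List<String> performed = new ArrayList<>();
    private DataListener listener;

    public void setDataListener(DataListener listener) { this.listener = listener; }
    public void perform(String action, String payload) { performed.add(action + ":" + payload); }

    // Simulates the glasses streaming one piece of sensor data.
    void emit(String type, String payload) {
        if (listener != null) listener.onData(type, payload);
    }
}
```

A 3PA would call `subscribe` once and `triggerAction` whenever it wants something shown or played; it never talks to the `GlassesDevice` directly.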
There are two types of commands:
- One-off commands - these commands are given, and they result in a single action, or a single UI response, and no mode change. They are a single interaction, and we return to the previous mode after they are finished. Examples of this type of command are to change the brightness of the display, a natural language query (“how far away is the sun?”), etc.
- Mode-change commands - these commands result in a mode change. All of these commands live under the launcher/run menu.
There should be some built-in one-off commands that exist within the SGM, centered around the most basic functionality (think Notepad and Calculator built into Windows) or settings (Control Panel), but nothing complex. There should also be some built-in mode-change commands (and thus, modes) that focus on the most basic functionality (live captions, take a picture, etc.).
3PAs should be able to register command names/strings with the SGM, so that they can choose an arbitrary voice/menu command as a one-off command or as a mode-change command in the launcher/run menu.
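Command registration could look roughly like the following. This is a minimal sketch under assumptions: `CommandRegistry`, `CommandType`, and `RegisteredCommand` are hypothetical names, and the rules that phrases are matched case-insensitively and that a duplicate phrase is rejected are choices made for this sketch, not decisions the design above has made.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical names; not the real SGM API.
enum CommandType { ONE_OFF, MODE_CHANGE }

class RegisteredCommand {
    final String phrase;     // normalized voice/menu phrase
    final CommandType type;
    final String ownerApp;   // the 3PA (or "sgm" for built-ins) that registered it

    RegisteredCommand(String phrase, CommandType type, String ownerApp) {
        this.phrase = phrase;
        this.type = type;
        this.ownerApp = ownerApp;
    }
}

class CommandRegistry {
    // LinkedHashMap keeps registration order, so menus list commands predictably.
    private final Map<String, RegisteredCommand> commands = new LinkedHashMap<>();

    /** Registers a phrase; returns false if the phrase is already taken (assumed policy). */
    boolean register(String phrase, CommandType type, String ownerApp) {
        String key = normalize(phrase);
        if (commands.containsKey(key)) return false;
        commands.put(key, new RegisteredCommand(key, type, ownerApp));
        return true;
    }

    /** Returns the command for a spoken or selected phrase, or null if none matches. */
    RegisteredCommand lookup(String phrase) {
        return commands.get(normalize(phrase));
    }

    /** Phrases shown in the launcher/run menu: mode-change commands only. */
    List<String> launcherEntries() {
        List<String> entries = new ArrayList<>();
        for (RegisteredCommand c : commands.values()) {
            if (c.type == CommandType.MODE_CHANGE) entries.add(c.phrase);
        }
        return entries;
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

Built-in commands and 3PA commands go through the same `register` call here, which keeps the SGM free of references to any specific use case.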
Thus, there is an app launcher menu on the glasses. It will be a list of command words (voice commands), might also look like the Android launcher (made compact enough to view everything on smart glasses), and can be scrolled through (cursor, gestures, etc.).
- Power off
- Toggle display power
- Brightness control
- notify/update/check
- run/start
The way these are interacted with depends on the interaction method. If voice, they are said. If gestures, they are selected from a menu, probably scrolled through.
The notify/update/check command should display a screen with all the notifications on your phone, time, weather, phone and smart glasses stats (WiFi, Bluetooth, battery), etc.
The run/start command launches an app and puts the SGM into a mode. This is the only mode-change command, and it takes one argument: the name of the mode to enter.
**What if the 3PA that registered its mode isn’t running?** In this case, the glasses and SGM on ASP should display a notification that the app is down, and prompt the user to start that app on their ASP so they can run it on their glasses.
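The handling of a mode-change request, including the case where the owning 3PA is down, can be sketched as follows. All names (`ModeLauncherSketch`, `registerMode`, `setAppRunning`) and the exact message strings are hypothetical, and how the SGM actually detects that a 3PA is running is left open here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; how the SGM really tracks 3PA liveness is not specified.
class ModeLauncherSketch {
    private final Set<String> registeredModes = new HashSet<>();
    private final Set<String> runningModes = new HashSet<>(); // modes whose 3PA is alive
    private String currentMode = null;                        // null = default home/contextual mode

    void registerMode(String mode) { registeredModes.add(mode); }

    void setAppRunning(String mode, boolean running) {
        if (running) runningModes.add(mode); else runningModes.remove(mode);
    }

    String currentMode() { return currentMode; }

    /**
     * Handles a request to enter the named mode and returns the text to show
     * on the glasses and on the phone.
     */
    String run(String mode) {
        if (!registeredModes.contains(mode)) {
            return "Unknown mode: " + mode;
        }
        if (!runningModes.contains(mode)) {
            // The app is down: notify and prompt, but do not change mode.
            return "App for '" + mode + "' is not running. Start it on your phone.";
        }
        currentMode = mode;
        return "Entered mode: " + mode;
    }
}
```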
The SGM, by default, is a home page that is not giving preference to any one app or experience. This could be considered “contextual mode”, where the information that is displayed is whatever is most useful to the current context, or whatever is requested by the user (as a one-off request).
3PAs should subscribe to data
The interface exposed to the 3PA should be an event bus. We should also convert the SGM to use EventBus instead of RxJava pseudo-EventBus: https://github.com/greenrobot/EventBus
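To make the subscribe/post pattern concrete, here is a tiny self-contained event bus in plain Java. It is a stand-in written for this page, not the greenrobot EventBus API, and `TinyEventBus` and `TranscriptEvent` are hypothetical names. It dispatches on the exact event class only and does no threading.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative stand-in only; the real library has a different API.
class TinyEventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new HashMap<>();

    /** Subscribes a handler to events of exactly the given class. */
    <T> void subscribe(Class<T> type, Consumer<T> handler) {
        handlers.computeIfAbsent(type, k -> new ArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    /** Delivers the event synchronously to every handler registered for its class. */
    void post(Object event) {
        List<Consumer<Object>> list = handlers.get(event.getClass());
        if (list == null) return; // nobody subscribed to this event type
        for (Consumer<Object> h : new ArrayList<>(list)) h.accept(event);
    }
}

/** Example event: one piece of live transcription (hypothetical shape). */
class TranscriptEvent {
    final String text;
    final boolean isFinal;

    TranscriptEvent(String text, boolean isFinal) {
        this.text = text;
        this.isFinal = isFinal;
    }
}
```

A 3PA interested in live captions would subscribe to `TranscriptEvent` and ignore everything else; the SGM posts events without knowing who is listening.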
The SGM has a menu of different apps that can be launched. When an app is launched, it gets full priority/control of the display, and we have entered the mode of that app.
In general, we want the absolute minimum UI possible, and grow from there.
Our “home screen” should be black. Nothing on it. This should stay that way unless:
- An app requests to display contextually relevant information, and we display it (e.g. notifications).
- The user performs a “wake action” - wake word, wake button press, tap touch sensor on glasses, etc. This should load the “wake screen”.
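The display rules described so far (blank home screen, contextual requests, wake screen, and a launched app owning the display) can be sketched as a small state holder. `DisplayArbiter` and its methods are hypothetical names. Rejecting display requests from other apps while a mode is active, and letting a wake action always load the wake screen, are assumptions of this sketch.

```java
// Hypothetical sketch of who gets the display; not the real SGM logic.
class DisplayArbiter {
    enum Screen { BLANK, CONTEXTUAL, WAKE, APP_MODE }

    private Screen screen = Screen.BLANK; // the home screen starts black
    private String modeOwner = null;      // app that currently owns the display, if any

    Screen screen() { return screen; }

    /** Wake word, wake button, touch sensor tap, etc. */
    void onWakeAction() { screen = Screen.WAKE; }

    /** Launching an app gives it full control of the display. */
    void enterMode(String app) {
        modeOwner = app;
        screen = Screen.APP_MODE;
    }

    /** Leaving a mode returns to the black home screen. */
    void exitMode() {
        modeOwner = null;
        screen = Screen.BLANK;
    }

    /** Returns true if the requesting app is allowed to display right now. */
    boolean requestDisplay(String app) {
        if (modeOwner != null) {
            if (!modeOwner.equals(app)) return false; // another app owns the display
            screen = Screen.APP_MODE;
            return true;
        }
        screen = Screen.CONTEXTUAL; // no mode active: show contextual info
        return true;
    }
}
```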
Wake Screen
TBD: depends on the decision about bifurcating the 2 command types.
Shows the user a list of commands they can run. Guides them through the steps of running those commands.
User Flow
The flow needs to be as fast and smooth as possible. The UI on the smart glasses is only going to be used for things that can be launched quickly and seamlessly. Otherwise, it’s easier/better to just pull out your phone. So, let’s focus on making the UX fast and focused on what users are actually going to do.
Things you might do on the glasses UI: ask a question, take a picture, change the brightness of the display, save a note
Things you wouldn’t do on the glasses UI (sans better AI/contextual interface): order an Uber, put in directions for your friend’s house, add a new column to a spreadsheet, etc.
Another note: the Commands section describes two types of commands. However, users probably won’t think this way; they’ll just want to ask or tell the glasses what to do. Separating commands into some that are available immediately and some that aren’t is thus a mistake. What should the flow look like, assuming the user’s main mode of input is any of:
- Voice command (currently CLI, move to intent-based)
- Wrist/hand/finger gestures
- Capacitive touch sensor on glasses
We have to limit and simplify the views that the glasses can display in order to preserve battery life and simplify the interface. We won’t be sending complex graphics and often won’t even be sending images; we’ll just be sending text, or directions for where to show a built-in graphic (e.g., an arrow).
A title and a body. The title is displayed at the top of the glasses content box, the body below it.
A title, an image, and a body. The title is displayed at the top of the glasses content box, the image below it, and the body below the image.
A scrolling text screen. ‘Intermediate strings’ are shown live at the bottom of the screen. Above that, the ‘final strings’ are listed and scroll up.
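The text-only views can be modeled as plain data that the thin client renders. The sketch below covers the title/body card and the scrolling text screen; `TextCard` and `ScrollingText` are hypothetical names, rendering is reduced to a list of lines, and the cap on how many final strings are kept is an assumption made so the example stays bounded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical view models; the real view names and fields may differ.

/** A title and a body: title on top, body below it. */
class TextCard {
    final String title;
    final String body;

    TextCard(String title, String body) {
        this.title = title;
        this.body = body;
    }

    List<String> render() {
        List<String> lines = new ArrayList<>();
        lines.add(title);
        lines.add(body);
        return lines;
    }
}

/** Scrolling text: final strings scroll up, the live intermediate string sits at the bottom. */
class ScrollingText {
    private final int maxFinalLines;              // assumed cap, oldest lines drop off
    private final Deque<String> finals = new ArrayDeque<>();
    private String intermediate = "";

    ScrollingText(int maxFinalLines) { this.maxFinalLines = maxFinalLines; }

    /** Replaces the live line at the bottom of the screen. */
    void onIntermediate(String text) { intermediate = text; }

    /** Appends a final string and clears the live line it supersedes. */
    void onFinal(String text) {
        finals.addLast(text);
        if (finals.size() > maxFinalLines) finals.removeFirst();
        intermediate = "";
    }

    /** Lines from top to bottom: final strings first, then the live line if any. */
    List<String> render() {
        List<String> lines = new ArrayList<>(finals);
        if (!intermediate.isEmpty()) lines.add(intermediate);
        return lines;
    }
}
```

Feeding live transcription into `ScrollingText` means calling `onIntermediate` for each partial result and `onFinal` once the recognizer settles on a phrase.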