A desktop pet application called AIRI has become popular recently. If you’re interested, you can check it out on GitHub: https://github.com/moeru-ai/airi
It’s essentially a virtual desktop pet built on avatar models in the Live2D and VRM formats. Both formats are designed for digital humans: VRM is a 3D format based on glTF, while Live2D is a layered 2D format. AIRI can interact with the computer and integrates large language model features, which makes the model feel genuinely intelligent.
On the desktop, it was originally a Rust application built with Tauri. It no longer is; I don’t know the exact reason, but I’ll offer a probable one later.
Although AI-assisted programming has significantly lowered the barrier to development, that doesn’t mean we no longer need to think.
Note: This experiment was done on Linux; behavior on other platforms is unknown.
If You Want to Try It Too
Can I implement a similar electronic desktop pet?
My idea is that I can use Tauri to implement a fullscreen, transparent, always-on-top window with click-through to simulate a desktop pet.
The key point is that this application window must be click-through, so it won’t interfere with the user’s operation of the underlying interface.
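Here is a minimal sketch of what that window setup could look like, not the exact code from my project. Tauri’s JS window object exposes setDecorations, setAlwaysOnTop, setFullscreen and setIgnoreCursorEvents; transparency is assumed to be enabled through the transparent option in the window configuration. The PetWindow interface and the configurePetWindow helper are hypothetical names: the interface lists only the methods the sketch needs, so it can be read without Tauri installed.

```typescript
// Hypothetical structural type: the subset of Tauri's window methods used here.
// In a real app, the window object from @tauri-apps/api would be passed in.
interface PetWindow {
  setDecorations(decorations: boolean): Promise<void>;
  setAlwaysOnTop(alwaysOnTop: boolean): Promise<void>;
  setFullscreen(fullscreen: boolean): Promise<void>;
  setIgnoreCursorEvents(ignore: boolean): Promise<void>;
}

// Hypothetical helper: applies the runtime settings of the pet window.
// Assumption: transparency is already set in the window config.
function configurePetWindow(win: PetWindow): Promise<void[]> {
  return Promise.all([
    win.setDecorations(false),       // no title bar or borders
    win.setAlwaysOnTop(true),        // stay above other windows
    win.setFullscreen(true),         // cover the whole screen
    win.setIgnoreCursorEvents(true), // click-through: clicks reach the windows below
  ]);
}
```

The last call is the one that matters most for the rest of this post: it is what makes the window click-through.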
Model
Essentially, this is just using three.js on the frontend to render a model, then placing the model in a fullscreen window. The difference is that we’re using special avatar models: Live2D models or VRM models.
These formats are specifically designed for digital humans (VRM is built on glTF), so binding motion to them is much simpler than with traditional 3D models. For example, with a VRM model:
// Left arm bones, looked up by their standardized humanoid names (e.g. to let the arm hang naturally)
const leftUpperArm = vrm.humanoid.getNormalizedBoneNode('leftUpperArm');
const leftLowerArm = vrm.humanoid.getNormalizedBoneNode('leftLowerArm');
const leftHand = vrm.humanoid.getNormalizedBoneNode('leftHand');
// Right arm
const rightUpperArm = vrm.humanoid.getNormalizedBoneNode('rightUpperArm');
const rightLowerArm = vrm.humanoid.getNormalizedBoneNode('rightLowerArm');
const rightHand = vrm.humanoid.getNormalizedBoneNode('rightHand');
This is a high-level abstraction over a traditional skeleton: bones are addressed by standardized humanoid names instead of model-specific ones.
Here are the resources I used:
- three-vrm: Mainly used for rendering VRM models
- VRoid Hub: VRM model resources
- VRMA: VRMA files are pre-packaged animations for VRM models. There are 10 sets of common animations here that work with almost all VRM models.
The result is as follows:

Eye Tracking
Can a pet that can’t interact with its owner really be called a pet?
So, I want its eyes to follow the mouse, giving the 3D model a bit of interactive effect. The implementation idea is simple: listen to mouse movement events, calculate the mouse position in the window, then pass the coordinates to the model. In fact, the @pixiv/three-vrm library itself supports LookAt functionality, so we can integrate it very easily.
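The coordinate step can be sketched as follows. This is a minimal sketch under my own assumptions, not the exact code from my project: mouseToLookTarget, range and depth are hypothetical names and values. It maps a mouse position in window pixels to normalized coordinates (x from -1 to 1, left to right; y from -1 to 1, bottom to top) and scales them into a point in front of the model. With three-vrm, that point would then be copied to the position of the Object3D assigned to vrm.lookAt.target.

```typescript
interface WindowSize { width: number; height: number }
interface LookTarget { x: number; y: number; z: number }

// Hypothetical helper: converts a mouse position (window pixels, origin at the
// top-left corner) into a 3D point for the model to look at.
// range: how far the target may move sideways and vertically (assumed unit: meters).
// depth: how far in front of the model the target sits (assumed unit: meters).
function mouseToLookTarget(
  mouseX: number,
  mouseY: number,
  size: WindowSize,
  range: number = 1,
  depth: number = 1,
): LookTarget {
  if (size.width <= 0 || size.height <= 0) {
    throw new RangeError("window size must be positive");
  }
  const clamp = (v: number): number => Math.max(-1, Math.min(1, v));
  // Screen y grows downward, while 3D y grows upward, so y is flipped.
  const ndcX = clamp((mouseX / size.width) * 2 - 1);
  const ndcY = clamp(1 - (mouseY / size.height) * 2);
  return { x: ndcX * range, y: ndcY * range, z: depth };
}
```

Clamping keeps the target inside a fixed box even when the reported position falls outside the window, so the eyes never roll to an extreme angle.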
The Problem
Remember the settings above? We made the application window click-through by calling setIgnoreCursorEvents(true). The catch is that this same setting also prevents the application from receiving any mouse events.
We can no longer get mouse events, so how do we solve this?
An obvious workaround comes to mind: listen for mouse events in the Rust layer, pass them to the JS layer, and let the JS layer hand them to the application.
The Rust layer can use rdev or Enigo to get mouse events.
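One detail to keep in mind with this approach: a native listener reports the cursor in global screen coordinates, whereas the frontend wants coordinates relative to the window. Below is a minimal conversion sketch, assuming the native side reports physical pixels and the frontend works in CSS pixels. screenToWindow and its parameter names are hypothetical. For a fullscreen window at the screen origin with a scale factor of 1, the conversion is the identity.

```typescript
interface Point { x: number; y: number }

// Hypothetical helper: converts a global cursor position (assumed physical pixels)
// into window-relative CSS pixels.
// windowOrigin: top-left corner of the window on the screen, in physical pixels.
// scaleFactor: ratio of physical pixels to CSS pixels (e.g. 2 on a HiDPI display).
function screenToWindow(cursor: Point, windowOrigin: Point, scaleFactor: number): Point {
  if (!(scaleFactor > 0)) {
    throw new RangeError("scaleFactor must be positive");
  }
  return {
    x: (cursor.x - windowOrigin.x) / scaleFactor,
    y: (cursor.y - windowOrigin.y) / scaleFactor,
  };
}
```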
Listening to mouse events in the Rust layer:
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

use enigo::{Enigo, Mouse, Settings};

// Latest known cursor position (assumed definitions; -1 means "not read yet")
static MOUSE_X: AtomicI32 = AtomicI32::new(-1);
static MOUSE_Y: AtomicI32 = AtomicI32::new(-1);

/// Start mouse position listener thread
fn start_mouse_listener() {
    // Create Enigo instance before Tauri initialization
    let enigo = match Enigo::new(&Settings::default()) {
        Ok(e) => Arc::new(Mutex::new(e)),
        Err(e) => {
            eprintln!("Failed to initialize Enigo: {:?}", e);
            return;
        }
    };
    // Print screen size
    if let Ok(e) = enigo.lock() {
        if let Ok(display) = e.main_display() {
            println!("Screen size: {:?}", display);
        }
    }
    println!("Starting mouse position listener...");
    thread::spawn(move || {
        let mut last_x: i32 = -1;
        let mut last_y: i32 = -1;
        let mut heartbeat = 0u32;
        loop {
            if let Ok(e) = enigo.lock() {
                match e.location() {
                    Ok((x, y)) => {
                        // Only publish the position when it actually changed
                        if x != last_x || y != last_y {
                            println!("Mouse position: x={}, y={}", x, y);
                            MOUSE_X.store(x, Ordering::Relaxed);
                            MOUSE_Y.store(y, Ordering::Relaxed);
                            last_x = x;
                            last_y = y;
                        }
                    }
                    Err(err) => {
                        eprintln!("Failed to get mouse position: {:?}", err);
                    }
                }
            }
            // Log a heartbeat roughly once per second (every 60 iterations)
            heartbeat = heartbeat.wrapping_add(1);
            if heartbeat % 60 == 0 {
                println!("[Heartbeat] Mouse listener thread running... (position: x={}, y={})", last_x, last_y);
            }
            thread::sleep(Duration::from_millis(16)); // ~60fps
        }
    });
}
The code above can indeed track the mouse position. However, once the application window has been initialized, the enigo.lock() check above no longer succeeds. The change coincides with setIgnoreCursorEvents(true), which is applied after the webview is initialized, and this suggests that setIgnoreCursorEvents(true) directly affects something at a lower level. Reading Tauri’s source code, I found that setIgnoreCursorEvents(true) is implemented by the Wry runtime.
Since I don’t know much about the Wry runtime, I didn’t investigate further.
In any case, for an ordinary developer this problem is already hard to solve.
See How Others Do It
Since I can’t solve it myself, let’s see how others do it. Reading the AIRI project’s source code, I found that its desktop version has already completed the migration from Tauri to Electron…
From the git commit history, I found that while AIRI was still using Tauri, it implemented click-through with the tauri-plugin-window-pass-through-on-hover plugin. Under the hood, though, that plugin also calls window.set_ignore_cursor_events(enabled) to implement click-through, and it simply doesn’t support Linux.
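The plugin’s name suggests that click-through is switched depending on whether the cursor hovers over the pet. The sketch below is not taken from the plugin; it only illustrates that idea with hypothetical names (ModelRect, shouldPassThrough). The window ignores cursor events unless the cursor is inside the model’s bounding box, and the resulting boolean is the kind of value that would be passed to set_ignore_cursor_events. The cursor position would still have to come from a native listener, since a click-through window receives no mouse events itself.

```typescript
// Hypothetical type: the model's on-screen bounding box in window pixels.
interface ModelRect { left: number; top: number; width: number; height: number }

// Hypothetical helper: returns true when the window should ignore cursor events,
// i.e. when the cursor is NOT over the model.
function shouldPassThrough(cursorX: number, cursorY: number, model: ModelRect): boolean {
  const inside =
    cursorX >= model.left &&
    cursorX < model.left + model.width &&
    cursorY >= model.top &&
    cursorY < model.top + model.height;
  return !inside;
}
```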