I first encountered a plugin architecture in Tachiyomi/Mihon, a manga reader that acts as a container and gets its content from plugins that serve as manga sources. Later, when using fcitx-android, I saw plugins implemented once again.
Although I’m not an Android developer, I’ve always been curious about this design. While building Kime, I realized that predictive text models, speech-to-text, and emoji are all features where needs differ from user to user. For example, for predictive text I needed to add my own chat logs to the training data; speech-to-text might call for different ASR models; and everyone’s preferred emoji packs are different.
This was the perfect opportunity for me to experiment with plugin architecture.
Research
I researched the plugin systems of both fcitx-android and Mihon.
In fact, I looked at fcitx-android first: since it is also an input method, I assumed it would be simple, right?
It wasn’t.
Here’s a brief overview of how Android plugins are implemented. Simply put, it comes down to a ClassLoader: the host loads the plugin’s code through a ClassLoader, and if the plugin ships resources, the host app’s asset management copies them over.
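To make the ClassLoader step concrete, here is a minimal sketch of loading a plugin class by name and instantiating it against a shared interface. `EmojiSource`, `SamplePlugin`, and `PluginLoader` are hypothetical names, and the JVM’s own application class loader stands in for the loader an Android host would build from the plugin APK (for example a `DexClassLoader` whose parent is the host’s class loader).

```java
// Hypothetical contract shared by the host app and its plugins.
interface EmojiSource {
    String name();
}

// Stands in for a class that would normally be compiled into the plugin APK.
class SamplePlugin implements EmojiSource {
    public SamplePlugin() {}

    @Override
    public String name() {
        return "sample-pack";
    }
}

final class PluginLoader {
    private PluginLoader() {}

    // Resolves `className` through `loader`; asSubclass throws ClassCastException
    // if the class does not implement the shared contract. The instance is then
    // created through the public no-arg constructor.
    static EmojiSource load(ClassLoader loader, String className)
            throws ReflectiveOperationException {
        Class<? extends EmojiSource> cls =
                Class.forName(className, true, loader).asSubclass(EmojiSource.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

The host only ever sees the interface; everything else about the plugin is looked up by a string class name, which is exactly why renaming classes (as obfuscation does) breaks this kind of loading.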
This is the conventional approach for in-process interaction. The result is less a plugin than a resource package disguised as an APK, since it has no lifecycle of its own.
There’s also a more complex cross-process approach in which plugins have their own lifecycle, but I haven’t tested whether it’s feasible.
Because of Android’s permission restrictions, communication between two APKs is heavily limited, so there is very little a ClassLoader-based plugin can do.
Plugin Architecture Ideas
Mihon
Mihon (inherited from Tachiyomi) adopts a code-loading plugin architecture where extensions are essentially independently compiled APKs, dynamically loaded at runtime via DexClassLoader.
Fcitx5 Android
The Fcitx5 Android plugin system is relatively complex and has two types of plugins:
- Data (native library) plugins: input method engine plugins such as Rime, Anthy, Hangul, and Chewing are pure data providers; at startup, their data is simply merged into the main app’s data directory.
- Service plugins: these interact with the main app in real time via IPC, for example clipboard-filter.
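The data-plugin flow (merge into one hierarchy, detect path conflicts, then diff against what is already on disk) can be illustrated with a simplified model. This is not fcitx5-android’s actual code: `DataPlugin`, `MergeResult`, and `DataMerger` are hypothetical, each plugin is reduced to the set of relative paths it provides, and the first plugin to claim a path keeps it.

```java
import java.util.*;

// Simplified data plugin: an id plus the relative file paths it provides.
record DataPlugin(String id, Set<String> files) {}

// Merged view: path -> owning plugin id, plus human-readable conflict reports.
record MergeResult(Map<String, String> owners, List<String> conflicts) {}

final class DataMerger {
    private DataMerger() {}

    // Merges plugin file lists; a path claimed twice is reported as a conflict
    // and stays with the plugin that claimed it first.
    static MergeResult merge(List<DataPlugin> plugins) {
        Map<String, String> owners = new TreeMap<>();
        List<String> conflicts = new ArrayList<>();
        for (DataPlugin p : plugins) {
            for (String path : p.files()) {
                String existing = owners.putIfAbsent(path, p.id());
                if (existing != null) {
                    conflicts.add(path + " (" + existing + " vs " + p.id() + ")");
                }
            }
        }
        return new MergeResult(owners, conflicts);
    }

    // Paths present now but not before must be copied; paths that disappeared
    // must be deleted. (A path whose owner changed is ignored in this sketch.)
    static Map<String, Set<String>> diff(Set<String> before, Set<String> after) {
        Set<String> toCopy = new TreeSet<>(after);
        toCopy.removeAll(before);
        Set<String> toDelete = new TreeSet<>(before);
        toDelete.removeAll(after);
        return Map.of("copy", toCopy, "delete", toDelete);
    }
}
```

Computing a diff first means only changed files are touched at startup, and conflicts surface as a failure type instead of one plugin silently overwriting another’s data.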
Plugin Comparison
| Fcitx5 Android | Mihon |
|---|---|
| 1. PackageManager query | 1. PackageManager query |
| 2. Parse plugin.xml | 2. Check feature declaration |
| 3. API version verification | 3. Signature hash extraction |
| 4. Parse DataDescriptor | 4. Trust list verification |
| 5. Merge into DataHierarchy | 5. Version range check |
| 6. Calculate diff | 6. Create ClassLoader |
| 7. Execute file operations | 7. Load source class via reflection |
| 8. (Optional) Bind IPC service | 8. Instantiate Source |
| Failure types: | Failure types: |
| - Path conflict | - Untrusted (untrusted signature) |
| - Metadata parsing error | - Error (load failure) |
| - API incompatible | |
Comparing the two, I found Mihon’s approach better suited to my needs.
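Mihon’s steps 3 to 5 (signature hash, trust list, version range) are what separate an "Untrusted" extension from an "Error". A minimal sketch of that gate follows, assuming the signing certificate is available as raw bytes (on Android it would come from the package’s signing info) and that the library version is the `major.minor` prefix of the version name. `ExtensionVerifier`, `LoadStatus`, and the supported range are hypothetical, not Mihon’s actual values.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

enum LoadStatus { SUCCESS, UNTRUSTED, ERROR }

final class ExtensionVerifier {
    // Hypothetical range of extension library versions the host supports.
    private static final int[] LIB_MIN = {1, 4};
    private static final int[] LIB_MAX = {1, 5};

    private ExtensionVerifier() {}

    // Step 3: lowercase hex SHA-256 of the signing certificate bytes.
    static String sha256Hex(byte[] cert) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(cert);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available", e);
        }
    }

    // Extracts {major, minor} from a version name such as "1.4.12".
    private static int[] libVersion(String versionName) {
        String[] parts = versionName.split("\\.");
        if (parts.length < 2) throw new IllegalArgumentException("bad version: " + versionName);
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    private static int compare(int[] a, int[] b) {
        return a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]);
    }

    // Steps 4 and 5: untrusted signatures are rejected before anything else;
    // unparsable or out-of-range versions are reported as load errors.
    static LoadStatus verify(byte[] cert, String versionName, Set<String> trustedHashes) {
        if (!trustedHashes.contains(sha256Hex(cert))) return LoadStatus.UNTRUSTED;
        int[] lib;
        try {
            lib = libVersion(versionName);
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return LoadStatus.ERROR;
        }
        if (compare(lib, LIB_MIN) < 0 || compare(lib, LIB_MAX) > 0) return LoadStatus.ERROR;
        return LoadStatus.SUCCESS;
    }
}
```

Only after this gate passes would the host go on to create a ClassLoader, so untrusted code is never loaded at all.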
Experiment
In Kime 1, there were three major features I wanted to turn into plugins:
- Emoji/stickers
- Predictive text
- Speech-to-text
The reason for making predictive text a plugin was simple: I needed to train my own predictive text model, and since the training data would include my private data, it definitely couldn’t be open-sourced. But what if others had similar needs? A plugin architecture was a good fit.
Speech-to-text boils down to ASR models: whether it is an online API or a small local model, either one fits naturally into a plugin.
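What makes ASR a natural plugin is that online and local backends can sit behind one interface, with the host choosing whichever is usable. The sketch below assumes a hypothetical `AsrEngine` contract and `AsrRegistry`; both backends are placeholders that only report how many samples they received, where a real one would call a remote API or run an on-device model.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical contract every speech-to-text plugin would implement.
interface AsrEngine {
    String id();

    // e.g. "network reachable" for an online API, "model file present" for a local model.
    boolean isAvailable();

    // 16-bit PCM samples in, recognized text out.
    String transcribe(short[] pcm16);
}

// Placeholder backends: they only echo the input size instead of recognizing speech.
class OnlineAsr implements AsrEngine {
    private final boolean reachable;

    OnlineAsr(boolean reachable) { this.reachable = reachable; }

    public String id() { return "online"; }
    public boolean isAvailable() { return reachable; }
    public String transcribe(short[] pcm16) { return "online:" + pcm16.length; }
}

class LocalAsr implements AsrEngine {
    private final boolean modelPresent;

    LocalAsr(boolean modelPresent) { this.modelPresent = modelPresent; }

    public String id() { return "local"; }
    public boolean isAvailable() { return modelPresent; }
    public String transcribe(short[] pcm16) { return "local:" + pcm16.length; }
}

// Host-side registry: plugins register engines and the host picks one at runtime.
final class AsrRegistry {
    private final Map<String, AsrEngine> engines = new LinkedHashMap<>();

    void register(AsrEngine engine) { engines.put(engine.id(), engine); }

    // Prefers the user's choice; otherwise falls back to the first available engine.
    Optional<AsrEngine> pick(String preferredId) {
        AsrEngine preferred = engines.get(preferredId);
        if (preferred != null && preferred.isAvailable()) return Optional.of(preferred);
        return engines.values().stream().filter(AsrEngine::isAvailable).findFirst();
    }
}
```

With a registry like this, "multiple plugins of the same type" becomes a selection policy in the host rather than something each plugin has to negotiate.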
Difficulties
In Kime 1, I had already run into many difficulties implementing emoji and predictive text as plugins, and by the time I got to speech-to-text, the approach had become unsustainable.
Take ProGuard rules and R8 obfuscation as an example: if R8 isn’t enabled, the installation package becomes very large; if it is enabled, the plugins and the main app can no longer find each other’s classes because the class names have been obfuscated. Plugin permissions and dependency path conflicts likewise need to be strictly aligned between plugin and main app. All of this makes it impossible to keep plugin development simple, and if these concerns can’t be decoupled from the main app, a plugin architecture is pointless.
But I did learn one thing: to keep things simple, plugins shouldn’t bring in dependencies of their own and should rely only on the main app’s dependencies.
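One way to enforce "plugins depend only on the host" is to give the plugin’s class loader a filtering parent that exposes only an explicit allow-list of host packages. Because Java class loaders delegate to the parent first, plugin code then always links against the host’s copy of shared classes, while the plugin’s own classes are still found by its own loader when the parent refuses. `SharedApiClassLoader` and the package prefixes are hypothetical; this is a sketch of the idea, not what Kime does.

```java
import java.util.List;

// Intended as the *parent* of a plugin's class loader: it passes through JDK
// classes and allow-listed host packages, and refuses everything else, so the
// child loader falls back to the plugin's own classes for those names.
final class SharedApiClassLoader extends ClassLoader {
    private final List<String> allowedPrefixes;

    SharedApiClassLoader(ClassLoader host, List<String> allowedPrefixes) {
        super(host);
        this.allowedPrefixes = List.copyOf(allowedPrefixes);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // On Android the allow-list would also need e.g. "android." and "kotlin.".
        boolean allowed = name.startsWith("java.")
                || allowedPrefixes.stream().anyMatch(name::startsWith);
        if (!allowed) {
            throw new ClassNotFoundException(name + " is not part of the host's shared API");
        }
        return super.loadClass(name, resolve);
    }
}
```

The allow-list doubles as documentation of the plugin API surface, which is also the set of classes that must be kept un-obfuscated in the host.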
Abandonment
After much struggle, Kime 2 kept only the emoji/sticker plugin; predictive text and speech-to-text were both built into the main app.
Beyond the reasons above, predictive text and speech-to-text are quite complex to implement: they require not only Android development skills but also knowledge of model inference. If both the predictive text and speech-to-text plugins bundled onnxruntime, at different versions, how would the main app handle it? What about ProGuard? And how should the main app handle multiple plugins of the same type?
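The onnxruntime question can at least be detected mechanically: if each plugin declared the libraries it was built against, the host could refuse a combination that asks for one library at two versions (onnxruntime also ships a native library, which makes loading two copies side by side even harder). `PluginDeps` and `DependencyConflicts` are hypothetical names, and the declarations are assumed to be available as simple name-to-version maps.

```java
import java.util.*;

// Hypothetical declaration of what a plugin was built against: library -> version.
record PluginDeps(String pluginId, Map<String, String> versions) {}

final class DependencyConflicts {
    private DependencyConflicts() {}

    // Collects every version requested for each library by the host and the
    // plugins, then keeps only libraries requested at more than one version.
    static Map<String, Set<String>> find(Map<String, String> hostDeps, List<PluginDeps> plugins) {
        Map<String, Set<String>> requested = new TreeMap<>();
        hostDeps.forEach((lib, v) -> requested.computeIfAbsent(lib, k -> new TreeSet<>()).add(v));
        for (PluginDeps p : plugins) {
            p.versions().forEach((lib, v) -> requested.computeIfAbsent(lib, k -> new TreeSet<>()).add(v));
        }
        requested.values().removeIf(vs -> vs.size() < 2);
        return requested;
    }
}
```

Detection is the easy half, though: once a conflict is found, the host still has to pick a version and hope every plugin works with it, which is essentially the "rely only on the main app’s dependencies" rule again.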
Emoji plugins, by contrast, are ultimately just resource packages. They are disguised as APKs only to make installation and uninstallation convenient, and they involve no complex logic, which makes them very well suited to a plugin architecture.
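Because an emoji pack carries no code, "installing" it can reduce to copying files into the host’s own storage. The sketch below assumes the pack’s assets are already reachable as a directory (on Android they would come from the plugin package’s assets, for example through a package context) and treats `.png`, `.webp`, and `.gif` as stickers; `StickerPackInstaller` and that layout are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StickerPackInstaller {
    // Hypothetical set of file extensions treated as stickers.
    private static final Set<String> EXTENSIONS = Set.of(".png", ".webp", ".gif");

    private StickerPackInstaller() {}

    // Copies sticker files from packDir into hostDir/packId, preserving relative
    // paths, and returns the copied relative paths. No plugin code is loaded or run.
    static List<String> install(Path packDir, Path hostDir, String packId) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(packDir)) {
            sources = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Path target = hostDir.resolve(packId);
        List<String> copied = new ArrayList<>();
        for (Path src : sources) {
            String name = src.getFileName().toString().toLowerCase(Locale.ROOT);
            if (EXTENSIONS.stream().noneMatch(name::endsWith)) continue;
            Path rel = packDir.relativize(src);
            Path dest = target.resolve(rel);
            Files.createDirectories(dest.getParent());
            Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            copied.add(rel.toString().replace('\\', '/'));
        }
        return copied;
    }
}
```

Uninstalling is the mirror image (delete `hostDir/packId`), and none of the ClassLoader, obfuscation, or dependency problems above ever come into play.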