Napaxi Explained: Ant Group's Open-Source Mobile Agent SDK (2026)

Published May 2026 | 8 Min Read | Expert Review

"Ant Group open-sourced Napaxi, a mobile-native agent SDK that runs AI agents entirely on-device. Here's what it does, how it works, and why it matters in 2026."

On July 1, 2026, Ant Group dropped Napaxi on GitHub: a full-stack mobile agent SDK that runs AI agents entirely on the phone. Two days later it has 22 stars, 3 forks, and one of the most ambitious architecture docs I've read from a Chinese tech company in years.

This isn't another LLM wrapper. It's a Rust runtime kernel with sandboxed execution, a capability-based security model, three mobile SDK adapters (Flutter, Android, iOS), and a protocol for connecting agents across apps, messaging platforms, Bluetooth headsets, cars, and drones. The README shows an agent generating an Android app from a phone, compiling it into an APK, and installing it back to the device. All without a cloud build server, though the model calls themselves still go to a remote LLM API.

Why does this matter now? 2026 is the year mobile agents stopped being a demo and started being a product requirement. Apple's on-device models, Google's AICore, and the push for privacy-preserving AI all point in one direction: the phone is the next agent platform. And Napaxi is the first SDK purpose-built for that reality to come out of a major company as open source.

The questions this piece answers: what Napaxi actually is, how its architecture works, what xApp/xAgent/xChannel means in practice, how it relates to MCP and A2A, what it's good at, and what I think it gets wrong.

What Napaxi Actually Is (and What It Isn't)

Napaxi is an SDK you embed inside your mobile app. You don't install it. You don't chat with it. You add it to your Flutter, Android, or iOS app as a dependency, and it gives your app an agent runtime - sessions, workspaces, tools, skills, MCP support, and a model loop that can call LLMs, run tool invocations, and return results.

The parts that live on your phone:

  • Session management: persistent chat sessions with history, isolated per agent and channel
  • Workspace files: a sandboxed filesystem directory the agent can read and write
  • Skill registry: SKILL.md manifests for reusable agent behaviors, with a catalog provider for discovery
  • Tool runtime: 14+ built-in mobile tools plus host-defined custom tools, all gated by a capability admission policy
  • LLM routing: OpenAI, Anthropic, Gemini, and any OpenAI-compatible endpoint, configured by the host app
  • Memory engine: curated memory reads/writes, semantic search, and session recall
  • Background execution: scheduled automation jobs, platform wakeup hooks, and foreground handoff
  • MCP client: connect external MCP servers as tool sources
  • Channel system: connect IM platforms (QQ, WeChat, Feishu, Telegram), Bluetooth devices, and other surfaces

The parts that stay with your app: UI, account management, model API keys, platform permissions, and product policy. Napaxi is a runtime, not an app.

Here's what Napaxi is not: a chatbot app, a Claude competitor, a no-code agent builder, or a desktop coding assistant. The coding-in-the-phone demo is a capability showcase, not the product. If you're looking for a tool that writes code for you, this isn't it. If you're building an app that needs an embedded agent, this is.

Architecture: Rust Core, Thin Adapters, Real Sandbox

The repo layout tells the story before you read a single line of code:

napaxi/
  crates/
    core/             Rust runtime kernel, engine, tools, sessions, LLM routing
    features/         Domain crates: skills, evolution (memory review, rollback)
  packages/
    api_bridge/       Rust FFI/FRB bridge over core API
    api_contract/     Language-agnostic API contract: methods, errors, fixtures
    flutter/          Flutter SDK (Dart package: napaxi_flutter)
    android/          Native Android Kotlin adapter
    ios/              Native iOS Swift Package adapter
    agent_provider/   Provider-side SDK for cross-app Agent actions
  examples/
    flutter/          Integration demo app
  vendor/             Patched third-party libs (libsql-patched for mobile)

The dependency direction is strictly one-way: features -> core -> api_bridge -> SDK adapters -> examples, where each arrow reads "is consumed by". Feature crates don't depend on core. Adapters don't import Rust internals directly. Demo apps only call public SDK APIs. This is the kind of discipline you expect from a team that's shipped production mobile infrastructure before.
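To make the layering rule concrete, here is a minimal Python sketch of the kind of check a build script could run. It is not taken from the Napaxi repo: the layer names are shorthand for the directories above, `violations` is a hypothetical helper, and I'm assuming the rule is "import only from the layer directly beneath you", which is how I read the paragraph above.

```python
# Layer order from the article, lowest layer first. Names are shorthand,
# not real crate or package identifiers.
LAYERS = ["features", "core", "api_bridge", "adapters", "examples"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def violations(deps):
    """Return (dependent, dependency) pairs that break the one-way rule.

    `deps` maps a layer to the layers it imports. Assumption: a layer may
    import only the layer immediately below it.
    """
    bad = []
    for dependent, imported in deps.items():
        for dependency in imported:
            if RANK[dependency] != RANK[dependent] - 1:
                bad.append((dependent, dependency))
    return bad


clean = {
    "core": ["features"],
    "api_bridge": ["core"],
    "adapters": ["api_bridge"],
    "examples": ["adapters"],
}
print(violations(clean))                    # no violations in the clean graph
print(violations({"examples": ["core"]}))   # a demo app reaching past the SDK
```

The second call flags a demo app importing core directly, which is exactly the shortcut the layout is meant to rule out.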

The Core API Boundary

crates/core/src/api/ is the single entry point for all adapters. It exposes runtime semantics - engine handles, sessions, agents, workspaces, tools, skills, groups, MCP, channels, evolution - through adapter-neutral types. No Flutter, no Kotlin, no Swift. Just clean Rust types that become Dart models, Kotlin classes, or Swift structs through generated bridge code.

The API carries adapter-friendly DTOs under api::wire for JSON serialization. Generated bridge files (FRB for Flutter, JNI for Android, FFI for iOS) are never edited by hand. The build system enforces this with a hygiene check.

Capability System

Every feature - LLM providers, tools, MCP servers, platform tools, services - is a compiled capability with a stable ID, version, platform support, config schema, risk level, and activation mode. Current capability kinds: llm_provider, tool, platform_tool, mcp, policy, service, agent_engine.

Capabilities have three states:

  • Registered: the SDK binary contains the definition
  • Available: the current platform and host profile can satisfy it
  • Enabled: runtime config allows it to participate in execution

A Flutter app might declare napaxi.platform_tool.* in its host profile. Core checks that declaration against the capability registry and only exposes tools the host explicitly supports. An iOS app can add "disabled_capabilities": ["napaxi.platform_tool.install_apk"] and that tool simply won't exist in the agent's tool list.
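A rough way to picture the three states is as a funnel. The Python sketch below is my own model, not Napaxi's API: `Capability`, `capability_state`, and `agent_tool_ids` are hypothetical names, only the `napaxi.platform_tool.install_apk` ID comes from the docs, and the other IDs and all platform sets are invented for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Capability:
    id: str
    kind: str
    platforms: frozenset


# Illustrative registry. Only install_apk's ID appears in the article; the
# other IDs and every platform set here are made up for the example.
REGISTRY = [
    Capability("napaxi.platform_tool.install_apk", "platform_tool", frozenset({"android", "ios"})),
    Capability("napaxi.platform_tool.send_intent", "platform_tool", frozenset({"android"})),
    Capability("napaxi.tool.shell", "tool", frozenset({"android", "ios"})),
]


def capability_state(cap, platform, host_patterns, disabled):
    """Return the furthest state a registered capability reaches for this host."""
    declared = any(fnmatchcase(cap.id, pattern) for pattern in host_patterns)
    if platform not in cap.platforms or not declared:
        return "registered"  # compiled in, but this host cannot satisfy it
    if cap.id in disabled:
        return "available"   # satisfiable, but runtime config switches it off
    return "enabled"


def agent_tool_ids(platform, host_patterns, disabled):
    """IDs that would end up in the agent's tool list."""
    return [
        cap.id
        for cap in REGISTRY
        if capability_state(cap, platform, host_patterns, disabled) == "enabled"
    ]


host_profile = ["napaxi.platform_tool.*", "napaxi.tool.shell"]
disabled = {"napaxi.platform_tool.install_apk"}
print(agent_tool_ids("ios", host_profile, disabled))
```

With this made-up registry, the iOS host ends up with only the shell tool: install_apk is available but disabled, and the Android-only tool never gets past registered.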

Sandboxed Shell Execution

Mobile agents with shell access are terrifying if you think about it for three seconds. Napaxi's approach is the most thorough I've seen in any agent SDK:

  1. Hard gate: destructive and data-exfiltration commands (rm -rf /, mkfs, raw block-device access, fork bombs, netcat exfiltration) are rejected in every mode, including trusted_allow. The gate operates on a token stream, so echo "rm -rf /" passes (quoted text is data) but a bare rm -rf / doesn't, and variants padded with extra whitespace are caught as well.

  2. Known-safe allow-list: read-only commands run automatically. find is safe, find / -delete is not. git status is safe, git push --force is not. sed -n 1,5p is safe, sed -i is not.

  3. Approval posture: everything between hard-gated and known-safe is decided by ShellApprovalMode. Four options: read_only_only (strictest, prompt for everything), on_request (SDK default, prompt for non-safe commands), trusted_allow (run anything that clears the hard gate), and custom (deferred to a host-registered policy hook).

The demo app uses trusted_allow because its sandboxed workspace is the blast radius. Any production app using Napaxi should keep the default on_request.
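The three layers are easier to reason about in code. What follows is a deliberately small Python sketch of the idea, not Napaxi's Rust implementation: the rule tables are tiny, `hard_gated`, `known_safe`, and `decide` are names I made up, and only the on_request and trusted_allow modes are modeled. As a conservative stand-in for real operator parsing, any command containing shell control characters is rejected outright.

```python
import shlex

SAFE_COMMANDS = {"ls", "cat", "pwd", "echo"}   # illustrative, not Napaxi's list
CONTROL_CHARS = set(";|&$()<>`")


def hard_gated(tokens):
    """Layer 1: commands refused in every mode."""
    cmd, args = tokens[0], tokens[1:]
    if cmd.startswith("mkfs"):
        return True
    if cmd == "rm":
        short_flags = "".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--"))
        recursive = "r" in short_flags or "R" in short_flags or "--recursive" in args
        return recursive and "/" in args
    return False


def known_safe(tokens):
    """Layer 2: read-only invocations that can run without asking."""
    cmd, args = tokens[0], tokens[1:]
    if cmd == "find":
        return "-delete" not in args and "-exec" not in args
    if cmd == "git":
        return args[:1] == ["status"]
    if cmd == "sed":
        in_place = any(a.startswith("-i") or a.startswith("--in-place") for a in args)
        return "-n" in args and not in_place
    return cmd in SAFE_COMMANDS


def decide(command, mode="on_request"):
    """Return 'reject', 'run', or 'prompt' for a single simple command."""
    if CONTROL_CHARS & set(command):
        return "reject"  # sketch only: compound commands are out of scope
    try:
        tokens = shlex.split(command)  # quoted text stays inside one token
    except ValueError:
        return "reject"                # unbalanced quotes
    if not tokens or hard_gated(tokens):
        return "reject"
    if known_safe(tokens):
        return "run"
    return "run" if mode == "trusted_allow" else "prompt"  # layer 3


print(decide('echo "rm -rf /"'))
print(decide("rm   -rf   /", mode="trusted_allow"))
print(decide("git push --force"))
```

Because the gate looks at tokens rather than raw text, the quoted echo is treated as data while the padded rm is still refused, even under trusted_allow.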

Mobile Sandbox Environments

For Android, Napaxi runs shell commands inside a proot environment. For iOS, it uses iSH (a user-mode x86 emulator running Alpine Linux). These are packaged as native runtime assets in the SDK bundles. The third-party license situation here is non-trivial - the iSHCore component carries its own GPL/LGPL obligations layered on top of Napaxi's GPL-3.0. Before shipping an app with native runtime assets, you need to review THIRD-PARTY-LICENSES.md, which lists all the transitive obligations from the sandbox components.

xApp, xAgent, xChannel: The Connectivity Layer

The most ambitious part of Napaxi isn't the agent runtime. It's the connectivity model.

xApp: Cross-App Agent Actions

xApp lets an agent in your app trigger actions in another app. Not via URL schemes or Siri Intents - via a proper provider protocol with cryptographic trust.

The flow: the host app creates an ActionProposal (signed with HMAC-SHA256 when using protocol v2), dispatches it to a provider app via Android Intent or iOS URL handoff, the provider validates the proposal (checking caller package, signing certificate, expiry, nonce, and idempotency key), executes the action, and returns an ActionResult.

The security model is serious. Android install binding stores platform identity (package name, signing cert SHA256, activity name). Before every action handoff, the host re-reads the provider's signing certificate and rejects execution if the digest changed. iOS V1 uses foreground URL handoff with Universal Links, but the install binding stores ios_bundle_id, ios_team_id, host_instance_id, and a host_shared_secret for signing.
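Here is how I picture the provider-side checks, as a stdlib-only Python sketch. Treat every name as hypothetical: the field names (`action`, `expires_at`, `nonce`, `idempotency_key`), the JSON canonicalization, and the `ProviderValidator` class are my assumptions, not the protocol's actual wire format. The only things taken from the docs are the use of HMAC-SHA256 and the list of things a provider checks.

```python
import hashlib
import hmac
import json


def _canonical(fields):
    # Assumed canonical form: sorted keys, compact separators.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_proposal(fields, secret):
    """Host side: attach an HMAC-SHA256 signature over the proposal fields."""
    mac = hmac.new(secret, _canonical(fields), hashlib.sha256).hexdigest()
    return {**fields, "signature": mac}


class ProviderValidator:
    """Provider side: caller identity, signature, expiry, nonce, idempotency."""

    def __init__(self, secret, trusted_hosts):
        self.secret = secret
        self.trusted_hosts = trusted_hosts  # package name -> signing cert SHA-256
        self.seen_nonces = set()
        self.seen_idempotency_keys = set()

    def validate(self, proposal, caller_package, caller_cert_sha256, now):
        if self.trusted_hosts.get(caller_package) != caller_cert_sha256:
            return "untrusted caller"
        fields = {k: v for k, v in proposal.items() if k != "signature"}
        expected = hmac.new(self.secret, _canonical(fields), hashlib.sha256).hexdigest()
        supplied = str(proposal.get("signature", ""))
        if not hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8")):
            return "bad signature"
        if now >= fields.get("expires_at", 0):
            return "expired"
        if fields.get("nonce") in self.seen_nonces:
            return "replayed nonce"
        if fields.get("idempotency_key") in self.seen_idempotency_keys:
            return "duplicate action"
        self.seen_nonces.add(fields.get("nonce"))
        self.seen_idempotency_keys.add(fields.get("idempotency_key"))
        return "ok"


secret = b"host-shared-secret"
fields = {"action": "order_ride", "expires_at": 1000, "nonce": "n-1", "idempotency_key": "k-1"}
proposal = sign_proposal(fields, secret)
validator = ProviderValidator(secret, {"com.example.host": "ab12"})
print(validator.validate(proposal, "com.example.host", "ab12", now=900))
print(validator.validate(proposal, "com.example.host", "ab12", now=900))
```

The second call fails because the nonce has already been seen. A real provider would presumably return the stored result for a repeated idempotency key rather than an error; the sketch just refuses.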

A provider app ships its own lightweight SDK (packages/agent_provider/) that doesn't depend on the full Napaxi SDK. The idea: a ride-hailing app, a food delivery app, or a banking app can expose a handful of actions through the Agent Provider protocol without embedding the entire agent runtime.

xAgent: Multi-Agent Collaboration

xAgent supports multiple agents within one host app, each with isolated sessions, skills, and tool sets. Agents can collaborate through a group API. More interestingly, the local A2A implementation enables device-to-device agent collaboration over LAN.

The local A2A transport: Android uses NSD service type _napaxi-a2a._tcp. with newline-delimited JSON frames over TCP. iOS uses Bonjour NetService discovery with Network.framework TCP JSON-lines. Both devices exchange a user-visible pairing secret out of band, derive a shared secret, and from that point forward all peer messages are wrapped in AES-256-GCM with HMAC-SHA256 signatures.

This isn't a cloud agent mesh. It's designed for two phones in the same room, or a phone and a tablet on the same Wi-Fi, collaborating directly without a server. The encrypted payload envelope binds message identity fields as AEAD additional data. Unsigned messages are accepted only as untrusted input requiring user confirmation.
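The framing and signing half of that transport fits in a few lines of Python. This is a sketch under loud assumptions: the frame layout (`body`, `sig`) is invented, PBKDF2 stands in for whatever key derivation Napaxi actually uses (the docs I read don't say), and the AES-256-GCM layer is left out entirely because Python's standard library has no AEAD cipher.

```python
import hashlib
import hmac
import json


def derive_key(pairing_secret, salt):
    # Stand-in KDF; the article does not name the one Napaxi uses.
    return hashlib.pbkdf2_hmac("sha256", pairing_secret.encode("utf-8"), salt, 100_000)


def encode_frame(message, key=None):
    """One newline-terminated JSON frame, signed when a key is supplied."""
    body = json.dumps(message, sort_keys=True, separators=(",", ":"))
    frame = {"body": body}
    if key is not None:
        frame["sig"] = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return (json.dumps(frame) + "\n").encode("utf-8")


def decode_frames(buffer, key):
    """Split a TCP read buffer into (message, trust) pairs plus leftover bytes."""
    *lines, rest = buffer.split(b"\n")
    messages = []
    for line in lines:
        if not line:
            continue
        frame = json.loads(line)
        body, sig = frame["body"], frame.get("sig")
        if sig is None:
            trust = "untrusted"  # accepted, but needs user confirmation
        else:
            expected = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
            if not hmac.compare_digest(expected.encode("utf-8"), str(sig).encode("utf-8")):
                continue         # forged or corrupted: drop the frame
            trust = "signed"
        messages.append((json.loads(body), trust))
    return messages, rest


key = derive_key("482913", b"demo-salt")
stream = encode_frame({"task": "ping"}, key) + encode_frame({"task": "hello"})
first, leftover = decode_frames(stream[:-5], key)   # simulate a partial TCP read
second, leftover = decode_frames(leftover + stream[-5:], key)
print(first, second)
```

Returning the leftover bytes matters because TCP has no message boundaries: a read can end mid-frame, and the tail has to be carried into the next read.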

Future transports include BLE discovery and a host-provided relay for bridging devices on isolated networks. But V1 ships with LAN TCP as the primary local transport, and it's already more sophisticated than most mobile SDKs' networking.

xChannel: IM, Bluetooth, Cars, Drones

xChannel extends the agent's reach to messaging platforms and device peripherals. The contract: any channel submits normalized inbound envelopes, core resolves the route (exact peer/thread route first, then channel default, then bridge default agent), creates stable sessions, streams agent events, handles ask-human continuations, records history, and queues outbound replies.
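The routing order is simple enough to sketch. The Python below is my own illustration of "exact route, then channel default, then bridge default" plus a stable session key; the envelope fields (`channel`, `peer`, `thread`) and both function names are assumptions, not Napaxi's types.

```python
import hashlib


def resolve_agent(envelope, exact_routes, channel_defaults, bridge_default):
    """Pick an agent: exact peer/thread route, then channel default, then bridge default."""
    key = (envelope["channel"], envelope["peer"], envelope.get("thread"))
    if key in exact_routes:
        return exact_routes[key]
    if envelope["channel"] in channel_defaults:
        return channel_defaults[envelope["channel"]]
    return bridge_default


def session_id(envelope, agent_id):
    """Derive a stable ID so the same peer/thread always lands in the same session."""
    parts = [envelope["channel"], envelope["peer"], envelope.get("thread") or "", agent_id]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()[:16]


exact_routes = {("qq", "user-42", "thread-7"): "support_agent"}
channel_defaults = {"qq": "general_agent"}

inbound = {"channel": "qq", "peer": "user-42", "thread": "thread-7", "text": "hi"}
agent = resolve_agent(inbound, exact_routes, channel_defaults, "fallback_agent")
print(agent, session_id(inbound, agent))
```

Hashing the route fields is just one way to get a stable session key; the point is that the ID depends on who is talking and where, not on when the message arrived.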

V1 ships with:

  • A first-party QQ Bot channel provider (QqBotChannelProvider) in the Flutter SDK, with sans-IO protocol kits in Rust core for payload mapping, gateway state, and fallback classification
  • A Bluetooth audio-device channel provider that accepts host/STT transcripts and routes outbound replies to a TTS sink
  • Channel types for WeChat, Feishu, Lark, Telegram, WhatsApp, Slack, Discord, and SMS - adapters can register these through the same contract

Channel records store surface kind (im, device, app, system, custom), endpoint kind (direct, group, room, thread, broadcast, device), supported modalities (text, audio, image, file, control, sensor, presence), and content format (plain_text, markdown).

A drone channel and a car head-unit channel are both surfaced as surface_kind = "device" with appropriate modality declarations. Core doesn't know about drone SDKs or car APIs - it just knows how to route envelopes, maintain sessions, and apply policy. The transport lives in the host app.
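As a sketch of what such a record might look like, here is a small Python dataclass that validates against the value sets listed above. The value sets and `surface_kind` come from the docs; the class itself, the other field names, and the drone example values are hypothetical.

```python
from dataclasses import dataclass

SURFACE_KINDS = {"im", "device", "app", "system", "custom"}
ENDPOINT_KINDS = {"direct", "group", "room", "thread", "broadcast", "device"}
MODALITIES = {"text", "audio", "image", "file", "control", "sensor", "presence"}
CONTENT_FORMATS = {"plain_text", "markdown"}


@dataclass(frozen=True)
class ChannelRecord:
    channel_id: str
    surface_kind: str
    endpoint_kind: str
    modalities: frozenset
    content_format: str = "plain_text"

    def __post_init__(self):
        # Reject anything outside the documented value sets.
        if self.surface_kind not in SURFACE_KINDS:
            raise ValueError(f"unknown surface kind: {self.surface_kind}")
        if self.endpoint_kind not in ENDPOINT_KINDS:
            raise ValueError(f"unknown endpoint kind: {self.endpoint_kind}")
        unknown = set(self.modalities) - MODALITIES
        if unknown:
            raise ValueError(f"unknown modalities: {sorted(unknown)}")
        if self.content_format not in CONTENT_FORMATS:
            raise ValueError(f"unknown content format: {self.content_format}")


# Hypothetical drone channel: a device surface that speaks control and sensor data.
drone = ChannelRecord(
    channel_id="drone-1",
    surface_kind="device",
    endpoint_kind="device",
    modalities=frozenset({"control", "sensor"}),
)
print(drone)
```

Nothing in the record mentions a drone SDK, which is the point: the runtime only needs to know what kind of surface it is routing to and which modalities it accepts.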

Where Napaxi Fits: MCP, A2A, and the Agent Protocol Landscape

The agent protocol space in 2026 has three major players, and Napaxi intersects with all of them.

vs. Anthropic MCP

MCP (Model Context Protocol) defines how LLMs connect to external tools, resources, and prompts. It's a client-server protocol where the LLM host is the client and tool/data providers are servers. MCP uses JSON-RPC 2.0 over stdio or HTTP (originally HTTP+SSE, which newer revisions of the spec replace with Streamable HTTP).

Napaxi doesn't compete with MCP. It consumes MCP servers as a tool source. From Napaxi's perspective, an MCP server is just another capability (kind: "mcp") that feeds tool descriptors into the runtime's tool registry. Dynamic headers and tool discovery are scoped to the engine's files directory.

If you have an existing MCP server for database access, file operations, or API integration, you can connect it to a Napaxi agent alongside the 14+ built-in mobile tools. Napaxi adds the mobile-specific execution environment that MCP alone doesn't provide: sandboxed shell, platform tools, cross-app handoff, and background scheduling.
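For readers who haven't touched MCP, this is roughly what "feeding tool descriptors into a registry" involves. The `tools/list` method and the `name` / `description` / `inputSchema` fields are part of the MCP spec; everything else in this Python sketch (the registry shape, the `server.tool` naming scheme, the function names, the sample server) is my own assumption about how a runtime might store them, not Napaxi's code.

```python
import json


def tools_list_request(request_id):
    """Build the JSON-RPC 2.0 request an MCP client sends to discover tools."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def register_mcp_tools(registry, server_name, response_text):
    """Copy tool descriptors from a tools/list response into a local registry."""
    response = json.loads(response_text)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "MCP error"))
    for tool in response["result"]["tools"]:
        # Hypothetical naming scheme: prefix each tool with its server name.
        registry[f"{server_name}.{tool['name']}"] = {
            "kind": "mcp",
            "description": tool.get("description", ""),
            "input_schema": tool.get("inputSchema", {}),
        }
    return registry


# A made-up response from a made-up database server.
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "run_query",
                "description": "Run a read-only SQL query",
                "inputSchema": {"type": "object", "properties": {"sql": {"type": "string"}}},
            }
        ]
    },
})

registry = register_mcp_tools({}, "orders_db", sample_response)
print(tools_list_request(1))
print(sorted(registry))
```

Once the descriptors sit in the same registry as the built-in tools, the admission policy can treat them like any other capability.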

vs. Google A2A

The Agent2Agent (A2A) protocol, which Google introduced in 2025 and later handed to the Linux Foundation, focuses on cloud agent interoperability: standardized Agent Cards for discovery, task lifecycles, and streaming updates. Napaxi's A2A is designed for a different problem: two phones finding each other on a local network, verifying identities, and exchanging encrypted agent tasks without a server.

Napaxi's local A2A includes AES-256-GCM encryption, HMAC-SHA256 signatures, nonce/idempotency checks, and an auditable ledger. Google's A2A focuses on protocol-level interoperability at cloud scale. These are complementary specifications solving different layers of the agent communication stack - one for cloud mesh, one for peer-to-peer mobile.

vs. Google AI Edge / Android AICore

Google's on-device AI stack lets you run models locally: Gemini Nano through AICore, MediaPipe for vision/audio, custom models through TFLite. Napaxi doesn't bundle models. It calls remote LLMs.

The two are complementary: a Napaxi agent could route its LLM calls through an AICore-hosted local model instead of hitting OpenAI's API. Napaxi's provider capability system already supports routing through different LLM backends. Adding an napaxi.llm.aicore provider that wraps the Android AICore API would be a natural extension.
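To show how little the routing layer would need to change, here is a toy Python sketch of provider selection. Nothing here is Napaxi's API: the provider functions are stubs that return strings instead of calling a model, `napaxi.llm.openai_compatible` is an ID I invented, and `napaxi.llm.aicore` is the hypothetical provider proposed above.

```python
def openai_compatible(prompt):
    # Stub standing in for an HTTPS call to a remote, OpenAI-compatible endpoint.
    return f"[remote] {prompt}"


def aicore_local(prompt):
    # Stub standing in for a call into an on-device model via Android AICore.
    return f"[on-device] {prompt}"


PROVIDERS = {
    "napaxi.llm.openai_compatible": openai_compatible,  # invented ID
    "napaxi.llm.aicore": aicore_local,                  # hypothetical provider
}


def complete(prompt, config):
    """Route one completion to whichever provider the host config selects."""
    provider_id = config["llm_provider"]
    if provider_id not in config.get("enabled_capabilities", ()):
        raise PermissionError(f"{provider_id} is not enabled for this host")
    return PROVIDERS[provider_id](prompt)


config = {
    "llm_provider": "napaxi.llm.aicore",
    "enabled_capabilities": ["napaxi.llm.aicore"],
}
print(complete("summarize my notes", config))
```

The agent loop only ever calls `complete`, so swapping a remote endpoint for a local model becomes a host config change rather than a runtime change.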

Comparison Table

| Dimension | Napaxi | MCP-only Tools | Google A2A | Google AICore |
|---|---|---|---|---|
| Runs on-device | Yes (runtime) | Client only | No (cloud) | Yes (model) |
| Sandboxed execution | Yes (proot/iSH) | No | No | No |
| Mobile platform tools | 14+ built-in | None | None | None |
| Cross-app agent actions | Yes (signed) | No | No | No |
| Device-to-device agents | Yes (encrypted LAN) | No | Yes (cloud mesh) | No |
| IM/integration channels | Yes (QQ, BT, etc.) | No | No | No |
| License | GPL-3.0 | Varies | Varies | Proprietary |
| Mobile SDK adapters | Flutter + Android + iOS | None | None | Android only |

Honest Assessment: What Napaxi Gets Right

The architecture is genuinely good. The Rust core with thin adapters, the capability system with registered/available/enabled state management, the one-way dependency direction, and the strict boundary between runtime policy (core) and host integration (adapters) - this is how mobile infrastructure should be built. The team clearly learned from building production mobile platforms.

The security model is unusually thorough for an open-source agent SDK. Shell command safety has three layers with a tokenizer-based hard gate. Agent app actions have cryptographic trust with signing certificate verification. Local A2A messages are AES-256-GCM encrypted with HMAC-SHA256 signatures. Policy hooks are process-global stateless predicates with per-engine admission traces. This is defense-in-depth thinking applied to mobile agents.

The connectivity model is the real innovation. xApp (cross-app agent actions with cryptographically verified providers), xAgent (multi-agent collaboration with encrypted local peer discovery), and xChannel (IM platforms, Bluetooth devices, cars, drones through a unified envelope/session contract) - this is a coherent vision for how agents operate across the mobile surface area. No other SDK has this.

The Flutter-first approach is strategic. Android and iOS adapters exist, but Flutter is the first complete adapter with the demo app. Ant Group has a massive Flutter investment (Alipay's Flutter usage is well-documented), and Napaxi reflects that reality. For teams already on Flutter, this drops in with minimal friction.

They shipped working code, not a whitepaper. The repo has a demo app that exercises the full API surface, GIFs showing on-device development workflows, and CI-verified build scripts. At 8,200 files and growing, this was clearly built internally before release, not thrown together for PR value.

What Napaxi Gets Wrong

The GPL-3.0 license is a real adoption barrier. For an SDK that you embed inside your app, GPL is unusually restrictive. The README acknowledges this implicitly by noting that third-party native components add their own GPL/LGPL obligations on top. If you're building a proprietary mobile app, you can't just add napaxi_flutter to your pubspec.yaml and ship it. You need to either isolate the runtime behind a process boundary (messy on mobile), open-source your entire app (unlikely for most companies), or negotiate a commercial license with Ant Group (no public pricing, no self-serve option).

There's no on-device model support. Napaxi calls remote LLMs. For a "mobile-native" SDK released in July 2026, when Google AICore ships on Pixel and Samsung devices and Apple Intelligence runs on iPhone 15 Pro+, the lack of a local inference path feels like a gap. The capability system could support it, and the architecture doesn't prevent it, but V1 doesn't ship with it.

The learning curve is steep. This is not a weekend project. You need Rust toolchain setup, mobile build targets, Git LFS for runtime assets, Android NDK/Xcode, and an understanding of the capability model before you get a working agent. The Flutter demo app is well-structured, but the "quick start" involves building native artifacts for two platforms before running anything.

iOS limitations are real and under-communicated. The iOS shell sandbox uses iSH (a user-mode x86 emulator), which is clever but comes with performance overhead and GPL license baggage. iOS automation jobs are "best-effort unless a host-controlled foreground handoff or push path wakes the app." iOS cross-app (xApp) handoff is foreground URL-based, not a true background dispatch. These are Apple platform constraints, not Napaxi bugs, but teams evaluating the SDK should understand the Android/iOS asymmetry before committing.

The dependency on external LLM providers creates latency and privacy tension. The runtime is on-device, but every agent turn hits a remote API. For use cases where latency matters (real-time conversation, device control) or privacy is paramount (health data, financial information), this is a meaningful constraint. A local inference capability would make Napaxi's "pure on-device" pitch more honest.

Only 22 stars on day two. This is barely a fair criticism for a repo that's existed for 48 hours, but it matters for one reason: community. An SDK this complex needs contributors, bug reports, third-party providers, and a plugin ecosystem to fulfill its ambition. Ant Group open-sourcing it is necessary but not sufficient. The real test is whether developers outside Ant build on it.

Who Should Care (and Who Should Wait)

Use Napaxi Now If:

  • You're building a mobile app in Flutter and need an embedded agent runtime
  • You work at a company with existing Ant/Alibaba relationships (commercial licensing is a phone call away)
  • You're building an open-source mobile app where GPL-3.0 isn't a blocker
  • You need cross-app agent actions with cryptographic trust on Android
  • You're researching mobile agent architectures and want a production-grade reference implementation
  • You want to build agents that span IM platforms, Bluetooth devices, or IoT peripherals with a unified runtime

Wait on Napaxi If:

  • You're building a proprietary app and need a permissive license (MIT/Apache 2.0)
  • You need on-device model inference without hitting a remote API
  • Your app is iOS-only and needs reliable background agent execution
  • You're looking for a turnkey solution rather than an SDK you need to integrate
  • You need a large community, extensive third-party providers, and Stack Overflow answers

The Pricing Reality

Napaxi is free as in GPL-3.0. That means free to use, modify, and distribute under the same license. For commercial apps that can't adopt GPL, the cost is whatever Ant Group negotiates. No public pricing exists. In my experience with commercially backed open-source infrastructure from Chinese tech companies (TiDB, OceanBase), enterprise licenses typically start in the five figures annually and scale with deployment size.

The real cost of Napaxi for most teams isn't the license fee. It's the integration effort. You're adopting a Rust runtime with three mobile SDK surfaces, a capability system, and a security model that touches every part of your app. Budget weeks to months, not hours to days.

Final Verdict

Napaxi is the most technically ambitious mobile agent SDK to come out of a major Chinese tech company, and its release under an open-source license (even GPL) is a strong signal about where Ant Group thinks mobile AI is heading. The architecture is production-grade, the security model is unusually thorough, and the connectivity layer (xApp/xAgent/xChannel) is genuinely novel.

The biggest question isn't whether Napaxi is well-built. It is. The question is whether GPL-3.0 and the integration complexity will limit its adoption to Ant Group's ecosystem - Alipay, Alibaba Cloud, and existing partners who can get a commercial license with a phone call. If Napaxi becomes the mobile agent runtime inside Alipay's "A Bao" agent (which Ant is already testing internally for ordering food, hailing rides, and buying funds on-device), its impact on Chinese mobile infrastructure will be enormous regardless of GitHub stars.

For the rest of the world, Napaxi is a reference architecture first and a usable SDK second. Read the capability model. Study the shell safety approach. Borrow the xApp provider protocol pattern. Even if you never add napaxi_flutter to your pubspec.yaml, the design decisions in this repo represent some of the best thinking available on mobile agent security and cross-surface connectivity.

If Ant Group ships an Apache 2.0-licensed subset, adds on-device model routing, and ships reference providers for major IM platforms, Napaxi could become the standard mobile agent runtime. Until then, it's an impressive piece of engineering with a licensing model that limits its audience to those who can afford a conversation with Ant's sales team.

I'll be watching the adoption curve. If it hits 1,000 stars by September and third-party Flutter packages start appearing on pub.dev, that's the signal that Napaxi has legs beyond Ant's walls. If it stays at a few hundred stars with mostly internal contributors, it'll be another well-engineered Chinese open-source project that the rest of the world reads but doesn't run.
