cap_im_platform — IM platform integration
Entry point: components/claw_capabilities/cap_im_platform/src/cap_im_platform.c · header: components/claw_capabilities/cap_im_platform/include/cap_im_platform.h · Skill: components/claw_capabilities/cap_im_platform/skills/cap_im_platform/SKILL.md
cap_im_platform is the unified source component for ESP-Claw IM integrations. It bundles the Feishu, QQ, Telegram, and WeChat backends, plus the shared attachment helpers, into one ESP-IDF component while keeping the runtime surface split by platform.
This means the build dependency is unified, but the existing runtime group ids and tool names remain stable:
| Runtime group | Event source | Text | Image | File |
|---|---|---|---|---|
| cap_im_feishu | feishu_gateway | feishu_send_message | feishu_send_image | feishu_send_file |
| cap_im_qq | qq_gateway | qq_send_message | qq_send_image | qq_send_file |
| cap_im_tg | tg_gateway | tg_send_message | tg_send_image | tg_send_file |
| cap_im_wechat | wechat_gateway | wechat_send_message | wechat_send_image | Not supported |
| cap_im_local | local_gateway | local_send_message | Not supported | Not supported |
Component layout
The platform component keeps each backend in its own source file so protocol-specific logic stays isolated:
| Source (src/) | Responsibility |
|---|---|
| cap_im_platform.c | Registers all enabled IM runtime groups. |
| cap_im_feishu.c | Feishu WebSocket/Event API ingress, rich text flattening, and sends. |
| cap_im_qq.c | QQ Bot WebSocket ingress, token handling, and sends. |
| cap_im_tg.c | Telegram long-poll ingress, attachment download queue, and sends. |
| cap_im_wechat.c | WeChat ClawBot polling, QR login state, and sends. |
| cap_im_attachment.c | Shared local attachment path helpers. |
Runtime model
Each backend follows the same split:
- Event source: receive messages from the IM platform, normalize them, and publish claw_event_router events.
- Callable tools: expose platform-specific send functions so the Agent, Console, or automation can send text or media.
- Attachment handling: save inbound media under the configured inbox root and publish attachment_saved events for downstream rules.
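The attachment-handling step above depends on turning a sender-supplied file name into a safe local path. The sketch below is a hypothetical stand-in for the helpers in cap_im_attachment.c, whose real function names and directory layout are not shown on this page. It assumes a `<inbox_root>/<platform>/<file_name>` layout and rejects names that could escape that directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the real cap_im_attachment API): builds
 * "<inbox_root>/<platform>/<file_name>" into out. Returns false if the
 * name is unsafe or the result does not fit in out. */
static bool im_attachment_build_path(char *out, size_t out_len,
                                     const char *inbox_root,
                                     const char *platform,
                                     const char *file_name)
{
    char safe_name[64];
    size_t i = 0;
    /* Copy the sender-supplied name, replacing path separators with '_'. */
    for (; file_name[i] != '\0' && i < sizeof(safe_name) - 1; i++) {
        char c = file_name[i];
        safe_name[i] = (c == '/' || c == '\\') ? '_' : c;
    }
    safe_name[i] = '\0';
    if (file_name[i] != '\0') {
        return false; /* name too long for the local buffer */
    }
    /* Reject empty, "." and ".." so the path stays inside the platform dir. */
    if (i == 0 || strcmp(safe_name, ".") == 0 || strcmp(safe_name, "..") == 0) {
        return false;
    }
    int n = snprintf(out, out_len, "%s/%s/%s", inbox_root, platform, safe_name);
    return n >= 0 && (size_t)n < out_len;
}
```

Replacing separators rather than rejecting the name keeps ordinary uploads working while still blocking traversal. A production helper would likely also add a unique prefix so two inbound files with the same name do not overwrite each other.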
Startup and visibility
Application startup prepares credentials and attachment settings per enabled platform, then registers the matching runtime groups. The edge_agent app binds outbound Event Router channels such as qq, feishu, telegram, wechat, and web to the corresponding send tools.
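The channel binding described above can be pictured as a small lookup table. This is a hypothetical sketch: the channel names and tool IDs come from this page (web maps to local_send_message from the cap_im_local group), but the struct and function names are invented for illustration and do not reflect the real edge_agent code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical binding entry: outbound Event Router channel -> text send tool. */
typedef struct {
    const char *channel;
    const char *text_tool;
} channel_binding_t;

/* Channel and tool names are taken from this page; the table is illustrative. */
static const channel_binding_t k_bindings[] = {
    { "qq",       "qq_send_message" },
    { "feishu",   "feishu_send_message" },
    { "telegram", "tg_send_message" },
    { "wechat",   "wechat_send_message" },
    { "web",      "local_send_message" },
};

/* Returns the text send tool bound to a channel, or NULL if none is bound. */
static const char *binding_text_tool(const char *channel)
{
    for (size_t i = 0; i < sizeof(k_bindings) / sizeof(k_bindings[0]); i++) {
        if (strcmp(k_bindings[i].channel, channel) == 0) {
            return k_bindings[i].text_tool;
        }
    }
    return NULL;
}
```

An explicit table is useful because channel names do not always match runtime group prefixes (telegram versus tg, web versus local).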
The unified Skill declares all four IM runtime groups in cap_groups. Activating the Skill gives the model the platform-specific tools together with the guidance for choosing the current channel and avoiding duplicate replies.
Platform differences
| Platform | Inbound model | Chat target | Notes |
|---|---|---|---|
| Feishu | WebSocket/Event API | Feishu chat_id, or user open_id beginning with ou_ | Text sends prefer Markdown-capable interactive cards with plain-text fallback. Media captions are sent as follow-up text. |
| QQ | Bot WebSocket API | c2c:<openid> or group:<group_openid> | File delivery depends on QQ platform support; image and generic file paths are separate tool calls. |
| Telegram | Bot API long polling | Numeric chat id such as 123456789 or -100... | Long text is chunked and files are uploaded with multipart streaming. |
| WeChat | ClawBot polling API | Concrete room id or contact id | Text and image sends are supported; generic non-image file send is not available. |
| Web (local) | HTTP/WebSocket (Web frontend) | web channel | Separate cap_im_local component for the built-in IM chat UI on the Web config page; supports text sending. |
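The Telegram row notes that long text is chunked. A byte budget is a simple, conservative way to do this: Telegram's sendMessage accepts up to 4096 characters, and a chunk of at most 4096 UTF-8 bytes cannot contain more characters than that. The cut point only has to avoid splitting a multi-byte character. This is a minimal sketch under that assumption; the chunk size and any line-break preference used by cap_im_tg are not documented here.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte length of the next chunk of text (len bytes) that fits in
 * max_bytes without cutting a UTF-8 sequence in half. */
static size_t next_chunk_len(const char *text, size_t len, size_t max_bytes)
{
    /* A 4-byte sequence must always fit, so progress is guaranteed. */
    assert(max_bytes >= 4);
    if (len <= max_bytes) {
        return len;
    }
    size_t cut = max_bytes;
    /* If text[cut] is a continuation byte (10xxxxxx), step back to the lead
     * byte so the whole character moves into the next chunk. */
    while (cut > 0 && ((unsigned char)text[cut] & 0xC0) == 0x80) {
        cut--;
    }
    return cut;
}
```

A caller would loop, sending bytes `[off, off + n)` and advancing `off` by `n` until the message is exhausted.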
Telegram as a reference backend
Telegram remains a useful representative implementation because it shows the full pattern in a compact backend: long-poll ingress, deduplication, async attachment downloads, and callable text/media sends.
Event source: long polling
The Telegram backend starts two FreeRTOS tasks, tg_poll_task and tg_attachment_task, from the cap_im_tg group start hook.
tg_poll_task calls getUpdates with a 20 s long-poll timeout, parses each update, and publishes a normalized event for it.
claw_event_router then routes the event to claw_core for the Agent or to automation actions.
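Long polling only avoids re-delivery if each getUpdates call confirms what was already received. In the Telegram Bot API this is done with the offset parameter, which should be one greater than the highest update_id seen so far. The sketch below shows that bookkeeping plus the query string for the 20 s timeout; the function names are hypothetical and the real tg_poll_task may structure this differently.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: next getUpdates offset, one past the highest update_id
 * in this batch. Telegram treats updates below the offset as confirmed. */
static int64_t tg_next_offset(int64_t current_offset,
                              const int64_t *update_ids, size_t count)
{
    int64_t next = current_offset;
    for (size_t i = 0; i < count; i++) {
        if (update_ids[i] >= next) {
            next = update_ids[i] + 1;
        }
    }
    return next;
}

/* Hypothetical helper: formats "offset=<n>&timeout=<s>" (both are Bot API
 * getUpdates parameters). Returns false if out is too small. */
static bool tg_format_updates_query(char *out, size_t out_len,
                                    int64_t offset, int timeout_s)
{
    int n = snprintf(out, out_len, "offset=%" PRId64 "&timeout=%d",
                     offset, timeout_s);
    return n >= 0 && (size_t)n < out_len;
}
```

The HTTP client's own receive timeout presumably needs to be longer than the 20 s long-poll timeout; otherwise every idle poll would surface as a transport error.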
Dedup cache
Network jitter can replay updates, so cap_im_tg keeps a ring of FNV-1a 64-bit hashes of handled messages and skips any message whose hash is already present.
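A minimal sketch of such a dedup ring, assuming the key is a string built from the chat id and message id (the exact fields cap_im_tg hashes are not stated here) and assuming a capacity of 32 entries. Lookups are a linear scan, which is cheap at this size, and the oldest hash is overwritten once the ring is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEDUP_RING_SIZE 32 /* hypothetical capacity */

typedef struct {
    uint64_t hashes[DEDUP_RING_SIZE];
    size_t next;  /* slot overwritten by the next insert */
    size_t count; /* number of valid slots */
} dedup_ring_t;

/* 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime. */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL; /* FNV 64-bit prime */
    }
    return h;
}

/* Returns true if key was seen recently; otherwise records it and returns false. */
static bool dedup_check_and_insert(dedup_ring_t *ring, const char *key)
{
    uint64_t h = fnv1a64(key, strlen(key));
    for (size_t i = 0; i < ring->count; i++) {
        if (ring->hashes[i] == h) {
            return true;
        }
    }
    ring->hashes[ring->next] = h;
    ring->next = (ring->next + 1) % DEDUP_RING_SIZE;
    if (ring->count < DEDUP_RING_SIZE) {
        ring->count++;
    }
    return false;
}
```

Storing 8-byte hashes instead of full keys keeps this cache at a fixed 256 bytes. An accidental 64-bit collision would drop one message, which is usually an acceptable trade on a microcontroller.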
Attachments
Media download is slow, so the Telegram backend handles it asynchronously:
- tg_poll_task enqueues cap_im_tg_attachment_job_t items into a queue.
- tg_attachment_task consumes jobs, calls getFile, and streams the payload into FATFS.
- On completion, it publishes attachment_saved with the local path, MIME type, size, and platform metadata.
Downstream rules can listen for attachment_saved and chain cap_llm_inspect, file operations, or custom automation.
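The poll/download split above is a producer/consumer queue: the poll task should never wait on a slow download. On FreeRTOS this would normally be a queue created with xQueueCreate, fed with xQueueSend using a zero or short timeout, and drained with xQueueReceive. The portable, single-threaded sketch below shows only the bounded-FIFO logic; the job fields and queue depth are hypothetical and do not reflect the real cap_im_tg_attachment_job_t layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job layout; the real cap_im_tg_attachment_job_t is not shown here. */
typedef struct {
    char file_id[64]; /* Telegram file_id later passed to getFile */
    char chat_id[24];
    char mime[32];
} attachment_job_t;

#define JOB_QUEUE_LEN 4 /* hypothetical depth */

typedef struct {
    attachment_job_t items[JOB_QUEUE_LEN];
    size_t head; /* index of the oldest queued job */
    size_t len;  /* number of queued jobs */
} job_queue_t;

/* Producer side (poll task): never blocks; returns false when the queue is full. */
static bool job_enqueue(job_queue_t *q, const attachment_job_t *job)
{
    if (q->len == JOB_QUEUE_LEN) {
        return false;
    }
    q->items[(q->head + q->len) % JOB_QUEUE_LEN] = *job;
    q->len++;
    return true;
}

/* Consumer side (attachment task): returns false when the queue is empty. */
static bool job_dequeue(job_queue_t *q, attachment_job_t *out)
{
    if (q->len == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % JOB_QUEUE_LEN;
    q->len--;
    return true;
}
```

A full queue is reported to the producer instead of blocking it, so a burst of media can at worst delay or drop attachments without stalling text ingress.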
Callable tools
The cap_im_tg runtime group registers four descriptors:
| Tool ID | Description | Kind |
|---|---|---|
| tg_gateway | Poll gateway (event source) | EVENT_SOURCE |
| tg_send_message | Send text to a chat_id | CALLABLE |
| tg_send_image | Send a local image file | CALLABLE |
| tg_send_file | Send an arbitrary local file | CALLABLE |
For tg_send_message, chat_id falls back to the chat in the current call context when it is omitted.
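A hypothetical sketch of that fallback. The call-context struct and its fields are assumptions, since the real context type is not shown on this page, and the channel check is a defensive design choice rather than documented behavior: it keeps a QQ or Feishu chat target from being passed to Telegram.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the call context. */
typedef struct {
    const char *source_channel; /* e.g. "telegram" */
    const char *chat_id;        /* chat that triggered the current turn */
} call_ctx_t;

/* Prefer an explicit, non-empty chat_id argument. Otherwise fall back to the
 * context chat, but only if the context came from Telegram. NULL = unresolved. */
static const char *tg_resolve_chat_id(const char *arg_chat_id, const call_ctx_t *ctx)
{
    if (arg_chat_id != NULL && arg_chat_id[0] != '\0') {
        return arg_chat_id;
    }
    if (ctx != NULL && ctx->chat_id != NULL && ctx->chat_id[0] != '\0' &&
        ctx->source_channel != NULL &&
        strcmp(ctx->source_channel, "telegram") == 0) {
        return ctx->chat_id;
    }
    return NULL;
}
```

Returning NULL instead of guessing lets the tool fail with a clear "chat_id required" error when it is called outside a Telegram conversation.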
Long text is chunked to fit Telegram’s message limits. tg_send_image and tg_send_file upload via multipart/form-data, using stat() for exact Content-Length and streaming parts through esp_http_client_open instead of buffering the whole file in RAM.
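An exact Content-Length for a streamed multipart body requires the part headers and trailer to be fixed strings that can be measured before any file bytes are sent. The sketch below builds those pieces and adds the file size from stat(). It assumes a chat_id field followed by a single file field (photo for sendPhoto, document for sendDocument in the Bot API) and a caller-chosen boundary; the function names are hypothetical. The total would then be passed to esp_http_client_open(client, write_len), followed by esp_http_client_write calls for the head, the file in chunks, and the tail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: part headers sent before the file bytes.
 * Returns the length written, or -1 if out is too small. */
static int multipart_head(char *out, size_t out_len, const char *boundary,
                          const char *chat_id, const char *field,
                          const char *file_name, const char *mime)
{
    int n = snprintf(out, out_len,
                     "--%s\r\n"
                     "Content-Disposition: form-data; name=\"chat_id\"\r\n\r\n"
                     "%s\r\n"
                     "--%s\r\n"
                     "Content-Disposition: form-data; name=\"%s\"; filename=\"%s\"\r\n"
                     "Content-Type: %s\r\n\r\n",
                     boundary, chat_id, boundary, field, file_name, mime);
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}

/* Hypothetical helper: closing delimiter sent after the file bytes. */
static int multipart_tail(char *out, size_t out_len, const char *boundary)
{
    int n = snprintf(out, out_len, "\r\n--%s--\r\n", boundary);
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}

/* Exact body size: head + file size from stat() + tail. Returns -1 on error. */
static long long multipart_content_length(const char *path, int head_len, int tail_len)
{
    struct stat st;
    if (head_len < 0 || tail_len < 0 || stat(path, &st) != 0) {
        return -1;
    }
    return (long long)head_len + (long long)st.st_size + tail_len;
}
```

The sketch does not escape quotes in file_name; a real implementation should sanitize it before placing it in the Content-Disposition header.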
Configuration API
Application code configures the Telegram backend through the cap_im_tg_* API exported by cap_im_platform.
The same architectural roles are used by the other backends, with platform-specific authentication, message formats, and media APIs.