cap_llm_inspect — image analysis
Source: cap_llm_inspect.c · header: cap_llm_inspect.h
cap_llm_inspect is the LLM-interaction reference capability. It demonstrates a special pattern: while executing a tool for the LLM, it starts another LLM inference (nested call) to analyze a local image file.
Subtasks that suit this nested-call pattern include:
- Image understanding (this module)
- Text summarization
- Code explanation
Tool surface
cap_llm_inspect registers a single Callable, inspect_image:
| Tool ID | inspect_image |
|---|---|
| Description | Analyze a local image from an absolute path. Confirm the path first, then provide a prompt describing what to inspect. |
| Input | { "path": "<absolute path>", "prompt": "<what to inspect>" } |
| Output | Textual analysis from the LLM |
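The input contract in the table can be checked before any nested call is made. The following is a minimal sketch, not the code in cap_llm_inspect.c: the helper name inspect_args_check, the error messages, and the POSIX-style test for an absolute path (a leading '/') are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from cap_llm_inspect.c): validates the two input
 * fields of inspect_image. On failure, writes a short reason into `err`. */
static bool inspect_args_check(const char *path, const char *prompt,
                               char *err, size_t err_len)
{
    assert(err != NULL && err_len > 0);
    err[0] = '\0';

    /* Assumption: "absolute" means a POSIX-style path starting with '/'. */
    if (path == NULL || path[0] != '/') {
        snprintf(err, err_len, "path must be an absolute path");
        return false;
    }
    /* The prompt tells the nested inference what to look for. */
    if (prompt == NULL || prompt[0] == '\0') {
        snprintf(err, err_len, "prompt must not be empty");
        return false;
    }
    return true;
}
```

Rejecting a relative path or an empty prompt at this stage is cheap compared with spending a nested inference on input that cannot succeed.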
Typical flow:
Core idea: nested LLM inference
The implementation is small (~110 lines). Core flow:
Unlike the main Agent chat
claw_core_llm_infer_media is a standalone LLM invocation, isolated from the user-facing Agent session:
- Uses its own system_prompt (image-focused, no chat history)
- Does not consume the current session's token budget the way normal turns do
- Does not go through cap_skill tool-visibility management
- Requires multimodal support from the configured backend
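The isolation described above can be sketched as a request that carries only a fixed system prompt, the caller's prompt, and the image path, with no chat history attached. This is an illustration under stated assumptions: the real signature of claw_core_llm_infer_media is not shown on this page, so the sketch uses a hypothetical nested_request struct, a made-up system prompt, and a stand-in backend named mock_infer_media.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical request shape; NOT the real claw_core_llm_infer_media API. */
typedef struct {
    const char *system_prompt; /* fixed specialist prompt */
    const char *user_prompt;   /* "what to inspect", from the tool input */
    const char *media_path;    /* absolute path to the local image */
    size_t      history_len;   /* always 0: no chat history is carried over */
} nested_request;

/* Illustrative prompt text, assumed for this sketch. */
static const char *const INSPECT_SYSTEM_PROMPT =
    "You are an image analysis specialist. Describe only what is visible.";

static nested_request nested_request_build(const char *path, const char *prompt)
{
    assert(path != NULL && prompt != NULL);
    nested_request req = { INSPECT_SYSTEM_PROMPT, prompt, path, 0 };
    return req;
}

/* Backend entry point type; a real build would point this at the core. */
typedef int (*infer_media_fn)(const nested_request *req,
                              char *out, size_t out_len);

/* Stand-in backend so the sketch runs without a model. */
static int mock_infer_media(const nested_request *req,
                            char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "analysis of %s", req->media_path);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Tool handler body: build an isolated request, run one nested inference. */
static int inspect_image_run(infer_media_fn infer, const char *path,
                             const char *prompt, char *out, size_t out_len)
{
    nested_request req = nested_request_build(path, prompt);
    return infer(&req, out, out_len);
}
```

Passing the backend as a function pointer is a choice made for this sketch only; it keeps the example testable without assuming anything about how the real module links against the core.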
Error handling
If multimodal support is unavailable or the path is invalid, the error string is returned in output to the caller (LLM or automation).
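One way to picture this behavior is a pre-check that writes a readable error into the same output buffer the analysis would otherwise occupy. The sketch below is an assumption-laden illustration: the function name inspect_image_precheck, the backend_is_multimodal flag, and the error wording are hypothetical and are not taken from cap_llm_inspect.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical pre-check: returns true when a nested call may proceed.
 * On failure, the error string is placed in `output`, so the caller
 * (LLM or automation) receives it as the tool result. */
static bool inspect_image_precheck(bool backend_is_multimodal,
                                   const char *path,
                                   char *output, size_t output_len)
{
    assert(output != NULL && output_len > 0);

    if (!backend_is_multimodal) {
        snprintf(output, output_len,
                 "error: configured backend does not support image input");
        return false;
    }

    /* Opening the file for reading is a simple readability test. */
    FILE *f = (path != NULL) ? fopen(path, "rb") : NULL;
    if (f == NULL) {
        snprintf(output, output_len, "error: cannot open image at '%s'",
                 path != NULL ? path : "(null)");
        return false;
    }
    fclose(f);

    output[0] = '\0'; /* no error; the analysis text will be written later */
    return true;
}
```

Because the error travels in output as ordinary text, the calling LLM can read it and react, for example by correcting the path and retrying.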
Design takeaways
cap_llm_inspect highlights several choices:
- Dedicated system prompt: image work should not be biased by conversational history; a fixed specialist prompt stabilizes tone.
- “Confirm the path first” in the description: the LLM should use cap_files list_dir before burning a nested call on a wrong path.
- Stateless: no retained state; each invocation is its own mini-task.
- Separation of concerns: download lives in cap_im_*, paths in cap_files, analysis here.
Extensions
The same pattern could power:
- inspect_audio (needs audio-capable models)
- summarize_document (text inference entrypoint)
- classify_image (different fixed system prompt)
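Since these extensions differ mainly in their fixed system prompt, one possible arrangement is a small lookup table keyed by tool ID. This is a sketch only: apart from inspect_image, none of these tools exist in the module described here, and the struct, the function name, and every prompt string are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a tool ID with its fixed specialist prompt. */
typedef struct {
    const char *tool_id;
    const char *system_prompt;
} specialist_tool;

/* Illustrative prompt texts; none are taken from the real source. */
static const specialist_tool SPECIALISTS[] = {
    { "inspect_image",      "Describe what is visible in the image." },
    { "classify_image",     "Answer with a single category label." },
    { "summarize_document", "Summarize the document in a few sentences." },
    { "inspect_audio",      "Describe what can be heard in the audio." },
};

/* Returns the fixed system prompt for a tool, or NULL if it is unknown. */
static const char *specialist_prompt_for(const char *tool_id)
{
    assert(tool_id != NULL);
    for (size_t i = 0; i < sizeof SPECIALISTS / sizeof SPECIALISTS[0]; ++i) {
        if (strcmp(SPECIALISTS[i].tool_id, tool_id) == 0) {
            return SPECIALISTS[i].system_prompt;
        }
    }
    return NULL;
}
```

With a table like this, adding a new specialist tool comes down to adding one row, while the nested-call plumbing stays shared.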