add-image-vision

Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.


#image #vision #multimodal #whatsapp #claude

⌘source

author: @qwibitai
repo: qwibitai/nanoclaw
language: TypeScript

✦overview.md

Key Features

·Processes WhatsApp image attachments
·Resizes images using the sharp library
·Converts images to base64-encoded content blocks
·Passes images to Claude as multimodal input
·Requires the WhatsApp skill to be installed first

Use Cases

→Enabling agents to analyze images sent via WhatsApp messages
→Processing visual content from WhatsApp for multimodal AI responses
→Adding image understanding capabilities to existing WhatsApp agent workflows

Best for

✓Teams using WhatsApp as a communication channel
✓Projects requiring image analysis in agent workflows

Not ideal for

!Non-WhatsApp communication channels
!Text-only agent applications

FAQs

.claude/skills/add-image-vision/SKILL.md

name: add-image-vision
description: Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks.

Image Vision Skill

Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks.
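As a rough illustration of the final step described above, the sketch below wraps already-resized image bytes in a base64 image content block of the shape Claude's Messages API accepts, followed by an optional caption as a text block. The helper names (toImageBlock, buildUserContent) and the JPEG default are assumptions made for this example; they are not taken from src/image.ts.

```typescript
// Content block shapes used by Claude's Messages API for multimodal input.
type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: ImageMediaType; data: string };
}

interface TextBlock {
  type: "text";
  text: string;
}

type ContentBlock = ImageBlock | TextBlock;

// Hypothetical helper: base64-encode image bytes into an image block.
function toImageBlock(
  bytes: Uint8Array,
  mediaType: ImageMediaType = "image/jpeg", // assumed default, not from the source
): ImageBlock {
  return {
    type: "image",
    source: {
      type: "base64",
      media_type: mediaType,
      data: Buffer.from(bytes).toString("base64"),
    },
  };
}

// Hypothetical helper: image first, then the caption (if any) as text.
function buildUserContent(bytes: Uint8Array, caption?: string): ContentBlock[] {
  const blocks: ContentBlock[] = [toImageBlock(bytes)];
  if (caption !== undefined && caption.trim() !== "") {
    blocks.push({ type: "text", text: caption });
  }
  return blocks;
}
```

The returned array can be used as the content of a user message. Putting the image before the text is a choice made for this sketch.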

Phase 1: Pre-flight

Check if src/image.ts exists — skip to Phase 3 if already applied
Confirm sharp is installable (native bindings require build tools)

Prerequisite: The WhatsApp skill must be installed first (skill/whatsapp merged), because this skill modifies WhatsApp channel files.
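The first pre-flight check can be sketched in a few lines of TypeScript. This is a minimal illustration that assumes the check runs from a script with access to the repository root. The function names are hypothetical, and it does not cover the sharp installability check, which depends on the local build tools.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// True when src/image.ts is already present under repoRoot, which the
// pre-flight step treats as "skill already applied".
function isAlreadyApplied(repoRoot: string): boolean {
  return existsSync(join(repoRoot, "src", "image.ts"));
}

// Hypothetical helper: which phase to run next.
// 2 = apply code changes, 3 = skip ahead because the code is already there.
function nextPhase(repoRoot: string): 2 | 3 {
  return isAlreadyApplied(repoRoot) ? 3 : 2;
}
```

Calling nextPhase(process.cwd()) from the repository root would mirror the "skip to Phase 3 if already applied" rule.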

Phase 2: Apply Code Changes

Ensure WhatsApp fork remote

git remote -v

If whatsapp is missing, add it:

git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git

Merge the skill branch

git fetch whatsapp skill/image-vision
git merge whatsapp/skill/image-vision || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}

This merges in:

src/image.ts (image download, resize via sharp, base64 encoding)
src/image.test.ts (8 unit tests)
Image attachment handling in src/channels/whatsapp.ts
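The resize in src/image.ts is done by sharp, and the source does not state the target dimensions. The sketch below only illustrates the arithmetic of a shrink-only, aspect-preserving resize, which is roughly what sharp's resize does with fit set to "inside" and withoutEnlargement set to true. The 1024-pixel cap and the function name fitInside are assumptions for this example, and sharp's own rounding may differ slightly.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: scale a size down so neither side exceeds maxSide,
// keeping the aspect ratio and never enlarging small images.
function fitInside(width: number, height: number, maxSide = 1024): Size {
  if (width <= maxSide && height <= maxSide) {
    return { width, height }; // already small enough, leave untouched
  }
  const scale = maxSide / Math.max(width, height);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}
```

For example, a 4000 by 3000 photo maps to 1024 by 768 under these assumptions, while an 800 by 600 image is returned unchanged.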

...

$install


npx skills add qwibitai/nanoclaw --skill add-image-vision

Safety assessment

★

Clarity score

How clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.

5/5

excellent

Very clear and well structured, with almost no room for misunderstanding.

◎

Actionability score

How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.

5/5

very high

Highly actionable with clear, concrete steps that an agent can follow directly.

~community cookbook

April 18, 2026



~you might also like

add-ollama-tool

customize

add-voice-transcription

add-compact

add-telegram-swarm

add-macos-statusbar
