Ask me what skills you need
What are you building?
Tell me what you're working on and I'll find the best agent skills for you.
Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
Pure C/C++ LLM inference with minimal dependencies, optimized for CPUs and non-NVIDIA hardware.
Use llama.cpp when:
Use TensorRT-LLM instead when:
Use vLLM instead when:
# macOS/Linux
brew install llama.cpp
# Or build from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# With Metal (Apple Silicon)
make LLAMA_METAL=1
# With CUDA (NVIDIA)
make LLAMA_CUDA=1
# With ROCm (AMD)
make LLAMA_HIP=1
# Download from HuggingFace (GGUF format)
huggingface-cli download \
TheBloke/Llama-2-7B-Chat-GGUF \
llama-2-7b-chat.Q4_K_M.gguf \
--local-dir models/
# Or convert from HuggingFace
python convert_hf_to_gguf.py models/llama-2-7b-chat/
npx skills add NousResearch/hermes-agent --skill llama-cppHow clear and easy to understand the SKILL.md instructions are, rated from 1 to 5.
Very clear and well structured, with almost no room for misunderstanding.
How directly an agent can act on the SKILL.md instructions, rated from 1 to 5.
Highly actionable with clear, concrete steps that an agent can follow directly.