Use this skill to select and run models locally with llama.cpp and GGUF on CPU, Mac Metal, CUDA, or ROCm. It covers finding GGUF repos, quant selection, running servers, exact GGUF file lookup, conversion, and OpenAI-compatible local serving.

Search the Hugging Face Hub for llama.cpp-compatible GGUF repos, choose the right quant, and launch the model with llama-cli or llama-server.
Workflow:

- Search the Hub with the `apps=llama.cpp` filter.
- Open `https://huggingface.co/<repo>?local-app=llama.cpp` for ready-to-run snippets.
- Look up exact `.gguf` filenames with `https://huggingface.co/api/models/<repo>/tree/main?recursive=true`.
- Run `llama-cli -hf <repo>:<QUANT>` or `llama-server -hf <repo>:<QUANT>`.
- Use `--hf-repo` plus `--hf-file` when the repo uses custom file naming.

Install llama.cpp:

```shell
# macOS
brew install llama.cpp

# Windows
winget install llama.cpp

# Build from source
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
make
```

Authenticate with the Hub (required for gated or private repos):

```shell
hf auth login
```
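Once the tree API returns a repo's file listing, quant selection reduces to matching the quant tag embedded in the `.gguf` filename. A minimal sketch of that matching logic — the filenames below are hypothetical examples, not from a real repo:

```python
def pick_gguf(files, quant):
    """Return the first .gguf path whose name contains the quant tag,
    matched case-insensitively; None if no file matches."""
    quant = quant.lower()
    for path in files:
        name = path.lower()
        if name.endswith(".gguf") and quant in name:
            return path
    return None

# Hypothetical listing; a real one comes from
# https://huggingface.co/api/models/<repo>/tree/main?recursive=true
files = [
    "model-Q4_K_M.gguf",
    "model-Q8_0.gguf",
    "README.md",
]
print(pick_gguf(files, "Q4_K_M"))  # → model-Q4_K_M.gguf
```

The matched path is what you would pass to `--hf-file` when a repo's naming doesn't follow the `<repo>:<QUANT>` shorthand.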
Browse trending llama.cpp-compatible models:

https://huggingface.co/models?apps=llama.cpp&sort=trending

Or narrow the search to a specific model family:

https://huggingface.co/models?search=Qwen3.6&apps=llama.cpp&sort=trending
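Once `llama-server` is running, it serves an OpenAI-compatible API on localhost (port 8080 by default). A sketch of the chat-completions request body a client would POST to `/v1/chat/completions` — the model name and prompt are placeholders, and the endpoint URL assumes default server settings:

```python
import json

# Request body for llama-server's OpenAI-compatible endpoint
# (POST http://localhost:8080/v1/chat/completions).
# llama-server serves the single model it was launched with, but the
# "model" field is still part of the OpenAI request schema.
payload = {
    "model": "local-gguf",  # placeholder name
    "messages": [
        {"role": "user", "content": "Hello from llama.cpp!"}
    ],
    "temperature": 0.7,
}
body = json.dumps(payload)
print(body)

# To send it, for example:
#   curl http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d "$body"
```

Because the API shape matches OpenAI's, existing OpenAI client libraries can be pointed at the local server by overriding the base URL.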
Install the skill:

```shell
npx skills add huggingface/skills --skill huggingface-local-models
```

Ratings:

- Clarity (how clear and easy to understand the SKILL.md instructions are, 1 to 5): Clear and well structured, with only minor parts that might need a second read.
- Actionability (how directly an agent can act on the SKILL.md instructions, 1 to 5): Mostly actionable with clear steps; only a few small gaps remain.