Use the Claude Code CLI with local Qwen models on your Apple Silicon Mac (M1/M2/M3).
No cloud, no API fees, your code never leaves your machine.
All inference happens locally; localhost-only binding, backed by 93 security tests, keeps every request on your machine.
No subscription fees, no pay-per-token. Use your Mac's hardware for unlimited AI assistance.
Hardware-aware configuration for M1/M2/M3 Macs with Metal acceleration and thermal management.
Drop-in backend for Claude Code. Keep your existing workflows; just point the CLI at localhost.
Choose between Ollama and llama.cpp, and switch backends without changing your setup.
Real-time dashboard tracks performance, thermal behavior, and token throughput.
# Clone the repository
git clone https://github.com/kmesiab/qwenvert
cd qwenvert
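# Optional: work inside a virtual environment first (standard Python
# practice; not a qwenvert requirement)
python3 -m venv .venv && source .venv/bin/activate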
# Install with pip
pip install -e .
# Detect hardware and download the optimal model
qwenvert init
# Start the adapter + backend
qwenvert start
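# Optional sanity check if you chose the Ollama backend: list its local
# models (this assumes Ollama's default port 11434; skip for llama.cpp)
curl http://localhost:11434/api/tags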
export ANTHROPIC_BASE_URL=http://localhost:8088
export ANTHROPIC_API_KEY=local-qwen
export ANTHROPIC_MODEL=qwenvert-default
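# Optional: persist these across sessions (macOS defaults to zsh)
echo 'export ANTHROPIC_BASE_URL=http://localhost:8088
export ANTHROPIC_API_KEY=local-qwen
export ANTHROPIC_MODEL=qwenvert-default' >> ~/.zshrc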
# Start coding!
claude
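To verify the adapter end to end before launching Claude Code, you can send it a raw request. This sketch assumes qwenvert mirrors Anthropic's Messages endpoint (POST /v1/messages), which is what Claude Code itself calls:

# Smoke test: send an Anthropic-style request straight to the adapter
curl http://localhost:8088/v1/messages \
  -H "x-api-key: local-qwen" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "qwenvert-default",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello in one line."}]
  }'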
The first run downloads a 4-10 GB model (one time only). Requires Python 3.9-3.12 and an Apple Silicon Mac (M1/M2/M3).
Qwenvert is an HTTP adapter that sits between Claude Code and a local LLM backend, translating Anthropic-style API requests into the backend's native format and translating responses back.
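Concretely, when the adapter receives an Anthropic-style request like the smoke test above, it forwards an equivalent request in the backend's native format. A rough sketch for the Ollama backend follows; the model name and exact field mapping here are illustrative, not qwenvert's actual internals:

# Approximate equivalent the adapter issues to the Ollama backend
curl http://localhost:11434/api/chat \
  -d '{
    "model": "qwen2.5-coder",
    "stream": false,
    "messages": [{"role": "user", "content": "Say hello in one line."}]
  }'

Responses take the reverse path: the backend's chat message is reshaped into Anthropic's content-block format before being returned to Claude Code.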