Local AI server with OpenAI-compatible API
Lemonade Server is a lightweight, high-performance local AI inference server
that provides an OpenAI-compatible API for running large language models on
your own hardware.
Features:
- OpenAI-compatible REST API (chat completions, embeddings, etc.)
- Multiple compute backends: Vulkan, ROCm (AMD GPUs), and CPU
- Automatic model management and caching
- Support for GGUF models from Hugging Face
- Low-latency local inference
- Runs as a background service
Supported Hardware:
- AMD GPUs: RDNA3 (RX 7000), RDNA4 (RX 9000), Strix Point/Halo APUs
- Any Vulkan-capable GPU
- CPU fallback for systems without GPU acceleration
Quick Start:
The server starts automatically after installation. Access the API at:
http://localhost:8000/api/v1
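Because the API is OpenAI-compatible, a standard chat-completions request should work against the local endpoint. The sketch below is a minimal, stdlib-only example; the `/chat/completions` path follows the OpenAI convention, and the model name is a placeholder you would replace with a model actually installed on your server.

```python
import json
import urllib.request

# Base URL from the Quick Start section above.
API_URL = "http://localhost:8000/api/v1/chat/completions"

def build_request(prompt, model="example-model-GGUF"):
    """Build an OpenAI-style chat-completions payload.

    The model name here is a placeholder; use one of the
    models managed by your Lemonade Server instance.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, model="example-model-GGUF"):
    """POST the payload to the local server and return the reply text."""
    payload = build_request(prompt, model)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]

# Payload construction alone, shown without contacting the server:
payload = build_request("Hello!")
```

Any OpenAI client library pointed at http://localhost:8000/api/v1 should work the same way, since the request and response shapes follow the OpenAI schema.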
ROCm Support (AMD GPUs):
For ROCm GPU acceleration, connect the process-control interface:
sudo snap connect lemonade-server:process-control
Documentation: https://lemonade-server.ai/