Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Updated Nov 6, 2023 - Python
Runs LLaMA at extremely high speed
RAM Coffers: Conditional Memory via NUMA-Distributed Weight Banking - O(1) lookup routing for LLM inference (Dec 16, 2025 - predates DeepSeek Engram by 27 days)
LLM inference in Fortran
AltiVec/VSX optimized llama.cpp for IBM POWER8
Krasis is a hybrid LLM runtime focused on efficiently running larger models on consumer-grade, VRAM-limited hardware
Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache 2.0. ~10.8% DER on VoxConverse, 8x faster than real-time.
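The DER (diarization error rate) figure quoted above is the fraction of reference speech time that is misattributed: missed speech, false-alarm speech, and speaker confusion, divided by total reference speech. A minimal illustrative sketch over frame-level labels (not the library's own scorer, which works on timed segments with an optimal speaker mapping, e.g. NIST md-eval):

```python
# Hypothetical frame-level DER sketch: labels are per-frame speaker IDs,
# with None meaning silence. Real scorers operate on time segments and
# search for the best reference-to-hypothesis speaker mapping first.

def der(reference, hypothesis):
    """DER = (miss + false alarm + confusion) / total reference speech."""
    assert len(reference) == len(hypothesis)
    speech = sum(1 for r in reference if r is not None)   # reference speech frames
    miss = sum(1 for r, h in zip(reference, hypothesis)
               if r is not None and h is None)            # speech scored as silence
    false_alarm = sum(1 for r, h in zip(reference, hypothesis)
                      if r is None and h is not None)     # silence scored as speech
    confusion = sum(1 for r, h in zip(reference, hypothesis)
                    if r is not None and h is not None and r != h)  # wrong speaker
    return (miss + false_alarm + confusion) / speech

ref = ["A", "A", "B", "B", None, "B"]
hyp = ["A", "B", "B", "B", "B", None]
print(der(ref, hyp))  # 0.6: one miss, one false alarm, one confusion over 5 speech frames
```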
Running Mixture of Agents on CPU: LFM2.5 Brain (1.2B) + Falcon-R Reasoner (600M) + Tool Caller (90M). CPU-only, 16GB RAM. Lightweight AI Legion.
A GPU defined in software. Runs Llama 3.2 1B at 3.6 tok/sec. Zero dependencies.
The bare metal in my basement
Pure C inference engine for Qwen3-TTS text-to-speech. No Python, no PyTorch — just C and BLAS. Supports 0.6B and 1.7B models, 9 voices, 10 languages.
eLLM infers LLMs on CPUs in real time
Portable LLM: a Rust library for LLM inference
A wrapper that simplifies using Llama 2 GGUF quantized models.
Non-bijunctive attention collapse for LLM inference — POWER8 hardware AES (vcipher) + AltiVec vec_perm. Hebbian path selection, cross-head diffusion, O(1) KV prefiltering.
V-lang API wrapper for llm-inference chatllm.cpp
Privacy-focused RAG chatbot for network documentation. Chat with your PDFs locally using Ollama, Chroma & LangChain. CPU-only, fully offline.
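RAG chatbots like the one above follow a retrieve-then-generate pattern: embed the document chunks, pick the chunks nearest the question, and pass them to the local model as context. A dependency-free sketch of the retrieval step, with toy bag-of-words vectors standing in for the real embeddings and vector store (Ollama/Chroma in the repo above):

```python
import math
from collections import Counter

# Toy retrieval step of a local RAG pipeline. Real pipelines use learned
# embeddings and a vector store; a bag-of-words Counter stands in here so
# the example stays self-contained and runnable.

def embed(text):
    """Hypothetical stand-in embedding: word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "VLANs segment a switch into isolated broadcast domains.",
    "OSPF floods link-state advertisements to build its topology.",
    "DHCP leases IP addresses to hosts automatically.",
]
print(retrieve("how does ospf build a topology", chunks))  # picks the OSPF chunk
```

The retrieved chunks would then be concatenated into the prompt sent to the local model, which is what keeps the whole pipeline offline.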
Simple large language model playground app
VB.NET API wrapper for llm-inference chatllm.cpp
C# API wrapper for llm-inference chatllm.cpp