NeoCore is a CPU-optimized transformer architecture designed for edge computing and enterprise deployment. It combines CPU-specific optimizations (blocked computation, caching, and multithreading) with modern architectural improvements so that inference runs without GPU acceleration.
- 🔋 CPU-Native Design: Optimized from the ground up for modern CPU architectures
- 🎯 Memory Efficient: Advanced caching and chunking strategies for optimal memory usage
- 🛠 Enterprise Ready: Production-grade implementation with comprehensive logging and monitoring
- 🔄 Modern Architecture: Incorporates Multi-Query Attention, RMSNorm, and Rotary Embeddings
- 📊 Extensive Benchmarking: Built-in performance profiling and optimization tools
pip install neocore
NeoCore introduces several architectural innovations:
- Multi-Query Attention (MQA)
Q: [Batch, Seq, Heads, Head_Dim] # Multiple query heads
K,V: [Batch, Seq, 1, Head_Dim] # Single key/value head shared by all query heads
- RMSNorm for Stabilization
RMSNorm(x) = x * scale / sqrt(mean(x²) + ε)
- Block-wise Computation
Input -> Chunked Processing -> Cache-Friendly Operations -> Output
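To make the first two items concrete, here is a minimal NumPy sketch of RMSNorm and multi-query attention. It is an illustration under stated assumptions, not NeoCore's actual implementation: the function names `rms_norm` and `multi_query_attention` are hypothetical, the attention is unmasked, and the shared key/value head is stored with its singleton head axis dropped.

```python
import numpy as np


def rms_norm(x, scale, eps=1e-6):
    """RMSNorm(x) = x * scale / sqrt(mean(x^2) + eps), over the last axis."""
    mean_sq = np.mean(x * x, axis=-1, keepdims=True)
    return x * scale / np.sqrt(mean_sq + eps)


def multi_query_attention(q, k, v):
    """Multi-query attention: many query heads share one key/value head.

    q:    [batch, seq, heads, head_dim]
    k, v: [batch, seq, head_dim]  (single shared head, head axis dropped)
    returns [batch, seq, heads, head_dim]
    """
    head_dim = q.shape[-1]
    # Scores of every query head against the shared keys: [batch, heads, seq_q, seq_k]
    scores = np.einsum("bqhd,bkd->bhqk", q, k) / np.sqrt(head_dim)
    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the shared values, back to [batch, seq, heads, head_dim]
    return np.einsum("bhqk,bkd->bqhd", weights, v)


rng = np.random.default_rng(0)
q = rng.standard_normal((2, 5, 8, 16))   # 8 query heads
k = rng.standard_normal((2, 5, 16))      # 1 shared key head
v = rng.standard_normal((2, 5, 16))      # 1 shared value head
out = multi_query_attention(q, k, v)
normed = rms_norm(out, scale=np.ones(16))
```

Because `k` and `v` carry no head axis, their storage does not grow with the number of query heads, which is where the memory saving of MQA comes from.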
┌──────────────────┐
│ Input Embedding │
└────────┬─────────┘
│
┌────▼────┐
│ Chunk 1 │──┐
└─────────┘ │
┌─────────┐ │
│ Chunk 2 │──┼─► Parallel Processing
└─────────┘ │
┌─────────┐ │
│ Chunk N │──┘
└─────────┘
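The flow in the diagram above can be sketched with the standard library alone. This is a rough illustration, not NeoCore's scheduler: `split_into_chunks`, `process_chunk`, and `process_in_parallel` are hypothetical names, and the per-chunk work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(tokens, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def process_chunk(chunk):
    """Placeholder per-chunk work; a real model would run its layers here."""
    return [t * 2 for t in chunk]


def process_in_parallel(tokens, chunk_size, max_workers=4):
    chunks = split_into_chunks(tokens, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves chunk order, so results can be concatenated directly
        results = list(pool.map(process_chunk, chunks))
    return [item for chunk in results for item in chunk]


output = process_in_parallel(list(range(10)), chunk_size=4)
```

In CPython, threads only run truly in parallel when the per-chunk work releases the GIL, as NumPy and PyTorch kernels typically do, so the placeholder above shows the structure rather than a speedup.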
- Custom blocked matrix multiplication
- Adaptive chunk sizing
- Operation result caching
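As an illustration of the first item, a blocked (tiled) matrix multiplication processes small tiles so that each tile's working set is more likely to stay in cache. The sketch below assumes NumPy and uses a hypothetical `blocked_matmul` helper with an arbitrary tile size; it shows the technique in general, not NeoCore's kernel.

```python
import numpy as np


def blocked_matmul(a, b, block=32):
    """Compute a @ b tile by tile so each tile's working set stays small."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, block):
        for j in range(0, n, block):
            for p in range(0, k, block):
                # Slices clip at the array edge, so ragged sizes work too
                out[i:i + block, j:j + block] += (
                    a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
                )
    return out


rng = np.random.default_rng(0)
a = rng.standard_normal((70, 50))
b = rng.standard_normal((50, 90))
c = blocked_matmul(a, b, block=32)
```

The result matches `a @ b` up to floating-point rounding; only the order in which partial products are accumulated changes.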
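# Key/value memory: traditional multi-head attention vs NeoCore MQA
# (N = sequence length, H = number of heads, D = per-head dimension)
Traditional: O(N * H * D) memory
NeoCore: O(N * D) memory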
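A quick worked example of the comparison above, reading N as sequence length, H as the number of heads, and D as the per-head dimension (an interpretation, since the symbols are not defined in the text). The helper `kv_cache_bytes` is hypothetical and counts only keys and values for one layer and one sequence, stored as 4-byte floats.

```python
def kv_cache_bytes(seq_len, head_dim, kv_heads, bytes_per_value=4):
    """Bytes needed to hold keys plus values for one layer and one sequence."""
    return 2 * seq_len * kv_heads * head_dim * bytes_per_value


N, H, D = 128, 8, 64  # sequence length, query heads, per-head dimension
mha_bytes = kv_cache_bytes(N, D, kv_heads=H)  # one K/V head per query head
mqa_bytes = kv_cache_bytes(N, D, kv_heads=1)  # one shared K/V head
print(mha_bytes, mqa_bytes, mha_bytes // mqa_bytes)  # 524288 65536 8
```

With 8 heads, the shared key/value head needs one eighth of the key/value storage; query and output activations are unaffected.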
- Rotary embeddings for enhanced position awareness
- Cache-friendly implementation
- Optimized for CPU SIMD operations
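For readers unfamiliar with rotary embeddings, the sketch below rotates consecutive feature pairs by an angle that grows with position. It assumes NumPy, the common base of 10000, and the interleaved-pair layout; `apply_rotary` is a hypothetical name, and NeoCore's own layout may differ.

```python
import numpy as np


def apply_rotary(x, base=10000.0):
    """Rotate consecutive feature pairs of x by position-dependent angles.

    x: float array of shape [seq, dim] with even dim. Pair i at position p
    is rotated by the angle p * base ** (-2 * i / dim).
    """
    seq, dim = x.shape
    assert dim % 2 == 0, "rotary embeddings need an even feature dimension"
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)      # [dim / 2]
    angles = np.arange(seq)[:, None] * inv_freq[None, :]  # [seq, dim / 2]
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))
rotated = apply_rotary(x)
```

Each pair is only rotated, so vector norms are preserved, and the dot product between a rotated query and a rotated key depends on their relative offset rather than their absolute positions.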
| Batch Size | Sequence Length | Processing Time (ms) | Tokens/Second |
|---|---|---|---|
| 1 | 32 | 31.17 | 1,026 |
| 4 | 64 | 43.51 | 5,883 |
| 16 | 128 | 161.28 | 12,700 |
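The Tokens/Second column appears to follow from the other three as batch size times sequence length divided by processing time. The small check below, using a hypothetical `tokens_per_second` helper, recomputes it from the rounded times in the table.

```python
def tokens_per_second(batch_size, seq_len, time_ms):
    """Throughput = tokens processed per call / elapsed seconds."""
    return batch_size * seq_len / (time_ms / 1000.0)


# (batch size, sequence length, processing time in ms) from the table above
rows = [(1, 32, 31.17), (4, 64, 43.51), (16, 128, 161.28)]
for batch, seq, ms in rows:
    print(batch, seq, round(tokens_per_second(batch, seq, ms)))
```

The recomputed values agree with the table to within rounding, which suggests the reported throughput counts every token in the batch.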
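from neocore import NoamConfig, CPUOptimizedNoamTransformer

# Initialize configuration
config = NoamConfig(
    d_model=512,        # model (embedding) width
    n_heads=8,          # number of attention heads
    n_layers=6,         # number of transformer layers
    warmup_steps=4000,  # learning-rate warmup steps
    chunk_size=32,      # tokens per chunk for block-wise computation
)

# Create model
model = CPUOptimizedNoamTransformer(config)

# Process input
# input_ids: a batch of token IDs prepared by the caller (not defined in this snippet)
output = model(input_ids)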
- Edge Computing: Optimal for deployment on CPU-only edge devices
- Enterprise Systems: Reliable performance on standard server hardware
- CI/CD Pipelines: Efficient inference in production pipelines
- Privacy-First Applications: On-device processing without GPU requirements
- Intelligent cache management system
- Adaptive chunk sizing based on input
- Memory-efficient attention patterns
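One way adaptive chunk sizing could work is to pick the largest chunk whose activations fit a cache budget. The heuristic below is purely hypothetical (the function name, the 1 MiB default budget, and the power-of-two search are all assumptions) and is meant only to illustrate the idea, not to describe NeoCore's policy.

```python
def adaptive_chunk_size(seq_len, d_model, cache_bytes=1 << 20,
                        bytes_per_value=4, min_chunk=8, max_chunk=256):
    """Pick the largest power-of-two chunk whose activations fit the budget.

    Hypothetical heuristic: one chunk holds chunk_size * d_model values.
    The chunk never drops below min_chunk and never exceeds max_chunk.
    """
    chunk = min_chunk
    while (chunk * 2 <= max_chunk
           and chunk * 2 <= seq_len
           and chunk * 2 * d_model * bytes_per_value <= cache_bytes):
        chunk *= 2
    return chunk


short_input = adaptive_chunk_size(seq_len=20, d_model=512)
long_input = adaptive_chunk_size(seq_len=128, d_model=512)
small_cache = adaptive_chunk_size(seq_len=10000, d_model=512, cache_bytes=64 * 1024)
```

Short inputs get small chunks, longer inputs get larger ones, and a tighter cache budget caps the chunk size regardless of input length.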
Number of Threads = min(CPU_COUNT, MAX_EFFICIENT_THREADS)
Thread Pool Size = Adaptive based on workload
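The thread-count rule above translates directly into a few lines of Python. The cap value here is a placeholder, since no value for `MAX_EFFICIENT_THREADS` is given, and `choose_num_threads` is a hypothetical helper.

```python
import os

MAX_EFFICIENT_THREADS = 8  # hypothetical cap; the source does not state a value


def choose_num_threads(cpu_count=None, cap=MAX_EFFICIENT_THREADS):
    """Number of Threads = min(CPU_COUNT, MAX_EFFICIENT_THREADS)."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined
        cpu_count = os.cpu_count() or 1
    return max(1, min(cpu_count, cap))


num_threads = choose_num_threads()
```

Capping the count reflects the usual observation that adding threads beyond a certain point costs more in contention and scheduling than it gains in throughput.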
- Level 1: Basic CPU optimizations
- Level 2: Cache-aware operations
- Level 3: Advanced parallelization
- Level 4: Full SIMD utilization
Run comprehensive benchmarks:
python -m neocore.benchmark --config benchmark_config.yaml
We welcome contributions! Please see our Contributing Guidelines for details.
Apache License 2.0. See LICENSE for details.
Built on modern transformer innovations with specific optimizations for CPU architectures. Special thanks to the research community for their groundbreaking work in efficient transformer designs.
@software{neocore2024,
title={NeoCore: CPU-Optimized Transformer Architecture},
author={Kye Gomez},
year={2024},
publisher={GitHub},
url={https://github.com/neocore/neocore}
}