A Quick Guide to Quantization for LLMs
Quantization is a technique that reduces the precision of a model's weights and activations. Quantization helps by:
- Shrinking model size (less disk storage)
- Reducing memory usage (fits on smaller GPUs/CPUs)
- Cutting down compute requirements
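As a rough illustration of the idea (not the article's own code), here is a minimal sketch of symmetric per-tensor int8 quantization with NumPy: each float weight is mapped to an 8-bit integer plus a single shared scale factor, cutting storage from 32 bits to 8 bits per weight.

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: map floats into [-127, 127]
    # using one scale factor shared by the whole tensor.
    # (Assumes the tensor is not all zeros; a real implementation
    # would guard against scale == 0.)
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate float weights from the int8 codes.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

# The int8 codes use 4x less memory than float32, and the
# round-trip error is bounded by about half the scale step.
print(q.dtype, w_hat)
```

The same idea extends to per-channel scales and to activations, which is what production schemes like int8 or 4-bit LLM quantization build on.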
Source: HackerNoon