Project 01
SparseGPT · One-Shot LLM Pruning
Completed · LLM efficiency
Pruned LLaMA-7B in one shot without retraining, then measured the real cost of that compression instead of assuming it is free.
- Built for: efficient inference experiments where GPU memory, latency, and model quality all matter.
- My work: implemented SparseGPT runs across unstructured and 2:4 / 4:8 structured sparsity patterns.
- Measured: perplexity and zero-shot accuracy on ARC-C, ARC-E, and PIQA at 25% to 60% sparsity.
- Outcome: completed pruning benchmarks across sparsity settings and documented the accuracy/compression tradeoffs.
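The 2:4 pattern above means that in every group of 4 consecutive weights, only 2 may be nonzero (the layout NVIDIA sparse tensor cores accelerate). A minimal numpy sketch of that masking step, keeping the two largest-magnitude weights per group: this illustrates the sparsity pattern only, not the SparseGPT algorithm itself, which additionally updates the surviving weights with a Hessian-based correction.

```python
import numpy as np

def mask_2_to_4(weights: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude weights in each group of 4.

    Shows only the 2:4 structured-sparsity pattern; SparseGPT also
    compensates the remaining weights, which this sketch omits.
    """
    w = weights.reshape(-1, 4).copy()
    # indices of the 2 smallest |w| within each group of 4
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

# Toy 1x8 weight row: two groups of 4, each ends up 50% sparse.
w = np.array([[0.9, -0.1, 0.3, -0.7, 0.2, 0.8, -0.05, 0.4]])
pruned = mask_2_to_4(w)
# → [[0.9, 0.0, 0.0, -0.7, 0.0, 0.8, 0.0, 0.4]]
```

A 4:8 pattern is the same idea with groups of 8 keeping 4 weights, trading finer granularity for the same 50% overall sparsity.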