Even though the perplexity scores of the pruned version are three times higher, the ARC, HellaSwag, MMLU, TruthfulQA, and WinoGrande scores hold up remarkably well considering that two layers (5 and 39) were removed. This seems to support Xin Men et al.'s conclusions in ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (arXiv:2403.03853).
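For anyone who wants to reproduce this kind of ablation, here is a minimal sketch of dropping two decoder blocks from a causal LM, assuming a LLaMA-style architecture whose blocks live in model.model.layers (the model ID is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "your/base-model"  # placeholder; any causal LM with 40+ decoder blocks

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Drop decoder blocks 5 and 39 and keep the rest in their original order.
drop = {5, 39}
kept = torch.nn.ModuleList(
    layer for i, layer in enumerate(model.model.layers) if i not in drop
)
model.model.layers = kept
model.config.num_hidden_layers = len(kept)
# Depending on the transformers version, each block's self_attn.layer_idx
# may also need renumbering before the KV cache works correctly.

model.save_pretrained("pruned-model")
```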
A results summary is in the model card, and the full test results are in the ./scores directory. Questions and feedback are always welcome.
Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗
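If you prefer poking at a graph from code rather than the browser UI, here is a quick sketch using the onnx Python package (the model path is a placeholder, and this is not the explorer's own code):

```python
from collections import Counter
import onnx

model = onnx.load("model.onnx")  # placeholder path to any ONNX file

# Count operator types for a quick overview of the graph's structure.
op_counts = Counter(node.op_type for node in model.graph.node)
for op, count in op_counts.most_common():
    print(f"{op}: {count}")

# Graph-level inputs and outputs by name.
print("inputs: ", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])
```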
We just released a paper (NeuZip) that losslessly compresses model weights to cut VRAM usage, letting you run larger models. This should be particularly useful when VRAM is insufficient for training or inference. Specifically, we look inside each floating-point number and find that the exponent bits are highly compressible (as shown in the figure below).
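As a rough way to see the effect yourself, the sketch below (my own illustration, not the paper's code) compresses the exponent bytes and the low mantissa bytes of a random float32 matrix separately; the exponent bytes shrink substantially while the mantissa bytes barely compress:

```python
import zlib
import numpy as np

# Stand-in "weights": a normally distributed float32 matrix, roughly like an initialized layer.
w = np.random.randn(1024, 1024).astype(np.float32)
bits = w.view(np.uint32)

exponent = ((bits >> 23) & 0xFF).astype(np.uint8)  # the 8 exponent bits of each float32
mantissa_lo = (bits & 0xFF).astype(np.uint8)       # the lowest 8 mantissa bits

def compressed_fraction(x: np.ndarray) -> float:
    raw = x.tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

print(f"exponent bytes:    {compressed_fraction(exponent):.2f} of original size")
print(f"mantissa lo bytes: {compressed_fraction(mantissa_lo):.2f} of original size")
```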
You can use it to build patchflows: workflows that use LLMs for software development tasks like bug fixing, pull request review, library migration, and documentation.
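To make the idea concrete, here is a hypothetical, stripped-down illustration of what a patchflow amounts to; this is not the library's actual API, just a sketch of chained steps sharing a context, with the LLM call stubbed out:

```python
from typing import Callable

Context = dict
Step = Callable[[Context], Context]

def run_patchflow(steps: list[Step], context: Context) -> Context:
    # Each step reads the shared context, adds its results, and passes it on.
    for step in steps:
        context = step(context)
    return context

def scan_repo(ctx: Context) -> Context:
    # Placeholder scanner; a real step might run a linter or a vulnerability scan.
    ctx["issues"] = ["unused variable in main.py"]
    return ctx

def propose_fixes(ctx: Context) -> Context:
    # Placeholder for the LLM call that turns each issue into a candidate patch.
    ctx["patches"] = [f"suggested patch for: {issue}" for issue in ctx["issues"]]
    return ctx

print(run_patchflow([scan_repo, propose_fixes], {"repo": "./"}))
```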