zillow's activity

csabakecskemeti
posted an update 4 days ago
I'm collecting llama-bench results for inference with Llama 3.1 8B Q4 and Q8 reference models on various GPUs. The results are the average of 5 executions.
The test systems vary (different motherboards and CPUs ... but that probably has little effect on inference performance).

https://devquasar.com/gpu-gguf-inference-comparison/
The exact models used are listed on the page.

I'd welcome results from other GPUs if you have access to anything not already covered in the post. Hopefully this is useful information for everyone.
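
If you want to contribute numbers, a minimal sketch of how such results can be collected is below (the model filenames are placeholders, not the exact models from the page; `-r 5` makes llama-bench average 5 runs and report mean ± stddev):

```python
# Sketch: run llama-bench for each quantized model and print its markdown rows.
# Assumes llama-bench (from llama.cpp) is on PATH; filenames are placeholders.
import subprocess

MODELS = ["llama-3.1-8b.Q4_K_M.gguf", "llama-3.1-8b.Q8_0.gguf"]

for model in MODELS:
    # -p 512 / -n 128 correspond to the pp512 / tg128 tests,
    # -r 5 repeats each test 5 times and reports mean +/- stddev,
    # -o md emits markdown table rows like the ones on the comparison page.
    subprocess.run(
        ["llama-bench", "-m", model,
         "-p", "512", "-n", "128", "-r", "5", "-o", "md"],
        check=True,
    )
```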
csabakecskemeti
posted an update 5 days ago
Managed to get my hands on a 5090 FE. It's beefy!

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | pp512 | 12207.44 ± 481.67 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | tg128 | 143.18 ± 0.18 |

Comparison with other GPUs:
http://devquasar.com/gpu-gguf-inference-comparison/
csabakecskemeti
posted an update 19 days ago
Fine-tuning on the edge. Pushing the MI100 to its limits.
QwQ-32B 4-bit QLoRA fine-tuning
VRAM usage: 31.498G / 31.984G :D
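
For context, a 4-bit QLoRA setup of this kind typically looks like the sketch below (transformers + bitsandbytes + peft; the model id and LoRA hyperparameters are assumptions, not the exact config used here):

```python
# Sketch of a 4-bit QLoRA setup; model id and LoRA targets are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 is the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B",                         # assumed HF id for QwQ-32B
    quantization_config=bnb_config,
    device_map="auto",
)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # a common default; not from the post
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()          # only the small LoRA adapters train
```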

csabakecskemeti
posted an update 26 days ago
-UPDATED-
4-bit inference is working! The blog post has been updated with a code snippet and requirements.txt.
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
-UPDATED-
I've played around with an MI100 and ROCm and collected my experience in a blog post:
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
Unfortunately I could not get inference or training to work with the model loaded in 8-bit or with BnB, but I did everything else and documented my findings.
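
The blog post has the actual snippet and requirements.txt; as a rough idea, 4-bit inference with BnB looks like this minimal sketch (placeholder model id, assuming ROCm-enabled torch and bitsandbytes builds):

```python
# Sketch: load a model in 4-bit via bitsandbytes and generate a few tokens.
# Placeholder model id; assumes ROCm builds of torch and bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
inputs = tok("Hello from the MI100!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```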
csabakecskemeti
posted an update about 1 month ago
Testing training on AMD/ROCm for the first time!

I've got my hands on an AMD Instinct MI100. Used, it's about the same price as a V100, but on paper it has more compute (V100 ~14 TFLOPS vs. MI100 ~23 TFLOPS FP32), and its HBM runs at a faster clock, so memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).

For LoRA training in this quick test I could not get the bnb config to work, so I'm running the fine-tune on the full-size model.

I will share everything I've learned about the install, setup, and settings in a blog post, together with the cooling-shroud 3D design.
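
Until bnb cooperates, a plain LoRA fine-tune on the full-size model looks roughly like this sketch (model id and hyperparameters are assumptions):

```python
# Sketch: LoRA on the full-precision model, skipping quantization entirely.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",   # placeholder; the post doesn't name the model
    torch_dtype=torch.bfloat16,  # full-size weights, just in bf16
    device_map="auto",
)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16,
               target_modules=["q_proj", "v_proj"],  # assumed targets
               task_type="CAUSAL_LM"),
)
# ...then train with the usual Trainer / SFTTrainer loop.
```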
csabakecskemeti
posted an update about 2 months ago
Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user's device.

Demo, code, and a more detailed description:
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q

Call for contributors:
Join me in developing the LLmaaS proxy into a general-purpose tool for leveraging local LLMs on the web, with built-in security measures.
I'm looking for help making the proxy more generic, so it supports multiple local LLM services without any change on the HTML side.
Also looking for ideas on how to make the HTML part more modular and easier to use.
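
To illustrate the concept, the proxy boils down to something like the toy sketch below (not the actual repo code; the port and the assumption of an OpenAI-compatible local server are mine):

```python
# Toy LLmaaS-style proxy sketch: a web page POSTs a chat request here, and the
# proxy forwards it to a locally running, OpenAI-compatible LLM server.
# Port and endpoint are assumptions; a real version needs auth/CORS hardening.
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LOCAL_LLM = "http://localhost:8080/v1/chat/completions"  # e.g. llama.cpp server

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = urllib.request.Request(
            LOCAL_LLM, data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        # CORS header so the calling website can read the response;
        # "*" is exactly the kind of thing the security work should lock down.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("localhost", 8765), Proxy).serve_forever()
```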
csabakecskemeti
posted an update 2 months ago
I've run the Open LLM Leaderboard evaluations plus HellaSwag on deepseek-ai/DeepSeek-R1-Distill-Llama-8B and compared the scores to meta-llama/Llama-3.1-8B-Instruct, and at first glance R1 does not beat Llama overall.

If anyone wants to double-check, the results are posted here:
https://github.com/csabakecskemeti/lm_eval_results

Did I make a mistake somewhere, or is this (at least the distilled version) simply not better than the competition?

I'll run the same on the Qwen 7B distilled version too.
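
For anyone double-checking, a single task can be re-run with lm-evaluation-harness along these lines (a sketch assuming the v0.4+ Python API; the full task list is in the results repo):

```python
# Sketch: re-run HellaSwag on the distilled model with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    tasks=["hellaswag"],
)
print(results["results"])
```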
csabakecskemeti
posted an update 3 months ago
Happy New Year, Hugging Face community!
In 2025, I'll continue my quantization (and some fine-tuning) efforts to support open-source AI and make knowledge free for everyone.

https://huggingface.co/DevQuasar
https://devquasar.com/
csabakecskemeti
posted an update 3 months ago
I've built a small utility to split safetensors files, file by file.
The need came up when I tried to convert the new DeepSeek V3 model from FP8 to BF16.
The only Ada-architecture GPU I have is an RTX 4080, and its 16GB of VRAM just wasn't enough for the conversion.

BTW: I'll upload the bf16 version here:
DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16
(it will take a while - days with my upload speed)
If anyone has access to the resources to test it, I'd appreciate feedback on whether it works or not.

The tool is available here:
https://github.com/csabakecskemeti/ai_utils/blob/main/safetensor_splitter.py
It splits every file into n pieces, by layer where possible, and creates a new "model.safetensors.index.json" file.
I've tested it with Llama 3.1 8B and multiple split sizes, and validated the result with an inference pipeline.
Use --help for usage.
Please note the current version expects the model to already be sharded into multiple files, with a "model.safetensors.index.json" layer-to-safetensors mapping file.
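
The core idea is simple; here is a stripped-down sketch of it (not the actual tool, which handles layer boundaries and edge cases):

```python
# Sketch of the splitting idea: read one shard, split its tensors into chunks,
# write each chunk to its own file, and record the new tensor->file mapping.
import json
from safetensors.torch import load_file, save_file

def split_shard(path, n, weight_map):
    tensors = load_file(path)                 # dict: tensor name -> tensor
    names = list(tensors)
    chunk = max(1, len(names) // n)
    for i in range(0, len(names), chunk):
        out = f"{path}.part{i // chunk}.safetensors"
        part = {k: tensors[k] for k in names[i:i + chunk]}
        save_file(part, out)
        for k in part:
            weight_map[k] = out               # goes into the new index file

index = {"metadata": {}, "weight_map": {}}
split_shard("model-00001-of-00002.safetensors", 2, index["weight_map"])
with open("model.safetensors.index.json", "w") as f:
    json.dump(index, f, indent=2)
```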