kas

shing3232

AI & ML interests

None yet

Recent Activity

new activity about 2 months ago

tencent/Tencent-Hunyuan-Large: What hardware configuration does this model need to run?

updated a model about 2 months ago

shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX

Organizations

None yet

shing3232's activity

New activity in tencent/Tencent-Hunyuan-Large about 2 months ago

What hardware configuration does this model need to run?

#13 opened about 2 months ago by

demo001s

updated a model about 2 months ago

shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX

Updated Nov 8 • 15 • 1

upvoted a collection 3 months ago

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 40 items • Updated 28 days ago • 257

liked a model 6 months ago

UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3

Text Generation • Updated Jul 1 • 9.09k • 119

updated a model 7 months ago

shing3232/sakura-14b-qwen2beta-v0.9.2-IMX

Updated May 31 • 9 • 3

New activity in SakuraLLM/Sakura-14B-Qwen2beta-v0.9.2-GGUF 7 months ago

Can't CUDA run BF16 models?

#1 opened 7 months ago by

NeuronAstate

New activity in Qwen/Qwen1.5-7B-Chat-GGUF 7 months ago

Please post f16 quantization.

#1 opened 7 months ago by

ZeroWw

liked a model 7 months ago

shing3232/sakura-14b-qwen2beta-v0.9.2-IMX

Updated May 31 • 9 • 3

upvoted a paper 8 months ago

BASS: Batched Attention-optimized Speculative Sampling

Paper • 2404.15778 • Published Apr 24 • 8

New activity in Qwen/CodeQwen1.5-7B-Chat 8 months ago

What are the differences between this and Qwen/CodeQwen1.5-7B?

#5 opened 8 months ago by

Kalemnor

liked a model 9 months ago

databricks/dbrx-instruct

Text Generation • Updated Apr 19 • 19.1k • 1.11k

New activity in Qwen/Qwen1.5-MoE-A2.7B-Chat 9 months ago

How does this version's 28 GB GPU memory consumption compare with the 14B model?

#7 opened 9 months ago by

william0014

upvoted a paper 9 months ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 91

New activity in Qwen/qwen1.5-MoE-A2.7B-Chat-demo 9 months ago

How is the inference so fast in this free hardware space?

#1 opened 9 months ago by

mahiatlinux

liked a Space 9 months ago

12.1k

🏆

Open LLM Leaderboard

Track, rank and evaluate open LLMs and chatbots

upvoted a paper 9 months ago

ChatEDA: A Large Language Model Powered Autonomous Agent for EDA

Paper • 2308.10204 • Published Aug 20, 2023 • 1

New activity in ai21labs/Jamba-v0.1 9 months ago

Is there a chance Jamba could be trained with 1.58-bit weights?

#22 opened 9 months ago by

shing3232

reacted to merve's post with 🚀 9 months ago

Post

3325

LLaVA-NeXT was recently merged into Hugging Face transformers, and it outperforms many closed-source models like Gemini on various benchmarks 🤩 Let's take a look!
Demo: merve/llava-next
Notebook: https://colab.research.google.com/drive/1afNudu72SNWZCYtCVrRlb9T9Vj9CFJEK?usp=sharing
LLaVA is essentially a vision-language model that consists of a ViT-based CLIP encoder, an MLP projection, and Vicuna as the decoder ✨
LLaVA 1.5 was released with Vicuna, but LLaVA NeXT (1.6) is released with four different LLMs:
- Nous-Hermes-Yi-34B
- Mistral-7B
- Vicuna 7B & 13B
The Mistral and Nous-Hermes-Yi-34B variants perform better and are better suited for commercial use.
Moreover, according to the authors' findings, the improvements come from a more diverse, higher-quality data mixture and dynamic high resolution.
LLaVA based on Nous-Hermes-Yi-34B outperforms many other models, including Gemini, on various multimodal understanding and generation benchmarks 😊

commented a paper 9 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 603 • 142

updated a model 9 months ago

shing3232/Sakura13B-LNovel-v0.9-qwen1.5-GGUF-IMX

Updated Mar 25 • 83 • 7