[MODELS] Discussion

#372
by victor - opened
Hugging Chat org
•
edited Sep 23, 2024

Here we can discuss the models available on HuggingChat.


victor pinned discussion

What are the limits of using these? How many API calls can I send per month?

How can I know which model I am using?

How can I know which model I am using?

At the bottom of your screen: [screenshot]

Out of all these models, Gemma, which was recently released, has the newest information about .NET. However, I don't know which one gives the most accurate answers for coding.

Gemma seems really biased. With web search on, it says it doesn't have access to recent information when I ask it almost anything about recent events. But when I look up the same recent events on Google, I get results about them.

Apparently Gemma cannot code?

Gemma is just like Google's Gemini series models: it has very strong moral limits, so any operation that might relate to file access or anything deep in the system gets censored and refused.
So even if there are solutions for such things in its training data, they just get filtered and ignored.
But I still haven't tested its coding accuracy on tasks unrelated to these kinds of "dangerous" operations.

I would also like to see Cohere Command A added to HuggingChat.
While I can try it out at https://cohereforai-c4ai-command.hf.space/, integrating it into HuggingChat would be highly beneficial. It would allow chat history to be saved, enable defining assistants and adjusting temperature settings, and ensure multibyte characters like Japanese are displayed correctly.
I believe it also performs better in providing general responses and understanding prompts.
However, there are cases where Command A refuses to answer even when Command R+ would provide a response. While Command A remains more permissive than most models, it has lost some flexibility compared to Command R+. Since Command A is not a direct successor and has significantly different characteristics, I hope it does not replace Command R+.
If Command R+ were to be removed in favor of Command A, retaining the more flexible Command R+ might be preferable, though it's a tough decision.

Sure, we hope Command A will be added to HuggingChat, replacing the now-unusable Nvidia Nemotron.

[screenshot]

lmao.......... DeepSeek R1-Distill-32B is hallucinating. What is this!?
(Note: QwQ-32B rarely hallucinates like this, and after multiple tries it corrects itself, but DeepSeek R1-Distill-32B does not.)

Please fix the DeepSeek R1-Distill-32B hallucinations; I don't know why random Chinese characters pop up right in front of our answers.
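Until an upstream fix lands, stray CJK characters can be stripped from responses client-side. A minimal Python sketch (the `strip_cjk` helper and the exact Unicode ranges are assumptions for illustration, not part of HuggingChat):

```python
import re

# Hypothetical client-side workaround: strip CJK ideographs and CJK
# punctuation that the distilled model sometimes emits before an answer.
# The chosen Unicode blocks are an assumption; extend them as needed.
CJK_PATTERN = re.compile(
    r"[\u4e00-\u9fff"   # CJK Unified Ideographs
    r"\u3400-\u4dbf"    # CJK Extension A
    r"\u3000-\u303f]"   # CJK symbols and punctuation
)

def strip_cjk(text: str) -> str:
    """Remove CJK characters while leaving Latin text untouched."""
    return CJK_PATTERN.sub("", text)

print(strip_cjk("好的Sure, here is the answer."))  # -> Sure, here is the answer.
```

This only hides the symptom; the underlying sampling or tokenizer issue would still need a server-side fix.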

Max tokens are limited to 4096, but I can confirm that google/gemma-3-27b-it is working as of now.

Max tokens are limited to 4096, but I can confirm that google/gemma-3-27b-it is working as of now.

Thanks. We also confirm that Gemma 3 is working.

Please don't forget to fix the DeepSeek R1-Distill-32B and QwQ-32B hallucinations; sometimes they output random Chinese characters for no reason.

Gemma 3 is working now. Sometimes it gives an input-stream error, but it works the majority of the time.

hi there!

Since DeepSeek R1-Distill-32B hallucinates pretty often, we wonder if this problem could be solved completely by replacing it with a much better distilled version of DeepSeek R1, such as the Distill-Llama-70B.
In the official DeepSeek R1 repo introduction, they said the 70B distill outperformed the 32B Qwen version in 4 out of 6 categories.

We hope this distilled 70B Llama model will replace the ageing 32B distill :(((



Plus, we hope the devs will consider adding the latest LG AI EXAONE Deep 32B, which is a huge competitor to QwQ-32B and DeepSeek R1-Distill-32B. It would be interesting to add another reasoning model to compete with the existing reasoning models.
https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-32B
