Trangle
AI & ML interests
None yet
Recent Activity
upvoted an article 1 day ago: Open-source DeepResearch – Freeing our search agents
liked a model 1 day ago: unsloth/DeepSeek-R1
liked a dataset 2 days ago: Jiayi-Pan/Countdown-Tasks-3to4
Organizations
Trangle's activity
Is it possible that this is a small model of GPT-3.5?
1
#6 opened 9 months ago by Trangle
Why a different architecture from mini and medium?
5
#5 opened 9 months ago by winddude
Excuse me, how do I load or use this dataset? Thanks.
#1 opened 10 months ago by Trangle
Please authorize access to the base weights!
44
#5 opened 11 months ago by Undi95
Is there documentation for quantization alignment in long text?
2
#4 opened about 1 year ago by Trangle
For the original 200k context, would it be better to apply an NTK patch with 4k?
#5 opened about 1 year ago by Trangle
Curious why current large models aren't compared against Tongyi Qianwen (Qwen); Qwen performs better than Baichuan in comprehension and instruction following.
2
#2 opened over 1 year ago by zhenghui
FlashAttention still needs to be disabled during inference; with it enabled, the output is currently garbled.
1
#27 opened over 1 year ago by Trangle
There is an unfixed bug in the models you trained on llama2.
9
#1 opened over 1 year ago by Trangle