1
00:00:00,000 --> 00:00:06,600
Hello, dear listeners.
2
00:00:06,600 --> 00:00:09,980
Welcome to the weekend special of the Hugging Face Daily AI Papers Digest.
3
00:00:09,980 --> 00:00:14,280
Every Sunday we bring you a roundup of the most popular papers on Hugging Face from the past week.
4
00:00:14,280 --> 00:00:18,379
This episode covers the period from June 2 to June 8, 2025.
5
00:00:18,379 --> 00:00:25,199
In this episode we have selected five much-discussed papers, covering the use of reinforcement learning (RL)
6
00:00:25,199 --> 00:00:28,400
to enable self-improvement in large language models (LLMs),
7
00:00:28,399 --> 00:00:33,079
the role of high-entropy tokens in reasoning, prolonged reinforcement learning extending LLM reasoning,
8
00:00:33,079 --> 00:00:37,859
a test-time framework for fast and slow thinking in large models, and a cost-effective vision-
9
00:00:37,859 --> 00:00:39,500
language-action model.
10
00:00:39,500 --> 00:00:44,159
Now let's dive into these cutting-edge studies and explore the latest advances in AI.
11
00:00:44,159 --> 00:00:45,340
The show officially begins.
12
00:00:45,340 --> 00:00:53,500
The first paper in this episode is "Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning".
13
00:00:53,500 --> 00:00:57,039
This paper has received 169 upvotes in the Hugging Face community,
14
00:00:57,039 --> 00:00:59,759
reflecting its research value and the community's interest.
15
00:00:59,759 --> 00:01:04,879
The core goal of this paper is to improve the performance of large language models (LLMs),
16
00:01:04,879 --> 00:01:06,700
achieved through a new framework called Reflect,
17
00:01:06,700 --> 00:01:07,359
Retry,
18
00:01:07,359 --> 00:01:09,239
Reward.
19
00:01:09,239 --> 00:01:13,219
The key idea of this framework is to have the model reflect on itself after failing a task,
20
00:01:13,219 --> 00:01:14,400
analyze why it failed,
21
00:01:14,400 --> 00:01:17,799
and use these reflections to improve its performance on the next attempt.
22
00:01:17,799 --> 00:01:18,759
Specifically,
23
00:01:18,759 --> 00:01:22,099
after a failure the model generates a self-reflective commentary
24
00:01:22,099 --> 00:01:23,579
explaining what went wrong
25
00:01:23,579 --> 00:01:25,019
and suggesting improvements.
26
00:01:25,019 --> 00:01:28,179
The model then retries the task based on these reflections.
27
00:01:28,179 --> 00:01:29,879
If the second attempt succeeds,
28
00:01:29,879 --> 00:01:32,140
the content the model generated during the reflection phase
29
00:01:32,140 --> 00:01:34,920
is rewarded via an algorithm called Group Relative Policy Optimization
30
00:01:34,920 --> 00:01:36,699
(GRPO),
31
00:01:36,699 --> 00:01:39,239
which further optimizes its self-reflection ability.
32
00:01:39,239 --> 00:01:42,319
The paper experiments with multiple models,
33
00:01:42,319 --> 00:01:43,379
including Qwen2,
34
00:01:43,379 --> 00:01:44,519
Llama 3.1,
35
00:01:44,519 --> 00:01:45,599
and Phi-3.5-
36
00:01:45,599 --> 00:01:46,799
Mini-Instruct, among others,
37
00:01:46,799 --> 00:01:48,579
and is based on two main datasets:
38
00:01:48,579 --> 00:01:49,780
APIGen and Countdown.
39
00:01:49,780 --> 00:01:52,780
The APIGen dataset contains 60,000 high-quality function-calling examples,
40
00:01:52,780 --> 00:01:55,140
requiring the model to generate correct tool calls,
41
00:01:55,140 --> 00:01:56,299
while the Countdown dataset
42
00:01:56,299 --> 00:01:59,280
contains 450,000 lists of numbers paired with target numbers,
43
00:01:59,280 --> 00:02:03,000
requiring the model to form a correct equation from those numbers to reach the target.
44
00:02:03,000 --> 00:02:04,299
The results show that
45
00:02:04,299 --> 00:02:05,200
this Reflect,
46
00:02:05,200 --> 00:02:05,820
Retry,
47
00:02:05,820 --> 00:02:09,219
Reward approach is very effective at improving model performance,
48
00:02:09,219 --> 00:02:11,159
especially on the APIGen dataset,
49
00:02:11,159 --> 00:02:13,639
where the GRPO-trained Qwen2-7B model
50
00:02:13,639 --> 00:02:17,020
even outperforms the untrained Qwen2-72B model.
51
00:02:17,020 --> 00:02:17,639
Moreover,
52
00:02:17,639 --> 00:02:21,620
self-reflection significantly improves model performance on the Countdown dataset,
53
00:02:21,620 --> 00:02:24,379
especially for models whose initial performance was weak.
54
00:02:24,379 --> 00:02:26,000
The paper also notes that
55
00:02:26,000 --> 00:02:30,139
this self-reflection approach not only strengthens the model's ability to solve complex tasks
56
00:02:30,139 --> 00:02:33,599
but also lets smaller models surpass larger untrained ones,
57
00:02:33,599 --> 00:02:36,359
demonstrating its advantages in efficiency and generality.
58
00:02:36,359 --> 00:02:36,800
In addition,
59
00:02:36,800 --> 00:02:39,780
the study observed almost no catastrophic forgetting,
60
00:02:39,780 --> 00:02:43,380
indicating that the method also markedly improves model robustness.
61
00:02:43,380 --> 00:02:44,219
In summary,
62
00:02:44,219 --> 00:02:46,840
this paper proposes an innovative approach
63
00:02:46,840 --> 00:02:48,660
that uses reinforcement learning
64
00:02:48,660 --> 00:02:51,260
to let LLMs reflect on themselves and improve,
65
00:02:51,260 --> 00:02:53,800
thereby achieving better performance on complex tasks.
66
00:02:54,500 --> 00:02:57,300
Here is the second paper of this episode.
67
00:02:57,300 --> 00:02:59,300
Its title is "Beyond the 80/20 Rule:
68
00:02:59,300 --> 00:03:03,220
High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning".
69
00:03:03,219 --> 00:03:07,319
This paper currently has 130 upvotes in the Hugging Face community,
70
00:03:07,319 --> 00:03:10,120
showing that it has attracted wide attention in the research community.
71
00:03:10,120 --> 00:03:12,300
The paper's core research question is:
72
00:03:12,300 --> 00:03:16,400
in reinforcement learning with verifiable rewards
73
00:03:16,400 --> 00:03:17,379
(RLVR) for large language models (LLMs),
74
00:03:17,379 --> 00:03:20,120
how do different types of tokens affect reasoning performance,
75
00:03:20,199 --> 00:03:24,680
and can RLVR be made more effective by focusing on particular token types?
76
00:03:24,680 --> 00:03:26,719
The team puts forward a hypothesis:
77
00:03:26,719 --> 00:03:30,699
the high-entropy minority of tokens, acting as key branching points in the reasoning path,
78
00:03:30,699 --> 00:03:34,780
drives RLVR more effectively than the low-entropy majority. They further hypothesize that
79
00:03:34,780 --> 00:03:37,839
restricting policy-gradient updates to these high-entropy tokens
80
00:03:37,839 --> 00:03:41,699
can preserve or even improve performance while offering computational advantages.
81
00:03:41,699 --> 00:03:43,599
To test this hypothesis,
82
00:03:43,599 --> 00:03:46,079
the team designed detailed experiments.
83
00:03:46,199 --> 00:03:51,839
They chose the 8B, 14B, and 32B base models of the Qwen3 LLM family as study subjects,
84
00:03:51,839 --> 00:03:55,219
analyzing token-entropy patterns in chain-of-thought (CoT) reasoning,
85
00:03:55,219 --> 00:03:57,459
combined with controlled experiments that modulate token entropy,
86
00:03:57,460 --> 00:04:00,620
and selectively updating policy gradients during RLVR training.
87
00:04:00,620 --> 00:04:01,860
For data collection,
88
00:04:01,860 --> 00:04:04,939
they used datasets such as AIME24 and AIME25,
89
00:04:04,939 --> 00:04:07,580
and validated the results on multiple evaluation datasets.
90
00:04:07,580 --> 00:04:08,900
The experiments show that
91
00:04:08,900 --> 00:04:11,980
high-entropy tokens play a critical role in the reasoning process:
92
00:04:11,980 --> 00:04:14,760
they not only link the steps of logical reasoning,
93
00:04:14,760 --> 00:04:18,319
but adjusting their decoding temperature also significantly affects model performance.
94
00:04:18,319 --> 00:04:19,240
Specifically,
95
00:04:19,240 --> 00:04:21,819
lowering the temperature of high-entropy tokens degrades performance,
96
00:04:21,819 --> 00:04:24,060
while raising their temperature improves it.
97
00:04:24,060 --> 00:04:24,620
Moreover,
98
00:04:24,620 --> 00:04:27,980
RLVR preserves the base model's entropy patterns during training,
99
00:04:27,980 --> 00:04:30,420
mainly changing the entropy values of the high-entropy tokens.
100
00:04:30,420 --> 00:04:32,259
Most encouragingly,
101
00:04:32,259 --> 00:04:33,620
the team found that
102
00:04:33,620 --> 00:04:36,000
policy-gradient updates focused only on high-entropy tokens
103
00:04:36,000 --> 00:04:37,459
not only caused no drop in performance
104
00:04:37,459 --> 00:04:40,639
but actually improved reasoning significantly on the Qwen3 models.
105
00:04:40,639 --> 00:04:44,120
This finding matters for optimizing LLM reasoning,
106
00:04:44,120 --> 00:04:46,480
especially on complex reasoning tasks:
107
00:04:46,480 --> 00:04:50,399
focusing on high-entropy tokens balances exploration with training stability,
108
00:04:50,399 --> 00:04:52,560
bringing larger performance gains for the model.
109
00:04:52,560 --> 00:04:57,100
Overall, by deeply analyzing how token entropy affects reasoning performance,
110
00:04:57,100 --> 00:05:01,019
this paper reveals the key role that the high-entropy minority of tokens plays in driving LLM reasoning,
111
00:05:01,019 --> 00:05:04,720
offering new ideas and methods for future LLM optimization.
112
00:05:04,720 --> 00:05:08,220
Here is the third paper of this episode.
113
00:05:08,220 --> 00:05:09,180
Its title is "ProRL:
114
00:05:09,180 --> 00:05:12,760
Prolonged Reinforcement Learning Expands the Reasoning Boundaries of Large Language Models".
115
00:05:12,760 --> 00:05:16,600
This paper currently has 115 upvotes in the Hugging Face community,
116
00:05:16,600 --> 00:05:19,680
showing broad interest from the research community.
117
00:05:19,680 --> 00:05:21,920
Its core research question is:
118
00:05:21,920 --> 00:05:26,820
can prolonged reinforcement learning training uncover new reasoning strategies in large language models,
119
00:05:26,819 --> 00:05:30,779
strategies that the base model cannot reach even under extensive sampling?
120
00:05:30,779 --> 00:05:32,639
The team hypothesizes that,
121
00:05:32,639 --> 00:05:34,779
through prolonged reinforcement learning training,
122
00:05:34,779 --> 00:05:38,279
a model can extend its reasoning ability beyond that of its base model,
123
00:05:38,279 --> 00:05:40,079
discover new solution paths,
124
00:05:40,079 --> 00:05:42,079
and perform better across a variety of tasks.
125
00:05:42,079 --> 00:05:43,519
To test this hypothesis,
126
00:05:43,519 --> 00:05:46,719
the team designed a new training method called ProRL,
127
00:05:46,719 --> 00:05:49,360
which combines KL-divergence control,
128
00:05:49,360 --> 00:05:52,259
reference-policy resets, and a diverse suite of tasks.
129
00:05:52,259 --> 00:05:54,579
They experimented with three models:
130
00:05:54,579 --> 00:05:55,939
DeepSeek-R1-Distill-Qwen-1.
131
00:05:55,939 --> 00:05:57,560
5B as the base model,
132
00:05:57,560 --> 00:05:59,779
Nemotron-Research-Reasoning-Qwen-1.5B
133
00:05:59,779 --> 00:06:01,660
as the ProRL-trained model,
134
00:06:01,660 --> 00:06:04,519
and DeepSeek-R1-Distill-Qwen-7B for comparison.
135
00:06:04,519 --> 00:06:05,600
During the experiments,
136
00:06:05,600 --> 00:06:09,100
ProRL training involved more than 2,000 reinforcement learning steps,
137
00:06:09,100 --> 00:06:11,819
with a KL-divergence penalty introduced to maintain entropy
138
00:06:11,819 --> 00:06:13,220
and prevent policy drift.
139
00:06:13,220 --> 00:06:14,980
The reference policy is reset periodically
140
00:06:14,980 --> 00:06:16,279
to allow continued improvement.
141
00:06:16,279 --> 00:06:18,060
The training data covers math,
142
00:06:18,060 --> 00:06:18,759
code,
143
00:06:18,759 --> 00:06:19,120
STEM
144
00:06:19,120 --> 00:06:21,560
logic puzzles, instruction following, and other task types,
145
00:06:21,560 --> 00:06:24,480
together building a collection of 136,000 examples
146
00:06:24,480 --> 00:06:25,800
as a diverse training dataset.
147
00:06:25,800 --> 00:06:27,160
The results show that
148
00:06:27,160 --> 00:06:29,259
the RL-trained model
149
00:06:29,259 --> 00:06:30,620
performs, across a variety of tasks,
150
00:06:30,620 --> 00:06:32,100
significantly better than the base model.
151
00:06:32,100 --> 00:06:32,700
For example,
152
00:06:32,700 --> 00:06:33,900
on math tasks
153
00:06:33,900 --> 00:06:36,900
the pass@1 improvement reaches 14.7%,
154
00:06:36,900 --> 00:06:39,700
on coding tasks 13.9%,
155
00:06:39,700 --> 00:06:42,640
on logic puzzles 54.8%,
156
00:06:42,640 --> 00:06:45,860
on STEM reasoning tasks 25.1%,
157
00:06:45,860 --> 00:06:49,080
and on instruction-following tasks 18.1%.
158
00:06:49,080 --> 00:06:49,439
In addition,
159
00:06:49,439 --> 00:06:50,540
the study found that
160
00:06:50,540 --> 00:06:52,540
ProRL training, even beyond 2,000 steps,
161
00:06:52,540 --> 00:06:54,860
continues to improve model performance.
162
00:06:54,860 --> 00:06:57,220
The paper also introduces a creativity index
163
00:06:57,220 --> 00:06:59,160
to quantify the novelty of reasoning paths.
164
00:06:59,160 --> 00:07:00,180
The results indicate that
165
00:07:00,180 --> 00:07:01,879
prolonged reinforcement learning training
166
00:07:01,879 --> 00:07:04,560
does indeed produce more innovative solutions.
167
00:07:04,560 --> 00:07:05,360
This finding
168
00:07:05,360 --> 00:07:06,379
challenges earlier conclusions that
169
00:07:06,379 --> 00:07:07,500
RL-trained models
170
00:07:07,500 --> 00:07:09,620
do not acquire new reasoning abilities.
171
00:07:09,620 --> 00:07:10,420
Overall,
172
00:07:10,420 --> 00:07:12,520
this paper provides new insight,
173
00:07:12,520 --> 00:07:14,259
showing under what conditions
174
00:07:14,259 --> 00:07:17,560
reinforcement learning can effectively expand a language model's reasoning boundaries.
175
00:07:17,560 --> 00:07:18,920
The findings suggest that
176
00:07:18,920 --> 00:07:21,500
with stable and prolonged reinforcement learning training,
177
00:07:22,540 --> 00:07:24,080
models can develop new reasoning patterns that exceed
178
00:07:24,080 --> 00:07:25,800
the initial capabilities of their base model.
179
00:07:25,800 --> 00:07:29,080
For the fourth paper of this episode,
180
00:07:29,080 --> 00:07:30,220
we turn to a study of
181
00:07:30,220 --> 00:07:31,480
a reasoning framework named AlphaOne (α1)
182
00:07:31,480 --> 00:07:33,120
that, at test time, drives large models
183
00:07:33,120 --> 00:07:35,340
to think fast and slow.
184
00:07:35,340 --> 00:07:37,740
This paper currently has
185
00:07:37,740 --> 00:07:39,180
89 upvotes in the Hugging Face community,
186
00:07:39,180 --> 00:07:42,660
showing wide attention from both academia and the developer community.
187
00:07:42,660 --> 00:07:46,200
The paper's core goal is to tackle the challenge of how large reasoning models
188
00:07:46,200 --> 00:07:47,860
(LRMs), at test time,
189
00:07:47,860 --> 00:07:50,140
can dynamically modulate their reasoning process.
190
00:07:50,139 --> 00:07:52,539
The researchers propose a framework named AlphaOne
191
00:07:52,539 --> 00:07:53,919
(α1),
192
00:07:53,919 --> 00:07:56,879
aimed at improving the reasoning ability and efficiency of LRMs.
193
00:07:56,879 --> 00:07:57,839
Put simply,
194
00:07:57,839 --> 00:07:59,560
AlphaOne works at test time
195
00:07:59,560 --> 00:08:02,099
by dynamically scheduling the transition between slow thinking and fast thinking,
196
00:08:02,099 --> 00:08:06,680
helping the model strike a balance between deep analysis and computational efficiency.
197
00:08:06,680 --> 00:08:07,379
Concretely,
198
00:08:07,379 --> 00:08:11,180
the team used three open-source LRMs as base models:
199
00:08:11,180 --> 00:08:12,719
DeepSeek-R1-
200
00:08:12,719 --> 00:08:14,180
Distill-Qwen-1.5B,
201
00:08:14,180 --> 00:08:15,079
DeepSeek-R1-
202
00:08:15,079 --> 00:08:17,379
Distill-Qwen-7B, and QwQ-32B.
203
00:08:17,379 --> 00:08:18,899
They ran experiments on six benchmarks covering math,
204
00:08:18,899 --> 00:08:22,279
programming, and science,
205
00:08:22,279 --> 00:08:23,699
including AIME 2024,
206
00:08:23,699 --> 00:08:24,779
AMC23,
207
00:08:24,779 --> 00:08:25,759
and Minerva Math, among others.
208
00:08:25,759 --> 00:08:29,339
The experiments ran on NVIDIA L40S and A100 GPUs,
209
00:08:29,339 --> 00:08:32,480
ensuring ample compute and reliable results.
210
00:08:32,480 --> 00:08:37,120
The paper's main innovation is the concept of the alpha moment (α moment):
211
00:08:37,120 --> 00:08:39,659
by modulating the pre-α and post-α phases,
212
00:08:39,659 --> 00:08:43,340
AlphaOne can effectively scale LRMs at test time.
213
00:08:43,340 --> 00:08:45,320
Through comparison experiments, the researchers also
214
00:08:45,320 --> 00:08:47,899
verified AlphaOne's significant improvements on problem-solving accuracy
215
00:08:47,899 --> 00:08:49,680
and reasoning-efficiency
216
00:08:49,680 --> 00:08:51,700
metrics.
217
00:08:51,700 --> 00:08:53,759
For example, the 1.5B model,
218
00:08:53,759 --> 00:08:54,920
after applying AlphaOne,
219
00:08:54,920 --> 00:08:58,039
improves problem-solving accuracy by 6.15%
220
00:08:58,039 --> 00:09:00,480
while cutting token length by 14%.
221
00:09:00,480 --> 00:09:02,220
The results show that
222
00:09:02,220 --> 00:09:06,379
AlphaOne not only surpasses conventional test-time scaling methods in accuracy,
223
00:09:06,379 --> 00:09:07,899
such as s1 and Chain of Draft,
224
00:09:07,899 --> 00:09:10,220
but also excels in reasoning efficiency.
225
00:09:10,220 --> 00:09:11,060
In particular,
226
00:09:11,060 --> 00:09:14,300
the paper finds that a linear schedule from slow thinking to fast thinking
227
00:09:14,300 --> 00:09:16,440
delivers the highest reasoning accuracy,
228
00:09:16,440 --> 00:09:20,279
suggesting that slow thinking plays a key role in improving reasoning efficiency.
229
00:09:20,279 --> 00:09:21,180
Overall,
230
00:09:21,180 --> 00:09:25,860
AlphaOne gives large reasoning models a general framework for modulating the reasoning process,
231
00:09:25,860 --> 00:09:28,620
showing how dynamic switching between slow and fast thinking
232
00:09:28,620 --> 00:09:30,800
can effectively boost a model's reasoning ability.
233
00:09:30,799 --> 00:09:34,839
This work not only offers new ideas for the practical use of LRMs
234
00:09:34,839 --> 00:09:38,719
but also provides valuable experience for optimizing model reasoning at test time.
235
00:09:38,719 --> 00:09:44,899
That concludes our introduction to AlphaOne, the framework that drives large models to think fast and slow at test time.
236
00:09:44,899 --> 00:09:48,439
Here is the fifth paper of this episode.
237
00:09:48,439 --> 00:09:48,939
Its title is "SmolVLA:
238
00:09:48,939 --> 00:09:52,439
A Vision-
239
00:09:52,439 --> 00:09:53,079
Language-
240
00:09:53,079 --> 00:09:54,059
Action Model for Affordable, Efficient Robots".
241
00:09:54,059 --> 00:09:58,000
This paper currently has 75 upvotes in the Hugging Face community.
242
00:09:58,000 --> 00:10:00,980
Its core goal is to address the problems that existing large-scale vision-
243
00:10:00,980 --> 00:10:01,600
language-
244
00:10:01,600 --> 00:10:02,299
action
245
00:10:02,299 --> 00:10:02,779
(VLA)
246
00:10:02,779 --> 00:10:07,379
models face in robotics: high training costs and difficulty of real-world deployment.
247
00:10:07,379 --> 00:10:09,879
The team poses a key question:
248
00:10:09,879 --> 00:10:11,679
can one develop a small,
249
00:10:11,679 --> 00:10:13,980
efficient, community-driven VLA model
250
00:10:13,980 --> 00:10:16,360
that sharply cuts training and inference costs
251
00:10:16,360 --> 00:10:19,319
while remaining competitive on robotic tasks?
252
00:10:19,319 --> 00:10:20,720
The paper's answer is SmolVLA,
253
00:10:20,720 --> 00:10:22,579
a compact VLA model
254
00:10:22,579 --> 00:10:26,179
designed specifically for single-GPU training and deployment on consumer-grade devices.
255
00:10:26,179 --> 00:10:29,740
By leveraging community-collected data and an asynchronous inference technique,
256
00:10:29,740 --> 00:10:33,539
SmolVLA achieves performance on par with much larger models.
257
00:10:33,539 --> 00:10:34,419
Methodologically,
258
00:10:34,419 --> 00:10:37,019
SmolVLA consists of a compact pretrained vision-
259
00:10:37,019 --> 00:10:40,259
language model (VLM) and an action expert.
260
00:10:40,259 --> 00:10:42,240
The VLM processes language instructions,
261
00:10:42,240 --> 00:10:44,620
RGB images, and the robot's sensor states,
262
00:10:44,620 --> 00:10:48,919
while the action expert is trained with alternating cross-attention and self-attention blocks
263
00:10:48,919 --> 00:10:50,299
and outputs low-level actions.
264
00:10:50,299 --> 00:10:51,259
For datasets,
265
00:10:51,259 --> 00:10:55,980
the team used a subset of 481 community datasets from Hugging Face,
266
00:10:55,980 --> 00:10:57,879
along with a new MetaWorld dataset
267
00:10:57,879 --> 00:11:00,679
and several real-world robot manipulation task datasets.
268
00:11:00,679 --> 00:11:01,820
During training,
269
00:11:01,820 --> 00:11:03,639
SmolVLA is pretrained with imitation learning
270
00:11:03,639 --> 00:11:05,639
on the community datasets,
271
00:11:05,639 --> 00:11:07,299
and an off-the-shelf VLM,
272
00:11:07,299 --> 00:11:08,419
such as Qwen2.5-
273
00:11:08,419 --> 00:11:09,860
VL-3B-Instruct,
274
00:11:09,860 --> 00:11:11,220
is used to automatically generate task descriptions
275
00:11:11,220 --> 00:11:12,639
to improve the task annotations.
276
00:11:12,639 --> 00:11:13,559
At inference time,
277
00:11:13,559 --> 00:11:14,700
the asynchronous inference technique
278
00:11:14,700 --> 00:11:17,340
decouples action execution from observation processing and action prediction,
279
00:11:17,340 --> 00:11:19,320
which raises the control frequency
280
00:11:19,320 --> 00:11:21,080
and shortens task completion time.
281
00:11:21,080 --> 00:11:22,059
In evaluations,
282
00:11:22,059 --> 00:11:26,279
SmolVLA performs strongly on both simulated and real-world robotics benchmarks,
283
00:11:26,279 --> 00:11:29,740
especially on pick, place, stack, and sort tasks,
284
00:11:29,740 --> 00:11:31,299
outperforming other VLA models.
285
00:11:31,299 --> 00:11:32,259
Asynchronous inference
286
00:11:32,259 --> 00:11:35,839
also reduces task completion time by about 30%.
287
00:11:35,839 --> 00:11:36,959
The paper concludes that
288
00:11:36,959 --> 00:11:39,000
by leveraging community-driven datasets,
289
00:11:39,000 --> 00:11:41,600
an optimized model architecture, and asynchronous inference,
290
00:11:41,600 --> 00:11:43,240
a compact, efficient VLA model
291
00:11:43,240 --> 00:11:45,720
can achieve competitive performance on robotic tasks.
292
00:11:45,720 --> 00:11:47,299
SmolVLA successfully demonstrates
293
00:11:47,299 --> 00:11:49,720
the feasibility of building affordable, efficient VLA models,
294
00:11:49,720 --> 00:11:52,240
opening up new possibilities for robotics research
295
00:11:52,240 --> 00:11:55,419
and enabling more real-world applications under limited resources.
296
00:11:55,419 --> 00:11:59,139
That's all for this episode.
297
00:11:59,139 --> 00:12:00,459
Thank you for listening.
298
00:12:00,459 --> 00:12:02,059
If you enjoyed this episode,
299
00:12:02,059 --> 00:12:03,539
feel free to leave a comment,
300
00:12:03,539 --> 00:12:04,159
like,
301
00:12:04,159 --> 00:12:04,740
share,
302
00:12:04,740 --> 00:12:05,979
and subscribe to our show.
303
00:12:05,979 --> 00:12:06,559
Also,
304
00:12:06,559 --> 00:12:08,659
don't forget to follow our Xiaohongshu account,
305
00:12:08,659 --> 00:12:09,199
ISOD
306
00:12:09,199 --> 00:12:10,539
See you in the next episode.
307
00:12:10,539 --> 00:12:12,179
Hayae