Upload The AI Revolution_ A Debate.wav
https://www.youtube.com/watch?v=vei7uf9wOxI
@adamkadmon6339
2 weeks ago
Machine learning used to be about maths. Now it is about hardware, hype, opinions, and moving big software blocks around. A generation has come into being that lacks the basic ability to make the kinds of jumps in theory made in the 1980s and 1990s.
@KevinKreger
2 weeks ago
OK boomer
@RickySupriyadi
2 weeks ago
I disagree; scaling does make something emerge.
@marilynlucas5128
2 weeks ago
😂 Exactly. I spent time listening to the presentation and heard nothing important. I’m trying to understand the mathematical framework behind what he was presenting. The easiest way to convey your message in machine learning is to use mathematics, algorithms, etc. If I don’t hear any of that, I can’t take the presentation seriously.
@bertobertoberto3
2 weeks ago
Welcome to engineering
@adamkadmon6339
2 weeks ago
@KevinKreger Just a fact. Not generationism.
@adamkadmon6339
2 weeks ago
@RickySupriyadi Scaling: amazing results. But not maths. Not a theory. Not a method. Not an algorithm.
@adamkadmon6339
2 weeks ago
@bertobertoberto3 Need both. One disappeared.
@MoreCompute
2 weeks ago
Why do you say this?
Have you not seen the video?
@ianmatejka3533
2 weeks ago
100% wrong. The transformer has been proven as a general-purpose architecture across multiple domains. Researchers are now focused on more efficient learning algorithms. Reinforcement learning has improved by leaps and bounds with MCTS-based methods. These approaches will offer more gains than trying to reinvent the transformer.
@adamkadmon6339
2 weeks ago
@MoreCompute I checked the paper. Sure it's good work. Thorough. Yes there is maths in it. But let me elaborate, since my initial remark was a bit throwaway.
The pioneers looked at the deficiencies of symbolic AI, and dragged neural nets into existence. A few people did it. Now an army of ML'ers look at ARC (for example) and do test-time fine-tuning, basically because they can't think of anything else. Training a network for everything that happens to you is not what we do, and it is not practical in general. Chollet is right. People are avoiding the question, following trends, failing to innovate, failing to provide new maths or theories, basically using backprop, here with active data selection, both very old ideas. People are being professional, competent and derivative, not thoughtful, clever and innovative.
@adamkadmon6339
2 weeks ago
@ianmatejka3533 You're wrong. Even the transformer was just an architectural innovation on old ideas. No new maths in it except location embedding, which is godawful. RL, MCTS: both old now. I don't dispute the scaling successes or the remarkable LLM results. But the research culture is faddish and lacks mathematical intelligence. If you can't see past the transformer, you are just a symptom of this problem. Chollet's critique of LLMs is correct, and also proven, and training a network for every example input is, at base, a dumb thing to do.
@badrraitabcas
2 weeks ago
I don’t know how familiar you are with AI. Historically, AI and mathematical guarantees haven’t gone hand in hand. It seems that empirical results are what has been driving the connectionist era. By AI I mean deep learning, obviously.
@adamkadmon6339
2 weeks ago
@badrraitabcas Sorry - writing anonymously. I know the people and history very well. I had backprop coded up before it was published, back when we hand-coded gradients. In the beginning of connectionism many tinkered. This gave way to strong theory - everything that is used today. Successful empirical results followed mathematically motivated ideas. Both are needed. I am not suggesting Jonas is tinkering. But it is as clear as day that a better idea is required for training on a single datapoint.
@adamkadmon6339
2 weeks ago
Also: there is no money to be made from a mathematical advance unless you keep it secret. It will be out of your fingers before you know it. So the incentive structure of the field is screwed up and rewards those who pytorch for companies, rather than those who invest in technically deep formalisms. This may go some way towards explaining the shallowness of the culture.
@ianmatejka3533
2 weeks ago
Calling transformers an architectural innovation of old ideas is overly reductive. LeNet introduced convolutional networks in 1989, and they dominated deep learning for decades. By that logic, all neural networks trace back to Ivakhnenko. Transformers, however, emerged from a cascade of innovations: Word2Vec proved we could learn embeddings, self-attention was explored in earlier networks, and transformers only appeared 7 years ago. Modern LLMs like GPT are even younger, with general intelligence capabilities emerging in just the last 3 years. Incredibly, what once required 175B parameters can now be done more efficiently with 8B models at 4-bit precision.
@ianmatejka3533
2 weeks ago
Reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) are also far from “old.” The generalization of MCTS without human input was only achieved 5 years ago with MuZero. Integrating MCTS with NLP is still a novel problem, tackled only recently by OpenAI. Other RL techniques, like self-play (SimPO, SPIN), are in their infancy and hold significant potential.
@ianmatejka3533
2 weeks ago
Optimizers and learning algorithms are rapidly evolving too. DPO and KTO have made fine-tuning more stable and accessible. Tokenization, a critical bottleneck for LLMs, is seeing progress with solutions like MegaByte, though they remain underexplored. Similarly, in-context learning and mechanistic interpretability are promising but under-researched areas.
@ianmatejka3533
2 weeks ago
The transformer isn’t the final AI architecture, but we’ve barely begun exploring its potential. There’s a wealth of low-hanging fruit yet to be studied. While Chollet’s critique has merit, many experts (such as Ilya Sutskever, Amodei, Shazeer, and Karpathy) believe transformers are sufficient for AGI, and that optimizing this architecture should take priority over reinventing the wheel.
@rjDOTdev
2 weeks ago
@adamkadmon6339 Coming from an education background, I don't know anyone who doesn't need some practice before answering a novel question. Perhaps there are different goals here? I.e., human intelligence being a distinct goal from superintelligence.
@henrismith7472
2 weeks ago
Are you nerds really bickering about this instead of comprehending the magnitude of change this technology will bring? Even if everything stopped advancing where it is, and time was spent implementing it, I still don't think we understand the scale of impact that would have. The fact that we have 2 more scaling laws, extremely efficient and powerful chips being invented (e.g., thermodynamic computation), countless converging exponentials and positive feedback loops... To an outsider who's only just started learning how to code, it really seems like some of you need to zoom out a bit, if that makes sense.
- .gitattributes +1 -0
- The AI Revolution_ A Debate.wav +3 -0
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+The[[:space:]]AI[[:space:]]Revolution_[[:space:]]A[[:space:]]Debate.wav filter=lfs diff=lfs merge=lfs -text
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b8d22e1ac6162b206bc4b91081eaa800ba1b68ed5dcd4abca9f8c8f7886616ac
+size 43359404
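The three added lines above are a Git LFS pointer: the repository stores this small text stub, while the actual .wav lives in LFS storage and is identified by its SHA-256 digest and byte size. As a rough sketch of how a downloaded copy could be checked against the pointer, the Python below parses the pointer text and compares digest and size. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, written for illustration only; they are not part of Git LFS or any library.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:b8d22e1ac6162b206bc4b91081eaa800ba1b68ed5dcd4abca9f8c8f7886616ac
size 43359404
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of an LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and SHA-256 digest.

    Assumes the pointer uses sha256, as the one above does.
    """
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large audio files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

With a local copy of the audio file, `matches_pointer("The AI Revolution_ A Debate.wav", pointer)` would report whether that copy is the object the pointer refers to.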