- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 40
- SubGen: Token Generation in Sublinear Time and Memory
  Paper • 2402.06082 • Published • 10
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 46
- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 74
Purushotam Radadia
PG
AI & ML interests: None yet
Organizations: None yet
Collections: 2
Models: 2
Datasets: None public yet