Purushotam Radadia

AI & ML interests: None yet
Organizations: None yet
LLM

- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention (Paper • 2312.07987 • Published • 41)
- SubGen: Token Generation in Sublinear Time and Memory (Paper • 2402.06082 • Published • 12)
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models (Paper • 2402.13064 • Published • 50)
- SaulLM-7B: A pioneering Large Language Model for Law (Paper • 2403.03883 • Published • 90)
Vision

Datasets: 0 (none public yet)