Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs Paper • 2411.08719 • Published Nov 10, 2024
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs Paper • 2412.14471 • Published Dec 19, 2024
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published 8 days ago • 1
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 53
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps Paper • 2412.15035 • Published Dec 19, 2024 • 4
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published 16 days ago • 19
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published 16 days ago • 6
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paper • 2501.10057 • Published Jan 17 • 8
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models Paper • 2501.00874 • Published Jan 1 • 13
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering Paper • 2411.09213 • Published Nov 14, 2024 • 7
Agent Skill Acquisition for Large Language Models via CycleQD Paper • 2410.14735 • Published Oct 16, 2024 • 2
Taipan: Efficient and Expressive State Space Language Models with Selective Attention Paper • 2410.18572 • Published Oct 24, 2024 • 18
A Comparative Study on Generative Models for High Resolution Solar Observation Imaging Paper • 2304.07169 • Published Apr 14, 2023
DataComp: In search of the next generation of multimodal datasets Paper • 2304.14108 • Published Apr 27, 2023 • 2
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs Paper • 2111.02114 • Published Nov 3, 2021