Oscar Chen

ogchen

AI & ML interests

None yet

Organizations

None yet

ogchen's activity

upvoted a paper 5 days ago

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Paper • 2502.09604 • Published 6 days ago • 29

upvoted 2 articles 6 days ago

Article

Build awesome datasets for video generation

7 days ago

• 24

Article

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

7 days ago

• 47

reacted to Kseniase's post with 🔥 8 days ago

Post

7581

8 New Types of RAG

RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.

Here's a list of the 8 latest RAG advancements:

1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when to rely on parametric reasoning.

2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
Enhances novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gaps, improve realism, and reduce distortions.

3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step by step and adjusts it along the way, reformulating queries when needed. It also decides how much compute to use at test time.

4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing using a dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.

5. CFT-RAG -> CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A tree-RAG acceleration method that uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.

6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within a graph structure and capture citation relationships.

7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections.

8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots.

1 reply

upvoted a paper 12 days ago

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

Paper • 2502.03544 • Published 14 days ago • 40

liked 2 models 12 days ago

openai/whisper-large-v3-turbo

Automatic Speech Recognition • Updated Oct 4, 2024 • 8.78M • 1.99k

facebook/seamless-m4t-v2-large

Automatic Speech Recognition • Updated Jan 4, 2024 • 177k • 774

reacted to hexgrad's post with 👍 12 days ago

Post

5537

I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p

G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit.

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.

Kokoro instead relies on G2P preprocessing, is 82M parameters, and thus needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data, and deliver solid speech for those voices. In turn, this excellent audio quality & lack of background noise helps explain why Kokoro is very competitive in single-voice TTS Arenas.

upvoted 2 articles 13 days ago

Article

Train 400x faster Static Embedding Models with Sentence Transformers

Jan 15

• 145

Article

Rearchitecting Hugging Face Uploads and Downloads

Nov 26, 2024

• 43

updated a collection 6 months ago

Work

Collection

1 item • Updated Aug 9, 2024