|
---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- rag
library_name: transformers
---
|
|
|
|
|
<div align="center"> |
|
<b style="font-size: 40px;">Ext2Gen-8B-R2</b> |
|
</div> |
|
|
|
Note: This model card is still a work in progress.
|
|
|
Are you looking for a more robust and reliable generation model for your RAG system?

Ext2Gen-8B-R2 effectively mitigates the hallucinations caused by retrieval noise and information overload.
|
|
|
See the details in our [paper](https://arxiv.org/pdf/2503.04789).
|
|
|
|
|
### What is Ext2Gen-8B-R2? |
|
Ext2Gen-8B-R2 is built upon Llama-3.1-8B-Instruct and fine-tuned for preference alignment through pairwise feedback learning.
|
|
|
This training strategy enables the model to: |
|
- Extract highly relevant sentences from retrieved chunks before generating an answer. |
|
- Filter out irrelevant or misleading information, reducing hallucinations. |
|
- Align generation with human preferences by optimizing for faithfulness, completeness, and conciseness. |
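For intuition, pairwise feedback learning trains on pairs of a preferred and a dispreferred response to the same prompt. Below is a minimal sketch of what one such record could look like; the field names and contents are illustrative assumptions, not the actual training schema:

```python
# Hypothetical preference record (illustrative only, not the authors' data format).
preference_pair = {
    "prompt": "### Query:\n...\n\n### Chunk List:\n[Chunk ID: 1] ...",
    # Preferred: faithful to the chunks, complete, and concise.
    "chosen": "Extracted Sentences:\n- ...\n\nAnswer: ...",
    # Dispreferred: e.g., hallucinated content or an incomplete answer.
    "rejected": "Extracted Sentences:\n- ...\n\nAnswer: ...",
}
```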
|
|
|
|
|
### Why does Ext2Gen-8B-R2 outperform standard RAG models? |
|
Standard RAG models often struggle due to: |
|
- Uncertain Placement – Relevant information may appear in unpredictable locations within retrieved chunks, making it difficult for LLMs to utilize it effectively. |
|
- Information Overload – The presence of irrelevant chunks can distract the model, leading to errors or hallucinations. |
|
- Lack of Alignment – Most generation models are not explicitly trained to prioritize relevant content over noise. |
|
|
|
|
|
### Need Faster Inference?
|
Ext2Gen first writes out the sentences relevant to the query and only then generates the answer, so the answer arrives with extra latency.
|
|
|
If you don't need to see the extracted sentences and want the answer directly with lower latency, use its variant, Gen-8B-R2.
|
|
|
Link: https://huggingface.co/DISLab/Gen-8B-R2 |
|
|
|
This model skips the sentence extraction phase but retains robustness comparable to Ext2Gen-8B-R2.
|
|
|
|
|
### Recommended Prompt |
|
|
|
- `query`: the query to answer

- `chunk_list`: the list of retrieved chunks, e.g., `["chunk 1", "chunk 2", "chunk 3"]`
|
|
|
```python
from transformers import AutoTokenizer

# Repo id assumed from this model card; adjust if you host the model elsewhere.
tokenizer = AutoTokenizer.from_pretrained("DISLab/Ext2Gen-8B-R2")


def prepare_sample_text(prompt):
    # Wrap the prompt as a single user turn and apply the model's chat template.
    row_json = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(row_json, tokenize=False)


def format_prompt_template(query, chunk_list):
    # Prefix each chunk with a 1-based ID, then join the chunks with blank lines.
    chunk_list = ['[Chunk ID: ' + str(idx + 1) + '] ' + chunk_text
                  for idx, chunk_text in enumerate(chunk_list)]
    chunk_list = '\n\n'.join(chunk_list)
|
|
|
    prompt = '''
|
You are an expert assistant trained to extract essential sentences from document chunks and generate answers based on the extracted sentences. |
|
Your task is twofold: |
|
- Extraction: Identify sentences that contribute to constructing a precise and accurate response to the given query. |
|
- Generation: Formulate a concise and coherent answer based on the extracted sentences. |
|
|
|
|
|
### Extraction Instruction: |
|
- A query will be provided for you to answer. |
|
- Extract only the sentences that contribute to forming an answer to the query. |
|
- Ensure that the extracted sentences are sufficient to derive a correct and complete answer. |
|
- If no relevant sentences are found in the provided chunks, return an empty list. |
|
|
|
|
|
### Generation Instruction: |
|
- Use the extracted sentences to generate a well-formed answer to the query. |
|
- If no sentences are extracted, return "No Answer". |
|
|
|
|
|
### Output Example: |
|
Extracted Sentences: |
|
- Sentence 1 |
|
- Sentence 2 |
|
|
|
Answer: Your Answer |
|
|
|
|
|
### Query: |
|
%s |
|
|
|
|
|
### Chunk List: |
|
%s |
|
|
|
|
|
### Output: |
|
    ''' % (query, chunk_list)

    return prompt.strip()
|
|
|
|
|
# `query` is your question; `noisy_chunks` is the list of retrieved chunk strings.
prompt = format_prompt_template(query, noisy_chunks)
prompt = prepare_sample_text(prompt)
|
``` |
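For illustration, here is a toy invocation of the helpers above; the query and chunks are made up to echo the example output shown further down:

```python
query = "How many people died at Chelmno?"
noisy_chunks = [
    "The estimated number of deaths is 150-300,000, mainly Jews.",    # relevant chunk
    "The site of the camp now hosts a memorial and a small museum.",  # distractor chunk
]

prompt = prepare_sample_text(format_prompt_template(query, noisy_chunks))
```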
|
|
|
|
|
Note that this prompt produces both the extracted relevant sentences and the answer to the query.

The output follows a consistent format, as shown in the example below.
|
|
|
``` |
|
Extracted Sentences: |
|
- The estimated number of deaths is 150-300,000, mainly Jews. |
|
|
|
Answer: The estimated number of deaths at Chelmno is 150-300,000, mainly Jews. |
|
``` |
|
|
|
The number of extracted sentences varies depending on the QA pair.
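Since the format is stable, both fields can be recovered with a small parser. Here is a minimal sketch (a hypothetical helper, assuming the exact `Extracted Sentences:` / `Answer:` markers shown above):

```python
def parse_output(text):
    # Split the model output into the extracted sentences and the final answer.
    extracted, answer = [], "No Answer"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- "):
            extracted.append(line[2:])              # one bullet per extracted sentence
        elif line.startswith("Answer:"):
            answer = line[len("Answer:"):].strip()  # text after the "Answer:" marker
    return extracted, answer
```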
|
|
|
### Recommended Generation Parameters |
|
|
|
```python |
|
max_new_tokens=1024, # or 2048 |
|
do_sample=True, |
|
temperature=0.8, |
|
top_p=0.9, |
|
``` |
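Putting it all together, below is a minimal end-to-end sketch using the plain `transformers` generation API with the parameters above; the loading options are reasonable defaults, not a prescribed pipeline:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DISLab/Ext2Gen-8B-R2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# `prompt` is the chat-templated string built by the helpers above.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the echoed prompt.
completion = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```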
|
|
|
### Performance Benchmark |
|
Our evaluations demonstrate that Ext2Gen-8B-R2 significantly enhances robustness in RAG systems: |
|
* We run a QA task with RAG systems on the NQ, MS-MARCO, and HotpotQA datasets.

* The only difference between the compared systems is the generation backbone: Llama-3.1-8B-Instruct vs. Ext2Gen-8B-R2.
|
|
|
See the results in the figure below:
|
|
|
 |
|
|