jaspercatapang/R-star

Text Classification

jaspercatapang committed on 21 days ago

Commit f177d8c • 1 Parent(s): d58d533

Update README.md

Files changed (1)

README.md +1 -1

@@ -12,7 +12,7 @@ inference: false
 <img src="logo.png" width=25%>
 # Model Description
-RoBERTA ReRanker for Retrieved Results or **R*** (pronounced R-star) is an advanced model designed to enhance search results' relevance and accuracy through reranking. By integrating the retrieval capabilities of **R*** with generative models, this hybrid approach significantly enhances the relevance and contextual depth of search results. Based on the [RoBERTa tiny](https://huggingface.co/haisongzhang/roberta-tiny-cased) architecture, **R*** is specialized in distinguishing relevant from irrelevant query-passage pairs, thereby refining the output of LLMs in retrieval and generative tasks.
+RoBERTA ReRanker for Retrieved Results or **R*** (pronounced R-star) is an advanced model designed to enhance search results' relevance and accuracy through reranking. By integrating the retrieval capabilities of **R*** with generative models, this hybrid approach significantly enhances the relevance and contextual depth of search results. Based on the [RoBERTa tiny](https://huggingface.co/haisongzhang/roberta-tiny-cased) architecture, **R*** is specialized in distinguishing relevant from irrelevant query-passage pairs, thereby refining the output of LLMs in retrieval and generative tasks. This model is an experiment featured and presented in [PACLIC 38 (2024)](https://sites.google.com/view/paclic38), which would be published in the ACL Anthology.
 ## Training Data
 R* was trained on a dataset derived from the MS MARCO passage ranking dataset, consisting of 2.5 million query-positive passage pairs and an equal number of query-negative passage pairs, totaling 5 million query-passage pairs. This ensures a balanced training approach, exposing R* to both relevant and irrelevant examples equally.
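
The reranking step described in the model description can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `rerank` and `overlap_score` are hypothetical names, and the word-overlap scorer is only a toy stand-in for a relevance score that a model such as R* would produce for each query-passage pair.

```python
from typing import Callable, List, Optional, Tuple


def rerank(
    query: str,
    passages: List[str],
    score: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return passages by descending score."""
    scored = [(passage, score(query, passage)) for passage in passages]
    # Python's sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in scorer (assumption): fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


if __name__ == "__main__":
    retrieved = [
        "Bananas are yellow.",
        "Paris is the capital of France.",
    ]
    for passage, relevance in rerank("capital of france", retrieved, overlap_score):
        print(f"{relevance:.2f}  {passage}")
```

In practice the `score` argument would wrap a call to a trained relevance classifier; keeping it as a plain callable lets the ordering logic be checked independently of any model.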