ArchitRastogi
/

BGE-Small-LegalEmbeddings-USCode

@@ -4,53 +4,103 @@ tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
 ---
-# {MODEL_NAME}
-This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.
-<!--- Describe your model here -->
-## Usage (Sentence-Transformers)
-Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
-```
-pip install -U sentence-transformers
-```
-Then you can use the model like this:
 ```python
-from sentence_transformers import SentenceTransformer
-sentences = ["This is an example sentence", "Each sentence is converted"]
-model = SentenceTransformer('{MODEL_NAME}')
-embeddings = model.encode(sentences)
-print(embeddings)
 ```
-## Evaluation Results
-<!--- Describe how your model was evaluated -->
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-## Full Model Architecture
-```
-SentenceTransformer(
-  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
-  (2): Normalize()
-)
-```
-## Citing & Authors
-<!--- Describe where people can find more information -->

 - sentence-transformers
 - feature-extraction
 - sentence-similarity
+- embeddings
+- legal
+- USCode
+license: apache-2.0
+datasets:
+- ArchitRastogi/USCode-QAPairs-Finetuning
+model_creator: Archit Rastogi
+language:
+- en
+library_name: transformers
+base_model: BGE-Small
+fine_tuned_from: sentence-transformers/BGE-Small
+task_categories:
+- sentence-similarity
+- embeddings
+- feature-extraction
+model-index:
+  - name: BGE-Small-LegalEmbeddings-USCode
+    results:
+      - task:
+          type: sentence-similarity
+        dataset:
+          name: USCode-QAPairs-Finetuning
+          type: USCode-QAPairs-Finetuning
+        metrics:
+          - name: Accuracy
+            type: Accuracy
+            value: 72
+          - name: Recall
+            type: Recall
+            value: 75
+        source:
+          name: Evaluation on USLawQA Dataset
+          url: https://huggingface.co/datasets/ArchitRastogi/USLawQA
 ---
+# BGE-Small Fine-Tuned on USCode-QueryPairs
+This is a fine-tuned version of the BGE Small embedding model, trained on the [USCode-QueryPairs](https://huggingface.co/datasets/ArchitRastogi/USCode-QueryPairs) dataset, a subset of the [USLawQA](https://huggingface.co/datasets/ArchitRastogi/USLawQA) corpus. The model is optimized for generating embeddings for legal text, achieving 75% accuracy on the test set.
+## Overview
+- **Base Model**: BGE Small
+- **Dataset**: [USCode-QueryPairs](https://huggingface.co/datasets/ArchitRastogi/USCode-QueryPairs)
+- **Training Details**:
+  - **Hardware**: Google Colab (T4 GPU)
+  - **Training Time**: 2 hours
+- **Accuracy**: 75% on the test set from [USLawQA](https://huggingface.co/datasets/ArchitRastogi/USLawQA)
+## Applications
+This model is intended for:
+- **Legal Text Retrieval**: Efficient semantic search across legal documents.
+- **Question Answering**: Answering legal queries based on context from the US Code.
+- **Embeddings Generation**: Generating high-quality embeddings for downstream legal NLP tasks.
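As a rough illustration of the retrieval use case, the sketch below ranks passages against a query by cosine similarity. It assumes the embeddings have already been computed and are available as plain lists of floats; the function names `cosine_similarity` and `rank_passages` and the toy 3-dimensional vectors are hypothetical and stand in for the model's real 384-dimensional outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction, so treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return (passage_index, score) pairs for the top_k most similar passages."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors for illustration only; real embeddings would come from the model.
query = [1.0, 0.0, 0.0]
passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
for index, score in rank_passages(query, passages, top_k=2):
    print(index, round(score, 3))
```

For larger collections, a vector index would normally replace the linear scan; the brute-force loop here is only meant to show the scoring step.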
+## Usage
+The model can be loaded with the `transformers` library to generate embeddings. Below is an example usage snippet:
 ```python
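+# Load the model directly with transformers
+import torch
+from transformers import AutoTokenizer, AutoModel
+# Hugging Face repo id of this model
+model_id = "ArchitRastogi/BGE-Small-LegalEmbeddings-USCode"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModel.from_pretrained(model_id)
+text = "Duties of the president"
+inputs = tokenizer(text, return_tensors="pt", truncation=True)
+# Inference only, so gradient tracking is disabled
+with torch.no_grad():
+    outputs = model(**inputs)
+# Print the model outputs (token-level hidden states)
+print(outputs)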
 ```
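The raw `transformers` outputs are token-level hidden states rather than a single sentence vector. The previous version of this card listed CLS-token pooling followed by normalization as the model architecture, so a sentence embedding can plausibly be obtained by taking the first token's vector and scaling it to unit length. The sketch below shows that step on plain Python lists; the function name `cls_pool_and_normalize` is hypothetical, and the assumption that the fine-tuned model keeps the same pooling is not confirmed by this card.

```python
import math


def cls_pool_and_normalize(token_vectors):
    """Take the first ([CLS]) token vector and scale it to unit L2 norm.

    token_vectors: list of per-token vectors for one input text,
    where index 0 is assumed to be the [CLS] token.
    """
    cls_vector = token_vectors[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        # Avoid division by zero; return the vector unchanged.
        return list(cls_vector)
    return [x / norm for x in cls_vector]


# Toy hidden states: two tokens, two dimensions each (illustration only).
hidden_states = [[3.0, 4.0], [1.0, 1.0]]
print(cls_pool_and_normalize(hidden_states))
```

With PyTorch tensors, the equivalent would be `outputs.last_hidden_state[:, 0]` followed by `torch.nn.functional.normalize(..., p=2, dim=1)`.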
+## Evaluation
+The model was evaluated on the test set of [USLawQA](https://huggingface.co/datasets/ArchitRastogi/USLawQA) and achieved the following metrics:
+- **Accuracy**: 75%
+- **Task**: Semantic similarity and legal question answering.
+## Related Resources
+- [USCode-QueryPairs Dataset](https://huggingface.co/datasets/ArchitRastogi/USCode-QueryPairs)
+- [USLawQA Corpus](https://huggingface.co/datasets/ArchitRastogi/USLawQA)
+## 📧 Contact
+For any inquiries, suggestions, or feedback, feel free to reach out:
+**Archit Rastogi**
+📧 [[email protected]](mailto:[email protected])
+---
+## 📜 License
+This model is distributed under the [Apache 2.0 License](LICENSE). Please ensure compliance with applicable copyright laws when using this model.