ArchitRastogi committed on
Commit 21cc8d6 · verified · 1 Parent(s): 73be84a

added metrics

Files changed (1): README.md (+78 −28)
README.md CHANGED
@@ -4,53 +4,103 @@ tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
-
  ---

- # {MODEL_NAME}

- This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.

- <!--- Describe your model here -->

- ## Usage (Sentence-Transformers)

- Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

- ```
- pip install -U sentence-transformers
- ```

- Then you can use the model like this:

  ```python
- from sentence_transformers import SentenceTransformer
- sentences = ["This is an example sentence", "Each sentence is converted"]

- model = SentenceTransformer('{MODEL_NAME}')
- embeddings = model.encode(sentences)
- print(embeddings)
  ```

- ## Evaluation Results

- <!--- Describe how your model was evaluated -->

- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})

- ## Full Model Architecture
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
-   (2): Normalize()
- )
- ```

- ## Citing & Authors

- <!--- Describe where people can find more information -->
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
+ - embeddings
+ - legal
+ - USCode
+ license: apache-2.0
+ datasets:
+ - ArchitRastogi/USCode-QAPairs-Finetuning
+ model_creator: Archit Rastogi
+ language:
+ - en
+ library_name: transformers
+ base_model: BGE-Small
+ fine_tuned_from: sentence-transformers/BGE-Small
+ task_categories:
+ - sentence-similarity
+ - embeddings
+ - feature-extraction
+ model-index:
+ - name: BGE-Small-LegalEmbeddings-USCode
+   results:
+   - task:
+       type: sentence-similarity
+     dataset:
+       name: USCode-QAPairs-Finetuning
+       type: USCode-QAPairs-Finetuning
+     metrics:
+     - name: Accuracy
+       type: Accuracy
+       value: 72
+     - name: Recall
+       type: Recall
+       value: 75
+     source:
+       name: Evaluation on USLawQA Dataset
+       url: https://huggingface.co/datasets/ArchitRastogi/USLawQA
  ---

+ # BGE-Small Fine-Tuned on USCode-QueryPairs

+ This is a fine-tuned version of the BGE-Small embedding model, trained on [USCode-QueryPairs](https://huggingface.co/datasets/ArchitRastogi/USCode-QueryPairs), a subset of the [USLawQA](https://huggingface.co/datasets/ArchitRastogi/USLawQA) corpus. The model is optimized for generating embeddings for legal text, reaching 72% accuracy and 75% recall on the held-out test set.
48
 
49
+ ## Overview
50
 
51
+ - **Base Model**: BGE Small
52
+ - **Dataset**: [USCode-QueryPairs](https://huggingface.co/datasets/ArchitRastogi/USCode-QueryPairs)
53
+ - **Training Details**:
54
+ - **Hardware**: Google Colab (T4 GPU)
55
+ - **Training Time**: 2 hours
56
+ - **Accuracy**: 75% on the test set from [USLawQA](https://huggingface.co/datasets/ArchitRastogi/USLawQA)
+
+ ## Applications
+
+ This model is ideal for:
+ - **Legal Text Retrieval**: Efficient semantic search across legal documents (see the sketch after this list).
+ - **Question Answering**: Answering legal queries using relevant context retrieved from the US Code.
+ - **Embeddings Generation**: Generating high-quality embeddings for downstream legal NLP tasks.
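+
+ As a minimal sketch of the retrieval use case, the snippet below ranks a handful of US Code-style passages against a query by cosine similarity. It assumes the `sentence-transformers` package and the repository id `ArchitRastogi/BGE-Small-LegalEmbeddings-USCode` (inferred from this card's metadata); the passages are invented placeholders.
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ # Repository id assumed from the model-index name on this card
+ model = SentenceTransformer("ArchitRastogi/BGE-Small-LegalEmbeddings-USCode")
+
+ # Invented placeholder passages standing in for US Code sections
+ passages = [
+     "The President shall be Commander in Chief of the Army and Navy of the United States.",
+     "The Congress shall have power to lay and collect taxes, duties, imposts and excises.",
+ ]
+ query = "Who commands the armed forces?"
+
+ # Encode with L2-normalized embeddings so cosine similarity is a dot product
+ passage_emb = model.encode(passages, normalize_embeddings=True)
+ query_emb = model.encode(query, normalize_embeddings=True)
+
+ # Rank passages by cosine similarity to the query
+ scores = util.cos_sim(query_emb, passage_emb)[0]
+ best = int(scores.argmax())
+ print(passages[best], float(scores[best]))
+ ```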
+
+ ## Usage
+
+ The model can be loaded directly with the `transformers` library, as in the snippet below; a `sentence-transformers` variant using `model.encode` follows it. The repository id is assumed from this card's metadata.
+
  ```python
+ # Load the model directly with transformers
+ # (repository id assumed from the model-index name on this card)
+ import torch.nn.functional as F
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("ArchitRastogi/BGE-Small-LegalEmbeddings-USCode")
+ model = AutoModel.from_pretrained("ArchitRastogi/BGE-Small-LegalEmbeddings-USCode")
+
+ text = "Duties of the president"
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model(**inputs)
+
+ # BGE-style models take the [CLS] token embedding, L2-normalized
+ # (matching the Pooling and Normalize modules in the previous card revision)
+ embeddings = F.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)
+ print(embeddings)
  ```
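+
+ Since this is a sentence-transformers-style model (see the tags above), it can also be used via `model.encode`. A minimal sketch, again assuming the repository id from the card metadata:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Repository id assumed from the model-index name on this card
+ model = SentenceTransformer("ArchitRastogi/BGE-Small-LegalEmbeddings-USCode")
+
+ embeddings = model.encode(["Duties of the president"])
+ print(embeddings.shape)  # (1, 384): BGE-Small produces 384-dimensional vectors
+ ```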
+
+ ## Evaluation
+
+ The model was evaluated on the test set of [USLawQA](https://huggingface.co/datasets/ArchitRastogi/USLawQA) and achieved the following metrics (a generic retrieval-evaluation sketch follows the list):
+ - **Accuracy**: 72%
+ - **Recall**: 75%
+ - **Task**: Semantic similarity and legal question answering.
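+
+ One common way to compute accuracy for a QA-pair embedding model is top-1 retrieval: each question must rank its paired passage highest in the pool. The sketch below illustrates that protocol; the toy pairs, the field layout, and the repository id are assumptions, not necessarily the procedure behind the numbers above.
+
+ ```python
+ import torch
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("ArchitRastogi/BGE-Small-LegalEmbeddings-USCode")  # assumed repo id
+
+ # Toy stand-ins for (question, answer-passage) test pairs from USLawQA
+ pairs = [
+     ("Who commands the armed forces?", "The President shall be Commander in Chief..."),
+     ("Who may levy federal taxes?", "The Congress shall have power to lay and collect taxes..."),
+ ]
+ q_emb = model.encode([q for q, _ in pairs], normalize_embeddings=True)
+ p_emb = model.encode([p for _, p in pairs], normalize_embeddings=True)
+
+ # A question counts as correct if its own passage is the top-ranked one
+ top1 = util.cos_sim(q_emb, p_emb).argmax(dim=1)
+ accuracy = (top1 == torch.arange(len(pairs))).float().mean().item()
+ print(f"top-1 accuracy: {accuracy:.2%}")
+ ```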
+
+ ## Related Resources
+
+ - [USCode-QueryPairs Dataset](https://huggingface.co/datasets/ArchitRastogi/USCode-QueryPairs)
+ - [USLawQA Corpus](https://huggingface.co/datasets/ArchitRastogi/USLawQA)
+
+ ## 📧 Contact
+
+ For any inquiries, suggestions, or feedback, feel free to reach out:
+
+ **Archit Rastogi**
+
+ ---
+
+ ## 📜 License
+
+ This model is distributed under the [Apache 2.0 License](LICENSE). Please ensure compliance with applicable copyright laws when using it.