SeanLee97 committed on
Commit 5e76795 · verified · 1 Parent(s): 25966c0

Update README.md

  1. README.md +84 -104
README.md CHANGED
@@ -1,144 +1,124 @@
1
  ---
2
- base_model: WhereIsAI/pre-UAE-Medical-Large-V1
3
- datasets: []
4
- language: []
5
  library_name: sentence-transformers
6
- pipeline_tag: sentence-similarity
7
- tags:
8
- - sentence-transformers
9
- - sentence-similarity
10
- - feature-extraction
11
- widget: []
12
  ---
13
 
14
- # SentenceTransformer based on WhereIsAI/pre-UAE-Medical-Large-V1
15
 
16
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [WhereIsAI/pre-UAE-Medical-Large-V1](https://huggingface.co/WhereIsAI/pre-UAE-Medical-Large-V1). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
17
 
18
- ## Model Details
19
 
20
- ### Model Description
21
- - **Model Type:** Sentence Transformer
22
- - **Base model:** [WhereIsAI/pre-UAE-Medical-Large-V1](https://huggingface.co/WhereIsAI/pre-UAE-Medical-Large-V1) <!-- at revision c989d8965d489e9a6e873eabce06e6ef6f2a0188 -->
23
- - **Maximum Sequence Length:** 512 tokens
24
- - **Output Dimensionality:** 1024 tokens
25
- - **Similarity Function:** Cosine Similarity
26
- <!-- - **Training Dataset:** Unknown -->
27
- <!-- - **Language:** Unknown -->
28
- <!-- - **License:** Unknown -->
29
 
30
- ### Model Sources
31
 
32
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
33
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
34
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
35
 
36
- ### Full Model Architecture
37
 
38
- ```
39
- SentenceTransformer(
40
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
41
- (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
42
- )
43
- ```
44
 
45
- ## Usage
46
 
47
- ### Direct Usage (Sentence Transformers)
48
 
49
- First install the Sentence Transformers library:
50
 
51
- ```bash
52
- pip install -U sentence-transformers
53
- ```
54
 
55
- Then you can load this model and run inference.
56
- ```python
57
- from sentence_transformers import SentenceTransformer
58
 
59
- # Download from the 🤗 Hub
60
- model = SentenceTransformer("WhereIsAI/pre-UAE-Medical-Large-V1")
61
- # Run inference
62
- sentences = [
63
- 'The weather is lovely today.',
64
- "It's so sunny outside!",
65
- 'He drove to the stadium.',
66
- ]
67
- embeddings = model.encode(sentences)
68
- print(embeddings.shape)
69
- # [3, 1024]
70
-
71
- # Get the similarity scores for the embeddings
72
- similarities = model.similarity(embeddings, embeddings)
73
- print(similarities.shape)
74
- # [3, 3]
75
  ```
76
 
77
- <!--
78
- ### Direct Usage (Transformers)
79
-
80
- <details><summary>Click to see the direct usage in Transformers</summary>
81
 
82
- </details>
83
- -->
 
84
 
85
- <!--
86
- ### Downstream Usage (Sentence Transformers)
87
 
88
- You can finetune this model on your own dataset.
 
 
 
 
 
89
 
90
- <details><summary>Click to expand</summary>
 
 
91
 
92
- </details>
93
- -->
94
 
95
- <!--
96
- ### Out-of-Scope Use
 
97
 
98
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
99
- -->
100
 
101
- <!--
102
- ## Bias, Risks and Limitations
103
 
104
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
105
- -->
106
 
107
- <!--
108
- ### Recommendations
 
109
 
110
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
111
- -->
 
112
 
113
- ## Training Details
114
 
115
- ### Framework Versions
116
- - Python: 3.10.12
117
- - Sentence Transformers: 3.0.1
118
- - Transformers: 4.42.3
119
- - PyTorch: 2.3.0+cu121
120
- - Accelerate: 0.30.1
121
- - Datasets: 2.19.1
122
- - Tokenizers: 0.19.1
123
 
124
- ## Citation
 
 
 
 
 
125
 
126
- ### BibTeX
127
 
128
- <!--
129
- ## Glossary
130
 
131
- *Clearly define terms in order to be accessible across audiences.*
132
- -->
 
133
 
134
- <!--
135
- ## Model Card Authors
136
 
137
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
138
- -->
139
 
140
- <!--
141
- ## Model Card Contact
142
 
143
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
144
- -->
 
 
 
 
 
 
 
1
  ---
2
+ license: mit
3
+ base_model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
4
+ model-index:
5
+ - name: WhereIsAI/pubmed-angle-base-en
6
+ results: []
7
+ datasets:
8
+ - WhereIsAI/medical-triples
9
+ - WhereIsAI/pubmedqa-test-angle-format-a
10
+ - qiaojin/PubMedQA
11
+ - ncbi/pubmed
12
+ language:
13
+ - en
14
  library_name: sentence-transformers
 
 
 
 
 
 
15
  ---
16
 
17
+ # WhereIsAI/pubmed-angle-base-en
18
 
19
+ This is an example model for the Chinese blog post [(coming soon)](#) and the [AnglE tutorial](https://angle.readthedocs.io/en/latest/notes/tutorial.html#tutorial).
20
 
21
+ It was fine-tuned with [AnglE Loss](https://arxiv.org/abs/2309.12871) using the official [angle-emb](https://github.com/SeanLee97/AnglE).
22
 
23
+ Related model: [WhereIsAI/pubmed-angle-base-en](https://huggingface.co/WhereIsAI/pubmed-angle-base-en)
24
 
25
+ **1. Training Setup:**
26
 
27
+ - Base model: [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext)
28
+ - Training Data: [WhereIsAI/medical-triples](https://huggingface.co/datasets/WhereIsAI/medical-triples), processed from [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA).
29
+ - Test Data: [WhereIsAI/pubmedqa-test-angle-format-a](https://huggingface.co/datasets/WhereIsAI/pubmedqa-test-angle-format-a)
30
 
31
+ **2. Performance:**
32
 
33
+ | Model | Pooling Strategy | Spearman's Correlation |
34
+ |----------------------------------------|------------------|:----------------------:|
35
+ | tavakolih/all-MiniLM-L6-v2-pubmed-full | avg | 84.56 |
36
+ | NeuML/pubmedbert-base-embeddings | avg | 84.88 |
37
+ | WhereIsAI/pubmed-angle-base-en | cls | 86.01 |
38
+ | **WhereIsAI/pubmed-angle-large-en** | cls | 86.21 |
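The Spearman's correlation column above is, presumably, computed between model similarity scores and gold labels on the test set, then scaled by 100. A minimal pure-Python sketch of that metric follows. The scores and labels are made-up illustrative values, not data from WhereIsAI/pubmedqa-test-angle-format-a, and the helper names `rankdata`, `pearson`, and `spearman` are hypothetical rather than part of angle-emb.

```python
import math


def rankdata(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    return pearson(rankdata(x), rankdata(y))


# Illustrative values only: predicted cosine similarities and gold labels.
predicted = [0.82, 0.43, 0.65, 0.10]
gold = [1.0, 0.0, 1.0, 0.0]
print(round(100 * spearman(predicted, gold), 2))
```

Because only the ordering of the scores matters, the metric is insensitive to the absolute scale of the cosine similarities.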
39
 
40
+ **3. Citation**
41
 
42
+ Cite AnglE as described at 👉 https://huggingface.co/WhereIsAI/pubmed-angle-base-en#citation
43
 
 
44
 
45
+ ## Usage
 
 
46
 
47
+ ### via angle-emb
 
 
48
 
49
+ ```bash
50
+ python -m pip install -U angle-emb
51
  ```
52
 
53
+ Example:
 
 
 
54
 
55
+ ```python
56
+ from angle_emb import AnglE
57
+ from angle_emb.utils import cosine_similarity
58
 
59
+ # 1. load
60
+ angle = AnglE.from_pretrained('WhereIsAI/pubmed-angle-large-en', pooling_strategy='cls').cuda()
61
 
62
+ query = 'How to treat childhood obesity and overweight?'
63
+ docs = [
64
+ query,
65
+ 'The child is overweight. Parents should relieve their children\'s symptoms through physical activity and healthy eating. First, they can let them do some aerobic exercise, such as jogging, climbing, swimming, etc. In terms of diet, children should eat more cucumbers, carrots, spinach, etc. Parents should also discourage their children from eating fried foods and dried fruits, which are high in calories and fat. Parents should not let their children lie in bed without moving after eating. If their children\'s condition is serious during the treatment of childhood obesity, parents should go to the hospital for treatment under the guidance of a doctor in a timely manner.',
66
+ 'If you want to treat tonsillitis better, you can choose some anti-inflammatory drugs under the guidance of a doctor, or use local drugs, such as washing the tonsil crypts, injecting drugs into the tonsils, etc. If your child has a sore throat, you can also give him or her some pain relievers. If your child has a fever, you can give him or her antipyretics. If the condition is serious, seek medical attention as soon as possible. If the medication does not have a good effect and the symptoms recur, the author suggests surgical treatment. Parents should also make sure to keep their children warm to prevent them from catching a cold and getting tonsillitis again.',
67
+ ]
68
 
69
+ # 2. encode
70
+ embeddings = angle.encode(docs)
71
+ query_emb = embeddings[0]
72
 
73
+ for doc, emb in zip(docs[1:], embeddings[1:]):
74
+     print(cosine_similarity(query_emb, emb))
75
 
76
+ # 0.8181731743429251
77
+ # 0.43483792889514516
78
+ ```
79
 
 
 
80
 
81
+ ### via sentence-transformers
 
82
 
83
+ Install sentence-transformers:
 
84
 
85
+ ```bash
86
+ python -m pip install -U sentence-transformers
87
+ ```
88
 
89
+ ```python
90
+ from sentence_transformers import SentenceTransformer
91
+ from sentence_transformers.util import cos_sim
92
 
 
93
 
94
+ # 1. load model
95
+ model = SentenceTransformer("WhereIsAI/pubmed-angle-large-en")
 
 
 
 
 
 
96
 
97
+ query = 'How to treat childhood obesity and overweight?'
98
+ docs = [
99
+ query,
100
+ 'The child is overweight. Parents should relieve their children\'s symptoms through physical activity and healthy eating. First, they can let them do some aerobic exercise, such as jogging, climbing, swimming, etc. In terms of diet, children should eat more cucumbers, carrots, spinach, etc. Parents should also discourage their children from eating fried foods and dried fruits, which are high in calories and fat. Parents should not let their children lie in bed without moving after eating. If their children\'s condition is serious during the treatment of childhood obesity, parents should go to the hospital for treatment under the guidance of a doctor in a timely manner.',
101
+ 'If you want to treat tonsillitis better, you can choose some anti-inflammatory drugs under the guidance of a doctor, or use local drugs, such as washing the tonsil crypts, injecting drugs into the tonsils, etc. If your child has a sore throat, you can also give him or her some pain relievers. If your child has a fever, you can give him or her antipyretics. If the condition is serious, seek medical attention as soon as possible. If the medication does not have a good effect and the symptoms recur, the author suggests surgical treatment. Parents should also make sure to keep their children warm to prevent them from catching a cold and getting tonsillitis again.',
102
+ ]
103
 
 
104
 
105
+ # 2. encode
106
+ embeddings = model.encode(docs)
107
 
108
+ similarities = cos_sim(embeddings[0], embeddings[1:])
109
+ print('similarities:', similarities)
110
+ ```
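Both usage examples score each document against the query with cosine similarity. A minimal sketch of that computation in plain Python is given here, using made-up 3-dimensional vectors in place of real model embeddings. The helper names `cosine` and `rank_docs` are illustrative and are not part of angle-emb or sentence-transformers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_docs(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (assumed values, for illustration).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

for idx, score in rank_docs(query_vec, doc_vecs):
    print(idx, round(score, 4))
```

With real embeddings, `query_vec` would be the first row returned by `encode` and `doc_vecs` the remaining rows.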
111
 
 
 
112
 
113
+ ## Citation
 
114
 
115
+ If you use this model for academic purposes, please cite the AnglE paper:
 
116
 
117
+ ```bibtex
118
+ @article{li2023angle,
119
+ title={AnglE-optimized Text Embeddings},
120
+ author={Li, Xianming and Li, Jing},
121
+ journal={arXiv preprint arXiv:2309.12871},
122
+ year={2023}
123
+ }
124
+ ```