Update README.md
README.md
CHANGED
@@ -1,144 +1,124 @@
 ---
-
-
-
 library_name: sentence-transformers
-pipeline_tag: sentence-similarity
-tags:
-- sentence-transformers
-- sentence-similarity
-- feature-extraction
-widget: []
 ---

-#

-This is

-

-
-- **Model Type:** Sentence Transformer
-- **Base model:** [WhereIsAI/pre-UAE-Medical-Large-V1](https://huggingface.co/WhereIsAI/pre-UAE-Medical-Large-V1) <!-- at revision c989d8965d489e9a6e873eabce06e6ef6f2a0188 -->
-- **Maximum Sequence Length:** 512 tokens
-- **Output Dimensionality:** 1024 tokens
-- **Similarity Function:** Cosine Similarity
-<!-- - **Training Dataset:** Unknown -->
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->

-

--
--
--

-

-
-
-
-
-
-

-

-

-First install the Sentence Transformers library:

-
-pip install -U sentence-transformers
-```

-
-```python
-from sentence_transformers import SentenceTransformer

-
-
-# Run inference
-sentences = [
-    'The weather is lovely today.',
-    "It's so sunny outside!",
-    'He drove to the stadium.',
-]
-embeddings = model.encode(sentences)
-print(embeddings.shape)
-# [3, 1024]
-
-# Get the similarity scores for the embeddings
-similarities = model.similarity(embeddings, embeddings)
-print(similarities.shape)
-# [3, 3]
 ```

-
-### Direct Usage (Transformers)
-
-<details><summary>Click to see the direct usage in Transformers</summary>

-
-

-
-

-

-

-
-

-
-

-*List how the model may foreseeably be misused and address what users ought not to do with the model.*
--->

-
-## Bias, Risks and Limitations

-
--->

-
-

-
-

-## Training Details

-
-
-- Sentence Transformers: 3.0.1
-- Transformers: 4.42.3
-- PyTorch: 2.3.0+cu121
-- Accelerate: 0.30.1
-- Datasets: 2.19.1
-- Tokenizers: 0.19.1

-

-### BibTeX

-
-

-
-

-<!--
-## Model Card Authors

-
--->

-
-## Model Card Contact

-
-
 ---
+license: mit
+base_model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
+model-index:
+- name: WhereIsAI/pubmed-angle-base-en
+  results: []
+datasets:
+- WhereIsAI/medical-triples
+- WhereIsAI/pubmedqa-test-angle-format-a
+- qiaojin/PubMedQA
+- ncbi/pubmed
+language:
+- en
 library_name: sentence-transformers
 ---

+# WhereIsAI/pubmed-angle-base-en

+This is an example model for the Chinese blog post [【coming soon】](#) and the [AnglE tutorial](https://angle.readthedocs.io/en/latest/notes/tutorial.html#tutorial).

+It was fine-tuned with [AnglE Loss](https://arxiv.org/abs/2309.12871) using the official [angle-emb](https://github.com/SeanLee97/AnglE) library.

+Related model: [WhereIsAI/pubmed-angle-base-en](https://huggingface.co/WhereIsAI/pubmed-angle-base-en)

+**1. Training Setup:**

+- Base model: [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext)
+- Training Data: [WhereIsAI/medical-triples](https://huggingface.co/datasets/WhereIsAI/medical-triples), processed from [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA).
+- Test Data: [WhereIsAI/pubmedqa-test-angle-format-a](https://huggingface.co/datasets/WhereIsAI/pubmedqa-test-angle-format-a)

+**2. Performance:**

+| Model                                  | Pooling Strategy | Spearman's Correlation |
+|----------------------------------------|------------------|:----------------------:|
+| tavakolih/all-MiniLM-L6-v2-pubmed-full | avg              | 84.56                  |
+| NeuML/pubmedbert-base-embeddings       | avg              | 84.88                  |
+| WhereIsAI/pubmed-angle-base-en         | cls              | 86.01                  |
+| **WhereIsAI/pubmed-angle-large-en**    | cls              | 86.21                  |

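The performance table reports Spearman's correlation, which measures how well the ranking of predicted similarity scores agrees with the ranking of the gold labels. The evaluation script is not part of this change, so the following is only a minimal, dependency-free sketch of that metric. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy data are hypothetical, and the table values appear to be the correlation scaled by 100.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(predicted, gold):
    """Spearman's correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(gold))


# Hypothetical toy data: model similarity scores and binary relevance labels.
predicted_scores = [0.82, 0.43, 0.67, 0.15]
gold_labels = [1, 0, 1, 0]
score = spearman(predicted_scores, gold_labels)
print(round(100 * score, 2))
```

Because only the ranks matter, any monotonic rescaling of the similarity scores leaves the result unchanged.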
+**3. Citation**

+To cite AnglE, see 👉 https://huggingface.co/WhereIsAI/pubmed-angle-base-en#citation


+## Usage

+### via angle-emb

+```bash
+python -m pip install -U angle-emb
 ```

+Example:

+```python
+from angle_emb import AnglE
+from angle_emb.utils import cosine_similarity

+# 1. load the model (.cuda() requires a CUDA-capable GPU)
+angle = AnglE.from_pretrained('WhereIsAI/pubmed-angle-large-en', pooling_strategy='cls').cuda()

+query = 'How to treat childhood obesity and overweight?'
+docs = [
+    query,
+    'The child is overweight. Parents should relieve their children\'s symptoms through physical activity and healthy eating. First, they can let them do some aerobic exercise, such as jogging, climbing, swimming, etc. In terms of diet, children should eat more cucumbers, carrots, spinach, etc. Parents should also discourage their children from eating fried foods and dried fruits, which are high in calories and fat. Parents should not let their children lie in bed without moving after eating. If their children\'s condition is serious during the treatment of childhood obesity, parents should go to the hospital for treatment under the guidance of a doctor in a timely manner.',
+    'If you want to treat tonsillitis better, you can choose some anti-inflammatory drugs under the guidance of a doctor, or use local drugs, such as washing the tonsil crypts, injecting drugs into the tonsils, etc. If your child has a sore throat, you can also give him or her some pain relievers. If your child has a fever, you can give him or her antipyretics. If the condition is serious, seek medical attention as soon as possible. If the medication does not have a good effect and the symptoms recur, the author suggests surgical treatment. Parents should also make sure to keep their children warm to prevent them from catching a cold and getting tonsillitis again.',
+]

+# 2. encode the query and the documents together
+embeddings = angle.encode(docs)
+query_emb = embeddings[0]

+# 3. compare the query embedding with each document embedding
+for doc, emb in zip(docs[1:], embeddings[1:]):
+    print(cosine_similarity(query_emb, emb))

+# 0.8181731743429251
+# 0.43483792889514516
+```


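The example above scores each document against the query with cosine similarity. As a rough illustration of what that quantity is, here is a plain-Python sketch. The function name `cosine` and the three toy vectors are hypothetical, and this is not the `angle_emb.utils.cosine_similarity` implementation.

```python
from math import sqrt


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical low-dimensional stand-ins for real embeddings.
query_vec = [1.0, 2.0, 0.0]
related_vec = [2.0, 4.0, 0.0]    # same direction as the query
unrelated_vec = [0.0, 0.0, 3.0]  # orthogonal to the query

print(cosine(query_vec, related_vec))
print(cosine(query_vec, unrelated_vec))
```

The score depends only on the direction of the vectors, not on their length, which is why the relevant document in the example can score higher regardless of how long its text is.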
+### via sentence-transformers

+Install sentence-transformers:

+```bash
+python -m pip install -U sentence-transformers
+```

+```python
+from sentence_transformers import SentenceTransformer
+from sentence_transformers.util import cos_sim


+# 1. load model
+model = SentenceTransformer("WhereIsAI/pubmed-angle-large-en")

+query = 'How to treat childhood obesity and overweight?'
+docs = [
+    query,
+    'The child is overweight. Parents should relieve their children\'s symptoms through physical activity and healthy eating. First, they can let them do some aerobic exercise, such as jogging, climbing, swimming, etc. In terms of diet, children should eat more cucumbers, carrots, spinach, etc. Parents should also discourage their children from eating fried foods and dried fruits, which are high in calories and fat. Parents should not let their children lie in bed without moving after eating. If their children\'s condition is serious during the treatment of childhood obesity, parents should go to the hospital for treatment under the guidance of a doctor in a timely manner.',
+    'If you want to treat tonsillitis better, you can choose some anti-inflammatory drugs under the guidance of a doctor, or use local drugs, such as washing the tonsil crypts, injecting drugs into the tonsils, etc. If your child has a sore throat, you can also give him or her some pain relievers. If your child has a fever, you can give him or her antipyretics. If the condition is serious, seek medical attention as soon as possible. If the medication does not have a good effect and the symptoms recur, the author suggests surgical treatment. Parents should also make sure to keep their children warm to prevent them from catching a cold and getting tonsillitis again.',
+]


+# 2. encode
+embeddings = model.encode(docs)

+# 3. compare the query (row 0) with the remaining documents
+similarities = cos_sim(embeddings[0], embeddings[1:])
+print('similarities:', similarities)
+```


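The similarity scores from the example above are typically used to order candidate documents for a query. The sketch below shows one way to do that in plain Python. The helper `rank_documents`, the short document labels, and the score values are all hypothetical placeholders, not output of the model.

```python
def rank_documents(docs, scores, top_k=None):
    """Pair each document with its score and sort from most to least similar."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical labels and scores. With the example above, the scores could be
# taken from `similarities[0].tolist()`, since `cos_sim` returns a 1 x N tensor.
candidate_docs = ["tonsillitis advice", "childhood obesity advice"]
example_scores = [0.40, 0.80]

for doc, score in rank_documents(candidate_docs, example_scores):
    print(f"{score:.2f}  {doc}")
```

Passing `top_k` keeps only the best matches, which is the usual shape of a retrieval step.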
+## Citation

+If you use this model for academic purposes, please cite the AnglE paper as follows:

+```bibtex
+@article{li2023angle,
+  title={AnglE-optimized Text Embeddings},
+  author={Li, Xianming and Li, Jing},
+  journal={arXiv preprint arXiv:2309.12871},
+  year={2023}
+}
+```