---
license: apache-2.0
language:
- zh
tags:
- bert
- feature-extraction
- text2vec
datasets:
- shibing624/nli_zh
pipeline_tag: sentence-similarity
---
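
Introduction:

Reference: https://github.com/shibing624/text2vec

This model uses the CoSENT architecture with hfl/chinese-roberta-wwm-ext as the base model. It was re-fine-tuned on the Chinese STS-B dataset, and max_seq_length was extended from the original 128 to 512.
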
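
CoSENT trains the encoder so that sentence pairs with a higher gold similarity score receive a higher cosine similarity than pairs with a lower score. The sketch below is a minimal NumPy version of that ranking loss, log(1 + Σ exp(λ(cos_kl − cos_ij))) summed over every pair of pairs where (i, j) has a higher label than (k, l). The function name `cosent_loss` and the default scale λ = 20 are assumptions for illustration, not taken from this model's training code.

```python
import numpy as np


def cosent_loss(cos_sim, labels, scale=20.0):
    """CoSENT ranking loss for one batch of sentence pairs (illustrative sketch).

    cos_sim: predicted cosine similarity of each pair, shape (n,)
    labels:  gold similarity score of each pair, shape (n,)
    scale:   lambda; 20.0 is an assumed default
    """
    s = np.asarray(cos_sim, dtype=np.float64) * scale
    y = np.asarray(labels, dtype=np.float64)
    # diff[i, j] = s[j] - s[i]; it is penalised when pair i should outrank pair j
    diff = s[None, :] - s[:, None]
    should_outrank = y[:, None] > y[None, :]
    # A leading 0 turns log-sum-exp into log(1 + sum(exp(...)))
    terms = np.concatenate([[0.0], diff[should_outrank]])
    # Subtract the max before exponentiating for numerical stability
    m = terms.max()
    return float(m + np.log(np.exp(terms - m).sum()))
```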
eval_spearman: 0.833

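
eval_spearman is presumably Spearman's rank correlation between the model's cosine similarities and the gold STS-B scores; the card does not say which split it was measured on. A minimal pure-Python sketch of that metric, giving tied values their average rank, follows. The helper names `average_ranks` and `spearman` are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```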

---

Downstream usage:

The model can be loaded with either the text2vec library or the sentence-transformers library.

Text embedding:

```
>>> from text2vec import SentenceModel, EncoderType
>>> model = SentenceModel('EricLee/text2vec-roberta-512', encoder_type=EncoderType.FIRST_LAST_AVG, max_seq_length=512)
>>> embedding = model.encode("今天天气不错啊")  # "The weather is nice today"
>>> embedding.shape
(768,)
```
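
The example above selects `EncoderType.FIRST_LAST_AVG`. The sketch below shows what that pooling is assumed to do: mean-pool the token vectors of the first and last encoder layers, then average the two results. A cosine similarity between two such sentence vectors then serves as the sentence-similarity score. Excluding padding through `attention_mask` is our own assumption, and text2vec's actual implementation may differ in details such as which hidden state counts as the first layer. The function names and toy inputs are hypothetical.

```python
import numpy as np


def first_last_avg(first_layer, last_layer, attention_mask):
    """Pool (seq_len, hidden) token vectors from two layers into one sentence vector.

    attention_mask: (seq_len,) with 1 for real tokens and 0 for padding.
    """
    mask = np.asarray(attention_mask, dtype=np.float64)[:, None]
    n_tokens = mask.sum()
    # Mean over real tokens only (masking is an assumption of this sketch)
    first_avg = (np.asarray(first_layer) * mask).sum(axis=0) / n_tokens
    last_avg = (np.asarray(last_layer) * mask).sum(axis=0) / n_tokens
    # Average the two layer-level means
    return (first_avg + last_avg) / 2.0


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a, b = np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy 3-token input whose last token is padding (made-up values)
first = np.array([[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]])
last = np.array([[0.0, 2.0], [0.0, 4.0], [9.0, 9.0]])
vec = first_last_avg(first, last, [1, 1, 0])  # -> [1.0, 1.5]
```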