ai-forever/ruElectra-large

ai-forever committed on Jul 28, 2023

Commit da6ad08 · 1 Parent(s): 49cf776

Update README.md

Files changed (1)

README.md +10 -0


@@ -10,12 +10,18 @@ tags:
 # ruELECTRA large model multitask (cased) for Sentence Embeddings in Russian language.
 For better quality, use mean token embeddings.
 ## Usage (HuggingFace Models Repository)
 You can use the model directly from the model repository to compute sentence embeddings:
 ```python
 from transformers import AutoTokenizer, AutoModel
 import torch
 #Mean Pooling - Take attention mask into account for correct averaging
 def mean_pooling(model_output, attention_mask):
     token_embeddings = model_output[0] #First element of model_output contains all token embeddings
     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
@@ -25,14 +31,18 @@ def mean_pooling(model_output, attention_mask):
 #Sentences we want sentence embeddings for
 sentences = ['Привет! Как твои дела?',
              'А правда, что 42 твое любимое число?']
 #Load AutoModel from huggingface model repository
 tokenizer = AutoTokenizer.from_pretrained("ai-forever/ruELECTRA-large")
 model = AutoModel.from_pretrained("ai-forever/ruELECTRA-large")
 #Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=24, return_tensors='pt')
 #Compute token embeddings
 with torch.no_grad():
     model_output = model(**encoded_input)
 #Perform pooling. In this case, mean pooling
 sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
 ```
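
The diff view cuts off the body of `mean_pooling` at the hunk boundary, so the README snippet above does not show how the average is actually computed. The sketch below is one common way to complete it, assuming the usual masked-mean pattern: sum the token embeddings at non-padded positions and divide by the number of real tokens. The last three lines of the function and the dummy tensors are assumptions made for illustration and are not taken from the commit; the dummy tensors stand in for a real model output so the example runs without downloading the model.

```python
import torch

def mean_pooling(model_output, attention_mask):
    # First element of model_output holds per-token embeddings: (batch, seq_len, hidden)
    token_embeddings = model_output[0]
    # Expand the mask over the hidden dimension so padded positions can be zeroed out
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Assumed completion (not shown in the diff): masked sum divided by the token count
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, dim=1)
    # Clamp guards against division by zero for an all-padding row
    sum_mask = torch.clamp(input_mask_expanded.sum(dim=1), min=1e-9)
    return sum_embeddings / sum_mask

# Hypothetical stand-in for a model output: 2 sentences, 3 tokens each, hidden size 2
token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
                                 [[2.0, 2.0], [4.0, 6.0], [6.0, 10.0]]])
# The third token of the first sentence is padding and must be ignored
attention_mask = torch.tensor([[1, 1, 0],
                               [1, 1, 1]])

pooled = mean_pooling((token_embeddings,), attention_mask)
print(pooled)  # first row averages only the two real tokens
```

With a real model, `model_output` and `encoded_input['attention_mask']` from the README snippet would take the place of the dummy tensors. Multiplying by the mask before summing is what keeps padding tokens from skewing the average when sentences in a batch have different lengths.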
