Model Summary
The checkpoint aligns with our pixel-linguist-all setting in the paper. The model is initialized from our monolingual model, and is trained on parallel data (205000 steps) <-> AllNLI (2600 steps), going back and forth for three rounds. This model is the last round checkpoint. We recommend using it with A100 GPU, aligning with training.
Downstream Use
Semantic Textual Similarity, Information Retrieval, Reasoning Retrieval
Out-of-Scope Use
The model might not be optimal for further fine-tuning to do other tasks (such as classification), as it's trained to do representation tasks with similarity matching.
Training Data
Please refer to the paper for the exact process.
Inference
Encoding with our PixelLinguist class is very straightforward, just like using a SentenceTransformer class.
model_name = "AnonymousPage/checkpoint-all"
model = PixelLinguist(model_name)
texts = ["I love you","I like you"]
embeddings = model.encode(texts)
print(outputs[0] @ outputs[1].T) # just use dot product because the embeddings are normalized automatically in the model class.
#tensor(0.9217)
To use the PixelLinguist class: First install the package following our Github Repo. Then define our PixelLinguist Class as follow.
import torch
from PIL import Image
from pixel import (
AutoConfig,
PangoCairoTextRenderer,
PIXELForSequenceClassification,
PIXELForRepresentation,
PoolingMode,
get_attention_mask,
get_transforms,
glue_strip_spaces,
resize_model_embeddings,
)
from tqdm import tqdm
class PixelLinguist:
def __init__(self, model_name, batch_size = 16, max_seq_length = 64,
device=None, pooling = "mean", keep_mlp = False):
if device is not None:
self.device = device
else:
self.device = "cuda:0" if torch.cuda.is_available() else "cpu"
self.config = AutoConfig.from_pretrained(model_name, num_labels=0)
self.batch_size = batch_size
if keep_mlp == True:
self.model = PIXELForSequenceClassification.from_pretrained(
model_name,
config=self.config,
pooling_mode=PoolingMode.from_string(pooling),
add_layer_norm=True
).to(self.device)
else:
self.model = PIXELForRepresentation.from_pretrained(
model_name,
config=self.config,
pooling_mode=PoolingMode.from_string(pooling),
add_layer_norm=True
).to(self.device)
self.processor = PangoCairoTextRenderer.from_pretrained(model_name, rgb=False)
self.processor.max_seq_length = max_seq_length
resize_model_embeddings(self.model, self.processor.max_seq_length)
self.transforms = get_transforms(do_resize=True, size=(self.processor.pixels_per_patch, self.processor.pixels_per_patch * self.processor.max_seq_length))
def preprocess(self, texts):
encodings = [self.processor(text=glue_strip_spaces(a)) for a in texts]
pixel_values = torch.stack([self.transforms(Image.fromarray(e.pixel_values)) for e in encodings])
attention_mask = torch.stack([get_attention_mask(e.num_text_patches, seq_length=self.processor.max_seq_length) for e in encodings])
return {'pixel_values': pixel_values, 'attention_mask': attention_mask}
def encode(self, texts, **kwargs):
all_outputs = []
for i in tqdm(range(0, len(texts), self.batch_size)):
batch_texts = texts[i:i+batch_size]
inputs = self.preprocess(batch_texts)
inputs = {k: v.to(self.device) for k, v in inputs.items()}
with torch.no_grad():
outputs = self.model(**inputs).logits.detach().cpu()
all_outputs.append(outputs)
return torch.cat(all_outputs, dim=0)
Evaluation
For STS evaluation (see Github repo):
python tools/evaluation_sts_all.py
For BEIR information retrieval evaluation (see Github repo):
python tools/evaluation_retrieval.py
- Downloads last month
- 12