Dimensions of the embeddings
Hey @RaphaelMourad, amazing work!
Btw, I tried extracting embeddings from a protein sequence using your model.
prot = "JUYTRFDCVBNJKLMNBHGV"
inputs = tokenizer(prot, return_tensors='pt')["input_ids"]
hidden_states = model(inputs.to("cuda"))[0] # [1, sequence_length, 256]
# embedding with max pooling
embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape) # expect to be 256
But the output "embedding_max" shape was "torch.Size([1024])", which is far from 256. Is that because this is the 1.6B model?
(I assume you wrote 256 for the other, ~154M models.)
Thanks.
You guessed right: a bigger model means a bigger embedding dimension.
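For reference, the embedding dimension can also be read from the model config instead of being hard-coded. A minimal sketch, assuming the standard transformers loading API (the repo id below is only illustrative, use the checkpoint you actually downloaded):

from transformers import AutoModel

model = AutoModel.from_pretrained("RaphaelMourad/Mistral-Prot-v1-1.6B")  # illustrative repo id
# hidden_size is the width of the last hidden state, i.e. the embedding dimension
print(model.config.hidden_size)  # 1024 for the 1.6B model, matching the shape you saw

The last hidden state then has shape [batch, sequence_length, model.config.hidden_size], so the max-pooled vector always matches whatever checkpoint is loaded.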
Thanks for the timely reply, @RaphaelMourad.
However, while trying out the 1.6B version of the model, I ran into peculiar behavior: all the max-pooled embedding values are NaN.
torch.Size([1024])
tensor([nan, nan, nan, ..., nan, nan, nan], device='cuda:0', grad_fn=)
I loaded the model directly in 16-bit precision (in Colab) and ran the same example from the model card.
insulin = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
inputs = tokenizer(insulin, return_tensors='pt')["input_ids"]
hidden_states = model(inputs.to('cuda'))[0] # [1, sequence_length, 256]
# embedding with max pooling
embedding_max = torch.max(hidden_states[0], dim=0)[0]
print(embedding_max.shape) # expect to be 256
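For what it's worth, one way to check whether the 16-bit load is what produces the NaNs is to run the same forward pass under different dtypes and test the activations for NaN/inf. This is only a sketch: it assumes the usual torch_dtype argument of from_pretrained, and the repo id is illustrative.

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "RaphaelMourad/Mistral-Prot-v1-1.6B"  # illustrative repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)

insulin = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
inputs = tokenizer(insulin, return_tensors='pt')["input_ids"].to("cuda")

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    model = AutoModel.from_pretrained(model_name, torch_dtype=dtype).to("cuda")
    with torch.no_grad():
        hidden_states = model(inputs)[0]
    # Report whether any activation became NaN or inf under this dtype
    print(dtype, torch.isnan(hidden_states).any().item(), torch.isinf(hidden_states).any().item())

If float16 produces NaNs while bfloat16 or float32 stay finite, that would point to a precision/overflow issue rather than a problem with the checkpoint itself.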
I did try 'Mistral-Prot-v1-417M', which gives non-NaN output. Also, how long did it take you to train this from scratch, and what kind of compute did you use?