🪲 brown-beetle-small-v1 Model Card

Beetles are among the most diverse and interesting creatures on Earth. They live in almost every terrestrial and freshwater habitat and have adapted to a wide range of lifestyles. They are small, fast and powerful!

The beetle series of models is meant to provide good starting points for Static Embedding training (via TokenLearn or fine-tuning), as well as decent Static Embedding models in their own right. Each beetle model is meant to improve on the original M2V_base_output model in some way; that is the bar we set for every model except the brown beetle series, which reproduces the original model.

This model has been distilled from BAAI/bge-base-en-v1.5, using PCA to reduce the embeddings to 256 dimensions and then applying Zipf weighting.
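To give a feel for what the Zipf step does, here is a minimal sketch in plain Python. It assumes the vocabulary is ordered from most to least frequent token and that row i of the embedding table is scaled by log(1 + i); the helper name zipf_weight is made up for illustration, the PCA step is left out, and the library's exact formula may differ.

```python
import math


def zipf_weight(embeddings):
    """Scale row i by log(1 + i), where i is the token's assumed frequency rank.

    Rank 0 is taken to be the most frequent token, so frequent tokens are
    down-weighted and rare tokens are up-weighted.
    """
    return [
        [math.log(1 + rank) * value for value in row]
        for rank, row in enumerate(embeddings)
    ]


# Toy embedding table: three tokens, two dimensions each.
toy = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
weighted = zipf_weight(toy)
```

Note that under this assumption the row at rank 0 is scaled to zero, since log(1 + 0) = 0.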

The brown beetle series is provided for convenience, so you can load and use the model without having to run the distillation yourself, though it is quick to reproduce anyway. If you want the original model from the folks at Minish Lab, use the M2V_base_output model.

Version Information

  • brown-beetle-base-v0: The original model, without using PCA or Zipf. The lack of PCA and Zipf also makes this a decent model for further training.
  • brown-beetle-base-v0.1: The original model, with PCA but of the same size as the original model. This model is great if you want to experiment with Zipf or other weighting methods.
  • brown-beetle-base-v1: The original model, with PCA and Zipf.
  • brown-beetle-small-v1: A smaller version of the original model, with PCA and Zipf. Equivalent to M2V_base_output.
  • brown-beetle-tiny-v1: A tiny version of the original model, with PCA and Zipf.
  • brown-beetle-base-v1.1: The original model, with PCA with 768 dimensions, applying Zipf and applying SIF re-weighting, learnt from a subset of the C4 corpus. This model is significantly better than the M2V_base_output model.
  • brown-beetle-small-v1.1: A smaller version of the original model, with PCA with 256 dimensions, applying Zipf and applying SIF re-weighting, learnt from a subset of the C4 corpus. This model is significantly better than the M2V_base_output model but slightly worse than the brown-beetle-base-v1.1 model.
  • brown-beetle-tiny-v1.1: A tiny version of the original model, with PCA with 128 dimensions, applying Zipf and applying SIF re-weighting, learnt from a subset of the C4 corpus. This model is significantly better than the M2V_base_output model but slightly worse than the brown-beetle-small-v1.1 model.
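The v1.1 models above mention SIF re-weighting learnt from a corpus. As a rough illustration only, the sketch below computes the standard smooth-inverse-frequency weight a / (a + p(w)) from token counts. The function name sif_weights, the toy corpus, and the value a = 1e-3 are assumptions for this example; the exact procedure used for the v1.1 models is not described here.

```python
from collections import Counter


def sif_weights(tokens, a=1e-3):
    """Return a SIF weight a / (a + p(w)) for every distinct token.

    p(w) is the token's relative frequency in `tokens`; `a` is a smoothing
    constant (1e-3 is an assumed, commonly used value).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {token: a / (a + count / total) for token, count in counts.items()}


# Toy corpus standing in for a much larger one such as a C4 subset.
corpus = "the cat sat on the mat the end".split()
weights = sif_weights(corpus)
```

Frequent tokens such as "the" receive a smaller weight than rare ones, so multiplying each token's embedding by its weight reduces the influence of common words on the averaged sentence embedding.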

Installation

Install model2vec using pip:

pip install model2vec

Usage

Load this model using the from_pretrained method:

from model2vec import StaticModel

# Load a pretrained Model2Vec model
model = StaticModel.from_pretrained("bhavnicksm/brown-beetle-small-v1")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])

Read more about the Model2Vec library at https://github.com/MinishLab/model2vec.
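A common next step after encoding is to compare embeddings. The sketch below ranks candidate texts by cosine similarity to a query. It only assumes an object with an encode method that takes a list of strings and returns one vector per string, as StaticModel.encode does in the usage above; the helper names cosine_similarity and most_similar are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(model, query, candidates):
    """Return (candidate, score) pairs sorted from most to least similar."""
    # Encode the query and all candidates in one call.
    vectors = model.encode([query] + candidates)
    query_vec, candidate_vecs = vectors[0], vectors[1:]
    scored = [
        (text, cosine_similarity(query_vec, vec))
        for text, vec in zip(candidates, candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the model loaded as shown above, a call would look like most_similar(model, "small insects", ["beetles are tiny", "stock prices fell"]).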

Reproduce this model

To reproduce this model, install model2vec with the distill extra (pip install model2vec[distill]) and run the following code:

from model2vec.distill import distill

# Distill the model
m2v_model = distill(
    model_name="BAAI/bge-base-en-v1.5",
    pca_dims=256,
    apply_zipf=True,
)

# Save the model
m2v_model.save_pretrained("brown-beetle-small-v1")

Comparison with other models

Coming soon...

Acknowledgements

This model is made using the Model2Vec library. Credit goes to the Minish Lab team for developing this library.

Citation

Please cite the Model2Vec repository if you use this model in your work.

@software{minishlab2024model2vec,
  author = {Stephan Tulkens and Thomas van Dongen},
  title = {Model2Vec: Turn any Sentence Transformer into a Small Fast Model},
  year = {2024},
  url = {https://github.com/MinishLab/model2vec},
}

Model size: 7.56M params (Safetensors, tensor type F32)