bge_pairs / README.md
language: []
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:7033
  - loss:GISTEmbedLoss
base_model: BAAI/bge-small-en-v1.5
datasets: []
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@5
  - cosine_ndcg@10
  - cosine_ndcg@100
  - cosine_mrr@5
  - cosine_mrr@10
  - cosine_mrr@100
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@5
  - dot_ndcg@10
  - dot_ndcg@100
  - dot_mrr@5
  - dot_mrr@10
  - dot_mrr@100
  - dot_map@100
widget:
  - source_sentence: What is packaged drinking water?
    sentences:
      - >-
        '26.6  Packaged Drinking Water (other than mineral water) It can be
        defined as water derived from the surface water or underground water or
        sea water which is subjected to herein-under specified treatments,
        namely decantation, filtration, combination of filtration, aerations,
        filtration with membrane filter depth filter, cartridge filter,
        activated carbon filtration, de-mineralization, remineralization,
        reverse osmosis and packed after disinfecting the water to a level that
        shall not lead any harmful contamination in the drinking water by means
        of chemical agents or physical methods to reduce the number of
        micro-organisms to level beyond scientifically accepted level for foods
        safety or its susceptibility. The standards, packaging and labelling
        requirements have also been specified under FSSAI rules.'
      - >-
        'Some fruit or vegetable powders are produced from juices, concentrates,
        or pulps by using a spray drying technique. Dry powders can be directly
        used as important constituents of dry soups, yogurt, etc. The drying is
        achieved by spraying of the slurry into an airstream at a temperature of
        138°C to 150°C and introducing cold dry air either into the outlet end
        of the dryer or to the dryer walls to cool them to 38°C– 50°C. The most
        commonly used atomizers are rotary wheel and single-fluid pressure
        nozzle. A wide range of fruit and vegetable powders can be dried,
        agglomerated, and instantized in spray drying units, specially equipped
        with an internal static fluidized bed, integral filter, or external
        vibrofluidizer. Bananas, peaches, apricots, and to a lesser extent
        citrus powders are examples of products dried by such techniques.'
      - >-
        'LEAF FEEDER 7. Leaf webber , Eucosma critica, Eucosmidae, Lepidoptera
        Symptom of damage: During vegetative stage of the crop, the caterpillar
        damages leaves by webbing, while at the floral stages of the crop they
        enter the buds, flowers and pods and feed on the immature seeds. Nature
        of damage: Young larva gets itself concealed into the frass produced
        during the course of scratching. The grown-up larva then draws the two
        leaves together and spins a thread between them, in which it passes
        later instar and also pupates. Egg: Oval, creamy white in colour, laid
        singly in leaves, petioles or stem. Larva: Young larvae are pale-yellow
        in colour, moderately stout, smooth, except for a few short scattered
        hairs. It hibernates in larval form. Pupa: Yellowish in colour,
        gradually turn to light-brown and finally to dark brown. Pupates in thin
        papery white silken cocoon. Adult: Dusky brown with forewings having
        four black dots and a silvery transparent mark'
  - source_sentence: What are the different geographical regions of Uttar Pradesh?
    sentences:
      - >-
        '......................................... food grains. Table -
        Different climates, regions, conditions, and geographical regions of
        Uttar Pradesh Bhawar and Terai Western Plains Central Western Region
        Plains Saharanpur, Bijnor Ganga, Bijnor of Jamuna Doab, Moradabad,
        Situation Rampur, Moradabad District Saharanpur Rampur, Bareilly,
        Pilibhit, Bareilly Muzaffarnagar, Meerut Shahjahanpur, Badaun,
        Lakhimpur, Ghaziabad, Bulandshahr Jyotiba Phule Nagar Baghpat, Gautam
        Buddha Nagar 2 3 4 54. Unirrigated stage * Early maturing * Straight
        sowing Govinda, Govinda, Govinda Dry sowing Narendra-8 Narendra-8
        Narendra-97 Narendra-97 + Planting Govinda, Govinda, Govinda
        GovindaNarendra-80 Dry sowing Govinda, Govinda-80 Dry sowing Dry sowing
        Dry sowing Dry sowing Malaviya Paddy-2 (HUR-3022) 2. Irrigated stage +
        Early maturing day Narendra-8 Narendra-48 Narendra-8 02'
      - >-
        '(4) The Chief Executive shall be entrusted with substantial powers of
        management as the Board may determine. (5) Without prejudice to the
        generality of sub-section (4), the Chief Executive may exercise  the
        powers and discharge the functions, namely:-  (a) do administrative acts
        of a routine nature including managing the day-to-day affairs of the
        Producer Company; (b) operate bank accounts or authorise any person,
        subject to the general or special approval of the Board in this behalf,
        to operate the bank account;  (c) *make arrangements for safe custody of
        cash and other assets of the Producer Company;* (d) sign such documents
        as may be authorised by the Board, for and on behalf of the company; (e)
        maintain proper books of account; prepare annual accounts and audit
        thereof; place the  audited accounts before the Board and in the annual
        general meeting of the Members;  (f) furnish Members with periodic
        information to appraise them of the operation and functions of the
        Producer Company; (g) make appointments to posts in accordance with the
        powers dele-gated to him by the Board; (h) *assist the Board in the
        formulation of goals, objectives, strategies, plans and policies;*  i) 
        advise the Board with respect to legal and regulatory matters concerning
        the proposed  and ongoing activities and take necessary action in
        respect thereof;  (j)  *exercise the powers as may be necessary in the
        ordinary course of business;* (k) discharge such other functions, and
        exercise such other powers, as may be delegated by the Board.  (6) The
        Chief Executive shall manage the affairs of the Producer Company under
        the general superintendence, direction and control of the Board and be
        accountable for the performance of the Producer Company.'
      - >-
        '29.5.4  Firing/Drying Once optimum fermentation is achieved, it is
        necessary to destroy enzymes. The „dhool‟ is fed to the driers by
        conveyors at a temperature of 90-120 o C for 12-15 minutes. This process
        reduces the moisture content of fermented tea from ~ 60% to < 4%. It
        terminates fermentation by inactivating the enzymes. It makes the
        product fit for sorting and packaging. In driers the inlet and outlet
        temperature may range from 82-98 o C and 45-55 o C respectively.
        Fluidized bed driers are being used recently. In this the blown hot air
        moves the dhool by process of fluidization. The disadvantage of firing
        is loss of considerable amount of volatile aroma compounds.'
  - source_sentence: What are some pests that infest mung bean crops during the Kharif season?
    sentences:
      - >-
        'Sowing Time and Temperature: As soon as the rains begin, millet should
        be sown by the second week of July. The millet plant needs a temperature
        of 25 ° C to germinate and 30 to 3 ° C to grow. Its plants give good
        yield even at 40 ° C. Millet sowing method: In natural farming method,
        sowing millet on ridge is considered the best method. Sowing on the bed
        reduces the quantity of seed and saves up to 70% of water. And when the
        drain is irrigated, the roots of the crop grown on the bed move towards
        the water in search of moisture, which makes the roots more developed
        and the plant stronger. And if it rains and the field is waterlogged,
        there is less chance of damage to the crop grown on the beds. Seed rate:
        The natural cultivation of millets requires 5 kg / ha of certified seed.
        Seed treatment: Millet seeds are sown by treating them with
        \'Beejamrut,\' which protects the seeds from soil-borne diseases.
        Treating the seed leads to better germination and higher yield as a
        crop. Nutrient management: Prior to sowing for nutrient availability.'
      - >-
        '4. Collection of Corcyra eggs : Corcyra eggs are loosely laid and they
        are collected through the wire mesh at the bottom on a receiving
        container with funnel setup on an enamel tray. Eggs are to be collected
        daily and continuously for 4 days from each drum.. On the fifth day it
        is to be vacated and cleaned. A sheet of blotting paper is spread on the
        tray or in  the funnel set up. It retains most of the moths scales and
        body fragments while the eggs were easily rolled out during cleaning.
        The eggs are cleaned and separated from the moth’s scales by using a new
        gadget namely Corcyra moth scales and egg separator developed by TNAU.'
      - >-
        'Covering - Covering crops in mung bean covers the empty space by
        spreading the residues obtained from crops such as stalks, gasses and
        pollen in the husk, which increases the amount of organic carbon in the
        crop and along with the back formation of the cover, draws water from
        the atmosphere and gives it to the plants as moisture. Irrigation
        Management - Generally, Kharif crops do not require irrigation. If there
        is a lack of rain, an irrigation must be done while the pods are
        forming. Weed control - In natural farming, weeds grow and are removed
        by hand. Application of Jeevamrut - When mung beans are grown in a
        natural way, the first spraying of Jeevamrut is done at the initial
        stage of the crop and the second spraying is done at the time of
        fruiting. Pest management - In kharif, there is an infestation of pests
        like termites, scorpions, mongoose, whitefly, green oil, leaf beetle,
        legume borer, and succulent, etc. in mung bean crops, for the control of
        which spraying of decoction and firewood should be done at an interval
        of one day.'
  - source_sentence: How do corporates support POs with primary processing machinery?
    sentences:
      - >-
        '27.1 Introduction Fruit beverages and drinks are one of the popular
        categories of beverages that are consumed across the globe. The fruit
        beverages and drinks are easily digestible, highly refreshing, thirst
        quenching, appetizing and nutritionally far superior to most of the
        synthetic and aerated drinks. In recent past the consumption of fruit
        based beverages and drinks has increased at a fast rate. Fruit juices or
        pulp used for the preparation of these products are subjected to minimal
        processing operations like filteration, clarification and
        pasteurization. The fruit juice or pulp, are mixed with ingredients like
        sugar, acid, stabilizers, micronutrients and preservative to develop
        beverages and drinks. There are various categories of fruit juice or
        pulp based beverages and drinks which are listed below. Natural fruit
        juices, sweetened juices, ready-to-serve beverages, nectar, cordial,
        squash, crush, syrup, fruit juice concentrate and fruit juice powder
        belong to the category of non-alcoholic and non-carbonated beverages.
        The principle groups of fruit beverages are as follows: • Ready-to-Serve
        (RTS) pre-packaged Beverages • Fruit juice and Nectars • Dilutable
        beverages'
      - >-
        'Dairy animals/ Pigs/ Goats Protection during rains Heavy rainfall and
        high humidity predisposes mastitis in crossbred cows hence keep dairy
        shed clean and dry. Use post milking teat dip cup to prevent mastitis.
        Don’ts feed mouldy feed and fodder which causes detrimental effects on
        health of animals. i.e. black spots on stored dry fodder, unacceptable
        odour of oil cakes. In rainy season, dairy animals suffer with tick
        infestation. 5 to 10% ticks present on body of animals and 90 to 95 %
        present in the shed. Hence spray ectoparasiticide i.e., cypermethrin or
        deltamethrin 2-4% on animals’ body and also in the shed. Use flamegun to
        burn floor and walls of shed every 10 to 15 days. Hybrid Napier
        perennial fodder CO-5 performance is excellent in in Goa climatic
        condition. Farmer can get 300 to 350 metric tons of green fodder yield
        with six to seven cutting a year. Farmer can go for plantation in
        Kharif. Sololy grazing of dairy animals on lush greens may cause
        digestive disturbance and comparatively low fat in milk hence always
        daily offer dry fodder along with greens. Avoid water leakages in the
        shed which causes slippery floor. Apply lime in and around shed which
        causes disinfection and keep floor dry which helps to rest animals on
        the floor.'
      - >-
        '    7.22 What support is available from government departments for
        market linkage? Many State Governments have schemes for preferential
        procurement of produce from POs. For example, procurement of certified
        seeds through POs has been implemented by the Government of
        Chhattisgarh. The facilitating agency should be able to get the relevant
        information from the respective Governments. 7.23 What support is
        available from corporates for market linkage? The corporates need
        continuous supply of desired quality produce for processing and value
        addition. Therefore, they prefer to enter into contract with few
        producer organisations who will meet their requirement. Usually the
        following mechanisms are adopted:  a. Retail chains tie up with POs for
        procurement. b. Corporates extend dealership for farm machinery and
        inputs to POs. c. Corporates provide primary processing machinery to PO
        with buy-back arrangement  for the produce'
  - source_sentence: What does an Industry Analysis entail?
    sentences:
      - >-
        'Aggregating producers into collectives is one of the best mechanism to
        improve access of small producers to investment, technology and market. 
        The facilitating agency should however keep the following factors in
        view:   a. Types of small scale producers in the target area, volume of
        production, socioeconomic status, marketing arrangement  b. Sufficient
        demand in the existing market to absorb  the additional production
        without  significantly affecting the prices  c. Willingness of producers
        to invest and adopt new technology, if identified, to increase 
        productivity or quality of produce  d. Challenges in the market chain
        and market environment e. Vulnerability of the market to shocks, trends
        and seasonality  f. Previous experience of collective action (of any
        kind) in the community g. Key commodities, processed products or
        semi-finished goods demanded by major  retailers or processing companies
        in the surrounding areas/districts  h. Support from Government
        Departments, NGOs, specialist support agencies and  private companies 
        for enterprise development  i. Incentives for members (also
        disincentives) for joining the PO    Keeping in view the sustainability
        of a Producer Organisation, a flow chart of activities along with
        timeline, verifiable indicators and risk factors is provided at
        Attachment-5.'
      - >-
        'a. Executive summary    b. Business Description c. Industry/Sector
        analysis d. Marketing plan e. Operations plan  f. Financial plan    7.8
        What is included in an executive summary? The executive summary is an
        abstract containing the important points of the business plan. Its
        purpose is to communicate the plan in a convincing way to important
        audiences, such as potential investors, so they will read further. It
        may be the only chapter of the business plan a reader uses to make a
        quick decision on the proposal. As such, it should fulfill the reader's
        (financier's) expectations. It is prepared after the total plan has been
        written. The executive summary should describe the following:  a. The
        industry and market environment in which the opportunity will develop
        and  flourish   b. The special and unique business opportunity—the
        problem the product or service will  be solving   c. The strategies for
        success—what differentiates the product or service from the 
        competitors' products  d. The financial potential—the anticipated risk
        and reward of the business e. The management team—the people who will
        achieve the results   f.  The resources or capital being requested—a
        clear statement to your readers about what you hope to gain from them,
        whether it is capital or other resources    7.9 What is included in a
        Business Description? The business description explains the business
        concept by giving a brief yet informative picture of the history, the
        basic nature, and the purpose of the business, including business
        objectives and why the business will be successful. The purposes of the
        business description are to:  a. Express clearly  understanding of the
        business concept   b. Share enthusiasm for the venture   c. Meet the
        expectations of the reader by providing a realistic picture of the
        business  venture    7.10 What is Industry Analysis?'
      - >-
        '-Black cloth, -Khada cloth -Saw dust -0.025 % Sodium hypochlorite
        -Chick pea / groundnut seedlings -Bleaching powder -Coffee powder
        -Multivitamin syrup -10 % sucrose -Beaker 500 ml -Measuring cylinder
        -Egg laying chamber Procedure : 1. Release  10 males and 5 females at 2:
        1 ratio in plastic containers and cover with thin black cloth . ( Female
        require multiple mating to lay fertile eggs ) . 2. To induce the moths
        to lay more eggs multivitamin syrup 2 drops + 10 % sucrose is given
        through cotton swabs 3. Daily collect the egg cloth after 3 rd day of
        copulation . Provide 25- 28 o C , 80- 90 % R.H during egg laying. A
        female lays 300 –700 eggs 4. Sterilize the egg cloth in 0.025 % sodium
        hypochlorite for ten seconds and immediately dip the egg cloth in
        distilled water in 3 different buckets having distilled water one by
        one  and then dry it in shade. 5. Raise chickpea or groundnut seedlings
        in a week interval and provide for feeding 6. Place newly hatched larvae
        on chickpea/groundnut seedlings along with egg cloth for one day or
        place 3-4 eggs in vials containing artificial diet 7. Pick young larvae
        and rear on bhendi vegetable individually in  penicillin vials to avoid
        cannibalism. 8. Daily change diet till pre pupal stage 9. Collect pre
        –pupae and allow for pupation in plastic container having saw dust 10.
        Pupae sterilization is done with the help of coffee filter by  dip
        method 11. Transfer the pupae inside the egg lying chamber by keeping
        them on a separate petri dish without lid.'
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on BAAI/bge-small-en-v1.5
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: val evaluator
          type: val_evaluator
        metrics:
          - type: cosine_accuracy@1
            value: 0.5127877237851662
            name: Cosine Accuracy@1
          - type: cosine_accuracy@5
            value: 0.9360613810741688
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9578005115089514
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5127877237851662
            name: Cosine Precision@1
          - type: cosine_precision@5
            value: 0.18721227621483372
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09578005115089515
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.5127877237851662
            name: Cosine Recall@1
          - type: cosine_recall@5
            value: 0.9360613810741688
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9578005115089514
            name: Cosine Recall@10
          - type: cosine_ndcg@5
            value: 0.7467744044168642
            name: Cosine Ndcg@5
          - type: cosine_ndcg@10
            value: 0.7540621914922426
            name: Cosine Ndcg@10
          - type: cosine_ndcg@100
            value: 0.7627409192698384
            name: Cosine Ndcg@100
          - type: cosine_mrr@5
            value: 0.6829497016197776
            name: Cosine Mrr@5
          - type: cosine_mrr@10
            value: 0.6861121260098237
            name: Cosine Mrr@10
          - type: cosine_mrr@100
            value: 0.6880792251529196
            name: Cosine Mrr@100
          - type: cosine_map@100
            value: 0.6880792251529201
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.5127877237851662
            name: Dot Accuracy@1
          - type: dot_accuracy@5
            value: 0.9360613810741688
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.9578005115089514
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.5127877237851662
            name: Dot Precision@1
          - type: dot_precision@5
            value: 0.18721227621483372
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.09578005115089515
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.5127877237851662
            name: Dot Recall@1
          - type: dot_recall@5
            value: 0.9360613810741688
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.9578005115089514
            name: Dot Recall@10
          - type: dot_ndcg@5
            value: 0.7467744044168642
            name: Dot Ndcg@5
          - type: dot_ndcg@10
            value: 0.7540621914922426
            name: Dot Ndcg@10
          - type: dot_ndcg@100
            value: 0.7627409192698384
            name: Dot Ndcg@100
          - type: dot_mrr@5
            value: 0.6829497016197776
            name: Dot Mrr@5
          - type: dot_mrr@10
            value: 0.6861121260098237
            name: Dot Mrr@10
          - type: dot_mrr@100
            value: 0.6880792251529196
            name: Dot Mrr@100
          - type: dot_map@100
            value: 0.6880792251529201
            name: Dot Map@100

SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-small-en-v1.5. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
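
The architecture printed above takes the [CLS] token vector as the sentence embedding (`pooling_mode_cls_token: True`) and then L2-normalizes it. The following is a minimal, dependency-free sketch of those two steps on toy data; the helper names and the 4-dimensional toy vectors are illustrative assumptions, not part of the model's code. Because the output vectors have unit length, dot product and cosine similarity coincide, which would explain why the cosine_* and dot_* metrics reported in this card are identical.

```python
import math

def cls_pool(token_embeddings):
    """Keep the first ([CLS]) token vector of each sequence."""
    return [tokens[0] for tokens in token_embeddings]

def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy stand-in for the Transformer output: 2 sequences, 3 tokens each, 4 dims.
# The real model produces 384-dimensional vectors for up to 512 tokens.
toy_token_embeddings = [
    [[3.0, 4.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]],
    [[0.0, 6.0, 8.0, 0.0], [2.0, 0.0, 0.0, 1.0], [5.0, 5.0, 5.0, 5.0]],
]

embeddings = [l2_normalize(v) for v in cls_pool(toy_token_embeddings)]

# For unit-length vectors the two similarity functions agree.
print(dot(embeddings[0], embeddings[1]))
print(cosine(embeddings[0], embeddings[1]))
```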

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'What does an Industry Analysis entail?',
    "'a. Executive summary    b. Business Description c. Industry/Sector analysis d. Marketing plan e. Operations plan  f. Financial plan    7.8 What is included in an executive summary? The executive summary is an abstract containing the important points of the business plan. Its purpose is to communicate the plan in a convincing way to important audiences, such as potential investors, so they will read further. It may be the only chapter of the business plan a reader uses to make a quick decision on the proposal. As such, it should fulfill the reader's (financier's) expectations. It is prepared after the total plan has been written. The executive summary should describe the following:  a. The industry and market environment in which the opportunity will develop and  flourish   b. The special and unique business opportunity—the problem the product or service will  be solving   c. The strategies for success—what differentiates the product or service from the  competitors' products  d. The financial potential—the anticipated risk and reward of the business e. The management team—the people who will achieve the results   f.  The resources or capital being requested—a clear statement to your readers about what you hope to gain from them, whether it is capital or other resources    7.9 What is included in a Business Description? The business description explains the business concept by giving a brief yet informative picture of the history, the basic nature, and the purpose of the business, including business objectives and why the business will be successful. The purposes of the business description are to:  a. Express clearly  understanding of the business concept   b. Share enthusiasm for the venture   c. Meet the expectations of the reader by providing a realistic picture of the business  venture    7.10 What is Industry Analysis?'",
    "'-Black cloth, -Khada cloth -Saw dust -0.025 % Sodium hypochlorite -Chick pea / groundnut seedlings -Bleaching powder -Coffee powder -Multivitamin syrup -10 % sucrose -Beaker 500 ml -Measuring cylinder -Egg laying chamber Procedure : 1. Release  10 males and 5 females at 2: 1 ratio in plastic containers and cover with thin black cloth . ( Female require multiple mating to lay fertile eggs ) . 2. To induce the moths to lay more eggs multivitamin syrup 2 drops + 10 % sucrose is given through cotton swabs 3. Daily collect the egg cloth after 3 rd day of copulation . Provide 25- 28 o C , 80- 90 % R.H during egg laying. A female lays 300 –700 eggs 4. Sterilize the egg cloth in 0.025 % sodium hypochlorite for ten seconds and immediately dip the egg cloth in distilled water in 3 different buckets having distilled water one by one  and then dry it in shade. 5. Raise chickpea or groundnut seedlings in a week interval and provide for feeding 6. Place newly hatched larvae on chickpea/groundnut seedlings along with egg cloth for one day or place 3-4 eggs in vials containing artificial diet 7. Pick young larvae and rear on bhendi vegetable individually in  penicillin vials to avoid cannibalism. 8. Daily change diet till pre pupal stage 9. Collect pre –pupae and allow for pupation in plastic container having saw dust 10. Pupae sterilization is done with the help of coffee filter by  dip method 11. Transfer the pupae inside the egg lying chamber by keeping them on a separate petri dish without lid.'",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
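
For semantic search, the similarity scores between one query and a set of passages can be sorted to obtain a ranking. The sketch below shows that step on hypothetical scores; `rank_passages` and `toy_scores` are illustrative names, not part of the Sentence Transformers API.

```python
def rank_passages(query_scores, top_k=5):
    """Return (passage_index, score) pairs sorted by descending score."""
    ranked = sorted(enumerate(query_scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Hypothetical similarity scores between one query and four passages.
toy_scores = [0.21, 0.83, 0.05, 0.64]

top = rank_passages(toy_scores, top_k=2)
print(top)
# [(1, 0.83), (3, 0.64)]
```

With the inference example above, where the first sentence is the query and the remaining two are passages, the input could plausibly be obtained as `similarities[0, 1:].tolist()`.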

Evaluation

Metrics

Information Retrieval

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.5128 |
| cosine_accuracy@5 | 0.9361 |
| cosine_accuracy@10 | 0.9578 |
| cosine_precision@1 | 0.5128 |
| cosine_precision@5 | 0.1872 |
| cosine_precision@10 | 0.0958 |
| cosine_recall@1 | 0.5128 |
| cosine_recall@5 | 0.9361 |
| cosine_recall@10 | 0.9578 |
| cosine_ndcg@5 | 0.7468 |
| cosine_ndcg@10 | 0.7541 |
| cosine_ndcg@100 | 0.7627 |
| cosine_mrr@5 | 0.6829 |
| cosine_mrr@10 | 0.6861 |
| cosine_mrr@100 | 0.6881 |
| cosine_map@100 | 0.6881 |
| dot_accuracy@1 | 0.5128 |
| dot_accuracy@5 | 0.9361 |
| dot_accuracy@10 | 0.9578 |
| dot_precision@1 | 0.5128 |
| dot_precision@5 | 0.1872 |
| dot_precision@10 | 0.0958 |
| dot_recall@1 | 0.5128 |
| dot_recall@5 | 0.9361 |
| dot_recall@10 | 0.9578 |
| dot_ndcg@5 | 0.7468 |
| dot_ndcg@10 | 0.7541 |
| dot_ndcg@100 | 0.7627 |
| dot_mrr@5 | 0.6829 |
| dot_mrr@10 | 0.6861 |
| dot_mrr@100 | 0.6881 |
| dot_map@100 | 0.6881 |
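
The reported values appear consistent with an evaluation in which each query has exactly one relevant passage: recall@k equals accuracy@k, precision@k is accuracy@k divided by k (0.9361 / 5 ≈ 0.1872, 0.9578 / 10 ≈ 0.0958), and map@100 matches mrr@100. The sketch below computes these metrics from the rank of the single relevant passage per query under that assumption; `ir_metrics` and `toy_ranks` are illustrative names, not the evaluator used to produce the numbers above.

```python
def ir_metrics(ranks, k):
    """Compute accuracy/precision/recall/MRR at k.

    ranks: 1-based rank of the single relevant passage for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one relevant passage among k results
        "recall": accuracy,         # one relevant passage per query
        "mrr": sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n,
    }

# Hypothetical ranks of the relevant passage for five queries.
toy_ranks = [1, 3, 1, 7, 2]

metrics = ir_metrics(toy_ranks, k=5)
print({name: round(value, 4) for name, value in metrics.items()})
# {'accuracy': 0.8, 'precision': 0.16, 'recall': 0.8, 'mrr': 0.5667}
```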

Training Details

Training Dataset

Unnamed Dataset

  • Size: 7,033 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    |         | anchor | positive |
    |---------|--------|----------|
    | type    | string | string   |
    | details | min: 6 tokens, mean: 15.86 tokens, max: 35 tokens | min: 116 tokens, mean: 283.94 tokens, max: 512 tokens |
  • Samples:
    anchor positive
    What role do emulsifying and stabilizing agents play in carbonated water? 'The consumption of carbonated water has increased rapidly. As per FSSAI definitions carbonated water conforming to the standards prescribed for packaged drinking water under Food Safety and Standard act, 2006 impregnated with carbon dioxide under pressure and may contain any of the listed additives singly or in combination. Permitted additives include sweeteners (sugar, liquid glucose, dextrose monohydrate, invert sugar, fructose, Honey) fruits & vegetables extractive, permitted flavouring, colouring matter, preservatives, emulsifying and stabilizing agents, acidulants (citric acid, fumaric acid and sorbitol, tartaric acid, phosphoric acid, lactic acid, ascorbic acid, malic acid), edible gums, salts of sodium, calcium and magnesium, vitamins, caffeine not exceeding 145 ppm, ester gum not exceeding 100 ppm and quinine salts not exceeding 100 ppm. It may contain Sodium saccharin not exceeding 100 ppm or Acesulfame-k 300 ppm or Aspartame not exceeding 700 ppm or sucralose not exceeding 300 ppm.'
    What is the purpose of the Agri Clinic and Agri Business Centres scheme? '
    What can be considered as outliers in terms of yield? 'Identification of Outliers: All these above analyses can be used to check whether there was any reason for yield deviation as presented in the CCE data. Then a yield proxy map may be prepared. The Yield proxy map can be derived from remote sensing vegetation indices (single or combination of indices), crop simulation model output, or an integration of various parameters, which are related to crop yield, such as soil, weather (gridded), satellite based products, etc. Whatever, yield proxies to be used, it is the responsibility of the organization to record documentary evidence (from their or other's published work) that the yield proxy is related to the particular crop's yield. Then the IU level yields need to be overlaid on the yield proxy map. Both yield proxy and CCE yield can be divided into 4-5 categories (e.g. Very good, Good, Medium, Poor, Very poor). Wherever there is large mismatch between yield proxy and the CCE yield (more than 2 levels), the CCE yield for that IU can be considered, as outliers.'
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01}
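
The loss configuration above pairs a guide model with a temperature of 0.01. As a rough illustration of the idea behind guided in-sample negative selection, the sketch below drops any in-batch negative that the guide scores as more similar to the anchor than the anchor's own positive, then applies a temperature-scaled cross-entropy over what remains. It is a simplified pure-Python sketch, not the sentence-transformers implementation: it scores only anchor-positive pairs, and the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def gist_style_loss(anchors, positives, guide_anchors, guide_positives, temperature=0.01):
    """Simplified guided contrastive loss; returns (mean loss, masked (i, j) pairs)."""
    losses, masked_pairs = [], []
    for i in range(len(anchors)):
        # Guide similarity of the true pair is the threshold for this anchor.
        threshold = cosine(guide_anchors[i], guide_positives[i])
        logits = {}
        for j in range(len(positives)):
            if j != i and cosine(guide_anchors[i], guide_positives[j]) > threshold:
                masked_pairs.append((i, j))  # likely false negative: exclude it
                continue
            logits[j] = cosine(anchors[i], positives[j]) / temperature
        # Cross-entropy with the true positive (index i) as the target class.
        top = max(logits.values())
        log_z = top + math.log(sum(math.exp(s - top) for s in logits.values()))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses), masked_pairs

# Toy batch (hypothetical): positives 0 and 1 are each closer to the other anchor.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
positives = [[0.6, 0.8, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]

# Guide that agrees with the toy vectors: the two confusable negatives get masked.
guided_loss, masked = gist_style_loss(anchors, positives, anchors, positives)

# Guide that sees every pair as clearly distinct: nothing is masked.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
plain_loss, unmasked = gist_style_loss(anchors, positives, basis, basis)
```

With a temperature as low as 0.01, similarity differences are amplified a hundredfold, so a single unmasked false negative can dominate the loss. That is the motivation for filtering candidates with the guide first.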
    

Evaluation Dataset

Unnamed Dataset

  • Size: 782 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    anchor (type: string)
    • min: 7 tokens
    • mean: 15.85 tokens
    • max: 53 tokens
    positive (type: string)
    • min: 116 tokens
    • mean: 272.65 tokens
    • max: 512 tokens
  • Samples:
    anchor positive
    What diseases do the mentioned pulses have resistance to? '..................................................... pulses. 20. Ta 2IPM - 409-4. _ 2020 (Heera) Meha 2005 (I. P.M. - 99-25) Pusa Vishal 200] H. UM-6 2006 (Malaviya Janakalyani). Malaviya Jyothi 999 (H. UM-) TMV-37 2005T. The BM-37 (t. M. - 99.37) Malaviya 2003 Jan Chetna (H. UM-42) IPM-2-3 2009I. P.M. 2 - 4 20
    What do hypertonic drinks have high levels of? 'There are three types of sports drinks all of which contain various levels of fluid, electrolytes, and carbohydrate. • Isotonic drinks have fluid, electrolytes and 6-8% carbohydrate. Isotonic drinks quickly replace fluids lost by sweating and supply a boost of carbohydrate. This kind of drink is the choice for most athletes especially middle and long distance running or team sports. • Hypotonic drinks have fluids, electrolytes and a low level of carbohydrates. Hypotonic drinks quickly replace flids lost by sweating. This kind of drink is suitable for athletes who need fluid without the boost of carbohydrates such as gymnasts. • Hypertonic drinks have high levels of carbohydrates. Hypertonic drinks can be used to supplement daily carbohydrate intake normally after exercise to top up muscle glycogen stores. In long distance events high levels of energy are required and hypertonic drinks'
    When should sowing be done? 'y Sowing should be done in the first fortnight of June and PR 126,PR 114, PR 121, PR 122, PR 127 are suitable varieties. Divide the field into kiyaras (plot) of desirable size after laser land levelling and apply pre-sowing (rauni) irrigation and prepare field when it comes to tar-wattar (good soil moisture) condition and immediately sow the crop with rice seed drill fitted with inclinedplate metering system or Lucky seed drill (for simultaneously sowing and spray of herbicide) by using 20 to 25 kg seed/ha in 20 cm spaced rows. The seed should be placed at 2-3 cm depth. Before sowing, treat rice seed with 3 g Sprint 75 WS (mencozeb + carbendazim) by dissolving in 10-12 ml water per kg seed; make paste of fungicide solution and rub on the seed.'
  • Loss: GISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.01}
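
The guide model's module list above ends with mean pooling over token embeddings (`pooling_mode_mean_tokens: True`) followed by `Normalize()`. The sketch below shows what those two steps compute for a single sentence, using plain Python lists. The function name and toy values are hypothetical, and it assumes at least one unmasked token and a non-zero pooled vector.

```python
import math

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the unmasked token vectors, then scale the result to unit length."""
    dim = len(token_embeddings[0])
    # Keep only real tokens; padding positions have mask value 0.
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    pooled = [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled]

# Two real tokens and one padding token (toy 2-dimensional vectors).
tokens = [[1.0, 2.0], [3.0, 2.0], [100.0, -100.0]]
mask = [1, 1, 0]
embedding = mean_pool_and_normalize(tokens, mask)
embedding_norm = math.sqrt(sum(x * x for x in embedding))
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.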
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • gradient_accumulation_steps: 4
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 40
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
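
The settings above imply an effective batch size and a step schedule that can be checked against the training log. The sketch below assumes the 7,033 training samples given in the card metadata, the per-device batch size of 8 from the full hyperparameter list, and single-device training. The step-counting rule (batches per epoch, floor-divided by the accumulation factor) is an assumption about how the trainer counts optimizer steps, not something this card states.

```python
import math

train_samples = 7033        # dataset_size from the card metadata
per_device_batch_size = 8   # per_device_train_batch_size
grad_accum_steps = 4        # gradient_accumulation_steps
num_epochs = 40             # num_train_epochs
warmup_ratio = 0.1          # warmup_ratio
num_devices = 1             # assumption: single-device training

# Samples contributing to each optimizer update.
effective_batch_size = per_device_batch_size * grad_accum_steps * num_devices

# Batches per epoch (the last partial batch is kept), then optimizer steps per epoch.
batches_per_epoch = math.ceil(train_samples / (per_device_batch_size * num_devices))
steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)

total_steps = steps_per_epoch * num_epochs
warmup_steps = round(total_steps * warmup_ratio)

# Fractional epoch reached at optimizer step 500.
epoch_at_step_500 = 500 / steps_per_epoch
```

Under these assumptions the run takes 220 optimizer steps per epoch, which is consistent with the training log, where step 500 falls at epoch 2.2727.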

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 40
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step  Training Loss  Validation Loss  val_evaluator_cosine_map@100
2.2727   500   0.2767         0.0931           0.6449
4.5455   1000  0.067          0.0777           0.6501
6.8182   1500  0.0485         0.0621           0.6678
9.0909   2000  0.0361         0.0615           0.6707
11.3636  2500  0.0301         0.0687           0.6765
13.6364  3000  0.0274         0.0661           0.6733
15.9091  3500  0.0223         0.0606           0.6822
18.1818  4000  0.021          0.0563           0.6834
20.4545  4500  0.0203         0.0573           0.6681
22.7273  5000  0.0212         0.0637           0.6770
25.0     5500  0.018          0.0580           0.6781
27.2727  6000  0.0166         0.0567           0.6781
29.5455  6500  0.0194         0.0542           0.6835
31.8182  7000  0.0182         0.0547           0.6897
34.0909  7500  0.0157         0.0549           0.6899
36.3636  8000  0.016          0.053            0.686
38.6364  8500  0.0142         0.0541           0.6881
  • The bold row denotes the saved checkpoint.
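
Since `load_best_model_at_end` is enabled, it can be useful to see which logged step ranks best under each candidate criterion. The sketch below copies the rows from the log above and ranks them two ways. It reads the second loss column as validation loss, which is an assumption, and this card does not state which criterion selected the saved checkpoint.

```python
# (epoch, step, training_loss, validation_loss, cosine_map_at_100), copied from the log above
training_log = [
    (2.2727, 500, 0.2767, 0.0931, 0.6449),
    (4.5455, 1000, 0.067, 0.0777, 0.6501),
    (6.8182, 1500, 0.0485, 0.0621, 0.6678),
    (9.0909, 2000, 0.0361, 0.0615, 0.6707),
    (11.3636, 2500, 0.0301, 0.0687, 0.6765),
    (13.6364, 3000, 0.0274, 0.0661, 0.6733),
    (15.9091, 3500, 0.0223, 0.0606, 0.6822),
    (18.1818, 4000, 0.021, 0.0563, 0.6834),
    (20.4545, 4500, 0.0203, 0.0573, 0.6681),
    (22.7273, 5000, 0.0212, 0.0637, 0.6770),
    (25.0, 5500, 0.018, 0.0580, 0.6781),
    (27.2727, 6000, 0.0166, 0.0567, 0.6781),
    (29.5455, 6500, 0.0194, 0.0542, 0.6835),
    (31.8182, 7000, 0.0182, 0.0547, 0.6897),
    (34.0909, 7500, 0.0157, 0.0549, 0.6899),
    (36.3636, 8000, 0.016, 0.053, 0.686),
    (38.6364, 8500, 0.0142, 0.0541, 0.6881),
]

# Rank by retrieval quality (higher is better) and by validation loss (lower is better).
best_by_map = max(training_log, key=lambda row: row[4])
best_by_val_loss = min(training_log, key=lambda row: row[3])

print(f"Highest cosine MAP@100: step {best_by_map[1]} ({best_by_map[4]})")
print(f"Lowest validation loss: step {best_by_val_loss[1]} ({best_by_val_loss[3]})")
```

The two criteria pick different steps here, so the choice of metric for selecting the best model matters for this run.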

Framework Versions

  • Python: 3.11.7
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.1
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}, 
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}