nielsr HF Staff commited on
Commit
24b9770
·
verified ·
1 Parent(s): b350739

Improve model card: Add pipeline tag, library, license & usage

Browse files

This PR enhances the model card by adding key metadata such as `pipeline_tag: text-ranking` and `library_name: sentence-transformers`, which improve discoverability and usability on the Hugging Face Hub. It also explicitly sets the `license` to `mit` and provides a clear Python usage example for premise retrieval.

Files changed (1) hide show
  1. README.md +40 -1
README.md CHANGED
@@ -1,6 +1,9 @@
1
  ---
2
- pipeline_tag: sentence-similarity
 
 
3
  ---
 
4
  # Model Card: Assisting Mathematical Formalization with A Learning-based Premise Retriever
5
 
6
  ## Model Description
@@ -15,7 +18,43 @@ The model implementation and code are available at:
15
 
16
  [Try our model](https://premise-search.com)
17
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
18
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
 
20
  ## Citation
21
  If you use this model, please cite the following paper:
 
1
  ---
2
+ pipeline_tag: text-ranking
3
+ library_name: sentence-transformers
4
+ license: mit
5
  ---
6
+
7
  # Model Card: Assisting Mathematical Formalization with A Learning-based Premise Retriever
8
 
9
  ## Model Description
 
18
 
19
  [Try our model](https://premise-search.com)
20
 
21
+ ## Usage
22
+
23
+ You can use this model with the `sentence-transformers` library to embed queries and premises and then calculate their similarity for retrieval.
24
+
25
+ ```python
26
+ from sentence_transformers import SentenceTransformer, util
27
+ import torch
28
+
29
+ # Load the pretrained model
30
+ model = SentenceTransformer('ruc-ai4math/Lean_State_Search_Random')
31
+
32
+ # Example Lean proof state (query) and a list of premises
33
+ query = "<GOAL> (n : \u2115), n + 0 = n </GOAL>"
34
+ premises = [
35
+ "<VAR> (n : \u2115) </VAR> <GOAL> n + 0 = n </GOAL>",
36
+ "<VAR> (n m : \u2115) </VAR> <GOAL> n + m = m + n </GOAL>",
37
+ "<VAR> (n : \u2115) </VAR> <GOAL> n = n </GOAL>",
38
+ "lemma add_zero (n : \u2115) : n + 0 = n := by sorry" # An actual Lean lemma
39
+ ]
40
 
41
+ # Encode the query and premises into embeddings
42
+ query_embedding = model.encode(query, convert_to_tensor=True)
43
+ premise_embeddings = model.encode(premises, convert_to_tensor=True)
44
+
45
+ # Calculate cosine similarity between the query and all premises
46
+ cosine_scores = util.cos_sim(query_embedding, premise_embeddings)
47
+
48
+ # Print the scores for each premise
49
+ print("Similarity scores:")
50
+ for i, score in enumerate(cosine_scores[0]):
51
+ print(f" - Premise {i+1}: '{premises[i]}', Score: {score.item():.4f}")
52
+
53
+ # Find the index of the premise with the highest similarity score
54
+ best_match_idx = torch.argmax(cosine_scores).item()
55
+ print(f"
56
+ Best matching premise: '{premises[best_match_idx]}'")
57
+ ```
58
 
59
  ## Citation
60
  If you use this model, please cite the following paper: