Improve model card: Add pipeline tag, library, license & usage

This PR updates the model card metadata. It changes `pipeline_tag` from `sentence-similarity` to `text-ranking` and adds `library_name: sentence-transformers`, which improves discoverability and usability on the Hugging Face Hub. It also sets the `license` explicitly to `mit` and adds a Python usage example for premise retrieval.
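The metadata keys listed above live in the YAML front matter at the top of `README.md`, between the two `---` lines. As a quick local check, the sketch below extracts them using only the standard library. `read_front_matter` is a hypothetical helper written for illustration. It assumes flat `key: value` pairs, while real model card headers are YAML and may contain lists or nested keys that this sketch does not handle.

```python
# Minimal sketch (hypothetical helper): read flat `key: value` pairs from a
# model card's YAML front matter. Assumes the block is delimited by `---`
# lines and contains no lists or nested YAML.
def read_front_matter(readme_text: str) -> dict[str, str]:
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: end of the metadata block
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            metadata[key.strip()] = value.strip()
    return metadata


# The three metadata keys shown in the diff (rest of the README abbreviated).
readme = """---
pipeline_tag: text-ranking
library_name: sentence-transformers
license: mit
---
# Model Card: Assisting Mathematical Formalization with A Learning-based Premise Retriever
"""

meta = read_front_matter(readme)
print(meta)
```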

Files changed (1)

README.md +40 -1


@@ -1,6 +1,9 @@
 ---
-pipeline_tag: sentence-similarity
+pipeline_tag: text-ranking
+library_name: sentence-transformers
+license: mit
 ---
 # Model Card: Assisting Mathematical Formalization with A Learning-based Premise Retriever
 ## Model Description
@@ -15,7 +18,43 @@ The model implementation and code are available at:
 [Try our model](https://premise-search.com)
+## Usage
+You can use this model with the `sentence-transformers` library to embed queries and premises and then calculate their similarity for retrieval.
+```python
+from sentence_transformers import SentenceTransformer, util
+import torch
+# Load the pretrained model
+model = SentenceTransformer('ruc-ai4math/Lean_State_Search_Random')
+# Example Lean proof state (query) and a list of premises
+query = "<GOAL> (n : ℕ), n + 0 = n </GOAL>"
+premises = [
+    "<VAR> (n : ℕ) </VAR> <GOAL> n + 0 = n </GOAL>",
+    "<VAR> (n m : ℕ) </VAR> <GOAL> n + m = m + n </GOAL>",
+    "<VAR> (n : ℕ) </VAR> <GOAL> n = n </GOAL>",
+    "lemma add_zero (n : ℕ) : n + 0 = n := by sorry"  # A raw Lean lemma statement (proof elided with `sorry`)
+]
+# Encode the query and premises into embeddings
+query_embedding = model.encode(query, convert_to_tensor=True)
+premise_embeddings = model.encode(premises, convert_to_tensor=True)
+# Calculate cosine similarity between the query and all premises
+cosine_scores = util.cos_sim(query_embedding, premise_embeddings)
+# Print the scores for each premise
+print("Similarity scores:")
+for i, score in enumerate(cosine_scores[0]):
+    print(f"  - Premise {i+1}: '{premises[i]}', Score: {score.item():.4f}")
+# Find the index of the premise with the highest similarity score
+best_match_idx = torch.argmax(cosine_scores).item()
+print(f"\nBest matching premise: '{premises[best_match_idx]}'")
+```
 ## Citation
 If you use this model, please cite the following paper:
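The usage example ranks premises by the cosine similarity between the query embedding and each premise embedding (`util.cos_sim`), then picks the best one with `torch.argmax`. The same ranking step can be sketched in plain Python, here extended to return the top `k` matches instead of only the best one. The 3-dimensional vectors are made-up toy values standing in for real `model.encode` outputs, and `cos_sim` and `rank_premises` are hypothetical helpers, not part of the model's code.

```python
import math

def cos_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_premises(query_vec, premise_vecs, top_k=2):
    """Return (premise index, score) pairs for the top_k most similar premises."""
    scores = [cos_sim(query_vec, p) for p in premise_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]

# Hypothetical toy embeddings (not real model outputs)
query_vec = [1.0, 0.0, 1.0]
premise_vecs = [
    [1.0, 0.0, 0.9],  # nearly parallel to the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.5, 0.5, 0.5],  # partially aligned
]

top = rank_premises(query_vec, premise_vecs, top_k=2)
for idx, score in top:
    print(f"premise {idx}: {score:.4f}")
```

With real embeddings, `sentence_transformers.util.semantic_search(query_embedding, premise_embeddings, top_k=...)` performs an equivalent top-k search over the encoded premises and scales better to large premise libraries than a Python loop.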