Spaces: ought/raft-leaderboard

lewtun (HF staff) committed on Aug 27, 2021

Commit b66bb5e

1 Parent(s): b0781a3

Cleanup

Files changed (1)

app.py +8 -5

@@ -38,7 +38,6 @@ def download_submissions():
         tags = extract_tags(dataset)
         if tags.get("benchmark") == "ought/raft" and tags.get("type") == "evaluation":
             submissions.append(dataset)
-    submissions = sorted(submissions, key=lambda x: int(x["id"].split("-")[-1]))
     return submissions
@@ -47,7 +46,7 @@ def format_submissions(submissions):
     # TODO(lewtun): delete / filter all the junk repos from development
     # The following picks the latest submissions which adhere to the model card schema
-    for submission in submissions[-2:]:
+    for submission in submissions:
         submission_id = submission["id"]
         response = requests.get(
             f"http://huggingface.co/api/datasets/{submission_id}?full=true",
@@ -80,15 +79,19 @@ def format_submissions(submissions):
 ###########
 st.set_page_config(layout="wide")
 st.title("RAFT: Real-world Annotated Few-shot Tasks")
-st.markdown("""
-Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? RAFT is a few-shot classification benchmark that tests language models:
+st.markdown(
+    """
+Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants?
+[RAFT](https://raft.elicit.org) is a few-shot classification benchmark that tests language models:
 - across multiple domains (lit review, tweets, customer interaction, etc.)
 - on economically valuable classification tasks (someone inherently cares about the task)
 - in a setting that mirrors deployment (50 examples per task, info retrieval allowed, hidden test set)
 To submit to RAFT, follow the instruction posted on [this page](https://github.com/oughtinc/raft_submission).
-"""
-)
+"""
+)
 submissions = download_submissions()
 df = format_submissions(submissions)
 # hack to remove index column from https://github.com/streamlit/streamlit/issues/641
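
The behavioural part of this diff is the removal of the sort in `download_submissions` and of the `[-2:]` slice in `format_submissions`. The sketch below illustrates the combined effect on a toy list. The submission IDs are made up for illustration and only the `"id"` field is modelled; the real entries come from the Hub and are not shown in this diff. The helper name `numeric_suffix` is also hypothetical; it wraps the lambda from the removed line.

```python
# Hypothetical records standing in for the dataset metadata used by app.py;
# the IDs below are invented for illustration.
submissions = [
    {"id": "user/submission-12"},
    {"id": "user/submission-3"},
    {"id": "user/submission-7"},
]


def numeric_suffix(submission):
    """Sort key from the removed line: the integer after the last hyphen in the ID."""
    return int(submission["id"].split("-")[-1])


# Before the commit: sort by numeric suffix, then keep only the last two entries.
before = sorted(submissions, key=numeric_suffix)[-2:]

# After the commit: every submission is processed, in the order it was downloaded.
after = list(submissions)

print([s["id"] for s in before])
print([s["id"] for s in after])
```

Note that the removed key calls `int()` on the ID suffix, so it raises `ValueError` for any ID whose last hyphen-separated part is not an integer; after the commit no such assumption about ID format is made.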