Space: jerome-white/llm-bradley-terry
jerome-white committed on Mar 28, 2024

Commit d4dddf1 (1 parent: 180caf6)

Allow Alpaca and Arena results to be presented in the same space

Files changed (5)

app.py CHANGED

@@ -13,6 +13,7 @@ from datasets import load_dataset
 from scipy.special import expit
 HDI = cl.namedtuple('HDI', 'lower, upper')
+TabGroup = cl.namedtuple('TabGroup', 'name, docs, dataset')
 #
 # See https://cran.r-project.org/package=HDInterval
@@ -46,7 +47,7 @@ def load(repo):
         model,
         'value',
     ]
-    dataset = load_dataset(repo)
+    dataset = load_dataset(str(repo))
     return (dataset
             .get('train')
@@ -190,11 +191,10 @@ class DocumentationReader:
 #
 #
 #
-with gr.Blocks() as demo:
-    df = load('jerome-white/alpaca-bt-stan')
-    docs = DocumentationReader(Path('docs'))
-    gr.Markdown('# Alpaca Bradley–Terry')
+def layout(tab):
+    df = load(Path('jerome-white', tab.dataset))
+    docs = DocumentationReader(Path('docs', tab.docs))
     with gr.Row():
         with gr.Column():
             gr.Markdown(docs['readme'])
@@ -232,8 +232,9 @@ with gr.Blocks() as demo:
                 ''')
             with gr.Column():
-                models = sorted(df['model'].unique(), key=lambda x: x.lower())
-                drops = ft.partial(gr.Dropdown, choices=models)
+                models = df['model'].unique()
+                choices = sorted(models, key=lambda x: x.lower())
+                drops = ft.partial(gr.Dropdown, choices=choices)
                 inputs = [ drops(label=f'Model {x}') for x in range(1, 3) ]
                 button = gr.Button(value='Compare!')
@@ -242,4 +243,17 @@ with gr.Blocks() as demo:
     with gr.Accordion('Disclaimer', open=False):
         gr.Markdown(docs['disclaimer'])
+#
+#
+#
+with gr.Blocks() as demo:
+    tabs = it.starmap(TabGroup, (
+        ('Alpaca', 'alpaca', 'alpaca-bt-stan'),
+        ('Chatbot Arena', 'arena', 'arena-bt-stan'),
+    ))
+    for t in tabs:
+        with gr.Tab(t.name):
+            layout(t)
 demo.launch()
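
The change to app.py turns one hard-coded page into a function that is called once per tab. The following is a minimal sketch of that data flow, with Gradio left out so it runs on the standard library alone. `describe` is a hypothetical stand-in for `layout`, the `cl` and `it` aliases are assumed to refer to `collections` and `itertools`, and `PurePosixPath` is used instead of the commit's `Path` (an assumption made here so the joined repository id always contains a forward slash).

```python
import collections as cl
import itertools as it
from pathlib import PurePosixPath

TabGroup = cl.namedtuple('TabGroup', 'name, docs, dataset')

def describe(tab):
    # Hypothetical stand-in for layout(): every per-tab resource is
    # derived from the TabGroup argument rather than from a global.
    repo = PurePosixPath('jerome-white', tab.dataset)
    docs = PurePosixPath('docs', tab.docs)
    return (tab.name, str(repo), str(docs))

# starmap unpacks each tuple into TabGroup(name, docs, dataset)
tabs = it.starmap(TabGroup, (
    ('Alpaca', 'alpaca', 'alpaca-bt-stan'),
    ('Chatbot Arena', 'arena', 'arena-bt-stan'),
))

summary = [describe(t) for t in tabs]
for row in summary:
    print(row)
```

Adding a third leaderboard under this design would only need another tuple in the list and a matching directory under `docs`.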

docs/{disclaimer.md → alpaca/disclaimer.md} RENAMED

File without changes

docs/{readme.md → alpaca/readme.md} RENAMED

File without changes

docs/arena/disclaimer.md ADDED

+# Disclaimer
+
+This Space is primarily intended for exploration. For now its results
+should be treated as points of reference rather than absolute
+facts. Viewers are encouraged to study the pipeline and understand the
+model to help put the results into context.
+
+Suggestions for improving this Space from those familiar with Chatbot
+Arena or Bayesian data analysis are welcome! Please use the
+[community](https://huggingface.co/spaces/jerome-white/arena-bradley-terry/discussions)
+to do so.
+
+## Resources
+
+* [Source code](https://github.com/jerome-white/alpaca-bda/tree/chatbot-arena) for
+  producing results
+
+## TODO
+
+* Extend the Stan model to incorporate ties and response presentation
+  ordering
+* Add details of the MCMC chains
+* Automate data processing
+* Explicit documentation of the process

docs/arena/readme.md ADDED

+[LMSYS Chatbot Arena](https://lmsys.org/blog/2023-05-03-arena/) is an
+LLM evaluation platform. This Space presents an alternative method of
+ranking based on the [Bradley–Terry
+model](https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model)
+(BT). This Space takes a Bayesian approach to BT parameter estimation,
+unlike the maximum likelihood estimation (MLE) approach used by the
+LMSYS organization.
+
+This Space is divided into two primary sections: the first presents a
+ranking of models based on estimated ability. The figure on the right
+visualizes this ranking for the top 10 models, while the table below
+presents the full set. The second section estimates the probability
+that one model will be preferred to another.
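
Under the Bradley–Terry model, the probability that one model is preferred to another depends only on the difference between their abilities. The sketch below illustrates the idea, assuming abilities are on the log-odds scale (which would be consistent with app.py importing `expit`) and that posterior draws are available as plain lists. The function name `preference` and the sample values are hypothetical, made up for illustration; this is not the Space's actual implementation.

```python
import math

def expit(x):
    # Logistic function; a standard-library stand-in for scipy.special.expit
    return 1 / (1 + math.exp(-x))

def preference(draws_a, draws_b):
    # Average the per-draw preference probability over paired posterior draws
    probs = [expit(a - b) for (a, b) in zip(draws_a, draws_b)]
    return sum(probs) / len(probs)

# Made-up posterior draws of ability for two models (illustration only)
strong = [1.2, 0.9, 1.1, 1.0]
weak = [0.1, 0.3, -0.2, 0.0]

p = preference(strong, weak)
print(f'P(strong preferred to weak) = {p:.3f}')
```

Averaging over draws, rather than plugging in a single point estimate, carries the posterior uncertainty in the abilities through to the reported probability.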