Update.
src/about.py  +5 -5  CHANGED
@@ -12,7 +12,7 @@ WHAT_IS_F1_HTML_TOP = f"""
 
 <p class="text-lg mb-4 f1-p">We believe that existing benchmarks fail to capture the deep reasoning skills required for complex, research-level algorithmic problems. To address this gap, <a href="{PAPER_URL}" target="_blank" rel="noopener noreferrer" class="f1-a">we introduce <strong>FormulaOne</strong></a>.</p>
 
-<p class="mb-4 f1-p"><strong>FormulaOne</strong> consists of
+<p class="mb-4 f1-p"><strong>FormulaOne</strong> consists of novel dynamic programming problems over graphs. The problems are organised into three categories, ranging from moderate difficulty and all the way up to research-level.</p>
 
 <!-- Clean, centered "table" using a single grid -->
 <div class="f1-grid-wrap" role="region" aria-label="FormulaOne categories">
@@ -50,7 +50,7 @@ WHAT_IS_F1_HTML_BOTTOM_A_BEFORE_TABS = """
 <div class="f1-container">
 <section>
 <p class="mb-4 f1-p">The latter category is incredibly demanding, requiring resolution of many points of uncertainty, and involving an array of reasoning steps, including topological and geometric insight, knowledge of mathematical domains such as extremal graph theory and logic, combinatorial considerations, precise implementation, and more.</p>
-<p class="f1-p">Despite <a href="https://epoch.ai/frontiermath" target="_blank" rel="noopener noreferrer" class="f1-a">impressive</a> <a href="https://artificialanalysis.ai/evaluations/gpqa-diamond" target="_blank" rel="noopener noreferrer" class="f1-a">performance</a> on existing benchmarks, presently <strong>no model solves even a single Tier
+<p class="f1-p">Despite <a href="https://epoch.ai/frontiermath" target="_blank" rel="noopener noreferrer" class="f1-a">impressive</a> <a href="https://artificialanalysis.ai/evaluations/gpqa-diamond" target="_blank" rel="noopener noreferrer" class="f1-a">performance</a> on existing benchmarks, presently <strong>no model solves even a single 'Deepest Tier' problem</strong>.</p>
 </section>
 
 <section>
@@ -82,14 +82,14 @@ WHAT_IS_F1_HTML_AFTER_VIDEO = """
 <li class="f1-li"><strong>Consistency:</strong> The solution must produce the same output for a given graph, regardless of the specific tree decomposition.</li>
 <li class="f1-li"><strong>Efficiency:</strong> The solution must be truly <a href="https://en.wikipedia.org/wiki/Parameterized_complexity" target="_blank" rel="noopener noreferrer" class="f1-a">fixed-parameter linear</a>.</li>
 </ul>
-<p class="mb-4 f1-p">To support research and encourage community contributions, the <code>FormulaOne-
+<p class="mb-4 f1-p">To support research and encourage community contributions, the <code>FormulaOne-Shallow</code> ("warmup") dataset is released as a public resource for training and fine-tuning models. The complete test suite for all 100 'Shallow' problems is available, alongside a standalone evaluation environment, in our <a href="https://github.com/double-ai/formulaone-dataset/tree/main" target="_blank" rel="noopener noreferrer" class="f1-a">GitHub repository</a>.</p>
 <p class="f1-p">To maintain the integrity of the core benchmark, only a minimal subset of tests is released for the Deeper and Deepest Tier problems. Solutions submitted for evaluation on our benchmark are evaluated against a withheld comprehensive test-suite.</p>
 """
 
 # Evaluation: begins the "Model Accuracy" subsection and the Warmup paragraph, up to (but not including) the Warmup figure.
 WHAT_IS_F1_HTML_EVAL_BEFORE_WARMUPFIG = """
 <h2 class="f1-h2">Model Accuracy</h2>
-<p class="mb-4 f1-p">On the <strong>FormulaOne-
+<p class="mb-4 f1-p">On the <strong>FormulaOne-Shallow</strong> problems, frontier models perform reasonably well. This confirms they have a foundational capability for these types of algorithmic tasks, in other words, the tasks are squarely in-distribution.</p>
 <!-- warmup_performance figure inserted via gr.Image in app.py -->
 """
 
@@ -101,7 +101,7 @@ WHAT_IS_F1_HTML_AFTER_WARMUPFIG = """
 
 # Tail after Deeper figure (closes evaluation section + container)
 WHAT_IS_F1_HTML_AFTER_TIER1FIG_TAIL = """
-<p class="f1-p">This trend culminates in <strong>Tier
+<p class="f1-p">This trend culminates in <strong>Deepest Tier</strong>, where the difficulty is characteristic of exploratory research problems. On this set of 20 problems, no current frontier model solves even a single one. This result starkly illustrates the gap that remains between high performance on existing benchmarks and the deep algorithmic reasoning required for truly complex problems.</p>
 </section>
 </div>
 """
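The comments in the last two hunks note that the performance figures are inserted via gr.Image in app.py, with the HTML split into constants on either side of each figure. app.py itself is not part of this diff, so the sketch below is only a plausible assembly under that assumption; the figure paths, the Blocks layout, and the import path are invented for illustration.

# Hypothetical sketch only: app.py is not shown in this diff. It illustrates
# how the split HTML constants from src/about.py could be interleaved with
# the two figures that the inline comments mention.
import gradio as gr

from src.about import (
    WHAT_IS_F1_HTML_EVAL_BEFORE_WARMUPFIG,
    WHAT_IS_F1_HTML_AFTER_WARMUPFIG,
    WHAT_IS_F1_HTML_AFTER_TIER1FIG_TAIL,
)

# Placeholder asset paths; the real figure files are not named in this diff.
WARMUP_FIG = "assets/warmup_performance.png"
DEEPER_FIG = "assets/deeper_performance.png"

with gr.Blocks() as demo:
    gr.HTML(WHAT_IS_F1_HTML_EVAL_BEFORE_WARMUPFIG)   # "Model Accuracy" + Shallow paragraph
    gr.Image(WARMUP_FIG, show_label=False)           # warmup_performance figure
    gr.HTML(WHAT_IS_F1_HTML_AFTER_WARMUPFIG)         # text between the two figures
    gr.Image(DEEPER_FIG, show_label=False)           # Deeper-tier figure
    gr.HTML(WHAT_IS_F1_HTML_AFTER_TIER1FIG_TAIL)     # closing "Deepest Tier" paragraph

if __name__ == "__main__":
    demo.launch()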
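The "Consistency" and "Efficiency" bullets in the third hunk assume familiarity with dynamic programming over tree decompositions. The snippet below is a minimal, self-contained illustration of that technique and is not code from the benchmark or its repository: it counts independent sets of a small graph by processing the bags of a user-supplied tree decomposition, runs in time linear in the number of bags once the width is fixed (the fixed-parameter-linear shape of the requirement), and returns the same answer for two different valid decompositions of the same graph, which is the consistency property the bullet describes. The graph, the decomposition encoding, and every name here are invented for this example.

# Illustration only (not benchmark code): count independent sets of a graph
# via a dynamic program over an arbitrary rooted tree decomposition.
# For each decomposition node we keep a table mapping each independent subset
# of its bag to the number of independent sets of the subgraph covered by the
# node's subtree whose intersection with the bag is exactly that subset.
from itertools import combinations


def count_independent_sets(edges, decomposition, root=0):
    """edges: set of frozenset({u, v}).
    decomposition: dict node_id -> (bag: frozenset, children: list of node_id)."""

    def is_independent(vertices):
        return not any(frozenset(p) in edges for p in combinations(vertices, 2))

    def solve(node):
        bag, children = decomposition[node]
        child_tables = [solve(c) for c in children]
        table = {}
        for r in range(len(bag) + 1):
            for chosen in combinations(sorted(bag), r):
                s = frozenset(chosen)
                if not is_independent(s):
                    continue
                ways = 1
                for child, child_table in zip(children, child_tables):
                    child_bag = decomposition[child][0]
                    # A child state is compatible iff it agrees with s on the
                    # vertices the two bags share.
                    ways *= sum(cnt for s_c, cnt in child_table.items()
                                if s_c & bag == s & child_bag)
                table[s] = ways
        return table

    return sum(solve(root).values())


# The path graph 1-2-3-4 with two different valid width-1 decompositions:
edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]}
decomp_a = {0: (frozenset({1, 2}), [1]),
            1: (frozenset({2, 3}), [2]),
            2: (frozenset({3, 4}), [])}
decomp_b = {0: (frozenset({2, 3}), [1, 2]),
            1: (frozenset({1, 2}), []),
            2: (frozenset({3, 4}), [])}

# Same graph, different decompositions, same answer (8): the consistency property.
assert count_independent_sets(edges, decomp_a) == count_independent_sets(edges, decomp_b) == 8

Real FormulaOne problems are far richer than this toy count, but the per-bag table structure is the general technique that the two requirements above refer to.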