Armeddinosaur committed on
Commit 17ad9a6
· 1 Parent(s): cf2253a

Updating readme

Files changed (1):
  1. README.md +15 -131
README.md CHANGED
@@ -16,90 +16,18 @@ This application provides a visual leaderboard for comparing AI model performanc

The leaderboard uses the MLRC-BENCH benchmark, which measures what percentage of the top human-to-baseline performance gap an agent can close. Success is defined as achieving at least 5% of the margin by which the top human solution surpasses the baseline.

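To make the scoring rule concrete, here is a minimal sketch of the computation it describes. The function and variable names are illustrative, not taken from the benchmark's code:

```python
# Hypothetical sketch of the MLRC-BENCH scoring rule described above.
# Names are illustrative; the benchmark's own implementation may differ.

def margin_to_human(agent_score: float, baseline: float, top_human: float) -> float:
    """Fraction of the top-human-to-baseline gap that the agent closes."""
    return (agent_score - baseline) / (top_human - baseline)


def is_success(agent_score: float, baseline: float, top_human: float) -> bool:
    """Success means closing at least 5% of the human-baseline margin."""
    return margin_to_human(agent_score, baseline, top_human) >= 0.05


# Example: baseline 0.60, top human 0.80 -> gap 0.20; an agent at 0.62
# closes 0.02 / 0.20 = 10% of the gap, so it counts as a success.
print(is_success(0.62, baseline=0.60, top_human=0.80))  # True
```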
- ### Key Features
-
- - **Interactive Filtering**: Select specific model types and tasks to focus on
- - **Customizable Metrics**: Compare models using "Margin to Human" performance scores
- - **Hierarchical Table Display**: Fixed columns with a scrollable metrics section
- - **Conditional Formatting**: Visual indicators for positive/negative values
- - **Model Type Color Coding**: Different colors for Open Source, Open Weights, and Closed Source models
- - **Medal Indicators**: Top-ranked models receive gold, silver, and bronze medals
- - **Task Descriptions**: Detailed explanations of what each task measures
-
- ## Project Structure
-
- The codebase follows a modular architecture for improved maintainability and separation of concerns:
-
- ```
- app.py (main entry point)
- ├── requirements.txt
- └── src/
-     ├── app.py (main application logic)
-     ├── components/
-     │   ├── header.py (header and footer components)
-     │   ├── filters.py (filter selection components)
-     │   ├── leaderboard.py (leaderboard table component)
-     │   └── tasks.py (task descriptions component)
-     ├── data/
-     │   ├── processors.py (data processing utilities)
-     │   └── metrics/
-     │       └── margin_to_human.json (metric data file)
-     ├── styles/
-     │   ├── base.py (combined styles)
-     │   ├── components.py (component styling)
-     │   ├── tables.py (table-specific styling)
-     │   └── theme.py (theme definitions)
-     └── utils/
-         ├── config.py (configuration settings)
-         └── data_loader.py (data loading utilities)
- ```
-
- ### Module Descriptions
-
- #### Core Files
- - `app.py` (root): Simple entry point that imports and calls the main function
- - `src/app.py`: Main application logic; coordinates the overall flow
-
- #### Components
- - `header.py`: Manages the page header, section headers, and footer components
- - `filters.py`: Handles metric, task, and model type selection interfaces
- - `leaderboard.py`: Renders the custom HTML leaderboard table
- - `tasks.py`: Renders the task descriptions section
-
- #### Data Processing
- - `processors.py`: Contains utilities for data formatting and styling
- - `data_loader.py`: Functions for loading and processing metric data
-
- #### Styling
- - `theme.py`: Base theme definitions and color schemes
- - `components.py`: Styling for UI components (buttons, cards, etc.)
- - `tables.py`: Styling for tables and data displays
- - `base.py`: Combines all styles for application-wide use
-
- #### Configuration
- - `config.py`: Contains all configuration settings, including themes, metrics, and model categorizations
-
- ## Benefits of Modular Architecture
-
- The modular structure provides several advantages:
-
- 1. **Improved Code Organization**: Code is logically separated based on functionality
- 2. **Better Separation of Concerns**: Each module has a clear, single responsibility
- 3. **Enhanced Maintainability**: Changes to one aspect don't require modifying the entire codebase
- 4. **Simplified Testing**: Components can be tested independently
- 5. **Easier Collaboration**: Multiple developers can work on different parts simultaneously
- 6. **Cleaner Entry Point**: The main app file stays simple and focused
-
## Installation & Setup

1. Clone the repository
- ```bash
- git clone <repository-url>
- cd model-capability-leaderboard
- ```
+ ```bash
+ git clone https://huggingface.co/spaces/launch/MLRC_Bench
+ cd MLRC_Bench
+ ```

- 2. Install the required dependencies
+ 2. Set up a virtual environment and install the required dependencies
```bash
+ python -m venv env
+ source env/bin/activate
pip install -r requirements.txt
```

@@ -108,7 +36,14 @@ The modular structure provides several advantages:
streamlit run app.py
```

- ## Extending the Application
+ ### Updating Metrics
+
+ To update the table, update the corresponding metric file in the `src/data/metrics` directory (see the sketch after this subsection)
+
+ ### Updating Text
+
+ To update the Benchmark Details tab, edit `src/components/tasks.py`
+ To update the metric definitions, also edit `src/components/tasks.py`
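As a concrete illustration of the metric-file update described under "Updating Metrics" above, the sketch below edits one entry in the metric file named in the project tree. The task and model names are placeholders, and this is not an official update script:

```python
# Hypothetical sketch: change one model's score in an existing metric file.
# The path comes from the repository layout; "some-task" / "some-model"
# are placeholders, not real entries.
import json
from pathlib import Path

metric_file = Path("src/data/metrics/margin_to_human.json")
metrics = json.loads(metric_file.read_text())

# Metric files map task name -> {model name -> score} (see Data Format below).
metrics.setdefault("some-task", {})["some-model"] = 0.12

metric_file.write_text(json.dumps(metrics, indent=2))
```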

### Adding New Metrics

@@ -156,57 +91,6 @@ To add new model types:
}
```

- ### Modifying the UI Theme
-
- To change the theme colors:
-
- 1. Update the `dark_theme` dictionary in `src/utils/config.py` (illustrative sketch below)
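For orientation, a hypothetical shape for that dictionary; the actual keys and colors in `config.py` may differ:

```python
# Hypothetical example of the dark_theme dictionary in src/utils/config.py.
# Key names and hex values are assumptions, not the project's real settings.
dark_theme = {
    "background": "#0e1117",
    "text": "#fafafa",
    "accent": "#4da3ff",
    "positive": "#21c354",  # e.g. for positive-value conditional formatting
    "negative": "#ff4b4b",  # e.g. for negative-value conditional formatting
}
```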
-
- ### Adding New Components
-
- To add new visualization components:
-
- 1. Create a new file in the `src/components/` directory
- 2. Import and use the component in `src/app.py` (see the sketch below)
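A minimal sketch of what such a component file might look like. The file and function names are hypothetical; Streamlit usage is inferred from the `streamlit run app.py` command above:

```python
# Hypothetical src/components/example_chart.py
import streamlit as st


def render_example_chart(df):
    """Render a simple bar chart for the currently selected metric."""
    st.subheader("Example Chart")
    st.bar_chart(df)
```

It could then be wired in with `from src.components.example_chart import render_example_chart` inside `src/app.py`.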
-
- ## Data Format
-
- The application uses JSON files for metric data. The expected format is:
-
- ```json
- {
-   "task-name": {
-     "model-name-1": value,
-     "model-name-2": value
-   },
-   "another-task": {
-     "model-name-1": value,
-     "model-name-2": value
-   }
- }
- ```
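A hedged sketch of how a loader for this format might look; the actual `process_data` in `data_loader.py` (exercised by the test below) may differ:

```python
# Assumed reading of the metric-file format above: rows are models,
# columns are task names. Not the project's actual loader.
import json

import pandas as pd


def load_metric(path: str) -> pd.DataFrame:
    """Rows are models, columns are (title-cased) task names."""
    with open(path) as f:
        raw = json.load(f)
    # Outer keys (tasks) become columns; inner keys (models) become the index.
    return pd.DataFrame({task.title(): models for task, models in raw.items()})


df = load_metric("src/data/metrics/margin_to_human.json")
print(df.head())
```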
-
- ## Testing
-
- This modular structure makes it easier to write focused unit tests:
-
- ```python
- # Example test for data_loader.py
- from src.utils.data_loader import process_data  # assumed import path
-
- def test_process_data():
-     test_data = {"task": {"model": 0.5}}
-     df = process_data(test_data)
-     assert "Task" in df.columns
-     assert df.loc["model", "Task"] == 0.5
- ```
-
## License

[MIT License](LICENSE)
-
- ## Contributing
-
- Contributions are welcome! Please feel free to submit a Pull Request.
-
- ## Contact
-
- For any questions or feedback, please contact [[email protected]](mailto:[email protected]).