Yanglet committed on
Commit 8261efc · 2 Parent(s): f362259 ce67009

Merge pull request #25 from miragecoa/main

Files changed (1)
  1. README.md +88 -69
README.md CHANGED
@@ -1,91 +1,110 @@
  ---
- title: Open Financial LLM Leaderboard
- emoji: 🏆
- colorFrom: blue
- colorTo: red
- sdk: docker
- hf_oauth: true
  pinned: true
  license: apache-2.0
- duplicated_from: open-llm-leaderboard/open_llm_leaderboard
- short_description: Evaluating LLMs on Multilingual Multimodal Financial Tasks
- tags:
- - leaderboard
- - modality:text
- - submission:manual
- - test:public
- - judge:function
- - eval:generation
- - domain:financial
  ---
-
- # Open LLM Leaderboard
-
- Modern React interface for comparing Large Language Models (LLMs) in an open and reproducible way.
-
- ## Features
-
- - 📊 Interactive table with advanced sorting and filtering
- - 🔍 Semantic model search
- - 📌 Pin models for comparison
- - 📱 Responsive and modern interface
- - 🎨 Dark/Light mode
- - ⚡️ Optimized performance with virtualization
-
- ## Architecture
-
- The project is split into two main parts:
-
- ### Frontend (React)
-
- ```
- frontend/
- ├── src/
- │   ├── components/  # Reusable UI components
- │   ├── pages/       # Application pages
- │   ├── hooks/       # Custom React hooks
- │   ├── context/     # React contexts
- │   └── constants/   # Constants and configurations
- ├── public/          # Static assets
- └── server.js        # Express server for production
- ```
-
- ### Backend (FastAPI)
-
- ```
- backend/
- ├── app/
- │   ├── api/             # API router and endpoints
- │   │   └── endpoints/   # Specific API endpoints
- │   ├── core/            # Core functionality
- │   ├── config/          # Configuration
- │   └── services/        # Business logic services
- │       ├── leaderboard.py
- │       ├── models.py
- │       ├── votes.py
- │       └── hf_service.py
- └── utils/               # Utility functions
- ```
-
- ## Technologies
-
- ### Frontend
-
- - React
- - Material-UI
- - TanStack Table & Virtual
- - Express.js
-
- ### Backend
-
- - FastAPI
- - Hugging Face API
- - Docker
-
- ## Development
-
- The application is containerized using Docker and can be run using:
-
- ```bash
- docker-compose up
- ```
  ---
+ title: Open FinLLM Leaderboard
+ emoji: 🥇
+ colorFrom: green
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: 4.42.0
+ app_file: app.py
  pinned: true
  license: apache-2.0
  ---
+
+ ![badge-labs](https://user-images.githubusercontent.com/327285/230928932-7c75f8ed-e57b-41db-9fb7-a292a13a1e58.svg)
+
+ # Open Financial LLM Leaderboard (OFLL)
+
+ The growing complexity of financial large language models (LLMs) demands evaluations that go beyond general NLP benchmarks. Traditional leaderboards often focus on broader tasks like translation or summarization, but they fall short of addressing the specific needs of the finance industry. Financial tasks such as predicting stock movements, assessing credit risks, and extracting information from financial reports present unique challenges, requiring models with specialized capabilities. This is why we created the **Open Financial LLM Leaderboard (OFLL)**.
+
+ ## Why OFLL?
+
+ OFLL provides a specialized evaluation framework tailored specifically to the financial sector. It fills a critical gap by offering a transparent, one-stop solution to assess model readiness for real-world financial applications. The leaderboard focuses on the tasks that matter most to finance professionals: information extraction from financial documents, market sentiment analysis, and financial trend forecasting.
+
+ ## Key Differentiators
+
+ - **Comprehensive Financial Task Coverage**: Unlike general LLM leaderboards that evaluate broad NLP capabilities, OFLL focuses exclusively on tasks directly relevant to finance. These include information extraction, sentiment analysis, credit risk scoring, and stock movement forecasting, all crucial for real-world financial decision-making.
+
+ - **Real-World Financial Relevance**: OFLL uses datasets that represent real-world challenges in the finance industry. This ensures models are not only tested on general NLP tasks but are also evaluated on their ability to handle complex financial data, making them suitable for industry applications.
+
+ - **Focused Zero-Shot Evaluation**: OFLL employs a zero-shot evaluation method, testing models on unseen financial tasks without prior fine-tuning. This highlights a model's ability to generalize and perform well in financial contexts, such as predicting stock price movements or extracting entities from regulatory filings, without being explicitly trained on these tasks.
+
+ ## Key Features of OFLL
+
+ - **Diverse Task Categories**: OFLL covers tasks across seven categories: Information Extraction (IE), Textual Analysis (TA), Question Answering (QA), Text Generation (TG), Risk Management (RM), Forecasting (FO), and Decision-Making (DM).
+
+ - **Robust Evaluation Metrics**: Models are assessed using various metrics, including Accuracy, F1 Score, ROUGE Score, and Matthews Correlation Coefficient (MCC). These metrics provide a multidimensional view of model performance, helping users identify the strengths and weaknesses of each model.
+
+ The Open Financial LLM Leaderboard aims to set a new standard in evaluating the capabilities of language models in the financial domain, offering a specialized, real-world-focused benchmarking solution.
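As a concrete illustration of the last metric above, the Matthews Correlation Coefficient for a binary task such as stock-movement prediction can be sketched as follows. This is a generic sketch of the standard MCC formula, not OFLL's actual evaluation code, and the labels are invented:

```python
import math

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = up, 0 = down)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is defined as 0 when any confusion-matrix margin is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical up/down predictions vs. ground truth
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 0, 1]
print(round(mcc(gold, pred), 3))  # 0.333
```

Unlike plain accuracy, MCC stays informative on the imbalanced label distributions common in financial datasets.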
+
+ # Contribute to OFLL
+
+ To make the leaderboard more accessible for external contributors, we offer clear guidelines for adding tasks, updating result files, and other maintenance activities.
+
+ 1. **Primary Files**:
+    - `src/env.py`: Modify variables like repository paths for customization.
+    - `src/about.py`: Update task configurations here to add new datasets.
+
+ 2. **Adding New Tasks**:
+    - Navigate to `src/about.py` and specify new tasks in the `Tasks` enum section.
+    - Each task requires details such as `benchmark`, `metric`, `col_name`, and `category`. For example:
+      ```python
+      taskX = Task("DatasetName", "MetricType", "ColumnName", category="Category")
+      ```
+
+ 3. **Updating Results Files**:
+    - Results files should be in JSON format and structured as follows:
+      ```json
+      {
+        "config": {
+          "model_dtype": "torch.float16",
+          "model_name": "path of the model on the hub: org/model",
+          "model_sha": "revision on the hub"
+        },
+        "results": {
+          "task_name": {
+            "metric_name": score
+          },
+          "task_name2": {
+            "metric_name": score
+          }
+        }
+      }
+      ```
+
+ 4. **Updating Leaderboard Data**:
+    - When a new task is added, ensure that the results JSON files reflect this update. This process will be automated in future releases.
+    - Access the current results at [Hugging Face Datasets](https://huggingface.co/datasets/TheFinAI/results/tree/main/demo-leaderboard).
+
+ 5. **Useful Links**:
+    - [Hugging Face Leaderboard Documentation](https://huggingface.co/docs/leaderboards/en/leaderboards/building_page)
+    - [OFLL Demo on Hugging Face](https://huggingface.co/spaces/finosfoundation/Open-Financial-LLM-Leaderboard)
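The results-file format from step 3 can be exercised with a short script before uploading. This is a minimal, hypothetical writer and structural check; the model name, task names, and scores are made-up placeholders, not real leaderboard data:

```python
import json

# Hypothetical results entry following the structure described in step 3
results = {
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "my-org/my-finllm",  # placeholder org/model repo id
        "model_sha": "main",               # placeholder revision
    },
    "results": {
        "task_a": {"acc": 0.812},          # made-up task/metric/score
        "task_b": {"f1": 0.674},
    },
}

def is_valid(entry: dict) -> bool:
    """Minimal structural check: required config keys and numeric scores."""
    has_config = {"model_dtype", "model_name", "model_sha"} <= set(entry.get("config", {}))
    has_scores = all(
        isinstance(m, dict) and all(isinstance(v, (int, float)) for v in m.values())
        for m in entry.get("results", {}).values()
    )
    return has_config and bool(entry.get("results")) and has_scores

# Round-trip through JSON, as the file would be stored on the dataset repo
print(is_valid(json.loads(json.dumps(results))))  # True
```

A check like this catches the most common mistakes (missing config keys, non-numeric scores) before a file reaches the results dataset.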
+
+ If you encounter a problem on the Space, don't hesitate to restart it to remove the created `eval-queue`, `eval-queue-bk`, `eval-results`, and `eval-results-bk` folders.
+
+ # Code logic for more complex edits
+
+ You'll find:
+ - the main table's column names and properties in `src/display/utils.py`
+ - the logic to read all results and request files and convert them into dataframe rows in `src/leaderboard/read_evals.py` and `src/populate.py`
+ - the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`
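For orientation, the results-to-dataframe conversion mentioned above can be pictured roughly like this. This is a simplified sketch, not the actual `read_evals.py`/`populate.py` implementation; the task names and scoring rule (first metric per task, unweighted average) are invented for illustration:

```python
# Simplified sketch: flatten per-model result dicts into leaderboard rows
def to_rows(result_files: list[dict]) -> list[dict]:
    rows = []
    for entry in result_files:
        row = {"model": entry["config"]["model_name"]}
        for task, metrics in entry["results"].items():
            # one column per task, taking the first metric reported
            row[task] = next(iter(metrics.values()))
        scores = [v for k, v in row.items() if k != "model"]
        row["average"] = sum(scores) / len(scores)
        rows.append(row)
    # sort descending by average score, as a leaderboard would
    return sorted(rows, key=lambda r: r["average"], reverse=True)

files = [
    {"config": {"model_name": "org/model-a"}, "results": {"task_a": {"acc": 0.8}, "task_b": {"acc": 0.6}}},
    {"config": {"model_name": "org/model-b"}, "results": {"task_a": {"acc": 0.9}, "task_b": {"acc": 0.7}}},
]
print([r["model"] for r in to_rows(files)])  # ['org/model-b', 'org/model-a']
```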
+
+ ## License
+
+ Copyright 2024 Fintech Open Source Foundation
+
+ Distributed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
+
+ SPDX-License-Identifier: [Apache-2.0](https://spdx.org/licenses/Apache-2.0)
+
+ ### Current submissions are manually evaluated. An automatic evaluation pipeline will be added in a future update.
+
+ tags:
+ - leaderboard
+ - modality:text
+ - submission:manual
+ - test:public
+ - judge:humans
+ - eval:generation
+ - language:English