Update content.py

content.py (CHANGED, +9 -3)
@@ -13,9 +13,9 @@ Bottom_logo = f'''<img src="data:image/jpeg;base64,{bottom_logo}" style="width:2
 intro_md = f'''
 # {benchname} Leaderboard
 
-* [Dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data)
-* [Evaluation Code](https://github.com/maum-ai/KOFFVQA)
-* Report (
+* [📂 Dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data)
+* [🧪 Evaluation Code](https://github.com/maum-ai/KOFFVQA)
+* [📝 Report](https://arxiv.org/abs/2503.23730)
 
 {benchname}🏆 is a Free-Form VQA benchmark dataset designed to evaluate Vision-Language Models (VLMs) in Korean language environments. Unlike traditional multiple-choice or predefined answer formats, KOFFVQA challenges models to generate open-ended, natural-language answers to visually grounded questions. This allows for a more comprehensive assessment of a model's ability to understand and generate nuanced Korean responses.
 
@@ -35,6 +35,8 @@ This benchmark includes a total of 275 Korean questions across 10 tasks. The que
 
 ## News
 
+* **2025-04-01**: Our paper [KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language](https://arxiv.org/abs/2503.23730) has been released and accepted to CVPRW 2025, the Workshop on Benchmarking and Expanding AI Multimodal Approaches (BEAM 2025) 🎉
+
 * **2025-01-21**: [Evaluation code](https://github.com/maum-ai/KOFFVQA) and [dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data) release
 
 * **2024-12-06**: Leaderboard Release!
@@ -47,4 +49,8 @@ submit_md = f'''
 
 We are not accepting model addition requests at the moment. Once the request system is established, we will start accepting requests.
 
+🚀 Curious how your VLM performs in Korean? Use our [evaluation code](https://github.com/maum-ai/KOFFVQA) to run it on KOFFVQA and check the score.
+
+🧑‍⚖️ We currently use google/gemma-2-9b-it as the judge model, so there's no need to worry about API keys or usage fees.
+
 '''.strip()
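For readers who want a quick look at what the benchmark contains before running the official evaluation code, a minimal sketch using the `datasets` library is below. The split and column names are assumptions made for illustration; inspect the printed structure rather than relying on them, and use the evaluation code linked in the diff for actual scoring.

```python
# Illustrative sketch only: peek at the KOFFVQA data from the Hugging Face Hub.
# Split and column names are assumptions, not the documented schema of
# maum-ai/KOFFVQA_Data; the official repo is the authoritative way to score.
from datasets import load_dataset

data = load_dataset("maum-ai/KOFFVQA_Data")  # may require `huggingface-cli login` if the dataset is gated
print(data)                                  # shows the real splits and columns

first_split = next(iter(data.values()))
print(first_split[0])                        # one visually grounded question record
```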
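The note about google/gemma-2-9b-it means scoring runs entirely on open weights, with no paid judge API. As a rough, hypothetical illustration of that idea (not the repository's actual prompt, rubric, or parsing), one could ask the judge model to grade a candidate answer against a reference:

```python
# Hypothetical judge sketch: the prompt wording, 0-10 scale, and example texts
# are assumptions; the official grading logic lives in github.com/maum-ai/KOFFVQA.
# gemma-2 weights are gated on the Hub, so accept the license and log in first.
from transformers import pipeline

judge = pipeline("text-generation", model="google/gemma-2-9b-it", device_map="auto")

prompt = (
    "You are grading a Korean VQA answer.\n"
    "Question: 이 표지판에는 무엇이라고 쓰여 있나요?\n"      # "What does this sign say?"
    "Reference answer: 주차 금지\n"                           # "No parking"
    "Candidate answer: 주차를 금지한다는 표지판입니다.\n"
    "Give a score from 0 to 10, then a one-line justification."
)

out = judge(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
```

Because the judge is an open-weight model, the same grading can be reproduced locally, which is the point of the "no API keys or usage fees" remark.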