import os
import base64

# Read the logo and embed it as a base64 data URI so the rendered HTML does
# not depend on a separately served image file.
current_dir = os.path.dirname(os.path.realpath(__file__))
with open(os.path.join(current_dir, "bottom_logo.png"), "rb") as image_file:
    bottom_logo = base64.b64encode(image_file.read()).decode("utf-8")

benchname = 'KOFFVQA'

# The source image is a PNG, so the data URI declares image/png.
Bottom_logo = f'''<img src="data:image/png;base64,{bottom_logo}" style="width:20%;display:block;margin-left:auto;margin-right:auto">'''
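# A minimal rendering sketch (assumption: this module feeds a Gradio Space,
# as is typical for Hugging Face leaderboards; nothing below is required by
# this file itself). The data-URI embedding above lets the logo render
# without a separate static-file request:
#
#   import gradio as gr
#   with gr.Blocks() as demo:
#       gr.Markdown(intro_md)
#       gr.HTML(Bottom_logo)
#   demo.launch()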
intro_md = f'''
# {benchname} Leaderboard
* [🤗 Dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data)
* [🧪 Evaluation Code](https://github.com/maum-ai/KOFFVQA)
* [📄 Report](https://arxiv.org/abs/2503.23730)

{benchname} is a free-form VQA benchmark dataset designed to evaluate Vision-Language Models (VLMs) in Korean-language environments. Unlike traditional multiple-choice or predefined answer formats, KOFFVQA challenges models to generate open-ended, natural-language answers to visually grounded questions. This allows a more comprehensive assessment of a model's ability to understand and generate nuanced Korean responses.

The dataset covers diverse real-world scenarios, including object attributes, object recognition, and object relationships.

This page will be continuously updated, and we will accept requests to add models to the leaderboard. For more details, please refer to the "Submit" tab.
'''.strip()
about_md = f'''
# About
The {benchname} benchmark is designed to evaluate and compare the performance of Vision-Language Models (VLMs) in Korean language environments.
This benchmark comprises a total of 275 Korean questions across 10 tasks. The questions are open-ended, free-form VQA (Visual Question Answering) with objective answers, so responses are not constrained to a strict format.
## News
* **2025-04-01**: Our paper [KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language](https://arxiv.org/abs/2503.23730) has been released and accepted to the CVPR 2025 Workshop on Benchmarking and Expanding AI Multimodal Approaches (BEAM 2025) 🎉
* **2025-01-21**: [Evaluation code](https://github.com/maum-ai/KOFFVQA) and [dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data) released
* **2024-12-06**: Leaderboard Release!
'''.strip()
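# Dataset access sketch -- the split name and record layout are assumptions;
# check the dataset card at https://huggingface.co/datasets/maum-ai/KOFFVQA_Data
# for the actual schema:
#
#   from datasets import load_dataset
#   ds = load_dataset("maum-ai/KOFFVQA_Data", split="test")
#   print(len(ds), ds[0])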
submit_md = f'''
# Submit (coming soon)
We are not accepting model addition requests at the moment. Once the request system is established, we will start accepting requests.
Curious how your VLM performs in Korean? Use our [evaluation code](https://github.com/maum-ai/KOFFVQA) to run it on KOFFVQA and check the score.

🧑‍⚖️ We currently use google/gemma-2-9b-it as the judge model, so there's no need to worry about API keys or usage fees.
'''.strip()
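# ---------------------------------------------------------------------------
# Illustrative LLM-as-judge sketch. This is NOT the official KOFFVQA
# evaluation code (see https://github.com/maum-ai/KOFFVQA for that); it only
# demonstrates the idea from the Submit tab: a locally run judge model
# (google/gemma-2-9b-it) grades free-form answers, so no API keys or usage
# fees are involved. The prompt wording and the 0-10 scale are assumptions.
# ---------------------------------------------------------------------------
def judge_answer(question: str, reference: str, prediction: str) -> str:
    """Grade a model's free-form answer against a reference with a judge LLM."""
    from transformers import pipeline  # imported lazily; heavy dependency

    judge = pipeline("text-generation", model="google/gemma-2-9b-it")
    messages = [{
        "role": "user",  # gemma-2 chat templates accept user/assistant turns only
        "content": (
            "You are grading an answer from a Korean visual question "
            "answering benchmark.\n"
            f"Question: {question}\n"
            f"Reference answer: {reference}\n"
            f"Model answer: {prediction}\n"
            "Give a score from 0 to 10 and a one-line justification."
        ),
    }]
    # Chat-style pipelines return the full conversation; the judge's reply is
    # the final message.
    out = judge(messages, max_new_tokens=128)
    return out[0]["generated_text"][-1]["content"]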