Spaces · lixuejing committed · Commit f0c809a · Parent: 6981fa7 · "update"

src/about.py CHANGED (+58 -19)
FlagEval-Embodied Verse is a scientific and comprehensive embodied evaluation to…
We hope to promote a more open ecosystem in which embodied model developers can participate and contribute to the advancement of embodied models. To ensure fairness, all models are evaluated under the FlagEvalMM framework on standardized GPUs in a unified environment.
## Embodied Verse

| Dataset | Paper | Link |
| --- | --- | --- |
| Where2Place | https://arxiv.org/abs/2406.10721 | https://huggingface.co/datasets/FlagEval/Where2Place |
| Blink | https://arxiv.org/abs/2404.12390 | https://huggingface.co/datasets/BLINK-Benchmark/BLINK |
| CVBench | https://arxiv.org/abs/2406.16860 | https://huggingface.co/datasets/nyu-visionx/CV-Bench |
| RoboSpatial-Home | https://arxiv.org/abs/2411.16537 | https://huggingface.co/datasets/chanhee-luke/RoboSpatial-Home |
| EmbspatialBench | https://arxiv.org/abs/2406.05756 | https://huggingface.co/datasets/Phineas476/EmbSpatial-Bench |
| All-Angles Bench | https://arxiv.org/abs/2504.15280 | - |
| VSI-Bench | https://arxiv.org/abs/2412.14171 | https://huggingface.co/datasets/nyu-visionx/VSI-Bench |
| SAT | https://arxiv.org/abs/2412.07755 | https://huggingface.co/datasets/FlagEval/SAT |
| EgoPlan-Bench2 | https://arxiv.org/abs/2412.04447 | - |
| ERQA | https://arxiv.org/abs/2503.20020 | https://huggingface.co/datasets/FlagEval/ERQA/ |
EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively evaluating models in embodied-intelligence scenarios, including:

- <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: 100 real-world images from diverse cluttered environments, each annotated with a sentence describing a desired free-space location and a corresponding mask, used to evaluate free-space referring expressions based on spatial relations.

Dataset subset links: coming soon
## EmbodiedVerse Open Sample

We categorized the data of the above 10 datasets by capability dimension and identified four major capability dimensions required for embodied-intelligence scenarios: spatial understanding, perception, prediction, and planning. Along these dimensions we sampled a high-quality subset of 2,042 samples. The capability dimensions and the data volume of each are as follows:
(Top-level percentages are relative to the full 2,042-sample subset; sub-dimension percentages are relative to their parent dimension.)

- Spatial Reasoning: 1085 (53.13%)
  - Dynamic: 200 (18.43%)
  - Relative direction: 200 (18.43%)
  - Multi-view matching: 200 (18.43%)
  - Relative distance: 200 (18.43%)
  - Depth estimation: 107 (9.86%)
  - Relative shape: 82 (7.56%)
  - Size estimation: 96 (8.85%)
- Perception: 448 (21.94%)
  - Visual Grounding: 200 (44.64%)
  - Counting: 200 (44.64%)
  - State & Activity Understanding: 48 (10.71%)
- Prediction: 245 (12.00%)
  - Trajectory: 188 (76.73%)
  - Future prediction: 57 (23.27%)
- Planning: 264 (12.93%)
  - Goal Decomposition: 200 (75.76%)
  - Navigation: 64 (24.24%)
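The capability-dimension counts above can be sanity-checked with a few lines of Python: sub-dimension counts should sum to each dimension's total, and the four dimension totals should sum to the 2,042-sample subset. This is a standalone arithmetic check using only the numbers quoted above:

```python
# Sanity-check of the EmbodiedVerse-Open subset statistics:
# sub-dimension counts sum to each dimension total, and dimension
# totals sum to the 2,042-sample subset.
subset = {
    "Spatial Reasoning": {"Dynamic": 200, "Relative direction": 200,
                          "Multi-view matching": 200, "Relative distance": 200,
                          "Depth estimation": 107, "Relative shape": 82,
                          "Size estimation": 96},
    "Perception": {"Visual Grounding": 200, "Counting": 200,
                   "State & Activity Understanding": 48},
    "Prediction": {"Trajectory": 188, "Future prediction": 57},
    "Planning": {"Goal Decomposition": 200, "Navigation": 64},
}

dimension_totals = {dim: sum(subs.values()) for dim, subs in subset.items()}
total = sum(dimension_totals.values())

print(total)  # 2042
for dim, n in dimension_totals.items():
    # e.g. "Spatial Reasoning: 1085 (53.13%)"
    print(f"{dim}: {n} ({n / total:.2%})")
```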
## Embodied Verse tool - FlagEvalMM

FlagEvalMM is an open-source evaluation framework designed to comprehensively assess multimodal models. It provides a standardized way to evaluate models that work with multiple modalities (text, image, video) across a variety of tasks and metrics.

- Flexible Architecture: supports multiple multimodal models and evaluation tasks, including VQA, image retrieval, text-to-image, and more.
- Comprehensive Benchmarks and Metrics: supports both new and commonly used benchmarks and metrics.
- Extensive Model Support: the model_zoo provides inference support for a wide range of popular multimodal models, including QWenVL and LLaVA. It also integrates seamlessly with API-based models such as GPT, Claude, and HuanYuan.
- Extensible Design: easily extended to incorporate new models, benchmarks, and evaluation metrics.
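The "extensible design" bullet describes a registry-style plugin pattern. FlagEvalMM's actual API is not reproduced on this page, so the sketch below is purely illustrative of how such a framework can accept new metrics without changes to core code; the `register_metric` and `evaluate` names are hypothetical, not FlagEvalMM's API:

```python
# Hypothetical sketch (not FlagEvalMM's actual API): a registry lets new
# evaluation metrics be plugged in without modifying the evaluation loop.
from typing import Callable, Dict, List

METRICS: Dict[str, Callable[[List[str], List[str]], float]] = {}

def register_metric(name: str):
    """Decorator that adds a metric function to the global registry."""
    def wrap(fn):
        METRICS[name] = fn
        return fn
    return wrap

@register_metric("accuracy")
def accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions that exactly match their reference."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

def evaluate(predictions, references, metric_names):
    """Run every requested metric from the registry over one task's outputs."""
    return {name: METRICS[name](predictions, references) for name in metric_names}

print(evaluate(["a", "b", "c"], ["a", "b", "d"], ["accuracy"]))  # accuracy = 2/3
```

A new metric then only needs a decorated function; the `evaluate` loop stays untouched.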
# Details and logs

You can find:
- detailed numerical results in the results Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_results" target="_blank"> EmbodiedVerse_results </a>
- community queries and running status in the requests Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_requests" target="_blank"> EmbodiedVerse_requests </a>

# Useful links

- <a href="https://github.com/flageval-baai/FlagEvalMM" target="_blank"> FlagEvalMM </a>
- <a href="https://flageval.baai.ac.cn/#/home" target="_blank"> FlagEval </a>
- <a href="https://huggingface.co/spaces/BAAI/open_flageval_vlm_leaderboard" target="_blank"> VLM Leaderboard </a>