lixuejing committed
Commit f0c809a · 1 Parent(s): 6981fa7
Files changed (1):
  1. src/about.py +58 -19
src/about.py CHANGED
@@ -78,24 +78,19 @@ FlagEval-Embodied Verse is a scientific and comprehensive embodied evaluation to
We hope to promote a more open ecosystem in which embodied model developers can participate and contribute to the advancement of embodied models. To ensure fairness, all models are evaluated under the FlagEvalMM framework using standardized GPUs and a unified environment.
- # How it works
- ## Embodied Verse tool - FlagEvalMM
84
- FlagEvalMM是一个开源评估框架,旨在全面评估多模态模型,其提供了一种标准化的方法来评估跨各种任务和指标使用多种模式(文本、图像、视频)的模型。
85
-
86
- - 灵活的架构:支持多个多模态模型和评估任务,包括VQA、图像检索、文本到图像等。
87
- - 全面的基准与度量:支持最新的和常用的基准和度量。
88
- - 广泛的模型支持:model_zoo为广泛流行的多模态模型(包括QWenVL和LLaVA)提供了推理支持。此外,它还提供了与基于API的模型(如GPT、Claude和HuanYuan)的无缝集成。
89
- - 可扩展的设计:易于扩展,可合并新的模型、基准和评估指标。
90
-
91
- FlagEvalMM is an open-source evaluation framework designed to comprehensively assess multimodal models. It provides a standardized way to evaluate models that work with multiple modalities (text, images, video) across various tasks and metrics.
92
-
93
- - Flexible Architecture: Support for multiple multimodal models and evaluation tasks, including: VQA, image retrieval, text-to-image, etc.
94
- - Comprehensive Benchmarks and Metrics: Support new and commonly used benchmarks and metrics.
95
- - Extensive Model Support: The model_zoo provides inference support for a wide range of popular multimodal models including QWenVL and LLaVA. Additionally, it offers seamless integration with API-based models such as GPT, Claude, and HuanYuan.
96
- - Extensible Design: Easily extendable to incorporate new models, benchmarks, and evaluation metrics.
97
-
98
- # Embodied Verse
EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively evaluating models in embodied intelligence scenarios, including:

- <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: contains 100 real-world images from diverse cluttered environments, each annotated with a sentence describing the desired free-space location and a corresponding mask, used to evaluate free-space referring expressions based on spatial relations.
@@ -124,13 +119,57 @@ EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively
Dataset subset link: coming soon

- ## Details and logs
You can find:
- detailed numerical results in the results Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_results" target="_blank"> EmbodiedVerse_results </a>
- community queries and running status in the requests Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_requests" target="_blank"> EmbodiedVerse_requests </a>
 
- ## Useful links
- <a href="https://github.com/flageval-baai/FlagEvalMM" target="_blank"> [FlagEvalMM] </a>
- <a href="https://flageval.baai.ac.cn/#/home" target="_blank"> [FlagEval] </a>
- <a href="https://huggingface.co/spaces/BAAI/open_flageval_vlm_leaderboard" target="_blank"> [VLM Leaderboard] </a>
 
We hope to promote a more open ecosystem in which embodied model developers can participate and contribute to the advancement of embodied models. To ensure fairness, all models are evaluated under the FlagEvalMM framework using standardized GPUs and a unified environment.
+ ## Embodied Verse
+ | Dataset | Paper | Link |
+ | --- | --- | --- |
+ | Where2Place | https://arxiv.org/abs/2406.10721 | https://huggingface.co/datasets/FlagEval/Where2Place |
+ | Blink | https://arxiv.org/abs/2404.12390 | https://huggingface.co/datasets/BLINK-Benchmark/BLINK |
+ | CVBench | https://arxiv.org/abs/2406.16860 | https://huggingface.co/datasets/nyu-visionx/CV-Bench |
+ | RoboSpatial-Home | https://arxiv.org/abs/2411.16537 | https://huggingface.co/datasets/chanhee-luke/RoboSpatial-Home |
+ | EmbspatialBench | https://arxiv.org/abs/2406.05756 | https://huggingface.co/datasets/Phineas476/EmbSpatial-Bench |
+ | All-Angles Bench | https://arxiv.org/abs/2504.15280 | - |
+ | VSI-Bench | https://arxiv.org/abs/2412.14171 | https://huggingface.co/datasets/nyu-visionx/VSI-Bench |
+ | SAT | https://arxiv.org/abs/2412.07755 | https://huggingface.co/datasets/FlagEval/SAT |
+ | EgoPlan-Bench2 | https://arxiv.org/abs/2412.04447 | - |
+ | ERQA | https://arxiv.org/abs/2503.20020 | https://huggingface.co/datasets/FlagEval/ERQA/ |
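For scripting against these benchmarks, the table above can be captured as a small registry. This is a minimal sketch: the arXiv links and Hugging Face repo ids are copied from the table, while the `BENCHMARKS` and `loadable` names are illustrative and not part of any FlagEval code.

```python
# Registry of the 10 EmbodiedVerse benchmarks listed in the table above.
# Each entry maps name -> (paper URL, Hugging Face repo id or None when the
# table lists no public link).
BENCHMARKS = {
    "Where2Place": ("https://arxiv.org/abs/2406.10721", "FlagEval/Where2Place"),
    "Blink": ("https://arxiv.org/abs/2404.12390", "BLINK-Benchmark/BLINK"),
    "CVBench": ("https://arxiv.org/abs/2406.16860", "nyu-visionx/CV-Bench"),
    "RoboSpatial-Home": ("https://arxiv.org/abs/2411.16537", "chanhee-luke/RoboSpatial-Home"),
    "EmbspatialBench": ("https://arxiv.org/abs/2406.05756", "Phineas476/EmbSpatial-Bench"),
    "All-Angles Bench": ("https://arxiv.org/abs/2504.15280", None),
    "VSI-Bench": ("https://arxiv.org/abs/2412.14171", "nyu-visionx/VSI-Bench"),
    "SAT": ("https://arxiv.org/abs/2412.07755", "FlagEval/SAT"),
    "EgoPlan-Bench2": ("https://arxiv.org/abs/2412.04447", None),
    "ERQA": ("https://arxiv.org/abs/2503.20020", "FlagEval/ERQA"),
}

# Benchmarks with a public repo id could then be fetched with the standard
# Hugging Face call datasets.load_dataset(repo_id); not done here to stay offline.
loadable = [name for name, (_, repo) in BENCHMARKS.items() if repo]
print(len(BENCHMARKS), len(loadable))  # -> 10 8
```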
EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively evaluating models in embodied intelligence scenarios, including:

- <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: contains 100 real-world images from diverse cluttered environments, each annotated with a sentence describing the desired free-space location and a corresponding mask, used to evaluate free-space referring expressions based on spatial relations.
 
Dataset subset link: coming soon

+ ## EmbodiedVerse Open Sample
+
+ We categorized the data from the 10 datasets above by capability dimension and identified four major capability dimensions required in embodied intelligence scenarios: spatial understanding, perception, prediction, and planning. Along these dimensions we sampled a high-quality subset of 2,042 samples. The capability dimensions and the data volume of each (sub-category shares are computed within their dimension) are as follows:
+
+ - Spatial Reasoning: 1085 (53.13%)
+   - Dynamic: 200 (18.43%)
+   - Relative direction: 200 (18.43%)
+   - Multi-view matching: 200 (18.43%)
+   - Relative distance: 200 (18.43%)
+   - Depth estimation: 107 (9.86%)
+   - Relative shape: 82 (7.56%)
+   - Size estimation: 96 (8.85%)
+ - Perception: 448 (21.94%)
+   - Visual Grounding: 200 (44.64%)
+   - Counting: 200 (44.64%)
+   - State & Activity Understanding: 48 (10.71%)
+ - Prediction: 245 (12.00%)
+   - Trajectory: 188 (76.73%)
+   - Future prediction: 57 (23.27%)
+ - Planning: 264 (12.93%)
+   - Goal Decomposition: 200 (75.76%)
+   - Navigation: 64 (24.24%)
+
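As a quick arithmetic check on the counts above, the following sketch (variable names are illustrative) re-derives each dimension's total from its sub-categories and each dimension's share of the 2,042-sample subset:

```python
# Sub-category sample counts per capability dimension, copied from the list above.
SUBSET = {
    "Spatial Reasoning": {"Dynamic": 200, "Relative direction": 200,
                          "Multi-view matching": 200, "Relative distance": 200,
                          "Depth estimation": 107, "Relative shape": 82,
                          "Size estimation": 96},
    "Perception": {"Visual Grounding": 200, "Counting": 200,
                   "State & Activity Understanding": 48},
    "Prediction": {"Trajectory": 188, "Future prediction": 57},
    "Planning": {"Goal Decomposition": 200, "Navigation": 64},
}

# Each dimension's total is the sum of its sub-categories; the shares are
# each dimension's fraction of the whole subset, as percentages.
dim_totals = {dim: sum(subs.values()) for dim, subs in SUBSET.items()}
total = sum(dim_totals.values())
shares = {dim: round(100 * n / total, 2) for dim, n in dim_totals.items()}
print(total, shares["Spatial Reasoning"])  # -> 2042 53.13
```

The sub-category sums reproduce the stated dimension totals (1085 + 448 + 245 + 264 = 2042), confirming the percentages in the list are dimension shares of the whole subset.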
+ ## Embodied Verse tool - FlagEvalMM
+ FlagEvalMM is an open-source evaluation framework designed to comprehensively assess multimodal models. It provides a standardized way to evaluate models that work with multiple modalities (text, images, video) across various tasks and metrics.
+
+ - Flexible Architecture: support for multiple multimodal models and evaluation tasks, including VQA, image retrieval, text-to-image, and more.
+ - Comprehensive Benchmarks and Metrics: support for new and commonly used benchmarks and metrics.
+ - Extensive Model Support: the model_zoo provides inference support for a wide range of popular multimodal models, including QWenVL and LLaVA. Additionally, it offers seamless integration with API-based models such as GPT, Claude, and HuanYuan.
+ - Extensible Design: easily extendable to incorporate new models, benchmarks, and evaluation metrics.
+
+ # Details and logs
You can find:
- detailed numerical results in the results Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_results" target="_blank"> EmbodiedVerse_results </a>
- community queries and running status in the requests Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_requests" target="_blank"> EmbodiedVerse_requests </a>
 
+ # Useful links
- <a href="https://github.com/flageval-baai/FlagEvalMM" target="_blank"> [FlagEvalMM] </a>
- <a href="https://flageval.baai.ac.cn/#/home" target="_blank"> [FlagEval] </a>
- <a href="https://huggingface.co/spaces/BAAI/open_flageval_vlm_leaderboard" target="_blank"> [VLM Leaderboard] </a>