lixuejing committed
Commit f0c809a · 1 Parent(s): 6981fa7
Files changed (1):
  1. src/about.py +58 -19
src/about.py CHANGED
@@ -78,24 +78,19 @@ FlagEval-Embodied Verse is a scientific and comprehensive embodied evaluation to
We hope to promote a more open ecosystem in which embodied model developers can participate and contribute to the advancement of embodied models. To ensure fairness, all models are evaluated under the FlagEvalMM framework using standardized GPUs and a unified environment.
- # How it works
- ## Embodied Verse tool - FlagEvalMM
84
- FlagEvalMM是一个开源评估框架,旨在全面评估多模态模型,其提供了一种标准化的方法来评估跨各种任务和指标使用多种模式(文本、图像、视频)的模型。
85
-
86
- - 灵活的架构:支持多个多模态模型和评估任务,包括VQA、图像检索、文本到图像等。
87
- - 全面的基准与度量:支持最新的和常用的基准和度量。
88
- - 广泛的模型支持:model_zoo为广泛流行的多模态模型(包括QWenVL和LLaVA)提供了推理支持。此外,它还提供了与基于API的模型(如GPT、Claude和HuanYuan)的无缝集成。
89
- - 可扩展的设计:易于扩展,可合并新的模型、基准和评估指标。
90
-
91
- FlagEvalMM is an open-source evaluation framework designed to comprehensively assess multimodal models. It provides a standardized way to evaluate models that work with multiple modalities (text, images, video) across various tasks and metrics.
92
-
93
- - Flexible Architecture: Support for multiple multimodal models and evaluation tasks, including: VQA, image retrieval, text-to-image, etc.
94
- - Comprehensive Benchmarks and Metrics: Support new and commonly used benchmarks and metrics.
95
- - Extensive Model Support: The model_zoo provides inference support for a wide range of popular multimodal models including QWenVL and LLaVA. Additionally, it offers seamless integration with API-based models such as GPT, Claude, and HuanYuan.
96
- - Extensible Design: Easily extendable to incorporate new models, benchmarks, and evaluation metrics.
97
-
98
- # Embodied Verse
EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively evaluating models in embodied intelligence scenarios, including:

- <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: contains 100 real-world images from diverse cluttered environments, each annotated with a sentence describing the desired free-space location and a corresponding mask, used to evaluate free-space referring expressions based on spatial relations.
@@ -124,13 +119,57 @@ EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively
Dataset subset link: coming soon

- ## Details and logs
You can find:
- detailed numerical results in the results Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_results" target="_blank"> EmbodiedVerse_results </a>
- community queries and running status in the requests Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_requests" target="_blank"> EmbodiedVerse_requests </a>
 
- ## Useful links
- <a href="https://github.com/flageval-baai/FlagEvalMM" target="_blank"> [FlagEvalMM] </a>
- <a href="https://flageval.baai.ac.cn/#/home" target="_blank"> [FlagEval] </a>
- <a href="https://huggingface.co/spaces/BAAI/open_flageval_vlm_leaderboard" target="_blank"> [VLM Leaderboard] </a>
 
We hope to promote a more open ecosystem in which embodied model developers can participate and contribute to the advancement of embodied models. To ensure fairness, all models are evaluated under the FlagEvalMM framework using standardized GPUs and a unified environment.
+ ## Embodied Verse
+ | Dataset | Paper | Link |
+ | --- | --- | --- |
+ | Where2Place | https://arxiv.org/abs/2406.10721 | https://huggingface.co/datasets/FlagEval/Where2Place |
+ | Blink | https://arxiv.org/abs/2404.12390 | https://huggingface.co/datasets/BLINK-Benchmark/BLINK |
+ | CVBench | https://arxiv.org/abs/2406.16860 | https://huggingface.co/datasets/nyu-visionx/CV-Bench |
+ | RoboSpatial-Home | https://arxiv.org/abs/2411.16537 | https://huggingface.co/datasets/chanhee-luke/RoboSpatial-Home |
+ | EmbspatialBench | https://arxiv.org/abs/2406.05756 | https://huggingface.co/datasets/Phineas476/EmbSpatial-Bench |
+ | All-Angles Bench | https://arxiv.org/abs/2504.15280 | - |
+ | VSI-Bench | https://arxiv.org/abs/2412.14171 | https://huggingface.co/datasets/nyu-visionx/VSI-Bench |
+ | SAT | https://arxiv.org/abs/2412.07755 | https://huggingface.co/datasets/FlagEval/SAT |
+ | EgoPlan-Bench2 | https://arxiv.org/abs/2412.04447 | - |
+ | ERQA | https://arxiv.org/abs/2503.20020 | https://huggingface.co/datasets/FlagEval/ERQA/ |
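For scripting against these benchmarks, the table above can be captured as a small registry. This is a minimal sketch: the arXiv links and Hugging Face repo ids are copied from the table, while the `BENCHMARKS` and `loadable` names are illustrative and not part of any FlagEval code.

```python
# Registry of the 10 EmbodiedVerse benchmarks listed in the table above.
# Each entry maps name -> (paper URL, Hugging Face repo id or None when the
# table lists no public link).
BENCHMARKS = {
    "Where2Place": ("https://arxiv.org/abs/2406.10721", "FlagEval/Where2Place"),
    "Blink": ("https://arxiv.org/abs/2404.12390", "BLINK-Benchmark/BLINK"),
    "CVBench": ("https://arxiv.org/abs/2406.16860", "nyu-visionx/CV-Bench"),
    "RoboSpatial-Home": ("https://arxiv.org/abs/2411.16537", "chanhee-luke/RoboSpatial-Home"),
    "EmbspatialBench": ("https://arxiv.org/abs/2406.05756", "Phineas476/EmbSpatial-Bench"),
    "All-Angles Bench": ("https://arxiv.org/abs/2504.15280", None),
    "VSI-Bench": ("https://arxiv.org/abs/2412.14171", "nyu-visionx/VSI-Bench"),
    "SAT": ("https://arxiv.org/abs/2412.07755", "FlagEval/SAT"),
    "EgoPlan-Bench2": ("https://arxiv.org/abs/2412.04447", None),
    "ERQA": ("https://arxiv.org/abs/2503.20020", "FlagEval/ERQA"),
}

# Benchmarks with a public repo id could then be fetched with the standard
# Hugging Face call datasets.load_dataset(repo_id); not done here to stay offline.
loadable = [name for name, (_, repo) in BENCHMARKS.items() if repo]
print(len(BENCHMARKS), len(loadable))  # -> 10 8
```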
EmbodiedVerse-Open is a meta-dataset composed of 10 datasets for comprehensively evaluating models in embodied intelligence scenarios, including:

- <a href="https://arxiv.org/pdf/2406.10721" target="_blank"> Where2Place </a>: contains 100 real-world images from diverse cluttered environments, each annotated with a sentence describing the desired free-space location and a corresponding mask, used to evaluate free-space referring expressions based on spatial relations.
 
Dataset subset link: coming soon

+ ## EmbodiedVerse Open Sample
+
+ We categorized the data from the 10 datasets above by capability dimension and identified four major capability dimensions required in embodied intelligence scenarios: spatial understanding, perception, prediction, and planning. Along these dimensions we sampled a high-quality subset of 2,042 samples. The capability dimensions and the data volume of each (sub-category shares are computed within their dimension) are as follows:
+
+ - Spatial Reasoning: 1085 (53.13%)
+   - Dynamic: 200 (18.43%)
+   - Relative direction: 200 (18.43%)
+   - Multi-view matching: 200 (18.43%)
+   - Relative distance: 200 (18.43%)
+   - Depth estimation: 107 (9.86%)
+   - Relative shape: 82 (7.56%)
+   - Size estimation: 96 (8.85%)
+ - Perception: 448 (21.94%)
+   - Visual Grounding: 200 (44.64%)
+   - Counting: 200 (44.64%)
+   - State & Activity Understanding: 48 (10.71%)
+ - Prediction: 245 (12.00%)
+   - Trajectory: 188 (76.73%)
+   - Future prediction: 57 (23.27%)
+ - Planning: 264 (12.93%)
+   - Goal Decomposition: 200 (75.76%)
+   - Navigation: 64 (24.24%)
+
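As a quick arithmetic check on the counts above, the following sketch (variable names are illustrative) re-derives each dimension's total from its sub-categories and each dimension's share of the 2,042-sample subset:

```python
# Sub-category sample counts per capability dimension, copied from the list above.
SUBSET = {
    "Spatial Reasoning": {"Dynamic": 200, "Relative direction": 200,
                          "Multi-view matching": 200, "Relative distance": 200,
                          "Depth estimation": 107, "Relative shape": 82,
                          "Size estimation": 96},
    "Perception": {"Visual Grounding": 200, "Counting": 200,
                   "State & Activity Understanding": 48},
    "Prediction": {"Trajectory": 188, "Future prediction": 57},
    "Planning": {"Goal Decomposition": 200, "Navigation": 64},
}

# Each dimension's total is the sum of its sub-categories; the shares are
# each dimension's fraction of the whole subset, as percentages.
dim_totals = {dim: sum(subs.values()) for dim, subs in SUBSET.items()}
total = sum(dim_totals.values())
shares = {dim: round(100 * n / total, 2) for dim, n in dim_totals.items()}
print(total, shares["Spatial Reasoning"])  # -> 2042 53.13
```

The sub-category sums reproduce the stated dimension totals (1085 + 448 + 245 + 264 = 2042), confirming the percentages in the list are dimension shares of the whole subset.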
+ ## Embodied Verse tool - FlagEvalMM
+ FlagEvalMM is an open-source evaluation framework designed to comprehensively assess multimodal models. It provides a standardized way to evaluate models that work with multiple modalities (text, images, video) across various tasks and metrics.
+
+ - Flexible Architecture: support for multiple multimodal models and evaluation tasks, including VQA, image retrieval, text-to-image, and more.
+ - Comprehensive Benchmarks and Metrics: support for new and commonly used benchmarks and metrics.
+ - Extensive Model Support: the model_zoo provides inference support for a wide range of popular multimodal models, including QWenVL and LLaVA. Additionally, it offers seamless integration with API-based models such as GPT, Claude, and HuanYuan.
+ - Extensible Design: easily extendable to incorporate new models, benchmarks, and evaluation metrics.
+
+ # Details and logs
You can find:
- detailed numerical results in the results Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_results" target="_blank"> EmbodiedVerse_results </a>
- community queries and running status in the requests Hugging Face dataset: <a href="https://huggingface.co/datasets/open-cn-llm-leaderboard/EmbodiedVerse_requests" target="_blank"> EmbodiedVerse_requests </a>
 
+ # Useful links
- <a href="https://github.com/flageval-baai/FlagEvalMM" target="_blank"> [FlagEvalMM] </a>
- <a href="https://flageval.baai.ac.cn/#/home" target="_blank"> [FlagEval] </a>
- <a href="https://huggingface.co/spaces/BAAI/open_flageval_vlm_leaderboard" target="_blank"> [VLM Leaderboard] </a>