title: README
emoji: 💻
colorFrom: red
colorTo: gray
sdk: static
pinned: false
license: apache-2.0

OpenDILab is the first decision intelligence platform to cover a comprehensive range of algorithms and applications in both academia and industry. It was officially open-sourced in July 2021 at the opening ceremony of the World Artificial Intelligence Conference (WAIC).

As an important part of OpenXLab from Shanghai AI Laboratory, OpenDILab provides a complete set of training and deployment frameworks for decision intelligence, organized into four layers: the application ecosystem layer, the algorithm abstraction layer, the distributed management layer, and the distributed execution layer. It also supports scheduling optimization at every scale, from a single machine to joint training across thousands of CPUs and GPUs.

feature

OpenDILab integrates the latest and most comprehensive academic results and helps standardize complex industrial problems. Our vision is to advance AI from perceptual intelligence to decision intelligence, bringing AI technology closer to the era of general intelligence.

To contact us or join us, ✉️ our team: [email protected].

Overview of Model Zoo

(1): "🔒" means that the algorithm does not support the environment. (2): "⏳" means that the corresponding model is on the upload waiting list (work in progress).

Deep Reinforcement Learning

Algo.\Env. LunarLander LunarLanderContinuous BipedalWalker Pendulum Pong SpaceInvaders Qbert Hopper Halfcheetah Walker2d
PPO ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
DQN ✅ 🔒 🔒 🔒 ✅ ✅ ✅ 🔒 🔒 🔒
C51 ✅ 🔒 🔒 🔒 ✅ ✅ ✅ 🔒 🔒 🔒
DDPG 🔒 ✅ ✅ ✅ 🔒 🔒 🔒 ✅ ✅ ✅
TD3 🔒 ✅ ✅ ✅ 🔒 🔒 🔒 ✅ ✅ ✅
SAC 🔒 ✅ ✅ ✅ 🔒 🔒 🔒 ✅ ✅ ✅
IMPALA ✅ ✅ ✅
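The table above can also be read programmatically. The following is a minimal sketch, not part of any OpenDILab library: it transcribes the rows that list a mark for every environment and answers "which algorithms have an entry for this environment?". The names `ENVS`, `ROWS`, `supported_algorithms`, and `supported_envs` are hypothetical, and reading ✅ as "a model is available" is an assumption, since the legend defines only the other marks. IMPALA is left out because its row does not show a mark for every column.

```python
# Hypothetical helper: a transcription of the Deep Reinforcement Learning table.
# Columns follow the table header order.
ENVS = [
    "LunarLander", "LunarLanderContinuous", "BipedalWalker", "Pendulum",
    "Pong", "SpaceInvaders", "Qbert", "Hopper", "Halfcheetah", "Walker2d",
]

# One character per environment column:
#   "Y" = check mark in the table (assumed to mean a model is available)
#   "N" = lock mark in the table (algorithm does not support the environment)
# IMPALA is omitted: its row has fewer marks than there are columns.
ROWS = {
    "PPO":  "YYYYYYYYYY",
    "DQN":  "YNNNYYYNNN",
    "C51":  "YNNNYYYNNN",
    "DDPG": "NYYYNNNYYY",
    "TD3":  "NYYYNNNYYY",
    "SAC":  "NYYYNNNYYY",
}


def supported_algorithms(env: str) -> list[str]:
    """Return the algorithms marked as available for the given environment."""
    col = ENVS.index(env)  # raises ValueError for an unknown environment
    return [algo for algo, marks in ROWS.items() if marks[col] == "Y"]


def supported_envs(algo: str) -> list[str]:
    """Return the environments marked as available for the given algorithm."""
    marks = ROWS[algo]  # raises KeyError for an unknown algorithm
    return [env for env, mark in zip(ENVS, marks) if mark == "Y"]


if __name__ == "__main__":
    print(supported_algorithms("Pong"))
    print(supported_envs("DDPG"))
```

The pattern in the rows matches the usual split between algorithms: DQN and C51 appear only on the discrete-action environments, DDPG, TD3, and SAC only on the continuous-action ones, and PPO on both.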

Monte Carlo Tree Search

Algo.\Env. CartPole LunarLander LunarLanderContinuous Pendulum Pong Breakout MsPacman TicTacToe
AlphaZero 🔒 🔒 🔒 🔒 🔒 🔒 🔒 ✅
Sampled AlphaZero 🔒 🔒 🔒 🔒 🔒 🔒 🔒 ✅
MuZero ✅ ✅ 🔒 ✅ ✅ ✅ ✅ ✅
EfficientZero ✅ ✅ 🔒 ✅ ✅ ✅ 🔒
Gumbel MuZero ✅ 🔒 🔒 🔒 ✅
Sampled EfficientZero ✅ ✅ ✅ ✅ 🔒
Stochastic MuZero 🔒 🔒 🔒 🔒 🔒 🔒 🔒 🔒

Multi-Agent Reinforcement Learning

TBD

Offline Reinforcement Learning

TBD

Model-Based Reinforcement Learning

TBD