socr / docs /documentation /data /statsbomb.rst
scfive's picture
Upload 203 files
d6ea71e verified
.. currentmodule:: socceraction.data.statsbomb
=========================
Loading StatsBomb data
=========================
The :class:`StatsBombLoader` class provides an API client enabling you to
fetch `StatsBomb event stream data`_ as Pandas DataFrames. This document provides
an overview of the available data sources and how to access them.
------
Setup
------
To be able to load StatsBomb data, you'll first need to install a few
additional dependencies which are not included in the default installation of
socceraction. You can install these additional dependencies by running:
.. code-block:: console
$ pip install "socceraction[statsbomb]"
--------------------------
Connecting to a data store
--------------------------
First, you have to create a :class:`StatsBombLoader` object and configure it
for the data store you want to use. The :class:`StatsBombLoader` supports
loading data from the StatsBomb Open Data repository, from the official
StatsBomb API, and from local files.
Open Data repository
====================
StatsBomb has made event stream data of certain leagues freely available for
public non-commercial use at https://github.com/statsbomb/open-data. This open
data can be accessed without the need of authentication, but its use is
subject to a `user agreement`_. The code below shows how to setup an API client
that can fetch data from the repository.
.. code-block:: python
# optional: suppress warning about missing authentication
import warnings
from statsbombpy.api_client import NoAuthWarning
warnings.simplefilter('ignore', NoAuthWarning)
from socceraction.data.statsbomb import StatsBombLoader
api = StatsBombLoader(getter="remote", creds=None)
.. note::
If you publish, share or distribute any research, analysis or insights based
on this data, StatsBomb requires you to state the data source as StatsBomb
and use their logo.
StatsBomb API
=============
API access is for paying customers only. Authentication can be done by setting
environment variables named ``SB_USERNAME`` and ``SB_PASSWORD`` to your login
credentials. Alternatively, the constructor accepts an argument ``creds`` to
pass your login credentials in the format ``{"user": "", "passwd": ""}``.
.. code-block:: python
from socceraction.data.statsbomb import StatsBombLoader
# set authentication credentials as environment variables
import os
os.environ["SB_USERNAME"] = "your_username"
os.environ["SB_PASSWORD"] = "your_password"
api = StatsBombLoader(getter="remote")
# or provide authentication credentials as a dictionary
api = StatsBombLoader(getter="remote", creds={"user": "", "passwd": ""})
Local directory
===============
A final option is to load data from a local directory. This local directory
can be specified by passing the ``root`` argument to the constructor,
specifying the path to the local data directory.
.. code-block:: python
from socceraction.data.statsbomb import StatsBombLoader
api = StatsBombLoader(getter="local", root="data/statsbomb")
Note that the data should be organized in the same way as the StatsBomb Open
Data repository, which corresponds to the following file hierarchy:
.. code-block::
root
β”œβ”€β”€ competitions.json
β”œβ”€β”€ events
β”‚ β”œβ”€β”€ <match_id>.json
β”‚ β”œβ”€β”€ ...
β”‚ └── ...
β”œβ”€β”€ lineups
β”‚ β”œβ”€β”€ <match_id>.json
β”‚ └── ...
β”œβ”€β”€ matches
β”‚ β”œβ”€β”€ <competition_id>
β”‚ β”‚ └── <season_id>.json
β”‚ β”‚ └── ...
β”‚ └── ...
└── three-sixty
β”œβ”€β”€ <match_id>.json
└── ...
------------
Loading data
------------
Next, you can load the match event stream data and metadata by calling the
corresponding methods on the :class:`StatsBombLoader` object.
:func:`StatsBombLoader.competitions()`
======================================
.. code-block:: python
df_competitions = api.competitions()
.. csv-table::
:class: dataframe
:header: season_id,competition_id,competition_name,country_name,competition_gender,season_name
106,43,FIFA World Cup,International,male,2022
30,72,Women's World Cup,International,female,2019
3,43,FIFA World Cup,International,male,2018
:func:`StatsBombLoader.games()`
===============================
.. code-block:: python
df_games = api.games(competition_id=43, season_id=3)
.. csv-table::
:class: dataframe
:header: game_id,season_id,competition_id,competition_stage,game_day,game_date,home_team_id,away_team_id,home_score,away_score,venue,referee_id
8658,3,43,Final,7,2018-07-15 17:00:00,771,785,4,2,Stadion Luzhniki,730
8657,3,43,3rd Place Final,7,2018-07-14 16:00:00,782,768,2,0,Saint-Petersburg Stadium,741
:func:`StatsBombLoader.teams()`
===============================
.. code-block:: python
df_teams = api.teams(game_id=8658)
.. csv-table::
:class: dataframe
:header: team_id,team_name
:align: left
771,France
785,Croatia
:func:`StatsBombLoader.players()`
=================================
.. code-block:: python
df_players = api.players(game_id=8658)
.. csv-table::
:class: dataframe
:header: game_id,team_id,player_id,player_name,nickname,jersey_number,is_starter,starting_position_id,starting_position_name,minutes_played
8658,771,3009,Kylian MbappΓ© Lottin,Kylian MbappΓ©,10,True,12,Right Midfield,95
8658,785,5463,Luka Modrić,,10,True,13,Right Center Midfield,95
:func:`StatsBombLoader.events()`
================================
.. code-block:: python
df_events = api.events(game_id=8658)
.. csv-table::
:class: dataframe
:header: event_id,index,period_id,timestamp,minute,second,type_id,type_name,possession,possession_team_id,possession_team_name,play_pattern_id,play_pattern_name,team_id,team_name,duration,extra,related_events,player_id,player_name,position_id,position_name,location,under_pressure,counterpress,game_id
47638847-fd43-4656-b49c-cff64e5cfc0a,1,1,1900-01-01,0,0,35,Starting XI,1,771,France,1,Regular Play,771,France,0.0,"{...}",[],,,,,,False,False,8658
0c04305d-5615-4520-9be5-7c232829954b,2,1,1900-01-01,0,0,35,Starting XI,1,771,France,1,Regular Play,785,Croatia,1.412,"{...}",[],,,,,,False,False,8658
c5e17439-efe2-480b-9cff-1600998674d7,3,1,1900-01-01,0,0,18,Half Start,1,771,France,1,Regular Play,771,France,0.0,{},['7e1460eb-c572-4059-8cd4-cec4857f818d'],,,,,,False,False,8658
If `360 data snapshots`_ are available for the game, they can be loaded by
passing ``load_360=True`` to the ``events()`` method. This will add two columns
to the events dataframe: ``visible_area_360`` and ``freeze_frame_360``. The
former contains the visible area of the pitch in the 360 snapshot, while the
latter contains the player locations in the 360 snapshot.
.. code-block:: python
df_events = api.events(game_id=3788741, load_360=True)
.. _StatsBomb event stream data: https://statsbomb.com/what-we-do/soccer-data/
.. _statsbombpy: https://pypi.org/project/statsbombpy/
.. _user agreement: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf
.. _360 data snapshots: https://statsbomb.com/what-we-do/soccer-data/360-2/