.. currentmodule:: socceraction.data.statsbomb

=========================
Loading StatsBomb data
=========================

The :class:`StatsBombLoader` class provides an API client enabling you to
fetch `StatsBomb event stream data`_ as Pandas DataFrames. This document provides
an overview of the available data sources and how to access them.

------
Setup
------

To load StatsBomb data, you first need to install a few additional
dependencies that are not included in the default installation of
socceraction. You can install them by running:

.. code-block:: console

   $ pip install "socceraction[statsbomb]"


--------------------------
Connecting to a data store
--------------------------

First, you have to create a :class:`StatsBombLoader` object and configure it
for the data store you want to use. The :class:`StatsBombLoader` supports
loading data from the StatsBomb Open Data repository, from the official
StatsBomb API, and from local files.


Open Data repository
====================

StatsBomb has made event stream data of certain leagues freely available for
public non-commercial use at https://github.com/statsbomb/open-data. This open
data can be accessed without authentication, but its use is subject to a
`user agreement`_. The code below shows how to set up an API client that can
fetch data from the repository.

.. code-block:: python

   # optional: suppress warning about missing authentication
   import warnings
   from statsbombpy.api_client import NoAuthWarning
   warnings.simplefilter('ignore', NoAuthWarning)

   from socceraction.data.statsbomb import StatsBombLoader

   api = StatsBombLoader(getter="remote", creds=None)


.. note::
   If you publish, share or distribute any research, analysis or insights based
   on this data, StatsBomb requires you to state the data source as StatsBomb
   and use their logo.


StatsBomb API
=============

API access is for paying customers only. Authentication can be done by setting
environment variables named ``SB_USERNAME`` and ``SB_PASSWORD`` to your login
credentials. Alternatively, the constructor accepts an argument ``creds`` to
pass your login credentials in the format ``{"user": "", "passwd": ""}``.

.. code-block:: python

   from socceraction.data.statsbomb import StatsBombLoader

   # set authentication credentials as environment variables
   import os
   os.environ["SB_USERNAME"] = "your_username"
   os.environ["SB_PASSWORD"] = "your_password"
   api = StatsBombLoader(getter="remote")

   # or provide authentication credentials as a dictionary
   api = StatsBombLoader(getter="remote", creds={"user": "", "passwd": ""})


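
To turn missing credentials into an explicit error before any request is made,
you can build the ``creds`` dictionary yourself. The following is a minimal
sketch: ``creds_from_env`` is a hypothetical helper that is not part of
socceraction, and it only assumes the variable names and dictionary format
described above.

.. code-block:: python

   import os


   def creds_from_env(env=None):
       """Build the ``creds`` dictionary from SB_USERNAME and SB_PASSWORD.

       Hypothetical helper for illustration only. Raises RuntimeError if
       either variable is unset or empty.
       """
       env = os.environ if env is None else env
       user = env.get("SB_USERNAME")
       passwd = env.get("SB_PASSWORD")
       if not user or not passwd:
           raise RuntimeError("Set SB_USERNAME and SB_PASSWORD before loading API data.")
       # same format as the ``creds`` argument described above
       return {"user": user, "passwd": passwd}

The returned dictionary can then be passed to the constructor as
``StatsBombLoader(getter="remote", creds=creds_from_env())``.
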
Local directory
===============

A final option is to load data from a local directory. Pass the path to this
directory as the ``root`` argument of the constructor.

.. code-block:: python

   from socceraction.data.statsbomb import StatsBombLoader

   api = StatsBombLoader(getter="local", root="data/statsbomb")

Note that the data should be organized in the same way as the StatsBomb Open
Data repository, which corresponds to the following file hierarchy:

.. code-block::

   root
   ├── competitions.json
   ├── events
   │   ├── <match_id>.json
   │   ├── ...
   │   └── ...
   ├── lineups
   │   ├── <match_id>.json
   │   └── ...
   ├── matches
   │   ├── <competition_id>
   │   │   ├── <season_id>.json
   │   │   └── ...
   │   └── ...
   └── three-sixty
       ├── <match_id>.json
       └── ...


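
Before creating the loader, it can help to check that the directory matches
this layout. The following is a minimal sketch using only the standard library:
``missing_entries`` is a hypothetical helper, not part of socceraction, and it
assumes that ``three-sixty`` is optional because 360 data is not available for
every game.

.. code-block:: python

   from pathlib import Path


   def missing_entries(root):
       """Return the expected top-level entries that are absent from ``root``.

       Hypothetical helper; the entry names mirror the hierarchy shown above.
       """
       root = Path(root)
       # directories that hold the per-match and per-season JSON files
       expected_dirs = ["events", "lineups", "matches"]
       missing = [name for name in expected_dirs if not (root / name).is_dir()]
       # the competition index is a single file at the top level
       if not (root / "competitions.json").is_file():
           missing.append("competitions.json")
       return missing

An empty list from ``missing_entries("data/statsbomb")`` means the top-level
layout is in place; it does not validate the contents of the JSON files.
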
------------
Loading data
------------

Next, you can load the match event stream data and metadata by calling the
corresponding methods on the :class:`StatsBombLoader` object.


:func:`StatsBombLoader.competitions()`
======================================

.. code-block:: python

   df_competitions = api.competitions()

.. csv-table::
   :class: dataframe
   :header: season_id,competition_id,competition_name,country_name,competition_gender,season_name

   106,43,FIFA World Cup,International,male,2022
   30,72,Women's World Cup,International,female,2019
   3,43,FIFA World Cup,International,male,2018


:func:`StatsBombLoader.games()`
===============================

.. code-block:: python

   df_games = api.games(competition_id=43, season_id=3)


.. csv-table::
   :class: dataframe
   :header: game_id,season_id,competition_id,competition_stage,game_day,game_date,home_team_id,away_team_id,home_score,away_score,venue,referee_id

   8658,3,43,Final,7,2018-07-15 17:00:00,771,785,4,2,Stadion Luzhniki,730
   8657,3,43,3rd Place Final,7,2018-07-14 16:00:00,782,768,2,0,Saint-Petersburg Stadium,741

:func:`StatsBombLoader.teams()`
===============================

.. code-block:: python

   df_teams = api.teams(game_id=8658)

.. csv-table::
   :class: dataframe
   :header: team_id,team_name
   :align: left

   771,France
   785,Croatia



:func:`StatsBombLoader.players()`
=================================

.. code-block:: python

   df_players = api.players(game_id=8658)


.. csv-table::
   :class: dataframe
   :header: game_id,team_id,player_id,player_name,nickname,jersey_number,is_starter,starting_position_id,starting_position_name,minutes_played

   8658,771,3009,Kylian Mbappé Lottin,Kylian Mbappé,10,True,12,Right Midfield,95
   8658,785,5463,Luka Modrić,,10,True,13,Right Center Midfield,95


:func:`StatsBombLoader.events()`
================================

.. code-block:: python

   df_events = api.events(game_id=8658)

.. csv-table::
   :class: dataframe
   :header: event_id,index,period_id,timestamp,minute,second,type_id,type_name,possession,possession_team_id,possession_team_name,play_pattern_id,play_pattern_name,team_id,team_name,duration,extra,related_events,player_id,player_name,position_id,position_name,location,under_pressure,counterpress,game_id

   47638847-fd43-4656-b49c-cff64e5cfc0a,1,1,1900-01-01,0,0,35,Starting XI,1,771,France,1,Regular Play,771,France,0.0,"{...}",[],,,,,,False,False,8658
   0c04305d-5615-4520-9be5-7c232829954b,2,1,1900-01-01,0,0,35,Starting XI,1,771,France,1,Regular Play,785,Croatia,1.412,"{...}",[],,,,,,False,False,8658
   c5e17439-efe2-480b-9cff-1600998674d7,3,1,1900-01-01,0,0,18,Half Start,1,771,France,1,Regular Play,771,France,0.0,{},['7e1460eb-c572-4059-8cd4-cec4857f818d'],,,,,,False,False,8658


If `360 data snapshots`_ are available for the game, they can be loaded by
passing ``load_360=True`` to the ``events()`` method. This will add two columns
to the events dataframe: ``visible_area_360`` and ``freeze_frame_360``. The
former contains the visible area of the pitch in the 360 snapshot, while the
latter contains the player locations in the 360 snapshot.

.. code-block:: python

   df_events = api.events(game_id=3788741, load_360=True)


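
The loading methods above can be chained to collect the events of a whole
season. The following is a minimal sketch: ``load_season_events`` is a
hypothetical helper, not part of socceraction, and it only relies on the
``games()`` and ``events()`` calls and the ``game_id`` column shown earlier.

.. code-block:: python

   def load_season_events(loader, competition_id, season_id):
       """Return a dict mapping each game_id of a season to its events.

       Hypothetical helper; ``loader`` is expected to behave like a
       configured StatsBombLoader.
       """
       games = loader.games(competition_id=competition_id, season_id=season_id)
       # one events() call per game listed for the season
       return {
           int(game_id): loader.events(game_id=int(game_id))
           for game_id in games["game_id"]
       }

For example, ``load_season_events(api, competition_id=43, season_id=3)`` would
call ``events()`` once for every game listed for that season.
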
.. _StatsBomb event stream data: https://statsbomb.com/what-we-do/soccer-data/
.. _statsbombpy: https://pypi.org/project/statsbombpy/
.. _user agreement: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf
.. _360 data snapshots: https://statsbomb.com/what-we-do/soccer-data/360-2/