.. currentmodule:: socceraction.data.statsbomb
=========================
Loading StatsBomb data
=========================
The :class:`StatsBombLoader` class provides an API client enabling you to
fetch `StatsBomb event stream data`_ as Pandas DataFrames. This document provides
an overview of the available data sources and how to access them.
------
Setup
------
Loading StatsBomb data requires a few additional dependencies that are not
included in the default installation of socceraction. You can install them by
running:
.. code-block:: console
$ pip install "socceraction[statsbomb]"
--------------------------
Connecting to a data store
--------------------------
First, you have to create a :class:`StatsBombLoader` object and configure it
for the data store you want to use. The :class:`StatsBombLoader` supports
loading data from the StatsBomb Open Data repository, from the official
StatsBomb API, and from local files.
Open Data repository
====================
StatsBomb has made event stream data of certain leagues freely available for
public non-commercial use at https://github.com/statsbomb/open-data. This open
data can be accessed without authentication, but its use is subject to a
`user agreement`_. The code below shows how to set up an API client that can
fetch data from the repository.
.. code-block:: python
# optional: suppress warning about missing authentication
import warnings
from statsbombpy.api_client import NoAuthWarning
warnings.simplefilter('ignore', NoAuthWarning)
from socceraction.data.statsbomb import StatsBombLoader
api = StatsBombLoader(getter="remote", creds=None)
.. note::
If you publish, share or distribute any research, analysis or insights based
on this data, StatsBomb requires you to state the data source as StatsBomb
and use their logo.
StatsBomb API
=============
API access is for paying customers only. Authentication can be done by setting
environment variables named ``SB_USERNAME`` and ``SB_PASSWORD`` to your login
credentials. Alternatively, the constructor accepts an argument ``creds`` to
pass your login credentials in the format ``{"user": "", "passwd": ""}``.
.. code-block:: python
from socceraction.data.statsbomb import StatsBombLoader
# set authentication credentials as environment variables
import os
os.environ["SB_USERNAME"] = "your_username"
os.environ["SB_PASSWORD"] = "your_password"
api = StatsBombLoader(getter="remote")
# or provide authentication credentials as a dictionary
api = StatsBombLoader(getter="remote", creds={"user": "", "passwd": ""})
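As a minimal sketch, the environment variables can also be read explicitly and
passed through ``creds``, so that a missing variable fails early with a clear
message. The helper name ``creds_from_env`` is hypothetical and not part of
socceraction; only the variable names and the ``creds`` format come from the
description above.

```python
import os


def creds_from_env(environ=None):
    """Build the ``creds`` dictionary from SB_USERNAME and SB_PASSWORD.

    ``environ`` defaults to ``os.environ``; passing a plain dict is handy
    for testing. This helper is illustrative, not a socceraction function.
    """
    environ = os.environ if environ is None else environ
    user = environ.get("SB_USERNAME")
    passwd = environ.get("SB_PASSWORD")
    if not user or not passwd:
        raise RuntimeError("Set SB_USERNAME and SB_PASSWORD before connecting.")
    return {"user": user, "passwd": passwd}


# Usage (requires socceraction and valid credentials):
# api = StatsBombLoader(getter="remote", creds=creds_from_env())
```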
Local directory
===============
A final option is to load data from a local directory. Pass the path to this
directory as the ``root`` argument of the constructor.
.. code-block:: python
from socceraction.data.statsbomb import StatsBombLoader
api = StatsBombLoader(getter="local", root="data/statsbomb")
Note that the data should be organized in the same way as the StatsBomb Open
Data repository, which corresponds to the following file hierarchy:
.. code-block::

  root
  ├── competitions.json
  ├── events
  │   ├── <match_id>.json
  │   ├── ...
  │   └── ...
  ├── lineups
  │   ├── <match_id>.json
  │   └── ...
  ├── matches
  │   ├── <competition_id>
  │   │   ├── <season_id>.json
  │   │   └── ...
  │   └── ...
  └── three-sixty
      ├── <match_id>.json
      └── ...
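Before pointing the loader at a local directory, it can help to check that the
top-level entries of this hierarchy exist. The sketch below uses only the
standard library; ``check_statsbomb_root`` is a hypothetical helper, and
treating ``three-sixty`` as optional is an assumption (it should only matter
when 360 data is requested).

```python
from pathlib import Path

# Directories assumed to be required for loading games, lineups and events.
REQUIRED_DIRS = ("events", "lineups", "matches")


def check_statsbomb_root(root):
    """Return a list of problems found in a local StatsBomb data directory.

    An empty list means the top-level layout matches the hierarchy above.
    The contents of the individual JSON files are not inspected.
    """
    root = Path(root)
    problems = []
    if not (root / "competitions.json").is_file():
        problems.append("missing file: competitions.json")
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}")
    return problems


# Usage:
# for problem in check_statsbomb_root("data/statsbomb"):
#     print(problem)
```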
------------
Loading data
------------
Next, you can load the match event stream data and metadata by calling the
corresponding methods on the :class:`StatsBombLoader` object.
:func:`StatsBombLoader.competitions()`
======================================
.. code-block:: python
df_competitions = api.competitions()
.. csv-table::
:class: dataframe
:header: season_id,competition_id,competition_name,country_name,competition_gender,season_name
106,43,FIFA World Cup,International,male,2022
30,72,Women's World Cup,International,female,2019
3,43,FIFA World Cup,International,male,2018
:func:`StatsBombLoader.games()`
===============================
.. code-block:: python
df_games = api.games(competition_id=43, season_id=3)
.. csv-table::
:class: dataframe
:header: game_id,season_id,competition_id,competition_stage,game_day,game_date,home_team_id,away_team_id,home_score,away_score,venue,referee_id
8658,3,43,Final,7,2018-07-15 17:00:00,771,785,4,2,Stadion Luzhniki,730
8657,3,43,3rd Place Final,7,2018-07-14 16:00:00,782,768,2,0,Saint-Petersburg Stadium,741
:func:`StatsBombLoader.teams()`
===============================
.. code-block:: python
df_teams = api.teams(game_id=8658)
.. csv-table::
:class: dataframe
:header: team_id,team_name
:align: left
771,France
785,Croatia
:func:`StatsBombLoader.players()`
=================================
.. code-block:: python
df_players = api.players(game_id=8658)
.. csv-table::
:class: dataframe
:header: game_id,team_id,player_id,player_name,nickname,jersey_number,is_starter,starting_position_id,starting_position_name,minutes_played
8658,771,3009,Kylian Mbappé Lottin,Kylian Mbappé,10,True,12,Right Midfield,95
8658,785,5463,Luka Modrić,,10,True,13,Right Center Midfield,95
:func:`StatsBombLoader.events()`
================================
.. code-block:: python
df_events = api.events(game_id=8658)
.. csv-table::
:class: dataframe
:header: event_id,index,period_id,timestamp,minute,second,type_id,type_name,possession,possession_team_id,possession_team_name,play_pattern_id,play_pattern_name,team_id,team_name,duration,extra,related_events,player_id,player_name,position_id,position_name,location,under_pressure,counterpress,game_id
47638847-fd43-4656-b49c-cff64e5cfc0a,1,1,1900-01-01,0,0,35,Starting XI,1,771,France,1,Regular Play,771,France,0.0,"{...}",[],,,,,,False,False,8658
0c04305d-5615-4520-9be5-7c232829954b,2,1,1900-01-01,0,0,35,Starting XI,1,771,France,1,Regular Play,785,Croatia,1.412,"{...}",[],,,,,,False,False,8658
c5e17439-efe2-480b-9cff-1600998674d7,3,1,1900-01-01,0,0,18,Half Start,1,771,France,1,Regular Play,771,France,0.0,{},['7e1460eb-c572-4059-8cd4-cec4857f818d'],,,,,,False,False,8658
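Because ``games()`` returns one row per game, the calls above can be chained to
collect the events of a whole season. The sketch below relies only on the
``games()`` and ``events()`` signatures and the ``game_id`` column shown in this
document; ``load_season_events`` is a hypothetical helper, not part of
socceraction.

```python
def load_season_events(api, competition_id, season_id):
    """Return a dict mapping each game_id of a season to its events table.

    ``api`` is expected to be a configured StatsBombLoader (or any object
    exposing compatible ``games()`` and ``events()`` methods).
    """
    df_games = api.games(competition_id=competition_id, season_id=season_id)
    return {
        game_id: api.events(game_id=game_id)
        for game_id in df_games["game_id"]
    }


# Usage (requires a configured loader, see "Connecting to a data store"):
# events_by_game = load_season_events(api, competition_id=43, season_id=3)
```

With a remote data store, each game is fetched separately, so loading a full
season this way can take a while.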
If `360 data snapshots`_ are available for the game, they can be loaded by
passing ``load_360=True`` to the ``events()`` method. This will add two columns
to the events dataframe: ``visible_area_360`` and ``freeze_frame_360``. The
former contains the visible area of the pitch in the 360 snapshot, while the
latter contains the player locations in the 360 snapshot.
.. code-block:: python
df_events = api.events(game_id=3788741, load_360=True)
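The sketch below illustrates how the two columns might be unpacked for a single
event. It assumes that the values keep the layout of StatsBomb's raw 360 JSON:
``visible_area_360`` as a flat list of alternating x and y coordinates, and
``freeze_frame_360`` as a list of dictionaries with ``teammate`` and
``location`` keys. Verify this against your own data before relying on it; both
helper names are hypothetical.

```python
def visible_area_polygon(visible_area):
    """Convert a flat [x1, y1, x2, y2, ...] list into [(x1, y1), (x2, y2), ...]."""
    if len(visible_area) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(visible_area[0::2], visible_area[1::2]))


def split_freeze_frame(freeze_frame):
    """Split a freeze frame into teammate and opponent locations.

    Assumes each entry is a dict with a boolean "teammate" flag and a
    "location" value holding the player's coordinates.
    """
    teammates = [p["location"] for p in freeze_frame if p["teammate"]]
    opponents = [p["location"] for p in freeze_frame if not p["teammate"]]
    return teammates, opponents


# Usage on one row of the events DataFrame (assumed to contain 360 data):
# row = df_events.iloc[10]
# polygon = visible_area_polygon(row["visible_area_360"])
# teammates, opponents = split_freeze_frame(row["freeze_frame_360"])
```

Events without a 360 snapshot will not carry usable values in these columns, so
such rows should be filtered out first.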
.. _StatsBomb event stream data: https://statsbomb.com/what-we-do/soccer-data/
.. _statsbombpy: https://pypi.org/project/statsbombpy/
.. _user agreement: https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf
.. _360 data snapshots: https://statsbomb.com/what-we-do/soccer-data/360-2/