|
.. currentmodule:: socceraction.data |
|
|
|
************* |
|
Loading data |
|
************* |
|
|
|
Socceraction provides API clients for various popular event stream data |
|
sources. These clients enable fetching event streams and their corresponding |
|
metadata as Pandas DataFrames using a unified data model. |
|
Alternatively, you can also use `kloppy <https://kloppy.pysport.org/>`__ to |
|
load data. |
|
|
|
Loading data with socceraction |
|
============================== |
|
|
|
All API clients implemented in socceraction inherit from the |
|
:class:`~base.EventDataLoader` interface. This interface provides the |
|
following methods to retrieve data as a Pandas DataFrames with a unified data |
|
model (i.e., :class:`~pandera.Schema`). The schema defines the minimal set of |
|
columns and their types that are returned by each method. Implementations of |
|
the :class:`~base.EventDataLoader` interface may add additional columns. |
|
|
|
.. list-table:: |
|
:widths: 40 20 40 |
|
:header-rows: 1 |
|
|
|
* - Method |
|
- Output schema |
|
- Description |
|
* - :meth:`competitions() <base.EventDataLoader.competitions>` |
|
- :class:`~schema.CompetitionSchema` |
|
- All available competitions and seasons |
|
* - :meth:`games(competition_id, season_id) <base.EventDataLoader.games>` |
|
- :class:`~schema.GameSchema` |
|
- All available games in a season |
|
* - :meth:`teams(game_id) <base.EventDataLoader.teams>` |
|
- :class:`~schema.TeamSchema` |
|
- Both teams that participated in a game |
|
* - :meth:`players(game_id) <base.EventDataLoader.players>` |
|
- :class:`~schema.PlayerSchema` |
|
- All players that participated in a game |
|
* - :meth:`events(game_id) <base.EventDataLoader.events>` |
|
- :class:`~schema.EventSchema` |
|
- The event stream of a game |
|
|
|
Currently, the following data providers are supported: |
|
|
|
.. toctree:: |
|
:maxdepth: 1 |
|
|
|
statsbomb |
|
wyscout |
|
opta |
|
|
|
|
|
Loading data with kloppy |
|
========================= |
|
|
|
Similarly to socceraction, `kloppy <https://kloppy.pysport.org/>`__ implements |
|
a unified data model for soccer data. The main differences between kloppy and |
|
socceraction are: (1) kloppy supports more data sources (including tracking |
|
data), (2) kloppy uses a more flexible object-based data model in contrast to |
|
socceraction's dataframe-based model, and (3) kloppy covers a more complete |
|
set of events while socceraction focuses on-the-ball events. Thus, we recommend |
|
using kloppy if you want to load data from a source that is not supported by |
|
socceraction or when your analysis is not limited to on-the-ball events. |
|
|
|
The following code snippet shows how to load data from StatsBomb using |
|
kloppy:: |
|
|
|
from kloppy import statsbomb |
|
|
|
dataset = statsbomb.load_open_data(match_id=8657) |
|
|
|
Instructions for loading data from other sources can be found in the |
|
`kloppy documentation <https://kloppy.pysport.org/>`__. |
|
|
|
You can then convert the data to the SPADL format using the |
|
:func:`~socceraction.spadl.kloppy.convert_to_actions` function:: |
|
|
|
from socceraction.spadl.kloppy import convert_to_actions |
|
|
|
spadl_actions = convert_to_actions(dataset, game_id=8657) |
|
|
|
|
|
.. note:: |
|
|
|
Currently, the data model of kloppy is only complete for StatsBomb data. |
|
If you use kloppy to load data from other sources and convert it to the |
|
SPADL format, you may lose some information. |
|
|