|
|
|
.. currentmodule:: socceraction.data.wyscout |
|
|
|
========================= |
|
Loading Wyscout data |
|
========================= |
|
|
|
The :class:`WyscoutLoader` class provides an API client enabling you to fetch |
|
`Wyscout event stream data`_ as Pandas DataFrames. This document provides an |
|
overview of the available data sources and how to access them. |
|
|
|
.. note:: |
|
|
|
Currently, only version 2 of the Wyscout API is supported. |
|
See https://github.com/ML-KULeuven/socceraction/issues/156 |
|
for progress on version 3 support. |
|
|
|
|
|
|
|
Connecting to a data store |
|
|
|
|
|
First, you have to create a :class:`WyscoutLoader` object and configure it |
|
for the data store you want to use. The :class:`WyscoutLoader` supports |
|
loading data from the official Wyscout API and from local files. Additionally, |
|
the :class:`PublicWyscoutLoader` class can be used to load a publicly |
|
available dataset. |
|
|
|
|
|
Wyscout API |
|
============= |
|
|
|
`Wyscout API <https://apidocs.wyscout.com/>`_ access requires a separate |
|
subscription. Wyscout currently offers `three different packs |
|
<https://footballdata.wyscout.com/packages/>`_: a Database Pack (match sheet |
|
data), a Stats Pack (statistics derived from match event data), and an Events |
|
Pack (raw match event data). A subscription to the Events Pack is required to |
|
access the event stream data. |
|
|
|
Authentication can be done by setting environment variables named |
|
``WY_USERNAME`` and ``WY_PASSWORD`` to your login credentials (i.e., client id |
|
and secret). Alternatively, the constructor accepts an argument ``creds`` to |
|
pass your login credentials in the format ``{"user": "", "passwd": ""}``. |
|
|
|
|
|
.. code-block:: python |
|
|
|
from socceraction.data.wyscout import WyscoutLoader |
|
|
|
# set authentication credentials as environment variables |
|
import os |
|
os.environ["WY_USERNAME"] = "your_client_id" |
|
os.environ["WY_PASSWORD"] = "your_secret" |
|
api = WyscoutLoader(getter="remote") |
|
|
|
# or provide authentication credentials as a dictionary |
|
api = WyscoutLoader(getter="remote", creds={"user": "", "passwd": ""}) |
|
|
|
|
|
Local directory |
|
=============== |
|
|
|
Data can also be loaded from a local directory. This local directory |
|
can be specified by passing the ``root`` argument to the constructor, |
|
specifying the path to the local data directory. |
|
|
|
.. code-block:: python |
|
|
|
from socceraction.data.wyscout import WyscoutLoader |
|
|
|
ap = WyscoutLoader(getter="local", root="data/wyscout") |
|
|
|
|
|
The loader uses the directory structure and file names to determine which files |
|
should be parsed to retrieve the requested data. Therefore, the local directory |
|
should have a predefined file hierarchy. By default, it expects following file |
|
hierarchy: |
|
|
|
.. code-block:: |
|
|
|
root |
|
βββ competitions.json |
|
βββ seasons_<competition_id>.json |
|
βββ matches_<season_id>.json |
|
βββ matches |
|
βββ events_<game_id>.json |
|
βββ ... |
|
|
|
If your local directory has a different file hierarchy, you can specify |
|
this custom hierarchy by passing the ``feeds`` argument to the constructor. |
|
A wide range of file names and directory structures are supported. However, |
|
the competition, season, and game identifiers must be included in the file |
|
names to be able to locate the corresponding files for each entity. |
|
|
|
.. code-block:: python |
|
|
|
from socceraction.data.wyscout import WyscoutLoader |
|
|
|
ap = WyscoutLoader(getter="local", root="data/wyscout", feeds={ |
|
"competitions": "competitions.json", |
|
"seasons": "seasons_{competition_id}.json", |
|
"games": "matches_{season_id}.json", |
|
"events": "matches/events_{game_id}.json", |
|
})) |
|
|
|
The ``{competition_id}``, ``{season_id}``, and ``{game_id}`` placeholders |
|
will be replaced by the corresponding id values when data is retrieved. |
|
|
|
|
|
Soccer logs dataset |
|
=================== |
|
|
|
As part of the "A public data set of spatio-temporal match events in soccer |
|
competitions" paper, Wyscout made an event stream dataset available for |
|
research purposes. The dataset covers the 2017/18 season of the Spanish, |
|
Italian, English, German, and French first division. In addition, it includes |
|
the data of the 2018 World Cup and the 2016 European championship. The dataset |
|
is available at https://figshare.com/collections/Soccer_match_event_dataset/4415000/2. |
|
|
|
As the format of this dataset is slightly different from the format of the |
|
official Wyscout API, a separate :class:`PublicWyscoutLoader` class is |
|
provided to load this dataset. This loader will download the dataset once and |
|
extract it to the specified ``root`` directory. |
|
|
|
|
|
.. code-block:: python |
|
|
|
from socceraction.data.wyscout import PublicWyscoutLoader |
|
|
|
api = PublicWyscoutLoader(root="data/wyscout") |
|
|
|
|
|
|
|
Loading data |
|
|
|
|
|
Next, you can load the match event stream data and metadata by calling the |
|
corresponding methods on the :class:`WyscoutLoader` object. |
|
|
|
- :func:`WyscoutLoader.competitions()` |
|
- :func:`WyscoutLoader.games()` |
|
- :func:`WyscoutLoader.teams()` |
|
- :func:`WyscoutLoader.players()` |
|
- :func:`WyscoutLoader.events()` |
|
|
|
|
|
.. _Wyscout event stream data: https://footballdata.wyscout.com/ |
|
|