scfive's picture
Upload 203 files
d6ea71e verified
Quickstart
===========
Eager to get started valuing some soccer actions? This page gives a quick
introduction on how to get started.
Installation
------------
First, make sure that socceraction is installed:
.. code-block:: console
$ pip install socceraction[statsbomb]
For detailed instructions and other installation options, check out our
detailed :doc:`installation instructions <install>`.
Loading event stream data
-------------------------
First of all, you will need some data. Luckily, both `StatsBomb <https://github.com/statsbomb/open-data>`_ and
`Wyscout <https://www.nature.com/articles/s41597-019-0247-7>`_ provide a small freely available dataset.
The :ref:`data module<api-data>` of socceraction makes it trivial to load these datasets as
`Pandas DataFrames <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__.
In this short introduction, we will work with Statsbomb's dataset of the 2018 World Cup.
.. code-block:: python
import pandas as pd
from socceraction.data.statsbomb import StatsBombLoader
# Set up the StatsBomb data loader
SBL = StatsBombLoader()
# View all available competitions
df_competitions = SBL.competitions()
# Create a dataframe with all games from the 2018 World Cup
df_games = SBL.games(competition_id=43, season_id=3).set_index("game_id")
.. note::
Keep in mind that by using the public StatsBomb data you are agreeing to their `user agreement <https://github.com/statsbomb/open-data/blob/master/LICENSE.pdf>`__.
For each game, you can then retrieve a dataframe containing the teams, all
players that participated, and all events that were recorded in that game.
Specifically, we'll load the data from the third place play-off game between
England and Belgium.
.. code-block:: python
game_id = 8657
df_teams = SBL.teams(game_id)
df_players = SBL.players(game_id)
df_events = SBL.events(game_id)
Converting to SPADL actions
---------------------------
The event stream format is not well-suited for data analysis: some of the
recorded information is irrelevant for valuing actions, each vendor uses their
own custom format and definitions, and the events are stored as unstructured
JSON objects. Therefore, socceraction uses the :doc:`SPADL format
<spadl/index>` for describing actions on the pitch. With the code below, you
can convert the events to SPADL actions.
.. code-block:: python
import socceraction.spadl as spadl
home_team_id = df_games.at[game_id, "home_team_id"]
df_actions = spadl.statsbomb.convert_to_actions(df_events, home_team_id)
With the `matplotsoccer package <https://github.com/TomDecroos/matplotsoccer>`_, you can try plotting some of these
actions:
.. code-block:: python
import matplotsoccer as mps
# Select relevant actions
df_actions_goal = df_actions.loc[2196:2200]
# Replace result, actiontype and bodypart IDs by their corresponding name
df_actions_goal = spadl.add_names(df_actions_goal)
# Add team and player names
df_actions_goal = df_actions_goal.merge(df_teams).merge(df_players)
# Create the plot
mps.actions(
location=df_actions_goal[["start_x", "start_y", "end_x", "end_y"]],
action_type=df_actions_goal.type_name,
team=df_actions_goal.team_name,
result=df_actions_goal.result_name == "success",
label=df_actions_goal[["time_seconds", "type_name", "player_name", "team_name"]],
labeltitle=["time", "actiontype", "player", "team"],
zoom=False
)
.. figure:: spadl/eden_hazard_goal_spadl.png
:align: center
Valuing actions
---------------
We can now assign a numeric value to each of these individual actions that
quantifies how much the action contributed towards winning the game.
Socceraction implements three frameworks for doing this: xT, VAEP and
Atomic-Vaep. In this quickstart guide, we will focus on the xT framework.
The expected threat or xT model overlays a :math:`M \times N` grid on the
pitch in order to divide it into zones. Each zone :math:`z` is
then assigned a value :math:`xT(z)` that reflects how threatening teams are at
that location, in terms of scoring. An example grid is visualized below.
.. image:: valuing_actions/default_xt_grid.png
:width: 600
:align: center
The code below allows you to load
league-wide xT values from the 2017-18 Premier League season (the 12x8 grid
shown above). Instructions on how to train your own model can be found in the
:doc:`detailed documentation about xT <valuing_actions/xT>`.
.. code-block:: python
import socceraction.xthreat as xthreat
url_grid = "https://karun.in/blog/data/open_xt_12x8_v1.json"
xT_model = xthreat.load_model(url_grid)
Subsequently, the model can be used to value actions that successfully move
the ball between two zones by computing the difference between the threat
value on the start and end location of each action. The xT framework does not
assign a value to failed actions, shots and defensive actions such as tackles.
.. code-block:: python
df_actions_ltr = spadl.play_left_to_right(df_actions, home_team_id)
df_actions["xT_value"] = xT_model.rate(df_actions_ltr)
.. image:: valuing_actions/eden_hazard_goal_xt.png
:align: center
-----------------------
Ready for more? Check out the detailed documentation about the
:doc:`data representation <spadl/index>` and
:doc:`action value frameworks <valuing_actions/index>`.