Spaces:
Sleeping
Sleeping
# Basic Concepts | |
## Arthur Overview | |
The Arthur platform monitors, measures, and improves machine learning models to deliver better business outcomes. | |
Arthur helps data scientists, product owners, and business leaders accelerate model operations and optimize for | |
accuracy, explainability, and fairness. | |
To use Arthur, you: | |
1. Register your model with the Arthur system | |
2. Set reference data for baseline analytics | |
3. Send inference data over time | |
With this data, Arthur quantifies and centralizes your models' performance for efficient querying and automated | |
analytics. | |
## Models and Onboarding | |
### Registering a Model | |
When you register a model with Arthur, you define the way the model processes data. | |
Arthur is model-agnostic and platform-agnostic, so no matter which tools you used to build or deploy, | |
you can use Arthur to log all the data your model receives and produces - | |
registration tells Arthur how this will happen. | |
(basic_concepts_input_output_types)= | |
#### Input and Output Types | |
These are the data types that define the data that enters and exits your model. | |
The `InputType` of a model specifies whether data enters your model as a tabular dataframe, as an image, or as raw text. | |
The `OutputType` of a model specifies the modeling task at hand: | |
whether your model is predicting values for a regression task, predicting probabilities for a classification task, | |
or predicting bounding boxes for a computer vision object-detection task. | |
(basic_concepts_streaming_vs_batch)= | |
#### Streaming vs. Batch | |
When registering a model, you specify whether your model ingests data either as a stream or in batches. | |
A streaming model receives instances of data as they come into the deployed model. A batch model, in contrast, | |
receives data in groups, and is often preferred if your model runs as a job rather than operating in realtime or over | |
a data stream. | |
Indicating a batch model simply means that you'll supply an additional "batch_id" to group your inferences, and Arthur | |
will default to measuring performance for each batch rather than by the inference timestamps. | |
(basic_concepts_attributes_and_stages)= | |
### Attributes and Stages | |
Attributes are analagous to the different columns that make up your model's data. | |
Each attribute has a value type: these can be standard types like `int` and `str`, | |
or datatypes for complex models like raw text and images. | |
When you are onboarding a model, Arthur categorizes each attribute into a different `Stage`, | |
depending on the role of the attribute in the model pipeline: | |
1. `ModelPipelineInput`: these attributes are the features your model receives as input | |
1. `PredictedValue`: these attributes are the output values your model produces | |
1. `GroundTruth`: these attributes are the true values for your model's prediction task, for comparing the model's | |
outputs against for performance metrics | |
1. `NonInputData`: these attributes are the additional metadata you can log with Arthur | |
related to input data that your model doesn't take as an input feature, | |
e.g. protected attributes like age, race, or sex, or specific business data like a unique customer ID | |
### Model Schema | |
The model schema is a record of important properties of your model's attributes, including their value type and `Stage`. | |
As you log data over time with Arthur, the model schema is used to type-check ingested data. | |
This prevents analytics from being skewed by scenarios | |
like `int` values suddenly replacing `float` values causing silent bugs. | |
Arthur also records attribute properties in the model schema, | |
like the range of possible values an attribute has in your data. | |
These properties are used to get a sense of your data's high-level structure, | |
_not_ to enforce that future attributes have strictly these same properties. | |
(basic_concepts_reference_dataset)= | |
### Reference Dataset | |
```{image} /_static/images/Model-Reference-Dataset-Light-Mode.png | |
:align: center | |
:class: only-light | |
``` | |
```{image} /_static/images/Model-Reference-Dataset-Dark-Mode.png | |
:align: center | |
:class: only-dark | |
``` | |
The reference dataset is a representative sample of the input features your model ingests. | |
This can be the model's training data, | |
or any other set of data that captures the distribution of data your model's inputs are sourced from. | |
This dataset is used to compute baseline model analytics. | |
By capturing the distribution of data you expect your model to receive, | |
Arthur can detect, surface, and diagnose data drift before it impacts results. | |
Note that Arthur can compute data drift metrics against any two distributions you choose (e.g. inferences | |
now compared to the same time last year), but the platform uses the reference dataset as the default. | |
The only required stage to be included in the reference dataset is `ModelPipelineInput`. | |
But we also recommend including data from the `PredictedValue`, `GroundTruth`, and `NonInputData` stages so that | |
Arthur can measure drift in those attributes over time as well. | |
## Sending Data to Arthur | |
### Inferences | |
The data your model produces over time is logged in the Arthur platform as **inferences**. | |
These inferences contain attributes from the `ModelPipelineInput` and `PredictedValue` stage (model inputs and outputs), | |
from which Arthur computes performance metrics. In addition, when you log these inferences, | |
you have the option to include `GroundTruth` and `NonInputData` attributes. | |
### Sending Ground Truth Separately | |
```{image} /_static/images/Ground-Truth-Light-Mode.png | |
:align: center | |
:class: only-light | |
``` | |
```{image} /_static/images/Ground-Truth-Dark-Mode.png | |
:align: center | |
:class: only-dark | |
``` | |
`GroundTruth` attributes are often not available when models produce inferences. | |
Therefore, Arthur allows you to send this attribute data to the platform _after_ sending the original inferences, | |
using an ID to pair data with the right inference. | |
## Metrics | |
Metrics are the measurements Arthur computes to quantify model performance. | |
Default metrics are the basic model performance metrics generated automatically by Arthur, | |
e.g. accuracy, mean-squared error, or AUC. | |
Furthermore, additional metrics can be written using the API and added to a model for | |
measuring performance specific to a custom business use-case. | |
You can use the Arthur API to efficiently query model performance metrics at scale. | |
Model metrics can be accessed in the online Arthur UI, using the Arthur API, and by using the Arthur Python SDK. | |
See the {doc}`/user-guide/api-query-guide/index` for more resources on model metrics. | |
## Alerts | |
An alert is a message notifying you that something has occurred with your model. | |
With alerts, Arthur makes it easy to provide a continuous view into your model | |
by highlighting important changes in model performance. | |
An alert is triggered based on an **_alert rule_**, which you define using a metric and a threshold: | |
when the metric crosses your threshold, the alert is activated. | |
This alert can then be delivered to you via email, highlighted in the online Arthur UI, | |
and/or accessed via integrations such as PagerDuty and Slack. | |
For an in-depth guide to setting alerts, see the {doc}`/user-guide/walkthroughs/metrics_alerts` guide. | |
## Enrichments | |
Enrichments are additional services that the Arthur platform provides for state-of-the-art proactive model monitoring: | |
- **Explainability**: | |
methods for computing the importance of individual features from your data on your model's outcomes. | |
- **Anomaly Detection**: drift metrics for quantifying how far incoming inferences have drifted | |
from the distribution of your model's reference dataset. | |
- **Hotspots**: automated identification of segments of your data where your model is underperforming. | |
- **Bias Mitigation**: | |
methods for model post-processing that improve the fairness of outcomes without re-deploying your model. | |
Once activated, these enrichments are computed on Arthur's backend automatically, | |
with results viewable in the online UI dashboard and queryable from Arthur's API. | |
The {doc}`/user-guide/walkthroughs/enrichments` guide shows how to set up enrichments and describes all of Arthur's currently offered enrichments. | |
## Insights | |
Insights are proactive notifications about your model's performance. For example, once you've enabled the Hotspots | |
enrichment you'll receive insights about regions of your data space where model accuracy has significantly degraded. | |
## Model Groups and Versioning | |
Arthur helps you track the improvements of model updates with Model Versioning. | |
If a change has occurred to your data preprocessing pipeline, if you have retrained your model, | |
or if you've reset your model's reference data, your updated model is likely | |
a new version of a previous model addressing the same task. In this case, Arthur recommends | |
keeping these models within the same Model Group to track performance as you continue improving your model. | |
Each model you onboard to Arthur is placed in a Model Group. As you change the model over time, | |
you can add new versions of the model that will live in the same group. | |
Tracking improvement in performance over time within a Model Group is then | |
streamlined for quick insights in the Arthur UI dashboard. | |
## Next Steps | |
### {doc}`Onboard Your Model </user-guide/walkthroughs/model-onboarding/index>` | |
The {doc}`Model Onboarding walkthrough </user-guide/walkthroughs/model-onboarding/index>` covers the steps of onboarding | |
a model, formatting attribute data, and sending inferences to Arthur. | |