# Data Drift
## Querying Drift in Python
The basic format of a drift query using the Python SDK involves specifying that the
`query_type` parameter has the value 'drift':
```python
query = {...}
arthur_model.query(query, query_type='drift')
```
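As a slightly fuller sketch, the placeholder query above can be filled in with a minimal reference-vs-inference comparison. This assumes the SDK accepts the same dictionary layout as the request body documented for the data drift endpoint. `run_drift_query` and `StubModel` are hypothetical names used only for illustration; `StubModel` stands in for a real model object so the snippet runs without a connection.

```python
def run_drift_query(arthur_model, properties, metric="PSI", num_bins=10):
    """Build a minimal reference-vs-inference drift query and run it.

    Assumption: the dict layout mirrors the data drift endpoint body.
    """
    query = {
        "properties": list(properties),
        "num_bins": num_bins,
        "metric": metric,
        "base": {"source": "reference"},
        "target": {"source": "inference"},
    }
    return arthur_model.query(query, query_type="drift")


class StubModel:
    """Hypothetical stand-in for a real model object, so the sketch runs offline."""

    def query(self, query, query_type=None):
        # Echo the arguments instead of contacting a server.
        return {"query": query, "query_type": query_type}


result = run_drift_query(StubModel(), ["age"])
print(result["query_type"])
```

With a real model object, the return value is whatever the SDK's `query` method produces; the echo shown here is only a property of the stub.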
## Data Drift Endpoint
Data drift has a dedicated endpoint at `/models/{model_id}/inferences/query/data_drift`.
Returns the data drift metric between a `base` dataset and a `target` dataset. This endpoint supports up to 100 properties in one request.
* `num_bins` - Specifies the granularity of bucketing for continuous distributions and will be ignored if the attribute is categorical.
* `metric` - Specify one metric among {ref}`the data drift metrics Arthur offers <glossary_data_drift>`.
* `filter` - Optional. Defined separately inside the `base` and `target` blocks; each one specifies which data from that dataset is used in the data drift calculation.
* `group_by` - Optional. Applies globally, to both the base and target data.
* `rollup` - Optional. Aggregates the calculated data drift value by the supported time dimension.
For `HypothesisTest`, the returned value is transformed as -log_10(P_value) to maintain directional parity with the other data drift metrics. That is, a lower P_value is more significant and implies data drift, which is reflected in a higher -log_10(P_value). Further mathematical details are in the {ref}`glossary <glossary_hypothesis_test>`.
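The transform described above can be sketched in a few lines. The function name `hypothesis_test_score` is hypothetical; only the -log_10(P_value) formula comes from this page.

```python
import math


def hypothesis_test_score(p_value: float) -> float:
    """Map a p-value to -log10(p) so that larger values mean stronger evidence of drift."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    return -math.log10(p_value)


# A smaller p-value yields a larger transformed score.
score_at_alpha = hypothesis_test_score(0.05)
score_strong = hypothesis_test_score(0.001)
print(score_at_alpha, score_strong)
```

At the conventional 0.05 level the score is roughly 1.3, so transformed values above that correspond to p-values below 0.05.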
Query Request:
```json
{
"properties": [
"<attribute1_name> [string]",
"<attribute2_name> [string]",
"<attribute3_name> [string]"
],
"num_bins": "<num_bins> [int]",
"metric": "[PSI|KLDivergence|JSDivergence|HellingerDistance|HypothesisTest]",
"base": {
"source": "[inference|reference]",
"filter [Optional]": [
{
"property": "<filter_attribute_name> [string]",
"comparator": "<comparator> [string]",
"value": "<filter_threshold_value> [string|int|float]"
}
]
},
"target": {
"source": "[inference|reference|ground_truth]",
"filter [Optional]": [
{
"property": "<filter_attribute_name> [string]",
"comparator": "<comparator> [string]",
"value": "<filter_threshold_value> [string|int|float]"
}
]
},
"group_by [Optional]": [
{
"property": "<group_by_attribute_name> [string]"
}
],
"rollup [Optional]": "minute|hour|day|month|year|batch_id"
}
```
Query Response:
```json
{
"query_result": [
{
"<attribute1_name>": "<attribute1_data_drift> [float]",
"<attribute2_name>": "<attribute2_data_drift> [float]",
"<attribute3_name>": "<attribute3_data_drift> [float]",
"<group_by_attribute_name>": "<group_by_attribute_value> [string|int|null]",
"rollup": "<rollup_attribute_value> [string|null]"
}
]
}
```
See {ref}`endpoint_overview_filter_comparators` for a list of valid comparators.
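One way to prepare a request for this endpoint with the Python standard library is sketched below. The base URL, the use of `POST`, and any authentication headers are assumptions that depend on your deployment; `build_drift_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

MAX_PROPERTIES = 100  # limit stated for this endpoint


def build_drift_request(base_url, model_id, body, headers=None):
    """Return an unsent urllib Request for the data drift endpoint.

    Assumptions: the endpoint accepts a JSON body via POST, and any
    authentication headers are supplied by the caller.
    """
    if len(body.get("properties", [])) > MAX_PROPERTIES:
        raise ValueError(f"at most {MAX_PROPERTIES} properties per request")
    url = f"{base_url.rstrip('/')}/models/{model_id}/inferences/query/data_drift"
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    payload = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=payload, headers=all_headers, method="POST")


body = {
    "properties": ["age"],
    "num_bins": 10,
    "metric": "PSI",
    "base": {"source": "reference"},
    "target": {"source": "inference"},
}
# "https://example.invalid/api" and "my-model-id" are placeholders.
request = build_drift_request("https://example.invalid/api", "my-model-id", body)
print(request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out because it requires a reachable server and valid credentials.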
### Example: Reference vs. Inference
Sample Request: Calculate data drift for males, grouped by country, rolled up by hour.
```json
{
"properties": [
"age"
],
"num_bins": 10,
"metric": "PSI",
"base": {
"source": "reference",
"filter": [
{
"property": "gender",
"comparator": "eq",
"value": "male"
}
]
},
"target": {
"source": "inference",
"filter": [
{
"property": "gender",
"comparator": "eq",
"value": "male"
},
{
"property": "inference_timestamp",
"comparator": "gte",
"value": "2020-07-22T10:00:00Z"
},
{
"property": "inference_timestamp",
"comparator": "lt",
"value": "2020-07-23T10:00:00Z"
}
]
},
"group_by": [
{
"property": "country"
}
],
"rollup": "hour"
}
```
Sample Response:
```json
{
"query_result": [
{
"age": 2.3,
"country": "Canada",
"rollup": "2020-07-22T10:00:00Z"
},
{
"age": 2.4,
"country": "United States",
"rollup": "2020-07-22T10:00:00Z"
}
]
}
```
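Because each row of `query_result` combines a drift value with its group and rollup values, it can be convenient to index the rows before plotting or alerting. The following is a minimal sketch using the sample response above; `index_drift_rows` is a hypothetical helper, not part of any SDK.

```python
from collections import defaultdict


def index_drift_rows(rows, property_name, group_key=None):
    """Nest drift values as {group_value: {rollup_value: drift}}."""
    series = defaultdict(dict)
    for row in rows:
        group = row.get(group_key) if group_key else None
        series[group][row.get("rollup")] = row[property_name]
    return dict(series)


sample_rows = [
    {"age": 2.3, "country": "Canada", "rollup": "2020-07-22T10:00:00Z"},
    {"age": 2.4, "country": "United States", "rollup": "2020-07-22T10:00:00Z"},
]
by_country = index_drift_rows(sample_rows, "age", group_key="country")
print(by_country["Canada"])
```

Rows without a `rollup` key are stored under `None`, so the same helper also works for queries that omit `rollup`.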
### Example: Inference vs. Inference
Sample Request: Compare data drift between two batches, with no grouping and no rollup. Each side is filtered only by `batch_id`.
```json
{
"properties": [
"age"
],
"num_bins": 10,
"metric": "PSI",
"base": {
"source": "inference",
"filter": [
{
"property": "batch_id",
"comparator": "eq",
"value": "5"
}
]
},
"target": {
"source": "inference",
"filter": [
{
"property": "batch_id",
"comparator": "eq",
"value": "6"
}
]
}
}
```
Sample Response:
```json
{
"query_result": [
{
"age": 2.3
}
]
}
```
[back to top](#data-drift)
### Example: Reference vs. Ground Truth
Sample Request: Calculate data drift for an individual ground truth label (`gt_1`), rolled up by hour.
```json
{
"properties": [
"gt_1"
],
"num_bins": 10,
"metric": "PSI",
"base": {
"source": "reference"
},
"target": {
"source": "ground_truth",
"filter": [
{
"property": "ground_truth_timestamp",
"comparator": "gte",
"value": "2020-07-22T10:00:00Z"
},
{
"property": "ground_truth_timestamp",
"comparator": "lt",
"value": "2020-07-23T10:00:00Z"
}
]
},
"rollup": "hour"
}
```
Sample Response:
```json
{
"query_result": [
{
"gt_1": 0.03,
"rollup": "2020-07-22T10:00:00Z"
},
{
"gt_1": 0.4,
"rollup": "2020-07-22T11:00:00Z"
}
]
}
```
[back to top](#data-drift)
## Data Drift PSI Bucket Table Values
This metric has a dedicated endpoint at `/models/{model_id}/inferences/query/data_drift_psi_bucket_calculation_table`.
Returns the [PSI](https://scholarworks.wmich.edu/cgi/viewcontent.cgi?article=4249&context=dissertations) scores by bucket using the reference set data. The query for this endpoint omits `metric` and takes a single `property`, but is otherwise identical to the query for the [data drift endpoint](#data-drift-endpoint).
Note that when this endpoint is used with categorical features, the `bucket_min` and `bucket_max` fields are not
returned in the response. Instead, the `bucket` field contains the category name.
Query Request:
```json
{
"property": "<attribute_name> [string]",
"num_bins": "<num_bins> [int]",
"base": {
"source": "[inference|reference]",
"filter [Optional]": [
{
"property": "<filter_attribute_name> [string]",
"comparator": "<comparator> [string]",
"value": "<filter_threshold_value> [string|int|float]"
}
]
},
"target": {
"source": "[inference|reference]",
"filter [Optional]": [
{
"property": "<filter_attribute_name> [string]",
"comparator": "<comparator> [string]",
"value": "<filter_threshold_value> [string|int|float]"
}
]
},
"group_by [Optional]": [
{
"property": "<group_by_attribute_name> [string]"
}
],
"rollup [Optional]": "minute|hour|day|month|year|batch_id"
}
```
Query Response:
```json
{
"query_result": [
{
"bucket": "string",
"rollup": "string|null",
"group_by_property_1": "string|null",
"base_bucket_max": "number",
"base_bucket_min": "number",
"base_count_per_bucket": "number",
"base_ln_probability_per_bucket": "number",
"base_probability_per_bucket": "number",
"base_total": "number",
"target_bucket_max": "number",
"target_bucket_min": "number",
"target_count_per_bucket": "number",
"target_ln_probability_per_bucket": "number",
"target_probability_per_bucket": "number",
"target_total": "number",
"probability_difference": "number",
"ln_probability_difference": "number",
"psi": "number"
}
]
}
```
See {ref}`endpoint_overview_filter_comparators` for a list of valid comparators.
***
Sample Request: Calculate data drift bucket components for males, grouped by country, rolled up by hour.
```json
{
"property": "age",
"num_bins": 2,
"base": {
"source": "reference",
"filter": [
{
"property": "gender",
"comparator": "eq",
"value": "male"
}
]
},
"target": {
"source": "inference",
"filter": [
{
"property": "gender",
"comparator": "eq",
"value": "male"
},
{
"property": "inference_timestamp",
"comparator": "gte",
"value": "2020-07-22T10:00:00Z"
},
{
"property": "inference_timestamp",
"comparator": "lt",
"value": "2020-07-23T10:00:00Z"
}
]
},
"group_by": [
{
"property": "country"
}
],
"rollup": "hour"
}
```
Sample Response:
```json
{
"query_result": [
{
"bucket": "bucket_1",
"rollup": "2020-01-01T00:00:00Z",
"country": "Canada",
"base_bucket_max": 0.9999971182990177,
"base_bucket_min": 0.5009102069226075,
"base_count_per_bucket": 4988,
"base_ln_probability_per_bucket": -0.6955500651756032,
"base_probability_per_bucket": 0.4988,
"base_total": 10000,
"target_bucket_max": 0.9999971182990177,
"target_bucket_min": 0.5009102069226075,
"target_count_per_bucket": 2487,
"target_ln_probability_per_bucket": -0.6701670131762315,
"target_probability_per_bucket": 0.5116231228142357,
"target_total": 4861,
"probability_difference": -0.012823122814235699,
"ln_probability_difference": -0.025383051999371742,
"psi": 0.00032548999318807485
},
{
"bucket": "bucket_2",
"rollup": "2020-01-01T00:00:00Z",
"country": "United States",
"base_bucket_max": 0.9999971182990177,
"base_bucket_min": 0.5009102069226075,
"base_count_per_bucket": 4988,
"base_ln_probability_per_bucket": -0.6955500651756032,
"base_probability_per_bucket": 0.4988,
"base_total": 10000,
"target_bucket_max": 0.9999971182990177,
"target_bucket_min": 0.5009102069226075,
"target_count_per_bucket": 2487,
"target_ln_probability_per_bucket": -0.6701670131762315,
"target_probability_per_bucket": 0.5116231228142357,
"target_total": 4861,
"probability_difference": -0.012823122814235699,
"ln_probability_difference": -0.025383051999371742,
"psi": 0.00032548999318807485
},
{
"bucket": "bucket_1",
"rollup": "2020-01-01T01:00:00Z",
"country": "Canada",
"base_bucket_max": 0.9999971182990177,
"base_bucket_min": 0.5009102069226075,
"base_count_per_bucket": 4988,
"base_ln_probability_per_bucket": -0.6955500651756032,
"base_probability_per_bucket": 0.4988,
"base_total": 10000,
"target_bucket_max": 0.9999971182990177,
"target_bucket_min": 0.5009102069226075,
"target_count_per_bucket": 2487,
"target_ln_probability_per_bucket": -0.6701670131762315,
"target_probability_per_bucket": 0.5116231228142357,
"target_total": 4861,
"probability_difference": -0.012823122814235699,
"ln_probability_difference": -0.025383051999371742,
"psi": 0.00032548999318807485
},
{
"bucket": "bucket_2",
"rollup": "2020-01-01T01:00:00Z",
"country": "United States",
"base_bucket_max": 0.9999971182990177,
"base_bucket_min": 0.5009102069226075,
"base_count_per_bucket": 4988,
"base_ln_probability_per_bucket": -0.6955500651756032,
"base_probability_per_bucket": 0.4988,
"base_total": 10000,
"target_bucket_max": 0.9999971182990177,
"target_bucket_min": 0.5009102069226075,
"target_count_per_bucket": 2487,
"target_ln_probability_per_bucket": -0.6701670131762315,
"target_probability_per_bucket": 0.5116231228142357,
"target_total": 4861,
"probability_difference": -0.012823122814235699,
"ln_probability_difference": -0.025383051999371742,
"psi": 0.00032548999318807485
}
]
}
```
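The fields in each row fit together as the standard per-bucket PSI term: the difference in bucket probabilities multiplied by the difference in their natural logs. The sketch below recomputes the `psi` value of the first sample row from its counts. `psi_bucket_term` is a hypothetical helper, and the check only shows that the sample numbers are consistent with this formula.

```python
import math


def psi_bucket_term(base_count, base_total, target_count, target_total):
    """Per-bucket PSI contribution: (p_base - p_target) * (ln p_base - ln p_target).

    Counts must be positive; an empty bucket would make the logarithm undefined.
    """
    p_base = base_count / base_total
    p_target = target_count / target_total
    return (p_base - p_target) * (math.log(p_base) - math.log(p_target))


# Counts taken from the first row of the sample response above.
term = psi_bucket_term(4988, 10000, 2487, 4861)
print(term)
```

Summing this term over all buckets of a property gives the overall PSI between the two distributions.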
Sample Request: Compare data drift bucket components between two batches, with no grouping and no rollup. Each side is filtered only by `batch_id`.
```json
{
"property": "age",
"num_bins": 10,
"base": {
"source": "inference",
"filter": [
{
"property": "batch_id",
"comparator": "eq",
"value": "5"
}
]
},
"target": {
"source": "inference",
"filter": [
{
"property": "batch_id",
"comparator": "eq",
"value": "6"
}
]
}
}
```
Sample Response:
```json
{
"query_result": [
{
"bucket": "bucket_1",
"base_bucket_max": 0.9999971182990177,
"base_bucket_min": 0.5009102069226075,
"base_count_per_bucket": 4988,
"base_ln_probability_per_bucket": -0.6955500651756032,
"base_probability_per_bucket": 0.4988,
"base_total": 10000,
"target_bucket_max": 0.9999971182990177,
"target_bucket_min": 0.5009102069226075,
"target_count_per_bucket": 2487,
"target_ln_probability_per_bucket": -0.6701670131762315,
"target_probability_per_bucket": 0.5116231228142357,
"target_total": 4861,
"probability_difference": -0.012823122814235699,
"ln_probability_difference": -0.025383051999371742,
"psi": 0.00032548999318807485
},
{
"bucket": "bucket_2",
"base_bucket_max": 0.9999971182990177,
"base_bucket_min": 0.5009102069226075,
"base_count_per_bucket": 4988,
"base_ln_probability_per_bucket": -0.6955500651756032,
"base_probability_per_bucket": 0.4988,
"base_total": 10000,
"target_bucket_max": 0.9999971182990177,
"target_bucket_min": 0.5009102069226075,
"target_count_per_bucket": 2487,
"target_ln_probability_per_bucket": -0.6701670131762315,
"target_probability_per_bucket": 0.5116231228142357,
"target_total": 4861,
"probability_difference": -0.012823122814235699,
"ln_probability_difference": -0.025383051999371742,
"psi": 0.00032548999318807485
}
]
}
```
[back to top](#data-drift)
## Data Drift for Classification Outputs
For classification outputs, one may want to examine drift among a collection of different classes, i.e. the system of outputs, instead of the drift of the probability predictions of a single class. The query uses one of `"predicted_classes": ["*"]` or `"ground_truth_classes": ["*"]` but otherwise is identical to a standard data drift query. Rather than using the star operator to select all prediction or ground truth classes, respectively, in a model, a list of string classes can be provided for looking at drift of a subset of multiclass outputs.
* `predicted_classes` - Specifies which prediction classes to use for `predictedClass` data drift.
* `ground_truth_classes` - Specifies which ground truth classes to use for `groundTruthClass` data drift.
`properties` can be included in the same query as long as the target `source` corresponds to the classification output tag. For example, one can query drift on input attributes and `predictedClass` in the same query with a target `source` of `inference`; one can query drift on individual ground truth labels and `groundTruthClass` in the same query with a target `source` of `ground_truth`.
Query Request:
```json
{
"properties [Optional]": [
"<attribute1_name> [string]",
"<attribute2_name> [string]",
"<attribute3_name> [string]"
],
"[predicted_classes|ground_truth_classes]": [
"<class0_name> [string]",
"<class1_name> [string]"
],
"num_bins": "<num_bins> [int]",
"metric": "[PSI|KLDivergence|JSDivergence|HellingerDistance|HypothesisTest]",
"base": {
"source": "[inference|reference]",
"filter [Optional]": [
{
"property": "<filter_attribute_name> [string]",
"comparator": "<comparator> [string]",
"value": "<filter_threshold_value> [string|int|float]"
}
]
},
"target": {
"source": "[inference|reference|ground_truth]",
"filter [Optional]": [
{
"property": "<filter_attribute_name> [string]",
"comparator": "<comparator> [string]",
"value": "<filter_threshold_value> [string|int|float]"
}
]
},
"group_by [Optional]": [
{
"property": "<group_by_attribute_name> [string]"
}
],
"rollup [Optional]": "minute|hour|day|month|year|batch_id"
}
```
Query Response:
```json
{
"query_result": [
{
"<attribute1_name>": "<attribute1_data_drift> [float]",
"<attribute2_name>": "<attribute2_data_drift> [float]",
"<attribute3_name>": "<attribute3_data_drift> [float]",
"[predictedClass|groundTruthClass]": "<classification_data_drift> [float]",
"<group_by_attribute_name>": "<group_by_attribute_value> [string|int|null]",
"rollup": "<rollup_attribute_value> [string|null]"
}
]
}
```
See {ref}`endpoint_overview_filter_comparators` for a list of valid comparators.
***
Sample Request: Calculate data drift on all prediction classes.
```json
{
"predicted_classes": [
"*"
],
"num_bins": 20,
"base": {
"source": "reference"
},
"target": {
"source": "inference"
},
"metric": "PSI"
}
```
Sample Response:
```json
{
"query_result": [
{
"predictedClass": 0.021
}
]
}
```
Sample Request: Calculate data drift on ground truth using the first and third ground truth classes.
```json
{
"ground_truth_classes": [
"gt_1",
"gt_3"
],
"num_bins": 20,
"base": {
"source": "reference"
},
"target": {
"source": "ground_truth"
},
"metric": "PSI"
}
```
Sample Response:
```json
{
"query_result": [
{
"groundTruthClass": 0.021
}
]
}
```
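The pairing between the class selector and the target `source` described in this section can be captured in a small builder. This is a sketch under stated assumptions: `classification_drift_query` is a hypothetical helper, and the base source is fixed to `reference` only to keep the example short.

```python
def classification_drift_query(kind, classes=("*",), metric="PSI", num_bins=20):
    """Build a classification-output drift query body.

    kind: "predicted" pairs `predicted_classes` with target source `inference`;
          "ground_truth" pairs `ground_truth_classes` with target source `ground_truth`.
    """
    settings = {
        "predicted": ("predicted_classes", "inference"),
        "ground_truth": ("ground_truth_classes", "ground_truth"),
    }
    if kind not in settings:
        raise ValueError(f"kind must be one of {sorted(settings)}")
    classes_key, target_source = settings[kind]
    return {
        classes_key: list(classes),
        "num_bins": num_bins,
        "base": {"source": "reference"},
        "target": {"source": target_source},
        "metric": metric,
    }


all_predictions = classification_drift_query("predicted")
subset_ground_truth = classification_drift_query("ground_truth", ["gt_1", "gt_3"])
print(subset_ground_truth)
```

Keeping the selector key and the target source in one lookup table makes it harder to send a mismatched combination by accident.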
[back to top](#data-drift)
(automated_data_drift_thresholds)=
## Automated Data Drift Thresholds
What is a sufficiently high data drift value to suggest that the target data has actually drifted from the base data? For `HypothesisTest`, we can reverse engineer -log_10(P_value) and plug in the conventional 0.05 alpha level to establish a lower bound of -log_10(0.05), which is approximately 1.3.
For the other data drift metrics, it is not sufficient to pin a constant. We abstract this away for the user and allow queries to obtain automatically generated data drift thresholds (lower bounds) based on a model's data. These thresholds can be used in alerting. For more information see: [Automating Data Drift Thresholding in Machine Learning Systems](https://arthur.ai/blog/automating-data-drift-thresholding-in-machine-learning-systems).
The query uses `"metric": "Thresholds"` and neither requires nor uses the `"target"` and `"rollup"` fields, but is otherwise identical to a standard data drift query.
Query Request:
```json
{
"properties": [
"<attribute1_name> [string]",
"<attribute2_name> [string]",
"<attribute3_name> [string]"
],
"num_bins": "<num_bins> [int]",
"metric": "Thresholds",
"base": {
"source": "reference",
"filter [Optional]": [
{
"property": "<filter_attribute_name> [string]",
"comparator": "<comparator> [string]",
"value": "<filter_threshold_value> [string|int|float]"
}
]
},
"group_by [Optional]": [
{
"property": "<group_by_attribute_name> [string]"
}
]
}
```
Query Response:
```json
{
"query_result": [
{
"<attribute1_name>": {
"HellingerDistance": "<threshold> [float]",
"JSDivergence": "<threshold> [float]",
"KLDivergence": "<threshold> [float]",
"PSI": "<threshold> [float]"
},
"<attribute2_name>": {
"HellingerDistance": "<threshold> [float]",
"JSDivergence": "<threshold> [float]",
"KLDivergence": "<threshold> [float]",
"PSI": "<threshold> [float]"
}
}
]
}
```
See {ref}`endpoint_overview_filter_comparators` for a list of valid comparators.
***
Sample Request:
```json
{
"properties": [
"AGE"
],
"num_bins": 20,
"base": {
"source": "reference"
},
"metric": "Thresholds"
}
```
Sample Response:
```json
{
"query_result": [
{
"AGE": {
"HellingerDistance": 0.00041737395239735647,
"JSDivergence": 2.959228131592643,
"KLDivergence": 0.001893866910388703,
"PSI": 0.0018945640055550161
}
}
]
}
```
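Since the thresholds are lower bounds on drift, a simple alerting check compares a drift value from a standard query against the threshold for the same property and metric. The sketch below uses the sample thresholds above together with made-up drift values; `drifted_properties` is a hypothetical helper, and treating "strictly greater than" as the trigger is an assumption.

```python
def drifted_properties(drift_row, thresholds_row, metric):
    """Return {property: (drift_value, threshold)} for values above their threshold."""
    flagged = {}
    for name, limits in thresholds_row.items():
        value = drift_row.get(name)
        if value is not None and metric in limits and value > limits[metric]:
            flagged[name] = (value, limits[metric])
    return flagged


thresholds = {
    "AGE": {
        "HellingerDistance": 0.00041737395239735647,
        "JSDivergence": 2.959228131592643,
        "KLDivergence": 0.001893866910388703,
        "PSI": 0.0018945640055550161,
    }
}
# Illustrative drift value, not taken from a real query.
flagged = drifted_properties({"AGE": 0.021}, thresholds, "PSI")
print(flagged)
```

Properties missing from either response are skipped rather than treated as drifted.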
[back to top](#data-drift)