# Data Drift | |
## Querying Drift in Python | |
A drift query made through the Python SDK is an ordinary query with the
`query_type` parameter set to `'drift'`:
```python | |
query = {...} | |
arthur_model.query(query, query_type='drift') | |
``` | |
## Data Drift Endpoint | |
Data drift has a dedicated endpoint at `/models/{model_id}/inferences/query/data_drift`. | |
It returns the data drift metric between a `base` dataset and a `target` dataset. The endpoint supports up to 100 properties in one request.
* `num_bins` - Sets the granularity of bucketing for continuous distributions; it is ignored if the attribute is categorical.
* `metric` - One of {ref}`the data drift metrics Arthur offers <glossary_data_drift>`.
* `filter` - Optional block, set separately on the `base` and `target` data, that specifies which data is used in the data drift calculation.
* `group_by` - Optional parameter that is global and applies to both the base and target data.
* `rollup` - Optional parameter that aggregates the calculated data drift value by a supported time dimension.
For `HypothesisTest`, the returned value is transformed as -log_10(P_value) to maintain directional parity with the other data drift metrics. That is, a lower P_value is more significant and implies data drift, which is reflected in a higher -log_10(P_value). Further mathematical details are in the {ref}`glossary <glossary_hypothesis_test>`.
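The transformation above can be sketched in a few lines of Python. The function name `hypothesis_test_drift` is hypothetical and only illustrates the arithmetic; it is not part of the Arthur SDK.

```python
import math


def hypothesis_test_drift(p_value: float) -> float:
    """Map a p-value onto the -log10 scale described above.

    Smaller p-values (stronger evidence of drift) give larger results,
    matching the direction of the other drift metrics.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    return -math.log10(p_value)


# A lower p-value produces a higher drift value.
for p in (0.5, 0.05, 0.001):
    print(f"p={p}: drift value {hypothesis_test_drift(p):.3f}")
```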
Query Request: | |
```json | |
{ | |
"properties": [ | |
"<attribute1_name> [string]", | |
"<attribute2_name> [string]", | |
"<attribute3_name> [string]" | |
], | |
"num_bins": "<num_bins> [int]", | |
"metric": "[PSI|KLDivergence|JSDivergence|HellingerDistance|HypothesisTest]", | |
"base": { | |
"source": "[inference|reference]", | |
"filter [Optional]": [ | |
{ | |
"property": "<filter_attribute_name> [string]", | |
"comparator": "<comparator> [string]", | |
"value": "<filter_threshold_value> [string|int|float]" | |
} | |
] | |
}, | |
"target": { | |
"source": "[inference|reference|ground_truth]", | |
"filter [Optional]": [ | |
{ | |
"property": "<filter_attribute_name> [string]", | |
"comparator": "<comparator> [string]", | |
"value": "<filter_threshold_value> [string|int|float]" | |
} | |
] | |
}, | |
"group_by [Optional]": [ | |
{ | |
"property": "<group_by_attribute_name> [string]" | |
} | |
], | |
"rollup [Optional]": "minute|hour|day|month|year|batch_id" | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<attribute1_name>": "<attribute1_data_drift> [float]", | |
"<attribute2_name>": "<attribute2_data_drift> [float]", | |
"<attribute3_name>": "<attribute3_data_drift> [float]", | |
"<group_by_attribute_name>": "<group_by_attribute_value> [string|int|null]", | |
"rollup": "<rollup_attribute_value> [string|null]" | |
} | |
] | |
} | |
``` | |
See {ref}`endpoint_overview_filter_comparators` for a list of valid comparators. | |
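As a rough illustration of the request format, the sketch below assembles a request body as a Python dictionary and checks it against the limits stated in this section (at most 100 properties, one of the five listed metrics, one of the listed rollup values). The helper `build_drift_request` and its constants are hypothetical names, not part of the Arthur SDK, and the endpoint may enforce rules beyond these.

```python
import json

# Values copied from the request format above.
ALLOWED_METRICS = {"PSI", "KLDivergence", "JSDivergence",
                   "HellingerDistance", "HypothesisTest"}
ALLOWED_ROLLUPS = {"minute", "hour", "day", "month", "year", "batch_id"}
MAX_PROPERTIES = 100


def build_drift_request(properties, metric, num_bins, base, target,
                        group_by=None, rollup=None):
    """Assemble a data drift request body (hypothetical helper)."""
    if not properties or len(properties) > MAX_PROPERTIES:
        raise ValueError(f"expected between 1 and {MAX_PROPERTIES} properties")
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    if rollup is not None and rollup not in ALLOWED_ROLLUPS:
        raise ValueError(f"unsupported rollup: {rollup!r}")

    body = {
        "properties": list(properties),
        "num_bins": num_bins,
        "metric": metric,
        "base": base,
        "target": target,
    }
    # Optional fields are left out entirely when not requested.
    if group_by:
        body["group_by"] = [{"property": name} for name in group_by]
    if rollup is not None:
        body["rollup"] = rollup
    return body


request_body = build_drift_request(
    properties=["age"],
    metric="PSI",
    num_bins=10,
    base={"source": "reference"},
    target={"source": "inference"},
    group_by=["country"],
    rollup="hour",
)
print(json.dumps(request_body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of a request to the data drift endpoint.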
### Example: Reference vs. Inference
Sample Request: Calculate data drift for males, grouped by country, rolled up by hour. | |
```json | |
{ | |
"properties": [ | |
"age" | |
], | |
"num_bins": 10, | |
"metric": "PSI", | |
"base": { | |
"source": "reference", | |
"filter": [ | |
{ | |
"property": "gender", | |
"comparator": "eq", | |
"value": "male" | |
} | |
] | |
}, | |
"target": { | |
"source": "inference", | |
"filter": [ | |
{ | |
"property": "gender", | |
"comparator": "eq", | |
"value": "male" | |
}, | |
{ | |
"property": "inference_timestamp", | |
"comparator": "gte", | |
"value": "2020-07-22T10:00:00Z" | |
}, | |
{ | |
"property": "inference_timestamp", | |
"comparator": "lt", | |
"value": "2020-07-23T10:00:00Z" | |
} | |
] | |
}, | |
"group_by": [ | |
{ | |
"property": "country" | |
} | |
], | |
"rollup": "hour" | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"age": 2.3, | |
"country": "Canada", | |
"rollup": "2020-07-22T10:00:00Z" | |
}, | |
{ | |
"age": 2.4, | |
"country": "United States", | |
"rollup": "2020-07-22T10:00:00Z" | |
} | |
] | |
} | |
``` | |
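Because each row of `query_result` carries the drift value together with its group and rollup values, a grouped and rolled-up response is often easier to work with as one time series per group. The sketch below shows one possible way to reshape the sample response above; `drift_by_group` is a hypothetical helper, not an SDK function.

```python
from collections import defaultdict


def drift_by_group(response, prop, group_key):
    """Return {group_value: [(rollup, drift), ...]} ordered by rollup."""
    series = defaultdict(list)
    for row in response["query_result"]:
        series[row.get(group_key)].append((row.get("rollup"), row[prop]))
    for points in series.values():
        # ISO-8601 UTC timestamps in one format sort chronologically as strings.
        points.sort(key=lambda point: point[0] or "")
    return dict(series)


sample_response = {
    "query_result": [
        {"age": 2.3, "country": "Canada", "rollup": "2020-07-22T10:00:00Z"},
        {"age": 2.4, "country": "United States", "rollup": "2020-07-22T10:00:00Z"},
    ]
}

for country, points in drift_by_group(sample_response, "age", "country").items():
    print(country, points)
```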
### Example: Inference vs. Inference | |
Sample Request: Compare data drift between two batches, with no grouping and no rollup; the only filters are the ones that select each batch by `batch_id`.
```json | |
{ | |
"properties": [ | |
"age" | |
], | |
"num_bins": 10, | |
"metric": "PSI", | |
"base": { | |
"source": "inference", | |
"filter": [ | |
{ | |
"property": "batch_id", | |
"comparator": "eq", | |
"value": "5" | |
} | |
] | |
}, | |
"target": { | |
"source": "inference", | |
"filter": [ | |
{ | |
"property": "batch_id", | |
"comparator": "eq", | |
"value": "6" | |
} | |
] | |
} | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"age": 2.3 | |
} | |
] | |
} | |
``` | |
[back to top](#data-drift) | |
### Example: Reference vs. Ground Truth | |
Sample Request: Calculate data drift for individual ground truth class prediction probabilities, rolled up by hour. | |
```json | |
{ | |
"properties": [ | |
"gt_1" | |
], | |
"num_bins": 10, | |
"metric": "PSI", | |
"base": { | |
"source": "reference" | |
}, | |
"target": { | |
"source": "ground_truth", | |
"filter": [ | |
{ | |
"property": "ground_truth_timestamp", | |
"comparator": "gte", | |
"value": "2020-07-22T10:00:00Z" | |
}, | |
{ | |
"property": "ground_truth_timestamp", | |
"comparator": "lt", | |
"value": "2020-07-23T10:00:00Z" | |
} | |
] | |
}, | |
"rollup": "hour" | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"gt_1": 0.03, | |
"rollup": "2020-07-22T10:00:00Z" | |
}, | |
{ | |
"gt_1": 0.4, | |
"rollup": "2020-07-22T11:00:00Z" | |
} | |
] | |
} | |
``` | |
[back to top](#data-drift) | |
## Data Drift PSI Bucket Table Values | |
This metric has a dedicated endpoint at `/models/{model_id}/inferences/query/data_drift_psi_bucket_calculation_table`. | |
Returns the [PSI](https://scholarworks.wmich.edu/cgi/viewcontent.cgi?article=4249&context=dissertations) scores by bucket using the reference set data. The query for this endpoint omits `metric` and takes a single `property`, but is otherwise identical to the query for the [data drift endpoint](#data-drift-endpoint).
Note that when this endpoint is used with categorical features, the `bucket_min` and `bucket_max` fields are not returned in the response. Instead, the `bucket` field contains the category name.
Query Request: | |
```json | |
{ | |
"property": "<attribute_name> [string]", | |
"num_bins": "<num_bins> [int]", | |
"base": { | |
"source": "[inference|reference]", | |
"filter [Optional]": [ | |
{ | |
"property": "<filter_attribute_name> [string]", | |
"comparator": "<comparator> [string]", | |
"value": "<filter_threshold_value> [string|int|float]" | |
} | |
] | |
}, | |
"target": { | |
"source": "[inference|reference]", | |
"filter [Optional]": [ | |
{ | |
"property": "<filter_attribute_name> [string]", | |
"comparator": "<comparator> [string]", | |
"value": "<filter_threshold_value> [string|int|float]" | |
} | |
] | |
}, | |
"group_by [Optional]": [ | |
{ | |
"property": "<group_by_attribute_name> [string]" | |
} | |
], | |
"rollup [Optional]": "minute|hour|day|month|year|batch_id" | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"bucket": "string", | |
"rollup": "string|null", | |
"group_by_property_1": "string|null", | |
"base_bucket_max": "number", | |
"base_bucket_min": "number", | |
"base_count_per_bucket": "number", | |
"base_ln_probability_per_bucket": "number", | |
"base_probability_per_bucket": "number", | |
"base_total": "number", | |
"target_bucket_max": "number", | |
"target_bucket_min": "number", | |
"target_count_per_bucket": "number", | |
"target_ln_probability_per_bucket": "number", | |
"target_probability_per_bucket": "number", | |
"target_total": "number", | |
"probability_difference": "number", | |
"ln_probability_difference": "number", | |
"psi": "number" | |
} | |
] | |
} | |
``` | |
See {ref}`endpoint_overview_filter_comparators` for a list of valid comparators. | |
*** | |
Sample Request: Calculate data drift bucket components for males, grouped by country, rolled up by hour. | |
```json | |
{ | |
"property": "age", | |
"num_bins": 2, | |
"base": { | |
"source": "reference", | |
"filter": [ | |
{ | |
"property": "gender", | |
"comparator": "eq", | |
"value": "male" | |
} | |
] | |
}, | |
"target": { | |
"source": "inference", | |
"filter": [ | |
{ | |
"property": "gender", | |
"comparator": "eq", | |
"value": "male" | |
}, | |
{ | |
"property": "inference_timestamp", | |
"comparator": "gte", | |
"value": "2020-07-22T10:00:00Z" | |
}, | |
{ | |
"property": "inference_timestamp", | |
"comparator": "lt", | |
"value": "2020-07-23T10:00:00Z" | |
} | |
] | |
}, | |
"group_by": [ | |
{ | |
"property": "country" | |
} | |
], | |
"rollup": "hour" | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"bucket": "bucket_1", | |
"rollup": "2020-01-01T00:00:00Z", | |
"country": "Canada", | |
"base_bucket_max": 0.9999971182990177, | |
"base_bucket_min": 0.5009102069226075, | |
"base_count_per_bucket": 4988, | |
"base_ln_probability_per_bucket": -0.6955500651756032, | |
"base_probability_per_bucket": 0.4988, | |
"base_total": 10000, | |
"target_bucket_max": 0.9999971182990177, | |
"target_bucket_min": 0.5009102069226075, | |
"target_count_per_bucket": 2487, | |
"target_ln_probability_per_bucket": -0.6701670131762315, | |
"target_probability_per_bucket": 0.5116231228142357, | |
"target_total": 4861, | |
"probability_difference": -0.012823122814235699, | |
"ln_probability_difference": -0.025383051999371742, | |
"psi": 0.00032548999318807485 | |
}, | |
{ | |
"bucket": "bucket_2", | |
"rollup": "2020-01-01T00:00:00Z", | |
"country": "United States", | |
"base_bucket_max": 0.9999971182990177, | |
"base_bucket_min": 0.5009102069226075, | |
"base_count_per_bucket": 4988, | |
"base_ln_probability_per_bucket": -0.6955500651756032, | |
"base_probability_per_bucket": 0.4988, | |
"base_total": 10000, | |
"target_bucket_max": 0.9999971182990177, | |
"target_bucket_min": 0.5009102069226075, | |
"target_count_per_bucket": 2487, | |
"target_ln_probability_per_bucket": -0.6701670131762315, | |
"target_probability_per_bucket": 0.5116231228142357, | |
"target_total": 4861, | |
"probability_difference": -0.012823122814235699, | |
"ln_probability_difference": -0.025383051999371742, | |
"psi": 0.00032548999318807485 | |
}, | |
{ | |
"bucket": "bucket_1", | |
"rollup": "2020-01-01T01:00:00Z", | |
"country": "Canada", | |
"base_bucket_max": 0.9999971182990177, | |
"base_bucket_min": 0.5009102069226075, | |
"base_count_per_bucket": 4988, | |
"base_ln_probability_per_bucket": -0.6955500651756032, | |
"base_probability_per_bucket": 0.4988, | |
"base_total": 10000, | |
"target_bucket_max": 0.9999971182990177, | |
"target_bucket_min": 0.5009102069226075, | |
"target_count_per_bucket": 2487, | |
"target_ln_probability_per_bucket": -0.6701670131762315, | |
"target_probability_per_bucket": 0.5116231228142357, | |
"target_total": 4861, | |
"probability_difference": -0.012823122814235699, | |
"ln_probability_difference": -0.025383051999371742, | |
"psi": 0.00032548999318807485 | |
}, | |
{ | |
"bucket": "bucket_2", | |
"rollup": "2020-01-01T01:00:00Z", | |
"country": "United States", | |
"base_bucket_max": 0.9999971182990177, | |
"base_bucket_min": 0.5009102069226075, | |
"base_count_per_bucket": 4988, | |
"base_ln_probability_per_bucket": -0.6955500651756032, | |
"base_probability_per_bucket": 0.4988, | |
"base_total": 10000, | |
"target_bucket_max": 0.9999971182990177, | |
"target_bucket_min": 0.5009102069226075, | |
"target_count_per_bucket": 2487, | |
"target_ln_probability_per_bucket": -0.6701670131762315, | |
"target_probability_per_bucket": 0.5116231228142357, | |
"target_total": 4861, | |
"probability_difference": -0.012823122814235699, | |
"ln_probability_difference": -0.025383051999371742, | |
"psi": 0.00032548999318807485 | |
} | |
] | |
} | |
``` | |
Sample Request: Compare data drift bucket components between two batches, with no grouping and no rollup; the only filters are the ones that select each batch by `batch_id`.
```json | |
{ | |
"property": "age", | |
"num_bins": 10, | |
"base": { | |
"source": "inference", | |
"filter": [ | |
{ | |
"property": "batch_id", | |
"comparator": "eq", | |
"value": "5" | |
} | |
] | |
}, | |
"target": { | |
"source": "inference", | |
"filter": [ | |
{ | |
"property": "batch_id", | |
"comparator": "eq", | |
"value": "6" | |
} | |
] | |
} | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"bucket": "bucket_1", | |
"base_bucket_max": 0.9999971182990177, | |
"base_bucket_min": 0.5009102069226075, | |
"base_count_per_bucket": 4988, | |
"base_ln_probability_per_bucket": -0.6955500651756032, | |
"base_probability_per_bucket": 0.4988, | |
"base_total": 10000, | |
"target_bucket_max": 0.9999971182990177, | |
"target_bucket_min": 0.5009102069226075, | |
"target_count_per_bucket": 2487, | |
"target_ln_probability_per_bucket": -0.6701670131762315, | |
"target_probability_per_bucket": 0.5116231228142357, | |
"target_total": 4861, | |
"probability_difference": -0.012823122814235699, | |
"ln_probability_difference": -0.025383051999371742, | |
"psi": 0.00032548999318807485 | |
}, | |
{ | |
"bucket": "bucket_2", | |
"base_bucket_max": 0.9999971182990177, | |
"base_bucket_min": 0.5009102069226075, | |
"base_count_per_bucket": 4988, | |
"base_ln_probability_per_bucket": -0.6955500651756032, | |
"base_probability_per_bucket": 0.4988, | |
"base_total": 10000, | |
"target_bucket_max": 0.9999971182990177, | |
"target_bucket_min": 0.5009102069226075, | |
"target_count_per_bucket": 2487, | |
"target_ln_probability_per_bucket": -0.6701670131762315, | |
"target_probability_per_bucket": 0.5116231228142357, | |
"target_total": 4861, | |
"probability_difference": -0.012823122814235699, | |
"ln_probability_difference": -0.025383051999371742, | |
"psi": 0.00032548999318807485 | |
} | |
] | |
} | |
``` | |
[back to top](#data-drift) | |
## Data Drift for Classification Outputs | |
For classification outputs, one may want to examine drift across a collection of classes, i.e. the system of outputs, rather than the drift of the probability predictions of a single class. The query uses either `"predicted_classes": ["*"]` or `"ground_truth_classes": ["*"]` and is otherwise identical to a standard data drift query. The star operator selects all prediction or ground truth classes, respectively, in a model; to look at drift for a subset of multiclass outputs, provide a list of class names instead.
* `predicted_classes` - Specifies which prediction classes to use for `predictedClass` data drift. | |
* `ground_truth_classes` - Specifies which ground truth classes to use for `groundTruthClass` data drift.
`properties` can be included in the same query as long as the target `source` corresponds to the classification output tag. For example, one can query drift on input attributes and `predictedClass` in the same query with a target `source` of `inference`, or query drift on individual ground truth labels and `groundTruthClass` in the same query with a target `source` of `ground_truth`.
Query Request: | |
```json | |
{ | |
"properties [Optional]": [ | |
"<attribute1_name> [string]", | |
"<attribute2_name> [string]", | |
"<attribute3_name> [string]" | |
],
"[predicted_classes|ground_truth_classes]": [
"<class0_name> [string]",
"<class1_name> [string]"
], | |
"num_bins": "<num_bins> [int]", | |
"metric": "[PSI|KLDivergence|JSDivergence|HellingerDistance|HypothesisTest]", | |
"base": { | |
"source": "[inference|reference]", | |
"filter [Optional]": [ | |
{ | |
"property": "<filter_attribute_name> [string]", | |
"comparator": "<comparator> [string]", | |
"value": "<filter_threshold_value> [string|int|float]" | |
} | |
] | |
}, | |
"target": { | |
"source": "[inference|reference|ground_truth]", | |
"filter [Optional]": [ | |
{ | |
"property": "<filter_attribute_name> [string]", | |
"comparator": "<comparator> [string]", | |
"value": "<filter_threshold_value> [string|int|float]" | |
} | |
] | |
}, | |
"group_by [Optional]": [ | |
{ | |
"property": "<group_by_attribute_name> [string]" | |
} | |
], | |
"rollup [Optional]": "minute|hour|day|month|year|batch_id" | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<attribute1_name>": "<attribute1_data_drift> [float]", | |
"<attribute2_name>": "<attribute2_data_drift> [float]", | |
"<attribute3_name>": "<attribute3_data_drift> [float]", | |
"[predictedClass|groundTruthClass]": "<classification_data_drift> [float]", | |
"<group_by_attribute_name>": "<group_by_attribute_value> [string|int|null]", | |
"rollup": "<rollup_attribute_value> [string|null]" | |
} | |
] | |
} | |
``` | |
See {ref}`endpoint_overview_filter_comparators` for a list of valid comparators. | |
*** | |
Sample Request: Calculate data drift on all prediction classes. | |
```json | |
{ | |
"predicted_classes": [ | |
"*" | |
], | |
"num_bins": 20, | |
"base": { | |
"source": "reference" | |
}, | |
"target": { | |
"source": "inference" | |
}, | |
"metric": "PSI" | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"predictedClass": 0.021 | |
} | |
] | |
} | |
``` | |
Sample Request: Calculate data drift on ground truth using the first and third ground truth classes. | |
```json | |
{
"ground_truth_classes": [
"gt_1", | |
"gt_3" | |
], | |
"num_bins": 20, | |
"base": { | |
"source": "reference" | |
}, | |
"target": { | |
"source": "ground_truth" | |
}, | |
"metric": "PSI" | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"groundTruthClass": 0.021 | |
} | |
] | |
} | |
``` | |
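The pairing described in this section (`predicted_classes` with a target `source` of `inference`, `ground_truth_classes` with a target `source` of `ground_truth`) can be checked on the client before a request is sent. The sketch below is one possible check. `check_class_selector` is a hypothetical helper, and it only rejects the two combinations that contradict the pairing above; how the API treats other combinations, such as a `reference` target, is not assumed.

```python
# Target sources that contradict each class selector (assumption drawn
# from the pairing described in this section).
MISMATCHED_TARGET = {
    "predicted_classes": "ground_truth",
    "ground_truth_classes": "inference",
}


def check_class_selector(request):
    """Return the class selector key used by a request, or raise ValueError."""
    used = [key for key in MISMATCHED_TARGET if key in request]
    if len(used) != 1:
        raise ValueError(
            "use exactly one of predicted_classes or ground_truth_classes")
    selector = used[0]
    target_source = request["target"]["source"]
    if target_source == MISMATCHED_TARGET[selector]:
        raise ValueError(
            f"{selector} does not match target source {target_source!r}")
    return selector


request = {
    "ground_truth_classes": ["gt_1", "gt_3"],
    "num_bins": 20,
    "base": {"source": "reference"},
    "target": {"source": "ground_truth"},
    "metric": "PSI",
}
print(check_class_selector(request))
```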
[back to top](#data-drift) | |
(automated_data_drift_thresholds)= | |
## Automated Data Drift Thresholds | |
What is a sufficiently high data drift value to suggest that the target data has actually drifted from the base data? For `HypothesisTest`, we can work backward from -log_10(P_value): plugging in the conventional 0.05 alpha level establishes a lower bound of -log_10(0.05), which is approximately 1.3.
For the other data drift metrics, a single fixed constant is not sufficient. We abstract this away for the user and allow queries to obtain automatically generated data drift thresholds (lower bounds) based on a model's data. These thresholds can be used in alerting. For more information see: [Automating Data Drift Thresholding in Machine Learning Systems](https://arthur.ai/blog/automating-data-drift-thresholding-in-machine-learning-systems).
The query uses `"metric": "Thresholds"` and neither requires nor uses the `"target"` and `"rollup"` fields, but is otherwise identical to a standard data drift query.
Query Request: | |
```json | |
{ | |
"properties": [ | |
"<attribute1_name> [string]", | |
"<attribute2_name> [string]", | |
"<attribute3_name> [string]" | |
], | |
"num_bins": "<num_bins> [int]", | |
"metric": "Thresholds", | |
"base": { | |
"source": "reference", | |
"filter [Optional]": [ | |
{ | |
"property": "<filter_attribute_name> [string]", | |
"comparator": "<comparator> [string]", | |
"value": "<filter_threshold_value> [string|int|float]" | |
} | |
] | |
}, | |
"group_by [Optional]": [ | |
{ | |
"property": "<group_by_attribute_name> [string]" | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<attribute1_name>": { | |
"HellingerDistance": "<threshold> [float]", | |
"JSDivergence": "<threshold> [float]", | |
"KLDivergence": "<threshold> [float]", | |
"PSI": "<threshold> [float]" | |
}, | |
"<attribute2_name>": { | |
"HellingerDistance": "<threshold> [float]", | |
"JSDivergence": "<threshold> [float]", | |
"KLDivergence": "<threshold> [float]", | |
"PSI": "<threshold> [float]" | |
} | |
} | |
] | |
} | |
``` | |
See {ref}`endpoint_overview_filter_comparators` for a list of valid comparators. | |
*** | |
Sample Request: | |
```json | |
{ | |
"properties": [ | |
"AGE" | |
], | |
"num_bins": 20, | |
"base": { | |
"source": "reference" | |
}, | |
"metric": "Thresholds" | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"AGE": { | |
"HellingerDistance": 0.00041737395239735647, | |
"JSDivergence": 2.959228131592643, | |
"KLDivergence": 0.001893866910388703, | |
"PSI": 0.0018945640055550161 | |
} | |
} | |
] | |
} | |
``` | |
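Since the thresholds are lower bounds for flagging drift, a returned drift value can be compared against the threshold for the same attribute and metric. The sketch below shows one possible comparison using the sample thresholds above and the -log_10(0.05) bound for `HypothesisTest`. The function name and the choice of a strict `>` comparison are assumptions made for illustration, not the behavior of Arthur's alerting.

```python
import math

# Lower bound for HypothesisTest at the conventional 0.05 alpha level.
HYPOTHESIS_TEST_THRESHOLD = -math.log10(0.05)


def exceeds_threshold(drift_value, metric, attribute_thresholds=None):
    """Return True if drift_value is above the lower bound for the metric.

    attribute_thresholds is the per-attribute object from a Thresholds
    response, e.g. {"PSI": ..., "KLDivergence": ...}.
    """
    if metric == "HypothesisTest":
        return drift_value > HYPOTHESIS_TEST_THRESHOLD
    if attribute_thresholds is None or metric not in attribute_thresholds:
        raise ValueError(f"no threshold available for metric {metric!r}")
    return drift_value > attribute_thresholds[metric]


# Thresholds for "AGE" copied from the sample response above.
age_thresholds = {
    "HellingerDistance": 0.00041737395239735647,
    "JSDivergence": 2.959228131592643,
    "KLDivergence": 0.001893866910388703,
    "PSI": 0.0018945640055550161,
}

print(exceeds_threshold(0.021, "PSI", age_thresholds))
print(exceeds_threshold(1.0, "HypothesisTest"))
```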
[back to top](#data-drift) | |