# Basic Concepts

## Arthur Overview

The Arthur platform monitors, measures, and improves machine learning models to deliver better business outcomes. Arthur helps data scientists, product owners, and business leaders accelerate model operations and optimize for accuracy, explainability, and fairness.

To use Arthur, you:

1. Register your model with the Arthur system
2. Set reference data for baseline analytics
3. Send inference data over time

With this data, Arthur quantifies and centralizes your models' performance for efficient querying and automated analytics.

## Models and Onboarding

### Registering a Model

When you register a model with Arthur, you define the way the model processes data. Arthur is model-agnostic and platform-agnostic: no matter which tools you used to build or deploy your model, you can use Arthur to log all the data it receives and produces. Registration tells Arthur how this will happen.

(basic_concepts_input_output_types)=
#### Input and Output Types

Input and output types define the data that enters and exits your model. The `InputType` of a model specifies whether data enters your model as a tabular dataframe, as an image, or as raw text. The `OutputType` of a model specifies the modeling task at hand: whether your model predicts values for a regression task, probabilities for a classification task, or bounding boxes for a computer-vision object detection task.

(basic_concepts_streaming_vs_batch)=
#### Streaming vs. Batch

When registering a model, you specify whether your model ingests data as a stream or in batches. A streaming model receives instances of data as they come into the deployed model. A batch model, in contrast, receives data in groups, and is often preferred if your model runs as a job rather than operating in real time or over a data stream.
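Conceptually, a batch model's inferences arrive grouped, and performance is then computed per group. The sketch below illustrates the idea with hypothetical data and batch labels; it is not the Arthur SDK.

```python
# Conceptual sketch of per-batch performance measurement.
# The data values and batch labels are hypothetical; this is not Arthur's API.
from collections import defaultdict

inferences = [
    {"batch_id": "2024-01-nightly", "prediction": 1, "ground_truth": 1},
    {"batch_id": "2024-01-nightly", "prediction": 0, "ground_truth": 1},
    {"batch_id": "2024-02-nightly", "prediction": 1, "ground_truth": 1},
]

def accuracy_by_batch(rows):
    """Group inferences by batch and compute accuracy per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["batch_id"]].append(row["prediction"] == row["ground_truth"])
    return {batch: sum(hits) / len(hits) for batch, hits in groups.items()}

accuracy_by_batch(inferences)
# {"2024-01-nightly": 0.5, "2024-02-nightly": 1.0}
```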
Indicating a batch model simply means that you'll supply an additional `batch_id` to group your inferences; Arthur then defaults to measuring performance for each batch rather than by inference timestamp.

(basic_concepts_attributes_and_stages)=
### Attributes and Stages

Attributes are analogous to the columns that make up your model's data. Each attribute has a value type: these can be standard types like `int` and `str`, or datatypes for complex models like raw text and images.

When you onboard a model, Arthur categorizes each attribute into a `Stage`, depending on the attribute's role in the model pipeline:

1. `ModelPipelineInput`: the features your model receives as input
2. `PredictedValue`: the output values your model produces
3. `GroundTruth`: the true values for your model's prediction task, against which the model's outputs are compared to compute performance metrics
4. `NonInputData`: additional metadata you can log with Arthur that your model doesn't take as an input feature, e.g. protected attributes like age, race, or sex, or specific business data like a unique customer ID

### Model Schema

The model schema is a record of important properties of your model's attributes, including their value type and `Stage`. As you log data over time, Arthur uses the model schema to type-check ingested data. This prevents analytics from being skewed by silent bugs, such as `int` values suddenly replacing `float` values.

Arthur also records attribute properties in the model schema, like the range of values an attribute takes in your data. These properties are used to get a sense of your data's high-level structure, _not_ to enforce that future data has strictly the same properties.
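The type-checking role of a model schema can be sketched in a few lines. This is a conceptual illustration with hypothetical attribute names, not Arthur's internal implementation:

```python
# Conceptual sketch of schema type-checking (not Arthur's internal code).
# The attribute names and expected types here are hypothetical examples.
SCHEMA = {
    "age": int,           # ModelPipelineInput
    "income": float,      # ModelPipelineInput
    "prediction": float,  # PredictedValue
}

def validate_inference(inference: dict) -> list[str]:
    """Return a list of type errors for one logged inference."""
    errors = []
    for name, expected in SCHEMA.items():
        value = inference.get(name)
        if value is None:
            errors.append(f"missing attribute: {name}")
        elif not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return errors

# An `int` slipping in where a `float` is expected gets flagged
# instead of silently skewing analytics:
validate_inference({"age": 35, "income": 52000, "prediction": 0.87})
# ["income: expected float, got int"]
```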
(basic_concepts_reference_dataset)=
### Reference Dataset

```{image} /_static/images/Model-Reference-Dataset-Light-Mode.png
:align: center
:class: only-light
```

```{image} /_static/images/Model-Reference-Dataset-Dark-Mode.png
:align: center
:class: only-dark
```

The reference dataset is a representative sample of the input features your model ingests. It can be the model's training data, or any other dataset that captures the distribution your model's inputs are drawn from.

This dataset is used to compute baseline model analytics. By capturing the distribution of data you expect your model to receive, Arthur can detect, surface, and diagnose data drift before it impacts results. Note that Arthur can compute data drift metrics against any two distributions you choose (e.g. inferences now compared to the same time last year), but the platform uses the reference dataset as the default.

The only stage required in the reference dataset is `ModelPipelineInput`, but we also recommend including data from the `PredictedValue`, `GroundTruth`, and `NonInputData` stages so that Arthur can measure drift in those attributes over time as well.

## Sending Data to Arthur

### Inferences

The data your model produces over time is logged in the Arthur platform as **inferences**. These inferences contain attributes from the `ModelPipelineInput` and `PredictedValue` stages (model inputs and outputs), from which Arthur computes performance metrics. In addition, when you log these inferences, you have the option to include `GroundTruth` and `NonInputData` attributes.

### Sending Ground Truth Separately

```{image} /_static/images/Ground-Truth-Light-Mode.png
:align: center
:class: only-light
```

```{image} /_static/images/Ground-Truth-Dark-Mode.png
:align: center
:class: only-dark
```

`GroundTruth` attributes are often not available when models produce inferences.
Therefore, Arthur allows you to send this attribute data to the platform _after_ sending the original inferences, using an ID to pair each ground truth value with the right inference.

## Metrics

Metrics are the measurements Arthur computes to quantify model performance. Default metrics are the basic performance metrics Arthur generates automatically, e.g. accuracy, mean squared error, or AUC. Additional metrics can be written using the API and added to a model to measure performance specific to a custom business use case.

You can use the Arthur API to efficiently query model performance metrics at scale. Model metrics can be accessed in the online Arthur UI, using the Arthur API, and using the Arthur Python SDK. See the {doc}`/user-guide/api-query-guide/index` for more resources on model metrics.

## Alerts

An alert is a message notifying you that something has occurred with your model. With alerts, Arthur provides a continuous view into your model by highlighting important changes in its performance.

An alert is triggered based on an **_alert rule_**, which you define using a metric and a threshold: when the metric crosses your threshold, the alert is activated. The alert can then be delivered to you via email, highlighted in the online Arthur UI, and/or routed through integrations such as PagerDuty and Slack. For an in-depth guide to setting alerts, see the {doc}`/user-guide/walkthroughs/metrics_alerts` guide.

## Enrichments

Enrichments are additional services the Arthur platform provides for state-of-the-art proactive model monitoring:

- **Explainability**: methods for computing the importance of individual features from your data on your model's outcomes.
- **Anomaly Detection**: drift metrics quantifying how far incoming inferences have drifted from the distribution of your model's reference dataset.
- **Hotspots**: automated identification of segments of your data where your model is underperforming.
- **Bias Mitigation**: methods for model post-processing that improve the fairness of outcomes without redeploying your model.

Once activated, these enrichments are computed automatically on Arthur's backend, with results viewable in the online UI dashboard and queryable from Arthur's API. The {doc}`/user-guide/walkthroughs/enrichments` guide shows how to set up enrichments and describes all of Arthur's currently offered enrichments.

## Insights

Insights are proactive notifications about your model's performance. For example, once you've enabled the Hotspots enrichment, you'll receive insights about regions of your data space where model accuracy has significantly degraded.

## Model Groups and Versioning

Arthur helps you track the improvements of model updates with Model Versioning. If your data preprocessing pipeline has changed, if you have retrained your model, or if you've reset your model's reference data, your updated model is likely a new version of a previous model addressing the same task. In this case, Arthur recommends keeping these models within the same Model Group to track performance as you continue improving your model.

Each model you onboard to Arthur is placed in a Model Group. As you change the model over time, you can add new versions of the model to the same group. The Arthur UI dashboard then streamlines tracking performance improvements within a Model Group over time.

## Next Steps

### {doc}`Onboard Your Model </user-guide/walkthroughs/model-onboarding/index>`

The {doc}`Model Onboarding walkthrough </user-guide/walkthroughs/model-onboarding/index>` covers the steps of onboarding a model, formatting attribute data, and sending inferences to Arthur.
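As a recap of the alert-rule concept described above (a metric plus a threshold), the mechanism can be sketched in a few lines. This is a conceptual illustration only; the class and field names are hypothetical, not the Arthur API:

```python
# Conceptual sketch of an alert rule: a metric, a threshold, and a bound.
# Names here are illustrative; this is not the Arthur API.
from dataclasses import dataclass

@dataclass
class AlertRule:
    metric_name: str
    threshold: float
    bound: str = "upper"  # "upper": fire when the metric rises above the
                          # threshold; "lower": fire when it falls below

    def is_triggered(self, metric_value: float) -> bool:
        if self.bound == "upper":
            return metric_value > self.threshold
        return metric_value < self.threshold

# e.g. alert when accuracy drops below 0.9
rule = AlertRule(metric_name="accuracy", threshold=0.9, bound="lower")
rule.is_triggered(0.87)
# True: accuracy fell below the threshold, so the alert fires
```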