# Model Onboarding

### Overview

This guide walks through the steps of onboarding a model deployed in production to Arthur. Once your deployed model is onboarded, you can use Arthur to retrieve insights about model performance efficiently and at scale.

```{note}
This walkthrough uses tabular data. To onboard models of other input types, see {doc}`cv_onboarding` and {doc}`nlp_onboarding`.
```

### Requirements

You will need access to the data your model ingests and the predictions it produces. The model object itself is _not_ required, but it can be uploaded to enable the explainability enrichment. See our {doc}`/more-info/FAQs` for more info.

***

### Outline

This guide covers the three main steps to onboarding a model to the Arthur platform:

- [Model Registration](#model-registration) is the process of registering the model schema with Arthur and sending reference data
- [Onboarding Existing Inferences](#onboarding-existing-inferences) sends your model's historical predictions to the Arthur platform
- [Production Integration](#production-integration) connects your model's ongoing predictions in deployment to be logged with Arthur

***

## Model Registration

### Connect to Arthur

The first step is to import functions from the `arthurai` package and establish a connection with an Arthur username and password.

```python
# Arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage, ValueType, Enrichment

arthur = ArthurAI(url="https://app.arthur.ai",
                  login="<YOUR_USERNAME_OR_EMAIL>")
```

### Register Model Type

To register a model, we start by creating a model object and defining its {ref}`high-level metadata <basic_concepts_input_output_types>`:

```python
arthur_model = arthur.model(
    partner_model_id="OnboardingModel_123",
    display_name="OnboardingModel",
    input_type=InputType.Tabular,
    output_type=OutputType.Multiclass,
    is_batch=False)
```

In particular, we set `is_batch=False` to define this as a {ref}`streaming model <basic_concepts_streaming_vs_batch>`, which means the Arthur platform will receive the model's inferences as they are produced live in deployment.

### Register Attributes with [ArthurModel.build()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.build)

Next we'll add more detail to the model metadata, defining the model's {ref}`attributes <basic_concepts_attributes_and_stages>`. The simplest method of registering your attributes is to use [ArthurModel.build()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.build), which parses a Pandas DataFrame of your {ref}`reference dataset <basic_concepts_reference_dataset>` containing inputs, metadata, predictions, and ground truth labels. In addition, a `pred_to_ground_truth_map` is required, which tells Arthur which of your attributes represent your model's predicted values, and how those predicted attributes correspond to your model's ground truth attributes.

Here we build a model with a `pred_to_ground_truth_map` configured for a binary classification model.

```python
# Map the PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that in the data you send to the platform,
# the `predicted_probability` column represents
# the probability that the ground-truth column has the value 1
pred_to_ground_truth_map = {
    'predicted_probability': 1
}

arthur_model.build(
    reference_df,
    ground_truth_column='ground_truth_label',
    pred_to_ground_truth_map=pred_to_ground_truth_map)
```
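For concreteness, here is a minimal sketch of what the `reference_df` passed to `build()` above could look like for this walkthrough. The column names and value ranges mirror the example model schema shown in the [Review Model](#review-model) section below; your own reference data will of course use your model's columns.

```python
import numpy as np
import pandas as pd

# toy reference dataset: one continuous input (X0), a binary ground truth
# label, and the model's predicted probability for class 1
rng = np.random.default_rng(seed=0)
reference_df = pd.DataFrame({
    "X0": rng.uniform(16.0, 58.0, size=1000),
    "ground_truth_label": rng.integers(0, 2, size=1000),
    "predicted_probability": rng.uniform(0.0, 1.0, size=1000),
})
```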
#### Non Input Attributes

Some features of your data may be important to track for monitoring model performance even though they are not model inputs or outputs. These features can be added as non-input attributes in the ArthurModel:

```python
# Specify additional non-input attributes when building a model.
# This tells Arthur to monitor ['age','sex','race','education']
# in the reference and inference data you send to the platform
arthur_model.build(
    reference_df,
    ground_truth_column='ground_truth_label',
    pred_to_ground_truth_map=pred_to_ground_truth_map,
    non_input_columns=['age','sex','race','education']
)
```

### Register Attributes Manually

As an alternative to passing a DataFrame to [ArthurModel.build()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.build), attributes can also be registered for your model manually. Registering attributes manually may be preferable if you don't use the Pandas library, or if there are attribute properties not configurable from parsing your reference data alone. [ArthurModel.add_attribute()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.add_attribute) is the generic method to add any type of attribute to a model; its [docstring](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.add_attribute) also links to additional attribute registration methods tailored to specific model and data types for convenience.
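As a rough sketch, manually registering the continuous input attribute from the walkthrough above might look like the following. The parameter names here are an assumption based on the method's documentation; consult the [`add_attribute()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.add_attribute) docstring for the authoritative signature.

```python
from arthurai.common.constants import Stage, ValueType

# manually register a single continuous float input attribute
# (see the add_attribute() docstring for all available parameters)
arthur_model.add_attribute(
    name="X0",
    stage=Stage.ModelPipelineInput,
    value_type=ValueType.Float,
    categorical=False,
)
```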
#### Binary Classifier with Two Ground Truth Classes

If the data you send to the platform for a binary classifier has columns for the predicted probability and ground-truth status of class 0, as well as columns for the predicted probability and ground-truth status of class 1, then map each predicted value column to its corresponding ground truth column:

```python
# Map PredictedValue attributes to their corresponding GroundTruth attribute names
pred_to_ground_truth_map = {'pred_0': 'gt_0',
                            'pred_1': 'gt_1'}

# add the ground truth and predicted attributes to the model,
# specifying that the `pred_1` attribute is the
# positive predicted attribute, which means it corresponds to the
# probability that the binary target attribute is 1
arthur_model.add_binary_classifier_output_attributes(
    positive_predicted_attr='pred_1',
    pred_to_ground_truth_map=pred_to_ground_truth_map)
```

#### More Than Two Ground Truth Classes

If you are using a multiclass model, you will have more than two ground truth classes. To make this work with the Arthur platform, you will need to:

1. Ensure that you are using `predict_proba` (or a similar function) to predict the probability of a specific ground truth class
2. Ensure that each class probability is included in its own column in your dataset
3. Ensure that your ground truth mapping contains all possible classes that might be predicted

For example, if your model identifies whether an image contains a dog, a cat, or a horse, your ground truth mapping must contain an item for each of these classes (even if the model output doesn't predict a value for some of these categories).

If the data you send to the platform has ground truth one-hot encoded, then map predictions to each column name:

```python
# Map PredictedValue attributes to their corresponding GroundTruth attribute names.
# This pred_to_ground_truth_map maps predicted values to one-hot encoded ground truth columns.
# For example, this tells Arthur that the `probability_dog` column represents
# the probability that the `dog_ground_truth` column has the value 1.
pred_to_ground_truth_map = {
    "probability_dog": "dog_ground_truth",
    "probability_cat": "cat_ground_truth",
    "probability_horse": "horse_ground_truth"
}

arthur_model.add_multiclass_classifier_output_attributes(
    pred_to_ground_truth_map=pred_to_ground_truth_map
)
```

If the data you send to the platform has ground truth values in a single column, then map predictions to each column value:

```python
# Map PredictedValue attributes to their corresponding GroundTruth attribute values.
# This pred_to_ground_truth_map maps predicted values to the values of the ground truth column.
# For example, this tells Arthur that the `probability_dog` column represents
# the probability that the ground truth column has the value "dog".
pred_to_ground_truth_map = {
    "probability_dog": "dog",
    "probability_cat": "cat",
    "probability_horse": "horse"
}

arthur_model.add_classifier_output_attributes_gtclass(
    pred_to_ground_truth_map=pred_to_ground_truth_map,
    ground_truth_column="animal"
)
```

#### Regression Attributes

If you are registering a regression model, then specify the type of the predicted and ground truth values when registering the attributes:

```python
# Map the PredictedValue attribute to its corresponding GroundTruth attribute
pred_to_ground_truth_map = {
    "predicted_value": "ground_truth_value",
}

# add the pred_to_ground_truth_map, and specify the type of the
# predicted and ground truth values
arthur_model.add_regression_output_attributes(
    pred_to_ground_truth_map=pred_to_ground_truth_map,
    value_type=ValueType.Float
)
```

### Set Reference Data

If you used your reference data to register your model's attributes with [ArthurModel.build()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.build), you don't need to complete this step: the DataFrame you pass to `build()` is automatically saved as your model's reference data in the Arthur system. If you didn't use `build()`, or you want to update the reference dataset sent to Arthur, you can set it directly with the [`ArthurModel.set_reference_data()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.set_reference_data) method. This is also necessary if your reference dataset is too large to fit into memory as a Pandas DataFrame.
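If you do set reference data directly, the call might look like this — a minimal sketch; see the [`set_reference_data()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.set_reference_data) reference above for the exact parameters:

```python
# set the reference dataset from an in-memory DataFrame ...
arthur_model.set_reference_data(data=reference_df)

# ... or, if the dataset is too large for memory, point the method
# at a directory of Parquet files instead
arthur_model.set_reference_data(directory_path="/path/to/reference_data/")
```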
### Review Model

The method [ArthurModel.review()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.review) returns the model schema, a DataFrame of properties for each of your model's registered attributes. The `review()` method is called automatically when using `build()`, and can also be called on its own. Inspecting the model schema that `review()` returns is recommended to verify that attribute properties have been inferred correctly.

```{note}
Some important properties to check in the model schema:
- Check that attributes have the correct value types
- Check that attributes are correctly marked as categorical or continuous
- Check that attributes you want to monitor for bias have `monitor_for_bias=True`
```

By default, printing the model schema doesn't display all the attribute properties. To examine the model schema in its entirety, raise the maximum number of rows and columns to display:

```python
pd.set_option('display.max_columns', 10)
pd.set_option('max_rows', 50)
arthur_model.review()
```

The model schema should look like this:

```
                    name            stage value_type categorical is_unique                categories  bins         range monitor_for_bias
0                     X0   PIPELINE_INPUT      FLOAT       False     False                        []  None  [16.0, 58.0]            False
1     ground_truth_label     GROUND_TRUTH    INTEGER        True     False  [{value: 0}, {value: 1}]  None  [None, None]            False
2  predicted_probability  PREDICTED_VALUE      FLOAT       False     False                        []  None        [0, 1]            False
```

```{note}
To modify attribute properties in the model schema table, see the docstring for [ArthurAttribute](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.attributes.ArthurAttribute.html#arthurai.core.attributes.ArthurAttribute) for a complete description of model attribute properties and their configuration methods.
```

### Save Model

Once you have reviewed your model schema and made any necessary modifications to your model's attributes, you are ready to save your model to Arthur. Calling `arthur_model.save()` returns the unique ID Arthur creates for your model. You can easily load the model from the Arthur system later on using either this ID or the `partner_model_id` you specified when you first created the model.

```python
arthur_model_id = arthur_model.save()
```
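For example, loading the model back in a later session can use either identifier; the `partner_model_id` form below is the same call the [Quick Integration](#quick-integration) section uses later in this guide.

```python
# load the model by the Arthur-assigned ID returned from save() ...
arthur_model = arthur.get_model(arthur_model_id)

# ... or by the partner model ID chosen at registration time
arthur_model = arthur.get_model("OnboardingModel_123", id_type="partner_model_id")
```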
### Activate Enrichments

[Enrichments](../basic_concepts.html#enrichments) are model monitoring services Arthur provides that can be activated once your model is saved to Arthur. Models have the {ref}`Anomaly Detection <enrichments_anomaly_detection>` enrichment enabled by default if your plan supports it. Here we first enable {ref}`Hotspots <enrichments_hotspots>`, which doesn't require any configuration, and then activate explainability, which requires more configuration and therefore comes with its own helper function.

```python
# first activate hotspots
arthur_model.enable_hotspots()

# enable explainability using its own helper function for convenience
arthur_model.enable_explainability(
    df=X_train,
    project_directory="/path/to/model_folder/",
    requirements_file="requirements.txt",
    user_predict_function_import_path="model_entrypoint",
    # optionally exclude directories within the project folder
    # from being bundled with the predict function
    ignore_dirs=["folder_to_ignore"]
)
```

For more information on enabling enrichments and updating their configurations, see {doc}`/user-guide/walkthroughs/enrichments`.

***

## Onboarding Existing Inferences

If your model is already running in production, a good next step is to send your historical inferences to Arthur. In this section, we'll gather those historical inferences and then send them to the platform.

### Collecting Historical Inferences

When logging inferences with Arthur, you may include:

- **Model Inputs** which were sent to your model to make predictions
- **Model Predictions** which you can fetch from storage, or re-compute from your input data if you don't have them saved
- **Non-Input Data** that you registered with your Arthur model but that doesn't feed into your model
- **Ground Truth** labels for the inputs, if you have them available
- **Partner Inference IDs** that uniquely identify your predictions and can be used to update inferences with ground truth labels in the future (details below)
- **Inference Timestamps** which you can approximate with the [`generate_timestamps()` function](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.util.generate_timestamps.html?highlight=generate_timestamps#arthurai.util.generate_timestamps) if you're simulating production data, or omit to use the current time
- **Ground Truth Timestamps** which you can approximate with the [`generate_timestamps()` function](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.util.generate_timestamps.html?highlight=generate_timestamps#arthurai.util.generate_timestamps) if you're simulating production data, or omit to use the current time
- **Batch IDs** that denote something like a unique "run ID" if your model is a batch model

You might have all the data you need in one convenient place, but more often you'll need to gather it from a few tables or data stores. For example, you might:

- collect your input and non-input data from your data warehouse
- fetch your predictions and timestamps from the blob storage used with your model deployment
- match them to your ground truth labels in a different legacy system

#### Partner Inference IDs

Arthur offers partner inference IDs as a way to match specific inferences in Arthur against your other systems and update your inferences with ground truth labels as they become available in the future. The most appropriate choice for a partner inference ID depends on your specific circumstances, but common strategies include _using existing IDs_ and _joining metadata with non-unique IDs_.

If you already have existing IDs that are unique to each inference and easily attached to future ground truth labels, you can simply use those (casting to strings if needed).

Another common approach is to construct a partner inference ID from multiple pieces of metadata. For example, if your model makes predictions about your customers at most once per day, you might construct your partner inference IDs as `{customer_id}-{date}`. This is easy to reconstruct when sending ground truth labels much later: simply look up the labels for all the customers passed to the model on a given day and append that date to their IDs.

If you don't supply partner inference IDs, the SDK will generate them for you and return them from your `send_inferences()` call. These can be kept for future reference, or discarded if you've already sent ground truth values or don't plan to send them in the future.
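Here's a brief sketch of the `{customer_id}-{date}` strategy; the column names are illustrative:

```python
import pandas as pd

# suppose each inference row carries a customer ID and the prediction date
inference_data = pd.DataFrame({
    "customer_id": ["c-001", "c-002"],
    "prediction_date": ["2023-01-15", "2023-01-15"],
    # ... plus your input and non-input columns ...
})
predictions = ...  # your model's predictions for these rows

# deterministic IDs like "c-001-2023-01-15" are easy to reconstruct
# later, once the ground truth labels for that day become available
partner_inference_ids = (
    inference_data["customer_id"] + "-" + inference_data["prediction_date"]
).tolist()

arthur_model.send_inferences(
    inference_data,
    predictions=predictions,
    partner_inference_ids=partner_inference_ids)
```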
### Sending Inferences

Arthur offers many flexible options for sending your inferences. A few SDK methods accept Pandas DataFrames, native Python objects, and Parquet files, with data grouped into single datasets or spread across separate method calls and parameters. Two examples are outlined below, but for all the available usages see our SDK Reference for:

- the [`ArthurModel.send_inferences()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html?highlight=send_inferences#arthurai.core.models.ArthurModel.send_inferences) and [`update_inference_ground_truths()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html?highlight=update_inference_ground_truths#arthurai.core.models.ArthurModel.update_inference_ground_truths) methods, which are recommended for non-Parquet datasets under 100,000 rows
- the [`ArthurModel.send_bulk_inferences()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html?highlight=send_bulk_inferences#arthurai.core.models.ArthurModel.send_bulk_inferences) and [`send_bulk_ground_truths()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html?highlight=send_bulk_ground_truths#arthurai.core.models.ArthurModel.send_bulk_ground_truths) methods, which are recommended for sending large datasets or Parquet files

If you'd prefer to send data directly to the REST API, see the [Inferences section of our API Reference](https://docs.arthur.ai/api-documentation/v3-api-docs.html#tag/inferences).

#### A Simple Case

Here we suppose we've gathered our inputs, non-input data, and ground truth labels into a single DataFrame. We also fetch our predictions and the times at which they were made, and send everything in a single method call. Here we're passing the predictions and timestamps as parameters into the method, but we could also simply add them to the `inference_data` DataFrame. We don't worry about partner inference IDs here, leaving them to be auto-generated.

```python
# load model input and non-input values, plus ground truth labels and timestamps,
# as a Pandas DataFrame
inference_data = ...

# retrieve predictions and timestamps as lists
# note that we could also include these as columns in the DataFrame above
predictions, inference_timestamps = ...

# send the inferences to Arthur,
# using auto-generated partner inference IDs since we're sending ground truth right now
arthur_model.send_inferences(
    inference_data,
    predictions=predictions,
    inference_timestamps=inference_timestamps)
```
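If ground truth weren't available yet, you could instead keep the partner inference IDs that `send_inferences()` generates and returns, and use them later with `update_inference_ground_truths()`. A hedged sketch, assuming the labels arrive as a list aligned with the saved IDs; the exact response and parameter shapes are documented in the method references above.

```python
# keep the auto-generated partner inference IDs from the send_inferences()
# response for later use (the precise response format is documented in the
# SDK reference linked above)
response = arthur_model.send_inferences(
    inference_data,
    predictions=predictions,
    inference_timestamps=inference_timestamps)

# ... later, once labels are available, attach them to the saved IDs
# (saved_partner_inference_ids and labels are hypothetical variables here)
ground_truth_records = [
    {"partner_inference_id": pid, "ground_truth_label": label}
    for pid, label in zip(saved_partner_inference_ids, labels)
]
arthur_model.update_inference_ground_truths(ground_truth_records)
```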
### Sending Inferences at Scale with Delayed Ground Truth

Next, we consider a more complex case: a batch model with many inferences whose ground truth is sent separately, relying on our partner inference IDs to join the ground truth values to the previous inferences. We assume the data is neatly collected as described above. This may rely on an [ETL job](https://en.wikipedia.org/wiki/Extract,_transform,_load): perhaps a Spark job, a Redshift or Snowflake export, an Apache Beam job in Google Cloud Dataflow, Pandas `read_sql()` and `to_parquet()` calls, or whatever data wrangling toolkit you're most comfortable with.

```python
# we collect a set of folder names, each corresponding to a batch run, containing one or
# more Parquet files with the input attribute columns, non-input attribute columns, and
# prediction attribute columns, as well as a "partner_inference_id" column with our unique
# identifiers and an "inference_timestamp" column
inference_batch_dirs = ...

# then suppose we have a directory with one or more Parquet files containing matching
# "partner_inference_id"s and our ground truth attribute columns, as well as a
# "ground_truth_timestamp" column
ground_truth_dir = ...

# send the inferences to Arthur
for batch_dir in inference_batch_dirs:
    batch_id = batch_dir.split("/")[-1]  # use the directory name as the Batch ID
    arthur_model.send_bulk_inferences(directory_path=batch_dir, batch_id=batch_id)

# send the ground truths to Arthur
arthur_model.send_bulk_ground_truths(directory_path=ground_truth_dir)
```

### See Model in Dashboard

To confirm that the inferences have been sent, you can view your model and its inferences in the Arthur dashboard.

### Performance Results

Once you've logged your model's inferences with Arthur, you can evaluate your model's performance. You can open your Arthur dashboard to view model performance in the UI, or use the code snippets below to fetch the same results right from your Python environment using {doc}`Arthur's Query API </user-guide/api-query-guide/index>`.

#### Query Overall Performance

You can query the overall accuracy rate with the following snippet; for non-classifier models, consider replacing the `accuracyRate` function with another {doc}`model evaluation function </user-guide/api-query-guide/model_evaluation_functions>`.

```python
# query model accuracy across the batches
query = {
    "select": [
        {"function": "accuracyRate"}
    ]
}
query_result = arthur_model.query(query)
```

#### Visualize Performance Results

Visualize performance metrics over time:

```python
# plot model performance metrics over time
arthur_model.viz.metric_series(
    ["auc", "falsePositiveRate"],
    time_resolution="hour")
```

Visualize data drift over time:

```python
# plot drift over time of attributes
# from their baseline distribution in the model's reference data
arthur_model.viz.drift_series(
    ["X0", "predicted_probability"],
    drift_metric="KLDivergence",
    time_resolution="hour")
```

#### {doc}`API Query Guide </user-guide/api-query-guide/index>`

For more analysis of model performance, the {doc}`/user-guide/api-query-guide/index` shows how to use the Arthur API to get the model performance results you need, efficiently and at scale. Our backend query engine allows for fine-grained and customizable performance analysis.

***

## Production Integration

Now that you have registered your model and retrieved initial performance metrics on your model's historical inferences, you are ready to connect your production pipeline to Arthur.

Arthur has several methods of receiving your production model's inference data. Most involve some process making a call to one of the SDK methods described above, but where that process runs and reads data from depends on your production environment. We explore a few common patterns below, as well as some of Arthur's direct {doc}`integrations </user-guide/integrations/index>`.

For a quick start, consider the [quick integration](#quick-integration), which only involves adding a few lines of code to your model prediction code. If your model inputs and predictions are written out to a data stream such as a Kafka topic, consider [adding a stream listener](#streaming-integrations). If you don't mind a bit of latency between when your predictions are made and when they're logged with Arthur, or it's much easier to read your inference data at rest, consider setting up an [inference upload job](#inference-upload-jobs).

Note that these methods can be combined for prediction and ground truth values: you might use the quick integration or streaming approach for inference data but a batch job to update ground truth labels.

### API Keys

API keys authorize your requests to send and receive data to and from the Arthur platform. With a valid API key added to your production environment, your model deployment code can be augmented to send your model's inferences to Arthur. See the {doc}`/platform-management/access-control-overview/standard_access_control` guide to obtain an Arthur API key.

### Quick Integration

Quick integration with Arthur means calling the `send_inferences()` method *when* and *where* your model object produces inferences. This is the simplest and quickest way to connect a production model to Arthur. However, it adds some latency to your model's inference path; for approaches that decouple logging from serving, see the streaming and upload-job patterns below.

For example, suppose your model is hosted in production behind an API built with Flask. The call to `arthur_model.send_inferences()` just needs to be included wherever your `predict` function is defined, so your updated code might look something like this:

```python
####################################################
# New code to fetch the ArthurModel

# connect to Arthur
import os
from arthurai import ArthurAI
arthur = ArthurAI(
    url="https://app.arthur.ai",
    access_key=os.environ["ARTHUR_API_KEY"])

# retrieve the arthur model
arthur_model = arthur.get_model(os.environ["ARTHUR_PARTNER_MODEL_ID"],
                                id_type='partner_model_id')

####################################################
# your original model prediction function,
# which can be on its own as a python script
# or wrapped by an API like a Flask app
def predict():
    # get data to apply model to
    inference_data = ...

    # generate inferences
    # in this example, the predictions are classification probabilities
    predictions = model.predict_proba(...)

    ####################################################
    # NEW PART OF YOUR MODEL'S PREDICTION SCRIPT:
    # send new inferences to Arthur
    arthur_model.send_inferences(
        inference_data,
        predictions=predictions)
    ####################################################

    return predictions
```

Alternatively, if you have a batch model that runs in jobs, you might add similar code to the very end of your job rather than inside the `predict()` function.

### Streaming Integrations

If you write your model's inputs and outputs to a data stream, you can add a listener to that stream to log those inferences with Arthur. For example, if you have a Kafka topic, you might add a new `arthur` consumer group to listen for new events and pass them to the `send_inferences()` method, as sketched below. If your inputs and predictions live in different topics, or you want to add non-input data from another topic, you might use [Kafka Streams](https://kafka.apache.org/documentation/streams/) to join the various topics before sending to Arthur.
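A minimal sketch of such a listener using the kafka-python client, assuming a hypothetical `model-inferences` topic whose messages are JSON records containing the model's input values and its prediction:

```python
import json
from kafka import KafkaConsumer

# a dedicated "arthur" consumer group, so logging to Arthur doesn't
# interfere with the model's other consumers
consumer = KafkaConsumer(
    "model-inferences",  # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    group_id="arthur",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # assumed message layout: {"inputs": {...}, "prediction": ...}
    arthur_model.send_inferences(
        [event["inputs"]],
        predictions=[event["prediction"]])
```

In practice you would likely buffer events and send them in batches rather than making one call per message.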
### Inference Upload Jobs

Another approach is to run jobs that read data at rest and send it to the Arthur platform. These jobs might be scheduled or event-driven, depending on your architecture. For example, you might have regularly scheduled jobs that (see the sketch after this list):

1. look up the inference or ground truth data since the last run
1. format the data and write it to a few Parquet files
1. send the Parquet files to the Arthur platform using `send_bulk_inferences()` or `send_bulk_ground_truths()`
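A hedged sketch of such a scheduled job, where `fetch_inferences_since()` is a hypothetical helper standing in for whatever query pulls new rows from your own data store:

```python
import datetime
import tempfile
import pandas as pd

def run_upload_job(last_run: datetime.datetime) -> None:
    # hypothetical helper: fetch inference rows produced since the last
    # run from your warehouse or blob storage as a Pandas DataFrame
    new_inferences: pd.DataFrame = fetch_inferences_since(last_run)
    if new_inferences.empty:
        return

    # write the rows to a Parquet file in a temporary directory ...
    with tempfile.TemporaryDirectory() as upload_dir:
        new_inferences.to_parquet(f"{upload_dir}/inferences.parquet")

        # ... and bulk-upload the whole directory to Arthur
        arthur_model.send_bulk_inferences(directory_path=upload_dir)
```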
### Integrations

Rather than hand-rolling your own inference upload jobs, Arthur also offers more direct integrations. For example, our {ref}`SageMaker Data Capture Integration <sagemaker_integration>` makes integrating with SageMaker models a breeze by using Data Capture to log inferences into files in S3 and triggering upload jobs in response to those file write events. Our {ref}`Batch Ingestion from S3 <s3_batch_ingestion>` integration lets you simply upload your Parquet files to S3, and Arthur will automatically import them into the system.

```{toctree}
:hidden:
:maxdepth: 3

General Onboarding <self>
cv_onboarding
nlp_onboarding
```