# Model Onboarding
### Overview
This guide walks through the steps of onboarding a model deployed in production to Arthur.
Once your deployed model is onboarded, you can use Arthur to retrieve insights about
model performance efficiently and at scale.
```{note} This walkthrough uses tabular data.
To onboard models of other input types, see {doc}`cv_onboarding` and {doc}`nlp_onboarding`.
```
### Requirements
You will need access to the data your model ingests and the predictions it produces.
The model object itself is _not_ required, but it can be uploaded to enable the explainability enrichment.
See our {doc}`/more-info/FAQs` for more info.
***
### Outline
This guide covers the three main steps to onboarding a model to the Arthur platform:
- [Model Registration](#model-registration) is the process of registering the model schema with Arthur and sending reference data
- [Onboarding Existing Inferences](#onboarding-existing-inferences) sends your model's historical predictions to the Arthur platform
- [Production Integration](#production-integration) connects your model's ongoing predictions in deployment to be logged with Arthur
***
## Model Registration
### Connect to Arthur
The first step is to import functions from the `arthurai` package and establish a connection with an Arthur username and password.
```python
# Arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage, ValueType, Enrichment

arthur = ArthurAI(url="https://app.arthur.ai",
                  login="<YOUR_USERNAME_OR_EMAIL>")
```
### Register Model Type
To register a model, we start by creating a model object and defining its
{ref}`high-level metadata <basic_concepts_input_output_types>`:
```python
arthur_model = arthur.model(
    partner_model_id="OnboardingModel_123",
    display_name="OnboardingModel",
    input_type=InputType.Tabular,
    output_type=OutputType.Multiclass,
    is_batch=False)
```
In particular, we set `is_batch=False` to define this as a {ref}`streaming model <basic_concepts_streaming_vs_batch>`,
which means the Arthur platform will receive the model's inferences as they are produced live in deployment.
### Register Attributes with [ArthurModel.build()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.build)
Next we'll add more detail to the model metadata, defining the model's {ref}`attributes <basic_concepts_attributes_and_stages>`.
The simplest method of registering your attributes is to use
[ArthurModel.build()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.build),
which parses a Pandas DataFrame of your {ref}`reference dataset <basic_concepts_reference_dataset>` containing inputs,
metadata, predictions, and ground truth labels. In addition, a `pred_to_ground_truth_map` is required, which tells
Arthur which of your attributes represent your model's predicted values, and how those predicted attributes correspond
to your model's ground truth attributes.
Here we build a model with a `pred_to_ground_truth_map` configured for a binary classification model.
```python
# Map the PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that in the data you send to the platform,
# the `predicted_probability` column represents
# the probability that the ground-truth column has the value 1
pred_to_ground_truth_map = {
    'predicted_probability': 1
}

arthur_model.build(
    reference_df,
    ground_truth_column='ground_truth_label',
    pred_to_ground_truth_map=pred_to_ground_truth_map)
```
#### Non-Input Attributes
Some features of your data may be important to track for monitoring model performance even though they are not model
inputs or outputs. These features can be added as non-input attributes in the ArthurModel:
```python
# Specify additional non-input attributes when building a model.
# This tells Arthur to monitor ['age','sex','race','education']
# in the reference and inference data you send to the platform
arthur_model.build(
    reference_df,
    ground_truth_column='ground_truth_label',
    pred_to_ground_truth_map=pred_to_ground_truth_map,
    non_input_columns=['age','sex','race','education'])
```
### Register Attributes Manually
As an alternative to passing a DataFrame to
[ArthurModel.build()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.build),
attributes can also be registered for your model manually. Registering attributes manually may be preferable if you
don't use the Pandas library, or if there are attribute properties not configurable from parsing your reference data alone.
[ArthurModel.add_attribute()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.add_attribute)
is the generic method to add any type of attribute to a model; its
[docstring](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.add_attribute)
also links to additional attribute registration methods tailored to specific model and data types for convenience.
#### Binary Classifier with Two Ground Truth Classes
If the data you send to the platform for a binary classifier has columns for the predicted
probability and ground-truth status of class 0, as well as columns for the predicted
probability and ground-truth status of class 1, then map each predicted value column to its corresponding ground truth
column:
```python
# Map PredictedValue attributes to their corresponding GroundTruth attribute names
pred_to_ground_truth_map = {'pred_0': 'gt_0',
                            'pred_1': 'gt_1'}

# add the ground truth and predicted attributes to the model,
# specifying that the `pred_1` attribute is the
# positive predicted attribute, which means it corresponds to the
# probability that the binary target attribute is 1
arthur_model.add_binary_classifier_output_attributes(
    positive_predicted_attr='pred_1',
    pred_to_ground_truth_map=pred_to_ground_truth_map)
```
#### More Than Two Ground Truth Classes
If you are using a multiclass model, then you will have more than two Ground Truth classes. In order to make this work with the Arthur platform, you will need to:
1. Ensure that you are using `predict_proba` (or a similar function) to predict the probability of a specific Ground Truth class
2. Ensure that each class probability is included in its own column in your dataset
3. Ensure that your Ground Truth mapping contains all possible classes that might be predicted
For example, if your model identifies the presence of an animal in an image, specifically a dog, cat, or horse, your Ground Truth mapping must contain items for each of these classes (even if the model output doesn't predict a value for these categories).
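As an illustration of steps 1 and 2, here is a minimal sketch that expands `predict_proba` output into one column per class (the `model` object, `X` DataFrame, and class names here are hypothetical):
```python
import pandas as pd

# hypothetical classifier and input DataFrame
class_names = ["dog", "cat", "horse"]
probabilities = model.predict_proba(X)  # shape: (n_rows, n_classes)

# one probability column per class, matching the mappings below
pred_df = pd.DataFrame(
    probabilities,
    columns=[f"probability_{c}" for c in class_names])
inference_df = pd.concat([X.reset_index(drop=True), pred_df], axis=1)
```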
If the data you send to the platform has ground truth one-hot encoded, then map predictions to each column name:
```python
# Map PredictedValue attributes to their corresponding GroundTruth attribute names.
# This pred_to_ground_truth_map maps predicted values to one-hot encoded ground truth columns.
# For example, this tells Arthur that the `probability_dog` column represents
# the probability that the `dog_ground_truth` column has the value 1.
pred_to_ground_truth_map = {
    "probability_dog": "dog_ground_truth",
    "probability_cat": "cat_ground_truth",
    "probability_horse": "horse_ground_truth"
}

arthur_model.add_multiclass_classifier_output_attributes(
    pred_to_ground_truth_map=pred_to_ground_truth_map
)
```
If the data you send to the platform has ground truth values in a single column, then map predictions to each column value:
```python
# Map PredictedValue attributes to their corresponding GroundTruth attribute values.
# This pred_to_ground_truth_map maps predicted values to the values of the ground truth column.
# For example, this tells Arthur that the `probability_dog` column represents
# the probability that the ground truth column has the value "dog".
pred_to_ground_truth_map = {
    "probability_dog": "dog",
    "probability_cat": "cat",
    "probability_horse": "horse"
}

arthur_model.add_classifier_output_attributes_gtclass(
    pred_to_ground_truth_map=pred_to_ground_truth_map,
    ground_truth_column="animal"
)
```
#### Regression Attributes
If you are registering a regression model, then specify the type of the predicted and ground truth values when registering
the attributes:
```python
# Map the PredictedValue attribute to its corresponding GroundTruth attribute
pred_to_ground_truth_map = {
    "predicted_value": "ground_truth_value",
}

# add the pred_to_ground_truth_map, and specify the type of the
# predicted and ground truth values
arthur_model.add_regression_output_attributes(
    pred_to_ground_truth_map=pred_to_ground_truth_map,
    value_type=ValueType.Float
)
```
### Set Reference Data
If you used your reference data to register your model's attributes with
[ArthurModel.build()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.build),
you don't need to complete this step, because the DataFrame you pass as input to `build()` is automatically saved
as your model's reference data in the Arthur system.
If you didn't use `build()`, or want to update the reference dataset to be sent to Arthur, you can set it directly using the
[`ArthurModel.set_reference_data()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.set_reference_data)
method. This is also necessary if your reference dataset is too large to fit into memory as a Pandas DataFrame.
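For instance, a minimal sketch of setting reference data directly; it assumes `reference_df` matches the attributes registered above, and that the directory-based variant is used for larger-than-memory datasets:
```python
# set reference data from an in-memory DataFrame
# (assumes reference_df matches the registered model schema)
arthur_model.set_reference_data(data=reference_df)

# or, for datasets too large to load at once,
# point at a directory of Parquet files instead
# arthur_model.set_reference_data(directory_path="/path/to/reference_data/")
```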
### Review Model
The
[ArthurModel.review()](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html#arthurai.core.models.ArthurModel.review)
method returns the model schema, which is a DataFrame of properties for each of your model's registered attributes. The `review()`
method is automatically called when using `build()`, and can also be called on its own. Inspecting the model schema that `review()`
returns is recommended to verify that attribute properties have been inferred correctly.
```{note}
Some important properties to check in the model schema:
- Check that attributes have the correct value types
- Check that attributes are correctly marked as categorical or continuous
- Check that attributes you want to monitor for bias have `monitor_for_bias=True`
```
By default, printing the model schema doesn't display all the attribute properties.
If you want to examine the model schema in its entirety, you can set the maximum number of
rows and columns Pandas displays:
```python
import pandas as pd

pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 50)
arthur_model.review()
```
The model schema should look like this:
```python
   name                   stage            value_type  categorical  is_unique  categories                bins  range         monitor_for_bias
0  X0                     PIPELINE_INPUT   FLOAT       False        False      []                        None  [16.0, 58.0]  False
1  ground_truth_label     GROUND_TRUTH     INTEGER     True         False      [{value: 0}, {value: 1}]  None  [None, None]  False
2  predicted_probability  PREDICTED_VALUE  FLOAT       False        False      []                        None  [0, 1]        False
```
```{note}
To modify attribute properties in the model schema table, see the
[ArthurAttribute](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.attributes.ArthurAttribute.html#arthurai.core.attributes.ArthurAttribute)
docstring for a complete description of model attribute properties and their configuration methods.
```
### Save Model
Once you have reviewed your model schema and made any necessary modifications to your model's attributes, you are ready
to save your model to Arthur.
Calling `arthur_model.save()` returns the unique ID Arthur creates for your model. You can easily load the model from
the Arthur system later on using either this ID or the `partner_model_id` you specified when you first created the model.
```python
arthur_model_id = arthur_model.save()
```
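As a brief sketch of reloading the model later with either identifier (the `id_type` usage here mirrors the production integration example below):
```python
# reload the model by its Arthur-assigned ID...
arthur_model = arthur.get_model(arthur_model_id)

# ...or by the partner model ID chosen at registration
arthur_model = arthur.get_model("OnboardingModel_123", id_type='partner_model_id')
```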
### Activate Enrichments
[Enrichments](../basic_concepts.html#enrichments) are model monitoring services Arthur provides that can be activated
once your model is saved to Arthur.
Models will have the {ref}`Anomaly Detection <enrichments_anomaly_detection>` enrichment enabled by default if your plan supports
it. Here we first enable {ref}`Hotspots <enrichments_hotspots>`, which doesn't require any configuration.
Then we activate explainability, which requires more configuration and therefore comes with its own helper function.
```python
# first activate hotspots
arthur_model.enable_hotspots()

# enable explainability using its own helper function for convenience
arthur_model.enable_explainability(
    df=X_train,
    project_directory="/path/to/model_folder/",
    requirements_file="requirements.txt",
    user_predict_function_import_path="model_entrypoint",
    # optionally exclude directories within the project folder
    # from being bundled with the predict function
    ignore_dirs=["folder_to_ignore"]
)
```
For more information on enabling enrichments and updating their configurations, see {doc}`/user-guide/walkthroughs/enrichments`.
***
## Onboarding Existing Inferences
If your model is already running in production, a good next step is to send your historical inferences to Arthur. In
this section, we'll gather those historical inferences and then send them to the platform.
### Collecting Historical Inferences
When logging inferences with Arthur, you may include:
- **Model Inputs** which were sent to your model to make predictions
- **Model Predictions** which you can fetch from storage, or re-compute from your input data if you don't have them
saved
- **Non-Input Data** that you want to include and registered with your Arthur model, but that doesn't feed
into your model
- **Ground Truth** labels for the inputs, if you have them available
- **Partner Inference IDs** that uniquely identify your predictions and can be used to update inferences with ground
truth labels in the future (details below)
- **Inference Timestamps**, which you can approximate with the [`generate_timestamps()` function](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.util.generate_timestamps.html?highlight=generate_timestamps#arthurai.util.generate_timestamps)
if you're just simulating production data, or omit to default to the current time
- **Ground Truth Timestamps**, which you can likewise approximate with [`generate_timestamps()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.util.generate_timestamps.html?highlight=generate_timestamps#arthurai.util.generate_timestamps)
or omit to default to the current time
- **Batch IDs** that denote something like a unique "run ID", if your model is a batch model
You might have all the data you need in one convenient place, but more often you'll need to gather it from a couple of
tables or data stores, as sketched below. For example, you might:
- collect your input and non-input data from your data warehouse
- fetch your predictions and timestamps from blob storage used with your model deployment
- match them to your ground truth labels in a different legacy system
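Here's a minimal sketch of that gathering step with Pandas; the file locations, table layouts, and join key are all hypothetical:
```python
import pandas as pd

# hypothetical sources: inputs and non-input data from a warehouse export,
# predictions and timestamps from blob storage, labels from a legacy system
inputs_df = pd.read_parquet("warehouse_export/inputs.parquet")
preds_df = pd.read_parquet("s3://model-bucket/predictions.parquet")
labels_df = pd.read_csv("legacy_system/labels.csv")

# join on a shared key so each row holds inputs, prediction, and label
historical_df = (inputs_df
                 .merge(preds_df, on="customer_id")
                 .merge(labels_df, on="customer_id", how="left"))
```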
#### Partner Inference IDs
Arthur offers Partner Inference IDs as a way to match specific inferences in Arthur against your other systems and to
update your inferences with ground truth labels as they become available in the future. The most appropriate choice
for a partner inference ID depends on your specific circumstances, but common strategies include _using existing IDs_
and _joining metadata with non-unique IDs_.
If you already have existing IDs that are unique to each inference and easily attached to future ground truth labels,
you can simply use those (casting to strings if needed).
Another common approach is to construct a partner inference ID from multiple pieces of metadata. For example, if your
model makes predictions about your customers at most once per day, you might construct your partner inference IDs as
`{customer_id}-{date}`. This would be easy to reconstruct when sending ground truth labels much later: simply look up
the labels for all the customers passed to the model on a given day and append that date to their IDs.
If you don't supply partner inference IDs, the SDK will generate them for you and return them from your
`send_inferences()` call. These can be kept for future reference, or discarded if you've already sent ground truth
values or don't plan to in the future.
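As a minimal sketch of the `{customer_id}-{date}` strategy, continuing with the hypothetical `historical_df` from above:
```python
# build partner inference IDs from a customer ID and the prediction date,
# casting to strings; the column names here are hypothetical and
# inference_timestamp is assumed to be a datetime column
historical_df["partner_inference_id"] = (
    historical_df["customer_id"].astype(str)
    + "-"
    + historical_df["inference_timestamp"].dt.strftime("%Y-%m-%d"))
```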
### Sending Inferences
Arthur offers many flexible options for sending your inferences. Our SDK methods can accept Pandas DataFrames,
native Python objects, and Parquet files, with data grouped into single datasets or spread across separate method calls
and parameters. Two examples are outlined below, but for all the available usages see our SDK Reference for:
- the [`ArthurModel.send_inferences()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html?highlight=send_inferences#arthurai.core.models.ArthurModel.send_inferences) and [`update_inference_ground_truths()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html?highlight=update_inference_ground_truths#arthurai.core.models.ArthurModel.update_inference_ground_truths) methods,
which are recommended for non-Parquet datasets under 100,000 rows
- the [`ArthurModel.send_bulk_inferences()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html?highlight=send_bulk_inferences#arthurai.core.models.ArthurModel.send_bulk_inferences) and [`send_bulk_ground_truths()`](https://docs.arthur.ai/sdk/sdk_v3/apiref/arthurai.core.models.ArthurModel.html?highlight=send_bulk_ground_truths#arthurai.core.models.ArthurModel.send_bulk_ground_truths)
methods, which are recommended for sending large datasets or Parquet files
If you'd prefer to send data directly to the REST API, see the [Inferences section of our API Reference](https://docs.arthur.ai/api-documentation/v3-api-docs.html#tag/inferences).
#### A Simple Case
Suppose we've gathered our input values, non-input values, and ground truth labels into a single DataFrame. We also fetch
our predictions and the times at which they were made, and send everything in a single method call. Here we pass
the predictions and timestamps as parameters to the method, but we could also simply add them as columns to the `inference_data`
DataFrame. We don't worry about partner inference IDs here, leaving them to be auto-generated.
```python
# load model input and non-input values, and ground truth labels + timestamps, as a Pandas DataFrame
inference_data = ...

# retrieve predictions and timestamps as lists
# note that we could also include these as columns in the DataFrame above
predictions, inference_timestamps = ...

# send the inferences to Arthur, using auto-generated
# partner inference IDs since we're sending ground truth right now
arthur_model.send_inferences(
    inference_data,
    predictions=predictions,
    inference_timestamps=inference_timestamps)
```
### Sending Inferences at Scale with Delayed Ground Truth
Next, we consider a more complex case: a batch model with many inferences, where we send the ground truth
separately, relying on our Partner Inference IDs to join the ground truth values to the earlier inferences. We
assume the data is neatly collected as described above. This may rely on an [ETL job](https://en.wikipedia.org/wiki/Extract,_transform,_load):
a Spark job, a Redshift or Snowflake export, an Apache Beam job in Google Cloud Dataflow,
Pandas `read_sql()` and `to_parquet()` calls, or whatever data wrangling toolkit you're most comfortable with.
```python
# we can collect a set of folder names, each corresponding to a batch run, containing one or
# more Parquet files with the input attribute columns, non-input attribute columns, and
# prediction attribute columns, as well as a "partner_inference_id" column with our unique
# identifiers and an "inference_timestamp" column
inference_batch_dirs = ...

# then suppose we have a directory with one or more Parquet files containing matching
# "partner_inference_id"s and our ground truth attribute columns, as well as a
# "ground_truth_timestamp" column
ground_truth_dir = ...

# send the inferences to Arthur
for batch_dir in inference_batch_dirs:
    batch_id = batch_dir.split("/")[-1]  # use the directory name as the Batch ID
    arthur_model.send_bulk_inferences(
        directory_path=batch_dir,
        batch_id=batch_id)

# send the ground truths to Arthur
arthur_model.send_bulk_ground_truths(directory_path=ground_truth_dir)
```
### See Model in Dashboard
To confirm that the inferences have been sent, you can view your model and its inferences in the Arthur dashboard.
### Performance Results
Once you've logged your model's inferences with Arthur, you can evaluate your model's performance. You can open your
Arthur dashboard to view model performance in the UI, or use the code snippets below to fetch the same results right
from your Python environment using {doc}`Arthur's Query API </user-guide/api-query-guide/index>`.
#### Query Overall Performance
You can query the overall Accuracy Rate with the following snippet; for non-classifier models, consider
replacing the `accuracyRate` function with another {doc}`model evaluation function </user-guide/api-query-guide/model_evaluation_functions>`.
```python
# query model accuracy across the batches
query = {
    "select": [
        {
            "function": "accuracyRate"
        }
    ]
}
query_result = arthur_model.query(query)
```
#### Visualize Performance Results
Visualize performance metrics over time:
```python
# plot model performance metrics over time
arthur_model.viz.metric_series(
    ["auc", "falsePositiveRate"],
    time_resolution="hour")
```
Visualize data drift over time:
```python
# plot the drift over time of attributes
# from their baseline distribution in the model's reference data
arthur_model.viz.drift_series(
    ["X0", "predicted_probability"],
    drift_metric="KLDivergence",
    time_resolution="hour")
```
#### {doc}`API Query Guide </user-guide/api-query-guide/index>`
For more analysis of model performance, the {doc}`/user-guide/api-query-guide/index` shows how to use the Arthur
API to get the model performance results you need, efficiently and at scale. Our backend query engine allows for fine-grained and
customizable performance analysis.
***
## Production Integration
Now that you have registered your model and retrieved initial performance metrics on your model's
historical inferences, you are ready to connect your production pipeline to Arthur.
Arthur has several methods of receiving your production model's inference data. Most involve some process making a
call to one of the SDK methods described above, but where that process runs and reads data from depends on your
production environment. We explore a few common patterns below, as well as some of Arthur's direct
{doc}`integrations </user-guide/integrations/index>`.
For a quick start, consider the [quick integration](#quick-integration), which only involves adding a few lines of code
to your model prediction code.
If your model inputs and predictions are written out to a data stream such as a Kafka topic, consider [adding a stream
listener](#streaming-integrations).
If you don't mind a bit of latency between when your predictions are made and when they're logged with Arthur, or it's much easier
to read your inference data at rest, consider setting up an [inference upload job](#inference-upload-jobs).
Note that these methods can be combined for prediction and ground truth values: you might use the quick integration or
streaming approach for inference data, but a batch job to update ground truth labels, as sketched below.
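For instance, a later-running ground truth job might look like this minimal sketch; it assumes `update_inference_ground_truths()` accepts a DataFrame with a `partner_inference_id` column alongside the ground truth attribute columns:
```python
# a separate, later-running job: attach ground truth labels to inferences
# already logged with Arthur, matched on partner inference IDs
ground_truth_updates = ...  # DataFrame: partner_inference_id + ground truth columns
arthur_model.update_inference_ground_truths(ground_truth_updates)
```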
### API Keys
API keys authorize your requests to send and receive data to and from the Arthur platform. With a valid API key added
to your production environment, your model deployment code can be augmented to send your model's inferences to Arthur.
See {doc}`/platform-management/access-control-overview/standard_access_control` for how to obtain an Arthur API key.
### Quick Integration
Quick integration with Arthur means calling the `send_inferences()` method *when* and *where* your model object
produces inferences. This is the simplest and quickest way to connect a production model to Arthur. However, this option
adds some latency to your model's inference path; for approaches that avoid this, see the streaming and upload-job
options below.
For example, suppose your model is hosted in production behind an API built with Flask. The call to
`arthur_model.send_inferences()` just needs to be included wherever your `predict` function is defined, so your updated
code might look something like this:
```python
####################################################
# New code to fetch the ArthurModel

# connect to Arthur
import os
from arthurai import ArthurAI
arthur = ArthurAI(
    url="https://app.arthur.ai",
    access_key=os.environ["ARTHUR_API_KEY"])

# retrieve the Arthur model
arthur_model = arthur.get_model(os.environ["ARTHUR_PARTNER_MODEL_ID"], id_type='partner_model_id')
####################################################

# your original model prediction function,
# which can be on its own as a python script
# or wrapped by an API like a Flask app
def predict():
    # get data to apply model to
    inference_data = ...

    # generate inferences
    # in this example, the predictions are classification probabilities
    predictions = model.predict_proba(...)

    ####################################################
    # NEW PART OF YOUR MODEL'S PREDICTION SCRIPT:
    # send the new inferences to Arthur
    arthur_model.send_inferences(
        inference_data,
        predictions=predictions)
    ####################################################

    return predictions
```
Alternatively, if you have a batch model that runs in jobs, you might add similar code to the very end of your job,
rather than inside the `predict()` function.
### Streaming Integrations
If you write your model's inputs and outputs to a data stream, you can add a listener to that stream to log those
inferences with Arthur. For example, if you have a Kafka topic you might add a new `arthur` consumer group to listen
for new events and pass them to the `send_inferences()` method. If your inputs and predictions live in different topics,
or you want to add non-input data from another topic, you might use [Kafka Streams](https://kafka.apache.org/documentation/streams/)
to join the various topics before sending to Arthur.
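Here's a rough sketch of such a listener using the `kafka-python` client; the topic name, broker address, message format, and batching threshold are all assumptions:
```python
import json
from kafka import KafkaConsumer

# consume inference events and forward them to Arthur in small batches;
# assumes each message is a JSON object representing one inference row
consumer = KafkaConsumer(
    "model-inferences",                 # hypothetical topic name
    bootstrap_servers="localhost:9092", # hypothetical broker address
    group_id="arthur",                  # dedicated consumer group for Arthur logging
    value_deserializer=lambda m: json.loads(m.decode("utf-8")))

buffer = []
for message in consumer:
    buffer.append(message.value)
    if len(buffer) >= 100:  # batch the sends to reduce request overhead
        arthur_model.send_inferences(buffer)
        buffer = []
```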
### Inference Upload Jobs
Another approach is to run jobs that read data at rest and send it to the Arthur platform. These jobs might be
scheduled or event-driven, depending on your architecture.
For example, you might have regularly scheduled jobs that:
1. look up the inference or ground truth data since the last run
1. format the data and write it to a few Parquet files
1. send the Parquet files to the Arthur platform using `send_bulk_inferences()` or `send_bulk_ground_truths()`
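A minimal sketch of such a job; the query helper, staging path, and scheduling mechanism (cron, Airflow, etc.) are left as assumptions:
```python
# 1. look up inferences produced since the last run
#    (fetch_inferences_since_last_run is a hypothetical helper
#    returning a Pandas DataFrame)
new_inferences = fetch_inferences_since_last_run()

# 2. format the data and write it to a Parquet file in a staging directory
new_inferences.to_parquet("staging/inferences.parquet")

# 3. send the staged Parquet files to the Arthur platform
arthur_model.send_bulk_inferences(directory_path="staging/")
```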
### Integrations
Rather than hand-rolling your own inference upload jobs, you can also use one of Arthur's more direct integrations.
For example, our {ref}`SageMaker Data Capture Integration <sagemaker_integration>` makes integrating with SageMaker
models a breeze by using Data Capture to log inferences to files in S3, and triggering upload jobs in
response to those file-write events.
Our {ref}`Batch Ingestion from S3 <s3_batch_ingestion>` integration lets you simply upload your
Parquet files to S3, and Arthur will automatically import them into the system.
```{toctree}
:hidden:
:maxdepth: 3

General Onboarding <self>
cv_onboarding
nlp_onboarding
```