# Arthur Quickstart
From a Python environment with the `arthurai` package installed, this quickstart code will:
1. Make binary classification predictions on a small dataset
2. Onboard the model with reference data to Arthur
3. Log batches of model inference data with Arthur
4. Get performance results for our model
## Imports
The `arthurai` package can be `pip`-installed from the terminal, along with `numpy` and `pandas`:
`pip install arthurai numpy pandas`
Then you can import from the `arthurai` package like this:
```python
# Arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage

# other libraries used in this example
import numpy as np
import pandas as pd
```
## Model Predictions
We write out samples from a Titanic survival prediction dataset explicitly in Python,
giving the age of each passenger, the cost of their ticket, the passenger class of their ticket,
and the ground-truth label of whether they survived.
Our model's outputs are given by a predict function that uses only the `age` variable. We split the data into
* `reference_data` for onboarding the model
* `inference_data` for the in-production inferences the model processes
```{note}
We include model outputs, ground-truth values, and non-input data in `reference_data`; these are optional but recommended.
```
```python
# Define Titanic sample data
titanic_data = pd.DataFrame({
    'age': [16.0, 24.0, 19.0, 58.0, 30.0, 22.0, 40.0, 37.0, 65.0, 32.0],
    'fare': [86.5, 49.5042, 8.05, 153.4625, 7.8958, 7.75, 7.8958, 29.7, 7.75, 7.8958],
    'passenger_class': [1, 1, 3, 1, 3, 3, 3, 1, 3, 3],
    'survived': [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]})

# Split into reference and inference data
# (copies, so the column assignments below don't trigger pandas' SettingWithCopyWarning)
reference_data, inference_data = titanic_data[:6].copy(), titanic_data[6:].copy()

# Predict the probability of Titanic survival as the inverse percentile of age
def predict(age):
    nearest_age_index = np.argmin(np.abs(np.sort(reference_data['age']) - age))
    return 1 - (nearest_age_index / (len(reference_data) - 1))

# reference_data and inference_data contain the model's inputs and outputs
reference_data['pred_survived'] = reference_data['age'].apply(predict)
inference_data['pred_survived'] = inference_data['age'].apply(predict)
```
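To make the inverse-percentile idea concrete, here is a small standalone sketch of the same `predict` logic, using the six reference ages from the sample data above (the helper below is self-contained, not part of the `arthurai` API):

```python
import numpy as np
import pandas as pd

# The six reference ages from the quickstart's reference_data split
reference_ages = pd.Series([16.0, 24.0, 19.0, 58.0, 30.0, 22.0])

def predict(age):
    # Index of the sorted reference age closest to this passenger's age
    nearest_age_index = np.argmin(np.abs(np.sort(reference_ages) - age))
    # Younger passengers (lower index) get higher survival probability
    return 1 - (nearest_age_index / (len(reference_ages) - 1))

print(predict(16.0))           # 1.0 (the youngest reference age)
print(round(predict(40.0), 2)) # 0.2 (nearest sorted age is 30, index 4 of 0-5)
print(predict(65.0))           # 0.0 (nearest sorted age is 58, the oldest)
```

Because the probability depends only on the rank of the nearest reference age, the model never predicts survival (probability ≥ 0.5) for any passenger older than the reference median.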
## Onboarding
The code below will only run once you enter a valid Arthur username (or email) and password.
We register our `arthur_model` with Arthur as a tabular classifier named "TitanicQuickstart".
Then we build its model schema from `reference_data`, specifying which attributes belong to which {ref}`stage <basic_concepts_attributes_and_stages>`.
Additionally, we configure extra settings for the `passenger_class` attribute. Finally, we save the model to the platform.
```python
# Connect to Arthur
arthur = ArthurAI(url="https://app.arthur.ai",
                  login="<YOUR_USERNAME_OR_EMAIL>")

# Register the model type with Arthur
arthur_model = arthur.model(partner_model_id="TitanicQuickstart",
                            input_type=InputType.Tabular,
                            output_type=OutputType.Multiclass)

# Map the PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that the `pred_survived` column represents
# the probability that the ground-truth column has the value 1
pred_to_ground_truth_map = {'pred_survived': 1}

# Build the arthur_model schema on the reference dataset,
# specifying which attribute represents ground truth
# and which attributes are NonInputData.
# Arthur will monitor NonInputData attributes even though they are not model inputs.
arthur_model.build(reference_data,
                   ground_truth_column='survived',
                   pred_to_ground_truth_map=pred_to_ground_truth_map,
                   non_input_columns=['fare', 'passenger_class'])

# Configure the `passenger_class` attribute:
# 1. Turn on bias monitoring for the attribute.
# 2. Specify that the passenger_class attribute has possible values [1, 2, 3],
#    since that information was not present in reference_data (only values 1 and 3 appear).
arthur_model.get_attribute(name='passenger_class').set(monitor_for_bias=True,
                                                       categories=[1, 2, 3])

# Onboard the model to Arthur
arthur_model.save()
```
## Sending Inferences
Here we send batches of inferences from `inference_data` to Arthur.
```python
# Send four batches of inferences
for batch in range(4):
    # Sample the inference dataset with predictions
    inferences = inference_data.sample(np.random.randint(2, 5))
    # Send the inferences to Arthur
    arthur_model.send_inferences(inferences, batch_id=f"batch_{batch}")
```
## Performance Results
With our model onboarded and inferences sent, we can get performance results from Arthur. View your model in your
Arthur dashboard, or use the code below to fetch the overall accuracy rate:
```python
# Query model accuracy across the batches
query = {
    "select": [
        {
            "function": "accuracyRate"
        }
    ]
}
query_result = arthur_model.query(query)
```
If you print `query_result`, you should see `[{'accuracyRate': 1}]`.
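As a local sanity check (independent of the Arthur API), you can reproduce that accuracy by thresholding the predictions at 0.5 and comparing against the ground truth; the 0.5 cutoff is an assumption about how the probabilities are binned into classes, and the values below come from the sample data and `predict` function above:

```python
import pandas as pd

# The four inference rows, with their ground truth and (approximate) predictions
inferences = pd.DataFrame({
    'survived':      [0, 0, 0, 0],
    'pred_survived': [0.2, 0.2, 0.0, 0.2],
})

# Threshold the survival probability at 0.5 to get a predicted class
predicted_class = (inferences['pred_survived'] >= 0.5).astype(int)

# Fraction of rows where the predicted class matches ground truth
accuracy = (predicted_class == inferences['survived']).mean()
print(accuracy)  # 1.0
```

Every prediction falls below 0.5 and every passenger in the inference split did not survive, so the accuracy is 1.0, matching the `accuracyRate` returned by the query.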
## Next Steps
### {doc}`Basic Concepts <basic_concepts>`
The {doc}`basic_concepts` page gives a quick introduction to the key terms and ideas behind
model monitoring with the Arthur platform.
### {doc}`Onboard Your Model </user-guide/walkthroughs/model-onboarding/index>`
The {doc}`Model Onboarding walkthrough </user-guide/walkthroughs/model-onboarding/index>` covers the steps of onboarding a model, formatting attribute
data, and sending inferences to Arthur.