# Arthur Quickstart
From a Python environment with the `arthurai` package installed, this quickstart code will:
1. Make binary classification predictions on a small dataset
2. Onboard the model with reference data to Arthur
3. Log batches of model inference data with Arthur
4. Get performance results for our model
## Imports
The `arthurai` package can be `pip`-installed from the terminal, along with `numpy` and `pandas`:
`pip install arthurai numpy pandas`.
Then you can import from the `arthurai` package like this:
```python
# Arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage
# other libraries used in this example
import numpy as np
import pandas as pd
```
## Model Predictions
We write out samples from a Titanic survival prediction dataset explicitly in Python,
giving each passenger's age, ticket fare, passenger class,
and the ground-truth label of whether they survived.
Our model's outputs come from a `predict` function that uses only the `age` variable. We split the data into
* `reference_data` for onboarding the model
* `inference_data` for in-production inferences the model processes
```{note}
Model outputs, ground-truth values, and non-input data are optional in reference_data, but including them is recommended.
```
```python
# Define Titanic sample data
titanic_data = pd.DataFrame({
    'age': [16.0, 24.0, 19.0, 58.0, 30.0, 22.0, 40.0, 37.0, 65.0, 32.0],
    'fare': [86.5, 49.5042, 8.05, 153.4625, 7.8958, 7.75, 7.8958, 29.7, 7.75, 7.8958],
    'passenger_class': [1, 1, 3, 1, 3, 3, 3, 1, 3, 3],
    'survived': [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]})

# Split into reference and inference data
# (.copy() avoids a pandas SettingWithCopyWarning when we add columns below)
reference_data, inference_data = titanic_data[:6].copy(), titanic_data[6:].copy()

# Predict the probability of Titanic survival as the inverse percentile
# of the passenger's age among the reference ages
def predict(age):
    nearest_age_index = np.argmin(np.abs(np.sort(reference_data['age']) - age))
    return 1 - (nearest_age_index / (len(reference_data) - 1))

# reference_data and inference_data contain the model's inputs and outputs
reference_data['pred_survived'] = reference_data['age'].apply(predict)
inference_data['pred_survived'] = inference_data['age'].apply(predict)
```
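As a quick sanity check, the inverse-percentile logic can be verified with plain NumPy. This standalone sketch mirrors the reference ages from the split above; the youngest reference age maps to 1.0 and ages at or beyond the oldest map to 0.0:

```python
import numpy as np

# The six reference ages from the quickstart's reference_data split
ages = np.array([16.0, 24.0, 19.0, 58.0, 30.0, 22.0])

def predict(age):
    # Index of the nearest age within the sorted reference ages
    nearest = np.argmin(np.abs(np.sort(ages) - age))
    # Inverse percentile: youngest -> 1.0, oldest -> 0.0
    return 1 - (nearest / (len(ages) - 1))

print(predict(16.0))  # youngest reference age -> 1.0
print(predict(65.0))  # older than every reference age -> 0.0
```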
## Onboarding
This code runs only after you enter a valid Arthur username and password.
We register our `arthur_model` with Arthur as a tabular classifier with the name "TitanicQuickstart".
Then we build its model schema from `reference_data`, specifying which attributes are in which {ref}`stage <basic_concepts_attributes_and_stages>`.
Additionally, we configure extra settings for the `passenger_class` attribute. Then we save the model to the platform.
```python
# Connect to Arthur
arthur = ArthurAI(url="https://app.arthur.ai",
                  login="<YOUR_USERNAME_OR_EMAIL>")

# Register the model type with Arthur
arthur_model = arthur.model(partner_model_id="TitanicQuickstart",
                            input_type=InputType.Tabular,
                            output_type=OutputType.Multiclass)

# Map the PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that the `pred_survived` column represents
# the probability that the ground-truth column has the value 1
pred_to_ground_truth_map = {'pred_survived': 1}

# Build the arthur_model schema on the reference dataset,
# specifying which attribute represents ground truth
# and which attributes are NonInputData.
# Arthur will monitor NonInputData attributes even though they are not model inputs.
arthur_model.build(reference_data,
                   ground_truth_column='survived',
                   pred_to_ground_truth_map=pred_to_ground_truth_map,
                   non_input_columns=['fare', 'passenger_class'])

# Configure the `passenger_class` attribute:
# 1. Turn on bias monitoring for the attribute.
# 2. Specify that passenger_class has possible values [1, 2, 3],
#    since reference_data contains only the values 1 and 3.
arthur_model.get_attribute(name='passenger_class').set(monitor_for_bias=True,
                                                       categories=[1, 2, 3])

# Onboard the model to Arthur
arthur_model.save()
```
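For intuition on `pred_to_ground_truth_map`: `pred_survived` is the probability of class 1, so the implied probability of class 0 is its complement, and a hard label falls out of a 0.5 threshold. A standalone pandas sketch (the probabilities below are the `predict` outputs for the four inference ages; the added column names are hypothetical, not part of the Arthur schema):

```python
import pandas as pd

# Predicted survival probabilities for the four inference rows
preds = pd.DataFrame({"pred_survived": [0.2, 0.2, 0.0, 0.2]})

# The implied probability of the negative class is the complement
preds["pred_not_survived"] = 1 - preds["pred_survived"]

# Hard class label at a 0.5 threshold
preds["predicted_class"] = (preds["pred_survived"] >= 0.5).astype(int)
```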
## Sending Inferences
Here we send batches of inferences from `inference_data` to Arthur.
```python
# Send four batches of inferences
for batch in range(4):
    # Sample a few rows (with predictions) from the inference dataset
    inferences = inference_data.sample(np.random.randint(2, 5))
    # Send the inferences to Arthur
    arthur_model.send_inferences(inferences, batch_id=f"batch_{batch}")
```
## Performance Results
With our model onboarded and inferences sent, we can get performance results from Arthur. View your model in your
Arthur dashboard, or use the code below to fetch the overall accuracy rate:
```python
# Query model accuracy across the batches
query = {
    "select": [
        {"function": "accuracyRate"}
    ]
}
query_result = arthur_model.query(query)
```
If you print `query_result`, you should see `[{'accuracyRate': 1}]`: at a 0.5 threshold, every inference's predicted class matches its ground-truth label.
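Richer queries are possible, such as accuracy per batch. The `group_by` clause sketched below is an assumption based on common Arthur query patterns, not a confirmed API shape; check the query documentation for your Arthur version before relying on it:

```python
# Hypothetical query: accuracy grouped by batch_id (verify the group_by
# syntax against your Arthur version's query API documentation)
batch_query = {
    "select": [
        {"function": "accuracyRate"},
        {"property": "batch_id"}
    ],
    "group_by": [
        {"property": "batch_id"}
    ]
}
# batch_result = arthur_model.query(batch_query)
```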
## Next Steps
### {doc}`Basic Concepts <basic_concepts>`
The {doc}`basic_concepts` page gives a quick introduction to the key terms and ideas of
model monitoring with the Arthur platform.
### {doc}`Onboard Your Model </user-guide/walkthroughs/model-onboarding/index>`
The {doc}`Model Onboarding walkthrough </user-guide/walkthroughs/model-onboarding/index>` page covers the steps of onboarding a model, formatting attribute
data, and sending inferences to Arthur.