# Arthur Quickstart

From a Python environment with the `arthurai` package installed, this quickstart code will:

1. Make binary classification predictions on a small dataset
2. Onboard the model with reference data to Arthur
3. Log batches of model inference data with Arthur
4. Get performance results for our model

## Imports

The `arthurai` package can be `pip`-installed from the terminal, along with `numpy` and `pandas`: `pip install arthurai numpy pandas`. Then you can import from the `arthurai` package like this:

```python
# Arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage

# other libraries used in this example
import numpy as np
import pandas as pd
```

## Model Predictions

We write out samples from a Titanic survival prediction dataset explicitly in Python, giving the age of each passenger, the cost of their ticket, the passenger class of their ticket, and the ground-truth label of whether they survived. Our model's outputs are produced by a predict function that uses only the `age` variable. We split the data into

* `reference_data` for onboarding the model
* `inference_data` for in-production inferences the model processes

```{note}
We include model outputs, ground-truth values, and non-input data in `reference_data`; these are optional but recommended.
```

```python
# Define Titanic sample data
titanic_data = pd.DataFrame({
    'age': [16.0, 24.0, 19.0, 58.0, 30.0, 22.0, 40.0, 37.0, 65.0, 32.0],
    'fare': [86.5, 49.5042, 8.05, 153.4625, 7.8958, 7.75, 7.8958, 29.7, 7.75, 7.8958],
    'passenger_class': [1, 1, 3, 1, 3, 3, 3, 1, 3, 3],
    'survived': [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]})

# Split into reference and inference data
# (.copy() avoids pandas SettingWithCopyWarning when we add prediction columns below)
reference_data, inference_data = titanic_data[:6].copy(), titanic_data[6:].copy()

# Predict the probability of Titanic survival as the inverse percentile of age
def predict(age):
    nearest_age_index = np.argmin(np.abs(np.sort(reference_data['age']) - age))
    return 1 - (nearest_age_index / (len(reference_data) - 1))

# reference_data and inference_data contain the model's inputs and outputs
reference_data['pred_survived'] = reference_data['age'].apply(predict)
inference_data['pred_survived'] = inference_data['age'].apply(predict)
```

## Onboarding

This code will only run once you enter a valid username and password. We register our `arthur_model` with Arthur as a tabular classifier named "TitanicQuickstart". Then we build its model schema from `reference_data`, specifying which attributes belong to which {ref}`stage <basic_concepts_attributes_and_stages>`. Additionally, we configure extra settings for the `passenger_class` attribute. Then we save the model to the platform.

```python
# Connect to Arthur
arthur = ArthurAI(url="https://app.arthur.ai",
                  login="<YOUR_USERNAME_OR_EMAIL>")

# Register the model type with Arthur
arthur_model = arthur.model(partner_model_id="TitanicQuickstart",
                            input_type=InputType.Tabular,
                            output_type=OutputType.Multiclass)

# Map the PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that the `pred_survived` column represents
# the probability that the ground-truth column has the value 1
pred_to_ground_truth_map = {'pred_survived': 1}

# Build the arthur_model schema on the reference dataset,
# specifying which attribute represents ground truth
# and which attributes are NonInputData.
# Arthur will monitor NonInputData attributes even though they are not model inputs.
arthur_model.build(reference_data,
                   ground_truth_column='survived',
                   pred_to_ground_truth_map=pred_to_ground_truth_map,
                   non_input_columns=['fare', 'passenger_class'])

# Configure the `passenger_class` attribute:
# 1. Turn on bias monitoring for the attribute.
# 2. Specify that passenger_class has possible values [1, 2, 3],
#    since that information is not present in reference_data
#    (only values 1 and 3 appear there).
arthur_model.get_attribute(name='passenger_class').set(monitor_for_bias=True,
                                                       categories=[1, 2, 3])

# Onboard the model to Arthur
arthur_model.save()
```

## Sending Inferences

Here we send batches of inferences from `inference_data` to Arthur.

```python
# Send four batches of inferences
for batch in range(4):
    # Sample the inference dataset with predictions
    inferences = inference_data.sample(np.random.randint(2, 5))

    # Send the inferences to Arthur
    arthur_model.send_inferences(inferences, batch_id=f"batch_{batch}")
```

## Performance Results

With our model onboarded and inferences sent, we can get performance results from Arthur. View your model in your Arthur dashboard, or use the code below to fetch the overall accuracy rate:

```python
# Query model accuracy across the batches
query = {"select": [{"function": "accuracyRate"}]}
query_result = arthur_model.query(query)
```

If you print `query_result`, you should see `[{'accuracyRate': 1}]`.
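As a local sanity check, you can recompute the same accuracy rate with plain pandas and NumPy, independent of the Arthur API: threshold the predicted survival probability at 0.5 to get a class label, then compare it against the ground truth. This sketch rebuilds the sample data and predict function from above (only the columns it needs), so it runs on its own:

```python
import numpy as np
import pandas as pd

# Rebuild the sample data and predict function from the Model Predictions section
titanic_data = pd.DataFrame({
    'age': [16.0, 24.0, 19.0, 58.0, 30.0, 22.0, 40.0, 37.0, 65.0, 32.0],
    'survived': [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]})
reference_data, inference_data = titanic_data[:6].copy(), titanic_data[6:].copy()

def predict(age):
    nearest_age_index = np.argmin(np.abs(np.sort(reference_data['age']) - age))
    return 1 - (nearest_age_index / (len(reference_data) - 1))

inference_data['pred_survived'] = inference_data['age'].apply(predict)

# Threshold the predicted probability at 0.5 to get a predicted class,
# then compare against the ground-truth labels
predicted_class = (inference_data['pred_survived'] > 0.5).astype(int)
local_accuracy = (predicted_class == inference_data['survived']).mean()
print(local_accuracy)  # 1.0 for this sample
```

Every inference row here has `survived = 0` and a predicted probability below 0.5, so the local accuracy is 1.0, consistent with the `[{'accuracyRate': 1}]` result returned by the query above.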
## Next Steps

### {doc}`Basic Concepts <basic_concepts>`

The {doc}`basic_concepts` page contains a quick introduction to important terms and ideas for getting familiar with model monitoring on the Arthur platform.

### {doc}`Onboard Your Model </user-guide/walkthroughs/model-onboarding/index>`

The {doc}`Model Onboarding walkthrough </user-guide/walkthroughs/model-onboarding/index>` page covers the steps of onboarding a model, formatting attribute data, and sending inferences to Arthur.