# Arthur Quickstart
From a Python environment with the `arthurai` package installed, this quickstart code will:
1. Make binary classification predictions on a small dataset
2. Onboard the model with reference data to Arthur
3. Log batches of model inference data with Arthur
4. Get performance results for our model
## Imports
The `arthurai` package can be `pip`-installed from the terminal, along with `numpy` and `pandas`:
`pip install arthurai numpy pandas`.
Then you can import from the `arthurai` package like this:
```python
# Arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage
# other libraries used in this example
import numpy as np
import pandas as pd
```
## Model Predictions
We write out samples from a Titanic survival prediction dataset explicitly in Python,
giving each passenger's age, ticket fare, passenger class,
and the ground-truth label of whether they survived.
Our model's outputs come from a `predict` function that uses only the `age` variable. We split the data into
* `reference_data` for onboarding the model
* `inference_data` for in-production inferences the model processes
```{note}
Model outputs, ground-truth values, and non-input data are optional in reference_data, but including them is recommended.
```
```python
# Define Titanic sample data
titanic_data = pd.DataFrame({
    'age': [16.0, 24.0, 19.0, 58.0, 30.0, 22.0, 40.0, 37.0, 65.0, 32.0],
    'fare': [86.5, 49.5042, 8.05, 153.4625, 7.8958, 7.75, 7.8958, 29.7, 7.75, 7.8958],
    'passenger_class': [1, 1, 3, 1, 3, 3, 3, 1, 3, 3],
    'survived': [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]})

# Split into reference and inference data
# (.copy() avoids a pandas SettingWithCopyWarning when we add columns below)
reference_data, inference_data = titanic_data[:6].copy(), titanic_data[6:].copy()

# Predict the probability of Titanic survival as the inverse percentile
# of the passenger's age among the reference ages
def predict(age):
    nearest_age_index = np.argmin(np.abs(np.sort(reference_data['age']) - age))
    return 1 - (nearest_age_index / (len(reference_data) - 1))

# reference_data and inference_data contain the model's inputs and outputs
reference_data['pred_survived'] = reference_data['age'].apply(predict)
inference_data['pred_survived'] = inference_data['age'].apply(predict)
```
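As a quick sanity check, the inverse-percentile logic can be verified with plain NumPy. This standalone sketch mirrors the reference ages from the split above; the youngest reference age maps to 1.0 and ages at or beyond the oldest map to 0.0:

```python
import numpy as np

# The six reference ages from the quickstart's reference_data split
ages = np.array([16.0, 24.0, 19.0, 58.0, 30.0, 22.0])

def predict(age):
    # Index of the nearest age within the sorted reference ages
    nearest = np.argmin(np.abs(np.sort(ages) - age))
    # Inverse percentile: youngest -> 1.0, oldest -> 0.0
    return 1 - (nearest / (len(ages) - 1))

print(predict(16.0))  # youngest reference age -> 1.0
print(predict(65.0))  # older than every reference age -> 0.0
```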
## Onboarding
This code runs only after you enter a valid Arthur username and password.
We register our `arthur_model` with Arthur as a tabular classifier with the name "TitanicQuickstart".
Then we build its model schema from `reference_data`, specifying which attributes are in which {ref}`stage <basic_concepts_attributes_and_stages>`.
Additionally, we configure extra settings for the `passenger_class` attribute. Then we save the model to the platform.
```python
# Connect to Arthur
arthur = ArthurAI(url="https://app.arthur.ai",
                  login="<YOUR_USERNAME_OR_EMAIL>")

# Register the model type with Arthur
arthur_model = arthur.model(partner_model_id="TitanicQuickstart",
                            input_type=InputType.Tabular,
                            output_type=OutputType.Multiclass)

# Map the PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that the `pred_survived` column represents
# the probability that the ground-truth column has the value 1
pred_to_ground_truth_map = {'pred_survived': 1}

# Build the arthur_model schema on the reference dataset,
# specifying which attribute represents ground truth
# and which attributes are NonInputData.
# Arthur will monitor NonInputData attributes even though they are not model inputs.
arthur_model.build(reference_data,
                   ground_truth_column='survived',
                   pred_to_ground_truth_map=pred_to_ground_truth_map,
                   non_input_columns=['fare', 'passenger_class'])

# Configure the `passenger_class` attribute:
# 1. Turn on bias monitoring for the attribute.
# 2. Specify that passenger_class has possible values [1, 2, 3],
#    since reference_data contains only the values 1 and 3.
arthur_model.get_attribute(name='passenger_class').set(monitor_for_bias=True,
                                                       categories=[1, 2, 3])

# Onboard the model to Arthur
arthur_model.save()
```
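For intuition on `pred_to_ground_truth_map`: `pred_survived` is the probability of class 1, so the implied probability of class 0 is its complement, and a hard label falls out of a 0.5 threshold. A standalone pandas sketch (the probabilities below are the `predict` outputs for the four inference ages; the added column names are hypothetical, not part of the Arthur schema):

```python
import pandas as pd

# Predicted survival probabilities for the four inference rows
preds = pd.DataFrame({"pred_survived": [0.2, 0.2, 0.0, 0.2]})

# The implied probability of the negative class is the complement
preds["pred_not_survived"] = 1 - preds["pred_survived"]

# Hard class label at a 0.5 threshold
preds["predicted_class"] = (preds["pred_survived"] >= 0.5).astype(int)
```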
## Sending Inferences
Here we send batches of inferences from `inference_data` to Arthur.
```python
# Send four batches of inferences
for batch in range(4):
    # Sample a few rows (with predictions) from the inference dataset
    inferences = inference_data.sample(np.random.randint(2, 5))
    # Send the inferences to Arthur
    arthur_model.send_inferences(inferences, batch_id=f"batch_{batch}")
```
## Performance Results
With our model onboarded and inferences sent, we can get performance results from Arthur. View your model in your
Arthur dashboard, or use the code below to fetch the overall accuracy rate:
```python
# Query model accuracy across the batches
query = {
    "select": [
        {"function": "accuracyRate"}
    ]
}
query_result = arthur_model.query(query)
```
If you print `query_result`, you should see `[{'accuracyRate': 1}]`: at a 0.5 threshold, every inference's predicted class matches its ground-truth label.
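Richer queries are possible, such as accuracy per batch. The `group_by` clause sketched below is an assumption based on common Arthur query patterns, not a confirmed API shape; check the query documentation for your Arthur version before relying on it:

```python
# Hypothetical query: accuracy grouped by batch_id (verify the group_by
# syntax against your Arthur version's query API documentation)
batch_query = {
    "select": [
        {"function": "accuracyRate"},
        {"property": "batch_id"}
    ],
    "group_by": [
        {"property": "batch_id"}
    ]
}
# batch_result = arthur_model.query(batch_query)
```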
## Next Steps
### {doc}`Basic Concepts <basic_concepts>`
The {doc}`basic_concepts` page gives a quick introduction to the key terms and ideas of
model monitoring with the Arthur platform.
### {doc}`Onboard Your Model </user-guide/walkthroughs/model-onboarding/index>`
The {doc}`Model Onboarding walkthrough </user-guide/walkthroughs/model-onboarding/index>` page covers the steps of onboarding a model, formatting attribute
data, and sending inferences to Arthur.