	# Model Evaluation Functions

	## Regression
	All regression evaluation functions use the following request body structure.

	Query Request:
	```json
	{
	"select": [
	{
	"function": "[rmse|mae|rSquared]",
	"alias": "<alias_name> [optional string]",
	"parameters": {
	"ground_truth_property": "<attribute_name> [string]",
	"predicted_property": "<attribute_name> [string]"
	}
	}
	]
	}
	```
	Query Response:
	```json
	{
	"query_result": [
	{
	"<function_name/alias_name>": "<evaluation_value> [float]"
	}
	]
	}
	```
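If you build these request bodies in code, a small helper keeps the structure consistent. The sketch below is a hypothetical Python helper (`regression_query` is an illustrative name, not part of any Arthur SDK) that only assembles the JSON body; sending it to the query endpoint is left to your own HTTP client.

```python
import json

# Function names accepted by the regression request structure above.
REGRESSION_FUNCTIONS = {"rmse", "mae", "rSquared"}


def regression_query(function, ground_truth_property, predicted_property, alias=None):
    """Build a regression evaluation query body (hypothetical helper)."""
    if function not in REGRESSION_FUNCTIONS:
        raise ValueError(f"unsupported regression function: {function!r}")
    select = {
        "function": function,
        "parameters": {
            "ground_truth_property": ground_truth_property,
            "predicted_property": predicted_property,
        },
    }
    if alias is not None:  # alias is optional, so omit the key when unset
        select["alias"] = alias
    return {"select": [select]}


if __name__ == "__main__":
    body = regression_query("rmse", "FICO_actual", "FICO_predicted", alias="error")
    print(json.dumps(body, indent=2))
```

Validating the function name locally catches typos before the request reaches the API.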

	### RMSE
	Get the root mean squared error (RMSE) between a prediction attribute and a ground truth attribute.

	Sample Request:
	```json
	{
	"select": [
	{
	"function": "rmse",
	"alias": "error",
	"parameters": {
	"ground_truth_property": "FICO_actual",
	"predicted_property": "FICO_predicted"
	}
	}
	]
	}
	```
	Sample Response:
	```json
	{
	"query_result": [
	{
	"error": 0.76
	}
	]
	}
	```
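To sanity-check an `rmse` result against a local sample, the textbook definition (the square root of the mean squared difference between prediction and ground truth) can be computed directly. This is a plain-Python sketch of the standard formula, not Arthur's server-side implementation; the sample values are made up.

```python
import math


def rmse(ground_truth, predicted):
    """Root mean squared error: sqrt(mean((predicted - ground_truth) ** 2))."""
    if len(ground_truth) != len(predicted) or not ground_truth:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(p - g) ** 2 for g, p in zip(ground_truth, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Errors of 1, -1, 3 and -3 give sqrt((1 + 1 + 9 + 9) / 4) = sqrt(5).
print(rmse([700, 650, 720, 680], [701, 649, 723, 677]))
```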
	[back to top](#model-evaluation-functions)

	### MAE
	Get the mean absolute error (MAE) between a prediction attribute and a ground truth attribute.
	The optional `aggregation` parameter swaps the default aggregation, `"avg"`, for `"min"` or `"max"`.
	This is helpful when you are looking for extremes: the lowest or highest absolute error, respectively.
	The function also supports the optional numeric parameters `normalizationMin` and `normalizationMax`;
	when both are provided, the values are min/max normalized before aggregation.
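The `aggregation` parameter changes only the final reduction over the per-inference absolute errors. The sketch below mirrors that behavior in plain Python for local comparison. It deliberately does not model `normalizationMin`/`normalizationMax`, because this page does not say exactly which values the normalization is applied to.

```python
def absolute_error(ground_truth, predicted, aggregation="avg"):
    """Aggregate per-inference absolute errors with avg (MAE), min, or max."""
    errors = [abs(p - g) for g, p in zip(ground_truth, predicted)]
    if not errors:
        raise ValueError("no inferences to aggregate")
    if aggregation == "avg":
        return sum(errors) / len(errors)
    if aggregation == "min":
        return min(errors)
    if aggregation == "max":
        return max(errors)
    raise ValueError(f"unknown aggregation: {aggregation!r}")


actual = [700, 650, 720]
predicted = [702, 650, 716]  # absolute errors: 2, 0, 4
for agg in ("avg", "min", "max"):
    print(agg, absolute_error(actual, predicted, agg))  # avg 2.0, min 0, max 4
```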

	Query Request:
	```json
	{
	"select": [
	{
	"function": "mae",
	"alias": "<alias_name> [optional string]",
	"parameters": {
	"predicted_property": "<predicted_property_name> [string]",
	"ground_truth_property": "<ground_truth_property_name> [string]",
	"aggregation": "[avg|min|max] (default avg, optional)",
	"normalizationMin": "<value> [optional number]",
	"normalizationMax": "<value> [optional number]"
	}
	}
	]
	}
	```

	Sample Request:
	```json
	{
	"select": [
	{
	"function": "mae",
	"alias": "error",
	"parameters": {
	"ground_truth_property": "FICO_actual",
	"predicted_property": "FICO_predicted"
	}
	}
	]
	}
	```
	Sample Response:
	```json
	{
	"query_result": [
	{
	"error": 0.76
	}
	]
	}
	```
	[back to top](#model-evaluation-functions)


	### R Squared
	Get the R-squared value (coefficient of determination) between a prediction attribute and a ground truth attribute.

	Sample Request:
	```json
	{
	"select": [
	{
	"function": "rSquared",
	"alias": "rsq",
	"parameters": {
	"ground_truth_property": "FICO_actual",
	"predicted_property": "FICO_predicted"
	}
	}
	]
	}
	```
	Sample Response:
	```json
	{
	"query_result": [
	{
	"rsq": 0.94
	}
	]
	}
	```
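R-squared compares the model's squared error with that of always predicting the mean ground truth value: `1 - SS_res / SS_tot`. A plain-Python sketch of that textbook definition, useful for spot-checking `rSquared` on a local sample; Arthur's implementation may handle edge cases such as constant ground truth differently.

```python
def r_squared(ground_truth, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_gt = sum(ground_truth) / len(ground_truth)
    ss_res = sum((g - p) ** 2 for g, p in zip(ground_truth, predicted))
    ss_tot = sum((g - mean_gt) ** 2 for g in ground_truth)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when ground truth is constant")
    return 1 - ss_res / ss_tot


# SS_tot = 5.0 and SS_res = 0.1, so R-squared is about 0.98.
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```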
	[back to top](#model-evaluation-functions)

	## Binary Classification
	When any binary classification evaluation function is used with a multiclass model, outputs are calculated using a one-vs-all approach.

	### Confusion Matrix
	Calculates the confusion matrix for a classification model. For binary classifiers, users must specify a probability `threshold` that determines whether a prediction counts as the positive class.

	Query Request:
	```json
	{
	"select": [
	{
	"function": "confusionMatrix",
	"alias": "<alias_name> [optional string]",
	"parameters": {
	"threshold": "<value [float]> [required only for binary classifiers]"
	}
	}
	]
	}
	```
	Query Response:
	```json
	{
	"query_result": [
	{
	"<function_name/alias_name>": {
	"true_positive": "<count> [int]",
	"false_positive": "<count> [int]",
	"true_negative": "<count> [int]",
	"false_negative": "<count> [int]"
	}
	}
	]
	}
	```

	Sample Request: Calculate the confusion matrix for a binary classifier with a threshold of 0.5 (a common default threshold).
	```json
	{
	"select": [
	{
	"function": "confusionMatrix",
	"parameters": {
	"threshold": 0.5
	}
	}
	]
	}
	```
	Sample Response:
	```json
	{
	"query_result": [
	{
	"confusionMatrix": {
	"true_positive": 100480,
	"false_positive": 100076,
	"true_negative": 100302,
	"false_negative": 99142
	}
	}
	]
	}
	```
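For a binary classifier, the `threshold` turns each positive-class probability into a 0/1 prediction before the four cells are counted. The sketch below assumes a probability at or above the threshold counts as positive; this page does not say how Arthur treats a probability exactly equal to the threshold. The labels and probabilities are made up.

```python
def confusion_matrix(labels, probabilities, threshold=0.5):
    """Count TP/FP/TN/FN for binary labels (0/1) and positive-class probabilities.

    Assumption: a probability >= threshold is predicted positive.
    """
    counts = {"true_positive": 0, "false_positive": 0,
              "true_negative": 0, "false_negative": 0}
    for label, prob in zip(labels, probabilities):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            counts["true_positive"] += 1
        elif predicted_positive:
            counts["false_positive"] += 1
        elif label == 1:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts


print(confusion_matrix([1, 0, 1, 0, 1], [0.9, 0.7, 0.4, 0.2, 0.5]))
```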
	[back to top](#model-evaluation-functions)


	### Confusion Matrix Rate
	Calculates the confusion matrix rates for a classification model. For binary classifiers, users must specify a probability `threshold` that determines whether a prediction counts as the positive class.

	Query Request:
	```json
	{
	"select": [
	{
	"function": "confusionMatrixRate",
	"alias": "<alias_name> [optional string]",
	"parameters": {
	"threshold": "<value [float]> [required only for binary classifiers]"
	}
	}
	]
	}
	```
	Query Response:
	```json
	{
	"query_result": [
	{
	"<function_name/alias_name>": {
	"true_positive_rate": "<rate> [float]",
	"false_positive_rate": "<rate> [float]",
	"true_negative_rate": "<rate> [float]",
	"false_negative_rate": "<rate> [float]",
	"accuracy_rate": "<rate> [float]"
	}
	}
	]
	}
	```


	Sample Request: Calculate the confusion matrix rates for a binary classifier with a threshold of 0.5 (a common default threshold).
	```json
	{
	"select": [
	{
	"function": "confusionMatrixRate",
	"parameters": {
	"threshold": 0.5
	}
	}
	]
	}
	```
	Response:
	```json
	{
	"query_result": [
	{
	"confusionMatrixRate": {
	"true_positive_rate": 0.5033513340213003,
	"false_positive_rate": 0.49943606583557076,
	"true_negative_rate": 0.5005639341644292,
	"false_negative_rate": 0.4966486659786997
	}
	}
	]
	}
	```
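Each rate is a ratio of confusion matrix counts: the positive-side rates divide by the number of actual positives (TP + FN) and the negative-side rates by the number of actual negatives (FP + TN). Plugging the counts from the Confusion Matrix sample response into these standard formulas reproduces the four rates shown here. A plain-Python sketch, assuming Arthur uses the standard definitions:

```python
def confusion_matrix_rates(tp, fp, tn, fn):
    """Standard rates derived from binary confusion matrix counts."""
    actual_pos = tp + fn  # inferences whose ground truth is positive
    actual_neg = fp + tn  # inferences whose ground truth is negative
    return {
        "true_positive_rate": tp / actual_pos,
        "false_positive_rate": fp / actual_neg,
        "true_negative_rate": tn / actual_neg,
        "false_negative_rate": fn / actual_pos,
        "accuracy_rate": (tp + tn) / (actual_pos + actual_neg),
    }


# Counts from the Confusion Matrix sample response.
rates = confusion_matrix_rates(tp=100480, fp=100076, tn=100302, fn=99142)
print(rates["true_positive_rate"])  # ≈ 0.50335
```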

	[back to top](#model-evaluation-functions)


	### Confusion Matrix Variants

	If you only want a specific metric derived from a confusion matrix, you can use one of the following functions:
	* `truePositiveRate`
	* `falsePositiveRate`
	* `trueNegativeRate`
	* `falseNegativeRate`
	* `accuracyRate`
	* `balancedAccuracyRate`
	* `f1`
	* `sensitivity`
	* `specificity`
	* `precision`
	* `recall`

	For example, to return the `truePositiveRate`:
	```json
	{
	"select": [
	{
	"function": "truePositiveRate",
	"parameters": {
	"threshold": 0.5,
	"ground_truth_property": "ground_truth_a",
	"predicted_property": "class_a"
	}
	}
	]
	}
	```
	Response:
	```json
	{
	"query_result": [
	{
	"truePositiveRate": 0.5033513340213003
	}
	]
	}
	```
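Most of these variants are textbook identities over the same four counts: `sensitivity` and `recall` equal the true positive rate, `specificity` is the true negative rate, `balancedAccuracyRate` averages the true positive and true negative rates, and `f1` is the harmonic mean of `precision` and `recall`. The Python sketch below assumes Arthur follows these standard definitions; the counts are made up.

```python
def derived_metrics(tp, fp, tn, fn):
    """Standard metrics derived from confusion matrix counts."""
    recall = tp / (tp + fn)       # also sensitivity / true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
        "balancedAccuracyRate": (recall + specificity) / 2,
        "f1": 2 * precision * recall / (precision + recall),
    }


print(derived_metrics(tp=8, fp=2, tn=6, fn=4))
```

Equivalently, `f1` can be written as `2 * TP / (2 * TP + FP + FN)`, which is 16 / 22 for these counts.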

	[back to top](#model-evaluation-functions)

	### AUC

	The [Area Under the ROC Curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve)
	can also be computed for binary classifiers.

	Sample Query:
	```json
	{
	"select": [
	{
	"function": "auc",
	"parameters": {
	"ground_truth_property": "ground_truth_a",
	"predicted_property": "class_a"
	}
	}
	]
	}
	```
	Response:
	```json
	{
	"query_result": [
	{
	"auc": 0.9192331426352897
	}
	]
	}
	```
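AUC equals the probability that a randomly chosen positive inference receives a higher score than a randomly chosen negative one, with ties counting as half. The pairwise sketch below computes exactly that quantity. It is O(positives × negatives), so it is only meant for checking small samples, and it is not necessarily how Arthur computes `auc`.

```python
def auc(labels, scores):
    """ROC AUC via pairwise comparison of positive vs negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))


print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```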

	## Multiclass Classification

	### Multiclass Accuracy Rate
	Calculates the global accuracy rate: the fraction of inferences whose predicted class matches the ground truth class.

	Query Request:
	```json
	{
	"select": [
	{
	"function": "accuracyRateMulticlass",
	"alias": "<alias_name> [optional string]"
	}
	]
	}
	```
	Query Response:
	```json
	{
	"query_result": [
	{
	"<function_name/alias_name>": "<rate> [float]"
	}
	]
	}
	```


	Example:
	```json
	{
	"select": [
	{
	"function": "accuracyRateMulticlass"
	}
	]
	}
	```
	Response:
	```json
	{
	"query_result": [
	{
	"accuracyRateMulticlass": 0.785
	}
	]
	}
	```
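For a local spot check, global accuracy is just the share of matching predictions. A minimal sketch, assuming predictions and ground truth labels are available as parallel Python lists (the data is made up):

```python
def multiclass_accuracy(ground_truth, predicted):
    """Fraction of inferences where the predicted class matches ground truth."""
    if len(ground_truth) != len(predicted) or not ground_truth:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(g == p for g, p in zip(ground_truth, predicted))
    return correct / len(ground_truth)


print(multiclass_accuracy(["a", "b", "c", "a"], ["a", "b", "a", "a"]))  # 0.75
```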
	[back to top](#model-evaluation-functions)

	### Multiclass Confusion Matrix
	Calculates the confusion matrix for a multiclass model with regard to a single class. The predicted attribute and ground truth attribute must be passed as parameters.

	Query Request:
	```json
	{
	"select": [
	{
	"function": "confusionMatrixMulticlass",
	"alias": "<alias_name> [optional string]",
	"parameters": {
	"predicted_property": "<predicted_property_name>",
	"ground_truth_property": "<ground_truth_property_name>"
	}
	}
	]
	}
	```
	Query Response:
	```json
	{
	"query_result": [
	{
	"<function_name/alias_name>": {
	"true_positive": "<count> [int]",
	"false_positive": "<count> [int]",
	"true_negative": "<count> [int]",
	"false_negative": "<count> [int]"
	}
	}
	]
	}
	```

	Example:
	```json
	{
	"select": [
	{
	"function": "confusionMatrixMulticlass",
	"parameters": {
	"predicted_property": "predicted_class_A",
	"ground_truth_property": "gt_predicted_class_A"
	}
	}
	]
	}
	```
	Response:
	```json
	{
	"query_result": [
	{
	"confusionMatrixMulticlass": {
	"true_positive": 100480,
	"false_positive": 100076,
	"true_negative": 100302,
	"false_negative": 99142
	}
	}
	]
	}
	```
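With respect to a single class, the multiclass confusion matrix is a one-vs-all binary confusion matrix: the chosen class is treated as positive and every other class as negative. A sketch over hypothetical local lists of predicted and ground truth class labels:

```python
def one_vs_all_confusion_matrix(ground_truth, predicted, positive_class):
    """Binary confusion matrix treating `positive_class` as positive."""
    counts = {"true_positive": 0, "false_positive": 0,
              "true_negative": 0, "false_negative": 0}
    for g, p in zip(ground_truth, predicted):
        actual = g == positive_class
        guessed = p == positive_class
        if guessed and actual:
            counts["true_positive"] += 1
        elif guessed:
            counts["false_positive"] += 1
        elif actual:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts


gt = ["A", "B", "A", "C", "B"]
pred = ["A", "A", "B", "C", "B"]
print(one_vs_all_confusion_matrix(gt, pred, "A"))
```

Note that a `B` inference predicted as `C` still counts as a true negative for class `A`: one-vs-all only asks whether each side is `A`.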
	[back to top](#model-evaluation-functions)


	### Multiclass Confusion Matrix Rate
	Calculates the confusion matrix rates for a multiclass classification model with regard to a single predicted class.

	Query Request:
	```json
	{
	"select": [
	{
	"function": "confusionMatrixRateMulticlass",
	"alias": "<alias_name> [optional string]",
	"parameters": {
	"predicted_property": "<predicted_property_name>",
	"ground_truth_property": "<ground_truth_property_name>"
	}
	}
	]
	}
	```
	Query Response:
	```json
	{
	"query_result": [
	{
	"<function_name/alias_name>": {
	"true_positive_rate": "<rate> [float]",
	"false_positive_rate": "<rate> [float]",
	"true_negative_rate": "<rate> [float]",
	"false_negative_rate": "<rate> [float]",
	"accuracy_rate": "<rate> [float]",
	"balanced_accuracy_rate": "<rate> [float]",
	"precision": "<rate> [float]",
	"f1": "<rate> [float]"
	}
	}
	]
	}
	```

	Example calculating the confusion matrix rates:
	```json
	{
	"select": [
	{
	"function": "confusionMatrixRateMulticlass",
	"parameters": {
	"predicted_property": "predicted_class_A",
	"ground_truth_property": "gt_predicted_class_A"
	}
	}
	]
	}
	```
	Response:
	```json
	{
	"query_result": [
	{
	"confusionMatrixRateMulticlass": {
	"true_positive_rate": 0.6831683168316832,
	"false_positive_rate": 0.015653220951234198,
	"true_negative_rate": 0.9843467790487658,
	"false_negative_rate": 0.31683168316831684,
	"accuracy_rate": 0.9378818737270875,
	"balanced_accuracy_rate": 0.8337575479402245,
	"precision": 0.8884120171673819,
	"f1": 0.7723880597014925
	}
	}
	]
	}
	```
	[back to top](#model-evaluation-functions)


	If you only want a specific value from the confusion matrix rate function, you can use one of the following functions:
	* `truePositiveRateMulticlass`
	* `falsePositiveRateMulticlass`
	* `trueNegativeRateMulticlass`
	* `falseNegativeRateMulticlass`

	For example, to return the `truePositiveRateMulticlass`:
	```json
	{
	"select": [
	{
	"function": "truePositiveRateMulticlass",
	"parameters": {
	"predicted_property": "predicted_class_A",
	"ground_truth_property": "gt_predicted_class_A"
	}
	}
	]
	}
	```
	Response:
	```json
	{
	"query_result": [
	{
	"truePositiveRateMulticlass": 0.5033513340213003
	}
	]
	}
	```
	[back to top](#model-evaluation-functions)

	### Multiclass F1
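	Calculates the components needed to compute an F1 score for a multiclass model.

	In this example, the model has three classes, `class-1`, `class-2`, and `class-3`, with
	corresponding ground truth labels `class-1-gt`, `class-2-gt`, and `class-3-gt`. The query returns the total inference count, the confusion matrix rates for each class, and the number of inferences whose ground truth is each class.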

	Query Request:
	```json
	{
	"select": [
	{
	"function": "count",
	"alias": "count"
	},
	{
	"function": "confusionMatrixRateMulticlass",
	"alias": "class-1",
	"parameters": {
	"predicted_property": "class-1",
	"ground_truth_property": "class-1-gt"
	}
	},
	{
	"function": "countIf",
	"alias": "class-1-gt",
	"parameters": {
	"property": "multiclass_model_ground_truth_class",
	"comparator": "eq",
	"value": "class-1-gt"
	},
	"stage": "GROUND_TRUTH"
	},
	{
	"function": "confusionMatrixRateMulticlass",
	"alias": "class-2",
	"parameters": {
	"predicted_property": "class-2",
	"ground_truth_property": "class-2-gt"
	}
	},
	{
	"function": "countIf",
	"alias": "class-2-gt",
	"parameters": {
	"property": "multiclass_model_ground_truth_class",
	"comparator": "eq",
	"value": "class-2-gt"
	},
	"stage": "GROUND_TRUTH"
	},
	{
	"function": "confusionMatrixRateMulticlass",
	"alias": "class-3",
	"parameters": {
	"predicted_property": "class-3",
	"ground_truth_property": "class-3-gt"
	}
	},
	{
	"function": "countIf",
	"alias": "class-3-gt",
	"parameters": {
	"property": "multiclass_model_ground_truth_class",
	"comparator": "eq",
	"value": "class-3-gt"
	},
	"stage": "GROUND_TRUTH"
	}
	]
	}
	```
	Query Response:
	```json
	{
	"query_result": [
	{
	"count": 7044794,
	"class-1-gt": 2540963,
	"class-2-gt": 2263918,
	"class-3-gt": 2239913,
	"class-1": {
	"true_positive_rate": 0.4318807475748368,
	"false_positive_rate": 0.3060401245073361,
	"true_negative_rate": 0.6939598754926639,
	"false_negative_rate": 0.5681192524251633,
	"accuracy_rate": 0.5994314383074935,
	"balanced_accuracy_rate": 0.5629203115337503,
	"precision": 0.4432575070302042,
	"f1": 0.437495178612114
	},
	"class-2": {
	"true_positive_rate": 0.42177322676881407,
	"false_positive_rate": 0.3514795196528837,
	"true_negative_rate": 0.6485204803471163,
	"false_negative_rate": 0.578226773231186,
	"accuracy_rate": 0.5756528863725469,
	"balanced_accuracy_rate": 0.5351468535579652,
	"precision": 0.3623427088234848,
	"f1": 0.38980575845890253
	},
	"class-3": {
	"true_positive_rate": 0.26144274353512836,
	"false_positive_rate": 0.2805894672521546,
	"true_negative_rate": 0.7194105327478454,
	"false_negative_rate": 0.7385572564648716,
	"accuracy_rate": 0.5737983254017079,
	"balanced_accuracy_rate": 0.4904266381414869,
	"precision": 0.3028268576818381,
	"f1": 0.2806172238153916
	}
	}
	]
	}
	```

	With this result, you can calculate the [weighted F1 score](https://towardsdatascience.com/multi-class-metrics-made-simple-part-ii-the-f1-score-ebe8b2c2ca1)
	by multiplying each class's F1 score by that class's ground truth count, summing the products, and dividing by the total count.
	In this example, that would be
	```
	(class-1.f1 * class-1-gt + class-2.f1 * class-2-gt + class-3.f1 * class-3-gt) / count
	```
	and with numbers:
	```
	(0.437495178612114 * 2540963 +
	0.38980575845890253 * 2263918 +
	0.2806172238153916 * 2239913) / 7044794

	= 0.3722898785
	```
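The same arithmetic in Python, using the per-class `f1` values and ground truth counts from the response above. The helper name `weighted_f1` is illustrative. It divides by the sum of the ground truth counts, which in this sample equals `count` (2540963 + 2263918 + 2239913 = 7044794); if some inferences lacked ground truth, the two denominators would differ.

```python
def weighted_f1(per_class):
    """Support-weighted F1: sum(f1 * support) / sum(support)."""
    total = sum(support for _, support in per_class.values())
    return sum(f1 * support for f1, support in per_class.values()) / total


# (f1, ground truth count) per class, copied from the sample response.
per_class = {
    "class-1": (0.437495178612114, 2540963),
    "class-2": (0.38980575845890253, 2263918),
    "class-3": (0.2806172238153916, 2239913),
}
print(round(weighted_f1(per_class), 10))
```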

	[back to top](#model-evaluation-functions)

	## Object Detection

	### Objects Detected
	For multiclass, multilabel, and regression models, querying model performance works the same way for Arthur computer vision models as for tabular and NLP models. Object detection computer vision models, however, have some special fields you can use when querying.

	Example query fetching all bounding box fields:

	```json
	{
	"select": [
	{
	"property": "inference_id"
	},
	{
	"property": "objects_detected"
	}
	]
	}
	```

	The response contains one object per bounding box.

	```json
	{
	"query_result": [
	{
	"inference_id": "1",
	"objects_detected.class_id": 0,
	"objects_detected.confidence": 0.6,
	"objects_detected.top_left_x": 23,
	"objects_detected.top_left_y": 45,
	"objects_detected.width": 20,
	"objects_detected.height": 30
	},
	{
	"inference_id": "1",
	"objects_detected.class_id": 1,
	"objects_detected.confidence": 0.6,
	"objects_detected.top_left_x": 23,
	"objects_detected.top_left_y": 45,
	"objects_detected.width": 20,
	"objects_detected.height": 30
	},
	{"inference_id": "2",
	"...": "..."}
	]
	}
	```
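Conceptually, each bounding box in an inference's `objects_detected` list becomes its own result row, with the box's fields prefixed by `objects_detected.` and the inference-level columns repeated. The Python sketch below reproduces that shape from a hypothetical in-memory record; it illustrates the response layout only and says nothing about how Arthur stores inferences.

```python
def flatten_boxes(inferences):
    """Expand each inference into one row per detected bounding box."""
    rows = []
    for inference in inferences:
        for box in inference["objects_detected"]:
            row = {"inference_id": inference["inference_id"]}
            # Prefix each box field the way the query response names it.
            row.update({f"objects_detected.{k}": v for k, v in box.items()})
            rows.append(row)
    return rows


# Hypothetical local record: one inference with two boxes.
inferences = [{
    "inference_id": "1",
    "objects_detected": [
        {"class_id": 0, "confidence": 0.6, "top_left_x": 23,
         "top_left_y": 45, "width": 20, "height": 30},
        {"class_id": 1, "confidence": 0.6, "top_left_x": 23,
         "top_left_y": 45, "width": 20, "height": 30},
    ],
}]
for row in flatten_boxes(inferences):
    print(row)
```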

	You can also select individual nested fields:

	```json
	{
	"select": [
	{
	"property": "inference_id"
	},
	{
	"property": "objects_detected.class_id"
	},
	{
	"property": "objects_detected.confidence"
	}
	]
	}
	```

	The response contains one object per bounding box.

	```json
	{
	"query_result": [
	{
	"inference_id": "1",
	"objects_detected.class_id": 0,
	"objects_detected.confidence": 0.6
	},
	{
	"inference_id": "1",
	"objects_detected.class_id": 1,
	"objects_detected.confidence": 0.6
	},
	{"inference_id": "2",
	"...": "..."}
	]
	}
	```

	```{note} When using bounding-box-specific fields in filters, group bys, or order bys, those columns must also be included in the select clause for the query to succeed.
	```

	### Mean Average Precision

	Calculates mean average precision (mAP) for an object detection model. It is a standard measure of accuracy for object detection models.

	`threshold` sets the minimum intersection over union (IoU) between a predicted box and a label for the two to be considered a match. `predicted_property` and `ground_truth_property` are optional parameters that name the model's predicted and ground truth attributes; they default to `"objects_detected"` and `"label"`, respectively.
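IoU is the area where a predicted box and a ground truth box overlap, divided by the area of their union. The sketch below assumes boxes in the same `top_left_x`, `top_left_y`, `width`, `height` layout that `objects_detected` returns, each spanning `[x, x + width]` by `[y, y + height]`; whether the `label` attribute uses the same layout is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union for (top_left_x, top_left_y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis; zero when the boxes do not intersect.
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)
label = (5, 0, 10, 10)
print(iou(predicted, label))  # 50 / 150 ≈ 0.333
# With threshold 0.5, this prediction would not count as a match.
print(iou(predicted, label) >= 0.5)  # False
```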

	Query Request:

	```json
	{
	"select": [
	{
	"function": "meanAveragePrecision",
	"alias": "<alias_name> [Optional]",
	"parameters": {
	"threshold": "<threshold> [float]",
	"predicted_property": "<predicted_property> [optional str, default objects_detected]",
	"ground_truth_property": "<ground_truth_property> [optional str, default label]"
	}
	}
	]
	}
	```

	Query Response:

	```json
	{
	"query_result": [
	{
	"<function_name/alias_name>": "<result> [float]"
	}
	]
	}
	```

	Example:

	```json
	{
	"select": [
	{
	"function": "meanAveragePrecision",
	"parameters": {
	"threshold": 0.5,
	"predicted_property": "objects_detected",
	"ground_truth_property": "label"
	}
	}
	]
	}
	```

	Query Response:

	```json
	{
	"query_result": [
	{
	"meanAveragePrecision": 0.78
	}
	]
	}
	```

	## Bias

	### Bias Mitigation

	Calculates mitigated predictions based on conditional thresholds, returning 0/1 for each inference.
	```{note} This function returns null for inferences that don't match any of the provided conditions.
	```
	Query Request:

	```json
	{
	"select":
	[
	{
	"function": "biasMitigatedPredictions",
	"alias": "<alias_name> [Optional]",
	"parameters":
	{
	"predicted_property": "<predicted_property> [str]",
	"thresholds":
	[
	{
	"conditions":
	[
	{
	"property": "<attribute_name> [string or nested]",
	"comparator": "<comparator> [string] Optional: default 'eq'",
	"value": "<string or number to compare with property>"
	}
	],
	"threshold": "<threshold> [float]"
	}
	]
	}
	}
	]
	}
	```

	Query Response:

	```json
	{
	"query_result": [
	{
	"<function_name/alias_name>": "<result> [int]"
	}
	]
	}
	```

	Example:

	```json
	{
	"select":
	[
	{
	"function": "biasMitigatedPredictions",
	"parameters":
	{
	"predicted_property": "prediction_1",
	"thresholds":
	[
	{
	"conditions":
	[
	{
	"property": "SEX",
	"value": 1
	}
	],
	"threshold": 0.4
	},
	{
	"conditions":
	[
	{
	"property": "SEX",
	"value": 2
	}
	],
	"threshold": 0.6
	}
	]
	}
	}
	]
	}
	```

	Response:
	```json
	{
	"query_result":
	[
	{
	"SEX": 1,
	"biasMitigatedPredictions": 1
	},
	{
	"SEX": 2,
	"biasMitigatedPredictions": 0
	},
	{
	"SEX": 1,
	"biasMitigatedPredictions": 0
	}
	]
	}
	```
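Each entry in `thresholds` acts as a rule: an inference that satisfies the entry's conditions gets 1 if its predicted score clears that entry's threshold and 0 otherwise, and an inference matching no entry gets null. The sketch below mirrors this under assumptions this page does not confirm: only the default `eq` comparator is handled, all conditions in an entry must hold, the first matching entry wins, and a score at or above the threshold maps to 1. The scores are made up.

```python
def _matches(row, condition):
    """Check one condition; only the default 'eq' comparator is sketched."""
    comparator = condition.get("comparator", "eq")
    if comparator != "eq":
        raise NotImplementedError(f"comparator {comparator!r} not handled in this sketch")
    return row.get(condition["property"]) == condition["value"]


def bias_mitigated_predictions(rows, predicted_property, thresholds):
    """Apply per-group thresholds; None where no threshold entry matches."""
    results = []
    for row in rows:
        result = None  # mirrors the null returned for unmatched inferences
        for entry in thresholds:
            if all(_matches(row, c) for c in entry["conditions"]):
                result = int(row[predicted_property] >= entry["threshold"])
                break
        results.append(result)
    return results


thresholds = [
    {"conditions": [{"property": "SEX", "value": 1}], "threshold": 0.4},
    {"conditions": [{"property": "SEX", "value": 2}], "threshold": 0.6},
]
rows = [
    {"SEX": 1, "prediction_1": 0.45},
    {"SEX": 2, "prediction_1": 0.45},
    {"SEX": 3, "prediction_1": 0.90},
]
print(bias_mitigated_predictions(rows, "prediction_1", thresholds))  # [1, 0, None]
```

The same score of 0.45 yields different predictions for the two groups, which is the point of conditional thresholds.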

	[back to top](#model-evaluation-functions)