Spaces:
Sleeping
Sleeping
ask-arthur
/
files
/arthur-docs-markdown
/user-guide
/api-query-guide
/transformation_functions.md.txt
# Transformation Functions | |
For transformation functions, it will be helpful to include `property` in the request to help associate the transformation function values, for example: | |
```json | |
{ | |
"select": [ | |
{ | |
"property": "<attribute_name> [string]" | |
}, | |
{ | |
"function": "roundTimestamp", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"property": "<attribute_name> [string]", | |
"time_interval": "[second|minute|hour|day|month|year]" | |
} | |
} | |
] | |
} | |
``` | |
We omit `property` for brevity in the following examples. | |
For an explanation of nested functions, see the guide {doc}`composing_functions`. | |
### Round Timestamp | |
Rounds a timestamp property to the provided time interval. This function requires one property | |
which must be an attribute of type datetime and one parameter, `time_interval`. | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "roundTimestamp", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"property": "<attribute_name> [string or nested]", | |
"time_interval": "[second|minute|hour|day|month|year]" | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": "<rounded_timestamp> [string]" | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "roundTimestamp", | |
"parameters": { | |
"property": "inference_timestamp", | |
"time_interval": "day" | |
} | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"roundTimestamp": "2020-08-10T00:00:00.000Z" | |
}, | |
{ | |
"roundTimestamp": "2020-08-09T00:00:00.000Z" | |
}, | |
{ | |
"roundTimestamp": "2020-08-08T00:00:00.000Z" | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) | |
### Label By Max Column | |
Given a list of columns, returns a string column with the name of the column containing the max value for the row. For example, this function can be used to extract the max probability from a vector of probability properties. | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "labelByMaxColumn", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"properties": [ | |
"<property name> [string]" | |
] | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": "<one of the properties from the given list> [string]" | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "labelByMaxColumn", | |
"alias": "classPrediction", | |
"parameters": { | |
"properties": [ | |
"class_1", | |
"class_2", | |
"class_3" | |
] | |
} | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"classPrediction": [ | |
{ | |
"classPrediction": "class_1" | |
}, | |
{ | |
"classPrediction": "class_1" | |
}, | |
{ | |
"classPrediction": "class_2" | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) | |
### Bin Continuous | |
This function bins a continuous value based on supplied thresholds. The bins will be formed as: | |
`[< threshold_1, threshold_1 <= x < threshold_2, ... , threshold_(n-1) <= x < threshold_(n), threshold_(n) < x]`. | |
The response bins will be labeled with an integer id corresponding to the ordered bin, starting at 1. | |
If `n` thresholds are given, `n+1` bins will be returned. | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "binContinuous", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"property": "<property name> [string or nested]", | |
"bin_thresholds": [ | |
"<threshold_1> [number]", | |
"<threshold_2> [number]", | |
"<threshold_3> [number]" | |
] | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": "<bin_id> [int]" | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"property": "age" | |
}, | |
{ | |
"function": "binContinuous", | |
"alias": "ageBin", | |
"parameters": { | |
"property": "age", | |
"bin_thresholds": [ | |
18, | |
65, | |
95 | |
] | |
} | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"age": 10, | |
"ageBin": 1 | |
}, | |
{ | |
"age": 20, | |
"ageBin": 2 | |
}, | |
{ | |
"age": 70, | |
"ageBin": 3 | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) | |
### Bins To Quantiles | |
Returns an array of values representing the quantiles based on the number of bins passed to the function. | |
For example if you supply `"num_bins": "10"`, then this query will return the value in your data at the | |
10%, 20%, ... , 90%, 100% quantiles. | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "binsToQuantiles", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"property": "<attribute_name> [string or nested]", | |
"num_bins": "<num_bins> [int]" | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": [ | |
"float" | |
] | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "binsToQuantiles", | |
"alias": "quantiles", | |
"parameters": { | |
"property": "age", | |
"num_bins": 10 | |
} | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": [ | |
19, | |
28, | |
37, | |
46, | |
55, | |
64, | |
73, | |
82, | |
91 | |
] | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) | |
### Date Diff | |
Returns the difference of two timestamps in `unit`s. Valid units are: | |
`second`, `minute`, `hour`, `day`, `week`, `month`, `quarter`, and `year`. | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "dateDiff", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"unit": "[second|minute|hour|day|week|month|quarter|year]", | |
"start_date": "<attribute_name> [string or nested]", | |
"end_date": "<attribute_name> [string or nested]" | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": "difference [int]" | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "dateDiff", | |
"alias": "date_diff", | |
"parameters": { | |
"unit": "second", | |
"start_date": "inference_timestamp", | |
"end_date": "prev_timestamp" | |
} | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"date_diff": 100 | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) | |
### Neighbor | |
Returns the value of the column `offset` rows next to this row in the ordering. | |
`default` is the value that is returned when the offset goes out of bounds on the row set. | |
It is recommended to use this function in a subquery with an `order_by` clause to get consistent ordering. | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "neighbor", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"offset": "<offset> [int]", | |
"property": "<attribute_name> [string or nested]", | |
"default": "<default_value> [any]" | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": "neighbor_value [any]" | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "neighbor", | |
"alias": "prev_timestamp", | |
"parameters": { | |
"property": "inference_timestamp", | |
"offset": -1, | |
"default": null | |
} | |
} | |
], | |
"order_by": [ | |
{ | |
"property": "inference_timestamp", | |
"direction": "desc" | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"prev_timestamp": "2021-06-15T00:00:00.000Z" | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) | |
### Arithmetic | |
`add`, `subtract`, `multiply`, and `divide` are valid arithmetic functions. | |
Each takes two columns as input and returns the result of the arithmetic expression. | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "[add|subtract|multiply|divide]", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"left": "<attribute_name> [string or nested]", | |
"right": "<attribute_name> [string or nested]" | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": "expression_result [number]" | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "multiply", | |
"alias": "double_home_value", | |
"parameters": { | |
"left": "Home_Value", | |
"right": 2 | |
} | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"double_home_value": 20000 | |
} | |
] | |
} | |
``` | |
Sample nested request to compute `(Home_Value + Car_Value) * 2` | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "multiply", | |
"alias": "double_loans", | |
"parameters": { | |
"left": { | |
"nested_function": { | |
"function": "add", | |
"alias": "total_loan", | |
"parameters": { | |
"left": "Home_Value", | |
"right": "Car_Value" | |
} | |
} | |
}, | |
"right": 2 | |
} | |
} | |
] | |
} | |
``` | |
Sample Nested Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"double_loans": 20000 | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) | |
### Absolute Value | |
Take the absolute value of a property. | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "abs", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"property": "<attribute_name> [string or nested]" | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": "<abs_value> [float]" | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "abs", | |
"alias": "abs_delta", | |
"parameters": { | |
"property": "delta" | |
} | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"abs_delta": 55.45 | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) | |
### Logical Functions | |
`equals`, `and`, and `or` are valid logical functions. Each takes two columns as input and returns the result of the logical expression. These follow the same API as [Arithmetic Functions](#arithmetic) | |
Query Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "[equals|and|or]", | |
"alias": "<alias_name> [optional string]", | |
"parameters": { | |
"left": "<attribute_name> [string or nested]", | |
"right": "<attribute_name> [string or nested]" | |
} | |
} | |
] | |
} | |
``` | |
Query Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"<function_name/alias_name>": "expression_result [0 or 1]" | |
} | |
] | |
} | |
``` | |
Sample Request: | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "equals", | |
"alias": "has_phd", | |
"parameters": { | |
"left": "education", | |
"right": 4 | |
} | |
} | |
] | |
} | |
``` | |
Sample Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"has_phd": 1 | |
} | |
] | |
} | |
``` | |
Sample nested request to compute `has_phd` or `has_masters` | |
```json | |
{ | |
"select": [ | |
{ | |
"function": "or", | |
"alias": "has_higher_education", | |
"parameters": { | |
"left": { | |
"nested_function": { | |
"function": "equals", | |
"alias": "has_phd", | |
"parameters": { | |
"left": "education", | |
"right": 4 | |
} | |
} | |
}, | |
"right": { | |
"nested_function": { | |
"function": "equals", | |
"alias": "has_masters", | |
"parameters": { | |
"left": "education", | |
"right": 3 | |
} | |
} | |
} | |
} | |
} | |
] | |
} | |
``` | |
Sample Nested Response: | |
```json | |
{ | |
"query_result": [ | |
{ | |
"has_higher_education": 1 | |
} | |
] | |
} | |
``` | |
[back to top](#transformation-functions) |