document accuracy property
README.md
CHANGED
@@ -105,9 +105,10 @@ overall_accuracy = np.mean(list(suite_accuracies.values()))
 
 ### Output Values
 
-The metric returns a dict of `SyntaxGymMetricSuiteResult`
+The metric returns a dict of `SyntaxGymMetricSuiteResult` objects, mapping test suite names to test suite performance. Each inner object has three properties:
 
-- **
+- **accuracy** (`float`): Model accuracy on this suite. This is the accuracy of the conjunction of all boolean predictions per item in the suite.
+- **prediction_results** (`List[List[bool]]`): For each item in the test suite, a list of booleans indicating whether each corresponding prediction came out `True`. Typically these are combined to yield an accuracy score (but you can simply use the `accuracy` property).
 - **region_totals** (`List[Dict[Tuple[str, int], float]]`): For each item, a mapping from individual region (keys `(<condition_name>, <region_number>)`) to the float-valued total surprisal for tokens in this region. This is useful for visualization, or if you'd like to use the aggregate surprisal data for other tasks (e.g. reading time prediction or neural activity prediction).
 
 ```python
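The relationship between the documented properties can be sketched in a few lines. The `SuiteResult` dataclass, the suite names, and all numbers below are illustrative stand-ins, not the metric's real objects or output; the sketch only assumes the three properties described in the diff above:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

import numpy as np


# Illustrative stand-in for SyntaxGymMetricSuiteResult, exposing only the
# three properties documented above.
@dataclass
class SuiteResult:
    accuracy: float
    prediction_results: List[List[bool]]
    region_totals: List[Dict[Tuple[str, int], float]]


# Hypothetical output for two test suites (names and values are made up).
results: Dict[str, SuiteResult] = {
    "suite_a": SuiteResult(
        accuracy=0.5,
        prediction_results=[[True], [False]],
        region_totals=[{("cond_1", 1): 3.2}, {("cond_1", 1): 4.1}],
    ),
    "suite_b": SuiteResult(
        accuracy=1.0,
        prediction_results=[[True], [True]],
        region_totals=[{("cond_1", 1): 2.7}, {("cond_1", 1): 1.9}],
    ),
}

# An item counts as correct only if ALL of its predictions are True
# (the conjunction described in the `accuracy` bullet); suite accuracy
# is the mean of these per-item booleans.
for name, suite in results.items():
    item_correct = [all(preds) for preds in suite.prediction_results]
    assert np.mean(item_correct) == suite.accuracy

# Overall accuracy across suites, as in the snippet named by the hunk header.
overall_accuracy = np.mean([s.accuracy for s in results.values()])
print(overall_accuracy)  # 0.75
```

This mirrors the `overall_accuracy = np.mean(list(suite_accuracies.values()))` line referenced in the `@@` header: per-suite accuracies are derived from `prediction_results`, then averaged across suites.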