---

language:
- en
license: apache-2.0
inference: false
tags:
- text-classification
- onnx
- int8
- optimum
- ONNXRuntime
---
# LLM agent flow text classification

This model identifies common LLM agent events and patterns within the conversation flow. 
Such events include an apology, where the LLM acknowledges a mistake.
The flow labels can serve as foundational elements for sophisticated LLM analytics.

It is ONNX-quantized (int8) and is a fine-tune of [MiniLMv2-L6-H384](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large).
The base model can be found [here](https://huggingface.co/minuva/MiniLMv2-agentflow-v2).

This model is *only* for the LLM agent texts in the dialog. For the user texts [use this model](https://huggingface.co/minuva/MiniLMv2-userflow-v2-onnx/).


# Optimum

## Installation

Install from source: 
```bash
python -m pip install "optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git"
```


## Run the Model
```py
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model = ORTModelForSequenceClassification.from_pretrained('minuva/MiniLMv2-agentflow-v2-onnx', provider="CPUExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained('minuva/MiniLMv2-agentflow-v2-onnx', use_fast=True, model_max_length=256, truncation=True, padding='max_length')

pipe = pipeline(task='text-classification', model=model, tokenizer=tokenizer, )
texts = ["My apologies", "Im not sure what you mean"]
pipe(texts)
# [{'label': 'agent_apology_error_mistake', 'score': 0.9967106580734253},
# {'label': 'agent_didnt_understand', 'score': 0.9975798726081848}]
```

# ONNX Runtime only

A lighter option for deployment that needs only the `tokenizers` and `onnxruntime` packages.


## Installation
```bash
pip install tokenizers
pip install onnxruntime
git clone https://huggingface.co/minuva/MiniLMv2-agentflow-v2-onnx
```

## Run the Model

```py
import json
import os

import numpy as np
from onnxruntime import InferenceSession
from tokenizers import Tokenizer


model_name = "minuva/MiniLMv2-agentflow-v2-onnx"
model_dir = "MiniLMv2-agentflow-v2-onnx"  # local clone from the installation step

# Load the tokenizer and configure padding and truncation.
tokenizer = Tokenizer.from_pretrained(model_name)
tokenizer.enable_padding(
    pad_token="<pad>",
    pad_id=1,
)
tokenizer.enable_truncation(max_length=256)
batch_size = 16

texts = ["My apologies", "Im not sure what you mean"]
outputs = []
model = InferenceSession(
    os.path.join(model_dir, "model_optimized_quantized.onnx"),
    providers=["CPUExecutionProvider"],
)

# config.json holds the id2label mapping used to name the output classes.
with open(os.path.join(model_dir, "config.json"), "r") as f:
    config = json.load(f)

output_names = [output.name for output in model.get_outputs()]
input_names = [model_input.name for model_input in model.get_inputs()]

# Split the texts into batches of at most roughly batch_size items and run each batch.
for subtexts in np.array_split(np.array(texts), len(texts) // batch_size + 1):
    encodings = tokenizer.encode_batch(list(subtexts))
    inputs = {
        "input_ids": np.vstack(
            [encoding.ids for encoding in encodings],
        ),
        "attention_mask": np.vstack(
            [encoding.attention_mask for encoding in encodings],
        ),
        "token_type_ids": np.vstack(
            [encoding.type_ids for encoding in encodings],
        ),
    }

    for input_name in input_names:
        if input_name not in inputs:
            raise ValueError(f"Input name {input_name} not found in inputs")

    # Pass only the inputs the ONNX graph actually declares.
    inputs = {input_name: inputs[input_name] for input_name in input_names}
    output = np.squeeze(
        np.stack(
            model.run(output_names=output_names, input_feed=inputs)
        ),
        axis=0,
    )
    outputs.append(output)

# Convert the logits to per-label scores with a sigmoid.
outputs = np.concatenate(outputs, axis=0)
probs = 1 / (1 + np.exp(-outputs))
results = []
for item in probs:
    labels = []
    scores = []
    for idx, s in enumerate(item):
        labels.append(config["id2label"][str(idx)])
        scores.append(float(s))
    results.append({"labels": labels, "scores": scores})


# Keep only the highest-scoring label for each text.
res = []

for result in results:
    joined = list(zip(result["labels"], result["scores"]))
    max_score = max(joined, key=lambda x: x[1])
    res.append(max_score)

res
# [('agent_apology_error_mistake', 0.9991968274116516),
# ('agent_didnt_understand', 0.9993669390678406)]
```
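
The post-processing above applies a sigmoid to each logit independently and then keeps the highest-scoring label. A minimal, dependency-free sketch of that step is shown below. The three-entry `ID2LABEL` mapping, the `top_label` helper, and the logits are made up for illustration; the real mapping comes from `config.json`.

```python
import math

# Hypothetical subset of the id2label mapping; the real one is in config.json.
ID2LABEL = {
    "0": "OTHER",
    "1": "agent_apology_error_mistake",
    "2": "agent_didnt_understand",
}


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def top_label(logits: list[float]) -> tuple[str, float]:
    """Return the (label, score) pair with the highest sigmoid score."""
    scores = [sigmoid(x) for x in logits]
    best = max(range(len(scores)), key=scores.__getitem__)
    return ID2LABEL[str(best)], scores[best]


# Made-up logits for a single text, for illustration only.
label, score = top_label([-4.0, 6.0, -3.0])
print(label, round(score, 4))
```

Because the sigmoid is monotonic, the label with the largest logit is also the label with the largest score, so the ranking is the same whether or not the sigmoid is applied first.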

# Categories Explanation

<details>
  <summary>Click to expand!</summary>
  
    - OTHER: Responses or actions by the agent that do not fit into the predefined categories or are outside the scope of the specific interactions listed.

    - agent_apology_error_mistake: When the agent acknowledges an error or mistake in the information provided or in the handling of the request.

    - agent_apology_unsatisfactory: The agent expresses an apology for providing an unsatisfactory response or for any dissatisfaction experienced by the user.

    - agent_didnt_understand: Indicates that the agent did not understand the user's request or question.

    - agent_limited_capabilities: The agent communicates its limitations in addressing certain requests or providing certain types of information.

    - agent_refuses_answer: When the agent explicitly refuses to answer a question or fulfill a request, due to policy restrictions or ethical considerations.

    - image_limitations: The agent points out limitations related to handling or interpreting images.

    - no_information_doesnt_know: The agent indicates that it has no information available or does not know the answer to the user's question.

    - success_and_followup_assistance: The agent successfully provides the requested information or service and offers further assistance or follow-up actions if needed.
</details>
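
As one possible way to use these categories for analytics, the sketch below tallies the labels predicted for the agent turns of a conversation and reports the share of turns that fall into a set of "problem" categories. The `PROBLEM_LABELS` grouping, the `summarize` helper, the `min_score` threshold, and the sample predictions are all assumptions made for illustration; they are not part of the model.

```python
from collections import Counter

# Hypothetical grouping: labels treated here as signs of a degraded agent turn.
PROBLEM_LABELS = {
    "agent_apology_error_mistake",
    "agent_apology_unsatisfactory",
    "agent_didnt_understand",
    "agent_limited_capabilities",
}


def summarize(predictions: list[dict], min_score: float = 0.5) -> dict:
    """Count labels scoring at least min_score and compute the problem-turn rate."""
    kept = [p["label"] for p in predictions if p["score"] >= min_score]
    counts = Counter(kept)
    problems = sum(n for label, n in counts.items() if label in PROBLEM_LABELS)
    rate = problems / len(kept) if kept else 0.0
    return {"counts": dict(counts), "problem_rate": rate}


# Made-up predictions in the format returned by the text-classification pipeline.
sample = [
    {"label": "success_and_followup_assistance", "score": 0.98},
    {"label": "agent_apology_error_mistake", "score": 0.99},
    {"label": "agent_didnt_understand", "score": 0.97},
    {"label": "OTHER", "score": 0.31},
]
summary = summarize(sample)
print(summary)
```

Low-confidence predictions are dropped before counting so that uncertain turns do not inflate the problem rate; the threshold is a tuning choice.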

<br>


# Metrics on our private test dataset
| Model (params)    |    Loss      |    Accuracy |  F1 |
|--------------------|-------------|----------|--------| 
| minuva/MiniLMv2-agentflow-v2 (33M) |   0.1462 | 0.9616 |  0.9618 |
| minuva/MiniLMv2-agentflow-v2-onnx (33M) |   -  |  0.9624 | 0.9626  |
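
The table does not state how F1 is averaged across labels. As a reference for how such numbers are typically derived, the sketch below computes accuracy and macro-averaged F1 from gold and predicted labels. The macro averaging, the helper names, and the sample data are assumptions; the reported figures may use a different scheme.

```python
def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    f1s = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Made-up labels for illustration only.
gold = ["OTHER", "agent_refuses_answer", "OTHER", "agent_didnt_understand"]
pred = ["OTHER", "agent_refuses_answer", "agent_didnt_understand", "agent_didnt_understand"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```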

# Deployment

Check our [llm-flow-classification repository](https://github.com/minuva/llm-flow-classification) for a FastAPI and ONNX based server to deploy this model on CPU devices.