Update README.md
Browse files
README.md
CHANGED
@@ -21,10 +21,12 @@ Berkeley function-calling leaderboard
|
|
21 |
|
22 |
| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|
23 |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
|
24 |
-
| **Breeze-7B-FC-v1_0 (FC)** | 86.
|
25 |
| Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
|
26 |
| GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |
|
27 |
|
|
|
|
|
28 |
![](misc/radar_chart_en.png)
|
29 |
|
30 |
**Evaluate function calling on ZHTW benchmark**
|
@@ -33,10 +35,12 @@ function-calling-leaderboard-for-zhtw
|
|
33 |
|
34 |
| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|
35 |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
|
36 |
-
| **Breeze-7B-FC-v1_0 (FC)** |
|
37 |
| Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
|
38 |
| GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |
|
39 |
|
|
|
|
|
40 |
![](misc/radar_chart_zhtw.png)
|
41 |
|
42 |
|
|
|
21 |
|
22 |
| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|
23 |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
|
24 |
+
| **Breeze-7B-FC-v1_0 (FC)** | 86.89 | 76.25 | 90.00 | 93.00 | 84.00 | 84.00 | 100.00 | 92.00 | 88.00 | 77.50 |
|
25 |
| Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
|
26 |
| GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |
|
27 |
|
28 |
+
|
29 |
+
|
30 |
![](misc/radar_chart_en.png)
|
31 |
|
32 |
**Evaluate function calling on ZHTW benchmark**
|
|
|
35 |
|
36 |
| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|
37 |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
|
38 |
+
| **Breeze-7B-FC-v1_0 (FC)** | 78.18 | 72.50 | 82.00 | 86.00 | 76.50|67.00|88.00|88.00|80.00|60.00|
|
39 |
| Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
|
40 |
| GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |
|
41 |
|
42 |
+
|
43 |
+
|
44 |
![](misc/radar_chart_zhtw.png)
|
45 |
|
46 |
|