YC-Chen commited on
Commit
64674b4
•
1 Parent(s): dfe4b31

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +6 -2
README.md CHANGED
@@ -21,10 +21,12 @@ Berkeley function-calling leaderboard
21
 
22
  | Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
23
  |-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
24
- | **Breeze-7B-FC-v1_0 (FC)** | 86.01 | 74.58 | 90.00 | 93.00 | 82.00 | 83.00 | 98.00 | 92.00 | 88.00 | 75.00 |
25
  | Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
26
  | GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |
27
 
 
 
28
  ![](misc/radar_chart_en.png)
29
 
30
  **Evaluate function calling on ZHTW benchmark**
@@ -33,10 +35,12 @@ function-calling-leaderboard-for-zhtw
33
 
34
  | Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
35
  |-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
36
- | **Breeze-7B-FC-v1_0 (FC)** | 77.70 | 71.67 | 82.00 | 86.50 | 76.00 | 65.50 | 87.00 | 88.00 | 80.00 | 57.50 |
37
  | Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
38
  | GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |
39
 
 
 
40
  ![](misc/radar_chart_zhtw.png)
41
 
42
 
 
21
 
22
  | Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
23
  |-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
24
+ | **Breeze-7B-FC-v1_0 (FC)** | 86.89 | 76.25 | 90.00 | 93.00 | 84.00 | 84.00 | 100.00 | 92.00 | 88.00 | 77.50 |
25
  | Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
26
  | GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |
27
 
28
+
29
+
30
  ![](misc/radar_chart_en.png)
31
 
32
  **Evaluate function calling on ZHTW benchmark**
 
35
 
36
  | Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
37
  |-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
38
+ | **Breeze-7B-FC-v1_0 (FC)** | 78.18 | 72.50 | 82.00 | 86.00 | 76.50|67.00|88.00|88.00|80.00|60.00|
39
  | Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
40
  | GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |
41
 
42
+
43
+
44
  ![](misc/radar_chart_zhtw.png)
45
 
46