Update README.md
README.md
CHANGED
@@ -17,7 +17,7 @@ license: apache-2.0

**Evaluate function calling on EN benchmark**

-[Berkeley function-calling leaderboard](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html)

| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
@@ -31,7 +31,7 @@ license: apache-2.0

**Evaluate function calling on ZHTW benchmark**

-[function-calling-leaderboard-for-zhtw](https://github.com/mtkresearch/function-calling-leaderboard-for-zhtw)

| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
@@ -46,7 +46,7 @@ license: apache-2.0

**Evaluate instruction following on EN benchmark**

-MT-Bench

| | Win | Tie | Lose |
|---|---|---|---|
@@ -55,7 +55,7 @@ MT-Bench

**Evaluate instruction following on ZHTW benchmark**

-MT-Bench-TC

| | Win | Tie | Lose |
|---|---|---|---|

**Evaluate function calling on EN benchmark**

+We evaluate function-calling performance in English using the [Berkeley function-calling leaderboard](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html) benchmark.

| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|

**Evaluate function calling on ZHTW benchmark**

+We evaluate function-calling performance in Traditional Chinese using the [function-calling-leaderboard-for-zhtw](https://github.com/mtkresearch/function-calling-leaderboard-for-zhtw) benchmark.

| Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
|-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|

**Evaluate instruction following on EN benchmark**

+We evaluate instruction-following performance in English using the [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/README.md) benchmark.

| | Win | Tie | Lose |
|---|---|---|---|

**Evaluate instruction following on ZHTW benchmark**

+We evaluate instruction-following performance in Traditional Chinese using the [MT-Bench-TC](https://github.com/mtkresearch/TCEval) benchmark.

| | Win | Tie | Lose |
|---|---|---|---|