---
title: CodeBLEU
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
description: CodeBLEU metric for Python and C++
---

# Metric Card for CodeBLEU


## Metric Description

CodeBLEU is a metric for code synthesis. Unlike the original BLEU, it considers not only surface-level n-gram matches but also grammatical and logical correctness, by leveraging the abstract syntax tree and the data-flow structure of the code.
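The final score is a weighted combination of four components: the standard n-gram match, a keyword-weighted n-gram match, the AST match, and the data-flow match. A minimal sketch of that combination (the component scores below are illustrative placeholders, not values produced by this library):

```python
# Hypothetical component scores for one prediction (illustrative only)
ngram_match = 0.80      # standard BLEU n-gram overlap
weighted_ngram = 0.85   # n-gram overlap with extra weight on language keywords
ast_match = 0.90        # abstract-syntax-tree subtree match
dataflow_match = 0.75   # data-flow graph match

# The four weights correspond to alpha, beta, gamma, theta in this metric's API
alpha, beta, gamma, theta = 0.25, 0.25, 0.25, 0.25
code_bleu = (alpha * ngram_match + beta * weighted_ngram
             + gamma * ast_match + theta * dataflow_match)
print(code_bleu)  # 0.825
```

With equal weights of 0.25, the score is simply the average of the four components.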

## How to Use

- Clone the repository:

```bash
git clone https://huggingface.co/spaces/giulio98/codebleu.git
```

- Import the metric:

```python
from codebleu.calc_code_bleu import calculate
```

- Compute the score:

```python
true_codes = [["def hello_world():\n    print('hello world!')"],
              ["def add(a, b):\n    return a + b"]]
code_gens = ["def hello_world():\n    print('hello world!')",
             "def add(a, b):\n    return a + b"]
codebleu = calculate(references=true_codes, predictions=code_gens,
                     language="python", alpha=0.25, beta=0.25,
                     gamma=0.25, theta=0.25)
print(codebleu['code_bleu_score'])
```

## Inputs


- **references** (list of list of string): n possible reference solutions for each problem
- **predictions** (list of string): a single generated prediction for each problem
- **language** (string): `python` or `cpp`
- **alpha**, **beta**, **gamma**, **theta** (float): weights of the n-gram match, weighted n-gram match, AST match, and data-flow match components, respectively (e.g. 0.25 each)
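Note that `references` is a list of lists because each problem may have several acceptable solutions, while `predictions` holds exactly one candidate per problem. A small sketch of the expected shapes (the example strings are illustrative):

```python
# Each problem may have several acceptable reference solutions,
# so references is a list of lists; predictions stays flat.
references = [
    ["def add(a, b):\n    return a + b",    # solution 1 for problem 0
     "def add(x, y):\n    return x + y"],   # solution 2 for problem 0
]
predictions = ["def add(a, b):\n    return a + b"]  # one candidate per problem

# The two lists must be aligned problem-by-problem.
assert len(references) == len(predictions)
```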

## Output Values

The metric returns a dictionary; the final score is stored under the key `code_bleu_score`, a float between 0 and 1, with higher values indicating a closer match to the references.

## Values from Popular Papers

## Limitations and Bias

This implementation currently supports only Python and C++.

## Citation

```bibtex
@misc{ren2020codebleu,
  author = {Ren, Shuo and Guo, Daya and Lu, Shuai and Zhou, Long and Liu, Shujie and Tang, Duyu and Zhou, Ming and Blanco, Ambrosio and Ma, Shuai},
  title = {CodeBLEU: a Method for Automatic Evaluation of Code Synthesis},
  year = {2020},
  eprint = {2009.10297},
  archivePrefix = {arXiv}
}
```