---
title: Salesforce CodeT5 Large Demo
emoji: ⚡
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.24.0
app_file: app.py
pinned: false
license: apache-2.0
datasets:
  - CodeSearchNet/codesearchnet_python
  - bigcode/the-stack-dedup
  - codeparrot/codeparrot-clean
  - openai_humaneval
  - google/mbpp
  - nvidia/OpenCodeReasoning
hf_oauth: true
hf_oauth_scopes:
  - inference-api
short_description: Using the powerful Salesforce CodeT5-large model
---
# ⚡ Salesforce CodeT5-large Demo ⚡
Welcome! This Hugging Face Space (and its repository) hosts a demonstration application for the Salesforce CodeT5-large model. It showcases the model's capabilities on code intelligence tasks through a Gradio interface.
## About CodeT5-large
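CodeT5 is an encoder-decoder transformer model pre-trained on a large collection of source code in multiple programming languages, alongside natural language text. The `codet5-large` variant targets the tasks below. The released checkpoint is pre-trained with a masked span prediction objective, so most of these tasks require task-specific fine-tuning.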
- Code Generation: Creating code snippets from natural language descriptions (e.g., comments, docstrings).
- Code Summarization: Generating concise natural language summaries for given code blocks.
- Code Translation: Translating code from one programming language to another.
- Code Refinement: Improving code quality, fixing bugs, or optimizing code.
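The simplest way to try the raw `Salesforce/codet5-large` checkpoint is through its pre-training task: filling a masked span marked with the T5 sentinel token `<extra_id_0>`.

The sketch below makes these assumptions:
- `transformers` and `torch` are installed.
- The `___` placeholder convention is hypothetical.
- The helper names `make_infill_prompt`, `load_model` and `fill_span` are hypothetical, chosen only for illustration.

```python
from functools import lru_cache

MODEL_ID = "Salesforce/codet5-large"  # checkpoint id on the Hugging Face Hub
SENTINEL = "<extra_id_0>"  # T5-style mask token the checkpoint learned to fill


def make_infill_prompt(code_with_gap: str, gap_marker: str = "___") -> str:
    """Swap the first (hypothetical) gap marker for the sentinel token."""
    if gap_marker not in code_with_gap:
        raise ValueError(f"expected a {gap_marker!r} placeholder in the input")
    return code_with_gap.replace(gap_marker, SENTINEL, 1)


@lru_cache(maxsize=1)
def load_model():
    # Imported lazily so make_infill_prompt() works without torch/transformers.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_ID).eval()
    return tokenizer, model


def fill_span(code_with_gap: str, max_new_tokens: int = 16) -> str:
    """Return the model's guess for the masked span (downloads weights on first call)."""
    tokenizer, model = load_model()
    inputs = tokenizer(make_infill_prompt(code_with_gap), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # skip_special_tokens drops sentinel and padding tokens from the decoded text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `fill_span("def greet(user): print(f'hello ___!')")` asks the model to complete the f-string. The exact output depends on the decoding settings. Caching the loaded model with `lru_cache` avoids reloading the weights on every call.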
## Using the Demo (Hugging Face Space)
This application is built with Gradio, providing an interactive web UI.
- Access the Space: Navigate to the Hugging Face Space hosting this demo.
- Interact: Use the input fields provided by the Gradio interface (`app.py`) to interact with the model.
  - (Example: You might enter a Python docstring in one box to get the generated function body in another, or input code to get a summary. Please update this section with specific instructions based on your `app.py` functionality!)
- Observe: See the results generated by the CodeT5-large model in the output fields.
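The Space's actual `app.py` is not reproduced in this README. As a purely hypothetical illustration, a minimal `app.py` could wire the raw checkpoint's span infilling into a `gr.Interface`.

All of the following are assumptions, not the Space's real code:
- the function names, labels and slider range;
- the choice of span infilling as the task.

```python
# app.py: hypothetical minimal sketch, not the Space's actual code.
from functools import lru_cache
from typing import Optional

MODEL_ID = "Salesforce/codet5-large"
SENTINEL = "<extra_id_0>"  # span the model is asked to fill in
EXAMPLE = "def greet(user): print(f'hello <extra_id_0>!')"


def check_input(code: str) -> Optional[str]:
    """Return a user-facing error message, or None if the input is usable."""
    if not code.strip():
        return "Please paste some code."
    if SENTINEL not in code:
        return f"Mark the span the model should fill with {SENTINEL}."
    return None


@lru_cache(maxsize=1)
def load_model():
    # Loaded once on the first request and reused afterwards.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_ID).eval()
    return tokenizer, model


def predict(code: str, max_new_tokens: float) -> str:
    import gradio as gr

    problem = check_input(code)
    if problem is not None:
        raise gr.Error(problem)  # Gradio shows this message in the UI
    tokenizer, model = load_model()
    inputs = tokenizer(code, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=int(max_new_tokens))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def build_demo():
    # Gradio is imported inside the functions so check_input() can be
    # imported and tested without it.
    import gradio as gr

    return gr.Interface(
        fn=predict,
        inputs=[
            gr.Textbox(lines=8, value=EXAMPLE, label=f"Code with a {SENTINEL} gap"),
            gr.Slider(minimum=4, maximum=64, value=16, step=1, label="Max new tokens"),
        ],
        outputs=gr.Textbox(label="Predicted span"),
        title="Salesforce CodeT5-large Demo",
    )
```

To use this file as the Space entry point (`app_file: app.py`), end it with `if __name__ == "__main__": build_demo().launch()`. Validation lives in a plain function, so it can be tested without starting a server.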
## Running Locally (GitHub / Manual Setup)
If you prefer to run this demo on your local machine:
Clone the Repository:
```bash
git clone <repository_url>  # Replace with the HF Space or GitHub repo URL
cd <repository_directory>
```
Set up Environment: (Optional but recommended) Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate    # Linux/macOS
# venv\Scripts\activate     # Windows
```
Install Dependencies: Ensure you have Python 3.10 or newer installed (Gradio 5 requires it). You'll need Gradio and the libraries CodeT5 depends on, such as `transformers` and `torch`. Create a `requirements.txt` file if one doesn't exist:
```text
# requirements.txt
gradio==5.24.0  # matches sdk_version in the Space metadata
transformers
torch
# Add any other specific libraries your app.py needs
```
Then install:
```bash
pip install -r requirements.txt
```
Run the Application:
```bash
python app.py
```
Access Locally: Open your web browser and navigate to the URL printed in the terminal (typically http://127.0.0.1:7860).
## Fine-tuning Datasets for Python & Logic
Fine-tuning can improve CodeT5's performance on specific Python tasks and on logical reasoning. The following datasets, listed in this Space's metadata, are useful for training or evaluating such fine-tuned models:
- CodeSearchNet (Python): Excellent for tasks involving matching natural language queries to relevant Python code snippets.
- The Stack (Deduped): A massive dataset of permissively licensed source code. Filter for Python files (`lang:python`) for broad fine-tuning on diverse Python code.
- CodeParrot (Clean): A cleaned and deduplicated collection of Python files from GitHub, curated for Python code generation tasks.
- HumanEval: A benchmark of 164 hand-written Python programming problems. Each problem is defined by a function signature and docstring and is checked with unit tests. It is best reserved for evaluating the functional correctness of generated code, since training on it would contaminate that evaluation.
- MBPP (Mostly Basic Python Problems): Contains around 1,000 crowd-sourced Python programming problems focused on basic concepts, useful for improving generation from descriptions and simple logical problem-solving.
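The sketch below shows how one of these datasets could be turned into seq2seq training data for CodeT5. It formats MBPP-style records into (problem statement, reference solution) pairs and then tokenizes them.

It rests on these assumptions:
- Records use the field names of MBPP's full configuration: `text`, `code` and `test_list`.
- `mbpp_to_pair`, `tokenize_pairs` and `load_mbpp_features` are hypothetical helpers.
- The dataset id is copied from this Space's metadata and may need adjusting to the Hub id you actually use.

```python
from typing import Any, Dict, List, Tuple

MBPP_ID = "google/mbpp"  # id as listed in this Space's metadata; substitute your Hub id
MODEL_ID = "Salesforce/codet5-large"


def mbpp_to_pair(record: Dict[str, Any], n_tests: int = 1) -> Tuple[str, str]:
    """Build a (source, target) pair: problem text plus sample asserts -> reference code.

    Appending one or more test asserts tells the model which function name and
    signature the solution must use.
    """
    tests = "\n".join(record["test_list"][:n_tests])
    source = f"{record['text'].strip()}\n{tests}".strip()
    return source, record["code"].strip()


def tokenize_pairs(pairs: List[Tuple[str, str]], tokenizer: Any,
                   max_source_len: int = 256, max_target_len: int = 256) -> Any:
    """Tokenize sources as encoder inputs and targets as decoder labels."""
    sources = [source for source, _ in pairs]
    targets = [target for _, target in pairs]
    features = tokenizer(sources, max_length=max_source_len, truncation=True)
    labels = tokenizer(text_target=targets, max_length=max_target_len, truncation=True)
    features["labels"] = labels["input_ids"]
    return features


def load_mbpp_features(split: str = "train") -> Any:
    """Download MBPP and the CodeT5 tokenizer, then return tokenized features."""
    # Imported lazily so the pure helpers above work without these packages.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    records = load_dataset(MBPP_ID, split=split)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    return tokenize_pairs([mbpp_to_pair(r) for r in records], tokenizer)
```

The resulting `input_ids` and `labels` are left unpadded. A collator such as `DataCollatorForSeq2Seq` can pad each batch before it reaches a `Seq2SeqTrainer`, which keeps short examples cheap.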
## License
This project is distributed under the terms of the Apache License 2.0. Please refer to the LICENSE file for details. The CodeT5-large model weights are covered by the license stated on their Hugging Face model card.