---
language:
- en
library_name: transformers
tags:
- code
- solidity
---
# Solidity Llama 3
## Model Details
### Model Description
Solidity Llama 3 is a large language model specifically designed for Solidity code completion and infilling. It is based on the Llama 3 8B model and has been further trained on the DISL dataset, a large and diverse collection of real-world Solidity smart contracts deployed to Ethereum mainnet. The model is intended for tasks such as code completion within code editors; users should be aware of its limitations, which stem from its training data and from the technology itself.
- **Model type:** Code Completion
- **License:** [More Information Needed]
- **Finetuned from model:** Llama 3 8B
## Uses
### Direct Use
Solidity Llama 3 can be used for code completion and infilling tasks within Solidity code editors. It was trained for this task using the fill-in-the-middle (FIM) objective, where you provide a prefix and a suffix as context for the completion. The following tokens are used to separate the different parts of the input:
- `<|reserved_special_token_11|>` precedes the context before the completion (the prefix).
- `<|reserved_special_token_10|>` precedes the suffix. Place this token exactly where the cursor would sit in an editor, as this is the location the model will fill in.
- `<|reserved_special_token_12|>` goes at the end of the prompt and signals the model to generate the missing middle.
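The token layout above can be sketched as a small helper. Note that `build_fim_prompt` is an illustrative function written for this card, not part of the model's or the library's API:

```python
FIM_SUFFIX = "<|reserved_special_token_10|>"
FIM_PREFIX = "<|reserved_special_token_11|>"
FIM_MIDDLE = "<|reserved_special_token_12|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    # Layout: prefix token, code before the cursor, suffix token,
    # code after the cursor, then the token that triggers generation.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
```

The full example below builds the same prompt inline with an f-string.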
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
FIM_SUFFIX = "<|reserved_special_token_10|>"
FIM_PREFIX = "<|reserved_special_token_11|>"
FIM_MIDDLE = "<|reserved_special_token_12|>"
tokenizer = AutoTokenizer.from_pretrained("andrijdavid/Solidity-Llama3-8b")
model = AutoModelForCausalLM.from_pretrained("andrijdavid/Solidity-Llama3-8b")
prompt = f'''{FIM_PREFIX}contract SendEther {{
function sendViaTransfer(address payable _to) public payable {{
// This function is no longer recommended for sending Ether.
_to.transfer(msg.value);
}}
function sendViaSend(address payable _to) public payable {{
// Send returns a boolean value indicating success or failure.
// This function is not recommended for sending Ether.
{FIM_SUFFIX}
}}
function sendViaCall(address payable _to) public payable {{
// Call returns a boolean value indicating success or failure.
// This is the current recommended method to use.
(bool sent, bytes memory data) = _to.call{{value: msg.value}}("");
require(sent, "Failed to send Ether");
}}{FIM_MIDDLE}
'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][prompt_len:]))
```
To stop generation as soon as the model emits a FIM token or the end-of-sequence token, pass a list of terminators to `generate`:
```python
terminators = tokenizer.convert_tokens_to_ids([FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX])
terminators += [tokenizer.eos_token_id]
outputs = model.generate(
**inputs,
max_new_tokens=1024,
eos_token_id=terminators,
)
print(tokenizer.decode(outputs[0][prompt_len:]))
```
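Even with terminators, the decoded completion may still end with the special token that stopped generation. A minimal post-processing step (an illustrative sketch, assuming the token strings above; `<|end_of_text|>` as the EOS string is an assumption about the base Llama 3 tokenizer) cuts the text at the first special token:

```python
SPECIAL_TOKENS = [
    "<|reserved_special_token_10|>",
    "<|reserved_special_token_11|>",
    "<|reserved_special_token_12|>",
    "<|end_of_text|>",  # assumption: Llama 3 base model EOS string
]

def strip_fim_terminators(completion: str) -> str:
    # Truncate at the earliest occurrence of any special token.
    cut = len(completion)
    for tok in SPECIAL_TOKENS:
        idx = completion.find(tok)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]
```

Alternatively, decoding with `tokenizer.decode(..., skip_special_tokens=True)` removes the special tokens, though it does not truncate any text generated after them.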
### Out-of-Scope Use
The model is trained specifically for Solidity code completion and infilling; it may perform poorly on other tasks or on code in other programming languages.
## Bias, Risks, and Limitations
The model's outputs may reflect biases present in its training data, including insecure or outdated patterns found in deployed contracts. Generated code should always be reviewed and audited before use. More information is needed for further recommendations.