---
language:
- en
library_name: transformers
tags:
- code
- solidity
---
# Solidity Llama 3
## Model Details
### Model Description
Solidity Llama 3 is a large language model specifically designed for Solidity code completion and infilling. It is based on the Llama 3 8B model and has been further trained on the DISL dataset, a large and diverse collection of real-world Solidity smart contracts deployed to Ethereum mainnet. The model is intended for tasks such as code completion within code editors; users should be aware of its limitations, which stem from its training data and from the technology itself.
- **Model type:** Code Completion
- **License:** [More Information Needed]
- **Finetuned from model:** Llama 3 8B
## Uses
### Direct Use
Solidity Llama 3 can be used for code completion and infilling tasks within Solidity code editors. It was trained for this task using the fill-in-the-middle (FIM) objective, where you provide a prefix and a suffix as context for the completion. The following tokens are used to separate the different parts of the input:
- `<|reserved_special_token_11|>` precedes the context before the completion (the prefix).
- `<|reserved_special_token_10|>` precedes the suffix. Place this token exactly where the cursor would sit in an editor, as this is the location the model will fill in.
- `<|reserved_special_token_12|>` goes at the end of the prompt and signals the model to generate the missing middle.
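The token layout above can be sketched as a small helper. Note that `build_fim_prompt` is an illustrative function written for this card, not part of the model's or the library's API:

```python
FIM_SUFFIX = "<|reserved_special_token_10|>"
FIM_PREFIX = "<|reserved_special_token_11|>"
FIM_MIDDLE = "<|reserved_special_token_12|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    # Layout: prefix token, code before the cursor, suffix token,
    # code after the cursor, then the token that triggers generation.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
```

The full example below builds the same prompt inline with an f-string.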
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
FIM_SUFFIX = "<|reserved_special_token_10|>"
FIM_PREFIX = "<|reserved_special_token_11|>"
FIM_MIDDLE = "<|reserved_special_token_12|>"
tokenizer = AutoTokenizer.from_pretrained("andrijdavid/Solidity-Llama3-8b")
model = AutoModelForCausalLM.from_pretrained("andrijdavid/Solidity-Llama3-8b")
prompt = f'''{FIM_PREFIX}contract SendEther {{
function sendViaTransfer(address payable _to) public payable {{
// This function is no longer recommended for sending Ether.
_to.transfer(msg.value);
}}
function sendViaSend(address payable _to) public payable {{
// Send returns a boolean value indicating success or failure.
// This function is not recommended for sending Ether.
{FIM_SUFFIX}
}}
function sendViaCall(address payable _to) public payable {{
// Call returns a boolean value indicating success or failure.
// This is the current recommended method to use.
(bool sent, bytes memory data) = _to.call{{value: msg.value}}("");
require(sent, "Failed to send Ether");
}}{FIM_MIDDLE}
'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][prompt_len:]))
```
To stop generation as soon as the model emits a FIM token or the end-of-sequence token, pass a list of terminators to `generate`:
```python
terminators = tokenizer.convert_tokens_to_ids([FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX])
terminators += [tokenizer.eos_token_id]
outputs = model.generate(
**inputs,
max_new_tokens=1024,
eos_token_id=terminators,
)
print(tokenizer.decode(outputs[0][prompt_len:]))
```
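Even with terminators, the decoded completion may still end with the special token that stopped generation. A minimal post-processing step (an illustrative sketch, assuming the token strings above; `<|end_of_text|>` as the EOS string is an assumption about the base Llama 3 tokenizer) cuts the text at the first special token:

```python
SPECIAL_TOKENS = [
    "<|reserved_special_token_10|>",
    "<|reserved_special_token_11|>",
    "<|reserved_special_token_12|>",
    "<|end_of_text|>",  # assumption: Llama 3 base model EOS string
]

def strip_fim_terminators(completion: str) -> str:
    # Truncate at the earliest occurrence of any special token.
    cut = len(completion)
    for tok in SPECIAL_TOKENS:
        idx = completion.find(tok)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]
```

Alternatively, decoding with `tokenizer.decode(..., skip_special_tokens=True)` removes the special tokens, though it does not truncate any text generated after them.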
### Out-of-Scope Use
The model is trained specifically for Solidity code completion and infilling; it may perform poorly on other tasks or on code in other programming languages.
## Bias, Risks, and Limitations
The model's outputs may reflect biases present in its training data, including insecure or outdated patterns found in deployed contracts. Generated code should always be reviewed and audited before use. More information is needed for further recommendations.