Data preparation and fine-tuning
#2, opened by snehilsanyal
Hey, @JacksonLark can you provide some information on data preparation for the OIG datasets? Also, how did you split the dataset for fine-tuning? Any supporting resources would be helpful.
- Data preparation depends on your training code. My training data looks like this:
{
"instruction":"Given the following schema:\nroad (road_name, state_name)\nstate (state_name, capital, population, area, country_name, density)\nhighlow (state_name, highest_point, highest_elevation, lowest_point, lowest_elevation)\nlake (lake_name, area, state_name, country_name)\nriver (river_name, length, traverse, country_name)\nborder_info (state_name, border)\nmountain (mountain_name, mountain_altitude, state_name, country_name)\ncity (city_name, state_name, population, country_name)\nWrite a SQL query to what states does the mississippi river run through",
"input":"",
"output":"SELECT traverse FROM river WHERE river_name = \"mississippi\" ;"
}
- Fine-tuning is simple; you can use the Hugging Face example code: https://github.com/huggingface/transformers/blob/main/examples/pytorch/summarization/README.md
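For anyone following along, here is a minimal sketch of how records in the instruction/input/output format above could be turned into source/target pairs and split for fine-tuning. The split was not described in this thread, so the 10% validation fraction, the fixed seed, the prompt template, the `text`/`summary` column names, and all function names here are assumptions for illustration, not the author's actual setup.

```python
import json
import random


def build_prompt(record):
    """Join instruction and optional input into one source string (assumed template)."""
    if record["input"]:
        return record["instruction"] + "\n\n" + record["input"]
    return record["instruction"]


def to_pairs(records):
    """Map instruction/input/output records to text/summary pairs (assumed column names)."""
    return [{"text": build_prompt(r), "summary": r["output"]} for r in records]


def split_records(records, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a validation fraction (assumed 10%)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


# Tiny in-memory demo with placeholder records (not real training data).
demo = [{"instruction": f"question {i}", "input": "", "output": f"answer {i}"}
        for i in range(20)]
train_rows, val_rows = split_records(to_pairs(demo))
print(len(train_rows), len(val_rows))
```

Calling `write_jsonl("train.json", train_rows)` and `write_jsonl("validation.json", val_rows)` would give you two JSON lines files (the file names are placeholders). If you go with the summarization example linked above, check its README for how to point the script at custom files and column names.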
Do you have a lead now, @snehilsanyal? I am new to this, so any help from you would be greatly appreciated.
@NikAlan Sorry, I have been a bit busy with other work. I will start on fine-tuning again and let you know.