Model overview
This model was trained for the GenChal 2022: Feedback Comment Generation for Writing Learning shared task.
In this task, the model receives a text containing an error together with the exact span of that error, and should return a natural-language comment that explains the nature of the error.
How to use
!pip install feedback_generation_nigula
from feedback_generation_nigula.generator import FeedbackGenerator
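fg = FeedbackGenerator(cuda_index=0)
text_with_error = "The smoke flow my face ."
# character offsets of the erroneous fragment ("flow my")
error_span = (10, 17)
# both arguments are lists, one entry per text
fg.get_feedback([text_with_error], [error_span])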
# expected output ["When the <verb> <<flow>> is used as an <intransitive verb> to express'' to move in a stream'', a <preposition> needs to be placed to indicate the direction"]
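The error span in the example above is a pair of character offsets into the text. As a minimal sketch, the offsets can be derived from the erroneous fragment itself. `find_error_span` is a hypothetical helper written for illustration and is not part of `feedback_generation_nigula`; it assumes the end offset is exclusive, which matches the `(10, 17)` example.

```python
def find_error_span(text, fragment):
    """Return (start, end) character offsets of the first occurrence of fragment.

    Hypothetical helper, not part of the package. Assumes an exclusive end
    offset, so that text[start:end] == fragment.
    """
    start = text.find(fragment)
    if start == -1:
        raise ValueError(f"fragment {fragment!r} not found in text")
    return (start, start + len(fragment))


text_with_error = "The smoke flow my face ."
error_span = find_error_span(text_with_error, "flow my")
print(error_span)
```

The resulting tuple can be passed to `get_feedback` in place of a hand-counted span.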
Model training details
Data
The data was provided in the following tab-separated format:
input sentence [\t] offset range [\t] feedback comment
Here are some examples:
The smoke flow my face . 10:17 When the <verb> <<flow>> is used as an <intransitive verb> to express ''to move in a stream'', a <preposition> needs to be placed to indicate the direction. 'To' and 'towards' are <prepositions> that indicate direction.
I want to stop smoking during driving bicycle . 23:29 A <gerund> does not normally follow the <preposition> <<during>>. Think of an expression using the <conjunction> 'while' instead of a <preposition>.
Grammar terms are marked with '< ... >' and example words with '<< ... >>'.
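A record in this format can be read with a few lines of Python. This is a sketch under the assumption that the three fields are separated by single tab characters and that the offset range is written as `start:end`, as in the examples; `Example` and `parse_line` are illustrative names, not part of the released code.

```python
from typing import NamedTuple


class Example(NamedTuple):
    sentence: str
    start: int
    end: int
    comment: str


def parse_line(line):
    """Split one 'sentence<TAB>start:end<TAB>comment' record into its fields."""
    sentence, offsets, comment = line.rstrip("\n").split("\t")
    start, end = (int(x) for x in offsets.split(":"))
    return Example(sentence, start, end, comment)


line = (
    "I want to stop smoking during driving bicycle .\t23:29\t"
    "A <gerund> does not normally follow the <preposition> <<during>>."
)
example = parse_line(line)
# The offsets index into the sentence and select the erroneous fragment.
print(example.sentence[example.start:example.end])
```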
Data preprocessing
We lowercased the text, separated the words from all punctuation (including the task-specific marks << >>), and explicitly marked the error in the original text using << >>.
the smoke < < flow > > < < my > > face . 10:17 When the < verb > < < flow > > is used as an < intransitive verb > to express '' to move in a stream '', a < preposition > needs to be placed to indicate the direction. ' to ' and ' towards ' are < prepositions > that indicate direction .
i want to stop smoking < < during > > driving bicycle . 23:29 a < gerund > does not normally follow the < preposition > < < during > > . think of an expression using the < conjunction > ' while ' instead of a < preposition > .
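The preprocessing of the input sentence can be approximated as follows. This is a sketch, not the original preprocessing code: `mark_and_tokenize` is a hypothetical name, wrapping each word of the span separately is inferred from the first example, and the punctuation rule (a space around every non-alphanumeric character) is an assumption that reproduces the input sentences shown above but may differ from the original on other punctuation such as `''`.

```python
import re


def mark_and_tokenize(text, span):
    """Mark the error span with << >>, lowercase, and split off punctuation."""
    start, end = span
    # Wrap every whitespace-separated word inside the error span in << >>.
    marked_span = " ".join(f"<<{word}>>" for word in text[start:end].split())
    marked = text[:start] + marked_span + text[end:]
    lowered = marked.lower()
    # Surround each punctuation character (including < and >) with spaces.
    spaced = re.sub(r"([^\w\s])", r" \1 ", lowered)
    # Collapse the resulting runs of whitespace into single spaces.
    return " ".join(spaced.split())


print(mark_and_tokenize("The smoke flow my face .", (10, 17)))
print(mark_and_tokenize("I want to stop smoking during driving bicycle .", (23, 29)))
```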
Data augmentation
The main feature of our training pipeline was data augmentation. The idea is as follows: we cut each text with an error after the last word that is syntactically connected to the words inside the error span (syntactic dependencies were parsed automatically with spaCy). The truncated text was then used as a prompt for a language model (we used GPT-Neo 1.3B).
We fine-tuned t5-large on both the original and the augmented data.
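The cutting step can be sketched without running a parser by passing the dependency heads in directly. Several things here are assumptions: "syntactically connected" is read as a direct head or child relation, the function name is illustrative, and the `heads` list below is hand-written for the example rather than actual spaCy output. The root token points to itself, mirroring spaCy's convention.

```python
def cut_after_last_connected(tokens, heads, span_token_ids):
    """Truncate the token list after the last token linked to the error span.

    tokens: list of word strings.
    heads: heads[i] is the index of the syntactic head of token i
           (the root is its own head).
    span_token_ids: indices of the tokens inside the error span.
    """
    span = set(span_token_ids)
    connected = set(span)
    for i, head in enumerate(heads):
        if i in span:
            connected.add(head)  # head of a token inside the span
        elif head in span:
            connected.add(i)     # child of a token inside the span
    last = max(connected)
    return " ".join(tokens[: last + 1])


tokens = ["I", "want", "to", "stop", "smoking", "during", "driving", "bicycle", "."]
# Hand-written illustrative parse: "during" attaches to "stop",
# and "driving" attaches to "during".
heads = [1, 1, 3, 1, 3, 3, 5, 6, 1]
prompt = cut_after_last_connected(tokens, heads, {5})
print(prompt)
```

With this parse the prompt ends at "driving", the last word attached to the error span, and the text after it is left for the language model to continue.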
Licensing Information
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.