---
language:
- en
---

# Model Card: Context-to-QA-Generation using GPT-2

# Description

This model generates questions, answers, hints, and multiple-choice options from a given input context. It is built on GPT-2, fine-tuned so that, for each context, it produces questions together with multiple-choice options, the correct answer, and a hint in a fixed output structure.

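The model expects its input in a fixed instruction-style prompt: a task description, the source passage under an `### Input:` header (with the answer-bearing sentence wrapped in `<hl>` highlight markers), and a trailing `### Response:` header, as in the usage example below. A small helper for building that prompt might look like this (the function name `build_prompt` is illustrative, not part of the released code; the instruction wording mirrors the example prompt in this card):

```python
def build_prompt(passage: str) -> str:
    """Wrap a context passage in the instruction template this checkpoint was
    fine-tuned on. The wording should match the template used at training time."""
    instruction = (
        "Below is input text, the task is to generate questions from input text "
        "and multiple answers for each question and provide hint and correct "
        "answer for each question."
    )
    return f"{instruction}\n\n### Input:\n{passage}\n\n### Response: "


# The answer-bearing sentence can be highlighted with <hl> markers:
prompt = build_prompt("<hl> Histamine causes the smooth muscle cells of the bronchi "
                      "to constrict , narrowing the airways . <hl>")
```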
# Intended Use

This model is intended for generating questions, answers, hints, and multiple-choice options from a given context. Typical applications include education, exam preparation, content creation, and other settings where automatic question generation is needed.

# Limitations

- The quality of generated questions, answers, and hints depends on the quality and complexity of the input context; simpler contexts are more likely to yield accurate and coherent outputs.
- The model may sometimes generate incorrect or nonsensical content, especially when the input context is complex or ambiguous.
- The model's output may be influenced by biases present in the training data, potentially leading to biased or inappropriate content generation.

```python
#!pip install transformers

from transformers import AutoTokenizer, GPT2LMHeadModel

checkpoint = "AbdelrahmanFakhry/finetuned-gpt2-multi-QA-Generation"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = GPT2LMHeadModel.from_pretrained(checkpoint)

# Retrieve a test question from the test dataset, e.g.:
# test_text = test_dataset.to_dict()['question'][3]

# test_text should look like this:
test_text = '''Below is input text, the task is to generate questions from input text and multiple answers for
each question and provide hint and correct answer for each question.\n\n### Input:\n<hl> Local intercellular
communication is the province of the paracrine , also called a paracrine factor , which is a chemical that
induces a response in neighboring cells . <hl> Although paracrines may enter the bloodstream , their concentration
is generally too low to elicit a response from distant tissues . A familiar example to those with asthma is histamine ,
a paracrine that is released by immune cells in the bronchial tree . Histamine causes the smooth muscle cells of the bronchi
to constrict , narrowing the airways . Another example is the neurotransmitters of the nervous system , which act only locally
within the synaptic cleft .\n\n### Response: '''


def inference(text, model, tokenizer, max_input_tokens=1024, max_output_tokens=500):
    """
    Generate a text continuation for the given input text using a pretrained model.

    Args:
        text (str): The input text for which to generate a continuation.
        model (PreTrainedModel): The pretrained model to use for text generation.
        tokenizer (PreTrainedTokenizer): The tokenizer used to preprocess the input
            and decode the output.
        max_input_tokens (int): Maximum number of input tokens. GPT-2's context
            window is 1024 tokens, so longer inputs are truncated.
        max_output_tokens (int): Maximum number of newly generated tokens.

    Returns:
        str: The generated text continuation.
    """
    # Tokenize the input text, truncating it to the model's context window
    input_ids = tokenizer.encode(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens
    )

    # Generate the continuation on the same device as the model
    device = model.device
    generated_tokens_with_prompt = model.generate(
        input_ids=input_ids.to(device),
        max_new_tokens=max_output_tokens
    )

    # Decode the generated tokens into text
    generated_text_with_prompt = tokenizer.batch_decode(
        generated_tokens_with_prompt, skip_special_tokens=True
    )

    # Strip the input prompt (and stray quote/bracket characters) from the output
    generated_text_answer = generated_text_with_prompt[0][len(text):]
    generated_text_answer = generated_text_answer.lstrip(" '][{").rstrip(" '][{}")
    return generated_text_answer


generated_answer = inference(test_text, model, tokenizer)

# The generated answer should look like this:
'''
"Choices': ['paracrine factor', 'paracrine factor', 'paracrine factor II', 'paracrine factor III'],
'Question': 'Which of the following is not a paracrine factor?',
'answer': 'paracrine factor II',
'hint': 'Local intercellular communication is the province of the paracrine, also called a paracrine factor,
which is a chemical that induces a response in neighboring cells."
'''
print('Generated Answer:')
print(generated_answer)
```
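The raw generation is a loosely formatted, dictionary-like string rather than valid JSON, so it typically needs light post-processing before the individual fields can be used. Below is a best-effort sketch of one way to extract them with regular expressions (the helper `parse_generated_qa` and its regexes are illustrative, not part of the released code; the key names follow the sample output above, and real outputs may still need manual review):

```python
import re


def parse_generated_qa(raw: str) -> dict:
    """Best-effort extraction of the Question/answer/hint/Choices fields from
    the model's quasi-dict output. Fields that cannot be found are omitted."""
    fields = {}
    # Single-valued fields such as 'Question': '...'
    for key in ("Question", "answer", "hint"):
        m = re.search(
            rf"['\"]{key}['\"]\s*:\s*['\"](.+?)['\"]\s*(?:[,}}\n]|$)",
            raw,
            re.DOTALL,
        )
        if m:
            fields[key] = m.group(1).strip()
    # List-valued field: 'Choices': ['a', 'b', ...]
    m = re.search(r"['\"]Choices['\"]\s*:\s*\[(.*?)\]", raw, re.DOTALL)
    if m:
        fields["Choices"] = re.findall(r"['\"](.+?)['\"]", m.group(1))
    return fields
```

When the output follows the expected shape, `parse_generated_qa(generated_answer).get("Question")` returns the generated question; since the model can deviate from that shape, callers should handle missing fields gracefully.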
# Acknowledgments

This model is built on the GPT-2 architecture and fine-tuned on a custom dataset for the specific task of generating questions, answers, hints, and choices.

# Disclaimer

The model's performance may vary with the input context and task requirements. Review and edit the generated content before using it in critical applications, and keep the model's limitations and biases in mind when interpreting its outputs.