---
license: cc-by-nc-nd-4.0
language:
- es
pipeline_tag: text-generation
tags:
- dialogue
- conversational
- gpt
- gpt2
- text-generation
inference: false
---

# DialoGPT-medium-spanish-chitchat

## Description

This is a **transformer-decoder** [GPT-2 model](https://huggingface.co/gpt2), adapted for **single-turn dialogue tasks in Spanish**. We fine-tuned the 345M-parameter [DialoGPT-medium](https://huggingface.co/microsoft/DialoGPT-medium) model from Microsoft, following the CLM (Causal Language Modelling) objective.

We used one of the datasets available in the [Bot Framework Tools repository](https://github.com/microsoft/botframework-cli). We processed [the professional-styled personality chat dataset in Spanish](https://github.com/microsoft/botframework-cli/blob/main/packages/qnamaker/docs/chit-chat-dataset.md); the file is [available for download here](https://qnamakerstore.blob.core.windows.net/qnamakerdata/editorial/spanish/qna_chitchat_professional.tsv).

---

## Example inference script

### Use this example script to run the model in inference mode

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHAT_TURNS = 5
MAX_LENGTH = 1000

model = AutoModelForCausalLM.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')
tokenizer = AutoTokenizer.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')

# Run on the GPU when one is available, otherwise on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for i in range(CHAT_TURNS):
    user_input = input(f"Step - {i} >> user prompt ->")
    with torch.no_grad():
        # User turn, where "user_input" is the question (single-turn dialogue task)
        user_inputs_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
        user_inputs_ids = user_inputs_ids.to(device)

        # generate() returns the prompt tokens followed by the generated answer tokens
        chat_history = model.generate(user_inputs_ids, max_length=MAX_LENGTH, pad_token_id=tokenizer.eos_token_id)

        # Decode only the newly generated tokens (do not include the user prompt again)
        step_model_answer = tokenizer.decode(chat_history[:, user_inputs_ids.shape[-1]:][0], skip_special_tokens=True)

    print(f"Step - {i} >> DialoGPT-spanish model answer -> {step_model_answer}")
```

---

## Examples

### Single-turn generation examples from an interactive environment

| Role | Response |
|---------|--------|
| User | Hola, buenos dias |
| Bot | Buenos días |
| User | ¿Qué tal estás? |
| Bot | Fenomenal, gracias. |
| User | ¡Tengo hambre! |
| Bot | Quizá picar algo te ayude. |
| User | Estoy triste |
| Bot | Siento escuchar eso. |
| User | ¿Tienes personalidad? |
| Bot | Soy un bot. |
| User | ¡Cuenta un chiste! |
| Bot | En realidad no tengo tanta gracia. |
| User | Jajajajajajajaja |
| Bot | ¡Me alegro de que te rías! |

---

## Fine-tuning hyper-parameters

| **Hyper-parameter** | **Value** |
|:----------------------------------------:|:---------------------------:|
| Validation partition (%) | 20% |
| Training batch size | 8 |
| Learning rate | 5e-4 |
| Max training epochs | 20 |
| Warmup training steps (%) | 6% |
| Weight decay | 0.01 |
| Optimiser (beta1, beta2, epsilon) | AdamW (0.9, 0.999, 1e-08) |
| Monitoring metric (delta, patience) | validation loss (0.1, 3) |

## Fine-tuning on a different dataset or style

If you want to fine-tune your own dialogue model, we recommend starting from the [DialoGPT model](https://huggingface.co/microsoft/DialoGPT-medium). You can also check the [original GitHub repository](https://github.com/microsoft/DialoGPT).

## Limitations

- This model is intended to be used **only for single-turn chitchat conversations in Spanish**.
- This model's generation capabilities are limited to the scope of the fine-tuning dataset described above.
- This model generates short answers, providing general-context dialogue in a professional style.
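
## Monitoring metric sketch

The "Monitoring metric (delta, patience)" row of the hyper-parameter table can be read as an early-stopping rule on the validation loss. The snippet below is a minimal sketch of that reading, not the training code used for this model. It assumes that `delta` is the minimum decrease in validation loss that counts as an improvement and that `patience` is the number of consecutive evaluations without such an improvement before training stops. The `EarlyStopping` class and the loss values are hypothetical and only serve as an illustration.

```python
class EarlyStopping:
    """Hypothetical helper: stop once the validation loss has failed to
    improve by at least `min_delta` for `patience` evaluations in a row."""

    def __init__(self, min_delta=0.1, patience=3):
        self.min_delta = min_delta
        self.patience = patience
        self.best = float("inf")  # lowest validation loss accepted so far
        self.bad_evals = 0        # consecutive evaluations without improvement

    def should_stop(self, val_loss):
        if self.best - val_loss >= self.min_delta:
            # Large enough improvement: record it and reset the counter
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Made-up validation losses, one per epoch (illustration only)
val_losses = [2.0, 1.5, 1.45, 1.44, 1.43, 1.0]

stopper = EarlyStopping(min_delta=0.1, patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break

print(f"Stopped after epoch {stopped_at}, best validation loss {stopper.best}")
```

With these made-up values, the loss stops improving by 0.1 after the second epoch, so the rule triggers three evaluations later. A delta of 0.1 is fairly strict: small but steady improvements do not reset the counter under this interpretation.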