---
license: cc-by-nc-nd-4.0
language:
- es
pipeline_tag: text-generation
tags:
- dialogue
- conversational
- gpt
- gpt2
- text-generation
inference: false
---

# DialoGPT-medium-spanish-chitchat

## Description

This is a **transformer-decoder** [GPT-2 model](https://huggingface.co/gpt2), adapted for **single-turn dialogue tasks in Spanish**. We fine-tuned the 345M-parameter [DialoGPT-medium](https://huggingface.co/microsoft/DialoGPT-medium) model from Microsoft with the CLM (causal language modelling) objective.
For fine-tuning, we processed [the professional-styled personality chat dataset in Spanish](https://github.com/microsoft/botframework-cli/blob/main/packages/qnamaker/docs/chit-chat-dataset.md), one of the datasets available in the [Bot Framework Tools repository](https://github.com/microsoft/botframework-cli). The file is [available for download here](https://qnamakerstore.blob.core.windows.net/qnamakerdata/editorial/spanish/qna_chitchat_professional.tsv).

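The preprocessing itself is not described in this card, so the following is only a minimal sketch of how question/answer rows from a TSV file could be turned into single-turn training strings. It assumes the file has `Question` and `Answer` columns and that each example is formatted as `question<eos>answer<eos>`, which mirrors the prompt format used by the inference script in this card. The function name `build_training_texts`, the column names, and the sample rows are all illustrative assumptions, not the authors' pipeline.

```python
import csv
import io

# Assumption: the GPT-2/DialoGPT end-of-text token.
# In practice, read it from tokenizer.eos_token instead of hard-coding it.
EOS_TOKEN = "<|endoftext|>"


def build_training_texts(tsv_text, question_col="Question", answer_col="Answer",
                         eos_token=EOS_TOKEN):
    """Turn question/answer rows into single-turn training strings (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    texts = []
    for row in reader:
        question = (row.get(question_col) or "").strip()
        answer = (row.get(answer_col) or "").strip()
        if question and answer:  # skip incomplete rows
            texts.append(f"{question}{eos_token}{answer}{eos_token}")
    return texts


# Tiny in-memory sample standing in for the downloaded file (hypothetical rows).
sample_tsv = (
    "Question\tAnswer\tSource\tMetadata\n"
    "Hola\tBuenos días.\tqna_chitchat_professional\teditorial:chitchat\n"
    "Estoy triste\tSiento escuchar eso.\tqna_chitchat_professional\teditorial:chitchat\n"
)
training_texts = build_training_texts(sample_tsv)
```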
---

## Example inference script

### Use this example script to run the model in inference mode

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHAT_TURNS = 5
MAX_LENGTH = 1000

model = AutoModelForCausalLM.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')
tokenizer = AutoTokenizer.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
for i in range(CHAT_TURNS):
    user_input = input(f"Step - {i} >> user prompt ->")
    with torch.no_grad():
        # User turn, where "user_input" is the question (single-turn dialogue task)
        user_inputs_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
        user_inputs_ids = user_inputs_ids.to(device)
        # The output contains the prompt tokens followed by the generated answer tokens
        chat_history = model.generate(user_inputs_ids, max_length=MAX_LENGTH, pad_token_id=tokenizer.eos_token_id)
        # Decode only the newly generated tokens (do not include the user prompt again)
        step_model_answer = tokenizer.decode(chat_history[0, user_inputs_ids.shape[-1]:], skip_special_tokens=True)
    print(f"Step - {i} >> DialoGPT-spanish model answer -> {step_model_answer}")
```
---

## Examples

### Single-turn generation examples from an interactive environment

| Role | Utterance |
|------|-----------|
| User | Hola, buenos dias |
| Bot  | Buenos días |
| User | ¿Qué tal estás? |
| Bot  | Fenomenal, gracias. |
| User | ¡Tengo hambre! |
| Bot  | Quizá picar algo te ayude. |
| User | Estoy triste |
| Bot  | Siento escuchar eso. |
| User | ¿Tienes personalidad? |
| Bot  | Soy un bot. |
| User | ¡Cuenta un chiste! |
| Bot  | En realidad no tengo tanta gracia. |
| User | Jajajajajajajaja |
| Bot  | ¡Me alegro de que te rías! |

---

## Fine-tuning hyper-parameters

|            **Hyper-parameter**           |          **Value**          |
|:----------------------------------------:|:---------------------------:|
|         Validation partition (%)         |             20%             |
|            Training batch size           |              8              |
|               Learning rate              |             5e-4            |
|            Max training epochs           |              20             |
|         Warmup training steps (%)        |             6%              |
|               Weight decay               |             0.01            |
| Optimiser (beta1, beta2, epsilon) | AdamW (0.9, 0.999, 1e-08) |
|  Monitoring metric (delta, patience)     |   validation loss (0.1, 3)  |

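To make two of the table entries concrete, here is a minimal sketch of how the warmup percentage translates into a number of steps and how the monitoring metric could drive early stopping. Reading "(delta, patience)" as early-stopping settings is an assumption, as are the helper names `warmup_steps` and `EarlyStopping` and the training-set size used in the usage note; the card does not state the actual number of training examples or the training code.

```python
import math

# Values taken from the hyper-parameter table above.
BATCH_SIZE = 8
MAX_EPOCHS = 20
WARMUP_PERCENT = 6
MIN_DELTA = 0.1
PATIENCE = 3


def warmup_steps(num_train_examples: int) -> int:
    """Warmup steps for a given training-set size (hypothetical helper)."""
    steps_per_epoch = math.ceil(num_train_examples / BATCH_SIZE)
    total_steps = steps_per_epoch * MAX_EPOCHS
    return math.ceil(total_steps * WARMUP_PERCENT / 100)


class EarlyStopping:
    """Stop once validation loss fails to improve by min_delta for `patience` evaluations."""

    def __init__(self, min_delta: float = MIN_DELTA, patience: int = PATIENCE):
        self.min_delta = min_delta
        self.patience = patience
        self.best_loss = math.inf
        self.bad_evals = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            # Improvement larger than min_delta: record it and reset the counter.
            self.best_loss = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

For example, with a hypothetical 800 training examples, `warmup_steps(800)` covers 6% of the 2,000 total optimisation steps. An `EarlyStopping()` instance fed one validation loss per evaluation returns `True` from `should_stop` after three consecutive evaluations without an improvement of at least 0.1.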

## Fine-tuning on a different dataset or style

If you want to fine-tune your own dialogue model, we recommend starting from the [DialoGPT model](https://huggingface.co/microsoft/DialoGPT-medium).
For more details, see the [original GitHub repository](https://github.com/microsoft/DialoGPT).

## Limitations

- This model is intended **only for single-turn chitchat conversations in Spanish**.
- This model's generation capabilities are limited by the coverage of the fine-tuning dataset described above.
- This model generates short, general-purpose chitchat answers in a professional style.