Pankaj Singh Rawat committed on
Commit 9e582c5 · 0 Parent(s)

Initial commit

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,3 @@
+ wandb
+ __pycache__
+ transliteration
README copy.md ADDED
@@ -0,0 +1,130 @@
+ # Sequence to Sequence Language Transliteration using RNNs and Transformers
+
+ This repository contains the files for the third assignment of the course CS6910 - Deep Learning at IIT Madras.
+
+ The transformers part was added later and was not part of the assignment.
+
+ Implemented an Encoder-Decoder architecture with and without an attention mechanism, and later with Transformers, and used them to perform transliteration on the provided Aksharantar dataset (English-Hindi transliteration pairs). The recurrent models were built using the RNN, LSTM and GRU cells provided by PyTorch.
+
+ The transformer architecture is built from scratch following the "Attention Is All You Need" paper, using only the basic feed-forward and embedding layers from PyTorch.
+
+ Jump to Section: [Usage](#usage)
+
+ Report: [Report](https://wandb.ai/iitmadras/CS6910_Assignment_3/reports/CS6910-Assignment-3-Report--Vmlldzo0MzQyNDk5)
+
+ ## Encoder
+
+ The encoder is a simple cell of either LSTM, RNN or GRU. The input to the encoder is a sequence of characters and the output is a sequence of hidden states. The hidden state of the last time step is used as the context vector for the decoder.
+
+ The encoder can also be a transformer encoder with multiple layers of self-attention. The output generated by the encoder is fed to the transformer decoder.
+
+ ## Decoder
+
+ The decoder is again a simple cell of either LSTM, RNN or GRU. The input to the decoder is the hidden state of the encoder and the output of the previous time step. The output of the decoder is a sequence of characters. The decoder has an additional fully connected layer and a log softmax which are used to predict the next character.
+
+ The decoder can also be a transformer decoder with multiple layers of masked self-attention and cross-attention. The output generated by the encoder is fed as input to the transformer decoder, and next-character prediction is used to generate the complete target sequence in Hindi.
+
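+ As a concrete illustration of the recurrent decode loop described above, here is a minimal, self-contained sketch (hypothetical sizes, GRU cells and greedy decoding; not the exact training code):
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ vocab, embed, hidden = 30, 16, 64       # hypothetical sizes
+ enc_emb = nn.Embedding(vocab, embed)
+ encoder = nn.GRU(embed, hidden)
+ dec_emb = nn.Embedding(vocab, embed)
+ decoder = nn.GRU(embed, hidden)
+ out = nn.Linear(hidden, vocab)
+
+ src = torch.randint(0, vocab, (12, 1))  # (seq_len, batch) of character indices
+ _, h = encoder(enc_emb(src))            # last hidden state = context vector
+
+ tok = torch.full((1, 1), 2)             # start-of-sequence index ('^')
+ for _ in range(20):                     # emit one character at a time
+     step, h = decoder(dec_emb(tok), h)
+     tok = out(step).argmax(dim=-1)      # greedily pick the most likely character
+ ```
+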
+ ## Attention Mechanism
+
+ The attention mechanism is dot-product attention. Attention weights are calculated by taking the softmax of the dot products between the decoder hidden state and the encoder hidden states; the weighted sum of the encoder hidden states is then concatenated with the decoder hidden state and passed through a fully connected layer to produce the decoder output.
+
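+ A minimal sketch of that computation (hypothetical shapes; it mirrors the idea rather than the exact layer names used in the code):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ B, T, H = 16, 27, 256                    # hypothetical batch, source length, hidden size
+ enc_out = torch.randn(B, T, H)           # encoder hidden states
+ dec_h = torch.randn(B, H)                # current decoder hidden state
+
+ scores = torch.bmm(enc_out, dec_h.unsqueeze(2)).squeeze(2)     # (B, T) dot products
+ weights = F.softmax(scores, dim=1)                             # attention weights
+ context = torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)  # (B, H) weighted sum
+ combined = torch.cat((context, dec_h), dim=1)                  # fed to a fully connected layer
+ ```
+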
+ ## Dataset
+
+ The dataset used is the Aksharantar dataset provided by the course. For each language in a subset of Indian languages it contains 3 files, namely `train.csv`, `valid.csv` and `test.csv`. I have used the Hindi dataset for this assignment. Each file contains 2 columns, namely the English and Hindi words, which are the input and output strings respectively.
+
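+ For example, a split can be loaded like this (a sketch; it assumes the `aksharantar_sampled` layout used in the notebooks, where the CSVs have no header row):
+
+ ```python
+ import pandas as pd
+
+ # column 0 = English (Latin script), column 1 = Hindi (Devanagari)
+ train_df = pd.read_csv("aksharantar_sampled/hin/hin_train.csv", header=None)
+ print(train_df[0][0], train_df[1][0])  # one English-Hindi transliteration pair
+ ```
+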
+ ## Used Python Libraries and Versions
+
+ - Python 3.10.9
+ - PyTorch 1.13.1
+ - Pandas 1.5.3
+
+ ## Usage
+
+ To run the training code for the standard encoder-decoder architecture using the best set of hyperparameters, run the following command:
+
+ ```bash
+ python3 train.py
+ ```
+
+ To run the training code for the encoder-decoder architecture with the attention mechanism using the best set of hyperparameters, run the following command:
+
+ ```bash
+ python3 train_attention.py
+ ```
+
+ To run the inference code for the standard encoder-decoder architecture using the best set of hyperparameters, run the following command. It uses the state dicts stored in the `best_models` folder and writes the test predictions to a file named `test_gen.txt`:
+
+ ```bash
+ python3 test_best_vanilla.py
+ ```
+
+ To run the inference code for the encoder-decoder architecture with the attention mechanism using the best set of hyperparameters, run the following command. It likewise uses the state dicts stored in the `best_models` folder and writes the test predictions to `test_gen.txt`:
+
+ ```bash
+ python3 test_best_attention.py
+ ```
+
+ To train with custom hyperparameters, first list the available options:
+
+ ```bash
+ python3 train.py -h
+ ```
+
+ ```bash
+ # The output of the above command is as follows:
+ usage: train.py [-h]
+ [-es EMBED_SIZE]
+ [-hs HIDDEN_SIZE]
+ [-ct CELL_TYPE]
+ [-nl NUM_LAYERS]
+ [-d DROPOUT]
+ [-lr LEARNING_RATE]
+ [-o OPTIMIZER]
+ [-l LANGUAGE]
+
+ Transliteration Model
+
+ options:
+ -h, --help show this help message and exit
+ -es EMBED_SIZE, --embed_size EMBED_SIZE Embedding Size, good_choices = [8, 16, 32]
+ -hs HIDDEN_SIZE, --hidden_size HIDDEN_SIZE Hidden Size, good_choices = [128, 256, 512]
+ -ct CELL_TYPE, --cell_type CELL_TYPE Cell Type, choices: [LSTM, GRU, RNN]
+ -nl NUM_LAYERS, --num_layers NUM_LAYERS Number of Layers, choices: [1, 2, 3]
+ -d DROPOUT, --dropout DROPOUT Dropout, good_choices: [0, 0.1, 0.2]
+ -lr LEARNING_RATE, --learning_rate LEARNING_RATE Learning Rate, good_choices: [0.0005, 0.001, 0.005]
+ -o OPTIMIZER, --optimizer OPTIMIZER Optimizer, choices: [SGD, ADAM]
+ -l LANGUAGE, --language LANGUAGE Language
+ ```
+
+ To train the attention model with custom hyperparameters, list its options in the same way:
+
+ ```bash
+ python3 train_attention.py -h
+ ```
+
+ ```bash
+ usage: train_attention.py [-h]
+ [-es EMBED_SIZE]
+ [-hs HIDDEN_SIZE]
+ [-ct CELL_TYPE]
+ [-nl NUM_LAYERS]
+ [-dr DROPOUT]
+ [-lr LEARNING_RATE]
+ [-op OPTIMIZER]
+ [-wd WEIGHT_DECAY]
+ [-l LANG]
+
+ Transliteration Model with Attention
+
+ options:
+ -h, --help show this help message and exit
+ -es EMBED_SIZE, --embed_size EMBED_SIZE Embedding size
+ -hs HIDDEN_SIZE, --hidden_size HIDDEN_SIZE Hidden size
+ -ct CELL_TYPE, --cell_type CELL_TYPE Cell type
+ -nl NUM_LAYERS, --num_layers NUM_LAYERS Number of layers
+ -dr DROPOUT, --dropout DROPOUT Dropout
+ -lr LEARNING_RATE, --learning_rate LEARNING_RATE Learning rate
+ -op OPTIMIZER, --optimizer OPTIMIZER Optimizer
+ -wd WEIGHT_DECAY, --weight_decay WEIGHT_DECAY Weight decay
+ -l LANG, --lang LANG Language
+ ```
README.md ADDED
@@ -0,0 +1,12 @@
+ ---
+ title: Transliteration
+ emoji: 📈
+ colorFrom: green
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 4.44.1
+ app_file: app.py
+ pinned: false
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,23 @@
+ import gradio as gr
+ from inference.language import Language
+ from inference.utility import Encoder, Decoder, encoderBlock, decoderBlock, MultiHeadAttention, Head, FeedForward
+ from inference.transformer import generate
+
+ # Run the transliteration model on the user's input
+ def predict(user_input):
+     # Split the input sentence into words
+     words = user_input.split(" ")
+
+     result = generate(words)
+
+     # Join the transliterated words back into a sentence
+     return " ".join(result)
+
+
+ # Launch the Gradio interface
+ if __name__ == "__main__":
+     gr.Interface(predict,
+                  inputs=gr.Textbox(placeholder="Your Hinglish text"),
+                  outputs=gr.Textbox(placeholder="Output Hindi text"),
+                  description="An English to Hindi transliteration app",
+                  examples=["namaste aapko", "kese ho aap, sab badiya"]).launch(share=False)
fast_api.py ADDED
@@ -0,0 +1,38 @@
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel
+ from inference.language import Language
+ from inference.utility import Encoder, Decoder, encoderBlock, decoderBlock, MultiHeadAttention, Head, FeedForward
+ from inference.transformer import generate
+ from typing import List
+ import uvicorn
+
+
+ # Initialize FastAPI app
+ app = FastAPI()
+
+ # Create a request model to define the input for the transliteration pipeline
+ class TransRequest(BaseModel):
+     query: str
+
+ # Create a response model to define the output of the transliteration pipeline
+ class TransResponse(BaseModel):
+     response: List[str]
+
+ # Define a FastAPI endpoint for the transliteration pipeline
+ @app.post("/trans", response_model=TransResponse)
+ async def get_transliteration(request: TransRequest):
+     try:
+         # Call the transliteration function with the words of the query
+         words = request.query.split(" ")
+         result = generate(words)
+
+         return TransResponse(
+             response=result
+         )
+     except Exception as e:
+         # In case of an error, return an HTTPException with a 500 status code
+         raise HTTPException(status_code=500, detail=str(e))
+
+ # Run the FastAPI application (for local testing)
+ if __name__ == "__main__":
+     uvicorn.run(app, host="127.0.0.1", port=8000)
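A quick way to exercise this endpoint once the server is running (a sketch of the request/response contract; main.py below wraps the same call in a Gradio UI):

```python
import requests

# assumes fast_api.py is running locally on port 8000
r = requests.post("http://127.0.0.1:8000/trans", json={"query": "namaste aapko"})
print(r.json())  # {"response": [...]}: one transliterated word per input word
```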
inference/__init__.py ADDED
File without changes
inference/decoder (1).pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66c2e34a0036672d568c9b5faa09bcd80f75f0edeea6053aeacb20b8094aace6
+ size 14137430
inference/demo.ipynb ADDED
@@ -0,0 +1,185 @@
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import pickle\n",
+ "from language import Language\n",
+ "from utility import Encoder, Decoder, encoderBlock, decoderBlock, MultiHeadAttention, Head, FeedForward\n",
+ "import warnings\n",
+ "from typing import List\n",
+ "warnings.filterwarnings(\"ignore\", category=FutureWarning)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['लॉक्सलाक्राक्यालालासी']"
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s = 'a' * 1\n",
+ "generate([s])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'^थ्रालाष्राप्टोार्फ्रास्रफ्फ्फ्'"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "output_lang.decode(o.tolist()[0])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "tensor([20, 4, 5, 12, 4, 3])"
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "s = \"pankaj\"\n",
+ "torch.tensor(input_lang.encode(s), device=device, dtype=torch.long)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Running on local URL: http://127.0.0.1:7864\n",
+ "\n",
+ "Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div><iframe src=\"http://127.0.0.1:7864/\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import requests\n",
+ "import gradio as gr\n",
+ "\n",
+ "# Define the API endpoint\n",
+ "API_URL = \"http://127.0.0.1:8000/trans\"\n",
+ "\n",
+ "# Function to call the FastAPI backend\n",
+ "def predict(user_input):\n",
+ "    # Prepare the data to send to the FastAPI API\n",
+ "    payload = {\"query\": user_input}\n",
+ "\n",
+ "    # Make a request to the FastAPI backend\n",
+ "    response = requests.post(API_URL, json=payload)\n",
+ "\n",
+ "    # Get the response JSON\n",
+ "    result = response.json()\n",
+ "\n",
+ "    # Extract the answer\n",
+ "    return \" \".join(result[\"response\"])\n",
+ "\n",
+ "\n",
+ "# Launch the Gradio interface\n",
+ "if __name__ == \"__main__\":\n",
+ "    gr.Interface(predict,\n",
+ "                 inputs=['textbox'],\n",
+ "                 outputs=['text']).launch(share=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import gradio as gr\n",
+ "\n",
+ "def greet(name, intensity):\n",
+ "    return \"Hello, \" + name + \"!\" * int(intensity)\n",
+ "\n",
+ "demo = gr.Interface(\n",
+ "    fn=greet,\n",
+ "    inputs=[\"text\", \"slider\"],\n",
+ "    outputs=[\"text\"],\n",
+ ")\n",
+ "\n",
+ "demo.launch()\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "transliteration",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
inference/encoder (1).pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b433380e98834313e590be330b6c2543eb8131ce043922fd95d37def151d1e7c
+ size 9799034
inference/input_lang.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9671af04c3790c0ae6183278806e948f2674db3506d3b1a1c65b924a4adbec78
+ size 396
inference/language.py ADDED
@@ -0,0 +1,25 @@
+ class Language:
+     def __init__(self, name):
+         self.name = name
+         self.char2index = {'#': 0, '$': 1, '^': 2}  # '^': start of sequence, '$': unknown char, '#': padding
+         self.index2char = {0: '#', 1: '$', 2: '^'}
+         self.vocab_size = 3  # count of the three special tokens
+
+     def addWord(self, word):
+         # register every character of the word in the vocabulary
+         for char in word:
+             self.addChar(char)
+
+     def addChar(self, char):
+         if char not in self.char2index:
+             self.char2index[char] = self.vocab_size
+             self.index2char[self.vocab_size] = char
+             self.vocab_size += 1
+
+     def encode(self, s):
+         # map a string to a list of character indices (raises KeyError on unseen chars)
+         return [self.char2index[ch] for ch in s]
+
+     def decode(self, l):
+         # map a list of indices back to a string
+         return ''.join([self.index2char[i] for i in l])
+
+     def vocab(self):
+         return self.char2index.keys()
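A quick round trip with this vocabulary class (the indices beyond the three special tokens depend on insertion order):

```python
from inference.language import Language

lang = Language("eng")
lang.addWord("pankaj")       # registers p, a, n, k, j
ids = lang.encode("pankaj")  # e.g. [3, 4, 5, 6, 4, 7]
print(lang.decode(ids))      # -> "pankaj"
```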
inference/output_lang.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:412857775d61629580d62df66b7e663bab247467f9a975920df3b1265acd6326
+ size 964
inference/transformer.py ADDED
@@ -0,0 +1,54 @@
+ import torch
+ import pickle
+ import sys
+ import os
+ from inference.language import Language
+ from inference.utility import Encoder, Decoder, encoderBlock, decoderBlock, MultiHeadAttention, Head, FeedForward
+ import warnings
+ from typing import List
+ warnings.filterwarnings("ignore", category=FutureWarning)
+
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ with open(os.path.join(os.path.dirname(__file__), 'input_lang.pkl'), "rb") as file:
+     input_lang = pickle.load(file)
+
+ with open(os.path.join(os.path.dirname(__file__), 'output_lang.pkl'), "rb") as file:
+     output_lang = pickle.load(file)
+
+ encoder = torch.load(os.path.join(os.path.dirname(__file__), 'encoder (1).pth'), map_location=device)
+ decoder = torch.load(os.path.join(os.path.dirname(__file__), 'decoder (1).pth'), map_location=device)
+ encoder.eval()  # disable dropout for inference
+ decoder.eval()
+
+ input_vocab_size = input_lang.vocab_size
+ output_vocab_size = output_lang.vocab_size
+
+ def encode(s):
+     # unseen characters are mapped to the unknown token '$'
+     return [input_lang.char2index.get(ch, input_lang.char2index['$']) for ch in s]
+
+ def generate(input: List[str]) -> List[str]:
+     # pre-process the input: pad or truncate every word to the same length, max_length = 33
+     for i, inp in enumerate(input):
+         input[i] = inp[:33] if len(inp) > 33 else inp.ljust(33, '#')
+
+     input = torch.tensor([encode(i) for i in input], device=device, dtype=torch.long)
+     B, T = input.shape
+
+     encoder_output = encoder(input)
+     idx = torch.full((B, 1), 2, dtype=torch.long, device=device)  # (B,1), start-of-sequence token '^'
+
+     # idx is a (B, T) array of indices in the current context
+     for _ in range(30):
+         # get the predictions
+         logits, _ = decoder(idx, encoder_output)  # logits (B, T, vocab_size)
+         # focus only on the last time step
+         logits = logits[:, -1, :]  # becomes (B, C)
+         # greedily pick the most likely next character
+         idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (B, 1)
+         # append the chosen index to the running sequence
+         idx = torch.cat((idx, idx_next), dim=1)  # (B, T+1)
+
+     ans = []
+     for seq in idx:
+         # drop the start token and cut at the first padding character
+         ans.append(output_lang.decode(seq.tolist()[1:]).split('#', 1)[0])
+     return ans
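A minimal usage sketch (the LFS model files above must be fetched first; the exact output depends on the trained weights):

```python
from inference.transformer import generate

print(generate(["namaste", "aapko"]))  # e.g. ['नमस्ते', 'आपको']
```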
inference/utility.py ADDED
@@ -0,0 +1,172 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ encoder_block_size = 33
+ decoder_block_size = 30
+
+ class Head(nn.Module):
+     """ one self-attention head """
+
+     def __init__(self, n_embd, d_k, dropout, mask=0):  # d_k is the dimension of the key; normally d_k = n_embd / n_head
+         super().__init__()
+         self.mask = mask
+         self.key = nn.Linear(n_embd, d_k, bias=False, device=device)
+         self.query = nn.Linear(n_embd, d_k, bias=False, device=device)
+         self.value = nn.Linear(n_embd, d_k, bias=False, device=device)
+         if mask:
+             self.register_buffer('tril', torch.tril(torch.ones(encoder_block_size, encoder_block_size, device=device)))
+         self.dropout = nn.Dropout(dropout)
+
+     def forward(self, x, encoder_output=None):
+         B, T, C = x.shape
+
+         # for cross-attention, keys (and values) come from the encoder output
+         if encoder_output is not None:
+             k = self.key(encoder_output)
+             Be, Te, Ce = encoder_output.shape
+         else:
+             k = self.key(x)  # (B,T,d_k)
+
+         q = self.query(x)  # (B,T,d_k)
+         # compute attention scores (note: scaled by C = n_embd, matching how the checkpoints were trained)
+         wei = q @ k.transpose(-2, -1) * C**-0.5  # (B,T,T)
+
+         if self.mask:
+             if encoder_output is not None:
+                 wei = wei.masked_fill(self.tril[:T, :Te] == 0, float('-inf'))  # (B,T,Te)
+             else:
+                 wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))  # (B,T,T)
+
+         wei = F.softmax(wei, dim=-1)
+         wei = self.dropout(wei)
+         # perform weighted aggregation of values
+         if encoder_output is not None:
+             v = self.value(encoder_output)
+         else:
+             v = self.value(x)
+         out = wei @ v  # (B,T,d_k)
+         return out
+
+ class MultiHeadAttention(nn.Module):
+     """ multiple self-attention heads in parallel """
+
+     def __init__(self, n_embd, num_head, d_k, dropout, mask=0):
+         super().__init__()
+         self.heads = nn.ModuleList([Head(n_embd, d_k, dropout, mask) for _ in range(num_head)])
+         self.proj = nn.Linear(n_embd, n_embd)
+         self.dropout = nn.Dropout(dropout)
+
+     def forward(self, x, encoder_output=None):
+         out = torch.cat([h(x, encoder_output) for h in self.heads], dim=-1)
+         out = self.dropout(self.proj(out))
+         return out
+
+ class FeedForward(nn.Module):
+     """ position-wise feed-forward network """
+
+     def __init__(self, n_embd, dropout):
+         super().__init__()
+         self.net = nn.Sequential(
+             nn.Linear(n_embd, 4 * n_embd),
+             nn.ReLU(),
+             nn.Linear(4 * n_embd, n_embd),
+             nn.Dropout(dropout)
+         )
+
+     def forward(self, x):
+         return self.net(x)
+
+ class encoderBlock(nn.Module):
+     """ Transformer encoder block: communication followed by computation """
+
+     def __init__(self, n_embd, n_head, dropout):
+         super().__init__()
+         d_k = n_embd // n_head
+         self.sa = MultiHeadAttention(n_embd, n_head, d_k, dropout)
+         self.ffwd = FeedForward(n_embd, dropout)
+         self.ln1 = nn.LayerNorm(n_embd)
+         self.ln2 = nn.LayerNorm(n_embd)
+
+     def forward(self, x, encoder_output=None):
+         x = x + self.sa(self.ln1(x), encoder_output)
+         x = x + self.ffwd(self.ln2(x))
+         return x
+
+ class Encoder(nn.Module):
+
+     def __init__(self, n_embd, n_head, n_layers, dropout):
+         super().__init__()
+
+         # NOTE: input_vocab_size must exist at module level before constructing an Encoder;
+         # the shipped checkpoints are loaded whole via torch.load, so __init__ is not called there
+         self.token_embedding_table = nn.Embedding(input_vocab_size, n_embd)  # n_embd: input embedding dimension
+         self.position_embedding_table = nn.Embedding(encoder_block_size, n_embd)
+         self.blocks = nn.Sequential(*[encoderBlock(n_embd, n_head, dropout) for _ in range(n_layers)])
+         self.ln_f = nn.LayerNorm(n_embd)  # final layer norm
+
+     def forward(self, idx):
+         B, T = idx.shape
+         tok_emb = self.token_embedding_table(idx)  # (B,T,n_embd)
+         pos_emb = self.position_embedding_table(torch.arange(T, device=device))  # (T,n_embd)
+         x = tok_emb + pos_emb  # (B,T,n_embd)
+         x = self.blocks(x)  # apply the stack of encoder blocks (B,T,C)
+         x = self.ln_f(x)  # (B,T,C)
+         return x
+
+
+ class decoderBlock(nn.Module):
+     """ Transformer decoder block: self communication, then cross communication, followed by computation """
+
+     def __init__(self, n_embd, n_head, dropout):
+         super().__init__()
+         d_k = n_embd // n_head
+         self.sa = MultiHeadAttention(n_embd, n_head, d_k, dropout, mask=1)
+         self.ca = MultiHeadAttention(n_embd, n_head, d_k, dropout, mask=1)
+         self.ffwd = FeedForward(n_embd, dropout)
+         self.ln1 = nn.LayerNorm(n_embd, device=device)
+         self.ln2 = nn.LayerNorm(n_embd, device=device)
+         self.ln3 = nn.LayerNorm(n_embd, device=device)
+
+     def forward(self, x_encoder_output):
+         # nn.Sequential passes a single argument, so x and the encoder output travel as a tuple
+         x = x_encoder_output[0]
+         encoder_output = x_encoder_output[1]
+         x = x + self.sa(self.ln1(x))
+         x = x + self.ca(self.ln2(x), encoder_output)
+         x = x + self.ffwd(self.ln3(x))
+         return (x, encoder_output)
+
+ class Decoder(nn.Module):
+
+     def __init__(self, n_embd, n_head, n_layers, dropout):
+         super().__init__()
+
+         self.token_embedding_table = nn.Embedding(output_vocab_size, n_embd)  # n_embd: output embedding dimension
+         self.position_embedding_table = nn.Embedding(decoder_block_size, n_embd)
+         self.blocks = nn.Sequential(*[decoderBlock(n_embd, n_head=n_head, dropout=dropout) for _ in range(n_layers)])
+         self.ln_f = nn.LayerNorm(n_embd)  # final layer norm
+         self.lm_head = nn.Linear(n_embd, output_vocab_size)
+
+     def forward(self, idx, encoder_output, targets=None):
+         B, T = idx.shape
+
+         tok_emb = self.token_embedding_table(idx)  # (B,T,n_embd)
+         pos_emb = self.position_embedding_table(torch.arange(T, device=device))  # (T,n_embd)
+         x = tok_emb + pos_emb  # (B,T,n_embd)
+
+         x = self.blocks((x, encoder_output))
+         x = self.ln_f(x[0])  # (B,T,C)
+         logits = self.lm_head(x)  # (B,T,output_vocab_size)
+
+         if targets is None:
+             loss = None
+         else:
+             B, T, C = logits.shape
+             temp_logits = logits.view(B*T, C)
+             targets = targets.reshape(B*T)
+
+             loss = F.cross_entropy(temp_logits, targets.long())
+
+         # print(logits)
+         # out = torch.argmax(logits)
+
+         return logits, loss
+
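A quick shape check for one attention Head under assumed toy sizes (self-attention path, no masking):

```python
import torch
from inference.utility import Head, device

h = Head(n_embd=32, d_k=8, dropout=0.0)
x = torch.randn(2, 10, 32, device=device)  # (B, T, C) toy batch
print(h(x).shape)                          # torch.Size([2, 10, 8])
```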
main.py ADDED
@@ -0,0 +1,26 @@
+ import requests
+ import gradio as gr
+
+ # Define the API endpoint
+ API_URL = "http://127.0.0.1:8000/trans"
+
+ # Function to call the FastAPI backend
+ def predict(user_input):
+     # Prepare the data to send to the FastAPI API
+     payload = {"query": user_input}
+
+     # Make a request to the FastAPI backend
+     response = requests.post(API_URL, json=payload)
+
+     # Get the response JSON
+     result = response.json()
+
+     # Extract the answer
+     return " ".join(result["response"])
+
+
+ # Launch the Gradio interface
+ if __name__ == "__main__":
+     gr.Interface(predict,
+                  inputs=['textbox'],
+                  outputs=['text']).launch(share=True)
notebooks/encoder_decoder_RNNs.ipynb ADDED
@@ -0,0 +1,1924 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "colab_type": "text",
7
+ "id": "view-in-github"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/pankajrawat9075/CS6910_assignment_3/blob/main/DL_PA3_final.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "hRdpoWePeYHn"
17
+ },
18
+ "source": [
19
+ "## Importing Libraries and models"
20
+ ]
21
+ },
22
+ {
23
+ "cell_type": "code",
24
+ "execution_count": null,
25
+ "metadata": {
26
+ "id": "0LBvFtYGCNgJ"
27
+ },
28
+ "outputs": [],
29
+ "source": [
30
+ "%%capture\n",
31
+ "!pip install wandb"
32
+ ]
33
+ },
34
+ {
35
+ "cell_type": "code",
36
+ "execution_count": null,
37
+ "metadata": {
38
+ "id": "zkZTzr7OCPBM"
39
+ },
40
+ "outputs": [],
41
+ "source": [
42
+ "import wandb"
43
+ ]
44
+ },
45
+ {
46
+ "cell_type": "code",
47
+ "execution_count": null,
48
+ "metadata": {
49
+ "id": "z4ZVrIumZcDt"
50
+ },
51
+ "outputs": [],
52
+ "source": [
53
+ "from __future__ import unicode_literals, print_function, division\n",
54
+ "from io import open\n",
55
+ "import unicodedata\n",
56
+ "import string\n",
57
+ "import re\n",
58
+ "import random\n",
59
+ "import pandas as pd\n",
60
+ "import torch\n",
61
+ "import torch.nn as nn\n",
62
+ "from torch import optim\n",
63
+ "import torch.nn.functional as F\n",
64
+ "\n",
65
+ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
66
+ "torch.cuda.empty_cache()"
67
+ ]
68
+ },
69
+ {
70
+ "cell_type": "code",
71
+ "execution_count": null,
72
+ "metadata": {
73
+ "colab": {
74
+ "base_uri": "https://localhost:8080/"
75
+ },
76
+ "id": "qwL09v65CIse",
77
+ "outputId": "f1dcbc80-5110-48f9-d0c5-836a2daa05b4"
78
+ },
79
+ "outputs": [
80
+ {
81
+ "name": "stdout",
82
+ "output_type": "stream",
83
+ "text": [
84
+ "cuda\n"
85
+ ]
86
+ }
87
+ ],
88
+ "source": [
89
+ "print(device)"
90
+ ]
91
+ },
92
+ {
93
+ "cell_type": "markdown",
94
+ "metadata": {
95
+ "id": "44xIRolL_T_d"
96
+ },
97
+ "source": [
98
+ "## Load Dataset"
99
+ ]
100
+ },
101
+ {
102
+ "cell_type": "code",
103
+ "execution_count": null,
104
+ "metadata": {
105
+ "colab": {
106
+ "base_uri": "https://localhost:8080/"
107
+ },
108
+ "id": "-XRMpx9eBzRK",
109
+ "outputId": "177ee7ae-bb7d-46ea-9269-fa3aa045a89e"
110
+ },
111
+ "outputs": [
112
+ {
113
+ "name": "stdout",
114
+ "output_type": "stream",
115
+ "text": [
116
+ "Mounted at /content/drive\n"
117
+ ]
118
+ }
119
+ ],
120
+ "source": [
121
+ "from google.colab import drive\n",
122
+ "drive.mount('/content/drive')"
123
+ ]
124
+ },
125
+ {
126
+ "cell_type": "code",
127
+ "execution_count": null,
128
+ "metadata": {
129
+ "id": "Y4zemXiyE6Fi"
130
+ },
131
+ "outputs": [],
132
+ "source": [
133
+ "class Lang:\n",
134
+ " def __init__(self, name):\n",
135
+ " self.name = name\n",
136
+ " self.char2index = {'#': 0, '$': 1, '^': 2}\n",
137
+ " self.char2count = {'#': 1, '$': 1, '^': 1}\n",
138
+ " self.index2char = {0: '#', 1: '$', 2: '^'}\n",
139
+ " self.n_chars = 3 # Count\n",
140
+ " self.data = {}\n",
141
+ " \n",
142
+ "\n",
143
+ " def addWord(self, word):\n",
144
+ " for char in word:\n",
145
+ " self.addChar(char)\n",
146
+ "\n",
147
+ " def addChar(self, char):\n",
148
+ " if char not in self.char2index:\n",
149
+ " self.char2index[char] = self.n_chars\n",
150
+ " self.char2count[char] = 1\n",
151
+ " self.index2char[self.n_chars] = char\n",
152
+ " self.n_chars += 1\n",
153
+ " else:\n",
154
+ " self.char2count[char] += 1\n",
155
+ "\n",
156
+ " \n"
157
+ ]
158
+ },
159
+ {
160
+ "cell_type": "code",
161
+ "execution_count": null,
162
+ "metadata": {
163
+ "id": "dCR658yRvXpy"
164
+ },
165
+ "outputs": [],
166
+ "source": [
167
+ "# return max length of input and output words\n",
168
+ "def maxLength(data):\n",
169
+ " ip_mlen, op_mlen = 0, 0\n",
170
+ "\n",
171
+ " for i in range(len(data)):\n",
172
+ " input = data[0][i]\n",
173
+ " output = data[1][i]\n",
174
+ " if(len(input)>ip_mlen):\n",
175
+ " ip_mlen=len(input)\n",
176
+ "\n",
177
+ " if(len(output)>op_mlen):\n",
178
+ " op_mlen=len(output)\n",
179
+ "\n",
180
+ " return ip_mlen, op_mlen"
181
+ ]
182
+ },
183
+ {
184
+ "cell_type": "code",
185
+ "execution_count": null,
186
+ "metadata": {
187
+ "id": "IDGaCO8DkYpc"
188
+ },
189
+ "outputs": [],
190
+ "source": [
191
+ "import numpy\n",
192
+ "input_shape = 0\n",
193
+ "from torch.utils.data import TensorDataset, DataLoader\n",
194
+ "def preprocess(data, input_lang, output_lang):\n",
195
+ " maxlenInput, maxlenOutput = maxLength(data)\n",
196
+ " # we use maxlenInput as 26 since it is the maximum of all input len\n",
197
+ " maxlenInput = 26\n",
198
+ " input = numpy.zeros((len(data), maxlenInput + 1))\n",
199
+ " output = numpy.zeros((len(data), maxlenOutput + 2))\n",
200
+ " maxlenInput, maxlenOutput = maxLength(data)\n",
201
+ " unknown = input_lang.char2index['$']\n",
202
+ "\n",
203
+ " for i in range(len(data)):\n",
204
+ " op = '^' + data[1][i]\n",
205
+ " ip = data[0][i].ljust(maxlenInput + 1, '#')\n",
206
+ " op = op.ljust(maxlenOutput + 2, '#')\n",
207
+ " \n",
208
+ "\n",
209
+ " for index, char in enumerate(ip):\n",
210
+ " if input_lang.char2index.get(char) is not None:\n",
211
+ " input[i][index] = input_lang.char2index[char]\n",
212
+ " else:\n",
213
+ " input[i][index] = unknown\n",
214
+ " \n",
215
+ "\n",
216
+ " \n",
217
+ " for index, char in enumerate(op):\n",
218
+ " if output_lang.char2index.get(char) is not None:\n",
219
+ " output[i][index] = output_lang.char2index[char]\n",
220
+ " else:\n",
221
+ " output[i][index] = unknown \n",
222
+ "\n",
223
+ " print(input.shape)\n",
224
+ " print(output.shape)\n",
225
+ "\n",
226
+ " return TensorDataset(torch.from_numpy(input), torch.from_numpy(output))"
227
+ ]
228
+ },
229
+ {
230
+ "cell_type": "code",
231
+ "execution_count": null,
232
+ "metadata": {
233
+ "colab": {
234
+ "base_uri": "https://localhost:8080/"
235
+ },
236
+ "id": "PdS5OXKxfdCX",
237
+ "outputId": "178f1d73-5b0c-431d-ca9b-d9435b924c41"
238
+ },
239
+ "outputs": [
240
+ {
241
+ "name": "stdout",
242
+ "output_type": "stream",
243
+ "text": [
244
+ "(51200, 27)\n",
245
+ "(51200, 22)\n",
246
+ "(4096, 27)\n",
247
+ "(4096, 22)\n",
248
+ "(4096, 27)\n",
249
+ "(4096, 22)\n"
250
+ ]
251
+ }
252
+ ],
253
+ "source": [
254
+ "def loadData(lang):\n",
255
+ " train_df = pd.read_csv(f\"drive/MyDrive/aksharantar_sampled/{lang}/{lang}_train.csv\", header = None)\n",
256
+ " val_df = pd.read_csv(f\"drive/MyDrive/aksharantar_sampled/{lang}/{lang}_valid.csv\", header = None)\n",
257
+ " test_df = pd.read_csv(f\"drive/MyDrive/aksharantar_sampled/{lang}/{lang}_test.csv\", header = None)\n",
258
+ "\n",
259
+ " input_lang = Lang('eng')\n",
260
+ " output_lang = Lang(lang)\n",
261
+ " \n",
262
+ " # add the words to the respective languages\n",
263
+ " for i in range(len(train_df)):\n",
264
+ " \n",
265
+ " input_lang.addWord(train_df[0][i])\n",
266
+ " output_lang.addWord(train_df[1][i])\n",
267
+ "\n",
268
+ " # print(input_lang.char2index)\n",
269
+ " # print(input_lang.index2char)\n",
270
+ " trainDataset = preprocess(train_df, input_lang, output_lang)\n",
271
+ " testDataset = preprocess(test_df, input_lang, output_lang)\n",
272
+ " valDataset = preprocess(val_df, input_lang, output_lang)\n",
273
+ "\n",
274
+ " return trainDataset, testDataset, valDataset, input_lang, output_lang\n",
275
+ "\n",
276
+ "\n",
277
+ "trainData, testData, valData, ipLang, opLang = loadData('hin')\n"
278
+ ]
279
+ },
280
+ {
281
+ "cell_type": "code",
282
+ "execution_count": null,
283
+ "metadata": {
284
+ "colab": {
285
+ "base_uri": "https://localhost:8080/"
286
+ },
287
+ "id": "SvmzS5Lt_Jnl",
288
+ "outputId": "33defb60-5aee-46cb-e683-ee2df9e98436"
289
+ },
290
+ "outputs": [
291
+ {
292
+ "name": "stderr",
293
+ "output_type": "stream",
294
+ "text": [
295
+ "\u001b[34m\u001b[1mwandb\u001b[0m: W&B API key is configured. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n",
296
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m If you're specifying your api key in code, ensure this code is not shared publicly.\n",
297
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Consider setting the WANDB_API_KEY environment variable, or running `wandb login` from the command line.\n",
298
+ "\u001b[34m\u001b[1mwandb\u001b[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc\n"
299
+ ]
300
+ },
301
+ {
302
+ "data": {
303
+ "text/plain": [
304
+ "True"
305
+ ]
306
+ },
307
+ "execution_count": 10,
308
+ "metadata": {},
309
+ "output_type": "execute_result"
310
+ }
311
+ ],
312
+ "source": [
313
+ "wandb.login(key =\"\")"
314
+ ]
315
+ },
316
+ {
317
+ "cell_type": "markdown",
318
+ "metadata": {
319
+ "id": "Q1TioafYgICa"
320
+ },
321
+ "source": [
322
+ "# seq2seq model"
323
+ ]
324
+ },
325
+ {
326
+ "cell_type": "markdown",
327
+ "metadata": {
328
+ "id": "svxssm9Havhb"
329
+ },
330
+ "source": [
331
+ "## Encoder"
332
+ ]
333
+ },
334
+ {
335
+ "cell_type": "code",
336
+ "execution_count": null,
337
+ "metadata": {
338
+ "id": "YTwk8nKNcbkb"
339
+ },
340
+ "outputs": [],
341
+ "source": [
342
+ "class EncoderRNN(nn.Module):\n",
343
+ " def __init__(self, input_size, hidden_size, embedding_size, # input_size is size of input language dictionary\n",
344
+ " num_layers, cell_type,\n",
345
+ " bidirectional, dropout, batch_size) :\n",
346
+ " super(EncoderRNN, self).__init__()\n",
347
+ " self.hidden_size = hidden_size # size of an hidden state representation\n",
348
+ " self.num_layers = num_layers \n",
349
+ " self.bidirectional = True if bidirectional == 'Yes' else False\n",
350
+ " self.batch_size = batch_size\n",
351
+ " self.cell_type = cell_type\n",
352
+ " self.embedding_size=embedding_size\n",
353
+ "\n",
354
+ " # this adds the embedding layer\n",
355
+ " self.embedding = nn.Embedding(num_embeddings=input_size,embedding_dim= embedding_size)\n",
356
+ " self.dropout = nn.Dropout(dropout)\n",
357
+ "\n",
358
+ " # this adds the Neural Network layer for the encoder\n",
359
+ " if self.cell_type == \"GRU\":\n",
360
+ " self.rnn = nn.GRU(embedding_size, hidden_size, num_layers=num_layers, bidirectional=self.bidirectional, dropout=dropout)\n",
361
+ " elif self.cell_type == \"LSTM\":\n",
362
+ " self.rnn = nn.LSTM(embedding_size, hidden_size, num_layers=num_layers, bidirectional=self.bidirectional, dropout=dropout)\n",
363
+ " else:\n",
364
+ " self.rnn = nn.RNN(embedding_size, hidden_size, num_layers=num_layers, bidirectional=self.bidirectional, dropout=dropout)\n",
365
+ "\n",
366
+ " def forward(self, input, hidden): # input shape (seq_len, batch_size) hidden shape tuple for lstm, otherwise single\n",
367
+ " embedded = self.embedding(input.long()).view(-1,self.batch_size, self.embedding_size)\n",
368
+ " output = self.dropout(embedded) # output shape (seq_len, batch_size, embedding size)\n",
369
+ "\n",
370
+ " output, hidden = self.rnn(output, hidden) # for LSTM hidden is a tuple\n",
371
+ " if self.bidirectional:\n",
372
+ " if self.cell_type == \"LSTM\":\n",
373
+ " hidden_state = hidden[0].resize(2,self.num_layers,self.batch_size,self.hidden_size)\n",
374
+ " cell_state = hidden[1].resize(2,self.num_layers,self.batch_size,self.hidden_size)\n",
375
+ " hidden = (torch.add(hidden_state[0],hidden_state[1])/2, torch.add(cell_state[0],cell_state[1])/2)\n",
376
+ " else:\n",
377
+ " hidden=hidden.resize(2,self.num_layers,self.batch_size,self.hidden_size)\n",
378
+ " hidden=torch.add(hidden[0],hidden[1])/2\n",
379
+ " \n",
380
+ " split_tensor= torch.split(output, self.hidden_size, dim=-1)\n",
381
+ " output=torch.add(split_tensor[0],split_tensor[1])/2\n",
382
+ " return output, hidden\n",
383
+ "\n",
384
+ " # initializing the initial hidden state for the encoder\n",
385
+ " def initHidden(self):\n",
386
+ " num_directions = 2 if self.bidirectional else 1\n",
387
+ " if self.cell_type == \"LSTM\":\n",
388
+ " return (torch.zeros(self.num_layers * num_directions, self.batch_size, self.hidden_size, device=device),\n",
389
+ " torch.zeros(self.num_layers * num_directions, self.batch_size, self.hidden_size, device=device))\n",
390
+ " else:\n",
391
+ " return torch.zeros(self.num_layers * num_directions, self.batch_size, self.hidden_size, device=device)\n"
392
+ ]
393
+ },
394
+ {
395
+ "cell_type": "markdown",
396
+ "metadata": {
397
+ "id": "J56aq1J6a07q"
398
+ },
399
+ "source": [
400
+ "## Decoder"
401
+ ]
402
+ },
403
+ {
404
+ "cell_type": "code",
405
+ "execution_count": null,
406
+ "metadata": {
407
+ "id": "53ki6eJUH2u2"
408
+ },
409
+ "outputs": [],
410
+ "source": [
411
+ "class DecoderRNN(nn.Module):\n",
412
+ " def __init__(self, hidden_size, output_size, embedding_size, num_layers, # output size is the size of output language dictionary\n",
413
+ " cell_type, dropout, batch_size):\n",
414
+ " super(DecoderRNN, self).__init__()\n",
415
+ " self.hidden_size = hidden_size\n",
416
+ " self.num_layers = num_layers\n",
417
+ " self.cell_type = cell_type.lower()\n",
418
+ " self.batch_size = batch_size\n",
419
+ " self.embedding_size=embedding_size\n",
420
+ "\n",
421
+ " self.embedding = nn.Embedding(output_size, embedding_size)\n",
422
+ " # self.dropout = nn.Dropout(dropout)\n",
423
+ " \n",
424
+ " if self.cell_type == \"gru\":\n",
425
+ " self.rnn = nn.GRU(embedding_size, hidden_size, num_layers=num_layers)\n",
426
+ " elif self.cell_type == \"lstm\":\n",
427
+ " self.rnn = nn.LSTM(embedding_size, hidden_size, num_layers=num_layers)\n",
428
+ " else:\n",
429
+ " self.rnn = nn.RNN(embedding_size, hidden_size, num_layers=num_layers)\n",
430
+ "\n",
431
+ " self.out = nn.Linear(hidden_size, output_size)\n",
432
+ " self.softmax = nn.LogSoftmax(dim=2)\n",
433
+ "\n",
434
+ " def forward(self, input, hidden): # input shape (1, batch_size)\n",
435
+ " embedded = self.embedding(input.long()).view(-1, self.batch_size, self.embedding_size)\n",
436
+ " # # shape (1, batch_size, embedding_size)\n",
437
+ " output = F.relu(embedded)\n",
438
+ " output, hidden = self.rnn(output, hidden) # output shape (1, batch_size, hidden_size)\n",
439
+ " output = self.softmax(self.out(output)) # shape (1, batch_size, output_size)\n",
440
+ " return output, hidden\n",
441
+ "\n",
442
+ " # not needed since hidden will be provided by the encoder"
443
+ ]
444
+ },
445
+ {
446
+ "cell_type": "markdown",
447
+ "metadata": {
448
+ "id": "5JcQdylzI_Fc"
449
+ },
450
+ "source": [
451
+ "## Attention Decoder"
452
+ ]
453
+ },
454
+ {
455
+ "cell_type": "code",
456
+ "execution_count": null,
457
+ "metadata": {
458
+ "id": "R1Xysuv9I-Qr"
459
+ },
460
+ "outputs": [],
461
+ "source": [
462
+ "class AttentionDecoderRNN(nn.Module):\n",
463
+ " def __init__(self, hidden_size, output_size, embedding_size, num_layers,\n",
464
+ " cell_type, dropout, batch_size, max_length):\n",
465
+ " super(AttentionDecoderRNN, self).__init__()\n",
466
+ " self.hidden_size = hidden_size\n",
467
+ " self.num_layers = num_layers\n",
468
+ " self.cell_type = cell_type\n",
469
+ " self.batch_size = batch_size\n",
470
+ " self.embedding_size = embedding_size\n",
471
+ " self.max_length = max_length\n",
472
+ " self.dropout = dropout\n",
473
+ "\n",
474
+ " self.embedding = nn.Embedding(output_size, embedding_size)\n",
475
+ " self.dropout = nn.Dropout(self.dropout)\n",
476
+ " self.attention = nn.Linear(hidden_size + embedding_size, self.max_length)\n",
477
+ " self.attention_combine = nn.Linear(hidden_size + embedding_size, hidden_size)\n",
478
+ "\n",
479
+ " if self.cell_type == \"GRU\":\n",
480
+ " self.rnn = nn.GRU(hidden_size, hidden_size, num_layers=num_layers)\n",
481
+ " elif self.cell_type == \"LSTM\":\n",
482
+ " self.rnn = nn.LSTM(hidden_size, hidden_size, num_layers=num_layers)\n",
483
+ " else:\n",
484
+ " self.rnn = nn.RNN(hidden_size, hidden_size, num_layers=num_layers)\n",
485
+ "\n",
486
+ " self.out = nn.Linear(hidden_size, output_size)\n",
487
+ " self.softmax = nn.LogSoftmax(dim=2)\n",
488
+ "\n",
489
+ " def forward(self, input, hidden, encoder_outputs): #input shape (1, batch_size)\n",
490
+ " embedded = self.embedding(input.long()).view(-1, self.batch_size, self.embedding_size) \n",
491
+ " # embedded shape (1, batch_size, embedding_size)\n",
492
+ " embedded = F.relu(embedded)\n",
493
+ "\n",
494
+ " # Compute attention scores\n",
495
+ " if self.cell_type == \"LSTM\":\n",
496
+ " attn_hidden = torch.mean(hidden[0], dim=0)\n",
497
+ " else:\n",
498
+ " attn_hidden = torch.mean(hidden, dim = 0)\n",
499
+ " attn_scores = self.attention(torch.cat((embedded, attn_hidden.unsqueeze(0)), dim=2)) # attn_scores shape (1, batch_size, max_length)\n",
500
+ " \n",
501
+ " attn_weights = F.softmax(attn_scores, dim=-1) # attn_scores shape (1, 16, 25)\n",
502
+ " \n",
503
+ "\n",
504
+ " # Apply attention weights to encoder outputs\n",
505
+ " attn_applied = torch.bmm(attn_weights.transpose(0, 1), encoder_outputs.transpose(0, 1))\n",
506
+ " \n",
507
+ " # Combine attention output and embedded input\n",
508
+ " combined = torch.cat((embedded, attn_applied.transpose(0, 1)), dim=2)\n",
509
+ " combined = self.attention_combine(combined)\n",
510
+ " combined = F.relu(combined) # shape (1, batch_size, hidden_size)\n",
511
+ "\n",
512
+ " # Run through the RNN\n",
513
+ " output, hidden = self.rnn(combined, hidden)\n",
514
+ " # output shape: (1, batch_size, hidden_size)\n",
515
+ "\n",
516
+ " # Pass through linear layer and softmax activation\n",
517
+ " output = self.out(output) # shape: (1, batch_size, output_size)\n",
518
+ " output = self.softmax(output)\n",
519
+ " return output, hidden, attn_weights.transpose(0, 1)\n"
520
+ ]
521
+ },
522
+ {
523
+ "cell_type": "code",
524
+ "execution_count": null,
525
+ "metadata": {
526
+ "id": "LJ2Papj_jTX8"
527
+ },
528
+ "outputs": [],
529
+ "source": []
530
+ },
531
+ {
532
+ "cell_type": "markdown",
533
+ "metadata": {
534
+ "id": "658W9RARGEUf"
535
+ },
536
+ "source": [
537
+ "# Helper functions"
538
+ ]
539
+ },
540
+ {
541
+ "cell_type": "markdown",
542
+ "metadata": {
543
+ "id": "q7fAgs5uQni_"
544
+ },
545
+ "source": [
546
+ "## count matches"
547
+ ]
548
+ },
549
+ {
550
+ "cell_type": "code",
551
+ "execution_count": null,
552
+ "metadata": {
553
+ "id": "8fzy8U6_lbug"
554
+ },
555
+ "outputs": [],
556
+ "source": [
557
+ "def count_exact_matches(pred, target):\n",
558
+ " \"\"\"\n",
559
+ " Counts the number of rows in preds tensor that match exactly with each row in y tensor.\n",
560
+ " pred: tensor of shape (batch_size, seq_len-1)\n",
561
+ " y: tensor of shape (batch_size, seq_len-1)\n",
562
+ " \"\"\"\n",
563
+ " \n",
564
+ " count=0;\n",
565
+ " for i in range(pred.shape[0]):\n",
566
+ " flag = True\n",
567
+ " for j in range(pred.shape[1]):\n",
568
+ " if(target[i][j]!=pred[i][j]):\n",
569
+ " flag=False\n",
570
+ " break;\n",
571
+ " \n",
572
+ " if(flag):\n",
573
+ " count+=1;\n",
574
+ " \n",
575
+ " return count"
576
+ ]
577
+ },
578
+ {
579
+ "cell_type": "markdown",
580
+ "metadata": {
581
+ "id": "n4rGh7vuQqaa"
582
+ },
583
+ "source": [
584
+ "## evaluation"
585
+ ]
586
+ },
587
+ {
588
+ "cell_type": "code",
589
+ "execution_count": null,
590
+ "metadata": {
591
+ "id": "zp6gvWmDlWoB"
592
+ },
593
+ "outputs": [],
594
+ "source": [
595
+ "def evaluate(data,encoder, decoder,output_size,batch_size,hidden_size,num_layers_encoder,num_layers_decoder, cell_type, attention):\n",
596
+ " \n",
597
+ "\n",
598
+ "\n",
599
+ " running_loss = 0\n",
600
+ " correct =0\n",
601
+ " \n",
602
+ " loader = DataLoader(data, batch_size=batch_size)\n",
603
+ " loss_fun = nn.CrossEntropyLoss(reduction=\"sum\")\n",
604
+ " seq_len = 0\n",
605
+ "\n",
606
+ " atten_weights = torch.zeros(1,21, 27).to(device) # required to return the attention weights\n",
607
+ " predictions = torch.zeros(22-1, 1).to(device)\n",
608
+ " with torch.no_grad():\n",
609
+ " for j,(x,y) in enumerate(loader):\n",
610
+ " loss=0\n",
611
+ " encoder.eval()\n",
612
+ " decoder.eval()\n",
613
+ "\n",
614
+ " x = x.to(device)\n",
615
+ " y = y.to(device)\n",
616
+ "\n",
617
+ " x = x.T\n",
618
+ " y = y.T\n",
619
+ " seq_len = len(y)\n",
620
+ " \n",
621
+ " encoder_hidden=encoder.initHidden()\n",
622
+ " encoder_output,encoder_hidden = encoder(x,encoder_hidden)\n",
623
+ " \n",
624
+ " \n",
625
+ " decoder_input =y[0]\n",
626
+ " \n",
627
+ " # Handle different numbers of layers in the encoder and decoder\n",
628
+ " if num_layers_encoder != num_layers_decoder:\n",
629
+ " if num_layers_encoder < num_layers_decoder:\n",
630
+ " remaining_layers = num_layers_decoder - num_layers_encoder\n",
631
+ "\n",
632
+ " # Copy all encoder hidden layers and then repeat the top layer\n",
633
+ " if cell_type == \"LSTM\":\n",
634
+ " top_layer_hidden = (encoder_hidden[0][-1].unsqueeze(0), encoder_hidden[1][-1].unsqueeze(0))\n",
635
+ " extra_hidden = (top_layer_hidden[0].repeat(remaining_layers, 1, 1), top_layer_hidden[1].repeat(remaining_layers, 1, 1))\n",
636
+ " decoder_hidden = (torch.cat((encoder_hidden[0], extra_hidden[0]), dim=0), torch.cat((encoder_hidden[1], extra_hidden[1]), dim=0))\n",
637
+ " else:\n",
638
+ " top_layer_hidden = encoder_hidden[-1].unsqueeze(0) #top_layer_hidden shape (1, batch_size, hidden_size)\n",
639
+ " extra_hidden = top_layer_hidden.repeat(remaining_layers, 1, 1)\n",
640
+ " decoder_hidden = torch.cat((encoder_hidden, extra_hidden), dim=0)\n",
641
+ "\n",
642
+ " else:\n",
643
+ " # Slice the hidden states of the encoder to match the decoder layers\n",
644
+ " if cell_type == \"LSTM\":\n",
645
+ " decoder_hidden = (encoder_hidden[0][-num_layers_decoder:], encoder_hidden[1][-num_layers_decoder:])\n",
646
+ " else :\n",
647
+ " decoder_hidden = encoder_hidden[-num_layers_decoder:]\n",
648
+ " else:\n",
649
+ " decoder_hidden = encoder_hidden\n",
650
+ "\n",
651
+ " pred=torch.zeros(len(y)-1, batch_size).to(device)\n",
652
+ " atten_weight_default = torch.zeros(batch_size,1, 27).to(device)\n",
653
+ " for k in range(1,len(y)):\n",
654
+ " if attention == \"Yes\":\n",
655
+ " \n",
656
+ " decoder_output, decoder_hidden, atten_weight = decoder(decoder_input, decoder_hidden, encoder_output)\n",
657
+ " atten_weight_default = torch.cat((atten_weight_default, atten_weight), dim = 1)\n",
658
+ " else:\n",
659
+ " decoder_output, decoder_hidden= decoder(decoder_input, decoder_hidden)\n",
660
+ " max_prob, index = decoder_output.topk(1) # max_prob shape (1, batch_size, 1)\n",
661
+ " decoder_output = torch.squeeze(decoder_output)\n",
662
+ " loss += loss_fun(decoder_output, y[k].long())\n",
663
+ " pred[k-1]= torch.squeeze(index)\n",
664
+ " decoder_input = index\n",
665
+ " if attention == \"Yes\":\n",
666
+ " atten_weights = torch.cat((atten_weights, atten_weight_default[:, 1:, :]), dim = 0)\n",
667
+ "\n",
668
+ " running_loss += loss.item()\n",
669
+ " correct += count_exact_matches(pred.T,y[1:,:].T)\n",
670
+ " predictions = torch.cat((predictions, pred), dim=1)\n",
671
+ "\n",
672
+ " \n",
673
+ " avg_loss = running_loss / (len(data) * seq_len)\n",
674
+ " print(\"correct =\", correct)\n",
675
+ " avg_acc = 100 * (correct / (len(data)))\n",
676
+ " if attention == \"Yes\":\n",
677
+ " return avg_loss, avg_acc, predictions, atten_weights[1:, :, :]\n",
678
+ " else:\n",
679
+ " return avg_loss, avg_acc, predictions\n",
680
+ " \n",
681
+ " \n",
682
+ " "
683
+ ]
684
+ },
685
+ {
686
+ "cell_type": "markdown",
687
+ "metadata": {
688
+ "id": "0SsnRWlgQmCI"
689
+ },
690
+ "source": [
691
+ "# Training function"
692
+ ]
693
+ },
694
+ {
695
+ "cell_type": "code",
696
+ "execution_count": null,
697
+ "metadata": {
698
+ "id": "PhDgsZG0QqPW"
699
+ },
700
+ "outputs": [],
701
+ "source": [
702
+ "def train(sweeps = True, test = False):\n",
703
+ "\n",
704
+ " if sweeps == False: \n",
705
+ " configs = config_defaults # use the default configuration which has the best hyperparameters\n",
706
+ " else:\n",
707
+ " wandb.init(config= config_defaults, project='DL_assign_3') # if not test then run wandb sweeps\n",
708
+ " configs=wandb.config\n",
709
+ " \n",
710
+ "\n",
711
+ " learn_rate = configs['learn_rate']\n",
712
+ " batch_size = configs['batch_size']\n",
713
+ " hidden_size = configs['hidden_size']\n",
714
+ " embedding_size = configs['embedding_size']\n",
715
+ " num_layers_encoder = configs['num_layers_encoder']\n",
716
+ " num_layers_decoder = configs['num_layers_decoder']\n",
717
+ " cell_type = configs['cell_type']\n",
718
+ " bidirectional = configs['bidirectional']\n",
719
+ " dropout = configs['dropout']\n",
720
+ " teach_ratio = configs['teach_ratio']\n",
721
+ " epochs = configs['epochs']\n",
722
+ " attention = configs['attention']\n",
723
+ "\n",
724
+ " if sweeps:\n",
725
+ " wandb.run.name='hidden_'+str(hidden_size)+'_batch_'+str(batch_size)+'_embed_size_'+str(embedding_size)+'_dropout_'+str(dropout)+'_cell_'+str(cell_type)\n",
726
+ "\n",
727
+ " input_len = ipLang.n_chars\n",
728
+ " output_len = opLang.n_chars\n",
729
+ " \n",
730
+ " encoder = EncoderRNN(input_len, hidden_size, embedding_size, \n",
731
+ " num_layers_encoder, cell_type,\n",
732
+ " bidirectional, dropout, batch_size)\n",
733
+ " \n",
734
+ " if attention ==\"Yes\":\n",
735
+ " decoder = AttentionDecoderRNN(hidden_size, output_len, embedding_size, num_layers_decoder, \n",
736
+ " cell_type, dropout, batch_size, 27)\n",
737
+ " else:\n",
738
+ " decoder = DecoderRNN(hidden_size, output_len, embedding_size, num_layers_decoder, \n",
739
+ " cell_type, dropout, batch_size)#dropout not used\n",
740
+ " \n",
741
+ " train_loader = DataLoader(trainData, batch_size=batch_size, shuffle=True)\n",
742
+ " val_loader = DataLoader(valData, batch_size=batch_size, shuffle=True)\n",
743
+ "\n",
744
+ " encoder_optimizer=optim.Adam(encoder.parameters(),learn_rate)\n",
745
+ " decoder_optimizer=optim.Adam(decoder.parameters(),learn_rate)\n",
746
+ " loss_fun=nn.CrossEntropyLoss(reduction=\"sum\")\n",
747
+ "\n",
748
+ " encoder.to(device)\n",
749
+ " decoder.to(device)\n",
750
+ " seq_len = 0\n",
751
+ "\n",
752
+ " # Initialize variables for early stopping\n",
753
+ " best_val_loss = float('inf')\n",
754
+ " patience = 5\n",
755
+ " epochs_without_improvement = 0\n",
756
+ "\n",
757
+ " for i in range(epochs):\n",
758
+ " \n",
759
+ " running_loss = 0.0\n",
760
+ " train_correct = 0\n",
761
+ "\n",
762
+ " encoder.train()\n",
763
+ " decoder.train()\n",
764
+ "\n",
765
+ " for j,(train_x,train_y) in enumerate(train_loader):\n",
766
+ " train_x = train_x.to(device)\n",
767
+ " train_y = train_y.to(device)\n",
768
+ "\n",
769
+ " encoder_optimizer.zero_grad()\n",
770
+ " decoder_optimizer.zero_grad()\n",
771
+ "\n",
772
+ " train_x=train_x.T\n",
773
+ " train_y=train_y.T\n",
774
+ " # print(\"train_x.shapetrain_x.shape)\n",
775
+ " seq_len = len(train_y)\n",
776
+ " encoder_hidden=encoder.initHidden()\n",
777
+ " # for LSTM encoder_hidden shape ((num_layers * num_directions, batch_size,hidden_size),(self.num_layers * num_directions, batch_size, hidden_size))\n",
778
+ " encoder_output,encoder_hidden = encoder(train_x,encoder_hidden)\n",
779
+ " # encoder_hidden shape (num_layers, batch_size, hidden_size)\n",
780
+ " \n",
781
+ " \n",
782
+ " # lets move to the decoder\n",
783
+ " decoder_input = train_y[0] # shape (1, batch_size)\n",
784
+ " \n",
785
+ " # Handle different numbers of layers in the encoder and decoder\n",
786
+ " if num_layers_encoder != num_layers_decoder:\n",
787
+ " if num_layers_encoder < num_layers_decoder:\n",
788
+ " remaining_layers = num_layers_decoder - num_layers_encoder\n",
789
+ " # Copy all encoder hidden layers and then repeat the top layer\n",
790
+ " if cell_type == \"LSTM\":\n",
791
+ " top_layer_hidden = (encoder_hidden[0][-1].unsqueeze(0), encoder_hidden[1][-1].unsqueeze(0))\n",
792
+ " extra_hidden = (top_layer_hidden[0].repeat(remaining_layers, 1, 1), top_layer_hidden[1].repeat(remaining_layers, 1, 1))\n",
793
+ " decoder_hidden = (torch.cat((encoder_hidden[0], extra_hidden[0]), dim=0), torch.cat((encoder_hidden[1], extra_hidden[1]), dim=0))\n",
794
+ " else:\n",
795
+ " top_layer_hidden = encoder_hidden[-1].unsqueeze(0) #top_layer_hidden shape (1, batch_size, hidden_size)\n",
796
+ " extra_hidden = top_layer_hidden.repeat(remaining_layers, 1, 1)\n",
797
+ " decoder_hidden = torch.cat((encoder_hidden, extra_hidden), dim=0)\n",
798
+ " \n",
799
+ " else:\n",
800
+ " # Slice the hidden states of the encoder to match the decoder layers\n",
801
+ " if cell_type == \"LSTM\":\n",
802
+ " decoder_hidden = (encoder_hidden[0][-num_layers_decoder:], encoder_hidden[1][-num_layers_decoder:])\n",
803
+ " else :\n",
804
+ " decoder_hidden = encoder_hidden[-num_layers_decoder:]\n",
805
+ " else:\n",
806
+ " decoder_hidden = encoder_hidden\n",
807
+ " \n",
808
+ " loss = 0\n",
809
+ " correct = 0\n",
810
+ " \n",
811
+ " for k in range(0, len(train_y)-1):\n",
812
+ " \n",
813
+ " if attention == \"Yes\":\n",
814
+ " decoder_output, decoder_hidden, atten_weights = decoder(decoder_input, decoder_hidden, encoder_output)\n",
815
+ " else:\n",
816
+ " decoder_output, decoder_hidden= decoder(decoder_input, decoder_hidden) # decoder_output shape (1, batch_size, output_size)\n",
817
+ "\n",
818
+ " max_prob, index = decoder_output.topk(1) # max_prob shape (1, batch_size, 1)\n",
819
+ " index = torch.squeeze(index) # shape (batch_size)\n",
820
+ " decoder_output = torch.squeeze(decoder_output)\n",
821
+ " loss += loss_fun(decoder_output, train_y[k+1].long())\n",
822
+ " \n",
823
+ " correct += (index == train_y[k+1]).sum().item()\n",
824
+ "\n",
825
+ " # Apply teacher forcing\n",
826
+ " use_teacher_forcing = True if random.random() < teach_ratio else False\n",
827
+ "\n",
828
+ " if use_teacher_forcing:\n",
829
+ " decoder_input = train_y[k+1]\n",
830
+ " \n",
831
+ " else:\n",
832
+ " decoder_input = index\n",
833
+ "\n",
834
+ " running_loss += loss.item()\n",
835
+ " train_correct += correct\n",
836
+ " loss.backward()\n",
837
+ " encoder_optimizer.step()\n",
838
+ " decoder_optimizer.step()\n",
839
+ " \n",
840
+ "\n",
841
+ " # find train loss and accuracy and print + log to wandb\n",
842
+ " if attention == \"Yes\":\n",
843
+ " _, train_accuracy,_, _ = evaluate(trainData,encoder, decoder,output_len,batch_size,hidden_size,num_layers_encoder,num_layers_decoder, cell_type, attention)\n",
844
+ " else:\n",
845
+ " _, train_accuracy,_= evaluate(trainData,encoder, decoder,output_len,batch_size,hidden_size,num_layers_encoder,num_layers_decoder, cell_type, attention)\n",
846
+ " \n",
847
+ " print(f\"epoch {i}, training loss {running_loss/(len(trainData)* seq_len)}, training accuracy {train_accuracy}\")\n",
848
+ " if sweeps:\n",
849
+ " wandb.log({\"epoch\": i, \"train_loss\": running_loss/(len(trainData)* seq_len), \"train_accuracy\": train_accuracy})\n",
850
+ " \n",
851
+ " # # find validation loss and accuracy and print + log to wandb\n",
852
+ " if attention == \"Yes\":\n",
853
+ " val_loss, val_accuracy,_, _ = evaluate(valData,encoder, decoder,output_len,batch_size,hidden_size,num_layers_encoder,num_layers_decoder, cell_type, attention)\n",
854
+ " else:\n",
855
+ " val_loss, val_accuracy,_ = evaluate(valData,encoder, decoder,output_len,batch_size,hidden_size,num_layers_encoder,num_layers_decoder, cell_type, attention)\n",
856
+ " \n",
857
+ " print(f\"epoch {i}, validation loss {val_loss}, validation accuracy {val_accuracy}\")\n",
858
+ " if sweeps:\n",
859
+ " wandb.log({\"val_loss\": val_loss, \"val_accuracy\": val_accuracy})\n",
860
+ "\n",
861
+ " # Check for early stopping\n",
862
+ " if val_loss < best_val_loss:\n",
863
+ " best_val_loss = val_loss\n",
864
+ " epochs_without_improvement = 0\n",
865
+ " # Save the model weights\n",
866
+ " torch.save(encoder.state_dict(), 'best_encoder.pt')\n",
867
+ " torch.save(decoder.state_dict(), 'best_decoder.pt')\n",
868
+ " else:\n",
869
+ " epochs_without_improvement += 1\n",
870
+ " if epochs_without_improvement >= patience:\n",
871
+ " print(\"Early stopping triggered. No improvement in validation loss.\")\n",
872
+ " break\n",
873
+ " \n",
874
+ " \n",
875
+ " # if testing mode is on print the test accuracy \n",
876
+ " if test:\n",
877
+ " # Load the best model weights\n",
878
+ " encoder.load_state_dict(torch.load('best_encoder.pt'))\n",
879
+ " decoder.load_state_dict(torch.load('best_decoder.pt'))\n",
880
+ " if attention == \"Yes\":\n",
881
+ " _, test_accuracy, pred, atten_weights = evaluate(testData,encoder, decoder,output_len,batch_size,hidden_size,num_layers_encoder,num_layers_decoder, cell_type, attention)\n",
882
+ " else:\n",
883
+ " _, test_accuracy, pred = evaluate(testData,encoder, decoder,output_len,batch_size,hidden_size,num_layers_encoder,num_layers_decoder, cell_type, attention)\n",
884
+ " print(f\"test accuracy {test_accuracy}\")\n",
885
+ "\n",
886
+ " if attention == \"Yes\":\n",
887
+ " return pred, atten_weights\n",
888
+ " else:\n",
889
+ " return pred\n",
890
+ " "
891
+ ]
892
+ },
893
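The decoding loop in train() applies teacher forcing: with probability teach_ratio the next decoder input is the ground-truth character, otherwise it is the model's own previous prediction. A minimal sketch of that choice in isolation, with `next_decoder_input` as a hypothetical helper name:

```python
import random
import torch

def next_decoder_input(ground_truth, prediction, teach_ratio=0.5):
    # teacher forcing: sometimes feed the truth, sometimes the model's own guess
    return ground_truth if random.random() < teach_ratio else prediction

gt = torch.tensor([5, 7])   # ground-truth character indices for this step
pr = torch.tensor([5, 9])   # the model's predictions for this step
print(next_decoder_input(gt, pr, teach_ratio=0.6))
```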
+ {
894
+ "cell_type": "markdown",
895
+ "metadata": {
896
+ "id": "nvyRJWUUbR2f"
897
+ },
898
+ "source": [
899
+ "# Translating predictions to words\n"
900
+ ]
901
+ },
902
+ {
903
+ "cell_type": "code",
904
+ "execution_count": null,
905
+ "metadata": {
906
+ "id": "Hd3zCTnSbSaL"
907
+ },
908
+ "outputs": [],
909
+ "source": [
910
+ "def translate_prediction(input_dict , input, output_dict, pred,target):\n",
911
+ " \n",
912
+ " '''pred in shape of seq_len-1 * dataset_size\n",
913
+ " target in shape datasize * seq_len-1\n",
914
+ " '''\n",
915
+ " pred = pred.T # shape datasize * seq len-1\n",
916
+ " pred = pred[1:, :-1] # ignore last index of each row\n",
917
+ " input = input[:, :-1] # ignore last index of each row\n",
918
+ " target = target[:, 1:-1] # ignore last index of each row\n",
919
+ " print(f\"pred shape {pred.shape}, input shape {input.shape}, target shape {target.shape}\")\n",
920
+ " predictions = [] \n",
921
+ " Input = [] \n",
922
+ " Target = []\n",
923
+ " for i in range(len(pred)):\n",
924
+ " \n",
925
+ " pred_word=\"\"\n",
926
+ " input_word=\"\"\n",
927
+ " target_word = \"\"\n",
928
+ "\n",
929
+ " for j in range(pred.shape[1]):\n",
930
+ "\n",
931
+ " # Ignore padding\n",
932
+ " if(target[i][j].item() != 0):\n",
933
+ " \n",
934
+ " pred_word += output_dict[pred[i][j].item()]\n",
935
+ " target_word += output_dict[target[i][j].item()]\n",
936
+ " \n",
937
+ " for j in range(input.shape[1]):\n",
938
+ " \n",
939
+ " if(input[i][j].item()!=0):\n",
940
+ " \n",
941
+ " input_word += input_dict[input[i][j].item()] \n",
942
+ "\n",
943
+ " # Append words in respective List\n",
944
+ " \n",
945
+ " predictions.append(pred_word)\n",
946
+ " Input.append(input_word) \n",
947
+ " Target.append(target_word) \n",
948
+ "\n",
949
+ " # Create a DataFrame\n",
950
+ " df = pd.DataFrame({\"input\": Input, \"predicted\": predictions,\"Actual\":Target})\n",
951
+ " return df\n",
952
+ "\n",
953
+ " "
954
+ ]
955
+ },
956
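translate_prediction walks each row of index tensors back into a string, skipping the padding index 0. The same idea in miniature, with a toy index2char map standing in for the real ipLang/opLang dictionaries:

```python
def indices_to_word(indices, index2char, pad_index=0):
    # map each non-padding index back to its character
    return ''.join(index2char[int(i)] for i in indices if int(i) != pad_index)

index2char = {0: '#', 1: 'n', 2: 'a', 3: 'm'}        # toy vocabulary
print(indices_to_word([1, 2, 3, 0, 0], index2char))  # 'nam'
```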
+ {
957
+ "cell_type": "markdown",
958
+ "metadata": {
959
+ "id": "8ETW0BG_Pa24"
960
+ },
961
+ "source": [
962
+ "#call train"
963
+ ]
964
+ },
965
+ {
966
+ "cell_type": "code",
967
+ "execution_count": null,
968
+ "metadata": {
969
+ "id": "pgGp7MoGzfPg"
970
+ },
971
+ "outputs": [],
972
+ "source": [
973
+ "# train(sweeps = False, test = True)"
974
+ ]
975
+ },
976
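train() also implements patience-based early stopping: training halts once the validation loss has failed to improve for patience consecutive epochs, and the best checkpoint is reloaded for testing. The stopping rule on its own, run over a toy loss trace:

```python
# patience-based early stopping over a toy validation-loss trace
best, patience, bad_epochs = float('inf'), 5, 0
for epoch, val_loss in enumerate([0.9, 0.8, 0.85, 0.83, 0.84, 0.86, 0.9, 0.88]):
    if val_loss < best:
        best, bad_epochs = val_loss, 0   # improvement: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}")
            break
```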
+ {
977
+ "cell_type": "markdown",
978
+ "metadata": {
979
+ "id": "MQPGy32rnD3V"
980
+ },
981
+ "source": [
982
+ "# Runnning sweeps for models without Attention\n",
983
+ "\n"
984
+ ]
985
+ },
986
+ {
987
+ "cell_type": "markdown",
988
+ "metadata": {
989
+ "id": "z_aYZvDD1OHU"
990
+ },
991
+ "source": [
992
+ "## Sweep Config"
993
+ ]
994
+ },
995
+ {
996
+ "cell_type": "code",
997
+ "execution_count": null,
998
+ "metadata": {
999
+ "id": "SVv8bI-D1Q_I"
1000
+ },
1001
+ "outputs": [],
1002
+ "source": [
1003
+ "sweep_config = {\n",
1004
+ " 'name': 'sweepDL', \n",
1005
+ " 'method': 'bayes',\n",
1006
+ " 'metric': {\n",
1007
+ " 'name': 'val_accuracy',\n",
1008
+ " 'goal': 'maximize'\n",
1009
+ " },\n",
1010
+ " 'parameters': {\n",
1011
+ " \n",
1012
+ " 'learn_rate': {\n",
1013
+ " 'values': [0.01, 0.001, 0.001]\n",
1014
+ " },\n",
1015
+ " 'embedding_size': {\n",
1016
+ " 'values': [32, 64, 128, 256, 512, 1024]\n",
1017
+ " },\n",
1018
+ " 'batch_size':{\n",
1019
+ " 'values':[16, 32, 64, 128, 256]\n",
1020
+ " },\n",
1021
+ " 'hidden_size':{\n",
1022
+ " 'values':[32, 64, 128, 256, 512, 1024]\n",
1023
+ " },\n",
1024
+ " 'teach_ratio':{\n",
1025
+ " 'values':[0.4, 0.5, 0.6]\n",
1026
+ " },\n",
1027
+ " 'dropout':{\n",
1028
+ " 'values':[0, 0.2, 0.4]\n",
1029
+ " },\n",
1030
+ " 'cell_type':{\n",
1031
+ " 'values':[\"RNN\", \"LSTM\", \"GRU\"]\n",
1032
+ " },\n",
1033
+ " 'bidirectional':{\n",
1034
+ " 'values' : [\"Yes\",\"No\"]\n",
1035
+ " },\n",
1036
+ " 'num_layers_decoder':{\n",
1037
+ " 'values': [1,2, 3, 4]\n",
1038
+ " },\n",
1039
+ " 'num_layers_encoder':{\n",
1040
+ " 'values': [1,2,3,4]\n",
1041
+ " },\n",
1042
+ " 'epochs':{\n",
1043
+ " 'values': [10, 15, 20, 25, 30]\n",
1044
+ " },\n",
1045
+ " 'attention':{\n",
1046
+ " 'values': [\"Yes\"]\n",
1047
+ " }\n",
1048
+ " \n",
1049
+ " }\n",
1050
+ "}\n",
1051
+ "config_defaults={\n",
1052
+ " 'learn_rate' : 0.001,\n",
1053
+ " 'embedding_size': 32,\n",
1054
+ " 'batch_size': 256,\n",
1055
+ " 'hidden_size' : 1024,\n",
1056
+ " 'num_layers_encoder': 3,\n",
1057
+ " 'num_layers_decoder': 3,\n",
1058
+ " 'bidirectional': 'No',\n",
1059
+ " 'cell_type': \"LSTM\",\n",
1060
+ " 'teach_ratio': 0.6,\n",
1061
+ " 'dropout': 0.4,\n",
1062
+ " 'epochs': 15,\n",
1063
+ " 'attention': \"No\"\n",
1064
+ "}"
1065
+ ]
1066
+ },
1067
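The next cell launches the Bayesian sweep exactly as configured above. When compute is limited, wandb.agent also accepts a count argument that caps how many configurations the agent tries; a hedged sketch (the cap of 20 is arbitrary):

```python
import wandb

sweep_id = wandb.sweep(sweep_config, project="CS6910_Assignment_3")
wandb.agent(sweep_id, function=train, count=20)  # stop after 20 runs
```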
+ {
1068
+ "cell_type": "code",
1069
+ "execution_count": null,
1070
+ "metadata": {
1071
+ "id": "4KxsOOpvr1oi"
1072
+ },
1073
+ "outputs": [],
1074
+ "source": [
1075
+ "sweep_id=wandb.sweep(sweep_config, project=\"CS6910_Assignment_3\")\n",
1076
+ "wandb.agent(sweep_id,function=train)"
1077
+ ]
1078
+ },
1079
+ {
1080
+ "cell_type": "markdown",
1081
+ "metadata": {
1082
+ "id": "pKvBd5mKf0Hf"
1083
+ },
1084
+ "source": [
1085
+ "# Testing the Best Model(without Attention) on Test Data \n",
1086
+ "Set default hyperparameters to the best hyperparameters got from sweeps Hyperparamer tuning"
1087
+ ]
1088
+ },
1089
+ {
1090
+ "cell_type": "code",
1091
+ "execution_count": null,
1092
+ "metadata": {
1093
+ "id": "kMQvZjZl0q4U"
1094
+ },
1095
+ "outputs": [],
1096
+ "source": [
1097
+ "config_defaults={\n",
1098
+ " 'learn_rate' : 0.001,\n",
1099
+ " 'embedding_size': 32,\n",
1100
+ " 'batch_size': 256,\n",
1101
+ " 'hidden_size' : 1024,\n",
1102
+ " 'num_layers_encoder': 3,\n",
1103
+ " 'num_layers_decoder': 3,\n",
1104
+ " 'bidirectional': 'No',\n",
1105
+ " 'cell_type': \"LSTM\",\n",
1106
+ " 'teach_ratio': 0.6,\n",
1107
+ " 'dropout': 0.4,\n",
1108
+ " 'epochs': 15,\n",
1109
+ " 'attention': \"No\"\n",
1110
+ "}"
1111
+ ]
1112
+ },
1113
+ {
1114
+ "cell_type": "code",
1115
+ "execution_count": null,
1116
+ "metadata": {
1117
+ "colab": {
1118
+ "base_uri": "https://localhost:8080/"
1119
+ },
1120
+ "id": "ygtFpEvp8jFU",
1121
+ "outputId": "1a71d3be-f17f-498c-8844-3c115c411f0a"
1122
+ },
1123
+ "outputs": [
1124
+ {
1125
+ "name": "stdout",
1126
+ "output_type": "stream",
1127
+ "text": [
1128
+ "correct = 1490\n",
1129
+ "test accuracy 36.376953125\n"
1130
+ ]
1131
+ }
1132
+ ],
1133
+ "source": [
1134
+ "pred= train(sweeps = False, test = True)"
1135
+ ]
1136
+ },
1137
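The "correct = 1490" output above counts word-level exact matches: a prediction scores only when every character of the word agrees with the target. count_exact_matches is defined earlier in the notebook; this is a plausible sketch of its behaviour, not necessarily the exact implementation:

```python
import torch

def count_exact_matches(pred, target):
    # a row counts only if all of its characters match
    return (pred == target).all(dim=1).sum().item()

pred   = torch.tensor([[1, 2, 3], [4, 5, 6]])
target = torch.tensor([[1, 2, 3], [4, 5, 0]])
print(count_exact_matches(pred, target))  # 1
```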
+ {
1138
+ "cell_type": "markdown",
1139
+ "metadata": {
1140
+ "id": "hMf0OAuscOJx"
1141
+ },
1142
+ "source": [
1143
+ "# Saving the predictions by Vanilla model in csv file"
1144
+ ]
1145
+ },
1146
+ {
1147
+ "cell_type": "code",
1148
+ "execution_count": null,
1149
+ "metadata": {
1150
+ "colab": {
1151
+ "base_uri": "https://localhost:8080/"
1152
+ },
1153
+ "id": "1cgUOUdsfzUB",
1154
+ "outputId": "8784a3aa-315e-476f-cced-c38ebb8434b3"
1155
+ },
1156
+ "outputs": [
1157
+ {
1158
+ "name": "stdout",
1159
+ "output_type": "stream",
1160
+ "text": [
1161
+ "pred shape torch.Size([4096, 20]), input shape torch.Size([4096, 26]), target shape torch.Size([4096, 20])\n"
1162
+ ]
1163
+ }
1164
+ ],
1165
+ "source": [
1166
+ "# save the predictions\n",
1167
+ "dataframe = translate_prediction(ipLang.index2char, testData[:][0], opLang.index2char, pred, testData[:][1])\n",
1168
+ "dataframe.to_csv(\"predictions.csv\")"
1169
+ ]
1170
+ },
1171
+ {
1172
+ "cell_type": "code",
1173
+ "execution_count": null,
1174
+ "metadata": {
1175
+ "id": "ZZW-IEWZ5syU"
1176
+ },
1177
+ "outputs": [],
1178
+ "source": [
1179
+ "import pandas as pd\n",
1180
+ "data = pd.read_csv(\"predictions.csv\")"
1181
+ ]
1182
+ },
1183
+ {
1184
+ "cell_type": "code",
1185
+ "execution_count": null,
1186
+ "metadata": {
1187
+ "colab": {
1188
+ "base_uri": "https://localhost:8080/",
1189
+ "height": 424
1190
+ },
1191
+ "id": "2sOkc_0vmDlB",
1192
+ "outputId": "750d06b5-fee2-4eb8-d7e6-a7043cd0c15a"
1193
+ },
1194
+ "outputs": [],
1195
+ "source": [
1196
+ "data"
1197
+ ]
1198
+ },
1199
+ {
1200
+ "cell_type": "code",
1201
+ "execution_count": null,
1202
+ "metadata": {
1203
+ "colab": {
1204
+ "base_uri": "https://localhost:8080/",
1205
+ "height": 142
1206
+ },
1207
+ "id": "AkG1vCpZ_vjG",
1208
+ "outputId": "d64b794c-d173-4871-80fc-93b8211ebedc"
1209
+ },
1210
+ "outputs": [],
1211
+ "source": [
1212
+ "# We also want to plot the prdiction table to wandb\n",
1213
+ "wandb.init(project=\"CS6910_Assignment_3\")"
1214
+ ]
1215
+ },
1216
+ {
1217
+ "cell_type": "code",
1218
+ "execution_count": null,
1219
+ "metadata": {
1220
+ "id": "MmKDX6V5_kGu"
1221
+ },
1222
+ "outputs": [],
1223
+ "source": [
1224
+ "table = wandb.Table(dataframe=data)\n",
1225
+ "wandb.log({\"data\": table})"
1226
+ ]
1227
+ },
1228
+ {
1229
+ "cell_type": "markdown",
1230
+ "metadata": {
1231
+ "id": "FYMa5jTQRUaB"
1232
+ },
1233
+ "source": [
1234
+ "## Plotting the confusion matrix in wandB"
1235
+ ]
1236
+ },
1237
+ {
1238
+ "cell_type": "code",
1239
+ "execution_count": null,
1240
+ "metadata": {
1241
+ "id": "YBaJZCIBRAGZ"
1242
+ },
1243
+ "outputs": [],
1244
+ "source": [
1245
+ "import numpy as np\n",
1246
+ "CM = np.zeros((opLang.n_chars, ipLang.n_chars))\n",
1247
+ "\n",
1248
+ "for i in range(len(testData[1])):\n",
1249
+ " for j in range(testData[1].shape[1]):\n",
1250
+ " pred = int(pred[i][j])\n",
1251
+ " targ = int(testData[1][i][j])\n",
1252
+ " CM[pred][targ] += 1\n",
1253
+ "\n",
1254
+ "classes =[]\n",
1255
+ "\n",
1256
+ "for i in range(len(CM)):\n",
1257
+ " classes.append(opLang.index2char[i])\n",
1258
+ "\n",
1259
+ "percentages = 100 * (CM / np.sum(CM))\n",
1260
+ "\n",
1261
+ "# Define the text for each cell\n",
1262
+ "cell_text = []\n",
1263
+ "for i in range(len(classes)):\n",
1264
+ " row_text = []\n",
1265
+ " for j in range(len(classes)):\n",
1266
+ "\n",
1267
+ " txt = \"Total \"+f'{CM[i, j]}Per. ({percentages[i, j]:.3f})'\n",
1268
+ " if(i==j):\n",
1269
+ " txt =\"Correcty Predicted \" +classes[i]+\"\"+txt\n",
1270
+ " if(i!=j):\n",
1271
+ " txt =\"Predicted \" +classes[j]+\" For \"+classes[i]+\"\"+txt\n",
1272
+ " row_text.append(txt)\n",
1273
+ " cell_text.append(row_text)\n",
1274
+ "\n",
1275
+ "import plotly.graph_objs as go\n",
1276
+ "\n",
1277
+ "# Define the trace\n",
1278
+ "trace = go.Heatmap(z=percentages,\n",
1279
+ " x=classes,\n",
1280
+ " y=classes,\n",
1281
+ " colorscale='Blues',\n",
1282
+ " colorbar=dict(title='Percentage'),\n",
1283
+ " hovertemplate='%{text}%',\n",
1284
+ " text=cell_text,\n",
1285
+ " )\n",
1286
+ "\n",
1287
+ "# Define the layout\n",
1288
+ "layout = go.Layout(title='Confusion Matrix',\n",
1289
+ " xaxis=dict(title='Predicted Character'),\n",
1290
+ " yaxis=dict(title='True Character'),\n",
1291
+ " )\n",
1292
+ "\n",
1293
+ "# Plot the figure\n",
1294
+ "fig = go.Figure(data=[trace], layout=layout)\n",
1295
+ "wandb.log({'confusion_matrix': (fig)})"
1296
+ ]
1297
+ },
1298
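The loop above accumulates a character-level confusion matrix indexed as CM[predicted][true]. The same bookkeeping on toy data, independent of the notebook's tensors:

```python
import numpy as np

true = np.array([1, 2, 2, 3])   # true character indices
hyp  = np.array([1, 2, 3, 3])   # predicted character indices
cm = np.zeros((4, 4), dtype=int)
for t, p in zip(true, hyp):
    cm[p][t] += 1               # rows: predicted, columns: true
print(cm)
```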
+ {
1299
+ "cell_type": "markdown",
1300
+ "metadata": {
1301
+ "id": "zfuv5FoA1wt2"
1302
+ },
1303
+ "source": [
1304
+ "# Runnning sweeps for models with Attention\n"
1305
+ ]
1306
+ },
1307
+ {
1308
+ "cell_type": "markdown",
1309
+ "metadata": {
1310
+ "id": "tsHS0PkNGHdV"
1311
+ },
1312
+ "source": [
1313
+ "## Sweep Config"
1314
+ ]
1315
+ },
1316
+ {
1317
+ "cell_type": "code",
1318
+ "execution_count": null,
1319
+ "metadata": {
1320
+ "id": "HwCn-Ci5xkTb"
1321
+ },
1322
+ "outputs": [],
1323
+ "source": [
1324
+ "sweep_config = {\n",
1325
+ " 'name': 'sweepDL', \n",
1326
+ " 'method': 'bayes',\n",
1327
+ " 'metric': {\n",
1328
+ " 'name': 'val_accuracy',\n",
1329
+ " 'goal': 'maximize'\n",
1330
+ " },\n",
1331
+ " 'parameters': {\n",
1332
+ " \n",
1333
+ " 'learn_rate': {\n",
1334
+ " 'values': [0.01, 0.001, 0.001]\n",
1335
+ " },\n",
1336
+ " 'embedding_size': {\n",
1337
+ " 'values': [32, 64, 128, 256, 512, 1024]\n",
1338
+ " },\n",
1339
+ " 'batch_size':{\n",
1340
+ " 'values':[16, 32, 64, 128, 256]\n",
1341
+ " },\n",
1342
+ " 'hidden_size':{\n",
1343
+ " 'values':[32, 64, 128, 256, 512, 1024]\n",
1344
+ " },\n",
1345
+ " 'teach_ratio':{\n",
1346
+ " 'values':[0.4, 0.5, 0.6]\n",
1347
+ " },\n",
1348
+ " 'dropout':{\n",
1349
+ " 'values':[0, 0.2, 0.4]\n",
1350
+ " },\n",
1351
+ " 'cell_type':{\n",
1352
+ " 'values':[\"RNN\", \"LSTM\", \"GRU\"]\n",
1353
+ " },\n",
1354
+ " 'bidirectional':{\n",
1355
+ " 'values' : [\"Yes\",\"No\"]\n",
1356
+ " },\n",
1357
+ " 'num_layers_decoder':{\n",
1358
+ " 'values': [1,2, 3, 4]\n",
1359
+ " },\n",
1360
+ " 'num_layers_encoder':{\n",
1361
+ " 'values': [1,2,3,4]\n",
1362
+ " },\n",
1363
+ " 'epochs':{\n",
1364
+ " 'values': [10, 15, 20, 25, 30]\n",
1365
+ " },\n",
1366
+ " 'attention':{\n",
1367
+ " 'values': [\"Yes\"]\n",
1368
+ " }\n",
1369
+ " \n",
1370
+ " }\n",
1371
+ "}\n",
1372
+ "config_defaults={\n",
1373
+ " 'learn_rate' : 0.001,\n",
1374
+ " 'embedding_size': 32,\n",
1375
+ " 'batch_size': 64,\n",
1376
+ " 'hidden_size' : 1024,\n",
1377
+ " 'num_layers_encoder': 1,\n",
1378
+ " 'num_layers_decoder': 1,\n",
1379
+ " 'bidirectional': 'Yes',\n",
1380
+ " 'cell_type': \"LSTM\",\n",
1381
+ " 'teach_ratio': 0.5,\n",
1382
+ " 'dropout': 0.4,\n",
1383
+ " 'epochs': 20,\n",
1384
+ " 'attention': \"Yes\"\n",
1385
+ "}"
1386
+ ]
1387
+ },
1388
+ {
1389
+ "cell_type": "code",
1390
+ "execution_count": null,
1391
+ "metadata": {
1392
+ "id": "3ADMwinqaQVF"
1393
+ },
1394
+ "outputs": [],
1395
+ "source": [
1396
+ "sweep_id=wandb.sweep(sweep_config, project=\"CS6910_Assignment_3\")\n",
1397
+ "wandb.agent(sweep_id,function=train)\n",
1398
+ "# wandb.agent(sweep_id= \"xiyggu44\",function=train, project=\"CS6910_Assignment_3\")"
1399
+ ]
1400
+ },
1401
+ {
1402
+ "cell_type": "markdown",
1403
+ "metadata": {
1404
+ "id": "W7CYNChRGuGK"
1405
+ },
1406
+ "source": [
1407
+ "# Testing the Best Model(with Attention) on Test Data \n",
1408
+ "Set default hyperparameters to the best hyperparameters got from sweeps Hyperparamer tuning"
1409
+ ]
1410
+ },
1411
+ {
1412
+ "cell_type": "code",
1413
+ "execution_count": null,
1414
+ "metadata": {
1415
+ "id": "C9MUrsXu_Rr4"
1416
+ },
1417
+ "outputs": [],
1418
+ "source": [
1419
+ "config_defaults={\n",
1420
+ " 'learn_rate' : 0.001,\n",
1421
+ " 'embedding_size': 32,\n",
1422
+ " 'batch_size': 64,\n",
1423
+ " 'hidden_size' : 1024,\n",
1424
+ " 'num_layers_encoder': 1,\n",
1425
+ " 'num_layers_decoder': 1,\n",
1426
+ " 'bidirectional': 'Yes',\n",
1427
+ " 'cell_type': \"LSTM\",\n",
1428
+ " 'teach_ratio': 0.5,\n",
1429
+ " 'dropout': 0.4,\n",
1430
+ " 'epochs': 20,\n",
1431
+ " 'attention': \"Yes\"\n",
1432
+ "}"
1433
+ ]
1434
+ },
1435
+ {
1436
+ "cell_type": "code",
1437
+ "execution_count": null,
1438
+ "metadata": {
1439
+ "id": "u7XAB4Q5Hpxj"
1440
+ },
1441
+ "outputs": [],
1442
+ "source": [
1443
+ "pred, atten_weights = train(sweeps = False, test = True)"
1444
+ ]
1445
+ },
1446
+ {
1447
+ "cell_type": "markdown",
1448
+ "metadata": {
1449
+ "id": "fld21YRZdRdG"
1450
+ },
1451
+ "source": [
1452
+ "# Saving the predictions by Vanilla model in csv file"
1453
+ ]
1454
+ },
1455
+ {
1456
+ "cell_type": "code",
1457
+ "execution_count": null,
1458
+ "metadata": {
1459
+ "colab": {
1460
+ "base_uri": "https://localhost:8080/"
1461
+ },
1462
+ "id": "BpDQ1mrydYWg",
1463
+ "outputId": "8784a3aa-315e-476f-cced-c38ebb8434b3"
1464
+ },
1465
+ "outputs": [
1466
+ {
1467
+ "name": "stdout",
1468
+ "output_type": "stream",
1469
+ "text": [
1470
+ "pred shape torch.Size([4096, 20]), input shape torch.Size([4096, 26]), target shape torch.Size([4096, 20])\n"
1471
+ ]
1472
+ }
1473
+ ],
1474
+ "source": [
1475
+ "# save the predictions\n",
1476
+ "dataframe = translate_prediction(ipLang.index2char, testData[:][0], opLang.index2char, pred, testData[:][1])\n",
1477
+ "dataframe.to_csv(\"predictions.csv\")"
1478
+ ]
1479
+ },
1480
+ {
1481
+ "cell_type": "code",
1482
+ "execution_count": null,
1483
+ "metadata": {
1484
+ "id": "PKMYPZdtdbDh"
1485
+ },
1486
+ "outputs": [],
1487
+ "source": [
1488
+ "import pandas as pd\n",
1489
+ "data = pd.read_csv(\"predictions.csv\")"
1490
+ ]
1491
+ },
1492
+ {
1493
+ "cell_type": "code",
1494
+ "execution_count": null,
1495
+ "metadata": {
1496
+ "colab": {
1497
+ "base_uri": "https://localhost:8080/",
1498
+ "height": 142
1499
+ },
1500
+ "id": "8gCL1rXCdgYp",
1501
+ "outputId": "d64b794c-d173-4871-80fc-93b8211ebedc"
1502
+ },
1503
+ "outputs": [],
1504
+ "source": [
1505
+ "# We also want to plot the prdiction table to wandb\n",
1506
+ "wandb.init(project=\"CS6910_Assignment_3\")"
1507
+ ]
1508
+ },
1509
+ {
1510
+ "cell_type": "code",
1511
+ "execution_count": null,
1512
+ "metadata": {
1513
+ "id": "N1r2ownhdjbz"
1514
+ },
1515
+ "outputs": [],
1516
+ "source": [
1517
+ "table = wandb.Table(dataframe=data)\n",
1518
+ "wandb.log({\"data\": table})"
1519
+ ]
1520
+ },
1521
+ {
1522
+ "cell_type": "markdown",
1523
+ "metadata": {
1524
+ "id": "LDP4KvWdFnIL"
1525
+ },
1526
+ "source": [
1527
+ "# Plotting the Attention HeatMaps"
1528
+ ]
1529
+ },
1530
+ {
1531
+ "cell_type": "code",
1532
+ "execution_count": null,
1533
+ "metadata": {
1534
+ "colab": {
1535
+ "base_uri": "https://localhost:8080/",
1536
+ "height": 1000,
1537
+ "referenced_widgets": [
1538
+ "1b0c5a6e21a349cba57322f850ad9f48",
1539
+ "3aa935a6db14483d8aaada58a84a3e47",
1540
+ "eabcea7a8bbf42f6aaa3995c0dece721",
1541
+ "b3b7711edb5542e08c53c4f37da10203",
1542
+ "39a8a3a9b6f1495ea17fd1b3d86b67c0",
1543
+ "18a8e2e817b947f9aad87b1ccaf96ea6",
1544
+ "da62d6e5ad0a462b98e1591d39038e1e",
1545
+ "9b5bb4f7f4a846c28ab967b64107726e"
1546
+ ]
1547
+ },
1548
+ "id": "4WfJEdcgFmiI",
1549
+ "outputId": "ff266529-4345-4cdc-9860-11914b099052"
1550
+ },
1551
+ "outputs": [],
1552
+ "source": [
1553
+ "import matplotlib.pyplot as plt\n",
1554
+ "import numpy as np\n",
1555
+ "from matplotlib.font_manager import FontProperties\n",
1556
+ "tel_font = FontProperties(fname = 'TiroDevanagariHindi-Regular.ttf')\n",
1557
+ "# Assuming you have attention_weights of shape (batch_size, output_sequence_length, batch_size, input_sequence_length)\n",
1558
+ "# and prediction_matrix of shape (batch_size, output_sequence_length)\n",
1559
+ "# and input_matrix of shape (batch_size, input_sequence_length)\n",
1560
+ "\n",
1561
+ "# Define the grid dimensions\n",
1562
+ "rows = int(np.ceil(np.sqrt(12)))\n",
1563
+ "cols = int(np.ceil(12 / rows))\n",
1564
+ "\n",
1565
+ "# Create a figure and subplots\n",
1566
+ "fig, axes = plt.subplots(rows, cols, figsize=(9, 9))\n",
1567
+ "\n",
1568
+ "for i, ax in enumerate(axes.flatten()):\n",
1569
+ " if i < 12:\n",
1570
+ " prediction = [opLang.index2char[j.item()] for j in pred[i+1]]\n",
1571
+ " \n",
1572
+ " pred_word=\"\"\n",
1573
+ " input_word=\"\"\n",
1574
+ "\n",
1575
+ " for j in range(len(prediction)):\n",
1576
+ " # Ignore padding\n",
1577
+ " if(prediction[j] != '#'):\n",
1578
+ " pred_word += prediction[j]\n",
1579
+ " else : \n",
1580
+ " break\n",
1581
+ " input_seq = [ipLang.index2char[j.item()] for j in testData[i][0]]\n",
1582
+ " \n",
1583
+ " for j in range(len(input_seq)):\n",
1584
+ " if(input_seq[j] != '#'):\n",
1585
+ " input_word += input_seq[j]\n",
1586
+ " else : \n",
1587
+ " break\n",
1588
+ " attn_weights = atten_weights[i, :len(pred_word), :len(input_word)].detach().cpu().numpy()\n",
1589
+ " ax.imshow(attn_weights.T, cmap='hot', interpolation='nearest')\n",
1590
+ " ax.xaxis.set_label_position('top')\n",
1591
+ " ax.set_title(f'Example {i+1}')\n",
1592
+ " ax.set_xlabel('Output predicted')\n",
1593
+ " ax.set_ylabel('Input word')\n",
1594
+ " ax.set_xticks(np.arange(len(pred_word)))\n",
1595
+ " ax.set_xticklabels(pred_word, rotation = 90, fontproperties = tel_font,fontdict={'fontsize':8})\n",
1596
+ " ax.xaxis.tick_top()\n",
1597
+ "\n",
1598
+ " ax.set_yticks(np.arange(len(input_word)))\n",
1599
+ " ax.set_yticklabels(input_word, rotation=90)\n",
1600
+ " \n",
1601
+ " \n",
1602
+ "\n",
1603
+ "# Adjust the spacing between subplots\n",
1604
+ "plt.tight_layout()\n",
1605
+ "\n",
1606
+ "# Show the plot\n",
1607
+ "plt.show()\n",
1608
+ "wandb.init(project='CS6910_Assignment_3')\n",
1609
+ "\n",
1610
+ "# Convert the matplotlib figure to an image\n",
1611
+ "fig.canvas.draw()\n",
1612
+ "image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8')\n",
1613
+ "image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))\n",
1614
+ "\n",
1615
+ "# Log the image in wandb\n",
1616
+ "wandb.log({\"attention_heatmaps\": [wandb.Image(image)]})"
1617
+ ]
1618
+ },
1619
+ {
1620
+ "cell_type": "code",
1621
+ "execution_count": null,
1622
+ "metadata": {
1623
+ "id": "FnHR_oql6-S4"
1624
+ },
1625
+ "outputs": [],
1626
+ "source": []
1627
+ }
1628
+ ],
1629
+ "metadata": {
1630
+ "accelerator": "GPU",
1631
+ "colab": {
1632
+ "collapsed_sections": [
1633
+ "hRdpoWePeYHn",
1634
+ "44xIRolL_T_d",
1635
+ "Q1TioafYgICa",
1636
+ "svxssm9Havhb",
1637
+ "J56aq1J6a07q",
1638
+ "5JcQdylzI_Fc",
1639
+ "658W9RARGEUf",
1640
+ "q7fAgs5uQni_",
1641
+ "n4rGh7vuQqaa",
1642
+ "0SsnRWlgQmCI",
1643
+ "nvyRJWUUbR2f",
1644
+ "8ETW0BG_Pa24",
1645
+ "MQPGy32rnD3V",
1646
+ "z_aYZvDD1OHU",
1647
+ "pKvBd5mKf0Hf",
1648
+ "FYMa5jTQRUaB",
1649
+ "zfuv5FoA1wt2",
1650
+ "W7CYNChRGuGK"
1651
+ ],
1652
+ "gpuType": "T4",
1653
+ "include_colab_link": true,
1654
+ "provenance": []
1655
+ },
1656
+ "gpuClass": "standard",
1657
+ "kernelspec": {
1658
+ "display_name": "Python 3",
1659
+ "name": "python3"
1660
+ },
1661
+ "language_info": {
1662
+ "name": "python"
1663
+ },
1664
+ "widgets": {
1665
+ "application/vnd.jupyter.widget-state+json": {
1666
+ "18a8e2e817b947f9aad87b1ccaf96ea6": {
1667
+ "model_module": "@jupyter-widgets/controls",
1668
+ "model_module_version": "1.5.0",
1669
+ "model_name": "DescriptionStyleModel",
1670
+ "state": {
1671
+ "_model_module": "@jupyter-widgets/controls",
1672
+ "_model_module_version": "1.5.0",
1673
+ "_model_name": "DescriptionStyleModel",
1674
+ "_view_count": null,
1675
+ "_view_module": "@jupyter-widgets/base",
1676
+ "_view_module_version": "1.2.0",
1677
+ "_view_name": "StyleView",
1678
+ "description_width": ""
1679
+ }
1680
+ },
1681
+ "1b0c5a6e21a349cba57322f850ad9f48": {
1682
+ "model_module": "@jupyter-widgets/controls",
1683
+ "model_module_version": "1.5.0",
1684
+ "model_name": "VBoxModel",
1685
+ "state": {
1686
+ "_dom_classes": [],
1687
+ "_model_module": "@jupyter-widgets/controls",
1688
+ "_model_module_version": "1.5.0",
1689
+ "_model_name": "VBoxModel",
1690
+ "_view_count": null,
1691
+ "_view_module": "@jupyter-widgets/controls",
1692
+ "_view_module_version": "1.5.0",
1693
+ "_view_name": "VBoxView",
1694
+ "box_style": "",
1695
+ "children": [
1696
+ "IPY_MODEL_3aa935a6db14483d8aaada58a84a3e47",
1697
+ "IPY_MODEL_eabcea7a8bbf42f6aaa3995c0dece721"
1698
+ ],
1699
+ "layout": "IPY_MODEL_b3b7711edb5542e08c53c4f37da10203"
1700
+ }
1701
+ },
1702
+ "39a8a3a9b6f1495ea17fd1b3d86b67c0": {
1703
+ "model_module": "@jupyter-widgets/base",
1704
+ "model_module_version": "1.2.0",
1705
+ "model_name": "LayoutModel",
1706
+ "state": {
1707
+ "_model_module": "@jupyter-widgets/base",
1708
+ "_model_module_version": "1.2.0",
1709
+ "_model_name": "LayoutModel",
1710
+ "_view_count": null,
1711
+ "_view_module": "@jupyter-widgets/base",
1712
+ "_view_module_version": "1.2.0",
1713
+ "_view_name": "LayoutView",
1714
+ "align_content": null,
1715
+ "align_items": null,
1716
+ "align_self": null,
1717
+ "border": null,
1718
+ "bottom": null,
1719
+ "display": null,
1720
+ "flex": null,
1721
+ "flex_flow": null,
1722
+ "grid_area": null,
1723
+ "grid_auto_columns": null,
1724
+ "grid_auto_flow": null,
1725
+ "grid_auto_rows": null,
1726
+ "grid_column": null,
1727
+ "grid_gap": null,
1728
+ "grid_row": null,
1729
+ "grid_template_areas": null,
1730
+ "grid_template_columns": null,
1731
+ "grid_template_rows": null,
1732
+ "height": null,
1733
+ "justify_content": null,
1734
+ "justify_items": null,
1735
+ "left": null,
1736
+ "margin": null,
1737
+ "max_height": null,
1738
+ "max_width": null,
1739
+ "min_height": null,
1740
+ "min_width": null,
1741
+ "object_fit": null,
1742
+ "object_position": null,
1743
+ "order": null,
1744
+ "overflow": null,
1745
+ "overflow_x": null,
1746
+ "overflow_y": null,
1747
+ "padding": null,
1748
+ "right": null,
1749
+ "top": null,
1750
+ "visibility": null,
1751
+ "width": null
1752
+ }
1753
+ },
1754
+ "3aa935a6db14483d8aaada58a84a3e47": {
1755
+ "model_module": "@jupyter-widgets/controls",
1756
+ "model_module_version": "1.5.0",
1757
+ "model_name": "LabelModel",
1758
+ "state": {
1759
+ "_dom_classes": [],
1760
+ "_model_module": "@jupyter-widgets/controls",
1761
+ "_model_module_version": "1.5.0",
1762
+ "_model_name": "LabelModel",
1763
+ "_view_count": null,
1764
+ "_view_module": "@jupyter-widgets/controls",
1765
+ "_view_module_version": "1.5.0",
1766
+ "_view_name": "LabelView",
1767
+ "description": "",
1768
+ "description_tooltip": null,
1769
+ "layout": "IPY_MODEL_39a8a3a9b6f1495ea17fd1b3d86b67c0",
1770
+ "placeholder": "​",
1771
+ "style": "IPY_MODEL_18a8e2e817b947f9aad87b1ccaf96ea6",
1772
+ "value": "0.071 MB of 0.071 MB uploaded (0.000 MB deduped)\r"
1773
+ }
1774
+ },
1775
+ "9b5bb4f7f4a846c28ab967b64107726e": {
1776
+ "model_module": "@jupyter-widgets/controls",
1777
+ "model_module_version": "1.5.0",
1778
+ "model_name": "ProgressStyleModel",
1779
+ "state": {
1780
+ "_model_module": "@jupyter-widgets/controls",
1781
+ "_model_module_version": "1.5.0",
1782
+ "_model_name": "ProgressStyleModel",
1783
+ "_view_count": null,
1784
+ "_view_module": "@jupyter-widgets/base",
1785
+ "_view_module_version": "1.2.0",
1786
+ "_view_name": "StyleView",
1787
+ "bar_color": null,
1788
+ "description_width": ""
1789
+ }
1790
+ },
1791
+ "b3b7711edb5542e08c53c4f37da10203": {
1792
+ "model_module": "@jupyter-widgets/base",
1793
+ "model_module_version": "1.2.0",
1794
+ "model_name": "LayoutModel",
1795
+ "state": {
1796
+ "_model_module": "@jupyter-widgets/base",
1797
+ "_model_module_version": "1.2.0",
1798
+ "_model_name": "LayoutModel",
1799
+ "_view_count": null,
1800
+ "_view_module": "@jupyter-widgets/base",
1801
+ "_view_module_version": "1.2.0",
1802
+ "_view_name": "LayoutView",
1803
+ "align_content": null,
1804
+ "align_items": null,
1805
+ "align_self": null,
1806
+ "border": null,
1807
+ "bottom": null,
1808
+ "display": null,
1809
+ "flex": null,
1810
+ "flex_flow": null,
1811
+ "grid_area": null,
1812
+ "grid_auto_columns": null,
1813
+ "grid_auto_flow": null,
1814
+ "grid_auto_rows": null,
1815
+ "grid_column": null,
1816
+ "grid_gap": null,
1817
+ "grid_row": null,
1818
+ "grid_template_areas": null,
1819
+ "grid_template_columns": null,
1820
+ "grid_template_rows": null,
1821
+ "height": null,
1822
+ "justify_content": null,
1823
+ "justify_items": null,
1824
+ "left": null,
1825
+ "margin": null,
1826
+ "max_height": null,
1827
+ "max_width": null,
1828
+ "min_height": null,
1829
+ "min_width": null,
1830
+ "object_fit": null,
1831
+ "object_position": null,
1832
+ "order": null,
1833
+ "overflow": null,
1834
+ "overflow_x": null,
1835
+ "overflow_y": null,
1836
+ "padding": null,
1837
+ "right": null,
1838
+ "top": null,
1839
+ "visibility": null,
1840
+ "width": null
1841
+ }
1842
+ },
1843
+ "da62d6e5ad0a462b98e1591d39038e1e": {
1844
+ "model_module": "@jupyter-widgets/base",
1845
+ "model_module_version": "1.2.0",
1846
+ "model_name": "LayoutModel",
1847
+ "state": {
1848
+ "_model_module": "@jupyter-widgets/base",
1849
+ "_model_module_version": "1.2.0",
1850
+ "_model_name": "LayoutModel",
1851
+ "_view_count": null,
1852
+ "_view_module": "@jupyter-widgets/base",
1853
+ "_view_module_version": "1.2.0",
1854
+ "_view_name": "LayoutView",
1855
+ "align_content": null,
1856
+ "align_items": null,
1857
+ "align_self": null,
1858
+ "border": null,
1859
+ "bottom": null,
1860
+ "display": null,
1861
+ "flex": null,
1862
+ "flex_flow": null,
1863
+ "grid_area": null,
1864
+ "grid_auto_columns": null,
1865
+ "grid_auto_flow": null,
1866
+ "grid_auto_rows": null,
1867
+ "grid_column": null,
1868
+ "grid_gap": null,
1869
+ "grid_row": null,
1870
+ "grid_template_areas": null,
1871
+ "grid_template_columns": null,
1872
+ "grid_template_rows": null,
1873
+ "height": null,
1874
+ "justify_content": null,
1875
+ "justify_items": null,
1876
+ "left": null,
1877
+ "margin": null,
1878
+ "max_height": null,
1879
+ "max_width": null,
1880
+ "min_height": null,
1881
+ "min_width": null,
1882
+ "object_fit": null,
1883
+ "object_position": null,
1884
+ "order": null,
1885
+ "overflow": null,
1886
+ "overflow_x": null,
1887
+ "overflow_y": null,
1888
+ "padding": null,
1889
+ "right": null,
1890
+ "top": null,
1891
+ "visibility": null,
1892
+ "width": null
1893
+ }
1894
+ },
1895
+ "eabcea7a8bbf42f6aaa3995c0dece721": {
1896
+ "model_module": "@jupyter-widgets/controls",
1897
+ "model_module_version": "1.5.0",
1898
+ "model_name": "FloatProgressModel",
1899
+ "state": {
1900
+ "_dom_classes": [],
1901
+ "_model_module": "@jupyter-widgets/controls",
1902
+ "_model_module_version": "1.5.0",
1903
+ "_model_name": "FloatProgressModel",
1904
+ "_view_count": null,
1905
+ "_view_module": "@jupyter-widgets/controls",
1906
+ "_view_module_version": "1.5.0",
1907
+ "_view_name": "ProgressView",
1908
+ "bar_style": "",
1909
+ "description": "",
1910
+ "description_tooltip": null,
1911
+ "layout": "IPY_MODEL_da62d6e5ad0a462b98e1591d39038e1e",
1912
+ "max": 1,
1913
+ "min": 0,
1914
+ "orientation": "horizontal",
1915
+ "style": "IPY_MODEL_9b5bb4f7f4a846c28ab967b64107726e",
1916
+ "value": 1
1917
+ }
1918
+ }
1919
+ }
1920
+ }
1921
+ },
1922
+ "nbformat": 4,
1923
+ "nbformat_minor": 0
1924
+ }
notebooks/transformers.ipynb ADDED
@@ -0,0 +1,1929 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "colab_type": "text",
7
+ "id": "view-in-github"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/pankajrawat9075/Language-Transliteration-Model/blob/main/transformers_encoder_decoder.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "hRdpoWePeYHn"
17
+ },
18
+ "source": [
19
+ "## Importing Libraries and models"
20
+ ]
21
+ },
22
+ {
23
+ "cell_type": "code",
24
+ "execution_count": null,
25
+ "metadata": {
26
+ "execution": {
27
+ "iopub.execute_input": "2024-04-06T12:27:53.981869Z",
28
+ "iopub.status.busy": "2024-04-06T12:27:53.981590Z",
29
+ "iopub.status.idle": "2024-04-06T12:28:06.958537Z",
30
+ "shell.execute_reply": "2024-04-06T12:28:06.957350Z",
31
+ "shell.execute_reply.started": "2024-04-06T12:27:53.981844Z"
32
+ },
33
+ "id": "0LBvFtYGCNgJ",
34
+ "trusted": true
35
+ },
36
+ "outputs": [],
37
+ "source": [
38
+ "%%capture\n",
39
+ "!pip install wandb"
40
+ ]
41
+ },
42
+ {
43
+ "cell_type": "code",
44
+ "execution_count": null,
45
+ "metadata": {
46
+ "execution": {
47
+ "iopub.execute_input": "2024-04-06T12:28:06.960754Z",
48
+ "iopub.status.busy": "2024-04-06T12:28:06.960461Z",
49
+ "iopub.status.idle": "2024-04-06T12:28:12.559713Z",
50
+ "shell.execute_reply": "2024-04-06T12:28:12.558903Z",
51
+ "shell.execute_reply.started": "2024-04-06T12:28:06.960728Z"
52
+ },
53
+ "id": "z4ZVrIumZcDt",
54
+ "trusted": true
55
+ },
56
+ "outputs": [],
57
+ "source": [
58
+ "from __future__ import unicode_literals, print_function, division\n",
59
+ "from io import open\n",
60
+ "import unicodedata\n",
61
+ "import string\n",
62
+ "import re\n",
63
+ "import wandb\n",
64
+ "import random\n",
65
+ "import pandas as pd\n",
66
+ "import torch\n",
67
+ "import time\n",
68
+ "import numpy as np\n",
69
+ "import torch.nn as nn\n",
70
+ "from torch import optim\n",
71
+ "import matplotlib.pyplot as plt\n",
72
+ "import torch.nn.functional as F\n",
73
+ "from torch.utils.data import TensorDataset, DataLoader\n",
74
+ "\n",
75
+ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
76
+ "torch.cuda.empty_cache()"
77
+ ]
78
+ },
79
+ {
80
+ "cell_type": "code",
81
+ "execution_count": null,
82
+ "metadata": {
83
+ "colab": {
84
+ "base_uri": "https://localhost:8080/"
85
+ },
86
+ "execution": {
87
+ "iopub.execute_input": "2024-04-06T12:28:12.561336Z",
88
+ "iopub.status.busy": "2024-04-06T12:28:12.560805Z",
89
+ "iopub.status.idle": "2024-04-06T12:28:12.571498Z",
90
+ "shell.execute_reply": "2024-04-06T12:28:12.570579Z",
91
+ "shell.execute_reply.started": "2024-04-06T12:28:12.561311Z"
92
+ },
93
+ "id": "qwL09v65CIse",
94
+ "outputId": "5ea72523-6a50-474c-b617-b77e16d72ef3",
95
+ "trusted": true
96
+ },
97
+ "outputs": [
98
+ {
99
+ "name": "stdout",
100
+ "output_type": "stream",
101
+ "text": [
102
+ "cuda\n"
103
+ ]
104
+ }
105
+ ],
106
+ "source": [
107
+ "print(device)"
108
+ ]
109
+ },
110
+ {
111
+ "cell_type": "markdown",
112
+ "metadata": {
113
+ "id": "44xIRolL_T_d"
114
+ },
115
+ "source": [
116
+ "## Load Dataset"
117
+ ]
118
+ },
119
+ {
120
+ "cell_type": "code",
121
+ "execution_count": null,
122
+ "metadata": {
123
+ "execution": {
124
+ "iopub.execute_input": "2024-04-06T12:28:12.573774Z",
125
+ "iopub.status.busy": "2024-04-06T12:28:12.573504Z",
126
+ "iopub.status.idle": "2024-04-06T12:28:12.583678Z",
127
+ "shell.execute_reply": "2024-04-06T12:28:12.582875Z",
128
+ "shell.execute_reply.started": "2024-04-06T12:28:12.573751Z"
129
+ },
130
+ "id": "Y4zemXiyE6Fi",
131
+ "trusted": true
132
+ },
133
+ "outputs": [],
134
+ "source": [
135
+ "class Language:\n",
136
+ " def __init__(self, name):\n",
137
+ " self.name = name\n",
138
+ " self.char2index = {'#': 0, '$': 1, '^': 2} # '^': start of sequence, '$' : unknown char, '#' : padding\n",
139
+ " self.index2char = {0: '#', 1: '$', 2: '^'}\n",
140
+ " self.vocab_size = 3 # Count\n",
141
+ "\n",
142
+ " def addWord(self, word):\n",
143
+ " for char in word:\n",
144
+ " self.addChar(char)\n",
145
+ "\n",
146
+ " def addChar(self, char):\n",
147
+ " if char not in self.char2index:\n",
148
+ " self.char2index[char] = self.vocab_size\n",
149
+ " self.index2char[self.vocab_size] = char\n",
150
+ " self.vocab_size += 1\n",
151
+ "\n",
152
+ " def encode(self, s):\n",
153
+ " return [self.char2index[ch] for ch in s]\n",
154
+ "\n",
155
+ " def decode(self, l):\n",
156
+ " return ''.join([self.index2char[i] for i in l])\n",
157
+ "\n",
158
+ " def vocab(self):\n",
159
+ " return self.char2index.keys()\n"
160
+ ]
161
+ },
162
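A quick usage sketch of the Language class just defined: indices 0-2 are reserved for the '#', '$' and '^' special tokens, so the first real character lands at index 3.

```python
lang = Language('eng')         # the class defined in the cell above
lang.addWord('abc')
print(lang.encode('abc'))      # [3, 4, 5]
print(lang.decode([3, 4, 5]))  # 'abc'
print(lang.vocab_size)         # 6 (3 special tokens + a, b, c)
```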
+ {
163
+ "cell_type": "code",
164
+ "execution_count": null,
165
+ "metadata": {
166
+ "execution": {
167
+ "iopub.execute_input": "2024-04-06T12:28:12.584802Z",
168
+ "iopub.status.busy": "2024-04-06T12:28:12.584565Z",
169
+ "iopub.status.idle": "2024-04-06T12:28:12.594791Z",
170
+ "shell.execute_reply": "2024-04-06T12:28:12.593973Z",
171
+ "shell.execute_reply.started": "2024-04-06T12:28:12.584781Z"
172
+ },
173
+ "id": "IDGaCO8DkYpc",
174
+ "trusted": true
175
+ },
176
+ "outputs": [],
177
+ "source": [
178
+ "input_shape = 0\n",
179
+ "def preprocess(data, input_lang, output_lang, s=''):\n",
180
+ "\n",
181
+ " unknown = input_lang.char2index['$']\n",
182
+ "\n",
183
+ " input_max_len = 27\n",
184
+ " output_max_len = max([len(o) for o in data[1]])\n",
185
+ "\n",
186
+ " n = len(data)\n",
187
+ " input = torch.zeros((n, input_max_len + 1), device = device)\n",
188
+ " output = torch.zeros((n, output_max_len + 2), device = device)\n",
189
+ "\n",
190
+ " for i in range(n):\n",
191
+ "\n",
192
+ " inp = data[0][i].ljust(input_max_len + 1, '#')\n",
193
+ " op = '^' + data[1][i] # add start symbol to output\n",
194
+ " op = op.ljust(output_max_len + 2, '#')\n",
195
+ "\n",
196
+ " for index, char in enumerate(inp):\n",
197
+ " if char in input_lang.char2index:\n",
198
+ " input[i][index] = input_lang.char2index[char]\n",
199
+ " else:\n",
200
+ " input[i][index] = unknown\n",
201
+ "\n",
202
+ " for index, char in enumerate(op):\n",
203
+ " if char in output_lang.char2index:\n",
204
+ " output[i][index] = output_lang.char2index[char]\n",
205
+ " else:\n",
206
+ " output[i][index] = unknown\n",
207
+ "\n",
208
+ " print(s, ' dataset')\n",
209
+ " print(input.shape)\n",
210
+ " print(output.shape)\n",
211
+ "\n",
212
+ " return TensorDataset(input.to(torch.int32), output.to(torch.int32))"
213
+ ]
214
+ },
215
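preprocess frames every pair the same way: the English input is right-padded with '#' to a fixed width, and the Hindi output first receives the '^' start symbol and is then padded. The string handling in isolation (the lengths here are illustrative, not the notebook's actual maxima):

```python
input_max_len, output_max_len = 9, 7     # illustrative lengths only
word_in, word_out = 'hankers', 'हैंकर्स'
print(word_in.ljust(input_max_len + 1, '#'))            # 'hankers###'
print(('^' + word_out).ljust(output_max_len + 2, '#'))  # '^' + word, '#'-padded
```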
+ {
216
+ "cell_type": "code",
217
+ "execution_count": null,
218
+ "metadata": {
219
+ "colab": {
220
+ "base_uri": "https://localhost:8080/"
221
+ },
222
+ "execution": {
223
+ "iopub.execute_input": "2024-04-06T12:28:12.596018Z",
224
+ "iopub.status.busy": "2024-04-06T12:28:12.595741Z",
225
+ "iopub.status.idle": "2024-04-06T12:29:16.322883Z",
226
+ "shell.execute_reply": "2024-04-06T12:29:16.321877Z",
227
+ "shell.execute_reply.started": "2024-04-06T12:28:12.595995Z"
228
+ },
229
+ "id": "PdS5OXKxfdCX",
230
+ "outputId": "283fb51a-9a4a-4fc5-bad1-ea66373b29b4",
231
+ "trusted": true
232
+ },
233
+ "outputs": [
234
+ {
235
+ "name": "stdout",
236
+ "output_type": "stream",
237
+ "text": [
238
+ "train dataset\n",
239
+ "torch.Size([51200, 28])\n",
240
+ "torch.Size([51200, 22])\n",
241
+ "validation dataset\n",
242
+ "torch.Size([4096, 28])\n",
243
+ "torch.Size([4096, 22])\n",
244
+ "test dataset\n",
245
+ "torch.Size([4096, 28])\n",
246
+ "torch.Size([4096, 22])\n"
247
+ ]
248
+ }
249
+ ],
250
+ "source": [
251
+ "def load_prepare_data(lang):\n",
252
+ "\n",
253
+ " train_df = pd.read_csv(f\"drive/MyDrive/aksharantar_sampled/{lang}/{lang}_train.csv\", header = None)\n",
254
+ " val_df = pd.read_csv(f\"drive/MyDrive/aksharantar_sampled/{lang}/{lang}_valid.csv\", header = None)\n",
255
+ " test_df = pd.read_csv(f\"drive/MyDrive/aksharantar_sampled/{lang}/{lang}_test.csv\", header = None)\n",
256
+ "\n",
257
+ " input_lang = Language('eng')\n",
258
+ " output_lang = Language(lang)\n",
259
+ "\n",
260
+ " # create vocablury\n",
261
+ " for i in range(len(train_df)):\n",
262
+ " input_lang.addWord(train_df[0][i]) # 'eng'\n",
263
+ " output_lang.addWord(train_df[1][i]) # 'hin'\n",
264
+ "\n",
265
+ " # encode the datasets\n",
266
+ " train_data = preprocess(train_df, input_lang, output_lang, 'train')\n",
267
+ " val_data = preprocess(val_df, input_lang, output_lang, 'validation')\n",
268
+ " test_data = preprocess(test_df, input_lang, output_lang, 'test')\n",
269
+ "\n",
270
+ " return train_data, val_data, test_data, input_lang, output_lang\n",
271
+ "\n",
272
+ "\n",
273
+ "train_data, val_data, test_data, input_lang, output_lang = load_prepare_data('hin')\n"
274
+ ]
275
+ },
276
+ {
277
+ "cell_type": "code",
278
+ "execution_count": null,
279
+ "metadata": {
280
+ "colab": {
281
+ "base_uri": "https://localhost:8080/"
282
+ },
283
+ "execution": {
284
+ "iopub.execute_input": "2024-04-06T12:29:16.324674Z",
285
+ "iopub.status.busy": "2024-04-06T12:29:16.324273Z",
286
+ "iopub.status.idle": "2024-04-06T12:29:16.334834Z",
287
+ "shell.execute_reply": "2024-04-06T12:29:16.333992Z",
288
+ "shell.execute_reply.started": "2024-04-06T12:29:16.324643Z"
289
+ },
290
+ "id": "nu-NTR6BDj8e",
291
+ "outputId": "bd3dba2a-092d-4846-a5fb-f703f119b56a",
292
+ "trusted": true
293
+ },
294
+ "outputs": [
295
+ {
296
+ "name": "stdout",
297
+ "output_type": "stream",
298
+ "text": [
299
+ "hankers#####################\n"
300
+ ]
301
+ },
302
+ {
303
+ "data": {
304
+ "text/plain": [
305
+ "'^हैंकर्स##############'"
306
+ ]
307
+ },
308
+ "execution_count": 7,
309
+ "metadata": {},
310
+ "output_type": "execute_result"
311
+ }
312
+ ],
313
+ "source": [
314
+ "print(input_lang.decode(train_data[23][0].tolist()))\n",
315
+ "output_lang.decode(train_data[23][1].tolist())"
316
+ ]
317
+ },
318
+ {
319
+ "cell_type": "code",
320
+ "execution_count": null,
321
+ "metadata": {
322
+ "colab": {
323
+ "base_uri": "https://localhost:8080/"
324
+ },
325
+ "execution": {
326
+ "iopub.execute_input": "2024-04-06T12:29:16.336734Z",
327
+ "iopub.status.busy": "2024-04-06T12:29:16.336128Z",
328
+ "iopub.status.idle": "2024-04-06T12:29:16.355166Z",
329
+ "shell.execute_reply": "2024-04-06T12:29:16.354327Z",
330
+ "shell.execute_reply.started": "2024-04-06T12:29:16.336702Z"
331
+ },
332
+ "id": "yJI8iU6dBSE0",
333
+ "outputId": "818815ee-503e-4dcd-b7a6-5f00a06b5ace",
334
+ "trusted": true
335
+ },
336
+ "outputs": [
337
+ {
338
+ "data": {
339
+ "text/plain": [
340
+ "tensor([ 2, 34, 36, 17, 15, 7, 5, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
341
+ " 0, 0, 0, 0], device='cuda:0', dtype=torch.int32)"
342
+ ]
343
+ },
344
+ "execution_count": 8,
345
+ "metadata": {},
346
+ "output_type": "execute_result"
347
+ }
348
+ ],
349
+ "source": [
350
+ "train_data[23][1]"
351
+ ]
352
+ },
353
+ {
354
+ "cell_type": "code",
355
+ "execution_count": null,
356
+ "metadata": {
357
+ "colab": {
358
+ "base_uri": "https://localhost:8080/"
359
+ },
360
+ "execution": {
361
+ "iopub.execute_input": "2024-04-06T12:29:16.356467Z",
362
+ "iopub.status.busy": "2024-04-06T12:29:16.356175Z",
363
+ "iopub.status.idle": "2024-04-06T12:29:19.315416Z",
364
+ "shell.execute_reply": "2024-04-06T12:29:19.314522Z",
365
+ "shell.execute_reply.started": "2024-04-06T12:29:16.356444Z"
366
+ },
367
+ "id": "SvmzS5Lt_Jnl",
368
+ "outputId": "1387d646-ea3c-4fbf-b44f-c071e2b07784",
369
+ "trusted": true
370
+ },
371
+ "outputs": [
372
+ {
373
+ "name": "stderr",
374
+ "output_type": "stream",
375
+ "text": [
376
+ "\u001b[34m\u001b[1mwandb\u001b[0m: W&B API key is configured. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n",
377
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m If you're specifying your api key in code, ensure this code is not shared publicly.\n",
378
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m Consider setting the WANDB_API_KEY environment variable, or running `wandb login` from the command line.\n",
379
+ "\u001b[34m\u001b[1mwandb\u001b[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc\n"
380
+ ]
381
+ },
382
+ {
383
+ "data": {
384
+ "text/plain": [
385
+ "True"
386
+ ]
387
+ },
388
+ "execution_count": 9,
389
+ "metadata": {},
390
+ "output_type": "execute_result"
391
+ }
392
+ ],
393
+ "source": [
394
+ "wandb.login(key =\"\")"
395
+ ]
396
+ },
397
+ {
398
+ "cell_type": "markdown",
399
+ "metadata": {
400
+ "id": "Q1TioafYgICa"
401
+ },
402
+ "source": [
403
+ "# seq2seq tranformer model"
404
+ ]
405
+ },
406
+ {
407
+ "cell_type": "markdown",
408
+ "metadata": {
409
+ "id": "K94_u35dCk7-"
410
+ },
411
+ "source": [
412
+ "### hyperparameter settings"
413
+ ]
414
+ },
415
+ {
416
+ "cell_type": "code",
417
+ "execution_count": null,
418
+ "metadata": {
419
+ "execution": {
420
+ "iopub.execute_input": "2024-04-06T12:29:19.318625Z",
421
+ "iopub.status.busy": "2024-04-06T12:29:19.318195Z",
422
+ "iopub.status.idle": "2024-04-06T12:29:19.324068Z",
423
+ "shell.execute_reply": "2024-04-06T12:29:19.323194Z",
424
+ "shell.execute_reply.started": "2024-04-06T12:29:19.318601Z"
425
+ },
426
+ "id": "PugX7KHvc65u",
427
+ "trusted": true
428
+ },
429
+ "outputs": [],
430
+ "source": [
431
+ "n_embd = 64\n",
432
+ "batch_size = 256\n",
433
+ "learning_rate = 1e-3\n",
434
+ "n_head = 4 # other options factors of 32 like 2, 8\n",
435
+ "n_layers = 6\n",
436
+ "dropout = 0.2\n",
437
+ "epochs = 50\n",
438
+ "\n",
439
+ "# encoder specific detail\n",
440
+ "input_vocab_size = input_lang.vocab_size\n",
441
+ "encoder_block_size = len(train_data[0][0])\n",
442
+ "\n",
443
+ "# decoder specific detail\n",
444
+ "output_vocab_size = output_lang.vocab_size\n",
445
+ "decoder_block_size = len(train_data[0][1])"
446
+ ]
447
+ },
448
+ {
449
+ "cell_type": "markdown",
450
+ "metadata": {
451
+ "id": "XdltQ7oJCq1j"
452
+ },
453
+ "source": [
454
+ "### Encoder model"
455
+ ]
456
+ },
457
+ {
458
+ "cell_type": "code",
459
+ "execution_count": null,
460
+ "metadata": {
461
+ "execution": {
462
+ "iopub.execute_input": "2024-04-06T12:29:19.325685Z",
463
+ "iopub.status.busy": "2024-04-06T12:29:19.325424Z",
464
+ "iopub.status.idle": "2024-04-06T12:29:19.351414Z",
465
+ "shell.execute_reply": "2024-04-06T12:29:19.350579Z",
466
+ "shell.execute_reply.started": "2024-04-06T12:29:19.325663Z"
467
+ },
468
+ "id": "uiluDiY7FAMU",
469
+ "trusted": true
470
+ },
471
+ "outputs": [],
472
+ "source": [
473
+ "class Head(nn.Module):\n",
474
+ " \"\"\" one self-attention head \"\"\"\n",
475
+ "\n",
476
+ " def __init__(self, n_embd, d_k, dropout, mask=0): # d_k is dimention of key , nomaly d_k = n_embd / 4\n",
477
+ " super().__init__()\n",
478
+ " self.mask = mask\n",
479
+ " self.key = nn.Linear(n_embd, d_k, bias=False, device=device)\n",
480
+ " self.query = nn.Linear(n_embd, d_k, bias=False, device=device)\n",
481
+ " self.value = nn.Linear(n_embd, d_k, bias=False, device=device)\n",
482
+ " if mask:\n",
483
+ " self.register_buffer('tril', torch.tril(torch.ones(encoder_block_size, encoder_block_size, device=device)))\n",
484
+ " self.dropout = nn.Dropout(dropout)\n",
485
+ "\n",
486
+ " def forward(self, x, encoder_output = None):\n",
487
+ " B,T,C = x.shape\n",
488
+ "\n",
489
+ " if encoder_output is not None:\n",
490
+ " k = self.key(encoder_output)\n",
491
+ " Be, Te, Ce = encoder_output.shape\n",
492
+ " else:\n",
493
+ " k = self.key(x) # (B,T,d_k)\n",
494
+ "\n",
495
+ " q = self.query(x) # (B,T,d_k)\n",
496
+ " # compute attention scores\n",
497
+ " wei = q @ k.transpose(-2, -1) * C**-0.5 # (B,T,T)\n",
498
+ "\n",
499
+ " if self.mask:\n",
500
+ " if encoder_output is not None:\n",
501
+ " wei = wei.masked_fill(self.tril[:T, :Te] == 0, float('-inf')) # (B,T,T)\n",
502
+ " else:\n",
503
+ " wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf')) # (B,T,T)\n",
504
+ "\n",
505
+ " wei = F.softmax(wei, dim=-1)\n",
506
+ " wei = self.dropout(wei)\n",
507
+ " # perform weighted aggregation of values\n",
508
+ " if encoder_output is not None:\n",
509
+ " v = self.value(encoder_output)\n",
510
+ " else:\n",
511
+ " v = self.value(x)\n",
512
+ " out = wei @ v # (B,T,C)\n",
513
+ " return out\n",
514
+ "\n",
515
+ "class MultiHeadAttention(nn.Module):\n",
516
+ " \"\"\" multiple self attention heads in parallel \"\"\"\n",
517
+ "\n",
518
+ " def __init__(self, n_embd, num_head, d_k, dropout, mask=0):\n",
519
+ " super().__init__()\n",
520
+ " self.heads = nn.ModuleList([Head(n_embd, d_k, dropout, mask) for _ in range(num_head)])\n",
521
+ " self.proj = nn.Linear(n_embd, n_embd)\n",
522
+ " self.dropout = nn.Dropout(dropout)\n",
523
+ "\n",
524
+ " def forward(self, x, encoder_output=None):\n",
525
+ " out = torch.cat([h(x, encoder_output) for h in self.heads], dim=-1)\n",
526
+ " out = self.dropout(self.proj(out))\n",
527
+ " return out\n",
528
+ "\n",
529
+ "class FeedForward(nn.Module):\n",
530
+ " \"\"\" multiple self attention heads in parallel \"\"\"\n",
531
+ "\n",
532
+ " def __init__(self, n_embd, dropout):\n",
533
+ " super().__init__()\n",
534
+ " self.net = nn.Sequential(\n",
535
+ " nn.Linear(n_embd, 4 * n_embd),\n",
536
+ " nn.ReLU(),\n",
537
+ " nn.Linear(4 * n_embd, n_embd),\n",
538
+ " nn.Dropout(dropout)\n",
539
+ " )\n",
540
+ "\n",
541
+ " def forward(self, x):\n",
542
+ " return self.net(x)\n",
543
+ "\n",
544
+ "class encoderBlock(nn.Module):\n",
545
+ " \"\"\" Tranformer encoder block : communication followed by computation \"\"\"\n",
546
+ "\n",
547
+ " def __init__(self, n_embd, n_head, dropout):\n",
548
+ " super().__init__()\n",
549
+ " d_k = n_embd // n_head\n",
550
+ " self.sa = MultiHeadAttention(n_embd, n_head, d_k, dropout)\n",
551
+ " self.ffwd = FeedForward(n_embd, dropout)\n",
552
+ " self.ln1 = nn.LayerNorm(n_embd)\n",
553
+ " self.ln2 = nn.LayerNorm(n_embd)\n",
554
+ "\n",
555
+ " def forward(self, x, encoder_output=None):\n",
556
+ " x = x + self.sa(self.ln1(x), encoder_output)\n",
557
+ " x = x + self.ffwd(self.ln2(x))\n",
558
+ " return x\n",
559
+ "\n",
560
+ "class Encoder(nn.Module):\n",
561
+ "\n",
562
+ " def __init__(self, n_embd, n_head, n_layers, dropout):\n",
563
+ " super().__init__()\n",
564
+ "\n",
565
+ " self.token_embedding_table = nn.Embedding(input_vocab_size, n_embd) # n_embd: input embedding dimension\n",
566
+ " self.position_embedding_table = nn.Embedding(encoder_block_size, n_embd)\n",
567
+ " self.blocks = nn.Sequential(*[encoderBlock(n_embd, n_head, dropout) for _ in range(n_layers)])\n",
568
+ " self.ln_f = nn.LayerNorm(n_embd) # final layer norm\n",
569
+ "\n",
570
+ " def forward(self, idx):\n",
571
+ " B, T = idx.shape\n",
572
+ " tok_emb = self.token_embedding_table(idx) # (B,T,n_embd)\n",
573
+ " pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,n_embd)\n",
574
+ " x = tok_emb + pos_emb # (B,T,n_embd)\n",
575
+ " x = self.blocks(x) # apply one attention layer (B,T,C)\n",
576
+ " x = self.ln_f(x) # (B,T,C)\n",
577
+ " return x\n"
578
+ ]
579
+ },
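Note that `Head.forward` scales the attention scores by `C**-0.5`, the full embedding width, while "Attention is All You Need" scales by `d_k**-0.5`, the per-head width. A self-contained sketch of masked scaled dot-product attention with paper-style scaling, for shape-checking (all sizes below are made up for illustration):

```python
import torch
import torch.nn.functional as F

B, T, d_k = 2, 5, 16
q, k, v = (torch.randn(B, T, d_k) for _ in range(3))  # fake projections

wei = q @ k.transpose(-2, -1) * d_k**-0.5        # (B,T,T) scores, scaled by sqrt(d_k)
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float('-inf'))  # causal mask: no peeking ahead
wei = F.softmax(wei, dim=-1)                     # each row sums to 1
out = wei @ v                                    # (B,T,d_k)
print(out.shape)                                 # torch.Size([2, 5, 16])
```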
580
+ {
581
+ "cell_type": "markdown",
582
+ "metadata": {
583
+ "id": "GgPU486JC8Mz"
584
+ },
585
+ "source": [
586
+ "### Decoder model"
587
+ ]
588
+ },
589
+ {
590
+ "cell_type": "code",
591
+ "execution_count": null,
592
+ "metadata": {
593
+ "execution": {
594
+ "iopub.execute_input": "2024-04-06T12:29:19.352896Z",
595
+ "iopub.status.busy": "2024-04-06T12:29:19.352571Z",
596
+ "iopub.status.idle": "2024-04-06T12:29:19.367829Z",
597
+ "shell.execute_reply": "2024-04-06T12:29:19.366971Z",
598
+ "shell.execute_reply.started": "2024-04-06T12:29:19.352872Z"
599
+ },
600
+ "id": "JteOV0CdC_bv",
601
+ "trusted": true
602
+ },
603
+ "outputs": [],
604
+ "source": [
605
+ "class decoderBlock(nn.Module):\n",
606
+ " \"\"\" Tranformer decoder block : self communication then cross communication followed by computation \"\"\"\n",
607
+ "\n",
608
+ " def __init__(self, n_embd, n_head, dropout):\n",
609
+ " super().__init__()\n",
610
+ " d_k = n_embd // n_head\n",
611
+ " self.sa = MultiHeadAttention(n_embd, n_head, d_k, dropout, mask = 1)\n",
612
+ " self.ca = MultiHeadAttention(n_embd, n_head, d_k, dropout, mask = 1)\n",
613
+ " self.ffwd = FeedForward(n_embd, dropout)\n",
614
+ " self.ln1 = nn.LayerNorm(n_embd, device=device)\n",
615
+ " self.ln2 = nn.LayerNorm(n_embd, device=device)\n",
616
+ " self.ln3 = nn.LayerNorm(n_embd, device=device)\n",
617
+ "\n",
618
+ " def forward(self, x_encoder_output):\n",
619
+ " x = x_encoder_output[0]\n",
620
+ " encoder_output = x_encoder_output[1]\n",
621
+ " x = x + self.sa(self.ln1(x))\n",
622
+ " x = x + self.ca(self.ln2(x), encoder_output)\n",
623
+ " x = x + self.ffwd(self.ln3(x))\n",
624
+ " return (x,encoder_output)\n",
625
+ "\n",
626
+ "class Decoder(nn.Module):\n",
627
+ "\n",
628
+ " def __init__(self, n_embd, n_head, n_layers, dropout):\n",
629
+ " super().__init__()\n",
630
+ "\n",
631
+ " self.token_embedding_table = nn.Embedding(output_vocab_size, n_embd) # n_embd: input embedding dimension\n",
632
+ " self.position_embedding_table = nn.Embedding(decoder_block_size, n_embd)\n",
633
+ " self.blocks = nn.Sequential(*[decoderBlock(n_embd, n_head=n_head, dropout=dropout) for _ in range(n_layers)])\n",
634
+ " self.ln_f = nn.LayerNorm(n_embd) # final layer norm\n",
635
+ " self.lm_head = nn.Linear(n_embd, output_vocab_size)\n",
636
+ "\n",
637
+ " def forward(self, idx, encoder_output, targets=None):\n",
638
+ " B, T = idx.shape\n",
639
+ "\n",
640
+ " tok_emb = self.token_embedding_table(idx) # (B,T,n_embd)\n",
641
+ " pos_emb = self.position_embedding_table(torch.arange(T, device=device)) # (T,n_embd)\n",
642
+ " x = tok_emb + pos_emb # (B,T,n_embd)\n",
643
+ "\n",
644
+ " x =self.blocks((x, encoder_output))\n",
645
+ " x = self.ln_f(x[0]) # (B,T,C)\n",
646
+ " logits = self.lm_head(x) # (B,T,output_vocab_size)\n",
647
+ "\n",
648
+ " if targets is None:\n",
649
+ " loss = None\n",
650
+ " else:\n",
651
+ " B, T, C = logits.shape\n",
652
+ " temp_logits = logits.view(B*T, C)\n",
653
+ " targets = targets.reshape(B*T)\n",
654
+ "\n",
655
+ " loss = F.cross_entropy(temp_logits, targets.long())\n",
656
+ "\n",
657
+ " # print(logits)\n",
658
+ " # out = torch.argmax(logits)\n",
659
+ "\n",
660
+ " return logits, loss\n",
661
+ "\n"
662
+ ]
663
+ },
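One detail worth flagging: `ca` is built with `mask = 1`, so the cross-attention here is also causally masked via `tril[:T, :Te]`, whereas the original Transformer masks only decoder self-attention and lets every decoder position see the whole encoder output. A minimal sketch of unmasked cross-attention, with made-up sizes, to show the shapes involved:

```python
import torch

B, T_dec, T_enc, d_k = 2, 7, 5, 16
q = torch.randn(B, T_dec, d_k)   # queries come from decoder states
k = torch.randn(B, T_enc, d_k)   # keys come from the encoder output
v = torch.randn(B, T_enc, d_k)   # values come from the encoder output

wei = torch.softmax(q @ k.transpose(-2, -1) * d_k**-0.5, dim=-1)  # (B, T_dec, T_enc)
out = wei @ v                                                     # (B, T_dec, d_k)
print(out.shape)                                                  # torch.Size([2, 7, 16])
```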
664
+ {
665
+ "cell_type": "markdown",
666
+ "metadata": {
667
+ "id": "EBjmsIcklM8Y"
668
+ },
669
+ "source": [
670
+ "# Training Time"
671
+ ]
672
+ },
673
+ {
674
+ "cell_type": "markdown",
675
+ "metadata": {
676
+ "id": "lLfHEDk8FNfY"
677
+ },
678
+ "source": [
679
+ "## sweep config"
680
+ ]
681
+ },
682
+ {
683
+ "cell_type": "code",
684
+ "execution_count": null,
685
+ "metadata": {
686
+ "execution": {
687
+ "iopub.execute_input": "2024-04-04T14:54:15.308213Z",
688
+ "iopub.status.busy": "2024-04-04T14:54:15.307981Z",
689
+ "iopub.status.idle": "2024-04-04T14:54:15.319933Z",
690
+ "shell.execute_reply": "2024-04-04T14:54:15.319070Z",
691
+ "shell.execute_reply.started": "2024-04-04T14:54:15.308192Z"
692
+ },
693
+ "id": "nDcRZmb80msE",
694
+ "trusted": true
695
+ },
696
+ "outputs": [],
697
+ "source": [
698
+ "# Define sweep config\n",
699
+ "sweep_configuration = {\n",
700
+ " \"method\": \"bayes\",\n",
701
+ " \"name\": \"sweep\",\n",
702
+ " \"metric\": {\"goal\": \"maximize\", \"name\": \"val_acc\"},\n",
703
+ " \"parameters\": {\n",
704
+ " \"batch_size\": {\"values\": [64, 128, 256]},\n",
705
+ " \"epochs\": {\"values\": [20, 40, 50, 100]},\n",
706
+ " \"lr\": {\"max\": 0.1, \"min\": 0.0001},\n",
707
+ " \"n_embd\": {\"values\": [16, 32, 64]},\n",
708
+ " \"n_head\": {\"values\": [2, 4, 8]},\n",
709
+ " \"n_layers\": {\"values\": [4, 6, 8]},\n",
710
+ " \"dropout\": {\"values\": [0, .1, .2, .3]}\n",
711
+ " },\n",
712
+ "}\n",
713
+ "\n",
714
+ "sweep_id = wandb.sweep(sweep=sweep_configuration, project=\"Tranliteration-Tranformers\")"
715
+ ]
716
+ },
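The `lr` range above spans three orders of magnitude, and with a bare min/max wandb samples it uniformly, which over-weights the large values. If log-scale sampling is preferred, recent wandb releases accept a `distribution` key; a hedged sketch of the alternative entry (verify the exact key name against the wandb docs for your installed version):

```python
# Hypothetical replacement for the "lr" entry, sampling on a log scale.
lr_spec = {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-1}
```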
717
+ {
718
+ "cell_type": "code",
719
+ "execution_count": null,
720
+ "metadata": {
721
+ "execution": {
722
+ "iopub.execute_input": "2024-04-04T14:54:15.325199Z",
723
+ "iopub.status.busy": "2024-04-04T14:54:15.324615Z",
724
+ "iopub.status.idle": "2024-04-04T14:54:15.330172Z",
725
+ "shell.execute_reply": "2024-04-04T14:54:15.329301Z",
726
+ "shell.execute_reply.started": "2024-04-04T14:54:15.325168Z"
727
+ },
728
+ "id": "9CguGUG5_1NL",
729
+ "trusted": true
730
+ },
731
+ "outputs": [],
732
+ "source": [
733
+ "# wandb.sweep_cancel(sweep_id)\n",
734
+ "# wandb.finish()\n",
735
+ "# wandb.run.cancel()"
736
+ ]
737
+ },
738
+ {
739
+ "cell_type": "markdown",
740
+ "metadata": {
741
+ "id": "d5T58TQRECbZ"
742
+ },
743
+ "source": [
744
+ "## train function"
745
+ ]
746
+ },
747
+ {
748
+ "cell_type": "code",
749
+ "execution_count": null,
750
+ "metadata": {
751
+ "execution": {
752
+ "iopub.execute_input": "2024-04-04T14:54:15.331837Z",
753
+ "iopub.status.busy": "2024-04-04T14:54:15.331538Z",
754
+ "iopub.status.idle": "2024-04-04T14:54:15.351924Z",
755
+ "shell.execute_reply": "2024-04-04T14:54:15.351027Z",
756
+ "shell.execute_reply.started": "2024-04-04T14:54:15.331810Z"
757
+ },
758
+ "id": "3GWnCggNFLs3",
759
+ "trusted": true
760
+ },
761
+ "outputs": [],
762
+ "source": [
763
+ "def train():\n",
764
+ " run = wandb.init()\n",
765
+ "\n",
766
+ " n_embd = wandb.config.n_embd\n",
767
+ " n_head = wandb.config.n_head\n",
768
+ " n_layers = wandb.config.n_layers\n",
769
+ " dropout = wandb.config.dropout\n",
770
+ " epochs = wandb.config.epochs\n",
771
+ " batch_size = wandb.config.batch_size\n",
772
+ " learning_rate = wandb.config.lr\n",
773
+ "\n",
774
+ "\n",
775
+ " encoder = Encoder(n_embd, n_head, n_layers, dropout)\n",
776
+ " decoder = Decoder(n_embd, n_head, n_layers, dropout)\n",
777
+ " encoder.to(device)\n",
778
+ " decoder.to(device)\n",
779
+ "\n",
780
+ " train_losses, train_accuracies, val_losses, val_accuracies = [], [], [], []\n",
781
+ "\n",
782
+ " # print the number of parameters in the model\n",
783
+ " print(sum([p.numel() for p in encoder.parameters()] + [p.numel() for p in decoder.parameters()])/1e3, 'K model parameters')\n",
784
+ "\n",
785
+ " train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)\n",
786
+ " val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)\n",
787
+ "\n",
788
+ " # create a PyTorch optimizer\n",
789
+ " encoder_optimizer = torch.optim.AdamW(encoder.parameters(), lr=learning_rate)\n",
790
+ " decoder_optimizer = torch.optim.AdamW(decoder.parameters(), lr=learning_rate)\n",
791
+ "\n",
792
+ "# print('Step | Training Loss | Validation Loss | Training Accuracy % | Validation Accuracy %')\n",
793
+ "\n",
794
+ " least_error = float('inf')\n",
795
+ " patience = 20 # The number of epochs without improvement to wait before stopping\n",
796
+ " no_improvement = 0\n",
797
+ "\n",
798
+ " for i in range(epochs):\n",
799
+ " running_loss = 0.0\n",
800
+ " train_correct = 0\n",
801
+ "\n",
802
+ " encoder.train()\n",
803
+ " decoder.train()\n",
804
+ "\n",
805
+ " for j,(train_x,train_y) in enumerate(train_loader):\n",
806
+ " train_x = train_x.to(device)\n",
807
+ " train_y = train_y.to(device)\n",
808
+ "\n",
809
+ " encoder_optimizer.zero_grad(set_to_none=True)\n",
810
+ " decoder_optimizer.zero_grad(set_to_none=True)\n",
811
+ "\n",
812
+ " encoder_output = encoder(train_x)\n",
813
+ " logits, loss = decoder(train_y[:, :-1], encoder_output, train_y[:, 1:])\n",
814
+ "\n",
817
+ " loss.backward()\n",
818
+ " encoder_optimizer.step()\n",
819
+ " decoder_optimizer.step()\n",
820
+ "\n",
821
+ " running_loss += loss\n",
822
+ " pred_decoder_output = torch.argmax(logits, dim=-1)\n",
823
+ " # print(pred_decoder_output, \" target: \", train_y[:, 1:])\n",
824
+ " train_correct += (pred_decoder_output == train_y[:, 1:]).sum().item()\n",
825
+ "\n",
826
+ "\n",
827
+ " ## validation code\n",
828
+ " running_loss_val, val_correct = 0, 0\n",
829
+ " encoder.eval()\n",
830
+ " decoder.eval()\n",
831
+ " for j,(val_x,val_y) in enumerate(val_loader):\n",
832
+ " val_x = val_x.to(device)\n",
833
+ " val_y = val_y.to(device)\n",
834
+ "\n",
835
+ " encoder_output = encoder(val_x)\n",
836
+ " logits, loss = decoder(val_y[:, :-1], encoder_output, val_y[:, 1:])\n",
837
+ "\n",
838
+ " running_loss_val += loss\n",
839
+ " pred_decoder_output = torch.argmax(logits, dim=-1)\n",
840
+ " val_correct += torch.sum(pred_decoder_output == val_y[:, 1:])\n",
841
+ "\n",
842
+ "\n",
843
+ " if running_loss_val < least_error:\n",
844
+ " least_error = running_loss_val\n",
845
+ " no_improvement = 0\n",
846
+ " else:\n",
847
+ " no_improvement += 1\n",
848
+ "\n",
849
+ " if no_improvement >= patience:\n",
850
+ " print(f\"Early stopping at epoch {i}\")\n",
851
+ " break\n",
852
+ "\n",
853
+ " wandb.log(\n",
854
+ " {\n",
855
+ " \"train_loss\": running_loss / len(train_data),\n",
856
+ " \"val_loss\": (running_loss_val/len(val_data)),\n",
857
+ " \"train_acc\": ((train_correct*100) / (len(train_data)* (decoder_block_size-1))),\n",
858
+ " \"val_acc\": ((val_correct*100)/(len(val_data)* (decoder_block_size-1))),\n",
859
+ " }\n",
860
+ " )"
861
+ ]
862
+ },
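The train function uses teacher forcing: the decoder is fed `train_y[:, :-1]` and scored against `train_y[:, 1:]`, so every word contributes `decoder_block_size - 1` predictions, which is exactly the denominator in the logged accuracies. A toy illustration of the shift (token indices made up):

```python
import torch

# Toy target row: <sos> t o y <eos>
y = torch.tensor([[2, 10, 11, 12, 3]])
decoder_input  = y[:, :-1]      # <sos> t o y
decoder_target = y[:, 1:]       # t o y <eos>
print(decoder_input.tolist())   # [[2, 10, 11, 12]]
print(decoder_target.tolist())  # [[10, 11, 12, 3]]
# predictions per word = y.shape[1] - 1, hence the (decoder_block_size - 1) factor
```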
863
+ {
864
+ "cell_type": "markdown",
865
+ "metadata": {
866
+ "id": "CxzRR9cjEGDm"
867
+ },
868
+ "source": [
869
+ "## run sweep"
870
+ ]
871
+ },
872
+ {
873
+ "cell_type": "code",
874
+ "execution_count": null,
875
+ "metadata": {
876
+ "colab": {
877
+ "base_uri": "https://localhost:8080/",
878
+ "height": 295,
879
+ "referenced_widgets": [
880
+ ""
881
+ ]
882
+ },
883
+ "execution": {
884
+ "iopub.execute_input": "2024-04-04T14:54:15.353688Z",
885
+ "iopub.status.busy": "2024-04-04T14:54:15.353125Z"
886
+ },
887
+ "id": "u_QFbYe32t7r",
888
+ "outputId": "97153eab-b36f-454b-9fed-53ae0287aee1",
889
+ "trusted": true
890
+ },
891
+ "outputs": [
892
+ {
893
+ "name": "stderr",
894
+ "output_type": "stream",
895
+ "text": [
896
+ "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: dcco6zur with config:\n",
897
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tbatch_size: 64\n",
898
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tdropout: 0\n",
899
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tepochs: 50\n",
900
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tlr: 0.0003\n",
901
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_embd: 64\n",
902
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_head: 4\n",
903
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_layers: 6\n",
904
+ "\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mcs22m062\u001b[0m (\u001b[33miitmadras\u001b[0m). Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n"
905
+ ]
906
+ },
907
+ {
908
+ "data": {
909
+ "text/html": [
910
+ "wandb version 0.16.6 is available! To upgrade, please run:\n",
911
+ " $ pip install wandb --upgrade"
912
+ ],
913
+ "text/plain": [
914
+ "<IPython.core.display.HTML object>"
915
+ ]
916
+ },
917
+ "metadata": {},
918
+ "output_type": "display_data"
919
+ },
920
+ {
921
+ "data": {
922
+ "text/html": [
923
+ "Tracking run with wandb version 0.16.4"
924
+ ],
925
+ "text/plain": [
926
+ "<IPython.core.display.HTML object>"
927
+ ]
928
+ },
929
+ "metadata": {},
930
+ "output_type": "display_data"
931
+ },
932
+ {
933
+ "data": {
934
+ "text/html": [
935
+ "Run data is saved locally in <code>/kaggle/working/wandb/run-20240404_145417-dcco6zur</code>"
936
+ ],
937
+ "text/plain": [
938
+ "<IPython.core.display.HTML object>"
939
+ ]
940
+ },
941
+ "metadata": {},
942
+ "output_type": "display_data"
943
+ },
944
+ {
945
+ "data": {
946
+ "text/html": [
947
+ "Syncing run <strong><a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/dcco6zur' target=\"_blank\">eager-sweep-2</a></strong> to <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161</a>"
948
+ ],
949
+ "text/plain": [
950
+ "<IPython.core.display.HTML object>"
951
+ ]
952
+ },
953
+ "metadata": {},
954
+ "output_type": "display_data"
955
+ },
956
+ {
957
+ "data": {
958
+ "text/html": [
959
+ " View project at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers</a>"
960
+ ],
961
+ "text/plain": [
962
+ "<IPython.core.display.HTML object>"
963
+ ]
964
+ },
965
+ "metadata": {},
966
+ "output_type": "display_data"
967
+ },
968
+ {
969
+ "data": {
970
+ "text/html": [
971
+ " View sweep at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161</a>"
972
+ ],
973
+ "text/plain": [
974
+ "<IPython.core.display.HTML object>"
975
+ ]
976
+ },
977
+ "metadata": {},
978
+ "output_type": "display_data"
979
+ },
980
+ {
981
+ "data": {
982
+ "text/html": [
983
+ " View run at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/dcco6zur' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/dcco6zur</a>"
984
+ ],
985
+ "text/plain": [
986
+ "<IPython.core.display.HTML object>"
987
+ ]
988
+ },
989
+ "metadata": {},
990
+ "output_type": "display_data"
991
+ },
992
+ {
993
+ "name": "stdout",
994
+ "output_type": "stream",
995
+ "text": [
996
+ "710.915 K model parameters\n",
997
+ "Early stopping at epoch 32\n"
998
+ ]
999
+ },
1000
+ {
1001
+ "data": {
1002
+ "application/vnd.jupyter.widget-view+json": {
1003
+ "model_id": "",
1004
+ "version_major": 2,
1005
+ "version_minor": 0
1006
+ },
1007
+ "text/plain": [
1008
+ "VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))"
1009
+ ]
1010
+ },
1011
+ "metadata": {},
1012
+ "output_type": "display_data"
1013
+ },
1014
+ {
1015
+ "data": {
1016
+ "text/html": [
1017
+ "<style>\n",
1018
+ " table.wandb td:nth-child(1) { padding: 0 10px; text-align: left ; width: auto;} td:nth-child(2) {text-align: left ; width: 100%}\n",
1019
+ " .wandb-row { display: flex; flex-direction: row; flex-wrap: wrap; justify-content: flex-start; width: 100% }\n",
1020
+ " .wandb-col { display: flex; flex-direction: column; flex-basis: 100%; flex: 1; padding: 10px; }\n",
1021
+ " </style>\n",
1022
+ "<div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>train_acc</td><td>▁▅▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇███████████████</td></tr><tr><td>train_loss</td><td>█▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁</td></tr><tr><td>val_acc</td><td>▁▅▆▆▇▇▇▇▇▇██████████████████████</td></tr><tr><td>val_loss</td><td>█▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table class=\"wandb\"><tr><td>train_acc</td><td>97.8125</td></tr><tr><td>train_loss</td><td>0.00096</td></tr><tr><td>val_acc</td><td>95.29739</td></tr><tr><td>val_loss</td><td>0.00286</td></tr></table><br/></div></div>"
1023
+ ],
1024
+ "text/plain": [
1025
+ "<IPython.core.display.HTML object>"
1026
+ ]
1027
+ },
1028
+ "metadata": {},
1029
+ "output_type": "display_data"
1030
+ },
1031
+ {
1032
+ "data": {
1033
+ "text/html": [
1034
+ " View run <strong style=\"color:#cdcd00\">eager-sweep-2</strong> at: <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/dcco6zur' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/dcco6zur</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
1035
+ ],
1036
+ "text/plain": [
1037
+ "<IPython.core.display.HTML object>"
1038
+ ]
1039
+ },
1040
+ "metadata": {},
1041
+ "output_type": "display_data"
1042
+ },
1043
+ {
1044
+ "data": {
1045
+ "text/html": [
1046
+ "Find logs at: <code>./wandb/run-20240404_145417-dcco6zur/logs</code>"
1047
+ ],
1048
+ "text/plain": [
1049
+ "<IPython.core.display.HTML object>"
1050
+ ]
1051
+ },
1052
+ "metadata": {},
1053
+ "output_type": "display_data"
1054
+ },
1055
+ {
1056
+ "name": "stderr",
1057
+ "output_type": "stream",
1058
+ "text": [
1059
+ "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: 4qb2bmi8 with config:\n",
1060
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tbatch_size: 128\n",
1061
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tdropout: 0.1\n",
1062
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tepochs: 20\n",
1063
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tlr: 0.03\n",
1064
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_embd: 16\n",
1065
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_head: 4\n",
1066
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_layers: 6\n"
1067
+ ]
1068
+ },
1069
+ {
1070
+ "data": {
1071
+ "text/html": [
1072
+ "wandb version 0.16.6 is available! To upgrade, please run:\n",
1073
+ " $ pip install wandb --upgrade"
1074
+ ],
1075
+ "text/plain": [
1076
+ "<IPython.core.display.HTML object>"
1077
+ ]
1078
+ },
1079
+ "metadata": {},
1080
+ "output_type": "display_data"
1081
+ },
1082
+ {
1083
+ "data": {
1084
+ "text/html": [
1085
+ "Tracking run with wandb version 0.16.4"
1086
+ ],
1087
+ "text/plain": [
1088
+ "<IPython.core.display.HTML object>"
1089
+ ]
1090
+ },
1091
+ "metadata": {},
1092
+ "output_type": "display_data"
1093
+ },
1094
+ {
1095
+ "data": {
1096
+ "text/html": [
1097
+ "Run data is saved locally in <code>/kaggle/working/wandb/run-20240404_153243-4qb2bmi8</code>"
1098
+ ],
1099
+ "text/plain": [
1100
+ "<IPython.core.display.HTML object>"
1101
+ ]
1102
+ },
1103
+ "metadata": {},
1104
+ "output_type": "display_data"
1105
+ },
1106
+ {
1107
+ "data": {
1108
+ "text/html": [
1109
+ "Syncing run <strong><a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/4qb2bmi8' target=\"_blank\">peach-sweep-3</a></strong> to <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161</a>"
1110
+ ],
1111
+ "text/plain": [
1112
+ "<IPython.core.display.HTML object>"
1113
+ ]
1114
+ },
1115
+ "metadata": {},
1116
+ "output_type": "display_data"
1117
+ },
1118
+ {
1119
+ "data": {
1120
+ "text/html": [
1121
+ " View project at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers</a>"
1122
+ ],
1123
+ "text/plain": [
1124
+ "<IPython.core.display.HTML object>"
1125
+ ]
1126
+ },
1127
+ "metadata": {},
1128
+ "output_type": "display_data"
1129
+ },
1130
+ {
1131
+ "data": {
1132
+ "text/html": [
1133
+ " View sweep at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161</a>"
1134
+ ],
1135
+ "text/plain": [
1136
+ "<IPython.core.display.HTML object>"
1137
+ ]
1138
+ },
1139
+ "metadata": {},
1140
+ "output_type": "display_data"
1141
+ },
1142
+ {
1143
+ "data": {
1144
+ "text/html": [
1145
+ " View run at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/4qb2bmi8' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/4qb2bmi8</a>"
1146
+ ],
1147
+ "text/plain": [
1148
+ "<IPython.core.display.HTML object>"
1149
+ ]
1150
+ },
1151
+ "metadata": {},
1152
+ "output_type": "display_data"
1153
+ },
1154
+ {
1155
+ "name": "stdout",
1156
+ "output_type": "stream",
1157
+ "text": [
1158
+ "48.755 K model parameters\n"
1159
+ ]
1160
+ },
1161
+ {
1162
+ "data": {
1163
+ "application/vnd.jupyter.widget-view+json": {
1164
+ "model_id": "",
1165
+ "version_major": 2,
1166
+ "version_minor": 0
1167
+ },
1168
+ "text/plain": [
1169
+ "VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))"
1170
+ ]
1171
+ },
1172
+ "metadata": {},
1173
+ "output_type": "display_data"
1174
+ },
1175
+ {
1176
+ "data": {
1177
+ "text/html": [
1178
+ "<style>\n",
1179
+ " table.wandb td:nth-child(1) { padding: 0 10px; text-align: left ; width: auto;} td:nth-child(2) {text-align: left ; width: 100%}\n",
1180
+ " .wandb-row { display: flex; flex-direction: row; flex-wrap: wrap; justify-content: flex-start; width: 100% }\n",
1181
+ " .wandb-col { display: flex; flex-direction: column; flex-basis: 100%; flex: 1; padding: 10px; }\n",
1182
+ " </style>\n",
1183
+ "<div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>train_acc</td><td>▁▅▇▇▇███████████████</td></tr><tr><td>train_loss</td><td>█▄▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁</td></tr><tr><td>val_acc</td><td>▁▆▇▇▇███████████████</td></tr><tr><td>val_loss</td><td>█▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table class=\"wandb\"><tr><td>train_acc</td><td>89.37686</td></tr><tr><td>train_loss</td><td>0.00256</td></tr><tr><td>val_acc</td><td>92.66765</td></tr><tr><td>val_loss</td><td>0.0018</td></tr></table><br/></div></div>"
1184
+ ],
1185
+ "text/plain": [
1186
+ "<IPython.core.display.HTML object>"
1187
+ ]
1188
+ },
1189
+ "metadata": {},
1190
+ "output_type": "display_data"
1191
+ },
1192
+ {
1193
+ "data": {
1194
+ "text/html": [
1195
+ " View run <strong style=\"color:#cdcd00\">peach-sweep-3</strong> at: <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/4qb2bmi8' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/4qb2bmi8</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
1196
+ ],
1197
+ "text/plain": [
1198
+ "<IPython.core.display.HTML object>"
1199
+ ]
1200
+ },
1201
+ "metadata": {},
1202
+ "output_type": "display_data"
1203
+ },
1204
+ {
1205
+ "data": {
1206
+ "text/html": [
1207
+ "Find logs at: <code>./wandb/run-20240404_153243-4qb2bmi8/logs</code>"
1208
+ ],
1209
+ "text/plain": [
1210
+ "<IPython.core.display.HTML object>"
1211
+ ]
1212
+ },
1213
+ "metadata": {},
1214
+ "output_type": "display_data"
1215
+ },
1216
+ {
1217
+ "name": "stderr",
1218
+ "output_type": "stream",
1219
+ "text": [
1220
+ "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: gtz48xe5 with config:\n",
1221
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tbatch_size: 32\n",
1222
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tdropout: 0\n",
1223
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tepochs: 30\n",
1224
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tlr: 0.01\n",
1225
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_embd: 16\n",
1226
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_head: 4\n",
1227
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_layers: 4\n"
1228
+ ]
1229
+ },
1230
+ {
1231
+ "data": {
1232
+ "text/html": [
1233
+ "wandb version 0.16.6 is available! To upgrade, please run:\n",
1234
+ " $ pip install wandb --upgrade"
1235
+ ],
1236
+ "text/plain": [
1237
+ "<IPython.core.display.HTML object>"
1238
+ ]
1239
+ },
1240
+ "metadata": {},
1241
+ "output_type": "display_data"
1242
+ },
1243
+ {
1244
+ "data": {
1245
+ "text/html": [
1246
+ "Tracking run with wandb version 0.16.4"
1247
+ ],
1248
+ "text/plain": [
1249
+ "<IPython.core.display.HTML object>"
1250
+ ]
1251
+ },
1252
+ "metadata": {},
1253
+ "output_type": "display_data"
1254
+ },
1255
+ {
1256
+ "data": {
1257
+ "text/html": [
1258
+ "Run data is saved locally in <code>/kaggle/working/wandb/run-20240404_154533-gtz48xe5</code>"
1259
+ ],
1260
+ "text/plain": [
1261
+ "<IPython.core.display.HTML object>"
1262
+ ]
1263
+ },
1264
+ "metadata": {},
1265
+ "output_type": "display_data"
1266
+ },
1267
+ {
1268
+ "data": {
1269
+ "text/html": [
1270
+ "Syncing run <strong><a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/gtz48xe5' target=\"_blank\">cerulean-sweep-4</a></strong> to <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161</a>"
1271
+ ],
1272
+ "text/plain": [
1273
+ "<IPython.core.display.HTML object>"
1274
+ ]
1275
+ },
1276
+ "metadata": {},
1277
+ "output_type": "display_data"
1278
+ },
1279
+ {
1280
+ "data": {
1281
+ "text/html": [
1282
+ " View project at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers</a>"
1283
+ ],
1284
+ "text/plain": [
1285
+ "<IPython.core.display.HTML object>"
1286
+ ]
1287
+ },
1288
+ "metadata": {},
1289
+ "output_type": "display_data"
1290
+ },
1291
+ {
1292
+ "data": {
1293
+ "text/html": [
1294
+ " View sweep at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161</a>"
1295
+ ],
1296
+ "text/plain": [
1297
+ "<IPython.core.display.HTML object>"
1298
+ ]
1299
+ },
1300
+ "metadata": {},
1301
+ "output_type": "display_data"
1302
+ },
1303
+ {
1304
+ "data": {
1305
+ "text/html": [
1306
+ " View run at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/gtz48xe5' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/gtz48xe5</a>"
1307
+ ],
1308
+ "text/plain": [
1309
+ "<IPython.core.display.HTML object>"
1310
+ ]
1311
+ },
1312
+ "metadata": {},
1313
+ "output_type": "display_data"
1314
+ },
1315
+ {
1316
+ "name": "stdout",
1317
+ "output_type": "stream",
1318
+ "text": [
1319
+ "33.683 K model parameters\n"
1320
+ ]
1321
+ },
1322
+ {
1323
+ "data": {
1324
+ "application/vnd.jupyter.widget-view+json": {
1325
+ "model_id": "",
1326
+ "version_major": 2,
1327
+ "version_minor": 0
1328
+ },
1329
+ "text/plain": [
1330
+ "VBox(children=(Label(value='0.001 MB of 0.047 MB uploaded\\r'), FloatProgress(value=0.028017589156043247, max=1…"
1331
+ ]
1332
+ },
1333
+ "metadata": {},
1334
+ "output_type": "display_data"
1335
+ },
1336
+ {
1337
+ "data": {
1338
+ "text/html": [
1339
+ "<style>\n",
1340
+ " table.wandb td:nth-child(1) { padding: 0 10px; text-align: left ; width: auto;} td:nth-child(2) {text-align: left ; width: 100%}\n",
1341
+ " .wandb-row { display: flex; flex-direction: row; flex-wrap: wrap; justify-content: flex-start; width: 100% }\n",
1342
+ " .wandb-col { display: flex; flex-direction: column; flex-basis: 100%; flex: 1; padding: 10px; }\n",
1343
+ " </style>\n",
1344
+ "<div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>train_acc</td><td>▁▆▆▇▇▇▇▇▇█████████████████████</td></tr><tr><td>train_loss</td><td>█▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁</td></tr><tr><td>val_acc</td><td>▁▃▄▅▅▆▇▆▇▆▇▇▆▆▆▇▆▇█▇▇▇▇██▇▇▇█▇</td></tr><tr><td>val_loss</td><td>█▆▅▃▃▃▂▂▂▂▂▂▃▂▂▂▃▂▂▂▂▂▂▁▁▂▂▂▁▂</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table class=\"wandb\"><tr><td>train_acc</td><td>92.21615</td></tr><tr><td>train_loss</td><td>0.00725</td></tr><tr><td>val_acc</td><td>93.30009</td></tr><tr><td>val_loss</td><td>0.00663</td></tr></table><br/></div></div>"
1345
+ ],
1346
+ "text/plain": [
1347
+ "<IPython.core.display.HTML object>"
1348
+ ]
1349
+ },
1350
+ "metadata": {},
1351
+ "output_type": "display_data"
1352
+ },
1353
+ {
1354
+ "data": {
1355
+ "text/html": [
1356
+ " View run <strong style=\"color:#cdcd00\">cerulean-sweep-4</strong> at: <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/gtz48xe5' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/gtz48xe5</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
1357
+ ],
1358
+ "text/plain": [
1359
+ "<IPython.core.display.HTML object>"
1360
+ ]
1361
+ },
1362
+ "metadata": {},
1363
+ "output_type": "display_data"
1364
+ },
1365
+ {
1366
+ "data": {
1367
+ "text/html": [
1368
+ "Find logs at: <code>./wandb/run-20240404_154533-gtz48xe5/logs</code>"
1369
+ ],
1370
+ "text/plain": [
1371
+ "<IPython.core.display.HTML object>"
1372
+ ]
1373
+ },
1374
+ "metadata": {},
1375
+ "output_type": "display_data"
1376
+ },
1377
+ {
1378
+ "name": "stderr",
1379
+ "output_type": "stream",
1380
+ "text": [
1381
+ "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: aoy7fr9k with config:\n",
1382
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tbatch_size: 256\n",
1383
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tdropout: 0.1\n",
1384
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tepochs: 30\n",
1385
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tlr: 0.0003\n",
1386
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_embd: 64\n",
1387
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_head: 8\n",
1388
+ "\u001b[34m\u001b[1mwandb\u001b[0m: \tn_layers: 4\n"
1389
+ ]
1390
+ },
1391
+ {
1392
+ "data": {
1393
+ "text/html": [
1394
+ "wandb version 0.16.6 is available! To upgrade, please run:\n",
1395
+ " $ pip install wandb --upgrade"
1396
+ ],
1397
+ "text/plain": [
1398
+ "<IPython.core.display.HTML object>"
1399
+ ]
1400
+ },
1401
+ "metadata": {},
1402
+ "output_type": "display_data"
1403
+ },
1404
+ {
1405
+ "data": {
1406
+ "text/html": [
1407
+ "Tracking run with wandb version 0.16.4"
1408
+ ],
1409
+ "text/plain": [
1410
+ "<IPython.core.display.HTML object>"
1411
+ ]
1412
+ },
1413
+ "metadata": {},
1414
+ "output_type": "display_data"
1415
+ },
1416
+ {
1417
+ "data": {
1418
+ "text/html": [
1419
+ "Run data is saved locally in <code>/kaggle/working/wandb/run-20240404_163029-aoy7fr9k</code>"
1420
+ ],
1421
+ "text/plain": [
1422
+ "<IPython.core.display.HTML object>"
1423
+ ]
1424
+ },
1425
+ "metadata": {},
1426
+ "output_type": "display_data"
1427
+ },
1428
+ {
1429
+ "data": {
1430
+ "text/html": [
1431
+ "Syncing run <strong><a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/aoy7fr9k' target=\"_blank\">warm-sweep-6</a></strong> to <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161</a>"
1432
+ ],
1433
+ "text/plain": [
1434
+ "<IPython.core.display.HTML object>"
1435
+ ]
1436
+ },
1437
+ "metadata": {},
1438
+ "output_type": "display_data"
1439
+ },
1440
+ {
1441
+ "data": {
1442
+ "text/html": [
1443
+ " View project at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers</a>"
1444
+ ],
1445
+ "text/plain": [
1446
+ "<IPython.core.display.HTML object>"
1447
+ ]
1448
+ },
1449
+ "metadata": {},
1450
+ "output_type": "display_data"
1451
+ },
1452
+ {
1453
+ "data": {
1454
+ "text/html": [
1455
+ " View sweep at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/sweeps/jbut4161</a>"
1456
+ ],
1457
+ "text/plain": [
1458
+ "<IPython.core.display.HTML object>"
1459
+ ]
1460
+ },
1461
+ "metadata": {},
1462
+ "output_type": "display_data"
1463
+ },
1464
+ {
1465
+ "data": {
1466
+ "text/html": [
1467
+ " View run at <a href='https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/aoy7fr9k' target=\"_blank\">https://wandb.ai/iitmadras/Tranliteration-Tranformers/runs/aoy7fr9k</a>"
1468
+ ],
1469
+ "text/plain": [
1470
+ "<IPython.core.display.HTML object>"
1471
+ ]
1472
+ },
1473
+ "metadata": {},
1474
+ "output_type": "display_data"
1475
+ },
1476
+ {
1477
+ "name": "stdout",
1478
+ "output_type": "stream",
1479
+ "text": [
1480
+ "478.595 K model parameters\n"
1481
+ ]
1482
+ }
1483
+ ],
1484
+ "source": [
1485
+ "wandb.agent(sweep_id=sweep_id, function=train)"
1486
+ ]
1487
+ },
1488
+ {
1489
+ "cell_type": "markdown",
1490
+ "metadata": {
1491
+ "id": "cNtTaEc6kxuC"
1492
+ },
1493
+ "source": [
1494
+ "# Test Time\n",
1495
+ "Since this is the best model(validation accuracy) , we will train it on both train and validation data.\n",
1496
+ "We will then test the model on test data"
1497
+ ]
1498
+ },
1499
+ {
1500
+ "cell_type": "markdown",
1501
+ "metadata": {
1502
+ "id": "QcgfjfD9lvWJ"
1503
+ },
1504
+ "source": [
1505
+ "## Best Hyperparameter from validation"
1506
+ ]
1507
+ },
1508
+ {
1509
+ "cell_type": "code",
1510
+ "execution_count": null,
1511
+ "metadata": {
1512
+ "execution": {
1513
+ "iopub.execute_input": "2024-04-06T15:11:46.239015Z",
1514
+ "iopub.status.busy": "2024-04-06T15:11:46.237962Z",
1515
+ "iopub.status.idle": "2024-04-06T15:11:46.337285Z",
1516
+ "shell.execute_reply": "2024-04-06T15:11:46.336384Z",
1517
+ "shell.execute_reply.started": "2024-04-06T15:11:46.238979Z"
1518
+ },
1519
+ "id": "q7SXqJhekxuC",
1520
+ "outputId": "17c0dfd2-2e0b-4449-80fe-9f7a2ce68c28",
1521
+ "trusted": true
1522
+ },
1523
+ "outputs": [
1524
+ {
1525
+ "name": "stdout",
1526
+ "output_type": "stream",
1527
+ "text": [
1528
+ " \n"
1529
+ ]
1530
+ }
1531
+ ],
1532
+ "source": [
1533
+ "n_embd = 128\n",
1534
+ "batch_size = 64\n",
1535
+ "learning_rate = 3e-3\n",
1536
+ "n_head = 8 # other options factors of 32 like 2, 8\n",
1537
+ "n_layers = 6\n",
1538
+ "dropout = 0.1\n",
1539
+ "epochs = 200\n",
1540
+ "\n",
1541
+ "encoder = Encoder(n_embd, n_head, n_layers, dropout)\n",
1542
+ "decoder = Decoder(n_embd, n_head, n_layers, dropout)\n",
1543
+ "encoder.to(device)\n",
1544
+ "decoder.to(device)\n",
1545
+ "print(\" \")"
1546
+ ]
1547
+ },
1548
+ {
1549
+ "cell_type": "markdown",
1550
+ "metadata": {
1551
+ "id": "P0-9k1L6l0iZ"
1552
+ },
1553
+ "source": [
1554
+ "## Train on train_data + val_data"
1555
+ ]
1556
+ },
1557
+ {
1558
+ "cell_type": "code",
1559
+ "execution_count": null,
1560
+ "metadata": {
1561
+ "execution": {
1562
+ "iopub.execute_input": "2024-04-06T15:11:51.054081Z",
1563
+ "iopub.status.busy": "2024-04-06T15:11:51.053142Z",
1564
+ "iopub.status.idle": "2024-04-06T17:55:02.351999Z",
1565
+ "shell.execute_reply": "2024-04-06T17:55:02.350323Z",
1566
+ "shell.execute_reply.started": "2024-04-06T15:11:51.054049Z"
1567
+ },
1568
+ "id": "TQVFJyvlTMjS",
1569
+ "trusted": true
1570
+ },
1571
+ "outputs": [],
1572
+ "source": [
1573
+ "\n",
1574
+ "# print the number of parameters in the model\n",
1575
+ "print(sum([p.numel() for p in encoder.parameters()] + [p.numel() for p in decoder.parameters()])/1e3, 'K model parameters')\n",
1576
+ "\n",
1577
+ "train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)\n",
1578
+ "val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=True)\n",
1579
+ "\n",
1580
+ "# create a PyTorch optimizer\n",
1581
+ "encoder_optimizer = torch.optim.AdamW(encoder.parameters(), lr=learning_rate)\n",
1582
+ "decoder_optimizer = torch.optim.AdamW(decoder.parameters(), lr=learning_rate)\n",
1583
+ "\n",
1584
+ "# print('Step | Training Loss | Validation Loss | Training Accuracy % | Validation Accuracy %')\n",
1585
+ "\n",
1586
+ "least_error = float('inf')\n",
1587
+ "patience = 20 # The number of epochs without improvement to wait before stopping\n",
1588
+ "no_improvement = 0\n",
1589
+ "\n",
1590
+ "for i in range(epochs):\n",
1591
+ " running_loss = 0.0\n",
1592
+ " train_correct = 0\n",
1593
+ "\n",
1594
+ " encoder.train()\n",
1595
+ " decoder.train()\n",
1596
+ "\n",
1597
+ " for j,(train_x,train_y) in enumerate(train_loader):\n",
1598
+ " train_x = train_x.to(device)\n",
1599
+ " train_y = train_y.to(device)\n",
1600
+ "\n",
1601
+ " encoder_optimizer.zero_grad(set_to_none=True)\n",
1602
+ " decoder_optimizer.zero_grad(set_to_none=True)\n",
1603
+ "\n",
1604
+ " encoder_output = encoder(train_x)\n",
1605
+ " logits, loss = decoder(train_y[:, :-1], encoder_output, train_y[:, 1:])\n",
1606
+ "\n",
1607
+ " encoder_optimizer.zero_grad(set_to_none=True)\n",
1608
+ " decoder_optimizer.zero_grad(set_to_none=True)\n",
1609
+ " loss.backward()\n",
1610
+ " encoder_optimizer.step()\n",
1611
+ " decoder_optimizer.step()\n",
1612
+ "\n",
1613
+ " running_loss += loss\n",
1614
+ " pred_decoder_output = torch.argmax(logits, dim=-1)\n",
1615
+ " # print(pred_decoder_output, \" target: \", train_y[:, 1:])\n",
1616
+ " train_correct += (pred_decoder_output == train_y[:, 1:]).sum().item()\n",
1617
+ "\n",
1618
+ " for j,(train_x,train_y) in enumerate(val_loader):\n",
1619
+ " train_x = train_x.to(device)\n",
1620
+ " train_y = train_y.to(device)\n",
1621
+ "\n",
1622
+ " encoder_optimizer.zero_grad(set_to_none=True)\n",
1623
+ " decoder_optimizer.zero_grad(set_to_none=True)\n",
1624
+ "\n",
1625
+ " encoder_output = encoder(train_x)\n",
1626
+ " logits, loss = decoder(train_y[:, :-1], encoder_output, train_y[:, 1:])\n",
1627
+ "\n",
1628
+ " encoder_optimizer.zero_grad(set_to_none=True)\n",
1629
+ " decoder_optimizer.zero_grad(set_to_none=True)\n",
1630
+ " loss.backward()\n",
1631
+ " encoder_optimizer.step()\n",
1632
+ " decoder_optimizer.step()\n",
1633
+ "\n",
1634
+ " running_loss += loss\n",
1635
+ " pred_decoder_output = torch.argmax(logits, dim=-1)\n",
1636
+ " # print(pred_decoder_output, \" target: \", train_y[:, 1:])\n",
1637
+ " train_correct += (pred_decoder_output == train_y[:, 1:]).sum().item()\n",
1638
+ "\n",
1639
+ "\n",
1640
+ " metrics = {\n",
1641
+ " \"train_loss\": running_loss.cpu().detach().numpy() / (len(train_data)+len(val_data)),\n",
1642
+ " \"train_acc\": ((train_correct*100) / ((len(train_data)+len(val_data))* (decoder_block_size-1))),\n",
1643
+ " }\n",
1644
+ " if i % 5 == 0:\n",
1645
+ " print(\"Step: \",i)\n",
1646
+ " print(\"train_loss: \", metrics[\"train_loss\"])\n",
1647
+ " print(\"train_acc: \", metrics[\"train_acc\"])"
1648
+ ]
1649
+ },
1650
+ {
1651
+ "cell_type": "code",
1652
+ "execution_count": null,
1653
+ "metadata": {
1654
+ "execution": {
1655
+ "iopub.execute_input": "2024-04-06T00:22:11.853957Z",
1656
+ "iopub.status.busy": "2024-04-06T00:22:11.852912Z",
1657
+ "iopub.status.idle": "2024-04-06T00:22:11.923978Z",
1658
+ "shell.execute_reply": "2024-04-06T00:22:11.923143Z",
1659
+ "shell.execute_reply.started": "2024-04-06T00:22:11.853919Z"
1660
+ },
1661
+ "id": "hAjg5s0IkxuC",
1662
+ "trusted": true
1663
+ },
1664
+ "outputs": [],
1665
+ "source": [
1666
+ "PATH = '/kaggle/working/encoder.pth'\n",
1667
+ "torch.save(encoder, PATH)\n",
1668
+ "PATH = '/kaggle/working/decoder.pth'\n",
1669
+ "torch.save(encoder, PATH)"
1670
+ ]
1671
+ },
1672
+ {
1673
+ "cell_type": "markdown",
1674
+ "metadata": {
1675
+ "id": "x4M3aMxTl-zb"
1676
+ },
1677
+ "source": [
1678
+ "## generate output sequence"
1679
+ ]
1680
+ },
1681
+ {
1682
+ "cell_type": "code",
1683
+ "execution_count": null,
1684
+ "metadata": {
1685
+ "execution": {
1686
+ "iopub.execute_input": "2024-04-06T12:29:19.489092Z",
1687
+ "iopub.status.busy": "2024-04-06T12:29:19.488711Z",
1688
+ "iopub.status.idle": "2024-04-06T12:29:19.496406Z",
1689
+ "shell.execute_reply": "2024-04-06T12:29:19.495353Z",
1690
+ "shell.execute_reply.started": "2024-04-06T12:29:19.489065Z"
1691
+ },
1692
+ "id": "mfIxu6njkxuD",
1693
+ "trusted": true
1694
+ },
1695
+ "outputs": [],
1696
+ "source": [
1697
+ "def generate(input):\n",
1698
+ " B, T = input.shape\n",
1699
+ " encoder_output = encoder(input)\n",
1700
+ " idx = torch.full((B, 1), 2, dtype=torch.long, device=device) # (B,1)\n",
1701
+ "\n",
1702
+ " # idx is (B, T) array of indices in the current context\n",
1703
+ " for _ in range(decoder_block_size-1):\n",
1704
+ " # get the predictions\n",
1705
+ " logits, loss = decoder(idx, encoder_output) # logits (B, T, vocab_size)\n",
1706
+ " # focus only on the last time step\n",
1707
+ " logits = logits[:, -1, :] # becomes (B, C)\n",
1708
+ " # apply softmax to get probabilities\n",
1709
+ " idx_next = torch.argmax(logits, dim=-1, keepdim=True) # (B, 1)\n",
1710
+ " # append sampled index to the running sequence\n",
1711
+ " idx = torch.cat((idx, idx_next), dim=1) # (B, T+1)\n",
1712
+ " return idx"
1713
+ ]
1714
+ },
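`generate` above decodes greedily, keeping only the most likely token at each step. A stochastic alternative is temperature sampling; a small self-contained sketch with fake logits, for contrast:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 30)    # fake (B, vocab_size) logits for a single step

greedy  = torch.argmax(logits, dim=-1, keepdim=True)  # what generate() does
probs   = F.softmax(logits / 0.8, dim=-1)             # temperature 0.8
sampled = torch.multinomial(probs, num_samples=1)     # stochastic pick
print(greedy.item(), sampled.item())
```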
1715
+ {
1716
+ "cell_type": "markdown",
1717
+ "metadata": {
1718
+ "id": "BeB2nYeFmXy8"
1719
+ },
1720
+ "source": [
1721
+ "## Check Test Accuracy"
1722
+ ]
1723
+ },
1724
+ {
1725
+ "cell_type": "code",
1726
+ "execution_count": null,
1727
+ "metadata": {
1728
+ "execution": {
1729
+ "iopub.execute_input": "2024-04-06T18:00:25.146854Z",
1730
+ "iopub.status.busy": "2024-04-06T18:00:25.146119Z",
1731
+ "iopub.status.idle": "2024-04-06T18:00:25.156303Z",
1732
+ "shell.execute_reply": "2024-04-06T18:00:25.155453Z",
1733
+ "shell.execute_reply.started": "2024-04-06T18:00:25.146826Z"
1734
+ },
1735
+ "id": "dIzXiSLBkxuD",
1736
+ "outputId": "ebe1d201-32bb-4372-e64a-62ebe173799d",
1737
+ "trusted": true
1738
+ },
1739
+ "outputs": [
1740
+ {
1741
+ "name": "stdout",
1742
+ "output_type": "stream",
1743
+ "text": [
1744
+ "test accuracy(word level) : 67.2188\n"
1745
+ ]
1746
+ }
1747
+ ],
1748
+ "source": [
1749
+ "def check():\n",
1750
+ "## validation code\n",
1751
+ " running_loss_val, val_correct = 0, 0\n",
1752
+ " encoder.eval()\n",
1753
+ " decoder.eval()\n",
1754
+ " test_loader = DataLoader(test_data, batch_size=64, shuffle=True)\n",
1755
+ " for _ in range(50):\n",
1756
+ " val_x,val_y = next(iter(test_loader))\n",
1757
+ "\n",
1758
+ " val_x = val_x.to(device)\n",
1759
+ " val_y = val_y.to(device)\n",
1760
+ "\n",
1761
+ " output = generate(val_x)\n",
1762
+ "\n",
1763
+ " encoder_output = encoder(val_x)\n",
1764
+ " logits, loss = decoder(val_y[:, :-1], encoder_output, val_y[:, 1:])\n",
1765
+ "\n",
1766
+ " running_loss_val += loss\n",
1767
+ " # checking val_correct for the whole sequence\n",
1768
+ " val_correct += torch.sum(torch.sum(output[:, 1:] != val_y[:, 1:], dim=-1) == 0)\n",
1769
+ "\n",
1770
+ " print(\"test accuracy(word level) : \", ((val_correct.cpu().detach().numpy()*100) / len(test_data)))\n",
1771
+ "\n",
1772
+ "check()"
1773
+ ]
1774
+ },
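Word-level accuracy is strict: a prediction counts only if every position after the start token matches the target, which is what the `sum(... != ...) == 0` test implements. A toy check (indices made up):

```python
import torch

output = torch.tensor([[2, 5, 6, 7], [2, 5, 9, 7]])  # generated, incl. start token
target = torch.tensor([[2, 5, 6, 7], [2, 5, 6, 7]])

exact = torch.sum(torch.sum(output[:, 1:] != target[:, 1:], dim=-1) == 0)
print(exact.item())  # 1 -> only the first word matches at every position
```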
1775
+ {
1776
+ "cell_type": "markdown",
1777
+ "metadata": {
1778
+ "id": "LDP4KvWdFnIL"
1779
+ },
1780
+ "source": [
1781
+ "# Plotting the Attention HeatMaps"
1782
+ ]
1783
+ },
1784
+ {
1785
+ "cell_type": "code",
1786
+ "execution_count": null,
1787
+ "metadata": {
1788
+ "id": "4WfJEdcgFmiI",
1789
+ "trusted": true
1790
+ },
1791
+ "outputs": [],
1792
+ "source": [
1793
+ "import matplotlib.pyplot as plt\n",
1794
+ "import numpy as np\n",
1795
+ "from matplotlib.font_manager import FontProperties\n",
1796
+ "tel_font = FontProperties(fname = 'TiroDevanagariHindi-Regular.ttf')\n",
1797
+ "# Assuming you have attention_weights of shape (batch_size, output_sequence_length, batch_size, input_sequence_length)\n",
1798
+ "# and prediction_matrix of shape (batch_size, output_sequence_length)\n",
1799
+ "# and input_matrix of shape (batch_size, input_sequence_length)\n",
1800
+ "\n",
1801
+ "# Define the grid dimensions\n",
1802
+ "rows = int(np.ceil(np.sqrt(12)))\n",
1803
+ "cols = int(np.ceil(12 / rows))\n",
1804
+ "\n",
1805
+ "# Create a figure and subplots\n",
1806
+ "fig, axes = plt.subplots(rows, cols, figsize=(9, 9))\n",
1807
+ "\n",
1808
+ "for i, ax in enumerate(axes.flatten()):\n",
1809
+ " if i < 12:\n",
1810
+ " prediction = [opLang.index2char[j.item()] for j in pred[i+1]]\n",
1811
+ "\n",
1812
+ " pred_word=\"\"\n",
1813
+ " input_word=\"\"\n",
1814
+ "\n",
1815
+ " for j in range(len(prediction)):\n",
1816
+ " # Ignore padding\n",
1817
+ " if(prediction[j] != '#'):\n",
1818
+ " pred_word += prediction[j]\n",
1819
+ " else :\n",
1820
+ " break\n",
1821
+ " input_seq = [ipLang.index2char[j.item()] for j in testData[i][0]]\n",
1822
+ "\n",
1823
+ " for j in range(len(input_seq)):\n",
1824
+ " if(input_seq[j] != '#'):\n",
1825
+ " input_word += input_seq[j]\n",
1826
+ " else :\n",
1827
+ " break\n",
1828
+ " attn_weights = atten_weights[i, :len(pred_word), :len(input_word)].detach().cpu().numpy()\n",
1829
+ " ax.imshow(attn_weights.T, cmap='hot', interpolation='nearest')\n",
1830
+ " ax.xaxis.set_label_position('top')\n",
1831
+ " ax.set_title(f'Example {i+1}')\n",
1832
+ " ax.set_xlabel('Output predicted')\n",
1833
+ " ax.set_ylabel('Input word')\n",
1834
+ " ax.set_xticks(np.arange(len(pred_word)))\n",
1835
+ " ax.set_xticklabels(pred_word, rotation = 90, fontproperties = tel_font,fontdict={'fontsize':8})\n",
1836
+ " ax.xaxis.tick_top()\n",
1837
+ "\n",
1838
+ " ax.set_yticks(np.arange(len(input_word)))\n",
1839
+ " ax.set_yticklabels(input_word, rotation=90)\n",
1840
+ "\n",
1841
+ "\n",
1842
+ "\n",
1843
+ "# Adjust the spacing between subplots\n",
1844
+ "plt.tight_layout()\n",
1845
+ "\n",
1846
+ "# Show the plot\n",
1847
+ "plt.show()\n",
1848
+ "wandb.init(project='CS6910_Assignment_3')\n",
1849
+ "\n",
1850
+ "# Convert the matplotlib figure to an image\n",
1851
+ "fig.canvas.draw()\n",
1852
+ "image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8')\n",
1853
+ "image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))\n",
1854
+ "\n",
1855
+ "# Log the image in wandb\n",
1856
+ "wandb.log({\"attention_heatmaps\": [wandb.Image(image)]})"
1857
+ ]
1858
+ },
1859
+ {
1860
+ "cell_type": "code",
1861
+ "execution_count": null,
1862
+ "metadata": {
1863
+ "id": "FnHR_oql6-S4"
1864
+ },
1865
+ "outputs": [],
1866
+ "source": []
1867
+ }
1868
+ ],
1869
+ "metadata": {
1870
+ "accelerator": "GPU",
1871
+ "colab": {
1872
+ "collapsed_sections": [
1873
+ "hRdpoWePeYHn",
1874
+ "44xIRolL_T_d",
1875
+ "XdltQ7oJCq1j",
1876
+ "GgPU486JC8Mz",
1877
+ "658W9RARGEUf",
1878
+ "q7fAgs5uQni_",
1879
+ "n4rGh7vuQqaa",
1880
+ "nvyRJWUUbR2f",
1881
+ "8ETW0BG_Pa24",
1882
+ "MQPGy32rnD3V",
1883
+ "z_aYZvDD1OHU",
1884
+ "pKvBd5mKf0Hf",
1885
+ "FYMa5jTQRUaB",
1886
+ "zfuv5FoA1wt2",
1887
+ "W7CYNChRGuGK"
1888
+ ],
1889
+ "gpuType": "T4",
1890
+ "include_colab_link": true,
1891
+ "provenance": [],
1892
+ "toc_visible": true
1893
+ },
1894
+ "kaggle": {
1895
+ "accelerator": "gpu",
1896
+ "dataSources": [
1897
+ {
1898
+ "datasetId": 4721249,
1899
+ "sourceId": 8013732,
1900
+ "sourceType": "datasetVersion"
1901
+ }
1902
+ ],
1903
+ "dockerImageVersionId": 30674,
1904
+ "isGpuEnabled": true,
1905
+ "isInternetEnabled": true,
1906
+ "language": "python",
1907
+ "sourceType": "notebook"
1908
+ },
1909
+ "kernelspec": {
1910
+ "display_name": "Python 3",
1911
+ "language": "python",
1912
+ "name": "python3"
1913
+ },
1914
+ "language_info": {
1915
+ "codemirror_mode": {
1916
+ "name": "ipython",
1917
+ "version": 3
1918
+ },
1919
+ "file_extension": ".py",
1920
+ "mimetype": "text/x-python",
1921
+ "name": "python",
1922
+ "nbconvert_exporter": "python",
1923
+ "pygments_lexer": "ipython3",
1924
+ "version": "3.10.13"
1925
+ }
1926
+ },
1927
+ "nbformat": 4,
1928
+ "nbformat_minor": 0
1929
+ }
predictions_attention/predictions.csv ADDED
The diff for this file is too large to render. See raw diff
 
predictions_transformer/predictions.csv ADDED
The diff for this file is too large to render. See raw diff
 
predictions_vanilla/predictions _vanilla.csv ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,93 @@
1
+ aiofiles==23.2.1
2
+ annotated-types==0.7.0
3
+ anyio==4.6.0
4
+ asttokens==2.4.1
5
+ certifi==2024.8.30
6
+ charset-normalizer==3.3.2
7
+ click==8.1.7
8
+ colorama==0.4.6
9
+ comm==0.2.2
10
+ contourpy==1.3.0
11
+ cycler==0.12.1
12
+ debugpy==1.8.6
13
+ decorator==5.1.1
14
+ docker-pycreds==0.4.0
15
+ executing==2.1.0
16
+ fastapi==0.115.0
17
+ ffmpy==0.4.0
18
+ filelock==3.16.1
19
+ fonttools==4.54.1
20
+ fsspec==2024.9.0
21
+ gitdb==4.0.11
22
+ GitPython==3.1.43
23
+ gradio==4.44.1
24
+ gradio_client==1.3.0
25
+ h11==0.14.0
26
+ httpcore==1.0.6
27
+ httpx==0.27.2
28
+ huggingface-hub==0.25.1
29
+ idna==3.10
30
+ importlib_resources==6.4.5
31
+ ipykernel==6.29.5
32
+ ipython==8.28.0
33
+ jedi==0.19.1
34
+ Jinja2==3.1.4
35
+ jupyter_client==8.6.3
36
+ jupyter_core==5.7.2
37
+ kiwisolver==1.4.7
38
+ markdown-it-py==3.0.0
39
+ MarkupSafe==2.1.5
40
+ matplotlib==3.9.2
41
+ matplotlib-inline==0.1.7
42
+ mdurl==0.1.2
43
+ mpmath==1.3.0
44
+ nest-asyncio==1.6.0
45
+ networkx==3.3
46
+ numpy==2.1.1
47
+ orjson==3.10.7
48
+ packaging==24.1
49
+ pandas==2.2.3
50
+ parso==0.8.4
51
+ pillow==10.4.0
52
+ platformdirs==4.3.6
53
+ prompt_toolkit==3.0.48
54
+ protobuf==5.28.2
55
+ psutil==6.0.0
56
+ pure_eval==0.2.3
57
+ pydantic==2.9.2
58
+ pydantic_core==2.23.4
59
+ pydub==0.25.1
60
+ Pygments==2.18.0
61
+ pyparsing==3.1.4
62
+ python-dateutil==2.9.0.post0
63
+ python-multipart==0.0.12
64
+ pytz==2024.2
65
+ PyYAML==6.0.2
66
+ pyzmq==26.2.0
67
+ requests==2.32.3
68
+ rich==13.9.1
69
+ ruff==0.6.8
70
+ semantic-version==2.10.0
71
+ sentry-sdk==2.15.0
72
+ setproctitle==1.3.3
73
+ setuptools==75.1.0
74
+ shellingham==1.5.4
75
+ six==1.16.0
76
+ smmap==5.0.1
77
+ sniffio==1.3.1
78
+ stack-data==0.6.3
79
+ starlette==0.38.6
80
+ sympy==1.13.3
81
+ tomlkit==0.12.0
82
+ torch==2.4.1
83
+ tornado==6.4.1
84
+ tqdm==4.66.5
85
+ traitlets==5.14.3
86
+ typer==0.12.5
87
+ typing_extensions==4.12.2
88
+ tzdata==2024.2
89
+ urllib3==2.2.3
90
+ uvicorn==0.31.0
91
+ wandb==0.18.3
92
+ wcwidth==0.2.13
93
+ websockets==12.0
src/decoder.py ADDED
@@ -0,0 +1,46 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ import torch.nn.functional as F
4
+ from src.helper import get_cell
5
+
6
+ class Decoder(nn.Module):
7
+ def __init__(self,
8
+ out_sz: int,
9
+ embed_sz: int,
10
+ hidden_sz: int,
11
+ cell_type: str,
12
+ n_layers: int,
13
+ dropout: float,
14
+ device: str):
15
+
16
+ super(Decoder, self).__init__()
17
+ self.hidden_sz = hidden_sz
18
+ self.n_layers = n_layers
19
+ self.dropout = dropout
20
+ self.cell_type = cell_type
21
+ self.embedding = nn.Embedding(out_sz, embed_sz)
22
+ self.device = device
23
+
24
+ self.rnn = get_cell(cell_type)(input_size = embed_sz,
25
+ hidden_size = hidden_sz,
26
+ num_layers = n_layers,
27
+ dropout = dropout)
28
+
29
+ self.out = nn.Linear(hidden_sz, out_sz)
30
+ self.softmax = nn.LogSoftmax(dim=1)
31
+
32
+ def forward(self, input, hidden, cell):
33
+ output = self.embedding(input).view(1, 1, -1)
34
+ output = F.relu(output)
35
+
36
+ if(self.cell_type == "LSTM"):
37
+ output, (hidden, cell) = self.rnn(output, (hidden, cell))
38
+ else:
39
+ output, hidden = self.rnn(output, hidden)
40
+
41
+ output = self.softmax(self.out(output[0]))
42
+ return output, hidden, cell
43
+
44
+ def initHidden(self):
45
+ return torch.zeros(self.n_layers, 1, self.hidden_sz, device=self.device)
46
+
src/encoder.py ADDED
@@ -0,0 +1,39 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ from src.helper import get_cell
4
+
5
+ class Encoder(nn.Module):
6
+ def __init__(self,
7
+ in_sz: int,
8
+ embed_sz: int,
9
+ hidden_sz: int,
10
+ cell_type: str,
11
+ n_layers: int,
12
+ dropout: float,
13
+ device: str):
14
+
15
+ super(Encoder, self).__init__()
16
+ self.hidden_sz = hidden_sz
17
+ self.n_layers = n_layers
18
+ self.dropout = dropout
19
+ self.cell_type = cell_type
20
+ self.embedding = nn.Embedding(in_sz, embed_sz)
21
+ self.device = device
22
+
23
+ self.rnn = get_cell(cell_type)(input_size = embed_sz,
24
+ hidden_size = hidden_sz,
25
+ num_layers = n_layers,
26
+ dropout = dropout)
27
+
28
+ def forward(self, input, hidden, cell):
29
+ embedded = self.embedding(input).view(1, 1, -1)
30
+
31
+ if(self.cell_type == "LSTM"):
32
+ output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
33
+ else:
34
+ output, hidden = self.rnn(embedded, hidden)
35
+
36
+ return output, hidden, cell
37
+
38
+ def initHidden(self):
39
+ return torch.zeros(self.n_layers, 1, self.hidden_sz, device=self.device)
src/helper.py ADDED
@@ -0,0 +1,60 @@
1
+ import pandas as pd
2
+ import torch
3
+ import torch.nn as nn
4
+ import torch.optim as optim
5
+ from src.language import Language, EOS_token
6
+
7
+ def get_data(lang: str, split: str) -> list[list[str]]:
8
+ """
9
+ Returns: 'pairs': list of [input_word, target_word] pairs
10
+ """
11
+     path = "./aksharantar_sampled/{}/{}_{}.csv".format(lang, lang, split)
12
+ df = pd.read_csv(path, header=None)
13
+ pairs = df.values.tolist()
14
+ return pairs
15
+
16
+ def get_languages(lang: str):
17
+ """
18
+ Returns
19
+ 1. input_lang: input language - English
20
+ 2. output_lang: output language - Given language
21
+ 3. pairs: list of [input_word, target_word] pairs
22
+ """
23
+ input_lang = Language('eng')
24
+ output_lang = Language(lang)
25
+ pairs = get_data(lang, "train")
26
+ for pair in pairs:
27
+ input_lang.addWord(pair[0])
28
+ output_lang.addWord(pair[1])
29
+ return input_lang, output_lang, pairs
30
+
31
+ def get_cell(cell_type: str):
32
+ if cell_type == "LSTM":
33
+ return nn.LSTM
34
+ elif cell_type == "GRU":
35
+ return nn.GRU
36
+ elif cell_type == "RNN":
37
+ return nn.RNN
38
+ else:
39
+ raise Exception("Invalid cell type")
40
+
41
+ def get_optimizer(optimizer: str):
42
+ if optimizer == "SGD":
43
+ return optim.SGD
44
+ elif optimizer == "ADAM":
45
+ return optim.Adam
46
+ else:
47
+ raise Exception("Invalid optimizer")
48
+
49
+ def indexesFromWord(lang:Language, word:str):
50
+ return [lang.word2index[char] for char in word]
51
+
52
+ def tensorFromWord(lang:Language, word:str, device:str):
53
+ indexes = indexesFromWord(lang, word)
54
+ indexes.append(EOS_token)
55
+ return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)
56
+
57
+ def tensorsFromPair(input_lang:Language, output_lang:Language, pair:list[str], device:str):
58
+ input_tensor = tensorFromWord(input_lang, pair[0], device)
59
+ target_tensor = tensorFromWord(output_lang, pair[1], device)
60
+ return (input_tensor, target_tensor)
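Taken together, these helpers turn a CSV of word pairs into index tensors ready for the encoder/decoder loop. A quick sketch (assumes the `aksharantar_sampled/` directory referenced in `get_data` is present):

```python
from src.helper import get_languages, tensorsFromPair

input_lang, output_lang, pairs = get_languages("tam")
inp, tgt = tensorsFromPair(input_lang, output_lang, pairs[0], "cpu")
print(inp.shape, tgt.shape)   # (word_len + 1, 1) each; the extra row is the EOS token
```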
src/language.py ADDED
@@ -0,0 +1,24 @@
+ # Language Model
+ SOS_token = 0
+ EOS_token = 1
+
+ class Language:
+     def __init__(self, name):
+         self.name = name
+         self.word2index = {}
+         self.word2count = {}
+         self.index2word = {SOS_token: "<", EOS_token: ">"}
+         self.n_chars = 2  # Count SOS and EOS
+
+     def addWord(self, word):
+         for char in word:
+             self.addChar(char)
+
+     def addChar(self, char):
+         if char not in self.word2index:
+             self.word2index[char] = self.n_chars
+             self.word2count[char] = 1
+             self.index2word[self.n_chars] = char
+             self.n_chars += 1
+         else:
+             self.word2count[char] += 1
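SOS and EOS map to the printable characters `<` and `>` so decoded strings stay readable, and every new character gets the next free index. A small sketch of how the vocabulary grows:

```python
from src.language import Language

lang = Language("eng")
lang.addWord("hello")
print(lang.n_chars)          # 6: SOS, EOS plus the unique characters h, e, l, o
print(lang.word2index["l"])  # 4 -- indices are assigned in order of first appearance
```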
src/translator.py ADDED
@@ -0,0 +1,159 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from src.helper import get_optimizer, tensorsFromPair, get_languages, tensorFromWord, get_data
+ from src.language import SOS_token, EOS_token
+ from src.encoder import Encoder
+ from src.decoder import Decoder
+ import random
+ import time
+ import numpy as np
+
+ PRINT_EVERY = 5000
+ PLOT_EVERY = 100
+
+ class Translator:
+     def __init__(self, lang: str, params: dict, device: str):
+         self.lang = lang
+         self.input_lang, self.output_lang, self.pairs = get_languages(self.lang)
+         self.input_size = self.input_lang.n_chars
+         self.output_size = self.output_lang.n_chars
+         self.device = device
+
+         self.training_pairs = [tensorsFromPair(self.input_lang, self.output_lang, pair, self.device) for pair in self.pairs]
+
+         self.encoder = Encoder(in_sz = self.input_size,
+                                embed_sz = params["embed_size"],
+                                hidden_sz = params["hidden_size"],
+                                cell_type = params["cell_type"],
+                                n_layers = params["num_layers"],
+                                dropout = params["dropout"],
+                                device = self.device).to(self.device)
+
+         self.decoder = Decoder(out_sz = self.output_size,
+                                embed_sz = params["embed_size"],
+                                hidden_sz = params["hidden_size"],
+                                cell_type = params["cell_type"],
+                                n_layers = params["num_layers"],
+                                dropout = params["dropout"],
+                                device = self.device).to(self.device)
+
+         self.encoder_optimizer = get_optimizer(params["optimizer"])(self.encoder.parameters(), lr=params["learning_rate"])
+         self.decoder_optimizer = get_optimizer(params["optimizer"])(self.decoder.parameters(), lr=params["learning_rate"])
+
+         self.criterion = nn.NLLLoss()
+
+         self.teacher_forcing_ratio = params["teacher_forcing_ratio"]
+         self.max_length = params["max_length"]
+
+     def train_single(self, input_tensor, target_tensor):
+         encoder_hidden = self.encoder.initHidden()
+         encoder_cell = self.encoder.initHidden()
+
+         self.encoder_optimizer.zero_grad()
+         self.decoder_optimizer.zero_grad()
+
+         input_length = input_tensor.size(0)
+         target_length = target_tensor.size(0)
+
+         encoder_outputs = torch.zeros(self.max_length, self.encoder.hidden_sz, device=self.device)
+
+         loss = 0
+
+         for ei in range(input_length):
+             encoder_output, encoder_hidden, encoder_cell = self.encoder(input_tensor[ei], encoder_hidden, encoder_cell)
+             encoder_outputs[ei] = encoder_output[0, 0]
+
+         decoder_input = torch.tensor([[SOS_token]], device=self.device)
+         decoder_hidden, decoder_cell = encoder_hidden, encoder_cell
+
+         use_teacher_forcing = random.random() < self.teacher_forcing_ratio
+
+         if use_teacher_forcing:
+             for di in range(target_length):
+                 decoder_output, decoder_hidden, decoder_cell = self.decoder(decoder_input, decoder_hidden, decoder_cell)
+                 loss += self.criterion(decoder_output, target_tensor[di])
+
+                 decoder_input = target_tensor[di]
+         else:
+             for di in range(target_length):
+                 decoder_output, decoder_hidden, decoder_cell = self.decoder(decoder_input, decoder_hidden, decoder_cell)
+                 loss += self.criterion(decoder_output, target_tensor[di])
+
+                 topv, topi = decoder_output.topk(1)
+                 decoder_input = topi.squeeze().detach()
+                 if decoder_input.item() == EOS_token:
+                     break
+
+         loss.backward()
+         self.encoder_optimizer.step()
+         self.decoder_optimizer.step()
+
+         return loss.item() / target_length
+
+     def train(self, iters=-1):
+         start_time = time.time()
+         plot_losses = []
+         print_loss_total = 0
+         plot_loss_total = 0
+
+         random.shuffle(self.training_pairs)
+         iters = len(self.training_pairs) if iters == -1 else iters
+
+         for iter in range(1, iters + 1):
+             training_pair = self.training_pairs[iter - 1]
+             input_tensor = training_pair[0]
+             target_tensor = training_pair[1]
+
+             loss = self.train_single(input_tensor, target_tensor)
+             print_loss_total += loss
+             plot_loss_total += loss
+
+             if iter % PRINT_EVERY == 0:
+                 print_loss_avg = print_loss_total / PRINT_EVERY
+                 print_loss_total = 0
+                 current_time = time.time()
+                 print("Loss: {:.4f} | Iterations: {} | Time: {:.3f}".format(print_loss_avg, iter, current_time - start_time))
+
+             if iter % PLOT_EVERY == 0:
+                 plot_loss_avg = plot_loss_total / PLOT_EVERY
+                 plot_losses.append(plot_loss_avg)
+                 plot_loss_total = 0
+
+         return plot_losses
+
+     def evaluate(self, word):
+         with torch.no_grad():
+             input_tensor = tensorFromWord(self.input_lang, word, self.device)
+             input_length = input_tensor.size()[0]
+             encoder_hidden = self.encoder.initHidden()
+             encoder_cell = self.encoder.initHidden()
+
+             encoder_outputs = torch.zeros(self.max_length, self.encoder.hidden_sz, device=self.device)
+
+             for ei in range(input_length):
+                 encoder_output, encoder_hidden, encoder_cell = self.encoder(input_tensor[ei], encoder_hidden, encoder_cell)
+                 encoder_outputs[ei] += encoder_output[0, 0]
+
+             decoder_input = torch.tensor([[SOS_token]], device=self.device)
+             decoder_hidden, decoder_cell = encoder_hidden, encoder_cell
+
+             decoded_chars = ""
+
+             for di in range(self.max_length):
+                 decoder_output, decoder_hidden, decoder_cell = self.decoder(decoder_input, decoder_hidden, decoder_cell)
+                 topv, topi = decoder_output.topk(1)
+
+                 if topi.item() == EOS_token:
+                     break
+                 else:
+                     decoded_chars += self.output_lang.index2word[topi.item()]
+
+                 decoder_input = topi.squeeze().detach()
+
+             return decoded_chars
+
+     def test_validate(self, type: str):
+         pairs = get_data(self.lang, type)
+         accuracy = np.sum([self.evaluate(pair[0]) == pair[1] for pair in pairs])
+         return accuracy / len(pairs)
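`Translator` wires everything together: it builds the vocabularies, converts every training pair to tensors up front, and trains the encoder and decoder with separate optimizers. A sketch of a full training pass (hyperparameters mirror the defaults in `train.py` below):

```python
import torch
from src.translator import Translator

device = "cuda" if torch.cuda.is_available() else "cpu"
params = {"embed_size": 16, "hidden_size": 512, "cell_type": "LSTM",
          "num_layers": 2, "dropout": 0.1, "learning_rate": 0.005,
          "optimizer": "SGD", "teacher_forcing_ratio": 0.5, "max_length": 50}

model = Translator("tam", params, device)
losses = model.train()                  # one pass over the shuffled training pairs
print(model.test_validate("valid"))     # exact-match word accuracy on the validation split
```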
test_best_attention.py ADDED
@@ -0,0 +1,380 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import torch.optim as optim
+ import random
+ import numpy as np
+ import matplotlib.pyplot as plt
+ import pandas as pd
+ import time
+
+ import wandb
+ wandb.login()
+
+ random.seed()
+
+ import os
+ os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ print(device)
+
+ # Language Model
+ SOS_token = 0
+ EOS_token = 1
+
+ class Language:
+     def __init__(self, name):
+         self.name = name
+         self.word2index = {}
+         self.word2count = {}
+         self.index2word = {SOS_token: "<", EOS_token: ">"}
+         self.n_chars = 2  # Count SOS and EOS
+
+     def addWord(self, word):
+         for char in word:
+             self.addChar(char)
+
+     def addChar(self, char):
+         if char not in self.word2index:
+             self.word2index[char] = self.n_chars
+             self.word2count[char] = 1
+             self.index2word[self.n_chars] = char
+             self.n_chars += 1
+         else:
+             self.word2count[char] += 1
+
+ def get_data(lang: str, type: str) -> list[list[str]]:
+     """
+     Returns: 'pairs': list of [input_word, target_word] pairs
+     """
+     path = "./aksharantar_sampled/{}/{}_{}.csv".format(lang, lang, type)
+     df = pd.read_csv(path, header=None)
+     pairs = df.values.tolist()
+     return pairs
+
+ def get_languages(lang: str):
+     """
+     Returns
+     1. input_lang: input language - English
+     2. output_lang: output language - Given language
+     3. pairs: list of [input_word, target_word] pairs
+     """
+     input_lang = Language('eng')
+     output_lang = Language(lang)
+     pairs = get_data(lang, "train")
+     for pair in pairs:
+         input_lang.addWord(pair[0])
+         output_lang.addWord(pair[1])
+     return input_lang, output_lang, pairs
+
+ def get_cell(cell_type: str):
+     if cell_type == "LSTM":
+         return nn.LSTM
+     elif cell_type == "GRU":
+         return nn.GRU
+     elif cell_type == "RNN":
+         return nn.RNN
+     else:
+         raise Exception("Invalid cell type")
+
+ def get_optimizer(optimizer: str):
+     if optimizer == "SGD":
+         return optim.SGD
+     elif optimizer == "ADAM":
+         return optim.Adam
+     else:
+         raise Exception("Invalid optimizer")
+
+ class Encoder(nn.Module):
+     def __init__(self,
+                  in_sz: int,
+                  embed_sz: int,
+                  hidden_sz: int,
+                  cell_type: str,
+                  n_layers: int,
+                  dropout: float):
+
+         super(Encoder, self).__init__()
+         self.hidden_sz = hidden_sz
+         self.n_layers = n_layers
+         self.dropout = dropout
+         self.cell_type = cell_type
+         self.embedding = nn.Embedding(in_sz, embed_sz)
+
+         self.rnn = get_cell(cell_type)(input_size = embed_sz,
+                                        hidden_size = hidden_sz,
+                                        num_layers = n_layers,
+                                        dropout = dropout)
+
+     def forward(self, input, hidden, cell):
+         embedded = self.embedding(input).view(1, 1, -1)
+
+         if self.cell_type == "LSTM":
+             output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
+         else:
+             output, hidden = self.rnn(embedded, hidden)
+
+         return output, hidden, cell
+
+     def initHidden(self):
+         return torch.zeros(self.n_layers, 1, self.hidden_sz, device=device)
+
+ class AttentionDecoder(nn.Module):
+     def __init__(self,
+                  out_sz: int,
+                  embed_sz: int,
+                  hidden_sz: int,
+                  cell_type: str,
+                  n_layers: int,
+                  dropout: float):
+
+         super(AttentionDecoder, self).__init__()
+         self.hidden_sz = hidden_sz
+         self.n_layers = n_layers
+         self.dropout = dropout
+         self.cell_type = cell_type
+         self.embedding = nn.Embedding(out_sz, embed_sz)
+
+         self.attn = nn.Linear(hidden_sz + embed_sz, 50)  # 50 = max_length: one attention score per encoder position
+         self.attn_combine = nn.Linear(hidden_sz + embed_sz, hidden_sz)
+
+         self.rnn = get_cell(cell_type)(input_size = hidden_sz,
+                                        hidden_size = hidden_sz,
+                                        num_layers = n_layers,
+                                        dropout = dropout)
+
+         self.out = nn.Linear(hidden_sz, out_sz)
+         self.softmax = nn.LogSoftmax(dim=1)
+
+     def forward(self, input, hidden, cell, encoder_outputs):
+         embedding = self.embedding(input).view(1, 1, -1)
+
+         attn_weights = F.softmax(self.attn(torch.cat((embedding[0], hidden[0]), 1)), dim=1)
+         attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
+
+         output = torch.cat((embedding[0], attn_applied[0]), 1)
+         output = self.attn_combine(output).unsqueeze(0)
+
+         if self.cell_type == "LSTM":
+             output, (hidden, cell) = self.rnn(output, (hidden, cell))
+         else:
+             output, hidden = self.rnn(output, hidden)
+
+         output = self.softmax(self.out(output[0]))
+         return output, hidden, cell, attn_weights
+
+     def initHidden(self):
+         return torch.zeros(self.n_layers, 1, self.hidden_sz, device=device)
+
+ def indexesFromWord(lang: Language, word: str):
+     return [lang.word2index[char] for char in word]
+
+ def tensorFromWord(lang: Language, word: str):
+     indexes = indexesFromWord(lang, word)
+     indexes.append(EOS_token)
+     return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)
+
+ def tensorsFromPair(input_lang: Language, output_lang: Language, pair: list[str]):
+     input_tensor = tensorFromWord(input_lang, pair[0])
+     target_tensor = tensorFromWord(output_lang, pair[1])
+     return (input_tensor, target_tensor)
+
+ def params_definition():
+     """
+     params:
+
+     embed_size : size of embedding (input and output) (8, 16, 32, 64)
+     hidden_size : size of hidden layer (64, 128, 256, 512)
+     cell_type : type of cell (LSTM, GRU, RNN)
+     num_layers : number of layers in encoder (1, 2, 3)
+     dropout : dropout probability
+     learning_rate : learning rate
+     teacher_forcing_ratio : teacher forcing ratio (0.5 fixed for now)
+     optimizer : optimizer (SGD, Adam)
+     max_length : maximum length of input word (50 fixed for now)
+
+     """
+     pass
+
+ PRINT_EVERY = 5000
+ PLOT_EVERY = 100
+
+ class Translator:
+     def __init__(self, lang: str, params: dict):
+         self.lang = lang
+         self.input_lang, self.output_lang, self.pairs = get_languages(self.lang)
+         self.input_size = self.input_lang.n_chars
+         self.output_size = self.output_lang.n_chars
+
+         self.training_pairs = [tensorsFromPair(self.input_lang, self.output_lang, pair) for pair in self.pairs]
+
+         self.encoder = Encoder(in_sz = self.input_size,
+                                embed_sz = params["embed_size"],
+                                hidden_sz = params["hidden_size"],
+                                cell_type = params["cell_type"],
+                                n_layers = params["num_layers"],
+                                dropout = params["dropout"]).to(device)
+
+         self.decoder = AttentionDecoder(out_sz = self.output_size,
+                                         embed_sz = params["embed_size"],
+                                         hidden_sz = params["hidden_size"],
+                                         cell_type = params["cell_type"],
+                                         n_layers = params["num_layers"],
+                                         dropout = params["dropout"]).to(device)
+
+         self.encoder_optimizer = get_optimizer(params["optimizer"])(self.encoder.parameters(), lr=params["learning_rate"], weight_decay=params["weight_decay"])
+         self.decoder_optimizer = get_optimizer(params["optimizer"])(self.decoder.parameters(), lr=params["learning_rate"], weight_decay=params["weight_decay"])
+
+         self.criterion = nn.NLLLoss()
+
+         self.teacher_forcing_ratio = params["teacher_forcing_ratio"]
+         self.max_length = params["max_length"]
+
+     def train_single(self, input_tensor, target_tensor):
+         encoder_hidden = self.encoder.initHidden()
+         encoder_cell = self.encoder.initHidden()
+
+         self.encoder_optimizer.zero_grad()
+         self.decoder_optimizer.zero_grad()
+
+         input_length = input_tensor.size(0)
+         target_length = target_tensor.size(0)
+
+         encoder_outputs = torch.zeros(self.max_length, self.encoder.hidden_sz, device=device)
+
+         loss = 0
+
+         for ei in range(input_length):
+             encoder_output, encoder_hidden, encoder_cell = self.encoder(input_tensor[ei], encoder_hidden, encoder_cell)
+             encoder_outputs[ei] = encoder_output[0, 0]
+
+         decoder_input = torch.tensor([[SOS_token]], device=device)
+         decoder_hidden, decoder_cell = encoder_hidden, encoder_cell
+
+         use_teacher_forcing = random.random() < self.teacher_forcing_ratio
+
+         if use_teacher_forcing:
+             for di in range(target_length):
+                 decoder_output, decoder_hidden, decoder_cell, decoder_attention = self.decoder(decoder_input, decoder_hidden, decoder_cell, encoder_outputs)
+                 loss += self.criterion(decoder_output, target_tensor[di])
+
+                 decoder_input = target_tensor[di]
+         else:
+             for di in range(target_length):
+                 decoder_output, decoder_hidden, decoder_cell, decoder_attention = self.decoder(decoder_input, decoder_hidden, decoder_cell, encoder_outputs)
+                 loss += self.criterion(decoder_output, target_tensor[di])
+
+                 topv, topi = decoder_output.topk(1)
+                 decoder_input = topi.squeeze().detach()
+                 if decoder_input.item() == EOS_token:
+                     break
+
+         loss.backward()
+         self.encoder_optimizer.step()
+         self.decoder_optimizer.step()
+
+         return loss.item() / target_length
+
+     def train(self, iters=-1):
+         start_time = time.time()
+         plot_losses = []
+         print_loss_total = 0
+         plot_loss_total = 0
+
+         random.shuffle(self.training_pairs)
+         iters = len(self.training_pairs) if iters == -1 else iters
+
+         for iter in range(1, iters + 1):
+             training_pair = self.training_pairs[iter - 1]
+             input_tensor = training_pair[0]
+             target_tensor = training_pair[1]
+
+             loss = self.train_single(input_tensor, target_tensor)
+             print_loss_total += loss
+             plot_loss_total += loss
+
+             if iter % PRINT_EVERY == 0:
+                 print_loss_avg = print_loss_total / PRINT_EVERY
+                 print_loss_total = 0
+                 current_time = time.time()
+                 print("Loss: {:.4f} | Iterations: {} | Time: {:.3f}".format(print_loss_avg, iter, current_time - start_time))
+
+             if iter % PLOT_EVERY == 0:
+                 plot_loss_avg = plot_loss_total / PLOT_EVERY
+                 plot_losses.append(plot_loss_avg)
+                 plot_loss_total = 0
+
+         return plot_losses
+
+     def evaluate(self, word):
+         with torch.no_grad():
+             input_tensor = tensorFromWord(self.input_lang, word)
+             input_length = input_tensor.size()[0]
+             encoder_hidden = self.encoder.initHidden()
+             encoder_cell = self.encoder.initHidden()
+
+             encoder_outputs = torch.zeros(self.max_length, self.encoder.hidden_sz, device=device)
+
+             for ei in range(input_length):
+                 encoder_output, encoder_hidden, encoder_cell = self.encoder(input_tensor[ei], encoder_hidden, encoder_cell)
+                 encoder_outputs[ei] += encoder_output[0, 0]
+
+             decoder_input = torch.tensor([[SOS_token]], device=device)
+             decoder_hidden, decoder_cell = encoder_hidden, encoder_cell
+
+             decoded_chars = ""
+             decoder_attentions = torch.zeros(self.max_length, self.max_length)
+
+             for di in range(self.max_length):
+                 decoder_output, decoder_hidden, decoder_cell, decoder_attention = self.decoder(decoder_input, decoder_hidden, decoder_cell, encoder_outputs)
+                 decoder_attentions[di] = decoder_attention.data
+                 topv, topi = decoder_output.topk(1)
+
+                 if topi.item() == EOS_token:
+                     break
+                 else:
+                     decoded_chars += self.output_lang.index2word[topi.item()]
+
+                 decoder_input = topi.squeeze().detach()
+
+             return decoded_chars, decoder_attentions[:di + 1]
+
+     def test_validate(self, type: str):
+         pairs = get_data(self.lang, type)
+         accuracy = 0
+         for pair in pairs:
+             output, _ = self.evaluate(pair[0])
+             if output == pair[1]:
+                 accuracy += 1
+         return accuracy / len(pairs)
+
+ params = {
+     "embed_size": 32,
+     "hidden_size": 256,
+     "cell_type": "RNN",
+     "num_layers": 2,
+     "dropout": 0.1,
+     "learning_rate": 0.001,
+     "optimizer": "SGD",
+     "teacher_forcing_ratio": 0.5,
+     "max_length": 50,
+     "weight_decay": 0.001
+ }
+
+ model = Translator('tam', params)
+
+ model.encoder.load_state_dict(torch.load('./best_models_attn/encoder.pt'))
+ model.decoder.load_state_dict(torch.load('./best_models_attn/decoder.pt'))
+
+ with open("test_gen_attn.txt", "w") as f:
+     test_data = get_data("tam", "test")
+     f.write("Input, Target, Output\n")
+     accuracy = 0
+     for i in range(len(test_data)):
+         word, _ = model.evaluate(test_data[i][0])
+         f.write(test_data[i][0] + ", " + test_data[i][1] + ", " + word + "\n")
+         if test_data[i][1] == word:
+             accuracy += 1
+
+ print("Test Accuracy: " + str(accuracy/len(test_data) * 100) + "%")
test_best_vanilla.py ADDED
@@ -0,0 +1,36 @@
+ from src.translator import Translator
+ import torch
+ import random
+ from src.helper import get_data
+
+ random.seed()
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ params = {
+     "embed_size": 16,
+     "hidden_size": 512,
+     "cell_type": "LSTM",
+     "num_layers": 2,
+     "dropout": 0.1,
+     "learning_rate": 0.005,
+     "optimizer": "SGD",
+     "teacher_forcing_ratio": 0.5,
+     "max_length": 50
+ }
+
+ model = Translator("tam", params, device)
+
+ model.encoder.load_state_dict(torch.load("./best_model_vanilla/encoder.pt"))
+ model.decoder.load_state_dict(torch.load("./best_model_vanilla/decoder.pt"))
+
+ with open("test_gen.txt", "w") as f:
+     test_data = get_data("tam", "test")
+     f.write("Input, Target, Output\n")
+     accuracy = 0
+     for i in range(len(test_data)):
+         output = model.evaluate(test_data[i][0])  # evaluate once and reuse the prediction
+         f.write(test_data[i][0] + ", " + test_data[i][1] + ", " + output + "\n")
+         if test_data[i][1] == output:
+             accuracy += 1
+
+ print("Test Accuracy: " + str(accuracy/len(test_data) * 100) + "%")
train.py ADDED
@@ -0,0 +1,91 @@
+ from src.translator import Translator
+ import torch
+ import random
+ import argparse
+
+ random.seed()
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ params = {
+     "embed_size": 16,
+     "hidden_size": 512,
+     "cell_type": "LSTM",
+     "num_layers": 2,
+     "dropout": 0.1,
+     "learning_rate": 0.005,
+     "optimizer": "SGD",
+     "teacher_forcing_ratio": 0.5,
+     "max_length": 50
+ }
+
+ language = "tam"
+
+ # Argument Parser
+ parser = argparse.ArgumentParser(description="Transliteration Model")
+ parser.add_argument("-es", "--embed_size", type=int, default=16, help="Embedding Size, good_choices = [8, 16, 32]")
+ parser.add_argument("-hs", "--hidden_size", type=int, default=512, help="Hidden Size, good_choices = [128, 256, 512]")
+ parser.add_argument("-ct", "--cell_type", type=str, default="LSTM", help="Cell Type, choices: [LSTM, GRU, RNN]")
+ parser.add_argument("-nl", "--num_layers", type=int, default=2, help="Number of Layers, choices: [1, 2, 3]")
+ parser.add_argument("-d", "--dropout", type=float, default=0.1, help="Dropout, good_choices: [0, 0.1, 0.2]")
+ parser.add_argument("-lr", "--learning_rate", type=float, default=0.005, help="Learning Rate, good_choices: [0.0005, 0.001, 0.005]")
+ parser.add_argument("-o", "--optimizer", type=str, default="SGD", help="Optimizer, choices: [SGD, ADAM]")
+ parser.add_argument("-l", "--language", type=str, default="tam", help="Language")
+ args = parser.parse_args()
+
+ params["embed_size"] = args.embed_size
+ params["hidden_size"] = args.hidden_size
+ params["cell_type"] = args.cell_type
+ params["num_layers"] = args.num_layers
+ params["dropout"] = args.dropout
+ params["learning_rate"] = args.learning_rate
+ params["optimizer"] = args.optimizer
+ language = args.language
+
+ model = Translator(language, params, device)
+
+ print("Training Model")
+ print("Language: {}".format(language))
+ print("Embedding Size: {}".format(params["embed_size"]))
+ print("Hidden Size: {}".format(params["hidden_size"]))
+ print("Cell Type: {}".format(params["cell_type"]))
+ print("Number of Layers: {}".format(params["num_layers"]))
+ print("Dropout: {}".format(params["dropout"]))
+ print("Learning Rate: {}".format(params["learning_rate"]))
+ print("Optimizer: {}".format(params["optimizer"]))
+ print("Teacher Forcing Ratio: {}".format(params["teacher_forcing_ratio"]))
+ print("Max Length: {}\n".format(params["max_length"]))
+
+ epochs = 10
+ old_validation_accuracy = 0
+
+ for epoch in range(epochs):
+     print("Epoch: {}".format(epoch + 1))
+     plot_losses = model.train()
+
+     # take average of plot losses as training loss
+     training_loss = sum(plot_losses) / len(plot_losses)
+
+     print("Training Loss: {:.4f}".format(training_loss))
+
+     training_accuracy = model.test_validate('train')
+     print("Training Accuracy: {:.4f}".format(training_accuracy))
+
+     validation_accuracy = model.test_validate('valid')
+     print("Validation Accuracy: {:.4f}".format(validation_accuracy))
+
+     if epoch > 0:
+         if validation_accuracy < 0.0001:
+             print("Validation Accuracy is too low. Stopping training.")
+             break
+
+         if validation_accuracy < 0.95 * old_validation_accuracy:
+             print("Validation Accuracy is decreasing. Stopping training.")
+             break
+
+     old_validation_accuracy = validation_accuracy
+ print("Training Complete")
+
+ print("Testing Model")
+ test_accuracy = model.test_validate('test')
+ print("Test Accuracy: {:.4f}".format(test_accuracy))
+ print("Testing Complete")
train_attention.py ADDED
@@ -0,0 +1,406 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import torch.optim as optim
+ import random
+ import numpy as np
+ import matplotlib.pyplot as plt
+ import pandas as pd
+ import time
+ import argparse
+
+ random.seed()
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Language Model
+ SOS_token = 0
+ EOS_token = 1
+
+ class Language:
+     def __init__(self, name):
+         self.name = name
+         self.word2index = {}
+         self.word2count = {}
+         self.index2word = {SOS_token: "<", EOS_token: ">"}
+         self.n_chars = 2  # Count SOS and EOS
+
+     def addWord(self, word):
+         for char in word:
+             self.addChar(char)
+
+     def addChar(self, char):
+         if char not in self.word2index:
+             self.word2index[char] = self.n_chars
+             self.word2count[char] = 1
+             self.index2word[self.n_chars] = char
+             self.n_chars += 1
+         else:
+             self.word2count[char] += 1
+
+ def get_data(lang: str, type: str) -> list[list[str]]:
+     """
+     Returns: 'pairs': list of [input_word, target_word] pairs
+     """
+     path = "./aksharantar_sampled/{}/{}_{}.csv".format(lang, lang, type)
+     df = pd.read_csv(path, header=None)
+     pairs = df.values.tolist()
+     return pairs
+
+ def get_languages(lang: str):
+     """
+     Returns
+     1. input_lang: input language - English
+     2. output_lang: output language - Given language
+     3. pairs: list of [input_word, target_word] pairs
+     """
+     input_lang = Language('eng')
+     output_lang = Language(lang)
+     pairs = get_data(lang, "train")
+     for pair in pairs:
+         input_lang.addWord(pair[0])
+         output_lang.addWord(pair[1])
+     return input_lang, output_lang, pairs
+
+ def get_cell(cell_type: str):
+     if cell_type == "LSTM":
+         return nn.LSTM
+     elif cell_type == "GRU":
+         return nn.GRU
+     elif cell_type == "RNN":
+         return nn.RNN
+     else:
+         raise Exception("Invalid cell type")
+
+ def get_optimizer(optimizer: str):
+     if optimizer == "SGD":
+         return optim.SGD
+     elif optimizer == "ADAM":
+         return optim.Adam
+     else:
+         raise Exception("Invalid optimizer")
+
+ class Encoder(nn.Module):
+     def __init__(self,
+                  in_sz: int,
+                  embed_sz: int,
+                  hidden_sz: int,
+                  cell_type: str,
+                  n_layers: int,
+                  dropout: float):
+
+         super(Encoder, self).__init__()
+         self.hidden_sz = hidden_sz
+         self.n_layers = n_layers
+         self.dropout = dropout
+         self.cell_type = cell_type
+         self.embedding = nn.Embedding(in_sz, embed_sz)
+
+         self.rnn = get_cell(cell_type)(input_size = embed_sz,
+                                        hidden_size = hidden_sz,
+                                        num_layers = n_layers,
+                                        dropout = dropout)
+
+     def forward(self, input, hidden, cell):
+         embedded = self.embedding(input).view(1, 1, -1)
+
+         if self.cell_type == "LSTM":
+             output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
+         else:
+             output, hidden = self.rnn(embedded, hidden)
+
+         return output, hidden, cell
+
+     def initHidden(self):
+         return torch.zeros(self.n_layers, 1, self.hidden_sz, device=device)
+
+ class AttentionDecoder(nn.Module):
+     def __init__(self,
+                  out_sz: int,
+                  embed_sz: int,
+                  hidden_sz: int,
+                  cell_type: str,
+                  n_layers: int,
+                  dropout: float):
+
+         super(AttentionDecoder, self).__init__()
+         self.hidden_sz = hidden_sz
+         self.n_layers = n_layers
+         self.dropout = dropout
+         self.cell_type = cell_type
+         self.embedding = nn.Embedding(out_sz, embed_sz)
+
+         self.attn = nn.Linear(hidden_sz + embed_sz, 50)  # 50 = max_length: one attention score per encoder position
+         self.attn_combine = nn.Linear(hidden_sz + embed_sz, hidden_sz)
+
+         self.rnn = get_cell(cell_type)(input_size = hidden_sz,
+                                        hidden_size = hidden_sz,
+                                        num_layers = n_layers,
+                                        dropout = dropout)
+
+         self.out = nn.Linear(hidden_sz, out_sz)
+         self.softmax = nn.LogSoftmax(dim=1)
+
+     def forward(self, input, hidden, cell, encoder_outputs):
+         embedding = self.embedding(input).view(1, 1, -1)
+
+         attn_weights = F.softmax(self.attn(torch.cat((embedding[0], hidden[0]), 1)), dim=1)
+         attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
+
+         output = torch.cat((embedding[0], attn_applied[0]), 1)
+         output = self.attn_combine(output).unsqueeze(0)
+
+         if self.cell_type == "LSTM":
+             output, (hidden, cell) = self.rnn(output, (hidden, cell))
+         else:
+             output, hidden = self.rnn(output, hidden)
+
+         output = self.softmax(self.out(output[0]))
+         return output, hidden, cell, attn_weights
+
+     def initHidden(self):
+         return torch.zeros(self.n_layers, 1, self.hidden_sz, device=device)
+
+ def indexesFromWord(lang: Language, word: str):
+     return [lang.word2index[char] for char in word]
+
+ def tensorFromWord(lang: Language, word: str):
+     indexes = indexesFromWord(lang, word)
+     indexes.append(EOS_token)
+     return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)
+
+ def tensorsFromPair(input_lang: Language, output_lang: Language, pair: list[str]):
+     input_tensor = tensorFromWord(input_lang, pair[0])
+     target_tensor = tensorFromWord(output_lang, pair[1])
+     return (input_tensor, target_tensor)
+
+ def params_definition():
+     """
+     params:
+
+     embed_size : size of embedding (input and output) (8, 16, 32, 64)
+     hidden_size : size of hidden layer (64, 128, 256, 512)
+     cell_type : type of cell (LSTM, GRU, RNN)
+     num_layers : number of layers in encoder (1, 2, 3)
+     dropout : dropout probability
+     learning_rate : learning rate
+     teacher_forcing_ratio : teacher forcing ratio (0.5 fixed for now)
+     optimizer : optimizer (SGD, Adam)
+     max_length : maximum length of input word (50 fixed for now)
+
+     """
+     pass
+
+ PRINT_EVERY = 5000
+ PLOT_EVERY = 100
+
+ class Translator:
+     def __init__(self, lang: str, params: dict):
+         self.lang = lang
+         self.input_lang, self.output_lang, self.pairs = get_languages(self.lang)
+         self.input_size = self.input_lang.n_chars
+         self.output_size = self.output_lang.n_chars
+
+         self.training_pairs = [tensorsFromPair(self.input_lang, self.output_lang, pair) for pair in self.pairs]
+
+         self.encoder = Encoder(in_sz = self.input_size,
+                                embed_sz = params["embed_size"],
+                                hidden_sz = params["hidden_size"],
+                                cell_type = params["cell_type"],
+                                n_layers = params["num_layers"],
+                                dropout = params["dropout"]).to(device)
+
+         self.decoder = AttentionDecoder(out_sz = self.output_size,
+                                         embed_sz = params["embed_size"],
+                                         hidden_sz = params["hidden_size"],
+                                         cell_type = params["cell_type"],
+                                         n_layers = params["num_layers"],
+                                         dropout = params["dropout"]).to(device)
+
+         self.encoder_optimizer = get_optimizer(params["optimizer"])(self.encoder.parameters(), lr=params["learning_rate"], weight_decay=params["weight_decay"])
+         self.decoder_optimizer = get_optimizer(params["optimizer"])(self.decoder.parameters(), lr=params["learning_rate"], weight_decay=params["weight_decay"])
+
+         self.criterion = nn.NLLLoss()
+
+         self.teacher_forcing_ratio = params["teacher_forcing_ratio"]
+         self.max_length = params["max_length"]
+
+     def train_single(self, input_tensor, target_tensor):
+         encoder_hidden = self.encoder.initHidden()
+         encoder_cell = self.encoder.initHidden()
+
+         self.encoder_optimizer.zero_grad()
+         self.decoder_optimizer.zero_grad()
+
+         input_length = input_tensor.size(0)
+         target_length = target_tensor.size(0)
+
+         encoder_outputs = torch.zeros(self.max_length, self.encoder.hidden_sz, device=device)
+
+         loss = 0
+
+         for ei in range(input_length):
+             encoder_output, encoder_hidden, encoder_cell = self.encoder(input_tensor[ei], encoder_hidden, encoder_cell)
+             encoder_outputs[ei] = encoder_output[0, 0]
+
+         decoder_input = torch.tensor([[SOS_token]], device=device)
+         decoder_hidden, decoder_cell = encoder_hidden, encoder_cell
+
+         use_teacher_forcing = random.random() < self.teacher_forcing_ratio
+
+         if use_teacher_forcing:
+             for di in range(target_length):
+                 decoder_output, decoder_hidden, decoder_cell, decoder_attention = self.decoder(decoder_input, decoder_hidden, decoder_cell, encoder_outputs)
+                 loss += self.criterion(decoder_output, target_tensor[di])
+
+                 decoder_input = target_tensor[di]
+         else:
+             for di in range(target_length):
+                 decoder_output, decoder_hidden, decoder_cell, decoder_attention = self.decoder(decoder_input, decoder_hidden, decoder_cell, encoder_outputs)
+                 loss += self.criterion(decoder_output, target_tensor[di])
+
+                 topv, topi = decoder_output.topk(1)
+                 decoder_input = topi.squeeze().detach()
+                 if decoder_input.item() == EOS_token:
+                     break
+
+         loss.backward()
+         self.encoder_optimizer.step()
+         self.decoder_optimizer.step()
+
+         return loss.item() / target_length
+
+     def train(self, iters=-1):
+         start_time = time.time()
+         plot_losses = []
+         print_loss_total = 0
+         plot_loss_total = 0
+
+         random.shuffle(self.training_pairs)
+         iters = len(self.training_pairs) if iters == -1 else iters
+
+         for iter in range(1, iters + 1):
+             training_pair = self.training_pairs[iter - 1]
+             input_tensor = training_pair[0]
+             target_tensor = training_pair[1]
+
+             loss = self.train_single(input_tensor, target_tensor)
+             print_loss_total += loss
+             plot_loss_total += loss
+
+             if iter % PRINT_EVERY == 0:
+                 print_loss_avg = print_loss_total / PRINT_EVERY
+                 print_loss_total = 0
+                 current_time = time.time()
+                 print("Loss: {:.4f} | Iterations: {} | Time: {:.3f}".format(print_loss_avg, iter, current_time - start_time))
+
+             if iter % PLOT_EVERY == 0:
+                 plot_loss_avg = plot_loss_total / PLOT_EVERY
+                 plot_losses.append(plot_loss_avg)
+                 plot_loss_total = 0
+
+         return plot_losses
+
+     def evaluate(self, word):
+         with torch.no_grad():
+             input_tensor = tensorFromWord(self.input_lang, word)
+             input_length = input_tensor.size()[0]
+             encoder_hidden = self.encoder.initHidden()
+             encoder_cell = self.encoder.initHidden()
+
+             encoder_outputs = torch.zeros(self.max_length, self.encoder.hidden_sz, device=device)
+
+             for ei in range(input_length):
+                 encoder_output, encoder_hidden, encoder_cell = self.encoder(input_tensor[ei], encoder_hidden, encoder_cell)
+                 encoder_outputs[ei] += encoder_output[0, 0]
+
+             decoder_input = torch.tensor([[SOS_token]], device=device)
+             decoder_hidden, decoder_cell = encoder_hidden, encoder_cell
+
+             decoded_chars = ""
+             decoder_attentions = torch.zeros(self.max_length, self.max_length)
+
+             for di in range(self.max_length):
+                 decoder_output, decoder_hidden, decoder_cell, decoder_attention = self.decoder(decoder_input, decoder_hidden, decoder_cell, encoder_outputs)
+                 decoder_attentions[di] = decoder_attention.data
+                 topv, topi = decoder_output.topk(1)
+
+                 if topi.item() == EOS_token:
+                     break
+                 else:
+                     decoded_chars += self.output_lang.index2word[topi.item()]
+
+                 decoder_input = topi.squeeze().detach()
+
+             return decoded_chars, decoder_attentions[:di + 1]
+
+     def test_validate(self, type: str):
+         pairs = get_data(self.lang, type)
+         accuracy = 0
+         for pair in pairs:
+             output, _ = self.evaluate(pair[0])
+             if output == pair[1]:
+                 accuracy += 1
+         return accuracy / len(pairs)
+
+ params = {
+     "embed_size": 32,
+     "hidden_size": 256,
+     "cell_type": "RNN",
+     "num_layers": 2,
+     "dropout": 0,
+     "learning_rate": 0.001,
+     "optimizer": "SGD",
+     "teacher_forcing_ratio": 0.5,
+     "max_length": 50,
+     "weight_decay": 0.001
+ }
+
+ language = "tam"
+
+ parser = argparse.ArgumentParser(description="Transliteration Model with Attention")
+ parser.add_argument('-es', '--embed_size', type=int, default=32, help='Embedding size')
+ parser.add_argument('-hs', '--hidden_size', type=int, default=256, help='Hidden size')
+ parser.add_argument('-ct', '--cell_type', type=str, default='RNN', help='Cell type')
+ parser.add_argument('-nl', '--num_layers', type=int, default=2, help='Number of layers')
+ parser.add_argument('-dr', '--dropout', type=float, default=0, help='Dropout')
+ parser.add_argument('-lr', '--learning_rate', type=float, default=0.001, help='Learning rate')
+ parser.add_argument('-op', '--optimizer', type=str, default='SGD', help='Optimizer')
+ parser.add_argument('-wd', '--weight_decay', type=float, default=0.001, help='Weight decay')
+ parser.add_argument('-l', '--lang', type=str, default='tam', help='Language')
+
+ args = parser.parse_args()
+
+ for arg in vars(args):
+     params[arg] = getattr(args, arg)
+
+ language = args.lang
+
+ print("Language: {}".format(language))
+ print("Embedding size: {}".format(params['embed_size']))
+ print("Hidden size: {}".format(params['hidden_size']))
+ print("Cell type: {}".format(params['cell_type']))
+ print("Number of layers: {}".format(params['num_layers']))
+ print("Dropout: {}".format(params['dropout']))
+ print("Learning rate: {}".format(params['learning_rate']))
+ print("Optimizer: {}".format(params['optimizer']))
+ print("Weight decay: {}".format(params['weight_decay']))
+ print("Teacher forcing ratio: {}".format(params['teacher_forcing_ratio']))
+ print("Max length: {}".format(params['max_length']))
+
+ model = Translator(language, params)
+
+ epochs = 10
+
+ for epoch in range(epochs):
+     print("Epoch: {}".format(epoch + 1))
+     model.train()
+
+     train_accuracy = model.test_validate('train')
+     print("Training Accuracy: {:.4f}".format(train_accuracy))
+
+     validation_accuracy = model.test_validate('valid')
+     print("Validation Accuracy: {:.4f}".format(validation_accuracy))
+
+     test_accuracy = model.test_validate('test')
+     print("Test Accuracy: {:.4f}".format(test_accuracy))