---
tags:
- seq2seq
- character-level
- machine translation
---
## TensorFlow Keras Implementation of a Character-level Recurrent Sequence-to-Sequence Model

This repo contains code for the [character-level recurrent sequence-to-sequence model](https://keras.io/examples/nlp/lstm_seq2seq/) from the Keras examples.

Credits: [fchollet](https://twitter.com/fchollet) - Original Author

HF Contribution: [Rishav Chandra Varma](https://huggingface.co/reichenbach)

## Background Information

### Introduction

This example demonstrates how to implement a basic character-level recurrent sequence-to-sequence model. We apply it to translating short English sentences into short French sentences, character by character. Note that character-level machine translation is fairly unusual; word-level models are more common in this domain.

### Summary of the algorithm
We start with input sequences from a source domain (e.g. English sentences) and corresponding target sequences from another domain (e.g. French sentences).

An encoder LSTM turns the input sequences into 2 state vectors (we keep the last LSTM state and discard the outputs).

A decoder LSTM is trained to turn the target sequences into the same sequences offset by one timestep in the future, a training process called "teacher forcing" in this context. It uses the encoder's state vectors as its initial state. Effectively, the decoder learns to generate `targets[t+1...]` given `targets[...t]`, conditioned on the input sequence.
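For concreteness, here is a minimal Keras sketch of that encoder/decoder training setup, following the linked Keras example rather than this repo's exact code; the vocabulary sizes and `latent_dim` are illustrative placeholders.

```python
# Minimal sketch of the training model described above (illustrative values, not repo code).
from tensorflow import keras

num_encoder_tokens = 71   # size of the source (English) character vocabulary - example value
num_decoder_tokens = 93   # size of the target (French) character vocabulary - example value
latent_dim = 256          # dimensionality of the LSTM state vectors

# Encoder: keep only the final hidden and cell states, discard the per-timestep outputs.
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = keras.layers.LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: teacher forcing - it receives the target sequence and is trained to
# predict the same sequence shifted one timestep into the future,
# starting from the encoder's state vectors.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = keras.layers.Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

# Trained on (encoder_input_data, decoder_input_data) -> decoder_target_data,
# where decoder_target_data is decoder_input_data offset by one timestep.
model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```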
In inference mode, when we want to decode unknown input sequences, we:

- Encode the input sequence into state vectors.
- Start with a target sequence of size 1 (just the start-of-sequence character).
- Feed the state vectors and the 1-char target sequence to the decoder to produce predictions for the next character.
- Sample the next character using these predictions (we simply use argmax).
- Append the sampled character to the target sequence.
- Repeat until we generate the end-of-sequence character or we hit the character limit.
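A rough sketch of that decoding loop is below. It reuses the layers from the training sketch above; `target_token_index`, `reverse_target_char_index`, and `max_decoder_seq_length` are assumed to be the character lookup tables and length limit built during preprocessing (as in the linked example), not names defined in this README.

```python
# Minimal sketch of the inference loop described above.
# Assumes encoder_inputs, encoder_states, decoder_inputs, decoder_lstm, decoder_dense,
# latent_dim and num_decoder_tokens from the previous snippet, plus illustrative
# target_token_index / reverse_target_char_index dicts and max_decoder_seq_length.
import numpy as np
from tensorflow import keras

# Encoder model: input sequence -> state vectors.
encoder_model = keras.Model(encoder_inputs, encoder_states)

# Decoder model: (1-char target sequence + states) -> next-char probabilities + new states.
decoder_state_inputs = [keras.Input(shape=(latent_dim,)), keras.Input(shape=(latent_dim,))]
dec_outputs, dec_h, dec_c = decoder_lstm(decoder_inputs, initial_state=decoder_state_inputs)
dec_outputs = decoder_dense(dec_outputs)
decoder_model = keras.Model([decoder_inputs] + decoder_state_inputs, [dec_outputs, dec_h, dec_c])

def decode_sequence(input_seq):
    # Encode the input sequence into state vectors.
    states = encoder_model.predict(input_seq, verbose=0)

    # Start with a target sequence of size 1: just the start-of-sequence character ("\t").
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index["\t"]] = 1.0

    decoded = ""
    while True:
        # Predict the next character from the 1-char target sequence and current states.
        output_tokens, h, c = decoder_model.predict([target_seq] + states, verbose=0)

        # Sample the next character with argmax.
        token_index = int(np.argmax(output_tokens[0, -1, :]))
        char = reverse_target_char_index[token_index]
        if char == "\n" or len(decoded) > max_decoder_seq_length:
            break  # end-of-sequence character or character limit reached
        decoded += char

        # Feed the sampled character and the updated states back into the decoder.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, token_index] = 1.0
        states = [h, c]

    return decoded
```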