---
language:
- it
- en
pipeline_tag: translation
---

# OratioAI

OratioAI is a sequence-to-sequence language-translation model implementing the methods outlined in *Attention Is All You Need*.

1. Input Tokenization:
The source and target sentences are tokenized using custom WordPiece tokenizers. Tokens are mapped to embeddings via the InputEmbeddings module and scaled by the square root of the model dimension (see the sketches after this list).

2. Positional Encoding:
Positional information is added to the token embeddings using a fixed sinusoidal encoding strategy (sketched below).

3. Encoding Phase:
The encoder processes the source sequence, transforming token embeddings into contextual representations using stacked EncoderBlock modules (sketched below).

4. Decoding Phase:
The decoder autoregressively generates target tokens by attending to both the previously generated tokens and the encoder outputs; cross-attention layers align the source and target sequences (sketched below).

5. Projection:
Final decoder outputs are projected into the target vocabulary space for token prediction (sketched below).

6. Output Generation:
Decoding is performed with beam search or a greedy strategy to produce the final translated sentence (a greedy-decoding sketch appears below).

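The sketches below illustrate each numbered step in PyTorch. They are minimal approximations for readers, not the project's actual modules; all hyperparameters, names, and signatures beyond those mentioned above are assumptions. Step 1, token embedding with the paper's sqrt(d_model) scaling:

```python
import math

import torch
import torch.nn as nn


class InputEmbeddings(nn.Module):
    """Step 1 (sketch): map token IDs to embeddings, scaled as in the paper."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Scaling by sqrt(d_model) keeps embedding magnitudes comparable
        # to the positional encoding that is added next.
        return self.embedding(token_ids) * math.sqrt(self.d_model)
```
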
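Step 2, the fixed sinusoidal positional encoding; `max_len` is an assumed upper bound on sequence length:

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Step 2 (sketch): add a fixed sinusoidal position signal (assumes even d_model)."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); the position signal is fixed, not learned.
        return x + self.pe[:, : x.size(1)]
```
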
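Step 3, one encoder layer and the encoder stack. `torch.nn.MultiheadAttention` stands in for whatever attention implementation the project uses; only the name EncoderBlock comes from the description above:

```python
import torch.nn as nn


class EncoderBlock(nn.Module):
    """Step 3 (sketch): self-attention + feed-forward, each with residual + LayerNorm."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, src_key_padding_mask=None):
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=src_key_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))       # residual + norm
        return self.norm2(x + self.dropout(self.ff(x)))  # residual + norm


class Encoder(nn.Module):
    """Stacked EncoderBlock modules producing contextual source representations."""

    def __init__(self, n_layers: int, **block_kwargs):
        super().__init__()
        self.layers = nn.ModuleList(EncoderBlock(**block_kwargs) for _ in range(n_layers))

    def forward(self, x, src_key_padding_mask=None):
        for layer in self.layers:
            x = layer(x, src_key_padding_mask)
        return x
```
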
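Step 4, a decoder layer: masked self-attention over the target prefix, then cross-attention whose queries come from the decoder and whose keys/values come from the encoder output. The `causal_mask` helper is an assumed utility, not necessarily how the project builds its masks:

```python
import torch
import torch.nn as nn


def causal_mask(size: int) -> torch.Tensor:
    # Upper-triangular -inf entries block attention to future target positions.
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)


class DecoderBlock(nn.Module):
    """Step 4 (sketch): masked self-attention, cross-attention, feed-forward."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, memory, tgt_mask=None):
        # Each target position may only attend to earlier target positions.
        sa, _ = self.self_attn(x, x, x, attn_mask=tgt_mask)
        x = self.norms[0](x + self.dropout(sa))
        # Cross-attention aligns the target sequence with the source (memory).
        ca, _ = self.cross_attn(x, memory, memory)
        x = self.norms[1](x + self.dropout(ca))
        return self.norms[2](x + self.dropout(self.ff(x)))
```
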
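Step 5, the projection from decoder states to target-vocabulary logits:

```python
import torch.nn as nn


class ProjectionLayer(nn.Module):
    """Step 5 (sketch): (batch, seq_len, d_model) -> (batch, seq_len, vocab_size)."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # Raw logits; softmax is applied by the loss function or the decoding loop.
        return self.proj(x)
```
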
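Step 6, a greedy decoding loop. The `model.encode` / `model.decode` / `model.project` interface, the special-token IDs, and `max_len` are all assumptions for illustration; beam search would instead keep the top-k partial hypotheses at each step and expand each one:

```python
import torch


@torch.no_grad()
def greedy_decode(model, src_ids, sos_id: int, eos_id: int, max_len: int = 128):
    """Step 6 (sketch): pick the argmax token at every step until EOS."""
    memory = model.encode(src_ids)                         # (1, src_len, d_model)
    ys = torch.tensor([[sos_id]], device=src_ids.device)   # start with SOS
    for _ in range(max_len - 1):
        out = model.decode(ys, memory)                     # (1, tgt_len, d_model)
        logits = model.project(out[:, -1])                 # next-token logits
        next_id = logits.argmax(dim=-1, keepdim=True)      # greedy choice
        ys = torch.cat([ys, next_id], dim=1)
        if next_id.item() == eos_id:                       # stop at end-of-sequence
            break
    return ys.squeeze(0)
```
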
| Resource | Description |
|-----------------------------------|----------------------------------------------------------|
| [Training Space](https://huggingface.co/spaces/torinriley/OratioAI) | Hugging Face Space for training and testing the model. |
| [GitHub Source Code](https://huggingface.co/spaces/torinriley/OratioAI) | Source code repository for the translation project. |
| [Attention Is All You Need](https://arxiv.org/pdf/1706.03762) | The original Transformer paper, published by Google. |

| Dataset | Description |
|-----------------------------------|----------------------------------------------------------|
| [Europarl (en-it, v8)](https://opus.nlpl.eu/Europarl/en&it/v8/Europarl) | Parallel corpus used for main model training. |