---

language: en
license: mit
tags:
  - gpt
  - transformer
  - small-model
  - from-scratch
  - babymodel
datasets:
  - roneneldan/TinyStories
library_name: transformers
pipeline_tag: text-generation
---



# 🍼 BabyLangModel

A tiny GPT-style language model trained from scratch on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. Built in PyTorch with a custom architecture inspired by [nanoGPT](https://github.com/karpathy/nanoGPT), and trained for 200k iterations on a consumer GPU (RTX 4060).

---

## 🧠 Model Details

- **Architecture**: GPT (custom implementation)
- **Parameters**: ~10–15M
- **Layers**: 6
- **Heads**: 6
- **Embedding Size**: 384
- **Block Size**: 128
- **Tokenizer**: GPT-2 (`tiktoken`)
- **Training Steps**: 200,000
- **Training Loss**: ~1.80
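
The ~10–15M figure is consistent with a quick back-of-envelope count of the non-embedding weights, assuming a standard GPT-2-style block layout (as in nanoGPT). This is a sketch derived from the numbers above, not a readout of the checkpoint:

```python
# Back-of-envelope, non-embedding parameter count for the config above,
# assuming a standard GPT-2-style block (attention + 4x MLP; biases and
# LayerNorms ignored). Illustrative only.
n_layer, n_embd, block_size, vocab_size = 6, 384, 128, 50257

attn_per_block = 4 * n_embd * n_embd   # q, k, v and output projections
mlp_per_block = 8 * n_embd * n_embd    # 4x up-projection + down-projection
transformer = n_layer * (attn_per_block + mlp_per_block) + block_size * n_embd

token_embeddings = vocab_size * n_embd  # usually weight-tied with the LM head

print(f"non-embedding params ≈ {transformer / 1e6:.1f}M")      # ≈ 10.7M
print(f"token embeddings     ≈ {token_embeddings / 1e6:.1f}M")  # ≈ 19.3M
```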

---

## πŸ“š Training Data

We trained on the open-source **[TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)** dataset by Microsoft Research. It's a dataset of short, simple English stories written for young children (ages 2–4).

- Clean, simple narratives
- Ideal for small model generalization
- 100% open and publicly available

---

## 🧰 Usage (with `transformers`)

This model uses a **custom architecture**, so you need to use `trust_remote_code=True`:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("Exquisique/BabyLangModel", trust_remote_code=True)
```
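
The repository does not yet bundle a `transformers` tokenizer, so the quickest way to run a prompt end to end is to encode it yourself with `tiktoken`'s GPT-2 encoding (the tokenizer used in training). The `generate(idx, max_new_tokens=...)` call below is an assumption based on a nanoGPT-style interface; check the repo's modeling code for the actual signature:

```python
# Hypothetical end-to-end sketch: ids via tiktoken's GPT-2 BPE, then sampling.
# NOTE: the generate(idx, max_new_tokens=...) signature is an assumption
# (nanoGPT-style); check the repo's modeling code for the real interface.
import tiktoken
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("Exquisique/BabyLangModel", trust_remote_code=True)
model.eval()

enc = tiktoken.get_encoding("gpt2")
prompt = "Once upon a time there was a tiny robot who"
idx = torch.tensor([enc.encode(prompt)], dtype=torch.long)  # shape (1, T)

with torch.no_grad():
    out = model.generate(idx, max_new_tokens=100)  # assumed nanoGPT-style API

print(enc.decode(out[0].tolist()))
```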

---

## ✨ Sample Generation

```text
Prompt: Once upon a time there was a tiny robot who

Output: ...lived in a far away home. One day, a little girl named Lily decided to go on a special trip in the forest. She walked and walked until she got there but suddenly she started to go. Her mom called her and said, "Don't worry, Lily. We will get you my special ride."
```

> πŸ—£οΈ Still improving, but quite readable and story-like after 200k iterations!

---

## πŸ’» Train It Yourself

You can find the full training code on [GitHub](https://github.com/Exquisique/Babymodel) or use this structure:

```bash
python -m src.tokenizer      # Tokenize TinyStories
python -m src.train          # Train model from scratch
python -m src.generate       # Generate text
```
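
For reference, the tokenization step boils down to encoding every story with the GPT-2 BPE and writing the ids to a flat binary file. This is a rough sketch of what `src.tokenizer` does, not the actual script; the `"text"` field name and `train.bin` layout are assumptions:

```python
# Rough sketch of the TinyStories tokenization step (see src.tokenizer in the
# repo for the real thing; the "text" field and train.bin layout are assumptions).
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
dataset = load_dataset("roneneldan/TinyStories", split="train")

ids = []
for example in dataset:
    ids.extend(enc.encode_ordinary(example["text"]))  # plain BPE, no special tokens
    ids.append(enc.eot_token)                         # document separator

# GPT-2 vocab (50257) fits in uint16
np.array(ids, dtype=np.uint16).tofile("train.bin")
```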

You’ll also find:
- Checkpointing & resume support
- Configurable hyperparameters
- Gradient accumulation & mixed precision (a minimal sketch of that loop follows below)
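
At a high level, the accumulation/mixed-precision part of a training step looks roughly like this. `model`, `get_batch`, and `max_iters` are placeholders, not the actual training script; the real loop lives in `src.train`, and the forward returning `(logits, loss)` is an assumed nanoGPT-style interface:

```python
# Minimal sketch of a gradient-accumulation + mixed-precision training step.
# `model`, `get_batch`, `max_iters` are placeholders; the real loop is in
# `src.train` and may differ in detail.
import torch

accum_steps = 4                           # effective batch = micro-batch * accum_steps
scaler = torch.cuda.amp.GradScaler()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(max_iters):
    optimizer.zero_grad(set_to_none=True)
    for _ in range(accum_steps):
        x, y = get_batch("train")                        # (B, T) token ids and targets
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            _, loss = model(x, y)                        # assumed: returns (logits, loss)
        scaler.scale(loss / accum_steps).backward()      # average over micro-batches
    scaler.step(optimizer)
    scaler.update()
```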

---

## πŸ”§ Config Used

```json
{
  "vocab_size": 50257,
  "block_size": 128,
  "n_layer": 6,
  "n_head": 6,
  "n_embd": 384,
  "model_type": "gpt"
}
```

---

## πŸ“¦ Inference Notes

To load the model, use:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("Exquisique/BabyLangModel", trust_remote_code=True)
```

A `transformers` tokenizer is not bundled yet; until one is uploaded, you can encode prompts yourself with `tiktoken`'s GPT-2 encoding (as in the usage sketch above) for full text-input support.

---

## πŸ§‘β€πŸ’» Author
**Exquisique** β€” GenAI explorer, poetic dreamer, and neural model whisperer.

---

## πŸ“œ License
MIT β€” open source, fine-tune and remix freely. ✨