|
---
license: mit
datasets:
- wikimedia/wikipedia
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- pytorch
- Thinking
- CustomModel
---
|
|
|
# Latent Recurrent Depth Language Model |
|
|
|
## Overview |
|
|
|
The Latent Recurrent Depth Language Model (LRD-LM) is an experimental text-generation architecture designed to capture deeper contextual information through iterative, latent processing. Instead of generating verbose chain-of-thought sequences, LRD-LM refines its internal state over multiple recurrent iterations to improve text generation quality while keeping the parameter count modest. |
|
|
|
## Architecture |
|
|
|
The model is built around three key components: |
|
|
|
- **Prelude Block:** Handles the initial processing by embedding input tokens and applying self-attention with positional encodings.
- **Recurrent Block:** A core, weight-shared block that iteratively refines a latent state. By repeatedly processing the prelude output together with its own evolving state, the model effectively “thinks” over the input without emitting intermediate tokens.
- **Coda Block:** Decodes the refined latent state into output token probabilities (see the sketch after this list).
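To make the data flow concrete, here is a minimal PyTorch sketch of the three-block structure. The module names, layer choices, and dimensions are illustrative assumptions, not the checkpoint's actual implementation, which lives in its repository code.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: layer choices and sizes are assumptions, and
# positional encodings are omitted for brevity.
class LatentRecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Prelude: initial self-attention pass over the embedded tokens.
        self.prelude = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Recurrent: one weight-shared layer reused on every iteration.
        self.recurrent = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Coda: project the refined latent state to vocabulary logits.
        self.coda = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids: torch.Tensor, num_iterations: int = 3) -> torch.Tensor:
        h = self.prelude(self.embed(input_ids))
        state = torch.zeros_like(h)  # latent state, refined across iterations
        for _ in range(num_iterations):
            # The same weights repeatedly process the prelude output plus the
            # evolving state: "thinking" without emitting intermediate tokens.
            state = self.recurrent(state + h)
        return self.coda(state)
```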
|
|
|
## Applications & Limitations |
|
|
|
**Intended Uses:**

- **Text Generation:** Generate creative text, dialogue, code, or other natural-language content.
- **Research:** Serve as a testbed for exploring novel architectures and techniques in language modeling.

**Limitations:**

- **Data Constraints:** Trained on a small subset (the first 1,000 samples) of the Wikitext-2-raw-v1 dataset, which may limit its performance compared with models trained on larger corpora.
- **Performance:** While it demonstrates the potential of latent recurrent depth, the model is experimental and may not match state-of-the-art models.
- **Computational Overhead:** The iterative processing adds computation roughly in proportion to the number of recurrent iterations.
- **Bias:** As with all language models, generated outputs may reflect biases present in the training data.
|
|
|
## Training Details |
|
|
|
The model was fine-tuned on a subset of the Wikitext-2-raw-v1 dataset (the first 1,000 samples) using the AdamW optimizer and a cosine annealing learning-rate scheduler. The full training configuration and hyperparameters are provided in the accompanying code; adjustments may be needed for improved performance.
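As a rough illustration, the optimizer and scheduler setup looks like the sketch below; the model stand-in, learning rate, weight decay, and epoch count are placeholder assumptions, not the values actually used.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the LRD-LM
num_epochs = 3             # placeholder epoch count

# AdamW plus cosine annealing, as described above; lr and weight_decay
# are assumed values, not the card's actual hyperparameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... per-batch forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # anneal the learning rate along a cosine curve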
|
|
|
## Usage |
|
|
|
The model can be used for text generation via its integrated `generate()` method, which allows you to control parameters such as the maximum sequence length, number of recurrent iterations, temperature, and top‑k filtering. |
|
|
|
### Example: Direct Inference |
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the custom model and tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "codewithdark/latent-recurrent-depth-lm", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Run a forward pass with a specified number of recurrent iterations.
logits = model(input_ids, num_iterations=3)

# Sample the next token from the distribution at the final position.
probs = torch.softmax(logits[:, -1, :], dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
generated_ids = torch.cat([input_ids, next_token], dim=1)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace("Ġ", "")  # strip stray BPE space markers
print(clean_text)
```
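The snippet above samples only a single next token. To produce a longer continuation, the same step can be repeated in a loop, feeding each sampled token back in. This sketch reuses the names from the block above; the 20-token budget is an arbitrary choice.

```python
# Continue sampling autoregressively from generated_ids above.
for _ in range(20):
    logits = model(generated_ids, num_iterations=3)
    probs = torch.softmax(logits[:, -1, :], dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    generated_ids = torch.cat([generated_ids, next_token], dim=1)

print(tokenizer.decode(generated_ids[0], skip_special_tokens=True).replace("Ġ", ""))
```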
|
|
|
### Alternative: Using the `generate()` Method |
|
|
|
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")
model = AutoModel.from_pretrained(
    "codewithdark/latent-recurrent-depth-lm", trust_remote_code=True
)

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate() exposes the decoding controls: maximum length, number of
# recurrent iterations, temperature, and top-k filtering.
generated_ids = model.generate(
    input_ids, max_length=50, num_iterations=10, temperature=0.5, top_k=50
)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace("Ġ", "")  # strip stray BPE space markers
print(clean_text)
```
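Increasing `num_iterations` gives the recurrent block more refinement passes over the latent state, at a roughly proportional increase in compute (see **Computational Overhead** above), so it is worth tuning alongside `temperature` and `top_k`.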
|
|
|
## Ethical Considerations |
|
|
|
This model is intended for research and experimental use. Users must ensure ethical application and carefully consider potential biases and misuse when deploying or further developing this technology. |
|
|
|
## License |
|
|
|
This project is licensed under the MIT License. |