---
license: mit
datasets:
- wikimedia/wikipedia
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- pytorch
- Thinking
- CustomModel
---
# Latent Recurrent Depth Language Model
## Overview
The Latent Recurrent Depth Language Model (LRD-LM) is an experimental text-generation architecture designed to capture deeper contextual information through iterative, latent processing. Instead of generating verbose chain-of-thought sequences, LRD-LM refines its internal state over multiple recurrent iterations to improve text generation quality while keeping the parameter count modest.
## Architecture
The model is built around three key components:
- **Prelude Block:**
This block handles the initial processing: it embeds input tokens, adds positional encodings, and applies self-attention.
- **Recurrent Block:**
A core, weight-shared block that iteratively refines a latent state. By repeatedly processing the prelude output along with its own evolving state, the model effectively “thinks” over the input without outputting intermediate tokens.
- **Coda Block:**
The final block decodes the refined latent state into output token probabilities.
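
To make the control flow concrete, here is a minimal PyTorch sketch of the three components. This is an illustration only: the class names, dimensions, attention layout, and latent-state initialization below are assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class PreludeBlock(nn.Module):
    """Embeds tokens, adds positional information, and applies self-attention."""
    def __init__(self, vocab_size, d_model, n_heads, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, input_ids):
        pos = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.token_emb(input_ids) + self.pos_emb(pos)
        attn_out, _ = self.attn(x, x, x)  # NOTE: causal masking omitted for brevity
        return self.norm(x + attn_out)

class RecurrentBlock(nn.Module):
    """One weight-shared block, applied repeatedly to refine the latent state."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, state, prelude_out):
        # The latent state attends to the prelude output, then passes through an MLP.
        attn_out, _ = self.attn(state, prelude_out, prelude_out)
        state = self.norm1(state + attn_out)
        return self.norm2(state + self.mlp(state))

class CodaBlock(nn.Module):
    """Decodes the refined latent state into vocabulary logits."""
    def __init__(self, d_model, vocab_size):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, state):
        return self.lm_head(self.norm(state))

class LatentRecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8):
        super().__init__()
        self.prelude = PreludeBlock(vocab_size, d_model, n_heads)
        self.recurrent = RecurrentBlock(d_model, n_heads)  # a single shared block
        self.coda = CodaBlock(d_model, vocab_size)

    def forward(self, input_ids, num_iterations=3):
        prelude_out = self.prelude(input_ids)
        state = torch.zeros_like(prelude_out)  # initial latent state (init is a guess)
        for _ in range(num_iterations):        # same weights reused every pass
            state = self.recurrent(state, prelude_out)
        return self.coda(state)                # (batch, seq_len, vocab_size)
```

Because the recurrent block is weight-shared, raising `num_iterations` buys more "thinking" at inference time without adding parameters.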
## Applications & Limitations
**Intended Uses:**
- **Text Generation:**
Generate creative text, dialogue, code, or other natural language content.
- **Research:**
Serve as a testbed for exploring novel architectures and techniques in language modeling.
**Limitations:**
- **Data Constraints:**
Trained on a small subset (first 1000 samples) of the Wikitext-2-raw-v1 dataset, which may limit its performance compared to models trained on larger corpora.
- **Performance:**
While it demonstrates the potential of latent recurrent depth, its overall performance is experimental and may not match state-of-the-art models.
- **Computational Overhead:**
Each recurrent iteration adds a forward pass through the shared block, so inference cost grows with the chosen number of iterations.
- **Bias:**
As with all language models, generated outputs may reflect biases present in the training data.
## Training Details
The model was fine-tuned on a subset of the Wikitext-2-raw-v1 dataset (first 1000 samples) using the AdamW optimizer and a cosine annealing learning rate scheduler. The training configuration and hyperparameters are provided in the accompanying code, and adjustments may be needed for improved performance.
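
For orientation, a minimal sketch of that setup follows. Only the optimizer and scheduler choice come from the description above; the learning rate, weight decay, schedule length, and `train_loader` are placeholders, and `LatentRecurrentDepthLM` is the sketch class from the Architecture section.

```python
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hypothetical values -- the real hyperparameters live in the accompanying code.
EPOCHS, STEPS_PER_EPOCH = 3, 250

model = LatentRecurrentDepthLM(vocab_size=50257)
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS * STEPS_PER_EPOCH)

for epoch in range(EPOCHS):
    # train_loader: a DataLoader over the tokenized 1000-sample subset (not shown)
    for input_ids, labels in train_loader:
        logits = model(input_ids, num_iterations=3)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # cosine annealing, stepped per batch in this sketch
```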
## Usage
The model can be used for text generation via its integrated `generate()` method, which allows you to control parameters such as the maximum sequence length, number of recurrent iterations, temperature, and top‑k filtering.
### Example: Direct Inference
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the custom model and tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained(
    "codewithdark/latent-recurrent-depth-lm", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# A forward pass with a chosen number of recurrent iterations returns logits
logits = model(input_ids, num_iterations=3)

# Sample the next token from the distribution at the final position
probs = torch.softmax(logits[:, -1, :], dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
generated_ids = torch.cat([input_ids, next_token], dim=1)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace("Ġ", "")  # strip byte-level BPE markers
print(clean_text)
```
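
The snippet above samples a single next token. To produce longer continuations without calling `generate()`, the same forward pass can be applied in a loop; this is a minimal sketch reusing the variables defined above, with an arbitrary token budget and iteration count.

```python
# Continue the snippet above: sample several tokens autoregressively
for _ in range(20):  # token budget; pick any value
    logits = model(generated_ids, num_iterations=3)
    probs = torch.softmax(logits[:, -1, :], dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    generated_ids = torch.cat([generated_ids, next_token], dim=1)

print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```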
### Alternative: Using the `generate()` Method
```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is required because the architecture is custom
tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")
model = AutoModel.from_pretrained("codewithdark/latent-recurrent-depth-lm", trust_remote_code=True)

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate() exposes the decoding controls described above
generated_ids = model.generate(input_ids, max_length=50, num_iterations=10, temperature=0.5, top_k=50)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace("Ġ", "")  # strip byte-level BPE markers
print(clean_text)
```
## Ethical Considerations
This model is intended for research and experimental use. Users must ensure ethical application and carefully consider potential biases and misuse when deploying or further developing this technology.
## License
This project is licensed under the MIT License.