|
---
license: mit
datasets:
- wikimedia/wikipedia
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- pytorch
- Thinking
- CustomModel
---
|
|
|
# Latent Recurrent Depth Language Model |
|
|
|
## Overview |
|
|
|
The Latent Recurrent Depth Language Model (LRD-LM) is an experimental text-generation architecture designed to capture deeper contextual information through iterative, latent processing. Instead of generating verbose chain-of-thought sequences, LRD-LM refines its internal state over multiple recurrent iterations to improve text generation quality while keeping the parameter count modest. |
|
|
|
## Architecture |
|
|
|
The model is built around three key components: |
|
|
|
- **Prelude Block:** Handles the initial processing by embedding input tokens and applying self-attention with positional encodings.
- **Recurrent Block:** A core, weight-shared block that iteratively refines a latent state. By repeatedly processing the prelude output together with its own evolving state, the model effectively “thinks” over the input without emitting intermediate tokens.
- **Coda Block:** Decodes the refined latent state into output token probabilities (see the sketch after this list).
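To make the data flow concrete, here is a minimal PyTorch sketch of the three-block structure. The module names, layer choices, and dimensions are illustrative assumptions, not the checkpoint's actual implementation, which lives in its repository code.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: layer choices and sizes are assumptions, and
# positional encodings are omitted for brevity.
class LatentRecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Prelude: initial self-attention pass over the embedded tokens.
        self.prelude = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Recurrent: one weight-shared layer reused on every iteration.
        self.recurrent = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Coda: project the refined latent state to vocabulary logits.
        self.coda = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids: torch.Tensor, num_iterations: int = 3) -> torch.Tensor:
        h = self.prelude(self.embed(input_ids))
        state = torch.zeros_like(h)  # latent state, refined across iterations
        for _ in range(num_iterations):
            # The same weights repeatedly process the prelude output plus the
            # evolving state: "thinking" without emitting intermediate tokens.
            state = self.recurrent(state + h)
        return self.coda(state)
```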
|
|
|
## Applications & Limitations |
|
|
|
**Intended Uses:**

- **Text Generation:** Generate creative text, dialogue, code, or other natural-language content.
- **Research:** Serve as a testbed for exploring novel architectures and techniques in language modeling.

**Limitations:**

- **Data Constraints:** Trained on a small subset (the first 1,000 samples) of the Wikitext-2-raw-v1 dataset, which may limit its performance compared with models trained on larger corpora.
- **Performance:** While it demonstrates the potential of latent recurrent depth, the model is experimental and may not match state-of-the-art models.
- **Computational Overhead:** The iterative processing adds computation roughly in proportion to the number of recurrent iterations.
- **Bias:** As with all language models, generated outputs may reflect biases present in the training data.
|
|
|
## Training Details |
|
|
|
The model was fine-tuned on a subset of the Wikitext-2-raw-v1 dataset (the first 1,000 samples) using the AdamW optimizer and a cosine annealing learning-rate scheduler. The full training configuration and hyperparameters are provided in the accompanying code; adjustments may be needed for improved performance.
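As a rough illustration, the optimizer and scheduler setup looks like the sketch below; the model stand-in, learning rate, weight decay, and epoch count are placeholder assumptions, not the values actually used.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the LRD-LM
num_epochs = 3             # placeholder epoch count

# AdamW plus cosine annealing, as described above; lr and weight_decay
# are assumed values, not the card's actual hyperparameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... per-batch forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # anneal the learning rate along a cosine curve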
|
|
|
## Usage |
|
|
|
The model can be used for text generation via its integrated `generate()` method, which allows you to control parameters such as the maximum sequence length, number of recurrent iterations, temperature, and top‑k filtering. |
|
|
|
### Example: Direct Inference |
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the custom model and tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "codewithdark/latent-recurrent-depth-lm", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Run a forward pass with a specified number of recurrent iterations.
logits = model(input_ids, num_iterations=3)

# Sample the next token from the distribution at the final position.
probs = torch.softmax(logits[:, -1, :], dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
generated_ids = torch.cat([input_ids, next_token], dim=1)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace("Ġ", "")  # strip stray BPE space markers
print(clean_text)
```
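The snippet above samples only a single next token. To produce a longer continuation, the same step can be repeated in a loop, feeding each sampled token back in. This sketch reuses the names from the block above; the 20-token budget is an arbitrary choice.

```python
# Continue sampling autoregressively from generated_ids above.
for _ in range(20):
    logits = model(generated_ids, num_iterations=3)
    probs = torch.softmax(logits[:, -1, :], dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    generated_ids = torch.cat([generated_ids, next_token], dim=1)

print(tokenizer.decode(generated_ids[0], skip_special_tokens=True).replace("Ġ", ""))
```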
|
|
|
### Alternative: Using the `generate()` Method |
|
|
|
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codewithdark/latent-recurrent-depth-lm")
model = AutoModel.from_pretrained(
    "codewithdark/latent-recurrent-depth-lm", trust_remote_code=True
)

prompt = "In the realm of language modeling"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate() exposes the decoding controls: maximum length, number of
# recurrent iterations, temperature, and top-k filtering.
generated_ids = model.generate(
    input_ids, max_length=50, num_iterations=10, temperature=0.5, top_k=50
)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
clean_text = generated_text.replace("Ġ", "")  # strip stray BPE space markers
print(clean_text)
```
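Increasing `num_iterations` gives the recurrent block more refinement passes over the latent state, at a roughly proportional increase in compute (see **Computational Overhead** above), so it is worth tuning alongside `temperature` and `top_k`.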
|
|
|
## Ethical Considerations |
|
|
|
This model is intended for research and experimental use. Users must ensure ethical application and carefully consider potential biases and misuse when deploying or further developing this technology. |
|
|
|
## License |
|
|
|
This project is licensed under the MIT License. |