---
language:
- en
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text2text-generation
---
## Janus
(Built with Meta Llama 3)
For the version that additionally conditions on part-of-speech (PoS) tags, see [Janus (PoS)](https://huggingface.co/ChangeIsKey/llama3-janus-pos).
### Model Details
- **Model Name**: Janus
- **Version**: 1.0
- **Developers**: Pierluigi Cassotti, Nina Tahmasebi
- **Affiliation**: University of Gothenburg
- **License**: MIT
- **GitHub Repository**: [Historical Word Usage Generation](https://github.com/ChangeIsKey/historical-word-usage-generation)
- **Paper**: [Sense-specific Historical Word Usage Generation](https://transacl.org)
- **Contact**: [email protected]
### Model Description
Janus is a fine-tuned **Llama 3 8B** model designed to generate historically and semantically accurate word usages. It takes as input a word, its sense definition, and a year, and produces example sentences that reflect linguistic usage from the specified period. This model is particularly useful for **semantic change detection**, **historical NLP**, and **linguistic research**.
### Intended Use
- **Semantic Change Detection**: Investigating how word meanings evolve over time.
- **Historical Text Processing**: Enhancing the understanding and modeling of historical texts.
- **Corpus Expansion**: Generating sense-annotated corpora for linguistic studies.
### Training Data
- **Dataset**: Extracted from the **Oxford English Dictionary (OED)**
- **Size**: Over **1.2 million** sense-annotated historical usages
- **Time Span**: **1700 - 2020**
- **Data Format**:
```
<year><|t|><lemma><|t|><definition><|s|><historical usage sentence><|end|>
```
- **Janus (PoS) Format**:
```
<year><|t|><lemma><|t|><definition><|p|><PoS><|p|><|s|><historical usage sentence><|end|>
```
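For illustration, prompts in both formats can be assembled as below; the `build_prompt` helper is hypothetical (not part of the released code), and the separator tokens are taken from the formats above:
```python
# Hypothetical helper: assemble a prompt in the formats above;
# the model completes the text after <|s|> with a usage sentence.
def build_prompt(year, lemma, definition, pos=None):
    if pos is None:
        # Janus format
        return f"{year}<|t|>{lemma}<|t|>{definition}<|s|>"
    # Janus (PoS) format
    return f"{year}<|t|>{lemma}<|t|>{definition}<|p|>{pos}<|p|><|s|>"

print(build_prompt(1800, "awful",
                   "Used to emphasize something unpleasant or negative; ‘such a’, ‘an absolute’."))
```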
### Training Procedure
- **Base Model**: `meta-llama/Meta-Llama-3-8B`
- **Optimization**: **QLoRA** (Quantized Low-Rank Adaptation)
- **Batch Size**: **4**
- **Learning Rate**: **2e-4**
- **Epochs**: **1**
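As a rough sketch (not the authors' released training script), a QLoRA setup matching the hyperparameters above could look like this; the LoRA rank, alpha, and target modules are illustrative assumptions:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: load the base model in 4-bit NF4 with bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections (rank/alpha assumed).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hyperparameters reported on this card: batch size 4, lr 2e-4, 1 epoch.
training_args = TrainingArguments(
    output_dir="janus-qlora",
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    num_train_epochs=1,
)
```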
### Model Performance
- **Temporal Accuracy**: Root mean squared error (RMSE) of **~52.7 years**, close to the error measured on OED ground-truth usages
- **Semantic Accuracy**: Judged comparable to OED test sentences in human evaluations
- **Context Variability**: Low lexical repetition, preserving natural linguistic diversity
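For reference, the reported RMSE is the standard quantity over the $N$ generated usages, assuming $y_i$ is the target year in the prompt and $\hat{y}_i$ the year attributed to the $i$-th generated sentence:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}
$$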
### Usage Example
#### Generating Historical Usages
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ChangeIsKey/llama3-janus"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompt format: <year><|t|><lemma><|t|><definition><|s|>
input_text = "1800<|t|>awful<|t|>Used to emphasize something unpleasant or negative; ‘such a’, ‘an absolute’.<|s|>"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# do_sample=True is needed for temperature/top_p to take effect.
output = model.generate(**inputs, do_sample=True, temperature=1.0, top_p=0.9, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
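The decoded string above still contains the prompt; to print only the generated usage, drop the prompt tokens before decoding:
```python
# Slice off the prompt tokens so only the generated usage remains.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```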
For more examples, see the GitHub repository [Historical Word Usage Generation](https://github.com/ChangeIsKey/historical-word-usage-generation).
### Limitations & Ethical Considerations
- **Historical Bias**: The model may reflect biases present in historical texts.
- **Time Granularity**: The temporal resolution is approximate (~50 years RMSE).
- **Modern Influence**: Despite fine-tuning, the model may still generate modern phrases in older contexts.
- **Not Trained for Fairness**: The model has not been explicitly trained to be fair or unbiased. It may produce sensitive, outdated, or culturally inappropriate content.
### Citation
If you use Janus, please cite:
```bibtex
@article{Cassotti2024Janus,
  author  = {Pierluigi Cassotti and Nina Tahmasebi},
  title   = {Sense-specific Historical Word Usage Generation},
  journal = {Transactions of the Association for Computational Linguistics},
  year    = {2025}
}
```