|
---
language:
- en
tags:
- gpt2
- text-generation
- pytorch
license: mit
---
|
|
|
# SchorbGPT-Medium |
|
|
|
SchorbGPT-Medium is a medium-sized language model trained on web data. It uses the GPT-2 architecture and tokenizer.
|
|
|
## Model Details |
|
|
|
- Model Type: GPT-2 |
|
- Training Data: Web text data |
|
- Number of Parameters: GPT-2 medium scale (roughly 355M)
|
- Context Length: 512 tokens |
|
- Training Framework: PyTorch |
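
These details can be checked directly against the checkpoint. The snippet below is a minimal sanity check and assumes the repository loads with a standard GPT-2 configuration, which exposes the context length as `n_positions`:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("iimaginary/schorbGPT-medium")
model = AutoModelForCausalLM.from_pretrained("iimaginary/schorbGPT-medium")

# GPT-2 style configs expose the maximum context window as n_positions
print("context length:", config.n_positions)
# Total parameter count, in millions
print("parameters (M):", round(sum(p.numel() for p in model.parameters()) / 1e6, 1))
```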
|
|
|
## Usage |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("iimaginary/schorbGPT-medium")
model = AutoModelForCausalLM.from_pretrained("iimaginary/schorbGPT-medium")

# Encode a prompt and generate a continuation (max_length counts the prompt tokens)
text = "Your prompt here"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
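
By default `generate` decodes greedily, which tends to produce repetitive text from a base model. Continuing from the snippet above, the sketch below switches to sampling; the specific values (`max_new_tokens=64`, `top_p=0.9`, `temperature=0.8`) are illustrative choices, not settings recommended by the model authors:

```python
# Sampling-based generation (illustrative settings)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,                    # number of new tokens after the prompt
    do_sample=True,                       # sample instead of greedy decoding
    top_p=0.9,                            # nucleus sampling
    temperature=0.8,                      # soften the next-token distribution
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; silences a warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```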
|
|
|
## Performance and Model Analysis |
|
|
|
### Zero-shot Evaluation Results |
|
|
|
| Task | Metric | Value | Stderr |
|------|--------|-------|--------|
| WikiText | bits_per_byte | 0.9860 | N/A |
| WikiText | byte_perplexity | 1.9806 | N/A |
| WikiText | word_perplexity | 38.6497 | N/A |
| ARC Easy | accuracy | 48.02% | ±1.03% |
| ARC Easy | accuracy (normalized) | 42.17% | ±1.01% |
| HellaSwag | accuracy | 29.06% | ±0.45% |
| HellaSwag | accuracy (normalized) | 31.26% | ±0.46% |
| LAMBADA | accuracy | 33.90% | ±0.66% |
| LAMBADA | perplexity | 36.2055 | ±1.4052 |
| PIQA | accuracy | 61.92% | ±1.13% |
| PIQA | accuracy (normalized) | 62.46% | ±1.13% |
| Winogrande | accuracy | 50.59% | ±1.41% |
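
The three WikiText rows are different views of the same measurement: `bits_per_byte` is the base-2 logarithm of `byte_perplexity`, and `word_perplexity` renormalizes the total log-likelihood by word count rather than byte count. The metric names follow the conventions of common evaluation harnesses such as EleutherAI's lm-evaluation-harness, though the exact evaluation setup is not documented here. A quick check of the first relation:

```python
import math

byte_perplexity = 1.9806
bits_per_byte = math.log2(byte_perplexity)  # base-2 log of the byte-level perplexity
print(round(bits_per_byte, 4))              # 0.986, matching the row above
```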
|
|
|
### Analysis and Comparisons |
|
|
|
#### Language Modeling Performance |
|
The model achieves a word perplexity of 38.65 on WikiText, which is in the range reported for similarly sized models (exact figures depend on tokenization and evaluation setup). For comparison:
|
- Original GPT-2 (small): ~35-40 perplexity |
|
- GPT-2 medium: ~30-35 perplexity |
|
- BERT-base: ~40-45 perplexity |
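
For a rough hands-on comparison, token-level perplexity on a text of your choice can be estimated as below. This is a minimal sketch with a hypothetical helper (`token_perplexity`), not the evaluation pipeline behind the table; note that it normalizes by token count, whereas the `word_perplexity` above normalizes by word count, so the numbers are not directly comparable.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("iimaginary/schorbGPT-medium")
model = AutoModelForCausalLM.from_pretrained("iimaginary/schorbGPT-medium")
model.eval()

def token_perplexity(text: str) -> float:
    # Passing the input ids as labels makes the model return the mean
    # next-token cross-entropy; exp(loss) is the token-level perplexity.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

print(token_perplexity("The quick brown fox jumps over the lazy dog."))
```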
|
|
|
#### Task-Specific Analysis
|
|
|
1. Physical and Commonsense Reasoning: |
|
- PIQA: 61.92% (Random baseline: 50%) |
|
- Comparable to GPT-2 small/medium performance |
|
- Shows good physical commonsense understanding |
|
|
|
2. Science Knowledge: |
|
- ARC Easy: 48.02% (Random baseline: 25%) |
|
   - Well above random chance, demonstrating basic scientific knowledge
|
- Similar to performance seen in early GPT-2 variants |
|
|
|
3. Linguistic Understanding: |
|
- LAMBADA: 33.90% accuracy with perplexity of 36.21 |
|
- HellaSwag: 31.26% (Random baseline: 25%) |
|
- Performance indicates basic linguistic and contextual understanding |
|
- Typical range for non-fine-tuned models of this scale |
|
|
|
4. Reasoning and Logic: |
|
- Winogrande: 50.59% (Random baseline: 50%) |
|
   - On par with random chance, suggesting room for improvement in complex reasoning tasks
|
- Common for base models without specific fine-tuning |
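
For context on how the multiple-choice accuracies above are typically obtained: each answer choice is scored by the model's log-likelihood of the choice tokens given the question, and the highest-scoring choice is taken as the prediction; the "normalized" variant divides that score by the byte length of the choice to reduce the bias toward short answers. The sketch below is a simplified, hypothetical scorer (`score_choice`) illustrating the idea, not the exact procedure behind the table:

```python
import torch
import torch.nn.functional as F

def score_choice(model, tokenizer, context: str, choice: str):
    # Log-likelihood of the choice tokens conditioned on the context,
    # plus a byte-length-normalized variant (cf. "accuracy (normalized)").
    ctx_len = tokenizer(context, return_tensors="pt")["input_ids"].shape[1]
    ids = tokenizer(context + choice, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = F.log_softmax(logits[:, :-1], dim=-1)   # predicts tokens 1..N-1
    token_lp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    choice_lp = token_lp[:, ctx_len - 1:].sum().item()  # keep only the choice tokens
    return choice_lp, choice_lp / len(choice.encode("utf-8"))

# The predicted answer is the argmax of the (normalized) score over all choices.
```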
|
|
|
### Strengths and Limitations |
|
|
|
**Strengths:** |
|
- Strong performance on physical commonsense (PIQA) |
|
- Decent basic science knowledge (ARC Easy) |
|
- Competitive language modeling metrics |
|
|
|
**Limitations:** |
|
- Limited complex reasoning capabilities (Winogrande) |
|
- Basic linguistic understanding could be improved (LAMBADA, HellaSwag) |
|
- Performance typical of base models without task-specific fine-tuning |
|
|
|
## Limitations |
|
|
|
This is a base model without fine-tuning or alignment. It should be used with appropriate consideration of its capabilities and limitations. |
|
|