TeeZee/NEBULA-23.8B-v1.0-bpw4.0-h6-exl2


NEBULA-23.8B-v1.0

Technical notes

108 layers, built with the DUS (depth up-scaling) procedure: Mistral (32) -> SOLAR (48) -> GALAXY (72) -> NEBULA (108)
23.8B parameters
model created as an extension of the depth up-scaling procedure used for SOLAR by Upstage
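
The layer counts above can be reproduced with a small sketch of one depth up-scaling step. It assumes the SOLAR-style recipe (duplicate the stack, cut layers at the seam, concatenate) and assumes that every stage drops a quarter of the layers; the card confirms only the counts 32, 48, 72 and 108, so the per-stage cut sizes of 12 and 18 are inferred rather than documented. The function name `dus_layer_indices` is made up for illustration.

```python
def dus_layer_indices(n_layers: int, n_dropped: int) -> list[int]:
    """Source-layer indices for one depth up-scaling step.

    Two copies of an n_layers-deep stack are made. The last n_dropped layers
    are cut from the first copy, the first n_dropped from the second copy,
    and the two remainders are concatenated.
    """
    first_copy = list(range(0, n_layers - n_dropped))
    second_copy = list(range(n_dropped, n_layers))
    return first_copy + second_copy


# Layer counts from the card: 32 -> 48 -> 72 -> 108.
# ASSUMPTION: each stage drops a quarter of the layers at the seam
# (8 of 32 matches SOLAR; 12 of 48 and 18 of 72 are inferred from the counts).
depth = 32
depths = [depth]
for _ in range(3):
    depth = len(dus_layer_indices(depth, depth // 4))
    depths.append(depth)

print(depths)  # [32, 48, 72, 108]
```

Note that the indices returned at each stage refer to the stack from the previous stage, not to the original Mistral layers.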

Results

model can and will produce NSFW content
GSM8k evaluation seems to be often broken; HellaSwag, Winogrande and TruthfulQA show that it's a smart model
RP and ERP work surprisingly well and I haven't encountered any GPTisms yet
lower memory footprint than 20B and 23B models
follows character card very well
NSFW output feels fresh compared to existing models
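
As a rough guide to the footprint of this particular quantization, the weight size can be estimated from the parameter count and the bits per weight in the repository name (bpw4.0). This is only a back-of-the-envelope sketch: it ignores the KV cache, activation buffers, the 6-bit output head (h6) and any loader overhead, and the function name `weight_gib` is made up for illustration.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given average bits per weight."""
    n_bytes = n_params * bits_per_weight / 8
    return n_bytes / 2**30


N_PARAMS = 23.8e9  # parameter count from the card

quantized = weight_gib(N_PARAMS, 4.0)    # this repo: 4.0 bpw
half_prec = weight_gib(N_PARAMS, 16.0)   # fp16 weights, for comparison

print(f"4.0 bpw: {quantized:.2f} GiB, fp16: {half_prec:.2f} GiB")
```

Actual VRAM use will be higher once the cache for a 4096-token context is allocated.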

Finetuning for RP

SFT using MinervaAI/Aesir-Preview dataset, 10 epochs
DPO using athirdpath/DPO_Pairs-Roleplay-Alpaca-NSFW dataset, 1 epoch
SFT using 1xAda6000, 10h
DPO using 1x3090, 30h
Jupyter notebooks and mergekit configs are available for anyone wanting to reproduce or reuse the scripts; just drop me a message
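
For readers unfamiliar with the DPO stage, the per-pair objective can be sketched in plain Python. This is the standard DPO loss, not the training code used for this model (which is not published in the card); the `beta` value of 0.1 is an assumption, and the log-probabilities in the usage lines are invented numbers.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    beta is an ASSUMED value; the card does not state the one used.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin), computed without overflow
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Invented log-probabilities: the policy has moved towards the chosen reply.
improved = dpo_loss(-40.0, -60.0, -50.0, -50.0)
# Policy identical to the reference: the margin is zero, so the loss is log(2).
untrained = dpo_loss(-50.0, -50.0, -50.0, -50.0)
print(improved < untrained)
```

The loss falls below log(2) only when the policy raises the chosen reply relative to the rejected one more than the reference model does.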

Prompt template

Alpaca
the chat template is embedded in the tokenizer config and should load automatically
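
If a frontend does not pick up the embedded template, a prompt can be built by hand. The sketch below uses the commonly seen Alpaca layout without an input field; the exact wording and whitespace in this model's tokenizer config may differ, so treat the embedded template as authoritative. `build_alpaca_prompt` is a hypothetical helper, not part of any library.

```python
# ASSUMPTION: standard Alpaca header text; check the tokenizer config
# for the exact template shipped with the model.
ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_alpaca_prompt(instruction: str) -> str:
    """Format a single instruction in the no-input Alpaca layout."""
    return (
        f"{ALPACA_HEADER}\n\n"
        f"### Instruction:\n{instruction.strip()}\n\n"
        f"### Response:\n"
    )


prompt = build_alpaca_prompt("Introduce yourself in character as a ship's captain.")
print(prompt)
```

The model's reply is whatever it generates after the final `### Response:` line.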

Context size

4096
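
With a 4096-token window, the prompt (character card, chat history and template text) and the generated reply share one budget. A minimal sketch of that arithmetic follows; `max_new_tokens` is a hypothetical helper, and the prompt length is expected to come from whatever tokenizer the loader uses.

```python
CONTEXT_SIZE = 4096  # context size from the card


def max_new_tokens(prompt_tokens: int, context_size: int = CONTEXT_SIZE) -> int:
    """Tokens left for generation once the prompt is in the context window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_size - prompt_tokens)


print(max_new_tokens(3000))  # 1096
print(max_new_tokens(5000))  # 0: the prompt must be trimmed first
```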

All comments are greatly appreciated. Download, test, and if you appreciate my work, consider buying me my fuel:

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	59.94
AI2 Reasoning Challenge (25-Shot)	66.72
HellaSwag (10-Shot)	86.98
MMLU (5-Shot)	65.40
TruthfulQA (0-shot)	57.60
Winogrande (5-shot)	82.95
GSM8k (5-shot)	0.00
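
The average is the plain mean of the six benchmark scores, so the 0.00 on GSM8k pulls it down noticeably. The short check below reproduces the reported figure from the table values and, purely for illustration, shows the mean of the other five benchmarks; that second number is not a leaderboard metric.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 66.72,
    "HellaSwag (10-Shot)": 86.98,
    "MMLU (5-Shot)": 65.40,
    "TruthfulQA (0-shot)": 57.60,
    "Winogrande (5-shot)": 82.95,
    "GSM8k (5-shot)": 0.00,
}

average = sum(scores.values()) / len(scores)

# Illustrative only: mean of the five benchmarks other than GSM8k.
without_gsm8k = [v for k, v in scores.items() if not k.startswith("GSM8k")]
average_without_gsm8k = sum(without_gsm8k) / len(without_gsm8k)

print(round(average, 2))                # 59.94
print(round(average_without_gsm8k, 2))  # 71.93
```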


Evaluation results

Source: Open LLM Leaderboard

Task	Metric	Split	Value
AI2 Reasoning Challenge (25-Shot)	normalized accuracy	test	66.720
HellaSwag (10-Shot)	normalized accuracy	validation	86.980
MMLU (5-Shot)	accuracy	test	65.400
TruthfulQA (0-shot)	mc2	validation	57.600
Winogrande (5-shot)	accuracy	validation	82.950
GSM8k (5-shot)	accuracy	test	0.000
