Tags: Text Generation · Transformers · GGUF · English · Inference Endpoints

WikiChat-v0.2

A work-in-progress model trained to hold conversations.

The GGUFs uploaded are full FP32 precision.

Trained on OpenOrca GPT-4 data, plus Cosmopedia for additional data and Dolly-15k for instruction following.

Model Details:

  • 83.59M parameters (83591800)
  • 8 attention heads
  • 40 layers
  • 384 embeddings size
  • 4096/8192/16384 context (use 2x/4x RoPE scaling for the extended lengths; a 16k fine-tuned version may be trained later; see the loading sketch after this list)
  • Batch size 16
  • llama.cpp (train-text-from-scratch)
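
To use the extended context, set linear RoPE scaling when loading the GGUF. Below is a minimal loading sketch using llama-cpp-python (an assumption; the card itself only mentions llama.cpp), with a placeholder file name for whichever GGUF you downloaded:

# Minimal sketch: load the FP32 GGUF with llama-cpp-python.
# rope_freq_scale=0.5 gives 2x linear scaling for 8192 context (use 0.25 for 16384).
from llama_cpp import Llama

llm = Llama(
    model_path="wikichat-v0.2.f32.gguf",  # placeholder file name
    n_ctx=8192,                           # 2x the native 4096 context
    rope_freq_scale=0.5,                  # 2x linear RoPE scaling
    n_gpu_layers=-1,                      # full GPU offload
)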

Prompt Format (Alpaca):

Instruction: {system}
Input: {prompt}
Response: {response}

Please structure your prompts in an instruct format for maximum performance.
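
As an illustration, the sketch below assembles the Alpaca-style prompt shown above and generates a completion with the llm object from the loading sketch; wrap_prompt is a hypothetical helper, not part of the model or any library:

# Hypothetical helper that formats a prompt in the layout shown above.
def wrap_prompt(system: str, prompt: str) -> str:
    return f"Instruction: {system}\nInput: {prompt}\nResponse:"

text = wrap_prompt(
    "You are a helpful assistant.",
    "What is the square root of 4?",
)

# Generate with the Llama object loaded earlier; stop on a new instruction block.
out = llm(text, max_tokens=64, stop=["Instruction:"])
print(out["choices"][0]["text"])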

Training Details:

  • 1x RTX 3070 8GB (inference speed: 80 tok/s with full GPU offload)
  • 1x Ryzen 7 3700X
  • 96 GB RAM
  • 10 iterations
  • Loss Target = 2.5 to 3.0
  • Approx. 480 samples / 1M training tokens (>0.0001 epochs)
  • Training data: see the OpenOrca dataset page

Notes:

The model isn't ready yet; this release tests OpenOrca tokenization and the balance between training speed and model size.

Example output:

User: What is the square root of 4?
Assistant: The square root of 4 is 2.

