wasm-32B-Instruct-V1

wasm-32B-Instruct-V1 is a state-of-the-art instruction-tuned large language model developed by wasmdashai. With 32 billion parameters, the model is designed to deliver high-quality performance across a wide range of natural language processing and code-related tasks.

πŸš€ Introduction

wasm-32B-Instruct-V1 is built for instruction-following tasks and general-purpose reasoning. It uses a transformer architecture optimized for large-scale generation tasks, including:

  • 🧠 Code generation and debugging
  • πŸ“š Long-context understanding
  • πŸ—£οΈ Multi-turn dialogue and reasoning
  • πŸ” Privacy-conscious edge deployments (e.g., via WebAssembly)

This model is fine-tuned on diverse instruction datasets and optimized for both human alignment and computational efficiency.

πŸ—οΈ Model Details

  • Type: Causal Language Model (Decoder-only)

  • Parameters: 32 Billion

  • Training: Pretraining + Instruction Fine-tuning

  • Architecture: Transformer with:

    • Rotary Position Embeddings (RoPE)
    • SwiGLU activation
    • RMSNorm
    • Attention with QKV bias

  • Context Length: Up to 32,768 tokens

  • Extended Context Option: Via rope_scaling (supports up to 128K with YaRN)

  • Format: Hugging Face Transformers-compatible
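
As a quick sanity check, you can inspect these settings from the published config without downloading the 32B weights. This is a minimal sketch assuming the config follows the usual Hugging Face Transformers field names (e.g., max_position_embeddings):

from transformers import AutoConfig

# Loads only config.json, not the model weights.
config = AutoConfig.from_pretrained("wasmdashai/wasm-32B-Instruct-V1")

print(config.model_type)               # architecture family
print(config.max_position_embeddings)  # expected: 32768
print(config.rope_scaling)             # None unless YaRN scaling is enabled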

βš™οΈ Requirements

To use this model, install πŸ€— transformers version 4.37.0 or later:

pip install --upgrade transformers
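
To confirm the installed version meets the minimum, a quick check (packaging ships as a transformers dependency, so no extra install is needed):

from packaging import version
import transformers

# Compare parsed versions rather than raw strings.
assert version.parse(transformers.__version__) >= version.parse("4.37.0"), \
    f"transformers {transformers.__version__} is older than the recommended 4.37.0"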

πŸ§ͺ Quickstart

Here is a minimal example to load the model and generate a response:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto"     # place layers across available GPUs automatically
)

prompt = "Explain the concept of recursion with Python code."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
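
Because the model is instruction-tuned, you will often get better-formatted answers by wrapping the prompt in the tokenizer's chat template, assuming one is bundled with this checkpoint (fall back to the raw prompt above if it is not):

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Explain the concept of recursion with Python code."}
]

# apply_chat_template formats the conversation with the special tokens
# the model saw during instruction fine-tuning.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Slice off the prompt tokens so only the new reply is decoded.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)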

🧩 Processing Long Texts

This model supports context lengths up to 32,768 tokens. For even longer inputs, you can enable YaRN scaling by modifying the config.json as follows:

{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}

This is useful for documents, logs, or multi-step reasoning tasks that exceed the native 32K window.
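
If you prefer not to edit config.json on disk, the same override can be applied at load time. A sketch, assuming the model reads rope_scaling from its Transformers config:

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

# Reload with the long-context configuration applied.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)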

πŸ“¦ Deployment Notes

We recommend using vLLM for efficient deployment, especially with large input lengths or real-time serving needs. Please note:

  • vLLM currently supports static YaRN only.
  • Avoid applying rope scaling unless necessary for long-context tasks, as it may impact performance on short inputs.
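
For offline batch inference, a minimal vLLM sketch (parameter names follow vLLM's public Python API; max_model_len stays at the native 32K here since no YaRN scaling is applied):

from vllm import LLM, SamplingParams

# vLLM handles weight loading and paged-attention batching internally.
llm = LLM(model="wasmdashai/wasm-32B-Instruct-V1", max_model_len=32768)
params = SamplingParams(temperature=0.7, max_tokens=512)

outputs = llm.generate(["Explain the concept of recursion with Python code."], params)
print(outputs[0].outputs[0].text)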

πŸ“¬ Contact

For support, feedback, or collaboration inquiries, please contact:

πŸ“§ [email protected]

