---
library_name: transformers
tags:
- pruna-ai
---
# Model Card for PrunaAI/test-tiny-random-llama4-smashed
This model was created using the [pruna](https://github.com/PrunaAI/pruna) library. Pruna is a model optimization framework built for developers, enabling you to deliver more efficient models with minimal implementation overhead.
## Usage
First things first, you need to install the pruna library:
```bash
pip install pruna
```
You can then load this model using the following code:
```python
from pruna import PrunaModel
loaded_model = PrunaModel.from_hub("PrunaAI/test-tiny-random-llama4-smashed")
```
Once loaded, the model supports the same inference methods as the original model.
## Smash Configuration
The compression configuration of the model is stored in the `smash_config.json` file.
```json
{
  "batcher": null,
  "cacher": null,
  "compiler": null,
  "pruner": null,
  "quantizer": null,
  "max_batch_size": 1,
  "device": "cpu",
  "save_fns": [],
  "load_fns": [
    "transformers"
  ],
  "reapply_after_load": {
    "pruner": null,
    "quantizer": null,
    "cacher": null,
    "compiler": null,
    "batcher": null
  }
}
```
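Every compression slot in this configuration is `null` and the function lists are empty, meaning no compression algorithm (batching, caching, compilation, pruning, or quantization) was applied when this model was smashed. A quick way to sanity-check such a configuration is to parse the JSON and list the slots that are actually set; the sketch below inlines the file shown above, and the `active_algorithms` helper is illustrative, not part of the pruna API:

```python
import json

# Inline copy of the smash_config.json shown above.
SMASH_CONFIG = """
{
  "batcher": null,
  "cacher": null,
  "compiler": null,
  "pruner": null,
  "quantizer": null,
  "max_batch_size": 1,
  "device": "cpu",
  "save_fns": [],
  "load_fns": ["transformers"],
  "reapply_after_load": {
    "pruner": null, "quantizer": null, "cacher": null,
    "compiler": null, "batcher": null
  }
}
"""

def active_algorithms(config: dict) -> list[str]:
    """Return the names of compression slots that are actually set."""
    slots = ("batcher", "cacher", "compiler", "pruner", "quantizer")
    return [name for name in slots if config.get(name) is not None]

config = json.loads(SMASH_CONFIG)
print(active_algorithms(config))  # []  -> no compression was applied
print(config["device"])           # cpu
print(config["load_fns"])         # ['transformers']
```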
## Model Configuration
The configuration of the model is stored in the `config.json` file.
```json
{
  "config": {
    "architectures": [
      "Llama4TextModel"
    ],
    "attention_bias": false,
    "attention_chunk_size": 8192,
    "attention_dropout": 0.0,
    "attn_scale": 0.1,
    "attn_temperature_tuning": 4,
    "bos_token_id": 200000,
    "cache_implementation": "hybrid",
    "eos_token_id": [
      200001,
      200007,
      200008
    ],
    "floor_scale": 8192,
    "for_llm_compressor": false,
    "head_dim": 8,
    "hidden_act": "silu",
    "hidden_size": 16,
    "initializer_range": 0.02,
    "interleave_moe_layer_step": 1,
    "intermediate_size": 32,
    "intermediate_size_mlp": 64,
    "max_position_embeddings": 10485760,
    "model_type": "llama4_text",
    "moe_layers": [
      0,
      1,
      2,
      3,
      4
    ],
    "no_rope_layers": [
      1,
      1,
      1,
      0,
      1
    ],
    "num_attention_heads": 10,
    "num_experts_per_tok": 1,
    "num_hidden_layers": 5,
    "num_key_value_heads": 2,
    "num_local_experts": 4,
    "output_router_logits": false,
    "pad_token_id": 200018,
    "rms_norm_eps": 1e-05,
    "rope_scaling": {
      "factor": 8.0,
      "high_freq_factor": 4.0,
      "low_freq_factor": 1.0,
      "original_max_position_embeddings": 8192,
      "rope_type": "llama3"
    },
    "rope_theta": 500000.0,
    "router_aux_loss_coef": 0.001,
    "router_jitter_noise": 0.0,
    "tie_word_embeddings": false,
    "torch_dtype": "float32",
    "transformers_version": "4.51.3",
    "use_cache": true,
    "use_qk_norm": true,
    "vocab_size": 202048
  }
}
```
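These values describe a deliberately tiny, randomly initialized Llama 4 text model (5 hidden layers, hidden size 16, every layer a MoE layer), intended for testing pipelines rather than real inference. Derived quantities such as the grouped-query attention ratio can be read straight off the configuration; the sketch below inlines a subset of the fields shown above:

```python
import json

# Subset of the config.json shown above, inlined for illustration.
CONFIG = """
{
  "hidden_size": 16,
  "num_hidden_layers": 5,
  "num_attention_heads": 10,
  "num_key_value_heads": 2,
  "head_dim": 8,
  "num_local_experts": 4,
  "num_experts_per_tok": 1,
  "vocab_size": 202048
}
"""

cfg = json.loads(CONFIG)

# Grouped-query attention: each key/value head is shared by this
# many query heads (10 query heads / 2 KV heads).
gqa_ratio = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
print(gqa_ratio)  # 5

# MoE routing: 1 of the 4 local experts is active per token.
print(f"{cfg['num_experts_per_tok']} of {cfg['num_local_experts']} experts per token")
```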
## 🌍 Join the Pruna AI community!
[Twitter](https://twitter.com/PrunaAI)
[GitHub](https://github.com/PrunaAI)
[LinkedIn](https://www.linkedin.com/company/93832878/admin/feed/posts/?feedType=following)
[Discord](https://discord.com/invite/rskEr4BZJx)
[Reddit](https://www.reddit.com/r/PrunaAI/)