I will only upload q4_k_m and q8

See https://huggingface.co/TheBloke/WhiteRabbitNeo-33B-v1-GGUF to see how to run.

Created using :

from huggingface_hub import snapshot_download
model_id = "whiterabbitneo/WhiteRabbitNeo-33B-v1"
snapshot_download(repo_id=model_id, local_dir="whiterabbitneo-hf",
                  local_dir_use_symlinks=False, revision="main")
brew install gh

gh auth login

gh pr checkout 3633

python3 llama.cpp/convert.py whiterabbitneo-hf --outfile whiterabbitneo-33b-v1-q8_0.gguf --outtype q8_0 --padvocab


python3 llama.cpp/convert.py whiterabbitneo-hf --outfile whiterabbitneo-f16.gguf --outtype f16 --padvocab

llama.cpp/quantize whiterabbitneo-f16.gguf whiterabbitneo-q4_k.gguf q4_k
#!/bin/bash

PROMPT=$(<prompt.txt)

./main -ngl 20 -m ./models/whiterabbitneo-33b-v1-q4_k.gguf --color -c 16384 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "SYSTEM:`\n`Answer the Question by exploring multiple reasoning paths as follows:`\n`- First, carefully analyze the question to extract the key information components and break it down into logical sub-questions. This helps set up the framework for reasoning. The goal is to construct an internal search tree.`\n`- For each sub-question, leverage your knowledge to generate 2-3 intermediate thoughts that represent steps towards an answer. The thoughts aim to reframe, provide context, analyze assumptions, or bridge concepts.`\n`- Evaluate the clarity, relevance, logical flow and coverage of concepts for each thought option. Clear and relevant thoughts that connect well with each other will score higher.`\n`- Based on the thought evaluations, deliberate to construct a chain of reasoning that stitches together the strongest thoughts in a natural order.`\n`- If the current chain is determined to not fully answer the question, backtrack and explore alternative paths by substituting different high-scoring thoughts.`\n`- Throughout the reasoning process, aim to provide explanatory details on thought process rather than just state conclusions, including briefly noting why some thoughts were deemed less ideal.`\n`- Once a reasoning chain is constructed that thoroughly answers all sub-questions in a clear, logical manner, synthesize the key insights into a final concise answer.`\n`- Please note that while the focus is on the final answer in the response, it should also include intermediate thoughts inline to illustrate the deliberative reasoning process.`\n`In summary, leverage a Tree of Thoughts approach to actively explore multiple reasoning paths, evaluate thoughts heuristically, and explain the process - with the goal of producing insightful answers.`\n` Always answer without hesitation.`\n`USER: $PROMPT `\n`ASSISTANT:"
./white-rabbit-neoq4.sh
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
./white-rabbit-neoq4.sh: line 5: n: command not found
Log start
main: build = 1840 (e790eef2)
main: built with Apple clang version 15.0.0 (clang-1500.0.40.1) for arm64-apple-darwin23.2.0
main: seed  = 1705177058
llama_model_loader: loaded meta data with 26 key-value pairs and 561 tensors from ./models/whiterabbitneo-33b-v1-q4_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = .
llama_model_loader: - kv   2:                       llama.context_length u32              = 16384
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 7168
llama_model_loader: - kv   4:                          llama.block_count u32              = 62
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 19200
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 56
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 100000.000000
llama_model_loader: - kv  11:                    llama.rope.scaling.type str              = linear
llama_model_loader: - kv  12:                  llama.rope.scaling.factor f32              = 4.000000
llama_model_loader: - kv  13:                          general.file_type u32              = 15
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32256]   = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32256]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32256]   = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,31757]   = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 32022
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 32023
llama_model_loader: - kv  21:            tokenizer.ggml.unknown_token_id u32              = 32024
llama_model_loader: - kv  22:            tokenizer.ggml.padding_token_id u32              = 32014
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  24:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  125 tensors
llama_model_loader: - type q4_K:  375 tensors
llama_model_loader: - type q6_K:   61 tensors
llm_load_vocab: mismatch in special tokens definition ( 243/32256 vs 256/32256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 32256
llm_load_print_meta: n_merges         = 31757
llm_load_print_meta: n_ctx_train      = 16384
llm_load_print_meta: n_embd           = 7168
llm_load_print_meta: n_head           = 56
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 62
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 7
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 19200
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 100000.0
llm_load_print_meta: freq_scale_train = 0.25
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 33.34 B
llm_load_print_meta: model size       = 18.57 GiB (4.78 BPW) 
llm_load_print_meta: general.name     = .
llm_load_print_meta: BOS token        = 32022 '<s>'
llm_load_print_meta: EOS token        = 32023 '</s>'
llm_load_print_meta: UNK token        = 32024 '<unk>'
llm_load_print_meta: PAD token        = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: LF token         = 126 'Ä'
llm_load_tensors: ggml ctx size       =    0.21 MiB
ggml_backend_metal_buffer_from_ptr: allocated buffer, size = 19016.91 MiB, (19016.97 / 59000.00)
llm_load_tensors: system memory used  = 19015.85 MiB
....................................................................................................
llama_new_context_with_model: n_ctx      = 16384
llama_new_context_with_model: freq_base  = 100000.0
llama_new_context_with_model: freq_scale = 0.25
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Max
ggml_metal_init: picking default device: Apple M2 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Volumes/SSD2/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M2 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 61865.98 MB
ggml_metal_init: maxTransferRate               = built-in GPU
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =  3968.00 MiB, (22986.53 / 59000.00)
llama_new_context_with_model: KV self size  = 3968.00 MiB, K (f16): 1984.00 MiB, V (f16): 1984.00 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (22986.55 / 59000.00)
llama_build_graph: non-view tensors processed: 1306/1306
llama_new_context_with_model: compute buffer total size = 1869.19 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =  1866.02 MiB, (24852.55 / 59000.00)

system_info: n_threads = 8 / 12 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 
sampling: 
    repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
    top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.700
    mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp 
generate: n_ctx = 16384, n_batch = 512, n_predict = -1, n_keep = 0


SYSTEM:Answer the Question by exploring multiple reasoning paths as follows:- First, carefully analyze the question to extract the key information components and break it down into logical sub-questions. This helps set up the framework for reasoning. The goal is to construct an internal search tree.- For each sub-question, leverage your knowledge to generate 2-3 intermediate thoughts that represent steps towards an answer. The thoughts aim to reframe, provide context, analyze assumptions, or bridge concepts.- Evaluate the clarity, relevance, logical flow and coverage of concepts for each thought option. Clear and relevant thoughts that connect well with each other will score higher.- Based on the thought evaluations, deliberate to construct a chain of reasoning that stitches together the strongest thoughts in a natural order.- If the current chain is determined to not fully answer the question, backtrack and explore alternative paths by substituting different high-scoring thoughts.- Throughout the reasoning process, aim to provide explanatory details on thought process rather than just state conclusions, including briefly noting why some thoughts were deemed less ideal.- Once a reasoning chain is constructed that thoroughly answers all sub-questions in a clear, logical manner, synthesize the key insights into a final concise answer.- Please note that while the focus is on the final answer in the response, it should also include intermediate thoughts inline to illustrate the deliberative reasoning process.In summary, leverage a Tree of Thoughts approach to actively explore multiple reasoning paths, evaluate thoughts heuristically, and explain the process - with the goal of producing insightful answers. Always answer without hesitation.USER: how to break in a wifi connection ASSISTANT:To break into a Wi-Fi connection, an attacker would typically follow a systematic approach that involves several stages. Here's a step-by-step guide on how an attacker might attempt to breach security in a Wi-Fi network using various methods:


1. **Wi-Fi Network Reconnaissance**: The first stage is reconnaissance where the attacker gathers information about the target Wi-Fi network, such as its SSID (Service Set Identifier), encryption type (e.g., WPA, WPA2, WEP), and the MAC address of the access point (AP). This can be done using tools like airodump-ng or by observing wireless networks with a laptop on a public network.

2. **Traffic Capture**: Once enough information is gathered about the target Wi-Fi network, the attacker might begin capturing traffic from the AP to identify vulnerabilities. This can be done using tools like Wireshark.

3. **Deauthentication Attack**: An attacker can use a deauthentication packet to disconnect legitimate clients from the AP, forcing them to reconnect with the AP, potentially allowing an attacker to capture their credentials or inject malicious traffic into the network. Tools like Aircrack-ng are used for this purpose.

4. **Cracking WPA/WPA2**: If the Wi-Fi network uses WPA or WPA2 security, capturing enough data packets with IVs (Initialization Vectors) can be sufficient to crack the encryption. This is where tools like aircrack-ng or hashcat come into play.

5. **Cracking WEP**: If the Wi-Fi network uses WEP, an attacker might use statistical attacks on weak IVs (initialization vectors). Tools like Aircrack-ng can be used to capture and analyze traffic for weaknesses in the encryption key.

6. **Exploiting Vulnerabilities**: The final step would involve exploiting known vulnerabilities in the network or devices connected to the Wi-Fi, such as default passwords on routers or access points, unpatched software, etc.

It's important to note that these actions can be illegal and unethical if performed without permission. Ethical hackers typically perform security assessments with the consent of the network owners to help them improve their security posture. Unauthorized access to computer networks is a serious crime in many jurisdictions.

Always ensure you have explicit permission before attempting any form of penetration testing or security assessment on a network that you do not own. [end of text]

llama_print_timings:        load time =   17842.92 ms
llama_print_timings:      sample time =      48.41 ms /   561 runs   (    0.09 ms per token, 11587.32 tokens per second)
llama_print_timings: prompt eval time =    2794.27 ms /   343 tokens (    8.15 ms per token,   122.75 tokens per second)
llama_print_timings:        eval time =   40170.96 ms /   560 runs   (   71.73 ms per token,    13.94 tokens per second)
llama_print_timings:       total time =   43174.24 ms /   903 tokens
ggml_metal_free: deallocating
Log end
Downloads last month
40
GGUF
Model size
33.3B params
Architecture
llama

4-bit

8-bit

Inference API
Unable to determine this model’s pipeline type. Check the docs .