library_name: transformers
license: apache-2.0
license_link: >-
  https://huggingface.co/huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2/blob/main/LICENSE
language:
  - en
pipeline_tag: text-generation
base_model: huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
tags:
  - chat
  - abliterated
  - uncensored
  - llama-cpp
  - gguf-my-repo

Triangle104/Qwen2.5-14B-Instruct-abliterated-v2-Q6_K-GGUF

This model was converted to GGUF format from huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2 using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.
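As a quick sanity check on a downloaded file, the sketch below reads the first eight bytes of a GGUF file: a four-byte `GGUF` magic followed by a 32-bit format version. The helper name `read_gguf_version` is hypothetical, and the code assumes the common little-endian layout; it does not validate anything beyond the header.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF format version, or None if the file is not GGUF.

    Assumption: the version field is a little-endian uint32, which is
    the common case for files produced on x86 and ARM machines.
    """
    with open(path, "rb") as f:
        header = f.read(8)  # 4-byte magic + 4-byte version
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version("qwen2.5-14b-instruct-abliterated-v2-q6_k.gguf")` on a local copy of the file should return an integer version; `None` suggests a truncated or mislabeled download.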


Model details:

This is an uncensored version of Qwen2.5-14B-Instruct created with abliteration (see this article to learn more about it).

Special thanks to @FailSpy for the original code and technique. Please follow him if you're interested in abliterated models.

Important Note

This version is an improvement over the previous one, Qwen2.5-14B-Instruct-abliterated.

Usage

You can use this model in your applications by loading it with Hugging Face's transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_name = "huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize conversation context
initial_messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."}
]
messages = initial_messages.copy()  # Copy the initial conversation context

# Enter conversation loop
while True:
    # Get user input
    user_input = input("User: ").strip()  # Strip leading and trailing spaces

    # If the user types '/exit', end the conversation
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break

    # If the user types '/clean', reset the conversation context
    if user_input.lower() == "/clean":
        messages = initial_messages.copy()  # Reset conversation context
        print("Chat history cleared. Starting a new conversation.")
        continue

    # If input is empty, prompt the user and continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    # Add user input to the conversation
    messages.append({"role": "user", "content": user_input})

    # Build the chat template
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize input and prepare it for the model
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Generate a response from the model
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=8192
    )

    # Keep only the newly generated tokens by slicing off the prompt
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    # Decode the new tokens, removing special tokens
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    # Add the model's response to the conversation
    messages.append({"role": "assistant", "content": response})

    # Print the model's response
    print(f"Qwen: {response}")
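The list comprehension in the loop above is needed because `generate` on a decoder-only model returns the prompt tokens followed by the newly generated ones. The sketch below isolates that slicing step with plain Python lists standing in for tensors; the helper name `strip_prompt_tokens` and the token ids are made up for illustration.

```python
def strip_prompt_tokens(input_ids, output_ids):
    """Drop the echoed prompt from each generated sequence in a batch.

    input_ids:  one token-id sequence per prompt
    output_ids: one sequence per prompt, each starting with that prompt
    """
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Hypothetical ids: a 4-token prompt followed by 3 generated tokens.
prompt_batch = [[101, 7, 8, 9]]
generated_batch = [[101, 7, 8, 9, 42, 43, 2]]

print(strip_prompt_tokens(prompt_batch, generated_batch))  # [[42, 43, 2]]
```

Without this step, decoding the full output would repeat the system prompt and the entire chat history in every reply.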

Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Triangle104/Qwen2.5-14B-Instruct-abliterated-v2-Q6_K-GGUF --hf-file qwen2.5-14b-instruct-abliterated-v2-q6_k.gguf -p "The meaning to life and the universe is"
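To drive the same CLI command from a script, one option is to build the argument list in Python and hand it to `subprocess`. This is a minimal sketch: the function names are hypothetical, only the flags shown above are used, and it assumes `llama-cli` is on `PATH`. Some llama.cpp builds may start an interactive chat session for instruct models, in which case `run_llama_cli` would not return on its own.

```python
import shlex
import shutil
import subprocess

REPO = "Triangle104/Qwen2.5-14B-Instruct-abliterated-v2-Q6_K-GGUF"
GGUF_FILE = "qwen2.5-14b-instruct-abliterated-v2-q6_k.gguf"


def build_llama_cli_args(prompt, repo=REPO, gguf_file=GGUF_FILE):
    """Return the argv list for the llama-cli invocation shown above."""
    return ["llama-cli", "--hf-repo", repo, "--hf-file", gguf_file, "-p", prompt]


def run_llama_cli(prompt):
    """Run llama-cli and return its standard output as text."""
    if shutil.which("llama-cli") is None:
        raise RuntimeError("llama-cli not found on PATH")
    result = subprocess.run(
        build_llama_cli_args(prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Show the shell command that would be executed, without running it.
print(shlex.join(build_llama_cli_args("The meaning to life and the universe is")))
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.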

Server:

llama-server --hf-repo Triangle104/Qwen2.5-14B-Instruct-abliterated-v2-Q6_K-GGUF --hf-file qwen2.5-14b-instruct-abliterated-v2-q6_k.gguf -c 2048
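Once the server is running, it can be queried over HTTP through its OpenAI-compatible chat endpoint. The sketch below uses only the standard library; it assumes the server is listening on the default address `http://127.0.0.1:8080`, and the helper names and the `max_tokens` value are illustrative choices rather than anything prescribed by this model card.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # assumption: llama-server default host and port


def build_chat_request(messages, base_url=BASE_URL, max_tokens=256):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style JSON response."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def chat(messages):
    """Send the conversation to the server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(messages)) as resp:
        return extract_reply(resp.read())
```

With the server up, `chat([{"role": "user", "content": "Hello"}])` returns the model's reply as a string. Keep in mind that `-c 2048` caps the context at 2048 tokens, so long conversations need to be trimmed on the client side.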

Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag, along with any other hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Triangle104/Qwen2.5-14B-Instruct-abliterated-v2-Q6_K-GGUF --hf-file qwen2.5-14b-instruct-abliterated-v2-q6_k.gguf -p "The meaning to life and the universe is"

or

./llama-server --hf-repo Triangle104/Qwen2.5-14B-Instruct-abliterated-v2-Q6_K-GGUF --hf-file qwen2.5-14b-instruct-abliterated-v2-q6_k.gguf -c 2048