---
license: mit
language:
- en
base_model: prithivMLmods/Phi-4-QwQ
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- llama
- phi3
- phi
- llama-cpp
- gguf-my-repo
---

# Triangle104/Phi-4-QwQ-Q8_0-GGUF

This model was converted to GGUF format from [`prithivMLmods/Phi-4-QwQ`](https://huggingface.co/prithivMLmods/Phi-4-QwQ) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/prithivMLmods/Phi-4-QwQ) for more details on the model.

---

Phi-4-QwQ, fine-tuned from Microsoft's Phi-4, is a state-of-the-art open model developed with a focus on responsible problem solving and advanced reasoning capabilities. Built upon a diverse blend of synthetic datasets, carefully filtered public-domain websites, and high-quality academic books and Q&A datasets, Phi-4-QwQ ensures that small, capable models are trained with datasets of exceptional depth and precision.

Phi-4-QwQ adopts a robust safety post-training approach using open-source and in-house synthetic datasets. This involves a combination of SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization) techniques, ensuring helpful and harmless outputs across various safety categories.
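
For illustration, a single DPO pass of the kind described above can be sketched with the TRL library. Everything below (the toy preference pairs and the hyperparameters) is a hypothetical stand-in, not the actual, unpublished Phi-4-QwQ training recipe:

```python
# Hypothetical DPO sketch with TRL; not the actual Phi-4-QwQ recipe.
# pip install trl datasets
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("prithivMLmods/Phi-4-QwQ")
tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Phi-4-QwQ")

# Toy preference pairs: "chosen" is the preferred (helpful, harmless) answer.
pairs = Dataset.from_dict({
    "prompt":   ["How should I dispose of old batteries?"],
    "chosen":   ["Take them to a certified battery recycling point."],
    "rejected": ["Just throw them in the household trash."],
})

args = DPOConfig(output_dir="phi-4-qwq-dpo", beta=0.1)  # illustrative values
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=pairs,
    processing_class=tokenizer,  # this kwarg is named `tokenizer` in older TRL releases
)
trainer.train()
```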

## Dataset Info

Phi-4-QwQ is fine-tuned on a carefully curated synthetic dataset generated using an advanced pipeline optimized for Chain of Thought (CoT) reasoning and Responsible Problem Breakdown (RPB) methodologies. This ensures that the model excels at:

- Logical reasoning
- Step-by-step problem-solving
- Breaking down complex tasks into manageable parts

The dataset also emphasizes responsible decision-making and fairness in generating solutions.

## Run with Transformers

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Phi-4-QwQ")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/Phi-4-QwQ",
    device_map="auto",           # spread weights across available devices
    torch_dtype=torch.bfloat16,  # bf16 halves memory vs. fp32
)

input_text = "Explain the concept of black holes."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```
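
Decoding can be tuned with the standard `generate` arguments; the values below are illustrative, not settings recommended by the model authors:

```python
outputs = model.generate(
    **input_ids,
    max_new_tokens=256,
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.7,   # illustrative value
    top_p=0.9,         # nucleus sampling cutoff
)
print(tokenizer.decode(outputs[0]))
```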

For chat-style interactions, use `tokenizer.apply_chat_template`:

```python
messages = [
    {"role": "user", "content": "Explain the concept of black holes."},
]
# add_generation_prompt=True appends the assistant header so the model
# starts a new reply instead of continuing the user turn
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
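
To stream tokens to stdout as they are generated, the same call can be wrapped with `transformers.TextStreamer` (an optional convenience, not part of the original example):

```python
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)  # print only the new tokens
_ = model.generate(**input_ids, max_new_tokens=256, streamer=streamer)
```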

## Intended Use

Phi-4-QwQ is tailored for a wide range of applications, especially those involving advanced reasoning, multilingual capabilities, and responsible problem-solving. Its primary use cases include:

**Responsible Problem Solving**
- Breaking down complex problems into logical, actionable steps.
- Offering ethical, well-rounded solutions in academic and professional contexts.

**Advanced Reasoning Tasks**
- Excelling in mathematics, logic, and scientific reasoning.
- Providing detailed explanations and systematic answers.

**Content Generation**
- Assisting in generating high-quality content for various domains, including creative writing and technical documentation.
- Supporting marketers, writers, and educators with detailed and well-structured outputs.

**Educational Support**
- Acting as a virtual tutor for students by generating practice questions, answers, and detailed explanations.
- Helping educators design learning material that promotes critical thinking and step-by-step problem-solving.

**Customer Support & Dialogue Systems**
- Enabling chatbots and virtual assistants to provide accurate, helpful, and responsible responses.
- Enhancing customer service with reasoning-driven automation.

**Multilingual Capabilities**
- Supporting multilingual communication and content generation while maintaining contextual accuracy.
- Assisting in translations with a focus on retaining meaning and nuance.

**Safety-Critical Applications**
- Ensuring safe and harmless outputs, making it suitable for sensitive domains.
- Providing aligned interactions with human oversight for critical systems.

## Limitations

Despite its strengths, Phi-4-QwQ has some limitations that users should be aware of:

**Bias and Fairness**
While great effort has been made to minimize biases, users should critically assess the model’s output in sensitive scenarios to avoid unintended bias.

**Contextual Interpretation**
The model may occasionally misinterpret highly nuanced prompts or ambiguous contexts, leading to suboptimal responses.

**Knowledge Cutoff**
Phi-4-QwQ’s knowledge is static and based on the data available at the time of training. It does not include real-time updates or information on recent developments.

**Safety and Harmlessness**
Despite post-training safety alignment, inappropriate or harmful outputs may still occur. Continuous monitoring and human oversight are advised when using the model in critical contexts.

**Computational Requirements**
Deploying Phi-4-QwQ efficiently may require substantial computational resources, particularly for large-scale deployments or real-time applications.

**Ethical Considerations**
Users are responsible for ensuring that the model is not employed for malicious purposes, such as spreading misinformation, generating harmful content, or facilitating unethical behavior.

**Domain-Specific Expertise**
While the model is versatile, it may not perform optimally in highly specialized domains (e.g., law, medicine, finance) without further domain-specific fine-tuning.

---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:

```bash
llama-cli --hf-repo Triangle104/Phi-4-QwQ-Q8_0-GGUF --hf-file phi-4-qwq-q8_0.gguf -p "The meaning to life and the universe is"
```

### Server:

```bash
llama-server --hf-repo Triangle104/Phi-4-QwQ-Q8_0-GGUF --hf-file phi-4-qwq-q8_0.gguf -c 2048
```
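
Once running, llama-server exposes an OpenAI-compatible HTTP API (on port 8080 by default), so you can sanity-check the model with curl:

```bash
# query the local server's chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain the concept of black holes."}]}'
```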

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

```bash
cd llama.cpp && LLAMA_CURL=1 make
```
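
Note: recent llama.cpp checkouts have replaced the Makefile with CMake; if `make` fails on a current clone, an equivalent build (assuming the `LLAMA_CURL` CMake option) is:

```bash
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release
# binaries land in build/bin/ (e.g. build/bin/llama-cli)
```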

Step 3: Run inference through the main binary.

```bash
./llama-cli --hf-repo Triangle104/Phi-4-QwQ-Q8_0-GGUF --hf-file phi-4-qwq-q8_0.gguf -p "The meaning to life and the universe is"
```

or

```bash
./llama-server --hf-repo Triangle104/Phi-4-QwQ-Q8_0-GGUF --hf-file phi-4-qwq-q8_0.gguf -c 2048
```
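
The same GGUF file can also be used from Python via the `llama-cpp-python` bindings; a minimal sketch, assuming `pip install llama-cpp-python` and that package's `Llama.from_pretrained` helper:

```python
from llama_cpp import Llama

# downloads the GGUF from the Hugging Face repo on first use
llm = Llama.from_pretrained(
    repo_id="Triangle104/Phi-4-QwQ-Q8_0-GGUF",
    filename="phi-4-qwq-q8_0.gguf",
    n_ctx=2048,  # context size, matching the llama-server example above
)

out = llm("The meaning to life and the universe is", max_tokens=64)
print(out["choices"][0]["text"])
```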