---
license: apache-2.0
datasets:
- prithivMLmods/Math-Solve
- AI-MO/NuminaMath-CoT
- amphora/QwQ-LongCoT-130K
- amphora/QwQ-LongCoT-130K-2
language:
- en
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- Math
- text-generation-inference
- Deep-think
---

# Deepthink-Reasoning-14B

The Deepthink-Reasoning-14B model is a fine-tuned version of Qwen2.5-14B-Instruct. It was trained on the math and long chain-of-thought datasets listed in the model metadata and is designed for text generation tasks that require deep reasoning, logical structuring, and problem-solving. It aims to produce accurate, contextually relevant answers to complex queries, which makes it well suited to applications in education, programming, and creative writing.

Deepthink-Reasoning-14B is geared toward generating step-by-step solutions, creative content, and logical analyses. It works with both structured and unstructured input and aims to produce text that stays closely aligned with the user's request.
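The datasets listed in the metadata include NuminaMath-CoT, whose worked solutions typically place the final answer in a LaTeX `\boxed{...}`. The model may carry that convention into its own step-by-step answers, but this is an assumption and is not guaranteed. If it does, the hypothetical helper below is a minimal sketch for pulling out the final answer. It tracks brace depth so that nested expressions such as `\frac{1}{2}` come through intact.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None (hypothetical helper)."""
    marker = "\\boxed{"
    start = text.rfind(marker)  # the final answer is assumed to be the last box
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # matching close brace of \boxed{
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces: no complete answer found


solution = r"Adding the fractions gives $\frac{1}{3} + \frac{1}{6} = \boxed{\frac{1}{2}}$."
print(extract_boxed_answer(solution))  # \frac{1}{2}
```

Returning `None` rather than raising makes it easy to flag responses that never reached a boxed answer.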

Key capabilities inherited from the Qwen2.5 base model:

- Broader knowledge and substantially stronger coding and mathematics performance than Qwen2, owing to specialized expert models used in these domains.
- Substantial improvements in instruction following, long-text generation (over 8K tokens), understanding structured data such as tables, and producing structured output, especially JSON. It is also more resilient to diverse system prompts, which helps with role-play and condition-setting for chatbots.
- A context window of up to 128K tokens, with up to 8K tokens generated per response.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
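The limits above imply a simple budget. Prompt tokens plus generated tokens must fit in the context window, and a single response is capped at 8K tokens. The sketch below encodes that arithmetic using the figures from the list. Treat those figures as assumptions and check them against the model's `config.json`. For the Qwen2.5-Instruct base models, the upstream documentation notes that inputs longer than 32,768 tokens require enabling YaRN rope scaling, so the window available by default for this fine-tune may be smaller. `max_new_tokens_for` is a hypothetical helper. Its result could be passed to `model.generate(..., max_new_tokens=...)`, with `prompt_tokens` taken from `model_inputs.input_ids.shape[1]` in the Quickstart.

```python
# Limits taken from the capability list; these are assumptions, so confirm
# them against the model's config.json before relying on them.
CONTEXT_WINDOW = 128 * 1024  # 131,072 tokens shared by prompt and output
MAX_GENERATION = 8 * 1024    # 8,192 tokens per response


def max_new_tokens_for(prompt_tokens: int, requested: int = MAX_GENERATION) -> int:
    """Return the largest max_new_tokens value that respects both limits."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"a {prompt_tokens}-token prompt leaves no room in a {CONTEXT_WINDOW}-token window"
        )
    return min(requested, MAX_GENERATION, remaining)


print(max_new_tokens_for(1_000))    # 8192: the generation cap applies
print(max_new_tokens_for(130_000))  # 1072: the context window applies
```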


# Quickstart with Transformers

The following snippet uses `apply_chat_template` to show how to load the tokenizer and model and generate a response.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Deepthink-Reasoning-14B"

# Load weights in their stored precision and place them on available devices
# (device_map="auto" requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template and open an assistant turn
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
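With `tokenize=False`, `apply_chat_template` returns the rendered prompt as a string, which is useful for debugging. Qwen2.5 chat models use the ChatML layout. If this fine-tune keeps its base model's template (an assumption; print `tokenizer.chat_template` to confirm), the string looks roughly like the output of the hypothetical `render_chatml` helper below. This sketch is only for understanding the format. Keep using `apply_chat_template` in practice, because the real template also handles details this sketch ignores, such as a default system prompt and tool calls.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate a ChatML-style prompt (hypothetical helper, no tool-call handling)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an empty assistant turn so the model writes the reply next
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
print(render_chatml(messages))
```

The trailing open `assistant` turn is what `add_generation_prompt=True` contributes. Without it, the model may try to continue the user's message instead of answering it.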

### Intended Use:
1. Education: Well suited to producing step-by-step solutions to complex problems, explanations, and educational content in multiple languages.
2. Programming: Handles coding and debugging tasks and can generate structured outputs such as JSON, which helps developer productivity.
3. Creative Writing: Suitable for generating stories, essays, and other creative content with a logical, coherent structure.
4. Long-Context Processing: Can take long inputs and produce long outputs, making it useful for summarizing lengthy documents or drafting detailed reports.
5. Multilingual Applications: Supports 29+ languages, enabling translation, multilingual education, and cross-cultural communication.
6. Data Structuring: Works well with structured data such as tables and JSON, making it effective for business analytics and automated report generation.
7. Chatbots and Role-Play: Follows diverse instructions, adapts to different system prompts, and maintains long conversational contexts, which improves chatbot interactions.
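For the JSON-output use cases above, the model's reply is still free text and may wrap the JSON in prose or Markdown, so it is worth parsing defensively. The minimal sketch below uses only the standard library. `extract_json` is a hypothetical helper, not part of `transformers`. It assumes the first JSON object or array that parses is the one you want.

```python
import json
from typing import Any, Optional


def extract_json(response: str) -> Optional[Any]:
    """Return the first JSON object or array that parses, or None (hypothetical helper)."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(response):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            value, _end = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            continue  # e.g. a "[" that opens ordinary text rather than JSON
        return value
    return None


reply = 'Here is the record: {"name": "Ada", "languages": ["en", "fr"]} Let me know if you need more.'
print(extract_json(reply))  # {'name': 'Ada', 'languages': ['en', 'fr']}
```

Scanning with `raw_decode` avoids brittle regular expressions and tolerates leading and trailing prose. For strict pipelines, validate the parsed value against an expected schema afterwards.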


### Limitations:
1. Resource Requirements: At 14B parameters, the model needs substantial GPU memory and compute, which makes it hard to deploy in low-resource environments.
2. Hallucination Risk: The model may generate incorrect or fabricated information, particularly for unfamiliar or ambiguous inputs.
3. Limited Domain-Specific Expertise: Despite its broad knowledge, it may underperform in highly specialized fields that are poorly covered by its training data.
4. Long-Context Limitations: Although it supports up to 128K tokens, output quality and efficiency may degrade with extremely long or complex contexts.
5. Bias in Outputs: The model may reflect biases present in its training data, which can affect its objectivity and its cultural sensitivity, including in multilingual outputs.
6. Dependence on Prompt Quality: Output quality depends heavily on clear, well-structured prompts; poorly framed prompts can lead to irrelevant or suboptimal responses.
7. Errors in Multilingual Output: Despite broad multilingual support, subtle errors in grammar, syntax, or cultural nuance may appear, especially in low-resource languages.
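To put the resource requirement in numbers: weight memory is roughly the parameter count multiplied by the bytes per parameter. The back-of-the-envelope sketch below assumes about 14.7B parameters, the figure reported for the Qwen2.5-14B base, and assumes fine-tuning leaves that count unchanged. It counts weights only. The KV cache grows with context length and adds to this total, which matters for the long-context limitation above.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Approximate parameter count reported for the Qwen2.5-14B base model (assumed
# unchanged by fine-tuning). KV cache and activations come on top of this.
NUM_PARAMS = 14.7e9

for label, bits in [("16-bit (bf16/fp16)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

At 16 bits the weights alone come to roughly 29 GB, and at 4 bits to roughly 7 GB. Quantized loading is one way to fit the model on smaller GPUs, at some possible cost in output quality.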