--- library_name: transformers tags: - medical-qa - healthcare - llama - fine-tuned - llama-cpp - gguf-my-repo license: llama3.2 datasets: - ruslanmv/ai-medical-chatbot base_model: Ellbendls/llama-3.2-3b-chat-doctor --- # Triangle104/llama-3.2-3b-chat-doctor-Q4_K_M-GGUF This model was converted to GGUF format from [`Ellbendls/llama-3.2-3b-chat-doctor`](https://huggingface.co/Ellbendls/llama-3.2-3b-chat-doctor) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space. Refer to the [original model card](https://huggingface.co/Ellbendls/llama-3.2-3b-chat-doctor) for more details on the model. --- Model details: - Llama-3.2-3B-Chat-Doctor is a specialized medical question-answering model based on the Llama 3.2 3B architecture. This model has been fine-tuned specifically for providing accurate and helpful responses to medical-related queries. Developed by: Ellbendl Satria Model type: Language Model (Conversational AI) Language: English Base Model: Meta Llama-3.2-3B-Instruct Model Size: 3 Billion Parameters Specialization: Medical Question Answering License: llama3.2 Model Capabilities Provides informative responses to medical questions Assists in understanding medical terminology and health-related concepts Offers preliminary medical information (not a substitute for professional medical advice) Direct Use This model can be used for: Providing general medical information Explaining medical conditions and symptoms Offering basic health-related guidance Supporting medical education and patient communication Limitations and Important Disclaimers ⚠️ CRITICAL WARNINGS: NOT A MEDICAL PROFESSIONAL: This model is NOT a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider for medical concerns. The model's responses should be treated as informational only and not as medical recommendations. Out-of-Scope Use The model SHOULD NOT be used for: Providing emergency medical advice Diagnosing specific medical conditions Replacing professional medical consultation Making critical healthcare decisions Bias, Risks, and Limitations Potential Biases May reflect biases present in the training data Responses might not account for individual patient variations Limited by the comprehensiveness of the training dataset Technical Limitations Accuracy is limited to the knowledge in the training data May not capture the most recent medical research or developments Cannot perform physical examinations or medical tests Recommendations Always verify medical information with professional healthcare providers Use the model as a supplementary information source Be aware of potential inaccuracies or incomplete information Training Details Training Data Source Dataset: ruslanmv/ai-medical-chatbot Base Model: Meta Llama-3.2-3B-Instruct Training Procedure [Provide details about the fine-tuning process, if available] Fine-tuning approach Computational resources used Training duration Specific techniques applied during fine-tuning How to Use the Model Hugging Face Transformers from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "Ellbendls/llama-3.2-3b-chat-doctor" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name) # Example usage input_text = "I had a surgery which ended up with some failures. What can I do to fix it?" # Prepare inputs with explicit padding and attention mask inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True) # Generate response with more explicit parameters outputs = model.generate( input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'], max_new_tokens=150, # Specify max new tokens to generate do_sample=True, # Enable sampling for more diverse responses temperature=0.7, # Control randomness of output top_p=0.9, # Nucleus sampling to maintain quality num_return_sequences=1 # Number of generated sequences ) # Decode the generated response response = tokenizer.decode(outputs[0], skip_special_tokens=True) print(response) Ethical Considerations This model is developed with the intent to provide helpful, accurate, and responsible medical information. Users are encouraged to: Use the model responsibly Understand its limitations Seek professional medical advice for serious health concerns --- ## Use with llama.cpp Install llama.cpp through brew (works on Mac and Linux) ```bash brew install llama.cpp ``` Invoke the llama.cpp server or the CLI. ### CLI: ```bash llama-cli --hf-repo Triangle104/llama-3.2-3b-chat-doctor-Q4_K_M-GGUF --hf-file llama-3.2-3b-chat-doctor-q4_k_m.gguf -p "The meaning to life and the universe is" ``` ### Server: ```bash llama-server --hf-repo Triangle104/llama-3.2-3b-chat-doctor-Q4_K_M-GGUF --hf-file llama-3.2-3b-chat-doctor-q4_k_m.gguf -c 2048 ``` Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well. Step 1: Clone llama.cpp from GitHub. ``` git clone https://github.com/ggerganov/llama.cpp ``` Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux). ``` cd llama.cpp && LLAMA_CURL=1 make ``` Step 3: Run inference through the main binary. ``` ./llama-cli --hf-repo Triangle104/llama-3.2-3b-chat-doctor-Q4_K_M-GGUF --hf-file llama-3.2-3b-chat-doctor-q4_k_m.gguf -p "The meaning to life and the universe is" ``` or ``` ./llama-server --hf-repo Triangle104/llama-3.2-3b-chat-doctor-Q4_K_M-GGUF --hf-file llama-3.2-3b-chat-doctor-q4_k_m.gguf -c 2048 ```