---
license: apache-2.0
datasets:
- 6cf/liveideabench
language:
- en
base_model:
- Qwen/QwQ-32B-Preview
tags:
- chemistry
- biology
- climate
- medical
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6205fefd3f1dc8a642d70b10/JEZgA_xV6oF8AIsya9dop.jpeg)

# IdeaWhiz Model Card 🧠

## Model Summary 🔬

IdeaWhiz is a fine-tuned version of QwQ-32B-Preview, optimized for scientific creativity and step-by-step reasoning. It is trained on the LiveIdeaBench dataset to improve its ability to generate novel scientific ideas and hypotheses.

## Key Features 🌟

- Base Model: QwQ-32B-Preview 🔋
- Training Dataset: LiveIdeaBench 📊
- Main Focus: Scientific creativity and idea generation 💡
- Reasoning Style: o1-style step-by-step reasoning ⚡

## Intended Use 🎯

- Scientific hypothesis generation 🧪
- Creative problem-solving in research 🔍
- Step-by-step scientific reasoning 📝
- Research direction brainstorming 🌱

## Model Performance Compared to QwQ-32B-Preview

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6205fefd3f1dc8a642d70b10/a1PnP5YH_4b5SrH7JdGBf.png)

## Quickstart 🚀

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "6cf/QwQ-32B-Preview-IdeaWhiz-v1"

# Load the model weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prompt in the LiveIdeaBench style: one keyword, 100-word limit
prompt = """I'll be submitting your next responses to a "Good Scientific Idea" expert review panel. If they consider your idea to be a good one, you'll receive a reward. Your assigned keyword is: "cancer". You may provide background information. The idea MUST be within 100 words (including background information). (Note: good scientific ideas should be novel, verifiable, practically valuable, and able to advance the field.). 
NOTE: You MUST give your answer after **Final Idea:** """

# Build the chat messages: a system instruction plus the user prompt
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]

# Render the messages with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)

# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

# Example Outputs

```
Alright, so I have this task to come up with a good scientific idea related to cancer, and it has to be novel, verifiable, practically valuable, and able to advance the field. Plus, it needs to be within 100 words, including any background information. That's a pretty tight constraint, but I think I can manage.

First, ...

...

**Final Idea:** propose a novel approach to cancer treatment by developing personalized cancer vaccines tailored to each patient's unique tumor mutations. By integrating machine learning algorithms with comprehensive omics data, including genomics, epigenetics, transcriptomics, and proteomics, this method aims to identify and prioritize the most immunogenic tumor-specific antigens. This integrated strategy could enhance vaccine efficacy and represents an advancement in precision medicine for cancer treatment.
```

# Training Dataset

## 🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context

### Dataset

[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-yellow)](https://huggingface.co/datasets/6cf/liveideabench)

### Paper

[![arXiv](https://img.shields.io/badge/arXiv-2412.17596-b31b1b.svg)](https://arxiv.org/abs/2412.17596)

If you use this model, please cite:

```
@article{ruan2024liveideabench,
  title={LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context},
  author={Kai Ruan and Xuan Wang and Jixiang Hong and Peng Wang and Yang Liu and Hao Sun},
  journal={arXiv preprint arXiv:2412.17596},
  year={2024}
}
```
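
The Quickstart prompt tells the model to give its answer after `**Final Idea:**`, so the idea can be separated from the preceding reasoning with plain string handling. The following is a minimal sketch, not part of the released code: `extract_final_idea` and `word_count` are hypothetical helper names, and taking the text after the last occurrence of the marker is an assumption (the reasoning may mention the marker before the actual answer).

```python
FINAL_IDEA_MARKER = "**Final Idea:**"


def extract_final_idea(response, marker=FINAL_IDEA_MARKER):
    """Return the text after the last occurrence of `marker`, or None if it is absent."""
    # rpartition splits on the LAST occurrence; `found` is empty when there is no marker
    _, found, tail = response.rpartition(marker)
    if not found:
        return None
    return tail.strip()


def word_count(text):
    """Count whitespace-separated words, a rough check against the 100-word limit."""
    return len(text.split())


if __name__ == "__main__":
    # Illustrative string only; in practice pass the Quickstart `response`
    sample = "Alright, let me think...\n\n**Final Idea:** Test whether X predicts Y."
    idea = extract_final_idea(sample)
    print(idea)              # Test whether X predicts Y.
    print(word_count(idea))  # 5
```

A `None` result means the model did not follow the output format, which is worth handling explicitly before passing the idea on.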