---
license: apache-2.0
datasets:
- 6cf/liveideabench
language:
- en
base_model:
- Qwen/QwQ-32B-Preview
tags:
- chemistry
- biology
- climate
- medical
---
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6205fefd3f1dc8a642d70b10/JEZgA_xV6oF8AIsya9dop.jpeg)
# IdeaWhiz Model Card 🧠
## Model Summary 🔬
IdeaWhiz is a fine-tuned version of QwQ-32B-Preview optimized for scientific creativity and step-by-step reasoning. It is trained on the LiveIdeaBench dataset to improve its generation of novel scientific ideas and hypotheses.
## Key Features
- Base Model: QwQ-32B-Preview
- Training Dataset: LiveIdeaBench
- Main Focus: Scientific creativity and idea generation 💡
- Reasoning Style: o1-style step-by-step reasoning ⚡
## Intended Use 🎯
- Scientific hypothesis generation 🧪
- Creative problem-solving in research
- Step-by-step scientific reasoning
- Research direction brainstorming 🌱
## Model Performance Compared to QwQ-32B-Preview
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6205fefd3f1dc8a642d70b10/a1PnP5YH_4b5SrH7JdGBf.png)
## Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "6cf/QwQ-32B-Preview-IdeaWhiz-v1"

# Load the model and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = """I'll be submitting your next responses to a "Good Scientific Idea" expert review panel. If they consider your idea to be a good one, you'll receive a reward. Your assigned keyword is: "cancer". You may provide background information. The idea MUST be within 100 words (including background information). (Note: good scientific ideas should be novel, verifiable, practically valuable, and able to advance the field.). NOTE: You MUST give your answer after **Final Idea:**
"""

messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into a single prompt string.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
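The Quickstart prompt hard-codes the keyword "cancer". A minimal sketch of a helper that builds the same messages for a different keyword might look like the following. `PROMPT_TEMPLATE` and `build_messages` are hypothetical names introduced here for illustration; they are not part of the model or of `transformers`, and how well the model handles other keywords is an assumption.

```python
from typing import Dict, List

# Same wording as the Quickstart prompt, with the keyword made a placeholder.
# PROMPT_TEMPLATE and build_messages are illustrative names, not a library API.
PROMPT_TEMPLATE = """I'll be submitting your next responses to a "Good Scientific Idea" expert review panel. If they consider your idea to be a good one, you'll receive a reward. Your assigned keyword is: "{keyword}". You may provide background information. The idea MUST be within 100 words (including background information). (Note: good scientific ideas should be novel, verifiable, practically valuable, and able to advance the field.). NOTE: You MUST give your answer after **Final Idea:**
"""

SYSTEM_PROMPT = (
    "You are a helpful and harmless assistant. "
    "You are Qwen developed by Alibaba. You should think step-by-step."
)


def build_messages(keyword: str) -> List[Dict[str, str]]:
    """Return chat messages in the Quickstart format for the given keyword."""
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("keyword must be a non-empty string")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": PROMPT_TEMPLATE.format(keyword=keyword)},
    ]


messages = build_messages("cancer")
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the Quickstart.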
# Example Outputs
```
Alright, so I have this task to come up with a good scientific idea related to cancer,
and it has to be novel, verifiable, practically valuable, and able to advance the field.
Plus, it needs to be within 100 words, including any background information.
That's a pretty tight constraint, but I think I can manage. First, ...
...
**Final Idea:** propose a novel approach to cancer treatment by developing personalized
cancer vaccines tailored to each patient's unique tumor mutations. By integrating machine
learning algorithms with comprehensive omics data, including genomics, epigenetics,
transcriptomics, and proteomics, this method aims to identify and prioritize the most
immunogenic tumor-specific antigens. This integrated strategy could enhance vaccine
efficacy and represents an advancement in precision medicine for cancer treatment.
```
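Because the prompt asks for the answer after `**Final Idea:**`, the idea can be separated from the preceding reasoning with plain string handling. The sketch below is one possible approach, not part of the model's tooling: `extract_final_idea` and `within_word_limit` are hypothetical helpers, and taking the text after the last occurrence of the marker is an assumption about how the output is laid out. The word count covers only the text after the marker, so any background given earlier in the response is not counted.

```python
from typing import Optional

MARKER = "**Final Idea:**"


def extract_final_idea(response: str) -> Optional[str]:
    """Return the text after the last MARKER, or None if the marker is absent."""
    _, found, tail = response.rpartition(MARKER)
    if not found:
        return None
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(tail.split())


def within_word_limit(idea: str, limit: int = 100) -> bool:
    """Check a whitespace-delimited word count against the limit."""
    return len(idea.split()) <= limit


# Illustrative input only; this is not real model output.
sample = "First, I consider the constraints...\n**Final Idea:** Develop personalized\ncancer vaccines."
idea = extract_final_idea(sample)
if idea is not None:
    print(idea, within_word_limit(idea))
```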
# Training Dataset
## 🤖💡 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context
### Dataset
[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-yellow)](https://huggingface.co/datasets/6cf/liveideabench)
### Paper
[![arXiv](https://img.shields.io/badge/arXiv-2412.17596-b31b1b.svg)](https://arxiv.org/abs/2412.17596)
If you use this model, please cite:
```
@article{ruan2024liveideabench,
title={LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context},
author={Kai Ruan and Xuan Wang and Jixiang Hong and Peng Wang and Yang Liu and Hao Sun},
journal={arXiv preprint arXiv:2412.17596},
year={2024}
}
```