arxiv:2412.20956

QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing

Published on Dec 30, 2024

Abstract

We present QuantumLLMInstruct (QLMMI), an innovative dataset featuring over 500,000 meticulously curated instruction-following problem-solution pairs designed specifically for quantum computing, the largest and most comprehensive dataset of its kind. Originating from over 90 primary seed domains and encompassing hundreds of subdomains autonomously generated by large language models (LLMs), QLMMI marks a transformative step in the diversity and richness of quantum computing datasets. Designed for instruction fine-tuning, QLMMI seeks to significantly improve LLM performance in addressing complex quantum computing challenges across a wide range of quantum physics topics. While LLMs have propelled advancements in computational science with datasets like Omni-MATH and OpenMathInstruct, these primarily target Olympiad-level mathematics, leaving quantum computing largely unexplored. The creation of QLMMI follows a rigorous four-stage methodology. First, foundational problems are developed using predefined templates, focusing on critical areas such as synthetic Hamiltonians, QASM code generation, Jordan-Wigner transformations, and Trotter-Suzuki quantum circuit decompositions. Next, detailed and domain-specific solutions are crafted to ensure accuracy and relevance. In the third stage, the dataset is enriched through advanced reasoning techniques, including Chain-of-Thought (CoT) and Task-Oriented Reasoning and Action (ToRA), which enhance problem-solution diversity while adhering to strict mathematical standards. Finally, a zero-shot Judge LLM performs self-assessments to validate the dataset's quality and reliability, minimizing human oversight requirements.
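
The template-driven first stage lends itself to a concrete illustration. Below is a minimal, self-contained sketch of the kind of problem-solution pair such a template might produce for the Trotter-Suzuki seed domain: it builds a small two-qubit Hamiltonian from two non-commuting terms and compares a first-order Trotter product against the exact evolution. The specific Hamiltonian, parameter values, and error check are illustrative assumptions, not the paper's actual templates.

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices used to build a toy two-qubit Hamiltonian.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron(*ops):
    """Tensor product of a sequence of single-qubit operators."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# H = J * Z0 Z1 + h * (X0 + X1): two non-commuting terms A and B.
J, h, t, n_steps = 1.0, 0.5, 1.0, 20
A = J * kron(Z, Z)
B = h * (kron(X, I) + kron(I, X))

# Exact evolution vs. first-order Trotter-Suzuki product formula:
# exp(-i(A+B)t) ~= [exp(-iA t/n) exp(-iB t/n)]^n
exact = expm(-1j * (A + B) * t)
step = expm(-1j * A * t / n_steps) @ expm(-1j * B * t / n_steps)
trotter = np.linalg.matrix_power(step, n_steps)

error = np.linalg.norm(exact - trotter, ord=2)
print(f"First-order Trotter error with {n_steps} steps: {error:.2e}")
```

Re-parameterizing a template like this (couplings, evolution time, step count) yields many distinct problem-solution pairs, consistent with how predefined templates over seed domains could scale to hundreds of thousands of examples.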
