Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Abstract
Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
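To make the three modules concrete, here is a minimal, hypothetical Python sketch of how such a rigor pipeline could be wired together. This is not Curie's actual API or implementation (see the GitHub repository for that); the class names, the required plan fields, and the validate-then-dispatch loop are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """Experiment knowledge module (illustrative): keeps an interpretable
    trail of decisions and results across the experiment lifecycle."""
    entries: list = field(default_factory=list)

    def log(self, stage: str, detail: str) -> None:
        self.entries.append((stage, detail))

    def report(self) -> str:
        return "\n".join(f"[{stage}] {detail}" for stage, detail in self.entries)


class IntraAgentRigor:
    """Intra-agent rigor (illustrative): reliability checks on a single
    agent's output, e.g. that an experiment plan names its variables."""

    def validate_plan(self, plan: dict) -> bool:
        required = {"hypothesis", "independent_vars", "metrics"}
        return required.issubset(plan)


class InterAgentRigor:
    """Inter-agent rigor (illustrative): methodical control between agents,
    dispatching work to an executor only after the plan passes validation."""

    def __init__(self, validator: IntraAgentRigor, record: ExperimentRecord):
        self.validator = validator
        self.record = record

    def run(self, plan: dict) -> str:
        if not self.validator.validate_plan(plan):
            self.record.log("control", "plan rejected: missing required fields")
            return "rejected"
        self.record.log("control", "plan accepted; dispatching to executor")
        # A real system would launch the experiment here; we only simulate it.
        self.record.log("execute", f"ran experiment for: {plan['hypothesis']}")
        return "completed"


if __name__ == "__main__":
    record = ExperimentRecord()
    controller = InterAgentRigor(IntraAgentRigor(), record)
    status = controller.run({
        "hypothesis": "Batching improves LLM serving throughput",
        "independent_vars": ["batch_size"],
        "metrics": ["requests_per_second"],
    })
    print(status)
    print(record.report())
```

The design point this sketch tries to capture is that methodical control sits between agents: the controller refuses to dispatch any plan the reliability checks reject, and every decision is logged so the resulting trail stays interpretable.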
Community
Move Scientific Research at the Speed of Thought. This paper introduces Curie, an AI agent framework designed to automate scientific research experimentation. By integrating modules that enhance reliability, enforce methodical control, and improve interpretability, Curie addresses the critical challenges of automating rigorous experimentation. Curie is able to reproduce several AI research papers through experimentation.
Evaluated against an experimentation benchmark spanning multiple computer science domains, Curie demonstrated a 3.4× improvement in accurately answering experimental questions compared to the strongest existing baseline.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems (2025)
- Aviary: training language agents on challenging scientific tasks (2024)
- JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models (2025)
- Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations (2025)
- Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation (2025)
- Evaluating Sakana's AI Scientist for Autonomous Research: Wishful Thinking or an Emerging Reality Towards 'Artificial Research Intelligence' (ARI)? (2025)
- SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs (2025)
So cool that you guys used Madame's name! I had one issue with it: while the abstract really gives the idea that we would get rigorous validation, I felt the questions sent to the model are a bit too specific, in the sense that they provide the model the precise location of the issue. It's relevant because my understanding of validation is more: given sentence x, which may or may not have an issue, is it valid? The experiment was more: given this sentence, which has a problem that needs a solution, what's the solution? Appreciate it!
Thank you for the thoughtful feedback! 😊
You bring up an excellent point regarding the input questions. We aimed to strike a balance between open-ended validation (e.g., "Is this valid?", "What is the relationship between A and B?", "What is the best configuration choice?") and targeted problem-solving (e.g., "What is the solution to this identified issue?").
For this initial evaluation, we focused on a more directed approach to assess Curie’s ability to provide precise, actionable insights, which is why the questions may seem specific. However, we absolutely see the value in broader, more open-ended validation tasks and agree that this is a natural and important next step. Happy to discuss more!
Thanks for this, much appreciated, and looking forward to the next version!