---
title: Inkling
emoji: π
colorFrom: indigo
colorTo: yellow
python_version: "3.10"
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: true
license: agpl-3.0
short_description: Use AI to find obvious research links in unexpected places.
datasets:
- nomadicsynth/arxiv-dataset-abstract-embeddings
models:
- nomadicsynth/research-compass-arxiv-abstracts-embedding-model
---
# Inkling: AI-assisted research discovery
Inkling is an AI-assisted tool that helps you discover meaningful connections between research papers: the kind of links a domain expert might spot, if they had time to read everything.

Rather than relying on superficial similarity or shared keywords, Inkling is trained to recognize reasoning-based relationships between papers. It evaluates conceptual, methodological, and application-level connections, even across disciplines, and surfaces links that may be overlooked due to the sheer scale of the research landscape.
This demo uses the first prototype of the model, trained on a dataset of 10,000+ rated abstract pairs, built from a larger pool of arXiv triplets. The system will continue to improve with feedback and will be released alongside the dataset for public research.
## What it does
- Accepts a research abstract, idea, or question
- Searches for papers with deep, contextual relevance
- Highlights key conceptual links and application overlaps
- Offers reasoning-based analysis between selected papers
- Gathers user feedback to improve the model over time
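The search step above can be sketched as follows. This is a minimal illustration, not the Space's actual code: it assumes the embedding model listed in this Space's metadata can be loaded with `sentence-transformers` (shown in comments), and it uses toy vectors in place of real abstract embeddings to demonstrate the cosine-similarity ranking.

```python
import numpy as np

def rank_by_similarity(query_vec, abstract_vecs):
    """Rank abstract embeddings by cosine similarity to a query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    a = abstract_vecs / np.linalg.norm(abstract_vecs, axis=1, keepdims=True)
    scores = a @ q                     # cosine similarity per abstract
    return np.argsort(-scores), scores  # indices, best match first

# In the real Space, embeddings would come from the model named in the
# metadata above (assumption: it loads via sentence-transformers):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer(
#       "nomadicsynth/research-compass-arxiv-abstracts-embedding-model")
#   query_vec = model.encode("your research question")
# Here, toy 3-d vectors stand in for real embeddings.
query = np.array([1.0, 0.0, 0.0])
abstracts = np.array([
    [0.9, 0.1, 0.0],   # closely related
    [0.0, 1.0, 0.0],   # unrelated
    [0.7, 0.7, 0.0],   # partially related
])
order, scores = rank_by_similarity(query, abstracts)
print(order)  # → [0 2 1]: most related abstract first
```

Ranking by cosine similarity over precomputed abstract embeddings is the standard retrieval pattern for this kind of tool; the distinguishing part of Inkling is the fine-tuning that shapes what "similar" means.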
## Background and Motivation
Scientific progress often depends on connecting ideas across papers, fields, and years of literature. But with the volume of research growing exponentially, it's increasingly difficult for any one person, or even a team, to stay on top of it all. As a result, valuable connections between papers often go unnoticed simply because the right expert never read both.
In 2024, Luo et al. published a landmark study in Nature Human Behaviour showing that large language models (LLMs) can outperform human experts in predicting the results of neuroscience experiments by integrating knowledge across the scientific literature. Their model, BrainGPT, demonstrated how tuning a general-purpose LLM (like Mistral-7B) on domain-specific data could synthesize insights that surpass human forecasting ability. Notably, the authors found that models as small as 7B parameters performed well, an insight that shaped the direction of this project.
Inspired by this work, and by a YouTube breakdown from physicist and science communicator Sabine Hossenfelder titled "AIs Predict Research Results Without Doing Research", this project began as an attempt to explore similar methods of knowledge integration at the level of paper-pair relationships. Her clear explanation and commentary sparked the idea to apply this paradigm not just to forecasting outcomes, but to identifying latent connections between published studies.
Originally conceived as a perplexity-ranking experiment using LLMs directly (mirroring Luo et al.'s evaluation method), the project gradually evolved into what it is now: Inkling, a reasoning-aware embedding model fine-tuned on LLM-rated abstract pairings, built to help researchers uncover links that would be obvious, if only someone had the time to read everything.
## Why Inkling?
Because the right connection is often obvious, once someone points it out.
Researchers today are overwhelmed by volume. Inkling helps surface those missed-but-meaningful links between ideas, methods, and fields: links that could inspire new directions, clarify existing work, or enable cross-pollination across domains.
## Citation
Luo, X., Rechardt, A., Sun, G. et al. Large language models surpass human experts in predicting neuroscience results. Nat Hum Behav 9, 305–315 (2025). https://www.nature.com/articles/s41562-024-02046-9
## Status
Inkling is in alpha and under active development. The current model is hosted via Gradio, with a Hugging Face Space available for live interaction and feedback. Contributions, feedback, and collaboration are welcome.