---
title: Inkling
emoji: 🌐
colorFrom: indigo
colorTo: yellow
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: true
license: agpl-3.0
short_description: Use AI to find obvious research links in unexpected places.
datasets:
- nomadicsynth/arxiv-dataset-abstract-embeddings
models:
- nomadicsynth/research-compass-arxiv-abstracts-embedding-model
---
# Inkling: Bridging the Unconnected in Scientific Literature
Inkling is an experimental bridge-finding engine for scientific literature, built to uncover latent connections between research papers—relationships that are obvious in hindsight but buried under the sheer volume of modern research. It’s inspired by the work of Don R. Swanson, who discovered the link between fish oil and Raynaud’s syndrome using nothing but manual literature analysis. Today, this approach is called Literature-Based Discovery, and Inkling is our attempt to automate it with modern NLP.
## The Problem: Lost in the Literature
The scientific literature is growing exponentially, but human researchers can only read so much. As Sabine Hossenfelder explained in her 2024 YouTube video "AIs Predict Research Results Without Doing Research", even experts miss critical connections because no one has time to read everything. Swanson’s 1986 discovery of the fish oil–Raynaud’s link was a wake-up call: the knowledge existed in plain sight, but the papers were siloed. Inkling is our attempt to fix that.
## The Vision: A Bridge-Finding Machine
Inkling isn’t just a search engine. It’s a hypothesis generator. It learns to recognize intermediate concepts that connect seemingly unrelated papers—like Swanson’s "blood viscosity" bridge. The model is built to:
- Find indirect links between papers that don’t cite each other.
- Surface connections that feel obvious once explained but are buried in the noise.
- Scale to the entire arXiv corpus and beyond.
## How It Works
### Model Architecture
- Base Model: A `SentenceTransformer` using Llama-7B as its base (with frozen weights) and a dense embedding head.
- Training:
  - v1: Trained on a synthetic dataset of randomly paired papers, rated for conceptual overlap.
  - v2 (in progress): Focused on bridge detection, using prompts to explicitly identify intermediate concepts (e.g., "What connects these two papers?").
- Embedding Strategy:
  - Dense vector representations of abstracts.
  - FAISS for fast approximate nearest-neighbor search.
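The retrieval step can be sketched as follows. This is a minimal stand-in, not the actual implementation: it uses exact NumPy inner-product search in place of FAISS, and random vectors in place of real abstract embeddings, and all function names are illustrative.

```python
import numpy as np


def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize embeddings so inner product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms


def search(index: np.ndarray, query: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Return the top-k (paper_index, score) pairs for one query embedding."""
    q = query / np.linalg.norm(query)
    scores = index @ q                    # cosine similarity to every abstract
    top = np.argsort(-scores)[:k]         # highest scores first
    return [(int(i), float(scores[i])) for i in top]


# Toy corpus: 100 fake "abstract embeddings" of dimension 32.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 32))
index = build_index(corpus)

# Query with a slightly noisy copy of paper 42; it should rank first.
query = corpus[42] + 0.01 * rng.normal(size=32)
results = search(index, query, k=5)
print(results[0][0])  # → 42
```

In production the exact `index @ q` scan would be replaced by an approximate FAISS index over the full arXiv corpus; the normalization trick (inner product on unit vectors = cosine similarity) carries over unchanged.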
### Dataset Philosophy
- v1: Random paper pairs rated for generic "relevance" (too broad, limited bridge detection).
- v2: Focus on explicit bridge extraction using LLM-generated triplets (e.g., "Paper A → Bridge Concept → Paper B").
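A v2 training record might look like the sketch below. The field names, the rating scale, and the conversion to a contrastive training pair are illustrative assumptions, not the actual dataset schema.

```python
from dataclasses import dataclass


@dataclass
class BridgeTriplet:
    """One LLM-generated example: two papers joined by a bridge concept."""
    paper_a: str   # abstract (or ID) of the first paper
    paper_b: str   # abstract (or ID) of the second paper
    bridge: str    # the intermediate concept connecting them
    rating: int    # plausibility score assigned by the LLM, e.g. 1-5


def to_training_pair(t: BridgeTriplet) -> tuple[str, str, float]:
    """Collapse a triplet into (anchor, positive, label) for similarity training."""
    return (t.paper_a, t.paper_b, t.rating / 5.0)


# Swanson's classic discovery, phrased as a triplet.
example = BridgeTriplet(
    paper_a="Dietary fish oil reduces blood viscosity...",
    paper_b="Raynaud's syndrome is associated with elevated blood viscosity...",
    bridge="blood viscosity",
    rating=5,
)
print(to_training_pair(example)[2])  # → 1.0
```

Keeping the bridge concept as an explicit field (rather than folding it into a bare relevance score, as in v1) is what lets the v2 model learn *why* two papers connect, not just *that* they do.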
## The Inspiration
This project was born from a nerd-sniping moment after watching Sabine Hossenfelder’s video on AI’s ability to predict neuroscience results without experiments. That led to three key influences:
### 1. Swanson’s "Undiscovered Public Knowledge"
Swanson’s 1986 paper showed that the fish oil–Raynaud’s link existed in the literature for decades—it just took a human to connect the dots. Inkling automates this process.
### 2. Tshitoyan et al. (2019): Word Embeddings in Materials Science
Their work demonstrated that unsupervised embeddings could predict future material discoveries from latent knowledge. Inkling applies this idea to conceptual bridges in all scientific fields.
### 3. Luo et al. (2024): LLMs Beat Human Experts
This study showed that a 7B LLM (like Mistral) could outperform neuroscientists in predicting experimental outcomes. Inkling leverages this power to find connections even domain experts might miss.
## What It Can Do (and What’s Next)
### Current Capabilities
- Embed arXiv abstracts into dense vectors.
- Search for papers with conceptual overlap (50% relevance in top-10/25 queries, per manual testing).
- Visualize results in a Gradio interface with FAISS-powered speed.
### Roadmap
- v2: Train on LLM-generated bridge triplets (e.g., "Paper A → Blood Viscosity → Paper B").
- Gradio Enhancements:
  - Interactive bridge visualization (D3.js or Plotly).
  - User feedback loop for improving the model.
- Automated Updates: Embed new arXiv papers nightly.
- Domain-Specific Tools:
  - Drug repurposing mode (e.g., "Find new uses for aspirin").
  - Interdisciplinary connection finder (e.g., "How does physics inform AI research?").
## Why This Matters
Inkling is not a polished product—it’s a chaotic, ADHD-fueled experiment in democratizing scientific discovery. It’s for:
- Researchers drowning in paper overload.
- Interdisciplinary thinkers who thrive on unexpected connections.
- Anyone who’s ever thought, "I could’ve thought of that!" after a breakthrough.
As Sabine Hossenfelder put it: "The future of research isn’t in doing more experiments—it’s in connecting the dots we already have." *(citation needed)*
## Status
- Model: v1 (proof of concept, 50-50 if it does anything or my brain is just playing tricks).
- Dataset: v1 (random pairs, too broad). v2 (in planning, focused on bridge detection).
- Interface: Gradio-powered demo with FAISS backend.
- Next Steps: Refine training data, automate updates, and scale to all of arXiv.
## Credits
- Inspiration: Sabine Hossenfelder’s "AIs Predict Research Results" video.
- Foundational Work: Don R. Swanson, V. Tshitoyan, X. Luo.
- Model Architecture: Llama-7B + SentenceTransformer.
## Try It
Live Demo
Paste an abstract, find a bridge, and see if the connection feels obvious in hindsight. 🚀
This is a work in progress. Feedback, ideas, and nerd-sniped collaborators are welcome.