Kuldeep Singh Sidhu
singhsidhukuldeep
AI & ML interests
😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io
Recent Activity
posted an update 28 minutes ago
O1 Embedder: Transforming Retrieval Models with Reasoning Capabilities
Researchers from the University of Science and Technology of China and the Beijing Academy of Artificial Intelligence have developed a novel retrieval model that mimics the slow-thinking capabilities of reasoning-focused LLMs like OpenAI's O1 and DeepSeek's R1.
Unlike traditional embedding models that directly match queries with documents, O1 Embedder first generates thoughtful reflections about the query before performing retrieval. This two-step process significantly improves performance on complex retrieval tasks, especially those requiring intensive reasoning or zero-shot generalization to new domains.
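The two-step inference flow can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: `llm_generate` and `embed` are hypothetical hooks standing in for the model's thinking and embedding heads.

```python
import numpy as np

def think_then_retrieve(query, llm_generate, embed, doc_embeddings, k=5):
    """Two-step retrieval: first generate a reflection about the query,
    then embed the query enriched with that thought and match documents."""
    thought = llm_generate(f"Think step by step about what this query needs: {query}")
    q_vec = embed(query + "\n" + thought)   # query representation informed by reasoning
    scores = doc_embeddings @ q_vec         # inner-product similarity against the corpus
    return np.argsort(-scores)[:k]          # indices of the top-k documents
```

The key difference from a standard dense retriever is only the extra generation step before embedding; the retrieval itself stays a cheap vector match.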
The technical implementation is fascinating:
- The model integrates two essential functions: Thinking and Embedding
- It uses an "Exploration-Refinement" data synthesis workflow where initial thoughts are generated by an LLM and refined by a retrieval committee
- A multi-task training method fine-tunes a pre-trained LLM to generate retrieval thoughts via behavior cloning while simultaneously learning embedding capabilities through contrastive learning
- Memory-efficient joint training enables both tasks to share encoding results, dramatically increasing batch size
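The multi-task objective above can be sketched as a behavior-cloning term plus an in-batch contrastive (InfoNCE) term over the shared encoder's outputs. This is a minimal NumPy sketch under assumed conventions; the weighting `alpha` and temperature `tau` are illustrative hyperparameters, not values from the paper.

```python
import numpy as np

def contrastive_loss(q_vecs, d_vecs, tau=0.05):
    """In-batch InfoNCE: each query's positive is the same-index document;
    all other documents in the batch act as negatives."""
    sims = (q_vecs @ d_vecs.T) / tau                      # [B, B] similarity matrix
    sims -= sims.max(axis=1, keepdims=True)               # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # -log p(positive | query)

def joint_loss(bc_loss, q_vecs, d_vecs, alpha=1.0):
    """Multi-task objective: behavior-cloning loss on generated thoughts
    (token-level cross-entropy, computed elsewhere) plus the embedding
    contrastive loss, sharing one forward pass over the encoder."""
    return bc_loss + alpha * contrastive_loss(q_vecs, d_vecs)
```

Sharing the encoding between both loss terms is what enables the memory-efficient joint training and larger contrastive batch sizes described above.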
The results are impressive: O1 Embedder outperforms existing methods across 12 datasets in both in-domain and out-of-domain scenarios. For example, it achieves a 3.9% improvement on Natural Questions and a 3.0% boost on HotPotQA compared to models without thinking capabilities.
This approach represents a significant paradigm shift in retrieval technology, bridging the gap between traditional dense retrieval and the reasoning capabilities of large language models.
What do you think about this approach? Could "thinking before retrieval" transform how we build search systems?
posted an update 1 day ago
I just came across a groundbreaking paper titled "Hypencoder: Hypernetworks for Information Retrieval" by researchers from the University of Massachusetts Amherst that introduces a fundamentally new paradigm for search technology.
Most current retrieval models rely on simple inner product calculations between query and document vectors, which severely limits their expressiveness. The authors prove theoretically that inner product similarity functions fundamentally constrain what types of relevance relationships can be captured.
Hypencoder takes a radically different approach: instead of encoding a query as a vector, it generates a small neural network (called a "q-net") that acts as a learned relevance function. This neural network takes document representations as input and produces relevance scores.
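The idea of generating a relevance function from the query can be sketched with a toy hypernetwork. This is purely illustrative: the paper uses attention-based hyperhead layers, whereas here simple linear generators map a query embedding to the weights of a tiny relevance MLP.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 4                                   # doc-embedding dim, q-net hidden dim

# Hypernetwork parameters: linear maps from the query embedding to q-net weights.
W1_gen = rng.normal(size=(d, d * h)) * 0.1
b1_gen = rng.normal(size=(d, h)) * 0.1
W2_gen = rng.normal(size=(d, h)) * 0.1

def build_qnet(q_emb):
    """Turn one query embedding into a small query-specific MLP ("q-net")
    that maps a document vector to a scalar relevance score."""
    W1 = (q_emb @ W1_gen).reshape(d, h)       # generated hidden-layer weights
    b1 = q_emb @ b1_gen                       # generated hidden-layer biases
    w2 = q_emb @ W2_gen                       # generated output weights
    def q_net(doc_vec):
        hidden = np.maximum(doc_vec @ W1 + b1, 0.0)   # ReLU hidden layer
        return hidden @ w2                            # scalar relevance score
    return q_net

q_net = build_qnet(rng.normal(size=d))
score = q_net(rng.normal(size=d))             # query-dependent document score
```

Because the scoring function is itself a neural network rather than a fixed inner product, it can represent relevance relationships an inner product cannot.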
Under the hood, Hypencoder uses:
- Attention-based hypernetwork layers (hyperhead layers) that transform contextualized query embeddings into weights and biases for the q-net
- A document encoder that produces vector representations similar to existing models
- A graph-based greedy search algorithm for efficient retrieval that can search 8.8M documents in under 60ms
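The graph-based retrieval step can be sketched as a greedy best-first walk over a document neighbor graph, so only a small fraction of the corpus is ever scored by the q-net. This is a simplified sketch; the paper's algorithm additionally maintains a candidate pool rather than a single current node.

```python
def greedy_graph_search(score_fn, neighbors, start, n_steps=100):
    """Greedy walk over a precomputed document neighbor graph:
    repeatedly move to the highest-scoring unvisited neighbor and
    return the best document seen. `score_fn` would be the q-net."""
    visited = {start}
    current, best, best_score = start, start, score_fn(start)
    for _ in range(n_steps):
        frontier = [n for n in neighbors[current] if n not in visited]
        if not frontier:
            break                                  # no unvisited neighbors left
        current = max(frontier, key=score_fn)      # greedy step
        visited.add(current)
        if score_fn(current) > best_score:
            best, best_score = current, score_fn(current)
    return best
```

Scoring only visited nodes instead of the full corpus is what makes sub-60ms search over millions of documents plausible even with a learned scoring function.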
The results are impressive: Hypencoder significantly outperforms strong dense retrieval models on standard benchmarks like MS MARCO and the TREC Deep Learning Track. The performance gap widens even further on complex retrieval tasks like tip-of-the-tongue queries and instruction-following retrieval.
What makes this approach particularly powerful is that neural networks are universal approximators, allowing Hypencoder to express far more complex relevance relationships than inner product similarity functions. The framework is also flexible enough to replicate any existing neural retrieval method while adding the ability to learn query-dependent weights.
posted an update 18 days ago
Fascinating deep dive into Swiggy's Hermes, their in-house Text-to-SQL solution that's revolutionizing data accessibility!
Hermes enables natural language querying within Slack, generating and executing SQL queries with an impressive <2 minute turnaround time. The system architecture is particularly intriguing:
Technical Implementation:
- Built on GPT-4 with a Knowledge Base + RAG approach for Swiggy-specific context
- AWS Lambda middleware handles communication between Slack UI and the Gen AI model
- Databricks jobs orchestrate query generation and execution
Under the Hood:
The pipeline employs a sophisticated multi-stage approach:
1. Metrics retrieval using embedding-based vector lookup
2. Table/column identification through metadata descriptions
3. Few-shot SQL retrieval with vector-based search
4. Structured prompt creation with data snapshots
5. Query validation with automated error correction
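The five stages above can be sketched end to end. This is an illustrative outline only: the helper objects (`metric_index`, `table_index`, `example_index`), the `llm` callable, and `run_query` are hypothetical stand-ins for Swiggy's actual components.

```python
def generate_sql(question, metric_index, table_index, example_index, llm, run_query):
    """Sketch of the multi-stage Text-to-SQL pipeline described above."""
    metrics = metric_index.search(question)       # 1. embedding-based metric lookup
    tables = table_index.search(question)         # 2. table/column identification
    examples = example_index.search(question)     # 3. few-shot SQL example retrieval
    prompt = "\n\n".join([                        # 4. structured prompt assembly
        f"Question: {question}",
        f"Relevant metrics: {metrics}",
        f"Tables and columns: {tables}",
        f"Similar examples: {examples}",
    ])
    sql = llm(prompt)
    try:                                          # 5. validate; retry once with the error
        return run_query(sql)
    except Exception as err:
        repaired = llm(prompt + f"\nPrevious attempt failed with: {err}\n{sql}")
        return run_query(repaired)
```

Feeding the execution error back into the model for a single repair attempt is a common pattern for the automated error correction described in stage 5.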
Architecture Highlights:
- Compartmentalized by business units (charters) for better context management
- Snowflake integration with seamless authentication
- Automated metadata onboarding with QA validation
- Real-time feedback collection via Slack
What's particularly impressive is how they've solved the data context challenge through charter-specific implementations, significantly improving query accuracy for well-defined metadata sets.
Kudos to the Swiggy team for democratizing data access across their organization. This is a brilliant example of practical AI implementation solving real business challenges.