Rohit Kumar

rohitdavas

AI & ML interests

machine learning, computational neuroscience, deep learning, AI

Recent Activity

Organizations

Hugging Face Discord Community

rohitdavas's activity

New activity in facebook/detr-resnet-50 1 day ago

Adding ONNX file of this model

#8 opened over 1 year ago by dikar8
New activity in pytorch/SSD 8 days ago

Update app.py

#1 opened 8 days ago by rohitdavas
reacted to TuringsSolutions's post with 👍 7 months ago
I developed a way to test very clearly whether or not a Transformer model can actually learn symbolic reasoning, or if LLMs are forever doomed to be offshoots of "stochastic parrots". The results are in: undeniable proof that Transformer models CAN learn symbolic relationships. Undeniable proof that AI can learn its ABCs. Credit goes to myself, Claude, and ChatGPT; I would not have been able to prove this without Claude or ChatGPT.



https://www.youtube.com/watch?v=I8jHRgahRfY
replied to Jaward's post about 1 year ago

Thanks for writing. Would you like to suggest some readings/blogs/implementations for further reading as well? Nice highlights.

reacted to Jaward's post with ❤️ about 1 year ago
Retrieval-Augmented Generation (RAG)
Redeemer of the "hallucination problem"

It is fair enough to argue that "hallucinations" in LLMs are mere reflections of what we humans occasionally do - and it gets worse as we get older. Since these models are brain-inspired, such behaviors are likely inherently unavoidable. After all, we are just dreamers trying to make sense of this life.

The best we can do is minimize and control it - but humanly how? By first feeding on relevant facts and then developing a habit that allows us to easily access those facts when needed. This is what RAG is all about - it's just a control mechanism that keeps the LLM aligned with reality and fact.

But How Does RAG Work?

Well, to some extent it is domain-specific but the overall workflow boils down to the following:

1. It makes use of a retrieval mechanism that hunts for facts relevant to a query - the retriever (a query encoder plus a document index, the source of truth) is trained end-to-end, via backpropagation, together with a pre-trained generative model.

2. The generative model then conditions on the retrieved facts, performing some verification, to give a more accurate response.

To summarize, the RAG architecture houses a pre-existing knowledge source model (termed parametric memory), which then utilizes a Source-of-Truth model or vector indexed data (termed non-parametric memory) that is accessed by a pre-trained neural retriever, in order to produce more informed, contextually appropriate and factually correct responses.
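The retrieve-then-generate loop above can be sketched in a few lines. This is a toy sketch only: the token-overlap scorer stands in for a neural query encoder + document index, and `generate` is a hypothetical stub for the pre-trained generative model - real RAG systems use learned dense retrievers and a seq2seq model.

```python
def retrieve(query, documents, k=1):
    """Toy retriever: rank documents by token overlap with the query."""
    q_tokens = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_tokens & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query, facts):
    """Stub generator: a real RAG model would condition on the facts."""
    return f"Q: {query} | grounded in: {' / '.join(facts)}"

documents = [
    "The Eiffel Tower is in Paris.",
    "RAG combines a retriever with a generator.",
]
facts = retrieve("Where is the Eiffel Tower?", documents)
answer = generate("Where is the Eiffel Tower?", facts)
print(answer)
```

The point of the sketch is the control flow, not the components: the generator never answers from parametric memory alone - it is always handed the retrieved non-parametric facts first.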

Sort of a "Genius Engine" if you might say. If only we humans could harness such, AGI would be much much sooner lol.

In the meantime, I have been Jaward Sesay (Chinese name 苏杰 Sujie) - a young Sierra Leonean, aspiring AI Researcher. I like to read, share and try implementing AI research papers. Also like dunking on big tech while rooting for open-source. My mentor @karpathy , I dream of him following me back on X lol. Thanks.
reacted to vladbogo's post with 🤗 about 1 year ago
A recent paper titled "Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters" proposes using fine-tuned Multimodal Language Models (MLMs) as high-quality filters for image-text data.

Key points:
* Defines multiple metrics to assess image-text quality from different perspectives like object details, text quality, and semantic understanding.
* Leverages GPT-4 and GPT-4V to construct high-quality instruction data for fine-tuning open-source MLMs as effective data filters.
* Fine-tuned MLM filters generate more precise scores, leading to better filtered data and improved performance of pre-trained models on various downstream tasks.
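The filtering idea in the key points can be sketched as a score-and-threshold pass over image-text pairs. This is a hedged illustration, not the paper's method: `score_pair` below is a hypothetical stand-in for the fine-tuned MLM filter's quality score (here a crude caption-length proxy), and the file names are made up.

```python
def score_pair(caption):
    """Toy proxy for an MLM quality score: longer, more
    descriptive captions score higher (capped at 1.0)."""
    return min(len(caption.split()) / 10, 1.0)

def filter_pairs(pairs, threshold=0.5):
    """Keep only image-text pairs whose quality score clears the threshold."""
    return [(img, cap) for img, cap in pairs if score_pair(cap) >= threshold]

pairs = [
    ("img_001.jpg", "photo"),
    ("img_002.jpg", "a brown dog catching a red frisbee in a sunny park"),
]
kept = filter_pairs(pairs)
print(kept)
```

In the paper's setting, the score would come from the fine-tuned MLM evaluating object detail, text quality, and semantic alignment rather than caption length.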

Congrats to the authors for their work!

Paper: Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters (2403.02677)
Code: https://github.com/Victorwz/MLM_Filter
Dataset: weizhiwang/mlm_filter_instructions
Model: weizhiwang/mlm-filter-llava-13b-gpt4v
New activity in arnabdhar/YOLOv8-Face-Detection about 1 year ago