AI & ML interests

retrieval augmented generation, grounded generation, large language models, LLMs, question answering, chatbot

Recent Activity

ofermend updated a Space 4 days ago
vectara/cfpb-assistant
ofermend updated a Space 4 days ago
vectara/ev-assistant
ofermend updated a Space 4 days ago
vectara/finance-assistant

vectara's activity

ofermend
posted an update 8 months ago
If you are a debate fan, or debated as an extracurricular activity as a kid, you might have fun with this demo, debate-bot. Debate against an AI backed by RAG:

vectara/debate-bot
nthakur
posted an update 8 months ago
🦢 The SWIM-IR dataset contains 29 million text-retrieval training pairs across 27 diverse languages. It is one of the largest synthetic multilingual datasets generated using PaLM 2 on Wikipedia! 🔥🔥

The SWIM-IR dataset contains three subsets:
- Cross-lingual: nthakur/swim-ir-cross-lingual
- Monolingual: nthakur/swim-ir-monolingual
- Indic Cross-lingual: nthakur/indic-swim-ir-cross-lingual

Check it out:
https://huggingface.co/collections/nthakur/swim-ir-dataset-662ddaecfc20896bf14dd9b7
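To poke at the data, a minimal sketch with the 🤗 datasets library (the "de" config name is an assumption; check each dataset card for the actual subset and config names):

```python
from datasets import load_dataset

# Monolingual subset: query and passage are in the same language.
# "de" is an assumed config name; see the dataset card for the real list.
mono = load_dataset("nthakur/swim-ir-monolingual", "de", split="train")

# Cross-lingual subset: queries in the target language, passages in English.
cross = load_dataset("nthakur/swim-ir-cross-lingual", "de", split="train")

print(mono[0])  # one synthetic (query, passage) training pair
```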
clefourrier
posted an update 8 months ago
In basic chatbots, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark medical LLMs and follow advances in the field before even thinking about deployment.

This is why a small research team introduced a medical LLM leaderboard, to get reproducible and comparable results between LLMs, and allow everyone to follow advances in the field.

openlifescienceai/open_medical_llm_leaderboard

Congrats to @aaditya and @pminervini!
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm
clefourrier
posted an update 8 months ago
Contamination-free code evaluations with LiveCodeBench! 🖥️

LiveCodeBench is a new leaderboard, which contains:
- complete code evaluations (on code generation, self repair, code execution, tests)
- my favorite feature: problem selection by publication date 📅

This feature means you can average model scores over only those problems published after a model's training cutoff, i.e., problems outside its training data. This means... contamination-free code evals! 🚀
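The mechanism is simple enough to sketch: keep each problem's publication date alongside its result, and average only over problems newer than a given cutoff. A hypothetical illustration in Python (made-up records, not LiveCodeBench's actual code):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ProblemResult:
    problem_id: str
    release_date: date   # when the problem was published
    passed: bool         # did the model's solution pass all tests?

def pass_rate_after(results: list[ProblemResult], cutoff: date) -> float:
    """Average pass rate over problems published after `cutoff`, i.e.,
    problems that cannot appear in the training data of a model trained earlier."""
    recent = [r for r in results if r.release_date > cutoff]
    if not recent:
        raise ValueError("no problems published after the cutoff")
    return sum(r.passed for r in recent) / len(recent)

# Hypothetical results for a model whose training data ends September 2023:
results = [
    ProblemResult("two-sum-variant", date(2023, 5, 1), True),   # excluded
    ProblemResult("graph-paths", date(2023, 11, 12), False),    # counted
    ProblemResult("interval-merge", date(2024, 2, 3), True),    # counted
]
print(pass_rate_after(results, date(2023, 9, 30)))  # 0.5
```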

Check it out!

Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard

Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!
clefourrier
posted an update 8 months ago
🆕 Evaluate your RL agents - who's best at Atari? 🏆

The new RL leaderboard evaluates agents in 87 possible environments (from Atari 🎮 to motion control simulations 🚶 and more)!

When you submit your model, it's run and evaluated in real time - and the leaderboard displays small videos of the best model's run, which is super fun to watch! ✨
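Under the hood, this kind of evaluation boils down to rolling the agent out and averaging episodic returns. A minimal sketch with Gymnasium and a random policy standing in for a submitted agent (the leaderboard's actual harness, with video capture, is more involved):

```python
import gymnasium as gym
import numpy as np

def evaluate(env_id: str, policy, n_episodes: int = 10, seed: int = 0) -> float:
    """Roll out `policy` for n_episodes and return the mean episodic return."""
    env = gym.make(env_id)
    returns = []
    for ep in range(n_episodes):
        obs, info = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            action = policy(obs)  # the agent picks an action from the observation
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    env.close()
    return float(np.mean(returns))

# Random policy as a stand-in for a trained agent:
env = gym.make("CartPole-v1")
random_policy = lambda obs: env.action_space.sample()
print(evaluate("CartPole-v1", random_policy))
```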

Kudos to @qgallouedec for creating and maintaining the leaderboard!
Let's find out which agent is the best at games! 🚀

open-rl-leaderboard/leaderboard
clefourrier
posted an update 9 months ago
Fun fact about evaluation, part 2!

How much do scores change depending on prompt format choice?

Using different prompts (all present in the literature, from the bare `Prompt question?` to the full template `Question: prompt question?\nChoices: <enumeration of all choices>\nAnswer:`), we get a score range of...

10 points for a single model!
Keep in mind that we only changed the prompt, not the evaluation subsets, etc.
Again, this confirms that evaluation results reported without their details are basically bullshit.

(In the attached figure: prompt format on the x axis. All these evals look at the logprob of either the full choice text, "choice A"/"choice B"/..., or just the letter, "A"/"B"/....)

Incidentally, it also changes model rankings - so a "best" model might only be best on one type of prompt...
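For concreteness, here is a minimal sketch of that logprob-based scoring with two prompt formats side by side. gpt2 is a small stand-in model, and real eval harnesses are more careful about tokenization at the prompt/continuation boundary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the study used larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of token logprobs of `continuation` given `prompt`.
    Assumes the prompt tokenizes to a prefix of prompt+continuation."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(prompt_len, full_ids.shape[1]):
        # logits at position pos-1 predict the token at position pos
        total += logprobs[0, pos - 1, full_ids[0, pos]].item()
    return total

question = "What is the capital of France?"
choices = ["Paris", "London"]

# Two formats from the literature: bare question vs. full template.
formats = {
    "bare": f"{question}\n",
    "templated": f"Question: {question}\nChoices: {', '.join(choices)}\nAnswer: ",
}
for name, prompt in formats.items():
    scores = {c: continuation_logprob(prompt, c) for c in choices}
    print(name, max(scores, key=scores.get), scores)
```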