In recent days, OpenAI announced its search engine, SearchGPT. Today, I'm glad to introduce SearchPhi, an open-source, AI-powered web search tool that aims to offer features similar to SearchGPT, built on microsoft/Phi-3-mini-4k-instruct, llama.cpp, and Streamlit. Although not as capable as SearchGPT, SearchPhi v0.0-beta.0 is a first step toward a fully functional, multimodal search engine :) If you want to know more, head over to the GitHub repository (https://github.com/AstraBert/SearchPhi) and, to test it out, try this HF space: as-cle-bert/SearchPhi. Have fun!
Compared to other models that take image and video input and either project the two modalities separately or downsample the video and project selected frames, Video-LLaVA converts images and videos to a unified representation and projects them through a shared projection layer.
It uses Vicuna 1.5 as the language model and LanguageBind's own OpenCLIP-based encoders; these encoders map both modalities to a unified representation before it is passed to the projection layer.
I feel like one of the coolest features of this model is joint understanding, something many models have only introduced recently.
It's a relatively old model, but it was ahead of its time and works very well! Joint understanding means, e.g., you can pass the model an image of a cat and a video of a cat and ask whether the cat in the image appears in the video.
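The shared-projection idea described above can be illustrated with a minimal numpy sketch. The dimensions, token counts, and the single projection matrix here are hypothetical stand-ins, not Video-LLaVA's actual weights; the point is only that image patch tokens and video frame tokens pass through the same layer and land in one token sequence for the LLM.

```python
import numpy as np

rng = np.random.default_rng(0)
d_enc, d_llm = 1024, 4096  # hypothetical encoder / LLM hidden sizes

# One SHARED projection matrix used for both modalities
W = rng.standard_normal((d_enc, d_llm)) * 0.02

image_tokens = rng.standard_normal((1, 256, d_enc))  # 1 image, 256 patch tokens
video_tokens = rng.standard_normal((8, 256, d_enc))  # 8 frames, 256 tokens each

# Both modalities go through the SAME layer into the LLM's embedding space
image_proj = image_tokens.reshape(-1, d_enc) @ W
video_proj = video_tokens.reshape(-1, d_enc) @ W

# Concatenate into one multimodal token sequence for joint understanding
sequence = np.concatenate([image_proj, video_proj])
print(sequence.shape)  # (256 + 8*256, 4096) = (2304, 4096)
```

Because the image and video tokens share one representation space, the LLM can attend across both in a single prompt, which is what enables questions like "is the cat from the image in the video?".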
Me: I want on-device AI: fast, without latency, with real privacy, and convenient for use and development.
Microsoft: The best I can do is Copilot+. You need a special Qualcomm chip and Windows 11 24H2. Today I can give you only Recall, which takes screenshots and runs a vision model to write context about what you are doing into the unencrypted Semantic Index embeddings database. I'm giving you the Phi Silica SLMs, accessible only via an API and SDK. In the autumn I can give you the developer tools for C#/C++, and then you can use them.
Apple: The best I can do is Apple Intelligence. You need a special Apple chip and macOS 15. Today I can give you only marketing. In the autumn I can give you mysterious on-device 3B SLMs quantized to 3.5 bits, plus diffusion models with LoRA adapters. We will have an encrypted Semantic Index embeddings database and agentic flows with function calling. We will call all of these by different names. In the autumn I will give you the developer tools in Swift, and then you can use them.
Open Source: The best I can do is llama.cpp. You can run it on any chip and OS. Today you can run AI inference on device and add other open-source components to your solution. I can give you local AI models, SLMs and LLMs alike, from Qwen2-0.5B to Llama3-70B. You can have an encrypted local embeddings database with PostgreSQL/pgvector or sqlite-vec. I can give you a wide choice of integrations and open-source components for your solution, from UIs to agentic workflows with function calling. Today I can give you the developer tools in Python/C/C++/Rust/Go/Node.js/JS/C#/Scala/Java, and you can use them.
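To make the local-embeddings-database idea concrete, here is a toy in-memory sketch of what pgvector or sqlite-vec do for you: store vectors, query by cosine similarity. The class, its methods, and the hand-picked example vectors are all hypothetical illustrations, not the API of either library (and real setups add persistence, indexing, and encryption at rest).

```python
import numpy as np

class LocalVectorStore:
    """Toy in-memory stand-in for pgvector / sqlite-vec:
    store embeddings locally, search by cosine similarity."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text, embedding):
        v = np.asarray(embedding, dtype=np.float64)
        self.texts.append(text)
        self.vectors.append(v / np.linalg.norm(v))  # normalize once at insert

    def search(self, embedding, k=3):
        q = np.asarray(embedding, dtype=np.float64)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.vectors) @ q           # cosine similarity (unit vectors)
        top = np.argsort(-sims)[:k]
        return [(self.texts[i], float(sims[i])) for i in top]

# Hand-made example vectors; a real pipeline would get these from a local
# embedding model run via llama.cpp or similar.
store = LocalVectorStore()
store.add("cats", [1.0, 0.0, 0.0])
store.add("dogs", [0.0, 1.0, 0.0])
store.add("kittens", [0.9, 0.1, 0.0])

results = store.search([1.0, 0.05, 0.0], k=2)
print(results[0][0])  # "cats" is the closest match
```

Swapping this toy class for pgvector or sqlite-vec keeps the same shape of workflow while adding real persistence, and everything stays on your own machine.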
Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
Together MoA is a really interesting approach based on open source models!
"We introduce Mixture of Agents (MoA), an approach to harness the collective strengths of multiple LLMs to improve state-of-the-art quality. And we provide a reference implementation, Together MoA, which leverages several open-source LLM agents to achieve a score of 65.1% on AlpacaEval 2.0, surpassing prior leader GPT-4o (57.5%)."