Jędrzej Grabala

jgitsolutions

AI & ML interests

A locally hosted, human-overseen system of agents, LLMs, LangChain pipelines, and other useful tools, running on mid-to-low-end commercial hardware.

Recent Activity

liked a model 2 days ago
microsoft/OmniParser-v2.0
liked a Space 2 days ago
hf-accelerate/model-memory-usage
liked a Space 2 days ago
huggingface/ai-deadlines

Organizations

LangChain Agents Hub, LangChainDatasets, ZeroGPU Explorers, Dev Mode Explorers

jgitsolutions's activity

reacted to chansung's post with 👍 29 days ago
Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes" (from HKU, UC Berkeley, Google DeepMind, and New York University), which compares SFT and RL in the post-training of LLMs/VLMs.

The conclusion suggests that SFT excels at memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable: typically a small amount of SFT first teaches the model the prompt format, and RL then enhances generalization through trial and error, as sketched below.
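
For context, here is a minimal sketch of that two-stage recipe using Hugging Face's TRL library. The model and dataset names are placeholders, the reward function is a toy, exact trainer arguments vary by TRL version, and the paper's own RL setup may differ (GRPO is just one convenient TRL trainer):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

# Stage 1: supervised fine-tuning, so the model learns the prompt/answer format.
sft_data = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset
sft_trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",               # placeholder base model
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-out"),
)
sft_trainer.train()
sft_trainer.save_model("sft-out")

# Stage 2: RL (GRPO here) on top of the SFT checkpoint, driven by a task reward.
def toy_reward(completions, **kwargs):
    # Toy stand-in for a real environment reward (e.g., a verified task answer):
    # rewards completions whose length is close to 20 characters.
    return [-abs(20 - len(c)) for c in completions]

rl_data = load_dataset("trl-lib/tldr", split="train")  # placeholder prompt dataset
rl_trainer = GRPOTrainer(
    model="sft-out",                          # resume from the SFT checkpoint
    reward_funcs=toy_reward,
    train_dataset=rl_data,
    args=GRPOConfig(output_dir="rl-out"),
)
rl_trainer.train()
```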

The study focused on a single model, Llama-3.2-Vision-11B, using environments such as GeneralPoints for arithmetic reasoning and V-IRL for spatial reasoning. The same training data was used for both SFT and RL, with evaluations on both in-distribution and out-of-distribution data to separate memorization from generalization.
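
In other words, memorization shows up as in-distribution gains without out-of-distribution gains, while generalization lifts both. A toy reading of those two numbers (made-up accuracy deltas and my own helper, not from the paper):

```python
def diagnose(id_gain: float, ood_gain: float) -> str:
    # Toy rule of thumb: ID-only gains look like memorization,
    # OOD gains look like generalization.
    if ood_gain > 0:
        return "generalization-like (OOD accuracy improved)"
    if id_gain > 0:
        return "memorization-like (only ID accuracy improved)"
    return "no clear gain"

# Hypothetical accuracy deltas after post-training (illustrative only):
print(diagnose(id_gain=0.50, ood_gain=-0.02))  # an SFT-like profile
print(diagnose(id_gain=0.45, ood_gain=0.25))   # an RL-like profile
```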

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.
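
To make the "playground" idea concrete, below is a self-contained toy sketch of a GeneralPoints-style arithmetic environment (my own minimal interface, not the paper's code): the agent is dealt four numbers and is rewarded for an expression over exactly those numbers that evaluates to 24.

```python
import random
import re

class PointsEnv:
    """Toy GeneralPoints-style environment: given four numbers, the agent must
    submit an arithmetic expression over exactly those numbers that evaluates
    to the target. A real playground adds prompt templates and safety checks."""

    def __init__(self, target: int = 24, seed: int | None = None):
        self.target = target
        self.rng = random.Random(seed)
        self.cards: list[int] = []

    def reset(self) -> str:
        # Deal four numbers and return the prompt shown to the model.
        self.cards = [self.rng.randint(1, 10) for _ in range(4)]
        return f"Combine {self.cards} with + - * / to reach {self.target}."

    def step(self, expression: str) -> float:
        """Return 1.0 for a correct expression using exactly the dealt numbers."""
        try:
            # Toy only: never eval untrusted model output in production.
            value = eval(expression, {"__builtins__": {}}, {})
        except Exception:
            return 0.0
        if not isinstance(value, (int, float)):
            return 0.0
        used = sorted(int(t) for t in re.findall(r"\d+", expression))
        if used != sorted(self.cards):
            return 0.0
        return 1.0 if abs(value - self.target) < 1e-6 else 0.0

env = PointsEnv(seed=0)
print(env.reset())                  # e.g. "Combine [7, 7, 3, 1] with + - * / ..."
print(env.step("(7 + 1) * 3 * 1"))  # reward depends on the dealt cards
```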

https://arxiv.org/abs/2501.17161
upvoted an article about 1 month ago

Welcome to Inference Providers on the Hub 🔥
