AdinaY (Adina Yakefu)

posted an update about 21 hours ago

Post

425

Qwen2.5-VL-32B-Instruct 🔥 @Alibaba_Qwen just released this new user-friendly VLM on the hub
Model: Qwen/Qwen2.5-VL-32B-Instruct
Demo: Qwen/Qwen2.5-VL-32B-Instruct

1 reply

·

replied to their post 1 day ago

Awesome🔥
Would be great to share it on HF blog as well 👉 https://huggingface.co/blog

posted an update 5 days ago

Post

1982

FlexWorld 🔥 an open framework that generates 3D scenes from a single image!

Model: GSAI-ML/FlexWorld
Paper: FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis (2503.13265)

✨ 360° rotation & zooming
✨ High-quality novel views powered by a video-to-video diffusion model
✨ Progressive 3D expansion

posted an update 6 days ago

Post

1374

Step-Video-TI2V 🔥 text-driven image-to-video model released by StepFun_ai

Paper:
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model (2503.11251)
Model:
stepfun-ai/stepvideo-ti2v

✨ 30B with MIT license
✨ Up to 102 frames from text + image inputs
✨ Supports motion control and anime-style

posted an update 6 days ago

Post

2834

RWKV7-G1 0.1B 🔥 Pure RNN reasoning model released by RWKV

Model: BlinkDL/rwkv7-g1
Paper: RWKV-7 "Goose" with Expressive Dynamic State Evolution (2503.14456)

✨ Apache2.0
✨ Supports 100+ languages
✨ 0.1B runs smoothly on low-power devices
✨ 0.4B/1.5B/2.9B are coming soon!!

1 reply

·

posted an update 7 days ago

Post

2053

Skywork-R1V🚀 38B open multimodal reasoning model with advanced visual CoT capabilities, released by Skywork.

Skywork/Skywork-R1V-38B

✨ Visual Reasoning: Breaks down complex images step by step.
✨ Math & Science: Solves visual problems with high precision.
✨ Combines text & images for deeper understanding.

posted an update 7 days ago

Post

2649

New 3D models from Tencent Hunyuan are now available on the hub 🔥

✨ Hunyuan3D-2mv: multiview shape model for high quality generation
✨ Hunyuan3D-2mini: 0.6B lightweight model for efficient workflows

Model:
tencent/Hunyuan3D-2mv
tencent/Hunyuan3D-2mini
Demo:
tencent/Hunyuan3D-2mv

1 reply

·

reacted to Kseniase's post with 🔥 9 days ago

Post

7607

15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention “heads” are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values.

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by incorporating a latent space where attention heads can dynamically learn different latent factors or representations.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.

See other types in the comments 👇
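
To make items 1, 3 and 4 above concrete, here is a minimal sketch in plain Python. It is not taken from any of the cited papers: it assumes scaled dot-product scoring (the 1409.0473 paper uses an additive score instead), skips the learned query/key/value projections entirely, and the function names `soft_attention`, `self_attention` and `cross_attention` are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, keys, values):
    """Soft attention for one query: a weighted sum over all values.

    Assumption: scores are scaled dot products between query and keys.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


def self_attention(sequence):
    """Self-attention: queries, keys and values all come from one sequence."""
    return [soft_attention(x, sequence, sequence)[1] for x in sequence]


def cross_attention(queries, memory):
    """Cross-attention: queries come from one sequence, keys/values from another."""
    return [soft_attention(q, memory, memory)[1] for q in queries]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = soft_attention(tokens[0], tokens, tokens)
    print("weights:", weights)
    print("context:", context)
```

Multi-head attention would run several copies of `soft_attention` in parallel, each on its own learned projection of the inputs, and concatenate the results; that part is left out here to keep the sketch dependency-free.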

1 reply

·

reacted to aifeifei798's post with 👀 10 days ago

Post

1161

A small program for adding a watermark

from PIL import Image, ImageDraw, ImageFont

def add_watermark(image):
    watermark_text = "AI Generated by DarkIdol FeiFei"

    # Ensure the input is an Image object
    if not isinstance(image, Image.Image):
        raise ValueError("Input must be a PIL Image object")

    width, height = image.size

    # Create a drawing object to draw on the image
    draw = ImageDraw.Draw(image)

    # Set the font size for the watermark text
    font_size = 10
    try:
        # Try the Iansui font; the .ttf file must be somewhere Pillow can find it
        font = ImageFont.truetype("Iansui-Regular.ttf", font_size)
    except OSError:
        # Fall back to Pillow's default font if the font file is not found
        font = ImageFont.load_default()

    # Calculate the width and height of the watermark text using textbbox
    bbox = draw.textbbox((0, 0), watermark_text, font=font)
    text_width = bbox[2] - bbox[0]
    text_height = bbox[3] - bbox[1]

    # Calculate the position for the watermark text (bottom-right corner)
    x = width - text_width - 10  # 10 is the right margin
    y = height - text_height - 10  # 10 is the bottom margin

    # Add the watermark text to the image
    draw.text((x, y), watermark_text, font=font, fill=(255, 255, 255, 128))

    # Return the modified image object
    return image

- You can find fonts at https://fonts.google.com. The program is clearly commented, so modify it as needed.

1 reply

·

reacted to clem's post with 🚀🤗 11 days ago

Post

4559

We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!

3 replies

·

posted an update 12 days ago

Post

1642

SEA-VL🔥 an OPEN dataset bridging the AI culture gap in Southeast Asia!

Paper: Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia (2503.07920)

✨1.28M+ culturally relevant images
✨85% accuracy in auto-collected images
✨Tracking underrepresented SEA languages & traditions

posted an update 12 days ago

Post

1890

Open Sora 2.0 is out 🔥
hpcai-tech/open-sora-20-67cfb7efa80a73999ccfc2d5
✨ 11B with Apache2.0
✨ Low training cost - $200k
✨ open weights, code and training workflow

reacted to clefourrier's post with 🚀 13 days ago

Post

1860

Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, comparisons will be apples to apples, but potentially in a suboptimal way for a given model family (just as some users interact suboptimally with models).

It also contains a cool section (6) on training data memorization rate! Important to see if your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.

reacted to thomwolf's post with 🚀🔥 13 days ago

Post

2558

We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder (open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!

And the best part is that we're open-sourcing everything about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets we are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions

posted an update 13 days ago

Post

1301

Spark TTS 🔊New OPEN TTS model that can generate any voice with just seconds of audio!

Released by SparkAudio community🔥

Model👉 SparkAudio/Spark-TTS-0.5B
Paper👉 Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens (2503.01710)

✨ Supports English & Chinese
✨ BiCodec Speech Codec: Enables precise voice control by separating semantics & speaker attributes

posted an update 13 days ago

Post

1402

R1-Omni🔥RLVR-Powered Multimodal LLM released by Alibaba

Model: StarJiaxing/R1-Omni-0.5B
Paper: R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning (2503.05379)

✨0.5B with Apache2.0
✨ Improves emotion recognition with visual and audio cues

1 reply

·

posted an update 19 days ago

Post

2295

Babel🗼A multilingual LLM supporting 25 languages, released by the Alibaba DAMO team.

Model: Tower-Babel/babel-67c172157372d4d6c4b4c6d5
Paper: Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers (2503.00865)

✨ 9B/83B chat & base
✨ Supports 25 languages: English, Chinese, Hindi, Spanish, Arabic, French, Bengali, Portuguese, Russian, Urdu, Indonesian, German, Japanese, Swahili, Filipino, Tamil, Vietnamese, Turkish, Italian, Javanese, Korean, Hausa, Persian, Thai, and Burmese

1 reply

·

reacted to clem's post with 🔥 21 days ago

Post

5900

Super happy to welcome Nvidia as our latest enterprise hub customer. They have almost 2,000 team members using Hugging Face, and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!

Nvidia's org: https://huggingface.co/nvidia
Enterprise hub: https://huggingface.co/enterprise
