All HF Hub posts

openfree posted an update 2 days ago

MixGen3 is an innovative image generation service that utilizes LoRA (Low-Rank Adaptation) models. Its key features include:

Integration of various LoRA models: Users can explore and select multiple LoRA models through a gallery.
Combination of LoRA models: Up to three LoRA models can be combined to express unique styles and content (see the sketch after this list).
User-friendly interface: An intuitive interface allows for easy model selection, prompt input, and image generation.
Advanced settings: Various options are provided, including image size adjustment, random seed, and advanced configurations.
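
The combination feature is conceptually similar to stacking LoRA adapters on a diffusion pipeline. Here is a rough, hypothetical sketch of that idea using the diffusers library; the base model and LoRA repo names are placeholders, and this is not MixGen3's actual code:

```python
# Hypothetical sketch of combining multiple LoRA adapters with diffusers.
# Base model and LoRA repo names are placeholders; MixGen3's internals may differ.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load up to three LoRAs, each under its own adapter name.
pipe.load_lora_weights("user/lora-style-a", adapter_name="style_a")
pipe.load_lora_weights("user/lora-style-b", adapter_name="style_b")
pipe.load_lora_weights("user/lora-style-c", adapter_name="style_c")

# Blend them with per-adapter weights, then generate as usual.
pipe.set_adapters(["style_a", "style_b", "style_c"],
                  adapter_weights=[0.8, 0.6, 0.4])
image = pipe("a watercolor city skyline at dusk",
             num_inference_steps=30).images[0]
image.save("mix.png")
```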

Main applications of MixGen3:

Content creation
Design and illustration
Marketing and advertising
Education and learning

Value of MixGen3:

Enhancing creativity
Time-saving
Collaboration possibilities
Continuous development

Expected effects:

Increased content diversity
Lowered entry barrier for creation
Improved creativity
Enhanced productivity

MixGen3 is bringing a new wave to the field of image generation by leveraging the advantages of LoRA models. Users can experience the service for free at https://openfree-mixgen3.hf.space

Contact: [email protected]
davidberenstein1957 posted an update 2 days ago

Don't use an LLM when you can use a much cheaper model.

The problem is that no one tells you how to actually do it.

Just picking a pre-trained model (e.g., BERT) and throwing it at your problem won't work!

If you want a small model to perform well on your problem, you need to fine-tune it.

And to fine-tune it, you need data.

The good news is that you don't need a lot of data, just high-quality data for your specific problem.
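
As a concrete illustration, here is a minimal sketch of fine-tuning a small pre-trained model on task-specific labeled data with 🤗 Transformers; the dataset, checkpoint, and hyperparameters are placeholders for your own problem, not what was shown in the livestream:

```python
# Minimal sketch of fine-tuning a small pre-trained model on your own
# labeled data. Dataset, checkpoint, and label count are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"   # a small model, not a full LLM
dataset = load_dataset("imdb")           # swap in your own curated dataset
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                           num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    # Small, high-quality subsets are often enough for a focused task.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```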

In the latest livestream, I showed you guys how to get started with Argilla on the Hub! Hope to see you at the next one.

https://www.youtube.com/watch?v=BEe7shiG3rY
albertvillanova posted an update about 13 hours ago

🚨 We’ve just released a new tool to compare the performance of models in the 🤗 Open LLM Leaderboard: the Comparator 🎉
open-llm-leaderboard/comparator

Want to see how two different versions of LLaMA stack up? Let’s walk through a step-by-step comparison of LLaMA-3.1 and LLaMA-3.2. 🦙🧵👇

1/ Load the Models' Results
- Go to the 🤗 Open LLM Leaderboard Comparator: open-llm-leaderboard/comparator
- Search for "LLaMA-3.1" and "LLaMA-3.2" in the model dropdowns.
- Press the Load button. Ready to dive into the results!

2/ Compare Metric Results in the Results Tab 📊
- Head over to the Results tab.
- Here, you’ll see the performance metrics for each model, beautifully color-coded using a gradient to highlight performance differences: greener is better! 🌟
- Want to focus on a specific task? Use the Task filter to hone in on comparisons for tasks like BBH or MMLU-Pro.

3/ Check Config Alignment in the Configs Tab ⚙️
- To ensure you’re comparing apples to apples, head to the Configs tab.
- Review both models’ evaluation configurations, such as metrics, datasets, prompts, few-shot configs...
- If something looks off, it’s good to know before drawing conclusions! ✅

4/ Compare Predictions by Sample in the Details Tab 🔍
- Curious about how each model responds to specific inputs? The Details tab is your go-to!
- Select a Task (e.g., MuSR), then a Subtask (e.g., Murder Mystery), and press the Load Details button.
- Check out the side-by-side predictions and dive into the nuances of each model’s outputs.

5/ With this tool, it’s never been easier to explore how small changes between model versions affect performance on a wide range of tasks. Whether you’re a researcher or enthusiast, you can instantly visualize improvements and dive into detailed comparisons.

🚀 Try the 🤗 Open LLM Leaderboard Comparator now and take your model evaluations to the next level!
m-ric posted an update about 12 hours ago

By far the coolest release of the day!
> The Open LLM Leaderboard, the most comprehensive suite for comparing Open LLMs on many benchmarks, just released a comparator tool that lets you dig into the details of the differences between any two models.

Here's me checking how the new Llama-3.1-Nemotron-70B that we've heard so much about compares to the original Llama-3.1-70B. 🤔🔎

Try it out here 👉 open-llm-leaderboard/comparator
louisbrulenaudet posted an update about 20 hours ago

🚨 I have $3,500 in Azure credits, including access to an H100 (96 GB), expiring on November 12, 2024.

I won’t be able to use it all myself, so I’m reaching out to the @huggingface community: Are there any open-source projects with data ready for some compute power?

Let’s collaborate and make the most of it together 🔗
bartowski posted an update 1 day ago

Regarding the latest Mistral model and the GGUFs for it:

Yes, they may be subpar and may require changes to llama.cpp to support the interleaved sliding window

Yes, I got excited when a conversion worked and released them ASAP

That said, generation seems to work right now and mimics the output from Spaces that are running the original model

I have appended -TEST to the model names in an attempt to indicate that they are not final or perfect, but if people still feel misled and that it's not the right thing to do, please post your thoughts (civilly) below. I will strongly consider pulling the conversions if that's what people think is best. After all, that's what I'm here for, in service to you all!
singhsidhukuldeep posted an update 2 days ago

While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."

The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance.

The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.

How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar λ for balancing the attention maps.
- It implements GroupNorm for each attention head independently.
- It is compatible with FlashAttention for efficient computation.
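
Putting those pieces together, a single head of the mechanism can be sketched roughly like this in PyTorch (an illustrative simplification based on the description above, not the official implementation; LayerNorm stands in for the paper's per-head GroupNorm):

```python
# Rough sketch of one differential-attention head: two softmax attention maps,
# subtracted with a learnable scalar lambda, then normalized per head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttentionHead(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        # Two query/key projections so we can form two attention maps.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        self.lam = nn.Parameter(torch.tensor(0.5))  # learnable scalar lambda
        self.norm = nn.LayerNorm(d_head)            # stand-in for per-head GroupNorm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        scale = q1.size(-1) ** -0.5
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
        attn = a1 - self.lam * a2   # differential map: common-mode noise cancels
        return self.norm(attn @ v)  # (batch, seq_len, d_head)

# Quick shape check
head = DiffAttentionHead(d_model=64, d_head=16)
out = head(torch.randn(2, 10, 64))  # -> torch.Size([2, 10, 16])
```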

What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening doors for efficient quantization.

Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters.

This innovative architecture could be a game-changer for the next generation of LLMs. What are your thoughts on DIFF Transformer's potential impact?
santiviquez posted an update 2 days ago

Professors should ask students to write blog posts based on their final projects instead of having them do paper-like reports.

A single blog post, accessible to the entire internet, can have a greater career impact than dozens of reports that nobody will read.
nroggendorff posted an update about 13 hours ago

100 followers? When did that happen?
merve posted an update 1 day ago