Jeonghwan Park PRO

maywell

Organizations

Social Post Explorers, 인스트럭트.한국, Elyn AI, Nothing is Real, Wanot.AI

maywell's activity

reacted to yongchanghao's post with 🔥 3 months ago
We just released a paper (NeuZip) that compresses VRAM in a lossless manner to run larger models. This should be particularly useful when VRAM is insufficient during training/inference. Specifically, we look inside each floating-point number and find that the exponent bits are highly compressible (as shown in the figure below).

Read more about the work at NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks (2410.20650)
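
To make the claim about compressible exponents concrete, here is a minimal sketch that measures the Shannon entropy of the 8-bit exponent field of float32 values. It is not the NeuZip implementation: the Gaussian "weights", the standard deviation of 0.02, and the helper names are all assumptions chosen for illustration.

```python
import math
import random
import struct
from collections import Counter


def float32_exponent(x: float) -> int:
    """Return the 8-bit biased exponent field of x stored as IEEE 754 float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return (bits >> 23) & 0xFF


def entropy_bits(symbols) -> float:
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical stand-in for model weights: zero-mean Gaussian values.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(100_000)]

exponent_entropy = entropy_bits(float32_exponent(w) for w in weights)
print(f"exponent entropy: {exponent_entropy:.2f} bits (8 bits are stored)")
```

If the measured entropy is well below 8 bits, an entropy coder could in principle store the exponents in fewer bits without losing information, which is the intuition the post describes.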
reacted to thomwolf's post with ❤️ 4 months ago
Parents in the 1990s: Teach the kids to code
Parents now: Teach the kids to fix the code when it starts walking around 🤖✨
reacted to beomi's post with 🔥 4 months ago
# PyTorch == 2.5.0 Breaks Transformers' SDPAttention!

If you encounter "RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.", you can work around it like this:

torch.backends.cuda.enable_cudnn_sdp(False)


Note that this gives up part of the performance gain from PyTorch 2.5.

A change has been merged in https://github.com/pytorch/pytorch/pull/138587 (not a real fix: it turns cuDNN SDPA off by default), but it has not been released yet, so you would need to install PyTorch from source.

Fastest way for now: pip install "torch<2.5"

Ref: https://github.com/huggingface/diffusers/issues/9704#issuecomment-2422585273
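
The workaround above can be wrapped so that it only fires on the affected release. This is a sketch under assumptions: the helper names are hypothetical, and it assumes that only the 2.5.0 release named in the post title needs the toggle. The only PyTorch call used is the `torch.backends.cuda.enable_cudnn_sdp` call quoted in the post.

```python
import re


def is_affected_torch_version(version: str) -> bool:
    """Return True for PyTorch 2.5.0 builds such as "2.5.0" or "2.5.0+cu124".

    Assumption: only the 2.5.0 release named in the post needs the workaround.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return False
    return tuple(int(part) for part in match.groups()) == (2, 5, 0)


def disable_cudnn_sdpa_if_needed() -> bool:
    """Apply the workaround from the post; return True if it was applied."""
    try:
        import torch  # imported lazily so the version helper works without PyTorch
    except ImportError:
        return False
    if not is_affected_torch_version(torch.__version__):
        return False
    # enable_cudnn_sdp is the call quoted in the post; guard in case it is absent.
    toggle = getattr(torch.backends.cuda, "enable_cudnn_sdp", None)
    if toggle is None:
        return False
    toggle(False)
    return True
```

Calling `disable_cudnn_sdpa_if_needed()` once at startup, before the first attention call, keeps the toggle out of the way on other versions.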
reacted to Felladrin's post with ❤️ 4 months ago
MiniSearch is celebrating its 1st birthday! 🎉

Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
reacted to Wauplin's post with 🔥 5 months ago
🚀 Exciting News! 🚀

We've just released huggingface_hub v0.25.0 and it's packed with powerful new features and improvements!

✨ Top Highlights:

• 📁 Upload large folders with ease using huggingface-cli upload-large-folder. Designed for your massive models and datasets. Highly recommended if you struggle to upload your Llama 70B fine-tuned model 🤡
• 🔎 Search API: new search filters (gated status, inference status) and fetch trending score.
• ⚡ InferenceClient: major improvements simplifying chat completions and handling async tasks better.

We've also introduced tons of bug fixes and quality-of-life improvements - thanks to the awesome contributions from our community! 💪

💡 Check out the release notes: Wauplin/huggingface_hub#8

Want to try it out? Install the release with:

pip install huggingface_hub==0.25.0

reacted to do-me's post with 🚀 5 months ago
SemanticFinder now supports WebGPU thanks to @Xenova's efforts with transformers.js v3!
Expect massive performance gains: inference over a whole book with 46k chunks took <5min. If your device doesn't support #WebGPU, use the classic Wasm-based version:
- WebGPU: https://do-me.github.io/SemanticFinder/webgpu/
- Wasm: https://do-me.github.io/SemanticFinder/

WebGPU harnesses the full power of your hardware, no longer being restricted to just the CPU. The speedup is significant (4-60x) for all kinds of devices: consumer-grade laptops, heavy Nvidia GPU setups or Apple Silicon. Measure the difference for your device here: Xenova/webgpu-embedding-benchmark
Chrome currently works out of the box, Firefox requires some tweaking.

WebGPU + transformers.js makes it possible to build amazing applications and make them accessible to everyone. E.g. SemanticFinder could become a simple GUI for populating your (vector) DB of choice. See the pre-indexed community texts here: do-me/SemanticFinder
Happy to hear your ideas!
reacted to fantos's post with 🚀❤️ 6 months ago
1. **Overview**
"EveryText" is at the forefront of AI image generation, offering a novel "TBF ('Text by Font') Image Model" that enables the representation of all languages globally in AI-generated images without prior training.

2. **Background**
Platforms like MidJourneyV6 and FLUX have advanced AI image generation, typically supporting English text. Alibaba Group expanded this to include Chinese, Japanese, and Korean, signaling a shift towards global language support.

3. **Challenges**
Existing methods faced several challenges including the need for additional editing, dependency on specific training, and substantial resource requirements. These approaches also struggled with limited vocabulary and were primarily effective only for English.

4. **Innovative Solution**
EveryText utilizes "Fonts" as pre-trained models, allowing any text to be visually represented without traditional training. This approach not only enhances diversity and aesthetics by utilizing various fonts but also ensures unlimited expression.

5. **Using the Service**
EveryText is free and easy to use:
- **Prompt**: Describe the image.
- **Text for Image Generation**: Add your text.
- **Text Position and Size**: Customize the text's placement and size.
- **Font Selection**: Optionally select a font.
- **Advanced Settings**: Further refine the image creation.
- Click "START" to generate the image.

6. **Comparative Analysis**
EveryText supports all languages with superior image quality and text legibility, setting it apart from platforms like MidJourneyV6/Flux and AnyText by Alibaba Group.

7. **Conclusion**
EveryText has revolutionized AI-generated imagery by integrating all global languages, broadening the scope for creative and communicative applications. Its future potential is vast and promising.

**Related Links**
- Huggingface Service: https://fantos-EveryText.hf.space
-email: [email protected]
reacted to maximuspowers's post with 👀 6 months ago
Here's my favorite piece of the summer bias detection research project (paper coming in Sept). We trained BERT for token classification (multi-label), to identify:
- Generalizations
- Unfairness
- Stereotypes

HF Space: maximuspowers/bias-detection-ner
Article on Training: https://huggingface.co/blog/maximuspowers/bias-entity-recognition

Pls reach out with ideas!! Lots more info coming soon: our research group has workshops and a hackathon planned for launching this open source project. Thanks
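
To illustrate what "token classification (multi-label)" means for the three categories in this post, here is a minimal decoding sketch. It assumes one independent sigmoid per label and a 0.5 threshold, which is a common setup but is not confirmed by the post; the logits and the function name are hypothetical.

```python
import math

LABELS = ["Generalizations", "Unfairness", "Stereotypes"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_token_labels(tokens, logits, threshold=0.5):
    """Map per-token logits to the labels whose probability passes the threshold.

    Multi-label: each label gets an independent sigmoid, so a token can carry
    zero, one, or several labels (unlike softmax, which picks exactly one).
    """
    decoded = []
    for token, token_logits in zip(tokens, logits):
        active = [
            label
            for label, logit in zip(LABELS, token_logits)
            if sigmoid(logit) >= threshold
        ]
        decoded.append((token, active))
    return decoded


# Hypothetical logits for illustration; a real model would produce these.
tokens = ["All", "teenagers", "are", "lazy"]
logits = [
    [2.1, -3.0, 1.2],
    [1.7, -2.5, 2.3],
    [-1.0, -4.0, -0.5],
    [-0.8, 0.4, 1.9],
]
for token, labels in decode_token_labels(tokens, logits):
    print(token, labels)
```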
reacted to davidberenstein1957's post with ❤️ 6 months ago
📣 Introducing Dataset Viber: your chill repo for data collection, annotation and vibe checks! 🎉

I've cooked up Dataset Viber, a set of cool tools designed to make data preparation for AI models easier, more approachable and enjoyable for standalone AI engineers and enthusiasts.

🔧 What Dataset Viber offers:
- CollectorInterface: Lazily collect model interaction data without human annotation
- AnnotatorInterface: Annotate your data with models in the loop
- BulkInterface: Explore data distribution and annotate in bulk
- Embedder: Efficiently embed data with ONNX-optimized speeds

🎯 Key features:
- Supports various tasks for text, chat, and image modalities
- Runs in .ipynb notebooks
- Logs data to local CSV or directly to Hugging Face Hub
- Easy to install via pip: pip install dataset-viber

It's not designed for team collaboration or production use, but rather as a fun and efficient toolkit for individual projects.

Want to give it a try? Check out the repository link https://github.com/davidberenstein1957/dataset-viber/.

I'm excited to hear your feedback and learn how you vibe with your data. Feel free to open an issue or reach out if you have any questions or suggestions!

Some shoutouts:
- Gradio for the amazing backbone
- Daniel van Strien for some initial presentations I did on vibe checks
- Emily Omier for the workshop on structuring GitHub repo READMEs
- Hamel Husain for repeatedly mentioning that people should look at their data.
- Philipp Schmid for his code for ONNX feature-extractors
- Ben Burtenshaw for the first PR
reacted to Jaward's post with 🔥 6 months ago
PyTorch implementation of the Self-Compression & Differentiable Quantization Algorithm introduced in "Self-Compressing Neural Networks" paper.

The algorithm shows dynamic neural network compression during training - with reduced size of weight, activation tensors and bits required to represent weights.

It's basically shrinking the neural network size (weights and activations) as it's being trained without compromising performance - this helps reduce compute and inference cost.

Code: https://github.com/Jaykef/ai-algorithms
Paper: https://arxiv.org/pdf/2301.13142
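
As a rough illustration of the quantization step, the sketch below maps a value onto a signed fixed-point grid defined by a bit depth and an exponent. The formula is my reading of the paper and should be checked against it; the function name is hypothetical, and training-time details (such as how gradients pass through the rounding, or how the bit depth is penalized in the loss) are not shown.

```python
def quantize(x: float, bits: float, exponent: float) -> float:
    """Quantize x to a signed grid with `bits` bits and step size 2**exponent.

    Assumed form: 2**e * round(clamp(x / 2**e, -2**(b-1), 2**(b-1) - 1)).
    Note that Python's round() rounds exact halves to the nearest even integer.
    """
    step = 2.0 ** exponent
    low = -(2.0 ** (bits - 1))
    high = 2.0 ** (bits - 1) - 1
    clamped = min(max(x / step, low), high)
    return step * round(clamped)


print(quantize(0.3, bits=4, exponent=-2))    # value inside the representable range
print(quantize(10.0, bits=4, exponent=-2))   # value clipped to the top of the range
```

With fewer bits the representable range shrinks, so lowering the bit depth during training directly reduces the storage needed per weight.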
reacted to davanstrien's post with ❤️ 6 months ago
Is your summer reading list still empty? Curious if an LLM can generate a book blurb you'd enjoy and help build a KTO preference dataset at the same time?

A demo using Hugging Face Spaces and Gradio to collect LLM output preferences: davanstrien/would-you-read-it
reacted to fdaudens's post with 🚀 7 months ago
Exciting news for audio AI enthusiasts! 🎙️🌍

The Emilia dataset dropped last week, and it's a cool one:
- 101k+ hours of high-quality audio
- 6 languages: 🇨🇳 🇺🇸 🇯🇵 🇰🇷 🇩🇪 🇫🇷
- Diverse content: talk shows, interviews, debates, sports commentary, audiobooks

This dataset could improve multilingual speech generation and recognition. Opens up many possibilities for global media, language learning, and accessibility!

Explore it: amphion/Emilia

#AIAudio
reacted to lamhieu's post with 😔 7 months ago
🎉 The Ghost 8B Beta model outperforms prominent models such as Llama 3 8B Instruct and GPT 3.5 Turbo in the lc_winrate score. In addition, it also outperforms Claude 3 Opus, Claude 3 Sonnet, GPT-4, and Mistral Large when comparing the winrate score of AlpacaEval 2.0.

Ghost 8B Beta is a large language model developed with goals that include excellent multilingual support, superior knowledge capabilities, and cost-effectiveness. The model comes in two context length versions, 8k and 128k, along with multilingual function tools support by default.
The languages supported are 🇺🇸 English, 🇫🇷 French, 🇮🇹 Italian, 🇪🇸 Spanish, 🇵🇹 Portuguese, 🇩🇪 German, 🇻🇳 Vietnamese, 🇰🇷 Korean and 🇨🇳 Chinese.

Explore the Potential:
To learn more about this groundbreaking language model, visit the official website or explore the online demo platforms:
- Ghost 8B Beta (β, 8k) on Spaces: lamhieu/ghost-8b-beta-8k
- Ghost 8B Beta (β, 128k) on Spaces: lamhieu/ghost-8b-beta-128k
- Official website: https://ghost-x.org/docs/models/ghost-8b-beta
reacted to m-ric's post with 👍 8 months ago
💰 Get the price of any LLM API request ⇒ tokencost

I've just found out about AgentOps-AI/tokencost (https://github.com/AgentOps-AI/tokencost).
This library gives you the price of your calls to any LLM API: OpenAI, Anthropic, Mistral, AWS or Databricks...

For any model, you can use as input either string prompts or messages, and get as outputs either the price or token count.

Congrats to the AgentOps-AI team: this will be very useful when trying to get a ballpark estimate of a project's price, to compare APIs, or for precise monitoring of usage!
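
The arithmetic behind such a price estimate is simple once token counts are known. The sketch below does not use the tokencost API: the function name is hypothetical and the per-million-token prices are made-up placeholders, not real provider rates.

```python
def estimate_request_cost(
    prompt_tokens: int,
    completion_tokens: int,
    usd_per_million_prompt: float,
    usd_per_million_completion: float,
) -> float:
    """Return the estimated cost in USD of one request from its token counts."""
    total = (
        prompt_tokens * usd_per_million_prompt
        + completion_tokens * usd_per_million_completion
    )
    return total / 1_000_000


# Placeholder prices for illustration only.
cost = estimate_request_cost(1_000, 500, usd_per_million_prompt=3.0,
                             usd_per_million_completion=15.0)
print(f"estimated cost: ${cost:.4f}")
```

A library like the one in the post adds the two pieces this sketch leaves out: counting tokens for a given model and keeping the price table up to date.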

✨ Daily reminder: running an A100 costs you exactly $0.00/hour (or 0.00€ in current exchange rates) on a HF space with ZeroGPU!
Learn more on ZeroGPU 👉 https://www.datacenterdynamics.com/en/news/hugging-face-launches-zerogpu-project-to-democratize-ai-gives-away-10-million-worth-of-compute/
reacted to loubnabnl's post with 🔥 9 months ago
🍷 FineWeb technical report is out and so is 📚 FineWeb-Edu, a 1.3-trillion-token dataset that outperforms all other open web datasets, with remarkable improvements on educational benchmarks such as MMLU, ARC, and OpenBookQA.

Technical report: HuggingFaceFW/blogpost-fineweb-v1
Dataset: HuggingFaceFW/fineweb-edu

We used Llama 3 generations to train an educational quality classifier, filtering the 15 trillion tokens of FineWeb to select only those with high educational value (an approach also used in Llama 3 and Phi-3 training datasets). We're releasing both FineWeb-Edu and the classifier, along with a larger, less heavily filtered version containing 5.4 trillion tokens.

You can find more details about the dataset and the experiments we ran in the FineWeb technical report. It's a 45-minute read, but it contains all the secret sauce for building high quality web datasets.

Enjoy!
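
The filtering step described in this post can be sketched as a threshold on a per-document classifier score. Everything specific here is an assumption for illustration: the `edu_score` field name, the example scores, and the two threshold values are hypothetical, and no real classifier is called.

```python
def filter_by_educational_score(documents, threshold):
    """Keep documents whose classifier score is at least `threshold`."""
    return [doc for doc in documents if doc["edu_score"] >= threshold]


# Hypothetical documents with made-up classifier scores.
documents = [
    {"text": "Photosynthesis converts light energy into chemical energy.", "edu_score": 4.2},
    {"text": "Top 10 celebrity outfits this week", "edu_score": 0.6},
    {"text": "A short history of the printing press", "edu_score": 2.4},
    {"text": "Buy cheap watches online now", "edu_score": 0.1},
]

strict = filter_by_educational_score(documents, threshold=3.0)
lenient = filter_by_educational_score(documents, threshold=2.0)
print(len(strict), "documents pass the strict filter")
print(len(lenient), "documents pass the lenient filter")
```

A lower threshold keeps more data at lower average quality, which mirrors the trade-off between the smaller dataset and the larger, less heavily filtered version mentioned in the post.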
replied to their post 9 months ago

Hi, just read it. Its merging method with calibration looks interesting. I don't see that their method without calibration has a significant benefit over previous methods.

reacted to mmhamdy's post with ❤️ 10 months ago
⌚ Visiting the past with Time Machine GPT!

We are all familiar with the concept of a suite of models being a series of variants of a certain model that differ mainly in size. For example, Llama-2 7B, Llama-2 13B, Llama-2 70B.

But this is not always the case. Researchers from The University of Oxford, The Alan Turing Institute, and The University of Manchester introduced TimeMachineGPT (TiMaGPT), a suite of language models that were pretrained on data constrained by a certain period in time. Instead of various sizes of the model, you get the same model but trained on different data coming from different times.

Using a GPT-2 model architecture with 117 million parameters, they trained 12 different models on Wikipedia and WMT News from 2011 to 2022 with each year represented by a model. For example, TiMaGPT-2011, TiMaGPT-2012, ..., TiMaGPT-2022.

🤔 But how could these models be useful?

They can be very useful. For example:

1️⃣ Most language models are static in the sense that they are trapped in the time bubble of their pretraining data: their knowledge is limited by the cut-off date of their training dataset. In order to update their knowledge, Temporal Adaptation can be performed, which means further training on newer data. The TiMaGPT series of models can be used to study the limitations of Temporal Adaptation of language models.

2️⃣ Word meaning can change not only with its context but also with its time of use, and there is a large amount of research that focuses on understanding how embeddings shift through time. TiMaGPT will be very helpful in studying this phenomenon.

3️⃣ One more use case, in the context of time-series forecasting and event prediction, is "backtesting": using historical data to evaluate new models for forecasting the future. Models like TiMaGPT (each living in its own time without any knowledge of the future/present) will be great for such a use case.

🤗 All models and datasets are on the hub: https://huggingface.co/Ti-Ma
reacted to m-ric's post with 🤯👀 10 months ago
๐Ÿ’ฐโŒ ๐‘๐ž๐ฌ๐ž๐š๐ซ๐œ๐ก ๐Ÿ๐จ๐ซ ๐ญ๐ก๐ž ๐ฏ๐ž๐ซ๐ฒ ๐†๐๐” ๐๐จ๐จ๐ซ - ๐’๐œ๐š๐ฅ๐ข๐ง๐  ๐ฅ๐š๐ฐ๐ฌ ๐ซ๐ž๐ฉ๐ฅ๐ข๐œ๐š๐ญ๐ข๐จ๐ง

๐ŸŽ† Good news: ๐˜†๐—ผ๐˜‚ ๐—ฐ๐—ฎ๐—ป ๐—ฑ๐—ผ ๐—ฐ๐˜‚๐˜๐˜๐—ถ๐—ป๐—ด-๐—ฒ๐—ฑ๐—ด๐—ฒ ๐—ฟ๐—ฒ๐˜€๐—ฒ๐—ฎ๐—ฟ๐—ฐ๐—ต ๐˜„๐—ถ๐˜๐—ต ๐—ฎ ๐—ฐ๐—ฎ๐—น๐—ฐ๐˜‚๐—น๐—ฎ๐˜๐—ผ๐—ฟ ๐—ฎ๐—ป๐—ฑ ๐— ๐—ถ๐—ฐ๐—ฟ๐—ผ๐˜€๐—ผ๐—ณ๐˜ ๐—ฃ๐—ฎ๐—ถ๐—ป๐˜ ๐Ÿฎ๐Ÿฌ๐Ÿฌ๐Ÿฒ!

The Chinchilla experiments (by Google DeepMind) ran hundreds of pre-trainings with models >1B parameters (I do not want to imagine how much that cost) to ๐—ณ๐—ถ๐—ป๐—ฑ ๐˜๐—ต๐—ฒ ๐—ผ๐—ฝ๐˜๐—ถ๐—บ๐—ฎ๐—น ๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ ๐—ผ๐—ณ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น ๐˜€๐—ถ๐˜‡๐—ฒ ๐˜ƒ๐˜€ ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ถ๐—ป๐—ด ๐˜๐—ผ๐—ธ๐—ฒ๐—ป๐˜€. Why is this question so important?
Well, you only ever have access to a fixed compute, counted in FLOPs (floating point operations). So if your model is bigger, you will have less compute to train on many tokens, and if you want to train on more tokens, your model will be smaller. When model trainings cost million, you absolutely need to get this right.

The new paper "Chinchilla Scaling: A replication attempt" by Epoch AI sets on on the ambitious goal of reproducing this.

But since the authors do not have infinite money, they decided to directly run their computations from DeepMind's own experiments! They took the figure from the last experiment (cf slide below), measured point positions, picked color codes, and ended up reconstructing the underlying data.

💥 They then just fit the scaling laws proposed by the Chinchilla Authors, but arrived at wildly different results! They find that as a rough rule of thumb, you should use 20 training tokens for each parameter in your model, instead of the 70 obtained in the original paper. They also point out inconsistencies in the paper, and unrealistically narrow confidence intervals.

➡️ This only contradicts the results from the last (out of 3) experiments in the Chinchilla paper. And the model trained at the end of the Chinchilla paper still seems properly scaled.

✅ But it does show that a tiny bit more theoretical work can go a long way, especially given the huge financial costs that such an error can have!
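
To see how much the tokens-per-parameter ratio matters for a fixed budget, here is a small worked sketch. It relies on the common approximation that training compute is about 6 FLOPs per parameter per token (C ~= 6 * N * D), which the post does not state and is taken here as an assumption; the budget value and function name are hypothetical.

```python
import math


def compute_optimal_split(compute_flops: float, tokens_per_param: float):
    """Split a FLOP budget into (parameters, tokens).

    Assumes C ~= 6 * N * D and D = r * N, so N = sqrt(C / (6 * r)).
    """
    params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


budget = 1.2e20  # hypothetical FLOP budget
for ratio in (20, 70):
    n, d = compute_optimal_split(budget, ratio)
    print(f"{ratio} tokens/param -> {n:.3g} params, {d:.3g} tokens")
```

Under these assumptions, moving from 70 to 20 tokens per parameter nearly doubles the model size that the same budget calls for, which is why the discrepancy between the two estimates is costly.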