---
title: README
emoji: 🌐
colorFrom: gray
colorTo: yellow
sdk: static
pinned: true
license: apache-2.0
short_description: Efficient foundation models for low-resource languages.
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/62e1cc43926f4892a4ca2ff9/_1WxGqMpLN0RuX02Dq9Df.png
---

In recent years, generative AI has seen remarkable advancements, with foundation models emerging as the cornerstone of much of the research and development in the field. However, the prevailing deep learning paradigm demands vast resources in terms of data and computation. This data-intensive approach has inadvertently deepened the divide between high-resource and low-resource languages. High-resource languages benefit from the bulk of development efforts and readily available resources, while low-resource languages face significant challenges in achieving comparable performance and autonomy.

To foster a more equitable, sustainable, and open ecosystem for AI research and development, we aim to create tools and resources to support the development of foundation models for low-resource languages. This includes developing models, datasets, and open-source code to empower underrepresented linguistic communities.

Recent Publications 📚

ViTucano: A Portuguese Vision Assistant | GitHub | Collection |
Tucano: Advancing Neural Text Generation for Portuguese | GitHub | Collection | Paper |
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese | GitHub | Collection | Paper |

News 🚀

[13/01/2025] We release ViTucano, a pair of vision assistants natively pretrained in Portuguese (ViTucano-1b5-v1, ViTucano-2b8-v1).
[13/01/2025] We release the datasets used to pretrain and fine-tune the ViTucano models: ViTucano-Pretrain and ViTucano-SFT.
[29/11/2024] Tucano is mentioned on Deutsche Welle: "Cientistas criam maior banco de dados em português para IA" ("Scientists create the largest Portuguese-language database for AI").
[27/11/2024] Tucano video presentation at the C4AI (USP) [available on YouTube].
[12/11/2024] "Tucano: Advancing Neural Text Generation for Portuguese" is published as a preprint on arXiv, with all models and datasets released on Hugging Face.

Community Contributions 🤝

Demo on how to run inference on ViTucano.
Demo on how to run inference on Tucano.
Demo on how to create a simple Chat UI for Tucano using Gradio.
Tucano OpenVINO is a port of Tucano-2b4-Instruct optimized for inference with Intel's OpenVINO toolkit.
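
For a quick start before opening the demos above, text generation with a Tucano model can be sketched with the Hugging Face `transformers` text-generation pipeline. This is a minimal sketch, not the code from the community demos. The repository id `TucanoBR/Tucano-2b4-Instruct` is an assumption built from the organization and model names mentioned on this page. The helper names `load_generator` and `generate_text` are hypothetical, and the generation settings are illustrative only.

```python
from typing import Any, Callable, Optional

# Assumed repository id (organization name + model name from this page).
MODEL_ID = "TucanoBR/Tucano-2b4-Instruct"


def load_generator(model_id: str = MODEL_ID) -> Callable[..., Any]:
    """Build a text-generation pipeline (downloads the weights on first use)."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id)


def generate_text(
    prompt: str,
    max_new_tokens: int = 64,
    generator: Optional[Callable[..., Any]] = None,
) -> str:
    """Return the prompt plus its generated continuation.

    A preloaded pipeline can be passed as `generator` to avoid
    reloading the model on every call.
    """
    if generator is None:
        generator = load_generator()
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return outputs[0]["generated_text"]
```

To try it, load the pipeline once with `generator = load_generator()` and then call `generate_text("A capital do Brasil é", generator=generator)`. The first call to `load_generator` downloads the model weights from the Hugging Face Hub.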