---
title: README
emoji: 🌐
colorFrom: gray
colorTo: yellow
sdk: static
pinned: true
license: apache-2.0
short_description: Efficient foundation models for low-resource languages.
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/62e1cc43926f4892a4ca2ff9/_1WxGqMpLN0RuX02Dq9Df.png
---

In recent years, generative AI has seen remarkable advancements, with foundation models emerging as the cornerstone of much of the research and development in the field. However, the prevailing deep learning paradigm demands vast resources in terms of data and computation. This data-intensive approach has inadvertently deepened the divide between high-resource and low-resource languages. High-resource languages benefit from the bulk of development efforts and readily available resources, while low-resource languages face significant challenges in achieving comparable performance and autonomy.

To foster a more equitable, sustainable, and open ecosystem for AI research and development, we aim to create tools and resources to support the development of foundation models for low-resource languages. This includes developing models, datasets, and open-source code to empower underrepresented linguistic communities.

## Recent Publications 📚

- **ViTucano: A Portuguese Vision Assistant** | [GitHub](https://github.com/Nkluge-correa/TinyLLaVA_Factory) | [Collection](https://huggingface.co/collections/TucanoBR/vitucano-v1-67804623a92cd2fabcafa0a3) |
- **Tucano: Advancing Neural Text Generation for Portuguese** | [GitHub](https://github.com/Nkluge-correa/Tucano) | [Collection](https://huggingface.co/collections/TucanoBR/tucano-670565e8c5325fb7f2da4361) | [Paper](https://arxiv.org/abs/2411.07854) |
- **TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese** | [GitHub](https://github.com/Nkluge-correa/TeenyTinyLlama) | [Collection](https://huggingface.co/collections/nicholasKluge/teenytinyllama-6582ea8129e72d1ea4d384f1) | [Paper](https://www.sciencedirect.com/science/article/pii/S2666827024000343) |

## News 🚀

- [13/01/2025] We release ViTucano, a pair of vision assistants natively pretrained in Portuguese ([ViTucano-1b5-v1](https://huggingface.co/TucanoBR/ViTucano-1b5-v1), [ViTucano-2b8-v1](https://huggingface.co/TucanoBR/ViTucano-2b8-v1)).
- [13/01/2025] We release the datasets used to pretrain and fine-tune the ViTucano models: [ViTucano-Pretrain](https://huggingface.co/datasets/TucanoBR/ViTucano-Pretrain) and [ViTucano-SFT](https://huggingface.co/datasets/TucanoBR/ViTucano-SFT).
- [29/11/2024] Tucano is mentioned on Deutsche Welle: "[Cientistas criam maior banco de dados em português para IA](https://www.dw.com/pt-br/pesquisadores-da-alemanha-criam-maior-banco-de-dados-p%C3%BAblico-em-portugu%C3%AAs-para-ia/a-70917082)".
- [27/11/2024] The Tucano video presentation at C4AI (USP) is available on [YouTube](https://www.youtube.com/watch?v=BscOHn54ld8).
- [12/11/2024] "[Tucano: Advancing Neural Text Generation for Portuguese](https://arxiv.org/abs/2411.07854)" is published as a preprint on ArXiv, with all models and datasets released on [Hugging Face](https://huggingface.co/TucanoBR).

## Community Contributions 🤝

- Demo on how to [run inference on ViTucano](https://colab.research.google.com/drive/110_Gtjgu4pldRQP864_Y-rSm2VhyW7Li).
- Demo on how to [run inference on Tucano](https://colab.research.google.com/drive/1Qf2DsFOFDA7RKkamI-tH3OregtOlZ8Cz).
- Demo on how to create a simple [Chat UI for Tucano](https://colab.research.google.com/drive/1fEW10CXksMfMv1veLr22OESwDs6e-W1b) using Gradio.
- [Tucano OpenVINO](https://huggingface.co/cabelo/Tucano-2b4-Instruct-fp16-ov) is a port of Tucano-2b4-Instruct optimized for Intel's OpenVINO inference toolkit.
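
For a quick start outside the Colab demos above, the sketch below shows one way to run text generation with a Tucano checkpoint using the Hugging Face `transformers` library. The model id `TucanoBR/Tucano-160m` is an assumption drawn from the TucanoBR collection; any other Tucano checkpoint should work the same way.

```python
# Minimal sketch: text generation with a Tucano checkpoint via the
# `transformers` pipeline API. Assumes `transformers` and a backend
# (e.g. PyTorch) are installed and the model can be downloaded.
from transformers import pipeline

# "TucanoBR/Tucano-160m" is the smallest checkpoint in the collection;
# swap in a larger one (e.g. Tucano-2b4) for better quality.
generator = pipeline("text-generation", model="TucanoBR/Tucano-160m")

prompt = "A capital do Brasil é"
outputs = generator(prompt, max_new_tokens=20, do_sample=False)
print(outputs[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) keeps the output deterministic; for more varied completions, enable sampling and set `temperature`/`top_p`.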