nicolay-r posted an update 2 days ago
📢 So far, I have noticed that 🧠 reasoning with LLMs 🤖 tends to be more accurate in English than in other languages.
However, apart from GoogleTrans and other open, transparent translators, I could not find an easy-to-use solution that avoids:
1. 🔴 Installing a third-party framework
2. 🔴 Text chunking
3. 🔴 Lack of support for meta-annotations such as spans, objects, etc.

💎 To cope with the problem of information retrieval (IR) from non-English texts, I am happy to share bulk-translate 0.25.0. 🎊

⭐ https://github.com/nicolay-r/bulk-translate

bulk-translate is a tiny, no-strings-attached Python 🐍 framework for translating series of texts with pre-annotated fixed spans that stay invariant for the translator.
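One general way to keep pre-annotated spans fixed while still letting the translator see the whole sentence is placeholder masking: swap each span for a token, translate once, then put the original spans back. The sketch below illustrates that general technique only; it is not taken from bulk-translate's source, and the function name `translate_with_fixed_spans`, the `<sN>` placeholder format, and the assumption that the translator copies placeholders through unchanged are all hypothetical.

```python
import re
from typing import Callable, List, Union

# Hypothetical representation: a text is a list whose items are plain strings
# or spans; a span is a list holding the span value followed by any ID data.
Part = Union[str, list]

# Placeholder pattern; tolerant of case changes and stray spaces a translator may add.
PLACEHOLDER = re.compile(r"<\s*s(\d+)\s*>", re.IGNORECASE)


def translate_with_fixed_spans(parts: List[Part],
                               translate_fn: Callable[[str], str]) -> List[Part]:
    """Translate plain-string parts in one call while keeping spans unchanged.

    Assumes the input strings never contain "<sN>" themselves and that the
    translator copies each placeholder through to its output.
    """
    spans: List[list] = []
    chunks: List[str] = []
    for part in parts:
        if isinstance(part, str):
            chunks.append(part)
        else:
            # Mask the span so the translator cannot alter or split it.
            chunks.append(f"<s{len(spans)}>")
            spans.append(part)

    translated = translate_fn("".join(chunks))

    # With a capturing group, split() alternates: text, index, text, index, ...
    pieces = PLACEHOLDER.split(translated)
    found = [int(idx) for idx in pieces[1::2]]
    if sorted(found) != list(range(len(spans))):
        raise ValueError("translator did not preserve every span placeholder")

    result: List[Part] = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            # Restore the original span object (the translator may reorder spans).
            result.append(spans[int(piece)])
        elif piece:
            result.append(piece)
    return result


# Usage with a stand-in "translator" that just upper-cases its input.
print(translate_with_fixed_spans(["the cat met ", ["Alice", "PER-1"], " today"], str.upper))
# -> ['THE CAT MET ', ['Alice', 'PER-1'], ' TODAY']
```

An alternative design is to translate each plain-string segment separately, which needs no placeholders but deprives the translator of the sentence context around each span.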

It provides a 👨‍💻 Python 🐍 API for quick data translation, with optionally annotated objects in texts (see figure below).
I have made it as accessible as possible for downstream RAG and/or LLM-powered apps:
📘 https://github.com/nicolay-r/bulk-translate/wiki

All you have to do is provide an iterator of texts, where each text is either:
1. ✅ a string object, or
2. ✅ a list of strings and nested lists that represent spans (value + any ID data).
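To make the two accepted shapes concrete, here is a small sketch of such an iterator in plain Python. The example sentences, the span ID data (`"LOC", 3`), and the helpers `example_texts`, `is_valid_text`, and `plain_text` are illustrative assumptions rather than part of bulk-translate; the wiki documents the actual API.

```python
from typing import Iterator, Union

# A text is either a plain string or a list mixing strings and spans,
# where a span is a nested list: [value, *id_data].
Text = Union[str, list]


def example_texts() -> Iterator[Text]:
    """Hypothetical input stream; a generator lets texts be produced lazily."""
    # 1. A plain string object.
    yield "Das ist ein einfacher Satz."
    # 2. A list of strings plus one span carrying made-up ID data ("LOC", 3).
    yield ["Der Vertrag wurde in ", ["Berlin", "LOC", 3], " unterzeichnet."]


def is_valid_text(item: object) -> bool:
    """Check that `item` has one of the two shapes described above."""
    if isinstance(item, str):
        return True
    if not isinstance(item, list):
        return False
    for part in item:
        if isinstance(part, str):
            continue
        # Assumed: a span is a non-empty list whose first element is its text value.
        if not (isinstance(part, list) and part and isinstance(part[0], str)):
            return False
    return True


def plain_text(item: Text) -> str:
    """Concatenate strings and span values into the text a reader would see."""
    if isinstance(item, str):
        return item
    return "".join(p if isinstance(p, str) else p[0] for p in item)


for text in example_texts():
    assert is_valid_text(text)
    print(plain_text(text))
```

Because the input is just an iterator, texts can be streamed from a file or database without loading the whole corpus into memory.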

🤖 By default, I provide a wrapper over googletrans, which you can override with your own 🔥:
https://github.com/nicolay-r/bulk-translate/blob/master/models/googletrans_310a.py
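The exact interface bulk-translate expects from a custom model is defined in the linked file and the wiki; the sketch below does not reproduce it. It only illustrates the general idea of a swappable translation backend: `Translator`, `DictTranslator`, and `translate_all` are hypothetical names, and the toy dictionary backend stands in for googletrans so the example runs offline.

```python
from typing import Dict, Iterable, Iterator, Protocol


class Translator(Protocol):
    """Hypothetical interface: translate one string from `src` to `dest`."""

    def translate(self, text: str, src: str, dest: str) -> str: ...


class DictTranslator:
    """Toy offline backend that swaps whole words using a fixed dictionary.

    Note: split()/join() collapses runs of whitespace into single spaces.
    """

    def __init__(self, vocab: Dict[str, str]) -> None:
        self.vocab = vocab

    def translate(self, text: str, src: str, dest: str) -> str:
        # Unknown words are passed through unchanged.
        return " ".join(self.vocab.get(word, word) for word in text.split())


def translate_all(texts: Iterable[str], translator: Translator,
                  src: str = "auto", dest: str = "en") -> Iterator[str]:
    """Lazily translate a stream of strings with whichever backend is passed in."""
    for text in texts:
        yield translator.translate(text, src, dest)


backend = DictTranslator({"Hallo": "Hello", "Welt": "world"})
print(list(translate_all(["Hallo Welt"], backend, src="de")))
# -> ['Hello world']
```

Using a `Protocol` means any object with a matching `translate` method fits, without inheriting from a base class; a real backend would wrap its own client call (googletrans, an LLM, etc.) behind that method.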