---
title: Beta
emoji: 🐢
colorFrom: blue
colorTo: yellow
sdk: static
pinned: true
license: mpl-2.0
short_description: চা খাবা?
---
<div align="center">

<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67497128927b345d1345e9de/69fZeWPoXB20L7do9nZDY.png" width="300" alt="Shôbdhonic Logo">

# শব্দনিক | Shôbdhonic

### **A New Era of Bangla NLP**

*"Know the language, know AI!"*
*(Unlock Bangla's Future with AI)*

[Website](https://shobdhonic.com) •
[Discord](https://discord.gg/shobdhonic) •
[Twitter](https://twitter.com/Shobdhonic) •
[Telegram](https://t.me/Shobdhonic) •
[GitHub](https://github.com/Shobdhonic) •
[Hugging Face](https://huggingface.co/Shobdhonic)

</div>

---
## 🚀 **Why Shôbdhonic?**

A **next-gen Bangla NLP platform** built for:

- 🔥 **Gen-Z Creators**: Meme generators, slang translators, TikTok/Reels integrations
- 🏢 **Enterprises**: Sentiment analysis, fraud detection, document processing
- 🇧🇩 **Cultural Preservation**: Digitize literature, dialects, and oral histories
- 🧠 **Research**: Advanced Bangla language models, transformer architectures, and fine-tuning pipelines
- 🌐 **Web3**: Blockchain integration for digital Bangla content authentication

---
## ✨ **Key Features**

| **Category** | **Tools** |
|-----------------------|------------------------------------------------------------------------------------|
| **Gen-Z Playground** | `MemeGPT` • `Slang Translator` • `AI Rap Generator` • `Voice Filters` • `TikTok Content API` |
| **Enterprise NLP** | `Legal Doc Analyzer` • `News Sentiment API` • `Plagiarism Checker` • `Customer Service Bot` • `Bangla Data OCR` |
| **Voice Lab** | `Celebrity Voice Cloning` • `Regional Accent TTS` • `Audio Transcription` • `Dialect Analysis` • `Emotion Detection` |
| **Real-Time AI** | `Trend Predictor` • `Social Media Pulse` • `Ittefaq News Scanner` • `Market Sentiment Analysis` • `Election Opinion Tracker` |
| **Academia** | `Literature Analysis` • `Academic Paper Assistant` • `Educational Content Generator` • `Bangla Research Corpus` |
| **Security Suite** | `Bangla Fraud Detection` • `Phishing Text Analysis` • `Disinformation Tracker` • `Financial Alert System` |

---
## 🎯 **Core Technologies**

### **Model Architecture**

- **ShobdhoBERT**: Transformer-based model trained on a 5 TB Bangla text corpus
- **ShobdhoGPT-3.5**: GPT-based generative model fine-tuned on diverse Bangla content
- **DialectDiffusion**: Voice synthesis specialized for regional Bangla dialects
- **BanglaLLM-7B**: Large language model optimized for Bangla instruction following
- **Multimodal-Bangla**: Vision-language model for Bangla image-text understanding

### **Data Processing Pipeline**

- Proprietary text normalization for Bangla script variations
- Context-aware slang detection and interpretation
- Real-time news corpus analysis with automated categorization
- Specialized tokenization for Bangla script with compound word handling
- Sentiment analysis tuned for cultural nuances
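
The normalization step above is described as proprietary, so the following is only a minimal sketch of what a first pass over Bangla script variations might look like, using the standard library alone. The function name `normalize_bangla` and the `strip_zero_width` flag are hypothetical and not part of any Shôbdhonic API. Zero-width characters are left in place by default because removing them can change how some conjuncts render.

```python
import unicodedata

# Zero-width non-joiner / joiner, which appear inconsistently in Bangla text.
ZERO_WIDTH = {"\u200c", "\u200d"}


def normalize_bangla(text: str, strip_zero_width: bool = False) -> str:
    """Illustrative normalizer: NFC form plus collapsed whitespace."""
    # NFC composes split vowel signs, e.g. U+09C7 + U+09BE -> U+09CB.
    text = unicodedata.normalize("NFC", text)
    if strip_zero_width:
        text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    # Collapse any run of whitespace into a single space.
    return " ".join(text.split())


# The same syllable typed two ways: ka + e-sign + aa-sign vs. ka + o-sign.
split_form = "\u0995\u09c7\u09be"
composed_form = "\u0995\u09cb"
print(normalize_bangla(split_form) == composed_form)
```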

---
## 🎨 **Brand Identity**

### **Colors**

| Role | Hex |
|---------------|-----------|
| Primary | `#6A5ACD` |
| Secondary | `#FF69B4` |
| Accent | `#00FFE0` |
| Dark Mode | `#1A1A2E` |
| Light Mode | `#F5F5F7` |

### **Mascot**

**বর্গী বট (Borgi Bot)**: Our street-smart AI mascot for Gen-Z campaigns.

---
## ⚡ **Quick Start**

### **Prerequisites**

- Python 3.10+ / Node.js 18+
- Hugging Face API key (register [here](https://huggingface.co/Shobdhonic))
- Docker (optional, for containerized deployment)
- GPU acceleration (recommended for model training/inference)
### **Installation**

```bash
# Clone repo
git clone https://github.com/Shobdhonic/core-engine.git
cd core-engine

# Create virtual environment
python -m venv shobdhonic-env
source shobdhonic-env/bin/activate  # On Windows: shobdhonic-env\Scripts\activate

# Install dependencies (Python)
pip install -r requirements.txt

# Or for Node.js
npm install

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
```
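
Once the key is in `.env`, application code still has to read it. Here is a small sketch of one way to do that, assuming the key is exported under a variable named `SHOBDHONIC_API_KEY`. That name is a guess, so check `.env.example` for the real one. Plain Python does not load `.env` files by itself, so the variable must already be in the process environment, for example via `docker run --env-file .env` or a shell `export`.

```python
import os


def load_api_key(var_name: str = "SHOBDHONIC_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is unset."""
    # var_name is a hypothetical default; use whatever .env.example defines.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to .env or export it")
    return key
```

Passing `api_key=load_api_key()` in place of the `"your_api_key_here"` placeholders keeps the secret out of source control.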
### **Docker Setup**

```bash
# Build the Docker image
docker build -t shobdhonic:latest .

# Run the container (the quotes keep paths with spaces intact)
docker run -p 8000:8000 -v "$(pwd)":/app --env-file .env shobdhonic:latest
```
### **Generate Your First Meme**

```python
from shobdhonic import MemeMaster

# Initialize with your API key
meme_api = MemeMaster(api_key="your_api_key_here")

# Create a meme with custom text and template
meme = meme_api.create(
    text="একটা চা আর হয়না? ☕",
    template="cha_kaku",
    style="viral",  # Options: viral, minimal, dramatic, retro
    font="bangla_classic",
    format="jpg"  # Options: jpg, png, gif, mp4
)

# Save the meme
meme.download("output/cha_kaku_meme.jpg")

# Share directly to social media
meme.share(platform="facebook")  # Options: facebook, twitter, instagram, whatsapp
```
### **Advanced Voice Cloning**

```python
from shobdhonic import VoiceForge
import numpy as np

# Initialize voice engine
voice_api = VoiceForge(api_key="your_api_key_here")

# Clone a voice with emotion parameters
voice = voice_api.clone(
    target_voice="bappa_sir",  # Popular Bangla YouTuber
    text="ভাই, লাইক আর সাবস্ক্রাইব মনে হয়না!",
    emotion="excited",  # Options: neutral, sad, excited, angry, persuasive
    dialect="dhaka",  # Options: dhaka, chittagong, sylhet, rajshahi, khulna, barishal
    speed=1.2,  # Playback speed multiplier (0.5 - 2.0)
    pitch_shift=0.3  # Adjust pitch (-1.0 to 1.0)
)

# Play the generated audio
voice.play()

# Save to file
voice.save("output/bappa_youtube_promo.mp3")

# Get waveform data for further processing
waveform = voice.get_waveform()
frequencies = np.fft.fft(waveform)
```
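
The option lists and numeric ranges in the comments above can be checked locally before making a request. The sketch below assumes those comments are the complete set of accepted values. `VoiceParams` is a hypothetical helper, not part of the `shobdhonic` package.

```python
from dataclasses import dataclass

# Allowed values copied from the comments in the voice cloning example.
EMOTIONS = {"neutral", "sad", "excited", "angry", "persuasive"}
DIALECTS = {"dhaka", "chittagong", "sylhet", "rajshahi", "khulna", "barishal"}


@dataclass(frozen=True)
class VoiceParams:
    """Hypothetical container that validates clone() arguments up front."""
    emotion: str = "neutral"
    dialect: str = "dhaka"
    speed: float = 1.0
    pitch_shift: float = 0.0

    def __post_init__(self) -> None:
        if self.emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if self.dialect not in DIALECTS:
            raise ValueError(f"unknown dialect: {self.dialect!r}")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if not -1.0 <= self.pitch_shift <= 1.0:
            raise ValueError("pitch_shift must be between -1.0 and 1.0")


params = VoiceParams(emotion="excited", dialect="dhaka", speed=1.2, pitch_shift=0.3)
print(params)
```

Failing fast on a typo such as `dialect="dhakka"` avoids spending an API call on a request that would be rejected anyway.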
### **News Sentiment Analysis**

```python
from shobdhonic import NewsAnalyzer
import pandas as pd
import matplotlib.pyplot as plt

# Initialize news analyzer
news_api = NewsAnalyzer(api_key="your_api_key_here")

# Analyze recent articles
results = news_api.analyze(
    source="prothom_alo",  # Options: prothom_alo, ittefaq, bangla_tribune, bbc_bangla
    category="politics",  # Options: politics, business, sports, entertainment, tech
    date_range="last_7_days",  # Options: today, last_24h, last_7_days, last_30_days, custom
    sample_size=100  # Number of articles to analyze
)

# Get sentiment breakdown
sentiment_df = pd.DataFrame(results.sentiment_data)

# Plot results
plt.figure(figsize=(10, 6))
plt.bar(sentiment_df['sentiment'], sentiment_df['percentage'])
plt.title('Political News Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Percentage (%)')
plt.savefig('output/sentiment_analysis.png')
```
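
The plotting code reads `sentiment` and `percentage` columns from `results.sentiment_data`, but the shape of that field is not documented here. As a rough illustration only, the sketch below shows how per-article labels could be reduced to rows with those two keys. The function name and the row layout are assumptions.

```python
from collections import Counter


def sentiment_breakdown(labels):
    """Turn per-article sentiment labels into percentage rows (illustrative)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return []
    # One row per label, most frequent first, rounded to one decimal place.
    return [
        {"sentiment": label, "percentage": round(100 * n / total, 1)}
        for label, n in counts.most_common()
    ]


rows = sentiment_breakdown(["positive"] * 5 + ["negative"] * 3 + ["neutral"] * 2)
print(rows)
```

A list of dictionaries like this can be passed straight to `pd.DataFrame(...)`, which matches how the example above consumes the data.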
### **Enterprise Document Processing**

```python
from shobdhonic import DocumentProcessor
from shobdhonic.security import SensitiveDataDetector

# Initialize document processor
doc_api = DocumentProcessor(api_key="your_api_key_here")

# Process legal document
processed_doc = doc_api.process(
    file_path="contracts/agreement.pdf",
    tasks=[
        "summarize",  # Create executive summary
        "extract_entities",  # Find people, organizations, dates
        "identify_clauses",  # Detect important legal clauses
        "risk_assessment"  # Flag potentially problematic terms
    ],
    output_format="json"
)

# Check for sensitive information
sensitive_detector = SensitiveDataDetector()
security_scan = sensitive_detector.scan(processed_doc.raw_text)
if security_scan.has_sensitive_data:
    print(f"WARNING: Found {len(security_scan.findings)} instances of sensitive data")
    for finding in security_scan.findings:
        print(f"- {finding.type}: {finding.severity} risk level")

# Export processed results
processed_doc.export(
    output_path="output/processed_contract.json",
    include_metadata=True,
    redact_sensitive=True
)
```
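
How `redact_sensitive=True` works internally is not described, so the following is only a toy illustration of the general idea: find spans that look sensitive, record them, and replace them with a marker. The two regular expressions are assumptions (a loose email pattern and an 11-digit Bangladeshi mobile pattern with an optional `+88` prefix) and are far from a complete detector. The example prints the redacted text together with the two findings.

```python
import re

# Loose, illustrative patterns; a real detector would need many more rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumed format: 01XXXXXXXXX with an optional +88 / 88 country prefix.
PHONE_RE = re.compile(r"(?:\+?88)?01[3-9]\d{8}")


def redact(text: str):
    """Return (redacted_text, findings) where findings are (kind, match) pairs."""
    findings = []

    def mask(kind):
        def replace(match):
            findings.append((kind, match.group(0)))
            return f"[{kind.upper()} REDACTED]"
        return replace

    text = EMAIL_RE.sub(mask("email"), text)
    text = PHONE_RE.sub(mask("phone"), text)
    return text, findings


clean, found = redact("Call 01712345678 or mail a.b@example.com")
print(clean)
print(found)
```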

---
## 🔋 **Core Modules**

### **Text Processing**

- `shobdhonic.tokenizer`: Advanced Bangla tokenization
- `shobdhonic.transformer`: Pre-trained transformer models
- `shobdhonic.nlp`: Natural language processing utilities
- `shobdhonic.generator`: Text generation capabilities
- `shobdhonic.translator`: Cross-language translation services

### **Audio & Speech**

- `shobdhonic.voice`: Text-to-speech and speech-to-text
- `shobdhonic.audio`: Audio processing utilities
- `shobdhonic.dialect`: Regional dialect processing

### **Media & Content**

- `shobdhonic.meme`: Meme generation engine
- `shobdhonic.social`: Social media integration
- `shobdhonic.content`: Content creation assistants
- `shobdhonic.video`: Video generation and editing

### **Analysis & Intelligence**

- `shobdhonic.sentiment`: Sentiment analysis tools
- `shobdhonic.analytics`: Usage statistics and reporting
- `shobdhonic.trends`: Trend detection and prediction

### **Security & Enterprise**

- `shobdhonic.security`: Security and compliance tools
- `shobdhonic.enterprise`: Enterprise integration utilities
- `shobdhonic.docs`: Document processing pipeline

---
## 📈 **Performance Benchmarks**

| **Task** | **Shôbdhonic** | **Other Bangla NLP** | **Improvement** |
|------------------------------|-----------------|----------------------|-----------------|
| Text Classification | 94.7% | 88.2% | +6.5 pp |
| Named Entity Recognition | 92.3% | 85.9% | +6.4 pp |
| Sentiment Analysis | 89.8% | 81.3% | +8.5 pp |
| Question Answering | 87.6% | 79.1% | +8.5 pp |
| Text Generation (BLEU) | 0.731 | 0.658 | +11.1% (relative) |
| Speech Recognition (WER) | 6.4% | 11.7% | -5.3 pp (lower is better) |
| Text-to-Speech (MOS) | 4.52/5 | 3.87/5 | +16.8% (relative) |

*pp = percentage points (absolute difference); the BLEU and MOS gains are relative to the baseline score. Benchmarks were conducted on standard Bangla test sets with industry-standard metrics. The full methodology is available in our [technical paper](https://shobdhonic.com/research/benchmarks).*
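
The Improvement column mixes two kinds of arithmetic: accuracy and WER rows report an absolute difference in percentage points, while the BLEU and MOS rows report a gain relative to the baseline. The short sketch below reproduces the figures from the table. The helper names are illustrative.

```python
def absolute_gain(ours: float, baseline: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(ours - baseline, 1)


def relative_gain(ours: float, baseline: float) -> float:
    """Gain as a percentage of the baseline, rounded to one decimal place."""
    return round(100 * (ours - baseline) / baseline, 1)


print(absolute_gain(94.7, 88.2))    # Text Classification
print(absolute_gain(6.4, 11.7))     # Speech Recognition (WER, lower is better)
print(relative_gain(0.731, 0.658))  # Text Generation (BLEU)
print(relative_gain(4.52, 3.87))    # Text-to-Speech (MOS)
```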

---
## 📊 **Enterprise Solutions**

<div align="center">
<a href="https://shobdhonic.com/enterprise">
<img src="https://img.shields.io/badge/Shobdhonic_Enterprise-Get_Custom_Solutions-f42a41?style=for-the-badge&logo=gitlab">
</a>
</div>

### **Banking & Finance**

- Fraud detection in Bangla SMS/call transcripts
- Customer support automation
- Financial document processing
- Transaction pattern analysis
- Risk assessment NLP

### **Media & Publishing**

- Auto-summarize news articles from Prothom Alo/Ittefaq
- Content recommendation engines
- Automated content tagging
- Engagement prediction
- Toxic comment filtering

### **Education**

- Essay grading and feedback
- Personalized learning content
- Question generation from textbooks
- Academic plagiarism detection
- Educational chatbots in Bangla

### **Government & NGOs**

- Citizen feedback analysis
- Service request categorization
- Policy document processing
- Public sentiment monitoring
- Disinformation detection

---
## 💻 **API Integration**

### **REST API Example**

```javascript
// Using fetch in JavaScript
const fetchMeme = async () => {
  const response = await fetch('https://api.shobdhonic.com/v1/create-meme', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_API_KEY'
    },
    body: JSON.stringify({
      text: 'পরীক্ষার রেজাল্ট দেখার পর আমি',
      template: 'sad_pepe',
      format: 'jpg'
    })
  });
  // fetch only rejects on network failure, so check the HTTP status explicitly
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.meme_url;
};

// Call the function
fetchMeme()
  .then(url => {
    document.getElementById('meme-image').src = url;
  })
  .catch(console.error);
```
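
For callers who prefer Python without the SDK, the same request can be assembled with the standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON body and bearer header shown in the JavaScript example. `build_meme_request` is a hypothetical helper, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Endpoint taken from the JavaScript example above.
API_URL = "https://api.shobdhonic.com/v1/create-meme"


def build_meme_request(api_key: str, text: str, template: str, fmt: str = "jpg"):
    """Build (but do not send) a POST request for the meme endpoint."""
    body = json.dumps(
        {"text": text, "template": template, "format": fmt},
        ensure_ascii=False,  # keep Bangla text readable in the payload
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_meme_request("YOUR_API_KEY", "পরীক্ষার রেজাল্ট দেখার পর আমি", "sad_pepe")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response.
```

Separating request construction from sending makes the payload easy to inspect or unit test without network access.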
### **Python SDK Example**

```python
from shobdhonic import ShobdhonicClient
import asyncio


async def main():
    # Initialize client
    client = ShobdhonicClient(api_key="YOUR_API_KEY")

    # Use the sentiment analysis API
    result = await client.analyze_sentiment(
        text="এই সিনেমাটা দেখে আমি খুবই মুগ্ধ হয়েছি।",
        detailed=True
    )
    print(f"Overall sentiment: {result.sentiment}")
    print(f"Confidence score: {result.confidence:.2f}")
    print(f"Emotional breakdown: {result.emotions}")

    # Use the translation API
    translation = await client.translate(
        text="আমি বাংলায় কথা বলতে পারি।",
        target_language="en"
    )
    print(f"Translation: {translation.text}")
    print(f"Source language detected: {translation.source_language}")


# Run the async function
asyncio.run(main())
```
### **Webhook Integration**

```python
from flask import Flask, request, jsonify
import hmac
import hashlib

app = Flask(__name__)


@app.route('/webhook/shobdhonic', methods=['POST'])
def shobdhonic_webhook():
    # Verify the webhook signature; a missing header is treated as invalid
    signature = request.headers.get('X-Shobdhonic-Signature', '')
    secret = 'your_webhook_secret'
    computed_signature = hmac.new(
        secret.encode('utf-8'),
        request.data,
        hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, computed_signature):
        return jsonify({'error': 'Invalid signature'}), 401

    # Process the webhook data (fall back to an empty dict on a non-JSON body)
    data = request.get_json(silent=True) or {}
    event_type = data.get('event_type')
    if event_type == 'sentiment_alert':
        handle_sentiment_alert(data)
    elif event_type == 'content_moderation':
        handle_content_moderation(data)
    elif event_type == 'trend_detected':
        handle_trend_detection(data)
    return jsonify({'status': 'success'}), 200


def handle_sentiment_alert(data):
    # Process sentiment alerts
    pass


def handle_content_moderation(data):
    # Process content moderation events
    pass


def handle_trend_detection(data):
    # Process trend detection events
    pass


if __name__ == '__main__':
    # debug=True is for local development only
    app.run(debug=True, port=5000)
```
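
To exercise the handler locally, a test client needs to produce the same signature the server computes. The sketch below assumes, as the handler does, that the signature is the hex HMAC-SHA256 of the raw request body. The helper names are illustrative, and the example prints `True` for the untouched body and `False` for the tampered one.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body, matching the handler's computation."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, signature) -> bool:
    """Constant-time comparison; a missing signature never verifies."""
    if not signature:
        return False
    return hmac.compare_digest(sign_payload(secret, body), signature)


body = json.dumps({"event_type": "trend_detected"}).encode("utf-8")
signature = sign_payload("your_webhook_secret", body)
print(verify_signature("your_webhook_secret", body, signature))
print(verify_signature("your_webhook_secret", body + b" ", signature))
```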

---
## 🧩 **Project Structure**

```
shobdhonic/
├── api/                # API endpoints
├── cli/                # Command-line tools
├── core/               # Core functionality
│   ├── models/         # ML models
│   ├── processors/     # Text processors
│   ├── tokenizers/     # Bangla tokenizers
│   └── vectors/        # Word embeddings
├── data/               # Data handling
│   ├── corpus/         # Text corpora
│   ├── loaders/        # Data loaders
│   └── scrapers/       # Web scrapers
├── media/              # Media generation
│   ├── audio/          # Audio processing
│   ├── images/         # Image generation
│   └── video/          # Video processing
├── security/           # Security tools
├── services/           # External services
├── ui/                 # User interfaces
│   ├── web/            # Web interface
│   ├── mobile/         # Mobile interface
│   └── widgets/        # Embeddable widgets
├── utils/              # Utility functions
└── tests/              # Test suite
```

---
## 🛠️ **Development Workflow**

### **Setting Up Development Environment**

```bash
# Clone the development repository
git clone https://github.com/Shobdhonic/shobdhonic-dev.git
cd shobdhonic-dev

# Create development environment
python -m venv dev-env
source dev-env/bin/activate

# Install development dependencies
pip install -r requirements-dev.txt

# Set up pre-commit hooks
pre-commit install
```

### **Running Tests**

```bash
# Run all tests
pytest

# Run specific test category
pytest tests/test_tokenizers.py

# Run with coverage report
pytest --cov=shobdhonic --cov-report=html
```

### **Building Documentation**

```bash
# Generate API documentation
cd docs
make html

# View documentation
python -m http.server -d _build/html
```

### **CI/CD Pipeline**

Our continuous integration and deployment pipeline automatically:

1. Runs tests on all pull requests
2. Performs code quality checks
3. Builds and publishes packages on releases
4. Deploys to staging/production environments
5. Updates documentation site

---
## 🤝 **Contribute to Bangla AI**

We welcome contributions from the community! Here's how to get started:

1. **Fork the Repository**: [GitHub/Shobdhonic](https://github.com/Shobdhonic)
2. **Pick an Issue**: Look for issues labeled `good-first-issue`, `help-wanted`, or `Gen-Z feature`
3. **Set Up Your Environment**: Follow the development setup instructions above
4. **Make Your Changes**: Write code and tests for your feature or fix
5. **Submit a Pull Request**: Follow our [Contribution Guidelines](CONTRIBUTING.md)

### **Areas We Need Help With**

- 🧠 **Model Training**: Fine-tuning transformers on Bangla data
- 🎮 **Gen-Z Features**: Cultural memes, slang translators, social integrations
- 📱 **Mobile Development**: React Native components for our SDK
- 🔊 **Voice Data**: Collection and processing of regional dialects
- 📚 **Documentation**: Tutorials, examples, and API documentation

### **Contributor Code of Conduct**

All contributors are expected to adhere to our [Code of Conduct](CODE_OF_CONDUCT.md), which promotes a welcoming, inclusive, and harassment-free experience for everyone.

---
## 📒 **Documentation**

### **API Reference**

Complete API documentation is available at [docs.shobdhonic.com](https://docs.shobdhonic.com).

### **Tutorials**

Step-by-step tutorials for common tasks:

- [Getting Started with Shôbdhonic](https://docs.shobdhonic.com/tutorials/getting-started)
- [Building a Bangla Chatbot](https://docs.shobdhonic.com/tutorials/chatbot)
- [Voice Cloning Basics](https://docs.shobdhonic.com/tutorials/voice-cloning)
- [Meme Generation](https://docs.shobdhonic.com/tutorials/meme-gen)
- [Enterprise Document Processing](https://docs.shobdhonic.com/tutorials/document-processing)

### **Examples**

Explore our [examples directory](https://github.com/Shobdhonic/examples) for complete code samples:

- Basic NLP tasks (tokenization, classification, etc.)
- Voice synthesis and analysis
- Media generation workflows
- Enterprise integration patterns
- Web and mobile application samples

---
## 📜 **License & Ethics**

```text
MIT License | © 2024 Shôbdhonic
*Bangla Data Ethics Pledge:*
- No misuse of dialects/regional languages
- Cite sources like Ittefaq/Prothom Alo
- Free access for academic research and non-profits/NGOs
- Respecting privacy and data sovereignty
- Preserving Bangla linguistic diversity
```

### **Ethical AI Commitment**

At Shôbdhonic, we commit to:

- Transparency in our AI systems
- Fairness and bias mitigation
- Protection of user privacy
- Responsible data collection practices
- Supporting cultural preservation
- Making advanced Bangla NLP accessible to all

Our complete AI Ethics Policy is available [here](https://shobdhonic.com/ethics).

---
## 🧪 **Research**

Our team publishes open research on Bangla NLP:

- [BanglaTransformers: Pre-training Transformers for Bengali NLP](https://arxiv.org/abs/xxxx.xxxxx)
- [Dialect-Aware Speech Synthesis for Low-Resource Languages](https://arxiv.org/abs/xxxx.xxxxx)
- [BanglaEval: Benchmarking NLP Systems for Bengali](https://arxiv.org/abs/xxxx.xxxxx)

Interested in research collaboration? Contact us at [email protected]

---
## 🌐 **Connect**

<div align="center">

[Hugging Face](https://huggingface.co/Shobdhonic) •
[YouTube](https://youtube.com/Shobdhonic) •
[LinkedIn](https://linkedin.com/company/Shobdhonic) •
[Medium](https://medium.com/Shobdhonic) •
[Discord](https://discord.gg/shobdhonic)

</div>

---

<div align="center">

**The great battle for the Bangla language: we are ready!**

*Powered by Bangla in our blood, Shôbdhonic in our technology*

</div>