nyuuzyou PRO

nyuuzyou

AI & ML interests

None yet

Recent Activity

Organizations

Social Post Explorers, AI Starter Pack

nyuuzyou's activity

posted an update 3 days ago
🌐 Public MediaWiki Collection Dataset - nyuuzyou/wikis

Collection of 1.66M+ articles from 930 public MediaWiki instances featuring:

- Full article content from diverse public wikis across the internet
- Complete metadata including templates, categories, and section structure
- Rich structural information preserving wiki organization and links
- Multilingual content across 35+ languages including English, Chinese, Spanish, and more
- Regional language variants including US/UK English, Brazilian Portuguese, and Traditional/Simplified Chinese

Key contents:
- 1,662,448 wiki articles with full text
- Extensive metadata including templates, categories, sections
- Internal wikilinks and external reference information
- Cross-domain knowledge spanning multiple topics and fields
posted an update 6 days ago
📚 Historical Russian Technical Journal Images Dataset - nyuuzyou/journals

Collection of digitized pages from vintage Russian technical journals featuring:

- 7.47k high-quality images
- Machine-generated descriptions in Russian
- Valuable historical technical content for image-to-text applications

Content descriptions are dedicated to the public domain under the CC0 1.0 license, allowing unrestricted use without attribution.
New activity in nyuuzyou/journals 7 days ago
posted an update 7 days ago
🌐 Grustnogram Social Media Dataset - nyuuzyou/grustnogram

A comprehensive collection of 597K posts from Grustnogram.ru featuring:

- 597K social media posts with full text and image content (all images are black and white)
- Rich metadata including user IDs, post interactions (likes, comments)
- Content from anonymous text-only posts
- Approximately 278.9 GB of content

Content is dedicated to the public domain under the CC0 1.0 license, allowing unrestricted reuse without attribution or share-alike requirements.
reacted to ngxson's post with 🚀 7 days ago
A comprehensive matrix of which format you should use.

Read more on my blog post: https://huggingface.co/blog/ngxson/common-ai-model-formats

| Hardware        | GGUF      | PyTorch                | Safetensors              | ONNX  |
|-----------------|-----------|------------------------|--------------------------|-------|
| CPU             | ✅ (best) | 🟡                     | 🟡                       | ✅    |
| GPU             | ✅        | ✅                     | ✅                       | ✅    |
| Mobile          | ✅        | 🟡 (via executorch)    | ❌                       | ✅    |
| Apple silicon   | ✅        | 🟡                     | ✅ (via MLX framework)   | ✅    |
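
As a rough illustration, the matrix above can be transcribed into a small lookup structure. This is only a sketch: the names `Support`, `MATRIX`, and `formats_for` are hypothetical, and reading the yellow marker as "partial support" is an assumption, since the post itself does not define it.

```python
from enum import Enum


class Support(Enum):
    YES = "supported"
    PARTIAL = "partial"  # assumption: our reading of the yellow marker
    NO = "unsupported"


# Cells transcribed from the table; the second tuple item is the
# parenthetical note from the cell, or "" when there is none.
MATRIX = {
    "CPU": {
        "GGUF": (Support.YES, "best"),
        "PyTorch": (Support.PARTIAL, ""),
        "Safetensors": (Support.PARTIAL, ""),
        "ONNX": (Support.YES, ""),
    },
    "GPU": {
        "GGUF": (Support.YES, ""),
        "PyTorch": (Support.YES, ""),
        "Safetensors": (Support.YES, ""),
        "ONNX": (Support.YES, ""),
    },
    "Mobile": {
        "GGUF": (Support.YES, ""),
        "PyTorch": (Support.PARTIAL, "via executorch"),
        "Safetensors": (Support.NO, ""),
        "ONNX": (Support.YES, ""),
    },
    "Apple silicon": {
        "GGUF": (Support.YES, ""),
        "PyTorch": (Support.PARTIAL, ""),
        "Safetensors": (Support.YES, "via MLX framework"),
        "ONNX": (Support.YES, ""),
    },
}


def formats_for(hardware, include_partial=False):
    """Return the formats listed for a hardware row, in table column order."""
    allowed = {Support.YES}
    if include_partial:
        allowed.add(Support.PARTIAL)
    return [fmt for fmt, (level, _note) in MATRIX[hardware].items() if level in allowed]


if __name__ == "__main__":
    for hardware in MATRIX:
        print(hardware, "->", formats_for(hardware))
```

Calling `formats_for("Mobile")` returns only the fully supported formats for that row; passing `include_partial=True` also includes the cells marked with the yellow marker.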
posted an update 9 days ago
🛫 AEX.ru Aviation News Dataset - nyuuzyou/aex

Key contents:
- 249,149 aviation news articles with full text
- Metadata including tags, image captions, and attributions
- URL information for reference
- Russian language content focusing on aviation topics
reacted to stefan-it's post with 👍 10 days ago
She arrived 😍

[Expect more models soon...]
reacted to fdaudens's post with ❤️ 14 days ago
posted an update 14 days ago
🌐 Fandom.com Community Dataset - nyuuzyou/fandom

A comprehensive collection of 7.04M wiki pages from Fandom.com communities featuring:
- Full article content and metadata from current pages
- Rich structural data including templates, categories, and links
- Multilingual content across 40+ languages
- Complete metadata including titles and section structure

Content is available under the CC-BY-SA 3.0 license, allowing reuse with attribution and share-alike requirements.

Key contents:
- 7.04M wiki articles with full text
- Metadata including templates, categories, sections
- Internal and external link information
- Multi-language support including major world languages

The dataset provides a valuable resource for:
- Text generation and classification tasks
- Topic modeling and categorization
- Cross-language information retrieval
- Wiki structure analysis

All content comes from public Fandom.com community wikis as of February 2025 and maintains original CC-BY-SA 3.0 licensing.