
jxm/gpt-oss-20b-base · Text Generation · 21B params
I also have a hypothesis that this model can be downsized efficiently not by pruning experts, but by using merges and LoRAs to shrink each expert's unique parameter count. The merged weights would carry most of what the experts share, and the routing table wouldn't need to change.
I'm building a new version of my pipeline to test this hypothesis. I suspect it'd let us keep most of the performance in under 12B parameters.
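For concreteness, here's a minimal sketch of that decomposition in PyTorch. Everything in it is illustrative (the function names, the rank, using a simple mean as the merge), not the actual pipeline: each expert weight is collapsed into one shared merged matrix plus a LoRA-style low-rank delta, recovered via truncated SVD of the residual.

```python
import torch

def merge_experts_to_lora(expert_weights, rank=16):
    """Collapse per-expert weight matrices into one shared ("merged")
    matrix plus a low-rank (LoRA-style) delta per expert.

    expert_weights: list of [out, in] tensors, one per expert.
    Returns (shared, deltas), where each delta is an (A, B) pair such
    that shared + B @ A approximates the original expert weight.
    """
    stacked = torch.stack(expert_weights)           # [E, out, in]
    shared = stacked.mean(dim=0)                    # the "merge"
    deltas = []
    for w in expert_weights:
        # Truncated SVD of the residual: the best rank-r approximation
        # of what makes this expert unique.
        u, s, vh = torch.linalg.svd(w - shared, full_matrices=False)
        B = u[:, :rank] * s[:rank]                  # [out, r]
        A = vh[:rank, :]                            # [r, in]
        deltas.append((A, B))
    return shared, deltas

def expert_forward(x, shared, delta):
    """Forward pass for one routed token; routing itself is unchanged."""
    A, B = delta
    return x @ (shared + B @ A).T
```

The router still picks experts by index as before; only the per-expert weights become `shared + B @ A`. With E experts of shape out×in, the expert parameters drop from E·out·in to out·in + E·r·(out + in), which is one way the total could land under 12B.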
This is a very cool release! I really enjoy the ShiningValiant series!
Do you see potential to prune experts or layers from the gpt-oss-20b model to downsize it, and then fine-tune?
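For what "prune experts" would mean mechanically, here's an illustrative sketch (hypothetical names, not gpt-oss internals): drop a subset of experts and mask the router so it only scores the survivors; `top_k=4` mirrors gpt-oss-20b's four active experts per token.

```python
import torch

def route_with_pruned_experts(router_logits, kept_expert_ids, top_k=4):
    """Mask pruned experts out of the router, then route as usual.

    router_logits: [tokens, num_experts] raw router scores.
    kept_expert_ids: indices of the experts retained after pruning.
    """
    mask = torch.full_like(router_logits, float("-inf"))
    mask[:, kept_expert_ids] = 0.0
    logits = router_logits + mask               # pruned experts can't win
    weights, chosen = torch.topk(logits, k=top_k, dim=-1)
    weights = torch.softmax(weights, dim=-1)    # renormalize over top-k
    return weights, chosen
```

The fine-tuning step afterwards would then mainly be about recovering whatever the dropped experts used to handle.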