Primus - a trendmicro-ailab Collection

Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training

Paper • 2502.11191 • Published 7 days ago • 3

Note Start by reading the 🚀Primus Paper! To the best of our knowledge, we are the 🔥 first to release datasets covering cybersecurity pretraining, IFT, and reasoning distillation. Of course, we are also the first to pretrain an LLM on a large-scale cybersecurity corpus.

trendmicro-ailab/Llama-Primus-Base

Text Generation • Updated 1 day ago • 8 • 4

Note Based on Llama-3.1-8B-Instruct, continually pretrained on 2.77B tokens of cybersecurity text, achieving a 🚀15.88% improvement in the aggregated score across multiple cybersecurity benchmarks.

trendmicro-ailab/Llama-Primus-Merged

Text Generation • Updated 1 day ago • 121 • 7

Note Instruct Model! While maintaining nearly the same instruction-following capability as Llama-3.1-8B-Instruct, achieving a 🚀14.84% improvement across multiple cybersecurity benchmarks.

trendmicro-ailab/Llama-Primus-Reasoning

Text Generation • Updated 3 days ago • 8 • 1

Note Distilled on reasoning and reflection data from o1-preview for cybersecurity tasks, achieving a 🚀10% improvement on CISSP.

trendmicro-ailab/Primus-Seed

Viewer • Updated 1 day ago • 174k

Note Includes high-quality cybersecurity texts manually collected from reputable sources such as wikipedia, MITRE, cybersecurity company websites, CTI, and more.

trendmicro-ailab/Primus-FineWeb

Viewer • Updated 2 days ago • 3.39M • 21 • 4

Note Includes 2.57B tokens of cybersecurity texts filtered from FineWeb.

trendmicro-ailab/Primus-Instruct

Viewer • Updated 4 days ago • 835 • 7

Note Includes approximately 1K QA pairs covering common cybersecurity business scenarios.

trendmicro-ailab/Primus-Reasoning

Viewer • Updated 1 day ago • 2.4k • 12 • 4

Note Includes reasoning and reflection data generated by o1-preview on cybersecurity tasks for distillation.