Is the dataset available?

by xzxy - opened Mar 6, 2024

xzxy

Mar 6, 2024

Excellent work! I am working to build one myself for our internal security team (offensive, defensive, and compliance). However, I have yet to find a decent dataset to build from. Do you mind sharing yours? I thought of building one myself by feeding text documents into Mistral and having it output input/output pairs, but a head start on a dataset would be appreciated :)

EtienneDu91

Mar 15, 2024

Hello, good job. I have the same request. Thank you so much.

medmac01

Mar 15, 2024

Hi there! 🤗

Great work on the dataset! Could you share insights into how the data pairs were collected? Also, any plans to release the dataset publicly? I'm currently working on building a cybersecurity chatbot similar to Lily and would find this data incredibly useful. Thanks!

unshadow

Sego Lily Labs org May 16, 2024

Thanks for the comments. I am working on cleaning this dataset so I can release it. I am also in the process of creating a new model and dataset that uses about 3 million pairs.

Maj3Ai

May 23, 2024

Great work! I am also interested in the dataset. Thanks

Gordo-Nation

Jun 12, 2024

Really nice work!!! Is the dataset available? Thank you.

chaithanyasai

Jun 12, 2024

Hi @unshadow,

I appreciate your work on creating the fine-tuned LLM for cyber security. Can you please let me know the size of the dataset used for fine-tuning? Also, is the dataset available, and when are you planning to release it? I appreciate your time and response. Thank you!

P00j4n

Jun 27, 2024

Hey there, I'm new to this and I have been assigned to build a model like this. I downloaded the Lexi Llama 3 uncensored model to test whether it could help, and it did. But how can I fine-tune it on free Google Colab? Which model should I use: Llama 3 directly (but it is censored), or some other model? Also, it would be lovely if you all could guide me through this. Please! I don't really know what the dataset should look like or what to do!

alvaroarr

Aug 23, 2024

Hey, I was also wondering the same thing: is the dataset available? Thank you for creating this tool!

fahadshery

Sep 2, 2024

+1 from me too.

Life moves on, but could you release the dataset as it is? We will do the cleaning for you :)

naveenvuppu

Dec 5, 2024

Hi @unshadow

I really appreciate the work you’ve done on the fine-tuned LLM for cybersecurity—it’s impressive! I was curious about a couple of things:
1. How big was the dataset you used for fine-tuning?
2. Is the dataset available, or are there any plans to release it?

Thanks so much for your time! Looking forward to hearing back.
