Is the dataset available?

#1
by xzxy - opened

Excellent work! I am working on building a model like this myself for our internal security team (offensive, defensive, and compliance). However, I have yet to find a decent dataset to build from. Do you mind sharing yours? I thought of building one myself by feeding text documents into Mistral and having it output input/output pairs, but a head start on a dataset would be appreciated :)
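For anyone wanting to try the document-to-pairs idea above, here is a minimal sketch of what it could look like. It is not the method used for the Lily dataset, which has not been described in this thread. The `generate` callable is a hypothetical stand-in for whatever model call you use (Mistral or otherwise), and the prompt wording, the `input`/`output` field names, and the chunk size are all assumptions.

```python
import json
from typing import Callable, Iterator


def chunk_text(text: str, max_chars: int = 2000) -> Iterator[str]:
    """Group paragraphs into chunks, starting a new chunk when adding
    the next paragraph would exceed max_chars."""
    buf = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if buf and len(buf) + len(para) + 2 > max_chars:
            yield buf
            buf = para
        else:
            buf = f"{buf}\n\n{para}" if buf else para
    if buf:
        yield buf


# Hypothetical prompt; adjust the wording for your own documents.
PROMPT = (
    "Read the passage and write one question a security analyst might ask, "
    "followed by an answer grounded only in the passage. "
    'Reply as JSON: {{"input": "...", "output": "..."}}\n\nPassage:\n{passage}'
)


def build_pairs(
    text: str,
    generate: Callable[[str], str],  # hypothetical: prompt in, model reply out
    max_chars: int = 2000,
) -> list[dict]:
    """Ask the model for one input/output pair per chunk, skipping bad replies."""
    pairs = []
    for chunk in chunk_text(text, max_chars):
        raw = generate(PROMPT.format(passage=chunk))
        try:
            pair = json.loads(raw)
        except json.JSONDecodeError:
            continue  # the model did not return valid JSON; drop this chunk
        if isinstance(pair, dict) and pair.get("input") and pair.get("output"):
            pairs.append({"input": pair["input"], "output": pair["output"]})
    return pairs


def to_jsonl(pairs: list[dict]) -> str:
    """Serialize pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)
```

Each line of the resulting JSONL is one `{"input": ..., "output": ...}` example. Generated pairs would still need manual review and deduplication before being used for fine-tuning.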

Hello, good job. I have the same request. Thank you so much.

Hi there! 🤗

Great work on the dataset! Could you share insights into how the data pairs were collected? Also, any plans to release the dataset publicly? I'm currently working on building a cybersecurity chatbot similar to Lily and would find this data incredibly useful. Thanks!

Sego Lily Labs org

Thanks for the comments. I am working on cleaning this dataset so I can release it. I am also in the process of creating a new model and dataset that uses about 3 million pairs.

Great work! I am also interested in the dataset. Thanks

Really nice work!!! Is the dataset available? Thank you.

Hi @unshadow,

I appreciate your work on creating the fine-tuned LLM for cyber security. Can you please let me know the size of the dataset used for fine-tuning? Also, is the dataset available, and when are you planning to release it? I appreciate your time and response. Thank you!

Hey there, I'm new to this and I have been assigned to build a model like this. I downloaded the Lexi Llama 3 uncensored model to test whether it could help, and it did. But how can I fine-tune it on free Google Colab? Which model should I use: Llama 3 directly (though it is censored), or some other model? Also, it'd be lovely if y'all could guide me through this. Please! I don't really know what the dataset should look like or what to do.

Hey, I was also wondering the same thing, is the dataset available? Thank you for creating this tool!

+1 from me too.

Life moves on, but could you release the dataset as it is? We will do the cleaning for you :)

Hi @unshadow

I really appreciate the work you’ve done on the fine-tuned LLM for cybersecurity—it’s impressive! I was curious about a couple of things:
1. How big was the dataset you used for fine-tuning?
2. Is the dataset available, or are there any plans to release it?

Thanks so much for your time! Looking forward to hearing back.
