anvilogic-admin committed on
Commit
ec9869b
·
verified ·
1 Parent(s): b9a71db

Update README.md

Files changed (1)
  1. README.md +19 -5
README.md CHANGED
@@ -9,11 +9,25 @@ pinned: false
9
 
10
  # **Anvilogic - Where AI Meets Cybersecurity**
11
 
12
- Welcome to the official Hugging Face organization for Anvilogic's advanced cybersecurity AI models! Founded in 2019, Anvilogic specializes in AI-driven threat detection and automation, enhancing Security Operations Center (SOC) capabilities with scalable, data-driven solutions.
 
13
 
14
- ## Model collections:
 
 
 
15
 
16
- ### Typosquatting collection:
17
- This collection is comprised of :
18
 
19
- - **Embedder** This model provide representation for domain names. This is used to mine similar domain .
 
 
 
 
 
 
 
 
 
 
 
 
9
 
10
  # **Anvilogic - Where AI Meets Cybersecurity**
11
 
12
+ Welcome to the official Hugging Face organization for Anvilogic's advanced cybersecurity AI models!
13
+ Founded in 2019, Anvilogic specializes in AI-driven threat detection and automation, enhancing Security Operations Center (SOC) capabilities with scalable, data-driven solutions.
14
 
15
+ ## Typosquatting collection
16
+ Typosquatting is a form of cyber attack where malicious actors create fake domain names that are visually or phonetically similar to legitimate domains, intending to deceive users into visiting these sites.
17
+ This collection aims to detect and flag typosquatted domains.
18
+ It comprises the following:
19
 
20
+ ### Models
 
21
 
22
+ - **Embedder:** This model produces vector representations of domain names, which are used to mine similar domains. It is available in two variants: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
23
+ - **Cross-Encoder:** This model compares two domain names and determines whether one is a typosquat of the other. It is available in two variants: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
24
+ - **T5 Detection:** This model is derived from T5 and trained on a new task. Its input is the prefix "Is the first domain a typosquat of the second : ", to which the *typosquat candidate domain* and the *legitimate domain* are appended.
25
+
26
+ ### Datasets
27
+
28
+ - **Embedder training dataset:** Dataset formatted to train the embedding model with (Anchor, Positive) pairs.
29
+ - **Cross-Encoder:** Dataset formatted to train the Cross-Encoder model with (Anchor, Positive, label) samples.
30
+ - **T5 Detection:** Dataset formatted to train the T5 model with (prompt, response) pairs.
31
+
32
+ ### Spaces
33
+ Multiple Spaces are provided to try the aforementioned models.
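
For readers who want a concrete picture of the record shapes described under Datasets and of the T5 input described under Models, here is a minimal sketch. It is not taken from the Anvilogic repositories. The prefix string is quoted from the README; everything else is an assumption made for illustration: the class and function names, the single-space separator between the two domains, the integer label, the `"true"` response string, and the choice of which domain plays the anchor role.

```python
from typing import NamedTuple

# Prefix quoted verbatim from the README's T5 Detection entry.
PREFIX = "Is the first domain a typosquat of the second : "


class EmbedderPair(NamedTuple):
    """(Anchor, Positive) pair for the embedder dataset (field names assumed)."""
    anchor: str
    positive: str


class CrossEncoderSample(NamedTuple):
    """(Anchor, Positive, label) sample; an integer label is an assumption."""
    anchor: str
    positive: str
    label: int


class T5Pair(NamedTuple):
    """(prompt, response) pair for the T5 dataset."""
    prompt: str
    response: str


def build_t5_prompt(candidate: str, legitimate: str, separator: str = " ") -> str:
    """Append the candidate and the legitimate domain to the prefix.

    The README does not say how the two domains are joined, so the
    single-space separator is a placeholder, not the documented format.
    """
    return f"{PREFIX}{candidate}{separator}{legitimate}"


# Hypothetical example rows; the domains are made up for illustration.
embedder_row = EmbedderPair(anchor="example.com", positive="examp1e.com")
cross_row = CrossEncoderSample("example.com", "examp1e.com", label=1)
t5_row = T5Pair(
    prompt=build_t5_prompt("examp1e.com", "example.com"),
    response="true",  # assumed response vocabulary
)

print(t5_row.prompt)
```

The actual column names, label encoding, and response strings should be checked against the published dataset cards before relying on any of these shapes.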