Update README.md
README.md
CHANGED
@@ -9,11 +9,25 @@ pinned: false

 # **Anvilogic - Where AI Meets Cybersecurity**

-Welcome to the official Hugging Face organization for Anvilogic's advanced cybersecurity AI models!

-##

-###
-This collection is comprised of :

-- **Embedder
+Welcome to the official Hugging Face organization for Anvilogic's advanced cybersecurity AI models!
+Founded in 2019, Anvilogic specializes in AI-driven threat detection and automation, enhancing Security Operations Center (SOC) capabilities with scalable, data-driven solutions.

+## Typosquatting collection
+Typosquatting is a form of cyberattack in which malicious actors register fake domain names that are visually or phonetically similar to legitimate ones, intending to deceive users into visiting those sites.
+This collection aims to detect and flag typosquatted domains.
+It comprises the following:

+### Models

+- **Embedder:** This model produces vector representations (embeddings) of domain names, which are used to mine similar domains. It is available in two variants: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
+- **Cross-Encoder:** This model compares two domain names and predicts whether one domain is a typosquat of the other. It is available in two variants: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
+- **T5 Detection:** This model is derived from T5 and trained on a new task. Its input is the prefix "Is the first domain a typosquat of the second : " followed by the *typosquat candidate domain* and the *legitimate domain*.
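
As an illustration of the T5 Detection input described above, the following minimal sketch assembles a prompt from the quoted prefix and two domains. The function name `build_prompt`, the single-space `separator`, and the example domains are assumptions for illustration only; the description does not state how the two domains are delimited, so check the model card before relying on this format.

```python
# Prefix quoted in the T5 Detection description.
PREFIX = "Is the first domain a typosquat of the second : "


def build_prompt(candidate: str, legitimate: str, separator: str = " ") -> str:
    """Return the prefix followed by the candidate and legitimate domains.

    `separator` is an assumption: the description does not say how the
    two domains are delimited in the real training data.
    """
    return f"{PREFIX}{candidate.strip()}{separator}{legitimate.strip()}"


# Hypothetical example domains.
prompt = build_prompt("paypa1.com", "paypal.com")
print(prompt)
```

The candidate domain comes first and the legitimate domain second, matching the order stated in the description.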
+
+### Datasets
+
+- **Embedder training dataset:** Dataset formatted for training the embedding model, with (Anchor, Positive) pairs.
+- **Cross-Encoder:** Dataset formatted for training the Cross-Encoder model, with (Anchor, Positive, label) samples.
+- **T5 Detection:** Dataset formatted for training the T5 model, with (prompt, response) pairs.
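
To make the three sample shapes above concrete, here is a small sketch using typed tuples. The class names, the integer label encoding, the choice of which domain plays the anchor role, and the example values (including the `"true"` response) are all assumptions for illustration; the actual column names and values are defined by the datasets themselves.

```python
from typing import NamedTuple


class EmbedderPair(NamedTuple):
    """(Anchor, Positive) pair for the embedding model."""
    anchor: str
    positive: str


class CrossEncoderSample(NamedTuple):
    """(Anchor, Positive, label) sample for the Cross-Encoder."""
    anchor: str
    positive: str
    label: int  # assumed encoding: 1 = typosquat, 0 = not a typosquat


class T5Sample(NamedTuple):
    """(prompt, response) pair for the T5 model."""
    prompt: str
    response: str  # assumed to be a short textual answer


# Hypothetical records; the domains and the response text are made up.
pair = EmbedderPair(anchor="example.com", positive="examp1e.com")
sample = CrossEncoderSample(anchor="example.com", positive="examp1e.com", label=1)
t5_sample = T5Sample(
    prompt="Is the first domain a typosquat of the second : examp1e.com example.com",
    response="true",
)
```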
+
+### Spaces
+Multiple Spaces are provided for trying out the models above.