anvilogic-admin committed on
Commit
235c630
·
verified ·
1 Parent(s): ec9869b

Update README.md

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -19,15 +19,17 @@ Founded in 2019, Anvilogic specializes in AI-driven threat detection and automat
19
 
20
  ### Models
21
 
22
- - **Embedder :** This model provides representation for domain names. This is used to mine similar domain. This model exists both based on RoBerta model (with BPE tokenization) and CANINE-c (with character-level encoding)
23
- - **Cross-Encoder :** This model is able to compare two domain names and conclude if one model is a typosquat of another. This model exists both based on RoBerta model (with BPE tokenization) and CANINE-c (with character-level encoding)
24
- - **T5 Detection :** This model is a derived version of T5 trained on a new task. with the prefix : "Is the first domain a typosquat of the second : " to which we append *typosquat candidate domain* and *Legitimate domain*
25
 
26
  ### Datasets
27
 
28
  - **Embedder training dataset :** Dataset formatted to train embedding model with (Anchor,Positive) pairs
29
- - **Cross-Encoder :** Dataset formatted to train Cross-encoder model with (Anchor,Positive,label) samples.
30
- - **T5 Detection :** Dataset formatted to train T5 model with (prompt,response) pairs .
31
 
32
  ### Spaces
33
- Multiple spaces are provided to try aforementioned models.
 
 
 
19
 
20
  ### Models
21
 
22
+ - **Embedders :** These models produce representations of domain names, which are used to mine similar domains. They are available in two variants: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
23
+ - **Cross-Encoders :** These models compare two domain names and predict whether one domain is a typosquat of the other. They are available in two variants: one based on RoBERTa (with BPE tokenization) and one based on CANINE-c (with character-level encoding).
24
+ - **T5 :** This model is derived from T5 and trained on a new task. Its input is the prefix "Is the first domain a typosquat of the second : " followed by *TYPOSQUAT_DOMAIN* and *LEGITIMATE_DOMAIN*.
25
 
26
  ### Datasets
27
 
28
  - **Embedder training dataset :** Dataset formatted to train embedding model with (Anchor,Positive) pairs
29
+ - **Cross-Encoder training dataset :** Dataset formatted to train the Cross-Encoder model with (Anchor,Positive,label) samples.
30
+ - **T5 training dataset :** Dataset formatted to train the T5 model with (prompt,response) pairs.
31
 
32
  ### Spaces
33
+ - **Embedder Typosquat Detect :** Lets the user retrieve the most similar domains from a pool of the 4000 most common domains.
34
+ - **CE Typosquat Detect :** Lets the user compare two domains using Cross-Encoders. The model outputs a probability of typosquatting.
35
+ - **T5 Typosquat Detect :** Allows the user to compare two domains using T5. The model outputs a boolean.
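
The T5 entry in the updated README describes an input made of a fixed prefix followed by the candidate and legitimate domains. A minimal sketch of how such an input string might be assembled is shown below. The prefix is quoted from the README; the function name `build_t5_prompt` and the single-space separator between the two domains are assumptions for illustration, since the README does not state how the domains are joined. Loading and running the model is deliberately left out.

```python
# Prefix quoted verbatim from the README's T5 entry.
PREFIX = "Is the first domain a typosquat of the second : "


def build_t5_prompt(typosquat_domain: str, legitimate_domain: str, sep: str = " ") -> str:
    """Assemble a T5 input string: prefix, candidate domain, legitimate domain.

    `sep` is a hypothetical separator; the README does not specify one.
    """
    # Strip stray whitespace so the separator is the only gap between domains.
    candidate = typosquat_domain.strip()
    legitimate = legitimate_domain.strip()
    return f"{PREFIX}{candidate}{sep}{legitimate}"


if __name__ == "__main__":
    print(build_t5_prompt("g00gle.com", "google.com"))
```

If the published model expects a different separator, only the `sep` argument would need to change.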