databio/attribute-standardizer-model6


saanikat committed on Oct 4, 2024

Commit 8943500 (1 parent: 787b39b)

config files


Files changed (4)

README.md +4 -0
bedbase/config_bedbase.yaml +6 -0
encode/config_encode.yaml +6 -0
fairtracks/config_fairtracks.yaml +6 -0

README.md CHANGED

@@ -11,16 +11,19 @@ This repository hosts three pre-trained models designed for metadata attribute s
         - label_encoder_bedbase.pkl # Unique label values derived from training data, model classifies the output into these labels for BEDBASE schema
         - model_bedbase.pth # BEDBASE schema trained model
         - vectorizer_bedbase.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model
+        - config_bedbase.yaml # Config file with model parameters
     /encode
         - encode_schema_design.yaml #ENCODE schema
         - label_encoder_encode.pkl # Unique label values derived from training data, model classifies the output into these labels for ENCODE schema
         - model_encode.pth # ENCODE schema trained model
         - vectorizer_encode.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model
+        - config_encode.yaml # Config file with model parameters
     /fairtracks
         - fairtracks_schema_design.yaml # FAIRTRACKS schema
         - label_encoder_fairtracks.pkl # Unique label values derived from training data, model classifies the output into these labels for FAIRTRACKS schema
         - model_fairtracks.pth #FAIRTRACKS schema trained model
         - vectorizer_fairtracks.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model
+        - config_fairtracks.yaml # Config file with model parameters
 ```
 ### Usage
@@ -43,4 +46,5 @@ To add a schema model:
         - label_encoder_new_schema.pkl
         - model_new_schema.pth
         - vectorizer_new_schema.pkl
+        - config_new_schema.yaml
 ```
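
With this commit, the README lists four per-schema artifacts that follow one naming pattern (`label_encoder_<schema>.pkl`, `model_<schema>.pth`, `vectorizer_<schema>.pkl`, `config_<schema>.yaml`). The sketch below shows one way a contributor might check a new schema directory against that pattern before uploading. The helper names `expected_files` and `missing_files` are hypothetical and not part of the repository, and the schema design file is left out because the "To add a schema model" listing shown here does not include it.

```python
import tempfile
from pathlib import Path


def expected_files(schema: str) -> list[str]:
    """File names the README lists for a schema directory (hypothetical helper)."""
    return [
        f"label_encoder_{schema}.pkl",
        f"model_{schema}.pth",
        f"vectorizer_{schema}.pkl",
        f"config_{schema}.yaml",
    ]


def missing_files(root: Path, schema: str) -> list[str]:
    """Return the expected file names that are absent from root/<schema>/."""
    schema_dir = root / schema
    return [name for name in expected_files(schema) if not (schema_dir / name).is_file()]


# Demo: a directory laid out the pre-commit way, without the config file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    schema_dir = root / "new_schema"
    schema_dir.mkdir()
    for name in expected_files("new_schema")[:3]:
        (schema_dir / name).touch()
    demo_missing = missing_files(root, "new_schema")

print(demo_missing)  # ['config_new_schema.yaml']
```

The check only looks at file names; it says nothing about whether the pickles or weights inside are compatible with each other.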

bedbase/config_bedbase.yaml ADDED

@@ -0,0 +1,6 @@

+params:
+  input_size_bow: 13708
+  embedding_size: 384
+  hidden_size: 32
+  output_size: 12
+  dropout_prob: 0.113

encode/config_encode.yaml ADDED

@@ -0,0 +1,6 @@

+params:
+  input_size_bow: 10459
+  embedding_size: 384
+  hidden_size: 32
+  output_size: 18
+  dropout_prob: 0.113

fairtracks/config_fairtracks.yaml ADDED

@@ -0,0 +1,6 @@

+params:
+  input_size_bow: 13617
+  embedding_size: 384
+  hidden_size: 32
+  output_size: 15
+  dropout_prob: 0.113
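
The three config files share one shape: a top-level `params` mapping with four integer sizes and one float dropout probability. Only `input_size_bow` and `output_size` differ between schemas. As a minimal sketch of how a consumer might read these values into a typed object, the example below parses the BEDBASE config text. `ModelParams` and `parse_params` are hypothetical names, and the hand-rolled parser is a stand-in that handles only this flat layout so the example stays within the standard library; a real loader would more likely call `yaml.safe_load` from PyYAML. Nothing here assumes how the model itself consumes the values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """Typed view of the `params` block (hypothetical container)."""
    input_size_bow: int
    embedding_size: int
    hidden_size: int
    output_size: int
    dropout_prob: float


def parse_params(text: str) -> ModelParams:
    """Parse the flat `params:` mapping used by the config files in this commit."""
    values = {}
    in_params = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        if not line.startswith(" "):
            # A top-level key: only the `params` block is of interest.
            in_params = line.strip() == "params:"
            continue
        if in_params:
            key, _, value = line.strip().partition(":")
            values[key.strip()] = value.strip()
    return ModelParams(
        input_size_bow=int(values["input_size_bow"]),
        embedding_size=int(values["embedding_size"]),
        hidden_size=int(values["hidden_size"]),
        output_size=int(values["output_size"]),
        dropout_prob=float(values["dropout_prob"]),
    )


# Contents of bedbase/config_bedbase.yaml as added in this commit.
BEDBASE_CONFIG = """\
params:
  input_size_bow: 13708
  embedding_size: 384
  hidden_size: 32
  output_size: 12
  dropout_prob: 0.113
"""

params = parse_params(BEDBASE_CONFIG)
print(params.input_size_bow, params.output_size)  # 13708 12
```

Converting each field explicitly means a missing or malformed key raises an error when the config is loaded instead of surfacing later as a shape mismatch.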