### Model Description

This repository hosts three pre-trained models designed to standardize the attributes of genomic region metadata: `ENCODE`, `FAIRTRACKS`, and `BEDBASE`. These models, along with their associated files and schema designs, are used for standardization by `BEDMS` (BED Metadata Standardizer). To learn more about BEDMS, visit https://github.com/databio/bedms

### Directory structure

```
/attribute-standardizer-model6
    /bedbase
        - bedbase_schema_design.yaml # BEDBASE schema
        - label_encoder_bedbase.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the BEDBASE schema
        - model_bedbase.pth # Model trained for the BEDBASE schema
        - vectorizer_bedbase.pkl # CountVectorizer instance from the `scikit-learn` library, used for the Bag of Words encoding that is the model input
        - config_bedbase.yaml # Config file with model parameters
    /encode
        - encode_schema_design.yaml # ENCODE schema
        - label_encoder_encode.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the ENCODE schema
        - model_encode.pth # Model trained for the ENCODE schema
        - vectorizer_encode.pkl # CountVectorizer instance from the `scikit-learn` library, used for the Bag of Words encoding that is the model input
        - config_encode.yaml # Config file with model parameters
    /fairtracks
        - fairtracks_schema_design.yaml # FAIRTRACKS schema
        - label_encoder_fairtracks.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the FAIRTRACKS schema
        - model_fairtracks.pth # Model trained for the FAIRTRACKS schema
        - vectorizer_fairtracks.pkl # CountVectorizer instance from the `scikit-learn` library, used for the Bag of Words encoding that is the model input
        - config_fairtracks.yaml # Config file with model parameters
```

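To make the roles of the vectorizer and label encoder files more concrete, here is a minimal, purely conceptual sketch of how a Bag of Words encoding and a label list sit on either side of a classifier. It uses plain Python instead of scikit-learn or PyTorch, and the vocabulary, labels, scores, and helper names (`bag_of_words`, `decode_label`) are all hypothetical. It does not read the files in this repository or reproduce the BEDMS pipeline.

```python
from collections import Counter

# Hypothetical vocabulary, standing in for what a fitted CountVectorizer stores.
VOCABULARY = ["cell", "line", "tissue", "antibody", "target"]

# Hypothetical schema labels, standing in for what a label encoder stores.
LABELS = ["assay", "biosample", "target"]


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary token occurs in the lowercased text."""
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocabulary]


def decode_label(scores, labels):
    """Map the index of the highest score back to its label string."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best_index]


# Encode a short piece of text as a count vector over the vocabulary.
features = bag_of_words("Cell line cell", VOCABULARY)
print(features)

# Made-up scores in place of a real model's output, one score per label.
print(decode_label([0.1, 0.7, 0.2], LABELS))
```

In the actual models, the count vector would come from the pickled `CountVectorizer`, the scores from the trained `.pth` model, and the label lookup from the pickled label encoder; refer to BEDMS for how these are loaded and combined.
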
### Usage

To use these models, refer to the [BEDMS](https://github.com/databio/bedms) GitHub repository.

### Contribution

To add a schema model:

1. Train the new model using [BEDMS](https://github.com/databio/bedms).
2. Create a new directory within this repository, named after the new schema (for example, `new_schema`).
3. Follow this directory structure:

```
/attribute-standardizer-model6
    /new_schema
        - new_schema_design.yaml
        - label_encoder_new_schema.pkl
        - model_new_schema.pth
        - vectorizer_new_schema.pkl
        - config_new_schema.yaml
```

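Before opening a contribution, it may help to check that the new directory contains the expected files. The following is a minimal sketch, not part of BEDMS: the helper name `missing_files` is hypothetical, and it assumes the directory name is the schema name used in the file names. The schema design file is matched only by its `_design.yaml` suffix, since its prefix does not follow the directory name in the same way in every example above.

```python
from pathlib import Path
import tempfile


def missing_files(schema_dir):
    """Return the expected file names that are absent from schema_dir."""
    schema_dir = Path(schema_dir)
    schema = schema_dir.name
    expected = [
        f"label_encoder_{schema}.pkl",
        f"model_{schema}.pth",
        f"vectorizer_{schema}.pkl",
        f"config_{schema}.yaml",
    ]
    missing = [name for name in expected if not (schema_dir / name).is_file()]
    # The schema design file is matched by suffix only (see the note above).
    if not any(schema_dir.glob("*_design.yaml")):
        missing.append("*_design.yaml")
    return missing


# Demonstration on a throwaway directory with the config file left out on purpose.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "new_schema"
    demo.mkdir()
    for name in [
        "new_schema_design.yaml",
        "label_encoder_new_schema.pkl",
        "model_new_schema.pth",
        "vectorizer_new_schema.pkl",
    ]:
        (demo / name).touch()
    print(missing_files(demo))
```

An empty list means every expected file name is present. The check looks only at file names, not at file contents.
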