# prodigy-ecfr-textcat
## About the Project
Our goal is to organize the rules and regulations that apply to financial institutions so that an institution can review newly issued rules, determine which departments need them, and retrieve them easily when necessary. Text mining and information retrieval allow a large part of this process to be automated. Automation reduces the time and effort required from employees and frees them to work on other projects.
## Table of Contents
- [About the Project](#about-the-project)
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [File Structure](#file-structure)
- [License](#license)
- [Acknowledgements](#acknowledgements)
## Getting Started
Follow the instructions below to set up the project on a local machine.
### Prerequisites
Before running the project, ensure you have the following software dependencies installed:
- [Python 3.x](https://www.python.org/downloads/)
- [spaCy](https://spacy.io/usage)
- [Prodigy](https://prodi.gy/docs/) (optional)
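
As a quick way to confirm the prerequisites above, the following minimal sketch reports the running Python version and whether each package can be imported. The helper name `check_packages` is hypothetical and not part of this repository; it only checks that a package is importable, not which version is installed.

```python
import sys
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # Prodigy is optional, so a "missing" result for it is not an error.
    for name, available in check_packages(["spacy", "prodigy"]).items():
        print(f"{name}: {'installed' if available else 'missing'}")
```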
### Installation
Follow these step-by-step instructions to install and configure the project:
1. **Clone this repository to your local machine.**
```bash
git clone https://github.com/ManjinderUNCC/prodigy-ecfr-textcat.git
```
2. Install the required dependencies by running:
```bash
pip install -r requirements.txt
```
## Usage
To use the project, follow these steps:
1. **Prepare your data:**
   - Place your dataset files in the `/data` directory.
   - Optionally, annotate your data using Prodigy and save the annotations in the `/data` directory.
2. **Train the text classification model:**
   - Run the training script located in the `/python_Code` directory.
3. **Evaluate the model:**
   - Use the evaluation script to assess the model's performance on labeled data.
4. **Make predictions:**
   - Apply the trained model to new, unlabeled data to classify it into relevant categories.
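
The prediction step above can be sketched roughly as follows. This is a minimal illustration, not the project's own script: it assumes each line of a JSONL file is a JSON object with a `"text"` field, that the trained spaCy pipeline is saved in `my_trained_model`, and that a score of 0.5 is a reasonable cutoff for a multilabel classifier. The helper names `load_texts`, `labels_above_threshold`, and `classify` are hypothetical.

```python
import json
from pathlib import Path


def load_texts(path):
    """Read a JSONL file and return the "text" field of every record (assumed format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [json.loads(line)["text"] for line in lines if line.strip()]


def labels_above_threshold(cats, threshold=0.5):
    """Return labels whose score meets the threshold, highest score first."""
    ranked = sorted(cats.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked if score >= threshold]


def classify(texts, model_dir="my_trained_model", threshold=0.5):
    """Yield (text, predicted_labels) pairs using a saved spaCy pipeline."""
    import spacy  # imported here so the helpers above work without spaCy installed

    nlp = spacy.load(model_dir)
    for doc in nlp.pipe(texts):
        # doc.cats maps each category label to a score between 0 and 1.
        yield doc.text, labels_above_threshold(doc.cats, threshold)
```

For example, `list(classify(load_texts("data/eval.jsonl")))` would return one `(text, labels)` pair per record, assuming that file follows the format described above. Because the model is multilabel, a text can receive several labels or none, and the threshold can be tuned accordingly.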
## File Structure
The project's files and directories are organized as follows:
- `/corpus`
  - `/labels`
    - `ner.json`
    - `parser.json`
    - `tagger.json`
    - `textcat_multilabel.json`
- `/data`
  - `eval.jsonl`
  - `firstStep_file.jsonl`
  - `five_examples_annotated5.jsonl`
  - `goldenEval.jsonl`
  - `thirdStep_file.jsonl`
  - `train.jsonl`
  - `train200.jsonl`
  - `train4465.jsonl`
- `/my_trained_model`
  - `/textcat_multilabel`
    - `cfg`
    - `model`
  - `/vocab`
    - `key2row`
    - `lookups.bin`
    - `strings.json`
    - `vectors`
    - `vectors.cfg`
  - `config.cfg`
  - `meta.json`
  - `tokenizer`
- `/output`
  - `/experiment1`
    - `/model-best`
      - `/textcat_multilabel`
        - `cfg`
        - `model`
      - `/vocab`
        - `key2row`
        - `lookups.bin`
        - `strings.json`
        - `vectors`
        - `vectors.cfg`
      - `config.cfg`
      - `meta.json`
      - `tokenizer`
    - `/model-last`
      - `/textcat_multilabel`
        - `cfg`
        - `model`
      - `/vocab`
        - `key2row`
        - `lookups.bin`
        - `strings.json`
        - `vectors`
        - `vectors.cfg`
      - `config.cfg`
      - `meta.json`
      - `tokenizer`
  - `/experiment3`
    - `/model-best`
      - `/textcat_multilabel`
        - `cfg`
        - `model`
      - `/vocab`
        - `key2row`
        - `lookups.bin`
        - `strings.json`
        - `vectors`
        - `vectors.cfg`
      - `config.cfg`
      - `meta.json`
      - `tokenizer`
    - `/model-last`
      - `/textcat_multilabel`
        - `cfg`
        - `model`
      - `/vocab`
        - `key2row`
        - `lookups.bin`
        - `strings.json`
        - `vectors`
        - `vectors.cfg`
      - `config.cfg`
      - `meta.json`
      - `tokenizer`
- `/python_Code`
  - `finalStep-formatLabel.py`
  - `firstStep-format.py`
  - `five_examples_annotated.ipynb`
  - `secondStep-score.py`
  - `thirdStep-label.py`
  - `train_eval_split.ipynb`
- `TerminalCode.txt`
- `requirements.txt`
- `Terminal Commands vs Project.yml`
- `Project.yml`
- `README.md`
- `prodigy.json`
## License
- Package A: MIT License
- Package B: Apache License 2.0
## Acknowledgements
Manjinder Sandhu, Dagim Bantikassegn, Alex Brooks, Tyler Dabbs