Merge branch 'fix-mlflow' of Dean/summarization into master
- .github/CODE_OF_CONDUCT.md +0 -128
- .github/CONTRIBUTING.md +0 -92
- .github/FUNDING.yml +0 -12
- .github/ISSUE_TEMPLATE/bug_report.md +0 -38
- .github/ISSUE_TEMPLATE/feature_request.md +0 -20
- .github/PULL_REQUEST_TEMPLATE.md +0 -29
- .gitignore +4 -1
- Makefile +9 -1
- app.py +0 -32
- data.dvc +0 -14
- data_params.yml +2 -0
- dvc.lock +79 -42
- dvc.yaml +18 -9
- params.yml → model_params.yml +5 -10
- reports/evaluation_metrics.csv +37 -0
- reports/evaluation_metrics.txt +0 -1
- reports/training_metrics.csv +9 -0
- reports/training_metrics.txt +0 -1
- reports/training_params.yml +1 -0
- reports/visualization_metrics.txt +0 -0
- requirements.txt +5 -5
- src/data/__init__.py +0 -0
- src/data/make_dataset.py +1 -1
- src/data/process_data.py +1 -3
- src/models/evaluate_model.py +4 -3
- src/models/hf_upload.py +46 -0
- src/models/model.py +13 -46
- src/models/predict_model.py +2 -2
- src/models/train_model.py +3 -14
- src/visualization/__init__.py +0 -0
- src/visualization/visualize.py +1 -9
.github/CODE_OF_CONDUCT.md
DELETED
@@ -1,128 +0,0 @@
-# Contributor Covenant Code of Conduct
-
-## Our Pledge
-
-We as members, contributors, and leaders pledge to make participation in our
-community a harassment-free experience for everyone, regardless of age, body
-size, visible or invisible disability, ethnicity, sex characteristics, gender
-identity and expression, level of experience, education, socio-economic status,
-nationality, personal appearance, race, religion, or sexual identity
-and orientation.
-
-We pledge to act and interact in ways that contribute to an open, welcoming,
-diverse, inclusive, and healthy community.
-
-## Our Standards
-
-Examples of behavior that contributes to a positive environment for our
-community include:
-
-* Demonstrating empathy and kindness toward other people
-* Being respectful of differing opinions, viewpoints, and experiences
-* Giving and gracefully accepting constructive feedback
-* Accepting responsibility and apologizing to those affected by our mistakes,
-  and learning from the experience
-* Focusing on what is best not just for us as individuals, but for the
-  overall community
-
-Examples of unacceptable behavior include:
-
-* The use of sexualized language or imagery, and sexual attention or
-  advances of any kind
-* Trolling, insulting or derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or email
-  address, without their explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-  professional setting
-
-## Enforcement Responsibilities
-
-Community leaders are responsible for clarifying and enforcing our standards of
-acceptable behavior and will take appropriate and fair corrective action in
-response to any behavior that they deem inappropriate, threatening, offensive,
-or harmful.
-
-Community leaders have the right and responsibility to remove, edit, or reject
-comments, commits, code, wiki edits, issues, and other contributions that are
-not aligned to this Code of Conduct, and will communicate reasons for moderation
-decisions when appropriate.
-
-## Scope
-
-This Code of Conduct applies within all community spaces, and also applies when
-an individual is officially representing the community in public spaces.
-Examples of representing our community include using an official e-mail address,
-posting via an official social media account, or acting as an appointed
-representative at an online or offline event.
-
-## Enforcement
-
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported to the community leaders responsible for enforcement at
-@gagan3012.
-All complaints will be reviewed and investigated promptly and fairly.
-
-All community leaders are obligated to respect the privacy and security of the
-reporter of any incident.
-
-## Enforcement Guidelines
-
-Community leaders will follow these Community Impact Guidelines in determining
-the consequences for any action they deem in violation of this Code of Conduct:
-
-### 1. Correction
-
-**Community Impact**: Use of inappropriate language or other behavior deemed
-unprofessional or unwelcome in the community.
-
-**Consequence**: A private, written warning from community leaders, providing
-clarity around the nature of the violation and an explanation of why the
-behavior was inappropriate. A public apology may be requested.
-
-### 2. Warning
-
-**Community Impact**: A violation through a single incident or series
-of actions.
-
-**Consequence**: A warning with consequences for continued behavior. No
-interaction with the people involved, including unsolicited interaction with
-those enforcing the Code of Conduct, for a specified period of time. This
-includes avoiding interactions in community spaces as well as external channels
-like social media. Violating these terms may lead to a temporary or
-permanent ban.
-
-### 3. Temporary Ban
-
-**Community Impact**: A serious violation of community standards, including
-sustained inappropriate behavior.
-
-**Consequence**: A temporary ban from any sort of interaction or public
-communication with the community for a specified period of time. No public or
-private interaction with the people involved, including unsolicited interaction
-with those enforcing the Code of Conduct, is allowed during this period.
-Violating these terms may lead to a permanent ban.
-
-### 4. Permanent Ban
-
-**Community Impact**: Demonstrating a pattern of violation of community
-standards, including sustained inappropriate behavior, harassment of an
-individual, or aggression toward or disparagement of classes of individuals.
-
-**Consequence**: A permanent ban from any sort of public interaction within
-the community.
-
-## Attribution
-
-This Code of Conduct is adapted from the [Contributor Covenant][homepage],
-version 2.0, available at
-https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
-
-Community Impact Guidelines were inspired by [Mozilla's code of conduct
-enforcement ladder](https://github.com/mozilla/diversity).
-
-[homepage]: https://www.contributor-covenant.org
-
-For answers to common questions about this code of conduct, see the FAQ at
-https://www.contributor-covenant.org/faq. Translations are available at
-https://www.contributor-covenant.org/translations.
.github/CONTRIBUTING.md
DELETED
@@ -1,92 +0,0 @@
-# Contributing
-
-When contributing to this repository, please first discuss the change you wish to make via issue,
-email, or any other method with the owners of this repository before making a change.
-
-Please note we have a code of conduct, please follow it in all your interactions with the project.
-
-## Pull Request Process
-
-1. Ensure any install or build dependencies are removed before the end of the layer when doing a
-   build.
-2. Update the README.md with details of changes to the interface, this includes new environment
-   variables, exposed ports, useful file locations and container parameters.
-3. Increase the version numbers in any examples files and the README.md to the new version that this
-   Pull Request would represent. The versioning scheme we use is [SemVer](http://semver.org/).
-4. You may merge the Pull Request in once you have the sign-off of two other developers, or if you
-   do not have permission to do that, you may request the second reviewer to merge it for you.
-
-## Code of Conduct
-
-### Our Pledge
-
-In the interest of fostering an open and welcoming environment, we as
-contributors and maintainers pledge to making participation in our project and
-our community a harassment-free experience for everyone, regardless of age, body
-size, disability, ethnicity, gender identity and expression, level of experience,
-nationality, personal appearance, race, religion, or sexual identity and
-orientation.
-
-### Our Standards
-
-Examples of behavior that contributes to creating a positive environment
-include:
-
-* Using welcoming and inclusive language
-* Being respectful of differing viewpoints and experiences
-* Gracefully accepting constructive criticism
-* Focusing on what is best for the community
-* Showing empathy towards other community members
-
-Examples of unacceptable behavior by participants include:
-
-* The use of sexualized language or imagery and unwelcome sexual attention or
-  advances
-* Trolling, insulting/derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or electronic
-  address, without explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-  professional setting
-
-### Our Responsibilities
-
-Project maintainers are responsible for clarifying the standards of acceptable
-behavior and are expected to take appropriate and fair corrective action in
-response to any instances of unacceptable behavior.
-
-Project maintainers have the right and responsibility to remove, edit, or
-reject comments, commits, code, wiki edits, issues, and other contributions
-that are not aligned to this Code of Conduct, or to ban temporarily or
-permanently any contributor for other behaviors that they deem inappropriate,
-threatening, offensive, or harmful.
-
-### Scope
-
-This Code of Conduct applies both within project spaces and in public spaces
-when an individual is representing the project or its community. Examples of
-representing a project or community include using an official project e-mail
-address, posting via an official social media account, or acting as an appointed
-representative at an online or offline event. Representation of a project may be
-further defined and clarified by project maintainers.
-
-### Enforcement
-
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported by contacting the project team at [INSERT EMAIL ADDRESS]. All
-complaints will be reviewed and investigated and will result in a response that
-is deemed necessary and appropriate to the circumstances. The project team is
-obligated to maintain confidentiality with regard to the reporter of an incident.
-Further details of specific enforcement policies may be posted separately.
-
-Project maintainers who do not follow or enforce the Code of Conduct in good
-faith may face temporary or permanent repercussions as determined by other
-members of the project's leadership.
-
-### Attribution
-
-This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
-available at [http://contributor-covenant.org/version/1/4][version]
-
-[homepage]: http://contributor-covenant.org
-[version]: http://contributor-covenant.org/version/1/4/
.github/FUNDING.yml
DELETED
@@ -1,12 +0,0 @@
-# These are supported funding model platforms
-
-github: gagan3012 # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
-patreon: # Replace with a single Patreon username
-open_collective: # Replace with a single Open Collective username
-ko_fi: # Replace with a single Ko-fi username
-tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
-community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
-liberapay: # Replace with a single Liberapay username
-issuehunt: # Replace with a single IssueHunt username
-otechie: # Replace with a single Otechie username
-custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
.github/ISSUE_TEMPLATE/bug_report.md
DELETED
@@ -1,38 +0,0 @@
----
-name: Bug report
-about: Create a report to help us improve
-title: ''
-labels: ''
-assignees: ''
-
----
-
-**Describe the bug**
-A clear and concise description of what the bug is.
-
-**To Reproduce**
-Steps to reproduce the behavior:
-1. Go to '...'
-2. Click on '....'
-3. Scroll down to '....'
-4. See error
-
-**Expected behavior**
-A clear and concise description of what you expected to happen.
-
-**Screenshots**
-If applicable, add screenshots to help explain your problem.
-
-**Desktop (please complete the following information):**
- - OS: [e.g. iOS]
- - Browser [e.g. chrome, safari]
- - Version [e.g. 22]
-
-**Smartphone (please complete the following information):**
- - Device: [e.g. iPhone6]
- - OS: [e.g. iOS8.1]
- - Browser [e.g. stock browser, safari]
- - Version [e.g. 22]
-
-**Additional context**
-Add any other context about the problem here.
.github/ISSUE_TEMPLATE/feature_request.md
DELETED
@@ -1,20 +0,0 @@
----
-name: Feature request
-about: Suggest an idea for this project
-title: ''
-labels: ''
-assignees: ''
-
----
-
-**Is your feature request related to a problem? Please describe.**
-A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
-
-**Describe the solution you'd like**
-A clear and concise description of what you want to happen.
-
-**Describe alternatives you've considered**
-A clear and concise description of any alternative solutions or features you've considered.
-
-**Additional context**
-Add any other context or screenshots about the feature request here.
.github/PULL_REQUEST_TEMPLATE.md
DELETED
@@ -1,29 +0,0 @@
-<!--- Provide a general summary of your changes in the Title above -->
-
-## Description
-<!--- Describe your changes in detail -->
-
-## Motivation and Context
-<!--- Why is this change required? What problem does it solve? -->
-<!--- If it fixes an open issue, please link to the issue here. -->
-
-## How Has This Been Tested?
-<!--- Please describe in detail how you tested your changes. -->
-<!--- Include details of your testing environment, and the tests you ran to -->
-<!--- see how your change affects other areas of the code, etc. -->
-
-## Screenshots (if appropriate):
-
-## Types of changes
-<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
-- [ ] Bug fix (non-breaking change which fixes an issue)
-- [ ] New feature (non-breaking change which adds functionality)
-- [ ] Breaking change (fix or feature that would cause existing functionality to change)
-
-## Checklist:
-<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
-<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
-- [ ] My code follows the code style of this project.
-- [ ] My change requires a change to the documentation.
-- [ ] I have updated the documentation accordingly.
-- [ ] I have read the **CONTRIBUTING** document.
.gitignore
CHANGED
@@ -93,6 +93,9 @@ coverage.xml
 .vscode
 /data
 
-wandb/
 summarization-dagshub/
 /models
+default/
+artifacts/
+mlruns/
+hf_model/
Makefile
CHANGED
@@ -48,7 +48,15 @@ pull:
 
 ## run the DVC pipeline - recompute any modified outputs such as processed data or trained models
 run:
-	dvc repro
+	dvc repro eval
+
+## run the visualization using Streamlit
+visualize:
+	dvc repro visualize
+
+## push the trained model to HF model hub
+push_to_hf_hub:
+	dvc repro push_to_hf_hub
 
 #################################################################################
 # PROJECT RULES #
app.py
DELETED
@@ -1,32 +0,0 @@
-import streamlit as st
-import yaml
-
-from src.models.predict_model import predict_model
-
-
-def visualize():
-    st.write("# Summarization UI")
-    st.markdown(
-        """
-    *For additional questions and inquiries, please contact **Gagan Bhatia** via [LinkedIn](
-    https://www.linkedin.com/in/gbhatia30/) or [Github](https://github.com/gagan3012).*
-    """
-    )
-
-    text = st.text_area("Enter text here")
-    if st.button("Generate Summary"):
-        with st.spinner("Connecting the Dots..."):
-            sumtext = predict_model(text=text)
-        st.write("# Generated Summary:")
-        st.write("{}".format(sumtext))
-        with open("reports/visualization_metrics.txt", "w") as file1:
-            file1.writelines(text)
-            file1.writelines(sumtext)
-
-
-if __name__ == "__main__":
-    with open("params.yml") as f:
-        params = yaml.safe_load(f)
-
-    if params["visualise"]:
-        visualize()
data.dvc
DELETED
@@ -1,14 +0,0 @@
-deps:
-- path: params.yml
-  md5: d0f3e81bc9191e752a69761045a449d9
-  size: 196
-- path: src/data/make_dataset.py
-  md5: 9de71de0f8df5d0a7beb235ef7c7777d
-  size: 772
-cmd: python src/data/make_dataset.py
-outs:
-- md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
-  nfiles: 3
-  path: data/raw
-  size: 1358833013
-md5: ff502232006c7fbef1015b5aa5cc4bbb
data_params.yml
ADDED
@@ -0,0 +1,2 @@
+data: cnn_dailymail
+split: 0.001
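The `split: 0.001` value in `data_params.yml` reads like a fraction of the dataset to keep rather than a row count. How `src/data/make_dataset.py` actually consumes it is not visible in this diff, so the following is only a sketch under that assumption; `subset_size` is a hypothetical helper, not a function from the repository, and the row count in the demo is illustrative.

```python
def subset_size(total_rows: int, split: float) -> int:
    """Return how many rows a fractional `split` keeps (hypothetical helper)."""
    if not 0.0 < split <= 1.0:
        raise ValueError(f"split must be in (0, 1], got {split!r}")
    # Keep at least one row so a tiny fraction never yields an empty subset.
    return max(1, int(total_rows * split))


# Values mirroring data_params.yml; the total row count is made up for the demo.
data_params = {"data": "cnn_dailymail", "split": 0.001}
print(subset_size(10_000, data_params["split"]))
```

Keeping the fraction in a parameter file that the `download_data` and `process_data` stages both list under `deps` means a change to it invalidates both stages on the next `dvc repro`.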
dvc.lock
CHANGED
@@ -4,65 +4,102 @@ stages:
     cmd: python src/models/train_model.py
     deps:
     - path: data/processed/train.csv
-      md5:
-      size:
+      md5: 5331b9c32b2d097d8d7aca01de5524bc
+      size: 1198262
     - path: data/processed/validation.csv
-      md5:
-      size:
-    - path:
-      md5:
-      size:
+      md5: 6069153a075b00dfb6d9e0843dd2da89
+      size: 52739
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
     - path: src/models/train_model.py
-      md5:
-      size:
+      md5: f7d1121426c3d5530c2b9697cb7ac74a
+      size: 951
     outs:
     - path: models
-      md5:
-      size:
-      nfiles:
-    - path: reports/training_metrics.
-      md5:
-      size:
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: reports/training_metrics.csv
+      md5: 3b309def91a32e521acd23b163742522
+      size: 320
   eval:
     cmd: python src/models/evaluate_model.py
     deps:
     - path: data/processed/test.csv
-      md5:
-      size:
+      md5: 3eec94ac211c76363a3d968663b82d02
+      size: 39574
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
     - path: models
-      md5:
-      size:
-      nfiles:
-    - path: params.yml
-      md5: d0f3e81bc9191e752a69761045a449d9
-      size: 196
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
     - path: src/models/evaluate_model.py
-      md5:
-      size:
+      md5: 89edb77aaab3055605ae6db2e21eab82
+      size: 705
     outs:
-    - path: reports/
-      md5:
-      size:
+    - path: reports/evaluation_metrics.csv
+      md5: eaa3bf017026aa1be31560f308fff78e
+      size: 2122
   process_data:
     cmd: python src/data/process_data.py
     deps:
     - path: data/raw
-      md5:
-      size:
-      nfiles:
-    - path:
-      md5:
-      size:
+      md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
+      size: 1358833013
+      nfiles: 3
+    - path: data_params.yml
+      md5: a68eabf79c3b3e28afb05baa1944bbc7
+      size: 32
     - path: src/data/process_data.py
-      md5:
-      size:
+      md5: 68db554a69a0c8ce807907afa2be5e9c
+      size: 521
     outs:
     - path: data/processed/test.csv
-      md5:
-      size:
+      md5: 3eec94ac211c76363a3d968663b82d02
+      size: 39574
     - path: data/processed/train.csv
-      md5:
-      size:
+      md5: 5331b9c32b2d097d8d7aca01de5524bc
+      size: 1198262
     - path: data/processed/validation.csv
-      md5:
-      size:
+      md5: 6069153a075b00dfb6d9e0843dd2da89
+      size: 52739
+  download_data:
+    cmd: python src/data/make_dataset.py
+    deps:
+    - path: data_params.yml
+      md5: a68eabf79c3b3e28afb05baa1944bbc7
+      size: 32
+    - path: src/data/make_dataset.py
+      md5: a0667f4ad8c06551609bd0bf950167b7
+      size: 776
+    outs:
+    - path: data/raw
+      md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
+      size: 1358833013
+      nfiles: 3
+  visualize:
+    cmd: streamlit run src/visualization/visualize.py
+    deps:
+    - path: models
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: src/visualization/visualize.py
+      md5: 4226e4148abb5ac186c0ab8c1d87b228
+      size: 671
+  push_to_hf_hub:
+    cmd: python src/models/hf_upload.py
+    deps:
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
+    - path: models
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: src/models/hf_upload.py
+      md5: a953816a3eb7bef702313544103a1c11
+      size: 1290
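Each single-file entry in `dvc.lock` pairs a path with an `md5` and a `size` in bytes. A minimal sketch for recomputing those two fields by hand, assuming a plain MD5 over the raw file bytes: some DVC versions may normalize line endings in text files before hashing, so a mismatch is not necessarily a corrupted file. Directory entries (the ones with a `.dir` suffix) are hashed differently and are not covered. `file_fingerprint` is a hypothetical helper name, not part of DVC or this repository.

```python
import hashlib
import os
import tempfile


def file_fingerprint(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return {'md5': hex digest, 'size': byte count} for one file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {"md5": digest.hexdigest(), "size": os.path.getsize(path)}


# Self-contained demo on a throwaway file rather than a real pipeline output.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"data: cnn_dailymail\nsplit: 0.001")
demo = file_fingerprint(tmp.name)
os.remove(tmp.name)
print(demo)
```

Comparing the returned `size` first is a cheap way to spot a changed file before looking at the digest.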
dvc.yaml
CHANGED
@@ -1,8 +1,15 @@
 stages:
+  download_data:
+    cmd: python src/data/make_dataset.py
+    deps:
+    - data_params.yml
+    - src/data/make_dataset.py
+    outs:
+    - data/raw
   process_data:
     cmd: python src/data/process_data.py
     deps:
-    -
+    - data_params.yml
     - data/raw
     - src/data/process_data.py
     outs:
@@ -18,7 +25,7 @@ stages:
   train:
     cmd: python src/models/train_model.py
     deps:
-    -
+    - model_params.yml
     - data/processed/train.csv
     - data/processed/validation.csv
     - src/models/train_model.py
@@ -26,25 +33,27 @@ stages:
     - models:
         persist: true
     metrics:
-    - reports/training_metrics.
+    - reports/training_metrics.csv:
         cache: false
   eval:
     cmd: python src/models/evaluate_model.py
     deps:
-    -
+    - model_params.yml
     - data/processed/test.csv
     - models
     - src/models/evaluate_model.py
     metrics:
-    - reports/evaluation_metrics.
+    - reports/evaluation_metrics.csv:
         cache: false
   visualize:
     cmd: streamlit run src/visualization/visualize.py
     deps:
     - models
     - src/visualization/visualize.py
-
-
-
-
+  push_to_hf_hub:
+    cmd: python src/models/hf_upload.py
+    deps:
+    - model_params.yml
+    - src/models/hf_upload.py
+    - models
 
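The stages in `dvc.yaml` form a dependency graph: a stage has to run after any stage whose `outs` appear in its own `deps`. The sketch below derives one valid run order with the standard-library `graphlib` module (Python 3.9+). It illustrates the idea only and is not how DVC is implemented; the `STAGES` dictionary is transcribed by hand from this diff, with the `process_data` outputs taken from `dvc.lock` because that part of `dvc.yaml` falls outside the hunks shown, and metrics files are left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# deps/outs transcribed from the diff; an illustration, not the full dvc.yaml.
STAGES = {
    "download_data": {
        "deps": ["data_params.yml", "src/data/make_dataset.py"],
        "outs": ["data/raw"],
    },
    "process_data": {
        "deps": ["data_params.yml", "data/raw", "src/data/process_data.py"],
        "outs": [
            "data/processed/test.csv",
            "data/processed/train.csv",
            "data/processed/validation.csv",
        ],
    },
    "train": {
        "deps": [
            "model_params.yml",
            "data/processed/train.csv",
            "data/processed/validation.csv",
            "src/models/train_model.py",
        ],
        "outs": ["models"],
    },
    "eval": {
        "deps": [
            "model_params.yml",
            "data/processed/test.csv",
            "models",
            "src/models/evaluate_model.py",
        ],
        "outs": [],
    },
    "visualize": {"deps": ["models", "src/visualization/visualize.py"], "outs": []},
    "push_to_hf_hub": {
        "deps": ["model_params.yml", "src/models/hf_upload.py", "models"],
        "outs": [],
    },
}


def stage_order(stages: dict) -> list:
    """Order stages so each one runs after the stages that produce its deps."""
    # Map every declared output path to the stage that produces it.
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    # Source files and parameter files have no producing stage and are skipped.
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(stage_order(STAGES))
```

Under this ordering, the `dvc repro eval` call behind `make run` would need `download_data`, `process_data`, and `train` to be up to date before `eval`, while `visualize` and `push_to_hf_hub` only hang off `train`.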
params.yml → model_params.yml
RENAMED
@@ -1,16 +1,11 @@
 name: summarsiation
-data: cnn_dailymail
-batch_size: 2
-num_workers: 2
 model_type: t5
 model_name: t5-small
-
+batch_size: 2
 epochs: 5
-
+use_gpu: True
+learning_rate: 1e-4
+num_workers: 2
 model_dir: models
 metric: rouge
-
-use_gpu: True
-visualise: True
-hf_username: gagan3012
-upload_to_hf: True
+source_dir: src
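One detail worth checking in the new `model_params.yml`: `learning_rate: 1e-4` has no decimal point. As far as I know, PyYAML follows the YAML 1.1 float pattern, which requires a dot in the mantissa, so `yaml.safe_load` returns the string `'1e-4'` for this value rather than a float. Whether the training code in this repository already converts it is not visible in the diff. The sketch below shows a defensive conversion on a plain dictionary standing in for the loaded file; `coerce_numeric_params` and its key lists are hypothetical.

```python
def coerce_numeric_params(
    params: dict,
    float_keys=("learning_rate",),
    int_keys=("batch_size", "epochs", "num_workers"),
) -> dict:
    """Return a copy of `params` with known numeric keys cast to float or int."""
    cleaned = dict(params)  # shallow copy so the caller's dict is left untouched
    for key in float_keys:
        if key in cleaned:
            cleaned[key] = float(cleaned[key])
    for key in int_keys:
        if key in cleaned:
            cleaned[key] = int(cleaned[key])
    return cleaned


# Stand-in for the loaded model_params.yml, with learning_rate as a string.
raw = {
    "model_name": "t5-small",
    "batch_size": 2,
    "epochs": 5,
    "learning_rate": "1e-4",
    "num_workers": 2,
}
model_params = coerce_numeric_params(raw)
print(type(model_params["learning_rate"]).__name__, model_params["learning_rate"])
```

Writing the value as `1.0e-4` in the YAML file is an alternative that avoids the conversion step altogether.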
reports/evaluation_metrics.csv
ADDED
@@ -0,0 +1,37 @@
+Name,Value,Timestamp,Step
+"Rouge_1 Low Precision",0.23786550570641482,1628587253223,1
+"Rouge_1 Low recall",0.23355396379384713,1628587253223,1
+"Rouge_1 Low F1",0.23602599457077003,1628587253223,1
+"Rouge_1 Mid Precision",0.3569471852499436,1628587253223,1
+"Rouge_1 Mid recall",0.31915939075819916,1628587253223,1
+"Rouge_1 Mid F1",0.3317618573023773,1628587253223,1
+"Rouge_1 High Precision",0.4726861301480842,1628587253223,1
+"Rouge_1 High recall",0.4019654200001146,1628587253223,1
+"Rouge_1 High F1",0.4298956952594035,1628587253223,1
+"Rouge_2 Low Precision",0.06184772400193972,1628587253223,1
+"Rouge_2 Low recall",0.05626972412346313,1628587253223,1
+"Rouge_2 Low F1",0.058680298802341754,1628587253223,1
+"Rouge_2 Mid Precision",0.1367034298993256,1628587253223,1
+"Rouge_2 Mid recall",0.11953160646342464,1628587253223,1
+"Rouge_2 Mid F1",0.12485064123505887,1628587253223,1
+"Rouge_2 High Precision",0.22739029631016827,1628587253223,1
+"Rouge_2 High recall",0.18851628169809986,1628587253223,1
+"Rouge_2 High F1",0.20306657551189072,1628587253223,1
+"Rouge_L Low Precision",0.18248956154159507,1628587253223,1
+"Rouge_L Low recall",0.18048774357814204,1628587253223,1
+"Rouge_L Low F1",0.18151380309623336,1628587253223,1
+"Rouge_L Mid Precision",0.2614974838710314,1628587253223,1
+"Rouge_L Mid recall",0.24286688705755238,1628587253223,1
+"Rouge_L Mid F1",0.24674586991996245,1628587253223,1
+"Rouge_L High Precision",0.3574471638807763,1628587253223,1
+"Rouge_L High recall",0.30836083808542225,1628587253223,1
+"Rouge_L High F1",0.32385446385474176,1628587253223,1
+"rougeLsum Low Precision",0.21468633089019287,1628587253223,1
+"rougeLsum Low recall",0.2057771050364415,1628587253223,1
+"rougeLsum Low F1",0.21170611912426093,1628587253223,1
+"rougeLsum Mid Precision",0.3060593850789648,1628587253223,1
|
| 33 |
+
"rougeLsum Mid recall",0.27733553744690076,1628587253223,1
|
| 34 |
+
"rougeLsum Mid F1",0.28530501988436374,1628587253223,1
|
| 35 |
+
"rougeLsum High Precision",0.4094614601758424,1628587253223,1
|
| 36 |
+
"rougeLsum High recall",0.34640369291505535,1628587253223,1
|
| 37 |
+
"rougeLsum High F1",0.36454440079714096,1628587253223,1
|
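The added `reports/evaluation_metrics.csv` stores one metric per row in a `Name,Value,Timestamp,Step` layout, where the deleted `.txt` report nested the same names under each ROUGE variant. A minimal sketch, assuming only that row layout, of how the rows could be regrouped by variant with the standard library. The helper name `group_rouge_metrics` is hypothetical and not part of this commit:

```python
import csv
import io


def group_rouge_metrics(csv_text):
    """Regroup Name,Value,Timestamp,Step rows into {variant: {stat: value}}.

    Hypothetical helper: assumes each Name looks like
    "<variant> <Low|Mid|High> <Precision|recall|F1>", as in the report above.
    """
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split only on the first space: "Rouge_1 Mid F1" -> ("Rouge_1", "Mid F1")
        variant, stat = row["Name"].split(" ", 1)
        grouped.setdefault(variant, {})[stat] = float(row["Value"])
    return grouped


# Three rows copied from the report above.
sample = '''Name,Value,Timestamp,Step
"Rouge_1 Mid F1",0.3317618573023773,1628587253223,1
"Rouge_2 Mid F1",0.12485064123505887,1628587253223,1
"rougeLsum Mid F1",0.28530501988436374,1628587253223,1
'''

metrics = group_rouge_metrics(sample)
print(metrics["Rouge_1"]["Mid F1"])
```

Reading the whole file would presumably work the same way by passing its text to the helper.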
reports/evaluation_metrics.txt
DELETED
|
@@ -1 +0,0 @@
|
|
| 1 |
-
{"Rouge 1": {"Rouge_1 Low Precision": 0.34885388166790793, "Rouge_1 Low recall": 0.28871556132198656, "Rouge_1 Low F1": 0.31058637096822267, "Rouge_1 Mid Precision": 0.412435004251884, "Rouge_1 Mid recall": 0.3386352228897427, "Rouge_1 Mid F1": 0.3517931748124066, "Rouge_1 High Precision": 0.47625451117848977, "Rouge_1 High recall": 0.39086727645312935, "Rouge_1 High F1": 0.3959993953753958}, "Rouge 2": {"Rouge_2 Low Precision": 0.1259156300716482, "Rouge_2 Low recall": 0.10333119800163641, "Rouge_2 Low F1": 0.10992592662502373, "Rouge_2 Mid Precision": 0.16879303949162833, "Rouge_2 Mid recall": 0.13805319188028575, "Rouge_2 Mid F1": 0.14400796293585816, "Rouge_2 High Precision": 0.21844214485938712, "Rouge_2 High recall": 0.1777722350788, "Rouge_2 High F1": 0.18342627795315522}, "Rouge L": {"Rouge_L Low Precision": 0.2322041975032734, "Rouge_L Low recall": 0.194000575085051, "Rouge_L Low F1": 0.20468107864660212, "Rouge_L Mid Precision": 0.2797360675037497, "Rouge_L Mid recall": 0.22647774162854406, "Rouge_L Mid F1": 0.2361293941929179, "Rouge_L High Precision": 0.3357160682858357, "Rouge_L High recall": 0.2622222798536235, "Rouge_L High F1": 0.27267217209978356}, "rougeLsum": {"rougeLsum Low Precision": 0.29651536760563263, "rougeLsum Low recall": 0.2432094838451322, "rougeLsum Low F1": 0.26048483356867896, "rougeLsum Mid Precision": 0.35317671791338556, "rougeLsum Mid recall": 0.286187817596869, "rougeLsum Mid F1": 0.2985727815225495, "rougeLsum High Precision": 0.4134539668577922, "rougeLsum High recall": 0.3365998852405162, "rougeLsum High F1": 0.3454898564714797}}
|
|
|
|
|
|
reports/training_metrics.csv
ADDED
|
@@ -0,0 +1,9 @@
|
|
| 1 |
+
Name,Value,Timestamp,Step
|
| 2 |
+
"val_loss",2.615034580230713,1628591864766,0
|
| 3 |
+
"epoch",0,1628591864766,0
|
| 4 |
+
"val_loss",2.6141018867492676,1628591893945,1
|
| 5 |
+
"epoch",1,1628591893945,1
|
| 6 |
+
"val_loss",2.6132164001464844,1628591923101,2
|
| 7 |
+
"epoch",2,1628591923101,2
|
| 8 |
+
"val_loss",2.612450361251831,1628591951319,3
|
| 9 |
+
"epoch",3,1628591951319,3
|
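The training report interleaves `val_loss` and `epoch` rows that share a `Step` value. A small sketch, assuming that layout, of how the validation loss per step could be pulled out and checked for a downward trend. The function name `val_loss_by_step` is an illustrative assumption, not something defined in the repository:

```python
import csv
import io


def val_loss_by_step(csv_text):
    """Return {step: val_loss} from a Name,Value,Timestamp,Step metrics file."""
    losses = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Name"] == "val_loss":
            losses[int(row["Step"])] = float(row["Value"])
    return losses


# First four rows copied from the report above.
sample = '''Name,Value,Timestamp,Step
"val_loss",2.615034580230713,1628591864766,0
"epoch",0,1628591864766,0
"val_loss",2.6141018867492676,1628591893945,1
"epoch",1,1628591893945,1
'''

losses = val_loss_by_step(sample)
steps = sorted(losses)
# True when every step has a strictly lower loss than the one before it.
improving = all(losses[a] > losses[b] for a, b in zip(steps, steps[1:]))
print(losses, improving)
```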
reports/training_metrics.txt
DELETED
|
@@ -1 +0,0 @@
|
|
| 1 |
-
{"train_loss": 2.785480260848999, "epoch": 4, "trainer/global_step": 289, "_runtime": 88, "_timestamp": 1627353229, "_step": 9, "val_loss": 2.181020975112915}
|
|
|
|
|
|
reports/training_params.yml
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
status: success
|
reports/visualization_metrics.txt
DELETED
|
File without changes
|
requirements.txt
CHANGED
|
@@ -3,13 +3,13 @@ datasets==1.10.2
|
|
| 3 |
pytorch_lightning==1.3.5
|
| 4 |
transformers==4.9.0
|
| 5 |
torch==1.9.0
|
| 6 |
-
dagshub==0.1.
|
| 7 |
pandas==1.1.5
|
| 8 |
-
rouge_score
|
| 9 |
pyyaml
|
| 10 |
-
dvc
|
| 11 |
-
mlflow
|
| 12 |
-
wandb
|
| 13 |
|
| 14 |
# external requirements
|
| 15 |
click
|
|
|
|
| 3 |
pytorch_lightning==1.3.5
|
| 4 |
transformers==4.9.0
|
| 5 |
torch==1.9.0
|
| 6 |
+
dagshub==0.1.7
|
| 7 |
pandas==1.1.5
|
| 8 |
+
rouge_score==0.0.4
|
| 9 |
+
dvc==2.5.4
|
| 10 |
+
mlflow==1.19.0
|
| 11 |
+
streamlit==0.85.1
|
| 12 |
pyyaml
|
| 13 |
|
| 14 |
# external requirements
|
| 15 |
click
|
src/data/__init__.py
DELETED
|
File without changes
|
src/data/make_dataset.py
CHANGED
|
@@ -17,7 +17,7 @@ def make_dataset(dataset="cnn_dailymail", split="train"):
|
|
| 17 |
|
| 18 |
|
| 19 |
if __name__ == "__main__":
|
| 20 |
-
with open("
|
| 21 |
params = yaml.safe_load(f)
|
| 22 |
pprint.pprint(params)
|
| 23 |
make_dataset(dataset=params["data"], split="train")
|
|
|
|
| 17 |
|
| 18 |
|
| 19 |
if __name__ == "__main__":
|
| 20 |
+
with open("data_params.yml") as f:
|
| 21 |
params = yaml.safe_load(f)
|
| 22 |
pprint.pprint(params)
|
| 23 |
make_dataset(dataset=params["data"], split="train")
|
src/data/process_data.py
CHANGED
|
@@ -5,14 +5,12 @@ import os
|
|
| 5 |
|
| 6 |
def process_data(split="train"):
|
| 7 |
|
| 8 |
-
with open("
|
| 9 |
params = yaml.safe_load(f)
|
| 10 |
|
| 11 |
df = pd.read_csv("data/raw/{}.csv".format(split))
|
| 12 |
df.columns = ["Unnamed: 0", "input_text", "output_text"]
|
| 13 |
df = df.sample(frac=params["split"], replace=True, random_state=1)
|
| 14 |
-
if os.path.exists("data/raw/{}.csv".format(split)):
|
| 15 |
-
os.remove("data/raw/{}.csv".format(split))
|
| 16 |
df.to_csv("data/processed/{}.csv".format(split))
|
| 17 |
|
| 18 |
|
|
|
|
| 5 |
|
| 6 |
def process_data(split="train"):
|
| 7 |
|
| 8 |
+
with open("data_params.yml") as f:
|
| 9 |
params = yaml.safe_load(f)
|
| 10 |
|
| 11 |
df = pd.read_csv("data/raw/{}.csv".format(split))
|
| 12 |
df.columns = ["Unnamed: 0", "input_text", "output_text"]
|
| 13 |
df = df.sample(frac=params["split"], replace=True, random_state=1)
|
|
|
|
|
|
|
| 14 |
df.to_csv("data/processed/{}.csv".format(split))
|
| 15 |
|
| 16 |
|
src/models/evaluate_model.py
CHANGED
|
@@ -1,3 +1,4 @@
|
|
|
|
|
| 1 |
import yaml
|
| 2 |
|
| 3 |
from model import Summarization
|
|
@@ -9,7 +10,7 @@ def evaluate_model():
|
|
| 9 |
"""
|
| 10 |
Evaluate model using rouge measure
|
| 11 |
"""
|
| 12 |
-
with open("
|
| 13 |
params = yaml.safe_load(f)
|
| 14 |
|
| 15 |
test_df = pd.read_csv("data/processed/test.csv")[:25]
|
|
@@ -17,8 +18,8 @@ def evaluate_model():
|
|
| 17 |
model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
|
| 18 |
results = model.evaluate(test_df=test_df, metrics=params["metric"])
|
| 19 |
|
| 20 |
-
with
|
| 21 |
-
|
| 22 |
|
| 23 |
|
| 24 |
if __name__ == "__main__":
|
|
|
|
| 1 |
+
from dagshub import dagshub_logger
|
| 2 |
import yaml
|
| 3 |
|
| 4 |
from model import Summarization
|
|
|
|
| 10 |
"""
|
| 11 |
Evaluate model using rouge measure
|
| 12 |
"""
|
| 13 |
+
with open("model_params.yml") as f:
|
| 14 |
params = yaml.safe_load(f)
|
| 15 |
|
| 16 |
test_df = pd.read_csv("data/processed/test.csv")[:25]
|
|
|
|
| 18 |
model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
|
| 19 |
results = model.evaluate(test_df=test_df, metrics=params["metric"])
|
| 20 |
|
| 21 |
+
with dagshub_logger(metrics_path='reports/evaluation_metrics.csv', should_log_hparams=False) as logger:
|
| 22 |
+
logger.log_metrics(results)
|
| 23 |
|
| 24 |
|
| 25 |
if __name__ == "__main__":
|
src/models/hf_upload.py
ADDED
|
@@ -0,0 +1,46 @@
|
|
| 1 |
+
import shutil
|
| 2 |
+
from getpass import getpass
|
| 3 |
+
from pathlib import Path
|
| 4 |
+
import yaml
|
| 5 |
+
|
| 6 |
+
from model import Summarization
|
| 7 |
+
from huggingface_hub import HfApi, Repository
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
def upload(model_to_upload, model_name):
|
| 11 |
+
hf_username = input("Enter your HuggingFace username:")
|
| 12 |
+
hf_token = getpass("Enter your HuggingFace token:")
|
| 13 |
+
model_url = HfApi().create_repo(token=hf_token, name=model_name, exist_ok=True)
|
| 14 |
+
model_repo = Repository(
|
| 15 |
+
"./hf_model",
|
| 16 |
+
clone_from=model_url,
|
| 17 |
+
use_auth_token=hf_token,
|
| 18 |
+
git_email=f"{hf_username}@users.noreply.huggingface.co",
|
| 19 |
+
git_user=hf_username,
|
| 20 |
+
)
|
| 21 |
+
|
| 22 |
+
del hf_token
|
| 23 |
+
readme_txt = f"""
|
| 24 |
+
---
|
| 25 |
+
Summarisation model {model_name}
|
| 26 |
+
""".strip()
|
| 27 |
+
|
| 28 |
+
(Path(model_repo.local_dir) / "README.md").write_text(readme_txt)
|
| 29 |
+
commit_url = model_repo.push_to_hub()
|
| 30 |
+
|
| 31 |
+
print("Check out your model at:")
|
| 32 |
+
print(commit_url)
|
| 33 |
+
print(f"https://huggingface.co/{hf_username}/{model_name}")
|
| 34 |
+
|
| 35 |
+
if Path("./hf_model").exists():
|
| 36 |
+
shutil.rmtree("./hf_model")
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+
if __name__ == "__main__":
|
| 40 |
+
with open("model_params.yml") as f:
|
| 41 |
+
params = yaml.safe_load(f)
|
| 42 |
+
|
| 43 |
+
model = Summarization()
|
| 44 |
+
model.load_model(model_dir="./models")
|
| 45 |
+
|
| 46 |
+
upload(model_to_upload=model, model_name=params["name"])
|
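`upload()` reads the username with `input()` and the token with `getpass()`, so the `push_to_hf_hub` stage would likely wait for terminal input when run unattended. One possible variation, sketched under the assumption that credentials may instead be supplied through environment variables: `HF_USERNAME`, `HF_TOKEN`, and the helper `read_hf_credentials` are hypothetical names chosen for illustration and do not appear in this commit.

```python
import os
from getpass import getpass


def read_hf_credentials(env=os.environ, prompt_user=input, prompt_secret=getpass):
    """Return (username, token), preferring environment variables.

    HF_USERNAME and HF_TOKEN are hypothetical variable names chosen for this
    sketch; the commit itself always prompts interactively.
    """
    # Fall back to the same prompts used in hf_upload.py when a value is unset.
    username = env.get("HF_USERNAME") or prompt_user("Enter your HuggingFace username:")
    token = env.get("HF_TOKEN") or prompt_secret("Enter your HuggingFace token:")
    return username, token


# Example with injected values so nothing is read from the terminal.
creds = read_hf_credentials(
    env={"HF_USERNAME": "example-user", "HF_TOKEN": "example-token"}
)
print(creds[0])
```

The prompt functions are passed as parameters so the fallback path can be exercised without a terminal.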
src/models/model.py
CHANGED
|
@@ -1,10 +1,7 @@
|
|
| 1 |
-
|
| 2 |
-
from getpass import getpass
|
| 3 |
-
from pathlib import Path
|
| 4 |
|
| 5 |
import torch
|
| 6 |
import pandas as pd
|
| 7 |
-
from huggingface_hub import HfApi, Repository
|
| 8 |
from transformers import (
|
| 9 |
AdamW,
|
| 10 |
T5ForConditionalGeneration,
|
|
@@ -15,7 +12,7 @@ from transformers import (
|
|
| 15 |
)
|
| 16 |
from torch.utils.data import Dataset, DataLoader
|
| 17 |
import pytorch_lightning as pl
|
| 18 |
-
from pytorch_lightning
|
| 19 |
from pytorch_lightning import Trainer
|
| 20 |
from pytorch_lightning.callbacks.early_stopping import EarlyStopping
|
| 21 |
from pytorch_lightning import LightningDataModule
|
|
@@ -23,8 +20,7 @@ from pytorch_lightning import LightningModule
|
|
| 23 |
from datasets import load_metric
|
| 24 |
from tqdm.auto import tqdm
|
| 25 |
|
| 26 |
-
|
| 27 |
-
|
| 28 |
|
| 29 |
torch.cuda.empty_cache()
|
| 30 |
pl.seed_everything(42)
|
|
@@ -274,7 +270,9 @@ class LightningModel(LightningModule):
|
|
| 274 |
},
|
| 275 |
]
|
| 276 |
optimizer = AdamW(
|
| 277 |
-
optimizer_grouped_parameters,
|
|
|
|
|
|
|
| 278 |
)
|
| 279 |
self.opt = optimizer
|
| 280 |
return [optimizer]
|
|
@@ -364,14 +362,8 @@ class Summarization:
|
|
| 364 |
weight_decay=weight_decay,
|
| 365 |
)
|
| 366 |
|
| 367 |
-
|
| 368 |
-
|
| 369 |
-
tracking_uri="https://dagshub.com/gagan3012/summarization.mlflow",
|
| 370 |
-
)
|
| 371 |
-
|
| 372 |
-
WandLogger = WandbLogger(project="summarization-dagshub")
|
| 373 |
-
|
| 374 |
-
# logger = DAGsHubLogger(metrics_path='reports/training_metrics.txt')
|
| 375 |
|
| 376 |
early_stop_callback = (
|
| 377 |
[
|
|
@@ -390,14 +382,17 @@ class Summarization:
|
|
| 390 |
gpus = -1 if use_gpu and torch.cuda.is_available() else 0
|
| 391 |
|
| 392 |
trainer = Trainer(
|
| 393 |
-
logger=
|
| 394 |
callbacks=early_stop_callback,
|
| 395 |
max_epochs=max_epochs,
|
| 396 |
gpus=gpus,
|
| 397 |
progress_bar_refresh_rate=5,
|
| 398 |
)
|
| 399 |
|
| 400 |
-
|
| 401 |
|
| 402 |
def load_model(
|
| 403 |
self, model_type: str = "t5", model_dir: str = "models", use_gpu: bool = False
|
|
@@ -552,31 +547,3 @@ class Summarization:
|
|
| 552 |
"rougeLsum High F1": results["rougeLsum"].high.fmeasure,
|
| 553 |
}
|
| 554 |
return output
|
| 555 |
-
|
| 556 |
-
def upload(self, hf_username, model_name):
|
| 557 |
-
hf_password = getpass("Enter your HuggingFace password")
|
| 558 |
-
if Path("./models").exists():
|
| 559 |
-
shutil.rmtree("./models")
|
| 560 |
-
token = HfApi().login(username=hf_username, password=hf_password)
|
| 561 |
-
del hf_password
|
| 562 |
-
model_url = HfApi().create_repo(token=token, name=model_name, exist_ok=True)
|
| 563 |
-
model_repo = Repository(
|
| 564 |
-
"./model",
|
| 565 |
-
clone_from=model_url,
|
| 566 |
-
use_auth_token=token,
|
| 567 |
-
git_email=f"{hf_username}@users.noreply.huggingface.co",
|
| 568 |
-
git_user=hf_username,
|
| 569 |
-
)
|
| 570 |
-
|
| 571 |
-
readme_txt = f"""
|
| 572 |
-
---
|
| 573 |
-
Summarisation model {model_name}
|
| 574 |
-
""".strip()
|
| 575 |
-
|
| 576 |
-
(Path(model_repo.local_dir) / "README.md").write_text(readme_txt)
|
| 577 |
-
self.save_model()
|
| 578 |
-
commit_url = model_repo.push_to_hub()
|
| 579 |
-
|
| 580 |
-
print("Check out your model at:")
|
| 581 |
-
print(commit_url)
|
| 582 |
-
print(f"https://huggingface.co/{hf_username}/{model_name}")
|
|
|
|
| 1 |
+
|
|
|
|
|
|
|
| 2 |
|
| 3 |
import torch
|
| 4 |
import pandas as pd
|
|
|
|
| 5 |
from transformers import (
|
| 6 |
AdamW,
|
| 7 |
T5ForConditionalGeneration,
|
|
|
|
| 12 |
)
|
| 13 |
from torch.utils.data import Dataset, DataLoader
|
| 14 |
import pytorch_lightning as pl
|
| 15 |
+
from dagshub.pytorch_lightning import DAGsHubLogger
|
| 16 |
from pytorch_lightning import Trainer
|
| 17 |
from pytorch_lightning.callbacks.early_stopping import EarlyStopping
|
| 18 |
from pytorch_lightning import LightningDataModule
|
|
|
|
| 20 |
from datasets import load_metric
|
| 21 |
from tqdm.auto import tqdm
|
| 22 |
|
| 23 |
+
import mlflow.pytorch
|
|
|
|
| 24 |
|
| 25 |
torch.cuda.empty_cache()
|
| 26 |
pl.seed_everything(42)
|
|
|
|
| 270 |
},
|
| 271 |
]
|
| 272 |
optimizer = AdamW(
|
| 273 |
+
optimizer_grouped_parameters,
|
| 274 |
+
lr=self.learning_rate,
|
| 275 |
+
eps=self.adam_epsilon,
|
| 276 |
)
|
| 277 |
self.opt = optimizer
|
| 278 |
return [optimizer]
|
|
|
|
| 362 |
weight_decay=weight_decay,
|
| 363 |
)
|
| 364 |
|
| 365 |
+
logger = DAGsHubLogger(metrics_path='reports/training_metrics.csv',
|
| 366 |
+
hparams_path='reports/training_params.yml')
|
| 367 |
|
| 368 |
early_stop_callback = (
|
| 369 |
[
|
|
|
|
| 382 |
gpus = -1 if use_gpu and torch.cuda.is_available() else 0
|
| 383 |
|
| 384 |
trainer = Trainer(
|
| 385 |
+
logger=logger,
|
| 386 |
callbacks=early_stop_callback,
|
| 387 |
max_epochs=max_epochs,
|
| 388 |
gpus=gpus,
|
| 389 |
progress_bar_refresh_rate=5,
|
| 390 |
)
|
| 391 |
|
| 392 |
+
mlflow.pytorch.autolog(log_models=False)
|
| 393 |
+
|
| 394 |
+
with mlflow.start_run() as run:
|
| 395 |
+
trainer.fit(self.T5Model, self.data_module)
|
| 396 |
|
| 397 |
def load_model(
|
| 398 |
self, model_type: str = "t5", model_dir: str = "models", use_gpu: bool = False
|
|
|
|
| 547 |
"rougeLsum High F1": results["rougeLsum"].high.fmeasure,
|
| 548 |
}
|
| 549 |
return output
|
src/models/predict_model.py
CHANGED
|
@@ -8,11 +8,11 @@ def predict_model(text):
|
|
| 8 |
"""
|
| 9 |
Predict the summary of the given text.
|
| 10 |
"""
|
| 11 |
-
with open("
|
| 12 |
params = yaml.safe_load(f)
|
| 13 |
|
| 14 |
model = Summarization()
|
| 15 |
-
model.load_model(model_type=params["model_type"], model_dir=
|
| 16 |
pre_summary = model.predict(text)
|
| 17 |
return pre_summary
|
| 18 |
|
|
|
|
| 8 |
"""
|
| 9 |
Predict the summary of the given text.
|
| 10 |
"""
|
| 11 |
+
with open("model_params.yml") as f:
|
| 12 |
params = yaml.safe_load(f)
|
| 13 |
|
| 14 |
model = Summarization()
|
| 15 |
+
model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
|
| 16 |
pre_summary = model.predict(text)
|
| 17 |
return pre_summary
|
| 18 |
|
src/models/train_model.py
CHANGED
|
@@ -1,5 +1,3 @@
|
|
| 1 |
-
import json
|
| 2 |
-
|
| 3 |
import yaml
|
| 4 |
|
| 5 |
from model import Summarization
|
|
@@ -10,15 +8,15 @@ def train_model():
|
|
| 10 |
"""
|
| 11 |
Train the model
|
| 12 |
"""
|
| 13 |
-
with open("
|
| 14 |
params = yaml.safe_load(f)
|
| 15 |
|
| 16 |
# Load the data
|
| 17 |
train_df = pd.read_csv("data/processed/train.csv")
|
| 18 |
eval_df = pd.read_csv("data/processed/validation.csv")
|
| 19 |
|
| 20 |
-
train_df = train_df.sample(
|
| 21 |
-
eval_df = eval_df.sample(
|
| 22 |
|
| 23 |
model = Summarization()
|
| 24 |
model.from_pretrained(
|
|
@@ -37,15 +35,6 @@ def train_model():
|
|
| 37 |
|
| 38 |
model.save_model(model_dir=params["model_dir"])
|
| 39 |
|
| 40 |
-
with open("wandb/latest-run/files/wandb-summary.json") as json_file:
|
| 41 |
-
data = json.load(json_file)
|
| 42 |
-
|
| 43 |
-
with open("reports/training_metrics.txt", "w") as fp:
|
| 44 |
-
json.dump(data, fp)
|
| 45 |
-
|
| 46 |
-
if params["upload_to_hf"]:
|
| 47 |
-
model.upload(hf_username=params["hf_username"], model_name=params["name"])
|
| 48 |
-
|
| 49 |
|
| 50 |
if __name__ == "__main__":
|
| 51 |
train_model()
|
|
|
|
|
|
|
|
|
|
| 1 |
import yaml
|
| 2 |
|
| 3 |
from model import Summarization
|
|
|
|
| 8 |
"""
|
| 9 |
Train the model
|
| 10 |
"""
|
| 11 |
+
with open("model_params.yml") as f:
|
| 12 |
params = yaml.safe_load(f)
|
| 13 |
|
| 14 |
# Load the data
|
| 15 |
train_df = pd.read_csv("data/processed/train.csv")
|
| 16 |
eval_df = pd.read_csv("data/processed/validation.csv")
|
| 17 |
|
| 18 |
+
train_df = train_df.sample(random_state=1)
|
| 19 |
+
eval_df = eval_df.sample(random_state=1)
|
| 20 |
|
| 21 |
model = Summarization()
|
| 22 |
model.from_pretrained(
|
|
|
|
| 35 |
|
| 36 |
model.save_model(model_dir=params["model_dir"])
|
| 38 |
|
| 39 |
if __name__ == "__main__":
|
| 40 |
train_model()
|
src/visualization/__init__.py
DELETED
|
File without changes
|
src/visualization/visualize.py
CHANGED
|
@@ -1,5 +1,4 @@
|
|
| 1 |
import streamlit as st
|
| 2 |
-
import yaml
|
| 3 |
|
| 4 |
from src.models.predict_model import predict_model
|
| 5 |
|
|
@@ -19,14 +18,7 @@ def visualize():
|
|
| 19 |
sumtext = predict_model(text=text)
|
| 20 |
st.write("# Generated Summary:")
|
| 21 |
st.write("{}".format(sumtext))
|
| 22 |
-
with open("reports/visualization_metrics.txt", "w") as file1:
|
| 23 |
-
file1.writelines(text)
|
| 24 |
-
file1.writelines(sumtext)
|
| 25 |
|
| 26 |
|
| 27 |
if __name__ == "__main__":
|
| 28 |
-
|
| 29 |
-
params = yaml.safe_load(f)
|
| 30 |
-
|
| 31 |
-
if params["visualise"]:
|
| 32 |
-
visualize()
|
|
|
|
| 1 |
import streamlit as st
|
|
|
|
| 2 |
|
| 3 |
from src.models.predict_model import predict_model
|
| 4 |
|
|
|
|
| 18 |
sumtext = predict_model(text=text)
|
| 19 |
st.write("# Generated Summary:")
|
| 20 |
st.write("{}".format(sumtext))
|
| 21 |
|
| 22 |
|
| 23 |
if __name__ == "__main__":
|
| 24 |
+
visualize()
|