Merge branch 'fix-mlflow' of Dean/summarization into master
- .github/CODE_OF_CONDUCT.md +0 -128
- .github/CONTRIBUTING.md +0 -92
- .github/FUNDING.yml +0 -12
- .github/ISSUE_TEMPLATE/bug_report.md +0 -38
- .github/ISSUE_TEMPLATE/feature_request.md +0 -20
- .github/PULL_REQUEST_TEMPLATE.md +0 -29
- .gitignore +4 -1
- Makefile +9 -1
- app.py +0 -32
- data.dvc +0 -14
- data_params.yml +2 -0
- dvc.lock +79 -42
- dvc.yaml +18 -9
- params.yml → model_params.yml +5 -10
- reports/evaluation_metrics.csv +37 -0
- reports/evaluation_metrics.txt +0 -1
- reports/training_metrics.csv +9 -0
- reports/training_metrics.txt +0 -1
- reports/training_params.yml +1 -0
- reports/visualization_metrics.txt +0 -0
- requirements.txt +5 -5
- src/data/__init__.py +0 -0
- src/data/make_dataset.py +1 -1
- src/data/process_data.py +1 -3
- src/models/evaluate_model.py +4 -3
- src/models/hf_upload.py +46 -0
- src/models/model.py +13 -46
- src/models/predict_model.py +2 -2
- src/models/train_model.py +3 -14
- src/visualization/__init__.py +0 -0
- src/visualization/visualize.py +1 -9
.github/CODE_OF_CONDUCT.md
DELETED
@@ -1,128 +0,0 @@
-# Contributor Covenant Code of Conduct
-
-## Our Pledge
-
-We as members, contributors, and leaders pledge to make participation in our
-community a harassment-free experience for everyone, regardless of age, body
-size, visible or invisible disability, ethnicity, sex characteristics, gender
-identity and expression, level of experience, education, socio-economic status,
-nationality, personal appearance, race, religion, or sexual identity
-and orientation.
-
-We pledge to act and interact in ways that contribute to an open, welcoming,
-diverse, inclusive, and healthy community.
-
-## Our Standards
-
-Examples of behavior that contributes to a positive environment for our
-community include:
-
-* Demonstrating empathy and kindness toward other people
-* Being respectful of differing opinions, viewpoints, and experiences
-* Giving and gracefully accepting constructive feedback
-* Accepting responsibility and apologizing to those affected by our mistakes,
-and learning from the experience
-* Focusing on what is best not just for us as individuals, but for the
-overall community
-
-Examples of unacceptable behavior include:
-
-* The use of sexualized language or imagery, and sexual attention or
-advances of any kind
-* Trolling, insulting or derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or email
-address, without their explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-professional setting
-
-## Enforcement Responsibilities
-
-Community leaders are responsible for clarifying and enforcing our standards of
-acceptable behavior and will take appropriate and fair corrective action in
-response to any behavior that they deem inappropriate, threatening, offensive,
-or harmful.
-
-Community leaders have the right and responsibility to remove, edit, or reject
-comments, commits, code, wiki edits, issues, and other contributions that are
-not aligned to this Code of Conduct, and will communicate reasons for moderation
-decisions when appropriate.
-
-## Scope
-
-This Code of Conduct applies within all community spaces, and also applies when
-an individual is officially representing the community in public spaces.
-Examples of representing our community include using an official e-mail address,
-posting via an official social media account, or acting as an appointed
-representative at an online or offline event.
-
-## Enforcement
-
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported to the community leaders responsible for enforcement at
-@gagan3012.
-All complaints will be reviewed and investigated promptly and fairly.
-
-All community leaders are obligated to respect the privacy and security of the
-reporter of any incident.
-
-## Enforcement Guidelines
-
-Community leaders will follow these Community Impact Guidelines in determining
-the consequences for any action they deem in violation of this Code of Conduct:
-
-### 1. Correction
-
-**Community Impact**: Use of inappropriate language or other behavior deemed
-unprofessional or unwelcome in the community.
-
-**Consequence**: A private, written warning from community leaders, providing
-clarity around the nature of the violation and an explanation of why the
-behavior was inappropriate. A public apology may be requested.
-
-### 2. Warning
-
-**Community Impact**: A violation through a single incident or series
-of actions.
-
-**Consequence**: A warning with consequences for continued behavior. No
-interaction with the people involved, including unsolicited interaction with
-those enforcing the Code of Conduct, for a specified period of time. This
-includes avoiding interactions in community spaces as well as external channels
-like social media. Violating these terms may lead to a temporary or
-permanent ban.
-
-### 3. Temporary Ban
-
-**Community Impact**: A serious violation of community standards, including
-sustained inappropriate behavior.
-
-**Consequence**: A temporary ban from any sort of interaction or public
-communication with the community for a specified period of time. No public or
-private interaction with the people involved, including unsolicited interaction
-with those enforcing the Code of Conduct, is allowed during this period.
-Violating these terms may lead to a permanent ban.
-
-### 4. Permanent Ban
-
-**Community Impact**: Demonstrating a pattern of violation of community
-standards, including sustained inappropriate behavior, harassment of an
-individual, or aggression toward or disparagement of classes of individuals.
-
-**Consequence**: A permanent ban from any sort of public interaction within
-the community.
-
-## Attribution
-
-This Code of Conduct is adapted from the [Contributor Covenant][homepage],
-version 2.0, available at
-https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
-
-Community Impact Guidelines were inspired by [Mozilla's code of conduct
-enforcement ladder](https://github.com/mozilla/diversity).
-
-[homepage]: https://www.contributor-covenant.org
-
-For answers to common questions about this code of conduct, see the FAQ at
-https://www.contributor-covenant.org/faq. Translations are available at
-https://www.contributor-covenant.org/translations.
.github/CONTRIBUTING.md
DELETED
@@ -1,92 +0,0 @@
-# Contributing
-
-When contributing to this repository, please first discuss the change you wish to make via issue,
-email, or any other method with the owners of this repository before making a change.
-
-Please note we have a code of conduct, please follow it in all your interactions with the project.
-
-## Pull Request Process
-
-1. Ensure any install or build dependencies are removed before the end of the layer when doing a
-build.
-2. Update the README.md with details of changes to the interface, this includes new environment
-variables, exposed ports, useful file locations and container parameters.
-3. Increase the version numbers in any examples files and the README.md to the new version that this
-Pull Request would represent. The versioning scheme we use is [SemVer](http://semver.org/).
-4. You may merge the Pull Request in once you have the sign-off of two other developers, or if you
-do not have permission to do that, you may request the second reviewer to merge it for you.
-
-## Code of Conduct
-
-### Our Pledge
-
-In the interest of fostering an open and welcoming environment, we as
-contributors and maintainers pledge to making participation in our project and
-our community a harassment-free experience for everyone, regardless of age, body
-size, disability, ethnicity, gender identity and expression, level of experience,
-nationality, personal appearance, race, religion, or sexual identity and
-orientation.
-
-### Our Standards
-
-Examples of behavior that contributes to creating a positive environment
-include:
-
-* Using welcoming and inclusive language
-* Being respectful of differing viewpoints and experiences
-* Gracefully accepting constructive criticism
-* Focusing on what is best for the community
-* Showing empathy towards other community members
-
-Examples of unacceptable behavior by participants include:
-
-* The use of sexualized language or imagery and unwelcome sexual attention or
-advances
-* Trolling, insulting/derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or electronic
-address, without explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-professional setting
-
-### Our Responsibilities
-
-Project maintainers are responsible for clarifying the standards of acceptable
-behavior and are expected to take appropriate and fair corrective action in
-response to any instances of unacceptable behavior.
-
-Project maintainers have the right and responsibility to remove, edit, or
-reject comments, commits, code, wiki edits, issues, and other contributions
-that are not aligned to this Code of Conduct, or to ban temporarily or
-permanently any contributor for other behaviors that they deem inappropriate,
-threatening, offensive, or harmful.
-
-### Scope
-
-This Code of Conduct applies both within project spaces and in public spaces
-when an individual is representing the project or its community. Examples of
-representing a project or community include using an official project e-mail
-address, posting via an official social media account, or acting as an appointed
-representative at an online or offline event. Representation of a project may be
-further defined and clarified by project maintainers.
-
-### Enforcement
-
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported by contacting the project team at [INSERT EMAIL ADDRESS]. All
-complaints will be reviewed and investigated and will result in a response that
-is deemed necessary and appropriate to the circumstances. The project team is
-obligated to maintain confidentiality with regard to the reporter of an incident.
-Further details of specific enforcement policies may be posted separately.
-
-Project maintainers who do not follow or enforce the Code of Conduct in good
-faith may face temporary or permanent repercussions as determined by other
-members of the project's leadership.
-
-### Attribution
-
-This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
-available at [http://contributor-covenant.org/version/1/4][version]
-
-[homepage]: http://contributor-covenant.org
-[version]: http://contributor-covenant.org/version/1/4/
.github/FUNDING.yml
DELETED
@@ -1,12 +0,0 @@
-# These are supported funding model platforms
-
-github: gagan3012 # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
-patreon: # Replace with a single Patreon username
-open_collective: # Replace with a single Open Collective username
-ko_fi: # Replace with a single Ko-fi username
-tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
-community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
-liberapay: # Replace with a single Liberapay username
-issuehunt: # Replace with a single IssueHunt username
-otechie: # Replace with a single Otechie username
-custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
.github/ISSUE_TEMPLATE/bug_report.md
DELETED
@@ -1,38 +0,0 @@
----
-name: Bug report
-about: Create a report to help us improve
-title: ''
-labels: ''
-assignees: ''
-
----
-
-**Describe the bug**
-A clear and concise description of what the bug is.
-
-**To Reproduce**
-Steps to reproduce the behavior:
-1. Go to '...'
-2. Click on '....'
-3. Scroll down to '....'
-4. See error
-
-**Expected behavior**
-A clear and concise description of what you expected to happen.
-
-**Screenshots**
-If applicable, add screenshots to help explain your problem.
-
-**Desktop (please complete the following information):**
-- OS: [e.g. iOS]
-- Browser [e.g. chrome, safari]
-- Version [e.g. 22]
-
-**Smartphone (please complete the following information):**
-- Device: [e.g. iPhone6]
-- OS: [e.g. iOS8.1]
-- Browser [e.g. stock browser, safari]
-- Version [e.g. 22]
-
-**Additional context**
-Add any other context about the problem here.
.github/ISSUE_TEMPLATE/feature_request.md
DELETED
@@ -1,20 +0,0 @@
----
-name: Feature request
-about: Suggest an idea for this project
-title: ''
-labels: ''
-assignees: ''
-
----
-
-**Is your feature request related to a problem? Please describe.**
-A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
-
-**Describe the solution you'd like**
-A clear and concise description of what you want to happen.
-
-**Describe alternatives you've considered**
-A clear and concise description of any alternative solutions or features you've considered.
-
-**Additional context**
-Add any other context or screenshots about the feature request here.
.github/PULL_REQUEST_TEMPLATE.md
DELETED
@@ -1,29 +0,0 @@
-<!--- Provide a general summary of your changes in the Title above -->
-
-## Description
-<!--- Describe your changes in detail -->
-
-## Motivation and Context
-<!--- Why is this change required? What problem does it solve? -->
-<!--- If it fixes an open issue, please link to the issue here. -->
-
-## How Has This Been Tested?
-<!--- Please describe in detail how you tested your changes. -->
-<!--- Include details of your testing environment, and the tests you ran to -->
-<!--- see how your change affects other areas of the code, etc. -->
-
-## Screenshots (if appropriate):
-
-## Types of changes
-<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
-- [ ] Bug fix (non-breaking change which fixes an issue)
-- [ ] New feature (non-breaking change which adds functionality)
-- [ ] Breaking change (fix or feature that would cause existing functionality to change)
-
-## Checklist:
-<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
-<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
-- [ ] My code follows the code style of this project.
-- [ ] My change requires a change to the documentation.
-- [ ] I have updated the documentation accordingly.
-- [ ] I have read the **CONTRIBUTING** document.
.gitignore
CHANGED
@@ -93,6 +93,9 @@ coverage.xml
 .vscode
 /data
 
-wandb/
 summarization-dagshub/
 /models
+default/
+artifacts/
+mlruns/
+hf_model/
Makefile
CHANGED
@@ -48,7 +48,15 @@ pull:
 
 ## run the DVC pipeline - recompute any modified outputs such as processed data or trained models
 run:
-	dvc repro
+	dvc repro eval
+
+## run the visualization using Streamlit
+visualize:
+	dvc repro visualize
+
+## push the trained model to HF model hub
+push_to_hf_hub:
+	dvc repro push_to_hf_hub
 
 #################################################################################
 # PROJECT RULES #
app.py
DELETED
@@ -1,32 +0,0 @@
-import streamlit as st
-import yaml
-
-from src.models.predict_model import predict_model
-
-
-def visualize():
-    st.write("# Summarization UI")
-    st.markdown(
-        """
-        *For additional questions and inquiries, please contact **Gagan Bhatia** via [LinkedIn](
-        https://www.linkedin.com/in/gbhatia30/) or [Github](https://github.com/gagan3012).*
-        """
-    )
-
-    text = st.text_area("Enter text here")
-    if st.button("Generate Summary"):
-        with st.spinner("Connecting the Dots..."):
-            sumtext = predict_model(text=text)
-        st.write("# Generated Summary:")
-        st.write("{}".format(sumtext))
-        with open("reports/visualization_metrics.txt", "w") as file1:
-            file1.writelines(text)
-            file1.writelines(sumtext)
-
-
-if __name__ == "__main__":
-    with open("params.yml") as f:
-        params = yaml.safe_load(f)
-
-    if params["visualise"]:
-        visualize()
data.dvc
DELETED
@@ -1,14 +0,0 @@
-deps:
-- path: params.yml
-  md5: d0f3e81bc9191e752a69761045a449d9
-  size: 196
-- path: src/data/make_dataset.py
-  md5: 9de71de0f8df5d0a7beb235ef7c7777d
-  size: 772
-cmd: python src/data/make_dataset.py
-outs:
-- md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
-  nfiles: 3
-  path: data/raw
-  size: 1358833013
-md5: ff502232006c7fbef1015b5aa5cc4bbb
data_params.yml
ADDED
@@ -0,0 +1,2 @@
+data: cnn_dailymail
+split: 0.001
dvc.lock
CHANGED
@@ -4,65 +4,102 @@ stages:
     cmd: python src/models/train_model.py
     deps:
     - path: data/processed/train.csv
-      md5:
-      size:
+      md5: 5331b9c32b2d097d8d7aca01de5524bc
+      size: 1198262
     - path: data/processed/validation.csv
-      md5:
-      size:
-    - path:
-      md5:
-      size:
+      md5: 6069153a075b00dfb6d9e0843dd2da89
+      size: 52739
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
     - path: src/models/train_model.py
-      md5:
-      size:
+      md5: f7d1121426c3d5530c2b9697cb7ac74a
+      size: 951
     outs:
     - path: models
-      md5:
-      size:
-      nfiles:
-    - path: reports/training_metrics.
-      md5:
-      size:
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: reports/training_metrics.csv
+      md5: 3b309def91a32e521acd23b163742522
+      size: 320
   eval:
     cmd: python src/models/evaluate_model.py
     deps:
     - path: data/processed/test.csv
-      md5:
-      size:
+      md5: 3eec94ac211c76363a3d968663b82d02
+      size: 39574
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
     - path: models
-      md5:
-      size:
-      nfiles:
-    - path: params.yml
-      md5: d0f3e81bc9191e752a69761045a449d9
-      size: 196
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
     - path: src/models/evaluate_model.py
-      md5:
-      size:
+      md5: 89edb77aaab3055605ae6db2e21eab82
+      size: 705
     outs:
-    - path: reports/
-      md5:
-      size:
+    - path: reports/evaluation_metrics.csv
+      md5: eaa3bf017026aa1be31560f308fff78e
+      size: 2122
   process_data:
     cmd: python src/data/process_data.py
     deps:
     - path: data/raw
-      md5:
-      size:
-      nfiles:
-    - path:
-      md5:
-      size:
+      md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
+      size: 1358833013
+      nfiles: 3
+    - path: data_params.yml
+      md5: a68eabf79c3b3e28afb05baa1944bbc7
+      size: 32
     - path: src/data/process_data.py
-      md5:
-      size:
+      md5: 68db554a69a0c8ce807907afa2be5e9c
+      size: 521
     outs:
     - path: data/processed/test.csv
-      md5:
-      size:
+      md5: 3eec94ac211c76363a3d968663b82d02
+      size: 39574
     - path: data/processed/train.csv
-      md5:
-      size:
+      md5: 5331b9c32b2d097d8d7aca01de5524bc
+      size: 1198262
     - path: data/processed/validation.csv
-      md5:
-      size:
+      md5: 6069153a075b00dfb6d9e0843dd2da89
+      size: 52739
+  download_data:
+    cmd: python src/data/make_dataset.py
+    deps:
+    - path: data_params.yml
+      md5: a68eabf79c3b3e28afb05baa1944bbc7
+      size: 32
+    - path: src/data/make_dataset.py
+      md5: a0667f4ad8c06551609bd0bf950167b7
+      size: 776
+    outs:
+    - path: data/raw
+      md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
+      size: 1358833013
+      nfiles: 3
+  visualize:
+    cmd: streamlit run src/visualization/visualize.py
+    deps:
+    - path: models
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: src/visualization/visualize.py
+      md5: 4226e4148abb5ac186c0ab8c1d87b228
+      size: 671
+  push_to_hf_hub:
+    cmd: python src/models/hf_upload.py
+    deps:
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
+    - path: models
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: src/models/hf_upload.py
+      md5: a953816a3eb7bef702313544103a1c11
+      size: 1290
dvc.yaml
CHANGED
@@ -1,8 +1,15 @@
 stages:
+  download_data:
+    cmd: python src/data/make_dataset.py
+    deps:
+    - data_params.yml
+    - src/data/make_dataset.py
+    outs:
+    - data/raw
   process_data:
     cmd: python src/data/process_data.py
     deps:
-    -
+    - data_params.yml
     - data/raw
     - src/data/process_data.py
     outs:
@@ -18,7 +25,7 @@ stages:
   train:
     cmd: python src/models/train_model.py
     deps:
-    -
+    - model_params.yml
     - data/processed/train.csv
     - data/processed/validation.csv
     - src/models/train_model.py
@@ -26,25 +33,27 @@ stages:
     - models:
         persist: true
     metrics:
-    - reports/training_metrics.
+    - reports/training_metrics.csv:
         cache: false
   eval:
     cmd: python src/models/evaluate_model.py
     deps:
-    -
+    - model_params.yml
     - data/processed/test.csv
     - models
     - src/models/evaluate_model.py
     metrics:
-    - reports/evaluation_metrics.
+    - reports/evaluation_metrics.csv:
         cache: false
   visualize:
     cmd: streamlit run src/visualization/visualize.py
     deps:
     - models
     - src/visualization/visualize.py
-
-
-
-
+  push_to_hf_hub:
+    cmd: python src/models/hf_upload.py
+    deps:
+    - model_params.yml
+    - src/models/hf_upload.py
+    - models
 
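The stages in `dvc.yaml` are linked only through their `deps` and outputs, which is why the Makefile can call `dvc repro eval` and still have the earlier stages considered. The sketch below illustrates that idea with the standard-library `graphlib`. DVC builds this graph itself; the `STAGES` table is hand-copied from the `dvc.yaml` and `dvc.lock` hunks in this commit, and the helper names are hypothetical.

```python
from graphlib import TopologicalSorter

# stage -> (deps, outs), hand-copied from the hunks in this commit.
STAGES = {
    "download_data": (["data_params.yml", "src/data/make_dataset.py"], ["data/raw"]),
    "process_data": (
        ["data_params.yml", "data/raw", "src/data/process_data.py"],
        ["data/processed/train.csv", "data/processed/validation.csv",
         "data/processed/test.csv"],
    ),
    "train": (
        ["model_params.yml", "data/processed/train.csv",
         "data/processed/validation.csv", "src/models/train_model.py"],
        ["models", "reports/training_metrics.csv"],
    ),
    "eval": (
        ["model_params.yml", "data/processed/test.csv", "models",
         "src/models/evaluate_model.py"],
        ["reports/evaluation_metrics.csv"],
    ),
    "visualize": (["models", "src/visualization/visualize.py"], []),
    "push_to_hf_hub": (["model_params.yml", "src/models/hf_upload.py", "models"], []),
}


def upstream_stages(stages):
    """Map each stage to the set of stages that produce one of its deps."""
    producer = {out: name for name, (_, outs) in stages.items() for out in outs}
    return {
        name: {producer[dep] for dep in deps if dep in producer}
        for name, (deps, _) in stages.items()
    }


def execution_order(stages):
    """Return one valid order in which all stages could run."""
    return list(TopologicalSorter(upstream_stages(stages)).static_order())


def stages_needed_for(target, stages):
    """Return the target plus every stage it transitively depends on."""
    graph = upstream_stages(stages)
    needed, stack = set(), [target]
    while stack:
        stage = stack.pop()
        if stage not in needed:
            needed.add(stage)
            stack.extend(graph[stage])
    return needed


print(execution_order(STAGES))
print(sorted(stages_needed_for("eval", STAGES)))
```

Under these assumptions, `eval` pulls in `download_data`, `process_data` and `train`, while `visualize` and `push_to_hf_hub` stay out of the default `make run` path.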
params.yml → model_params.yml
RENAMED
@@ -1,16 +1,11 @@
 name: summarsiation
-data: cnn_dailymail
-batch_size: 2
-num_workers: 2
 model_type: t5
 model_name: t5-small
-
+batch_size: 2
 epochs: 5
-
+use_gpu: True
+learning_rate: 1e-4
+num_workers: 2
 model_dir: models
 metric: rouge
-
-use_gpu: True
-visualise: True
-hf_username: gagan3012
-upload_to_hf: True
+source_dir: src
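One detail of the new `learning_rate: 1e-4` entry is worth a note: PyYAML follows YAML 1.1, where a float written without a decimal point in the mantissa is typically loaded as the string `'1e-4'` rather than a number. Whether the training code in this repository is affected is not visible in the diff. The sketch below is a defensive conversion under that assumption; `coerce_numeric` and the `loaded` dictionary are hypothetical stand-ins for the result of `yaml.safe_load`.

```python
def coerce_numeric(params,
                   float_keys=("learning_rate",),
                   int_keys=("batch_size", "epochs", "num_workers")):
    """Return a copy of params with the listed keys converted to float/int."""
    fixed = dict(params)
    for key in float_keys:
        if key in fixed:
            fixed[key] = float(fixed[key])
    for key in int_keys:
        if key in fixed:
            fixed[key] = int(fixed[key])
    return fixed


# Assumed shape of the loaded model_params.yml; note the string learning rate.
loaded = {
    "name": "summarsiation", "model_type": "t5", "model_name": "t5-small",
    "batch_size": 2, "epochs": 5, "use_gpu": True,
    "learning_rate": "1e-4",
    "num_workers": 2, "model_dir": "models", "metric": "rouge",
    "source_dir": "src",
}

params = coerce_numeric(loaded)
print(type(params["learning_rate"]).__name__, params["learning_rate"])
```

Writing the value as `1.0e-4` in the YAML file would be an alternative that avoids the conversion step.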
reports/evaluation_metrics.csv
ADDED
@@ -0,0 +1,37 @@
+Name,Value,Timestamp,Step
+"Rouge_1 Low Precision",0.23786550570641482,1628587253223,1
+"Rouge_1 Low recall",0.23355396379384713,1628587253223,1
+"Rouge_1 Low F1",0.23602599457077003,1628587253223,1
+"Rouge_1 Mid Precision",0.3569471852499436,1628587253223,1
+"Rouge_1 Mid recall",0.31915939075819916,1628587253223,1
+"Rouge_1 Mid F1",0.3317618573023773,1628587253223,1
+"Rouge_1 High Precision",0.4726861301480842,1628587253223,1
+"Rouge_1 High recall",0.4019654200001146,1628587253223,1
+"Rouge_1 High F1",0.4298956952594035,1628587253223,1
+"Rouge_2 Low Precision",0.06184772400193972,1628587253223,1
+"Rouge_2 Low recall",0.05626972412346313,1628587253223,1
+"Rouge_2 Low F1",0.058680298802341754,1628587253223,1
+"Rouge_2 Mid Precision",0.1367034298993256,1628587253223,1
+"Rouge_2 Mid recall",0.11953160646342464,1628587253223,1
+"Rouge_2 Mid F1",0.12485064123505887,1628587253223,1
+"Rouge_2 High Precision",0.22739029631016827,1628587253223,1
+"Rouge_2 High recall",0.18851628169809986,1628587253223,1
+"Rouge_2 High F1",0.20306657551189072,1628587253223,1
+"Rouge_L Low Precision",0.18248956154159507,1628587253223,1
+"Rouge_L Low recall",0.18048774357814204,1628587253223,1
+"Rouge_L Low F1",0.18151380309623336,1628587253223,1
+"Rouge_L Mid Precision",0.2614974838710314,1628587253223,1
+"Rouge_L Mid recall",0.24286688705755238,1628587253223,1
+"Rouge_L Mid F1",0.24674586991996245,1628587253223,1
+"Rouge_L High Precision",0.3574471638807763,1628587253223,1
+"Rouge_L High recall",0.30836083808542225,1628587253223,1
+"Rouge_L High F1",0.32385446385474176,1628587253223,1
+"rougeLsum Low Precision",0.21468633089019287,1628587253223,1
+"rougeLsum Low recall",0.2057771050364415,1628587253223,1
+"rougeLsum Low F1",0.21170611912426093,1628587253223,1
+"rougeLsum Mid Precision",0.3060593850789648,1628587253223,1
+"rougeLsum Mid recall",0.27733553744690076,1628587253223,1
+"rougeLsum Mid F1",0.28530501988436374,1628587253223,1
+"rougeLsum High Precision",0.4094614601758424,1628587253223,1
+"rougeLsum High recall",0.34640369291505535,1628587253223,1
+"rougeLsum High F1",0.36454440079714096,1628587253223,1
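The new `reports/evaluation_metrics.csv` stores one metric per row in a `Name,Value,Timestamp,Step` layout, with the ROUGE variant, the Low/Mid/High level and the measure packed into the `Name` column. A minimal sketch of regrouping those rows with the standard library follows. It assumes every name has exactly three space-separated parts, as in the rows of this commit; `load_rouge_table` is a hypothetical helper and `SAMPLE` holds three rows copied from the file.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from reports/evaluation_metrics.csv in this commit.
SAMPLE = '''Name,Value,Timestamp,Step
"Rouge_1 Mid Precision",0.3569471852499436,1628587253223,1
"Rouge_1 Mid recall",0.31915939075819916,1628587253223,1
"Rouge_1 Mid F1",0.3317618573023773,1628587253223,1
'''


def load_rouge_table(text):
    """Group rows as table[rouge_type][level][measure] = value."""
    table = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(io.StringIO(text)):
        # e.g. "Rouge_1 Mid Precision" -> ("Rouge_1", "Mid", "Precision")
        rouge_type, level, measure = row["Name"].split(" ")
        # Lower-case the measure because the file mixes "Precision"/"recall".
        table[rouge_type][level][measure.lower()] = float(row["Value"])
    return table


table = load_rouge_table(SAMPLE)
print(table["Rouge_1"]["Mid"])
```

Keying on a lower-cased measure name sidesteps the inconsistent capitalization in the `Name` column.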
reports/evaluation_metrics.txt
DELETED
@@ -1 +0,0 @@
-{"Rouge 1": {"Rouge_1 Low Precision": 0.34885388166790793, "Rouge_1 Low recall": 0.28871556132198656, "Rouge_1 Low F1": 0.31058637096822267, "Rouge_1 Mid Precision": 0.412435004251884, "Rouge_1 Mid recall": 0.3386352228897427, "Rouge_1 Mid F1": 0.3517931748124066, "Rouge_1 High Precision": 0.47625451117848977, "Rouge_1 High recall": 0.39086727645312935, "Rouge_1 High F1": 0.3959993953753958}, "Rouge 2": {"Rouge_2 Low Precision": 0.1259156300716482, "Rouge_2 Low recall": 0.10333119800163641, "Rouge_2 Low F1": 0.10992592662502373, "Rouge_2 Mid Precision": 0.16879303949162833, "Rouge_2 Mid recall": 0.13805319188028575, "Rouge_2 Mid F1": 0.14400796293585816, "Rouge_2 High Precision": 0.21844214485938712, "Rouge_2 High recall": 0.1777722350788, "Rouge_2 High F1": 0.18342627795315522}, "Rouge L": {"Rouge_L Low Precision": 0.2322041975032734, "Rouge_L Low recall": 0.194000575085051, "Rouge_L Low F1": 0.20468107864660212, "Rouge_L Mid Precision": 0.2797360675037497, "Rouge_L Mid recall": 0.22647774162854406, "Rouge_L Mid F1": 0.2361293941929179, "Rouge_L High Precision": 0.3357160682858357, "Rouge_L High recall": 0.2622222798536235, "Rouge_L High F1": 0.27267217209978356}, "rougeLsum": {"rougeLsum Low Precision": 0.29651536760563263, "rougeLsum Low recall": 0.2432094838451322, "rougeLsum Low F1": 0.26048483356867896, "rougeLsum Mid Precision": 0.35317671791338556, "rougeLsum Mid recall": 0.286187817596869, "rougeLsum Mid F1": 0.2985727815225495, "rougeLsum High Precision": 0.4134539668577922, "rougeLsum High recall": 0.3365998852405162, "rougeLsum High F1": 0.3454898564714797}}
reports/training_metrics.csv
ADDED
@@ -0,0 +1,9 @@
+Name,Value,Timestamp,Step
+"val_loss",2.615034580230713,1628591864766,0
+"epoch",0,1628591864766,0
+"val_loss",2.6141018867492676,1628591893945,1
+"epoch",1,1628591893945,1
+"val_loss",2.6132164001464844,1628591923101,2
+"epoch",2,1628591923101,2
+"val_loss",2.612450361251831,1628591951319,3
+"epoch",3,1628591951319,3
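Review note: the new reports/training_metrics.csv stores one row per logged value in a `Name,Value,Timestamp,Step` layout. The snippet below is a minimal sketch, assuming only the Python standard library, of how the per-step `val_loss` could be read back from that layout. The helper name `metric_by_step` is hypothetical and not part of this repository; the sample rows are copied from the hunk for this file.

```python
import csv
import io

# Rows copied from the reports/training_metrics.csv hunk in this diff.
SAMPLE = """Name,Value,Timestamp,Step
"val_loss",2.615034580230713,1628591864766,0
"epoch",0,1628591864766,0
"val_loss",2.6141018867492676,1628591893945,1
"epoch",1,1628591893945,1
"val_loss",2.6132164001464844,1628591923101,2
"epoch",2,1628591923101,2
"val_loss",2.612450361251831,1628591951319,3
"epoch",3,1628591951319,3
"""


def metric_by_step(csv_text, name):
    """Hypothetical helper: map Step -> Value for one metric name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        int(row["Step"]): float(row["Value"])
        for row in reader
        if row["Name"] == name
    }


val_loss = metric_by_step(SAMPLE, "val_loss")
for step in sorted(val_loss):
    print(step, val_loss[step])
```

Keeping one metric per row means `val_loss` and `epoch` share the same file, so a reader has to filter on `Name` before comparing values across steps.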
reports/training_metrics.txt
DELETED
@@ -1 +0,0 @@
-{"train_loss": 2.785480260848999, "epoch": 4, "trainer/global_step": 289, "_runtime": 88, "_timestamp": 1627353229, "_step": 9, "val_loss": 2.181020975112915}
reports/training_params.yml
ADDED
@@ -0,0 +1 @@
+status: success
reports/visualization_metrics.txt
DELETED
File without changes
requirements.txt
CHANGED
@@ -3,13 +3,13 @@ datasets==1.10.2
 pytorch_lightning==1.3.5
 transformers==4.9.0
 torch==1.9.0
-dagshub==0.1.
+dagshub==0.1.7
 pandas==1.1.5
-rouge_score
+rouge_score==0.0.4
+dvc==2.5.4
+mlflow==1.19.0
+streamlit==0.85.1
 pyyaml
-dvc
-mlflow
-wandb
 
 # external requirements
 click
src/data/__init__.py
DELETED
File without changes
src/data/make_dataset.py
CHANGED
@@ -17,7 +17,7 @@ def make_dataset(dataset="cnn_dailymail", split="train"):
 
 
 if __name__ == "__main__":
-    with open("
+    with open("data_params.yml") as f:
         params = yaml.safe_load(f)
     pprint.pprint(params)
     make_dataset(dataset=params["data"], split="train")
src/data/process_data.py
CHANGED
@@ -5,14 +5,12 @@ import os
 
 def process_data(split="train"):
 
-    with open("
+    with open("data_params.yml") as f:
         params = yaml.safe_load(f)
 
     df = pd.read_csv("data/raw/{}.csv".format(split))
     df.columns = ["Unnamed: 0", "input_text", "output_text"]
     df = df.sample(frac=params["split"], replace=True, random_state=1)
-    if os.path.exists("data/raw/{}.csv".format(split)):
-        os.remove("data/raw/{}.csv".format(split))
     df.to_csv("data/processed/{}.csv".format(split))
 
 
src/models/evaluate_model.py
CHANGED
@@ -1,3 +1,4 @@
+from dagshub import dagshub_logger
 import yaml
 
 from model import Summarization
@@ -9,7 +10,7 @@ def evaluate_model():
     """
     Evaluate model using rouge measure
     """
-    with open("
+    with open("model_params.yml") as f:
         params = yaml.safe_load(f)
 
     test_df = pd.read_csv("data/processed/test.csv")[:25]
@@ -17,8 +18,8 @@ def evaluate_model():
     model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
     results = model.evaluate(test_df=test_df, metrics=params["metric"])
 
-    with
-
+    with dagshub_logger(metrics_path='reports/evaluation_metrics.csv', should_log_hparams=False) as logger:
+        logger.log_metrics(results)
 
 
 if __name__ == "__main__":
src/models/hf_upload.py
ADDED
@@ -0,0 +1,46 @@
+import shutil
+from getpass import getpass
+from pathlib import Path
+import yaml
+
+from model import Summarization
+from huggingface_hub import HfApi, Repository
+
+
+def upload(model_to_upload, model_name):
+    hf_username = input("Enter your HuggingFace username:")
+    hf_token = getpass("Enter your HuggingFace token:")
+    model_url = HfApi().create_repo(token=hf_token, name=model_name, exist_ok=True)
+    model_repo = Repository(
+        "./hf_model",
+        clone_from=model_url,
+        use_auth_token=hf_token,
+        git_email=f"{hf_username}@users.noreply.huggingface.co",
+        git_user=hf_username,
+    )
+
+    del hf_token
+    readme_txt = f"""
+    ---
+    Summarisation model {model_name}
+    """.strip()
+
+    (Path(model_repo.local_dir) / "README.md").write_text(readme_txt)
+    commit_url = model_repo.push_to_hub()
+
+    print("Check out your model at:")
+    print(commit_url)
+    print(f"https://huggingface.co/{hf_username}/{model_name}")
+
+    if Path("./hf_model").exists():
+        shutil.rmtree("./hf_model")
+
+
+if __name__ == "__main__":
+    with open("model_params.yml") as f:
+        params = yaml.safe_load(f)
+
+    model = Summarization()
+    model.load_model(model_dir="./models")
+
+    upload(model_to_upload=model, model_name=params["name"])
src/models/model.py
CHANGED
@@ -1,10 +1,7 @@
-
-from getpass import getpass
-from pathlib import Path
+
 
 import torch
 import pandas as pd
-from huggingface_hub import HfApi, Repository
 from transformers import (
     AdamW,
     T5ForConditionalGeneration,
@@ -15,7 +12,7 @@ from transformers import (
 )
 from torch.utils.data import Dataset, DataLoader
 import pytorch_lightning as pl
-from pytorch_lightning
+from dagshub.pytorch_lightning import DAGsHubLogger
 from pytorch_lightning import Trainer
 from pytorch_lightning.callbacks.early_stopping import EarlyStopping
 from pytorch_lightning import LightningDataModule
@@ -23,8 +20,7 @@ from pytorch_lightning import LightningModule
 from datasets import load_metric
 from tqdm.auto import tqdm
 
-
-
+import mlflow.pytorch
 
 torch.cuda.empty_cache()
 pl.seed_everything(42)
@@ -274,7 +270,9 @@ class LightningModel(LightningModule):
             },
         ]
         optimizer = AdamW(
-            optimizer_grouped_parameters,
+            optimizer_grouped_parameters,
+            lr=self.learning_rate,
+            eps=self.adam_epsilon,
         )
         self.opt = optimizer
         return [optimizer]
@@ -364,14 +362,8 @@ class Summarization:
             weight_decay=weight_decay,
         )
 
-
-
-            tracking_uri="https://dagshub.com/gagan3012/summarization.mlflow",
-        )
-
-        WandLogger = WandbLogger(project="summarization-dagshub")
-
-        # logger = DAGsHubLogger(metrics_path='reports/training_metrics.txt')
+        logger = DAGsHubLogger(metrics_path='reports/training_metrics.csv',
+                               hparams_path='reports/training_params.yml')
 
         early_stop_callback = (
             [
@@ -390,14 +382,17 @@ class Summarization:
         gpus = -1 if use_gpu and torch.cuda.is_available() else 0
 
         trainer = Trainer(
-            logger=
+            logger=logger,
             callbacks=early_stop_callback,
             max_epochs=max_epochs,
             gpus=gpus,
             progress_bar_refresh_rate=5,
         )
 
-
+        mlflow.pytorch.autolog(log_models=False)
+
+        with mlflow.start_run() as run:
+            trainer.fit(self.T5Model, self.data_module)
 
     def load_model(
         self, model_type: str = "t5", model_dir: str = "models", use_gpu: bool = False
@@ -552,31 +547,3 @@ class Summarization:
             "rougeLsum High F1": results["rougeLsum"].high.fmeasure,
         }
         return output
-
-    def upload(self, hf_username, model_name):
-        hf_password = getpass("Enter your HuggingFace password")
-        if Path("./models").exists():
-            shutil.rmtree("./models")
-        token = HfApi().login(username=hf_username, password=hf_password)
-        del hf_password
-        model_url = HfApi().create_repo(token=token, name=model_name, exist_ok=True)
-        model_repo = Repository(
-            "./model",
-            clone_from=model_url,
-            use_auth_token=token,
-            git_email=f"{hf_username}@users.noreply.huggingface.co",
-            git_user=hf_username,
-        )
-
-        readme_txt = f"""
-        ---
-        Summarisation model {model_name}
-        """.strip()
-
-        (Path(model_repo.local_dir) / "README.md").write_text(readme_txt)
-        self.save_model()
-        commit_url = model_repo.push_to_hub()
-
-        print("Check out your model at:")
-        print(commit_url)
-        print(f"https://huggingface.co/{hf_username}/{model_name}")
src/models/predict_model.py
CHANGED
@@ -8,11 +8,11 @@ def predict_model(text):
     """
     Predict the summary of the given text.
     """
-    with open("
+    with open("model_params.yml") as f:
         params = yaml.safe_load(f)
 
     model = Summarization()
-    model.load_model(model_type=params["model_type"], model_dir=
+    model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
     pre_summary = model.predict(text)
     return pre_summary
 
src/models/train_model.py
CHANGED
@@ -1,5 +1,3 @@
-import json
-
 import yaml
 
 from model import Summarization
@@ -10,15 +8,15 @@ def train_model():
     """
     Train the model
     """
-    with open("
+    with open("model_params.yml") as f:
         params = yaml.safe_load(f)
 
     # Load the data
     train_df = pd.read_csv("data/processed/train.csv")
     eval_df = pd.read_csv("data/processed/validation.csv")
 
-    train_df = train_df.sample(
-    eval_df = eval_df.sample(
+    train_df = train_df.sample(random_state=1)
+    eval_df = eval_df.sample(random_state=1)
 
     model = Summarization()
     model.from_pretrained(
@@ -37,15 +35,6 @@ def train_model():
 
     model.save_model(model_dir=params["model_dir"])
 
-    with open("wandb/latest-run/files/wandb-summary.json") as json_file:
-        data = json.load(json_file)
-
-    with open("reports/training_metrics.txt", "w") as fp:
-        json.dump(data, fp)
-
-    if params["upload_to_hf"]:
-        model.upload(hf_username=params["hf_username"], model_name=params["name"])
-
 
 if __name__ == "__main__":
     train_model()
src/visualization/__init__.py
DELETED
File without changes
src/visualization/visualize.py
CHANGED
@@ -1,5 +1,4 @@
 import streamlit as st
-import yaml
 
 from src.models.predict_model import predict_model
 
@@ -19,14 +18,7 @@ def visualize():
     sumtext = predict_model(text=text)
     st.write("# Generated Summary:")
     st.write("{}".format(sumtext))
-    with open("reports/visualization_metrics.txt", "w") as file1:
-        file1.writelines(text)
-        file1.writelines(sumtext)
 
 
 if __name__ == "__main__":
-
-        params = yaml.safe_load(f)
-
-    if params["visualise"]:
-        visualize()
+    visualize()