gagan3012 committed
Commit a74c595 · 2 parents: a1c754b 09be2fb

Merge branch 'fix-mlflow' of Dean/summarization into master

.github/CODE_OF_CONDUCT.md DELETED
@@ -1,128 +0,0 @@
- # Contributor Covenant Code of Conduct
-
- ## Our Pledge
-
- We as members, contributors, and leaders pledge to make participation in our
- community a harassment-free experience for everyone, regardless of age, body
- size, visible or invisible disability, ethnicity, sex characteristics, gender
- identity and expression, level of experience, education, socio-economic status,
- nationality, personal appearance, race, religion, or sexual identity
- and orientation.
-
- We pledge to act and interact in ways that contribute to an open, welcoming,
- diverse, inclusive, and healthy community.
-
- ## Our Standards
-
- Examples of behavior that contributes to a positive environment for our
- community include:
-
- * Demonstrating empathy and kindness toward other people
- * Being respectful of differing opinions, viewpoints, and experiences
- * Giving and gracefully accepting constructive feedback
- * Accepting responsibility and apologizing to those affected by our mistakes,
-   and learning from the experience
- * Focusing on what is best not just for us as individuals, but for the
-   overall community
-
- Examples of unacceptable behavior include:
-
- * The use of sexualized language or imagery, and sexual attention or
-   advances of any kind
- * Trolling, insulting or derogatory comments, and personal or political attacks
- * Public or private harassment
- * Publishing others' private information, such as a physical or email
-   address, without their explicit permission
- * Other conduct which could reasonably be considered inappropriate in a
-   professional setting
-
- ## Enforcement Responsibilities
-
- Community leaders are responsible for clarifying and enforcing our standards of
- acceptable behavior and will take appropriate and fair corrective action in
- response to any behavior that they deem inappropriate, threatening, offensive,
- or harmful.
-
- Community leaders have the right and responsibility to remove, edit, or reject
- comments, commits, code, wiki edits, issues, and other contributions that are
- not aligned to this Code of Conduct, and will communicate reasons for moderation
- decisions when appropriate.
-
- ## Scope
-
- This Code of Conduct applies within all community spaces, and also applies when
- an individual is officially representing the community in public spaces.
- Examples of representing our community include using an official e-mail address,
- posting via an official social media account, or acting as an appointed
- representative at an online or offline event.
-
- ## Enforcement
-
- Instances of abusive, harassing, or otherwise unacceptable behavior may be
- reported to the community leaders responsible for enforcement at
- @gagan3012.
- All complaints will be reviewed and investigated promptly and fairly.
-
- All community leaders are obligated to respect the privacy and security of the
- reporter of any incident.
-
- ## Enforcement Guidelines
-
- Community leaders will follow these Community Impact Guidelines in determining
- the consequences for any action they deem in violation of this Code of Conduct:
-
- ### 1. Correction
-
- **Community Impact**: Use of inappropriate language or other behavior deemed
- unprofessional or unwelcome in the community.
-
- **Consequence**: A private, written warning from community leaders, providing
- clarity around the nature of the violation and an explanation of why the
- behavior was inappropriate. A public apology may be requested.
-
- ### 2. Warning
-
- **Community Impact**: A violation through a single incident or series
- of actions.
-
- **Consequence**: A warning with consequences for continued behavior. No
- interaction with the people involved, including unsolicited interaction with
- those enforcing the Code of Conduct, for a specified period of time. This
- includes avoiding interactions in community spaces as well as external channels
- like social media. Violating these terms may lead to a temporary or
- permanent ban.
-
- ### 3. Temporary Ban
-
- **Community Impact**: A serious violation of community standards, including
- sustained inappropriate behavior.
-
- **Consequence**: A temporary ban from any sort of interaction or public
- communication with the community for a specified period of time. No public or
- private interaction with the people involved, including unsolicited interaction
- with those enforcing the Code of Conduct, is allowed during this period.
- Violating these terms may lead to a permanent ban.
-
- ### 4. Permanent Ban
-
- **Community Impact**: Demonstrating a pattern of violation of community
- standards, including sustained inappropriate behavior, harassment of an
- individual, or aggression toward or disparagement of classes of individuals.
-
- **Consequence**: A permanent ban from any sort of public interaction within
- the community.
-
- ## Attribution
-
- This Code of Conduct is adapted from the [Contributor Covenant][homepage],
- version 2.0, available at
- https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
-
- Community Impact Guidelines were inspired by [Mozilla's code of conduct
- enforcement ladder](https://github.com/mozilla/diversity).
-
- [homepage]: https://www.contributor-covenant.org
-
- For answers to common questions about this code of conduct, see the FAQ at
- https://www.contributor-covenant.org/faq. Translations are available at
- https://www.contributor-covenant.org/translations.

.github/CONTRIBUTING.md DELETED
@@ -1,92 +0,0 @@
- # Contributing
-
- When contributing to this repository, please first discuss the change you wish to make via issue,
- email, or any other method with the owners of this repository before making a change.
-
- Please note we have a code of conduct, please follow it in all your interactions with the project.
-
- ## Pull Request Process
-
- 1. Ensure any install or build dependencies are removed before the end of the layer when doing a
-    build.
- 2. Update the README.md with details of changes to the interface, this includes new environment
-    variables, exposed ports, useful file locations and container parameters.
- 3. Increase the version numbers in any examples files and the README.md to the new version that this
-    Pull Request would represent. The versioning scheme we use is [SemVer](http://semver.org/).
- 4. You may merge the Pull Request in once you have the sign-off of two other developers, or if you
-    do not have permission to do that, you may request the second reviewer to merge it for you.
-
- ## Code of Conduct
-
- ### Our Pledge
-
- In the interest of fostering an open and welcoming environment, we as
- contributors and maintainers pledge to making participation in our project and
- our community a harassment-free experience for everyone, regardless of age, body
- size, disability, ethnicity, gender identity and expression, level of experience,
- nationality, personal appearance, race, religion, or sexual identity and
- orientation.
-
- ### Our Standards
-
- Examples of behavior that contributes to creating a positive environment
- include:
-
- * Using welcoming and inclusive language
- * Being respectful of differing viewpoints and experiences
- * Gracefully accepting constructive criticism
- * Focusing on what is best for the community
- * Showing empathy towards other community members
-
- Examples of unacceptable behavior by participants include:
-
- * The use of sexualized language or imagery and unwelcome sexual attention or
-   advances
- * Trolling, insulting/derogatory comments, and personal or political attacks
- * Public or private harassment
- * Publishing others' private information, such as a physical or electronic
-   address, without explicit permission
- * Other conduct which could reasonably be considered inappropriate in a
-   professional setting
-
- ### Our Responsibilities
-
- Project maintainers are responsible for clarifying the standards of acceptable
- behavior and are expected to take appropriate and fair corrective action in
- response to any instances of unacceptable behavior.
-
- Project maintainers have the right and responsibility to remove, edit, or
- reject comments, commits, code, wiki edits, issues, and other contributions
- that are not aligned to this Code of Conduct, or to ban temporarily or
- permanently any contributor for other behaviors that they deem inappropriate,
- threatening, offensive, or harmful.
-
- ### Scope
-
- This Code of Conduct applies both within project spaces and in public spaces
- when an individual is representing the project or its community. Examples of
- representing a project or community include using an official project e-mail
- address, posting via an official social media account, or acting as an appointed
- representative at an online or offline event. Representation of a project may be
- further defined and clarified by project maintainers.
-
- ### Enforcement
-
- Instances of abusive, harassing, or otherwise unacceptable behavior may be
- reported by contacting the project team at [INSERT EMAIL ADDRESS]. All
- complaints will be reviewed and investigated and will result in a response that
- is deemed necessary and appropriate to the circumstances. The project team is
- obligated to maintain confidentiality with regard to the reporter of an incident.
- Further details of specific enforcement policies may be posted separately.
-
- Project maintainers who do not follow or enforce the Code of Conduct in good
- faith may face temporary or permanent repercussions as determined by other
- members of the project's leadership.
-
- ### Attribution
-
- This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
- available at [http://contributor-covenant.org/version/1/4][version]
-
- [homepage]: http://contributor-covenant.org
- [version]: http://contributor-covenant.org/version/1/4/

.github/FUNDING.yml DELETED
@@ -1,12 +0,0 @@
- # These are supported funding model platforms
-
- github: gagan3012 # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
- patreon: # Replace with a single Patreon username
- open_collective: # Replace with a single Open Collective username
- ko_fi: # Replace with a single Ko-fi username
- tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
- community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
- liberapay: # Replace with a single Liberapay username
- issuehunt: # Replace with a single IssueHunt username
- otechie: # Replace with a single Otechie username
- custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

.github/ISSUE_TEMPLATE/bug_report.md DELETED
@@ -1,38 +0,0 @@
- ---
- name: Bug report
- about: Create a report to help us improve
- title: ''
- labels: ''
- assignees: ''
-
- ---
-
- **Describe the bug**
- A clear and concise description of what the bug is.
-
- **To Reproduce**
- Steps to reproduce the behavior:
- 1. Go to '...'
- 2. Click on '....'
- 3. Scroll down to '....'
- 4. See error
-
- **Expected behavior**
- A clear and concise description of what you expected to happen.
-
- **Screenshots**
- If applicable, add screenshots to help explain your problem.
-
- **Desktop (please complete the following information):**
-  - OS: [e.g. iOS]
-  - Browser [e.g. chrome, safari]
-  - Version [e.g. 22]
-
- **Smartphone (please complete the following information):**
-  - Device: [e.g. iPhone6]
-  - OS: [e.g. iOS8.1]
-  - Browser [e.g. stock browser, safari]
-  - Version [e.g. 22]
-
- **Additional context**
- Add any other context about the problem here.

.github/ISSUE_TEMPLATE/feature_request.md DELETED
@@ -1,20 +0,0 @@
- ---
- name: Feature request
- about: Suggest an idea for this project
- title: ''
- labels: ''
- assignees: ''
-
- ---
-
- **Is your feature request related to a problem? Please describe.**
- A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
-
- **Describe the solution you'd like**
- A clear and concise description of what you want to happen.
-
- **Describe alternatives you've considered**
- A clear and concise description of any alternative solutions or features you've considered.
-
- **Additional context**
- Add any other context or screenshots about the feature request here.

.github/PULL_REQUEST_TEMPLATE.md DELETED
@@ -1,29 +0,0 @@
- <!--- Provide a general summary of your changes in the Title above -->
-
- ## Description
- <!--- Describe your changes in detail -->
-
- ## Motivation and Context
- <!--- Why is this change required? What problem does it solve? -->
- <!--- If it fixes an open issue, please link to the issue here. -->
-
- ## How Has This Been Tested?
- <!--- Please describe in detail how you tested your changes. -->
- <!--- Include details of your testing environment, and the tests you ran to -->
- <!--- see how your change affects other areas of the code, etc. -->
-
- ## Screenshots (if appropriate):
-
- ## Types of changes
- <!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
- - [ ] Bug fix (non-breaking change which fixes an issue)
- - [ ] New feature (non-breaking change which adds functionality)
- - [ ] Breaking change (fix or feature that would cause existing functionality to change)
-
- ## Checklist:
- <!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
- <!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- - [ ] My code follows the code style of this project.
- - [ ] My change requires a change to the documentation.
- - [ ] I have updated the documentation accordingly.
- - [ ] I have read the **CONTRIBUTING** document.

.gitignore CHANGED
@@ -93,6 +93,9 @@ coverage.xml
  .vscode
  /data

- wandb/
  summarization-dagshub/
  /models
+ default/
+ artifacts/
+ mlruns/
+ hf_model/
Makefile CHANGED
@@ -48,7 +48,15 @@ pull:

  ## run the DVC pipeline - recompute any modified outputs such as processed data or trained models
  run:
- 	dvc repro dvc.yaml
+ 	dvc repro eval
+
+ ## run the visualization using Streamlit
+ visualize:
+ 	dvc repro visualize
+
+ ## push the trained model to HF model hub
+ push_to_hf_hub:
+ 	dvc repro push_to_hf_hub

  #################################################################################
  # PROJECT RULES                                                                 #
app.py DELETED
@@ -1,32 +0,0 @@
- import streamlit as st
- import yaml
-
- from src.models.predict_model import predict_model
-
-
- def visualize():
-     st.write("# Summarization UI")
-     st.markdown(
-         """
-     *For additional questions and inquiries, please contact **Gagan Bhatia** via [LinkedIn](
-     https://www.linkedin.com/in/gbhatia30/) or [Github](https://github.com/gagan3012).*
-     """
-     )
-
-     text = st.text_area("Enter text here")
-     if st.button("Generate Summary"):
-         with st.spinner("Connecting the Dots..."):
-             sumtext = predict_model(text=text)
-         st.write("# Generated Summary:")
-         st.write("{}".format(sumtext))
-         with open("reports/visualization_metrics.txt", "w") as file1:
-             file1.writelines(text)
-             file1.writelines(sumtext)
-
-
- if __name__ == "__main__":
-     with open("params.yml") as f:
-         params = yaml.safe_load(f)
-
-     if params["visualise"]:
-         visualize()

data.dvc DELETED
@@ -1,14 +0,0 @@
- deps:
- - path: params.yml
-   md5: d0f3e81bc9191e752a69761045a449d9
-   size: 196
- - path: src/data/make_dataset.py
-   md5: 9de71de0f8df5d0a7beb235ef7c7777d
-   size: 772
- cmd: python src/data/make_dataset.py
- outs:
- - md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
-   nfiles: 3
-   path: data/raw
-   size: 1358833013
- md5: ff502232006c7fbef1015b5aa5cc4bbb

data_params.yml ADDED
@@ -0,0 +1,2 @@
+ data: cnn_dailymail
+ split: 0.001
dvc.lock CHANGED
@@ -4,65 +4,102 @@ stages:
    cmd: python src/models/train_model.py
    deps:
    - path: data/processed/train.csv
-     md5: 51edd724b75a8e99a78b9138f8f37c60
-     size: 25012573
+     md5: 5331b9c32b2d097d8d7aca01de5524bc
+     size: 1198262
    - path: data/processed/validation.csv
-     md5: 0900e2bb330df94cb045faddd0b945d1
-     size: 1138285
-   - path: params.yml
-     md5: d0f3e81bc9191e752a69761045a449d9
-     size: 196
+     md5: 6069153a075b00dfb6d9e0843dd2da89
+     size: 52739
+   - path: model_params.yml
+     md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+     size: 177
    - path: src/models/train_model.py
-     md5: fca8acf70f09cecd679ca1ddb2eef6a9
-     size: 1198
+     md5: f7d1121426c3d5530c2b9697cb7ac74a
+     size: 951
    outs:
    - path: models
-     md5: 688745a9fb1cc7c8580887bae3873a39.dir
-     size: 486952666
-     nfiles: 10
-   - path: reports/training_metrics.txt
-     md5: 048a956b0eb431535d287bbc3322cf76
-     size: 158
+     md5: fc37870a93db61b94af9f0847577f09b.dir
+     size: 243476333
+     nfiles: 5
+   - path: reports/training_metrics.csv
+     md5: 3b309def91a32e521acd23b163742522
+     size: 320
  eval:
    cmd: python src/models/evaluate_model.py
    deps:
    - path: data/processed/test.csv
-     md5: 3cb7b63891f12d53b3ef3e81a2e93f8e
-     size: 986944
+     md5: 3eec94ac211c76363a3d968663b82d02
+     size: 39574
+   - path: model_params.yml
+     md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+     size: 177
    - path: models
-     md5: 688745a9fb1cc7c8580887bae3873a39.dir
-     size: 486952666
-     nfiles: 10
-   - path: params.yml
-     md5: d0f3e81bc9191e752a69761045a449d9
-     size: 196
+     md5: fc37870a93db61b94af9f0847577f09b.dir
+     size: 243476333
+     nfiles: 5
    - path: src/models/evaluate_model.py
-     md5: aa01b1564d737fef54ae45d25c5018d1
-     size: 615
+     md5: 89edb77aaab3055605ae6db2e21eab82
+     size: 705
    outs:
-   - path: reports/metrics.txt
-     md5: 27d21366dca75caa1bb3777575cb126b
-     size: 1596
+   - path: reports/evaluation_metrics.csv
+     md5: eaa3bf017026aa1be31560f308fff78e
+     size: 2122
  process_data:
    cmd: python src/data/process_data.py
    deps:
    - path: data/raw
-     md5: d751713988987e9331980363e24189ce.dir
-     size: 0
-     nfiles: 0
-   - path: params.yml
-     md5: d0f3e81bc9191e752a69761045a449d9
-     size: 196
+     md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
+     size: 1358833013
+     nfiles: 3
+   - path: data_params.yml
+     md5: a68eabf79c3b3e28afb05baa1944bbc7
+     size: 32
    - path: src/data/process_data.py
-     md5: ba3ba7b7c8a905b736b6b0a28d2334c4
-     size: 623
+     md5: 68db554a69a0c8ce807907afa2be5e9c
+     size: 521
    outs:
    - path: data/processed/test.csv
-     md5: 3cb7b63891f12d53b3ef3e81a2e93f8e
-     size: 986944
+     md5: 3eec94ac211c76363a3d968663b82d02
+     size: 39574
    - path: data/processed/train.csv
-     md5: 51edd724b75a8e99a78b9138f8f37c60
-     size: 25012573
+     md5: 5331b9c32b2d097d8d7aca01de5524bc
+     size: 1198262
    - path: data/processed/validation.csv
-     md5: 0900e2bb330df94cb045faddd0b945d1
-     size: 1138285
+     md5: 6069153a075b00dfb6d9e0843dd2da89
+     size: 52739
+ download_data:
+   cmd: python src/data/make_dataset.py
+   deps:
+   - path: data_params.yml
+     md5: a68eabf79c3b3e28afb05baa1944bbc7
+     size: 32
+   - path: src/data/make_dataset.py
+     md5: a0667f4ad8c06551609bd0bf950167b7
+     size: 776
+   outs:
+   - path: data/raw
+     md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
+     size: 1358833013
+     nfiles: 3
+ visualize:
+   cmd: streamlit run src/visualization/visualize.py
+   deps:
+   - path: models
+     md5: fc37870a93db61b94af9f0847577f09b.dir
+     size: 243476333
+     nfiles: 5
+   - path: src/visualization/visualize.py
+     md5: 4226e4148abb5ac186c0ab8c1d87b228
+     size: 671
+ push_to_hf_hub:
+   cmd: python src/models/hf_upload.py
+   deps:
+   - path: model_params.yml
+     md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+     size: 177
+   - path: models
+     md5: fc37870a93db61b94af9f0847577f09b.dir
+     size: 243476333
+     nfiles: 5
+   - path: src/models/hf_upload.py
+     md5: a953816a3eb7bef702313544103a1c11
+     size: 1290
dvc.yaml CHANGED
@@ -1,8 +1,15 @@
  stages:
+   download_data:
+     cmd: python src/data/make_dataset.py
+     deps:
+     - data_params.yml
+     - src/data/make_dataset.py
+     outs:
+     - data/raw
    process_data:
      cmd: python src/data/process_data.py
      deps:
-     - params.yml
+     - data_params.yml
      - data/raw
      - src/data/process_data.py
      outs:
@@ -18,7 +25,7 @@ stages:
    train:
      cmd: python src/models/train_model.py
      deps:
-     - params.yml
+     - model_params.yml
      - data/processed/train.csv
      - data/processed/validation.csv
      - src/models/train_model.py
@@ -26,25 +33,27 @@ stages:
      - models:
          persist: true
      metrics:
-     - reports/training_metrics.txt:
+     - reports/training_metrics.csv:
          cache: false
    eval:
      cmd: python src/models/evaluate_model.py
      deps:
-     - params.yml
+     - model_params.yml
      - data/processed/test.csv
      - models
      - src/models/evaluate_model.py
      metrics:
-     - reports/evaluation_metrics.txt:
+     - reports/evaluation_metrics.csv:
          cache: false
    visualize:
      cmd: streamlit run src/visualization/visualize.py
      deps:
      - models
      - src/visualization/visualize.py
-     - params.yml
-     metrics:
-     - reports/visualization_metrics.txt:
-         cache: false
+   push_to_hf_hub:
+     cmd: python src/models/hf_upload.py
+     deps:
+     - model_params.yml
+     - src/models/hf_upload.py
+     - models

params.yml → model_params.yml RENAMED
@@ -1,16 +1,11 @@
  name: summarsiation
- data: cnn_dailymail
- batch_size: 2
- num_workers: 2
  model_type: t5
  model_name: t5-small
- learning_rate: 1e-4
+ batch_size: 2
  epochs: 5
- source_dir: src
+ use_gpu: True
+ learning_rate: 1e-4
+ num_workers: 2
  model_dir: models
  metric: rouge
- split: 0.001
- use_gpu: True
- visualise: True
- hf_username: gagan3012
- upload_to_hf: True
+ source_dir: src
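This rename splits the old params.yml in two: data-related keys move to data_params.yml and model keys stay in the renamed model_params.yml, so each DVC stage depends only on the parameters it actually uses and editing a model hyperparameter no longer invalidates process_data. A minimal sketch of the split (the inline strings mirror the two files in this diff; loading from literal strings instead of the real files is purely for illustration):

```python
import yaml

# Mirrors data_params.yml from this commit.
data_params_text = """\
data: cnn_dailymail
split: 0.001
"""

# Mirrors (part of) the renamed model_params.yml.
model_params_text = """\
name: summarsiation
model_type: t5
model_name: t5-small
batch_size: 2
epochs: 5
"""

# Each stage script now opens only the file it needs.
data_params = yaml.safe_load(data_params_text)
model_params = yaml.safe_load(model_params_text)

print(data_params["data"])         # which dataset to download
print(model_params["model_name"])  # which checkpoint to fine-tune
```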
reports/evaluation_metrics.csv ADDED
@@ -0,0 +1,37 @@
+ Name,Value,Timestamp,Step
+ "Rouge_1 Low Precision",0.23786550570641482,1628587253223,1
+ "Rouge_1 Low recall",0.23355396379384713,1628587253223,1
+ "Rouge_1 Low F1",0.23602599457077003,1628587253223,1
+ "Rouge_1 Mid Precision",0.3569471852499436,1628587253223,1
+ "Rouge_1 Mid recall",0.31915939075819916,1628587253223,1
+ "Rouge_1 Mid F1",0.3317618573023773,1628587253223,1
+ "Rouge_1 High Precision",0.4726861301480842,1628587253223,1
+ "Rouge_1 High recall",0.4019654200001146,1628587253223,1
+ "Rouge_1 High F1",0.4298956952594035,1628587253223,1
+ "Rouge_2 Low Precision",0.06184772400193972,1628587253223,1
+ "Rouge_2 Low recall",0.05626972412346313,1628587253223,1
+ "Rouge_2 Low F1",0.058680298802341754,1628587253223,1
+ "Rouge_2 Mid Precision",0.1367034298993256,1628587253223,1
+ "Rouge_2 Mid recall",0.11953160646342464,1628587253223,1
+ "Rouge_2 Mid F1",0.12485064123505887,1628587253223,1
+ "Rouge_2 High Precision",0.22739029631016827,1628587253223,1
+ "Rouge_2 High recall",0.18851628169809986,1628587253223,1
+ "Rouge_2 High F1",0.20306657551189072,1628587253223,1
+ "Rouge_L Low Precision",0.18248956154159507,1628587253223,1
+ "Rouge_L Low recall",0.18048774357814204,1628587253223,1
+ "Rouge_L Low F1",0.18151380309623336,1628587253223,1
+ "Rouge_L Mid Precision",0.2614974838710314,1628587253223,1
+ "Rouge_L Mid recall",0.24286688705755238,1628587253223,1
+ "Rouge_L Mid F1",0.24674586991996245,1628587253223,1
+ "Rouge_L High Precision",0.3574471638807763,1628587253223,1
+ "Rouge_L High recall",0.30836083808542225,1628587253223,1
+ "Rouge_L High F1",0.32385446385474176,1628587253223,1
+ "rougeLsum Low Precision",0.21468633089019287,1628587253223,1
+ "rougeLsum Low recall",0.2057771050364415,1628587253223,1
+ "rougeLsum Low F1",0.21170611912426093,1628587253223,1
+ "rougeLsum Mid Precision",0.3060593850789648,1628587253223,1
+ "rougeLsum Mid recall",0.27733553744690076,1628587253223,1
+ "rougeLsum Mid F1",0.28530501988436374,1628587253223,1
+ "rougeLsum High Precision",0.4094614601758424,1628587253223,1
+ "rougeLsum High recall",0.34640369291505535,1628587253223,1
+ "rougeLsum High F1",0.36454440079714096,1628587253223,1
reports/evaluation_metrics.txt DELETED
@@ -1 +0,0 @@
- {"Rouge 1": {"Rouge_1 Low Precision": 0.34885388166790793, "Rouge_1 Low recall": 0.28871556132198656, "Rouge_1 Low F1": 0.31058637096822267, "Rouge_1 Mid Precision": 0.412435004251884, "Rouge_1 Mid recall": 0.3386352228897427, "Rouge_1 Mid F1": 0.3517931748124066, "Rouge_1 High Precision": 0.47625451117848977, "Rouge_1 High recall": 0.39086727645312935, "Rouge_1 High F1": 0.3959993953753958}, "Rouge 2": {"Rouge_2 Low Precision": 0.1259156300716482, "Rouge_2 Low recall": 0.10333119800163641, "Rouge_2 Low F1": 0.10992592662502373, "Rouge_2 Mid Precision": 0.16879303949162833, "Rouge_2 Mid recall": 0.13805319188028575, "Rouge_2 Mid F1": 0.14400796293585816, "Rouge_2 High Precision": 0.21844214485938712, "Rouge_2 High recall": 0.1777722350788, "Rouge_2 High F1": 0.18342627795315522}, "Rouge L": {"Rouge_L Low Precision": 0.2322041975032734, "Rouge_L Low recall": 0.194000575085051, "Rouge_L Low F1": 0.20468107864660212, "Rouge_L Mid Precision": 0.2797360675037497, "Rouge_L Mid recall": 0.22647774162854406, "Rouge_L Mid F1": 0.2361293941929179, "Rouge_L High Precision": 0.3357160682858357, "Rouge_L High recall": 0.2622222798536235, "Rouge_L High F1": 0.27267217209978356}, "rougeLsum": {"rougeLsum Low Precision": 0.29651536760563263, "rougeLsum Low recall": 0.2432094838451322, "rougeLsum Low F1": 0.26048483356867896, "rougeLsum Mid Precision": 0.35317671791338556, "rougeLsum Mid recall": 0.286187817596869, "rougeLsum Mid F1": 0.2985727815225495, "rougeLsum High Precision": 0.4134539668577922, "rougeLsum High recall": 0.3365998852405162, "rougeLsum High F1": 0.3454898564714797}}
reports/training_metrics.csv ADDED
@@ -0,0 +1,9 @@
+ Name,Value,Timestamp,Step
+ "val_loss",2.615034580230713,1628591864766,0
+ "epoch",0,1628591864766,0
+ "val_loss",2.6141018867492676,1628591893945,1
+ "epoch",1,1628591893945,1
+ "val_loss",2.6132164001464844,1628591923101,2
+ "epoch",2,1628591923101,2
+ "val_loss",2.612450361251831,1628591951319,3
+ "epoch",3,1628591951319,3
reports/training_metrics.txt DELETED
@@ -1 +0,0 @@
- {"train_loss": 2.785480260848999, "epoch": 4, "trainer/global_step": 289, "_runtime": 88, "_timestamp": 1627353229, "_step": 9, "val_loss": 2.181020975112915}
reports/training_params.yml ADDED
@@ -0,0 +1 @@
+ status: success
reports/visualization_metrics.txt DELETED
File without changes
requirements.txt CHANGED
@@ -3,13 +3,13 @@ datasets==1.10.2
  pytorch_lightning==1.3.5
  transformers==4.9.0
  torch==1.9.0
- dagshub==0.1.6
+ dagshub==0.1.7
  pandas==1.1.5
- rouge_score
+ rouge_score==0.0.4
+ dvc==2.5.4
+ mlflow==1.19.0
+ streamlit==0.85.1
  pyyaml
- dvc
- mlflow
- wandb

  # external requirements
  click
src/data/__init__.py DELETED
File without changes
src/data/make_dataset.py CHANGED
@@ -17,7 +17,7 @@ def make_dataset(dataset="cnn_dailymail", split="train"):


  if __name__ == "__main__":
-     with open("params.yml") as f:
+     with open("data_params.yml") as f:
          params = yaml.safe_load(f)
      pprint.pprint(params)
      make_dataset(dataset=params["data"], split="train")
src/data/process_data.py CHANGED
@@ -5,14 +5,12 @@ import os

  def process_data(split="train"):

-     with open("params.yml") as f:
+     with open("data_params.yml") as f:
          params = yaml.safe_load(f)

      df = pd.read_csv("data/raw/{}.csv".format(split))
      df.columns = ["Unnamed: 0", "input_text", "output_text"]
      df = df.sample(frac=params["split"], replace=True, random_state=1)
-     if os.path.exists("data/raw/{}.csv".format(split)):
-         os.remove("data/raw/{}.csv".format(split))
      df.to_csv("data/processed/{}.csv".format(split))
15
 
16
 
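`process_data` subsamples each split with `df.sample(frac=..., replace=True, random_state=1)`, i.e. a seeded draw with replacement. The same idea sketched with the standard library — the `frac=0.5` value below is illustrative, not the project's actual `split` parameter:

```python
import random

def bootstrap_sample(rows, frac, seed=1):
    """Draw len(rows) * frac items with replacement, reproducibly via a fixed seed."""
    rng = random.Random(seed)
    k = int(len(rows) * frac)
    return [rng.choice(rows) for _ in range(k)]

rows = list(range(10))
sample = bootstrap_sample(rows, frac=0.5)
print(sample)  # 5 rows; duplicates are possible because we draw with replacement
```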
src/models/evaluate_model.py CHANGED
@@ -1,3 +1,4 @@
+ from dagshub import dagshub_logger
  import yaml
 
  from model import Summarization
@@ -9,7 +10,7 @@ def evaluate_model():
      """
      Evaluate model using rouge measure
      """
-     with open("params.yml") as f:
+     with open("model_params.yml") as f:
          params = yaml.safe_load(f)
 
      test_df = pd.read_csv("data/processed/test.csv")[:25]
@@ -17,8 +18,8 @@ def evaluate_model():
      model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
      results = model.evaluate(test_df=test_df, metrics=params["metric"])
 
-     with open("reports/metrics.csv", "w") as fp:
-         json.dump(results, fp)
+     with dagshub_logger(metrics_path='reports/evaluation_metrics.csv', should_log_hparams=False) as logger:
+         logger.log_metrics(results)
 
 
  if __name__ == "__main__":
src/models/hf_upload.py ADDED
@@ -0,0 +1,46 @@
+ import shutil
+ from getpass import getpass
+ from pathlib import Path
+ import yaml
+ 
+ from model import Summarization
+ from huggingface_hub import HfApi, Repository
+ 
+ 
+ def upload(model_to_upload, model_name):
+     hf_username = input("Enter your HuggingFace username:")
+     hf_token = getpass("Enter your HuggingFace token:")
+     model_url = HfApi().create_repo(token=hf_token, name=model_name, exist_ok=True)
+     model_repo = Repository(
+         "./hf_model",
+         clone_from=model_url,
+         use_auth_token=hf_token,
+         git_email=f"{hf_username}@users.noreply.huggingface.co",
+         git_user=hf_username,
+     )
+ 
+     del hf_token
+     readme_txt = f"""
+     ---
+     Summarisation model {model_name}
+     """.strip()
+ 
+     (Path(model_repo.local_dir) / "README.md").write_text(readme_txt)
+     commit_url = model_repo.push_to_hub()
+ 
+     print("Check out your model at:")
+     print(commit_url)
+     print(f"https://huggingface.co/{hf_username}/{model_name}")
+ 
+     if Path("./hf_model").exists():
+         shutil.rmtree("./hf_model")
+ 
+ 
+ if __name__ == "__main__":
+     with open("model_params.yml") as f:
+         params = yaml.safe_load(f)
+ 
+     model = Summarization()
+     model.load_model(model_dir="./models")
+ 
+     upload(model_to_upload=model, model_name=params["name"])
src/models/model.py CHANGED
@@ -1,10 +1,7 @@
- import shutil
- from getpass import getpass
- from pathlib import Path
+
 
  import torch
  import pandas as pd
- from huggingface_hub import HfApi, Repository
  from transformers import (
      AdamW,
      T5ForConditionalGeneration,
@@ -15,7 +12,7 @@ from transformers import (
  )
  from torch.utils.data import Dataset, DataLoader
  import pytorch_lightning as pl
- from pytorch_lightning.loggers import MLFlowLogger, WandbLogger
+ from dagshub.pytorch_lightning import DAGsHubLogger
  from pytorch_lightning import Trainer
  from pytorch_lightning.callbacks.early_stopping import EarlyStopping
  from pytorch_lightning import LightningDataModule
@@ -23,8 +20,7 @@ from pytorch_lightning import LightningModule
  from datasets import load_metric
  from tqdm.auto import tqdm
 
- # from dagshub.pytorch_lightning import DAGsHubLogger
-
+ import mlflow.pytorch
 
  torch.cuda.empty_cache()
  pl.seed_everything(42)
@@ -274,7 +270,9 @@ class LightningModel(LightningModule):
              },
          ]
          optimizer = AdamW(
-             optimizer_grouped_parameters, lr=self.learning_rate, eps=self.adam_epsilon
+             optimizer_grouped_parameters,
+             lr=self.learning_rate,
+             eps=self.adam_epsilon,
          )
          self.opt = optimizer
          return [optimizer]
@@ -364,14 +362,8 @@ class Summarization:
              weight_decay=weight_decay,
          )
 
-         MLlogger = MLFlowLogger(
-             experiment_name="Summarization",
-             tracking_uri="https://dagshub.com/gagan3012/summarization.mlflow",
-         )
-
-         WandLogger = WandbLogger(project="summarization-dagshub")
-
-         # logger = DAGsHubLogger(metrics_path='reports/training_metrics.txt')
+         logger = DAGsHubLogger(metrics_path='reports/training_metrics.csv',
+                                hparams_path='reports/training_params.yml')
 
          early_stop_callback = (
              [
@@ -390,14 +382,17 @@ class Summarization:
          gpus = -1 if use_gpu and torch.cuda.is_available() else 0
 
          trainer = Trainer(
-             logger=[WandLogger, MLlogger],
+             logger=logger,
              callbacks=early_stop_callback,
              max_epochs=max_epochs,
              gpus=gpus,
              progress_bar_refresh_rate=5,
          )
 
-         trainer.fit(self.T5Model, self.data_module)
+         mlflow.pytorch.autolog(log_models=False)
+
+         with mlflow.start_run() as run:
+             trainer.fit(self.T5Model, self.data_module)
 
      def load_model(
          self, model_type: str = "t5", model_dir: str = "models", use_gpu: bool = False
@@ -552,31 +547,3 @@ class Summarization:
              "rougeLsum High F1": results["rougeLsum"].high.fmeasure,
          }
          return output
-
-     def upload(self, hf_username, model_name):
-         hf_password = getpass("Enter your HuggingFace password")
-         if Path("./models").exists():
-             shutil.rmtree("./models")
-         token = HfApi().login(username=hf_username, password=hf_password)
-         del hf_password
-         model_url = HfApi().create_repo(token=token, name=model_name, exist_ok=True)
-         model_repo = Repository(
-             "./model",
-             clone_from=model_url,
-             use_auth_token=token,
-             git_email=f"{hf_username}@users.noreply.huggingface.co",
-             git_user=hf_username,
-         )
-
-         readme_txt = f"""
-         ---
-         Summarisation model {model_name}
-         """.strip()
-
-         (Path(model_repo.local_dir) / "README.md").write_text(readme_txt)
-         self.save_model()
-         commit_url = model_repo.push_to_hub()
-
-         print("Check out your model at:")
-         print(commit_url)
-         print(f"https://huggingface.co/{hf_username}/{model_name}")
src/models/predict_model.py CHANGED
@@ -8,11 +8,11 @@ def predict_model(text):
      """
      Predict the summary of the given text.
      """
-     with open("params.yml") as f:
+     with open("model_params.yml") as f:
          params = yaml.safe_load(f)
 
      model = Summarization()
-     model.load_model(model_type=params["model_type"], model_dir=f"{params['hf_username']}/{params['name']}")
+     model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
      pre_summary = model.predict(text)
      return pre_summary
 
src/models/train_model.py CHANGED
@@ -1,5 +1,3 @@
- import json
-
  import yaml
 
  from model import Summarization
@@ -10,15 +8,15 @@ def train_model():
      """
      Train the model
      """
-     with open("params.yml") as f:
+     with open("model_params.yml") as f:
          params = yaml.safe_load(f)
 
      # Load the data
      train_df = pd.read_csv("data/processed/train.csv")
      eval_df = pd.read_csv("data/processed/validation.csv")
 
-     train_df = train_df.sample(frac=params["split"], replace=True, random_state=1)
-     eval_df = eval_df.sample(frac=params["split"], replace=True, random_state=1)
+     train_df = train_df.sample(random_state=1)
+     eval_df = eval_df.sample(random_state=1)
 
      model = Summarization()
      model.from_pretrained(
@@ -37,15 +35,6 @@ def train_model():
 
      model.save_model(model_dir=params["model_dir"])
 
-     with open("wandb/latest-run/files/wandb-summary.json") as json_file:
-         data = json.load(json_file)
-
-     with open("reports/training_metrics.txt", "w") as fp:
-         json.dump(data, fp)
-
-     if params["upload_to_hf"]:
-         model.upload(hf_username=params["hf_username"], model_name=params["name"])
-
 
  if __name__ == "__main__":
      train_model()
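One behavioural note on the sampling change above: pandas' `DataFrame.sample` defaults to `n=1` when neither `n` nor `frac` is given, so `train_df.sample(random_state=1)` yields a single row rather than a shuffled copy of the data. A small check of that behaviour (assumes pandas is installed; the frame below is illustrative, not the project's data):

```python
import pandas as pd

# Illustrative frame standing in for the processed train/eval CSVs.
df = pd.DataFrame({"input_text": ["a", "b", "c", "d"],
                   "output_text": ["w", "x", "y", "z"]})

one_row = df.sample(random_state=1)             # no n/frac -> defaults to n=1
shuffled = df.sample(frac=1.0, random_state=1)  # frac=1.0 -> every row, shuffled

print(len(one_row), len(shuffled))
```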
src/visualization/__init__.py DELETED
File without changes
src/visualization/visualize.py CHANGED
@@ -1,5 +1,4 @@
  import streamlit as st
- import yaml
 
  from src.models.predict_model import predict_model
 
@@ -19,14 +18,7 @@ def visualize():
      sumtext = predict_model(text=text)
      st.write("# Generated Summary:")
      st.write("{}".format(sumtext))
-     with open("reports/visualization_metrics.txt", "w") as file1:
-         file1.writelines(text)
-         file1.writelines(sumtext)
 
 
  if __name__ == "__main__":
-     with open("params.yml") as f:
-         params = yaml.safe_load(f)
-
-     if params["visualise"]:
-         visualize()
+     visualize()