Spaces: gagan3012/summarization


gagan3012 committed on Aug 10, 2021

Commit

a74c595

2 Parent(s): a1c754b 09be2fb

Merge branch 'fix-mlflow' of Dean/summarization into master

Files changed (31)

.github/CODE_OF_CONDUCT.md +0 -128
.github/CONTRIBUTING.md +0 -92
.github/FUNDING.yml +0 -12
.github/ISSUE_TEMPLATE/bug_report.md +0 -38
.github/ISSUE_TEMPLATE/feature_request.md +0 -20
.github/PULL_REQUEST_TEMPLATE.md +0 -29
.gitignore +4 -1
Makefile +9 -1
app.py +0 -32
data.dvc +0 -14
data_params.yml +2 -0
dvc.lock +79 -42
dvc.yaml +18 -9
params.yml → model_params.yml +5 -10
reports/evaluation_metrics.csv +37 -0
reports/evaluation_metrics.txt +0 -1
reports/training_metrics.csv +9 -0
reports/training_metrics.txt +0 -1
reports/training_params.yml +1 -0
reports/visualization_metrics.txt +0 -0
requirements.txt +5 -5
src/data/__init__.py +0 -0
src/data/make_dataset.py +1 -1
src/data/process_data.py +1 -3
src/models/evaluate_model.py +4 -3
src/models/hf_upload.py +46 -0
src/models/model.py +13 -46
src/models/predict_model.py +2 -2
src/models/train_model.py +3 -14
src/visualization/__init__.py +0 -0
src/visualization/visualize.py +1 -9

.github/CODE_OF_CONDUCT.md DELETED

@@ -1,128 +0,0 @@
-# Contributor Covenant Code of Conduct
-## Our Pledge
-We as members, contributors, and leaders pledge to make participation in our
-community a harassment-free experience for everyone, regardless of age, body
-size, visible or invisible disability, ethnicity, sex characteristics, gender
-identity and expression, level of experience, education, socio-economic status,
-nationality, personal appearance, race, religion, or sexual identity
-and orientation.
-We pledge to act and interact in ways that contribute to an open, welcoming,
-diverse, inclusive, and healthy community.
-## Our Standards
-Examples of behavior that contributes to a positive environment for our
-community include:
-* Demonstrating empathy and kindness toward other people
-* Being respectful of differing opinions, viewpoints, and experiences
-* Giving and gracefully accepting constructive feedback
-* Accepting responsibility and apologizing to those affected by our mistakes,
-  and learning from the experience
-* Focusing on what is best not just for us as individuals, but for the
-  overall community
-Examples of unacceptable behavior include:
-* The use of sexualized language or imagery, and sexual attention or
-  advances of any kind
-* Trolling, insulting or derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or email
-  address, without their explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-  professional setting
-## Enforcement Responsibilities
-Community leaders are responsible for clarifying and enforcing our standards of
-acceptable behavior and will take appropriate and fair corrective action in
-response to any behavior that they deem inappropriate, threatening, offensive,
-or harmful.
-Community leaders have the right and responsibility to remove, edit, or reject
-comments, commits, code, wiki edits, issues, and other contributions that are
-not aligned to this Code of Conduct, and will communicate reasons for moderation
-decisions when appropriate.
-## Scope
-This Code of Conduct applies within all community spaces, and also applies when
-an individual is officially representing the community in public spaces.
-Examples of representing our community include using an official e-mail address,
-posting via an official social media account, or acting as an appointed
-representative at an online or offline event.
-## Enforcement
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported to the community leaders responsible for enforcement at
-@gagan3012.
-All complaints will be reviewed and investigated promptly and fairly.
-All community leaders are obligated to respect the privacy and security of the
-reporter of any incident.
-## Enforcement Guidelines
-Community leaders will follow these Community Impact Guidelines in determining
-the consequences for any action they deem in violation of this Code of Conduct:
-### 1. Correction
-**Community Impact**: Use of inappropriate language or other behavior deemed
-unprofessional or unwelcome in the community.
-**Consequence**: A private, written warning from community leaders, providing
-clarity around the nature of the violation and an explanation of why the
-behavior was inappropriate. A public apology may be requested.
-### 2. Warning
-**Community Impact**: A violation through a single incident or series
-of actions.
-**Consequence**: A warning with consequences for continued behavior. No
-interaction with the people involved, including unsolicited interaction with
-those enforcing the Code of Conduct, for a specified period of time. This
-includes avoiding interactions in community spaces as well as external channels
-like social media. Violating these terms may lead to a temporary or
-permanent ban.
-### 3. Temporary Ban
-**Community Impact**: A serious violation of community standards, including
-sustained inappropriate behavior.
-**Consequence**: A temporary ban from any sort of interaction or public
-communication with the community for a specified period of time. No public or
-private interaction with the people involved, including unsolicited interaction
-with those enforcing the Code of Conduct, is allowed during this period.
-Violating these terms may lead to a permanent ban.
-### 4. Permanent Ban
-**Community Impact**: Demonstrating a pattern of violation of community
-standards, including sustained inappropriate behavior,  harassment of an
-individual, or aggression toward or disparagement of classes of individuals.
-**Consequence**: A permanent ban from any sort of public interaction within
-the community.
-## Attribution
-This Code of Conduct is adapted from the [Contributor Covenant][homepage],
-version 2.0, available at
-https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
-Community Impact Guidelines were inspired by [Mozilla's code of conduct
-enforcement ladder](https://github.com/mozilla/diversity).
-[homepage]: https://www.contributor-covenant.org
-For answers to common questions about this code of conduct, see the FAQ at
-https://www.contributor-covenant.org/faq. Translations are available at
-https://www.contributor-covenant.org/translations.

.github/CONTRIBUTING.md DELETED

@@ -1,92 +0,0 @@
-# Contributing
-When contributing to this repository, please first discuss the change you wish to make via issue,
-email, or any other method with the owners of this repository before making a change.
-Please note we have a code of conduct, please follow it in all your interactions with the project.
-## Pull Request Process
-1. Ensure any install or build dependencies are removed before the end of the layer when doing a
-   build.
-2. Update the README.md with details of changes to the interface, this includes new environment
-   variables, exposed ports, useful file locations and container parameters.
-3. Increase the version numbers in any examples files and the README.md to the new version that this
-   Pull Request would represent. The versioning scheme we use is [SemVer](http://semver.org/).
-4. You may merge the Pull Request in once you have the sign-off of two other developers, or if you
-   do not have permission to do that, you may request the second reviewer to merge it for you.
-## Code of Conduct
-### Our Pledge
-In the interest of fostering an open and welcoming environment, we as
-contributors and maintainers pledge to making participation in our project and
-our community a harassment-free experience for everyone, regardless of age, body
-size, disability, ethnicity, gender identity and expression, level of experience,
-nationality, personal appearance, race, religion, or sexual identity and
-orientation.
-### Our Standards
-Examples of behavior that contributes to creating a positive environment
-include:
-* Using welcoming and inclusive language
-* Being respectful of differing viewpoints and experiences
-* Gracefully accepting constructive criticism
-* Focusing on what is best for the community
-* Showing empathy towards other community members
-Examples of unacceptable behavior by participants include:
-* The use of sexualized language or imagery and unwelcome sexual attention or
-advances
-* Trolling, insulting/derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or electronic
-  address, without explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-  professional setting
-### Our Responsibilities
-Project maintainers are responsible for clarifying the standards of acceptable
-behavior and are expected to take appropriate and fair corrective action in
-response to any instances of unacceptable behavior.
-Project maintainers have the right and responsibility to remove, edit, or
-reject comments, commits, code, wiki edits, issues, and other contributions
-that are not aligned to this Code of Conduct, or to ban temporarily or
-permanently any contributor for other behaviors that they deem inappropriate,
-threatening, offensive, or harmful.
-### Scope
-This Code of Conduct applies both within project spaces and in public spaces
-when an individual is representing the project or its community. Examples of
-representing a project or community include using an official project e-mail
-address, posting via an official social media account, or acting as an appointed
-representative at an online or offline event. Representation of a project may be
-further defined and clarified by project maintainers.
-### Enforcement
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported by contacting the project team at [INSERT EMAIL ADDRESS]. All
-complaints will be reviewed and investigated and will result in a response that
-is deemed necessary and appropriate to the circumstances. The project team is
-obligated to maintain confidentiality with regard to the reporter of an incident.
-Further details of specific enforcement policies may be posted separately.
-Project maintainers who do not follow or enforce the Code of Conduct in good
-faith may face temporary or permanent repercussions as determined by other
-members of the project's leadership.
-### Attribution
-This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
-available at [http://contributor-covenant.org/version/1/4][version]
-[homepage]: http://contributor-covenant.org
-[version]: http://contributor-covenant.org/version/1/4/

.github/FUNDING.yml DELETED

@@ -1,12 +0,0 @@
-# These are supported funding model platforms
-github: gagan3012 # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
-patreon: # Replace with a single Patreon username
-open_collective: # Replace with a single Open Collective username
-ko_fi: # Replace with a single Ko-fi username
-tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
-community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
-liberapay: # Replace with a single Liberapay username
-issuehunt: # Replace with a single IssueHunt username
-otechie: # Replace with a single Otechie username
-custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']

.github/ISSUE_TEMPLATE/bug_report.md DELETED

@@ -1,38 +0,0 @@
----
-name: Bug report
-about: Create a report to help us improve
-title: ''
-labels: ''
-assignees: ''
----
-**Describe the bug**
-A clear and concise description of what the bug is.
-**To Reproduce**
-Steps to reproduce the behavior:
-1. Go to '...'
-2. Click on '....'
-3. Scroll down to '....'
-4. See error
-**Expected behavior**
-A clear and concise description of what you expected to happen.
-**Screenshots**
-If applicable, add screenshots to help explain your problem.
-**Desktop (please complete the following information):**
- - OS: [e.g. iOS]
- - Browser [e.g. chrome, safari]
- - Version [e.g. 22]
-**Smartphone (please complete the following information):**
- - Device: [e.g. iPhone6]
- - OS: [e.g. iOS8.1]
- - Browser [e.g. stock browser, safari]
- - Version [e.g. 22]
-**Additional context**
-Add any other context about the problem here.

.github/ISSUE_TEMPLATE/feature_request.md DELETED

@@ -1,20 +0,0 @@
----
-name: Feature request
-about: Suggest an idea for this project
-title: ''
-labels: ''
-assignees: ''
----
-**Is your feature request related to a problem? Please describe.**
-A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
-**Describe the solution you'd like**
-A clear and concise description of what you want to happen.
-**Describe alternatives you've considered**
-A clear and concise description of any alternative solutions or features you've considered.
-**Additional context**
-Add any other context or screenshots about the feature request here.

.github/PULL_REQUEST_TEMPLATE.md DELETED

@@ -1,29 +0,0 @@
-<!--- Provide a general summary of your changes in the Title above -->
-## Description
-<!--- Describe your changes in detail -->
-## Motivation and Context
-<!--- Why is this change required? What problem does it solve? -->
-<!--- If it fixes an open issue, please link to the issue here. -->
-## How Has This Been Tested?
-<!--- Please describe in detail how you tested your changes. -->
-<!--- Include details of your testing environment, and the tests you ran to -->
-<!--- see how your change affects other areas of the code, etc. -->
-## Screenshots (if appropriate):
-## Types of changes
-<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
-- [ ] Bug fix (non-breaking change which fixes an issue)
-- [ ] New feature (non-breaking change which adds functionality)
-- [ ] Breaking change (fix or feature that would cause existing functionality to change)
-## Checklist:
-<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
-<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
-- [ ] My code follows the code style of this project.
-- [ ] My change requires a change to the documentation.
-- [ ] I have updated the documentation accordingly.
-- [ ] I have read the **CONTRIBUTING** document.

.gitignore CHANGED

@@ -93,6 +93,9 @@ coverage.xml
 .vscode
 /data
-wandb/
 summarization-dagshub/
 /models

 .vscode
 /data
 summarization-dagshub/
 /models
+default/
+artifacts/
+mlruns/
+hf_model/

Makefile CHANGED

@@ -48,7 +48,15 @@ pull:
 ## run the DVC pipeline - recompute any modified outputs such as processed data or trained models
 run:
-	dvc repro dvc.yaml
 #################################################################################
 # PROJECT RULES                                                                 #

 ## run the DVC pipeline - recompute any modified outputs such as processed data or trained models
 run:
+	dvc repro eval
+## run the visualization using Streamlit
+visualize:
+	dvc repro visualize
+## push the trained model to HF model hub
+push_to_hf_hub:
+	dvc repro push_to_hf_hub
 #################################################################################
 # PROJECT RULES                                                                 #

app.py DELETED

@@ -1,32 +0,0 @@
-import streamlit as st
-import yaml
-from src.models.predict_model import predict_model
-def visualize():
-    st.write("# Summarization  UI")
-    st.markdown(
-        """
-        *For additional questions and inquiries, please contact **Gagan Bhatia** via [LinkedIn](
-        https://www.linkedin.com/in/gbhatia30/) or [Github](https://github.com/gagan3012).*
-        """
-    )
-    text = st.text_area("Enter text here")
-    if st.button("Generate Summary"):
-        with st.spinner("Connecting the Dots..."):
-            sumtext = predict_model(text=text)
-        st.write("# Generated Summary:")
-        st.write("{}".format(sumtext))
-        with open("reports/visualization_metrics.txt", "w") as file1:
-            file1.writelines(text)
-            file1.writelines(sumtext)
-if __name__ == "__main__":
-    with open("params.yml") as f:
-        params = yaml.safe_load(f)
-    if params["visualise"]:
-        visualize()

data.dvc DELETED

@@ -1,14 +0,0 @@
-deps:
-- path: params.yml
-  md5: d0f3e81bc9191e752a69761045a449d9
-  size: 196
-- path: src/data/make_dataset.py
-  md5: 9de71de0f8df5d0a7beb235ef7c7777d
-  size: 772
-cmd: python src/data/make_dataset.py
-outs:
-- md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
-  nfiles: 3
-  path: data/raw
-  size: 1358833013
-md5: ff502232006c7fbef1015b5aa5cc4bbb

data_params.yml ADDED

@@ -0,0 +1,2 @@
+data: cnn_dailymail
+split: 0.001

dvc.lock CHANGED

@@ -4,65 +4,102 @@ stages:
     cmd: python src/models/train_model.py
     deps:
     - path: data/processed/train.csv
-      md5: 51edd724b75a8e99a78b9138f8f37c60
-      size: 25012573
     - path: data/processed/validation.csv
-      md5: 0900e2bb330df94cb045faddd0b945d1
-      size: 1138285
-    - path: params.yml
-      md5: d0f3e81bc9191e752a69761045a449d9
-      size: 196
     - path: src/models/train_model.py
-      md5: fca8acf70f09cecd679ca1ddb2eef6a9
-      size: 1198
     outs:
     - path: models
-      md5: 688745a9fb1cc7c8580887bae3873a39.dir
-      size: 486952666
-      nfiles: 10
-    - path: reports/training_metrics.txt
-      md5: 048a956b0eb431535d287bbc3322cf76
-      size: 158
   eval:
     cmd: python src/models/evaluate_model.py
     deps:
     - path: data/processed/test.csv
-      md5: 3cb7b63891f12d53b3ef3e81a2e93f8e
-      size: 986944
     - path: models
-      md5: 688745a9fb1cc7c8580887bae3873a39.dir
-      size: 486952666
-      nfiles: 10
-    - path: params.yml
-      md5: d0f3e81bc9191e752a69761045a449d9
-      size: 196
     - path: src/models/evaluate_model.py
-      md5: aa01b1564d737fef54ae45d25c5018d1
-      size: 615
     outs:
-    - path: reports/metrics.txt
-      md5: 27d21366dca75caa1bb3777575cb126b
-      size: 1596
   process_data:
     cmd: python src/data/process_data.py
     deps:
     - path: data/raw
-      md5: d751713988987e9331980363e24189ce.dir
-      size: 0
-      nfiles: 0
-    - path: params.yml
-      md5: d0f3e81bc9191e752a69761045a449d9
-      size: 196
     - path: src/data/process_data.py
-      md5: ba3ba7b7c8a905b736b6b0a28d2334c4
-      size: 623
     outs:
     - path: data/processed/test.csv
-      md5: 3cb7b63891f12d53b3ef3e81a2e93f8e
-      size: 986944
     - path: data/processed/train.csv
-      md5: 51edd724b75a8e99a78b9138f8f37c60
-      size: 25012573
     - path: data/processed/validation.csv
-      md5: 0900e2bb330df94cb045faddd0b945d1
-      size: 1138285

     cmd: python src/models/train_model.py
     deps:
     - path: data/processed/train.csv
+      md5: 5331b9c32b2d097d8d7aca01de5524bc
+      size: 1198262
     - path: data/processed/validation.csv
+      md5: 6069153a075b00dfb6d9e0843dd2da89
+      size: 52739
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
     - path: src/models/train_model.py
+      md5: f7d1121426c3d5530c2b9697cb7ac74a
+      size: 951
     outs:
     - path: models
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: reports/training_metrics.csv
+      md5: 3b309def91a32e521acd23b163742522
+      size: 320
   eval:
     cmd: python src/models/evaluate_model.py
     deps:
     - path: data/processed/test.csv
+      md5: 3eec94ac211c76363a3d968663b82d02
+      size: 39574
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
     - path: models
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
     - path: src/models/evaluate_model.py
+      md5: 89edb77aaab3055605ae6db2e21eab82
+      size: 705
     outs:
+    - path: reports/evaluation_metrics.csv
+      md5: eaa3bf017026aa1be31560f308fff78e
+      size: 2122
   process_data:
     cmd: python src/data/process_data.py
     deps:
     - path: data/raw
+      md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
+      size: 1358833013
+      nfiles: 3
+    - path: data_params.yml
+      md5: a68eabf79c3b3e28afb05baa1944bbc7
+      size: 32
     - path: src/data/process_data.py
+      md5: 68db554a69a0c8ce807907afa2be5e9c
+      size: 521
     outs:
     - path: data/processed/test.csv
+      md5: 3eec94ac211c76363a3d968663b82d02
+      size: 39574
     - path: data/processed/train.csv
+      md5: 5331b9c32b2d097d8d7aca01de5524bc
+      size: 1198262
     - path: data/processed/validation.csv
+      md5: 6069153a075b00dfb6d9e0843dd2da89
+      size: 52739
+  download_data:
+    cmd: python src/data/make_dataset.py
+    deps:
+    - path: data_params.yml
+      md5: a68eabf79c3b3e28afb05baa1944bbc7
+      size: 32
+    - path: src/data/make_dataset.py
+      md5: a0667f4ad8c06551609bd0bf950167b7
+      size: 776
+    outs:
+    - path: data/raw
+      md5: 2ab20ac1b58df875a590b07d0e04eb5b.dir
+      size: 1358833013
+      nfiles: 3
+  visualize:
+    cmd: streamlit run src/visualization/visualize.py
+    deps:
+    - path: models
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: src/visualization/visualize.py
+      md5: 4226e4148abb5ac186c0ab8c1d87b228
+      size: 671
+  push_to_hf_hub:
+    cmd: python src/models/hf_upload.py
+    deps:
+    - path: model_params.yml
+      md5: 1bf2edf25e851cc9cd3be75fbd9905a3
+      size: 177
+    - path: models
+      md5: fc37870a93db61b94af9f0847577f09b.dir
+      size: 243476333
+      nfiles: 5
+    - path: src/models/hf_upload.py
+      md5: a953816a3eb7bef702313544103a1c11
+      size: 1290

dvc.yaml CHANGED

@@ -1,8 +1,15 @@
 stages:
   process_data:
     cmd: python src/data/process_data.py
     deps:
-      - params.yml
       - data/raw
       - src/data/process_data.py
     outs:
@@ -18,7 +25,7 @@ stages:
   train:
     cmd: python src/models/train_model.py
     deps:
-      - params.yml
       - data/processed/train.csv
       - data/processed/validation.csv
       - src/models/train_model.py
@@ -26,25 +33,27 @@ stages:
       - models:
           persist: true
     metrics:
-      - reports/training_metrics.txt:
           cache: false
   eval:
     cmd: python src/models/evaluate_model.py
     deps:
-      - params.yml
       - data/processed/test.csv
       - models
       - src/models/evaluate_model.py
     metrics:
-      - reports/evaluation_metrics.txt:
           cache: false
   visualize:
     cmd: streamlit run src/visualization/visualize.py
     deps:
       - models
       - src/visualization/visualize.py
-      - params.yml
-    metrics:
-      - reports/visualization_metrics.txt:
-          cache: false

 stages:
+  download_data:
+    cmd: python src/data/make_dataset.py
+    deps:
+      - data_params.yml
+      - src/data/make_dataset.py
+    outs:
+      - data/raw
   process_data:
     cmd: python src/data/process_data.py
     deps:
+      - data_params.yml
       - data/raw
       - src/data/process_data.py
     outs:
   train:
     cmd: python src/models/train_model.py
     deps:
+      - model_params.yml
       - data/processed/train.csv
       - data/processed/validation.csv
       - src/models/train_model.py
       - models:
           persist: true
     metrics:
+      - reports/training_metrics.csv:
           cache: false
   eval:
     cmd: python src/models/evaluate_model.py
     deps:
+      - model_params.yml
       - data/processed/test.csv
       - models
       - src/models/evaluate_model.py
     metrics:
+      - reports/evaluation_metrics.csv:
           cache: false
   visualize:
     cmd: streamlit run src/visualization/visualize.py
     deps:
       - models
       - src/visualization/visualize.py
+  push_to_hf_hub:
+    cmd: python src/models/hf_upload.py
+    deps:
+      - model_params.yml
+      - src/models/hf_upload.py
+      - models
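The new `download_data` stage means `data/raw` is now produced inside the pipeline, so a target such as `dvc repro eval` (the new `make run` command) can pull in every upstream stage. The following is a minimal sketch of how that order falls out of the `deps`/`outs` pairs. It is a simplified model, not DVC's actual implementation: the `STAGES` table is transcribed by hand from the `dvc.lock` entries in this commit, and `stage_graph` and `run_order` are hypothetical helper names.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Stage -> (deps, outs), transcribed from dvc.lock in this commit.
# Script and params files are left out because no stage produces them.
STAGES = {
    "download_data": ([], ["data/raw"]),
    "process_data": (
        ["data/raw"],
        [
            "data/processed/train.csv",
            "data/processed/validation.csv",
            "data/processed/test.csv",
        ],
    ),
    "train": (
        ["data/processed/train.csv", "data/processed/validation.csv"],
        ["models", "reports/training_metrics.csv"],
    ),
    "eval": (
        ["data/processed/test.csv", "models"],
        ["reports/evaluation_metrics.csv"],
    ),
    "visualize": (["models"], []),
    "push_to_hf_hub": (["models"], []),
}


def stage_graph(stages):
    """Map each stage to the set of stages whose outputs it consumes."""
    producer = {out: name for name, (_, outs) in stages.items() for out in outs}
    return {
        name: {producer[dep] for dep in deps if dep in producer}
        for name, (deps, _) in stages.items()
    }


def run_order(stages, target):
    """Return `target` and its upstream stages in dependency order."""
    graph = stage_graph(stages)
    needed, stack = set(), [target]
    while stack:  # walk upstream from the target
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name])
    sub = {name: graph[name] & needed for name in needed}
    return list(TopologicalSorter(sub).static_order())


order = run_order(STAGES, "eval")
print(order)
```

Under this model, `visualize` and `push_to_hf_hub` are not part of the `eval` run, which matches the separate `make visualize` and `make push_to_hf_hub` targets added in the Makefile.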

params.yml → model_params.yml RENAMED

@@ -1,16 +1,11 @@
 name: summarsiation
-data: cnn_dailymail
-batch_size: 2
-num_workers: 2
 model_type: t5
 model_name: t5-small
-learning_rate: 1e-4
 epochs: 5
-source_dir: src
 model_dir: models
 metric: rouge
-split: 0.001
-use_gpu: True
-visualise: True
-hf_username: gagan3012
-upload_to_hf: True

 name: summarsiation
 model_type: t5
 model_name: t5-small
+batch_size: 2
 epochs: 5
+use_gpu: True
+learning_rate: 1e-4
+num_workers: 2
 model_dir: models
 metric: rouge
+source_dir: src

reports/evaluation_metrics.csv ADDED

@@ -0,0 +1,37 @@
+Name,Value,Timestamp,Step
+"Rouge_1 Low Precision",0.23786550570641482,1628587253223,1
+"Rouge_1 Low recall",0.23355396379384713,1628587253223,1
+"Rouge_1 Low F1",0.23602599457077003,1628587253223,1
+"Rouge_1 Mid Precision",0.3569471852499436,1628587253223,1
+"Rouge_1 Mid recall",0.31915939075819916,1628587253223,1
+"Rouge_1 Mid F1",0.3317618573023773,1628587253223,1
+"Rouge_1 High Precision",0.4726861301480842,1628587253223,1
+"Rouge_1 High recall",0.4019654200001146,1628587253223,1
+"Rouge_1 High F1",0.4298956952594035,1628587253223,1
+"Rouge_2 Low Precision",0.06184772400193972,1628587253223,1
+"Rouge_2 Low recall",0.05626972412346313,1628587253223,1
+"Rouge_2 Low F1",0.058680298802341754,1628587253223,1
+"Rouge_2 Mid Precision",0.1367034298993256,1628587253223,1
+"Rouge_2 Mid recall",0.11953160646342464,1628587253223,1
+"Rouge_2 Mid F1",0.12485064123505887,1628587253223,1
+"Rouge_2 High Precision",0.22739029631016827,1628587253223,1
+"Rouge_2 High recall",0.18851628169809986,1628587253223,1
+"Rouge_2 High F1",0.20306657551189072,1628587253223,1
+"Rouge_L Low Precision",0.18248956154159507,1628587253223,1
+"Rouge_L Low recall",0.18048774357814204,1628587253223,1
+"Rouge_L Low F1",0.18151380309623336,1628587253223,1
+"Rouge_L Mid Precision",0.2614974838710314,1628587253223,1
+"Rouge_L Mid recall",0.24286688705755238,1628587253223,1
+"Rouge_L Mid F1",0.24674586991996245,1628587253223,1
+"Rouge_L High Precision",0.3574471638807763,1628587253223,1
+"Rouge_L High recall",0.30836083808542225,1628587253223,1
+"Rouge_L High F1",0.32385446385474176,1628587253223,1
+"rougeLsum Low Precision",0.21468633089019287,1628587253223,1
+"rougeLsum Low recall",0.2057771050364415,1628587253223,1
+"rougeLsum Low F1",0.21170611912426093,1628587253223,1
+"rougeLsum Mid Precision",0.3060593850789648,1628587253223,1
+"rougeLsum Mid recall",0.27733553744690076,1628587253223,1
+"rougeLsum Mid F1",0.28530501988436374,1628587253223,1
+"rougeLsum High Precision",0.4094614601758424,1628587253223,1
+"rougeLsum High recall",0.34640369291505535,1628587253223,1
+"rougeLsum High F1",0.36454440079714096,1628587253223,1
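The metrics file is now a flat `Name,Value,Timestamp,Step` table rather than the nested JSON of the deleted `reports/evaluation_metrics.txt`. A minimal sketch, using only the standard library, of regrouping the flat rows by ROUGE variant follows. `load_metrics` is a hypothetical helper, and the sample holds three rows copied from the file above.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from reports/evaluation_metrics.csv in this commit.
SAMPLE = '''Name,Value,Timestamp,Step
"Rouge_1 Mid F1",0.3317618573023773,1628587253223,1
"Rouge_2 Mid F1",0.12485064123505887,1628587253223,1
"rougeLsum Mid F1",0.28530501988436374,1628587253223,1
'''


def load_metrics(fp):
    """Group rows as {variant: {"Mid F1": value, ...}}."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(fp):
        # Names look like "<variant> <Low|Mid|High> <stat>"; split on the first space.
        variant, stat = row["Name"].split(" ", 1)
        grouped[variant][stat] = float(row["Value"])
    return dict(grouped)


metrics = load_metrics(io.StringIO(SAMPLE))
print(metrics["Rouge_1"]["Mid F1"])
```

To read the real file, pass `open("reports/evaluation_metrics.csv", newline="")` instead of the `StringIO` sample.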

reports/evaluation_metrics.txt DELETED

@@ -1 +0,0 @@

- {"Rouge 1": {"Rouge_1 Low Precision": 0.34885388166790793, "Rouge_1 Low recall": 0.28871556132198656, "Rouge_1 Low F1": 0.31058637096822267, "Rouge_1 Mid Precision": 0.412435004251884, "Rouge_1 Mid recall": 0.3386352228897427, "Rouge_1 Mid F1": 0.3517931748124066, "Rouge_1 High Precision": 0.47625451117848977, "Rouge_1 High recall": 0.39086727645312935, "Rouge_1 High F1": 0.3959993953753958}, "Rouge 2": {"Rouge_2 Low Precision": 0.1259156300716482, "Rouge_2 Low recall": 0.10333119800163641, "Rouge_2 Low F1": 0.10992592662502373, "Rouge_2 Mid Precision": 0.16879303949162833, "Rouge_2 Mid recall": 0.13805319188028575, "Rouge_2 Mid F1": 0.14400796293585816, "Rouge_2 High Precision": 0.21844214485938712, "Rouge_2 High recall": 0.1777722350788, "Rouge_2 High F1": 0.18342627795315522}, "Rouge L": {"Rouge_L Low Precision": 0.2322041975032734, "Rouge_L Low recall": 0.194000575085051, "Rouge_L Low F1": 0.20468107864660212, "Rouge_L Mid Precision": 0.2797360675037497, "Rouge_L Mid recall": 0.22647774162854406, "Rouge_L Mid F1": 0.2361293941929179, "Rouge_L High Precision": 0.3357160682858357, "Rouge_L High recall": 0.2622222798536235, "Rouge_L High F1": 0.27267217209978356}, "rougeLsum": {"rougeLsum Low Precision": 0.29651536760563263, "rougeLsum Low recall": 0.2432094838451322, "rougeLsum Low F1": 0.26048483356867896, "rougeLsum Mid Precision": 0.35317671791338556, "rougeLsum Mid recall": 0.286187817596869, "rougeLsum Mid F1": 0.2985727815225495, "rougeLsum High Precision": 0.4134539668577922, "rougeLsum High recall": 0.3365998852405162, "rougeLsum High F1": 0.3454898564714797}}

reports/training_metrics.csv ADDED

@@ -0,0 +1,9 @@
+Name,Value,Timestamp,Step
+"val_loss",2.615034580230713,1628591864766,0
+"epoch",0,1628591864766,0
+"val_loss",2.6141018867492676,1628591893945,1
+"epoch",1,1628591893945,1
+"val_loss",2.6132164001464844,1628591923101,2
+"epoch",2,1628591923101,2
+"val_loss",2.612450361251831,1628591951319,3
+"epoch",3,1628591951319,3

reports/training_metrics.txt DELETED

@@ -1 +0,0 @@
-{"train_loss": 2.785480260848999, "epoch": 4, "trainer/global_step": 289, "_runtime": 88, "_timestamp": 1627353229, "_step": 9, "val_loss": 2.181020975112915}

reports/training_params.yml ADDED

@@ -0,0 +1 @@
+status: success

reports/visualization_metrics.txt DELETED

File without changes

requirements.txt CHANGED

@@ -3,13 +3,13 @@ datasets==1.10.2
 pytorch_lightning==1.3.5
 transformers==4.9.0
 torch==1.9.0
-dagshub==0.1.6
 pandas==1.1.5
-rouge_score
 pyyaml
-dvc
-mlflow
-wandb
 # external requirements
 click

 pytorch_lightning==1.3.5
 transformers==4.9.0
 torch==1.9.0
+dagshub==0.1.7
 pandas==1.1.5
+rouge_score==0.0.4
+dvc==2.5.4
+mlflow==1.19.0
+streamlit==0.85.1
 pyyaml
 # external requirements
 click

src/data/__init__.py DELETED

File without changes

src/data/make_dataset.py CHANGED

@@ -17,7 +17,7 @@ def make_dataset(dataset="cnn_dailymail", split="train"):
 if __name__ == "__main__":
-    with open("params.yml") as f:
         params = yaml.safe_load(f)
     pprint.pprint(params)
     make_dataset(dataset=params["data"], split="train")

 if __name__ == "__main__":
+    with open("data_params.yml") as f:
         params = yaml.safe_load(f)
     pprint.pprint(params)
     make_dataset(dataset=params["data"], split="train")

src/data/process_data.py CHANGED

@@ -5,14 +5,12 @@ import os
 def process_data(split="train"):
-    with open("params.yml") as f:
         params = yaml.safe_load(f)
     df = pd.read_csv("data/raw/{}.csv".format(split))
     df.columns = ["Unnamed: 0", "input_text", "output_text"]
     df = df.sample(frac=params["split"], replace=True, random_state=1)
-    if os.path.exists("data/raw/{}.csv".format(split)):
-        os.remove("data/raw/{}.csv".format(split))
     df.to_csv("data/processed/{}.csv".format(split))

 def process_data(split="train"):
+    with open("data_params.yml") as f:
         params = yaml.safe_load(f)
     df = pd.read_csv("data/raw/{}.csv".format(split))
     df.columns = ["Unnamed: 0", "input_text", "output_text"]
     df = df.sample(frac=params["split"], replace=True, random_state=1)
     df.to_csv("data/processed/{}.csv".format(split))
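`process_data` keeps the call `df.sample(frac=params["split"], replace=True, random_state=1)`, and `data_params.yml` sets `split: 0.001`. Because `replace=True` samples with replacement, the same raw row can land in the processed file more than once. A rough standard-library analogue is sketched below, assuming the sample size is `round(frac * len(rows))`. It does not reproduce pandas' random number generator, so the rows it picks will differ from the real output. `sample_fraction` is a hypothetical name.

```python
import random


def sample_fraction(rows, frac, seed=1):
    """Draw round(frac * len(rows)) rows with replacement (repeats are possible)."""
    k = round(frac * len(rows))
    return random.Random(seed).choices(rows, k=k)


rows = list(range(10_000))  # stand-in for the rows of a raw CSV
subset = sample_fraction(rows, frac=0.001)
print(len(subset))  # prints 10
```

If unique rows are wanted in the processed splits, sampling without replacement (`replace=False`, the pandas default) would be the usual alternative.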

src/models/evaluate_model.py CHANGED

@@ -1,3 +1,4 @@
 import yaml
 from model import Summarization
@@ -9,7 +10,7 @@ def evaluate_model():
     """
     Evaluate model using rouge measure
     """
-    with open("params.yml") as f:
         params = yaml.safe_load(f)
     test_df = pd.read_csv("data/processed/test.csv")[:25]
@@ -17,8 +18,8 @@ def evaluate_model():
     model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
     results = model.evaluate(test_df=test_df, metrics=params["metric"])
-    with open("reports/metrics.csv", "w") as fp:
-        json.dump(results, fp)
 if __name__ == "__main__":

+from dagshub import dagshub_logger
 import yaml
 from model import Summarization
     """
     Evaluate model using rouge measure
     """
+    with open("model_params.yml") as f:
         params = yaml.safe_load(f)
     test_df = pd.read_csv("data/processed/test.csv")[:25]
     model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
     results = model.evaluate(test_df=test_df, metrics=params["metric"])
+    with dagshub_logger(metrics_path='reports/evaluation_metrics.csv', should_log_hparams=False) as logger:
+        logger.log_metrics(results)
 if __name__ == "__main__":
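
Note on the metrics file: the new code passes the ROUGE results dict to `dagshub_logger`, which writes `reports/evaluation_metrics.csv`. The standard-library sketch below shows the general idea of flattening a metrics dict into one CSV row per metric. It is not the dagshub implementation; the column names, the `write_metrics_csv` helper, and the sample values are assumptions.

```python
import csv
import io
import time


def write_metrics_csv(metrics, fp, step=0):
    """Write one row per metric to an open text file (hypothetical helper)."""
    writer = csv.writer(fp, lineterminator="\n")
    # Assumed long-format header: one metric per row.
    writer.writerow(["Name", "Value", "Timestamp", "Step"])
    timestamp_ms = int(time.time() * 1000)
    for name, value in metrics.items():
        writer.writerow([name, value, timestamp_ms, step])


# Illustrative values only; real results come from model.evaluate(...).
example_results = {"rougeLsum High F1": 0.25, "example metric": 0.5}
buffer = io.StringIO()
write_metrics_csv(example_results, buffer)
csv_text = buffer.getvalue()
```

Unlike the removed `json.dump` call, which wrote JSON into a file named `metrics.csv`, a row-per-metric layout matches the `.csv` extension and can be compared line by line across runs.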

src/models/hf_upload.py ADDED Viewed

@@ -0,0 +1,46 @@
+import shutil
+from getpass import getpass
+from pathlib import Path
+import yaml
+from model import Summarization
+from huggingface_hub import HfApi, Repository
+def upload(model_to_upload, model_name):
+    hf_username = input("Enter your HuggingFace username:")
+    hf_token = getpass("Enter your HuggingFace token:")
+    model_url = HfApi().create_repo(token=hf_token, name=model_name, exist_ok=True)
+    model_repo = Repository(
+        "./hf_model",
+        clone_from=model_url,
+        use_auth_token=hf_token,
+        git_email=f"{hf_username}@users.noreply.huggingface.co",
+        git_user=hf_username,
+    )
+    del hf_token
+    readme_txt = f"""
+            ---
+            Summarisation model {model_name}
+            """.strip()
+    (Path(model_repo.local_dir) / "README.md").write_text(readme_txt)
+    commit_url = model_repo.push_to_hub()
+    print("Check out your model at:")
+    print(commit_url)
+    print(f"https://huggingface.co/{hf_username}/{model_name}")
+    if Path("./hf_model").exists():
+        shutil.rmtree("./hf_model")
+if __name__ == "__main__":
+    with open("model_params.yml") as f:
+        params = yaml.safe_load(f)
+    model = Summarization()
+    model.load_model(model_dir="./models")
+    upload(model_to_upload=model, model_name=params["name"])
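
Note on the README text in `upload`: `str.strip()` only trims whitespace at the two ends of the whole string, so the indentation in front of the `Summarisation model ...` line ends up in the written README. A minimal sketch, assuming an unindented README is the intent, uses `textwrap.dedent`; the `build_readme` helper and the model name are hypothetical.

```python
import textwrap


def build_readme(model_name):
    """Return README text without the source-code indentation (hypothetical helper)."""
    readme_txt = f"""
            ---
            Summarisation model {model_name}
            """
    # dedent removes the common leading whitespace from every line;
    # strip then drops the leading and trailing blank lines.
    return textwrap.dedent(readme_txt).strip()


readme = build_readme("example-model")
```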

src/models/model.py CHANGED Viewed

@@ -1,10 +1,7 @@
-import shutil
-from getpass import getpass
-from pathlib import Path
 import torch
 import pandas as pd
-from huggingface_hub import HfApi, Repository
 from transformers import (
     AdamW,
     T5ForConditionalGeneration,
@@ -15,7 +12,7 @@ from transformers import (
 )
 from torch.utils.data import Dataset, DataLoader
 import pytorch_lightning as pl
-from pytorch_lightning.loggers import MLFlowLogger, WandbLogger
 from pytorch_lightning import Trainer
 from pytorch_lightning.callbacks.early_stopping import EarlyStopping
 from pytorch_lightning import LightningDataModule
@@ -23,8 +20,7 @@ from pytorch_lightning import LightningModule
 from datasets import load_metric
 from tqdm.auto import tqdm
-# from dagshub.pytorch_lightning import DAGsHubLogger
 torch.cuda.empty_cache()
 pl.seed_everything(42)
@@ -274,7 +270,9 @@ class LightningModel(LightningModule):
             },
         ]
         optimizer = AdamW(
-            optimizer_grouped_parameters, lr=self.learning_rate, eps=self.adam_epsilon
         )
         self.opt = optimizer
         return [optimizer]
@@ -364,14 +362,8 @@ class Summarization:
             weight_decay=weight_decay,
         )
-        MLlogger = MLFlowLogger(
-            experiment_name="Summarization",
-            tracking_uri="https://dagshub.com/gagan3012/summarization.mlflow",
-        )
-        WandLogger = WandbLogger(project="summarization-dagshub")
-        # logger = DAGsHubLogger(metrics_path='reports/training_metrics.txt')
         early_stop_callback = (
             [
@@ -390,14 +382,17 @@ class Summarization:
         gpus = -1 if use_gpu and torch.cuda.is_available() else 0
         trainer = Trainer(
-            logger=[WandLogger, MLlogger],
             callbacks=early_stop_callback,
             max_epochs=max_epochs,
             gpus=gpus,
             progress_bar_refresh_rate=5,
         )
-        trainer.fit(self.T5Model, self.data_module)
     def load_model(
         self, model_type: str = "t5", model_dir: str = "models", use_gpu: bool = False
@@ -552,31 +547,3 @@ class Summarization:
             "rougeLsum High F1": results["rougeLsum"].high.fmeasure,
         }
         return output
-    def upload(self, hf_username, model_name):
-        hf_password = getpass("Enter your HuggingFace password")
-        if Path("./models").exists():
-            shutil.rmtree("./models")
-        token = HfApi().login(username=hf_username, password=hf_password)
-        del hf_password
-        model_url = HfApi().create_repo(token=token, name=model_name, exist_ok=True)
-        model_repo = Repository(
-            "./model",
-            clone_from=model_url,
-            use_auth_token=token,
-            git_email=f"{hf_username}@users.noreply.huggingface.co",
-            git_user=hf_username,
-        )
-        readme_txt = f"""
-            ---
-            Summarisation model {model_name}
-            """.strip()
-        (Path(model_repo.local_dir) / "README.md").write_text(readme_txt)
-        self.save_model()
-        commit_url = model_repo.push_to_hub()
-        print("Check out your model at:")
-        print(commit_url)
-        print(f"https://huggingface.co/{hf_username}/{model_name}")

 import torch
 import pandas as pd
 from transformers import (
     AdamW,
     T5ForConditionalGeneration,
 )
 from torch.utils.data import Dataset, DataLoader
 import pytorch_lightning as pl
+from dagshub.pytorch_lightning import DAGsHubLogger
 from pytorch_lightning import Trainer
 from pytorch_lightning.callbacks.early_stopping import EarlyStopping
 from pytorch_lightning import LightningDataModule
 from datasets import load_metric
 from tqdm.auto import tqdm
+import mlflow.pytorch
 torch.cuda.empty_cache()
 pl.seed_everything(42)
             },
         ]
         optimizer = AdamW(
+            optimizer_grouped_parameters,
+            lr=self.learning_rate,
+            eps=self.adam_epsilon,
         )
         self.opt = optimizer
         return [optimizer]
             weight_decay=weight_decay,
         )
+        logger = DAGsHubLogger(metrics_path='reports/training_metrics.csv',
+                               hparams_path='reports/training_params.yml')
         early_stop_callback = (
             [
         gpus = -1 if use_gpu and torch.cuda.is_available() else 0
         trainer = Trainer(
+            logger=logger,
             callbacks=early_stop_callback,
             max_epochs=max_epochs,
             gpus=gpus,
             progress_bar_refresh_rate=5,
         )
+        mlflow.pytorch.autolog(log_models=False)
+        with mlflow.start_run() as run:
+            trainer.fit(self.T5Model, self.data_module)
     def load_model(
         self, model_type: str = "t5", model_dir: str = "models", use_gpu: bool = False
             "rougeLsum High F1": results["rougeLsum"].high.fmeasure,
         }
         return output

src/models/predict_model.py CHANGED Viewed

@@ -8,11 +8,11 @@ def predict_model(text):
     """
     Predict the summary of the given text.
     """
-    with open("params.yml") as f:
         params = yaml.safe_load(f)
     model = Summarization()
-    model.load_model(model_type=params["model_type"], model_dir=f"{params['hf_username']}/{params['name']}")
     pre_summary = model.predict(text)
     return pre_summary

     """
     Predict the summary of the given text.
     """
+    with open("model_params.yml") as f:
         params = yaml.safe_load(f)
     model = Summarization()
+    model.load_model(model_type=params["model_type"], model_dir=params["model_dir"])
     pre_summary = model.predict(text)
     return pre_summary

src/models/train_model.py CHANGED Viewed

@@ -1,5 +1,3 @@
-import json
 import yaml
 from model import Summarization
@@ -10,15 +8,15 @@ def train_model():
     """
     Train the model
     """
-    with open("params.yml") as f:
         params = yaml.safe_load(f)
     # Load the data
     train_df = pd.read_csv("data/processed/train.csv")
     eval_df = pd.read_csv("data/processed/validation.csv")
-    train_df = train_df.sample(frac=params["split"], replace=True, random_state=1)
-    eval_df = eval_df.sample(frac=params["split"], replace=True, random_state=1)
     model = Summarization()
     model.from_pretrained(
@@ -37,15 +35,6 @@ def train_model():
     model.save_model(model_dir=params["model_dir"])
-    with open("wandb/latest-run/files/wandb-summary.json") as json_file:
-        data = json.load(json_file)
-    with open("reports/training_metrics.txt", "w") as fp:
-        json.dump(data, fp)
-    if params["upload_to_hf"]:
-        model.upload(hf_username=params["hf_username"], model_name=params["name"])
 if __name__ == "__main__":
     train_model()

 import yaml
 from model import Summarization
     """
     Train the model
     """
+    with open("model_params.yml") as f:
         params = yaml.safe_load(f)
     # Load the data
     train_df = pd.read_csv("data/processed/train.csv")
     eval_df = pd.read_csv("data/processed/validation.csv")
+    train_df = train_df.sample(random_state=1)
+    eval_df = eval_df.sample(random_state=1)
     model = Summarization()
     model.from_pretrained(
     model.save_model(model_dir=params["model_dir"])
 if __name__ == "__main__":
     train_model()
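
Note on the sampling change: the new calls drop the `frac` argument. By pandas' documented behaviour, `DataFrame.sample` called with neither `n` nor `frac` returns a single row, so it is worth checking that this is intended for the training and validation frames. The sketch below compares the two calls on a made-up ten-row frame; the frame and the `0.5` fraction are assumptions, not values from the repository.

```python
import pandas as pd

# Made-up stand-in for data/processed/train.csv.
df = pd.DataFrame(
    {
        "input_text": [f"article {i}" for i in range(10)],
        "output_text": [f"summary {i}" for i in range(10)],
    }
)

# Old call: draw a fraction of the rows, with replacement.
with_frac = df.sample(frac=0.5, replace=True, random_state=1)

# New call: no n and no frac, so pandas falls back to n=1.
default_sample = df.sample(random_state=1)

print(len(with_frac), len(default_sample))
```

If the goal is only to shuffle the already-subsampled data, `df.sample(frac=1, random_state=1)` keeps every row.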

src/visualization/__init__.py DELETED Viewed

File without changes

src/visualization/visualize.py CHANGED Viewed

@@ -1,5 +1,4 @@
 import streamlit as st
-import yaml
 from src.models.predict_model import predict_model
@@ -19,14 +18,7 @@ def visualize():
             sumtext = predict_model(text=text)
         st.write("# Generated Summary:")
         st.write("{}".format(sumtext))
-        with open("reports/visualization_metrics.txt", "w") as file1:
-            file1.writelines(text)
-            file1.writelines(sumtext)
 if __name__ == "__main__":
-    with open("params.yml") as f:
-        params = yaml.safe_load(f)
-    if params["visualise"]:
-        visualize()

 import streamlit as st
 from src.models.predict_model import predict_model
             sumtext = predict_model(text=text)
         st.write("# Generated Summary:")
         st.write("{}".format(sumtext))
 if __name__ == "__main__":
+    visualize()