vancauwe committed
Commit aba41f2 · 1 Parent(s): e40a0fa

chore: documentation of refactor
docs/{hotdog.md → classifier_hotdog.md} RENAMED
File without changes
docs/dataset_cleaner.md ADDED
@@ -0,0 +1,3 @@
+ This module provides basic cleaning checks for the downloaded dataset; any row that does not have the expected types is discarded.
+
+ ::: src.dataset.cleaner
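For reference, a minimal sketch of this kind of type check (the column names lat, lon and date are taken from the rest of the repo; the actual checks live in `src/dataset/cleaner.py`):

```python
import pandas as pd

def drop_rows_with_bad_types(df: pd.DataFrame) -> pd.DataFrame:
    # Coerce columns to the expected dtypes; unparsable values become NaN/NaT.
    df = df.copy()
    df["lat"] = pd.to_numeric(df["lat"], errors="coerce")
    df["lon"] = pd.to_numeric(df["lon"], errors="coerce")
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    # Discard any row where coercion failed in at least one column.
    return df.dropna(subset=["lat", "lon", "date"]).reset_index(drop=True)
```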
docs/dataset_download.md ADDED
@@ -0,0 +1,3 @@
+ This module provides a download function for accessing the Hugging Face dataset.
+
+ ::: src.dataset.download
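A minimal sketch of such a download helper, assuming the standard `datasets` library; `dataset_id` and `data_files` stand in for the values configured in the repo:

```python
import pandas as pd
from datasets import load_dataset

def download_observations(dataset_id: str, data_files: str) -> pd.DataFrame:
    # load_dataset fetches (and caches) the configured files from the Hugging Face Hub.
    ds = load_dataset(dataset_id, data_files=data_files, split="train")
    return ds.to_pandas()
```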
docs/dataset_fake_data.md ADDED
@@ -0,0 +1,3 @@
+ This module generates fake data for the dataset.
+
+ ::: src.dataset.fake_data
docs/{hf_push_observations.md → dataset_hf_push_observations.md} RENAMED
@@ -1,3 +1,3 @@
  This module writes an observation into a temporary JSON file, in order to add this JSON file to the Saving-Willy Dataset in the Saving-Willy Hugging Face Community.
 
- ::: src.hf_push_observations
+ ::: src.dataset.hf_push_observations
docs/dataset_requests.md ADDED
@@ -0,0 +1,3 @@
+ This module provides functions for filtering the data by location and time, and for rendering the search options as well as the search results.
+
+ ::: src.dataset.requests
docs/{main.md → home.md} RENAMED
File without changes
docs/pages.md ADDED
@@ -0,0 +1 @@
+ The pages documented here are those with functional code. Some pages, such as About, Benchmarking, and Challenges, currently contain only text, markdown, and images, and do not require further documentation.
docs/pages_classifiers.md ADDED
@@ -0,0 +1,3 @@
+ This page displays the input mechanism for your images as well as all the classifiers that can run inference on them.
+
+ ::: src.pages.4_🔥_classifiers
docs/pages_gallery.md ADDED
@@ -0,0 +1,3 @@
+ This page displays all the cetacean species that can be identified by the classifiers.
+
+ ::: src.pages.7_🌊_gallery
docs/pages_logs.md ADDED
@@ -0,0 +1,3 @@
+ This page displays all the logs coming from user interactions with the platform and from back-end queries to the Hugging Face server.
+
+ ::: src.pages.📊_logs
docs/pages_map.md ADDED
@@ -0,0 +1,3 @@
+ This page displays the recorded observations of the dataset on a map.
+
+ ::: src.pages.2_🌍_map
docs/pages_requests.md ADDED
@@ -0,0 +1,3 @@
+ This page displays the data that can be requested. The default view covers all the data in the dataset. The filters in the sidebar allow narrowing down to different geographical zones as well as different time frames.
+
+ ::: src.pages.3_🤝_data requests
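As an illustration only, sidebar filters of this kind can be built with standard Streamlit widgets; the labels and session-state keys below are hypothetical, the real page defines its own:

```python
import datetime
import streamlit as st

# Hypothetical sidebar widgets; the real page stores its selections in st.session_state.
st.session_state.lat_range = st.sidebar.slider("Latitude", -90.0, 90.0, (-90.0, 90.0))
st.session_state.lon_range = st.sidebar.slider("Longitude", -180.0, 180.0, (-180.0, 180.0))
st.session_state.date_range = st.sidebar.date_input(
    "Time frame", value=(datetime.date(2020, 1, 1), datetime.date.today())
)
```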
docs/release_protocol.md ADDED
@@ -0,0 +1,13 @@
+ # Release Protocol
+
+ We use two Spaces on Hugging Face: one for development of the interface, and the main Space for showcasing the most recent stable release. The main branch is protected and deploys to the main Space when a PR is accepted.
+
+ We want to enforce a strict procedure for commits from the dev branch to the main branch when a PR is made to create a new release.
+
+ Dev-to-main PR checklist:
+
+ 1. Open a PR from the dev branch to the main branch
+ 2. Commit: change the dataset configuration to point to the main dataset
+ 3. Commit: change the naming in the README to avoid merge conflicts
+ 4. Ask for review
+ 5. Merge and make a new release of the code
docs/{fix_tabrender.md → utils_fix_tabrender.md} RENAMED
File without changes
docs/{grid_maker.md → utils_grid_maker.md} RENAMED
File without changes
docs/{metadata_handler.md → utils_metadata_handler.md} RENAMED
File without changes
mkdocs.yaml CHANGED
@@ -22,32 +22,40 @@ plugins:
 
  nav:
    - README: index.md
-   #- Quickstart:
-   #- Installation: installation.md
-   #- Usage: usage.md
-   - API:
-     - Main app: main.md
+   - Release Protocol: release_protocol.md
+   - How to contribute:
+     - Dev Notes: dev_notes.md
+   - App:
+     - Main App & Home Page: home.md
+     - Pages:
+       - Overall Notes: pages.md
+       - Map Page: pages_map.md
+       - Requests Page: pages_requests.md
+       - Classifiers Page: pages_classifiers.md
+       - Gallery Page: pages_gallery.md
+       - Logs: pages_logs.md
    - Modules:
-     - Data entry handling:
-       - Data input: input_handling.md
-       - Data extraction and validation: input_validator.md
+     - Data Entry Handling:
+       - Data Input: input_handling.md
+       - Data Extraction & Validation: input_validator.md
        - Data Object Class: input_observation.md
-     - Classifiers:
+     - Hugging Face Dataset:
+       - Download: dataset_download.md
+       - Cleaning: dataset_cleaner.md
+       - Push Observations to Dataset: dataset_hf_push_observations.md
+       - Data Requests: dataset_requests.md
+       - Fake data: dataset_fake_data.md
+     - Hugging Face Classifiers:
        - Cetacean Fluke & Fin Recognition: classifier_image.md
-       - (temporary) Hotdog Classifier: hotdog.md
-     - Hugging Face Integration:
-       - Push Observations to Dataset: hf_push_observations.md
+       - (temporary) Hotdog Classifier: classifier_hotdog.md
      - Map of observations: obs_map.md
      - Whale gallery: whale_gallery.md
      - Whale viewer: whale_viewer.md
      - Logging: st_logs.md
      - Utils:
-       - Tab-rendering fix (js): fix_tabrender.md
-       - Metadata handling: metadata_handler.md
-       - Grid maker: grid_maker.md
+       - Tab-rendering fix (js): utils_fix_tabrender.md
+       - Metadata handling: utils_metadata_handler.md
+       - Grid maker: utils_grid_maker.md
 
    - Development clutter:
      - Demo app: app.md
-
-   - How to contribute:
-     - Dev Notes: dev_notes.md
src/classifier/classifier_image.py CHANGED
@@ -7,7 +7,7 @@ g_logger = logging.getLogger(__name__)
  g_logger.setLevel(LOG_LEVEL)
 
  import whale_viewer as viewer
- from hf_push_observations import push_observations
+ from dataset.hf_push_observations import push_observations
  from utils.grid_maker import gridder
  from utils.metadata_handler import metadata2md
  from input.input_observation import InputObservation
src/dataset/cleaner.py CHANGED
@@ -1,6 +1,13 @@
  import pandas as pd
 
  def clean_lat_long(df): # Ensure lat and lon are numeric, coerce errors to NaN
+     """
+     Clean latitude and longitude columns in the DataFrame.
+     Args:
+         df (pd.DataFrame): DataFrame containing latitude and longitude columns.
+     Returns:
+         pd.DataFrame: DataFrame with cleaned latitude and longitude columns.
+     """
      df['lat'] = pd.to_numeric(df['lat'], errors='coerce')
      df['lon'] = pd.to_numeric(df['lon'], errors='coerce')
 
@@ -9,6 +16,13 @@ def clean_lat_long(df): # Ensure lat and lon are numeric, coerce errors to NaN
      return df
 
  def clean_date(df): # Ensure lat and lon are numeric, coerce errors to NaN
+     """
+     Clean the date column in the DataFrame.
+     Args:
+         df (pd.DataFrame): DataFrame containing a date column.
+     Returns:
+         pd.DataFrame: DataFrame with a cleaned date column.
+     """
      df['date'] = pd.to_datetime(df['date'], errors='coerce')
      # Drop rows with NaN in lat or lon
      df = df.dropna(subset=['date']).reset_index(drop=True)
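A hedged usage example of the two cleaners; the sample rows are invented, and it assumes, as the docstrings describe, that rows failing coercion are dropped:

```python
import pandas as pd
from dataset.cleaner import clean_lat_long, clean_date

raw = pd.DataFrame({
    "lat": ["12.5", "not-a-number"],
    "lon": ["-3.1", "8.0"],
    "date": ["2024-06-01", "2024-07-15"],
})
df = clean_date(clean_lat_long(raw))  # the row with an unparsable latitude is discarded
```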
src/dataset/download.py CHANGED
@@ -63,6 +63,12 @@ def try_download_dataset(dataset_id:str, data_files:str) -> dict:
      return metadata
 
  def get_dataset():
+     """
+     Downloads the dataset from Hugging Face and prepares it for use.
+     If the dataset is not available, it creates an empty DataFrame with the specified schema.
+     Returns:
+         pd.DataFrame: A DataFrame containing the dataset, or an empty DataFrame if the dataset is not available.
+     """
      # load/download data from huggingface dataset
      metadata = try_download_dataset(dataset_id, data_files)
 
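The empty-DataFrame fallback mentioned in the docstring might look like the sketch below; the column names are an assumption based on the schema used in `fake_data.py`:

```python
import pandas as pd

# Assumed observation schema: lat, lon, species, author_email, date.
EMPTY_SCHEMA = {
    "lat": "float64",
    "lon": "float64",
    "species": "object",
    "author_email": "object",
    "date": "datetime64[ns]",
}

def empty_observations() -> pd.DataFrame:
    # One typed, empty column per field so downstream cleaning still works.
    return pd.DataFrame({col: pd.Series(dtype=dt) for col, dt in EMPTY_SCHEMA.items()})
```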
src/dataset/fake_data.py CHANGED
@@ -4,6 +4,14 @@ import random
  from datetime import datetime, timedelta
 
  def generate_fake_data(df, num_fake):
+     """
+     Generate fake data for the dataset.
+     Args:
+         df (pd.DataFrame): Original DataFrame to append fake data to.
+         num_fake (int): Number of fake observations to generate.
+     Returns:
+         pd.DataFrame: DataFrame with the original and fake data.
+     """
 
      # Options for random generation
      species_options = [
@@ -51,7 +59,6 @@ def generate_fake_data(df, num_fake):
          end = datetime(end_year, 1, 1)
          return start + timedelta(days=random.randint(0, (end - start).days))
 
-     # Generate 20 new observations
      new_data = []
      for _ in range(num_fake):
          lat, lon = random_ocean_coord()
@@ -60,7 +67,6 @@ def generate_fake_data(df, num_fake):
          date = random_date()
          new_data.append([lat, lon, species, email, date])
 
-     # Create a DataFrame and append
      new_df = pd.DataFrame(new_data, columns=['lat', 'lon', 'species', 'author_email', 'date'])
      df = pd.concat([df, new_df], ignore_index=True)
      return df
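A small usage example; the column names match those used by `generate_fake_data`, and starting from an empty frame is just for illustration:

```python
import pandas as pd
from dataset.fake_data import generate_fake_data

df = pd.DataFrame(columns=["lat", "lon", "species", "author_email", "date"])
df = generate_fake_data(df, num_fake=100)  # appends 100 synthetic observations
```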
src/{hf_push_observations.py → dataset/hf_push_observations.py} RENAMED
File without changes
src/dataset/requests.py CHANGED
@@ -5,14 +5,27 @@ from dataset.download import get_dataset
  from dataset.fake_data import generate_fake_data
 
  def data_prep():
-     "Doing data prep"
+     """
+     Prepares the dataset for use in the application.
+     Downloads the dataset and cleans the data (and generates fake data if needed).
+     Returns:
+         pd.DataFrame: A DataFrame containing the cleaned dataset.
+     """
      df = get_dataset()
+     # uncomment to generate some fake data
      # df = generate_fake_data(df, 100)
      df = clean_lat_long(df)
      df = clean_date(df)
      return df
 
  def filter_data(df):
+     """
+     Filter the DataFrame based on user-selected ranges for latitude, longitude, and date.
+     Args:
+         df (pd.DataFrame): DataFrame to filter.
+     Returns:
+         pd.DataFrame: Filtered DataFrame.
+     """
      df_filtered = df[
          (df['date'] >= pd.to_datetime(st.session_state.date_range[0])) &
          (df['date'] <= pd.to_datetime(st.session_state.date_range[1])) &
@@ -24,7 +37,11 @@ def filter_data(df):
      return df_filtered
 
  def show_specie_author(df):
-     print(df)
+     """
+     Display a list of species and their corresponding authors with checkboxes.
+     Args:
+         df (pd.DataFrame): DataFrame containing species and author information.
+     """
      df = df.groupby(['species', 'author_email']).size().reset_index(name='counts')
      for specie in df["species"].unique():
          st.subheader(f"Species: {specie}")
@@ -35,6 +52,15 @@ def show_specie_author(df):
          st.session_state.checkbox_states[key] = st.checkbox(label, key=key)
 
  def show_new_data_view(df):
+     """
+     Filter the dataframe based on the localisation sliders and the time frame selected by the user.
+     Then show the results of the filtering, grouped by species and then by author.
+     Each author is matched to a checkbox component so the user can select the authors from whom they wish to request data.
+     Args:
+         df (pd.DataFrame): DataFrame to filter and display.
+     Returns:
+         pd.DataFrame: Filtered and grouped DataFrame.
+     """
      df = filter_data(df)
      df_ordered = show_specie_author(df)
      return df_ordered
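A sketch of how a page might wire these functions together; it assumes the sidebar has already populated `st.session_state.date_range`, the latitude/longitude ranges, and `checkbox_states`, as `filter_data` and `show_specie_author` expect:

```python
import streamlit as st
from dataset.requests import data_prep, show_new_data_view

if "checkbox_states" not in st.session_state:
    st.session_state.checkbox_states = {}

df = data_prep()        # download + clean the dataset
show_new_data_view(df)  # filter by the sidebar state and render species/author checkboxes
```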
src/pages/4_🔥_classifiers.py CHANGED
@@ -19,7 +19,7 @@ from input.input_handling import init_input_container_states, add_input_UI_elements
  from input.input_handling import dbg_show_observation_hashes
 
  from utils.workflow_ui import refresh_progress_display, init_workflow_viz, init_workflow_session_states
- from hf_push_observations import push_all_observations
+ from dataset.hf_push_observations import push_all_observations
 
  from classifier.classifier_image import cetacean_just_classify, cetacean_show_results_and_review, cetacean_show_results, init_classifier_session_states
  from classifier.classifier_hotdog import hotdog_classify
@@ -84,7 +84,6 @@ with tab_inference:
      g_logger.info(f"{st.session_state.observations}")
 
      df = pd.DataFrame([obs.to_dict() for obs in st.session_state.observations.values()])
-     #df = pd.DataFrame(st.session_state.observations, index=[0])
      # with tab_coords:
      #     st.table(df)
      # there doesn't seem to be any actual validation here?? TODO: find validator function (each element is validated by the input box, but is there something at the whole image level?)
  # there doesn't seem to be any actual validation here?? TODO: find validator function (each element is validated by the input box, but is there something at the whole image level?)