chore: documentation of refactor
- docs/{hotdog.md → classifier_hotdog.md} +0 -0
- docs/dataset_cleaner.md +3 -0
- docs/dataset_download.md +3 -0
- docs/dataset_fake_data.md +3 -0
- docs/{hf_push_observations.md → dataset_hf_push_observations.md} +1 -1
- docs/dataset_requests.md +3 -0
- docs/{main.md → home.md} +0 -0
- docs/pages.md +1 -0
- docs/pages_classifiers.md +3 -0
- docs/pages_gallery.md +3 -0
- docs/pages_logs.md +3 -0
- docs/pages_map.md +3 -0
- docs/pages_requests.md +3 -0
- docs/release_protocol.md +13 -0
- docs/{fix_tabrender.md → utils_fix_tabrender.md} +0 -0
- docs/{grid_maker.md → utils_grid_maker.md} +0 -0
- docs/{metadata_handler.md → utils_metadata_handler.md} +0 -0
- mkdocs.yaml +26 -18
- src/classifier/classifier_image.py +1 -1
- src/dataset/cleaner.py +14 -0
- src/dataset/download.py +6 -0
- src/dataset/fake_data.py +8 -2
- src/{hf_push_observations.py → dataset/hf_push_observations.py} +0 -0
- src/dataset/requests.py +28 -2
- src/pages/4_π₯_classifiers.py +1 -2
docs/{hotdog.md → classifier_hotdog.md}
RENAMED
File without changes
docs/dataset_cleaner.md
ADDED
@@ -0,0 +1,3 @@
+This module provides basic cleaning checks for the downloaded dataset; any row that does not have the expected types is discarded.
+
+::: src.dataset.cleaner
docs/dataset_download.md
ADDED
@@ -0,0 +1,3 @@
+This module provides a download function for accessing the Hugging Face dataset.
+
+::: src.dataset.download
docs/dataset_fake_data.md
ADDED
@@ -0,0 +1,3 @@
+This module generates fake data for the dataset.
+
+::: src.dataset.fake_data
docs/{hf_push_observations.md → dataset_hf_push_observations.md}
RENAMED
@@ -1,3 +1,3 @@
 This module writes an observation into a temporary JSON file, in order to add this JSON file to the Saving-Willy Dataset in the Saving-Willy Hugging Face Community.
 
-::: src.hf_push_observations
+::: src.dataset.hf_push_observations
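For orientation, writing an observation to a temporary JSON file and adding it to a Hugging Face dataset repository typically looks like the hedged sketch below. The repo id, path naming scheme, and observation fields are illustrative assumptions, not the module's actual values.

```python
# Hedged sketch of pushing an observation as a temporary JSON file to a
# Hugging Face dataset repo. Repo id, path, and fields are placeholders only.
import json
import tempfile
from huggingface_hub import HfApi

observation = {"lat": 46.2, "lon": 6.1, "species": "humpback_whale",
               "author_email": "someone@example.com", "date": "2024-05-01"}

# write the observation to a temporary JSON file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(observation, f)
    tmp_path = f.name

# upload the file into the dataset repo (uses the HF token from the environment)
api = HfApi()
api.upload_file(
    path_or_fileobj=tmp_path,
    path_in_repo="observations/example-observation.json",  # placeholder path
    repo_id="some-org/some-dataset",                        # placeholder dataset repo
    repo_type="dataset",
)
```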
docs/dataset_requests.md
ADDED
@@ -0,0 +1,3 @@
+This module provides functions for filtering the data by location and time, and for rendering the search options as well as the search results.
+
+::: src.dataset.requests
docs/{main.md → home.md}
RENAMED
File without changes
docs/pages.md
ADDED
@@ -0,0 +1 @@
+The pages documented here are the pages with functional code. Some pages, such as about, benchmarking, and challenges, currently contain only prose, markdown, and images, and do not require further documentation.
docs/pages_classifiers.md
ADDED
@@ -0,0 +1,3 @@
+This page displays the input mechanism for your images, as well as all the classifiers that can run inference on your images.
+
+::: src.pages.4_π₯_classifiers
docs/pages_gallery.md
ADDED
@@ -0,0 +1,3 @@
+This page displays all the cetacean species that can be identified by classifiers.
+
+::: src.pages.7_π_gallery
docs/pages_logs.md
ADDED
@@ -0,0 +1,3 @@
+This page displays all the logs coming from user interactions with the platform and from back-end queries to the Hugging Face server.
+
+::: src.pages.π_logs
docs/pages_map.md
ADDED
@@ -0,0 +1,3 @@
+This page displays the recorded observations of the dataset on a map.
+
+::: src.pages.2_π_map
docs/pages_requests.md
ADDED
@@ -0,0 +1,3 @@
+This page displays the data that can be requested. The default view covers all the data in the dataset. The filters on the sidebar allow narrowing the view down to different geographical zones as well as different time frames.
+
+::: src.pages.3_π€_data requests
docs/release_protocol.md
ADDED
@@ -0,0 +1,13 @@
+# Release Protocol
+
+We use two Spaces on Hugging Face: one for development of the interface, and the main Space for showcasing the most recent stable release. The main branch is protected and deploys to the main Space when a PR is accepted.
+
+We wish to enforce a strict set of commits from the dev branch to the main branch when a PR is made to create a new release.
+
+Dev to Main PR Checklist:
+
+1. Open a PR from the dev branch to the main branch
+2. Commit: change the dataset configuration to point to the main dataset
+3. Commit: change the naming in the README to avoid merge conflicts
+4. Ask for review
+5. Merge and make a new release of the code
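Step 2 of the checklist above is essentially a one-line configuration change. As a hedged illustration (the repo ids and file path below are placeholders, not the project's real dataset identifiers), it amounts to switching the module-level dataset pointer used by the download code:

```python
# Illustrative only: the dev-to-main release commit switches the dataset pointer.
# Both repo ids below are placeholders, not the project's actual dataset names.

# dev branch:
# dataset_id = "some-org/dev-observations"

# main branch, after the release commit:
dataset_id = "some-org/main-observations"
data_files = "data/train-00000-of-00001.parquet"  # placeholder data file path
```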
docs/{fix_tabrender.md → utils_fix_tabrender.md}
RENAMED
File without changes
docs/{grid_maker.md → utils_grid_maker.md}
RENAMED
File without changes
docs/{metadata_handler.md → utils_metadata_handler.md}
RENAMED
File without changes
mkdocs.yaml
CHANGED
@@ -22,32 +22,40 @@ plugins:
 
 nav:
   - README: index.md
-
-
-
-
-  - Main
+  - Release Protocol: release_protocol.md
+  - How to contribute:
+  - Dev Notes: dev_notes.md
+  - App:
+  - Main App & Home Page: home.md
+  - Pages:
+  - Overall Notes: pages.md
+  - Map Page: pages_map.md
+  - Requests Page: pages_requests.md
+  - Classifiers Page: pages_classifiers.md
+  - Gallery Page: pages_gallery.md
+  - Logs: pages_logs.md
   - Modules:
-  - Data
-  - Data
-  - Data
+  - Data Entry Handling:
+  - Data Input: input_handling.md
+  - Data Extraction & Validation: input_validator.md
   - Data Object Class: input_observation.md
-
+  - Hugging Face Dataset:
+  - Download: dataset_download.md
+  - Cleaning: dataset_cleaner.md
+  - Push Observations to Dataset: dataset_hf_push_observations.md
+  - Data Requests: dataset_requests.md
+  - Fake data: dataset_fake_data.md
+  - Hugging Face Classifiers:
   - Cetacean Fluke & Fin Recognition: classifier_image.md
-  - (temporary) Hotdog Classifier:
-  - Hugging Face Integration:
-  - Push Observations to Dataset: hf_push_observations.md
+  - (temporary) Hotdog Classifier: classifier_hotdog.md
   - Map of observations: obs_map.md
   - Whale gallery: whale_gallery.md
   - Whale viewer: whale_viewer.md
   - Logging: st_logs.md
   - Utils:
-  - Tab-rendering fix (js):
-  - Metadata handling:
-  - Grid maker:
+  - Tab-rendering fix (js): utils_fix_tabrender.md
+  - Metadata handling: utils_metadata_handler.md
+  - Grid maker: utils_grid_maker.md
 
   - Development clutter:
   - Demo app: app.md
-
-  - How to contribute:
-  - Dev Notes: dev_notes.md
src/classifier/classifier_image.py
CHANGED
@@ -7,7 +7,7 @@ g_logger = logging.getLogger(__name__)
 g_logger.setLevel(LOG_LEVEL)
 
 import whale_viewer as viewer
-from hf_push_observations import push_observations
+from dataset.hf_push_observations import push_observations
 from utils.grid_maker import gridder
 from utils.metadata_handler import metadata2md
 from input.input_observation import InputObservation
src/dataset/cleaner.py
CHANGED
@@ -1,6 +1,13 @@
 import pandas as pd
 
 def clean_lat_long(df): # Ensure lat and lon are numeric, coerce errors to NaN
+    """
+    Clean the latitude and longitude columns in the DataFrame.
+    Args:
+        df (pd.DataFrame): DataFrame containing latitude and longitude columns.
+    Returns:
+        pd.DataFrame: DataFrame with cleaned latitude and longitude columns.
+    """
     df['lat'] = pd.to_numeric(df['lat'], errors='coerce')
     df['lon'] = pd.to_numeric(df['lon'], errors='coerce')
 
@@ -9,6 +16,13 @@ def clean_lat_long(df): # Ensure lat and lon are numeric, coerce errors to NaN
     return df
 
 def clean_date(df): # Ensure dates are parseable, coerce errors to NaT
+    """
+    Clean the date column in the DataFrame.
+    Args:
+        df (pd.DataFrame): DataFrame containing a date column.
+    Returns:
+        pd.DataFrame: DataFrame with a cleaned date column.
+    """
     df['date'] = pd.to_datetime(df['date'], errors='coerce')
     # Drop rows with NaN in date
     df = df.dropna(subset=['date']).reset_index(drop=True)
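A minimal usage sketch of the two cleaners above; the sample rows are made up for illustration, and the import path assumes `src` is on the Python path as it is in the Streamlit app.

```python
# Minimal sketch: how the cleaners discard rows with unexpected types.
# Sample data is invented; import path assumes `src` is on PYTHONPATH.
import pandas as pd
from dataset.cleaner import clean_lat_long, clean_date

df = pd.DataFrame({
    "lat":  [46.2, "not-a-number", 12.5],
    "lon":  [6.1, 7.0, 8.3],
    "date": ["2024-05-01", "2024-06-02", "garbage"],
})

df = clean_lat_long(df)  # non-numeric lat/lon coerced to NaN and the row dropped
df = clean_date(df)      # unparseable dates coerced to NaT and the row dropped
print(df)                # only the first row should survive
```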
src/dataset/download.py
CHANGED
@@ -63,6 +63,12 @@ def try_download_dataset(dataset_id:str, data_files:str) -> dict:
     return metadata
 
 def get_dataset():
+    """
+    Downloads the dataset from Hugging Face and prepares it for use.
+    If the dataset is not available, it creates an empty DataFrame with the specified schema.
+    Returns:
+        pd.DataFrame: A DataFrame containing the dataset, or an empty DataFrame if the dataset is not available.
+    """
     # load/download data from huggingface dataset
     metadata = try_download_dataset(dataset_id, data_files)
 
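For orientation, a rough sketch of the load-or-fall-back-to-empty pattern that the new get_dataset() docstring describes. The repo id, data file path, and exact schema handling are illustrative assumptions, not the module's actual values.

```python
# Hedged sketch of the download-with-fallback pattern documented by get_dataset().
# The repo id, data file, and schema below are placeholders for illustration only.
import pandas as pd
from datasets import load_dataset

EXPECTED_COLUMNS = ["lat", "lon", "species", "author_email", "date"]  # assumed schema

def get_dataset_sketch(dataset_id: str, data_files: str) -> pd.DataFrame:
    try:
        ds = load_dataset(dataset_id, data_files=data_files, split="train")
        df = ds.to_pandas()
    except Exception:
        # Dataset unavailable: return an empty frame with the expected schema.
        df = pd.DataFrame(columns=EXPECTED_COLUMNS)
    return df

# e.g. get_dataset_sketch("some-org/some-dataset", "data/*.parquet")
```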
src/dataset/fake_data.py
CHANGED
@@ -4,6 +4,14 @@ import random
 from datetime import datetime, timedelta
 
 def generate_fake_data(df, num_fake):
+    """
+    Generate fake data for the dataset.
+    Args:
+        df (pd.DataFrame): Original DataFrame to append fake data to.
+        num_fake (int): Number of fake observations to generate.
+    Returns:
+        pd.DataFrame: DataFrame with the original and fake data.
+    """
 
     # Options for random generation
     species_options = [
@@ -51,7 +59,6 @@ def generate_fake_data(df, num_fake):
         end = datetime(end_year, 1, 1)
         return start + timedelta(days=random.randint(0, (end - start).days))
 
-    # Generate 20 new observations
    new_data = []
    for _ in range(num_fake):
        lat, lon = random_ocean_coord()
@@ -60,7 +67,6 @@ def generate_fake_data(df, num_fake):
        date = random_date()
        new_data.append([lat, lon, species, email, date])
 
-    # Create a DataFrame and append
    new_df = pd.DataFrame(new_data, columns=['lat', 'lon', 'species', 'author_email', 'date'])
    df = pd.concat([df, new_df], ignore_index=True)
    return df
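A compressed, self-contained sketch of the kind of row generate_fake_data appends, mirroring the [lat, lon, species, author_email, date] columns used above; the species names, coordinate bounds, and email are invented for illustration.

```python
# Hedged, self-contained sketch of one fake observation row. All concrete
# values (species names, coordinate bounds, email address) are illustrative.
import random
from datetime import datetime, timedelta

import pandas as pd

species_options = ["humpback_whale", "blue_whale"]  # placeholder species

def random_ocean_coord():
    # crude placeholder: any lat/lon pair, not guaranteed to be at sea
    return random.uniform(-60, 60), random.uniform(-180, 180)

def random_date(start_year=2020, end_year=2025):
    start, end = datetime(start_year, 1, 1), datetime(end_year, 1, 1)
    return start + timedelta(days=random.randint(0, (end - start).days))

lat, lon = random_ocean_coord()
row = [lat, lon, random.choice(species_options), "someone@example.com", random_date()]
fake_df = pd.DataFrame([row], columns=["lat", "lon", "species", "author_email", "date"])
print(fake_df)
```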
src/{hf_push_observations.py → dataset/hf_push_observations.py}
RENAMED
File without changes
src/dataset/requests.py
CHANGED
@@ -5,14 +5,27 @@ from dataset.download import get_dataset
 from dataset.fake_data import generate_fake_data
 
 def data_prep():
-    "
+    """
+    Prepares the dataset for use in the application.
+    Downloads the dataset and cleans the data (and generates fake data if needed).
+    Returns:
+        pd.DataFrame: A DataFrame containing the cleaned dataset.
+    """
     df = get_dataset()
+    # uncomment to generate some fake data
     # df = generate_fake_data(df, 100)
     df = clean_lat_long(df)
     df = clean_date(df)
     return df
 
 def filter_data(df):
+    """
+    Filter the DataFrame based on user-selected ranges for latitude, longitude, and date.
+    Args:
+        df (pd.DataFrame): DataFrame to filter.
+    Returns:
+        pd.DataFrame: Filtered DataFrame.
+    """
     df_filtered = df[
         (df['date'] >= pd.to_datetime(st.session_state.date_range[0])) &
         (df['date'] <= pd.to_datetime(st.session_state.date_range[1])) &
@@ -24,7 +37,11 @@ def filter_data(df):
     return df_filtered
 
 def show_specie_author(df):
-
+    """
+    Display a list of species and their corresponding authors with checkboxes.
+    Args:
+        df (pd.DataFrame): DataFrame containing species and author information.
+    """
     df = df.groupby(['species', 'author_email']).size().reset_index(name='counts')
     for specie in df["species"].unique():
         st.subheader(f"Species: {specie}")
@@ -35,6 +52,15 @@ def show_specie_author(df):
         st.session_state.checkbox_states[key] = st.checkbox(label, key=key)
 
 def show_new_data_view(df):
+    """
+    Filter the dataframe based on the state of the localisation sliders and the timeframe selected by the user.
+    Then show the results of the filtering, grouped by species and then by author.
+    Each author is matched to a checkbox component so the user can select it to request data from that author.
+    Args:
+        df (pd.DataFrame): DataFrame to filter and display.
+    Returns:
+        pd.DataFrame: Filtered and grouped DataFrame.
+    """
     df = filter_data(df)
     df_ordered = show_specie_author(df)
     return df_ordered
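For orientation, a plain-pandas sketch of the date/latitude/longitude range filter that filter_data() reads from st.session_state; the variable names and bounds below are stand-ins for the actual Streamlit slider state, not the real session-state keys.

```python
# Plain-pandas sketch of the range filter applied by filter_data(). The
# date_range/lat_range/lon_range variables stand in for Streamlit slider state.
import pandas as pd

df = pd.DataFrame({
    "lat": [46.2, -12.0],
    "lon": [6.1, 130.0],
    "species": ["humpback_whale", "blue_whale"],
    "author_email": ["a@example.com", "b@example.com"],
    "date": pd.to_datetime(["2024-05-01", "2021-01-15"]),
})

date_range = ("2024-01-01", "2024-12-31")          # analogous to st.session_state.date_range
lat_range, lon_range = (40.0, 50.0), (0.0, 10.0)   # analogous lat/lon slider state

df_filtered = df[
    (df["date"] >= pd.to_datetime(date_range[0])) &
    (df["date"] <= pd.to_datetime(date_range[1])) &
    df["lat"].between(*lat_range) &
    df["lon"].between(*lon_range)
]
print(df_filtered)  # only the first row matches
```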
src/pages/4_π₯_classifiers.py
CHANGED
@@ -19,7 +19,7 @@ from input.input_handling import init_input_container_states, add_input_UI_elements
 from input.input_handling import dbg_show_observation_hashes
 
 from utils.workflow_ui import refresh_progress_display, init_workflow_viz, init_workflow_session_states
-from hf_push_observations import push_all_observations
+from dataset.hf_push_observations import push_all_observations
 
 from classifier.classifier_image import cetacean_just_classify, cetacean_show_results_and_review, cetacean_show_results, init_classifier_session_states
 from classifier.classifier_hotdog import hotdog_classify
@@ -84,7 +84,6 @@ with tab_inference:
     g_logger.info(f"{st.session_state.observations}")
 
     df = pd.DataFrame([obs.to_dict() for obs in st.session_state.observations.values()])
-    #df = pd.DataFrame(st.session_state.observations, index=[0])
     # with tab_coords:
     #     st.table(df)
     # there doesn't seem to be any actual validation here?? TODO: find validator function (each element is validated by the input box, but is there something at the whole image level?)