AlvaroMros committed on
Commit 5271c2e · 1 Parent(s): 0e63702

Update README and UFC data, retrain models

Expanded and clarified the README with detailed usage instructions for scraping, prediction, and pipeline execution. Updated ufc_fights.csv with new event results, added output/last_event.json, and refreshed model artifacts and results to reflect retraining on the latest data.

README.md CHANGED
@@ -19,20 +19,50 @@ pinned: false
 ```bash
 pip install -r requirements.txt
 ```
-## Scraping:
-Scrape ALL fight and fighter data from [ufcstats.com](http://ufcstats.com) up to the latest event and save them in `.csv` format
 
-2. Then run the main script to scrape all data:
+## Usage
 
+### 1. Data Scraping
+
+**Initial Setup (First Time):**
+```bash
+python -m src.main --pipeline scrape --scrape-mode full
+```
+Scrapes all historical fight data from ufcstats.com.
+
+**Update Data (Regular Use):**
+```bash
+python -m src.main --pipeline scrape --scrape-mode update
+```
+Adds only the latest events to the existing data.
+
+### 2. Fight Prediction
+
+**Use Existing Models (Fast):**
 ```bash
-python -m src.scrape.main
+python -m src.main --pipeline predict
 ```
-This command will execute the entire scraping and processing pipeline, saving the final CSV files in the `output/` directory.
+Loads saved models if available, and retrains them if new data is available.
 
-## Train and save ML models:
+**Force Retrain Models:**
+```bash
+python -m src.main --pipeline predict --force-retrain
+```
+Always retrains all models from scratch on the latest data. This is useful when the training procedure itself has changed.
 
-This trains a different set of ML models and saves them in `output/models`.
+### 3. Complete Pipeline
 
 ```bash
-python -m src.predict.main
+python -m src.main --pipeline all --scrape-mode update
 ```
+Runs scraping (update mode), analysis, and prediction in sequence.
+
+## Model Performance
+
+Models are evaluated on the most recent UFC event to give realistic accuracy scores (typically 50-70% for fight prediction).
+
+## Output
+
+- **Data:** `output/ufc_fights.csv`, `output/ufc_fighters.csv`
+- **Models:** `output/models/*.joblib`
+- **Results:** `output/model_results.json`
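
The README commands can also be launched from a Python script. The helper below is a minimal, hypothetical sketch (`build_pipeline_command` is not part of the repository). It only assembles the flags this commit defines for `src/main.py` (`--pipeline`, `--scrape-mode`, `--force-retrain`) and assumes it is run from the repository root so that `src` is importable.

```python
import sys

# Pipeline names accepted by src/main.py's --pipeline option in this commit.
PIPELINES = ('scrape', 'analysis', 'predict', 'all')
SCRAPE_MODES = ('full', 'update')


def build_pipeline_command(pipeline, scrape_mode=None, force_retrain=False,
                           python=sys.executable):
    """Return an argv list equivalent to `python -m src.main ...` (hypothetical helper)."""
    if pipeline not in PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    cmd = [python, '-m', 'src.main', '--pipeline', pipeline]
    if scrape_mode is not None:
        if scrape_mode not in SCRAPE_MODES:
            raise ValueError(f"unknown scrape mode: {scrape_mode!r}")
        cmd += ['--scrape-mode', scrape_mode]
    if force_retrain:
        cmd.append('--force-retrain')
    return cmd
```

For example, `subprocess.run(build_pipeline_command('all', scrape_mode='update'), check=True)`, run from the repository root, corresponds to the Complete Pipeline command.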
output/last_event.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a6437dbe76de54ac99372958849c4fda0baab3fe5dae46844de8201f4df7ea50
+size 168
output/model_results.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3ac3da79a015fe96d6a70000dea70be81cd626208bfdc05a79b2c7d444d68a59
-size 222959
+oid sha256:bf8df1ba9e26fa98e34bfb1c773e66576cbf89152087c55b70921269c84f39d5
+size 27286
output/models/BernoulliNBModel.joblib CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:48e363229ce42b62cc80eaa694e53906527f17faecd49fc952ec8b70753bec39
-size 5338648
+oid sha256:5ff1f1701e009137de1325c65eda57ff32444f723b07d6bc9bf0dd5b87d4dd01
+size 5344949
output/models/LGBMModel.joblib CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e91f0d2d056c0a2d0d19866cac1a547498cf1c5d819e34b842880554befd30bc
-size 6649369
+oid sha256:a2acd855ed50d393d06119fc0a3cff73e7a2e1affe2d387e631169b52e8083dd
+size 6657224
output/models/LogisticRegressionModel.joblib CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0db74028309dac730252143b3df7fc115c145dfab7da1f1dc1b25f55c1c3f65a
-size 5511435
+oid sha256:7a773552b7f1b166858ab1ff7bdf472e24b293279a8e24871de773b1a3de46e1
+size 5517988
output/models/RandomForestModel.joblib CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b1dfa2fa6240d5979ebaf66aa933b0d5c10f0919cf14c56e65047cd89ebd5259
-size 49556539
+oid sha256:100ab12c17d233b9ac97e75a8d81cf339c0d7cbd7f17050005f535f2965a67cd
+size 49715610
output/models/SVCModel.joblib CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:063b4df8b8c95679fb861498247120466f125a54245abd6498acaf5fb4c73a93
-size 7193785
+oid sha256:e4db6a11d4082ffa4d8626e485959c42868553380a7dabfc93db55bceaecd873
+size 7204520
output/models/XGBoostModel.joblib CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cbbb33909efa675e5dc7f2860c0ae32a90d4721ff92175aa03728bfa793af979
-size 6060855
+oid sha256:901938289fd8ac976f04be6ae72ba6ea9df9dcda4d6d37955f47bb9fdf2acd30
+size 6070396
output/ufc_fights.csv CHANGED
The diff for this file is too large to render.
 
src/config.py CHANGED
@@ -5,6 +5,6 @@ MODELS_DIR = os.path.join(OUTPUT_DIR, 'models')
 MODEL_RESULTS_PATH = os.path.join(OUTPUT_DIR, 'model_results.json')
 FIGHTS_CSV_PATH = os.path.join(OUTPUT_DIR, 'ufc_fights.csv')
 FIGHTERS_CSV_PATH = os.path.join(OUTPUT_DIR, 'ufc_fighters.csv')
-UPCOMING_EVENTS_JSON_PATH = os.path.join(OUTPUT_DIR, 'upcoming_events.json')
 EVENTS_JSON_PATH = os.path.join(OUTPUT_DIR, 'events.json')
-
+FIGHTERS_JSON_PATH = os.path.join(OUTPUT_DIR, 'fighters.json')
+LAST_EVENT_JSON_PATH = os.path.join(OUTPUT_DIR, 'last_event.json')
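
`LAST_EVENT_JSON_PATH` points at a small JSON file. Based on `_save_last_trained_event` in `src/predict/pipeline.py`, the prediction pipeline stores it as a one-element list holding `name`, `date`, and `training_timestamp`. The round trip below is a sketch under that assumption; the function names are hypothetical, and the reader returns a `(name, date)` pair on every path so callers can always unpack two values.

```python
import json
import os
from datetime import datetime


def save_last_trained_event(path, event_name, event_date):
    """Write the one-element list layout used by _save_last_trained_event."""
    record = [{
        "name": event_name,
        "date": event_date,
        "training_timestamp": datetime.now().isoformat(),
    }]
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(record, f, indent=4)


def load_last_trained_event(path):
    """Return (name, date); (None, None) if the file is missing, unreadable, or empty."""
    if not os.path.exists(path):
        return None, None
    try:
        with open(path, 'r', encoding='utf-8') as f:
            data = json.load(f)
    except (json.JSONDecodeError, OSError):
        return None, None
    # Only the first entry of a non-empty list of dicts is meaningful.
    if isinstance(data, list) and data and isinstance(data[0], dict):
        return data[0].get('name'), data[0].get('date')
    return None, None
```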
src/main.py CHANGED
@@ -1,5 +1,96 @@
+import argparse
+import sys
+import os
 
+# Add the current directory to Python path for imports
+sys.path.append(os.path.dirname(os.path.abspath(__file__)))
 
-# Run scrape.main
-# Run analysis.elo to add elo
-# Run predict.main for ML models
+def main():
+    """
+    Main entry point for the UFC data pipeline.
+    Supports scraping, analysis, and prediction workflows.
+    """
+    parser = argparse.ArgumentParser(description="UFC Data Pipeline")
+    parser.add_argument(
+        '--pipeline',
+        type=str,
+        default='scrape',
+        choices=['scrape', 'analysis', 'predict', 'all'],
+        help="Pipeline to run: 'scrape', 'analysis', 'predict', or 'all'"
+    )
+    parser.add_argument(
+        '--scrape-mode',
+        type=str,
+        default='full',
+        choices=['full', 'update'],
+        help="Scraping mode: 'full' (complete scraping) or 'update' (latest events only)"
+    )
+    parser.add_argument(
+        '--num-events',
+        type=int,
+        default=5,
+        help="Number of latest events to scrape in update mode (default: 5)"
+    )
+    # Model management arguments for prediction pipeline
+    parser.add_argument(
+        '--use-existing-models',
+        action='store_true',
+        default=True,
+        help="Use existing saved models if available and no new data (default: True)."
+    )
+    parser.add_argument(
+        '--no-use-existing-models',
+        action='store_true',
+        default=False,
+        help="Force retrain all models from scratch, ignoring existing saved models."
+    )
+    parser.add_argument(
+        '--force-retrain',
+        action='store_true',
+        default=False,
+        help="Force retrain all models even if no new data is available."
+    )
+
+    args = parser.parse_args()
+
+    if args.pipeline in ['scrape', 'all']:
+        print("=== Running Scraping Pipeline ===")
+        from scrape.main import main as scrape_main
+
+        # Override sys.argv to pass arguments to scrape.main
+        original_argv = sys.argv
+        sys.argv = ['scrape_main', '--mode', args.scrape_mode, '--num-events', str(args.num_events)]
+        try:
+            scrape_main()
+        finally:
+            sys.argv = original_argv
+
+    if args.pipeline in ['analysis', 'all']:
+        print("\n=== Running ELO Analysis ===")
+        from analysis.elo import main as elo_main
+        elo_main()
+
+    if args.pipeline in ['predict', 'all']:
+        print("\n=== Running Prediction Pipeline ===")
+        from predict.main import main as predict_main
+
+        # Override sys.argv to pass model management arguments to predict.main
+        original_argv = sys.argv
+        predict_args = ['predict_main']
+
+        if args.no_use_existing_models:
+            predict_args.append('--no-use-existing-models')
+        elif args.use_existing_models:
+            predict_args.append('--use-existing-models')
+
+        if args.force_retrain:
+            predict_args.append('--force-retrain')
+
+        sys.argv = predict_args
+        try:
+            predict_main()
+        finally:
+            sys.argv = original_argv
+
+if __name__ == '__main__':
+    main()
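
`src/main.py` forwards options to the sub-pipelines by temporarily overwriting `sys.argv`. A common alternative, sketched below under the assumption that each entry point could accept an explicit argument list, is `parser.parse_args(argv)`, which leaves global state untouched and is easy to call from tests. `scrape_main` here is a hypothetical stand-in that only parses `--mode` and `--num-events`; the analysis and predict stages are left as placeholders.

```python
import argparse

# Which stages each --pipeline value runs, in order (mirrors src/main.py).
STAGES = {
    'scrape': ['scrape'],
    'analysis': ['analysis'],
    'predict': ['predict'],
    'all': ['scrape', 'analysis', 'predict'],
}


def scrape_main(argv=None):
    """Hypothetical scrape entry point that takes an explicit argument list."""
    parser = argparse.ArgumentParser(prog='scrape_main')
    parser.add_argument('--mode', choices=['full', 'update'], default='full')
    parser.add_argument('--num-events', type=int, default=5)
    # parse_args(None) falls back to sys.argv[1:]; a list is parsed as given.
    return parser.parse_args(argv)


def main(argv=None):
    parser = argparse.ArgumentParser(description="UFC Data Pipeline")
    parser.add_argument('--pipeline', default='scrape', choices=list(STAGES))
    parser.add_argument('--scrape-mode', default='full', choices=['full', 'update'])
    parser.add_argument('--num-events', type=int, default=5)
    args = parser.parse_args(argv)

    results = {}
    for stage in STAGES[args.pipeline]:
        if stage == 'scrape':
            # Forward options as a list instead of rewriting sys.argv.
            results[stage] = scrape_main(
                ['--mode', args.scrape_mode, '--num-events', str(args.num_events)]
            )
        else:
            # Placeholder: analysis/predict would be called the same way.
            results[stage] = None
    return results
```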
src/predict/main.py CHANGED
@@ -1,6 +1,8 @@
 import argparse
-from .pipeline import PredictionPipeline
-from .models import (
+
+# Use absolute imports to avoid relative import issues
+from src.predict.pipeline import PredictionPipeline
+from src.predict.models import (
     EloBaselineModel,
     LogisticRegressionModel,
     XGBoostModel,
@@ -23,8 +25,37 @@ def main():
         choices=['detailed', 'summary'],
         help="Type of report to generate: 'detailed' (file) or 'summary' (console)."
     )
+    parser.add_argument(
+        '--use-existing-models',
+        action='store_true',
+        default=True,
+        help="Use existing saved models if available and no new data (default: True)."
+    )
+    parser.add_argument(
+        '--no-use-existing-models',
+        action='store_true',
+        default=False,
+        help="Force retrain all models from scratch, ignoring existing saved models."
+    )
+    parser.add_argument(
+        '--force-retrain',
+        action='store_true',
+        default=False,
+        help="Force retrain all models even if no new data is available."
+    )
     args = parser.parse_args()
 
+    # Handle conflicting arguments
+    use_existing_models = not args.no_use_existing_models and args.use_existing_models
+    force_retrain = args.force_retrain
+
+    if args.no_use_existing_models:
+        print("No-use-existing-models flag set: All models will be retrained from scratch.")
+    elif force_retrain:
+        print("Force-retrain flag set: All models will be retrained regardless of new data.")
+    elif use_existing_models:
+        print("Using existing models if available and no new data detected.")
+
     # --- Define Models to Run ---
     # Instantiate all the models you want to evaluate here.
     models_to_run = [
@@ -38,7 +69,11 @@ def main():
     ]
     # --- End of Model Definition ---
 
-    pipeline = PredictionPipeline(models=models_to_run)
+    pipeline = PredictionPipeline(
+        models=models_to_run,
+        use_existing_models=use_existing_models,
+        force_retrain=force_retrain
+    )
 
     try:
         pipeline.run(detailed_report=(args.report == 'detailed'))
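
In both `src/main.py` and `src/predict/main.py`, `--use-existing-models` is declared with `action='store_true'` and `default=True`, so passing it cannot change anything: the value is `True` either way, and only `--no-use-existing-models` has an effect. On Python 3.9+, `argparse.BooleanOptionalAction` expresses the same on/off pair with a single option. The sketch below (`parse_model_flags` is a hypothetical name) shows that approach; it is an alternative, not what this commit does.

```python
import argparse


def parse_model_flags(argv=None):
    """Parse model-management flags with a single on/off option (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # BooleanOptionalAction (Python 3.9+) registers both --use-existing-models
    # and --no-use-existing-models for the same destination.
    parser.add_argument(
        '--use-existing-models',
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Reuse saved models when no new data is available.",
    )
    parser.add_argument(
        '--force-retrain',
        action='store_true',
        help="Retrain all models even if no new data is available.",
    )
    return parser.parse_args(argv)
```

With this, `args.use_existing_models` alone carries the decision, so the `not args.no_use_existing_models and args.use_existing_models` combination becomes unnecessary.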
src/predict/models.py CHANGED
@@ -1,7 +1,6 @@
 from abc import ABC, abstractmethod
 import sys
 import os
-from ..analysis.elo import process_fights_for_elo, INITIAL_ELO
 import pandas as pd
 from sklearn.linear_model import LogisticRegression
 from sklearn.svm import SVC
@@ -9,8 +8,17 @@ from sklearn.naive_bayes import BernoulliNB
 from sklearn.ensemble import RandomForestClassifier
 from xgboost import XGBClassifier
 from lightgbm import LGBMClassifier
-from ..config import FIGHTERS_CSV_PATH
-from .preprocess import preprocess_for_ml, _get_fighter_history_stats, _calculate_age
+
+# Use absolute imports to avoid relative import issues
+try:
+    from src.analysis.elo import process_fights_for_elo, INITIAL_ELO
+    from src.config import FIGHTERS_CSV_PATH
+    from src.predict.preprocess import preprocess_for_ml, _get_fighter_history_stats, _calculate_age
+except ImportError:
+    # Fallback for when running directly
+    from ..analysis.elo import process_fights_for_elo, INITIAL_ELO
+    from ..config import FIGHTERS_CSV_PATH
+    from .preprocess import preprocess_for_ml, _get_fighter_history_stats, _calculate_age
 
 class BaseModel(ABC):
     """
src/predict/pipeline.py CHANGED
@@ -6,22 +6,139 @@ from collections import OrderedDict
 import json
 import joblib
 
-from ..config import FIGHTS_CSV_PATH, MODEL_RESULTS_PATH, MODELS_DIR
+# Use absolute imports to avoid relative import issues
+try:
+    from src.config import FIGHTS_CSV_PATH, MODEL_RESULTS_PATH, MODELS_DIR, LAST_EVENT_JSON_PATH
+except ImportError:
+    # Fallback for when running directly
+    from ..config import FIGHTS_CSV_PATH, MODEL_RESULTS_PATH, MODELS_DIR, LAST_EVENT_JSON_PATH
+
 from .models import BaseModel
 
 class PredictionPipeline:
     """
     Orchestrates the model training, evaluation, and reporting pipeline.
     """
-    def __init__(self, models):
+    def __init__(self, models, use_existing_models=True, force_retrain=False):
         if not all(isinstance(m, BaseModel) for m in models):
             raise TypeError("All models must be instances of BaseModel.")
         self.models = models
         self.train_fights = []
         self.test_fights = []
         self.results = {}
+        self.use_existing_models = use_existing_models
+        self.force_retrain = force_retrain
+
+    def _get_last_trained_event(self):
+        """Get the last event that models were trained on."""
+        if not os.path.exists(LAST_EVENT_JSON_PATH):
+            return None, None
+        try:
+            with open(LAST_EVENT_JSON_PATH, 'r', encoding='utf-8') as f:
+                last_event_data = json.load(f)
+            if isinstance(last_event_data, list) and len(last_event_data) > 0:
+                return last_event_data[0].get('name'), last_event_data[0].get('date')
+            return None, None
+        except (json.JSONDecodeError, FileNotFoundError):
+            return None, None
+
+    def _save_last_trained_event(self, event_name, event_date):
+        """Save the last event that models were trained on."""
+        last_event_data = [{
+            "name": event_name,
+            "date": event_date,
+            "training_timestamp": datetime.now().isoformat()
+        }]
+        try:
+            with open(LAST_EVENT_JSON_PATH, 'w', encoding='utf-8') as f:
+                json.dump(last_event_data, f, indent=4)
+        except Exception as e:
+            print(f"Warning: Could not save last trained event: {e}")
+
+    def _has_new_data_since_last_training(self):
+        """Check if there's new fight data since the last training."""
+        last_event_name, last_event_date = self._get_last_trained_event()
+        if not last_event_name or not last_event_date:
+            return True  # No previous training record, consider as new data
+
+        if not os.path.exists(FIGHTS_CSV_PATH):
+            return False
+
+        with open(FIGHTS_CSV_PATH, 'r', encoding='utf-8') as f:
+            fights = list(csv.DictReader(f))
+
+        if not fights:
+            return False
+
+        # Sort fights by date to get the latest event
+        fights.sort(key=lambda x: datetime.strptime(x['event_date'], '%B %d, %Y'))
+        latest_fight = fights[-1]
+        latest_event_name = latest_fight['event_name']
+        latest_event_date = latest_fight['event_date']
+
+        # Check if we have new events since last training
+        if latest_event_name != last_event_name:
+            print(f"New data detected: Latest event '{latest_event_name}' differs from last trained event '{last_event_name}'")
+            return True
+
+        return False
 
+    def _model_exists(self, model):
+        """Check if a saved model file exists and can be loaded successfully."""
+        model_name = model.__class__.__name__
+        file_name = f"{model_name}.joblib"
+        save_path = os.path.join(MODELS_DIR, file_name)
+
+        if not os.path.exists(save_path):
+            return False
+
+        # Verify the model can actually be loaded
+        try:
+            joblib.load(save_path)
+            return True
+        except Exception as e:
+            print(f"Warning: Model file {file_name} exists but cannot be loaded ({e}). Will retrain.")
+            return False
+
+    def _load_existing_model(self, model_class):
+        """Load an existing model from disk."""
+        model_name = model_class.__name__
+        file_name = f"{model_name}.joblib"
+        load_path = os.path.join(MODELS_DIR, file_name)
+
+        try:
+            loaded_model = joblib.load(load_path)
+            print(f"Loaded existing model: {model_name}")
+            return loaded_model
+        except Exception as e:
+            print(f"Error loading model {model_name}: {e}")
+            return None
+
+    def _should_retrain_models(self):
+        """Determine if models should be retrained."""
+        if self.force_retrain:
+            print("Force retrain flag is set. Retraining all models.")
+            return True
+
+        if not self.use_existing_models:
+            print("Use existing models flag is disabled. Retraining all models.")
+            return True
+
+        # Check if any model files are missing
+        missing_models = [m for m in self.models if not self._model_exists(m)]
+        if missing_models:
+            missing_names = [m.__class__.__name__ for m in missing_models]
+            print(f"Missing model files for: {missing_names}. Retraining all models.")
+            return True
+
+        # Check if there's new data since last training
+        if self._has_new_data_since_last_training():
+            return True
+
+        print("No new data detected and all model files exist. Using existing models.")
+        return False
+
-    def _load_and_split_data(self, num_test_events=10):
+    def _load_and_split_data(self, num_test_events=1):
         """Loads and splits the data into chronological training and testing sets."""
         print("\n--- Loading and Splitting Data ---")
         if not os.path.exists(FIGHTS_CSV_PATH):
@@ -41,7 +158,7 @@ class PredictionPipeline:
         self.train_fights = [f for f in fights if f['event_name'] not in test_event_names]
         self.test_fights = [f for f in fights if f['event_name'] in test_event_names]
         print(f"Data loaded. {len(self.train_fights)} training fights, {len(self.test_fights)} testing fights.")
-        print(f"Testing on the last {num_test_events} events.")
+        print(f"Testing on the last {num_test_events} event(s): {', '.join(test_event_names)}")
 
     def run(self, detailed_report=True):
         """Executes the full pipeline: load, train, evaluate, report and save models."""
@@ -52,10 +169,24 @@ class PredictionPipeline:
             print("No fights with definitive outcomes in the test set. Aborting.")
             return
 
-        for model in self.models:
+        should_retrain = self._should_retrain_models()
+
+        for i, model in enumerate(self.models):
             model_name = model.__class__.__name__
             print(f"\n--- Evaluating Model: {model_name} ---")
 
+            if should_retrain:
+                print(f"Training {model_name}...")
+                model.train(self.train_fights)
+            else:
+                # Try to load existing model, fall back to training if loading fails
+                loaded_model = self._load_existing_model(model.__class__)
+                if loaded_model is not None:
+                    # Replace the model instance with the loaded one
+                    self.models[i] = loaded_model
+                    model = loaded_model
+                else:
+                    print(f"Failed to load {model_name}, training new model...")
                     model.train(self.train_fights)
 
             correct_predictions = 0
@@ -84,10 +215,12 @@ class PredictionPipeline:
                 })
 
             accuracy = (correct_predictions / len(eval_fights)) * 100
+            model_status = "retrained" if should_retrain else "loaded from disk"
             self.results[model_name] = {
                 'accuracy': accuracy,
                 'predictions': predictions,
-                'total_fights': len(eval_fights)
+                'total_fights': len(eval_fights),
+                'model_status': model_status
             }
 
         if detailed_report:
@@ -95,7 +228,9 @@ class PredictionPipeline:
         else:
             self._report_summary()
 
-        self._train_and_save_models()
+        # Only train and save models if retraining was performed
+        if should_retrain:
+            self._train_and_save_models()
 
     def _train_and_save_models(self):
         """Trains all models on the full dataset and saves them."""
@@ -114,6 +249,13 @@ class PredictionPipeline:
             os.makedirs(MODELS_DIR)
             print(f"Created directory: {MODELS_DIR}")
 
+        # Get the latest event info for tracking
+        if all_fights:
+            all_fights.sort(key=lambda x: datetime.strptime(x['event_date'], '%B %d, %Y'))
+            latest_fight = all_fights[-1]
+            latest_event_name = latest_fight['event_name']
+            latest_event_date = latest_fight['event_date']
+
         for model in self.models:
             model_name = model.__class__.__name__
             print(f"\n--- Training: {model_name} ---")
@@ -125,14 +267,20 @@ class PredictionPipeline:
             joblib.dump(model, save_path)
             print(f"Model saved successfully to {save_path}")
 
+        # Save the last trained event info
+        if all_fights:
+            self._save_last_trained_event(latest_event_name, latest_event_date)
+            print(f"Updated last trained event: {latest_event_name} ({latest_event_date})")
+
     def _report_summary(self):
         """Prints a concise summary of model performance."""
         print("\n\n--- Prediction Pipeline Summary ---")
-        print(f"{'Model':<25} | {'Accuracy':<10} | {'Fights Evaluated':<20}")
-        print("-" * 65)
+        print(f"{'Model':<25} | {'Accuracy':<10} | {'Fights Evaluated':<20} | {'Status':<15}")
+        print("-" * 80)
         for model_name, result in self.results.items():
-            print(f"{model_name:<25} | {result['accuracy']:<9.2f}% | {result['total_fights']:<20}")
-        print("-" * 65)
+            status = result.get('model_status', 'unknown')
+            print(f"{model_name:<25} | {result['accuracy']:<9.2f}% | {result['total_fights']:<20} | {status:<15}")
+        print("-" * 80)
 
     def _save_report_to_json(self, file_path=MODEL_RESULTS_PATH):
         """Saves the detailed prediction results to a JSON file."""
@@ -153,6 +301,7 @@ class PredictionPipeline:
             report[model_name] = {
                 "overall_accuracy": f"{result['accuracy']:.2f}%",
                 "total_fights_evaluated": result['total_fights'],
+                "model_status": result.get('model_status', 'unknown'),
                 "predictions_by_event": predictions_by_event
             }
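
The retraining decision in `_should_retrain_models` and `_has_new_data_since_last_training` can be expressed as pure functions, which makes the precedence easy to test: an explicit flag wins, then missing model files, then a comparison of the newest event in the fights data against the last event recorded at training time. The sketch below assumes fight rows are dicts with `event_name` and `event_date` in the `'%B %d, %Y'` format parsed by the pipeline. The function names are hypothetical, and unlike the pipeline it compares event names only and uses `max` instead of sorting, since only the newest row is needed.

```python
from datetime import datetime

# Date format parsed by the pipeline, e.g. "April 12, 2025".
DATE_FORMAT = '%B %d, %Y'


def latest_event(fights):
    """Return (event_name, event_date) of the newest fight row, or (None, None)."""
    if not fights:
        return None, None
    # max() finds the newest row without sorting the whole list.
    newest = max(fights, key=lambda f: datetime.strptime(f['event_date'], DATE_FORMAT))
    return newest['event_name'], newest['event_date']


def should_retrain(force_retrain, use_existing_models, missing_models,
                   last_trained_name, fights):
    """Mirror the order of checks in _should_retrain_models (simplified sketch)."""
    if force_retrain or not use_existing_models:
        return True
    if missing_models:
        return True
    if last_trained_name is None:
        # No training record yet: treat everything as new data.
        return True
    latest_name, _ = latest_event(fights)
    if latest_name is None:
        # No fight data at all: nothing new to train on.
        return False
    return latest_name != last_trained_name
```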
 
src/predict/predict_new.py CHANGED
@@ -3,7 +3,12 @@ import os
 import joblib
 from datetime import datetime
 
-from ..config import MODELS_DIR
+# Use absolute imports to avoid relative import issues
+try:
+    from src.config import MODELS_DIR
+except ImportError:
+    # Fallback for when running directly
+    from ..config import MODELS_DIR
 
 def predict_new_fight(fighter1_name, fighter2_name, model_path):
     """
src/predict/preprocess.py CHANGED
@@ -2,6 +2,12 @@ import pandas as pd
 import os
 import sys
 from datetime import datetime
+
+# Use absolute imports to avoid relative import issues
+try:
+    from src.config import FIGHTERS_CSV_PATH
+except ImportError:
+    # Fallback for when running directly
     from ..config import FIGHTERS_CSV_PATH
 
 def _clean_numeric_column(series):
@@ -232,6 +238,11 @@ def preprocess_for_ml(fights_to_process, fighters_csv_path):
     return X, y, metadata
 
 if __name__ == '__main__':
+    # Use absolute imports to avoid relative import issues
+    try:
+        from src.predict.pipeline import PredictionPipeline
+    except ImportError:
+        # Fallback for when running directly
         from .pipeline import PredictionPipeline
 
     print("--- Running Preprocessing Example ---")
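
Several modules in this commit (`models.py`, `pipeline.py`, `predict_new.py`, `preprocess.py`) repeat the same pattern: try the absolute `src.`-prefixed import and fall back to the relative one on `ImportError`. A minimal sketch of that idea as a reusable helper, built on the standard `importlib.import_module`; `import_first` is a hypothetical name, not part of the repository.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that imports cleanly."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate module could be imported: " + "; ".join(errors))
```

For relative names such as `..config`, `importlib.import_module` also needs the `package=` argument (for example `package='src.predict'`). As with the `try`/`except ImportError` blocks above, an `ImportError` raised inside the first candidate (for example a missing third-party dependency) is also treated as "try the next name".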
src/scrape/main.py CHANGED
@@ -1,6 +1,8 @@
1
  import os
2
  import json
3
- from .scrape_fights import scrape_all_events
 
 
4
  from .scrape_fighters import scrape_all_fighters
5
  from .to_csv import json_to_csv, fighters_json_to_csv
6
  from .preprocess import preprocess_fighters_csv
@@ -8,17 +10,46 @@ from .. import config
8
 
9
  def main():
10
  """
11
- Main function to run the complete scraping and preprocessing pipeline.
 
12
  """
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  # Ensure the output directory exists
14
  if not os.path.exists(config.OUTPUT_DIR):
15
  os.makedirs(config.OUTPUT_DIR)
16
  print(f"Created directory: {config.OUTPUT_DIR}")
17
 
 
 
 
 
 
 
 
 
 
 
 
18
  # --- Step 1: Scrape all data from the website ---
19
  # This will generate fighters.json and events.json
20
- scrape_all_fighters()
21
- scrape_all_events()
22
 
23
  # --- Step 2: Convert the scraped JSON data to CSV format ---
24
  # This will generate fighters.csv and fights.csv
@@ -42,7 +73,133 @@ def main():
42
  except OSError as e:
43
  print(f"Error deleting JSON files: {e}")
44
 
45
- print("\n\n--- Scraping and Preprocessing Pipeline Finished ---")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
46
 
47
  if __name__ == '__main__':
48
  main()
 
1
  import os
2
  import json
3
+ import argparse
4
+ import pandas as pd
5
+ from .scrape_fights import scrape_all_events, scrape_latest_events
6
  from .scrape_fighters import scrape_all_fighters
7
  from .to_csv import json_to_csv, fighters_json_to_csv
8
  from .preprocess import preprocess_fighters_csv
 
10
 
11
  def main():
12
  """
13
+ Main function to run the scraping and preprocessing pipeline.
14
+ Supports both full scraping and incremental updates.
15
  """
16
+ parser = argparse.ArgumentParser(description="UFC Data Scraping Pipeline")
17
+ parser.add_argument(
18
+ '--mode',
19
+ type=str,
20
+ default='full',
21
+ choices=['full', 'update'],
22
+ help="Scraping mode: 'full' (complete scraping) or 'update' (latest events + sync from last_event.json)"
23
+ )
24
+ parser.add_argument(
25
+ '--num-events',
26
+ type=int,
27
+ default=5,
28
+ help="Number of latest events to scrape in update mode (default: 5)"
29
+ )
30
+
31
+ args = parser.parse_args()
32
+
33
  # Ensure the output directory exists
34
  if not os.path.exists(config.OUTPUT_DIR):
35
  os.makedirs(config.OUTPUT_DIR)
36
  print(f"Created directory: {config.OUTPUT_DIR}")
37
 
38
+ if args.mode == 'full':
39
+ run_full_pipeline()
40
+ elif args.mode == 'update':
41
+ run_update_pipeline(args.num_events)
42
+
43
+ def run_full_pipeline():
44
+ """
45
+ Runs the complete scraping and preprocessing pipeline.
46
+ """
47
+ print("\n=== Running FULL scraping pipeline ===")
48
+
49
  # --- Step 1: Scrape all data from the website ---
50
  # This will generate fighters.json and events.json
51
+ scrape_all_fighters(config.FIGHTERS_JSON_PATH)
52
+ scrape_all_events(config.EVENTS_JSON_PATH)
53
 
54
  # --- Step 2: Convert the scraped JSON data to CSV format ---
55
  # This will generate fighters.csv and fights.csv
 
73
  except OSError as e:
74
  print(f"Error deleting JSON files: {e}")
75
 
76
+ print("\n\n--- Full Scraping and Preprocessing Pipeline Finished ---")
77
+
78
+ def run_update_pipeline(num_events=5):
79
+ """
80
+ Runs the incremental update pipeline to scrape only the latest events.
81
+ Also adds any events from last_event.json that aren't already in the CSV.
82
+
83
+ Args:
84
+ num_events (int): Number of latest events to scrape
85
+ """
86
+ print(f"\n=== Running UPDATE pipeline for latest {num_events} events ===")
87
+
88
+ # --- Step 1: Scrape latest events only ---
89
+ latest_events = scrape_latest_events(config.LAST_EVENT_JSON_PATH, num_events)
90
+
91
+ # --- Step 2: Save latest events to last_event.json (only if any were scraped) ---
92
+ if latest_events:
93
+ with open(config.LAST_EVENT_JSON_PATH, 'w') as f:
94
+ json.dump(latest_events, f, indent=4)
95
+ print(f"Latest {len(latest_events)} events saved to {config.LAST_EVENT_JSON_PATH}")
96
+
97
+ # --- Step 3: Always check and update from last_event.json ---
98
+ update_fights_csv_from_last_event()
99
+
100
+ print("\n--- Update Pipeline Finished ---")
101
+
102
+ def update_fights_csv_from_last_event():
103
+ """
104
+ Updates the existing fights CSV with any events from last_event.json that aren't already present.
105
+ Ensures latest events are on top and preserves data types.
106
+ """
107
+ # Check if last_event.json exists
108
+ if not os.path.exists(config.LAST_EVENT_JSON_PATH):
109
+ print(f"No {config.LAST_EVENT_JSON_PATH} found. Nothing to update.")
110
+ return
111
+
112
+ # Load events from last_event.json
113
+ try:
114
+ with open(config.LAST_EVENT_JSON_PATH, 'r') as f:
115
+ events_from_json = json.load(f)
116
+
117
+ if not events_from_json:
118
+ print("No events found in last_event.json.")
119
+ return
120
+
121
+ print(f"Found {len(events_from_json)} events in last_event.json")
122
+
123
+ except Exception as e:
124
+ print(f"Error reading last_event.json: {e}")
125
+ return
126
+
127
+ try:
128
+ # Check if main CSV exists
129
+ if os.path.exists(config.FIGHTS_CSV_PATH):
130
+ existing_df = pd.read_csv(config.FIGHTS_CSV_PATH)
131
+ existing_event_names = set(existing_df['event_name'].unique())
132
+ else:
133
+ print(f"Main fights CSV ({config.FIGHTS_CSV_PATH}) not found. Creating new CSV from last_event.json.")
134
+ json_to_csv(config.LAST_EVENT_JSON_PATH, config.FIGHTS_CSV_PATH)
135
+ return
136
+
137
+ # Create temporary CSV from events in last_event.json
138
+ temp_json_path = os.path.join(config.OUTPUT_DIR, 'temp_latest.json')
139
+ temp_csv_path = os.path.join(config.OUTPUT_DIR, 'temp_latest.csv')
140
+
141
+ with open(temp_json_path, 'w') as f:
142
+ json.dump(events_from_json, f, indent=4)
143
+
144
+ json_to_csv(temp_json_path, temp_csv_path)
145
+
146
+ # Read the new CSV
147
+ new_df = pd.read_csv(temp_csv_path)
148
+
149
+ # Filter out events that already exist
150
+ new_events_df = new_df[~new_df['event_name'].isin(existing_event_names)]
151
+
152
+ if len(new_events_df) > 0:
153
+ # Add new events to the TOP of the CSV (latest first)
154
+ combined_df = pd.concat([new_events_df, existing_df], ignore_index=True)
155
+
156
+ # Convert date column to datetime for proper sorting
157
+ combined_df['event_date_parsed'] = pd.to_datetime(combined_df['event_date'])
158
+
159
+ # Sort by date descending (latest first); a stable sort keeps the fight order within each event
160
+ combined_df = combined_df.sort_values('event_date_parsed', ascending=False, kind='stable')
161
+
162
+ # Drop the temporary date column
163
+ combined_df = combined_df.drop('event_date_parsed', axis=1)
164
+
165
+ # Fix data types to remove .0 from numbers
166
+ fix_data_types(combined_df)
167
+
168
+ combined_df.to_csv(config.FIGHTS_CSV_PATH, index=False)
169
+ print(f"Added {len(new_events_df)} new fights from {new_events_df['event_name'].nunique()} events to the TOP of {config.FIGHTS_CSV_PATH}")
170
+ else:
171
+ print("No new events found that aren't already in the existing CSV.")
172
+
173
+ # Clean up temporary files
174
+ if os.path.exists(temp_json_path):
175
+ os.remove(temp_json_path)
176
+ if os.path.exists(temp_csv_path):
177
+ os.remove(temp_csv_path)
178
+
179
+ except Exception as e:
180
+ print(f"Error updating fights CSV: {e}")
181
+ print("Falling back to creating new CSV from last_event.json only.")
182
+ json_to_csv(config.LAST_EVENT_JSON_PATH, config.FIGHTS_CSV_PATH)
183
+
184
+ def fix_data_types(df):
185
+ """
186
+ Fix data types in the dataframe to remove .0 from numbers and preserve original format.
187
+
188
+ Args:
189
+ df (pandas.DataFrame): DataFrame to fix
190
+ """
191
+ for col in df.columns:
192
+ if df[col].dtype == 'float64':
193
+ # Check if the column contains only whole numbers (no actual decimals)
194
+ if df[col].notna().all() and (df[col] % 1 == 0).all():
195
+ df[col] = df[col].astype('int64')
196
+ elif df[col].isna().any():
197
+ # Handle columns with missing values - keep as string to avoid .0
198
+ df[col] = df[col].fillna('').astype(str)
199
+ # Remove .0 from string representations
200
+ df[col] = df[col].str.replace(r'\.0$', '', regex=True)
201
+ # Missing values were already filled with '' above, so they are written
202
+ # as empty cells in the CSV; no further conversion is needed.
203
 
204
  if __name__ == '__main__':
205
  main()
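The update path in `update_fights_csv_from_last_event` boils down to three steps: drop incoming fights whose `event_name` is already in the CSV, prepend the rest, and re-sort newest first. Below is a minimal stdlib sketch of that merge, assuming rows are plain dicts and that `event_date` is an ISO `YYYY-MM-DD` string (an assumption; the real code hands parsing to `pd.to_datetime`, and the scraped date format may differ). `merge_new_fights` is a hypothetical helper, not part of the repository.

```python
from datetime import date


def merge_new_fights(existing_rows, new_rows):
    """Prepend fights from unseen events and return (rows_newest_first, added).

    Assumes each row is a dict with 'event_name' and an ISO-format
    'event_date' (hypothetical format chosen for this sketch).
    """
    # Skip every incoming fight whose event is already present, mirroring
    # the isin(existing_event_names) filter in the diff above.
    known_events = {row["event_name"] for row in existing_rows}
    fresh = [row for row in new_rows if row["event_name"] not in known_events]

    combined = fresh + existing_rows
    # sorted() is stable even with reverse=True, so fights keep their
    # original order within the same event.
    combined = sorted(
        combined,
        key=lambda row: date.fromisoformat(row["event_date"]),
        reverse=True,
    )
    return combined, len(fresh)
```

Sorting with a stable algorithm matters here because all fights on one card share a date; an unstable sort could shuffle the bout order within an event.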
src/scrape/scrape_fighters.py CHANGED
@@ -68,7 +68,7 @@ def process_fighter(fighter_data):
68
  time.sleep(REQUEST_DELAY)
69
  return fighter_data
70
 
71
- def scrape_all_fighters():
72
  """Scrapes all fighters from a-z pages using parallel processing."""
73
 
74
  # Step 1: Sequentially scrape all fighter list pages. This is fast.
@@ -129,14 +129,14 @@ def scrape_all_fighters():
129
 
130
  if (i + 1) > 0 and (i + 1) % 50 == 0:
131
  fighters_with_details.sort(key=lambda x: (x['last_name'], x['first_name']))
132
- with open(config.FIGHTERS_JSON_PATH, 'w') as f:
133
  json.dump(fighters_with_details, f, indent=4)
134
 
135
  fighters_with_details.sort(key=lambda x: (x['last_name'], x['first_name']))
136
  return fighters_with_details
137
 
138
  if __name__ == "__main__":
139
- all_fighters_data = scrape_all_fighters()
140
  if not os.path.exists(config.OUTPUT_DIR):
141
  os.makedirs(config.OUTPUT_DIR)
142
 
 
68
  time.sleep(REQUEST_DELAY)
69
  return fighter_data
70
 
71
+ def scrape_all_fighters(json_path):
72
  """Scrapes all fighters from a-z pages using parallel processing."""
73
 
74
  # Step 1: Sequentially scrape all fighter list pages. This is fast.
 
129
 
130
  if (i + 1) > 0 and (i + 1) % 50 == 0:
131
  fighters_with_details.sort(key=lambda x: (x['last_name'], x['first_name']))
132
+ with open(json_path, 'w') as f:
133
  json.dump(fighters_with_details, f, indent=4)
134
 
135
  fighters_with_details.sort(key=lambda x: (x['last_name'], x['first_name']))
136
  return fighters_with_details
137
 
138
  if __name__ == "__main__":
139
+ all_fighters_data = scrape_all_fighters(config.FIGHTERS_JSON_PATH)
140
  if not os.path.exists(config.OUTPUT_DIR):
141
  os.makedirs(config.OUTPUT_DIR)
142
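Both scrapers now receive the output location as a `json_path` argument and rewrite that file periodically (every 50 fighters, every 10 events), so a run that dies midway still leaves partial results on disk. A minimal sketch of that checkpointing pattern, where `process` stands in for the per-item scraping call and `process_with_checkpoints` / `save_every` are hypothetical names; the final write ensures items after the last checkpoint are not lost:

```python
import json
from typing import Callable, Iterable, List


def process_with_checkpoints(
    items: Iterable[dict],
    process: Callable[[dict], dict],
    json_path: str,
    save_every: int = 50,
) -> List[dict]:
    """Process items in order, rewriting json_path every save_every items."""
    results: List[dict] = []
    for i, item in enumerate(items):
        results.append(process(item))
        # Overwrite the checkpoint with everything collected so far.
        if (i + 1) % save_every == 0:
            with open(json_path, "w") as f:
                json.dump(results, f, indent=4)
    # Final write so the file also contains items after the last checkpoint.
    with open(json_path, "w") as f:
        json.dump(results, f, indent=4)
    return results
```

Passing the path in, rather than importing a constant inside the scraper, is what lets the same function serve both the full pipeline and standalone `__main__` runs.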
 
src/scrape/scrape_fights.py CHANGED
@@ -3,7 +3,7 @@ from bs4 import BeautifulSoup
3
  import json
4
  import time
5
  import concurrent.futures
6
- from ..config import EVENTS_JSON_PATH
7
 
8
  # --- Configuration ---
9
  # The number of parallel threads to use for scraping fight details.
@@ -175,7 +175,7 @@ def scrape_event_details(event_url):
175
  event_details['fights'] = completed_fights
176
  return event_details
177
 
178
- def scrape_all_events():
179
  soup = get_soup(BASE_URL)
180
  events = []
181
 
@@ -204,15 +204,60 @@ def scrape_all_events():
204
 
205
  if (i + 1) % 10 == 0:
206
  print(f"--- Saving progress: {i + 1} of {total_events} events saved. ---")
207
- with open(EVENTS_JSON_PATH, 'w') as f:
208
  json.dump(events, f, indent=4)
209
  except Exception as e:
210
  print(f"Could not process event {event_url}. Error: {e}")
211
 
212
  return events
213
 
214
  if __name__ == "__main__":
215
- all_events_data = scrape_all_events()
216
- with open(EVENTS_JSON_PATH, 'w') as f:
217
  json.dump(all_events_data, f, indent=4)
218
- print(f"\nScraping complete. Final data saved to {EVENTS_JSON_PATH}")
 
3
  import json
4
  import time
5
  import concurrent.futures
6
+ from .. import config
7
 
8
  # --- Configuration ---
9
  # The number of parallel threads to use for scraping fight details.
 
175
  event_details['fights'] = completed_fights
176
  return event_details
177
 
178
+ def scrape_all_events(json_path):
179
  soup = get_soup(BASE_URL)
180
  events = []
181
 
 
204
 
205
  if (i + 1) % 10 == 0:
206
  print(f"--- Saving progress: {i + 1} of {total_events} events saved. ---")
207
+ with open(json_path, 'w') as f:
208
  json.dump(events, f, indent=4)
209
  except Exception as e:
210
  print(f"Could not process event {event_url}. Error: {e}")
211
 
212
  return events
213
 
214
+ def scrape_latest_events(json_path, num_events=5):
215
+ """
216
+ Scrapes only the latest N events from UFC stats.
217
+ This is useful for incremental updates to avoid re-scraping all data.
218
+
219
+ Args:
220
+ json_path (str): Path of the latest-events JSON file (not written here; the caller saves the returned list)
221
+ num_events (int): Number of latest events to scrape (default: 5)
222
+
223
+ Returns:
224
+ list: List of scraped event data
225
+ """
226
+ soup = get_soup(BASE_URL)
227
+ events = []
228
+
229
+ table = soup.find('table', class_='b-statistics__table-events')
230
+ if not table:
231
+ print("Could not find events table on the page.")
232
+ return []
233
+
234
+ event_rows = [row for row in table.find_all('tr', class_='b-statistics__table-row') if row.find('td')]
235
+
236
+ # Limit to the latest N events (the table lists events in reverse chronological order, most recent first)
237
+ latest_event_rows = event_rows[:num_events]
238
+ total_events = len(latest_event_rows)
239
+ print(f"Found {len(event_rows)} total events. Scraping latest {total_events} events.")
240
+
241
+ for i, row in enumerate(latest_event_rows):
242
+ event_link_tag = row.find('a', class_='b-link b-link_style_black')
243
+ if not event_link_tag or not event_link_tag.has_attr('href'):
244
+ continue
245
+
246
+ event_url = event_link_tag['href']
247
+
248
+ try:
249
+ event_data = scrape_event_details(event_url)
250
+ if event_data:
251
+ events.append(event_data)
252
+
253
+ print(f"Progress: {i+1}/{total_events} latest events scraped.")
254
+ except Exception as e:
255
+ print(f"Could not process event {event_url}. Error: {e}")
256
+
257
+ return events
258
+
259
  if __name__ == "__main__":
260
+ all_events_data = scrape_all_events(config.EVENTS_JSON_PATH)
261
+ with open(config.EVENTS_JSON_PATH, 'w') as f:
262
  json.dump(all_events_data, f, indent=4)
263
+ print(f"\nScraping complete. Final data saved to {config.EVENTS_JSON_PATH}")
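`scrape_latest_events` relies on the events table listing the most recent event first, so taking the first `num_events` rows is enough for an incremental update. The sketch below shows that selection step with only the standard library's `html.parser`, assuming the same markup the BeautifulSoup code targets (a `b-statistics__table-events` table whose rows hold `a` tags with classes `b-link b-link_style_black`). `EventLinkParser` and `latest_event_urls` are hypothetical names; unlike the original, this takes the first N links rather than the first N rows, and it does not handle nested tables.

```python
from html.parser import HTMLParser
from typing import List


class EventLinkParser(HTMLParser):
    """Collect event hrefs from the events table, in page order."""

    def __init__(self) -> None:
        super().__init__()
        self.in_events_table = False
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = set((attr_map.get("class") or "").split())
        if tag == "table" and "b-statistics__table-events" in classes:
            self.in_events_table = True
        elif (
            self.in_events_table
            and tag == "a"
            and {"b-link", "b-link_style_black"} <= classes
            and attr_map.get("href")
        ):
            self.links.append(attr_map["href"])

    def handle_endtag(self, tag):
        # Assumes the events table is not nested inside another table.
        if tag == "table":
            self.in_events_table = False


def latest_event_urls(html: str, num_events: int = 5) -> List[str]:
    """Return the first num_events event URLs, i.e. the most recent ones."""
    parser = EventLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links[:num_events]
```

Each returned URL would then go through the per-event detail scraper, as `scrape_event_details` does in the diff.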