2024-06-11 18:12:02,678 - INFO: Calling run..
2024-06-11 18:12:02,678 - INFO: Problem Type: text_causal_classification_modeling
2024-06-11 18:12:02,678 - INFO: Global random seed: 5001
2024-06-11 18:12:02,678 - INFO: Preparing the data...
2024-06-11 18:12:02,679 - INFO: Setting up automatic validation split...
2024-06-11 18:12:02,720 - INFO: Preparing train and validation data
2024-06-11 18:12:02,721 - INFO: Loading train dataset...
2024-06-11 18:12:03,290 - INFO: Loading validation dataset...
2024-06-11 18:12:03,398 - INFO: Number of observations in train dataset: 9600
2024-06-11 18:12:03,399 - INFO: Number of observations in validation dataset: 2400
2024-06-11 18:12:03,566 - WARNING: EOS token id not matching between config and tokenizer. Overwriting with tokenizer id.
2024-06-11 18:12:03,566 - WARNING: PAD token id not matching between config and tokenizer. Overwriting with tokenizer id.
2024-06-11 18:12:03,571 - INFO: Using bfloat16 for backbone
2024-06-11 18:12:03,571 - INFO: Loading tiiuae/falcon-rw-1b. This may take a while.
2024-06-11 18:13:13,609 - INFO: Loaded tiiuae/falcon-rw-1b.
2024-06-11 18:13:13,612 - WARNING: EOS token id not matching between generation config and tokenizer. Overwriting with tokenizer id.
2024-06-11 18:13:13,612 - WARNING: PAD token id not matching between generation config and tokenizer. Overwriting with tokenizer id.
2024-06-11 18:13:13,612 - INFO: Lora module names: ['query_key_value', 'dense', 'dense_h_to_4h', 'dense_4h_to_h']
2024-06-11 18:13:13,747 - INFO: Enough space available for saving model weights.Required space: 2591.11MB, Available space: 996275.30MB.
2024-06-11 18:13:13,752 - INFO: Optimizer AdamW has been provided with parameters {'weight_decay': 0.0, 'eps': 1e-08, 'betas': (0.8999999762, 0.9990000129), 'lr': 0.0001}
2024-06-11 18:13:13,897 - WARNING: No order set for keys: ['answer_column_label', 'num_classes'].
2024-06-11 18:13:13,912 - WARNING: No order set for keys: ['answer_column_label', 'num_classes'].
2024-06-11 18:13:14,944 - INFO: started process: 0, can_track: True, tracking_mode: TrackingMode.DURING_EPOCH
2024-06-11 18:13:14,945 - INFO: Training Epoch: 1 / 1
2024-06-11 18:13:14,945 - INFO: train loss: 0%| | 0/600 [00:00<?, ?it/s]
2024-06-11 18:13:15,125 - INFO: Evaluation step: 600
2024-06-11 18:13:17,794 - INFO: train loss: 27.85: 1%|1 | 6/600 [00:02<04:41, 2.11it/s]
2024-06-11 18:13:20,240 - INFO: train loss: 29.97: 2%|2 | 12/600 [00:05<04:15, 2.30it/s]
2024-06-11 18:13:22,856 - INFO: train loss: 15.00: 3%|3 | 18/600 [00:07<04:13, 2.30it/s]
2024-06-11 18:13:25,363 - INFO: train loss: 4.29: 4%|4 | 24/600 [00:10<04:06, 2.33it/s]
2024-06-11 18:13:27,809 - INFO: train loss: 2.71: 5%|5 | 30/600 [00:12<04:00, 2.37it/s]
2024-06-11 18:13:30,444 - INFO: train loss: 2.08: 6%|6 | 36/600 [00:15<04:00, 2.34it/s]
2024-06-11 18:13:32,938 - INFO: train loss: 2.43: 7%|7 | 42/600 [00:17<03:56, 2.36it/s]
2024-06-11 18:13:35,519 - INFO: train loss: 2.30: 8%|8 | 48/600 [00:20<03:54, 2.35it/s]
2024-06-11 18:13:37,862 - INFO: train loss: 2.07: 9%|9 | 54/600 [00:22<03:46, 2.41it/s]
2024-06-11 18:13:40,492 - INFO: train loss: 2.11: 10%|# | 60/600 [00:25<03:47, 2.37it/s]
2024-06-11 18:13:43,074 - INFO: train loss: 2.00: 11%|#1 | 66/600 [00:28<03:46, 2.36it/s]
2024-06-11 18:13:46,074 - INFO: train loss: 1.87: 12%|#2 | 72/600 [00:31<03:56, 2.23it/s]
2024-06-11 18:13:48,617 - INFO: train loss: 1.80: 13%|#3 | 78/600 [00:33<03:49, 2.27it/s]
2024-06-11 18:13:51,159 - INFO: train loss: 1.96: 14%|#4 | 84/600 [00:36<03:44, 2.30it/s]
2024-06-11 18:13:53,541 - INFO: train loss: 3.41: 15%|#5 | 90/600 [00:38<03:36, 2.36it/s]
2024-06-11 18:13:56,135 - INFO: train loss: 3.75: 16%|#6 | 96/600 [00:41<03:34, 2.35it/s]
2024-06-11 18:13:58,539 - INFO: train loss: 2.11: 17%|#7 | 102/600 [00:43<03:28, 2.39it/s]
2024-06-11 18:14:01,301 - INFO: train loss: 1.87: 18%|#8 | 108/600 [00:46<03:32, 2.32it/s]
2024-06-11 18:14:03,837 - INFO: train loss: 1.84: 19%|#9 | 114/600 [00:48<03:28, 2.33it/s]
2024-06-11 18:14:06,304 - INFO: train loss: 1.74: 20%|## | 120/600 [00:51<03:23, 2.36it/s]
2024-06-11 18:14:08,749 - INFO: train loss: 1.78: 21%|##1 | 126/600 [00:53<03:18, 2.39it/s]
2024-06-11 18:14:11,182 - INFO: train loss: 1.63: 22%|##2 | 132/600 [00:56<03:14, 2.41it/s]
2024-06-11 18:14:13,724 - INFO: train loss: 1.79: 23%|##3 | 138/600 [00:58<03:12, 2.40it/s]
2024-06-11 18:14:16,072 - INFO: train loss: 2.02: 24%|##4 | 144/600 [01:01<03:06, 2.44it/s]
2024-06-11 18:14:18,388 - INFO: train loss: 1.65: 25%|##5 | 150/600 [01:03<03:01, 2.48it/s]
2024-06-11 18:14:20,832 - INFO: train loss: 1.57: 26%|##6 | 156/600 [01:05<02:59, 2.48it/s]
2024-06-11 18:14:23,260 - INFO: train loss: 1.52: 27%|##7 | 162/600 [01:08<02:57, 2.47it/s]
2024-06-11 18:14:26,004 - INFO: train loss: 1.90: 28%|##8 | 168/600 [01:11<03:01, 2.38it/s]
2024-06-11 18:14:28,649 - INFO: train loss: 1.86: 29%|##9 | 174/600 [01:13<03:01, 2.35it/s]
2024-06-11 18:14:31,091 - INFO: train loss: 1.83: 30%|### | 180/600 [01:16<02:56, 2.38it/s]
2024-06-11 18:14:33,524 - INFO: train loss: 1.74: 31%|###1 | 186/600 [01:18<02:52, 2.40it/s]
2024-06-11 18:14:36,108 - INFO: train loss: 1.61: 32%|###2 | 192/600 [01:21<02:51, 2.38it/s]
2024-06-11 18:14:38,735 - INFO: train loss: 1.75: 33%|###3 | 198/600 [01:23<02:51, 2.35it/s]
2024-06-11 18:14:41,312 - INFO: train loss: 1.74: 34%|###4 | 204/600 [01:26<02:49, 2.34it/s]
2024-06-11 18:14:43,836 - INFO: train loss: 1.58: 35%|###5 | 210/600 [01:28<02:45, 2.35it/s]
2024-06-11 18:14:46,349 - INFO: train loss: 1.73: 36%|###6 | 216/600 [01:31<02:42, 2.36it/s]
2024-06-11 18:14:48,860 - INFO: train loss: 1.72: 37%|###7 | 222/600 [01:33<02:39, 2.37it/s]
2024-06-11 18:14:51,344 - INFO: train loss: 1.50: 38%|###8 | 228/600 [01:36<02:36, 2.38it/s]
2024-06-11 18:14:53,917 - INFO: train loss: 1.57: 39%|###9 | 234/600 [01:38<02:34, 2.37it/s]
2024-06-11 18:14:56,442 - INFO: train loss: 1.57: 40%|#### | 240/600 [01:41<02:31, 2.37it/s]
2024-06-11 18:14:58,801 - INFO: train loss: 1.45: 41%|####1 | 246/600 [01:43<02:26, 2.42it/s]
2024-06-11 18:15:01,446 - INFO: train loss: 1.75: 42%|####2 | 252/600 [01:46<02:26, 2.37it/s]
2024-06-11 18:15:03,751 - INFO: train loss: 1.38: 43%|####3 | 258/600 [01:48<02:20, 2.44it/s]
2024-06-11 18:15:06,382 - INFO: train loss: 1.25: 44%|####4 | 264/600 [01:51<02:20, 2.39it/s]
2024-06-11 18:15:09,001 - INFO: train loss: 1.29: 45%|####5 | 270/600 [01:54<02:19, 2.36it/s]
2024-06-11 18:15:11,157 - INFO: train loss: 1.34: 46%|####6 | 276/600 [01:56<02:11, 2.47it/s]
2024-06-11 18:15:13,775 - INFO: train loss: 1.53: 47%|####6 | 282/600 [01:58<02:11, 2.41it/s]
2024-06-11 18:15:16,367 - INFO: train loss: 1.46: 48%|####8 | 288/600 [02:01<02:10, 2.38it/s]
2024-06-11 18:15:18,800 - INFO: train loss: 1.29: 49%|####9 | 294/600 [02:03<02:07, 2.41it/s]
2024-06-11 18:15:21,149 - INFO: train loss: 1.11: 50%|##### | 300/600 [02:06<02:02, 2.45it/s]
2024-06-11 18:15:23,366 - INFO: train loss: 1.10: 51%|#####1 | 306/600 [02:08<01:56, 2.52it/s]
2024-06-11 18:15:25,956 - INFO: train loss: 1.38: 52%|#####2 | 312/600 [02:11<01:57, 2.46it/s]
2024-06-11 18:15:28,552 - INFO: train loss: 1.22: 53%|#####3 | 318/600 [02:13<01:56, 2.41it/s]
2024-06-11 18:15:31,098 - INFO: train loss: 1.36: 54%|#####4 | 324/600 [02:16<01:55, 2.39it/s]
2024-06-11 18:15:33,640 - INFO: train loss: 1.26: 55%|#####5 | 330/600 [02:18<01:53, 2.38it/s]
2024-06-11 18:15:36,165 - INFO: train loss: 1.25: 56%|#####6 | 336/600 [02:21<01:50, 2.38it/s]
2024-06-11 18:15:38,498 - INFO: train loss: 1.21: 57%|#####6 | 342/600 [02:23<01:45, 2.44it/s]
2024-06-11 18:15:40,974 - INFO: train loss: 1.17: 58%|#####8 | 348/600 [02:26<01:43, 2.43it/s]
2024-06-11 18:15:43,399 - INFO: train loss: 0.98: 59%|#####8 | 354/600 [02:28<01:40, 2.44it/s]
2024-06-11 18:15:45,959 - INFO: train loss: 0.75: 60%|###### | 360/600 [02:31<01:39, 2.41it/s]
2024-06-11 18:15:48,574 - INFO: train loss: 0.90: 61%|######1 | 366/600 [02:33<01:38, 2.38it/s]
2024-06-11 18:15:50,821 - INFO: train loss: 0.93: 62%|######2 | 372/600 [02:35<01:32, 2.46it/s]
2024-06-11 18:15:53,266 - INFO: train loss: 0.92: 63%|######3 | 378/600 [02:38<01:30, 2.46it/s]
2024-06-11 18:15:55,825 - INFO: train loss: 0.87: 64%|######4 | 384/600 [02:40<01:29, 2.42it/s]
2024-06-11 18:15:58,587 - INFO: train loss: 0.72: 65%|######5 | 390/600 [02:43<01:29, 2.34it/s]
2024-06-11 18:16:01,176 - INFO: train loss: 0.80: 66%|######6 | 396/600 [02:46<01:27, 2.33it/s]
2024-06-11 18:16:03,717 - INFO: train loss: 0.89: 67%|######7 | 402/600 [02:48<01:24, 2.34it/s]
2024-06-11 18:16:06,313 - INFO: train loss: 0.97: 68%|######8 | 408/600 [02:51<01:22, 2.33it/s]
2024-06-11 18:16:08,801 - INFO: train loss: 0.79: 69%|######9 | 414/600 [02:53<01:18, 2.36it/s]
2024-06-11 18:16:11,184 - INFO: train loss: 0.76: 70%|####### | 420/600 [02:56<01:14, 2.40it/s]
2024-06-11 18:16:13,644 - INFO: train loss: 0.76: 71%|#######1 | 426/600 [02:58<01:12, 2.41it/s]
2024-06-11 18:16:16,306 - INFO: train loss: 0.64: 72%|#######2 | 432/600 [03:01<01:11, 2.36it/s]
2024-06-11 18:16:18,760 - INFO: train loss: 0.61: 73%|#######3 | 438/600 [03:03<01:07, 2.39it/s]
2024-06-11 18:16:21,223 - INFO: train loss: 0.59: 74%|#######4 | 444/600 [03:06<01:04, 2.40it/s]
2024-06-11 18:16:23,589 - INFO: train loss: 0.72: 75%|#######5 | 450/600 [03:08<01:01, 2.44it/s]
2024-06-11 18:16:26,060 - INFO: train loss: 0.59: 76%|#######6 | 456/600 [03:11<00:59, 2.44it/s]
2024-06-11 18:16:28,667 - INFO: train loss: 0.70: 77%|#######7 | 462/600 [03:13<00:57, 2.39it/s]
2024-06-11 18:16:31,109 - INFO: train loss: 0.69: 78%|#######8 | 468/600 [03:16<00:54, 2.41it/s]
2024-06-11 18:16:33,545 - INFO: train loss: 0.61: 79%|#######9 | 474/600 [03:18<00:51, 2.43it/s]
2024-06-11 18:16:35,943 - INFO: train loss: 0.42: 80%|######## | 480/600 [03:20<00:48, 2.45it/s]
2024-06-11 18:16:38,289 - INFO: train loss: 0.57: 81%|########1 | 486/600 [03:23<00:45, 2.48it/s]
2024-06-11 18:16:40,909 - INFO: train loss: 0.63: 82%|########2 | 492/600 [03:25<00:44, 2.42it/s]
2024-06-11 18:16:43,398 - INFO: train loss: 0.54: 83%|########2 | 498/600 [03:28<00:42, 2.42it/s]
2024-06-11 18:16:45,881 - INFO: train loss: 0.58: 84%|########4 | 504/600 [03:30<00:39, 2.42it/s]
2024-06-11 18:16:48,275 - INFO: train loss: 0.63: 85%|########5 | 510/600 [03:33<00:36, 2.44it/s]
2024-06-11 18:16:50,832 - INFO: train loss: 0.57: 86%|########6 | 516/600 [03:35<00:34, 2.41it/s]
2024-06-11 18:16:53,394 - INFO: train loss: 0.59: 87%|########7 | 522/600 [03:38<00:32, 2.39it/s]
2024-06-11 18:16:55,948 - INFO: train loss: 0.59: 88%|########8 | 528/600 [03:41<00:30, 2.38it/s]
2024-06-11 18:16:58,552 - INFO: train loss: 0.71: 89%|########9 | 534/600 [03:43<00:28, 2.36it/s]
2024-06-11 18:17:01,223 - INFO: train loss: 0.63: 90%|######### | 540/600 [03:46<00:25, 2.32it/s]
2024-06-11 18:17:03,727 - INFO: train loss: 0.65: 91%|#########1| 546/600 [03:48<00:23, 2.34it/s]
2024-06-11 18:17:06,060 - INFO: train loss: 0.63: 92%|#########2| 552/600 [03:51<00:19, 2.41it/s]
2024-06-11 18:17:08,627 - INFO: train loss: 0.65: 93%|#########3| 558/600 [03:53<00:17, 2.39it/s]
2024-06-11 18:17:11,043 - INFO: train loss: 0.62: 94%|#########3| 564/600 [03:56<00:14, 2.41it/s]
2024-06-11 18:17:13,728 - INFO: train loss: 0.64: 95%|#########5| 570/600 [03:58<00:12, 2.36it/s]
2024-06-11 18:17:16,137 - INFO: train loss: 0.62: 96%|#########6| 576/600 [04:01<00:10, 2.40it/s]
2024-06-11 18:17:18,768 - INFO: train loss: 0.72: 97%|#########7| 582/600 [04:03<00:07, 2.36it/s]
2024-06-11 18:17:21,126 - INFO: train loss: 0.79: 98%|#########8| 588/600 [04:06<00:04, 2.41it/s]
2024-06-11 18:17:23,619 - INFO: train loss: 0.76: 99%|#########9| 594/600 [04:08<00:02, 2.41it/s]
2024-06-11 18:17:25,921 - INFO: train loss: 0.61: 100%|##########| 600/600 [04:10<00:00, 2.47it/s]
2024-06-11 18:17:25,921 - INFO: train loss: 0.61: 100%|##########| 600/600 [04:10<00:00, 2.39it/s]
2024-06-11 18:17:25,921 - INFO: Saving last model checkpoint to /app/output
2024-06-11 18:17:25,922 - INFO: Saving checkpoint..
2024-06-11 18:17:28,646 - INFO: Starting validation inference
2024-06-11 18:17:28,646 - INFO: validation progress: 0%| | 0/150 [00:00<?, ?it/s]
2024-06-11 18:17:29,623 - INFO: validation progress: 5%|4 | 7/150 [00:00<00:19, 7.17it/s]
2024-06-11 18:17:30,353 - INFO: validation progress: 9%|9 | 14/150 [00:01<00:16, 8.42it/s]
2024-06-11 18:17:31,126 - INFO: validation progress: 14%|#4 | 21/150 [00:02<00:14, 8.70it/s]
2024-06-11 18:17:31,912 - INFO: validation progress: 19%|#8 | 28/150 [00:03<00:13, 8.78it/s]
2024-06-11 18:17:32,691 - INFO: validation progress: 23%|##3 | 35/150 [00:04<00:12, 8.85it/s]
2024-06-11 18:17:33,491 - INFO: validation progress: 28%|##8 | 42/150 [00:04<00:12, 8.82it/s]
2024-06-11 18:17:34,243 - INFO: validation progress: 33%|###2 | 49/150 [00:05<00:11, 8.97it/s]
2024-06-11 18:17:35,022 - INFO: validation progress: 37%|###7 | 56/150 [00:06<00:10, 8.98it/s]
2024-06-11 18:17:35,800 - INFO: validation progress: 42%|####2 | 63/150 [00:07<00:09, 8.98it/s]
2024-06-11 18:17:36,616 - INFO: validation progress: 47%|####6 | 70/150 [00:07<00:09, 8.85it/s]
2024-06-11 18:17:37,430 - INFO: validation progress: 51%|#####1 | 77/150 [00:08<00:08, 8.77it/s]
2024-06-11 18:17:38,207 - INFO: validation progress: 56%|#####6 | 84/150 [00:09<00:07, 8.85it/s]
2024-06-11 18:17:39,024 - INFO: validation progress: 61%|###### | 91/150 [00:10<00:06, 8.76it/s]
2024-06-11 18:17:39,772 - INFO: validation progress: 65%|######5 | 98/150 [00:11<00:05, 8.93it/s]
2024-06-11 18:17:40,514 - INFO: validation progress: 70%|####### | 105/150 [00:11<00:04, 9.08it/s]
2024-06-11 18:17:41,317 - INFO: validation progress: 75%|#######4 | 112/150 [00:12<00:04, 8.97it/s]
2024-06-11 18:17:42,101 - INFO: validation progress: 79%|#######9 | 119/150 [00:13<00:03, 8.96it/s]
2024-06-11 18:17:42,894 - INFO: validation progress: 84%|########4 | 126/150 [00:14<00:02, 8.92it/s]
2024-06-11 18:17:43,703 - INFO: validation progress: 89%|########8 | 133/150 [00:15<00:01, 8.83it/s]
2024-06-11 18:17:44,475 - INFO: validation progress: 93%|#########3| 140/150 [00:15<00:01, 8.90it/s]
2024-06-11 18:17:45,272 - INFO: validation progress: 98%|#########8| 147/150 [00:16<00:00, 8.87it/s]
2024-06-11 18:17:45,607 - INFO: validation progress: 100%|##########| 150/150 [00:16<00:00, 8.84it/s]
2024-06-11 18:17:45,678 - INFO: Validation AUC: 0.65136
2024-06-11 18:17:45,679 - INFO: Mean validation loss: 0.53393
2024-06-11 18:17:46,646 - WARNING: No order set for keys: ['answer_column_label', 'num_classes'].