Super cool new model card!
@julien-c We just need to sync the logs from Jean Zay to the model repos. They're currently synced to logging repos (e.g., the logs of bigscience/bloom are currently in bigscience/tr11-176B-ml-logs).
@teven said he'd soon sync the logs for the smaller models directly to their respective repos, but maybe we should also sync the logs of 176B to this repo (currently automated using https://huggingface.co/bigscience-bot).
@stas Do you think we could update the cron job to sync logs to this repo as well? :-)
Thanks so much!
Chris got it -- if you check out https://huggingface.co/bigscience/tr11-176B-ml-logs , you'll see a nice header with language codes and a tensorboard link.
We need the same thing on this updated card as well. It's such a cool resource.
Ok I see! Yeah, syncing the TB traces to this repo in addition to the bigscience/tr11-176B-ml-logs repo, going forward, would make a lot of sense IMO.
Then we only keep bigscience/tr11-176B-ml-logs for historical legacy, as the full training logs will be in the final model repo.
I think the username tagging is pointless at the moment since it doesn't do anything: the whole org receives all notifications. So if you need to tag me, please do it on Slack, since there is no way I'm going to click on every email notification.
Chris, I will not be syncing to this repo as it contains huge files - LFS doesn't go well with normal git needs and makes using the git clone extremely slow and often inoperable.
https://huggingface.co/bigscience/tr11-176B-ml-logs is there specifically for log syncing.
If you want to copy the log files over once the training is over, that's probably a better solution.
So does this mean we can't have a tensorboard link at the top? =(
Sure, we can sync manually once in a while until the end of the training, and it would add the "Tensorboard" tag and the "Training metrics" tab. Tell me if you need me to do it.
But maybe @stas had a point against doing it (I didn't understand if the issue was to have the tensorboard traces inside the "bloom" repository or to synchronize the traces here periodically).
For reference:
- the logs (that are not used for tensorboard) are a 133MB file versioned with LFS: https://huggingface.co/bigscience/tr11-176B-ml-logs/tree/main/logs/main
- the tensorboard traces are 82 files, also versioned with LFS, for a total of 198MB
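For what it's worth, a manual sync could be sketched roughly as below. This is only an illustration, not the actual cron job: it assumes two local clones (the logs repo and this repo) and that the traces live under a `tensorboard/` folder in both. The folder name and the `copy_tb_traces` helper are hypothetical. It copies only TensorBoard event files (which TensorBoard names `events.out.tfevents.*`) and skips files already present with the same size, so it can be re-run periodically.

```python
import shutil
from pathlib import Path
from typing import List


def copy_tb_traces(logs_clone: Path, model_clone: Path, subdir: str = "tensorboard") -> List[Path]:
    """Copy TensorBoard event files from the logs clone into the model clone.

    `subdir` is an assumed layout (hypothetical), not the real repo structure.
    Returns the list of destination paths that were written.
    """
    copied = []
    src_root = logs_clone / subdir
    for src in sorted(src_root.rglob("events.out.tfevents.*")):
        if not src.is_file():
            continue
        dst = model_clone / subdir / src.relative_to(src_root)
        # Skip traces that were already copied and have not grown since.
        if dst.exists() and dst.stat().st_size == src.stat().st_size:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

After running it, the copied files would still need to be committed and pushed from the model clone in the usual way (with LFS tracking for the event files).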
We could probably sync the TB traces now that training has ended.
Closing as this discussion seems to have come to an end.