finetune-ckl8r-2169410118: 2022/04/22 15:00:21 Newest source `/finetune-data/dataset/Alastair Reynolds - [Revelation Space] Chasm City.txt` is older than `/finetune-data/dataset-EleutherAI_gpt_j_6B-2048.tokens`, not retokenizing. Use -retokenize to force retokenization.
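Retokenization is skipped here because the newest file under /finetune-data/dataset/ has an older mtime than the cached .tokens file. A minimal sketch of this kind of mtime-based cache check (the function name and the -retokenize flag wiring are assumptions, not the actual trainer code):

    import os
    import glob

    def needs_retokenize(dataset_dir: str, tokens_path: str, force: bool = False) -> bool:
        """Retokenize if forced, if the token cache is missing, or if any
        source file is newer (by mtime) than the cached .tokens file."""
        if force or not os.path.exists(tokens_path):
            return True
        sources = glob.glob(os.path.join(dataset_dir, "*.txt"))
        newest_source = max(os.path.getmtime(p) for p in sources)
        return newest_source > os.path.getmtime(tokens_path)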
finetune-ckl8r-3837635091: RUN_NAME: cassandra-gpt-j-6b-fp16
finetune-ckl8r-3837635091: HOST: finetune-ckl8r-3837635091
finetune-ckl8r-3837635091: CUDA: 11.3
finetune-ckl8r-3837635091: TORCH: 1.10.0a0+git302ee7b
finetune-ckl8r-3837635091: TRANSFORMERS: 4.17.0
finetune-ckl8r-3837635091: CPU: (maxrss: 297mb F: 811,255mb) GPU: (U: 19mb F: 51,033mb T: 51,052mb) TORCH: (R: 0mb/0mb, A: 0mb/0mb)
finetune-ckl8r-3837635091: DATASET: /finetune-data/dataset-EleutherAI_gpt_j_6B-2048.tokens
finetune-ckl8r-3837635091: DATASET SIZE: 194.27mb, 101,855,232 tokens, 49,734 contexts
finetune-ckl8r-3837635091: TRAIN_DATASET: 44,760 examples
finetune-ckl8r-3837635091: VALUE_DATASET: 4,974 examples
finetune-ckl8r-3837635091: LAST CHECKPOINT: None
finetune-ckl8r-3837635091: RANDOM SEED: 42
finetune-ckl8r-3837635091: FORCE FP16: True
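The dataset numbers above are internally consistent: 49,734 contexts of 2,048 tokens each is 101,855,232 tokens, and at 2 bytes per token id (a uint16 comfortably holds GPT-J's ~50k vocabulary; the storage width is inferred from the file size, not stated in the log) that is 203,710,464 bytes, i.e. 194.27 MiB. The split is 44,760 train + 4,974 validation = 49,734 contexts, a 90/10 split; "VALUE_DATASET" is evidently the validation set. A quick check:

    contexts, ctx_len = 49_734, 2_048
    tokens = contexts * ctx_len                      # 101,855,232 tokens
    assert tokens == 101_855_232
    assert round(tokens * 2 / 2**20, 2) == 194.27    # 2-byte token ids -> 194.27 MiB (inferred)
    assert 44_760 + 4_974 == contexts                # 90/10 train/validation split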
finetune-ckl8r-3837635091: Loading EleutherAI/gpt-j-6B
finetune-ckl8r-3837635091: CPU: (maxrss: 48,240mb F: 761,345mb) GPU: (U: 13,117mb F: 37,935mb T: 51,052mb) TORCH: (R: 12,228mb/12,228mb, A: 12,219mb/12,219mb)
finetune-ckl8r-3837635091: CPU: (maxrss: 48,240mb F: 785,595mb) GPU: (U: 13,117mb F: 37,935mb T: 51,052mb) TORCH: (R: 12,228mb/12,228mb, A: 12,219mb/12,219mb)
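FORCE FP16 together with the ~12.2 GB TORCH allocation after loading is consistent with GPT-J-6B held in half precision: roughly 6.05B parameters at 2 bytes each is about 12.1 GB of weights. A minimal load sketch using the standard transformers API (whether this run pulled the hub's float16 weights branch is an assumption):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B",
        revision="float16",          # fp16 weights branch on the hub (assumed)
        torch_dtype=torch.float16,   # ~6.05B params * 2 bytes, matching the ~12.2 GB above
    )
    model.cuda()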
finetune-ckl8r-3837635091: Setting batch size to 4
finetune-ckl8r-3837635091: [2022-04-22 15:03:12,856] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
finetune-ckl8r-3837635091: Using amp half precision backend
finetune-ckl8r-3837635091: [2022-04-22 15:03:12,863] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.1, git-hash=unknown, git-branch=unknown
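"Using amp half precision backend" is the transformers Trainer's fp16 message, and the log_dist banner comes from its DeepSpeed 0.6.1 integration. A hypothetical sketch of the wiring (the config path, epoch count, and logging cadence are assumptions; batch size, seed, and learning rate are taken from the log; model and datasets as loaded above):

    from transformers import Trainer, TrainingArguments

    args = TrainingArguments(
        output_dir="cassandra-gpt-j-6b-fp16",
        per_device_train_batch_size=4,   # "Setting batch size to 4"
        fp16=True,                       # "Using amp half precision backend"
        learning_rate=5e-5,              # matches the logged learning_rate
        num_train_epochs=1,              # consistent with the step count (see below)
        seed=42,                         # "RANDOM SEED: 42"
        deepspeed="ds_config.json",      # hypothetical config path
        logging_steps=100,               # first loss line appears at step 100
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, eval_dataset=val_dataset)
    trainer.train()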
4% 99/2238 [1:48:23<35:43:37, 60.13s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,206mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
4% 100/2238 [1:49:59<42:08:58, 70.97s/it]{'loss': 2.6446, 'learning_rate': 5e-05, 'epoch': 0.04}
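The logged epoch of 0.04 at step 100 is consistent with a single-epoch run: 100 / 2,238 ≈ 0.045. And 44,760 training examples over 2,238 optimizer steps is exactly 20 examples per step, so with a per-device batch of 4 the effective batch is 20 via gradient accumulation and/or data parallelism (which one, the log doesn't say). At ~63 s/it, 20 contexts × 2,048 tokens works out to roughly 650 tokens/s.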
5% 101/2238 [1:51:00<40:18:00, 67.89s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
5% 103/2238 [1:53:00<37:54:38, 63.92s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
5% 105/2238 [1:55:01<36:50:48, 62.19s/it]
finetune-ckl8r-3837635091: CPU: (maxrss: 114,754mb F: 674,205mb) GPU: (U: 37,289mb F: 13,763mb T: 51,052mb) TORCH: (R: 36,033mb/36,033mb, A: 22,016mb/23,665mb)
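The interleaved CPU/GPU lines are a periodic memory heartbeat. A hypothetical reconstruction of such a reporter (pynvml, psutil, and resource are guesses at how the numbers were gathered; only the output shape is taken from the log):

    import resource
    import psutil
    import pynvml
    import torch

    pynvml.nvmlInit()
    _gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

    def memory_report() -> str:
        """One heartbeat line in the log's shape: CPU (maxrss, free),
        GPU (used, free, total), TORCH (reserved cur/max, allocated cur/max)."""
        mb = 1024 ** 2
        maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss // 1024  # Linux reports KiB
        free = psutil.virtual_memory().available // mb
        g = pynvml.nvmlDeviceGetMemoryInfo(_gpu)
        return (
            f"CPU: (maxrss: {maxrss:,}mb F: {free:,}mb) "
            f"GPU: (U: {g.used // mb:,}mb F: {g.free // mb:,}mb T: {g.total // mb:,}mb) "
            f"TORCH: (R: {torch.cuda.memory_reserved() // mb:,}mb/"
            f"{torch.cuda.max_memory_reserved() // mb:,}mb, "
            f"A: {torch.cuda.memory_allocated() // mb:,}mb/"
            f"{torch.cuda.max_memory_allocated() // mb:,}mb)"
        )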