In a previous post I explored how to run nanoGPT on Modal, both the training and the sampling. It was successful, but tiresome: I had to make a lot of changes to the downloaded code, which made me unhappy. If I want to try out different projects from GitHub and elsewhere, I don’t want to do a lot of coding and fiddling just to get the existing code to run. I just want to run it as if I were running it locally. This is where my script lob.py comes in!
What is lob.py?
This is a Python script that provides a fairly easy way (all things considered!) to run your local code on a cloud GPU. It does this by running your code on Modal, and handles some of the logistics of doing so, such as uploading source code.
Let’s explore what it does by condensing what previously took one and a half blog posts into a single post: training and sampling nanoGPT on Modal.
Using lob.py to train and run nanoGPT
First, clone nanoGPT, and download lob.py from a public Gist (I may make this a full GitHub repo later; we will see).
git clone https://github.com/karpathy/nanoGPT
cd nanoGPT
wget -O lob.py https://gist.githubusercontent.com/mcapodici/eaa39861affe75be13badefbe9e05079/raw/bbc9e3cbb692277ffcf18406c61805685bf70d25/lob.py
Now set up a Python environment your favourite way. I will use venv in this example:
python3 -m venv .
source bin/activate
Now you might want to add the following to .gitignore to avoid having lots of changes show up (this might differ if you used another Python environment tool):
bin
lib
lib64
share
Now install modal and log in, using their standard instructions:
pip install modal-client
modal token new
We will now set up lob.py for our requirements. The version we downloaded is already set up for nanoGPT, but let’s review its parameters.
Setting up lob.py run parameters
The first one just selects which GPU you want to use. For nanoGPT, the cheapest one, t4, is plenty for the task:
# Choose one of: "t4", "a10g", "inf2", "a100-20g", "a100" or None
gpu="t4"
Next we define the commands to run. These are run after copying all the local code files to the server and changing directory into that folder. We have a single command for each stage, but you can have multiple (see the variant after this block).
commands={
    'prepare': ['python data/shakespeare_char/prepare.py'],
    'train': ['python train.py config/train_shakespeare_char.py'],
    'sample': ['python sample.py --out_dir=out-shakespeare-char'],
}
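Each stage maps to a list of shell commands, so one stage can chain several; they run in order. Purely as a hypothetical illustration (the pip list step is not part of the nanoGPT workflow), the prepare stage could look like this:

commands={
    'prepare': [
        'pip list',  # hypothetical extra step: inspect the remote environment first
        'python data/shakespeare_char/prepare.py',
    ],
    'train': ['python train.py config/train_shakespeare_char.py'],
    'sample': ['python sample.py --out_dir=out-shakespeare-char'],
}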
Now we set verbose, which tells us what files are being uploaded; the volume name prefix, so that we can keep the files for this project separate; a timeout of 60 minutes, after which Modal will terminate the job; and a list of paths not to upload:
verbose=True
volume_name_prefix="2023-07-27-10-45"
timeout_mins=60
exclude_paths_starting_with=["./.git", "./.github", "./bin", "./lib", "./share"]
Finally we define the image, which specifies how the container that runs the program will be set up. rsync is needed because it is used to copy the right files up (without losing generated files on the server). In addition we need to do the pip install defined in the README.md of the nanoGPT project:
image = modal.Image \
    .debian_slim() \
    .apt_install("rsync") \
    .pip_install("torch numpy transformers datasets tiktoken wandb tqdm".split(" "))
Train and run using lob.py
With all the setup done, running is very simple. Just run these commands one after the other; they correspond to instructions in the nanoGPT README.md:
modal run lob.py --command prepare
modal run lob.py --command train
modal run lob.py --command sample
Here is some output from the final phase:
ISABELLA:
This is the day of this is your land;
But I have been call'd up him been your tent?
DUKE VINCENTIO:
How far of the solemnity? who is wrong'd?
Why should we shame an arms stoop of life?
They will prove his like offence with life
And to be crave a happy model's guilty of his cheeks;
For all his foes, that are gone of me.
Here is the entire output of the 3 commands:
(nanoGPT) martin@Capo:~/nanoGPT$ modal run lob.py --command prepare
💾 using volume name: 2023-07-27-10-45-2-aws
✓ Initialized. View app at https://modal.com/apps/ap-4Ed0rtqxmI5GCB0E73QVRT
./configurator.py
./LICENSE
./scaling_laws.ipynb
./README.md
./train.py
./model.py
./pyvenv.cfg
./sample.py
./transformer_sizing.ipynb
./lob.py
./bench.py
./data/shakespeare/prepare.py
./data/shakespeare/readme.md
./data/openwebtext/prepare.py
./data/openwebtext/readme.md
./data/shakespeare_char/prepare.py
./data/shakespeare_char/readme.md
./__pycache__/lob.cpython-38.pyc
./assets/nanogpt.jpg
./assets/gpt2_124M_loss.png
./config/train_shakespeare_char.py
./config/eval_gpt2.py
./config/eval_gpt2_large.py
./config/train_gpt2.py
./config/eval_gpt2_medium.py
./config/eval_gpt2_xl.py
./config/finetune_shakespeare.py
✓ Created objects.
├── 🔨 Created copy.
├── 🔨 Created mount .
└── 🔨 Created mount /home/martin/nanoGPT/lob.py
Command prepare was chosen.
This will run: ['python data/shakespeare_char/prepare.py']
💾 using volume name: 2023-07-27-10-45-2-aws
📁 Running rsync to copy files up to container:
sending incremental file list
./
LICENSE
1,072 100% 0.00kB/s 0:00:00 1,072 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=33/35)
README.md
13,534 100% 12.91MB/s 0:00:00 13,534 100% 12.91MB/s 0:00:00 (xfr#2, to-chk=32/35)
bench.py
4,815 100% 4.59MB/s 0:00:00 4,815 100% 4.59MB/s 0:00:00 (xfr#3, to-chk=31/35)
configurator.py
1,758 100% 1.68MB/s 0:00:00 1,758 100% 1.68MB/s 0:00:00 (xfr#4, to-chk=30/35)
lob.py
5,037 100% 4.80MB/s 0:00:00 5,037 100% 4.80MB/s 0:00:00 (xfr#5, to-chk=29/35)
model.py
16,345 100% 15.59MB/s 0:00:00 16,345 100% 15.59MB/s 0:00:00 (xfr#6, to-chk=28/35)
pyvenv.cfg
70 100% 68.36kB/s 0:00:00 70 100% 68.36kB/s 0:00:00 (xfr#7, to-chk=27/35)
sample.py
3,942 100% 1.25MB/s 0:00:00 3,942 100% 1.25MB/s 0:00:00 (xfr#8, to-chk=26/35)
scaling_laws.ipynb
32,768 12% 10.42MB/s 0:00:00 268,519 100% 1.24MB/s 0:00:00 (xfr#9, to-chk=25/35)
train.py
14,673 100% 69.22kB/s 0:00:00 14,673 100% 69.22kB/s 0:00:00 (xfr#10, to-chk=24/35)
transformer_sizing.ipynb
14,579 100% 68.45kB/s 0:00:00 14,579 100% 68.45kB/s 0:00:00 (xfr#11, to-chk=23/35)
__pycache__/
__pycache__/lob.cpython-38.pyc
2,510 100% 11.73kB/s 0:00:00 2,510 100% 11.73kB/s 0:00:00 (xfr#12, to-chk=18/35)
assets/
assets/gpt2_124M_loss.png
32,768 29% 153.11kB/s 0:00:00 110,433 100% 516.00kB/s 0:00:00 (xfr#13, to-chk=17/35)
assets/nanogpt.jpg
32,768 27% 93.29kB/s 0:00:00 118,621 100% 337.73kB/s 0:00:00 (xfr#14, to-chk=16/35)
config/
config/eval_gpt2.py
208 100% 0.59kB/s 0:00:00 208 100% 0.59kB/s 0:00:00 (xfr#15, to-chk=15/35)
config/eval_gpt2_large.py
215 100% 0.61kB/s 0:00:00 215 100% 0.61kB/s 0:00:00 (xfr#16, to-chk=14/35)
config/eval_gpt2_medium.py
216 100% 0.61kB/s 0:00:00 216 100% 0.61kB/s 0:00:00 (xfr#17, to-chk=13/35)
config/eval_gpt2_xl.py
213 100% 0.60kB/s 0:00:00 213 100% 0.60kB/s 0:00:00 (xfr#18, to-chk=12/35)
config/finetune_shakespeare.py
645 100% 1.82kB/s 0:00:00 645 100% 1.82kB/s 0:00:00 (xfr#19, to-chk=11/35)
config/train_gpt2.py
681 100% 1.92kB/s 0:00:00 681 100% 1.92kB/s 0:00:00 (xfr#20, to-chk=10/35)
config/train_shakespeare_char.py
1,132 100% 3.19kB/s 0:00:00 1,132 100% 3.19kB/s 0:00:00 (xfr#21, to-chk=9/35)
data/
data/openwebtext/
data/openwebtext/prepare.py
3,170 100% 8.92kB/s 0:00:00 3,170 100% 8.92kB/s 0:00:00 (xfr#22, to-chk=5/35)
data/openwebtext/readme.md
489 100% 1.38kB/s 0:00:00 489 100% 1.38kB/s 0:00:00 (xfr#23, to-chk=4/35)
data/shakespeare/
data/shakespeare/prepare.py
1,096 100% 3.08kB/s 0:00:00 1,096 100% 3.08kB/s 0:00:00 (xfr#24, to-chk=3/35)
data/shakespeare/readme.md
161 100% 0.45kB/s 0:00:00 161 100% 0.45kB/s 0:00:00 (xfr#25, to-chk=2/35)
data/shakespeare_char/
data/shakespeare_char/prepare.py
2,344 100% 6.58kB/s 0:00:00 2,344 100% 6.58kB/s 0:00:00 (xfr#26, to-chk=1/35)
data/shakespeare_char/readme.md
209 100% 0.59kB/s 0:00:00 209 100% 0.59kB/s 0:00:00 (xfr#27, to-chk=0/35)
🐍 Using remote python version:
Python 3.8.15
🏃🏽Executing command: python data/shakespeare_char/prepare.py
length of dataset in characters: 1,115,394
all the unique characters:
!$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
vocab size: 65
train has 1,003,854 tokens
val has 111,540 tokens
✓ App completed.
(nanoGPT) martin@Capo:~/nanoGPT$ modal run lob.py --command train
💾 using volume name: 2023-07-27-10-45-2-aws
✓ Initialized. View app at https://modal.com/apps/ap-qsVnQ1NihkY8FsxsG1PlYX
./configurator.py
./LICENSE
./scaling_laws.ipynb
./README.md
./train.py
./model.py
./pyvenv.cfg
./sample.py
./transformer_sizing.ipynb
./lob.py
./bench.py
./data/shakespeare/prepare.py
./data/shakespeare/readme.md
./data/openwebtext/prepare.py
./data/openwebtext/readme.md
./data/shakespeare_char/prepare.py
./data/shakespeare_char/readme.md
./__pycache__/lob.cpython-38.pyc
./assets/nanogpt.jpg
./assets/gpt2_124M_loss.png
./config/train_shakespeare_char.py
./config/eval_gpt2.py
./config/eval_gpt2_large.py
./config/train_gpt2.py
./config/eval_gpt2_medium.py
./config/eval_gpt2_xl.py
./config/finetune_shakespeare.py
✓ Created objects.
├── 🔨 Created copy.
├── 🔨 Created mount .
└── 🔨 Created mount /home/martin/nanoGPT/lob.py
Command train was chosen.
This will run: ['python train.py config/train_shakespeare_char.py']
💾 using volume name: 2023-07-27-10-45-2-aws
📁 Running rsync to copy files up to container:
sending incremental file list
data/shakespeare_char/
🐍 Using remote python version:
Python 3.8.15
🏃🏽Executing command: python train.py config/train_shakespeare_char.py
Overriding config with config/train_shakespeare_char.py:
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such
out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often
# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False
wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'
dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters
# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2
learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small
warmup_iters = 100 # not super necessary potentially
# on macbook also add
# device = 'cpu' # run on cpu only
# compile = False # do not torch compile the model
tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
[2023-07-27 07:51:04,437] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:04,979] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:06,088] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:06,384] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:06,831] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:07,271] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:07,723] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:08,019] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:08,464] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:08,757] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:09,196] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
[2023-07-27 07:51:09,489] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
step 0: train loss 4.2874, val loss 4.2823
iter 0: loss 4.2649, time 33065.55ms, mfu -100.00%
iter 10: loss 3.2438, time 103.08ms, mfu 3.62%
iter 20: loss 2.7899, time 103.51ms, mfu 3.61%
iter 30: loss 2.6383, time 104.21ms, mfu 3.61%
iter 40: loss 2.5763, time 103.21ms, mfu 3.61%
iter 50: loss 2.5261, time 105.02ms, mfu 3.60%
iter 60: loss 2.5136, time 104.65ms, mfu 3.60%
iter 70: loss 2.4921, time 104.66ms, mfu 3.60%
iter 80: loss 2.4932, time 103.74ms, mfu 3.60%
iter 90: loss 2.4696, time 104.93ms, mfu 3.59%
iter 100: loss 2.4526, time 104.89ms, mfu 3.59%
iter 110: loss 2.4543, time 104.67ms, mfu 3.58%
iter 120: loss 2.4223, time 102.57ms, mfu 3.59%
iter 130: loss 2.4059, time 105.67ms, mfu 3.58%
iter 140: loss 2.3925, time 104.18ms, mfu 3.58%
iter 150: loss 2.4098, time 103.93ms, mfu 3.58%
iter 160: loss 2.3675, time 104.91ms, mfu 3.58%
iter 170: loss 2.3382, time 104.84ms, mfu 3.58%
iter 180: loss 2.3011, time 105.85ms, mfu 3.57%
iter 190: loss 2.2278, time 105.65ms, mfu 3.57%
iter 200: loss 2.2004, time 105.36ms, mfu 3.56%
iter 210: loss 2.1244, time 107.28ms, mfu 3.55%
iter 220: loss 2.1338, time 107.40ms, mfu 3.55%
iter 230: loss 2.0709, time 105.63ms, mfu 3.54%
iter 240: loss 2.0742, time 104.87ms, mfu 3.55%
step 250: train loss 1.9616, val loss 2.0647
saving checkpoint to out-shakespeare-char
iter 250: loss 2.0277, time 14340.46ms, mfu 3.19%
iter 260: loss 1.9685, time 106.94ms, mfu 3.22%
iter 270: loss 1.9776, time 107.05ms, mfu 3.25%
iter 280: loss 1.9798, time 106.01ms, mfu 3.27%
iter 290: loss 1.9237, time 108.32ms, mfu 3.29%
iter 300: loss 1.8944, time 107.02ms, mfu 3.31%
iter 310: loss 1.8637, time 108.22ms, mfu 3.32%
iter 320: loss 1.8569, time 108.47ms, mfu 3.33%
iter 330: loss 1.8088, time 108.44ms, mfu 3.35%
iter 340: loss 1.7812, time 108.27ms, mfu 3.35%
iter 350: loss 1.8272, time 107.65ms, mfu 3.37%
iter 360: loss 1.7745, time 108.39ms, mfu 3.37%
iter 370: loss 1.7414, time 109.46ms, mfu 3.38%
iter 380: loss 1.7304, time 107.99ms, mfu 3.38%
iter 390: loss 1.7372, time 109.29ms, mfu 3.39%
iter 400: loss 1.7640, time 109.49ms, mfu 3.39%
iter 410: loss 1.6959, time 106.60ms, mfu 3.40%
iter 420: loss 1.7088, time 109.99ms, mfu 3.40%
iter 430: loss 1.6815, time 107.62ms, mfu 3.40%
iter 440: loss 1.6462, time 108.94ms, mfu 3.41%
iter 450: loss 1.6511, time 107.66ms, mfu 3.41%
iter 460: loss 1.6024, time 108.66ms, mfu 3.41%
iter 470: loss 1.6554, time 110.38ms, mfu 3.41%
iter 480: loss 1.6165, time 108.03ms, mfu 3.41%
iter 490: loss 1.6016, time 107.65ms, mfu 3.42%
step 500: train loss 1.5285, val loss 1.7362
saving checkpoint to out-shakespeare-char
iter 500: loss 1.6016, time 11776.34ms, mfu 3.08%
iter 510: loss 1.6162, time 110.05ms, mfu 3.11%
iter 520: loss 1.6020, time 111.90ms, mfu 3.13%
iter 530: loss 1.5657, time 110.67ms, mfu 3.16%
iter 540: loss 1.6203, time 110.58ms, mfu 3.18%
iter 550: loss 1.5671, time 111.13ms, mfu 3.19%
iter 560: loss 1.5651, time 111.00ms, mfu 3.21%
iter 570: loss 1.5745, time 110.94ms, mfu 3.23%
iter 580: loss 1.5401, time 111.85ms, mfu 3.24%
iter 590: loss 1.5016, time 111.08ms, mfu 3.25%
iter 600: loss 1.5180, time 111.45ms, mfu 3.26%
iter 610: loss 1.5552, time 110.42ms, mfu 3.27%
iter 620: loss 1.5292, time 111.15ms, mfu 3.28%
iter 630: loss 1.5179, time 111.15ms, mfu 3.29%
iter 640: loss 1.4779, time 111.21ms, mfu 3.29%
iter 650: loss 1.5043, time 110.39ms, mfu 3.30%
iter 660: loss 1.5145, time 111.28ms, mfu 3.30%
iter 670: loss 1.4491, time 111.07ms, mfu 3.31%
iter 680: loss 1.5118, time 110.52ms, mfu 3.32%
iter 690: loss 1.4611, time 111.86ms, mfu 3.32%
iter 700: loss 1.4802, time 110.82ms, mfu 3.32%
iter 710: loss 1.4568, time 111.52ms, mfu 3.32%
iter 720: loss 1.4481, time 112.24ms, mfu 3.32%
iter 730: loss 1.4214, time 120.96ms, mfu 3.30%
iter 740: loss 1.4311, time 112.04ms, mfu 3.30%
step 750: train loss 1.3611, val loss 1.5957
saving checkpoint to out-shakespeare-char
iter 750: loss 1.4226, time 12240.61ms, mfu 2.97%
iter 760: loss 1.4461, time 112.47ms, mfu 3.01%
iter 770: loss 1.4283, time 111.68ms, mfu 3.04%
iter 780: loss 1.4123, time 111.30ms, mfu 3.07%
iter 790: loss 1.4221, time 111.95ms, mfu 3.10%
iter 800: loss 1.4286, time 111.81ms, mfu 3.12%
iter 810: loss 1.4068, time 113.48ms, mfu 3.14%
iter 820: loss 1.4105, time 113.58ms, mfu 3.15%
iter 830: loss 1.3939, time 111.81ms, mfu 3.17%
iter 840: loss 1.4056, time 113.22ms, mfu 3.18%
iter 850: loss 1.3911, time 111.85ms, mfu 3.20%
iter 860: loss 1.4010, time 113.44ms, mfu 3.21%
iter 870: loss 1.4061, time 112.60ms, mfu 3.22%
iter 880: loss 1.3758, time 116.09ms, mfu 3.22%
iter 890: loss 1.3920, time 113.44ms, mfu 3.22%
iter 900: loss 1.3728, time 113.66ms, mfu 3.23%
iter 910: loss 1.3245, time 113.50ms, mfu 3.23%
iter 920: loss 1.3671, time 114.03ms, mfu 3.24%
iter 930: loss 1.3670, time 112.65ms, mfu 3.24%
iter 940: loss 1.3524, time 111.20ms, mfu 3.25%
iter 950: loss 1.3538, time 112.70ms, mfu 3.26%
iter 960: loss 1.3668, time 114.80ms, mfu 3.26%
iter 970: loss 1.3636, time 115.69ms, mfu 3.25%
iter 980: loss 1.3566, time 112.03ms, mfu 3.26%
iter 990: loss 1.3400, time 113.93ms, mfu 3.26%
step 1000: train loss 1.2762, val loss 1.5256
saving checkpoint to out-shakespeare-char
iter 1000: loss 1.3415, time 12715.10ms, mfu 2.94%
iter 1010: loss 1.3431, time 114.84ms, mfu 2.97%
iter 1020: loss 1.3133, time 114.62ms, mfu 3.00%
iter 1030: loss 1.3367, time 114.21ms, mfu 3.02%
iter 1040: loss 1.3645, time 114.66ms, mfu 3.05%
iter 1050: loss 1.2936, time 114.63ms, mfu 3.07%
iter 1060: loss 1.3426, time 115.89ms, mfu 3.08%
iter 1070: loss 1.3305, time 114.95ms, mfu 3.10%
iter 1080: loss 1.3386, time 116.72ms, mfu 3.11%
iter 1090: loss 1.3539, time 115.43ms, mfu 3.12%
iter 1100: loss 1.3168, time 117.47ms, mfu 3.12%
iter 1110: loss 1.3008, time 116.91ms, mfu 3.13%
iter 1120: loss 1.3026, time 118.77ms, mfu 3.13%
iter 1130: loss 1.3015, time 116.35ms, mfu 3.14%
iter 1140: loss 1.2992, time 112.41ms, mfu 3.16%
iter 1150: loss 1.3126, time 115.89ms, mfu 3.16%
iter 1160: loss 1.3269, time 115.05ms, mfu 3.17%
iter 1170: loss 1.3064, time 116.09ms, mfu 3.17%
iter 1180: loss 1.3226, time 115.55ms, mfu 3.18%
iter 1190: loss 1.2668, time 113.06ms, mfu 3.19%
iter 1200: loss 1.2964, time 113.89ms, mfu 3.20%
iter 1210: loss 1.2739, time 114.14ms, mfu 3.21%
iter 1220: loss 1.3009, time 114.65ms, mfu 3.21%
iter 1230: loss 1.2977, time 116.01ms, mfu 3.21%
iter 1240: loss 1.3042, time 113.79ms, mfu 3.22%
step 1250: train loss 1.2079, val loss 1.4969
saving checkpoint to out-shakespeare-char
iter 1250: loss 1.2753, time 12646.97ms, mfu 2.90%
iter 1260: loss 1.2867, time 115.14ms, mfu 2.93%
iter 1270: loss 1.2715, time 115.65ms, mfu 2.96%
iter 1280: loss 1.2605, time 117.75ms, mfu 2.98%
iter 1290: loss 1.2806, time 115.94ms, mfu 3.00%
iter 1300: loss 1.2990, time 113.14ms, mfu 3.03%
iter 1310: loss 1.2402, time 112.46ms, mfu 3.06%
iter 1320: loss 1.3072, time 113.39ms, mfu 3.08%
iter 1330: loss 1.2705, time 117.34ms, mfu 3.09%
iter 1340: loss 1.2992, time 116.71ms, mfu 3.10%
iter 1350: loss 1.2529, time 114.87ms, mfu 3.12%
iter 1360: loss 1.2679, time 114.92ms, mfu 3.13%
iter 1370: loss 1.2591, time 117.08ms, mfu 3.13%
iter 1380: loss 1.2695, time 116.04ms, mfu 3.14%
iter 1390: loss 1.2550, time 117.59ms, mfu 3.15%
iter 1400: loss 1.2619, time 116.96ms, mfu 3.15%
iter 1410: loss 1.2513, time 117.91ms, mfu 3.15%
iter 1420: loss 1.2738, time 116.81ms, mfu 3.15%
iter 1430: loss 1.2424, time 114.54ms, mfu 3.16%
iter 1440: loss 1.2576, time 114.93ms, mfu 3.17%
iter 1450: loss 1.2388, time 118.07ms, mfu 3.17%
iter 1460: loss 1.2448, time 119.56ms, mfu 3.16%
iter 1470: loss 1.2223, time 118.65ms, mfu 3.16%
iter 1480: loss 1.2091, time 116.77ms, mfu 3.17%
iter 1490: loss 1.2391, time 117.71ms, mfu 3.17%
step 1500: train loss 1.1533, val loss 1.4693
saving checkpoint to out-shakespeare-char
iter 1500: loss 1.1903, time 14834.12ms, mfu 2.85%
iter 1510: loss 1.2334, time 117.89ms, mfu 2.88%
iter 1520: loss 1.2242, time 116.96ms, mfu 2.91%
iter 1530: loss 1.2554, time 116.91ms, mfu 2.94%
iter 1540: loss 1.1952, time 116.01ms, mfu 2.97%
iter 1550: loss 1.2376, time 115.90ms, mfu 2.99%
iter 1560: loss 1.2124, time 115.27ms, mfu 3.02%
iter 1570: loss 1.2289, time 117.66ms, mfu 3.03%
iter 1580: loss 1.2126, time 116.45ms, mfu 3.05%
iter 1590: loss 1.1923, time 117.08ms, mfu 3.06%
iter 1600: loss 1.2019, time 117.95ms, mfu 3.07%
iter 1610: loss 1.2397, time 117.50ms, mfu 3.08%
iter 1620: loss 1.1842, time 119.91ms, mfu 3.08%
iter 1630: loss 1.2066, time 115.21ms, mfu 3.10%
iter 1640: loss 1.2057, time 115.86ms, mfu 3.11%
iter 1650: loss 1.1848, time 117.37ms, mfu 3.12%
iter 1660: loss 1.2201, time 115.62ms, mfu 3.13%
iter 1670: loss 1.2026, time 118.70ms, mfu 3.13%
iter 1680: loss 1.2037, time 116.72ms, mfu 3.14%
iter 1690: loss 1.2076, time 114.73ms, mfu 3.15%
iter 1700: loss 1.1913, time 115.13ms, mfu 3.16%
iter 1710: loss 1.1841, time 117.65ms, mfu 3.16%
iter 1720: loss 1.1854, time 117.87ms, mfu 3.16%
iter 1730: loss 1.1986, time 117.57ms, mfu 3.16%
iter 1740: loss 1.1763, time 117.37ms, mfu 3.16%
step 1750: train loss 1.1028, val loss 1.4603
saving checkpoint to out-shakespeare-char
iter 1750: loss 1.1805, time 12920.15ms, mfu 2.85%
iter 1760: loss 1.1936, time 117.16ms, mfu 2.88%
iter 1770: loss 1.1979, time 119.79ms, mfu 2.90%
iter 1780: loss 1.1955, time 113.89ms, mfu 2.94%
iter 1790: loss 1.1912, time 118.11ms, mfu 2.96%
iter 1800: loss 1.1752, time 117.03ms, mfu 2.98%
iter 1810: loss 1.1647, time 117.42ms, mfu 3.00%
iter 1820: loss 1.1709, time 116.32ms, mfu 3.02%
iter 1830: loss 1.1703, time 116.40ms, mfu 3.04%
iter 1840: loss 1.1592, time 117.01ms, mfu 3.06%
iter 1850: loss 1.1671, time 117.30ms, mfu 3.07%
iter 1860: loss 1.1751, time 118.06ms, mfu 3.08%
iter 1870: loss 1.1444, time 116.93ms, mfu 3.09%
iter 1880: loss 1.1820, time 116.80ms, mfu 3.10%
iter 1890: loss 1.1850, time 114.54ms, mfu 3.11%
iter 1900: loss 1.1317, time 114.22ms, mfu 3.13%
iter 1910: loss 1.1653, time 116.54ms, mfu 3.13%
iter 1920: loss 1.1680, time 115.75ms, mfu 3.14%
iter 1930: loss 1.1419, time 114.94ms, mfu 3.15%
iter 1940: loss 1.1259, time 115.84ms, mfu 3.16%
iter 1950: loss 1.1378, time 118.97ms, mfu 3.16%
iter 1960: loss 1.1573, time 116.48ms, mfu 3.16%
iter 1970: loss 1.1574, time 114.98ms, mfu 3.17%
iter 1980: loss 1.1518, time 114.50ms, mfu 3.18%
iter 1990: loss 1.1529, time 114.88ms, mfu 3.18%
step 2000: train loss 1.0560, val loss 1.4723
iter 2000: loss 1.1313, time 12257.49ms, mfu 2.87%
iter 2010: loss 1.1371, time 115.66ms, mfu 2.90%
iter 2020: loss 1.1267, time 116.53ms, mfu 2.93%
iter 2030: loss 1.1575, time 116.89ms, mfu 2.96%
iter 2040: loss 1.1369, time 116.50ms, mfu 2.98%
iter 2050: loss 1.1200, time 115.13ms, mfu 3.01%
iter 2060: loss 1.1054, time 114.00ms, mfu 3.03%
iter 2070: loss 1.1241, time 116.13ms, mfu 3.05%
iter 2080: loss 1.1240, time 117.11ms, mfu 3.06%
iter 2090: loss 1.1352, time 114.44ms, mfu 3.08%
iter 2100: loss 1.1349, time 115.23ms, mfu 3.10%
iter 2110: loss 1.1359, time 117.54ms, mfu 3.11%
iter 2120: loss 1.1305, time 117.35ms, mfu 3.11%
iter 2130: loss 1.1397, time 116.09ms, mfu 3.12%
iter 2140: loss 1.1365, time 117.49ms, mfu 3.13%
iter 2150: loss 1.1284, time 114.82ms, mfu 3.14%
iter 2160: loss 1.1414, time 116.10ms, mfu 3.15%
iter 2170: loss 1.1367, time 115.65ms, mfu 3.15%
iter 2180: loss 1.1187, time 119.46ms, mfu 3.15%
iter 2190: loss 1.1095, time 112.09ms, mfu 3.17%
iter 2200: loss 1.1194, time 115.60ms, mfu 3.17%
iter 2210: loss 1.1110, time 115.48ms, mfu 3.18%
iter 2220: loss 1.1232, time 114.33ms, mfu 3.19%
iter 2230: loss 1.1231, time 116.60ms, mfu 3.19%
iter 2240: loss 1.1356, time 115.37ms, mfu 3.19%
step 2250: train loss 1.0100, val loss 1.4719
iter 2250: loss 1.1040, time 12193.67ms, mfu 2.88%
iter 2260: loss 1.1040, time 116.32ms, mfu 2.91%
iter 2270: loss 1.1425, time 116.92ms, mfu 2.94%
iter 2280: loss 1.0969, time 115.55ms, mfu 2.97%
iter 2290: loss 1.1535, time 116.32ms, mfu 2.99%
iter 2300: loss 1.1172, time 117.35ms, mfu 3.01%
iter 2310: loss 1.0925, time 116.01ms, mfu 3.03%
iter 2320: loss 1.0992, time 116.60ms, mfu 3.04%
iter 2330: loss 1.1028, time 116.21ms, mfu 3.06%
iter 2340: loss 1.1236, time 114.90ms, mfu 3.08%
iter 2350: loss 1.1000, time 113.61ms, mfu 3.10%
iter 2360: loss 1.1014, time 113.65ms, mfu 3.12%
iter 2370: loss 1.0974, time 114.22ms, mfu 3.13%
iter 2380: loss 1.0846, time 116.35ms, mfu 3.14%
iter 2390: loss 1.0864, time 117.19ms, mfu 3.14%
iter 2400: loss 1.0765, time 113.52ms, mfu 3.16%
iter 2410: loss 1.0720, time 117.45ms, mfu 3.16%
iter 2420: loss 1.0792, time 113.19ms, mfu 3.17%
iter 2430: loss 1.0588, time 116.62ms, mfu 3.17%
iter 2440: loss 1.0537, time 113.97ms, mfu 3.18%
iter 2450: loss 1.0758, time 117.42ms, mfu 3.18%
iter 2460: loss 1.0872, time 115.39ms, mfu 3.19%
iter 2470: loss 1.0853, time 118.00ms, mfu 3.18%
iter 2480: loss 1.0859, time 115.33ms, mfu 3.19%
iter 2490: loss 1.0610, time 113.64ms, mfu 3.20%
step 2500: train loss 0.9590, val loss 1.4886
iter 2500: loss 1.0800, time 12184.79ms, mfu 2.88%
iter 2510: loss 1.0775, time 117.50ms, mfu 2.91%
iter 2520: loss 1.0480, time 115.16ms, mfu 2.94%
iter 2530: loss 1.0608, time 115.74ms, mfu 2.97%
iter 2540: loss 1.0559, time 115.97ms, mfu 2.99%
iter 2550: loss 1.0655, time 115.44ms, mfu 3.02%
iter 2560: loss 1.0615, time 116.20ms, mfu 3.04%
iter 2570: loss 1.0752, time 116.99ms, mfu 3.05%
iter 2580: loss 1.0818, time 116.49ms, mfu 3.07%
iter 2590: loss 1.0710, time 115.88ms, mfu 3.08%
iter 2600: loss 1.0741, time 112.56ms, mfu 3.10%
iter 2610: loss 1.0476, time 117.23ms, mfu 3.11%
iter 2620: loss 1.0413, time 112.69ms, mfu 3.13%
iter 2630: loss 1.0266, time 114.45ms, mfu 3.14%
iter 2640: loss 1.0461, time 116.10ms, mfu 3.15%
iter 2650: loss 1.0662, time 116.35ms, mfu 3.16%
iter 2660: loss 1.0466, time 116.22ms, mfu 3.16%
iter 2670: loss 1.0205, time 116.80ms, mfu 3.16%
iter 2680: loss 1.0463, time 117.87ms, mfu 3.16%
iter 2690: loss 1.0574, time 116.00ms, mfu 3.17%
iter 2700: loss 1.0213, time 115.56ms, mfu 3.17%
iter 2710: loss 1.0412, time 116.20ms, mfu 3.18%
iter 2720: loss 1.0447, time 120.05ms, mfu 3.17%
iter 2730: loss 1.0544, time 115.64ms, mfu 3.18%
iter 2740: loss 1.0265, time 118.42ms, mfu 3.17%
step 2750: train loss 0.9135, val loss 1.5104
iter 2750: loss 1.0351, time 12241.53ms, mfu 2.86%
iter 2760: loss 1.0271, time 116.76ms, mfu 2.89%
iter 2770: loss 1.0300, time 116.32ms, mfu 2.92%
iter 2780: loss 1.0212, time 116.84ms, mfu 2.95%
iter 2790: loss 1.0406, time 116.89ms, mfu 2.97%
iter 2800: loss 1.0119, time 116.11ms, mfu 3.00%
iter 2810: loss 1.0451, time 115.22ms, mfu 3.02%
iter 2820: loss 1.0255, time 116.07ms, mfu 3.04%
iter 2830: loss 1.0377, time 114.26ms, mfu 3.06%
iter 2840: loss 1.0019, time 114.69ms, mfu 3.08%
iter 2850: loss 1.0306, time 115.24ms, mfu 3.10%
iter 2860: loss 1.0237, time 116.07ms, mfu 3.11%
iter 2870: loss 1.0016, time 114.00ms, mfu 3.12%
iter 2880: loss 1.0270, time 117.10ms, mfu 3.13%
iter 2890: loss 1.0150, time 116.35ms, mfu 3.14%
iter 2900: loss 0.9963, time 116.32ms, mfu 3.14%
iter 2910: loss 1.0427, time 115.04ms, mfu 3.15%
iter 2920: loss 1.0190, time 115.45ms, mfu 3.16%
iter 2930: loss 0.9971, time 114.76ms, mfu 3.17%
iter 2940: loss 0.9860, time 114.46ms, mfu 3.18%
iter 2950: loss 1.0231, time 116.47ms, mfu 3.18%
iter 2960: loss 1.0000, time 116.44ms, mfu 3.18%
iter 2970: loss 0.9940, time 113.93ms, mfu 3.19%
iter 2980: loss 1.0014, time 114.07ms, mfu 3.20%
iter 2990: loss 0.9899, time 112.14ms, mfu 3.21%
step 3000: train loss 0.8673, val loss 1.5194
iter 3000: loss 0.9866, time 12198.37ms, mfu 2.89%
iter 3010: loss 0.9940, time 115.19ms, mfu 2.93%
iter 3020: loss 0.9946, time 114.00ms, mfu 2.96%
iter 3030: loss 1.0067, time 116.10ms, mfu 2.99%
iter 3040: loss 1.0279, time 112.95ms, mfu 3.02%
iter 3050: loss 0.9855, time 114.92ms, mfu 3.04%
iter 3060: loss 0.9980, time 114.28ms, mfu 3.06%
iter 3070: loss 1.0166, time 116.64ms, mfu 3.08%
iter 3080: loss 1.0064, time 116.06ms, mfu 3.09%
iter 3090: loss 0.9795, time 116.85ms, mfu 3.10%
iter 3100: loss 0.9986, time 114.91ms, mfu 3.11%
iter 3110: loss 0.9806, time 114.00ms, mfu 3.13%
iter 3120: loss 0.9921, time 115.17ms, mfu 3.14%
iter 3130: loss 0.9779, time 115.70ms, mfu 3.15%
iter 3140: loss 0.9753, time 117.19ms, mfu 3.15%
iter 3150: loss 0.9882, time 117.12ms, mfu 3.15%
iter 3160: loss 1.0155, time 117.83ms, mfu 3.15%
iter 3170: loss 0.9620, time 115.54ms, mfu 3.16%
iter 3180: loss 0.9784, time 115.59ms, mfu 3.17%
iter 3190: loss 0.9972, time 116.21ms, mfu 3.17%
iter 3200: loss 0.9690, time 115.14ms, mfu 3.18%
iter 3210: loss 0.9701, time 117.81ms, mfu 3.18%
iter 3220: loss 0.9613, time 114.30ms, mfu 3.18%
iter 3230: loss 0.9596, time 115.49ms, mfu 3.19%
iter 3240: loss 0.9608, time 116.11ms, mfu 3.19%
step 3250: train loss 0.8239, val loss 1.5611
iter 3250: loss 0.9846, time 12237.43ms, mfu 2.88%
iter 3260: loss 0.9668, time 116.44ms, mfu 2.91%
iter 3270: loss 0.9772, time 116.16ms, mfu 2.94%
iter 3280: loss 0.9445, time 116.97ms, mfu 2.96%
iter 3290: loss 0.9432, time 116.93ms, mfu 2.98%
iter 3300: loss 0.9451, time 115.18ms, mfu 3.01%
iter 3310: loss 0.9571, time 117.17ms, mfu 3.03%
iter 3320: loss 0.9727, time 116.68ms, mfu 3.04%
iter 3330: loss 0.9581, time 112.84ms, mfu 3.07%
iter 3340: loss 0.9502, time 115.18ms, mfu 3.09%
iter 3350: loss 0.9538, time 114.65ms, mfu 3.10%
iter 3360: loss 0.9362, time 115.49ms, mfu 3.11%
iter 3370: loss 0.9606, time 114.58ms, mfu 3.13%
iter 3380: loss 0.9514, time 116.54ms, mfu 3.14%
iter 3390: loss 0.9561, time 114.63ms, mfu 3.15%
iter 3400: loss 0.9596, time 116.01ms, mfu 3.15%
iter 3410: loss 0.9444, time 112.50ms, mfu 3.17%
iter 3420: loss 0.9479, time 116.45ms, mfu 3.17%
iter 3430: loss 0.9430, time 114.03ms, mfu 3.18%
iter 3440: loss 0.9693, time 116.27ms, mfu 3.18%
iter 3450: loss 0.9509, time 115.68ms, mfu 3.19%
iter 3460: loss 0.9453, time 117.12ms, mfu 3.19%
iter 3470: loss 0.9385, time 117.24ms, mfu 3.19%
iter 3480: loss 0.9504, time 115.33ms, mfu 3.19%
iter 3490: loss 0.9106, time 116.81ms, mfu 3.19%
step 3500: train loss 0.7790, val loss 1.5720
iter 3500: loss 0.9011, time 12234.12ms, mfu 2.87%
iter 3510: loss 0.9180, time 116.66ms, mfu 2.91%
iter 3520: loss 0.9251, time 118.31ms, mfu 2.93%
iter 3530: loss 0.9584, time 118.13ms, mfu 2.95%
iter 3540: loss 0.9311, time 116.41ms, mfu 2.98%
iter 3550: loss 0.9215, time 116.94ms, mfu 3.00%
iter 3560: loss 0.9519, time 116.18ms, mfu 3.02%
iter 3570: loss 0.9363, time 118.52ms, mfu 3.03%
iter 3580: loss 0.9419, time 112.09ms, mfu 3.06%
iter 3590: loss 0.9228, time 114.29ms, mfu 3.08%
iter 3600: loss 0.9237, time 113.88ms, mfu 3.10%
iter 3610: loss 0.9118, time 115.23ms, mfu 3.11%
iter 3620: loss 0.9171, time 118.56ms, mfu 3.12%
iter 3630: loss 0.9199, time 115.27ms, mfu 3.13%
iter 3640: loss 0.9228, time 115.28ms, mfu 3.14%
iter 3650: loss 0.9075, time 116.32ms, mfu 3.15%
iter 3660: loss 0.9391, time 118.61ms, mfu 3.14%
iter 3670: loss 0.9375, time 114.86ms, mfu 3.15%
iter 3680: loss 0.9084, time 116.61ms, mfu 3.16%
iter 3690: loss 0.9350, time 115.12ms, mfu 3.17%
iter 3700: loss 0.8690, time 116.40ms, mfu 3.17%
iter 3710: loss 0.8807, time 117.80ms, mfu 3.17%
iter 3720: loss 0.9150, time 115.20ms, mfu 3.18%
iter 3730: loss 0.9002, time 117.32ms, mfu 3.18%
iter 3740: loss 0.9056, time 114.51ms, mfu 3.18%
step 3750: train loss 0.7414, val loss 1.6029
iter 3750: loss 0.9066, time 12211.47ms, mfu 2.87%
iter 3760: loss 0.9350, time 120.05ms, mfu 2.89%
iter 3770: loss 0.9320, time 117.65ms, mfu 2.92%
iter 3780: loss 0.9233, time 116.81ms, mfu 2.95%
iter 3790: loss 0.9007, time 117.14ms, mfu 2.97%
iter 3800: loss 0.9027, time 116.44ms, mfu 2.99%
iter 3810: loss 0.9152, time 116.25ms, mfu 3.01%
iter 3820: loss 0.8884, time 116.74ms, mfu 3.03%
iter 3830: loss 0.9023, time 114.30ms, mfu 3.05%
iter 3840: loss 0.8860, time 115.67ms, mfu 3.07%
iter 3850: loss 0.8922, time 115.66ms, mfu 3.09%
iter 3860: loss 0.8694, time 116.67ms, mfu 3.10%
iter 3870: loss 0.8979, time 117.24ms, mfu 3.11%
iter 3880: loss 0.8883, time 116.62ms, mfu 3.11%
iter 3890: loss 0.8921, time 117.78ms, mfu 3.12%
iter 3900: loss 0.8796, time 117.16ms, mfu 3.13%
iter 3910: loss 0.8910, time 113.17ms, mfu 3.14%
iter 3920: loss 0.8786, time 116.18ms, mfu 3.15%
iter 3930: loss 0.8860, time 114.44ms, mfu 3.16%
iter 3940: loss 0.8757, time 116.20ms, mfu 3.16%
iter 3950: loss 0.8747, time 115.90ms, mfu 3.17%
iter 3960: loss 0.9120, time 115.20ms, mfu 3.18%
iter 3970: loss 0.8914, time 114.93ms, mfu 3.18%
iter 3980: loss 0.9046, time 116.67ms, mfu 3.18%
iter 3990: loss 0.8761, time 116.38ms, mfu 3.19%
step 4000: train loss 0.7089, val loss 1.6181
iter 4000: loss 0.8590, time 12261.06ms, mfu 2.87%
iter 4010: loss 0.8836, time 117.03ms, mfu 2.90%
iter 4020: loss 0.8819, time 116.28ms, mfu 2.93%
iter 4030: loss 0.8776, time 119.57ms, mfu 2.95%
iter 4040: loss 0.8846, time 116.67ms, mfu 2.97%
iter 4050: loss 0.8734, time 114.50ms, mfu 3.00%
iter 4060: loss 0.8648, time 116.99ms, mfu 3.02%
iter 4070: loss 0.8631, time 115.76ms, mfu 3.04%
iter 4080: loss 0.8867, time 115.88ms, mfu 3.06%
iter 4090: loss 0.8479, time 114.74ms, mfu 3.08%
iter 4100: loss 0.8987, time 117.46ms, mfu 3.09%
iter 4110: loss 0.8641, time 117.32ms, mfu 3.10%
iter 4120: loss 0.8797, time 116.96ms, mfu 3.10%
iter 4130: loss 0.8548, time 117.32ms, mfu 3.11%
iter 4140: loss 0.8778, time 115.72ms, mfu 3.12%
iter 4150: loss 0.8723, time 116.71ms, mfu 3.13%
iter 4160: loss 0.8512, time 116.37ms, mfu 3.14%
iter 4170: loss 0.8705, time 117.28ms, mfu 3.14%
iter 4180: loss 0.8739, time 115.51ms, mfu 3.15%
iter 4190: loss 0.8654, time 115.80ms, mfu 3.16%
iter 4200: loss 0.8579, time 117.02ms, mfu 3.16%
iter 4210: loss 0.8754, time 115.09ms, mfu 3.17%
iter 4220: loss 0.8595, time 116.92ms, mfu 3.17%
iter 4230: loss 0.8805, time 114.63ms, mfu 3.18%
iter 4240: loss 0.8654, time 115.65ms, mfu 3.18%
step 4250: train loss 0.6784, val loss 1.6442
iter 4250: loss 0.8753, time 12213.09ms, mfu 2.87%
iter 4260: loss 0.8598, time 114.89ms, mfu 2.90%
iter 4270: loss 0.8634, time 117.44ms, mfu 2.93%
iter 4280: loss 0.8539, time 1225.34ms, mfu 2.67%
iter 4290: loss 0.8318, time 112.19ms, mfu 2.73%
iter 4300: loss 0.8307, time 117.42ms, mfu 2.78%
iter 4310: loss 0.8535, time 117.93ms, mfu 2.82%
iter 4320: loss 0.8466, time 114.60ms, mfu 2.86%
iter 4330: loss 0.8621, time 116.64ms, mfu 2.89%
iter 4340: loss 0.8286, time 118.11ms, mfu 2.92%
iter 4350: loss 0.8388, time 117.73ms, mfu 2.94%
iter 4360: loss 0.8612, time 115.78ms, mfu 2.97%
iter 4370: loss 0.8579, time 112.69ms, mfu 3.00%
iter 4380: loss 0.8347, time 117.20ms, mfu 3.02%
iter 4390: loss 0.8672, time 115.48ms, mfu 3.04%
iter 4400: loss 0.8440, time 115.27ms, mfu 3.06%
iter 4410: loss 0.8630, time 118.10ms, mfu 3.07%
iter 4420: loss 0.8611, time 116.09ms, mfu 3.08%
iter 4430: loss 0.8399, time 115.48ms, mfu 3.10%
iter 4440: loss 0.8514, time 115.89ms, mfu 3.11%
iter 4450: loss 0.8549, time 117.24ms, mfu 3.12%
iter 4460: loss 0.8366, time 117.52ms, mfu 3.12%
iter 4470: loss 0.8530, time 114.76ms, mfu 3.14%
iter 4480: loss 0.8332, time 114.35ms, mfu 3.15%
iter 4490: loss 0.8419, time 116.28ms, mfu 3.15%
step 4500: train loss 0.6544, val loss 1.6613
iter 4500: loss 0.8533, time 12221.51ms, mfu 2.84%
iter 4510: loss 0.8460, time 112.49ms, mfu 2.89%
iter 4520: loss 0.8332, time 112.16ms, mfu 2.93%
iter 4530: loss 0.8519, time 115.10ms, mfu 2.96%
iter 4540: loss 0.8431, time 115.25ms, mfu 2.99%
iter 4550: loss 0.8768, time 113.19ms, mfu 3.02%
iter 4560: loss 0.8361, time 113.91ms, mfu 3.04%
iter 4570: loss 0.8384, time 112.27ms, mfu 3.07%
iter 4580: loss 0.8528, time 117.01ms, mfu 3.08%
iter 4590: loss 0.8554, time 115.77ms, mfu 3.10%
iter 4600: loss 0.8282, time 115.90ms, mfu 3.11%
iter 4610: loss 0.8602, time 115.36ms, mfu 3.12%
iter 4620: loss 0.8341, time 116.40ms, mfu 3.13%
iter 4630: loss 0.8136, time 115.64ms, mfu 3.14%
iter 4640: loss 0.8465, time 117.28ms, mfu 3.14%
iter 4650: loss 0.8606, time 116.78ms, mfu 3.15%
iter 4660: loss 0.8533, time 114.05ms, mfu 3.16%
iter 4670: loss 0.8358, time 113.38ms, mfu 3.17%
iter 4680: loss 0.8533, time 115.58ms, mfu 3.18%
iter 4690: loss 0.8411, time 114.61ms, mfu 3.18%
iter 4700: loss 0.8206, time 115.47ms, mfu 3.19%
iter 4710: loss 0.7934, time 114.49ms, mfu 3.20%
iter 4720: loss 0.8374, time 116.00ms, mfu 3.20%
iter 4730: loss 0.8223, time 114.59ms, mfu 3.20%
iter 4740: loss 0.8268, time 115.66ms, mfu 3.20%
step 4750: train loss 0.6359, val loss 1.6802
iter 4750: loss 0.8078, time 12216.16ms, mfu 2.89%
iter 4760: loss 0.8218, time 117.52ms, mfu 2.92%
iter 4770: loss 0.8055, time 115.92ms, mfu 2.95%
iter 4780: loss 0.8098, time 115.62ms, mfu 2.97%
iter 4790: loss 0.8343, time 113.37ms, mfu 3.00%
iter 4800: loss 0.8281, time 112.90ms, mfu 3.03%
iter 4810: loss 0.8343, time 116.12ms, mfu 3.05%
iter 4820: loss 0.8217, time 115.77ms, mfu 3.07%
iter 4830: loss 0.8183, time 116.31ms, mfu 3.08%
iter 4840: loss 0.8372, time 115.24ms, mfu 3.10%
iter 4850: loss 0.8202, time 112.28ms, mfu 3.12%
iter 4860: loss 0.8241, time 116.29ms, mfu 3.13%
iter 4870: loss 0.8103, time 116.90ms, mfu 3.13%
iter 4880: loss 0.8312, time 112.69ms, mfu 3.15%
iter 4890: loss 0.8079, time 115.64ms, mfu 3.16%
iter 4900: loss 0.8054, time 114.67ms, mfu 3.17%
iter 4910: loss 0.8333, time 114.81ms, mfu 3.18%
iter 4920: loss 0.8248, time 114.41ms, mfu 3.18%
iter 4930: loss 0.8068, time 116.15ms, mfu 3.19%
iter 4940: loss 0.8003, time 115.11ms, mfu 3.19%
iter 4950: loss 0.8276, time 115.75ms, mfu 3.19%
iter 4960: loss 0.8309, time 116.05ms, mfu 3.20%
iter 4970: loss 0.7909, time 112.48ms, mfu 3.21%
iter 4980: loss 0.7935, time 115.85ms, mfu 3.21%
iter 4990: loss 0.8205, time 116.26ms, mfu 3.21%
step 5000: train loss 0.6210, val loss 1.7005
iter 5000: loss 0.8196, time 12154.77ms, mfu 2.89%
✓ App completed.
(nanoGPT) martin@Capo:~/nanoGPT$ modal run lob.py --command sample
💾 using volume name: 2023-07-27-10-45-2-aws
✓ Initialized. View app at https://modal.com/apps/ap-GmvdCJZhE0rBXwoUqZzliE
./configurator.py
./LICENSE
./scaling_laws.ipynb
./README.md
./train.py
./model.py
./pyvenv.cfg
./sample.py
./transformer_sizing.ipynb
./lob.py
./bench.py
./data/shakespeare/prepare.py
./data/shakespeare/readme.md
./data/openwebtext/prepare.py
./data/openwebtext/readme.md
./data/shakespeare_char/prepare.py
./data/shakespeare_char/readme.md
./__pycache__/lob.cpython-38.pyc
./assets/nanogpt.jpg
./assets/gpt2_124M_loss.png
./config/train_shakespeare_char.py
./config/eval_gpt2.py
./config/eval_gpt2_large.py
./config/train_gpt2.py
./config/eval_gpt2_medium.py
./config/eval_gpt2_xl.py
./config/finetune_shakespeare.py
✓ Created objects.
├── 🔨 Created copy.
├── 🔨 Created mount .
└── 🔨 Created mount /home/martin/nanoGPT/lob.py
Command sample was chosen.
This will run: ['python sample.py --out_dir=out-shakespeare-char']
💾 using volume name: 2023-07-27-10-45-2-aws
📁 Running rsync to copy files up to container:
sending incremental file list
./
__pycache__/
🐍 Using remote python version:
Python 3.8.15
🏃🏽Executing command: python sample.py --out_dir=out-shakespeare-char
Overriding: out_dir = out-shakespeare-char
number of parameters: 10.65M
Loading meta from data/shakespeare_char/meta.pkl...
Clown:
So you will be a fellow a servant. He have not strange?
MONTAGUE:
No more, my lord: he's never court.
ANGELO:
Go on,
You shall be a common that little might show.
ANGELO:
Alack to the queen.
ANGELO:
Here's no more.
ANGELO:
But you are near like for this. Go with this good
to sweet the king Willoughby princely.
ANGELO:
Even she is the common of the king thrust for the
great of this isle of the field?
ISABELLA:
I shall to be a word: but it is so.
ISABELLA:
I tell you, my lord; I wi
---------------
MenEnius, I must thou continue me
That I have sent forth and late.
CORIOLANUS:
Change thee,
One that frowns should sold. A silly thing it
To see thee. Thou art a maid?
LARTIUS:
My very time:
Here's a Volscian, sir, come, sir, sir, thou art a very root
To an unballable: 'tis no world of this selfname
Whose lasts our voices' shore, his heir
Than he nothing Camillo is the office
To her sent a submissips; and will not so the silk
Even with the crown, to prove a watch, if
I can rest; and think it e
---------------
MARIANA:
I beseech you, I would weep.
My lord, I do dinner and I would you bring you.
ROMEO:
It is a fellow till your sons, growing as
I dream'd your gracious senIUS:
You take to the covert and in this blame
In my breath in men man is walling out of it;
And still you with that with the prisoner.
Nurse:
What does not?
Madam? O my heart with you? how I was,
Provost, 'tis quickly have been dead!'
LADY CAPULET:
Well, that I have quite to the town throughese man
I' the instrument, so much of thee
---------------
The straight will keep the sweetest princes for kove,
As every top the officers, who was straight
To score the shame, and patricians.
BENVOLIO:
I am worn. The lark of Rome, Romeo comes
Shall bear the news.
MERCUTIO:
O her love! her wife, thou with him:
Here's companion, Lucio.
MERCUTIO:
An if you be fair?
BENVOLIO:
The rest, are no like a sound fool.
MERCUTIO:
O, true, thou hast dead! most dead, an old transport
Than do thou threat parliaments!
ROMEO:
I fear, how she was ever briever and h
---------------
BUCKINGHAM:
Then, here is not some stinlight for your grace,
Not yet ever so much every are there
But in the people's parliament. Some in the man
Being moody to swimit it of you.
BUCKINGHAM:
No other more come to make his most noble great;
And, by her name well appear the nounted soldiers,
Being common your worship not and under his soundly.
GLOUCESTER:
Here comes her than the tedious course.
KING EDWARD IV:
And then moves so he that would as dead.
Go on, and will keep our grace; and then for
---------------
MENENIUS:
He's a letter for Corioli: he's
a gracious lord.
First Citizen:
Away, away!
MENENIUS:
I had a lack of the people, the
people! Take him dead, I had said to chance.
CORIOLANUS:
What's change to the people? have you not
to have an abroad with the seeming noise the proud?
Second Citizen:
Conceit it, Backingham, that walk appointed in the pride.
CORIOLANUS:
I cannot think with all the tomb, that he does
The vilgary of mine and his bed,
Her disgraced him, dear him, to do weep with the
---------------
Shepherd:
Heaven if so, steel it not.
Clown:
I have not to many of this all, and whose
the shores: your blestings are bried intents
none world, and have you so?
Clown:
Will you not banishment, that you have deserved them, who
seen you do, sir, which I must conto any fool, a
rashning from this is little bustrains. Your
wisdom to see alone all. Proceeds to the limb; a while I woo do it alone.
Clown:
My lord, hold I tell you, and I would prophesy to
hold that which you were look not like a fool.
---------------
ISABELLA:
True, if not me, or with mine honour,
I do not banish you with her rest,
That I should not hear me with her eyes,
The world is beholding in enterial.
ANGELO:
The battle hours I do disgrace the sword
Of her tongue, and to such a brother world in earth!
ISABELLA:
Now, sir, if it breathed not, I will tell you fry them?
ANGELO:
See you, tell me this winter'd divines.
ISABELLA:
How doth here this, my lord; but it is an abuse
Within this approbation, and she was much a maid
But in the ol
---------------
GLOUCESTER:
Lords, I can not see the other sad:
And with her, which she shall seem for their swordship.
WARWICK:
Bloody maids the feast.
KING LEWIS XI:
No, my lord, Bona broke, and go to thee.
YORK:
Exeter, I mean this, by my soul rest.
KING LEWIS XI:
The king my great soul in any summer's love,
And in no hour aughty of your streets of justice;
For your blood with their grief,
And all the ground shall be so grown'd along:
Since will I shame at supper us.
GLOUCESTER:
No, no, my lord, be it w
---------------
How changed the drops of eight of his soul?
ISABELLA:
This is the day of this is your land;
But I have been call'd up him been your tent?
DUKE VINCENTIO:
How far of the solemnity? who is wrong'd?
Why should we shame an arms stoop of life?
They will prove his like offence with life
And to be crave a happy model's guilty of his cheeks;
For all his foes, that are gone of me.
DUKE VINCENTIO:
Their love and press sent to it.
CLAUDIO:
Brother is a visitor.
ANGELO:
Why, 'tis my lord, I die the Cap
---------------
Notes about how lob.py works
The script works by doing the following:
- There is a function called copy (I should probably have called this copy_and_run) that runs on the remote machine, and copies all of the changed files from the local file system to the remote machine.
- To this function we bind 2 directories that appear on the remote system:
  - /source/code is a mount that is a copy of your local folder, except for the folders mentioned in exclude_paths_starting_with. This is a (I think) temporary and (for sure) read-only folder.
  - /root/code is a “network file system” which has been set up as persistent and read/write.
- The copy function uses rsync to copy everything that has changed from the mount to the persistent file system. This means that future runs are quicker (they only need to copy changed files) and the running code can save data, such as model snapshots, and recover them on future runs.
- Once copy has finished with rsync, it changes directory into the /root/code folder, then runs your commands. A condensed sketch of this wiring appears below.
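To make that concrete, here is a minimal sketch of how these pieces could be wired together, assuming Modal’s mid-2023 Python API (Stub, Mount, NetworkFileSystem). This is illustrative only, not the actual contents of lob.py; the app name, volume name, and function body are assumptions:

# Illustrative sketch only -- not the actual lob.py.
import subprocess
import modal

stub = modal.Stub("lob-sketch")

image = (
    modal.Image.debian_slim()
    .apt_install("rsync")
    .pip_install("torch numpy transformers datasets tiktoken wandb tqdm".split(" "))
)

@stub.function(
    image=image,
    gpu="t4",
    timeout=60 * 60,
    # Read-only copy of the local folder (the real script also filters excluded paths).
    mounts=[modal.Mount.from_local_dir(".", remote_path="/source/code")],
    # Persistent read/write volume that survives between runs.
    network_file_systems={"/root/code": modal.NetworkFileSystem.persisted("example-volume")},
)
def copy(commands):
    # Copy only changed files, keeping generated files (e.g. checkpoints) on the volume.
    subprocess.run(["rsync", "-a", "/source/code/", "/root/code/"], check=True)
    # Then run each stage command from inside the persistent copy of the project.
    for cmd in commands:
        subprocess.run(cmd, shell=True, cwd="/root/code", check=True)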