Modal.com and NanoGPT continued: producing output; using tiktoken for bigger tokens

In the previous post we explored how to get NanoGPT training on Modal. There was quite a bit to that, so I left the text generation part for this post to cap things off. Let’s do that now, and then try out some more things with NanoGPT.

Let’s make some Shakespam

With all the setup work done in the first post, generating text on Modal will be much easier.

The repo code that generates text is sample.py, and we just need a script to hook into that and run it on Modal, which is this (sample_modal.py):

import modal

# Make sure we have access to the data we prepared earlier:
volume = modal.NetworkFileSystem.new().persisted("nano-gpt-volume")

# Set up the container for running the training, and make sure it has the necessary
# Python packages installed.
stub = modal.Stub("nano-gpt-sample",
    image=modal.Image.debian_slim().pip_install(
        ["torch", "numpy", "transformers", "datasets", "tiktoken", "wandb", "tqdm"]
    )
)

# This stub.function allows sample_modal to be called remotely on their servers. We will
# now specify how we want that set up...
@stub.function(
        # Ensure that the function runs with a GPU, I have picked out a cheap one, but you can replace
        # this with "any" in the future if this GPU is no longer available.
        gpu=modal.gpu.T4(), 

        # Increase the timeout to allow long training times.
        timeout=3600, 

        # This tells modal to upload the entire nanogpt package we created. Without doing
        # this it won't be able to locate train.py, model.py etc.
        mounts=[modal.Mount.from_local_python_packages("nanogpt")],
        
        # Mount the data we prepared earlier
        network_file_systems={"/root/data": volume}
        )
def sample_modal():
    # This import is a cheeky and quick way to run nanogpt with minimal changes to Andrej's code. Ideally we would change
    # the `sample` module to expose a function, then import `sample` and call that function.
    import nanogpt.sample

# This is what gets called locally when running `modal run sample_modal.py`, and it just calls the 
# remote function.
@stub.local_entrypoint()
def main():
    sample_modal.call()

Then to run it:

modal run sample_modal.py

The result of this is long, and is shown below. I think this is really impressive:

Shakespeare Output

It amazes me that we can get computers, which are purely logical, to do stuff like this at all. For perspective, my first computer was an Acorn Electron – 32KB of RAM (a millionth of a decent laptop nowadays).

Another reason this is amazing is the step-change that the transformer model (the T in GPT) gives you over the other models shown in the Zero to Hero course. It is not just computing power that makes this possible, but the research into new models that has happened in the last 20 years or so.

Turning up the temperature

Andrej included a temperature setting, which allows you to adjust the “randomness” of the output:

  • If set very close to zero, it will produce the same output each time. This is the output it considers “most likely”.
  • If set to 1, it will produce output with the probabilities it predicts. For example, if it decides, based on training, that there is an 80% chance of an o coming next and a 15% chance of a d, then it will produce an o 80% of the time.
  • If set higher than 1, the probabilities move closer together, giving less likely characters more chance of appearing.

The chart below (link to Google sheet) shows how increasing the temperature makes the probabilities of 3 potential “next characters” close up on each other, while decreasing it causes the preferred character to always be picked as the winner.
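To see the same effect numerically, here is a minimal sketch of how temperature reshapes the probabilities. The logits are made-up numbers purely for illustration:

import torch
import torch.nn.functional as F

# Raw scores (logits) for three candidate next characters, invented for this example.
logits = torch.tensor([2.0, 0.5, -1.0])

for temperature in (0.1, 1.0, 2.0):
    probs = F.softmax(logits / temperature, dim=-1)
    print(temperature, [round(p, 3) for p in probs.tolist()])

At 0.1 nearly all of the probability lands on the top character, at 1.0 we sample with the model’s own predicted probabilities, and at 2.0 the distribution flattens out towards uniform.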

Let’s try a temperature of 2. Add this line to train_shakespeare_char.py:

temperature = 2

Here is a small sample of the output I got. It is definitely more chaotic!

HASTINMBSABURY:
Stir-3 Sleep, haugs:
Warthy, usquick..tWarwiXl!
Hatensworn my feans?
You know,
Young, tof it is!
BAmilind!

A low temperature of 0.1 gives us this, which seems more coherent, but much more “stuck like a record”:

CORIOLANUS:
I will be so so much a part of the people,
And then the way of the common of the court,
And then the way of the people of the court,
And the prince of the people of the court,
Which we have stood of the prince of the people,
And the princely of the streets of the state,
Which we have stood to the body of the sea,

I think the default temperature of 0.8 was probably “just right” like the porridge!

Using tiktoken for better encoding of the text

Tiktoken is a tokenizer library used by OpenAI. Its job is to turn a sentence into a sequence of numbers (tokens), which can then be used to train the model. It does this using a byte pair encoding (BPE) algorithm: the most frequent words are encoded as single tokens, while less frequent words are split into multiple tokens, each representing a word part.

Until now we have been training by converting each character to a number. The problem with this is that we are not making good use of the structure already present in English: words and parts of words carry more meaning than individual characters.
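For contrast, this is roughly what the character-level encoding from the previous post does (a sketch, not the exact prepare.py code):

text = "Alice was beginning"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
ids = [stoi[ch] for ch in text]  # one small integer per character
print(ids)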

Tiktoken offers a choice of the pre-built tokenizers OpenAI use in their models, and Andrej uses the gpt2 one. To give an idea of what this does, here is some code that encodes a passage using tiktoken, then shows the resulting tokens:

import tiktoken

enc = tiktoken.get_encoding("gpt2")
for tok in enc.encode("Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?'"):
    print(f'{str(tok).ljust(5)} : {enc.decode([tok])}')

The result is a long list of token ids, each followed by the text it decodes to.

What I find interesting here is that "and" and " and" (with a leading space) are different tokens: 392 and 290. It is also interesting that most tokens here are whole words; “peeped” is the odd one out that got split up.
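If you want to check a pair like that yourself, it only takes a couple of lines (using the same gpt2 encoding as above):

import tiktoken

enc = tiktoken.get_encoding("gpt2")
print(enc.encode("and"))   # token id(s) for "and" with no leading space
print(enc.encode(" and"))  # token id(s) for " and" with a leading space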

To train the model using the tiktoken encoding we need to run the prepare.py file in the shakespeare folder (as opposed to the shakespeare_char folder we used last time).
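Assuming you make the same Modal upload changes to data/shakespeare/prepare.py as we made to the char version in the previous post, running it looks the same:

modal run data/shakespeare/prepare.py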

Training with the Tiktoken encoding

There were a few things I had to do to get this to work. It got a bit messy, so I won’t share the code here, but I aim to put something better up on GitHub eventually. In short I had to:

  • Change the GPU to an A100 20GB to have a chance of training it in a reasonable time
  • Because Modal has “regions”, this also meant changing the volume name, so it could create a new volume near that GPU’s region
  • And that meant changing all the Modal calls to specify the A100 20GB GPU so they would be in the same region
  • I also changed the parameters (sketched below): I reduced the batch size from 256 to 64, since the tokens now mean more than they did before, so we can do with fewer, but I increased the embedding size from 384 to 384 * 4, since we might need more dimensions to represent a word.
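Here is a rough sketch of those parameter changes; treat the exact lines as assumptions rather than the precise config I ran, since my real changes were messier:

# in the training config (values as described above)
batch_size = 64        # down from 256: each token now carries more meaning
n_embd = 384 * 4       # up from 384: more dimensions to represent word-level tokens

The gpu= argument in the @stub.function decorator also changes from the T4 to an A100 20GB (check Modal’s GPU docs for the exact spelling of that option).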

With all of that done, the results I got contained a lot more text than the character-level model produced. That is because, although the number of tokens generated is the same as before, each token is now a whole word or part of a word rather than a single character.

Training costs were $0.71 for GPU and $0.09 for other stuff. It took almost bang on 1 hour to train. Inference (generating text) took a few seconds.

No local GPU? No Problem! Running Andrej Karpathy’s NanoGPT on Modal.com

Andrej Karpathy released a series of timeless lectures teaching us mortal 9-5 programmers, from scratch, how to train an “AI” language model, a bit like that GPT-4 or ChatGPT you may have heard of.

He goes into a deep dive that includes building your own tiny PyTorch from scratch, setting up bigram models and simple neural nets, before moving over to the real PyTorch later. He then explains how transformers (the T in GPT) work, and codes one up to generate some dubious Shakespeare. This final model he calls “NanoGPT”, because of the similarity between its model and that of the early GPT models that led to ChatGPT.

So why this post?

Well, while I absolutely loved the series, I don’t enjoy working with Colab or Jupyter Notebooks. It is easy to forget which code blocks have run, and I am forever scrolling up and down because the code is mixed up with results in one giant page. Not only that, but if you are using Google Colab it will time out fairly quickly, so you need to waste time running everything again.

⚠️Warning: I don’t think I recommend doing what I do here anymore. It works but is super fiddly. I am working on a much easier way to do this with a single Python file you download and run. So please read bearing that in mind…

I’d run it on my machine instead, but…

I want to run NanoGPT locally but I don’t have a good GPU. To save buying one for $2000+, I would like to rent one in the cloud if possible. If I use cloud GPUs I can experiment quickly with different chips as needed. An A100 GPU for example costs maybe $7000 – $15000 USD, but grabbing one for an hour for $4 is much more in my budget.

modal.com provides this service, and they take care of all of the “devops” as we will see soon. There is some housekeeping Python code to write, but no bash, Terraform or Ansible, which is great because I don’t want to do that.

Their GPU prices are not the cheapest. I would say they charge fair (average) prices though. And they charge for the milliseconds of actual usage and nothing else. That means I don’t pay extra because I forgot to shut down a server. Also they include $40/month credit for free anyway so it is costing me nothing to learn.

In this post I will show you how I used Modal to quickly train and run the NanoGPT model, while having the creature comforts of developing in VSCode.

What is NanoGPT anyway?

NanoGPT is nothing but a text producing bot!

When trained on some text it will learn how to predict the next character. So for example if you feed it “Hello ” it might predict W. You then feed it “Hello W” and it might predict o and so on. By repeating this you get text generation.
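In code, that loop is something like this minimal sketch, where predict_next_char stands in for the trained model (a hypothetical helper, not a function from the repo):

def generate(prompt: str, n_chars: int) -> str:
    text = prompt
    for _ in range(n_chars):
        # e.g. "Hello " -> "W", then "Hello W" -> "o", and so on
        text += predict_next_char(text)
    return text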

When trained on Shakespeare it makes muddled text that is quite a bit Shakespeare-looking.

Example of NanoGPT generated text:

FlY BOLINGLO: Them thrumply towiter arts the muscue rike begatt the sea it What satell in rowers that some than othis Marrity.

LUCENTVO: But userman these that, where can is not diesty rege; What and see to not. But’s eyes. What?

JOHN MARGARET: Than up I wark, what out, I ever of and love, one these do sponce, vois I me; But my pray sape to ries all to the not erralied in may.

If you want to know more, you can check out Andrej’s Zero to Hero lecture series and the nanoGPT repo.

Now let’s get started, and get NanoGPT trained and running with local code, and a cloud GPU from Modal.

Step 1: Learn how to run code on Modal

I won’t parrot too much of what Modal have in their tutorials, as that is the best place to go, but in a nutshell: you decorate the Python functions that you want to run on their servers.

For example, say you have a function you want to run in their cloud:

import sys
import modal

stub = modal.Stub("example")

@stub.function()
def f(i):
    if i % 2 == 0:
        print("hello", i)
    else:
        print("world", i, file=sys.stderr)

    return i * i

And then you can call this from a local function either as-is (to run locally) or with .call (to run on the server):

@stub.local_entrypoint()
def main():
    # Call the function locally.
    print(f(1000))

    # Call the function remotely.
    print(f.call(1000))

To run this from the command line:

modal run example.py

Step 2: Fork the NanoGPT repo, and check it works on your local computer

The next step is to make a fork of https://github.com/karpathy/nanoGPT and clone that fork to my computer, so that I can make some changes to adapt it to use Modal.

Note: If using Windows, you will need a Linux distribution installed under WSL2 to do this successfully, as Windows is not supported for torch.compile.

It is a good idea to check that we can get it to run locally first. I just want a quick check that the code works, so I will reduce the number of iterations in train_shakespeare_char.py to 5, and shrink the model to something ridiculously small so it completes in a few seconds on a crap laptop. Here are the changed lines in train_shakespeare_char.py:

...
max_iters = 5
...
# baby GPT model :)
n_layer = 2
n_head = 4
n_embd = 16
dropout = 0.2
...

In addition, I uncomment these 2 lines in the same file (train_shakespeare_char.py) to make it possible to run on an average laptop with no GPU:


# on macbook also add
device = 'cpu'  # run on cpu only
compile = False # do not torch compile the model

To check that it works, I set up a Python environment, and run similar commands as shown in the NanoGPT README.md:

python -m venv .
source bin/activate
pip install torch numpy transformers datasets tiktoken wandb tqdm
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py

From this we get a confirmation that this training loop is running correctly:

step 0: train loss 4.1783, val loss 4.1771
iter 0: loss 4.1791, time 47896.67ms, mfu -100.00%

Knowing that it works on my computer makes me more confident to try getting it working on Modal.

Step 3: Upload the training data to Modal

3.1 Authenticate with Modal

First, let’s do the basic setup for Modal and get authenticated:

pip install modal-client
modal token new

3.2 Change the prepare.py to upload to Modal

Now edit data/shakespeare_char/prepare.py, and nest the existing code inside a main function. Add a @stub.local_entrypoint() decorator, so that Modal knows to run this locally.

@stub.local_entrypoint()
def main():
    """     
    Prepare the Shakespeare dataset for character-level language modeling.
    So instead of encoding with GPT-2 BPE tokens, we just map characters to ints.
    Will save train.bin, val.bin containing the ids, and meta.pkl containing the
    encoder and decoder and some other related info.
    """
    import os
    import pickle
    ...

Add the following lines at the top of the file to define the volume and app name:

import modal

volume = modal.NetworkFileSystem.new().persisted("nano-gpt-volume")
stub = modal.Stub("nano-gpt-code")

Now add this function at the bottom, which will run on the remote server. All it does is copy the files over, with some prints to check whether it was successful. It keeps the folder structure on the server the same as it is locally (the working directory there is /root), so there is less code to change in train.py when we get to it.


dataset = "shakespeare_char"

@stub.function(
        mounts=[modal.Mount.from_local_dir("data", remote_path="/source/data")],
        network_file_systems={"/root/data": volume})
def copy():
    import shutil          
    import os


    source_dataset_path = os.path.join("/source/data", dataset)
    dest_dataset_path = os.path.join("/root/data", dataset)

    def check():        
        if os.path.exists(dest_dataset_path):
            files = os.listdir(dest_dataset_path)
            print(f"Files: {str.join(', ', files)}")
        else:
            print(f"Path doesn't exist")

    check()
    shutil.copytree(source_dataset_path, dest_dataset_path, dirs_exist_ok=True)
    print("files copied")
    check()

Now make the call to copy from main:

...
    # val has 111540 tokens

    copy.call()

3.3 Run the upload

You can now run this to perform the upload:

modal run data/shakespeare_char/prepare.py

You should get an output like this:

Path doesn't exist
files copied
Files: meta.pkl, val.bin, prepare.py, input.txt, __pycache__, train.bin, readme.md

If you run it again, it should show that the files exist before it is copied, proving that the data was persisted. Now the remote machine has access to the training data.

Step 4: Adapt the training code to run on Modal

4.1 Make the training code into a Python package

As far as I can tell, in order for Modal to see all of your Python code it must be organised in a package.

Making the code into a Python package is quite simple. First, move the Python files for the model training and text generation into a new folder:

mkdir nanogpt
mv config *.py nanogpt

Find all instances of from model in these files, and replace with from .model (Add a period). For example in train.py:

from .model import GPTConfig, GPT

Adding a period to these local imports says “this is from the current directory’s package”. This allows the code to work when called from another package or location, which is what we will be doing when using Modal.

4.2 Remove the configurator

There is a line in train.py that won’t work in Modal (because the source files are not in the same place there), so comment it out, and add a hard-coded line that does the equivalent thing for the Shakespeare model:

# exec(open('nanogpt/configurator.py').read()) # overrides from command line or config file
from .config.train_shakespeare_char import *

This is perhaps not the ideal way to do it, but a quick change for the purposes of making this blog post not too long.

4.3 Add a python script to run the code in Modal

Create a new file called train_modal.py in the root of the project (so one level up from the nanogpt folder) and add the code below. I have put some comments in there to explain it.

import modal

# Make sure we have access to the data we prepared earlier:
volume = modal.NetworkFileSystem.new().persisted("nano-gpt-volume")

# Set up the container for running the training, and make sure it has the necessary
# Python packages installed.
stub = modal.Stub("nano-gpt-train",
    image=modal.Image.debian_slim().pip_install(
        ["torch", "numpy", "transformers", "datasets", "tiktoken", "wandb", "tqdm"]
    )
)

# This stub.function allows train_modal to be called remotely on their servers. We will
# now specify how we want that set up...
@stub.function(
        # Ensure that the function runs with a GPU, I have picked out a cheap one, but you can replace
        # this with "any" in the future if this GPU is no longer available.
        gpu=modal.gpu.T4(), 

        # Increase the timeout to allow long training times.
        timeout=3600, 

        # This tells modal to upload the entire nanogpt package we created. Without doing
        # this it won't be able to locate train.py, model.py etc.
        mounts=[modal.Mount.from_local_python_packages("nanogpt")],
        
        # Mount the data we prepared earlier
        network_file_systems={"/root/data": volume}
        )
def train_modal():
    # This import is a cheeky and quick way to run nanogpt with minimal changes to Andrej's code. Ideally we would change
    # the `train` module to expose a function, then import `train` and call that function.
    import nanogpt.train

# This is what gets called locally when running `modal run train_modal.py`, and it just calls the 
# remote function.
@stub.local_entrypoint()
def main():
    train_modal.call()

With a GPU available, we can comment these 2 lines back out in train_shakespeare_char.py:

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model

We also want the checkpoint saving to work (this saves progress so we can resume after an error, and also lets us run the model later). Because we mounted the persisted volume at the data folder, make the following change so that checkpoints are written there; otherwise they won’t be persisted:


out_dir = 'data/out-shakespeare-char'

4.4 Run the script

Now we can run this from the command line: modal run train_modal.py, and here is the result:

(nanoGPTonModal) martin@Capo:~/nanoGPTonModal$ modal run train_modal.py
✓ Initialized. View app at https://modal.com/apps/ap-k9Oehw5IpXCxmt3yNBUNds
✓ Created objects.
├── 🔨 Created train_modal.
├── 🔨 Created mount /home/martin/nanoGPTonModal/nanogpt
└── 🔨 Created mount /home/martin/nanoGPTonModal/train_modal.py
tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 0.01M
num decayed parameter tensors: 10, with 11,280 parameters
num non-decayed parameter tensors: 5, with 80 parameters
using fused AdamW: True
step 0: train loss 4.1783, val loss 4.1771
iter 0: loss 4.1791, time 3620.00ms, mfu -100.00%
✓ App completed.

4.5 Revert to the proper-sized hyper-parameters

Revert the values in train_shakespeare_char.py to the bigger model values, with more iterations. Now that we are using Modal, this will run in a reasonable time.

...
max_iters = 5000
...
# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2
...

Tip: the next step takes about 15 minutes. If it makes training progress (says a checkpoint has been created) but then gets stopped, you can resume by setting init_from = 'resume' in the parameters above.
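Concretely, that means adding this one line to the same config before re-running (a small sketch; remove it again when you want a fresh run):

init_from = 'resume'  # load the latest checkpoint from out_dir instead of starting from scratch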

Running modal run train_modal.py again:

(nanoGPTonModal) martin@Capo:~/nanoGPTonModal$ modal run train_modal.py
✓ Initialized. View app at https://modal.com/apps/ap-HU6D2SRnxOv1OsJpmlb3Fj
✓ Created objects.
├── 🔨 Created train_modal.
├── 🔨 Created mount /home/martin/nanoGPTonModal/nanogpt
└── 🔨 Created mount /home/martin/nanoGPTonModal/train_modal.py
tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
step 0: train loss 4.2874, val loss 4.2823
iter 0: loss 4.2649, time 29573.95ms, mfu -100.00%
iter 10: loss 3.2438, time 101.76ms, mfu 3.66%
iter 20: loss 2.7899, time 103.62ms, mfu 3.66%
iter 30: loss 2.6383, time 104.10ms, mfu 3.65%
iter 40: loss 2.5763, time 101.83ms, mfu 3.65%
iter 50: loss 2.5261, time 104.54ms, mfu 3.64%
iter 60: loss 2.5136, time 103.90ms, mfu 3.64%
...
iter 4980: loss 1.2050, time 117.62ms, mfu 3.16%
iter 4990: loss 1.2493, time 114.90ms, mfu 3.17%
step 5000: train loss 1.1405, val loss 1.4969
iter 5000: loss 1.2446, time 12044.48ms, mfu 2.86%
✓ App completed.

Costs

It took about 14 minutes and cost $0.21 to train the model. I think $0.14 was for the GPU and the rest was for CPU/memory.

Conclusion

First, this took a little more work than expected to get some local Python code running on Modal.

The combination of design choices in the nanoGPT repo, and the fairly narrow happy path for getting code to run on Modal, meant that a lot of changes had to be made. To summarize why code changes were needed:

  • Modal will only upload a bunch of Python files if specified as a package. NanoGPT didn’t do this.
  • Modal will put the files “somewhere”, so using exec() on relative paths to local scripts like NanoGPT does won’t work.
  • Modal requires additional functions and decorators, so a new file is needed.
  • Modal requires specification of mounts etc. so this new file has quite a bit to it.

I think if you build a Python project with Modal in mind, then the experience will be easier. You will know how to organize files, what not to do, etc. So there will be less work to do.

Next, it is worth saying that once you get this working, it works really well. Run modal run train_modal.py and it gets going and chugs along; you almost forget it is doing a whole bunch of ops work in the cloud for you. Then you can iterate and change things, and Modal gets out of your way.

With Modal set up, I can now code with an IDE, IDE plugins, a proper file structure, git, etc. It is more what I am used to than the Jupyter experience, where you have to remember what state things are in, there is effectively one big file, and output and code are all mixed up. This is much better.

Therefore overall I think Modal is worth learning and experimenting with, and putting that initial effort to get set up. Or if money is no object, just go buy a big GPU :-).

Next

In the next blog post I run the text generation to see what kind of Shakespeare this model can produce. That will require some code changes to get working on Modal, but I expect far less work, as much of it has already been done.

I will also explore what other features are in NanoGPT and try them out using Modal too.
