Production tests: a guidebook for better systems and more sleep

Your customers expect your site to be fully working whenever they need it. This means you need to aim for near-perfect uptime not just for the site itself, but for all features customers may use.

Modern software engineering uses quality control measures such as automated test suites and observability tools (tracing, metrics, and logs) to ensure availability. Often overlooked in this landscape are production tests (also known as synthetics), which can give you immediate notification of failures in production.

Production tests can be set up with minimal fuss—usually within one sprint—and can provide a high return on investment. In this post, I will cover how best to set up production tests and how they can help with reliability, deployments, and observability.

While I have always liked production tests, I gained a real appreciation for them at Atlassian, where they are used extensively and are called “pollinators”. I have seen first hand how they can give early warnings of problems, which can then be fixed before they become incidents.

What are production tests?

A production test is any automated test that runs against the production environment. The test runs on a frequent schedule so that an on-call engineer can respond quickly; typically, every minute. The test might use a headless browser to emulate user actions, or it may call an API directly to emulate the actions of browser code or a backend service.

The production test should run in a reasonable time. I suggest 30 seconds or less, so that you can comfortably run the test once per minute. A test that takes longer than 30 seconds is probably too complex for a production test anyway. How the test deals with failure is up to your team. It could integrate with your on-call paging system, send a Slack notification, or just log an error to your logging system.
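
To make this concrete, here is a minimal sketch of what such a test might look like in Python. Everything specific in it is a placeholder: the base URL, the /api/me endpoint, the test token, and the alerting hook at the bottom would all be swapped for whatever your service and on-call tooling actually use.

import requests

BASE_URL = "https://example.com"   # placeholder: your real service
TEST_TOKEN = "replace-me"          # placeholder: credentials for a dedicated test account

def production_test() -> None:
    # Hypothetical check: the logged-in user's profile comes back with the expected name.
    response = requests.get(
        f"{BASE_URL}/api/me",
        headers={"Authorization": f"Bearer {TEST_TOKEN}"},
        timeout=10,
    )
    assert response.status_code == 200, f"unexpected status {response.status_code}"
    assert response.json().get("name") == "Production Test User", "unexpected profile name"

if __name__ == "__main__":
    try:
        production_test()
        print("production test passed")
    except Exception as exc:
        # Hook in your alerting of choice here: page the on-call engineer,
        # post to a Slack channel, or just write to your logging system.
        print(f"PRODUCTION TEST FAILED: {exc}")
        raise

Whatever runs the test on a schedule (cron, a CI job, or a synthetic-monitoring tool) takes care of the once-per-minute cadence.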

How do production tests help?

Production tests help make your production environment more reliable by giving immediate warning of a regression. This means you can potentially fix issues before a customer discovers them.

In addition, the production test can be used as a canary before deployments, and as such acts as an integration test. The test can detect regressions that are caused by mismatches with other services, such as API shape or error-handling issues. You can run the test when deploying to development and staging environments too, to get a warning of an issue during development.

You also make it easier to debug production issues by having production tests. If you are investigating an issue, knowing which production tests are passing and which are failing gives you insight into what the issue might be. If you rely on other teams’ services, and they also have production tests, you can look at their tests too to help with diagnostics.

Having production tests will reduce the time to recovery for any incident that requires a human to fix it, as the engineer learns about the issue sooner and has more information at hand to resolve it.

Important design considerations for production tests

For production tests to be worth having, they need to be well thought out. If a test keeps failing, it will probably get silenced and ignored. Even if the test is reliable, it can cause other problems, such as resource usage impacting downstream systems. You may even have to change your systems to be a bit more testable. Here are some tips to consider when setting up your production tests:

Keep your production tests basic

Your production tests should cover less ground than your automated test suites. You want the tests to be reliable enough that they don’t waste your time with false alerts, which lead to frustration, lost time, and possibly disabled alerts. The goal of a production test is to be the canary that tells you something has gone badly wrong. It gives you a head start on fixing it, hopefully before a customer is affected.

To illustrate the required simplicity, here are some examples of what I think are good candidates for a production test:

  • Log in, and confirm you are on the home page, showing that user’s name.
  • Load the main editor of your app and type in Hello. Confirm Hello is there. Reload the browser and confirm Hello is still there.
  • Call 4 API endpoints to do CRUD operations for your microservice, possibly using some fake data in the API.
  • Ping /health and check for a 200 response code within 250ms (a sketch of this one follows below).
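
To make the last of these concrete, here is a minimal sketch in Python. The URL is a placeholder and the requests library is assumed to be available.

import time
import requests

start = time.monotonic()
response = requests.get("https://example.com/health", timeout=5)  # placeholder URL
elapsed_ms = (time.monotonic() - start) * 1000

assert response.status_code == 200, f"health check returned {response.status_code}"
assert elapsed_ms < 250, f"health check took {elapsed_ms:.0f}ms, budget is 250ms"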

Contrast this with these more complex and problematic production tests:

  • Run through a 25-step test that checks all the main functionality of the editor, asserting that elements are the correct size and are in the correct position.
  • Test that if you make calls in quick succession to the API, they will be added to the database in the same order.
  • Check that the credit card page can differentiate between Visa and Mastercard based on the card number.
  • Ping /health and check for a 200 response code within 1.5ms

The first examples are good because they are quick, reliable, and simple. They are not too functionality-specific, and so are unlikely to be broken by feature changes.

The bad examples on the other hand have problems:

The 25-step test will likely be flaky due to browser automation quirks and feature changes. You will spend a lot of time investigating failures of this test, only to conclude “It was just a one-off” or “Oh! It’s because we moved a button”.

The API calls in quick succession will sometimes fail due to network conditions changing the order in which the requests are received. It is not necessary to do something sophisticated in a production test—your goal is to know if something is horribly broken, not detect subtle timing issues.

The credit card test is probably OK in terms of not being flaky, but is a little too specific. You can almost entirely de-risk a bug in credit card UI behavior with tests that run before deployment, so do that instead.

The health check test expecting a fast response is likely to fail often enough to cause alarms that have no meaningful response. There are better ways to monitor latency in production that I will cover in a future post.

But… try to get some decent coverage

You are not aiming for anything near 100% code coverage; in fact, code coverage doesn’t really matter here. That said, production tests should cover more than just “load the home page”. This is an art, but a guiding question is “if there is a serious problem, how likely are my production tests to detect it?” You are balancing that against “how frequently will I get false reports of an outage?”

You also may want to think about the value of the things you are testing. Is it a minor feature used by 1% of users, or is it the page where new customers sign up? The latter is particularly important for growth companies that rely heavily on customer self-service to try out their software.

You don’t have to get this completely right from the beginning. You can add more tests next week. Furthermore, you can edit your existing tests, or even remove them too. Err on the side of too little coverage to begin with and consider adding more later. This way your team will get used to owning, tweaking and responding to production tests, without a cacophony of alarm bells!

As a rough guide, I would say test 3-5 simple things to begin with, with the goal to eventually move towards a reasonable amount of coverage. What is reasonable? You’ll find out after trying things out, and discussing outcomes with your team. There is no correct answer here.

Production tests are not health checks, but may overlap with them

A health check is usually a simple check that the server is running. For example, if using Node.js and Koa or Express, you might add a health route that just returns a success response on any invocation.

The purpose of this check is to assess basic server health. It allows load balancers to know which nodes to send traffic to, and deployment pipelines to know when a deployment has completed (or failed).

We expect these health checks to fail in production from time to time, possibly without affecting the customer. For example, if a machine goes offline, the load balancer will detect this and stop sending traffic to that node. If there is no adverse effect on the consuming service or customer, then there is no urgent problem. The system has self-healed.

Calling this health check on a node as a production test is not advisable, as it will cause false alarms. Even if it didn’t, a health check is not a good indicator of user experience.

The term “health check” is also sometimes used to refer to checks that do more than just verify the server is up. For example, they may check dependencies such as storage systems, caches, queues and so on. A production test that calls such a health check could give early warning of an issue before it becomes noticeable to customers.
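
To illustrate the difference, here is a rough sketch of both kinds of check. The post mentions Node with Koa or Express; this version uses Python and Flask purely for illustration, and check_database/check_cache are hypothetical stand-ins for pings against your real dependencies.

from flask import Flask, jsonify

app = Flask(__name__)

def check_database() -> bool:
    # e.g. run "SELECT 1" against your primary database
    return True

def check_cache() -> bool:
    # e.g. send a PING to your cache
    return True

@app.route("/health")
def basic_health():
    # The simple "server is up" check used by load balancers and deploy pipelines.
    return jsonify({"ok": True}), 200

@app.route("/health/deep")
def deep_health():
    # The deeper variant: also verify that key dependencies respond.
    checks = {"database": check_database(), "cache": check_cache()}
    status = 200 if all(checks.values()) else 503
    return jsonify(checks), status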

Be mindful of how your production tests affect observability

Running a test every minute, or 1440 times a day, will show up quite a lot in logs, metrics, and traces. This is often a good thing, because regions or services with very low traffic are now a bit more “observable” than they would be without such tests.

The downside is that it may add costs by keeping resources spun up in those regions. Another downside is that synthetic traffic, which sometimes requires fake data (such as a fake user ID), can add noise to your logs that you will sometimes need to filter out.

Fake data considerations

Consider a monolithic application — a single server that serves a website and all of its functionality. You need to write a test that logs in, enters some data in a field, saves it and checks it got saved. Such a test has a few challenges.

Firstly, you need to decide how the test logs in. Does it have a real account? If it does, what stops that account from expiring (e.g., when a free trial ends)? You may need a discount code or a special “fake credit card” that the system knows is for testing. If it is a “fake” account, then how does that work? Is it hard-coded somewhere?

The test is now generating data on each run. Will the success or failure of a previous run affect the current run of the test? Will running the test thousands of times a week use up storage space?

Think about these issues when planning the tests.

The temptation is to do the easiest thing to get it done: create an account using your staff email, worry about storage later, just get it working. This can come back to haunt you if the test breaks later and you need to deal with upgrading the account, or the member of staff whose email was used is on leave.

Making a production system testable may take some work and require special switches, such as user fields or feature flags, so that behavior can differ slightly to facilitate the test.

If you use microservices, or a monolith with a separate auth service, there are ways to avoid needing a test user account. For example, if your service uses a JSON Web Token (JWT) to authenticate, the authentication server could support logins for test systems, and your service can declare which test systems it authorizes to call it.
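
As a sketch of that idea, assuming PyJWT and made-up claim names (synthetic and test_suite); your auth server’s actual token shape will differ.

import jwt  # PyJWT

SHARED_SECRET = "replace-me"  # placeholder: key or public key from your auth server

def is_authorized_synthetic_caller(token: str) -> bool:
    """Return True if the JWT comes from a test system this service trusts."""
    try:
        claims = jwt.decode(token, SHARED_SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        return False
    # "synthetic" and "test_suite" are illustrative claim names, not a standard.
    return claims.get("synthetic") is True and claims.get("test_suite") in {"checkout-pollinator"}

Requests carrying such a token can then skip the need for a real user account, while ordinary traffic is unaffected.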

Three strikes before an alarm

If you have enough production tests, and run them on enough systems for long enough, you will get false alerts. These can be caused by all sorts of things: network problems that only affect the test side, quirks in browser automation, or a genuine problem that heals itself two seconds later.

The simple way to avoid getting these kinds of false alerts is to wait for three consecutive failures before raising any alarm.
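
A sketch of that logic, with the alarm hook left as a placeholder for your paging integration:

ALARM_THRESHOLD = 3
consecutive_failures = 0

def raise_alarm() -> None:
    # Placeholder: page the on-call engineer, post to Slack, etc.
    print("paging the on-call engineer")

def record_result(passed: bool) -> None:
    """Call this after every test run; only alarm after three failures in a row."""
    global consecutive_failures
    if passed:
        consecutive_failures = 0
        return
    consecutive_failures += 1
    if consecutive_failures >= ALARM_THRESHOLD:
        raise_alarm()

Many synthetic-monitoring tools have an equivalent built-in setting, so you may not need to write this yourself.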

You might be giving me a side eye right now: should we just be ignoring these false alerts? Probably not. If you have a team ritual of looking through operations data, you can incorporate non-alerting production test failures into that ritual. The idea is that you prioritize them like any regular work, rather than making them urgent things that require people to work out of hours or stop their regular work.

Pros and cons of production tests

What is good about testing in production? I have hinted at a few of these already, however let’s get into the details:

  • Real world testing: Test suites—including unit tests and locally-run integration tests—that run on every build are definitely needed. However, even with all of those, there is nothing quite like test-driving the finished car, and that is what production tests are for. The proof of the pudding is in the eating.
  • Quality Control: The main reason for running tests is to know when something is broken. Despite all the automated and manual testing you do on your machine and in staging environments, production is always unique. Some things only happen the way they do in prod. And even if your staging environment is precisely the same, production will generally have more traffic. Running tests in production lets you know whether certain things are working in production.
  • Troubleshooting: If there is an incident where customers are impacted by a bug or outage, the tests are very useful. They may be failing or passing, but either way, their current state tells you something about the system, and can help narrow down the problem.
  • Observability for low traffic regions: If you deploy your service to multiple regions and some of those regions don’t get much traffic, the production tests will create synthetic traffic. Your observability metrics such as latency and reliability percentiles will be more meaningful with even just 1000 calls per day, than with close to zero traffic.
  • Safer Deployments: The same tests you use to monitor production can double up as tests you run to accept a blue/green deployment. These tests are then acting as continuous integration tests of last resort, adding to the assurance that your deployment won’t cause big issues.
  • Reuse in other environments: Just as you can use these tests for safer deployments, you can also use them in dev and staging environments, to get early detection of issues that your build pipeline cannot detect, such as integration issues with other services.

There are, of course, some disadvantages to using production tests:

  • Setup and teardown challenges: Tests run on the real system, so you cannot wipe the database before each run. You need to figure out how to best set up the specific scenario you want to test for, and how to clean up after the test.
  • They sometimes need setup of scenarios: For example, you have a test that upgrades an account, but you need an account set up that is ready to be upgraded, and a fake payment method that will be accepted. These scenarios may mean changing production code to support them too.
  • They can be flaky: Any test that runs on a real system can fail from time to time for various reasons. For example network issues between the test and the service or occasional timing issues in the browser the test uses. Flakiness of tests needs to be taken into account if you intend to wake someone up when the test fails.
  • They cause resource usage and costs: Running tests costs money. Tests are often run across different geographic regions, hundreds of times a day, to ensure good coverage. That compute cost adds up.
  • Human cost in maintaining tests: Not every test failure is a real issue, but each one needs to be investigated. Too many tests could leave you with a full-time job monitoring the tests and making them more robust.

Production tests vs. observability

You can also “test production” through monitoring of real traffic and looking for problems there. This deserves a post or series of its own, but in short you can check for things like:

  • Latency, e.g., alert me when the 99th percentile of latency for a particular endpoint is > 200ms for 3 periods in a row.
  • Reliability, e.g., alert me when more than 0.1% of requests fail with a 5xx code.
  • Assertions, e.g., alert me when a code path that is considered unexpected or impossible has been triggered.
  • Failures, e.g., alert me when a customer couldn’t perform a task because they got an error.

The good thing about observability-based alerts is that they are very simple to set up. They may need some minor code changes, and then setting up detectors in your observability tools that look for the condition and take an action when it is met.

These alerts pair well with production tests. They have a different purpose though, since they detect problems after they have happened, and they piggyback on existing system use. Production tests can help them by creating additional synthetic traffic to monitor, which lets you keep checking things when natural traffic levels fall off.

There is no “versus” though! You would do well to have both. For a particular need it might be better to use a production test or better to use observability. A rule of thumb is if observability can meet the requirements of alerting you to a problem, then it will be easier to set up and tune to your needs than a production test.

Summary

Phew! We covered a lot of ground there. In a nutshell, adding well-designed production tests to your systems will help in a number of ways. You’ll get earlier warnings of issues, you will be able to fix them faster, you will get observability benefits, and you can also use production tests as a deployment rollout test.

It is worth adding production tests if you are not doing so already—it is a relatively small task to fit in to your planning, and you will reap rewards. If you are already using them, keep reviewing them, and see what you need to add, remove and tweak to get the most value from them as your systems evolve.

Happy testing!

LanguageTool — Check your writing the open-source way

If you write anything (so that’s a yes) then an in-browser spelling and grammar checker can be handy. Even if all you write is emails!

A checker that is free, good, and easy to use is LanguageTool. As a bonus, it is open source too. The open-source version is a bit more technical to set up, and if you don’t want to do that, they have a paid cloud version. However, it isn’t too hard to get going, as we will see below.

1. Install the browser extension

Install the LanguageTool Chrome Extension or Firefox Extension.

This will work as-is, and you can start using the tool to check your spelling on WordPress, Google Docs, Gmail and most other apps.

However, it will be limited to a certain number of words, and certain features.

You can either 1. live with that, 2. upgrade to the paid cloud version, or 3. perform the following steps to run it locally for free.

2. Run a local LanguageTool server

The instructions to do this are here: https://dev.languagetool.org/http-server

If you are lucky enough to use a Mac, there is a simple brew installation. Otherwise, I found the Docker method quite convenient if you already have Docker installed. (If you don’t, it is worth installing Docker Desktop or Rancher, as it is pretty handy for many things, including ephemeral “installations” of software like LanguageTool.)

The Docker images are maintained by different people, so there are a few options. One of these is https://github.com/meyayl/docker-languagetool, which I got working well locally. The command to run it is in that repo, but I will repeat it here:

docker run -d \
  --name languagetool \
  --restart unless-stopped \
  --cap-drop ALL \
  --cap-add CAP_CHOWN \
  --cap-add CAP_DAC_OVERRIDE \
  --cap-add CAP_SETUID \
  --cap-add CAP_SETGID \
  --security-opt no-new-privileges \
  --publish 8081:8081 \
  --env download_ngrams_for_langs=en \
  --env MAP_UID=783 \
  --env MAP_GID=783 \
  --read-only \
  --tmpfs /tmp \
  --volume $PWD/ngrams:/ngrams \
  --volume $PWD/fasttext:/fasttext \
  meyay/languagetool:latest
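
Once the container is up, you can sanity-check it by posting some text to the server’s /v2/check endpoint (assuming port 8081 as published above). For example, in Python:

import requests

response = requests.post(
    "http://localhost:8081/v2/check",
    data={"text": "This is a sentense with a mistake.", "language": "en-US"},
)
response.raise_for_status()
for match in response.json()["matches"]:
    print(match["message"])

If the server is running, you should see at least one spelling suggestion printed.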

3. Tell your extension to use the local server

Click the LT logo on your browser. It is either on the toolbar, or nested inside the extensions menu (jigsaw icon).

Click the settings (gear icon), and then scroll down to Advanced settings (only for professional users) and under LanguageTool server, choose Local server and click Save.

4. Try it out

Open your favorite digital parchment, such as Google Docs, Confluence, or Notion, or anything else. For a quick try-out, it works here too: https://www.editpad.org/. Whatever you decide, when you open the page you should see a small blue circle with a tick, somewhere in the bottom right of the editing space.

Type something with an intentional mistake, e.g., “Helo”, and you should see the circle go red, and the mistake is highlighted. You are now being watched! In a good way.

Coding Resources

A list of resources I have found useful for programming. Mainly for my own reference later:

Running local Python code on a remote GPU (using modal and lob.py)

I explored in a previous post how to run nanoGPT on Modal – both the training and sampling. It was successful, but tiresome. There were a lot of changes to the downloaded code, which made me unhappy. If I want to try out different projects that are on GitHub etc., I don’t want to be doing a lot of coding and fiddling just to get the existing code to run. I just want to run it as if I were running it locally. This is where my script lob.py comes in!

What is lob.py?

This is a Python script that provides a fairly easy way (all things considered!) to run your local code on a cloud GPU. It does this by running your code on Modal, and handles some of the logistics of doing so, such as uploading source code.

Let’s explore what it does by doing in one blog post what previously took one and a half: training and running nanoGPT on Modal.

Using lob.py to train and run nanoGPT

First, clone nanoGPT, and download lob.py from a public Gist (I may make this a full Github repo later, we will see).

git clone https://github.com/karpathy/nanoGPT
cd nanoGPT
wget -O lob.py https://gist.githubusercontent.com/mcapodici/eaa39861affe75be13badefbe9e05079/raw/bbc9e3cbb692277ffcf18406c61805685bf70d25/lob.py

Now set up a Python environment your favourite way. I will use venv in this example:

python3 -m venv .
source bin/activate

Now you might want to add the following to .gitignore to avoid having lots of changes show up (this might differ if you used another Python environment tool):

bin
lib
lib64
share

Now install modal and log in, using their standard instructions:

pip install modal-client
modal token new

We will now set up lob.py for our requirements. The version we downloaded is already set up for nanoGPT, but let’s review its parameters.

Setting up lob.py run parameters

The first one just selects which GPU you want to use. For nanoGPT, the cheapest one, t4, is plenty for the task:

# Choose one of: "t4", "a10g", "inf2", "a100-20g", "a100" or None
gpu="t4"

Next we define the commands that run. These are run after copying all the local code files to the server and changing directory into that folder. We have a single command for each stage, but you can have multiple.

commands={
    'prepare': ['python data/shakespeare_char/prepare.py'],
    'train': ['python train.py config/train_shakespeare_char.py'],
    'sample': ['python sample.py --out_dir=out-shakespeare-char'],
}

Now we set verbose, which tells us which files are being uploaded; the volume name prefix (so that we can keep this project’s files separate); a timeout of 60 minutes, after which Modal will terminate the job; and a list of paths not to upload:

verbose=True
volume_name_prefix="2023-07-27-10-45"
timeout_mins=60
exclude_paths_starting_with=["./.git", "./.github", "./bin", "./lib", "./share"]

Finally we define the image, which describes how the container that runs the program will be set up. rsync is needed because it is used to copy up the right files (without losing generated files on the server). In addition, we need to do the pip install defined in the README.md of the nanoGPT project:

image = modal.Image \
    .debian_slim() \
    .apt_install("rsync") \
    .pip_install("torch numpy transformers datasets tiktoken wandb tqdm".split(" "))

Train and run using lob.py

With all the setup done, running is very simple. Just run these commands one after the other. They correspond to the instructions in the nanoGPT README.md:

modal run lob.py --command prepare
modal run lob.py --command train
modal run lob.py --command sample

Here is some output from the final phase:

ISABELLA:
This is the day of this is your land;
But I have been call'd up him been your tent?

DUKE VINCENTIO:
How far of the solemnity? who is wrong'd?
Why should we shame an arms stoop of life?
They will prove his like offence with life
And to be crave a happy model's guilty of his cheeks;
For all his foes, that are gone of me.

Here is the entire output of the 3 commands (click to expand):


Notes about how lob.py works

The script works by doing the following:

  • There is a function called copy (I should probably have called it copy_and_run) that runs on the remote machine, and copies all of the changed files from the local file system to the remote machine.
  • To this function we bind 2 directories that appear on the remote system:
    • /source/code is a mount that is a copy of your local folder, except for the folders mentioned in exclude_paths_starting_with. This is a (I think) temporary and (for sure) read-only folder.
    • /root/code which is a “network file system” which has been set up as persistent and read/write.
  • The copy function uses rsync to copy everything that has changed from the mount to the persistent file system. This means that future runs are quicker (they only need to copy changed files) and the running code can save data, such as model snapshots, and recover it on future runs. (A rough sketch of this idea follows after this list.)
  • Once copy is done with rsync, it changes directory into the /root/code folder, then runs your commands.
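
This is not the actual lob.py code, but the core idea looks roughly like this (paths as described above):

import subprocess

def copy_and_run(commands: list[str]) -> None:
    # Copy only what changed from the read-only mount to the persistent volume...
    subprocess.run(["rsync", "-a", "/source/code/", "/root/code/"], check=True)
    # ...then run each configured command from the persistent copy.
    for command in commands:
        subprocess.run(command, shell=True, check=True, cwd="/root/code")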

Modal.com and NanoGPT continued: producing output; using tiktoken for bigger tokens

In the previous post we explored how to get NanoGPT training on Modal. There was quite a bit to that, so I left the text generation part until now, just to cap that post off. Let’s do that now and then try some more stuff out with NanoGPT.

Let’s make some Shakespam

With all the setup work done in the first post, generating text on Modal will be much easier.

The repo code that generates text is sample.py, and we just need a script to hook into that and run it in Modal, which is this (sample_modal.py):

import modal

# Make sure we have access to the data we prepared earlier:
volume = modal.NetworkFileSystem.new().persisted("nano-gpt-volume")

# Set up the container for running the sampling, and make sure it has the necessary
# Python packages installed.
stub = modal.Stub("nano-gpt-sample",
    image=modal.Image.debian_slim().pip_install(
        ["torch", "numpy", "transformers", "datasets", "tiktoken", "wandb", "tqdm"]
    )
)

# This stub.function allows sample_modal to be called remotely on their servers. We will
# now specify how we want that set up...
@stub.function(
        # Ensure that the function runs with a GPU, I have picked out a cheap one, but you can replace
        # this with "any" in the future if this GPU is no longer available.
        gpu=modal.gpu.T4(), 

        # Increase the timeout to allow long training times.
        timeout=3600, 

        # This tells modal to upload the entire nanogpt package we created. Without doing
        # this it won't be able to locate train.py, model.py etc.
        mounts=[modal.Mount.from_local_python_packages("nanogpt")],
        
        # Mount the data we prepared earlier
        network_file_systems={"/root/data": volume}
        )
def sample_modal():
    # This import is a cheeky and quick way to run nanogpt with minimal changes to Andrej's code. Ideally we would change
    # the `sample` module to expose a function. Then import `sample` and call that function.
    import nanogpt.sample

# This is what gets called locally when running `modal run sample_modal.py`, and it just calls the
# remote function.
@stub.local_entrypoint()
def main():
    sample_modal.call()

Then to run it:

modal run sample_modal.py

The result of this is long, and is shown in the expander below. I think this is really impressive:

Shakespeare Output (click to expand)

It amazes me that we can get computers, which are purely logical, to do stuff like this at all. For perspective, my first computer was an Acorn Electron – 32KB of RAM (a millionth of a decent laptop nowadays).

Another reason this is amazing is the step-change that using the transformer model (which is the T in GPT) gives you over the other models shown in the Zero to Hero course. It is not just computing power that does this, but the research into new models that has happened in the last 20 years or so.

Turning up the temperature

Andrej included a temperature setting, which allows you to adjust the “randomness” of the output:

  • If set to very close to zero, it will produce the same output each time. This is the output it considers “most likely”.
  • If set to 1, it will produce output with the probabilities it predicts. For example, if it decides, based on training, that there is an 80% chance of an o coming next, and a 15% chance of a d, then it will produce an o 80% of the time.
  • If set higher, the probabilities will move closer together, giving less likely characters more chance of appearing.

The chart below (link to Google sheet) shows how increasing the temperature makes the probabilities of 3 potential “next characters” close up to each other, while decreasing it causes the preferred outcome to always be picked as the winner.
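
For the curious, the underlying trick is just dividing the model’s output scores (logits) by the temperature before the softmax. A quick numpy sketch with three made-up scores shows the effect:

import numpy as np

def probabilities(logits: np.ndarray, temperature: float) -> np.ndarray:
    scaled = logits / temperature
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])  # scores for three hypothetical next characters
for t in (0.1, 1.0, 2.0):
    print(t, probabilities(logits, t).round(3))

At 0.1 nearly all the probability lands on the top-scoring character, at 1.0 you get the model’s raw probabilities, and at 2.0 the options flatten out.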

Let’s try a temperature of 2 by adding this line to train_shakespeare_char.py:

temperature = 2

Here is a small sample of the output I got. It is definitely more chaotic!

HASTINMBSABURY:
Stir-3 Sleep, haugs:
Warthy, usquick..tWarwiXl!
Hatensworn my feans?
You know,
Young, tof it is!
BAmilind!

A low temperature of 0.1 gives us this, which seems more coherent, but much more “stuck like a record”:

CORIOLANUS:
I will be so so much a part of the people,
And then the way of the common of the court,
And then the way of the people of the court,
And the prince of the people of the court,
Which we have stood of the prince of the people,
And the princely of the streets of the state,
Which we have stood to the body of the sea,

I think the default temperature of 0.8 was probably “just right” like the porridge!

Using tiktoken for better encoding of the text

Tiktoken is a tokenizer library used by OpenAI. Its job is to turn a sentence into a sequence of number representations, which can then be used to train the model. It does this using an algorithm that encodes the most frequent words as single tokens, while less frequent words are represented by multiple tokens, each representing a word part.

Until now, we have been training by converting each character to a number. The problem with this is that we are not making good use of the structure already in English: words and parts of words carry more meaning than individual characters.

Tiktoken offers a choice of the pre-built tokenizers they use in their models, and Andrej uses the gpt2 one. To give an idea of what this does, here is some code that encodes a sentence using tiktoken, then shows the resulting encoding:

import tiktoken

enc = tiktoken.get_encoding("gpt2")
for tok in enc.encode("Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?'"):
   print(f'{str(tok).ljust(5)} : {enc.decode([tok])}')

Here is the result:

Click to expand

What I find interesting here is "and" & " and" are different tokens: 392 & 290. It is also interesting that most tokens are whole words here. “Peeped” is the odd one out that got split up.

To train the model using tiktoken, we need to run the prepare.py file in the shakespeare folder (as opposed to the shakespeare_char folder we used last time).

Training with the Tiktoken encoding

There are a few things I had to do to get this to work. It got a bit messy, so I won’t share the code here, but I aim to put something better up on GitHub eventually. In short, I had to:

  • Change the GPU to an A100 – 20GB to have a chance of training it in a reasonable time
  • Because Modal has “regions”, this meant also changing the volume name, so it could create a new volume near that GPU’s region
  • And this meant changing all the Modal calls to specify the A100 – 20GB GPU so they would be in the same region
  • I also changed the parameters: I reduced the batch size from 256 to 64, since the tokens now carry more meaning than before so we can do with fewer, but I increased the embedding size from 384 to 384 * 4, since we might need more dimensions to represent a word.

With all of that done, here are the results I got. There is a lot more text because the number of tokens generated is the same as before, but each token now represents a whole word or word part:

Click to expand

Training costs were $0.71 for GPU and $0.09 for other stuff. It took almost bang on 1 hour to train. Inference (generating text) took a few seconds.

No local GPU? No Problem! Running Andrej Karpathy’s NanoGPT on Modal.com

Andrej Karpathy released a series of timeless lectures teaching us mortal 9-5 programmers, from scratch, how to train an “AI” language model, a bit like the GPT-4 or ChatGPT you may have heard of.

He goes into a deep dive that includes building your own tiny PyTorch from scratch, setting up bigram models and simple neural nets, before moving over to the real PyTorch later. He then explains how transformers (the T in GPT) work, and codes one up to generate some dubious Shakespeare. He calls this final model “NanoGPT”, because of the similarity between its model and that of the early GPT models that led to ChatGPT.

So why this post?

Well, while I absolutely loved the series, I don’t enjoy working with Colab or Jupyter Notebooks. It is easy to forget which code blocks have run, and I am forever scrolling up and down because the code is mixed up with results in one giant page. Not only that, but if you are using Google Colab it will time out fairly quickly, so you need to waste time running everything again.

⚠️Warning: I don’t think I recommend doing what I do here anymore. It works but is super fiddly. I am working on a much easier way to do this with a single Python file you download and run. So please read bearing that in mind…

I’d run it on my machine instead, but…

I want to run NanoGPT locally but I don’t have a good GPU. To save buying one for $2000+, I would like to rent one in the cloud if possible. If I use cloud GPUs I can experiment quickly with different chips as needed. An A100 GPU for example costs maybe $7000 – $15000 USD, but grabbing one for an hour for $4 is much more in my budget.

modal.com provides this service, and they take care of all of the “devops” as we will see soon. There is some housekeeping Python code to write, but no bash, Terraform or Ansible, which is great because I don’t want to do that.

Their GPU prices are not the cheapest. I would say they charge fair (average) prices, though. And they charge for the milliseconds of actual usage and nothing else, which means I don’t pay extra because I forgot to shut down a server. They also include $40/month of free credit anyway, so it is costing me nothing to learn.

In this post I will show you how I used Modal to quickly train and run the NanoGPT model, while having the creature comforts of developing in VSCode.

What is NanoGPT anyway?

NanoGPT is nothing but a text producing bot!

When trained on some text it will learn how to predict the next character. So for example if you feed it “Hello ” it might predict W. You then feed it “Hello W” and it might predict o and so on. By repeating this you get text generation.
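
In code, the generation loop is just this; predict_next_char is a stand-in for the trained model:

def generate(predict_next_char, prompt: str, length: int = 100) -> str:
    text = prompt
    for _ in range(length):
        text += predict_next_char(text)  # ask the model for one more character
    return text

# Dummy "model" that always predicts an exclamation mark, just to show the loop:
print(generate(lambda s: "!", "Hello", length=5))  # Hello!!!!!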

When trained on Shakespeare it makes muddled text that looks quite a bit like Shakespeare.

Example of NanoGPT generated text:

FlY BOLINGLO: Them thrumply towiter arts the muscue rike begatt the sea it What satell in rowers that some than othis Marrity.

LUCENTVO: But userman these that, where can is not diesty rege; What and see to not. But’s eyes. What?

JOHN MARGARET: Than up I wark, what out, I ever of and love, one these do sponce, vois I me; But my pray sape to ries all to the not erralied in may.

If you want to know more, you can check out Andrej’s Neural Networks: Zero to Hero series and the nanoGPT repo on GitHub.

Now let’s get started, and get NanoGPT trained and running with local code, and a cloud GPU from Modal.

Step 1: Learn how to run code on Modal

I won’t parrot too much of what Modal have in their tutorials, as that is the best place to go, but in a nutshell you can decorate the Python functions that you want to run on their servers.

For example, here is a function you want to run in their cloud:

@stub.function()
def f(i):
    if i % 2 == 0:
        print("hello", i)
    else:
        print("world", i, file=sys.stderr)

    return i * i

And then you can call this from a local function either as-is (to run locally) or with .call (to run on the server):

@stub.local_entrypoint()
def main():
    # Call the function locally.
    print(f(1000))

    # Call the function remotely.
    print(f.call(1000))

To run this from the command line:

modal run example.py

Step 2: Fork the NanoGPT repo, and check it works on your local computer

The next step is to make a fork of https://github.com/karpathy/nanoGPT and clone that fork to my computer, so that I can make some changes to adapt it to use Modal.

Note: If using Windows, you will need to use a Linux distribution installed in WSL2 to do this successfully, as Windows is not supported for torch.compile.

It is a good idea to check that we can get it to run locally. I just want a quick check that the code works, so I will reduce the number of iterations in train_shakespeare_char.py to 5, and dumb down the model size to ridiculously small so it completes in a few seconds on a crap laptop. Here are the changed lines in train_shakespeare_char.py:

...
max_iters = 5
...
# baby GPT model :)
n_layer = 2
n_head = 4
n_embd = 16
dropout = 0.2
...

In addition, I uncomment these 2 lines in the same file (train_shakespeare_char.py) to make it possible to run on an average laptop with no GPU:


# on macbook also add
device = 'cpu'  # run on cpu only
compile = False # do not torch compile the model

To check that it works, I set up a Python environment, and run similar commands as shown in the NanoGPT README.md:

python -m venv .
source bin/activate
pip install torch numpy transformers datasets tiktoken wandb tqdm
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py

From this we get a confirmation that this training loop is running correctly:

step 0: train loss 4.1783, val loss 4.1771
iter 0: loss 4.1791, time 47896.67ms, mfu -100.00%

Knowing that it works on my computer makes me more confident about getting it working on Modal.

Step 3: Upload the training data to Modal

3.1 Authenticate with modal

First, let’s do the basic setup for Modal and get authenticated:

pip install modal-client
modal token new

3.2 Change the prepare.py to upload to Modal

Now edit data/shakespeare_char/prepare.py, and nest the existing code inside a main function. Add a @stub.local_entrypoint() decorator, so that Modal knows to run this locally.

@stub.local_entrypoint()
def main():
    """     
    Prepare the Shakespeare dataset for character-level language modeling.
    So instead of encoding with GPT-2 BPE tokens, we just map characters to ints.
    Will save train.bin, val.bin containing the ids, and meta.pkl containing the
    encoder and decoder and some other related info.
    """
    import os
    import pickle
    ...

Add the following lines at the top of the file to define the volume and app name:

import modal

volume = modal.NetworkFileSystem.new().persisted("nano-gpt-volume")
stub = modal.Stub("nano-gpt-code")

Now add this function at the bottom, which will run on the remote server. All it does is copy the files over, with some prints to check whether it was successful. It keeps the folder structure on the server the same (the working directory is /root there) so that there is less code to change in train.py when we get to it.


dataset = "shakespeare_char"

@stub.function(
        mounts=[modal.Mount.from_local_dir("data", remote_path="/source/data")],
        network_file_systems={"/root/data": volume})
def copy():
    import shutil          
    import os


    source_dataset_path = os.path.join("/source/data", dataset)
    dest_dataset_path = os.path.join("/root/data", dataset)

    def check():        
        if os.path.exists(dest_dataset_path):
            files = os.listdir(dest_dataset_path)
            print(f"Files: {str.join(', ', files)}")
        else:
            print(f"Path doesn't exist")

    check()
    shutil.copytree(source_dataset_path, dest_dataset_path, dirs_exist_ok=True)
    print("files copied")
    check()

Now make the call to copy from main:

...
    # val has 111540 tokens

    copy.call()

3.3 Run the upload

You can now run this to perform the upload:

modal run data/shakespeare_char/prepare.py

You should get an output like this:

Path doesn't exist
files copied
Files: meta.pkl, val.bin, prepare.py, input.txt, __pycache__, train.bin, readme.md

If you run it again, it should show that the files exist before it is copied, proving that the data was persisted. Now the remote machine has access to the training data.

Step 4: Adapt the training code to run on Modal

4.1 Make the training code into a Python package

As far as I can tell, in order for Modal to see all of your Python code it must be organised in a package.

Making the code into a Python package is quite simple. First, move the Python files for the model training and text generation into a new folder:

mkdir nanogpt
mv config *.py nanogpt

Find all instances of from model in these files, and replace them with from .model (add a period). For example, in train.py:

from .model import GPTConfig, GPT

Adding a period to these local imports says “this is from the current directory’s package”. This allows the code to work when called from another package or location, which is what we will be doing when using Modal.

4.2 Remove the configurator

There is a line in train.py that needs to be commented out because it won’t work in Modal (which doesn’t have the source files in the same place). Comment it out, and add a hard-coded line that does the equivalent thing for the Shakespeare model:

# exec(open('nanogpt/configurator.py').read()) # overrides from command line or config file
from .config.train_shakespeare_char import *

This is perhaps not the ideal way to do it, but it is a quick change that keeps this blog post from getting too long.

4.3 Add a python script to run the code in Modal

Create a new file called train_modal.py in the root of the project (one level up from the nanogpt folder) and add the code below. I have put some comments in there to explain it.

import modal

# Make sure we have access to the data we prepared earlier:
volume = modal.NetworkFileSystem.new().persisted("nano-gpt-volume")

# Set up the container for running the training, and make sure it has the necessary
# Python packages installed.
stub = modal.Stub("nano-gpt-train",
    image=modal.Image.debian_slim().pip_install(
        ["torch", "numpy", "transformers", "datasets", "tiktoken", "wandb", "tqdm"]
    )
)

# This stub.function allows train_modal to be called remotely on their servers. We will
# now specify how we want that set up...
@stub.function(
        # Ensure that the function runs with a GPU, I have picked out a cheap one, but you can replace
        # this with "any" in the future if this GPU is no longer available.
        gpu=modal.gpu.T4(), 

        # Increase the timeout to allow long training times.
        timeout=3600, 

        # This tells modal to upload the entire nanogpt package we created. Without doing
        # this it won't be able to locate train.py, model.py etc.
        mounts=[modal.Mount.from_local_python_packages("nanogpt")],
        
        # Mount the data we prepared earlier
        network_file_systems={"/root/data": volume}
        )
def train_modal():
    # This import is a cheeky and quick way to run nanogpt with minimal changes to Andrej's code. Ideally we would change
    # the `train` module to expose a function. Then import `train` and call that function.
    import nanogpt.train

# This is what gets called locally when running `modal run train_modal.py`, and it just calls the 
# remote function.
@stub.local_entrypoint()
def main():
    train_modal.call()

With a GPU available, we can comment these 2 lines back out in train_shakespeare_char.py:

# on macbook also add
# device = 'cpu'  # run on cpu only
# compile = False # do not torch compile the model

We also want the checkpoint saving to work (which saves progress so we can resume on error, and also lets us run the model later). Because we mounted a folder called data, make the following change, otherwise the checkpoints won’t be saved:


out_dir = 'data/out-shakespeare-char'

4.4 Run the script

Now we can run this from the command line: modal run train_modal.py, and here is the result:

(nanoGPTonModal) martin@Capo:~/nanoGPTonModal$ modal run train_modal.py
✓ Initialized. View app at https://modal.com/apps/ap-k9Oehw5IpXCxmt3yNBUNds
✓ Created objects.
├── 🔨 Created train_modal.
├── 🔨 Created mount /home/martin/nanoGPTonModal/nanogpt
└── 🔨 Created mount /home/martin/nanoGPTonModal/train_modal.py
tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 0.01M
num decayed parameter tensors: 10, with 11,280 parameters
num non-decayed parameter tensors: 5, with 80 parameters
using fused AdamW: True
step 0: train loss 4.1783, val loss 4.1771
iter 0: loss 4.1791, time 3620.00ms, mfu -100.00%
✓ App completed.

4.5 Revert to the proper-sized hyperparameters

Revert the values in train_shakespeare_char.py to the bigger model values, with more iterations. Now that we are using Modal, this will run in a reasonable time.

...
max_iters = 5000
...
# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2
...

Tip: the next step takes about 15 minutes. If training makes progress (it says a checkpoint has been created) but then gets stopped, you can resume it by setting init_from = 'resume' in the parameters above.

Running modal run train_modal.py again:

(nanoGPTonModal) martin@Capo:~/nanoGPTonModal$ modal run train_modal.py
✓ Initialized. View app at https://modal.com/apps/ap-HU6D2SRnxOv1OsJpmlb3Fj
✓ Created objects.
├── 🔨 Created train_modal.
├── 🔨 Created mount /home/martin/nanoGPTonModal/nanogpt
└── 🔨 Created mount /home/martin/nanoGPTonModal/train_modal.py
tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
step 0: train loss 4.2874, val loss 4.2823
iter 0: loss 4.2649, time 29573.95ms, mfu -100.00%
iter 10: loss 3.2438, time 101.76ms, mfu 3.66%
iter 20: loss 2.7899, time 103.62ms, mfu 3.66%
iter 30: loss 2.6383, time 104.10ms, mfu 3.65%
iter 40: loss 2.5763, time 101.83ms, mfu 3.65%
iter 50: loss 2.5261, time 104.54ms, mfu 3.64%
iter 60: loss 2.5136, time 103.90ms, mfu 3.64%
...
iter 4980: loss 1.2050, time 117.62ms, mfu 3.16%
iter 4990: loss 1.2493, time 114.90ms, mfu 3.17%
step 5000: train loss 1.1405, val loss 1.4969
iter 5000: loss 1.2446, time 12044.48ms, mfu 2.86%
✓ App completed.

Costs

It took about 14 minutes and cost $0.21 to train the model. I think $0.14 was for the GPU and the rest was for CPU/memory.

Conclusion

First, it took a little more work than expected to get some local Python code running on Modal.

The combination of design choices in the nanoGPT repo and the fairly narrow happy path for getting code to run in Modal meant that a lot of changes had to be made. To summarize, these are the things that required code changes:

  • Modal will only upload a bunch of Python files if they are specified as a package. NanoGPT didn’t do this.
  • Modal will put the files “somewhere”, so using exec() on relative paths to local scripts like NanoGPT does won’t work.
  • Modal requires additional functions and decorations, so a new file is needed.
  • Modal requires specification of mounts etc. so this new file has quite a bit to it.

I think if you build a Python project with Modal in mind, then the experience will be easier. You will know how to organize files, what not to do, etc. So there will be less work to do.

Next, it is worth saying that once you get this working, it works really well. Running modal run train_modal.py, it gets going and chugs along; you almost forget it is doing a whole bunch of ops work in the cloud for you. Then you can iterate and change things up, and Modal gets out of your way a bit.

With Modal set up, I can now code with an IDE, IDE Plugins, file structure, git, etc. It is more what I am used to than the Jupyter experience where you have to remember what state things are in, there is effectively one big file, and output and code are all mixed up. This is much better.

Therefore, overall I think Modal is worth learning and experimenting with, and worth putting in that initial effort to get set up. Or if money is no object, just go buy a big GPU :-).

Next

In the next blog post I run the text generation to see what kind of Shakespeare this model can produce. This will require some code changes to get it to work on Modal, but I expect them to be a lot smaller, as much of the work has been done.

I will also explore what other features are in NanoGPT and try them out using Modal too.

NextJS – Undocumented Features

I have recently been playing around with the app router in NextJS. This is a “new paradigm for building applications using React’s latest features”, and was introduced in v13. Personally I find it a real headache to work with, mainly because the documentation is a bit scant and only covers the happy path. It feels a bit “early” to be using this, so I would probably stick to Page Routing for anything critical.

NEXT_PRIVATE_DEBUG_CACHE

One problem is that it will refuse to cache any fetch over 2MB (well, looking at their code, anything with more than 2 * 1024 * 1024 string length, which is an approximation). However, 2MB and a bit is an annoyingly low limit. I tried to hack it to be higher, but it seems Vercel refuses to cache it anyway. Along the way I discovered a nice hidden feature in their library code. Dotted around it are statements like this:

                if (res.status === 404) {
                    if (this.debug) {
                        console.log(`no fetch cache entry for ${key}, duration: ${Date.now() - start}ms`);
                    }

We have a debug flag which, if set, logs more stuff. And more logs lead to, well, more knowing what the hell is going on! How do you turn this debug flag on? Simple. Set the environment variable NEXT_PRIVATE_DEBUG_CACHE to true (or any truthy value). If you are using Vercel to host, you can set this up in your project settings, under environment variables, then redeploy.

Here are some example logs, after doing that:

no fetch cache entry for 7710c8185037cf970b4bbd65edf9625c9f8acd1a32b78bc76ec6fc2314ff8273, duration: 83ms
no fetch cache entry for 7710c8185037cf970b4bbd65edf9625c9f8acd1a32b78bc76ec6fc2314ff8273, duration: 44ms
set cache 7710c8185037cf970b4bbd65edf9625c9f8acd1a32b78bc76ec6fc2314ff8273 { tags: '/deals/[UniqueId]/page' }
set cache 7710c8185037cf970b4bbd65edf9625c9f8acd1a32b78bc76ec6fc2314ff8273 { tags: '/deals/[UniqueId]/page' }

Repeats of this told me what I expected: my stuff ain’t getting cached. Now I need to figure out why.

How to make Windows 11 livable

Bring back the old File Explorer right-click menu

Enable the old menu for the explorer shell. The one that had everything on it rather than just 6 or so things!

To do this, run these from the “cmd” command prompt. (This will close any explorer windows by the way).

reg.exe add "HKCU\Software\Classes\CLSID\{86ca1aa0-34aa-4e8b-a509-50c905bae2a2}\InprocServer32" /f /ve
taskkill /f /im explorer.exe
explorer

Add VSCode to the context menu

Want VSCode on the File Explorer right-click menu? The easiest way to get it is to check the two Explorer context menu options in the VSCode installer.

If you have already installed VSCode, no problem. Just download the installer and install again!

Install 7z

Install 7z to extract zip, 7z, tar and gz files: https://www.7-zip.org/download.html

More to come…

Learning List

This is the stuff I want to read and learn at the moment:

General Learning Stuff

Machine Learning

  • CSC 321 Winter 2018 – Intro to Neural Networks and Machine Learning – https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/ – so I can keep up when studying Neural Networks: Zero to Hero
  • Neural Networks: Zero to Hero – https://github.com/karpathy/nn-zero-to-hero
  • Blog Post: LLM Engineering https://huyenchip.com/2023/04/11/llm-engineering.html
  • https://www.deeplearningbook.org/
  • https://towardsdatascience.com/perplexity-intuition-and-derivation-105dd481c8f3
  • https://towardsdatascience.com/the-intuition-behind-shannons-entropy-e74820fe9800
  • TBC Some course on calculus / linear algebra to brush up
  • TBC Some course on PyTorch

Computers in general

  • cpu.land – a small course about learning how Linux works in more detail

Firebase Firestore Rules Recipes and Tips

Er… Why?

Having built a complete web app using Firebase, I found there are a few non-obvious things about it. One thing that kept tripping me up is Firestore rules.

If you want users to directly read and write the data in Firestore, without needing a function or a separate server (like Node.js & Express), then Firebase Rules are what will enforce security and data validation for you.

What I found is that some of the nuances of Firebase rules aren’t too obviously documented. I have created this guide to hopefully save you some time with them if you use them.

Here I am talking about rules just for Firestore. This doesn’t cover Storage or Realtime Database. (Storage rules are quite similar though).

Before I talk about rules, let’s cover some basics about the data itself in Firestore and what Rules are. Feel free to skip over these sections if you know this already.

Firestore Structure

Firestore is a document-oriented database. This means the structure is one of collections of documents. Documents are identified by a key, and contain key/value pairs, which can contain further key/value pairs similar to a JSON structure.

You can get documents by the collection name and key, and you can also run queries to get all documents in a collection where the document meets conditions you specify.

Firestore can be interacted with via the regular client API. This is for apps and sites, and is used by both anonymous and authenticated users of the database. It can also be interacted with via the Admin API, which is, for example, used by cloud functions that need unfettered access to all of the data.

Rules only apply to the regular API.

For further details, see the Firestore Data Model documentation.

Rules basics

Firestore rules allow you to decide whether to accept or deny any request to read or write data, based on:

  • The request
  • The user
  • The record being amended
  • The updated data
  • Other records in Firestore

This allows rules to serve at least two main functions:

  • Authorization – i.e. what can this user (or unauthenticated request) do?
  • Validation – i.e. is the new record or update valid?

All a rule can do is accept or reject a request. It cannot filter a request.

If a user makes a request that would return 10 records, but the rules deny 1 of those records, then the request simply fails, rather than returning the 9 records they can access.

Rules can be defined for the operations of get, list, create, update and delete, with the shortcut keywords:

  • “read” means both get & list
  • “write” means create, update & delete.

If you want to understand the syntax of rules or the operations, you can read the documentation here. Or just read the recipes and hopefully it should make sense intuitively.

Rules Recipes

With the basics out of the way, I want to provide lots of example of rules.

The thing I struggled with when using rules is “how to do this?”. I got stuck on the difference between request.resource and resource, and why some rules didn’t seem to have any effect, or just kept giving errors.

Therefore I think these recipes should help you by reducing the time it takes to get something working and debuggable. You can then customize the recipe for your needs using the Firebase documentation. Please give me feedback in the comments if something does not make sense, is not right, or isn’t covered. I am happy to help with your specific problem, and to add to my knowledge too.

Allow nothing

The allow nothing recipe is the rule that doesn’t allow any operations on a collection.

To do this, do nothing! By default, with no rules you cannot access the data in any way. Which is a good thing.

Allow everyone to read

To allow everyone, both authenticated and unauthenticated to read all documents in the collection – i.e. both list the documents and read their contents:

match /collection/{item} {
    allow read: if true;
}

Allow authenticated users only to read

This allows anyone who is logged in to read all documents in the collection:

match /collection/{item} {
    allow read: if request.auth != null;
}

Allow authenticated users with a specific token to read

Firebase authentication allows you to set up extra details for a user when they are created (and these can be updated later), which can be queried in your rules:

match /collection/{item} {
    allow read: if request.auth != null &&
        request.auth.token.isAdmin == true;
}

Having used tokens, I am not too keen on them, because they are invisible in admin interfaces (so you have to write script code to see what tokens a user has), and you will probably need to keep the token information in sync with the database data anyway. The next example is the way I prefer to do it…

Allow authenticated users based on profile criteria

You can use the incoming authentication uid to look up details about the user, and then check if they should have access:

match /collection/{item} {
    allow read: if request.auth != null &&
        get(/databases/$(database)/documents/users/$(request.auth.uid)).data.isAdmin == true
}

Allow only the “owner” of an object to read

You can store the id of the user who “owns” this object as a string, and then compare this to the requesting user:

match /collection/{item} {
    allow read: if request.auth != null &&
        request.auth.uid == resource.data.uid;
}

Social Network User Profile Pattern

The following shows a basic pattern for a user profile, so that someone can only create or update their own profile, but anyone can read other people’s profiles, even anonymously.

match /users/{uid} {
    allow create, update: if request.auth != null && request.auth.uid == uid;
    allow read: if true;
}

Of course, you can change this to a more private social network by only allowing authenticated users to read profiles:

match /users/{uid} {
    //...
    allow read: if request.auth != null;
}

Multitenant

Multitenancy is the idea of having different organisations (e.g. companies) that have users, where from a security point of view they are siloed, so you can only see data within your own company.

There are two ways to do this: Firstly, you can use a tenantId field on each record to distinguish them, or secondly create subcollections within a tenant collection.

In my experience, for maximum future flexibility and ease of client programming, I recommend adding a tenantId field and avoiding subcollections, because they make calling code more complicated.

Here is an example of changing the user rules from the Social Network example to allow reading only for users in the same company:

match /users/{uid} {
    //...
    allow read: if request.auth != null &&
        get(/databases/$(database)/documents/users/$(request.auth.uid)).data.companyId == resource.data.companyId
}

Note that something needs to set the companyId initially, so probably you would want that to be trusted code, for example running in a function.

Subcollection matching

The syntax for matching a subcollection item is as follows, giving you access to the IDs of both the parent and child documents:

match /collection/{collectionId}/subcollection/{subCollectionId} {
    //...
}

That said, my experience with subcollections leads me to think you should default to single-level collections for everything.

To do this, reference the parent collection by having a parentId string field (there is no need for a reference field; they are tricky to work with inside rules, and I don’t see the benefit of them over a string).

It makes client code much simpler, while (from what I see) offering no real disadvantage. Please comment if I am wrong about this!

Data Validations

With Firebase by default accepting any shape of data, you might want to validate that fields exist, don’t exist, or have certain types or constraints. Remember with rules you don’t get much error information if they fail, so you also want to check these on the client before sending any data for a good user experience.

For this I would split the validations out into a function, so you can call it from different rules. A validation may look something like this:


function validateEvent(event) {    
  return event.keys().hasAll(['title', 'description', 'imageName', 'dateAndTimeOfFirstRecurrance',
    'timeZone', 'recurrenceType', 'numberOfRecurrences', 'questions', 'invitees',
    'breakTimeMs', 'rules' ]) &&
    event.title is string &&
    event.description is string &&
    (event.imageName is string || event.imageName == null) &&
    event.dateAndTimeOfFirstRecurrance is number &&
    event.timeZone is string &&
    event.recurrenceType is string &&
    event.numberOfRecurrences is number &&
    event.questions is list &&
    event.invitees is list &&
    event.breakTimeMs is number &&
    event.rules is string &&
    event.title.size() > 0 &&
    event.title.size() < 100 &&
    event.description.size() > 0 &&
    event.description.size() < 10000 &&
    event.rules.size() < 1000 &&
    event.dateAndTimeOfFirstRecurrance > 0 &&
    event.timeZone.size() > 0 &&
    event.timeZone.size() < 1000 &&
    event.recurrenceType.size() > 0 &&
    event.recurrenceType.size() < 1000 &&
    event.questions.size() > 0 &&
    event.breakTimeMs >= 10000 &&
    event.breakTimeMs <= 3600000
//... etc ...
}

match /events/{e} {
//...
    allow create, update: if request.auth != null && 
        validateEvent(request.resource.data)
}

This is a living blog post!

I will add more examples to this as I encounter them, and I am sure I have forgotten a few. Watch this space.
