How do climate models work?

Also published at Skeptical Science

This is a climate model:

T = [(1-α)S/(4εσ)]^(1/4)

(T is temperature, α is the albedo, S is the incoming solar radiation, ε is the emissivity, and σ is the Stefan-Boltzmann constant)

An extremely simplified climate model, that is. It’s one line long, and is at the heart of every computer model of global warming. Using basic thermodynamics, it calculates the temperature of the Earth based on incoming sunlight and the reflectivity of the surface. The model is zero-dimensional, treating the Earth as a point mass at a fixed time. It doesn’t consider the greenhouse effect, ocean currents, nutrient cycles, volcanoes, or pollution.
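To make that concrete, here is a minimal Python sketch of the calculation, under the usual textbook assumptions (a solar constant of about 1361 W/m² and an albedo of 0.3, neither of which comes from this post):

SIGMA = 5.670e-8    # Stefan-Boltzmann constant, W m^-2 K^-4
S = 1361.0          # incoming solar radiation (solar constant), W m^-2
ALPHA = 0.3         # planetary albedo: the fraction of sunlight reflected
EPSILON = 1.0       # emissivity of 1 means a perfect blackbody, i.e. no greenhouse effect

# The one-line model: balance absorbed sunlight against emitted radiation.
T = ((1 - ALPHA) * S / (4 * EPSILON * SIGMA)) ** 0.25
print(f"{T:.1f} K, or {T - 273.15:.1f} °C")   # about 255 K

The answer comes out around 255 K, roughly 33 degrees colder than the observed global average of about 288 K; that gap is essentially the greenhouse effect the one-line model leaves out.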

If you fix these deficiencies, the model becomes more and more complex. You have to derive many variables from physical laws, and use empirical data to approximate certain values. You have to repeat the calculations over and over for different parts of the Earth. Eventually the model is too complex to solve using pencil, paper and a pocket calculator. It’s necessary to program the equations into a computer, and that’s what climate scientists have been doing ever since computers were invented.

A pixellated Earth

Today’s most sophisticated climate models are called GCMs, which stands for General Circulation Model or Global Climate Model, depending on who you talk to. On average, they are about 500 000 lines of computer code long, and mainly written in Fortran, a scientific programming language. Despite the huge jump in complexity, GCMs have much in common with the one-line climate model above: they’re just a lot of basic physics equations put together.

Computers are great for doing a lot of calculations very quickly, but they have a disadvantage: computers are discrete, while the real world is continuous. To understand the term “discrete”, think about a digital photo. It’s composed of a finite number of pixels, which you can see if you zoom in far enough. The existence of these indivisible pixels, with clear boundaries between them, makes digital photos discrete. But the real world doesn’t work this way. If you look at the subject of your photo with your own eyes, it’s not pixellated, no matter how close you get – even if you look at it through a microscope. The real world is continuous (unless you’re working at the quantum level!).

Similarly, the Earth isn’t actually split up into three-dimensional cells (you can think of them as cubes, even though they’re usually wedge-shaped) where every climate variable – temperature, pressure, precipitation, clouds – is exactly the same everywhere in that cell. Unfortunately, that’s how scientists have to represent the world in climate models, because that’s the only way computers work. The same strategy is used for the fourth dimension, time, with discrete “timesteps” in the model, indicating how often calculations are repeated.

It would be fine if the cells could be really tiny – like a high-resolution digital photo that looks continuous even though it’s discrete – but doing calculations on cells that small would take so much computer power that the model would run slower than real time. As it is, the cubes are on the order of 100 km wide in most GCMs, and timesteps are on the order of hours to minutes, depending on the calculation. That might seem huge, but it’s about as good as you can get on today’s supercomputers. Remember that doubling the resolution of the model won’t just double the running time – instead, the running time will increase by a factor of sixteen (one doubling for each dimension).
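As a back-of-the-envelope sketch of that scaling (the cell counts, vertical levels, and timesteps below are order-of-magnitude guesses, not the settings of any particular GCM):

EARTH_SURFACE_KM2 = 5.1e8   # approximate surface area of the Earth

def work_per_simulated_day(cell_width_km, vertical_levels, timestep_minutes):
    """Very rough count of cell-updates needed to simulate one day of climate."""
    horizontal_cells = EARTH_SURFACE_KM2 / cell_width_km**2
    timesteps = 24 * 60 / timestep_minutes
    return horizontal_cells * vertical_levels * timesteps

coarse = work_per_simulated_day(cell_width_km=100, vertical_levels=30, timestep_minutes=30)
fine = work_per_simulated_day(cell_width_km=50, vertical_levels=60, timestep_minutes=15)
print(f"Doubling the resolution in every dimension means {fine / coarse:.0f}x the work")   # 16x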

Despite the seemingly enormous computer power available to us today, GCMs have always been limited by it. In fact, early computers were developed, in large part, to facilitate atmospheric models for weather and climate prediction.

Cracking the code

A climate model is actually a collection of models – typically an atmosphere model, an ocean model, a land model, and a sea ice model. Some GCMs split up the sub-models (let’s call them components) a bit differently, but that’s the most common arrangement.

Each component represents a staggering number of complex, specialized processes. Here are just a few examples from the Community Earth System Model, developed at the National Center for Atmospheric Research in Boulder, Colorado:

  • Atmosphere: sea salt suspended in the air, three-dimensional wind velocity, the wavelengths of incoming sunlight
  • Ocean: phytoplankton, the iron cycle, the movement of tides
  • Land: soil hydrology, forest fires, air conditioning in cities
  • Sea Ice: pollution trapped within the ice, melt ponds, the age of different parts of the ice

Each component is developed independently, and as a result, they are highly encapsulated (bundled separately in the source code). However, the real world is not encapsulated – the land and ocean and air are very interconnected. Some central code is necessary to tie everything together. This piece of code is called the coupler, and it has two main purposes:

  1. Pass data between the components. This can get complicated if the components don’t all use the same grid (system of splitting the Earth up into cells).
  2. Control the main loop, or “time stepping loop”, which tells the components to perform their calculations in a certain order, once per time step (a toy sketch of such a loop follows this list).
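To give a flavour of what that looks like, here is a toy sketch of a coupler's time-stepping loop. The component classes, field names, and values are invented for this illustration, and a real coupler would also have to regrid the data it passes between components:

class Atmosphere:
    def step(self, sea_surface_temperature):
        # ...solve the atmospheric equations for one time step...
        return {"surface_wind": 5.0, "precipitation": 1.2}

class Ocean:
    def step(self, surface_wind):
        # ...solve the ocean equations for one time step...
        return {"sea_surface_temperature": 288.0}

def run(atmosphere, ocean, n_steps):
    ocean_fields = {"sea_surface_temperature": 288.0}   # initial state
    for step in range(n_steps):
        # Purpose 1: pass data between components (regridding omitted here).
        # Purpose 2: advance each component one time step, in a fixed order.
        atmosphere_fields = atmosphere.step(ocean_fields["sea_surface_temperature"])
        ocean_fields = ocean.step(atmosphere_fields["surface_wind"])

run(Atmosphere(), Ocean(), n_steps=48)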

For example, take a look at the IPSL (Institut Pierre Simon Laplace) climate model architecture. In the diagram below, each bubble represents an encapsulated piece of code, and the number of lines in this code is roughly proportional to the bubble’s area. Arrows represent data transfer, and the colour of each arrow shows where the data originated:

We can see that IPSL’s major components are atmosphere, land, and ocean (which also contains sea ice). The atmosphere is the most complex model, and land is the least. While both the atmosphere and the ocean use the coupler for data transfer, the land model does not – it’s simpler just to connect it directly to the atmosphere, since it uses the same grid, and doesn’t have to share much data with any other component. Land-ocean interactions are limited to surface runoff and coastal erosion, which are passed through the atmosphere in this model.

You can see diagrams like this for seven different GCMs, as well as a comparison of their different approaches to software architecture, in this summary of my research.

Show time

When it’s time to run the model, you might expect that scientists initialize the components with data collected from the real world. Actually, it’s more convenient to “spin up” the model: start with a dark, stationary Earth, turn the Sun on, start the Earth spinning, and wait until the atmosphere and ocean settle down into equilibrium. The resulting data fits perfectly into the cells and matches up nicely with observations: it falls within the bounds of the real climate, and could easily pass for real weather.
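In miniature, a spin-up of the one-line model from the top of this post might look something like the sketch below; the heat capacity and starting temperature are arbitrary illustration values:

SIGMA = 5.670e-8        # Stefan-Boltzmann constant, W m^-2 K^-4
S, ALPHA = 1361.0, 0.3  # solar constant and albedo, as before
HEAT_CAPACITY = 1.0e8   # J m^-2 K^-1, roughly an ocean mixed layer
DT = 86400.0            # one-day time step, in seconds

temperature = 3.0       # start with a dark, nearly absolute-zero Earth
for day in range(200 * 365):                # give it a couple of centuries
    imbalance = (1 - ALPHA) * S / 4 - SIGMA * temperature**4   # W m^-2
    temperature += imbalance * DT / HEAT_CAPACITY
    if abs(imbalance) < 1e-3:               # effectively in equilibrium
        print(f"Settled at {temperature:.1f} K after {day} days")
        break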

Scientists feed input files into the model, which contain the values of certain parameters, particularly agents that can cause climate change. These include the concentration of greenhouse gases, the intensity of sunlight, the amount of deforestation, and volcanoes that should erupt during the simulation. It’s also possible to give the model a different map to change the arrangement of continents. Through these input files, it’s possible to recreate the climate from just about any period of the Earth’s lifespan: the Jurassic Period, the last Ice Age, the present day…and even what the future might look like, depending on what we do (or don’t do) about global warming.
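As a purely hypothetical illustration (real GCMs read Fortran namelists and gridded forcing datasets, and none of the parameter names below come from an actual model), the contents of such input files boil down to something like this:

preindustrial_run = {
    "co2_ppm": 280.0,            # greenhouse gas concentration
    "solar_constant": 1361.0,    # intensity of sunlight, W m^-2
    "deforested_fraction": 0.0,  # amount of deforestation
    "volcanic_eruptions": [],    # no eruptions prescribed for this run
    "start_year": 1850,
    "end_year": 1900,
}

# The same model, pointed at a hypothetical high-emissions future instead.
high_emissions_run = dict(preindustrial_run,
                          co2_ppm=900.0,           # an illustrative end-of-century value
                          deforested_fraction=0.2,
                          start_year=2070,
                          end_year=2100)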

The highest resolution GCMs, on the fastest supercomputers, can simulate about 1 year for every day of real time. If you’re willing to sacrifice some complexity and go down to a lower resolution, you can speed things up considerably, and simulate millennia of climate change in a reasonable amount of time. For this reason, it’s useful to have a hierarchy of climate models with varying degrees of complexity.

As the model runs, every cell outputs the values of different variables (such as atmospheric pressure, ocean salinity, or forest cover) into a file, once per time step. The model can average these variables based on space and time, and calculate changes in the data. When the model is finished running, visualization software converts the rows and columns of numbers into more digestible maps and graphs – for example, maps of temperature change over the next century, depending on how many greenhouse gases we emit.
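Here is a hedged sketch of the averaging step, using a random array as a stand-in for real output (which would normally live in netCDF files and be handled by dedicated tools). The one genuine subtlety it shows is that grid cells shrink toward the poles, so a global mean has to be weighted by the cosine of latitude:

import numpy as np

n_time, n_lat, n_lon = 120, 90, 180   # e.g. ten years of monthly output on a 2-degree grid
temperature = 288.0 + np.random.randn(n_time, n_lat, n_lon)   # fake data, in kelvin

# Grid cells shrink toward the poles, so weight each latitude band by cos(latitude).
lat = np.linspace(-89, 89, n_lat)
weights = np.cos(np.deg2rad(lat))[None, :, None]

global_mean = (temperature * weights).sum(axis=(1, 2)) / (weights.sum() * n_lon)
annual_mean = global_mean.reshape(-1, 12).mean(axis=1)   # collapse months into years
print(annual_mean)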

Predicting the past

So how do we know the models are working? Should we trust the predictions they make for the future? It’s not reasonable to wait for a hundred years to see if the predictions come true, so scientists have come up with a different test: tell the models to predict the past. For example, give the model the observed conditions of the year 1900, run it forward to 2000, and see if the climate it recreates matches up with observations from the real world.

This 20th-century run is one of many standard tests to verify that a GCM can accurately mimic the real world. It’s also common to recreate the last ice age, and compare the output to data from ice cores. While GCMs can travel even further back in time – for example, to recreate the climate that dinosaurs experienced – proxy data is so sparse and uncertain that you can’t really test these simulations. In fact, much of the scientific knowledge about pre-Ice Age climates actually comes from models!
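To give a sense of what "matching up with observations" means in practice, a hindcast can be scored against an observed record of the same period. Both series in the sketch below are synthetic stand-ins for 20th-century global temperature anomalies; a real comparison would use an observational dataset:

import numpy as np

years = np.arange(1900, 2001)
observed = 0.007 * (years - 1900) + 0.1 * np.random.randn(years.size)   # fake observations
modelled = 0.007 * (years - 1900) + 0.1 * np.random.randn(years.size)   # fake hindcast

rmse = np.sqrt(np.mean((modelled - observed) ** 2))
correlation = np.corrcoef(modelled, observed)[0, 1]
print(f"RMSE: {rmse:.2f} °C, correlation: {correlation:.2f}")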

Climate models aren’t perfect, but they are doing remarkably well. They pass the tests of predicting the past, and go even further. For example, scientists don’t know what causes El Niño, a phenomenon in the Pacific Ocean that affects weather worldwide. There are some hypotheses on what oceanic conditions can lead to an El Niño event, but nobody knows what the actual trigger is. Consequently, there’s no way to program El Niños into a GCM. But they show up anyway – the models spontaneously generate their own El Niños, somehow using the basic principles of fluid dynamics to simulate a phenomenon that remains fundamentally mysterious to us.

In some areas, the models are having trouble. Certain wind currents are notoriously difficult to simulate, and calculating regional climates requires an unaffordably high resolution. Phenomena that scientists can’t yet quantify, like the processes by which glaciers melt, or the self-reinforcing cycles of thawing permafrost, are also poorly represented. However, not knowing everything about the climate doesn’t mean scientists know nothing. Incomplete knowledge does not imply nonexistent knowledge – you don’t need to understand calculus to be able to say with confidence that 9 x 3 = 27.

Also, history has shown us that when climate models make mistakes, they tend to be too stable, and underestimate the potential for abrupt changes. Take the Arctic sea ice: just a few years ago, GCMs were predicting it would completely melt around 2100. Now, the estimate has been revised to 2030, as the ice melts faster than anyone anticipated.

Answering the big questions

At the end of the day, GCMs are the best prediction tools we have. If they all agree on an outcome, it would be silly to bet against them. However, the big questions, like “Is human activity warming the planet?”, don’t even require a model. The only things you need to answer those questions are a few fundamental physics and chemistry equations that we’ve known for over a century.
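To make that point concrete, here is one common back-of-the-envelope estimate. The logarithmic forcing formula and the roughly 0.8 K per W/m² sensitivity parameter are standard approximate values from the literature (the sensitivity in particular carries real uncertainty), not results from this post:

import math

# Simplified expression for the radiative forcing from a change in CO2:
# dF = 5.35 * ln(C / C0), in W m^-2.
C0, C = 280.0, 560.0                  # pre-industrial CO2 and a doubling of it, in ppm
forcing = 5.35 * math.log(C / C0)     # about 3.7 W m^-2
warming = 0.8 * forcing               # about 3 K of eventual warming
print(f"Forcing: {forcing:.1f} W/m^2, equilibrium warming: {warming:.1f} K")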

You could take climate models right out of the picture, and the answer wouldn’t change. Scientists would still be telling us that the Earth is warming, humans are causing it, and the consequences will likely be severe – unless we take action to stop it.

Wrapping Up

My summer job as a research student of Steve Easterbrook is nearing an end. All of a sudden, I only have a few days left, and the weather is (thankfully) cooling down as autumn approaches. It feels like just a few weeks ago that this summer was beginning!

Over the past three months, I examined seven different GCMs from Canada, the United States, and Europe. Based on the source code, documentation, and correspondence with scientists, I uncovered the underlying architecture of each model. This was represented in a set of diagrams. You can view full-sized versions here:

The component bubbles are to scale (based on the size of the code base) within each model, but not between models. The size and complexity of each GCM varies greatly, as can be seen below. UVic is by far the least complex model – it is arguably closer to an EMIC (Earth system Model of Intermediate Complexity) than a full GCM.

I came across many insights while comparing GCM architectures, regarding how modular the components are, how extensively the coupler is used, and how complexity is distributed between components. I wrote some of these observations up into the poster I presented last week to the computer science department. My references can be seen here.

A big thanks to the scientists who answered questions about their work developing GCMs: Gavin Schmidt (Model E); Michael Eby (UVic); Tim Johns (HadGEM3); Arnaud Caubel, Marie-Alice Foujols, and Anne Cozic (IPSL); and Gary Strand (CESM). Additionally, Michael Eby from the University of Victoria was instrumental in improving the diagram design.

Although the summer is nearly over, our research certainly isn’t. I have started writing a more in-depth paper that Steve and I plan to develop during the year. We are also hoping to present our work at the upcoming AGU Fall Meeting, if our abstract gets accepted. Beyond this project, we are also looking at a potential experiment to run on CESM.

I guess I am sort of a scientist now. The line between “student” and “scientist” is blurry. I am taking classes, but also writing papers. Where does one end and the other begin? Regardless of where I am on the spectrum, I think I’m moving in the right direction. If this is what Doing Science means – investigating whatever little path interests me – I’m certainly enjoying it.

Working Away

The shape of my summer research is slowly becoming clearer. Basically, I’ll be writing a document comparing the architecture of different climate models. This, of course, involves getting access to the source code. Building on Steve’s list, here are my experiences:

NCAR, Community Earth System Model (CESM): Password-protected, but you can get access within an hour. After a quick registration, you’ll receive an automated email with a username and password. This login information gives you access to their Subversion repository. Registration links and further information are available here, under “Acquiring the CESM1.0 Release Code”.

University of Victoria, Earth System Climate Model (ESCM): Links to the source code can be found on this page, but they’re password-protected. You can request an account by sending an email – follow the link for more information.

Geophysical Fluid Dynamics Laboratory (GFDL), CM 2.1: Slightly more complicated. Create an account for their Gforge repository, which is an automated process. Then, request access to the MOM4P1 project – apparently CM 2.1 is included within that. The server is supposed to grant your request to a project automatically – but the only emails I’ve received from it regard some kind of GFDL mailing list, and don’t mention the project request. I will wait and see.
Update (July 20): It looks like I got access to the project right after I requested it – I just never received an email!

Max Planck Institute (MPI), COSMOS: Code access involves signing a licence agreement, faxing it to Germany, and waiting for it to be approved and signed by MPI. The agreement is not very restrictive, though – it deals mainly with version control, documenting changes to the code, etc.

UK Met Office, Hadley Centre Coupled Model version 3 (HadCM3): Our lab already has a copy of the code for HadCM3, so I’m not really sure what the process is to get access, but apparently it involved a lot of government paperwork.

Institut Pierre Simon Laplace (IPSL), CM5: This one tripped me up for a while, largely because the user guide is difficult to find, and written in French. Google Translate helped me out there, but it also attempted to “translate” their command line samples! Make sure that you have ksh installed, too – it’s quick to fix, but I didn’t realize it right away. Some of the components for IPSLCM5 are open access, but others are password-protected. Follow the user guide’s instructions for who to email to request access.

Model E: This was the easiest of all. From the GISS website, you can access all the source code without any registration. They offer a frozen AR4 version, as well as nightly snapshots of the work-in-progress for AR5 (a frozen AR5 version is soon to come). There is also a wealth of documentation on this site, such as an installation guide and a description of the model.

I’ve taken a look at the structural code for Model E, which is mostly contained in the file MODELE.f. The code is very clear and well commented, and the online documentation helped me out too. After drawing a lot of complicated diagrams with arrows and lists, I feel that I have a decent understanding of the Model E architecture.

Reading code can become monotonous, though, and every now and then I feel like tackling a little computer trouble to keep things interesting. For that reason, I’m continuing to chip away at building and running two models, Model E and CESM. See my previous post for how this process started.

<TECHNICAL COMPUTER STUFF> (Feel free to skip ahead…)

I was still having trouble viewing the Model E output (only one file worked in Panoply, the rest created an empty map), so I emailed some of the lab’s contacts at NASA. They suggested I install CDAT, a process which nearly broke Ubuntu (haven’t we all been there?). Basically, because it’s an older program, it thought the newest version of Python was 2.5 – which it subsequently installed and set as the default in /usr/bin. Since I had Python 2.6 installed, and the versions are apparently not at all backwards-compatible, every program that depended on Python (i.e. almost everything on Ubuntu) stopped working. Our IT contact managed to set 2.6 back as the default, but I’m not about to try my hand at CDAT again…

I have moved forward very slightly on CESM. I’ve managed to build the model, but upon calling test.<machine name>.run, I get rather an odd error:

./Tools/ccsm_getenv: line 9: syntax error near unexpected token '('
./Tools/ccsm_getenv: line 9: 'foreach i (env_case.xml env_run.xml env_conf.xml env_build.xml env_mach_pes.xml)'

Now, I’m pretty new at shell scripting, but I can’t see the syntax error there – and wouldn’t syntax errors appear at compile-time, rather than run-time?

A post by Michael Tobis, who had a similar error, suggested that the issue had to do with qsub. Unfortunately, that meant I had to actually use qsub – I had previously given up trying to configure Torque to run on a single machine rather than many. I gave the installation another go, and now I can get scripts into the queue, but they never start running – their status stays as “Q” even if I leave the computer alone for an hour. Since the machine has a dual-core processor, I can’t see why it couldn’t run both a server and a node at once, but it doesn’t seem to be working for me.

</TECHNICAL COMPUTER STUFF>

Before I started this job, climate models seemed analogous to Antarctica – a distant, mysterious, complex system that I wanted to visit, but didn’t know how to get to. In fact, they’re far more accessible than Antarctica. More on the scale of a complicated bus trip across town, perhaps?

They are not perfect pieces of software, and they’re not very user friendly. However, all the struggles of installation pay off when you finally get some output, and open it up, and see realistic data representing the very same planet you’re sitting on! Even just reading the code for different models shows you many different ways to look at the same system – for example, is sea ice a realm of its own, or is it a subset of the ocean? In the real world the lines are blurry, but computation requires us to make clear divisions.

The code can be unintelligible (lndmaxjovrdmdni) or familiar (“The Stefan-Boltzmann constant! Finally I recognize something!”) or even entertaining (a seemingly random identification string, dozens of characters long, followed by the comment if you edit this you will get what you deserve). When you get tied up in the code, though, it’s easy to miss the bigger picture: the incredible fact that we can use the sterile, binary practice of computation to represent a system as messy and mysterious as the whole planet. Isn’t that something worth sitting and marveling over?