Feeds:
Posts

## We just published in Nature Geoscience

It turns out that when you submit a paper to a journal like Nature Geoscience “just in case, we have nothing to lose, they’ll probably reject it straight away”…sometimes you are unexpectedly successful.

Assorted media coverage:

More detailed post to come…

## My Research with Steve

Almost four years ago I took a job as a summer student of Dr. Steve Easterbrook, in the software engineering lab of the University of Toronto. This was my first time taking part in research, but also my first time living away from home and my first time using a Unix terminal (both of which are challenging, but immensely rewarding, life skills).

While working with Steve I discovered that climate model output is really pretty (an opinion which hasn’t changed in the four years since) and that climate models are really hard to install (that hasn’t changed either).

At Steve’s suggestion I got a hold of the code for various climate models and started to pick it apart. By the end of the summer I had created a series of standardised diagrams showing the software architecture of each model.

These diagrams proved to be really useful communication tools: we presented our work at AGU the following December, and at NCAR about a year after that, to very positive feedback. Many climate modellers we met at these conferences were pleased to have a software diagram of the model they used (which is very useful to show during presentations), but they were generally more interested in the diagrams for other models, to see how other research groups used different software structures to solve the same problems. “I had no idea they did it like that,” was a remark I heard more than a few times.

Between my undergrad and PhD, I went back to Toronto for a month where I analysed the model code more rigorously. We made a new set of diagrams which was more accurate: the code was sorted into components based on dependency structure, and the area of each component in a given diagram was exactly proportional to the line count of its source code.

Here is the diagram we made for the GFDL-ESM2M model, which is developed at the Geophysical Fluid Dynamics Laboratory in Princeton:

We wrote this all up into a paper, submitted it to GMD, and after several months and several rounds of revision it was published just yesterday! The paper is open access, and can be downloaded for free here. It’s my first paper as lead author which is pretty exciting.

I could go on about all the interesting things we discovered while comparing the diagrams, but that’s all in the paper. Instead I wanted to talk about what’s not in the paper: the story of the long and winding journey we took to get there, from my first day as a nervous summer student in Toronto to the final publication yesterday. These are the stories you don’t read about in scientific papers, which out of necessity detail the methodology as if the authors knew exactly where they were going and got there using the shortest possible path. Science doesn’t often work like that. Science is about messing around and exploring and getting a bit lost and eventually figuring it out and feeling like a superhero when you do. And then writing it up as if it was easy.

I also wanted to express my gratitude to Steve, who has been an amazing source of support, advice, conversations, book recommendations, introductions to scientists, and career advice. I’m so happy that I got to be your student. See you before long on one continent or another!

## A Simple Stochastic Climate Model: Climate Sensitivity

Last time I derived the following ODE for temperature T at time t:



where S and τ are constants, and F(t) is the net radiative forcing at time t. Eventually I will discuss each of these terms in detail; this post will focus on S.

At equilibrium, when dT/dt = 0, the ODE necessitates T(t) = S F(t). A physical interpretation for S becomes apparent: it measures the equilibrium change in temperature per unit forcing, also known as climate sensitivity.

A great deal of research has been conducted with the aim of quantifying climate sensitivity, through paleoclimate analyses, modelling experiments, and instrumental data. Overall, these assessments show that climate sensitivity is on the order of 3 K per doubling of CO2 (divide by 5.35 ln 2 W/m2 to convert to warming per unit forcing).

The IPCC AR4 report (note that AR5 was not yet published at the time of my calculations) compared many different probability distribution functions (PDFs) of climate sensitivity, shown below. They follow the same general shape of a shifted distribution with a long tail to the right, and average 5-95% confidence intervals of around 1.5 to 7 K per doubling of CO2.

Box 10.2, Figure 1 of the IPCC AR4 WG1: Probability distribution functions of climate sensitivity (a), 5-95% confidence intervals (b).

These PDFs generally consist of discrete data points that are not publicly available. Consequently, sampling from any existing PDF would be difficult. Instead, I chose to create my own PDF of climate sensitivity, modelled as a log-normal distribution (e raised to the power of a normal distribution) with the same shape and bounds as the existing datasets.

The challenge was to find values for μ and σ, the mean and standard deviation of the corresponding normal distribution, such that for any z sampled from the log-normal distribution,





Since erf, the error function, cannot be evaluated analytically, this two-parameter problem must be solved numerically. I built a simple particle swarm optimizer to find the solution, which consistently yielded results of μ = 1.1757, σ = 0.4683.

The upper tail of a log-normal distribution is unbounded, so I truncated the distribution at 10 K, consistent with existing PDFs (see figure above). At the beginning of each simulation, climate sensitivity in my model is sampled from this distribution and held fixed for the entire run. A histogram of 106 sampled points, shown below, has the desired characteristics.

Histogram of 106 points sampled from the log-normal distribution used for climate sensitivity in the model.

Note that in order to be used in the ODE, the sampled points must then be converted to units of Km2/W (warming per unit forcing) by dividing by 5.35 ln 2 W/m2, the forcing from doubled CO2.

## A Simple Stochastic Climate Model: Deriving the Backbone

Last time I introduced the concept of a simple climate model which uses stochastic techniques to simulate uncertainty in our knowledge of the climate system. Here I will derive the backbone of this model, an ODE describing the response of global temperature to net radiative forcing. This derivation is based on unpublished work by Nathan Urban – many thanks!

In reality, the climate system should be modelled not as a single ODE, but as a coupled system of hundreds of PDEs in four dimensions. Such a task is about as arduous as numerical science can get, but dozens of research groups around the world have built GCMs (General Circulation Models, or Global Climate Models, depending on who you talk to) which come quite close to this ideal.

Each GCM has taken hundreds of person-years to develop, and I only had eight weeks. So for the purposes of this project, I treat the Earth as a spatially uniform body with a single temperature. This is clearly a huge simplification but I decided it was necessary.

Let’s start by defining T1(t) to be the absolute temperature of this spatially uniform Earth at time t, and let its heat capacity be C. Therefore,

$C \: T_1(t) = E$

where E is the change in energy required to warm the Earth from 0 K to temperature T1. Taking the time derivative of both sides,

$C \: \frac{dT_1}{dt} = \frac{dE}{dt}$

Now, divide through by A, the surface area of the Earth:

$c \: \frac{dT_1}{dt} = \frac{1}{A} \frac{dE}{dt}$

where c = C/A is the heat capacity per unit area. Note that the right side of the equation, a change in energy per unit time per unit area, has units of W/m2. We can express this as the difference of incoming and outgoing radiative fluxes, I(t) and O(t) respectively:

$c \: \frac{dT_1}{dt} = I(t)- O(t)$

By the Stefan-Boltzmann Law,

$c \: \frac{dT_1}{dt} = I(t) - \epsilon \sigma T_1(t)^4$

where ϵ is the emissivity of the Earth and σ is the Stefan-Boltzmann constant.

To consider the effect of a change in temperature, suppose that T1(t) = T0 + T(t), where T0 is an initial equilibrium temperature and T(t) is a temperature anomaly. Substituting into the equation,

$c \: \frac{d(T_0 + T(t))}{dt} = I(t) - \epsilon \sigma (T_0 + T(t))^4$

Noting that T0 is a constant, and also factoring the right side,

$c \: \frac{dT}{dt} = I(t) - \epsilon \sigma T_0^4 (1 + \tfrac{T(t)}{T_0})^4$

Since the absolute temperature of the Earth is around 280 K, and we are interested in perturbations of around 5 K, we can assume that T(t)/T0 ≪ 1. So we can linearize (1 + T(t)/T0)4 using a Taylor expansion about T(t) = 0:

$c \: \frac{dT}{dt} = I(t) - \epsilon \sigma T_0^4 (1 + 4 \tfrac{T(t)}{T_0} + O[(\tfrac{T(t)}{T_0})^2])$

$\approx I(t) - \epsilon \sigma T_0^4 (1 + 4 \tfrac{T(t)}{T_0})$

$= I(t) - \epsilon \sigma T_0^4 - 4 \epsilon \sigma T_0^3 T(t)$

Next, let O0 = ϵσT04 be the initial outgoing flux. So,

$c \: \frac{dT}{dt} = I(t) - O_0 - 4 \epsilon \sigma T_0^3 T(t)$

Let F(t) = I(t) – O0 be the radiative forcing at time t. Making this substitution as well as dividing by c, we have

$\frac{dT}{dt} = \frac{F(t) - 4 \epsilon \sigma T_0^3 T(t)}{c}$

Dividing each term by 4ϵσT03 and rearranging the numerator,

$\frac{dT}{dt} = - \frac{T(t) - \tfrac{1}{4 \epsilon \sigma T_0^3} F(t)}{\tfrac{c}{4 \epsilon \sigma T_0^3}}$

Finally, let S = 1/(4ϵσT03) and τ = cS. Our final equation is

$\frac{dT}{dt} = - \frac{T(t) - S F(t)}{\tau}$

While S depends on the initial temperature T0, all of the model runs for this project begin in the preindustrial period when global temperature is approximately constant. Therefore, we can treat S as a parameter independent of initial conditions. As I will show in the next post, the uncertainty in S based on climate system dynamics far overwhelms any error we might introduce by disregarding T0.

## A Simple Stochastic Climate Model: Introduction

This winter I took a course in computational physics, which has probably been my favourite undergraduate course to date. Essentially it was an advanced numerical methods course, but from a very practical point of view. We got a lot of practice using numerical techniques to solve realistic problems, rather than just analysing error estimates and proving conditions of convergence. As a math student I found this refreshing, and incredibly useful for my research career.

We all had to complete a term project of our choice, and I decided to build a small climate model. I was particularly interested in the stochastic techniques taught in the course, and given that modern GCMs and EMICs are almost entirely deterministic, it was possible that I could contribute something original to the field.

The basic premise of my model is this: All anthropogenic forcings are deterministic, and chosen by the user. Everything else is determined stochastically: parameters such as climate sensitivity are sampled from probability distributions, whereas natural forcings are randomly generated but follow the same general pattern that exists in observations. The idea is to run this model with the same anthropogenic input hundreds of times and build up a probability distribution of future temperature trajectories. The spread in possible scenarios is entirely due to uncertainty in the natural processes involved.

This approach mimics the real world, because the only part of the climate system we have full control over is our own actions. Other influences on climate are out of our control, sometimes poorly understood, and often unpredictable. It is just begging to be modelled as a stochastic system. (Not that it is actually stochastic, of course; in fact, I understand that nothing is truly stochastic, even random number generators – unless you can find a counterexample using quantum mechanics? But that’s a discussion for another time.)

A word of caution: I built this model in about eight weeks. As such, it is highly simplified and leaves out a lot of processes. You should never ever use it for real climate projections. This project is purely an exercise in numerical methods, and an exploration of the possible role of stochastic techniques in climate modelling.

Over the coming weeks, I will write a series of posts that explains each component of my simple stochastic climate model in detail. I will show the results from some sample simulations, and discuss how one might apply these stochastic techniques to existing GCMs. I also plan to make the code available to anyone who’s interested – it’s written in Matlab, although I might translate it to a free language like Python, partly because I need an excuse to finally learn Python.

I am very excited to finally share this project with you all! Check back soon for the next installment.

## Milestones

You may have already heard that carbon dioxide concentrations have surpassed 400 ppm. The most famous monitoring station, Mauna Loa Observatory in Hawaii, reached this value on May 9th. Due to the seasonal cycle, CO2 levels began to decline almost immediately thereafter, but next year they will easily blow past 400 ppm.

Of course, this milestone is largely arbitrary. There’s nothing inherently special about 400 ppm. But it’s a good reminder that while we were arguing about taxation, CO2 levels continued to quietly tick up and up.

In happier news, John Cook and others have just published the most exhaustive survey of the peer-reviewed climate literature to date. Read the paper here (open access), and a detailed but accessible summary here. Unsurprisingly, they found the same 97% consensus that has come up over and over again.

Cook et al read the abstracts of nearly 12 000 papers published between 1991 and 2011 – every single hit from the ISI Web of Science with the keywords “global climate change” or “global warming”. Several different people categorized each abstract, and the authors were contacted whenever possible to categorize their own papers. Using several different methods like this makes the results more reliable.

Around two-thirds of the studies, particularly the more recent ones, didn’t mention the cause of climate change. This is unsurprising, since human-caused warming has been common knowledge in the field for years. Similarly, seismology papers don’t usually mention that plate tectonics cause earthquakes, particularly in the abstracts where space is limited.

Among the papers which did express a position, 97.1% said climate change was human-caused. Again, unsurprising to anyone working in the field, but it’s news to many members of the public. The study has been widely covered in the mainstream media – everywhere from The Guardian to The Australian – and even President Obama’s Twitter feed.

Congratulations are also due to Andrew Weaver, my supervisor from last summer, who has just been elected to the British Columbia provincial legislature. He is not only the first-ever Green Party MLA in BC’s history, but also (as far as I know) the first-ever climate scientist to hold public office.

Governments the world over are sorely in need of officials who actually understand the problem of climate change. Nobody fits this description better than Andrew, and I think he is going to be great. The large margin by which he won also indicates that public support for climate action is perhaps higher than we thought.

Finally, my second publication came out this week in Climate of the Past. It describes an EMIC intercomparison project the UVic lab conducted for the next IPCC report, which I helped out with while I was there. The project was so large that we split the results into two papers (the second of which is in press in Journal of Climate). This paper covers the historical experiments – comparing model results from 850-2005 to observations and proxy reconstructions – as well as some idealized experiments designed to measure metrics such as climate sensitivity, transient climate response, and carbon cycle feedbacks.

## A Visit to NCAR

Last week I was lucky enough to attend the Second Workshop on Coupling Technologies for Earth System Models, held at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado, USA. I was excited just to visit NCAR, which is one of the top climate research facilities in the world. Not only is it packed full of interesting scientists and great museum displays, but it’s nestled in the Rocky Mountains and so the view from the conference room looks like this:

Many of the visitors would spend large portions of the coffee breaks just staring out the window…

The conference was focused on couplers – the part of a climate model that ties all the other components (atmosphere, ocean, land, etc.) together. However, the presentations covered (as Rob Jacob put it) “everything that physical scientists don’t care about unless it stops working”. Since I consider myself a physical scientist, this included a lot of concepts I hadn’t thought about before:

• Parallel processing: Since climate models are so big, it makes sense to multitask by splitting the work over many computer processors. You have to allocate the right number of processors to each component, though: if the atmosphere has too many processors, it will finish its timestep too quickly and sit there waiting until the ocean is done, and vice versa. This is called load balancing, and it gets very tricky as soon as the number of components exceeds two.
• Scalability: The more processors you use, the faster the model runs, but the speed has diminishing returns. If you double the number of processors, you won’t quite double the speed, particularly if the number of processors exceeds 104 (a setup which is becoming increasingly affordable for large research groups). Historically, the coupler has not been a code bottleneck (limiting factor for model speed), but as the number of processors gets very large, that scenario is changing. We have to figure out the most efficient way to couple many small components together, so that climate model speed can continue to keep up with advances in computer hardware.
• Standardization: Modelling groups across the world are communicating with each other more and more, and using each other’s code. Currently this requires a lot of modifications, because every climate model has a different structure. Everyone seems to agree that it would be great to have a standard interface that allowed you to plug any combination of components together, but of course everyone has a different idea of what that standard should be.
• Fortran is still the best language for climate models, believe it or not, because it is the fastest language for the kinds of operations required. If a modern, accessible language like Python could compete based on speed, you can bet that new climate models like MPAS would use it.

I was at the conference with Steve Easterbrook and his new M.Sc. student Daniel Levy, presenting our bubble diagrams of model architecture. (If you haven’t already, read my AGU poster schpiel first, or none of this will make sense!) As interesting and useful as these diagrams are, there were some flaws in our original analysis:

1. We didn’t use preprocessed code, meaning that each “model” is actually the code base for many different model configurations. So our estimate of model complexity based on line count is biased towards models which are very configurable, but might not actually be very complex. We can fix this by choosing specific configurations of each model (for consistency, the configuration used in CMIP5 or the equivalent EMIC AR5 intercomparison project) and obtaining preprocessed code from the corresponding institutions.
2. We sorted the code into components (eg atmosphere) and sub-components (eg atmospheric aerosols) based on folder structure, which might not reflect the hierarchy of routines formed at runtime. Some modelling groups keep their files very organized, but often code from different parts of the model was mixed together, and separating it out was very much a judgement call. To fix this, we can sort based on the dependency structure (a massive tree graph showing which routines call which): all the descendants of the atmosphere driver are part of the atmosphere component, and so on.
3. We made our diagrams in Microsoft PowerPoint, which is quite limited, and didn’t allow us to size the bubbles so their area was perfectly proportional to line count. Instead, we just had to eyeball it. We can fix this by using Adobe Illustrator, which is much more advanced and has this capability.

So far, we’ve repeated the analysis for the UK Met Office Model, version HadGEM2-ES. I created the dependency structure by going manually through every file and making good use of grep, which took hours and hours (although it was a nice, menial way to avoid studying for my courses!). Daniel is going to write a Fortran parser to make the job easier next time around. In the meantime, our HadGEM2-ES diagram is absolutely gorgeous and wonderfully accurate:

I will post future diagrams as they become available. We think the main use of these diagrams will be as communication tools between scientists, so they are free to use with attribution.

Just a few more weeks of classes, then I can enjoy some full-time research. Now that I’ve had a taste of being a proper scientist, it’s hard to go back!