Modularity

I’ve now taken a look at the code and structure of four different climate models: Model E, CESM, UVic ESCM, and the Met Office Unified Model (which contains all the Hadley models). I’m noticing all sorts of similarities and differences, many of which I didn’t expect.

For example, I didn’t anticipate any overlap in climate model components. I thought that every modelling group would build their own ocean, their own atmosphere, and so on, from scratch. In fact, what I think of as a “model” – a self-contained, independent piece of software – fits the individual components better than it fits an Earth system model. The latter is more accurately described as a collection of models, each representing one piece of the climate system. Each modelling group has a different collection of models, but not every one of these models is unique to their lab.

Ocean models are a particularly good example. The Modular Ocean Model (MOM) is built by GFDL, but it’s also used in NASA’s Model E and the UVic Earth System Climate Model. Another popular ocean model is the Nucleus for European Modelling of the Ocean (NEMO, what a great acronym), which is used by the newer Hadley climate models, as well as the IPSL model from France (which is sitting on my desktop as my next project!).

Aside: Speaking of clever acronyms, I don’t know what the folks at NCAR were thinking when they created the Single Column Atmosphere Model. Really, how did they not see their mistake? And why haven’t Marc Morano et al. latched onto this acronym and spread it all over the web by now?

In most cases, an Earth system model has a unique architecture to fit all the component models together – a different coupling process. However, with the rise of standard interfaces like the Earth System Modeling Framework, even couplers can be reused between modelling groups. For example, the Hadley Centre and IPSL both use the OASIS coupler.
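
To make the idea of a shared coupling interface a little more concrete, here is a minimal Python sketch. It is a toy, and it does not reflect how ESMF or OASIS actually work: the Component interface, the ToyOcean and ToyAtmosphere classes, the field names, and all the numbers are made up for illustration. The point is only that once every component agrees on the same interface, the coupler doesn’t need to know which lab wrote which component.

```python
from typing import Dict, List, Protocol

# Fields exchanged between components, keyed by name (hypothetical names).
Fields = Dict[str, float]


class Component(Protocol):
    """Hypothetical interface that every component model agrees to implement."""

    name: str

    def step(self, inputs: Fields) -> Fields:
        """Advance one time step and return the fields this component exports."""
        ...


class ToyOcean:
    name = "ocean"

    def __init__(self) -> None:
        self.sst = 15.0  # sea surface temperature in deg C (made-up value)

    def step(self, inputs: Fields) -> Fields:
        # Relax the sea surface toward the air temperature, if one was supplied.
        air = inputs.get("air_temperature", self.sst)
        self.sst += 0.1 * (air - self.sst)
        return {"sea_surface_temperature": self.sst}


class ToyAtmosphere:
    name = "atmosphere"

    def __init__(self) -> None:
        self.air = 10.0  # near-surface air temperature in deg C (made-up value)

    def step(self, inputs: Fields) -> Fields:
        # Relax the air toward the sea surface temperature, if one was supplied.
        sst = inputs.get("sea_surface_temperature", self.air)
        self.air += 0.5 * (sst - self.air)
        return {"air_temperature": self.air}


class Coupler:
    """Passes fields between components; knows nothing about their internals."""

    def __init__(self, components: List[Component]) -> None:
        self.components = components
        self.fields: Fields = {}

    def run(self, n_steps: int) -> Fields:
        for _ in range(n_steps):
            for component in self.components:
                # Each component sees the latest fields and adds its own exports.
                self.fields.update(component.step(self.fields))
        return self.fields


if __name__ == "__main__":
    coupler = Coupler([ToyOcean(), ToyAtmosphere()])
    print(coupler.run(n_steps=3))
```

In this sketch, swapping in a different ocean means writing another class with the same step method and handing it to the Coupler. Nothing else changes, which is roughly the appeal of a shared coupler.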

There are benefits and drawbacks to the rising overlap and “modularity” of Earth system models. One could argue that it makes the models less independent. If they all agree closely, how much of that agreement is due to their physical grounding in reality, and how much is due to the fact that they all use a lot of the same code? However, modularity is clearly a more efficient process for model development. It allows larger communities of scientists from each sub-discipline of Earth system modelling to form, and – in the case of MOM and NEMO – make two or three really good ocean models, instead of a dozen mediocre ones. Concentrating our effort, and reducing unnecessary duplication of code, makes modularity an attractive strategy, if an imperfect one.

The least modular of all the Earth system models I’ve looked at is Model E. The documentation mentions different components for the atmosphere, sea ice, and so on, but these components aren’t separated into subdirectories, and the lines between them are blurry. Nearly all the Fortran files sit in the same directory, “model”, and some of them deal with two or more components. For example, how would you categorize a file that calculates surface-atmosphere fluxes? Even where Model E uses code from other institutions, such as the MOM ocean model, it’s usually adapted and integrated into their own files, rather than kept in a separate directory.

The most modular Earth system model is probably the Met Office Unified Model. They don’t appear to have adapted NEMO, CICE (the sea ice model from Los Alamos National Laboratory) and OASIS at all – in fact, they’re not present in the code repository they gave us. I was a bit confused when I discovered that their “ocean” directory, left over from the years when they wrote their own ocean code, was now completely empty! I didn’t expect encapsulation to go so far that a component model could be stored completely outside the structural code.

An interesting example of the challenges of modularity appears in sea ice. Do you create a separate, independent sea ice component, like CESM did? Do you consider it part of the ocean, like NEMO? Or do you lump in lake ice along with sea ice and subsequently allow the component to float between the surface and the ocean, like Model E?

The real world isn’t modular. There are no clear boundaries between components on the physical Earth. But then, there’s only one physical Earth, whereas there are many virtual Earths in the form of climate models, and limited resources for developing the code in each component. In this spectrum of interconnection and encapsulation, is one end or the other our best bet? Or is there a healthy balance somewhere in the middle?