Wrapping Up

Posted on Aug 16, 2011 by climatesight

My summer job as a research student of Steve Easterbrook is nearing an end. All of a sudden, I only have a few days left, and the weather is (thankfully) cooling down as autumn approaches. It feels like just a few weeks ago that this summer was beginning!

Over the past three months, I examined seven different GCMs from Canada, the United States, and Europe. Based on the source code, documentation, and correspondence with scientists, I uncovered the underlying architecture of each model and represented it in a set of diagrams. You can view full-sized versions here:

Diagram key
COSMOS 1.2.1
Model E (17/06/2011)
HadGEM3 (03/08/2009): we are working on getting access to more recent source code for this model, as it now uses a new land component.
CESM 1.0.3
GFDL CM 2.1
IPSLCM5A
UVic ESCM 2.9

The component bubbles are to scale (based on the size of the code base) within each model, but not between models. The size and complexity of each GCM varies greatly, as can be seen below. UVic is by far the least complex model – it is arguably closer to an EMIC than a full GCM.

Comparing GCM architectures gave me many insights into how modular the components are, how extensively the coupler is used, and how complexity is distributed among components. I wrote up some of these observations in the poster I presented last week to the computer science department. My references can be seen here.

A big thanks to the scientists who answered questions about their work developing GCMs: Gavin Schmidt (Model E); Michael Eby (UVic); Tim Johns (HadGEM3); Arnaud Caubel, Marie-Alice Foujols, and Anne Cozic (IPSL); and Gary Strand (CESM). Additionally, Michael Eby from the University of Victoria was instrumental in improving the diagram design.

Although the summer is nearly over, our research certainly isn’t. I have started writing a more in-depth paper that Steve and I plan to develop during the year. We are also hoping to present our work at the upcoming AGU Fall Meeting, if our abstract gets accepted. Beyond this project, we are also looking at a potential experiment to run on CESM.

I guess I am sort of a scientist now. The line between “student” and “scientist” is blurry. I am taking classes, but also writing papers. Where does one end and the other begin? Regardless of where I am on the spectrum, I think I’m moving in the right direction. If this is what Doing Science means – investigating whatever little path interests me – I’m certainly enjoying it.

10 thoughts on “Wrapping Up”

Richard Pauli on Aug 16, 2011 at 3:30 pm said:

Thank you so much for all that you do. Here and now, and in the future.

Oh… and please hurry. And don’t let anything get in your way.

Stephen Berg on Aug 16, 2011 at 3:49 pm said:

Holy cow! That’s really intense stuff! Great to see you have been able to get a grasp of all of that! I don’t know if I could. Nice work!

Deech56 on Aug 17, 2011 at 3:57 am said:

When you’re contributing science, you’re a scientist. A scientist never stops being a student, but there does come a time when the flow of information is greater from the scientist part of you to other students. Keep up the good work, Kate.

Ken on Aug 17, 2011 at 10:45 am said:

Congratulations, Kate. I’ll second Deech56’s comments too. I’ll also add that being a scientist is a frame of mind, in that it requires an ability to examine evidence as honestly as possible. Fred Hoyle, for example, comes to mind: he didn’t think the Big Bang was right, yet he later worked on the BB model with others so it would be the best model available. Of course, the converse is also true: a scientist is no longer a scientist once they’ve lost (or never had?) that mindset and are more interested in fitting evidence into their ideologies.

Nathan Urban on Aug 18, 2011 at 10:10 am said:

UVic is an EMIC, not a full GCM. (There is a new version being developed, OSUVic, which has a dynamic atmosphere and thus counts as a simple AOGCM/ESM.)

Its complex ocean (a modified version of MOM) is more akin to a GCM, so its developers feel it sits somewhere between the two on the hierarchy of models. -Kate

Nathan Urban on Aug 19, 2011 at 10:08 am said:

Not to split hairs too much, but UVic is referred to as an “intermediate complexity” model right in the paper in which it was introduced, as well as in standard reviews of EMICs. Now that coupled AOGCMs have become common, variants of AGCMs and OGCMs are often included in the “EMIC” class: they have an atmosphere or an ocean circulation model, but not both.

rsdunlapiv on Oct 17, 2011 at 3:14 pm said:

Hi Kate,

These architecture diagrams are fantastic! I’m wondering what process you used to come up with all of these? Did you wade through the source code yourself? Did you look at user documentation? Interviews?

Like you, I am also interested in comparing different climate modeling architectures. In fact, it’s the topic of my PhD thesis. I have spent a good bit of time looking at the various coupling technologies out there. As I’m sure you have seen, in many cases the coupler determines a lot about how a climate model is architected.

In addition to my question about your process in generating the diagrams, I’m also wondering whether you studied the rationale behind the various architectural choices that modelers have made. For example, in some cases the land is part of the atmosphere (e.g., one or more subroutine calls), and in other cases the land is a separate component accessed via the coupler. Why the difference? Is it purely historical accident? Is there a well-thought-out reason for including the land in the atmosphere sometimes and not other times? If you have some insights about this (and the reasons behind other architectural differences), I’d be very interested in hearing what you have learned.

Enjoy the Fall AGU if your poster is accepted!

- climatesight on Oct 18, 2011 at 9:56 pm said:
  
  Wow – someone else researching in this (relatively uninhabited) niche! Thanks for stopping by. Here are some responses to your questions:
  
  These diagrams weren’t planned from the beginning – they just sort of evolved from my research process. I mainly explored the top-level source code, particularly the main coupler loop, for each model, as well as the drivers for each component. User documentation, if it existed, generally dealt with instructions to run the model, rather than describing its structure. Many labs, though, will publish a paper describing the model, which was helpful. Once I had been through all that, I directed any questions I still had to our contacts at each modelling group. They were all very helpful.
  
  I originally planned to make flow charts of method calls, but that proved to be too large and unwieldy. A diagram of directory structure didn’t seem ambitious enough. So I developed a sort of hybrid of the two, a modular approach that also shows the dynamic relationships between components. As the components are usually well encapsulated in the directory structure (although I sometimes had to partition them manually), it was easy to do a line count (I like SLOCCount because it excludes comments and blank lines) to find the relative size of the code base for each component. This is a reasonable proxy for complexity, which added another level of functionality to the diagrams.
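As a rough illustration of the line-count approach described above, here is a minimal Python sketch of a SLOCCount-style counter. It is not SLOCCount itself, which supports many languages and handles cases this ignores. It assumes a hypothetical layout where each component sits in its own top-level directory of Fortran files, and it only recognizes whole-line comments (`!` in free form, or `c`/`C`/`*` in column 1 for fixed-form `.f`/`.F` files). The function names are illustrative.

```python
from pathlib import Path

# Assumed Fortran source suffixes; real models also contain C, scripts, etc.
FORTRAN_SUFFIXES = {".f90", ".F90", ".f", ".F"}
FIXED_FORM_SUFFIXES = {".f", ".F"}


def count_sloc(text: str, fixed_form: bool = False) -> int:
    """Count source lines, skipping blank lines and whole-line comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            continue  # blank line or '!' comment line
        if fixed_form and line[0] in "cC*":
            continue  # fixed-form comment marker in column 1
        count += 1
    return count


def component_sizes(root: Path) -> dict[str, int]:
    """Sum SLOC per component, assuming one top-level directory per component."""
    sizes = {}
    for component in sorted(p for p in root.iterdir() if p.is_dir()):
        total = 0
        for path in component.rglob("*"):
            if path.is_file() and path.suffix in FORTRAN_SUFFIXES:
                text = path.read_text(errors="replace")
                total += count_sloc(text, fixed_form=path.suffix in FIXED_FORM_SUFFIXES)
        sizes[component.name] = total
    return sizes


def relative_sizes(sizes: dict[str, int]) -> dict[str, float]:
    """Express each component's SLOC as a fraction of the model's total."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()} if total else {}
```

Because the bubbles are to scale within a model but not between models, only the ratios matter, which is why `relative_sizes` normalizes each model's counts to fractions of its own total.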
  
  Several revisions to the design were needed, to make it consistent, concise, and accurate. Michael Eby from the University of Victoria was incredibly helpful in offering suggestions for improvement.
  
  Regarding your second question, I found the major determinants of code structure to be 1) where each component originated and 2) how the components interact in the real world. For example, in HadGEM3, the atmosphere and land are both built by the Met Office. One clearly dominates the other in terms of code complexity, and the land only really needs to exchange fluxes with the atmosphere (unless coastline erosion is included), so it makes sense that the two components would share a grid and interact directly rather than through the coupler. A similar situation can be seen with sea ice models, which are often sub-components of ocean models. A contrasting example is UVic, where the atmosphere came from a different lab than the land (same with ocean and sea ice). The resulting difference in grids and interfaces makes a central coupler essential.
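The two arrangements described above (land embedded in the atmosphere on a shared grid, versus separate components joined by a central coupler) can be contrasted in a toy Python sketch. Everything here is hypothetical: the `Atmosphere`, `Land`, `regrid` and `coupled_step` names, the one-dimensional grids, and the made-up flux formula. Real couplers exchange many fields and typically use more careful (e.g. conservative) remapping than the linear interpolation used here.

```python
import bisect


def regrid(values, src_x, dst_x):
    """Map `values` defined at sorted points `src_x` onto points `dst_x`.

    Toy stand-in for a coupler's regridding: piecewise-linear interpolation,
    with points outside the source range taking the nearest end value.
    """
    out = []
    for x in dst_x:
        i = bisect.bisect_left(src_x, x)
        if i == 0:
            out.append(values[0])
        elif i == len(src_x):
            out.append(values[-1])
        else:
            x0, x1 = src_x[i - 1], src_x[i]
            w = (x - x0) / (x1 - x0)
            out.append(values[i - 1] * (1 - w) + values[i] * w)
    return out


class Land:
    """Hypothetical land component: turns air temperature into a surface flux."""

    def __init__(self, grid):
        self.grid = grid

    def step(self, air_temp):
        # Made-up flux that relaxes the air toward a fixed 288 K land surface.
        return [288.0 - t for t in air_temp]


class Atmosphere:
    """Hypothetical atmosphere; may own an embedded land model."""

    def __init__(self, grid, temps, land=None):
        self.grid = grid
        self.temps = list(temps)
        self.land = land  # an embedded land model must share self.grid

    def apply_flux(self, flux):
        self.temps = [t + 0.5 * f for t, f in zip(self.temps, flux)]

    def step(self):
        # (Atmospheric dynamics omitted.)
        if self.land is not None:
            # Direct coupling: same grid, so fields pass straight through.
            self.apply_flux(self.land.step(self.temps))


def coupled_step(atm, land):
    """Hub-and-spoke coupling: the coupler regrids between component grids."""
    air_on_land = regrid(atm.temps, atm.grid, land.grid)
    flux_on_land = land.step(air_on_land)
    atm.apply_flux(regrid(flux_on_land, land.grid, atm.grid))
```

In the direct case the land is simply called from inside `Atmosphere.step`, which only works because both share a grid. In the coupled case neither component knows the other's grid, and the regridding lives in the coupler; that separation is what makes it easier to plug in a component from another lab.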
  
  Something Steve pointed out about my diagrams was that many scientists probably wouldn’t include the coupler, viewing it as an invisible piece of infrastructure with no analogue in the real world – but it is in fact an essential and complex piece of software that’s necessary to explain most component relationships.
  
  You probably have already seen this, but I read a paper this summer about coupling technologies that you may find helpful. I don’t have the citation with me here but I know one of the authors was Mariana Vertenstein. Let me know if you don’t have it and I will track down the reference.
  
- climatesight on Oct 18, 2011 at 9:59 pm said:
  
  PS: If you (or anyone) wants to use these diagrams, for example in presentations (some of the scientists working on the models we studied have already done so!), just send me an email – we did get into AGU, and are planning on a paper as well, so the diagrams will be updated in the near future.
  
  - rsdunlapiv on Oct 18, 2011 at 10:41 pm said:
    
    Your reasons for why climate model codebases are structured the way they are make sense. If you are going to import a component (e.g., land model) from another lab, it seems much easier to incorporate the new model via a separate coupler component than to try to integrate it directly with an existing component (e.g., atmosphere model). One goal of my research is to determine how the same land model could be composed either via an external coupler or integrated more tightly with the atmosphere model. This could be done by isolating the parts of the land model responsible for coupling functions and automatically generating the “glue code” to tie it back to the original model. It turns out this is very hard to do, but that is the direction I am heading.
    
    Regarding whether or not to even include the coupler in the architecture diagram, I’d have to say that it depends on your audience. Most scientists, for example, are likely only thinking about the geophysical domains in the model (and rightly so). I am a software engineering guy, so I’m interested in how data is communicated among the various models. So from my perspective, the coupler is very important. In fact, the science in the models is so complex that the coupler is one of the few things I can get my head around!
    
    I’m familiar with the coupling technologies paper with Mariana. I’m pretty sure you are talking about a draft of a paper we submitted earlier this year.
    
    Also, great to hear that your work will be presented at AGU. I’m looking forward to reading the paper when it comes out. Please keep us all updated on your blog.