A Visit to NCAR

Last week I was lucky enough to attend the Second Workshop on Coupling Technologies for Earth System Models, held at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado, USA. I was excited just to visit NCAR, which is one of the top climate research facilities in the world. Not only is it packed full of interesting scientists and great museum displays, but it’s nestled in the Rocky Mountains and so the view from the conference room looks like this:

2013-02-21 13.46.43

Many of the visitors would spend large portions of the coffee breaks just staring out the window…

The conference was focused on couplers – the part of a climate model that ties all the other components (atmosphere, ocean, land, etc.) together. However, the presentations covered (as Rob Jacob put it) “everything that physical scientists don’t care about unless it stops working”. Since I consider myself a physical scientist, this included a lot of concepts I hadn’t thought about before:

  • Parallel processing: Since climate models are so big, it makes sense to multitask by splitting the work over many computer processors. You have to allocate the right number of processors to each component, though: if the atmosphere has too many processors, it will finish its timestep too quickly and sit there waiting until the ocean is done, and vice versa. This is called load balancing, and it gets very tricky as soon as the number of components exceeds two.
  • Scalability: The more processors you use, the faster the model runs, but the speed has diminishing returns. If you double the number of processors, you won’t quite double the speed, particularly if the number of processors exceeds 104 (a setup which is becoming increasingly affordable for large research groups). Historically, the coupler has not been a code bottleneck (limiting factor for model speed), but as the number of processors gets very large, that scenario is changing. We have to figure out the most efficient way to couple many small components together, so that climate model speed can continue to keep up with advances in computer hardware.
  • Standardization: Modelling groups across the world are communicating with each other more and more, and using each other’s code. Currently this requires a lot of modifications, because every climate model has a different structure. Everyone seems to agree that it would be great to have a standard interface that allowed you to plug any combination of components together, but of course everyone has a different idea of what that standard should be.
  • Fortran is still the best language for climate models, believe it or not, because it is the fastest language for the kinds of operations required. If a modern, accessible language like Python could compete based on speed, you can bet that new climate models like MPAS would use it.

I was at the conference with Steve Easterbrook and his new M.Sc. student Daniel Levy, presenting our bubble diagrams of model architecture. (If you haven’t already, read my AGU poster schpiel first, or none of this will make sense!) As interesting and useful as these diagrams are, there were some flaws in our original analysis:

  1. We didn’t use preprocessed code, meaning that each “model” is actually the code base for many different model configurations. So our estimate of model complexity based on line count is biased towards models which are very configurable, but might not actually be very complex. We can fix this by choosing specific configurations of each model (for consistency, the configuration used in CMIP5 or the equivalent EMIC AR5 intercomparison project) and obtaining preprocessed code from the corresponding institutions.
  2. We sorted the code into components (eg atmosphere) and sub-components (eg atmospheric aerosols) based on folder structure, which might not reflect the hierarchy of routines formed at runtime. Some modelling groups keep their files very organized, but often code from different parts of the model was mixed together, and separating it out was very much a judgement call. To fix this, we can sort based on the dependency structure (a massive tree graph showing which routines call which): all the descendants of the atmosphere driver are part of the atmosphere component, and so on.
  3. We made our diagrams in Microsoft PowerPoint, which is quite limited, and didn’t allow us to size the bubbles so their area was perfectly proportional to line count. Instead, we just had to eyeball it. We can fix this by using Adobe Illustrator, which is much more advanced and has this capability.

So far, we’ve repeated the analysis for the UK Met Office Model, version HadGEM2-ES. I created the dependency structure by going manually through every file and making good use of grep, which took hours and hours (although it was a nice, menial way to avoid studying for my courses!). Daniel is going to write a Fortran parser to make the job easier next time around. In the meantime, our HadGEM2-ES diagram is absolutely gorgeous and wonderfully accurate:
HadGEM2-ES
I will post future diagrams as they become available. We think the main use of these diagrams will be as communication tools between scientists, so they are free to use with attribution.

Just a few more weeks of classes, then I can enjoy some full-time research. Now that I’ve had a taste of being a proper scientist, it’s hard to go back!

Advertisements

Climate Models on Ubuntu

Part 1: Model E

I felt a bit over my head attempting to port CESM, so I asked a grad student, who had done his Master’s on climate modelling, for help. He looked at the documentation, scratched his head, and suggested I start with NASA’s Model E instead, because it was easier to install. And was it ever! We had it up and running within an hour or so. It was probably so much easier because Model E comes with gfortran support, while CESM only has scripts written for commercial compilers like Intel or PGI.

Strangely, when using Model E, no matter what dates the rundeck sets for the simulation start and end, the subsequently generated I file always has December 1, 1949 as the start date and December 2, 1949 as the end date. We edited the I files after they were created, which seemed to fix the problem, but it was still kind of weird.

I set up Model E to run a ten-year simulation with fixed atmospheric concentration (really, I just picked a rundeck at random) over the weekend. It took it about 3 days to complete, so just over 7 hours per year of simulation time…not bad for a 32-bit desktop!

However, I’m having some weird problems with the output – after configuring the model to output files in NetCDF format and opening them in Panoply, only the file with all the sea ice variables worked. All the others either gave a blank map (array full of N/A’s) or threw errors when Panoply tried to read them. Perhaps the model isn’t enjoying having the I file edited?

Part 2: CESM

After exploring Model E, I felt like trying my hand at CESM again. Steve managed to port it onto his Macbook last year, and took detailed notes. Editing the scripts didn’t seem so ominous this time!

The CESM code can be downloaded using Subversion (instructions here) after a quick registration. Using the Ubuntu Software Center, I downloaded some necessary packages: libnetcdf-dev, mpich2, and torque-scheduler. I already had gfortran, which is sort of essential.

I used the Porting via user defined machine files method to configure the model for my machine, using the Hadley scripts as a starting point. Variables for the config_machines.xml are explained in Appendix D through H of the user’s guide (links in chapter 7). Mostly, you’re just pointing to folders where you want to store data and files. Here are a few exceptions:

  • DOUT_L_HTAR: I stuck with "TRUE", as that was the default.
  • CCSM_CPRNC: this tool already exists in the CESM source code, in /models/atm/cam/tools/cprnc.
  • BATCHQUERY and BATCHSUBMIT: the Hadley entry had “qstat” and “qsub”, respectively, so I Googled these terms to find out which batch submission software they referred to (Torque, which is freely available in the torque-scheduler package) and downloaded it so I could keep the commands the same!
  • GMAKE_J: this determines how many processors to commit to a certain task, and I wasn’t sure how many this machine had, so I just put “1”.
  • MAX_TASKS_PER_NODE: I chose "8", which the user’s guide had mentioned as an example.
  • MPISERIAL_SUPPORT: the default is “FALSE”.

The only file that I really needed to edit was Macros.<machine name>. The env_machopts.<machine name> file ended up being empty for me. I spent a while confused by the modules declarations, which turned out to refer to the Environment Modules software. Once I realized that, for this software to be helpful, I would have to write five or six modulefiles in a language I didn’t know, I decided that it probably wasn’t worth the effort, and took these declarations out. I left mkbatch.<machine name> alone, except for the first line which sets the machine, and then turned my attention to Macros.

“Getting this to work will be an iterative process”, the user’s guide says, and it certainly was (and still is). It’s never a good sign when the installation guide reminds you to be patient! Here is the sequence of each iteration:

  1. Edit the Macros file as best I can.
  2. Open up the terminal, cd to cesm1_0/scripts, and create a new case as follows: ./create_newcase -case test -res f19_g16 -compset X -mach <machine name>
  3. If this works, cd to test, and run configure: ./configure -case
  4. If all is well, try to build the case: ./test.<machine name>.build
  5. See where it fails and read the build log file it refers to for ideas as to what went wrong. Search on Google for what certain errors mean. Do some other work for a while, to let the ideas simmer.
  6. Set up for the next case: ./test.<machine name>.clean_build , cd .., and rm -rf test. This clears out old files so you can safely build a new case with the same name.
  7. See step 1.

I wasn’t really sure what the program paths were, as I couldn’t find a nicely contained folder for each one (like Windows has in “Program Files”), but I soon stumbled upon a nice little trick: look up the package on Ubuntu Package Manager, and click on “list of files” under the Download section. That should tell you what path the program used as its root.

I also discovered that setting FC and CC to gfortran and gcc, respectively, in the Macros file will throw errors. Instead, leave the variables as mpif90 and mpicc, which are linked to the GNU compilers. For example, when I type mpif90 in the terminal, the result is gfortran: no input files, just as if I had typed gfortran. For some reason, though, the errors go away.

As soon as I made it past building the mct and pio libraries, the build logs for each component (eg atm, ice) started saying gmake: command not found. This is one of the pitfalls of Ubuntu: it uses the command make for the same program that basically every other Unix-based OS calls gmake. So I needed to find and edit all the scripts that called gmake, or generated other scripts that called it, and so on. “There must be a way to automate this,” I thought, and from this article I found out how. In the terminal, cd to the CESM source code folder, and type the following:

grep -lr -e 'gmake' * | xargs sed -i 's/gmake/make/g'

You should only have to do this once. It’s case sensitive, so it will leave the xml variable GMAKE_J alone.

Then I turned my attention to compiler flags, which Steve chronicled quite well in his notes (see link above). I made most of the same changes that he did, except I didn’t need to change -DLINUX to -DDarwin. However, I needed some more compiler flags still. In the terminal, man gfortran brings up a list of all the options for gfortran, which was helpful.

The ccsm build log had hundreds of undefined reference errors as soon as it started to compile fortran. The way I understand it, many of the fortran files reference each other, but gfortran likes to append underscores to user-defined variables, and then it can’t find the file the variable is referencing! You can suppress this using the flag -fno-underscoring.

Now I am stuck on a new error. It looks like the ccsm script is almost reaching the end, as it’s using ld, the gcc linking mechanism, to tie all the files together. Then the build log says:

/usr/bin/ld: seq_domain_mct.o(.debug_info+0x1c32): unresolvable R_386_32 relocation against symbol 'mpi_fortran_argv_null'
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status

I’m having trouble finding articles on the internet about similar errors, and the gcc and ld manpages are so long that trying every compiler flag isn’t really an option. Any ideas?

Update: Fixed it! In scripts/ccsm_utils/Build/Makefile, I changed LD := $(F90) to LD := gcc -shared. The build was finally successful! Now off to try and run it…

The good thing is that, since I re-started this project a few days ago, I haven’t spent very long stuck on any one error. I’m constantly having problems, but I move through them pretty quickly! In the meantime, I’m learning a lot about the model and how it fits everything together during installation. I’ve also come a long way with Linux programming in general. Considering that when I first installed Ubuntu a few months ago, and sheepishly called my friend to ask where to find the command line, I’m quite proud of my progress!

I hope this article will help future Ubuntu users install CESM, as it seems to have a few quirks that even Mac OS X doesn’t experience (eg make vs gmake). For the rest of you, apologies if I have bored you to tears!

Self-Taught Climate Science

If you haven’t already guessed, I am a real math and science geek (and rapidly becoming a computer programming geek as well). So, when I got my first taste of quantitative climate analysis from Dana’s articles over at Skeptical Science, I was really interested. It will be a while before my education takes me in that direction, and I’m starting to think I’m not that patient. I would like to learn some relevant physics and programming ahead of time.

Here is my list of plans and resources, roughly in order of priority:

  • Learn Fortran. The majority of code in climate models is written in Fortran, and this probably isn’t going to change any time soon. I have begun studying an online Fortran 77 tutorial, and am finding that learning a second programming language is far easier than the first (Java, in my case). The major concepts are virtually identical – it’s all a case of syntax.
  • Read and do problems from some relevant chapters in my physics textbook that we will not be covering in the course: fluid dynamics and thermodynamics.
  • Follow through, in detail, a derivation of a zero-dimensional energy balance model for the Earth that was kindly sent to me by a reader.
  • Read David Archer’s textbook, Global Warming: Understanding the Forecast. I attempted to read it a year or two ago, but I hadn’t done very much physics yet and consequently became kind of lost (“Electrons are waves?!” the younger Kate said incredulously). Dr. Archer has also posted accompanying video lectures from the University of Chicago course the book is based on, which will help.
  • Try to find a copy of Ray Pierrehumbert’s new book, Principles of Planetary Climate. From what I have heard, this will involve learning some Python.
  • I have several textbooks on loan or second hand, two regarding climate physics, and one about general atmospheric dynamics.

That will probably keep me busy for some time, but I would appreciate recommendations for additions/changes!