Working Away

The shape of my summer research is slowly becoming clearer. Basically, I’ll be writing a document comparing the architecture of different climate models. This, of course, involves getting access to the source code. Building on Steve’s list, here are my experiences:

NCAR, Community Earth System Model (CESM): Password-protected, but you can get access within an hour. After a quick registration, you’ll receive an automated email with a username and password. This login information gives you access to their Subversion repository. Registration links and further information are available here, under “Acquiring the CESM1.0 Release Code”.

University of Victoria, Earth System Climate Model (ESCM): Links to the source code can be found on this page, but they’re password-protected. You can request an account by sending an email – follow the link for more information.

Geophysical Fluid Dynamics Laboratory (GFDL), CM 2.1: Slightly more complicated. Create an account for their Gforge repository, which is an automated process. Then, request access to the MOM4P1 project – apparently CM 2.1 is included within that. Apparently, the server grants you request to a project, so it sounds automatic – but the only emails I’ve received from the server regard some kind of GFDL mailing list, and don’t mention the project request. I will wait and see.
Update (July 20): It looks like I got access to the project right after I requested it – I just never received an email!

Max Planck Institute (MPI), COSMOS: Code access involves signing a licence agreement, faxing it to Germany, and waiting for it to be approved and signed by MPI. The agreement is not very restrictive, though – it deals mainly with version control, documenting changes to the code, etc.

UK Met Office, Hadley Centre Coupled Model version 3 (HadCM3): Our lab already has a copy of the code for HadCM3, so I’m not really sure what the process is to get access, but apparently it involved a lot of government paperwork.

Institut Pierre Simon Laplace (IPSL), CM5: This one tripped me up for a while, largely because the user guide is difficult to find, and written in French. Google Translate helped me out there, but it also attempted to “translate” their command line samples! Make sure that you have ksh installed, too – it’s quick to fix, but I didn’t realize it right away. Some of the components for IPSLCM5 are open access, but others are password-protected. Follow the user guide’s instructions for who to email to request access.

Model E: This was the easiest of all. From the GISS website, you can access all the source code without any registration. They offer a frozen AR4 version, as well as nightly snapshots of the work-in-process for AR5 (frozen AR5 version soon to come). There is also a wealth of documentation on this site, such as an installation guide and a description of the model.

I’ve taken a look at the structural code for Model E, which is mostly contained in the file MODELE.f. The code is very clear and well commented, and the online documentation helped me out too. After drawing a lot of complicated diagrams with arrows and lists, I feel that I have a decent understanding of the Model E architecture.

Reading code can become monotonous, though, and every now and then I feel like a little computer trouble to keep things interesting. For that reason, I’m continuing to chip away at building and running two models, Model E and CESM. See my previous post for how this process started.

<TECHNICAL COMPUTER STUFF> (Feel free to skip ahead…)

I was still having trouble viewing the Model E output (only one file worked on Panoply, the rest created an empty map) so I emailed some of the lab’s contacts at NASA. They suggested I install CDAT, a process which nearly broke Ubuntu (haven’t we all been there?) Basically, because it’s an older program, it thought the newest version of Python was 2.5 – which it subsequently installed and set as the default in /usr/bin. Since I had Python 2.6 installed, and the versions are apparently very not-backwards-compatible, every program that depended on Python (i.e. almost everything on Ubuntu) stopped working. Our IT contact managed to set 2.6 back as the default, but I’m not about to try my hand at CDAT again…

I have moved forward very slightly on CESM. I’ve managed to build the model, but upon calling test.<machine name>.run, I get rather an odd error:

./Tools/ccsm_getenv: line 9: syntax error near unexpected token '('
./Tools/ccsm_getenv: line 9: 'foreach i (env_case.xml env_run.xml env_conf.xml env_build.xml env_mach_pes.xml)'

Now, I’m pretty new at shell scripting, but I can’t see the syntax error there – and wouldn’t syntax errors appear at compile-time, rather than run-time?

A post by Michael Tobis, who had a similar error, suggested that the issue had to do with qsub. Unfortunately, that meant I had to actually use qsub – I had previously given up trying to configure Torque to run on a single machine rather than many. I gave the installation another go, and now I can get scripts into the queue, but they never start running – their status stays as “Q” even if I leave the computer alone for an hour. Since the machine has a dual-core processor, I can’t see why it couldn’t run both a server and a node at once, but it doesn’t seem to be working for me.

</TECHNICAL COMPUTER STUFF>

Before I started this job, climate models seemed analogous to Antarctica – a distant, mysterious, complex system that I wanted to visit, but didn’t know how to get to. In fact, they’re far more accessible than Antarctica. More on the scale of a complicated bus trip across town, perhaps?

They are not perfect pieces of software, and they’re not very user friendly. However, all the struggles of installation pay off when you finally get some output, and open it up, and see realistic data representing the very same planet you’re sitting on! Even just reading the code for different models shows you many different ways to look at the same system – for example, is sea ice a realm of its own, or is it a subset of the ocean? In the real world the lines are blurry, but computation requires us to make clear divisions.

The code can be unintelligible (lndmaxjovrdmdni) or familiar (“The Stefan-Boltzmann constant! Finally I recognize something!”) or even entertaining (a seemingly random identification string, dozens of characters long, followed by the comment if you edit this you will get what you deserve). When you get tied up in the code, though, it’s easy to miss the bigger picture: the incredible fact that we can use the sterile, binary practice of computation to represent a system as messy and mysterious as the whole planet. Isn’t that something worth sitting and marveling over?

Advertisements

Climate Models on Ubuntu

Part 1: Model E

I felt a bit over my head attempting to port CESM, so I asked a grad student, who had done his Master’s on climate modelling, for help. He looked at the documentation, scratched his head, and suggested I start with NASA’s Model E instead, because it was easier to install. And was it ever! We had it up and running within an hour or so. It was probably so much easier because Model E comes with gfortran support, while CESM only has scripts written for commercial compilers like Intel or PGI.

Strangely, when using Model E, no matter what dates the rundeck sets for the simulation start and end, the subsequently generated I file always has December 1, 1949 as the start date and December 2, 1949 as the end date. We edited the I files after they were created, which seemed to fix the problem, but it was still kind of weird.

I set up Model E to run a ten-year simulation with fixed atmospheric concentration (really, I just picked a rundeck at random) over the weekend. It took it about 3 days to complete, so just over 7 hours per year of simulation time…not bad for a 32-bit desktop!

However, I’m having some weird problems with the output – after configuring the model to output files in NetCDF format and opening them in Panoply, only the file with all the sea ice variables worked. All the others either gave a blank map (array full of N/A’s) or threw errors when Panoply tried to read them. Perhaps the model isn’t enjoying having the I file edited?

Part 2: CESM

After exploring Model E, I felt like trying my hand at CESM again. Steve managed to port it onto his Macbook last year, and took detailed notes. Editing the scripts didn’t seem so ominous this time!

The CESM code can be downloaded using Subversion (instructions here) after a quick registration. Using the Ubuntu Software Center, I downloaded some necessary packages: libnetcdf-dev, mpich2, and torque-scheduler. I already had gfortran, which is sort of essential.

I used the Porting via user defined machine files method to configure the model for my machine, using the Hadley scripts as a starting point. Variables for the config_machines.xml are explained in Appendix D through H of the user’s guide (links in chapter 7). Mostly, you’re just pointing to folders where you want to store data and files. Here are a few exceptions:

  • DOUT_L_HTAR: I stuck with "TRUE", as that was the default.
  • CCSM_CPRNC: this tool already exists in the CESM source code, in /models/atm/cam/tools/cprnc.
  • BATCHQUERY and BATCHSUBMIT: the Hadley entry had “qstat” and “qsub”, respectively, so I Googled these terms to find out which batch submission software they referred to (Torque, which is freely available in the torque-scheduler package) and downloaded it so I could keep the commands the same!
  • GMAKE_J: this determines how many processors to commit to a certain task, and I wasn’t sure how many this machine had, so I just put “1”.
  • MAX_TASKS_PER_NODE: I chose "8", which the user’s guide had mentioned as an example.
  • MPISERIAL_SUPPORT: the default is “FALSE”.

The only file that I really needed to edit was Macros.<machine name>. The env_machopts.<machine name> file ended up being empty for me. I spent a while confused by the modules declarations, which turned out to refer to the Environment Modules software. Once I realized that, for this software to be helpful, I would have to write five or six modulefiles in a language I didn’t know, I decided that it probably wasn’t worth the effort, and took these declarations out. I left mkbatch.<machine name> alone, except for the first line which sets the machine, and then turned my attention to Macros.

“Getting this to work will be an iterative process”, the user’s guide says, and it certainly was (and still is). It’s never a good sign when the installation guide reminds you to be patient! Here is the sequence of each iteration:

  1. Edit the Macros file as best I can.
  2. Open up the terminal, cd to cesm1_0/scripts, and create a new case as follows: ./create_newcase -case test -res f19_g16 -compset X -mach <machine name>
  3. If this works, cd to test, and run configure: ./configure -case
  4. If all is well, try to build the case: ./test.<machine name>.build
  5. See where it fails and read the build log file it refers to for ideas as to what went wrong. Search on Google for what certain errors mean. Do some other work for a while, to let the ideas simmer.
  6. Set up for the next case: ./test.<machine name>.clean_build , cd .., and rm -rf test. This clears out old files so you can safely build a new case with the same name.
  7. See step 1.

I wasn’t really sure what the program paths were, as I couldn’t find a nicely contained folder for each one (like Windows has in “Program Files”), but I soon stumbled upon a nice little trick: look up the package on Ubuntu Package Manager, and click on “list of files” under the Download section. That should tell you what path the program used as its root.

I also discovered that setting FC and CC to gfortran and gcc, respectively, in the Macros file will throw errors. Instead, leave the variables as mpif90 and mpicc, which are linked to the GNU compilers. For example, when I type mpif90 in the terminal, the result is gfortran: no input files, just as if I had typed gfortran. For some reason, though, the errors go away.

As soon as I made it past building the mct and pio libraries, the build logs for each component (eg atm, ice) started saying gmake: command not found. This is one of the pitfalls of Ubuntu: it uses the command make for the same program that basically every other Unix-based OS calls gmake. So I needed to find and edit all the scripts that called gmake, or generated other scripts that called it, and so on. “There must be a way to automate this,” I thought, and from this article I found out how. In the terminal, cd to the CESM source code folder, and type the following:

grep -lr -e 'gmake' * | xargs sed -i 's/gmake/make/g'

You should only have to do this once. It’s case sensitive, so it will leave the xml variable GMAKE_J alone.

Then I turned my attention to compiler flags, which Steve chronicled quite well in his notes (see link above). I made most of the same changes that he did, except I didn’t need to change -DLINUX to -DDarwin. However, I needed some more compiler flags still. In the terminal, man gfortran brings up a list of all the options for gfortran, which was helpful.

The ccsm build log had hundreds of undefined reference errors as soon as it started to compile fortran. The way I understand it, many of the fortran files reference each other, but gfortran likes to append underscores to user-defined variables, and then it can’t find the file the variable is referencing! You can suppress this using the flag -fno-underscoring.

Now I am stuck on a new error. It looks like the ccsm script is almost reaching the end, as it’s using ld, the gcc linking mechanism, to tie all the files together. Then the build log says:

/usr/bin/ld: seq_domain_mct.o(.debug_info+0x1c32): unresolvable R_386_32 relocation against symbol 'mpi_fortran_argv_null'
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status

I’m having trouble finding articles on the internet about similar errors, and the gcc and ld manpages are so long that trying every compiler flag isn’t really an option. Any ideas?

Update: Fixed it! In scripts/ccsm_utils/Build/Makefile, I changed LD := $(F90) to LD := gcc -shared. The build was finally successful! Now off to try and run it…

The good thing is that, since I re-started this project a few days ago, I haven’t spent very long stuck on any one error. I’m constantly having problems, but I move through them pretty quickly! In the meantime, I’m learning a lot about the model and how it fits everything together during installation. I’ve also come a long way with Linux programming in general. Considering that when I first installed Ubuntu a few months ago, and sheepishly called my friend to ask where to find the command line, I’m quite proud of my progress!

I hope this article will help future Ubuntu users install CESM, as it seems to have a few quirks that even Mac OS X doesn’t experience (eg make vs gmake). For the rest of you, apologies if I have bored you to tears!