Working Away

The shape of my summer research is slowly becoming clearer. Basically, I’ll be writing a document comparing the architecture of different climate models. This, of course, involves getting access to the source code. Building on Steve’s list, here are my experiences:

NCAR, Community Earth System Model (CESM): Password-protected, but you can get access within an hour. After a quick registration, you’ll receive an automated email with a username and password. This login information gives you access to their Subversion repository. Registration links and further information are available here, under “Acquiring the CESM1.0 Release Code”.

University of Victoria, Earth System Climate Model (ESCM): Links to the source code can be found on this page, but they’re password-protected. You can request an account by sending an email – follow the link for more information.

Geophysical Fluid Dynamics Laboratory (GFDL), CM 2.1: Slightly more complicated. Create an account for their Gforge repository, which is an automated process. Then request access to the MOM4P1 project, which apparently includes CM 2.1. The server is supposed to grant project requests, so it sounds automatic, but the only emails I’ve received from it concern some kind of GFDL mailing list and don’t mention the project request. I will wait and see.
Update (July 20): It looks like I got access to the project right after I requested it – I just never received an email!

Max Planck Institute (MPI), COSMOS: Code access involves signing a licence agreement, faxing it to Germany, and waiting for it to be approved and signed by MPI. The agreement is not very restrictive, though – it deals mainly with version control, documenting changes to the code, etc.

UK Met Office, Hadley Centre Coupled Model version 3 (HadCM3): Our lab already has a copy of the code for HadCM3, so I’m not really sure what the process is to get access, but apparently it involved a lot of government paperwork.

Institut Pierre-Simon Laplace (IPSL), CM5: This one tripped me up for a while, largely because the user guide is difficult to find and is written in French. Google Translate helped me out there, but it also attempted to “translate” their command-line samples! Make sure that you have ksh installed, too; it’s quick to fix, but I didn’t realize it right away. Some of the components for IPSLCM5 are open access, but others are password-protected. Follow the user guide’s instructions for whom to email to request access.

Model E: This was the easiest of all. From the GISS website, you can access all the source code without any registration. They offer a frozen AR4 version, as well as nightly snapshots of the work in progress for AR5 (a frozen AR5 version is coming soon). There is also a wealth of documentation on the site, such as an installation guide and a description of the model.

I’ve taken a look at the structural code for Model E, which is mostly contained in the file MODELE.f. The code is very clear and well commented, and the online documentation helped me out too. After drawing a lot of complicated diagrams with arrows and lists, I feel that I have a decent understanding of the Model E architecture.

Reading code can become monotonous, though, and every now and then I feel like a little computer trouble to keep things interesting. For that reason, I’m continuing to chip away at building and running two models, Model E and CESM. See my previous post for how this process started.

<TECHNICAL COMPUTER STUFF> (Feel free to skip ahead…)

I was still having trouble viewing the Model E output (only one file worked in Panoply; the rest produced an empty map), so I emailed some of the lab’s contacts at NASA. They suggested I install CDAT, a process which nearly broke Ubuntu (haven’t we all been there?). Basically, because it’s an older program, it thought the newest version of Python was 2.5, which it subsequently installed and set as the default in /usr/bin. Since I had Python 2.6 installed, and the two versions apparently aren’t compatible enough for one to stand in for the other, every program that depended on Python (i.e. almost everything on Ubuntu) stopped working. Our IT contact managed to set 2.6 back as the default, but I’m not about to try my hand at CDAT again…

I have moved forward very slightly on CESM. I’ve managed to build the model, but upon calling test.<machine name>.run, I get rather an odd error:

./Tools/ccsm_getenv: line 9: syntax error near unexpected token '('
./Tools/ccsm_getenv: line 9: 'foreach i (env_case.xml env_run.xml env_conf.xml env_build.xml env_mach_pes.xml)'

Now, I’m pretty new at shell scripting, but I can’t see the syntax error there – and wouldn’t syntax errors appear at compile-time, rather than run-time?
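Shell scripts are never compiled. A Bourne-style shell such as sh, bash or dash reads and runs a script one command at a time, so a parse error only appears when execution reaches the offending line. The foreach i (...) loop is C shell (csh) syntax, which a Bourne-style shell cannot parse. The sketch below reproduces the symptom, assuming a POSIX sh with mktemp available; the two-file loop is a cut-down, illustrative version of the line in the error, not the real ccsm_getenv.

```sh
#!/bin/sh
# Write a small script whose second line starts a csh-only foreach loop.
demo=$(mktemp)
cat > "$demo" <<'EOF'
echo "line 1 ran"
foreach i (env_case.xml env_run.xml)
  echo $i
end
EOF

# Feed it to a Bourne-style shell. Line 1 is executed first; the parser
# only hits the csh syntax (and fails) when it moves on to line 2.
status=0
output=$(sh "$demo" 2>&1) || status=$?
printf '%s\n' "$output"
echo "exit status: $status"
rm -f "$demo"
```

The first line's output appears before the error, which is exactly the run-time behaviour in question. The wording of the error itself differs between bash and dash.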

A post by Michael Tobis, who had a similar error, suggested that the issue had to do with qsub. Unfortunately, that meant I had to actually use qsub – I had previously given up trying to configure Torque to run on a single machine rather than many. I gave the installation another go, and now I can get scripts into the queue, but they never start running – their status stays as “Q” even if I leave the computer alone for an hour. Since the machine has a dual-core processor, I can’t see why it couldn’t run both a server and a node at once, but it doesn’t seem to be working for me.

</TECHNICAL COMPUTER STUFF>

Before I started this job, climate models seemed analogous to Antarctica – a distant, mysterious, complex system that I wanted to visit, but didn’t know how to get to. In fact, they’re far more accessible than Antarctica. More on the scale of a complicated bus trip across town, perhaps?

They are not perfect pieces of software, and they’re not very user friendly. However, all the struggles of installation pay off when you finally get some output, and open it up, and see realistic data representing the very same planet you’re sitting on! Even just reading the code for different models shows you many different ways to look at the same system – for example, is sea ice a realm of its own, or is it a subset of the ocean? In the real world the lines are blurry, but computation requires us to make clear divisions.

The code can be unintelligible (lndmaxjovrdmdni) or familiar (“The Stefan-Boltzmann constant! Finally I recognize something!”) or even entertaining (a seemingly random identification string, dozens of characters long, followed by the comment “if you edit this you will get what you deserve”). When you get tied up in the code, though, it’s easy to miss the bigger picture: the incredible fact that we can use the sterile, binary practice of computation to represent a system as messy and mysterious as the whole planet. Isn’t that something worth sitting and marveling over?

8 thoughts on “Working Away”

silence on Jun 27, 2011 at 5:54 pm said:

Shell scripts are interpreted, and the interpreter doesn’t look at an individual line until it is ready to execute that line. As such, syntax errors won’t show up until you run the script.

In this case, it looks like a script written for a csh-style shell is being interpreted by an sh-style shell.

Try changing the first line of the script to #!/bin/csh (or wherever your actual csh is located).

The first line is already #!/bin/csh -f. I tried taking out the -f, but it didn’t make any difference. -Kate

- frank -- Decoding SwiftHack on Jun 28, 2011 at 10:24 am said:
  
  Kate:
  
  In what manner did you invoke the test…..run script?
  
  If you use
  
  . test….run
  
  it will just interpret the commands under your current (Bourne) shell session, so don’t do that.
  
  — frank
  
  First I just tried calling ./test.mach.run (which gave me the error), then I tried using qsub test.mach.run (but I don’t think my torque installation is working). I also tried ./test.mach.submit, but that does the same thing as qsub. -Kate
  
  - Michael Tobis on Jul 7, 2011 at 9:47 pm said:
    
    Yeah, I’m really talking to myself on the aardvark blog. It keeps me sane.
    
    I’ve been babysitting twenty 64 cpu production runs at a time of an older model; enough goes wrong to keep me busy. Haven’t untarred CESM again yet. But I remember this problem. qsub has nothing to do with it, and I don’t think torque has anything to do with qsub as I see it anyway.
    
    In my case, at least, the shell reading that code was not csh, and it’s csh-specific code.
    
    You can check easily enough by putting
    
    echo $SHELL
    
    above the offending line.
    
    Worst case, you can unwind the loop. It’s turned out to be an awfully inconvenient convenience.
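If the script does end up in an sh-style shell, the other route besides unwinding is to express the loop in Bourne-shell syntax. A sketch in POSIX sh, assuming you are free to edit a local copy of the script: the file list comes from the error message in the post, but the function name check_env_files and the loop body are hypothetical stand-ins (they only report missing files), since the rest of ccsm_getenv isn’t shown here.

```sh
#!/bin/sh
# Bourne-shell equivalent of the csh loop
#   foreach i (env_case.xml env_run.xml env_conf.xml env_build.xml env_mach_pes.xml)
#   ...
#   end
# The body is a hypothetical placeholder that reports missing files.
check_env_files() {
  missing=0
  for i in env_case.xml env_run.xml env_conf.xml env_build.xml env_mach_pes.xml; do
    if [ ! -e "./$i" ]; then
      echo "missing: $i"
      missing=$((missing + 1))
    fi
  done
  # Succeed only if every file was found.
  [ "$missing" -eq 0 ]
}
```

Run from the case directory, check_env_files || echo 'case directory is incomplete' prints one line per missing file.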
- frank -- Decoding SwiftHack on Jun 28, 2011 at 10:36 am said:
  
  Kate:
  
  OK, I think I get what MT was talking about. My guess is that the script started off correctly under csh, but when it submitted a job to the computing cluster via the qsub program, the submitted job script was run as an sh script rather than a csh script. You probably need to add an option to qsub to make it run under csh, but I don’t have the qsub manual page with me…
  
  — frank
  
  - frank -- Decoding SwiftHack on Jun 28, 2011 at 10:41 am said:
    
    Kate:
    
    …ugh, maybe that’s not the case.
    
    If you didn’t run the script using qsub, then probably qsub didn’t cause the problem.
    
    Try this:
    
    csh -x ./test.mach.run
    
    It’ll dump a lot of output about what the csh interpreter is doing, so it may help to pinpoint the problem.
    
    — frank
    
    Okay, now it gets past the line that previously threw a syntax error, as well as the BUILDNML and PRESTAGE scripts. It makes a few directories, sets the date, and finally throws this error:
    
    ./Tools/ccsm_postrun.csh
    ls: No match.
    Model did not complete - no cpl.log file present - exiting
    exit 1
    
    In the postrun file, the problematic section seems to be:
    
    set CplLogFile = `ls -1t cpl.log* | head -1`
    if ($CplLogFile == "") then
      echo "Model did not complete - no cpl.log file present - exiting"
      exit -1
    endif
    
    In the directory where I should have log files, I only have bldlog files. -Kate
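The postrun check only asks whether any cpl.log* file exists, so its message suggests the coupler never got far enough to write a log, and the real failure most likely happened earlier in the run. The ls: No match. line is csh’s own complaint: in csh, a wildcard that matches nothing is an error, whereas an sh-style shell passes the pattern through unchanged. Below is a rough POSIX sh rendering of the same check, for poking at a run directory by hand. The function names latest_cpl_log and check_model_completed are hypothetical.

```sh
#!/bin/sh
# Newest cpl.log* file in a directory (default: current), or nothing.
# Mirrors the csh line:  set CplLogFile = `ls -1t cpl.log* | head -1`
latest_cpl_log() {
  dir=${1:-.}
  # If nothing matches, sh hands ls the literal pattern; ls then fails and
  # its error message is discarded, leaving empty output.
  ls -1t "$dir"/cpl.log* 2>/dev/null | head -n 1
}

# Same decision as the postrun script, but returning instead of exiting.
check_model_completed() {
  log=$(latest_cpl_log "$1")
  if [ -z "$log" ]; then
    echo "Model did not complete - no cpl.log file present" >&2
    return 1
  fi
  echo "newest coupler log: $log"
}
```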
frank -- Decoding SwiftHack on Jun 28, 2011 at 8:42 am said:

Kate:

Basically, because it’s an older program, it thought the newest version of Python was 2.5 — which it subsequently installed and set as the default in /usr/bin.

Argh!

What I’d probably do in such a situation is to install from the source rather than from precompiled packages: i.e. download the source, hack it to call /usr/bin/python2.5 (or whatever) instead of /usr/bin/python, then compile and install.
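One way to do the hack described above, assuming the program’s entry points are scripts whose first line is exactly #!/usr/bin/python, is to rewrite that line to name a versioned interpreter, so the system-wide /usr/bin/python is never touched. The function name pin_python and the interpreter path /usr/bin/python2.5 are illustrative, not part of CDAT.

```sh
#!/bin/sh
# Rewrite a "#!/usr/bin/python" first line to use a specific interpreter.
# Usage (hypothetical path): pin_python /usr/bin/python2.5 script1 script2 ...
pin_python() {
  target=$1
  shift
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # Only line 1, and only an exact match, is changed.
    sed '1s|^#!/usr/bin/python$|#!'"$target"'|' "$f" > "$tmp" &&
      cat "$tmp" > "$f"   # write back in place so f keeps its permissions
    rm -f "$tmp"
  done
}
```

Scripts that start with #!/usr/bin/env python, or that call python from inside shell wrappers, would need their own edits. This sketch only handles the plain shebang case.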

And, if you have trouble understanding certain parts of the code, you can consider adding debug output statements at strategic points to see what goes in and what comes out. (I’m not familiar enough with Fortran to know how exactly to do it, unfortunately…) Last I knew, gdb didn’t have support for debugging Fortran, so this is probably the best way for now.

(Um… “if you edit this you will get what you deserve”? Now that is a function that I’d definitely want to edit.)

— frank

pendantry on Jun 29, 2011 at 3:53 pm said:

Computers. I love ’em :)

Sinal Infection on Jul 26, 2011 at 6:52 pm said:

I’m working on getting ModelE up and running right now. Got it to the point where it would run on OSX, until it had to open up an input data file, at which point it pukes up a hairball.

With Ubuntu, I gave up porting the ancient, creaky *4 release to linuxland, and grabbed a recent snapshot. It doesn’t build, crashing on what appears to be a failed dependency for “domain_decomp_atm”.

Gah….


ClimateSight

Climate science from the inside
