Part 1: Model E
I felt a bit over my head attempting to port CESM, so I asked a grad student, who had done his Master’s on climate modelling, for help. He looked at the documentation, scratched his head, and suggested I start with NASA’s Model E instead, because it was easier to install. And was it ever! We had it up and running within an hour or so. It was probably so much easier because Model E comes with gfortran support, while CESM only has scripts written for commercial compilers like Intel or PGI.
Strangely, when using Model E, no matter what dates the rundeck sets for the simulation start and end, the subsequently generated I file always has December 1, 1949 as the start date and December 2, 1949 as the end date. We edited the I files after they were created, which seemed to fix the problem, but it was still kind of weird.
I set up Model E to run a ten-year simulation with fixed atmospheric concentration (really, I just picked a rundeck at random) over the weekend. It took about 3 days to complete, or just over 7 hours per simulated year…not bad for a 32-bit desktop!
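As a quick check on that figure, the arithmetic can be done in the terminal. This is only a sketch using the rounded numbers above (3 days, 10 simulated years); the function name hours_per_year is made up for illustration.

```shell
# Wall-clock hours per simulated year.
# $1 = wall-clock days, $2 = simulated years (both rounded figures from the text).
hours_per_year() {
    awk -v days="$1" -v years="$2" 'BEGIN { printf "%.1f\n", days * 24 / years }'
}

hours_per_year 3 10
```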
However, I’m having some weird problems with the output – after configuring the model to output files in NetCDF format and opening them in Panoply, only the file with all the sea ice variables worked. All the others either gave a blank map (array full of N/A’s) or threw errors when Panoply tried to read them. Perhaps the model isn’t enjoying having the I file edited?
Part 2: CESM
After exploring Model E, I felt like trying my hand at CESM again. Steve managed to port it onto his MacBook last year, and took detailed notes. Editing the scripts didn’t seem so ominous this time!
The CESM code can be downloaded using Subversion (instructions here) after a quick registration. Using the Ubuntu Software Center, I downloaded some necessary packages:
torque-scheduler. I already had gfortran, which is sort of essential.
I used the Porting via user defined machine files method to configure the model for my machine, using the Hadley scripts as a starting point. Variables for the config_machines.xml are explained in Appendices D through H of the user’s guide (links in chapter 7). Mostly, you’re just pointing to folders where you want to store data and files. Here are a few exceptions:
DOUT_L_HTAR: I stuck with "TRUE", as that was the default.
CCSM_CPRNC: this tool already exists in the CESM source code, in
BATCHSUBMIT: the Hadley entry had “qstat” and “qsub”, respectively, so I Googled these terms to find out which batch submission software they referred to (Torque, which is freely available in the torque-scheduler package) and downloaded it so I could keep the commands the same!
GMAKE_J: this sets how many parallel jobs make runs during the build (the value passed to its -j option). I wasn’t sure how many processors this machine had, so I just put “1”.
MAX_TASKS_PER_NODE: I chose "8", which the user’s guide had mentioned as an example.
MPISERIAL_SUPPORT: the default is “FALSE”.
The only file that I really needed to edit was Macros.<machine name>. The env_machopts.<machine name> file ended up being empty for me. I spent a while confused by the modules declarations, which turned out to refer to the Environment Modules software. Once I realized that, for this software to be helpful, I would have to write five or six modulefiles in a language I didn’t know, I decided that it probably wasn’t worth the effort, and took these declarations out. I left mkbatch.<machine name> alone, except for the first line which sets the machine, and then turned my attention to Macros.
“Getting this to work will be an iterative process”, the user’s guide says, and it certainly was (and still is). It’s never a good sign when the installation guide reminds you to be patient! Here is the sequence of each iteration:
- Edit the Macros file as best I can.
- Open up the terminal, cd to cesm1_0/scripts, and create a new case as follows:
./create_newcase -case test -res f19_g16 -compset X -mach <machine name>
- If this works, cd to test, and run configure:
- If all is well, try to build the case:
- See where it fails and read the build log file it refers to for ideas as to what went wrong. Search on Google for what certain errors mean. Do some other work for a while, to let the ideas simmer.
- Set up for the next case: cd .., and rm -rf test. This clears out old files so you can safely build a new case with the same name.
- See step 1.
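For anyone who wants to script the loop above, here is a rough dry-run sketch. Only the create_newcase line comes from the steps above. The machine name mymachine, the ./configure -case call, and the test.<machine name>.build script name are my assumptions, so check them against the user’s guide for your CESM version. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch of one porting iteration.
# MACH is a hypothetical machine name; the configure and build command
# names below are assumptions, not taken from the steps above.
MACH="${MACH:-mymachine}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN is set to 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

iterate() {
    run cd cesm1_0/scripts || return 1
    run ./create_newcase -case test -res f19_g16 -compset X -mach "$MACH" || return 1
    run cd test || return 1
    # Assumed configure invocation.
    run ./configure -case || return 1
    # Assumed name of the generated build script.
    run "./test.$MACH.build"
}

cleanup() {
    # Remove the old case so the name "test" can be reused.
    run cd .. || return 1
    run rm -rf test
}

iterate
cleanup
```

Setting DRY_RUN=0 would actually run the commands, including the rm -rf in cleanup, so I would only do that from the directory you really mean to be in.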
I wasn’t really sure what the program paths were, as I couldn’t find a nicely contained folder for each one (like Windows has in “Program Files”), but I soon stumbled upon a nice little trick: look up the package on Ubuntu Package Manager, and click on “list of files” under the Download section. That should tell you what path the program used as its root.
I also discovered that setting the compiler variables to gfortran and gcc, respectively, in the Macros file will throw errors. Instead, leave the variables as mpif90 and mpicc, which are linked to the GNU compilers. For example, when I type mpif90 in the terminal, the result is gfortran: no input files, just as if I had typed gfortran. For some reason, though, the errors go away.
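A slightly more direct way to see what the wrappers call is to ask them. This is a sketch, not something I ran during the port: it assumes the wrapper comes from Open MPI (which spells the option --showme) or MPICH (which spells it -show), and the function name show_wrapper is made up.

```shell
# Print the underlying compiler command that an MPI wrapper would run.
# $1 = wrapper name, e.g. mpif90 or mpicc.
show_wrapper() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1: not installed"
        return 0
    fi
    # Try the Open MPI spelling first, then the MPICH one.
    "$1" --showme 2>/dev/null ||
        "$1" -show 2>/dev/null ||
        echo "$1: could not query the underlying compiler"
}

show_wrapper mpif90
show_wrapper mpicc
```

The output should be a full compiler command line, including the include and library paths the wrapper adds on top of the plain compiler, which is a likely reason the wrappers behave differently from calling the compilers directly.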
As soon as I made it past building the mct and pio libraries, the build logs for each component (eg atm, ice) started saying gmake: command not found. This is one of the pitfalls of Ubuntu: it installs GNU Make only under the name make, while the CESM scripts call it gmake, a name that many other Unix-based systems provide. So I needed to find and edit all the scripts that called gmake, or generated other scripts that called it, and so on. “There must be a way to automate this,” I thought, and from this article I found out how. In the terminal, cd to the CESM source code folder, and type the following:
grep -lr -e 'gmake' * | xargs sed -i 's/gmake/make/g'
You should only have to do this once. It’s case sensitive, so it will leave the xml variable
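To see the case sensitivity for yourself before touching the real source tree, the same pipeline can be tried on a throwaway directory. This is just a sketch: the two file names and their contents are made up, GMAKE_J stands in for any uppercase name (it is the machine setting mentioned earlier), and sed -i without a suffix assumes GNU sed, which is what Ubuntu ships.

```shell
# Build a scratch directory with one lowercase and one uppercase occurrence.
demo=$(mktemp -d)
printf 'gmake -j 2\n' > "$demo/build.csh"
printf '<entry id="GMAKE_J" value="1" />\n' > "$demo/config.xml"

# Same pipeline as above, run inside the scratch directory.
(cd "$demo" && grep -lr -e 'gmake' * | xargs sed -i 's/gmake/make/g')

# The lowercase command is rewritten; the uppercase name is untouched.
cat "$demo/build.csh"
cat "$demo/config.xml"
```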
Then I turned my attention to compiler flags, which Steve chronicled quite well in his notes (see link above). I made most of the same changes that he did, except I didn’t need to change -DDarwin. However, I needed some more compiler flags still. In the terminal, man gfortran brings up a list of all the options for gfortran, which was helpful.
The ccsm build log had hundreds of undefined reference errors as soon as it started to compile Fortran. The way I understand it, many of the Fortran files reference each other, but gfortran likes to append underscores to the names of external symbols, and then the linker can’t find the symbol being referenced! You can suppress this using the flag
Now I am stuck on a new error. It looks like the ccsm script is almost reaching the end, as it’s using ld, the GNU linker, to tie all the object files together. Then the build log says:
/usr/bin/ld: seq_domain_mct.o(.debug_info+0x1c32): unresolvable R_386_32 relocation against symbol 'mpi_fortran_argv_null'
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
I’m having trouble finding articles on the internet about similar errors, and the gcc and ld manpages are so long that trying every compiler flag isn’t really an option. Any ideas?
Update: Fixed it! In scripts/ccsm_utils/Build/Makefile, I changed LD := $(F90) to LD := gcc -shared. The build was finally successful! Now off to try and run it…
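I made that change by hand in a text editor, but it could also be done from the terminal. Here is a sketch that demonstrates the substitution on a throwaway file rather than the real Makefile; it assumes GNU sed and assumes the line is spaced exactly as written above, which may not match your copy.

```shell
# Create a scratch file containing only the original linker definition.
tmp_makefile=$(mktemp)
printf 'LD := $(F90)\n' > "$tmp_makefile"

# Swap the linker; \$ makes sed treat the dollar sign literally.
sed -i 's/^LD := \$(F90)/LD := gcc -shared/' "$tmp_makefile"

cat "$tmp_makefile"
```

To apply it for real, the same sed command would be pointed at scripts/ccsm_utils/Build/Makefile, ideally after making a backup copy.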
The good thing is that, since I re-started this project a few days ago, I haven’t spent very long stuck on any one error. I’m constantly having problems, but I move through them pretty quickly! In the meantime, I’m learning a lot about the model and how it fits everything together during installation. I’ve also come a long way with Linux programming in general. Considering that when I first installed Ubuntu a few months ago, I sheepishly called my friend to ask where to find the command line, I’m quite proud of my progress!
I hope this article will help future Ubuntu users install CESM, as it seems to have a few quirks that even Mac OS X doesn’t experience (eg make vs gmake). For the rest of you, apologies if I have bored you to tears!
Hi, Kate. I am about to keep you company, with my third attempt to get this beast running, in my case on a Red Hat cluster. I have been resisting getting started. This is really my most unfavorite part of the whole business, getting a new model running.
You can see my previous attempt starting here: http://aardvarknoplay.blogspot.com/2010/12/after-much-moaning.html
I do kvetch and moan a lot when I do this stuff. Your calm perseverance is something of an inspiration to me, though you’d expect me to be used to it by now. And while I encourage keeping a log in a public place so people can learn from each other, I think it’s worthwhile to start another blog for the purpose because the general readership won’t be interested.
I doubt I’ll be blogging on model installation much more; it isn’t the main part of my job. I am mostly looking at the code for models. Running them is mostly just for fun, as well as getting a feel for the output and user interface. -Kate
Are you planning to use the T31 resolution? Probably on a small machine that would be best.
“It’s never a good sign when the installation guide reminds you to be patient!” +1 QOTW
gmake vs. make: I found the following advice:
sudo ln -s `which make` /usr/local/bin/gmake
after which gmake should work too
Yes, I tried that first, but for some reason it didn’t work! Typing “gmake” into the terminal still brought up “command not found”. -Kate
I think mpif90 and mpicc are the message passing libraries used and f90 is probably the one that knows how to send messages to fortran processes. Looking at MPI docs might help explain ‘mpi_fortran_argv_null’
Seems you’re having fun! :) Personally I think software porting isn’t really something that should be left to OS novices. So essentially you’re diving into the deep end.
A more proper fix would be to instead use 'gcc -fPIC -shared', and also change all the compilation commands that produce .o files destined to go into a shared library to read 'gcc -fPIC' instead of just 'gcc'.
This’ll produce shared libraries containing position-independent code (PIC); the advantage of PIC is that if two separate processes load the same shared library, there’ll only need to be one copy of the library (not two) in physical memory.
About the underscores/no-underscores thing, I can think of three reasons for the mismatch:
(1) you compiled part of the code with one underscoring convention, and another with a different underscoring convention;
(2) you used a precompiled Fortran library built for a different underscoring convention;
(3) the Fortran model code interfaces with code not written in Fortran, e.g. C or assembly language.
Generally, if all the code were written in Fortran, then it should be unnecessary for you to specify any underscoring convention either way. You should be able to expect all existing libraries to be compiled with the same convention, either all will be compiled with underscores (everyone uses '_mpi_fortran_argv_null'), or all will be compiled without underscores (everyone uses 'mpi_fortran_argv_null').
(I’ve not yet looked at the ModelE code myself. But perhaps I should!)
It is very, very nice. Very clean and easy to understand and execute. Especially compared to CESM, which is now finding imaginary syntax errors in the run files… -Kate
Syntax? Oh, I despair: they’re gonna tax syn now?
I’m trying to port the CESM 1.0.3 to a RedHat cluster, but I’ve run into an error that I’m not sure how to resolve.
Error: CASE label at (1) overlaps with CASE label at (2)
If you’ve run across this error in your work and could offer any advice it would be greatly appreciated.
I haven’t come across that particular error, but if I were you I would take a look at your Macros file. When the error is with mct or pio, it usually means that your environment variables are incorrect. Double check your paths to MPI and NetCDF, and play around with some compiler flags. Good luck, and let us know if you figure it out!
Thanks! I never found the source of that error, but I downloaded the libopenmpi-dev package, and used the mpif90.openmpi and mpicc.openmpi commands for FC and CC and everything built successfully.
Could you please help me fix it… Thank you, Kate.