I apologize for my brief hiatus – it’s been almost two weeks since I’ve posted. I have been very busy recently, but for a very exciting reason: I got a job as a summer student of Dr. Steve Easterbrook! You can read more about Steve and his research on his faculty page and blog.
This job required me to move cities for the summer, so my mind has been consumed with thoughts such as “Where am I and how do I get home from this grocery store?” rather than “What am I going to write a post about this week?” However, I have had a few days on the job now, and as Steve encourages all of his students to blog about their research, I will use this outlet to periodically organize my thoughts.
I will be doing some sort of research project about climate modelling this summer – we’re not yet sure exactly what, so I am starting by taking a look at the code for some GCMs. The NCAR Community Earth System Model is one of the easiest to access, as it is largely an open source project. I’ve only read through a small piece of their atmosphere component, but I’ve already seen more physics calculations in one place than ever before.
I quickly learned that trying to understand every line of the code is a silly goal, as much as I may want to. Instead, I’m trying to get a broader picture of what the programs do. It’s really neat to have my knowledge about different subjects converge so completely. Multi-dimensional arrays, which I have previously only used to program games of Sudoku and tic-tac-toe, are now being used to represent the entire globe. Electric potential, a property I last studied in the circuitry unit of high school physics, somehow impacts atmospheric chemistry. The polar regions, which I was previously fascinated with mainly for their wildlife, also present interesting mathematical boundary cases for a climate model.
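To give a sense of what I mean, here is a toy sketch I made up myself (this is not actual CESM code; the grid dimensions and variable names are invented): a global field such as temperature can be stored as a three-dimensional array indexed by longitude, latitude, and vertical level.

```fortran
! Toy sketch only: the grid sizes and names here are invented, not CESM's.
program global_grid
  implicit none
  integer, parameter :: nlon = 144   ! number of longitude points
  integer, parameter :: nlat = 96    ! number of latitude points
  integer, parameter :: nlev = 26    ! number of vertical levels
  real :: temperature(nlon, nlat, nlev)  ! one value per grid cell, in kelvin
  integer :: i, j, k

  ! A real model fills each cell from the physics at every time step;
  ! here we just plug in a single made-up value.
  do k = 1, nlev
     do j = 1, nlat
        do i = 1, nlon
           temperature(i, j, k) = 288.0
        end do
     end do
  end do

  print *, 'total grid cells:', nlon * nlat * nlev
end program global_grid
```

It is the same indexing idea as a Sudoku board, just wrapped around a sphere and stacked into the vertical.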
It’s also interesting to see how the collaborative nature of CESM, written by many different authors and designed for many different purposes, impacts its code. Some of the modules have nearly a thousand lines of code, and some have only a few dozen – it all depends on the programming style of the various authors. The commenting ranges from extensive to nonexistent. Every now and then one of the files will be written in an older version of Fortran, where EVERYTHING IS IN UPPER CASE.
I am bewildered by most of the variable names. They seem to be collections of abbreviations I’m not familiar with. Some examples are “mxsedfac”, “lndmaxjovrdmdni”, “fxdd”, and “vsc_knm_atm”.
When we get a Linux machine set up (I have heard too many horror stories to attempt a dual-boot with Windows) I am hoping to get a basic CESM simulation running, as well as EdGCM (this could theoretically run on my laptop, but I prefer to bring that home with me each evening, and the simulation will probably take over a day).
I am also doing some background reading on the topic of climate modelling, including this book, which led me to the story of PHONIAC. The first weather prediction done on a computer (the ENIAC machine) was recreated as a smartphone application, and ran approximately 3 million times faster. Unfortunately, I can’t find anyone with a smartphone that supports Java (argh, Apple!) so I haven’t been able to try it out.
I hope everyone is having a good summer so far. A more traditional article about tornadoes will be coming at the end of the week.
I really don’t like looking at other people’s code.
Some programmers try to be clever and write code that is difficult to understand. You can sit staring at it for ages and it doesn’t make sense.
I try to use verbose and logical names for variables, functions, etc. In the past it was important to be economical because individual memory bytes were precious, but today there is little excuse for cryptic names.
Plenty of comments in the code are also important.
And it’s not just for other people: readable, well-documented code benefits the original author too, since after a year or two the code becomes gobbledygook even to its originator if it isn’t documented.
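To illustrate, here is a made-up snippet (not from any real program; the names and the formula are just for show) comparing the economical style with a self-documenting one:

```fortran
! Made-up example: both halves compute the same thing (ideal gas density).
program naming_example
  implicit none

  ! Economical style: legal, but the reader has to guess what anything means.
  real :: t, p, r, rho
  ! Verbose style: the same quantities, named so they document themselves.
  real :: air_temperature_k, air_pressure_pa, gas_constant_dry_air, air_density

  t = 288.0
  p = 101325.0
  r = 287.0
  rho = p / (r * t)

  air_temperature_k    = 288.0      ! kelvin
  air_pressure_pa      = 101325.0   ! pascals
  gas_constant_dry_air = 287.0      ! J/(kg K)
  ! Ideal gas law: density = pressure / (specific gas constant * temperature)
  air_density = air_pressure_pa / (gas_constant_dry_air * air_temperature_k)

  print *, rho, air_density   ! identical results, very different readability
end program naming_example
```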
If you view the source code on the data visualisation app:
http://www.skepticalscience.com/climate_science_history.php
You’ll see my ‘style’. I’m not paid these days to do a lot of coding, so maybe I’m out of touch. But having looked at a lot of code on the internet, I’m often appalled by the lack of explanation and documentation for the code.
Your style of coding, particularly in naming variables, seems very similar to what they teach us in intro comp sci these days. -Kate
Kate,
Well done.
Don’t be too critical of the older code. Times were very different then, with storage at a huge premium. New code would be rewritten to remove characters, not just lines. Comment lines were kept to a minimum.
On the other hand, it means there is considerable scope to update the old code. You would be amazed how much they achieved with such restricted computational power.
That is how the millennium bug arrived: in the sixties and early seventies, using four digits for a year was an unaffordable luxury. Even then they knew it would be a problem, but not one that could be addressed at that stage.
Makes sense. Thanks for the context. I’m not sure exactly when various pieces of this code were written. -Kate
The Unix-based Apple Computer is the only way to go. OS X 10.6 is far better than Microsoft’s lame excuse of an operating system. I’m currently using XCode 3.2.6 to develop software. This version uses the GNU gcc 4.2 compiler, which is open source and used around the world. If you’ve used Visual Studio, then learning XCode will be a snap. I worked many years in the Unix environment and was eventually forced onto PCs and Microsoft’s nightmare, where I used Visual Studio for a number of years. Now that I’ve retired I’m back to my home friend, the Mac and XCode, where we get along just fine.
Bewildered by cryptic variable names? Back in the old days before you were in diapers and when computer memory consisted of little magnetic doughnuts called cores, variable names were limited to eight characters or so. Programmers were forced to invent shorthand that only they could understand until three weeks or three months later. And of course they didn’t write anything down. What was their solution when the program needed updating or fixing? Hand it to some students and let them figure it out! Been there. Done that.
Yes, I can definitely see how a Mac would be preferable to a Windows machine for programming purposes. I can’t really afford the hardware, though! :) -Kate
Except that Real Programmers code in text, so it doesn’t matter what the target platform is.
Oh dear: are we about to descend into another iteration of the old-as-the-hills PC -v- Mac debate?
Macs may be expensive, but they last a long time and the hardware is generally reliable. I think it is a question of quality versus quantity. At least this has been my experience.
I use both a Mac and a rebuilt secondhand PC.
If the model you’re working on is anything like some of the commercial software I’ve worked on, you’re not going to be able to understand all of it. The most you can hope for is for it to be modularized enough that you can understand a single module.
A summer is also a VERY short time in which to try and become a contributor to a large-scale software project in which thousands of man-years have already been invested. I’d be really shocked if you managed to do more than make minor tweaks as you ramp up.
I’m not planning to make any changes to the model – just understand the general architecture so I can write about it, compare similar projects, etc. -Kate
“(I have heard too many horror stories to attempt a dual-boot with Windows)”
Actually it isn’t that hard. The main thing to remember is to install Windows first, then Linux. Most Linux installers will see the Windows install and properly configure the bootloader.
But if you don’t want to mess around with partitions and boot loaders, Ubuntu (one of the more user-friendly Linux distros) has a super easy to use installer that runs within Windows and sets everything up automatically. And as a plus, if you decide you don’t like it, you can get rid of the whole thing in the add/remove programs list.
And even though this sounds like a virtual machine, it isn’t. http://www.ubuntu.com/download/ubuntu/windows-installer
The other option is setting up a virtual machine like VirtualBox. Also pretty easy, but your computer should be fairly powerful if you expect decent performance.
And if you really want a Mac, you can get OS X to run on PC hardware, but it takes some tinkering. Look up “Hackintosh”.
The lab has some extra Linux boxes lying around, so I’ll probably just use one of those. Thanks for the tips, though. -Kate
Eh, you can install a Linux copy on a loopback disk inside Windows using Wubi:
https://wiki.ubuntu.com/WubiGuide
Avoids the “nightmare” of repartitioning. Works fine for me!
…and by the way, is this the first time you’ve installed any version of Unix yourself? If so, savour the moment you log in as root for the very first time; it’s a rite of passage, your entry into a greater world, stepping into a stream of history that has seen greatness… back when the ships were of wood and the men of iron, eh, I mean when men were men and wrote their own device drivers… (note to self: find a less sexist saying)
No, I’ve already installed Ubuntu on an old desktop back home that won’t run anything else. Don’t worry, though, I savoured it :) -Kate
Ah, staring at obscure FORTRAN code. Welcome to the club!
If you’re still interested in looking at PHONIAC, then you might be able to run it on your laptop using the mpowerplayer SDK. It’s a program that runs MIDP apps on a normal computer with Java installed. MIDP is the programming interface used when writing mobile phone apps in Java, and mpowerplayer can serve as a sort of emulator while developing them. I haven’t used it for 5 years (but it doesn’t seem to have changed… hmm); other MIDP SDKs are available (especially for Linux).
That long variable name seems the easiest to understand.
Natural log of dmaxj over dmdni, a partial derivative, perhaps.
Keep us posted on your summer research!
I wonder if you’d be able to help a non-scientist understand something about GCMs? I’m a (Java-coding) social scientist with not much maths and a lot of tie-dye t-shirt level knowledge of chaotic systems. In my limited understanding of modelling, I’ve always been confused about what the purpose of GCMs is. In my own field, I’m always a little suspicious of folks adding more detail to models, since often I can’t see a justification for it, and all it does is obscure what would otherwise be a simpler underlying dynamic.
With GCMs, I presume the reason for adding more detail is that it’s the boundary conditions that are of interest, and the more micro-detail (and the more ensembles run with it), the better the fix on the statistically likely macro outcomes at the boundary. Is that what GCMs are after, i.e. simply more accurate global mean modelling? Would I be right in saying that we can’t expect decent higher-resolution predictions? I have some vague notion I read somewhere that people *were* after regional-level prediction, but I wonder if that’s considered possible?
A little knowledge is a dangerous thing: I’ve only a little knowledge and am confused! If you ever had time to write something on the overall aims of the various GCM projects, that’d be awesome. (Or if there are some good links you can recommend…)