Now that my poster is finished, I am taking one last crack at getting CESM to run. Last time I wrote, I mentioned that the model execution was failing without giving any error messages (except the occasional “Segmentation fault”).
Michael Tobis thought that the problem had to do with mpiexec, so today I tried something new. I uninstalled mpich2 and replaced it with openmpi which I had built manually (as opposed to using apt-get). Now, when the model fails, the ccsm.log file actually says something:
mpiexec noticed that process rank 0 with PID 1846 on node
computer name exited on signal 11 (Segmentation fault).
15 total processes killed (some possibly by mpiexec during cleanup)
Perhaps the problem is still with MPI. It seems unlikely that the segfault is due to a problem with the code itself (eg an undeclared variable), seeing as this version has been tested and used by NCAR. Maybe gcc is the issue, and I should play around with some compiler flags? Any suggestions would be welcome.