Hi Fiona,
we also ran into the problem with the compiler messages.
Our IBM site engineer Michael Hennecke found a workaround and
we have opened PMR 92722,033,724 for this.
Regards
Jutta
Problem:
XL compiler runtime errors on BlueGene only produce
a 15xx-xxx error number, but no error text.
.
This is apparently caused by the fact that the
message catalogs reside in /opt/ibmcmp/msg,
which exists on the SN and FENs but not on the
BG nodes.
Here is a workaround, verified with V1R1M0:
.
# copy the message catalogs to a place below /bgl...
mkdir -p /bgl/local/opt/ibmcmp/msg/
cp -rp /opt/ibmcmp/msg/en_US /bgl/local/opt/ibmcmp/msg/
.
# create a SITEDIST rc file to point /opt to it on the IONs...
mkdir -p /bgl/dist/etc/rc.d/init.d
cd /bgl/dist/etc/rc.d/init.d
cat <<E_O_F >> ./xlmsg
# link to XL compiler runtime message catalogs
#
export LANG=en_US
ln -s /bgl/local/opt /opt
E_O_F
.
# activate the new rc file for runlevel 3...
mkdir -p /bgl/dist/etc/rc.d/rc3.d
cd /bgl/dist/etc/rc.d/rc3.d
ln -s ../init.d/xlmsg ./S10xlmsg
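To double-check the three steps above before rebooting a partition, something like the following may help. It is only a sketch: check_xlmsg_setup and its optional prefix argument are made up for illustration (the prefix lets you try it in a scratch tree), and it only tests that the paths exist, not that the IONs actually pick them up.

```shell
#!/bin/sh
# Hypothetical helper, not part of any IBM tooling: checks that the files
# created by the workaround exist. $1 is an optional path prefix (empty on
# the real service node, a scratch directory when experimenting).
check_xlmsg_setup() {
    root="${1:-}"
    rc=0

    # step 1: the copied message catalogs
    if [ -e "$root/bgl/local/opt/ibmcmp/msg/en_US" ]; then
        echo "ok: message catalogs found"
    else
        echo "missing: $root/bgl/local/opt/ibmcmp/msg/en_US"
        rc=1
    fi

    # step 2: the rc file written by the here-document
    if [ -f "$root/bgl/dist/etc/rc.d/init.d/xlmsg" ]; then
        echo "ok: rc file found"
    else
        echo "missing: $root/bgl/dist/etc/rc.d/init.d/xlmsg"
        rc=1
    fi

    # step 3: the runlevel 3 symlink pointing at the rc file
    if [ -L "$root/bgl/dist/etc/rc.d/rc3.d/S10xlmsg" ]; then
        echo "ok: rc3.d link found"
    else
        echo "missing: $root/bgl/dist/etc/rc.d/rc3.d/S10xlmsg"
        rc=1
    fi

    return $rc
}
```

Calling check_xlmsg_setup without an argument on the SN checks the real paths; a non-zero return means at least one piece is missing.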
.
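On the exit status part of the quoted question: the 133 reported together with "killed with signal 5" looks consistent with the usual shell convention of reporting 128 plus the signal number (128 + 5 = 133). I have not confirmed that mpirun on BG/L follows this in every case, so treat the sketch below as an assumption; decode_status is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: interpret an exit status using the common
# "128 + signal number" convention. Whether mpirun always follows this
# convention is an assumption, not something taken from the BG/L docs.
decode_status() {
    code="$1"
    if [ "$code" -gt 128 ]; then
        # above 128: assume the job was terminated by a signal
        echo "signal $((code - 128))"
    else
        # otherwise: an ordinary exit code from the program
        echo "exit code $code"
    fi
}

decode_status 133   # prints "signal 5"
decode_status 1     # prints "exit code 1"
```

On Linux, signal 5 is SIGTRAP, and "kill -l 5" on a front-end node should print the name. The second case would match the "killed by exit(1) on node 0" message, i.e. the program itself called exit(1).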
Fiona Reid wrote:
Hi Everyone,
I have a question regarding the error codes returned by mpirun:
For helloworld which runs normally:
<Nov 4 10:50:30> FE_MPI (Info) : BG/L job exit status = (0)
<Nov 4 10:50:44> FE_MPI (Info) : == Exit status: 0 ==
For a helloworld which is forced to core dump:
<Nov 4 10:53:05> BE_MPI (ERROR): The error message in the job record is as follows:
<Nov 4 10:53:05> BE_MPI (ERROR): "killed with signal 5"
<Nov 4 10:53:06> FE_MPI (Info) : BG/L job exit status = (133)
<Nov 4 10:53:21> FE_MPI (Info) : == Exit status: 133 ==
For the OCCAM code which crashes but doesn't produce a core:
1525-003
1525-003
1525-001
<Nov 4 10:57:10> BE_MPI (ERROR): The error message in the job record is as follows:
<Nov 4 10:57:10> BE_MPI (ERROR): "killed by exit(1) on node 0"
<Nov 4 10:57:11> FE_MPI (Info) : BG/L job exit status = (1)
<Nov 4 10:57:26> FE_MPI (Info) : == Exit status: 0 ==
Can anyone explain what the different error codes mean?
E.g. what do 1525-001 and 1525-003 mean?
what does "killed with signal 5" mean?
what does "killed by exit(1) on node 0" mean?
what does "Exit status: 133" mean?
Many thanks,
Fiona
- --------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.
--
--------------------------------------------------------------
Jutta Docter E-mail: J.Docter@xxxxxxxxxxxxx
Forschungszentrum Juelich GmbH Phone: (+49) 2461 61-6763
ZAM Fax: (+49) 2461 61-6656
D 52425 Juelich GERMANY
--------------------------------------------------------------