
Re: [bgl-discuss] Blue Gene and mpirun error codes



 Dear Susan,
your 'cp' is perfectly right. I just copied an old passage without
verifying it again.
Sorry
Jutta


Susan Coghlan wrote:
Hi Jutta,

Thank you for responding to this email.  I'm setting up what you've
suggested and have a quick question.  The mkdir command makes the
directory /bgl/local/opt/ibmcmp/msg but the cp command that copies the
actual translation files from the standard compiler installation directory
copies them to a different directory -  /bgl/local/opt/xlcmp/msg/.  xlcmp
instead of ibmcmp.  My guess is that they are trying to set up the env
on the IONodes to be identical to the standard env (i.e. the ones on
the frontends).  If so, then xlcmp is wrong and the copy command
should be:

cp -p /opt/ibmcmp/msg/en_US /bgl/local/opt/ibmcmp/msg/

Is that correct?  Or do I need to make the /bgl/local/opt/xlcmp/msg dir
and put the en_US dir in there?

Thanks,
Susan.
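Susan's reasoning can be checked mechanically. After the rc script runs `ln -s /bgl/local/opt /opt` on an I/O node, the XL runtime's lookup in /opt/ibmcmp/msg resolves to the same relative path under /bgl/local/opt. The helper below, `ion_path_to_bgl`, is a hypothetical name used only for illustration. It performs that mapping, and it shows that the catalogs must sit under /bgl/local/opt/ibmcmp/msg. If they are copied under xlcmp instead, the runtime will not find them.

```shell
#!/bin/sh
# Hypothetical helper: given a path as the XL runtime sees it on an I/O node
# (where /opt is a symlink to /bgl/local/opt), print where that file must
# actually live in the shared /bgl tree.
ion_path_to_bgl() {
    case $1 in
        /opt)   echo /bgl/local/opt ;;
        /opt/*) echo "/bgl/local/opt/${1#/opt/}" ;;
        *)      echo "$1" ;;   # paths outside /opt are not redirected
    esac
}
```

For example, `ion_path_to_bgl /opt/ibmcmp/msg/en_US` prints `/bgl/local/opt/ibmcmp/msg/en_US`. That is the directory Susan's corrected `cp` targets.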

On Fri, 4 Nov 2005, Jutta Docter wrote:


Hi Fiona,
we also ran into the problem with the compiler messages.
Our IBM site engineer Michael Hennecke found a workaround and
we have opened PMR 92722,033,724 for this.
Regards
Jutta


Problem: XL compiler runtime errors on Blue Gene produce only a 15xx-xxx error number, with no message text. This is apparently because the message catalogs reside in /opt/ibmcmp/msg, which exists on the service node (SN) and front-end nodes (FENs) but not on the BG nodes.

Here is a workaround, verified with V1R1M0:
# copy the message catalogs to a place below /bgl...
mkdir -p /bgl/local/opt/ibmcmp/msg/
cp -p /opt/ibmcmp/msg/en_US /bgl/local/opt/xlcmp/msg/
# create a SITEDIST rc file to point /opt to it on the IONs...
mkdir -p /bgl/dist/etc/rc.d/init.d
cd       /bgl/dist/etc/rc.d/init.d
cat <<E_O_F >> ./xlmsg
# link to XL compiler runtime message catalogs
#
export LANG=en_US
ln -s /bgl/local/opt /opt
E_O_F
# activate the new rc file for runlevel 3...
mkdir -p /bgl/dist/etc/rc.d/rc3.d
cd       /bgl/dist/etc/rc.d/rc3.d
ln -s ../init.d/xlmsg ./S10xlmsg
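The procedure above can be collected into one script, using the copy target corrected in Susan's message (ibmcmp instead of xlcmp). This is a sketch under stated assumptions, not the PMR fix. The function name `install_xlmsg` and its two optional arguments are hypothetical. They were added only so that the paths can be overridden. The script uses `cp -R` because en_US is a directory. It writes the rc file with `>` instead of `>>`, so a rerun does not duplicate the file's contents. The `chmod` assumes that the SITEDIST rc scripts are executed, not sourced. Like the original, the script assumes that /opt does not already exist on the I/O nodes.

```shell
#!/bin/sh
# Sketch of the XL message-catalog workaround for BG/L I/O nodes.
# install_xlmsg and its arguments are hypothetical (not from the original post).
install_xlmsg() {
    src=${1:-/opt/ibmcmp/msg/en_US}   # catalogs as installed on the SN/FENs
    bgl=${2:-/bgl}                    # shared /bgl tree visible to the IONs

    # Copy the catalogs; en_US is a directory, so -R is required.
    mkdir -p "$bgl/local/opt/ibmcmp/msg" || return 1
    cp -Rp "$src" "$bgl/local/opt/ibmcmp/msg/" || return 1

    # Write the SITEDIST rc file (overwrite, so reruns stay idempotent).
    mkdir -p "$bgl/dist/etc/rc.d/init.d" "$bgl/dist/etc/rc.d/rc3.d" || return 1
    cat > "$bgl/dist/etc/rc.d/init.d/xlmsg" <<'E_O_F'
#!/bin/sh
# link to XL compiler runtime message catalogs
export LANG=en_US
ln -s /bgl/local/opt /opt
E_O_F
    # Assumption: rc scripts are executed, so mark the file executable.
    chmod 755 "$bgl/dist/etc/rc.d/init.d/xlmsg" || return 1

    # Activate it for runlevel 3; -f replaces a link left by an earlier run.
    ln -sf ../init.d/xlmsg "$bgl/dist/etc/rc.d/rc3.d/S10xlmsg"
}
```

On the service node, `install_xlmsg` with no arguments uses the paths from the original recipe. Passing two arguments lets you try the procedure against a scratch directory first.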

Fiona Reid wrote:

Hi Everyone,

I have a question regarding the error codes returned by mpirun:

For helloworld which runs normally:
<Nov  4 10:50:30> FE_MPI (Info) : BG/L job exit status = (0)
<Nov  4 10:50:44> FE_MPI (Info) : == Exit status:   0 ==

For a helloworld which is forced to core dump:
<Nov  4 10:53:05> BE_MPI (ERROR): The error message in the job record is as follows:
<Nov  4 10:53:05> BE_MPI (ERROR):   "killed with signal 5"
<Nov  4 10:53:06> FE_MPI (Info) : BG/L job exit status = (133)
<Nov  4 10:53:21> FE_MPI (Info) : == Exit status: 133 ==

For the OCCAM code, which crashes but doesn't produce a core:
1525-003
1525-003
1525-001
<Nov  4 10:57:10> BE_MPI (ERROR): The error message in the job record is as follows:
<Nov  4 10:57:10> BE_MPI (ERROR):   "killed by exit(1) on node 0"
<Nov  4 10:57:11> FE_MPI (Info) : BG/L job exit status = (1)
<Nov  4 10:57:26> FE_MPI (Info) : == Exit status:   0 ==

Can anyone explain what the different error codes mean?
E.g. what do 1525-001 and 1525-003 mean?
    what does "killed with signal 5" mean?
    what does "killed by exit(1) on node 0" mean?
    what does "Exit status: 133" mean?

Many thanks,

Fiona
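The thread does not answer the exit-status question, so the following is only a hedged illustration. The 133 reported for the core-dump case equals 128 + 5. That matches the common Unix shell convention of reporting death by signal N as status 128 + N, and it also matches the "killed with signal 5" line in the same log. The thread does not confirm that BG/L mpirun follows this convention. The helper `decode_status` is a hypothetical name. It applies the convention and asks the local shell's `kill -l` for the signal name.

```shell
#!/bin/sh
# Hypothetical helper: interpret a job exit status under the shell convention
# that a process killed by signal N is reported as 128 + N.
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l maps a signal number to its name on the local system
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with code $status"
    fi
}
```

On a Linux host, `decode_status 133` prints `killed by signal 5 (TRAP)`, and `decode_status 1` prints `exited with code 1`.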

--------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.


--
--------------------------------------------------------------
Jutta Docter                    E-mail: J.Docter@xxxxxxxxxxxxx
Forschungszentrum Juelich GmbH  Phone:  (+49) 2461 61-6763
ZAM                             Fax:    (+49) 2461 61-6656
D 52425 Juelich                 GERMANY
--------------------------------------------------------------
