[bgl-discuss] mpi_abort does not stop the job
In one of my runs, the slaves detected an error and called MPI_Abort.
The master, however, did not, and the job simply hung the next time the
master tried to communicate with the slaves. It was killed only when it
ran out of time.
The stderr contains the following, showing that MPI_Abort was called:
7: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 7
2: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
8: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 8
1: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
3: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
4: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 4
9: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 9
5: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 5
6: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 6
Has this been reported as a bug to IBM already?
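For anyone triaging a similar hang, the stderr excerpt above can be checked mechanically to see which ranks reported the abort and which stayed silent. The following is only a minimal sketch: the function names are hypothetical, and the job size of 10 processes (one master plus nine slaves) is an assumption, since the post does not state how many processes were in the job or which rank was the master.

```python
import re

# Matches lines such as:
#   "7: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 7"
ABORT_LINE = re.compile(
    r"^\s*\d+: application called MPI_Abort\(MPI_COMM_WORLD, -?\d+\)"
    r" - process (\d+)\s*$"
)


def aborted_ranks(stderr_text):
    """Return the sorted ranks that printed an MPI_Abort message."""
    ranks = set()
    for line in stderr_text.splitlines():
        match = ABORT_LINE.match(line)
        if match:
            ranks.add(int(match.group(1)))
    return sorted(ranks)


def silent_ranks(stderr_text, world_size):
    """Return the ranks in [0, world_size) with no MPI_Abort message."""
    seen = set(aborted_ranks(stderr_text))
    return [rank for rank in range(world_size) if rank not in seen]


# The stderr lines quoted in the post.
SAMPLE = """\
7: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 7
2: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 2
8: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 8
1: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
3: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
4: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 4
9: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 9
5: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 5
6: application called MPI_Abort(MPI_COMM_WORLD, 0) - process 6
"""

if __name__ == "__main__":
    # ASSUMPTION: a 10-process job; the post does not give the job size.
    print("aborted:", aborted_ranks(SAMPLE))
    print("silent: ", silent_ranks(SAMPLE, world_size=10))
```

Under that assumed job size, the only rank without an abort message is rank 0, which would fit the description of a master that kept running after the slaves had called MPI_Abort.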