[bgl-discuss] NFS problems with multiple writes from BGL applications
Hello,
we are seeing missing or zero-length files when applications
write from multiple processes to NFS filesystems and these
applications are rerun at short intervals. The failures show up
in the mmcs log as: nfs_refresh_inode: inode number mismatch
We have reported this in PMR 92548,033,724, and IBM has traced
the problem to the following:
BG/L reboots for each job, which causes the same XIDs (RPC
transaction IDs) to be used over and over again.
Some NFS requests may then appear to the NFS server to be
duplicates, and it sends cached but stale replies to the
clients.
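
To illustrate the mechanism, here is a minimal sketch in Python. It is
a toy model, not IBM's analysis and not real NFS code: the server keeps
a reply cache keyed only by (client address, XID), and a rebooted
client restarts its XID counter at the same value. All class and
function names are hypothetical, and a real server's duplicate request
cache checks more than these two fields.

```python
class ToyNfsServer:
    """Toy server with a reply cache keyed by (client address, XID)."""

    def __init__(self):
        self.files = {}        # file name -> contents
        self.reply_cache = {}  # (client, xid) -> reply sent earlier

    def handle_create(self, client, xid, name):
        key = (client, xid)
        if key in self.reply_cache:
            # Looks like a retransmission: resend the old reply
            # without executing the request again.
            return self.reply_cache[key]
        self.files[name] = b""
        reply = "created " + name
        self.reply_cache[key] = reply
        return reply


class ToyClient:
    """Toy client whose XID counter restarts at 1 after every boot."""

    def __init__(self, addr, server):
        self.addr = addr
        self.server = server
        self.next_xid = 1  # assumption: same start value on each boot

    def create(self, name):
        xid = self.next_xid
        self.next_xid += 1
        return self.server.handle_create(self.addr, xid, name)


def run_two_jobs():
    server = ToyNfsServer()

    # Job 1: the node boots and creates its output file.
    job1 = ToyClient("ionode-1", server)
    job1.create("out1.dat")

    # Job 2: the node has rebooted, so the XID counter starts over.
    job2 = ToyClient("ionode-1", server)
    reply = job2.create("out2.dat")

    return reply, sorted(server.files)


if __name__ == "__main__":
    reply, files = run_two_jobs()
    print(reply)  # stale reply from job 1
    print(files)  # out2.dat was never created
```

In this model the second job's create request is answered from the
cache with the reply meant for the first job, so out2.dat never exists
on the server. That matches the missing files described above, under
the stated simplifications.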
Unfortunately the server cache cannot be disabled with nfso.
As a workaround, one can generate enough activity between
the runs to get the cache flushed, or call mpirun with the
option -nofree. In the latter case the partition has to be
freed manually after the runs have finished.
So we are wondering: has anyone else experienced this problem, and
who else is operating BGL with user filesystems mounted via NFS?
Regards
Jutta Docter
--
--------------------------------------------------------------
Jutta Docter E-mail: J.Docter@xxxxxxxxxxxxx
Forschungszentrum Juelich GmbH Phone: (+49) 2461 61-6763
ZAM Fax: (+49) 2461 61-6656
D 52425 Juelich GERMANY
--------------------------------------------------------------
--------------------------------------------------------------------
To add or remove yourself from this mailing list, use the 'notifyme'
command on any BGL machine. To remove: notifyme -n, to add: notifyme -y.