[Bio-linux-dev] bio-linux-backups 'sync' problem
Tony Travis
a.travis at abdn.ac.uk
Sun Oct 17 15:57:15 EDT 2010
On 14/10/10 11:20, Tim Booth wrote:
> Hi Tony,
>
> Interesting. I just took a USB stick, made 2 partitions and formatted
> one as NTFS and the other as VFAT. I then plugged it in and let the
> hotplug magic auto-mount the two partitions under /media. I then
> started 2 simultaneous processes doing "cat /dev/urandom > foo" on each
> partition and when they were under way I yanked the stick out.
>
> In this case, my system sorted itself out right away. "IO Error" is
> printed, the writing processes are killed and the mounts are cleared up.
> Sync works fine. This is what I hoped would happen on a modern Linux
> kernel. If I end up with my system in a state where "sync" hangs
> indefinitely then I generally reckon there is a big problem and an
> urgent reboot is required. What I don't know is how commonly this
> results from untimely removal of a USB stick or other factors (Novell
> network mounts are bad for this). Most probably, whether untimely
> removal of a USB device triggers the problem depends on the exact
> hardware, the kernel drivers, DBUS quirks and any number of timing
> conditions.
Hi, Tim.
In our case, "mount.ntfs-3g" itself was the dead-locked process: It
seems likely that the USB stick had been removed while it was still
being auto-mounted, as in plug the stick in then yank it out because the
user changed their mind or nothing seemed to be happening...
However unlikely this scenario is, it actually happened on one of our
NBXs and, as a consequence, the NBX concerned was not backed up for ten
days. I do monitor the systems, of course, but I didn't notice this
failure because it looked like the dumps were still in progress! In
fact, the dump had stalled, waiting for "sync" to complete...
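A stalled job like this can look identical to one that is still running. One way a monitoring check might tell them apart on Linux is to look for processes stuck in uninterruptible sleep (state `D` in `/proc/<pid>/stat`), which is often where a hung `mount.ntfs-3g` or `sync` sits. The Python below is a minimal sketch, not part of the Bio-Linux backup scripts, and the function names are hypothetical. A single `D` sample is normal during disk I/O, so it assumes the caller samples repeatedly and only raises an alarm for a process that stays in `D` for many minutes.

```python
import os


def parse_stat(text: str):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the *last* ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def uninterruptible_processes(proc_root: str = "/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/sys, /proc/self, etc.
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited mid-scan, or stat was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(f"{pid} {comm} is in uninterruptible sleep")
```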
Actually, in the good old days, we used to run "sync; sync" before
rebooting Unix because the first "sync" does not return until all the
deferred writes are flushed to disk. The snag is that if "sync" can't
flush the buffers to disk, it never returns.
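Because `sync` blocks until the flush completes, a script that calls it directly inherits the hang. A minimal Python sketch, assuming the caller only wants to know whether `sync` returned within a deadline: it starts `sync` as a child process and stops waiting after `timeout_s` seconds. It deliberately does not kill or reap a timed-out `sync`, because a process blocked in the kernel usually cannot be killed until the I/O completes; that is also why it avoids `subprocess.run(..., timeout=...)`, which waits for the killed child to exit. The function name and the 600-second default are arbitrary choices.

```python
import subprocess
import sys


def sync_completed(timeout_s: float = 600.0, cmd=("sync",)) -> bool:
    """Run `sync` in a child process, waiting at most timeout_s seconds.

    Returns True only if the command exited successfully in time. On timeout
    the child is left running: a hung sync is normally stuck in the kernel,
    where even SIGKILL is not acted on until it wakes, so waiting to reap it
    would hang the caller too.
    """
    proc = subprocess.Popen(list(cmd))
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0


if __name__ == "__main__":
    if not sync_completed(timeout_s=600):
        print("backup: sync did not complete within 10 minutes; "
              "the machine needs attention", file=sys.stderr)
        sys.exit(1)
```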
> My inclination would be to not try and make the backup script work
> around this specific problem. Rather than pressing on after a failed
> sync the user should really be alerted that the machine is not behaving.
> Perhaps a more general solution would be to start a watchdog process at
> the start of the backup script. After ten minutes the watchdog looks
> for evidence that the backup is running properly and if not it shouts
> loudly for sysadmin intervention.
>
> What do you think?
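Tim's watchdog idea could be sketched roughly as follows. This is a minimal Python sketch under assumptions not in the thread: the backup is assumed to touch a hypothetical progress file (`/backups/.progress`) as it runs, "evidence that the backup is running properly" is taken to mean that file was modified recently, and "shouting" is just a message on stderr (a real deployment would e-mail the sysadmin). It also uses a timer thread instead of the separate watchdog process Tim describes, for brevity; a separate process would keep working even if the script itself were killed.

```python
import os
import sys
import threading
import time


def progress_is_recent(path: str, max_age_s: float) -> bool:
    """True if path exists and was modified in the last max_age_s seconds."""
    try:
        return time.time() - os.path.getmtime(path) <= max_age_s
    except FileNotFoundError:
        return False


def start_watchdog(delay_s: float, check, alert) -> threading.Timer:
    """After delay_s seconds run check(); if it returns False, call alert()."""
    def fire():
        if not check():
            alert()

    timer = threading.Timer(delay_s, fire)
    timer.daemon = True  # don't keep the interpreter alive just for the watchdog
    timer.start()
    return timer


if __name__ == "__main__":
    PROGRESS_FILE = "/backups/.progress"  # hypothetical: touched by the backup as it runs
    TEN_MINUTES = 10 * 60
    dog = start_watchdog(
        TEN_MINUTES,
        check=lambda: progress_is_recent(PROGRESS_FILE, TEN_MINUTES),
        alert=lambda: print("BACKUP WATCHDOG: no progress in 10 minutes, "
                            "sysadmin intervention needed", file=sys.stderr),
    )
    # ... run the backup here, touching PROGRESS_FILE periodically ...
    dog.cancel()  # backup finished normally: stand the watchdog down
```

Because the timer runs in its own thread, it still fires if the main thread is blocked in a hung `sync` or `mount`.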
I think that's a good idea, but it might be better to make the backup
script check and report if /backups is already mounted and exit instead
of failing silently if it can't mount /backups (for any reason) as it
does now. This will also detect if the backups take longer than 24h!
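Tony's alternative, refusing to start if `/backups` is still mounted and reporting a failed mount instead of ignoring it, might look like this in Python. It is a sketch, assuming Linux and that `/backups` has an `/etc/fstab` entry so that a plain `mount /backups` works; the function name is hypothetical and the backup step itself is left out.

```python
import os
import subprocess
import sys
from typing import Optional


def mount_backup_volume(mountpoint: str = "/backups") -> Optional[str]:
    """Mount the backup volume; return an error message instead of failing silently."""
    if os.path.ismount(mountpoint):
        # Still mounted: a previous run may be stalled, or has run for > 24h.
        return f"{mountpoint} is already mounted; previous backup still running or stalled?"
    try:
        result = subprocess.run(["mount", mountpoint], capture_output=True, text=True)
    except OSError as exc:  # e.g. no mount binary on PATH
        return f"cannot mount {mountpoint}: {exc}"
    if result.returncode != 0:
        return f"cannot mount {mountpoint}: {result.stderr.strip()}"
    return None


if __name__ == "__main__":
    error = mount_backup_volume("/backups")
    if error:
        print(f"backup: {error}", file=sys.stderr)
        sys.exit(1)
    # ... run the backup and unmount /backups here (not shown) ...
```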
Bye,
Tony.
--
Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition
and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK
tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk
mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt