[Bio-linux-dev] bio-linux-backups 'sync' problem

Tony Travis a.travis at abdn.ac.uk
Sun Oct 17 15:57:15 EDT 2010


On 14/10/10 11:20, Tim Booth wrote:
> Hi Tony,
>
> Interesting.  I just took a USB stick, made 2 partitions and formatted
> one as NTFS and the other as VFAT.  I then plugged it in and let the
> hotplug magic auto-mount the two partitions under /media.  I then
> started 2 simultaneous processes doing "cat /dev/urandom > foo" on each
> partition and when they were under way I yanked the stick out.
>
> In this case, my system sorted itself out right away.  "IO Error" is
> printed, the writing processes are killed and the mounts are cleared up.
> Sync works fine.  This is what I hoped would happen on a modern Linux
> kernel.  If I end up with my system in a state where "sync" hangs
> indefinitely then I generally reckon there is a big problem and an
> urgent reboot is required.  What I don't know is how commonly this
> results from untimely removal of a USB stick or other factors (Novell
> network mounts are bad for this).  Most probably triggering of the
> problem by untimely removal of a USB device is dependent on the exact
> hardware, the kernel drivers, DBUS quirks and any number of timing
> conditions.

Hi, Tim.

In our case, "mount.ntfs-3g" itself was the deadlocked process. It
seems likely that the USB stick was removed while it was still being
auto-mounted: the user plugged the stick in, then yanked it out because
they changed their mind or nothing seemed to be happening...

However unlikely this scenario is, it actually happened on one of our
NBXs and, as a consequence, the NBX concerned was not backed up for ten
days. I do monitor the systems, of course, but I didn't notice this
failure because it looked like the dumps were still in progress! In
fact, the dump had stalled, waiting for "sync" to complete...

Actually, in the good old days, we used to run "sync; sync" before 
rebooting Unix because the first "sync" does not return until all the 
deferred writes are flushed to disk. The snag is that if "sync" can't 
flush the buffers to disk, it never returns.
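
A backup script could guard against this by giving "sync" a deadline
instead of waiting on it indefinitely. The following is only a minimal
Python sketch: the name finished_within and the polling interval are
illustrative assumptions, not part of bio-linux-backups. It deliberately
neither kills nor waits for a late child, because a process blocked in
uninterruptible I/O does not respond to signals and waiting for it would
hang the caller too.

```python
import subprocess
import time


def finished_within(cmd, seconds, poll_interval=0.1):
    """Start cmd and report whether it exited within `seconds`.

    Returns (done, proc). When done is False the child is left running
    and is NOT reaped here; the caller decides what to do about it.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if proc.poll() is not None:   # child has exited
            return True, proc
        time.sleep(poll_interval)
    # Final check in case the child exited during the last sleep.
    return proc.poll() is not None, proc
```

A script might call finished_within(["sync"], 600) and, if the first
value returned is False, alert the sysadmin rather than start the dump.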

> My inclination would be to not try and make the backup script work
> around this specific problem. Rather than pressing on after a failed
> sync the user should really be alerted that the machine is not behaving.
> Perhaps a more general solution would be to start a watchdog process at
> the start of the backup script.  After ten minutes the watchdog looks
> for evidence that the backup is running properly and if not it shouts
> loudly for sysadmin intervention.
>
> What do you think?
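
The watchdog check suggested above could be sketched roughly as follows.
This is a hedged Python illustration only: it assumes the backup updates
some file as it makes progress (a hypothetical progress_path, such as a
log file), which the thread does not confirm, and the function name
backup_looks_stalled is invented for the example.

```python
import os
import time


def backup_looks_stalled(progress_path, max_idle_seconds, now=None):
    """Return True if progress_path is missing or has not been
    modified within the last max_idle_seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(progress_path).st_mtime
    except FileNotFoundError:
        # No evidence of progress at all counts as stalled.
        return True
    return (now - mtime) > max_idle_seconds
```

A watchdog started alongside the backup could sleep for ten minutes,
call backup_looks_stalled(path, 600), and notify the sysadmin if it
returns True.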

I think that's a good idea, but it might be better to make the backup
script check whether /backups is already mounted and, if so, report it
and exit. At present it fails silently if it can't mount /backups (for
any reason). This check would also detect backups that take longer than
24h, because the next run would find /backups still mounted!
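
The mount check could look something like the sketch below. It is a
minimal Python illustration, not the actual backup script: the function
name refuse_if_mounted and the wording of the message are assumptions,
and only the /backups mount point comes from the discussion above.

```python
import os
import sys


def refuse_if_mounted(mount_point="/backups"):
    """Return 1 (and complain on stderr) if mount_point is already a
    mount point, otherwise return 0 so the backup can proceed."""
    if os.path.ismount(mount_point):
        print(
            f"{mount_point} is already mounted: a previous backup may "
            "still be running or may have stalled",
            file=sys.stderr,
        )
        return 1
    return 0
```

The backup script would exit with this status before attempting its own
mount, so a stalled or overrunning dump is reported instead of ignored.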

Bye,

   Tony.
-- 
Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition
and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK
tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk
mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt


