[opensuse] Strange file corruption problem
I have two systems, let's call them 'home' and 'away'. When I buy a music CD, I rip it onto whichever machine is available, and convert it to flac. I keep copies on both machines by rsyncing, so each machine's music collection is a mirror of the other. Both collections are on encrypted partitions. So far, so good.

'Away' plays the music perfectly, whereas 'home' has some corrupted tracks which have small skips and louder pops and crackles in the playback. The errors are reproducible, in that they occur in the same place each time. It makes no difference which end the original ripping took place, whether at 'home' or 'away'.

'Away' is running Linux Mint with the Gnome desktop. 'Home' is running openSUSE 11.2 with KDE 4.3.4 from the ../KDE4:/STABLE repo. Both machines use mplayer to play the music, though the errors on 'home' occur with amarok also. 'Away' uses pulseaudio, 'home' uses alsa.

A faulty file on 'home' is also faulty when transferred to 'away' by scp.

Mplayer output on 'away' for faulty copy:

mplayer -ao pulse 10\ Rhymes\ of\ Goodbye.flac
MPlayer SVN-r29482-4.3.3 (C) 2000-2009 MPlayer Team

Playing 10 Rhymes of Goodbye.flac.
Cache fill: 0.00% (0 bytes)
Audio only file format detected.
==========================================================================
Opening audio decoder: [ffmpeg] FFmpeg/libavcodec audio decoders
AUDIO: 44100 Hz, 2 ch, s16le, 874.0 kbit/61.93% (ratio: 109252->176400)
Selected audio codec: [ffflac] afm: ffmpeg (FFmpeg FLAC audio)
==========================================================================
[pulse] working around probably broken pause functionality, see http://www.pulseaudio.org/ticket/440
AO: [pulse] 44100Hz 2ch s16le (2 bytes per sample)
Video: no video
Starting playback...
[flac @ 0x27dd9c0]FRAME HEADER not here% 84%
[flac @ 0x27dd9c0]broken stream, invalid padding
[flac @ 0x27dd9c0]invalid frame header
[flac @ 0x27dd9c0]decode_frame() failed
[flac @ 0x27dd9c0]FRAME HEADER not here% 84%
Last message repeated 2 times4.0) 0.4% 49%
[flac @ 0x27dd9c0]invalid subframe padding
[flac @ 0x27dd9c0]decode_frame() failed
[flac @ 0x27dd9c0]FRAME HEADER not here0.4% 49%
A: 112.2 (01:52.2) of 184.0 (03:04.0) 0.4% 49%

Exiting... (Quit)

Mplayer output on 'away' for good copy:

mplayer -ao pulse /media/music/artists/Scott\ Walker/1969\ Scott\ 4/10\ Rhymes\ of\ Goodbye.flac
MPlayer SVN-r29482-4.3.3 (C) 2000-2009 MPlayer Team

Playing /media/music/artists/Scott Walker/1969 Scott 4/10 Rhymes of Goodbye.flac.
Cache fill: 18.28% (1916928 bytes)
Audio only file format detected.
==========================================================================
Opening audio decoder: [ffmpeg] FFmpeg/libavcodec audio decoders
AUDIO: 44100 Hz, 2 ch, s16le, 874.0 kbit/61.93% (ratio: 109252->176400)
Selected audio codec: [ffflac] afm: ffmpeg (FFmpeg FLAC audio)
==========================================================================
[pulse] working around probably broken pause functionality, see http://www.pulseaudio.org/ticket/440
AO: [pulse] 44100Hz 2ch s16le (2 bytes per sample)
Video: no video
Starting playback...
A: 185.1 (03:05.0) of 184.0 (03:04.0) 0.4% 0%

Exiting... (End of file)

Both files have the same size:

'Home':
-rwxrwxr-x 1 bob music 20154730 2008-06-14 03:23 Scott Walker/1969 Scott 4/10 Rhymes of Goodbye.flac

'Away':
-rwxrwxr-x 1 play play 20154730 2008-06-13 22:23 /media/music/artists/Scott Walker/1969 Scott 4/10 Rhymes of Goodbye.flac

The drive containing the music collection at 'home' appears to be healthy:

fsck -CV /dev/mapper/music
[...]
/dev/mapper/music: clean, 43894/45874512 files, 117443940/183140352 blocks

and:

smartctl --all /dev/sdj
smartctl 5.39 2009-08-08 r2872~ [x86_64-unknown-linux-gnu] (openSUSE RPM)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F1 DT series
Device Model: SAMSUNG HD753LJ
Serial Number: S13UJ1MQ202570
Firmware Version: 1AA01108
User Capacity: 750,156,374,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 3b
Local Time is: Sun Jan 10 08:19:24 2010 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[...]
SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 0
Warning: ATA Specification requires self-test log structure revision number = 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 8874 -

I've run out of ideas in trying to track down the source of these errors, and welcome advice from the list.

Thanks,

Bob
--
Registered Linux User #463880
FSFE Member #1300
GPG-FP: A6C1 457C 6DBA B13E 5524 F703 D12A FB79 926B 994E
openSUSE 11.2, Kernel 2.6.31.5-0.1-desktop, KDE 4.3.3
Intel Core2 Quad Q9400 2.66GHz, 4GB DDR RAM, nVidia GeForce 9200GS
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday, 2010-01-10 at 08:29 -0000, Bob Williams wrote:
I have two systems, let's call them 'home' and 'away'. When I buy a music CD, I rip it onto whichever machine is available, and convert it to flac. I keep copies on both machines by rsyncing, so each machine's music collection is a mirror of the other. Both collections are on encrypted partitions. So far, so good.
'Away' plays the music perfectly, whereas 'home' has some corrupted tracks which have small skips and louder pops and crackles in the playback. The errors are reproducible, in that they occur in the same place each time. It makes no difference which end the original ripping took place, whether at 'home' or 'away'.
let me see:

- 2 machines
- encrypted storage ¿type?
- rip on any, copied to the other using rsync
- some tracks bad on machine "home".
- bad file on "home" is bad on "away" if copied via scp

Correct?

I would try comparing a "bad" file with the same file on the other machine. My guess is that they will be different, and thus, it would prove that "rsync" botched the job, and worse, it does not detect it.
and:
smartctl --all /dev/sdj
Find out if you have remapped (reallocated) sectors. It is in the list of attributes. If so, a sector could have been remapped while containing data and would explain the problem.

- --
Cheers,
Carlos E. R.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)

iEYEARECAAYFAktJ7mUACgkQtTMYHG2NR9Vh0QCcD3hRWrMqcWFquRedcRYn1T2D
xdQAnRdqHsU+sAUd8TnVZ0XC7pjJhIS+
=LoIQ
-----END PGP SIGNATURE-----
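The comparison suggested in this reply could be done with cmp, which reports byte by byte where two files differ. What follows is only a minimal sketch, assuming both copies have been brought onto the same machine (for example with scp); the function names and file names are illustrative and do not come from the thread.

```shell
# Count how many bytes differ between two copies of the same file.
# "cmp -l" prints one line per differing byte and exits non-zero when
# the files differ, hence the "|| true".
count_differing_bytes() {
    { cmp -l "$1" "$2" || true; } | wc -l | tr -d ' '
}

# Print the 1-based offset of the first differing byte
# (prints nothing if the files are identical).
first_differing_offset() {
    { cmp -l "$1" "$2" || true; } | awk 'NR == 1 { print $1 }'
}

# Hypothetical usage, with good.flac taken from "away" and bad.flac from "home":
#   count_differing_bytes good.flac bad.flac
#   first_differing_offset good.flac bad.flac
```

A count of zero would mean the two copies are identical and the fault lies elsewhere; a non-zero count, together with the offsets, may help show whether the damage is a few isolated bytes or whole blocks.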
Thanks, Carlos

You have grasped the setup exactly ...

On Sunday 10 Jan 2010 15:12:30 Carlos E. R. wrote:
On Sunday, 2010-01-10 at 08:29 -0000, Bob Williams wrote:
I have two systems, let's call them 'home' and 'away'. When I buy a music CD, I rip it onto whichever machine is available, and convert it to flac. I keep copies on both machines by rsyncing, so each machine's music collection is a mirror of the other. Both collections are on encrypted partitions. So far, so good.
'Away' plays the music perfectly, whereas 'home' has some corrupted tracks which have small skips and louder pops and crackles in the playback. The errors are reproducible, in that they occur in the same place each time. It makes no difference which end the original ripping took place, whether at 'home' or 'away'.
let me see:
- 2 machines
- encrypted storage ¿type?

LUKS

- rip on any, copied to the other using rsync
- some tracks bad on machine "home".
- bad file on "home" is bad on "away" if copied via scp
Correct?
Correct
I would try comparing a "bad" file with the same file on the other machine. My guess is that they will be different, and thus, it would prove that "rsync" botched the job, and worse, it does not detect it.
Two objections to that solution:

1) Files that were ripped on "home" get corrupted, presumably _after_ a good copy has been rsynced to "away", though I can't prove that. Generally, the synchronisation will take place within a few days of ripping. The mirroring has been going on for 3 or 4 years.

2) Why would rsync only botch things in one direction? You'd expect to find damaged files at both ends, surely. Although, come to think of it, there is some asymmetry, in that rsync both pushes and pulls from "away", while "home" runs an 'always on' rsync daemon.
and:
smartctl --all /dev/sdj
Find out if you have remapped (reallocated) sectors. It is in the list of attributes. If so, a sector could have been remapped while containing data and would explain the problem.
I'll take a look at that.

Bob
--
Registered Linux User #463880
FSFE Member #1300
GPG-FP: A6C1 457C 6DBA B13E 5524 F703 D12A FB79 926B 994E
openSUSE 11.2, Kernel 2.6.31.8-0.1-desktop, KDE 4.3.4
Intel Core2 Quad Q9400 2.66GHz, 4GB DDR RAM, nVidia GeForce 9200GS
On Sunday 10 Jan 2010 19:00:18 Bob Williams wrote:
Find out if you have remapped (reallocated) sectors. It is in the list of attributes. If so, a sector could have been remapped while containing data and would explain the problem.
I'll take a look at that.
There don't seem to be any reallocated sectors :)

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
[...]
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0

Bob
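For reference, the reallocation count in this table is the last column (RAW_VALUE), not the normalized VALUE column. A small sketch that extracts it, assuming the ten-column attribute layout shown in this message; the function name is made up for illustration.

```shell
# Read smartctl attribute output on stdin and print the raw reallocated
# sector count (10th column of the Reallocated_Sector_Ct row).
reallocated_raw_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Hypothetical usage (needs root; /dev/sdj is the drive from this thread):
#   smartctl -A /dev/sdj | reallocated_raw_count
```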
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Thanks, Carlos
You have grasped the setup exactly ...
Good :-)
let me see:
- 2 machines
- encrypted storage ¿type?

LUKS

Ah. No, I was thinking of the filesystem type. Is either of them XFS? I'm having problems with XFS, which according to the SGI people seem to be caused by a bug in the DM (device-mapper) code. No answer from Novell.

- rip on any, copied to the other using rsync
- some tracks bad on machine "home".
- bad file on "home" is bad on "away" if copied via scp
Correct?
Correct
Ok, leaving the text above for reference.
I would try comparing a "bad" file with the same file on the other machine. My guess is that they will be different, and thus, it would prove that "rsync" botched the job, and worse, it does not detect it.
Two objections to that solution:
1) Files that were ripped on "home" get corrupted, presumably _after_ a good copy has been rsynced to "away", though I can't prove that. Generally, the synchronisation will take place within a few days of ripping. The mirroring has been going on for 3 or 4 years.
Curious. Or rather, weird.
2) Why would rsync only botch things in one direction? You'd expect to find damaged files at both ends, surely. Although, come to think of it, there is some asymmetry, in that rsync both pushes and pulls from "away", while "home" runs an 'always on' rsync daemon.
Mmm.

Network problems in one direction?

I would do something: once you create a file, add a checksum of it to a text file or database. When you transmit it to the other computer, check it. After a period or when you suspect a problem, check it. Thus, you will be able to detect a corruption, and know if transmission is to be blamed, or storage. Plus, if you save on the log where the file was created, better. We should not trust our own memory ;-)

It is no solution, but it would track or prove the issue. Your files either develop problems while stored, or they are damaged in transmission. It is unfortunate that copy tools nowadays do not include an option to verify that files were correctly copied and saved.

I wonder if rsync could be used somehow to run a compare? Perhaps the best thing would be using checksums.
and:
smartctl --all /dev/sdj
Find out if you have remapped (reallocated) sectors. It is in the list of attributes. If so, a sector could have been remapped while containing data and would explain the problem.
I'll take a look at that.
I see you did in your next post. No problems there, it seems. Although the "100" is not really the reallocation count; I think the count is the raw value (the last column), which is zero in your case. Let me see, I had a disk go bad recently, I'll see what it reported [...] ah, look:

5 Reallocated_Sector_Ct 0x0033 001 001 036 Pre-fail Always FAILING_NOW 30748

So yours is perfect, count is "0".

- --
Cheers,
Carlos E. R.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)

iEYEARECAAYFAktKfjwACgkQtTMYHG2NR9U5XQCeJvkjr9iMepGh1q6xbKH3qn6c
TdsAnRC+4ymiEh+1LFZPXe5Es88AuIDX
=fa6x
-----END PGP SIGNATURE-----
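The checksum log suggested in this message might look something like the sketch below. It is only one possible arrangement: the manifest path, the function names and the choice of sha256sum are assumptions, not anything settled in the thread.

```shell
# Hypothetical manifest location; any path outside the synced tree would do.
MANIFEST="${MANIFEST:-$HOME/music-checksums.sha256}"

# Append the checksum of a freshly ripped file to the manifest, and note
# in a side log which machine recorded it and when.
record_checksum() {
    sha256sum "$1" >> "$MANIFEST"
    printf '%s recorded on %s at %s\n' \
        "$1" "$(uname -n)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$MANIFEST.log"
}

# Re-read every listed file and compare it with its recorded checksum.
# The exit status is non-zero if any file no longer matches.
verify_checksums() {
    sha256sum -c "$MANIFEST"
}

# Hypothetical usage, run from the root of the music collection:
#   record_checksum "Scott Walker/1969 Scott 4/10 Rhymes of Goodbye.flac"
#   verify_checksums
```

Recording paths relative to the collection root would let the same manifest be copied to the other machine and checked there, since the two collections live under different directories. A file that verifies right after the transfer but fails later would point at storage rather than transmission.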
Hi Carlos,

On Monday 11 Jan 2010 01:26:10 Carlos E. R. wrote:
On Sunday, 2010-01-10 at 19:00 -0000, Bob Williams wrote:
- encrypted storage ¿type?

LUKS

Ah. No, I was thinking of the filesystem type. Is either of them XFS? I'm having problems with XFS, which according to the SGI people seem to be caused by a bug in the DM (device-mapper) code. No answer from Novell.

Both ext3

- rip on any, copied to the other using rsync
- some tracks bad on machine "home".
- bad file on "home" is bad on "away" if copied via scp
Correct?
Correct
Ok, leaving the text above for reference.
I would try comparing a "bad" file with the same file on the other machine. My guess is that they will be different, and thus, it would prove that "rsync" botched the job, and worse, it does not detect it.
Two objections to that solution:
1) Files that were ripped on "home" get corrupted, presumably _after_ a good copy has been rsynced to "away", though I can't prove that. Generally, the synchronisation will take place within a few days of ripping. The mirroring has been going on for 3 or 4 years.
Curious. Or rather, weird.
2) Why would rsync only botch things in one direction? You'd expect to find damaged files at both ends, surely. Although, come to think of it, there is some asymmetry, in that rsync both pushes and pulls from "away", while "home" runs an 'always on' rsync daemon.
Mmm.
Network problems in one direction?
I would do something: once you create a file, add a checksum of it to a text file or database. When you transmit it to the other computer, check it. After a period or when you suspect a problem, check it. Thus, you will be able to detect a corruption, and know if transmission is to be blamed, or storage. Plus, if you save on the log where the file was created, better. We should not trust our own memory ;-)
It is no solution, but it would track or prove the issue. Your files either develop problems while stored, or they are damaged in transmission. It is unfortunate that copy tools nowadays do not include an option to verify that files were correctly copied and saved.
This suggestion is the slow but sure method of tracking down the cause of the problem. Whether it will lead to a solution that prevents it happening in the future, I don't know.
I wonder if rsync could be used somehow to run a compare? Perhaps the best thing would be using checksums.
More in-depth reading of the man page needed :)

Many thanks for your helpful suggestions.

Bob
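On the question of using rsync itself to run a compare: its --checksum option, combined with a dry run, appears to fit. The sketch below is untested against the setup in this thread, and the function name and paths are hypothetical.

```shell
# List files whose contents differ between two directory trees without
# changing anything: -r recurse, -n dry run, -c compare by checksum
# instead of size and modification time, -i itemize each file that
# would be updated.
rsync_compare() {
    rsync -rnci "$1"/ "$2"/
}

# Hypothetical usage; the second argument may also be a remote host:path:
#   rsync_compare /path/to/music otherhost:/path/to/music
```

Files that exist only on the source side are listed as well, so the output is not limited to corrupted files. Every file has to be read in full on both ends, so a run over a large collection will be slow.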
participants (2)
- Bob Williams
- Carlos E. R.