Re: [suse-amd64] SUSE 64-bit + 3ware controller = Corrupt filesystem
John McCorquodale wrote:
This is with an 8500-12 SATA raid controller. I should also mention that using the 32-bit SUSE 9.0 Professional on the same machine does not exhibit the corruption.
Yes, I have this problem as well. I have reported it to 3ware (which you should do too; duplicate bug reports add legitimacy to problems) with no luck. There was some theorization on the x86-64 kernel list a few months ago that it was an IOMMU problem, but I never had luck with tweaking IOMMU parameters.
Yes, we reported this to 3ware as well.
The problem I saw was that reads/writes with blocksize under 4k-N (where N was 40 bytes or something like that) seemed to work fine, but would return all zeros (or bogus data) in the buffers above 4k-N or in some cases (M*4k)-N. Also, the first 16k (if I remember correctly) of the partition always reads as zero.
As another data point, I was able to install onto the RAID array and do some testing with RedHat Enterprise for AMD64 with no corruption.
The K8W appears to have serious BIOS problems setting up memory mappings (particularly in regard to AGP setup) when 8GB is installed, so drop your RAM to 1GB and see if you have better luck.
What sort of AGP problems are you seeing? We are seeing poor performance with > 4GB.
After I'm done here at Supercomputing, I'll try to come up with a kernel patch to reorganize memory mappings to sanify Tyan's BIOS's mess for 8GB, but I don't know that this approach will work. I'm hoping it will solve the 3ware problem and my AGP problem simultaneously, but I am used to engaging in wishful thinking. :)
Well, I'll keep my fingers crossed. :-) Mike
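[Editor's note] The block-size symptom described above (transfers below roughly 4k-N look fine, larger ones come back as zeros or garbage) can be probed with a small write/read-back test. This is only a sketch under assumptions: N=40 is the poster's rough recollection, the function names are made up for illustration, and a read-back through the page cache may hide on-disk corruption, so a real test would need to bypass or drop the cache, or use a raw device.

```python
import os


def sizes_near_boundary(n_guess=40, pages=(1, 2, 4), page=4096, spread=16):
    """Return transfer sizes clustered around (M * 4k) - N.

    n_guess is an assumption: the thread only says N was "40 bytes or
    something like that".
    """
    out = []
    for m in pages:
        edge = m * page - n_guess
        out.extend([edge - spread, edge, edge + spread, m * page])
    return out


def check_block_sizes(path, sizes, pattern=0xA5):
    """Write one block of each size to `path`, read it back, and
    return the list of sizes whose contents did not match."""
    bad = []
    for size in sizes:
        data = bytes([pattern]) * size
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data toward the device
        # Caveat: this read may be served from the page cache.
        with open(path, "rb") as f:
            got = f.read()
        if got != data:
            bad.append(size)
    return bad
```

Pointing `check_block_sizes` at a scratch file on the suspect array, with the sizes from `sizes_near_boundary()`, should return an empty list on a healthy system.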
What sort of AGP problems are you seeing? We are seeing poor performance with > 4GB.
Yes, it's pathetic with 8GB installed. Bandwidth across the AGP connector is about 17MB/s, due mostly to a bogus uncachable MTRR entry that the BIOS sets up, the removal of which blows the system away (reason unknown; I don't know why it's there in the first place). With 1GB installed, the frame buffer can be mapped write-combining and AGP b/w jumps to about 250MB/s. This is the fastest I've been able to observe actual data transport (about 1x AGP) despite the fact that the AGP connector is signalling as fast as 8x AGP v3 (on an ATI X1). The 8151 is on its own HT link, and that link is running at 16 bit, 600 MHz, so there's no architectural reason it should be this slow. There is clearly something incredibly and obviously wrong going on with the way things are getting set up on the board, and being as it's been weeks since I initially started talking to Tyan and have heard nothing back, I'm not hopeful that they've even bothered to reproduce the problem let alone do anything about it. I am afraid they're of the mindset that "picture on screen == works" -- I sure wish they'd correct this misconception if it is one. Anyway, I'm still of the belief that there's just a crazy memory mapping and requests to AGP are getting stalled/deferred because of some deranged table entries somewhere. I haven't found them yet, though. :) See my posts to the suse-amd64 archives for Oct/Nov timeframe for more details and pointers to a benchmark you can play with if you're that motivated. I really want to start making my own boards...
Well, I'll keep my fingers crossed. :-)
Yeah. -mcq
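[Editor's note] The bandwidth figures in the message above can be put in perspective with a little arithmetic. The sketch below assumes the standard nominal numbers (AGP is a 32-bit bus with a base clock of about 66.66 MHz; HyperTransport transfers on both clock edges); the function names are illustrative only, and the results are theoretical peaks, not measurements.

```python
def agp_peak_mb_s(multiplier):
    """Nominal AGP peak in MB/s: 32-bit bus, ~66.66 MHz base clock,
    scaled by the 1x/2x/4x/8x transfer multiplier."""
    return 66.66e6 * 4 * multiplier / 1e6


def ht_peak_mb_s(width_bits, clock_mhz):
    """Nominal HyperTransport peak per direction in MB/s
    (double data rate: two transfers per clock)."""
    return clock_mhz * 1e6 * 2 * (width_bits / 8) / 1e6


agp_1x = agp_peak_mb_s(1)        # about 267 MB/s
agp_8x = agp_peak_mb_s(8)        # about 2133 MB/s
ht_link = ht_peak_mb_s(16, 600)  # 2400 MB/s per direction

# Figures reported in the thread: 17 MB/s (8GB) and 250 MB/s (1GB).
for observed in (17, 250):
    print(f"{observed} MB/s is {100 * observed / agp_8x:.1f}% of AGP 8x peak")
```

On these numbers the 16-bit, 600 MHz HT link has more nominal headroom than AGP 8x, which is consistent with the poster's point that the link itself should not be the bottleneck.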
There is clearly something incredibly and obviously wrong going on with the way things are getting set up on the board, and being as it's been weeks since I initially started talking to Tyan and have heard nothing back, I'm not hopeful that they've even bothered to reproduce the problem let alone do anything about it. I am afraid they're of the mindset that "picture on screen == works" -- I sure wish they'd correct this misconception if it is one.
I am seeing the exact same behaviour with a Thunder K8W (S2885). I have also been in contact with Tyan, and initially I got some responses claiming that it was a kernel issue. After telling them that it was not a kernel issue, I have not heard back from them. /peter
On Thu, 20 Nov 2003 11:48:48 -0800 Michael Madore <mmadore@aslab.com> wrote:
John McCorquodale wrote:
This is with an 8500-12 SATA raid controller. I should also mention that using the 32-bit SUSE 9.0 Professional on the same machine does not exhibit the corruption.
Yes, I have this problem as well. I have reported it to 3ware (which you should do too; duplicate bug reports add legitimacy to problems) with no luck. There was some theorization on the x86-64 kernel list a few months ago that it was an IOMMU problem, but I never had luck with tweaking IOMMU parameters.
Yes, we reported this to 3ware as well.
Can people who have problems with 3ware please test whether they still happen with mem=3000M? Just add this option to the kernel arguments in /boot/grub/menu.lst. Of course, this only makes sense if you have more than 3GB of memory. I'm not suggesting this as a permanent solution, just for testing. -Andi
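[Editor's note] For anyone who wants to script the test suggested above, here is a minimal sketch that appends an argument to every `kernel` line in the text of a GRUB legacy menu.lst. The function name and the sample entry are made up for illustration; back up the real file before rewriting it, and remove the option again once testing is done.

```python
def add_kernel_arg(menu_text, arg="mem=3000M"):
    """Return menu.lst text with `arg` appended to each 'kernel' line
    that does not already carry it. Other lines are left untouched."""
    out = []
    for line in menu_text.splitlines():
        words = line.split()
        if words[:1] == ["kernel"] and arg not in words:
            line = line.rstrip() + " " + arg
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical entry; real device names and paths will differ.
sample = (
    "title Linux\n"
    "    kernel (hd0,0)/vmlinuz root=/dev/sda2\n"
    "    initrd (hd0,0)/initrd\n"
)
print(add_kernel_arg(sample), end="")
```

Running the function twice on the same text leaves it unchanged the second time, so it is safe to re-apply.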
participants (4)
- Andi Kleen
- John McCorquodale
- Michael Madore
- Peter Rundberg