Bug ID 934977
Summary strange messages by nvidia and pcieport
Classification openSUSE
Product openSUSE Distribution
Version 13.2
Hardware x86-64
OS openSUSE 13.2
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-maintainers@forge.provo.novell.com
Reporter tschaefer@t-online.de
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

Hi,


Hardware description:

Fujitsu RX350s7
84:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20Xm] (rev a1)
85:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40c] (rev a1)

Change: a second card was added.


While the server was running with only the first card, I got this message:

nvidia 0000:84:00.0: irq 125 for MSI/MSI-X

But this did not happen often.

Now with two cards I get:

....
[ 7228.457346] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.457348] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.457384] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.457395] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.457396] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.457398] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.457411] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.457416] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.457418] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.457419] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.457537] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.457541] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.457543] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.457544] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.457622] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.457627] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.457629] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.457631] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.457696] pcieport 0000:80:03.0: AER: Multiple Corrected error received:
id=8500
[ 7228.457707] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.457709] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.457710] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.457757] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.457788] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.457790] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.457792] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.457799] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.457816] pcieport 0000:80:03.0: can't find device of ID0000
[ 7228.457818] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.457823] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.457825] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.457826] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.457929] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.457940] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.457942] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.457944] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.458121] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.458137] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.458139] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.458140] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.458146] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.458155] pcieport 0000:80:03.0: can't find device of ID0000
[ 7228.458201] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.458218] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.458219] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.458221] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.458502] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.458518] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.458520] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.458523] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.467618] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.467622] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.467624] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.467625] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.467897] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.467909] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.467910] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.467911] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.467981] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.467992] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.467994] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.467995] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.468279] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.468283] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.468285] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.468286] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.469589] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.469604] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.469606] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.469608] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.469613] pcieport 0000:80:03.0: AER: Multiple Corrected error received:
id=8018
[ 7228.469621] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.469622] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.469624] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.469778] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.469789] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.469791] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.469792] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.469905] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.469916] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.469917] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.469919] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.470012] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.470016] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.470018] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.470019] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.470074] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.470078] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.470080] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.470081] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.470177] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.470187] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.470189] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.470190] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.470228] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.470233] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.470235] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.470236] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.470509] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.470520] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.470521] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.470523] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.471267] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.471281] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.471282] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.471284] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.472056] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.472059] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.472062] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.472064] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.472342] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.472353] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.472354] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.472356] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.472366] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.472371] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.472373] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.472374] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.472447] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.472457] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.472459] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.472460] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.472634] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.472639] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Transmitter ID)
[ 7228.472641] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00001000/00002000                             
[ 7228.472642] pcieport 0000:80:03.0:    [12] Replay Timer Timeout              
[ 7228.472727] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.472732] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Transmitter ID)
[ 7228.472734] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00001000/00002000                             
[ 7228.472735] pcieport 0000:80:03.0:    [12] Replay Timer Timeout              
[ 7228.473298] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.473309] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.473310] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.473312] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.473324] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.473329] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.473331] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.473332] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.473585] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.473598] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.473599] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.473601] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.473839] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.473843] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.473845] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.473846] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.474045] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.474056] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.474057] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.474059] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.474072] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.474077] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.474078] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.474080] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.474254] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.474258] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.474260] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.474261] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.474292] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.474310] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.474312] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.474313] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.474318] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.474323] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.474325] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.474326] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.474413] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.474418] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Transmitter ID)
[ 7228.474420] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00001000/00002000                             
[ 7228.474421] pcieport 0000:80:03.0:    [12] Replay Timer Timeout              
[ 7228.474651] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.474661] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.474663] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.474664] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.474805] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.474815] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.474816] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.474818] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.474823] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.474828] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.474830] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.474831] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.475011] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.475022] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.475023] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.475025] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.475234] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.475244] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.475246] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.475247] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.475723] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.475733] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.475735] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.475736] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.475749] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.475754] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.475756] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.475757] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.475977] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.475983] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.475985] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.475986] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.475991] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.476000] pcieport 0000:80:03.0: can't find device of ID0000
[ 7228.476136] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.476146] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.476147] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.476148] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.476158] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.476163] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.476164] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.476165] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.476290] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.476294] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.476297] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.476298] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.476443] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.476455] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.476457] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.476459] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.478918] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.478930] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Transmitter ID)
[ 7228.478931] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00001000/00002000                             
[ 7228.478933] pcieport 0000:80:03.0:    [12] Replay Timer Timeout              
[ 7228.481104] pcieport 0000:80:03.0: AER: Multiple Corrected error received:
id=8500
[ 7228.481115] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.481117] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.481118] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.484637] pcieport 0000:80:03.0: AER: Multiple Corrected error received:
id=8018
[ 7228.484647] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.484648] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.484650] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.487936] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.487943] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.487944] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.487945] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.493400] pcieport 0000:80:03.0: AER: Multiple Corrected error received:
id=0000
[ 7228.493425] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.493426] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001100/0000a000                               
[ 7228.493427] nvidia 0000:85:00.0:    [ 8] RELAY_NUM Rollover                  
[ 7228.493428] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.493434] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.493447] pcieport 0000:80:03.0: can't find device of ID0000
[ 7228.499494] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.499500] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.499501] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.499503] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.499597] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.499602] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Transmitter ID)
[ 7228.499603] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00001000/00002000                             
[ 7228.499605] pcieport 0000:80:03.0:    [12] Replay Timer Timeout              
[ 7228.499852] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.499865] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.499866] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.499867] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.499885] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.499895] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.499896] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.499897] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.500445] pcieport 0000:80:03.0: AER: Corrected error received: id=0000
[ 7228.500457] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.500458] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000                               
[ 7228.500459] nvidia 0000:85:00.0:    [12] Replay Timer Timeout                
[ 7228.575347] pcieport 0000:80:03.0: AER: Multiple Corrected error received:
id=8018
[ 7228.575366] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Receiver ID)
[ 7228.575368] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00000040/00002000                             
[ 7228.575370] pcieport 0000:80:03.0:    [ 6] Bad TLP                           
[ 7228.589823] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.589832] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.589834] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00003000/0000a000
[ 7228.589836] nvidia 0000:85:00.0:    [12] Replay Timer Timeout  
[ 7228.590109] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.590113] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.590114] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000
[ 7228.590115] nvidia 0000:85:00.0:    [12] Replay Timer Timeout  
[ 7228.590184] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.590187] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.590188] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000
[ 7228.590189] nvidia 0000:85:00.0:    [12] Replay Timer Timeout  
[ 7228.596198] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.596204] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.596205] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000
[ 7228.596207] nvidia 0000:85:00.0:    [12] Replay Timer Timeout  
[ 7228.596415] pcieport 0000:80:03.0: AER: Corrected error received: id=8018
[ 7228.596423] pcieport 0000:80:03.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8018(Transmitter ID)
[ 7228.596424] pcieport 0000:80:03.0:   device [8086:3c08] error
status/mask=00001000/00002000
[ 7228.596425] pcieport 0000:80:03.0:    [12] Replay Timer Timeout  
[ 7228.608387] pcieport 0000:80:03.0: AER: Corrected error received: id=8500
[ 7228.608393] nvidia 0000:85:00.0: PCIe Bus Error: severity=Corrected,
type=Data Link Layer, id=8500(Transmitter ID)
[ 7228.608394] nvidia 0000:85:00.0:   device [10de:1024] error
status/mask=00001000/0000a000
[ 7228.608395] nvidia 0000:85:00.0:    [12] Replay Timer Timeout  
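
The status/mask values and id fields in the log above can be decoded by hand. The following is a minimal sketch, assuming the standard PCIe AER Correctable Error Status bit layout and the usual bus/device/function packing of the 16-bit ID. The function names are illustrative and do not come from any kernel tool.

```python
# Correctable Error Status bit names as defined for the PCIe AER capability.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_status(status_mask):
    """Split 'status/mask' hex text into (reported, masked) lists of (bit, name)."""
    status_hex, mask_hex = status_mask.split("/")
    status, mask = int(status_hex, 16), int(mask_hex, 16)
    reported, masked = [], []
    for bit in range(32):
        if status & (1 << bit):
            name = AER_CORRECTABLE_BITS.get(bit, "unknown bit %d" % bit)
            # A bit that is set in the mask is flagged but not reported.
            (masked if mask & (1 << bit) else reported).append((bit, name))
    return reported, masked


def decode_source_id(source_id):
    """Turn a 16-bit ID such as '8500' into 'bus:device.function'."""
    value = int(source_id, 16)
    bus = value >> 8               # upper 8 bits
    device = (value >> 3) & 0x1F   # next 5 bits
    function = value & 0x7         # lowest 3 bits
    return "%02x:%02x.%x" % (bus, device, function)


if __name__ == "__main__":
    for text in ("00001000/0000a000", "00000040/00002000",
                 "00001100/0000a000", "00003000/0000a000"):
        print(text, decode_status(text))
    for sid in ("8500", "8018"):
        print(sid, "->", decode_source_id(sid))
```

With these assumptions, id=8500 maps to 85:00.0 (the nvidia device) and id=8018 maps to 80:03.0 (the pcieport). For status/mask=00003000/0000a000, bit 13 is set but masked, which would explain why only "[12] Replay Timer Timeout" is printed for that entry.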




Kernel is:
Linux omega 3.16.7-21-default #1 SMP Tue Apr 14 07:11:37 UTC 2015 (93c1539) x86_64 x86_64 x86_64 GNU/Linux

The nvidia driver comes from these packages:
Name        : nvidia-gfxG04-kmp-default
Version     : 346.35_k3.16.6_2
Release     : 4.1
Architecture: x86_64
Install Date: Mo 16 Mär 2015 19:47:17 CET
Group       : System/Kernel
Size        : 29342088
License     : PERMISSIVE-OSI-COMPLIANT
Signature   : DSA/SHA1, Mi 04 Feb 2015 17:37:22 CET, Key ID f5113243c66b6eae
Source RPM  : nvidia-gfxG04-346.35-4.1.nosrc.rpm
Build Date  : Mo 02 Feb 2015 12:36:12 CET
Build Host  : sheep02
Relocations : (not relocatable)
Vendor      : obs://build.suse.de/home:sndirsch:drivers
Summary     : NVIDIA graphics driver kernel module for GeForce 8xxx and newer GPUs
Description :
NVIDIA graphics driver kernel module for GeForce 8xxx and newer GPUs
Distribution: home:sndirsch:drivers / openSUSE_13.2
Name        : nvidia-gfxG04-kmp-default
Version     : 346.72_k3.16.6_2
Release     : 6.1
Architecture: x86_64
Install Date: Di 16 Jun 2015 17:41:38 CEST
Group       : System/Kernel
Size        : 29391745
License     : PERMISSIVE-OSI-COMPLIANT
Signature   : DSA/SHA1, Do 28 Mai 2015 18:19:55 CEST, Key ID f5113243c66b6eae
Source RPM  : nvidia-gfxG04-346.72-6.1.nosrc.rpm
Build Date  : Do 28 Mai 2015 12:14:47 CEST
Build Host  : sheep05
Relocations : (not relocatable)
Vendor      : obs://build.suse.de/home:sndirsch:drivers
Summary     : NVIDIA graphics driver kernel module for GeForce 400 series and newer
Description :
NVIDIA graphics driver kernel module for GeForce 400 series and newer
Distribution: home:sndirsch:drivers / openSUSE_13.2


nvidia-smi -q

==============NVSMI LOG==============

Timestamp                           : Tue Jun 16 21:14:39 2015
Driver Version                      : 346.72

Attached GPUs                       : 2
GPU 0000:84:00.0
    Product Name                    : Tesla K20Xm
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 128
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0320313018232
    GPU UUID                        : GPU-d7f6b60c-6d92-eb24-203f-3302d07dab7e
    Minor Number                    : 0
    VBIOS Version                   : 80.10.17.00.02
    MultiGPU Board                  : No
    Board ID                        : 0x8400
    Inforom Version
        Image Version               : 2081.0200.01.09
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : Compute
        Pending                     : Compute
    PCI
        Bus                         : 0x84
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x102110DE
        Bus Id                      : 0000:84:00.0
        Sub System Id               : 0x097D10DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 2
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Unknown                     : Not Active
    FB Memory Usage
        Total                       : 5759 MiB
        Used                        : 12 MiB
        Free                        : 5747 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending                     : No
    Temperature
        GPU Current Temp            : 27 C
        GPU Shutdown Temp           : 95 C
        GPU Slowdown Temp           : 90 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 56.42 W
        Power Limit                 : 235.00 W
        Default Power Limit         : 235.00 W
        Enforced Power Limit        : 235.00 W
        Min Power Limit             : 150.00 W
        Max Power Limit             : 235.00 W
    Clocks
        Graphics                    : 732 MHz
        SM                          : 732 MHz
        Memory                      : 2600 MHz
    Applications Clocks
        Graphics                    : 732 MHz
        Memory                      : 2600 MHz
    Default Applications Clocks
        Graphics                    : 732 MHz
        Memory                      : 2600 MHz
    Max Clocks
        Graphics                    : 784 MHz
        SM                          : 784 MHz
        Memory                      : 2600 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

GPU 0000:85:00.0
    Product Name                    : Tesla K40c
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 128
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0321915052341
    GPU UUID                        : GPU-be483237-445d-a4bb-ecce-35d886d1fa88
    Minor Number                    : 1
    VBIOS Version                   : 80.80.3E.00.02
    MultiGPU Board                  : No
    Board ID                        : 0x8500
    Inforom Version
        Image Version               : 2081.0206.01.04
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    PCI
        Bus                         : 0x85
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x102410DE
        Bus Id                      : 0000:85:00.0
        Sub System Id               : 0x098310DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 905
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : 23 %
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Unknown                     : Not Active
    FB Memory Usage
        Total                       : 11519 MiB
        Used                        : 23 MiB
        Free                        : 11496 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 90 %
        Memory                      : 4 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending                     : No
    Temperature
        GPU Current Temp            : 32 C
        GPU Shutdown Temp           : 95 C
        GPU Slowdown Temp           : 90 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 64.50 W
        Power Limit                 : 235.00 W
        Default Power Limit         : 235.00 W
        Enforced Power Limit        : 235.00 W
        Min Power Limit             : 180.00 W
        Max Power Limit             : 235.00 W
    Clocks
        Graphics                    : 745 MHz
        SM                          : 745 MHz
        Memory                      : 3004 MHz
    Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Default Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 3004 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None
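
In the output above, the "Replays since reset" counter is 0 for the K20Xm (0000:84:00.0) and 905 for the K40c (0000:85:00.0). To watch this counter over time, it could be pulled out of the `nvidia-smi -q` text with a small script. This is only a minimal sketch, assuming the output keeps the layout shown above; the function name `replay_counts` and the shortened `sample` text are made up for illustration.

```python
import re

# Matches an unindented section header such as "GPU 0000:84:00.0".
GPU_HEADER = re.compile(
    r"^GPU\s+([0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])\s*$"
)
# Matches the indented counter line inside a GPU section.
REPLAYS = re.compile(r"^\s+Replays since reset\s*:\s*(\d+)\s*$")


def replay_counts(text):
    """Map each GPU bus ID to its 'Replays since reset' counter.

    Hypothetical helper: it assumes the `nvidia-smi -q` layout
    seen in this report and ignores every other field.
    """
    counts = {}
    gpu = None
    for line in text.splitlines():
        header = GPU_HEADER.match(line)
        if header:
            gpu = header.group(1)
            continue
        replays = REPLAYS.match(line)
        if replays and gpu is not None:
            counts[gpu] = int(replays.group(1))
    return counts


# Shortened excerpt of the output pasted in this report.
sample = """\
GPU 0000:84:00.0
    Product Name                    : Tesla K20Xm
        Replays since reset         : 0
GPU 0000:85:00.0
    Product Name                    : Tesla K40c
        Replays since reset         : 905
"""

for bus_id, replays in replay_counts(sample).items():
    print(bus_id, replays)
```

The same function could be fed the live text from running `nvidia-smi -q` periodically, to see whether the counter for 0000:85:00.0 keeps growing together with the AER messages.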


What should I do?
Is it kernel-related? Is it an NVIDIA driver bug or a BIOS problem?

