Comment # 8 on bug 1075876 from
I also started to look at this case. The problem is that tc->cc is null.

We have the following disassembly:
    Upon entering timecounter_read()
    rdi contains struct timecounter *tc

50              cycle_now = tc->cc->read(tc->cc);
   0xffffffff81105096 <+6>:     mov    (%rdi),%rax
    offsetof(struct timecounter, cc) = 0
    rax contains tc->cc, 0

61
62              return ns_offset;
63      }
64
65      u64 timecounter_read(struct timecounter *tc)
66      {
   0xffffffff81105099 <+9>:     mov    %rdi,%rbx
    We get the saved value of rdi (tc) in rbx, ffff88025476b7a0

50              cycle_now = tc->cc->read(tc->cc);
   0xffffffff8110509c <+12>:    mov    %rax,%rdi
   0xffffffff8110509f <+15>:    callq  *(%rax)
    offsetof(struct cyclecounter, read) = 0
    This is the null deref (deref of cc), tc->cc->read
    We already know that cc is null.
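
To make the offset reasoning above concrete, here is a minimal user-space
sketch. The two structs are simplified mirrors of the kernel ones (only the
first member matters here), and check_layout() and tc_read_is_safe() are
hypothetical helpers for illustration, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space mirrors of the kernel structures. Only the
 * position of the first member matters for the offsets discussed above. */
struct cyclecounter {
    uint64_t (*read)(const struct cyclecounter *cc);
    uint64_t mask;
    uint32_t mult;
    uint32_t shift;
};

struct timecounter {
    const struct cyclecounter *cc;
    uint64_t cycle_last;
    uint64_t nsec;
};

/* Hypothetical helper: checks the two layout facts used in the notes.
 * With both offsets at 0, "mov (%rdi),%rax" loads tc->cc and
 * "callq *(%rax)" loads and calls cc->read. */
void check_layout(void)
{
    assert(offsetof(struct timecounter, cc) == 0);
    assert(offsetof(struct cyclecounter, read) == 0);
}

/* Hypothetical helper: returns 1 only if tc->cc->read(tc->cc) could be
 * called without dereferencing a NULL pointer. */
int tc_read_is_safe(const struct timecounter *tc)
{
    return tc != NULL && tc->cc != NULL && tc->cc->read != NULL;
}
```

A zero-filled struct timecounter, which is what the crash dump suggests,
fails tc_read_is_safe() because its cc member is NULL.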

The crash occurs after 4 hours because of:

\ e1000e_ptp_init
    INIT_DELAYED_WORK
    schedule_delayed_work(..., 4hours)

tc should be initialized by:

\ e1000_probe
    \ e1000e_reset
        \ e1000e_systim_reset
            timecounter_init(&adapter->tc, &adapter->cc,
                     ktime_to_ns(ktime_get_real()));

This kernel includes a backport of 
aa524b66c5ef e1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctl
(v4.7-rc1)
This was backported for SLE12SP3.

From this commit we can see that e1000e_systim_reset() may return before
calling timecounter_init() if ret_val is non-zero. Indeed, in the log from
comment 4 we see
Feb 04 15:49:22 fphnbam3 kernel: e1000e 0000:00:1f.6: Failed to restore TIMINCA
clock rate delta: -22
which shows that e1000e_systim_reset() exited early, leaving tc
uninitialized.
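
The control flow described above can be modelled in user space as follows.
This is only a sketch under stated assumptions: struct model_adapter and the
model_* functions are hypothetical stand-ins, not the e1000e driver code, and
the guarded read is shown only to illustrate the failure mode, not as the
upstream fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct cyclecounter {
    uint64_t (*read)(const struct cyclecounter *cc);
};

struct timecounter {
    const struct cyclecounter *cc;
    uint64_t cycle_last;
    uint64_t nsec;
};

/* Hypothetical stand-in for the driver's adapter; not the real struct. */
struct model_adapter {
    struct cyclecounter cc;
    struct timecounter tc;
};

/* Returns a fixed cycle count, for illustration only. */
uint64_t model_cc_read(const struct cyclecounter *cc)
{
    (void)cc;
    return 1000;
}

/* Models the early exit: a non-zero ret_val returns before the
 * timecounter is set up, so tc.cc keeps its zeroed (NULL) value. */
void model_systim_reset(struct model_adapter *a, int ret_val)
{
    assert(a != NULL);
    if (ret_val)
        return; /* early exit: tc is never initialized */
    a->cc.read = model_cc_read;
    a->tc.cc = &a->cc;
    a->tc.cycle_last = a->cc.read(&a->cc);
    a->tc.nsec = 0;
}

/* Guarded read: reports -EINVAL instead of dereferencing a NULL cc. */
int model_guarded_read(const struct model_adapter *a, uint64_t *cycles)
{
    if (a->tc.cc == NULL || a->tc.cc->read == NULL)
        return -EINVAL;
    *cycles = a->tc.cc->read(a->tc.cc);
    return 0;
}
```

Calling model_systim_reset() with a non-zero ret_val on a zeroed adapter
leaves tc.cc NULL, so an unguarded tc->cc->read(tc->cc) from the delayed
work would fault exactly as in the disassembly.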

I'm not sure where the -EINVAL (-22) comes from, why it only occurs sometimes
(according to the launchpad report linked to in comment 4), or why it seems
to affect only 4.4 stable kernels (I didn't find reports on other
versions).

