Bug ID | 1162365 |
---|---|
Summary | if the lock does not use lock elision pthread_mutex_destroy will fail |
Classification | openSUSE |
Product | openSUSE Distribution |
Version | Leap 15.1 |
Hardware | x86-64 |
OS | All |
Status | NEW |
Severity | Major |
Priority | P5 - None |
Component | Other |
Assignee | bnc-team-screening@forge.provo.novell.com |
Reporter | jan.m.michalski@intel.com |
QA Contact | qa-bugs@suse.de |
Found By | --- |
Blocker | --- |

Source code: https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/locking_issue_repro.c
Makefile: https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/Makefile

Distro: openSUSE Leap 15.1
Kernel: 4.12.14-lp151.28.36-default
Glibc: glibc-devel-2.26-lp151.18.7.x86_64
CPU: Intel(R) Xeon(R) Gold 6142M CPU @ 2.60GHz

Scenario:
Two worker threads use a common set of primitives at the same time:

    struct action {
        pthread_mutex_t lock;
        pthread_cond_t cond;
        unsigned val;
    };

One of the threads waits on the pthread_cond_t while the other sets val to 1. Everything happens in the action_cancel_worker function:
https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/locking_issue_repro.c#L159

After the worker threads exit, all mutexes should be unlocked, so it should be possible to destroy them. But they are not: pthread_mutex_destroy fails with EBUSY.

Repro:

    $ ./locking_issue_repro 32 1000
    pthread_mutex_destroy: Device or resource busy

Note:
After each pthread_mutex_lock and pthread_mutex_unlock API call, the internal state of the mutex is dumped to the /dev/shm/obj_pmalloc_mt_dump file.
The key is:
TID -> actions[worker-id][op-id] = {data read from the pthread_mutex_t} (stage of the worker)

Issue (appears sporadically, but in at least 1 in 5 runs):

    $ cat /dev/shm/obj_pmalloc_mt_dump | tail
    2793 -> actions[7][996] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
    2793 -> actions[7][997] = {nusers: 0, owner: 0, kind: 256} (lock t1)
    2793 -> actions[7][997] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
    2793 -> actions[7][998] = {nusers: 0, owner: 0, kind: 256} (lock t1)
    2793 -> actions[7][998] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
    2793 -> actions[7][999] = {nusers: 0, owner: 0, kind: 256} (lock t1)
    2793 -> actions[7][999] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
    2777 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (dump)
    2777 -> actions[7][794] = {nusers: 1, owner: 2793, kind: 256} (dump)

Clues:
All of the locks are of the kind PTHREAD_MUTEX_ELISION_NP, so nearly all of them look as follows:

    2792 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (lock t0)
    2792 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (unlock t0)
    2793 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (lock t1)
    2793 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (unlock t1)

So it looks like all of them use lock elision. But if any of them does not use lock elision, it behaves strangely:
- it seems locked all the time:

    $ cat /dev/shm/obj_pmalloc_mt_dump | grep \\[7\\] | grep 710
    2793 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (lock t1) // no matter if it is after lock
    2792 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (lock t0)
    2792 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (unlock t0) // or after unlock
    2793 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (unlock t1)
    2777 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (dump)

- but at the same time, they work fine!
- except that they are impossible to destroy
- the rule is: if the lock does not use lock elision, pthread_mutex_destroy on it will fail.
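
For readers who do not want to open the linked repro, the scenario can be sketched roughly as follows. This is a minimal illustration written for this report, not the code from locking_issue_repro.c: the names setter, waiter, dump_mutex and run_action_once are hypothetical, and the dump helper assumes the glibc-internal __data fields of pthread_mutex_t on x86-64, which are not a public API. One thread waits on the condition variable, the other sets val to 1, and after both threads are joined the mutex is destroyed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Same layout as the struct quoted in the report. */
struct action {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned val;
};

/* Hypothetical worker: set val to 1 under the lock and wake the waiter. */
static void *
setter(void *arg)
{
	struct action *a = arg;

	pthread_mutex_lock(&a->lock);
	a->val = 1;
	pthread_cond_signal(&a->cond);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/* Hypothetical worker: wait on the condition variable until val is set. */
static void *
waiter(void *arg)
{
	struct action *a = arg;

	pthread_mutex_lock(&a->lock);
	while (a->val == 0)
		pthread_cond_wait(&a->cond, &a->lock);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/*
 * Print the mutex fields that appear in the report's dump.
 * Assumption: glibc on x86-64; these fields are library internals.
 */
static void
dump_mutex(pthread_mutex_t *m, const char *stage)
{
#if defined(__GLIBC__) && defined(__x86_64__)
	fprintf(stderr, "{nusers: %u, owner: %d, kind: %d} (%s)\n",
	    m->__data.__nusers, m->__data.__owner, m->__data.__kind, stage);
#else
	(void)m;
	fprintf(stderr, "(mutex internals not available) (%s)\n", stage);
#endif
}

/*
 * Run one setter/waiter pair on a fresh action, then destroy its primitives.
 * Returns the result of pthread_mutex_destroy: 0 on success, or an error
 * number such as EBUSY in the failure described in this report.
 */
int
run_action_once(void)
{
	struct action a = { .val = 0 };
	pthread_t t0, t1;
	int ret;

	ret = pthread_mutex_init(&a.lock, NULL);
	assert(ret == 0);
	ret = pthread_cond_init(&a.cond, NULL);
	assert(ret == 0);

	ret = pthread_create(&t1, NULL, waiter, &a);
	assert(ret == 0);
	ret = pthread_create(&t0, NULL, setter, &a);
	assert(ret == 0);

	/* Both workers have released the lock once they are joined. */
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);

	ret = pthread_mutex_destroy(&a.lock);
	if (ret != 0) {
		fprintf(stderr, "pthread_mutex_destroy: %s\n", strerror(ret));
		dump_mutex(&a.lock, "dump");
	}
	pthread_cond_destroy(&a.cond);
	return ret;
}
```

Calling run_action_once() many times from a main function (built with -pthread) approximates the repro loop. A non-zero return value corresponds to the "pthread_mutex_destroy: Device or resource busy" message above. Whether this reduced sketch triggers the failure as often as the full repro has not been verified.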