Bug ID 1162365
Summary if the lock does not use lock elision pthread_mutex_destroy will fail
Classification openSUSE
Product openSUSE Distribution
Version Leap 15.1
Hardware x86-64
OS All
Status NEW
Severity Major
Priority P5 - None
Component Other
Assignee bnc-team-screening@forge.provo.novell.com
Reporter jan.m.michalski@intel.com
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

Source code:
https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/locking_issue_repro.c
Makefile:
https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/Makefile
Distro: openSUSE Leap 15.1
Kernel: 4.12.14-lp151.28.36-default
Glibc: glibc-devel-2.26-lp151.18.7.x86_64
CPU: Intel(R) Xeon(R) Gold 6142M CPU @ 2.60GHz

Scenario:
Two worker threads concurrently use a common set of primitives:
struct action {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned val;
};
One of the threads waits on the pthread_cond_t while the other sets val to 1.
Everything happens in the action_cancel_worker function:
https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/locking_issue_repro.c#L159
After the worker threads exit, all mutexes should be unlocked, so it should
be possible to destroy them. It is not: pthread_mutex_destroy fails with
EBUSY.
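
For readers who do not want to open the linked source, the pattern described above can be sketched roughly as follows. This is a simplified illustration, not the code from locking_issue_repro.c: the names waiter, setter and run_action_once are hypothetical, and only one waiter/setter pair is run per call.

```c
#include <pthread.h>

/* Same layout as the struct shown in the report. */
struct action {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned val;
};

/* Hypothetical waiter: blocks until the other thread sets val to 1. */
static void *waiter(void *arg)
{
	struct action *a = arg;

	pthread_mutex_lock(&a->lock);
	while (a->val == 0)
		pthread_cond_wait(&a->cond, &a->lock);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/* Hypothetical setter: sets val to 1 and wakes the waiter. */
static void *setter(void *arg)
{
	struct action *a = arg;

	pthread_mutex_lock(&a->lock);
	a->val = 1;
	pthread_cond_signal(&a->cond);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/*
 * Runs one waiter/setter pair. Returns the result of
 * pthread_mutex_destroy (0 on success, EBUSY in the reported failure),
 * or the error code of an earlier call that failed.
 */
int run_action_once(void)
{
	struct action a = { .val = 0 };
	pthread_t t0, t1;
	int ret, destroy_ret;

	if ((ret = pthread_mutex_init(&a.lock, NULL)) != 0)
		return ret;
	if ((ret = pthread_cond_init(&a.cond, NULL)) != 0) {
		pthread_mutex_destroy(&a.lock);
		return ret;
	}

	ret = pthread_create(&t0, NULL, waiter, &a);
	if (ret == 0) {
		int ret1 = pthread_create(&t1, NULL, setter, &a);

		if (ret1 == 0)
			pthread_join(t1, NULL);
		else
			setter(&a); /* run inline so the waiter can still finish */
		pthread_join(t0, NULL);
		ret = ret1;
	}

	/* Both workers have exited, so nothing should hold the mutex here. */
	pthread_cond_destroy(&a.cond);
	destroy_ret = pthread_mutex_destroy(&a.lock);
	return ret != 0 ? ret : destroy_ret;
}
```

Calling run_action_once() in a loop from main() (built with -pthread) and printing strerror() of any non-zero result approximates the check the reproducer makes. A single pair may not be enough to trigger the problem on an affected system.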

Repro:
$ ./locking_issue_repro 32 1000
pthread_mutex_destroy: Device or resource busy

Note:
After each pthread_mutex_lock and pthread_mutex_unlock call, the internal
state of the mutex is dumped to the file /dev/shm/obj_pmalloc_mt_dump.
Each line of the dump reads as:
TID -> actions[worker-id][op-id] = {data read from the pthread_mutex_t} (stage of the worker)
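
A dump line in this format could be produced along the following lines. This is a sketch that assumes glibc on x86-64, where pthread_mutex_t exposes the private fields __data.__nusers, __data.__owner and __data.__kind. These are implementation details, not a public API. The function name format_mutex_state and its parameters are hypothetical and are not taken from the reproducer.

```c
#include <pthread.h>
#include <stdio.h>

/*
 * Formats one dump line into buf. The read of the mutex internals is
 * not synchronized, so this is for diagnostics only.
 * tid is supplied by the caller (for example from gettid).
 * Returns what snprintf returns.
 */
int format_mutex_state(char *buf, size_t len, long tid,
		unsigned worker, unsigned op,
		const pthread_mutex_t *m, const char *stage)
{
	/* glibc-private fields; other C libraries lay this out differently */
	return snprintf(buf, len,
		"%ld -> actions[%u][%u] = {nusers: %u, owner: %d, kind: %d} (%s)",
		tid, worker, op,
		m->__data.__nusers, m->__data.__owner, m->__data.__kind,
		stage);
}
```

The caller would append the resulting line to the dump file right after each lock and unlock call, passing a stage label such as "lock t0" or "unlock t1".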

Issue (appears sporadically, but in at least 1 of 5 runs):
$ cat /dev/shm/obj_pmalloc_mt_dump | tail
2793 -> actions[7][996] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
2793 -> actions[7][997] = {nusers: 0, owner: 0, kind: 256} (lock t1)
2793 -> actions[7][997] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
2793 -> actions[7][998] = {nusers: 0, owner: 0, kind: 256} (lock t1)
2793 -> actions[7][998] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
2793 -> actions[7][999] = {nusers: 0, owner: 0, kind: 256} (lock t1)
2793 -> actions[7][999] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
2777 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (dump)
2777 -> actions[7][794] = {nusers: 1, owner: 2793, kind: 256} (dump)

Clues:
All of the locks are of the kind PTHREAD_MUTEX_ELISION_NP, so nearly all of
them look as follows:
2792 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (lock t0)
2792 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (unlock t0)
2793 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (lock t1)
2793 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
So it looks like all of them use lock elision.
But if any of them does not use lock elision, it behaves strangely:
- it appears locked all the time:
$ cat /dev/shm/obj_pmalloc_mt_dump | grep \\[7\\] | grep 710
2793 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (lock t1) // no matter if it is after lock
2792 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (lock t0)
2792 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (unlock t0) // or after unlock
2793 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (unlock t1)
2777 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (dump)
- but at the same time, such locks work fine!
- except that they are impossible to destroy
- the rule is: if the lock does not use lock elision, pthread_mutex_destroy
will fail on it.
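
For comparison, the same error can be provoked deliberately. As far as I know, glibc's pthread_mutex_destroy returns EBUSY for a non-robust mutex whose nusers count is non-zero, which matches the {nusers: 1} state in the dump above. The sketch below destroys a mutex that is still locked. POSIX leaves this undefined, so it relies on glibc behaviour and assumes the lock is taken without elision. The function name destroy_while_locked is hypothetical.

```c
#include <errno.h> /* EBUSY, for callers comparing the result */
#include <pthread.h>

/*
 * Returns the result of pthread_mutex_destroy on a mutex that is
 * still held. On glibc without lock elision this is expected to be
 * EBUSY.
 */
int destroy_while_locked(void)
{
	pthread_mutex_t m;
	int ret;

	if ((ret = pthread_mutex_init(&m, NULL)) != 0)
		return ret;

	pthread_mutex_lock(&m);          /* non-elided lock: nusers becomes 1 */
	ret = pthread_mutex_destroy(&m); /* glibc sees the mutex as in use */

	/* Clean up properly: release the lock, then destroy for real. */
	pthread_mutex_unlock(&m);
	pthread_mutex_destroy(&m);
	return ret;
}
```

Passing the returned value to strerror() should give the same "Device or resource busy" text as in the Repro output. In the reported case, however, the mutex is not actually held when it is destroyed.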

