http://bugzilla.opensuse.org/show_bug.cgi?id=1162365

Bug ID: 1162365
Summary: if the lock does not use lock elision pthread_mutex_destroy will fail
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.1
Hardware: x86-64
OS: All
Status: NEW
Severity: Major
Priority: P5 - None
Component: Other
Assignee: bnc-team-screening@forge.provo.novell.com
Reporter: jan.m.michalski@intel.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Source code: https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/...
Makefile: https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/...

Distro: openSUSE Leap 15.1
Kernel: 4.12.14-lp151.28.36-default
Glibc: glibc-devel-2.26-lp151.18.7.x86_64
CPU: Intel(R) Xeon(R) Gold 6142M CPU @ 2.60GHz

Scenario:
Two worker threads at the same time are using a common set of primitives:

    struct action {
        pthread_mutex_t lock;
        pthread_cond_t cond;
        unsigned val;
    };

One of the threads is waiting on pthread_cond_t while another is setting val to 1. Everything happens in the action_cancel_worker function:
https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/...

After exiting from the worker thread all mutexes should be unlocked so it should be possible to destroy them. But they are not. pthread_mutex_destroy fails with EBUSY.

Repro:
    $ ./locking_issue_repro 32 1000
    pthread_mutex_destroy: Device or resource busy

Note:
After each pthread_mutex_lock and pthread_mutex_unlock API call internal state of the mutex is dumped to /dev/shm/obj_pmalloc_mt_dump file.
The key is:
    TID -> actions[worker-id][op-id] = {data read from the pthread_mutex_t} (stage of the worker)

Issue (appears sporadically, but at least 1/5):
    $ cat /dev/shm/obj_pmalloc_mt_dump | tail
    2793 -> actions[7][996] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
    2793 -> actions[7][997] = {nusers: 0, owner: 0, kind: 256} (lock t1)
    2793 -> actions[7][997] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
    2793 -> actions[7][998] = {nusers: 0, owner: 0, kind: 256} (lock t1)
    2793 -> actions[7][998] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
    2793 -> actions[7][999] = {nusers: 0, owner: 0, kind: 256} (lock t1)
    2793 -> actions[7][999] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
    2777 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (dump)
    2777 -> actions[7][794] = {nusers: 1, owner: 2793, kind: 256} (dump)

Clues:
All of the locks are of the kind PTHREAD_MUTEX_ELISION_NP, and nearly all of them look as follows:
    2792 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (lock t0)
    2792 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (unlock t0)
    2793 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (lock t1)
    2793 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (unlock t1)

So it looks like all of them use lock elision. But if any of them does not use lock elision, it behaves strangely:
- it seems locked all the time:
    $ cat /dev/shm/obj_pmalloc_mt_dump | grep \\[7\\] | grep 710
    2793 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (lock t1) // no matter if it is after lock
    2792 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (lock t0)
    2792 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (unlock t0) // or after unlock
    2793 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (unlock t1)
    2777 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (dump)
- but at the same time, they work fine!
- except that they are impossible to destroy
- the rule is: if a lock does not use lock elision, pthread_mutex_destroy will fail on it.
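
For readers without access to the linked test, the scenario above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: only struct action comes from this report, while waiter, setter and run_action_once are hypothetical names, and splitting the two roles into separate functions is an assumption (the original does everything in action_cancel_worker).

```c
#include <pthread.h>

/* Same layout as the struct quoted in the report. */
struct action {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned val;
};

/* Hypothetical waiter: blocks on the condition variable until val is set. */
static void *waiter(void *arg)
{
	struct action *a = arg;

	pthread_mutex_lock(&a->lock);
	while (a->val == 0)
		pthread_cond_wait(&a->cond, &a->lock);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/* Hypothetical setter: sets val to 1 and wakes the waiter. */
static void *setter(void *arg)
{
	struct action *a = arg;

	pthread_mutex_lock(&a->lock);
	a->val = 1;
	pthread_cond_signal(&a->cond);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/*
 * Runs one waiter/setter pair on a fresh struct action.
 * Returns 0 on success, or the first error code seen; the value of
 * interest is the one from pthread_mutex_destroy (EBUSY in the report).
 */
int run_action_once(void)
{
	struct action a;
	pthread_t t0, t1;
	int ret, dret;

	if ((ret = pthread_mutex_init(&a.lock, NULL)) != 0)
		return ret;
	if ((ret = pthread_cond_init(&a.cond, NULL)) != 0) {
		pthread_mutex_destroy(&a.lock);
		return ret;
	}
	a.val = 0;

	ret = pthread_create(&t0, NULL, waiter, &a);
	if (ret == 0) {
		ret = pthread_create(&t1, NULL, setter, &a);
		if (ret == 0)
			pthread_join(t1, NULL);
		else
			setter(&a); /* release the waiter from this thread */
		pthread_join(t0, NULL);
	}

	/* Both workers have exited, so the mutex must be unlocked here. */
	pthread_cond_destroy(&a.cond);
	dret = pthread_mutex_destroy(&a.lock);
	return ret != 0 ? ret : dret;
}
```

Calling run_action_once() in a loop and printing strerror() of any nonzero result would mirror the "pthread_mutex_destroy: Device or resource busy" output above. Whether this reduced form triggers the failure is not confirmed; the report only observed it with the full test run as ./locking_issue_repro 32 1000.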