Default enabling -z now aka "Full RelRO" in Factory
Hi folks,

Security hardening of Linux systems suggests marking ELF binary sections read-only wherever possible.

A part of this binary hardening is making the ELF relocations in binaries and libraries read-only, so that they cannot be overwritten and used for attacks.

SUSE has built everything with "Partial RELRO" (-z relro) for a long time, via a default in binutils.

We did not yet do "Full RELRO" (-z now) as we feared the amount of integration work.

However, as this is industry standard now, we have started the integration and will push it to main Factory / Tumbleweed soon.

It is being implemented by:

- The SUSE binutils "ld" queries the "SUSE_ZNOW" environment variable. If it is present and not "0", it will enable "-z now".
- post-build-checks injects, via /etc/profile.d/build-system.sh, the environment variable SUSE_ZNOW=1 into all RPM build chroots (that use post-build-checks, which should be all of them).
- Packages can still deselect it. Either:
  - use linker option "-z lazy",
  - or export SUSE_ZNOW=0 in the %build section.

Currently only "xorg-x11-server" and "python-atspi:tests" needed to do this in our staging.

After this is integrated in the next snapshots, if you see weird symbol lookup errors, report them via Bugzilla as usual.

Ciao, Marcus
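The opt-in rule from the announcement ("present and not 0") can be sketched in C as follows. This is only an illustration, not the actual SUSE binutils patch: the function names `znow_default_from_value` and `znow_default_enabled` are hypothetical, and treating an empty value as "present and not 0" is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from binutils: apply the rule
 * "enabled if the variable is present and its value is not 0".
 * value is the variable's content, or NULL if it is unset. */
int znow_default_from_value(const char *value)
{
    if (value == NULL)
        return 0;                    /* variable not present */
    return strcmp(value, "0") != 0;  /* any value other than "0" enables it */
}

/* Hypothetical wrapper that reads the real environment. */
int znow_default_enabled(void)
{
    return znow_default_from_value(getenv("SUSE_ZNOW"));
}
```

Under this reading, `export SUSE_ZNOW=0` in %build turns the default off again, while an explicit "-z lazy" on the linker command line is a separate opt-out that does not go through the environment at all.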
On Tuesday 2022-04-05 11:24, Marcus Meissner wrote:
> SUSE has built everything with "Partial RELRO" (-z relro) for a long time, via a default in binutils.
> We did not yet do "Full RELRO" (-z now) as we feared the amount of integration work.

The way the manpages are written, one would not think of -z now (or ld.so's LD_BIND_NOW) as having ties to relro, but rather as a debugging aid, so that debugger sessions don't go through the symbol resolution trampolines. There has got to be some speed penalty when a process that received libstdc++.so.6 by "accident" now has to resolve 6000ish symbols even if unused.
On 05.04.22 at 11:51, Jan Engelhardt wrote:
> On Tuesday 2022-04-05 11:24, Marcus Meissner wrote:
>> SUSE has built everything with "Partial RELRO" (-z relro) for a long time, via a default in binutils.
>> We did not yet do "Full RELRO" (-z now) as we feared the amount of integration work.
> The way the manpages are written, one would not think of -z now (or ld.so's LD_BIND_NOW) as having ties to relro, but rather as a debugging aid, so that debugger sessions don't go through the symbol resolution trampolines. There has got to be some speed penalty when a process that received libstdc++.so.6 by "accident" now has to resolve 6000ish symbols even if unused.
Is that what happens? I thought it would only resolve the symbols that are used, and "resolve all symbols" means resolving all symbols that need to be resolved, i.e. are listed in .dynsym of some loaded ELF file. For unused functions there is no GOT/PLT, so where would the dynamic linker write their result?

The second part, "or when the shared library is loaded by dlopen, instead of deferring function call resolution to the point when the function is first called", is probably referring to the .dynsym of that library, i.e. we resolve whatever it uses. I don't think it will precache dlsym results, what would be the point of that? The function pointer returned is stored by the user wherever they want, which will likely be writable pages.

But I haven't actually debugged or traced this.

In any event, the man page does indeed not mention any security benefits, though it's well-known that it makes the relocation section read-only. (I always forget which one but I think it's .rela.plt.)

Aaron
On Thursday 2022-04-07 02:19, Aaron Puchert wrote:
>>> We did not yet do "Full RELRO" (-z now) as we feared the amount of integration work.
>> The way the manpages are written, one would not think of -z now (or ld.so's LD_BIND_NOW) as having ties to relro, but rather as a debugging aid, so that debugger sessions don't go through the symbol resolution
> Is that what happens? I thought it would only resolve the symbols that are used, and "resolve all symbols" means resolving all symbols that need to be resolved, i.e. are listed in .dynsym of some loaded ELF file
I did not mean to contradict; indeed there are different interpretations for "all" and "used".

- symbols in .dynsym of a library (libstdc++ roughly 5971)
- symbols in .rela.dyn/.rela.plt/.dynsym section of an executable (hello world[1] in C++: up to 14 depending on what you count)
- set of functions that actually get invoked during runtime (if rand yields 0, cout.operator<< won't be called, so it need not be resolved)

[1]
#include <iostream>
#include <cstdlib>
#include <ctime>   // for time()
int main()
{
    srand(time(nullptr));
    if (rand() & 1)
        std::cout << "Hello world\n";
}

LD_BIND_NOW=1 LD_DEBUG=symbols ./a.out 2>&1|grep -i symbo|sort -u|wc -l
6302
LD_BIND_NOW=0 LD_DEBUG=symbols ./a.out 2>&1|grep -i symbo|sort -u|wc -l
4044
On 07.04.22 at 13:05, Jan Engelhardt wrote:
> On Thursday 2022-04-07 02:19, Aaron Puchert wrote:
>>>> We did not yet do "Full RELRO" (-z now) as we feared the amount of integration work.
>>> The way the manpages are written, one would not think of -z now (or ld.so's LD_BIND_NOW) as having ties to relro, but rather as a debugging aid, so that debugger sessions don't go through the symbol resolution
>> Is that what happens? I thought it would only resolve the symbols that are used, and "resolve all symbols" means resolving all symbols that need to be resolved, i.e. are listed in .dynsym of some loaded ELF file
> I did not mean to contradict; indeed there are different interpretations for "all" and "used".
> - symbols in .dynsym of a library (libstdc++ roughly 5971)
> - symbols in .rela.dyn/.rela.plt/.dynsym section of an executable (hello world[1] in C++: up to 14 depending on what you count)

Good point, the .rela.* sections are probably more appropriate when estimating the cost, since symbols in .dynsym don't necessarily need to be looked up. (If they're defined in the library but not used there, or if the library uses -fno-semantic-interposition.)

> - set of functions that actually get invoked during runtime (if rand yields 0, cout.operator<< won't be called, so it need not be resolved)

Indeed, “used” can mean statically used or dynamically used, and neither necessarily implies the other. (Statically used symbols might not be needed in a particular execution, and dynamically used symbols might have been resolved with dlsym.)
There could be quite a number of functions that aren't used in a particular invocation of an executable, which I guess could happen especially in scripts. Then on the other hand, looking up all statically used functions at once might be more cache-friendly.
> [1]
> #include <iostream>
> #include <cstdlib>
> #include <ctime>   // for time()
> int main()
> {
>     srand(time(nullptr));
>     if (rand() & 1)
>         std::cout << "Hello world\n";
> }
>
> LD_BIND_NOW=1 LD_DEBUG=symbols ./a.out 2>&1|grep -i symbo|sort -u|wc -l
> 6302
> LD_BIND_NOW=0 LD_DEBUG=symbols ./a.out 2>&1|grep -i symbo|sort -u|wc -l
> 4044

Strangely I get 6300 for both, and I haven't updated to the “-z now by default” snapshot yet. But either way there are an awful lot of symbols used by libstdc++.

Aaron
On Tue, Apr 5, 2022 at 5:24 AM Marcus Meissner <meissner@suse.de> wrote:
> Hi folks,
>
> Security hardening of Linux systems suggests marking ELF binary sections read-only wherever possible.
>
> A part of this binary hardening is making the ELF relocations in binaries and libraries read-only, so that they cannot be overwritten and used for attacks.
>
> SUSE has built everything with "Partial RELRO" (-z relro) for a long time, via a default in binutils.
>
> We did not yet do "Full RELRO" (-z now) as we feared the amount of integration work.
>
> However, as this is industry standard now, we have started the integration and will push it to main Factory / Tumbleweed soon.
>
> It is being implemented by:
>
> - The SUSE binutils "ld" queries the "SUSE_ZNOW" environment variable. If it is present and not "0", it will enable "-z now".
> - post-build-checks injects, via /etc/profile.d/build-system.sh, the environment variable SUSE_ZNOW=1 into all RPM build chroots (that use post-build-checks, which should be all of them).
> - Packages can still deselect it. Either:
>   - use linker option "-z lazy",
>   - or export SUSE_ZNOW=0 in the %build section.
>
> Currently only "xorg-x11-server" and "python-atspi:tests" needed to do this in our staging.
+1. So far I have seen that the X server does not work (as you mention above); other components that load DSOs in funky ways will probably break as well.

You should also note that you will not be able to interpose symbols anymore, you need to relink everything again to do so.

You might also want to avoid paying the price of symbol interposition at compile time in the first place, since it won't work, by using -fno-plt and -fno-semantic-interposition.
On Tue, Apr 5, 2022 at 2:17 PM Cristian Rodríguez <cristian@rodriguez.im> wrote:
> You should also note that you will not be able to interpose symbols anymore, you need to relink everything again to do so.

Before somebody asks why: it is because ld does PLT elision at build time. It changes PLT->GOT when -z now is used, since the whole reason for the existence of the PLT is lazy binding.
On 05.04.22 at 20:24, Cristian Rodríguez wrote:
> On Tue, Apr 5, 2022 at 2:17 PM Cristian Rodríguez <cristian@rodriguez.im> wrote:
>> You should also note that you will not be able to interpose symbols anymore, you need to relink everything again to do so.
> Before somebody asks why: it is because ld does PLT elision at build time. It changes PLT->GOT when -z now is used, since the whole reason for the existence of the PLT is lazy binding.
Can you elaborate on this? The GOT is still filled out by the dynamic linker, so if the linker thinks a symbol should be interposed (and the logic behind interposition is architecture-independent of course), why can it not write the address of a different function into the GOT?

Also this is not what I see locally (on Tumbleweed x86_64):

$ cat test.cpp
#include <iostream>

int main() { std::cout << "Hello World"; }
$ g++ -O2 -Wl,-z,now -o test test.cpp
$ readelf --relocs --wide test

Relocation section '.rela.dyn' at offset 0x6e0 contains 5 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
[...]

Relocation section '.rela.plt' at offset 0x758 contains 4 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000403fc0  0000000200000007 R_X86_64_JUMP_SLOT     0000000000000000 __cxa_atexit@GLIBC_2.2.5 + 0
0000000000403fc8  0000000300000007 R_X86_64_JUMP_SLOT     0000000000000000 _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@GLIBCXX_3.4.9 + 0
0000000000403fd0  0000000400000007 R_X86_64_JUMP_SLOT     0000000000000000 _ZNSt8ios_base4InitC1Ev@GLIBCXX_3.4 + 0
0000000000403fd8  0000000900000007 R_X86_64_JUMP_SLOT     0000000000401060 _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 + 0
$ objdump -d --no-show-raw-insn test
[...]

Disassembly of section .plt:

0000000000401020 <__cxa_atexit@plt-0x10>:
  401020: push   0x2f8a(%rip)        # 403fb0 <_GLOBAL_OFFSET_TABLE_+0x8>
  401026: jmp    *0x2f8c(%rip)        # 403fb8 <_GLOBAL_OFFSET_TABLE_+0x10>
  40102c: nopl   0x0(%rax)

0000000000401030 <__cxa_atexit@plt>:
  401030: jmp    *0x2f8a(%rip)        # 403fc0 <__cxa_atexit@GLIBC_2.2.5>
  401036: push   $0x0
  40103b: jmp    401020 <_init+0x20>

0000000000401040 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>:
  401040: jmp    *0x2f82(%rip)        # 403fc8 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@GLIBCXX_3.4.9>
  401046: push   $0x1
  40104b: jmp    401020 <_init+0x20>

0000000000401050 <_ZNSt8ios_base4InitC1Ev@plt>:
  401050: jmp    *0x2f7a(%rip)        # 403fd0 <_ZNSt8ios_base4InitC1Ev@GLIBCXX_3.4>
  401056: push   $0x2
  40105b: jmp    401020 <_init+0x20>

0000000000401060 <_ZNSt8ios_base4InitD1Ev@plt>:
  401060: jmp    *0x2f72(%rip)        # 403fd8 <_ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4>
  401066: push   $0x3
  40106b: jmp    401020 <_init+0x20>

Disassembly of section .text:

0000000000401070 <main>:
  401070: sub    $0x8,%rsp
  401074: mov    $0xb,%edx
  401079: mov    $0x402004,%esi
  40107e: mov    $0x404040,%edi
  401083: call   401040 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>
  401088: xor    %eax,%eax
  40108a: add    $0x8,%rsp
  40108e: ret
  40108f: nop
[...]

So there is a PLT and it's being used. So maybe it's not using "-z now"?

$ readelf --dynamic test

Dynamic section at offset 0x2d78 contains 30 entries:
  Tag        Type                         Name/Value
[...]
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW
[...]

Adding Michael Matz because perhaps I'm wrong. Everything I know about ELF interposition is from [1] and I know that this is not undisputed.

Aaron

[1] <https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic>
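The flags that `readelf --dynamic` prints can also be read by a program from its own dynamic section. The following is a minimal sketch, assuming Linux, a libc that ships `<link.h>`, and a dynamically linked executable; the function name `self_bind_now` is made up for illustration.

```c
#include <elf.h>
#include <link.h>

/* Start of this object's dynamic section. The linker provides this
 * symbol for dynamically linked executables. */
extern ElfW(Dyn) _DYNAMIC[];

/* Hypothetical helper: returns 1 if the dynamic section of this
 * executable requests immediate binding, 0 otherwise. */
int self_bind_now(void)
{
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == DT_BIND_NOW)
            return 1;   /* older, standalone tag */
        if (d->d_tag == DT_FLAGS && (d->d_un.d_val & DF_BIND_NOW))
            return 1;   /* shown by readelf as "(FLAGS) BIND_NOW" */
        if (d->d_tag == DT_FLAGS_1 && (d->d_un.d_val & DF_1_NOW))
            return 1;   /* shown by readelf as "(FLAGS_1) Flags: NOW" */
    }
    return 0;
}
```

Linked with `-Wl,-z,now` the function should return 1, and linked with `-Wl,-z,lazy` it should return 0. It only inspects the executable itself, not the shared libraries it loads.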
On Wed, Apr 6, 2022 at 8:54 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
> Can you elaborate on this? The GOT is still filled out by the dynamic linker, so if the linker thinks a symbol should be interposed (and the logic behind interposition is architecture-independent of course), why can it not write the address of a different function into the GOT?

https://github.com/bminor/binutils-gdb/blob/5f0b6b77f11ca1484b69babd7ab6729e... contains the logic. Apparently it no longer does this depending on BIND_NOW. It used to, but that was apparently reverted because it broke LD_AUDIT.

Huh, so I have to use -Wl,-Bsymbolic-functions then?

https://media.giphy.com/media/ToMjGpx9F5ktZw8qPUQ/giphy-downsized-large.gif

-fvisibility=protected did not work the last time I tried either.
Hello,

On Wed, 6 Apr 2022, Cristian Rodríguez wrote:
> On Wed, Apr 6, 2022 at 8:54 PM Aaron Puchert <aaronpuchert@alice-dsl.net> wrote:
>> Can you elaborate on this? The GOT is still filled out by the dynamic linker, so if the linker thinks a symbol should be interposed (and the logic behind interposition is architecture-independent of course), why can it not write the address of a different function into the GOT?
> https://github.com/bminor/binutils-gdb/blob/5f0b6b77f11ca1484b69babd7ab6729e... contains the logic. Apparently it no longer does this depending on BIND_NOW. It used to, but that was apparently reverted because it broke LD_AUDIT.
>
> Huh, so I have to use -Wl,-Bsymbolic-functions then?

I had something to say about -Bsymbolic-functions recently in an internal thread, which found its way to Stack Overflow via a colleague. It's not quite 100% precise and conflates some things, but essentially says the right thing: don't use it :)

https://stackoverflow.com/a/71559422/1817805

> -fvisibility=protected did not work the last time I tried either.

For which purpose?

Ciao,
Michael.
Hello,

On Thu, 7 Apr 2022, Aaron Puchert wrote:
> On 05.04.22 at 20:24, Cristian Rodríguez wrote:
>> On Tue, Apr 5, 2022 at 2:17 PM Cristian Rodríguez <cristian@rodriguez.im> wrote:
>>> You should also note that you will not be able to interpose symbols anymore, you need to relink everything again to do so.
That's not true. The only real effect that -znow has is to set the DT_BIND_NOW flag in the produced component (shared lib or executable). And that simply makes ld.so resolve all relocations from that component at load time, instead of lazily. -znow does _not_ change the ELF interposition and visibility rules in any way.

Just because it came up in the thread: do note that the PLT is not for enabling interposition, it's for enabling lazy binding. Interposition is also possible without a PLT, as it always was for e.g. data symbols. So, it doesn't matter (for interposition) if the PLT is now used or not.

With -znow it would be possible to avoid the PLT by essentially inlining it:

  callit();

would be translated roughly to

  callq *GOT[id_of_callit]

(instead of 'callq plt_slot_of_callit'), and at that GOT slot a normal relocation to symbol 'callit' would be recorded. And that one is (with znow) resolved at load time, instead of by the PLT slot calling into the dynamic linker. That scheme can break certain assumptions (e.g. from LD_AUDIT), so it's not done. But even if it were done it wouldn't break interposition.

Now, there are contrived examples where znow changes something. You basically need to construct examples where the final outcome of symbol resolution depends on timing, i.e. where the resolved symbol at load time differs from the one that would be found some time later. One could imagine a component calling foobar() (which may have a definition in, say, libfoo), and then a dlopen'ed component also defining foobar. With lazy binding, foobar would only be resolved at call time. Hence, when that first call occurs after the dlopen, _and_ that dlopen is done in a very specific way so that the component comes in front of libfoo (which isn't normally the case), foobar will be resolved to the dlopen'ed component instead of libfoo. With znow it will always resolve to the definition in libfoo (because the dlopen hadn't happened yet).
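To make the difference between lazy and load-time binding concrete, here is a toy model in plain C. It is an analogy only and not how ld.so is implemented: a function pointer stands in for a GOT slot, `resolve_callit` stands in for the dynamic linker's symbol lookup, and all names except `callit` are invented for this sketch.

```c
/* Type of the function behind the simulated GOT slot. */
typedef int (*fn_t)(void);

/* Counts how often the simulated symbol lookup runs. */
int lookups = 0;

/* The "real" definition that the lookup eventually finds. */
static int real_callit(void) { return 42; }

/* Stand-in for the dynamic linker's symbol search. */
static fn_t resolve_callit(void)
{
    lookups++;
    return real_callit;
}

static int lazy_stub(void);

/* Simulated GOT slot. In the lazy case it starts out pointing at a stub. */
fn_t callit_slot = lazy_stub;

/* Lazy binding: the first call resolves the symbol and patches the slot,
 * so the slot must still be writable at that point. */
static int lazy_stub(void)
{
    callit_slot = resolve_callit();
    return callit_slot();
}

/* Simulated "-z now": resolve at "load time", before any call happens. */
void bind_now(void)
{
    callit_slot = resolve_callit();
}

/* Put the model back into its initial lazy state. */
void reset_lazy(void)
{
    callit_slot = lazy_stub;
    lookups = 0;
}
```

In the lazy case the lookup runs on the first `callit_slot()` call; after `bind_now()` it has already run and the slot never changes again. That last property is what lets Full RELRO make the real GOT read-only once startup relocation is done. Which definition the lookup finds is a separate question, which is why interposition is unaffected.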
>> Before somebody asks why: it is because ld does PLT elision at build time. It changes PLT->GOT when -z now is used, since the whole reason for the existence of the PLT is lazy binding.
> Can you elaborate on this? The GOT is still filled out by the dynamic linker, so if the linker thinks a symbol should be interposed (and the logic behind interposition is architecture-independent of course), why can it not write the address of a different function into the GOT?
And this is exactly what would happen if the PLT would be elided^Winlined and hence also not break interposition. As you note the PLT is still used on x86-64 though, for other reasons. To wit:

% cat app.c
extern void interpose_this(void);
int main() {
  interpose_this();
  return 0;
}
% cat lib.c
void interpose_this(void) { }
% cat interpose.c
extern int printf(const char *, ...);
void interpose_this(void) {
  printf("it works\n");
}
% gcc -Wl,-z,now -fPIC -shared -o interpose.so interpose.c
% gcc -Wl,-z,now -fPIC -shared -o liblib.so lib.c
% gcc -Wl,-z,now -Wl,-rpath,. app.c -L. -llib
% ./a.out
% LD_PRELOAD=interpose.so ./a.out
it works

So, it works, like it says there ;-)

Ciao,
Michael.
On 05.04.22 at 20:17, Cristian Rodríguez wrote:
> You might also want to avoid paying the price of symbol interposition at compile time in the first place, since it won't work, by using -fno-plt and -fno-semantic-interposition.

There is <https://bugzilla.opensuse.org/show_bug.cgi?id=1187864>. We'll likely not get -fno-semantic-interposition globally, but once the issues with protected visibility get resolved that might be a good alternative. This would make the choice more explicit. But I gather it'll take a while before people start using it after it gets fixed, so that's more of a long-term solution.

Some projects are already using -fno-semantic-interposition by their own choice, e.g. LLVM since version 13. (If I remember correctly.)

Aaron
participants (5)
- Aaron Puchert
- Cristian Rodríguez
- Jan Engelhardt
- Marcus Meissner
- Michael Matz