Some of the APIs in use on Linux today come from older Unix variants. (For this reason, I probably wouldn’t call one of these a “Linux API” as the author did, though I guess it works linguistically for those that are usually present on Linux.) These APIs have semantics that were designed before threading existed on many platforms. Making them thread-safe without breaking existing code can be challenging.
If setenv(3) is among these, it could explain why glibc’s implementation doesn’t support multi-threaded programs, and why its documentation states as much. To have used it in a multi-threaded environment, ignoring the docs, was a bug in the Steam client. Perhaps it never occurred to the people who ported Steam’s code to glibc that threading issues might be different from what they were used to on other platforms.
To be fair, the author might be aware of this, as he did refer to glibc’s implementation as a tradeoff rather than a bug.
That’s not a completely reliable fix, a third party library could still call setenv and trigger crashes, there’s still a risk of data races, but we’ve observed a significant reduction in SIGABRT volumes.
Hmm. If they want a dirty hack, I expect they could do a library interposer that overrides the setenv(3) and getenv(3) symbols with versions that grab a global “environment variable” lock before calling the actual function.
They say that they’re having problems with third-party libraries that use environment variables. If they’re using third-party libraries statically linked against libc, I suppose that won’t work, but as long as they’re dynamically linked, it should be okay.
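For what it's worth, a minimal sketch of that interposer in C might look like the following. It assumes glibc on Linux, a build as a shared object loaded through LD_PRELOAD, and dynamically linked callers; env_lock, resolve_once and resolve_real are names made up for the sketch. It only serializes the calls themselves and does nothing for the pointer that getenv() hands back.

```c
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>
#include <stdlib.h>

typedef int   (*setenv_fn)(const char *, const char *, int);
typedef char *(*getenv_fn)(const char *);

/* Hypothetical names: one global lock guarding all environment access. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t  resolve_once = PTHREAD_ONCE_INIT;
static setenv_fn real_setenv;
static getenv_fn real_getenv;

/* Look up the next definitions in search order (normally libc's). */
static void resolve_real(void)
{
    real_setenv = (setenv_fn)dlsym(RTLD_NEXT, "setenv");
    real_getenv = (getenv_fn)dlsym(RTLD_NEXT, "getenv");
    assert(real_setenv != NULL && real_getenv != NULL);
}

int setenv(const char *name, const char *value, int overwrite)
{
    pthread_once(&resolve_once, resolve_real);
    pthread_mutex_lock(&env_lock);
    int rc = real_setenv(name, value, overwrite);
    pthread_mutex_unlock(&env_lock);
    return rc;
}

char *getenv(const char *name)
{
    pthread_once(&resolve_once, resolve_real);
    pthread_mutex_lock(&env_lock);
    /* The lookup is serialized, but the returned pointer is still
     * unprotected once the lock is dropped. */
    char *value = real_getenv(name);
    pthread_mutex_unlock(&env_lock);
    return value;
}
```

Something like cc -shared -fPIC -o libenvlock.so envlock.c and then LD_PRELOAD=./libenvlock.so should be enough to try it (file names are hypothetical). A real version would also want to wrap unsetenv(), putenv() and clearenv().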
EDIT: Though you’ve still got an atomic update problem with the returned buffer, doing things the way they are, if you don’t want to leak memory. Like, one thread might have half-updated the value of the buffer when another is reading the buffer after returning from the interposer’s version of the function. That shouldn’t directly crash, but you can get a mangled environment variable value. And there are no guarantees on synchronization of access to the buffer, unlike the getenv() call itself.
thinks
This is more of a mind-game solution, but…
Well, you can’t track lifetime of pointers to a buffer. So there’s no true fix that doesn’t leak memory. Because the only absolute fix is to return a new buffer from getenv() for each unique setenv(), because POSIX provides no lifetime bounds.
But if you assume that anything midway through a buffer read is probably going to do so pretty soon, which is probably true…
You can maybe play tricks with mmap() and mremap(), if you’re willing to blow a page per environment variable that you want to update and a page of virtual address space per update, and some temporary memory. The buffer you return from the interposer’s getenv() is an mmap()ed range. In the interposer’s setenv(), if the value is modified, you mremap() with MREMAP_DONTUNMAP. Future calls to getenv() return the new address. That gives you a userspace page fault handler (via userfaultfd(2), if the old range was registered with it) on the old range, which I suppose – haven’t written userspace page fault handlers myself – can probably block the memory read until the new value is visible and synchronize on visibility of changes across threads.
If you assume that any read of the buffer is sequential and moving forward, then if a page fault triggers on an attempted access at the address at the start of the page, you can return the latest value of the variable.
If you get a fault via an address into the middle of the buffer, and you still have a copy of the old value, then you’ve smacked into code in the middle of reading the buffer. Return the old value.
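To make the remap step concrete, here is a rough C sketch of just that part. It assumes Linux 5.7 or later (where MREMAP_DONTUNMAP exists) and private anonymous mappings; struct env_page, env_page_init and env_page_update are hypothetical names, and the fallback #define is my assumption for older headers. No userfaultfd handler is registered here, so a read through the old address simply sees a fresh zero-filled page rather than being handed the old or new value by a handler.

```c
#define _GNU_SOURCE            /* for mremap() and MREMAP_* on glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MREMAP_DONTUNMAP
#define MREMAP_DONTUNMAP 4     /* assumption: value from the Linux UAPI headers */
#endif

/* Hypothetical: one page backing one environment variable's value. */
struct env_page {
    char  *addr;               /* address a getenv() wrapper would return */
    size_t size;               /* one page */
};

int env_page_init(struct env_page *p, const char *value)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (strlen(value) >= page)
        return -1;
    void *m = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    strcpy(m, value);
    p->addr = m;
    p->size = page;
    return 0;
}

/* Move the backing page to a new address and store the new value there.
 * The old range stays mapped but empty, so the next access to it faults.
 * Returns the old address, or NULL if the value is too long or the
 * kernel refuses the remap (e.g. no MREMAP_DONTUNMAP support). */
char *env_page_update(struct env_page *p, const char *value)
{
    assert(p->addr != NULL);
    if (strlen(value) >= p->size)
        return NULL;
    char *old = p->addr;
    void *moved = mremap(old, p->size, p->size,
                         MREMAP_MAYMOVE | MREMAP_DONTUNMAP);
    if (moved == MAP_FAILED)
        return NULL;
    strcpy(moved, value);
    p->addr = moved;
    return old;
}
```

The fault-handling half (deciding between old and new value based on the faulting offset) would sit in a userfaultfd thread, which this sketch deliberately leaves out.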
A given amount of time after an update, you’re free to purge old values from setenv(). You can do so out of the interposer’s functions.
You can never eliminate the chance that a thread has read the first N bytes of an environment variable buffer, then gone to sleep for ten minutes, then suddenly wants the remainder. In that case, you have to allow for the possibility that the thread sees part of the old environment variable value and part of the new. But you can expend temporary memory to remember old values longer to make that ever-more unlikely.
This was a neat read, thanks for sharing!