• w3dd1e@lemmy.zip · 16 points · 28 days ago

    Firefox kept crashing on me a few days ago. Decided to run MemTest86 and sure enough. Bad RAM.

  • flamingo_pinyata@sopuli.xyz · 8 points · 28 days ago

    This is what dev humblebrag sounds like.
    Our app is so stable only random hardware events like bitflips can crash it.

    • grue@lemmy.world · 1 point · 28 days ago

      LOL, nah, Firefox isn’t that stable. If 10% of crashes were caused by bad RAM, it means 90% were still caused by something else.

      (My install regularly gets a memory leak that eventually makes my system unusable, BTW. I don’t think it’s necessarily the fault of Firefox itself – more likely JavaScript running in tabs, maybe interacting with an extension or something, and some of the blame goes to the kernel’s poor handling of low memory conditions – but it’s definitely not “dev humblebrag stable” for me.)

      • SkyeStarfall@lemmy.blahaj.zone · 5 points · 28 days ago

        10% of all crashes is definitely a brag. Crashes due to faulty hardware/bit flips are rare; generally I would expect that percentage to be less than 1% in any complex app.

      • Liketearsinrain@lemmy.ml · 2 points · 27 days ago

        A lot of these crashes were caused by third-party security software injecting code into Firefox. There was also some malware, and utilities like driver helpers.

        I don’t have precise numbers, but you may be able to search for it.

  • Toes♀@ani.social · 7 points · 27 days ago

    I used to be a part of an anticheat dev team and we discovered that this was a common problem back in the Windows XP era.

    We added a routine to check the memory addresses used after a crash and notified the user if we suspected hardware failure.

    At the time we suspected unstable overclocks because the metrics showed us the computers affected were typically overclocked as well.

    • llii@discuss.tchncs.de · 4 points · 28 days ago

      When I upgrade my home server I would like a low-power system with ECC RAM. I hope it will be financially viable in the future.

      • tal@lemmy.today · 3 points · edited · 28 days ago

        The problem is that ECC is one of the things used to permit price discrimination between server (less price sensitive) and PC (more price sensitive) users. Like, there’s a significant price difference, more than cost-of-manufacture would warrant. There are only a few companies that make motherboard chipsets, like Intel, and they have enough price control over the industry that they can do that. You’re going to be paying a fair bit more to get into the “server” ecosystem, as a result of that.

        Also…I’m not sure that ECC is the right fix. I kind of wonder whether the fact is actually that the memory is broken, or that people are manually overclocking and running memory that would be stable at a lower rate at too high of a rate, which will cause that. Or whether BIOSes, which can automatically detect a viable rate by testing memory, are simply being too aggressive in choosing high memory bandwidth rates.

        EDIT: If it is actually broken memory and only a region of memory is affected, both Linux and Windows have the ability to map around detected bad regions in memory, if you have the bootloader tell the kernel about them and enough of your memory is working to actually get your kernel up and running during initial boot. So it is viable to run systems that actually do have broken memory, if one can localize the problem.

        https://www.gnu.org/software/grub/manual/grub/html_node/badram.html

        Something like MemTest86 is a more effective way to do this, because it can touch all the memory. However, you can even do runtime detection with Linux up and running using something like memtester, so hypothetically someone could write a software package to detect this, update GRUB to be aware of the bad memory location, and after a reboot, just work correctly (well, with a small amount less memory available to the system…)
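        For concreteness, the GRUB side of that is just one config line. This is a sketch only: the address/mask pair below is a made-up placeholder, not a real bad region - substitute the addresses your memory test actually reports.

```sh
# /etc/default/grub - have GRUB tell the kernel to map around a bad region.
# GRUB_BADRAM takes comma-separated address,mask pairs.
# This pair is a fabricated example, NOT a real measurement.
GRUB_BADRAM="0x7ddf0000,0xffffc000"
```

        After editing, regenerate the GRUB config (e.g. `update-grub` on Debian-family systems) and reboot; the kernel then skips the masked region at the cost of a little usable RAM.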

        • AA5B@lemmy.world · 1 point · 27 days ago

          I wonder if AI can actually help here. As the industry abandons consumer hardware in favor of datacenter equipment to profit from the AI bubble, perhaps ECC memory will become cheaper.

        • grue@lemmy.world · 1 point · 28 days ago

          There’s no real good reason that all RAM shouldn’t have been ECC decades ago. It doesn’t actually cost much more to implement. The only reason it isn’t, as tal’s reply mentioned, is artificial price discrimination.

    • Jarix@lemmy.world · 1 point · 28 days ago

      How so?

      Didn’t it just highlight how stable the software is?

      I assume bit flips crash most software. If your software is so stable that hardware errors, which affect everyone equally (which may be my erroneous assumption, I’ll admit), make up a big share of its crashes, then it’s saying that if Firefox is crashing on you, it might be time to run some diagnostics on your hardware.

      A browser as a litmus test.

      • xxce2AAb@feddit.dk · 2 points · edited · 28 days ago

        Fair question. I find it unnerving, because there’s very little a software developer can meaningfully do if they cannot rely on the integrity of the hardware upon which their software is running, at least not without significant costs, and ultimately, if the problem is bad enough, even those would fail. This finding seems to indicate that a lot of hardware is much, much less reliable than I would have thought. I’ve written software for almost thirty years and across numerous platforms at this point, and the thought that I cannot assume a value stored in RAM will reliably retain its value fills me with the kind of dread I wouldn’t be able to explain to someone uninitiated without a major digression. Almost everything you do on any computing device - whether a server or a smartphone - relies on the assumption of that kind of trust. And this seems to show that assumption is not merely flawed, but badly flawed.

        Suppose you were a car mechanic confronted with a survey showing that 10 percent of cars were leaking brake fluid - or fuel. That might illustrate how this makes me feel.

        • Jarix@lemmy.world · 3 points · edited · 28 days ago

          Hmm thanks, also please massively digress if you would like to.

          I interpreted it like this: 10% is a lot if it’s 10% of a million. That’s 100,000. So if there are a million things that crash Firefox, that’s a high number.

          If Firefox only crashes 10 times a year because it runs that well, then 10% - that 1 crash from a bit flip - takes up such a high share of total crashes precisely because Firefox just doesn’t crash very often, which is impressive.

          If your dread is found to be justified, that won’t be too surprising to me, if hardware is getting made less reliable these days. Enshittification being the norm, and tech being in everything nowadays.

          We obviously need more context from Mozilla, but this could be a canary-in-the-coal-mine type situation.

          But it would be kind of neat if Firefox unintentionally became something of a reliable test for bit flips.

          • xxce2AAb@feddit.dk · 2 points · 27 days ago

            I agree, and there are a number of other biases to consider. Here’s some I can think of:

            • Firefox will mainly be running on desktops, laptops and smartphones. I would expect QA to be significantly better for this type of device than for, say, consumer-grade routers or TV boxes. But more concerning to me is stuff like cheap ATMs, industrial control systems (although Siemens have great QA) and elevator control systems etc. Infrastructure, not consumer toys - and Mozilla obviously aren’t the right people to say anything about the state of any of that.
            • While Mozilla is currently estimating approximately 200 million installs, some of those - especially on Linux - will have disabled telemetry. I know I do. With that said, I can’t recall the last time I had a FF CTD (crash to desktop), but I suspect when I did, it wasn’t even a bug but an OOM (out-of-memory) kill, because I was browsing on something like a 2 GB RAM micro-portable with insufficient swap. FF is one impressively stable piece of software these days.
            • Firefox usage is not evenly globally distributed, and I have no way to reliably assess whether FF has a larger or smaller proportional usage in regions that may rely more on older or refurbished hardware, which I would expect to have higher HW error rates (although I cannot prove that either - I can’t find any good public aggregate data for RAM MTBF trends over time, but I’d be very interested if somebody else knows where to find authoritative answers on that).

            (Un)fortunately, this may be the most Mozilla can provide in terms of insight. Their users tend to be particularly sensitive to perceived or practical privacy violations, so I understand - and appreciate - their caution in gathering data.

  • GreenBeanMachine@lemmy.world · 0 points · edited · 28 days ago

    What makes Firefox more susceptible to bit flips than any other software? Wouldn’t that mean that 10% of all software crashes are caused by bit flips, and it just depends on what software you are running when that happens?

    • spizzat2@lemmy.zip · 2 points · 28 days ago

      I don’t think they’re arguing that Firefox is more susceptible to bit flips. They’re trying to say that their software is “solid” enough that a significant number of the reported crashes are due to faulty hardware, which is essentially out of their control.

      If other software used the same methodology, you could probably use the numbers to statistically compare how “solid” the code base is between the two programs. For example, if the other software found that 20% of their crashes were caused by bit flips, you could reasonably assume that the other software is built better because a smaller portion of their crashes is within their control.

      • GreenBeanMachine@lemmy.world · 1 point · edited · 28 days ago

        Interesting metric to measure, but since I have no reference for how many crashes are caused by bit flips in any other software, it’s really hard to say whether Firefox is super stable or super flaky.

    • xthexder@l.sw0.com · 2 points · 28 days ago

      This checks out with Linus Torvalds saying most OS crashes across Linux AND Windows are caused by hardware issues, and it’s also why he uses ECC RAM.

      • douglasg14b@lemmy.world · 1 point · 28 days ago

        Honestly, yeah, it 100% checks out.

        I have a device with ECC RAM, and I can keep it online with applications running for well over 18 months with no stability issues.

        However, both my work computers and my personal computer start to become unstable after about 15 to 20 days, and degrade over the course of 1 to 2 years (with a considerable increase in the number of corrupt system files).

        Firefox and Chrome usually start to become unstable after a week if they have really high memory usage.

    • Buddahriffic@lemmy.world · 1 point · 28 days ago

      No, the exact % depends on how stable everything else is.

      As a trivial example, say you have 3 programs: one that sets a pointer to a random address and tries to dereference it, one that does the same but only if the last two digits of a timer it checks are “69”, and one that never sets a pointer to an invalid address. Based on the programs themselves, the first one will crash almost all the time, the second one will crash about 1% of the time, and the third one won’t crash at all.

      If you had a mechanism to perfectly detect bit flips (honestly, that part has me the most curious about the OP), and you ran each program until you had detected 5 bit flip crashes (let’s say they happen 1 out of each 10k runs), then the first program will have something like a 0.01% chance of any given crash being due to bit flip, about 1% for the 2nd one, and 100% for the 3rd one (assuming no other issues like OS stability causing other crashes).

      Going with those numbers I made up, every 10k “runs” you’d see 1 crash from bit flips and 9 crashes from other reasons. Or, for every crash report they receive, 1 of 10 is a bit flip and 9 of 10 are “other”. Well, more accurately, 1 of 20 for bit flips and 19 of 20 for other, since the 10% figure assumes the detector only catches half of them - what they actually measured was 5%.
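      The share-of-crashes arithmetic above can be sketched in a few lines (the rates are the comment’s made-up numbers, not measurements from Mozilla or anyone else):

```python
# Sketch of the point above: the bit-flip share of a program's crashes
# depends entirely on how often it crashes for other reasons.
# All rates here are illustrative assumptions.
BITFLIP_CRASH_RATE = 1 / 10_000  # assumed: 1 bit-flip crash per 10k runs


def bitflip_share(other_crash_rate: float) -> float:
    """Fraction of all crashes attributable to bit flips."""
    return BITFLIP_CRASH_RATE / (BITFLIP_CRASH_RATE + other_crash_rate)


for name, rate in [("crashes almost every run", 0.99),
                   ("crashes ~1% of runs", 0.01),
                   ("never crashes otherwise", 0.0)]:
    print(f"{name}: {bitflip_share(rate):.2%} of crashes are bit flips")
```

      Same bit-flip rate in every case, yet the reported share ranges from roughly 0.01% to 100% - which is why a high bit-flip percentage says more about the rest of the code than about the hardware.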

    • toddestan@lemmy.world · 1 point · 28 days ago

      Programs that use more memory could be slightly more susceptible to this sort of thing, because if a bit gets randomly flipped somewhere in a computer’s memory, the flip is more likely to land in an application with a larger RAM footprint than in one with a small footprint.

      I’m still surprised the percentage is this high.
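      A toy version of that footprint argument (the apps and sizes are invented for illustration): if a flip lands uniformly at random in occupied RAM, the chance it hits a given process is just that process’s share of the total footprint.

```python
# Illustrative footprints only - not measured data.
footprints_mb = {"browser": 4000, "editor": 300, "daemon": 50}
total_mb = sum(footprints_mb.values())

for app, mb in footprints_mb.items():
    share = mb / total_mb
    print(f"{app}: {share:.1%} chance a uniform random flip lands in it")
```

      Under this model a browser holding most of the occupied RAM also absorbs most of the flips, which fits the intuition above - though real susceptibility also depends on which pages are actually touched and how the program reacts to corruption.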