I’ve been experiencing some perplexing and frustrating issues with my server and need some advice from people more knowledgeable than I am.

Recently I decided to upgrade my Raspberry Pi server, and I found a good deal on an HP Elite Mini 600 G9 on eBay, so I took the plunge. It has an Intel Core i5-12500T and came with 8 GB of RAM and a 256 GB SSD. I bumped it up to 32 GB of RAM and added a 4 TB SSD. It came with Windows installed, but I installed Debian on it.

With the basics taken care of, I set up my handful of Docker containers (in case it matters: Caddy, Actual Budget, Immich, Prometheus, Grafana). Ever since then, any time a CPU-heavy process runs, the whole machine freezes and stays frozen (I’ve let it sit to see whether it recovers, but it stays frozen for days), and I have to physically power it off. I tried to isolate the cause, thinking it was one of the containers, but it happened with Immich, Prometheus, and Grafana individually, as well as with a Borg backup running directly on the host. When I power it back on after one of these freezes, there are no system logs at all from the period of the freeze, so they give me nothing to go on.

Anyone have any ideas what the issue could be or even where to look? I’m starting to think it’s a hardware problem but I’m not sure and I don’t know what my next step should be.

  • czardestructo@lemmy.world

    Check the power supply too. Those mini PCs just use a cheap laptop power brick and sometimes they can’t sustain their full output anymore. I had one server constantly crash and once I swapped the power brick I never had a problem again.

  • basic_user@lemmy.world

    I had a similar issue on an HP EliteDesk 800 G3 Mini. It turned out to be a faulty RAM stick. Memtest revealed the issue, and after unplugging the defective stick the problem was resolved.

    • swipernoswipey@piefed.social (OP)

      Thank you!!! I think this is the issue. With one of the RAM sticks installed, memtest runs just fine. With the other one, it never finishes and the machine just reboots.

      • SayCyberOnceMore@feddit.uk

        Definitely suspect.

        You should be able to let memtest run for days with no problems, so a reboot points to either a faulty stick or possibly a faulty motherboard slot.

        Swap the RAM between slots to isolate the root cause.

  • AMillionMonkeys@lemmy.world

    > CPU-heavy process

    Sounds to me like a hardware issue: you’re overheating. Find a way to monitor your temps. I’m not sure how to do this on Linux, so I’m open to suggestions too.
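    On Linux, the usual tool is `sensors` from the lm-sensors package. The kernel also exposes raw readings in sysfs. Here is a minimal Python sketch, assuming thermal zones exist under `/sys/class/thermal` (zone names such as `x86_pkg_temp` vary by machine, and `read_thermal_zones` is a hypothetical helper name):

```python
from pathlib import Path


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict[str, float]:
    """Return {"thermal_zoneN (type)": degrees Celsius} for every readable zone."""
    readings: dict[str, float] = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            # The kernel reports zone temperatures in millidegrees Celsius.
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones are unreadable or report garbage; skip them.
            continue
        readings[f"{zone.name} ({zone_type})"] = millideg / 1000.0
    return readings


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} °C")
```

    Running it while a CPU-heavy job is active shows whether temperatures climb as the load rises.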

  • tal@lemmy.today

    Leave the console visible on an attached monitor. I don’t recall whether Debian has Ctrl-Alt-F1 disabled out of the box, but if not, that’ll put you on the first virtual console. If the kernel panics, it’ll display something there.

    If you can’t do that (no spare monitor), you can set up a serial console to another machine. I don’t remember offhand how to make the kernel send its errors there if that isn’t already the default, but I’m quite sure it’s possible; I’ve debugged machines using kernel stack traces on serial consoles. As I recall, sending a BREAK was equivalent to Magic SysRq.
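    Which Magic SysRq functions the kernel accepts is controlled by `/proc/sys/kernel/sysrq`: per the kernel’s SysRq documentation, 0 disables it, 1 enables everything, and larger values are a bitmask. A small Python sketch that decodes the current value (the `describe_sysrq` helper is hypothetical; the bit meanings are taken from that documentation):

```python
from pathlib import Path

# Bit meanings as listed in the Linux kernel's admin-guide SysRq documentation.
SYSRQ_BITS = {
    0x2: "control of console logging level",
    0x4: "control of keyboard (SAK, unraw)",
    0x8: "debugging dumps of processes etc.",
    0x10: "sync command",
    0x20: "remount read-only",
    0x40: "signalling of processes (term, kill, oom-kill)",
    0x80: "reboot/poweroff",
    0x100: "nicing of all RT tasks",
}


def describe_sysrq(value: int) -> list[str]:
    """Translate a kernel.sysrq value into the SysRq functions it allows."""
    if value == 0:
        return []  # SysRq completely disabled
    if value == 1:
        return ["all functions"]
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    sysrq = Path("/proc/sys/kernel/sysrq")
    if sysrq.exists():  # only present on Linux
        value = int(sysrq.read_text().strip())
        print(value, describe_sysrq(value) or ["SysRq disabled"])
```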

  • Shimitar@downonthestreet.eu

    RAM issue or CPU overheating. Monitor the CPU temperature over time, and run an extensive memtest, e.g. for an entire night.

  • ragingHungryPanda@piefed.keyboardvagabond.com

    In addition to the other suggestions about checking the RAM sticks: do you have resource limits on your containers? They’re generally a good idea anyway, but I’d set them up after checking the RAM and cooling situation. Check your CPU temps as well.

  • Onomatopoeia@lemmy.cafe

    As others have said, it’s probably overheating.

    That’s a mini, and it likely doesn’t have any fans at all (or only something perfunctory), so it probably won’t handle running at high CPU load for more than a few minutes.

    I currently have a small-form-factor PC with the same issue: drive and general box temps were high (the drive sat at 110 °F continuously, within spec but on the edge). It would randomly reboot.

    Replacing the thermal paste on the CPU cooler helped a lot (no more random reboots), but it was adding a compressor-type fan that dropped the box temps (and, more importantly, the drive temps) down to room temperature.

    I think the best you may be able to do is add an external compressor fan with some duct tape.

  • frongt@lemmy.zip

    Memtest? Boot a live image and stress test each component?

    I don’t think it’s overheating; that usually presents as throttling followed by a thermal-protection power-off.