I’m doing some Galois Field / Cyclic Redundancy Check research for fun and I’ve come across an intriguing pattern that I need a data structure for.

Across the 64-bit (or even 128-bit or larger) spaces, I’ve discovered an interesting pattern relating to hamming distances that I’d like a data structure to represent.

I’m going to need something on the order of ~billions of intervals each having somewhere between 1 item to ~1 billion per interval. And I’d like to quickly (O(1) or O(lg(n))) determine if other intervals intersect.


For 32-bit space I can simply make a 512MB Bitmask lol and then AND/OR the two Bitmask. Easy

But for 64-bit space I’m stuck and a bit ignorant to various data structures. I’m wondering if someone out there has a good data structure for me to use?

I’ve read over Interval Trees on Wikipedia. I’m also considering binary decision diagram over the 64-bits actually. Finally I’m thinking of some kind of 1-dimension octtree like datastructure (is that just a binary tree?? Lol. But BVH trees in 3d space seems similar to my problem it’s just I need it optimized down to 1 dimension rather than 3.) Anyone else have any other ideas or cool data structures that might work?

    • drspod@lemmy.ml
      link
      fedilink
      arrow-up
      2
      ·
      7 hours ago

      I have not used it for 1 dimension

      Querying for all intervals (x’, y’) that overlap with an interval (x, y) is a two dimensional range query that selects the upper left quadrant of the x-y plane that satisfies x < y’ && x’ < y

  • aes@programming.dev
    link
    fedilink
    arrow-up
    2
    arrow-down
    1
    ·
    14 hours ago

    For 32-bit space I can simply make a 512MB Bitmask lol and then AND/OR the two Bitmask. Easy

    Uh, no, that’s O(n), yeah?

    • dragontamer@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      1
      ·
      edit-2
      7 hours ago

      And typical RAM speeds are 100GB/second for CPUs and 500GB/second on GPUs, meaning 512MB operations are literally on the order of 5 miliseconds for CPU and 1ms on GPU.

      Below certain sizes, the ‘billions of intervals’ is larger than the damn Bitmask. Seriously, 8 bytes per interval (aka one pointer and 0 data) and that’s 8GB for the data structure.

      Instead of a billion 32-bit intervals to store (4GB of RAM at the minimum) it’s obviously a better move to store 500-million byte Bitmasks. And modern GPUs can crush that in parallel with like 3 lines of CUDA anyway.