• TehPers@beehaw.org

    Guess I’ll post another update. The block-based data structure makes no sense to me. At some point the paper claims that removing a pair from the data structure takes O(1) time:

    To delete the key/value pair ⟨a,b⟩, we remove it directly from the linked list, which can be done in O(1) time.

    This has me very confused. First, it doesn’t explain how to find which linked list to remove the pair from (every block is its own linked list, and there are many blocks). You could binary search for the blocks whose range can contain the value and scan them in order of their upper bounds, but that would be O(M * |D_0|) just to search the non-bulk-prepended values.
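
    For what it’s worth, the usual way to make “delete in O(1)” honest is an auxiliary hash map from key to the node holding it, so you never search any list at all. This is my workaround, not something I found in the paper — the `Node`/`Block` names and the shared `index` are my own. A single map shared across all blocks also answers the “which block?” question, since the node pointer is all you need to unlink:

    ```python
    class Node:
        __slots__ = ("key", "value", "prev", "next")

        def __init__(self, key, value):
            self.key, self.value = key, value
            self.prev = self.next = None


    class Block:
        """One block = one doubly linked list of key/value pairs.

        `index` maps key -> node. Sharing one dict across every block
        means delete(key) is O(1) expected with no per-block search.
        """

        def __init__(self, index):
            self.head = None
            self.index = index  # shared key -> Node map

        def insert(self, key, value):
            node = Node(key, value)
            node.next = self.head
            if self.head:
                self.head.prev = node
            self.head = node
            self.index[key] = node

        def delete(self, key):
            node = self.index.pop(key)  # O(1) expected lookup
            if node.prev:
                node.prev.next = node.next
            else:
                self.head = node.next
            if node.next:
                node.next.prev = node.prev


    index = {}
    blk = Block(index)
    blk.insert("a", 1)
    blk.insert("b", 2)
    blk.delete("a")  # no search: the map hands us the node directly
    ```

    The extra map costs O(1) expected per insert, so it doesn’t change any of the paper’s bounds as far as I can tell.
    
    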

    Second, the data structure seems to be described purely from a theoretical perspective. Linked lists are fine on paper, but in practice it’s better to initialize each block as a preallocated array (vector) of size M. It’s also unclear whether each block’s elements should be kept sorted by key within the block; in my opinion that would make the most sense: it cuts the median-finding step of a split from O(M) to O(1), and it would explain how PULL() returns “the smallest M values”.
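
    Here’s roughly what I mean by the array-backed, sorted variant. This is my reading of how a block could work, not the paper’s literal construction — `Block`, `split`, and `pull` are names I made up, and `pull` here only reads the smallest m pairs rather than doing whatever bookkeeping the paper’s PULL() does:

    ```python
    import bisect


    class Block:
        """Array-backed block with keys kept sorted.

        Sorted order makes the split point the midpoint (no median
        selection) and makes "smallest m values" a prefix read.
        """

        def __init__(self, capacity):
            self.capacity = capacity  # M in the paper's terms
            self.keys = []
            self.values = []

        def insert(self, key, value):
            # O(log M) to find the slot, O(M) worst-case to shift.
            i = bisect.bisect_left(self.keys, key)
            self.keys.insert(i, key)
            self.values.insert(i, value)

        def split(self):
            # Sortedness means the split point is just the midpoint.
            mid = len(self.keys) // 2
            right = Block(self.capacity)
            right.keys, right.values = self.keys[mid:], self.values[mid:]
            self.keys, self.values = self.keys[:mid], self.values[:mid]
            return right

        def pull(self, m):
            # The smallest m pairs are simply the first m slots.
            return list(zip(self.keys[:m], self.values[:m]))


    blk = Block(capacity=4)
    for k in (3, 1, 4, 2):
        blk.insert(k, str(k))
    right = blk.split()  # blk keeps {1, 2}, right gets {3, 4}
    ```

    The trade-off is that sorted insertion into an array costs O(M) shifts in the worst case, which the paper’s linked-list framing avoids, so it may be deliberately leaving the layout unspecified.
    
    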

    Anyway, it’s also possible that the language of the paper is just beyond me.

    I like the divide-and-conquer approach, but the paper itself is difficult to implement in my opinion.