# discord - p0-2024-fall - vittilad — 2024/9/4 18:341. Count the zeros from the right (this is the Rj in the cardinality computation). Also find the dense bucket index.
2. Reconstruct the curently stored Rj for the bucket. (See step 4 how to do). Compare to determine if you need to overwrite the stored bucket value.
3. Store the 4 least significant bits of Rj binary representation in the dense array (and the 3 bits remaining in the overflow but only if they are not all zero, overflow bucket is a map from dense bucket index to thos 3 bits)4. When computing cardinality, you need to reconstruct the Rj from the dense/overflow storage. The 3 lower bits come from dense bucket and if there is an entry for the bucket index in overflow bucket you need to that into account.
# discord - p0-2024-fall - shengdao - 2024/12/15 11:00According to my understanding, in the presto implementation of HLL, the high p bits are taken as the common bucket number, recorded as bucket_index. Take the LSB as the value. If the LSB exceeds the bucket width (DENSE_BUCKET_SIZE), it needs to be placed in the overflow bucket. The high 3 bits (OVERFLOW_BUCKET_SIZE) are placed in the overflow bucket and stored in the form of (bucket_index,xxx). The remaining low 4 bits (DENSE_BUCKET_SIZE) are placed in the corresponding common bucket. When calculating the cardinality, traverse densebucket. If there is a corresponding value in the overflow bucket, take the maximum value of the two and calculate according to the formula to obtain the cardinality.
If my understanding is correct, when calculating the cardinality, CONSTANTmm/sum, the minimum sum can be about 2^(-15), and it will not calculate such a large cardinality, just like where I made a mistake. I would like to ask, what are the mistakes and omissions I made?
# discord - p0-2024-fall - EvanHuang — 2024/12/21 01:13you are supposed to combine bits from dense and overflow into the original number
注释中,dense_bucket_Structure holding dense buckets (or also known as registers)。