[Discuss] Port Scanning

markw at mohawksoft.com markw at mohawksoft.com
Wed Aug 7 15:09:03 EDT 2024


> On 8/7/24 09:41, Kent Borg wrote:

>
> -kb, the Kent who is lazy and likes having fast data structures for
> free, even though he is sure Mark could implement a faster B-tree in C*.

I know you are saying this with some humor, however, it is circumstantial.
What happens when you have billions of entries? 10 billion? Have you ever
looked at merkle tree?

AVL trees are interesting in that you can have a single "node" structure
that can be unrolled into a doubly linked list and rolled back into a
tree.

A classic binary tree where all the values are in the leafs and the
internal of the tree just leads you down the tree to the data may use more
memory and managing the leaf nodes may take more CPU.

Having a generic binary tree is good and all, but if your "secret sauce"
is the rapid retrieval of data the generics don't work well enough. If
your generic binary tree can't efficiently manage your data within the
constraints of memory and CPU, then, you need to use something else.

Additionally, you can structure things interestingly. If you memmap a slab
of memory into your process and sub-allocate out of that, you can keep
memory closer than just using malloc.

A binary tree is generally represented as O(log n) but can degrade as bad
as O(n) if you don't balance. You have to factor in the balancing process
on insertion.

You can use a hash table instead of a tree, and that can evolve into a
merkle tree, if you don't expect too many collisions you implement a
cuckoo hash.

>
> * The BTreeSet had some funny bumps in its performance curve, where it
> turns out it is allocating memory. If I understand correctly these could
> be avoided by initializing the data structure upfront to needed size,
> instead of being dynamic. The catch in their case is they don't know in
> advance how big the data will be. The AVL tree didn't have these bumps.
> I don't know whether the B-tree could be implemented to avoid them,
> while still being as fast. I'll leave that to others. (Maybe C's tree
> could /add/ some bumps to get faster.)

Like I've said, I don't tend to use the generic implementations in C/C++.
There are too many short comings.




More information about the Discuss mailing list