> Hash joins and sort-merge joins have been considered the algorithms of choice for analytical relational queries in most parallel database systems because of their performance robustness and ease of parallelization. On the other hand, nested loop joins have been considered less attractive and are conservatively used. In this paper, we revisit the potential of nested loop joins [...]
Uh, a hash join is basically a nested loop where first you create a hash table index of one side. When I finally realized that I suddenly understood why certain PG queries were slow!
So right off in the abstract I find this paper suspect.
Hash join is not a nested loop, it's two separate loops one after the other. This makes a huge difference since the complexity is O(N+M) rather than O(N*M). See these examples:
Convert all nested loops and funccalls to one large do whilw loop with chacherowsized bundled input and output in the compiler ?
> Hash joins and sort-merge joins have been considered the algorithms of choice for analytical relational queries in most parallel database systems because of their performance robustness and ease of parallelization. On the other hand, nested loop joins have been considered less attractive and are conservatively used. In this paper, we revisit the potential of nested loop joins [...]
Uh, a hash join is basically a nested loop where first you create a hash table index of one side. When I finally realized that I suddenly understood why certain PG queries were slow!
So right off in the abstract I find this paper suspect.
Hash join is not a nested loop, it's two separate loops one after the other. This makes a huge difference since the complexity is O(N+M) rather than O(N*M). See these examples:
https://vladmihalcea.com/hash-join-algorithm/
https://vladmihalcea.com/nested-loop-join-algorithm/
Oh, I see, thanks. I did screw up my understanding.
(2023)