DuckDB Internals Part 1 (greybeam.ai)
452 points by marklit 4 days ago | 139 comments
tl;dr: Part 1 of a deep dive into DuckDB internals covers everything that happens before query execution: the in-process architecture (which avoids ODBC/JDBC serialization overhead by reading Arrow/pandas buffers zero-copy), the parse/bind/optimize pipeline (~30 optimizer passes, including filter pushdown, subquery unnesting, and dynamic join-filter pushdown), and physical planning, where the plan is split into pipelines at sink operators (GROUP BY, ORDER BY, hash-join builds). It also explains the storage layer (256 KB blocks, columnar row groups with zone maps for pruning) and how DuckDB queries Parquet efficiently (using footer statistics) and CSV (via an auto-sniffer that detects the dialect and column types).
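Zone-map pruning on row groups can be illustrated with a toy scan. This is a simplified sketch, not DuckDB's implementation: `RowGroup` and `scan_greater_than` are hypothetical names, and it assumes a single integer column whose per-group min/max are computed once when the group is written. A group is skipped entirely whenever its stored maximum already fails the `> threshold` predicate.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    """Hypothetical columnar row group holding one integer column."""
    values: list[int]
    # Zone map: min/max computed once at write time and stored with the group.
    zone_min: int = field(init=False)
    zone_max: int = field(init=False)

    def __post_init__(self) -> None:
        self.zone_min = min(self.values)
        self.zone_max = max(self.values)


def scan_greater_than(groups: list[RowGroup], threshold: int) -> tuple[list[int], int]:
    """Return rows with value > threshold and how many groups were actually read."""
    result: list[int] = []
    groups_read = 0
    for group in groups:
        # Prune: if even the group's maximum fails the predicate, no row can match.
        if group.zone_max <= threshold:
            continue
        groups_read += 1
        result.extend(v for v in group.values if v > threshold)
    return result, groups_read


groups = [RowGroup([1, 5, 9]), RowGroup([10, 12, 15]), RowGroup([20, 25, 30])]
rows, read = scan_greater_than(groups, 14)
print(rows, read)  # [15, 20, 25, 30] 2
```

The same min/max idea applies to the Parquet case: footer statistics let a reader skip whole row groups without decoding their data.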
HN Discussion: