The Complete Practical Guide to DuckDB: Python's Embedded Analytical Database, from Beginner to Expert
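DuckDB is one of the most popular embedded analytical databases in the Python data-analysis ecosystem. This guide covers installation and configuration, querying CSV and Parquet files directly, Pandas and Polars integration, window functions, out-of-core computation, performance comparisons, and recommended workflows for 2026, with complete code examples.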
Sofia is a Python data engineer with 7 years building ingestion and transformation systems for media and adtech. She spent three years at Spotify on the personalization-data team, where she shipped a streaming-to-batch reconciliation pipeline that processes around 90 billion playback events per day, and two years before that at The New York Times on the subscriber-analytics platform. She focuses her writing on production pandas patterns (chunked reads, categorical memory tricks, Arrow interop), Airflow 2.x task groups, and the kinds of dbt + Python hybrid pipelines that show up once your warehouse bill stops being cute. She also maintains pyspark-helpers, a small library for column-name munging she keeps porting between jobs. Sofia is based in Madrid, originally from Bogota, and a relentless defender of type hints in notebook code.