Free "DuckDB in Action" early access book

MotherDuck is pleased to offer this free early access PDF of the Manning "DuckDB in Action" book by Mark Needham, Michael Hunger and Michael Simons. The authors are adding new chapters over time, and each new chapter will be sent to you for free.

"DuckDB in Action" includes

Chapter 1: An introduction to DuckDB (summary)
- Why DuckDB, a single-node, in-memory database, emerged in the era of big data
- DuckDB’s capabilities
- How DuckDB works and fits into your data pipeline
Chapter 2: Getting started with DuckDB (summary)
- Installing and learning how to use the DuckDB CLI
- Executing commands in the DuckDB CLI
- Querying remote files
Chapter 3: Executing SQL queries (summary)
- The different categories of SQL statements and their fundamental structure
- Creating tables and structures for ingesting a real-world dataset
- Laying the foundations for analyzing a huge dataset in detail
- Exploring DuckDB-specific extensions to SQL
Chapter 4: Advanced aggregation and analysis of data (summary)
- Preparing, cleaning and aggregating data while ingesting
- Using window functions to create new aggregates over different partitions of any dataset
- Understanding the different types of sub-queries
- Using Common Table Expressions (CTEs)
- Applying filters to any aggregate
Chapter 5: Exploring data without persistence (summary)
- Converting CSV files to Parquet
- Automatically inferring file types and data schemas
- Creating views to simplify the querying of nested JSON documents
- Exploring the metadata of Parquet files
- Querying other databases like SQLite
Chapter 6: Integrating with the Python ecosystem (summary)
- The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
- Ingesting data from pandas DataFrames, Apache Arrow Tables and more via the Python API
- Querying pandas DataFrames with DuckDB methods
- Exporting data to various DataFrame formats and Apache Arrow Tables
- Using DuckDB’s relational API to compose queries
Chapter 7: DuckDB in the Cloud with MotherDuck (summary)
- The idea behind MotherDuck
- Understanding how the architecture works under the hood
- Use cases for serverless SQL analytics
- Creating, managing, and sharing MotherDuck databases
- Tips for optimizing your MotherDuck usage
Chapter 8: Building data pipelines with DuckDB (summary)
- The meaning and relevance of data pipelines
- What roles DuckDB can have as part of a pipeline
- How DuckDB integrates with tools like the Python-based data load tool (dlt) for ingestion and the data build tool (dbt) from dbt Labs for transformation
- Orchestrating pipelines with Dagster
Chapter 9: Building and Deploying Data Apps (summary)
- Building an interactive web application with Streamlit
- Deploying Streamlit applications with Streamlit Community Cloud
- Rendering interactive charts with Plotly
- Creating a dashboard for Business Intelligence (BI) with Apache Superset
- Creating charts from a custom SQL query with Apache Superset
Chapter 10: Performance considerations for large datasets (summary)
- Preparing large volumes of data to be imported into DuckDB
- Querying metadata and running exploratory data analysis (EDA) queries on the large datasets
- Exporting full databases concurrently to Parquet
- Using aggregations on multiple columns to speed up statistical analysis
- Using EXPLAIN and EXPLAIN ANALYZE to understand query plans
Chapter 11: Conclusion (summary)