Intro

This year, the Advent of SQL is hosted by the Database School. I don’t know anything about them except that they took over the Advent of SQL from last year. There will be only 10 challenges this year (with 25 challenges last year, it felt a little long, so this is a welcome change). The spirit of the challenges seems to remain the same: using SQL to solve Christmas-themed puzzles. The delivery format is, however, different, as it uses the Database School platform: you need to create an account and log in to access the challenges and their associated data. Each challenge comes as a video tutorial with an associated playground.

I’m going to use these challenges as an opportunity to brush up on my SQL skills, using DuckDB. I’m going to work from R (just in case I need to do any additional data manipulation or visualization), but my goal this year is to do everything using DuckDB SQL (and not use LLMs for help, just searching and reading the docs the old-fashioned way). I might use LLMs to propose more elegant/alternative solutions once I have a working solution.

I’ll post my solutions daily (or as often as I can manage) below. The data can be downloaded from the Database School website once you have created an account.

Day 1

It’s a single table containing messy wish list data. The goal is to find the most common wishes, ordered by count in descending order.

# Replace `BIGSERIAL` with `INTEGER` in `wish_list` table definition.

# Create DuckDB database with:
#  duckdb ./data_duckdb/advent_day_01.duckdb < ./data_sql/day1-wish-list.sql
# Be patient, these single-row inserts take a while to run in DuckDB (about 90s)

con <- DBI::dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_01.duckdb")
DBI::dbGetQuery(
  con,
  "
  SELECT wish, count(wish) AS n
    FROM (SELECT lower(trim(raw_wish)) AS wish FROM wish_list)
    GROUP BY wish
    ORDER BY n DESC;
"
)

Day 2

With day 2, we have two tables: snowball_inventory and snowball_categories. The goal is to find the total quantity of items in inventory for each category, ordered by total quantity ascending; only items with quantity > 0 should be included. You need to watch the video to understand the challenge, as some information is not included in the challenge description itself.

# Create DuckDB database with (no need to edit the file):
# duckdb ./data_duckdb/advent_day_02.duckdb < ./data_sql/day2-inserts.sql

con <- DBI::dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_02.duckdb")
DBI::dbGetQuery(
  con,
  "
  SELECT category_name, SUM(quantity) AS total_quantity
  FROM (
    SELECT i.category_name, i.status, i.quantity, o.*
    FROM snowball_inventory i
    JOIN snowball_categories o
      ON (i.category_name = o.official_category AND quantity > 0)
  )
  GROUP BY category_name
  ORDER BY total_quantity ASC;
  "
)
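An equivalent shape pushes the quantity > 0 predicate into a WHERE clause instead of the join condition; for an inner join the two are interchangeable. A minimal sketch of that form, using Python’s stdlib sqlite3 rather than DuckDB, with invented rows and the category-mapping join elided:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE snowball_inventory (category_name TEXT, quantity INTEGER);
INSERT INTO snowball_inventory VALUES
  ('packed', 5), ('packed', 0), ('slush', 2), ('slush', 1);
""")

# For an INNER JOIN, filtering in WHERE before grouping is equivalent
# to putting the predicate in the join condition.
rows = con.execute("""
    SELECT category_name, SUM(quantity) AS total_quantity
    FROM snowball_inventory
    WHERE quantity > 0
    GROUP BY category_name
    ORDER BY total_quantity ASC
""").fetchall()
print(rows)  # [('slush', 3), ('packed', 5)]
```

For outer joins the placement of the predicate does change the result, so the equivalence only holds here because the join is inner.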

Day 3

Copy and paste from the challenge:

Using the hotline_messages table, update any record that has “sorry” (case insensitive) in the transcript and doesn’t currently have a status assigned to have a status of “approved”. Then delete any records where the tag is “penguin prank”, “time-loop advisory”, “possible dragon”, or “nonsense alert” or if the caller’s name is “Test Caller”. After updating and deleting the records as described, write a final query that returns how many messages currently have a status of “approved” and how many still need to be reviewed (i.e., status is NULL).

# Create DuckDB database with (no need to edit the file):
# duckdb ./data_duckdb/advent_day_03.duckdb < ./data_sql/day3-inserts.sql

con <- DBI::dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_03.duckdb")

DBI::dbExecute(
  con,
  "
  UPDATE hotline_messages
  SET status = 'approved' 
  WHERE LOWER(transcript) LIKE '%sorry%'
    AND status IS NULL;
  "
)

DBI::dbExecute(
  con,
  "
  DELETE FROM hotline_messages
  WHERE tag IN (
    'penguin prank',
    'time-loop advisory',
    'possible dragon',
    'nonsense alert'
    )
    OR caller_name = 'Test Caller';
  "
)

DBI::dbGetQuery(
  con,
  "
  SELECT clean_status, COUNT(clean_status)
  FROM (
    SELECT
      status,
      CASE
        WHEN status IS NULL THEN 'TBD'
        ELSE 'approved'
      END AS clean_status
    FROM hotline_messages
  )
  GROUP BY clean_status;
"
)
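The final tally can also be computed in one pass with conditional aggregation instead of the CASE subquery. A sketch of the FILTER clause, here with Python’s stdlib sqlite3 and invented rows (DuckDB supports the same clause):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hotline_messages (transcript TEXT, status TEXT);
INSERT INTO hotline_messages VALUES
  ('so sorry', 'approved'), ('hello', NULL), ('sorry again', 'approved');
""")

# COUNT(*) FILTER (WHERE ...) counts only the rows matching each
# predicate, so a single scan yields both columns of the report.
row = con.execute("""
    SELECT
      COUNT(*) FILTER (WHERE status = 'approved') AS approved,
      COUNT(*) FILTER (WHERE status IS NULL)      AS to_review
    FROM hotline_messages
""").fetchone()
print(row)  # (2, 1)
```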

Day 4

Copy and paste from the challenge:

Using the official_shifts and last_minute_signups tables, create a combined de-duplicated volunteer list. Ensure the list has standardized role labels of Stage Setup, Cocoa Station, Parking Support, Choir Assistant, Snow Shoveling, Handwarmer Handout. Make sure that the timeslot formats follow John’s official shifts format.

I used snake case for the role labels; it looks like the challenge actually asked for title case. I left the ELSE 'TBD' clauses in, as I used them while building the queries to make sure I caught all the cases. I had also checked for unique values in the time slots, and given there were just a few, I went for a CASE WHEN approach rather than something more sophisticated.

# Create DuckDB database with (no need to edit the file):
# duckdb ./data_duckdb/advent_day_04.duckdb < ./data_sql/day4-inserts.sql

con <- DBI::dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_04.duckdb")

DBI::dbGetQuery(
  con,
  "SELECT * FROM official_shifts"
)

DBI::dbGetQuery(
  con,
  "SELECT
  volunteer_name,
  CASE
    WHEN assigned_task ILIKE '%choir%' THEN 'choir_assistant'
    WHEN assigned_task ILIKE '%stage%' THEN 'stage_setup'
    WHEN assigned_task ILIKE '%cocoa%' THEN 'cocoa_station'
    WHEN assigned_task ILIKE '%parking%' THEN 'parking_support'
    WHEN assigned_task ILIKE '%shovel%' THEN 'snow_shoveling'
    WHEN assigned_task ILIKE '%hand%' THEN 'handwarmer_handout'
    ELSE 'TBD'
  END as role,
  CASE
    WHEN (time_slot='10AM' OR time_slot ='10 am') THEN '10:00 AM'
    WHEN (time_slot='2 PM' OR time_slot='2 pm') THEN '2:00 PM'
    WHEN time_slot = 'noon' THEN '12:00 PM'
    ELSE 'TBD'
  END as shift_time
  FROM last_minute_signups
 
  UNION

  SELECT volunteer_name,
         role,
         shift_time
  FROM official_shifts
  ORDER BY volunteer_name;
"
)

Day 5

Copy and paste from the challenge:

Challenge: Write a query that returns the top 3 artists per user. Order the results by the most played.

# Create DuckDB database with (no need to edit the file):
# duckdb ./data_duckdb/advent_day_05.duckdb < ./data_sql/day5-inserts.sql

con <- DBI::dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_05.duckdb")

DBI::dbGetQuery(
  con,
  "
  SELECT * FROM (
    SELECT 
      user_name,
      artist,
      COUNT(artist) AS n,
      row_number() OVER (PARTITION BY user_name ORDER BY n DESC) as top
    FROM listening_logs
    GROUP BY user_name, artist
    ORDER BY user_name, n DESC
  )
  WHERE top <= 3;
  "
)
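Referencing the alias n inside the OVER clause leans on DuckDB’s support for reusing SELECT-list aliases; most engines make you repeat the aggregate. A portable sketch of the same top-N-per-group pattern, using Python’s stdlib sqlite3 with made-up listening data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE listening_logs (user_name TEXT, artist TEXT);
INSERT INTO listening_logs VALUES
  ('ana', 'A'), ('ana', 'A'), ('ana', 'A'),
  ('ana', 'B'), ('ana', 'B'),
  ('ana', 'C'), ('ana', 'C'), ('ana', 'C'), ('ana', 'C'),
  ('ana', 'D');
""")

# COUNT(*) is repeated inside OVER because standard SQL cannot reuse
# the alias n within the same SELECT list.
rows = con.execute("""
    SELECT * FROM (
      SELECT user_name, artist, COUNT(*) AS n,
             ROW_NUMBER() OVER (
               PARTITION BY user_name ORDER BY COUNT(*) DESC
             ) AS top
      FROM listening_logs
      GROUP BY user_name, artist
    )
    WHERE top <= 3
    ORDER BY user_name, n DESC
""").fetchall()
print(rows)  # [('ana', 'C', 4, 1), ('ana', 'A', 3, 2), ('ana', 'B', 2, 3)]
```

Note that row_number() breaks ties arbitrarily; rank() or dense_rank() would keep all tied artists.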

Day 6

Challenge: Generate a report that returns the dates and families that have no delivery assigned after December 14th, using the families and deliveries_assigned tables. Each row in the report should be a date and family name that represents the dates in which families don’t have a delivery assigned yet. Label the columns as unassigned_date and name. Order the results by unassigned_date and name, respectively, both in ascending order.

# Create DuckDB database with (no need to edit the file):
# duckdb ./data_duckdb/advent_day_06.duckdb < ./data_sql/day6-inserts.sql

con <- DBI::dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_06.duckdb")

DBI::dbGetQuery(
  con,
  "WITH december_2025 AS (
    SELECT date::DATE AS date
    FROM generate_series(
      DATE '2025-12-15',
      DATE '2025-12-31',
      INTERVAL '1 day'
    ) AS t(date)
  ),

  full_info AS (
    SELECT december_2025.date,
      families.id AS family_id,
      families.family_name
    FROM families
    CROSS JOIN december_2025
  )

  SELECT
    full_info.family_id AS full_fid,
    full_info.family_name,
    full_info.date AS full_date,
    deliveries_assigned.*
  FROM full_info
  LEFT JOIN deliveries_assigned ON (
     full_info.date = deliveries_assigned.gift_date AND
     deliveries_assigned.family_id = full_info.family_id
  )
  WHERE deliveries_assigned.gift_name IS NULL
  ORDER BY date ASC, family_name ASC 
  ;
 "
)
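The calendar-cross-join-families scaffold followed by a LEFT JOIN … IS NULL anti-join is a general gap-finding pattern. A compact sketch with Python’s stdlib sqlite3, which lacks generate_series, so the date spine comes from a recursive CTE instead (tables and rows invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE families (family_name TEXT);
INSERT INTO families VALUES ('Holly'), ('Ivy');
CREATE TABLE deliveries_assigned (gift_date TEXT, family_name TEXT);
INSERT INTO deliveries_assigned VALUES ('2025-12-15', 'Holly');
""")

# The recursive CTE builds the date spine; the LEFT JOIN ... IS NULL
# anti-join keeps only (date, family) pairs with no delivery.
rows = con.execute("""
    WITH RECURSIVE dates(d) AS (
      SELECT '2025-12-15'
      UNION ALL
      SELECT date(d, '+1 day') FROM dates WHERE d < '2025-12-16'
    )
    SELECT d AS unassigned_date, f.family_name AS name
    FROM dates
    CROSS JOIN families f
    LEFT JOIN deliveries_assigned da
      ON da.gift_date = d AND da.family_name = f.family_name
    WHERE da.gift_date IS NULL
    ORDER BY unassigned_date, name
""").fetchall()
print(rows)
# [('2025-12-15', 'Ivy'), ('2025-12-16', 'Holly'), ('2025-12-16', 'Ivy')]
```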

Day 7

Challenge: Get the stewards a list of all the passengers and the cocoa car(s) they can be served from that has at least one of their favorite mixins. Remember only the top three most-stocked cocoa cars remained operational, so the passengers must be served from one of those cars.

# Create DuckDB database with (no need to edit the file):
# duckdb ./data_duckdb/advent_day_07.duckdb < ./data_sql/day7-inserts.sql

con <- DBI::dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_07.duckdb")

DBI::dbGetQuery(
  con,
  "
  WITH available_mixins AS (
    SELECT
      car_id AS mixins_car_id,
      available_mixins
    FROM cocoa_cars
    ORDER BY total_stock DESC
    LIMIT 3
  )

  SELECT 
    passenger_name,
    string_agg(mixins_car_id) AS available_cars
  FROM passengers
  JOIN available_mixins ON (list_has_any(passengers.favorite_mixins, available_mixins.available_mixins))
  GROUP BY passenger_name
  ORDER BY passenger_name
  "
)
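Without DuckDB’s list columns and list_has_any, the overlap test is usually modeled with junction tables (one row per entity/mixin pair) and a plain join. A rough sketch with Python’s stdlib sqlite3 and invented data, leaving out the top-three-cars restriction:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE passenger_mixins (passenger_name TEXT, mixin TEXT);
CREATE TABLE car_mixins (car_id INTEGER, mixin TEXT);
INSERT INTO passenger_mixins VALUES ('Eve', 'mint'), ('Eve', 'caramel'), ('Sam', 'nutmeg');
INSERT INTO car_mixins VALUES (1, 'mint'), (2, 'caramel'), (3, 'cinnamon');
""")

# A car can serve a passenger if at least one mixin overlaps;
# DISTINCT collapses multiple shared mixins into one (passenger, car) pair.
rows = con.execute("""
    SELECT DISTINCT p.passenger_name, c.car_id
    FROM passenger_mixins p
    JOIN car_mixins c ON c.mixin = p.mixin
    ORDER BY p.passenger_name, c.car_id
""").fetchall()
print(rows)  # [('Eve', 1), ('Eve', 2)] -- Sam's mixin is stocked nowhere
```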

Day 8

Generate a report for leadership, using the products and price_changes tables, that returns the product_name, current_price, previous_price, and the difference between the current and previous prices.

I took a (maybe?) unconventional approach by using the list functions to solve this challenge. I was focused on getting the price difference first. Using lag() would have reduced the redundancy of the list(... ORDER BY rn) calls.

# Create DuckDB database with (no need to edit the file):
# duckdb ./data_duckdb/advent_day_08.duckdb < ./data_sql/day8-inserts.sql

con <- dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_08.duckdb")

dbGetQuery(
  con,
  "
  WITH sub_prices AS (SELECT 
    product_id,
    price,
    effective_timestamp,
    row_number() OVER (PARTITION BY product_id ORDER BY effective_timestamp DESC) AS rn
  FROM price_changes)

  SELECT
    product_name,
    list(price ORDER BY rn)[1] AS current_price,  -- rn = 1 is the most recent price
    list(price ORDER BY rn)[2] AS previous_price,
    list_reduce(list(price ORDER BY rn), lambda x, y : x - y) AS price_change
  FROM sub_prices
  JOIN products USING (product_id)
  WHERE rn < 3
  GROUP BY product_id, product_name
  ORDER BY product_id;
  "
)
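For comparison, here is a lag()-based sketch of the same report, which avoids repeating the list(... ORDER BY rn) expression; written with Python’s stdlib sqlite3 and invented price rows (the window-function logic carries over to DuckDB unchanged):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE price_changes (product_id INTEGER, price REAL, effective_timestamp TEXT);
INSERT INTO price_changes VALUES
  (1, 10.0, '2025-12-01'), (1, 12.0, '2025-12-05'),
  (2,  4.0, '2025-12-02'), (2,  3.5, '2025-12-06');
""")

# LAG(price) looks one row back within each product's timeline, so the
# newest row (rn = 1) carries both the current and the previous price.
rows = con.execute("""
    SELECT product_id, current_price, previous_price,
           current_price - previous_price AS price_change
    FROM (
      SELECT product_id, price AS current_price,
             LAG(price) OVER (
               PARTITION BY product_id ORDER BY effective_timestamp
             ) AS previous_price,
             ROW_NUMBER() OVER (
               PARTITION BY product_id ORDER BY effective_timestamp DESC
             ) AS rn
      FROM price_changes
    )
    WHERE rn = 1
    ORDER BY product_id
""").fetchall()
print(rows)  # [(1, 12.0, 10.0, 2.0), (2, 3.5, 4.0, -0.5)]
```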

Day 9

Build a report using the orders table that shows the latest order for each customer, along with their requested shipping method, gift wrap choice (as true or false), and the risk flag in separate columns. Order the report by the most recent order first so Evergreen Market can reach out to them ASAP.

# Edit the orders table definition to replace `JSONB` with `JSON`.

# Create DuckDB database with:
# duckdb ./data_duckdb/advent_day_09.duckdb < ./data_sql/day9-inserts.sql

library(duckdb)
con <- dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_09.duckdb")
dbExecute(con, "INSTALL json; LOAD JSON;")
dbGetQuery(
  con,
  "
  WITH customer_orders AS (
    SELECT *,
      row_number() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
    FROM orders
    ORDER BY customer_id, rn
  )

  SELECT
    customer_id,
    json_extract_string(order_data, '$.shipping.method') AS shipping_method,
    json_extract_string(order_data, '$.gift.wrapped')::BOOL AS gift_wrap,
    json_extract_string(order_data, '$.risk.flag') AS risk_flag
  FROM customer_orders 
  WHERE rn = 1
  ORDER BY created_at DESC;
  "
)
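SQLite’s JSON1 functions use the same $.a.b path syntax, which makes a handy zero-setup playground for these extractions. A sketch with Python’s stdlib sqlite3 and an invented order_data shape mirroring the challenge’s keys:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
order = {"shipping": {"method": "express"},
         "gift": {"wrapped": True},
         "risk": {"flag": "none"}}
con.execute("CREATE TABLE orders (order_data TEXT)")
con.execute("INSERT INTO orders VALUES (?)", (json.dumps(order),))

# json_extract walks the same $.a.b paths; JSON true comes back as 1,
# hence the BOOL cast in the DuckDB query above.
row = con.execute("""
    SELECT json_extract(order_data, '$.shipping.method'),
           json_extract(order_data, '$.gift.wrapped'),
           json_extract(order_data, '$.risk.flag')
    FROM orders
""").fetchone()
print(row)  # ('express', 1, 'none')
```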

Day 10

Challenge: Clean up the deliveries table to remove any records where the delivery_location is ‘Volcano Rim’, ‘Drifting Igloo’, ‘Abandoned Lighthouse’, or ‘The Vibes’. Move those records to the misdelivered_presents table with all the same columns as deliveries, plus a flagged_at column with the current time and a reason column with “Invalid delivery location” listed as the reason for each moved record. Make sure your final step shows the misdelivered_presents records that you just moved (i.e. don’t include any existing records from the misdelivered_presents table).

I first solved the challenge by using CTEs to return the appropriate sets of rows. After watching the solution, I discovered RETURNING. I don’t think I had ever used destructive operations in SQL, so that was new to me. It seems (and don’t quote me on that) that it’s not possible to use DELETE inside a CTE in DuckDB. Instead, I relied on a temporary table, first combined with an anti-join to delete the records from the deliveries table, and then combined with an INSERT to add those records to the misdelivered_presents table. I still got to use RETURNING to show only the inserted rows at the end of the query.

The last few queries at the end validated that the tables were modified correctly.

# Create DuckDB database with (no need to edit the file):
# duckdb ./data_duckdb/advent_day_10.duckdb < ./data_sql/day10-inserts.sql

con <- DBI::dbConnect(duckdb::duckdb(), "data_duckdb/advent_day_10.duckdb")

DBI::dbGetQuery(
  con,
  "
    CREATE TEMPORARY TABLE deliveries_to_remove AS (
      SELECT * FROM deliveries
      WHERE delivery_location IN
        ('Volcano Rim', 'Drifting Igloo', 'Abandoned Lighthouse', 'The Vibes')
    );

    CREATE OR REPLACE TABLE deliveries AS (
      SELECT * FROM deliveries
      ANTI JOIN deliveries_to_remove USING (id)
    );

    INSERT INTO misdelivered_presents
    SELECT
      id, child_name, delivery_location, gift_name, scheduled_at, NOW(), 'Invalid delivery location'
    FROM deliveries_to_remove
    RETURNING *
  "
)
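Stripped to its essentials, the move is a copy-then-delete pair, which is safest inside one transaction. A minimal sketch of that pattern with Python’s stdlib sqlite3 (two-column toy tables, invented rows; RETURNING is omitted since older SQLite builds lack it):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE deliveries (id INTEGER, delivery_location TEXT);
INSERT INTO deliveries VALUES (1, 'Town Square'), (2, 'Volcano Rim');
CREATE TABLE misdelivered_presents (id INTEGER, delivery_location TEXT, reason TEXT);
""")

bad = ('Volcano Rim',)
with con:  # one transaction: either both statements apply or neither does
    con.execute("""
        INSERT INTO misdelivered_presents
        SELECT id, delivery_location, 'Invalid delivery location'
        FROM deliveries WHERE delivery_location IN (?)
    """, bad)
    con.execute("DELETE FROM deliveries WHERE delivery_location IN (?)", bad)

left = con.execute("SELECT id FROM deliveries").fetchall()
moved = con.execute("SELECT id, reason FROM misdelivered_presents").fetchall()
print(left, moved)  # [(1,)] [(2, 'Invalid delivery location')]
```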

DBI::dbGetQuery(con, "SELECT * FROM deliveries;")
DBI::dbGetQuery(
  con,
  "SELECT * FROM deliveries
      WHERE delivery_location IN
        ('Volcano Rim', 'Drifting Igloo', 'Abandoned Lighthouse', 'The Vibes')"
)
DBI::dbGetQuery(con, "SELECT * FROM misdelivered_presents;")
