Issue #103 · Tuesday, April 14, 2026

Data insights for every level.

Datum Daily covers the latest in data analytics, data science, data engineering, and applied statistics — from foundational concepts to cutting-edge research.


Latest
Data Snippets
Quick insights · Under 2 min read
Pro Tip · Apr 3

Always check your join cardinality before aggregating

Before you run a GROUP BY after a JOIN, verify the cardinality of the join. A one-to-many or many-to-many join silently multiplies rows, so any SUM or COUNT that follows can double-count. On each side of the join, compare SELECT COUNT(*) with SELECT COUNT(DISTINCT key). If the two counts differ on a side, the key repeats there and the other side's rows will fan out across the duplicates. If they differ on both sides, the join is many-to-many and aggregates from either table will be inflated.
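
A minimal sketch of that check, using Python's built-in sqlite3 module so it runs anywhere. The orders and contacts tables, their columns, and the key_cardinality helper are made up for illustration; the same two counts work in any SQL engine.

```python
import sqlite3


def key_cardinality(conn, table, key):
    """Return (total_rows, distinct_keys) for one side of a join.

    Table and column names are interpolated into the SQL text,
    so pass trusted identifiers only.
    """
    return conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {key}) FROM {table}"
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    CREATE TABLE contacts (customer_id INTEGER, email TEXT);
    INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 75.0);
    INSERT INTO contacts VALUES
        (1, 'a@example.com'), (1, 'b@example.com'), (2, 'c@example.com');
""")

# Both sides repeat customer_id, so the join is many-to-many.
orders_card = key_cardinality(conn, "orders", "customer_id")
contacts_card = key_cardinality(conn, "contacts", "customer_id")

# Revenue computed on orders alone versus after the fan-out join.
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
joined_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN contacts c ON o.customer_id = c.customer_id
""").fetchone()[0]

print("orders   (rows, distinct keys):", orders_card)
print("contacts (rows, distinct keys):", contacts_card)
print("true total:", true_total, "| total after join:", joined_total)
```

Customer 1 has two orders and two contacts, so each of that customer's orders appears twice after the join and the summed amount comes out higher than the real total. Aggregating one side down to one row per key before joining is a common way to avoid this.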

By the Numbers · Apr 2

The global data engineering job market grew 50% in 2025

According to LinkedIn's 2025 Jobs on the Rise report, Data Engineer was the fastest-growing technical role for the third consecutive year, with a 50% year-over-year increase in job postings. The median base salary in the US reached $148,000. Cloud data platforms (Snowflake, Databricks, BigQuery) and dbt proficiency were the most-cited skills in job descriptions.

Source: LinkedIn Jobs on the Rise 2025

Definition · Apr 1

What is a Confidence Interval — in plain English

A 95% confidence interval does NOT mean there is a 95% probability that the true value lies within the interval. It means: if you repeated your experiment 100 times and computed a confidence interval each time, approximately 95 of those intervals would contain the true population parameter. The true value is fixed — it is either in your interval or it is not. The probability applies to the procedure, not to any single interval.
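
The "repeat the experiment" reading can be checked with a short simulation. This is a sketch under stated assumptions: the data are normally distributed, the population standard deviation is treated as known so a simple z-interval applies, and the function name and default parameters are illustrative only.

```python
import random
import statistics


def ci_coverage(true_mean=10.0, sigma=2.0, n=50, trials=2000, seed=0):
    """Fraction of 95% z-intervals that contain the true mean.

    Assumes normal data and a known population standard deviation.
    """
    rng = random.Random(seed)
    z = 1.96  # two-sided 95% critical value of the standard normal
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # One "experiment": draw a fresh sample and build its interval.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials


coverage = ci_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95. Each individual interval either contains the true mean or misses it; the 95% describes how often the procedure succeeds across repetitions.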

Tool Spotlight · Mar 31

DuckDB: Run SQL on local files without a database server

DuckDB is an in-process analytical database that runs in memory or against local files, with no server, no setup, and no credentials. You can query CSV, Parquet, and JSON files directly with standard SQL: SELECT * FROM 'data.parquet' WHERE year = 2025. For analytical queries such as aggregations and joins on datasets of up to a few GB, it is often much faster than pandas, though the gap depends on the workload. Install with pip install duckdb. For data exploration and local ETL, it is one of the most useful tools added to the data stack in recent years.

Pro Tip · Mar 30

Use EXPLAIN ANALYZE before optimizing any slow query

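Before adding an index or rewriting a query, look at the execution plan. In PostgreSQL, EXPLAIN ANALYZE runs the query and reports actual row counts and timings next to the planner's estimates. In MySQL, EXPLAIN FORMAT=JSON shows the estimated plan only; EXPLAIN ANALYZE (MySQL 8.0.18 and later) adds measured timings. Because EXPLAIN ANALYZE executes the statement, wrap an INSERT, UPDATE, or DELETE in a transaction and roll it back. The most common culprits for slow queries are sequential scans on large tables, nested loop joins on unindexed columns, and sort operations that spill to disk. The plan nodes with the largest actual time, or the widest gap between estimated and actual rows, tell you where to focus. Do not optimize blindly.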
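
To see the plan-first habit in a form that runs without a database server, here is a sketch that swaps in SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module. It is a stand-in, not PostgreSQL or MySQL: SQLite reports the chosen plan without executing the query or measuring time. The events table, the idx_events_user index, and the query_plan helper are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 100, "click") for i in range(1000)],
)

sql = "SELECT * FROM events WHERE user_id = 42"

# Without an index on user_id, the plan is a full table scan.
plan_before = query_plan(conn, sql)

# Add the index the plan suggested was missing, then look again.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = query_plan(conn, sql)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the whole table, and the second reports a search that uses idx_events_user. Reading the plan before and after a change is the point; the exact wording of the plan text varies by engine and version.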

Chart Insight · Mar 29

Why your bar chart's y-axis should almost always start at zero

Truncating the y-axis of a bar chart — starting it at a value other than zero — visually exaggerates differences between bars. A bar representing 102 will look twice as tall as one representing 101 if the axis starts at 100. This is not just a stylistic choice: it is a form of data distortion. The exception is when you are showing change over time on a line chart, where starting at zero can compress meaningful variation. For bar charts comparing absolute values, zero is the rule.
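
The arithmetic behind the 102 versus 101 example is easy to verify. A bar's drawn height is its value minus the axis baseline, so the visual ratio depends on where the axis starts. The helper name below is illustrative.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of two bars when the y-axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must be above the axis baseline")
    return (a - baseline) / (b - baseline)


# Axis truncated at 100: the bars are drawn 2 and 1 units tall.
truncated = drawn_height_ratio(102, 101, baseline=100)

# Axis starting at zero: the bars are drawn 102 and 101 units tall.
zero_based = drawn_height_ratio(102, 101)

print(f"axis starts at 100: {truncated:.2f}x taller")
print(f"axis starts at 0:   {zero_based:.4f}x taller")
```

With the truncated axis the first bar looks twice as tall, while the underlying values differ by about 1%.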

