Module 2 — Subqueries
Why subqueries exist
Sometimes a question can't be answered by looking at raw table data directly — you first need to compute something (a max, an average, a list of ids, a yes/no check), and only then compare each row against that computed thing. A subquery is a SELECT statement nested inside another SQL statement, used to produce that intermediate value, list, or table on the fly.
What it is
A subquery is just a SELECT wrapped in parentheses, placed somewhere another SQL statement expects a value, a list, or a table. It can appear in WHERE, SELECT, FROM, or even HAVING.
When to use it
- You need to compare a row against an aggregate (max, avg, count) computed from the same or a different table.
- You need to check existence of a related row without pulling back any of its columns (data validation, filtering).
- You need to build a small computed lookup table on the fly, used only within this one query.
When NOT to use it
- When a plain JOIN says the same thing more simply and you actually need columns from the other table — joins are usually easier for a reader to follow than deeply nested subqueries.
- When a correlated subquery would force a slow row-by-row loop (see Execution Flow below) and an equivalent JOIN or window function would run in a single pass.
Schema & sample data for this module
We reuse employees/departments from Module 1 (same data), and add a customers/orders pair for the data-quality/production examples.
DDL (recap from Module 1)
-- employees / departments: identical to Module 1 (16 employees, 6 departments, Legal has 0 employees,
-- Tanvi Desai & Sameer Gupta have NULL dept_id, Priya/Rohit(103) tie at 145000, Ananya/Vikram tie at 92000)
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(50),
country VARCHAR(50)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT, -- can be a "dangling" id that doesn't exist in customers (data quality issue)
order_date DATE,
total_amount DECIMAL(10,2)
);
INSERT INTO customers VALUES
(201, 'Rahul Bansal', 'India'),
(202, 'Grace Lin', 'Singapore'),
(203, 'Omar Haddad', 'UAE'),
(204, 'Laura Fischer', 'Germany');
INSERT INTO orders VALUES
(5001, 201, '2024-01-10', 2500.00),
(5002, 202, '2024-01-11', 1200.00),
(5003, 201, '2024-01-15', 800.00),
(5004, 203, '2024-02-01', 4300.00),
(5005, 999, '2024-02-05', 650.00); -- customer_id 999 DOES NOT EXIST in customers -- orphaned order (data bug)
| customer_id | customer_name | country |
|---|---|---|
| 201 | Rahul Bansal | India |
| 202 | Grace Lin | Singapore |
| 203 | Omar Haddad | UAE |
| 204 | Laura Fischer | Germany |
| order_id | customer_id | total_amount |
|---|---|---|
| 5001 | 201 | 2500.00 |
| 5002 | 202 | 1200.00 |
| 5003 | 201 | 800.00 |
| 5004 | 203 | 4300.00 |
| 5005 | 999 | 650.00 |
customer_id = 999, which doesn't exist anywhere in customers. Laura Fischer (204) has never placed an order. These two facts drive most of the subquery examples below — subqueries are the natural tool for "find the mismatch."Scalar subquery
What it is: a subquery that returns exactly one row and one column — a single value. It can be used anywhere a plain value could go: in WHERE, in SELECT, even inside arithmetic.
Why it exists: you often need to compare a row against a single computed number — the overall max, the overall average, a specific lookup value — without hardcoding that number.
Easy example
Find employees who earn more than the company-wide average salary.
SELECT emp_name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
Line by line: the inner query SELECT AVG(salary) FROM employees runs first, completely on its own, and produces one number (roughly 100,687.5). The outer query then just does a plain comparison, as if you'd typed WHERE salary > 100687.5 directly.
WHERE filter, or used AVG without realizing a GROUP BY was implicitly needed), most databases throw a runtime error like "subquery returns more than one row" — it's not silently truncated to the first row. Always double check your subquery, run it standalone, and confirm it really returns one row before nesting it.Scalar subquery in SELECT
SELECT
emp_name,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
IN subquery
What it is: a subquery that returns a list of values (one column, many rows), used with IN to check membership.
IN checks "is this person's name on the list?"Easy example
Find all employees who work in a department located in Bangalore.
SELECT emp_name
FROM employees
WHERE dept_id IN (
SELECT dept_id FROM departments WHERE location = 'Bangalore'
);
The inner query returns (1, 4) — Engineering and HR. The outer query keeps any employee whose dept_id is in that list.
Medium example — customers who have placed at least one order
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders
);
-- Rahul, Grace, Omar -- Laura is excluded, she has never ordered
IN silently de-duplicates nothing on its own — if the subquery returns the same value many times (e.g. customer_id appearing in multiple orders), IN still just checks membership once per outer row; it does NOT multiply your outer rows the way a JOIN would. This is actually a strength of IN over a naive JOIN + DISTINCT for pure "does it exist" questions.NOT IN & the NULL trap
This is one of the most important gotchas in all of SQL, and it comes up constantly in interviews.
The problem, demonstrated
Goal: find customers who have never placed an order.
-- Looks reasonable...
SELECT customer_name
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id FROM orders
);
-- With OUR data (no NULLs in orders.customer_id), this correctly returns Laura Fischer.
Now watch what happens if just one row in the subquery has a NULL customer_id — which is extremely plausible in real data (an order placed as a guest checkout, with no customer linked yet):
INSERT INTO orders VALUES (5006, NULL, '2024-02-10', 300.00); -- a guest checkout order
SELECT customer_name
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id FROM orders
);
-- Now returns ZERO ROWS. Not an error -- just silently, completely empty.
Why this happens
NOT IN (list) is logically equivalent to column != v1 AND column != v2 AND column != v3 ... for every value in the list. If any one value in that list is NULL, that one comparison becomes column != NULL, which evaluates to UNKNOWN — not TRUE, not FALSE. And TRUE AND TRUE AND UNKNOWN evaluates to UNKNOWN overall, which is treated as "not true", so the row gets filtered out. This happens for every row being tested, since every row's chain of ANDs includes that same poisoned NULL comparison — so the whole result set silently collapses to empty.
The safe fix
SELECT c.customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
-- Correctly returns Laura Fischer, regardless of any NULLs in orders.customer_id
NOT EXISTS never compares values directly against NULL the way NOT IN does — it just asks "does at least one matching row exist," and a NULL row in the subquery simply never matches anything, which is exactly the harmless behavior you want.
WHERE customer_id NOT IN (SELECT customer_id FROM orders WHERE customer_id IS NOT NULL). This works, but it's an easy line to forget — NOT EXISTS is simply safer by default.ANY & ALL
What they are: operators that compare a value against every value returned by a subquery, using a comparison operator (=, >, <, etc.) combined with ANY or ALL.
> ANY (subquery)— true if the value is greater than at least one value returned (equivalent to> MIN(...)).> ALL (subquery)— true if the value is greater than every value returned (equivalent to> MAX(...)).
Example — employees earning more than ALL Marketing employees
SELECT emp_name, salary
FROM employees
WHERE salary > ALL (
SELECT salary FROM employees WHERE dept_id = 3 -- Marketing
)
AND dept_id != 3; -- exclude Marketing itself from the comparison
This is functionally identical to comparing against MAX(salary) FROM employees WHERE dept_id = 3, but written with ALL. Most engineers actually prefer the MAX()/MIN() scalar subquery form shown in the Level 1 section — it's more widely understood at a glance than ANY/ALL, which some engineers rarely encounter.
= ANY (subquery) is exactly equivalent to IN (subquery), and != ALL (subquery) is exactly equivalent to NOT IN (subquery) — including inheriting the exact same NULL trap discussed above. Interviewers sometimes ask you to name this equivalence directly.EXISTS & NOT EXISTS
What it is: a boolean test — TRUE if the subquery returns at least one row, FALSE if it returns none. EXISTS never cares what columns the subquery selects (writing SELECT 1 or SELECT * inside makes no difference) — only whether a row exists.
This is the "SEMI JOIN" pattern from Module 1, formalized.
Easy example
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
-- Rahul, Grace, Omar (Laura excluded -- she has no matching order row)
EXISTS is a correlated subquery almost by definition — it's checking, per outer row, "is there a matching row for this specific outer value." Good query engines short-circuit the inner scan the moment they find one matching row — they don't need to find every match, just confirm at least one exists, which is often faster than a full JOIN + DISTINCT for pure existence checks.EXISTS/NOT EXISTS are the standard tools for data quality checks: "find orders with no matching customer" (referential integrity check), "find products never sold" (dead inventory check), "find employees not in any project" (utilization check). They answer yes/no questions without the row-duplication risk of a JOIN.Execution flow internals: how correlated subqueries really run
Logically, a correlated subquery behaves like a nested loop: for every row the outer query considers, the database substitutes that row's values into the inner query and re-evaluates it from scratch.
SELECT e.emp_name
FROM employees e
WHERE EXISTS (
SELECT 1 FROM employees m WHERE m.manager_id = e.emp_id
);
-- "find employees who manage at least one other employee"
Conceptual execution, row by row:
- Take outer row: Arjun Mehta (emp_id=101). Run inner query with
manager_id = 101→ finds Priya & Rohit → EXISTS is TRUE → keep Arjun. - Take outer row: Priya Nair (emp_id=102). Run inner query with
manager_id = 102→ finds Sneha → EXISTS is TRUE → keep Priya. - Take outer row: Sneha Kapoor (emp_id=104). Run inner query with
manager_id = 104→ finds nobody → EXISTS is FALSE → drop Sneha. - ...continue for all 16 rows.
O(n) outer rows × O(n) inner scan each = O(n²) in the worst case, with no index. In practice, real query optimizers are smarter than a literal nested loop — many rewrite correlated subqueries into semi-joins or hash-based plans internally when they can prove it's safe to do so. But you should never assume the optimizer will save you — always check EXPLAIN on a correlated subquery running against a large table, and add an index on the correlation column (manager_id here) if it's missing.Rewriting subqueries as joins
Many subqueries have an equivalent JOIN form. Knowing both, and knowing when to prefer which, is a strong interview signal.
IN subquery → INNER JOIN + DISTINCT
-- Subquery form
SELECT customer_name FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);
-- Equivalent JOIN form
SELECT DISTINCT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
Both give the same result here. The JOIN form needs DISTINCT because a customer with multiple orders would otherwise be duplicated (Module 1's cardinality issue) — the subquery form never had that problem in the first place, since IN only checks membership.
EXISTS → semi-join (same idea)
-- EXISTS form (preferred for pure existence checks)
SELECT c.customer_name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
-- JOIN + DISTINCT form (works, but needs the DISTINCT safety net)
SELECT DISTINCT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
- Prefer EXISTS/IN when you only need a yes/no filter and don't need any columns from the "other" table — it avoids duplication entirely and is often clearer about intent.
- Prefer a JOIN when you actually need to pull columns from the other table into your result — a subquery can't return those columns to the outer SELECT.
- Prefer a correlated subquery for per-row comparisons against a per-group aggregate (like "above my own department's average") when a window function (Module 6) isn't available or feels heavier than needed for a one-off query.
Subqueries in SELECT, FROM, and WHERE
Subquery in FROM (a "derived table" / inline view)
SELECT dept_id, avg_salary
FROM (
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
) dept_avgs
WHERE avg_salary > 100000;
The inner query is treated as if it were a real table — you must give it an alias (dept_avgs here), and every database requires this alias even though you never reference it by name in a simple case like this.
Subquery in SELECT (scalar, per-row)
SELECT
d.dept_name,
(SELECT COUNT(*) FROM employees e WHERE e.dept_id = d.dept_id) AS headcount
FROM departments d;
This is a correlated scalar subquery run once per department row — functionally similar to a LEFT JOIN + GROUP BY, but sometimes easier to read for a single extra computed column. At scale, prefer the JOIN + GROUP BY form (Module 1) since it's usually a single pass instead of one subquery execution per outer row.
FROM without an alias is a syntax error in most databases (MySQL, Postgres both require it) — always alias derived tables, even if you never reference the alias again.How interviewers trick people on subqueries
- The NOT IN + NULL trap — covered in depth above. This is probably the single most common SQL interview gotcha across all companies.
- The forgotten correlation trap — they ask for "above department average" and quietly check whether you correlate the inner query by
dept_id, or accidentally compute the company-wide average instead. - The "subquery returns multiple rows" trap — they ask you to use a scalar subquery (
= (SELECT ...)) against data where the subquery could return more than one row, to see if you catch the runtime error before running it. - The EXISTS vs IN vs JOIN performance question — "which is fastest?" The honest answer is "it depends on the optimizer, data size, and indexes — but EXISTS is generally safest for a pure existence check, since it can short-circuit and never risks duplicate rows." Interviewers want to see you avoid a dogmatic one-line answer.
- The double-negative trap — "find employees NOT in a department that has NO employees earning over 100000" nests two negations; many candidates fumble translating this into nested NOT EXISTS clauses correctly.
3 Hard Interview Questions
H1.Find the department(s) with the highest average salary. hard▶
dept_name, avg_salary — for whichever department(s) tie for the single highest average (handle ties correctly, don't just LIMIT 1).
Compute all department averages in a derived table or CTE, then compare each against the MAX of that same set.
SELECT dept_name, avg_salary
FROM (
SELECT d.dept_name, AVG(e.salary) AS avg_salary
FROM departments d
JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_name
) dept_avgs
WHERE avg_salary = (
SELECT MAX(avg_salary)
FROM (
SELECT AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
) all_avgs
);
-- Result: Engineering (~143,250)
- Using
ORDER BY avg_salary DESC LIMIT 1— this silently drops a genuine tie if two departments happen to share the exact highest average. Always prefer the= (SELECT MAX(...))pattern when ties matter. - Forgetting to exclude Legal (0 employees, NULL average) — it's automatically excluded here since we used an INNER JOIN grouping, but worth stating out loud in an interview.
- "Now do this with a window function instead." (foreshadowing:
RANK() OVER (ORDER BY avg_salary DESC), Module 6)
H2.Find customers whose total order amount is above the average total order amount across all customers. hard▶
This needs two levels of aggregation: per-customer totals first, then the average of those totals, then a filter. A single-level GROUP BY + HAVING can't reach "average of the per-group sums" directly — you need a derived table.
SELECT customer_id, total_spent
FROM (
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
WHERE customer_id IS NOT NULL
GROUP BY customer_id
) customer_totals
WHERE total_spent > (
SELECT AVG(total_spent)
FROM (
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
WHERE customer_id IS NOT NULL
GROUP BY customer_id
) t
);
- Trying to write
HAVING SUM(total_amount) > AVG(SUM(total_amount))in one single GROUP BY — this is invalid in standard SQL; you cannot nest an aggregate directly inside another aggregate in the same grouping level. The derived-table (or CTE) two-step is required. - Forgetting to exclude the orphaned/NULL customer_id order (5005/5006 style rows) from both the inner sums and the outer average — an ungrounded NULL customer_id would either error out or silently create a bogus grouping bucket depending on the engine.
H3.Find employees who are managers, but ALL of their direct reports earn less than 90000. hard▶
This is a double condition: (1) must actually have at least one report (EXISTS), and (2) must have zero reports earning ≥ 90000 (NOT EXISTS with a flipped condition) — or use ALL directly.
SELECT m.emp_name
FROM employees m
WHERE EXISTS (
SELECT 1 FROM employees e WHERE e.manager_id = m.emp_id
)
AND NOT EXISTS (
SELECT 1 FROM employees e WHERE e.manager_id = m.emp_id AND e.salary >= 90000
);
SELECT m.emp_name
FROM employees m
WHERE EXISTS (SELECT 1 FROM employees e WHERE e.manager_id = m.emp_id)
AND 90000 > ALL (
SELECT e.salary FROM employees e WHERE e.manager_id = m.emp_id
);
- Skipping the first EXISTS check — without it, a manager with zero reports would trivially satisfy "NOT EXISTS a report earning ≥ 90000" (vacuously true, since there are no reports to violate the rule), incorrectly including individual contributors with no reports at all.
- "What does 'ALL of an empty set satisfies any condition' mean, and why is it dangerous here?" — a classic logic gotcha: ALL / vacuous truth over an empty set is TRUE by definition, which is exactly why the EXISTS guard is required.
2 FAANG-Level Questions
F1.Second-highest distinct salary overall, using only a subquery (no LIMIT/OFFSET, no window functions). faang▶
This is one of the most repeated interview questions in existence — precisely because there are several ways to solve it, and the "wrong" ways all fail on ties or missing values in specific, revealing ways.
"Second highest" = the max salary strictly less than the overall max.
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- Overall max is Arjun at 185000 -> second highest is 160000 (Karan Verma)
SELECT DISTINCT e1.salary
FROM employees e1
WHERE 1 = (
SELECT COUNT(DISTINCT e2.salary)
FROM employees e2
WHERE e2.salary > e1.salary
);
-- "exactly 1 distinct salary is greater than mine" = I am the second highest
For "Nth highest" generalized, a window function (DENSE_RANK() OVER (ORDER BY salary DESC), filtered to rank = N) is the cleanest and most maintainable production answer — see Module 6. The subquery forms above are the classic interview answer specifically because window functions weren't always available/allowed as a constraint in the question.
First solution: O(n) for each MAX scan, effectively O(n) overall with a single pass (or O(log n) with an index on salary). Second solution: O(n²) without an index, since it's a correlated subquery per row.
- Using
ORDER BY salary DESC LIMIT 1 OFFSET 1— works, but silently gives the wrong answer if there's a tie for first place (in our data, if the two highest earners tied, OFFSET 1 would return that same tied value, not the "true" second distinct value). The question explicitly says "distinct" salary for this reason. - Forgetting
DISTINCTwhen the underlying data has salary ties elsewhere in the table (it doesn't affect this specific MAX-based solution, but does affect naive OFFSET-based ones).
- "Generalize this to the Nth highest salary." (Motivates window functions directly — this single question is the classic bridge from subqueries to Module 6.)
F2.Find all departments where every single employee earns above 90000 (a "fully senior" department). faang▶
Notice the trap: Legal has zero employees. Does "every employee earns above 90000" vacuously include Legal? This is a genuine judgment call interviewers use to test whether you think about edge cases, not just write syntax.
"Every employee is above X" = "NOT EXISTS an employee at or below X" — but decide deliberately whether to require at least one employee to exist first.
SELECT d.dept_name
FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id
)
AND NOT EXISTS (
SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id AND e.salary <= 90000
);
-- Result: none of our 5 non-empty departments qualify (each has at least one employee at or below 90000)
SELECT d.dept_name
FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id AND e.salary <= 90000
);
-- Result: Legal included, since it has no violating employees (vacuously true)
- Picking one interpretation without stating the ambiguity out loud. A strong interview answer explicitly says: "Do you want departments with zero employees to count as trivially satisfying this? I'll assume not, but here's the one-line change if you want the other interpretation" — and then shows both queries, exactly as above.
- "Now require it in SQL that treats vacuous truth as invalid by business rule — how would you encode 'must have at least 3 employees to qualify'?" (Add a HAVING COUNT(*) >= 3 in a GROUP BY version instead.)
1 Production Scenario
Referential integrity check before a warehouse load
Before loading a new batch of orders into a data warehouse fact table, a standard pre-load data quality gate checks for orphaned foreign keys — orders referencing a customer_id that doesn't (yet) exist in the customer dimension. Loading them anyway would create the fan-out / "UNKNOWN" join gaps discussed in Module 1's production scenario.
-- Data quality gate: fail the load (or quarantine these rows) if any exist
SELECT o.order_id, o.customer_id
FROM orders o
WHERE NOT EXISTS (
SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id
);
-- Flags order 5005 (customer_id 999, doesn't exist in customers)
Production pipelines typically run this exact NOT EXISTS check as an automated assertion step (in dbt, this pattern is literally the built-in relationships test) and either: (a) halt the pipeline and alert, (b) route the offending rows to a quarantine table for manual review, or (c) load them anyway but flag them, matching the "UNKNOWN" category approach from Module 1 — the right choice depends on whether late-arriving dimension data is expected and normal, or a genuine data bug.
orders.customer_id can ever be NULL (guest checkouts, as seen earlier), a NOT IN version of this exact check would silently report zero problems even when real orphaned rows exist — the worst possible failure mode for a data quality gate: appearing to pass while hiding real issues.10 Practice Questions
Write the SQL yourself before expanding each answer.
P1.Find employees who earn the maximum salary in the whole company. easy▶
SELECT emp_name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);P2.Find departments that have at least one employee earning over 150000. easy▶
SELECT dept_name
FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id AND e.salary > 150000
);P3.Find employees who are NOT anyone's manager. easy▶
SELECT emp_name
FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM employees m WHERE m.manager_id = e.emp_id
);P4.Find customers who have placed more than one order. medium▶
SELECT customer_id
FROM (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE customer_id IS NOT NULL
GROUP BY customer_id
) t
WHERE order_count > 1;
-- Rahul Bansal (201) placed 2 ordersP5.Find employees earning more than their department's average, using a correlated subquery. medium▶
SELECT e.emp_name
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e.dept_id
);P6.Find departments with zero employees, using NOT EXISTS. medium▶
SELECT dept_name
FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id
);
-- LegalP7.Find the employee(s) with the lowest salary in each department (one subquery per department comparison). medium▶
SELECT e.emp_name, e.dept_id, e.salary
FROM employees e
WHERE e.salary = (
SELECT MIN(e2.salary) FROM employees e2 WHERE e2.dept_id = e.dept_id
);P8.Explain, without running it, why WHERE dept_id NOT IN (SELECT dept_id FROM employees) could return zero departments even if Legal genuinely has no employees. hard▶
Because employees.dept_id contains NULLs (Tanvi Desai, Sameer Gupta) — the subquery's result list includes a NULL, which poisons every comparison in the implicit AND-chain that NOT IN performs, collapsing the whole result to empty. The safe rewrite is NOT EXISTS (SELECT 1 FROM employees e WHERE e.dept_id = d.dept_id).
P9.Find employees whose salary is above the average salary of employees who report to the same manager as them (peer average, not department average). hard▶
SELECT e.emp_name, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.manager_id = e.manager_id
);
-- Note: employees with manager_id = NULL (Arjun, Karan, Meera, Divya, Ishita) compare against
-- the average of OTHER NULL-manager employees -- since NULL = NULL is never true in the correlation,
-- double check this edge case: most engines treat "e2.manager_id = e.manager_id" as UNKNOWN when both are NULL,
-- so top-level employees would each compare against an EMPTY set (AVG of nothing = NULL), and salary > NULL is UNKNOWN --
-- meaning all NULL-manager employees are silently excluded. Flag this out loud as a known limitation.P10.Find orders placed by customers from a country other than India. hard▶
SELECT order_id, total_amount
FROM orders
WHERE customer_id IN (
SELECT customer_id FROM customers WHERE country != 'India'
);
-- Grace Lin's and Omar Haddad's orders (5002, 5004)
-- Note: order 5005 (customer_id 999) is correctly excluded, since 999 isn't in the customers list at allBefore you say NEXT: make sure you can explain, out loud and without looking, why NOT IN can silently return zero rows when NULLs are present, the difference between a correlated and non-correlated subquery, when to prefer EXISTS over a JOIN, and how to rewrite an IN subquery as an equivalent JOIN. When you're ready, reply NEXT and we'll move to Module 3: CTEs (Common Table Expressions).