데이터 분석 SQL - 서브 쿼리(Sub-query)

PHM 2022. 10. 8. 16:30

서브쿼리(Sub-query) 개요

- 서브쿼리는 하나의 쿼리내에 또 다른 쿼리가 포함되어 있는 쿼리를 의미

- 서브쿼리는 메인 쿼리(Main Query)내에 포함되어 있는 관계

- Where절에 사용될 경우 복잡한 업무적인 조건을 직관적인 SQL로 표현하여 필터링하는데 주로 사용

select * from hr.emp where sal >= (select avg(sal) from hr.emp)

서브쿼리 유형	예시
Where 절에 사용되는 서브쿼리	- select * from hr.emp where sal >= (select avg(sal) from hr.emp) - select * from hr.emp_salary_hist a where todate = (select max(todate) from hr.emp_salary_hist b where a.empno = b.empno)
Select 절에 사용되는 서브쿼리 (스칼라 서브쿼리)	- select ename, deptno (select dname from hr.dept x where x.deptno = a.deptno) as dname from hr.emp a;
From 절에 사용되는 서브쿼리 (인라인 뷰)	- select dname from *(select from hr.dept x)**

(where절) 서브쿼리의 주요 특징

- 서브쿼리는 메인 쿼리에 where 조건으로 값을 전달하거나 메인쿼리와 연결되어 메인 쿼리의 필터링 작업을 수행

ㆍ서브쿼리 컬럼은 컬럼의 값만 메인쿼리로 전달될 수 있으며 컬럼 자체는 메인쿼리에서 사용될 수 없음

ㆍ메인쿼리의 컬럼은 서브쿼리에서 사용될 수 있음

- 서브쿼리와 메인쿼리로 연결 시 메인쿼리의 집합 레벨이 변경되지 않음

ㆍ메인 쿼리와 서브쿼리와 연결 시 서브쿼리는 연결 컬럼으로 무조건 유니크한 집합, 즉 1의 집합이 되므로 메인쿼리의

집합 레벨을 변경하지 않음.

select a.* from hr.dept a where a.deptno in (select deptno from hr.emp x where x.sal > 1000);	OK
select a.*, x.ename from hr.dept a where a.deptno in (select deptno from hr.emp x where x.sal > 1000);	수행 안됨
select a.* from hr.dept a where exists (select deptno from hr.emp x where x.deptno = a.deptno and x.sal > 1000);	OK

select * from nw.orders a where order_id in (select order_id from nw.order_items where amount >100);

- 1:M 관계 - 서브쿼리는 연결 컬럼으로 무조건 유니크한 집합, 즉 1의 집합이됨 - 1:1

서브쿼리와 세미조인(Semi-Join)

- 서브쿼리는 메인쿼리와 연결시에 조인과 유사한 방식(Semi-Join)으로 연결될 수 있음

- 단 서브쿼리는 무조건 1의 집합으로 만들어지므로 메인쿼리 집합 레벨을 변경 시키지 않음

서브쿼리 활용 시 유의 사항

- 서브쿼리가 비상관(non-correlated) 서브쿼리인가, 상관(correlated) 서브쿼리인가?

ㆍ비상관 서브쿼리 : 서브쿼리 자체적으로 메인쿼리에 값을 전달

select * from hr.dept where deptno in (select deptno from hr.emp where sal < 1300);

ㆍ상관 서브쿼리 : 메인쿼리에서 서브쿼리에 연결 컬럼으로 연결한 뒤에 메인 쿼리에서 값을 서브쿼리로 전달

( 서브쿼리내에 메인쿼리의 연결 컬럼을 가지고 있음 )

select a.* from hr.dept a where exists ( select deptno from hr.emp x where x.deptno = a.deptno and x.sal < 1300 )

- 단일 행 서브쿼리/다중 행 서브쿼리

ㆍ메인 쿼리 - 서브쿼리 연결 연산자가 단순 비교 연산자(( =, >, <, >=, <=, <>), in, exists 또는 all/any 인가?

	비상관 서브쿼리	상관 서브쿼리
단일행 서브쿼리	비교 연산자( =, >, <, >=, <=, <>)	비교 연산자( =, >, <, >=, <=, <> )
다중행 서브쿼리	in, not in	exists, not exists

- in/not in, exists/not exists 차이

- 서브쿼리가 단일 컬럼 또는 다중 컬럼을 반환하는가?

비상관 서브쿼리에서 IN 연산자 - 다중행 서브쿼리

- 비상관 서브쿼리는 서브쿼리 자체적으로 메인쿼리에 값을 전달

- IN연산자는 괄호내에 여러 개의 상수값 또는 여러 개의 레코드를 반환하는 서브쿼를 가질 수 있음

- IN연산자는 여러 개의 값이 입력될 경우 개별 값의 = 조건들의 OR 연산이 적용됨

- IN연산자는 서브쿼리가 사용될 때 SQL 상에서 중복된 여러 개의 값을 입력하더라도 실제 DB에서 사용될 떄는 중복값이 제거된 유니크한 값으로 입력됨

- 여러 개의 컬럼 조건으로 IN연산자를 사용할 수 있음

- select * from hr.emp where deptno in (20,30);
select * from hr.emp where deptno = 20 or deptno = 30;

- select * from hr.dept where deptno in (select deptno from hr.emp where sal < 1300);

- select * from hr.dept where (deptno, loc) in (select deptno, 'DALLAS' from hr.emp where sal < 1300);

비상관 서브쿼리에서 단순 비교 연산자 - 단일행 서브쿼리

- 메인 쿼리에서 단순 비교 연산자(=, >, <, >=, <=, <>)로 서브쿼리의 결과값을 받을 때는 단 한 개의 행만 서브쿼리에서 제공해야 함

- 단순 비교 연산자로 서브쿼리를 연결하여도 여러 컬럼 조건을 가질 수 있음

select * from hr.emp where sal <= (select avg(sal) from hr.emp);	OK
select * from hr.emp where deptno = (select deptno from hr.emp where sal < 1300);	수행 안됨 (여러 건)
select * from nw.orders where (customer_id, order_date) = (select customer_id, max(order_date) from nw.orders where customer_id='VINET' group by customer_id);	OK

상관 서브쿼리 ( Correlated Sub-Query )

- 상관 서브쿼리는 서브쿼리내에 메인쿼리의 연결 컬럼을 가지고 있음. 메인쿼리에서 서브쿼리로 연결 값을 전달하는 방식

select * from nw.orders a
where exists (select order_id from nw.order_items x where x.order_id = a.order_id and x.amount > 100);

select * from hr.emp_salary_hist a
where todate = (select max(todate) from hr.emp_salary_hist x where x.empno = a.empno);

상관 서브쿼리 exists 연산자 - 다중행 서브쿼리

- Exists 는 다중행 상관 서브쿼리에서 사용되는 대표적인 연산자

- 메인쿼리의 레코드 별로 서브쿼리의 결과가 한 건이라도 존재하면 true가 되어 메인쿼리의 결과 반환

- 문맥적으로 메인쿼리의 레코드가 서브쿼리에서 존재하는지를 체크하기 위해 활용

- 메인쿼리와 서브쿼리가 단일 컬럼으로 연결시에 IN과 EXISTS의 결과는 서로 동일 ( 물론 서로 호환이되는 SQL일 경우만). 하지만 DBMS 내부적으로 실행계획은 서로 다를 수 있음

select * from nw.customers a
where exists (select 1 from nw.orders x where x.customer_id = a.customer_id group by customer_id having count(*) >=2 );

select * from nw.customers a
where exists (select 1 from nw.orders x where x.customer_id = a.customer_id and x.order_date
>= to_date('19970101', 'yyyymmdd'));

메인 쿼리와 상관 서브쿼리의 연결 방식

- 메인쿼리는 상관 서브쿼리에서 연결 시 한건의 서브쿼리만 결과를 반환하면 더 이상 동일 레코드로 연결을 수행하지 않기 때문에 메인 쿼리의 집합 레벨을 그대로 유지할 수 있음

-> 딱 하나 연결되면 그 다음은 연결 시도 자체를 안함.

상관 서브쿼리에서 단순 비교 연산자 - 단일행 서브쿼리

- 메인 쿼리에서 상관 서브쿼리의 결과를 단순 비교 연산자 ( =,>,<,>=,<=,<> )로 비교할 때는 반드시 메인쿼리의 개별 레코드 별로 단 한 개의 레코드만 서브쿼리에서 제공해야함

select * from hr.emp_salary_hist a where todate = (select max(todate) from hr.emp_salary_hist x where x.empno = a.empno);	메인쿼리의 개별 레코드별로 서브쿼리의 max(todate)는 단 한개의 레코드를 반환
select * from hr.emp_salary_hist a where todate = (select todate from hr.emp_salary_hist x where x.empno = a.empno)	메인쿼리의 개별 레코드 별로 empno 연결조건으로 단 한건이 아닌 여러건을 반환하므로 수행오류