Home » SQL & PL/SQL » SQL & PL/SQL » Join purchases to customers (19c)
Join purchases to customers [message #687660] |
Mon, 01 May 2023 06:27  |
 |
Unclefool
Messages: 82 Registered: August 2021
|
Member |
|
|
I have some code below, which is working fine. I am having difficulty figuring out the syntax to join purchases customer_id to the customer table in order to display the first_name and last_name after the customer_id. Any help would be appreciated. Below is my sample data and test case
ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'DD-MON-YYYY HH24:MI:SS.FF';
ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';
CREATE TABLE customers
(CUSTOMER_ID, FIRST_NAME, LAST_NAME) AS
SELECT 1, 'Faith', 'Mazzarone' FROM DUAL UNION ALL
SELECT 2, 'John', 'Henry' FROM DUAL UNION ALL
SELECT 3, 'Lisa', 'Saladino' FROM DUAL;
CREATE TABLE purchases(
ORDER_ID NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
CUSTOMER_ID NUMBER,
PURCHASE_DATE TIMESTAMP
);
INSERT INTO purchases
(CUSTOMER_ID, PURCHASE_DATE)
SELECT 1, TIMESTAMP '2023-04-03 11:41:25' + NUMTODSINTERVAL ( LEVEL * 2, 'DAY') FROM dual CONNECT BY LEVEL <= 6 UNION ALL
SELECT 2, TIMESTAMP '2023-04-03 01:41:25' + NUMTODSINTERVAL ( LEVEL * 1, 'DAY') FROM dual CONNECT BY LEVEL <= 7 UNION ALL
SELECT 3, TIMESTAMP '2023-04-23 21:31:25' + NUMTODSINTERVAL ( LEVEL * 1, 'DAY') FROM dual CONNECT BY LEVEL <= 7 UNION ALL
SELECT 3, TIMESTAMP '2023-03-12 21:44:35' + NUMTODSINTERVAL ( LEVEL * 1, 'DAY') FROM dual
CONNECT BY LEVEL <=20;
WITH t as (
select distinct CUSTOMER_ID, trunc(PURCHASE_DATE) dat
from purchases
)
,tt as (
select t.*
,row_number() over (partition by CUSTOMER_ID order by dat) rn
from t
)
select CUSTOMER_ID, min(dat) start_date,max(dat) end_date, count(*) num_days
from tt
group by CUSTOMER_ID,dat-rn
having count(*) >= 10
|
|
|
Re: Join purchases to customers [message #687661 is a reply to message #687660] |
Mon, 01 May 2023 09:34   |
 |
Barbara Boehmer
Messages: 9058 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Here is one method that just continues what you already have.
C##SCOTT@XE_21.3.0.0.0> WITH t as (
2 select distinct CUSTOMER_ID, trunc(PURCHASE_DATE) dat
3 from purchases
4 )
5 ,tt as (
6 select t.*
7 ,row_number() over (partition by CUSTOMER_ID order by dat) rn
8 from t
9 )
10 ,ttt as (
11 select CUSTOMER_ID, min(dat) start_date,max(dat) end_date, count(*) num_days
12 from tt
13 group by CUSTOMER_ID,dat-rn
14 having count(*) >= 10
15 )
16 select c.customer_id, c.first_name, c.last_name,
17 ttt.start_date, ttt.end_date, ttt.num_days
18 from customers c, ttt
19 where c.customer_id = ttt.customer_id
20 /
CUSTOMER_ID FIRST LAST_NAME START_DATE END_DATE NUM_DAYS
----------- ----- --------- -------------------- -------------------- ----------
3 Lisa Saladino 13-MAR-2023 00:00:00 01-APR-2023 00:00:00 20
1 row selected.
or eliminating some sub-queries:
C##SCOTT@XE_21.3.0.0.0> WITH p as (
2 select distinct customer_id, trunc(purchase_date) dat,
3 dense_rank() over (partition by CUSTOMER_ID order by trunc(purchase_date)) rn
4 from purchases
5 )
6 select c.customer_id, c.first_name, c.last_name,
7 min(p.dat) start_date, max(p.dat) end_date, count(*) num_days
8 from customers c, p
9 where c.customer_id = p.customer_id
10 group by c.customer_id, c.first_name, c.last_name, p.dat-p.rn
11 having count(*) >=10
12 /
CUSTOMER_ID FIRST LAST_NAME START_DATE END_DATE NUM_DAYS
----------- ----- --------- -------------------- -------------------- ----------
3 Lisa Saladino 13-MAR-2023 00:00:00 01-APR-2023 00:00:00 20
1 row selected.
[Updated on: Mon, 01 May 2023 12:07] Report message to a moderator
|
|
|
|
Re: Join purchases to customers [message #687663 is a reply to message #687661] |
Mon, 01 May 2023 12:36   |
Solomon Yakobson
Messages: 3245 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Match recognize solution:
select m.customer_id,
c.first_name,
c.last_name,
m.start_date,
m.end_date,
m.end_date - m.start_date + 1 num_days
from purchases
match_recognize(
partition by customer_id
order by purchase_date
measures
trunc(first(purchase_date)) start_date,
trunc(last(purchase_date)) end_date
one row per match
pattern(p{10,})
define p as prev(purchase_date) is null or trunc(purchase_date) - trunc(prev(purchase_date)) = 1
) m,
customers c
where c.customer_id = m.customer_id
/
CUSTOMER_ID FIRST LAST_NAME START_DATE END_DATE NUM_DAYS
----------- ----- --------- -------------------- -------------------- ----------
3 Lisa Saladino 13-MAR-2023 00:00:00 01-APR-2023 00:00:00 20
SQL>
SY.
|
|
|
Re: Join purchases to customers [message #687664 is a reply to message #687663] |
Mon, 01 May 2023 13:41  |
 |
Barbara Boehmer
Messages: 9058 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
I have yet to try using match_recognize other than adapting some simple example. To me it is kind of like French. When I read it I can understand it. However, if I am trying to write it, I have trouble thinking of the words or syntax. I like that using match_recognize can reduce a lot of code with sub-queries to just one query. Just for fun, I created some indexes and gathered some statistics, and ran explain plans to compare results. I only included the indexes below that the optimizer chose to use. It looks like it opted for compound indexes on all columns so that it can just scan the indexes without even scanning the tables in some cases. It chose those over indexes on individual columns. I found it even chose the index with the order_id that is not used in the queries over one without it. The explain plans show that my initial reduction of sub-queries reduced the steps taken and the match_recognize did so even more. I am presuming that better access and fewer steps would result in a faster query on a data set large enough to see a difference.
C##SCOTT@XE_21.3.0.0.0> -- addition of indexes and gathering of statistics for testing:
C##SCOTT@XE_21.3.0.0.0> CREATE INDEX c_idx ON customers (customer_id, last_name, first_name)
2 /
Index created.
C##SCOTT@XE_21.3.0.0.0> CREATE INDEX p_idx ON purchases (customer_id, purchase_date, order_id)
2 /
Index created.
C##SCOTT@XE_21.3.0.0.0> EXEC DBMS_STATS.GATHER_TABLE_STATS (USER, 'CUSTOMERS')
PL/SQL procedure successfully completed.
C##SCOTT@XE_21.3.0.0.0> EXEC DBMS_STATS.GATHER_TABLE_STATS (USER, 'PURCHASES')
PL/SQL procedure successfully completed.
C##SCOTT@XE_21.3.0.0.0> -- minimal modification by me:
C##SCOTT@XE_21.3.0.0.0> explain plan for
2 WITH t as (
3 select distinct CUSTOMER_ID, trunc(PURCHASE_DATE) dat
4 from purchases
5 )
6 ,tt as (
7 select t.*
8 ,row_number() over (partition by CUSTOMER_ID order by dat) rn
9 from t
10 )
11 ,ttt as (
12 select CUSTOMER_ID, min(dat) start_date,max(dat) end_date, count(*) num_days
13 from tt
14 group by CUSTOMER_ID,dat-rn
15 having count(*) >= 10
16 )
17 select c.customer_id, c.first_name, c.last_name,
18 ttt.start_date, ttt.end_date, ttt.num_days
19 from customers c, ttt
20 where c.customer_id = ttt.customer_id
21 /
Explained.
C##SCOTT@XE_21.3.0.0.0> select * from table (dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4164497899
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 60 | 5 (60)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 60 | 5 (60)| 00:00:01 |
| 2 | VIEW | | 1 | 44 | 4 (75)| 00:00:01 |
|* 3 | FILTER | | | | | |
| 4 | HASH GROUP BY | | 1 | 22 | 4 (75)| 00:00:01 |
| 5 | VIEW | | 40 | 880 | 3 (67)| 00:00:01 |
| 6 | WINDOW SORT | | 40 | 360 | 3 (67)| 00:00:01 |
| 7 | VIEW | | 40 | 360 | 2 (50)| 00:00:01 |
| 8 | HASH UNIQUE | | 40 | 560 | 2 (50)| 00:00:01 |
| 9 | INDEX FULL SCAN| P_IDX | 40 | 560 | 1 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | C_IDX | 1 | 16 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(COUNT(*)>=10)
10 - access("C"."CUSTOMER_ID"="TTT"."CUSTOMER_ID")
Note
-----
- this is an adaptive plan
27 rows selected.
C##SCOTT@XE_21.3.0.0.0> -- fewer sub-queries by me:
C##SCOTT@XE_21.3.0.0.0> explain plan for
2 WITH p as (
3 select distinct customer_id, trunc(purchase_date) dat,
4 dense_rank() over (partition by CUSTOMER_ID order by trunc(purchase_date)) rn
5 from purchases
6 )
7 select c.customer_id, c.first_name, c.last_name,
8 min(p.dat) start_date, max(p.dat) end_date, count(*) num_days
9 from customers c, p
10 where c.customer_id = p.customer_id
11 group by c.customer_id, c.first_name, c.last_name, p.dat-p.rn
12 having count(*) >=10
13 /
Explained.
C##SCOTT@XE_21.3.0.0.0> select * from table (dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3136216344
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 38 | 5 (60)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 38 | 5 (60)| 00:00:01 |
|* 3 | HASH JOIN | | 40 | 1520 | 4 (50)| 00:00:01 |
| 4 | INDEX FULL SCAN | C_IDX | 3 | 48 | 1 (0)| 00:00:01 |
| 5 | VIEW | | 40 | 880 | 3 (67)| 00:00:01 |
| 6 | HASH UNIQUE | | 40 | 560 | 3 (67)| 00:00:01 |
| 7 | WINDOW SORT | | 40 | 560 | 3 (67)| 00:00:01 |
| 8 | INDEX FULL SCAN| P_IDX | 40 | 560 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(COUNT(*)>=10)
3 - access("C"."CUSTOMER_ID"="P"."CUSTOMER_ID")
21 rows selected.
C##SCOTT@XE_21.3.0.0.0> -- query using match_recognize by Solomon:
C##SCOTT@XE_21.3.0.0.0> explain plan for
2 select m.customer_id,
3 c.first_name,
4 c.last_name,
5 m.start_date,
6 m.end_date,
7 m.end_date - m.start_date + 1 num_days
8 from purchases
9 match_recognize(
10 partition by customer_id
11 order by purchase_date
12 measures
13 trunc(first(purchase_date)) start_date,
14 trunc(last(purchase_date)) end_date
15 one row per match
16 pattern(p{10,})
17 define p as prev(purchase_date) is null or trunc(purchase_date) - trunc(prev(purchase_date)) = 1
18 ) m,
19 customers c
20 where c.customer_id = m.customer_id
21 /
Explained.
C##SCOTT@XE_21.3.0.0.0> select * from table (dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2832019574
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 40 | 1640 | 2 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 40 | 1640 | 2 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN | C_IDX | 3 | 48 | 1 (0)| 00:00:01 |
| 3 | VIEW | | 40 | 1000 | 1 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 40 | 560 | 1 (0)| 00:00:01 |
| 5 | MATCH RECOGNIZE BUFFER DETERMINISTIC FINITE AUTOMATON| | 40 | 560 | 1 (0)| 00:00:01 |
| 6 | INDEX FULL SCAN | P_IDX | 40 | 560 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C"."CUSTOMER_ID"="M"."CUSTOMER_ID")
18 rows selected.
|
|
|
Goto Forum:
Current Time: Wed Sep 27 09:39:45 CDT 2023
|