Home » SQL & PL/SQL » SQL & PL/SQL » Grouping with the previous one if identical
Grouping with the previous one if identical [message #687314] |
Tue, 21 February 2023 06:46  |
 |
questvba
Messages: 15 Registered: July 2022
|
Junior Member |
|
|
Hi Forum,
With the following query
WITH MYDATA as
(SELECT 3017479 AS filenb, 40818 AS child, 202001 AS date_start, 204.66 AS amount FROM dual UNION ALL
SELECT 3017479, 40818, 202002, 204.66 FROM dual UNION ALL
SELECT 3017479, 40818, 202003, 208.75 FROM dual UNION ALL
SELECT 3017479, 40818, 202004, 123.64 FROM dual UNION ALL
SELECT 3017479, 40818, 202005, 123.64 FROM dual UNION ALL
SELECT 3017479, 40818, 202006, 123.64 FROM dual UNION ALL
SELECT 3017479, 40818, 202007, 174.64 FROM dual UNION ALL
SELECT 3017479, 40818, 202008, 123.64 FROM dual UNION ALL
SELECT 3017479, 40818, 202009, 123.64 FROM dual UNION ALL
SELECT 3017479, 40818, 202010, 123.64 FROM dual UNION ALL
SELECT 3017479, 40818, 202011, 123.64 FROM dual UNION ALL
SELECT 3017479, 40818, 202012, 123.64 FROM dual UNION ALL
SELECT 3017479, 60914, 202001, 210.29 FROM dual UNION ALL
SELECT 3017479, 60914, 202002, 210.29 FROM dual UNION ALL
SELECT 3017479, 60914, 202003, 210.29 FROM dual UNION ALL
SELECT 3017479, 60914, 202004, 163.23 FROM dual UNION ALL
SELECT 3017479, 60914, 202005, 163.23 FROM dual UNION ALL
SELECT 3017479, 60914, 202006, 163.23 FROM dual UNION ALL
SELECT 3017479, 60914, 202007, 184.25 FROM dual UNION ALL
SELECT 3017479, 60914, 202008, 163.23 FROM dual UNION ALL
SELECT 3017479, 60914, 202009, 163.23 FROM dual UNION ALL
SELECT 3017479, 60914, 202010, 163.23 FROM dual UNION ALL
SELECT 3017479, 60914, 202011, 163.23 FROM dual UNION ALL
SELECT 3017479, 60914, 202012, 184.25 FROM dual
)
SELECT *
FROM MYDATA
;
I have this result (left side)
In file 3017479, I have two children (40818 and 60914) who receive amounts that may be identical or different depending on certain circumstances.

The aim would be to be able to group the contiguous identical amounts by file, by child. This would look like this (right side):

For child 40818, the amount is the same in 202004 and 202008 but as there was a break in 202007, they are two different groups.
DATES_TART is the MIN of the grouping and DATE_END is the MAX of the grouping.
At the moment I have tried with LAG, LEAD, FIRST_VALUE but I can't get the desired result.
If you have any questions...
If you have a solution, I'm interested.
|
|
|
Re: Grouping with the previous one if identical [message #687315 is a reply to message #687314] |
Tue, 21 February 2023 08:05   |
 |
Michel Cadot
Messages: 68501 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
SQL> WITH MYDATA as
2 (SELECT 3017479 AS filenb, 40818 AS child, 202001 AS date_start, 204.66 AS amount FROM dual UNION ALL
3 SELECT 3017479, 40818, 202002, 204.66 FROM dual UNION ALL
4 SELECT 3017479, 40818, 202003, 208.75 FROM dual UNION ALL
5 SELECT 3017479, 40818, 202004, 123.64 FROM dual UNION ALL
6 SELECT 3017479, 40818, 202005, 123.64 FROM dual UNION ALL
7 SELECT 3017479, 40818, 202006, 123.64 FROM dual UNION ALL
8 SELECT 3017479, 40818, 202007, 174.64 FROM dual UNION ALL
9 SELECT 3017479, 40818, 202008, 123.64 FROM dual UNION ALL
10 SELECT 3017479, 40818, 202009, 123.64 FROM dual UNION ALL
11 SELECT 3017479, 40818, 202010, 123.64 FROM dual UNION ALL
12 SELECT 3017479, 40818, 202011, 123.64 FROM dual UNION ALL
13 SELECT 3017479, 40818, 202012, 123.64 FROM dual UNION ALL
14 SELECT 3017479, 60914, 202001, 210.29 FROM dual UNION ALL
15 SELECT 3017479, 60914, 202002, 210.29 FROM dual UNION ALL
16 SELECT 3017479, 60914, 202003, 210.29 FROM dual UNION ALL
17 SELECT 3017479, 60914, 202004, 163.23 FROM dual UNION ALL
18 SELECT 3017479, 60914, 202005, 163.23 FROM dual UNION ALL
19 SELECT 3017479, 60914, 202006, 163.23 FROM dual UNION ALL
20 SELECT 3017479, 60914, 202007, 184.25 FROM dual UNION ALL
21 SELECT 3017479, 60914, 202008, 163.23 FROM dual UNION ALL
22 SELECT 3017479, 60914, 202009, 163.23 FROM dual UNION ALL
23 SELECT 3017479, 60914, 202010, 163.23 FROM dual UNION ALL
24 SELECT 3017479, 60914, 202011, 163.23 FROM dual UNION ALL
25 SELECT 3017479, 60914, 202012, 184.25 FROM dual
26 ),
27 t as (
28 select filenb, child, date_start, amount,
29 case
30 when lag(amount,1,-1) over (partition by filenb, child order by date_start) != amount
31 then row_number() over (partition by filenb, child order by date_start)
32 end rn
33 from mydata
34 ),
35 grouping as (
36 select filenb, child, date_start, amount,
37 max(rn) over (partition by filenb, child order by date_start) grp
38 from t
39 )
40 select filenb, child, min(date_start) date_start, max(date_start) date_end, amount
41 from grouping
42 group by filenb, child, amount, grp
43 order by filenb, child, date_start
44 /
FILENB CHILD DATE_START DATE_END AMOUNT
---------- ---------- ---------- ---------- ----------
3017479 40818 202001 202002 204.66
3017479 40818 202003 202003 208.75
3017479 40818 202004 202006 123.64
3017479 40818 202007 202007 174.64
3017479 40818 202008 202012 123.64
3017479 60914 202001 202003 210.29
3017479 60914 202004 202006 163.23
3017479 60914 202007 202007 184.25
3017479 60914 202008 202011 163.23
3017479 60914 202012 202012 184.25
10 rows selected.
|
|
|
|
Re: Grouping with the previous one if identical [message #687317 is a reply to message #687315] |
Tue, 21 February 2023 08:45   |
Solomon Yakobson
Messages: 3244 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
WITH MYDATA AS (
SELECT 3017479 FILENB, 40818 CHILD, 202001 DATE_START, 204.66 AMOUNT FROM DUAL UNION ALL
SELECT 3017479, 40818, 202002, 204.66 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202003, 208.75 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202004, 123.64 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202005, 123.64 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202006, 123.64 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202007, 174.64 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202008, 123.64 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202009, 123.64 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202010, 123.64 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202011, 123.64 FROM DUAL UNION ALL
SELECT 3017479, 40818, 202012, 123.64 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202001, 210.29 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202002, 210.29 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202003, 210.29 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202004, 163.23 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202005, 163.23 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202006, 163.23 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202007, 184.25 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202008, 163.23 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202009, 163.23 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202010, 163.23 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202011, 163.23 FROM DUAL UNION ALL
SELECT 3017479, 60914, 202012, 184.25 FROM DUAL
),
T AS (
SELECT FILENB,
CHILD,
DATE_START,
AMOUNT,
DATE_START - DENSE_RANK() OVER(PARTITION BY FILENB,CHILD,AMOUNT ORDER BY DATE_START) GRP
FROM MYDATA
)
SELECT FILENB,
CHILD,
MIN(DATE_START) DATE_START,
MAX(DATE_START) DATE_END,
AMOUNT
FROM T
GROUP BY FILENB,
CHILD,
AMOUNT,
GRP
ORDER BY FILENB,
CHILD,
DATE_START
/
FILENB CHILD DATE_START DATE_END AMOUNT
---------- ---------- ---------- ---------- ----------
3017479 40818 202001 202002 204.66
3017479 40818 202003 202003 208.75
3017479 40818 202004 202006 123.64
3017479 40818 202007 202007 174.64
3017479 40818 202008 202012 123.64
3017479 60914 202001 202003 210.29
3017479 60914 202004 202006 163.23
3017479 60914 202007 202007 184.25
3017479 60914 202008 202011 163.23
3017479 60914 202012 202012 184.25
10 rows selected.
SQL>
SY.
|
|
|
|
Re: Grouping with the previous one if identical [message #687319 is a reply to message #687318] |
Tue, 21 February 2023 09:58   |
 |
Michel Cadot
Messages: 68501 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
The first subquery "t" adds a tag to each when a new "group" starts:
SQL> WITH MYDATA as
2 (SELECT 3017479 AS filenb, 40818 AS child, 202001 AS date_start, 204.66 AS amount FROM dual UNION ALL
3 SELECT 3017479, 40818, 202002, 204.66 FROM dual UNION ALL
4 SELECT 3017479, 40818, 202003, 208.75 FROM dual UNION ALL
5 SELECT 3017479, 40818, 202004, 123.64 FROM dual UNION ALL
6 SELECT 3017479, 40818, 202005, 123.64 FROM dual UNION ALL
7 SELECT 3017479, 40818, 202006, 123.64 FROM dual UNION ALL
8 SELECT 3017479, 40818, 202007, 174.64 FROM dual UNION ALL
9 SELECT 3017479, 40818, 202008, 123.64 FROM dual UNION ALL
10 SELECT 3017479, 40818, 202009, 123.64 FROM dual UNION ALL
11 SELECT 3017479, 40818, 202010, 123.64 FROM dual UNION ALL
12 SELECT 3017479, 40818, 202011, 123.64 FROM dual UNION ALL
13 SELECT 3017479, 40818, 202012, 123.64 FROM dual UNION ALL
14 SELECT 3017479, 60914, 202001, 210.29 FROM dual UNION ALL
15 SELECT 3017479, 60914, 202002, 210.29 FROM dual UNION ALL
16 SELECT 3017479, 60914, 202003, 210.29 FROM dual UNION ALL
17 SELECT 3017479, 60914, 202004, 163.23 FROM dual UNION ALL
18 SELECT 3017479, 60914, 202005, 163.23 FROM dual UNION ALL
19 SELECT 3017479, 60914, 202006, 163.23 FROM dual UNION ALL
20 SELECT 3017479, 60914, 202007, 184.25 FROM dual UNION ALL
21 SELECT 3017479, 60914, 202008, 163.23 FROM dual UNION ALL
22 SELECT 3017479, 60914, 202009, 163.23 FROM dual UNION ALL
23 SELECT 3017479, 60914, 202010, 163.23 FROM dual UNION ALL
24 SELECT 3017479, 60914, 202011, 163.23 FROM dual UNION ALL
25 SELECT 3017479, 60914, 202012, 184.25 FROM dual
26 )
27 select filenb, child, date_start, amount,
28 case
29 when lag(amount,1,-1) over (partition by filenb, child order by date_start) != amount
30 then row_number() over (partition by filenb, child order by date_start)
31 end rn
32 from mydata
33 order by filenb, child, date_start
34 /
FILENB CHILD DATE_START AMOUNT RN
---------- ---------- ---------- ---------- ----------
3017479 40818 202001 204.66 1
3017479 40818 202002 204.66
3017479 40818 202003 208.75 3
3017479 40818 202004 123.64 4
3017479 40818 202005 123.64
3017479 40818 202006 123.64
3017479 40818 202007 174.64 7
3017479 40818 202008 123.64 8
3017479 40818 202009 123.64
3017479 40818 202010 123.64
3017479 40818 202011 123.64
3017479 40818 202012 123.64
3017479 60914 202001 210.29 1
3017479 60914 202002 210.29
3017479 60914 202003 210.29
3017479 60914 202004 163.23 4
3017479 60914 202005 163.23
3017479 60914 202006 163.23
3017479 60914 202007 184.25 7
3017479 60914 202008 163.23 8
3017479 60914 202009 163.23
3017479 60914 202010 163.23
3017479 60914 202011 163.23
3017479 60914 202012 184.25 12
24 rows selected.
The second one "grouping" spreads this tag to each member of the group:
SQL> break on grp skip 1 dup
SQL> WITH MYDATA as
2 (SELECT 3017479 AS filenb, 40818 AS child, 202001 AS date_start, 204.66 AS amount FROM dual UNION ALL
3 SELECT 3017479, 40818, 202002, 204.66 FROM dual UNION ALL
4 SELECT 3017479, 40818, 202003, 208.75 FROM dual UNION ALL
5 SELECT 3017479, 40818, 202004, 123.64 FROM dual UNION ALL
6 SELECT 3017479, 40818, 202005, 123.64 FROM dual UNION ALL
7 SELECT 3017479, 40818, 202006, 123.64 FROM dual UNION ALL
8 SELECT 3017479, 40818, 202007, 174.64 FROM dual UNION ALL
9 SELECT 3017479, 40818, 202008, 123.64 FROM dual UNION ALL
10 SELECT 3017479, 40818, 202009, 123.64 FROM dual UNION ALL
11 SELECT 3017479, 40818, 202010, 123.64 FROM dual UNION ALL
12 SELECT 3017479, 40818, 202011, 123.64 FROM dual UNION ALL
13 SELECT 3017479, 40818, 202012, 123.64 FROM dual UNION ALL
14 SELECT 3017479, 60914, 202001, 210.29 FROM dual UNION ALL
15 SELECT 3017479, 60914, 202002, 210.29 FROM dual UNION ALL
16 SELECT 3017479, 60914, 202003, 210.29 FROM dual UNION ALL
17 SELECT 3017479, 60914, 202004, 163.23 FROM dual UNION ALL
18 SELECT 3017479, 60914, 202005, 163.23 FROM dual UNION ALL
19 SELECT 3017479, 60914, 202006, 163.23 FROM dual UNION ALL
20 SELECT 3017479, 60914, 202007, 184.25 FROM dual UNION ALL
21 SELECT 3017479, 60914, 202008, 163.23 FROM dual UNION ALL
22 SELECT 3017479, 60914, 202009, 163.23 FROM dual UNION ALL
23 SELECT 3017479, 60914, 202010, 163.23 FROM dual UNION ALL
24 SELECT 3017479, 60914, 202011, 163.23 FROM dual UNION ALL
25 SELECT 3017479, 60914, 202012, 184.25 FROM dual
26 ),
27 t as (
28 select filenb, child, date_start, amount,
29 case
30 when lag(amount,1,-1) over (partition by filenb, child order by date_start) != amount
31 then row_number() over (partition by filenb, child order by date_start)
32 end rn
33 from mydata
34 )
35 select filenb, child, date_start, amount,
36 max(rn) over (partition by filenb, child order by date_start) grp
37 from t
38 order by filenb, child, date_start
39 /
FILENB CHILD DATE_START AMOUNT GRP
---------- ---------- ---------- ---------- ----------
3017479 40818 202001 204.66 1
3017479 40818 202002 204.66 1
3017479 40818 202003 208.75 3
3017479 40818 202004 123.64 4
3017479 40818 202005 123.64 4
3017479 40818 202006 123.64 4
3017479 40818 202007 174.64 7
3017479 40818 202008 123.64 8
3017479 40818 202009 123.64 8
3017479 40818 202010 123.64 8
3017479 40818 202011 123.64 8
3017479 40818 202012 123.64 8
3017479 60914 202001 210.29 1
3017479 60914 202002 210.29 1
3017479 60914 202003 210.29 1
3017479 60914 202004 163.23 4
3017479 60914 202005 163.23 4
3017479 60914 202006 163.23 4
3017479 60914 202007 184.25 7
3017479 60914 202008 163.23 8
3017479 60914 202009 163.23 8
3017479 60914 202010 163.23 8
3017479 60914 202011 163.23 8
3017479 60914 202012 184.25 12
24 rows selected.
Then the last one to get each group with their min and max.
|
|
|
Re: Grouping with the previous one if identical [message #687320 is a reply to message #687318] |
Tue, 21 February 2023 11:26   |
 |
Michel Cadot
Messages: 68501 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
Note that Solomon's solution works only if all dates are filled...
... unless you specify that if some dates are missing this is counted as a "break" even if amount does not change, in this later case mine does not work.
For instance, I modified the dates of last 4 rows:
SQL> WITH MYDATA AS (
2 SELECT 3017479 FILENB, 40818 CHILD, 202001 DATE_START, 204.66 AMOUNT FROM DUAL UNION ALL
3 SELECT 3017479, 40818, 202002, 204.66 FROM DUAL UNION ALL
4 SELECT 3017479, 40818, 202003, 208.75 FROM DUAL UNION ALL
5 SELECT 3017479, 40818, 202004, 123.64 FROM DUAL UNION ALL
6 SELECT 3017479, 40818, 202005, 123.64 FROM DUAL UNION ALL
7 SELECT 3017479, 40818, 202006, 123.64 FROM DUAL UNION ALL
8 SELECT 3017479, 40818, 202007, 174.64 FROM DUAL UNION ALL
9 SELECT 3017479, 40818, 202008, 123.64 FROM DUAL UNION ALL
10 SELECT 3017479, 40818, 202009, 123.64 FROM DUAL UNION ALL
11 SELECT 3017479, 40818, 202010, 123.64 FROM DUAL UNION ALL
12 SELECT 3017479, 40818, 202011, 123.64 FROM DUAL UNION ALL
13 SELECT 3017479, 40818, 202012, 123.64 FROM DUAL UNION ALL
14 SELECT 3017479, 60914, 202001, 210.29 FROM DUAL UNION ALL
15 SELECT 3017479, 60914, 202002, 210.29 FROM DUAL UNION ALL
16 SELECT 3017479, 60914, 202003, 210.29 FROM DUAL UNION ALL
17 SELECT 3017479, 60914, 202004, 163.23 FROM DUAL UNION ALL
18 SELECT 3017479, 60914, 202005, 163.23 FROM DUAL UNION ALL
19 SELECT 3017479, 60914, 202006, 163.23 FROM DUAL UNION ALL
20 SELECT 3017479, 60914, 202007, 184.25 FROM DUAL UNION ALL
21 SELECT 3017479, 60914, 202008, 163.23 FROM DUAL UNION ALL
22 SELECT 3017479, 60914, 202010, 163.23 FROM DUAL UNION ALL
23 SELECT 3017479, 60914, 202012, 163.23 FROM DUAL UNION ALL
24 SELECT 3017479, 60914, 202013, 163.23 FROM DUAL UNION ALL
25 SELECT 3017479, 60914, 202014, 184.25 FROM DUAL
26 ),
27 T AS (
28 SELECT FILENB,
29 CHILD,
30 DATE_START,
31 AMOUNT,
32 DATE_START - DENSE_RANK() OVER(PARTITION BY FILENB,CHILD,AMOUNT ORDER BY DATE_START) GRP
33 FROM MYDATA
34 )
35 SELECT FILENB,
36 CHILD,
37 MIN(DATE_START) DATE_START,
38 MAX(DATE_START) DATE_END,
39 AMOUNT
40 FROM T
41 GROUP BY FILENB,
42 CHILD,
43 AMOUNT,
44 GRP
45 ORDER BY FILENB,
46 CHILD,
47 DATE_START
48 /
FILENB CHILD DATE_START DATE_END AMOUNT
---------- ---------- ---------- ---------- ----------
3017479 40818 202001 202002 204.66
3017479 40818 202003 202003 208.75
3017479 40818 202004 202006 123.64
3017479 40818 202007 202007 174.64
3017479 40818 202008 202012 123.64
3017479 60914 202001 202003 210.29
3017479 60914 202004 202006 163.23
3017479 60914 202007 202007 184.25
3017479 60914 202008 202008 163.23
3017479 60914 202010 202010 163.23
3017479 60914 202012 202013 163.23
3017479 60914 202014 202014 184.25
12 rows selected.
In the lines with 162.23 there are breaks in dates but not in amount.
SQL> WITH MYDATA AS (
2 SELECT 3017479 FILENB, 40818 CHILD, 202001 DATE_START, 204.66 AMOUNT FROM DUAL UNION ALL
3 SELECT 3017479, 40818, 202002, 204.66 FROM DUAL UNION ALL
4 SELECT 3017479, 40818, 202003, 208.75 FROM DUAL UNION ALL
5 SELECT 3017479, 40818, 202004, 123.64 FROM DUAL UNION ALL
6 SELECT 3017479, 40818, 202005, 123.64 FROM DUAL UNION ALL
7 SELECT 3017479, 40818, 202006, 123.64 FROM DUAL UNION ALL
8 SELECT 3017479, 40818, 202007, 174.64 FROM DUAL UNION ALL
9 SELECT 3017479, 40818, 202008, 123.64 FROM DUAL UNION ALL
10 SELECT 3017479, 40818, 202009, 123.64 FROM DUAL UNION ALL
11 SELECT 3017479, 40818, 202010, 123.64 FROM DUAL UNION ALL
12 SELECT 3017479, 40818, 202011, 123.64 FROM DUAL UNION ALL
13 SELECT 3017479, 40818, 202012, 123.64 FROM DUAL UNION ALL
14 SELECT 3017479, 60914, 202001, 210.29 FROM DUAL UNION ALL
15 SELECT 3017479, 60914, 202002, 210.29 FROM DUAL UNION ALL
16 SELECT 3017479, 60914, 202003, 210.29 FROM DUAL UNION ALL
17 SELECT 3017479, 60914, 202004, 163.23 FROM DUAL UNION ALL
18 SELECT 3017479, 60914, 202005, 163.23 FROM DUAL UNION ALL
19 SELECT 3017479, 60914, 202006, 163.23 FROM DUAL UNION ALL
20 SELECT 3017479, 60914, 202007, 184.25 FROM DUAL UNION ALL
21 SELECT 3017479, 60914, 202008, 163.23 FROM DUAL UNION ALL
22 SELECT 3017479, 60914, 202010, 163.23 FROM DUAL UNION ALL
23 SELECT 3017479, 60914, 202012, 163.23 FROM DUAL UNION ALL
24 SELECT 3017479, 60914, 202013, 163.23 FROM DUAL UNION ALL
25 SELECT 3017479, 60914, 202014, 184.25 FROM DUAL
26 ),
27 t as (
28 select filenb, child, date_start, amount,
29 case
30 when lag(amount,1,-1) over (partition by filenb, child order by date_start) != amount
31 then row_number() over (partition by filenb, child order by date_start)
32 end rn
33 from mydata
34 ),
35 grouping as (
36 select filenb, child, date_start, amount,
37 max(rn) over (partition by filenb, child order by date_start) grp
38 from t
39 )
40 select filenb, child, min(date_start) date_start, max(date_start) date_end, amount
41 from grouping
42 group by filenb, child, amount, grp
43 order by filenb, child, date_start
44 /
FILENB CHILD DATE_START DATE_END AMOUNT
---------- ---------- ---------- ---------- ----------
3017479 40818 202001 202002 204.66
3017479 40818 202003 202003 208.75
3017479 40818 202004 202006 123.64
3017479 40818 202007 202007 174.64
3017479 40818 202008 202012 123.64
3017479 60914 202001 202003 210.29
3017479 60914 202004 202006 163.23
3017479 60914 202007 202007 184.25
3017479 60914 202008 202013 163.23
3017479 60914 202014 202014 184.25
10 rows selected.
So this is to you to know which one fits your need.
|
|
|
|
|
Re: Grouping with the previous one if identical [message #687383 is a reply to message #687382] |
Sun, 05 March 2023 21:37   |
 |
mathguy
Messages: 77 Registered: January 2023
|
Member |
|
|
The problem has a trivial solution using MATCH_RECOGNIZE. I added more sample data to test for the following cases:
- Multiple file numbers (added 4000000)
- Child with gap in the data (child 80000 has data only for Jan-Feb and Aug 2020) - even with the same amount, there should be separate rows in the output
- Child (81000) with data in Dec of one year and Jan of the following year: there is no gap in the dates, even though there is gap in their numerical representation.
- Child (83002) with data for exactly one month (to make sure the query works correctly for such input too).
with
mydata (filenb, child, date_start, amount) as (
select 3017479, 40818, 202001, 204.66 from dual union all
select 3017479, 40818, 202002, 204.66 from dual union all
select 3017479, 40818, 202003, 208.75 from dual union all
select 3017479, 40818, 202004, 123.64 from dual union all
select 3017479, 40818, 202005, 123.64 from dual union all
select 3017479, 40818, 202006, 123.64 from dual union all
select 3017479, 40818, 202007, 174.64 from dual union all
select 3017479, 40818, 202008, 123.64 from dual union all
select 3017479, 40818, 202009, 123.64 from dual union all
select 3017479, 40818, 202010, 123.64 from dual union all
select 3017479, 40818, 202011, 123.64 from dual union all
select 3017479, 40818, 202012, 123.64 from dual union all
select 3017479, 60914, 202001, 210.29 from dual union all
select 3017479, 60914, 202002, 210.29 from dual union all
select 3017479, 60914, 202003, 210.29 from dual union all
select 3017479, 60914, 202004, 163.23 from dual union all
select 3017479, 60914, 202005, 163.23 from dual union all
select 3017479, 60914, 202006, 163.23 from dual union all
select 3017479, 60914, 202007, 184.25 from dual union all
select 3017479, 60914, 202008, 163.23 from dual union all
select 3017479, 60914, 202009, 163.23 from dual union all
select 3017479, 60914, 202010, 163.23 from dual union all
select 3017479, 60914, 202011, 163.23 from dual union all
select 3017479, 60914, 202012, 184.25 from dual union all
select 4000000, 80000, 202001, 123.45 from dual union all
select 4000000, 80000, 202002, 123.45 from dual union all
select 4000000, 80000, 202008, 123.45 from dual union all
select 4000000, 81000, 202012, 100.01 from dual union all
select 4000000, 81000, 202101, 100.01 from dual union all
select 4000000, 83002, 202008, 403.92 from dual
)
select filenb, child, date_start, date_end, amount
from mydata
match_recognize(
partition by filenb, child
order by date_start
measures f.date_start as date_start, last(date_start) as date_end,
f.amount as amount
pattern ( f s* )
define s as amount = f.amount
and to_date(to_char(date_start), 'yyyymm') =
add_months(to_date(to_char(prev(date_start)), 'yyyymm'), 1)
);
FILENB CHILD DATE_START DATE_END AMOUNT
---------- ---------- ---------- ---------- ----------
3017479 40818 202001 202002 204.66
3017479 40818 202003 202003 208.75
3017479 40818 202004 202006 123.64
3017479 40818 202007 202007 174.64
3017479 40818 202008 202012 123.64
3017479 60914 202001 202003 210.29
3017479 60914 202004 202006 163.23
3017479 60914 202007 202007 184.25
3017479 60914 202008 202011 163.23
3017479 60914 202012 202012 184.25
4000000 80000 202001 202002 123.45
4000000 80000 202008 202008 123.45
4000000 81000 202012 202101 100.01
4000000 83002 202008 202008 403.92
|
|
|
|
Re: Grouping with the previous one if identical [message #687404 is a reply to message #687389] |
Mon, 06 March 2023 09:22   |
 |
mathguy
Messages: 77 Registered: January 2023
|
Member |
|
|
Michel Cadot wrote on Mon, 06 March 2023 00:52
Quote:
The Tabibitosan solution will fail if the data is not all from a single calendar year. If a "child" has the same amount in 202012 and in 202101, the Tabibitosan method (as written) will find a gap there, because "dates" are stored as integers, and consecutive months like Dec. 2020 and Jan. 2021 are represented by non-consecutive integers.
The solution is then to insert a previous step that numbers the rows.
I don't see how numbering the rows can fix the solution. If you think it is possible, please show us how.
The tabibitosan method can be used here, but it's a bit more delicate than the usual approach. To create the grouping flags, one would need to convert the numbers to dates and use ADD_MONTHS, similar to what I did in the MATCH_RECOGNIZE solution. (Alternatively, one can devise some direct computation on integers that would be equivalent to the date calculation - but besides being tedious and therefore prone to introducing errors, it would obfuscate the real meaning of the computation, making it harder on future maintainers of the code. In any case, it would not be "numbering the rows".)
Something like this:
with
mydata (filenb, child, date_start, amount) as (
....................
)
, prep (filenb, child, date_start, amount, grp) as (
select filenb, child, date_start, amount,
add_months(to_date(to_char(date_start), 'yyyymm'),
- dense_rank() over (partition by filenb, child, amount order by date_start))
from mydata
)
select filenb, child, min(date_start) as date_start, max(date_start) as date_end, amount
from prep
group by filenb, child, amount, grp
order by filenb, child, date_start -- if needed
;
|
|
|
Re: Grouping with the previous one if identical [message #687406 is a reply to message #687389] |
Mon, 06 March 2023 09:53  |
 |
mathguy
Messages: 77 Registered: January 2023
|
Member |
|
|
Michel Cadot wrote on Mon, 06 March 2023 00:52
Quote:
I haven't checked yet, but it is possible that other solutions in this thread suffer from the same defect.
Too bad you didn't. Your post would be more interesting if you did.
I didn't check other solutions, because I didn't know enough about the OP's problem; waiting for the OP to clarify.
On the one hand, it is possible that all the inputs are for one calendar year, always. Perhaps that is what the OP needs: even with raw data for multiple years, he may need to report on one calendar year - filtering by date in a WHERE clause. Then the question is moot.
On the other hand, it is also possible that the OP must report over more than one calendar year, but show different groups when one year ends and another begins, even when the months are consecutive (no gaps) and the amount is the same. Then, in fact, the way Solomon wrote the solution would be correct. However, if this was part of the requirement, I would expect the sample data to include such cases, and it didn't; and/or the OP to say something to that effect, which he did not.
My speculation that other solutions may suffer from the same "defect" was based on what I didn't see in the code (only having glanced quickly at the other answers). If dates may be from different years, and January following December with the same amount should be considered as part of the same group, I would have to see something like the ADD_MONTHS trick I used in MATCH_RECOGNIZE (and in the tabibitosan solution I just showed), or a corresponding, direct arithmetic computation. I saw nothing of the sort, whence my speculation.
I am still not going to look closely at other solutions - I don't see the point. I will wait for the OP to clarify. From a very quick scanning, it seems that so far there are only two solutions (other than what I proposed) - you gave a solution that doesn't start a new group when there is a gap in dates, and the OP already explained that that isn't his requirement, and Solomon's tabibitosan solution. Was there anything else?
|
|
|
Goto Forum:
Current Time: Mon Sep 25 00:48:08 CDT 2023
|