I am trying to build a cumulative sum column using pandas. However, this column must accumulate values only while one column's value is greater than another's. Here's an example of my current data:
Index A B C
0 1 20 3
1 10 15 11
2 20 12 25
3 30 18 32
4 40 32 17
5 50 12 4
Then I want to cumsum() column A if column B is greater than C; if not, the value is zero (as the example shows, the running sum restarts after each run of zeros). The result, column D in the original df, should look like:
Index A B C D
0 1 20 3 1
1 10 15 11 11
2 20 12 25 0
3 30 18 32 0
4 40 32 17 40
5 50 12 4 90
I appreciate any support in advance.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': {0: 1, 1: 10, 2: 20, 3: 30, 4: 40, 5: 50},
                   'B': {0: 20, 1: 15, 2: 12, 3: 18, 4: 32, 5: 12},
                   'C': {0: 3, 1: 11, 2: 25, 3: 32, 4: 17, 5: 4}})
Make a boolean Series for your condition and identify consecutive groups of True or False:
b_gt_c = df.B > df.C
groups = b_gt_c.ne(b_gt_c.shift()).cumsum()
In [107]: b_gt_c
Out[107]:
0 True
1 True
2 False
3 False
4 True
5 True
dtype: bool
In [108]: groups
Out[108]:
0 1
1 1
2 2
3 2
4 3
5 3
dtype: int32
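To see why comparing the Series with its shifted copy labels the runs, here is a minimal sketch on a standalone boolean Series. The names `label_runs`, `flags` and `run_ids` are made up for illustration and are not part of the answer above.

```python
import pandas as pd

def label_runs(flags: pd.Series) -> pd.Series:
    """Give each run of consecutive equal values its own integer label."""
    # True wherever a value differs from the one in the previous row.
    # The first row always counts as a change, because shift() leaves NaN there.
    changed = flags.ne(flags.shift())
    # Every change starts a new run, so the running count of changes is the label.
    return changed.cumsum()

flags = pd.Series([True, True, False, False, True, True])
run_ids = label_runs(flags)
print(run_ids.tolist())  # [1, 1, 2, 2, 3, 3]
```

The labels only need to be distinct per run; their actual values do not matter to groupby.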
Group by those groups; multiply the cumsum of each group by the condition; assign the result to the new df column.
gb = df.groupby(groups)
for k, g in gb:
    # Multiplying by the boolean condition zeroes out the groups where B <= C
    df.loc[g.index, 'D'] = g['A'].cumsum() * b_gt_c[g.index]
In [109]: df
Out[109]:
A B C D
0 1 20 3 1.0
1 10 15 11 11.0
2 20 12 25 0.0
3 30 18 32 0.0
4 40 32 17 40.0
5 50 12 4 90.0
You could skip the for loop as well:
df['G'] = np.where(df.B.gt(df.C), df.A, np.nan)
group = df.B.gt(df.C).ne(df.B.gt(df.C).shift()).cumsum()
df['G'] = df.groupby(group).G.cumsum().fillna(0)
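As a possible variation on the loop-free approach, the sketch below zeroes the non-matching rows with `Series.where` before summing, which should keep the result as integers instead of floats. The helper name `conditional_cumsum` and its parameters are assumptions made for illustration, not something from the answers here.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 10, 20, 30, 40, 50],
                   'B': [20, 15, 12, 18, 32, 12],
                   'C': [3, 11, 25, 32, 17, 4]})

def conditional_cumsum(frame, value='A', left='B', right='C'):
    """Cumulative sum of `value`, restarted per run, zero where left <= right."""
    mask = frame[left] > frame[right]
    # Label each run of consecutive True or False values in the mask.
    run_ids = mask.ne(mask.shift()).cumsum()
    # Replace the rows that fail the condition with 0, then sum within each run.
    return frame[value].where(mask, 0).groupby(run_ids).cumsum()

df['D'] = conditional_cumsum(df)
print(df['D'].tolist())  # [1, 11, 0, 0, 40, 90]
```

Because the False runs contain only zeros, their cumulative sum stays at zero and no fillna step is needed.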
The technique for identifying consecutive occurrences of values comes from the SO Q&A "Grouping dataframe based on consecutive occurrence of values".