How to create a cumulative sum column in Python if one column's value is greater than another's

joseph

I am trying to build a cumulative sum column using pandas. However, the column must accumulate only while the value in one column is greater than the value in another. Here's an example of my current data:

Index     A       B       C
0         1       20      3
1         10      15      11
2         20      12      25
3         30      18      32
4         40      32      17
5         50      12      4

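I want to cumsum() column A while column B is greater than C; otherwise the value should be zero, and the running total should restart at the next row where B is greater than C. The result, column D in the original df, should look like: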

Index     A       B       C      D
0         1       20      3      1
1         10      15      11     11
2         20      12      25     0
3         30      18      32     0
4         40      32      17     40
5         50      12      4      90

I appreciate any support in advance.

wwii

import pandas as pd

df = pd.DataFrame({'A': {0: 1, 1: 10, 2: 20, 3: 30, 4: 40, 5: 50},
                   'B': {0: 20, 1: 15, 2: 12, 3: 18, 4: 32, 5: 12},
                   'C': {0: 3, 1: 11, 2: 25, 3: 32, 4: 17, 5: 4}})

Make a boolean Series for your condition and identify consecutive groups of True or False:

b_gt_c = df.B > df.C
groups = b_gt_c.ne(b_gt_c.shift()).cumsum()

In [107]: b_gt_c
Out[107]: 
0     True
1     True
2    False
3    False
4     True
5     True
dtype: bool

In [108]: groups
Out[108]: 
0    1
1    1
2    2
3    2
4    3
5    3
dtype: int32
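
To see why the ne/shift/cumsum combination labels consecutive runs, here is a small standalone sketch of the same idiom. The names `flags`, `changed` and `run_id` are only illustrative and are not part of the answer above.

```python
import pandas as pd

flags = pd.Series([True, True, False, False, True, True])

# True wherever the value differs from the previous row. The first row is
# compared against the missing value introduced by shift(), so it is True.
changed = flags.ne(flags.shift())

# Every True marks the start of a new run, so the running total of
# `changed` gives each run its own label.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 1, 2, 2, 3, 3]
```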

Group by those groups; multiply the cumsum of each group by the condition; assign the result to the new df column.

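# Within each run, take the running total of A; multiplying by the
# boolean condition zeroes out the runs where B <= C.
gb = df.groupby(groups)
for k, g in gb:
    df.loc[g.index, 'D'] = g['A'].cumsum() * b_gt_c[g.index]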

In [109]: df
Out[109]: 
    A   B   C     D
0   1  20   3   1.0
1  10  15  11  11.0
2  20  12  25   0.0
3  30  18  32   0.0
4  40  32  17  40.0
5  50  12   4  90.0

You could skip the for loop as well:

import numpy as np
mask = df.B.gt(df.C)
df['G'] = np.where(mask, df.A, np.nan)
group = mask.ne(mask.shift()).cumsum()
df['G'] = df.groupby(group).G.cumsum().fillna(0)
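
If an integer result is preferred, the same idea could be wrapped in a small helper. This is only a sketch: `conditional_cumsum` is a hypothetical name, not a pandas function, and it assumes both Series share the same index. It uses `Series.where` to zero out the rows that fail the condition, so no NaN is introduced and the dtype stays integer.

```python
import pandas as pd


def conditional_cumsum(values: pd.Series, condition: pd.Series) -> pd.Series:
    """Running total of `values` that restarts with each run of True in `condition`.

    Hypothetical helper for illustration; rows where `condition` is False get 0.
    """
    # Label consecutive runs of equal condition values: 1, 1, 2, 2, 3, 3, ...
    run_id = condition.ne(condition.shift()).cumsum()
    # Replace values with 0 where the condition fails, then accumulate per run.
    return values.where(condition, 0).groupby(run_id).cumsum()


df = pd.DataFrame({'A': [1, 10, 20, 30, 40, 50],
                   'B': [20, 15, 12, 18, 32, 12],
                   'C': [3, 11, 25, 32, 17, 4]})
df['D'] = conditional_cumsum(df['A'], df['B'] > df['C'])
print(df['D'].tolist())  # [1, 11, 0, 0, 40, 90]
```

Because a False run sits between any two True runs, grouping by the run label is what makes the total restart at 40 in row 4 instead of continuing from 11.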

The technique for identifying consecutive occurrences of values comes from the SO Q&A: Grouping dataframe based on consecutive occurrence of values
