Pandas Apply lambda function null values

flyingmeatball

I'm trying to split a column in two, but I know there are null values in my data. Imagine this dataframe:

df = pd.DataFrame(['fruit: apple','vegetable: asparagus',None, 'fruit: pear'], columns = ['text'])

df

                   text
0          fruit: apple
1  vegetable: asparagus
2                   None
3           fruit: pear

I'd like to split this into multiple columns like so:

df['cat'] = df['text'].apply(lambda x: 'unknown' if x is None else x.split(': ')[0])
df['value'] = df['text'].apply(lambda x: 'unknown' if x is None else x.split(': ')[1])

print(df)

                   text        cat      value
0          fruit: apple      fruit      apple
1  vegetable: asparagus  vegetable  asparagus
2                  None    unknown    unknown
3           fruit: pear      fruit       pear

However, if I have the following df instead:

df = pd.DataFrame(['fruit: apple','vegetable: asparagus',np.nan, 'fruit: pear'], columns = ['text'])

splitting results in the following error:

df['cat'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[0])

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-159-8e5bca809635> in <module>()
      1 df = pd.DataFrame(['fruit: apple','vegetable: asparagus',np.nan, 'fruit: pear'], columns = ['text'])
      2 #df.columns = ['col_name']
----> 3 df['cat'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[0])
      4 df['value'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[1])

C:\Python27\lib\site-packages\pandas\core\series.pyc in apply(self, func, convert_dtype, args, **kwds)
   2158             values = lib.map_infer(values, lib.Timestamp)
   2159 
-> 2160         mapped = lib.map_infer(values, f, convert=convert_dtype)
   2161         if len(mapped) and isinstance(mapped[0], Series):
   2162             from pandas.core.frame import DataFrame

pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:62187)()

<ipython-input-159-8e5bca809635> in <lambda>(x)
      1 df = pd.DataFrame(['fruit: apple','vegetable: asparagus',np.nan, 'fruit: pear'], columns = ['text'])
      2 #df.columns = ['col_name']
----> 3 df['cat'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[0])
      4 df['value'] = df['text'].apply(lambda x: 'unknown' if x == np.nan else x.split(': ')[1])

AttributeError: 'float' object has no attribute 'split'

How do I do the same split when the nulls are NaN values? More generally, is there a better way to apply a split function that ignores null values?

Now imagine a non-string example, such as the following:

df = pd.DataFrame([2,4,6,8,10,np.nan,12], columns = ['numerics'])
df['numerics'].apply(lambda x: np.nan if pd.isnull(x) else x/2.0)

I feel like Series.apply should take an argument that tells it to skip null rows and just output them as nulls. I haven't found a generic way to transform a series without manually checking for nulls.

unutbu

Instead of apply with a custom function, you could use the Series.str.extract method:

import numpy as np
import pandas as pd
# df = pd.DataFrame(['fruit: apple','vegetable: asparagus',None, 'fruit: pear'], 
#                   columns = ['text'])
df = pd.DataFrame(['fruit: apple','vegetable: asparagus',np.nan, 'fruit: pear'], 
                  columns = ['text'])
# \s* consumes the whitespace after the colon, so 'value' has no leading space
df[['cat', 'value']] = df['text'].str.extract(r'([^:]+):?\s*(.*)', expand=True).fillna('unknown')
print(df)

yields

                   text        cat      value
0          fruit: apple      fruit      apple
1  vegetable: asparagus  vegetable  asparagus
2                   NaN    unknown    unknown
3           fruit: pear      fruit       pear
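
The same split can probably also be done without a regex. The following is a minimal sketch using Series.str.split, assuming every non-null entry contains the ': ' separator; the names parts and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['fruit: apple', 'vegetable: asparagus', np.nan, 'fruit: pear'],
                  columns=['text'])

# n=1 splits on the first ': ' only; expand=True returns one column per piece.
# Rows that are NaN stay NaN in every output column.
parts = df['text'].str.split(': ', n=1, expand=True)
parts.columns = ['cat', 'value']

# Replace the missing pieces with the placeholder and attach them to df.
result = df.join(parts.fillna('unknown'))
print(result)
```

Like str.extract, the .str accessor methods pass null values through untouched, so no explicit null check is needed.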

apply with a custom function is generally slower than equivalent code which makes use of vectorized methods such as Series.str.extract. Under the hood, apply (with an unvectorizable function) essentially calls the custom function in a Python for-loop.
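
If you do want to stay with apply, the reason the original lambda fails is that NaN never compares equal to anything, itself included, so x == np.nan is always False and the float falls through to x.split. A rough sketch of the fix, using pd.isnull for the check (split_part is a hypothetical helper name, not part of pandas):

```python
import numpy as np
import pandas as pd

# NaN is a float that is not equal to itself, so an == test can never detect it.
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)

df = pd.DataFrame(['fruit: apple', 'vegetable: asparagus', np.nan, 'fruit: pear'],
                  columns=['text'])

def split_part(x, i):
    """Hypothetical helper: i-th piece of 'a: b', or 'unknown' for None/NaN."""
    return 'unknown' if pd.isnull(x) else x.split(': ')[i]

# pd.isnull is True for both None and NaN, so one check covers both dataframes.
cat = df['text'].apply(lambda x: split_part(x, 0))
value = df['text'].apply(lambda x: split_part(x, 1))
print(cat.tolist(), value.tolist())
```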


Regarding the edited question: If you have

df = pd.DataFrame([2,4,6,8,10,np.nan,12], columns = ['numerics'])

then use

In [207]: df['numerics']/2
Out[207]: 
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    NaN
6    6.0
Name: numerics, dtype: float64

instead of

df['numerics'].apply(lambda x: np.nan if pd.isnull(x) else x/2.0)

Again, vectorized arithmetic beats apply with a custom function:

In [210]: df = pd.concat([df]*100, ignore_index=True)

In [211]: %timeit df['numerics']/2
10000 loops, best of 3: 93.8 µs per loop

In [212]: %timeit df['numerics'].apply(lambda x: np.nan if pd.isnull(x) else x/2.0)
1000 loops, best of 3: 836 µs per loop
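
As for an argument that skips nulls: when no vectorized method fits, Series.map accepts na_action='ignore', which leaves null entries as they are instead of passing them to the function. A small sketch, assuming an element-wise function is really needed (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 6, 8, 10, np.nan, 12], name='numerics')

# The lambda is never called for the NaN entry; it is propagated as null.
halved = s.map(lambda x: x / 2.0, na_action='ignore')

text = pd.Series(['fruit: apple', None, np.nan, 'fruit: pear'])

# Works for strings too: None and NaN are both skipped, so x.split is safe.
cats = text.map(lambda x: x.split(': ')[0], na_action='ignore')

print(halved)
print(cats)
```

This is still an element-wise Python loop, so it is likely to be slower than vectorized arithmetic or the .str methods where those apply.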
