Using 'apply' in Pandas (externally defined function)

monkeybiz7 Published at Dev

monkeybiz7

I have a DataFrame, table, that looks like this:

year name     prop     sex  soundex
1880 John     0.081541 boy  J500
1880 William  0.080511 boy  W450
....
2008 Elianna  0.000127 girl E450

I'm trying to group table by 'year' and, for each group, pick out the entries of the 'name' column at certain positions.

My code is as follows (pretend that special_indices is already defined):

def get_indices_func(x):
    name = [x['name'].iloc[y] for y in special_indices]
    return pd.Series(name)


table.groupby(by='year').apply(get_indices_func)

I got the following error:

/Users/***/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
    722         """
    723         try:
--> 724             return self._engine.get_value(series, key)
    725         except KeyError, e1:
    726             if len(self) > 0 and self.inferred_type == 'integer':

KeyError: 1000

What am I doing wrong? I think I'm not really understanding how apply (and its cousins, aggregate and agg) works. If someone could explain, I'd be ever so grateful!

FooBar

An alternative solution:

df.groupby('year').apply(lambda x: x.sort_values('prop', ascending=False).iloc[0]['name'])

What is happening here?

First, as in Woody's answer, we group by the correct column. apply() then passes each group's rows (here, one year's worth of data) to the function as a DataFrame. For clarity, I could have written the lambda as a named function instead:

def takeAGroupAndGiveBackMax(group):
    # year-level data: first sort it by prop, descending
    # (sort_values returns a sorted copy, so the group itself is not modified)
    group = group.sort_values('prop', ascending=False)
    # now return the value of 'name' in the first entry
    return group.iloc[0]['name']

# the following gives you a result indexed by whatever you grouped by (here: year);
# a scalar return value yields a Series, while returning a Series yields one column per entry.
df.groupby('year').apply(takeAGroupAndGiveBackMax)
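
To see a function like this run end to end, here is a minimal sketch on a made-up frame. The rows are invented for illustration and only mirror the question's column layout, and top_name is a hypothetical name. It uses sort_values, since DataFrame.sort no longer exists in current pandas, and it selects the columns before apply() so the result should not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Made-up rows for illustration; only the column layout follows the question.
df = pd.DataFrame({
    'year': [1880, 1880, 1881, 1881],
    'name': ['John', 'William', 'Mary', 'Anna'],
    'prop': [0.08, 0.07, 0.03, 0.05],
})

def top_name(group):
    # Sort one year's rows by prop, descending, without mutating the group.
    ordered = group.sort_values('prop', ascending=False)
    # Return the name in the first (largest prop) row.
    return ordered.iloc[0]['name']

# Each group arrives in top_name as a DataFrame with columns 'name' and 'prop'.
result = df.groupby('year')[['name', 'prop']].apply(top_name)
print(result)
```

Because top_name returns a single value per group, result is a Series indexed by year.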

To understand this, experiment with the function: try returning multiple columns or multiple rows, and look at what apply() hands back in each case. It is a really powerful tool that pandas gives you here.
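
As a starting point for that experimenting, the sketch below returns a scalar, a Series, and a DataFrame from the applied function and keeps the three results side by side. The data is made up, and the names as_scalar, as_series and as_frame are only illustrative.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    'year': [1880, 1880, 1880, 1881, 1881],
    'name': ['John', 'William', 'James', 'Mary', 'Anna'],
    'prop': [0.08, 0.07, 0.06, 0.03, 0.05],
})
groups = df.groupby('year')[['name', 'prop']]

# Scalar per group: the result is a Series indexed by year.
as_scalar = groups.apply(lambda g: g['prop'].max())

# Series per group (same labels each time): the result is a DataFrame
# with one row per year and one column per label in the returned Series.
as_series = groups.apply(
    lambda g: pd.Series({'top': g['prop'].max(), 'n': len(g)})
)

# DataFrame per group: the pieces are stacked, and the index becomes a
# two-level MultiIndex of (year, original row label).
as_frame = groups.apply(lambda g: g.nlargest(1, 'prop'))

for label, out in [('scalar', as_scalar), ('series', as_series), ('frame', as_frame)]:
    print(label, type(out).__name__, out.shape)
```

Printing each result in full, rather than just its shape, makes the different index layouts easier to see.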


Edited at 2021-02-9
