I have a DataFrame, table, that looks like this:
year name prop sex soundex
1880 John 0.081541 boy J500
1880 William 0.080511 boy W450
....
2008 Elianna 0.000127 girl E450
I'm trying to group table by 'year' and access selected indices from the column 'name' for each group.
My code is as follows (pretend that special_indices is already defined):
def get_indices_func(x):
    name = [x['name'].iloc[y] for y in special_indices]
    return pd.Series(name)

table.groupby(by='year').apply(get_indices_func)
I got the following error:
/Users/***/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
722 """
723 try:
--> 724 return self._engine.get_value(series, key)
725 except KeyError, e1:
726 if len(self) > 0 and self.inferred_type == 'integer':
KeyError: 1000
What am I doing wrong? I think I'm not really understanding how apply (and its cousins, aggregate and agg) works. If someone could explain, I'd be ever so grateful!
An alternative solution:
df.groupby('year').apply(lambda x: x.sort_values('prop', ascending=False).iloc[0]['name'])
What is happening here?
First, as in Woody's answer, we group by the correct column. apply() then passes each group's data to the function you give it (here, the lambda). To make that easier to follow, I could instead have written:
def takeAGroupAndGiveBackMax(group):
    # year-level data: first sort it by prop, descending
    ordered = group.sort_values('prop', ascending=False)
    # now return the value of 'name' for the first entry
    return ordered.iloc[0]['name']

# The result is indexed by whatever you grouped on (here: year) and holds the values your function returns.
df.groupby('year').apply(takeAGroupAndGiveBackMax)
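If you want to try this without the full data set, here is a minimal sketch on a tiny made-up frame. The sample rows and the helper name top_name are my own assumptions, not part of the original data. I also select the columns explicitly before calling apply(), which is just one way to keep the grouping column out of what the function receives.

```python
import pandas as pd

# Made-up sample rows in the same shape as the question's table (assumption).
df = pd.DataFrame({
    "year": [1880, 1880, 1881, 1881],
    "name": ["John", "William", "Mary", "Anna"],
    "prop": [0.08, 0.07, 0.05, 0.06],
})

def top_name(group):
    # group is a DataFrame holding only the rows of one year
    ordered = group.sort_values("prop", ascending=False)
    return ordered.iloc[0]["name"]

# The function returns one scalar per group, so the result is a Series
# indexed by year.
result = df.groupby("year")[["name", "prop"]].apply(top_name)
print(result)
```

Since each call returns a single value, apply() stitches the values together into one Series with the group keys as its index.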
To understand this, play around with the function. Try returning multiple columns or multiple rows and see what apply() hands back. It is a really powerful tool that pandas gives you here.
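As a starting point for that experiment, here is a rough sketch of two variations: one function returns a Series (several values per group) and the other returns a DataFrame (several rows per group). The sample data and the names summary and top_two are made up for illustration.

```python
import pandas as pd

# Made-up sample rows, three per year (assumption, not the real data).
df = pd.DataFrame({
    "year": [1880, 1880, 1880, 1881, 1881, 1881],
    "name": ["John", "William", "James", "Anna", "Mary", "Emma"],
    "prop": [0.08, 0.07, 0.05, 0.03, 0.06, 0.02],
})
grouped = df.groupby("year")[["name", "prop"]]

def summary(group):
    # Return several labelled values for one group.
    top = group.sort_values("prop", ascending=False).iloc[0]
    return pd.Series({"top_name": top["name"], "top_prop": top["prop"]})

def top_two(group):
    # Return several rows for one group.
    return group.sort_values("prop", ascending=False).head(2)

# A Series per group becomes one row per year, with the Series labels as columns.
per_year = grouped.apply(summary)

# A DataFrame per group is stacked, with year added as an outer index level.
rows = grouped.apply(top_two)

print(per_year)
print(rows)
```

Comparing the two printed results with the scalar case shows how the shape of what you return decides the shape of what apply() gives back.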