I'd like to generate a Series by iterating over a DataFrame row by row, using the values from two columns. Something like this:
race_cats = ('White', 'Black', 'Hispanic', 'Other')
def raceParse(row):
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'
df['race4'] = df.map(lambda r: raceParse(r)).astype('category', ordered=False, categories=race_cats)
This doesn't work, obviously, as DataFrame doesn't have a map method. apply and applymap work element-wise, not row-wise. What's the best way to do this?
You can achieve your desired results using loc as follows:
import pandas as pd

# Sample data.
df = pd.DataFrame({'hispan': ['Not Hispanic', 'Not Hispanic', 'Hispanic'], 'race': ['White', 'Black', 'Other']})
>>> df
         hispan   race
0  Not Hispanic  White
1  Not Hispanic  Black
2      Hispanic  Other
df['race4'] = 'Other'
df.loc[df.race == 'Black', 'race4'] = 'Black'
df.loc[df.race == 'White', 'race4'] = 'White'
df.loc[~df.hispan.isin(['Not Hispanic']), 'race4'] = 'Hispanic'
>>> df
         hispan   race     race4
0  Not Hispanic  White     White
1  Not Hispanic  Black     Black
2      Hispanic  Other  Hispanic
You can then convert the column to a categorical if that is what you want:
df['race4'] = pd.Categorical(df.race4, categories=['White', 'Black', 'Hispanic', 'Other'])
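The question passes categories and ordered as keyword arguments to astype('category', ...). As far as I know, recent pandas releases no longer accept that form, so here is a minimal sketch of the same conversion through a dtype object instead. The names race_dtype and s are made up for illustration.

```python
import pandas as pd

race_cats = ["White", "Black", "Hispanic", "Other"]

# Describe the categorical type once, then reuse it wherever it is needed.
race_dtype = pd.CategoricalDtype(categories=race_cats, ordered=False)

# Illustrative values only; 'Other' stays a valid category even though
# no row uses it.
s = pd.Series(["White", "Black", "Hispanic"]).astype(race_dtype)
```

This should give the same result as the pd.Categorical call above; which one to use is mostly a matter of taste.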
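Note that the order of the loc assignments is important, because each assignment overwrites whatever an earlier one wrote to the same rows. The last assignment therefore wins, which makes the sequence equivalent to your if row.hispan != 'Not Hispanic': ... elif structure: by running the test for Hispanic last, it takes precedence over the race column.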
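If the reversed ordering of the loc assignments feels error-prone, one alternative (not part of the answer above, just a sketch) is numpy.select, which takes the conditions in the same top-to-bottom priority as the if/elif chain and uses the first one that is true for each row. The sample frame below repeats the one from this answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "hispan": ["Not Hispanic", "Not Hispanic", "Hispanic"],
    "race": ["White", "Black", "Other"],
})

# Conditions are listed from highest to lowest priority; for each row
# np.select picks the choice matching the first condition that is True.
conditions = [
    df["hispan"] != "Not Hispanic",
    df["race"] == "White",
    df["race"] == "Black",
]
choices = ["Hispanic", "White", "Black"]

# Rows matching no condition fall back to the default.
df["race4"] = np.select(conditions, choices, default="Other")
```

Like the loc version, this stays vectorised, so it should scale better than calling a Python function once per row.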
I believe the reply above is what you want. In terms of what you asked for, there is an iterrows method:
def race_parse(row):
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'
df['race4'] = [race_parse(row) for _, row in df.iterrows()]
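For what it's worth, DataFrame.apply can also work row-wise: with axis=1 it passes each row to the function as a Series, so the same race_parse can be reused without an explicit loop. A small sketch follows; the sample rows are made up for illustration.

```python
import pandas as pd

def race_parse(row):
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'

# Hypothetical sample rows, not taken from the asker's data.
df = pd.DataFrame({
    'hispan': ['Not Hispanic', 'Not Hispanic', 'Mexican', 'Not Hispanic'],
    'race': ['White', 'Black/Negro', 'White', 'Chinese'],
})

# axis=1 means "call the function once per row".
df['race4'] = df.apply(race_parse, axis=1)
```

Both this and iterrows call Python code once per row, so on a large frame expect them to be noticeably slower than a vectorised approach.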