Pandas function for generating series from dataframe

futuraprime

I'd like to generate a Series by iterating over a DataFrame row by row, using the values from two columns. Something like this:

race_cats = ('White', 'Black', 'Hispanic', 'Other')
def raceParse(row):
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'
df['race4'] = df.map(lambda r: raceParse(r)).astype('category', ordered=False, categories=race_cats)

This doesn't work, obviously, as DataFrame doesn't have a map method. apply and applymap work element-wise, not row-wise. What's the best way to do this?

Alexander

You can achieve your desired results using loc as follows:

# Sample data.
import pandas as pd

df = pd.DataFrame({'hispan': ['Not Hispanic', 'Not Hispanic', 'Hispanic'], 'race': ['White', 'Black', 'Other']})

>>> df
         hispan   race
0  Not Hispanic  White
1  Not Hispanic  Black
2      Hispanic  Other

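# Start from the fallback value, then overwrite in order of increasing priority.
df['race4'] = 'Other'
df.loc[df.race == 'Black', 'race4'] = 'Black'
df.loc[df.race == 'White', 'race4'] = 'White'
# Assigned last, so it overrides the race-based values set above.
df.loc[~df.hispan.isin(['Not Hispanic']), 'race4'] = 'Hispanic'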

>>> df
         hispan   race     race4
0  Not Hispanic  White     White
1  Not Hispanic  Black     Black
2      Hispanic  Other  Hispanic

You can then convert the column to a categorical, if that is what you want:

df['race4'] = pd.Categorical(df.race4, categories=['White', 'Black', 'Hispanic', 'Other'])

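Note that the order of the loc assignments matters, because each later assignment overwrites the earlier ones. Putting the Hispanic test last gives it precedence over the race column, which reproduces the priority of your if row.hispan != 'Not Hispanic': ... elif chain. Also be aware that the sample data uses 'Black' where your function tests for 'Black/Negro', so adjust the comparison to match your actual data.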
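The same priority logic can also be written with first-match-wins semantics, which mirrors the if/elif chain directly. The following is a minimal sketch that swaps in numpy.select (not part of the answer above) and reuses the sample data. Conditions are evaluated in list order, so the Hispanic test goes first here rather than last:

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'hispan': ['Not Hispanic', 'Not Hispanic', 'Hispanic'],
    'race': ['White', 'Black', 'Other'],
})

# For each row, np.select uses the choice of the first condition that is True,
# so the list order plays the role of the if/elif order.
conditions = [
    df['hispan'] != 'Not Hispanic',
    df['race'] == 'White',
    df['race'] == 'Black',
]
choices = ['Hispanic', 'White', 'Black']

# Rows matching none of the conditions fall back to the default.
df['race4'] = np.select(conditions, choices, default='Other')
```

With loc the highest-priority rule is written last. With numpy.select it is written first. Either way the result for the sample data is the same.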

I believe the answer above is what you want. For the row-by-row iteration you asked about, there is an iterrows method, which yields (index, row) pairs:

def race_parse(row):
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'

df['race4'] = [race_parse(row) for _, row in df.iterrows()]
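For completeness, DataFrame.apply also accepts axis=1, in which case the function receives each row as a Series, so a row-wise function like race_parse can be passed to it directly. Below is a minimal sketch. The sample values are made up for illustration, and the category list is taken from the question:

```python
import pandas as pd

# Illustrative data only; the values are assumptions, not from a real dataset.
df = pd.DataFrame({
    'hispan': ['Not Hispanic', 'Not Hispanic', 'Hispanic', 'Not Hispanic'],
    'race': ['White', 'Black/Negro', 'White', 'Other'],
})

race_cats = ['White', 'Black', 'Hispanic', 'Other']

def race_parse(row):
    # row is a Series holding one row; columns are available as attributes.
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'

# axis=1 calls race_parse once per row and collects the results in a Series.
race4 = df.apply(race_parse, axis=1)

# Convert to an unordered categorical with the desired categories.
df['race4'] = pd.Categorical(race4, categories=race_cats, ordered=False)
```

Like iterrows, this still calls a Python function once per row, so on large frames the vectorized loc approach is likely to be faster.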

Edited at 2021-02-28
