Pandas function for generating series from dataframe

futuraprime

I'd like to generate a Series by iterating over a DataFrame row by row, using the values from two columns. Something like this:

race_cats = ('White', 'Black', 'Hispanic', 'Other')
def raceParse(row):
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'
df['race4'] = df.map(lambda r: raceParse(r)).astype('category', ordered=False, categories=race_cats)

This doesn't work, obviously, as DataFrame doesn't have a map method. apply and applymap work element-wise, not row-wise. What's the best way to do this?

Alexander

You can achieve your desired results using loc as follows:

import pandas as pd

# Sample data.
df = pd.DataFrame({'hispan': ['Not Hispanic', 'Not Hispanic', 'Hispanic'], 'race': ['White', 'Black', 'Other']})

>>> df
         hispan   race
0  Not Hispanic  White
1  Not Hispanic  Black
2      Hispanic  Other

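# Start with the fallback value, then overwrite it in increasing order of priority.
df['race4'] = 'Other'
df.loc[df.race == 'Black', 'race4'] = 'Black'
df.loc[df.race == 'White', 'race4'] = 'White'
# Assigned last so that it overrides the race-based values above.
df.loc[~df.hispan.isin(['Not Hispanic']), 'race4'] = 'Hispanic'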

>>> df
         hispan   race     race4
0  Not Hispanic  White     White
1  Not Hispanic  Black     Black
2      Hispanic  Other  Hispanic

You can then convert the column to a categorical, if that is what you want:

df['race4'] = pd.Categorical(df.race4, categories=['White', 'Black', 'Hispanic', 'Other'])

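Note that the order of the loc assignments matters. Each later assignment overwrites the earlier ones for the rows it matches, so the highest-priority condition has to come last. That makes the result equivalent to your if row.hispan != 'Not Hispanic': ... elif chain: because the test for Hispanic is assigned last, it takes precedence over the values derived from the race column.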
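If you would rather state the conditions in the same top-down priority order as the if/elif chain, numpy.select is one option: it picks the choice for the first condition that matches and falls back to default otherwise. The following is only a sketch using the sample data from this answer (so the race label is 'Black', not 'Black/Negro'); the names conditions and choices are just illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    'hispan': ['Not Hispanic', 'Not Hispanic', 'Hispanic'],
    'race': ['White', 'Black', 'Other'],
})

# Conditions are listed in priority order: the first one that matches wins,
# which mirrors the if / elif structure in the question.
conditions = [
    df['hispan'] != 'Not Hispanic',
    df['race'] == 'White',
    df['race'] == 'Black',
]
choices = ['Hispanic', 'White', 'Black']

# Rows that match none of the conditions get the default value.
df['race4'] = np.select(conditions, choices, default='Other')
print(df)
```

Unlike the loc version, the order here reads the same way as the original function, which may be easier to maintain if more categories are added later.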

I believe the answer above is what you want. If you do want to iterate row by row, as you asked, DataFrame has an iterrows method that yields (index, row) pairs:

def race_parse(row):
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'

df['race4'] = [race_parse(row) for _, row in df.iterrows()]
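Another way to call the same function on every row is DataFrame.apply with axis=1, which passes each row to the function as a Series and returns a Series aligned with the original index. The sketch below also converts the result with a CategoricalDtype, since recent pandas versions do not accept categories= as an argument to astype('category', ...). The sample rows are made up for illustration and use the 'Black/Negro' label from the question; race_dtype is just an illustrative name.

```python
import pandas as pd

# Made-up sample rows, using the labels from the question.
df = pd.DataFrame({
    'hispan': ['Not Hispanic', 'Not Hispanic', 'Hispanic', 'Not Hispanic'],
    'race': ['White', 'Black/Negro', 'Other', 'Chinese'],
})

def race_parse(row):
    # The Hispanic test comes first, so it takes precedence over race.
    if row.hispan != 'Not Hispanic':
        return 'Hispanic'
    elif row.race == 'White':
        return 'White'
    elif row.race == 'Black/Negro':
        return 'Black'
    else:
        return 'Other'

# Unordered categorical dtype with an explicit category list.
race_dtype = pd.CategoricalDtype(
    categories=['White', 'Black', 'Hispanic', 'Other'], ordered=False
)

# axis=1 applies the function to each row rather than to each column.
df['race4'] = df.apply(race_parse, axis=1).astype(race_dtype)
print(df)
```

Both apply(axis=1) and iterrows run a Python function once per row, so on large frames they will generally be slower than the vectorized loc approach in the first answer.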

