Pandas: filter dataframe by column of sets using multiple conditions regarding substrings

SJPJack Published at Dev

SJPJack

I want to filter the following dataframe so that it returns only the rows whose set contains both a pie item and a non-pie item:

ID	set
1	apple pie, banana loaf
2	banana pie, apple pie
3	banana loaf, apple tart

Thus, the expected output would be:

ID	set
1	apple pie, banana loaf

Note that every set in the set column contains exactly two items.

What I have tried so far:

df[(any("pie" in s for s in df['set'])) & (any("pie" not in s for s in df['set']))]

I suspect I am breaking some Pandas dataframe filtering convention, but I am not sure what exactly.

Any help appreciated!

Marcelo Paco

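You could use apply on the set column and keep the rows where exactly one item contains "pie". Since every set has exactly two items, exactly one pie item means the other one is a non-pie item:

df[df['set'].apply(lambda x: len([s for s in x if "pie" in s]) == 1)]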

Results:

   ID                       set
0   1  [apple pie, banana loaf]
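
The attempt in the question does not work because any(...) iterates over the whole column and collapses to a single Python bool, so df[...] receives one True/False value instead of a row-wise boolean mask. Below is a minimal sketch of a row-wise version, assuming each cell of the set column holds a list (or set) of strings. The helper name has_pie_and_non_pie is hypothetical, not part of pandas. Unlike the length check above, it does not rely on every set having exactly two items.

```python
import pandas as pd

# Sample data from the question; each cell is assumed to hold a list of strings.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "set": [
        ["apple pie", "banana loaf"],
        ["banana pie", "apple pie"],
        ["banana loaf", "apple tart"],
    ],
})


def has_pie_and_non_pie(items):
    """Return True if at least one item contains 'pie' and at least one does not."""
    return any("pie" in s for s in items) and any("pie" not in s for s in items)


# apply() calls the helper once per row, which yields a boolean Series (the mask).
mask = df["set"].apply(has_pie_and_non_pie)
result = df[mask]
print(result)
```

Here mask is a Series with one boolean per row, which is what df[...] expects for filtering.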

Edited at 2023-08-26
