Starting with this dataframe df:
0 1 2
02 en it None
03 en None None
01 nl en fil
There are some missing values. I'm trying to apply a replace function row-wise, e.g. in pseudocode:
def replace(row):
    if 'fil' in row and 'nl' in row:
        x = ''  # where x is the 'nl' entry of that row
I know that I can do something like:
df.apply(f, axis=1)
with a function f defined like:
def f(x):
    if x[0] == 'nl' and x[2] == 'fil':
        x[0] = ''
    return x
obtaining:
0 1 2
02 en it None
03 en None None
01 en fil
but a priori I don't know the actual positions of the strings across the columns, so I have to search with something like the isin method, but row-wise.
EDIT: every string can appear anywhere throughout the columns.
You could create a boolean index based on string comparisons like this:
df['0'].str.contains('nl') & df['2'].str.contains('fil')
or, since you updated the question to say the strings can appear in any column:
df.isin(['fil']).any(axis=1) & df.isin(['nl']).any(axis=1)
Here is the test case:
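import pandas as pd
from io import StringIO  # Python 3 replacement for cStringIO

text_file = '''
0 1 2
02 en it None
03 en None None
01 nl en fil
'''
# Read in the whitespace-delimited data. keep_default_na=False keeps 'None'
# as a literal string, as in the output below (some pandas versions would
# otherwise parse it as NaN).
df = pd.read_table(StringIO(text_file), sep=r'\s+', keep_default_na=False)
print('Original Data:')
print(df)
print()
# Create a boolean index: True for rows containing both 'nl' and 'fil'
boolIndx = df.isin(['nl']).any(axis=1) & df.isin(['fil']).any(axis=1)
print('Example Boolean index:')
print(boolIndx)
print()
# Replace 'nl' with '' only in the rows selected by the boolean index
df.loc[boolIndx] = df.loc[boolIndx].replace('nl', '')
print('Filtered Data:')
print(df)
print()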
Original Data:
0 1 2
2 en it None
3 en None None
1 nl en fil
Example Boolean index:
2 False
3 False
1 True
dtype: bool
Filtered Data:
0 1 2
2 en it None
3 en None None
1 en fil
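If you need this for more than one pair of strings, the same idea can be wrapped in a small function. This is only a sketch: the name blank_if_all_present and its parameters are made up for illustration, and it assumes exact cell matches (not substrings) and builds the frame by hand with real None values instead of reading it from text.

```python
import pandas as pd

# Same data as the test case, built directly (None marks a missing value).
df = pd.DataFrame(
    [["en", "it", None], ["en", None, None], ["nl", "en", "fil"]],
    index=["02", "03", "01"],
)


def blank_if_all_present(frame, required, target, replacement=""):
    """Hypothetical helper: in rows that contain every string in `required`
    (in any column), replace cells equal to `target` with `replacement`."""
    # Start with every row selected, then AND in one row-wise test per string.
    rows = pd.Series(True, index=frame.index)
    for value in required:
        rows = rows & frame.eq(value).any(axis=1)
    # Work on a copy so the caller's frame is left untouched.
    result = frame.copy()
    result.loc[rows] = result.loc[rows].replace(target, replacement)
    return result


result = blank_if_all_present(df, required=["nl", "fil"], target="nl")
print(result)
```

Rows that lack either string are left alone, so only the row containing both 'nl' and 'fil' is changed, regardless of which columns the strings sit in.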