python - Pandas: Efficiently subset DataFrame based on strings containing certain values


To illustrate what I want to achieve, here is a DataFrame called df:

column1  column2
1        foo faa
2        bar car
3        dog dog
4        cat rat
5        foo foo
6        bar cat
7        bird rat
8        cat dog
9        bird foo
10       bar car

I want to subset the DataFrame, the condition being that a row is dropped if the string in column2 contains one of multiple values.

This is easy enough for a single value, in this instance 'foo':

df = df[~df['column2'].str.contains("foo")]

But let's say I wanted to drop all rows in which the string in column2 contained 'cat' or 'foo'. Applied to the df above, that would drop 6 rows.

What is the most efficient, Pythonic way to do this? It could be in the form of a function, multiple booleans, or something else I'm not thinking of.

isin doesn't work because it requires exact matches.
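
A minimal sketch of that difference, using a hypothetical toy Series (not the df above): isin compares whole values, while str.contains looks for a substring anywhere in each value.

```python
import pandas as pd

# Hypothetical toy data for illustration only, not the df from the question.
s = pd.Series(["foo faa", "bar car", "cat"])

# isin compares whole values, so only the exact string "cat" matches.
exact = s.isin(["cat", "foo"])

# str.contains searches for the substring anywhere in each value.
partial = s.str.contains("foo")

print(exact.tolist())    # [False, False, True]
print(partial.tolist())  # [True, False, False]
```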

N.B.: I have edited the question because I made a mistake the first time round. Apologies.

You can use logical masking, combining one negated condition per value with &:

df = df[(~df['column2'].str.contains("foo")) & (~df['column2'].str.contains("bird")) & (~df['column2'].str.contains("cat"))] 

That returns:

   column1 column2
1        2     bar
2        3     dog
5        6     bar
9       10     bar
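
Chaining one mask per value gets long as the list of values grows. A possible alternative, offered as a sketch rather than as part of the answer above, is to join the values into a single regular expression and call str.contains once. The DataFrame below is rebuilt from the question on the assumption that column2 holds the two-word strings; the names unwanted, pattern, mask and result are illustrative.

```python
import re

import pandas as pd

# Rebuilt from the question's df, assuming column2 holds the two-word strings.
df = pd.DataFrame({
    "column1": list(range(1, 11)),
    "column2": ["foo faa", "bar car", "dog dog", "cat rat", "foo foo",
                "bar cat", "bird rat", "cat dog", "bird foo", "bar car"],
})

# Substrings that should cause a row to be dropped.
unwanted = ["cat", "foo"]

# Escape each term so regex metacharacters are matched literally,
# then join the terms with "|" (regex alternation): "cat|foo".
pattern = "|".join(re.escape(term) for term in unwanted)

# One vectorised pass over the column; na=False treats missing values as no match.
mask = df["column2"].str.contains(pattern, na=False)

# Keep only the rows whose column2 matched none of the terms.
result = df[~mask]
print(result)
```

With this approach, adding another value only means appending it to the list, and re.escape guards against terms that happen to contain characters such as "." or "+".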
