python - Pandas: Efficiently subset DataFrame based on strings containing certain values

June 15, 2014

To illustrate what I want to achieve, here is a DataFrame called df:

column1  column2
1        foo faa
2        bar car
3        dog dog
4        cat rat
5        foo foo
6        bar cat
7        bird rat
8        cat dog
9        bird foo
10       bar car

I want to subset the DataFrame, the condition being that rows are dropped if the string in column2 contains one of multiple values.

This is easy enough for a single value, in this instance 'foo':

df = df[~df['column2'].str.contains("foo")]

But let's say I wanted to drop all rows in which the strings in column2 contained 'cat' or 'foo'. Applied to df above, this would drop 5 rows.

What would be the most efficient, Pythonic way to do this? It could be in the form of a function, multiple booleans, or something else I'm not thinking of.

isin doesn't work, as it requires exact matches.

N.B.: I have edited this question, as I made a mistake the first time round. Apologies.
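To make the isin limitation concrete, here is a minimal sketch. The Series `s` and its values are hypothetical, modelled on the strings in the question, not the actual data.

```python
import pandas as pd

# Hypothetical sample values, modelled on column2 in the question.
s = pd.Series(["foo faa", "bar car", "dog dog", "cat rat"])

# isin compares whole values, so a substring never counts as a match.
exact = s.isin(["foo", "cat"])

# str.contains searches inside each string (as a regex by default).
partial = s.str.contains("foo")

print(exact.tolist())    # [False, False, False, False]
print(partial.tolist())  # [True, False, False, False]
```

So isin only helps when the whole cell equals one of the listed values, which is not the case here.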

You can use logical masking, combining one negated condition per value:

df = df[(~df['column2'].str.contains("foo")) & (~df['column2'].str.contains("bird")) & (~df['column2'].str.contains("cat"))]

That returns (this output appears to correspond to the question's data before it was edited):

   column1 column2
1        2     bar
2        3     dog
5        6     bar
9       10     bar
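If the list of values grows, chaining one condition per value gets unwieldy. Since str.contains treats its pattern as a regular expression by default, one possible alternative is to join the values with "|" into a single pattern. The sketch below is not from the original answer. The frame is an assumption, chosen to be consistent with the output shown above, and the names `unwanted`, `pattern`, `mask` and `result` are made up for illustration.

```python
import re

import pandas as pd

# Assumed data: chosen so that the result matches the output shown above.
df = pd.DataFrame(
    {
        "column1": range(1, 11),
        "column2": ["foo", "bar", "dog", "cat", "foo",
                    "bar", "bird", "cat", "bird", "bar"],
    }
)

# Hypothetical list of substrings whose rows should be dropped.
unwanted = ["foo", "bird", "cat"]

# Escape each value so regex metacharacters are matched literally,
# then join with "|" so the pattern matches any one of them.
pattern = "|".join(re.escape(value) for value in unwanted)

# na=False treats missing values as non-matches, keeping the mask boolean.
mask = df["column2"].str.contains(pattern, na=False)

# Keep only the rows that match none of the unwanted values.
result = df[~mask]
print(result)
```

This builds a single mask in one pass over the column, and adding another value only means extending the list.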
