python - Pandas: Efficiently subset DataFrame based on strings containing certain values
To illustrate what I want to achieve, here is a DataFrame called df:
   column1 column2
1      foo     faa
2      bar     car
3      dog     dog
4      cat     rat
5      foo     foo
6      bar     cat
7     bird     rat
8      cat     dog
9     bird     foo
10     bar     car
I want to subset the DataFrame, the condition being that rows are dropped if the string in column2 contains one of multiple values.
This is easy enough for a single value, in this instance 'foo':
df = df[~df['column2'].str.contains("foo")]
But let's say I wanted to drop all rows in which the strings in column2 contained 'cat' or 'foo'. Applied to df above, this would drop 5 rows.
What is the most efficient, Pythonic way to do this? Either in the form of a function, multiple booleans, or something else I'm not thinking of.
isin doesn't work, as it requires exact matches.
N.B.: I have edited the question because I made a mistake the first time round. Apologies.
You can use logical masking, combining one negated condition per value with &:
df = df[(~df['column2'].str.contains("foo")) & (~df['column2'].str.contains("bird")) & (~df['column2'].str.contains("cat"))]
That returns:
   column1 column2
1        2     bar
2        3     dog
5        6     bar
9       10     bar
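As an alternative to chaining one mask per value, the substrings can be joined into a single regular expression, since str.contains treats its pattern as a regex by default. The following is a minimal sketch rather than a benchmarked solution: the helper name drop_rows_containing is hypothetical, the DataFrame is rebuilt from the sample in the question, and na=False is an assumption about how missing values should be treated.

```python
import re

import pandas as pd

# Sample data rebuilt from the question (index 1..10).
df = pd.DataFrame(
    {
        "column1": ["foo", "bar", "dog", "cat", "foo",
                    "bar", "bird", "cat", "bird", "bar"],
        "column2": ["faa", "car", "dog", "rat", "foo",
                    "cat", "rat", "dog", "foo", "car"],
    },
    index=range(1, 11),
)


def drop_rows_containing(frame, column, substrings):
    """Hypothetical helper: drop rows whose `column` contains any substring."""
    # re.escape keeps characters such as '.' or '+' literal;
    # joining with '|' builds an "any of these" pattern, e.g. "cat|foo".
    pattern = "|".join(re.escape(s) for s in substrings)
    # na=False (an assumption) treats missing values as non-matching,
    # so rows with NaN in `column` are kept rather than raising an error.
    mask = frame[column].str.contains(pattern, na=False)
    return frame[~mask]


result = drop_rows_containing(df, "column2", ["cat", "foo"])
print(result)
```

This scans the column once instead of once per value, and the list of substrings can grow without adding more & terms. Whether it is actually faster on a given dataset is not something measured here.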