python - Equivalent of \b word boundary in str.contains?
Is there an equivalent of the \b word boundary when using str.contains?
The following code mistakenly lists "said business school" in the category because of 'sa'. If I could create a word boundary, it would solve the problem. Putting a space after the term messes things up. I'm using pandas DataFrames. I know I can use regex, but I'm curious whether I can do this with plain strings to make it faster.
gprivate_n = ('co|inc|llc|group|ltd|corp|plc|sa |insurance|ag|as|media|&|corporation')
df.loc[df[df.name.str.contains('{0}'.format(gprivate_n))].index, "private"] = 1
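A word boundary is not a character, so you can't find it with a plain substring search in .contains. You need to either use a regex, or split the strings into words and check the membership of each of those words in a set built from the terms you have defined in gprivate_n.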
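Both options can be sketched roughly as follows. This is only an illustration under some assumptions: the sample rows are made up, the column is called "name" as in the question, and there are no missing values. Note that str.contains already treats its pattern as a regex by default (regex=True), so the pipe-separated string in the question is being run as a regex anyway. Also, \b does not behave as hoped next to "&", because "&" is not a word character, so the sketch uses the lookarounds (?<!\w) and (?!\w) in place of \b.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column name "name" comes from the question.
df = pd.DataFrame({"name": [
    "said business school",
    "acme sa",
    "foo inc",
    "bar & sons",
    "co-op media group",
]})

# The terms from gprivate_n as a list, without the trailing space after "sa".
terms = ["co", "inc", "llc", "group", "ltd", "corp", "plc", "sa",
         "insurance", "ag", "as", "media", "&", "corporation"]

# Option 1: regex. A term matches only when it is not directly preceded
# or followed by a word character.
pattern = r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
regex_mask = df["name"].str.contains(pattern, regex=True)

# Option 2: no regex. Split each name on whitespace and test whether any
# of the resulting words is in the set of terms.
term_set = set(terms)
set_mask = df["name"].str.split().apply(
    lambda words: not term_set.isdisjoint(words)
)

# Flag the matching rows, as in the question.
df.loc[regex_mask, "private"] = 1
```

The two masks agree on this sample, but they are not equivalent in general: the whitespace split only matches whole tokens, so a name like "acme sa." would be caught by the regex and missed by the set lookup unless punctuation is stripped first. Whether the set version is actually faster is something to measure on your own data.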