python - random subsampling of the majority class


I have unbalanced data and want to perform random subsampling on the majority class, so that each subsample is the same size as the minority class. I think this is already implemented in Weka and MATLAB; is there an equivalent in sklearn?

Say your data looks like it was generated by this code:

import numpy as np
x = np.random.randn(100, 3)
y = np.array([int(i % 5 == 0) for i in range(100)])

(Only 1/5th of y is 1, so 1 is the minority class.)

To find the size of the minority class, do:

>>> np.sum(y == 1)
20
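If you do not want to hard-code which label is the minority, the class counts can be read off with np.unique. This is only a sketch; the names labels, counts, minority_label and minority_size are illustrative and not part of the original answer.

```python
import numpy as np

# Same labels as in the example above: every fifth entry is 1.
y = np.array([int(i % 5 == 0) for i in range(100)])

# Count how often each distinct label occurs.
labels, counts = np.unique(y, return_counts=True)

# The minority class is the label with the smallest count.
minority_label = labels[np.argmin(counts)]
minority_size = counts.min()

print(minority_label, minority_size)  # 1 20
```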

To find the subset that consists of the majority class, do:

majority_x, majority_y = x[y == 0, :], y[y == 0] 

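To find a random subset of size 20, do the following (note replace=False: by default np.random.choice samples with replacement, which could pick the same row more than once):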

inds = np.random.choice(majority_x.shape[0], 20, replace=False)

followed by

majority_x[inds, :] 

and

majority_y[inds] 
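Putting the steps together, a balanced dataset could be assembled roughly as follows. This is a minimal sketch, not a scikit-learn API: the helper name balanced_subsample is hypothetical, it assumes a two-class problem (everything that is not the minority label is treated as the majority), and it uses NumPy's Generator interface rather than the legacy np.random functions.

```python
import numpy as np

def balanced_subsample(x, y, rng):
    """Keep all minority rows plus an equally sized random draw
    (without replacement) from the majority rows.

    Hypothetical helper; assumes y contains exactly two classes.
    """
    labels, counts = np.unique(y, return_counts=True)
    minority_label = labels[np.argmin(counts)]
    n_minority = counts.min()

    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)

    # Draw as many majority rows as there are minority rows, no duplicates.
    chosen = rng.choice(majority_idx, size=n_minority, replace=False)

    # Combine and shuffle so the classes are not grouped together.
    keep = np.concatenate([minority_idx, chosen])
    rng.shuffle(keep)
    return x[keep], y[keep]

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))
y = np.array([int(i % 5 == 0) for i in range(100)])

x_bal, y_bal = balanced_subsample(x, y, rng)
print(x_bal.shape, np.bincount(y_bal))  # (40, 3) [20 20]
```

Calling balanced_subsample repeatedly with the same rng gives a different majority draw each time, which is one way to get several subsamples of the kind the question describes.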
