Python - random subsampling of the majority class
I have unbalanced data and want to perform random subsampling on the majority class, where each subsample is the same size as the minority class. I think this is implemented in Weka and MATLAB; is there an equivalent in sklearn?
Say your data looks like it was generated by this code:
import numpy as np
x = np.random.randn(100, 3)
y = np.array([int(i % 5 == 0) for i in range(100)])
(Only 1/5th of y is 1, which is the minority class.)
To find the size of the minority class, do:
>>> np.sum(y == 1)
20
To find the subset that consists of the majority class, do:
majority_x, majority_y = x[y == 0, :], y[y == 0]
To find a random subset of size 20, sampled without replacement so that no row is picked twice, do:
inds = np.random.choice(majority_x.shape[0], 20, replace=False)
followed by
majority_x[inds, :]
and
majority_y[inds]
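Putting the steps together, here is a minimal sketch of a balanced subsample that keeps every minority row and an equally sized random draw from the majority. The function name `undersample_majority` and its `majority_label` and `seed` parameters are hypothetical names chosen for illustration, not part of numpy or sklearn, and the sketch assumes a binary problem like the one above.

```python
import numpy as np

def undersample_majority(x, y, majority_label=0, seed=None):
    """Return a balanced copy of (x, y) by randomly dropping majority rows.

    Hypothetical helper for illustration; assumes two classes and that
    the majority class has at least as many rows as the minority class.
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y != majority_label)
    majority_idx = np.flatnonzero(y == majority_label)
    # Draw without replacement so no majority row appears twice.
    kept = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    # Combine both classes and shuffle so they are not grouped together.
    idx = np.concatenate([minority_idx, kept])
    rng.shuffle(idx)
    return x[idx], y[idx]

# Same toy data as in the answer: 20 minority rows out of 100.
x = np.random.randn(100, 3)
y = np.array([int(i % 5 == 0) for i in range(100)])
x_bal, y_bal = undersample_majority(x, y, seed=0)
```

Calling the function again with a different `seed` gives a different majority subsample of the same size, which is one way to get several subsamples, each matching the minority class.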