How to implement random sampling of a set of vectors in Java?
I have a huge number of context vectors, and I want to find their average cosine similarity. However, it is not efficient to calculate it over the whole set. That is why I want to take a random sample from the set.
The problem is that each context vector expresses the degree of meaning of a word, so I want to make a balanced selection (according to the vector values). I searched and found that I can use a Monte Carlo method. I also found a Gibbs sampler example here: https://darrenjw.wordpress.com/2011/07/16/gibbs-sampler-in-various-languages-revisited/
However, I am a little confused. As I understand it, that method provides a normal distribution and generates double numbers. I did not understand how to apply it in my case. Could someone explain how I can solve this problem?
Thanks in advance.
You don't want a random sample, you want a representative sample. One relatively efficient way is to sort the elements in "strength" order and then take every nth element, which will give you a representative sample of size/n elements.
Try this:
    // given
    Set<Vector> mySet;
    int reductionFactor = 200; // e.g. sample 0.5% of the elements

    // copy the set into a list so it can be sorted
    List<Vector> list = new ArrayList<>(mySet);
    Collections.sort(list, new Comparator<Vector>() {
        public int compare(Vector o1, Vector o2) {
            // compare "strength"
        }
    });

    // take every reductionFactor-th element of the sorted list
    List<Vector> randomSample = new ArrayList<>(list.size() / reductionFactor);
    for (int i = 0; i < list.size(); i += reductionFactor)
        randomSample.add(list.get(i));
The time complexity is O(n log n) due to the sort operation, and the space complexity is O(n).
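For reference, here is a minimal self-contained sketch of the same idea, followed by the average cosine similarity computed over the sample. It rests on some assumptions that are not in the question: vectors are stored as double[], "strength" is taken to be the Euclidean norm (replace strength() with whatever measure fits your data), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RepresentativeSample {

    // Assumption: "strength" is the Euclidean norm of the vector.
    static double strength(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Sort a copy by strength, then keep every reductionFactor-th element.
    static List<double[]> sample(List<double[]> vectors, int reductionFactor) {
        if (reductionFactor < 1) {
            throw new IllegalArgumentException("reductionFactor must be >= 1");
        }
        List<double[]> sorted = new ArrayList<>(vectors);
        sorted.sort(Comparator.comparingDouble(RepresentativeSample::strength));
        List<double[]> result = new ArrayList<>(sorted.size() / reductionFactor + 1);
        for (int i = 0; i < sorted.size(); i += reductionFactor) {
            result.add(sorted.get(i));
        }
        return result;
    }

    // Cosine similarity of two vectors of equal length (both non-zero).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
        }
        return dot / (strength(a) * strength(b));
    }

    // Mean cosine similarity over all unordered pairs; NaN if fewer than two vectors.
    static double averageCosine(List<double[]> vectors) {
        double total = 0.0;
        long pairs = 0;
        for (int i = 0; i < vectors.size(); i++) {
            for (int j = i + 1; j < vectors.size(); j++) {
                total += cosine(vectors.get(i), vectors.get(j));
                pairs++;
            }
        }
        return pairs == 0 ? Double.NaN : total / pairs;
    }
}
```

Calling RepresentativeSample.averageCosine(RepresentativeSample.sample(vectors, 200)) would then estimate the average from roughly 0.5% of the vectors. Note that the pairwise average is quadratic in the sample size, so the reduction factor still has to be large enough to keep that affordable.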