performance - Algorithm for calculate all difference values from a large list -
i have question have list of 3 million records , want difference values between every 2 records. simple nested loop may take forever. suggest me algorithm capable of handling problem?
if want calculate mean of absolute differences , timestamps sorted, need 1 loop:
t[i] <= t[i + 1] --> abs(t[i] - t[j]) = t[j] - t[i] < j that is, there summand positive sign , summand negative sign each of n timestamp differences. let's @ example 4 timestamps:
sum = (t[3] - t[2]) + (t[3] - t[1]) + (t[3] - t[0]) + (t[2] - t[1]) + (t[2] - t[0]) + (t[1] - t[0]) here, t[3] added, t[2] added twice , subtracted once, t[1] added once , subtracted twice , lowest value, t[0] subtracted.
ore, more general: first timestamp, i.e. 1 lowest value, has negative sign, n - 1 times. second has n - 2 times negative signs , positive sign once, namely when comparing the first timestamp. third has n - 3 times negative sign , positive sign twice.
so loop goes this:
sum = 0; = 0 n: sum = sum + (2*i - n + 1) * t[i] where i zero-based index , n exclusive upper bound, c-style. average, divide (n - 1) * n / 2.
if array isn't sorted, must sort first, has better performance quadratic time, should better off nested loop.
one thing might occur summing large values, hit limits of data type. try fix halving loop , start summing both ends in hope differences cancel out. alternatively divide total number of differences inside loop, possibly introducing nasty floating-point round-off errors.
Comments
Post a Comment