pandas - Calculating mutual information in Python returns NaN


I've implemented the mutual information formula in Python using pandas and numpy:

import numpy as np

def mutual_info(p):
    # p: DataFrame holding the joint distribution, rows indexed by y, columns by x
    p_y = p.sum(axis=1)   # marginal p(y): sum over columns
    p_x = p.sum(axis=0)   # marginal p(x): sum over rows
    i = 0.0
    for i_y in p.index:
        for i_x in p.columns:
            i += p.loc[i_y, i_x] * np.log2(p.loc[i_y, i_x] / (p_x[i_x] * p_y[i_y]))
    return i
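For reference, here is a small joint table that reproduces the problem described below (the labels and numbers are made up for illustration):

import pandas as pd

# Joint distribution with one zero cell:
p = pd.DataFrame([[0.5, 0.0],
                  [0.25, 0.25]],
                 index=['y0', 'y1'], columns=['x0', 'x1'])
print(mutual_info(p))  # nan, because of the 0.0 cell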

However, if a cell in p has 0 probability, then np.log2(p.loc[i_y, i_x] / (p_x[i_x] * p_y[i_y])) is negative infinity, the whole expression is multiplied by 0, and the sum becomes NaN.

What is the right way to work around that?

For various theoretical and practical reasons (e.g., see Competitive Distribution Estimation: Why is Good-Turing Good), you might consider never using a 0 probability with a log loss measure.

So, say, if you have a probability vector p, then, for some small scalar α > 0, use α·1 + (1 − α)·p (where 1 here is the uniform vector). Unfortunately, there are no general guidelines for choosing α, and you'll have to assess its effect further down the calculation. A minimal sketch of this mixing is shown below.
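As a sketch, assuming the corrected mutual_info and the joint table from the question are in scope (the helper name smooth and the value α = 0.01 are illustrative, not prescribed):

import pandas as pd

def smooth(p, alpha):
    # alpha * uniform + (1 - alpha) * p: every cell becomes strictly
    # positive while the table still sums to 1.
    return alpha / p.size + (1 - alpha) * p

p = pd.DataFrame([[0.5, 0.0],
                  [0.25, 0.25]],
                 index=['y0', 'y1'], columns=['x0', 'x1'])
print(mutual_info(smooth(p, 0.01)))  # finite now, no NaN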

For the Kullback-Leibler distance, you would of course apply this to each of the inputs.
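For instance, a sketch of a smoothed KL computation (the function name, the default α, and the example vectors are all illustrative assumptions):

import numpy as np

def kl_divergence(p, q, alpha=1e-3):
    # Mix each input with the uniform vector before taking logs,
    # so neither distribution contributes a zero.
    u = np.full_like(p, 1.0 / p.size, dtype=float)
    p = alpha * u + (1 - alpha) * p
    q = alpha * u + (1 - alpha) * q
    return np.sum(p * np.log2(p / q))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.9, 0.0, 0.1])
print(kl_divergence(p, q))  # finite; no inf from the zero entries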

