pandas - Calculating mutual information in Python returns nan
I've implemented the mutual information formula in Python using pandas and numpy:
```python
import numpy as np

def mutual_info(p):
    # p is a joint-distribution DataFrame: rows index Y, columns index X.
    p_x = p.sum(axis=1)  # marginal of the row variable
    p_y = p.sum(axis=0)  # marginal of the column variable
    i = 0.0
    for i_y in p.index:
        for i_x in p.columns:
            i += p.loc[i_y, i_x] * np.log2(p.loc[i_y, i_x] / (p_x[i_y] * p_y[i_x]))
    return i
```
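For example, with a hypothetical 2x2 joint distribution (rows are values of Y, columns are values of X, cells summing to 1):

```python
import pandas as pd

p = pd.DataFrame([[0.4, 0.1],
                  [0.0, 0.5]],
                 index=['y0', 'y1'], columns=['x0', 'x1'])

print(mutual_info(p))  # nan (with a divide-by-zero RuntimeWarning)
```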
However, if a cell in p has 0 probability, then np.log2(p.loc[i_y, i_x] / (p_x[i_y] * p_y[i_x])) is negative infinity, the whole expression is multiplied by 0, and the sum returns nan.
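The culprit is the 0 * log(0) term: by the usual information-theoretic convention it should count as 0, but IEEE floating point evaluates it as 0 * (-inf) = nan:

```python
>>> import numpy as np
>>> np.log2(0.0)        # RuntimeWarning: divide by zero
-inf
>>> 0.0 * np.log2(0.0)  # 0 * -inf
nan
```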
What is the right way to work around that?
For various theoretical and practical reasons (e.g., see Competitive Distribution Estimation: Why is Good-Turing Good), you might consider never using a 0 probability with the log loss measure.
So, say, if you have a probability vector p, then, for some small scalar α > 0, you would use α 1 + (1 - α) p (where the first 1 here is the uniform vector). Unfortunately, there are no general guidelines for choosing α, and you'll have to assess this further down the calculation.
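A minimal sketch of that smoothing (the name smooth and the default α = 0.001 are my own choices; it reuses mutual_info and the example p from the question):

```python
import pandas as pd

def smooth(p, alpha=1e-3):
    # alpha * uniform + (1 - alpha) * p: every cell becomes strictly
    # positive, while the cells still sum to 1.
    uniform = 1.0 / p.size
    return alpha * uniform + (1 - alpha) * p

p = pd.DataFrame([[0.4, 0.1],
                  [0.0, 0.5]],
                 index=['y0', 'y1'], columns=['x0', 'x1'])

print(mutual_info(smooth(p)))  # finite now, instead of nan
```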
For the Kullback-Leibler distance, you would of course apply this to each of the inputs.
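The same sketch extends to KL, smoothing both arguments (kl_divergence is a hypothetical helper reusing smooth from above, not a library function):

```python
import numpy as np
import pandas as pd

def kl_divergence(p, q, alpha=1e-3):
    # Smooth both inputs: a 0 in q would make the ratio infinite, and a
    # 0 in p would bring back the 0 * -inf = nan problem.
    p_s, q_s = smooth(p, alpha), smooth(q, alpha)
    return float(np.sum(p_s * np.log2(p_s / q_s)))

p = pd.Series([0.5, 0.5, 0.0])
q = pd.Series([0.0, 0.5, 0.5])
print(kl_divergence(p, q))  # finite, despite the zeros on both sides
```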