Estimating the entropy of short discrete sequences is an important problem with applications in statistics, linguistics, ecology, and neuroscience. Despite many attempts to find an unbiased estimator with minimum mean squared error, the performance of the proposed estimators varies greatly with the type of system under study and the size of the available data. Moreover, most entropy estimators in the literature assume that the data sequence is generated by independent events, so their range of applicability to correlated systems has thus far remained unexplored. To fill this gap, we have analyzed the most widely used entropy estimators for i) binary Markovian sequences and ii) Markovian systems in the undersampled regime. We carefully compare the performance of the estimators designed for independent systems with a newly proposed entropy formula that takes into account the order of the sequence. We find that this new estimator performs well in terms of bias but that, in the undersampled regime, its large dispersion dominates.
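To illustrate why estimators built for independent events can mislead on correlated data, here is a minimal sketch in Python. It is not the estimator proposed in the talk: it only contrasts the standard plug-in (maximum-likelihood) entropy of the symbol frequencies with a plug-in estimate of the first-order conditional entropy H(X_t | X_{t-1}). The function names, the symmetric binary chain, and the parameter `p_stay` are assumptions introduced here for illustration.

```python
import math
import random
from collections import Counter


def plugin_entropy(seq):
    """Plug-in Shannon entropy (bits) of the symbol frequencies.

    Treats every symbol as an independent draw, ignoring sequence order.
    """
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def markov_plugin_entropy_rate(seq):
    """Plug-in estimate (bits/symbol) of H(X_t | X_{t-1}).

    Uses empirical transition frequencies, so it accounts for
    first-order (Markovian) correlations between consecutive symbols.
    """
    pairs = Counter(zip(seq, seq[1:]))   # counts of (previous, next)
    firsts = Counter(seq[:-1])           # counts of 'previous' symbols
    m = len(seq) - 1                     # number of transitions
    return -sum(
        (c / m) * math.log2(c / firsts[a]) for (a, _b), c in pairs.items()
    )


def binary_markov_sequence(n, p_stay, rng):
    """Hypothetical test source: symmetric binary Markov chain.

    With probability p_stay the next symbol repeats the current one.
    """
    state = rng.randrange(2)
    out = []
    for _ in range(n):
        out.append(state)
        if rng.random() >= p_stay:
            state = 1 - state
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    seq = binary_markov_sequence(50, p_stay=0.9, rng=rng)
    print("independent-events estimate:", plugin_entropy(seq))
    print("first-order Markov estimate:", markov_plugin_entropy_rate(seq))
```

For this symmetric chain the stationary distribution is uniform, so the entropy of the symbol frequencies tends to 1 bit, while the true entropy rate with `p_stay = 0.9` is the binary entropy of 0.9, about 0.47 bits per symbol. Both plug-in estimates are biased for short sequences, which is the regime the abstract is concerned with.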
In person in the IFISC seminar room; Zoom stream at:
Contact: David Sánchez