Jiehua Chen

Paper accepted at ESA 2019: On Computing Centroids According to the $p$-Norms of Hamming Distance Vectors

Posted on by Jiehua Chen

Summary

We consider the -Norm Hamming Centroid problem which asks to determine whether some given binary strings have a centroid with a bound on the -norm of its Hamming distances to the strings. Specifically, given a set of strings and a real , we consider the problem of determining whether there exists a string with , where denotes the Hamming distance metric. This problem has important applications in data clustering, and is a generalization of the well-known polynomial-time solvable Consensus String problem, as well as the NP-hard Closest String problem.

Our main result are

  1. -Norm Hamming Centroid is NP-hard for all fixed rational , closing the gap for all rational values of $p$ between $1$ and $\infty$.

  2. Under standard complexity assumptions the reduction also implies that -Norm Hamming Centroid has no -time or -time algorithm, where denotes the number of input strings and $n$ denotes the length of each string, for any fixed . Both running time lower bounds are tight. In particular, we provide a -time algorithm for each fixed .

  3. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor- approximation algorithm for the problem.

For a comparison, the following cluster with strings, each of length , shows that for different , we indeed obtain different optimal centroids. For each , string is an optimal -norm centroid but it is not an optimal -norm centroid, where . Moreover, one can verify that is the only optimal -norm centroid and no optimal -norm centroid (closest string) is an optimal -norm centroid.

====================================

The paper can be found on arXiv.