Jiehua Chen

On Computing Centroids According to the p-Norms of Hamming Distance Vectors

Posted on by Jiehua Chen

Summary

We consider the p-Norm Hamming Centroid problem which asks to determine whether some given binary strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set of strings S and a real k, we consider the problem of determining whether there exists a string s^* with \big(\sum_{s \in S}d^p(s^*,s)\big)^{1/p} \leq k, where d(,) denotes the Hamming distance metric. This problem has important applications in data clustering, and is a generalization of the well-known polynomial-time solvable Consensus String (p=1) problem, as well as the NP-hard Closest String (p=\infty) problem. Our main result shows that the problem is NP-hard for all rational p > 1, closing the gap for all rational values of p between 1 and \infty. Under standard complexity assumptions the reduction also implies that the problem has no 2^{o(n+m)}-time or 2^{o(k^{\frac{p}{(p+1)}})}-time algorithm, where m denotes the number of input strings and n denotes the length of each string, for any fixed p \geq 2. Both running time lower bounds are tight. In particular, we provide a 2^{k^{\frac{p}{(p+1)}+\varepsilon}}-time algorithm for each fixed \varepsilon > 0.

====================================

The paper can be found on arXiv.