# Jiehua Chen

Summary

We consider the $p$-Norm Hamming Centroid problem which asks to determine whether some given binary strings have a centroid with a bound on the $p$-norm of its Hamming distances to the strings. Specifically, given a set of strings $S$ and a real $k$, we consider the problem of determining whether there exists a string $s^*$ with $\big(\sum_{s \in S}d^p(s^*,s)\big)^{1/p} \leq k$, where $d(,)$ denotes the Hamming distance metric. This problem has important applications in data clustering, and is a generalization of the well-known polynomial-time solvable Consensus String $(p=1)$ problem, as well as the NP-hard Closest String $(p=\infty)$ problem.

Our main result are

1. $p$-Norm Hamming Centroid is NP-hard for all fixed rational $p > 1$, closing the gap for all rational values of $p$ between $1$ and $\infty$.

2. Under standard complexity assumptions the reduction also implies that $p$-Norm Hamming Centroid has no $2^{o(n+m)}$-time or $2^{o(k^{\frac{p}{(p+1)}})}$-time algorithm, where $m$ denotes the number of input strings and $n$ denotes the length of each string, for any fixed $p > 1$. Both running time lower bounds are tight. In particular, we provide a $2^{k^{\frac{p}{(p+1)}+\varepsilon}}$-time algorithm for each fixed $\varepsilon > 0$.

3. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-$2$ approximation algorithm for the problem.

For a comparison, the following cluster $S$ with $5$ strings, each of length $7$, shows that for different $p$, we indeed obtain different optimal centroids. For each $p\in \{1,2,\infty\}$, string $s^*_p$ is an optimal $p$-norm centroid but it is not an optimal $q$-norm centroid, where $q \in \{1, 2, \infty\} \setminus \{p\}$. Moreover, one can verify that $s^*_2$ is the only optimal $2$-norm centroid and no optimal $\infty$-norm centroid (closest string) is an optimal $2$-norm centroid.

====================================

The paper can be found on arXiv.