TY - GEN

T1 - Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

AU - Cohen-Addad, Vincent

AU - Das, Debarati

AU - Kipouridis, Evangelos

AU - Parotsidis, Nikos

AU - Thorup, Mikkel

N1 - Funding Information:
The full version of the paper can be accessed at https://arxiv.org/ abs/2110.02807 Research of Debarati Das, Evangelos Kipouridis and Mikkel Thorup is supported by VILLUM Investigator Grant 16582, Basic
Funding Information:
Evangelos Kipouridis has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 801199. 1https://iep.utm.edu/classifi/, Internet Encyclopedia of Philosophy
Publisher Copyright:
© 2022 IEEE.

PY - 2022

Y1 - 2022

N2 - We consider the numerical taxonomy problem of fitting a positive distance function D}:\binom{S2rightarrow R}> 0 by a tree metric. We want a tree T with positive edge weights and including s among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology where the tree T aims to approximate the branching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees, and for the special case of ultrametrics with a root having the same distance to all vertices in s. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n) by Ailon and Charikar [2005] who wrote 'Determining whether an O(1) approximation can be obtained is a fascinating question'.

AB - We consider the numerical taxonomy problem of fitting a positive distance function D}:\binom{S2rightarrow R}> 0 by a tree metric. We want a tree T with positive edge weights and including s among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology where the tree T aims to approximate the branching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is the sum of distance errors over all pairs of points. We present a deterministic polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees, and for the special case of ultrametrics with a root having the same distance to all vertices in s. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n) by Ailon and Charikar [2005] who wrote 'Determining whether an O(1) approximation can be obtained is a fascinating question'.

UR - http://www.scopus.com/inward/record.url?scp=85127156121&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85127156121&partnerID=8YFLogxK

U2 - 10.1109/FOCS52979.2021.00054

DO - 10.1109/FOCS52979.2021.00054

M3 - Conference contribution

AN - SCOPUS:85127156121

T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

SP - 468

EP - 479

BT - Proceedings - 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science, FOCS 2021

PB - IEEE Computer Society

T2 - 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021

Y2 - 7 February 2022 through 10 February 2022

ER -