The persistent homology of genealogical networks

Genealogical networks (i.e., family trees) are of growing interest, with the
largest known data sets now including well over one billion individuals.
Interest in family history also supports an 8.5 billion dollar industry whose
size is projected to double within 7 years (FutureWise report HC1137). Yet
little mathematical attention has been paid to the complex network properties
of genealogical networks, especially at large scales.
The structure of genealogical networks is of particular interest because
unions, e.g., marriages, typically form well outside one's immediate family.
In most other networks, including other social networks, no equivalent
restriction exists on the distance at which relationships form. To study the
effect of this restriction on genealogical networks, we use persistent homology
to identify and compare the structure of 101 genealogical and 31 other social
networks. Specifically, we introduce the
notion of a network's persistence curve, which encodes the network's set of
persistence intervals. We find that the persistence curves of genealogical
networks have a distinct structure when compared to other social networks. This
difference in structure also extends to subnetworks of genealogical and social
networks, suggesting that, even with incomplete data, persistent homology can be
used to meaningfully analyze genealogical networks. Here we also describe how
concepts from genealogical networks, such as common ancestor cycles, are
represented using persistent homology. We expect that persistent homology tools
will become increasingly important in genealogical exploration as popular
interest in ancestry research continues to expand.
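As an illustration only, the sketch below assumes a Betti-curve-style reading of a persistence curve: at each filtration value t it counts the persistence intervals (birth, death) that are alive at t. The abstract does not state the exact definition or the filtration used, so `persistence_curve` and `cycle_rank` are hypothetical helpers, not the authors' method. The second function shows why a common ancestor cycle registers in 1-dimensional homology. If parent-child links are treated as undirected edges (an assumption about the network model), then a descendant reachable from one ancestor along two lines of descent closes a cycle, which raises the graph's first Betti number |E| - |V| + (number of connected components).

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (birth, death); death may be float("inf")


def persistence_curve(intervals: List[Interval], ts: List[float]) -> List[int]:
    """For each t, count the intervals alive at t (birth <= t < death).

    ASSUMPTION: a Betti-curve-style definition chosen for illustration;
    the paper's persistence curve may be constructed differently.
    """
    return [sum(1 for b, d in intervals if b <= t < d) for t in ts]


def cycle_rank(num_nodes: int, edges: List[Tuple[int, int]]) -> int:
    """First Betti number of an undirected graph: |E| - |V| + #components.

    Components are counted with a union-find structure.
    """
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_nodes
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge two components
            components -= 1
    return len(edges) - num_nodes + components


if __name__ == "__main__":
    # Hypothetical family: 0 is the common ancestor of 1 and 2; 3 is a child
    # of 1, 4 is a child of 2, and 5 is a child of both 3 and 4 (cousins).
    family = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 5)]
    print(cycle_rank(6, family))  # the common ancestor cycle 0-1-3-5-4-2-0
    print(persistence_curve([(0, 2), (1, 3), (0, float("inf"))], [0, 1, 2, 3]))
```

Without the edge (4, 5), the same six individuals form a tree and the cycle rank drops to 0, so the single 1-dimensional feature comes entirely from the two lines of descent meeting in individual 5.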
arxiv.org