Graphs for Genealogists™
Powered by Neo4j® and a physician genealogist

The Power of Graph Methods

Overview of how a graph database works and supports genealogy research.

Graph methods are powerful and especially well suited for genealogy research. The GFG platform, which runs on Neo4j, exploits some of these capabilities. Neo4j version 4 supports multiple databases on the same server, allowing each user to have their own environment with access control and role management. Users can create their own database and upload vendor files for multiple kits. The database schema consists of nodes and edges, where edges are the relationships between nodes. Both nodes and edges can have properties (metadata). Details are described in another post.
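To make the node, edge, and property vocabulary concrete, here is a minimal in-memory sketch of a property graph in Python. It is only an illustration of the data model, not how Neo4j stores data internally, and the labels (`Person`, `DNA_Test`), relationship types (`CHILD_OF`, `TESTED`), and the `PropertyGraph` class are hypothetical examples rather than the GFG schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: an id, a label, and arbitrary key/value properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed relationship between two nodes, with its own properties."""
    start: str
    rel_type: str
    end: str
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # list of Edge

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, start, rel_type, end, **props):
        # Both endpoints must already exist before they can be related.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append(Edge(start, rel_type, end, props))

    def neighbors(self, node_id, rel_type):
        """Follow outgoing edges of one relationship type."""
        return [e.end for e in self.edges
                if e.start == node_id and e.rel_type == rel_type]


# Hypothetical example: a tester, their father, and one DNA kit.
g = PropertyGraph()
g.add_node("p1", "Person", name="Ann", birth_year=1950)
g.add_node("p2", "Person", name="Bob", birth_year=1920)
g.add_node("k1", "DNA_Test", vendor="ExampleVendor")
g.add_edge("p1", "CHILD_OF", "p2")
g.add_edge("p1", "TESTED", "k1", kit_id="A123")
print(g.neighbors("p1", "CHILD_OF"))  # ['p2']
```

Note that the edge itself carries metadata (`kit_id`), which is what distinguishes a property graph from a plain list of links. In Neo4j the same lookup would be written declaratively as a Cypher pattern rather than as a Python loop.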

Genetic genealogy primarily uses two graphs: the traditional family tree and the graphs created from DNA results. Linking these two graphs is much easier and more robust in a graph database, and the platform provides user-friendly tools for curating the links between them. Queries are more intuitive and efficient than in traditional relational databases because traversing paths in a graph replaces stepping through table joins. Queries that traverse the graph can collect information along the path, such as lineage information. Finding common ancestors involves traversing upward from two people until their paths intersect. Similarly, you can easily traverse from an ancestor to all descendants who have DNA test results. DNA results are linked to the tester and to their traditional family tree (from a GEDCOM file).

Putting this together, one can readily produce chromosome segment data for the descendants of an ancestor, listing all the matches at a segment and those not at the segment. This makes triangulation of segments easy. Assembling shared matches, which may not share segments, is also robust in a graph system. The platform also deploys a variety of visualization tools and downloadable outputs. These are discussed in detail in other posts, including a few listed here.
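The traversals described above can be sketched in plain Python on toy data. This is a minimal illustration under stated assumptions, not the GFG implementation: the family tree is a hypothetical child-to-parents dictionary, the set of testers and the segment tuples are made up, and segment positions are assumed to be in megabases. The last function only performs the overlap step of triangulation; full triangulation also requires confirming that the matches share the segment with each other.

```python
from collections import deque

# Hypothetical family tree: child -> list of parents.
PARENTS = {
    "Ann": ["Bob", "Carol"],
    "Dave": ["Eve", "Frank"],
    "Bob": ["George", "Helen"],
    "Eve": ["George", "Helen"],
}
# Hypothetical people in the tree who have DNA test results.
TESTERS = {"Ann", "Dave"}
# Hypothetical shared segments: (tester, match, chromosome, start_mb, end_mb).
SEGMENTS = [
    ("Ann", "Match1", 7, 10.0, 25.0),
    ("Dave", "Match1", 7, 12.0, 30.0),
    ("Ann", "Match2", 7, 40.0, 55.0),
    ("Dave", "Match3", 3, 5.0, 20.0),
]


def ancestors(person, parents):
    """Breadth-first walk up parent edges; returns {ancestor: generations up}."""
    seen = {}
    queue = deque([(person, 0)])
    while queue:
        current, depth = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen[parent] = depth + 1
                queue.append((parent, depth + 1))
    return seen


def common_ancestors(a, b, parents):
    """Ancestors reached by both upward traversals, with each side's distance."""
    up_a, up_b = ancestors(a, parents), ancestors(b, parents)
    return {anc: (up_a[anc], up_b[anc]) for anc in up_a.keys() & up_b.keys()}


def descendants(person, parents):
    """Walk down child edges (the reverse of the parent edges)."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    found, stack = set(), [person]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def matches_at(chromosome, start, end, testers, segments):
    """Split the testers' matches into those overlapping a region and the rest."""
    at, not_at = set(), set()
    for tester, match, chrom, s, e in segments:
        if tester not in testers:
            continue
        if chrom == chromosome and s < end and start < e:
            at.add(match)
        else:
            not_at.add(match)
    return at, not_at - at


print(sorted(common_ancestors("Ann", "Dave", PARENTS).items()))
# [('George', (2, 2)), ('Helen', (2, 2))]
testers = descendants("George", PARENTS) & TESTERS
print(sorted(testers))  # ['Ann', 'Dave']
at, not_at = matches_at(7, 15.0, 20.0, testers, SEGMENTS)
print(sorted(at), sorted(not_at))  # ['Match1'] ['Match2', 'Match3']
```

The chain at the bottom mirrors the workflow in the paragraph: find the common ancestor, collect that ancestor's descendants who tested, then sort their matches into those at a segment and those not at it. In a graph database each of these steps is a path traversal rather than a series of joins.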

Previous blog posts provide useful background:

  1. Graph Databases in Genealogy; Nov 2017
  2. Neo4j Technical Issues; Dec 2017 (significant upgrades have been made since then)
  3. Y-DNA Haplotrees; Dec 2017

There are several other tools available to genealogists interested in clustering and graph analytic methods:

  1. Genetic Affairs
  2. Double Match Triangulator
  3. Connected DNA

The GFG platform uses Neo4j, and much has been written about it. A good starting point is a set of free books available in print or as PDFs:

  1. O'Reilly's Graph Databases
  2. O'Reilly's Graph Algorithms
  3. Graph Databases for Dummies

Graphs are everywhere! When you grasp their pervasiveness, your world view is transformed. Albert-László Barabási has written books for lay readers; Linked is a good starting point for understanding graphs in general. He has also written a definitive textbook and reference work, Network Science.