Graphs for Genealogists™
Powered by Neo4j® and a physician genealogist


Neo4j Genealogy PlugIn

Genealogy data involves multiple graphs. It is prudent to develop user defined functions (UDFs) that simplify data uploads, reports, and analytics, which should encourage more genealogists to adopt Neo4j.

What are User Defined Functions?

Neo4j is a native graph database. It supports plugins that extend its built-in capabilities. Plugins incorporate User Defined Functions (UDFs). A plugin is a Java jar file that is placed in the Neo4j plugins directory; its UDFs can then be called from within Cypher queries. Cypher is the open source, standardized query language used by Neo4j.

Rationale for User Defined Functions

Genealogy data involves multiple graphs. I've been working with Neo4j and genealogy data for several years, refining a schema and Cypher queries. The complexity of the latter has increased over time. Thus, it is prudent to develop user defined functions (UDFs) to simplify and speed up queries. More importantly, the Genealogy UDF is designed to make state-of-the-art graph methods available to more genealogists. It is a heavy lift to transition from current systems to graphs, but the Genealogy UDF now makes it pretty easy for the many genealogy citizen scientists. The software package simplifies the implementation of the steps described here.

I have created a private GitHub repository for developing the Genealogy UDF. I'd like to engage a few collaborators as alpha-testers, and also developers with the Java skills used in UDF development. If you'd like to be considered, please contact me offline at dave@wai.md.

Getting started involves a few steps:
  1. Install Neo4j Enterprise Edition, v 4.x (it's free)
  2. Download the genealogy UDF jar file and place it in the Neo4j Plugin folder
  3. Change a few lines in the Neo4j configuration file
  4. Run the UDF to load your GEDCOM and FTDNA project data (takes a few minutes)
  5. Enjoy your graph experiences with the functions below and others in development
There will be detailed instructions at the GitHub repository for those participating. The procedure that loads a GEDCOM file directly into Neo4j creates an easy path to initial success. GEDCOM is an industry standard that existing software can export. This procedure will ease your migration to graph resources. It will have the additional benefit of stimulating the standardization of a genealogy graph schema.
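
To give a feel for what a GEDCOM load has to do before anything reaches the graph, here is a minimal sketch in plain Java with no Neo4j dependency. It is not the code in the Genealogy UDF: the class and method names (GedcomSketch, parseLine, parentsByChild) are hypothetical, and it assumes well-formed lines of the form "LEVEL [@XREF@] TAG [VALUE]". It also treats HUSB and WIFE in a FAM record as father and mother, which is a simplification, and it keeps only one family per child.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, no Neo4j dependency; all names are hypothetical.
final class GedcomSketch {

    /** One parsed GEDCOM line of the form "LEVEL [@XREF@] TAG [VALUE]". */
    static final class Line {
        final int level;
        final String xref;  // record id such as "@I1@", or null when absent
        final String tag;
        final String value; // empty string when the line carries no value

        Line(int level, String xref, String tag, String value) {
            this.level = level;
            this.xref = xref;
            this.tag = tag;
            this.value = value;
        }
    }

    /** Splits one well-formed GEDCOM line into its parts. */
    static Line parseLine(String raw) {
        String[] parts = raw.trim().split(" ", 3);
        int level = Integer.parseInt(parts[0]);
        if (parts[1].startsWith("@")) {
            // Record line: the second token is the cross-reference id.
            String[] tagAndValue = (parts.length > 2 ? parts[2] : "").split(" ", 2);
            return new Line(level, parts[1], tagAndValue[0],
                    tagAndValue.length > 1 ? tagAndValue[1] : "");
        }
        return new Line(level, null, parts[1], parts.length > 2 ? parts[2] : "");
    }

    /** Reads FAM records and returns child id -> {father id, mother id}. */
    static Map<String, String[]> parentsByChild(List<String> lines) {
        Map<String, String[]> parents = new LinkedHashMap<>();
        boolean inFamily = false;
        String husband = null;
        String wife = null;
        List<String> children = new ArrayList<>();
        for (String raw : lines) {
            if (raw.trim().isEmpty()) continue;
            Line line = parseLine(raw);
            if (line.level == 0) {
                // A new record starts: store the family collected so far.
                for (String child : children) parents.put(child, new String[] {husband, wife});
                inFamily = "FAM".equals(line.tag);
                husband = null;
                wife = null;
                children = new ArrayList<>();
            } else if (inFamily && line.level == 1) {
                if ("HUSB".equals(line.tag)) husband = line.value;
                else if ("WIFE".equals(line.tag)) wife = line.value;
                else if ("CHIL".equals(line.tag)) children.add(line.value);
            }
        }
        // Store the last family in case the file does not end with a level-0 line.
        for (String child : children) parents.put(child, new String[] {husband, wife});
        return parents;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "0 HEAD", "0 @I1@ INDI", "1 NAME John /Smith/",
                "0 @I2@ INDI", "0 @I3@ INDI",
                "0 @F1@ FAM", "1 HUSB @I1@", "1 WIFE @I2@", "1 CHIL @I3@",
                "0 TRLR");
        // Prints: @I3@ <- [@I1@, @I2@]
        parentsByChild(sample).forEach((child, pair) ->
                System.out.println(child + " <- " + Arrays.toString(pair)));
    }
}
```

The resulting child-to-parents map is the kind of structure a loader could turn into person nodes and parent relationships. How those are named in the graph depends on the schema, which this sketch deliberately leaves out.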

There is some background on Neo4j in genealogy at these links:
Early posts about Neo4j in genealogy
Graphs for Genealogists blog posts

What will the UDFs do?

UDFs are called within Cypher queries. Functions perform specific, limited queries whose results appear in the output of the query. Procedures perform more complex tasks such as loading or modifying data within the database. We can envision a set of UDFs that would be packaged in the Genealogy UDF (Gen-UDF):

  1. Upload GEDCOM file to Neo4j*
  2. Upload Family Tree DNA result files to Neo4j*
  3. Descriptive statistics about the data uploaded to Neo4j*
  4. Create GEDCOM output from Neo4j data*
  5. Relationships
    • MRCAs of two persons*
    • MRCAs of multiple persons with different paths for the MRCAs*
    • MRCA phasing of multiple persons with distinct paths to unrelated MRCAs*
    • Relationship lookups for two persons based on graph paths to common ancestor(s)*
    • Ahnentafel numbers in a person's pedigree chart*
    • Ahnentafel numbers of each person in path to MRCA*
  6. Genealogy Reports
    • Patrilineal trees to facilitate Y-DNA analytics
    • Matrilineal trees to facilitate mt-DNA analytics
    • X-linked relative trees to facilitate X match research
  7. Analyze DNA results
    • Find informative matches for further research
    • Facilitate triangulation group creation
    • Link genealogy DNA segments to OMIM gene data
  8. Triangulation group (TG) queries
    • Create TG Reports*
    • Create at-haplotrees
    • TG segments: segments within a TG*
    • A specific segment's TG*
    • Matches sharing a TG*
    • Autosomal haplotree: TGs segregating in descent from MRCA
  9. Reference Dataset in Graph Format
    • Family relationship table*
    • Y-Haplotree: Blocks, SNPs and hierarchical edges*
    • mt-Haplotree
    • OMIM human gene classification
    • HapMap for computing segment centimorgans
    • Pile up regions
    • Newberry Library Historical County Boundary Shape Files
    * available
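
The MRCA and Ahnentafel items in the list above rest on two simple ideas, sketched here in plain Java with no Neo4j dependency. This is an illustration under stated assumptions, not the Gen-UDF implementation: the names (PedigreeSketch, ahnentafel, mrcas) and the in-memory map of child to {father, mother} are hypothetical. Ahnentafel numbering gives the subject 1, the father of person n the number 2n, and the mother 2n + 1. MRCAs are taken here to be the common ancestors of two persons that are not themselves ancestors of another common ancestor.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: plain Java, no Neo4j dependency; all names are hypothetical.
final class PedigreeSketch {

    /**
     * Numbers the known ancestors of a subject: the subject is 1,
     * the father of person n is 2n, and the mother is 2n + 1.
     * parents maps a child id to {father id, mother id}; either may be null.
     * Assumes the data has no cycles and fewer than 63 generations (long range).
     */
    static Map<Long, String> ahnentafel(Map<String, String[]> parents, String subject) {
        Map<Long, String> numbered = new TreeMap<>();
        Deque<Map.Entry<Long, String>> queue = new ArrayDeque<>();
        queue.add(Map.entry(1L, subject));
        while (!queue.isEmpty()) {
            Map.Entry<Long, String> current = queue.poll();
            long n = current.getKey();
            numbered.put(n, current.getValue());
            String[] pair = parents.get(current.getValue());
            if (pair == null) continue; // no recorded parents
            if (pair[0] != null) queue.add(Map.entry(2 * n, pair[0]));
            if (pair[1] != null) queue.add(Map.entry(2 * n + 1, pair[1]));
        }
        return numbered;
    }

    /** Common ancestors of a and b that are not ancestors of another common ancestor. */
    static Set<String> mrcas(Map<String, String[]> parents, String a, String b) {
        Set<String> common = new TreeSet<>(ahnentafel(parents, a).values());
        common.retainAll(ahnentafel(parents, b).values());
        Set<String> result = new TreeSet<>(common);
        for (String candidate : common) {
            Map<Long, String> older = ahnentafel(parents, candidate);
            older.remove(1L); // drop the candidate itself, keep only its ancestors
            result.removeAll(older.values());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two first cousins: ann and fay share the grandparents dan and eve.
        Map<String, String[]> parents = new HashMap<>();
        parents.put("ann", new String[] {"bob", "cat"});
        parents.put("fay", new String[] {"hal", "gia"});
        parents.put("bob", new String[] {"dan", "eve"});
        parents.put("hal", new String[] {"dan", "eve"});
        System.out.println(ahnentafel(parents, "ann")); // {1=ann, 2=bob, 3=cat, 4=dan, 5=eve}
        System.out.println(mrcas(parents, "ann", "fay")); // [dan, eve]
    }
}
```

In a graph database the same questions become path queries over parent relationships, so this in-memory version is only meant to make the definitions concrete.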


Our logo is a registered trademark