Graphs for Genealogists™
Powered by Neo4j® and a physician genealogist

Autosomal DNA analytics in surname projects

Surname projects involve men whose Y-DNA matches but whose relatedness may otherwise be unknown. Autosomal DNA (at-DNA) matches shared by these men can provide clues for further research. Graph methods are ideally suited for exposing and visualizing these clues. In other words, graph methods help you discover facts that are otherwise difficult to recognize.

Family Tree DNA (FTDNA) supports surname projects. Participants either share a surname themselves or have Y-DNA matches to the surname. In the latter case, there was likely a surname change somewhere back in the lineage. Surname projects group men based on haplogroups and the pattern of single nucleotide polymorphisms (SNPs). It is not unusual for projects to have numerous groups. Within a group the project seeks to identify common male ancestors (CMA). When there are well-documented family trees, a CMA may already be defined and the DNA is used to validate or extend the conclusions derived from historical records. However, many projects have groups of men whose common ancestor is unknown. This is a common conundrum in American families who arrived in colonial times and whose diaspora is lost to history. It is this scenario that this document addresses.

Patrilineal trees are a convenient way to show relationships between men in a surname project group which presumably has one or more CMA. If individual trees are truncated before a CMA, one can assume the existence of the CMA and add him to the patrilineal tree. This allows one to analyse the group as a whole. In some cases the common ancestors of one or more subgroups are known and presumably converge, by an unknown path, on the more distant CMA for all the men.

This document explores methods for using at-DNA results to help map the missing steps in the path to the CMA of all the men. The at-DNA may be derived from the men in the project or other individuals in their families who share a path to the surname under investigation. The figure illustrates the opportunity.

Matches (blue nodes) may match more than one surname kit (pink nodes) and provide additional clues about how the kit owners are related. The number on each edge is the shared centimorgans.
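
The structure in the figure can be read as a two-sided graph: kits on one side, matches on the other, with shared centimorgans as the edge weight. The Python sketch below is illustrative only. The kit labels, match labels, and cM values are invented, and it is not the platform's Neo4j implementation.

```python
from collections import defaultdict

# Hypothetical (kit, match, shared cM) edges; every value is invented.
EDGES = [
    ("kit_A", "match_1", 42.0),
    ("kit_B", "match_1", 35.5),
    ("kit_A", "match_2", 18.2),
    ("kit_C", "match_3", 27.0),
    ("kit_B", "match_3", 21.4),
    ("kit_C", "match_4", 12.9),
]


def kits_by_match(edges):
    """Map each match to a {kit: shared_cM} dict."""
    index = defaultdict(dict)
    for kit, match, cm in edges:
        index[match][kit] = cm
    return dict(index)


def bridging_matches(edges, min_kits=2):
    """Return only the matches linked to at least `min_kits` kits."""
    return {
        match: kits
        for match, kits in kits_by_match(edges).items()
        if len(kits) >= min_kits
    }


bridges = bridging_matches(EDGES)
for match, kits in sorted(bridges.items()):
    print(match, kits)
```

Matches that survive the `min_kits` filter are the ones that connect two or more kit owners and so carry clues about how those owners are related.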


With this introduction, let's step back and outline the steps:
  1. Identify a suitable surname project. The group of men should have strong evidence for sharing a Y-chromosome, typically with their most recent SNPs on the same recent branch of the haplotree. There must be a critical mass of at-DNA test results available for these men or for others who match them but have a different surname. Embedding an at-DNA project within a surname project encourages submission of these kits. One strategy is to include a group in the project for individuals matched to the surname but not in the patrilineal tree. The Modified Y-Utility can be used to estimate the time to the most recent common ancestors (MRCAs), which helps determine the kits most likely to benefit from at-DNA analytics.
  2. Collect the at-DNA result files. A project administrator can download the FTDNA Family Finder and Chromosome Browser comma-delimited files for each available kit.
  3. Curate a GEDCOM file. The Graphs for Genealogy (GFG) platform has tools allowing you to cross-map matches to known individuals in the GEDCOM. The GEDCOM should incorporate hypothesized MRCAs who link known lines for groups of men in the project. This will enable queries of all putative descendants.
  4. Load the kits to the platform. This involves several steps described in another blog post.
  5. Discovery process. Graph methods are powerful because they enable discovery of insights that are otherwise difficult to find.
  6. Research hypotheses. Discovered insights produce clues which will require in-depth research. The research will also add individuals to the GEDCOM and, through curation, create new links between them and the project matches in the family tree graph.
  7. Peer review. Good research will hold up under scrutiny by peers knowledgeable about the lines investigated and the methods used.
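
Step 2 yields one comma-delimited match file per kit. A minimal sketch of turning such a file into (kit, match, shared cM) rows follows. The column names `Full Name` and `Shared cM` are assumptions made for illustration and should be checked against the header row of the actual export. The sample rows are invented, and this is not the platform's loader.

```python
import csv
import io

# Invented sample standing in for one kit's match export.
SAMPLE_CSV = """Full Name,Shared cM
Jane Example,42.5
John Sample,
Mary Placeholder,18
"""


def read_match_edges(kit_id, handle, name_col="Full Name", cm_col="Shared cM"):
    """Yield (kit_id, match_name, shared_cM) for each usable row.

    name_col and cm_col are assumed header names, not confirmed ones.
    Rows with a missing name or a non-numeric cM value are skipped.
    """
    for row in csv.DictReader(handle):
        name = (row.get(name_col) or "").strip()
        raw_cm = (row.get(cm_col) or "").strip()
        if not name or not raw_cm:
            continue
        try:
            cm = float(raw_cm)
        except ValueError:
            continue
        yield (kit_id, name, cm)


edges = list(read_match_edges("kit_A", io.StringIO(SAMPLE_CSV)))
print(edges)
```

With files on disk, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="", encoding="utf-8-sig")`, where the encoding is also an assumption about the export.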

Discovery Methods

Discovery methods are still under development! Research is defining the best methods and then code is created to enable users to implement the methods. This page will be updated to reflect progress.
  1. Targeted surnames. Matches with the project surname are perhaps the best targets for finding clues of relatedness. Some of these may be men who can then be recruited to the project.
  2. Spouse surnames. Men sharing a CMA may also share a common female ancestor.
  3. Sister surnames. You may recognize distant cousins by a surname. A CMA's known sisters may have known husbands whose surnames were passed down to their descendants, your distant cousins, some of whom have at-DNA test results.
  4. Surname frequency. Counts of the number of matches with specific surnames may point you to the best opportunities for research, particularly if a surname is not common in the general population but is prevalent in the project matches.
  5. Surname substitutes. Projects often have men whose surname is different from that of a distant patrilineal ancestor. Use these substituted surnames to discover insights.
  6. Clusters. Visualized graphs will show clusters of matches associated with various combinations of project kits. The matches in these clusters are bridges in the graph between two or more kits. Of particular interest are matches who match multiple kits, especially if the kits are in different known lines.
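
Two of the methods above, surname frequency and clusters, reduce to simple counting and grouping. The sketch below uses invented kit labels, match identifiers, and placeholder surnames; it shows the idea only and is not the platform's code.

```python
from collections import Counter, defaultdict

# Hypothetical (kit, match_id, match_surname) rows; all values invented.
ROWS = [
    ("kit_A", "m1", "Alpha"),
    ("kit_B", "m1", "Alpha"),
    ("kit_A", "m2", "Alpha"),
    ("kit_C", "m3", "Beta"),
    ("kit_B", "m3", "Beta"),
    ("kit_C", "m4", "Gamma"),
]


def surname_counts(rows):
    """Count distinct matches per surname (each match counted once)."""
    surname_of = {match: surname for _, match, surname in rows}
    return Counter(surname_of.values())


def clusters_by_kit_set(rows):
    """Group matches by the exact combination of kits they match."""
    kits_of = defaultdict(set)
    for kit, match, _ in rows:
        kits_of[match].add(kit)
    clusters = defaultdict(set)
    for match, kits in kits_of.items():
        clusters[frozenset(kits)].add(match)
    return dict(clusters)


counts = surname_counts(ROWS)
clusters = clusters_by_kit_set(ROWS)
print(counts.most_common())
for kits, matches in clusters.items():
    print(sorted(kits), sorted(matches))
```

Clusters keyed by more than one kit hold the bridging matches. Judging whether a frequent surname is also rare in the general population needs an outside reference, which this sketch does not attempt.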

Case Study: Avitts Surname Project

In the Avitts surname project at FTDNA, the group with the haplogroup-defining SNP BY39551 contains several distinct groups of men with different spellings of their surname. The patrilineal tree incorporates hypothesized CMA, allowing the display of all project members and their lineages.

William H Averatt line. The author's great-great-great grandfather is William H. Averatts [53] (c1813 SC-). The number in [] is a unique identifier. There were several hypotheses for his parents, but each lacked conclusive evidence. Among those was a suggestion that his parents were John Dempsey Avit [3032] (1785-) and Nancy McGhee [3033] (1797->1840). Supporting this was the presence of a Nancy Avit next to a younger John T. Avit [3036] (≬1815/20-) in the 1840 census of Gallatin Co., Illinois, where William [53] was known to live. This census indicated Nancy [3033] was born between 1791 and 1800. A very simple and intuitive graph database query produced the graph (shown above) of McGhee matches linking multiple kits in the project. This included two men whose family trees mapped to a CMA, James H McGhee Sr [30187] (1698-1774). This tree was known from previous research but assumed increased importance with the multiple McGhee matches in the project. Further study of the children of James H McGhee Sr [30187] revealed a daughter Nancy, born in 1797, and a son William who died in Gallatin Co., IL in 1840. Furthermore, the geographic trajectory of the children of James McGhee [30187] through North Carolina, South Carolina and Jackson Co., Alabama was aligned with that of the author's William H Averatts [53] line. William's oldest daughter was born near Birmingham, Alabama. This DNA and historical evidence added weight to the notion that Nancy [3033] was the mother of William [53] and John T. [3036].

Biddy Wilson line. One of the other participants in the project was a Wilson whose Y-DNA matched him to Avitts. Research provided an explanation. His great-great-great grandmother was Biddy Wilson [26815] (c1795->1860). She is named in the probate record of her mother Susanna in Spartanburg Co., South Carolina. This record names Biddy's husband as William Evet [26812] (c1800-). She was evidently estranged from him and gave her children her maiden name. This finding led to research on Avitts/Evit in Spartanburg Co. The 1790 census shows, on the same page, William Avit [31678] (<1774-) and John Avit [31676] (<1774-). In the 1820 census we find a William Evet who may be [26812].

Samuel Givens Evetts line. A third line in the project traces its lineage to the CMA Samuel Givens Evetts Sr [26907] (1774 NC-1850 TX), whose hypothesized but unproven parents are George Evit [26951] (1745-1830) and Elizabeth Givens [26952] (-). Using the simple graph database query, six distinct Givens matches were found linked to each of the three lines described here. None of these matches were shared by all three lines.
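
The check described here, collecting matches with a target surname for each line and then asking whether any match is shared across lines, can be sketched with set operations. The kit-to-line mapping, match identifiers, and surnames below are invented placeholders, not project data, and the functions are hypothetical helpers rather than the platform's query.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: which line each kit belongs to, and surname matches.
LINE_OF_KIT = {"kit_A": "line_1", "kit_B": "line_2", "kit_C": "line_3"}
ROWS = [  # (kit, match_id, match_surname); all values invented
    ("kit_A", "m1", "Target"),
    ("kit_B", "m1", "Target"),
    ("kit_B", "m2", "Target"),
    ("kit_C", "m2", "Target"),
    ("kit_C", "m3", "Target"),
    ("kit_A", "m4", "Other"),
]


def matches_by_line(rows, line_of_kit, surname):
    """Map each line to the set of its matches carrying `surname`."""
    by_line = defaultdict(set)
    for kit, match, match_surname in rows:
        if match_surname.lower() == surname.lower():
            by_line[line_of_kit[kit]].add(match)
    return dict(by_line)


def shared_by_all(by_line):
    """Matches present in every line (empty set if there are none)."""
    sets = list(by_line.values())
    return set.intersection(*sets) if sets else set()


def shared_by_pairs(by_line):
    """Matches shared by each pair of lines."""
    return {
        (a, b): by_line[a] & by_line[b]
        for a, b in combinations(sorted(by_line), 2)
    }


by_line = matches_by_line(ROWS, LINE_OF_KIT, "Target")
print(shared_by_all(by_line))
print(shared_by_pairs(by_line))
```

In this toy data no match reaches all three lines, yet two pairs of lines each share one match, which is the kind of partial overlap that can still point to where research should look next.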

Case Study: FF_Green Surname Project

In the FF_Green surname project at FTDNA, the group with the haplogroup-defining SNP BY96037 contains several distinct groups of men with different spellings of their surname. The patrilineal tree incorporates hypothesized CMA, allowing the display of all project members and their lineages.

Robert Green line. The author's great-great-great-great grandfather is Robert Green [1628] (c1774-1826). He was living in Abbeville Co., SC in the late 1700s. His parents are unknown. His haplogroup root is R and this distinguishes him from descendants of another Robert Green in that county who are in the I root branch.

Ezekiel Green line. Several project participants descend from Ezekiel Green [27324] (1774-). It is hypothesized that he descended from Greens who settled initially in Maryland.

Ohio Green line. Two project participants are descended from Green lines that are traced back to Ohio but whose earlier members are unknown.

An early finding in this project was the discovery, among the 80,000 matches, of a match to three of the kits whose family came from Early Co., Georgia and who had 3 brothers who are potential Y-DNA testers.