Our DNA Mega-Cluster Beta Testing program is now underway. A common question is: “Is the effort to set up a Mega-Cluster Database worth all the manual labor involved?” We hope the answer will be “Yes.” Early beta testing does involve many manual steps while we continue to refine and optimize our procedures and the associated documentation. Our eventual goal is to minimize this effort by automating many of the steps in the procedure. We hope the following example will help users answer that question.
Example of a Basic DNA Mega-Cluster Analysis for Chromosome-1
This example describes the results of creating a Mega-Cluster Basic Database file from a GEDmatch One-to-Many comparison for the Primary Kit owner. The results were downloaded from GEDmatch as a CSV (comma-separated values) file and opened in Google Sheets. The downloaded results were then reformatted to the Basic Mega-Cluster standards used for this example.
The total number of matches included in a Mega-Cluster depends on the threshold settings selected when the match results were created. While smaller matches can be useful to confirm relationships, a starting threshold of 20 cM typically produces a list of 500-750 matches. In this example, the comparison generated 530 segment matches. Table 1 shows family relationships and the expected cM size and range for these matches. While you can find matches with 6th cousins, the results may be less reliable because the matches share only a small number of centiMorgans and because it is difficult to identify which of your 128 fifth-great-grandparents is the common ancestor. For that reason, the Mega-Cluster Method recommends setting a higher threshold of 20 cM when you create your initial database.
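The 20 cM threshold described above can be sketched as a simple filter over the downloaded rows. This is a minimal illustration, not the Mega-Cluster procedure itself: the sample rows are made up, the column names are borrowed from the Basic view in this guide, and the function name `filter_by_threshold` is hypothetical.

```python
import csv
import io

# Made-up sample rows for illustration only; the column names follow
# the Basic view described in this guide.
SAMPLE_CSV = """Name (Alias),Chrom,Start Loc,End Loc,cM
Match-A,1,1000000,30000000,32.5
Match-B,1,5000000,12000000,9.8
Match-C,2,40000000,75000000,20.0
Match-D,1,150000000,190000000,41.2
"""


def filter_by_threshold(rows, min_cm=20.0):
    """Keep only segment matches whose cM value is at least min_cm."""
    return [row for row in rows if float(row["cM"]) >= min_cm]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_by_threshold(rows)
print([row["Name (Alias)"] for row in kept])
```

With these sample rows, the 9.8 cM segment is dropped and the other three are kept. Whether a segment of exactly 20 cM is included is an assumption of this sketch.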
Table 1. Family Relationships and Estimated Size (cM) of Matches
A partial snapshot of the Basic DNA Mega-Cluster Analysis for matches found on Chromosome-1 is shown below. The color bands make identifying clusters easier. The clusters are shown for the matches with the Primary Kit Owner. The personal information of the Primary and Match testers has been hidden in this view. For privacy, the matches shown use the Database Column Name (Alias) function. The Mega-Cluster Database contains additional descriptive fields that can be displayed using the DNA Mega-Cluster Database Enhanced or Advanced templates. These are described in other Evergreen Mega-Cluster User Guides.
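To make the idea of a cluster concrete, here is a rough sketch that groups matches on one chromosome whenever their segments overlap. This guide does not spell out how clusters are formed, so the overlap rule is an assumption, and the function name `group_overlapping` and the sample segments are hypothetical.

```python
def group_overlapping(segments):
    """Group (alias, start, end) segments on one chromosome into clusters.

    Assumption: a segment joins the current cluster when it starts
    at or before the furthest end position seen in that cluster.
    """
    clusters = []
    cluster_end = None
    for alias, start, end in sorted(segments, key=lambda seg: seg[1]):
        if cluster_end is None or start > cluster_end:
            # No overlap with the current cluster, so start a new one.
            clusters.append([alias])
            cluster_end = end
        else:
            clusters[-1].append(alias)
            cluster_end = max(cluster_end, end)
    return clusters


# Made-up Chromosome-1 segments for illustration only.
chrom1 = [
    ("Match-A", 1_000_000, 30_000_000),
    ("Match-B", 25_000_000, 60_000_000),
    ("Match-D", 150_000_000, 190_000_000),
    ("Match-E", 155_000_000, 180_000_000),
]
clusters = group_overlapping(chrom1)
print(clusters)
```

Here the four sample matches fall into two groups. Overlap with the Primary Kit alone does not show that the matches in a group are related to each other, since a segment can sit on either the maternal or the paternal copy of the chromosome.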
Column Descriptions – DNA Mega-Cluster Database – Basic View (Private)
- M-Kit ID – GED – Ged Kit#-Uploaded From Company
- Name (Alias) – Match name (Alias) [Function]
- M-Sex – Match Tester Sex
- Chrom – Chromosome Number
- Start Loc – Segment Start Location
- End Loc – Segment End Location
- Seg Length – Segment Length (Calculated)
- cM – Segment size (centiMorgans)
- Date Created – Date Raw Data uploaded to GED
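
The columns above could be modeled as a simple record, sketched here under stated assumptions. The class name `BasicRow`, the field names, and the sample values (including the kit ID) are hypothetical, and Seg Length is assumed to be End Loc minus Start Loc, since the guide only says that it is calculated.

```python
from dataclasses import dataclass


@dataclass
class BasicRow:
    """One row of the Basic view; field names are illustrative."""
    m_kit_id: str      # M-Kit ID
    alias: str         # Name (Alias)
    m_sex: str         # M-Sex
    chrom: int         # Chrom
    start_loc: int     # Start Loc
    end_loc: int       # End Loc
    cm: float          # cM
    date_created: str  # Date Created

    @property
    def seg_length(self) -> int:
        # Assumption: Seg Length is End Loc minus Start Loc.
        return self.end_loc - self.start_loc


# Made-up values for illustration only.
row = BasicRow("X000001", "Match-A", "F", 1, 1_000_000, 30_000_000, 32.5, "2020-01-01")
print(row.seg_length)  # 29000000
```

Keeping Seg Length as a derived property means it cannot drift out of sync with the start and end locations.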