More genome comparison potential?

The following site came up as part of Google’s targeted content for my profile. I thought it might interest others here who have already done some work integrating DNA data into Gramps.

But it brings up another point to discuss. DNA data can be huge… actually, make that massively HUGE! Perhaps thought should be given to storing genome data in a separate database file rather than as media? That way, the genome files would only need to be backed up when new data is added. It might also encourage designing Genetics as a separate View, so that Media Galleries/Thumbnails don’t become cluttered and disorganized.
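To make the "separate database file" idea a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Nothing here is existing Gramps functionality: the table layout, the column names and the functions `open_dna_store`, `add_segment` and `total_shared_cm` are all hypothetical, and the `person_handle` column simply assumes that each row can be tied back to a person in the tree by some identifier.

```python
import sqlite3


def open_dna_store(path=":memory:"):
    """Open (or create) a standalone DNA database file.

    Hypothetical layout: one row per shared segment. Pass a real file
    path to keep the data outside the main tree and media backups.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS segment (
               person_handle TEXT    NOT NULL,  -- assumed link to a person in the tree
               match_name    TEXT    NOT NULL,
               chromosome    TEXT    NOT NULL,
               start_pos     INTEGER NOT NULL,
               end_pos       INTEGER NOT NULL,
               centimorgans  REAL    NOT NULL
           )"""
    )
    return conn


def add_segment(conn, person_handle, match_name, chromosome,
                start_pos, end_pos, centimorgans):
    """Insert one shared segment; the transaction commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO segment VALUES (?, ?, ?, ?, ?, ?)",
            (person_handle, match_name, chromosome,
             start_pos, end_pos, centimorgans),
        )


def total_shared_cm(conn, person_handle, match_name):
    """Sum the centimorgans shared between a person and one match."""
    row = conn.execute(
        "SELECT COALESCE(SUM(centimorgans), 0.0) FROM segment "
        "WHERE person_handle = ? AND match_name = ?",
        (person_handle, match_name),
    ).fetchone()
    return row[0]
```

Because the file is separate, a backup routine could skip it unless its modification time has changed, which is the point of the suggestion above.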

(It seems that address books, emails and galleries would each be more sustainable as separate external database files. There are existing database formats for each, and tools for optimizing their content. My Tree data, by itself, has become large enough that Backup upon Exit requires allowing significant lead time at shutdown. While I’d like to use Gramps’ Media functionality, the additional housekeeping makes that impractical.)

pachterlab/gget (public GitHub repository)

gget enables efficient querying of genomic databases, such as Ensembl, UniProt, NCBI, directly into a Python or terminal programming environment. It was designed to support genomic data analysis.
See the README.md for more information.

It depends on what you want to save:

  • Genome Sequencing (WGS 30x) = ~100 GB
  • DNA Testing (700,000 SNPs) = ~20 MB
  • List of Segment Data (~1,000 Matches) = <3 MB
  • List of DNA Matches (~1,000 Matches) = <1 MB
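
For anyone wondering where figures of this order come from, here is a rough back-of-the-envelope sketch in Python. The per-record sizes are my own assumptions for illustration, not measured values: about 30 bytes per SNP line in a text raw-data file, and about one byte per sequenced base for compressed whole-genome data, with a genome length of roughly 3.1 billion bases. The helper names `estimate_bytes` and `to_gb` are hypothetical.

```python
def estimate_bytes(records, bytes_per_record):
    """Rough storage estimate: number of records times average record size."""
    return records * bytes_per_record


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 1,000,000,000 bytes)."""
    return num_bytes / 1_000_000_000


# Assumption: ~30 bytes per line (rsID, chromosome, position, genotype).
snp_file_bytes = estimate_bytes(700_000, 30)

# Assumptions: ~3.1 billion bases, 30x coverage, ~1 byte per base stored.
wgs_bytes = estimate_bytes(3_100_000_000 * 30, 1)

print(f"SNP test file: ~{snp_file_bytes / 1_000_000:.0f} MB")
print(f"WGS 30x:       ~{to_gb(wgs_bytes):.0f} GB")
```

Under those assumptions the SNP file lands in the tens of megabytes and the whole genome around a hundred gigabytes, so the match and segment lists are small enough to live almost anywhere, while whole-genome data really does need its own storage plan.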
