Why is genealogical data portability so bad?

This is not Gramps specific, but I decided to ask the question here since many here are experts both in programming and genealogy (a combination that is not the norm).

I’ve imported data from MyHeritage and Ancestry into Gramps and a lot gets left on the floor. I get events and maybe sources, but citations are either missing or skipped with errors that get ignored. Often you get a lot of citations and attributes with site-specific internal ID numbers that appear to be useless. From what I’ve read, this is pretty much how it is going from or to any genealogy platform, web or desktop.

I realize that moving a Gramps file to another Gramps family tree or to the web version works very well, but I’m looking for broader data portability.

From my understanding, GEDCOM 7 was supposed to solve a lot of the gaps in GEDCOM exports, but, hey, it was released in 2021 and it is still not a common standard across the major platforms.

I suspect user “lock in” is one reason there is not a great deal of attention to solving this issue, which is sad. Yeah, the major web platforms are big and complicated, but the problem is not a moon shot.

My ideal would be something that is like “Markdown for Genealogy.” Is there any data portability solution on the horizon? I realize media is a different animal, but at this point even structured text doesn’t port anywhere near 100%.

(I like Gramps and am not trying to find a way to migrate off, but not everybody uses Gramps.)

I hope this doesn’t come off just as a rant.

Your observation is spot-on with regard to data exchange, and I can tell you from experience that this is a pain point in many more domains than just genealogy.

While I can’t offer insights into data exchange with MyHeritage, Ancestry or GEDCOM7 specifically, here’s what might move things forward with Gramps:

  1. Identify specific issues you are running into.
  2. In addition to discussions here on the forum, check the bugbase to see if these issues are not already logged, either as bug report or feature request.
  3. If they are not, create issues, and accompany them with small samples that illustrate the problem.

Well, that would depend on what the solution is, and who you expect to solve the problem :smiley:

My Personal View in short form

  1. Profit
  2. Inertia: it is a problem for a small number of people, so there is no
    pressure to fix it.
  3. If ISO were the publisher then maybe, but LDS is viewed suspiciously by
    many when they do anything.

phil

I don’t really think it’s a bug with Gramps. Data issues with Gedcom certainly aren’t unique to Gramps. I think Gedcom is a data standard with a lot of shortcomings.

But it is the only one we have. And GEDCOM 7 is much better than 5.5.1. It is a problem that Gramps and many other genealogical programs do not implement that standard correctly. There are examples of applications that do a much better job.

The best solution: implement GEDCOM 7 for import and export in Gramps. Report issues whenever you find one (in Gramps, Ancestry, MyHeritage, …) as @codefarmer suggested.

The only reason GEDCOM remains the sole standard is that there isn’t enough collective pressure to demand better. When users advocate for modern Open Source and Open Data frameworks—specifically those that support true semantic interoperability—the response is predictably stagnant.

This resistance isn’t limited to commercial giants like Legacy or RootsMagic, who simply ignore the conversation. Even within the Gramps developer community, there is a frustrating reluctance to adopt truly interchangeable formats. While Gramps has an excellent native XML structure, it remains a functional silo. Instead of embracing recognized standards used by major global institutions, the typical response from developers is ‘fork it and do it yourself.’

To move genealogy into the modern era, we must demand support for established data exchange formats and ontologies, such as:

  • CIDOC CRM (ISO 21127): The backbone of OpenAtlas and the British Museum, providing a robust framework for historical events and complex prosopographical relationships.

  • Schema.org: Utilizing JSON-LD (Linked Data) and Microdata to make genealogical profiles machine-readable and discoverable.

  • IIIF (International Image Interoperability Framework): Using JSON-based manifests to share high-resolution primary sources and archival documents seamlessly.

  • GraphML: An XML-based format designed specifically for complex graph structures, far superior to the flat-file limitations of GEDCOM.

  • CSL (Citation Style Language): The industry standard for managing sources and bibliographies in JSON or XML.

True data portability requires support for standard serialization and modeling types like OWL (Web Ontology Language) and RDF (Resource Description Framework) for deep semantic modeling, as well as NDJSON (Newline Delimited JSON) and CSV for high-volume data processing and accessibility.
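
As a rough illustration of the Schema.org bullet above, a person record could be serialized as JSON-LD along these lines. This is only a minimal sketch: the helper `person_to_jsonld` and the names and dates are made up for the example, and only `Person`, `Place`, `name`, `birthDate`, `birthPlace` and `parent` come from the Schema.org vocabulary.

```python
import json


def person_to_jsonld(name, birth_date=None, birth_place=None, parents=()):
    """Build a Schema.org Person as a JSON-LD dict (illustrative helper)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
    }
    if birth_date:
        doc["birthDate"] = birth_date  # ISO 8601 date string
    if birth_place:
        doc["birthPlace"] = {"@type": "Place", "name": birth_place}
    if parents:
        # Each parent is itself a Person node; here only the name is known.
        doc["parent"] = [{"@type": "Person", "name": p} for p in parents]
    return doc


# Made-up sample data, not taken from any real tree.
example = person_to_jsonld(
    "Anna Example",
    birth_date="1850-03-14",
    birth_place="Bergen",
    parents=["Ola Example", "Kari Example"],
)
print(json.dumps(example, indent=2))
```

A flat vocabulary like this covers far less than an event-based model, so it is better suited to publishing profiles than to round-tripping a full research database.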

The result of ignoring these standards is a fragmented industry where data integrity is sacrificed for the status quo. I have advocated for these Open Data formats multiple times, but both commercial vendors and open-source maintainers seem unwilling to bridge the gap between niche genealogy tools and the professional world of digital humanities.

Note: This content has been translated from Norwegian and edited by Google AI for improved flow, clarity, and technical accuracy, to better reflect the specific standards and formats discussed.


Side note: For those using Markdown tools like Obsidian, I highly recommend checking out this plugin: Genealogy research in Obsidian.

Try using the MyHeritage Family Tree Builder. You might get better results than downloading directly from the MyHeritage website.

Family Tree Builder - Free Genealogy Program - MyHeritage

Yeah, that is how it is today… mostly because people in the industry don’t actually look at open standards that already cover everything, even if they require a bit of “think outside the box” work, and because people using the software don’t ask for better formats…

But if we look at the formats I mentioned earlier, all of them can support a superset of genealogy without any real difficulty, because genealogy is just a niche within broader historical‑human data.
In fact, two of them are just transport formats for big datasets.

And just to be totally clear, no one here is asking for yet another standard.
Some of us actually do research on the existing ones and look at what might be beneficial to support…
…especially because there are excellent open‑source, open‑data tools outside the genealogy bubble that genealogists and “ancestral researchers” could use if we weren’t locked into a limited, historically incorrect, and lossy lineage‑link format that gets presented as “GEDCOM”.

In addition, just to avoid the usual confusion:
CIDOC CRM and CSL are completely different things. CIDOC CRM is a conceptual ontology for cultural‑heritage and historical information, while CSL is essentially a style system for formatting citations. One defines semantic structures and relationships; the other defines how references are rendered. Treating them as if they overlap or compete is exactly the kind of category error that keeps spawning “yet another standard”.

The irony is that we keep getting “yet another standard” not because the existing ones are insufficient, but because developers prefer inventing a new one over learning the capabilities of the old ones. That’s how you end up with 14 standards… and then someone proudly adds a 15th.

PS. I’ve said this before, but it’s worth repeating: Gramps XML is one of the best open standards we have for storing not only genealogical data, but a much broader range of historical and cultural information. The problem is simply that other open standards are more widely used in the broader research fields that genealogy is a part of — and adoption always beats technical merit.
So by supporting import/export to those formats, Gramps as a project might actually pull researchers from the broader fields toward the project, instead of isolating itself inside the genealogy niche.


Note: This text was originally drafted in Norwegian, then translated and refined for flow and technical clarity using AI assistance (Copilot and Google AI).

Of course the problem is mainly that there isn’t much incentive for genealogy program implementers to improve portability because it is better for them to increase lock-in.

Also (but probably of lesser importance), everyone thinks they can do so much better by making some changes to their own implementation. This applies to Gramps too, although lock-in is less relevant for Gramps because there is no profit element to possible lock-in.

Some of the sporadic contributions that I have made to Gramps (in the past) have been to improve support for GEDCOM in Gramps. I would much prefer if Gramps were based purely on the GEDCOM data model (this doesn’t mean that it would have to use GEDCOM as its underlying database, although some genealogy programs do exactly that). Some of the changes to Gramps (the hierarchical place structure, which I think is overly complicated: I am looking at you) move it further from GEDCOM, which I think is regrettable.

At one point, I did try to work out how to improve ‘GEDCOM’ import from MyHeritage, which almost implements GEDCOM sources correctly but not quite. However, the lack of a plugin architecture for GEDCOM import discouraged me.

I don’t see much point in moving to other ‘standards’ (e.g. GEDCOM 7) if other genealogy programs don’t implement them. Hence the main point: whatever the limitations of GEDCOM (though I think they are mostly in poor implementation of the standard rather than problems in the standard itself) it is the only portability standard there is so we should do our best to support it.

Why would you want Gramps to be built purely on an LDS theology-based data model, unless you are a member of the LDS Church? Shouldn’t Gramps and other software be fully religiously agnostic as well as politically agnostic?

The idea that Gramps should limit itself to the GEDCOM data model is, in my view, a step backward for genealogical science. You mentioned the hierarchical place structure as ‘regrettable,’ but that is a perfect example of where Gramps chooses historical and geographical accuracy over a flawed, flat 1980s transport format.
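
To make the place example concrete, here is a minimal sketch of what flattening does. The `Place` class and the sample names are made up for illustration and are not Gramps’ actual API; the only assumption about GEDCOM is that a `PLAC` value is a comma-separated list of jurisdictions, smallest unit first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    """Hypothetical typed place node; not the real Gramps class."""
    name: str
    kind: str                        # e.g. "Farm", "Parish", "County"
    parent: Optional["Place"] = None


def to_plac(place):
    """Flatten a place hierarchy into a PLAC-style string, smallest unit first."""
    names = []
    node = place
    while node is not None:
        names.append(node.name)      # note: node.kind is simply dropped here
        node = node.parent
    return ", ".join(names)


# Made-up sample hierarchy.
country = Place("Norway", "Country")
county = Place("Hordaland", "County", country)
parish = Place("Fana", "Parish", county)
farm = Place("Nedre Example", "Farm", parish)

print(to_plac(farm))  # Nedre Example, Fana, Hordaland, Norway
```

The string keeps the names, but the type of each level survives only if the exporter also writes a matching `FORM` and the importer honors it.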

GEDCOM was never designed to be a ‘gold standard’ for data modeling; it was built specifically as a pipe to feed the LDS Church’s Ancestral File and their temple work. Its architecture reflects those specific theological requirements, which is why it inherently struggles with complex social structures, non-traditional families, and the nuances required in professional humanities research.

We are essentially trying to force 21st-century historical data into a 40-year-old religious shipping container. Most software developers didn’t adopt the GEDCOM model because it was ‘good,’ but because it became a de facto necessity for data portability during an era of industry inertia.

Instead of forcing Gramps into a theological bottleneck, we should continue to advocate for Free Open-Source, Open-Data formats that allow for true interoperability with other historical and scientific research fields. Gramps should be leading the way toward better data standards, not shackling itself to the limitations of a legacy format designed for a very specific religious project.

GEDCOM is an extremely lossy format for genealogy research, even in version 7.


Note: This text was originally drafted in Norwegian, then translated and refined for flow and technical clarity using AI assistance (Google AI).

Simply because it is the gold standard for data portability!

There is another thread about which genealogy program people started with ( Was Gramps your first? - #11 by adriandavey ), and many people started with another genealogy program, so importing data from other programs is useful. Similarly, you might want to interchange your data either with a friend or relative that uses another program, or a website (like Ancestry), so it is very useful if Gramps is compatible with that other software.

GEDCOM is very widely used (despite sometimes eclectic implementations of the standard) by other software, so it is really useful.

I remember the X.25 standard for data comms, and for example how much more sensible it was to have email addresses the other way round (e.g. name@uk.ac.imperial), but ultimately ‘the internet’ won, and few people would attempt to promote X.25 nowadays!

Let us not forget that at its core Gramps is a multi-purpose database
application with a front end that is highly (but not exclusively) tuned
for genealogical information. Whilst its ability to import/export
GEDCOM is useful (in most cases the least worst option), it is not
essential, and certainly the structure should not be limited by an
outdated and much adulterated standard.
Given the huge variety of current/potential users and developers, I would
expect nothing less.
phil

If GEDCOM really were the ‘gold standard’ for portability, why does every single software vendor have to implement their own set of non-standardized custom tags (starting with _xxxx)?

The logic fails on its own terms:

The moment a “standard” requires you to create your own private dialect just to preserve basic data that the format doesn’t support, the standard has failed its primary purpose.
We aren’t actually ‘exchanging’ data; we are sending broken fragments and hoping the receiving software can guess what our custom tags mean—if it even cares to read them at all.
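
A quick way to see how much of a given export relies on such private tags is to count them. This is a minimal sketch, not a full GEDCOM parser: the sample lines and the tag names `_UID` and `_PHOTO` are made up for illustration, and the pattern assumes the usual `level [@xref@] TAG [value]` line layout.

```python
import re
from collections import Counter

# level, optional @xref@, tag, optional value
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


def count_custom_tags(gedcom_text):
    """Count underscore-prefixed (vendor extension) tags in GEDCOM text."""
    counts = Counter()
    for line in gedcom_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(3).startswith("_"):
            counts[match.group(3)] += 1
    return counts


# Made-up sample record, not taken from any real export.
sample = """0 @I1@ INDI
1 NAME Anna /Example/
1 _UID 1234ABCD
1 BIRT
2 DATE 14 MAR 1850
2 _PHOTO @M1@
1 _UID 5678EFGH
0 TRLR"""

for tag, n in count_custom_tags(sample).most_common():
    print(tag, n)
```

Running something like this on exports from two different programs and comparing the tag lists gives a rough picture of what the receiving side would have to guess at.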

This is exactly why GEDCOM fails as a transport format.
A true transport layer should be a high-fidelity envelope.
Instead, GEDCOM acts as a ‘lowest common denominator’ filter.
When you move data from a rich, event-centric research environment like Gramps through a GEDCOM file, you aren’t just transporting it; you are actually strip-mining it of its context.

We are essentially using a 1980s telegram service to try and send 21st-century relational databases.

While this might have felt like a revolution back in the era of punch cards and magnetic tapes, we have long since passed the point where a flat, line-based text format can adequately serve modern data science.

This is why I advocate for moving toward linked-data standards in addition to the lossy GEDCOM format, and mapping (even using specialized AI-assisted workflows if needed) to formats like CIDOC CRM.

We need a transport layer that actually preserves the integrity of our research and is interchangeable with other high-end research tools, not one that forces us to lobotomize our data just to make it ‘portable’ using an outdated, lossy, religious “standard” that was never meant for universal use; it was only intended as a pipeline format for the Ancestral File system.


Note: This text was originally drafted in Norwegian, then translated and refined for flow and technical clarity using AI assistance (Google AI).
