Announcement: external tool: Gramps XML to RDF conversion

Dear all,

You might be interested in a small tool I wrote to convert Gramps XML into RDF linked data:

https://github.com/sylvainloiseau/gramps2rdf

It uses the RML language to map XML onto RDF. It might have been better to operate directly on the database rather than on the XML export, but I was unable to get my RML engine to work on the db.
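
To give a rough idea of what such an XML-to-RDF mapping does, here is a minimal Python sketch. It is not the RML mapping used by gramps2rdf. The XML fragment is a simplified stand-in for a Gramps export (namespace omitted; the element and attribute names `person`, `family`, `father`, `mother`, `childref`, `handle`, `hlink` should be checked against a real file), and the base IRI and the predicates `name` and `parentOf` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a Gramps XML export (assumption: real files carry a
# namespace and many more elements).
SAMPLE = """
<database>
  <people>
    <person handle="_p1" id="I0001">
      <name type="Birth Name"><first>John</first><surname>Smith</surname></name>
    </person>
    <person handle="_p2" id="I0002">
      <name type="Birth Name"><first>Alice</first><surname>Smith</surname></name>
    </person>
  </people>
  <families>
    <family handle="_f1" id="F0001">
      <father hlink="_p1"/>
      <childref hlink="_p2"/>
    </family>
  </families>
</database>
"""

BASE = "http://example.org/"  # hypothetical base IRI


def gramps_xml_to_triples(xml_text):
    """Return (subject, predicate, object) tuples for names and parent links."""
    root = ET.fromstring(xml_text)
    triples = []
    iri_by_handle = {}

    # One IRI per person, plus a name triple.
    for person in root.iter("person"):
        iri = BASE + "person/" + person.get("id")
        iri_by_handle[person.get("handle")] = iri
        first = person.findtext("name/first", default="")
        surname = person.findtext("name/surname", default="")
        triples.append((iri, BASE + "name", f"{first} {surname}".strip()))

    # One parentOf triple per (parent, child) pair in each family.
    for family in root.iter("family"):
        for tag in ("father", "mother"):
            parent = family.find(tag)
            if parent is None:
                continue
            for child in family.findall("childref"):
                triples.append((
                    iri_by_handle[parent.get("hlink")],
                    BASE + "parentOf",
                    iri_by_handle[child.get("hlink")],
                ))
    return triples


triples = gramps_xml_to_triples(SAMPLE)
for triple in triples:
    print(triple)
```

A declarative RML mapping expresses the same kind of rules (which XML nodes become subjects, which become predicates and objects) without hand-written loops.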

I would be happy to help if needed,
Best,
Sylvain


Thanks for providing the XML to RDF conversion.
More out of curiosity - what do you want to achieve by moving the data to a triplestore?


The Resource Description Framework (RDF) is a standard for representing and exchanging data on the web. It uses a graph-based model to describe relationships between entities in the form of triples:

  • Subject: The resource being described (e.g., “Joe Smith”).
  • Predicate: The relationship or property (e.g., “has a homepage”).
  • Object: The value or related resource (e.g., “http://example.org/~joe”).

These triples form a directed graph, where nodes represent entities (subjects/objects) and edges represent relationships (predicates).

Key Features:

  • Interoperability: RDF enables integration of data from multiple sources, even if schemas differ.
  • Flexibility: Relationships can be extended or modified without altering the underlying data.
  • Semantic Web Foundation: RDF underpins technologies like SPARQL (query language) and ontologies (e.g., RDFS, OWL) for richer data understanding.

For example, RDF can describe that “Joe Smith lives in Paris” and “Paris is in France,” enabling queries like “Who lives in France?” by traversing relationships.
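
The "Who lives in France?" example can be sketched in plain Python, with triples stored as tuples. The names and the predicates `livesIn` and `isIn` are made up for illustration, and a real triplestore would answer this with a SPARQL query rather than a hand-written loop.

```python
# Each triple is (subject, predicate, object).
triples = {
    ("Joe Smith", "livesIn", "Paris"),
    ("Ann Lee", "livesIn", "Lyon"),
    ("Bob Ray", "livesIn", "Berlin"),
    ("Paris", "isIn", "France"),
    ("Lyon", "isIn", "France"),
    ("Berlin", "isIn", "Germany"),
}


def who_lives_in(region, triples):
    """Follow livesIn -> isIn: people living in a place located in `region`."""
    places = {s for s, p, o in triples if p == "isIn" and o == region}
    return sorted(s for s, p, o in triples if p == "livesIn" and o in places)


print(who_lives_in("France", triples))
```

This sketch only follows one `isIn` hop; nested containment (town in county in country) would need a repeated traversal.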

after discussing with Perplexity…

Most genealogy applications focus heavily on nodes (individuals, places, events) and provide limited tools for exploring edges (relationships). This is a significant gap that RDF-based tools can address, making edge exploration a standout feature and a compelling selling point for genealogical applications.

Let’s break this down further to highlight why edge exploration is so powerful in genealogy and how RDF tools can unlock its potential:


1. Traditional Genealogy Tools: Node-Centric

  • Focus: Most genealogy software centers around individuals (nodes) and their attributes (e.g., birth date, death date, occupation).
  • Relationships: While relationships like “parent of” or “spouse of” are present, they are often treated as secondary data points or limited to predefined types.
  • Exploration: Users typically navigate through family trees by jumping from one node (person) to another, with minimal emphasis on the relationships themselves.

2. Why Edge Exploration Matters in Genealogy

RDF’s strength lies in its ability to describe and query relationships (edges) in a flexible and meaningful way. Here’s how edge exploration could revolutionize genealogy:

a. Discovering Complex Relationships

  • Example: Instead of just seeing that “John is the father of Mary,” you could explore:
    • Who else has a similar relationship to Mary? (“Who are Mary’s other parents?”)
    • What other relationships does John have? (“Who else is John connected to?”)
    • Patterns across multiple generations. (“How many generations back does this relationship type occur?”)

b. Exploring Non-Traditional Relationships

  • RDF allows you to go beyond standard genealogical relationships like “parent,” “child,” or “spouse.” You could model and explore:
    • Historical connections (e.g., “fought in the same war as”)
    • Geographic links (e.g., “lived in the same town as”)
    • Social ties (e.g., “godparent of,” “business partner of”)

c. Relationship-Centric Queries

  • RDF enables you to ask questions that focus on edges rather than nodes:
    • “Show me all the people who migrated from one country to another.”
    • “Who participated in events with my ancestors?”
    • “What types of relationships exist between families in this region?”

d. Visualizing Relationships

  • RDF-based tools can generate relationship-centric visualizations, such as:
    • Graphs where edges (relationships) are emphasized over nodes.
    • Dynamic filters to highlight specific types of relationships (e.g., only showing marriages or migrations).

3. How RDF-Based Tools Enable Edge Exploration

RDF’s flexibility makes it ideal for focusing on edges. Here’s how:

a. Rich Relationship Descriptions

  • RDF allows you to define custom predicates (edges), so you’re not limited to predefined relationship types.
  • Example: Instead of just saying “John → parent → Mary,” you could say:
    • “John → biological parent → Mary”
    • “John → legal guardian → Mary”
    • “John → stepfather → Mary”
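
One way to keep such specific predicates queryable as plain "parent" links is a sub-property hierarchy, the idea behind `rdfs:subPropertyOf` in RDFS. The sketch below imitates that in plain Python; the predicate names, the people, and the choice to treat all three relations as specialisations of `parentOf` are assumptions made for illustration.

```python
# Hypothetical hierarchy: each specific predicate points to its more general one.
SUBPROPERTY_OF = {
    "biologicalParentOf": "parentOf",
    "stepfatherOf": "parentOf",
    "legalGuardianOf": "parentOf",
}

triples = [
    ("John", "biologicalParentOf", "Mary"),
    ("Paul", "stepfatherOf", "Mary"),
    ("Anna", "legalGuardianOf", "Tom"),
    ("John", "neighborOf", "Paul"),
]


def is_a(predicate, target):
    """True if `predicate` equals `target` or specialises it via the hierarchy."""
    while predicate is not None:
        if predicate == target:
            return True
        predicate = SUBPROPERTY_OF.get(predicate)
    return False


def parents_of(person, triples):
    """Everyone linked to `person` by parentOf or any of its sub-properties."""
    return [s for s, p, o in triples if o == person and is_a(p, "parentOf")]


print(parents_of("Mary", triples))
```

The specific predicate stays in the data, so a query can still ask only for stepfathers or only for biological parents.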

b. SPARQL Queries for Relationship Exploration

  • SPARQL, RDF’s query language, makes it easy to explore edges dynamically.
  • Example Queries:
    • “Find all people connected by marriage within three generations.”
    • “List all individuals who were both neighbors and cousins.”
    • “Find all events where two unrelated families interacted.”
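
As a rough illustration of what the "both neighbors and cousins" query does, here is a tiny triple-pattern matcher in plain Python. It stands in for a SPARQL engine; the data and predicate names are invented, and the relations are treated as directed for simplicity.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = [
    ("Anna", "neighborOf", "Ben"),
    ("Anna", "cousinOf", "Ben"),
    ("Anna", "neighborOf", "Carl"),
    ("Dora", "cousinOf", "Carl"),
]

# Pairs connected by each relationship, then the pairs connected by both.
neighbors = {(s, o) for s, _, o in match(triples, p="neighborOf")}
cousins = {(s, o) for s, _, o in match(triples, p="cousinOf")}
both = sorted(neighbors & cousins)

print(both)
```

In SPARQL the same question is a query with two triple patterns sharing the same two variables.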

c. Linked Data Integration

  • RDF allows you to link your genealogical data with external datasets (e.g., historical records, census data), enriching edge exploration.
  • Example: You could explore how your ancestors were connected to historical figures or events.

4. Examples of Edge Exploration Use Cases in Genealogy

a. Migration Patterns

  • Trace migration routes by exploring relationships like “moved from” or “emigrated to.”
  • Example: See how families spread across countries over generations.

b. Community Networks

  • Explore social networks within a community by focusing on relationships like “neighbor of,” “witnessed marriage of,” or “business partner of.”
  • Example: Uncover how families were interconnected within a small village.

c. Historical Context

  • Link ancestors to historical events or organizations.
  • Example: Discover that your ancestor was part of the same regiment as a famous historical figure.

d. Marriage Alliances Across Families

  • Analyze inter-family connections through marriages over time.
  • Example: Visualize how two prominent families became interconnected through generations of intermarriage.

5. Visualizing Edge Exploration

Here’s how an edge-focused visualization might look in an RDF-based genealogy tool:

a. Graph View

A graph where:

  • Nodes represent people or events.
  • Edges represent relationships like parent-child, marriage, migration, or participation in an event.

For example:

[John] --married--> [Mary]
[John] --parent--> [Alice]
[Alice] --movedTo--> [New York]

b. Filters for Edge Types

Allow users to toggle specific relationship types:

  • Show only migration paths.
  • Highlight all marriages within a specific timeframe.
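
A filter like this boils down to selecting edges by type and, optionally, by date. The sketch below reuses the John/Mary/Alice graph from above; the years and the `Edge` structure are invented for illustration and are not taken from any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    kind: str
    target: str
    year: int  # illustrative year attached to the relationship


edges = [
    Edge("John", "married", "Mary", 1901),
    Edge("John", "parent", "Alice", 1903),
    Edge("Alice", "movedTo", "New York", 1925),
]


def filter_edges(edges, kinds, start=None, end=None):
    """Keep edges whose type is in `kinds` and whose year lies in [start, end]."""
    return [
        e for e in edges
        if e.kind in kinds
        and (start is None or e.year >= start)
        and (end is None or e.year <= end)
    ]


migrations = filter_edges(edges, {"movedTo"})
marriages_1900s = filter_edges(edges, {"married"}, start=1900, end=1909)
print(migrations)
print(marriages_1900s)
```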

c. Heatmaps for Relationship Density

Visualize areas or periods with dense relationships:

  • Example: A heatmap showing migration hotspots for your family.

6. Selling Point for RDF-Based Tools

By emphasizing edge exploration, an RDF-based genealogy tool could stand out from traditional node-centric software by offering:

  1. Rich Relationship Modeling: Go beyond basic family tree connections.
  2. Dynamic Relationship Queries: Answer complex questions about how people and events are connected.
  3. Interactive Visualizations: Provide new ways to see and understand relationships.
  4. Data Integration: Link genealogical data with external datasets for enriched context.

Final Thoughts

In genealogy, edges—the relationships—are often where the most interesting stories lie: migrations, marriages, alliances, conflicts, and collaborations. By leveraging RDF’s strengths in modeling and querying relationships, you can create tools that unlock these stories and provide users with deeper insights into their family history.


This is excellent. RDF, especially JSON-LD, is a superb format for transferring data or releasing data to Open-Data Open-Source projects and platforms such as Omeka S or Arches. I’ve been advocating for JSON-LD support in Gramps for years, but eventually gave up on that, as well as network graph support and standards like CSL for bibliography and source citation. That’s why I’m thrilled to see this tool developed. It’s a significant step forward, offering Gramps users the power of linked data and expanding its potential in broader historical research.

While RDF support in Gramps is crucial, I’d like to share some thoughts on data storage and research:

For internal data management, graph network formats, graph databases, or multi-model databases might offer benefits alongside RDF. These provide flexibility for manipulating, searching, and researching stored data. Many support multiple query languages, whereas RDF stores are queried almost exclusively with SPARQL. Some multi-model databases even offer SQL support with graph or document extensions, plus SPARQL queries, which is a powerful combination for genealogical research.

Your tool bridges a vital gap, making Gramps more attractive to external researchers. Those studying place names or location-based events involving people and relationships might now find Gramps useful, as they can easily export and convert data. This interoperability opens doors to more comprehensive studies, combining genealogical data with other historical information.

Gramps offers numerous tools that can benefit these types of historical researchers, and the ability to convert data to RDF opens up new possibilities for data integration and analysis across different domains of historical research.

Importantly, many of these researchers are well-versed in languages like Python or R. This could bring new contributors to the Gramps project, potentially expanding the developer base and enriching the software’s capabilities. They may also have ideas for other types of reports and views that could give Gramps users new perspectives.

Thank you again for this valuable contribution to the community. It not only enhances Gramps for genealogists but also broadens its appeal in the wider field of historical research.


I believe this tool should be highlighted as a unique feature on the Gramps website, in a way that search engines can pick up. This could attract people searching for tools that might be slightly outside Gramps’ intended use but which Gramps can clearly accommodate, such as the research applications mentioned above. Showcasing this capability could significantly broaden Gramps’ user base and appeal.


Disclaimer: This response has been significantly assisted by an AI assistant. The original thoughts and opinions were expressed in my native language (Norwegian) and then translated, analyzed, refined, and expanded by the AI. While I stand behind the content and ideas presented, I acknowledge the substantial role of this AI tool in shaping the final English text. The AI has endeavored to accurately represent my original thoughts and opinions in a more polished and articulate manner.


Thanks for your kind message! The tool may still need some refinement, as it currently discards certain aspects of the Gramps data (such as linked media). Additionally, the RDF vocabulary used could be further discussed. It remains experimental but I’d be happy to complete or adjust it if needed.

The conversion is expressed in RML, a vocabulary designed for transforming various data sources into RDF. Given that this technology is intended to work with databases, and considering that some RML engines are available in Python, it should be possible to develop a Python add-on that integrates directly with the Gramps GUI, if needed.

I’m a linguist working on undocumented languages spoken in small communities, and I use Gramps to record all biographical information about people. This conversion was necessary to aggregate those pieces of information with other data (text transcriptions, media) into a searchable dataset.
Best,
Sylvain


The important part is that you have actually created something that can be a starting point for an integrated or semi-integrated tool in Gramps.

It is a huge task to map everything, but it might be easier in version 6, when the pickled blobs are gone and it will be possible to perform queries directly on the database engine.

You have just shown people that Gramps can be used for more than “just” genealogy.


You should actually write a post/article about your specific use of Gramps and how you use it for your research.
It would be really interesting to see how others use Gramps for data that is not “just” genealogy…
It might bring more researchers to Gramps, when they see that it is actually possible to use Gramps for other projects and other types of research…


You asked for it, you get it :wink: I’m using Gramps to summarize and analyze the notarial records of a rural administrative district in southeastern Bavaria at the beginning of the 17th century (approx. 2500 farms). This project started as a “classical” genealogy project and is still “a bit” genealogy (my father comes from that district, which means that most of the farmers mentioned in the notarial records are in some way my ancestors or remote relatives), but my main focus is on topics like the economic situation, social networks, migration (not between countries or continents, but between farms), and historical demography. The notarial records contain information about the lives of farmers that is available nowhere else (basically there isn’t much information about the lives of “regular people” at that time anywhere), and “classical” genealogists usually extract only the small pieces of that information which they need for their family trees.

Quite similar to @sylvainloiseau, my starting point is the Gramps XML file, which is parsed by an R script and transferred into a PostgreSQL database. I also wrote an R script that converts the XML into a file that can be imported into Neo4j to analyze, e.g., social and economic networks. And I can only share your hopes for v6 doing away with the pickles and providing direct access to the Gramps database on modern database backends like PostgreSQL. In my opinion, Gramps has tremendous potential for a lot of digital history projects, potential that cannot be unlocked because of the pickles.


Interesting, this is similar, or at least parallel, to farm histories in Norway, where people moved around working at different farms.
Or the Scandinavian “legde-system”.

I see similarities in the use of Gramps as I have tried to use it for my sailor relatives, expanding more and more into a historical research of the Norwegian Mercantile Fleet and how sailors were moving from ship to ship… not the same of course, but I can see similarities in how to use the software…

I bet you would benefit from both Events on places (your farms) and Main/Sub Events, where you could register whole migration patterns between farms as one Main Event with multiple Sub-Events.
E.g.: a main event on the farm, and multiple sub-events for when people migrated to and from that farm…

Or you could use Events on places to register time-based changes in the economics of a farm, i.e. events that were not only “man dependent”, like taxes or tithes on the farm itself.

That way you may be able to build whole histories on the different farms, e.g.

  • when did people migrate to or from
  • when did the economics of the farm change, did the migration change it…
  • etc.

If you then had a subway graph of all those events, you would get a great view of how people moved, at what times and to which farm, and when two people were at the same farm during the same time period, etc.

I am using the subway graph function in Aeon 3 to try to determine when two ships among thousands were in the same port, and in this way try to find out when a sailor could change hire from one ship to another, since many sailed on ‘port to port’ contracts.
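
The underlying question (which two ships, or two people, were at the same place at the same time) is an interval-overlap check. Here is a minimal Python sketch with invented ship names, ports and dates; it says nothing about how Aeon 3 works internally.

```python
from datetime import date
from itertools import combinations

# (name, place, arrival, departure): illustrative data only.
stays = [
    ("Ship A", "Bergen", date(1900, 3, 1), date(1900, 3, 10)),
    ("Ship B", "Bergen", date(1900, 3, 8), date(1900, 3, 15)),
    ("Ship C", "Bergen", date(1900, 4, 1), date(1900, 4, 5)),
    ("Ship D", "Oslo", date(1900, 3, 9), date(1900, 3, 12)),
]


def co_located(stays):
    """Return (name1, name2, place, from, to) for stays that overlap in one place."""
    result = []
    for (n1, p1, s1, e1), (n2, p2, s2, e2) in combinations(stays, 2):
        # Two closed intervals overlap when each starts before the other ends.
        if p1 == p2 and s1 <= e2 and s2 <= e1:
            result.append((n1, n2, p1, max(s1, s2), min(e1, e2)))
    return result


overlaps = co_located(stays)
print(overlaps)
```

Comparing every pair is fine for a few thousand stays per port; for much larger datasets, sorting by arrival date and sweeping through would be the usual refinement.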

At the moment I use Obsidian for most of my research, because I can make main notes (main events) and sub notes (sub events) for an object (another note containing all the others), and it is easy to create links between different notes to create relations and “links” between objects.

But the storage of all the Markdown files can be a mess, even when I try to limit any folder hierarchy to max. 5 levels. But it is so easy to search or browse through plain text when you can organize it like you need…
In addition, I can use Zotero as a store for all my sources and just extract them to my md “vault” when they are used.

Being able to do this in Gramps would be great: I could then just extract information from the database with a sync to Neo4j or a multi-format database to analyze the data…
Instead of 3-4-5 steps, it would only be 2, maybe 3…


Thanks for sharing…


Not exactly. In “my” region the people were the owners of the farms and the social group of farmers was virtually closed. So farmers married each other, died, re-married and so on, leading to a situation where patrilineal family names (which were not actually in use at that time but can be reconstructed) “migrated” from farm to farm. From a conceptual point of view, this is almost identical to your situation, where people move around working at different farms, or sailors move from ship to ship, and to similar situations.

I don’t think so, or at least I currently cannot see the benefits. If the definition of events is granular enough, I do not see how an additional hierarchy (main and sub event) would really solve a problem. And events linked only to places but not to individuals are no problem at all. Gramps cannot do very much with those events, but after transferring the data into PostgreSQL, all doors are open. But maybe I just don’t see the light :wink:

I also moved to Zotero for the administration of my sources, so a really good interface between Gramps and Zotero would really be helpful …

Have you explored the Zotero support that @cdhorn included in his experimental CardView? I skimmed the surface of Zotero… but had the impression that it could suck me in and absorb all free time. So did not explore how tightly CardView intertwined with it.

Or just wrap it with a barebones piping plugin that exports the XML to a temp directory and calls your external tool to process that temp XML file via a CLI command.
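
A rough Python sketch of that "export to a temp directory, then call a CLI tool" idea follows. The Gramps side is deliberately left out: `export_xml` is a hypothetical callable standing in for whatever the plugin would use to write the XML, and the external command here is just a Python one-liner, not gramps2rdf's actual command line.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def export_and_convert(export_xml, command):
    """Write XML to a temp file, run `command <file>`, return the tool's stdout.

    export_xml: callable taking a Path and writing the XML there (hypothetical
                stand-in for the real Gramps export).
    command:    the external tool's command line as a list of strings.
    """
    with tempfile.TemporaryDirectory() as tmp:
        xml_path = Path(tmp) / "export.xml"
        export_xml(xml_path)
        completed = subprocess.run(
            [*command, str(xml_path)],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the tool fails
        )
        return completed.stdout


def fake_export(path):
    # Stand-in exporter so the sketch runs without Gramps.
    path.write_text("<database/>", encoding="utf-8")


# Stand-in external tool: a Python one-liner that echoes the file it is given.
echo_tool = [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read())"]
output = export_and_convert(fake_export, echo_tool)
print(output)
```

The temporary directory is removed automatically once the tool has finished, so nothing is left behind between runs.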

No, I haven’t, but I guess I’ll give it a try.


I see the difference, but still, there are a lot of similarities in how to register the data…

The individuals are linked to those events as participants, not as primary “owners” or “holders” of the event… So, you will be able to link as many people as you like.

And the main/sub events, see it as a history for the farm with many sub events, all registered on the farm instead of on each individual living there… the people can of course still be linked to that event, but as participants in different “parts” of the history.
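
The idea can be sketched as a small data structure. This is a hypothetical model for discussion, not how Gramps stores events; the farm, names, roles and years are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    description: str
    place: str
    year: int
    participants: list = field(default_factory=list)  # (person, role) pairs
    sub_events: list = field(default_factory=list)    # nested Event objects


def people_involved(event):
    """Collect everyone taking part in an event or any of its sub-events."""
    names = {person for person, _role in event.participants}
    for sub in event.sub_events:
        names |= people_involved(sub)
    return names


# A main event for the farm, with sub-events registered on the farm itself.
farm_history = Event(
    "History of the farm", "Farm X", 1600,
    sub_events=[
        Event("Arrival", "Farm X", 1605, [("Hans", "arrives")]),
        Event("Tithe assessment", "Farm X", 1610),  # no person attached
        Event("Departure", "Farm X", 1612, [("Hans", "leaves"), ("Maria", "leaves")]),
    ],
)

print(sorted(people_involved(farm_history)))
```

The tithe assessment shows an event that belongs to the place alone, while the arrivals and departures link people as participants with a role.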

I still can’t find that… nor the settings for it…

You have to install the Better BibTeX extension and select a checkbox for Citation in CardView’s massive set of configurations.

Here is a link to a link…

Yeah, I tried that back then… it didn’t work, so I removed it again. I use Better BibTeX for multiple other applications, but I never found a place to configure the folder path for where my BB file is stored.

I don’t think I’m going to “try” again until there have been 2-5 updates for Gramps 6 and it is actually stable… if someone then writes something about CardView and says that it is stable, I might try again…

I can’t spend time on alpha, beta and RC installations anymore…
And I don’t really need the Card View, even though it is a great alternative…