Graphviz Export - Wish for Feature

As some of you may have noticed, I use a few other software packages for my research; two of them are Cytoscape and Tulip …
Both of these can read .gv and .dot files.

My question, or wish, is whether it would be possible for someone who knows how to program it to create a graphical report that contains everything in the Gramps database, with all connections and relations.

The following would be a kind of “Spec”:

  • People as Nodes (Full name as Label, with birth and death data à la “Persson, Johan (b. 1870 - d. 1920)”)

  • Families as Nodes, with the Surnames of the Partners as Labels (e.g. “Family of Jonsson & Larsdatter”)

  • Events as Nodes, with Description or Type as Label (and dates, if it's not possible to define a date range in the format)

  • Citations as Nodes

  • Sources as Nodes

  • Repositories as Nodes

  • Places as Nodes, with the Hierarchy of the Places as a Sub-Graph

  • Notes as Nodes

  • Media as Nodes

  • All types of Relations, Roles, and Types as Edges (this would also include Gramps Internal Links in Notes), with the Name/Description as Label, and also with Dates if there are no date attributes in the Graphviz format

  • It would be great if all Media Files could have the path added as a “hyperlink”, and it would be great if any other hyperlink could be added
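
For anyone who wants to experiment, here is a minimal sketch of how the spec above might map onto a DOT file. It is plain Python, not a Gramps report plugin, and it does not use the Gramps API. The sample IDs, labels, shapes and the media path are made up for illustration; only the DOT syntax itself (quoted IDs, `label`, `shape`, and the `URL` attribute for hyperlinks) comes from Graphviz.

```python
def quote(text: str) -> str:
    """Wrap text in double quotes for DOT, escaping embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def to_dot(nodes: dict, edges: list) -> str:
    """Render nodes and labelled edges as a DOT digraph.

    nodes: {node_id: {attribute_name: value}}
    edges: [(source_id, target_id, label)]
    """
    lines = ["digraph gramps {"]
    for node_id, attrs in nodes.items():
        attr_text = ", ".join(f"{key}={quote(value)}" for key, value in attrs.items())
        lines.append(f"  {quote(node_id)} [{attr_text}];")
    for source, target, label in edges:
        lines.append(f"  {quote(source)} -> {quote(target)} [label={quote(label)}];")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data; the IDs only mimic Gramps-style IDs.
nodes = {
    "I0001": {"label": "Persson, Johan (b. 1870 - d. 1920)", "shape": "box"},
    "F0001": {"label": "Family of Jonsson & Larsdatter", "shape": "ellipse"},
    "E0001": {"label": "Birth (1870)", "shape": "diamond"},
    # The URL attribute carries the media path as a hyperlink.
    "O0001": {"label": "Portrait", "shape": "note", "URL": "file:///path/to/portrait.jpg"},
}
edges = [
    ("I0001", "F0001", "Child"),
    ("I0001", "E0001", "Primary"),
    ("I0001", "O0001", "Media"),
]

dot_text = to_dot(nodes, edges)
print(dot_text)
```

Saving the printed text as a .gv file should give something the graph tools mentioned in this thread can open. Citations, sources, repositories, places and notes would just be more entries in `nodes` with their own shapes.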


Graph software like Gephi, Tulip, Cytoscape, Palladio, yEd, Social Network Visualizer and the new Constellation are great tools for analyzing data and consolidating data from other places, but all of them need CSV or graph-specific formats …

The GV file generated from, e.g., the Relationship Graph report can be opened directly in both Tulip and Cytoscape, so I know it works …

Another wish would be for someone to make a full export to a GraphML or JSON-LD format …?
I know yEd can import GEDCOM, but GEDCOM is an extremely limited format in many ways …
I have a feature request for a “full” CSV export, but it seems not many use that format, even though it’s one of the most utilized formats in the “research industry”, both for DNA and other linked data …

Does anybody else use these kinds of tools in their research?

Given the potential complexity of data attached to persons and families etc., I would think that this would give Graphviz quite the complex job to graph. I would expect that each person would get a constellation of nodes around him, many of them quite large. Similar for Families. And many objects also include notes and citations, so some of the events, and media would get their own constellation.

In my own database, getting the ~1000 persons graphed so that they can be seen is already quite the challenge; I cannot imagine how you could actually use such a graph, even if Graphviz didn’t choke on it. If you are serious about this I expect you will have to learn Python and Gramps ways of doing things, or get someone to do this for you. Good luck.

Another related question:
Has anyone seen a good graphic of such a constellation?

I was thinking that sometimes my filters don’t find everything expected. But manually validating the results is too difficult.

So, over time, I’ve created a few pseudo trees which have examples of various relationships… but with the name and/or ID representative of the relationship (instead of John Smith & Mary Jones). I can merge trees to test my filters against complex criteria.

I was thinking that if someone had seen a constellation visualization that represented a truly diverse dataset, it would be good to recreate that tree & visualization in Gramps. Then have a filter test that would always have all those nodes in the same spots but light up the nodes that the filter allowed.

Something like testing those old strings of Christmas tree lights every year!

And, yes, having more than people as Nodes would be intriguing.

Have you looked through GEPS 30: New Visualization Techniques? Fascinating reading… particularly when you follow the references!

It seems like there are a LOT of academic classes doing programming projects involving Genealogy now. Like the Summer of Code, it represents a great opportunity to get creative expansions of Gramps. Promoting Gramps as a framework for class projects might be an opportunity.

Could be like trying to drink from a fire hose though!

It’s to be able to find connections between “nodes” that you can’t easily see in a “normal” family tree …
e.g. the connection between multiple people and a place that are connected through a network of sources and other people …

And as I wrote, it is meant to be used in software other than Graphviz. (Graphviz chokes and crashes on a report with 1000 nodes if you use 600 DPI resolution for the graph.) The network graph tools I mentioned can easily handle graphs with 10-100k nodes, and millions on the right hardware. They are interactive, many of them have a lot of features for working with interactive graphs, and they can export to formats that can be used for web solutions like the Arches Project, Heurist or Researchspace.

With software that creates and works with network graph algorithms, there is actually no problem working with a few million records …
… Cytoscape is used to analyze millions of DNA sequences, and to find pathways and connections that are impossible to find in any other way than with some kind of network graph …

Neo4j and other graph databases are good examples; they wouldn’t have been created if there were no use for graph networks …

Lineage-linked research has a lot of limitations. Even though Gramps goes a long way to overcome many of them, part of the problem is still that there are few ways to analyze and view the data that can be recorded … But Gramps is unparalleled and without a doubt the very best program for storing extended information about relatives and individuals; my wish is that it would also be possible to analyze this data in different ways …
I have already found family connections between Norwegian families that I wouldn’t have found using any genealogy software alone, simply because there are no or very few analysis features in most genealogy software …

The Relationship Graph report already has a lot of this, but it does not add sources and so on …

So the reason I asked for this was that it is already a known format for many of you … For me the best way would actually be a JSON-LD format, because even though the Gramps XML format is far better than a GEDCOM file, there is no software supporting it … A full DB dump to CSV would be an alternative, but as you have stated earlier, “no one” uses it …

All I wish for is some way to be able to use the data in multiple ways, and this was one of the easiest I could think of: getting the data out in a format that software working with network graphs can import …

Yes, I have looked at it and read a lot about it …
But if you look at them, all of them are person and family related only … none show the connections between Places, Events, Citations and Sources …

One of my interests is to present my data with software like The Arches Project (overkill), Heurist, Omeka or Researchspace, just to mention a few … A big part of the reason is that I do more than lineage-linked research, and many of the connections I find, and the history connected to them, are way off my “family”, but some of it is a large part of history. For example, my GG-grandfather held nearly 30 patents worldwide; one of them was translated and signed by Septimus Crowe, whose father was the British Consul in Norway and also an important person for the British investors in the Norwegian railroad …
Or that my GGG’s father-in-law owned one of the most popular Tivolis outside Christiania (Oslo) at that time, and that the King of Denmark actually visited that Tivoli …
But there is little written history about all those connections …
These are connections I found using a combination of Zotero, Freeplane, Cytoscape and Aeon Timeline, while I have all my data registered in Gramps. The problem is that I can’t find those relations in Gramps, even though I register all the data I find, because there is no “Shortest Path” or “Nearest Neighbours” …
If you register a repository in Gramps and connect maybe 50 sources to it, with maybe 300 citations on those 50 sources, and for those 300 citations you may have 400-500 or maybe 1000 events connected to 50-100 sub-places, then just 2 of all the people you have registered from those citations may be connected in some way you couldn’t even think was possible, e.g. a crossing 8 generations back, or through a document with a different spelling of a name …
These are things you can’t find if you don’t use a graph. The Deep Connections Gramplet is actually a textual variant of a network graph, but it has some limitations … understandable …
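
To make “Shortest Path” and “Nearest Neighbours” concrete, here is a rough sketch of both as a breadth-first search over a mixed graph of people, events, citations and sources. This is ordinary Python on made-up node names, not something Gramps or Cytoscape provides in this form; “nearest” here simply means “fewest hops”, which is an assumption on my part.

```python
from collections import deque


def build_adjacency(edges):
    """Turn undirected (a, b) pairs into {node: set_of_neighbours}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a fewest-hop path as a list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency.get(current, ()):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


def neighbours_within(adjacency, start, max_hops):
    """Return {node: hop_count} for every node within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    distance.pop(start)
    return distance


# Hypothetical data: two people linked only through a shared source.
edges = [
    ("person_A", "event_1"), ("event_1", "citation_1"),
    ("citation_1", "source_1"), ("source_1", "citation_2"),
    ("citation_2", "event_2"), ("event_2", "person_B"),
    ("person_C", "event_3"),
]
adjacency = build_adjacency(edges)
print(shortest_path(adjacency, "person_A", "person_B"))
print(neighbours_within(adjacency, "source_1", 1))
```

The point of the example is that person_A and person_B have no person-to-person link at all; the path only exists because events, citations and sources are nodes too.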

The little 5-minute video on this page actually explains it very well:
https://www.researchspace.org/

Yes, as I mentioned in an earlier discussion:

I have experimented with the following approach:

  • export to Gramps XML format (uncompressed)
  • use XSLT to rearrange the data into GraphML format
  • use a tool such as yEd to view the GraphML file

I would not do this for family tree graphs, since Gramps already has so many good capabilities. Rather, the approach could be used to try other kinds of graphs. Anything can be defined as a “node” or an “edge”. For example, the nodes in the graph could represent people and places, and the edges could represent the events that relate the people to the places. The size, color, etc. of the nodes and edges can signify other variables. Graphs can also be nested.
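
As an illustration of the “people and places as nodes, events as edges” idea, here is a small sketch that writes GraphML with Python's standard library instead of XSLT. This is a swapped-in technique, not the XSLT I actually used, and it starts from hand-made sample data rather than parsing a Gramps XML export. The key IDs (`d_type`, `d_label`, `e_label`) and the sample records are my own assumptions.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def tag(name: str) -> str:
    """Qualify an element name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{name}"


def build_graphml(nodes: dict, edges: list) -> str:
    """nodes: {id: (type, label)}; edges: [(source_id, target_id, label)]."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    root = ET.Element(tag("graphml"))
    # Declare the data keys before they are used on nodes and edges.
    for key_id, domain, name in (
        ("d_type", "node", "type"),
        ("d_label", "node", "label"),
        ("e_label", "edge", "label"),
    ):
        ET.SubElement(root, tag("key"), {
            "id": key_id, "for": domain,
            "attr.name": name, "attr.type": "string",
        })
    graph = ET.SubElement(root, tag("graph"), {"id": "G", "edgedefault": "undirected"})
    for node_id, (node_type, label) in nodes.items():
        node = ET.SubElement(graph, tag("node"), {"id": node_id})
        ET.SubElement(node, tag("data"), {"key": "d_type"}).text = node_type
        ET.SubElement(node, tag("data"), {"key": "d_label"}).text = label
    for index, (source, target, label) in enumerate(edges):
        edge = ET.SubElement(graph, tag("edge"), {
            "id": f"e{index}", "source": source, "target": target,
        })
        ET.SubElement(edge, tag("data"), {"key": "e_label"}).text = label
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: two people related to one place through events.
nodes = {
    "I0001": ("person", "Persson, Johan"),
    "I0002": ("person", "Larsdatter, Anna"),
    "P0001": ("place", "Christiania"),
}
edges = [
    ("I0001", "P0001", "Birth"),
    ("I0002", "P0001", "Residence"),
]
graphml_text = build_graphml(nodes, edges)
print(graphml_text)
```

The `type` and `label` values are stored as plain GraphML data attributes, so a viewer may need to be told how to display them; the size, color and similar variables mentioned above would be further keys of the same kind.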

Tools such as yEd and others will also do some automatic clustering of data in graphs. I have tried using that to make some sense of my DNA matches (importing the csv files directly into yEd).

More recently I found an “R” language package called “ggenealogy” (with two g’s at the beginning) that has some interesting features. The examples in the article are related to other types of genealogy (soybeans and academics), but they would be applicable to family history and things as well. It is not just for visualization, but also computation. Even if you don’t want to use R, it might give you some ideas.

The ggenealogy package uses another package called igraph, for which there is also a Python version.

Maybe I’m reading it wrong, but it seems like this idea starts in one direction & takes a left turn.

One is about output formats that would write the current object-oriented graphing output in another format, one that is specific to particular chart-editing tools. This would allow graphs (that might have a few awkwardly placed nodes in a Gramps-generated chart) to be lightly tweaked.

The other is a data export that writes Gramps objects in another node-and-link file format. This allows another visualization tool to lay out the nodes in network charts that don’t exist in Gramps… or that are too complex for any of our current visualizations.

There is an interesting item in George’s comment:

If you’re doing the same XML transforms with XSLT repeatedly, then you should be able to write up a detailed list of transforms. Those details would be a great stepstool to an XML dialect Export add-on.

It’s a two-part thing … The best would be an export to a JSON-LD or GraphML format, but a .gv or .dot file can be used in some of the graph software without the need to program an XML/XSLT conversion …

One of the reasons I wrote this was that if there were a way to output data in a graph format, there would be a bigger chance that more researchers who use those tools would see a benefit in using Gramps for registering their data, especially using PostgreSQL …

I never write a wish that is “only for me”; I always try to think 1-2 steps ahead … A while ago someone talked about the need for more developers, and the best place to find anyone who works with historical data, can program in Python, AND understands this type of “system” would be a research environment, like a university or a research lab for, e.g., social or humanities research …
I have found tens of databases at British and American universities with network graphs of historical populations, everything from Welsh to Black Americans … Most of them are created in spreadsheets and imported into some online network graph system (very often something the researchers have created themselves) …

The thing with any software is that if it can’t share data with other software products, it will always end up as niche software and very often die; Gramps is actually something of an exception to that …

Digression: Last night I was surfing the Gensoft list of genealogy software, and there were hundreds of them listed, but only a few that were actively developed … and most of them had a lot of negative reviews …

Yes, I can manage with a combination of Excel, CSV, XSLT, XML and JSON, but it’s cumbersome …
Wouldn’t it benefit Gramps in the long term, and not just me, to have a format or a way to create a graph that could be imported into other research tools, so that researchers who use those tools look at Gramps as a registration system for their data instead of Excel or CSV? Or am I thinking in a totally wrong direction here? More advanced users = more resources for the community and dev?

Another huge benefit of a graph format would be that it could be used to import data into online tools like Omeka, Heurist or Researchspace, and even Arches … or some of the plug-ins that exist for Wordpress or Drupal …

Here is a subset of the Gramps “example” database converted to GraphML via XSLT.

It has only a few node types (person, family, event, place) and only a few details (names of people and places). The edges simply connect the nodes. Much more could be done with the data. But maybe it is enough for you to try loading into one of your preferred tools.

Thanks, I can maybe look at it and see if I can figure out a workflow for doing it …

For those who may be wondering why bother with another graphing tool, when Gramps already produces many nice charts: another benefit of using a graph viewing program is that it can give you a sense of the scope of your research, in a way that you don’t get just by looking at trees containing only people and families.

In the attached screenshot (which I hope you can enlarge to view), notice the disconnected sets of nodes over on the right side. In this case, they include not only persons and families, but also events and places, that are not connected to the mass of data on the left. (Again, for this example I used the Gramps “example” database.)
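
The disconnected sets in the screenshot are what graph tools call connected components. As a minimal sketch of how they can be found without a viewer (plain Python on invented node names, not how yEd does it internally):

```python
def connected_components(node_ids, edges):
    """Group nodes into connected components, largest first.

    node_ids: iterable of all node IDs (so isolated nodes are included)
    edges:    iterable of undirected (a, b) pairs
    """
    adjacency = {node_id: set() for node_id in node_ids}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)


# Hypothetical data: one main cluster, one small island, one orphan place.
node_ids = ["I1", "I2", "F1", "E1", "P1", "I3", "E2", "P9"]
edges = [
    ("I1", "F1"), ("I2", "F1"), ("I1", "E1"), ("E1", "P1"),
    ("I3", "E2"),
]
for component in connected_components(node_ids, edges):
    print(len(component), component)
```

Everything after the first (largest) component corresponds to the islands on the right side of the screenshot: records that are in the database but not tied to the main body of research.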

Tools such as yEd (which I used here) allow you to vary the shape, size, color etc. of nodes and edges based on attributes in the data. So, for example, you could color the person nodes green, the family nodes red, etc. I have not done that here.

Another feature of such tools is to find hierarchies or clusters within the data. And it’s worth repeating that the nodes and edges can represent whatever you choose to put in the input file – sources, citations, anything.

Thanks for helping me explain this … you did a much better job than me :)

And even though it would be “cool” to have all this in Gramps, it’s not needed for those of us who can work with files and scripts. But for those who can’t transform one file format into another, it would be a great help if there were some way to just export (or, as I asked for, create a network graph of) the complete database, including all “Nodes” and “Edges”, in a graph format that some of these graph tools can use … Then it could be exported from those programs to others …

I totally agree with those who think that adding this type of network graph, with all the analysis algorithms and so on, would start to bloat Gramps. But if, e.g., the JSON export could be made to follow the JSON-LD format (which most graph tools support), it would be very helpful …

In an export like that there could be a few simple choices, e.g.

  • What will you use as Edges?
      • Citations
      • Roles
      • Relations
      • Associations
  • Will you add the full Place Hierarchy or only the lowest-level Place?

And so on …
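
Just to show how small such a set of choices could be, here is a hedged sketch of the options as a Python data structure. None of these names (`ExportOptions`, `edge_kinds`, `full_place_hierarchy`) exist in Gramps; they are invented for illustration, and so is the sample data.

```python
from dataclasses import dataclass

ALL_EDGE_KINDS = frozenset({"citation", "role", "relation", "association"})


@dataclass(frozen=True)
class ExportOptions:
    """Hypothetical export choices, mirroring the list above."""
    edge_kinds: frozenset = ALL_EDGE_KINDS
    full_place_hierarchy: bool = True


def select_edges(edges, options):
    """Keep only (source, target, kind) edges whose kind was selected."""
    return [edge for edge in edges if edge[2] in options.edge_kinds]


def place_chain(place_id, parent_of, options):
    """Return the place alone, or the place followed by its enclosing places."""
    chain = [place_id]
    if options.full_place_hierarchy:
        # Walk upwards; the 'not in chain' check guards against loops in the data.
        while chain[-1] in parent_of and parent_of[chain[-1]] not in chain:
            chain.append(parent_of[chain[-1]])
    return chain


# Hypothetical sample data.
edges = [
    ("I0001", "C0001", "citation"),
    ("I0001", "E0001", "role"),
    ("I0001", "I0002", "association"),
]
parent_of = {"Farm": "Parish", "Parish": "County", "County": "Country"}

options = ExportOptions(edge_kinds=frozenset({"citation", "role"}),
                        full_place_hierarchy=False)
print(select_edges(edges, options))
print(place_chain("Farm", parent_of, options))
print(place_chain("Farm", parent_of, ExportOptions()))
```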


And just so people understand: all these graph tools are zoomable, and most of them are interactive, so you can zoom in on a single node or a group of nodes, and you can add nodes and edges where needed. In this software you can analyze different types of paths between nodes in a visual way that is difficult in a “normal” family tree, because in a family tree you can’t get the connections that go through a Place or an Event … You only get the person-to-person connections that you have already made …

yEd looked interesting. Downloaded & installed it, during which I noticed that it imports GEDCOM. And the GEDCOM dataset exported from example.gramps was far more complete than the transformed XML file… without any tweaking.

[Screenshot: ‘One Click’ layout]

[Screenshot: Organic layout]

Yes, yEd is quite handy for viewing GEDCOM files, if you just want to see persons and families as nodes.

To be clear, the example graphml file contains only a subset of the data elements in the xml export, but it does include separate nodes for events and places, thus making the place hierarchy visible. (In yEd, you need to use Edit - Properties Mapper to decide how you want the different node types to be sized, colored, labeled, etc.)

The point was that one can create nodes and edges from any type of data, not just persons and families. The only limit is one’s ability (and patience!) in coding the XSL, and mine is somewhat limited.

I found this GEDCOM-to-JSON-LD converter. That led me to this site where you can load a GEDCOM file for some interesting visualizations.

Here is someone else’s project. It concludes that it would be necessary to “create a GEDCOM-appropriate @context object” which of course is difficult due to all of the issues with GEDCOM and the way different programs use and extend it. The first converter above makes no attempt at that, but rather “maps a few [existing] ontologies to various parameters in GEDCOM”.

I imagine it could be possible to create a @context object based on the Gramps data model, and it could leverage things like GeoNames. Then perhaps a JSON-LD export could be possible. But I really don’t know much about it.
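
To give a feel for what that might look like, here is a very rough sketch of one person as JSON-LD, built with Python's `json` module. It borrows schema.org terms (`Person`, `name`, `birthDate`, `deathDate`, `birthPlace`) instead of a real Gramps-based @context, which does not exist as far as I know. The `https://example.org/gramps/` base URL, the input field names and the sample record are all placeholders I made up; a GeoNames URI could stand in for the place ID.

```python
import json

# Assumption: schema.org terms stand in for a Gramps-specific vocabulary.
CONTEXT = {
    "schema": "https://schema.org/",
    "name": "schema:name",
    "birthDate": "schema:birthDate",
    "deathDate": "schema:deathDate",
    # "@type": "@id" marks the value as a link to another node, not a string.
    "birthPlace": {"@id": "schema:birthPlace", "@type": "@id"},
}
BASE = "https://example.org/gramps/"  # hypothetical base URL for node IDs


def person_to_jsonld(person: dict) -> dict:
    """Map a simple person record (hypothetical field names) to JSON-LD."""
    doc = {
        "@context": CONTEXT,
        "@id": BASE + "person/" + person["gramps_id"],
        "@type": "schema:Person",
        "name": person["name"],
    }
    for key in ("birthDate", "deathDate"):
        if person.get(key):
            doc[key] = person[key]
    if person.get("birth_place_id"):
        doc["birthPlace"] = BASE + "place/" + person["birth_place_id"]
    return doc


sample = {
    "gramps_id": "I0001",
    "name": "Johan Persson",
    "birthDate": "1870",
    "deathDate": "1920",
    "birth_place_id": "P0001",
}
jsonld_text = json.dumps(person_to_jsonld(sample), indent=2, ensure_ascii=False)
print(jsonld_text)
```

The hard part is not writing the JSON; it is deciding what the terms in the @context should be for events, roles, citations and the rest of the Gramps data model.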

You got me a bit curious to see what JSON-LD might look like after being converted from GEDCOM (I understand GEDCOM and JSON, but had never heard of JSON-LD before today). I tried to use the converter you mentioned above, only to discover it has a lot of bugs around importing GEDCOM files. It looks like the author was working with a very specific GEDCOM subset, and did not include code to deal with the many GEDCOM possibilities. In any event, it would not convert any of several trees I tried. If anyone gets this to work (or any other JSON-LD from a GEDCOM source), I would like to examine both files to see just how hard JSON-LD would be to create.

You might want to look at RDF too … if you haven’t already …
GraphML, JSON-LD and RDF are all “formats” that have been used in linked data in different ways …