"Sets" in Gramps

Currently, by means of custom filters, Gramps allows users to create dynamic “sets” of a given object type, which can then be manipulated in certain ways (tagged, deleted, exported, etc.).

I wonder if it might be useful to have a more formal, persistent kind of “set” object in Gramps?

A set could be homogeneous, containing only objects of a given type, as well as other sets of that type. One example would be a set of events, as has been discussed in several other posts. Other examples would be clans/tribes/etc. which could be sets of families. A set of places could be helpful when multiple but non-enclosing or non-hierarchical locations have some intrinsic relationship. Perhaps even a set of sources could be useful. Anyway, it would be good to make the feature as generic as possible, rather than working for only certain object types.

A set could also be heterogeneous, containing any mix of object types and other homogeneous or heterogeneous sets. This might be useful for representing an organization which consists of both people and places (or addresses) - a colony, a company, a diaspora, etc.

At a minimum, a set would have a handle, a Gramps ID, and a name/description. It could also have attributes, notes, tags, and maybe even citations. (Possibly it could also have a date range, but that might complicate things unnecessarily, since some of the underlying objects already have date ranges of their own, so I would recommend against that.)

There could be filter rules to return objects of specific sets, or sets meeting some specified criteria, just as there are filter rules for each type of object. This would enable all existing reports and displays which have filtering to access them.

Sets could overlap; they need not be hierarchical. Filter rules could return the union or intersection of sets, just as they currently do when combining multiple rules.
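To make the overlap idea concrete, here is a minimal Python sketch, assuming a set is nothing more than a named collection of object handles. The set names, handles and function names are all invented for illustration; none of this is existing Gramps API.

```python
# Hypothetical sets: each name maps to the handles of its members.
# The handles are made up; they are not real Gramps handles.
sets = {
    "Colony A": {"P001", "P002", "PL01"},
    "Company B": {"P002", "P003", "PL01"},
}

def union_of(names, sets):
    """Return every handle that belongs to at least one of the named sets."""
    result = set()
    for name in names:
        result |= sets[name]
    return result

def intersection_of(names, sets):
    """Return only the handles that belong to all of the named sets."""
    members = [sets[name] for name in names]
    return set.intersection(*members) if members else set()

either = union_of(["Colony A", "Company B"], sets)
both = intersection_of(["Colony A", "Company B"], sets)
```

Here `either` holds all four handles, while `both` holds only the person and the place that the two sets share.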

Sets would not be exported to GEDCOM, except perhaps to somehow flag each object as belonging to a particular set. But they could be exported to other formats, and possibly imported from some other formats.

I continue to think about all of this, but am now at the point where I am interested in feedback on the potential utility of this feature.

I think there are many interesting ways to record and organize data which don’t fit into GEDCOM. Saving a set of objects is less of a problem than representing it to users, so that they can understand it and benefit from the changes. Clans/tribes might be possible just as a list, but how would you present heterogeneous sets with complex relationships, where a list is not sufficient?

One idea I had recently was creating organisation objects:
Families record closely related individuals through biological (and non-biological) relationships (e.g. parent-child). Associations do that for friends and acquaintances, and organisations would connect unrelated individuals through their jobs/positions in an organisation.

Person events, especially occupation and military events, should take places as well as positions. The organisation should be a (hierarchical) list of job positions. Each position would be a list of individuals, each with a date or time range during which they held the position.

Yes, that is a challenge. A generic data storage solution would certainly benefit from a generic reporting solution. So initially I would probably just export the data and use some third-party tools to present it graphically. I might also try to learn how to create my own text reports within Gramps. But I realize that many users would not want to do either of those things.

Eventually, maybe Gramps could be enhanced to create more varieties of graphs. It already creates bipartite graphs (having both families and individuals), so I hope it would be possible to create other graphs having multiple types of nodes and edges.

I think that more specific enhancements such as organisations and super-events are good ideas, and having standard solutions for those would save people the trouble of creating their own solutions using “sets”.

I am curious to know what others might do with “sets” if they were available.

There are multiple ways to do this…

But one way is to look at how things are solved in a network graph and its clusters.
You can base a Cluster, Collection or Set on specific keywords, attributes or types. (In most NGTs you actually just define the column you want to be your cluster data, or you define a calculation over some data.)

So if all objects in Gramps get attributes, it would be easy to set multiple “Cluster” or “Set” attributes so that one object can be a member of multiple “Sets”. A Place, for example, can belong to multiple “Clan” sets, since multiple Clans can have clan members living in different places…

This could also be used for Military Units, or any other Legal Unit that doesn’t fit as a place or “Event”…


The problem as I see it is that for this to “work”, Gramps needs to move away from the lineage-linked research model (it should of course still support it) and move a little more in the direction of a Historical Humanities Research Tool, where it is easier to register, research, analyze and visualize cross-object relations. Its main purpose should of course still be People and their Families, but if it became even more flexible in the ways subjects and objects are related, and in how we can register those relations (links), it would become a lot more than “another genealogy software”.

And because of Gramps’ data model, it should be possible without a lot of design changes…

Just as an example, it is not much of a problem to visualize this type of Set in network graph software (there it’s called “Clusters”). In e.g. Gephi or Cytoscape, you can define any field as the data field for a cluster. The same can be done in Graphviz, where you just define multiple clusters and any node or edge can be in any cluster.

With this feature it would also be easy to create Venn charts to find crossovers, i.e. objects that are in more than one Set/cluster.
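Finding those crossovers is simple once memberships are available as data. The following Python sketch assumes each Set/cluster is a named collection of handles; the names and handles are invented, and this is only one possible way to compute what a Venn chart would show.

```python
from collections import Counter

# Hypothetical memberships: set/cluster name -> handles of its members.
memberships = {
    "Clan of Aasgard": {"P001", "P002", "PL01"},
    "Clan of Midtgaard": {"P002", "P003"},
    "Military unit X": {"P003", "PL01", "P004"},
}

def crossovers(memberships):
    """Map each handle found in two or more sets to the sorted names of those sets."""
    counts = Counter(h for members in memberships.values() for h in members)
    return {
        handle: sorted(name for name, members in memberships.items() if handle in members)
        for handle, n in counts.items()
        if n > 1
    }

shared = crossovers(memberships)
```

Objects that appear in only one set, such as `P001`, are left out of `shared`.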


It wouldn’t even be necessary to add a lot of analysis tools, because those tools already exist. All that would be needed is a few interchangeable export/import formats, so that our data could be exchanged with other research tools like Cytoscape, Zotero, Gephi, Head Start, Vistorian, Arches, Heurist, Constellation, etc., or to make the famous web API support read/write/update.

Yes, I am gonna repeat this until I’m either dead, kicked out or the features have been implemented… (most likely in that order)…

I’m really glad more and more users are asking for functions and features that extend into the wider Historical Humanities Research (HHR) field.


And just a word to all those thinking about this as “bloating”…

Adding functional features that are within the field of the tool is NOT bloating.


I support this and any other suggestion that gives Gramps the feature set it deserves, to be even greater…


Which export/import formats are needed?

You can easily present those as lists, just as you do with Families or Places today.
You can just put them in sub-sets, collections based on their type.
For example:

  • The Clan of Aasgard Set
    • People
    • Families
    • Events
    • Places
    • Sources (The Saga of the Norsemen)
  • The Clan of Midtgaard Set
    • People
    • Events
    • Media
    • Sources (The Saga of the Norsemen)
  • The Clan of McMacDonald
    • Families
    • Events
    • Places
    • Sources

And so on. All the sub-sets can be generated automatically based on the type of object or on an attribute the user defines, nothing else. So if you didn’t add sources, you wouldn’t get a “Sources” sub-set.
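A small Python sketch of that automatic grouping, assuming the members of a set are available as (object type, handle) pairs. The function names and handles are invented for illustration and are not Gramps API.

```python
from collections import defaultdict

def sub_sets(members):
    """Group (object_type, handle) pairs by type; types with no members never appear."""
    grouped = defaultdict(list)
    for object_type, handle in members:
        grouped[object_type].append(handle)
    return dict(grouped)

def tree_lines(set_name, members):
    """Render a set and its generated sub-sets as indented bullet lines."""
    lines = [f"• {set_name}"]
    for object_type in sorted(sub_sets(members)):
        lines.append(f"  • {object_type}")
    return lines

# Hypothetical members of one set; no sources were added, so no "Source" sub-set.
midtgaard = [("Person", "P001"), ("Event", "E001"), ("Person", "P002"), ("Media", "M001")]
lines = tree_lines("The Clan of Midtgaard Set", midtgaard)
```

Because the sub-sets are derived from the members, nothing has to be maintained by hand when objects are added to or removed from a set.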

At least one OWL format; RDF/XML or JSON-LD is widely used.
And one or two network graph formats (the most feature-rich and most widely supported ones).
Here are a few (but I’m not an expert in this); this article covers some standards:
Messina, A. (2018). Overview of Standard Graph File Formats. https://doi.org/10.13140/RG.2.2.11144.88324

That document mentions three…

  • graphml
  • gexf
  • gml

But you also have the Tulip and Pajek formats, and I think NetworkX has its own format too. It is important, though, to use one of the most widely supported and feature-rich formats, so that it is enough to maintain one export/import of each.

For example, the JSON-LD format is supported by a lot of web-based research and research presentation tools, and many converters already exist from that format to other vocabularies.

Meanwhile graphml, gexf or gml can be opened by most network graph software.
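To give a feel for how small such an export could be, here is a hedged Python sketch that writes a minimal GraphML document using only the standard library. It assumes sets and their members are exported as nodes, with one edge per membership. That mapping, the `to_graphml` function and the handles are my own assumptions, not an existing Gramps exporter.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(nodes, edges):
    """Build a GraphML string from {node_id: label} and (source, target) pairs."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    # Declare one string attribute called "label" that nodes may carry.
    ET.SubElement(root, f"{{{GRAPHML_NS}}}key", {
        "id": "d0", "for": "node", "attr.name": "label", "attr.type": "string",
    })
    graph = ET.SubElement(root, f"{{{GRAPHML_NS}}}graph",
                          {"id": "G", "edgedefault": "undirected"})
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": node_id})
        data = ET.SubElement(node, f"{{{GRAPHML_NS}}}data", {"key": "d0"})
        data.text = label
    for source, target in edges:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge",
                      {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")

# Hypothetical data: one set node, two member nodes, one edge per membership.
nodes = {"S001": "Clan of Aasgard", "P001": "Person 1", "PL01": "Place 1"}
edges = [("S001", "P001"), ("S001", "PL01")]
xml_text = to_graphml(nodes, edges)
```

A real exporter would also need to carry object types, attributes and dates, which GraphML handles through additional `key`/`data` declarations.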

There are already multiple reports in Gramps that utilize Graphviz. Its output can be read by e.g. Cytoscape (with some limitations on what information you add: the import doesn’t support line breaks in the node name) and Tulip.
But I don’t think it’s an easy format to reimport into Gramps, and I don’t know whether many NG tools save to Graphviz. (I have only tested import to Tulip and Cytoscape; I got the developer of the Cytoscape add-on to update it to be UTF-8 compatible.)

This sounds interesting. Can you explain how Sets would differ from Tags? I’ve never used Tags in Gramps, but it seems like they would fulfill the general purpose of arbitrarily grouping and selecting related objects.

Yes, they can currently serve that purpose. For example, you can create custom filter(s) to define the set of related objects, then tag the items in the resulting list.

The main difference is that Sets would have their own Attributes, Notes, Tags, Citations. And as I think about it more, maybe in addition to a Name/Description, they should also have a user-defined Type.

I admit it’s a bit of a solution looking for problems, but I like to think that, by being a generic approach, it might serve many needs.

While thinking of possible uses, keep in mind that there could also be singleton sets (having only one member) and even null sets (having no members).

In other words, a set would contain a list of zero to many handles of other objects of any type, including sets. Recursive sets should not be allowed (ideally prevented, else able to be detected somehow).
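One way to prevent recursive sets is to check for a cycle before adding a member. This is a minimal Python sketch, assuming set contents are kept as a mapping from a set handle to the handles it directly contains. The function names and handles are hypothetical.

```python
def creates_cycle(contents, parent, child):
    """Return True if adding `child` to `parent` would make a set contain itself.

    `contents` maps each set handle to the handles it directly contains.
    Handles missing from `contents` are treated as ordinary (non-set) objects.
    """
    stack = [child]
    seen = set()
    while stack:
        current = stack.pop()
        if current == parent:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(contents.get(current, ()))
    return False

def add_member(contents, parent, child):
    """Add `child` to `parent`, refusing any addition that would be recursive."""
    if creates_cycle(contents, parent, child):
        raise ValueError(f"adding {child} to {parent} would create a recursive set")
    contents.setdefault(parent, []).append(child)

# Hypothetical nesting: S1 contains a person and S2, and S2 contains S3.
contents = {"S1": ["P001", "S2"], "S2": ["S3"], "S3": []}
add_member(contents, "S3", "P002")   # fine: P002 is an ordinary object
# add_member(contents, "S3", "S1") would raise, because S1 already contains S3.
```

The same traversal could be run over existing data to detect recursion after the fact, for example following an import.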

OK, here’s a practical example use of “sets”.

One could create a “set” for a triangulated DNA match. The set would contain the three persons involved in the triangulation, and optionally one family: the ancestral couple that is thought to be the source of the DNA segment. (We don’t know which ancestor in the couple, until we find a more distant matching cousin that leads us through one of those two ancestors into a more distant couple; but that will be another triangulation set.)

The set would also have its own attributes, which in this case would include chromosome number, start position, end position, cM, and SNPs. If there are also “person set ref attributes” (just as there are person event ref attributes), then that would be a place to store a person-specific set attribute to indicate whether the match is on the person’s maternal or paternal side.

Using new set-related filter rules, custom filters could find all of the triangulated match sets meeting any criteria involving the persons, the family (ancestral couple), and the details stored in the attributes.
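As a rough idea of what such a filter rule would compute, here is a Python sketch over plain dictionaries. The field names, IDs and numbers are all invented for illustration; real filter rules in Gramps would be written against its own filter framework, which this does not attempt to reproduce.

```python
# Hypothetical triangulated match sets; every value here is made up.
match_sets = [
    {"id": "T0001", "persons": {"P001", "P002", "P003"}, "family": "F010",
     "chromosome": "7", "start": 1_000_000, "end": 9_000_000, "cM": 12.4},
    {"id": "T0002", "persons": {"P001", "P004", "P005"}, "family": "F020",
     "chromosome": "7", "start": 20_000_000, "end": 31_000_000, "cM": 8.1},
    {"id": "T0003", "persons": {"P002", "P004", "P006"}, "family": None,
     "chromosome": "12", "start": 5_000_000, "end": 15_000_000, "cM": 15.0},
]

def find_match_sets(match_sets, person=None, family=None, chromosome=None, min_cm=None):
    """Return the IDs of match sets meeting every criterion that is not None."""
    found = []
    for match in match_sets:
        if person is not None and person not in match["persons"]:
            continue
        if family is not None and match["family"] != family:
            continue
        if chromosome is not None and match["chromosome"] != chromosome:
            continue
        if min_cm is not None and match["cM"] < min_cm:
            continue
        found.append(match["id"])
    return found

hits = find_match_sets(match_sets, person="P001", chromosome="7", min_cm=10)
```

With this sample data, `hits` contains only `T0001`, the one set that involves `P001` on chromosome 7 with at least 10 cM.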

A non-triangulated match set would be similar, except with only two persons in the set instead of three.

I hope this will inspire people to think of some ideas in other, unrelated subject areas.

@GeorgeWilmes: I think we should call the new sets CustomSets to avoid confusion and we could simplify it even more to a general Segment Match CustomSet.

Segment Match:

  • Handle
  • Gramps ID
  • CustomSetType (= Segment Match)
  • Attributes
    • chromosome number (required)
    • start position (required)
    • end position (required)
    • cM (optional)
    • SNPs (optional)
    • Reference Genome (optional, default=hg19)
  • References
    • two people sharing the segment (required)
    • common ancestor person or family (optional)
  • Sources
  • Media
  • Notes
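The outline above could be sketched in Python roughly as follows. This is only an illustration of the required/optional split: the class, its field names and the sample values are assumptions of mine, and Gramps’ real object classes are built differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SegmentMatch:
    # Required fields
    handle: str
    gramps_id: str
    chromosome: str                      # a string so that "X" is possible
    start: int
    end: int
    people: Tuple[str, str]              # handles of the two people sharing the segment
    # Optional fields
    cm: Optional[float] = None
    snps: Optional[int] = None
    reference_genome: str = "hg19"
    common_ancestor: Optional[str] = None  # handle of a person or a family
    sources: List[str] = field(default_factory=list)
    media: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
    custom_set_type: str = "Segment Match"

    def __post_init__(self):
        # Enforce the "required" rules from the outline.
        if len(self.people) != 2:
            raise ValueError("a segment match needs exactly two people")
        if self.start > self.end:
            raise ValueError("start position must not be after end position")

# Hypothetical example values.
match = SegmentMatch(handle="h0001", gramps_id="C0001", chromosome="7",
                     start=1_000_000, end=9_000_000, people=("P001", "P002"))
```

A triangulation would then be a group of such two-person segment matches on overlapping positions, rather than a separate kind of set.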