"Sets" in Gramps

Currently, by means of custom filters, Gramps allows users to create dynamic “sets” of a given object type, which can then be manipulated in certain ways (tagged, deleted, exported, etc.).

I wonder if it might be useful to have a more formal, persistent kind of “set” object in Gramps?

A set could be homogeneous, containing only objects of a given type, as well as other sets of that type. One example would be a set of events, as has been discussed in several other posts. Other examples would be clans/tribes/etc. which could be sets of families. A set of places could be helpful when multiple but non-enclosing or non-hierarchical locations have some intrinsic relationship. Perhaps even a set of sources could be useful. Anyway, it would be good to make the feature as generic as possible, rather than working for only certain object types.

A set could also be heterogeneous, containing any mix of object types and other homogeneous or heterogeneous sets. This might be useful for representing an organization which consists of both people and places (or addresses) - a colony, a company, a diaspora, etc.

At a minimum, a set would have a handle, a Gramps ID, and a name/description. It could also have attributes, notes, tags, and maybe even citations. (Possibly it could also have a date range, but that might complicate things unnecessarily, since some of the underlying objects already have date ranges of their own, so I would recommend against that.)
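To make the shape of such an object a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the class names `GrampsSet` and `SetMember` and all field names are hypothetical and are not part of the existing Gramps API.

```python
from dataclasses import dataclass, field


@dataclass
class SetMember:
    """Reference to one member: the object's type name and its handle."""
    obj_type: str  # e.g. "Person", "Family", "Place", or "Set" for a nested set
    handle: str


@dataclass
class GrampsSet:
    """Hypothetical persistent set object (not an existing Gramps class)."""
    handle: str
    gramps_id: str
    name: str
    members: list = field(default_factory=list)     # SetMember references
    attributes: dict = field(default_factory=dict)  # attribute name -> value
    notes: list = field(default_factory=list)       # note handles
    tags: list = field(default_factory=list)        # tag handles
    citations: list = field(default_factory=list)   # citation handles

    def is_homogeneous(self):
        """True if all direct members (ignoring nested sets) share one type."""
        types = {m.obj_type for m in self.members if m.obj_type != "Set"}
        return len(types) <= 1


# A clan modelled as a set of families (handles are made up).
clan = GrampsSet("h-clan", "SET0001", "Example clan",
                 members=[SetMember("Family", "f1"), SetMember("Family", "f2")])
print(clan.is_homogeneous())  # True

# Adding a place makes the set heterogeneous.
clan.members.append(SetMember("Place", "pl1"))
print(clan.is_homogeneous())  # False
```

Storing only type and handle for each member keeps the set generic, so it would not need to know anything about the objects it groups.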

There could be filter rules to return objects of specific sets, or sets meeting some specified criteria, just as there are filter rules for each type of object. This would enable all existing reports and displays which have filtering to access them.

Sets could overlap; they need not be hierarchical. Filter rules could return the union or intersection of sets, just as they currently do when combining multiple rules.
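As a small illustration of the union/intersection idea, here is a sketch that treats each set as a plain Python `set` of object handles. The helper names and the handles are made up for the example; real filter rules would of course work against the database.

```python
def union_of(*handle_sets):
    """Handles that belong to at least one of the given sets."""
    result = set()
    for handles in handle_sets:
        result |= handles
    return result


def intersection_of(first, *others):
    """Handles that belong to every one of the given sets."""
    result = set(first)
    for handles in others:
        result &= handles
    return result


# Two overlapping, non-hierarchical sets of person handles.
colony = {"p1", "p2", "p3"}
company = {"p2", "p3", "p4"}

print(sorted(union_of(colony, company)))         # ['p1', 'p2', 'p3', 'p4']
print(sorted(intersection_of(colony, company)))  # ['p2', 'p3']
```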

Sets would not be exported to GEDCOM, except perhaps to somehow flag each object as belonging to a particular set. But they could be exported to other formats, and possibly imported from some other formats.

I continue to think about all of this, but am now at the point where I am interested in feedback on the potential utility of this feature.

I think there are many interesting ways to record and organize data which don’t fit into GEDCOM. Saving a set of objects is less of a problem than presenting it to users in a way that lets them understand it and benefit from it. Clans/tribes might work as a simple list, but how would you present heterogeneous sets with complex relationships, where a list is not sufficient?

One idea I had recently was creating organisation objects:
Families connect closely related individuals through biological (and non-biological) relationships (e.g. parent-child); associations do the same for friends and acquaintances; organisations would connect otherwise unrelated individuals through their jobs/positions in an organisation.

Person events, especially occupation and military events, should take places as well as positions. The organisation should be a (hierarchical) list of job positions. Each position would hold a list of individuals, each with a date or date range for when they held that position.

Yes, that is a challenge. A generic data storage solution would certainly benefit from a generic reporting solution. So initially I would probably just export the data and use some third-party tools to present it graphically. I might also try to learn how to create my own text reports within Gramps. But I realize that many users would not want to do either of those things.

Eventually, maybe Gramps could be enhanced to create more varieties of graphs. It already creates bipartite graphs (having both families and individuals), so I hope it would be possible to create other graphs having multiple types of nodes and edges.

I think that more specific enhancements such as organisations and super-events are good ideas, and having standard solutions for those would save people the trouble of creating their own solutions using “sets”.

I am curious to know what others might do with “sets” if they were available.

Which export/import formats are needed?

At least one OWL format: RDF/XML or JSON-LD, both of which are widely used.
And one or two network graph formats (the most feature-rich and most widely supported ones).
Here are a few (but I’m not an expert in this); this article covers some of the standards:
Messina, A. (2018). Overview of Standard Graph File Formats. https://doi.org/10.13140/RG.2.2.11144.88324

That document mentions three…

  • graphml
  • gexf
  • gml

But there are also the Tulip and Pajek formats, and I think NetworkX has its own format too. It is important to pick the most widely supported and feature-rich formats, so that it is enough to maintain one export/import for each.

For example, the JSON-LD format is supported by a lot of web-based research and research presentation tools, and a lot of converters already exist from that format to other vocabularies.

GraphML, GEXF, or GML, on the other hand, can be opened by most network graph software.
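To give an idea of how small a basic GraphML export could be, here is a minimal sketch using only the Python standard library. It assumes set membership is already available as a plain dict of IDs (the IDs below are invented), and it writes only nodes and edges; a real exporter would also need `<key>`/`<data>` elements for names, types, and attributes.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def sets_to_graphml(membership):
    """Serialize {set_id: [member_id, ...]} as a minimal GraphML string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})

    # One node per set and per member, each written exactly once.
    node_ids = set(membership)
    for members in membership.values():
        node_ids.update(members)
    for node_id in sorted(node_ids):
        ET.SubElement(graph, "node", {"id": node_id})

    # One edge from each set to each of its members.
    edge_no = 0
    for set_id, members in membership.items():
        for member_id in members:
            ET.SubElement(graph, "edge", {"id": f"e{edge_no}",
                                          "source": set_id,
                                          "target": member_id})
            edge_no += 1
    return ET.tostring(root, encoding="unicode")


# Hypothetical set "SET0001" containing two people and one place.
print(sets_to_graphml({"SET0001": ["I0001", "I0002", "P0001"]}))
```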

There are already multiple reports in Gramps that use Graphviz. Its output can be read by, e.g., Cytoscape (with some limitations on what information you add: the import doesn’t support line breaks in the node name) and Tulip.
But I don’t think it’s an easy format to re-import into Gramps, and I don’t know whether many network graph tools save to Graphviz. I have only tested import into Tulip and Cytoscape (I got the developer of the Cytoscape add-on to update it to be UTF-8 compatible).

This sounds interesting. Can you explain how Sets would differ from Tags? I’ve never used Tags in Gramps, but it seems like they would fulfill the general purpose of arbitrarily grouping and selecting related objects.

Yes, they can currently serve that purpose. For example, you can create custom filter(s) to define the set of related objects, then tag the items in the resulting list.

The main difference is that Sets would have their own Attributes, Notes, Tags, Citations. And as I think about it more, maybe in addition to a Name/Description, they should also have a user-defined Type.

I admit it’s a bit of a solution looking for problems, but I like to think that, by being a generic approach, it might serve many needs.

While thinking of possible uses, keep in mind that there could also be singleton sets (having only one member) and even null sets (having no members).

In other words, a set would contain a list of zero to many handles of other objects of any type, including sets. Recursive sets should not be allowed (ideally prevented, else able to be detected somehow).
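Detecting a recursive set is a standard cycle check on the "contains" relationship. Here is a sketch using a depth-first search; it assumes the nesting is available as a dict mapping each set handle to the handles of the sets it directly contains (the function name and the handles are made up).

```python
def find_cycle(start, nested):
    """Return a list of set handles forming a cycle reachable from `start`,
    or None if there is none.

    `nested` maps a set handle to the handles of the sets it directly contains.
    """
    path = []        # sets on the current chain of containment, in order
    on_path = set()  # same handles as `path`, for fast membership tests
    done = set()     # sets already fully explored without finding a cycle

    def visit(handle):
        if handle in on_path:
            # We came back to a set that (indirectly) contains itself.
            return path[path.index(handle):] + [handle]
        if handle in done:
            return None
        on_path.add(handle)
        path.append(handle)
        for child in nested.get(handle, ()):
            cycle = visit(child)
            if cycle:
                return cycle
        on_path.discard(handle)
        path.pop()
        done.add(handle)
        return None

    return visit(start)


print(find_cycle("a", {"a": ["b"], "b": ["c"], "c": ["a"]}))  # ['a', 'b', 'c', 'a']
print(find_cycle("a", {"a": ["b", "c"], "b": ["c"], "c": []}))  # None
```

To prevent rather than detect recursion, the same check could be run on a copy of the nesting with the proposed new member added, rejecting the change if a cycle is found.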

OK, here’s a practical example use of “sets”.

One could create a “set” for a triangulated DNA match. The set would contain the three persons involved in the triangulation, and optionally one family: the ancestral couple that is thought to be the source of the DNA segment. (We don’t know which ancestor in the couple, until we find a more distant matching cousin that leads us through one of those two ancestors into a more distant couple; but that will be another triangulation set.)

The set would also have its own attributes, which in this case would include chromosome number, start position, end position, cM, and SNPs. If there are also “person set ref attributes” (just as there are person event ref attributes), then that would be a place to store a person-specific set attribute to indicate whether the match is on the person’s maternal or paternal side.

Using new set-related filter rules, custom filters could find all of the triangulated match sets meeting any criteria involving the persons, the family (ancestral couple), and the details stored in the attributes.

A non-triangulated match set would be similar, except with only two persons in the set instead of three.
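To sketch what such a set-related filter might do, here is a small example over match sets represented as plain dicts. The dict layout, the function name `matching_sets`, and all handles and values are invented for illustration; they do not correspond to existing Gramps filter rules.

```python
def matching_sets(sets, person=None, chromosome=None, min_cm=None):
    """Return the match sets that satisfy every criterion that was given."""
    result = []
    for match in sets:
        attrs = match["attributes"]
        if person is not None and person not in match["people"]:
            continue
        if chromosome is not None and attrs.get("chromosome") != chromosome:
            continue
        if min_cm is not None and attrs.get("cM", 0) < min_cm:
            continue
        result.append(match)
    return result


# Made-up example data: one triangulated set and one two-person match set.
match_sets = [
    {"id": "SET0001", "people": ["p1", "p2", "p3"], "family": "f1",
     "attributes": {"chromosome": "7", "start": 1000, "end": 5000, "cM": 12.5}},
    {"id": "SET0002", "people": ["p1", "p4"], "family": None,
     "attributes": {"chromosome": "12", "start": 200, "end": 900, "cM": 7.0}},
]

# All match sets involving person p1 with at least 10 cM shared.
print([m["id"] for m in matching_sets(match_sets, person="p1", min_cm=10)])
# ['SET0001']
```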

I hope this will inspire people to think of some ideas in other, unrelated subject areas.

@GeorgeWilmes: I think we should call the new sets CustomSets to avoid confusion and we could simplify it even more to a general Segment Match CustomSet.

Segment Match:

  • Handle
  • Gramps ID
  • CustomSetType (= Segment Match)
  • Attributes
    • chromosome number (required)
    • start position (required)
    • end position (required)
    • cM (optional)
    • SNPs (optional)
    • Reference Genome (optional, default=hg19)
  • References
    • two people sharing the segment (required)
    • common ancestor person or family (optional)
  • Sources
  • Media
  • Notes
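The outline above could be sketched as a Python dataclass along these lines. The class and field names are my own assumptions, not Gramps code; the required/optional split and the hg19 default follow the list, while the check that start does not exceed end is an extra sanity check I added.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SegmentMatch:
    """Hypothetical Segment Match CustomSet (illustrative only)."""
    handle: str
    gramps_id: str
    chromosome: str                        # required
    start: int                             # required
    end: int                               # required
    people: tuple                          # required: handles of the two people
    cm: Optional[float] = None             # optional
    snps: Optional[int] = None             # optional
    reference_genome: str = "hg19"         # optional, default from the outline
    common_ancestor: Optional[str] = None  # optional person or family handle
    sources: list = field(default_factory=list)
    media: list = field(default_factory=list)
    notes: list = field(default_factory=list)
    custom_set_type: str = "Segment Match"

    def __post_init__(self):
        # Enforce "two people sharing the segment (required)".
        if len(self.people) != 2:
            raise ValueError("a segment match needs exactly two people")
        # Added assumption: positions must describe a forward range.
        if self.start > self.end:
            raise ValueError("start position must not exceed end position")


match = SegmentMatch("h1", "SET0001", "7", 1000, 5000, ("p1", "p2"), cm=12.5)
print(match.reference_genome)  # hg19
```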