Class structure in Gramps: primary object

Hi all,
Before diving into the code, I’m trying to understand the class structure and the relationship between objects. I begin with the primary objects and write notes about what’s in gramps/gen/lib.

For primary objects, I’ve come up with this graph:

I noticed that NoteBase is an ancestor class of every primary object. Why wasn’t it made an ancestor class of BasicPrimaryObject, just like TagBase and PrivacyBase? (Or of PrimaryObject, to avoid a situation where notes could have references to other notes.)

Was it a late decision to allow NoteBase on all primary objects? Would changing the inheritance graph cause compatibility issues with existing databases? I understand that objects must be serialised and unserialised when saved and reloaded. Could the change lead to a situation where existing DBs could no longer be loaded correctly?

What is the rationale for separating AttributeBase and SrcAttributeBase? According to the code, they are strictly the same. Is there some as yet unimplemented difference in the specification? Or a “logical” one?

More questions to come.

Have you looked at the Gramps Data Model Diagram?

It can be difficult to discover the existing developer documentation as you explore. You don’t yet know the name for which to search or whether any documentation exists for that functionality.

One of our Finding Aids is the MediaWiki Developer Category. But one of the many parts of our documentation where improvement would be useful is onboarding experienced Python developers who are new to Gramps. Please keep (and share) notes on your journey so we might create guideposts for future explorers.

Yes, I’ve already checked out every “usable” bit from the Contribute tab of the gramps official site, notably the UML schema which IMHO is way too cluttered to be useful (for example many connectors merge and you can’t tell where they exit the spaghetti mess).

I’m trying to create manageable partial views of independent parts (or at least having the smallest mutual intersections).

I’d like to understand some design choices so that I can implement a clean RDBMS schema. It is presently 75% complete but I don’t know yet how to integrate it in the existing core which should remain unchanged to avoid any damage to the workflow. My goal is to be able to bulk process some “non-damaging” tasks in SQL outside Gramps.

Why wasn’t it an ancestor class of BasicPrimaryObject , just the same as TagBase and PrivacyBase ? (Or of PrimaryObject to avoid a situation where notes could have references to other notes)

In fact, it can happen if you really want to create a loop in the Note Editor, via the link editing dialog. But no one wants to do that, and such an editing sequence will never occur in a real Gramps session. It only generates a message like this:

1074804: ERROR: grampsapp.py: line 157: Unhandled exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/gramps/gui/editors/editnote.py", line 350, in save
    self.callback(self.obj.get_handle())
  File "/usr/lib/python3/dist-packages/gramps/gui/editors/editlink.py", line 158, in _on_new_callback
    object_class, "handle", obj.handle))
AttributeError: 'str' object has no attribute 'handle'

This will not corrupt the database: it is a cosmetic design/UI issue that does not need fixing.

In such a case (URI link), a Note object can also be the “parent” in the object hierarchy, and is not always a descendant class.

For the hierarchical database design, maybe the Gramps XML file format can give more clues about the historical relations between objects?

I just looked at my workaround for the loop on Note class…

+ import logging
+ _LOG = logging.getLogger("editlink")

    def _on_new_callback(self, obj):
        object_class = obj.__class__.__name__
+       # workaround for bug12260
+       try:
+           test = obj.handle
+       except AttributeError:
+           _LOG.warning(str(object_class))
+           return

:upside_down_face:

You may run into strange issues like this one, which would never appear in a data model diagram but can still occur…

PS: I closed the pull request without merging.
The above workaround was too horrible for such a cosmetic issue.

In code development, there is always a net benefit in having the slimmest possible structure. It makes the intent clearer.

Of course, it is wise to avoid loops in the specification. This is why NoteBase should be an ancestor of PrimaryObject and not of BasicPrimaryObject. If you look at TagBase, all primary objects (person, family, note, …) can be tagged. It is cleaner to include TagBase in BasicPrimaryObject than to mention TagBase explicitly in the class declaration of each object.
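To make the two layouts concrete, here is a minimal sketch using empty stand-in classes, not the real ones from gramps/gen/lib. It assumes, as the question above implies, that Note derives from BasicPrimaryObject; the name `ProposedPrimaryObject` is hypothetical and only illustrates the suggested variant.

```python
# Simplified stand-ins; the real classes in gramps/gen/lib carry data and methods.
class PrivacyBase: ...
class TagBase: ...
class NoteBase: ...

# Capabilities shared by every primary object are mixed in once, here.
class BasicPrimaryObject(PrivacyBase, TagBase): ...

# Suggested variant (hypothetical name): NoteBase is added one level down,
# so every primary object except Note gets a note list.
class ProposedPrimaryObject(BasicPrimaryObject, NoteBase): ...

class Note(BasicPrimaryObject): ...        # taggable, but holds no notes
class Person(ProposedPrimaryObject): ...   # taggable and can hold notes

print(issubclass(Note, TagBase))     # True
print(issubclass(Note, NoteBase))    # False
print(issubclass(Person, NoteBase))  # True
```

With this layout no concrete class would need to list NoteBase itself, which is the "slimmest possible structure" argument in code form.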

Keep in mind as you review things that it is a 20-year-old code base with a lot of technical debt and contributions by people with many different backgrounds. I myself am a relative newcomer, so I don’t know the history and the context under which things were done, but there are others here who are likely able to answer those sorts of questions.

Yes, it is obvious in some parts, mainly in the GUI management. To take an example in the core part, I see class definitions like class newClass(type): where, in my understanding of Python, the (type) is implicit by default making this definition inconsistent with the vast majority of class newClass:, except if the intent is to create a new metaclass. But is this necessary? I have not yet read enough code to make up my mind. I have also found class newClass(object, …): which I don’t understand. object is the ultimate base of all classes. So, why this inheritance which defines a class with practically no attributes?

These questions clearly show that a fresh and candid reading of the code is needed, with conclusions duly recorded in some document for future developers’ benefit. It can also be an opportunity to tidy up the code.

It was just an idea for any bulk process and data importing.
There is an Import gramplet, but as shown in the sample above, the Note class can also be a pseudo top parent class, maybe like the database class in the XML structure. E.g., a custom representation like:

/database/events/event[0]/@handle
..
/database/people/person[0]/@handle
..
/database/families/family[0]/@handle
..
/database/citations/citation[0]/@handle
..
/database/sources/source[0]/@handle
..
/database/places/placeobj[0]/@handle
..
/database/objects/object[0]/@handle
..
/database/repositories/repository[0]/@handle
..
/database/notes/note[0]/@handle

might be:

/note/events/event[-1]@hlink
..
/note/people/person[-1]@hlink
..
/note/families/family[-1]@hlink
..
/note/citations/citation[-1]@hlink
..
/note/sources/source[-1]@hlink
..
/note/places/place[-1]@hlink
..
/note/objects/object[-1]@hlink
..
/note/repositories/repository[-1]@hlink

where [-1] will be generated after the parent Note object.
/!\ custom illustration, just a translation, a pseudo-concept, no academic design or model
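As a rough illustration of reading such paths with Python's standard library, here is a small sketch. The toy document is invented: the element names follow the paths above, the handles are made up, and the XML namespace that a real Gramps export carries is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Toy document: invented handles, no namespace (an assumption for brevity).
SAMPLE = """
<database>
  <people>
    <person handle="_p1"><noteref hlink="_n1"/></person>
    <person handle="_p2"/>
  </people>
  <notes>
    <note handle="_n1"/>
  </notes>
</database>
""".strip()

root = ET.fromstring(SAMPLE)

# Equivalent of /database/people/person/@handle
person_handles = [p.get("handle") for p in root.findall("./people/person")]

# Every hlink attribute in the tree, i.e. the references between objects.
hlinks = [e.get("hlink") for e in root.iter() if e.get("hlink") is not None]

print(person_handles)  # ['_p1', '_p2']
print(hlinks)          # ['_n1']
```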

So, close to gedcom logic but much more flexible. Any set of primary objects (class) can be a single database… Handling relations via a top Note object could be a “simple” way for a bulk import and will provide a solution for data and relationships control. Just an idea.

In code development, there is always a net benefit in having the slimmest possible structure. It makes the intent clearer.

If you look at the above custom representation of the XML relations (XPath like), you may see some possible improvements. Flat database human reading vs a pure machine coding!

  1. Links. one directional relation on Associations (person → person).
    Family/Relationship links (child → parent , spouses/partners).
    Object References (role on eventref, section area on media objects, etc.). Backreferences. etc.

  2. Attributes. Some attributes are close to events (facts/events on gedcom). Attributes can have notes and citations. Check back references for primary objects into Person’s attribute. etc.

  3. Address and Places

  4. Date object.

  5. etc.

Sure, they could all be replaced by @handle/@hlink or any hash, and the structure would be slimmer (and faster) than the current one.
Does Gramps need such improvements for desktop applications?

As far as I know, there is no limitation on DB model. One can generate a DB bridge or minor customization (customisation?) for web services or advanced calculations.

It only means newClass is a subclass of type. You can add methods to the newClass.

For the following: class newClass(object, …)
It means the subclass newClass is a subclass of (object, …)
The newClass class inherits all methods of the classes object, …

@SNoiraud: according to the Python manual, there seems to be a subtlety between class newClass: and class newClass(type):. The former is simply a new class which inherits from type anyway. The latter a new metaclass.

Is such a difference intentional or does it result from contributors with different Python skill levels? Or also a left over from Python 2 to Python 3?

EDIT: only formatting tidy up

The only difference I see with the metaclass is that we can add or override methods.
For me, there is no difference between the two.

Those subclasses were defined at the beginning of the project.
They cannot be removed.

Gramps is an object-oriented program.
Such classes must be defined from the beginning.

You need to work with them or completely rewrite gramps.

Unless I’m wrong, you can always add or override methods in OOP. In Python, a metaclass creates new classes while a “standard” class creates instances. Most comments I’ve read about metaclasses say “don’t use them unless you really need them”, i.e. to change something in the internal representation of the class.
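A small sketch of that distinction, with hypothetical names (`Registry`, `Plain`, `Tracked`) that have nothing to do with the Gramps code base:

```python
class Registry(type):
    """A metaclass: its instances are classes, not ordinary objects."""
    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)   # runs once per class definition
        return cls


class Plain:                        # ordinary class, created by type
    pass


class Tracked(metaclass=Registry):  # class created by the metaclass
    pass


print(type(Plain) is type)            # True
print(type(Tracked) is Registry)      # True
print(Registry.created)               # ['Tracked']
print(issubclass(Plain, type))        # False: an instance of type, not a subclass
print(issubclass(Registry, type))     # True: only the metaclass subclasses type
```

The hook in `__new__` runs when the `class Tracked` statement is executed, before any instance of `Tracked` exists; that is the kind of work a metaclass is normally reserved for.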

So, my question is targeted to Gramps creators: was there really an explicit intent with metaclass creation?

In the same vein, when I read class newClass(object, metaclass=mc): I don’t see the role of object because the metaclass mc will ultimately inherit from object. object is a very poor “naked” class with only fundamental methods and without namespace. The next class is type, which is usually implicitly referenced by all class creations. So, having both object and metaclass= cancels object inheritance (in my understanding) because the metaclass brings in type. Mentioning object will have no effect on the MRO because the “most derived class” will be chosen.

Please correct me if I’m wrong.

I can’t answer for the Gramps creators, but the only occurrences of class someClass(type) I can find in the current code base are for the classes gramps.gen.db.dummydb.MetaClass and gramps.gen.lib.grampstype.GrampsTypeMeta. That these are intended to be metaclasses is pretty obvious from the naming.

Concerning class newClass(object, metaclass=mc), as far as I understand this is 100% equivalent to class newClass(metaclass=mc). The object is very likely a leftover from the old Python 2 distinction between old-style and new-style classes, that doesn’t exist in Python 3 anymore.
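This is easy to check in an interpreter. A minimal sketch with a hypothetical do-nothing metaclass `Meta`:

```python
class Meta(type):
    pass


class WithObject(object, metaclass=Meta):
    pass


class WithoutObject(metaclass=Meta):
    pass


# Same base classes, same metaclass, same tail of the MRO.
print(WithObject.__bases__ == WithoutObject.__bases__ == (object,))  # True
print(type(WithObject) is type(WithoutObject) is Meta)               # True
print(WithObject.__mro__[1:] == WithoutObject.__mro__[1:])           # True
```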

Thanks. I searched the code and indeed the dummydb is the only case of a custom metaclass. But I’ll have to study this carefully when I come to the DB part, because this dummydb is a shield against bad Gramps programming when no “real” DB is open.

The other xxx(type) uses strictly adhere to Python recommendations/tricks to create enum.

Uses of xxx(object) puzzle me. There are lots of them in plugins (importer, libprogen.py, reorderids.py, buchheim.py among others) and one in gen/utils/symbols.py. And I wonder if it is correct. I must check with the Python manual to see the consequences of this choice.

class xxx(object): is equivalent to class xxx: in Python 3.
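A quick check, using two hypothetical empty classes:

```python
class Implicit:
    pass


class Explicit(object):
    pass


# Both have object as their only base class ...
print(Implicit.__bases__ == Explicit.__bases__ == (object,))   # True
# ... both are instances of type (their metaclass), not subclasses of it ...
print(type(Implicit) is type and type(Explicit) is type)       # True
print(issubclass(Implicit, type))                              # False
# ... and both expose exactly the same attribute names.
print(set(dir(Implicit)) == set(dir(Explicit)))                # True
```

Neither form creates a metaclass; a class only becomes one when it inherits from type.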

I’m curious, can you elaborate on your motivation?

My goal is to be able to bulk process some “non-damaging” tasks in SQL outside Gramps.

Why don’t you just use Gramps’ own SQL schema? It’s not really relational (relations are stored in pickled dictionaries), but I am wondering why you would need a different database schema for processing tasks in SQL.

@DavidMStraub: the present implementation with SQLite is a simulation of BSDDB (key/value pairs) over an RDBMS (so as not to break with “historical” GRAMPS). When you look at the SQL schema, you see that you have access fields (the key to the records, mainly the “handle”) and the payload is a BLOB. This means that all data details are opaque and you can’t build queries with external tools, like sqliteman, to interrogate and modify contents. What I want is to create a true clean SQL schema and see if I can gain versatility vs. Gramps. In my mind, this would be just another DB class collection, so that I can choose between BSDDB, SQLite (present) and a future SQLiteSQL without changing anything in Gramps data management.
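To illustrate what “opaque” means here, a minimal sketch with Python's sqlite3 and pickle modules. The table layout, the column name `blob_data` and the record content are assumptions made for the example, not a copy of the actual Gramps schema.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical key/value layout: a handle plus an opaque serialised payload.
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, blob_data BLOB)")

record = {"gramps_id": "I0001", "surname": "Smith"}   # invented data
conn.execute("INSERT INTO person VALUES (?, ?)", ("_p1", pickle.dumps(record)))

# SQL can select a row by its key ...
row = conn.execute(
    "SELECT blob_data FROM person WHERE handle = ?", ("_p1",)
).fetchone()

# ... but the fields only become visible after deserialising in Python,
# so something like "WHERE surname = 'Smith'" cannot be written in SQL.
decoded = pickle.loads(row[0])
print(decoded["surname"])  # Smith
```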

One of the difficulties is that the present SQL schema is not “self-complete”, as you also need to look at the JSON description to discover how the records are made up.

As I mentioned in a previous comment, my pure-SQL schema is 75% complete (from a theoretical point of view; no implementation has started). What remains to be defined is the “-ref” parts. My implementation requires more tables in the DB to represent the one-to-many relations for lists (the list of notes associated with a person record, for example) and other inverse relations. For the time being I’m at the paper proof-of-concept stage. After that, I’ll have to implement a class and its methods, to be substituted via configuration for the current SQL BSDDB-simulation family. And perhaps later, have some relation-oriented architecture.
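The kind of extra table meant here could look like the following sketch. All table and column names (`person_note`, `position`, …) and the data are hypothetical; this is one common way to model an ordered list in SQL, not a description of the schema under design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE note   (handle TEXT PRIMARY KEY, text TEXT);
-- One row per (person, note) pair; position preserves the list order.
CREATE TABLE person_note (
    person_handle TEXT NOT NULL REFERENCES person(handle),
    note_handle   TEXT NOT NULL REFERENCES note(handle),
    position      INTEGER NOT NULL,
    PRIMARY KEY (person_handle, position)
);
INSERT INTO person VALUES ('_p1', 'I0001');
INSERT INTO note VALUES ('_n1', 'first'), ('_n2', 'second');
INSERT INTO person_note VALUES ('_p1', '_n1', 0), ('_p1', '_n2', 1);
""")

# Forward relation: the ordered note list of one person.
notes_of_p1 = [r[0] for r in conn.execute(
    "SELECT n.text FROM person_note pn "
    "JOIN note n ON n.handle = pn.note_handle "
    "WHERE pn.person_handle = ? ORDER BY pn.position", ("_p1",))]

# Inverse relation: which persons reference note _n2?
persons_of_n2 = [r[0] for r in conn.execute(
    "SELECT person_handle FROM person_note WHERE note_handle = ?", ("_n2",))]

print(notes_of_p1)    # ['first', 'second']
print(persons_of_n2)  # ['_p1']
```

The same link table answers both directions, so back-references do not need separate storage.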

I hope to be able to rename custom types which cannot be done easily if at all in Gramps because custom types are not “centralised”, i.e. every record has its own copy of the name string.
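What “centralised” would buy, as a sketch: if each type name lived in one row of a lookup table (the hypothetical `event_type` table below, with invented data), a rename would be a single UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup table: each type name is stored exactly once.
CREATE TABLE event_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE event (
    handle  TEXT PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES event_type(id)
);
INSERT INTO event_type VALUES (1, 'Babtism');   -- misspelled custom type
INSERT INTO event VALUES ('_e1', 1), ('_e2', 1);
""")

# Renaming touches one row; every event pointing at it follows.
conn.execute("UPDATE event_type SET name = 'Baptism' WHERE id = 1")

names = [r[0] for r in conn.execute(
    "SELECT t.name FROM event e JOIN event_type t ON t.id = e.type_id "
    "ORDER BY e.handle")]
print(names)  # ['Baptism', 'Baptism']
```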

One of my other concerns is “multi-level security” in generated Narrative Web. By this I mean different users can see different page sets. I can do that through URL diversion in NginX (my server engine) without modification to Gramps (except a minor one in webreport) provided data is appropriately marked up (likely to be a specific attribute).

I hope to be able to rename custom types which cannot be done easily if at all in Gramps because custom types are not “centralised”, i.e. every record has its own copy of the name string.

Every record has its own copy of the name string, but the existing custom types are stored in the metadata table. Yes, you have to iterate over the entire table of a given object type to find and rename all the occurrences, but unless you have millions of objects, I don’t see this as an issue.

I agree that if Gramps were started from scratch today, one would choose a different database schema, probably using SQLAlchemy ORM for Gramps objects, but I don’t think it’s easy to change that now.

When I started working on Gramps Web I was initially skeptical whether it would work with the current database schema but found that it works great.

Nope. If I read the Python manual correctly, class xxx: creates a class derived from type, i.e. it already inherits a certain number of methods. class xxx(object): creates a much poorer class with nearly no methods. In addition, I wonder if this also results in a metaclass, considering the properties of object.

A couple of notes:
I’m not an original developer, so I cannot help with the “why this” kind of questions. But I am pretty familiar with the code.

There are two addons that might be interesting based on this topic:

  1. the SQLite export/import addon. This was originally created (I think) to explore making a much more relational db. It exports in a completely different db layout than Gramps’ own standard; it doesn’t use any blobs, for example. This might be useful for someone doing relational lookups with external tools. The import portion can be used to create a new Gramps db from a modified exported db, which is useful if you want to modify data with external tools. This works as long as all tables and relationships are maintained.

  2. the TypeCleanup addon. This can be used to modify or remove custom types as well as replacing custom types with standard types. The latter can be useful if someone misspelled a type on entry and it ends up custom.

P.S. nice chart; I think it may be a useful addition to our wiki.
