Business Logic with overloaded methods

Developers,

Although I’ve been thinking about these ideas for a long time, I finally have a concrete solution to propose. I’ve put these ideas into a first draft of a PR, mostly to make the plan very concrete:

Some comments:

  1. No filters, rules, gramplets, etc. need to change. Everything will continue to work as it has; people can still write code the way they always have.
  2. Over time, we can slowly move code (especially expensive-to-run code) from all over the code base into a centralized collection of “business logic” methods. (“Business logic” is a term we used to use as a way of deciding what should not be in the database code.)
  3. At the same time, we can provide database-specific implementations for speed.
  4. The current low-level code requires adding new columns to each primary table.

Ok, but the devil is in the details. The idea is to create a new abstraction, a class called BusinessLogic. Into this class we move some code out of filters, rules, gramplets, and other plugins. These might be expensive functions like get_person_handles_that_have_common_ancestor_with_person() (based on a real filter). Here is an example from the PR:

    def get_father_mother_handles_from_primary_family_from_person(
            self,
            handle=None,
            person=None
    ):
        """ High-level implementation: accepts either a handle or a Person. """
        if handle:
            person = self.get_person_from_handle(handle)

        # Look up the person's primary (main parents) family, if any.
        fam_id = person.get_main_parents_family_handle()
        if fam_id:
            fam = self.get_family_from_handle(fam_id)
            if fam:
                f_id = fam.get_father_handle()
                m_id = fam.get_mother_handle()
                return (f_id, m_id)

        return (None, None)

As a mixin, this will be a method of the db. The code is mostly copied directly from the IsAncestorOf rule. Now we refactor the rule slightly to call the new method:

    f_id, m_id = db.get_father_mother_handles_from_primary_family_from_person(person=person)

So far we have only refactored and moved code to a different place. By itself, this is a good thing: today we implement the same functions over and over again, but now they will live in one standard location. (Third-party plugins can use the old system, or any of the new business logic functions.)

So, each method starts out using our normal, slow, unpickled primary objects, and we can write low-level replacements for our favorite backends one by one.
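
To make the overloading idea concrete, here is a minimal, self-contained sketch of the pattern (toy classes and data; these are not the real Gramps APIs):

    class BusinessLogic:
        """Generic implementations, written against the abstract db API."""

        def get_father_mother_handles(self, person_handle):
            # "Slow" path: fetch whole objects through the generic API.
            person = self.get_person_from_handle(person_handle)
            family = self.get_family_from_handle(person["family"])
            if family:
                return (family["father"], family["mother"])
            return (None, None)

    class GenericDb(BusinessLogic):
        """A backend that only supplies the low-level accessors."""

        def __init__(self):
            self.people = {"p1": {"family": "f1"}}
            self.families = {"f1": {"father": "p2", "mother": "p3"}}

        def get_person_from_handle(self, handle):
            return self.people[handle]

        def get_family_from_handle(self, handle):
            return self.families.get(handle)

    class FastDb(GenericDb):
        """A backend that overloads the business-logic method directly."""

        def get_father_mother_handles(self, person_handle):
            # Imagine a single SQL query here instead of two object fetches.
            family = self.families.get(self.people[person_handle]["family"])
            return (family["father"], family["mother"]) if family else (None, None)

    # Callers are unchanged either way:
    assert GenericDb().get_father_mother_handles("p1") == ("p2", "p3")
    assert FastDb().get_father_mother_handles("p1") == ("p2", "p3")

Existing callers never need to know which implementation they got; Python’s method resolution order picks the most specific one.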

Possible variations:

  1. maybe give all BusinessLogic methods a common prefix (say “biz_”) for easier identification
  2. we can combine the BusinessLogic mixin with the DB class in a few different ways

Downsides:

  1. The business logic is based on low-level details, and might need to change if the DB schema changes. We can do some work to make that better.
  2. It will take some work to write the low-level code.

Upsides:

  1. A way forward to make Gramps much faster than it is today, without requiring any code changes in the gramplets
  2. It is a clean way to manage abstract and low-level implementations
  3. We can incrementally make each gramplet faster

Of course, comments, questions, and critiques welcomed.

1 Like

If I claim that unpickling the blob_data takes time, and this PR requires the pickles to be turned into JSON, why not do that first and see what kind of performance difference that change alone makes?

Ok!

1 Like

Results! Well, it is true that unpickling takes time. But it is also true that parsing the JSON text takes time. It turns out they perform about the same in terms of access time.

Of course, the data stored unpacked as JSON text also takes up more space than the pickled blobs.

So, in a head-to-head comparison, blobs vs. JSON come out about even on speed, but with json_data taking up more space.
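
For anyone who wants to try a rough version of this comparison at home, a micro-benchmark along these lines is easy to write (illustrative only: the record shape and iteration count are invented, and real results depend on the actual object sizes and versions):

    import json
    import pickle
    import timeit

    # An illustrative record; real Gramps objects are larger and nested.
    record = {"handle": "abc123", "gramps_id": "I0001",
              "surname_list": ["Smith"] * 10, "gender": 1}
    blob = pickle.dumps(record)
    text = json.dumps(record)

    print("pickle.loads:", timeit.timeit(lambda: pickle.loads(blob), number=100_000))
    print("json.loads:  ", timeit.timeit(lambda: json.loads(text), number=100_000))
    print("blob size:", len(blob), "bytes; json size:", len(text), "bytes")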

BUT, there is a compelling reason to make the switch: direct SQL access. We can’t do that without this conversion.
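
For example, once json_data is plain JSON text, the database engine can query inside it directly. A minimal sketch using SQLite’s built-in json_extract() (a toy table, not the real Gramps schema; assumes a SQLite build with the JSON functions available, which is the default in recent versions):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")
    con.execute("INSERT INTO person VALUES (?, ?)",
                ("p1", '{"gramps_id": "I0001", "gender": 1}'))

    # The engine extracts the field from the stored text; no unpickling in Python.
    row = con.execute(
        "SELECT json_extract(json_data, '$.gramps_id') FROM person WHERE handle = ?",
        ("p1",),
    ).fetchone()
    print(row[0])  # -> I0001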

3 Likes

I think moving business logic into the database layer is a truly terrible idea. It seems to run completely contrary to all the ideas we have developed over many years on modularisation, structured programming and abstraction. The database code should be just about accessing the database. Modularisation is all about making the code maintainable. This (unless I have misunderstood, but commit 645b40898c5287107ba6f1600a381a7a0273f722 seems to be this) makes the code so much more difficult to maintain.

Well, Business Logic appears in code everywhere: in views, in gramplets of all kinds, and in filters/rules. And it is replicated in multiple places. I don’t think that is very organized.

The second part is just optionally re-implementing those methods at a lower level.

Honestly, the filters I was looking at hadn’t changed in years (probably never). So, I don’t think there would be much to maintain once written.

Don’t you think the possibility of speeding up Gramps is worth exploring?

3 Likes

Progress on conversion to JSON, and loading people views:

1 Like

If Business Logic appears in code everywhere, then surely the answer is to put that logic in a separate module, not put it into the database module.

If the business logic is put into the database modules, then surely it will have to be repeated for each kind of database (SQL, BSDDB or whatever future databases might be implemented), so it will be replicated anyway.

The fact that the code in filters has not changed in years is irrelevant to the point about maintainability, because one never knows what changes might be needed in the future. Maintainability also includes not complicating the database modules with something that is not relevant to databases; that just makes them more difficult to maintain.

(Sorry to use the word ‘module’, I know that is not the right word for Python, but it just seems to be the best word for the concept).

If Business Logic appears in code everywhere, then surely the answer is to put that logic in a separate module, not put it into the database module.

Maybe you missed the fact that BusinessLogic is a separate class, independent of all database backends.

If the business logic is put into the database modules, then surely it will have to be repeated for each kind of database (SQL, BSDDB or whatever future databases might be implemented), so it will be replicated anyway.

There is no longer a separate BSDDB backend. And each BusinessLogic method is not “replicated”: rather, each method could be re-implemented in a backend, but only if we want.

One could easily just delete the backend implementations if they were too much to revise, and we’d be no worse off than we are today. In fact we’d be better off, because all of the methods are together.

I think it will be worth it, as many functions (like the DeepConnections gramplet) could be made very fast, instead of very slow. (I wrote it so I can dis it).

2 Likes

@Nick-Hall has offered some very insightful feedback on both the json_data (de-pickling) PR and the BusinessLogic PR.

  1. Now the json_data is in the same JSON format that we use in other places in Gramps, and matches the schema for each object. (WIP)
  2. Since that is the JSON format, the business logic can be written as “jsonpath” expressions, which work for pure-Python json_data and also map onto SQLite and PostgreSQL syntax, like JSON_EXTRACT. That means we can write the business logic (and now really any generic db method) without any primary objects. AND it is a single method: extract_data().

I think this is pretty elegant: it stays abstract, while still allowing overloaded implementations for even more speed. Thanks, Nick!

3 Likes

The current implementation adds a new database function that works like this:

    db.extract_data("person", handle, ["$.gramps_id"])

This will get the gramps_id from the person with the given handle. The jsonpath syntax is pretty rich. More on that later.

It works on generic databases (by using Python to access the given jsonpath), but it also has a SQLite implementation that is at least 3 times faster.
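
As a rough illustration of the generic path, here is a toy evaluator for the simplest kind of jsonpath expression (the helper name is invented; real jsonpath also supports wildcards, filters, and indexing):

    import json

    def extract_simple_path(json_text, path):
        """Follow a plain '$.a.b' jsonpath through parsed JSON (sketch only)."""
        obj = json.loads(json_text)
        for key in path.removeprefix("$.").split("."):
            if not isinstance(obj, dict):
                return None
            obj = obj.get(key)
        return obj

    print(extract_simple_path('{"gramps_id": "I0001"}', "$.gramps_id"))  # I0001

A SQLite backend can hand the same path string straight to json_extract(), which is where the speedup comes from.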

This is the foundation of a new layer of functionality. One could easily build filters on this language too.

@DavidMStraub, one can easily parse part of GQL into this format (not the get_person() and other functions… at least not easily). I’ve started on a Python-to-SQL version too. object-ql will have 100% coverage of queries (i.e., there is no query that you can’t write in it), but at the cost of slowness. A Python-to-SQL QL, on the other hand, will be fast, but not have full coverage.

3 Likes

I can’t comment on coding. That’s beyond my pay grade.

But I’d like to say a word about outputs. I’m a novice user but I find many outputs difficult to understand. For example a gramplet gave me a chart showing people’s ages from my family tree (grouped). But there were no titles. It would have been great to have this split between male and female. And even better to have this info century by century, to see how demographics change over time.

On another tack, I recently made an Excel spreadsheet from a GEDCOM file.
It worked very well but names are in a separate place in the spreadsheet from addresses. They can be matched using the common ID but it’s difficult and tedious.

I’m sure if code was written with output formats more in mind, such problems would be avoided. I’m sure many people with a lot more knowledge of GRAMPS will know more about this kind of thing.

Before responding, welcome!

This posting seems to be in the wrong thread to get a good response.

This is a development discussion about improving the speed and efficiency of the underlying search functions. Its purpose is to get the same outputs, but while using less CPU power and memory.

So maybe we need to move your posting to a new thread in “Help” (if you are asking for suggestion about how to achieve certain goals) or “Ideas” (if you are proposing a new feature)?

I am also a bit confused about some terms that seem mixed up. And the posting seems to have several distinctly different subjects, so some subjects are likely to get short shrift.

  1. What do you mean by “Title”? A Person title is something like “Mrs.”, “Sir”, or “Dr.”; and a Place title is the fully qualified enclosing hierarchy for the Place (e.g., “Pittsburgh, Allegheny County, Pennsylvania, USA”). But the rest of the sentence does not suggest either of these usages is what you meant.

  2. It is unclear how “made an Excel spreadsheet from a GEDCOM file” relates to what you want from Gramps. Gramps does NOT have a feature to convert directly from GEDCOM to spreadsheet. So we’re missing which features you’re using to get from import (GEDCOM) to whichever (tree, view, or quickreport table copy’n’paste) export to (CSV/.ods) spreadsheet. Also, what do you mean by “Addresses” in this question? It seems unlikely you mean postal addresses of residences. So maybe you mean birthplaces?

  3. I am utterly unable to parse “if code was written with output formats more in mind”. Gramps supports so many output file formats, and allows so many parameters for report filtering (and optional inclusions), that people complain about being unable to explore them all. So you must be trying to say something else.

Update on the PR to convert gramps-desktop to using JSON raw data rather than pickled blobs.

I can now upgrade a database and load all of the views. There will be additional speed gains to be made, but right now the focus is on the conversion. The blob data has leaked into other places, so it will take some time to get those converted.

5 Likes