Gramps, Next Generation

Here is a cute example to find everyone born after 1980:

    from gramps.gen.db import open_database
    from gramps.gen.lib.date import Date

    db = open_database("gen-100000, random-42, version-1", force_unlock=True)
    # Join each person's birth event reference to the event table,
    # keeping only events whose date sorts after 1980.
    matches = list(db.select_from_person(
        what=["person.handle", "event.date"],
        where=(
            "person.event_ref_list[person.birth_ref_index].ref == event.handle "
            f"and event.date.sortval > {Date(1980).sortval}"
        ),
    ))

On a tree of 100k people, this takes about 0.7 seconds.

Nice :heart:

I’ve not had time to look at your PR yet. Given that the where and what arguments seem to work across tables, could you have just a single select method rather than select_from_person etc.?

Looking forward to working with you on this one, if you have time!

That is an interesting idea. I think there has to be a “main” table so it knows the direction of the JOIN. But worth exploring.
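
To illustrate why the choice of “main” table matters, here is a minimal sketch using plain sqlite3. The two-table schema below is a toy made up for illustration (it is not how Gramps stores people or events), and using a LEFT JOIN is an assumption, not necessarily what the PR generates. It only shows that the same join condition yields different rows depending on which table drives the query:

```python
import sqlite3

# Toy schema, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (handle TEXT PRIMARY KEY, birth_handle TEXT);
    CREATE TABLE event (handle TEXT PRIMARY KEY, sortval INTEGER);
    INSERT INTO person VALUES ('p1', 'e1'), ('p2', NULL);
    INSERT INTO event VALUES ('e1', 100), ('e2', 200);
""")

# person is the main table: one row per person, even without a birth event.
person_main = conn.execute(
    "SELECT person.handle, event.handle FROM person "
    "LEFT JOIN event ON person.birth_handle = event.handle "
    "ORDER BY person.handle"
).fetchall()

# event is the main table: one row per event, even if no person points at it.
event_main = conn.execute(
    "SELECT person.handle, event.handle FROM event "
    "LEFT JOIN person ON person.birth_handle = event.handle "
    "ORDER BY event.handle"
).fetchall()

print(person_main)
print(event_main)
```

With person as the main table, p2 still appears (with no event); with event as the main table, e2 appears (with no person).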

Three bug fixes that could use a second pair of eyes, and that would be a good way to get your feet wet:

  1. Use filter.apply() rather than apply_to_one() by dsblank · Pull Request #2153 · gramps-project/gramps · GitHub

  2. [GEPS 047] Select from table methods by dsblank · Pull Request #2151 · gramps-project/gramps · GitHub

    There are two areas of fixes: don’t loop over possible handles, and remove the .index() code. It turns out that .index() is O(n) per call and was a major slowdown.
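
As a rough illustration of the .index() problem (this is not the code from the PR; the function names are made up for the example), calling list.index() inside a loop rescans the list each time, while a dict built once gives constant-time lookups:

```python
def positions_with_index(handles, wanted):
    # list.index() scans from the start on every call,
    # so this is O(len(handles) * len(wanted)) overall.
    return [handles.index(h) for h in wanted]


def positions_with_dict(handles, wanted):
    # Build the handle -> position map once (O(n)),
    # then each lookup is O(1) on average.
    position = {h: i for i, h in enumerate(handles)}
    return [position[h] for h in wanted]


handles = [f"h{i}" for i in range(1000)]
wanted = ["h999", "h0", "h500"]
print(positions_with_index(handles, wanted))
print(positions_with_dict(handles, wanted))
```

Both return the same positions as long as handles are unique, which is assumed here.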

After that, it’s into the breach with the rest of the PR, where I’m looking for general feedback on the approach.

Paid work is finished for the year, so I should have some time over the holiday period. I’ll look at the two “fixes” first.

I’ve moved the PR from “Draft” to “Ready for review”: [GEPS 047] Select from table methods by dsblank · Pull Request #2151 · gramps-project/gramps · GitHub

It contains a few interconnecting parts, which could be broken apart if necessary for sequential review.

@SteveY and other devs: I moved the previously mentioned fixes to the generic filter into a new PR: Optimizer: skip loop when all rules are optimized by dsblank · Pull Request #2160 · gramps-project/gramps · GitHub

In addition to the fixes, I realized we could make another optimization:

Previously, we always looped over all handles because we couldn’t guarantee that there wasn’t an unoptimized rule (one that didn’t have selected_handles). I added a check for that scenario, so when every rule is optimized we can skip the extra loop. That is about a 7x speedup on the 100k test family tree.
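
Here is a minimal sketch of that idea, not the actual optimizer code. SketchRule and apply_rules are hypothetical names, and the sketch assumes all rules are combined with AND:

```python
class SketchRule:
    """Hypothetical stand-in for a filter rule; not the Gramps Rule class."""

    def __init__(self, predicate=None, selected_handles=None):
        self.predicate = predicate
        # A set of matching handles, or None if the rule is not optimized.
        self.selected_handles = selected_handles

    def apply_to_one(self, handle):
        return self.predicate(handle)


def apply_rules(all_handles, rules):
    optimized = [r.selected_handles for r in rules if r.selected_handles is not None]
    if rules and len(optimized) == len(rules):
        # Every rule precomputed its matches: intersect the sets and
        # never loop over all handles.
        return set.intersection(*optimized)
    # Otherwise narrow the candidates first, then test the unoptimized rules.
    candidates = set.intersection(*optimized) if optimized else set(all_handles)
    slow_rules = [r for r in rules if r.selected_handles is None]
    return {h for h in candidates if all(r.apply_to_one(h) for r in slow_rules)}


rule_a = SketchRule(selected_handles={1, 2, 3, 4})
rule_b = SketchRule(selected_handles={3, 4, 5})
rule_even = SketchRule(predicate=lambda h: h % 2 == 0)

print(apply_rules(range(10), [rule_a, rule_b]))     # fast path
print(apply_rules(range(10), [rule_a, rule_even]))  # mixed path
```

The fast path only touches the precomputed sets, so its cost depends on the number of matches rather than the size of the tree.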

So the two stand-alone fixes are:

  1. Optimizer: skip loop when all rules are optimized by dsblank · Pull Request #2160 · gramps-project/gramps · GitHub
  2. Use filter.apply() rather than apply_to_one() by dsblank · Pull Request #2153 · gramps-project/gramps · GitHub

Rules generally look like this:

    def prepare(self, db, user):
        CODE

    def apply_to_one(self, db: Database, person: Person) -> bool:
        CODE

I realized that most of the filters looked like this after conversion to use select_from_table methods:

    def prepare(self, db, user):
        if db.can_use_fast_selects():
            # set() consumes the iterable directly; no intermediate list needed
            self.selected_handles = set(
                db.select_from_person(
                    what="person.handle",
                    where=WHERE,
                )
            )
        else:
            CODE

    def apply_to_one(self, db: Database, person: Person) -> bool:
        if db.can_use_fast_selects():
            return person.handle in self.selected_handles
        else:
            CODE

Since that is a standard pattern, it can be abstracted into a pair of decorators:

    @Rule.prepare_fast_selects(
        where=WHERE
    )
    def prepare(self, db, user):
        CODE

    @Rule.apply_fast_selects
    def apply_to_one(self, db: Database, person: Person) -> bool:
        CODE

So, the code only needs to have the decorators added, and everything else stays the same.
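
For anyone curious how such decorators could work, here is one possible sketch. It is not the implementation in the PR: SketchRule, FakeDb, and IsFemaleSketch are hypothetical names, the where string is only illustrative, and the fake database ignores it and returns canned handles.

```python
import functools
from types import SimpleNamespace


class SketchRule:
    """Hypothetical base class; the real Gramps Rule may differ."""

    selected_handles = None

    @staticmethod
    def prepare_fast_selects(where):
        def decorator(prepare):
            @functools.wraps(prepare)
            def wrapper(self, db, user):
                if db.can_use_fast_selects():
                    # Fast path: let the database compute the matching handles.
                    self.selected_handles = set(
                        db.select_from_person(what="person.handle", where=where)
                    )
                else:
                    prepare(self, db, user)  # original prepare() body
            return wrapper
        return decorator

    @staticmethod
    def apply_fast_selects(apply_to_one):
        @functools.wraps(apply_to_one)
        def wrapper(self, db, person):
            if db.can_use_fast_selects():
                return person.handle in self.selected_handles
            return apply_to_one(self, db, person)  # original per-person check
        return wrapper


class FakeDb:
    """Toy database used only to exercise both code paths."""

    def __init__(self, fast, handles=()):
        self.fast = fast
        self.handles = handles

    def can_use_fast_selects(self):
        return self.fast

    def select_from_person(self, what, where):
        return iter(self.handles)  # ignores what/where in this toy


class IsFemaleSketch(SketchRule):
    @SketchRule.prepare_fast_selects(where="person.gender == 0")
    def prepare(self, db, user):
        pass

    @SketchRule.apply_fast_selects
    def apply_to_one(self, db, person):
        return person.gender == 0


fast_db = FakeDb(fast=True, handles=["a"])
slow_db = FakeDb(fast=False)
person_a = SimpleNamespace(handle="a", gender=0)
person_b = SimpleNamespace(handle="b", gender=0)

fast_rule = IsFemaleSketch()
fast_rule.prepare(fast_db, None)
slow_rule = IsFemaleSketch()
slow_rule.prepare(slow_db, None)
```

On the fast path, person_b is rejected because its handle was not returned by the select; on the slow path, the original apply_to_one() body decides.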
