Summary of slow-down issues

Making a note of speed-up findings while they're fresh in my mind.

  1. Performed some tests on IsAncestorOf on the Example tree
  2. If I run the filter on every person, it takes 2 minutes and 10 seconds
  3. If I avoid all unpickling, it comes in at 2 or 3 seconds
  4. Operating in batches is faster, since it needs fewer queries
  5. Unpickling can be completely avoided, even when we need info from inside the blobs
  6. This involves saving the unpickled data in a new column and using JSON_EXTRACT() on that JSON text
  7. We could implement such methods in a business-logic layer on top of the database. The current approach of unpickling into full objects can still be the fallback.
  8. Another enhancement for regular operation is to not unpickle the same object repeatedly. We could keep a cache of unpickled, deserialized objects.
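Item 8 could be sketched like this (the blob table, handles, and cache size are all made up here; in Gramps the expensive call being avoided is the real pickle.loads on blob data):

```python
import pickle
from functools import lru_cache

# Stand-in "person table": handle -> pickled blob (layout is invented).
_blobs = {h: pickle.dumps({"handle": h}) for h in ("I0001", "I0002")}
unpickle_calls = 0

@lru_cache(maxsize=4096)  # keep up to 4096 deserialized people around
def get_person(handle):
    # Only runs on a cache miss; treat the result as read-only, since
    # every caller shares the same cached instance.
    global unpickle_calls
    unpickle_calls += 1
    return pickle.loads(_blobs[handle])

for _ in range(1000):   # repeated lookups of the same two people...
    get_person("I0001")
    get_person("I0002")
print(unpickle_calls)   # → 2  (...cost only two unpickles)
```

The main design caveat with such a cache is invalidation: any edit to a person would have to evict or refresh the cached copy.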

What does a completely optimized low-level business logic function look like?

import json

def get_parent_handles_from_main_family(db, person_handle):
    # Read one field straight from the JSON text column; no pickle.loads() needed
    db.dbapi.execute(
        "SELECT JSON_EXTRACT(unblob, '$[9]') FROM person WHERE handle = ? LIMIT 1;",
        [person_handle]
    )
    row = db.dbapi.fetchone()
    if row is None:
        return None
    parent_family_list = json.loads(row[0])
    if parent_family_list:
        return parent_family_list[0]

It is low-level, but no more so than the existing unserialize methods.

The above replaces the code below, which requires an unpickled and deserialized Person to begin with, and then unpickles and deserializes a Family too:

        fam_id = person.get_main_parents_family_handle()
        if fam_id:
            fam = db.get_family_from_handle(fam_id)
            if fam:
                f_id = fam.get_father_handle()
                m_id = fam.get_mother_handle()
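For anyone who wants to try the JSON_EXTRACT() approach outside Gramps, here is a self-contained sketch. The schema, handle values, and array layout are made up to mimic the query above; it needs an SQLite build with the JSON1 functions, which modern builds include:

```python
import json
import sqlite3

# A "person" table whose serialized data is mirrored into a JSON text
# column, so one field can be read without unpickling the whole blob.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, unblob TEXT)")

# Pretend index 9 of the serialized tuple holds the parent-family handles.
data = [None] * 10
data[9] = ["F0001", "F0002"]
con.execute("INSERT INTO person VALUES (?, ?)", ("I0001", json.dumps(data)))

row = con.execute(
    "SELECT JSON_EXTRACT(unblob, '$[9]') FROM person WHERE handle = ? LIMIT 1",
    ("I0001",),
).fetchone()
parent_family_list = json.loads(row[0])  # JSON_EXTRACT returns JSON text here
main_parents = parent_family_list[0] if parent_family_list else None
print(main_parents)  # → F0001
```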

@StoltHD I saw your message from earlier this year, and thought you might be interested in seeing what the approach you mentioned in Performance Issues with Gramps - #37 by StoltHD looks like in practice. :point_up:


Just for completeness, Nick noted that one reason BSDDB can be faster with large trees is that SQLite has a smaller cache.

Could/should the cache be made adaptive to the Tree size?
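One possible shape for that, as a hypothetical sketch (Gramps does not do this today; set_adaptive_cache and its parameters are invented here, and SQLite's default page cache is only about 2 MB):

```python
import sqlite3

def set_adaptive_cache(con, kib_per_person=4, minimum_kib=2048):
    # Scale SQLite's page cache with the number of people in the tree.
    n_people = con.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    kib = max(minimum_kib, n_people * kib_per_person)
    con.execute(f"PRAGMA cache_size = -{kib}")  # negative means "size in KiB"
    return kib

# Demo with a throwaway in-memory tree the size of the Example tree.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (handle TEXT PRIMARY KEY)")
con.executemany("INSERT INTO person VALUES (?)",
                [(f"I{i:04d}",) for i in range(2157)])
print(set_adaptive_cache(con))  # → 8628  (i.e. an ~8.4 MiB cache)
```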

And there’s an outstanding addition for the wiki article:


To look up all ancestors of all 2,157 people in the Example tree, Gramps called pickle.loads() 4,731,244 times (roughly 2157 * 2157). When using replacement functions like the one above, it was called 38 times. That is the number one cause of slowness, IMHO.
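That quadratic blow-up is easy to reproduce in miniature. A toy sketch (the tree, records, and counter are all made up; json.loads stands in for the expensive pickle.loads) counting deserializations with and without a shared cache:

```python
import json

# Toy tree: a single 50-person chain, I0 -> I1 -> ... -> I49 -> (no parent).
records = {f"I{i}": json.dumps({"parent": f"I{i+1}" if i + 1 < 50 else None})
           for i in range(50)}
loads = 0  # counts deserializations, the expensive step

def fetch(handle, cache=None):
    global loads
    if cache is not None and handle in cache:
        return cache[handle]
    loads += 1
    obj = json.loads(records[handle])
    if cache is not None:
        cache[handle] = obj
    return obj

def ancestors(handle, cache=None):
    out = []
    parent = fetch(handle, cache)["parent"]
    while parent:
        out.append(parent)
        parent = fetch(parent, cache)["parent"]
    return out

# Ancestors of everyone, no cache: quadratic number of deserializations.
loads = 0
for h in records:
    ancestors(h)
print(loads)  # → 1275  (50 + 49 + ... + 1)

# Same work with a shared cache: one deserialization per record.
loads = 0
cache = {}
for h in records:
    ancestors(h, cache)
print(loads)  # → 50
```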


That’s right, and it matches my experience in other software areas. Tweaking cache sizes and adding indexes can speed things up a bit, even by a factor of 2 or more, but rewriting code can make things dozens of times faster. I did that with Oracle some 20 years ago, reducing the time a query needed from 20 minutes to 20 seconds. That’s what better code can do.

