Performance Issues with Gramps

So, I’m not a Python programmer, but the classic approach is to use a profiler to understand why a particular operation is slow. Along the lines of:
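(A minimal sketch using Python’s built-in cProfile; `run_slow_operation` is just a placeholder for whatever Gramps call is being timed.)

```python
import cProfile
import pstats

# Profile the slow operation and write the raw timing data to a file.
cProfile.run("run_slow_operation()", "profile.out")

# Print the 20 functions with the highest cumulative time.
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(20)
```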

Knowing which parts of the code are called the most or take the most time is just the first step. Sometimes finding a speedup is easy and sometimes…not. Profiling eliminates some of the guesswork. Plus, when changes are made, it can show whether they are actually helping.

Craig

Yes. This has been done before. See GEPS 16: Enhancing Gramps Processing Speed for details. The Debugging Gramps page in our wiki has a section on profiling.

You can see some historic test results on our Gramps Performance wiki page.

We also have a Tips for large databases page which needs updating for the SQLite backend. I’ll add a section on increasing the cache size.

4 Likes

Those programs are written in compiled languages (C, C++, …).
Gramps is written in Python, which is an interpreted language and doesn’t need to be compiled.
The performance is therefore not comparable.

My question now is: why are you in such a hurry?
Is one hour of processing really that annoying? We have our whole lives ahead of us.
We really live in a shitty society where the work assigned today was expected to be finished last night.

1 Like

Well, I have most of my big trees already in Gramps, including the one with Charlemagne, so I can live with that GEDCOM import speed quite well, but I also know that Gramps can import these much faster, even though Python looks like an interpreted language.

And I say looks like, because it isn’t quite: the interpreter compiles to bytecode on the fly, somewhat like Java. And even C# programs are not fully compiled to machine language.
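For what it’s worth, that compile step is easy to see with the standard `dis` module, which prints the bytecode CPython generates before executing a function (a trivial illustration, not Gramps code):

```python
import dis

def full_name(first, surname):
    return f"{first} {surname}"

# Print the bytecode CPython compiled this function to before running it.
dis.dis(full_name)
```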

The truth is that things can be done way faster, even in Python, by using a better strategy. And that’s not by increasing cache sizes, but by filling the database in phases, just like PAF and RootsMagic do. And I know from experience that better code is a far better strategy than tweaking stuff.
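To make “filling the database in phases” concrete, here is a rough SQLite sketch of the idea, with an invented table layout rather than the actual Gramps or RootsMagic schema: bulk-load first, build indexes afterwards.

```python
import sqlite3

con = sqlite3.connect("import_sketch.db")

# Phase 1: bulk-load the raw person rows in one transaction,
# before any secondary indexes exist.
con.execute(
    "CREATE TABLE IF NOT EXISTS person (handle TEXT PRIMARY KEY, surname TEXT, given TEXT)"
)
with con:
    con.executemany(
        "INSERT INTO person (handle, surname, given) VALUES (?, ?, ?)",
        [("I0001", "Borg", "Anna"), ("I0002", "Borg", "Erik")],  # stand-ins for parsed GEDCOM records
    )

# Phase 2: only now build the search indexes, so they are created once
# instead of being updated on every single insert.
con.execute("CREATE INDEX IF NOT EXISTS idx_person_surname ON person (surname)")
con.close()
```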

I agree. For infrequent tasks perhaps people need to have a little bit of patience. Sometimes speed may not be the most important factor in the design.

In the case of Deep Connections and the GEDCOM import, the problem may be with the algorithm. I don’t mind people rewriting these for better performance if it bothers them.

Using our existing indexes in the filter code would also be a good idea. If we are clever, we can probably come up with a database-agnostic design that can exploit the strengths of different backends in the future.

The comment that caught my attention was the one by @Davesellers. I can see that even a two second delay in a dialog that is used very often could be extremely irritating.

By increasing the database read cache size users can get an immediate performance improvement for virtually no work. I expect that this is one of the reasons that BSDDB is faster.
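For the SQLite backend that boils down to a single pragma; a minimal sketch (the 64 MiB value is only an example, and where Gramps actually issues it depends on the backend code):

```python
import sqlite3

con = sqlite3.connect("example.db")

# A negative value is interpreted as KiB, so this asks for roughly a 64 MiB
# page cache instead of SQLite's much smaller default.
con.execute("PRAGMA cache_size = -65536")
print(con.execute("PRAGMA cache_size").fetchone())
con.close()
```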

2 Likes

I know that, but it doesn’t give the results that you can achieve with better algorithms, or a better architecture for that matter. At my job we once had a query that took 20 minutes, which was a real problem, because it meant that a user’s workstation was frozen for that whole time.

By adding indexes, and maybe increasing caches too, our Oracle DBA was able to reduce the delay to something like 5 or 10 minutes, which was still not good enough, so I looked at the algorithm instead. And when I fixed that, I could run the same query in 20 seconds!

Increasing caches can help, but I don’t think that it can explain the difference between applying a filter in Gramps or RootsMagic, which is at least 20 times faster, thanks to its name table.

Fun fact: When I installed LMDE 6 on my HDD today, I found that the person filter worked just as fast on that as it does on my main database in Mint 21.3 on SSD. And that means that the caching works quite well, even for a 600k person database.

Gramps is slow because it does a full table scan. For a person search, every record is retrieved and matched using a Python rule. When we discussed this 8 years ago we fully realised this. My prototype used our existing indexes and significantly improved performance.
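Roughly the difference between the two approaches, as a sqlite3 sketch with invented column names rather than the real Gramps schema or rule code:

```python
import sqlite3

con = sqlite3.connect("example.db")

# Full table scan: retrieve every row and apply the matching rule in Python.
scan_matches = [
    handle
    for handle, surname in con.execute("SELECT handle, surname FROM person")
    if surname == "Borg"  # stand-in for a Gramps filter rule
]

# Index-assisted: let the database narrow the rows down first.
index_matches = [
    handle
    for (handle,) in con.execute(
        "SELECT handle FROM person WHERE surname = ?", ("Borg",)
    )
]
con.close()
```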

1 Like

When I search for substrings of names, I bet that RM does a full table scan too, or simply sends something like ‘%borg%’ to SQLite, which results in an interactive search, including alternative names, where interactive means that the results appear while I’m typing.

If that is true, and an SQL LIKE ‘%string%’ does indeed do a full table scan, the difference is not the table scan itself, but the fact that in RM the names sit in a table of their own. And with that, you don’t have the overhead of reading all person objects and searching inside their name elements, or object members.

In other words, I don’t believe that the problem is in the full table scan, but in the table that we need to scan and parse, if there is no separate name table. With a separate name table, all work is done by the database, and the influence of Python as an interpreted language is probably quite small, compared to whatever language is used by RM.
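A sketch of what a query against such a separate name table could look like (the table layout is invented for illustration; RootsMagic’s actual schema may differ):

```python
import sqlite3

con = sqlite3.connect("example.db")

# With every name (primary and alternative) in its own table, a substring
# search is answered entirely by the database, without unpacking person objects.
pattern = "%borg%"
matches = con.execute(
    "SELECT DISTINCT person_handle FROM name "
    "WHERE surname LIKE ? OR given LIKE ?",
    (pattern, pattern),
).fetchall()
con.close()
```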

That’s indeed what I think, and I have long delayed looking into the Deep Connections code. I do know that the Consanguinity Gramplet is a lot faster, even though it does some similar things.

I’ll try to find some time to look into that again, especially because I use it like a thousand times more often than the GEDCOM import.

1 Like

Yes. SQL LIKE queries can only use an index up to the first wildcard.
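That is easy to see in SQLite with EXPLAIN QUERY PLAN; a small sketch (note that SQLite only applies the LIKE optimisation at all when case_sensitive_like is on or the indexed column uses NOCASE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE name (surname TEXT)")
con.execute("CREATE INDEX idx_surname ON name (surname)")
con.execute("PRAGMA case_sensitive_like = ON")  # needed for the LIKE-to-index optimisation

# Prefix pattern: the index can be used up to the wildcard.
print(con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM name WHERE surname LIKE 'Borg%'"
).fetchall())

# Leading wildcard: nothing for the index to anchor on, so SQLite falls back to a full scan.
print(con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM name WHERE surname LIKE '%borg%'"
).fetchall())
con.close()
```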

I don’t know anything about the design of RM and I haven’t used it myself so I can’t really comment on specifics.

All designs have tradeoffs - advantages and disadvantages. A SQL query will be faster, but a Gramps filter allows regular expressions, which give more flexibility than SQL wildcards. This will be the case whether you create an extra table or not.

If you create the extra table to duplicate data for filter queries then you have to maintain it. If you normalise the data then our “get” methods will be slower.

With JSON there are other design options such as a generalised inverted index (GIN). My guess is that some form of hybrid design may well be the best choice.
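For PostgreSQL, that could look roughly like this (a sketch using psycopg2, with an invented table and a JSONB column; not the current Gramps schema):

```python
import psycopg2

con = psycopg2.connect("dbname=gramps_test")  # placeholder connection string
cur = con.cursor()

# A GIN index over the JSONB document lets containment queries avoid a full table scan.
cur.execute("CREATE INDEX IF NOT EXISTS idx_person_json ON person USING GIN (json_data)")
cur.execute(
    "SELECT handle FROM person WHERE json_data @> %s::jsonb",
    ('{"surname": "Borg"}',),
)
rows = cur.fetchall()
con.commit()
cur.close()
con.close()
```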

1 Like

Well, I think we should appreciate the time we have, and our lives. Life is quite an interesting thing: today we live and work, but tomorrow a ballistic missile may fly into your house. I don’t mean you personally @SNoiraud, this is an abstract statement. But in any case, you should try to increase your productivity and automate routine. If humanity had not done this at some point, there would be no technical progress, no computers, and no Internet. We would all still be the farmers about whom we are currently doing genealogical research.

Personally, I care about how much work I will get done in this life, and that depends on the performance of my computer and the software I use. That is my answer to your question above, @SNoiraud.

3 Likes

I like that, and I can see that such an index can be created and updated automatically by PostgreSQL. Via Google I don’t see that in SQLite, but there seems to be an extension named FTS (full-text search) that can be used for the same thing.

Is that right?
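Something like this, if I understand it correctly (a minimal FTS5 sketch with Python’s sqlite3, assuming the FTS5 extension is compiled in, which it is in most builds; table and column names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# An FTS5 virtual table maintains its own full-text index over the stored text.
con.execute("CREATE VIRTUAL TABLE name_fts USING fts5(person_handle, name_text)")
con.executemany(
    "INSERT INTO name_fts (person_handle, name_text) VALUES (?, ?)",
    [("I0001", "Anna Borg"), ("I0002", "Erik Borgström")],
)

# MATCH uses the full-text index; a prefix query like 'borg*' stays fast while typing.
print(con.execute(
    "SELECT person_handle FROM name_fts WHERE name_fts MATCH ?", ("borg*",)
).fetchall())
con.close()
```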

I need to add that for me, live interactive filtering is probably far more interesting for places, because when I add those to an event, I always do a Search first to check whether the place already exists. A fast live search there, plus an Add button that is always visible so that I can add a new place when I don’t find one, would give me far more than faster person filtering by name.

And yes, regular expressions are an advantage indeed.

1 Like

This would be a significant saving in workflow time (and avoid unnecessary rework aggravation).

If there were an Add icon in the Place selector for when the place does NOT exist, it would definitely streamline the process. (It would eliminate the following steps: cancel the Place selector, click the Add button, and navigate the hierarchy to the correct place enclosure level before adding.)

This item is mentioned in a Jul 2015 feature request against version 4.1.x: 0008698: Usability improvements: add a new or available place to an event

The request has four parts; item 2.1 was satisfied when drag-and-drop of Places to the Clipboard was enabled in v4.2.

Since it is a multi-part request, it tends to remain only partially implemented.

Performance improvements that affect things you do regularly should be prioritized over things people do rarely. But both should of course be looked into.

Personally, I have one “research in progress” note that has become very long, mostly because of how I decided to do research on one family.

When I have done an edit and click OK to close it, it takes about 5 seconds, even for a tiny change, though I’m guessing the size of the change doesn’t matter. My tree in general is still small.
The Note gramplet also takes about 4 seconds to show the note, while just opening it by double-clicking is still very fast.

It may be a rare case, but I just wanted to mention it.

3 Likes

It has increased to 10 seconds every time the note is saved (closed). I really shouldn’t be using it the way I am currently using it.

I had the same issue, and my solution was to move the note to a txt file and attach it as media. I don’t know if this is possible, but it really would be great to give users the ability to disable all note calculations and use it as a simple text area. But maybe that is not possible because of some report-generating features or something else.

How large is the note? Does it have many links in it?

I’ll have a look at it for you if you can create an example file that can reproduce the problem.

1 Like

I probably should have done that a long time ago, but I was also kind of curious how slow it would become after a while.

Most of the note, except for about 15-20 lines, is basically this:

[Between 5 words and a sentence what a link contains]
[Link to the location (A magazine for example)]
[Empty line]

Repeat over and over.
Pasting it into Word, making the font small enough that one line stays one line, gives about 600 lines. So about 200 short text lines, 200 links, and 200 empty lines.

That may seem like an insane amount to have in one note, but consider that I have been going through probably 800 or so search matches about people who were famous in some circles. It has already been cut down with a rough pass; I will go through them later and cut plenty more. (And then remove links once the best ones are added as sources and info to the tree.) (Maybe an unnecessary explanation.)

I can possibly share the note somehow if you want; most of the information dates to 1890–1950.

If what Gramps does is recalculate the links every time it saves, maybe it’s possible to make it only recalculate on the lines that were changed? (I don’t know if that’s technically possible, too much work, or even worth doing.)

1 Like

I do all my research in Obsidian and/or Foam for VS Code.
Then I attach the Markdown notes as media files to the Gramps object I am researching…

That way I have a lot of features to help me in the research.

Any Markdown or other notebook software that stores notes in plain text files will do. In addition, you can actually write long-form articles etc. in those tools if you want to, using your research…
Just a tip.