Improving performance in different areas

When I use the Deep Relations Gramplet on a ‘large’ tree, like 30,000 persons, with the generation limit set to 25, it often takes so long that I give up and kill Gramps. When I try the same in other programs, I get a result in seconds, and some can set relationship indicators for the whole tree in less than a minute, which is way less than Gramps needs for a single pair of persons.

It is often the case that GENi, with more than 160 Million connected profiles, needs less time than Gramps, even though their tree is more than a thousand times larger than mine.

Are there other areas that have similar problems?

When I first started using Deep Connections, it was dramatically faster. The performance hit didn’t evolve (as in, I added more nodes to my tree) but was instantaneous: apparently related to a release of Gramps, the relationship calculator or the add-on.

It started considering Note links, Associations and other indirect linkages… and gave them higher priority than direct genetics. And it started showing alternate paths passing through that same ancestor of siblings.

When @ukulelehans started beta testing his Consanguinity Gramplet, I had to jump ship. Unlike Deep Connections, his Gramplet allowed the Connection analysis to be limited to Direct ancestors.

Note: Yesterday, I discovered that having the Pedigree Gramplet active in the Relationships view was causing the GUI to become unresponsive after every change committed to an Event via the Person Editor. (Even though I was not changing the Active Person focus nor adding a Person to the Tree, the Pedigree Gramplet was continually recalculating. And with the generations limit configuration set to 25, it was consuming a lot of resources.)

And that’s quite a sin, passing the same relative twice. I mean, any decent developer should be aware of such a trap. And in a way it shows in the fact that we need a setting to prevent the algorithm from going too far. I don’t have that setting in PAF, or RootsMagic, and GENi can do without that too.
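To illustrate the trap: a minimal sketch of a breadth-first search that never expands the same person twice. This is not the Deep Connections code. The `links` dictionary and the `shortest_path` function are made up for illustration, and the tiny tree is invented sample data.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Breadth-first search over a person graph.

    `links` is a hypothetical dict mapping a person ID to the IDs of
    directly connected persons (parents, children, spouses).
    """
    if start == goal:
        return [start]
    visited = {start}            # every person is queued at most once
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbour in links.get(path[-1], ()):
            if neighbour in visited:
                continue         # already reached by a path at least as short
            if neighbour == goal:
                return path + [neighbour]
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None                  # no connection found


# Invented sample data: two siblings who share both parents.
links = {
    "me": ["dad", "mum"],
    "sis": ["dad", "mum"],
    "dad": ["me", "sis"],
    "mum": ["me", "sis"],
}
print(shortest_path(links, "me", "sis"))
```

With a visited set the work is bounded by the number of persons and links in the tree, so the search stops on its own when the tree is exhausted and does not need a generation limit to terminate.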

Do you know when that performance hit hit you? I see no difference between the 3.4.9 that I still use for my daily work, and the same Gramplet in 5.1.5.

The Consanguinity Gramplet is not an alternative, because I like to see the path when a person is an in-law too.

P.S.: I made the title quite vague, because I’m curious about other performance hogs too.

And just for fun, GENi needed less than 10 minutes to come up with something like this:

Denis McCullough is your 14th cousin’s husband’s brother’s wife’s third cousin’s husband’s great grandfather.

I find the noticeable performance drop for filtering and selectors very annoying, because I need them all the time. If a report takes more time to generate it isn’t a big issue for me, because I can decide when to generate it.

You probably ought to jump into the Developer discussions then.

One of the big performance hits for the Filter gramplet was the change from a “phrase” name search to an “Any word” search. The current workaround is to use the Searchbar because it remains a Phrase search.

But the intent has been expressed to convert the Searchbar to ALSO be an “Any word” search. I’m OK with that as a NON-default option. But the faster searchbar should be the default.

If the Filter Gramplet had the OPTION to switch back to a Phrase search, perhaps it would give it a faster modality?

It is true if you don’t use the “use regular expressions” option. If you use the regexp option, you continue to use the phrase search.
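For anyone unsure what the two modes mean, here is a rough sketch of the difference. It is not the Gramps filter code. The function names are made up, and since this thread does not say whether the per-word search combines the words with “any” or “all”, that is left as a parameter.

```python
def phrase_match(text, query):
    """True when the whole query occurs as one contiguous substring."""
    return query.lower() in text.lower()


def word_match(text, query, require_all=False):
    """Test each word of the query separately against the text.

    `require_all` is an assumption for illustration: False means any
    word may match, True means every word must match.
    """
    haystack = text.lower()
    hits = (word in haystack for word in query.lower().split())
    return all(hits) if require_all else any(hits)


# Invented sample names.
names = ["Smith, John", "Smith, Mary", "Johnson, Peter"]

print([n for n in names if phrase_match(n, "smith, john")])
print([n for n in names if word_match(n, "smith, john")])
print([n for n in names if word_match(n, "smith, john", require_all=True)])
```

The phrase search does one substring test per record, while the per-word search does up to one test per word per record, so it is plausible that it costs more on a large tree. The timing tests mentioned in this thread would be needed to say how much.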

Thanks. I will do some timing tests and post results.

Note that if that workaround works, it is ALMOST as counterintuitive as another utterly vital one needed repeatedly every data-entry session. The average Genealogist doesn’t even know what “Use regular expression” means nor the rules or patterns. That it has the extra baggage of toggling between “Phrase” & “multiple keyword” search is an “easter egg” that only a coder would suspect. And that this could be higher efficiency than using double-quotes to enclose the Phrase is arcane.

The most counterintuitive workaround is for Drilling down in the Select Object dialogs with Grouped lists… to drill down: Find, highlighting & Clearing the Find.

e.g.: to drill down to one of the MANY “Springfield”s as an Enclosing Place in the Select Place list: type the “Name contains” search pattern string “springfield”, click Find, highlight the desired enclosing “Springfield” by reviewing the Title, then Clear the Find results while the drill-down target is selected. The focus has now scrolled to the approximate drill-down level. (Although the Find will actually locate the Records, there is an extra refresh of the dialog at the end of the Populating. So you must often click Find twice because it tends to flush the 1st results.)

Cannot use the Search-as-you-type Ctrl-F because it can only locate the 1st “Springfield” subgrouping.

It looks like this extra refresh does not show in Linux, meaning that I don’t see it in Mint.

The Select Place dialog does need improvement though, because it’s quite useless when you have identical street names in different cities, and automatic titles are off.

I am ambivalent about adding Street Level detail to the Place hierarchy. Unless the place is “significant” to the family history, using the Addresses tab records the data without burying us in minutiae … and it keeps the Hierarchy from exploding. I am more likely to want to see 6 people were in a Township of a County, not that they were on certain roads… roads that I probably will not be able to place in my mental map.

It makes collation of Map pins more viable too.

It would be nice if Addresses could leverage the Place hierarchy and if it also supported GPS data within its granularity. (The “Locality, State/county, City, Country” lines could alternately be equivalent to a single “Place ID” and which shows the Place Title. Although the Locality might be better kept separate so that you could include a business or building name that you don’t want added to the Place hierarchy. Like the name of the Long-Term Care facility, or “hospice at home of son-in-law, John Smith”) That might also give the Geography view the option (although not the obligation) to ‘explode’ the granularity when you zoom in.

It’s true that I don’t add many streets, but the problem is the same for churches. We have English and French churches in many cities, and many villages with a reformed church, which is not defined by a name, because it’s the only one of that designation. And then there are also loads of catholic churches named after saints, so I have a lot of churches named after St. Mary, or St. Nicholas for that matter. And all these can not be distinguished in the current selector, because it doesn’t show the hierarchy when you search. You only see that when you have automatic titles, but I don’t use those, for a couple of reasons, including the fact that I often have a title like St. Nicholas, Amsterdam, with the street address in the location details, and not in a separate address object.

In those ambiguous cases, I make a Place naming concession that causes a bit of future proofing anxiety.

It helps with data entry but it is pretty certain that it will come back to bite me when generating reports.

For REALLY ambiguous places (like the manifold Springfield’s : boroughs, cities, municipalities, towns, townlands, districts, townships, parishes, counties, etc.), I’ve embedded a parenthetical & the civil division. Even though many place descriptions include the civil division in Camel Case, that implies that term is part of the official name. That is untrue. So my civil division is left in lower case.

Thus, the Springfield in Oakland county, MI becomes “Springfield (Oakland) township” & in Massachusetts becomes “Springfield (MA) county”

I have no problem adding everything up to the house number in the place hierarchy. More than once the proximity put me on the right track, or let me imagine the living conditions which could lead to a meeting or a marriage. It also helped me identify photos. So personally I find more advantages than disadvantages.

My concern is more about performance.

The two tables that are not flat lists (Places and Individuals) have significant latency populating the Object Selectors in my tree. (It is a ‘research’ collection that includes hypotheticals, not strictly ‘proven’ genealogy.)

My Place tree already has loops. (Because enclosing Places changed over time for disputed territories.) So every piece of data kept out of the deepest levels can remove multiple evaluations when populating the Selector.

And the populating Place Selector is one of my two most flakey features. A session starts with it populating fast (less than 2 seconds) but it gets slower & slower throughout a data entry session. Just before I HAVE to restart Gramps, it takes more than a minute. Which is why my workflow resorts to clipboarding each Place used in a data-entry session.

(The other flakey portion where latency grows in a session is: switching between People & Relationships view. )

I know that GTK 3 is part of the problem, because the person tree view in that is much slower than in GTK 2, which I still use in Gramps 3.4.9. And initially, it was so slow that Doug Blank added a caching mechanism to all tree views to make them a bit smoother. And when I compare Gramps 3.4.9 and 5.1.5 on the same platform, I can see that scrolling in 5.1.5 is quite jumpy. I can make this comparison for persons only, for obvious reasons.

The Gnome team has always denied that this is a bug, but I can prove it with trace statements, which show that GTK 3 generates way more callbacks than necessary when you start scrolling. And this may affect populating the selectors too.

Since you work with Windows, you have a chance to make a comparison between 3.4.9 and 5.1.5, because they can be installed independently. And if my interpretation is right, you will see a difference in latency in the person selector.

The performance problem could easily be overcome by actually using the tables, indexes and views in a relational database engine for what they were made for, instead of serialized strings of multiple values all in one table (the blobs)…

More of the search and index work would be done in the database engine, and views in the database could then serve the most used views in Gramps.
And another benefit of this would be that we wouldn’t need to wait for Gramps to first update the whole blob, then read it back to memory and then populate the actual view in Gramps…

Most likely it is too much work to program that type of change… but the benefit would be that Gramps would be a lot more responsive when it comes to reading, writing, updating and deleting objects…

A relational database isn’t necessarily faster. It can even be slower because data that is now in the blob needs to be retrieved from another table, although that should not be too much of a problem with the right indexes.

For places (locations) we already have a text field for the title and for the enclosed_by link, but not for the name. And both existing fields have indexes (indices) too. Adding a field for the name should be easy, and it would indeed allow us to search the database, instead of building a whole place tree in memory, like we do now.
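A minimal sketch of that idea, using SQLite from Python. The table layout below is hypothetical and much simpler than the real Gramps schema. Only the idea of an indexed name column next to the enclosed_by link comes from this thread, and the rows are invented sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE place (
        handle      TEXT PRIMARY KEY,
        name        TEXT,           -- the proposed extra column
        enclosed_by TEXT            -- handle of the enclosing place
    );
    CREATE INDEX place_name ON place(name);
""")
con.executemany(
    "INSERT INTO place VALUES (?, ?, ?)",
    [
        ("p1", "Amsterdam", None),
        ("p2", "Utrecht", None),
        ("p3", "St. Nicholas", "p1"),
        ("p4", "St. Nicholas", "p2"),
    ],
)


def find_places(con, name):
    """Return (name, enclosing name) pairs for an exact name match."""
    rows = con.execute(
        """
        SELECT child.name, parent.name
        FROM place AS child
        LEFT JOIN place AS parent ON parent.handle = child.enclosed_by
        WHERE child.name = ?
        ORDER BY parent.name
        """,
        (name,),
    )
    return rows.fetchall()


print(find_places(con, "St. Nicholas"))
```

The join on enclosed_by also returns the enclosing place for each hit, which is the piece of information a selector would need to tell identically named churches apart.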

I reported this before, maybe it is related. When I add a new Place with an Enclosed by, it takes up to 10 seconds to save the record.
Does it have to save this blob and then retrieve it??
5.1.5 on Win10. There are about 4,000 Places.

Adding an Enclosing Place requires updating the fully qualified place “title” for ALL the enclosed Places. And that seems to be a slow process.

(Is it repeating the assessment all the way to the topmost enclosing Place level for every enclosed place? And possibly adapting to time periods too? It seems like the Edit Place functionality could just do the currently active place and make updating the remainder into a background task.)
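I don’t know how Gramps does it internally, but as a sketch of the alternative: if the title of each enclosing Place is cached the first time it is built, no chain is walked to the top more than once. The `names` and `enclosed_by` dictionaries and the `build_titles` function are hypothetical stand-ins, not Gramps data structures, and time periods are ignored.

```python
def build_titles(names, enclosed_by):
    """Return {place_id: full title}, resolving each place only once.

    `names` maps place ID -> own name; `enclosed_by` maps place ID ->
    enclosing place ID (absent for top-level places).
    """
    titles = {}

    def title(place_id, seen=()):
        if place_id in titles:
            return titles[place_id]      # cached: no second walk to the top
        if place_id in seen:
            raise ValueError(f"loop in place hierarchy at {place_id}")
        parent = enclosed_by.get(place_id)
        if parent is None:
            result = names[place_id]
        else:
            result = f"{names[place_id]}, {title(parent, seen + (place_id,))}"
        titles[place_id] = result
        return result

    for place_id in names:
        title(place_id)
    return titles


# Invented sample hierarchy.
names = {"us": "USA", "mi": "Michigan", "oak": "Oakland", "spr": "Springfield"}
enclosed_by = {"mi": "us", "oak": "mi", "spr": "oak"}
print(build_titles(names, enclosed_by)["spr"])
```

The `seen` tuple is only there so that a hierarchy with a true loop raises an error instead of recursing forever.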

When I read this, I am assuming that you mean something like adding a new town enclosed by an existing county. Is that right? I’m asking, because Brian’s interpretation is a totally different scenario, and I assume that you have the same mother language, sort of. I don’t.

In my interpretation, adding a new place, enclosed by an existing one, means adding one row to the database, which has a handle to the enclosing object, and that’s fast. It will also mean finding the right node in the display tree, and adding an extra node (leaf) there. And on my PC, this takes just a second in Linux.