This thread is for discussing updates to the CSV and JSON files.
The first set of questions is copied from another thread:
Not immediately, but after clicking on another entity (for example, another person), the data will be read from the updated CSV file and the newly made changes to the links will be displayed. I need to double-check, but as I recall, a Gramps or WebSearch restart is required if the JSON is updated. I assumed JSON updates would be infrequent, so it did not seem worth reloading the JSON every time. This needs further discussion; what do you think?
@emyoulation I could even suggest re-reading the CSV and JSON files every X seconds (with this interval exposed in the config). Then all CSV and JSON updates would be applied automatically at a predefined frequency.
Please do not do that. A Reload button in the Config would be far safer in avoiding extra Gramps “churning” in the background. (The timed automated backup does that and causes occasional complaints.)
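To make the two approaches concrete, here is a minimal PyGObject sketch (not the actual WebSearch code; `reload_links` is a hypothetical hook) contrasting timed polling with an explicit Reload button:

```python
import gi
gi.require_version("Gtk", "3.0")
from gi.repository import GLib, Gtk


def reload_links():
    """Hypothetical hook: re-read the CSV/JSON link files from disk."""
    print("re-reading CSV and JSON link files...")


# Timed approach (re-read every X seconds): keeps Gramps busy in the
# background even when nothing has changed.
def _timed_reload():
    reload_links()
    return True  # returning True keeps the GLib timeout active


GLib.timeout_add_seconds(60, _timed_reload)

# Button approach: the user decides when a reload is worth the cost.
reload_button = Gtk.Button(label="Reload link files")
reload_button.connect("clicked", lambda _button: reload_links())
```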
Another thing I don’t like is the parenthetical notation that specifies which variables are used, for example (g,s,b,d). Some links differ only in these letters. I have no idea how to make this clearer.
I am not recommending the following, but maybe a different naming convention should be considered. (Fortunately, naming does not affect the gramplet code. Although it does affect the sorting in the Configuration.)
If you choose to do regional silos, it might be better to adopt locale codes. And since locale codes use hyphens, maybe underscores should be used elsewhere. (Adding a leading underscore to the 3 global CSVs would group them at the beginning of the list.)
Oh, maybe it just means that Geneanet forces the NL code for your locale (registered or set)…
I do not really think it should be in fr-links.csv! It is just the ‘first’ alternative to the deprecated one, which does not work anymore.
When we use the “old” query (pseudo-API), that is, the hardcoded link for the French locale, we get a message like this:
“As announced several months ago, the ‘Search by Names in Trees’ service is being replaced by new tools. This search tool, which was technically obsolete, only provided access to a very small portion of the available records on Geneanet.”
On the “new”, incomplete pseudo-API, I still have a problem with “?go=”. Whatever number is set, it returns something, so it looks like a server ID.
Then, whatever argument/attribute is set as the key, we get something back (tested with curl).
I have the impression that they added something to avoid hardcoded URLs (a client UID or tracking stuff). With a typical URL (and query keys) for French-speaking users (fonds = record collections or repositories, in this context; individus = individuals; nom = surname; prenom = given name), we get an answer. I also got one in English. Since with curl I get two answers plus a horrible extra HTML return, I wonder what the main motivation is.
One simple query, like normal web navigation, but I get a strange answer.
Take care: navigating the Belgian archives means that you are reading either French (Walloon), Dutch (Flemish), or German!
In Switzerland, we can also find an Italian version…
Seriously, when one switches to sources for Canada or Belgium, it will be difficult to assign a language code. This might explain why you get an NL code back from Geneanet, maybe after looking at archives for Belgium…
Alright, let’s summarize together and try to make a final decision about what to do with these links.
I hope other researchers will join our discussion with @romjerome to share their opinions.
Problem:
We have several links where the locale or language corresponds to one country, but the resource actually belongs to another country. Sometimes, there is just a close historical or linguistic connection.
Here are the currently known ambiguous resources:
fr-links.csv – Canadian source (people): https://www.fichierorigine.com/recherche?nom=%(surname)s
fr-links.csv – Canadian source (places): https://www.fichierorigine.com/recherche?commune=%(place)s&pays=%(root_place)s
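(As a side note for anyone editing these entries: the placeholders look like Python %-style named substitution, so a link is presumably expanded roughly as below. This is a minimal sketch and my assumption about the mechanism, not the actual WebSearch code; the surname is made up.)

```python
from urllib.parse import quote_plus

# One of the templates from fr-links.csv.
template = "https://www.fichierorigine.com/recherche?nom=%(surname)s"

# Hypothetical value taken from the active person; the real gramplet
# fills this in itself.
data = {"surname": quote_plus("Tremblay")}

print(template % data)
# -> https://www.fichierorigine.com/recherche?nom=Tremblay
```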
1. Leave everything as is
That means French users will continue seeing these links in fr-links.csv (as it was in the FRConnectPack addon).
This makes sense, because for example, on the homepage of the Canadian source, it clearly says in French:
“Répertoire des actes des émigrants français et étrangers établis au Québec des origines à 1865.”
This suggests the resource is intended for French speakers, especially those researching emigrants from France.
2. Move these links to the regions that own the archives
This is also a logical approach. But just like in the first case, users from those regions (e.g., Canada or Belgium) might be confused to see these resources in their localized CSV.
3. Create separate hybrid files, using a naming pattern like {locale}-{country}-links.csv.
For example: fr-ca-links.csv or fr-be-links.csv.
But over time, we might end up with too many files.
Also, @romjerome pointed out:
“It will be difficult to assign a language code.”
4. Create a single new file: cross-links.csv
This file would contain all such cross-regional or linguistically linked resources.
Their Title field would be labeled clearly, for example:
CA resource with FR locale
BE resource with FR locale
These links could be marked with a special icon.
Now we need to decide:
Which of the above 4 options do we choose, or does someone want to suggest an alternative approach?
Once we decide, I’ll implement it in WebSearch accordingly.
@romjerome I have another question about yesterday’s list of links. Could you please summarize which of them should be removed, which should be moved, etc.?
I still have a problem understanding the expected parameters, but for maintenance, and for checking which URLs are still active or efficient, we could run something like this:
‘d’ was for departmental, ‘r’ is now for regional!
A simple and quick way of checking URLs (without scripting or cookie handling) should let us update the URLs in a kind of pseudo-sandbox, maybe via a simple script.
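For illustration, a minimal sketch of such a quick check, assuming a bare HEAD request with the standard library and no authentication (the URL is just one example already mentioned in this thread):

```python
import urllib.request


def quick_check(url, timeout=10):
    """Send a bare HEAD request; return the HTTP status or the raised error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except Exception as error:  # DNS failure, 4xx/5xx, timeout, ...
        return error


print(quick_check("https://www.fichierorigine.com/"))
```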
This kind of testing won’t give the expected results because in some cases, redirects will work for visitors from certain regions and not for others. Some users might get a 404 or another error code, while others won’t. Additionally, sites can be temporarily unavailable. To properly test links, you’d need to use paid services like Apify and configure them correctly so that the visits appear as if coming from real users with real browsers and the appropriate locale. Testing with parameters won’t help either, since many resources require user authentication first. We also shouldn’t forget about services like Cloudflare, CAPTCHAs, and similar protections. All of this means such testing does not guarantee reliable results.
Regarding the idea of splitting parameters into separate fields — this complicates things for users who are not technically skilled.
From within the Gramplet, it’s easy to extract the domain using Python and test just that. It’s just as easy to strip out query parameters and test the base URL only.
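Something along these lines, for example (a sketch, not the gramplet's actual code):

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://www.fichierorigine.com/recherche?nom=%(surname)s"
parts = urlsplit(url)

domain = parts.netloc  # "www.fichierorigine.com"
base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
print(domain, base_url)  # base URL with the query string stripped
```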
I could automate this on a basic level by checking only domain availability (and I’ve even considered doing that), but I don’t see much practical value in it.
It’s possible to write a script that parses all the CSV files and processes the URLs into a structured format suitable for testing — exactly the kind of structure you’re referring to.
Such a script could (a rough sketch follows this list):
Iterate through all the .csv files,
Extract each URL and split it into base URL and query parameters,
Normalize them for analysis or templating,
Optionally, generate curl commands or send curl requests directly for testing (e.g., checking domain accessibility, tracking changes in accepted parameters, etc.),
Output the results into a .txt file for manual review or further automated checks.
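A rough sketch of such a script, assuming the CSV files sit in the current directory and keep the link in a column named "URL"; the column name, file pattern, and report path are my assumptions, not the actual WebSearch layout:

```python
import csv
import glob
from urllib.parse import urlsplit, urlunsplit, parse_qsl

URL_COLUMN = "URL"              # assumed column name; adjust to the real layout
REPORT_FILE = "url_report.txt"  # plain-text output for manual review

with open(REPORT_FILE, "w", encoding="utf-8") as report:
    for csv_path in sorted(glob.glob("*-links.csv")):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                template = (row.get(URL_COLUMN) or "").strip()
                if not template.startswith("http"):
                    continue
                parts = urlsplit(template)
                base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
                params = parse_qsl(parts.query)
                # One line per link: source file, base URL, expected
                # parameters, and a ready-to-run curl HEAD command.
                report.write(
                    f"{csv_path}\t{base_url}\t{params}\t"
                    f"curl -I --max-time 10 '{base_url}'\n"
                )
```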