This thread is for discussing updates to the CSV and JSON files.
The first set of questions is copied from another thread:
Not immediately, but after clicking on another entity (for example, another person), the data will be read from the updated CSV file and the newly made changes to the links will be displayed. I need to double-check, but as I recall, a Gramps or WebSearch restart is required if the JSON is updated. I assumed JSON updates would be infrequent, so it did not seem worth reloading the JSON every time. This needs further discussion; what do you think?
@emyoulation I could even suggest re-reading the CSV and JSON files every X seconds (with this interval exposed in the config). Then all CSV and JSON updates would be applied automatically at a predefined frequency.
Please do not do that. A Reload button in the Config would be far safer in avoiding extra Gramps “churning” in the background. (The timed automated backup does that and causes occasional complaints.)
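To make the two approaches concrete, here is a minimal PyGObject sketch (not the actual WebSearch code; `reload_links` is a hypothetical hook) contrasting timed polling with an explicit Reload button:

```python
import gi
gi.require_version("Gtk", "3.0")
from gi.repository import GLib, Gtk


def reload_links():
    """Hypothetical hook: re-read the CSV/JSON link files from disk."""
    print("re-reading CSV and JSON link files...")


# Timed approach (re-read every X seconds): keeps Gramps busy in the
# background even when nothing has changed.
def _timed_reload():
    reload_links()
    return True  # returning True keeps the GLib timeout active


GLib.timeout_add_seconds(60, _timed_reload)

# Button approach: the user decides when a reload is worth the cost.
reload_button = Gtk.Button(label="Reload link files")
reload_button.connect("clicked", lambda _button: reload_links())
```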
Another thing I don’t like is the parenthetical notation that specifies which variables are used, for example (g,s,b,d). Some links differ only in these letters. I have no idea how to make this clearer.
I am not recommending the following, but maybe a different naming convention should be considered. (Fortunately, naming does not affect the gramplet code. Although it does affect the sorting in the Configuration.)
If you choose to do regional silos, it might be better to adopt locale codes. And since locale codes use hyphens, maybe underscores should be used elsewhere. (Adding a leading underscore to the 3 global CSVs would group them at the beginning of the list.)
Oh, maybe it just means that Geneanet forces the NL code for your locale (registered or set)…
I do not really think it should be in fr-links.csv! It is just the ‘first’ alternative to the deprecated one, which does not work anymore.
When we use the “old” query (pseudo-API), that is, the hardcoded link for the French locale, we get a message like this:
“As announced several months ago, the ‘Search by Names in Trees’ service is being replaced by new tools. This search tool, which was technically obsolete, only provided access to a very small portion of the available records on Geneanet.”
On the “new”, incomplete pseudo-API, I still have a problem with “?go=”. Whatever number is set, it returns something, so it looks like a server ID.
Then, whatever argument/attribute is set as the key, we get something back (tested with curl).
I have the impression that they added something to avoid hardcoded URLs (a client UID or tracking stuff). With a typical URL (and query keys) for French-speaking users (fonds = record collections or repositories, in this context; individus = individuals; nom = surname; prenom = given name), we get an answer. I also got one in English. Since with curl I get two answers plus a horrible extra HTML return, I wonder what the main motivation is.
One simple query, like normal web navigation, but I get a strange answer.
Take care: navigating the Belgian archives means that you are reading either French (Walloon), Dutch (Flemish), or German!
In Switzerland, we can also find an Italian version…
Seriously, when one switches to sources for Canada or Belgium, it will be difficult to assign a language code. This might explain why you get an NL code back from Geneanet, maybe after looking at archives for Belgium…
Alright, let’s summarize together and try to make a final decision about what to do with these links.
I hope other researchers will join our discussion with @romjerome to share their opinions.
Problem:
We have several links where the locale or language corresponds to one country, but the resource actually belongs to another country. Sometimes, there is just a close historical or linguistic connection.
Here are the currently known ambiguous resources:
fr-links.csv – Canadian source (people): https://www.fichierorigine.com/recherche?nom=%(surname)s
fr-links.csv – Canadian source (places): https://www.fichierorigine.com/recherche?commune=%(place)s&pays=%(root_place)s
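(As a side note for anyone editing these entries: the placeholders look like Python %-style named substitution, so a link is presumably expanded roughly as below. This is a minimal sketch and my assumption about the mechanism, not the actual WebSearch code; the surname is made up.)

```python
from urllib.parse import quote_plus

# One of the templates from fr-links.csv.
template = "https://www.fichierorigine.com/recherche?nom=%(surname)s"

# Hypothetical value taken from the active person; the real gramplet
# fills this in itself.
data = {"surname": quote_plus("Tremblay")}

print(template % data)
# -> https://www.fichierorigine.com/recherche?nom=Tremblay
```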
1. Leave everything as is
That means French users will continue seeing these links in fr-links.csv (as it was in the FRConnectPack addon).
This makes sense, because for example, on the homepage of the Canadian source, it clearly says in French:
“Répertoire des actes des émigrants français et étrangers établis au Québec des origines à 1865.”
This suggests the resource is intended for French speakers, especially those researching emigrants from France.
2. Move these links to the regions that own the archives
This is also a logical approach. But just like in the first case, users from those regions (e.g., Canada or Belgium) might be confused to see these resources in their localized CSV.
3. Create separate hybrid files, using a naming pattern like {locale}-{country}-links.csv.
For example: fr-ca-links.csv or fr-be-links.csv.
But over time, we might end up with too many files.
Also, @romjerome pointed out:
“It will be difficult to assign a language code.”
4. Create a single new file: cross-links.csv
This file would contain all such cross-regional or linguistically linked resources.
Their Title field would be labeled clearly, for example:
CA resource with FR locale
BE resource with FR locale
These links could be marked with a special icon.
Now we need to decide:
Which of the above 4 options do we choose, or does someone want to suggest an alternative approach?
Once we decide, I’ll implement it in WebSearch accordingly.
@romjerome I have another question about yesterday’s list of links. Could you please summarize which of them should be removed, which should be moved, etc.?
I still have a problem understanding the expected parameters, but for maintenance, and for checking which URLs are still active or efficient, we could run something like this:
‘d’ was for departmental, ‘r’ is now for regional!
A simple and quick way of checking URLs (without scripting or cookie handling) should let us update the URLs in a kind of pseudo-sandbox, maybe via a simple script.
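For illustration, a minimal sketch of such a quick check, assuming a bare HEAD request with the standard library and no authentication (the URL is just one example already mentioned in this thread):

```python
import urllib.request


def quick_check(url, timeout=10):
    """Send a bare HEAD request; return the HTTP status or the raised error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except Exception as error:  # DNS failure, 4xx/5xx, timeout, ...
        return error


print(quick_check("https://www.fichierorigine.com/"))
```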
This kind of testing won’t give the expected results because in some cases, redirects will work for visitors from certain regions and not for others. Some users might get a 404 or another error code, while others won’t. Additionally, sites can be temporarily unavailable. To properly test links, you’d need to use paid services like Apify and configure them correctly so that the visits appear as if coming from real users with real browsers and the appropriate locale. Testing with parameters won’t help either, since many resources require user authentication first. We also shouldn’t forget about services like Cloudflare, CAPTCHAs, and similar protections. All of this means such testing does not guarantee reliable results.
Regarding the idea of splitting parameters into separate fields — this complicates things for users who are not technically skilled.
From within the Gramplet, it’s easy to extract the domain using Python and test just that. It’s just as easy to strip out query parameters and test the base URL only.
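Something along these lines, for example (a sketch, not the gramplet's actual code):

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://www.fichierorigine.com/recherche?nom=%(surname)s"
parts = urlsplit(url)

domain = parts.netloc  # "www.fichierorigine.com"
base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
print(domain, base_url)  # base URL with the query string stripped
```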
I could automate this on a basic level by checking only domain availability (and I’ve even considered doing that), but I don’t see much practical value in it.
It’s possible to write a script that parses all the CSV files and processes the URLs into a structured format suitable for testing — exactly the kind of structure you’re referring to.
Such a script could (a rough sketch follows this list):
Iterate through all the .csv files,
Extract each URL and split it into base URL and query parameters,
Normalize them for analysis or templating,
Optionally, generate curl commands or send curl requests directly for testing (e.g., checking domain accessibility, tracking changes in accepted parameters, etc.),
Output the results into a .txt file for manual review or further automated checks.
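A rough sketch of such a script, assuming the CSV files sit in the current directory and keep the link in a column named "URL"; the column name, file pattern, and report path are my assumptions, not the actual WebSearch layout:

```python
import csv
import glob
from urllib.parse import urlsplit, urlunsplit, parse_qsl

URL_COLUMN = "URL"              # assumed column name; adjust to the real layout
REPORT_FILE = "url_report.txt"  # plain-text output for manual review

with open(REPORT_FILE, "w", encoding="utf-8") as report:
    for csv_path in sorted(glob.glob("*-links.csv")):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                template = (row.get(URL_COLUMN) or "").strip()
                if not template.startswith("http"):
                    continue
                parts = urlsplit(template)
                base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
                params = parse_qsl(parts.query)
                # One line per link: source file, base URL, expected
                # parameters, and a ready-to-run curl HEAD command.
                report.write(
                    f"{csv_path}\t{base_url}\t{params}\t"
                    f"curl -I --max-time 10 '{base_url}'\n"
                )
```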