I recently got my first DNA result and just started using the DNA Segment Map Gramplet - thanks @GaryGriffin for that!
One thing I noticed is that copying and pasting from GEDmatch requires me to remove the thousands-separator dots; otherwise the overlaps are not displayed.
@Nick-Hall where should we file an enhancement feature request to support number format conversion in the Gramps variant of glocale? Or maybe I should ask how to write it to point out where the modification needs to occur?
I suggest not bothering with Gramps' locale management at all. The reason GEDmatch displays the table like this is likely my browser's locale, but we don't know whether the browser uses the same locale as Gramps; in fact, we don't even know whether the data was copied from a browser on the same system that Gramps is running on.
I suggest taking a more pragmatic approach. We know which columns are floats and which are integers, and we also know that the floats are always less than 1000. So we can just drop every non-[0-9] character in the integer columns and convert every , (there will be at most one) to . in the float columns.
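A minimal sketch of that idea, assuming the column order is chromosome, start, stop, cM, SNPs (the order is taken from the sample rows later in this thread, not from the gramplet's code). The name `normalize_fields` and the two column constants are hypothetical.

```python
import re

# Assumed column layout: chromosome, start, stop, cM, SNPs.
INT_COLUMNS = (1, 2, 4)   # start, stop, SNPs: whole numbers
FLOAT_COLUMNS = (3,)      # cM: always below 1000, so no thousands separator

def normalize_fields(fields):
    """Return a copy of fields with locale-specific number formatting removed."""
    cleaned = list(fields)
    for i in INT_COLUMNS:
        # Drop everything that is not a digit (thousands separators, spaces).
        cleaned[i] = re.sub(r"[^0-9]", "", cleaned[i])
    for i in FLOAT_COLUMNS:
        # At most one ',' can occur here, and it must be the decimal mark.
        cleaned[i] = cleaned[i].replace(",", ".")
    return cleaned

print(normalize_fields("1\t54.751.900\t83.468.985\t31,8\t1.451".split("\t")))
# ['1', '54751900', '83468985', '31.8', '1451']
```

This never has to guess which character is the thousands separator, because the column type alone decides how each field is cleaned.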
Agreed - I need to determine thousand separator and radix independent of locale.
So try the following code:
    if re.search('\t',line) != None:
        field = line.split('\t')
        line_wo_thousand = line
        if re.search(',',field[1]) != None:
            line_wo_thousand = re.sub(',','',line)
        elif re.search('.',field[1]) != None:
            line_wo_thousand = re.sub('\.','',line)
        if line_wo_thousand != line:
            line = line_wo_thousand
        if re.search(',',field[3]) != None:
            line2 = re.sub(',','.',line)
        else:
            line2 = line
        line = re.sub('\t',',',line2)
    field = line.split(',')
The top and bottom lines exist in the original code. The logic in between is completely replaced. Check the Start Pos (field[1]) for the thousands separator and the cM value (field[3]) for the radix. Once the line is normalized to no thousands separator and a period for the radix, replace the tab character with a comma and then process the line.
No, the point is that the Gramplet needs to interpret copy/pasted content from a web site, and we don't know which locale the web site displays the numbers in. For instance, if I open the English web site from a browser with a non-English locale, it is up to the web site how it shows me the numbers. So it's better to parse it in a locale-independent way.
@GaryGriffin: I think your code looks good! In re.search, doesn't the dot also need to be escaped? An unescaped '.' matches any character, so the elif branch would fire for every row that has no comma in the Start Pos.
Here is a slightly more concise version which I think has the same effect:
    if "\t" in line:
        field = line.split("\t")
        if "," in field[1]:
            line = line.replace(",", "")
        elif "." in field[1]:
            line = line.replace(".", "")
        line = line.replace(",", ".")
        line = line.replace("\t", ",")
    field = line.split(",")
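On the escaping question above, a small illustration of why it matters. The sample value is made up; it stands for a Start Pos copied without any thousands separator.

```python
import re

start_pos = "54751900"  # hypothetical Start Pos with no thousands separator

# Unescaped: '.' is a regex wildcard, so it matches the first digit.
unescaped = re.search('.', start_pos) is not None
# Escaped: only a literal dot matches, and there is none here.
escaped = re.search(r'\.', start_pos) is not None
# Plain substring test: no regex involved, so nothing to escape.
substring = '.' in start_pos

print(unescaped, escaped, substring)  # True False False
```

Using `in` and `str.replace`, as in the concise version, sidesteps the problem because neither treats the dot as a pattern.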
Then explain to me how you plan to show me the decimal delimiter as a comma after you have "imported" the data into Gramps, because I need all my results to be usable in MY language, not in English, and I bet most Europeans need that too when they use them in books and articles (or maybe export them again for use in other software).
Create a Note in the Association or attached to a Citation in the Association with the shared DNA segment data.
So you copy and paste the data from the external website (e.g. GEDmatch), in whatever format that website provides to you, into your note. The gramplet is not involved in this step at all.
Then, the Gramplet needs to process that note, and thus needs to handle different number formats.
That has nothing whatsoever to do with my opinion, ignorance, arrogance, or native language.
So why not use a separator that is commonly used? Most systems use CSV if they provide an export of DNA results... By using commas, you would also save a step for anything else the gramplet might do in the future, e.g., reading a CSV file from any of the test providers directly.
The only two formats I have ever seen used in any software that imports/exports DNA results are CSV and TSV. I don't think I have ever used software that actually uses DSV as an export format... but maybe it should have been used, since it supports any delimiter you like, including pipe, middle dot, etc.
Why use delimiters that are not commonly used in other software or "standards"...
(I am not talking about locales, but digitally used "standards").
No, it's feedback to Gramps, or more accurately, feedback on this function/gramplet specifically. I don't give feedback on tools I am not using or don't want to use...
I did want to use this gramplet, but...
I really do not understand why it is so difficult to use "standards" or logic, the interoperability and interchangeability standards that are already well established... or to actually use tools and libraries that already do a great job for a given task. It just amazes me how reluctant, in general, some developers are to utilize the work of other open source and open data libraries and standards... It's as if it is very important to try to reinvent the wheel all over again, and again, and again... It just amazes me...
But maybe that's why I am moving more and more of my research over to software solutions that actually use Open Data and Open Standards and, as much as possible, store data in plain text files, so that the data can be reused in different software without the need to store it multiple times in multiple "formats", and that utilize commonly used non-lossy interchange or interoperability formats...
Most likely it is just because I am lazy, or because my head no longer manages to process programming languages and I just want to do research rather than start learning Python, C#, C++, Perl, R, Julia, etc.
I will not interfere with my opinions anymore; sorry that I spoke up.
Hmmm. Making the input parsing "locale" agnostic (instead of adaptive to the OS locale) raises a curious issue. If you're predicting which parsing rule to use by column order, what happens with an RTL (right-to-left language) source for a DNA segment data table?
Maybe @avma or @yaron could find a Hebrew example?
How about adding an āAuto Detectionā combo box to the paste field with all the different locale options in case the user is not happy with the results?
When it comes to number formatting, Hebrew is not much different from British English; our differences are mostly in the long-form date/time display, but I assume that's not the case here.
Let me take a step back and explain the history of this gramplet. Initially it was written to accept CSV input. After realizing that most of the sites that provide the DNA shared segment info were generating a TSV (some of which used a thousands separator), I added support for TSV, assuming an optional thousands separator of "," and a radix (decimal) character of ".". This was to reduce data entry to a cut-and-paste operation.
Sites like GEDmatch provided data in TSV with a thousands separator. When it was pointed out that the German version of GEDmatch used a different thousands separator (and radix), I updated the gramplet to work with either a "," or a "." as thousands separator.
Note that the user cuts from a program like GEDmatch and pastes to an Association Note. Where I made the change was how I processed the Note to extract the data. The Note is in whatever language the user wishes.
So with this change, the following lines in the Association Note are interpreted exactly the same. Any of the 4 formats can be used interchangeably.
1,54751900,83468985,31.8,1451 (CSV)
1 54751900 83468985 31.8 1451 (TSV with no thousands )
1 54,751,900 83,468,985 31.8 1,451 (TSV with thousands ',')
1 54.751.900 83.468.985 31,8 1.451 (TSV with thousands '.')
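As a quick check of that claim, here is a sketch that wraps the concise version posted earlier in this thread in a function and runs it on the four sample rows. It is not the gramplet's actual code: the function name `parse_segment_line` is hypothetical, and the TSV samples are written with explicit `\t` characters on the assumption that the rows above were originally tab-separated.

```python
def parse_segment_line(line):
    """Split one segment line into fields, tolerating the four formats above."""
    if "\t" in line:
        field = line.split("\t")
        if "," in field[1]:         # ',' is the thousands separator
            line = line.replace(",", "")
        elif "." in field[1]:       # '.' is the thousands separator
            line = line.replace(".", "")
        line = line.replace(",", ".")    # any ',' still present is the radix
        line = line.replace("\t", ",")   # from here on, treat the line as CSV
    return line.split(",")

samples = [
    "1,54751900,83468985,31.8,1451",            # CSV
    "1\t54751900\t83468985\t31.8\t1451",        # TSV, no thousands separator
    "1\t54,751,900\t83,468,985\t31.8\t1,451",   # TSV, thousands ','
    "1\t54.751.900\t83.468.985\t31,8\t1.451",   # TSV, thousands '.'
]
results = [parse_segment_line(s) for s in samples]
print(results[0])  # ['1', '54751900', '83468985', '31.8', '1451']
print(all(r == results[0] for r in results))  # True
```

The sketch relies on the Start Pos being large enough to contain a thousands separator whenever one is in use.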