Replacing the translations extracting tool

hhyyrylainen · January 24, 2023, 9:41am

So we’ve had a Python-based translation extraction tool basically ever since we added translation support to the game. It works pretty well, but there’s now a need to modify its behaviour in a pretty complex way. So far the maintenance I’ve done on that tool has only involved pretty basic Python code, but that change would require a deeper understanding of the tool.

Also, some people have failed to get the Python-based tool running. I don’t find it particularly complicated to work with, even with a venv setup, but that might be because I have multiple years of professional Python experience. Still, I do get that an extra dependency on an entirely different programming language is not that great, so not having to use Python for that one task would be pretty nice.

Those two points are basically leading me to favour just writing our own custom text extraction tool in C#. That would make it easier to maintain (for people who can’t read undocumented and very data-dependent Python code) and easier to run. Right now I need to update some of our code and the CI build environment for .NET 7, so I might check how big a task this is and do it at the same time if it seems feasible in a couple of hours.

hhyyrylainen · January 25, 2023, 10:16am

I managed to do this. It wasn’t super hard, but it did take quite a few hours:

github.com/Revolutionary-Games/Thrive

Updated to .NET 7, fixed new code check warnings, rewrote translation extraction in C#

Revolutionary-Games:master ← Revolutionary-Games:dotnet_7_update

opened 10:08AM - 25 Jan 23 UTC

hhyyrylainen

+37228 -34826

**Brief Description of What This PR Does**

Updated to .NET 7, updated the analyzers that go with it, and did some warnings fixing (https://forum.revolutionarygamesstudio.com/t/updating-to-disposing-all-godot-objects/975/3), and finally rewrote the translations extractor in C# so that python is no longer required for developing Thrive. I didn't test the translations extractor on Windows so someone testing that would be good.

**Related Issues**

closes #3984

**Progress Checklist**

Note: before starting this checklist the PR should be marked as non-draft.

- [x] PR author has checked that this PR works as intended and doesn't break existing features: https://wiki.revolutionarygamesstudio.com/wiki/Testing_Checklist (this is important as to not waste the time of Thrive team members reviewing this PR)
- [ ] Initial code review passed (this and further items should not be checked by the PR author)
- [ ] Functionality is confirmed working by another person (see above checklist link)
- [ ] Final code review is passed and code conforms to the [styleguide](https://github.com/Revolutionary-Games/Thrive/blob/master/doc/style_guide.md).

Before merging all CI jobs should finish on this PR without errors, if there are automatically detected style issues they should be fixed by the PR author. Merging must follow our [styleguide](https://github.com/Revolutionary-Games/Thrive/blob/master/doc/style_guide.md#git).

Kasterisk · January 26, 2023, 9:39am

This is a bit unrelated to this thread’s purpose, but it is still about i18n, which I’ve been thinking about for a while now. As somebody who has worked on many PRs involving localization, I have a couple of gripes with our current localization system and some possible ideas for streamlining it.

I find our current way of adding new localizable strings a little too convoluted, specifically the way we have to manually fill in new entries in en.po after extracting the SOURCE_STRING texts (the translation identifiers). I’ve seen how Godot does localization: it seems they don’t even have an en.po file, and the source string for English is the actual string itself! I think this is far more efficient, and the benefit of this method is that we don’t have to do double work when adding new text to the UI.

The next thing that bothers me is having to update the localization in every PR that touches localizable text. This not only bloats many PRs but has also been the source of many merge conflicts to date. I know this is to self-enforce up-to-date localizations and make a maintainer’s life easier, but it gets tiring for PR authors. I think we could cut down a lot of this work by doing the update maybe once every week, perhaps with a bot that does it and also detects localization changes so that it wouldn’t open redundant PRs. This relates to the first point in that no manual English string additions would be needed.

This would be a massive overhaul, and something would need to be done with the existing translations in Weblate, so I don’t expect anything like this to be worked on anytime soon, but those are the key points that I, as a regular contributor, feel could be improved upon. Overall, I think a refactor of the current system would be a worthy investment.

hhyyrylainen · January 26, 2023, 9:49am

The problem is that if the English text itself is the key, typos found in it cannot be fixed easily. Instead, each typo would need to be fixed by a programmer who can run the text update code.

I made the conscious decision of using translation keys instead of the raw text when setting up the translation system. This decouples the English text from the translation key, which has various benefits, such as correcting a typo in the English text not messing with the texts of the other languages.
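
To make the trade-off concrete, here is a minimal Python sketch (Python purely for illustration; the actual extractor is written in C#). It simplifies each language's catalogue to a dictionary from key to text instead of a real gettext .po file, and the `translate` helper, the `EXIT_CONFIRM` key and the sample strings are all hypothetical. With stable keys, fixing an English typo only touches the English catalogue; with the English text as the key, the fix changes the key itself and the existing translation stops matching.

```python
# Illustrative sketch only: catalogues are simplified to dicts (key -> text).
# The key names and strings below are hypothetical, not taken from Thrive.


def translate(catalogue: dict, key: str) -> str:
    """Look up a key, falling back to the key itself when it is missing."""
    return catalogue.get(key, key)


# Key-based scheme: a stable identifier, with the English text stored in en.po.
en = {"EXIT_CONFIRM": "Are you sure you wnat to exit?"}  # note the typo
fi = {"EXIT_CONFIRM": "Haluatko varmasti poistua?"}

# Fixing the English typo only edits the English catalogue...
en["EXIT_CONFIRM"] = "Are you sure you want to exit?"
# ...so the Finnish lookup still works, because the key did not change.
print(translate(fi, "EXIT_CONFIRM"))  # Haluatko varmasti poistua?

# Text-as-key scheme: the English source string itself is the lookup key.
fi_text_keyed = {"Are you sure you wnat to exit?": "Haluatko varmasti poistua?"}
# Fixing the typo in the source changes the key, so the old translation no
# longer matches and the lookup falls back to the untranslated English text.
print(translate(fi_text_keyed, "Are you sure you want to exit?"))
# Are you sure you want to exit?
```

In a real gettext workflow the orphaned entry would typically be marked fuzzy or become obsolete on the next template merge, so translators would have to re-review it.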

PRs that don’t add the text content they change are by definition broken.
We already don’t require fully up-to-date translation files (the source text locations can be incorrect in the PR).

One thing I put in as a TODO comment in the text extractor is that it would be possible to sort the translation strings alphabetically. That way, only people who add or remove translations would need to run the translation update.

Alphabetically sorting the translations is pretty easy to add.
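
As a rough illustration of why it is easy, here is a minimal Python sketch of sorting .po entries by key (illustrative only, not the C# extractor). It assumes entries are separated by blank lines, every msgid fits on one line (true for identifier-style keys), there are no obsolete `#~` entries, and the gettext header entry (the one with an empty msgid) comes first and must stay there. The function names and the sample file are hypothetical.

```python
# Illustrative sketch: sort .po entries alphabetically by msgid.
# Assumptions: blank-line-separated entries, single-line msgids,
# no obsolete "#~" entries, and the header entry (empty msgid) comes first.


def split_entries(po_text: str) -> list:
    """Split the file into entries on blank lines, dropping empty chunks."""
    return [chunk.strip("\n") for chunk in po_text.split("\n\n") if chunk.strip()]


def entry_msgid(entry: str) -> str:
    """Return the msgid of an entry, or "" if it has none (e.g. the header)."""
    for line in entry.splitlines():
        if line.startswith('msgid "'):
            return line[len('msgid "'):-1]
    return ""


def sort_po(po_text: str) -> str:
    entries = split_entries(po_text)
    header = []
    # Keep the gettext header entry (empty msgid) at the top of the file.
    if entries and entry_msgid(entries[0]) == "":
        header, entries = entries[:1], entries[1:]
    # Plain string comparison = code-point order, independent of system locale.
    return "\n\n".join(header + sorted(entries, key=entry_msgid)) + "\n"


sample = """msgid ""
msgstr ""
"Language: fi\\n"

#: src/Exit.cs:10
msgid "EXIT_CONFIRM"
msgstr "Haluatko varmasti poistua?"

#: src/Choice.cs:3
msgid "CHOICE_YES"
msgstr "Kyllä"
"""

print(sort_po(sample))
```

Python's default string ordering compares code points rather than applying locale rules, so every contributor would get exactly the same order regardless of their system language, which matters when the goal is small, predictable diffs. As a side effect, keys sharing a prefix end up next to each other.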

Moving to a system where English is not editable on Weblate would be a huge change to the translation workflow, and, as I said above, it would mean that people could no longer suggest typo fixes or better wording for the English text. Changes like Rename suicide button related English text to “perish instantly” also couldn’t easily be done just on Weblate. Tying together the technical implementation and what is shown to the user is often a bad idea, as keeping them separate gives you a lot of flexibility.

hhyyrylainen · January 28, 2023, 11:30am

I guess we could have a poll on alphabetically sorting the translations.

Here are the pros:

Moving code around to different files or adding new elements that use existing translations no longer requires running the localization script
It might be slightly less likely for there to be merge conflicts as changes are perhaps more likely to be more spread out in the localization files

And the cons:

The order of translations as shown to new translators on Weblate will be slightly less intuitive, though even now the order isn’t especially well thought out, so this might not be a big deal after all

Should translation files be alphabetically sorted?

Yes
No


84634E1A607A · January 28, 2023, 12:54pm

It might be slightly less likely for there to be merge conflicts as changes are perhaps more likely to be more spread out in the localization files

As far as I can tell, most of the merge conflicts are caused by line number changes in the source location comments (which can be ignored). This kind of merge conflict cannot be solved by sorting things alphabetically.

However, some conflicts resulting from reordering, which used to be a big headache, will become easier to solve. So I strongly support this suggestion.

Addendum:

Also, with translations sorted alphabetically, I suggest adding prefixes to more of the translation keys that belong to a certain group. This way, keys with similar meanings will be grouped together, which is more intuitive.

hhyyrylainen · January 28, 2023, 1:20pm

Things that aren’t very specific shouldn’t get overly specific prefixes, though I guess something like “CHOICE_YES” could be done, using prefixes general enough that it wouldn’t be a problem.

Related to this, I’ve been thinking for some time now that we’d likely be able to make a custom merge script that automatically overwrites local source line location changes in translation files.
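
The core check such a script would need can be sketched in a few lines of Python (illustrative only; the helper names are hypothetical and this is not an existing Thrive tool). It treats `#:` lines as gettext source-reference comments and considers two versions of an entry compatible when they differ only in those lines. In that case it takes the locations from the incoming side; otherwise it reports a genuine conflict for a human to resolve.

```python
# Illustrative sketch: auto-resolve a conflicting .po entry when the two sides
# differ only in their "#:" source-location comments.
from typing import Optional


def without_locations(entry: str) -> list:
    """Drop "#:" reference lines (file:line locations) from an entry."""
    return [line for line in entry.splitlines() if not line.startswith("#:")]


def resolve_entry(ours: str, theirs: str) -> Optional[str]:
    """Return the resolved entry, or None if the conflict is a real one."""
    if without_locations(ours) != without_locations(theirs):
        return None  # text, flags or comments differ: leave for a human
    # Only the locations changed, and those are presumably regenerated by the
    # extractor anyway, so accept the incoming side's locations as-is.
    return theirs


ours = '#: src/Exit.cs:10\nmsgid "EXIT_CONFIRM"\nmsgstr "Exit?"'
theirs = '#: src/Exit.cs:42\nmsgid "EXIT_CONFIRM"\nmsgstr "Exit?"'
print(resolve_entry(ours, theirs) == theirs)  # True
```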

With a quick search I found this example showing how to add custom merging for a file type:

hhyyrylainen · January 30, 2023, 7:52am

I was hoping that more people would vote in the poll… but there are at least two people who would like to see this change, so I’ll make it (with a quick option in the code to turn it off if we don’t like it after all).

Here’s the PR:

github.com/Revolutionary-Games/Thrive

Sorted translation files in alphabetical order

Revolutionary-Games:master ← Revolutionary-Games:sorted_translations

opened 07:51AM - 30 Jan 23 UTC

hhyyrylainen

+201994 -201993

**Brief Description of What This PR Does**

https://forum.revolutionarygamesstudio.com/t/replacing-the-translations-extracting-tool/979/5

**Progress Checklist**

Note: before starting this checklist the PR should be marked as non-draft.

- [x] PR author has checked that this PR works as intended and doesn't break existing features: https://wiki.revolutionarygamesstudio.com/wiki/Testing_Checklist (this is important as to not waste the time of Thrive team members reviewing this PR)
- [ ] Initial code review passed (this and further items should not be checked by the PR author)
- [ ] Functionality is confirmed working by another person (see above checklist link)
- [ ] Final code review is passed and code conforms to the [styleguide](https://github.com/Revolutionary-Games/Thrive/blob/master/doc/style_guide.md).

Before merging all CI jobs should finish on this PR without errors, if there are automatically detected style issues they should be fixed by the PR author. Merging must follow our [styleguide](https://github.com/Revolutionary-Games/Thrive/blob/master/doc/style_guide.md#git).

I opened an issue about creating a custom merge driver for the translation files to speed up solving merge conflicts:

github.com/Revolutionary-Games/Thrive

Create a custom merge driver for translation files that just accepts source file location changes

opened 07:11AM - 30 Jan 23 UTC

hhyyrylainen

difficult programming C# translation

as-is from the to be merged thing, but it should smartly preserve fuzzy markings… (I'm not exactly sure how that'd work, but at least a safe implementation would add the fuzzy marking if either side of the merge conflict has it). I found this example repo on how to set this up: https://github.com/Praqma/git-merge-driver
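
The "safe" rule described in the issue (add the fuzzy marking if either side has it) could look roughly like the following Python sketch. It is a hypothetical per-entry helper, not the eventual implementation: it ignores both `#:` location lines and `#,` flag lines when comparing the two sides, then takes the locations from the incoming side and the union of both sides' flags. For context, git runs a custom driver declared in `.gitattributes` (for example `*.po merge=po-locations`, where the driver name is made up here) via a `merge.<name>.driver` config entry whose command receives the ancestor, current and other versions as `%O`, `%A` and `%B`, must write the result to `%A`, and signals an unresolved conflict with a non-zero exit status. The sketch only covers the entry-level decision and ignores the common ancestor.

```python
# Illustrative sketch: merge two versions of a .po entry that differ only in
# "#:" locations and/or "#," flags. Locations come from the incoming side and
# flags are unioned, so "fuzzy" survives if either side had it.
from typing import Optional


def parse_flags(entry: str) -> list:
    """Collect flags such as "fuzzy" or "c-format" from "#," lines."""
    flags = []
    for line in entry.splitlines():
        if line.startswith("#,"):
            flags += [flag.strip() for flag in line[2:].split(",") if flag.strip()]
    return flags


def content_lines(entry: str) -> list:
    """Entry lines without location ("#:") and flag ("#,") comments."""
    return [l for l in entry.splitlines() if not l.startswith(("#:", "#,"))]


def merge_entry(ours: str, theirs: str) -> Optional[str]:
    if content_lines(ours) != content_lines(theirs):
        return None  # real content conflict: needs a human
    # Union of both sides' flags, keeping first-seen order without duplicates.
    flags = list(dict.fromkeys(parse_flags(ours) + parse_flags(theirs)))
    lines = [l for l in theirs.splitlines() if not l.startswith("#,")]
    if flags:
        # gettext places flags after translator, extracted and reference
        # comments, and before any "#|" previous-msgid lines, msgctxt or msgid.
        at = next((i for i, l in enumerate(lines)
                   if l.startswith(("#|", "msgctxt", "msgid"))), len(lines))
        lines.insert(at, "#, " + ", ".join(flags))
    return "\n".join(lines)


ours = '#: src/Exit.cs:10\n#, fuzzy\nmsgid "EXIT_CONFIRM"\nmsgstr "Exit?"'
theirs = '#: src/Exit.cs:42\nmsgid "EXIT_CONFIRM"\nmsgstr "Exit?"'
print(merge_entry(ours, theirs))
# #: src/Exit.cs:42
# #, fuzzy
# msgid "EXIT_CONFIRM"
# msgstr "Exit?"
```

Taking the union is deliberately conservative: a translation can at worst be flagged for re-review unnecessarily, but a fuzzy marking is never silently dropped.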