Replacing the translation extraction tool

So we’ve had a Python-based translation extraction tool basically since we added translation support to the game. It works pretty well, but there’s now a need to modify its behaviour in a pretty complex manner. So far the maintenance I’ve done on that tool has been pretty basic Python code, but this change would require a deeper understanding of the tool.

Also, some people have failed to get the Python-based tool running. I don’t find it particularly complicated to work with, even with a venv setup, but that might be because I have multiple years of professional Python experience. I do get that an extra dependency on an entirely different programming language is not that great, so not having to use Python for that one task would be pretty nice.

Those two points are leading me to favour just writing our own custom text extraction tool in C#. That would make it easier to maintain (for people who can’t read undocumented and very data-dependent Python code) and easier to run. I’m currently updating some of our code and the CI build environment for dotnet 7, so I might check how big a task this is and do it at the same time if it seems feasible in a couple of hours.


I managed to do this, it wasn’t super hard but did take quite a few hours:

This is a bit unrelated to this thread’s purpose, but it’s still about i18n, which I’ve been thinking about for a while now. As somebody who has worked on many PRs involving localization, I have a couple of gripes with our current localization system and some possible ideas for streamlining it.

I find our current way of adding new localizable strings a little too convoluted, specifically the way we have to manually fill in new entries in en.po after extracting SOURCE_STRING texts (the translation identifiers). I’ve seen how Godot does localization, and it seems they don’t even have an en.po file: the source string for English is the actual string itself! I think this is far more efficient, and the benefit of this method is that we don’t have to do double work when adding new texts to the UI.

The next thing that bothers me is having to update the localization in every PR that touches localizable texts. This not only bloats many PRs but has also been the source of many merge conflicts to date. I know that this is to self-enforce up-to-date localizations and make a maintainer’s life easier, but it gets tiring for PR makers. I think we could optimize a lot of this work by doing the update maybe once a week, perhaps with a bot that does it and also detects localization changes so that it wouldn’t open redundant PRs. This relates to the first point in that no manual en string addition would be needed.

This would be a massive overhaul and something would need to be done with existing translations in Weblate so I don’t expect something like this would be worked on anytime soon, but those were key points I feel could be improved upon as a regular contributor. Overall I think a refactor of the current system will be a worthy investment.


Problem is that when typos are found in the English text, they cannot be fixed easily. Instead the typo will need to be fixed by a programmer who can run the text update code.

I made the conscious decision to use translation keys instead of the raw text when setting up the translation system. This decouples the English text from the translation key, which has various benefits, such as not disturbing the other translations when the English text just has a typo corrected.
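To illustrate the decoupling, here is a minimal Python sketch (not how the game actually does lookups). The key name, the texts and the `translate` helper are all made up for the example; the only point is that fixing the English text leaves the key, and therefore every other language, untouched.

```python
# Hypothetical catalogs: the key and the texts are invented for illustration.
english = {"SAVE_GAME_BUTTON": "Save gmae"}  # English text with a typo
finnish = {"SAVE_GAME_BUTTON": "Tallenna peli"}


def translate(key: str, catalog: dict[str, str], fallback: dict[str, str]) -> str:
    """Look up a key in the chosen language, falling back to English, then the key."""
    return catalog.get(key) or fallback.get(key, key)


# Fixing the English typo only touches the English catalog...
english["SAVE_GAME_BUTTON"] = "Save game"

# ...and the Finnish entry is still found under the unchanged key.
print(translate("SAVE_GAME_BUTTON", finnish, english))
```

If the English text itself were the key, the same typo fix would change the key and every existing translation would have to be re-matched to it.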

PRs that don’t add the text content they change are by definition broken.
We already don’t require fully up-to-date translation files (the source text locations can be incorrect in the PR).

One thing I put in as a TODO comment in the text extractor is that it would be possible to alphabetically sort the translation strings; that way only people who add or remove translations would need to run the translation update.

Alphabetically sorting the translations is pretty easy to add.
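As a rough idea of what the sorting involves, here is a small Python sketch (the real extractor is a separate tool, so this is only an illustration). It assumes a simplified .po layout where entries are separated by blank lines and each `msgid` fits on one line; `sort_po_entries` is a hypothetical helper name.

```python
import re

# Matches a single-line msgid; multi-line msgids are not handled in this sketch.
MSGID = re.compile(r'^msgid "(.*)"$', re.MULTILINE)


def sort_po_entries(po_text: str) -> str:
    """Return po_text with its blank-line-separated entries sorted by msgid."""
    entries = [e.strip("\n") for e in re.split(r"\n\s*\n", po_text) if e.strip()]

    def key(entry: str) -> str:
        match = MSGID.search(entry)
        return match.group(1) if match else ""

    # The header entry has an empty msgid, so it naturally sorts first.
    return "\n\n".join(sorted(entries, key=key)) + "\n"
```

With the entries ordered by key, moving a string between source files only changes its location comment and not its position in the file.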

Moving away from having English be editable on Weblate would be a huge change for the translation workflow, and as I said above, it would mean that people could no longer suggest typo fixes or better wording for English. Changes like Rename suicide button related English text to “perish instantly” also couldn’t be done easily just on Weblate. Tying together the technical implementation and what is shown to the user is often a bad idea, as keeping them separate gives you a lot of flexibility.


I guess we could have a poll on alphabetically sorting the translations.

Here’s the pros:

  • Moving code around to different files or adding new elements that use existing translations no longer requires running the localization script
  • It might be slightly less likely for there to be merge conflicts as changes are perhaps more likely to be more spread out in the localization files

And the cons:

  • The order of translations as shown to new translators on Weblate will be slightly less intuitive, though even now the order is not exactly well thought out, so this might not be a big deal after all

Should translation files be alphabetically sorted?
  • Yes
  • No


  • It might be slightly less likely for there to be merge conflicts as changes are perhaps more likely to be more spread out in the localization files

As far as I am concerned, most of the merge conflicts come from line number changes in the source location comments (which can be ignored). That kind of merge conflict cannot be solved by sorting things alphabetically.

However, some conflicts resulting from reordering will be easier to solve (these used to be a big headache), so I strongly support this suggestion.


Also, with translations sorted alphabetically, I suggest adding prefixes to more translation keys that belong to a certain group. This way, keys with similar meanings will be grouped together (which is more intuitive).

Things that aren’t overly specific shouldn’t get too specific prefixes, though I guess something like “CHOICE_YES” could be done, with prefixes general enough that it wouldn’t be a problem.
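A tiny Python sketch of the grouping effect: apart from CHOICE_YES, the keys below are invented for the example, and `group_by_prefix` is a hypothetical helper that treats the text before the first underscore as the prefix.

```python
# Hypothetical keys; only CHOICE_YES comes from the discussion above.
keys = ["SAVE", "CHOICE_YES", "LOAD", "CHOICE_NO", "CHOICE_CANCEL"]


def group_by_prefix(keys: list[str]) -> dict[str, list[str]]:
    """Sort keys alphabetically and bucket them by the text before the first underscore."""
    groups: dict[str, list[str]] = {}
    for key in sorted(keys):
        groups.setdefault(key.split("_", 1)[0], []).append(key)
    return groups


for prefix, members in group_by_prefix(keys).items():
    print(prefix, members)
```

A plain alphabetical sort already places the three CHOICE_ keys next to each other, which is what a translator would see in the sorted file.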

Related to this I’ve been thinking for some time now that we’d likely be able to make a custom merge script that automatically overwrites local source line location changes in translation files.
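The core of such a script could look roughly like the Python sketch below. This is only an assumption about one possible approach, not an existing tool: `take_incoming_locations` is a hypothetical helper, the .po layout is simplified (blank-line-separated entries, single-line `msgid`), and hooking it up to Git as a merge driver is not shown.

```python
import re

# Matches a single-line msgid; multi-line msgids are not handled in this sketch.
MSGID = re.compile(r'^msgid "(.*)"$', re.MULTILINE)


def split_entries(po_text: str) -> list[str]:
    """Split .po text into entries separated by blank lines."""
    return [e.strip("\n") for e in re.split(r"\n\s*\n", po_text) if e.strip()]


def take_incoming_locations(local: str, incoming: str) -> str:
    """Replace the '#:' location lines of each local entry with the incoming ones."""
    incoming_refs: dict[str, list[str]] = {}
    for entry in split_entries(incoming):
        match = MSGID.search(entry)
        if match:
            incoming_refs[match.group(1)] = [
                line for line in entry.splitlines() if line.startswith("#:")
            ]

    merged = []
    for entry in split_entries(local):
        match = MSGID.search(entry)
        if match and match.group(1) in incoming_refs:
            body = [line for line in entry.splitlines() if not line.startswith("#:")]
            # Simplification: locations are placed first, ahead of any other comments.
            entry = "\n".join(incoming_refs[match.group(1)] + body)
        merged.append(entry)
    return "\n\n".join(merged) + "\n"
```

After this step the two sides differ only where the actual text changed, so a normal merge should have far fewer conflicts left to resolve by hand.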

With a quick search I found this example showing how to add custom merging for a file type:

I was hoping that more people would vote in the poll… but there are at least two people who would like to see this change, so I guess I’ll make it (with a quick option in the code to turn it off if we don’t like it after all).

Here’s the PR:

I opened an issue about creating a custom merge driver for the translation files to speed up solving merge conflicts: