Rewriting the repository history to remove large binary files

For years now I’ve wanted to clean up some really old mistakes in the Thrive git repository. Way back before the switch to subversion, large binary files were included directly in the Thrive repo. These haven’t been used for 8+ years at this point, but they still take up download bandwidth whenever the Thrive repo is cloned.

What I’d like to do to solve this is run a git history cleanup to remove all the large binary files from history. I’d obviously create a new branch to preserve the original history so that nothing is lost, but the default branch would be cleaned.
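As a rough illustration of how the candidates for removal could be found, here is a minimal Python sketch that lists large blobs anywhere in a repository's history. It is not necessarily how the actual cleanup will be done. It relies on the standard `git rev-list --objects --all` and `git cat-file --batch-check` commands; the function names and the 1 MB threshold are assumptions made up for this example.

```python
import subprocess

# Format string for `git cat-file --batch-check`; %(rest) echoes the path
# that `git rev-list --objects` printed after each object id.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_large_blobs(batch_output, min_bytes):
    """Return (size, path) pairs for blobs of at least min_bytes, largest first."""
    found = []
    for line in batch_output.splitlines():
        # Split at most 3 times so paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and tags
        size = int(parts[2])
        path = parts[3] if len(parts) == 4 else ""
        if size >= min_bytes:
            found.append((size, path))
    return sorted(found, reverse=True)


def list_large_blobs(repo_dir=".", min_bytes=1_000_000):
    """Scan every object reachable from any ref in repo_dir (hypothetical helper)."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    report = subprocess.run(
        ["git", "cat-file", "--batch-check=" + BATCH_FORMAT],
        cwd=repo_dir, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_large_blobs(report, min_bytes)
```

Calling `list_large_blobs("path/to/clone")` inside a clone would return the biggest files first, including ones that were deleted from the working tree long ago but still live in history.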

The downside is that all other branches will become unmergeable until they are rebased onto the cleaned branch (this is because the cleaning literally rewrites the repo history). And everyone who has cloned the repo will need to do force pulls / other cleanup on their local copies to stay up to date. That’s why I haven’t undertaken this effort yet, but I think I finally want to do this. Unless people really object to this, I want to get this done this year.
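To give an idea of what the local cleanup might look like, here is a small Python sketch that builds the git commands involved. Everything named here is an assumption for illustration: the remote is taken to be `origin`, the default branch `master`, and `old_base` stands for whatever ref ends up holding the preserved original history. Note that `git reset --hard` throws away uncommitted local changes, so treat this as a sketch and not a recipe to run blindly.

```python
import subprocess


def resync_commands(remote="origin", default_branch="master"):
    """Commands to make a local default branch match the rewritten remote one."""
    return [
        ["git", "fetch", remote],
        ["git", "checkout", default_branch],
        # WARNING: discards local commits and uncommitted changes on this branch.
        ["git", "reset", "--hard", f"{remote}/{default_branch}"],
    ]


def rebase_commands(feature_branch, old_base, remote="origin", default_branch="master"):
    """Commands to move a feature branch from the old history onto the cleaned one.

    old_base is a ref pointing at the original (uncleaned) history; only the
    commits on feature_branch that are not reachable from it are replayed.
    """
    return [
        ["git", "fetch", remote],
        ["git", "rebase", "--onto", f"{remote}/{default_branch}", old_base, feature_branch],
    ]


def run_all(commands, repo_dir="."):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        print("$", " ".join(argv))
        subprocess.run(argv, cwd=repo_dir, check=True)
```

For example, `run_all(rebase_commands("my-feature", "origin/old-history"))` would replay only the commits unique to `my-feature` on top of the cleaned branch, where `origin/old-history` is a made-up name for the backup branch.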

In order to minimize disruptions, I’ll say now that I plan to do this on June 5th 2023. The disruption should last a few hours at most, as I lock weblate and merge the latest changes before performing the branch cleaning. One week beforehand, I’ll warn all open PRs that they have just one week left to get merged to avoid having to do a rebase.


Seeing as no one complained, I’ve now marked this in my own calendar so that I remember to actually do this.

Agree with this course of action, better now than never.
