Disclaimer: This article is for educational purposes only. Make backups, think & plan!
You will lose data if you're not careful. I am not responsible for your actions. Follow these steps at your own risk.

I am not the author of the tool featured in this article, so if you find a bug, submit a ticket at the tool's repo.

Sooner or later you'll commit something to your git repo that shouldn't be there:
SSH keys, passwords, or other sensitive data (e.g. your ex's photos).

Now, you want the data gone for good.

Your options are:
1. Create a brand new git repo, delete every copy of the old one, and clone everything fresh
2. Use BFG Repo-Cleaner
3. Use git filter-branch (I haven't used this approach)

If you have a large team and servers that already use a specific repository, the first option may be time-consuming because you have to change the repo on every single machine, but it's still an option.
In this article we'll be using BFG Repo-Cleaner.

Keep in mind that, as of the time of writing, BFG only deletes files or folders whose name matches a glob pattern.
Unfortunately, that means you can't target a specific file or folder by its path, e.g. /data/my_pics/
If you try, you'll get this error: BFG aborting: No refs to update - no dirty commits found??

See issue #12 https://github.com/rtyley/bfg-repo-cleaner/issues/12
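Since BFG matches on the name and not the path, the practical workaround is to pass only the last path component. Here is a minimal sketch of that idea. The helper name bfg_delete_folder_cmd is made up for this example, and the jar and repo names are the ones used later in this article. It only prints the command so you can review it first. Remember that every folder with that name, anywhere in the repo, will match.

```shell
# Hypothetical helper: reduce a repo path to the bare folder name that
# --delete-folders expects, then print (not run) the resulting command.
bfg_delete_folder_cmd() {
  name=$(basename "$1")   # "/data/my_pics/" -> "my_pics"
  printf 'java -jar bfg-1.13.0.jar --delete-folders "%s" %s\n' "$name" "$2"
}

bfg_delete_folder_cmd /data/my_pics/ some-big-repo.git
# prints: java -jar bfg-1.13.0.jar --delete-folders "my_pics" some-big-repo.git
```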

In my case I had named the files and folders descriptively, so that helped me a lot.
I guess most people use BFG Repo-Cleaner to remove videos and other large files with a certain extension from the repo.

Prep work

You need to have Java installed (a JRE is enough).
Running java -version should produce output like this:

java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) Client VM (build 25.201-b09, mixed mode)
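If you'd rather script the check than eyeball it, something like the following could work. It is only a sketch: missing_tools is a name I made up, and it only checks that the commands exist on your PATH, not which versions they are.

```shell
# Hypothetical pre-flight check: print the name of every tool that is
# NOT found on the PATH. Prints nothing when everything is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing_tools java git
```

An empty result means both java and git were found.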

Clone the repo.

We need to clone the repo using the --mirror option. Don't just run the bfg commands inside your existing working clone.

git clone --mirror git://example.com/some-big-repo.git
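The disclaimer at the top says to make backups, so this is a good moment to take one of the mirror clone, before anything gets rewritten. A minimal sketch, assuming tar is available. The helper name backup_repo and the "-backup.tar.gz" naming scheme are my own choices, not something BFG requires.

```shell
# Hypothetical helper: archive a directory next to itself and print the
# archive path. The "-backup.tar.gz" suffix is an assumed naming scheme.
backup_repo() {
  repo_dir=${1%/}                       # strip a trailing slash, if any
  archive="${repo_dir}-backup.tar.gz"
  tar -czf "$archive" -C "$(dirname "$repo_dir")" "$(basename "$repo_dir")" &&
    printf '%s\n' "$archive"
}

# Usage, with the repo name from the clone command above:
# backup_repo some-big-repo.git
```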

Download the latest version of bfg from https://rtyley.github.io/bfg-repo-cleaner/
or if you're feeling adventurous you can download the master branch: https://github.com/rtyley/bfg-repo-cleaner/archive/master.zip

Here are the steps to remove sensitive files from git

How to remove folders

Remove the folder from the git history (you'll push later, after the clean-up).

java -jar bfg-1.13.0.jar --delete-folders "some_stupid_folder_name" some-big-repo.git

How to remove files

java -jar bfg-1.13.0.jar --delete-files my_stupid_file.php some-big-repo.git

Do some clean up

cd some-big-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive

Inspect the repo. At least, that's what most tutorials suggest.
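One way to inspect is to check whether the name still shows up among the objects reachable from any ref. This is a sketch, not an official BFG step, and still_in_history is a helper name I made up. It relies on git rev-list --all --objects, which prints an object id followed by a path for every reachable tree and blob. Note that BFG protects your latest commit by default (see --protect-blobs-from in the options below), so a file that still exists in HEAD will still be found.

```shell
# Hypothetical helper: exit 0 if a file or folder with the given name is
# still reachable in the repo at $1, exit 1 if it is gone.
still_in_history() {
  git -C "$1" rev-list --all --objects |
    awk -v name="$2" '
      { path = substr($0, index($0, " ") + 1)   # text after the object id
        n = split(path, parts, "/")             # last part is the name
        if (parts[n] == name) found = 1 }
      END { exit !found }'
}

# Usage, from the directory that holds the mirror clone:
# still_in_history some-big-repo.git my_stupid_file.php && echo "still there"
```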

Push this to the world.
git push

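As a result you'll still see your commits, but the ones that only touched the removed files will be empty.

Next, clone the repo and its branches again. Any old clone still contains the removed data, so don't keep working in it.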

 

What are all the bfg options?

bfg 1.13.0
Usage: bfg [options] [<repo>]

-b, --strip-blobs-bigger-than <size>
    strip blobs bigger than X (eg '128K', '1M', etc)
-B, --strip-biggest-blobs NUM
    strip the top NUM biggest blobs
-bi, --strip-blobs-with-ids <blob-ids-file>
    strip blobs with the specified Git object ids
-D, --delete-files <glob>
    delete files with the specified names (eg '*.class', '*.{txt,log}' - matches on file name, not path within repo)
--delete-folders <glob>
    delete folders with the specified names (eg '.svn', '*-tmp' - matches on folder name, not path within repo)
--convert-to-git-lfs <value>
    extract files with the specified names (eg '*.zip' or '*.mp4') into Git LFS
-rt, --replace-text <expressions-file>
    filter content of files, replacing matched text. Match expressions should be listed in the file, one expression per line - by default, each expression is treated as a literal, but 'regex:' & 'glob:' prefixes are supported, with '==>' to specify a replacement string other than the default of '***REMOVED***'.
-fi, --filter-content-including <glob>
    do file-content filtering on files that match the specified expression (eg '*.{txt,properties}')
-fe, --filter-content-excluding <glob>
    don't do file-content filtering on files that match the specified expression (eg '*.{xml,pdf}')
-fs, --filter-content-size-threshold <size>
    only do file-content filtering on files smaller than <size> (default is 1048576 bytes)
-p, --protect-blobs-from <refs>
    protect blobs that appear in the most recent versions of the specified refs (default is 'HEAD')
--no-blob-protection
    allow the BFG to modify even your *latest* commit. Not recommended: you should have already ensured your latest commit is clean.
--private
    treat this repo-rewrite as removing private data (for example: omit old commit ids from commit messages)
--massive-non-file-objects-sized-up-to <size>
    increase memory usage to handle over-size Commits, Tags, and Trees that are up to X in size (eg '10M')
<repo>
    file path for Git repository to clean
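The --replace-text option is the one to reach for when the secret is a string inside a file rather than a whole file. I didn't use it for this article, so treat the following as a sketch based on the option description above. The file name expressions.txt and the three "secrets" are made up. The first line is a literal that gets replaced with the default '***REMOVED***', the second uses '==>' to pick a different replacement, and the third uses the 'regex:' prefix.

```shell
# Write a hypothetical expressions file, one expression per line.
# The quoted 'EOF' keeps the shell from touching the backslash.
cat > expressions.txt <<'EOF'
hunter2
old-api-key==>REDACTED_KEY
regex:password=\w+==>password=
EOF

# Then, with the jar and repo names used earlier in this article:
# java -jar bfg-1.13.0.jar --replace-text expressions.txt some-big-repo.git
```

Run the same clean-up and push steps afterwards as for deleted files.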
