This is the current trend in the bug hunter community: how to find sensitive information on GitHub, that is, how to find tokens, keys and passwords in the largest code database in the world in order to pwn a company and get massive rewards. Whether they use tools or work manually, some very talented people like Th3G3nt3lman are real wizards when it comes to discovering such treasures. Many tools are now available, but since I had wanted to learn Python for a long time, I jumped at the opportunity. In this article I’m going to present some scripts I wrote in PHP and Python to fit my needs.
Get everything here: https://github.com/gwen001/github-search
All tools that call the GitHub API require at least one token to be able to perform multiple queries.
The best way to deal with it is to create a single text file in the repository called .tokens, with 1 token per line.
All scripts will then load this file (PS: the more tokens you have, the better).
First of all, you had better understand how the GitHub search engine works. Here is a very quick summary of the official GitHub documentation.
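As an illustration of the .tokens convention described above, here is a minimal sketch of how a script could load the file and pick a token for each request. The function names `load_tokens` and `auth_headers` are hypothetical and are not taken from the repository; the random choice is only one possible way of spreading requests over several tokens.

```python
import random
from pathlib import Path


def load_tokens(path=".tokens"):
    """Return the non-empty lines of the tokens file (one token per line)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def auth_headers(tokens):
    """Build an Authorization header with a randomly chosen token.

    Picking a token at random spreads the requests over all the tokens,
    which is why having more tokens helps (assumption: the real scripts
    may rotate them differently).
    """
    return {"Authorization": "token " + random.choice(tokens)}
```

The returned dictionary can be passed as HTTP headers to whichever HTTP client the script uses.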
This script runs regexps against all repositories located in a specified folder and its subfolders. More importantly, it searches the history of the repositories, not only the current version of the files. All commits are checked!
usage: git-history.py [-h] [-p PATH] [-d DATE] [-c LENGTH] [-s SEARCH] [-t THREADS]

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to scan
  -d DATE, --date DATE  do no check commits prior this date, format: YYYY-MM-DD
  -c LENGTH, --length LENGTH
                        only check in first n characters
  -s SEARCH, --search SEARCH
                        term to search (regexp)
  -t THREADS, --threads THREADS
                        max threads, default: 10
On big repositories (or folders that contain multiple repositories), this operation can take a while. To make it faster, you can customize the maximum number of threads, set a date limit and limit the number of characters checked.
git-history.py -p ~/repos -s "AKIA.*"
git-history.py -p ~/repos -s ~/.gf/aws-keys.json -d 2019-01-01 -c 1000000
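To make the idea of searching the history concrete, here is a minimal sketch that runs a regexp over the patches of every commit of one repository. It is not the code of git-history.py: the function names are hypothetical, it reads the combined output of `git log -p --all` instead of checking commits one by one, and the character limit is applied to that whole output (an assumption about what the length option does).

```python
import re
import subprocess


def search_text(text, pattern, max_chars=None):
    """Return every regexp match in text, optionally only in its first max_chars characters."""
    if max_chars is not None:
        text = text[:max_chars]
    return [m.group(0) for m in re.finditer(pattern, text)]


def scan_repo(path, pattern, since=None, max_chars=None):
    """Search the patches of all commits of the repository located at path."""
    cmd = ["git", "-C", path, "log", "-p", "--all"]
    if since:
        # Ignore commits older than this date (format: YYYY-MM-DD).
        cmd.append("--since=" + since)
    result = subprocess.run(
        cmd, capture_output=True, text=True, errors="replace", check=True
    )
    return search_text(result.stdout, pattern, max_chars)
```

For example, `scan_repo("/path/to/repo", r"AKIA[0-9A-Z]{16}", since="2019-01-01")` would look for strings shaped like AWS access key IDs, using a commonly seen pattern that is stricter than the `AKIA.*` example above.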
Same same, but different. Strongly inspired by TomNomNom’s one-liner published on Twitter, this small bash script also looks for patterns in the history of Git repositories.
usage: git-pillage.py [-h] [-u URL] [-t THREADS] [-e EXTENSION] [-x EXCLUDE] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     url of the .git, but without the .git directory
  -t THREADS, --threads THREADS
                        threads, default: 5
  -e EXTENSION, --extension EXTENSION
                        extensions to download separated by comma, overwrite --exclude, default: all but default exclude
  -x EXCLUDE, --exclude EXCLUDE
                        extensions to exclude separated by comma, default:
                        png,gif,jpg,jpeg,ico,svg,eot,otf,ttf,woff,woff2,css,less,po,mo,mp3,mp4,mpeg,avi
  -v, --verbose         verbose mode, default: off

git-pillage.py -u https://www.example.com -t 10 -v
git-pillage.py -u https://www.example.com/ -x pdf,doc,docx,txt,xls,xlsx
git-pillage.py -u https://www.example.com/htdocs/src/ -t 10 -e php,sh,py,rb,bat -v
This script returns some information about the people listed as contributors to every repository of an organization.
usage: github-contributors.py [-h] [-t TOKEN] [-o ORG] [-u USER]

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        auth token
  -o ORG, --org ORG     organization
  -u USER, --user USER  user
github-contributors.py -o snapchat
Runs dorks on GitHub for the users/organizations provided (several users/orgs can be given, separated by commas). Dorks must be listed in a single text file. Results are not stored (could be an option?); only the number of results is displayed.
Usage: /opt/bin/github-dorks.php <o/u/n> <org/user> [<dork file>]
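The combination of targets and dorks described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes that each dork is simply prefixed with GitHub's `org:` or `user:` search qualifier, which may not be exactly what the script does.

```python
def read_dorks(path):
    """Read a dorks file, one dork per line, skipping empty lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_queries(targets, dorks, qualifier="org"):
    """Combine every target with every dork into a GitHub search query.

    targets is a comma separated string of users or organizations,
    qualifier is "org" or "user".
    """
    names = [name.strip() for name in targets.split(",") if name.strip()]
    return [f"{qualifier}:{name} {dork}" for name in names for dork in dorks]


def count_results(response_json):
    """Only the number of results is kept; the search API reports it as total_count."""
    return response_json.get("total_count", 0)
```

Each query string would then be sent to the code search endpoint, and `count_results` applied to the decoded JSON response.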
The Python version of the previous script. It’s supposed to be much faster (and more stable) thanks to the multi-threading option, but unfortunately the GitHub rate limit on code search is pretty low, so use it carefully.
usage: github-dorks.py [-h] [-d DORKS] [-t TOKEN] [-o ORG] [-e THREADS] [-u USER]

optional arguments:
  -h, --help            show this help message and exit
  -d DORKS, --dorks DORKS
                        dorks file
  -t TOKEN, --token TOKEN
                        auth token
  -o ORG, --org ORG     organization
  -e THREADS, --threads THREADS
                        maximum n threads
  -u USER, --user USER  user
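One way to use the script "carefully" with respect to the rate limit is to look at the rate limit headers that GitHub returns with each API response. The sketch below is not part of github-dorks.py; `seconds_to_wait` is a hypothetical helper, and it assumes the headers are available as a dictionary with exactly these key names (HTTP clients may expose them case-insensitively).

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to pause before the next request.

    Uses X-RateLimit-Remaining (requests left in the current window) and
    X-RateLimit-Reset (UTC epoch seconds at which the window resets).
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset = int(headers.get("X-RateLimit-Reset", 0))
    if remaining > 0:
        return 0
    # No request left: wait until the window resets (never a negative time).
    return max(0, reset - now)
```

A worker thread could call `time.sleep(seconds_to_wait(response_headers))` before sending its next query, or switch to another token instead of sleeping.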
(Tries to) find the GitHub accounts of a company’s employees through Google search and displays some basic information about them.
Two modules are available for now:
- github: the Google dork is site:github.com [term], and the nicknames are the ones returned
- linkedin: the Google dork is site:linkedin.com/in [term], and the nicknames are generated by the script from the first name and last name
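For the linkedin module, generating nicknames from a first name and a last name could look like the sketch below. The patterns used here are hypothetical examples chosen for illustration; they are not necessarily the ones the script generates.

```python
import re


def candidate_nicknames(firstname, lastname):
    """Build a few plausible GitHub nicknames from a first name and a last name."""
    # Keep only lowercase letters and digits so the result looks like a username.
    first = re.sub(r"[^a-z0-9]", "", firstname.lower())
    last = re.sub(r"[^a-z0-9]", "", lastname.lower())
    if not first or not last:
        return []
    candidates = [
        first + last,        # johndoe
        first + "-" + last,  # john-doe
        first[0] + last,     # jdoe
        last + first,        # doejohn
        first + last[0],     # johnd
    ]
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(candidates))
```

Each candidate would then have to be checked against GitHub, since a generated nickname may not exist or may belong to someone else.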
usage: github-employees.py [-h] [-m MOD] [-s STARTPAGE] [-f FBCOOKIE] [-t TERM] [-p PAGE] [-r RESUME] [-i INPUT] [-k KEYWORD] [-o TOKEN]

optional arguments:
  -h, --help            show this help message and exit
  -m MOD, --mod MOD     module to use to search employees on google, available: linkedin, github, default: github
  -s STARTPAGE, --startpage STARTPAGE
                        search start page, default 0
  -f FBCOOKIE, --fbcookie FBCOOKIE
                        your facebook cookie (or set env var FACEBOOK_COOKIE)
  -t TERM, --term TERM  term (usually company name)
  -p PAGE, --page PAGE  n page to grab, default 10
  -r RESUME, --resume RESUME
                        resume previous session
  -i INPUT, --input INPUT
                        input datas source saved from previous search
  -k KEYWORD, --keyword KEYWORD
                        github keyword to search
  -o TOKEN, --token TOKEN
                        github token

github-employees.py -m github -p 30 -t "valvesoftware"
github-employees.py -f "fr=1dQvI2Lp7cTfjt; datr=6uqdXS6tsFY5CJ..." -m linkedin -o "1ec46a7..." -t "valvesoftware programmer" -t "valvesoftware admin"
Very simple script that clones all public repositories belonging to a given user/organization.
Usage: php github-grabrepo.php -o/-u <organization/user> [OPTIONS]

Options:
  -d  set destination directory (required)
  -o  set organization (org or user required)
  -u  set user (org or user required)
  -f  grab forked repositories as well
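The cloning logic can be sketched in Python as follows. This is an illustration rather than a port of the PHP script: `clone_commands` is a hypothetical helper that takes the list of repository objects returned by the GitHub API (for example from the list of repositories of an organization or of a user) and relies on their `name`, `fork` and `clone_url` fields.

```python
def clone_commands(repos, destination, include_forks=False):
    """Turn repository objects from the GitHub API into git clone commands."""
    commands = []
    for repo in repos:
        # Forked repositories are skipped unless explicitly requested.
        if repo.get("fork") and not include_forks:
            continue
        target = destination.rstrip("/") + "/" + repo["name"]
        commands.append(["git", "clone", repo["clone_url"], target])
    return commands
```

Each command can be executed with `subprocess.run(command, check=True)`; returning the commands instead of running them makes it easy to review what would be cloned first.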
Performs a code search through the GitHub API.
Usage: php github-search.php [OPTIONS]

Options:
  -c  set cookie session
  -e  file extension filter
  -f  looking for file
  -h  print this help
  -l  language filter
  -m  look for commit, -o and -p are required
  -n  no color
  -p  provide repository name
  -o  provide organization name
  -r  maximum number of results, default 100
  -s  search string
  -t  set authorization token (overwrite cookie, best option)

php github-search.php -o myorganization -s db_password
php github-search.php -o myorganization -f wp-config.php -s db_password
php github-search.php -c "user_session=B0KqycP8LlYORc-s3WFZoH71TG" -f wp-config -e php -r 1000
php github-search.php -t 32a11e6f340c2fe1a6071795a3b1a8c876b3cf29 -l php -s DB_USERNAME
php github-search.php -s "DB_USERNAME filename:wp-config extension:php" -n
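The filters shown in the examples above map to search qualifiers such as `org:`, `filename:`, `extension:` and `language:`. Here is a minimal Python sketch of how such options could be turned into a request URL for the code search endpoint of the GitHub API. The function `search_url` is hypothetical and only illustrates the idea; it is not how github-search.php is implemented.

```python
from urllib.parse import urlencode


def search_url(search="", org=None, filename=None, extension=None,
               language=None, per_page=100):
    """Build a code search URL from a search string and optional filters."""
    parts = [search] if search else []
    qualifiers = (
        ("org", org),
        ("filename", filename),
        ("extension", extension),
        ("language", language),
    )
    for qualifier, value in qualifiers:
        if value:
            parts.append(f"{qualifier}:{value}")
    # urlencode takes care of escaping spaces and colons in the query.
    params = {"q": " ".join(parts), "per_page": per_page}
    return "https://api.github.com/search/code?" + urlencode(params)
```

For instance, `search_url("db_password", org="myorganization", filename="wp-config.php")` corresponds to the second command line example above.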
Web page that displays the GitHub search results of the dorks set in a config file, github-survey.json, located in the same directory.
This file also contains an exclude list, so results considered useless are skipped.
The exclude list is fed through buttons on that same page.
It’s also possible to mark a file, so that its sha is set as the last-sha parameter in the config file (and later highlighted in red).
This is very useful when combined with the next Python script.
The first version of my GitHub survey script.
It performs GitHub searches using the same config file mentioned above but does not exclude anything.
An alert is sent to a Slack channel if the total_count is greater than the total_count of the previous run (it runs from a crontab).
Unfortunately, I understood later that the GitHub API is not so reliable. If you perform several searches in a row (same dork), using the API or the website, you’ll notice that the results counter varies. Because of that I got many, many (too many) notifications on Slack, mostly false positives.
Because of this, I wrote a second version of GitHub survey, which is much more reliable.
First it uses the last-sha parameter to limit the search (search until…), then it filters the results with the exclude list fed by the web page to decide whether a notification should be sent or not. Much better!
usage: github-survey2.py [-h] [-t TOKEN] [-p PAGE] [-c CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        auth token
  -p PAGE, --page PAGE  n max page
  -c CONFIG, --config CONFIG
                        config file, default: ~/.config/github-survey.json
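The decision logic of this second version (stop at last-sha, then drop excluded results) can be sketched as follows. This is a simplified illustration with hypothetical function names. It assumes that the results are sorted from newest to oldest, that each result carries a `sha` field as in the code search API responses, and that the exclude list contains sha values; the real config file may be organized differently.

```python
def new_results(items, last_sha=None, exclude=()):
    """Return the results found before last_sha that are not excluded."""
    fresh = []
    for item in items:  # assumed to be sorted from newest to oldest
        if last_sha is not None and item["sha"] == last_sha:
            # Everything from this point on has already been reviewed.
            break
        if item["sha"] in exclude:
            continue
        fresh.append(item)
    return fresh


def should_notify(items, last_sha=None, exclude=()):
    """A notification is only worth sending if at least one new result remains."""
    return len(new_results(items, last_sha, exclude)) > 0
```

Unlike a comparison of total_count values, this check does not depend on the results counter, so a fluctuating counter alone cannot trigger a notification.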
This script performs a user search using the GitHub API and displays some information about the users found.
usage: github-users.py [-h] [-t TOKEN] [-k KEYWORD]

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        auth token
  -k KEYWORD, --keyword KEYWORD
                        keyword to search
You probably think that all those scripts are useless. There are already so many tools to perform history search, secret discovery and so on… The main reason for all of this was to learn Python; it was one of my goals for 2019. I can now say that I am comfortable with the basics, even if most of the code above is horrible (please don’t judge: after learning the syntax comes code organization).
I am not very familiar with GitHub issues (read: “it won’t be fixed soon”), so feel free to use those programs or not, rewrite them fully or partially, copy them, update them or whatever :)