GitHub tools collection

This is the current hot topic in the bug hunter community: how to find sensitive information on GitHub. The idea is to find tokens, keys and passwords in the largest code database in the world in order to pwn a company and get massive rewards. Whether they use tools or do it manually, some very talented people like Th3G3nt3lman are real wizards when it comes to discovering such treasures. Many tools are already available, but since I had wanted to learn Python for a long time, I jumped at the opportunity. In this article I’m going to present some scripts I wrote in PHP and Python to fit my needs.

Get everything here: https://github.com/gwen001/github-search

All tools that call the GitHub API require at least one token to be able to perform multiple queries. The best way to deal with this is to create a single text file called .tokens in the repository, with one token per line. All scripts will then load this file. (PS: the more tokens you have, the better.)
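As an illustration of the .tokens convention, here is a minimal sketch of how a script could load the file and rotate through the tokens. The function names load_tokens and token_cycle are hypothetical, and the round-robin strategy is an assumption: the actual scripts may pick tokens differently.

```python
import itertools
from pathlib import Path


def load_tokens(path=".tokens"):
    """Return the non-empty lines of the token file, one token per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def token_cycle(tokens):
    """Return an endless round-robin iterator over the tokens."""
    if not tokens:
        raise ValueError("no token found, add at least one to the token file")
    return itertools.cycle(tokens)
```

Calling next() on the iterator before each request spreads the queries over all available tokens, which is why having more tokens helps.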

memo.txt

First of all, you had better understand how the GitHub search engine works. Here is a very quick summary of the official GitHub documentation.

term in file content: term in:file
term in path: term in:path
from a user: user:USERNAME
from an org: org:ORGNAME
in project: repo:USERNAME/REPOSITORY

filepath: term path:app/config (subdir included)
tagged language: language:python
filesize: size:>10000 (10kb)
filename: filename:config
extension: extension:py

string exclusion: hello NOT world
qualifier exclusion: cats stars:>10 -language:javascript

Use quotation marks for queries with whitespace
cats NOT "hello world"
...
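To show how these qualifiers combine, here is a small sketch that assembles a query string from a term and a few optional qualifiers. The helper build_query and its parameters are hypothetical, not part of the repository.

```python
def build_query(term, org=None, user=None, filename=None, extension=None,
                language=None, exclude=()):
    """Assemble a GitHub code search query from a term and optional qualifiers."""
    def quote(value):
        # Queries containing whitespace must be wrapped in quotation marks.
        return f'"{value}"' if " " in value else value

    parts = [quote(term)]
    for name, value in (("org", org), ("user", user), ("filename", filename),
                        ("extension", extension), ("language", language)):
        if value:
            parts.append(f"{name}:{value}")
    # String exclusion uses the NOT keyword.
    parts.extend(f"NOT {quote(word)}" for word in exclude)
    return " ".join(parts)
```

For example, build_query("cats", exclude=["hello world"]) gives the same query as the last line of the memo.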

git-history.py

This script runs regexps against all repositories located in a specified folder and its subfolders. More importantly, it searches the history of the repositories, not only the current version of the files. All commits are checked!

usage: git-history.py [-h] [-p PATH] [-d DATE] [-c LENGTH] [-s SEARCH]
                      [-t THREADS]

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to scan
  -d DATE, --date DATE  do no check commits prior this date, format: YYYY-MM-DD
  -c LENGTH, --length LENGTH
                        only check in first n characters
  -s SEARCH, --search SEARCH
                        term to search (regexp)
  -t THREADS, --threads THREADS
                        max threads, default: 10

On big repositories (or folders that contain multiple repositories), this operation can take a while. To make it faster, you can customize the maximum number of threads, set a date limit (-d) and limit the number of characters checked (-c).
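The general idea can be sketched as follows. This is not the code of git-history.py, only an illustration under a few assumptions: every function name is hypothetical, commits are listed with git rev-list, each patch is read with git show, and the length limit is applied to the patch of each commit.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_repos(root):
    """Return every directory under root that contains a .git folder."""
    return sorted(p.parent for p in Path(root).expanduser().rglob(".git") if p.is_dir())


def list_commits(repo, since=None):
    """List commit hashes of all refs, optionally only those after a date."""
    cmd = ["git", "-C", str(repo), "rev-list", "--all"]
    if since:
        cmd.append(f"--since={since}")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.split()


def match_text(text, pattern, length=None):
    """Apply the regexp to the first `length` characters of text (all if None)."""
    return sorted({m.group(0) for m in re.finditer(pattern, text[:length])})


def search_commit(repo, commit, pattern, length=None):
    """Return the regexp matches found in the patch of a single commit."""
    cmd = ["git", "-C", str(repo), "show", "--format=", commit]
    out = subprocess.run(cmd, capture_output=True, text=True, errors="replace").stdout
    return match_text(out, pattern, length)


def scan(root, pattern, since=None, length=None, threads=10):
    """Scan every commit of every repository under root with a thread pool."""
    jobs = [(repo, c) for repo in find_repos(root) for c in list_commits(repo, since)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda job: search_commit(*job, pattern, length), jobs)
        return {job: found for job, found in zip(jobs, results) if found}
```

The date limit reduces the number of commits to inspect, the length limit reduces the work per commit, and the thread pool runs several git show calls at the same time.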

Examples:

git-history.py -p ~/repos -s "AKIA.*"
git-history.py -p ~/repos -s ~/.gf/aws-keys.json -d 2019-01-01 -c 1000000

Note that the search option (-s) accepts either a single regexp or a file in the format used by gf, the nice tool by TomNomNom.
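A possible way to support both forms is sketched below. It assumes that a gf pattern file is a JSON document with a "pattern" key, a "patterns" list, or both. The function name load_regexps is hypothetical.

```python
import json
import os


def load_regexps(search):
    """Return a list of regexps from a literal regexp or a gf-style JSON file."""
    path = os.path.expanduser(search)
    if not os.path.isfile(path):
        return [search]  # not a file: treat the argument as the regexp itself
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    regexps = list(data.get("patterns", []))
    if "pattern" in data:
        regexps.insert(0, data["pattern"])
    return regexps
```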

git-history.sh

Same same but different. Strongly inspired by a one-liner TomNomNom published on Twitter, this small bash script also looks for patterns in the history of Git repositories.

git-pillage.py

Inspired by gitpillage.sh. I wanted to make it faster and more verbose. It was also a nice way to practice that Pentesterlab exercise about the Git directory structure.

usage: git-pillage.py [-h] [-u URL] [-t THREADS] [-e EXTENSION] [-x EXCLUDE]
                     [-v]

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     url of the .git, but without the .git directory
  -t THREADS, --threads THREADS
                        threads, default: 5
  -e EXTENSION, --extension EXTENSION
                        extensions to download separated by comma, overwrite
                        --exclude, default: all but default exclude
  -x EXCLUDE, --exclude EXCLUDE
                        extensions to exclude separated by comma, default: png
                        ,gif,jpg,jpeg,ico,svg,eot,otf,ttf,woff,woff2,css,less,
                        po,mo,mp3,mp4,mpeg,avi
  -v, --verbose         verbose mode, default: off

Examples:

git-pillage.py -u https://www.example.com -t 10 -v
git-pillage.py -u https://www.example.com/ -x pdf,doc,docx,txt,xls,xlsx
git-pillage.py -u https://www.example.com/htdocs/src/ -t 10 -e php,sh,py,rb,bat -v
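For readers unfamiliar with the Git directory structure, the sketch below shows the two building blocks such a tool relies on: locating a loose object from its SHA-1, and decoding it once downloaded. The function names are hypothetical and packed objects are not handled.

```python
import zlib
from urllib.parse import urljoin


def object_url(base_url, sha):
    """Build the URL of a loose object: .git/objects/<2 chars>/<38 chars>."""
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, f".git/objects/{sha[:2]}/{sha[2:]}")


def parse_object(raw):
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    # A loose object is "<type> <size>\0<content>" compressed with zlib.
    header, _, content = data.partition(b"\x00")
    kind, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("object size does not match its header")
    return kind.decode(), content
```

Commit objects point to trees, and trees point to blobs and other trees, so walking these references from the current HEAD is enough to rebuild the files of an exposed .git directory.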

github-contributors.py

This script returns some information about the people listed as contributors to every repository of an organization.

usage: github-contributors.py [-h] [-t TOKEN] [-o ORG] [-u USER]

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        auth token
  -o ORG, --org ORG     organization
  -u USER, --user USER  user

Example:

github-contributors.py -o snapchat
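The same data can be collected with two GitHub REST API endpoints: the repository list of an organization and the contributor list of each repository. The sketch below is not the script itself. The function names are hypothetical, only the first page of each listing is read, and errors are not handled.

```python
import json
import urllib.request

API = "https://api.github.com"


def api_get(path, token=None):
    """GET a GitHub API path and return the decoded JSON body."""
    request = urllib.request.Request(API + path)
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    # Empty repositories answer with an empty body.
    return json.loads(body) if body else []


def org_contributors(org, token=None, fetch=api_get):
    """Map each contributor login to the repositories they contributed to."""
    contributors = {}
    for repo in fetch(f"/orgs/{org}/repos?per_page=100", token):
        path = f"/repos/{repo['full_name']}/contributors?per_page=100"
        for person in fetch(path, token) or []:
            contributors.setdefault(person["login"], []).append(repo["name"])
    return contributors
```

The fetch parameter only exists so that the logic can be exercised without network access.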

github-dorks.php

Runs dorks on GitHub against the users/organizations provided (several users/orgs can be given, separated by commas). Dorks must be listed in a single text file. Results are not stored (could be an option?); only the number of results is displayed.

Usage: /opt/bin/github-dorks.php <o/u/n> <org/user> [<dork file>]

github-dorks.py

The Python version of the previous script. It’s supposed to be much faster (and more stable) thanks to the multithreading option, but unfortunately the GitHub rate limit on code search is pretty low, so use it carefully.

usage: github-dorks.py [-h] [-d DORKS] [-t TOKEN] [-o ORG] [-e THREADS]
                       [-u USER]

optional arguments:
  -h, --help            show this help message and exit
  -d DORKS, --dorks DORKS
                        dorks file
  -t TOKEN, --token TOKEN
                        auth token
  -o ORG, --org ORG     organization
  -e THREADS, --threads THREADS
                        maximum n threads
  -u USER, --user USER  user
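A rough sketch of the approach: each dork is scoped to the organization or user, sent to the code search endpoint, and only the total_count field of the answer is kept. The names below are hypothetical, and fetch_json stands for whatever function performs the authenticated HTTP request and returns the decoded JSON.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote


def load_dorks(path):
    """Read one dork per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def search_url(dork, org=None, user=None):
    """Scope a dork to an organization or user and build the code search URL."""
    scope = f"org:{org} " if org else f"user:{user} " if user else ""
    return "https://api.github.com/search/code?q=" + quote(scope + dork)


def count_results(dorks, fetch_json, org=None, user=None, threads=3):
    """Return {dork: total_count}; fetch_json(url) performs the HTTP request."""
    def count(dork):
        return fetch_json(search_url(dork, org, user)).get("total_count", 0)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(zip(dorks, pool.map(count, dorks)))
```

Because of the low rate limit on code search, a small number of threads is usually the safer choice.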

github-employees.py

(Tries to) find the GitHub accounts of a company’s employees through Google search and displays some basic information about them.

Two modules are available for now:

  • github, Google dork is: site:github.com [term], nicknames are the ones returned
  • linkedin, Google dork is: site:linkedin.com/in [term], nicknames are generated by the script using firstname and lastname
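For the linkedin module, nickname generation could look like the sketch below. The exact combinations used by the script are not documented here, so the list of candidates is purely an assumption, and the function names are hypothetical.

```python
import re
import unicodedata


def slug(word):
    """Lowercase a name and strip accents and non-alphanumeric characters."""
    ascii_word = unicodedata.normalize("NFKD", word).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]", "", ascii_word.lower())


def candidate_nicknames(firstname, lastname):
    """Return plausible GitHub logins for a person, without duplicates."""
    first, last = slug(firstname), slug(lastname)
    if not first or not last:
        return []
    candidates = [first + last, f"{first}-{last}", first[0] + last,
                  first + last[0], last + first]
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order
```

Each candidate then has to be checked against GitHub, which is where most false positives come from.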

Since the script uses the magnificent Goop from s0md3v, a Facebook cookie is required to bypass the Google rate limit (it can be provided on the command line or through an environment variable).

usage: github-employees.py [-h] [-m MOD] [-s STARTPAGE] [-f FBCOOKIE]
                           [-t TERM] [-p PAGE] [-r RESUME] [-i INPUT]
                           [-k KEYWORD] [-o TOKEN]

optional arguments:
  -h, --help            show this help message and exit
  -m MOD, --mod MOD     module to use to search employees on google,
                        available: linkedin, github, default: github
  -s STARTPAGE, --startpage STARTPAGE
                        search start page, default 0
  -f FBCOOKIE, --fbcookie FBCOOKIE
                        your facebook cookie (or set env var FACEBOOK_COOKIE)
  -t TERM, --term TERM  term (usually company name)
  -p PAGE, --page PAGE  n page to grab, default 10
  -r RESUME, --resume RESUME
                        resume previous session
  -i INPUT, --input INPUT
                        input datas source saved from previous search
  -k KEYWORD, --keyword KEYWORD
                        github keyword to search
  -o TOKEN, --token TOKEN
                        github token

Examples:

github-employees.py -m github -p 30 -t "valvesoftware"
github-employees.py -f "fr=1dQvI2Lp7cTfjt; datr=6uqdXS6tsFY5CJ..." -m linkedin -o "1ec46a7..." -t "valvesoftware programmer" -t "valvesoftware admin"

github-endpoints.py

Improve your recon by searching for endpoints on GitHub. Very useful, and you can also get some extra subdomains. Relative URLs and URLs of external domains can optionally be displayed. The script is based on regexps and also has an exclude list. Feed it as much as you can to filter the results the way you like!

usage: github-endpoints.py [-h] [-t TOKEN] [-d DOMAIN] [-e] [-a] [-r] [-s]

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        auth token (required)
  -d DOMAIN, --domain DOMAIN
                        domain you are looking for (required)
  -e, --extend          also look for <dummy>example.com
  -a, --all             displays urls of all other domains
  -r, --relative        also displays relative urls
  -s, --source          display urls where endpoints are found
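The options above can be illustrated with a short sketch of the extraction step applied to the content of one file. The regexps and function names are assumptions made for the example, not the ones used by github-endpoints.py.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>)]+")
RELATIVE_RE = re.compile(r"[\"'](/[A-Za-z0-9_\-./]+)[\"']")


def host_matches(host, domain, extend=False):
    """True for the domain and its subdomains; extend also accepts <dummy>domain."""
    if host == domain or host.endswith("." + domain):
        return True
    return extend and host.endswith(domain)


def extract_endpoints(text, domain, extend=False, all_domains=False, relative=False):
    """Return the sorted, de-duplicated endpoints found in text."""
    found = []
    for url in URL_RE.findall(text):
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:
            continue  # malformed URL, skip it
        if all_domains or host_matches(host, domain, extend):
            found.append(url)
    if relative:
        # Relative URLs are only picked up when they are quoted strings.
        found.extend(RELATIVE_RE.findall(text))
    return sorted(set(found))
```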

github-grabrepo.php

Very simple script that clones all public repositories belonging to a given user/organization.

Usage: php github-grabrepo.php -o/-u <organization/user> [OPTIONS]

Options:
	-d	set destination directory (required)
	-o	set organization (org or user required)
	-u	set user (org or user required)
	-f	grab forked repositories as well
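In Python, the same logic could be sketched as below, starting from a repository listing as returned by the GitHub API (each entry has a clone_url and a fork field). The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def select_clone_urls(repos, include_forks=False):
    """Pick the clone URLs from a repository listing, skipping forks by default."""
    return [repo["clone_url"] for repo in repos if include_forks or not repo.get("fork")]


def clone_all(urls, destination):
    """Clone each repository into the destination directory."""
    target = Path(destination).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    for url in urls:
        # check=False: one failing clone should not stop the others.
        subprocess.run(["git", "clone", "--quiet", url], cwd=target, check=False)
```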

github-search.php

Perform code search through GitHub API.

Usage: php github-search.php [OPTIONS]

Options:
	-c	set cookie session
	-e	file extension filter
	-f	looking for file
	-h	print this help
	-l	language filter
	-m	look for commit, -o and -p are required
	-n	no color
	-p	provide repository name
	-o	provide organization name
	-r	maximum number of results, default 100
	-s	search string
	-t	set authorization token (overwrite cookie, best option)

Examples:

	php github-search.php -o myorganization -s db_password
	php github-search.php -o myorganization -f wp-config.php -s db_password
	php github-search.php -c "user_session=B0KqycP8LlYORc-s3WFZoH71TG" -f wp-config -e php -r 1000
	php github-search.php -t 32a11e6f340c2fe1a6071795a3b1a8c876b3cf29 -l php -s DB_USERNAME
	php github-search.php -s "DB_USERNAME filename:wp-config extension:php" -n

github-secrets.py

Find secrets deployed on GitHub.

usage: github-secrets.py [-h] [-t TOKEN] [-s SEARCH] [-e] [-r REGEXP] [-u] [-v]

options:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        your github token (required)
  -s SEARCH, --search SEARCH
                        search term you are looking for (required)
  -e, --extend          also look for <dummy>example.com
  -r REGEXP, --regexp REGEXP
                        regexp to search, default is SecLists secret-keywords list (can be a tomnomnom gf file)
  -u, --url             display only url
  -v, --verbose         verbose mode, for debugging purpose

Examples:

	python3 github-secrets.py -s "AWS_KEY filename:.env" -r "AKIA[A-Z0-9]{16}"
	python3 github-secrets.py -s "DB_PASSWORD filename:wp-config.php" -r "DB_PASSWORD',\s*'[^']{4,}"
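Conceptually, each search hit is downloaded in its raw form and the regexp is applied to its content. A minimal sketch of these two steps, with hypothetical function names:

```python
import re


def raw_url(html_url):
    """Convert a github.com blob URL into its raw.githubusercontent.com form."""
    url = html_url.replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
    return url.replace("/blob/", "/", 1)


def find_secrets(content, regexp):
    """Return (line number, matched text) for every regexp hit in content."""
    compiled = re.compile(regexp)
    hits = []
    for number, line in enumerate(content.splitlines(), start=1):
        hits.extend((number, m.group(0)) for m in compiled.finditer(line))
    return hits
```

With the regexps of the two examples above, find_secrets reports the line of the .env or wp-config.php file that contains the key or the password.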

github-subdomains.py

Find additional subdomains on GitHub. Very useful during your recon phase: you will probably get some extra subdomains that other tools didn’t find because they are not public.

usage: github-subdomains.py [-h] [-t TOKEN] [-d DOMAIN] [-e] [-s]

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        auth token (required)
  -d DOMAIN, --domain DOMAIN
                        domain you are looking for (required)
  -e, --extend          also look for <dummy>example.com
  -s, --source          display first url where subdomains are found
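The heart of such a tool is a regexp built from the target domain. The sketch below is one possible way to write it, including an interpretation of the --extend option. The function names and the exact patterns are assumptions.

```python
import re


def subdomain_regexp(domain, extend=False):
    """Compile a regexp matching subdomains of domain."""
    label = r"[a-z0-9_-]+"
    if extend:
        # Extended mode also accepts a prefix glued to the domain, e.g. devexample.com.
        pattern = rf"(?:{label}\.)*[a-z0-9_-]*{re.escape(domain)}"
    else:
        pattern = rf"(?:{label}\.)+{re.escape(domain)}"
    return re.compile(pattern, re.IGNORECASE)


def find_subdomains(text, domain, extend=False):
    """Return the sorted, lowercased hosts found in text."""
    regexp = subdomain_regexp(domain, extend)
    return sorted({m.group(0).lower() for m in regexp.finditer(text)})
```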

github-survey/index.php

Web page that displays the GitHub search results of the dorks set in a config file, github-survey.json, located in the same directory. This file also contains an exclude list, so results considered useless are skipped. The exclude list is fed through buttons on that same page.

It’s also possible to mark a file, so that its SHA is set as the last-sha parameter in the config file (and later highlighted in red). This is very useful when combined with the next Python script.

github-survey.py

The first version of my GitHub survey script.

It performs GitHub searches using the same config file mentioned above but does not exclude anything. An alert is sent to a Slack channel if the total_count is greater than the total_count of the previous run (it runs from a crontab).
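The comparison itself is simple, as this sketch shows. It assumes the counters of the previous run are kept as a {dork: total_count} mapping and that the alert goes through a Slack incoming webhook. Both are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request


def new_alerts(previous, current):
    """Return the dorks whose total_count grew since the previous run."""
    return {dork: count for dork, count in current.items()
            if count > previous.get(dork, 0)}


def notify_slack(webhook_url, alerts):
    """Post one message to a Slack incoming webhook listing the dorks that grew."""
    lines = [f"{dork}: {count} results" for dork, count in sorted(alerts.items())]
    payload = json.dumps({"text": "\n".join(lines)}).encode()
    request = urllib.request.Request(webhook_url, data=payload,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```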

github-survey2.py

Unfortunately, I later understood that the GitHub API is not so reliable. If you perform several searches in a row (same dork), using the API or the website, you’ll notice that the result counter varies. Because of that I got many (too many) notifications on my Slack, mostly false positives.

Because of this, I wrote a second version of GitHub survey which is much more reliable.

First it uses the last-sha parameter to limit the search (search until…), then it filters the results with the exclude list fed by the web page to decide whether a notification should be sent. Much better!
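These two steps can be sketched as a single filter over the search results. The sketch assumes the results are ordered from newest to oldest and that the exclude list contains regexps matched against the URL of each result. The function name new_results is hypothetical.

```python
import re


def new_results(items, last_sha=None, exclude=()):
    """Return results newer than last_sha that match no exclude pattern.

    items is assumed to be ordered from newest to oldest.
    """
    patterns = [re.compile(p) for p in exclude]
    fresh = []
    for item in items:
        if last_sha and item["sha"] == last_sha:
            break  # everything from here on was already reviewed
        if any(p.search(item["html_url"]) for p in patterns):
            continue  # considered useless, skip it
        fresh.append(item)
    return fresh
```

A notification is only worth sending when the returned list is not empty, which no longer depends on the unstable result counter.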

usage: github-survey2.py [-h] [-t TOKEN] [-p PAGE] [-c CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        auth token
  -p PAGE, --page PAGE  n max page
  -c CONFIG, --config CONFIG
                        config file, default: ~/.config/github-survey.json

github-users.py

This script performs a user search using the GitHub API and displays some information about the users found.

usage: github-users.py [-h] [-t TOKEN] [-k KEYWORD]

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        auth token
  -k KEYWORD, --keyword KEYWORD
                        keyword to search

gsearch-reflog.sh

This script downloads a given repository and searches its logs in order to find secrets.

Usage: ./gsearch-reflog.sh <repository url>

Conclusion

You probably think that all those scripts are useless. There are already so many tools to perform history search, secret discovery and so on… The main reason for all of this was to learn Python, which was one of my goals for 2019. I can now say that I am comfortable with the basics, even if most of the code above is horrible (please don’t judge: after learning the syntax comes code organization).

I am not very familiar with GitHub issues (read: “it won’t be fixed soon”), so feel free to use those programs or not, rewrite them fully or partially, copy them, update them or whatever :)

External resources