Beyond the basic set of Google directives covered in a previous post, Google “hacking” has a lot more to offer penetration testers. Johnny Long has literally written the book on it (Google Hacking for Penetration Testers) and has presented on the topic a number of times, including at DEFCON.
This post covers the talk he gave at DEFCON 13. You can watch it here, or keep reading for the overview 🙂
Advanced Operators
This part of the talk reviews the Google directives covered in a previous post. The format is:
operator:argument
Or as described in a previous post:
directive:argument
He has a helpful chart to cover all of the operators.
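To make the operator:argument format concrete, here is a minimal sketch of a query builder. The helper name build_query is made up for this post, and the example query is only an illustration, not something taken from Johnny's chart.

```python
def build_query(*terms):
    """Join (operator, argument) pairs and plain keywords into one query string."""
    parts = []
    for term in terms:
        if isinstance(term, tuple):
            operator, argument = term
            # Operator and argument are written together, with no spaces around the colon.
            parts.append(f"{operator}:{argument}")
        else:
            # Plain keywords are passed through unchanged.
            parts.append(term)
    return " ".join(parts)


query = build_query(("inurl", "admin"), ("inurl", "orders"), ("filetype", "php"))
print(query)  # inurl:admin inurl:orders filetype:php
```

Plain keywords and operators can be mixed freely, e.g. build_query("login", ("site", "example.com")).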
Google “Hacking”
Why call it “hacking”? The word “hack” has been co-opted for a number of other purposes, meaning either DIY things, or clever ideas (“lifehacking”). To clear this up, Johnny shows an example where he uses the following query to find the backend page of an ecommerce site:
inurl:admin inurl:orders filetype:php
In other words, Google “hacking” is using Google to do things it wasn’t necessarily meant to do, and to find things that aren’t necessarily public. There are also a number of examples towards the end of the talk (and the end of this post) that fall into the infosec “hacking” category.
Google Dorks and GDDS
A phrase of his own invention, in his own words:
Google Dorks: foolish or inept person as revealed by Google
Google has started blocking malicious queries, which he calls the “Google Dork Detection System.” For example, the following query is blocked, as it reveals admin pages:
inurl:admin.php
Although, for what it’s worth, I was not able to recreate the Blocked Query page from his presentation (the talk is from 2005). He then shows a number of ways that he was able to get around this rule, for example:
- Changing the case: admin.PHP, etc.
- Query shifting: inurl:admin.php -> filetype:php and phrase ‘admin’
- Space / + injection: ‘pow ered by PHP VB’
- Using curl with a user-agent (he shows it working with an illegitimate user-agent)
Again, this talk is from 2005 so your mileage may vary.
Trolling for email addresses
If you’re looking for an email address (spam bot, etc.), you could search for an email address (name@example.com), but Google strips out the “@” sign.
To get around this, he uses lynx to pull down results:
lynx -dump "http://www.google.com/search?q=@gmail.com" > test.html
Then, he uses a regex to parse the file and pull out the email addresses. He also shows a list of emails found in a spreadsheet aptly named “emails.xlsx” (found using a filetype directive).
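The talk doesn't spell out the exact regex, so the following is only a rough sketch of the parsing step. The pattern is deliberately simple (an assumption on my part, not Johnny's expression), and the sample text stands in for the contents of the saved dump.

```python
import re

# Deliberately simple pattern; real-world address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the unique addresses found in text, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()  # treat addresses case-insensitively
        if address not in seen:
            seen.append(address)
    return seen


# Stand-in for the text saved by the lynx command.
sample = "Contact alice@example.com or Bob@Example.org. Again: alice@example.com"
print(extract_emails(sample))  # ['alice@example.com', 'bob@example.org']
```

To run it against the saved file instead, read the file into a string first and pass that to extract_emails.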
Network Mapping
Next, he describes how to map a network using Google, which, if done right, lets you send very few packets to the targets. His process is:
- Get list of targets
- Find related (outside domain) targets
- Expand the list of targets
- Verify potential target (see if it’s real)
- Discover related targets
- Vitality scan (see if targets are alive)
Get a list of targets
Using a query like site:microsoft.com is a start, but it doesn’t get you very far: it returns only the most obvious results. Instead, you can use a negative search to remove the obvious (and unwanted) results, e.g.:
site:microsoft.com -site:www.microsoft.com
He then repeats this, subtracting more unwanted results to get more interesting subdomains. He takes this a step further by using lynx to automate the search and regexes to make it even more efficient. Note: Google frowns on automated crawling.
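The subtract-and-repeat loop can be sketched roughly like this. Since Google frowns on automated crawling, the sketch never fetches anything: it only works on result text you already have. The function names and the sample text are hypothetical.

```python
import re


def next_query(domain, known_hosts):
    """Build a site: query that subtracts every host already seen."""
    exclusions = " ".join(f"-site:{host}" for host in sorted(known_hosts))
    return f"site:{domain} {exclusions}".strip()


def hosts_in(text, domain):
    """Find host names under the given domain inside result text."""
    pattern = re.compile(
        r"\b((?:[a-z0-9-]+\.)+" + re.escape(domain) + r")\b", re.IGNORECASE
    )
    return {match.lower() for match in pattern.findall(text)}


# Made-up result text standing in for a saved results page.
page = "Results: www.microsoft.com/en-us, msdn.microsoft.com, support.microsoft.com/kb"
known = hosts_in(page, "microsoft.com")
print(next_query("microsoft.com", known))
# site:microsoft.com -site:msdn.microsoft.com -site:support.microsoft.com -site:www.microsoft.com
```

Each round adds the newly seen hosts to the set, so the next query only surfaces subdomains you haven't seen yet.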
There are also “Googleturds”, which are Google queries that are broken, yet still produce results, like site:nasa instead of site:nasa.gov.
Finding related targets
If you link to a company (like Microsoft), it doesn’t necessarily mean you’re affiliated with them. But if they link to you, that means something.
You can use the link operator to find websites that link to a given site:
link:microsoft.com
However, link doesn’t necessarily play well with other operators.
Sensepost has a tool called BiLE: Bi-directional Link Extractor that can help automate this process.
Expanding the target list
If there’s an apples.foo.com, is there an oranges.foo.com too? You can use Google Sets to find more things in a set, even if you don’t necessarily know much about the category. You can also use this to find hosts that Google doesn’t “know” about.
Unfortunately, Google Sets has been discontinued, but you can use an alternative like Word Grab Bag. You can then Google each pair (“apple orange”, “apple pear”, etc.) to figure out which ones are most closely related.
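Generating every pair by hand gets tedious quickly, so here is a small sketch that does it for a word list. The helper name and the word list are just for illustration.

```python
from itertools import combinations


def pair_queries(words):
    """Return one two-word query for every unordered pair of words."""
    return [f"{a} {b}" for a, b in combinations(words, 2)]


print(pair_queries(["apple", "orange", "pear"]))
# ['apple orange', 'apple pear', 'orange pear']
```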
Johnny uses this method to find subdomain names and expand the target list. Then, he sends a lookup to each one to see if it’s real or not.
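A minimal sketch of that expand-then-verify step, assuming “a lookup” means a plain DNS lookup. The function names are hypothetical, and the example swaps in a stub lookup table so it runs without touching the network; leave the lookup argument at its default to query DNS for real.

```python
import socket


def candidate_hosts(words, domain):
    """Turn a list of related words into candidate host names."""
    return [f"{word}.{domain}" for word in words]


def resolves(host, lookup=socket.gethostbyname):
    """Return True if the lookup for host succeeds."""
    try:
        lookup(host)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


# Stub resolver used for the example only: pretends a single host exists.
def fake_lookup(host):
    if host != "apples.foo.com":
        raise socket.gaierror("name not known")
    return "192.0.2.1"  # documentation-only address


candidates = candidate_hosts(["apples", "oranges", "pears"], "foo.com")
real = [host for host in candidates if resolves(host, lookup=fake_lookup)]
print(real)  # ['apples.foo.com']
```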
Vitality scanning
To scan without sending any packets, Johnny goes through a number of queries to find sites that will:
- Allow you to send an email to someone, from that server
- Run the finger command
- PHP ping
- Port scan the target
And so on, on your behalf, without any packets being sent from you to the target.
Again, the talk is 13 years old, so your mileage may vary.
Viewing cached websites (text only)
To view a text-only version of the website (that doesn’t load live images), you can add the following string to the end of a cached URL link:
[cached_link]&strip=1
Additionally, there’s a Firefox plug-in that does this for you.
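If you'd rather do it in a script, appending the parameter takes only a few lines. This is a sketch; the URL in the example is a made-up placeholder, not a real cache address.

```python
from urllib.parse import parse_qs, urlsplit


def text_only(cached_url):
    """Append strip=1 to a cached-page URL unless it is already present."""
    params = parse_qs(urlsplit(cached_url).query)
    if params.get("strip") == ["1"]:
        return cached_url
    # Use "&" when the URL already has a query string, "?" otherwise.
    separator = "&" if "?" in cached_url else "?"
    return f"{cached_url}{separator}strip=1"


print(text_only("http://cache.example/search?q=cache:example.com"))
# http://cache.example/search?q=cache:example.com&strip=1
```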
Cache Sliding
Google shows a snippet of preview text for each search result. You can use cache “sliding” to read through a site’s entire text using only the snippets.
Add words from the snippet to the tail end of the query, then keep replacing them with words from each new snippet to get further and further into the text.
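Here is one way a single “slide” could look in code. It reflects my reading of the technique (take the last few words of the current snippet and use them as the next phrase), and the function name, base query, and snippet are all made up.

```python
def slide(base_query, snippet, tail_words=5):
    """Build the next query from the last few words of the current snippet."""
    # Drop the ellipses Google uses to mark truncated snippet text.
    words = snippet.replace("...", " ").split()
    phrase = " ".join(words[-tail_words:])
    return f'{base_query} "{phrase}"'


snippet = "the quick brown fox jumps over the lazy dog ..."
print(slide("site:example.com", snippet))
# site:example.com "jumps over the lazy dog"
```

Feeding each new snippet back into slide moves the window a few words further into the page every time.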
“Fireworks”
Johnny shares a number of examples of things that probably shouldn’t be public. View the list on page 111 (and onward) in this PDF from BlackHat. On the list: print servers, webcams, routers, admin setup screens (!!), firewalls, VPN logins, etc… all found with Google queries using the techniques discussed in the talk.
Google Hacking Database (GHDB)
If you want to see more Google operators, you can check out Exploit Database’s Google Hacking section.
The “Google Hacking Database (GHDB)” is a categorized index of Internet search engine queries designed to uncover interesting, and usually sensitive, information made publicly available on the Internet.
In addition to Google, there are also interesting queries to use on other search engines. As the pages returned by these queries are likely not meant to be public, approach with caution, don’t use this information for nefarious purposes, keep your activities legal, etc.