Reconnaissance: the What, Why and How of Information Gathering

In his book The Basics of Hacking and Penetration Testing, Engebretson lays out 4 different stages of testing:  reconnaissance, scanning, exploitation, and post-exploitation / maintaining access.  I’ve written a number of posts covering the tools and techniques used in the reconnaissance phase.

Reconnaissance is information gathering.  Given a target, you need to know as much as you can about them and the network(s) they use.  That means lots and lots of research.  So how do you go from knowing practically nothing about a company or organization to knowing enough to gain access or control of their systems?

_Note:  I’ll copy Engebretson’s multiple warnings here.  Don’t conduct reconnaissance or use tools on targets without authorization, don’t do bad things, etc._

Goals

The goals for the reconnaissance step are to:

  1. Gather as much information as possible about the target

  2. Sort through all the information gathered and create a list of attackable IP addresses or uniform resource locators (URLs)

There are two types of reconnaissance, active and passive.

Active reconnaissance techniques involve interacting directly with the target, which means that you might be detected.

Passive reconnaissance techniques don’t involve direct interaction, meaning that the target cannot detect your activity.

OSINT

OSINT, or open source intelligence, is the practice of gathering publicly available information about a target.  Most companies have a sizable web presence: a website, social network accounts, press releases, and so on.

The target’s website (HTTrack)

One great place to learn about a target would be… their website.  You can make an offline-copy of their website using HTTrack.  For more in-depth information, check out this blog post.

Particular points of interest might include:

Google hacking

Once you know something about the target, you can use Google to find more information elsewhere (passively).  One tool to really boost your efficiency is Google “hacking.”  Google hacking is the use of Google directives/operators to find things that aren’t necessarily meant to be public.
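To make the idea of directives concrete, here is a minimal sketch that assembles a query from operator/value pairs.  The `site:` and `filetype:` operators are standard Google directives; the helper names `build_query` and `search_url`, the search term, and the `example.com` domain are made up for illustration.

```python
from urllib.parse import quote_plus


def build_query(terms="", **operators):
    """Join operator:value pairs and free-text terms into one search query."""
    parts = []
    for name, value in operators.items():
        # Values containing spaces must be quoted to stay attached to the operator.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query):
    """URL-encode a query so it can be opened in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_query("password", site="example.com", filetype="pdf")
print(query)
print(search_url(query))
```

This only builds the query string.  Running the search by hand in a browser keeps the activity passive with respect to the target, since you are talking to Google rather than to the target’s servers.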

For more information, see my posts about basic Google hacking, and an overview of Johnny Long’s “Google Hacking for Penetration Testers” DEFCON talk.

The Harvester

You can use Google to find email addresses, but if that sounds like too much work, you’re in luck.  The Harvester is a Python script written by Christian Martorella that automates the process and does the work for you.  More details here.

The Domain Name System (DNS) maps domain names to IP addresses, and its servers hold other useful information about the target.

WHOIS

You can use Whois to find the DNS servers for a given website.  Whois also returns some information about a website’s owners/registrars, but the GDPR is making that less useful.  More info here.  
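Once you have Whois output saved as text, pulling out the DNS servers is a small parsing job.  The sketch below assumes the `Name Server:` label, which many but not all registries use; the function name `extract_name_servers` and the sample record are hypothetical.

```python
import re


def extract_name_servers(whois_text):
    """Return the unique name servers in WHOIS output, lowercased, in order of appearance."""
    pattern = re.compile(r"^\s*Name Server:\s*(\S+)", re.IGNORECASE | re.MULTILINE)
    servers = []
    for match in pattern.finditer(whois_text):
        server = match.group(1).lower().rstrip(".")
        if server not in servers:
            servers.append(server)
    return servers


# Hypothetical sample record, trimmed to the lines that matter here.
SAMPLE = """\
Domain Name: EXAMPLE.COM
Registrar: Hypothetical Registrar, Inc.
Name Server: NS1.EXAMPLE.COM
Name Server: NS2.EXAMPLE.COM
Name Server: ns1.example.com
"""

print(extract_name_servers(SAMPLE))
```

If a registry formats its records differently, the regular expression is the only part that should need adjusting.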

Netcraft

Netcraft will find you DNS information about your target, along with handy info like what type of server they’re running.

Host

Host is a command line utility that translates a host name (such as the name of a DNS server you found with Whois) into an IP address.
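For scripting the same kind of lookup, a rough equivalent can be sketched with Python’s standard `socket` module.  This is not how the host utility itself works: `socket.getaddrinfo` goes through the system resolver, which may also consult the local hosts file, and the function name `resolve` is made up for this example.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses a host name resolves to."""
    # getaddrinfo returns one entry per socket type, so the same address repeats;
    # the set comprehension collapses those duplicates.
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("ns1.example.com")` on a name server you found earlier would return its addresses as a list of strings, ready to add to your target list.  Note that, depending on how the resolver is configured, the lookup can end up querying the target’s own name servers.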

NSLookup

NSLookup will query DNS servers and find relevant DNS records for a given record type (MX/email, etc.).

Dig

Zone transfers, which copy DNS records from one server to another, aren’t too common… but that doesn’t mean you can’t try.  You can use dig to attempt a zone transfer to learn more about a target.
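A zone transfer attempt with dig takes the form `dig @nameserver domain AXFR`.  The sketch below builds that command and, if dig is installed, runs it through `subprocess`.  The function names and the `example.com` hosts are placeholders; only run this against name servers you are authorized to test.

```python
import shutil
import subprocess


def axfr_command(domain, nameserver):
    """Build the dig argument list for a zone transfer (AXFR) request."""
    return ["dig", f"@{nameserver}", domain, "AXFR"]


def attempt_zone_transfer(domain, nameserver, timeout=30):
    """Run dig and return its raw output; the caller decides whether it succeeded."""
    if shutil.which("dig") is None:
        raise RuntimeError("dig is not installed or not on PATH")
    result = subprocess.run(
        axfr_command(domain, nameserver),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


# Show the command that would be run, without contacting any server.
print(" ".join(axfr_command("example.com", "ns1.example.com")))
```

The name server passed in would normally be one of the DNS servers you found earlier for the target domain.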

Fierce

If your zone transfer fails, you can use Fierce, which is a Perl script that locates non-contiguous IP space and hostnames related to a given domain.

Once you have information about an email server, you can try to send an email and get it rejected as a way of learning more about the target’s defenses.

Other tools

MetaGooFil

MetaGooFil is a Python script that automates a lot of Google hacking steps.  It searches the web for PDF and MS Office files related to a given domain, then extracts and reports their metadata.  This might include user names, file locations, and so on.

Social engineering

For those who are quick on their feet, social engineering is a great tool.  Social engineering exploits human weakness, and people are typically the weakest part of any ‘technical’ system.

Review

Once you’ve got a pile of information, it’s time to make sense of it.  By the end of reconnaissance, you’ll have a list of IP addresses that belong to, serve, or are related to the target.  You’ll probably need to cross some addresses off your list, depending on what falls inside of your authorized scope.
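Crossing out-of-scope addresses off the list is easy to automate.  Here is a minimal sketch using Python’s standard `ipaddress` module, assuming your authorized scope is written as CIDR ranges; the function name `split_by_scope` and the addresses (taken from the reserved documentation ranges) are illustrative only.

```python
import ipaddress


def split_by_scope(addresses, scope_cidrs):
    """Split addresses into (in_scope, out_of_scope) lists, preserving order."""
    networks = [ipaddress.ip_network(cidr) for cidr in scope_cidrs]
    in_scope, out_of_scope = [], []
    for raw in addresses:
        addr = ipaddress.ip_address(raw)
        if any(addr in net for net in networks):
            in_scope.append(raw)
        else:
            out_of_scope.append(raw)
    return in_scope, out_of_scope


found = ["192.0.2.10", "198.51.100.7", "192.0.2.200"]
keep, drop = split_by_scope(found, ["192.0.2.0/24"])
print("in scope:", keep)
print("out of scope:", drop)
```

Keeping the out-of-scope list, rather than discarding it, leaves a record of what you found but deliberately did not touch.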

You’ll also have a list of email addresses, host names, and so on.

Yay!  Up next, scanning!