Subdomain scanner made easy – with Python!

Subdomain scanner with Python? Why?

Do you want to build your own subdomain scanner with Python? But why?

There are many reasons why you might want to develop your own subdomain scanner.

Maybe you’re a security researcher who wants to find vulnerabilities in websites.
Maybe you’re a penetration tester who needs to assess the security of a client’s website.
Maybe you’re just curious about how websites work and want to learn more about how to find vulnerabilities in them.
Whatever your reason, developing your own subdomain scanner is a rewarding process and not a particularly challenging one. It only requires a bit of knowledge about networking and coding.
The end result can be a powerful tool that helps you find vulnerabilities in websites and protect yourself and others from cyber attacks.

Prerequisites

To write our own tool you don’t need much. You just need to:

install Python 3
install the “requests” library
find a file with a set of possible subdomains
have a working internet connection

On a Kali Linux virtual machine we probably have everything already, but let’s see the terminal commands for an Ubuntu machine:

sudo apt install python3
sudo apt install python3-pip
pip3 install requests
# optparse is part of the Python standard library, so there is nothing to install for it

The complete script at the end of this article also draws a progress bar, which needs the progress package:

pip3 install progress

Now that we’ve installed everything we need, let’s get a list of possible subdomains to try. For this tutorial I’ll use subdomains-top1million-5000.txt from this repository: SecListRepo, more precisely at this address:
SecListsFile.

Introduction

Now we are ready. We want our program to take as input the main domain and a file containing the list of subdomains to iterate over. It might also be interesting to add the possibility of saving the output to a file.

You can find a few versions of a subdomain scanner online, but they all turn out to be quite slow, so let’s use threads to try to do something better, with the number of threads passed as a parameter. So let’s see how to use the optparse library to collect the arguments we need.

import optparse

def get_args():
    # Define the command-line options accepted by the scanner
    parser = optparse.OptionParser()

    parser.add_option("-d", "--main", dest="domain_name",
                      help="The domain name", metavar="DOMAIN_NAME")
    parser.add_option("-i", "--input", dest="input_list",
                      help="read the list from INPUT_FILE", metavar="INPUT_FILE")
    parser.add_option("-f", "--file", dest="output_file", default="",
                      help="write report to FILE", metavar="FILE")
    # Same default as in the complete listing at the end of the article
    parser.add_option("-t", "--threads", type=int, dest="n_threads", default=12,
                      help="Set the number of threads", metavar="N_THREADS")

    # parse_args() returns a tuple: (options, positional arguments)
    return parser.parse_args()

As we can see, the get_args method defines the options we need:

-d the main domain
-i the input file
-f the output file, if we want to save everything to a file
-t the number of threads to launch

Finally, it returns the parsed options, which will be used in the main block.
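To see what the parser hands back, the sketch below rebuilds the same options and feeds them an explicit argument list instead of the real command line. The build_parser name and the sample values (example.com, subdomains.txt) are illustrative assumptions, not part of the tool.

```python
import optparse


def build_parser():
    # Hypothetical helper: same options as get_args, without parsing sys.argv
    parser = optparse.OptionParser()
    parser.add_option("-d", "--main", dest="domain_name",
                      help="The domain name", metavar="DOMAIN_NAME")
    parser.add_option("-i", "--input", dest="input_list",
                      help="read the list from INPUT_FILE", metavar="INPUT_FILE")
    parser.add_option("-f", "--file", dest="output_file", default="",
                      help="write report to FILE", metavar="FILE")
    # Default taken from the complete listing at the end of the article
    parser.add_option("-t", "--threads", type=int, dest="n_threads", default=12,
                      help="Set the number of threads", metavar="N_THREADS")
    return parser


# parse_args accepts an explicit list, which is handy for experiments
options, args = build_parser().parse_args(
    ["-d", "example.com", "-i", "subdomains.txt", "-t", "8"]
)

print(options.domain_name)        # example.com
print(options.input_list)         # subdomains.txt
print(options.n_threads)          # 8 (already converted to int)
print(repr(options.output_file))  # '' because -f was not given
print(args)                       # [] no positional arguments
```

Since -f falls back to an empty string, the main block can test options.output_file directly to decide between writing a file and printing to the screen.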

Auxiliary methods

Before writing the whole main, let’s define the methods and global variables we might need:

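import queue
import threading

q = queue.Queue()        # subdomains waiting to be checked
bar = None               # progress bar, created in the main block

active_domains = []      # subdomains that answered
lock = threading.Lock()  # protects active_domains

def from_file(filename):
    # Return the non-empty lines of the wordlist, stripped of whitespace
    with open(filename, 'r') as f:
        return [line.strip() for line in f if line.strip()]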

This method reads a list of subdomains from a file and returns it to the caller. Now let’s look at something more interesting:

import requests

def check_subdomain(domain, sub):
    subdomain = f"http://{sub.strip()}.{domain}"
    try:
        requests.get(subdomain, timeout=2)
    except requests.exceptions.RequestException:
        # Connection errors, timeouts and invalid URLs all count as "not found";
        # an uncaught exception here would kill the worker thread
        return False

    return True

The check_subdomain method takes a domain and a subdomain as arguments. It then sends a GET request through the requests library and waits up to 2 seconds for a response.
If the server answers, whatever the HTTP status code, the method returns True. Otherwise, if the request throws an exception, the return value is False.
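The True/False contract described above can be tried without touching the network. The sketch below is a minimal illustration under stated assumptions: build_url, fake_fetch and is_reachable are hypothetical names, fake_fetch stands in for requests.get, and Python’s built-in ConnectionError and TimeoutError replace the requests exception classes.

```python
def build_url(domain, sub):
    # Same URL format used by check_subdomain
    return f"http://{sub.strip()}.{domain}"


def fake_fetch(url, timeout=2):
    # Hypothetical stand-in for requests.get: only one host "exists" here
    if url != "http://www.example.com":
        raise ConnectionError(f"cannot reach {url}")
    return "response"


def is_reachable(domain, sub, fetch=fake_fetch):
    # True if the fetch completes, False if it raises
    try:
        fetch(build_url(domain, sub), timeout=2)
    except (ConnectionError, TimeoutError):
        return False
    return True


print(is_reachable("example.com", "www\n"))  # True
print(is_reachable("example.com", "ftp"))    # False
```

The trailing newline in "www\n" shows why the subdomain is stripped before the URL is built: wordlist entries may carry stray whitespace.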

The next auxiliary method is append_if_exists, which inserts a subdomain into a global list of existing domains.

It also uses a lock to avoid concurrency errors:

def append_if_exists(host, sub):
    # Record the subdomain only if it answered; the lock serialises list updates
    if check_subdomain(host, sub):
        with lock:
            active_domains.append(f"{sub}.{host}")

Finally, we have the get_active method.

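def get_active():
    # Worker loop: runs until the main thread exits (the threads are daemons)
    while True:
        i = q.get()  # blocks until a subdomain is available
        try:
            append_if_exists(domain_name, i)
            bar.next()
        finally:
            # Always mark the item as done, otherwise q.join() would never return
            q.task_done()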

This method loops forever, taking one subdomain at a time from the queue; when the queue is empty, q.get() simply blocks until the main thread exits. Since the queue is shared by all threads, we want to avoid race conditions, even if they are unlikely and not very dangerous, so we rely on the queue.Queue class, which is thread-safe.
Inside the loop, the method first pops an element, then appends the domain if it exists, updates the bar, and finally notifies that the task is done.
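The same worker pattern can be tried in isolation, without any network access. The sketch below is only an illustration: tasks, results, KNOWN and fake_check are hypothetical names, and fake_check replaces the real HTTP check with a lookup in a fixed set.

```python
import queue
import threading

tasks = queue.Queue()
results = []
results_lock = threading.Lock()

# Hypothetical stand-in for the HTTP check: these are the names that "exist"
KNOWN = {"www", "mail"}


def fake_check(sub):
    return sub in KNOWN


def worker():
    while True:
        sub = tasks.get()  # blocks until an item is available
        try:
            if fake_check(sub):
                with results_lock:
                    results.append(sub)
        finally:
            tasks.task_done()  # always mark the item as processed


for sub in ["www", "ftp", "mail", "dev"]:
    tasks.put(sub)

# Daemon threads are abandoned automatically when the main thread exits
for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()

tasks.join()  # returns once every item has been marked done
print(sorted(results))  # ['mail', 'www']
```

The results are sorted before printing because the order in which threads append to the list is not deterministic.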

Put it all together and create the Python subdomain scanner

In the main block we’ll put everything together and call all the methods defined above, whose behaviour we already know.
The queue contains all the subdomains, from which the threads take the next value to check, and active_domains is the list in which each thread inserts positive results.
In the for loop we create all the threads and set thread.daemon to True (so each thread ends together with the main one); every thread runs the get_active method.
With t.start() we launch each thread, and then we wait for the queue to empty with q.join().

We use a try/except block to be able to stop the scan with CTRL+C without losing the results.
And finally, we decide whether to print the output to the screen or save it to a file.
Having done everything, let’s see the main block inside the complete code (working with a simple copy and paste for the lazy ones).

import requests
import threading
import time
import queue
from progress.bar import Bar
import optparse

q = queue.Queue()
bar = None

active_domains = []
lock = threading.Lock()

def from_file(filename):
    # Return the non-empty lines of the wordlist, stripped of whitespace
    with open(filename, 'r') as f:
        return [line.strip() for line in f if line.strip()]


def check_subdomain(domain, sub):
    subdomain = f"http://{sub.strip()}.{domain}"
    try:
        requests.get(subdomain, timeout=2)
    except requests.exceptions.RequestException:
        # Connection errors, timeouts and invalid URLs all count as "not found";
        # an uncaught exception here would kill the worker thread
        return False
    return True

def append_if_exists(host, sub):
    # Record the subdomain only if it answered; the lock serialises list updates
    if check_subdomain(host, sub):
        with lock:
            active_domains.append(f"{sub}.{host}")
        

def get_args():

    parser = optparse.OptionParser()

    parser.add_option('-d', '--main', dest='domain_name',
                        help='The domain name', metavar='DOMAIN_NAME')
    parser.add_option("-i", "--input", dest="input_list",
                  help="read the list from INPUT_FILE", metavar="INPUT_FILE")
    
    parser.add_option("-f", "--file", dest="output_file", default="",
                  help="write report to FILE", metavar="FILE")
    
    parser.add_option("-t", "--threads", type=int, dest='n_threads', help="Set the number of threads", metavar="N_THREADS", default=12)
    return parser.parse_args()

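def get_active():
    # Worker loop: runs until the main thread exits (the threads are daemons)
    while True:
        i = q.get()  # blocks until a subdomain is available
        try:
            append_if_exists(domain_name, i)
            bar.next()
        finally:
            # Always mark the item as done, otherwise q.join() would never return
            q.task_done()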


if __name__ == "__main__":
    

    options, args = get_args()
    for s in from_file(options.input_list):
        q.put(s)
    

    bar = Bar("Subdomain scanning...", max=q.qsize())
    domain_name = options.domain_name


    try:
        pre_time = time.time()
        
        for i in range(options.n_threads):
            t = threading.Thread(target=get_active)
            t.daemon = True
            t.start()
            
        q.join()

    
    except KeyboardInterrupt:
        pass
        
    finally:

        if options.output_file:
            with open(options.output_file, 'w') as f:
                f.write("\n".join(active_domains))
        
        else:
            print("\n")
            for e in active_domains:
                print(e)
        
        print(f"\nFound {len(active_domains)} subdomains")
        print("Executed in %s seconds" % (time.time()-pre_time))

Now let’s suppose we call the file main.py. This is how to use it:

python3 main.py -d <MAIN_DOMAIN> -i <SUBDOMAIN_INPUT_FILE> -f <OUTPUT_FILE> -t <THREAD_NUMBER>

# Example:

python3 main.py -d google.com -i subdomains.txt -f output.txt -t 30
