An Uncommon OSINT Way to Find Juicy Files

OSINT via URL Shortener Service(s) and Public Google Docs

One of the most important parts of the pentesting & bug bounty process is performing a solid OSINT phase. In this article, I will cover an uncommon OSINT technique that let me find a few juicy files containing confidential information. The following keywords are the main topics of this article: URL shortener services, Google Docs.

URL Shortener Services

URL shortener services are SaaS products used to create short URLs that redirect you to long, complex ones. They are mostly used for cosmetic purposes and readability. A few examples of shortened URLs can be found below:

bit.ly/{urlcode}
goo.gl/{urlcode}

The problem starts when people use URL shortener services with secret links, such as Google documents containing credentials or confidential financial data. In other words, if we can find which shortened URLs point to potentially confidential documents, we can achieve our goal. Of course, in order to detect such links, we need a list of shortened URLs. In addition, even with such a list in hand, there could be thousands of documents, so we need a way to filter the results. So let's break the problem down into its challenges.

Challenge #1: Getting a List of Shortened URLs

As the name suggests, URL shortener services generate very short, nice-looking URLs. This means that, as long as we know which URL shortener service we want to target, a brute-force attack can be conducted against the {urlcode} part to detect which links are valid. Luckily, there is already a public project that has been running this exact plan against the most common URL shortener services for years: the amazing team known as URLTeam.
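To make the idea concrete, below is a minimal Python sketch of what brute-forcing a shortener's code space looks like. It is only an illustration, not URLTeam's actual tooling: the shortener domain, the alphabet, and the code length are assumptions, real codes are longer, and a real scan has to respect rate limits.

import itertools
import string
import urllib.error
import urllib.request

SHORTENER = "https://bit.ly/"  #assumed target shortener, purely illustrative.
ALPHABET = string.ascii_letters + string.digits
CODE_LENGTH = 3  #kept tiny for the sketch; real codes are longer.

class NoRedirect(urllib.request.HTTPRedirectHandler):
    #Stops urllib from following redirects so the Location header can be read.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)

def resolve(code):
    #Returns the long URL behind a short code, or None if the code is unused.
    req = urllib.request.Request(SHORTENER + code, method="HEAD")
    try:
        opener.open(req, timeout=10)  #a plain 2xx means no redirect happened.
    except urllib.error.HTTPError as e:
        if e.code in (301, 302):  #blocked redirect surfaces as an HTTPError.
            return e.headers.get("Location")
    except Exception:
        pass  #timeouts, connection errors etc.
    return None

for candidate in itertools.product(ALPHABET, repeat=CODE_LENGTH):
    code = "".join(candidate)
    target = resolve(code)
    if target:
        print(code, "->", target)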

Solution #1

https://tracker.archiveteam.org:1338/status

[Screenshot: URLTeam tracker status page]

Moreover, for the sake of further automation, there is a tool on GitHub called urlhunter that retrieves data from URLTeam. urlhunter can be used to find the long versions of URLs containing specific keywords in URLTeam's archives.

By combining the power of urlhunter and URLTeam, we can retrieve a list of Google Docs URLs from URL shortener services.

[Screenshot: urlhunter scan results]

As a result of the scan above, 50249 Google Docs links were found from shortened URLs. The second challenge is how to detect which links from the list contain specific keywords that match our interests.
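Before moving on, urlhunter's raw output can be reduced to a clean, deduplicated list of Google Docs links to feed into the next step. A minimal preprocessing sketch follows; the file names here are illustrative assumptions.

import re

GDOC_PATTERN = re.compile(r"https?://docs\.google\.com/\S+")  #matches docs.google.com links.

def extract_gdoc_links(input_file, output_file):
    #Reads urlhunter output, writes unique docs.google.com URLs, returns their count.
    links = set()
    with open(input_file, "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            links.update(GDOC_PATTERN.findall(line))
    with open(output_file, "w", encoding="utf-8") as f:
        for url in sorted(links):
            f.write(url + "\n")
    return len(links)

if __name__ == "__main__":
    count = extract_gdoc_links("urlhunter_output.txt", "gdoc_urls.txt")
    print("{} unique Google Docs links saved to gdoc_urls.txt".format(count))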

Challenge #2: Detecting URLs with Specific Keywords

To solve this challenge, I've written a simple Python script that sends an HTTP request to each link and checks for keywords in the response body.

Solution #2

from concurrent.futures import ThreadPoolExecutor
from numpy import loadtxt
import argparse
import urllib.request
from tqdm import tqdm

parser = argparse.ArgumentParser(description='''Before using this script, use the urlhunter tool to grab links from shortener services. https://github.com/utkusen/urlhunter ''')
parser.add_argument('--urls', type=str, help='File containing the list of URLs', required=True)
parser.add_argument('--keywords', nargs='+', type=str, help='Keywords to search for. Space separated for multiple keywords', required=True)
parser.add_argument('--output', type=str, help='Output file name', required=True)
parser.add_argument('--threads', type=int, default=50, help='Number of threads. Default 50')
args = parser.parse_args()

#Function to log positive results.
def log_results(url, file_name):
    f = open(file_name, "a")
    f.write(url + "\n")
    f.close()

#Main function that checks urls for specific keywords.
def keyword_check(url, keywords, pbar, file_name):
    try:
        r = urllib.request.urlopen(url, timeout=60) 
        response = r.read().decode('utf-8', errors='ignore') #ignore bytes that are not valid utf-8.
        if any(keyword.lower() in response.lower() for keyword in keywords): #case-insensitive keyword match.
            log_results(url, file_name)
            pbar.update(1) #updates progress bar by incrementing by 1.
            return True
        else:
            pbar.update(1)
            return False
    except Exception: #network errors, timeouts, unreachable documents etc.
        pbar.update(1)
        return False

def main():
    open(args.output, "w").close() #creates (or truncates) the output file.
    urls = loadtxt(args.urls, comments="#", delimiter="\t", unpack=False, dtype=str) #loads the txt into a list.
    with tqdm(total=len(urls)) as pbar: #progress bar.
        with ThreadPoolExecutor(max_workers=int(args.threads)) as executor: #multi threading stuff starts from here.
            [executor.submit(keyword_check, url, args.keywords, pbar, args.output) for url in urls] 

    print("Script completed.\nCheck the the {} file".format(args.output))
    return 
if __name__ == "__main__":
    main()
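
To run the script, assuming it is saved as keyword_checker.py (the file name is arbitrary) and the URL list comes from urlhunter, an invocation could look like this:

python3 keyword_checker.py --urls gdoc_urls.txt --keywords password secret api_key --output hits.txt --threads 50

Matches are appended to the output file as they are found, so partial results survive an interrupted run.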

Results

By combining urlhunter and the script I've written, I was able to find a lot of Google Docs with passwords in them.
