
Katana: A Powerful Tool for Web Crawling (Quick Guide)



In this article, we will explore how to effectively use the Katana tool to maximize its potential and achieve better results in web application security scanning.

Crawling is a crucial aspect of web application security because it allows you to explore and discover areas of a web application that might not be immediately visible or well-documented.


    What We’re Going to Read

    • What is Katana
    • Features 
    • Installation
    • Usage

    Katana is an open-source web crawling and spidering framework written in Go, developed by ProjectDiscovery. With its speed, simplicity, and performance, Katana stands out as a go-to tool for discovering endpoints, parameters, and hidden areas of a web application. This is essential for web application security audits, as attackers often exploit forgotten or misconfigured endpoints that businesses may overlook.

    Features

    • Fast and fully configurable web crawling: Katana offers high-speed web crawling with extensive customization options to adapt to specific requirements.
    • Standard and Headless mode: It supports both traditional and headless crawling, allowing interaction with dynamic web pages through JavaScript.
    • Active and Passive mode: You can choose between active crawling, which interacts with the web application, or passive crawling, which only observes.
    • JavaScript parsing / crawling: Katana is capable of parsing and crawling JavaScript content, ensuring that dynamic parts of the web application are covered.
    • Customizable automatic form filling: It provides automatic form filling with customizable settings to simulate user input during the crawl.
    • Scope control - Preconfigured field / Regex: Katana allows precise control over the crawl scope using preconfigured fields or regular expressions to target specific parts of the application.
    • Customizable output - Preconfigured fields: The tool enables you to define which data fields are extracted, making the output highly customizable.
    • INPUT - STDIN, URL and LIST: Katana accepts input from standard input, individual URLs, or a list of URLs for bulk processing.
    • OUTPUT - STDOUT, FILE and JSON: It supports output to standard output, files, or JSON format, giving flexibility in how you handle and store results (a combined example follows this list).
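
    To give a feel for how these features combine, here is a minimal sketch. The flag names come from Katana's help output, but note that the JSON output flag is -json on older builds and -jsonl on newer ones, so confirm with katana -h on your version:

    # crawl with JavaScript parsing and automatic form filling
    katana -u https://example.com -jc -aff

    # extract only URLs that carry query parameters and save them to a file
    katana -u https://example.com -f qurl -o results.txt

    # write structured JSON Lines output
    katana -u https://example.com -jsonl -o results.jsonl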

    Installation

    Install using Go:

    go install github.com/projectdiscovery/katana/cmd/katana@latest
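
    Assuming your Go bin directory (typically $HOME/go/bin) is on your PATH, you can confirm the install with:

    katana -version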

    Install using Docker:

    To pull or update the Katana image to the latest tag:

    docker pull projectdiscovery/katana:latest

    To run Katana in standard mode using Docker:

    docker run projectdiscovery/katana:latest -u https://tesla.com

    To run Katana in headless mode using Docker:

    docker run projectdiscovery/katana:latest -u https://tesla.com -system-chrome -headless

    Install on Ubuntu:

    sudo apt update
    sudo snap refresh
    sudo apt install zip curl wget git
    sudo snap install golang --classic
    wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add - 
    sudo sh -c 'echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
    sudo apt update 
    sudo apt install google-chrome-stable
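
    Note that apt-key is deprecated on recent Ubuntu releases, so the key-import step above may warn or fail there; if it does, follow Google's current instructions for adding the Chrome repository. Once the packages are installed, you can confirm the browser that headless mode will use:

    google-chrome --version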

    Usage

    Run the -h flag to display the help menu with all available options:

    katana -h


     There are four different ways to give katana input:

    1. URL input

    katana -u https://example.com

    2. Multiple URL input

    katana -u https://example.com,https://example2.com

    3. List input

    katana -list url_list.txt
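
    The list file is plain text with one URL per line, for example:

    https://example.com
    https://example2.com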

    4. STDIN input (piped)

    echo "https://example.com" | katana
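
    STDIN input makes it easy to chain Katana with other tools. As a sketch, assuming ProjectDiscovery's subfinder and httpx are also installed, you could feed live subdomains straight into the crawler:

    subfinder -d example.com -silent | httpx -silent | katana -silent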

    Crawling Modes

     Standard Mode: Uses Go's HTTP library for fast requests without browser overhead. However, it cannot handle dynamic JavaScript or DOM manipulations, potentially missing post-rendered or asynchronous endpoints. Best for simpler applications:

    katana -u https://tesla.com

    Headless Mode: Runs in a browser context to analyze both raw HTTP responses and JavaScript-rendered content. Ideal for complex applications with DOM manipulation or asynchronous events:

    katana -u https://tesla.com -headless
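
    Headless mode drives a Chromium-based browser under the hood. Two related flags from Katana's help output are worth knowing; as a sketch:

    # use the locally installed Chrome instead of Katana's bundled browser
    katana -u https://tesla.com -headless -system-chrome

    # required when running headless mode as root (for example, in some containers)
    katana -u https://tesla.com -headless -no-sandbox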


    Controlling your scope
    Controlling your scope is important for returning valuable results. Katana has four main ways to control the scope of your crawl (examples follow the list below):
    1. Field-scope
    2. Crawl-scope
    3. Crawl-out-scope
    4. No-scope
    Field-scope (-fs): Limits crawling to specific domain levels.
      • rdn (default): Crawls root domain and subdomains (e.g., *.tesla.com).
      • fqdn: Crawls only the full subdomain (e.g., www.tesla.com).
      • dn: Crawls based on domain keyword (e.g., all URLs with "tesla").

    2. Crawl-scope (-cs): Uses regex to only return URLs that match a pattern (e.g., shop).

    3. Crawl-out-scope (-cos): Excludes URLs that match the given regex pattern (e.g., filter out shop).

    4. No-scope (-ns): Crawls beyond the target domain, including external links it finds (e.g., other domains linked to tesla.com).
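
    Here is a quick sketch of these flags in action (the regex values are purely illustrative):

    # stay on the exact subdomain only
    katana -u https://tesla.com -fs fqdn

    # only return URLs matching the regex "shop"
    katana -u https://tesla.com -cs shop

    # crawl everything except URLs matching "logout"
    katana -u https://tesla.com -cos logout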

    These options allow you to refine the results and control what gets crawled.

    These are some of the most common uses of Katana for web crawling. Beyond these commands, many more options are available; you can explore them in detail on the official ProjectDiscovery blog.
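
    As a closing sketch, several of the options covered above can be combined in a single scan (again, -jsonl may be -json on older builds):

    # crawl a list of targets to depth 5, parse JavaScript, fill forms,
    # stay within the root domain, and save structured output
    katana -list url_list.txt -d 5 -jc -aff -fs rdn -jsonl -o results.jsonl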


    If this information was helpful to you, make sure to bookmark our blog for more amazing content and join our Telegram channel to get the latest updates.

    Want to become a certified hacker and gain hands-on offensive hacking experience from zero to hero?

    Join the Complete Offensive-Hacking Course today to get a special 10% discount.