In this article, we will explore how to effectively use the Katana tool to maximize its potential and achieve better results in web application security scanning.
Crawling is a crucial aspect of web application security because it allows you to explore and discover areas of a web application that might not be immediately visible or well-documented.
What We'll Cover
- What is Katana
- Features
- Installation
- Usage
Katana is an open-source crawling and spidering framework written in Go, developed by ProjectDiscovery. With its speed, simplicity, and performance, Katana stands out as a go-to tool for discovering the endpoints, pages, and parameters of a web application. This is essential for web application security audits, as attackers often exploit forgotten or misconfigured endpoints that businesses overlook.
Features
- Fast and fully configurable web crawling: Katana offers high-speed web crawling with extensive customization options to adapt to specific requirements.
- Standard and Headless mode: It supports both traditional and headless crawling, allowing interaction with dynamic web pages through JavaScript.
- Active and Passive mode: You can choose between active crawling, which interacts with the web application, or passive crawling, which only observes.
- JavaScript parsing / crawling: Katana is capable of parsing and crawling JavaScript content, ensuring that dynamic parts of the web application are covered.
- Customizable automatic form filling: It provides automatic form filling with customizable settings to simulate user input during the crawl.
- Scope control - Preconfigured field / Regex: Katana allows precise control over the crawl scope using preconfigured fields or regular expressions to target specific parts of the application.
- Customizable output - Preconfigured fields: The tool enables you to define which data fields are extracted, making the output highly customizable.
- INPUT - STDIN, URL and LIST: Katana accepts input from standard input, individual URLs, or a list of URLs for bulk processing.
- OUTPUT - STDOUT, FILE and JSON: It supports output to standard output, files, or JSON format, giving flexibility in how you handle and store results.
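Several of these features can be combined in a single run. As a minimal sketch (the flags below come from Katana's help menu, so confirm them against your installed version), the following command crawls with JavaScript parsing (-jc), fills forms automatically (-aff), prints only the url field (-f url), and writes results to a file (-o):
katana -u https://example.com -jc -aff -f url -o results.txt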
Installation
Install using Go:
go install github.com/projectdiscovery/katana/cmd/katana@latest
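Go places the binary in its bin directory (typically $HOME/go/bin); assuming that directory is on your PATH, you can verify the install with the version flag:
katana -version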
Install using Docker:
To install or update the Katana Docker image to the latest tag:
docker pull projectdiscovery/katana:latest
To run Katana in standard mode using Docker:
docker run projectdiscovery/katana:latest -u https://tesla.com
To run Katana in headless mode using Docker:
docker run projectdiscovery/katana:latest -u https://tesla.com -system-chrome -headless
On Ubuntu:
sudo apt update
sudo snap refresh
sudo apt install zip curl wget git
sudo snap install golang --classic
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
sudo apt update
sudo apt install google-chrome-stable
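Note that these commands install only the prerequisites (Go and Google Chrome); Katana itself still needs to be installed and Go's bin directory added to your PATH, for example:
go install github.com/projectdiscovery/katana/cmd/katana@latest
export PATH=$PATH:$(go env GOPATH)/bin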
Usage
Run the -h flag to display the help menu:
katana -h
There are four different ways to give katana input:
1. URL input
katana -u https://example.com
2. Multiple URL input
katana -u https://example.com,https://example2.com
3. List input
katana -list url_list.txt
4. STDIN input (piped)
echo "https://example.com" | katana
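Because Katana reads from STDIN, it chains naturally with other ProjectDiscovery tools. As a sketch, assuming subfinder is installed, you can feed discovered subdomains straight into the crawler:
subfinder -d example.com -silent | katana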
Crawling Mode
Standard Mode: Uses Go's HTTP library for fast requests without browser overhead. However, it cannot handle dynamic JavaScript or DOM manipulations, potentially missing post-rendered or asynchronous endpoints. Best for simpler applications:
katana -u https://tesla.com
Headless Mode: Runs in a browser context to analyze both raw HTTP responses and JavaScript-rendered content. Ideal for complex applications with DOM manipulation or asynchronous events:
katana -u https://tesla.com -headless
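As with the Docker example earlier, you can point headless mode at a locally installed Chrome (-system-chrome) and enable JavaScript parsing (-jc) in the same run; this is a sketch, so confirm the flags with katana -h:
katana -u https://tesla.com -headless -system-chrome -jc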
Scope Control
Katana provides four options for controlling the crawl scope:
- Field-scope (-fs): Sets a predefined scope field, with three possible values:
  - rdn (default): Crawls the root domain and all of its subdomains (e.g., *.tesla.com).
  - fqdn: Crawls only the exact given subdomain (e.g., www.tesla.com).
  - dn: Crawls based on a domain keyword (e.g., all URLs containing "tesla").
- Crawl-scope (-cs): Uses a regex to only return URLs that match a pattern (e.g., shop).
- Crawl-out-scope (-cos): Excludes URLs that match the given regex pattern (e.g., filter out shop).
- No-scope (-ns): Crawls beyond the target domain, following the external links it finds (e.g., other domains linked from tesla.com).
These options let you refine the results and control exactly what gets crawled; the sketch below shows each of them in action.
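Here is a quick sketch of each scope option in practice (the regex value shop is just a placeholder pattern; substitute your own):
katana -u https://tesla.com -fs fqdn
katana -u https://tesla.com -cs shop
katana -u https://tesla.com -cos shop
katana -u https://tesla.com -ns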
These are some of the most common ways to use Katana for web crawling. Beyond these commands, many more options are available, and you can explore them in detail on the official ProjectDiscovery blog.