Hiding in Plain Sight Part Three: Watching the Watchers

The internet is constantly being scanned by various services – from search engine crawlers to security research platforms. While examining a public web server’s logs is one way to observe these scanners, not everyone has access to one. Thankfully, Greynoise.io offers a powerful alternative that lets us peek behind the curtain of internet-wide scanning activity.

Understanding Greynoise

Greynoise operates an extensive network of sensors across the internet that record the unsolicited scanning traffic they receive and analyze it at massive scale. Think of it as a radar system for internet traffic, capable of distinguishing between benign scanners, malicious actors, and everything in between. While its capabilities are vast, today we’ll focus on using it to examine the digital fingerprints of common scanning services.
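To make the idea of sorting scanners by classification concrete, here is a minimal sketch that filters a list of observed sources by classification and by the organisation behind them. The record fields (ip, classification, actor) are hypothetical and only loosely mirror the labels Greynoise shows. They are not Greynoise's export format. The sample addresses come from the documentation-only ranges in RFC 5737, so they are placeholders rather than real scanner data.

```python
from dataclasses import dataclass


# Hypothetical shape of one observed scanning source; field names are illustrative.
@dataclass(frozen=True)
class ScanRecord:
    ip: str
    classification: str  # e.g. "benign", "malicious", "unknown"
    actor: str           # free-text label for the organisation behind the scanner


def filter_records(records, classification, actor_substring=""):
    """Return records with the given classification whose actor label contains
    actor_substring (case-insensitive). An empty substring matches every actor."""
    needle = actor_substring.lower()
    return [
        r for r in records
        if r.classification == classification and needle in r.actor.lower()
    ]


# Placeholder data using documentation-only address ranges (RFC 5737).
records = [
    ScanRecord("192.0.2.10", "benign", "Google"),
    ScanRecord("198.51.100.7", "benign", "Censys"),
    ScanRecord("203.0.113.5", "malicious", "unknown"),
]

print([r.ip for r in filter_records(records, "benign", "google")])
```

The same filter with an empty actor string returns every benign source. This is the kind of view you start from before narrowing in on a single organisation.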

Identifying Legitimate Crawlers

Let’s start with Google’s web crawlers. When we filter Greynoise results for “googlebot” classified as “benign”, we can observe Google’s crawler infrastructure in action. Googlebot operates from consistent IP ranges, which Google publishes, and its user agent strings follow a standardized format, such as “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”.
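Because anyone can copy that user agent string, the IP side is what actually identifies Googlebot. Google documents a verification procedure for its crawlers. First, run a reverse DNS lookup on the source IP and confirm that the name ends in googlebot.com or google.com. Then run a forward lookup on that name and confirm that it returns the original IP. The sketch below pairs that check with a user agent match. The function names are hypothetical, and the suffix list reflects Google's published guidance for Googlebot and its common crawlers. The resolver arguments exist only so the logic can be tested offline. By default they perform real DNS queries.

```python
import ipaddress
import re
import socket

# Matches the "Googlebot/<major>.<minor>" product token carried by Googlebot's
# user agent strings (desktop and smartphone variants both include "Googlebot/2.1").
GOOGLEBOT_UA = re.compile(r"\bGooglebot/\d+\.\d+")

# Hostname suffixes Google documents for its crawlers' reverse DNS names.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def claims_googlebot(user_agent):
    """True if the user agent *claims* to be Googlebot (trivially spoofable)."""
    return GOOGLEBOT_UA.search(user_agent) is not None


def _reverse_dns(ip):
    """PTR lookup; returns None when no hostname can be found."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def _forward_dns(hostname):
    """A/AAAA lookup; returns the set of addresses, or an empty set on failure."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()


def verify_googlebot_ip(ip, reverse=_reverse_dns, forward=_forward_dns):
    """Forward-confirmed reverse DNS: the PTR name must sit under a Google crawler
    domain and must resolve back to the same IP address."""
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(addr) == target for addr in forward(hostname))
```

The forward lookup is the important half. Whoever controls an IP block can set its PTR record to anything, including a googlebot.com name. Only Google can make that name resolve back to the address.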

Censys’ scanning infrastructure shows a different set of characteristics. Its scanning operations use distinct user agent strings that clearly identify who is scanning and why.
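Scanners that identify themselves this way can be tagged by matching a product token in the user agent rather than the full string. The sketch below assumes that Censys' HTTP scanner carries a “CensysInspect/<version>” token, as in the example string used here. Censys may change that string, so check it against what Greynoise actually shows before relying on it. The fingerprint table and function names are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical fingerprint table: scanner label -> regex for the product token its
# user agent is expected to carry. Verify these tokens against traffic you observe.
UA_FINGERPRINTS = {
    "Googlebot": re.compile(r"\bGooglebot/\d+\.\d+"),
    "Censys": re.compile(r"\bCensysInspect/\d+\.\d+"),
}


def tag_user_agent(user_agent: str) -> Optional[str]:
    """Return the first scanner label whose token appears in the user agent, else None."""
    for label, pattern in UA_FINGERPRINTS.items():
        if pattern.search(user_agent):
            return label
    return None


def summarize(user_agents):
    """Count user agents per scanner label; the None key collects unrecognized ones."""
    return Counter(tag_user_agent(ua) for ua in user_agents)


# Example strings: the Censys one is an assumed format, the curl one is arbitrary.
sample = [
    "Mozilla/5.0 (compatible; CensysInspect/1.1; +https://about.censys.io/)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "curl/8.4.0",
]
print(summarize(sample))
```

Matching on the token tolerates version bumps and changes to the surrounding text. A match still only tells you what the client claims to be, because any user agent string can be copied verbatim.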

Greynoise.io serves as an invaluable window into internet scanning activity, allowing us to observe and understand the behavior of various scanning services without running our own infrastructure. When you’re trying to blend in with common traffic, step one is to learn exactly what that traffic looks like.