Role                     Data Source         Collection Site      Collection Period      Data Volume
Growth and size          Network telescope   Merit Network, Inc.  07/18/2016–02/28/2017  370B packets, avg. 269K IPs/min
Device composition       Active scanning     Censys               07/19/2016–02/28/2017  136 IPv4 scans, 5 protocols
Ownership & evolution    Telnet honeypots    AWS EC2              11/02/2016–02/28/2017  141 binaries
Ownership & evolution    Telnet honeypots    Akamai               11/10/2016–02/13/2017  293 binaries
Ownership & evolution    Malware repository  VirusTotal           05/24/2016–01/30/2017  594 binaries
Ownership & evolution    DNS - active        Georgia Tech         08/01/2016–02/28/2017  290M RRs/day
Ownership & evolution    DNS - passive       Large U.S. ISP       08/01/2016–02/28/2017  209M RRs/day
Attack characterization  C2 milkers          Akamai               09/27/2016–02/28/2017  64.0K attack commands
Attack characterization  DDoS IP addresses   Akamai               09/21/2016             12.3K IP addresses
Attack characterization  DDoS IP addresses   Google Shield        09/25/2016             158.8K IP addresses
Attack characterization  DDoS IP addresses   Dyn                  10/21/2016             107.5K IP addresses
Table 1: Data Sources. We utilized a multitude of data perspectives to empirically analyze the Mirai botnet.

Protocol  Banners    Devices Identified
HTTPS       342,015  271,471 (79.4%)
FTP         318,688  144,322 (45.1%)
Telnet      472,725  103,924 (22.0%)
CWMP        505,977   35,163 (7.0%)
SSH         148,640    8,107 (5.5%)
Total     1,788,045  587,743 (31.5%)
Table 2: Devices Identified. We identified device type, model,
and/or vendor for 31.5% of active scan banners. Protocol banners
varied drastically in device identifiability, with HTTPS
certificates being most descriptive, and SSH prompts being the
least.

during which devices may churn to new IP addresses. Finally,
Censys executes scans for different protocols on different days,
making it difficult to increase label specificity by combining
banners from multiple services. We navigated these constraints by
restricting our analysis to banners that were collected within
twenty minutes of scanning activity (the time period after which
we expire a scan). This small window mitigates the risk of
erroneously associating the banner data of uninfected devices
with Mirai infections due to DHCP churn.
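
The windowing rule above can be illustrated with a minimal sketch. The record layouts (tuples of IP, timestamp, and banner text, plus a per-IP list of scan observation times) and the function name are assumptions made for this example, as is the choice to treat the window as symmetric around a scan observation; only the twenty-minute value comes from the text.

```python
from datetime import datetime, timedelta

# The twenty-minute window stated in the text.
SCAN_EXPIRY = timedelta(minutes=20)


def banners_near_scanning(banners, scan_times, window=SCAN_EXPIRY):
    """Keep banners collected within `window` of scanning activity from the same IP.

    banners:    iterable of (ip, collected_at, banner_text) tuples (hypothetical layout)
    scan_times: dict mapping ip -> list of datetimes at which scanning from
                that ip was observed (hypothetical layout)
    """
    kept = []
    for ip, collected_at, text in banners:
        observed = scan_times.get(ip, [])
        # Assumption: "within" is read as before or after the scan observation.
        if any(abs(collected_at - t) <= window for t in observed):
            kept.append((ip, collected_at, text))
    return kept
```

A banner collected 45 minutes after the last observed scan from an address would be dropped under this rule, since the address may have been reassigned to a different device in the interim.
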

Post-filtering, our dataset included 1.8 million banners
associated with 1.2 million Mirai-infected IP addresses
(Table 2). We had the most samples for CWMP, and the least for
SSH. We caution that devices with open services that are not
closed by Mirai (e.g., HTTPS and FTP) can appear repeatedly in
Censys banner scans during our measurement window (due to churn)
and thus lead to overcounting when compared across protocols. As
such, we intentionally explored protocols in isolation from one
another and limited ourselves to measurements that only consider
relative proportions rather than absolute counts of infected
hosts.
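
As a hedged illustration of reporting relative proportions within each protocol instead of absolute counts, the sketch below normalizes label counts per protocol. The input layout (pairs of protocol and device label) and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def label_shares_by_protocol(labeled_banners):
    """Return {protocol: {label: share}}, with shares summing to 1 per protocol.

    labeled_banners: iterable of (protocol, label) pairs (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for protocol, label in labeled_banners:
        counts[protocol][label] += 1

    shares = {}
    for protocol, counter in counts.items():
        total = sum(counter.values())
        # Normalize within the protocol only; totals are never mixed across protocols.
        shares[protocol] = {label: n / total for label, n in counter.items()}
    return shares
```

Because each protocol is normalized on its own, a device that reappears in several scans of one protocol inflates that protocol's raw count but does not distort comparisons against another protocol's totals.
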

Finally, we processed each infected device’s banner to identify
the device manufacturer and model. We first applied the set of
regular expressions used by Nmap service probes to fingerprint
devices [58]. Nmap successfully handled 98% of SSH banners and
81% of FTP banners, but matched only 7.8% of the Telnet banners.
In order to increase our coverage and also accommodate HTTPS and
CWMP (which Nmap lacks probes for), we constructed our own
regular expressions to map banners to device manufacturers and
models. Unfortunately, we found that in many cases, there was not
enough data to identify a model and manufacturer from FTP,
Telnet, CWMP, and SSH banners and that Nmap fingerprints only
provide generic descriptions. In total, we identified device type
and/or model and manufacturer for 31.5% of banners (Table 2). We
caution that this methodology is susceptible to misattribution in
instances where port-forwarding and Universal Plug and Play
(UPnP) are used to present multiple devices behind a single IP
address, making the distinction between middlebox and end-device
difficult.
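
The regular-expression labeling described above might look roughly like the following sketch. The rules, vendor names, and model formats are invented for illustration; they are not taken from the paper or from Nmap's probe database.

```python
import re

# Hypothetical rules for illustration only; neither the patterns nor the
# vendor and model names come from the paper or from Nmap.
RULES = [
    (re.compile(r"ExampleCam\s+(?P<model>[A-Z]{2}-\d+)", re.I),
     {"type": "camera", "vendor": "ExampleCam"}),
    (re.compile(r"AcmeRouter(?:/(?P<model>\w+))?", re.I),
     {"type": "router", "vendor": "AcmeRouter"}),
]


def label_banner(banner):
    """Return a label dict for the first matching rule, or None if no rule matches."""
    for pattern, base in RULES:
        match = pattern.search(banner)
        if match:
            label = dict(base)
            # The model stays None when the banner names a vendor but no model.
            label["model"] = match.groupdict().get("model")
            return label
    return None


def coverage(banners):
    """Fraction of banners that received any label (0.0 for an empty input)."""
    banners = list(banners)
    if not banners:
        return 0.0
    return sum(label_banner(b) is not None for b in banners) / len(banners)
```

Rules are tried in order and the first match wins, so more specific patterns would need to precede generic ones. A banner that matches no rule is left unlabeled instead of being guessed at.
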

3.3 Telnet Honeypots
To track the evolution of Mirai’s capabilities, we collected
binaries installed on a set of Telnet honeypots that masqueraded
as vulnerable IoT devices. Mechanically, we presented a BusyBox
shell [92] and an IoT-consistent device banner. Our honeypots
logged all incoming Telnet traffic and downloaded any binaries
that attackers attempted to install on the host via wget or tftp
(the methods of infection found in Mirai’s original source). In
order to avoid collateral damage, we blocked all other outgoing
requests (e.g., scanning and DoS traffic).
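
To make the download-capture step concrete, the sketch below extracts wget and tftp targets from a logged shell input line. It only parses; it does not fetch anything. The function name, the BusyBox-style tftp options it recognizes (`-r`, `-l`, `-b`, with the host as a positional argument), and the handling of chained commands are assumptions made for this example, not a description of the honeypots' actual implementation.

```python
import re
import shlex

URL_RE = re.compile(r"^(?:https?|ftp)://\S+$")


def download_targets(command_line):
    """Return a list of (tool, target) pairs found in one shell input line."""
    targets = []
    # Assumption: commands are chained with ';', '&&' or '||'.
    for part in re.split(r";|&&|\|\|", command_line):
        try:
            tokens = shlex.split(part)
        except ValueError:
            continue  # unbalanced quotes; skip rather than guess
        if not tokens:
            continue
        tool = tokens[0].rsplit("/", 1)[-1]
        # Handle "busybox wget ..." style invocations by shifting one token.
        if tool == "busybox" and len(tokens) > 1:
            tokens = tokens[1:]
            tool = tokens[0]
        if tool == "wget":
            targets += [("wget", t) for t in tokens[1:] if URL_RE.match(t)]
        elif tool == "tftp":
            remote, host = None, None
            args = iter(tokens[1:])
            for tok in args:
                if tok in ("-r", "-l", "-b"):
                    value = next(args, None)  # these options take an argument
                    if tok == "-r":
                        remote = value
                elif not tok.startswith("-") and host is None:
                    host = tok
            if remote and host:
                targets.append(("tftp", f"{host}/{remote}"))
    return targets
```

Combined short flags (for example `-gr`) are not handled here; a production parser would need to cover them. Keeping extraction separate from fetching makes it straightforward to allow only the resulting downloads while blocking all other outbound traffic.
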

We logged 80K connection attempts from 54K IP addresses between
November 2, 2016 and February 28, 2017, collecting a total of 151
unique binaries. We filtered out executables unrelated to Mirai
based on a YARA signature that matched any of the strings from
the original source code release, leaving us with 141 Mirai
binaries.
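
The "match any of the strings" filter can be approximated in plain Python as below. This is a stand-in for a YARA rule, not the signature that was used: the marker strings are placeholders, since the actual strings are not listed in the text, and the function names are hypothetical.

```python
# Placeholder markers; the actual signature strings are not listed in the text.
SIGNATURE_STRINGS = [b"EXAMPLE_MARKER_A", b"EXAMPLE_MARKER_B"]


def matches_signature(blob, strings=SIGNATURE_STRINGS):
    """True if any signature string occurs in the binary's raw bytes."""
    return any(s in blob for s in strings)


def filter_binaries(blobs_by_hash):
    """Keep only matching binaries; blobs_by_hash maps a hash to raw bytes."""
    return {h: b for h, b in blobs_by_hash.items() if matches_signature(b)}
```

A plain substring search like this would miss packed or obfuscated samples whose strings are not present in cleartext, which is a limitation of any string-based signature.
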
We supplemented this data with 293 binaries observed by
honeypots operated by Akamai, which served a similar
purpose to ours, but were hosted on a different public
1096 26th USENIX Security Symposium
USENIX Association