When Data Become Radar: Tracing Spammers and Phishers Through the Abuse of the Internet Infrastructure

Klaus Steding-Jessen, CERT.br / NIC.br / CGI.br
Wagner Meira Jr., e-Speed / DCC / UFMG

APWG CeCOS IV, São Paulo, Brazil – May 11–13, 2010

Agenda
• SpamPots Project Objectives
• Architecture Overview
• Mining Spam Campaigns
• Ongoing Work
• Monitoring Phishings and Fraud Abuses
• References

SpamPots Project Objectives
Better understand the abuse of the Internet infrastructure by spammers
• Measure the problem from a different point of view: abuse of infrastructure vs. spam received at the destination
• Help develop spam characterization research
• Measure the abuse of end-user machines to send spam
• Provide data to trusted parties
  – help the constituency to identify infected machines
  – identify malware and scams targeting their constituency
• Use the spam collected to improve antispam filters
• Develop better ways to
  – identify phishing and malware
  – identify botnets via the abuse of open proxies and relays
• Sensors at: AU, AT, BR, CL, NL, TW, US and UY

Architecture Overview
[Figure: architecture diagram with the following components]
• Spammers, bots, malware, etc.
• Honeypots emulating open proxies and open relays
• Data Collection (with storage): collects all data periodically; checks honeypot status
• Data Analysis (with storage): data mining process; generates analyses based on spam content
• Data Warehouse
• Members Portal: statistics; global distribution of spam campaigns; sample e-mails, URLs, etc.

Case Study
• An IP address in Nigeria
• abuses a SOCKS proxy in Brazil
• to connect to an ISP in Germany
• and authenticate with a stolen credential
• in order to send a phishing message to .uk victims
• with a link to a phony Egg bank site
• using a South African domain
• hosted at an IP address allocated to “UK’s largest web hosting company based in Gloucester”

Case Study (cont.)
From: "Egg Bank Plc"<[email protected]>
Subject: Online Banking Secure Message Alert!
Date: Mon, 19 Apr 2010 14:46:29 +0100
X-SMTP-Proto: ESMTPA
X-Ehlo: user
X-Mail-From: [email protected]
X-Rcpt-To: <victim1>@yahoo.co.uk
X-Rcpt-To: <victim2>@yahoo.com
X-Rcpt-To: <victim3>@yahoo.co.uk
X-Rcpt-To: <victim4>@hotmail.co.uk
(...)
X-Rcpt-To: <victimN>@aol.com

Case Study (cont.)
X-Sensor-Dstport: 1080
X-Src-Proto: SOCKS 5
X-Src-IP: 41.155.50.138
X-Src-Hostname: dial-pool50.lg.starcomms.net
X-Src-ASN: 33776
X-Src-OS: unknown
X-Src-RIR: afrinic
X-Src-CC: NG
X-Src-Dnsbl: zen=PBL (Spamhaus)
X-Dst-IP: 195.4.92.9
X-Dst-Hostname: virtual0.mx.freenet.de
X-Dst-ASN: 5430
X-Dst-Dstport: 25
X-Dst-RIR: ripencc
X-Dst-CC: DE

Case Study (cont.)
<table width="561">
<tbody><tr><td><br><font face="Arial" size="2">
You have 1 new Security Message Alert!
<br><br>
Log In into your account to review the new credit limit terms and conditions..<br>
</font><p><font face="Arial" size="2"><br><font face="Arial">
</font></font><font face="Arial"><a rel="nofollow" target="_blank" href="http://www.mosaic.org.za/images/index.html">
Click here to Log In</a></font></p>
<font face="Arial">
</font><font face="Arial" size="2">
</font><p><font face="Arial" size="2"><br><br>
Egg bank Online Service<br>
</font></p>
<font face="Arial" size="2">
</font><hr>
<font face="Arial" size="2">
<font color="999999" size="1">
Egg bank Security Department</font></font></td></tr></tbody></table>

Mining Spam Campaigns

Motivation
• SpamPots collect a huge volume of spam (2 million messages/day)
• How to make sense of all this data?
  – Data mining!
  – Cluster spam messages into spam campaigns to isolate the traffic associated with each spammer
  – Correlate spam campaign attributes to unveil different spamming strategies

The Pattern Tree Approach
• Features are extracted from spam messages (subject, URLs, layout, etc.)
• We organize them hierarchically, inserting the more frequent features at the top levels of the tree
• Campaigns are delimited by sequences of invariants

Data reduction
1. The Pattern Tree grouped 350M spam messages into 60K spam campaigns
2. Obfuscation patterns are naturally discovered!
3. Automatically deals with new and unknown campaign obfuscation techniques
[Figure: graph drawn with Pajek]

Some Findings
Correlation of campaign language, source and target unveils spamming strategies, e.g.:
1. Campaign Source=BR ⇒ Campaign Language=Chinese, Campaign Target=yahoo.com.tw (confidence=87%)
[Figure: graph drawn with Pajek]

Some Findings (2)
1. URLs are the most frequently obfuscated feature in spam; layout remains largely unchanged
2. 10% of spammers abuse both open proxies and open relays in the same campaign
3. Spammers chain open proxies with open relays to conceal their identities over the network
4. Windows machines abuse open proxies; Linux machines abuse open relays

Mining Target Address Lists
1. Spamming IPs can be grouped according to the overlap of their e-mail address lists
2. Complementary to Spam Campaign Analysis
3. Evolution of Spam Campaigns associated with the same address list
[Figure: graph drawn with Pajek; nodes are labeled with spamming IP addresses]

Ongoing Work
1. combining the views provided by different spampots
2. factorial design experiment to determine the effects of spampots’ parameters
3. investigating the connection between bots and open proxies / open relays

Monitoring Phishings and Fraud Abuses

Comparing Brazilian Phishings vs. US Phishings
• Brazilian phishing dataset provided by the University of São Paulo
• US phishing dataset provided by Jose Nazario (Arbor Networks)

Table: Occurrence of phishing indicators in the Brazilian and US phishing datasets

Dataset  # of phishings  IP-based URLs  Nonmatching URLs  URL Redirection  Malicious Attachment  Suspicious Text
BR       9,475           5%             3%                0.5%             9%                    89%
US       4,576           28%            15%               5%               0.1%                  70%

Brazilian phishing is less sophisticated; could user education be highly effective?

Detecting phishing campaigns with spampots
1. we extracted phishing features from phishing datasets
2. incremental tree update algorithm to detect spam/phishing campaigns in real time
[Figure: diagram with boxes labeled “Phishing Datasets” and “Phishing Features”]
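
The slides name the phishing indicators in the table above but do not define how they are computed. The following is only a minimal sketch of how two of them might be extracted from a message body, under stated assumptions: an "IP-based URL" is taken to be a link whose host is an IP literal, and a "nonmatching URL" is taken to be a link whose visible text shows a different host than its href. All names (`host_of`, `is_ip_based`, `is_nonmatching`, `LinkExtractor`, `indicators`) are hypothetical and are not from the SpamPots code.

```python
import ipaddress
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Heuristic (assumption): visible text "looks like" a URL if it contains
# a dotted host name, optionally preceded by http:// or https://.
URL_IN_TEXT = re.compile(r"(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}", re.I)


def host_of(url: str) -> str:
    """Return the lowercase host of a URL, or "" if it cannot be parsed."""
    try:
        return (urlsplit(url.strip()).hostname or "").lower()
    except ValueError:
        return ""


def is_ip_based(url: str) -> bool:
    """True if the URL's host is an IP literal rather than a domain name."""
    try:
        ipaddress.ip_address(host_of(url))
        return True
    except ValueError:
        return False


def is_nonmatching(href: str, text: str) -> bool:
    """True if the link text shows a host that differs from the href host."""
    match = URL_IN_TEXT.search(text)
    if not match:
        return False  # plain text such as "Click here" cannot mismatch
    shown = match.group(0)
    if "://" not in shown:
        shown = "http://" + shown  # give urlsplit a scheme to parse
    return host_of(shown) != host_of(href)


class LinkExtractor(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def indicators(html: str) -> dict:
    """Compute the two assumed indicators for one HTML message body."""
    parser = LinkExtractor()
    parser.feed(html)
    return {
        "ip_based_url": any(is_ip_based(h) for h, _ in parser.links),
        "nonmatching_url": any(is_nonmatching(h, t) for h, t in parser.links),
    }


# Made-up sample using documentation-only address and domain.
sample = '<p><a href="http://192.0.2.7/login">http://www.bank.example/login</a></p>'
print(indicators(sample))
```

Flags of this kind could serve as extra per-message features next to subject, URLs and layout. The remaining indicators in the table (URL redirection, malicious attachment, suspicious text) would need their own definitions, which the slides do not give.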

References
• A Campaign-based Characterization of Spamming Strategies. Pedro H. Calais Guerra, Douglas Pires, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Klaus Steding-Jessen (CEAS ’08)
• Spamming Chains: A New Way of Understanding Spammer Behavior. Pedro H. Calais Guerra, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen (CEAS ’09)
• Spam Miner: A Platform for Detecting and Characterizing Spam Campaigns. Pedro H. Calais Guerra, Douglas Pires, Marco Ribeiro, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen (ACM KDD ’09 demo paper)
• Brazilian Internet Steering Committee – CGI.br
  http://www.cgi.br/
• Computer Emergency Response Team Brazil – CERT.br
  http://www.cert.br/
• Previous presentations about the project
  http://www.cert.br/presentations/
• SpamPots Project white paper (in Portuguese)
  http://www.cert.br/docs/whitepapers/spampots/