Optimizing Web Servers
Davi Menezes
Lead Cloud Technical Account Manager
AWS Support – Latin America

Different strategies for better performance
• Leverage newer hardware and software.
• Apply more resources through auto scaling.
• Offload the heavy lifting to someone else.
• Optimize the web server stack.

Defining “better” performance
• Throughput: transactions per second (tps).
• Latency reduction.
• Cost reduction.

Optimizations by definition are app-specific
• Test and validate together with the application itself.
• There is no substitute for production data.
• Make it an integral part of the application itself.
  – E.g. Elastic Beanstalk .ebextensions

Identifying Bottlenecks

First understand your workload
• What are we serving?
  – Number of transactions
  – Transaction size
  – Back-end resource consumption
• How much can we do today?
  – Theoretical benchmark: https://youtu.be/7Cyd22kOqWc
  – Actual production load (observability / data-driven)
• What is the bottleneck resource?
  – “Choose instance type for the bounding resource”
  – Workload Analysis vs. Resource Analysis

Avoid tuning things at random

Logs: the ultimate source of truth
119.246.177.166 - - [02/Nov/2014:05:02:00 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
117.21.173.27 - - [02/Nov/2014:06:28:39 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
117.21.225.165 - - [02/Nov/2014:16:36:58 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
50.62.6.117 - - [02/Nov/2014:20:50:39 +0000] "GET //wp-login.php HTTP/1.1" 404 289 "-"
50.62.6.117 - - [02/Nov/2014:20:50:39 +0000] "GET /blog//wp-login.php HTTP/1.1" 404 295 "-"
50.62.6.117 - - [02/Nov/2014:20:50:40 +0000] "GET /wordpress//wp-login.php HTTP/1.1" 404 300 "-"
50.62.6.117 - - [02/Nov/2014:20:50:40 +0000] "GET /wp//wp-login.php HTTP/1.1" 404 293 "-"
24.199.131.50 - - [03/Nov/2014:08:00:30 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
76.10.82.137 - - [03/Nov/2014:08:55:49 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
123.249.19.23 - - [03/Nov/2014:09:15:29 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
117.21.173.27 - - [03/Nov/2014:15:55:25 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
62.210.136.228 - - [03/Nov/2014:22:31:22 +0000] "GET / HTTP/1.1" 403 3839 "-"
24.27.104.175 - - [04/Nov/2014:00:18:18 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
198.20.69.74 - - [04/Nov/2014:02:07:05 +0000] "GET / HTTP/1.1" 403 3839 "-"
198.20.69.74 - - [04/Nov/2014:02:07:13 +0000] "GET /robots.txt HTTP/1.1" 404 287 "-"
181.188.47.118 - - [04/Nov/2014:03:02:56 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
117.21.173.27 - - [04/Nov/2014:09:27:19 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
193.174.89.19 - - [04/Nov/2014:13:34:23 +0000] "GET / HTTP/1.1" 403 3839 "-"

CloudWatch Metric Anatomy
• Statistical aggregation
  – Min
  – Max
  – Sum
  – Average
  – Count
• One data point per minute.
• Can trigger actions via alarms.

Micro metrics vs. Macro metrics
• Agent-based monitoring
• Available in Amazon Linux
• Provides highly granular, server-specific insights
Source: http://demo.munin-monitoring.org/

Coming from a variety of sources
Customer generated
• Kernel and Operating System
• Web Server
• Application Server/Middleware
• Application code
AWS generated
• Amazon CloudFront
• Amazon Elastic Load Balancing
• Amazon CloudWatch
• Amazon Simple Storage Service
• Instance networking

More than meets the eye
[Figure: Latency Histogram. Series: Frequency, Latency at percentile, Average Latency]

Noteworthy AWS CloudWatch metrics
• EC2 Instances
  – New T2 CPU Credits
  – CPU utilization
  – Bandwidth (In/Out)
• EBS
  – PIOPS utilization
  – GP2 utilization
  – Remember: an 8 GB volume will provision 24 IOPS!
• Elastic Load Balancing
  – RequestCount
  – Latency
  – Queue length and spillover
  – Back-end connection errors
• CloudFront
  – Requests
  – BytesDownloaded

Diving Deep on the Last Mile (you & us)

Elastic Load Balancer

ELB Connection Behavior
• No true limits on influx of connections
  – But may require preemptive scaling (aka pre-warming)
• Leverages HTTP Keep-Alives
• Configurable Idle Connection Timeout
• HTTP Session Stickiness & Health-checking
  – Fast Registration
• SSL Off-loading and Back-end authentication

ELB access logs
• Only one side of the picture.
• Can’t log custom headers or format logs.
• Logs are delayed.
• Drill down to instance-level responsiveness, but can’t dive into latency outliers.
[Figure: Processing Time of HTTP log entries. Series: request_processing_time, backend_processing_time, response_processing_time]

ELB Key Metrics
• Latency and Request Count
• Surge Queue and Spillover
• ELB 5xx and 4xx
• Back-end Connection Errors
• Healthy and Unhealthy Host Counts

The life of an HTTP connection
    /* Fragment; needs <arpa/inet.h>, <netinet/in.h>, <sys/socket.h>, <unistd.h> */
    int cfd, fd = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);  /* counts toward the # of open file descriptors */
    struct sockaddr_in si;
    socklen_t len = sizeof si;
    static const char resp[] = "HTTP/1.0 200 OK\r\n\r\n"
                               "Bem-vindo ao AWS Summit SP 2015.";
    si.sin_family = AF_INET;
    inet_aton("127.0.0.1", &si.sin_addr);
    si.sin_port = htons(80);                                  /* http:80 */
    bind(fd, (struct sockaddr *)&si, sizeof si);
    listen(fd, 512);                                          /* 512 = accept backlog */
    while ((cfd = accept(fd, (struct sockaddr *)&si, &len)) != -1) {
        read_request(cfd);  /* read(cfd,...) until "\r\n\r\n" */
        write(cfd, resp, sizeof resp - 1);                    /* status line first, then the body */
        close(cfd);
    }

The last TCP mile
• Accept Pending Queue
  – man listen(2): “(…) backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow.”
  – Recv-Q & Send-Q
  – TCP is stream oriented
• man accept(2): Blocking vs. Non-blocking sockets

Tweaking the TCP stack (aka sysctl)

Queuing at the TCP layer first
• ECONNREFUSED
  – man listen(2): “if the underlying protocol supports retransmission, the request may be ignored so that a later reattempt at connection succeeds”
  – aka: TCP Retransmit

Scaling in the Linux Networking Stack
• Connection States
  – man netstat(8)
• Backlog Maximum Length
  – Waiting to be accepted: /proc/sys/net/core/somaxconn
  – Half-open connections: /proc/sys/net/ipv4/tcp_max_syn_backlog
  – CPU's input packet queue: /proc/sys/net/core/netdev_max_backlog

TCP is a Window based protocol
• TCP Receive Window: “considered one of the most important TCP tweaks” (ugh!)
  – BDP (bytes) = available bandwidth (KBps) × RTT (ms)
• Choose an EC2 instance with proper bandwidth

TCP Initial Congestion Window
• RFC 3390 – Higher Initial Window
    +/* TCP initial congestion window */
    +#define TCP_INIT_CWND 10
  http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=(…)
  committed to kernel 2.6.39 (May 2011)
  – ip route (…) initcwnd 10 (kernel <2.6.39)
• Disable slow start after idle (net.ipv4.tcp_slow_start_after_idle = 0)
• Google Research
  – “propose to increase (…) to at least ten segments (about 15KB)”
  – Pub: “An Argument for Increasing TCP's Initial Congestion Window”

TCP Buffers & Memory Utilization
• Buffering
  – Use case: sending/receiving large amounts of data
  – Auto-tunable by the kernel
  – However, it has bounds: min, default, and max.
  – Tune: net.ipv4.tcp_rmem / net.ipv4.tcp_wmem (in bytes)
• Socket demand on page allocation
  – Tune: net.ipv4.tcp_mem (in pages)

inet_timewait_death_row

About TIME-WAIT state
“The TIME_WAIT state is our friend and is there to help us (i.e., to let old duplicate segments expire in the network). Instead of trying to avoid the state, we should understand it.”
Vincent Bernat (vincent.bernat.im)
• TIME-WAIT Assassination RFC
• Increase your port range
  – net.ipv4.ip_local_port_range
  – A ballpark of your rate of connections per second: (number of ports in ip_local_port_range / tcp_fin_timeout) leads to about 500 connections per second!

Check your sources
XKCD: Duty Calls - https://xkcd.com/386/

TL;DR: Do *not* enable net.ipv4.tcp_tw_recycle
• Clients behind NAT or a stateful firewall will get dropped.
• 99.99999999% of the time* it should never be enabled.
• Linux’s TCP protocol man page does not recommend it.
* Probably 100%, but there may be a valid case out there.

net.ipv4.tcp_tw_reuse
Makes a safer attempt at freeing sockets in TIME_WAIT state.
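The two back-of-the-envelope formulas from the TCP slides (the bandwidth-delay product and the connections-per-second ballpark) can be sketched as below. This is only an illustration: the function names are hypothetical, the 12500 KBps link and 80 ms RTT are made-up inputs, and the 32768–61000 port range and 60-second timeout are assumed values that should be replaced with what the host actually reports.

```python
"""Back-of-the-envelope helpers for two formulas from the TCP slides."""
from typing import Tuple


def bdp_bytes(bandwidth_kbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product; KB/s times ms conveniently yields bytes."""
    # (KB/s * 1000 B/KB) * (ms / 1000 ms/s) == bandwidth_kbps * rtt_ms
    return bandwidth_kbps * rtt_ms


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the two integers stored in /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high


def connections_per_second(port_range: Tuple[int, int], hold_seconds: int) -> float:
    """Ballpark: usable ports divided by how long each one stays unavailable."""
    low, high = port_range
    return (high - low + 1) / hold_seconds


if __name__ == "__main__":
    # Assumed inputs, not measurements from a real host.
    print("BDP (bytes):", bdp_bytes(12500, 80))
    ports = parse_port_range("32768\t61000\n")
    print("connections/s:", round(connections_per_second(ports, 60)))
```

With the assumed range and timeout the ballpark lands a little under the "about 500" quoted above; widening ip_local_port_range raises it proportionally.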
Customer Story

Architecture
• More than 400k requests per minute
• 100+ EC2 instances in production, distributed across different Availability Zones in Virtual Private Clouds, behind several Elastic Load Balancing load balancers
• RDS clusters, SQS, ElastiCache (Redis), CloudSearch, CloudWatch...
• Managed services let our sysadmins be more productive
[Diagram: groups of API instances and a Mongo node in each of two Availability Zones]

400 errors at the ELB
• An increase in 400 errors was identified at the ELB.
• Together with AWS Enterprise Support, we did a deep dive into the ELB access logs using Elasticsearch.
• We found that the events correlated with mobile users on carriers that used NAT for their 3G connections.
• Packet traces taken with tcpdump revealed that connections were being silently dropped.

Analysis results
• After the analysis, we discovered that our servers had the following configuration:
  – net.ipv4.tcp_tw_recycle & net.ipv4.tcp_tw_reuse enabled
• When recycle is enabled, the kernel tries to make decisions based on the timestamps used by remote hosts. It looks up the last timestamp used by each remote host that has a connection in TIME_WAIT and allows the socket to be reused if the timestamp has increased correctly. If the timestamp used by the host has not increased correctly, the kernel drops the packet.
• Many of our customers connect through carriers that use NAT. With a high rate of traffic arriving from the same IP, the kernel started refusing those connections because of the timestamp inconsistency, resulting in a Bad Request (400) at the ELB.
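A finding like the one above can be caught early by auditing the relevant sysctl values on each host. The sketch below is only an assumption of how such a check might look: `read_sysctl` and `audit` are hypothetical helper names, the check covers nothing beyond the two settings named in this story, and on kernels where the tcp_tw_recycle file does not exist the reader simply returns None.

```python
"""Minimal sketch: flag the TIME_WAIT settings discussed in the customer story."""
from pathlib import Path
from typing import Dict, List, Optional

SETTINGS = ("net.ipv4.tcp_tw_recycle", "net.ipv4.tcp_tw_reuse")


def read_sysctl(name: str, proc_root: Path = Path("/proc/sys")) -> Optional[str]:
    """Return the value of a sysctl key, or None if it cannot be read."""
    path = proc_root.joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


def audit(values: Dict[str, Optional[str]]) -> List[str]:
    """Turn raw sysctl values into human-readable findings."""
    findings = []
    if values.get("net.ipv4.tcp_tw_recycle") == "1":
        findings.append(
            "net.ipv4.tcp_tw_recycle is enabled: connections from clients "
            "behind NAT may be silently dropped"
        )
    if values.get("net.ipv4.tcp_tw_reuse") == "1":
        findings.append("net.ipv4.tcp_tw_reuse is enabled: confirm this is intentional")
    return findings


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in SETTINGS}
    for finding in audit(current) or ["no findings"]:
        print(finding)
```

Keeping `audit` separate from the file reads means the same check can be run against values collected from many hosts, for example from configuration management output.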
Testimonial from Vinicius Garcia (CTO of Easy):
• The help from Enterprise Support was extremely important for us to find the solution to our case.
• If we had not had all the logs and the data we gathered for the analysis, it would have been extremely difficult, and we probably would not have been able to work out what was happening.

Tweaking the Webserver stack

Webservers Tuning 101
• Tune resource consumption
  – Context Switches / CPU
  – Memory Utilization
• Allow your webserver to process enough requests concurrently
  – “Child Processes” / “Max Clients” tunables

The backlog is back, again!
• Keep an eye on the somaxconn limits
• Understand resource utilization by the webserver
  – Process Isolation vs. Blast Radius
  – Avoid Resource Saturation & Starvation

Telling the webserver when to start
• man tcp(7)
  – tcp_defer_accept: the webserver only wakes up when there is data available!
• Reduce the burden on the webserver’s processes
• TCP socket is already established (i.e. no SYN flood)
Nginx
• listen [deferred]
Apache
• AcceptFilter http data
• AcceptFilter https data

Using the Zero-copy pattern
• man sendfile(2): “copying is done within the kernel”
• I.e. no use of User Space
Nginx
• sendfile on
Apache
• EnableSendfile on

HTTP Keep-Alive
Nginx
• keepalive_timeout 75s
• keepalive_requests 100
Apache
• KeepAlive On
• KeepAliveTimeout 5
• MaxKeepAliveRequests 100
Ensure it matches your ELB timeout setting; otherwise… look into your ELB’s 5XX metric

“The small-packet problem”
Flush() (tcp_cork)
• flush() analogy
• The application needs to “uncork” the stream
• sendfile() is a must
• Auto in Apache (+sendfile option)
• Set tcp_nopush to false (off) in NGINX
Nagle’s algo (tcp_nodelay)
• The initial problem: “congestion collapse”
• write() vs. writev()
• Onto the wire asap
• Always On in Apache
• Set tcp_nodelay flag in NGINX

    /* TCP_NODELAY is weaker than TCP_CORK, so that
     * this option on corked socket is remembered, but
     * it is not activated until cork is cleared.
     *
     * However, when TCP_NODELAY is set we make
     * an explicit push, which overrides even TCP_CORK
     * for currently queued segments.
     */

Thanks Chartbeat!
Further details: http://engineering.chartbeat.com/author/justinlintz/

Start w/ Small Wins and keep iterating!

Quick review
• Keep the connection for as long as possible.
• Minimize the latency.
• Increase throughput.
• Most importantly, research what settings make most sense for your environment.

Offload opportunities
• Leverage ELB
  – High-volume connection handling
  – SSL Off-loading
• CloudFront + S3 for static file delivery
  – Tune HTTP responses’ cache headers
• Go Multi-region w/ Route 53 LBR

Last thoughts
• Monitor everything.
• Tune your server to your workload.
• Improvement must be quantifiable.
• Experiment and continuously re-validate!

And most importantly, REMEMBER:

THANK YOU!