Otimizando Servidores Web
Davi Menezes
Lead Cloud Technical Account Manager
AWS Support – Latin America
Different strategies for better performance
•
•
•
•
Leverage newer hardware and software.
Apply more resources through auto scaling.
Offload the heavy lifting to someone else.
Optimize the web server stack.
Defining “better” performance
• Throughput -- transactions per second (tps).
• Latency reduction.
• Cost reduction.
Optimizations by definition are app-specific
• Test and validate together with the application itself.
• There is no substitute to production data.
• Make it an integral part of the application itself.
– E.g. Elastic Beanstalk .ebextensions
Identifying Bottlenecks
First understand your workload
• What are we serving?
– Number of transactions
– Transaction size
– Back-end resource consumption
• How much can we do today?
– Theoretical benchmark
https://youtu.be/7Cyd22kOqWc
– Actual production load (observability / data-driven)
• What is the bottleneck resource?
– “Choose instance type for the bounding resource”
– Workload Analysis vs. Resource Analysis
Avoid tuning finds at random
Logs: the ultimate source of truth
119.246.177.166 - - [02/Nov/2014:05:02:00 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
117.21.173.27 - - [02/Nov/2014:06:28:39 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
117.21.225.165 - - [02/Nov/2014:16:36:58 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
50.62.6.117 - - [02/Nov/2014:20:50:39 +0000] "GET //wp-login.php HTTP/1.1" 404 289 "-"
50.62.6.117 - - [02/Nov/2014:20:50:39 +0000] "GET /blog//wp-login.php HTTP/1.1" 404 295 "-"
50.62.6.117 - - [02/Nov/2014:20:50:40 +0000] "GET /wordpress//wp-login.php HTTP/1.1" 404 300 "-"
50.62.6.117 - - [02/Nov/2014:20:50:40 +0000] "GET /wp//wp-login.php HTTP/1.1" 404 293 "-"
24.199.131.50 - - [03/Nov/2014:08:00:30 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
76.10.82.137 - - [03/Nov/2014:08:55:49 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
123.249.19.23 - - [03/Nov/2014:09:15:29 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
117.21.173.27 - - [03/Nov/2014:15:55:25 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
62.210.136.228 - - [03/Nov/2014:22:31:22 +0000] "GET / HTTP/1.1" 403 3839 "-"
24.27.104.175 - - [04/Nov/2014:00:18:18 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
198.20.69.74 - - [04/Nov/2014:02:07:05 +0000] "GET / HTTP/1.1" 403 3839 "-"
198.20.69.74 - - [04/Nov/2014:02:07:13 +0000] "GET /robots.txt HTTP/1.1" 404 287 "-”
181.188.47.118 - - [04/Nov/2014:03:02:56 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
117.21.173.27 - - [04/Nov/2014:09:27:19 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
193.174.89.19 - - [04/Nov/2014:13:34:23 +0000] "GET / HTTP/1.1" 403 3839 "-"
CloudWatch Metric Anatomy
• Statistical aggregation
–
–
–
–
–
Min
Max
Sum
Average
Count
• One data point per minute.
• Can trigger actions via
alarms.
Micro metrics vs. Macro metrics
• Agent-based monitoring
• Available in
Amazon Linux
• Provides highly-granular,
server-specific insights
Source: http://demo.munin-monitoring.org/
Coming from a variety of sources
Customer generated
AWS generated
•
Kernel and Operating System
•
Amazon CloudFront
•
Web Server
•
Amazon Elastic Load Balancing
•
Application Server/Middleware
•
Amazon CloudWatch
•
Application code
•
Amazon Simple Storage Service
•
Instance networking
More than meet the eyes
Latency Histogram
250
2000
200
1800
1600
1400
150
1200
1000
100
800
600
50
400
200
6
9
12
15
18
21
24
27
30
33
36
39
42
45
48
55
204
207
210
0
Frequency
0
1 6 111621263136414651566166717681869196
Latency at percentile
Average Latency
Noteworthy AWS CloudWatch metrics
• EC2 Instances
– New T2 CPU Credits
– CPU utilization
– Bandwidth (In/Out)
• EBS
– PIOPS utilization
– GP2 utilization
– Remember: 8GB volume
will provision 24 IOPs!
• Elastic Load Balancing
–
–
–
–
RequestCount
Latency
Queue length and spillover
Backend connections errors
• CloudFront
– Requests
– BytesDownloaded
Diving Deep on the Last Mile (you & us)
Elastic Load Balancer
ELB Connection Behavior
• No true limits on influx of connections
– But may require preemptive scaling (aka Pre-warming)
• Leverages HTTP Keep-Alives
• Configurable Idle Connection Timeout
• HTTP Session Stickness & Health-checking
– Fast Registration
• SSL Off-loading and Back-end authentication
ELB access logs
Processing Time
HTTP log entries
30
•
Only one side of picture.
25
•
Can’t log custom headers or
20
format logs.
15
•
Logs are delayed.
10
•
Drill down to instance level
5
responsiveness, but can’t dive
0
into latency outliers
bytes
35
response_processing_time
request_processing_time
backend_processing_time
ELB Key Metrics
• Latency and Request Count
• Surge Queue and Spillover
• ELB 5xx and 4xx
• Back-end Connection Errors
• Healthy and Unhealthy Host Counts
The life of an HTTP connection
http:80
int cfd,fd=socket(PF_INET,SOCK_STREAM,IPPROTO_TCP);
fd=socket(PF_INET,SOCK_STREAM,IPPROTO_TCP)
struct sockaddr_in si;
si.sin_family=PF_INET;
# of open
inet_aton("127.0.0.1",&si.sin_addr);
file descriptors
si.sin_port=htons(80);
bind(fd,(struct sockaddr*)si,sizeof si)
si);
listen(fd,512)
listen(fd,512);
accept(fd,(struct sockaddr*)si,sizeof si))
si) != -1) {
while ((cfd=accept(fd,(struct
read_request(cfd);
/* read(cfd,...) until "\r\n\r\n" */
write(cfd,"200 OK HTTP/1.0\r\n\r\n"
”Bem-vindo ao AWS Summit SP 2015.",19+27);
close(cfd);
}
The last TCP mile
• Accept Pending Queue
– man listen(2): “(…) backlog argument defines the maximum length to which the
queue of pending connections for sockfd may grow.”
– Recv-Q & Send-Q – TCP is stream oriented
• man accept(2): Blocking vs. Non-blocking sockets
Tweaking the TCP stack (aka sysctl)
Queuing at the TCP layer first
• ECONNREFUSED
man listen(2):
“if the underlying protocol supports
retransmission, the request may be ignored
so that a later reattempt at connection
succeeds” – aka: TCP Retransmit
Scaling in the Linux Networking Stack
• Connection States
– man netstat(8)
• Backlog Maximum Length
– Waiting to be accepted: /proc/sys/net/core/somaxconnn
– Half-Open connections: /proc/sys/net/ipv4/tcp_max_syn_backlog
– CPU's input packet queue: /proc/sys/net/core/netdev_max_backlog
TCP is a Window based protocol
• TCP Receive Window
“considered one of the most important TCP tweaks” (ugh!)
– BDP = avail. bandwidth (KBps) X RTT (ms)
• Choose an EC2 Instance
with proper Bandwidth
TCP Initial Congestion Window
• RFC 3390 – Higher Initial Window
+/* TCP initial congestion window */
+#define TCP_INIT_CWND
10
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=(…)
commited to the
kernel 2.6.39 (May 2011)
– ip route (…) initcwnd 10 (kernel <2.6.39)
• Disable Slow Start (net.ipv4.tcp_slow_start_after_idle)
• Google Research
– “propose to increase (…) to at least ten segments (about 15KB)
Pub: “An Argument for Increasing TCP's Initial Congestion Window”
TCP Buffers & Memory Utilization
• Buffering
–
–
–
–
Use case: sending/receiving large amounts of data
Auto-tunable by the kernel
However, has bounds: min, default, and max.
Tune: net.ipv4.tcp_rmem/wmem (in bytes)
• Sockets demand on page allocation
– Tune: net.ipv4.tcp_mem (in pages)
inet_timewait_death_row
About TIME-WAIT state
“The TIME_WAIT state is our friend and is there to help us (i.e., to let old
duplicate segments expire in the network). Instead of trying to avoid the state,
we should understand it.”
Vincent Bernat - (vincent.bernat.im)
• TIME-WAIT Assassination RFC
• Increase your port range
– net.ipv4.ip_local_port_range
– A ballpark of your rate of connections per second:
(ip_local_port_range / tcp_fin_timeout)
leads to about 500 connections per second !
Check your sources
XKCD: Duty Call - https://xkcd.com/386/
TL;DR: Do *not* enable net.ipv4.tcp_tw_recycle
• Clients behind NAT/Stateful FW
• will get dropped
*99.99999999% of time
should never be enabled
Linux’s TCP protocol man page
do not recommend
* Probably 100% but there may be a valid case out there
net.ipv4.tcp_tw_reuse
Makes a safer attempt at freeing sockets in
TIME_WAIT state.
Customer Story
Arquitetura
• Mais de 400k requisições por minuto
API
API
…
API
API
API
…
API
• 100+ instâncias EC2 em produção
distribuídas em diferentes availability
zones em Virtual Private Clouds, diversos
Elastic Load Balancing
• RDS clusters, SQS, ElastiCache (Redis),
CloudSearch, CloudWatch...
Mongo
Availability Zone
Mongo
Availability Zone
• Serviços Gerenciados permitem que
nossos sys admins possam ser mais
produtivos
Erros 400 no ELB
• Identificou-se um aumento de erros 400 no ELB;
• Em conjunto com o suporte enterprise da AWS, realizamos um
Deep dive nos logs de acesso do ELB usando Elasticsearch
• Verificamos que os eventos estavam correlacionados a usuários
mobile de operadoras que usavam NAT em suas conexões 3g;
• Tcpdump para trace de pacotes revelaram que conexões estavam
sendo silenciosamente descartadas;
Resultado das análises
•
Depois das analises descobrimos que estávamos com as configuração abaixo
em nossos servidores
– net.ipv4.tcp_tw_recycle & net.ipv4.tcp_tw_reuse habilitados
•
Quando se ativa recycle, o kernel tenta tomar decisões baseadas no timestamp
usado pelos hosts remotos. Ele tenta achar o último timestamp usado por cada
host remoto que tenham uma conexão em TIME_WAIT, e ira permitir o
reaproveitamento do socket se o timestamp tiver corretamente incrementado,
mas se o timestamp usado pelo host não tiver aumentado corretamente o
pacote será descartado pelo kernel.
•
Muitos de nossos clientes conectam através de operadoras que usam NAT.
Com a alta taxa de acesso entrando do mesmo IP passamos a ter o kernel
recusando essas conexões devido a inconsistência no timestamp, resultando
um Bad Request (400) no ELB.
Testemunho de Vinicius Garcia (CTO da Easy):
• A ajuda do suporte enterprise foi de extrema importância para
encontramos a solução para o nosso caso
• Se não tivéssemos todos os logs e os dados que levantamos
para a análise, teria sido extremamente difícil e
provavelmente não teríamos conseguido chegar a conclusão
do que estava acontecendo.
Tweaking the Webserver stack
Webservers Tuning 101
• Tune resources consumption
– Context Switches / CPU
– Memory Utilization
• Allow your webserver processes enough
requests concurrently
– “Child Processes” / “Max Clients” tunables
The backlog is back, again!
• Keep an eye on the somaxconn limits
• Understand resources utilization by the webserver
– Process Isolation vs. Blast Radius
– Avoid Resources Saturation & Starvation
Telling the webserver when to start
• man tcp(7) – tcp_defer_accept:
Webserver only awakes when there is data available!
• Reduce the burden on the webserver’s process
• TCP Socket is already established (i.e. no SYN flood)
Nginx
Apache
• listen [deferred]
• AcceptFilter http data
• AcceptFilter https data
Using the Zero-copy pattern
• man sendfile(2)
“copying is done within the kernel”
• I.e. no use of User Space
Nginx
Apache
• sendfile on
• EnableSendFile on
HTTP Keep-Alive
Nginx
Apache
• keepalive_timeout 75s
• keepalive_requests 100
• KeepAlive On
• KeepAliveTimeout 5
• MaxKeepAliveRequests 100
Ensure it matches your ELB timeout setting; otherwise…
look into your ELB’s 5XX metric
“The small-packet problem”
Flush() (tcp_cork)
Nagle’s algo (tcp_nodelay)
•
flush() analogy
•
•
The application needs to “uncork”
The initial problem:
“congestion collapse”
•
write() vs. writev()
•
Onto the wire asap
the stream
•
sendfile() is a must
Auto in Apache (+sendfile option)
Set tcp_nopush to false in NGINX
Always On in Apache
Set tcp_nodelay flag in NGINX
“The small-packet problem”
Flush() (tcp_cork)
•
•
•
Nagle’s algo (tcp_nodelay)
TCP_NODELAY is weaker than• TCP_CORK,
so that
The initial problem:
flush()/*analogy
* this option on corked socket is remembered,
but collapse”
“congestion
The application
needs tountil
“uncork”
* it is not activated
cork is cleared.
• write() vs. writev()
the stream
*
• Onto the wire asap
* However,
when TCP_NODELAY is set we make
sendfile()
is a must
* an explicit push, which overrides even TCP_CORK
* for currently queued segments.
Always On in Apache
Auto in */Apache (+sendfile option)
Set tcp_nopush to false in NGINX
Set tcp_nodelay flag in NGINX
Thanks Chartbeat!
Further details: http://engineering.chartbeat.com/author/justinlintz/
Start w/ Small Wins and keep iterating!
Quick review
• Keep the connection for as long as possible.
• Minimize the latency.
• Increase throughput.
• Most importantly, research what settings make
most sense for your environment.
Offload opportunities
• Leverage ELB’s
– Large Volumes Connection Handling
– SSL Off-loading
• CloudFront + S3 for static file delivery
– Tune HTTP responses’ cache headers
• Go Multi-region w/ Route 53 LBR
Last thoughts
•
•
•
•
Monitor everything.
Tune your server to your workload.
Improvement must be quantifiable.
Experiment and continuously re-validate!
And most importantly,
REMEMBER:
Otimizando Servidores Web
Davi Menezes
Cloud Technical Account Manager | AWS Support
OBRIGADO!