http - Node.js Server Timeout Problems (EC2 + Express + PM2) -


I'm relatively new to running production Node.js apps, and I've recently been having problems with my server timing out.

Basically, after a certain amount of usage and time, my Node.js app stops responding to requests. I don't even see routes being fired on my console anymore. It's like the whole thing just comes to a halt, and the HTTP calls from my client (an iPhone running AFNetworking) don't reach the server anymore. If I restart my Node.js app server, everything starts working again, until things inevitably stop again. The app never crashes, it just stops responding to requests.

I'm not getting any errors, and I've made sure to handle and log all DB connection errors, so I'm not sure where to start. I thought it might have something to do with memory leaks, so I installed node-memwatch and set up a listener for memory leaks, but it doesn't get called before my server stops responding to requests.

Any clue as to what might be happening and how I can solve this problem?

Here's my stack:

  • Node.js on an AWS EC2 micro instance (using Express 4.0 + PM2)
  • Database on an AWS RDS volume running MySQL (using node-mysql)
  • Sessions stored with Redis on the same EC2 instance as the Node.js app
  • Clients are iPhones accessing the server via AFNetworking

Once again, no errors are firing from any of the modules mentioned above.

First of all, you need to be a bit more specific about timeouts.

  • TCP timeouts: TCP divides a message into packets, which are sent one by one. The receiver needs to acknowledge having received each packet. If the receiver does not acknowledge a packet within a certain period of time, a TCP retransmission occurs, which means sending the same packet again. If this happens a couple more times, the sender gives up and kills the connection.

  • HTTP timeout: an HTTP client like a browser, or your server while acting as a client (e.g. sending requests to other HTTP servers), can set an arbitrary timeout. If a response is not received within that period of time, it will disconnect and call it a timeout.
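
To make the HTTP timeout case concrete, here is a minimal sketch of a Node client that sets its own timeout, using only the built-in http module. The helper name getWithTimeout and the timeout value you pass are my own assumptions for illustration, not something from your stack.

```javascript
const http = require('http');

// Hypothetical helper (name and shape are assumptions): GET a URL and give up
// if the socket stays idle for `ms` milliseconds.
function getWithTimeout(url, ms) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { timeout: ms }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve({ statusCode: res.statusCode, body }));
      res.on('error', reject);
    });
    // 'timeout' is only a notification; Node does not abort the request for us,
    // so the request has to be destroyed explicitly.
    req.on('timeout', () => {
      req.destroy(new Error(`no response within ${ms} ms`));
    });
    req.on('error', reject);
  });
}
```

Note that this is an inactivity timeout on the socket, not a deadline for the whole request. A server that accepts the connection and then never answers (which is what a stalled Node process looks like from the outside) makes the promise reject with the "no response" error.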

Now, there are many, many possible causes for this... from more trivial to less trivial:

  • Wrong Content-Length calculation: if you send a request with a Content-Length: 20 header, that means "I am going to send you 20 bytes". If you send 19, the other end will wait for the remaining 1. If that takes too long... timeout.

  • Not enough infrastructure: maybe you should assign more machines to your application. If (total load / # of CPU cores) is over 1, or your memory usage is high, your system may be over capacity. However, keep reading...

  • Silent exception: an error was thrown but not logged anywhere. The request never finished processing, leading to the next item.

  • Resource leaks: every request needs to be handled to completion. If you don't do this, the connection will remain open. In addition, the IncomingMessage object (usually called req in Express code) will remain referenced by other objects (e.g. Express itself). Each one of those objects can use a lot of memory.

  • Node event loop starvation: I will get to that at the end.
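
The silent exception and resource leak items can be guarded against with a small wrapper around each handler. This is only a sketch: alwaysRespond is a hypothetical name of mine, not an Express API, and it assumes handlers with the usual (req, res, next) shape.

```javascript
// Hypothetical wrapper (the name is an assumption, not part of Express):
// makes sure a failing handler is logged and the response is still finished.
function alwaysRespond(handler) {
  return function wrapped(req, res, next) {
    const fail = (err) => {
      console.error('request handler failed:', err); // never swallow silently
      if (res.writableEnded) return;                 // response already finished
      if (!res.headersSent) res.statusCode = 500;
      res.end();                                     // release the connection
    };
    try {
      const result = handler(req, res, next);
      // Async handlers return a promise; catch their rejections too.
      if (result && typeof result.then === 'function') result.then(null, fail);
    } catch (err) {
      fail(err);
    }
  };
}
```

With Express you would register a route as something like app.get('/users', alwaysRespond(listUsers)), where listUsers stands for one of your own handlers. It does not help with a handler that neither throws nor ever calls res.end(); those you still have to find by reading the code paths.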


For memory leaks, the symptoms would be: the Node process would be using an increasing amount of memory.
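
A cheap way to watch for that symptom from inside the process is to log process.memoryUsage() at intervals and look for a steady climb. A minimal sketch; the function names and the one-minute interval are arbitrary choices of mine.

```javascript
// Hypothetical helper: the process's current memory use, in megabytes.
function memorySnapshot() {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return { rssMb: toMb(rss), heapUsedMb: toMb(heapUsed), heapTotalMb: toMb(heapTotal) };
}

// Log a snapshot periodically (the default interval is an assumption).
// unref() keeps this timer from holding the process open on shutdown.
function startMemoryLog(intervalMs = 60000) {
  const timer = setInterval(() => {
    console.log(new Date().toISOString(), memorySnapshot());
  }, intervalMs);
  timer.unref();
  return timer;
}
```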

To make things worse, if available memory is low and your server is misconfigured to use swapping, Linux will start moving memory to disk (swapping), which is very I/O and CPU intensive. Servers should not have swapping enabled.

cat /proc/sys/vm/swappiness 

will return the level of swappiness configured in your system (it goes from 0 to 100). You can modify it in a persistent way via /etc/sysctl.conf (requires a restart) or in a volatile way using: sysctl vm.swappiness=10

Once you've established that you have a memory leak, you need to get a core dump and download it for analysis. A way to do that can be found in this other Stack Overflow response: Tools to analyze core dump from Node.js

For connection leaks (you leaked a connection by not handling a request to completion), you would see an increasing number of established connections to your server. You can count the established connections with: netstat -a -p tcp | grep ESTABLISHED | wc -l
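
The same count is also available from inside Node through server.getConnections(), which can be easier to log over time than running netstat by hand. A rough sketch, assuming server is the http.Server returned by app.listen(); the helper names and the 30 second interval are mine.

```javascript
// Hypothetical helper: promise wrapper around server.getConnections().
function countConnections(server) {
  return new Promise((resolve, reject) => {
    server.getConnections((err, count) => (err ? reject(err) : resolve(count)));
  });
}

// Log the number of open connections periodically (interval is an assumption).
// A count that only ever grows suggests requests are not being completed.
function startConnectionLog(server, intervalMs = 30000) {
  const timer = setInterval(() => {
    countConnections(server)
      .then((count) => console.log(new Date().toISOString(), 'open connections:', count))
      .catch((err) => console.error('getConnections failed:', err));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for this log
  return timer;
}
```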

Now, event loop starvation is the worst problem. If you have short-lived code, Node works very well. But if you do CPU-intensive stuff and have a function that keeps the CPU busy for an excessive amount of time... like 50 ms (50 ms of solid, blocking, synchronous CPU time, not asynchronous code taking 50 ms), operations being handled by the event loop, such as processing HTTP requests, start falling behind and eventually time out.
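
You can see the effect in isolation with a few lines of Node. This sketch schedules a 0 ms timer, then blocks synchronously and reports how late the timer actually fired. blockFor and measureTimerDelay are made-up names for the demonstration, and the busy loop stands in for whatever CPU-heavy function your app might have.

```javascript
// Hypothetical helper: keep the CPU busy synchronously for `ms` milliseconds.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* busy wait: nothing else can run meanwhile */ }
}

// Schedule a 0 ms timer, then block. Resolves with how long the timer
// actually took to fire, in milliseconds.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    setTimeout(() => {
      resolve(Number(process.hrtime.bigint() - start) / 1e6);
    }, 0);
    blockFor(blockMs); // the timer callback cannot run until this returns
  });
}

measureTimerDelay(50).then((ms) => {
  console.log(`0 ms timer fired after ${ms.toFixed(1)} ms`);
});
```

An incoming HTTP request is in the same position as that timer: it waits until the synchronous work returns, and under load those waits add up.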

The way to find a CPU bottleneck is using a performance profiler. nodegrind/qcachegrind are my preferred profiling tools, but others prefer flamegraphs and such. However, it can be hard to run a profiler in production. Just take a development server and slam it with requests, aka a load test. There are many tools for this.


Finally, another way to debug the problem is:

env NODE_DEBUG=tls,net node <...arguments for your app>

Node has optional debug statements that are enabled through the NODE_DEBUG environment variable. Setting NODE_DEBUG to tls,net will make Node emit debugging information for the tls and net modules... basically everything being sent or received. If there's a timeout, you will see where it's coming from.

Source: many years of experience maintaining large deployments of Node services.

