Wednesday, May 1, 2019

Python TCP keepalive on HTTP requests

The issue

So a while ago, while working on a project, I encountered an endpoint that required some heavy computation to produce its response; as a result, the response usually took a bit more than 5 minutes.

All the existing examples showcasing the usage of said endpoint were done with curl.

So, my turn comes to consume this endpoint and, lo and behold, the request hangs. Okay, weird. Let's try curl again. Works fine. Python? Nope. Let's check with wget then: no luck.

In my desperation I tried all the Python libraries I had used in the past (requests, the built-in http.client, and aiohttp), obviously setting all the applicable timeouts sky high. Still no luck!

So what's so special about curl? What did it do that both wget and my Python implementations failed to?


Troubleshooting

Desperate times call for desperate measures, I told myself, so I grabbed my trusty strace, brewed a big ol' cup of coffee, and got to work.

The first thing that becomes immediately obvious is that curl does indeed do something different: strace shows curl blocking on epoll, while both wget and my Python solutions block on select. This gives me a first clue that curl does 'something else' (TM) besides just waiting for a response, but it leaves me with not much more to follow up on.

I decide to switch context and go from the lowest possible level (for me at least) to my most high-level approach: replicate the Python solution with insanely high timeouts and monitor its behavior. This yields an interesting result! The target endpoint lets the client specify the number of seconds until the request should time out, but despite my explicitly setting it to 600 seconds (that's 10 minutes), the remote server hasn't sent me (or should I say I haven't gotten ;) ) an explicit timeout for more than 15 minutes. This brings back bad memories... It uncannily resembles the behavior of a firewall that silently drops packets instead of rejecting them; that way the client is never explicitly told it cannot connect and just waits.
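For reference, the high-timeout attempt looked roughly like this. This is a reconstruction: the endpoint URL and the name of the server-side timeout parameter are made up here, only the 600-second value comes from the actual test.

import requests

# reconstruction: the URL and the 'timeout' query parameter are illustrative
response = requests.get(
    "https://remote-server.example/endpoint",
    params={"timeout": 600},  # ask the server itself to give up after 10 minutes
    timeout=900,              # client-side timeout, set even higher
)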

But the reminder that curl worked perfectly quickly snaps this thought out of my mind. Time for another coffee (I could use the break anyway)! While I wait for my coffee to brew I start whining to (ehm... I mean discussing with) a colleague (a SysAdmin) about the issue and my findings. In a heartbeat he suggests a misbehaving firewall. But why? How could curl get through? I still don't get it. It's nearly night and there's a weekend ahead of me, so I call it a day.

Monday morning I find a set of netstat commands waiting for me (one run while curl was hitting said endpoint and one while wget was), along with their results. Son of a female dog!

With curl:

$ netstat -at --timers

Proto Recv-Q Send-Q Local Address           Foreign Address         State       Timer
.....
tcp        0      0 localMachine:lPort        remoteMachine ESTABLISHED keepalive (60/0/0)
.....

With wget:

$ netstat -at --timers

Proto Recv-Q Send-Q Local Address           Foreign Address         State       Timer
.....
tcp        0      0 localMachine:lPort        remoteMachine ESTABLISHED off (0.00/0/0)
.....

So now I know! curl uses a TCP-level keepalive, which means a TCP probe packet is transmitted at a fixed interval regardless of whether there is any actual data to transfer; that is what the keepalive (60/0/0) timer on curl's socket is (next probe due in 60 seconds), while wget's socket has no timer armed at all. So there is indeed a misconfigured firewall somewhere along the way that chops down long-running connections (anything longer than 5 minutes, as I found out with some more troubleshooting) without informing either party that the connection was dropped.
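To make the mechanism concrete, here is a minimal sketch of turning keepalive on for a bare socket (host and port are illustrative); once the option is set, the kernel probes the peer on its own while the connection sits idle. The full HTTP-level version follows below.

import socket

# any long-lived TCP connection will do; host/port are illustrative
s = socket.create_connection(("example.com", 443), timeout=10)
# ask the kernel to send keepalive probes while the connection is idle
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# confirm the option took effect (prints 1)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))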

And now what?

So now you know what kind of problem you have, but this isn't even half the solution. Unfortunately, I could find no reliable way, in any of the HTTP libraries I use in Python, to enable TCP keepalive through their API.

Show me the code!

I took the simplest and most 'core' solution here because it's easier for showcasing the approach; the same logic applies to any other library.

import http.client
import socket

# construct your headers; maybe add a 'Connection: keep-alive' header here
# to discourage the remote server from closing the connection itself
headers = {}
# create your connection object (host is your target server's hostname)
conn = http.client.HTTPSConnection(host, timeout=600)
conn.connect()
# Now you will need to access the socket object of your connection;
# how you access this will vary depending on the library you use
s = conn.sock
# Set the following socket options (feel free to play with the values to find
# what works best for you). Note that TCP_KEEPIDLE, TCP_KEEPINTVL and
# TCP_KEEPCNT are Linux-specific; on macOS the idle time is set through
# socket.TCP_KEEPALIVE instead.
# SO_KEEPALIVE: 1 => enable TCP keepalive on the socket
# TCP_KEEPIDLE: 60 => seconds the connection must sit idle before the first probe
# TCP_KEEPINTVL: 60 => seconds between subsequent keepalive probes
# TCP_KEEPCNT: 100 => unanswered probes before the connection is declared dead
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 100)
conn.request("GET", "/endpoint", body=None, headers=headers)
response = conn.getresponse()
data = response.read()
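If you would rather not fish the socket out of the connection object yourself, one alternative worth knowing about (not the approach used above) is urllib3's socket_options connection keyword, which requests can reach through a custom transport adapter. A sketch, assuming a urllib3-backed requests install; the KeepaliveAdapter name and the URL are illustrative, and the same Linux-specific option caveat applies:

import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection

class KeepaliveAdapter(HTTPAdapter):
    # transport adapter that enables TCP keepalive on every pooled connection
    _socket_options = HTTPConnection.default_socket_options + [
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
        (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60),
        (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60),
        (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 100),
    ]

    def init_poolmanager(self, *args, **kwargs):
        kwargs["socket_options"] = self._socket_options
        super().init_poolmanager(*args, **kwargs)

session = requests.Session()
session.mount("https://", KeepaliveAdapter())
response = session.get("https://remote-server.example/endpoint", timeout=600)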

Thursday, January 17, 2019

[CVE-2018-1000814] Session reuse on aiohttp-session prior to 2.7.0

On Oct 08 2018 a session reuse vulnerability was discovered in aiohttp-session (and disclosed in aiohttp-session#325). It falls under CWE-287 (Improper Authentication) and is caused by the library's reliance on the storage backend for session expiration.

Due to this reliance, storage backends with inherent expiry, like Redis or Memcached, were not vulnerable. On backends that rely solely on the cookie itself for storage, however, like EncryptedCookieStorage and NaClCookieStorage, it is possible for a malicious client to re-submit a cookie carrying the same session data. This effectively provides infinite-lifetime sessions, defeating the purpose of session expiry, which is to minimize the attack window on the most vulnerable part of an application's authentication: the session transfer.
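Conceptually, a fix for pure cookie backends has to make the expiry part of the authenticated cookie payload itself, since the server keeps no state of its own to expire. A rough sketch of the idea (an illustration, not the actual #331 patch):

import time

MAX_AGE = 3600  # illustrative session lifetime in seconds

def create_session_payload(data: dict) -> dict:
    # stamp the creation time into the (encrypted/signed) cookie payload
    return {"created": int(time.time()), "session": data}

def is_session_valid(payload: dict) -> bool:
    # reject replayed cookies whose embedded timestamp is too old,
    # even though the cookie itself still decrypts and verifies fine
    created = payload.get("created")
    return created is not None and (time.time() - created) <= MAX_AGE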

On Oct 13 2018 the vulnerability was patched with aiohttp-session#331 and aiohttp-session v2.7.0 was released.

Finally on Dec 20 2018 this vulnerability was assigned CVE-2018-1000814.

Side effects

The patch applied in #331 does have some side effects: it breaks an implicit behavior of the library, the creation of 'idle sessions'. So if you are not using one of the vulnerable backends and you rely on aiohttp-session for idle sessions, your best bet is to pin your dependency to aiohttp-session==2.6.0 until a fix is released.
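For example, with pip that pin is a one-liner in your requirements file:

# requirements.txt
aiohttp-session==2.6.0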


Other references

Vulndb
NVD/NIST