<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>Mina Naguib</title>
		<description>Mina Naguib's Blog</description>
		<link>http://mina.naguib.ca</link>
		<atom:link href="http://mina.naguib.ca/feed.xml" rel="self" type="application/rss+xml" />
		
			<item>
				<title>The little ssh that (sometimes) couldn't</title>
				<description>&lt;h3&gt;Preface&lt;/h3&gt;
&lt;p&gt;This is a technical article chronicling one of the most interesting bug hunts I&amp;#8217;ve had the pleasure of chasing down.&lt;/p&gt;
&lt;p&gt;At &lt;a href=&quot;http://adgear.com/&quot;&gt;AdGear Technologies Inc.&lt;/a&gt; where I work, ssh is king.  We use it for management, monitoring, deployments, log file harvesting, even real-time event streaming.  It&amp;#8217;s solid, reliable, has all the predictability of a native unix tool, and just works.&lt;/p&gt;
&lt;p&gt;Until one day, random cron emails started flowing about it not working.&lt;/p&gt;
&lt;h3&gt;The timeout&lt;/h3&gt;
&lt;p&gt;The machines in our London data center were randomly failing to send their log files to our data machines in our Montreal data center.  This job is initiated periodically from cron, and the failure manifested itself as:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;cron emails stating that the ssh was unsuccessful
	&lt;ul&gt;
		&lt;li&gt;Sometimes hangs&lt;/li&gt;
		&lt;li&gt;Sometimes exits with a timeout error&lt;/li&gt;
	&lt;/ul&gt;&lt;/li&gt;
	&lt;li&gt;nagios warnings down the line for in-house sanity checks detecting the missing data in Montreal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We logged into the London machines, manually ran the push command, and it worked successfully.  We brushed it off as temporary network partitions.&lt;/p&gt;
&lt;h3&gt;The timeouts&lt;/h3&gt;
&lt;p&gt;But the failures kept popping up randomly.  Once a day, a couple of times a day, then one Friday morning, several times an hour.  It was clear something was getting worse.  We kept manually pushing the files until we could figure out what the problem was.&lt;/p&gt;
&lt;p&gt;There were 17 hops between London and Montreal.  We built a profile of latency and packet loss for them, and found that a couple were losing 1-3% of packets.  We filed a ticket with our London DC ops to route away from them.&lt;/p&gt;
&lt;p&gt;While London DC ops were verifying the packet loss, we started seeing random timeouts from London to our &lt;span class=&quot;caps&quot;&gt;SECOND&lt;/span&gt; data center in Montreal, and the route to that data center did not share the hops where we had observed the packet loss.  We concluded packet loss was not the main problem, around the same time London DC ops replied saying they were not able to replicate the packet loss or timeouts and that everything looked healthy on their end.&lt;/p&gt;
&lt;h3&gt;The revelation&lt;/h3&gt;
&lt;p&gt;While manually keeping up with failed cron uploads, we noticed an interesting pattern.  A file transfer either succeeded at a high speed, or didn&amp;#8217;t succeed at all and hung/timed out.  There were no instances of a file uploading slowly and finishing successfully.&lt;/p&gt;
&lt;p&gt;Removing the large volume of data from the equation, we were able to recreate the scenario via simple vanilla ssh.  On a London machine an &amp;#8220;ssh mtl-machine&amp;#8221; would either work immediately, or hang and never establish a connection.  Eyebrows started going up.&lt;/p&gt;
&lt;h3&gt;Where the wild packets are&lt;/h3&gt;
&lt;p&gt;We triple-checked the ssh server configs and health in Montreal:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;span class=&quot;caps&quot;&gt;DNS&lt;/span&gt; servers were responding fast&lt;/li&gt;
	&lt;li&gt;&lt;span class=&quot;caps&quot;&gt;DNS&lt;/span&gt; reverse lookup was not enabled&lt;/li&gt;
	&lt;li&gt;Maximum client connections was high enough&lt;/li&gt;
	&lt;li&gt;We were not under attack&lt;/li&gt;
	&lt;li&gt;Bandwidth usage was nowhere near saturation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Besides, even if something was off, we were observing the hangs talking to 2 completely distinct data centers in Montreal.  Furthermore, our other data centers (non-London) were talking happily to Montreal.  Something about London was off.&lt;/p&gt;
&lt;p&gt;We fired up tcpdump and started looking at the packets, both in summary and in captured pcaps loaded into wireshark.  We saw telltale signs of packet loss and retransmission, but it was minimal and not particularly worrisome.&lt;/p&gt;
&lt;p&gt;We then captured full connections from cases where ssh established successfully, and full connections from cases where the ssh connection hung.&lt;/p&gt;
&lt;p&gt;Here&amp;#8217;s what we logically saw when a connection from London to Montreal hung:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Normal &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; handshake&lt;/li&gt;
	&lt;li&gt;Bunch of ssh-specific back and forth, with normal &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; ack packets where they should be&lt;/li&gt;
	&lt;li&gt;A particular packet sent from London and received in Montreal&lt;/li&gt;
	&lt;li&gt;The same packet re-sent (and re-sent, several times) from London and received in Montreal&lt;/li&gt;
	&lt;li&gt;Montreal&amp;#8217;s just not responding to it!&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It didn&amp;#8217;t make sense why Montreal was not responding (hence London re-transmitting it).  The connection was stalled at this point, as the layer 4 protocol was at a stalemate.  More infuriatingly, if you killed the ssh attempt in London and re-launched it immediately, odds were it worked successfully.  When it did, tcpdump showed Montreal receiving the packet and responding to it, and things moved on.&lt;/p&gt;
&lt;p&gt;We enabled verbose debugging (-vvv) on the ssh client in London, and the hang occurred after it logged:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;debug2: kex_parse_kexinit: first_kex_follows 0 
debug2: kex_parse_kexinit: reserved 0 
debug2: mac_setup: found hmac-md5
debug1: kex: server-&amp;gt;client aes128-ctr hmac-md5 none
debug2: mac_setup: found hmac-md5
debug1: kex: client-&amp;gt;server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024&amp;lt;1024&amp;lt;8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Googling &amp;#8220;ssh hang SSH2_MSG_KEX_DH_GEX_GROUP&amp;#8221; turns up many results &amp;#8211; from bad WiFi, to Windows &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; bugs, to buggy routers discarding &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; fragments.  One solution for LANs was to figure out the path&amp;#8217;s &lt;span class=&quot;caps&quot;&gt;MSS&lt;/span&gt; and set that as the &lt;span class=&quot;caps&quot;&gt;MTU&lt;/span&gt; on both ends.&lt;/p&gt;
&lt;p&gt;I kept decrementing the &lt;span class=&quot;caps&quot;&gt;MTU&lt;/span&gt; on a London server down from 1500 &amp;#8211; it didn&amp;#8217;t help until I hit the magic value 576.  At that point, I was no longer able to replicate the ssh hanging behavior.  I had an ssh loop script running, and I could cause timeouts on demand by bringing the &lt;span class=&quot;caps&quot;&gt;MTU&lt;/span&gt; back up to 1500, or make them disappear by setting it to 576.&lt;/p&gt;
&lt;p&gt;Unfortunately these are public ad servers and globally setting the &lt;span class=&quot;caps&quot;&gt;MTU&lt;/span&gt; to 576 won&amp;#8217;t cut it, but the above did suggest that perhaps packet fragmentation or reassembly is broken somewhere.&lt;/p&gt;
&lt;p&gt;Going back to check the received packets with tcpdump, there was no evidence of fragmentation.  The received packet size matched exactly the packet size sent.  If something did fragment the packet at byte 576+, something else reassembled it successfully.&lt;/p&gt;
&lt;h3&gt;Twinkle twinkle little mis-shapen star&lt;/h3&gt;
&lt;p&gt;Digging in some more, I was now looking at full packet dumps (tcpdump -s 0 -X) instead of just the headers.  Comparing that magic packet in instances of ssh success vs ssh hang showed very little difference aside from &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt;/IP header variations.  It was however clear that this was the first packet in the &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; connection that had enough data to cross the 576-byte mark &amp;#8211; all previous packets were much smaller.&lt;/p&gt;
&lt;p&gt;Comparing the same packet, during a hanging instance, as it left London, and as captured in Montreal, something caught my eye.  Something very subtle, and I brushed it off as fatigue (it was late Friday at this point), but sure enough after a few refreshes and comparisons, I wasn&amp;#8217;t imagining things.&lt;/p&gt;
&lt;p&gt;Here&amp;#8217;s the packet as it left London (minus the first few bytes identifying the IP addresses):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;0x0040:  0b7c aecc 1774 b770 ad92 0000 00b7 6563  .|...t.p......ec
0x0050:  6468 2d73 6861 322d 6e69 7374 7032 3536  dh-sha2-nistp256
0x0060:  2c65 6364 682d 7368 6132 2d6e 6973 7470  ,ecdh-sha2-nistp
0x0070:  3338 342c 6563 6468 2d73 6861 322d 6e69  384,ecdh-sha2-ni
0x0080:  7374 7035 3231 2c64 6966 6669 652d 6865  stp521,diffie-he
0x0090:  6c6c 6d61 6e2d 6772 6f75 702d 6578 6368  llman-group-exch
0x00a0:  616e 6765 2d73 6861 3235 362c 6469 6666  ange-sha256,diff
0x00b0:  6965 2d68 656c 6c6d 616e 2d67 726f 7570  ie-hellman-group
0x00c0:  2d65 7863 6861 6e67 652d 7368 6131 2c64  -exchange-sha1,d
0x00d0:  6966 6669 652d 6865 6c6c 6d61 6e2d 6772  iffie-hellman-gr
0x00e0:  6f75 7031 342d 7368 6131 2c64 6966 6669  oup14-sha1,diffi
0x00f0:  652d 6865 6c6c 6d61 6e2d 6772 6f75 7031  e-hellman-group1
0x0100:  2d73 6861 3100 0000 2373 7368 2d72 7361  -sha1...#ssh-rsa
0x0110:  2c73 7368 2d64 7373 2c65 6364 7361 2d73  ,ssh-dss,ecdsa-s
0x0120:  6861 322d 6e69 7374 7032 3536 0000 009d  ha2-nistp256....
0x0130:  6165 7331 3238 2d63 7472 2c61 6573 3139  aes128-ctr,aes19
0x0140:  322d 6374 722c 6165 7332 3536 2d63 7472  2-ctr,aes256-ctr
0x0150:  2c61 7263 666f 7572 3235 362c 6172 6366  ,arcfour256,arcf
0x0160:  6f75 7231 3238 2c61 6573 3132 382d 6362  our128,aes128-cb
0x0170:  632c 3364 6573 2d63 6263 2c62 6c6f 7766  c,3des-cbc,blowf
0x0180:  6973 682d 6362 632c 6361 7374 3132 382d  ish-cbc,cast128-
0x0190:  6362 632c 6165 7331 3932 2d63 6263 2c61  cbc,aes192-cbc,a
0x01a0:  6573 3235 362d 6362 632c 6172 6366 6f75  es256-cbc,arcfou
0x01b0:  722c 7269 6a6e 6461 656c 2d63 6263 406c  r,rijndael-cbc@l
0x01c0:  7973 6174 6f72 2e6c 6975 2e73 6500 0000  ysator.liu.se...
0x01d0:  9d61 6573 3132 382d 6374 722c 6165 7331  .aes128-ctr,aes1
0x01e0:  3932 2d63 7472 2c61 6573 3235 362d 6374  92-ctr,aes256-ct
0x01f0:  722c 6172 6366 6f75 7232 3536 2c61 7263  r,arcfour256,arc
0x0200:  666f 7572 3132 382c 6165 7331 3238 2d63  four128,aes128-c
0x0210:  6263 2c33 6465 732d 6362 632c 626c 6f77  bc,3des-cbc,blow
0x0220:  6669 7368 2d63 6263 2c63 6173 7431 3238  fish-cbc,cast128
0x0230:  2d63 6263 2c61 6573 3139 322d 6362 632c  -cbc,aes192-cbc,
0x0240:  6165 7332 3536 2d63 6263 2c61 7263 666f  aes256-cbc,arcfo
0x0250:  7572 2c72 696a 6e64 6165 6c2d 6362 6340  ur,rijndael-cbc@
0x0260:  6c79 7361 746f 722e 6c69 752e 7365 0000  lysator.liu.se..
0x0270:  00a7 686d 6163 2d6d 6435 2c68 6d61 632d  ..hmac-md5,hmac-
0x0280:  7368 6131 2c75 6d61 632d 3634 406f 7065  sha1,umac-64@ope
0x0290:  6e73 7368 2e63 6f6d 2c68 6d61 632d 7368  nssh.com,hmac-sh
0x02a0:  6132 2d32 3536 2c68 6d61 632d 7368 6132  a2-256,hmac-sha2
0x02b0:  2d32 3536 2d39 362c 686d 6163 2d73 6861  -256-96,hmac-sha
0x02c0:  322d 3531 322c 686d 6163 2d73 6861 322d  2-512,hmac-sha2-
0x02d0:  3531 322d 3936 2c68 6d61 632d 7269 7065  512-96,hmac-ripe
0x02e0:  6d64 3136 302c 686d 6163 2d72 6970 656d  md160,hmac-ripem
0x02f0:  6431 3630 406f 7065 6e73 7368 2e63 6f6d  d160@openssh.com
0x0300:  2c68 6d61 632d 7368 6131 2d39 362c 686d  ,hmac-sha1-96,hm
0x0310:  6163 2d6d 6435 2d39 3600 0000 a768 6d61  ac-md5-96....hma
0x0320:  632d 6d64 352c 686d 6163 2d73 6861 312c  c-md5,hmac-sha1,
0x0330:  756d 6163 2d36 3440 6f70 656e 7373 682e  umac-64@openssh.
0x0340:  636f 6d2c 686d 6163 2d73 6861 322d 3235  com,hmac-sha2-25
0x0350:  362c 686d 6163 2d73 6861 322d 3235 362d  6,hmac-sha2-256-
0x0360:  3936 2c68 6d61 632d 7368 6132 2d35 3132  96,hmac-sha2-512
0x0370:  2c68 6d61 632d 7368 6132 2d35 3132 2d39  ,hmac-sha2-512-9
0x0380:  362c 686d 6163 2d72 6970 656d 6431 3630  6,hmac-ripemd160
0x0390:  2c68 6d61 632d 7269 7065 6d64 3136 3040  ,hmac-ripemd160@
0x03a0:  6f70 656e 7373 682e 636f 6d2c 686d 6163  openssh.com,hmac
0x03b0:  2d73 6861 312d 3936 2c68 6d61 632d 6d64  -sha1-96,hmac-md
0x03c0:  352d 3936 0000 0015 6e6f 6e65 2c7a 6c69  5-96....none,zli
0x03d0:  6240 6f70 656e 7373 682e 636f 6d00 0000  b@openssh.com...
0x03e0:  156e 6f6e 652c 7a6c 6962 406f 7065 6e73  .none,zlib@opens
0x03f0:  7368 2e63 6f6d 0000 0000 0000 0000 0000  sh.com..........
0x0400:  0000 0000 0000 0000 0000 0000            ............
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And here&amp;#8217;s the same packet as it arrived in Montreal:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;0x0040:  0b7c aecc 1774 b770 ad92 0000 00b7 6563  .|...t.p......ec
0x0050:  6468 2d73 6861 322d 6e69 7374 7032 3536  dh-sha2-nistp256
0x0060:  2c65 6364 682d 7368 6132 2d6e 6973 7470  ,ecdh-sha2-nistp
0x0070:  3338 342c 6563 6468 2d73 6861 322d 6e69  384,ecdh-sha2-ni
0x0080:  7374 7035 3231 2c64 6966 6669 652d 6865  stp521,diffie-he
0x0090:  6c6c 6d61 6e2d 6772 6f75 702d 6578 6368  llman-group-exch
0x00a0:  616e 6765 2d73 6861 3235 362c 6469 6666  ange-sha256,diff
0x00b0:  6965 2d68 656c 6c6d 616e 2d67 726f 7570  ie-hellman-group
0x00c0:  2d65 7863 6861 6e67 652d 7368 6131 2c64  -exchange-sha1,d
0x00d0:  6966 6669 652d 6865 6c6c 6d61 6e2d 6772  iffie-hellman-gr
0x00e0:  6f75 7031 342d 7368 6131 2c64 6966 6669  oup14-sha1,diffi
0x00f0:  652d 6865 6c6c 6d61 6e2d 6772 6f75 7031  e-hellman-group1
0x0100:  2d73 6861 3100 0000 2373 7368 2d72 7361  -sha1...#ssh-rsa
0x0110:  2c73 7368 2d64 7373 2c65 6364 7361 2d73  ,ssh-dss,ecdsa-s
0x0120:  6861 322d 6e69 7374 7032 3536 0000 009d  ha2-nistp256....
0x0130:  6165 7331 3238 2d63 7472 2c61 6573 3139  aes128-ctr,aes19
0x0140:  322d 6374 722c 6165 7332 3536 2d63 7472  2-ctr,aes256-ctr
0x0150:  2c61 7263 666f 7572 3235 362c 6172 6366  ,arcfour256,arcf
0x0160:  6f75 7231 3238 2c61 6573 3132 382d 6362  our128,aes128-cb
0x0170:  632c 3364 6573 2d63 6263 2c62 6c6f 7766  c,3des-cbc,blowf
0x0180:  6973 682d 6362 632c 6361 7374 3132 382d  ish-cbc,cast128-
0x0190:  6362 632c 6165 7331 3932 2d63 6263 2c61  cbc,aes192-cbc,a
0x01a0:  6573 3235 362d 6362 632c 6172 6366 6f75  es256-cbc,arcfou
0x01b0:  722c 7269 6a6e 6461 656c 2d63 6263 406c  r,rijndael-cbc@l
0x01c0:  7973 6174 6f72 2e6c 6975 2e73 6500 0000  ysator.liu.se...
0x01d0:  9d61 6573 3132 382d 6374 722c 6165 7331  .aes128-ctr,aes1
0x01e0:  3932 2d63 7472 2c61 6573 3235 362d 6374  92-ctr,aes256-ct
0x01f0:  722c 6172 6366 6f75 7232 3536 2c61 7263  r,arcfour256,arc
0x0200:  666f 7572 3132 382c 6165 7331 3238 2d63  four128,aes128-c
0x0210:  6263 2c33 6465 732d 6362 632c 626c 6f77  bc,3des-cbc,blow
0x0220:  6669 7368 2d63 6263 2c63 6173 7431 3238  fish-cbc,cast128
0x0230:  2d63 6263 2c61 6573 3139 322d 6362 632c  -cbc,aes192-cbc,
0x0240:  6165 7332 3536 2d63 6263 2c61 7263 666f  aes256-cbc,arcfo
0x0250:  7572 2c72 696a 6e64 6165 6c2d 6362 7340  ur,rijndael-cbs@
0x0260:  6c79 7361 746f 722e 6c69 752e 7365 1000  lysator.liu.se..
0x0270:  00a7 686d 6163 2d6d 6435 2c68 6d61 732d  ..hmac-md5,hmas-
0x0280:  7368 6131 2c75 6d61 632d 3634 406f 7065  sha1,umac-64@ope
0x0290:  6e73 7368 2e63 6f6d 2c68 6d61 632d 7368  nssh.com,hmac-sh
0x02a0:  6132 2d32 3536 2c68 6d61 632d 7368 7132  a2-256,hmac-shq2
0x02b0:  2d32 3536 2d39 362c 686d 6163 2d73 7861  -256-96,hmac-sxa
0x02c0:  322d 3531 322c 686d 6163 2d73 6861 322d  2-512,hmac-sha2-
0x02d0:  3531 322d 3936 2c68 6d61 632d 7269 7065  512-96,hmac-ripe
0x02e0:  6d64 3136 302c 686d 6163 2d72 6970 756d  md160,hmac-ripum
0x02f0:  6431 3630 406f 7065 6e73 7368 2e63 7f6d  d160@openssh.c.m
0x0300:  2c68 6d61 632d 7368 6131 2d39 362c 786d  ,hmac-sha1-96,xm
0x0310:  6163 2d6d 6435 2d39 3600 0000 a768 7d61  ac-md5-96....h}a
0x0320:  632d 6d64 352c 686d 6163 2d73 6861 312c  c-md5,hmac-sha1,
0x0330:  756d 6163 2d36 3440 6f70 656e 7373 782e  umac-64@openssx.
0x0340:  636f 6d2c 686d 6163 2d73 6861 322d 3235  com,hmac-sha2-25
0x0350:  362c 686d 6163 2d73 6861 322d 3235 362d  6,hmac-sha2-256-
0x0360:  3936 2c68 6d61 632d 7368 6132 2d35 3132  96,hmac-sha2-512
0x0370:  2c68 6d61 632d 7368 6132 2d35 3132 3d39  ,hmac-sha2-512=9
0x0380:  362c 686d 6163 2d72 6970 656d 6431 3630  6,hmac-ripemd160
0x0390:  2c68 6d61 632d 7269 7065 6d64 3136 3040  ,hmac-ripemd160@
0x03a0:  6f70 656e 7373 682e 636f 6d2c 686d 7163  openssh.com,hmqc
0x03b0:  2d73 6861 312d 3936 2c68 6d61 632d 7d64  -sha1-96,hmac-}d
0x03c0:  352d 3936 0000 0015 6e6f 6e65 2c7a 7c69  5-96....none,z|i
0x03d0:  6240 6f70 656e 7373 682e 636f 6d00 0000  b@openssh.com...
0x03e0:  156e 6f6e 652c 7a6c 6962 406f 7065 6e73  .none,zlib@opens
0x03f0:  7368 2e63 6f6d 0000 0000 0000 0000 0000  sh.com..........
0x0400:  0000 0000 0000 0000 0000 0000            ............
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Did something there catch your eye ?  If not, I don&amp;#8217;t blame you.  Feel free to copy each into a text editor and rapidly switch back-and-forth to see some characters dance.  Here&amp;#8217;s what it looks like when they&amp;#8217;re placed in vimdiff:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/blog/vimdiff_packets.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Well well well. It&amp;#8217;s not packet loss, it&amp;#8217;s packet corruption!  Very subtle, very predictable packet corruption.&lt;/p&gt;
&lt;p&gt;Some interesting notes:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;The lower part of the packet (&amp;lt;576 bytes) is unaffected&lt;/li&gt;
	&lt;li&gt;The affected portion is corrupted on the 15th byte of every 16&lt;/li&gt;
	&lt;li&gt;The corruption is predictable.  At that position, every &amp;#8220;h&amp;#8221; becomes &amp;#8220;x&amp;#8221; and every &amp;#8220;c&amp;#8221; becomes &amp;#8220;s&amp;#8221;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some readers might have already checked &lt;span class=&quot;caps&quot;&gt;ASCII&lt;/span&gt; charts and reached the conclusion:  There&amp;#8217;s a single bit statically stuck at &amp;#8220;1&amp;#8221; somewhere.  Forcing the 4th bit from the left in a byte (the one worth 0x10) to 1 would reliably corrupt the above letters on the left side to the value on the right side.&lt;/p&gt;
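&lt;p&gt;As a rough illustration (a sketch, not something from the original investigation; the name stuck_bit is made up), the following Python forces the 0x10 bit to 1, which is the difference between &amp;#8220;h&amp;#8221; (0x68) and &amp;#8220;x&amp;#8221; (0x78), and reproduces the substitutions seen in the dumps above:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;python&quot;&gt;STUCK_MASK = 0x10  # assumption: the single bit observed stuck at 1


def stuck_bit(text, mask=STUCK_MASK):
    """Return text with the mask bit forced to 1 in every character."""
    return "".join(chr(ord(ch) | mask) for ch in text)


# Letters taken from the corrupted positions in the dumps.
for before in "hcaem":
    print(before, "becomes", stuck_bit(before))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Characters that already have that bit set, such as &amp;#8220;p&amp;#8221; (0x70) or &amp;#8220;s&amp;#8221; (0x73), pass through unchanged, which is why some affected positions show no visible damage.&lt;/p&gt;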
&lt;p&gt;The obvious culprits within our control (&lt;span class=&quot;caps&quot;&gt;NIC&lt;/span&gt; cards, receiving machines) are not suspect due to the pattern of failure observed (several London machines &amp;#8594; several Montreal data centers and machines).  It&amp;#8217;s got to be something upstream and close to London.&lt;/p&gt;
&lt;p&gt;Going back to validate, things started to make sense.  I also noticed a little hint in tcpdump verbose mode (tcp cksum bad) which was missed before.  A Montreal machine receiving this packet discarded it at the kernel level after realizing it&amp;#8217;s corrupt, never passing it to the userland ssh daemon.  London then re-transmitted it, going through the same corruption, getting the same silent treatment.  From ssh and sshd&amp;#8217;s perspective, the connection was at a stalemate.  From tcpdump&amp;#8217;s perspective, there was no loss, and Montreal machines appeared to be just ignoring data.&lt;/p&gt;
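&lt;p&gt;The &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; checksum is a 16-bit ones&amp;#8217; complement sum.  A simplified Python sketch of that arithmetic shows why one forced bit is enough to get a segment rejected.  It covers a bare payload only, whereas the real checksum also covers the &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; header and a pseudo-header, and the function name is illustrative:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;python&quot;&gt;def internet_checksum(data):
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += data[i] * 256 + data[i + 1]  # big-endian 16-bit word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total // 0x10000:
        total = total % 0x10000 + total // 0x10000
    return 0xFFFF - total


original = bytes(32)  # a payload of nulls
corrupted = bytearray(original)
corrupted[14] |= 0x10  # force the stuck bit in the 15th byte
print(hex(internet_checksum(original)), hex(internet_checksum(bytes(corrupted))))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The sender computed its checksum over the clean bytes, so the value carried in the header no longer matches what the receiver computes over the corrupted ones, and the segment is dropped before it reaches sshd.&lt;/p&gt;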
&lt;p&gt;We sent these findings to our London DC ops, and within a few minutes they changed outbound routes dramatically.  The first router hop, and most hops afterwards, were different.  The hanging problem disappeared.&lt;/p&gt;
&lt;p&gt;Late Friday night fixes are nice because you can relax and not carry problems and support staff into the weekend :)&lt;/p&gt;
&lt;h3&gt;Where&amp;#8217;s Waldo&lt;/h3&gt;
&lt;p&gt;Happy that we were no longer suffering from this problem and that our systems had caught up with the backlog, I decided I&amp;#8217;d try my hand at actually finding the device causing the corruption.&lt;/p&gt;
&lt;p&gt;Having the London routes updated to not go through the old path meant that I couldn&amp;#8217;t reproduce the problem easily.  I asked around until I found a friend with a FreeBSD box in Montreal I could use, which was still accessed through the old routes from London.&lt;/p&gt;
&lt;p&gt;Next, I wanted to make sure that the corruption is predictable even without ssh involvement.  This was trivially proven with a few pipes.&lt;/p&gt;
&lt;p&gt;In Montreal:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;nc -l -p 4000 &amp;gt; /dev/null
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then in London:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;cat /dev/zero | nc mtl 4000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Again, accounting for the randomness factor and setting things up in a retry loop, I got a few packets which removed any doubt about the previous conclusions.  Here&amp;#8217;s part of one &amp;#8211; remember that we&amp;#8217;re sending just a stream of nulls:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;0x0210  .....
0x0220  0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0230  0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0240  0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0250  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0260  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0270  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0280  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0290  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x02a0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x02b0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x02c0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x02d0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x02e0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x02f0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0300  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0310  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0320  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0330  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0340  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0350  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0360  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0370  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0380  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x0390  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x03a0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x03b0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x03c0  0000 0000 0000 0000 0000 0000 0000 1000 ................
0x03d0  0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03e0  .....
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the bug replicated, I needed to find a way to isolate which of the 17 hops along that path caused the corruption.  There was simply no way to call up the provider of each cluster to ask them to check their systems.&lt;/p&gt;
&lt;p&gt;I decided pinging each router, incrementally, might be the way to go.  I crafted special &lt;span class=&quot;caps&quot;&gt;ICMP&lt;/span&gt; packets that are large enough to go over the 576 safety margin, and filled entirely with NULLs.  Then pinged the Montreal machine with them from London.&lt;/p&gt;
&lt;p&gt;They came back perfectly normal.  There was no corruption.&lt;/p&gt;
&lt;p&gt;I tried all variations of speed, padding, size &amp;#8211; to no avail.  I simply could not observe corruption in the returned &lt;span class=&quot;caps&quot;&gt;ICMP&lt;/span&gt; ping packets.&lt;/p&gt;
&lt;p&gt;I replaced the netcat pipes with &lt;span class=&quot;caps&quot;&gt;UDP&lt;/span&gt; instead of &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt;.  Again there was no corruption.&lt;/p&gt;
&lt;p&gt;The corruption needed &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; to be reproducible &amp;#8211; and &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; needs 2 cooperating endpoints.  I checked whether any of the routers had an open &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; port I could talk to directly, to no avail.&lt;/p&gt;
&lt;p&gt;It seemed there was no easy way an external party could pinpoint the bad apple. Or was there ?&lt;/p&gt;
&lt;h3&gt;Mirror mirror on the wall&lt;/h3&gt;
&lt;p&gt;To detect whether corruption occurred or not, we need one of these scenarios:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Control over the &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; peer we&amp;#8217;re talking to, so we can inspect the packet at the destination
	&lt;ul&gt;
		&lt;li&gt;Not just in userland, where the packet would not get delivered if the &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; checksum failed, but root + tcpdump to inspect it as it arrives&lt;/li&gt;
	&lt;/ul&gt;&lt;/li&gt;
	&lt;li&gt;A &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; peer that acts as an echo server to mirror back the data it received, so we can inspect it at the sending node and detect corruption there&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It suddenly occurred to me that the second scenario is available to us.  Not per se, but consider this:  In our very first taste of the problem, we observed ssh clients hanging when talking to ssh servers over the corrupting hop.  This is a good passive signal that we can use instead of the active &amp;#8220;echo&amp;#8221; signal.&lt;/p&gt;
&lt;p&gt;&amp;#8230; and there are lots of open ssh servers out there on the internet to help us out.&lt;/p&gt;
&lt;p&gt;We don&amp;#8217;t need actual accounts on these servers &amp;#8211; we just need to kickstart the ssh connection and see if the cipher exchange phase succeeds or hangs (with a reasonable number of retries to account for corruption randomness).&lt;/p&gt;
&lt;p&gt;So this plan was hatched:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Use the wonderful &lt;b&gt;nmap&lt;/b&gt; tool &amp;#8211; specifically &amp;#8211; its &amp;#8220;random IP&amp;#8221; mode &amp;#8211; to make a list of geographically distributed open ssh servers&lt;/li&gt;
	&lt;li&gt;Test each server to determine whether it is:
	&lt;ul&gt;
		&lt;li&gt;Unresponsive/unpredictable/firewalled &amp;#8594; Ignore it&lt;/li&gt;
		&lt;li&gt;Negotiates successfully after being retried N times &amp;#8594; mark as &amp;#8220;good&amp;#8221;&lt;/li&gt;
		&lt;li&gt;Hangs at the telltale negotiation phase when retried N times &amp;#8594; mark as &amp;#8220;bad&amp;#8221;&lt;/li&gt;
	&lt;/ul&gt;&lt;/li&gt;
	&lt;li&gt;For both &amp;#8220;good&amp;#8221; and &amp;#8220;bad&amp;#8221; servers, remember the traceroute to them&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The idea was this:  All servers marked as &amp;#8220;bad&amp;#8221; will share a few hops in their traceroute.  We can then take that set of suspect hops, and subtract from it any that appear in the traceroutes of the &amp;#8220;good&amp;#8221; servers.  Hopefully what&amp;#8217;s left is only one or two.&lt;/p&gt;
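&lt;p&gt;The same set arithmetic can be sketched in a few lines of Python.  This is only an illustration, assuming each traceroute has been reduced to a list of hop addresses and that there is at least one &amp;#8220;bad&amp;#8221; route.  The function name and the addresses (taken from the documentation ranges) are made up:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;python&quot;&gt;def suspect_hops(bad_routes, good_routes):
    """Hops shared by every bad traceroute and absent from all good ones."""
    common_to_bad = set.intersection(*(set(route) for route in bad_routes))
    seen_in_good = set().union(*good_routes)
    return common_to_bad - seen_in_good


# Made-up documentation addresses, not the real hops.
bad = [
    ["192.0.2.1", "198.51.100.7", "203.0.113.9"],
    ["192.0.2.1", "198.51.100.7", "203.0.113.20"],
]
good = [
    ["192.0.2.1", "198.51.100.30", "203.0.113.40"],
]
print(suspect_hops(bad, good))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here the first hop is shared by every route, good and bad, so it is cleared, and only the hop common to the bad routes alone remains.&lt;/p&gt;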
&lt;p&gt;After spending an hour manually doing the above exercise, I stopped to inspect the data.  I had classified 16 servers as &amp;#8220;&lt;span class=&quot;caps&quot;&gt;BAD&lt;/span&gt;&amp;#8221; and 25 servers as &amp;#8220;&lt;span class=&quot;caps&quot;&gt;GOOD&lt;/span&gt;&amp;#8221;.&lt;/p&gt;
&lt;p&gt;The first exercise was to find the list of hops that appear in all the traceroutes of the &amp;#8220;&lt;span class=&quot;caps&quot;&gt;BAD&lt;/span&gt;&amp;#8221; servers.  As I cleaned and trimmed the list, I realized I wouldn&amp;#8217;t even need to get to the &amp;#8220;&lt;span class=&quot;caps&quot;&gt;GOOD&lt;/span&gt;&amp;#8221; list to remove false positives.  Within the &amp;#8220;&lt;span class=&quot;caps&quot;&gt;BAD&lt;/span&gt;&amp;#8221; lists alone, there remained only one hop that was common to all of them.&lt;/p&gt;
&lt;p&gt;For what it&amp;#8217;s worth, it was 2 providers away:  London &amp;#8594; N hops upstream1 &amp;#8594; Y hops upstream2&lt;/p&gt;
&lt;p&gt;It was the first in Y hops of upstream2 &amp;#8211; right at the edge between upstream1 and upstream2, corrupting random &lt;span class=&quot;caps&quot;&gt;TCP&lt;/span&gt; packets, causing many retries, and, depending on the protocol&amp;#8217;s logical back-and-forth, hangs, or reduced transmission rates.  You may have been a telephony provider who suffered dropped calls, a retailer who lost a few customers or sales, the possibilities really are endless.&lt;/p&gt;
&lt;p&gt;I followed up with our London DC ops with the single hop&amp;#8217;s IP address.  Hopefully with their direct relationship with upstream1 they can escalate through there and get it fixed.&lt;/p&gt;
&lt;p&gt;/filed under crazy devops war stories&lt;/p&gt;
&lt;h3&gt;Update&lt;/h3&gt;
&lt;p&gt;Through upstream1, I got confirmation that the hop I pointed out (first in upstream2) had an internal &amp;#8220;management module failure&amp;#8221; which affected &lt;span class=&quot;caps&quot;&gt;BGP&lt;/span&gt; and routing between two internal networks.  It&amp;#8217;s still down (they&amp;#8217;ve routed around it) until they receive a replacement for the faulty module.&lt;/p&gt;
&lt;p&gt;Thanks for the kind words and great comments here on Disqus, Reddit (&lt;a href=&quot;http://www.reddit.com/r/linux/comments/11x7ld/the_little_ssh_that_sometimes_couldnt/&quot;&gt;/r/linux&lt;/a&gt; &amp;amp; &lt;a href=&quot;http://www.reddit.com/r/sysadmin/comments/129bpf/the_little_ssh_that_sometimes_couldnt/&quot;&gt;/r/sysadmin&lt;/a&gt;) and &lt;a href=&quot;http://news.ycombinator.com/item?id=4709438&quot;&gt;hacker news&lt;/a&gt;&lt;/p&gt;</description>
				<pubDate>Mon, 22 Oct 2012 00:00:00 -0400</pubDate>
				<link>http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html</link>
				<guid isPermaLink="true">http://mina.naguib.ca/blog/2012/10/22/the-little-ssh-that-sometimes-couldnt.html</guid>
			</item>
		
			<item>
				<title>Food: Egyptian-style rice</title>
				<description>&lt;h3&gt;About&lt;/h3&gt;
&lt;p&gt;I&amp;#8217;ve been evolving this recipe, and it&amp;#8217;s reached a pretty good point worth sharing.  I recommend it highly as it&amp;#8217;s fairly easy to make and the result is fantastic.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/blog/rice.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Ingredients&lt;/h3&gt;
&lt;ul&gt;
	&lt;li&gt;2 tsp cumin seeds&lt;/li&gt;
	&lt;li&gt;5 cardamom pods&lt;/li&gt;
	&lt;li&gt;4 cloves&lt;/li&gt;
	&lt;li&gt;1 1/2 cinnamon sticks&lt;/li&gt;
	&lt;li&gt;1 bay leaf&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
	&lt;li&gt;1 bunch angelhair pasta&lt;/li&gt;
	&lt;li&gt;1 tbsp butter&lt;/li&gt;
	&lt;li&gt;1 1/2 cups basmati rice&lt;/li&gt;
	&lt;li&gt;1/2 orange&lt;/li&gt;
	&lt;li&gt;4 cups water&lt;/li&gt;
	&lt;li&gt;1/4 cup golden raisins&lt;/li&gt;
	&lt;li&gt;Salt&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
	&lt;li&gt;2 tbsp sliced almonds&lt;/li&gt;
	&lt;li&gt;2 tbsp coriander leaf &amp;#8211; chopped&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Preparation&lt;/h3&gt;
&lt;ol&gt;
	&lt;li&gt;Dry-roast the cumin, cardamom, cloves &amp;amp; 1/2 cinnamon stick, then grind and set aside&lt;/li&gt;
	&lt;li&gt;In the rice pot, break the pasta to 1-inch pieces, dry-roast on medium heat till dark brown&lt;/li&gt;
	&lt;li&gt;Add the butter, then cinnamon stick, bay leaf, and ground spices, fry for a minute&lt;/li&gt;
	&lt;li&gt;Add the rice, stir for a couple of minutes till well coated&lt;/li&gt;
	&lt;li&gt;Squeeze in the juice of the 1/2 orange, then add the water.  Stir, heat on high until it boils, then lower the heat&lt;/li&gt;
	&lt;li&gt;Cook uncovered for 10 minutes, stirring 2-3 times, till the water is all gone.&lt;/li&gt;
	&lt;li&gt;At the 5-minute mark add the raisins, salt to-taste, and zest the 1/2 orange&amp;#8217;s peel&lt;/li&gt;
	&lt;li&gt;In a separate pan, dry-roast the sliced almonds&lt;/li&gt;
	&lt;li&gt;When the rice is done, serve garnished with the roasted almonds and coriander&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Enjoy.&lt;/p&gt;</description>
				<pubDate>Tue, 19 Jul 2011 00:00:00 -0400</pubDate>
				<link>http://mina.naguib.ca/blog/2011/07/19/food-egyptian-rice.html</link>
				<guid isPermaLink="true">http://mina.naguib.ca/blog/2011/07/19/food-egyptian-rice.html</guid>
			</item>
		
			<item>
				<title>Videotron latency and packet loss</title>
				<description>&lt;p style=&quot;font-style: italic; text-align: center;&quot;&gt;Jump to updates: &lt;a href=&quot;#update1&quot;&gt;1&lt;/a&gt; &lt;a href=&quot;#update2&quot;&gt;2&lt;/a&gt;  &lt;a href=&quot;#update3&quot;&gt;3&lt;/a&gt; &lt;a href=&quot;#update4&quot;&gt;4&lt;/a&gt; &lt;a href=&quot;#update5&quot;&gt;5&lt;/a&gt; &lt;a href=&quot;#update6&quot;&gt;6&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;How fast does your &lt;span class=&quot;caps&quot;&gt;ISP&lt;/span&gt; respond to your complaints ?&lt;/h3&gt;
&lt;p&gt;Our &lt;span class=&quot;caps&quot;&gt;ISP&lt;/span&gt;, &lt;a href=&quot;http://videotron.com&quot;&gt;Videotron&lt;/a&gt;, has been working on our problem for:&lt;/p&gt;
&lt;div style=&quot;font-weight: bold;font-size: 1.3em; text-align: center;&quot; class=&quot;flash&quot; id=&quot;countup&quot;&gt;A month&lt;/div&gt;
&lt;h3&gt;Background:&lt;/h3&gt;
&lt;p&gt;Where &lt;a href=&quot;http://bloom-hq.com/&quot;&gt;I work&lt;/a&gt;, our internet connection is via Quebec&amp;#8217;s leading telecommunications company, &lt;a href=&quot;http://videotron.com&quot;&gt;Videotron&lt;/a&gt;.  We&amp;#8217;re subscribed to their &amp;#8220;&lt;span class=&quot;caps&quot;&gt;TGV&lt;/span&gt; 50&amp;#8221; business-class service.&lt;/p&gt;
&lt;p&gt;Weeks ago, I started noticing intermittently bad performance, ranging from trouble with latency-sensitive applications (remote desktop, &lt;span class=&quot;caps&quot;&gt;VOIP&lt;/span&gt;, ssh) to web browser timeouts.&lt;/p&gt;
&lt;p&gt;Because the problem was intermittent, I had trouble getting videotron on the phone while it was happening, so I whipped up a &lt;a href=&quot;https://github.com/minaguib/pingrrd&quot;&gt;small tool&lt;/a&gt; to graph the ping latency and packet loss to the next immediate hop, the videotron gateway.  It then became clear we were suffering from high latency and packet loss, most prominently during business hours and slightly less so at night.&lt;/p&gt;
&lt;p&gt;With enough data, I called videotron and described the problem.  After a couple of technician visits and more phone calls, videotron concluded that there&amp;#8217;s an area-wide problem and it would be fixed soon.  We got a ticket number and I was happy with that early response.&lt;/p&gt;
&lt;p&gt;I suppose &amp;#8220;soon&amp;#8221; is a relative term.  Many more phone calls later (I lost count), and I&amp;#8217;ll let you judge whether you still see the problem or not:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/blog/videotron-1month.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;At the time of this writing (noon March 31, 2011) the connection is so bad I&amp;#8217;ve had to pause and continue writing this post several times:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/blog/videotron-1hour.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For comparison&amp;#8217;s sake, here&amp;#8217;s what my personal videotron connection at home looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/blog/videotron-home-1week.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;And so&amp;#8230;&lt;/h3&gt;
&lt;p&gt;It&amp;#8217;s unfortunate that while the people working at videotron seem genuinely interested in helping while on the phone, the end result is complete mediocrity when it comes to actually delivering on what was discussed (fixing the problem, calling back with status updates, escalating to supervisors, etc.)&lt;/p&gt;
&lt;p&gt;And so, our business genuinely suffers due to these technical problems, and even though we have an open ticket number with Videotron, there seems to be a big mismatch between our expectations of reliable internet service and their willingness to deliver it.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;How has your &lt;span class=&quot;caps&quot;&gt;ISP&lt;/span&gt; been treating you ?&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Updates:&lt;/h3&gt;
&lt;h4 id=&quot;update1&quot;&gt;Thursday, March 31 5pm&lt;/h4&gt;
&lt;p&gt;Videotron called.  Work will be done tonight to fix one problem as well as April 6th to fix another, the sum of which should fix our problems.&lt;/p&gt;
&lt;h4 id=&quot;update2&quot;&gt;Friday, April 1 noon&lt;/h4&gt;
&lt;p&gt;Videotron called.  Confirmed work was done last night but performance is still poor.  Will call back after next Wednesday&amp;#8217;s scheduled work.&lt;/p&gt;
&lt;h4 id=&quot;update3&quot;&gt;Tuesday, April 5 noon&lt;/h4&gt;
&lt;p&gt;Videotron called.  Confirmed some faulty equipment was fixed yesterday, and indeed our experience with service appears to be now going well.  Furthermore the work scheduled for tonight to split the cell is still going ahead so things should improve even more.  Considering issue closed.&lt;/p&gt;
&lt;h4 id=&quot;update4&quot;&gt;Friday, April 8 noon&lt;/h4&gt;
&lt;p&gt;Things were going great (8ms average latency) since Monday.  That&amp;#8217;s 4 whole days of good service.&lt;/p&gt;
&lt;p&gt;However, two videotron techs showed up, unannounced, to perform some more testing and tuning.&lt;/p&gt;
&lt;p&gt;Halfway through their work, they &amp;#8220;tuned&amp;#8221; our connection enough to un-do the Monday fix.  We&amp;#8217;re now back to square one with a barely usable internet connection.  They were unable to restore it to low latency and left.&lt;/p&gt;
&lt;p&gt;Would you like to play &amp;#8220;guess when the videotron techs showed up and broke our connection&amp;#8221; ? Step right up:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/blog/videotron-1day-after-visit.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I have no words to describe the amount of frustration we&amp;#8217;re experiencing dealing with this company.&lt;/p&gt;
&lt;h4 id=&quot;update5&quot;&gt;Monday, April 18&lt;/h4&gt;
&lt;p&gt;Latency has been quite good since last Thursday (April 14th).  Hoping it&amp;#8217;ll remain stable and these problems are now past us.&lt;/p&gt;
&lt;h4 id=&quot;update6&quot;&gt;Thursday, June 16&lt;/h4&gt;
&lt;p&gt;A few temporary hiccups here and there aside, I&amp;#8217;m happy to say it seems videotron has put their act together.  We&amp;#8217;ve been having stable, fast service for a few weeks in a row.  I&amp;#8217;ve also received several courtesy calls to check on things and make sure we&amp;#8217;re satisfied.  Hoping all of this is now behind us.&lt;/p&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
&lt;!--
  var time_from = new Date(2011, 3 - 1, 1).getTime() / 1000;
  var time_to = new Date(2011, 4 - 1, 14, 0, 0).getTime() / 1000;
  var countup = $(&quot;countup&quot;);
  function seconds_to_words(seconds) {
    seconds = Math.floor(seconds);
    var sentence = [];
    var days = Math.floor(seconds / 86400);
    seconds -= (days * 86400);
    if (days === 1) {sentence.push(&quot;1 day&quot;)} else {sentence.push(String(days) + &quot; days&quot;)}
    var hours = Math.floor(seconds / 3600);
    seconds -= (hours * 3600);
    if (hours === 1) {sentence.push(&quot;1 hour&quot;)} else {sentence.push(String(hours) + &quot; hours&quot;)}
    var minutes = Math.floor(seconds / 60);
    seconds -= (minutes * 60);
    if (minutes === 1) {sentence.push(&quot;1 minute&quot;)} else {sentence.push(String(minutes) + &quot; minutes&quot;)}
    if (seconds === 1) {sentence.push(&quot;1 second&quot;)} else {sentence.push(String(seconds) + &quot; seconds&quot;)}
    var endpart = sentence.pop();
    return(sentence.join(&quot;, &quot;) + &quot; and &quot; + endpart);
  }
  function update_countup() {
    var till;
    if (typeof(time_to) == &quot;undefined&quot;) {
      till = (new Date()).getTime() / 1000;
    }
    else {
      till = time_to;
    }
    var elapsed = till - time_from;
    var sentence = seconds_to_words(elapsed);
    countup.update(sentence);
    setTimeout(update_countup, 1000);
  }
  update_countup();
// --&gt;
&lt;/script&gt;</description>
				<pubDate>Thu, 31 Mar 2011 00:00:00 -0400</pubDate>
				<link>http://mina.naguib.ca/blog/2011/03/31/videotron-latency.html</link>
				<guid isPermaLink="true">http://mina.naguib.ca/blog/2011/03/31/videotron-latency.html</guid>
			</item>
		
			<item>
				<title>Optional and name-based arguments in C</title>
				<description>&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;It&amp;#8217;s inevitable that some functions&amp;#8217; signatures become unwieldy.  This is especially true as a project ages, and various developers who were not involved in the initial vision are asked to extend and tweak various existing functionality.&lt;/p&gt;
&lt;p&gt;In a high-level language, it&amp;#8217;s often possible to effect new changes in an existing function without modifying all its consumers.&lt;/p&gt;
&lt;p&gt;In C however, that&amp;#8217;s not the case (unless the function already happens to use va_args).  Once a C function&amp;#8217;s signature changes, all consumers must be modified.  Things get even more complicated if the function is part of a public &lt;span class=&quot;caps&quot;&gt;API&lt;/span&gt; in a library.&lt;/p&gt;
&lt;p&gt;It occurred to me that there are existing features of C that allow us to pre-emptively design functions in a way that future-proofs them in anticipation of these types of changes.  I&amp;#8217;m not advocating that all functions be designed this way; however, for the class of functions that appear to be candidates for future signature expansion, my proposal may provide some ideas.&lt;/p&gt;
&lt;h3&gt;Existing stepping stones&lt;/h3&gt;
&lt;p&gt;The main features we&amp;#8217;ll be putting together for this method are:&lt;/p&gt;
&lt;h4&gt;Calling by-value&lt;/h4&gt;
&lt;p&gt;In C all function parameters and returns are passed by-value, not by reference.  That means the callee receives a copy of the parameters, and cannot influence the caller&amp;#8217;s copy.&lt;/p&gt;
&lt;p&gt;Perhaps for the sake of performance, C developers often assume that the above rule means &amp;#8220;structs must be passed as a pointer&amp;#8221; so that the copy is avoided; however, that is not a requirement.  It&amp;#8217;s perfectly valid in C to call a function passing it a struct (not a pointer to a struct).  The function&amp;#8217;s signature would look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;say_hello&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;person&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The caller would call this function like so:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;person&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;say_hello&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;say_hello&lt;/code&gt; would receive a &lt;span class=&quot;caps&quot;&gt;COPY&lt;/span&gt; of that struct, which means its local &lt;code&gt;p&lt;/code&gt; will be at a different address than &lt;code&gt;bob&lt;/code&gt;.  Each member of &lt;code&gt;p&lt;/code&gt; is initialized with the value of the corresponding member of &lt;code&gt;bob&lt;/code&gt; (a pointer member is copied as a pointer, so both structs then point at the same data).&lt;/p&gt;
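&lt;p&gt;As a minimal sketch of that copy behaviour, assuming a hypothetical two-member layout for &lt;code&gt;struct person&lt;/code&gt; and illustrative function names, a callee that modifies its parameter leaves the caller&amp;#8217;s struct untouched:&lt;/p&gt;

```c
/* Assumed layout, for illustration only. */
struct person {
        const char *name;
        int age;
};

/* p is the callee's own copy; writing to it leaves the caller's struct alone. */
int age_next_year(struct person p) {
        p.age += 1;
        return p.age;
}

/* Hypothetical caller: returns 1 when bob is unchanged after the call. */
int caller_keeps_original(void) {
        struct person bob = {"Bob", 20};
        int next = age_next_year(bob);
        return (next == 21) ? (bob.age == 20) : 0;
}
```

&lt;p&gt;Here &lt;code&gt;caller_keeps_original()&lt;/code&gt; returns 1: the callee sees 21, while &lt;code&gt;bob.age&lt;/code&gt; in the caller is still 20.&lt;/p&gt;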
&lt;h4&gt;Initializing structs&lt;/h4&gt;
&lt;p&gt;Initializing a variable such as an int can be done like so:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Similarly, initializing a struct can be done like so:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;person&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bob&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Bob&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Accounting&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The individual struct members are given, between braces, in the same order as the struct declaration.&lt;/p&gt;
&lt;p&gt;Missing members are allowed:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;person&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bob&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Bob&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In the above case, any members not supplied will be initialized to 0 / &lt;span class=&quot;caps&quot;&gt;NULL&lt;/span&gt;.&lt;/p&gt;
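&lt;p&gt;It&amp;#8217;s also possible to not supply any members, in which case all members will be set to 0 / &lt;span class=&quot;caps&quot;&gt;NULL&lt;/span&gt;.  Note that the empty-brace form below is a &lt;span class=&quot;caps&quot;&gt;GNU&lt;/span&gt; extension that was only standardized in C23; in C99 and C11 the portable spelling is &lt;code&gt;{0}&lt;/code&gt;:&lt;/p&gt;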
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;person&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bob&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Inlined initialization&lt;/h4&gt;
&lt;p&gt;It is tempting to call &lt;code&gt;say_hello&lt;/code&gt;, which accepts a struct, with a bare inline initializer like so, but a brace-enclosed list on its own is not an expression in C and standard compilers reject it:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;n&quot;&gt;say_hello&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Bob&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Accounting&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The initializer has to be prefixed with the parenthesized struct type, which makes it a C99 compound literal (it looks like a cast, but it creates an unnamed struct object):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;n&quot;&gt;say_hello&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;person&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Bob&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Accounting&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;C99 Designated initializers&lt;/h4&gt;
&lt;p&gt;When initializing a struct, you have the option of referencing its individual members by name, instead of by position:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;person&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bob&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Bob&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;age&lt;/span&gt;        &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;department&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Accounting&amp;quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Again, as with position-based initialization, members may be omitted to default them to 0 / &lt;span class=&quot;caps&quot;&gt;NULL&lt;/span&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;person&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bob&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Bob&amp;quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4&gt;Variadic macros&lt;/h4&gt;
&lt;p&gt;While macros need no introduction to any seasoned C programmer, they&amp;#8217;re typically declared to accept a fixed set of arguments.  &lt;a href=&quot;http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html&quot;&gt;Variadic macros&lt;/a&gt; accept a variable number of arguments, which are made available inside the macro body as &lt;code&gt;__VA_ARGS__&lt;/code&gt;.  We&amp;#8217;ll be using them for some final syntactic sugar, but they&amp;#8217;re not strictly necessary.&lt;/p&gt;
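&lt;p&gt;To make the mechanism concrete, here is a small sketch under stated assumptions: &lt;code&gt;SUM&lt;/code&gt; and &lt;code&gt;sum_ints&lt;/code&gt; are hypothetical names invented for this illustration.  The macro forwards whatever it is given into the initializer list of a temporary &lt;code&gt;int&lt;/code&gt; array:&lt;/p&gt;

```c
/* Adds up count ints stored in values. */
int sum_ints(const int *values, int count) {
        int total = 0;
        int i;
        for (i = 0; i != count; i++) {
                total += values[i];
        }
        return total;
}

/* __VA_ARGS__ expands to whatever was passed in place of "...".
   Here it becomes the initializer list of a temporary int array
   (a C99 compound literal); sizeof is used only to count the elements. */
#define SUM(...) \
        sum_ints((int[]){__VA_ARGS__}, \
                 (int)(sizeof((int[]){__VA_ARGS__}) / sizeof(int)))
```

&lt;p&gt;With this, &lt;code&gt;SUM(1, 2, 3)&lt;/code&gt; expands to a call that passes a three-element temporary array and a count of 3, and evaluates to 6.&lt;/p&gt;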
&lt;h3&gt;Putting it all together&lt;/h3&gt;
&lt;p&gt;We&amp;#8217;ll assume we&amp;#8217;re writing a &lt;code&gt;find&lt;/code&gt; function that, given a set of criteria, searches an employee directory and returns matches.&lt;/p&gt;
&lt;p&gt;The implementation of the search and returning the result aren&amp;#8217;t important.  What is important, however, is designing a signature for that function that anticipates inevitable future changes to the supplied search criteria.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;cm&quot;&gt;/* nba.h */&lt;/span&gt;

&lt;span class=&quot;cp&quot;&gt;#ifndef _NBA_H&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define _NBA_H&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_find_args&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min_age&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_age&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define FindArgs(...) ((struct _find_args){__VA_ARGS__})&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_find_args&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define Find(...) (find(FindArgs(__VA_ARGS__)))&lt;/span&gt;

&lt;span class=&quot;cp&quot;&gt;#endif&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The implementation of &lt;code&gt;find&lt;/code&gt; itself, for the purpose of demonstrating consuming the input, looks like so:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#include &amp;quot;nba.h&amp;quot;&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_find_args&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Looking for criteria:&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\t&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Name: [%s]&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;min_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\t&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Min age: [%d]&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;min_age&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\t&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Max age: [%d]&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fa&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_age&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Finally, the consumer consumes it using the &lt;code&gt;Find&lt;/code&gt; macro (notice the capitalization).  Here&amp;#8217;s a battery of consumption cases:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;c&quot;&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;

        &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Albert&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Jane&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;cm&quot;&gt;/* Static input */&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Bob&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;min_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;cm&quot;&gt;/* Dynamic input */&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Mary&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;min_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;cm&quot;&gt;/* Dynamic input in loop */&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;Find&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
                        &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                        &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;min_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                        &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;max_age&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When run, the output is:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;$ ./nba
Looking for criteria:
        Name: [Bob]
        Max age: [20]
Looking for criteria:
        Name: [Mary]
        Min age: [8]
Looking for criteria:
        Name: [Albert]
Looking for criteria:
        Name: [Jane]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3&gt;End result and viability&lt;/h3&gt;
&lt;p&gt;The end result is demonstrated in the &lt;code&gt;main&lt;/code&gt; function above.&lt;/p&gt;
&lt;p&gt;The pros are:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Extremely clear naming for the arguments being given to the function&lt;/li&gt;
	&lt;li&gt;No need to supply arguments that are not needed&lt;/li&gt;
	&lt;li&gt;Ability to add new members to the struct without altering existing code that consumes that function&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The cons are:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;There&amp;#8217;s probably more overhead to copying structs in general compared to copying the same number of elements declared as regular function parameters&lt;/li&gt;
	&lt;li&gt;Missing members defaulting to 0 / &lt;span class=&quot;caps&quot;&gt;NULL&lt;/span&gt; may be logically ambiguous if 0 / &lt;span class=&quot;caps&quot;&gt;NULL&lt;/span&gt; is a logically acceptable explicit value for a given member.  That is why the above example explicitly sets &lt;code&gt;min_age&lt;/code&gt; / &lt;code&gt;max_age&lt;/code&gt; to -1 to indicate non-interest.  Another approach would be to use a secondary member to indicate the desire to &amp;#8220;use&amp;#8221; the first member.  For example, &lt;code&gt;age&lt;/code&gt; coupled with &lt;code&gt;use_age&lt;/code&gt;&lt;/li&gt;
	&lt;li&gt;Variadic macros may not be available everywhere.  This method still works without them, at the cost of more verbosity for the function consumer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So far, the above has just been ideas and experimentation on my part.  It&amp;#8217;s not something I&amp;#8217;ve used in any actual projects I&amp;#8217;m working on.&lt;/p&gt;
&lt;p&gt;If you&amp;#8217;re a C developer I&amp;#8217;d love to hear your thoughts on this approach.&lt;/p&gt;</description>
				<pubDate>Thu, 09 Dec 2010 00:00:00 -0500</pubDate>
				<link>http://mina.naguib.ca/blog/2010/12/09/c-optional-and-name-based-arguments.html</link>
				<guid isPermaLink="true">http://mina.naguib.ca/blog/2010/12/09/c-optional-and-name-based-arguments.html</guid>
			</item>
		
			<item>
				<title>PostgreSQL transactions wrapping child+parent modifications, deadlocks, and ActiveRecord</title>
				<description>&lt;h3&gt;From PostgreSQL&amp;#8217;s perspective&lt;/h3&gt;
&lt;p&gt;To demonstrate the issue, let&amp;#8217;s set up two tables and create a parent record with id 1:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;  &lt;span class=&quot;k&quot;&gt;create&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parents&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;serial&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;primary&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;children_count&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;default&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;create&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;children&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;serial&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;primary&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;parent_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;integer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;references&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parents&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;insert&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;into&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parents&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, let&amp;#8217;s run a transaction that inserts a child under parent 1 and increments that parent&amp;#8217;s &lt;code&gt;children_count&lt;/code&gt; field:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;  &lt;span class=&quot;k&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;insert&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;into&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;children&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parent_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;update&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parents&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;children_count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;children_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;commit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Everything is fine and peachy.  However, what happens if two processes run the above concurrently, with their steps interleaved as shown below?&lt;/p&gt;
&lt;table&gt;
	&lt;tr&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
		&lt;th&gt;Session 1&lt;/th&gt;
		&lt;th&gt;Session 2&lt;/th&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;1&lt;/td&gt;
		&lt;td&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;2&lt;/td&gt;
		&lt;td&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;insert&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;into&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;children&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parent_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;3&lt;/td&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
		&lt;td&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;4&lt;/td&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
		&lt;td&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;insert&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;into&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;children&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parent_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;5&lt;/td&gt;
		&lt;td&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;update&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parents&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;children_count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;children_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;6&lt;/td&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
		&lt;td&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;update&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parents&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;children_count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;children_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;7&lt;/td&gt;
		&lt;td&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;commit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
	&lt;/tr&gt;
	&lt;tr&gt;
		&lt;td&gt;8&lt;/td&gt;
		&lt;td&gt;&amp;nbsp;&lt;/td&gt;
		&lt;td&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;commit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;
	&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;Things go smoothly as expected until step 5, when we receive the first hint that something isn&amp;#8217;t right. It&amp;#8217;ll block.&lt;/p&gt;
&lt;p&gt;Step 6 will also appear to block, for roughly 1 second, then session 1 will un-block while session 2 will receive:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;  DETAIL:  Process 20906 waits for ExclusiveLock on tuple (0,1) of relation 26335948 of database 16385; blocked by process 20918.
  Process 20918 waits for ShareLock on transaction 168526; blocked by process 20906.
  HINT:  See server log for query details.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So &lt;em&gt;what happened?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Deadlocks generally occur if two processes are grabbing locks without following the same order in acquiring and releasing them.  For example:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;  Process 1 grabs lock A = success
  Process 2 grabs lock B = success
  Process 1 grabs lock B = block
  Process 2 grabs lock A = impossible
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;PostgreSQL has an internal mechanism for detecting such a scenario, and it (by default) kicks in 1 second after a blocking lock.  If a deadlock is detected, one of the parties receives an error and their transaction is rolled back.&lt;/p&gt;
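&lt;p&gt;The one-second delay corresponds to PostgreSQL&amp;#8217;s &lt;code&gt;deadlock_timeout&lt;/code&gt; setting.  As a minimal sketch, assuming an interactive &lt;code&gt;psql&lt;/code&gt; session, the current value can be inspected with:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;  -- how long a session waits on a lock before the deadlock check runs
  show deadlock_timeout;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;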
&lt;p&gt;In our case though, we&amp;#8217;re not acquiring any obvious locks (explicitly with an &lt;span class=&quot;caps&quot;&gt;SQL&lt;/span&gt; &lt;span class=&quot;caps&quot;&gt;LOCK&lt;/span&gt; command, or select ... &lt;span class=&quot;caps&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;caps&quot;&gt;UPDATE&lt;/span&gt;).  So what gives?&lt;/p&gt;
&lt;p&gt;It turns out that when one table references another through a foreign key, PostgreSQL implicitly acquires a shared row lock on the referenced row in the parent (parents) table, so that the parent cannot disappear before the child insert commits.  Therefore, in the above example, both step 2 and step 4 acquire a shared lock on the parents row with id 1.  Step 5 needs an exclusive lock on that row, so it blocks on the shared lock taken in step 4.  Step 6 needs the same exclusive lock and also blocks; each session is now waiting on the other, which is the deadlock.  PostgreSQL aborts session 2&amp;#8217;s transaction, which releases its shared lock and allows step 5 to proceed.&lt;/p&gt;
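&lt;p&gt;To make the implicit lock visible, the same deadlock can be reproduced without touching the children table at all.  The following is only a sketch: it assumes the &lt;code&gt;parents&lt;/code&gt; table created earlier, and the explicit &lt;code&gt;for share&lt;/code&gt; query is a stand-in for the foreign key check, not the exact statement PostgreSQL runs internally.  Run it in two sessions, interleaving the statements in the same order as the table above:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;  begin;
  -- stand-in for the foreign key check: a shared row lock on the parent
  select 1 from parents where id = 1 for share;
  -- needs an exclusive row lock, so it waits on the other session
  update parents set children_count = children_count + 1 where id = 1;
  commit;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;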
&lt;p&gt;This has been a long-standing issue in the database, and the PostgreSQL hackers refer to that type of thing as &amp;#8220;user-hostile&amp;#8221;.  I ran into this issue recently, and through searching saw that the discussion was re-ignited in August by &lt;a href=&quot;http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/146931&quot;&gt;Joel Jacobson&lt;/a&gt;.  The discussion had some excellent ideas but went dormant after a while.&lt;/p&gt;
&lt;h3&gt;From ActiveRecord&amp;#8217;s perspective&lt;/h3&gt;
&lt;p&gt;All clients to PostgreSQL being equal, Rails/ActiveRecord is at a disadvantage with regard to this bug, mainly because of:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;It&amp;#8217;s a web framework &amp;#8211; where the usual use case is concurrent access&lt;/li&gt;
	&lt;li&gt;A popular little feature called &amp;#8220;cached counters&amp;#8221;&lt;/li&gt;
	&lt;li&gt;The ease of extending an ActiveRecord model with callbacks, touching the parent for various reasons&lt;/li&gt;
	&lt;li&gt;The abstraction between the developer and the raw &lt;span class=&quot;caps&quot;&gt;SQL&lt;/span&gt; calls, which, besides lowering the barrier to entry, makes isolating and investigating such a problem more difficult&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This problem may manifest itself in Rails with an error whose stack trace includes something like:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;text&quot;&gt;ActiveRecord::StatementInvalid: PGError: ERROR: deadlock detected DETAIL: Process 4844 waits for ShareLock on transaction 34688860; blocked by process 4846. Process 4846 waits for ExclusiveLock on tuple (1,40) of relation 19104 of database 16388; blocked by process 4844. HINT: See server log for query details. : UPDATE &amp;#39;parents&amp;#39; SET &amp;#39;children_count&amp;#39; = COALESCE(&amp;#39;children_count&amp;#39;, 0) + 1 WHERE (&amp;#39;id&amp;#39; = 925) AND ( (&amp;#39;parents&amp;#39;.&amp;#39;type&amp;#39; = &amp;#39;MaleParent&amp;#39; ) )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3&gt;What can be done&lt;/h3&gt;
&lt;p&gt;None of the obvious solutions are pretty:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Skip cached counters, parent modifications, and anything of that sort&lt;/li&gt;
	&lt;li&gt;Remove the foreign key constraints&lt;/li&gt;
	&lt;li&gt;Don&amp;#8217;t use transactions&lt;/li&gt;
	&lt;li&gt;Divide the operation into several transactions, where the parents aren&amp;#8217;t modified in the same transaction as the children&lt;/li&gt;
	&lt;li&gt;Serialize the entire transaction by grabbing a lock (using an external mechanism or a PostgreSQL advisory lock)&lt;/li&gt;
	&lt;li&gt;Add client code to handle the deadlock error and (safely) re-try the needed operations&lt;/li&gt;
	&lt;li&gt;Fix the issue in PostgreSQL&lt;/li&gt;
&lt;/ul&gt;
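&lt;p&gt;As an illustration of the serializing option, here is a minimal sketch that takes the exclusive row lock on the parent before inserting the child.  It uses a plain row lock rather than an advisory lock and assumes the example tables from earlier; concurrent transactions on the same parent then queue up at the first statement instead of deadlocking later, at the cost of reduced concurrency:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;sql&quot;&gt;  begin;
  -- lock the parent row exclusively up front; a second session blocks here
  -- until this transaction commits or rolls back
  select id from parents where id = 1 for update;
  insert into children(parent_id) values(1);
  update parents set children_count = children_count + 1 where id = 1;
  commit;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;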
&lt;p&gt;The viability of each of the above depends on the application.  However, fixing the issue in PostgreSQL isn&amp;#8217;t out of the question.  As a matter of fact, it&amp;#8217;s already begun!  You can see progress &lt;a href=&quot;https://www.fossexperts.com/content/foreign-key-locks&quot;&gt;here&lt;/a&gt;, &lt;a href=&quot;http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;How you can help&lt;/h3&gt;
&lt;p&gt;With a monetary contribution, small or large.&lt;/p&gt;
&lt;p&gt;The above work was commissioned after the August discussion on the pgsql-hackers mailing list.  However, only a small part of the funding has been raised so far, and it is still a few thousand dollars short.&lt;/p&gt;
&lt;p&gt;If you or your employer would like to see this fix come to fruition, please visit the donation page at:&lt;/p&gt;
&lt;p class=&quot;flash&quot;&gt;&lt;a href=&quot;https://www.fossexperts.com/content/foreign-key-locks-0&quot;&gt;https://www.fossexperts.com/content/foreign-key-locks-0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Every little bit helps.  You&amp;#8217;ll be helping improve an already great open source database, as well as all who rely on it including &lt;span class=&quot;caps&quot;&gt;ROR&lt;/span&gt; consumers.&lt;/p&gt;
&lt;h3&gt;Update&lt;/h3&gt;
&lt;p&gt;28 patch revisions later, it seems that Alvaro Herrera has carried this feature through.  You can see the patch &lt;a href=&quot;https://commitfest.postgresql.org/action/patch_view?id=987&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This patch is a candidate for commitfest &lt;a href=&quot;https://commitfest.postgresql.org/action/commitfest_view?id=17&quot;&gt;2013-01&lt;/a&gt; &amp;#8211; which means you may see it in a beta PostgreSQL release near you sometime soon.&lt;/p&gt;</description>
				<pubDate>Mon, 22 Nov 2010 00:00:00 -0500</pubDate>
				<link>http://mina.naguib.ca/blog/2010/11/22/postgresql-foreign-key-deadlocks.html</link>
				<guid isPermaLink="true">http://mina.naguib.ca/blog/2010/11/22/postgresql-foreign-key-deadlocks.html</guid>
			</item>
		
	</channel>
</rss>
