Jul 30, 2009

When the worst case is really, really bad

Posted by Andrew Koenig in Miscellaneous Musings, Architecture and Design, Algorithmic complexity

In my last note, I talked about the idea of something happening "with probability 1."  Such events are bound to happen eventually, even though there is no way to predict just how long it might take.  A similar notion occurs with algorithms that have a worst case that is much worse than average.  Two such algorithms, or categories of algorithms, are quicksort and hashing.

Hashing typically relies on the notion of a hash code with the property that if two values are unequal, their hash codes will usually be unequal as well.  Such hash codes can fail in three ways.

First, the hash code might be poorly designed.  As a limiting example, one can imagine someone constructing a data structure for which the hash code is always 42, with the intention of coming back later and replacing that silly hash code with something genuinely useful.  A forgotten placeholder of this kind might slip through testing and cause performance problems during production.  More generally, from time to time we see hash codes that just don't work as well in practice as they did in theory.
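To make the "always 42" example concrete, here is a minimal Python sketch.  The Key class and the equality_tests function are names I made up for illustration; the only thing being measured is how many equality tests a dictionary needs when every key reports the same hash code.

```python
class Key:
    """Dictionary key whose hash is either constant or derived from its value."""

    eq_calls = 0  # counts every equality test the dictionary has to perform

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        # 42 is the forgotten placeholder; hash(value) stands in for a real hash.
        return 42 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def equality_tests(n, constant_hash):
    """Insert n distinct keys into a dict and report how many __eq__ calls it took."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        table[Key(i, constant_hash)] = i
    return Key.eq_calls


good = equality_tests(500, constant_hash=False)
bad = equality_tests(500, constant_hash=True)
print(good, bad)
```

With the constant hash, each new key has to be compared against the keys already stored, so the second number should grow roughly as the square of the number of keys.  A test with a handful of keys would never notice.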

The second kind of failure is much less likely: Perhaps someone could come along by accident and supply a bunch of data values that just happen to have the same hash code.  The whole point of hashing, of course, is to choose a hash algorithm that will make this eventuality very unlikely in practice.  Indeed, if the hash algorithm is effectively random, then the chance of such dire performance problems is usually too small to worry about.

Where such problems become severe is in the third case:  Perhaps a malicious user might reverse-engineer the hash code and supply data with the specific purpose of making the hash algorithm perform badly.  Such reverse engineering is a kind of denial-of-service attack, and in our increasingly security-sensitive world, has to be taken more seriously than most people realize.  The point is that events that are rare in theory can be made common through a coordinated effort.
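As an illustration of how little effort such reverse engineering can take, consider a 31-based polynomial string hash, the same general form that Java documents for String.hashCode.  The names poly_hash and colliding_keys below are hypothetical; this is a sketch of the idea, not an attack on any particular library.

```python
from itertools import product


def poly_hash(text):
    """31-based polynomial string hash, truncated to 32 bits."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def colliding_keys(blocks):
    """Build 2**blocks distinct strings that all share one poly_hash value."""
    # "Aa" and "BB" hash alike, and equal-length pieces with equal hashes
    # can be swapped for one another without changing the combined hash.
    return ["".join(parts) for parts in product(("Aa", "BB"), repeat=blocks)]


hostile = colliding_keys(10)
print(len(hostile), len({poly_hash(key) for key in hostile}))  # prints: 1024 1
```

Anyone who knows the hash function can produce as many colliding keys as desired, so every one of them lands in the same bucket of a table that uses this hash.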

Many years ago, I heard about a striking example of such an attack.  It took place in a college dorm that had tankless toilets.  Such toilets rely on a large flow of water being available for the short time during which the toilet is being flushed.  A small pressure-activated device in the toilet shuts off the water after it has been running long enough.

What happened in this dorm was that some students decided to see what would happen if they flushed all the toilets in the building at the same time.  To do this, they enlisted a bunch of friends, synchronized their watches, and flushed the toilets.

The results were surprising.  Allowing water to flow to all of those toilets at once dropped the water pressure in the building to below what was necessary to activate the shutoff mechanisms.  So all the toilets continued to allow water to trickle through their pipes, which in turn prevented the water pressure from increasing.  In effect, flushing all the toilets at once broke the building's plumbing, and it stayed broken until someone went through the building and shut off enough of the toilets' water supply by hand to allow the remaining toilets to shut themselves off.

This is a graphic example of how to crash a system by arranging for events that should not coincide to happen all at the same time.  It's much the same kind of denial-of-service attack that one might mount against a hashing algorithm.

In case you think such an attack is infeasible in practice, consider quicksort.  Like hashing, this is an algorithm that works very well most of the time but has a horrible worst case.  The basic idea is to pick a random element in the sequence to be sorted, and then partition the sequence into three segments that are respectively less than, equal to, or greater than the chosen element.  This can be done in linear time by swapping elements.  After partitioning, one runs quicksort recursively on the two segments that are unequal to the chosen element.
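The description above can be sketched in a few lines of Python.  This is one possible arrangement of the swaps, not the only one, and the function and variable names are my own.

```python
import random


def quicksort(items, lo=0, hi=None, rng=random):
    """Sort items[lo..hi] in place using a random pivot and a three-way partition."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    pivot = items[rng.randint(lo, hi)]  # randint includes both endpoints
    # Invariant: items[lo:lt] < pivot, items[lt:i] == pivot, items[gt+1:hi+1] > pivot.
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if items[i] < pivot:
            items[lt], items[i] = items[i], items[lt]
            lt += 1
            i += 1
        elif items[i] > pivot:
            items[i], items[gt] = items[gt], items[i]
            gt -= 1
        else:
            i += 1
    # Only the two segments that are unequal to the pivot still need sorting.
    quicksort(items, lo, lt - 1, rng)
    quicksort(items, gt + 1, hi, rng)


data = [5, 3, 8, 3, 1, 9, 5, 2]
quicksort(data)
print(data)  # prints: [1, 2, 3, 3, 5, 5, 8, 9]
```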

Quicksort works because a randomly chosen element is apt to sort into a random position in the sequence.  If by chance one chooses too many elements that are close to the highest or lowest value in the sequence, performance will be horrible; but that almost never happens.
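To put rough numbers on "horrible," the following sketch charges one unit of work per element for each partitioning pass and compares random pivots with the unluckiest possible choice, which is always the smallest element.  The function partition_cost is a hypothetical helper, written for clarity rather than speed, and it assumes distinct values.

```python
import random


def partition_cost(values, choose_pivot):
    """Total work for a simple quicksort: len(segment) per partitioning pass."""
    if len(values) <= 1:
        return 0
    pivot = choose_pivot(values)
    less = [v for v in values if v < pivot]
    greater = [v for v in values if v > pivot]
    return (len(values)
            + partition_cost(less, choose_pivot)
            + partition_cost(greater, choose_pivot))


n = 300
data = list(range(n))
rng = random.Random(2009)                 # seeded only to make runs repeatable
lucky = partition_cost(data, rng.choice)  # pivot chosen at random
unlucky = partition_cost(data, min)       # pivot is always the extreme element
print(lucky, unlucky)
```

With the extreme pivot, each pass removes only one element, so the total is n + (n-1) + ... + 2, which is quadratic in n.  The random-pivot total should come out many times smaller.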

But consider abstract data types, which in principle can have comparison functions that can change at run time.  Might it be possible to mount a denial-of-service attack against a quicksort program by observing how it chooses the elements to compare and subverting the comparison function accordingly?

The answer is yes: In 1999, Doug McIlroy wrote a paper that explains how to do it.
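Here is a small Python illustration in the spirit of that technique.  It is a sketch rather than a transcription of the paper: the Adversary class and its gas, freeze, and candidate names are my own labels, and the quicksort it attacks is a hypothetical one that uses the middle element as its pivot.  The adversary postpones deciding what the values are until the sort asks, and then answers in whatever way keeps the pivot small.

```python
def quicksort(items, cmp, lo=0, hi=None):
    """In-place quicksort over items, using the middle element as the pivot."""
    if hi is None:
        hi = len(items) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        items[mid], items[hi] = items[hi], items[mid]  # park the pivot at the end
        pivot, store = items[hi], lo
        for i in range(lo, hi):
            if cmp(items[i], pivot) < 0:
                items[store], items[i] = items[i], items[store]
                store += 1
        items[store], items[hi] = items[hi], items[store]  # pivot to its final place
        # Recurse into the smaller side and loop on the larger one, so the
        # recursion stays shallow even when the partitions are lopsided.
        if store - lo < hi - store:
            quicksort(items, cmp, lo, store - 1)
            lo = store + 1
        else:
            quicksort(items, cmp, store + 1, hi)
            hi = store - 1


class Adversary:
    """Comparison function that decides the values only as the sort asks about them."""

    def __init__(self, n):
        self.gas = n - 1              # placeholder meaning "large, not yet decided"
        self.val = [self.gas] * n
        self.nsolid = 0               # next definite value to hand out
        self.candidate = 0            # our guess at which item is the pivot
        self.ncmp = 0

    def freeze(self, x):
        self.val[x] = self.nsolid
        self.nsolid += 1

    def cmp(self, x, y):
        self.ncmp += 1
        if self.val[x] == self.gas and self.val[y] == self.gas:
            # Pin down the likely pivot at a small value, so that everything
            # still undecided ends up on the same side of it.
            self.freeze(x if x == self.candidate else y)
        if self.val[x] == self.gas:
            self.candidate = x
        elif self.val[y] == self.gas:
            self.candidate = y
        return self.val[x] - self.val[y]


def count_plain(values):
    """Comparisons used to sort fixed values with an ordinary, honest comparison."""
    calls = 0

    def cmp(x, y):
        nonlocal calls
        calls += 1
        return values[x] - values[y]

    quicksort(list(range(len(values))), cmp)
    return calls


n = 200
adv = Adversary(n)
quicksort(list(range(n)), adv.cmp)
adversarial = adv.ncmp
replay = count_plain(adv.val)          # the values the adversary settled on
benign = count_plain(list(range(n)))   # already-sorted input, this sort's easy case
print(benign, adversarial, replay)
```

Every answer the adversary gives is consistent with the values it eventually settles on.  That means adv.val is a perfectly ordinary input that should drive this particular quicksort into the same quadratic behavior without any trickery in the comparison function at all.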

Fortunately, I have not personally encountered a case where someone actually made a system misbehave by using McIlroy's technique.  However, the fact that it exists, together with the results of flushing all those toilets, suggests that when we design systems that are intended to be used by people who might conceivably be hostile toward us, we should be very careful when we think about events that "almost never happen so we can ignore them."



Comments (1)
That's why I like heap sort
written by martin cohen, August 06, 2009
On the average slower than quick sort but guaranteed
n log n time with no additional space.
