Big-O notation is a means of describing how the number of operations required for an algorithm scales as the input grows in size. To use it correctly is to dive deeply into the realm of computer science theory, and to carefully distinguish it from the related small-o notation, big-Θ notation, big-Ω notation, and probably many mutant hybrids thereof. While these distinctions add precision to statements about algorithmic scaling, outside computer science theory exams and the remarks of pedantic blog commenters, you’ll rarely see such distinctions made in practice. Far more common in the data science world is a less rigid use of big-O notation: as a general (if imprecise) description of the scaling of an algorithm. With apologies to theorists and pedants, this is the interpretation we’ll use throughout this book.
Big-O notation, in this loose sense, tells you how much time your algorithm will take as you increase the amount of data. If you have an O[N] (read “order N”) algorithm that takes 1 second to operate on a list of length N=1,000, then you should expect it to take roughly 5 seconds for a list of length N=5,000. If you have an O[N²] (read “order N squared”) algorithm that takes 1 second for N=1,000, then you should expect it to take about 25 seconds for N=5,000.
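As a rough illustration, the arithmetic behind these predictions can be sketched with a small helper function (a hypothetical one written for this example, not part of any library):

```python
def predicted_time(t0, n0, n1, power):
    """Estimate the runtime at input size n1, given a measured
    runtime t0 at size n0, assuming the algorithm is O[N ** power]."""
    return t0 * (n1 / n0) ** power

# O[N]: 1 second at N=1,000 predicts ~5 seconds at N=5,000
print(predicted_time(1.0, 1_000, 5_000, power=1))  # → 5.0

# O[N²]: 1 second at N=1,000 predicts ~25 seconds at N=5,000
print(predicted_time(1.0, 1_000, 5_000, power=2))  # → 25.0
```

The key point is that only the ratio of input sizes matters here: big-O lets you extrapolate from one measurement without knowing anything about the hardware.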
For our purposes, the N will usually indicate some aspect of the size of the dataset (the number of points, the number of dimensions, etc.). When trying to analyze billions or trillions of samples, the difference between O[N] and O[N²] can be far from trivial!
Notice that big-O notation by itself tells you nothing about the actual wall-clock time of a computation, but only about its scaling as you change N. Generally, for example, an O[N] algorithm is considered to have better scaling than an O[N²] algorithm, and for good reason. But for small datasets in particular, the algorithm with better scaling might not be faster. For example, in a given problem an O[N²] algorithm might take 0.01 seconds, while a “better” O[N] algorithm might take 1 second. Scale up N by a factor of 1,000, though, and the O[N] algorithm will win out.
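The crossover described above can be made concrete with two hypothetical cost models (the 0.01-second and 1-second baselines are the made-up numbers from the example, not measurements):

```python
def quadratic_time(scale):
    # Hypothetical O[N²] algorithm: 0.01 s at the baseline size,
    # so its cost grows with the square of the size multiplier.
    return 0.01 * scale ** 2

def linear_time(scale):
    # Hypothetical O[N] algorithm: 1 s at the baseline size.
    return 1.0 * scale

# At the baseline (scale=1), the O[N²] algorithm is 100x faster:
print(quadratic_time(1), linear_time(1))      # 0.01 s vs 1 s

# At 1,000x the data, the tables turn decisively:
print(quadratic_time(1000), linear_time(1000))  # ~10,000 s vs 1,000 s
```

The constant factors decide the winner at small N, but the exponent always dominates eventually.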
Even this loose version of big-O notation can be very useful when comparing the performance of algorithms, and we’ll use this notation throughout the book when talking about how algorithms scale.