The C++ standard library’s sorting functions and its associative containers all require a strict weak ordering criterion. But what is that, really?
Well, I thought I had an intuitive feeling for what “strict weak” means, but recently I realized I needed to nail it down. If you look up the definition, you might encounter something like this:
a strict weak ordering is a strict partial order in which incomparability is transitive.
Strict Partial What?!
So naturally, my next instinct was to track down the definition of “strict weak,” and then of whatever that was based on, and so on, so I could parse the meaning of words like “strict,” “weak,” “total,” etc. Bad idea.
The problem is that some of these words don’t act like ordinary adjectives, and if you try to think of them that way, you’ll just get horribly confused. For example, a strict total order is-not-a total order. In fact, instead of creating a refined ordering concept, as one might expect, adding the word “strict” seems to create a sibling concept. It takes away the reflexivity requirement (i.e. ∀ x, x φ x) and adds irreflexivity (∀ x, ¬(x φ x)). So, “≤” over the integers is a total order (∀ xεZ, x ≤ x), “<” over integers is a strict total order (∀ xεZ, x ≮ x), and a relation φ for which x φ x is sometimes true and sometimes false is neither “total” nor “strict total.” 1
Okay. Having said all that, a strict weak order lies somewhere in the territory between a total order and a partial order, so let’s take a look at those.Total Orderings
In a total order φ, every pair of elements x and y are related. That is, either (x φ y) or (y φ x) is true, but—unless x is identical to y—not both. If you represent the ordering relation as a directed graph where an edge from x to y means (x φ y), you always get something like Fig 1: every vertex has a self-loop, and every pair of vertices are connected by a single edge. The graph for a “strict total” order looks just the same, except that there are no self-loops.
Since all orderings are transitive relations, you can reduce the complexity of thinking about them by looking only at the graph’s transitive reduction, which eliminates nearly all of its redundant information. If you arrange the the result vertically from “greater” to “lesser,” and drop any self-loops, you get what’s known as a Hasse Diagram. The Hasse Diagram for total orders and strict total orders is always a nice, linear list as in Fig 2.
Partial Orderings
In a partial order, the Hasse diagram is a Directed Acyclic Graph (DAG)—a directional network with no loops.
Notice that in Fig. 3, x and z are not ordered with respect to one another; they are called incomparable. That is perhaps a badly-chosen name, because it is possible to compare x and z; if you do, though, you’ll find that neither x ≺ z nor z ≺ x.
Incomparability
If the whole idea of incomparability seems a bit implausible, consider the relation of human ancestry. By this criterion, you are preceded by your parents, who in turn are preceded by their parents, etc. However, your maternal grandparents are incomparable to your paternal grandparents.
The divisibility ordering over integers is another example. As you can see from Fig. 4, the prime numbers line up together above 1, and are incomparable with one another.
Order of the Weak
Perhaps counterintuitively, a strict weak ordering actually has stronger requirements than a strict partial ordering does. In a strict weak ordering, every pair of incomparable elements are related—or incomparable—to every other element in exactly the same way. Thus, the Hasse diagrams of all strict weak orderings follow the same general pattern shown in Fig. 5.: every element is related to all the elements at the next-higher level (and to no elements at the same level).
Strictly Speaking, Weak ⇒ Strict?
There’s some disagreement in the literature I’ve found over whether there’s such a thing as a “non-strict” weak order. The definition of “weak order” in Elements of Programming implies strictness. I like this definition because it’s simple and elegant. Any definition of “non-strict weak order” seems to require an extra rule to add reflexivity.
In other contexts I found on the web, though, the unqualified term “weak order” actually implies reflexivity, i.e. non-strictness. Probably for this reason, “weak order” is seldom used alone, but instead is usually prefixed either by “strict,” “non-strict,” or “reflexive” (sometimes parenthesized). That seems like a good idea to me.
Properties of Incomparability
Incomparability can be viewed as a relation in its own right:
x‖y ≝ x⊀y ∧ y⊀x
The incomparability relation is symmetric, almost by definition: x‖y implies y‖x, and vice versa. In a strict ordering, where nothing is ordered with respect to itself, it’s also reflexive: x‖x for all x. In a strict weak ordering, it’s also transitive. Since an equivalence relation is by definition symmetric, reflexive, and transitive, when taken together, these three properties allow us to group incomparable elements of a strict weak order into equivalence classes as shown in Fig. 6.
Notice that applying the strict weak ordering relation between
representatives of these equivalence classes results in a strict total
order of the equivalence classes themselves. In fact, we could view a
strict weak ordering as the result of ignoring some of the elements’
distinguishing characteristics and applying a strict total ordering to
the rest. With that understanding in hand, it becomes easy to recognize
many of the orderings we can legitimately use with std::sort. For
example, the “is a brighter color than” comparison ignores hue and
saturation, arranging colors based on a total ordering over brightness.
A case-insensitive string comparison ignores case and performs a (total)
lexicographical ordering of the lowercased characters.
How Does It Work?
So why is this “strict weak ordering” property so essential to efficient sorting? Well, since all incomparable elements are interchangeable as far as sort order is concerned, a sorting algorithm can avoid making comparisons. In particular, the transitivity property of incomparability in a weak ordering means that once you know that x‖y and y‖z, you also don’t have to compare x and z to get the sort order right.
Are You Pondering What I’m Pondering?
A couple of thought-provoking ideas came to me as I was writing this article and I thought it would be more fun to leave them as open questions than to answer them in detail myself. Maybe we’ll get some good answers in the comments
Maps
The sorting criterion for a std::map is a strict weak ordering that
ignores the value field of its (key, value) pairs and orders the
elements… based on a strict weak ordering of the keys alone. But we
said above that a strict weak ordering could be viewed as a total
ordering over the non-ignored characteristics. Is there an
inconsistency here?
What About Topological Sort?
Those of you who were awake during graph theory class might remember
that it’s possible to efficiently sort the elements of a DAG so that the
arrows all point in one direction after sorting. In fact, you can do
it in linear time. We said that every partial ordering could be
represented as a DAG. So why does std::sort require a strict weak
order? Shouldn’t this allow a super-efficient sort
on any partial ordering criterion, even one that isn’t weak?
-
It would be interesting to know if such an “only sometimes reflexive” property ever shows up in a useful ordering relation. I can’t think of an example. ↩






Re: Footnote #1, I don’t know if it would constitute “a useful ordering relation” but the predicate “is the smallest multiple-of-3 greater than or equal to” is sometimes true and sometimes false. It happens to produce the same truth value as the monadic “is a multiple-of-3″, but it gives you the flavor of a relation which could have a more convoluted truth table.
(Quote)I would say “Building upon Mats’ point,” except that I conceived the following exposition just before I read, rather than just after I read:
so the following is actually my original exposition. The ensuing discussion, especially, “…but you lost me here, friend. std::sort needs a strict weak ordering so that lower_bound can work on the result?” shows that Mats’ exposition left at least part of the point unrevealed. Maybe the following will help.
The reason we want std::sort to require a strict weak ordering is because we like that insertion of a new element in a sorted list is O(logN). If working with a strict weakly ordered set, that is true, but that beneficially favorable order requires a strict weak ordering.
Consider the partially ordered but not strict weakly ordered set of complex numbers S = {2,i,3,i+2,4} of length N, where N==5. It uses the usual meaning of those terms: integers obey the usual < ordering, and pairs of complex numbers with equal imaginary part obey the usual < ordering. i is the SquareRootOfNegativeOne and every complex number with non-zero imaginary part is IncomparableToAnyInteger.
Insertion of the new element (i+1) is correct if it produces a set {2,3,4,i,i+1,i+2} or {2,i,3,i+1,4,i+2} or {i,i+1,2,3,i+2,4} or any of the several others, but incorrect if it produces a set {2,i+1,3,i,4,i+2} or {2,i,3,i+2,4,i+1} etc.
Insertion of a new element into a partially ordered but not strict weakly ordered set, is O(N). To illustrate, consider the steps. At each step make comparison of the additional element A with something in the set. A comparison will either produce a partition or an Incomparable. The number of remaining steps needed will not be reduced if the comparison produced an Incomparable. To ensure a correct result will require wasting comparison steps proportional to the number of elements in the set to which A is Incomparable, plus the logarithm of the number of elements in the set to which A is comparable. The sum of these two effects is O(N).
Basically this is all just further illumination of Mats’ point that:
(Quote)Hi Duffy,
I understand what you and Mat (no ‘s’) are saying, but I just don’t find that line of reasoning compelling. It’s entirely normal for generic libraries to give conditional efficiency guarantees (e.g. “if you hand me random access iterators, I give you O(log N) binary search; otherwise I give you O(log N) comparisons but O(N) steps”). Saying “if you give me a strict weak ordering, you’ll be able to find the position of a new element in O(log N) but it might be linear otherwise” is not really any different. None of that makes your reasoning invalid, but for me personally it doesn’t have the solid “clunk” of Truth about it.
Either of these explanations seem more likely to me:
sort, when and if we need it.But neither one of those really wows me either. Still looking for sat-is-fac-tion…
(Quote)No time to read this paper right now but it looks relevant and interesting.
(Quote)There is a third possibility, or maybe it’s a special case of your second possibility. This is that an algorithm capable of sorting a general strict partial order when applied to a strict weak order might be significantly poorer than an algorithm designed for a strict weak order, and paying that price is not worth it when the strict weak order is the usual case. I strongly suspect this will be the case. Worse, it’s possible (I mean by that I haven’t yet seen a proof otherwise) that the algorithm might even fail to be O(n log n) in the case where it is, but is not specified as, a strict weak order.
So my question is, can anyone demonstrate a sorting algorithm that will sort a strict partial order with no other known properties (beyond irreflexivity and transitivity) which I think I showed earlier can’t be O(n log n) in that general case, but which is O(n log n) if that strict partial prder happens to be a strict weak order, even without being told that? (The sanity check cases I suggest trying any candidate on are the empty relation, and the singleton relation where there are unique x, y such that x R y. The former is strict weak, the latter isn’t, but they are very similar.)
(Quote)Yeah, I think this is a special case of #2, or at least closely related.
(Quote)But if this case were so, would it provide a compelling reason for demanding strict weak ordering?
I think it would definitely be compelling if it broke O(n log n), and I think it would be moderately compelling if it cost a factor of 1.5. (Why 1.5? Well, suppose each time you tested a < b and failed, you then needed to also test b < a. Note that that’s pure speculation as to what might happen, my guess is that it’s actually worse than that in the worst case.)
Then if it is a compelling case, it would just need proving. (Or, of course, disproving.)
(Quote)I’ll make another stab at providing you sat-is-fac-tion, with a completely different exposition.
In a situation where a body of elements are inherently partial ordered, an operation upon them that is “sort” like is preferable if it produces a consistent result. So when calling a “sort” operation, it is worthwhile to provide a predicate that is strict weakly ordered, rather than suffer the almost nondeterministic result of sorting a partial ordered set. I can remember sort tools back at Dragon that were nondeterministic from run to run, swapping which line from the input appears before the other, if the sort key was nonunique and was a subset of the, always unique, whole line. Such tools were usually considered “broken” by the user.
Consider also for example the complex numbers. It is fine to sort by modulus of the elements. The complex numbers are inherently just partial ordered, but there is the concept of “they could be sorted according to such-and-such if nothing higher priority differentiates their position.” I would not be surprised to find that in EVERY situation, it is worthwhile to provide a predicate that is strict weakly ordered. Thus if we were to provide support for sort of a partial ordered set, that would just be wasted library routines.
(Quote)Do you consider the random rearranging of equivalent elements in a sorted strict weak order to be somehow “less nondeterministic?”
FYI, if you read EOP you’ll see that they go further, implying that a total order is both inherently achievable and desirable.
(Quote)Up until this post, I thought you were inclined towards less order, i.e. partial, rather than towards more order, i.e. total. But upon reflection, what you were probably more inclined towards was hearing other perspectives which might enlighten.
So in the enlightenment vein, yes I do consider the random rearranging of equivalent elements in a sorted strict weak order to be somehow “less nondeterministic” than the seemingly random appearance at any juncture, of the incomparable elements. If the incomparable elements group themselves together in their own ghetto, somehow that is more appealing TO ME, than permitting them to appear anywhere, like neutrinos.
Example for illustration, {-i+1, 0, 2, 5, i, i+1} seems more sorted TO ME than does {0, -i+1, 2, i, 5, i+1}.
Total order advocacy is probably what holds the most appeal for me. I see order preserving sorts as the only sorts worth doing, and order preserving sorts (where the original index is a unique lowest priority part of the sort key) are always total order.
(Quote)Bingo
I confess that I didn’t really understand the part in the “…”, but:
(Quote)stable_sortonly requires a strict weak ordering, not a total order. So I’m really not sure what you’re driving at here.I didn’t know it by that name: stable_sort, but I guess it is what “order preserving sort” back in my days in 6.036 has evolved into.
A technique to achieve stable_sort, with any sort algorithm, is that if two elements compare as equivalent through all of the provided predicate sort keys, then append a simple std::less comparison of the original index of each element. When doing a multi-field sort, the aggregation structure to represent what predicate and data field and width is primary key, what predicate and data field and width is the secondary key, what is tertiary, etc. is easy to extend by one more to refer to the fixed-width index/address of the elements.
A predicate invocation that would have been:
gets extended one more level to
and there is no possibility of falling through past that.
(Quote)…and it takes O(N) additional storage. The term “stable” for order-preserving sort seems to date back to at least 1977 from what I can tell. But I don’t see how any of this relates to your statement that stable sorts are always total orders.
(Quote)Because the original index of each element is a unique numeric value that is trivial to order. Either the elements don’t compare as equivalent, in which case they are ordered, or the elements compare as equivalent and so are differentiated as to order by the unique original index of each element.
P.S. “stable” may have been in the vernacular, but when I took 6.036, The term C++ hadn’t even been coined yet, let alone stable_sort.
(Quote)Another way I’ve thought of to phrase it, that might have explanatory value: “My identity comprises my baggage and where I am in line.” If you are going to compare me to another, you have to consider both: what I hold and where I was when you started.
(Quote)which makes me disinclined to use a library
stable_sortif I can achieve the same effect with a faster sort algorithm just by providing a slightly more sophisticated predicate.By providing a more sophisticated predicate, thus making me “total ordered”, the same result will be produced whether using library
(Quote)sortorstable_sort.Except normally the information needed by the more sophisticated predicate isn’t available for free. stable_sort doesn’t need the items to be tagged with their original index, the more sophisticated predicate will. I’m also not at all sure that the index would be something I’d want to add to a class just to avoid using stable_sort. However as I can’t recall actually needing a stable_sort (except possibly when sorting a std::list where it’s the only type of sort available) that is not clear to me.
Also note that the gain is at best a factor, as both are O(n log n), some of which will be lost in using the new predicate. Is this (requiring to use sort rather than stable_sort) premature optimisation (or even pessimisation if adding the index costs)?
(Quote)Just a trivial addition: it might be worth mentioning an example of a type programmers are likely to use in real programming that initially appears to fit the requirements for a strict weak order, but really doesn’t. At first glance, floating point numbers appear to be ordered — but IEEE floating point includes “NaN” (Not a Number), which isn’t ordered at all. In fact, a NaN even goes beyond being unordered, and doesn’t even have an equivalence relation (i.e., a NaN won’t compare equal to anything, not even itself).
(Quote)The statement in the original XML Schema xsd:double and xsd:float that “Not-a-number equals itself and is greater than all float values including positive infinity.” (see 2001 forum post for an in-depth discussion) further muddied the waters, regarding what programmers should do with IEEE floating point.
More recent revisions of XML Schema xsd:double and xsd:float have come off that original stance somewhat, instead using the following phrases:
Still it does tend to require a dedicated clause in code to get sorting compatible with this, out of IEEE floating point. One allowance that should be recognized, the XML schema governs textual representations of numbers, not the binary representations. To say that NaN = NaN, for XML means \116\141\116 equals \116\141\116, not that the 32bit or 64bit binary patterns which are NaN in IEEE floating point all equal each other. If you use XML in your programming domain, however, for the XML floating point to sort differently from the binary floating point, is usually undesirable.
(Quote)I made an attempt with quicksort (which with a random pivot may be a real sorting algorithm). The only problem is that it appears to me to not to need a strict weak ordering, a partial ordering will do.
The essential of quicksort (of course we recurse on the sets found) is to pick a pivot element y and then two sets X and Z such that for all x in X we have x R y, and for all z in Z we do not have z R y. That allows each element (other than y) to be easily put in X or Z unambiguously. Note that all incomparable (to y) elements are in Z. (That’s arbitrary of course, but they must all go the same set.)
Now suppose that algorithm were broken, as the non-swo version of insertion sort is, then that would require having an x0 in X and a z0 in Z such that z0 R x0. (Problems within X and Z are handled by recursion, and there are by definition no problems involving y.) But x0 R y, so by transitivity z0 R y, which is a contradiction.
(Of course I first tried some examples, and they didn’t break, which made me think perhaps it isn’t broken, and then led to the above.)
I’m left slightly puzzled. The only thing I did notice using the divisor example was that it was easy to end up with Z much larger than X, and maybe that could become an issue in terms of computation. However there’s always the case where R is empty, and that ends up comparing everything with everything, the worst case – but that’s a strict weak ordering, so that doesn’t help a “why a swo?” question. Incidentally that looks like the usual computational complexity requirements requiring a total ordering – does the C++ standard make such a point, or does some other sorting algorithm do better?
Of course there’s still the fact, as Mat Marcus pointed out, that the sorted result is fragile in that it doesn’t support simple insertion nicely, and that is a real issue. But not quite the same one.
(Quote)Impressive work, Christopher!
Notes:
(Quote)the 2003 standard only requires the average-case behavior of QuickSort for its
sort, though implementations commonly use IntroSort (original paper), which has O(N logN) worst-case behavior—the same as HeapSort. C++0x will require O(N logN). Maybe you need to look at a true O(N logN) algorithm like HeapSort to see problems.Even QuickSort (and IntroSort) implementations typically devolve to insertion sort near the leaves, when sorting 5 elements or so, as an optimization. Not having a strict weak ordering obviously breaks that optimization, but that doesn’t seem like a compelling argument against accepting a partial ordering.
If the comparison used with
std::sortturns out to be overconstrained, that won’t be the only way we’ve overconstrained that algorithm.Moving on to mergesort (still tackling the easier ones, even the most basic heap sort has two operations to analyse rather than the one each of insertion, quick and merge sorts) I think the strict weak criterion finally pays off.
The essential function in merge sort is merging two sorted lists X and Y into a single sorted list. The rest is just recursion. Let the relation be R, the first element of X be x and the first element of Y be y. (If X or Y is empty we just take what’s in the other one.) We pick x (remove from X and add to sorted list) if x R y, otherwise we pick y. There’s a chosen asymmetry to take from Y if the two elements are incomparable.
Now this works with a strict weak ordering, but not with a general partial ordering. The latter is easier, so let’s get it out of the way. With R being divides, X could be (9, 2) and Y could be (4, 5). Then the sorted list ends up as (4, 5, 9, 2) which is not sorted. Also note that there isn’t a simple fix that involves just front elements of the lists, because had the example been (9, 2) and (4, 3) there is no possible choice of front elements (9 or 4) to pick first that results in a sorted list.
Actually that second result means that I don’t need to show that the simple front element selection I picked works as a merge sort, just to assert that we all know that merge sort exists and works, but with a general partial order merging doesn’t work. But that’s not really playing fair, so let’s finish off. I’ll introduce two more relations x Q y iff x and y are incomparable, and x P y iff x R y or x P y. (Basically R is <, Q is a form of =, and P is <=.) P is transitive, which we can show from the strict weak order property of R.
(There are four cases to check: x R y and y R z => x R z hence x P z and similarly for Q are the trivial two. The other two follow most easily from the Hasse diagram level approach to what a swo is, but alternatively – for one case, the other left, as they say, as an exercise. Suppose x R y and y Q z but not x P z. Not x P z means that z R x, which combined with x R y gives z R y which contradicts y Q z.)
Now we will consider the concept of element by element sorted, under P, so that if z0 and z1 are consecutive elements in Z then z0 P z1. The mergesort construction method suggested above creates element by element sorting, assuming X and Y are element by element sorted (and we can use recursion to get that) as two consecutive elements in the merged list either came from the same list, for which the result is obvious, or from different lists, and the merging process ensures that if they are x from X then y from Y then x R y hence x P y, or if they are y from Y then x from X then not x R y, i.e. y P x. Finally element by element sorted under P is sorted under P (by transitivity of P) and then sorted under R (because the contradiction condition that u is before v but v R u itself contradicts sorted under P).
So, in short, mergesort can be done with strict weak ordering, and needs the strict weak ordering.
(I have this strong suspicion that all of the above can be boiled down to a fraction of the length, more clearly, and is in one of the standard texts, Knuth being the obvious guess. But it’s instructive to me at least to see what’s going on.)
(Quote)The benefit of a SWO is that ‘incomparability’ gives rise to an equivalence relation. If we know that a range is sorted into (increasing) blocks of equivalent elements, then for a given value x, we can use efficient algorithms to search for “the leftmost element in the range that is equivalent to x”. In case we have a total order, we may replace the word equivalent with the word equal. In your “worst case” all elements are equivalent, and the logarithmic lower_bound algorithm will correctly return the 0-th element in the range. If we relax SWO to partial order, an element that is incomparable to everything else might appear at any position in a ’sorted’ range, so we cannot safely use, for example, the logarithmic lower_bound.
The problem with relaxing the precondition of sort so that it no longer requires SWO, is that it also conditionally weakens the post-condition. The current post-condition, that the sorted range is organized into increasing blocks of equivalent elements is quite useful. An algorithm such as topological_sort has its use cases. But I haven’t seen a good argument for why it is worth ‘overloading’ std::sort to accommodate it, perhaps trading away the ability to reason, for example, that “in a sorted (random access) range, finding the insertion point for an element has logarithmic complexity”.
(Quote)And if we have a partial order, we can search for “the leftmost element in the range that is incomparable to x.”
I don’t think you’ve explained why this prevents me from finding the leftmost element incomparable to x in log(N) time. To justify your claim, if we return to the fundamental requirement of binary search that the range be partitioned according to the comparison function, you’d need to show that a topological sort of a partial order does not partition the range into three subranges of elements that are “less,” “incomparable,” and “greater” than any given element value. I’m pretty sure that I can prove that it does, and I suspect (but don’t think I can prove) that the real problem, if there is one, happens much earlier. That is, I suspect it’s impossible to achieve an O(N log(N)) worst-case sort given a partial ordering criterion.
I think this “conditionally weakens” argument is a fallacy. It expands the domain (set of valid inputs) and only yields the “weakened” postcondition for inputs that were previously invalid.
Further, I don’t have a counterexample at the tip of my brain, but doesn’t it seem a bit implausible that a postcondition that the range is sorted topologically might not be useful?
There’s no trade unless you mind changing the word “sorted” to “strict-weak sorted.” Anyway, I’m not poking at this problem at the level of library design changes. I want to understand whether the “strict weak” criterion is (technically) just an over-constraint, and if not, what forces us to accept it.
(Quote)A couple of comments and a bit of a puzzle for this thread -
Mat had pulled in N1313 – I think there are two observations here. You can binary search to find a partition point, this doesn’t require any ordering relation, only a predicate that partitions the set. A find_partition_point() algorithm would be worthwhile. I don’t see this as a replacement interface for lower_bound() – you really want the interface to lower_bound() and search() to match so you can reuse components. IMO the missing piece in the standard here is a projection (or key) function. You should be able to sort with a projection function as well as use lower_bound (and upper_bound and equal_range) with the same comparison and projection functions.
Now a bit of a puzzle – recently came across an issue where std::sort() was crashing because the comparison function was generating indeterminate results (meaning random) if the arguments were within some epsilon of each other (this happened because an incorrect compiler optimization was going through extended precision floating point and the result of the comparison became dependent on rounding error). Such a comparison has the odd property of missing transitivity for equivalence (incomparability) but being otherwise valid (not quite a valid partial order because of the indeterminate nature of the comparison). An implementation of merge_sort will sort “correctly” with such a comparison function. Pairs of values within epsilon are treated as being equivalent. Can you prove that it works?
(Quote)What does that mean, please?
(Quote)That Mat brought N1313 into the discussion.
(Quote)Sean, after rewriting this post about 10 times, I realized the problem— transitivity of equality comparison doesn’t enter into it. You mean that in this broken ordering criterion, equivalence (incomparability) isn’t transitive, right?
Can I ask what you mean by putting the word “correctly” in quotes when you say ‘
(Quote)merge_sortwill sort “correctly”?’Yes to the first question (edited the post to be more clear).
Quotes to imply “for some definition of correctly” – which I define in the next sentence.
(Quote)Do you mean that the result is sorted except that the positions of any pair of adjacent elements within epsilon of one another might be swapped?
(Quote)Yes.
(Quote)Hi Sean. Consider the range S = {1,2,…100, i} with the following non-strict order: i is incomparable to all, integers obey the usual < ordering. Then S is in sorted order. Binding to “IsIncomparableToI” yields a range consisting entirely of {false, …, false}. There is no dispute that partition point will return the 0-th position when looking for i, in logarithmic time. What it can’t do is find whether i is present in logarithmic time. Nor can we say that it produces an element ‘equivalent’ to i in logarithmic time, since there is no such notion, given only a partial order. For example, how is equal range to behave in this case? All elements are ‘equal’ to i, yet only 49 is equal to itself. A library where one may assume that one call invoke lower_bound, equal_range, etc. with useful and predictable results appeals to me. I haven’t yet seen that weakening STLs notion of sortedness to include partially ordered ranges is very useful (rather than a separate a topological_sort).
I only mentioned N1313 because in an offline discussion, I understood Dave to believe that it led the reader to understand that strict weak order was an over-constraint. I tried to show how it was orthogonal to the issues raised above.
(Quote)That should have been: All elements are ‘equal’ to i, yet 49 is ‘equal’ only to itself and i.
(Quote)I was agreeing with you Mat – just for different reasons The observations made in this N1313 don’t argue for a weaking of the lower_bound requirements but rather new algorithms (like partition_point() and topological_sort() ) and fixing the interface for both lower_bound and sort to include a projection function.
(Quote)Actually, I brought in N1313, which [blows own horn] is the first known published description of binary search in terms of a partition. No,
partition_pointis not a replacement interface for lower_bound, but I would build the proof thatlower_boundworks on the proof thatpartition_pointworks, the latter being the more general and fundamental algorithm.As to the puzzle, when you say that the result is random when the values are close, I think you’re leaving a lot of information out. Is it consistent, i.e. will two instances of the same test produce the same result? Is it consistently assymmetric (i.e. x < y ⇒ y ≮ x)? What about transitivity of “<”?
It seems that at least one interpretation of your statement isn’t provable. Christopher Dearlove showed in this post that mergesort doesn’t work with an arbitrary (strict?) partial ordering. But the difference between a strict partial ordering and a strict weak ordering—transitivity of equivalence—is exactly the property you say is missing from this broken comparison. Does it have some other property not present in an arbitrary strict partial ordering?
(Quote)The result of x < y is random iff x is within epsilon of y. Two instances of the same test will not produce the same result. If x < y is indeterminate implies y < x is indeterminate. You lose all other axioms on the relation when x and y are within epsilon. I pointed out transitivity of equivalence because that axiom never applies since we’re defining equivalence to be values within epsilon. The is, if “==” means “within epsilon” then x == y && y == z does not imply that x == z, even if you could test for “within epsilon” which you can’t, since any test will be indeterminate.
“Does it have some other property not present in an arbitrary strict partial ordering?” You tell me
(Quote)My comment on possibly needing total ordering was wrong. We may not need a total ordering for O(n log n) performance, but we do need more than just partial ordering – and I guess strict weak ordering is that something (or at least a useful option for that something). Consider the cases of the empty relation, and the single pair relation where there is a unique x and y such that x R y. With no other constraints on the partial ordering we need to find those specific x and y to ensure that a sorting is valid, and that means we need to check every pair, as the two examples only differ in one pair. But if we know the ordering is strict weak we don’t need to check all pairs, we can, for example, identify the empty ordering in O(n) comparisons, and the single pair example isn’t a strict weak ordering. So this is another issue where the strict weak property matters.
Note that this doesn’t contradict that quicksort can handle a partial order, we know quicksort has a quadratic worst case, and this is one – in fact try a quicksort on the single pair relation and you can see the worst case happening. It does suggest an argument that goes “you can’t do unqualified partial order in O(n log n), so if your sort algorithm has guaranteed O(n log n) worst case, then it must be using more information than just that it’s a partial order – possibly that it’s strict weak”. If that argument can be made rigorous – and I think it probably can be – then one can say “heap sort can’t do unqualified partial ordering” without actually looking at heapsort’s details beyond that it is O(n log n). The weakest point in the above is making it rigorous (for whatever level of rigor you need) that you need to look at every pair to guarantee a sorting of the single pair relation with no further assumptions, but I’m fairly sure of that.
(Quote)I think Mat Marcus’s “why a strict weak ordering” just needs a few more words. Moving from sorted property to sorting algorithm, as an exercise let’s try an insertion sort (yes, I know it doesn’t have the required complexity, but it’s a starting point).
What’s the definition of insertion sort we want? Using relation R, to insert item n into the correct position from 0 to n, inclusive, moving later elements as required, we insert y (i.e. x[n]) as the new value at position i if (having already determined that we don’t insert earlier) it fails the test x[i] R y. That’s easy to define, stops as soon as possible, and doesn’t have to handle incomparability as a special case.
So trying that for x/3 < y/3 for (1, 2, 3) listing only the sorted elements after each insertion, gives (1), (2, 1), (2, 1, 3). That’s a valid sorting.
With “divides” as an example that is not a strict weak order let’s try sorting (2, 3, 4). Again listing only the sorted elements, first we have (2), then (3, 2), then (4, 3, 2) because 4 is incomparable to 3. But this is not a valid sorting, 2 divides 4, so 4 should be after 2.
The point is that with a strict weak ordering we can stop once we hit something incomparable. But with a general partial ordering we can’t. However I’m not going to try and work out what (complexity increasing) fix you do need to apply to get an insertion sort working with a partial ordering, or work out exactly what happens with a real sorting algorithm.
(Quote)Welcome, Christopher—
Your approach looks promising to me! On the other hand, I think someone needs to work out a few of those things you say you’re not going to try to work out before we get much more insight from this than was available in the “How Does It Work?” section of the original article. Are you sure you don’t want to give it a shot?
(Quote)You posed the question “why does std::sort require a strict weak order?” The short answer: So that sorted ranges will be strict weak ordered, a basic requirement that enables algorithms such as the usual random access logarithmic complexity lower_bound to work properly.
Here’s a small example that demonstrates how conventional implementations of lower_bound can fail in the absence of the strict weak order requirement. Consider a universe consisting of the three elements {1, 2, i}. Define < by:
1 < 2 is true x < y is false in all other cases.
Then the range {1, 2, i} is ordered with respect to <. But the usual lower_bound will fail, for example, to find the desired position for the element i in this range, since at the first step the test *middle < value, i.e. 2 < i, will return false, and as a result the algorithm will erroneously throw away the upper half-range. Ultimately, lower_bound will return an iterator that points to the first element. This is not a desirable outcome.
It is straightforward to demonstrate that transitivity of incomparability, the relevant aspect of strict weak orders, is necessary to guarantee that a sorted range consists of contiguous blocks of equivalent elements with each block < the next.
Incidentally, I don’t see how N1313 addresses this problem. If I skimmed the paper correctly, the authors are proposing the following requirement “For this to work, of course, the unary function object must return true for zero or more initial elements, and false for all the rest.” Unfortunately, I don’t think that this is strong enough. In the above example, wouldn’t N1313 lead us to consider the unary function IsLessThanI? This function yields {false, false, false} for our example, which would seem to give the same undesirable result as the above.
(Quote)Yes, and frankly I think the answer in the article is a little weak, so I’m glad you’re attacking this question.
…but you lost me here, friend.
std::sortneeds a strict weak ordering so thatlower_boundcan work on the result? This line of reasoning seems flawed. Isn’t that like sayingrotatemust require a random-access sequence (it doesn’t) so thatsortcan work on the result?Let me ask the question a different way.
It’s possible to sensibly sort elements of a set given just a partial ordering relation. For example, the result could have the same order as topological_sort gives (once you figure out where the edges are). There’s no reason in principle that
std::sortcouldn’t do this, other than efficiency. The question I’m trying to ask is, “what is it about a strict weak ordering that makes it possible forstd::sortto be efficient, and how does that work?” We know that the key has to be transitivity of equivalence, but why does that make such a big difference? Note that this is a bit different (and I think a lot harder) than asking where the fallacy lies in the “topological sort paradox.”Oh, and as for how N1313 is related: it doesn’t address the problem, nor is it intended to. The important insight of N1313 is that orderings (other than a simple partitioning) don’t enter into the true fundamental requirements for binary search. Requiring a partitioning is indeed strong enough for binary search to work—a point-of-view, incidentally, that EOP has adopted as well.
(Quote)What program did you use to make your diagrams? Is that OmniGraffle or Graphviz or something?
Thanks
(Quote)Yep, OmniGraffle Pro. Makes a decent picture if you work at it, though I think it might have been easier with keynote or one of the other iWork applications.
(Quote)Dave and I discussed this post at length but came to somewhat different conclusions as to the approach. Everything in his post is correct and informative, but I think the definition of a strict weak ordering in terms of a strict partial ordering, although common, comes at the problem from the wrong direction and causes some confusion.
Instead, if we start with a strict total order, r, which can be defined as:
exactly one of r(x, y), r(y, x), or x == y (trichotomous)
r(x, y) && r(y, z) implies r(x, z) (transative)
Then, if we weaken the trichotomous property to:
exactly one of r(x, y), r(y, x), or !(r(x, y) || r(y, x)) (weak trichotomous)
We obtain an alternative (and equivalent) definition of strict weak order. Trichotomous implies irreflexive (hence “strict”) and asymmetric.
This is the approach in Elements of Programming (section 4.2).
(Quote)Sean, I don’t think you’ve read the latest version of the article
(Quote)Heh – apparently not as thoroughly as I should have.
(Quote)If you compare the definition of strict weak ordering that you gave with the definition in Section 4.2 of the book, you’ll see that you’re actually missing something. Your definition requires transitivity and the property that exactly one of r(x,y), r(y,x), and !r(x,y) && !r(y,x) must hold. This is exactly the same as the definition in Dave’s posting. As Dave points out in his posting, what you really want in a strict weak ordering is that incomparability, that is, the relation “!r(x,y) && !r(y,x)” is an equivalence relation. But that does not automatically follow from the definition that you both gave. You have to require it explicitly. That’s exactly what the definition in the book does: see bottom paragraph on page 51. Another good source is Matt Austern’s book “Generic Programming and the STL”. On page 89, he states that one of the axioms of a strict weak ordering is “Transitivity of equivalence.” If you read that in the full context of Matt’s book, you see that that’s the point where he explicitly requires incomparability to be an equivalence relation.
Another way of seeing that your definition as it stands is not quite complete is to apply it to the example of proper divisibility of integers that Dave shows in Figure 4 of his posting. Your definition is satisfied: proper divisibility of integers is transitive, and for two integers a and b you have exactly one of “a properly divides b”, “b properly divides a”, or “a and b are incomparable by proper divisibility”. But proper divisibility of integers is not a strict weak ordering, because here, incomparability is not an equivalence relation.
(Quote)Grr. Yes. Made the same mistake Alex and Paul made in the first printing, when you drop equality in the trichotomy you have to pickup transitivity because it’s no longer implied. – see the errata on page 3: http://www.elementsofprogramming.com/eop-errata.pdf
Sean
(Quote)Thanks, Thomas, for pointing that out. In Sean’s defense, he isn’t the only person to encourage me to look at strict weak ordering that way. I guess the “weakened trichotomy” approach looks appealing, but really doesn’t put the emphasis in the right place if you’re trying to clarify “strict weak,” since you need to tack something on to make the distinction from “partial.” I wonder if I should roll back to an earlier version of the article that doesn’t even try to go down that road.
In my own defense, I never intended to give a definition of strict weak ordering other than the one at the very top of this article, and if you think you see another one in here maybe I need to re-phrase something. The idea here was to make strict weak ordering more understandable by exploring its properties, and since I did spend considerable “ink” on incomparability as a relation in its own right, its transitivity, and the resulting equivalence classes, I don’t think I was negligent.
(Quote)Actually, I like the way the article is organized quite a bit. It worked very well for me. It’s just that the first sentence under “Order of the Weak” (A strict weak order is like a strict total order, except that…) looks an awful lot like it’s meant to be a definition. That is true in particular because “strict weak” appears in boldface; it is common in math books to set a term that is about to be defined in boldface. If I may make a suggestion: turn that sentence into a definition by either adding the condition that incomparability must be an equivalence relation, or, if you prefer a non-redundant definition, adding that incomparability must be transitive. In the latter case, you would then have to remark that symmetry of incomparability is trivial, and that reflexivity of incomparability follows easily from weak trichotomy.
That way, your article would contain a fully explicit definition of strict weak order. If someone then wants to convince herself that this explicit definition is equivalent to the one at the top of the article, great, that’s an easy exercise.
(Quote)Actually, that’s the convention I follow too. This one stayed bold after it stopped being a definition
.
Well, the information needed to make it a valid definition is already there in the very next sentence:
However, I think it would be better to go back to my original “definition,” which starts with partial ordering, gets that information about transitivity of equivalence into the initial sentence, and doesn’t rely on the “weakened trichotomy” description, which, as you have made brilliantly clear, just bounces us from total to partial order without making any progress toward understanding “strict weak.”
Humn, I’m not sure I understand why I’d need to make those additional remarks. Here’s the original (informal) “definition:”
Does that work for you? It seems to me to be minimal, complete, and easily understood.
(Quote)Come to think of it, I believe that does make for a better flow of things. Figures 4 and 5 already express the same thing visually; now the text would perfectly match that.
I must admit that the more formal arguments are easier for me to understand, but that’s of course a mathematician’s bias. You should not listen to my advice here. Your informal definition is certainly valid, and it does make for a very nice explanation of the meaning and purpose behind the formal properties.
(Quote)So, “yes that’s an improvement, but don’t listen to me because I’m a mathematician?”
flips mental coin…
I think I’ll make that change, then.
(Quote)I was a bit sleepy when I posted my message sorry
Michael Witten has already said it. The edges we are given are dependencies in the form “b must come after a” and we are given only the relevant ones. Notice we never compare nodes! the nodes are “previously compared” in some way and we just process the resulting dependencies to output a factible sequence.
About the strict-order part, I was confusing total with strict. Anyway I think its important to require one of them (implementation detail) and requiring strict is more natural in the following case:
Consider binary search. If R(a, s) holds where R is strict and s is the value we are searching, you can be pretty sure a isn’t the value you want. But if R is non-strict you have to ask an additional question (eq(a, s)?) before deciding what to do next. Sorting and then searching with an strict R seems to be a bit faster and it seems you shouldn’t mix them because the implementation depends on the assumption.
(Quote)Nice! This actually seems like the clearest explanation yet of the topological ordering “paradox.”
You lost me there.
Hmm. I’m not so sure. Isn’t it true that ¬( x ≾ y ) ⇔ y ≺ x? It seems to me you get exactly the same information from both kinds of ordering. But I might be missing something. Anyway, again, I think we should stay away from binary search when making points about ordering.
(Quote)Topological sorting is O(|V|+|E|), where V is the set of vertices of the graph and E the set of edges. Calculating the edges given only the set of vertices and the relation
xRy := there is an edge from x to y
takes O(|V|^2) time, which solves the paradox. Even if you can do with a subset E’ of E, taking advantage of the assumption that the graph is a DAG (so E-E’ is redundant for the purposes of sorting), that would still take O(|V|log|V|).
(Quote)Joaquín—Great answer…right up until you lost me with the E’ subset. What’s that supposed to be, the DAG’s transitive reduction? And how does your O(|V|log|V|) come about?
(Quote)As for the E’ bit: if you want to use topological sorting (TS) to sort a set of elements V given a SWO R you must feed TS with V and a set of edges E, where
E := {x->y : xRy}.
But for TS to work you don’t need to calculate the full set E, as many edges are redundant: for instance if xRy and yRz then xRz, so E includes x->y, y->z and y->z; for the latter is redundant in the sense that the output of TS does not depend on its existence, given the other two do exist. So, you can get by with some subset E’ with univocally determines TS output: given that coming up with such E’ is tantamount to sorting V, calculating it takes O(|V|log|V|).
(Quote)Then, yes, (V,E’) could be the transitive reduction of (V,E). But I still don’t get your point
(Quote)We both agree calculating the full E takes O(|V|^2), right? Now, how do you go about calculating E’? One way is:
sort(V); // O(|V|log|V|) for(x : sorted(V)) { if(xRy) E'=E'+(x->y); }which takes O(|V|log|V|). If you find a way to calculate E’ in less than that then you’ve made a breakthrough discovery in the art of computer programming
because applying then TS you’d have improved on the nlogn known sorting algorithms.
Anything obscure yet?
(Quote)right…
Sure, but what does that prove? It seems pretty obvious that if you rely on an O(|V|log|V|) operation to build your graph, the overall result can be no better than O(|V|log|V|)
(Quote)This (i.e. by sorting) is the obvious way to compute E’, and as I said before if one comes up with a better procedure that would get her some fame in the field (in fact it’s impossible to improve on this).
Let me restate the whole argument, hopefully in terms you find clearer:
Let (V,R) denote a set of vertices and a SWO on them, CE (compute edges) some algorithm returning, for a given (V,R), a set of edges such that
transitive closure(CE(V,R)) = {x->y : xRY}
and TS a topological sorting algorithm on (V,E) DAGs. If we apply these two algorithms CE and TS sequentally we get a sorting algorithm for (V,R) running in time:
O(TS(V,CE(V,R)) = O(CE(V,R)) + O(|V|+|CE(V,R)|) (1)
Now the paradox says: if we can come up with a suitable CE such that
then expression (1) would tell us that we’ve found a sorting algorithm whose running time improves on the classical nlogn boundary. Paradox solution: It’s been proved that comparison sorting cannot be done faster than O(nlogn) (see for instance here), so it follows that
that is, either the running time of the CE algorithm is higher than nlogn or the set of edges produced by the algorithm has size larger than |V|log|V| (or both).
(Quote)Well, I understand everything you’re saying, and it’s all correct as far as I can see, but I don’t find it very illuminating that a linear-time sort would violate fundamental laws of computing. We already know there has to be something wrong with the idea that toplogical sort can be used to sort any partial order in linear time. The point was to understand the difference between the two problems that makes the idea incorrect.
(Quote)Yeah, well, illumination is pretty much in the eye of the illuminated, I guess.
The one, illuminating if you want, insight that arises from the argument I exposed above is that going from some representation of data to a structurally different representation of the same data is not free in terms of computational cost. A binary relation R in some set V, which is represented in computational terms as a predicate function, encodes the exact same information as R’s associated graph, represented by a bunch E of (x,y) pairs, but calculating E from R (or the other way around) takes quadratic time on |V|. Other, functionally equivalent representations of R (vg. the transtitive reduction stuff we we’re talking about) also take more than linear time to compute.
When reviewing algorithms one has to be careful whether the data representation assumed by the algorithm matches the one we’re working with in our scenario, otherwise the cost of computing the required representation can ruin the overall performance. The predicate-graph dilemma seems to be a prominent example of this.
(Quote)…that’s why I was with you until you brought up E’. It didn’t seem to add anything for me—which, as you point out, doesn’t mean it helped nobody.
(Quote)I’m probably confused, but here goes…
Maps
You have pointed out that a pair can be viewed as having two characteristics (key and value), one characteristic of which is used for ordering and one of which isn’t: (key, value) => (used, ignored). Your implication is that since the key characteristic is used, the requirement to have a strict weak ordering relation over the pairs must in fact be the same as having a strict total ordering relation over the keys, so that it would be not quite correct/necessary to require having a strict weak relation over the keys.
However, a strict weak ordering relation implies (by definition) that incomparability is transitive, so if the set of keys is ordered by such a relation, then pairs must also be ordered by such a relation; that is, a pair’s value characteristic can just be viewed as another ignored characteristic of the key object—the pair tuple can be expanded/flattened into one object with a bunch of characteristics: (key_characteristic_0, key_characteristic_1, …, key_characteristic_n, value_characteristic_0, value_characteristic_1, …, value_characteristic_n) = (used, used, used, unused, used, unused, unused, used, … whatever).
Put another way: comparison characteristics are atomic, but keys are not atomic; keys are aggregates of characteristics: Can’t you still use the “is a brighter color than” relation to define the ordering of the keys?
Another way to say all of this is that the requirement doesn’t apply to the key because it applies to the pair. Rather, the requirement applies to the key so that it applies to the pair.
Topological Sort
I think Rodrigo has the right of it; Topological sorting is linear when the edges are known—after the nodes have been sorted, the task you are trying to solve.
I think you are confusing finding edges with choosing a particular ordering.
(Quote)Congrats, Michael— I’d say you explained the
(Quote)std::mapissue in three or four valid waysThanks! You’ve made my day!
(Quote)I think topological sorting is a bad example. It gives you a valid sequence given some already known restrictions. In fact, the edges are part of the input and they contribute to the runtime of the algorithm. If you don’t know nothing about the input, no algorithm that is based on comparisons between elements can be faster than n log n.
About the strict requirement of std::sort, I think that’s because you can’t pretend to use sort and then use a binary search that uses equality, if sorting wasn’t strict.
(Quote)Supporse that container (with size N) and comparing function is given.
If comparing function is strict weak order, we will take O(N log N) time to make up DAG structure.
If comparing function has NO assumptions, it’s O(N^2) time.
(Quote)Hi Tayuka,
That’s an interesting approach, but needs some explanation if it’s going to shed any light. What’s the basis for your assertions about the time it takes to build the DAG? Also, we can always make some assumptions about the comparing function if we’re doing a topological sort. We know we have a partial ordering, at least.
(Quote)Hi Rodrigo,
I think you’re on the right track about topological sorting, but your explanation doesn’t seem very cogent to me. For example, you do know something about the input: it’s a partial order. As for it being a “bad example,” well, you might say that’s intentional. The challenge of coming up with a really clear explanation of where the reasoning went wrong is an exercise in understanding the problem space.
I’m afraid I don’t understand what you’re saying about binary search at all. Binary search depends only on partitioning, and doesn’t really have anything to do with orderings (unless you think of a partition as a degenerate ordering of
falseandtrue). See N1313 for more background. It’s possible I’ve missed your point completely, though; if so, please help me understand.Cheers!
(Quote)