The Traveling Salesman Problem Is Not NP-complete

As an interview question, for many years I'd ask candidates to write a brute-force solution for the traveling salesman problem (TSP). This isn't nearly as hard as it sounds: you just need to try every possible path, which can be done using a basic depth first search. A lot of the time candidates who would get stuck would mention something about how the problem is NP-complete (usually to indicate that they thought the problem was impossible). The complexity class of TSP is in not something I asked as part of the interview, and not something I would use to judge a candidate. But hearing that TSP is NP-complete over and over used to kind of irk me, because TSP is not NP-complete. This surprises a lot of people, but it's true.

The term "NP-complete" has a very specific technical meaning, so it's not surprising that people misuse the term. Therefore before continuing, I'm going to define various complexity classes.

Definitions

P is the set of all problems that can be solved in polynomial time.

NP is the set of all problems whose solutions can be verified in polynomial time.

NP-hard problems are informally defined as those that can't be solved in polynomial time. In other words, the problems that are harder than P. This is actually a simplified, informal definition; later I'll give a more accurate definition.

NP-complete problems are the problems that are both NP-hard, and in NP. Proving that a problem is NP is usually trivial, but proving that a problem is NP-hard is not. Boolean satisfiability (SAT) is widely believed to be NP-hard, and thus the usual way of proving that a problem is NP-complete is to prove that there's a polynomial time transformation of the problem to SAT.

Why TSP Is Not NP-complete

Why is TSP not NP-complete? The simple answer is that it's NP-hard, but it's not in NP. Since it's not in NP, it can't be NP-complete.

In TSP you're looking for the shortest loop that goes through every city in a given set of cities. Suppose you're given a set of cities, and the solution for the shortest loop among these cities. How would you verify that the solution you're given really is the shortest loop? In other words, how do you know there's not another loop that's shorter than the one given to you? The only known way to verify that a provided solution is the shortest possible solution is to actually solve TSP. Since it takes exponential time to solve NP, the solution cannot be checked in polynomial time. Thus this problem is NP-hard, but not in NP.

In general, for a problem to be NP-complete it has to be a "decision problem", meaning that the problem is to decide if something is true or not. There's a simple variation of TSP called "decision TSP" that turns it into a decision problem. Imagine that instead of finding the shortest loop going through all cities, your goal is to determine if there exists any loop whose total length is less than some fixed number. For example, the question might be: is there a loop that goes through all of these cities, whose total distance is less than 100 km? In the negative case this is just as hard as regular TSP, because you'd end up testing all possible paths. But there's an important difference: the solution can be verified in linear time by adding up all of the distances making up the path, and that's what makes this variant part of NP. There's a straightforward proof that the decision variant is also NP-complete, by showing that there's an equivalence between the TSP decision problem and the Hamiltonian path problem.

What Does NP-hard Really Mean?

Earlier I mentioned that the definition of NP-hard as "problems harder than P" is informal. In nearly all cases this is sufficient, but it's not technically accurate. Formally, a problem is NP-hard if given an oracle machine for the problem, all other problems in NP could be solved in polynomial time.

The best known example of a problem that is in NP, but thought not to be NP-hard, is integer factorization. It's trivial to verify that the factorization of a number is correct, simply by taking the product of the factors given to you. This puts integer factorization in NP. However, it is widely believed that integer factorization is not NP-hard (and thus not NP-complete), because there doesn't appear to be any equivalence between integer factorization and other NP-complete problems.

Integer factorization is therefore in a weird complexity class. We think it's not NP-hard, and intuitively it feels "easier" than NP-hard problems like 3-SAT. A lot of smart people have looked at integer factorization, and no one has found a polynomial time algorithm for integer factorization. Thus it is probably "in-between" P and NP-hard, but not one really knows for sure.