As an interview question, for many years I’d ask candidates to write a
brute-force solution for
traveling salesman problem (TSP).
This isn’t nearly as hard as it sounds: you just need to try every possible
path, which can be done using a basic depth first search. A lot of the time
candidates who would get stuck would mention something about how the problem is
NP-complete (usually to indicate that they thought the problem was impossible).
The complexity class of TSP is in not something I asked as part of the
interview, and not something I would use to judge a candidate. But hearing that
TSP is NP-complete over and over used to kind of irk me, because TSP is not
NP-complete. This surprises a lot of people, but it’s true.
The term “NP-complete” has a very specific technical meaning, so it’s not
surprising that people misuse the term. Therefore before continuing, I’m going
to define various complexity classes.
P is the set of all problems that can be solved in polynomial time.
NP is the set of all problems whose solutions can be verified in
NP-hard problems are informally defined as those that can’t be solved in
polynomial time. In other words, the problems that are harder than P. This is
actually a simplified, informal definition; later I’ll give a more accurate
NP-complete problems are the problems that are both NP-hard, and in NP.
Proving that a problem is NP is usually trivial, but proving that a problem is
Boolean satisfiability (SAT) is
widely believed to be NP-hard, and thus the usual way of proving that a problem
is NP-complete is to prove that there’s a polynomial time transformation of the
problem to SAT.
Why TSP Is Not NP-complete
Why is TSP not NP-complete? The simple answer is that it’s NP-hard, but it’s not
in NP. Since it’s not in NP, it can’t be NP-complete.
In TSP you’re looking for the shortest loop that goes through every city in a
given set of cities. Suppose you’re given a set of cities, and the solution for
the shortest loop among these cities. How would you verify that the solution
you’re given really is the shortest loop? In other words, how do you know
there’s not another loop that’s shorter than the one given to you? The only
known way to verify that a provided solution is the shortest possible solution
is to actually solve TSP. Since it takes exponential time to solve NP, the
solution cannot be checked in polynomial time. Thus this problem is NP-hard, but
not in NP.
In general, for a problem to be NP-complete it has to be a “decision problem”,
meaning that the problem is to decide if something is true or not. There’s a
simple variation of TSP called “decision TSP” that turns it into a decision
problem. Imagine that instead of finding the shortest loop going through all
cities, your goal is to determine if there exists any loop whose total length is
less than some fixed number. For example, the question might be: is there a loop
that goes through all of these cities, whose total distance is less than 100 km?
In the negative case this is just as hard as regular TSP, because you’d end up
testing all possible paths. But there’s an important difference: the solution
can be verified in linear time by adding up all of the distances making up the
path, and that’s what makes this variant part of NP. There’s a straightforward
proof that the decision variant is also NP-complete, by showing that there’s an
equivalence between the TSP decision problem and
Hamiltonian path problem.
What Does NP-hard Really Mean?
Earlier I mentioned that the definition of NP-hard as “problems harder than P”
is informal. In nearly all cases this is sufficient, but it’s not technically
accurate. Formally, a problem is NP-hard if given
an oracle machine for the
problem, all other problems in NP could be solved in polynomial time.
The best known example of a problem that is in NP, but thought not to be
is integer factorization.
It’s trivial to verify that the factorization of a number is correct, simply by
taking the product of the factors given to you. This puts integer factorization
in NP. However, it is widely believed that integer factorization is not NP-hard
(and thus not NP-complete), because there doesn’t appear to be any equivalence
between integer factorization and other NP-complete problems.
Integer factorization is therefore in a weird complexity class. We think it’s
not NP-hard, and intuitively it feels “easier” than NP-hard problems like 3-SAT.
A lot of smart people have looked at integer factorization, and no one has found
a polynomial time algorithm for integer factorization. Thus it is probably
“in-between” P and NP-hard, but not one really knows for sure.