Like everyone present during the taping of the shows (to be broadcast on February 14, 15 and 16), Baker signed what he describes as a "Draconian" non-disclosure agreement. What sets Baker apart from everyone else in the audience that day is the fact that he will reveal the in-depth, behind-the-scenes story of Watson in great detail.
Author of the notable 2008 big-data-computing exposé "The Numerati," Baker has spent the last year researching and writing Final Jeopardy: Man vs. Machine and the Quest to Know Everything, set for release on February 17, the day after the matches conclude. In fact, the first 11 chapters are already available as an e-book, with the 12th and final chapter to be downloadable the day the print edition is released.
Baker couldn't discuss the outcome of the matches. But in a pre-show interview with InformationWeek, he shared details on the wake-up call IBM had in the early days of the project and the disagreements the company had with Jeopardy's producers. Baker also shared his take on why the "contrivance" of a TV quiz-show challenge is a serious matter for IBM and what's likely to come of the technology breakthroughs.
A Darwinian Experiment
You might think IBM could assemble all the technology needed to run Watson from its vast portfolio of commercial computing technologies, but Baker says that's not the case. Sure, it runs on off-the-shelf IBM Power 750 servers, but much of the analytic software -- the heart of Watson's brain -- was developed from scratch.
Back in 2007, when the project was in its opening stages at IBM Research, David Ferrucci, the computer scientist leading the challenge, feared that the project would either fail miserably or, worse, be discovered and outdone by some basement hacker taking a novel approach.
To set up internal competition that would subject his researchers to "Darwinian pressures," as Baker writes in his book, Ferrucci assigned James Fan, a recent doctoral grad and new researcher on his team, to develop a system that would take on IBM's best incumbent question and answer technology, a platform called Piquant.
The two rival teams, Fan and a group of veteran researchers experienced with Piquant, were given four weeks to develop and train a system on the same set of 500 sample Jeopardy clues. The Piquant developers could draw from an existing platform based on years of research, but Fan had to cobble a system together completely from scratch. To do so, he combined entity extraction technologies (software that can spot people, places, things, dates and concepts) with lots of algorithms -- some sophisticated, some incredibly crude.
In one example, he developed a program that would submit the Jeopardy clue as a Google search and then take the title of the first Wikipedia page appearing in the result set as the correct answer. "Some of Fan's algorithms were seemingly dumb, but they worked on a limited domain of questions," Baker explains. "You can populate the machine with lots of algorithms, each with its own specialty; if each one can bring back a small percentage of correct answers, then you can create a whole ecosystem in which a bunch of algorithms deliver a bunch of answers, and an analytical system can then determine which should be trusted."
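Baker's description of that heuristic translates almost directly into code. Here is a minimal Python sketch of the idea; the web_search helper is a hypothetical stand-in for whatever search backend Fan actually used, not IBM's code:

# Sketch of one crude candidate-answer generator in the spirit of Fan's
# approach: search the web for the clue and treat the title of the first
# Wikipedia result as a candidate answer. `web_search` is a hypothetical
# placeholder for a real search backend.

def web_search(query: str) -> list[dict]:
    """Hypothetical helper: returns search results as dicts with a 'url' key."""
    raise NotImplementedError("plug in a real search backend here")

def first_wikipedia_title(clue: str) -> str | None:
    """Return the title of the first Wikipedia page in the results, if any."""
    for result in web_search(clue):
        if "wikipedia.org/wiki/" in result["url"]:
            # Wikipedia page names use underscores; make them readable.
            return result["url"].rsplit("/", 1)[-1].replace("_", " ")
    return None

On its own, such a generator is a weak player; the point is that it only needs to be right in its narrow niche, with other specialists covering other kinds of clues.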
In the resulting bake-off, held in March 2007, the Piquant-based system answered only 30 percent of the clues correctly. But Fan's system did nearly as well.
"The bake-off proved that Piquant was not up to snuff, and Ferrucci concluded they were going to have to build an entirely new and much more ambitious platform if they were going to succeed," says Baker.
The first breakthrough, adopting elements of Fan's approach, was combining many algorithms and then correlating and scoring confidence in the myriad answers they produced. Indeed, this "ensemble" idea is not entirely new, and it has cropped up elsewhere in recent years. For example, ensemble analysis was used by many of the leading contestants in the 2009 Netflix Prize competition.
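To make the ensemble idea concrete, here is a minimal Python sketch, under the assumption that each specialist answerer returns candidate answers with confidences; the weighting scheme is illustrative only and is not the architecture IBM actually built:

from collections import defaultdict
from typing import Callable

# Each specialist answerer maps a clue to (candidate_answer, confidence) pairs.
Answerer = Callable[[str], list[tuple[str, float]]]

def ensemble_answer(clue: str, answerers: list[Answerer],
                    weights: list[float]) -> tuple[str, float]:
    # Pool every specialist's candidates, then score each candidate by the
    # weighted confidence of all the answerers that proposed it.
    scores: dict[str, float] = defaultdict(float)
    for answerer, weight in zip(answerers, weights):
        for candidate, confidence in answerer(clue):
            scores[candidate.lower()] += weight * confidence
    best = max(scores, key=scores.get)
    return best, scores[best]

A heuristic like the Wikipedia-title generator above would be wrapped as one such answerer, alongside dozens of others with different specialties.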
There's much more to the technology development story (as explored in this story and covered in great detail in Baker's book). But IBM Research spent much of 2008 and 2009 adding millions of lines of new code to Watson's analysis and scoring software. Another breakthrough was the addition of a feedback loop that enabled Watson to learn from correct and incorrect answers supplied by both humans and the computer itself.
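That feedback loop can be pictured, in deliberately simplified form, as re-weighting the specialist answerers according to which of their candidates turned out right or wrong on training clues. The perceptron-style update below is an assumption made for illustration, not IBM's actual learning machinery:

def update_weights(weights, answerers, clue, correct_answer, lr=0.05):
    # Nudge each answerer's weight up when it backed the correct answer and
    # down when it backed a wrong one, so trustworthy specialists gain influence.
    new_weights = list(weights)
    for i, answerer in enumerate(answerers):
        for candidate, confidence in answerer(clue):
            signal = 1.0 if candidate.lower() == correct_answer.lower() else -1.0
            new_weights[i] += lr * signal * confidence
        new_weights[i] = max(new_weights[i], 0.0)  # keep weights non-negative
    return new_weights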
The Show Agreement Challenge
The IBM Research team gained confidence as Watson's Jeopardy performance steadily improved, but it took two demonstrations and a bit of hard negotiation with Jeopardy's producers to hammer out the details of the competition. After an initial agreement was reached in early 2010, Jeopardy's producers introduced new requirements.
"Jeopardy wanted the computer to have a physical finger so it would have to press a button just like the humans," Baker explains. "IBM had done all its testing with Watson buzzing in purely electronically, so they were upset and felt that Jeopardy was trying to graft human limitations onto a computer."
From IBM's perspective, its developers were building a brain, but Jeopardy's producers were trying to turn it into a robot. In the interest of perceived fairness, according to Baker, IBM ultimately acquiesced and gave Watson an electromechanical actuator -- an equivalent to the buzzers that human contestants use.
In another disagreement, Ferrucci's team wanted assurances that Jeopardy's writers could not bias the clues in favor of the human contestants.
"Conceivably, writers could fill the game with all kinds of puns and riddles designed to foil a machine," Baker says. "IBM feared the writers would make it a Turing test so that instead of a game of Jeopardy it would become a test to see if the machine could pass for a human."
Jeopardy assured IBM that the clues for an entire season's worth of episodes had already been written, but the show's producers agreed to the extra precaution of setting aside 30 sets of clues and using a third-party company to randomly choose those to be used in the human-vs.-machine episodes.
Publicity Backlash?
As IBM and Jeopardy both hoped, the challenge has generated plenty of publicity. There was a barrage of international news coverage following a January 13 press conference. The social networks have been buzzing ever since, and then on February 9, PBS aired a Nova special entitled "Smartest Machine on Earth: Can a Computer Win on Jeopardy?" No doubt there will be another round of hoopla around this week's episodes and the final outcome.
There are also signs of alarm and a bit of a backlash. Critical social media comments and letters to editors have questioned whether IBM could have put the tens of millions of dollars spent on Watson toward a loftier goal than playing Jeopardy.
But even if the Jeopardy Challenge is "a gimmick and contrivance," that's okay, Baker counters. "So many people now complain that companies are purely driven by quarterly profits and that pure research has faded away or is going into developments that can be monetized immediately. Here's IBM saying, 'We're going to develop a new type of computing that might have all kinds of very useful applications,' so I don't see anything to criticize in that."
Of course, Baker, too, stands to benefit as Jeopardy Challenge publicity will boost sales of his book. But IBM needs the publicity in part because "all the sexy stuff is now being developed by the likes of Apple and Google," Baker says. "To attract investors and top PhD students coming out of the top programs, they want to show them that you can do really cool stuff at IBM. They are competing for those brains."
Watson in the Real World
There's no doubt that IBM has advanced the science of deep question-and-answer technology with an English-language-interpreting interface. But what's to come of this development?
"The question is whether the interface can be hitched to a flexible back-end system that could provide different types of analytics for different industry needs at the right price point," Baker says.
There's no guarantee that IBM will be the company that ends up taking commercial advantage, Baker says.
"My conclusion in the book is that this type of computing is going to spread and it's going to become available through cell phones fairly soon, but Google or Oracle could end up providing these types of analytics," Baker says.
IBM has been playing up possible medical uses, whereby a Watson-like computer becomes a diagnostic aid to doctors. Baker says initial uses are likely to be much more mundane, like powering help-desk software.
As for the fear of computers that the challenge has stoked, IBM and Jeopardy have played that up to add to the fun and spectacle of the event, Baker says. In the real world, he says he has no doubt that the technology will ultimately be used for the benefit of humans.
"Watson is not that smart, but it's very powerful," Baker explains. "It can't make decisions, it doesn't really understand, and it doesn't really think, but it can read through ridiculous amounts of data and come up with possible answers with a known degree of confidence."
"Watson often serves up some answers that are way off track, but human experts can discount those," Baker says. "If the technology can come up with two or three good ideas that are interesting and that lead to new lines of inquiry, then the humans would be the smart ones to take advantage of the tool."
Need more convincing? Check your local listings to watch Watson and his human counterparts in action on Jeopardy this week.