{"id":21,"date":"2010-10-06T19:42:59","date_gmt":"2010-10-07T02:42:59","guid":{"rendered":"http:\/\/www.skierpage.com\/blog\/?p=21"},"modified":"2010-10-06T19:42:59","modified_gmt":"2010-10-07T02:42:59","slug":"web-computer-nell-learns-by-reading","status":"publish","type":"post","link":"https:\/\/www.skierpage.com\/blog\/2010\/10\/web-computer-nell-learns-by-reading\/","title":{"rendered":"web: computer NELL learns by reading"},"content":{"rendered":"<p>The <a href=\"http:\/\/rtw.ml.cmu.edu\/rtw\/\">Read the Web team at CMU<\/a> have come up with NELL, a computer program that learns by reading web pages, extracting facts and improving its reading ability as it does so.<\/p>\n<p>As I <a href=\"\/blog\/2006\/06\/web-knowledge-and-semantics.html\">blogged a few years ago<\/a>, we&#8217;ve been here before, starting with the <a title=\"Wikipedia page for &quot;Cyc&quot;\" href=\"http:\/\/en.wikipedia.org\/wiki\/Cyc\">Cyc<\/a> project. After 26 years, Cyc has struggled to get to 170,000 facts. The NELL team are proud to have reached 440,000 beliefs, but I suspect the whole project will get sludgy and confused as it tries to turn the richness of language into a set of cut-and-dried belief statements. As I <a href=\"http:\/\/gizmodo.com\/comment\/30300878\">commented elsewhere<\/a>, the more meanings you patiently explain to these systems, the less they know. Very quickly they learn that &#8220;New York&#8221; is a &#8220;city&#8221;, which is a geographical and governmental entity, but what the hell do they do with the sentence \u201c<em>I&#8217;m in a New York state of mind<\/em>\u201d?! The context for meaning is vast; it takes years of hard living in the real world to gain meaning from many sentences about the world. (Hence the AI researchers trying to raise a robot like a baby.)<\/p>\n<p>Also, many of the things that NELL has figured out are already well explained and codified in information sources on the web, such as the mighty Wikipedia. 
The <a href=\"http:\/\/www.nytimes.com\/2010\/10\/05\/science\/05compute.html?_r=1\">New York Times article about the project<\/a> gives this example:<\/p>\n<blockquote><p>Peyton Manning is a football player (category). The Indianapolis Colts is a football team (category). By scanning text patterns, NELL can infer with a high probability that Peyton Manning plays for the Indianapolis Colts.<\/p><\/blockquote>\n<p>But just go to Wikipedia, and in <a title=\"wiki text of Wikipedia article for &quot;Peyton Manning&quot;\" href=\"http:\/\/en.wikipedia.org\/w\/index.php?title=Peyton_Manning&amp;action=edit\">the source of Peyton Manning&#8217;s page<\/a> you see <tt>[[Category:American football quarterbacks]]<\/tt> and, in the template, <tt>{{Infobox NFLactive ... |currentteam=Indianapolis Colts}}<\/tt>! Almost every notable fact simple enough for NELL to learn is already codified on Wikipedia! The <a href=\"http:\/\/en.wikipedia.org\/wiki\/DBpedia\">DBpedia project<\/a> already takes such semi-structured data from Wikipedia pages and maps it to computer-readable semantic statements using standard vocabularies like rdf:Description, skos:subject, dbprop:currentteam, etc. DBpedia has millions of such bits of information about the millions of things that have Wikipedia pages. Does it know more than NELL&#8217;s 440,000 facts? Well, what does it mean to &#8220;know&#8221; something anyway? What does it mean to mean something? What is what? Huh?<\/p>\n<p>It&#8217;s cool that NELL has amassed so many beliefs by reading, but that&#8217;s dwarfed by the millions of machine-readable &#8220;facts&#8221; already out there. NELL knows enough to get confused and require human correction, but that&#8217;s a weak kind of intelligence. 
If NELL or DBpedia can&#8217;t do original research or come up with insights, then is either system better than Googling &#8220;<a title=\"Google search\" href=\"http:\/\/www.google.com\/search?q=What+is+Peyton+Manning's+football+team%3F\">What is Peyton Manning&#8217;s football team?<\/a>&#8221; and scanning the results for an answer?<\/p>\n<p>When Doug Lenat started Cyc in 1984, it was all about reading a \u201cnewspaper\u201d, a quaint set of articles printed on a dead tree. Now it&#8217;s all social and webified. You can read <a href=\"http:\/\/twitter.com\/#!\/cmunell\">NELL&#8217;s tweet stream<\/a> and see it get things right, and wrong:<\/p>\n<blockquote><p>I think &#8220;Steel Mobile Phone&#8221; is a <a href=\"http:\/\/twitter.com\/#search\/%23buildingmaterial\" target=\"_blank\">#buildingmaterial<\/a> (<a title=\"http:\/\/bit.ly\/b4V0HU\" rel=\"nofollow\" href=\"http:\/\/bit.ly\/b4V0HU\" target=\"_blank\">http:\/\/bit.ly\/b4V0HU<\/a>)<br \/>\nI think &#8220;dutchtown high school&#8221; is a <a href=\"http:\/\/twitter.com\/#search\/%23sportsteam\" target=\"_blank\">#sportsteam<\/a> (<a title=\"http:\/\/bit.ly\/bCh4YY\" rel=\"nofollow\" href=\"http:\/\/bit.ly\/bCh4YY\" target=\"_blank\">http:\/\/bit.ly\/bCh4YY<\/a>)<br \/>\nI think &#8220;doubletree hotel and waltham&#8221; is a <a href=\"http:\/\/twitter.com\/#search\/%23hotel\" target=\"_blank\">#hotel<\/a> (<a title=\"http:\/\/bit.ly\/aanr6j\" rel=\"nofollow\" href=\"http:\/\/bit.ly\/aanr6j\" target=\"_blank\">http:\/\/bit.ly\/aanr6j<\/a>)<\/p><\/blockquote>\n<p>Scanning NELL&#8217;s apparent representation:<\/p>\n<ul>\n<li>The team&#8217;s scrunchedtogethernames willfully ignore Wikipedia&#8217;s friendly naming approach, e.g. 
<a title=\"Wikipedia page for &quot;Building material&quot;\" href=\"http:\/\/en.wikipedia.org\/wiki\/Building_material\">Building_material<\/a>.<\/li>\n<li>I wonder how NELL will handle naming conflicts. Again, Wikipedia has a fine approach: Steel (band), Steel (comics), Steel (film), etc. NELL needs to learn to disambiguate its names by matching Wikipedia articles; otherwise it&#8217;s going to wind up terribly confused between all the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Michael_Jackson_%28disambiguation%29\">Michael Jackson<\/a>s out there.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>NELL follows in the footsteps of Cyc, but Wikipedia already codifies all the easy facts. <a href=\"https:\/\/www.skierpage.com\/blog\/2010\/10\/web-computer-nell-learns-by-reading\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","footnotes":""},"categories":[24,25],"tags":[],"class_list":["post-21","post","type-post","status-publish","format-standard","hentry","category-search","category-semantic-web-web"],"_links":{"self":[{"href":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/posts\/21","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/comments?post=21"}],"version-history":[{"count":0,"href":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/posts\/21\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/media?parent=21"}],"wp:term":[{"taxonomy":"category","embeddable":true,"h
ref":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/categories?post=21"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.skierpage.com\/blog\/wp-json\/wp\/v2\/tags?post=21"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}