Filed in Partner & Customer Updates by John McKenna | May 31, 2012 12:15 pm
Diffbot is a technology company focused on enabling the next generation of smarter software by applying cutting-edge research to the problem of understanding web content. It is a robot that uses techniques from computer vision, machine learning and natural language processing to identify and extract the important parts of web pages. Diffbot provides an API so that developers’ applications can read web pages the way a human reader would. Diffbot’s goal is to understand the semantics of every page on the web.
Born in the Stanford PhD program, Diffbot was initially conceived to help founder Michael Tung track changes to his course’s web pages. Tung was working on computer vision and artificial intelligence, and decided to apply the same technologies to understanding web pages. Diffbot went on to become one of the inaugural startups funded by Stanford’s accelerator, StartX. “Our goal with Diffbot is to understand every corner of the web, and make every bit of it accessible for developers trying to create new, rich applications and experiences,” explains Tung, who is also Diffbot’s CEO.
Diffbot officially launched its first two on-demand APIs last fall. The Front Page API analyzes home and index pages by identifying common layout elements, including headlines, bylines, images, articles, ads and more. The Article API extracts clean article text and related images and videos from news and blog pages, and generates unique cross-referenced tags for them. Diffbot now handles more than 100 million API calls per month from thousands of developers.
“We think the future of the web is one based around its objects, the important content items on pages or in applications, rather than the pages themselves,” says Diffbot Vice President of Product John Davi. “As a precursor to this, we’ve categorized the web into approximately 20 different page types that can be visually analyzed and itemized, including everything from product and review pages to social networking profiles and recipes. There are an additional 18 or so page types we’ve defined, so our job is to take our core visual learning algorithms, and expand our API offerings to cover these additional page types.”
Recently, Diffbot secured a $2 million investment from technology veterans including Sky Dayton, founder of EarthLink; Andy Bechtolsheim, co-founder of Sun Microsystems; Joi Ito, director of MIT Media Lab; Brad Garlinghouse, CEO of YouSendIt; and other executives and founders from Facebook, Twitter and Yahoo, with participation from Matrix Partners. The funding will be used to rapidly expand Diffbot’s core team of machine-learning and natural-language-processing experts, build out infrastructure and release additional API offerings, with the new investors and advisors on board to help build Diffbot for scale.
“Diffbot is an incredibly sophisticated tool for developers to rapidly build compelling applications around web content,” adds Dayton, founder of EarthLink and Boingo. “The more developers use Diffbot, the more it learns about and adds structure to data on the web. This technology is becoming the basis for a new kind of web experience enhanced by machine interpretation of content.”
The Rackspace Startup Program congratulates Diffbot, creators of visual learning robot technology that lets developers analyze, extract and enhance web content, on securing funding to take the company to the next level. Are you ready to take your startup to the next level? If so, the Space Cowboys are here to help with the same world-class cloud computing platform on which Diffbot was built. Let us know what you’re building and we’ll let you know how we can help provide the rocket fuel to launch it!
Source URL: http://www.rackspace.com/blog/rsp-spotlight-on-diffbot-the-visual-learning-robot-for-the-web/
Copyright ©2015 The Official Rackspace Blog unless otherwise noted.