Roll Your Own Google

Originally in: http://www.wired.com/science/discoveries/news/2005/12/69817

December 13, 2005

In a move with potentially far-reaching implications for the search market, Alexa Internet is opening up its huge web crawler to any programmer who wants paid access to its rich trove of internet data.

Alexa, a subsidiary of Amazon.com that is best known for its traffic rankings, on Monday unveiled Alexa Web Search Platform, a set of online tools for searching, indexing, computing, storing and publishing vast quantities of net data.

Alexa claims it’s the first time that developers, students and startups will be given inexpensive access to an industrial-scale web crawler — the same technology used by industry giants like Yahoo (Yahoo Slurp) and Google (Googlebot).

“It sounds innocuous but it’s big,” said Alexa CEO Bruce Gilliat. “We’re giving access to billions of pages and computing resources…. Users have never had this opportunity before. Big industry has ruled search, because it was the only player with access to the tools.”

Alexa spiders 4 billion to 5 billion pages a month and archives 1 terabyte of data a day. The new platform will allow developers to build their own search engines.

“If it is what they claim it is, it strikes me that this is nontrivial news,” said search industry pundit and author John Battelle. “Anyone can crawl the web, but crawling and maintaining an index at scale is very difficult and very expensive. They are providing convenient access to something that was very dear.”

Battelle said the move, if it pans out as promised, could have a big impact on the search industry and might lessen Google’s growing dominance in web search.

Alexa’s offering may help “create an ecosystem (in search) where something can occur outside the Googleverse,” he said.

To illustrate the new service’s potential, Alexa developed a photo search engine that allows users to query photo metadata normally hidden from standard keyword searches, such as the date the photo was taken or the camera used.

Musipedia, another Alexa prototype, provides users with the ability to search the web by melody. Give the engine a keyword or melodic contour, and it returns similar music. Musipedia allows users to input their own whistling as a query.
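
For readers unfamiliar with the term, a melodic contour records only whether each note moves up, down or repeats relative to the one before it, which is why a whistled or transposed tune can still match. The Python sketch below illustrates that idea under stated assumptions: the U/D/R encoding follows the general Parsons-code convention (without its leading asterisk), the function names and the tiny catalog are hypothetical, and the similarity measure is Python's standard `difflib`, not anything the article attributes to Alexa or Musipedia.

```python
from difflib import SequenceMatcher


def contour(pitches):
    """Encode a pitch sequence as U (up), D (down) or R (repeat) steps."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            steps.append("U")
        elif cur < prev:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)


def similarity(a, b):
    """Return a 0..1 similarity score between two contour strings."""
    return SequenceMatcher(None, a, b).ratio()


def search(query_pitches, catalog):
    """Rank catalog titles by contour similarity to the query, best first."""
    query = contour(query_pitches)
    scored = [
        (similarity(query, contour(pitches)), title)
        for title, pitches in catalog.items()
    ]
    # Highest score first; ties broken alphabetically by title.
    return [title for score, title in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical catalog: opening notes as MIDI pitch numbers.
CATALOG = {
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}

# The same tune whistled a whole step higher still has the same contour.
whistled = [62, 62, 69, 69, 71, 71, 69]
ranked = search(whistled, CATALOG)
```

Because only the direction of each step is kept, `contour([60, 60, 67, 67, 69, 69, 67])` and the transposed `whistled` sequence both encode to `"RURURD"`, so the first catalog entry ranks first. A production engine would need pitch detection on the audio and a far larger index, neither of which is shown here.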

Gilliat predicted that Alexa’s inexpensive services will spawn numerous creative results, from computer scientists and web hobbyists alike. Services are priced at $1 per transaction, with a transaction ranging from a CPU hour of computing time to gigabytes of uploads and downloads. Gilliat said a complete web snapshot should cost a “couple thousand” dollars.

Thanks to the company’s history, Gilliat believes Alexa is well-positioned to democratize data search.

It is an interesting return to the spotlight for Alexa, the commercial cousin of Internet Archive, a nonprofit founded by Brewster Kahle that is dedicated to preserving a public index of the web and its history. Alexa donates its crawl data directly to the Internet Archive.

Alexa has been archiving the web from its Presidio of San Francisco offices since it was founded in 1996. In 1997, Alexa unveiled its toolbar, one of the first such search-specific browser add-ons, which has since registered more than 10 million downloads. Amazon acquired Alexa in 1999.

Alexa has more than a thousand machines involved in storage, access and computation, and the company expects high demand for the new service.

“Using our crawler saves massive time, money and computational power,” Gilliat said. “There are lots of really smart people out there who don’t work for a search engine, but they have good ideas, needs and desires for what they want from web search. They have an inkling, and we have the way.”

Amazon and Alexa representatives declined to speculate whether this move might compel other search engines to commercialize their crawlers.

Battelle, however, characterized the news as “Amazon casting a stone in the lake of search.”

He said Alexa’s announcement echoes other developments in recent years at Amazon, a company that prides itself on leveraging the strength of its user community.

“I have been consistently impressed by the innovative thinking there,” Battelle said. “This is the type of news you might come to expect from Amazon…. We can now sift the web and do it cheaply and frequently. This feels very Web 2.0.”
