Posts Tagged ‘aws’

In 2000, Canada’s Parliament passed the Personal Information Protection and Electronic Documents Act (PIPEDA), which the Privacy Commissioner of Canada oversees. The Act sets out the legal rules for collecting and storing (read: protecting) personal information about Canadians.

Separately, as part of its homeland security response to the terrorist attacks of 9/11, the US government expanded its power to subpoena data with the “Patriot Act”. It can subpoena data from companies and gag them from even notifying account holders. The public doesn’t know how often this happens, but we do have a high-profile example in the news right now: Twitter Shines a Spotlight on Secret F.B.I. Subpoenas.

Since these two legal frameworks were established, cloud computing has become broadly accepted, thanks in part to excellent platforms like Amazon Web Services. None of the major cloud platforms have data centers in Canada, which means there is a potential collision between PIPEDA and US law that would complicate cloud usage for services that collect and store Canadian consumer data.

Should Canadian companies be nervous about storing data with cloud providers in the US? I turned up no definitive answer, but here are some interesting tidbits from the Canadian government.

  • “PIPEDA does not prohibit organizations in Canada from transferring personal information to an organization in another jurisdiction for processing.”
    “In an investigation into a complaint involving outsourcing to a U.S. firm by CIBC Visa, the OPC found CIBC to be in compliance with PIPEDA.”–Guidelines for Processing Personal Data Across Borders
  • “PIPEDA does not hinder our global economy. In fact, the legislation itself states that it is intended to support and promote electronic commerce by protecting personal information.”
    “The organization needs to use contractual or other means to provide a level of protection comparable to PIPEDA while the information is being processed by a third party.”–Canadian Federal Privacy Legislation: The First Ten Years (Sept 2010)

Sounds like some wiggle room to me. Executives need to weigh the risk of being sued in Canada (as CIBC Visa was sued in 2004) against the significant reward of moving their systems into the cloud.

CentriLogic, a cloud/hosting company, is trying to capitalize on the situation. Their marketing pitch, writ large in all-caps on their homepage: Do you know where your cloud servers are?

Unfortunately, the CentriLogic platform is nowhere near competitive with Amazon’s, which continues to impress me with frequent and significant upgrades. (AWS is this CTO’s dream come true!)

How can Canada-focused companies deliver consumer products and services on a par with their US counterparts if they can’t leverage modern cloud platforms?

Hat tip to lawyer friends Rob Hyndman and Jonas Brandon for their perspectives as I looked into this.

Google crawls the web, caches and indexes the pages it finds into a database, and provides a consumer user interface to search that database. What if we could build other applications on top of that same database?

Without access to Google’s copy of the web, other companies have to crawl and index the web themselves in order to provide their services. Some examples:

  • Attributor lets large publishers find video, image, and text copyright infringements.
  • TinEye lets users upload an image and see where it is used online.
  • MajesticSEO lets website owners track backlinks to their pages.

Amazon has a growing list of Public Data Sets. What if they could provide cached “views” of the web that could be processed using EC2 or Elastic MapReduce? That would allow more entrepreneurs to think big about using the whole of the web as a data set.
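To make the idea concrete, here is a rough sketch of the kind of job I have in mind: a MapReduce-style backlink count over a set of cached pages. It is plain Python rather than actual Elastic MapReduce code, and the `CACHED_PAGES` dictionary, the example URLs, and the function names are all made up for illustration. No such cached-web data set exists on AWS as far as I know.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def map_page(url, html):
    """Map step: emit (target, source) for each outbound link on one page."""
    parser = LinkExtractor()
    parser.feed(html)
    for href in parser.hrefs:
        # Resolve relative links against the URL of the page they appear on.
        yield urljoin(url, href), url


def reduce_backlinks(pairs):
    """Reduce step: group the source pages by the target they link to."""
    backlinks = defaultdict(set)
    for target, source in pairs:
        backlinks[target].add(source)
    return backlinks


def count_backlinks(pages):
    """Run the map and reduce steps and return {target_url: backlink_count}."""
    pairs = (pair for url, html in pages.items() for pair in map_page(url, html))
    return {target: len(sources) for target, sources in reduce_backlinks(pairs).items()}


# Hypothetical cached pages; a real "view" of the web would be vastly larger.
CACHED_PAGES = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="/about">About</a>',
    "http://c.example/": '<a href="http://b.example/">B again</a>',
}

if __name__ == "__main__":
    for target, count in sorted(count_backlinks(CACHED_PAGES).items()):
        print(target, count)
```

The map step only ever looks at one page and the reduce step only groups by key, so the same two functions could in principle be spread across many machines. That is the appeal of having the cached pages sitting next to the compute.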

Amazon’s Public Data Sets already include some cached versions of Wikipedia. Companies (like Freebase) and researchers (such as Jun Liu & Sudha Ram) seem to use them. I wish we had such a queryable data store for all websites.

Until we have such a data source and platform, here’s Ilya Grigorik’s excellent presentation on Building a Mini-Google in Ruby to do it ourselves.
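For a feel of the core idea, here is a toy version of the index-and-search part of the pipeline. To be clear, this is not the code from Ilya’s presentation (his is in Ruby); it is a minimal Python sketch that assumes the pages have already been crawled into a dictionary, with invented URLs and text, and it does no ranking at all.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query, sorted by URL."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the URL sets of all query words; unknown words match nothing.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical crawled pages (URL -> extracted text).
PAGES = {
    "http://a.example/": "Cloud computing in Canada",
    "http://b.example/": "Cloud platforms and privacy law",
}
INDEX = build_index(PAGES)

if __name__ == "__main__":
    print(search(INDEX, "cloud privacy"))
```

A real search engine adds crawling, ranking, and storage that doesn’t fit in memory, which is exactly the part I’d rather rent than build.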

An aside: For simple site search problems, I prefer the design of Bing’s API over Google’s Site Search. We use Bing for LearnHub Search.

