Tuesday, July 28, 2015

Almost big data (analysis without Hadoop)

Everyone likes to feel like the Big Man on Campus, and if they aren't, they're looking for a campus of the appropriate size where they can stand out. So it's no surprise that when the words "big data" started flowing through the executive suite, the suits started asking for the biggest, most powerful big data systems as if they were purchasing a yacht or a skyscraper.
The funny thing is, many problems aren't big enough to need the fanciest big data solutions. Sure, companies like Google or Yahoo track all of our Web browsing; they have data files measured in petabytes or even exabytes. But most companies have data sets that can easily fit in the RAM of a basic PC. I'm writing this on a PC with 16GB of RAM, enough for a billion events of about 16 bytes each. For most algorithms, the data doesn't even need to be read into memory, because streaming it from an SSD is fast enough.
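To make that concrete, here is a rough Python sketch of both points: the back-of-the-envelope arithmetic for 16GB and a billion events, and a one-pass tally that streams records instead of loading them all. The record layout (a user name followed by an action), the `count_events` helper, and the `events.csv` file name are made up for illustration; they are not from any particular system.

```python
import csv
import io
from collections import Counter

# Back-of-the-envelope check: 16 GB spread over a billion events.
RAM_BYTES = 16 * 10**9
EVENTS = 10**9
BYTES_PER_EVENT = RAM_BYTES // EVENTS  # a handful of bytes per event


def count_events(lines):
    """Tally events per user in a single pass.

    Only the running counts stay in memory; each row is discarded
    as soon as it has been counted.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if row:  # skip blank lines
            counts[row[0]] += 1
    return counts


# Tiny in-memory stand-in for a large log file (hypothetical data).
# For a real file, pass open("events.csv", newline="") instead.
sample = io.StringIO("alice,click\nbob,view\nalice,view\n")
totals = count_events(sample)

print(BYTES_PER_EVENT)        # 16
print(totals.most_common(1))  # [('alice', 2)]
```

Because the loop reads one row at a time, memory use depends on the number of distinct users, not on the size of the file, so the same code works whether the log is three lines or three hundred million.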
Some jobs will demand the fast response times of dozens of machines running in parallel in a Hadoop cluster, but many will do just fine plugging along on a single machine without the hassles of coordination or communication.
