

Hadoop or not to Hadoop


I have been a relational database man for many years. When you said the word “data”, only one thing popped into my mind – a relational database. Nowadays things are changing. With so many disparate data sources out there, it is becoming increasingly difficult to harvest that data, let alone use it in some meaningful way. I know Google had this challenge and developed MapReduce. Has anyone ever used Hadoop? My first read of this elephant is that a lot of development must still occur to harvest the data, and then tons of minutiae must be written in Java just to organize the information into some usable format.

I would really like the opinion of folks who have actually installed it (in a cloud – not locally) and then developed something useful with it. My gut tells me this has some serious potential. However, I don’t want to go down this path and install all the components unless I can be convinced this is something I cannot do without.

I think I understand the cloud piece. You purchase that aspect of it from Amazon or Rackspace (I am not sure about Windows Azure yet), and that handles the infrastructure side of things. With that out of the way, and the client and all the goodies that go with it downloaded, you are ready to go. Now what? Do you really have to start developing in Java to craft your map and reduce segments? Do you really have to think through the whole workflow of which output goes into which input? Again, my gut tells me you do. I am simply trying to justify learning this and then assisting small to mid-size organizations in implementing it.
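
To make the map and reduce question concrete, here is a minimal sketch of the idea in plain Java. It does not use Hadoop's API at all: the class and method names (WordCountSketch, map, reduce, run) are made up for illustration, and the step where map output is grouped by key before it reaches reduce is simulated with an in-memory TreeMap. Word counting is only the usual toy example, not something taken from a real project.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the map -> group -> reduce flow.
public class WordCountSketch {

    // Map step: turn one line of text into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: collapse every value seen for one key into a single total.
    public static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Driver: group the map output by key, then feed each group to reduce.
    // This grouping is the "what output goes into what input" part.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {hadoop=2, not=1, or=1, to=2}
        System.out.println(run(List.of("to hadoop or not to hadoop")));
    }
}
```

In this toy version the only thing you have to think through is the key: whatever map emits as the key decides which values land together in reduce. A real Hadoop job would presumably add job configuration and input/output formats on top of this, which the sketch deliberately leaves out.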

Thanks for at least reading this far. Comments welcome.