What is Hadoop and what is the technology behind it? Looking at a Key Business Analytics Platform

If you are considering a career in business analytics, a working knowledge of Hadoop – a platform for handling Big Data – goes a long way. Hadoop is in wide use around the world and in the San Francisco Bay Area, where the demand for business analytics professionals is the highest (according to Forbes).

Big Data has transformed businesses and requires a new class of data intelligence professionals who can meet the challenge of interpreting it. But there are also technical challenges to working with Big Data. Organizations ingest an enormous amount of data every millisecond, most of which is unstructured and cannot be handled by conventional databases. The servers needed to “crunch” these data are expensive and can be difficult to implement. Among the solutions to these challenges is Hadoop, an open source framework that uses a variety of tools and techniques to peer into Big Data and give decision-makers better insight. Hadoop is a component of the curriculum of Golden Gate University’s new master’s degree in Business Analytics because it is geared toward Big Data and provides a resource to professionals who specialize in its interpretation.

We asked GGU alumnus and Senior Hadoop Administrator at UnitedHealth Group, Ken Nakagawa, to answer a few questions about Hadoop.

How would you explain what Hadoop is if someone asked you at a bus stop or on Caltrain?

Hadoop is free open source software that allows companies to store and analyze data that was probably not utilized before because of the cost of hosting many proprietary servers, as well as the processing speed needed to examine large data sets.

The longer answer is that Hadoop has the significant advantage of being able to analyze unstructured data such as log files, chat conversations, and tweets. The amount of data generated is so enormous that conventional large computer systems and relational databases cannot keep up or provide a cost-efficient solution. Another significant advantage is that Hadoop is open source and runs on commodity servers. Previously, companies such as IBM, HP, and Sun Microsystems offered data warehouse products for storing large amounts of data, but they were usually very expensive.

What method does Hadoop use to make data crunching faster?

Our Hadoop cluster has about 370 nodes (servers), and the combined storage is about three petabytes. At my company, we are ingesting between 9 and 12 terabytes of data a day. Hadoop makes data crunching faster by combining a set of commodity PC servers into a cluster that can act like one giant computer. Hadoop assigns a part of a job to each server within the cluster; each server works on its part, and Hadoop then combines the partial results and presents them as a whole. Clustering nodes – rather than buying a large server – provides scalability depending on your need. You can easily scale from three nodes up to thousands. It is both technically efficient and cost effective.
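To make the "split the job, work on the parts, combine the results" idea concrete, here is a minimal Python sketch. It is not Hadoop code and does not use any Hadoop API: the "nodes" are simulated as plain lists on a single machine, and the function names and sample log lines are invented purely for illustration.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, num_nodes):
    """Deal records round-robin into num_nodes chunks, one per simulated node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_levels(chunk):
    """Work done by one simulated node: count log levels in its own chunk only."""
    return Counter(line.split()[0] for line in chunk)


def combine(partials):
    """Merge the per-node partial counts into a single overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample data standing in for unstructured log files.
log_lines = [
    "ERROR disk full",
    "INFO job started",
    "WARN slow response",
    "INFO job finished",
    "ERROR timeout",
    "INFO heartbeat",
]

chunks = split_into_chunks(log_lines, num_nodes=3)    # assign a part to each node
partials = [count_levels(chunk) for chunk in chunks]  # each node works on its part
totals = combine(partials)                            # combine into one answer
print(dict(totals))
```

In a real cluster the per-chunk step would run on different servers at the same time, which is where the speedup comes from; in this sketch it runs sequentially and only shows the shape of the computation.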

Can you describe a particular Hadoop project that was memorable?

eBay is a big Hadoop user. All those items you see for sale on their site are stored as unstructured data in a Hadoop database. When I was a consultant, eBay was one of my most memorable customers because their business is so integrated with Hadoop.

How does Hadoop make sense of unstructured data?

Hadoop can store unstructured data and have databases like Apache HBase™ serve it up. You can store, query, and even modify the data just as you would with a relational database, and retrieve it just as quickly.
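As a rough illustration of storing, querying, and modifying loosely structured records by key, the sketch below models a table in which each row is found by a row key and may carry its own set of columns. It is an in-memory toy, not the HBase client API; the class name, method names, and sample listings are all hypothetical.

```python
class ToyTable:
    """In-memory stand-in for a row-key table. NOT the HBase client API."""

    def __init__(self):
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        """Store or overwrite one column value in the given row."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Return a copy of every column stored for one row (empty if absent)."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, prefix):
        """Return all rows whose key starts with prefix, in sorted key order."""
        return {
            key: dict(cols)
            for key, cols in sorted(self._rows.items())
            if key.startswith(prefix)
        }


# Hypothetical listings: rows do not have to share the same columns.
table = ToyTable()
table.put("item#1001", "info:title", "Vintage camera")
table.put("item#1001", "info:price", "120.00")
table.put("item#1002", "info:title", "Road bike")
table.put("item#1002", "info:condition", "used")

table.put("item#1001", "info:price", "99.00")  # modify an existing value

print(table.get("item#1001"))
print(list(table.scan("item#")))
```

The point of the sketch is only that rows are looked up by key and that each row can have a different shape, which is what lets this kind of store hold data that does not fit a fixed relational schema.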

Why would a business person need hands-on knowledge of Hadoop – outside of the technical side which you inhabit?

I think a Business Analytics specialist will have an advantage if they study or have experience with Hadoop and its analytic tools. If business people can get hands-on experience accessing all that data, they usually find new information and patterns. For a big company, it can give you access to other departments’ data that you didn’t have before. You will have a greater source of data to work with!

Ken Nakagawa (MS ’02, Database Development and Administration) is the Senior Security Consultant Hadoop Administrator at UnitedHealth Group.
