Installing HDInsight
It’s been a while since I’ve had the opportunity to blog so when I decided to install HDInsight on a VM, I figured what better opportunity to get back in the swing of it. The Jumping Off Point To get...
Installing Mahout for HDInsight on Windows Server
I am passionate when it comes to analytics, data mining and machine learning and I think most organizations do too little when it comes to this arena. That’s why one of my favorite parts of the Hadoop...
HIVE on HDInsight: First Glance
Hive Introduction Within the Hadoop ecosystem, you can use HDFS to load and store data and MapReduce to do both simple and hardcore processing. One of the missing pieces to the puzzle that is familiar...
Preparing Data for Hadoop
In my next couple of blog entries, I will be focusing on PIG and then MapReduce. Before that, however, I need to prepare a dataset and get it loaded into HDFS. The data that I will be working with is...
When Pigs Fly: An Apache Pig Introduction
In previous posts, we have looked at what it takes to get started with Hadoop on Windows using HDInsight. We also looked at Hive, which is the data warehousing framework built on top of Hadoop. In...
Shakin’ Bacon: Using Pig To Process Data
In my last post (see HERE), I introduced the Apache Pig project and showed you the equivalent of the “Hello World” demo in Pig. In this post, we are going to use the GSOD (Global Summary of the Day)...
MMM More Bacon – Pig User-Defined Functions (UDFs)
Okay…okay…I know…the pig jokes are lame and getting old by now…maybe a picture of a kitten dressed like a Pig will cheer you up. Luckily this is the last of my introductory Pig posts before moving on...
Map/Reduce – A Brief Introduction
Somewhere between teaching a BI Bootcamp class and wrestling my troop of kids, I promised myself I would get a blog post in this week. Luckily, I’ve had a few code-heavy posts, so we will dial it back...
MapReduce – First Glance
In my last post, we took a helicopter tour of the MapReduce framework and its many facets. I believe it’s important to have a functional understanding of MapReduce even if you never intend to work...
MapReduce Ninja Moves: Combiners, Shuffle & Doing A Sort
Who’s driving this car? At first glance it appears that, as a developer, you have very little, if any, control over how MapReduce behaves. In some regards this is an accurate assessment. You have no...
Hello My Name is Sqoop
In my previous posts, we have looked at different means and methods for loading and subsequently working with data in a Hadoop environment. Largely missing from the discussion to date, however, is how SQL...
Building a Mahout Recommendation Engine: Part 1 – Types of Recommenders
Recommendation Engines have become a pervasive and daily part of our digitally connected lives. Whether you’re shopping on Amazon or reading new articles on your Yahoo! home page, the products and news...
#Mahout Recommendation Engines: Part 2 – Ride the Elephant
In Part 1 of this blog series we built a foundation by introducing the various techniques that can be used to generate recommendations for products or items to your users. In this post, we begin...
#Mahout Recommendation Engines: Part 3 – Moving Data
In the previous two posts of this series we built a foundation for designing and building a recommendation engine. In the first post we built an understanding of what a recommendation engine looks...
Introduction to #Hive Collections
After a much needed vacation in the sunny Florida Keys and some time away from the work and blogosphere world, it’s time to get back on the hamster wheel. Like most RDBMS systems, Hive supports a number...
Partitions & Buckets in #Hive
In my previous post, we discussed the map, array and struct data types and their implementation in Hive. Continuing on the Hive theme, this post will introduce partitioning and bucketing as methods for...
Indexes & Views in #Hive
In my last Hive post, we introduced partitions and bucketing, both of which allow you to horizontally slice data to make it more manageable and easy to query. Staying the course, in this post we will...
Oink: Improving #Pig Development
Over the last couple (ok more than a couple) of months, we’ve taken a meandering stroll through the different parts and pieces that form the foundation of the Hadoop ecosystem. We’ve covered Hive,...
Streaming #Pig
As a C# developer there are a number of opportunities available for writing code that is either used by or interacts with a Hadoop/HDInsight cluster. A number of these have been well publicized and...
3 Little Piggy’s: Advanced #Pig Join Scenarios
One of the most common operations in any Pig job is the join. A join, much like the ones you work with in SQL Server, brings together two sets of data into one. These joins can happen in...