Hadoop and Hive by Scott Leberknight

Hadoop is an open source framework, maintained by the Apache Software Foundation, for creating fault-tolerant, distributed applications that process vast amounts of data in parallel across a cluster of commodity servers. Hadoop consists of two primary components: the Hadoop Distributed Filesystem (HDFS) and a MapReduce framework. HDFS is a distributed filesystem that stores very large files efficiently and fault-tolerantly across a cluster. MapReduce is a framework that divides data processing into two distinct phases, mapping and reducing, so that a problem can be broken apart and run in parallel across many machines, speeding up data transformation and aggregation. In this talk we'll look at both HDFS and the MapReduce framework. We'll also look at one specific Hadoop subproject, Hive, which provides a data warehousing capability on top of Hadoop and allows developers and analysts to query data stored in HDFS using HiveQL, a SQL-like query language.
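
To make the two phases concrete, here is a minimal in-memory word-count sketch in Python. It only simulates the MapReduce data flow on a single machine (map each record to key/value pairs, group the pairs by key, then reduce each group); it does not use Hadoop's Java API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record.
    # Each call depends only on its own line, which is what lets
    # a real framework run many mappers in parallel.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, standing in for the
    # sort/shuffle step the framework performs between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: aggregate all values seen for a single key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the phases: map every line, group by word, reduce each group.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

Running it prints `{'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}`. Because each reduce call sees only the values for one key, reducers for different keys could also run on different machines, which is the property the framework exploits.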

Scott Leberknight is the Chief Architect at Near Infinity Corporation.
