Posts
Quality of Service in Hadoop - eBay Tech Blog

At eBay we run Hadoop clusters comprising thousands of nodes that are shared by thousands of users. We analyze data on these clusters to gain insights for improved customer experience. In this post, we look at distributing RPC resources fairly […]
24-Sep-2014
Benchmarking and Stress Testing an Hadoop Cluster with TeraSort, TestDFSIO & Co. - Michael G. Noll

How to benchmark and stress test an Apache Hadoop cluster with built-in benchmark tools such as TeraSort and TestDFSIO
11-Aug-2014
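As a rough companion to the post above, here is a sketch that drives the stock benchmarks from Java by shelling out to the hadoop CLI: a TeraGen/TeraSort/TeraValidate pass followed by TestDFSIO write and read runs. The jar locations, HDFS paths, and data sizes are assumptions that vary by Hadoop version and distribution.

import java.io.IOException;
import java.util.Arrays;

// Hypothetical driver for the built-in Hadoop benchmarks mentioned in the post above.
// Adjust the jar paths and sizes to match your Hadoop install before running.
public class ClusterBenchmark {

    // Example jar locations for a Hadoop 2.x install; these vary by version/distro.
    static final String EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar";
    static final String TEST_JAR     = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar";

    static void run(String... cmd) throws IOException, InterruptedException {
        System.out.println("Running: " + String.join(" ", cmd));
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("Command failed: " + Arrays.toString(cmd));
        }
    }

    public static void main(String[] args) throws Exception {
        // TeraGen -> TeraSort -> TeraValidate, here with 10 million 100-byte rows (~1 GB).
        run("hadoop", "jar", EXAMPLES_JAR, "teragen", "10000000", "/benchmarks/terasort-input");
        run("hadoop", "jar", EXAMPLES_JAR, "terasort", "/benchmarks/terasort-input", "/benchmarks/terasort-output");
        run("hadoop", "jar", EXAMPLES_JAR, "teravalidate", "/benchmarks/terasort-output", "/benchmarks/terasort-validate");

        // TestDFSIO write then read: 10 files of 1000 MB each.
        run("hadoop", "jar", TEST_JAR, "TestDFSIO", "-write", "-nrFiles", "10", "-fileSize", "1000");
        run("hadoop", "jar", TEST_JAR, "TestDFSIO", "-read", "-nrFiles", "10", "-fileSize", "1000");
    }
}
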
Considering 10GE Hadoop clusters and the network

16-Jan-2014
Big data serialization using Apache Avro with Hadoop

Apache Avro is a serialization framework that produces data in a compact binary format that doesn't require proxy objects or code generation. Get to know Avro, and learn how to use it with Apache Hadoop.
27-Dec-2013
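As a companion to the Avro article above, here is a minimal sketch of the generic (code-generation-free) API: define a schema, write a record to an Avro data file, and read it back. The User schema, field names, and file name are made up for illustration; the schema travels inside the file, so the reader needs no extra metadata.

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroRoundTrip {
    // A made-up schema for illustration: one record type with two fields.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"clicks\",\"type\":\"int\"}]}";

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // No generated classes needed: GenericRecord works directly from the schema.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("clicks", 42);

        File file = new File("users.avro");

        // Write a compact Avro data file with the schema embedded.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            writer.append(user);
        }

        // Read it back; the reader picks the schema up from the file itself.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord rec : reader) {
                System.out.println(rec.get("name") + " -> " + rec.get("clicks"));
            }
        }
    }
}
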
Examples | Apache Spark

Spark is built around distributed datasets that support two types of parallel operations: transformations, which are lazy and yield another distributed dataset (e.g., map, filter, and join), and actions, which force the computation of a dataset and return a result (e.g., count). The following examples show off some of the available operations and features. Several additional examples are distributed with Spark.
10-Nov-2013
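To make the transformation/action distinction above concrete, here is a minimal sketch using Spark's Java API in local mode; the app name, master setting, and sample numbers are purely illustrative.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TransformationsAndActions {
    public static void main(String[] args) {
        // Local mode keeps the sketch self-contained; on a cluster the master differs.
        SparkConf conf = new SparkConf().setAppName("transformations-and-actions").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6));

        // Transformations are lazy: nothing runs until an action is called.
        JavaRDD<Integer> doubled = numbers.map(x -> x * 2);
        JavaRDD<Integer> bigOnes = doubled.filter(x -> x > 6);

        // Actions force the computation and return results to the driver.
        long howMany = bigOnes.count();
        System.out.println(howMany + " values: " + bigOnes.collect());

        sc.stop();
    }
}
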
Tutorial · nathanmarz/storm Wiki · GitHub

storm - Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
06-Nov-2013
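Following the Storm tutorial above, here is a compact sketch in the style of its exclamation topology: a built-in test spout emits words, a bolt appends "!!!" to each one, and everything runs in an in-process local cluster. Package names assume the pre-Apache backtype.storm releases of that era; later Apache Storm releases use org.apache.storm instead.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class ExclamationTopology {

    // The bolt appends "!!!" to every word it receives and emits the result downstream.
    public static class ExclamationBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            collector.emit(new Values(tuple.getString(0) + "!!!"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Wire a random-word spout into the bolt with a shuffle grouping.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 2);
        builder.setBolt("exclaim", new ExclamationBolt(), 3).shuffleGrouping("words");

        // Run it in-process for ten seconds, then shut down.
        Config conf = new Config();
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("exclamation-demo", conf, builder.createTopology());
        Utils.sleep(10000);
        cluster.shutdown();
    }
}
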
Data Lab: MapReduce on HBase Table

31-Oct-2013
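In the spirit of the Data Lab post above, here is a minimal sketch of a MapReduce job that scans an HBase table with a TableMapper and counts rows by the value of one column. The table name ("pages"), column family/qualifier (info:status), and output path are hypothetical, not taken from the post.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class HBaseCountByValue {

    // Hypothetical column coordinates chosen just for the sketch.
    static final byte[] CF  = Bytes.toBytes("info");
    static final byte[] COL = Bytes.toBytes("status");

    // A TableMapper receives HBase rows as (row key, Result) pairs.
    public static class StatusMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            byte[] cell = value.getValue(CF, COL);
            if (cell != null) {
                ctx.write(new Text(Bytes.toString(cell)), ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "count-by-status");
        job.setJarByClass(HBaseCountByValue.class);

        Scan scan = new Scan();
        scan.addColumn(CF, COL);
        scan.setCaching(500);          // batch rows per RPC
        scan.setCacheBlocks(false);    // avoid polluting the block cache from MR scans

        // Wires the table, the scan, and the mapper into the MapReduce job.
        TableMapReduceUtil.initTableMapperJob("pages", scan, StatusMapper.class,
                Text.class, IntWritable.class, job);

        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
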
Data Lab: Implementing Custom Writable

31-Oct-2013
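Alongside the Data Lab post above, here is a minimal sketch of a custom WritableComparable. The composite key (page + timestamp) is hypothetical, but it shows the essential contract: write() and readFields() must serialize and deserialize the fields in the same order, and keys additionally need compareTo(), hashCode(), and equals().

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class PageVisitKey implements WritableComparable<PageVisitKey> {

    private Text page = new Text();
    private long timestamp;

    // Hadoop instantiates keys via reflection, so a no-arg constructor is required.
    public PageVisitKey() { }

    public PageVisitKey(String page, long timestamp) {
        this.page = new Text(page);
        this.timestamp = timestamp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        page.write(out);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        page.readFields(in);       // must mirror write() exactly, field by field
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(PageVisitKey other) {
        int cmp = page.compareTo(other.page);
        return cmp != 0 ? cmp : Long.compare(timestamp, other.timestamp);
    }

    @Override
    public int hashCode() {        // used by the default HashPartitioner
        return page.hashCode() * 31 + Long.hashCode(timestamp);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PageVisitKey)) return false;
        PageVisitKey other = (PageVisitKey) o;
        return page.equals(other.page) && timestamp == other.timestamp;
    }

    @Override
    public String toString() {
        return page + "\t" + timestamp;
    }
}
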
Algorithms - Apache Mahout - Apache Software Foundation

This section contains links to information, examples, use cases, etc. for the various algorithms we intend to implement. Click the individual links to learn more. The initial algorithm descriptions have been copied here from the original project proposal. The algorithms are grouped by the application setting they can be used for. Where an algorithm has multiple applications, the version presented in the original paper was chosen; versions as implemented in our project will be added as we start working on them.
06-Oct-2013
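To illustrate one of the algorithm families listed on that page (user-based collaborative filtering), here is a minimal sketch using Mahout's Taste recommender API; the ratings file, neighborhood size, and user ID are placeholders rather than anything prescribed by the Mahout documentation.

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserBasedCF {
    public static void main(String[] args) throws Exception {
        // ratings.csv (hypothetical): one "userID,itemID,preference" triple per line.
        DataModel model = new FileDataModel(new File("ratings.csv"));

        // Compare users by Pearson correlation and keep the 10 nearest neighbors.
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top 3 recommendations for user 42 (IDs are illustrative).
        List<RecommendedItem> items = recommender.recommend(42L, 3);
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + " scored " + item.getValue());
        }
    }
}
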
Tutorial · nathanmarz/storm Wiki · GitHub

storm - Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more
26-Sep-2013
Sqoop User Guide (v1.4.2)

Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
25-Sep-2013
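Here is a minimal sketch of the import/transform/export loop described above, driving the sqoop command line from Java. The JDBC URL, table names, credentials, and HDFS paths are placeholders, and a real setup also needs the database's JDBC driver on Sqoop's classpath.

import java.io.IOException;

public class SqoopRoundTrip {

    static void run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IllegalStateException("sqoop exited with an error");
        }
    }

    public static void main(String[] args) throws Exception {
        // Import a MySQL table into HDFS as delimited text files.
        run("sqoop", "import",
            "--connect", "jdbc:mysql://dbhost/shop",
            "--username", "etl", "-P",                 // -P prompts for the password
            "--table", "orders",
            "--target-dir", "/data/orders");

        // ... transform /data/orders with MapReduce, then export the result back out.
        run("sqoop", "export",
            "--connect", "jdbc:mysql://dbhost/shop",
            "--username", "etl", "-P",
            "--table", "order_summaries",
            "--export-dir", "/data/order_summaries");
    }
}
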
Sqoop Client API Guide - Apache Sqoop documentation

Apache Sqoop documentation: Sqoop Client API Guide
25-Sep-2013
Sqoop 5 Minutes Demo - Apache Sqoop documentation

Apache Sqoop documentation: Sqoop 5 Minutes Demo
24-Sep-2013
Tutorial - Apache Hive - Apache Software Foundation

Hive Tutorial: Concepts, Usage and Examples
21-Sep-2013
Hadoop Tutorial

21-Sep-2013