Posts
IBM commits to Apache Spark compute engine | CIO

Big Blue will embed Spark in its platforms and offer it as a service on its IBM Bluemix cloud. It will also donate its IBM SystemML machine learning technology to the Spark open source ecosystem.
08-Aug-2015
Apache Spark RDD API Examples

Zhen He : RDD function calls
13-Apr-2015
An introduction to JSON support in Spark SQL | Databricks

An introduction to JSON support in Spark SQL
08-Mar-2015
Run Apache Spark on Apache Mesos · Mesosphere

Mesos allows you to create a highly-available and scalable cluster on your existing hardware.
25-Feb-2015
Learning Spark - O'Reilly Media

Data in all domains is getting bigger. How can you work with it efficiently? This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big...
25-Feb-2015
1. Introduction to Data Analysis with Spark - Learning Spark

Chapter 1. Introduction to Data Analysis with Spark. This chapter provides a high-level overview of what Apache Spark is. If you are already familiar with Apache Spark and its components ...
25-Feb-2015
Apache Spark RDD API Examples

Zhen He : RDD function calls
25-Feb-2015
Logistic Regression in Apache Spark | Insight

Apache Spark is an amazingly fast large-scale data processing engine that can be run on Hadoop, Mesos or on your local machine. [1] In contrast to Mahout and Hadoop, Spark allows not only MapReduce but general programming tasks, which is good for us because ML is primarily not MapReduce. And building an ML algorithm...
25-Feb-2015
Developer Resources | Databricks

Spark Developer Resources : Databricks provides a number of free resources online for Spark training, including course materials, video archives, sample apps, knowledge base, etc.
25-Feb-2015
Introduction

17-Feb-2015
Index of /~amir/files/download/dic

Index of /~amir/files/download/dic
02-Feb-2015
Databricks Spark Reference Applications

Reference Applications demonstrating Apache Spark - brought to you by Databricks.
30-Jan-2015
spark-jobserver/spark-jobserver · GitHub

spark-jobserver - REST job server for Apache Spark
15-Jan-2015
Integrating Kafka and Spark Streaming: Code Examples and State of the Game - Michael G. Noll

Integrating Kafka and Spark Streaming: code examples and state of the game.
20-Dec-2014

Resilient Distributed DataSets - Apache SPARK
05-Nov-2014

Resilient Distributed DataSets - Apache SPARK
05-Nov-2014

Spark Internals - Hadoop Source Code Reading #16 in Japan
17-Sep-2014

A Deeper Understanding of Spark Internals (Hadoop Conference Japan 2014)
17-Sep-2014
Optimize map performance with mapPartitions | Big Data Analytics with Spark

As we can see in CSV Parser, we may need to create a new object for each record of an RDD. The mLine function is used in the map method of an RDD. In this case the parser object is created each time for each record, although they are exactly the same thing.
17-Sep-2014
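The point of the entry above can be sketched without a cluster. The following is a minimal, single-machine illustration in plain Python, not Spark code: partitions are modeled as nested lists, and `CsvParser`, `parse_per_record`, and `parse_per_partition` are hypothetical names introduced here, not taken from the linked post. It only counts how many parser objects each approach constructs.

```python
class CsvParser:
    """Hypothetical stand-in for a parser that is costly to construct."""
    instances = 0  # counts how many parsers have been created

    def __init__(self):
        CsvParser.instances += 1

    def parse_line(self, line):
        return line.split(",")


def parse_per_record(partitions):
    # In the spirit of map: a new parser is built for every single record.
    return [[CsvParser().parse_line(line) for line in part] for part in partitions]


def parse_per_partition(partitions):
    # In the spirit of mapPartitions: one parser is built per partition
    # and reused for all records in that partition.
    result = []
    for part in partitions:
        parser = CsvParser()
        result.append([parser.parse_line(line) for line in part])
    return result


# Two simulated partitions holding five records in total (made-up data).
partitions = [["a,b", "c,d"], ["e,f", "g,h", "i,j"]]

CsvParser.instances = 0
per_record = parse_per_record(partitions)
created_per_record = CsvParser.instances

CsvParser.instances = 0
per_partition = parse_per_partition(partitions)
created_per_partition = CsvParser.instances
```

Both functions return the same parsed rows; the difference is that the per-record version constructs one parser per record while the per-partition version constructs one per partition. In PySpark the analogous call would be `rdd.mapPartitions(f)`, where `f` receives an iterator over one partition's records.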
Real-time Processing (Spark, Puma, HOP)

Data-Intensive Systems: Real-time Stream Processing
17-Sep-2014
YouTube
User Case Study Talks - Conviva - Davis Shepherd - UC Berkeley AmpLab 2013
AMP Camp Three -- Analytics and Machine Learning at Scale was held in Berkeley, California and live streamed online, August 29-30, 2013. AMP Camp 3 attendees and online viewers learned to solve big data problems using components of the Berkeley Data Analytics Stack (BDAS) including Spark, Shark, Mesos, Tachyon, MLbase as well as Hadoop. The event was held in the Chevron Auditorium in the International House at UC Berkeley.
26-Aug-2014
AlpineNow/alpineml · GitHub

Contribute to alpineml development by creating an account on GitHub.
12-Aug-2014
SNAP Sequence Aligner

SNAP : SNAP is a new sequence aligner that is 3-20x faster and just as accurate as existing tools like BWA-mem, Bowtie2 and Novoalign. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. This gives SNAP up to 2x lower error rates than existing tools (in some cases) and lets it match larger mutations that they may miss. SNAP also natively reads BAM, FASTQ, or gzipped FASTQ, and natively writes SAM or BAM, with built-in sorting, duplicate marking, and BAM indexing.
09-Aug-2014
Big Data Benchmark

Click Here for the previous version of the benchmark
15-Jul-2014
Arindam's Tech Blog | SPARK streaming and other real time stream processing framework

Streams are everywhere: Twitter streams, TCP streams, clickstreams, log streams, event streams. Processing and analyzing them in real...
04-Jun-2014


Introduction

17-Jan-2014
Examples | Apache Spark

Spark : Spark is built around distributed datasets that support two types of parallel operations: transformations, which are lazy and yield another distributed dataset (e.g., map, filter, and join), and actions, which force the computation of a dataset and return a result (e.g., count). The following examples show off some of the available operations and features. Several additional examples are distributed with Spark:
19-Dec-2013
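The distinction between lazy transformations and actions described above can be illustrated with a small single-machine sketch in plain Python. This is not Spark's API: `LazyDataset` is a hypothetical class introduced here only to show that `map` and `filter` merely record work, while `count` and `collect` trigger it.

```python
class LazyDataset:
    """Hypothetical local stand-in for a distributed dataset."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = steps  # recorded transformations, not yet executed

    # Transformations: return a new dataset and do no work yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _run(self):
        # Apply every recorded step to each element, in order.
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: force the computation and return a result.
    def count(self):
        return sum(1 for _ in self._run())

    def collect(self):
        return list(self._run())


calls = []  # records every invocation of square


def square(x):
    calls.append(x)
    return x * x


squares = LazyDataset(range(5)).map(square)
even_squares = squares.filter(lambda v: v % 2 == 0)
calls_before_action = len(calls)  # nothing has been computed so far

n_even = even_squares.count()       # first action runs the pipeline
collected = even_squares.collect()  # second action runs it again
```

In this sketch each action recomputes the whole pipeline from the source, which is why `square` ends up being called twice per element across the two actions.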