
Spark Tutorial – Learn Spark Programming


Introduction to Spark Programming

What is Spark? Spark is a general-purpose, lightning-fast cluster computing platform. It exposes development APIs that let data workers author streaming, machine learning, or SQL workloads which require fast, repeated access to data sets. Spark can perform both batch processing and stream processing: batch processing refers to processing a previously collected job in a single batch, whereas stream processing means dealing with continuously arriving Spark Streaming data.
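To make the distinction concrete, here is a minimal Scala sketch that runs a batch count over a static file and, separately, a Structured Streaming query over a socket source. The input path `events.log` and the `localhost:9999` endpoint are placeholder assumptions, not part of the original article.

```scala
import org.apache.spark.sql.SparkSession

object BatchVsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-vs-stream")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()

    // Batch: process a previously collected dataset in a single pass.
    val batchDf = spark.read.textFile("events.log") // hypothetical input path
    println(s"Batch record count: ${batchDf.count()}")

    // Streaming: continuously process new lines arriving on a socket.
    val streamDf = spark.readStream
      .format("socket")
      .option("host", "localhost") // hypothetical endpoint
      .option("port", 9999)
      .load()

    val query = streamDf.writeStream
      .format("console") // print each micro-batch to stdout
      .start()

    query.awaitTermination()
  }
}
```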

Spark is also designed to integrate with the wider Big Data tooling: it can access any Hadoop data source and run on Hadoop clusters, as shown in the sketch below. Apache Spark takes Hadoop MapReduce to the next level by adding iterative queries and stream processing.
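As a hedged illustration of this Hadoop integration, the sketch below reads a file directly from HDFS through Spark's Hadoop-compatible I/O. The `hdfs://namenode:8020` address and the file path are hypothetical placeholders for a real cluster's values.

```scala
import org.apache.spark.sql.SparkSession

object HdfsAccess {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hdfs-access")
      .getOrCreate()

    // Spark reads directly from any Hadoop-compatible filesystem;
    // the HDFS URI below is a placeholder for a real NameNode address.
    val lines = spark.sparkContext.textFile("hdfs://namenode:8020/data/input.txt")
    val errors = lines.filter(_.contains("ERROR"))
    println(s"Error lines: ${errors.count()}")

    spark.stop()
  }
}
```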

One common belief about Spark is that it is an extension of Hadoop. That is not true: Spark is independent of Hadoop, since it has its own cluster management system. It uses Hadoop for storage purposes only.

One of Spark's key features is its in-memory cluster computation capability, which greatly increases the processing speed of an application.
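A minimal sketch of this in-memory capability, assuming a local Spark installation: `cache()` keeps a computed RDD in executor memory after the first action, so subsequent actions reuse it instead of recomputing from source.

```scala
import org.apache.spark.sql.SparkSession

object CachingDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("caching-demo")
      .master("local[*]")
      .getOrCreate()

    val nums = spark.sparkContext.parallelize(1 to 1000000)

    // cache() marks the RDD to be kept in executor memory after the
    // first action, so later actions avoid recomputation.
    val squares = nums.map(n => n.toLong * n).cache()

    println(s"sum = ${squares.sum()}") // first action: computes and caches
    println(s"max = ${squares.max()}") // second action: served from memory

    spark.stop()
  }
}
```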

Basically, Apache Spark offers high-level APIs in Java, Scala, Python, and R. Although Spark itself is written in Scala, it provides equally rich APIs in all four languages. We can say it is a tool for running lightning-fast applications.
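To give a feel for these high-level APIs, here is a short word-count sketch using the Scala Dataset API; the `input.txt` path is a hypothetical placeholder.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("word-count")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // encoders for Dataset operations

    // A few lines of high-level API express a full distributed job:
    // split lines into words, group by word, count each group.
    val counts = spark.read.textFile("input.txt") // hypothetical path
      .flatMap(_.split("\\s+"))
      .groupByKey(identity)
      .count()

    counts.show()
    spark.stop()
  }
}
```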

Most importantly, when compared with Hadoop, Spark is up to 100 times faster than Hadoop MapReduce for in-memory processing and about 10 times faster when accessing data from disk.


Spark History

Apache Spark was first introduced in 2009 at the UC Berkeley R&D Lab, now known as the AMPLab. In 2010 it became open source under a BSD license. Spark was then donated to the Apache Software Foundation in 2013, and in 2014 it became a top-level Apache project.

Why Spark?


Before Spark, there was no general-purpose computing engine in the industry:

  1. To perform batch processing, we were using Hadoop MapReduce.
  2. Also, to perform stream processing, we were using Apache Storm / S4.
  3. Moreover, for interactive processing, we were using Apache Impala / Apache Tez.
  4. To perform graph processing, we were using Neo4j / Apache Giraph.

There was no single powerful engine in the industry that could process data in both real-time and batch mode. There was also a demand for an engine that could respond in sub-second time and perform in-memory processing.

Basically, these requirements are what set Spark apart from Hadoop, and they also underpin the comparison between Spark and Storm.

Apache Spark Components

In this Apache Spark tutorial, we discuss the Spark components. Spark delivers on its promise of faster data processing as well as easier development, and this is only possible because of its components. Together, these Spark components resolve the issues that arose while using Hadoop MapReduce.

To learn more, see Spark Ecosystem Components.


