Creating a SparkSession
One of the first things you will do when working with Spark is create a SparkSession. The SparkSession is the gateway to all of the facilities within Spark, including DataFrames, RDDs, and other utilities. It also exposes the lower-level SparkContext (from which a StreamingContext can be built) and replaces the older SQLContext and HiveContext entry points.
To create a SparkSession, first import the class from pyspark.sql:
from pyspark.sql import SparkSession
# use the builder attribute and getOrCreate() to start the session
spark = SparkSession.builder.appName("My Spark Session").getOrCreate()
SparkSession Attributes
Once the session is created, we can access its many objects, attributes, and methods. For example, we can check the version of Spark we are working with:
spark.version
In an interactive shell such as pyspark or IPython, you can also list the Spark utilities directly on the command line: type the following and press Tab.
spark.