Creating a SparkSession

One of the first things you will do when working with Spark is create a SparkSession. The SparkSession is the gateway to all of the facilities within Spark, including DataFrames, RDDs, and other utilities. It also provides access to lower-level contexts such as the SparkContext, SQLContext, StreamingContext, and HiveContext.

To create a SparkSession, we import the class from pyspark.sql as follows:

from pyspark.sql import SparkSession

# use the builder attribute and getOrCreate() to start the session
spark = SparkSession.builder.appName("My Spark Session").getOrCreate()
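
The builder also accepts a master URL and arbitrary configuration values before getOrCreate() is called. Here is a minimal sketch, assuming you want to run locally; the shuffle-partition value shown is purely illustrative:

from pyspark.sql import SparkSession

# "local[*]" runs Spark on this machine using all available cores;
# spark.sql.shuffle.partitions is a real Spark setting, shown here
# with an illustrative value of 8.
spark = (
    SparkSession.builder
    .appName("My Spark Session")
    .master("local[*]")
    .config("spark.sql.shuffle.partitions", "8")
    .getOrCreate()
)

Note that getOrCreate() returns the active session if one already exists, so configuration values supplied to the builder only take effect when a new session is actually created.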
    

SparkSession Attributes

Once the session is created, we can access the many objects, attributes, and methods it exposes. For example, we can check which version of Spark we are working with:

spark.version
4.0.0
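
Other attributes work the same way. For instance, spark.sparkContext exposes the underlying SparkContext, and spark.conf lets you read runtime configuration. A brief sketch, assuming the session created above:

# the lower-level SparkContext behind this session
spark.sparkContext

# read a runtime configuration value (here, the app name we set earlier)
spark.conf.get("spark.app.name")
'My Spark Session'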

You can also browse the session's attributes and methods directly in the PySpark shell by typing spark. and pressing the Tab key:

spark.
spark.Builder()                 spark.clearTags()         spark.interruptAll()       spark.registerProgressHandler(  spark.tvf
spark.active()                  spark.client              spark.interruptOperation(  spark.removeProgressHandler(    spark.udf
spark.addArtifact(              spark.conf                spark.interruptTag(        spark.removeTag(                spark.udtf
spark.addArtifacts(             spark.copyFromLocalToFs(  spark.newSession()         spark.sparkContext              spark.version
spark.addTag(                   spark.createDataFrame(    spark.profile              spark.sql(
spark.builder                   spark.dataSource          spark.range(               spark.stop()
spark.catalog                   spark.getActiveSession()  spark.read                 spark.streams
spark.clearProgressHandlers()   spark.getTags()           spark.readStream           spark.table(
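
Several of these return DataFrames and can be combined. Here is a brief sketch using spark.range, spark.sql, and spark.stop from the listing above (createOrReplaceTempView is a standard DataFrame method, not part of the listing):

# build a small DataFrame of five consecutive integers
df = spark.range(5)

# register it as a temporary view and query it with spark.sql
df.createOrReplaceTempView("numbers")
spark.sql("SELECT id * 2 AS doubled FROM numbers").show()

# stop the session when you are done with it
spark.stop()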