I'm trying to connect Spark to Amazon Redshift, but I'm getting an error:

My code is as follows:

from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext(appName="connect spark redshift")
sql_context = SQLContext(sc)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", <accessid>)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", <accesskey>)

df = sql_context.read \
    .option("url", "jdbc:redshift://") \
    .option("dbtable", "table_name") \
    .option("tempdir", "bucket") \
    .load()

Here is a step-by-step process for connecting to Redshift.

  • Download the Redshift connector file. Try the command below:
wget "" 
  • Save the code below in a Python file (the .py file you want to run) and replace the credentials accordingly.
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession, HiveContext

# Initialize the Spark session
spark = SparkSession.builder.master("yarn").appName("connect redshift").enableHiveSupport().getOrCreate()
sc = spark.sparkContext
sqlContext = HiveContext(sc)

# S3 credentials for the temporary directory
sc._jsc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "<accesskeyid>")
sc._jsc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "<accesskeysectret>")

# Read the Redshift table through the spark-redshift data source
taxonomyDf = sqlContext.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:postgresql://url.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx") \
    .option("dbtable", "table_name") \
    .option("tempdir", "s3://mybucket/") \
    .load()
  • Run spark-submit as shown below:
spark-submit --packages com.databricks:spark-redshift_2.10:0.5.0 --jars redshiftjdbc4- 
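
If it helps, the reader options from the script above can be collected and checked in plain Python before Spark is started. This is only a sketch: `redshift_read_options` and `read_redshift_table` are hypothetical helper names (they are not part of spark-redshift), and the check that `tempdir` starts with an S3 scheme is my own assumption about what a usable value looks like.

```python
# Hypothetical helpers (not part of spark-redshift): they gather the reader
# options used in the script above so blank values are caught early.
REQUIRED_OPTIONS = ("url", "dbtable", "tempdir")


def redshift_read_options(url, dbtable, tempdir):
    """Return the option dict for the Redshift reader, rejecting blanks."""
    options = {"url": url, "dbtable": dbtable, "tempdir": tempdir}
    missing = [name for name in REQUIRED_OPTIONS if not options[name]]
    if missing:
        raise ValueError("missing Redshift option(s): " + ", ".join(missing))
    # Assumption: tempdir should be an S3 URI such as s3://mybucket/
    if not tempdir.startswith(("s3://", "s3n://", "s3a://")):
        raise ValueError("tempdir should be an S3 URI, got: " + tempdir)
    return options


def read_redshift_table(sql_context, options):
    """Load a table through the spark-redshift data source."""
    return (
        sql_context.read
        .format("com.databricks.spark.redshift")
        .options(**options)
        .load()
    )
```

With these, the read step would look something like `read_redshift_table(sqlContext, redshift_read_options(url, "table_name", "s3://mybucket/"))`, where `url` is your own JDBC URL. A `tempdir` of just "bucket" would be rejected by the check before any Spark job runs.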


