Spark Redshift with Python


I'm trying to connect Spark with Amazon Redshift, but I'm getting an error.

My code is as follows:

from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext(appName="connect spark redshift")
sql_context = SQLContext(sc)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "<accessid>")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "<accesskey>")

df = sql_context.read \
    .option("url", "jdbc:redshift://example.coyf2i236wts.eu-central-1.redshift.amazonaws.com:5439/agcdb?user=user&password=pwd") \
    .option("dbtable", "table_name") \
    .option("tempdir", "bucket") \
    .load()
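
The url option expects one unbroken JDBC string of the form jdbc:redshift://host:port/database?user=...&password=... (the copy above was split across a line break inside "eu-central-1"). A minimal sketch of assembling it from its parts in plain Python follows; the helper name build_redshift_jdbc_url is hypothetical, and percent-encoding the credentials assumes your JDBC driver decodes query-string values, which you should confirm for your driver version.

```python
from urllib.parse import quote


def build_redshift_jdbc_url(host, port, database, user, password):
    """Assemble jdbc:redshift://host:port/database?user=...&password=...

    Hypothetical helper. The credentials are percent-encoded so that
    characters such as '&' or '=' in a password cannot break the query
    string (assumes the driver decodes them again).
    """
    return "jdbc:redshift://{}:{}/{}?user={}&password={}".format(
        host, port, database, quote(user, safe=""), quote(password, safe="")
    )


# Host, port, database and credentials copied from the question (placeholders).
url = build_redshift_jdbc_url(
    "example.coyf2i236wts.eu-central-1.redshift.amazonaws.com",
    5439,
    "agcdb",
    "user",
    "pwd",
)
print(url)
```

Building the string in one place also makes it easy to keep the password out of the source file, for example by reading it from an environment variable before calling the helper.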

Here is the step-by-step process for connecting to Redshift:

  • Download the Redshift JDBC driver file. Try the command below:
wget "https://s3.amazonaws.com/redshift-downloads/drivers/redshiftjdbc4-1.2.1.1001.jar" 
  • Save the code below in the Python file (the .py file you want to run), and replace the credentials accordingly:
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession, HiveContext

# Initialize the Spark session
spark = SparkSession.builder.master("yarn").appName("connect redshift").enableHiveSupport().getOrCreate()
sc = spark.sparkContext
sqlContext = HiveContext(sc)

# S3 credentials used for the temporary directory (tempdir)
sc._jsc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "<accesskeyid>")
sc._jsc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "<accesskeysecret>")

taxonomydf = sqlContext.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:postgresql://url.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx") \
    .option("dbtable", "table_name") \
    .option("tempdir", "s3://mybucket/") \
    .load()
  • Run spark-submit as below:
spark-submit --packages com.databricks:spark-redshift_2.10:0.5.0 --jars redshiftjdbc4-1.2.1.1001.jar test.py 
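
One visible difference from the question's code is the .format("com.databricks.spark.redshift") call: without it, the reader falls back to Spark's default data source (Parquet unless spark.sql.sources.default is changed), so the url and dbtable options are not used by the Redshift connector. The sketch below wraps the answer's read in a function that takes an existing SparkSession; since spark.read already returns a DataFrameReader, the HiveContext is not strictly needed. The names read_redshift and redshift_read_options, and the check that tempdir is an s3://, s3n:// or s3a:// URI (the question passes a bare "bucket"), are hypothetical additions, not part of spark-redshift.

```python
# Data source name used in the answer (spark-redshift package).
REDSHIFT_FORMAT = "com.databricks.spark.redshift"

# Hypothetical sanity check: tempdir should be an S3 location,
# not a bare bucket name such as "bucket".
_S3_PREFIXES = ("s3://", "s3n://", "s3a://")


def redshift_read_options(url, dbtable, tempdir):
    """Return the url/dbtable/tempdir options used in the answer."""
    if not tempdir.startswith(_S3_PREFIXES):
        raise ValueError(
            "tempdir must be an S3 URI such as s3://mybucket/, got %r" % tempdir
        )
    return {"url": url, "dbtable": dbtable, "tempdir": tempdir}


def read_redshift(spark, url, dbtable, tempdir):
    """Load a Redshift table through the spark-redshift data source.

    `spark` is an existing SparkSession; spark.read is a DataFrameReader,
    so format(), options() and load() are chained exactly as in the answer.
    """
    options = redshift_read_options(url, dbtable, tempdir)
    return spark.read.format(REDSHIFT_FORMAT).options(**options).load()
```

Inside the script passed to spark-submit, it could be called as taxonomydf = read_redshift(spark, "<jdbc-url>", "table_name", "s3://mybucket/"), with the JDBC URL and credentials filled in as in the answer.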
