Spark read.json can't find file
Hey, I have a standalone Spark cluster on AWS with 1 master and 1 slave node. There is a folder in my home directory called ~/notebooks. I launch Jupyter from there and connect to it in a browser. I have a file in that folder called people.json (a simple JSON file).
I try running this code:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('practice').setMaster('spark://ip-172-31-2-186:7077')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df = sqlContext.read.json("people.json")
I get an error when I run the last line. I don't understand, the file is right there... any ideas?
Py4JJavaError: An error occurred while calling o238.json.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4.0 (TID 37, ip-172-31-7-160.us-west-2.compute.internal): java.io.FileNotFoundException: File file:/home/ubuntu/notebooks/people.json does not exist
Make sure the file is available on all worker nodes. The file is opened by the executor that runs the task, not by the driver, so a path that exists only on the master will fail exactly like this. The best way is to use a shared file system (NFS, HDFS, S3); otherwise copy the file to the same path on every worker. See the External Datasets section of the Spark programming guide.