Spark read.json can't find file


Hey, I have a standalone Spark cluster on AWS with 1 master and 1 slave node. There is a folder in the home directory called ~/notebooks. I launch Jupyter there and connect to it in the browser. I have a file in that folder called people.json (a simple JSON file).

I try running this code:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('Practice').setMaster('spark://ip-172-31-2-186:7077')
sc = SparkContext(conf=conf)

sqlContext = SQLContext(sc)

df = sqlContext.read.json("people.json")

I get an error when I run the last line. It doesn't see the file, but it's right there... Any ideas?

Py4JJavaError: An error occurred while calling o238.json. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4.0 (TID 37, ip-172-31-7-160.us-west-2.compute.internal): java.io.FileNotFoundException: File file:/home/ubuntu/notebooks/people.json does not exist

Make sure the file is available on all the worker nodes, not just the driver. When you pass a plain local path, every executor tries to open that path on its own machine, and your slave node has no copy of people.json. The best way is to use a shared file system (NFS, HDFS). See the External Datasets section of the Spark documentation.
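For example, one way to make the file visible to every executor is to put it in HDFS and read it with a full hdfs:// URI. A minimal sketch, assuming HDFS is running on the cluster with its NameNode on the master at port 9000 (that port and the /data/people.json path are assumptions for illustration, not from the original post):

# First, from a shell on the master, copy the file into HDFS:
#   hadoop fs -mkdir -p /data
#   hadoop fs -put ~/notebooks/people.json /data/people.json

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('Practice').setMaster('spark://ip-172-31-2-186:7077')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Every executor can resolve this URI, so no task fails with FileNotFoundException
df = sqlContext.read.json("hdfs://ip-172-31-2-186:9000/data/people.json")
df.show()

Without HDFS, a quick workaround is to copy people.json to the same absolute path on every node (e.g. with scp) and read it as file:///home/ubuntu/notebooks/people.json; with a file:// URI, each executor expects to find its own local copy at that path.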

