Spark read.json can't find file
Hey, I have a standalone Spark cluster on AWS with 1 master and 1 slave node. There is a folder in my home directory called ~/notebooks. I launch Jupyter from there and connect to it in a browser. I have a file in that folder called people.json (a simple JSON file).
I try running this code:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('practice').setMaster('spark://ip-172-31-2-186:7077')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df = sqlContext.read.json("people.json")
I get an error when I run the last line. I don't understand, the file is right there... any ideas?
Py4JJavaError: An error occurred while calling o238.json.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4.0 (TID 37, ip-172-31-7-160.us-west-2.compute.internal): java.io.FileNotFoundException: File file:/home/ubuntu/notebooks/people.json does not exist
Make sure the file is available on all worker nodes. The file is opened by the executor that runs the task, not by the driver, so a path that exists only on the master will fail exactly like this. The best way is to use a shared file system (NFS, HDFS, S3); otherwise copy the file to the same path on every worker. See the External Datasets section of the Spark programming guide.