Spark DataFrame order preservation: does calling save on an orderBy DataFrame preserve the ordering?

June 15, 2013

I ran some test cases in the Spark shell. The statement executed was of the form:

read.orderBy($"p_int".asc).write.format("com.databricks.spark.csv").save("file:///tmp/output.txt")

The content of the output directory appears to be sorted, but I cannot find any Spark documentation about the guarantees DataFrameWriter provides, either for preserving partition order or for preserving row order.
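One way to see why the output looks sorted is a toy model in plain Scala (no Spark dependency). It assumes, as a simplification, that orderBy range-partitions the rows so that each partition holds a contiguous, sorted key range, and that partition i is written to its own file named part-NNNNN (real Spark file names carry extra suffixes). Under those assumptions the rows come out sorted only when the files are read back in name order. This illustrates the observed behaviour; it is not a documented DataFrameWriter guarantee. RangeSortModel and its functions are hypothetical names.

```scala
// Toy model only: RangeSortModel is a hypothetical helper, not a Spark API.
object RangeSortModel {
  // Mimic range partitioning: sort once, then cut into `n` contiguous ranges,
  // so every key in partition i is <= every key in partition i + 1.
  def rangePartition(rows: Seq[Int], n: Int): IndexedSeq[Seq[Int]] = {
    val sorted = rows.sorted
    val size = math.max(1, math.ceil(sorted.size.toDouble / n).toInt)
    sorted.grouped(size).toIndexedSeq
  }

  // Each partition becomes one output file; names are simplified to part-NNNNN.
  def partFiles(parts: IndexedSeq[Seq[Int]]): Map[String, Seq[Int]] =
    parts.zipWithIndex.map { case (rows, i) => f"part-$i%05d" -> rows }.toMap

  // Concatenate the files in lexicographic name order, like `cat part-*`.
  def readInNameOrder(files: Map[String, Seq[Int]]): Seq[Int] =
    files.keys.toList.sorted.flatMap(name => files(name))
}

val parts = RangeSortModel.rangePartition(Seq(42, 7, 19, 3, 88, 56, 23, 11), 3)
val files = RangeSortModel.partFiles(parts)
println(RangeSortModel.readInNameOrder(files)) // List(3, 7, 11, 19, 23, 42, 56, 88)
```

The statements can be pasted into the Scala REPL. The point of the model is that "sorted" here is a property of the files taken together in name order; nothing in it says a consumer will read them in that order.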

The question is: can I expect the data in the target file to be sorted? Please also add a link to the relevant documentation.

If you coalesce to 1 partition before saving, the output is sorted. On careful thought, though: when reading the .csv back into Spark, if spark.default.parallelism in the Spark config is greater than 1, the ordering can be lost.
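The answer's two points can be illustrated with the same kind of plain-Scala toy model (again, not Spark code). coalesceToOne assumes that coalesce(1) merges partitions by concatenating them in partition-index order, which is why range-sorted partitions stay sorted in the single output file; in Spark itself this corresponds to read.orderBy($"p_int".asc).coalesce(1).write... in the statement from the question. hashShuffle is a simplified stand-in for any step that redistributes rows by key hash across n partitions after the file is read back, with n playing the role of the parallelism the answer mentions. CoalesceModel and all its functions are hypothetical names.

```scala
// Toy model only: CoalesceModel is a hypothetical helper, not a Spark API.
object CoalesceModel {
  // Assumed coalesce(1) behaviour: concatenate partitions in index order.
  def coalesceToOne(parts: IndexedSeq[Seq[Int]]): Seq[Int] = parts.flatten

  // Simplified hash shuffle: row r goes to partition floorMod(r.hashCode, n),
  // and the partitions are then read back in index order.
  def hashShuffle(rows: Seq[Int], n: Int): Seq[Int] =
    (0 until n).flatMap(p => rows.filter(r => Math.floorMod(r.hashCode, n) == p))

  // True when every element is <= its successor.
  def isSorted(xs: Seq[Int]): Boolean =
    xs.zip(xs.drop(1)).forall { case (a, b) => a <= b }
}

// Three range-sorted partitions, as a global sort would leave them.
val sortedParts = IndexedSeq(Seq(3, 7, 11), Seq(19, 23, 42), Seq(56, 88))
val singleFile = CoalesceModel.coalesceToOne(sortedParts)
println(CoalesceModel.isSorted(singleFile))                               // true
println(CoalesceModel.isSorted(CoalesceModel.hashShuffle(singleFile, 1))) // true
println(CoalesceModel.isSorted(CoalesceModel.hashShuffle(singleFile, 2))) // false
```

Under this model the order survives while everything stays in one partition and breaks once rows are spread over more than one, which matches the answer's caveat. If downstream code depends on row order, calling orderBy again after reading is the dependable option rather than relying on file layout.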
