Spark DataFrame order preservation: does calling the save operation on an orderBy DataFrame preserve ordering?


I ran some test cases in the Spark shell. The statement I executed was of the form:

read.orderBy($"p_int".asc).write.format("com.databricks.spark.csv").save("file:///tmp/output.txt")

The content of the output directory seems to be sorted. However, I cannot find anything in the Spark documentation about guarantees provided by DataFrameWriter in terms of preserving either partition order or row order.

The question: can I expect the data in the target file to be sorted? Please also add a link to the proper documentation.

If you coalesce to 1 partition before saving, the output is sorted. Be careful, though: when you read the .csv back into Spark, if spark.default.parallelism in the Spark config is more than 1, the ordering can be lost.
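A minimal sketch of that approach, assuming Spark 2.0 or later with the built-in CSV source (rather than com.databricks.spark.csv) and a local session. The sample data, the temporary output directory, and the names df, outDir, readBack and values are made up for illustration. The sorted single file after coalesce(1) is observed behavior, not a documented guarantee, so the read side sorts again instead of relying on file order.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().master("local[4]").appName("order-check").getOrCreate()
import spark.implicits._

// Hypothetical sample data using the p_int column from the question.
val df = Seq(5, 3, 9, 1, 7).toDF("p_int")

// Hypothetical output directory (must not exist before the write).
val outDir = java.nio.file.Files.createTempDirectory("ordered-csv").resolve("out").toString

// Sort globally, then collapse to one partition so a single part file is written.
df.orderBy($"p_int".asc)
  .coalesce(1)
  .write
  .option("header", "true")
  .csv(outDir)

// Reading back may split the input across several tasks,
// so sort explicitly rather than trusting the on-disk order.
val readBack = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(outDir)
  .orderBy(col("p_int").asc)

val values = readBack.collect().map(_.getInt(0)).toSeq
```

If the order matters to whatever consumes the data, the explicit orderBy on the read side is the part that can be relied on. The coalesce(1) only affects how the file is laid out on disk.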

