Apache Pig: avoiding multiple headers in Pig output files


We use Pig to load files from directories containing thousands of files, transform them, and then output files that consolidate the input.

We've noticed that the output files contain the header record of every file processed, i.e. the header appears multiple times in each file.

Is there a way to have the header only once per output file?

raw_data = LOAD '$input' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');

-- do transforms (producing the relation `data`)

STORE data INTO '$output' USING org.apache.pig.piggybank.storage.CSVExcelStorage('|');

Did you try this option?

SKIP_INPUT_HEADER

See https://github.com/apache/pig/blob/31278ce56a18f821e9c98c800bef5e11e5396a69/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java#L85
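A minimal sketch of how the header options might be wired into the question's script. It assumes the four-argument piggybank constructor `CSVExcelStorage(delimiter, multilineTreatment, eolTreatment, headerTreatment)`; the jar path, the `AS` schema and the `FOREACH` step are hypothetical placeholders for your own setup and transforms. `SKIP_INPUT_HEADER` drops the header line at the top of each input file on load, and `WRITE_OUTPUT_HEADER` on the store side writes the relation's field names at the top of each output file, which is closer to "once per output file" than skipping alone (skipping alone leaves no header at all).

```pig
-- Hypothetical jar location; adjust to wherever piggybank.jar lives on your cluster.
REGISTER /usr/lib/pig/piggybank.jar;

-- SKIP_INPUT_HEADER: ignore the header line at the top of every input file.
-- The schema (id, name, amount) is a hypothetical example; WRITE_OUTPUT_HEADER
-- below needs named fields to know what to write.
raw_data = LOAD '$input'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        ',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (id:chararray, name:chararray, amount:double);

-- Placeholder for the real transforms.
data = FOREACH raw_data GENERATE id, name, amount;

-- WRITE_OUTPUT_HEADER: write the field names once at the top of each part file.
STORE data INTO '$output'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        '|', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
```

The script can be run with parameter substitution, e.g. `pig -param input=/path/in -param output=/path/out script.pig`. Note that each map or reduce task writes its own `part-*` file, so the header ends up once per part file rather than once per job.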

