How to read a .seq file from S3 in Spark


I am trying to read a .seq file from S3. When I try to read it using

sc.textFile("s3n://logs/box316_0.seq").take(5).foreach(println)

it outputs:

    SEQorg.apache.hadoop.io.Text"org.apache.hadoop.io.BytesWritable'org.apache.hadoop.io.compress.GzipCodecp

and a bunch of encoded characters. What format is this, and how should I go about decoding the file? This is my first time with Hadoop, so please be generous :)

Update: I tried

sc.sequenceFile[Text, BytesWritable]("s3n://logs/box316_0.seq").take(5).foreach(println)

since the data is a JSON blob stored in a sequence file, and it gives me:

    Serialization stack:
    - object not serializable (class: org.apache.hadoop.io.Text, value: 5)
    - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
    - object (class scala.Tuple2, (5,7g 22 73 69 6d 65 43 74 71 9d 90 92 3a ..................

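Try:

    import org.apache.hadoop.io.{BytesWritable, Text}
    val path = "s3n://logs/box316_0.seq"
    // The header printed above names Text as the key class and BytesWritable as the value class.
    val seq = sc.sequenceFile[Text, BytesWritable](path)
    // getBytes returns the whole backing buffer, so decode only the first getLength bytes.
    val usableRDD = seq.map { case (_, v) => Text.decode(v.getBytes, 0, v.getLength) }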
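
The `object not serializable` error in the question comes from `take`, which ships the raw `(Text, BytesWritable)` pairs back to the driver; Hadoop `Writable` classes do not implement `java.io.Serializable`. Converting each record to plain Scala values inside `map` first avoids that. Here is a minimal sketch along those lines, assuming a standalone app with a local master (in spark-shell the existing `sc` would be used instead) and S3 credentials that are already configured. `ReadSeqFile` and its `decode` helper are illustrative names, not part of Spark or Hadoop.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import org.apache.hadoop.io.{BytesWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object for this sketch.
object ReadSeqFile {
  // Turns the first `length` bytes of a buffer into a String.
  // BytesWritable.getBytes may return a buffer longer than the valid data,
  // so the valid length has to be passed in explicitly.
  def decode(buf: Array[Byte], length: Int): String =
    new String(buf, 0, length, UTF_8)

  def main(args: Array[String]): Unit = {
    // Assumption: local master for experimentation only.
    val conf = new SparkConf().setAppName("read-seq").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val records = sc
        .sequenceFile[Text, BytesWritable]("s3n://logs/box316_0.seq")
        // Convert the Writables to Strings before anything leaves the executor.
        .map { case (k, v) => (k.toString, decode(v.getBytes, v.getLength)) }
      records.take(5).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the mapped RDD holds only `String` pairs, `take`, `collect` and caching work on it without the serialization problem.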

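As for what the format is: the text printed by `textFile` is the header of a Hadoop SequenceFile. It starts with the magic bytes `SEQ` and then names the key class (`Text`), the value class (`BytesWritable`) and the compression codec (`GzipCodec`). The stray `"` and `'` are length prefixes: their byte values are 34 and 39, the lengths of the two class names that follow them. The sketch below parses such a header by hand, purely for illustration. `SeqHeader` is a hypothetical helper; it assumes the version 6 header layout and single-byte length prefixes, and in practice Hadoop's own `SequenceFile.Reader` is the tool for inspecting a file.

```scala
import java.io.{ByteArrayInputStream, DataInputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper, not part of Hadoop or Spark.
object SeqHeader {
  final case class Header(
      version: Int,
      keyClass: String,
      valueClass: String,
      compressed: Boolean,
      blockCompressed: Boolean,
      codec: Option[String])

  // Reads a length-prefixed UTF-8 string.
  // Assumption: the length fits in one byte (0..127), true for typical class names.
  private def readShortString(in: DataInputStream): String = {
    val len = in.readByte().toInt
    require(len >= 0, "multi-byte length prefixes are not handled in this sketch")
    val buf = new Array[Byte](len)
    in.readFully(buf)
    new String(buf, UTF_8)
  }

  // Parses the start of a SequenceFile, up to and including the codec name.
  def parse(bytes: Array[Byte]): Header = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val magic = new Array[Byte](3)
    in.readFully(magic)
    require(new String(magic, UTF_8) == "SEQ", "not a SequenceFile header")
    val version = in.readByte().toInt
    require(version == 6, "this sketch only handles header version 6")
    val keyClass = readShortString(in)
    val valueClass = readShortString(in)
    val compressed = in.readBoolean()
    val blockCompressed = in.readBoolean()
    // The codec class name is only present when the file is compressed.
    val codec = if (compressed) Some(readShortString(in)) else None
    Header(version, keyClass, valueClass, compressed, blockCompressed, codec)
  }
}
```

Passing the first few hundred bytes of a local copy of the file to `SeqHeader.parse` should yield the same three class names that appear in the garbled output, which is also how to tell which type parameters to give `sc.sequenceFile`.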