How to read a .seq file from S3 in Spark
I am trying to read a .seq file from S3. When I try to read it using
sc.textFile("s3n://logs/box316_0.seq").take(5).foreach(println)
it outputs -
SEQorg.apache.hadoop.io.Text"org.apache.hadoop.io.BytesWritable'org.apache.hadoop.io.compress.GzipCodecp
and a bunch of encoded characters. What format is this, and how should I go about decoding the file? It is my first time with Hadoop, so please be generous :)
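Those strings are the SequenceFile header: the SEQ magic bytes followed by the key class (Text), value class (BytesWritable) and compression codec (GzipCodec) recorded in the file. A minimal sketch of inspecting that header directly, assuming a Hadoop 2.x client and reusing sc.hadoopConfiguration from the Spark shell:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.SequenceFile

// Open a reader and print the key/value classes declared in the file's header.
val reader = SequenceFile.createReader(
  sc.hadoopConfiguration,
  SequenceFile.Reader.file(new Path("s3n://logs/box316_0.seq")))
println(reader.getKeyClassName)     // key class recorded in the header
println(reader.getValueClassName)   // value class recorded in the header
reader.close()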
Update: I tried
sc.sequenceFile[Text, BytesWritable]("s3n://logs/box316_0.seq").take(5).foreach(println)
since the data is a JSON blob stored in a sequence file. This gives me -
Serialization stack:
- object not serializable (class: org.apache.hadoop.io.Text, value: 5)
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (5,7g 22 73 69 6d 65 43 74 71 9d 90 92 3a ..................
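The root cause is that Hadoop Writables such as Text and BytesWritable do not implement java.io.Serializable, so Spark cannot ship them to the driver when take(5) is called. A common workaround, sketched here under the assumption that the values are UTF-8 JSON, is to convert both sides of each pair to plain Scala types on the executors before collecting:

import org.apache.hadoop.io.{BytesWritable, Text}

val pairs = sc.sequenceFile[Text, BytesWritable]("s3n://logs/box316_0.seq")
  .map { case (k, v) =>
    // Copy the Writables into serializable Strings right away; decode only
    // v.getLength bytes because the BytesWritable backing array may be padded.
    (k.toString, Text.decode(v.getBytes, 0, v.getLength))
  }
pairs.take(5).foreach(println)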
Try:

import org.apache.hadoop.io.{BytesWritable, Text}

val path = "s3n://logs/box316_0.seq"
val seq = sc.sequenceFile[Text, BytesWritable](path)
// Decode only the valid bytes of each BytesWritable; the backing array may be padded.
val usableRDD = seq.map { case (_, v: BytesWritable) => Text.decode(v.getBytes, 0, v.getLength) }
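Decoding to plain Strings on the executors also avoids the serialization error above, because no Writable ever has to reach the driver. A quick check, assuming each value really is a UTF-8 JSON blob:

usableRDD.take(5).foreach(println)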