How to read a .seq file from S3 in Spark
I am trying to read a .seq file from S3. When I try to read it using
sc.textFile("s3n://logs/box316_0.seq").take(5).foreach(println)
it outputs -
SEQorg.apache.hadoop.io.Text"org.apache.hadoop.io.BytesWritable'org.apache.hadoop.io.compress.GzipCodecp
and a bunch of encoded characters. What format is this, and how should I go about decoding the file? It is my first time with Hadoop, so please be generous :)
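Those strings are the SequenceFile header: the SEQ magic bytes followed by the key class (Text), value class (BytesWritable) and compression codec (GzipCodec) recorded in the file. A minimal sketch of inspecting that header directly, assuming a Hadoop 2.x client and reusing sc.hadoopConfiguration from the Spark shell:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.SequenceFile

// Open a reader and print the key/value classes declared in the file's header.
val reader = SequenceFile.createReader(
  sc.hadoopConfiguration,
  SequenceFile.Reader.file(new Path("s3n://logs/box316_0.seq")))
println(reader.getKeyClassName)     // key class recorded in the header
println(reader.getValueClassName)   // value class recorded in the header
reader.close()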
Update: I tried
sc.sequenceFile[Text, BytesWritable]("s3n://logs/box316_0.seq").take(5).foreach(println)
since the data is a JSON blob stored in a sequence file. This gives me -
Serialization stack:
- object not serializable (class: org.apache.hadoop.io.Text, value: 5)
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (5,7g 22 73 69 6d 65 43 74 71 9d 90 92 3a ..................
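The root cause is that Hadoop Writables such as Text and BytesWritable do not implement java.io.Serializable, so Spark cannot ship them to the driver when take(5) is called. A common workaround, sketched here under the assumption that the values are UTF-8 JSON, is to convert both sides of each pair to plain Scala types on the executors before collecting:

import org.apache.hadoop.io.{BytesWritable, Text}

val pairs = sc.sequenceFile[Text, BytesWritable]("s3n://logs/box316_0.seq")
  .map { case (k, v) =>
    // Copy the Writables into serializable Strings right away; decode only
    // v.getLength bytes because the BytesWritable backing array may be padded.
    (k.toString, Text.decode(v.getBytes, 0, v.getLength))
  }
pairs.take(5).foreach(println)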
Try:

import org.apache.hadoop.io.{BytesWritable, Text}

val path = "s3n://logs/box316_0.seq"
val seq = sc.sequenceFile[Text, BytesWritable](path)
// Decode only the valid bytes of each BytesWritable; the backing array may be padded.
val usableRDD = seq.map { case (_, v: BytesWritable) => Text.decode(v.getBytes, 0, v.getLength) }
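Decoding to plain Strings on the executors also avoids the serialization error above, because no Writable ever has to reach the driver. A quick check, assuming each value really is a UTF-8 JSON blob:

usableRDD.take(5).foreach(println)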