ruby on rails - Can I duplicate rows with Kiba using a transform?


I'm using the Kiba gem to transform a CSV that was web-scraped from a personnel database which has no API.

From the scraping I ended up with a CSV. I can process it pretty well using the gem, but there's one bit I'm wondering about.

Consider the following data:

====================================
| name  |  article_1   | article_2 |
------------------------------------
| andy  |  foo         | bar       |
====================================

I can turn this into:

======================
| name  |  article   |
----------------------
| andy  |  foo       |
----------------------
| andy  |  bar       |
======================

(I used this tutorial for that: http://thibautbarrere.com/2015/06/25/how-to-explode-multivalued-attributes-with-kiba/)

I'm using the normalize logic on the loader for this. The code looks like:

source RowNormalizer, NormalizeArticles, CsvSource, 'rp00119.csv'
transform AddColumnEntiteit, :entiteit, "ocmw"


What I'm wondering is: can I achieve the same using a transform? The code would then look like this:

source CsvSource, 'rp00119.csv'
transform NormalizeArticles
transform AddColumnEntiteit, :entiteit, "ocmw"

So the question is: can I duplicate a row with a transform class?

In Kiba as currently released, a transform cannot yet yield more than one row: it's either one or zero.
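To illustrate the limitation: a class-based transform exposes a single `process(row)` method, and whatever it returns is the one row handed to the next step (returning `nil` drops the row). A minimal sketch, assuming rows are plain hashes with symbol keys; the `AddColumn` class is hypothetical, modelled on the `AddColumnEntiteit` transform from the question:

```ruby
# Hypothetical transform in the style of AddColumnEntiteit.
# Assumption: rows are plain Hashes keyed by symbols.
class AddColumn
  def initialize(column, value)
    @column = column
    @value = value
  end

  # Called once per row. The return value is the single row passed
  # downstream; returning nil would remove the row from the pipeline.
  def process(row)
    row.merge(@column => @value)
  end
end

# Standalone usage, outside of any Kiba job:
transform = AddColumn.new(:entiteit, "ocmw")
result = transform.process({ name: "andy", article_1: "foo", article_2: "bar" })
```

Since `process` has exactly one return value, there is no place in this contract to emit a second row for `article_2`.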

The Kiba Pro offering I'm building includes a multithreaded runner which happens (as a side effect rather than an actual goal) to allow transforms to yield an arbitrary number of rows, which is what you are looking for.

That said, without Kiba Pro, here are a number of techniques that could help.

The first possibility is to split your ETL script in two. Cut it at the step where you want to normalize the articles, and put a destination there instead. Then, in the second ETL script, use a source able to explode each row into many. This is what I think I'd recommend in your case.

If you do that, you can either use a simple rake task to invoke the ETL scripts in sequence, or alternatively use post_process to invoke the next one if you prefer (I prefer the first approach because it makes it easier to run either one or the other).
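As a rough sketch of the second script's source, assuming the first script wrote an intermediate CSV that still has the article_1, article_2, ... columns: the class below follows the usual Kiba source shape (a constructor plus an `each` method that yields rows) and yields one row per non-empty article column. The class name `ExplodedArticlesSource`, the column pattern and the file name are assumptions made for illustration, not part of the original scripts.

```ruby
require "csv"

# Hypothetical source for the second ETL script: reads the intermediate
# CSV and yields one output row per non-empty article_N column.
class ExplodedArticlesSource
  # Assumption: multivalued columns are named article_1, article_2, ...
  ARTICLE_COLUMN = /\Aarticle_\d+\z/

  def initialize(filename)
    @filename = filename
  end

  def each
    CSV.foreach(@filename, headers: true, header_converters: :symbol) do |csv_row|
      row = csv_row.to_h
      article_keys = row.keys.select { |key| key.to_s.match?(ARTICLE_COLUMN) }
      # Columns shared by every exploded row (here: name).
      shared = row.reject { |key, _value| article_keys.include?(key) }

      article_keys.each do |key|
        next if row[key].nil? || row[key].empty?
        yield shared.merge(article: row[key])
      end
    end
  end
end

# In the second Kiba script it would be declared like any other source
# (file name assumed):
#   source ExplodedArticlesSource, 'intermediate.csv'
```

Keeping the explosion inside the source means the rest of the second script only needs ordinary one-row-in, one-row-out transforms.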

Another approach (but too complicated for your current scenario) would be to declare the same source N times, with each declaration yielding only a given subset of the data, e.g.:

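# Computed while the script is parsed, so the value is visible to the
# source declarations below (extract it from the CSV?).
field_count = number_of_exploded_columns

# Exclusive range: shards are numbered 0 to field_count - 1.
(0...field_count).each do |shard|
  source MySource, shard: shard, shard_count: field_count
end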

Then, inside MySource, you would conditionally yield like this:

yield row if row_index % field_count == shard 
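Put together, such a sharded source could look roughly like the sketch below. The `rows:` keyword is an assumption added to keep the example self-contained (a real source would read the CSV instead), and inside the class the modulus is the `shard_count` passed to the constructor:

```ruby
# Hypothetical sharded source: every instance sees the same rows but only
# yields those whose index belongs to its own shard.
class MySource
  # Assumption: rows are passed in memory; a real source would read a file.
  def initialize(rows:, shard:, shard_count:)
    @rows = rows
    @shard = shard
    @shard_count = shard_count
  end

  def each
    @rows.each_with_index do |row, row_index|
      yield row if row_index % @shard_count == @shard
    end
  end
end

# Standalone usage with two shards over four rows:
rows = [{ id: 0 }, { id: 1 }, { id: 2 }, { id: 3 }]
shards = (0...2).map do |shard|
  MySource.new(rows: rows, shard: shard, shard_count: 2).to_enum(:each).to_a
end
```

Each row ends up in exactly one shard, so declaring the source once per shard covers the whole input without duplicates.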

Those are the two patterns I can think of!

I would recommend the first one to get started though, as it is easier.

