Apache Spark - SparkSQL time-series data: how to apply a custom aggregation function over a window in a DataFrame


I have an incoming DataFrame in the following format (timestamp, data1, data2):

2015-09-25T11:00:00.000Z "test" "value1"
2015-09-25T12:00:00.000Z "test" "value2"
2015-09-25T13:00:00.000Z "test" "value3"

I need to "look back" over a window of a given size and aggregate the third column across the rows in that window. If the window size is 1 hour, the output should be:

2015-09-25T11:00:00.000Z "test" "value1"
2015-09-25T12:00:00.000Z "test" "value1, value2"
2015-09-25T13:00:00.000Z "test" "value2, value3"

For a 2-hour window:

2015-09-25T11:00:00.000Z "test" "value1"
2015-09-25T12:00:00.000Z "test" "value1, value2"
2015-09-25T13:00:00.000Z "test" "value1, value2, value3"
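To pin down the semantics implied by the two expected outputs, here is a brute-force sketch in plain Python (not Spark). It assumes the window is inclusive on both ends, i.e. each row at time t collects the data2 values of rows with the same data1 whose timestamp falls in [t - window, t]. The function names `parse_ts` and `look_back_concat` are hypothetical.

```python
from datetime import datetime, timedelta

def parse_ts(s):
    # Parse timestamps like "2015-09-25T11:00:00.000Z" (treated as UTC, kept naive).
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ")

def look_back_concat(rows, window):
    """For each (timestamp, data1, data2) row, join the data2 values of all rows
    with the same data1 whose timestamp lies in [t - window, t], in time order."""
    parsed = sorted(((parse_ts(ts), ts, key, val) for ts, key, val in rows),
                    key=lambda r: r[0])
    out = []
    for t, ts, key, _ in parsed:
        # Brute force: rescan every row for every timestamp (quadratic, illustration only).
        vals = [v for t2, _, k2, v in parsed
                if k2 == key and t - window <= t2 <= t]
        out.append((ts, key, ", ".join(vals)))
    return out

rows = [
    ("2015-09-25T11:00:00.000Z", "test", "value1"),
    ("2015-09-25T12:00:00.000Z", "test", "value2"),
    ("2015-09-25T13:00:00.000Z", "test", "value3"),
]
for r in look_back_concat(rows, timedelta(hours=1)):
    print(r)
```

With `timedelta(hours=2)` instead, the 13:00 row picks up all three values, matching the 2-hour example.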

I was thinking of writing a custom aggregation function that I can use with a group-by and then applying it through a Spark SQL windowed operation, but that is not supported in Spark 1.6. Has anyone worked on a task like this before and can help?
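One workaround that avoids window functions entirely is to group the rows by data1 (for example with an RDD `groupByKey`, which pulls all of a key's rows onto one executor, so it assumes each group fits in memory) and compute the look-back inside each group. The per-group logic could be a two-pointer sliding window, sketched below in plain Python; `sliding_concat` and `look_back_by_key` are hypothetical names, and the Spark wiring is left out. Sorting dominates the cost, so each group is processed in O(n log n) rather than rescanning every row for every timestamp.

```python
from collections import deque
from datetime import datetime, timedelta
from itertools import groupby

def sliding_concat(group_rows, window):
    """group_rows: (datetime, value) pairs for ONE key, in any order.
    Yields (datetime, "v1, v2, ...") over the look-back window [t - window, t].
    Note: rows sharing the same timestamp are not treated as peers; an earlier
    row in sort order does not see a later row with an identical timestamp."""
    buf = deque()
    for t, v in sorted(group_rows, key=lambda r: r[0]):
        buf.append((t, v))
        # Evict rows that fell below the window's lower bound.
        while buf[0][0] < t - window:
            buf.popleft()
        yield t, ", ".join(val for _, val in buf)

def look_back_by_key(rows, window):
    """rows: (key, datetime, value). Groups by key, as a groupByKey would."""
    out = []
    for key, grp in groupby(sorted(rows, key=lambda r: r[0]), key=lambda r: r[0]):
        for t, agg in sliding_concat(((t, v) for _, t, v in grp), window):
            out.append((key, t, agg))
    return out

base = datetime(2015, 9, 25, 11)
rows = [("test", base + timedelta(hours=h), f"value{h + 1}") for h in range(3)]
for row in look_back_by_key(rows, timedelta(hours=2)):
    print(row)
```

The deque only holds rows still inside the window, so memory per group is bounded by the densest window rather than by the whole group, once the group has been collected.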

