Apache Spark - Spark SQL time series data: how to apply a custom aggregation function on a window in a DataFrame
I have an incoming DataFrame in the following format (timestamp, data1, data2):

2015-09-25T11:00:00.000Z "test" "value1"
2015-09-25T12:00:00.000Z "test" "value2"
2015-09-25T13:00:00.000Z "test" "value3"

I need to "look back" from each row by a given window size and aggregate the third column over the rows that fall inside that window. If the window size is 1 hour, the output should be:

2015-09-25T11:00:00.000Z "test" "value1"
2015-09-25T12:00:00.000Z "test" "value1, value2"
2015-09-25T13:00:00.000Z "test" "value2, value3"

For a 2 hour window:

2015-09-25T11:00:00.000Z "test" "value1"
2015-09-25T12:00:00.000Z "test" "value1, value2"
2015-09-25T13:00:00.000Z "test" "value1, value2, value3"

I was thinking of writing a custom aggregation function that I can group by and apply through a Spark SQL windowed operation, but that is not supported in Spark 1.6. Has anyone worked on a similar task before who can help?
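To pin down the semantics the expected output implies, here is a minimal plain-Python sketch (deliberately not Spark code). It assumes the window is inclusive on both ends, that is, a row at time t collects every row in [t - window, t], and that rows are grouped by the second column (data1). The function names parse_ts and look_back_aggregate are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_ts(ts):
    # Parse timestamps of the form 2015-09-25T11:00:00.000Z.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def look_back_aggregate(rows, window):
    """For each (timestamp, data1, data2) row, join the data2 values of all
    rows with the same data1 whose timestamp lies in [ts - window, ts].

    Assumption: both window bounds are inclusive, which is what the
    expected output in the question implies.
    """
    # Group (parsed timestamp, value) pairs by the data1 key.
    by_key = defaultdict(list)
    for ts, key, value in rows:
        by_key[key].append((parse_ts(ts), value))
    # Order each group by timestamp so joined values appear chronologically.
    for pairs in by_key.values():
        pairs.sort(key=lambda pair: pair[0])

    result = []
    for ts, key, _ in rows:
        t = parse_ts(ts)
        in_window = [v for (u, v) in by_key[key] if t - window <= u <= t]
        result.append((ts, key, ", ".join(in_window)))
    return result


rows = [
    ("2015-09-25T11:00:00.000Z", "test", "value1"),
    ("2015-09-25T12:00:00.000Z", "test", "value2"),
    ("2015-09-25T13:00:00.000Z", "test", "value3"),
]

one_hour = look_back_aggregate(rows, timedelta(hours=1))
two_hours = look_back_aggregate(rows, timedelta(hours=2))

for row in one_hour:
    print(row)
```

This is quadratic per group and only meant as a reference for what the result should look like. In Spark itself, a range-based window frame (Window.rangeBetween over the timestamp cast to seconds) combined with collect_list may be one way to express the same look-back, but whether that combination is available depends on the Spark version, so treat it as something to verify rather than a confirmed solution.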