R - One-hot encoding: binary columns for each day of the year, and selecting them
I have an R dataset of flight data. I need to add 365 columns to this dataset, one for each day of the year, with value 1 if data[i]$flightdate
of the entry corresponds to that day of the year, and 0 otherwise (see the linked question for why).
Previously I had managed to extract the day of the year from the flightdate string using lubridate:
data$dayofyear <- yday(ymd(data$flightdate))
How do I go about generating each of these 365 columns, and keep only those columns (along with some others) for a future SVD? I will need to repeat the same for the hours in the day (which I will split into ranges of 30 or 10 minutes), so 48-120 one-hot columns for a different variable will have to be added later.
Note: the dataset contains 500k flights per month (so about 16k flights for a single day of the year if I take 1 year of data), and has 100 variables (columns).
Sample input data row data[1,]:
{
  dayofyear: 10,
  fieldgoodforsvd1: 235,
  fieldbadforsvd2: "some string",
  ...
}
Sample output data row (after generating the 365 binary columns and selecting the fields compatible with SVD):
{
  dayofyear1: 0,
  ...
  dayofyear9: 0,
  dayofyear10: 1, // the flight took place on this day of the year
  dayofyear11: 0,
  ...
  dayofyear365: 0,
  fieldgoodforsvd1: 235
}
Edit
Suppose the input data matrix looks like this:
dayofyear ; fieldgoodforsvd1 ; fieldbadforsvd2
1 ; 275 ; "los angeles"
1 ; 256 ; "san francisco"
5 ; 15 ; "chicago"
The final output should be:
fieldgoodforsvd1 ; dayofyear1 ; dayofyear2 ; ... ; dayofyear4 ; dayofyear5 ; dayofyear6 ; ... ; dayofyear365
275 ; 1 ; 0 ; ... ; 0 ; 0 ; 0 ; ... ; 0
256 ; 1 ; 0 ; ... ; 0 ; 0 ; 0 ; ... ; 0
15 ; 0 ; 0 ; ... ; 0 ; 1 ; 0 ; ... ; 0
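As a minimal sketch of how the three example rows could be turned into that output using base R only: the toy data frame `d` and the result names `onehot` and `out` are made up for illustration, and fixing the factor levels to 1..365 is an assumption (it ignores day 366 of leap years) so that days absent from the data still get an all-zero column.

```r
# Toy input mirroring the three example rows (hypothetical data)
d <- data.frame(
  dayofyear        = c(1, 1, 5),
  fieldgoodforsvd1 = c(275, 256, 15),
  fieldbadforsvd2  = c("los angeles", "san francisco", "chicago")
)

# Fix the levels to 1..365 so that every day gets a column, even if unobserved
d$dayofyear <- factor(d$dayofyear, levels = 1:365)

# "~ dayofyear - 1" drops the intercept: one indicator column per factor level
onehot <- model.matrix(~ dayofyear - 1, data = d)

# Keep only the numeric field plus the 365 indicator columns
out <- data.frame(fieldgoodforsvd1 = d$fieldgoodforsvd1, onehot)
```

The string column is simply left out when building `out`, which is one way to keep only the fields that are compatible with the SVD.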
Here is the final code, which one-hot encodes dayofyear and timeslot, and then proceeds to the SVD:
library(lubridate)

# Keep only the rows where both SVD fields are present
dsan <- d[!is.na(d$fieldgoodforsvd1) & !is.na(d$fieldgoodforsvd2), ]

# Factors are needed to perform the one-hot encoding
dsan$dayofyear <- as.factor(yday(ymd(dsan$flightdate)))
# In my case departure times look like 2055 for 20h55
dsan$timeslot <- as.factor(round(dsan$deptime / 100))

dsvd <- with(dsan, data.frame(
  fieldgoodforsvd1,
  fieldgoodforsvd2,
  # ~ performs the one-hot encoding (on factors); -1 removes the intercept term
  model.matrix(~ dayofyear - 1, dsan),
  model.matrix(~ timeslot - 1, dsan)
))

thesvd <- svd(scale(dsvd))
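The code above rounds departure times to the nearest hour, whereas the question mentions slots of 30 or 10 minutes. A rough sketch of how finer slots might be derived is given below; `hhmm_to_slot` is a hypothetical helper (not part of any package), and it assumes departure times are HHMM integers between 0 and 2359.

```r
# Hypothetical helper: map HHMM integers (e.g. 2055 for 20h55) to slot indices,
# counting slots of width_min minutes from midnight (slot 0 starts at 00h00)
hhmm_to_slot <- function(hhmm, width_min = 30) {
  minutes <- (hhmm %/% 100) * 60 + (hhmm %% 100)  # minutes since midnight
  minutes %/% width_min
}

deptime <- c(5, 2055, 2359)        # made-up sample departure times
slots <- hhmm_to_slot(deptime)     # half-hour slots by default

# Fix the levels so that all 48 half-hour slots get a column
timeslot <- factor(slots, levels = 0:47)
tm <- model.matrix(~ timeslot - 1, data = data.frame(timeslot = timeslot))
```

With `width_min = 10` the same helper would give 144 slots per day, so the factor levels would need to be `0:143` instead.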