r - One-hot encoding / binary columns for each day of the year and selecting them


I have an R dataset of flight data. I need to add 365 columns to this dataset, one for each day of the year, with value 1 if the data[i]$flightdate of the entry corresponds to that day of the year, and 0 otherwise (see the linked question for why).

Previously, I had managed to extract the day of the year from the flightdate string using lubridate:

data$dayofyear <- yday(ymd(data$flightdate)) 

How do I go about generating each of the 365 columns, and keep only those columns (along with some others) for a future SVD? I will need to repeat the same for the hours in the day (which I will split into ranges of 30 or 10 minutes), so 48-120 one-hot columns for a different variable will have to be added later.

Note: the dataset contains 500k flights per month (so about 16k flights for a single day of the year if I take 1 year of data), and has 100 variables (columns).

Sample input data row data[1,]:

{
  dayofyear: 10,
  fieldgoodforsvd1: 235,
  fieldbadforsvd2: "some string",
  ...
}

Sample output data row (after generating the 365 binary columns and selecting only the fields compatible with SVD):

{
  dayofyear1: 0,
  ...
  dayofyear9: 0,
  dayofyear10: 1, // the flight took place on this day of the year
  dayofyear11: 0,
  ...
  dayofyear365: 0,
  fieldgoodforsvd1: 235
}

Edit

Suppose the input data looks like this:

dayofyear ; fieldgoodforsvd1 ; fieldbadforsvd2
1         ; 275              ; "los angeles"
1         ; 256              ; "san francisco"
5         ; 15               ; "chicago"

The final output should be:

fieldgoodforsvd1 ; dayofyear1 ; dayofyear2 ; ... ; dayofyear4 ; dayofyear5 ; dayofyear6 ; ... ; dayofyear365
275              ; 1          ; 0          ; ... ; 0          ; 0          ; 0          ; ... ; 0
256              ; 1          ; 0          ; ... ; 0          ; 0          ; 0          ; ... ; 0
15               ; 0          ; 0          ; ... ; 0          ; 1          ; 0          ; ... ; 0
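As a minimal sketch, the expected output above can be reproduced on the toy input with base R alone. It assumes the day of the year is already an integer column; the variable names `onehot` and `out` are only illustrative. Fixing the factor levels to 1..365 is an assumption made here so that days absent from the data still get an all-zero column (a leap year would need 1:366, since day 366 can occur).

```r
# Toy input from the example above (column names as in the question).
d <- data.frame(
  dayofyear = c(1, 1, 5),
  fieldgoodforsvd1 = c(275, 256, 15),
  fieldbadforsvd2 = c("los angeles", "san francisco", "chicago")
)

# Fix the levels to 1..365 so every day gets a column,
# even the days that never occur in the data.
d$dayofyear <- factor(d$dayofyear, levels = 1:365)

# "- 1" drops the intercept, so model.matrix emits one 0/1 column per level.
onehot <- model.matrix(~ dayofyear - 1, data = d)

# Keep only the numeric field next to the indicator columns;
# the string field is left out because it cannot go into an SVD.
out <- cbind(fieldgoodforsvd1 = d$fieldgoodforsvd1, onehot)
dim(out)
```

The result is a plain numeric matrix with one row per flight, columns named dayofyear1 to dayofyear365, and the kept numeric field in front.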

Here is the final code, which one-hot encodes dayofyear and timeslot, and then proceeds to the SVD:

library(lubridate)

# keep only the rows where both numeric fields are present
dsan <- d[!is.na(d$fieldgoodforsvd1) & !is.na(d$fieldgoodforsvd2), ]

# we need factors to perform the one-hot encoding
dsan$dayofyear <- as.factor(yday(ymd(dsan$flightdate)))
# in my case departure times look like 2055 for 20h55
dsan$timeslot <- as.factor(round(dsan$deptime / 100))

dsvd <- with(dsan, data.frame(
  fieldgoodforsvd1,
  fieldgoodforsvd2,
  # ~ performs the one-hot encoding (on factors); -1 removes the intercept term
  model.matrix(~ dayofyear - 1, dsan),
  model.matrix(~ timeslot - 1, dsan)
))
thesvd <- svd(scale(dsvd))
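The code above rounds departure times to the nearest hour, while the question also mentions ranges of 30 or 10 minutes. A possible sketch of that finer binning follows. It assumes deptime is an HHMM number between 0 and 2359 (as in the 2055 example); the sample values and the names `slot_minutes`, `slot` and `n_slots` are hypothetical.

```r
# Hypothetical departure times in HHMM form (2055 means 20h55).
deptime <- c(5, 30, 959, 2055, 2359)
slot_minutes <- 30  # assumption: 30-minute slots; set to 10 for finer slots

# Convert HHMM to minutes since midnight, then to a 0-based slot index.
minutes <- (deptime %/% 100) * 60 + deptime %% 100
slot <- minutes %/% slot_minutes

# Fix the levels so every slot of the day gets a column, even unused ones.
# Values outside 0..2359 (for example 2400) would become NA here.
n_slots <- (24 * 60) %/% slot_minutes
timeslot <- factor(slot, levels = 0:(n_slots - 1))

# One 0/1 column per slot, without an intercept column.
onehot <- model.matrix(~ timeslot - 1)
dim(onehot)
```

With 30-minute slots this gives 48 indicator columns, and each row has exactly one 1.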
