r - How to drop groups when there are not enough observations?


How can I drop groups when there are not enough observations? In the following reproducible example, each person (identified by name) has 10 observations:

install.packages('randomNames') # install package if required
install.packages('data.table')  # install package if required
lapply(c('data.table', 'randomNames'), require, character.only = TRUE) # load packages

set.seed(1)
testdt <- data.table(date = rep(seq(as.Date("2010/1/1"), as.Date("2019/1/1"), "years"), 10),
                     name = rep(randomNames(10, which.names = 'first'), times = 1, each = 10),
                     y    = runif(100, 5, 15),
                     x    = rnorm(100, 2, 9))
testdt <- testdt[x > 0] # keep only rows with positive x, so group sizes now differ

Now I want to keep only the persons with at least 6 observations, so Gracelline, Anna, Aesha and Michael must be removed, because they have only 3, 2, 4 and 5 observations respectively.

testdt[, length(x), by = name]
          name V1
 1:      Blake  6
 2:  Alexander  6
 3:     Leigha  8
 4: Gracelline  3
 5:   Epifanio  7
 6:     Keasha  6
 7:      Robyn  6
 8:       Anna  2
 9:      Aesha  4
10:    Michael  5

How can I do this in an automatic way (the real dataset is larger)?

Edit:

Yes, it's a duplicate. :( The last proposed method is the fastest one.

> system.time(testdt[, .SD[.N >= 6], by = name])
   user  system elapsed
  0.293   0.227   0.517
> system.time(testdt[testdt[, .I[.N >= 6], by = name]$V1])
   user  system elapsed
  0.163   0.243   0.415
> system.time(testdt[, if (.N >= 6) .SD, by = name])
   user  system elapsed
  0.073   0.323   0.399

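We group by 'name', get the number of rows in each group (.N), and if it is greater than or equal to 6, return that group's subset of the data.table (.SD). Groups that fail the condition return nothing and are dropped.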

testdt[, if (.N >= 6) .SD, by = name]
#         name       date         y           x
# 1:     Blake 2010-01-01  9.820801  3.69913070
# 2:     Blake 2012-01-01  9.935413 15.18999375
# 3:     Blake 2013-01-01  6.862176  3.37928004
# 4:     Blake 2014-01-01 13.273733 21.55350503
# 5:     Blake 2015-01-01 11.684667  6.27958576
# 6:     Blake 2017-01-01  6.079436  7.49653718
# 7: Alexander 2010-01-01 13.209463  4.62301612
# 8: Alexander 2012-01-01 12.829328  2.00994816
# 9: Alexander 2013-01-01 10.530363  2.66907192
#10: Alexander 2016-01-01  5.233312  0.78339246
#11: Alexander 2017-01-01  9.772301 12.60278297
#12: Alexander 2019-01-01 11.927316  7.34551569
#13:    Leigha 2010-01-01  9.776196  4.99655334
#14:    Leigha 2011-01-01 13.612095 11.56789854
#15:    Leigha 2013-01-01  7.447973  5.33016929
#16:    Leigha 2014-01-01  5.706790  4.40388912
#17:    Leigha 2016-01-01  8.162717 12.87081025
#18:    Leigha 2017-01-01 10.186343 12.44362354
#19:    Leigha 2018-01-01 11.620051  8.30192285
#20:    Leigha 2019-01-01  9.068302 16.28150109
#21:  Epifanio 2010-01-01  8.390729 17.90558542
#22:  Epifanio 2011-01-01 13.394404  8.45036728
#23:  Epifanio 2012-01-01  8.466835 10.19156807
#24:  Epifanio 2013-01-01  8.337749  5.45766822
#25:  Epifanio 2014-01-01  9.763512 17.13958472
#26:  Epifanio 2017-01-01  8.899895 14.89054015
#27:  Epifanio 2019-01-01 14.606180  0.13357331
#28:    Keasha 2013-01-01  8.253522  6.44769498
#29:    Keasha 2014-01-01 12.570871  0.40402566
#30:    Keasha 2016-01-01 12.111212 14.08734943
#31:    Keasha 2017-01-01  6.216919  0.06878532
#32:    Keasha 2018-01-01  7.454885  0.38399123
#33:    Keasha 2019-01-01  6.433044  1.09828333
#34:     Robyn 2010-01-01  7.396294  8.41399676
#35:     Robyn 2011-01-01  5.589344  1.33792036
#36:     Robyn 2012-01-01 11.422883  1.66129246
#37:     Robyn 2015-01-01 12.973088  2.54144396
#38:     Robyn 2017-01-01  9.100841  6.78346573
#39:     Robyn 2019-01-01 11.049333  4.75902075
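The same pattern can be sketched on a tiny hand-made table, which avoids the randomNames dependency. The table dt, its values and the threshold min_obs are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy data: "a" has 3 rows, "b" has 1 row, "c" has 2 rows
dt <- data.table(name = c("a", "a", "a", "b", "c", "c"),
                 x    = 1:6)

min_obs <- 2L  # assumed threshold, chosen only for this illustration

# Per group: return the group's rows (.SD) only when it has enough rows (.N).
# An if without an else yields NULL for the other groups, so they are dropped.
kept <- dt[, if (.N >= min_obs) .SD, by = name]
print(kept)
```

Here group "b" has a single row and is removed, while "a" and "c" are kept whole. Putting the threshold in a variable makes it easy to change for a larger dataset.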

Or, instead of the if, we can use the condition .N >= 6 directly as the row index of .SD:

testdt[, .SD[.N >= 6], by = name]

This is a little slow, so another option is to get the row indices with .I and then use them to subset:

testdt[testdt[, .I[.N >= 6], by = name]$V1]
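As a rough check that the three forms select the same rows, here is a sketch on made-up data. The table dt and the threshold min_obs are hypothetical, and x doubles as a row identifier so the results can be compared.

```r
library(data.table)

# Hypothetical toy data: "a" has 3 rows, "b" has 1 row, "c" has 2 rows
dt <- data.table(name = c("a", "a", "a", "b", "c", "c"),
                 x    = 1:6)
min_obs <- 2L  # assumed threshold for illustration

# 1) if/.SD: small groups return NULL and vanish
via_if <- dt[, if (.N >= min_obs) .SD, by = name]

# 2) .SD indexed by a length-one logical: TRUE keeps all rows, FALSE keeps none
via_sd <- dt[, .SD[.N >= min_obs], by = name]

# 3) .I: collect the original row numbers of qualifying groups, then subset once
row_idx   <- dt[, .I[.N >= min_obs], by = name]$V1
via_index <- dt[row_idx]

# All three keep the same rows (x identifies each original row here)
print(identical(via_if$x, via_sd$x))
print(identical(via_if$x, via_index$x))
```

One difference to keep in mind: the grouped forms (1 and 2) put the by column first in the result, whereas the .I form subsets the original table and so keeps its original column order.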
