regex - R: Reading multiline data patterns from file
Here is the file pattern:
    metastring: time1, a,b,c,d,f
    144135 42435 345425 2342423
    263766 35553 353453 3534553
    355345 52454 525252 2423465
    245466 45645 355345 6454556
    355662 26397 353577 3558676
    metastring: time2, a,c,d,f
    224234 23423 324234 4242324
    312323 13123 312312 1312321
    246456 63564 646544 4456456
    244424 53556 546456 4645645
The metastrings consist of a time stamp followed by names (a, b, c, d, ...) that refer to the strings of numbers below them (e.g. "a" refers to the first number string of the block). The number strings are fixed-width, but how many there are is not constant; it depends on the metastring. I want either a data.frame structured like this:
    time1 a 144135 42435 345425 2342423
    time1 b 263766 35553 353453 3534553
    time1 c 355345 52454 525252 2423465
    time1 d 245466 45645 355345 6454556
    time1 f 355662 26397 353577 3558676
    time2 a 224234 23423 324234 4242324
    time2 c 312323 13123 312312 1312321
    time2 d 246456 63564 646544 4456456
    time2 f 244424 53556 546456 4645645
or to be able to read a single block at a time, by matching the metastring format and then reading the lines between two metastrings. I can't find a way to do it, since read.pattern from gsubfn seems to read the file one line at a time, so I can't get any further than the metastring itself.
To get a data frame in return, here's a possibility that uses readLines() and some post-processing on the strings. In this code, replace textConnection(text) with the name of your file.
    ## read the file
    dat <- readLines(textConnection(text))
    ## find the 'metastring' lines
    meta <- grepl("metastring", dat, fixed = TRUE)
    ## split the 'metastring' lines and create the first two columns:
    ## the time stamp (recycled) and one name per data row
    f2cols <- do.call(
      "rbind",
      lapply(
        strsplit(dat[meta], "(.*: )|, ?"),
        function(x) cbind(text1 = x[2], text2 = tail(x, -2))
      )
    )
    ## create the final data frame
    cbind(f2cols, read.table(text = dat[!meta]))
    #   text1 text2     V1    V2     V3      V4
    # 1 time1     a 144135 42435 345425 2342423
    # 2 time1     b 263766 35553 353453 3534553
    # 3 time1     c 355345 52454 525252 2423465
    # 4 time1     d 245466 45645 355345 6454556
    # 5 time1     f 355662 26397 353577 3558676
    # 6 time2     a 224234 23423 324234 4242324
    # 7 time2     c 312323 13123 312312 1312321
    # 8 time2     d 246456 63564 646544 4456456
    # 9 time2     f 244424 53556 546456 4645645
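The split pattern is the least obvious part of the code above, so here is a small sketch of what it does to a single header line. The variable names `header`, `parts`, `time_stamp` and `block_names` are only illustrative and do not appear in the answer's code.

```r
# One header line, copied from the sample data
header <- "metastring: time1, a,b,c,d,f"

# "(.*: )" matches the "metastring: " prefix;
# ", ?" matches each comma plus an optional following space
parts <- strsplit(header, "(.*: )|, ?")[[1]]
print(parts)
# [1] ""      "time1" "a"     "b"     "c"     "d"     "f"

# The prefix match at the very start leaves an empty first element,
# so the time stamp is element 2 and the names are everything after it
time_stamp  <- parts[2]
block_names <- tail(parts, -2)
```

This is why the answer uses x[2] for the first column and tail(x, -2) for the second.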
Data:
text <- "metastring: time1, a,b,c,d,f\n144135 42435 345425 2342423\n263766 35553 353453 3534553\n355345 52454 525252 2423465\n245466 45645 355345 6454556\n355662 26397 353577 3558676\nmetastring: time2, a,c,d,f\n224234 23423 324234 4242324\n312323 13123 312312 1312321\n246456 63564 646544 4456456\n244424 53556 546456 4645645"
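If you would rather read a single block at a time (the second option in the question), one possible approach is to number the blocks with cumsum() over the metastring lines and split on that. This is only a sketch, not part of the original answer: `read_block()` is a hypothetical helper, the column names `time` and `name` are my own choice, and it assumes the file starts with a metastring line. The sample lines are copied from the question; with a real file you would use readLines() on the file name instead.

```r
# Sample lines copied from the question; with a real file this would be
# lines <- readLines("yourfile.txt")   # hypothetical file name
lines <- c(
  "metastring: time1, a,b,c,d,f",
  "144135 42435 345425 2342423",
  "263766 35553 353453 3534553",
  "355345 52454 525252 2423465",
  "245466 45645 355345 6454556",
  "355662 26397 353577 3558676",
  "metastring: time2, a,c,d,f",
  "224234 23423 324234 4242324",
  "312323 13123 312312 1312321",
  "246456 63564 646544 4456456",
  "244424 53556 546456 4645645"
)

# Every metastring line starts a new block, so the running count of
# metastring lines gives a block id for each line
is_meta  <- grepl("metastring", lines, fixed = TRUE)
block_id <- cumsum(is_meta)

# Hypothetical helper: turn one block (header line + data lines)
# into a data frame
read_block <- function(block) {
  header <- strsplit(sub("^metastring: *", "", block[1]), ", *")[[1]]
  values <- read.table(text = block[-1])
  # data.frame() stops with an error if the number of names does not
  # match the number of data rows in the block
  data.frame(time = header[1], name = header[-1], values,
             stringsAsFactors = FALSE)
}

blocks <- lapply(split(lines, block_id), read_block)

blocks[[2]]                       # just the second block (time2)
result <- do.call(rbind, blocks)  # or all blocks stacked together
```

Keeping the blocks in a list means you can work with one block on its own, and stacking them with rbind gives the same shape of data frame as the code above.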