Subset a dataframe according to matches between dataframe column and separate character vector in R
I want to use a character vector to:
- find the rows in a data frame where a column holding a comma-delimited list contains one or more of the vector's values
- subset the data frame, retaining only the matching rows
Example data:

    look <- c("id1", "id2", "id5", "id9")
    # var3 has length 5 and is recycled by data.frame() to fill the 10 rows
    df <- data.frame(var1 = 1:10, var2 = 3:12, var3 = rep(c("", "id1,id3", "id1,id9", "", "")))
    df
       var1 var2    var3
    1     1    3
    2     2    4 id1,id3
    3     3    5 id1,id9
    4     4    6
    5     5    7
    6     6    8
    7     7    9 id1,id3
    8     8   10 id1,id9
    9     9   11
    10   10   12

The output should look like:

      var1 var2    var3
    1    2    4 id1,id3
    2    3    5 id1,id9
    3    7    9 id1,id3
    4    8   10 id1,id9

That is, keep the rows where var3 contains at least one value from the look vector.
Is there a base R solution that doesn't involve using strsplit on the var3 column?
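1) Create an appropriate regular expression and perform a grep. As requested, this uses no packages and no strsplit. The alternatives are wrapped in parentheses so that the \\b word boundaries apply to every one of them; without the group, the first \\b would attach only to id1 and the last only to id9:

    subset(df, grepl(paste0("\\b(", paste(look, collapse = "|"), ")\\b"), var3))

giving:

      var1 var2    var3
    2    2    4 id1,id3
    3    3    5 id1,id9
    7    7    9 id1,id3
    8    8   10 id1,id9

1a) Depending on precisely what var3 and look contain, it may be possible to shorten this, although the result is less general than 1) above. For example, id1 would match id11 here, whereas the previous solution does not have that problem: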
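    subset(df, grepl(paste(look, collapse = "|"), var3))

2) If you are willing to relax the strsplit requirement but still not use packages:

    subset(df, sapply(strsplit(as.character(var3), ","), function(x) any(x %in% look)))

The as.character() call guards against var3 being a factor (the data.frame() default before R 4.0.0), since strsplit() only accepts character input.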
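A quick way to check the word-boundary behaviour on a few strings. The vector x below is made up for illustration, chosen to include near misses such as id11 and id22. The sketch compares pasting \\b around the bare alternation, pasting it around a grouped alternation, and the plain alternation from 1a):

```r
look <- c("id1", "id2", "id5", "id9")
# Hypothetical test strings, including near misses (id11, id22).
x <- c("", "id1,id3", "id1,id9", "id11,id3", "id3,id22")

# Without a group the pattern reads as: \bid1 | id2 | id5 | id9\b,
# so \b constrains only the first and last alternatives.
ungrouped <- paste0("\\b", paste(look, collapse = "|"), "\\b")

# With a group, both boundaries apply to every alternative.
grouped <- paste0("\\b(", paste(look, collapse = "|"), ")\\b")

# Plain alternation with no boundaries at all.
plain <- paste(look, collapse = "|")

ungrouped_hits <- grepl(ungrouped, x)
grouped_hits   <- grepl(grouped, x)
plain_hits     <- grepl(plain, x)

print(data.frame(x, ungrouped_hits, grouped_hits, plain_hits))
```

Only the grouped pattern rejects id11 and id22. Without the parentheses the \\b anchors end up attached to just two of the four alternatives, so on these strings the pattern is no stricter than 1a).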
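If the list elements can contain non-word characters, \\b is no longer a reliable separator. For instance, \\b(id1)\\b would still match an element such as id1-old, because - is not a word character. A sketch that anchors each alternative on the commas instead, still without strsplit, assuming the elements are separated by bare commas (no surrounding spaces) and that look contains no regex metacharacters:

```r
# Example data from the question; the recycling is written out with rep(..., 2).
look <- c("id1", "id2", "id5", "id9")
df <- data.frame(var1 = 1:10, var2 = 3:12,
                 var3 = rep(c("", "id1,id3", "id1,id9", "", ""), 2),
                 stringsAsFactors = FALSE)

# Each alternative must be preceded by the start of the string or a comma,
# and followed by a comma or the end of the string, so only whole
# comma-delimited elements can match.
pattern <- paste0("(^|,)(", paste(look, collapse = "|"), ")(,|$)")

res <- subset(df, grepl(pattern, var3))
print(res)

# Cross-check against the strsplit() approach from 2).
res2 <- subset(df, sapply(strsplit(var3, ","), function(x) any(x %in% look)))
stopifnot(identical(res, res2))

# Near misses: id1-old and id11 are rejected, a trailing id9 is accepted.
print(grepl(pattern, c("id1-old", "id11,id3", "id3,id9")))
```

The trade-off is a slightly longer pattern in exchange for matching exactly the whole-element semantics of the strsplit() version.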