Subset a dataframe according to matches between a dataframe column and a separate character vector in R
I want to use a character vector to:
- find the rows in a dataframe that contain one or more matches to the vector in a comma-delimited list within a column of the dataframe
- subset the dataframe, retaining only the rows with matches
Example data:
look <- c("id1", "id2", "id5", "id9")
df <- data.frame(var1 = 1:10, var2 = 3:12, var3 = rep(c("", "id1,id3", "id1,id9", "", "")))
df
   var1 var2    var3
1     1    3
2     2    4 id1,id3
3     3    5 id1,id9
4     4    6
5     5    7
6     6    8
7     7    9 id1,id3
8     8   10 id1,id9
9     9   11
10   10   12
The desired output would look like:
  var1 var2    var3
1    2    4 id1,id3
2    3    5 id1,id9
3    7    9 id1,id3
4    8   10 id1,id9
A row in the var3 column may match more than one value in the look vector.
Is there a base R solution that doesn't involve using strsplit on the var3 column?
1) Create an appropriate regular expression and perform a grep. As requested, this does not use packages and does not use strsplit:
subset(df, grepl(paste0("\\b(", paste(look, collapse = "|"), ")\\b"), var3))
giving:
  var1 var2    var3
2    2    4 id1,id3
3    3    5 id1,id9
7    7    9 id1,id3
8    8   10 id1,id9
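To make the regular-expression approach easier to follow, here is a minimal sketch that builds the pattern step by step on the example data. The names pattern, keep and result are introduced only for illustration, and the alternation is wrapped in parentheses on the assumption that the word-boundary anchors should apply to every id rather than only to the first and last one.

```r
look <- c("id1", "id2", "id5", "id9")
df <- data.frame(var1 = 1:10, var2 = 3:12,
                 var3 = rep(c("", "id1,id3", "id1,id9", "", ""), 2))

# Join the ids into a single alternation: id1|id2|id5|id9
alternation <- paste(look, collapse = "|")

# Group the alternation and anchor it with word boundaries on both sides,
# so that each id must appear as a whole word in the comma-delimited list.
pattern <- paste0("\\b(", alternation, ")\\b")

# grepl() returns one logical value per row of var3.
keep <- grepl(pattern, df$var3)

# Index the rows directly; this is equivalent to the subset() call.
result <- df[keep, ]
print(result)
```

Indexing with df[keep, ] rather than subset() is only a stylistic choice here; both keep the original row names.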
1a) Depending on precisely what var3 and look contain, it may be possible to shorten this (but it is less general than the longer one above; for example, id1 would match id11, whereas the prior solution does not have this problem):
subset(df, grepl(paste(look, collapse = "|"), var3))
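The difference between the two patterns can be seen on a small vector. This is only a sketch: the values in x are hypothetical and are not part of the question's data, and the names loose and strict are chosen just for this comparison.

```r
look <- c("id1", "id2", "id5", "id9")

# Hypothetical var3-style values; "id11" is not in look.
x <- c("id11,id3", "id1,id3", "id4", "")

# Shortened pattern: plain substring alternation, no word boundaries.
loose <- grepl(paste(look, collapse = "|"), x)

# Longer pattern: grouped alternation with word boundaries on both sides.
strict <- grepl(paste0("\\b(", paste(look, collapse = "|"), ")\\b"), x)

# Compare the two results side by side.
print(data.frame(x = x, loose = loose, strict = strict))
```

The first element is the one where the two disagree: the shortened pattern finds id1 inside id11, while the word-boundary version does not. If no id in look is a prefix or substring of another possible id, the two patterns select the same rows.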
2) If you are willing to relax the strsplit requirement but still not use packages:
subset(df, sapply(strsplit(as.character(var3), ","), function(x) any(x %in% look)))
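As a rough sketch, the strsplit approach can also be wrapped in a small helper. The function name has_any_id and its sep argument are hypothetical, and the call to trimws() is an added assumption for lists that may contain spaces after the commas; it is not needed for the example data.

```r
look <- c("id1", "id2", "id5", "id9")
df <- data.frame(var1 = 1:10, var2 = 3:12,
                 var3 = rep(c("", "id1,id3", "id1,id9", "", ""), 2))

# Hypothetical helper: TRUE for each element of x whose delimited list
# contains at least one of the ids.
has_any_id <- function(x, ids, sep = ",") {
  # as.character() guards against var3 being a factor (the default for
  # strings in data.frame() before R 4.0).
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # An empty string splits to character(0), for which any() is FALSE.
  vapply(parts, function(p) any(trimws(p) %in% ids), logical(1))
}

result <- df[has_any_id(df$var3, look), ]
print(result)
```

Because each element is compared with %in% rather than matched by a regular expression, an id such as id11 is never mistaken for id1.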