dataframe - Interpolate missing values, and conducting cumulative sum based on other columns in R -
i have data frame this:
wpt id fuel dist express local 1 s36 12 1 1 0 2 s36 14 2 1 0 inter s36 na na 1 0 inter s36 na na 1 0 3 s36 16 4 1 0 inter s36 na na 0 1 4 s36 18 6 0 1 5 s36 22 7 0 1 6 w09 45 9 1 0 inter w09 na na 1 0 inter w09 na na 1 0 inter w09 na na 1 0 7 w09 48 14 0 1 8 w09 50 15 0 1
(1)i'd interpolate , insert values places nas of fuel , dist column. treat rows "inter" start , end rows of regular numbered "wpt" unit. conduct interpolation.
the expected output this:
wpt id fuel dist express local 1 s36 12 1 1 0 2 s36 14 2 1 0 inter s36 14.6667 2.67 1 0 inter s36 15.3333 3.33 1 0 3 s36 16 4 1 0 inter s36 17 5 0 1 4 s36 18 6 0 1 5 s36 22 7 0 1 6 w09 45 9 1 0 inter w09 45.75 10.25 1 0 inter w09 46.50 11.50 1 0 inter w09 47.25 12.75 1 0 7 w09 48 14 0 1 8 w09 50 15 0 1
to clear, first segment interpolation calculated this:
> seq(14,16,length.out = 4)
[1] 14.00000 14.66667 15.33333 16.00000
(2) want cumulative sum of each class of express , local id. expected output this:
id cumsum.fuel cumsum.dist express local s36 4 3 1 0 s36 5 2 0 1 w09 2.25 3.75 1 0 w09 2 1 0 1
to clear, cum.sum.fuel "s36" express 16-12=4. same applies others.
thanks in advance!!!
to fill both columns, can use mutate_at
after grouping "id"
library(dplyr) library(zoo) df2 <- df1 %>% group_by(id) %>% mutate_at(vars(fuel, dist), na.approx) df2 # wpt id fuel dist express local # <chr> <chr> <dbl> <dbl> <int> <int> #1 1 s36 12.00000 1.000000 1 0 #2 2 s36 14.00000 2.000000 1 0 #3 inter s36 14.66667 2.666667 1 0 #4 inter s36 15.33333 3.333333 1 0 #5 3 s36 16.00000 4.000000 1 0 #6 inter s36 17.00000 5.000000 0 1 #7 4 s36 18.00000 6.000000 0 1 #8 5 s36 22.00000 7.000000 0 1 #9 6 w09 45.00000 9.000000 1 0 #10 inter w09 45.75000 10.250000 1 0 #11 inter w09 46.50000 11.500000 1 0 #12 inter w09 47.25000 12.750000 1 0 #13 7 w09 48.00000 14.000000 0 1 #14 8 w09 50.00000 15.000000 0 1
for second part,
library(data.table) df2 %>% group_by(id, express1 = rleid(express), local1 = rleid(local)) %>% summarise(express = first(express), local = first(local), cumsum.fuel = last(fuel) - first(fuel), cumsum.dist = last(dist) - first(dist)) %>% ungroup() %>% select(-express1, - local1) #source: local data frame [4 x 5] # id express local cumsum.fuel cumsum.dist # <chr> <int> <int> <dbl> <dbl> #1 s36 1 0 4.00 3.00 #2 s36 0 1 5.00 2.00 #3 w09 1 0 2.25 3.75 #4 w09 0 1 2.00 1.00
or can without rleid
df2 %>% group_by(id, express, local) %>% summarise(cumsum.fuel = last(fuel) - first(fuel), cumsum.dist = last(dist) - first(dist))
Comments
Post a Comment