dataframe - Interpolate missing values, and conducting cumulative sum based on other columns in R -


i have data frame this:

wpt    id   fuel  dist  express   local  1     s36   12    1     1         0  2     s36   14    2     1         0  inter s36   na    na    1         0  inter s36   na    na    1         0  3     s36   16    4     1         0  inter s36   na    na    0         1  4     s36   18    6     0         1  5     s36   22    7     0         1  6     w09   45    9     1         0  inter w09   na    na    1         0  inter w09   na    na    1         0  inter w09   na    na    1         0  7     w09   48    14    0         1  8     w09   50    15    0         1 

(1)i'd interpolate , insert values places nas of fuel , dist column. treat rows "inter" start , end rows of regular numbered "wpt" unit. conduct interpolation.

the expected output this:

wpt    id   fuel     dist  express   local  1     s36   12       1     1         0  2     s36   14       2     1         0  inter s36   14.6667  2.67  1         0  inter s36   15.3333  3.33  1         0  3     s36   16       4     1         0  inter s36   17       5     0         1  4     s36   18       6     0         1  5     s36   22       7     0         1  6     w09   45       9     1         0  inter w09   45.75    10.25 1         0  inter w09   46.50    11.50 1         0  inter w09   47.25    12.75 1         0  7     w09   48       14    0         1  8     w09   50       15    0         1 

to clear, first segment interpolation calculated this:

> seq(14,16,length.out = 4)
[1] 14.00000 14.66667 15.33333 16.00000

(2) want cumulative sum of each class of express , local id. expected output this:

id  cumsum.fuel  cumsum.dist express  local s36    4             3          1       0 s36    5             2          0       1 w09    2.25          3.75       1       0 w09    2             1          0       1 

to clear, cum.sum.fuel "s36" express 16-12=4. same applies others.

thanks in advance!!!

to fill both columns, can use mutate_at after grouping "id"

library(dplyr) library(zoo) df2 <- df1 %>%           group_by(id) %>%           mutate_at(vars(fuel, dist), na.approx)  df2 #     wpt    id     fuel      dist express local #   <chr> <chr>    <dbl>     <dbl>   <int> <int> #1      1   s36 12.00000  1.000000       1     0 #2      2   s36 14.00000  2.000000       1     0 #3  inter   s36 14.66667  2.666667       1     0 #4  inter   s36 15.33333  3.333333       1     0 #5      3   s36 16.00000  4.000000       1     0 #6  inter   s36 17.00000  5.000000       0     1 #7      4   s36 18.00000  6.000000       0     1 #8      5   s36 22.00000  7.000000       0     1 #9      6   w09 45.00000  9.000000       1     0 #10 inter   w09 45.75000 10.250000       1     0 #11 inter   w09 46.50000 11.500000       1     0 #12 inter   w09 47.25000 12.750000       1     0 #13     7   w09 48.00000 14.000000       0     1 #14     8   w09 50.00000 15.000000       0     1 

for second part,

library(data.table) df2 %>%    group_by(id, express1 = rleid(express), local1 = rleid(local)) %>%    summarise(express = first(express),              local = first(local),               cumsum.fuel = last(fuel) - first(fuel),              cumsum.dist = last(dist) - first(dist))  %>%     ungroup() %>%      select(-express1, - local1) #source: local data frame [4 x 5] #    id express local cumsum.fuel cumsum.dist #  <chr>   <int> <int>       <dbl>       <dbl> #1   s36       1     0        4.00        3.00 #2   s36       0     1        5.00        2.00 #3   w09       1     0        2.25        3.75 #4   w09       0     1        2.00        1.00 

or can without rleid

df2 %>%     group_by(id, express, local) %>%      summarise(cumsum.fuel = last(fuel) - first(fuel),                cumsum.dist = last(dist) - first(dist))   

Comments

Popular posts from this blog

sql - invalid in the select list because it is not contained in either an aggregate function -

Angularjs unit testing - ng-disabled not working when adding text to textarea -

How to start daemon on android by adb -