Availability of material under CC-BY-SA.
Change-point detection is a topic that has been around for decades and it has often been considered together with trend detection.
Change-point: distributional or structural change in data in a way that the distribution of data before change-point somehow differs from the distribution of data after change-point.
Trend: Slowly departure from the past norm of data.
set.seed(1234)
x <- rnorm(200)
y <- arima.sim(n=200,model = list(ar=0.8))
t <- seq(1,4,length.out = 200)
w <- x+t
z <- c(x[1:60],x[61:200]+2)
par(mfrow=c(2,2))
ts.plot(x,main="randomly normally-distributed");ts.plot(y,main="timely correlated");ts.plot(w,main="upward trend");ts.plot(z,main="abrupt change")
More complicated cases
set.seed(1234)
x <- rnorm(200)
t <- seq(1,4,length.out = 100)
w <- x+c(t,rev(t))
w1 <- x+c(t,rep(t[100],100))
z <- c(x[1:60],x[61:120]+2,x[121:200]-2)
z1 <- sin(c(1:100))+runif(100,-0.1,0.1)
par(mfrow=c(2,2))
ts.plot(w,main="trend upward/downward");ts.plot(w1,main="trend upward/stable");ts.plot(z,main="abrupt change");ts.plot(z1,main="seasonality")
Abrupt change vs trend
library(gplots)
set.seed(123)
x <- rnorm(200)
y <- c(1:200)
z <- c(x[1:100],x[101:200]+2)
ts.plot(z)
lines(lowess(y,z,f=.8), col="red",lwd=2)
lines(lowess(y[1:100],z[1:100],f=.8), col="blue",lwd=4)
lines(lowess(y[101:200],z[101:200],f=.8), col="gold",lwd=4)
Notes:
A fitted model to time-ordered data might not always be valid.
Abrupt changes make (unreal) trends.
Hypothesis of trend detection methods:
\(\mathcal{H}_0\): data is independently and randomly ordered, vs \(\mathcal{H}_1\): there exist an/a upward/downward trend in data.
Mann-Kendall:
We now apply the Mann-Kendall method to a normally-distributed data as follows:
library(trend)
set.seed(123)
x <- rnorm(200)
mk.test(x)
##
## Mann-Kendall trend test
##
## data: x
## z = -0.94578, n = 200, p-value = 0.3443
## alternative hypothesis: true S is not equal to 0
## sample estimates:
## S varS tau
## -8.960000e+02 8.955000e+05 -4.502513e-02
We next make an artificial change of magnitude 2 at time 101
z <- c(x[1:100],x[101:200]+2)
mk.test(z)
##
## Mann-Kendall trend test
##
## data: z
## z = 9.3933, n = 200, p-value < 2.2e-16
## alternative hypothesis: true S is not equal to 0
## sample estimates:
## S varS tau
## 8.890000e+03 8.955000e+05 4.467337e-01
Hypothesis of change-point detection methods:
\(\mathcal{H}_0\): there is no change-point in data, vs \(\mathcal{H}_1\): there exist at least one change-point.
We now make use of the method e.divisive for the same simulated data.
E.divisive:
library(ecp)
e.divisive(matrix(x,ncol = 1),sig.lvl=.05)
## $k.hat
## [1] 1
##
## $order.found
## [1] 1 201
##
## $estimates
## [1] 1 201
##
## $considered.last
## [1] 99
##
## $p.values
## [1] 0.38
##
## $permutations
## [1] 199
##
## $cluster
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
e.divisive(matrix(z,ncol = 1),sig.lvl=.05)
## $k.hat
## [1] 2
##
## $order.found
## [1] 1 201 101
##
## $estimates
## [1] 1 101 201
##
## $considered.last
## [1] 148
##
## $p.values
## [1] 0.005 0.620
##
## $permutations
## [1] 199 199
##
## $cluster
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
## [112] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [149] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [186] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Simulation from AR(0.8)
set.seed(1234)
x.ar <- arima.sim(168,model = list(ar=0.8))
mk.test(x.ar)
##
## Mann-Kendall trend test
##
## data: x.ar
## z = 4.7336, n = 168, p-value = 2.206e-06
## alternative hypothesis: true S is not equal to 0
## sample estimates:
## S varS tau
## 3.452000e+03 5.315053e+05 2.460793e-01
e.divisive(matrix(x.ar,ncol = 1),sig.lvl=.05)
## $k.hat
## [1] 4
##
## $order.found
## [1] 1 169 31 77 115
##
## $estimates
## [1] 1 31 77 115 169
##
## $considered.last
## [1] -1
##
## $p.values
## [1] 0.005 0.015 0.005
##
## $permutations
## [1] 199 199 199
##
## $cluster
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2
## [38] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [75] 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [112] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [149] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Simulate 400 normally-distributed, average 0 and standard deviation 1, vectors of length 168.
Apply the Mann-Kendall and E.divisive methods to all 400 normally distributed vectors and check for how many vectors they detect trend/change-points.
Make two groups, each of length 200, of 400 normally-distributed vectors. For the first group produce an artificial change of magnitude 0.5 at time 21. For the second group produce an artificial change of magnitude 0.5 at time 81. Let the produced change last until the end of the time series.
For each group, apply Mann-Kendall and E.divisive to all vectors and check for how many vectors they detect the produced artificial change.
Repeat the steps 1-2 for data generated from an autoregressive time series with parameter 0.8.
We now load the packages needed for this section.
library(repmis)
library(remote)
library(trend)
library(ecp)
library(gimms)
library(plotKML)
library(sp)
library(raster)
library(tmap)
Monthly LST remote sensing data of Castellon, during period 2001-2018, can be read/prepared as follows:
source_data("https://github.com/Moradii/Geomundus-2019/blob/master/Castellon/Data/lst_day_castellon_monthly.RData?raw=True")
## [1] "lst.day.castellon.monthly"
source_data("https://github.com/Moradii/Geomundus-2019/blob/master/Castellon/Data/castellon_border.RData?raw=True")
## [1] "castellon"
castellon <- spTransform(castellon, CRS("+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs"))
lst.day <- mask(lst.day.castellon.monthly,castellon)
Sys.setlocale("LC_ALL","English")
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
library(zoo)
names(lst.day) <- as.yearmon(seq(as.Date("2001/1/1"), as.Date("2018/12/31"), "month"))
We now, in a step-by-step study, go through LST, day-time, remote sensing data of Castellon.
spplot(lst.day[[1:12]])
boxplot(lst.day[[1:36]],xaxt='n',main="LST day 2001 - 2003")
lst.day.d <- deseason(lst.day) #deseason data
names(lst.day.d) <- as.yearmon(seq(as.Date("2001/1/1"), as.Date("2018/12/31"), "month"))
boxplot(lst.day.d[[1:36]],xaxt='n',main="deseasoned LST day 2001 - 2003")
cas_d <- aggregate(lst.day.d,fac=4,mean)
spplot(cas_d[[1:12]])
boxplot(cas_d[[1:36]],xaxt='n',main="aggregated deseasoned LST day 2001 - 2003")
# check the temporal autocorrelation in data
pacfs <- lapply(X=1:ncell(cas_d), function(i){
x <- cas_d[i]
if(!anyNA(x)){return(pacf(x,plot = FALSE))}
else{return(0)}
})
pacf_coef <- numeric()
for (i in 1:length(pacfs)) {
if(class(pacfs[[i]])=="numeric"){pacf_coef[i] <- NA}
else{pacf_coef[i] <- pacfs[[i]]$acf[1]}
}
mean(pacf_coef[!is.na(pacf_coef)])
## [1] 0.2996758
summary(pacf_coef[!is.na(pacf_coef)])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.09137 0.25702 0.29411 0.29968 0.33718 0.64548
par(mar=rep(4,4))
boxplot(pacf_coef[!is.na(pacf_coef)])
# display obtained PACF coefficient per pixel as an image
pacf.day.image <- cas_d[[1]]
pacf.day.image@data@values <- pacf_coef
ckey <- list(labels=list(cex=1.5))
spplot(pacf.day.image,colorkey=ckey,scales=list(draw=T,cex=1.4))
# check the spatial autocorrelation
moran <- unlist(
lapply(X=1:nlayers(cas_d), function(i){
Moran(cas_d[[i]])
})
)
plot(moran,type="l")
summary(moran)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2711 0.4596 0.5110 0.5076 0.5635 0.7163
# check if there is temporal autocorrelation in obtained spatial autocorrelations
pacf(moran)
cas_d@data@names[which.max(moran)]
## [1] "Oct.2018"
# we now apply the Mann-Kendall method to LST remote sensing data of Castellon
par(mfrow=c(1,1))
mk <- significantTau(cas_d,prewhitening = F,p=0.001)
cells.mk <- Which(mk,cells=T)
# pacf_coef[cells.trd1]
mk.len <- length(cells.mk)
plot(mk)
plot(castellon,add=TRUE)
# apply the E.divisive method to all detected pixels with change/trend based on Mann-Kendall
e.div <- list()
for (i in 1:mk.len) {
e.div[[i]] <- e.divisive(as.matrix(as.numeric(cas_d[cells.mk[i]]),ncol=1),sig.lvl=.001)$estimates
}
ln <- unlist(lapply(e.div,length))
# represent the pixels detected by both methods
mk.day.div <- mk
mk.day.div@data@values[-c(cells.mk[which(ln>2)])] <- NA
plot(mk.day.div)
plot(castellon,add=TRUE)
tmap_leaflet(tm_shape(mk.day.div) +
tm_raster(alpha = 0.75) +
tm_view(set.view = 8))
Distinguishing actual change-points from false positives needs further investigation. A wider study on the performance of trend/change-point detection methods can be found in (Militino, Ugarte, and Moradi 2019).
Load lst_nigth_castellon_monthly.RData
Load castellon_border.RData.
Mask the lst night data using the border of the province of Castellon.
Check if there is seasonality in lst night data.
Remove the seasonality component, if there is.
You can make aggregation for simplicity.
Calculate partial autocorrelations for all pixel time series and plot its spatial variation.
Calculate Moran’s I statistics for all monthly images, make a time series of obtained values per monthly LST image, and check its partial autocorrelation.
Apply the Mann-Kendall method to the aggregated lst night.
Make use of the E.divisive method for all detected pixels by Mann-Kendall and plot the matches.
Repeat the exercise 2 for LST and NDVI remote sensing data of Nordrhein Westfalen.
Repeat the exercise 2 for LST and NDVI remote sensing data of Lisbon.
Militino, A. F., M .D. Ugarte, and M. Moradi. 2019. “Change-Point Testing Procedures for Spatio-Temporal Data.” In Preparation.