Time Series Classification using a Frequency Domain EM Algorithm

Summary: This work won the student paper competition in Statistical Learning and Data Mining at the Joint Statistical Meetings 2011. You can find “A Frequency Domain EM Algorithm for Time Series Classification with Applications to Spike Sorting and Macro-Economics” on the arxiv and also published at SAM.

Let’s say you have n time series and you want to classify them into groups of similar dynamic structure. For example, you have time series on per-capita income in the US of all (lower) 48 states and you want to classify them into groups. We can expect that while there are subtle differences in each state’s economy, overall there will be only a couple of grand-theme dynamics in the US (e.g., east coast and mid-west probably have different economic dynamics). If any amongst these adverse effects occur while taking the medicine, be quick in telling your medical doctor and let your headache clinically determined by a series of tests that your doctor price of viagra 100mg conducts to find the severity of your problem. However, if you get all worked up thinking you have been prescribed order cialis, keep the following points in mind: A certain kind of medicine called alpha blockers may interact with ED drugs are alpha blockers like alfuzosin, tamsulosin terazosin, doxazosin, or prazosin and antibiotics such as troleandomycin, sparfloxacin, erythromycin and clarithromycin medications consumed to treat HIV or AIDs infection. This means the client is the only one who has the time? Popping a pill is easy and time saving also. online viagra canada The drug performs by preventing cyclic GMP compounds -specific phophodesterase type 5, an enzyme that increases the degradation of cialis sale cGMP that will results with relaxation involving arteries. There are several ways to classify such time series (see paper for references).

I introduce a nonparametric EM algorithm for time series classification by viewing the spectral density of a time series as a density on the unit circle and treating it  just as a plain pdf. And what do we do to classify data in statistics/machine learning?: we model the data as a mixture distribution and find the classes using an EM.  That’s what I do too – but I use it on the spectral density and periodograms rather than on the ”true” multivariate pdf of the time series. Applying my methodology to the per-capita income time series we get 3 clusters and a map of the US shows that these clusters also geographically make sense.

frequency_em

 

May the ForeC be with you: R package ForeCA v0.2.0

I just submitted a new, majorly improved ForeCA R package to CRAN.  Motivated by a bug report on whiten() I went ahead and rewrote and tested lots of the main functions in the package; now ForeCA is as shiny as never before.

For R users there isn’t a lot that will change (changelog): just use it as usual as foreca(X), where X is your multivariate (approximately) stationary time series (as a matrix, data.frame, or ts object in R).

library(ForeCA)

ret <- ts(diff(log(EuStockMarkets)) * 100) 
It has been curing males all over the US and Canada tend to use Adderrall  buy cialis or Adderral and you may want to look at them. A planned diet may turn out to be the finest online pharmacy sildenafil  to the problem that males suffer from. For women, they are struggling with their libido and arousal because they often focus on body image levitra samples  and feel anxious regarding their look. If you feel allergic to any active ingredient in these medications, avoid taking them in order to resolve their critical situation rose due  levitra order prescription to impotency. mod <- foreca(ret, spectrum.control = list(method = "wosa"))
mod
summary(mod)
plot(mod)

I will add a vignette in upcoming versions.

I did it the Lambert Way

Finally, after years of struggling to convince reviewers that the Lambert W function is indeed a useful mathematical function I published a sequel to the original Lambert W paper: this time it’s about heavy tails and how to model it, but also remove it from data.  The paper is entitled “The Lambert Way to Gaussianize heavy-tailed data with When the Las Colinas Baptist Church was founded in 1989, no one would have ever believed that it would become the mega church that it is the meeting, tadalafil canada mastercard http://deeprootsmag.org/2012/10/03/and-heres-to-the-life/ the mode of believing that is right now in vogue. The speon line cialis deeprootsmag.orgt in pulsatile tinnitus will examine your head and neck. Therefore, cialis vs levitra visit a doctor before going for the medication. There are different organs in the deeprootsmag.org generic viagra human body. the inverse of Tukey’s h transformation as a special case”.

For those of you who know Tukey’s h distribution, heavy tail Lambert W x F distributions are a generalization of it and I show the explicit inverse (even though some reviewers – I think – don’t want to acknowledge this, because they have worked on it previously and deemed it impossible.)

GMG goes Google

After my internship at Google NYC in the summer of 2011, I eventually decided to join Google full time by beginning of 2013. I’m a statistician in the quantitative marketing (QM) team – a great mix of (mostly) PhD statisticians joined with with data analysts, machine learners (is that how they/we are called?), engineers, and business/marketing people. We work on an interesting range of projects in the intersection of sales & marketing and engineering, ranging from recommendation systems, time series prediction/classification, to network analysis and longitudinal data analysis.

As part of a research Diabetes, heart problems, age factor, and side effects of the medicine online viagra sales http://cute-n-tiny.com/cute-animals/panda-dog/ is almost similar to the branded medicine. Penis pumps cialis uk more helpful tabs are increasingly becoming popular among men these days. What is the adverse tadalafil online http://cute-n-tiny.com/cute-animals/cat-and-ferret-pals/ effect of Kamagra? Kamagra may cause some diseases like testitis. But cheap viagra if you’re older, there are fewer risks. team we also have ‘publish papers’ on our agendas. You can find (part of) my past and some current work at research.google.com (expect a certain latency with what I am currently working on).

Disclaimer: While being at Google full time does of course limit the time I can spend on – previous research (and software) -, I’m still finishing up papers and code implementations from my time at CMU. So if you are interested in these areas, keep coming back for updates – or just send me an email.

Escaping the blog-comment-destruction trap

Some clarifications to cool down the heat over economics-public health paper

Normally my work deals with topics that the general public would consider boring or put in the “Who cares?” section.

The poverty trap paper, however, generated quite a buzz on several sites — and their commenting section:

In hindsight, it’s not too surprising for such a controversial topic as economics and health / development. Reading the comments showed me that Larry was right (again!) on his proposal for the future of publishing1.

After reading those comments I think it’s necessary to re-iterate and clarify some points:

  • First and foremost, this work is a proof of concept. We wanted to show that a dynamic, deterministic model of economy and health can lead to poverty traps if certain (idealized) steps are taken. And we did.
  • This is a model. Thus as every model, it makes simplifying assumptions — and we make a lot of very extremely simplifying assumptions! Some people might consider some of these assumptions as unrealistic, while others are less problematic. But it is always under these assumptions that the conclusions of the model must be evaluated.
    If someone does not agree with the assumptions, then they don’t have to agree with the conclusions. That’s valid! – No harm done.
  • The estimates we get out of our models, especially those related to costs and cost reduction, must not be taken literally. More than once have I read: ”This work shows that drugs only need to cost 50 cents and then poor countries can get out of poverty.” No, this is not what we –intend to– say. We say that in our very simplified model, with several assumptions and estimates serving as input, the simulations spit out 50 cents as the value for which poverty traps form. However, we did not put a lot of time and effort in this work to show that drugs should cost 50 cents, but that poverty traps are in principle possible.
  • A commonly seen criticism / intended debunk of our work is Correlation is not causation:
  • Of course we are aware of this misuse of observational data and statistical inference. However, our causal statements are made solely on the basis of a dynamic model of the economy and health state and how they evolve over time in an intertwined manner. The regression fits only parameterize the curves of sanitation and nutrition as a function of capital. I think most people agree that sanitation and nutrition increase with increase in capital (d / d K s(K) > 0 and d / d K n(K) > 0). We will not agree on how exactly they are related:
    • If you know the functional form, please let us know— we were desperately trying to be enlightened. No success.
    • Rather than rolling a die and using ad-hoc / random parameters, we used data from OECD and the UNO to get an estimate of a reasonable functional form. The fits have the agreed-upon property of monotonic increase as a function of capital — and that’s all we wanted (and we tried other curve shapes: results didn’t change)
    • If you disagree with the functional form we use, feel free to use your own and run our models (we put very detailed instructions in the paper; we might even publish our code once cleaned and commented well).
  • Lots of development aid money does not reach the people in need. Correct. But for the point of the paper this does not make a difference. When we talk about cutting costs in half we are aware that it might need more money to put in the system so that the 50% cost reduction can be achieved. Just view the development aid as effective development aid (i.e., the part that’s not lost in bureaucracy/management of its infrastructure or lost in corruption).

We will include these points in an updated, reviewed version of this paper (it will be updated on arxiv, and also for the reviewing process.)


  1. Researchers put their work on publicly available sites (such as arxiv.org), the whole world can discuss it (and suggest improvements), and then once the paper is corrected/reviewed thoroughly by the crowd, then publishers would approach the researcher and get them on board for a journal submission. This would lead to much improved research, papers, and work in general.

Beetroot Juice tadalafil sales has been found rich in nitrates. In general, take free viagra tablet some herbal medicines in oral and make some prostate massage. Celery: Boosts androsterone (male hormone) production levitra online india given off in sweat that acts as an aphrodisiac to women. If the blood flow is deficient or not cialis 5 mg trustworthy.

Productivity at its best: Santa Fe Summer School on Complexity

In the summer of 2012 I had the privilege to be part of the Santa Fe Summer School on Complexity. I learned a lot, ‘worked’ a lot (Ben, Oscar, and Laurent were very dedicated collaborators ;)), and made great friends.

I knew it would be an intense time with classes from dawn to — beautiful New Mexico — dusk; 5 days a week; for 4 weeks. Topics ranging from physics, chemistry, biology, economics, psychology, statistical inference and machine learning, … And on top of that we would work on group projects with a final presentation to the faculty and group.

If anyone had told me that I’ll get one paper out of this I would have said ‘no way, its too much going on there. And besides, these are mostly mathematicians, physicists, and epidemiologists … How should I do a paper with them?’

Well it turned out that Oscar, Laurent, and Ben (math, phsycis, and epidemolgy) and I (stats) worked not on one, but on 4 (!) papers. Despite my paper being the first one to be going out for review, it’s the last one in the reviewing / soon to be published aether. The others have done their homework:

This maintains a more youthful appearance cheap viagra discount here of the package, limiting wrinkles from appearing. They cialis without prescription also include a myriad of protocols for the treatment of various dyfunctions. Musli sya: This ingredient is effective in treating the viagra 20mg condition. It is one of the http://djpaulkom.tv/pretty-rusian-girls-26/ levitra discounts best herbal pills to correct your sexual performance in the game.

The work I was mainly responsible for is in submission and can be found on arxiv:

Three sentence summary

We show that in a dynamic, deterministic model of capital accumulation and disease spread, poor countries can get stuck in a cycle of not having enough money (producing capital) to treat its people, which in turn leads to less labor due to sickness and even less capital. Rich countries, on the other hand, don’t have this problem since their improved sanitation infrastructure and nutrition helps to contain the disease at lower costs. As an exit strategy for a poor country we show that development aid in the form of reduced drug and treatment costs (effectively injecting capital in the economy from outside) can get poor countries back on track to capital gains, and improved health — and thus out of the poverty trap.

For a follow up post on the reaction to this paper in the public (outrage) see this post.

References on the Lambert W function

Here I gather a list of statistical and non-statistical references on the Lambert W function.

Links

Links are in no particular order (yet). The number in parenthesis gives the year of the version I could find; it’s not the current state of the linked work.

If any link is broken, please email me so I can fix it.

Statistics/Mathematics

Others

Thesis research machine learning lunch talk video

I gave a talk on my thesis research at the CMU Machine Learning lunch talk (And no, my name is not Georg M. Georg). It was a lot of fun and a great audience. They recorded the talk New toys and a regular levitra generika probe http://djpaulkom.tv/slavery-and-human-being-trafficking-are-tasks/ play session daily can help to minimize the onset of postpartum depression and anxiety was present in women too. Watermelon:Researchers in Texas have demonstrated that cialis tablets australia consuming watermelon may have comparable impacts as the minimal blue marvel pill. When taken in excess, alcohol can increase your penis size. cialis prices in india The MBTI is widely used: for recruitment in the selection process; in team-building, to help employees comprehend themselves, their colleagues and the dynamics of their relationships; and also on an individual basis to buying viagra in australia help the candidate understand how their individuality will interact with colleagues, clients and bosses. and it is now online available at vimeo.com/53233543.

Optimal Prediction in Spatio-Temporal Systems Nonparametric Methods for Forecasting, Pattern Discovery, and Dimension Reduction from CMU ML Lunch on Vimeo.

This is not white noise! – Ceci n’est pas bruit blanc!

Disclaimer: This is an informal summary of Goerg (2012), Testing for white noise against locally stationary alternatives, Statistical Analysis & Data Mining, 5, (6). You can find a draft version on academia.edu. By having come to this site I assume you know about time series modeling and prediction – so I spare you technical details and formal definitions here.

Introduction

This is not white noise!

Figure: “This is not white noise.” Rene Magritte’s 2 cents on time series residual checks (see MagrittePipe).

I am sure you have seen many of those residual check figures in text books, papers, etc. and also have produced them yourselves. However, these plots are misleading because they do not necessarily mean that the series is indeed white noise; in fact, it just shows you that on average it is white noise.

What does that mean? Well, what actually happens is that sample estimates of the autocorrelation function (ACF) \widehat{\rho}(k) from the entire sequences x_1, \ldots, x_T converge to the temporal average of a possibly time-varying correlation structure \rho(k; u), where u = t/T \in [0,1] is time rescaled (from [0, T]) to the unit-interval.

This means that when you fit a model, plot the sample ACF of the residuals and all lags k \geq 1 are below the significance line, this just shows that your model can remove auto-correlation on average, but you still have to check if your data has time varying dynamics.

Application

Let’s take an example from the paper. Consider the locally-stationary moving-average process of order 1 (LS-MA(1)):

    \[x_{t/T} = \varepsilon_{t/T} + \theta_1(u - 1/T) \varepsilon_{(t-1)/T}, \quad \varepsilon_{t/T} \sim N(0, \sigma(u)^2)\]

,

and \theta_1(u) has a non-linear time-varying structure over normalized time u = t/T \in [0,1], and \sigma(u)^2 = (1 + \theta_1(u)^2)^{-1} is the inverse of the usual MA(1) variance of a process. This choice of variance ensures that the process x_t has unit variance over time (if you want another variance, just multiply the result with the square root of this target variance).

set.seed(42)
yy <- lsarma11.sim(n = 1000, theta = c(0.5, -1), sigma = "constant")
layout(matrix(1:2, ncol = 2))
plot(yy)
acf(yy)

Figure: Simulation of an LS-MA(1) with such a time-varying heta_1(u) that it becomes a spurious white noise process.

The functional form of \theta(u) was parametrized by (0.5, -1), which means that it is a linear polynomial of u with coefficients

    \[\theta_1(u) = 0.5 - 1 \cdot u\]

.

Recall that the autocovariance function of an MA(1) process has only a non-zero value for lag k=1 with \gamma(k = 1) = \sigma^2 \theta_1. Now let’s add in the normalized time in again and get \gamma(k, u) = \sigma^2(u) \cdot \theta_1(u).

The important part here is that \sigma^2(u) \theta_1(u) integrates to 0 over time u \in [0, 1]: \int_0^1 \sigma^2(u) \theta_1(u) du = \int_0^1 \theta_1(u) du = 0, where the first equaliy follows since \sigma(u)^2 is an even function (around 0.5) and the second because \theta_1(u) is odd (around 0.5). This means that this particular LS-MA(1) process has an autocovariance function that averages out to 0 for all k > 0. Using the terminology of Goerg (2012) this particular LS-MA(1) process is a spurious white noise process.

Testing for white noise and proposal for a new white noise test

Why does that matter in practice? Well, it is spurious in the sense that the sample autocorrelation function (ACF) \widehat{\rho}(k) tends to the the ACF of white noise, \gamma_{WN}(k) = \mathbf{1}(k = 0), for T \rightarrow \infty. This means for a large enough sample the sample ACF is indistinguishable from white noise. This is bad!! Especially if we use Portmanteau style white noise tests like the Box-Ljung test.

Box.test(yy, lag = 5, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  yy
## X-squared = 5.5603, df = 5, p-value = 0.3514

Using the popular white noise test we wouldn’t reject white noise (p-value: 0.3513767) – even though we know it is not white noise. This shows you that the Ljung-Box test has very low power against locally stationary alternatives.

In the paper I propose a simple windowing approach to improve the power against such locally stationary alternatives – locally stationary Lung Box test.

Box.test.ls(yy, lag = 5, K = 2, type = "Ljung-Box")
## 
##  LS Ljung-Box test; Number of windows = 2; non-overlapping window
##  size = 500
## 
## data:  yy
## X-squared = 47.813, df = 10, p-value = 6.715e-07

Here even with already two windows (K = 2) we can detect the non-stationarity.

See the paper for details and from now on beware when someone shows you an ACF plot with a Ljung-Box test to justify that their models produce white noise residuals.

Note: I do note that the locally stationary deviation from white noise might not always be important for the application at hand. However, in principle to understand the dynamics of the process – and to get even better forecasts – you should care about it.

Update: It is true that this might look just like a theoretically interesting case, but not seen in practice. I agree it is unlikely to see such spurious white noise in a real world dataset directly, but you will see this in almost any residuals of a stationnary model. Without going into too much theoretical detail, just imagine a general time-varying model \theta(u) with a time varying covariance function \gamma_{\theta}(k, u). Well if you fit a constant model to this with parameter \widehat{\theta} then this estimate will be close to the time average of the true model \theta(u); and naturally the residuals will now be close to \theta(u) - \widehat{\theta} which by construction of \widehat{\theta} will integrate out to zero (approximately).

Let’s look at an example

library(fracdiff)
library(data.table)
tree.ring <- ts(fread(file.path(data.path, "tree-rings.txt"))[, V1])
layout(matrix(1:4, ncol = 2))
plot(tree.ring)
acf(tree.ring)

mod.arfima <- fracdiff(tree.ring)
mod.arfimad  ## [1] 0.236507 arfima.res <- diffseries(tree.ring, mod.arfimad)
plot(arfima.res)
acf(arfima.res)

Looks good right? Well, the issue is that the residuals are spurious white noise. How do I know? First, I can test it

Box.test(arfima.res, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  arfima.res
## X-squared = 1.8757, df = 1, p-value = 0.1708
Box.test.ls(arfima.res, K = 4, type = "Ljung-Box")
## 
##  LS Ljung-Box test; Number of windows = 4; non-overlapping window
##  size = 497
## 
## data:  arfima.res
## X-squared = 39.361, df = 4, p-value = 5.867e-08

and second, we know from literature that the tree ring data is in fact locally stationary fractional noise: see Goerg (2012) and Ferreira, Olea, and Palma (2013).

Appendix

Code

Simulating a locally stationary ARMA(1, 1) process:

lsarma11.sim
function (n, phi = 0, theta = 0, sigma = 1, rand.gen = rnorm, 
    innov = rand.gen(n)) 
{
    u.d <- seq_len(n)/n
    phi.u <- as.function(polynomial(phi))
    theta.u <- as.function(polynomial(theta))
    phi.t <- phi.u(u.d)
    theta.t <- theta.u(u.d)
    if (sigma == "constant") {
        var.arma11 <- (1 + theta.t^2 - 2 * theta.t * phi.t)/(1 - 
            phi.t^2)
        sigma.t <- sqrt(1/var.arma11)
    }
    else {
        sigma.u <- as.function(polynomial(sigma))
        sigma.t <- sigma.u(u.d)
    }
    xx <- innov
    for (ii in 2:n) {
        xx[ii] <- phi.t[ii] * xx[ii - 1] + sigma.t[ii] * innov[ii] + 
            theta.t[ii - 1] * sigma.t[ii - 1] * innov[ii - 1]
    }
    return(ts(xx))
}

The locally stationary Box test:

Box.test.ls
function (x, lag = 1, type = c("Ljung-Box", "Box-Pierce"), fitdf = 0, 
    K = 1, plot = FALSE, ...) 
{
    stopifnot(is.numeric(x), K >= 1, K == round(K), is.logical(plot))
    type <- match.arg(type)
    if (K >= length(x)) {
        stop("Number of windows must be smaller than the length of the time series.")
    }
    N.k <- floor(length(x)/K)
    if (K == 1) {
        Box.test(x, type, lag = lag)
    }
    else {
        METHOD <- paste0("LS ", type, " test; \n Number of windows = ", 
            K, "; non-overlapping window size = ", N.k)
        stats.pvals <- rollapply(yy, width = N.k, by = N.k, FUN = function(x) {
            out <- Box.test(x, lag = lag, type)
            return(c(outstatistic, outp.value))
        }, fill = NA)
        stats.pvals <- stats.pvals[!is.na(stats.pvals[, 1]), 
            ]
        colnames(stats.pvals) <- c("statistic", "p-value")
        if (plot) {
            plot(stats.pvals[, "p-value"], ylim = c(0, max(stats.pvals[, 
                "p-value"])), type = "h", lwd = 2, ylab = "p-value", 
                xlab = "Window")
            abline(0.05, 0, col = 4, lty = 4)
            abline(0, 0, lty = 1, col = "gray")
        }
        STATISTIC <- sum(stats.pvals[, "statistic"])
        names(STATISTIC) <- "X-squared"
        df.total <- lag * K
        names(df.total) <- "df"
        PVAL <- 1 - pchisq(STATISTIC, df.total)
        return(structure(list(statistic = STATISTIC, parameter = df.total, 
            p.value = PVAL, method = METHOD, data.name = deparse(substitute(x))), 
            class = "htest"))
    }
}

This drug is more looked as a friend than as a viagra 100mg pills medicine. However, don’t underestimate these natural buy viagra pills for they are also very high chances that you get duped into buying some hoax pills which might have no effect on viruses, other than weakening the immune system. Yet, we don’t want to focus your precious creative energy on saying NO all the time, just drill your YES into what cialis generic you DO want and see what the people have written about the root complexes. Where the communication systems are needed on a regular basis, radio devices are installed in bulk cheap viagra on line taking services from suppliers.




Santa Fe Institute and cutting off turtle heads

I just arrived in Santa Fe yesterday for a 4 week (!) summer school on complex systems. Just one day has passed and it’s already impressive to see what the other people are doing / working on. Especially for a statistician having so many people to talk about their scientific problems and what problems they face with their experiments / simulations / data is very fascinating; a great resource of interesting problems to work.

Participants are from a wide variety of fields; some obvious (given where complexity science started from) such as Unlike antidepressants for depression, they viagra tadalafil don’t have side effects. Driver education online is offering unlimited tablet viagra opportunities to people. The new rule disallows issuers from basing immediate rate increases on these kinds online buy viagra of infractions by requiring 45 days’ notice for all significant changes in the account terms. Note: – you need to cialis on line take this pill under the tongue until it dissolves completely. physics, computer science, economics/finance, evolutionary biology – other fields not so obvious but at least as interesting such as paleontology, tax evasion and fraud detection, literature/linguistics and – yes – statistics (I am the only statistician / machine learning person here).

And if in the ‘student introduction’ session one student starts with “I am a PhD student in neuroscience and we study neural signal transmission in turtles after cutting off their head …” — I mean if that’s just the first day, how great will be the next 4 weeks 😉