Implementing spectral LDA

I spent some time this afternoon trying to implement this cool NIPS paper, which promises a new inference technique for LDA topic models using spectral decomposition. Unfortunately, I just couldn’t get it to work — the resulting “topics” include negative numbers. I think maybe I am not correctly computing the empirical estimates of the moments. It seems obvious, but the papers never quite explain how to do it, and it’s hard for me to think what else might be wrong. You may try using this code as a starting point (sorry for the clunky numpy and scipy, I’m still learning); if you figure it out, please comment!

from scipy import sparse, linalg
from numpy import random, array, sqrt, power

# Jacob Eisenstein
# attempt at implementing spectral LDA (Anandkumar et al, NIPS 2012)
# Dec 23, 2012

def getNormalizer(counts):
l = counts.shape
return(sparse.spdiags(1/array(counts.sum(1)).flatten(),0,l,l))

def getDiagSum(counts):
l = counts.shape
return(sparse.spdiags(array(counts.sum(1)).flatten(),0,l,l))

def normalizeRows(x):
#this is not the best way to do this
return(getNormalizer(x) * x)

def computeMu(counts):
return(counts.sum(axis=0).T / counts.sum())

def computeExxt(counts):
freqs = normalizeRows(counts)
return (freqs.T * getDiagSum(counts) * freqs).todense() / counts.sum()

def computePairs(counts,mu,a0):
pairs = computeExxt(counts) - (a0 / (a0+1)) * mu * mu.T
return(pairs)

def computeTriples(counts,eta,a0,mu):
l = counts.shape
#computing E[x_1 x_2^T ]
#this is the part that i'm not 100% confident about
#  is a vector of length = #documents
freqs = normalizeRows(counts)
exxt = computeExxt(counts)
eta_mu_inner = (eta.T*mu).sum() #better way to do this?

part1 = freqs.T * sparse.spdiags(array(counts * eta).flatten(),0,l,l) * freqs / counts.sum()
part2 = (a0 / (a0+2)) * (exxt*eta*mu.T + mu*eta.T*exxt + eta_mu_inner*exxt)
part3 = (2 * a0 * a0 / ((a0+2)*(a0+1))) * eta_mu_inner*mu*mu.T
return(part1 - part2 + part3)

def eca(counts,k,a0,nips_version=True):
# counts should be an l x d sparse matrix, where l = number of documents and d = number of words
# k is number of desired topics
# a0 is sum of Dirichlet prior

#the paper says \theta is drawn uniformly from S^{k-1}, which I assume is the simplex
#it doesn't work if \theta is drawn from a zero-mean Gaussian either
theta = random.rand(k,1)
theta /= theta.sum()

mu = computeMu(counts)
pairs = computePairs(counts,mu,a0)

#step 1: random projection to d x k matrix
d = counts.shape
U = pairs * random.randn(d,k)

#step 2: whiten (NIPS version)
if nips_version:
u,s,v = linalg.svd(U.T * pairs * U)
V = u / sqrt(s)
W = U * V
else:
#step 2b: whiten (ArXiv version, p 16)
u,s,v = linalg.svd(pairs)
W = u[:,0:k] / sqrt(s[0:k])

#step 3: svd
eta = W.dot(theta)
trips = computeTriples(counts,eta,a0,mu)
xi,singvals,ignore = linalg.svd(W.T * trips * W)

#step 4: reconstruct
O = linalg.pinv(W).T.dot(xi)
return(O,singvals)


update

(August 6, 2013)

1. Thanks to Peter Lacey-Bordeaux for encouraging me to learn to typeset this code properly.
2. I was able to successfully implement the RECOVER algorithm in this ICML 2013 paper by Arora et al, in Matlab. I mean to do it in Python, but haven’t had time. The topics look good and it’s really fast.