Implementing spectral LDA

I spent some time this afternoon trying to implement this cool NIPS paper, which promises a new inference technique for LDA topic models using spectral decomposition. Unfortunately, I just couldn't get it to work: the resulting "topics" include negative numbers. My best guess is that I'm not correctly computing the empirical estimates of the moments; it seems like it should be obvious, but the papers never quite explain how to do it, and it's hard to think what else might be wrong. Feel free to use this code as a starting point (sorry for the clunky numpy and scipy, I'm still learning); if you figure it out, please comment!

from scipy import sparse, linalg
from numpy import random, array, sqrt, power

# Jacob Eisenstein
# attempt at implementing spectral LDA (Anandkumar et al, NIPS 2012)
# Dec 23, 2012

def getNormalizer(counts):
    #diagonal matrix of inverse document lengths; the return statement
    #was lost in posting, so this is a reconstruction from how it's used
    l = counts.shape[0]
    return(sparse.spdiags(1.0 / array(counts.sum(axis=1)).flatten(), 0, l, l))

def getDiagSum(counts):
    #diagonal matrix of document lengths (return reconstructed, as above)
    l = counts.shape[0]
    return(sparse.spdiags(array(counts.sum(axis=1)).flatten(), 0, l, l))
def normalizeRows(x):
#this is not the best way to do this
    return(getNormalizer(x) * x)

def computeMu(counts):
    return(counts.sum(axis=0).T / counts.sum())

def computeExxt(counts):
    freqs = normalizeRows(counts)
    return (freqs.T * getDiagSum(counts) * freqs).todense() / counts.sum()

def computePairs(counts,mu,a0):
    pairs = computeExxt(counts) - (a0 / (a0+1)) * mu * mu.T
    return(pairs)  #return statement lost in posting

def computeTriples(counts,eta,a0,mu):
    l = counts.shape[0]
    #computing E[x_1 x_2^T ]
    #this is the part that i'm not 100% confident about
    #  counts * eta is a vector of length = #documents
    freqs = normalizeRows(counts)
    exxt = computeExxt(counts)    
    eta_mu_inner = (eta.T*mu).sum() #better way to do this?

    part1 = freqs.T * sparse.spdiags(array(counts * eta).flatten(),0,l,l) * freqs / counts.sum()
    part2 = (a0 / (a0+2)) * (exxt*eta*mu.T + mu*eta.T*exxt + eta_mu_inner*exxt)
    part3 = (2 * a0 * a0 / ((a0+2)*(a0+1))) * eta_mu_inner*mu*mu.T
    return(part1 - part2 + part3)

def eca(counts,k,a0,nips_version=True):
    # counts should be an l x d sparse matrix, where l = number of documents and d = number of words
    # k is number of desired topics
    # a0 is sum of Dirichlet prior 

    #the paper says \theta is drawn uniformly from S^{k-1}, which I assume is the simplex
    #it doesn't work if \theta is drawn from a zero-mean Gaussian either
    theta = random.rand(k,1)
    theta /= theta.sum()

    mu = computeMu(counts)
    pairs = computePairs(counts,mu,a0)
    #step 1: random projection to d x k matrix
    d = counts.shape[1]
    U = pairs * random.randn(d,k)

    #step 2: whiten (NIPS version)
    #  (the else: below was lost in posting; as originally posted, the
    #  ArXiv branch unconditionally overwrote W)
    if nips_version:
        u,s,v = linalg.svd(U.T * pairs * U)
        V = u / sqrt(s)
        W = U * V
    #step 2b: whiten (ArXiv version, p 16)
    else:
        u,s,v = linalg.svd(pairs)
        W = u[:,0:k] / sqrt(s[0:k])

    #step 3: svd
    #  the right-hand side here was lost in posting; in the paper, eta is
    #  the random direction mapped back through the whitening matrix
    eta = W * theta
    trips = computeTriples(counts,eta,a0,mu)
    xi,singvals,ignore = linalg.svd(W.T * trips * W)
    #step 4: reconstruct 
    O = linalg.pinv(W)
    #  the rest of this function was lost in posting; the topics should
    #  come from mapping the singular vectors back to word space, i.e.
    #  columns proportional to (W^T)^+ xi_i -- something like:
    return(O.T.dot(xi))
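
Since I suspect the moment estimates, here is one thing worth checking: building E[x_1 x_2^T] from normalized rows, the way computeExxt does, counts each token's "pair" with itself, but the model's pairs moment is over two *distinct* word draws from the same document. Here is a hedged sketch of the per-document unbiased estimator for comparison (dense NumPy for clarity; the function name pairs_moment is my own):

```python
import numpy as np

def pairs_moment(counts):
    """Estimate E[x1 x2^T] over two *distinct* word draws per document.

    counts: (num_docs, vocab) array of word counts. For a document with
    count vector c and n = c.sum() tokens, the cross-moment over ordered
    pairs of distinct draws is (c c^T - diag(c)) / (n (n - 1)); we then
    average over documents.
    """
    counts = np.asarray(counts, dtype=float)
    d = counts.shape[1]
    total = np.zeros((d, d))
    ndocs = 0
    for c in counts:
        n = c.sum()
        if n < 2:
            continue  # need at least two tokens to form a pair
        total += (np.outer(c, c) - np.diag(c)) / (n * (n - 1))
        ndocs += 1
    return total / ndocs

# one document with tokens (w0, w0, w1): of the six ordered pairs of
# distinct tokens, (0,0), (0,1), and (1,0) each occur twice, (1,1) never,
# so the estimate is [[1/3, 1/3], [1/3, 0]]
print(pairs_moment(np.array([[2.0, 1.0]])))
```

Note the diagonal: the self-pair version would put mass on (1,1) here even though the document has only one copy of w1, which is exactly the kind of bias that could push the recovered topics negative.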


Update (August 6, 2013):

  1. Thanks to Peter Lacey-Bordeaux for encouraging me to learn to typeset this code properly.
  2. I was able to successfully implement the RECOVER algorithm from this ICML 2013 paper by Arora et al., in Matlab. I mean to do it in Python but haven't had time. The topics look good, and it's really fast.
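
For anyone else porting that paper to Python: its pipeline is greedy anchor-word selection followed by the recovery step. Here is a rough sketch of the anchor-selection part, simplified to picking rows farthest from the linear span of those already chosen (the paper's FastAnchorWords works in the affine hull and adds a random projection, so treat this as illustrative; the function name is my own):

```python
import numpy as np

def find_anchor_rows(Q, k):
    """Greedily select k rows of Q that are far from the span of the rows
    already chosen -- a simplified sketch of anchor-word selection.

    Q: (vocab, vocab) word co-occurrence matrix; rows are normalized to
    probability distributions before selection.
    """
    Qn = Q / Q.sum(axis=1, keepdims=True)
    residual = Qn.copy()
    anchors = []
    for _ in range(k):
        # pick the row with the largest residual norm
        j = int(np.argmax((residual ** 2).sum(axis=1)))
        anchors.append(j)
        # project the chosen direction out of every row
        u = residual[j] / np.linalg.norm(residual[j])
        residual = residual - np.outer(residual @ u, u)
    return anchors

# toy example: rows 0 and 1 are pure "anchor" profiles, row 2 is a mixture
Q = np.array([[8.0, 1.0, 1.0],
              [1.0, 8.0, 1.0],
              [4.0, 4.0, 2.0]])
print(find_anchor_rows(Q, 2))
# -> [0, 1]
```

The recovery step then expresses each word's row of Q as a convex combination of the anchor rows, which is where the nonnegativity of the topics is enforced by construction.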

One Response to Implementing spectral LDA

  1. Unfortunately the whitespace got destroyed while posting. I personally use when posting code on blogs.
