Speaker of the House Paul Ryan is a tax wonk, and most observers of Congress know that. But knowing what interests the other 434 members of the House is harder.

To find out, I applied natural language processing algorithms, including word2vec, to a huge archive of press releases.

Each press release is assigned up to four legislative topics, drawn from a slightly modified version of a list of legislative topics chosen by the Library of Congress. The assignment works by comparing the document’s doc vector with the word vectors of a hand-picked, validated list of words associated with each topic (for example, “surveillance” for the “civil liberties” topic or “airports” for the “infrastructure” topic).
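To make the topic-assignment step concrete, here is a minimal sketch of how such a comparison could work. It is an illustration under stated assumptions, not the actual code behind this project: cosine similarity, averaging over a topic's seed words, the `threshold` value, the `assign_topics` helper, and the toy three-dimensional vectors are all hypothetical. In practice the vectors would come from a trained word2vec/doc2vec model.

```python
import math

MAX_TOPICS = 4  # from the text: each release gets up to four topics


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_topics(doc_vec, topic_seed_vecs, threshold=0.5, max_topics=MAX_TOPICS):
    """Score each topic by the mean similarity between the document vector
    and that topic's seed-word vectors, then keep the best-scoring topics.

    Both the averaging and the threshold are assumptions made for this sketch.
    """
    scores = {}
    for topic, seed_vecs in topic_seed_vecs.items():
        sims = [cosine(doc_vec, v) for v in seed_vecs]
        scores[topic] = sum(sims) / len(sims)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Drop weak matches, then cap the result at max_topics.
    return [(topic, score) for topic, score in ranked if score >= threshold][:max_topics]


# Toy vectors standing in for real word vectors of hand-picked seed words.
topic_seed_vecs = {
    "civil liberties": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],  # e.g. "surveillance"
    "infrastructure": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],   # e.g. "airports"
    "taxation": [[0.0, 0.1, 1.0]],
}

# Toy doc vector for a press release that is mostly about surveillance.
doc_vec = [0.8, 0.3, 0.05]

assigned = assign_topics(doc_vec, topic_seed_vecs)
for topic, score in assigned:
    print(f"{topic}: {score:.3f}")
```

With these toy vectors, only “civil liberties” clears the threshold. Lowering the threshold admits more topics, but never more than four per release.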

I wrote about the algorithm and the goals.