Chamber of Secrets: Teaching a Machine What Congress Cares About
Speaker of the House Paul Ryan is a tax wonk, and most observers of Congress know that. But knowing what interests the other 434 members of Congress is harder.
I applied some natural language processing algorithms – including word2vec – to a huge archive of press releases.
Each press release is assigned up to four legislative topics, drawn from a slightly modified version of a list of legislative topics picked by the Library of Congress. The assignment works by comparing the document’s doc vector to the word vectors of a hand-picked, validated list of words associated with each topic (for example, “surveillance” for the “civil liberties” topic or “airports” for the “infrastructure” topic).
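To make the topic-assignment step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the code behind this project: the three-dimensional vectors are invented toy values rather than real word2vec output, the document vector is approximated by averaging word vectors (a stand-in for whatever doc vector method is actually used), each topic is represented by the average of its seed words, and the "taxation" topic, the extra seed words, the 0.7 similarity threshold and all function names are hypothetical.

```python
import math

# Toy 3-dimensional "word vectors". Hypothetical values for illustration;
# real word2vec vectors typically have hundreds of dimensions.
WORD_VECTORS = {
    "surveillance": [0.9, 0.1, 0.0],
    "privacy": [0.8, 0.2, 0.1],
    "airports": [0.1, 0.9, 0.1],
    "highways": [0.0, 0.8, 0.2],
    "taxes": [0.1, 0.1, 0.9],
    "deduction": [0.2, 0.0, 0.8],
}

# Hand-picked seed words per topic (hypothetical lists and topic labels).
TOPIC_SEED_WORDS = {
    "civil liberties": ["surveillance", "privacy"],
    "infrastructure": ["airports", "highways"],
    "taxation": ["taxes", "deduction"],
}


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def doc_vector(tokens):
    """Assumed stand-in for a doc vector: the average of known word vectors."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    return mean_vector(known) if known else None


def assign_topics(tokens, max_topics=4, threshold=0.7):
    """Return up to max_topics topic names, most similar first."""
    doc = doc_vector(tokens)
    if doc is None:
        return []
    scores = {
        topic: cosine(doc, mean_vector([WORD_VECTORS[w] for w in words]))
        for topic, words in TOPIC_SEED_WORDS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, score in ranked if score >= threshold][:max_topics]


if __name__ == "__main__":
    print(assign_topics(["airports", "taxes", "unknownword"]))
```

A release that mentions both airports and taxes lands in two topics here, which mirrors the idea that a single press release can carry up to four. The threshold keeps weakly related topics out, and where to set it is a judgment call that would need checking against real data.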