What distinguishes my member of Congress from their colleagues? Whom do they resemble? I set out to answer these questions for every member of Congress, so that we could add the answers to our Represent Congress data site.

I applied several natural language processing algorithms, including word2vec and TF-IDF, to a huge archive of members' press releases.
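The post doesn't show the pipeline, so here is a minimal sketch, in plain Python, of how TF-IDF can surface a member's distinctive issues. It assumes each member's press releases are concatenated into a single document. The member names, sample text, and the helper names `tokenize`, `tfidf_by_member`, and `top_terms` are all hypothetical. A term that every member uses gets an inverse document frequency of zero, so only words that set a member apart rise to the top.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters; a real pipeline would also drop stopwords.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_by_member(releases):
    """Score every term for every member.

    `releases` maps a member's name to the concatenated text of their press
    releases, so each member is treated as a single document (an assumption).
    """
    counts = {member: Counter(tokenize(text)) for member, text in releases.items()}
    n_docs = len(counts)

    # Document frequency: how many members use each term at least once.
    df = Counter()
    for member_counts in counts.values():
        df.update(member_counts.keys())

    scores = {}
    for member, member_counts in counts.items():
        total = sum(member_counts.values())
        scores[member] = {
            # Term frequency times inverse document frequency.
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in member_counts.items()
        }
    return scores


def top_terms(scores, member, k=3):
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores[member].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    releases = {  # made-up sample text
        "Member A": "sage grouse habitat protection sage grouse listing",
        "Member B": "veterans health care funding for veterans",
        "Member C": "farm bill funding for rural broadband",
    }
    scores = tfidf_by_member(releases)
    print(top_terms(scores, "Member A", k=2))  # ['grouse', 'sage']
    print(top_terms(scores, "Member B", k=1))  # ['veterans']
```

In the toy data, "funding" and "for" appear for two of the three members, so they score lower than "veterans," which only Member B uses.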

The algorithm pulls out niche issues like the sage grouse: real issues that weren’t front of mind for me, which demonstrates its value. The “similar members” results pick up on real groupings of members of Congress, based only on the topics and phrases in their press releases.
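The post doesn't say exactly how “similar members” are ranked. One common approach, assumed here rather than taken from the original, is cosine similarity between members' term-weight vectors, such as TF-IDF scores. The same function would also work for dense word2vec-style vectors stored as `{index: value}` dicts. The vectors below are made-up toy values, and `cosine` and `most_similar` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(weight * v[feature] for feature, weight in u.items() if feature in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(vectors, member, k=2):
    """Return the k other members whose vectors point in the most similar direction."""
    target = vectors[member]
    ranked = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != member
    ]
    # Highest similarity first; ties broken alphabetically.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


if __name__ == "__main__":
    vectors = {  # made-up term weights
        "Member A": {"grouse": 0.4, "wildfire": 0.3, "grazing": 0.2},
        "Member B": {"grouse": 0.3, "wildfire": 0.4},
        "Member C": {"veterans": 0.5, "broadband": 0.2},
    }
    for name, score in most_similar(vectors, "Member A"):
        print(f"{name}: {score:.2f}")
    # Member B: 0.89
    # Member C: 0.00
```

Cosine similarity ignores vector length, so a member who issues many press releases isn't automatically "similar" to everyone; only the mix of topics matters.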

I wrote about the algorithm and the goals.