NYCDH Week: Analyzing Congressional E-Newsletters

I attended the “What Matters to Your Congressperson?” workshop at NYU for DH Week where, as Nancy pointed out, there were some difficulties getting onto the WiFi network. The instructor, Lindsey Cormack, introduced us to DCinbox, a project she started in 2009 as a Ph.D. student in politics and continues to maintain. It is a database of the e-newsletters sent out by members of the House and Senate (at least by those members who send them), and it can be used for research. According to the website, some 90,000 e-newsletters have been collected. By examining, for example, word frequencies in the newsletters of a particular representative or senator, one can see which issues they focus on most.

The site links to the full dataset of newsletters, which is updated in real time. The dataset is presented through Kibana, described as “an open source data visualization plugin for Elasticsearch,” a tool I’m still trying to fully understand. Through it you can filter the results and create visualizations.

The description on the DH Week website said the workshop would show how to “perform text analyses in R” using the newsletters, but it turned out not to focus on R as much as I had hoped. Given the different levels of experience in the class, I imagine it would have been difficult to do more than give an overview of the code used to analyze the newsletters and make the graphs. Still, it pushed me to revisit text analysis in R and to consider how different texts from elected officials, from tweets to official statements such as press releases and biographies, can be analyzed, and what conclusions can be drawn from word frequency, sentiment analysis and similar methods.

Kimberly went through Rep. Jerry Nadler’s newsletters over the years (he represents the district in which NYU is located) and pointed out words, like “health,” that are among the most frequently used terms. This raised the question of which words to filter out (“stop words”) when doing text analysis. In addition to the usual stop words such as “and” and “I,” she removed words like “Facebook” (most likely used in the newsletters to point readers to his Facebook page) and his name. But should you also remove geographic terms like “New York,” “Brooklyn” and “Manhattan”?
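The workshop’s code was in R and isn’t reproduced here. As a rough illustration of the same idea, here is a minimal Python sketch of counting word frequencies with both generic and corpus-specific stop words. Everything in it is an assumption for illustration: the function names (`top_terms`, `tokenize`), the small stop-word sets, and the three toy “newsletter” snippets, which are invented and are not DCinbox data. The tokenizer is a plain regular expression, not what an R package such as tidytext does.

```python
import re
from collections import Counter

# A small, illustrative list of generic English stop words (not exhaustive).
GENERIC_STOP_WORDS = {
    "a", "am", "an", "and", "are", "at", "for", "i", "in", "is",
    "my", "of", "on", "our", "the", "this", "to", "with",
}

# Hypothetical corpus-specific stop words: a platform name and the member's
# own name, which recur in newsletters without saying much about issues.
CUSTOM_STOP_WORDS = {"facebook", "jerry", "nadler"}

# Optional geographic terms. This simple tokenizer splits "New York" into
# "new" and "york", so multiword places need their parts listed separately.
GEO_STOP_WORDS = {"new", "york", "brooklyn", "manhattan"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(texts, stop_words, n=10):
    """Return the n most frequent words across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in tokenize(text) if w not in stop_words)
    return counts.most_common(n)


# Toy snippets written for this example; they are not real newsletter text.
newsletters = [
    "Visit my Facebook page for updates on health care in Manhattan.",
    "I am fighting to protect health care and transit funding in Brooklyn.",
    "Join Jerry Nadler on Facebook to discuss health and housing in New York.",
]

base = GENERIC_STOP_WORDS | CUSTOM_STOP_WORDS
print(top_terms(newsletters, base, n=2))  # [('health', 3), ('care', 2)]

# Also dropping geographic terms removes "manhattan", "brooklyn", etc.
print(top_terms(newsletters, base | GEO_STOP_WORDS))
```

One trade-off the sketch makes visible: adding “new” to the stop list in order to drop “New York” also throws away every other use of “new” (a new bill, a new program), so counting two-word phrases before filtering may be a better fit for multiword place names.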

She also shared some overall patterns:

  • More newsletters would be sent out following a major issue, event or tragedy
  • During President Obama’s term, his name appeared far more often in Republicans’ e-newsletters than in Democrats’
  • Republicans send more newsletters than Democrats
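Because Republicans also send more newsletters overall, a raw count of mentions by party would partly reflect volume. One way to compare is the share of each party’s newsletters that mention a term. The following minimal Python sketch assumes each newsletter is already labeled with the sender’s party. The function name `mention_rate_by_party` and the five records are hypothetical and invented for illustration. This is not necessarily how the workshop or DCinbox computed the pattern.

```python
import re
from collections import defaultdict


def mention_rate_by_party(records, term):
    """Return, per party, the share of newsletters that mention `term`.

    `records` is an iterable of (party, newsletter_text) pairs.
    """
    # Whole-word, case-insensitive match, so "Obama's" counts but
    # "Obamacare" does not.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for party, text in records:
        totals[party] += 1
        if pattern.search(text):
            hits[party] += 1
    # Dividing by each party's total keeps the comparison from simply
    # reflecting which party sends more newsletters.
    return {party: hits[party] / totals[party] for party in totals}


# Hypothetical records invented for illustration; not DCinbox data.
records = [
    ("R", "The Obama administration announced a new rule this week."),
    ("R", "Join me for a town hall this Saturday at the library."),
    ("R", "President Obama's budget proposal was released today."),
    ("D", "I joined colleagues to expand access to health care."),
    ("D", "President Obama signed the bill into law today."),
]

rates = mention_rate_by_party(records, "Obama")
print(rates)  # R: 2 of 3 newsletters, D: 1 of 2
```

Whether “Obamacare” should count as a mention is a judgment call; the whole-word pattern excludes it, and dropping the `\b` anchors would include it.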

Neither of the New York senators sends out newsletters even though, as Kimberly pointed out, Sen. Gillibrand has published her daily schedule online, in what she calls a “sunshine report,” in an effort to make her office more transparent. This raises the question of whether newsletters really give insight into the workings of a politician’s office and daily agenda, given that they are essentially prepared texts.

Some links I’ve found to be helpful on text analysis in R:

This entry was posted in Workshops.

One Comment

  1. Posted February 15, 2019 at 10:43 am

    Jennifer, thanks for writing about this workshop! That was one I wanted to go to but couldn’t.

    Most of these things aren’t very surprising, but I wonder why the Republicans send out more newsletters than the Democrats do! It seems to me like it’s a useful way to communicate, even if, as you point out, it might not be the strike for transparency it’s painted as.
