TRIKE: Revised Project Plan


  • Team Members and roles  
    • Nancy Foasberg: content strategy and development, editor, OER perspective.
    • Rob Garfield: technical development, pedagogical perspective.
    • Hannah House: content strategy and development, content project manager, data critique perspective.
    • Natasha Ochshorn: content strategy and development, philosophy of DH perspective, communications.
    • Sabina Pringle: outreach, technical project management and development.
  • Abstract
    TRIKE (Transformational Repository for Instruction, Knowledge, and Explication) is an open educational resource providing infrastructure for sharing datasets and data transformations alongside humanistic interrogation of the decisions made in selecting and working with that data. It is being created primarily for use in digital humanities courses, where it provides pedagogical scaffolding in the form of thematically and textually linked data-based experiments that model how data work fits into the humanist’s toolbox. We also envision broader applications for this modeling: it could benefit humanities research outside the classroom by giving any interested researcher contextualizing information about the decisions made when working with data. We propose to develop a prototype website that we hope faculty partners and their students will use in Fall 2019. We will evaluate the prototype against predetermined measures of success to determine next steps for a new, funded scope that expands the prototype’s capabilities and reach.



  • Environmental scan
    Humanities datasets have proliferated. However, while our environmental scan revealed several projects addressing one or more of the modes we plan to work in, few projects marry datasets, methodological instruction, and data criticality, despite calls for this kind of work.
    The prototype we propose expands beyond existing resources because it: a) provides access to a varied set of datasets at multiple stages of preparation, b) comments on the datasets to illuminate both the constructed nature of all data and the specific choices that were made when handling these datasets, and c) provides nuanced methodological instruction grounded in concrete datasets while acknowledging that multiple methodologies may be appropriate for any given data.

    Existing instructional datasets
    Alan Liu’s DH Toychest provides curated lists of digital humanities tools and tutorials for various approaches to data alongside a “starting set” of demo corpora, but does not provide “work-in-progress” snapshots of datasets. The Perseids Project structures datasets for reuse and provides lesson plans for instructors, but is methodologically and disciplinarily specific.

    Existing methodological tutorials
    Collections of methodological tutorials can be found at sites including Tooling up for Digital Humanities, devdh.org, Digital Art History 101, and Nodegoat. These resources vary in breadth and focus and provide advice and tutorials for novice DHers, but they do not provide sample datasets.

    Other projects include data with suggestions of how it might be used. Jonathan Reeve’s Corpus DB provides both data and some Python scripts with which it can be manipulated. The M.O.N.K. Project offers public domain collections and the schemas that go along with them. Making the History of 1989 provides lesson plans and modules alongside primary sources. None represent the choices made by researchers as they work with a corpus.

    Existing data collections
    Electronic, humanities-relevant data collections, both open and toll-access, are plentiful, but vary in the amount of user support and access to data that they provide.

    Many of the largest collections provide some instruction for researchers, including the Digital Public Library of America’s “DPLA Pro”, Europeana’s “Europeana Pro”, “Chronicling America”, the New York Public Library digital collections and associated API, and the Open Science Framework. Toll-access resources such as JSTOR and the primary source databases offered by Gale and Alexander Street Press may also serve as data sources; JSTOR in particular enables this with JSTOR Data for Research. Institutions such as Michigan State University also curate datasets, but do not provide access to the general public (Higgins, Kudzia, and Rodriguez). Other potential data sources include NYC Open Data, HathiTrust, Wikidata, and the Vera Institute of Justice.

    TRIKE works on a much smaller scale than the resources listed above, but will be much more explicit in its relationship to both pedagogy and methodology.

    While instructional datasets, methodological tutorials, and data collections all exist, we did not find a tool that uses a variety of types of real data to demonstrate the process of data preparation and provides tutorials for using that data in instruction. TRIKE is intended to fill that gap.



  • What technologies will be used?
    Our site will be developed in WordPress and hosted on the Commons to leverage the existing support base and community around both.

    • Data files and documentation will be made available on GitHub
    • Other technologies, to be determined by the datasets we choose
      • Definite:
        • data prep, analysis and processing: Python
      • Potential:
        • Mapping: Carto
        • Visualization Tools: Tableau
        • Topic Modeling: NLTK or gensim
        • Image analysis
        • Gephi, Cytoscape, or another tool for network analysis
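
As a minimal sketch of what Python-based data prep could look like for this kind of project, the example below keeps a snapshot of a dataset at each stage together with a note on the decision that produced it. Everything in it is an assumption made for illustration: the inline CSV rows, the two cleaning steps, and the function names are hypothetical and do not come from the project plan.

```python
import csv
import io

# Hypothetical sample data, inlined so the sketch is self-contained.
RAW_CSV = """title,year
The Tempest ,1611
the tempest,1623
Hamlet,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize_titles(rows):
    """Return new rows with whitespace trimmed and titles title-cased."""
    return [{**r, "title": r["title"].strip().title()} for r in rows]


def drop_missing_year(rows):
    """Return only the rows that have a non-empty year."""
    return [r for r in rows if r["year"].strip()]


def run_pipeline(rows, steps):
    """Apply each step in order, keeping every intermediate stage.

    Each entry in `steps` is (stage name, decision note, function).
    The returned list starts with the untouched raw stage.
    """
    stages = [{"stage": "raw", "decision": "none (data as received)", "rows": rows}]
    for name, decision, func in steps:
        rows = func(rows)
        stages.append({"stage": name, "decision": decision, "rows": rows})
    return stages


# Hypothetical steps, each paired with a note on what the choice costs.
STEPS = [
    ("normalized",
     "Trimmed whitespace and title-cased titles; original spelling variants are lost.",
     normalize_titles),
    ("filtered",
     "Dropped rows with no year; undated items vanish from later analysis.",
     drop_missing_year),
]

stages = run_pipeline(load_rows(RAW_CSV), STEPS)
for s in stages:
    print(f'{s["stage"]}: {len(s["rows"])} rows | {s["decision"]}')
```

Because each step returns new rows instead of modifying its input, the raw stage stays intact and every later stage can be published next to the note explaining the choice behind it. With the sample rows above, the raw and normalized stages hold three rows and the filtered stage holds two.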



    • Which of these are known?
        • Rob, Sabina, and Nancy have experience with WordPress
        • Sabina has experience with Manifold
        • Nancy has experience with Omeka
        • Hannah, Rob, Nancy and Sabina are studying Python
        • Hannah and Sabina have experience with Tableau
        • Hannah has experience with Carto and Cytoscape
    • Which need to be learned?
        • We will need to explore available themes and plugins for WordPress on Commons to find the ones best suited to the organizational framework of our project
        • We are all pretty new to Slack, so are experiencing a learning curve there as well
        • We will also likely need to learn more about whichever of the analysis tools we choose to use
    • What is the plan to learn them? What support is needed?
        • Python Users Group (if applicable)
        • Consultations with the Digital Fellows and Andie
        • Consultations with Jonathan Reeve and Patrick Smyth
        • Online Tutorials and other documentation
        • NYCDH week workshops, particularly on Omeka and gensim
  • How will the project be managed?
      • Slack with Google Drive (and Giphy) integrations
  • Milestones
This entry was posted in Group Project Reports.


  1. Posted February 20, 2019 at 9:36 am

    Thanks for this update and detailed outline of your work plan, team! Thinking about your presentation yesterday, I am curious about the kinds of data you’re beginning to gather and the choice to use Shakespeare’s The Tempest. I have some reservations about another Shakespeare-as-test-case/dead white man project, though I understand that this helps with some issues of permissions and availability. Depending on which kinds of data you’re looking for, keep in mind that the Folger Digital Texts has XML and TEI files (among others) available for download. The Internet Shakespeare Editions also has some useful data that I’m sure they’d be happy to share.

    I’m also looking forward to hearing more about how the documentation will be created. Will you be asking individuals with useful and shareable data to explain their process, or will you be creating these explications from scratch based on your own interpretation? Are there plans for sample units or exercises so that students can participate in this process? Links to or collaborations with existing projects? Will you be reaching out to potential faculty to beta test this down the line?

    Finally: I know you have already settled on a dataset, but just in case: BYU just released two new datasets for researchers:

    1/ The TV Corpus https://corpus.byu.edu/tv/
    325 million words in 75,000 very informal TV episodes (e.g. comedies and dramas) from 1950-2018

    2/ The Movie Corpus https://corpus.byu.edu/movies/
    200 million words in 25,000 movies from 1930-2018

  2. Posted February 20, 2019 at 5:44 pm

    Thank you for your comments, Andie. We are discussing as a group and will be following up with some questions.

  3. Posted February 20, 2019 at 7:48 pm

    Hi Andie,

    We’re very open to suggestions, and those are some really cool datasets you’ve shared! A couple of things have come up as we’ve discussed your input, and we’d like to hear more from you on them:

    1. TV is and has been very much white-man-controlled territory. We’re wondering whether analyzing TV shows predominantly written/produced by white men is meaningfully different from our existing plan to analyze a play written by a white man.

    2. The crux of our project is creating a framework for data contextualization and critique, and we planned to have three different datasets to show that our approach could work on a diversity of types of data and/or analysis. Using a single dataset would substantially change and weaken that element. Are there other particular datasets you can think of that might supplement the ones you’ve suggested?

    Regarding sample units and exercises and explicit collaborations – these add-ons are not reasonably within scope for development in the very short timeframe we have this semester. However, they are all exciting areas into which we definitely think the project could grow if the prototype seems successful. We included plans for such in our original project proposal to Matt and Steve as a “Phase 2” activity to be pursued if this prototype has legs and we secure funding after this semester.

    Looking forward to your further input, and thank you.
    The TRIKEr gang

  4. Posted February 26, 2019 at 12:32 pm

    Hannah and Trike team,

    1. Fair point re: TV shows! I’m likely responding to my own Shakespeare fatigue, but I certainly see the value of using him as a proof of concept. This is ultimately the team’s choice, and possibly something to include on your public-facing site as part of your rationale and the project’s narrative. These two datasets popped up on Twitter recently and I thought I’d share.

    2. I tend to know more about early modern projects, which may complement your focus on Shakespeare and The Tempest. I know the Database of Early English Playbooks (DEEP) has its data available for download (http://deep.sas.upenn.edu/export.php). There’s also the TCP project from Early English Books Online, from which you can extract full-text transcriptions of different editions of The Tempest, for instance. Perhaps this is not the direction you intend to go, and I’m happy to make other suggestions and brainstorm potential consultants if you’re looking for more diverse data! Let’s talk further in class.

    3. Leaving the explicit pedagogical collaborations for Phase 2 sounds reasonable. That is another thing to consider including on the front-facing site, under future plans for the project.
