Data Management Plan – Immigrant Newspapers

Roles and Responsibilities

Implementing, monitoring, and adhering to the data management plan will be a collective effort. This is primarily because of the way our team has been acquiring data for some 150 publications: using Chronicling America as a starting point for background information, gathering visuals from microfilm, and finding street addresses in historical directories. We have split up the research by immigrant group and have been compiling the information we find online in a central document. Luci will be responsible for collecting and maintaining the digital copies of microfilm and newspaper images, along with the compiled text data. Antonios will be responsible for maintaining the code and data files he is building to load into our site.

Description of the Data

This project will produce three types of data. The first is microfilm image captures. The second is spreadsheets documenting information about each newspaper and the locations of publishing houses. The third is the code being written in preparation for our website launch.

Although we have selected only certain parts of the data provided by Chronicling America, it remains “raw” in the sense that its values still have to be made consistent. Issues to consider when cleaning the data include the format of addresses, how to handle missing years, and what kinds of notes to include in the final product.
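The cleaning steps listed above could be scripted so they are applied the same way to every row. The following is a minimal Python sketch, assuming the spreadsheet is exported as CSV with hypothetical `title`, `address`, `start_year`, `end_year`, and optional `notes` columns; the abbreviation rules and the "Example Gazette" record are placeholders, not the team's actual conventions or data. Missing years are flagged in a notes column rather than guessed.

```python
import csv
import io
import re

# Hypothetical abbreviation rules; the team's actual address conventions would replace these.
ABBREVIATIONS = {
    r"\bAve\b\.?": "Avenue",
    r"\bBlvd\b\.?": "Boulevard",
}
YEAR_FIELDS = ("start_year", "end_year")  # hypothetical column names


def clean_row(row: dict) -> dict:
    """Normalize one spreadsheet row without guessing at missing values."""
    cleaned = dict(row)
    if "address" in row:
        # Collapse stray whitespace, then spell out abbreviations consistently.
        address = " ".join((row.get("address") or "").split())
        for pattern, replacement in ABBREVIATIONS.items():
            address = re.sub(pattern, replacement, address)
        cleaned["address"] = address
    # Record missing years in the notes column instead of inventing a value.
    notes = [row["notes"]] if row.get("notes") else []
    for field in YEAR_FIELDS:
        if field not in row:
            continue
        value = (row.get(field) or "").strip()
        cleaned[field] = value
        if not value:
            notes.append(f"{field.replace('_', ' ')} unknown")
    cleaned["notes"] = "; ".join(notes)
    return cleaned


def clean_csv(text: str) -> str:
    """Clean every row of a CSV export, adding a notes column if needed."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = list(reader.fieldnames or [])
    if "notes" not in fieldnames:
        fieldnames.append("notes")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(clean_row(row))
    return out.getvalue()


if __name__ == "__main__":
    sample = "title,address,start_year,end_year\nExample Gazette,  42  Elm  Ave.,1880,\n"
    print(clean_csv(sample))
```

On the sample row this prints the header with an added `notes` column, followed by `Example Gazette,42 Elm Avenue,1880,,end year unknown`. Keeping the unknown year blank, with an explanation in the notes, leaves the decision about how to present it to the team.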

Data Storage and Protection

During the research and collection phase, Luci will store the data on an external hard drive dedicated solely to this purpose. The information being collected is not confidential or sensitive, so few additional security measures are needed.

After the project is complete, the records of the data that has been published on our website will be locked in the Graduate Center repository.

Data Format and Documentation

The information we gather during research and collection will be in the form of digital images and graphics (images: PNG, JPG, RAW, or TIFF; logos: vector formats such as EPS), chosen to maximize the chance of future software compatibility. As mentioned above, the data will also include images of the newspapers gathered, with permission, from online and physical archives.

We will need to document the process by which information was translated from each of our sources, especially because the known information about a given newspaper is so often incomplete. The data collection has been an exercise in sleuthing and will require an explanation of our compilation methods. For now, links to sources that provide address information are being collected in Trello.

Data Access, Sharing and Archiving

We will consider uploading our data to GitHub for the benefit of anyone who may wish to build on our project, but this is not one of our primary goals. Our group will revisit the idea as the project progresses. Another possibility is to have the site generate downloadable CSV files, both from a visitor's search results and of the entire database.
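If the site does offer CSV downloads, both cases (a visitor's search results and the full database) can come from one function whose only difference is an optional filter. This is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in for whatever database the site actually runs on; the `newspapers` table, its columns, and the two sample records are hypothetical placeholders.

```python
import csv
import io
import sqlite3
from typing import Optional

# Hypothetical column list for a "newspapers" table; the real schema may differ.
COLUMNS = ("title", "language", "address", "start_year", "end_year")


def export_csv(conn: sqlite3.Connection, search: Optional[str] = None) -> str:
    """Return CSV text for rows matching `search`, or for the whole table if no search is given."""
    # Column names are fixed constants, so joining them into the SQL is safe.
    sql = f"SELECT {', '.join(COLUMNS)} FROM newspapers"
    params: tuple = ()
    if search:
        # Visitor input goes in as a bound parameter, never into the SQL string.
        sql += " WHERE title LIKE ? OR language LIKE ?"
        pattern = f"%{search}%"
        params = (pattern, pattern)
    sql += " ORDER BY title"
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(conn.execute(sql, params))  # NULL values become empty cells
    return out.getvalue()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with two placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE newspapers (title TEXT, language TEXT, address TEXT, "
        "start_year INTEGER, end_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO newspapers VALUES (?, ?, ?, ?, ?)",
        [
            ("Example Gazette", "Italian", "42 Elm Avenue", 1880, None),
            ("Sample Herald", "German", "7 Oak Street", 1875, 1910),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(export_csv(conn, "German"))  # one visitor's search results
    print(export_csv(conn))            # the entire table
```

With the sample records, `export_csv(conn, "German")` returns only the `Sample Herald` row, while `export_csv(conn)` returns both. Binding the search term as a parameter keeps visitor input out of the SQL text itself.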

After we upload all the text and images to the hosting server that the Graduate Center will provide, we can keep a backup by downloading the entire htdocs folder along with the SQL database. This way, we will not lose anything if the hosting server or the domain name is withdrawn.
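The backup described above comes down to two steps: archive the htdocs folder and dump the SQL database. The sketch below assumes shell access to a copy of the site files and a MySQL or MariaDB database reachable with the standard `mysqldump` client; the folder names, database name, and user name are hypothetical placeholders.

```python
from __future__ import annotations

import datetime
import shutil
import subprocess
import zipfile
from pathlib import Path


def backup_htdocs(htdocs: Path, backup_dir: Path, stamp: str) -> Path:
    """Zip the whole htdocs folder into backup_dir/htdocs-<stamp>.zip."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(str(backup_dir / f"htdocs-{stamp}"), "zip", root_dir=htdocs)
    return Path(archive)


def archive_contents(archive: Path) -> list[str]:
    """List the files stored in a backup so it can be spot-checked."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def mysqldump_command(database: str, user: str, outfile: Path) -> list[str]:
    """Build a mysqldump call; assumes a MySQL/MariaDB database and the mysqldump client."""
    return [
        "mysqldump",
        f"--user={user}",
        "--password",  # with no value, mysqldump prompts for the password
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={outfile}",
        database,
    ]


if __name__ == "__main__":
    # Hypothetical local paths and names; substitute the real ones from the hosting account.
    stamp = datetime.date.today().strftime("%Y%m%d")
    backups = Path("backups")
    zip_path = backup_htdocs(Path("htdocs"), backups, stamp)
    print(f"{zip_path}: {len(archive_contents(zip_path))} entries")
    subprocess.run(
        mysqldump_command("newspapers_db", "site_user", backups / f"db-{stamp}.sql"),
        check=True,
    )
```

Dating each archive keeps earlier backups instead of overwriting them, and listing the archive's contents gives a quick check that the download is not empty.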

Re-Use and Re-Distribution of Data

We will credit the sources of our images throughout our website and expect no issues with using them, especially once we have obtained permission from the other databases.

Long-term Archiving and Preservation

The CUNY repository will provide a permanent archiving solution for our project.


This entry was posted in Group Project Reports.

One Comment

  1. Posted March 5, 2019 at 3:32 pm

    Good to see how much data is being gathered at this point and the range of information you’ve been able to collect so far. It makes sense to split the data collection so that the database can continue to progress while other elements and data points are being gathered. When do you plan to integrate the new information into the database? As it comes in, once you have a number of records, or toward the end after all the data has been collected and recorded? If there are plans to work collaboratively on any of this, you may also need to store the database on a cloud system so that multiple team members can work on it.

    I appreciate that there is a clear plan for documentation so as to preserve the decisions made by the team and ensure future users can take advantage of your hard work. It might be interesting to post some of these choices as you make them, in a blog for the project that can help generate interest and even help you with feedback and user input down the line. I think the idea of having a link to downloadable .CSV files makes complete sense at this point.

    And finally: what does it mean to say that the project will be “locked” in a CUNY repository? Do you mean that you wish to make it private or accessible only to CUNY users? Or that a stable copy of the completed project will be maintained in the repository? And speaking of “complete”, let’s discuss further what that looks like, and where it should live when it’s formally published.
