Reflection on a Data Management Plan

I made an appt. for Camilla and me to meet with librarian Stephen Klein, the Digital Dissertation Librarian, before class last Monday, Feb. 26. He had good news for us about working with Omeka for the LAC group project CONTENT, DATA and OBJECTIVES, and bad news for us about the “RIGHTS + USAGE” of our planned data set. It was great that he pointed this out, esp. as rights require a long lead time to request and obtain.

When librarian Stephen Zweibel visited our DH Praxis class later that same evening, I asked him during his Q + A about the rights surrounding the French report that the LAC project data hinges on, specifically the data tables contained in it. One aspect in particular: is the data in the report the intellectual property of the co-authors and/or the many French cultural agencies that helped prepare the report? Stephen judged that there were many sub-questions therein, so he recommended I email Jill Cirasella.

Serendipitously, I recently connected with her via Twitter when she posted the slides from the “Fair Usage in DH” workshop she gave during NYCDH Week. I received a reply from her quickly afterward, which was very generous of her b/c she’s on sabbatical this term.

Based upon Jill’s expert reply to my inquiry, Camilla has assessed the LAC project’s use of the data from the report as “good to go,” which gives us a “go ahead” with our originally planned data set.

So the main takeaway for me in this reflection, and from me in this post, is the reminder that there is NO Data Management Plan without clearing all rights + usage FIRST. Granted, there are times when you must hedge and TRIPLE track all permissions with a Content Management Plan AND a Data Management Plan. This is the reality of building UNIQUE, DH-y digital platforms from data + content + intellectual property.

Posted in Personal Blogs, Uncategorized

Personal journal: data management and research practices

Prompt: Personal journal entries reflecting on how data management can improve your research practices

Having a good data management plan from the outset of a project can help improve the efficiency of a project by making clear upfront what file formats and organizational structures should be used. This prevents rework and last-minute conversions down the line. Good data management can also provide backup and hedge against data loss mid-project. Good management of data can help prevent confusion and errors by making the right data easy to find. It also makes it easier to use one’s dataset for additional future projects, or for others to use the dataset for their own work.

Having a data management plan in place on collaborative projects, such as those we’re undertaking in this praxis class, helps ensure that we are all working with our data in broadly the same ways. It also gives us a consistent place to store our initial and transformed data, keeping all elements of the project together. Good data management practices are good research practices.

 

Posted in Personal Blogs

Data Management Plan Reflection

We’ve been working on the data management plan. Rob has taken the lead on this part of the project and has worked up a really nice plan including a description of the data, a statement on our protocols for working with data, and our plans for archiving the content. This has brought up a lot of good questions that we need to think about in our group. These questions often go far beyond the technical aspects of collecting, storing, and conveying our data and get into how we’re thinking and talking about our project. For instance, we’re working through some questions of language. Since our project explicitly seeks to highlight the always already constructed nature of data, we want to show how the data changes as we work with it, while still rejecting the common language of raw/dirty/unprocessed data vs “cooked”/clean/processed data. So, how do we talk about the different stages we are presenting? What is the best way for us to refer to the datasets we’ve found or constructed, while avoiding falling into the same language that gives rise to the perceptions we seek to challenge? These are questions we’re still considering, and it’s going to be important throughout the project to keep paying close attention to the language we use.

There are more practical considerations we are also working our way through. Some questions that have arisen:

  • What kinds of data are available for us to use in the ways we have in mind? Do we need to find public domain or openly licensed data? If we are making our own datasets, what kinds of configurations are allowable under fair use and which are infringing? What can we re-license under our own free licenses? What about the software itself? I know that the GNU General Public Licenses are closely related to Creative Commons licenses, about which I know a lot, but I don’t know much more about the GPL and definitely need to read up on that. We’re also consulting Jill Cirasella with some of our fair use questions. For my part, I want to work with some proprietary metadata, but I think that after the first stage of processing, I will probably be able to make a more “data-like” version of it available. Similarly, Natasha is working with scholarly articles, not necessarily openly licensed ones, but is using them to create a dataset that she can share.
  • How should we break up our “stages” of data? We are planning to make multiple copies of each stage of our datasets, following LOCKSS, and we have a plan to use GitHub to keep track of each of these, but it turns out that, just as “data” is not a naturally occurring element, neither are the points at which we’ll break it from one stage to another. This is something that might vary from one dataset to another, and of course, it’s a difficult thing to plan before we actually start getting in there and doing that work.
  • How might our data practices look different from one dataset to the next? This is very relevant to our project as we’re working with very different types of data! Because my data is very text-based, the problems it poses are ones that are generally well understood, and I think following best practices will serve us well in that case, but we also wanted to work with sound and possibly image files. This is useful work! How do we preserve the data from it?
  • How will this be archived? We don’t have a formal plan to continue beyond the end of the semester, but the data will still exist! We plan to use GitHub, and the website will continue to exist, but we’re also talking about how CUNY Academic Works supports the archiving of our data.

This is a really good project for us to work on early in our DH lives, because it’s explicitly about how data is selected, transformed, and presented, so the questions that any researcher would need to ask about data are far more deeply integrated into our thought processes than they might be if we were only interested in analyzing data and finding a result. The DMP encourages us to explicitly ask questions about data types, documentation, and platform that we need to answer in any case. Additionally, it puts us in a position to appreciate (and/or critique) the data practices of the researchers who came before us, because in some cases, we may well be using datasets that already exist.

As for me personally, I’m contributing to the data management aspects of this project by:

  • Writing a narrative that lays out in detail exactly how the dataset was created. If neither MLA nor EBSCO makes significant changes in the near future, a reader could use this to recreate the parts of my dataset that I can’t make public. If such changes ARE made, I hope to create enough documentation that it will be easy to tell how the data is different.
  • Brushing up on GitHub. I was introduced to it during the Graduate Center Digital Institute and got my feet wet there, but I’m not yet comfortable with the platform and need to work on that further. Rob has also introduced us to git-lfs (Git Large File Storage), so I need to walk through the tutorials on this.
  • Contacting Jill for conversations about fair use.
  • Reading up on the GPL.
  • Creating my dataset, making a copy on my hard drive and in the cloud, and uploading it to GitHub.
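Keeping a copy on my hard drive, in the cloud, and on GitHub only helps if the copies are actually identical. As a minimal sketch, and not something our plan currently specifies, one way to check this is to compare checksums in Python. The function names below are hypothetical, and the paths you pass in would be wherever the copies really live.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read 64 KiB at a time so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copies):
    """Map each copy's path to True if its checksum equals the original's."""
    expected = sha256_of(original)
    return {str(copy): sha256_of(copy) == expected for copy in copies}
```

Calling copies_match with the working file and a list of its copies returns a dictionary that flags any copy that has drifted from the original.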

I’ve already been working on some of these, of course (it’s already Monday!!), but I have a little more to do before class tomorrow.

Posted in Personal Blogs

Freedom Dreaming: A Call to Imagine Data Management Plan

Description of the Data

The data that we are interested in collecting includes text, images, videos and audio surrounding our Freedom Dreaming project and concept. To start, we may collect only text and images, adding video and audio as the project progresses. The data may be obtained through online submissions on our website, and on the social networks of Twitter and Instagram. The process is twofold: data submission and collection/archiving.

Data Storage and Protection

The data will be presented on our website for the public to see and review as they wish. One of our current challenges is determining how this data will be stored on the backend, so we are currently researching and exploring our options. Currently, we are looking into Google Cloud infrastructure, Firebase database, and Flamelink as a content management service.

Ultimately, the main purpose of our project is to make the data readily available and visible for individuals to see in real-time. By visually displaying the data that is received, we are taking the practice of consciousness-raising used within the United States social movements in the 1960s and 1970s and applying it to the digital world. We are considering how consciousness-raising and awareness of systemic oppression can be used as a pedagogical tool to teach empathy towards others’ lives and experiences.

Data Format and Documentation

The digital data format is largely dependent on how we decide to collect data on the backend. We are still considering our methods for formatting and documenting the data, but we are interested in learning ways to sort the data by date of input, type, location, source, etc.
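As a rough illustration of the sorting we have in mind, here is a minimal Python sketch. It assumes each submission is stored as a record with fields for date of input, type, location, and source. The field names and the sample records are made up for illustration and are not a settled schema.

```python
from datetime import date
from operator import itemgetter

# Made-up submissions; the field names are assumptions, not a settled schema.
submissions = [
    {"submitted": date(2019, 3, 2), "type": "image", "location": "Bronx", "source": "instagram"},
    {"submitted": date(2019, 2, 27), "type": "text", "location": "Queens", "source": "website"},
    {"submitted": date(2019, 3, 2), "type": "text", "location": "Brooklyn", "source": "twitter"},
]


def sort_by(records, *fields):
    """Return a new list ordered by the given fields, in priority order."""
    return sorted(records, key=itemgetter(*fields))


# Oldest submissions first; ties on the date are broken by type.
by_date_then_type = sort_by(submissions, "submitted", "type")
```

This only works when every record has a value for the fields being sorted, so missing values would need a default before sorting.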

One option we may consider is keeping the data on Google Drive under the email address we plan to create for project and website questions or comments. We may decide to collect the data at regular intervals (every six months, every month, etc.) and include it within our files in addition to a backend database. These Google spreadsheet files can be shared on our website as well.

Data Access, Sharing, and Archiving

The project itself is designed in a way that presents data instantaneously to the audience. This is important because we want individuals to feel their voice is heard and shared with our audience, with the exception of harmful and hateful content. Content moderation will be done to ensure that no harmful or hateful submissions are presented on the website. For website submissions, we will simply remove content that is harmful and hateful at our discretion. For Twitter and Instagram, we will utilize the measures that these platforms already have in place for removing inappropriate content. We alert all members who submit content to our site or social media campaigns that we reserve the right to remove or report this content as needed.

Long-Term Archiving and Preservation

After the research is complete, the website will remain and continue to collect data. We hope to continue our outreach efforts after our class concludes so that this project continues to be far-reaching. Additionally, it would be interesting to analyze the data at a later date to consider the type(s) of submissions we receive, the audience we reach, whether particular systems of oppression are more visible than others, etc.

Posted in Group Project Reports

Data Management Plan – Immigrant Newspapers

Roles and Responsibilities

Implementing, monitoring, and adhering to the data management plan will be a collective effort. This is primarily due to the means by which our team has been acquiring the data for some 150 publications, from using Chronicling America as a starting point for background information to gathering visuals through microfilm and street addresses from historical directories. We have split up the research according to immigrant groups and have been compiling the information that can be found online in a central document. Luci will be responsible for collecting and maintaining the digital copy data of microfilm and newspaper images, along with the text data compilations. Antonios will be responsible for maintaining the coding data that he is building to be loaded into our site.

Description of the Data

This project will produce three types of data. One type of simple data will be microfilm image captures. Another type will be spreadsheets of newspaper documentation information and publishing house locations. The third type will be the programming and code written in preparation for our website launch.

While we have selected only certain parts of the data provided in Chronicling America, it currently remains “raw” data in the sense that the values still have to be made consistent. Things to take into consideration when cleaning the data include the format of addresses, how to include any missing years, what kind of notes to include in the final product, etc.
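To make the cleaning step concrete, here is a minimal Python sketch of how address formats and missing years might be made consistent. It is an illustration under assumptions, not our actual pipeline: the column names, the abbreviation list, and the choice to record an unknown year as None are all hypothetical.

```python
import re

# Hypothetical abbreviation map; the real list would come from the sources.
STREET_ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "pl": "Place"}


def normalize_address(raw):
    """Collapse whitespace and expand a trailing street-type abbreviation."""
    words = re.sub(r"\s+", " ", raw.strip()).split(" ")
    last = words[-1].rstrip(".").lower()
    if last in STREET_ABBREVIATIONS:
        words[-1] = STREET_ABBREVIATIONS[last]
    return " ".join(words)


def parse_year(raw):
    """Return a four-digit year as an int, or None when missing or unclear."""
    match = re.fullmatch(r"\s*(\d{4})\s*", raw or "")
    return int(match.group(1)) if match else None


def clean_row(row):
    """Return a cleaned copy of one record, leaving the original untouched."""
    return {
        "title": row["title"].strip(),
        "address": normalize_address(row["address"]),
        "start_year": parse_year(row.get("start_year")),
        "notes": row.get("notes", "").strip(),
    }
```

Recording an unknown year as None, rather than guessing or leaving a blank string, keeps the gap visible so it can be explained in the documentation.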

Data Storage and Protection

During the research and collecting phase, Luci will be using an external hard drive that is used solely for the purpose of storing the data. The information that is collected is not confidential or sensitive, so there are not many steps that will need to be taken to ensure safety in that manner.

After the project is complete, the records of the data that has been published on our website will be locked in the Graduate Center repository.

Data Format and Documentation

The information that we are gathering during our research and collection will be in the form of digital images and graphics (images: png, jpg, raw, or tiff; logos: vector, eps), maximizing the chance of future software compatibility. As mentioned above, the data will also include images of the newspapers gathered, with permission, through online and physical archives.

We will need to document the process through which information was translated in relation to all of our source material, especially due to the often incomplete nature of known information per newspaper. The data collection has been an exercise in sleuthing, and will require an explanation of our compilation methods. Links to sources that provide address information are being collected in Trello for the time being.

Data Access, Sharing and Archiving

We will consider uploading our data onto GitHub to benefit any parties that may wish to build off our project, but this isn’t one of our project’s primary goals/tasks. Our group will think about this more down the line, as the project progresses. Another possibility is the option of generating spreadsheets on the site in the form of csv files based on a visitor’s search results, as well as of the entire database, which can be downloaded.
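The csv download idea could look something like the following Python sketch, which filters records by a visitor's search and writes the matches as CSV text. This is an illustration only: the column names and the search-by-language filter are assumptions, and the site itself may end up built with different tools.

```python
import csv
import io

FIELDS = ["title", "language", "address", "start_year"]  # hypothetical columns


def search(records, language):
    """Return the records whose language matches, ignoring case."""
    return [r for r in records if r["language"].lower() == language.lower()]


def to_csv(records):
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Passing the full list of records to to_csv gives the download of the entire database, while passing the result of search gives the download for one visitor's results.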

After we upload all the text and images to the hosting server that the Graduate Center will provide, we will be able to keep a backup by downloading the entire htdocs folder and also the SQL database. This way, there is no danger of losing anything if the hosting server or the domain name is revoked.

Re-Use and Re-Distribution of Data

We will be crediting the origin of our images throughout our website and should have no issue with using them, especially after obtaining permission from other databases.

Long-term Archiving and Preservation

The CUNY repository will provide a permanent archiving solution for our project.

 

Posted in Group Project Reports

Data Management Plan / Lost Art Collective

Description of the Data

The data used in this project will be obtained from “The Restitution of African Cultural Heritage. Toward a New Relational Ethics” by Felwine Sarr & Bénédicte Savoy. The report contains a list, in table format, of the pieces of art from several African countries held at The Musée du quai Branly – Jacques Chirac in Paris. As of this writing, the data may be obtained in one of three ways: direct input (the most labor intensive), OCR from the pdf’d list, or an electronic CSV file from the authors of the report.

Data Storage and Protection

The data will be stored in an online database (Omeka). Since the purpose of the project is to make the dataset public, the only protection of the data will be to make it non-editable by the public, while keeping it viewable by the public.

Data Format and Documentation

The digital data will be available in a spreadsheet on a Google doc and on the Omeka Database. A Google doc will house the original spreadsheet and the Omeka Database will be attached to the server.

Data Access, Sharing and Archiving

After the research is complete, the data will remain on the Omeka Server and the Google document, and will also be available through a website created on the CUNY Academic Commons. The audience for this data is students in art history, Africana studies and digital humanities as well as art historians, people interested in the provenance of stolen art, and interested members of the general public.

Re-Use and Re-Distribution of the Data

This data is unrestricted and will be free to use under a Creative Commons license.

Long-Term Archiving and Preservation

At the very least, the data will continue to be available for download from a public site on the CUNY Academic Commons.

 

Posted in Group Project Reports

Project TRIKE update: week 5 of 17

Group report covering 2/20-2/26

It was a busy week! Lots of back and forth on the team Slack channel. Nancy, Natasha, Sabina and Rob had a great running brainstorm and discussions around potential projects and datasets. Meanwhile, I developed a detailed work plan and posted it as a shared Google Slides doc for group review and comments. Our work is continuing apace, though we’re all aware that things may change. We will revisit and adjust the work plan as needed. 

We added a couple more Slack integrations, including a Doodle poll to find a mutually agreeable time for the group to meet over the weekend. Our meeting on Sunday was a 1.25 hour group chat, wherein we fully covered a rigorous preset agenda in the expected timeframe because I run meetings like a drill sergeant. It’s probably annoying – but it’s demonstrably effective. I added notes from the meeting to another running shared Google Slides doc where all meeting notes will be tracked. That document is less rigorously organized than the work plan, which is fine for its purposes.

We took a vote via poll (Polly Slack integration) for whether to keep the project datasets limited to Shakespeare or to open it up to other datasets, so as not to come off as a Shakespeare project. We had one vote on either side, two in the middle, and one semi-abstention. After further discussion we decided (as relayed during class) to open it up to different topics.

Other productive group developments included agreeing that we would all check the Project TRIKE Slack channel regularly, review materials, and answer any questions from other group members within 48 hours, unless communicated otherwise. We are going to rotate responsibility for writing these weekly reports. We also had a great discussion about expectations for the project past this semester.

Good communication is key to being able to effectively collaborate and have a positive group work experience. This team has been great about talking out our different points of view. I feel like we’re in a really healthy place and have a good foundation and momentum for moving forward.

Happy National Chocolate Souffle Day Eve to all of you.

 

Posted in Group Project Reports

Periodic Reminder of Class Goals and Endgame

Some reminders and recaps regarding our conversations last night:

  • Praxis: Our class is grounded on praxis, which means we learn by doing; in such a case the instructor functions as a collaborator and consultant rather than setting formal and strict rules. We are collaborating together to learn about best practices, project development, and team building, and our class will evolve out of the needs of the groups.
  • Posts: Group and individual posts are as much for your benefit as for mine, so that we get in the habit of documenting progress and generating information that can be used for the public-facing portions of the project (e.g. bio statements). Our schedule regularly includes some links to help point you in the right direction, but if there is ever need for further guidance and clarification, or if as a class we prefer to have more time with formal lectures, I am always open to hearing how our class and site can be made more productive. That being said, there are no length or content requirements for personal journals–they are a way for you to keep track of your progress. Group posts should be more formal and attend to the theme of the week, but how you craft them is similarly a decision that should be made by the group.
    • This week’s DMP: use the checklist Steve provided and the links on the schedule thoughtfully. Minimally, try to account for: what data you have, what you’re doing with it (e.g. transforming? cleaning? organizing? storing?), and where it’s being kept. Further details will vary depending on the needs of each project.
  • Evaluation: This class is what you make of it–each group defines the parameters by which they want to be evaluated and what they want and need in order to achieve the goals set by the work plan (and these goals can and should be continuously in flux as we evaluate the direction of the project).
  • Contact: I am available via email, DM on the Commons, and during office hours (5:30pm-6:30pm in room 4104) for any further questions or concerns, or if you simply want to touch base about how the class/group work is going.

 

Posted in Instructor Comments

LAC Contributor’s Agreement

We have created a crowdsourced Google document and we are sharing it at the link below.

LAC Contributor’s Agreement Google Doc link.

Posted in Uncategorized

Personal Bio and Contribution Statement

Bio

S.C. Lucier is a director, designer, and production manager of concert dance, theatre, and private events in NYC.  She graduated magna cum laude from Marymount Manhattan College’s Theater Directing program where she received the department’s Gold Key of Excellence for her work in “innovative storytelling techniques,” and was a member of the SCDF Observership Class of emerging directors 2015-2016. “Luci” is currently pursuing an M.A. in New York Studies (History) in the MALS program at The Graduate Center. Favorite directing: Kerrigan-Lowdermilk’s The Bad Years Immersive House-Party Musical, Held: A Musical Fantasy at Fringe Festival NYC and NYMF 2018, multiple Shakespeare with Hip to Hip Theatre Company, Steve Romagnoli’s Skip to My Lou premiere at New City. Favorite design: Achilles’ Heels by Richard Move with Martha Graham Dance Company + Debbie Harry at Joyce Theatre, props for Lincoln Center’s Clark Studio Theater. Favorite site management: Louis Vuitton Museum (NYC). Luci is currently the Administrator of the soon-to-be American LGBTQ+ Museum in Manhattan, the career archivist for the Choreographer Sally Silvers, and a Captain of New York City’s World Championship Roller Derby team. 

Contribution Statement 

In Immigrant Newspapers, I am currently the Project Manager and also in charge of Outreach. I am trying to take as much external work of the project off of the shoulders of my colleagues as possible, so they are more able to handle the digital building and development of the database. I am very much looking forward to learning as much as I can about the tools we will be using in our website development, however, and will be happy to contribute in that area once the large task of research and data gathering has been completed. I’m excited to be setting up relationships with other databases and libraries that we would like to utilize and even visit for this project, especially due to my studies of museum science and my future interest in interactive archival exhibit-building. 

Posted in Personal Blogs