The Lost Art Collective team has been focused on gathering and preparing our data for entry into Omeka. Given that the data is the heart of the project, we’d like to elaborate on how we acquired our datasets and the steps to ‘clean’ and ‘prep’ the information.
The project is based on a report, The Restitution of African Cultural Heritage: Toward a New Relational Ethics, which was researched and presented in November 2018 by a team of French and African scholars, art historians, and researchers. In addition to describing the varied issues concerning the restitution of African art and artifacts to their countries of origin, the report documents the African collections housed at the Musée du quai Branly-Jacques Chirac in Paris, France, which total over 70,000 items.
We decided to focus on the items identified in this report for the initial phase of our project.
Permissions to use the data (completed):
Our first task was to verify that we would not be violating any copyrights by incorporating these items into our database. Carolyn emailed one of the GC’s librarians regarding “Fair Use” of the data in the report (we got the go-ahead). As it turns out, since the information is available through the Museum’s online search, it is considered public.
However, we do not have the rights to use the images of the artifacts. Instead, Pam and Patty figured out how to combine some of the data fields into a URL that links to the museum’s search engine and displays each item.
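As a rough illustration of the idea of building a link from data fields, here is a minimal Python sketch. The base address, the query parameter names, the field names, and the sample record are all hypothetical placeholders; this post does not record the museum’s real search address or which fields Pam and Patty combined.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: not the museum's real search address.
SEARCH_BASE = "https://museum.example/search"


def build_item_url(record, base=SEARCH_BASE):
    """Combine fields of one record into a link to a search page.

    The field names and query parameters are assumptions for illustration.
    """
    query = {
        "inventory": record["inventory_number"].strip(),
        "country": record["country"].strip(),
    }
    # urlencode escapes spaces and special characters safely.
    return f"{base}?{urlencode(query)}"


# A made-up record, only to show the shape of the input.
record = {"inventory_number": " 00.1931.12.7 ", "country": "Mali"}
url = build_item_url(record)
print(url)
```

Generating the link in a spreadsheet column would work just as well; the point is only that each record carries enough information to reach its page on the museum’s site without our hosting the image.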
Gathering the data (completed):
The next step was to get the data into Excel format, so that it could be converted to a CSV file (to be uploaded into Omeka). We took two approaches in parallel:
Pam focused on running OCR on the PDF version of the report and then converting the output into an Excel file. This proved to be very “messy” and time-consuming, and the data would have required significant further work to “clean”.
Concurrently, Patty contacted one of the report’s authors, who connected her with one of the researchers who had worked closely with the museum’s staff to create the databases. He kindly shared them with us. We now had all 70,000 records in an Excel spreadsheet!
Understanding & Cleaning the data (60% complete):
We are currently cleaning the data (since there are 48 country files, we’ve divided this up among all group members). These efforts include:
– translating the relevant fields from French to English
– splitting some fields into multiple columns (in order to map the information correctly)
– adding new fields (for example, the catalogue item number includes the date that the item was entered into the museum’s collection, so we are adding a new field with this date, to add another criterion for querying the database)
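The “adding new fields” step could be sketched in Python roughly as follows. This is only an illustration under a stated assumption: that the catalogue number is dot-separated and that one of its segments is a four-digit year. The field names and the sample value are hypothetical; the actual format in our spreadsheets may differ.

```python
def acquisition_year(catalogue_number):
    """Return the first dot-separated segment that looks like a year, else None.

    Assumes (hypothetically) that the year is a four-digit segment
    between 1800 and 2100.
    """
    for part in catalogue_number.split("."):
        part = part.strip()
        if len(part) == 4 and part.isascii() and part.isdigit():
            year = int(part)
            if 1800 <= year <= 2100:
                return year
    return None


def add_year_field(rows, source="catalogue_number", target="acquisition_year"):
    """Return copies of the rows with a new field holding the extracted year."""
    return [{**row, target: acquisition_year(row[source])} for row in rows]


# Made-up rows, only to show the shape of the data.
rows = [
    {"catalogue_number": "00.1931.12.7", "title": "Mask"},
    {"catalogue_number": "X-204", "title": "Stool"},
]
cleaned = add_year_field(rows)
print(cleaned)
```

Rows where no year can be found get an empty value rather than a guess, so they can be reviewed by hand.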
Given the volume of data, we are starting with a subset of some of the larger datasets. For example, if there are over 7,000 items from a single country, we will begin by translating 10% of the records, so that we can get started with loading the data into Omeka.
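One way to pick such a subset is sketched below in Python. The 7,000-item threshold and the 10% share come from the example above; choosing the records at random (with a fixed seed so the selection is repeatable) is an assumption made for this sketch, not a decision the team has recorded.

```python
import random


def pick_subset(records, threshold=7000, fraction=0.10, seed=0):
    """Return all records for small files, or a repeatable sample for large ones."""
    if len(records) <= threshold:
        return list(records)
    size = max(1, round(len(records) * fraction))
    rng = random.Random(seed)  # fixed seed: same subset on every run
    # Sample positions, then sort them so the subset keeps the original order.
    chosen = sorted(rng.sample(range(len(records)), size))
    return [records[i] for i in chosen]


# Stand-in data: integers in place of real spreadsheet rows.
large_file = list(range(8000))
small_file = list(range(500))
large_subset = pick_subset(large_file)
small_subset = pick_subset(small_file)
print(len(large_subset), len(small_subset))
```

Keeping the original order makes it easier to compare the translated subset against the full spreadsheet later on.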
Mapping the Data to Omeka (conforming to the Dublin Core Metadata Standard) (50% complete):
Patty and Camilla have begun mapping the data from the spreadsheets to the Omeka standard. We need to learn more about Omeka before we can finalize this step.
Learning Omeka:
To learn our chosen platform, the team attended the Omeka Workshop offered by the Digital Fellows on March 12th. Because we had more specific questions, we went to a follow-up session with Kristin (a Digital Fellow) on March 20th; unfortunately Kristin was out sick, so we are looking for other ways to get our questions answered.
We’ve identified a few templates that would be appropriate for our project, and have begun mapping our data to the Omeka data types.
Procuring a Server for Omeka:
Carolyn is working with Andie and the Digital Fellows to procure server space, and will load Omeka on the server.
Camilla will be the Site Administrator.
Once this is complete, we can begin configuring Omeka and loading the data.
The Map and Data Visualization Component:
We originally planned to use Neatline for the map component of our project, as it plugs into Omeka. We are rethinking this, as we haven’t found anyone at the GC who can help us learn the product, and we don’t think we’ll have enough time to accomplish this by the end of the semester.
In the meantime, Patty and Pam will be meeting with Javier (Digital Fellows) to explore whether we can use Carto for a few visualizations.
Camilla is exploring other tools to use for data visualizations. Patty and Pam also spoke with Augustine about using R or Python to create some basic visualizations. More to come on this.
3/26-4/2 Goals
Hi teams,
As we discussed today, these are some things to potentially keep in mind as you work on projects this week. As usual, feel free to skip or ignore anything that is not relevant to your work plan:
Group updates may be posted any time between now and the end of the week (ideally Friday, but before Sunday)