The Data Critique

Explore what the dataset includes, excludes, and overlooks

How was the data created?

This data was created from the Museum of Modern Art (MoMA)’s internal collection database, which catalogs works accessioned into MoMA’s collection since the museum was established in 1929. MoMA regularly updates the data with new and revised records and releases it “as is” for research, meaning some entries may be incomplete or not yet curatorially approved. The dataset includes only text-based metadata, with no images.

The artworks dataset includes metadata such as title, artist, date created, medium, dimensions, and date acquired. Some of these records are marked “not Curator Approved,” indicating that their information may be incomplete.

The artists dataset contains metadata for each artist, including name, nationality, gender, birth and death years, and unique identifiers (Wiki QID and Getty ULAN ID). Both datasets are available in CSV and JSON formats encoded in UTF-8.

Image of The Museum of Modern Art (MoMA) collection data on GitHub

Where do the sources come from?

The sources for the Museum of Modern Art (MoMA) collection dataset come directly from MoMA’s own internal records and institutional collection database.

The dataset consists of two CSV files, one documenting every artwork and the other documenting artists, both cataloged and maintained by MoMA itself. MoMA staff curate and regularly update the information, so the dataset is not compiled from secondary sources or third parties; it reflects primary data managed by the museum. This supports the accuracy and institutional authenticity of the dataset we use for our project. The GitHub repository also makes the data openly available for research under a CC0 public domain dedication and provides a DOI so that anyone can cite it.

Who funded and maintains the dataset?

The collection dataset released by MoMA on GitHub does not identify any specific external funders. Neither the official README nor MoMA’s “About the Collection” page lists dedicated financial support for this dataset.

The data is released under a CC0 license and regularly updated from the museum’s internal database as part of ongoing research and digital management operations, with updates largely automated via the MoMA Collection bot.

In comparison, MoMA’s exhibitions dataset and exhibition history project are clearly labeled as funded by the Leon Levy Foundation, which shows that funding sources vary across the museum’s open data projects. Since the collection dataset does not disclose a specific funder, its maintenance and release are likely supported through MoMA’s regular operational funds and general institutional resources rather than a dedicated external grant.

What information is missing or excluded?

Because MoMA releases the dataset “as is” for research purposes, some information is absent or incomplete. MoMA notes that many records have not been curatorially approved, and users are cautioned that the data may contain gaps or inconsistencies.

Building on this, we can see that the dataset documents technical details such as medium, artist name, and acquisition date, but it does not include information about the decision-making processes behind acquisitions. There is no metadata indicating who selected a work, what criteria guided the acquisition, or how institutional priorities shaped collecting practices. As a result, the dataset does not capture the contextual factors that influence why certain artworks enter MoMA’s collection. Similarly, the dataset does not provide information on an artist’s race, artistic movement, or the social and historical contexts surrounding each artwork. These details are central to understanding an artwork’s significance, yet they are not represented in the metadata.
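The gaps described in this section can be measured rather than taken on trust. The sketch below is one possible way to count blank fields per column in the artworks data. The sample rows are invented for illustration, and the column names (Title, Artist, Date, Medium, Dimensions, DateAcquired) are assumptions based on the fields listed earlier, so they may need adjusting to match the headers in MoMA's CSV file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for the artworks CSV.
# Column names are assumed from the fields described in the text.
SAMPLE = """Title,Artist,Date,Medium,Dimensions,DateAcquired
Untitled,Artist A,1950,Oil on canvas,,1951-01-01
Study,Artist B,,,,
"""


def count_blank_fields(rows):
    """Return a Counter mapping each column name to its number of blank values."""
    blanks = Counter()
    for row in rows:
        for column, value in row.items():
            # Treat missing values and whitespace-only strings as blank.
            if not (value or "").strip():
                blanks[column] += 1
    return blanks


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
blank_counts = count_blank_fields(rows)

for column, count in blank_counts.most_common():
    print(f"{column}: {count} blank of {len(rows)} records")
```

To run this against the real file, the in-memory sample could be replaced with a file opened using `encoding="utf-8"`, which matches the encoding MoMA states for its releases.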

How does MoMA define “woman”? How do we define it?

MoMA’s Artists dataset includes a “Gender” field, but the entries reveal an inconsistent and institutionally shaped approach to gender classification. While most artists are labeled simply as “male” or “female,” the dataset also contains variations such as “male (trans? ftm?)” and “female (transwoman)”, as seen with artists like Anton Prinner and Tadáskía. These annotations suggest that MoMA occasionally identifies or speculates about transgender identities, but without a standardized system.

For the purpose of our project, we use the gender values provided in the dataset because they determine how artists appear in MoMA’s records and therefore shape the visualizations we created. We define “women artists” as all artists labeled “Female” in the dataset, including transgender women, while recognizing that this classification is limited, historically constructed, and not fully inclusive.
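The definition above can be expressed as a small rule applied to the “Gender” field. This is a minimal sketch of that rule, not a record of the exact code behind our visualizations. The function name `is_woman_artist` is hypothetical. The case-insensitive match on values beginning with “female” is an assumption, made so that annotated entries such as “female (transwoman)” are included and differences in capitalization do not matter.

```python
from collections import Counter


def is_woman_artist(gender_value):
    """Return True when a Gender entry begins with "female", ignoring case.

    Assumption: annotated values such as "female (transwoman)" count as
    women artists, while blank or missing values do not.
    """
    normalized = (gender_value or "").strip().lower()
    return normalized.startswith("female")


# Example values: the annotated variants are quoted from the text above;
# the blank and None entries stand in for records with no gender listed.
sample_values = [
    "Female",
    "male",
    "female (transwoman)",
    "male (trans? ftm?)",
    "",
    None,
]

counts = Counter(is_woman_artist(value) for value in sample_values)
print(f"Counted as women artists: {counts[True]}")
print(f"Not counted: {counts[False]}")
```

Records with a blank gender field fall outside the definition under this rule, so any share of “women artists” computed this way describes only artists whose gender MoMA has recorded.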

What are the ideological effects of these definitions?

MoMA’s gender definitions carry significant ideological effects because they reproduce a binary and institutionally controlled understanding of gender. By categorizing artists primarily as “male” or “female,” with only occasional and inconsistent notes like “transwoman” or “trans? ftm?,” the dataset reinforces the idea that gender is fixed, knowable, and classifiable through institutional authority rather than self-identification.