collection's Introduction

Welcome to the Carnegie Museum of Art’s Collection Dataset

In celebration of our 120th anniversary, Carnegie Museum of Art is making public the collections records of all of its accessioned artworks. This release contains data on approximately 28269 objects across all departments of the museum: fine arts, decorative arts, photography, contemporary art, and the Heinz Architectural Center.

Additionally, the metadata for the Teenie Harris Archive has been included. For ease of use, these records are kept in their own files. There are approximately 59031 of them, and they use the same structure and format.

In this repository, you will find the files containing all of the records, as well as a description of the data, the data structure, and some guidelines on using the data. Please take a minute to familiarize yourself with the structure and guidelines below.

Your feedback and input are always welcome. If you’ve got questions or suggestions, please send them our way with “GitHub Dataset” in the subject header.

Data Structure

This data release includes nearly all accessioned works in our database. It contains basic data for each work.

The data is released in two forms, as a CSV dump (cmoa.csv & teenie.csv) and as a JSON dump (cmoa.json & teenie.json). The data contained in both formats are identical—you may choose the form that makes most sense to you. Please note that both the CSV and the JSON may contain newlines (\n) within any text field, and they often appear within the provenance, medium and credit_line fields.

For ease of use, we also provide individual JSON files for each work in the cmoa and teenie directories. These follow the same metadata format as the bulk file download, but are smaller and easier to read. Each file is named with the GUID portion of the object's ID. We have also included an index.json file within that directory that lists the ID, title, and first image (when available) for each object.
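As a minimal sketch of working with the per-object files, the helper below loads one record by its GUID. It assumes the file sits directly inside the `cmoa` or `teenie` directory and is named `<GUID>.json`; the function name `load_object` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def load_object(data_dir, guid):
    """Load the single-object JSON file named after the record's GUID.

    Assumes a layout of <data_dir>/<guid>.json (e.g. cmoa/<guid>.json).
    """
    path = Path(data_dir) / f"{guid}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `load_object("cmoa", "692a68c5-af1e-4124-80f1-cbf38be51abe")` would return that record as a dictionary, provided the file exists in your checkout.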

Metadata Format

Artwork Information

| Header/JSON object Key | Type | Required? | Description | Example |
| --- | --- | --- | --- | --- |
| title | String | Mandatory | The main title that identifies the object or artwork. No multiples. | Portrait of A Boy OR Wheatfields After the Rain |
| creation_date | String | Optional | The human readable date of creation for the object. Note that this is a string and may not be a valid date. | c. 1950 OR date unknown |
| creation_date_earliest | ISO_8601 date | Optional | This is the earliest date the object could have been created. May be null if no date known. May be the same as creation_date_latest, which indicates an exact date known. | 1990-10-17 |
| creation_date_latest | ISO_8601 date | Optional | This is the latest date the object could have been created. May be null if no date known. May be the same as creation_date_earliest, which indicates an exact date known. | 1990-10-17 |
| medium | String | Optional | Material of which the object/artwork is made. | Oil on canvas OR Acrylic on board OR Plastic, glass, and rubber |
| accession_number | String | Mandatory | This is a number assigned by the museum when it takes official ownership of an object. (See note 1.) | 2001.45.3 OR 2013.29.1A-B OR 96.1 |
| id | String (GUID) | Mandatory | A unique string that identifies the record of the object in the collections database. | 692a68c5-af1e-4124-80f1-cbf38be51abe |
| credit_line | String | Mandatory | Identifies and gives credit to the person, foundation, or method by which the object was acquired. | Gift of John Doe OR Museum Purchase, by Exchange |
| date_acquired | ISO_8601 date | Mandatory | The date the object became the legal property of the museum. | 1990-10-17 |
| department | String | Optional | The department within the museum that is responsible for the item. | Fine Arts OR Decorative Arts OR Photography OR Contemporary Art |
| physical_location | String | Optional | The location of the object/artwork within the museum. When an object is on view in the galleries, a specific gallery location is given. When an object is in storage, the location will only say Not on View. If an object is on loan to another institution, it will say on loan. | Scaife Gallery 8 OR On loan OR Not on view |
| item_width | Number | Optional | The maximum width of the artwork/object in inches. | 11.5 |
| item_height | Number | Optional | The maximum height of the artwork/object in inches. | 11.5 |
| item_depth | Number | Optional | The maximum depth of the artwork/object in inches. | 14.5 |
| item_diameter | Number | Optional | The maximum diameter of the artwork/object in inches. | 180.53 |
| web_url | String | Optional | The URL of the collection page for this item. | |
| provenance_text | String | Optional | The ownership history of an object/artwork. (See note 2.) | Mary Cassatt [1844-1926], France; Galeries Durand-Ruel, Paris, France, by August 1892 [1]; Durand-Ruel Galleries, New York, NY, 1895; purchased by Department of Fine Arts, Carnegie Institute, Pittsburgh, PA, October 1922. NOTES: [1] Recorded in stock book in August 1892. |
| classification | String | Optional | The name of a group to which the work belongs within the museum's classification scheme, based on similar characteristics. | Prints OR Photographs |

Image Information

There may be more than one image associated with an artwork. In the CSV, each column for the images may contain a pipe-separated list of values. For the JSON, there will be an images key containing a nested array of objects, one for each image.

Note that the image linked from this URL is NOT released under CC0 at this time.

| Header/JSON object Key | Type | Required? | Description | Example |
| --- | --- | --- | --- | --- |
| image_url | Array of Strings | Optional | The URL of a thumbnail image of the artwork. | |
| image_rights | Array of Strings | Optional | The rights text associated with the linked image. | (not currently exported) |

Artist Information

Note that there may be more than one creator associated with an artwork. In the CSV, each column for the creator may contain a pipe-separated list of values. For the JSON, there will be a creator key containing a nested array of objects—one for each creator.

| Header/JSON object Key | Type | Required? | Description | Example |
| --- | --- | --- | --- | --- |
| artist_id | String | Mandatory | This is a unique identifier for the artist. | 123456 |
| party_type | String | Mandatory | This is the type of entity represented. | Organization OR Person OR Collaboration |
| full_name | String | Mandatory | The full name of the artist, creator, or creators, who made the object. | John Singer Sargent |
| cited_name | String | Mandatory | The name of the artist as used in a standard citation, with surname first, and forename last. | Cassatt, Mary |
| role | String | Optional | Describes a person’s involvement with this object. | designer, manufacturer, artist |
| nationality | String | Optional | The nationality of the artist/creator. | French, American, Italian |
| birth_date | ISO_8601 date | Optional | The birthdate of the artist/creator. Precision may vary based on how much is known about the artist. | 1959-01-01 OR 1959 |
| death_date | ISO_8601 date | Optional | The death date of the artist/creator. Precision may vary based on how much is known about the artist. | 1959-01-01 OR 1959 |
| birth_place | String | Optional | Name of place of birth, with as much specificity as possible, preference is for City, Country if known. If city is unknown, then list only country. | Paris, France |
| death_place | String | Optional | Name of place of death, with as much specificity as possible, preference is for City, Country if known. If city is unknown, then list only country. | Paris, France |
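The pipe-separated creator columns in the CSV can be turned back into one record per creator. The sketch below assumes that the CSV column names match the keys listed above and that the Nth value in each column belongs to the Nth creator; neither is confirmed here, and `split_creators` and `CREATOR_COLUMNS` are hypothetical names. Empty positions are kept as `None` so that later values do not shift.

```python
# Hypothetical subset of creator columns; extend as needed.
CREATOR_COLUMNS = ("artist_id", "full_name", "role", "nationality")


def split_creators(row, columns=CREATOR_COLUMNS, sep="|"):
    """Turn parallel pipe-separated cells into one dict per creator."""
    cells = {name: (row.get(name) or "").split(sep) for name in columns}
    count = max(len(values) for values in cells.values())
    creators = []
    for index in range(count):
        creator = {}
        for name, values in cells.items():
            value = values[index] if index < len(values) else ""
            creator[name] = value or None  # keep the slot, mark it empty
        if any(creator.values()):  # skip rows with no creator data at all
            creators.append(creator)
    return creators
```

The same approach should apply to the image columns, since they are described as using the same pipe-separated layout.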

1: Accession Number Details: An accession number has two or more parts, each separated by a period. The first number indicates the year the item was acquired (yy or yyyy), the middle part generally identifies the lot in which it was given, and the third number identifies the specific item within the lot. Letters are also used to identify removable parts of a specific item, such as a coffee pot (A) and the lid to the coffee pot (B). Items acquired before 1996 begin with a two-digit year identifier, so 96.1 refers to 1896. After 1996, four-digit years were used.

2: Provenance Text Details: The provenance is listed in chronological order, beginning with the earliest known owner. Life dates of owners, if known, are enclosed in brackets. Uncertain information is indicated by the terms “possibly” or “probably” and explained in footnotes. Dealers, auction houses, or agents are enclosed in parentheses to distinguish them from private owners. Relationships between owners and methods of transactions are indicated by punctuation: a semicolon is used to indicate that the work passed directly between two owners (including dealers, auction houses, or agents), and a period is used to separate two owners (including dealers, auction houses, or agents) if a direct transfer did not occur or is not known to have occurred. Footnotes are used to document or clarify information.
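The accession number layout described in note 1 can be sketched as a small parser. This is an illustration only: it assumes numbers look like the three examples in the table (`2001.45.3`, `2013.29.1A-B`, `96.1`), it returns `None` for anything else, and it deliberately does not guess the century of a two-digit year. The names `Accession` and `parse_accession_number` are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Accession(NamedTuple):
    year_token: str            # "96" or "2013", exactly as written
    segments: Tuple[str, ...]  # lot and/or item numbers after the year
    part_suffix: str           # removable-part letters such as "A-B", or ""


# Two- or four-digit year, one or more ".number" groups, optional letters.
_ACCESSION = re.compile(r"(\d{2}|\d{4})((?:\.\d+)+)([A-Za-z]+(?:-[A-Za-z]+)?)?")


def parse_accession_number(text: str) -> Optional[Accession]:
    """Split an accession number into its year, numeric parts, and letters."""
    match = _ACCESSION.fullmatch(text.strip())
    if match is None:
        return None  # format not covered by this sketch
    year_token, dotted, suffix = match.groups()
    segments = tuple(dotted.strip(".").split("."))
    return Accession(year_token, segments, suffix or "")
```

Real records may well contain formats this pattern does not cover, so a `None` result should be treated as "unparsed" rather than "invalid".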


Usage Guidelines

The dataset contains the data and metadata of approximately 28269 objects in the collection of Carnegie Museum of Art and another approximately 59031 records from the Teenie Harris Archive in Pittsburgh, PA, USA. We are providing this data without restrictions for all to enjoy. We've got a few guidelines, but we've worked hard to make this dataset as open and explorable as possible.

Please contact us if you have any questions.

Image usage

The dataset provides links to collections images on CMOA’s collections search page, but does not provide the images themselves.

Images are not covered under the same license as the dataset.

If you would like to license images of artworks in CMOA’s collection, please contact the Rights and Reproduction Department.

Dataset Integrity

Collections data is provided for the purposes of exploration, education, experimentation, and fun, but it is to be used at your own risk.

Please be aware that the dataset contains incomplete data and/or errors. CMOA staff does not guarantee or provide curatorial approval for these records.

At CMOA, research is always ongoing, and so our understanding of these objects and their metadata is subject to change. This dataset will be updated on a regular basis to reflect our current best understanding of the objects. You are advised to use, or update to, the most current version of the dataset for best accuracy.

If you have identified errors in the dataset, or have additional information to add, we welcome your feedback! Please contact us at mailto:[email protected].

Thanks!

Pull Requests

Please note that we will not accept pull requests for the data in this repository. If you have corrections, please email them to us at mailto:[email protected] and we will forward them to the appropriate department for correction and inclusion in a future release. We will, however, review issues and pull requests for documentation or other non-data content here, as well as issues or suggestions of how we could improve these releases.

Attribution

Our dataset is being offered under CC0 1.0 Universal license.

We respectfully ask that you acknowledge CMOA as a source wherever possible, in order to preserve a link to the dataset. If this data is to be cited in a publication, please cite it using this DOI: DOI. By providing acknowledgement or citation, you enable others to verify, replicate, and further explore your presentation and interpretation of our data. And it’s just nice.

No Endorsement/Representation

Use of this dataset does not grant or imply CMOA’s approval, commission, or support of your work. CMOA retains the rights to all of its trademarks, and they are not part of the dataset. If you transform or modify the dataset, you must clearly distinguish the resulting work as having been modified from the CMOA dataset. If you create a derivative dataset from the CMOA dataset, we ask that you consider releasing the derivative under a CC0 license, which mirrors the licensing of the CMOA dataset.

Acknowledgement

The writers owe a debt to MoMA, Tate, and Cooper Hewitt for their help in shaping these guidelines, and their leadership in this area. Cheers!

collection's People

Contributors

davbre, kulas, mdlincoln, workergnome, zacyu


collection's Issues

Contributor policies

This question may be part of a bigger discussion, but what would a CONTRIBUTOR file look like for this repo, given that (like all the museum datasets out there, I'd wager) it is a generated extract of an upstream CMS?

This would be relevant particularly for PRs/issues that address data content. Over at the @tategallery they have accepted PRs that have addressed typos in the original data, presumably following some internal process for integrating those changes upstream into their CMS. @MuseumofModernArt doesn't have any PRs quite like that, but there are a few issues where users have pointed out typos, and maintainers note that the fixes will be made to the source CMS and reflected downstream in the next repo update. Depending on the internal workflow that you settle on, it might be useful to let contributors know that issues, rather than PRs, will be welcomed; or that PRs are welcome but will not be accepted into the repo, instead being addressed in the next content update, etc.

Missing DOI

We reference a DOI, but we haven't actually generated one yet.

Specify total number of objects

I'd add the expected number of objects to the README (27371 by my count), since #1 means someone might naively run wc -l cmoa.csv and get 30128 and get confused.
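The gap between the two numbers comes from quoted fields that span several lines. A minimal sketch of counting both, using only the standard library (the function name `count_lines_and_records` is hypothetical):

```python
import csv


def count_lines_and_records(path):
    """Return (physical_lines, records) for a CSV file.

    physical_lines includes the header and every line break inside
    quoted fields; records counts parsed data rows without the header.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        physical_lines = sum(1 for _ in handle)
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        records = sum(1 for _ in reader)
    return physical_lines, records
```

Opening the file with `newline=""` is what the `csv` module documentation recommends so that line breaks inside quoted fields are passed through to the parser unchanged.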

\n within fields

It's probably worth warning the user that they will find \n interspersed in the provenance fields, as this can (annoyingly) choke some CSV parsers. Do line breaks occur in any other fields?
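One way to answer that question for a given release is to scan the file and count, per column, how many values contain a line break. This is a sketch using the standard `csv` module; `columns_with_newlines` is a hypothetical helper, not something shipped in the repository.

```python
import csv
from collections import Counter


def columns_with_newlines(path):
    """Count, per column, how many values contain an embedded newline."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            for column, value in row.items():
                # Short or overlong rows yield None or list values; skip them.
                if isinstance(value, str) and "\n" in value:
                    counts[column] += 1
    return counts
```

Running it against `cmoa.csv` would list the affected columns along with how often each one is affected.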

URLs in JSON files lead to 404 errors

I notice that many if not all the URLs in your JSON files lead to 404 Not Found errors.

example:

in file: https://github.com/cmoa/collection/blob/master/cmoa/0018b42c-b408-4070-94a7-843288cceb9a.json
Line 17:
"web_url": "http://collection.cmoa.org/CollectionDetail.aspx?item=1023910",

web_url here leads to a 404 Not Found error

Line 17:
"image_url": "http://collection.cmoa.org/CollectionImage.aspx?irn=72723&size=Medium"

image_url here leads to a 404 Not Found error

but this link works:

https://collection.cmoa.org/objects/0018b42c-b408-4070-94a7-843288cceb9a

Perhaps all the URLs need updating?
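If the working link above is representative, a replacement `web_url` could be derived from each record's `id`. The sketch below rests entirely on that single example: the URL pattern has not been confirmed for other records, and `with_updated_web_url` is a hypothetical helper. It leaves `image_url` untouched because no working image URL is given here.

```python
# Pattern inferred from the one working link in this issue (an assumption).
NEW_BASE = "https://collection.cmoa.org/objects/"


def with_updated_web_url(record):
    """Return a copy of the record with an old-style web_url rewritten."""
    updated = dict(record)
    old_url = updated.get("web_url") or ""
    if "CollectionDetail.aspx" in old_url and updated.get("id"):
        updated["web_url"] = NEW_BASE + updated["id"]
    return updated
```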

Extract all is discarding preceding nulls

In the processing script, if there is a NULL,something, we are discarding the position information. This is a problem.

(Need to find an example—this is an old bug from a note I wrote to myself a while ago.)

Missing columns + incorrect date value (Good Tables report)

I forked this repository and slightly edited the datapackage.json to remove the schema references which now work differently (http://specs.frictionlessdata.io/tabular-data-resource/). Unfortunately, this involves literally copying the schema across two different resources. This is necessary to use the tools until they catch up to v1 of the specs and support JSON Pointers.

After that, I ran goodtables datapackage datapackage.json (https://github.com/frictionlessdata/goodtables-py), which tests the first 1000 rows. It looks like the [missing-value] errors result from these rows having too few columns (commas). Several rows look something like this:

a,b,c
1,2
3,4
5,6

as opposed to:

a,b,c
1,2,
3,4,
5,6,
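A quick way to find and repair such rows outside of goodtables is sketched below with the standard `csv` module. It pads short rows with empty strings and reports their row numbers, counting the header as row 1 (a choice made for this sketch, which may not match goodtables' numbering). The function name `pad_short_rows` is hypothetical.

```python
import csv
import io


def pad_short_rows(csv_text):
    """Pad rows that have fewer cells than the header.

    Returns (header, rows, short_row_numbers).
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    rows, short = [], []
    for number, row in enumerate(reader, start=2):  # header is row 1
        if len(row) < len(header):
            short.append(number)
            row = row + [""] * (len(header) - len(row))
        rows.append(row)
    return header, rows, short
```

Applied to the first example above, it returns rows such as `["1", "2", ""]` and flags rows 2, 3, and 4 as short.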

Finally, on row 991, there is an incorrectly formatted date: -0001-01-03.

There may be more errors like this. This is only for the first 1000 rows.
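To look for similar date problems beyond the first 1000 rows, a strict full-date check can be sketched with Python's `datetime.date.fromisoformat`. Note that this is stricter than the README's description of some fields: reduced-precision values such as a bare year are likely to be rejected as well, so it suits columns that are expected to hold complete dates. `is_full_iso_date` is a hypothetical name.

```python
from datetime import date


def is_full_iso_date(value):
    """Return True if value parses as a complete calendar date."""
    try:
        date.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True
```

With this check, `1990-10-17` passes and `-0001-01-03` is reported as invalid.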

$ goodtables datapackage datapackage.json

[6,29] [missing-value] Row 6 has a missing value in column 29
[7,29] [missing-value] Row 7 has a missing value in column 29
[8,29] [missing-value] Row 8 has a missing value in column 29
[9,29] [missing-value] Row 9 has a missing value in column 29
[10,29] [missing-value] Row 10 has a missing value in column 29
[11,29] [missing-value] Row 11 has a missing value in column 29
[12,29] [missing-value] Row 12 has a missing value in column 29
[13,29] [missing-value] Row 13 has a missing value in column 29
[14,29] [missing-value] Row 14 has a missing value in column 29
[15,29] [missing-value] Row 15 has a missing value in column 29
[33,29] [missing-value] Row 33 has a missing value in column 29
[150,29] [missing-value] Row 150 has a missing value in column 29
[165,29] [missing-value] Row 165 has a missing value in column 29
[339,29] [missing-value] Row 339 has a missing value in column 29
[356,29] [missing-value] Row 356 has a missing value in column 29
[358,29] [missing-value] Row 358 has a missing value in column 29
[366,29] [missing-value] Row 366 has a missing value in column 29
[370,29] [missing-value] Row 370 has a missing value in column 29
[372,29] [missing-value] Row 372 has a missing value in column 29
[373,29] [missing-value] Row 373 has a missing value in column 29
[405,29] [missing-value] Row 405 has a missing value in column 29
[412,29] [missing-value] Row 412 has a missing value in column 29
[413,29] [missing-value] Row 413 has a missing value in column 29
[414,29] [missing-value] Row 414 has a missing value in column 29
[415,29] [missing-value] Row 415 has a missing value in column 29
[416,29] [missing-value] Row 416 has a missing value in column 29
[417,29] [missing-value] Row 417 has a missing value in column 29
[418,29] [missing-value] Row 418 has a missing value in column 29
[422,29] [missing-value] Row 422 has a missing value in column 29
[424,29] [missing-value] Row 424 has a missing value in column 29
[441,29] [missing-value] Row 441 has a missing value in column 29
[443,29] [missing-value] Row 443 has a missing value in column 29
[444,29] [missing-value] Row 444 has a missing value in column 29
[445,29] [missing-value] Row 445 has a missing value in column 29
[448,29] [missing-value] Row 448 has a missing value in column 29
[458,29] [missing-value] Row 458 has a missing value in column 29
[459,29] [missing-value] Row 459 has a missing value in column 29
[479,29] [missing-value] Row 479 has a missing value in column 29
[480,29] [missing-value] Row 480 has a missing value in column 29
[481,29] [missing-value] Row 481 has a missing value in column 29
[482,29] [missing-value] Row 482 has a missing value in column 29
[483,29] [missing-value] Row 483 has a missing value in column 29
[484,29] [missing-value] Row 484 has a missing value in column 29
[488,29] [missing-value] Row 488 has a missing value in column 29
[492,29] [missing-value] Row 492 has a missing value in column 29
[503,29] [missing-value] Row 503 has a missing value in column 29
[527,29] [missing-value] Row 527 has a missing value in column 29
[529,29] [missing-value] Row 529 has a missing value in column 29
[546,29] [missing-value] Row 546 has a missing value in column 29
[550,29] [missing-value] Row 550 has a missing value in column 29
[552,29] [missing-value] Row 552 has a missing value in column 29
[557,29] [missing-value] Row 557 has a missing value in column 29
[565,29] [missing-value] Row 565 has a missing value in column 29
[572,29] [missing-value] Row 572 has a missing value in column 29
[574,29] [missing-value] Row 574 has a missing value in column 29
[644,29] [missing-value] Row 644 has a missing value in column 29
[645,29] [missing-value] Row 645 has a missing value in column 29
[648,29] [missing-value] Row 648 has a missing value in column 29
[652,29] [missing-value] Row 652 has a missing value in column 29
[655,29] [missing-value] Row 655 has a missing value in column 29
[661,29] [missing-value] Row 661 has a missing value in column 29
[667,29] [missing-value] Row 667 has a missing value in column 29
[670,29] [missing-value] Row 670 has a missing value in column 29
[671,29] [missing-value] Row 671 has a missing value in column 29
[729,29] [missing-value] Row 729 has a missing value in column 29
[730,29] [missing-value] Row 730 has a missing value in column 29
[754,29] [missing-value] Row 754 has a missing value in column 29
[764,29] [missing-value] Row 764 has a missing value in column 29
[766,29] [missing-value] Row 766 has a missing value in column 29
[767,29] [missing-value] Row 767 has a missing value in column 29
[769,29] [missing-value] Row 769 has a missing value in column 29
[773,29] [missing-value] Row 773 has a missing value in column 29
[776,29] [missing-value] Row 776 has a missing value in column 29
[786,29] [missing-value] Row 786 has a missing value in column 29
[837,29] [missing-value] Row 837 has a missing value in column 29
[841,29] [missing-value] Row 841 has a missing value in column 29
[842,29] [missing-value] Row 842 has a missing value in column 29
[844,29] [missing-value] Row 844 has a missing value in column 29
[884,29] [missing-value] Row 884 has a missing value in column 29
[891,29] [missing-value] Row 891 has a missing value in column 29
[892,29] [missing-value] Row 892 has a missing value in column 29
[893,29] [missing-value] Row 893 has a missing value in column 29
[898,29] [missing-value] Row 898 has a missing value in column 29
[899,29] [missing-value] Row 899 has a missing value in column 29
[904,29] [missing-value] Row 904 has a missing value in column 29
[905,29] [missing-value] Row 905 has a missing value in column 29
[918,29] [missing-value] Row 918 has a missing value in column 29
[951,29] [missing-value] Row 951 has a missing value in column 29
[957,29] [missing-value] Row 957 has a missing value in column 29
[960,29] [missing-value] Row 960 has a missing value in column 29
[961,29] [missing-value] Row 961 has a missing value in column 29
[962,29] [missing-value] Row 962 has a missing value in column 29
[965,29] [missing-value] Row 965 has a missing value in column 29
[968,29] [missing-value] Row 968 has a missing value in column 29
[969,29] [missing-value] Row 969 has a missing value in column 29
[973,29] [missing-value] Row 973 has a missing value in column 29
[974,29] [missing-value] Row 974 has a missing value in column 29
[986,29] [missing-value] Row 986 has a missing value in column 29
[987,29] [missing-value] Row 987 has a missing value in column 29
[988,29] [missing-value] Row 988 has a missing value in column 29
[989,29] [missing-value] Row 989 has a missing value in column 29
[991,3] [non-castable-value] Row 991 has non castable value -0001-01-03 in column 3 (type: date, format: default)
[1000,29] [missing-value] Row 1000 has a missing value in column 29
