Defining Data

What is data? Or should that question be “What are data?” It’s a bit of a cliché to begin discussing a complex concept by turning to Webster’s, but if ever there was a case for it, this is it:


Data leads a life of its own quite independent of datum, of which it was originally the plural. It occurs in two constructions: as a plural noun (like earnings), taking a plural verb and plural modifiers (such as these, many, a few) but not cardinal numbers, and serving as a referent for plural pronouns (such as they, them); and as an abstract mass noun (like information), taking a singular verb and singular modifiers (such as this, much, little), and being referred to by a singular pronoun (it). Both constructions are standard. The plural construction is more common in print, evidently because the house style of several publishers mandates it.

Long story short: there’s no right answer. In English, you’ll encounter data sometimes as a singular (mass) noun and sometimes as a plural one. Occasionally context seems to make one or the other choice more natural. Unless you’re writing for publication, it’s all right to go back and forth between them.

However you construe data in English, it’s useful to remember that, as we’ve just seen, in Latin it’s the plural of datum, a “(thing) given.” Thus, one way to understand data is as things. Perhaps because data plays such a large role in the natural and social sciences, there’s a tendency to associate data with one type of thing: numbers. According to, data enters the English language in the 17th century, around the time of the Western scientific revolution. From the 1890s it takes on the now familiar sense of “numerical facts collected for future reference”—perhaps (since the 1960s), in a computer database.

But the world is full of things that aren’t numerical. These things can be data, too. Geographical locations can be treated as data. So can proper names and street addresses. So can the words in a dictionary or a poem or a novel.
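To make the idea concrete, here is a minimal Python sketch that treats the words of a text as data by counting how often each one occurs. The `word_counts` function and the sample line of verse are hypothetical, invented purely for illustration. The tokenizing rule is a simplifying assumption: it lowercases the text and keeps only runs of ASCII letters and apostrophes, so punctuation is dropped.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Treat the words of a text as data: lowercase them and count each one."""
    # Simplifying assumption: a "word" is a run of ASCII letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# A made-up line of verse, used only as an example.
line = "The rain falls, and the rain keeps falling on the roof."
counts = word_counts(line)
print(counts.most_common(2))  # → [('the', 3), ('rain', 2)]
```

Nothing here is numerical until we choose to count; the words themselves are the data, and the counts are one way of looking at them.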

Research data

The quotations below offer a variety of perspectives on research data in particular.

Material or information on which an argument, theory, test or hypothesis, or another research output is based.

Queensland University of Technology. Manual of Procedures and Policies. Section 2.8.3.

What constitutes such data will be determined by the community of interest through the process of peer review and program management. This may include, but is not limited to: data, publications, samples, physical collections, software and models.

Marieke Guy

Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues.

OMB-110, Subpart C, section 36, (d) (i)

The short answer is that we can’t always trust empirical measures at face value: data is always biased, measurements always contain errors, systems always have confounders, and people always make assumptions.

Angela Bassa

Broadly, we can understand research data as materials or information necessary to come to a conclusion. What constitutes such materials and information will depend on the project in question.

Forms of data

Just as there are many possible sources of data, there are many ways to represent data. Here are some examples:

  • Non-digital text (lab books, field notebooks)
  • Digital texts or digital copies of text
  • Statistical analyses (e.g. SPSS, SAS, R)
  • Scientific sample collections
  • Data visualizations
  • Computer code
  • Standard operating procedures and protocols
  • Protein or genetic sequences
  • Artistic products
  • Curriculum materials (e.g. course syllabi)
  • Spreadsheets (e.g. .xlsx, .numbers, .csv)
  • Audio (e.g. .mp3, .wav, .aac)
  • Video (e.g. .mov, .mp4)
  • Computer-aided design (CAD) files (e.g. .cad)
  • Databases (e.g. .sql)
  • Geographic Information Systems (GIS) and spatial data (e.g. .shp, .dbf, .shx)
  • Digital copies of images (e.g. .png, .jpeg, .tiff)
  • Web files (e.g. .html, .asp, .php)
  • MATLAB files (e.g. .m, .mat) & 3D models (e.g. .stl, .dae, .3ds)
  • Metadata & Paradata (e.g. .xml, .json)
  • Collections of digital objects acquired and generated during research
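Many of the forms listed above can be recognized, at least roughly, by their file extensions. The Python sketch below illustrates this with a lookup table built from a subset of the extensions in the list. The `FORMS` table, the `classify` function, and the sample filenames are hypothetical. Treat it as a heuristic, not a reliable classifier: extensions can be missing, wrong, or shared across formats (.dbf, for instance, is also a dBASE table format).

```python
from pathlib import Path

# Illustrative, non-exhaustive mapping from file extension to form of data.
FORMS = {
    ".xlsx": "Spreadsheet", ".numbers": "Spreadsheet", ".csv": "Spreadsheet",
    ".mp3": "Audio", ".wav": "Audio", ".aac": "Audio",
    ".mov": "Video", ".mp4": "Video",
    ".sql": "Database",
    ".shp": "GIS/spatial", ".dbf": "GIS/spatial", ".shx": "GIS/spatial",
    ".png": "Image", ".jpeg": "Image", ".tiff": "Image",
    ".html": "Web file", ".asp": "Web file", ".php": "Web file",
    ".stl": "3D model", ".dae": "3D model", ".3ds": "3D model",
    ".xml": "Metadata/paradata", ".json": "Metadata/paradata",
}


def classify(filename: str) -> str:
    """Guess a file's form of data from its extension, ignoring case."""
    return FORMS.get(Path(filename).suffix.lower(), "Unknown")


for name in ["survey_2023.csv", "interview01.WAV", "site_boundary.shp", "notes.docx"]:
    print(name, "->", classify(name))
```

Files whose extension is not in the table come back as "Unknown", which is a reminder that many research materials (lab notebooks, sample collections, artistic products) have no file extension at all.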