Geospatial Data

How The Dataverse Software Ingests Shapefiles

A shapefile is a set of files, often uploaded or transferred in .zip format. The set may contain up to fifteen files. At least three specific files (.shp, .shx, .dbf) are needed for a valid shapefile, and a fourth file (.prj) is required by some applications and for any kind of meaningful visualization.

For ingest, four files are the minimum required:

  • .shp - shape format; the feature geometry itself
  • .shx - shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
  • .dbf - attribute format; columnar attributes for each shape, in dBase IV format
  • .prj - projection format; the coordinate system and projection information, a plain-text file describing the projection in well-known text (WKT) format
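
The minimum-file rule above can be sketched as a check over a list of file names. This is an illustrative sketch, not the Dataverse implementation: the helper name `missing_components` is hypothetical, and it assumes extensions are matched case-sensitively.

```python
from pathlib import PurePosixPath

# The four extensions listed above as the minimum required for ingest.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}


def missing_components(filenames, base_name):
    """Return the required extensions with no file named base_name + extension.

    Hypothetical helper for illustration only; matching is case-sensitive
    here, which is an assumption about real ingest behavior.
    """
    present = {
        PurePosixPath(name).suffix
        for name in filenames
        if PurePosixPath(name).stem == base_name
    }
    return sorted(REQUIRED_EXTENSIONS - present)


files = ["bicycles.shp", "bicycles.shx", "bicycles.dbf"]
print(missing_components(files, "bicycles"))  # ['.prj']
```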

Ingest

When a .zip file is uploaded to a Dataverse installation, it is unpacked, as all .zip files are. Shapefile sets are recognized as files that share the same base name and have specific extensions; together, these individual files constitute a shapefile set. Four of these extensions are the minimum required: .shp, .shx, .dbf, and .prj.

For example:

  • bicycles.shp (required extension)
  • bicycles.shx (required extension)
  • bicycles.prj (required extension)
  • bicycles.dbf (required extension)
  • bicycles.sbx (NOT required extension)
  • bicycles.sbn (NOT required extension)

Upon recognizing the four required files, the Dataverse installation groups them, along with any other relevant files, into a shapefile set. Files with these extensions are included in the shapefile set:

  • Required: .shp, .shx, .dbf, .prj
  • Optional: .sbn, .sbx, .fbn, .fbx, .ain, .aih, .ixs, .mxs, .atx, .cpg, .shp.xml
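
The recognition and grouping steps can be sketched in Python as follows. This is a hypothetical illustration, not code from the Dataverse Software: the function names are invented, matching is assumed to be case-sensitive, and a base name that lacks any of the four required files is assumed to leave its files ungrouped. Because .shp.xml is a two-part suffix, the sketch matches suffixes with `str.endswith` rather than splitting on the last dot.

```python
from collections import defaultdict

# Extensions listed in this section.
REQUIRED = (".shp", ".shx", ".dbf", ".prj")
OPTIONAL = (".sbn", ".sbx", ".fbn", ".fbx", ".ain", ".aih",
            ".ixs", ".mxs", ".atx", ".cpg", ".shp.xml")


def split_name(filename):
    """Return (base_name, extension) for a shapefile component, else None.

    endswith() is used so that "x.shp.xml" yields base "x"; splitting on
    the last dot would give "x.shp" and ".xml" instead.
    """
    for ext in REQUIRED + OPTIONAL:
        if filename.endswith(ext) and len(filename) > len(ext):
            return filename[: -len(ext)], ext
    return None


def group_shapefile_sets(filenames):
    """Hypothetical grouping step: returns (sets, remaining).

    `sets` maps a base name to its sorted member files; only base names
    with all four required extensions form a set (assumption). Every
    other file is returned unchanged in `remaining`.
    """
    candidates = defaultdict(list)
    remaining = []
    for name in filenames:
        parts = split_name(name)
        if parts is None:
            remaining.append(name)
        else:
            candidates[parts[0]].append(name)

    sets = {}
    for base, members in candidates.items():
        extensions = {split_name(m)[1] for m in members}
        if set(REQUIRED) <= extensions:
            sets[base] = sorted(members)
        else:
            remaining.extend(members)  # incomplete set: files left as they are
    return sets, remaining


files = ["bicycles.shp", "bicycles.shx", "bicycles.prj", "bicycles.dbf",
         "bicycles.sbx", "bicycles.sbn", "readme.txt"]
sets, remaining = group_shapefile_sets(files)
print(sets)
print(remaining)  # ['readme.txt']
```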

The Dataverse installation then creates a new .zip file for the set, with a shapefile mimetype. The shapefile set persists as this new .zip.
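
A minimal sketch of the re-zipping step, using Python's standard zipfile module and assuming the set's files are already held in memory. The function name `zip_shapefile_set` is hypothetical; naming the new archive after the set's base name follows the bicycles.zip and subway_line.zip example in this section, and the mimetype string is the one this section lists.

```python
import io
import zipfile

# Mimetype this section gives for a re-zipped shapefile set.
ZIPPED_SHAPEFILE_MIMETYPE = "application/zipped-shapefile"


def zip_shapefile_set(base_name, members):
    """Pack one shapefile set into a new in-memory .zip (hypothetical helper).

    `members` maps file names such as "bicycles.shp" to their bytes.
    Returns (new_file_name, zip_bytes, mimetype).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(members):
            zf.writestr(name, members[name])
    return f"{base_name}.zip", buffer.getvalue(), ZIPPED_SHAPEFILE_MIMETYPE


# Placeholder bytes stand in for real shapefile content.
members = {f"subway_line{ext}": b"..." for ext in (".shp", ".shx", ".prj", ".dbf")}
name, data, mimetype = zip_shapefile_set("subway_line", members)
print(name, mimetype)  # subway_line.zip application/zipped-shapefile
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.namelist())  # ['subway_line.dbf', 'subway_line.prj', 'subway_line.shp', 'subway_line.shx']
```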

Example

1a. Original .zip contents:

A file named bikes_and_subways.zip is uploaded to the Dataverse installation. This .zip contains the following files.

  • bicycles.shp (shapefile set #1)
  • bicycles.shx (shapefile set #1)
  • bicycles.prj (shapefile set #1)
  • bicycles.dbf (shapefile set #1)
  • bicycles.sbx (shapefile set #1)
  • bicycles.sbn (shapefile set #1)
  • bicycles.txt
  • the_bikes.md
  • readme.txt
  • subway_line.shp (shapefile set #2)
  • subway_line.shx (shapefile set #2)
  • subway_line.prj (shapefile set #2)
  • subway_line.dbf (shapefile set #2)

1b. The Dataverse installation unzips and re-zips files:

On ingest, the Dataverse installation unpacks bikes_and_subways.zip. When it recognizes the shapefile sets, it groups their files into new .zip files:

  • files making up the “bicycles” shapefile become a new .zip
  • files making up the “subway_line” shapefile become a new .zip
  • the remaining files stay as they are

To keep each shapefile set intact, individual files such as bicycles.sbn remain in the set, even though they are not used for mapping.

1c. The Dataverse installation final file listing:

  • bicycles.zip (contains shapefile set #1: bicycles.shp, bicycles.shx, bicycles.prj, bicycles.dbf, bicycles.sbx, bicycles.sbn)
  • bicycles.txt (separate, not part of a shapefile set)
  • the_bikes.md (separate, not part of a shapefile set)
  • readme.txt (separate, not part of a shapefile set)
  • subway_line.zip (contains shapefile set #2: subway_line.shp, subway_line.shx, subway_line.prj, subway_line.dbf)

For the two “final” shapefile sets, bicycles.zip and subway_line.zip, a new mimetype is used:

  • Mimetype: application/zipped-shapefile
  • Mimetype Label: “Shapefile as ZIP Archive”
