Geospatial Data
How The Dataverse Software Ingests Shapefiles
A shapefile is a set of files, often uploaded or transferred in .zip format. This set may contain up to fifteen files. A minimum of three specific files (.shp, .shx, .dbf) are needed for a valid shapefile, and a fourth file (.prj) is required for some applications, and for any type of meaningful visualization.
For ingest, four files are the minimum required:
.shp - shape format; the feature geometry itself
.shx - shape index format; a positional index of the feature geometry that allows seeking forwards and backwards quickly
.dbf - attribute format; columnar attributes for each shape, in dBase IV format
.prj - projection format; the coordinate system and projection information, stored as a plain text file in well-known text (WKT) format
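The four-file requirement can be checked with a short sketch like the one below. It is only an illustration, not the Dataverse implementation: the function name `missing_required` is hypothetical, and the sketch assumes that extensions are matched exactly as written (lowercase).

```python
from pathlib import PurePosixPath

# The four extensions the text lists as the minimum for ingest.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}


def missing_required(filenames, base_name):
    """Return the required extensions that are absent for base_name, sorted.

    Hypothetical helper: assumes exact, case-sensitive extension matching.
    """
    present = {
        PurePosixPath(name).suffix
        for name in filenames
        if PurePosixPath(name).stem == base_name
    }
    return sorted(REQUIRED_EXTENSIONS - present)
```

An empty result means all four required files are present for that base name; anything else names the files that still need to be supplied.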
Ingest
When uploaded to a Dataverse installation, the .zip is unpacked (same as all .zip files). A shapefile set is recognized as a group of files that share the same base name and carry specific extensions. Of these, four are the minimum required (.shp, .shx, .dbf, .prj).
For example:
bicycles.shp (required extension)
bicycles.shx (required extension)
bicycles.prj (required extension)
bicycles.dbf (required extension)
bicycles.sbx (NOT required extension)
bicycles.sbn (NOT required extension)
Upon recognition of the four required files, the Dataverse installation will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set:
Required: .shp, .shx, .dbf, .prj
Optional: .sbn, .sbx, .fbn, .fbx, .ain, .aih, .ixs, .mxs, .atx, .cpg, .shp.xml
Then the Dataverse installation creates a new .zip whose mimetype identifies it as a shapefile. The shapefile set will persist as this new .zip.
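The grouping rule described above (same base name, recognized extension, all four required files present) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Dataverse code: the names `split_name` and `group_shapefile_sets` are hypothetical, and matching is assumed to be case-sensitive.

```python
from collections import defaultdict

REQUIRED = ("shp", "shx", "dbf", "prj")
OPTIONAL = ("sbn", "sbx", "fbn", "fbx", "ain", "aih",
            "ixs", "mxs", "atx", "cpg", "shp.xml")


def split_name(filename):
    """Return (base, ext) for a recognized shapefile extension, else (filename, None)."""
    for ext in REQUIRED + OPTIONAL:
        suffix = "." + ext
        if filename.endswith(suffix) and len(filename) > len(suffix):
            return filename[: -len(suffix)], ext
    return filename, None


def group_shapefile_sets(filenames):
    """Group files into shapefile sets keyed by base name.

    Returns (sets, leftovers). A base name only forms a set when all four
    required extensions are present; everything else is left as-is.
    """
    candidates = defaultdict(list)
    for name in filenames:
        base, ext = split_name(name)
        if ext is not None:
            candidates[base].append(name)

    sets = {}
    for base, members in candidates.items():
        extensions = {split_name(member)[1] for member in members}
        if all(required in extensions for required in REQUIRED):
            sets[base] = members

    grouped = {member for members in sets.values() for member in members}
    leftovers = [name for name in filenames if name not in grouped]
    return sets, leftovers
```

Files with an unrecognized extension, such as a .txt that happens to share the base name, end up in the leftovers rather than in a set.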
Example
1a. Original .zip contents:
A file named bikes_and_subways.zip is uploaded to the Dataverse installation. This .zip contains the following files.
bicycles.shp (shapefile set #1)
bicycles.shx (shapefile set #1)
bicycles.prj (shapefile set #1)
bicycles.dbf (shapefile set #1)
bicycles.sbx (shapefile set #1)
bicycles.sbn (shapefile set #1)
bicycles.txt
the_bikes.md
readme.txt
subway_line.shp (shapefile set #2)
subway_line.shx (shapefile set #2)
subway_line.prj (shapefile set #2)
subway_line.dbf (shapefile set #2)
1b. The Dataverse installation unzips and re-zips files:
Upon ingest, the Dataverse installation unpacks the file bikes_and_subways.zip. Upon recognizing the shapefile sets, it groups those files together into new .zip files:
files making up the “bicycles” shapefile become a new .zip
files making up the “subway_line” shapefile become a new .zip
remaining files will stay as they are
To ensure that a shapefile set remains intact, individual files such as bicycles.sbn are kept in the set, even though they are not used for mapping.
1c. The Dataverse installation final file listing:
bicycles.zip (contains shapefile set #1: bicycles.shp, bicycles.shx, bicycles.prj, bicycles.dbf, bicycles.sbx, bicycles.sbn)
bicycles.txt (separate, not part of a shapefile set)
the_bikes.md (separate, not part of a shapefile set)
readme.txt (separate, not part of a shapefile set)
subway_line.zip (contains shapefile set #2: subway_line.shp, subway_line.shx, subway_line.prj, subway_line.dbf)
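The unzip and re-zip steps of 1b and 1c can be approximated with Python's standard zipfile module. The sketch below is not how the Dataverse software does it internally; `repackage` is a hypothetical function, and the mapping from base name to member files is assumed to have been determined beforehand.

```python
import io
import zipfile


def repackage(upload_bytes, shapefile_sets):
    """Return {final_name: bytes} for an uploaded .zip.

    shapefile_sets maps a base name to the member file names of that set
    (an assumed input). Each set becomes a new "<base>.zip"; all other
    files in the upload are passed through unchanged.
    """
    result = {}
    grouped = {m for members in shapefile_sets.values() for m in members}
    with zipfile.ZipFile(io.BytesIO(upload_bytes)) as upload:
        # One new archive per shapefile set.
        for base, members in shapefile_sets.items():
            buffer = io.BytesIO()
            with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as new_zip:
                for member in members:
                    new_zip.writestr(member, upload.read(member))
            result[base + ".zip"] = buffer.getvalue()
        # Files outside any set stay as they are.
        for name in upload.namelist():
            if name not in grouped:
                result[name] = upload.read(name)
    return result
```

Applied to the example upload, the result holds five entries: two new archives and the three files that are not part of a shapefile set.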
For the two “final” shapefile sets, bicycles.zip and subway_line.zip, a new mimetype is used:
Mimetype: application/zipped-shapefile
Mimetype Label: “Shapefile as ZIP Archive”