Contents:
A shapefile is a set of files, often uploaded/transferred in .zip format. This set may contain up to fifteen files. A minimum of three specific files (.shp, .shx, .dbf) are needed to be a valid shapefile and a fourth file (.prj) is required for WorldMap – or any type of meaningful visualization.
For ingest and connecting to WorldMap, four files are the minimum required:
When uploaded to Dataverse, the .zip is unpacked (same as all .zip files). Shapefile sets are recognized by the same base name and specific extensions. These individual files constitute a shapefile set. The first four are the minimum required (.shp, .shx, .dbf, .prj)
For example:
Upon recognition of the four required files, Dataverse will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set:
Then Dataverse creates a new .zip with mimetype as a shapefile. The shapefile set will persist as this new .zip.
1a. Original .zip contents:
A file named bikes_and_subways.zip is uploaded to the Dataverse. This .zip contains the following files.
1b. Dataverse unzips and re-zips files:
Upon ingest, Dataverse unpacks the file bikes_and_subways.zip. Upon recognizing the shapefile sets, it groups those files together into new .zip files:
To ensure that a shapefile set remains intact, individual files such as bicycles.sbn are kept in the set – even though they are not used for mapping.
1c. Dataverse final file listing:
For two “final” shapefile sets, bicycles.zip and subway_line.zip, a new mimetype is used:
WorldMap supplies target layers – or JoinTargets – that a tabular file may be mapped against. A JSON description of these CGA-curated JoinTargets may be retrieved via API at http://worldmap.harvard.edu/datatables/api/jointargets/. Please note: login is required. You may use any WorldMap account credentials via HTTP Basic Auth.
Example of JoinTarget information returned via the API:
{
"data":[
{
"layer":"geonode:census_tracts_2010_boston_6f6",
"name":"Census Tracts, Boston (GEOID10: State+County+Tract)",
"geocode_type_slug":"us-census-tract",
"geocode_type":"US Census Tract",
"attribute":{
"attribute":"CT_ID_10",
"type":"xsd:string"
},
"abstract":"As of the 2010 census, Boston, MA contains 7,288 city blocks [truncated for example]",
"title":"Census Tracts 2010, Boston (BARI)",
"expected_format":{
"expected_zero_padded_length":-1,
"is_zero_padded":false,
"description":"Concatenation of state, county and tract for 2010 Census Tracts. Reference: https://www.census.gov/geo/maps-data/data/tract_rel_layout.html\r\n\r\nNote: Across the US, this can be a zero-padded \"string\" but the original Boston layer has this column as \"numeric\" ",
"name":"2010 Census Boston GEOID10 (State+County+Tract)"
},
"year":2010,
"id":28
},
{
"layer":"geonode:addresses_2014_boston_1wr",
"name":"Addresses, Boston",
"geocode_type_slug":"boston-administrative-geography",
"geocode_type":"Boston, Administrative Geography",
"attribute":{
"attribute":"LocationID",
"type":"xsd:int"
},
"abstract":"Unique addresses present in the parcels data set, which itself is derived from [truncated for example]",
"title":"Addresses 2015, Boston (BARI)",
"expected_format":{
"expected_zero_padded_length":-1,
"is_zero_padded":false,
"description":"Boston, Administrative Geography, Boston Address Location ID. Example: 1, 2, 3...nearly 120000",
"name":"Boston Address Location ID (integer)"
},
"year":2015,
"id":18
},
{
"layer":"geonode:bra_neighborhood_statistical_areas_2012__ug9",
"name":"BRA Neighborhood Statistical Areas, Boston",
"geocode_type_slug":"boston-administrative-geography",
"geocode_type":"Boston, Administrative Geography",
"attribute":{
"attribute":"BOSNA_R_ID",
"type":"xsd:double"
},
"abstract":"BRA Neighborhood Statistical Areas 2015, Boston. Provided by [truncated for example]",
"title":"BRA Neighborhood Statistical Areas 2015, Boston (BARI)",
"expected_format":{
"expected_zero_padded_length":-1,
"is_zero_padded":false,
"description":"Boston, Administrative Geography, Boston BRA Neighborhood Statistical Area ID (integer). Examples: 1, 2, 3, ... 68, 69",
"name":"Boston BRA Neighborhood Statistical Area ID (integer)"
},
"year":2015,
"id":17
}
],
"success":true
}
When a user attempts to map a tabular file, the application looks in the Geoconnect database for JoinTargetInformation. If this information is more than 10 minutes* old, the application will retrieve fresh information and save it to the db.
(* Change the timing via the Django settings variable JOIN_TARGET_UPDATE_TIME.)
This JoinTarget info is used to populate HTML forms used to match a tabular file column to a JoinTarget column. Once a JoinTarget is chosen, the JoinTarget ID is an essential piece of information used to make an API call to the WorldMap and attempt to map the file.
The get_join_targets() function in dataverse_layer_services.py uses the WorldMap API, retrieves a list of available tabular file JointTargets. (See the dataverse_layer_services code in GitHub.)
The get_latest_jointarget_information() in utils.py retrieves recent JoinTarget Information from the database. (See the utils code in GitHub.)
Previous: Universal Numerical Fingerprint (UNF) | Next: SELinux