A shapefile is a set of files, often uploaded or transferred in .zip format. The set may contain up to fifteen files. At least three specific files (.shp, .shx, .dbf) are needed to form a valid shapefile, and a fourth file (.prj) is required for WorldMap, or for any type of meaningful visualization.
For ingest and connecting to WorldMap, four files are the minimum required: .shp, .shx, .dbf, and .prj.
When a .zip is uploaded to Dataverse, it is unpacked (as all .zip files are). A shapefile set is recognized as a group of files that share the same base name and have specific extensions.
For example:
Upon recognition of the four required files, Dataverse will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set:
Dataverse then creates a new .zip whose mimetype identifies it as a shapefile. The shapefile set persists as this new .zip.
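The grouping step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Dataverse's actual implementation: the function name `group_shapefile_sets` is hypothetical, and the optional extension set is illustrative only (`.sbn` appears in this guide's example; `.sbx` and `.cpg` are assumed).

```python
import os
from collections import defaultdict

# The four extensions this guide names as the required minimum.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}
# Illustrative only: .sbn appears in this guide; .sbx and .cpg are assumptions.
OPTIONAL_EXTENSIONS = {".sbn", ".sbx", ".cpg"}


def group_shapefile_sets(filenames):
    """Split unpacked file names into shapefile sets and leftover files.

    Returns (sets, leftovers), where sets maps a base name to the files
    belonging to that shapefile set.
    """
    # Bucket files by base name, remembering each file's lowercased extension.
    by_base = defaultdict(list)
    for name in filenames:
        base, ext = os.path.splitext(name)
        by_base[base].append((ext.lower(), name))

    sets = {}
    leftovers = []
    for base, members in by_base.items():
        extensions = {ext for ext, _ in members}
        # A base name forms a set only if all four required files are present.
        is_set = REQUIRED_EXTENSIONS <= extensions
        for ext, name in members:
            if is_set and ext in REQUIRED_EXTENSIONS | OPTIONAL_EXTENSIONS:
                sets.setdefault(base, []).append(name)
            else:
                leftovers.append(name)
    return sets, leftovers
```

Each entry in `sets` would then be written to its own new .zip (for example with Python's `zipfile` module), while the leftovers remain individual files.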
1a. Original .zip contents:
A file named bikes_and_subways.zip is uploaded to the Dataverse. This .zip contains the following files.
1b. Dataverse unzips and re-zips files:
Upon ingest, Dataverse unpacks the file bikes_and_subways.zip. Upon recognizing the shapefile sets, it groups those files together into new .zip files:
To ensure that a shapefile set remains intact, individual files such as bicycles.sbn are kept in the set – even though they are not used for mapping.
1c. Dataverse final file listing:
For two “final” shapefile sets, bicycles.zip and subway_line.zip, a new mimetype is used:
WorldMap supplies target layers – or JoinTargets – that a tabular file may be mapped against. A JSON description of these CGA-curated JoinTargets may be retrieved via API at http://worldmap.harvard.edu/datatables/api/jointargets/. Please note: login is required. You may use any WorldMap account credentials via HTTP Basic Auth.
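As a rough sketch of such a request, the following uses only the Python standard library to attach HTTP Basic Auth credentials to the JoinTargets URL. The function names are hypothetical, and the assumption that the response body is UTF-8 encoded JSON is not confirmed by this guide.

```python
import base64
import json
import urllib.request

JOIN_TARGETS_URL = "http://worldmap.harvard.edu/datatables/api/jointargets/"


def build_join_targets_request(username, password, url=JOIN_TARGETS_URL):
    """Build a GET request carrying an HTTP Basic Auth header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_join_targets(username, password, timeout=30):
    """Send the request and decode the JSON body (assumed to be UTF-8)."""
    request = build_join_targets_request(username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_join_targets("my_user", "my_password")` with valid WorldMap credentials would return the parsed JSON as a Python dictionary.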
Example of JoinTarget information returned via the API:
{
      "data":[
        {
          "layer":"geonode:census_tracts_2010_boston_6f6",
          "name":"Census Tracts, Boston (GEOID10: State+County+Tract)",
          "geocode_type_slug":"us-census-tract",
          "geocode_type":"US Census Tract",
          "attribute":{
            "attribute":"CT_ID_10",
            "type":"xsd:string"
          },
          "abstract":"As of the 2010 census, Boston, MA contains 7,288 city blocks [truncated for example]",
          "title":"Census Tracts 2010, Boston (BARI)",
          "expected_format":{
            "expected_zero_padded_length":-1,
            "is_zero_padded":false,
            "description":"Concatenation of state, county and tract for 2010 Census Tracts.  Reference: https://www.census.gov/geo/maps-data/data/tract_rel_layout.html\r\n\r\nNote:  Across the US, this can be a zero-padded \"string\" but the original Boston layer has this column as \"numeric\" ",
            "name":"2010 Census Boston GEOID10 (State+County+Tract)"
          },
          "year":2010,
          "id":28
        },
        {
          "layer":"geonode:addresses_2014_boston_1wr",
          "name":"Addresses, Boston",
          "geocode_type_slug":"boston-administrative-geography",
          "geocode_type":"Boston, Administrative Geography",
          "attribute":{
            "attribute":"LocationID",
            "type":"xsd:int"
          },
          "abstract":"Unique addresses present in the parcels data set, which itself is derived from [truncated for example]",
          "title":"Addresses 2015, Boston (BARI)",
          "expected_format":{
            "expected_zero_padded_length":-1,
            "is_zero_padded":false,
            "description":"Boston, Administrative Geography, Boston Address Location ID.  Example: 1, 2, 3...nearly 120000",
            "name":"Boston Address Location ID (integer)"
          },
          "year":2015,
          "id":18
        },
        {
          "layer":"geonode:bra_neighborhood_statistical_areas_2012__ug9",
          "name":"BRA Neighborhood Statistical Areas, Boston",
          "geocode_type_slug":"boston-administrative-geography",
          "geocode_type":"Boston, Administrative Geography",
          "attribute":{
            "attribute":"BOSNA_R_ID",
            "type":"xsd:double"
          },
          "abstract":"BRA Neighborhood Statistical Areas 2015, Boston. Provided by [truncated for example]",
          "title":"BRA Neighborhood Statistical Areas 2015, Boston (BARI)",
          "expected_format":{
            "expected_zero_padded_length":-1,
            "is_zero_padded":false,
            "description":"Boston, Administrative Geography, Boston BRA Neighborhood Statistical Area ID (integer).  Examples: 1, 2, 3, ... 68, 69",
            "name":"Boston BRA Neighborhood Statistical Area ID (integer)"
          },
          "year":2015,
          "id":17
        }
      ],
      "success":true
}
When a user attempts to map a tabular file, the application looks in the Geoconnect database for JoinTargetInformation. If this information is more than 10 minutes* old, the application retrieves fresh information and saves it to the database.
(* The interval can be changed via the Django settings variable JOIN_TARGET_UPDATE_TIME.)
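The freshness check can be illustrated with a short sketch. It is not the Geoconnect code: `needs_refresh` and `MAX_AGE` are hypothetical names, and the unit in which JOIN_TARGET_UPDATE_TIME is expressed is not stated here, so the sketch uses its own `timedelta` of 10 minutes.

```python
from datetime import datetime, timedelta, timezone

# Assumed default matching the 10-minute interval described above.
MAX_AGE = timedelta(minutes=10)


def needs_refresh(last_retrieved, now=None, max_age=MAX_AGE):
    """Return True when stored JoinTarget info is missing or older than max_age."""
    if last_retrieved is None:
        # Nothing stored yet, so the information must be fetched.
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_retrieved > max_age
```

When `needs_refresh` returns True, the application would call the WorldMap API again and store the new result with the current timestamp.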
This JoinTarget info is used to populate HTML forms used to match a tabular file column to a JoinTarget column. Once a JoinTarget is chosen, the JoinTarget ID is an essential piece of information used to make an API call to the WorldMap and attempt to map the file.
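To show how the API response might feed a form, the sketch below turns the JSON structure from the example above into `(id, label)` pairs, the shape commonly used for choice fields in Django forms. The function name and the label format are assumptions made for illustration.

```python
def join_target_choices(api_response):
    """Convert a JoinTargets API response into (id, label) pairs.

    Expects the structure shown in the example response: a "success" flag
    and a "data" list whose items carry "id", "name", and "year".
    """
    if not api_response.get("success"):
        return []
    choices = []
    for target in api_response.get("data", []):
        # The label format is an illustrative choice, not Geoconnect's.
        label = f'{target["name"]} ({target["year"]})'
        choices.append((target["id"], label))
    return choices
```

The first element of each pair is the JoinTarget ID, which is the value that would be sent back to WorldMap once the user picks a target.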
The get_join_targets() function in dataverse_layer_services.py uses the WorldMap API to retrieve a list of available tabular file JoinTargets. (See the dataverse_layer_services code in GitHub.)
The get_latest_jointarget_information() function in utils.py retrieves recent JoinTarget information from the database. (See the utils code in GitHub.)