
Ingesting Shapefiles via S3 Buckets


Ingesting shapefiles via S3 buckets is relatively straightforward: drop a ZIP archive containing all the necessary shapefile files into the S3 bucket. Some shapefiles need preparation first, in particular to tell the ingestion process how to map shapefile data to vector item fields.

<h2>Shapefile mapping file locations</h2>

The shapefile ingestion process requires a file that describes how to map shapefile columns to vector index fields. That file can be located in one of three places:

 - internal to the ZIP archive
 - at the root of the S3 bucket
 - in a database on the processing system

If mappings for a bucket exist in more than one place, the mapping inside the ZIP archive takes precedence over the mapping in the S3 bucket, which in turn takes precedence over the mapping in the database.

<h2>Internal mapping file</h2>

Each ZIP archive for shapefiles can contain a file called "mapping.properties". If the ingestion process finds a file with that name in the ZIP archive, it uses that file to map shapefile columns to vector item fields.

<h2>S3 bucket root</h2>

If the ingestion process does not find the mapping file in the ZIP archive, it looks for the mapping file at the root of the S3 bucket. The file there should also be named "mapping.properties".

<h2>Database entry</h2>

If a mapping file is found neither in the ZIP archive nor in the S3 bucket, the ingestion process looks up the bucket name in a database. If it finds an entry with a mapping for the bucket, it uses that mapping. If it doesn't, the shapefile is not processed, because there is nowhere else to look for a mapping automatically.

<h2>Mapping file format</h2>

A field mapping file defines information that the ingestion process needs to handle the shapefile, along with some standard fields and default values to use.
Note that all columns from a shapefile entry are also automatically included in the 'attributes' map of a vector item.

A field mapping file must include an entry defining the coordinate reference system of the entries in the shapefile. For example:
<pre><code>vector.crs=EPSG:4326</code></pre>

A few other default values can be defined for all items in the shapefile:
<pre><code>vector.ingestSource={default-source}
vector.itemType={default-type}</code></pre>

For example:
<pre><code>vector.ingestSource=Tomnod
vector.itemType=Nepal Earthquake</code></pre>

The index to which the items in the shapefile are written can also be specified, using the vector item index name template format (described here: [Vector Services Elasticsearch Index Name Templates](doc:vs-elasticsearch-index-name-templates)). To specify a particular index, include the 'vector.index' property. For example:
<pre><code>vector.index=vector-shapefile-{geohash}-{item_date}</code></pre>

If the property is omitted when ingesting from S3, the index name template defaults to:
<pre><code>vector-shapefile-ingest-{date}</code></pre>

Finally, shapefile columns can be mapped directly to vector item fields by adding a line with the shapefile column name, an '=', and the vector item field. The vector item fields that can be mapped are as follows:

 - item_date
 - name
 - item_type
 - text
 - source

For all of the above fields except 'text', the value is taken directly from the column value and converted to the proper type (e.g. an 'item_date' field is converted to a Date). For 'text', the values of all columns mapped to the 'text' field are appended to a single pipe-delimited string. For example, if a shapefile includes columns named "fieldOne" and "fieldTwo", an entry has the corresponding values "valueOne" and "valueTwo", and both columns are mapped to the 'text' field, the final value is "valueOne | valueTwo".

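The column mapping rules above can be illustrated with a minimal Python sketch. It is not the ingestion service's actual code: the function name `apply_column_mappings`, the dictionary-based record, the ISO 8601 date parsing, and the order in which 'text' values are joined are all assumptions made for illustration.

```python
from datetime import date

# Vector item fields listed above as valid mapping targets.
MAPPABLE_FIELDS = {"item_date", "name", "item_type", "text", "source"}


def apply_column_mappings(record, column_mappings):
    """Build vector item fields from one shapefile entry (hypothetical helper).

    record          -- dict of shapefile column name -> value
    column_mappings -- dict of shapefile column name -> vector item field
    """
    # Every column is also copied into the 'attributes' map.
    item = {"attributes": dict(record)}
    text_parts = []
    for column, field in column_mappings.items():
        if field not in MAPPABLE_FIELDS:
            raise ValueError(f"unsupported vector item field: {field}")
        if column not in record:
            continue  # assumption: columns missing from the entry are skipped
        value = record[column]
        if field == "text":
            # All columns mapped to 'text' are collected and joined below.
            text_parts.append(str(value))
        elif field == "item_date":
            # Assumption: the column holds an ISO 8601 date string.
            item[field] = date.fromisoformat(value)
        else:
            item[field] = str(value)
    if text_parts:
        # Assumption: values are joined in mapping order.
        item["text"] = " | ".join(text_parts)
    return item


entry = {"fieldOne": "valueOne", "fieldTwo": "valueTwo"}
item = apply_column_mappings(entry, {"fieldOne": "text", "fieldTwo": "text"})
print(item["text"])
```

With both columns mapped to 'text', the sketch produces the same pipe-delimited string as the example in the preceding paragraph.
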
A complete mapping file for a shapefile that contains at least the columns "tagger_id" and "id" might look as follows:
<pre><code>vector.crs=EPSG:4326
vector.ingestSource=Tomnod
vector.itemType=Nepal Earthquake
vector.index=vector-tomnod-nepal-{ingest_date}
tagger_id=source
id=name</code></pre>
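
Before uploading, it can help to sanity-check a mapping file against the rules described in this guide. The following Python sketch is a simplified, hypothetical checker, not the parser used by the ingestion process: it only handles plain `key=value` lines and `#` comments rather than the full Java properties syntax, and the name `parse_mapping` is an assumption.

```python
# Default index name template documented above for S3 ingestion.
DEFAULT_INDEX_TEMPLATE = "vector-shapefile-ingest-{date}"

# Vector item fields that shapefile columns may be mapped to.
MAPPABLE_FIELDS = {"item_date", "name", "item_type", "text", "source"}


def parse_mapping(text):
    """Split mapping file text into 'vector.*' settings and column mappings.

    Hypothetical helper: returns (settings, column_mappings) or raises
    ValueError when a rule from this guide is violated.
    """
    settings = {}
    column_mappings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got: {line!r}")
        key, value = key.strip(), value.strip()
        if key.startswith("vector."):
            settings[key] = value
        elif value in MAPPABLE_FIELDS:
            column_mappings[key] = value
        else:
            raise ValueError(f"column {key!r} maps to unknown field {value!r}")
    # The coordinate reference system entry is mandatory.
    if "vector.crs" not in settings:
        raise ValueError("mapping must define vector.crs")
    # Fall back to the documented default index name template.
    settings.setdefault("vector.index", DEFAULT_INDEX_TEMPLATE)
    return settings, column_mappings


EXAMPLE = """\
vector.crs=EPSG:4326
vector.ingestSource=Tomnod
vector.itemType=Nepal Earthquake
vector.index=vector-tomnod-nepal-{ingest_date}
tagger_id=source
id=name
"""

settings, column_mappings = parse_mapping(EXAMPLE)
print(settings["vector.index"], column_mappings)
```

The sketch treats any key beginning with `vector.` as a setting and every other key as a shapefile column name, which matches the examples in this guide but may not cover every property the service accepts.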