Lesson: Bulk Ingesting Vectors into Vector Services

To ingest a small number of vectors into Vector Services, the Write API is easier and faster. This lesson walks through bulk ingest, which is useful when you have a large number of vectors to ingest into Vector Services.

Decide on your index name

Elasticsearch lets us define an index template that sets the field mappings for vector items whenever it automatically creates an index. The template is applied when the name of the index being created matches an expression in the index template.

There are a few considerations when deciding on your index name.

  • The index name must not contain any upper-case letters.
  • The index name may include placeholders.
  • There is no default index for vector bulk ingest.
  • For Vector Services to pick up the index, its name must start with vector-. User-generated indices must additionally have user-provided- directly following vector-. For example, if the desired index name is bulk-test, the index name to specify is vector-user-provided-bulk-test. Contact [email protected] to request a different start to the index name.
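The naming rules above can be checked before any data is uploaded. The following is a minimal sketch in Python; build_index_name and REQUIRED_PREFIX are hypothetical names chosen for illustration and are not part of Vector Services. It covers only the upper-case and prefix rules listed above, and it assumes the default vector-user-provided- prefix rather than a specially requested one.

```python
# Hypothetical helper: not part of Vector Services, for illustration only.
REQUIRED_PREFIX = "vector-user-provided-"


def build_index_name(desired: str) -> str:
    """Return the index name to specify for a user-generated index."""
    if not desired:
        raise ValueError("index name must not be empty")
    # Rule: the index name must not contain any upper-case letters.
    if any(ch.isupper() for ch in desired):
        raise ValueError("index name must not contain upper-case letters")
    # Rule: user-generated indices must start with vector-user-provided-.
    if desired.startswith(REQUIRED_PREFIX):
        return desired
    return REQUIRED_PREFIX + desired


print(build_index_name("bulk-test"))  # vector-user-provided-bulk-test
```

Placeholders such as {ingest_date} pass through unchanged, since the sketch only inspects letter case and the prefix.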

See Vector Services Elasticsearch Index Name Templates for full details on ways to structure an index name.

Set up your ingest configuration file

So that ingested vectors can be easily retrieved, a configuration file must be included alongside the vectors in S3. The configuration file defines information needed by the ingestion process and lets users define some standard fields and default values. It can also map access to vectors. Note: Vector access can also be altered after ingest.

There are a few required definitions in the configuration file. Without these definitions, the ingest fails with an error.

  • vector coordinate reference system (CRS of the vectors being ingested, such as EPSG:4326)
  • vector index (index that vectors will be ingested into)
  • user token (GBDX token of user ingesting vectors)
  • vector file type (may be shapefile or geojson, not case sensitive)

A configuration file with only the required definitions may look like:

vector.fileType=GEOJSON
vector.index=vector-user-provided-bulk-test-{ingest_date}
vector.userToken={valid GBDX auth token}
vector.crs=EPSG:4326

Other default values that can be defined for vectors include:

  • ingest source
  • item type
  • vector access (what GBDX accounts are allowed to view the vectors)

A configuration file including all of these definitions might look like:

vector.fileType=GEOJSON
vector.index=vector-user-provided-bulk-test-{ingest_date}
vector.userToken={valid GBDX auth token}
vector.crs=EPSG:4326
vector.ingestSource=Test Source
vector.itemType=Test Type
vector.access.group={gbdx_account_1},{gbdx_account_2}

For a complete list of fields that can be mapped in the mapping file, see Ingesting Shapefiles via S3 Buckets.

Once the configuration file is set up to handle the vector-specific needs, save the file as mapping.properties and include it with the vectors to be ingested.
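If the configuration file is generated by a script, the required definitions can be checked before the file is written. The sketch below assumes the key names shown in the examples above; render_mapping and REQUIRED_KEYS are hypothetical names, and the token value is a placeholder that must be replaced with a valid GBDX auth token.

```python
# Hypothetical helper: renders key=value lines in the style of mapping.properties.
REQUIRED_KEYS = (
    "vector.fileType",
    "vector.index",
    "vector.userToken",
    "vector.crs",
)


def render_mapping(settings: dict) -> str:
    """Return the configuration text, failing early if a required key is missing."""
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError("missing required definitions: " + ", ".join(missing))
    return "".join(f"{key}={value}\n" for key, value in settings.items())


settings = {
    "vector.fileType": "GEOJSON",
    "vector.index": "vector-user-provided-bulk-test-{ingest_date}",
    "vector.userToken": "REPLACE_WITH_VALID_GBDX_TOKEN",  # placeholder, not a real token
    "vector.crs": "EPSG:4326",
    "vector.ingestSource": "Test Source",
    "vector.itemType": "Test Type",
}

text = render_mapping(settings)
print(text)
```

The returned text can then be saved as mapping.properties, for example with pathlib.Path("mapping.properties").write_text(text).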

Bundle all vectors and the configuration file into a zip file. The zip file needs the configuration file at the root, and the data to be ingested needs to be in a directory named items.

Note: Bulk ingest does not handle zip files nested inside the zip file. The input data must be single-zipped. Bulk ingest currently supports shapefiles and GeoJSON/JSON files within the zip file.
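The bundle layout described above can be produced with Python's standard zipfile module. This is a sketch under stated assumptions: build_bundle is a hypothetical helper, the sample file names are made up for the demonstration, and the check for nested zips only looks at the file extension.

```python
import tempfile
import zipfile
from pathlib import Path


def build_bundle(config_path, item_paths, zip_path):
    """Write a single-level zip: mapping.properties at the root, data under items/."""
    items = [Path(p) for p in item_paths]
    # Bulk ingest does not handle zips inside the zip, so reject them up front.
    for item in items:
        if item.suffix.lower() == ".zip":
            raise ValueError(f"nested zip files are not supported: {item.name}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.write(config_path, arcname="mapping.properties")
        for item in items:
            bundle.write(item, arcname=f"items/{item.name}")
    return zip_path


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    config = tmp_dir / "mapping.properties"
    config.write_text("vector.fileType=GEOJSON\n")
    sample = tmp_dir / "sample.geojson"
    sample.write_text('{"type": "FeatureCollection", "features": []}')
    bundle_path = build_bundle(config, [sample], tmp_dir / "bundle.zip")
    with zipfile.ZipFile(bundle_path) as bundle:
        names = bundle.namelist()

print(names)  # ['mapping.properties', 'items/sample.geojson']
```

For shapefiles, pass every component file (.shp, .shx, .dbf, and so on) in item_paths so that they all land in the items directory.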

Load vector data into S3

In order to ingest vectors, Vector Services first needs to be able to reach the vectors. To do this, upload the vectors into the S3 location that the ingest process watches. Not all users have access to this S3 location; contact [email protected] for assistance.

Vector Bulk Ingest

Once the zip file is in the appropriate S3 bucket, Vector Bulk Ingest will automatically pick it up and begin ingest.