GBDX

Vector Query Syntax, Query Fields, and Type Suffixes

Syntax information for Aggregation and Vector Services calls

Query Syntax

Vector querying uses Elasticsearch query syntax. For a full guide to Elasticsearch syntax, visit Elasticsearch's syntax documentation.

Things to note:

  • Vector Services provides two representations of our vector data: GeoJSON and ESRI JSON. Our standard representation is GeoJSON, but we also provide ESRI-specific endpoints for users that require that format.

  • When querying on a phrase, for instance "Non-violent protest", the phrase must be wrapped in double quotation marks so that the search matches the whole phrase and not its individual words. For example, querying for the item type "Non-violent protest" would be
    item_type:"Non-violent protest"
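
The quoting rule above can be sketched in Python. This is only an illustration that builds the query string; the helper name field_term is hypothetical and not part of Vector Services.

```python
def field_term(field: str, value: str) -> str:
    """Build a field:value clause, quoting the value when it is a phrase."""
    # A value containing whitespace is a phrase; without double quotes
    # each word would be searched separately.
    if any(ch.isspace() for ch in value):
        return f'{field}:"{value}"'
    return f"{field}:{value}"


print(field_term("item_type", "Non-violent protest"))
print(field_term("item_type", "Road"))
```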

Booleans

Case matters in boolean searches: capitalize the boolean operators. Querying on item_type:Road AND item_date:2017-01-13 searches Elasticsearch (ES) for docs containing both the specified item_type and item_date, and returns only docs with the item_type of Road from the date of 2017-01-13. If, instead, the query is item_type:Road and item_date:2017-01-13, ES searches for docs containing either the item_type of Road, the word and, or the item_date of 2017-01-13. In other words, item_type:Road and item_date:2017-01-13 is translated as item_type:Road OR and OR item_date:2017-01-13.

Note: Case matters inside parentheses as well. item_type:(1BProduct and WV03_VNIR) and item_type:(1BProduct AND WV03_VNIR) follow the same pattern: the former breaks out into a search for item_type:1BProduct OR item_type:and OR item_type:WV03_VNIR, while the latter breaks out into a search for item_type:1BProduct AND item_type:WV03_VNIR.
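
One way to avoid the lowercase-operator pitfall is to build queries with a small helper that always emits capitalized operators. The following Python is a minimal sketch; the function names combine and group are hypothetical, and the code only assembles query strings.

```python
def _operator(name: str) -> str:
    """Return the capitalized boolean operator, rejecting anything else."""
    op = name.upper()
    if op not in {"AND", "OR"}:
        raise ValueError(f"unsupported boolean operator: {name!r}")
    return op


def combine(operator: str, *clauses: str) -> str:
    """Join complete clauses, e.g. item_type:Road AND item_date:2017-01-13."""
    return f" {_operator(operator)} ".join(clauses)


def group(field: str, operator: str, *values: str) -> str:
    """Join several values for one field inside parentheses."""
    return f"{field}:({f' {_operator(operator)} '.join(values)})"


print(combine("and", "item_type:Road", "item_date:2017-01-13"))
print(group("item_type", "and", "1BProduct", "WV03_VNIR"))
```

Even when the caller passes "and" in lowercase, the generated query uses AND, so the word is never treated as a search term.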

The Different Types

In the JSON of a given vector, there are multiple "types." One type is grouped under the geometry section of the vector data.

   {
    "geometry": {
      "coordinates": [
        [
          [
            37.51445220776068,
            55.89066246220917
          ],
          [
            37.51445220776068,
            55.89101403267135
          ],
          [
            37.51510527269151,
            55.89101403267135
          ],
          [
            37.51510527269151,
            55.89066246220917
          ],
          [
            37.51445220776068,
            55.89066246220917
          ]
        ]
      ],
      "type": "Polygon"
    },

Although a user cannot query on that field directly, they can query on the corresponding synthetic field geom_type to find results of a specific geometry, for instance
geom_type:Polygon

Another type is the type of the vector item, listed as item_type under the properties section of the vector data.

      "properties": {
      "ingest_source": "ObjectDetection",
      "access": {
        "groups": [
          "_ALL_"
        ],
        "users": [
          "_ALL_"
        ]
      },
      "item_date": "2015-01-06T08:26:39Z",
      "original_crs": "EPSG:4326",
      "item_type": [
        "Airliner"
      ],

This type is queried as the item_type if the user is looking for results of a specific type of vector item. Generally speaking, a user querying on the type of vector item will get fewer - but more specific - results than a user querying on the type of geometry.
item_type:Airliner

Finally, all of the data in Vector Services is represented as GeoJSON "Feature" objects. An object's class is specified with a "type" attribute in the GeoJSON representation, so there will always be a top-level "type" field with a value of "Feature".

 [
  {
    "type": "Feature",

Property Fields

The following is a list of common property fields in the Vector Services. These fields exist for every properly ingested Vector Services vector item, and each of these fields may be used, separately or in combination, to construct more detailed and specific queries:

  • attributes
  • format
  • geom_type
  • ingest_attributes
  • ingest_date
  • ingest_source
  • item_date
  • item_type
  • name
  • source
  • text

The geom_type is automatically generated based on the item geometry. For a list of the geom_types that Vector Services handles, see Elasticsearch's GeoJSON Type (note that Vector Services works with the GeoJSON Type in the table, not the Elasticsearch Type, and therefore does not handle envelope or circle).

Source vs Ingest Source

The source is not a required field, whereas the ingest_source is. The source field records the original source of the data, and is only useful for tracking it if whoever ingests the data includes that information. For instance, for some HGIS data, the source could be Google Maps or Bing, but the ingest_source is HGIS. Simply put, ingest_source is where the data entered the Vector Services system, while source is where the data was actually created. The two values can be the same or different.

Ingest Date vs Item Date

The ingest_date represents the date that the vector was written into Vector Services. The item_date represents the date that the vector item was originally generated. These dates can be identical, but they can also differ. When generating a query with a specific date, be sure to choose the correct date type.

  • Date queries may be done using the ISO-8601 format YYYY-MM-DDThh:mm:ss.fffZ (e.g. 2016-01-31T14:12:28.011Z) for static dates or with the keyword "now" indicating the date and time at the moment the query is run. ISO-8601 format can be shortened to YYYY-MM-DD, but be aware that, when time is not specified, the time is set to midnight for the query (e.g. 2017-01-13 reads as 2017-01-13T00:00:00.000Z).
  • Date expressions can include simple "date math" by appending "+" or "-" along with a numeral and a unit. The following time range units are available:

    • y (year)
    • M (month)
    • w (week)
    • d (day)
    • h (hour)
    • m (minute)
    • s (second)

    For example, if a user wants to query on the items ingested over the last week, the query using the "now" keyword would be ingest_date:[now-1w TO now].
    If a user wants to query on all of the items in Vector Services from the beginning until April 14, 2015, the query would be item_date:[* TO 2015-04-14].
    Note: the capital 'M' is for months and lower-case 'm' is for minutes. It's very easy to confuse these when entering date phrases, so if the results are not as expected (e.g. empty aggregations), check to make sure the time phrase is using the correct units.
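
    The date formats and date math described above can be generated programmatically, which helps avoid mixing up 'M' and 'm'. The following Python is a minimal sketch; the helper names now_offset and iso_timestamp are hypothetical, and the code only builds strings.

```python
from datetime import datetime, timezone

# Units listed above; "M" is months and "m" is minutes.
DATE_MATH_UNITS = {"y", "M", "w", "d", "h", "m", "s"}


def now_offset(amount: int, unit: str) -> str:
    """Build a date math expression relative to now, e.g. now-1w."""
    if unit not in DATE_MATH_UNITS:
        raise ValueError(f"unknown date math unit: {unit!r}")
    if amount == 0:
        return "now"
    sign = "+" if amount > 0 else "-"
    return f"now{sign}{abs(amount)}{unit}"


def iso_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss.fffZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return f"{utc:%Y-%m-%dT%H:%M:%S}.{millis:03d}Z"


print(f"ingest_date:[{now_offset(-1, 'w')} TO now]")
print(iso_timestamp(datetime(2016, 1, 31, 14, 12, 28, 11000, tzinfo=timezone.utc)))
```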

Ranges

Typically, a user will want more than a single date for a query. For that, use ranges. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.

For example, a query for items from all days in 2017 is item_date:[2017-01-01 TO 2017-12-31], and a query for items from all days prior to 2015 is item_date:{* TO 2015-01-01T00:00:00.000Z}. Note that curly and square brackets can be combined, as in item_date:[2017-01-01 TO 2018-01-01}.

Ranges with one side unbounded can use the following syntax:

  • item_date:>now-1w
  • item_date:>=2017-01-01
  • item_date:<2017-01-01
  • item_date:<=now-1w
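
The bracket rules above can be captured in a small builder. The following Python is a sketch only; the function name date_range and its parameters are hypothetical, and "*" stands for an unbounded side as in the examples above.

```python
def date_range(
    field: str,
    lower: str = "*",
    upper: str = "*",
    include_lower: bool = True,
    include_upper: bool = True,
) -> str:
    """Build a range clause; square brackets are inclusive, curly exclusive."""
    left = "[" if include_lower else "{"
    right = "]" if include_upper else "}"
    return f"{field}:{left}{lower} TO {upper}{right}"


# Inclusive on both sides.
print(date_range("item_date", "2017-01-01", "2017-12-31"))
# Exclusive on both sides, unbounded lower side.
print(date_range("item_date", upper="2015-01-01T00:00:00.000Z",
                 include_lower=False, include_upper=False))
# Mixed brackets: inclusive lower bound, exclusive upper bound.
print(date_range("item_date", "2017-01-01", "2018-01-01", include_upper=False))
```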

Attributes and Ingest Attributes Fields

The attributes and ingest_attributes fields, unlike the other fields listed, have subfields to query on. Different vector items have different attributes and/or ingest_attributes fields depending on the source data. To query on an attributes or ingest_attributes subfield, the user must use additional formatting, using attributes. or ingest_attributes. as a prefix. For instance, suppose there is an attribute field called "userName" with the string "User1" in a few documents that the user wants to query on. To see those documents, the user queries on
attributes.userName:User1
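
As a minimal Python sketch of the prefixing rule, the hypothetical helper subfield_term below builds a subfield clause and rejects any parent field other than attributes or ingest_attributes. It only assembles the query string.

```python
PARENT_FIELDS = {"attributes", "ingest_attributes"}


def subfield_term(parent: str, subfield: str, value: str) -> str:
    """Build a clause such as attributes.userName:User1."""
    if parent not in PARENT_FIELDS:
        raise ValueError(f"{parent!r} has no subfields to query on")
    return f"{parent}.{subfield}:{value}"


print(subfield_term("attributes", "userName", "User1"))
```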

Type Suffixes

Analyzed vs Not Analyzed

String fields may either be analyzed or not analyzed on ingest. Analyzed fields are broken apart into "tokens" based on punctuation and whitespace, and everything is made lowercase. So, if a field is analyzed, the user may query on it and retrieve any results that include any part of the queried field. For example, suppose there is an analyzed field with the string "Google FTW" in one document, and "www.google.com" in another document; the user queries for "google" on this field and will retrieve both documents.

Fields not analyzed will only return items that exactly match the query parameters. In the prior example, supposing that the field is not analyzed in this case, the query for "google" would return neither document. However, a query on the exact term "www.google.com" would return the second document.
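
The difference can be illustrated with a toy model in Python. This sketch follows the description above (split on punctuation and whitespace, then lowercase); it is not Elasticsearch's actual analyzer, whose tokenization rules are more involved, and the function names are hypothetical.

```python
import re


def analyze(text: str) -> list[str]:
    """Toy analyzer: split on punctuation/whitespace and lowercase each token."""
    return [token.lower() for token in re.split(r"\W+", text) if token]


def matches_analyzed(query: str, stored: str) -> bool:
    """An analyzed field matches when the query equals any stored token."""
    return query.lower() in analyze(stored)


def matches_not_analyzed(query: str, stored: str) -> bool:
    """A not-analyzed field matches only on the exact stored value."""
    return query == stored


docs = ["Google FTW", "www.google.com"]
print([d for d in docs if matches_analyzed("google", d)])          # both documents
print([d for d in docs if matches_not_analyzed("google", d)])      # no documents
print([d for d in docs if matches_not_analyzed("www.google.com", d)])
```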

The following fields are ingested in analyzed form, and do not require the analyzed suffix.

  • name
  • text

In order to query on a field that is not analyzed by default, the user should add the suffix .analyzed to the field query. For example,
item_type:building
In the above case, since item_type is not analyzed, the returned results would only be items with the exact type "building" - case sensitive. In the following case, adding the .analyzed suffix returns results that include building in the type, case insensitive.

item_type.analyzed:building
In this case, results of both 'BUILDING' and 'building' would be returned, along with items with an item_type of, for example, "Government Building," etc.

Raw

Users may want to query a string attribute as a not-analyzed value. In that case, adding the _raw suffix to the attribute field name queries the non-tokenized content of that field, including whitespace and anything following it. Note, however, that only string attribute fields have a _raw version. For instance, adding the _raw suffix to a date field returns an empty set, because there are no raw date fields.

In order to query on a raw field, the user should add the suffix _raw to the end of the field they are querying on. For example,
terms:attributes.userName
In the above case, since the userName field is not raw, the results may be abridged at whitespace (i.e., the userName "John Smith" would be returned as "John").

terms:attributes.userName_raw
In this case, the user would retrieve the full "John Smith" as one of the results for userName.

Sorting Results

Users may wish to have the vector items from their queries returned in a specific order. To do so, add the sort parameter to an API call that either starts a paging request or lists the vector items directly. The sort order defaults to ascending, but the user may specify ascending (asc) or descending (desc) order explicitly. Any vector field may be used with sort, and multiple fields may be used at once, separated by commas. For example,
sort=item_type
sort=item_date:desc
sort=ingest_source:asc,attributes.name:desc

Note that sorting a query increases the response time.
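
The sort examples above can be assembled with a short Python helper. This is a sketch only: the function name sort_param is hypothetical, and it builds the parameter text without calling any API.

```python
def sort_param(*keys) -> str:
    """Build a sort parameter from field names or (field, order) pairs."""
    parts = []
    for key in keys:
        if isinstance(key, str):
            # A bare field name relies on the default (ascending) order.
            parts.append(key)
            continue
        field, order = key
        if order not in ("asc", "desc"):
            raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
        parts.append(f"{field}:{order}")
    return "sort=" + ",".join(parts)


print(sort_param("item_type"))
print(sort_param(("item_date", "desc")))
print(sort_param(("ingest_source", "asc"), ("attributes.name", "desc")))
```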