Geocoding is the process of converting a description of a place into a geographic location. In IPUMS GeoMarker’s geocoding, places are described by street addresses uploaded by the user, and geographic locations are returned as latitude/longitude (lat/long) points. Geocoding is performed by Texas A&M University’s Geoservices Geocoding Platform.
Geocoding consists of three main steps:
- Address parsing and standardization
- Selecting a matching reference feature
- Determining a lat/long point location
Address parsing and standardization
The geocoder breaks down the input address into a series of standard attributes, such as street number, street name, and directional prefix or suffix. The attributes are then standardized to match conventions used in reference datasets. For example, “East” may be standardized to “E,” or “Avenue” to “AVE.”
Selecting a matching reference feature
The Texas A&M geocoder includes several databases of reference features. Reference features are described by both address-based attributes and geographic locations, providing the link between the input address and the output location. The geocoder queries the reference feature databases to look for matches to the standardized address attributes from the input addresses. Common types of reference features include address points, parcels, street segments, and ZIP code boundaries.
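The matching step can be sketched as a lookup against a small in-memory list of reference features. Everything here is hypothetical: the feature records, the attribute names, and the function select_reference_feature are made up for illustration, and the rule of trying more precise feature types first is an assumption rather than a documented behavior of the Texas A&M geocoder. The sketch also ignores details such as odd/even sides of a street.

```python
# Made-up reference features; a real geocoder queries large reference databases.
REFERENCE_FEATURES = [
    {"type": "address_point", "number": 123, "name": "MAIN", "suffix": "ST", "zip": "12345"},
    {"type": "street_segment", "from_number": 100, "to_number": 198,
     "name": "MAIN", "suffix": "ST", "zip": "12345"},
    {"type": "zip", "zip": "12345"},
]

# Assumed preference order: most precise feature type first.
TYPE_PRIORITY = ["address_point", "parcel", "street_segment", "zip"]


def matches(feature, addr):
    """Return True if a standardized address is consistent with a reference feature."""
    if feature["zip"] != addr["zip"]:
        return False
    if feature["type"] == "zip":
        return True  # a ZIP code boundary matches on ZIP code alone
    if (feature["name"], feature["suffix"]) != (addr["name"], addr["suffix"]):
        return False
    if feature["type"] == "street_segment":
        # A segment matches if the address number falls within its address range.
        low, high = sorted((feature["from_number"], feature["to_number"]))
        return low <= addr["number"] <= high
    # Address points and parcels need an exact street number match.
    return feature["number"] == addr["number"]


def select_reference_feature(addr):
    """Return the first matching feature in priority order, or None if nothing matches."""
    for feature_type in TYPE_PRIORITY:
        for feature in REFERENCE_FEATURES:
            if feature["type"] == feature_type and matches(feature, addr):
                return feature
    return None
```

In this sketch, an address at 123 MAIN ST matches the address point, 150 MAIN ST falls back to the street segment, and an address on an unknown street in the same ZIP code falls back to the ZIP code boundary.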
Determining a lat/long point location
If the input address is matched to an address point, the latitude and longitude of that point are returned. If the input address is matched to a reference feature that is a line or polygon rather than a point, the geocoder must determine a single lat/long point to return. For reference features represented as polygons, such as parcels or ZIP code boundaries, the geocoder returns the centroid of the polygon. Street segment reference features are represented as lines with a range of address numbers that fall along that segment. The geocoder uses the input address number to interpolate the point along the street segment where that address is likely to be found.
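The two derived-point cases can be illustrated with a short Python sketch. It makes several simplifying assumptions that may not hold for the actual geocoder: the street segment is a single straight line, address numbers are spaced evenly along it, coordinates are treated as planar (lat, long) pairs, and the centroid is computed with the standard area-weighted (shoelace) formula. The function names are hypothetical.

```python
def interpolate_on_segment(number, from_number, to_number, start, end):
    """Linearly interpolate a (lat, long) point for an address number on a straight segment.

    start and end are the (lat, long) endpoints that correspond to from_number
    and to_number. Assumes evenly spaced address numbers.
    """
    if from_number == to_number:
        fraction = 0.5  # single-number range: fall back to the midpoint
    else:
        fraction = (number - from_number) / (to_number - from_number)
    # Clamp so that out-of-range numbers land on an endpoint of the segment.
    fraction = min(max(fraction, 0.0), 1.0)
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as a list of (lat, long) vertices.

    Treats coordinates as planar, which is only an approximation for
    geographic coordinates.
    """
    twice_area = 0.0
    sum_a = 0.0
    sum_b = 0.0
    n = len(vertices)
    for i in range(n):
        a0, b0 = vertices[i]
        a1, b1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = a0 * b1 - a1 * b0
        twice_area += cross
        sum_a += (a0 + a1) * cross
        sum_b += (b0 + b1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    return sum_a / (3 * twice_area), sum_b / (3 * twice_area)
```

For example, address number 150 on a segment covering 100 to 200 falls halfway between the segment's endpoints, and a square parcel returns the point at its center.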