---
license: other
---

# Whereabouts: Reference databases

This space contains the reference databases used by [whereabouts](https://github.com/ajl2718/whereabouts). Whereabouts is a geocoding package in Python that implements some clever record linkage algorithms in SQL using DuckDB.

The package itself is available at [whereabouts](https://github.com/ajl2718/whereabouts) and can be installed via

```
pip install whereabouts
```

## Installation of reference databases

Once the package is installed, you will need to install a geocoding database, which is built from a country's or region's address data. This repo contains a collection of these databases for different countries and regions. Currently it has files for:

**Australia:**
- Whole of country
- Victoria, Australia
- New South Wales, Australia

**United States:**
- Florida, United States
- California, United States
- Massachusetts, United States

More are being added as I get around to cleaning the data and creating the corresponding databases.

The file format is `<country>_<region>_<size>`, where `<size>` is either `sm` or `lg` depending on whether the inverted index has been created using pairs of consecutive tokens or trigrams, respectively. The large models can handle lower quality address data at the expense of speed.

Example (install the small Australian geocoding database):

```
python -m whereabouts download au_all_sm
```

## Start geocoding

Once you have installed the package and a database, you can start geocoding your data.
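Before passing a name to the geocoder, it can help to check that it follows the naming convention described in the installation section. The snippet below is a minimal sketch, not part of whereabouts itself: `parse_db_name`, `DatabaseName`, and the field names `country`, `region`, and `size` are hypothetical, and it assumes that a name consists of exactly three underscore-separated parts, as in `au_all_sm`.

```python
from typing import NamedTuple

# Size suffixes named in this README: "sm" (token pairs) and "lg" (trigrams).
VALID_SIZES = {"sm", "lg"}


class DatabaseName(NamedTuple):
    """Hypothetical container for the parts of a database name."""
    country: str
    region: str
    size: str


def parse_db_name(name: str) -> DatabaseName:
    """Split a name such as 'au_all_sm' into its three parts.

    Assumes exactly three non-empty, underscore-separated parts; this is
    inferred from the example name and is not checked against whereabouts.
    """
    parts = name.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three underscore-separated parts, got {name!r}")
    country, region, size = parts
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {sorted(VALID_SIZES)}, got {size!r}")
    return DatabaseName(country, region, size)


print(parse_db_name("au_all_sm"))
# DatabaseName(country='au', region='all', size='sm')
```

A check like this only validates the shape of the name; it does not confirm that the database has been downloaded. With a valid name in hand, the geocoding call itself looks like this: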
```
from whereabouts.Matcher import Matcher

addresslist = ['122 station st fairfield vic', '643-645 sydney road brsunwick', '504 sydney rd brunswick']

matcher = Matcher(db_name='au_all_sm')
matcher.geocode(addresslist, how='standard')
```

## License Disclaimer for Third-Party Data

Note that while the code from this package is licensed under the MIT license, the pre-built databases use data from providers that may impose restrictions on particular use cases:

- The Australian databases are built from the [Geocoded National Address File](https://data.gov.au/data/dataset/geocoded-national-address-file-g-naf), with conditions of use set out in the [End User License Agreement](https://data.gov.au/dataset/ds-dga-e1a365fc-52f5-4798-8f0c-ed1d33d43b6d/distribution/dist-dga-0102be65-3781-42d9-9458-fdaf7170efed/details?q=previous%20gnaf).
- The US databases are still a work in progress but are based on data from [OpenAddresses](https://openaddresses.io/), so any work with whereabouts based on US address data should adhere to the [OpenAddresses license](https://github.com/openaddresses/openaddresses/blob/master/LICENSE).

Users of this software must comply with the terms and conditions of the respective data licenses, which may impose additional restrictions or requirements. By using this software, you agree to comply with the relevant licenses for any third-party data.