Kd::Tree
Crystal implementation of "K-Dimensional Tree" and "N-Nearest Neighbors" based on http://en.wikipedia.org/wiki/Kd-tree.
Installation
Add this to your application's shard.yml:
dependencies:
  kd_tree:
    github: geocrystal/kd_tree
Usage
require "kd_tree"
For example, construct a new tree where each point is represented as a two-dimensional array in the form [x, y], where x and y are numbers (such as Int32, Float64, etc).
kd = Kd::Tree(Array(Int32)).new(points)
Find the nearest point to [x, y]. Returns an array with one point:
kd.nearest([x, y])
Find the nearest k points to [x, y]. Returns an array of points:
kd.nearest([x, y], k)
Example
require "kd_tree"
points = [
  [2.0, 3.0],
  [5.0, 4.0],
  [4.0, 7.0],
  [7.0, 2.0],
  [8.0, 1.0],
  [9.0, 6.0],
]
kd = Kd::Tree(Array(Float64)).new(points)
kd.nearest([1.0, 1.0])
# => [[2.0, 3.0]])
kd_tree.nearest([1.0, 1.0], 2)
# => [[2.0, 3.0], [5.0, 4.0]])
Complex objects
Kd::Tree(T) can accept any object that responds to #size and #[](i : Int) methods.
class GeoLocation
  property name : String
  property longitude : Float64
  property latitude : Float64
  getter size = 2 # Assuming all GeoLocation objects are 2-dimensional
  def initialize(@name : String, @longitude : Float64, @latitude : Float64)
  end
  # Define an indexer to allow easy access by index for longitude and latitude
  def [](index : Int32) : Float64
    case index
    when 0 then @longitude
    when 1 then @latitude
    else        raise "Index out of bounds"
    end
  end
end
# Create an array of GeoLocation points
points = [
  GeoLocation.new("New York", -73.935242, 40.730610),
  GeoLocation.new("Los Angeles", -118.243683, 34.052235),
  GeoLocation.new("London", -0.127647, 51.507322),
  GeoLocation.new("Tokyo", 139.691711, 35.689487),
]
# Initialize the KD-tree with these points
kd_tree = Kd::Tree(GeoLocation).new(points)
# Find the nearest point to London
target = GeoLocation.new("Near London", -0.125740, 51.508530)
nearest_point = kd_tree.nearest(target, 1)
puts "Nearest to London: #{nearest_point.first.name} (longitude #{nearest_point.first.longitude}, latitude #{nearest_point.first.latitude})"
# Nearest to London: London (longitude -0.127647, latitude 51.507322)
Distance
For distance calculations, the squared Euclidean distance is used. However, you can easily monkey-patch the Kd::Tree#distance method to implement another algorithm, such as the Haversine formula, to calculate distances between two points given their latitudes and longitudes.
require "haversine"
module Kd
  class Tree(T)
    private def distance(m : T, n : T)
      # Calling `Haversine.distance` with 2 pairs of latitude/longitude coordinates.
      # Returns a distance in meters.
      Haversine.distance({m.latitude, m.longitude}, {n.latitude, n.longitude}).to_meters
    end
  end
end
points = [
  GeoLocation.new("New York", -73.935242, 40.730610),
  GeoLocation.new("Los Angeles", -118.243683, 34.052235),
  GeoLocation.new("London", -0.127647, 51.507322),
  GeoLocation.new("Tokyo", 139.691711, 35.689487),
]
kd_tree = Kd::Tree(GeoLocation).new(points)
# Find the nearest point to London
target = GeoLocation.new("Near London", -0.125740, 51.508530)
nearest_point = kd_tree.nearest(target, 1)
puts "Nearest to London: #{nearest_point.first.name} (longitude #{nearest_point.first.longitude}, latitude #{nearest_point.first.latitude})"
# Nearest to London: London (longitude -0.127647, latitude 51.507322)
Performance
Using a tree with 1 million points [x, y] of Float64 on my Apple M1 Pro (10) @ 3.23 GHz:
crystal run benchmark/benchmark.cr --release
Benchmarking KD-Tree with 1 million points
                        user     system      total        real
build(init)         1.840140   0.021103   1.861243 (  1.872732)
nearest point   1   0.004484   0.000002   0.004486 (  0.004490)
nearest point   5   0.007391   0.000010   0.007401 (  0.007479)
nearest point  10   0.011406   0.000090   0.011496 (  0.011679)
nearest point  50   0.034097   0.000819   0.034916 (  0.035175)
nearest point 100   0.133828   0.003721   0.137549 (  0.156548)
nearest point 255   0.220200   0.000631   0.220831 (  0.223081)
nearest point 999   0.731941   0.000441   0.732382 (  0.737236)
Contributing
- Fork it (https://github.com/geocrystal/kd_tree/fork)
 - Create your feature branch (
git checkout -b my-new-feature) - Commit your changes (
git commit -am 'Add some feature') - Push to the branch (
git push origin my-new-feature) - Create a new Pull Request
 
Contributors
- mamantoha Anton Maminov - creator, maintainer