🌎 GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization

Description

GeoCLIP addresses the challenges of worldwide image geo-localization by introducing a novel CLIP-inspired approach that aligns images with geographical locations, achieving state-of-the-art results on geo-localization and GPS to vector representation on benchmark datasets (Im2GPS3k, YFCC26k, GWS15k, and the Geo-Tagged NUS-Wide Dataset). Our location encoder models the Earth as a continuous function, learning semantically rich, CLIP-aligned features that are suitable for geo-localization. Additionally, our location encoder architecture generalizes, making it suitable for use as a pre-trained GPS encoder to aid geo-aware neural architectures.

Method

Similarly to OpenAI's CLIP, GeoCLIP is trained contrastively by matching Image-GPS pairs. By using the MP-16 dataset, composed of 4.7M Images taken across the globe, GeoCLIP learns distinctive visual features associated with different locations on earth.

🚧 Repo Under Construction 🔨

🗺️📍 Worldwide Image Geolocalization

Usage: GeoCLIP Inference

import torch
from PIL import Image
from model.GeoCLIP import GeoCLIP

model = GeoCLIP()
model.eval()

image_file = "images/Kauai.png"

image = Image.open(image_file)

with torch.no_grad():
    top_pred_gps, top_pred_prob = model.predict(image, top_k=5)

print("Top 5 Predictions")
print("=================")
for i in range(5):
    lat, lon = top_pred_gps[i]
    print(f"Prediction {i+1}: ({lat:.6f}, {lon:.6f})")
    print(f"Probability: {top_pred_prob[i]:.6f}")
    print("")

🌐 Pre-Trained Location Encoder

In our paper, we show that once trained, our location encoder can assist other geo-aware neural architectures. Specifically, we explore our location encoder's ability to improve multi-class classification accuracy. We achieved state-of-the-art results on the Geo-Tagged NUS-Wide Dataset by concatenating GPS features from our pre-trained location encoder with an image's visual features. Additionally, we found that the GPS features learned by our location encoder, even without extra information, are effective for geo-aware image classification, achieving state-of-the-art performance in the GPS-only multi-class classification task on the same dataset.

Usage: Pre-Trained Location Encoder

import torch
import torch.nn as nn
from model.location_encoder import LocationEncoder

gps_encoder = LocationEncoder()
gps_encoder.load_state_dict(torch.load('model/weights/location_encoder_weights.pth'))

loc_data = torch.Tensor([[40.7128, -74.0060], [34.0522, -118.2437]])  # NYC and LA in lat, long
output = gps_encoder(loc_data)
print(output.shape) # (2, 512)

Acknowledgments

This project incorporates code from Joshua M. Long's Random Fourier Features Pytorch. For the original source, visit here.

Citation

@article{cepeda2023geoclip,
  title={GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization},
  author={Vivanco, Vicente and Nayak, Gaurav Kumar and Shah, Mubarak},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2023}
}

hoteryoung / geo-clip Goto Github PK

geo-clip's Introduction

🌎 GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization

Description

Method

🗺️📍 Worldwide Image Geolocalization

Usage: GeoCLIP Inference

🌐 Pre-Trained Location Encoder

Usage: Pre-Trained Location Encoder

Acknowledgments

Citation

geo-clip's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent