Automatic Language Description Generation for Remote Sensing Images

Project Description:

Automatically generating language descriptions for remote sensing images has emerged as a significant research area within remote sensing. This project focuses on attention-based captioning methods, a prominent class of deep learning techniques that generate descriptive text while emphasizing the corresponding object locations within the image.

However, traditional attention-based approaches typically generate captions from coarse-grained, unstructured attention units. This limitation hinders the exploitation of structured spatial relationships within the semantic content of remote sensing images. These images possess unique structural characteristics that differentiate them from natural images, posing greater challenges for remote sensing image captioning. Unfortunately, many existing remote sensing captioning methods draw inspiration from the computer vision community without considering domain-specific knowledge.

To address this challenge, we introduce a novel fine-grained, structured, attention-based method designed to leverage the inherent structural characteristics of semantic content in high-resolution remote sensing images. Our approach generates improved descriptions and produces pixel-wise segmentation masks for semantic content. Notably, this segmentation can be simultaneously trained with the captioning task within a unified framework, eliminating the need for pixel-wise annotations.
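
The sketch below (PyTorch) illustrates this joint-training idea: a per-region attention module feeds an LSTM caption decoder, and the accumulated attention maps are regularized so they behave like weak masks, without any pixel-wise labels. Every class name, dimension, and the coverage-style regularizer here are illustrative assumptions rather than the project's actual implementation (see the nets, SAM, and losses directories for that).

    # Illustrative sketch only -- names, dimensions, and the regularizer are
    # assumptions; the project's real modules live under nets/, SAM/, and losses/.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class AttentionCaptioner(nn.Module):
        """Caption decoder whose per-region attention maps double as soft masks."""

        def __init__(self, vocab_size, feat_dim=2048, hidden_dim=512):
            super().__init__()
            self.hidden_dim = hidden_dim
            self.region_proj = nn.Linear(feat_dim, hidden_dim)  # project CNN region features
            self.attn_score = nn.Linear(hidden_dim, 1)          # per-region attention logits
            self.decoder = nn.LSTMCell(hidden_dim, hidden_dim)  # one decoding step per word
            self.word_head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, region_feats, steps):
            # region_feats: (batch, num_regions, feat_dim) from a CNN backbone
            batch = region_feats.size(0)
            h = region_feats.new_zeros(batch, self.hidden_dim)
            c = region_feats.new_zeros(batch, self.hidden_dim)
            regions = torch.relu(self.region_proj(region_feats))
            word_logits, attn_maps = [], []
            for _ in range(steps):
                alpha = torch.softmax(self.attn_score(regions).squeeze(-1), dim=-1)
                context = (alpha.unsqueeze(-1) * regions).sum(dim=1)
                h, c = self.decoder(context, (h, c))
                word_logits.append(self.word_head(h))
                attn_maps.append(alpha)  # attention over regions, reused as a soft mask
            return torch.stack(word_logits, 1), torch.stack(attn_maps, 1)


    def joint_loss(word_logits, targets, attn_maps, lam=1.0):
        """Caption cross-entropy plus a coverage regularizer on the attention maps."""
        ce = F.cross_entropy(word_logits.flatten(0, 1), targets.flatten())
        # Encourage each region to receive roughly unit attention over the caption,
        # so the maps stay spatially meaningful without pixel-wise supervision
        # (a doubly-stochastic regularizer in the spirit of "Show, Attend and Tell").
        coverage = attn_maps.sum(dim=1)          # (batch, num_regions)
        mask_reg = ((1.0 - coverage) ** 2).mean()
        return ce + lam * mask_reg

The structured attention and SAM components described in the paper refine this basic scheme; the sketch only shows why captioning and mask generation can share a single training objective.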

Key Features:

  • Fine-grained, structured, attention-based captioning method.
  • Utilization of spatial relationships within semantic content.
  • Generation of pixel-wise segmentation masks for semantic content.
  • Unified training framework for captioning and segmentation.
  • Improved captioning accuracy compared to state-of-the-art methods.
  • High-resolution and meaningful segmentation masks.

Project Structure:

The project directory structure is as follows:

  • config.yaml: Configuration file for project settings.
  • deeplearning: Directory containing deep learning-related code.
  • nets: Neural network architectures and model definitions.
  • test.py: Script for testing and evaluation.
  • dataloaders: Data loading utilities.
  • losses: Custom loss functions.
  • results: Directory to store experiment results.
  • train.py: Script for model training.
  • datasets: Dataset handling and preprocessing.
  • models: Trained model checkpoints.
  • SAM: Structured Attention Mechanism implementation.
  • utils: Utility functions and helper scripts.

Getting Started:

To begin working with this project, follow these steps:

  1. Set up your Python environment with the required dependencies.
  2. Configure the config.yaml file to adjust project settings (a minimal loading sketch follows this list).
  3. Use the provided scripts for training (train.py), testing (test.py), and evaluation.
  4. Explore the various directories for specific code components and functionalities.
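
As a concrete starting point for steps 2 and 3, the snippet below is a minimal, hypothetical example of inspecting config.yaml before launching training. The keys shown are assumptions, not the project's actual schema, so check config.yaml for the real setting names.

    # Hypothetical usage sketch -- the config keys below are assumptions,
    # not the project's actual schema.
    import yaml  # pip install pyyaml

    with open("config.yaml", "r") as f:
        cfg = yaml.safe_load(f)

    print("dataset:      ", cfg.get("dataset"))
    print("learning rate:", cfg.get("learning_rate"))
    print("epochs:       ", cfg.get("epochs"))

    # Training and evaluation are then launched from the command line:
    #   python train.py
    #   python test.py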

Evaluation:

The proposed method has been extensively evaluated on three remote sensing image captioning benchmark datasets, with detailed ablation studies and parameter analyses. The results demonstrate that our approach outperforms state-of-the-art methods in captioning accuracy. Furthermore, its ability to generate high-resolution, meaningful segmentation masks adds significant value to the remote sensing image analysis pipeline.
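
Captioning accuracy on such benchmarks is typically reported with n-gram metrics such as BLEU. The snippet below is an illustrative, self-contained way to compute a corpus-level BLEU-4 score with NLTK; it is not the project's evaluation pipeline (see test.py), and the tokenized captions are placeholders.

    # Illustrative BLEU-4 computation with NLTK; not the project's test.py.
    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    # One entry per image: a list of tokenized reference captions, and one
    # tokenized generated caption. Real evaluation would load these from the
    # benchmark datasets and the trained model's outputs.
    references = [[["a", "plane", "is", "parked", "at", "the", "airport"]]]
    hypotheses = [["a", "plane", "parked", "at", "an", "airport"]]

    smooth = SmoothingFunction().method1
    bleu4 = corpus_bleu(references, hypotheses,
                        weights=(0.25, 0.25, 0.25, 0.25),
                        smoothing_function=smooth)
    print(f"BLEU-4: {bleu4:.3f}")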

Contributing:

Contributions to this project are welcome! If you would like to improve automatic language description generation for remote sensing images, please submit a pull request or contact the project maintainers.

License:

This project is distributed under the MIT License.

Citation:

If you find this work helpful or build upon it in your research, please consider citing the following paper:

Yassin Riyazi, S. Mostaf Sajadi, Abas Zohrevand, and Reshad Hosseini*. 2024. "High-Resolution Remote Sensing Image Captioning Based on Structured Attention and SAM Network." ICEE, 2024, Page Numbers. DOI.

Acknowledgments:

We acknowledge the support of the open-source community and the valuable insights gained from the remote sensing and computer vision domains. This project builds upon the collaborative spirit of knowledge sharing and technological advancement.

Contact:

For questions or inquiries, please contact [email protected].
