This project forked from theoclark/facade-segmentation

Fine tuning the model from the following paper on CMP Facade database: "The devil is in the labels: Semantic segmentation from sentences"

facade-segmentation

An exploration of the following paper: "The devil is in the labels: Semantic segmentation from sentences" (Paper | GitHub) fine-tuned on the CMP Facade Database (Info).

Usage

The project lives in a single Jupyter notebook (model.ipynb). The easiest way to run it is to clone the repository, upload it to Google Drive and open the notebook in Colab. The model weights and the CMP database need to be downloaded on the first run.

Introduction

The paper trains a SegFormer model to map each pixel in an image to an embedding in the CLIP feature space. The CLIP text encoder is then used to map the labels into the same space, so that per-pixel predictions can be made by comparing each pixel embedding with the label embeddings.
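The matching step described above can be sketched as follows. This is a minimal illustration, not the paper's or the notebook's code: it assumes that each pixel is assigned the label whose text embedding has the highest cosine similarity, and it uses random vectors in place of real SegFormer and CLIP outputs. The function name `classify_pixels` and the embedding size of 512 are assumptions made for the example.

```python
import numpy as np

def classify_pixels(pixel_emb, text_emb):
    """Assign each pixel the label with the most similar text embedding.

    pixel_emb: (H, W, D) per-pixel embeddings (stand-in for the image model output).
    text_emb:  (K, D) one embedding per label (stand-in for the text encoder output).
    Returns an (H, W) array of label indices.
    """
    # L2-normalise both sides so the dot product equals cosine similarity.
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    similarity = p @ t.T  # shape (H, W, K)
    return similarity.argmax(axis=-1)

# Toy data: 12 labels, and a 4x6 "image" whose pixel embeddings are
# noisy copies of the label embeddings (assumption, for illustration only).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(12, 512))
true_labels = rng.integers(0, 12, size=(4, 6))
pixel_emb = text_emb[true_labels] + 0.1 * rng.normal(size=(4, 6, 512))

pred = classify_pixels(pixel_emb, text_emb)
print(pred.shape)
```

Because the labels only enter through `text_emb`, swapping in a different set of text embeddings changes the classifier without touching the image model, which is what makes the prompt comparison in this project possible.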

The main development in the paper is the use of full sentences rather than single labels to generate the text embeddings. The authors reason that full sentences carry more semantic meaning and therefore produce richer embeddings, leading to more accurate classification.

Method

Using the 12 target labels for the CMP Facade dataset, I generated three sets of embeddings: one using just the raw labels, a second using the short prompt “this is an image of {label}”, and a third using a full definition, in line with the paper. As in the paper, these definitions were taken from Wikipedia, except where there was no appropriate page, in which case dictionary definitions were used. The full definitions are listed in the appendix.
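As a rough sketch of how the three prompt sets could be assembled before being passed to a text encoder: the helper `build_prompt_sets` is hypothetical, only three of the 12 labels are shown, and the definition strings are illustrative placeholders rather than the definitions from the appendix.

```python
def build_prompt_sets(labels, definitions):
    """Return the three text variants for each label.

    labels:      list of label names.
    definitions: dict mapping each label to a full-sentence definition.
    """
    return {
        # Version 1: the raw label on its own.
        "raw": list(labels),
        # Version 2: the short prompt quoted in the text above.
        "short": [f"this is an image of {label}" for label in labels],
        # Version 3: a full definition per label.
        "full": [definitions[label] for label in labels],
    }

# Subset of labels for illustration; the definitions are placeholders (assumption).
labels = ["window", "door", "balcony"]
definitions = {
    "window": "A window is an opening in a wall that lets in light and air.",
    "door": "A door is a hinged or sliding barrier that closes an entrance.",
    "balcony": "A balcony is a platform projecting from the wall of a building.",
}

prompts = build_prompt_sets(labels, definitions)
for version, texts in prompts.items():
    print(version, texts[0])
```

Each list would then be encoded separately, giving one embedding matrix per version with the labels in the same order.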

Baseline results were first taken using the provided model without any further training. The decoder part of the SegFormer was then trained for 5 epochs for each set of embeddings. Results were assessed by calculating the average per-pixel accuracy across all 378 images.
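The evaluation metric can be sketched as below. This assumes that accuracy is computed per image as the fraction of correctly labelled pixels and then averaged over images; the text does not say whether the notebook does this or pools all pixels together, and the function names are illustrative.

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels in one image whose predicted label matches the target."""
    return float((pred == target).mean())

def mean_pixel_accuracy(preds, targets):
    """Average of the per-image pixel accuracies (assumed averaging scheme)."""
    return float(np.mean([pixel_accuracy(p, t) for p, t in zip(preds, targets)]))

# Toy example with two 2x2 label maps.
targets = [np.array([[0, 1], [2, 3]]), np.array([[1, 1], [0, 0]])]
preds = [np.array([[0, 1], [2, 3]]),   # all 4 pixels correct
         np.array([[1, 0], [0, 2]])]   # 2 of 4 pixels correct

score = mean_pixel_accuracy(preds, targets)
print(score)  # 0.75
```

Averaging per image gives every image equal weight regardless of its size, whereas pooling pixels would weight larger images more heavily.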

Results

Whilst versions 1 and 2 have similar accuracy scores for one-shot predictions, qualitatively there is quite a difference between them (figure 1). In a one-shot setting, the shorter the description used for the embedding, the better the result appears to be. Versions 2 and 3 both struggle to predict balconies, and window predictions are patchy.

After fine-tuning, the difference between the three models is much smaller, though version 3 struggles to predict the presence of doors. This indicates that, with these definitions at least, using longer descriptions for the embeddings does not confer any advantage.

After fine-tuning the decoder for 5 epochs, the difference in per-pixel accuracy between the three models was negligible (all 52%). Further training would likely yield incremental improvements for all models.

[Image: results]

[Image: predictions]
