
captchacrack_crnn's Introduction

Captcha CRNN Model Build + CTC Loss

Tags: CNN, Captcha Cracker, LSTM, PyTorch

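Why can't a CNN recognize the string directly?

  • An image containing several digits, for example a four-digit string, can show many different combinations. A conventional CNN classifier struggles to capture every possible arrangement directly, because the model has to learn a very large number of variations.
  • Variations in the position, size, and orientation of the digits add further difficulty.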
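
To make the combinatorial argument concrete, here is a small sketch. It assumes a captcha of exactly four characters drawn from the ten digits; a real captcha alphabet may be larger. Treating each whole string as one class needs one output per possible string, while a per-position sequence model only needs one output per character at each step.

```python
# Assumption: 4-character captchas over the digits 0-9.
alphabet_size = 10
captcha_length = 4

# One class per whole string: every ordered combination is its own label.
whole_string_classes = alphabet_size ** captcha_length

# Sequence model: each time step only picks one character from the alphabet.
per_step_classes = alphabet_size

print(whole_string_classes)  # 10000
print(per_step_classes)      # 10
```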

CRNN Architecture


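CNN Layer (convolutional layer)

  • Extracts local features from the image.
  • Outputs convolutional feature maps.

Map to sequence

  • Converts the convolutional feature maps into a feature sequence, so that the image features can be fed into the RNN to learn sequential information.

RNN Layer (recurrent layer)

  • Uses the feature sequence to learn the left-to-right sequential information in the image, which is essential for the sequential nature of text recognition.

The CNN is mainly responsible for extracting regional features from the image, while the RNN learns sequential information across the whole feature sequence so that the text in the image can be understood and recognized more reliably. Combining a CNN with an RNN in this way handles the sequential information in images effectively and improves model performance, particularly in tasks such as text recognition (OCR).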

CNN Layers


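  • The input image has shape (N, C, H, W).

  • Conv and MaxPooling layers are used to downsample.

  • The authors use an asymmetric (2, 1) kernel in maxpooling2 and maxpooling3; the elongated window helps capture text-like features.

  • The output conv feature maps have shape (N, 512, 1, 25), that is, 512 feature maps of size 1 x 25.

    N : Batch size

    C : Channel

    H : Height

    W : Width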
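
The downsampling can be checked with a little shape arithmetic. The sketch below is not the repository's actual layer list: the 32 x 100 input size, the layer names, and the final unpadded convolution are all assumptions, chosen so that the result matches the (N, 512, 1, 25) output described above. Padded 3 x 3 convolutions are left out because they do not change H or W.

```python
def window_out(size, kernel, stride):
    """Output length of a pooling/conv window with no padding and no dilation."""
    return (size - kernel) // stride + 1

def trace_shape(height, width, layers):
    """Apply each (name, (kh, kw), (sh, sw)) layer and record (name, H, W) after it."""
    shapes = []
    for name, (kh, kw), (sh, sw) in layers:
        height = window_out(height, kh, sh)
        width = window_out(width, kw, sw)
        shapes.append((name, height, width))
    return shapes

# Hypothetical downsampling stack (names and sizes are assumptions).
layers = [
    ("maxpool0", (2, 2), (2, 2)),    # halves H and W
    ("maxpool1", (2, 2), (2, 2)),    # halves H and W
    ("maxpool2", (2, 1), (2, 1)),    # asymmetric: halves H only
    ("maxpool3", (2, 1), (2, 1)),    # asymmetric: halves H only
    ("final_conv", (2, 1), (1, 1)),  # unpadded conv that collapses H to 1
]

shapes = trace_shape(32, 100, layers)
for name, h, w in shapes:
    print(f"{name}: H={h}, W={w}")
```

Under these assumptions the asymmetric windows keep the width at 25 while the height shrinks to 1, so each of the 25 columns can later serve as one time step.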

Map to Sequence


  • Reshape the conv feature maps (N, C, H, W) into the shape an LSTM accepts, $(L, N, H_{in})$.
  • So (N, 512, 1, 25) becomes (25, N, 512).

L : Sequence Length(time step)

N : Batch Size

$H_{in}$ : Input Size(feature number)
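
A minimal PyTorch sketch of this reshaping, assuming the CNN output height is exactly 1. The function name `map_to_sequence` and the batch size of 4 are illustrative choices, not taken from the repository.

```python
import torch

def map_to_sequence(feature_maps: torch.Tensor) -> torch.Tensor:
    """Convert (N, C, 1, W) feature maps into an (L, N, H_in) sequence, with L = W and H_in = C."""
    n, c, h, w = feature_maps.shape
    if h != 1:
        raise ValueError(f"expected height 1 after the CNN, got {h}")
    seq = feature_maps.squeeze(2)  # (N, C, W): drop the height axis
    return seq.permute(2, 0, 1)    # (W, N, C): width becomes the time axis

batch_size = 4  # arbitrary N for the demo
features = torch.randn(batch_size, 512, 1, 25)
sequence = map_to_sequence(features)
print(sequence.shape)  # torch.Size([25, 4, 512])
```

Each time step `sequence[t]` holds the 512-dimensional feature column at horizontal position `t` for every image in the batch.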

RNN Layers


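  • A bidirectional LSTM is used because it processes the sequence in both the forward (left-to-right) and backward (right-to-left) directions, capturing the context between characters.

  • The bidirectional LSTM takes input of shape $(L, N, H_{in})$. With PyTorch's default `batch_first=False` its output has shape $(L, N, 2 H_{cell})$; permuting the first two axes gives $(N, L, 2 H_{cell})$.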

    L : Sequence Length(time step)

    N : Batch Size

    $H_{in}$ : Input Size(feature number)

    $H_{cell}$ : Hidden Size
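
The shapes can be verified with `torch.nn.LSTM`. This is a sketch under assumptions: a hidden size of 256 and a single layer are illustrative values, not settings confirmed by the repository. With the default `batch_first=False`, the layer returns $(L, N, 2 H_{cell})$; a permute gives the batch-first layout $(N, L, 2 H_{cell})$ if a later layer expects it.

```python
import torch
from torch import nn

seq_len, batch_size, input_size = 25, 4, 512
hidden_size = 256  # assumed H_cell

# bidirectional=True runs one LSTM left-to-right and one right-to-left,
# then concatenates their hidden states along the feature axis.
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, bidirectional=True)

x = torch.randn(seq_len, batch_size, input_size)  # (L, N, H_in)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([25, 4, 512]) -> (L, N, 2 * H_cell)
print(h_n.shape)     # torch.Size([2, 4, 256])  -> (directions, N, H_cell)

# Batch-first view of the same output: (N, L, 2 * H_cell).
batch_first_output = output.permute(1, 0, 2)
print(batch_first_output.shape)  # torch.Size([4, 25, 512])
```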

Transcription Layers


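  • Transcription is the process of converting the RNN's per-frame predictions into a label sequence.
  • Mathematically, the goal of transcription is to find the label sequence with the highest probability conditioned on the per-frame predictions.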
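
One common approximation to this search is greedy (best-path) decoding: take the most likely class at every frame, merge consecutive repeats, and drop the blank symbol. The sketch below is plain Python and makes several assumptions that are not from the repository: class index 0 is the blank, the alphabet is the ten digits, and the helper `one_hot` only builds toy scores for the demo.

```python
BLANK = 0               # assumed index of the blank symbol
CHARSET = "0123456789"  # hypothetical alphabet; class i + 1 maps to CHARSET[i]

def greedy_decode(frame_scores):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely class index for every frame.
    path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    decoded = []
    previous = None
    for index in path:
        # Keep a symbol only when it differs from the previous frame and is not blank.
        if index != previous and index != BLANK:
            decoded.append(CHARSET[index - 1])
        previous = index
    return "".join(decoded)

def one_hot(index, size=11):
    """Toy score vector whose highest score sits at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Frame-wise argmax path: '3' '3' blank '3' '7' '7' blank
frames = [one_hot(i) for i in (4, 4, 0, 4, 8, 8, 0)]
print(greedy_decode(frames))  # 337
```

The blank between the two runs of `3` is what lets the decoder emit the same digit twice in a row.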

Connectionist Temporal Classification (CTC) layer

  • Aligning the RNN output sequence with the target label sequence in a sequence-to-sequence model is very difficult.
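
CTC avoids choosing a single alignment: many frame-level paths collapse to the same label, and the loss sums over all of them. The sketch below enumerates those paths by brute force for a tiny case. The blank symbol `-` and the function names are illustrative assumptions, and the enumeration is only practical for very short sequences.

```python
from itertools import product

BLANK = "-"  # assumed blank symbol

def collapse(alignment):
    """CTC collapse: merge consecutive repeats, then remove blanks."""
    merged = []
    previous = None
    for symbol in alignment:
        if symbol != previous:
            merged.append(symbol)
        previous = symbol
    return "".join(s for s in merged if s != BLANK)

def valid_alignments(target, num_frames, alphabet):
    """All frame-level paths of length `num_frames` that collapse to `target`."""
    symbols = list(alphabet) + [BLANK]
    return ["".join(p) for p in product(symbols, repeat=num_frames) if collapse(p) == target]

paths = valid_alignments("a", num_frames=3, alphabet="a")
print(paths)       # ['aaa', 'aa-', 'a--', '-aa', '-a-', '--a']
print(len(paths))  # 6
```

Even with three frames and a one-character target there are six valid paths, which is why labeling each frame by hand does not scale.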
