
varnames's Introduction

Variable/Identifier Name Analysis

Source code is for humans

This is a project I have been thinking about for some time. It explores the human-communication side of programming: the parts of source code that are not reserved keywords or syntax, but identifiers that people choose, creatively or appropriately, in human language.

Potential uses: find commonly misused or improvable naming patterns, identify high- or low-quality code/commits, and look for patterns related to code authorship, training, or sibling projects.

Steps

  1. acquire repo/directory/files
  2. read source code files
  3. extract variables, array keys, function names
  4. analyze strings
    • split multi-word names (e.g. camelCase => camel case)
    • show a list of files/projects containing one or more of the strings
    • view as tag clouds
    • count occurrences in the corpus and per file, and use TF-IDF to relate "similar" files/projects
  5. create a report
  6. add modules/functions/scripts/REST APIs to find identifiers in other languages
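
The TF-IDF idea in step 4 can be sketched roughly as follows. This is a minimal illustration, not part of the prototype: it assumes identifiers have already been extracted and split into lowercase word parts, and the function names, file names, and sample data are all hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs maps a file name to its list of word parts.

    Returns a dict of file name -> {term: tf-idf weight}.
    Uses plain tf * log(N / df), so a term found in every file weighs 0.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for parts in docs.values():
        doc_freq.update(set(parts))  # count each term once per file
    vectors = {}
    for name, parts in docs.items():
        term_counts = Counter(parts)
        total = len(parts) or 1
        vectors[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical word parts for three files (made-up sample data).
docs = {
    "shorten.php": ["short", "url", "code", "url"],
    "redirect.php": ["short", "code", "redirect", "url"],
    "login.php": ["user", "password", "session", "url"],
}
vectors = tf_idf_vectors(docs)
print(cosine_similarity(vectors["shorten.php"], vectors["redirect.php"]))
print(cosine_similarity(vectors["shorten.php"], vectors["login.php"]))
```

In this sample, "url" appears in every file and so carries no weight, which leaves shorten.php and redirect.php related through "short" and "code" while login.php shares nothing distinctive with either.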

Beyond

  1. attempt to classify source code application by its identifiers
  2. vote for good/bad names
  3. rate repo on naming readability and descriptive precision
  4. extend beyond PHP with existing static analysis tools/abstract syntax trees

Thoughts:

PHP is easy ($xxx, function xxx(...), ->xxx, ['xxx']) (Bash/Perl similar)
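
As a rough sketch of what "easy" could look like, the four PHP patterns above map onto regular expressions like these. The pattern set and function name are assumptions for illustration, not what varnames.sh does. The name pattern is ASCII-only, and a regex cannot tell code from comments or string literals, so false positives are expected.

```python
import re

# Simplified, ASCII-only PHP identifier (an assumption; PHP allows more bytes).
PHP_NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# One hypothetical pattern per construct: $xxx, function xxx(...), ->xxx, ['xxx']
PATTERNS = {
    "variable": re.compile(r"\$(?P<name>" + PHP_NAME + r")"),
    "function": re.compile(r"\bfunction\s+&?\s*(?P<name>" + PHP_NAME + r")\s*\("),
    "member": re.compile(r"->\s*(?P<name>" + PHP_NAME + r")"),
    "array_key": re.compile(
        r"\[\s*(?P<q>['\"])(?P<name>" + PHP_NAME + r")(?P=q)\s*\]"
    ),
}


def extract_identifiers(php_source):
    """Return a list of (kind, name) pairs found in a PHP source string."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(php_source):
            found.append((kind, match.group("name")))
    return found


# Made-up PHP snippet for demonstration.
sample = """<?php
function getShortUrl($longUrl) {
    $row = $this->db->lookup($longUrl);
    return $row['short_code'];
}
"""
for kind, name in extract_identifiers(sample):
    print(kind, name)
```

Note that `$this` is reported as a variable like any other; a real analysis would probably filter out such built-in names before counting.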

Java is pretty easy, with declarations ([scope] TypeName xxx, [scope] ReturnType xxx(...))

Other languages: not sure yet. They should make use of native tooling (compilers, static analysis); there may be libraries that extract identifiers and more details from many or all languages.

analyze and report as above for PHP

Prototype/Proof-of-Concept

varnames.sh - proof of concept with basic PHP variable regex only

varnames.txt - raw output

varnames_word_parts.txt - after programming-case conversion tool

varnames_word_parts_counts.txt - word parts with corpus frequency
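
The word-part and count steps might look something like the sketch below. It is not the conversion tool used for the files above; the splitting rules and function names are assumptions chosen to handle camelCase, snake_case, acronyms, and digits.

```python
import re
from collections import Counter

# Alternatives, tried in order at each position:
#   1. an acronym followed by a capitalized word ("HTTP" in "HTTPRequest")
#   2. an optionally capitalized lowercase word ("short", "Url")
#   3. a trailing run of capitals ("URL" in "getShortURL")
#   4. a run of digits
# Underscores, hyphens, and other separators are simply skipped.
WORD_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_name(name):
    """Split an identifier into lowercase word parts."""
    return [part.lower() for part in WORD_PART.findall(name)]


def count_word_parts(names):
    """Count how often each word part occurs across a list of identifiers."""
    counts = Counter()
    for name in names:
        counts.update(split_name(name))
    return counts


# Made-up identifiers standing in for lines of varnames.txt.
names = ["shortUrl", "long_url", "getShortURL", "HTTPRequest"]
counts = count_word_parts(names)
print(split_name("HTTPRequest"))  # ['http', 'request']
print(counts.most_common(2))      # [('url', 3), ('short', 2)]
```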

The PoC corpus comes from URL shortener repos on GitHub (~700 repos, but only a fraction of them in PHP; approximately 5000 PHP files)
