amperser / proselint

A linter for prose.

Home Page: http://proselint.com

License: BSD 3-Clause "New" or "Revised" License

JavaScript 29.05% HTML 28.07% CSS 0.30% Python 38.75% Shell 0.21% Ruby 0.04% SCSS 3.57% Procfile 0.01%
linter prose writer advice knowledge language style

proselint's People

Contributors

agentydragon, carreau, catherineh, craigkelly, dependabot[bot], drinks, hugovk, j10sanders, jacalvo, jayvdb, joshmgrant, jwilk, kylesezhi, laraross, lcd047, m-charlton, manuel-uberti, marsam, mavit, mpacer, nvuillam, nytelife26, patchranger, patrick96, pyup-bot, suchow, tatsh, tdenewiler, viccuad, vikasgorur

proselint's Issues

Rubric for hard-to-implement features

Develop some systematic way to describe features within the extracted sources that are not easily implementable but with an eye to why they are not easily implementable and any clues as to what sources may provide a solution to the problem.

Architecture for sharing processed data across rules

If we need to use more computationally intense analyses in multiple rules (e.g., nltk and syntax parsing to identify whether "while" is being used as a conjunction or an adverb), it would make more sense to memoize the output so that other rules can access it rather than rerun the analysis.

This should be a fairly general system that automatically builds the data structures so that they can be shared across individual checks, possibly through some kind of "require"-type statement that more than one rule can declare.
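One way this could look, as a minimal sketch rather than a settled design: a registry of named analyses, a cached lookup, and a decorator that plays the role of the "require" statement. The names `ANALYSES`, `analysis`, `requires`, and both rules are hypothetical, and the regex tokenizer only stands in for an expensive step such as a syntax parse.

```python
import re
from functools import lru_cache

CALLS = {"tokens": 0}  # counts how often the expensive step really runs


def tokenize(text):
    """Stand-in for an expensive analysis such as a syntax parse."""
    CALLS["tokens"] += 1
    return tuple(re.findall(r"[a-z']+", text.lower()))


ANALYSES = {"tokens": tokenize}  # hypothetical registry of shared analyses


@lru_cache(maxsize=None)
def analysis(name, text):
    """Compute a named analysis once per distinct text, then reuse it."""
    return ANALYSES[name](text)


def requires(*names):
    """Hypothetical 'require' statement: inject named analyses into a rule."""
    def decorator(rule):
        def wrapped(text):
            return rule(text, **{n: analysis(n, text) for n in names})
        return wrapped
    return decorator


@requires("tokens")
def check_very(text, tokens):
    # Token positions of "very".
    return [i for i, tok in enumerate(tokens) if tok == "very"]


@requires("tokens")
def check_really(text, tokens):
    # Token positions of "really", reusing the cached tokens.
    return [i for i, tok in enumerate(tokens) if tok == "really"]


text = "It was very, very late and really cold."
very_hits = check_very(text)
really_hits = check_really(text)
```

Both rules declare the same requirement, so the tokenizer runs once for this text even though two rules consume its output.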

Choose a sensible naming/numbering scheme for errors

The convention is a capital letter and a 3-digit code. For example, pep257 uses codes like D100, D302, etc. It might be nice for us to use a 3-letter code for the source of the advice and a 3-digit code for the specific check, e.g., DFW201. The numeric codes can then be organized across sources according to higher-level categories of errors. For example, 100-level codes might be for overused words, phrases, idioms, symbols, and grammatical structures. The 200-level codes might be for nonsensical structures, such as DFW's comparing uncomparables. This fails if a particular author has more than 99 pieces of advice of a particular kind, but if we run into that problem, then we're doing great. If that happens, it might also suggest that our errors could use some compression (e.g., by merging all the overused single words into one check).

The URLs it leads to are nice and compact, too: http://lifelinter.com/DFW201.
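To make the proposed scheme concrete, here is a small sketch of how such a code could be split into its parts. The function names `parse_code` and `url_for` are hypothetical. The format (three capital letters, then a hundreds digit for the category, then a two-digit check number) follows the proposal above.

```python
import re

# Three-letter source, one category digit, two-digit check number.
CODE_RE = re.compile(r"^([A-Z]{3})(\d)(\d{2})$")


def parse_code(code):
    """Split a code such as 'DFW201' into source, category, and number."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError("not a valid error code: %r" % code)
    source, category, number = match.groups()
    return {
        "source": source,
        "category": int(category) * 100,  # e.g. 200-level checks
        "number": int(number),
    }


def url_for(code, base="http://lifelinter.com/"):
    """Build the compact URL for a code, validating it first."""
    parse_code(code)
    return base + code


parsed = parse_code("DFW201")
link = url_for("DFW201")
```

The two-digit check number is where the limit of 99 pieces of advice per author and category comes from.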

Check for common typographical issues

2 x 4 vs. 2 × 4
2-4 vs. 2–4
Bose-Einstein condensate vs. Bose–Einstein condensate
--- vs. —
+/- vs. ±

(Take a look at Jordan's typography talk for some examples.)
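Most of the pairs above could be caught with simple regular expressions. The sketch below is one possible shape, with hypothetical names (`RULES`, `check_typography`). It deliberately skips the Bose-Einstein case, since telling a joined pair of surnames from an ordinary hyphenated word needs more than a pattern. The digit-hyphen-digit rule would also flag things like dates and phone numbers, so treat it as a starting point only.

```python
import re

# (pattern, suggested replacement); replacements are written as escapes.
RULES = [
    (re.compile(r"(?<=\d )x(?= \d)"), "\u00d7"),  # 2 x 4 -> multiplication sign
    (re.compile(r"(?<=\d)-(?=\d)"), "\u2013"),    # 2-4 -> en dash
    (re.compile(r"-{3}"), "\u2014"),              # --- -> em dash
    (re.compile(r"\+/-"), "\u00b1"),              # +/- -> plus-minus sign
]


def check_typography(text):
    """Return (offset, matched text, suggestion) tuples, sorted by offset."""
    errors = []
    for pattern, suggestion in RULES:
        for match in pattern.finditer(text):
            errors.append((match.start(), match.group(), suggestion))
    return sorted(errors)


found = check_typography("A 2 x 4 board, pages 2-4, +/- 1 --- done.")
```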

Run checks in parallel

There's an opportunity to run the linter in a way that's massively parallel. The main insights here are that many of the rules can be run independently of each other and that they can be run independently on separate parts of the text (e.g., at the paragraph level).
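A rough sketch of that idea, using the standard library's `concurrent.futures`: every (paragraph, check) pair becomes an independent job. The two checks and the name `lint_parallel` are made up for illustration. A thread pool is used here only to keep the example simple; for CPU-bound checks a process pool would probably be needed to see a real speedup.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def check_very(paragraph):
    return [(m.start(), "weak intensifier")
            for m in re.finditer(r"\bvery\b", paragraph)]


def check_answer_back(paragraph):
    return [(m.start(), "redundancy")
            for m in re.finditer(r"\banswer back\b", paragraph)]


CHECKS = [check_very, check_answer_back]


def lint_parallel(text, checks=CHECKS, max_workers=4):
    """Run every check on every paragraph as an independent job."""
    paragraphs = text.split("\n\n")
    jobs = list(product(range(len(paragraphs)), checks))
    errors = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the jobs.
        results = pool.map(lambda job: job[1](paragraphs[job[0]]), jobs)
        for (index, _), found in zip(jobs, results):
            errors.extend((index, offset, msg) for offset, msg in found)
    return sorted(errors)


errors = lint_parallel("It is very good.\n\nDo not answer back.")
```

Each error is reported as (paragraph index, offset within the paragraph, message), so the results can be merged in any order.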

Create a sports detector

One of the entries in GMAU is:

answer back is a common REDUNDANCY, especially in BrE—e.g.: “Hilary and Piers du Pre seem determined to wreak the ultimate revenge on their sister by discrediting her while she lies—unable to answer back [read answer]—in her grave.” Julian Lloyd Webber, “An Insult to Jackie’s Memory,” Daily Telegraph, 4 Jan. 1999, at 15.

In AmE, the phrase is fairly common in sportswriting in the sense “to equal an opponent’s recent scoring effort”—e.g.:
• “Even when the Cougars did score, the Herd answered back in an instant.” Joe Davidson, “Herd Remain on a Roll,” Sacramento Bee, 21 Nov. 1998, at D1.
• “Jake Armstrong quickly answered back for the Knights, but the two-goal cushion was short-lived.” Joe Connor, “La Jolla, Bishop’s Tie One On in Wester,” San Diego Union-Trib., 16 Dec. 1998, at D6.

Some writers have used the sport phrase metaphorically—e.g.: “The last time somebody tried to impose prohibition on Chicago, the city answered back with Al Capone.” Peter Annin, “Prohibition Revisited?” Newsweek, 7 Dec. 1998, at 68. Despite the currency of this usage, answer can carry the entire load by itself.

LANGUAGE-CHANGE INDEX answer back for answer (outside sports): Stage 3

This pattern, where there is an exception to a rule when talking about a particular topic (or where a rule applies only when talking about the topic) will come up many times.
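As a very naive illustration of the pattern, a rule could consult a topic detector before firing. Everything here is an assumption: the keyword list, the threshold, and the names `is_about_sports` and `check_answer_back`. A real detector would need something much better than counting keywords.

```python
import re

# Hypothetical, hand-picked vocabulary; far from complete.
SPORTS_WORDS = {
    "goal", "score", "scored", "team", "game",
    "season", "coach", "inning", "touchdown",
}


def is_about_sports(text, threshold=2):
    """Crude topic test: does the text contain enough sports words?"""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(word in SPORTS_WORDS for word in words) >= threshold


def check_answer_back(text):
    """Flag 'answer back' as redundant, except in sportswriting."""
    if is_about_sports(text):
        return []
    pattern = r"\banswer(?:ed|s|ing)? back\b"
    return [m.start() for m in re.finditer(pattern, text)]


sports = "The team scored a late goal, but the Knights answered back."
other = "She was unable to answer back."
sports_hits = check_answer_back(sports)
other_hits = check_answer_back(other)
```

The same gate could be inverted for rules that apply only when the topic is present.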

Apply memoized rule checks at the paragraph level

Rules are currently defined as functions over the full text of the document. It would be better to apply the functions to each paragraph separately. The reason for this is that, for many documents (especially large ones), most of the paragraphs will not change between saves or keystrokes, such that when these functions are memoized, most of the linter computations will be available right away.
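A minimal sketch of what this could look like, assuming paragraphs are separated by a blank line and using `functools.lru_cache` as the memoizer. The rule and the `CALLS` list are hypothetical; `CALLS` exists only to show which paragraphs were actually recomputed.

```python
import re
from functools import lru_cache

CALLS = []  # records each paragraph that is actually (re)checked


@lru_cache(maxsize=None)
def check_paragraph(paragraph):
    """Hypothetical rule, memoized on the paragraph's text."""
    CALLS.append(paragraph)
    return tuple(m.start() for m in re.finditer(r"\bvery\b", paragraph))


def lint(text):
    """Check each paragraph separately and map offsets back to the document."""
    errors = []
    offset = 0
    for paragraph in text.split("\n\n"):
        errors.extend(offset + pos for pos in check_paragraph(paragraph))
        offset += len(paragraph) + 2  # skip the blank-line separator
    return errors


first = lint("It is very good.\n\nNothing here.")
second = lint("It is very good.\n\nNow very new.")
```

On the second call only the edited paragraph is recomputed; the unchanged first paragraph comes straight from the cache.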

Great writing should come back nearly clean

It would be good to include an automated test suite that runs the linter over writing by a great author that has already been heavily edited and copyedited (e.g., an essay from The New Yorker that went on to win the Pulitzer Prize in nonfiction). The linter should be nearly silent.
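One possible way to phrase "nearly silent" as a test is a ceiling on errors per 1,000 words. This is only a sketch: the check, the helper names, and the threshold of 1.0 are all placeholders, and the corpus text would come from real, well-edited essays rather than the synthetic string used here.

```python
import re


def check_very(text):
    """Hypothetical check used only to exercise the harness."""
    return [m.start() for m in re.finditer(r"\bvery\b", text)]


def error_rate(text, checks):
    """Number of errors raised per 1,000 words of text."""
    words = len(text.split())
    errors = sum(len(check(text)) for check in checks)
    return 1000 * errors / max(words, 1)


def assert_nearly_clean(text, checks, max_rate=1.0):
    """Fail if the linter is noisier than max_rate errors per 1,000 words."""
    rate = error_rate(text, checks)
    assert rate <= max_rate, "too noisy: %.2f errors per 1,000 words" % rate


# Synthetic stand-in for a corpus: 1,000 words with a single flagged phrase.
sample = "very " + "word " * 999
rate = error_rate(sample, [check_very])
assert_nearly_clean(sample, [check_very])
```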

Unincorporated clichés from GMAU

The following need some more thought before they are included.

  • "inclement weather", ?
  • "there is wide support" in politics
  • boasts as a transitive verb,
  • choreograph used figuratively,
  • giveth ... taketh away
  • orchestrate in nonmusical contexts
  • venerable when used for 'old'

It would also be good to go through all the clichés and think of variant forms that might appear.

Create a plugin system

I want a single code file for each check that:

  1. Implements the check.
  2. Includes a docstring that is autogenerated into a web page.
  3. Includes test cases that do and do not raise an error.

Here is a sample of what I'm imagining:

"""DFW001: Comparing uncomparables.

---
layout:     post
error_code: DFW201
source:     David Foster Wallace
title:      PL001: Comparing an uncomparable
date:       2014-06-10 12:31:19
summary:    Comparing an uncomparable.
categories: check

---

David Foster Wallace says:

> This is one of a class of adjectives, sometimes called "uncomparables", that
can be a little tricky. Among other uncomparables are precise, exact, correct,
entire, accurate, preferable, inevitable, possible, false; there are probably
two dozen in all. These adjectives all describe absolute, non-negotiable
states: something is either false or it's not; something is either
inevitable or it's not. Many writers get careless and try to modify
uncomparables with comparatives like more and less or intensives like very. But
if you really think about them, the core assertions in sentences like "War is
becoming increasingly inevitable as Middle East tensions rise"; "Their cost
estimate was more accurate than the other firms'"; and "As a mortician, he has
a very unique attitude" are nonsense. If something is inevitable, it is bound
to happen; it cannot be bound to happen and then somehow even more bound to
happen. Unique already means one-of-a-kind, so the adj. phrase very unique is
at best redundant and at worst stupid, like "audible to the ear" or
"rectangular in shape". You can blame the culture of marketing for some of
this difficulty. As the number and rhetorical volume of US ads increase, we
become inured to hyperbolic language, which then forces marketers to load
superlatives and uncomparables with high-octane modifiers (special --- very
special --- Super-special! --- Mega-Special!!), and so on. A deeper issue
implicit in the problem of uncomparables is the dissimilarities between
Standard Written English and the language of advertising. Advertising English,
which probably deserves to be studied as its own dialect, operates under
different syntactic rules than SWE, mainly because AE's goals and assumptions
are different. Sentences like "We offer a totally unique dining experience";
"Come on down and receive your free gift"; and "Save up to 50 per cent... and
more!" are perfectly OK in Advertising English — but this is because
Advertising English is aimed at people who are not paying close attention.
If your audience is by definition involuntary, distracted and numbed, then free
gift and totally unique stand a better chance of penetrating — and simple
penetration is what AE is all about. One axiom of Standard Written English is
that your reader is paying close attention and expects you to have done the
same.
"""

import re


def check(text):

    error_code = "PL001"
    msg = "Comparison of an uncomparable."  # do formatting thing

    comparators = [
        "very",
        "more",
        "less",
        "extremely",
        "increasingly"
    ]

    uncomparables = [
        "unique",
        "correct",
        "inevitable",
        "possible",
        "false",
        "true"
    ]

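    errors = []
    for comp in comparators:
        for uncomp in uncomparables:
            # Match whole words only, so "every unique" is not flagged.
            pattern = r"\b" + comp + r"\s" + uncomp + r"\b"
            occurrences = [m.start() for m in re.finditer(pattern, text)]
            for o in occurrences:
                # (line, offset, error code, message); line is fixed at 1.
                errors.append((1, o, error_code, msg))
    return errors


def test_raises_error():
    assert check("As a mortician, he has a very unique attitude.")


def test_raises_no_error():
    assert not check("As a mortician, he has a unique attitude.")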

Working out how I can best contribute using GitHub/git

I may need your advice on this one.

I know that making pull requests requires setting up a separate fork of the repo (or at least I think I know that), and I successfully managed to add my fork as a repo, but I'm afraid of pushing changes and overwriting anything you've done.

Or should I not worry about that? This is the kind of thing that is most frustrating about trying to work on these projects: I don't want to break anything, but I'm not always sure how to set things up so that everything correctly follows version control protocol.
