amperser / proselint
A linter for prose.
Home Page: http://proselint.com
License: BSD 3-Clause "New" or "Revised" License
Develop a systematic way to describe features in the extracted sources that are not easily implementable, with an eye to why they are hard to implement and any clues as to which sources may provide a solution.
If we need to use more computationally intensive analyses in multiple rules (e.g., nltk and syntax parsing to identify whether "while" is being used as a conjunction or an adverb), it would make more sense to memoize the output so that other rules can access it rather than rerun the analysis.
This should be a fairly general system that automatically builds the data structures so that they can be shared across individual checks, possibly with some kind of "require"-type statement present in each rule that needs them.
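A minimal sketch of how that sharing might look. The `Document` wrapper, the `analyzers` registry, and the `requires` decorator are all hypothetical names (nothing like them exists in the code yet), and the tokenizer is only a stand-in for an expensive analysis such as an nltk parse:

```python
import re


class Document:
    """Wraps a text and caches each named analysis the first time it is requested."""

    analyzers = {}  # hypothetical registry: analysis name -> function(text)

    def __init__(self, text):
        self.text = text
        self._cache = {}
        self.calls = 0  # how many analyses were actually computed

    def get(self, name):
        if name not in self._cache:
            self.calls += 1
            self._cache[name] = self.analyzers[name](self.text)
        return self._cache[name]


# Stand-in for an expensive analysis (e.g., part-of-speech tagging).
Document.analyzers["tokens"] = lambda text: re.findall(r"\w+", text.lower())


def requires(*names):
    """Declare the shared analyses a check needs; they arrive as keyword arguments."""
    def decorator(check):
        def wrapper(doc):
            return check(doc.text, **{n: doc.get(n) for n in names})
        return wrapper
    return decorator


@requires("tokens")
def count_while(text, tokens):
    return tokens.count("while")


@requires("tokens")
def count_very(text, tokens):
    return tokens.count("very")


doc = Document("While I waited, it got very, very late.")
results = (count_while(doc), count_very(doc))
```

Both checks ask for `tokens`, but the tokenizer runs only once per document; `doc.calls` stays at 1.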
The convention is a capital letter and a 3-digit code. For example, pep257 uses codes like D100, D302, etc. It might be nice for us to use a 3-letter code for the source of the advice and a 3-digit code for the specific check, e.g., DFW201. The numeric codes can then be organized across sources according to higher-level categories of errors. For example, 100-level codes might be for overused words, phrases, idioms, symbols, and grammatical structures. The 200-level codes might be for nonsensical structures, such as DFW's comparing uncomparables. This fails if a particular author has more than 99 pieces of advice of a particular kind, but if we run into that problem, then we're doing great. If that happens, it might also suggest that our errors could use some compression (e.g., by merging all the overused single words into one check).
The URLs it leads to are nice and compact, too: http://lifelinter.com/DFW201.
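A rough sketch of the proposed scheme, for illustration only. The function names and the `CATEGORIES` table are assumptions; only the 100- and 200-level meanings and the URL shape come from the proposal above:

```python
import re

# Hypothetical category table; only these two levels are described in the proposal.
CATEGORIES = {
    1: "overused words, phrases, idioms, symbols, and grammatical structures",
    2: "nonsensical structures",
}

# Three capital letters for the source, three digits for the check.
CODE_PATTERN = re.compile(r"([A-Z]{3})(\d{3})")


def parse_code(code):
    """Split a code such as 'DFW201' into (source, number, category)."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError("not a valid error code: %r" % code)
    source, number = match.group(1), int(match.group(2))
    return source, number, CATEGORIES.get(number // 100, "uncategorized")


def code_url(code, base="http://lifelinter.com/"):
    """Build the compact URL for a code, validating the code first."""
    parse_code(code)
    return base + code
```

The hundreds digit selects the category, so the same category lookup works for any source prefix.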
Stub placed in checks.
e.g., DFW on DFW.
edit
correction
rule
error
suggestion
guideline
tip
recommendation
pointer
Steps:
2 x 4 vs. 2 × 4
2-4 vs. 2–4
Bose-Einstein condensate vs. Bose–Einstein condensate
--- vs. —
+/- vs. ±
(Take a look at Jordan's typography talk for some examples.)
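Several of these pairs could probably be caught with plain regular expressions. The patterns below are illustrative guesses rather than tuned rules, and `check_typography` is a hypothetical name. The Bose–Einstein case is left out because it needs knowledge of proper names, which a regex cannot supply:

```python
import re

# (pattern, suggestion) pairs; untested against real prose, so expect false positives.
TYPOGRAPHY_RULES = [
    (re.compile(r"\d\s+x\s+\d"), "use the multiplication sign (U+00D7)"),
    (re.compile(r"\d-\d"), "use an en dash (U+2013) for numeric ranges"),
    (re.compile(r"(?<!-)---(?!-)"), "use an em dash (U+2014)"),
    (re.compile(r"\+/-"), "use the plus-minus sign (U+00B1)"),
]


def check_typography(text):
    """Return (offset, suggestion) pairs sorted by position in the text."""
    errors = []
    for pattern, suggestion in TYPOGRAPHY_RULES:
        for m in pattern.finditer(text):
            errors.append((m.start(), suggestion))
    return sorted(errors)


sample = "A 2 x 4 board, pages 2-4, give or take +/- 1 --- roughly."
found = check_typography(sample)
```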
There's an opportunity to run the linter in a way that's massively parallel. The main insights here are that many of the rules can be run independently of each other and that they can be run independently on separate parts of the text (e.g., at the paragraph level).
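One way this might be sketched with the standard library. The two checks and `lint_parallel` are placeholders, and threads are used only to keep the example simple; CPU-bound checks would likely need a process pool instead:

```python
import re
from concurrent.futures import ThreadPoolExecutor


def check_very(paragraph):
    return [(m.start(), "avoid 'very'")
            for m in re.finditer(r"\bvery\b", paragraph)]


def check_answer_back(paragraph):
    return [(m.start(), "'answer back' is redundant")
            for m in re.finditer(r"\banswer back\b", paragraph)]


CHECKS = [check_very, check_answer_back]


def lint_parallel(text, checks=CHECKS, max_workers=4):
    """Run every check on every paragraph concurrently.

    Returns a sorted list of (paragraph_index, offset_in_paragraph, message).
    """
    paragraphs = text.split("\n\n")
    # One independent job per (paragraph, check) pair.
    jobs = [(i, p, c) for i, p in enumerate(paragraphs) for c in checks]

    def run(job):
        i, paragraph, check = job
        return [(i, offset, msg) for offset, msg in check(paragraph)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(run, jobs))
    return sorted(error for batch in batches for error in batch)


text = "It was very late.\n\nDo not answer back. That is very rude."
errors = lint_parallel(text)
```

Because each job touches only one paragraph and one check, no job depends on another's result, which is what makes the fan-out safe.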
http://instruct.westvalley.edu/lafave/DFW_present_tense.html
This is a particularly nice essay because the first page or two is just a list of idioms and grammatical structures that should be avoided.
His book is so thorough and authoritative that it would be great to get him involved in some way, perhaps as an advisor. It would also be amazing if he (or Oxford University Press) allowed deeper integration with the text of his book.
http://www.amazon.com/Garners-Modern-American-Usage-Garner/dp/0195382757
One of the entries in GMAU is:
answer back is a common REDUNDANCY, especially in BrE—e.g.: “Hilary and Piers du Pre seem determined to wreak the ultimate revenge on their sister by discrediting her while she lies—unable to answer back [read answer]—in her grave.” Julian Lloyd Webber, “An Insult to Jackie’s Memory,” Daily Telegraph, 4 Jan. 1999, at 15.
In AmE, the phrase is fairly common in sportswriting in the sense “to equal an opponent’s recent scoring effort”—e.g.:
• “Even when the Cougars did score, the Herd answered back in an instant.” Joe Davidson, “Herd Remain on a Roll,” Sacramento Bee, 21 Nov. 1998, at D1.
• “Jake Armstrong quickly answered back for the Knights, but the two-goal cushion was short-lived.” Joe Connor, “La Jolla, Bishop’s Tie One On in Wester,” San Diego Union-Trib., 16 Dec. 1998, at D6.
Some writers have used the sport phrase metaphorically—e.g.: “The last time somebody tried to impose prohibition on Chicago, the city answered back with Al Capone.” Peter Annin, “Prohibition Revisited?” Newsweek, 7 Dec. 1998, at 68. Despite the currency of this usage, answer can carry the entire load by itself.
LANGUAGE-CHANGE INDEX answer back for answer (outside sports): Stage 3
This pattern, where a rule has an exception when the text is about a particular topic (or where a rule applies only when the text is about that topic), will come up many times.
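A crude sketch of a topic-gated check, using the GMAU entry above as the example. The keyword list and both function names are hypothetical, and counting keywords is only a stand-in for whatever real topic detection we end up with:

```python
import re

# Hypothetical keyword list; a placeholder for real topic detection.
SPORTS_WORDS = {"goal", "score", "scored", "team", "game", "inning", "touchdown"}


def is_about_sports(text):
    """Guess the topic: at least two distinct sports keywords appear."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & SPORTS_WORDS) >= 2


def check_answer_back_outside_sports(text):
    """Flag 'answer back' unless the text looks like sportswriting."""
    if is_about_sports(text):
        return []
    pattern = r"\banswer(?:s|ed|ing)? back\b"
    return [(m.start(), "'answer back' is redundant; use 'answer'")
            for m in re.finditer(pattern, text)]
```

Keeping the topic test in its own function means other topic-dependent rules could reuse it instead of each carrying a private keyword list.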
Rules are currently defined as functions over the full text of the document. It would be better to apply the functions to each paragraph separately. The reason for this is that, for many documents (especially large ones), most of the paragraphs will not change between saves or keystrokes, such that when these functions are memoized, most of the linter computations will be available right away.
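A small sketch of per-paragraph memoization with `functools.lru_cache`. The single "very" rule and the `calls` counter are placeholders, there only to show which paragraphs actually get recomputed:

```python
import re
from functools import lru_cache

calls = {"count": 0}  # counts cache misses, for illustration only


@lru_cache(maxsize=4096)
def lint_paragraph(paragraph):
    """Lint one paragraph; results are cached by the paragraph's text."""
    calls["count"] += 1
    return tuple((m.start(), "avoid 'very'")
                 for m in re.finditer(r"\bvery\b", paragraph))


def lint(text):
    """Lint a document paragraph by paragraph, returning document-level offsets."""
    errors = []
    offset = 0
    for paragraph in text.split("\n\n"):
        for start, msg in lint_paragraph(paragraph):
            errors.append((offset + start, msg))
        offset += len(paragraph) + 2  # +2 for the blank-line separator
    return errors


before = "It was very late.\n\nShe was tired."
after = "It was very late.\n\nShe was very tired."
first = lint(before)
misses_after_first = calls["count"]
second = lint(after)
misses_after_second = calls["count"]
```

Editing the second paragraph causes one new cache miss; the unchanged first paragraph is served from the cache.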
See, e.g., http://www.intelligentediting.com/
This is the first entry of Garner's Modern American Usage.
There's a discussion about implementing it on StackOverflow.
If I quote someone, the linter shouldn't try to correct me on their prose.
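One possible approach is to find the quoted spans first and discard any error whose offset falls inside one. This sketch assumes errors are (offset, message) pairs and handles only inline double quotes, straight or curly; block quotes and nested quotes would need more work:

```python
import re

# Straight or curly double-quoted runs; single quotes are skipped because
# they collide with apostrophes.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def quoted_spans(text):
    """Return (start, end) offsets of each quoted run."""
    return [(m.start(), m.end()) for m in QUOTED.finditer(text)]


def drop_errors_in_quotes(text, errors):
    """Keep only the errors whose offset lies outside every quoted span."""
    spans = quoted_spans(text)
    return [(offset, msg) for offset, msg in errors
            if not any(start <= offset < end for start, end in spans)]


text = 'He said "it is very unique" and I found it very odd.'
errors = [(m.start(), "avoid 'very'") for m in re.finditer(r"\bvery\b", text)]
kept = drop_errors_in_quotes(text, errors)
```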
It would be good to include an automated test suite that runs the linter over writing by a great author that has already been heavily edited and copyedited (e.g., an essay from The New Yorker that went on to win a Pulitzer Prize). The linter should be nearly silent.
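"Nearly silent" could be made testable as an error rate. In this sketch the threshold of one error per thousand words is an arbitrary assumption, the function names are hypothetical, and the repeated sentences stand in for a real corpus:

```python
import re


def check_very(text):
    return [(m.start(), "avoid 'very'") for m in re.finditer(r"\bvery\b", text)]


def errors_per_thousand_words(text, checks):
    """Total errors from all checks, normalized by document length."""
    n_words = len(text.split())
    n_errors = sum(len(check(text)) for check in checks)
    return 1000.0 * n_errors / max(n_words, 1)


def is_nearly_silent(text, checks, threshold=1.0):
    """True if the linter raises at most `threshold` errors per 1000 words."""
    return errors_per_thousand_words(text, checks) <= threshold


clean = "The river ran low that summer. " * 100
noisy = "It was very very bad. " * 100
```

A rate rather than a raw count keeps the same threshold usable for a short essay and a long one.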
The following need some more thought before including.
It would also be good to go through all the clichés and think of variant forms that might appear.
I want a single code file for each check that:
Here's a sample of what I'm imagining:
"""DFW001: Comparing uncomparables.
---
layout: post
error_code: DFW201
source: David Foster Wallace
title: PL001: Comparing an uncomparable
date: 2014-06-10 12:31:19
summary: Comparing an uncomparable.
categories: check
---
David Foster Wallace says:
> This is one of a class of adjectives, sometimes called "uncomparables", that
can be a little tricky. Among other uncomparables are precise, exact, correct,
entire, accurate, preferable, inevitable, possible, false; there are probably
two dozen in all. These adjectives all describe absolute, non-negotiable
states: something is either false or it's not; something is either
inevitable or it's not. Many writers get careless and try to modify
uncomparables with comparatives like more and less or intensives like very. But
if you really think about them, the core assertions in sentences like "War is
becoming increasingly inevitable as Middle East tensions rise"; "Their cost
estimate was more accurate than the other firms'"; and "As a mortician, he has
a very unique attitude" are nonsense. If something is inevitable, it is bound
to happen; it cannot be bound to happen and then somehow even more bound to
happen. Unique already means one-of-a-kind, so the adj. phrase very unique is
at best redundant and at worst stupid, like "audible to the ear" or
"rectangular in shape". You can blame the culture of marketing for some of
this difficulty. As the number and rhetorical volume of US ads increase, we
become inured to hyperbolic language, which then forces marketers to load
superlatives and uncomparables with high-octane modifiers (special --- very
special --- Super-special! --- Mega-Special!!), and so on. A deeper issue
implicit in the problem of uncomparables is the dissimilarities between
Standard Written English and the language of advertising. Advertising English,
which probably deserves to be studied as its own dialect, operates under
different syntactic rules than SWE, mainly because AE's goals and assumptions
are different. Sentences like "We offer a totally unique dining experience";
"Come on down and receive your free gift"; and "Save up to 50 per cent... and
more!" are perfectly OK in Advertising English — but this is because
Advertising English is aimed at people who are not paying close attention.
If your audience is by definition involuntary, distracted and numbed, then free
gift and totally unique stand a better chance of penetrating — and simple
penetration is what AE is all about. One axiom of Standard Written English is
that your reader is paying close attention and expects you to have done the
same.
"""
import re


def check(text):
    """Flag a comparator applied to an uncomparable adjective."""
    error_code = "PL001"
    msg = "Comparison of an uncomparable."  # do formatting thing

    comparators = [
        "very",
        "more",
        "less",
        "extremely",
        "increasingly",
    ]

    uncomparables = [
        "unique",
        "correct",
        "inevitable",
        "possible",
        "false",
        "true",
    ]

    errors = []
    for comp in comparators:
        for uncomp in uncomparables:
            # Raw strings keep \b and \s intact; \b stops matches inside
            # longer words (e.g., "endless possible").
            pattern = r"\b" + comp + r"\s+" + uncomp + r"\b"
            occurrences = [m.start() for m in re.finditer(pattern, text)]
            for o in occurrences:
                errors.append((1, o, error_code, msg))
    return errors


def test1():
    pass
I may need your advice on this one.
I know that doing pull requests requires setting up a separate fork of the repo (or at least I think I know that), and I successfully managed to add my fork as a repo, but I'm afraid of pushing changes and overwriting anything you've done.
Or should I not worry about that? This is the kind of thing that is most frustrating about trying to work on these projects: I don't want to break anything, but I'm not always sure how to set things up so that everything correctly follows version control protocol.