github-linguist / linguist
Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
License: MIT License
ugh, anyone with Ruby experience want to figure out why github's linguist does not consider .nim files to be the Nimrod language anymore?
I'm quite sure it fails on my comp because I have the latest Ruby version and it doesn't support it.
I don't know what I need, all I want is to get linguist to run.
I also noticed that linguist fails with an error:
custom_require.rb:36:in `require': cannot load such file -- pygments (LoadError)
But I can't find any "gem install pygments"
Do people really HAVE to use bundler in order to try linguist? I don't like bundler
at all, it messes up things in ways I don't want to. :(
All we have to find out is why linguist no longer recognizes .nim files
Nimrod:
type: programming
color: "#37775b"
primary_extension: .nim
extensions:
It should work but it does not.
(.nim are default extensions for nimrod files)
It would be nice to have a highlighter for git commits, so I could paste the output of git show
around "```commit" and it would look nice.
(p.s., I know about the diff highlighter, I'm mainly talking about making the message and the metadata before it look nice)
I started a Ruby project that's a Rails generator. In GitHub search, it's categorized as a JavaScript project. GitHub support said Linguist is behind the project categorization, so I thought I'd file an issue.
Project: https://github.com/joshcrews/flexible_admin
https://github.com/search?type=Repositories&language=&q=flexible+admin
Linguist is flagging any file with a *.n extension as Nemerle, but the extension is used by Neko binary code.
Since this is compiled code, I don't think it should be counted towards any source code total -- but it should not be flagged as Nemerle!
For example, I have a project which includes haXe source code, that compiles to a Neko application for processing Javascript, building JS projects, etc. 68% of the file total is the compiled *.n application, while the rest is the haXe source code.
It's not clear whether this limitation is intentional or if this is a side effect of the YAML loading, but it's not possible to update the Classifier instance with a new language.
I'm trying to teach new languages to the existing classifier at the smallest possible cost, and I'm trying to follow this workplan:
#gc should be the one I have to call, but according to the source, it does not do anything. Is this something you plan to implement? Do you think this is an acceptable use of your library?
Right now, I'm duck typing Language to feed Classifier#train, and this seems to be enough for it to work. Because the Classifier is not dependent on Language at all, maybe #train could simply take a String as parameter (and #classify could return Strings too). This would greatly simplify the interop with your lib :-)
Following, a simple test-case and patch that allows the test-case to pass.
Cheers,
Pierre.
The Classifier should be able to pick up on shebang scripts and detect them correctly.
Try to get our current mime-type extensions pushed upstream to the mime-types lib. Then try to decouple integration from Linguist. Language detection shouldn't be dependent on any sort of mime type.
There's already a linguist gem, so we'll take github-linguist.
We need to check these files' contents. See this repo's tests. They're not Perl.
I found the place where #! files are analyzed for the right language, but I don't see any way to extend it. In our case, the simplest way to identify a Racket file would be to look for a #lang line (see example here). A less precise but possibly more broadly useful heuristic is to look for an exec foo line near the top of the file.
Either way, it's not clear whether this is intended to be customizable, and if so, how to do it.
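The two heuristics described above (a #lang line, or an exec foo line under a plain sh shebang) could be sketched roughly as follows. This is only an illustration: `guess_from_header`, the `INTERPRETERS` table, and the five-line window are hypothetical names and choices, not part of Linguist.

```ruby
# Hypothetical helper, not part of Linguist: guess a language from the
# first few lines of a script.
INTERPRETERS = {
  "ruby" => "Ruby", "python" => "Python", "perl" => "Perl",
  "sh" => "Shell", "bash" => "Shell", "racket" => "Racket"
}.freeze

def guess_from_header(source, max_lines = 5)
  lines = source.lines.first(max_lines).map(&:strip)
  first = lines.first.to_s

  # 1. A "#lang" line near the top is treated as a Racket module.
  return "Racket" if lines.any? { |l| l.start_with?("#lang ") }

  # 2. Otherwise a shebang names the interpreter, possibly via /usr/bin/env.
  return nil unless first.start_with?("#!")
  words = first.sub(/\A#!\s*/, "").split
  words.shift if File.basename(words.first.to_s) == "env"
  name = File.basename(words.first.to_s)

  # 3. Under a plain sh/bash shebang, an "exec foo" line may name the
  #    interpreter that actually runs the rest of the file.
  if %w[sh bash].include?(name)
    exec_line = lines.find { |l| l.start_with?("exec ") }
    if exec_line
      target = File.basename(exec_line.split[1].to_s)
      return INTERPRETERS[target] if INTERPRETERS.key?(target)
    end
  end

  INTERPRETERS[name]
end
```

Keeping the interpreter table as plain data is one way such a check could be made customizable, since adding a language would not require touching the detection logic.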
foundation is detected as ~75% PHP.
But the PHP files in foundation contain a lot of HTML and only one to three PHP instructions.
It should be detected as ~70% HTML and ~5% PHP.
create a repo like mine: https://github.com/borgified/linguist-test
Hello,
I have a repository in Github, the Refu Library, which is a pure C project. For some reason the majority of the source files are identified as Objective C and so the project itself is tagged as Objective C. Here is the repository:
http://github.com/LefterisJP/Refu/
I have no knowledge of Ruby so I can't understand how the Linguist project works to find the problem. Any assistance with this matter will be appreciated.
Prolog files are once again misclassified as Perl files. The disambiguation code seems to have been removed. The current specs for Prolog define "primary_extension" as ".prolog", which nobody in the Prolog programming community uses or has ever used. The default extension for Prolog is ".pl" (since long before Perl existed). How can we get the disambiguation functionality back?
We've submitted our pull request to Pygments to add Lasso as a programming language, and it's been accepted! Lasso now has a lexer:
https://bitbucket.org/birkenfeld/pygments-main/pull-request/95/new-lexer-for-the-lasso-language
What do I need to do next to get Lasso added into Linguist? I need to know which files I should edit for my pull request. Thank you!
Draft Blog Post.
Write up a more complete README.
I think that all *.elf files should be marked as binary automatically (without reading the file)
At the moment it is recognized as Perl.
edit: Spelling. Both English and Perl are not my native language ;-)
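Marking files as binary by extension alone, without reading them, could look something like this. It is a minimal sketch: the `BINARY_EXTENSIONS` list and the `binary_by_extension?` helper are hypothetical, and a real list would need to be curated by the maintainers.

```ruby
require "set"

# Hypothetical extension list, for illustration only.
BINARY_EXTENSIONS = Set.new(%w[.elf .o .so .exe .png .zip]).freeze

# Decide from the file name alone; the file is never opened.
def binary_by_extension?(path)
  BINARY_EXTENSIONS.include?(File.extname(path).downcase)
end
```

A check like this would run before any content-based detection, so an .elf file would never reach the step that currently reports it as Perl.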
Linguist is getting Verilog and Coq confused (see Verilog projects
included in https://github.com/languages/Coq and Coq projects included
in https://github.com/languages/Verilog). Both use .v files. I've gone
through the commit history and the first place that I can get it to
fail is at 4484011, however it may be
failing one commit before that at
c114d71. I can't tell for the latter
commit as that fails the Matlab / obj-c case first. Everything passes
if you go one commit earlier.
I'm using some of my Verilog files to test it, specifically, the files
sitting in https://github.com/seldridge/verilog, and linguist just
isn't having it. Linguist continues to pass for the one test file
(sha-256-functions.v) currently in use. I'm no Ruby guy, so I haven't
attempted to look into this in any significant depth beyond the regex
in blob_helper.rb. This doesn't seem to be the issue as it's picking
up the important matches in my testcases, namely comment structure and
the "module" keyword.
https://github.com/github/linguist/blob/master/lib/linguist/repository.rb#L27
Repository requires all the repo blobs be allocated at once. We need to defer this for larger repos.
Pygments now supports Coq .v files. See https://bitbucket.org/birkenfeld/pygments-main/issue/734/support-for-coq
Would it be possible to get this into Github?
Thanks.
http://pygments.org/download/ -- Release 1.5 "Zeitdilatation" is out!
Depends on pygments/pygments.rb#15, unless linguist is still using github/albino in production.
#129 depends on this issue.
Some repositories (like SignalR) have samples that include common JavaScript libraries like jQuery, and GitHub ends up classifying the project as JavaScript instead of C# (in this particular case). Nothing is wrong with this at a high level, since jQuery is JavaScript, but project maintainers who want more control over statistics need a way to opt out of this behavior.
I see 2 options:
https://github.com/mishoo/UglifyJS/pull/172/files
Some JS files with just a couple long lines are getting marked as minified.
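To illustrate why line length alone is a weak signal, here is a sketch of a heuristic that also looks at how much whitespace the file contains. It is not Linguist's actual rule; `probably_minified?` and both thresholds are assumptions made up for this example.

```ruby
# Hypothetical heuristic: treat a file as minified only when its lines are
# long on average AND it contains very little whitespace.
def probably_minified?(source, max_avg_line: 110, max_whitespace_ratio: 0.1)
  lines = source.lines.map(&:chomp).reject(&:empty?)
  return false if lines.empty?

  total_chars = lines.sum(&:length)
  avg_length  = total_chars.to_f / lines.size
  # Fraction of characters that are spaces or tabs.
  whitespace  = lines.sum { |l| l.count(" \t") }.to_f / total_chars

  avg_length > max_avg_line && whitespace < max_whitespace_ratio
end
```

With a rule of this shape, a hand-written file holding a couple of long string literals would pass the length test but fail the whitespace test, so it would not be flagged.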
I'm receiving the following error when I try to install linguist via bundle:
linguist at /usr/lib64/ruby/gems/1.9.1/bundler/gems/linguist-d8903afc12b1 did not have a valid gemspec.
This prevents bundler from installing bins or native extensions, but that may not affect its functionality.
The validation message from Rubygems was:
authors may not be empty
If I clone linguist locally and add an authors line to the .gemspec file, it works fine.
I'm on ruby 1.9.1
It would be nice if Linguist could read a .linguist-ignore file (or any other name) at the root of the project, so that some files are not processed. These files (which can be either auto-generated or imported) are usually not in the same language as the main project and may eventually become quite big, making the statistics completely wrong.
If you think this feature is useful, I'm happy to propose a patch.
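A rough sketch of how such an ignore file might be applied is shown below. The file format (one glob per line, "#" for comments) and the helpers `parse_ignore` and `ignored?` are assumptions for illustration; nothing here is an existing Linguist feature.

```ruby
# Hypothetical format: one glob pattern per line; blank lines and lines
# starting with "#" are skipped.
def parse_ignore(text)
  text.lines.map(&:strip).reject { |l| l.empty? || l.start_with?("#") }
end

# File.fnmatch? is called without FNM_PATHNAME, so "*" also matches "/"
# and a pattern like "vendor/*" covers everything below vendor/.
def ignored?(path, patterns)
  patterns.any? { |pattern| File.fnmatch?(pattern, path) }
end

patterns = parse_ignore(<<~IGNORE)
  # third-party and generated code
  vendor/*
  *.min.js
IGNORE

paths = ["lib/app.rb", "vendor/jquery/jquery.js", "public/js/app.min.js"]
counted = paths.reject { |p| ignored?(p, patterns) }
```

Only the paths left in `counted` would then contribute to the language statistics.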
In the Readme, the example is:
Linguist::Blob.new("linguist.rb")
But that class does not exist.
An escript bundle is a compressed Erlang script. Linguist incorrectly detects it as JavaScript:
$ file ./rebar
./rebar: a escript script text executable
$ linguist ./rebar
./rebar: 0 lines (0 sloc)
type: Binary
mime type: text/plain
language: JavaScript
$
...so many Erlang projects that ship with the rebar build tool script may be detected as JavaScript projects although they are pure Erlang!
Hello,
A few weeks ago (remember #208?) we added MaxMSP samples in the JSON folder, but now the files are detected as JavaScript. A MaxMSP patcher is a graph of objects, dynamically loaded at runtime; it is saved as JSON but has nothing to do with JavaScript.
IMHO the only solution would be to add extensions to "languages.yml": ".mxt" is the old format (Max 4); since Max 5 the extensions are ".maxpat" and ".maxhelp".
Since the new language breakdown bar was introduced, I keep seeing the Modelica language in most of my repositories, even if I didn't even know such a language existed.
Example: http://i.imgur.com/akW7P.png
Travisbot failed this request: #216
To be honest, fairly new to Github and while it looked like contributing to linguist would prove straightforward, something has clearly gone awry. Any idea what?
The last part of the README file talks about using some bundle thing, which I guess is some Ruby utility. Maybe add a more exact description for the uninitiated masses?
Could you please update your pygments. There is an updated version of the autohotkey lexer in it that is much better.
https://bitbucket.org/birkenfeld/pygments-main/changeset/1c549d7cb1db
Thanks
The syntax of Twig templates is equivalent to that of Jinja (but for PHP projects instead of Python ones), so it could probably be done by reusing the Jinja lexer.
Twig is the default templating engine for Symfony2 (which uses GitHub), so it would help a lot to have proper highlighting for .twig files.
I've seen that you treat Matlab's extension as .matlab; however, it is common to use .m (one of the standard extensions).
I know this conflicts with Objective-C's .m files, but it would be interesting to have an option to run syntax checks to guess the language in ambiguous cases.
This is confusing to me, as I have both Objective-C and Matlab repositories.
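The kind of syntax check suggested above could be as simple as counting telltale constructs from each language. The following is a sketch under that assumption; `guess_m_language` and the marker patterns are hypothetical and far from exhaustive.

```ruby
# Hypothetical disambiguation for ".m" files: count lines that look like
# Objective-C directives versus lines that look like Matlab.
def guess_m_language(source)
  objc   = source.scan(/^\s*(?:#import\b|@interface\b|@implementation\b|@end\b)/).size
  matlab = source.scan(/^\s*(?:%|function\b|end\s*$)/).size

  return "Objective-C" if objc > matlab
  return "Matlab" if matlab > objc
  nil # undecided; a caller would fall back to some other rule
end
```

Returning nil on a tie keeps the heuristic honest: when neither set of markers dominates, the decision is left to whatever default the caller prefers.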
Creating file blobs can fail because the file contents may be in an unexpected encoding. This issue should only be present in Ruby 1.9+, as Ruby 1.8 did not track string encodings.
A temporary solution is to do this in file_blob.rb:
# Public: Read file contents.
#
# Returns a String.
def data
  File.read(@path).encoding.to_s
end
The only thing is that the test cases fail now.
Note: If this project was only intended to work with Ruby 1.8, then disregard this.
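For comparison, one encoding-tolerant way to read a file in Ruby 1.9+ is to read raw bytes first and only then decide how to tag them. This is a sketch of that general technique, not the fix adopted by the project; `read_text` is a hypothetical helper.

```ruby
# Read the file as raw bytes, then tag the result as UTF-8 only when the
# bytes really are valid UTF-8. Invalid input is returned as binary data
# instead of raising later during string operations.
def read_text(path)
  data = File.binread(path) # String tagged as ASCII-8BIT (binary)
  utf8 = data.dup.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? utf8 : data
end
```

Unlike the snippet above, this returns the file contents themselves, so callers that expect a String of source text keep working.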
Play framework is a Java framework, and I believe at least 90% of its SLOC is Java. However, it shows that 76% of it is Python. What could possibly be wrong?
There should be support for .wsgi files. They contain Python code, so it's just another Python file extension.
links: http://en.wikipedia.org/wiki/Wsgi , http://www.python.org/dev/peps/pep-3333/
Check out the md, txt, and zip files in this repo. They all contain the same content, but the zip file is presented as a binary would be. That's not right!
Add .psd1 (module manifest) into the PowerShell syntax group
Hello,
The documentation states that it should return floats. On my installation, it returns negative numbers:
[[#<Linguist::Language name=PHP>, -66.98989614319586],
[#<Linguist::Language name=JavaScript>, -68.77510897386178],
[#<Linguist::Language name=Ruby>, -70.7837674453772],
[#<Linguist::Language name=Perl>, -71.16156437444059],
[#<Linguist::Language name=Gosu>, -72.90117504252562],
[#<Linguist::Language name=Python>, -73.0532406574862],
[#<Linguist::Language name=Objective-C>, -74.10993364147689],
[#<Linguist::Language name=TeX>, -77.81775680913668],
[#<Linguist::Language name=Java>, -78.66295010514327],
[#<Linguist::Language name=Kotlin>, -79.19112391377584],
[#<Linguist::Language name=Scala>, -79.596874273976],
[#<Linguist::Language name=C++>, -80.16597822216151],
[#<Linguist::Language name=CoffeeScript>, -83.44077180874064],
[#<Linguist::Language name=Apex>, -83.80881093343098],
[#<Linguist::Language name=C>, -85.47097078986161],
[#<Linguist::Language name=AppleScript>, -85.68956917025051],
[#<Linguist::Language name=SCSS>, -86.60214237229394],
[#<Linguist::Language name=Groovy>, -86.89541966825266],
[#<Linguist::Language name=Shell>, -87.43588353355483],
[#<Linguist::Language name=Dart>, -87.459050333217],
[#<Linguist::Language name=Coq>, -88.6740351917743],
[#<Linguist::Language name=Rust>, -93.09294395196528],
[#<Linguist::Language name=Nemerle>, -93.21419319559817],
[#<Linguist::Language name=PowerShell>, -93.51902834727619],
[#<Linguist::Language name=Arduino>, -93.5392310545937],
[#<Linguist::Language name=Opa>, -93.78609113252523],
[#<Linguist::Language name=XQuery>, -93.83645881136175],
[#<Linguist::Language name=R>, -94.21217552783614],
[#<Linguist::Language name=Delphi>, -94.35016127081002],
[#<Linguist::Language name=SuperCollider>, -94.40855958019455],
[#<Linguist::Language name=Verilog>, -94.8229388269385],
[#<Linguist::Language name=OpenCL>, -96.50244013644215],
[#<Linguist::Language name=Groovy Server Pages>, -96.56948552051941],
[#<Linguist::Language name=Racket>, -97.8652823987905],
[#<Linguist::Language name=OCaml>, -99.6352432871025],
[#<Linguist::Language name=Matlab>, -101.76930665936734],
[#<Linguist::Language name=XML>, -101.8170795450655],
[#<Linguist::Language name=Haml>, -102.25666430330622],
[#<Linguist::Language name=Scilab>, -102.64814316943966],
[#<Linguist::Language name=INI>, -102.66212941141441],
[#<Linguist::Language name=Logtalk>, -103.5329577692118],
[#<Linguist::Language name=GAS>, -103.96895960118005],
[#<Linguist::Language name=Sass>, -104.20257445236155],
[#<Linguist::Language name=Turing>, -104.82161366076778],
[#<Linguist::Language name=OpenEdge ABL>, -105.1428606897919],
[#<Linguist::Language name=VimL>, -112.11353183520714],
[#<Linguist::Language name=Standard ML>, -112.11353183520714],
[#<Linguist::Language name=Nu>, -112.80667901576709],
[#<Linguist::Language name=Parrot Assembly>, -112.80667901576709],
[#<Linguist::Language name=Scheme>, -112.80667901576709],
[#<Linguist::Language name=Julia>, -112.80667901576709],
[#<Linguist::Language name=Ioke>, -112.80667901576709],
[#<Linguist::Language name=Rebol>, -112.80667901576709],
[#<Linguist::Language name=Parrot Internal Representation>, -112.80667901576709],
[#<Linguist::Language name=Emacs Lisp>, -112.80667901576709],
[#<Linguist::Language name=Tea>, -112.80667901576709],
[#<Linguist::Language name=Nimrod>, -112.80667901576709],
[#<Linguist::Language name=VHDL>, -112.80667901576709],
[#<Linguist::Language name=Diff>, -112.80667901576709],
[#<Linguist::Language name=Markdown>, -112.80667901576709],
[#<Linguist::Language name=Visual Basic>, -112.80667901576709],
[#<Linguist::Language name=Prolog>, -112.80667901576709],
[#<Linguist::Language name=AutoHotkey>, -112.80667901576709],
[#<Linguist::Language name=XSLT>, -112.80667901576709],
[#<Linguist::Language name=YAML>, -112.80667901576709]]
Still the results are in the correct order...
ruby --version
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]
The same behavior on x86_64 linux.
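Negative scores that still sort correctly are what one would expect if the classifier adds up logarithms of probabilities rather than multiplying the probabilities themselves. That is an assumption about the implementation, not something stated above. The toy model below, with made-up probabilities and hypothetical names `PROBS`, `log_score`, and `rank`, shows the effect.

```ruby
# Toy per-token probabilities for two languages (invented numbers).
PROBS = {
  "Ruby" => { "def" => 0.20, "end" => 0.30, "puts" => 0.10 },
  "PHP"  => { "def" => 0.01, "end" => 0.02, "puts" => 0.01 }
}.freeze

# Sum of log-probabilities. Each probability is below 1, so each log is
# negative and the total is negative too.
def log_score(language, tokens, unseen: 0.001)
  tokens.sum { |t| Math.log(PROBS[language].fetch(t, unseen)) }
end

# Highest (least negative) score first.
def rank(tokens)
  PROBS.keys.map { |lang| [lang, log_score(lang, tokens)] }
       .sort_by { |_, score| -score }
end

ranking = rank(%w[def puts end])
```

Because the logarithm is monotonic, ranking by summed logs gives the same order as ranking by the product of probabilities, while avoiding floating-point underflow on long inputs.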
Hi,
I'm trying to train the Classifier and hence to serialize it to disk. I run into an issue while trying to serialize the default Classifier:
irb(main):006:0> Linguist::Classifier.instance.to_yaml($STDOUT)
ArgumentError: comparison of Array with Array failed
from /home/oct/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/github-linguist-2.0.1/lib/linguist/classifier.rb:172:in `sort'
from /home/oct/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/github-linguist-2.0.1/lib/linguist/classifier.rb:172:in `block in to_yaml'
from /home/oct/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/github-linguist-2.0.1/lib/linguist/classifier.rb:170:in `each'
from /home/oct/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1/gems/github-linguist-2.0.1/lib/linguist/classifier.rb:170:in `to_yaml'
from (irb):6
from /home/oct/.rbenv/versions/1.9.3-p194/bin/irb:12:in `<main>'
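For context, this is the error Ruby's Array#sort raises when two elements cannot be ordered, for example nested arrays whose members are of different classes. Whether that is what happens at classifier.rb:172 is not confirmed here; the snippet only reproduces the message with made-up data and shows one possible workaround.

```ruby
# Two rows are equal in their first element, so sort must compare
# "def" with :def, and String <=> Symbol returns nil.
rows = [["Ruby", "def"], ["Ruby", :def], ["PHP", 3]]

begin
  rows.sort
rescue ArgumentError => e
  message = e.message
end

# Possible workaround: sort by a key made only of Strings, which are
# always comparable with each other.
sorted = rows.sort_by { |row| row.map(&:to_s) }
```

If the real data mixes key types in a similar way, normalizing the sort key like this would let serialization proceed without changing the stored values.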
Shall we add highlighting for it? https://github.assistly.com/agent/case/2839
Compile in Linux this simple assembly program using ("as exit.s -o exit.o;ld exit.o -o exit;rm exit.o"):
.section .data
.section .text
.globl _start
_start:
movq $111, %rdi
movq $60, %rax
syscall
And run "bundle exec linguist folder" you will see this:
88% Perl
12% Assembly
My Github repo is not getting any graph data. This is built on approx 98% PHP and a little bit of Javascript. Not sure why I am not getting stats anymore (I used to).
-Chris
Drop Linguist::Pathname.
I can't understand why Linguist detects my main project language as Objective-C. It's written entirely in C++ (Qt). I don't know Ruby, so I can't find the problem. Can anyone help me?
P.S. My project does not have any *.mm or *.m files. It has only *.h, *.cpp, *.ui, *.qrc, *.css, *.png files.
P.P.S. The problem is in the GitHub "language color bar" (at the top right of the repo page). The main language is OK.
I didn't see a way to pass options to each lexer from languages.yml, but it would be great to have the startinline option in Pygments turned on for PHP. See "Lexers for web-related languages and markup" under PhpLexer:
startinline
If given and True, the lexer starts highlighting with PHP code (i.e., no starting <?php required). The default is False.
Ideally, this sample snippet of PHP code from the Symfony2 project would be highlighted with ```php without having to include <?php:
/**
 * Client simulates a browser and makes requests to a Kernel object.
 *
 * @author Fabien Potencier <[email protected]>
 *
 * @api
 */
class Client extends BaseClient
{
    protected $kernel;

    /**
     * Constructor.
     *
     * @param HttpKernelInterface $kernel    An HttpKernel instance
     * @param array               $server    The server parameters (equivalent of $_SERVER)
     * @param History             $history   A History instance to store the browser history
     * @param CookieJar           $cookieJar A CookieJar instance to store the cookies
     */
    public function __construct(HttpKernelInterface $kernel, array $server = array(), History $history = null, CookieJar $cookieJar = null)
    {
        $this->kernel = $kernel;
        parent::__construct($server, $history, $cookieJar);
        $this->followRedirects = false;
    }
}