
facebook / nailgun

Nailgun is a client, protocol, and server for running Java programs from the command line without incurring the JVM startup overhead.

Home Page: https://github.com/facebook/nailgun

License: Other

Java 71.84% Makefile 0.46% C 9.34% Python 18.36%

nailgun's Introduction

nailgun



Note: Nailgun is based on original code developed by Marty Lamb. In October 2017, Marty transferred the repository to Facebook, where it was maintained by the Buck1 team. In April 2023, Buck1 was deprecated in favor of Buck2, which does not use Nailgun. As a result, this repository is now unmaintained.

Nailgun remains available under the Apache license, version 2.0.


Build and Installation

Nailgun is a client, protocol, and server for running Java programs from the command line without incurring the JVM startup overhead.

Programs run in the server (which is implemented in Java), and are triggered by the client (written in C), which handles all I/O.

The server and examples are built using Maven. From the project directory, run "mvn clean install".

The client is built using make. From the project directory, run "make && sudo make install". To create the Windows client, you additionally need to run "make ng.exe".

This repository contains implementations of a nailgun client in Python and in C.

For additional client implementations in other languages, see:

  • snailgun, a client implementation written in Scala that compiles to native.
  • railgun, a client implementation written in Ruby.

For more information, see the nailgun website.

License

Apache License 2.0


nailgun's People

Contributors

bhamiltoncx, blaisorblade, drslump, dwightguth, eed3si9n, gaul, henrich, ilya-klyuchnikov, jimpurbrick, jsmucr, jvican, kwlzn, lauri-elevant, lhns, martylamb, mrk-andreev, nataliejameson, natthu, ndmitchell, sbalabanov, sbalabanov-zz, sethp-jive, styurin, timuralp, ttsugriy, valencik, vemv, vhristov, xavierd, zpao


nailgun's Issues

Exception after the first request in casting Apache Felix ThreadInputStream

I'm trying to use jclouds-cli with nailgun and after the first request exits successfully, subsequent requests fail with:

Exception in thread "NGSession 2: 127.0.0.1: org.jclouds.cli.runner.Main" java.lang.ClassCastException: org.apache.felix.gogo.runtime.threadio.ThreadInputStream cannot be cast to com.martiansoftware.nailgun.ThreadLocalInputStream

I'm trying to figure this out, but if someone more familiar with Java/Apache Felix has an idea of what's going on, and can point me in the right direction, that would be awesome.

Nailgun client read future was interrupted

Hello, we're using nailgun in bloop: http://github.com/scalacenter/bloop.

We're using latest master for both the server and the python plugin, and we're getting this warning all the time in the server logs (after every executed command).

Dec 12, 2017 12:40:41 PM com.martiansoftware.nailgun.NGInputStream$1 run
WARNING: Nailgun client read future was interrupted
java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
	at java.util.concurrent.FutureTask.get(FutureTask.java:204)
	at com.martiansoftware.nailgun.NGInputStream$1.run(NGInputStream.java:91)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Why is this happening? How can we make this warning disappear?

Static variables shared across instances

When run directly under a normal JVM, the following code just prints "Hello, World". When run under nailgun, it prints one more "World" each time it is run.

public class HelloWorld {
    static String x="Hello, ";
    public static void main(String[] args) {
        x=x+"World";
        System.out.println(x);
    }
}

I doubt this is a "code bug" that can be fixed, but is nevertheless a trap for the unwary that I think should be documented. It could also be documented that the main() function should be threadsafe as nailgun does not appear to serialize the requests but instead starts multiple threads at a time.
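The effect described above can be reproduced without Nailgun by calling the same code twice in one JVM, which is roughly what a long-lived server does. The following is a minimal sketch under that assumption; `SharedStateDemo`, `leakyRun`, and `isolatedRun` are hypothetical names used only for illustration.

```java
// Hypothetical demo class; not part of Nailgun.
final class SharedStateDemo {
    // Static state lives as long as the class stays loaded in the JVM.
    static String greeting = "Hello, ";

    // Mirrors the reported program: every call appends to the shared static.
    static String leakyRun() {
        greeting = greeting + "World";
        return greeting;
    }

    // Variant that keeps all mutable state local to a single invocation.
    static String isolatedRun() {
        String local = "Hello, ";
        return local + "World";
    }

    public static void main(String[] args) {
        // Simulate two client requests served by the same long-lived JVM.
        System.out.println(leakyRun());
        System.out.println(leakyRun());
        System.out.println(isolatedRun());
        System.out.println(isolatedRun());
    }
}
```

Keeping mutable state in locals (or re-initializing statics at the top of `main`) avoids the accumulation. It also sidesteps most of the thread-safety concern mentioned above, since locals are never shared between concurrent requests.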

Nailgun does not respect remote environment properties in session.

I am passing some system properties to nailgun client.
However, those system properties are not set in the jvm.

For example, if I make a call with these environment properties:
{
'Nailgunproperty': 'true'
'someproperty' : 'sometext'
}
my main class needs the property to be set as if it had been passed via the JVM -D flag.

When I do not use the nailgun server, I usually run the main class like this:

jvm -Dsomeproperty=sometext main_class

NGSessionPool size is always 0

Javadoc:

/**
 * number of sessions to store in the pool
 */
final int poolSize;

And in the constructor:
this.poolSize = Math.min(0, poolsize);
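To see why the pool size can never be positive, compare `Math.min` with `Math.max` on a few inputs. The sketch below assumes the intent was to clamp negative sizes to zero; that intent is a guess, and `PoolSizeDemo` and its methods are hypothetical names.

```java
// Hypothetical demo class; not part of Nailgun.
final class PoolSizeDemo {
    // The expression quoted in the report: never larger than 0.
    static int asReported(int requested) {
        return Math.min(0, requested);
    }

    // Assumed intent: reject negative sizes but keep positive ones.
    static int clampedAtZero(int requested) {
        return Math.max(0, requested);
    }

    public static void main(String[] args) {
        for (int requested : new int[] {-1, 0, 10}) {
            System.out.println(requested + " -> reported=" + asReported(requested)
                    + ", clamped=" + clampedAtZero(requested));
        }
    }
}
```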

UTF-8 arguments are not passed on correctly

Command line arguments that contain non-ASCII characters are not passed on correctly via nailgun.

Consider the following simple class:

package de.thorstenvitt.nailgun;
public class EchoArguments {
    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            System.out.printf("%2d: %s\n", i, args[i]);
        }
    }
}

When calling this using ng de.thorstenvitt.nailgun.EchoArguments über in a UTF-8 locale, it outputs

 0: ��ber
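The two replacement characters are consistent with the two UTF-8 bytes of "ü" being decoded with a charset that cannot represent them. That is an assumption about the cause, not a diagnosis of the Nailgun code. The sketch below only reproduces the symptom in plain Java; `ArgumentCharsetDemo` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Nailgun.
final class ArgumentCharsetDemo {
    // Decode raw argument bytes with the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "\u00FCber" is "über"; the escape keeps this source file ASCII-only.
        byte[] raw = "\u00FCber".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each non-ASCII byte becomes U+FFFD.
        System.out.println(decode(raw, StandardCharsets.US_ASCII));
        // Matching charset: the original text round-trips.
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```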

data race on field com.martiansoftware.nailgun.NGServer.shutdown

Two race reports on this:

Data race on field com.martiansoftware.nailgun.NGServer.shutdown: {{{
    Concurrent read in thread T13 (locks held: {})
 ---->  at com.martiansoftware.nailgun.NGServer.run(NGServer.java:416)
    T13 is created by T1
        at org.kframework.kserver.KServerFrontEnd.run(KServerFrontEnd.java:72)

    Concurrent write in thread T47 (locks held: {Monitor@77ca2286})
 ---->  at com.martiansoftware.nailgun.NGServer.shutdown(NGServer.java:320)
        - locked Monitor@77ca2286 at com.martiansoftware.nailgun.NGServer.shutdown(NGServer.java:316) 
        at org.kframework.kserver.KServerFrontEnd.nailMain(KServerFrontEnd.java:130)
        at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
    T47 is created by T13
        at com.martiansoftware.nailgun.NGSessionPool.take(NGSessionPool.java:85)
}}}

Data race on field com.martiansoftware.nailgun.NGServer.shutdown: {{{
    Concurrent read in thread T13 (locks held: {})
 ---->  at com.martiansoftware.nailgun.NGServer.run(NGServer.java:426)
    T13 is created by T1
        at org.kframework.kserver.KServerFrontEnd.run(KServerFrontEnd.java:72)

    Concurrent write in thread T47 (locks held: {Monitor@77ca2286})
 ---->  at com.martiansoftware.nailgun.NGServer.shutdown(NGServer.java:320)
        - locked Monitor@77ca2286 at com.martiansoftware.nailgun.NGServer.shutdown(NGServer.java:316) 
        at org.kframework.kserver.KServerFrontEnd.nailMain(KServerFrontEnd.java:130)
        at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
    T47 is created by T13
        at com.martiansoftware.nailgun.NGSessionPool.take(NGSessionPool.java:85)
}}}

The spinning loop here is broken because NGServer.shutdown is neither volatile nor an atomic variable, and it is not protected by a lock. Judging from the code, you probably want NGServer.shutdown to be set to true exactly once. In that case, I would say it is better to declare shutdown as an AtomicBoolean.
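A minimal sketch of the AtomicBoolean suggestion follows. It is not the NGServer code: `ShutdownFlagDemo`, `requestShutdown`, and `isShutdown` are hypothetical names, and the accept loop is reduced to a spin on the flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not part of Nailgun.
final class ShutdownFlagDemo {
    private final AtomicBoolean shutdown = new AtomicBoolean(false);

    // Returns true only for the caller that actually flips the flag,
    // so shutdown work can be done exactly once.
    boolean requestShutdown() {
        return shutdown.compareAndSet(false, true);
    }

    // Reads are visible across threads without holding a lock.
    boolean isShutdown() {
        return shutdown.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ShutdownFlagDemo server = new ShutdownFlagDemo();

        // Stand-in for the server loop that polls the flag.
        Thread acceptLoop = new Thread(() -> {
            while (!server.isShutdown()) {
                Thread.yield();
            }
        });
        acceptLoop.start();

        System.out.println(server.requestShutdown()); // first caller wins
        System.out.println(server.requestShutdown()); // later callers are no-ops
        acceptLoop.join();
    }
}
```

A plain `volatile boolean` would also make the polling loop correct. `compareAndSet` additionally tells a caller whether it was the one that triggered the shutdown.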

mvn clean install does not work as it asks for gpg key

[INFO] Not executing Javadoc as the project is not a Java classpath-capable package
[INFO] 
[INFO] --- maven-gpg-plugin:1.4:sign (sign-artifacts) @ nailgun-all ---
GPG Passphrase: *
gpg: no default secret key: bad passphrase
gpg: signing failed: bad passphrase
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] nailgun-server .................................... SUCCESS [4.251s]
[INFO] nailgun-examples .................................. SUCCESS [0.873s]
[INFO] nailgun-all ....................................... FAILURE [14.269s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.512s
[INFO] Finished at: Wed Nov 28 18:34:50 EET 2012
[INFO] Final Memory: 9M/81M
[INFO] ------------------------------------------------------------------------

The README says the server should be created with mvn clean install but this is not possible without configuring gpg first.

Disable broken alias support in VMS in favour of defining foreign command aliases

On OpenVMS, symlinks are not supported. Command aliases are easily created using foreign commands, but those foreign commands are expanded by the time they reach argv[0]. See:

http://h71000.www7.hp.com/commercial/c/docs/5492p007.html

Thus, the alias defined in the foreign command should include the alias class argument, something that can be easily done in foreign commands that cannot be done with symlinks.

I have patched the client for VMS here, and tested

bg@5b2b1d8

If that looks good, I'll issue a pull request.

Thanks,
Ben
P.S. Yesterday, @headius filed #27 after he helped me figure out what was preventing my DCL wrapper for launching the nailgun server from working. Paired with this fix, JRuby 1.7.9 on an HP rx2600 on OpenVMS V8.2 improves from ~11.5s to ~2s to execute "jruby -e 'puts %[hello world]'" ... Impressive!

connect: Connection refused

When I start a server on 127.0.0.1 on port 2114, it starts successfully. But when I run the command ng Classfile, it tells me "connect: Connection refused".

If I start a server without an address and port, the client works fine and prints the output. What's the problem?

Shipping Nailgun with a JVM?

I would like to use Packr to remove the JVM deployment dependency from my application, but I would also like to keep the benefits of Nailgun.

Is it currently possible to ship Nailgun with a JVM?

add git tags

Tags for the different versions would be nice. It makes it easier to make and maintain build scripts for example.

You can push git tags with git push --tags

Missing license

Nailgun used to ship with a license, which has been missing since the transition to GitHub. Could you please add the license back? Thank you.

No tag for Nailgun 0.9.3?

I noticed that there's no tag for Nailgun 0.9.3 (nor for 0.9.2, by the way).

Is there a reason for not pushing tags along with the releases?

ng command collides with angular-cli

The default installed ng command collides with the Angular-CLI project which unfortunately uses the same command name.

This becomes an issue when you want to install both, and npm may overwrite the installed version of nailgun in /usr/local/bin which happened to me.

On Ubuntu 16.04 when you install the nailgun package (apt install nailgun), it maps nailgun to the command ng-nailgun instead of ng. Not sure if the ng collision is why, but the documentation for nailgun is now out of sync with those using Ubuntu.

Due to the popularity of both Angular-CLI and Ubuntu, I would suggest considering another command name to be consistent across platforms. While ng-nailgun is a bit long, anyone can always link it to ng for existing scripts.

On that note though, thanks for this awesome tool! My company uses the protocol for a Node server to speed up running Node.js scripts: Nodegun.

Add thread locals for system properties

The system properties (remoteEnv in the code) depend on the nailgun client. They are accessible in the nailgun context. However, they need to be accessed explicitly at the use site: that code (which can come from dependencies) depending on java.lang.System.getProperty won't work.

I think for user-defined system properties to be usable, Nailgun should have thread locals and update the system-wide properties with the user-defined properties per ngsession.
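One way the thread-local idea could look is a `Properties` subclass that consults a per-thread overlay before falling back to the shared values. This is only a sketch under stated assumptions: `SessionProperties`, `enterSession`, and `exitSession` are hypothetical names, and nothing here is taken from the Nailgun code base.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; not part of Nailgun.
final class SessionProperties extends Properties {
    private static final long serialVersionUID = 1L;

    // Per-thread overrides; empty for threads that are not serving a session.
    private final transient ThreadLocal<Map<String, String>> overlay =
            ThreadLocal.withInitial(() -> Collections.<String, String>emptyMap());

    SessionProperties(Properties base) {
        super(base); // base values act as defaults for every thread
    }

    void enterSession(Map<String, String> clientProperties) {
        overlay.set(clientProperties);
    }

    void exitSession() {
        overlay.remove();
    }

    @Override
    public String getProperty(String key) {
        String perThread = overlay.get().get(key);
        return perThread != null ? perThread : super.getProperty(key);
    }

    public static void main(String[] args) throws InterruptedException {
        Properties base = new Properties();
        base.setProperty("someproperty", "default");
        SessionProperties props = new SessionProperties(base);

        Thread session = new Thread(() -> {
            props.enterSession(Collections.singletonMap("someproperty", "sometext"));
            try {
                System.out.println("session thread: " + props.getProperty("someproperty"));
            } finally {
                props.exitSession();
            }
        });
        session.start();
        session.join();
        System.out.println("main thread: " + props.getProperty("someproperty"));
    }
}
```

Such an object could presumably be installed with `System.setProperties`, so that `System.getProperty` calls in dependencies reach the overlay. The sketch only overrides `getProperty(String)`; code that iterates the properties or calls `get` directly would not see the per-thread values, and threads spawned by the nail would not inherit them.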

Provide mechanism for the client to exit with appropriate status code

Currently, when the client sends the task to the server that throws an exception, the client process exits with 131 no matter what the actual status code should be.

Using JRuby (which uses basically the stock ng.c in nailgun-client) as an example, the steps to reproduce this behavior are as follows:

$ jruby --ng-server &                    
[1] 34927
$ NGServer started on all interfaces, port 2113.

$ jruby --ng -e 'at_exit do; exit 1; end'
org.jruby.exceptions.RaiseException: (SystemExit) exit
$ echo $?
131

JRuby sets the status code to 1 in this case, but nailgun overwrites it with 899. Because the shell only sees the low 8 bits of an exit status, 899 is reported as 131 (899 mod 256).

It would be great if there is a way to pass the appropriate status code from the server to the client.
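The arithmetic behind the 131 can be checked directly: a POSIX shell reports only the low 8 bits of a process exit status. The sketch below shows that masking; `ExitStatusDemo` and `visibleExitStatus` are hypothetical names.

```java
// Hypothetical demo class; not part of Nailgun.
final class ExitStatusDemo {
    // Keep only the low 8 bits, as a POSIX shell does for "$?".
    static int visibleExitStatus(int requested) {
        return requested & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(visibleExitStatus(899)); // 899 = 3 * 256 + 131
        System.out.println(visibleExitStatus(1));   // small codes pass through
    }
}
```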

This issue was reported as http://bugs.jruby.org/7031. I have a temporary candidate fix to overwrite the client's status code to 1; this is not ideal, since it adds a maintenance overhead and makes the nailgun update slightly more tedious.

Interrupt Server Thread When Client Is Killed

Currently, if the client is killed, the server thread continues to run, which is counterintuitive for command line applications. It would be nice if the client sent a message to the server when killed, allowing the server to set a killed flag on the server thread and/or interrupt any I/O the server thread is currently performing. I'm happy to work on a patch for this if it would likely be merged.

ng.py crashes in a cygwin environment when trying to reference Kernel32 dll

I came across this problem when using the scalacenter bloop project, which uses a modified copy of ng.py. Here is the symptom I was seeing:

$  python pynailgun/ng.py --nailgun-port 8212 about
Traceback (most recent call last):
  File "pynailgun/ng.py", line 775, in <module>
    k32 = ctypes.CDLL("Kernel32", use_errno=True)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 366, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: No such file or directory

After some experimentation, I came up with a modification, which resolves the problem on both of my systems. On line 775 of ng.py, I changed this:

elif sys.platform == "cygwin":
    k32 = ctypes.CDLL("Kernel32", use_errno=True)

to this:

elif sys.platform == "cygwin":
    try:
        k32 = ctypes.CDLL("Kernel32", use_errno=True)
    except OSError:
        # Fall back to the explicit file name when the bare name does not resolve.
        k32 = ctypes.CDLL("Kernel32.dll", use_errno=True)

I assume the ng.py script works as-is on some system somewhere, although it fails on both of my Windows 10 cygwin64 python2.7 systems, a desktop and a laptop. It's conceivable that the code as it's currently configured doesn't work in any cygwin environment, and that there are no such users. Alternately, it might be that it worked at one time, but that cygwin64 has changed in some way.

Support Python3

Hi, just wondering if python3 support has been investigated and if a PR adding support for python3 (without breaking support for python2) would be accepted?

Significant slowdown and possible increased memory usage since df35649

We created a fork of commit df35649 some time ago in order to add a few pieces of functionality missing in that commit (see #49, #48, #47). This commit (runtimeverification/nailgun@9386dea) served us quite well in our project for some time and continues to be used currently with minor modification (see yilongli/nailgun@d3e321a, which contains five commits that we cherry-picked to fix race conditions).

Recently I attempted to upgrade our version of nailgun to be a fork of #76, but unfortunately had to put that task on hold because it seems that something in between these two commits has had a very significant impact on the performance of a nailgun server running several concurrent processes. You can see my commit that is subject to the performance penalties at runtimeverification/nailgun@c284046. Before I made the change of nailgun version, our mvn verify terminated on our build server in roughly 6 minutes, with the longest-running integration test taking roughly 70 seconds to complete. Afterwards, the build started intermittently hitting its 10-minute timeout window, as well as throwing occasional OutOfMemoryErrors, and the time spent running the same single program increased by over 100%, to between 200 and 300 seconds.

Now it's possible that both situations are in some way due to GC thrashing and that modifying the GC settings will fix both errors. But even if that is the only source of the difference in performance, that would imply that some recent change to nailgun has significantly increased nailgun's memory footprint. I haven't done a lot of investigating of this issue, but it seems to affect both local and network sockets. I don't have any more time to devote to trying to figure this out at the moment, but @bhamiltoncx suggested I file an issue with the information in case someone would like to investigate. The project in question is github.com/kframework/k. I know people have had issues building it on some platforms, which I haven't had the time to fix yet, but you are welcome to try it out (to build, just run mvn verify).

bad performance: sending stdin from client to server

The code that is responsible for transferring stdin from the client to the server performs very badly.

Assume a simple Java program that reads stdin and just prints the number of lines:

cat file_with_some_thousand_lines | java CountLines

this is fast. If you run

cat file_with_some_thousand_lines | ng CountLines

this is slooooow
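The issue does not include the program itself. A minimal stand-in for it might look like the following; this is a hypothetical reconstruction, and only the class name `CountLines` comes from the commands above.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction of the benchmark program; not from the issue.
final class CountLines {
    // Count the lines available on the given stream until end of input.
    static long count(InputStream in) {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.lines().count();
    }

    public static void main(String[] args) {
        System.out.println(count(System.in));
    }
}
```

Because the program does almost no work per line, any difference between the two commands above should mostly reflect how stdin reaches it.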

data race on field com.martiansoftware.nailgun.NGSessionPool.done

Race report:

Data race on field com.martiansoftware.nailgun.NGSessionPool.done: {{{
    Concurrent read in thread T15 (locks held: {Monitor@2c7d1e9c})
 ---->  at com.martiansoftware.nailgun.NGSessionPool.give(NGSessionPool.java:102)
        - locked Monitor@2c7d1e9c at com.martiansoftware.nailgun.NGSessionPool.give(NGSessionPool.java:101) 
        at com.martiansoftware.nailgun.NGSession.run(NGSession.java:362)
    T15 is created by T13
        at com.martiansoftware.nailgun.NGSessionPool.take(NGSessionPool.java:85)

    Concurrent write in thread T47 (locks held: {})
 ---->  at com.martiansoftware.nailgun.NGSessionPool.shutdown(NGSessionPool.java:116)
        at com.martiansoftware.nailgun.NGServer.shutdown(NGServer.java:328)
        at org.kframework.kserver.KServerFrontEnd.nailMain(KServerFrontEnd.java:130)
        at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
    T47 is created by T13
        at com.martiansoftware.nailgun.NGSessionPool.take(NGSessionPool.java:85)
}}}

Similar to #65. The assignment done = true at line 116 is not properly synchronized.

Problem with nested connections to a unix domain socket

I use NG to traverse a tree where the recursion is implemented by invoking a wrapper script as a subprocess at each node of the tree. The wrapper script detects when it's launched from the command line and starts the NG server. Each nested invocation of the script then takes the other path and runs the NG client connecting back to the server for each of its children. My current traversal goes 4 deep.

Works fine with the server listening on TCP, but throws the exception below when using a unix domain socket. It does seem to be a timing issue as on a loaded machine the NG server might throw one or two exceptions per thousand nodes while on an otherwise idle machine I'll see ~50 per thousand.

I will work up an example I can share, but it will take me a while.

I've tried various combinations of:
RHEL 6/7
JDK 8/10
JNA 4.4.0/4.5.1

Jul 11, 2018 2:40:08 PM com.martiansoftware.nailgun.NGCommunicator lambda$startBackgroundReceive$1
WARNING: Nailgun client read future raised an exception
java.io.IOException: com.sun.jna.LastErrorException: [104] Connection reset by peer
        at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.doRead(NGUnixDomainSocket.java:127)
        at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.read(NGUnixDomainSocket.java:98)
        at java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
        at com.martiansoftware.nailgun.NGCommunicator.readChunkImpl(NGCommunicator.java:482)
        at com.martiansoftware.nailgun.NGCommunicator.readChunk(NGCommunicator.java:465)
        at com.martiansoftware.nailgun.NGCommunicator.lambda$null$0(NGCommunicator.java:191)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: com.sun.jna.LastErrorException: [104] Connection reset by peer
        at com.martiansoftware.nailgun.NGUnixDomainSocketLibrary.read(Native Method)
        at com.martiansoftware.nailgun.NGUnixDomainSocket$NGUnixDomainSocketInputStream.doRead(NGUnixDomainSocket.java:125)
        ... 9 more

Prefer better build system instead? 230 s (4 minutes) for a fresh build is unacceptable.

Fresh build times:

third_party/java/nailgun$ time mvn package
230.53 real 20.14 user 2.54 sys
third_party/java/nailgun$ tup
... elided ...
[2.297s] third_party/java/nailgun/nailgun-server: [JAVAC_JAR] nailgun-server.jar
... elided ...

230 seconds is about 4 minutes! Versus 2.2 seconds. You don't have a ton of dependencies
with nailgun. Please drop Maven and consider using a real build system:
https://github.com/facebook/buck or http://gittup.org/tup/.

Nailgun loaded in bootclasspath raises NPE in alias logic

This line in AliasManager can raise NPE if the Nailgun classes are loaded in the boot classpath. Classes loaded this way have a null getClassLoader.

The fix is to explicitly use the system classloader when getClassLoader returns null.
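The fallback described above can be sketched in a few lines. `LoaderFallbackDemo` and `loaderFor` are hypothetical names; `String.class` is used because it is always loaded by the bootstrap loader, so its `getClassLoader()` returns null.

```java
// Hypothetical demo class; not part of Nailgun.
final class LoaderFallbackDemo {
    // Use the class's own loader, or the system loader for bootstrap classes.
    static ClassLoader loaderFor(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader != null ? loader : ClassLoader.getSystemClassLoader();
    }

    public static void main(String[] args) {
        // Bootstrap-loaded classes report a null class loader.
        System.out.println(String.class.getClassLoader() == null);
        // With the fallback, resource lookups have a loader to call.
        System.out.println(loaderFor(String.class) != null);
    }
}
```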

This is obviously a somewhat unusual case, but we had a JRuby user hit it.

data race on field com.martiansoftware.nailgun.NGSession.done

Race report:

Data race on field com.martiansoftware.nailgun.NGSession.done: {{{
    Concurrent read in thread T62 (locks held: {Monitor@669c82b})
 ---->  at com.martiansoftware.nailgun.NGSession.nextSocket(NGSession.java:165)
        - locked Monitor@669c82b at com.martiansoftware.nailgun.NGSession.nextSocket(NGSession.java:163) 
        at com.martiansoftware.nailgun.NGSession.run(NGSession.java:186)
    T62 is created by T13
        at com.martiansoftware.nailgun.NGSessionPool.take(NGSessionPool.java:85)

    Concurrent write in thread T13 (locks held: {})
 ---->  at com.martiansoftware.nailgun.NGSession.shutdown(NGSession.java:133)
        at com.martiansoftware.nailgun.NGServer.run(NGServer.java:431)
    T13 is created by T1
        at org.kframework.kserver.KServerFrontEnd.run(KServerFrontEnd.java:72)
}}}

It seems that the assignment done = true at line 131 should be moved into the synchronization block below.
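The suggested fix, writing the flag under the same monitor that the reader holds, can be sketched as follows. This is not the NGSession code: `SessionSlot` and its methods are hypothetical names, and a string stands in for the socket handed to the session.

```java
// Hypothetical sketch; not part of Nailgun.
final class SessionSlot {
    private final Object lock = new Object();
    private boolean done;     // guarded by lock
    private String nextWork;  // guarded by lock

    // The write happens inside the synchronized block, so readers see it.
    void shutdown() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }

    void offer(String work) {
        synchronized (lock) {
            nextWork = work;
            lock.notifyAll();
        }
    }

    // Blocks until work arrives or shutdown is requested; null means "stop".
    String next() {
        synchronized (lock) {
            try {
                while (!done && nextWork == null) {
                    lock.wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return null;
            }
            String work = nextWork;
            nextWork = null;
            return work;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SessionSlot slot = new SessionSlot();
        Thread worker = new Thread(() -> {
            for (String work = slot.next(); work != null; work = slot.next()) {
                System.out.println("handling " + work);
            }
            System.out.println("worker stopped");
        });
        worker.start();
        slot.offer("request");
        slot.shutdown();
        worker.join();
    }
}
```

Setting the flag and notifying under one lock also avoids a lost wakeup, where the reader checks the flag, then waits, and misses a notification sent in between.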

unable to link or copy the ng command

While discovering #110, I found out that linking the ng command to anything else fails to work.
e.g. sudo ln -s /usr/local/bin/ng /usr/local/bin/ng2

Running ng2 results in connect: Connection refused in the console instead of the help screen.

The same thing occurs on a hard copy or even moving the file.

Is this because the name is hard-coded into the source somehow? Anyways, it also makes command collisions even harder to reconcile.

The python nailgun client doesn't seem to like system processes

I have tried to run the Echo nail example via a system process, where my driver (which is a java process) runs the python script, sends input and waits for output. To do this, I use NuProcess.

I am, in fact, using system in and system out to communicate with a process with a certain protocol. The message I am sending from the java driver has the following shape: "$HEADER\r\n$BODY".

Here's the thing: when I run my process without nailgun, my application logic runs; when I run it with nailgun, it gets stuck and the java process doesn't get anything back via stdout. I have dug into the code and the culprit seems to be the nailgun python script.

This is what I have found:

  • There is indeed a race condition between the stdin thread and the logic that either sends the stdin to the server or receives it back. Sometimes it may read one part of what I send to the nailgun process (I found by adding some printlns + flush), but nothing is outputted after.
  • The stdin reading logic of the python script only reads line by line. This is problematic for me because (following the shape of my messages) the script will block and never read $BODY if there's no subsequent message including \n. This makes nailgun unusable for communication protocols.

Is there a way these issues can be addressed? I'm happy to have a look at them and implement a better approach, but would like to get some feedback on this first.

possible data race on the non-thread-safe NGServer.allNailStats

Hi, I am a developer of the K framework. While we are trying to catch the data races inside the K framework with a dynamic race detector, the race detector also prints out some race reports inside Nailgun, which K uses heavily.

Here is one:

Data race on field java.util.HashMap.$state: {{{
    Concurrent write in thread T15 (locks held: {Monitor@444ecc49})
 ---->  at com.martiansoftware.nailgun.NGServer.getOrCreateStatsFor(NGServer.java:245)
        - locked Monitor@444ecc49 at com.martiansoftware.nailgun.NGServer.getOrCreateStatsFor(NGServer.java:241) 
        at com.martiansoftware.nailgun.NGServer.nailStarted(NGServer.java:258)
        at com.martiansoftware.nailgun.NGSession.run(NGSession.java:314)
    T15 is created by T13
        at com.martiansoftware.nailgun.NGSessionPool.take(NGSessionPool.java:85)

    Concurrent read in thread T14 (locks held: {})
 ---->  at com.martiansoftware.nailgun.NGServer.nailFinished(NGServer.java:269)
        at com.martiansoftware.nailgun.NGSession.run(NGSession.java:332)
    T14 is created by T13
        at com.martiansoftware.nailgun.NGSessionPool.take(NGSessionPool.java:85)
}}}

It's basically saying that there are concurrent and conflicting accesses to NGServer.allNailStats, which is a HashMap, and that the allNailStats.get() operation is not properly synchronized. It seems the allNailStats.get() should also be wrapped inside a synchronization block.
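An alternative to wrapping every `get()` in a synchronized block is to use a concurrent map, so that unlocked reads are safe by construction. This swaps in a different technique from the one suggested above and is only a sketch: `NailStatsDemo` and its methods are hypothetical names that loosely echo `nailStarted` and `nailFinished` from the trace.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not part of Nailgun.
final class NailStatsDemo {
    // Safe for concurrent reads and writes without external locking.
    private final ConcurrentHashMap<String, AtomicInteger> running =
            new ConcurrentHashMap<>();

    void nailStarted(String nailClass) {
        running.computeIfAbsent(nailClass, key -> new AtomicInteger()).incrementAndGet();
    }

    void nailFinished(String nailClass) {
        AtomicInteger count = running.get(nailClass);
        if (count != null) {
            count.decrementAndGet();
        }
    }

    int runningCount(String nailClass) {
        AtomicInteger count = running.get(nailClass);
        return count == null ? 0 : count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        NailStatsDemo stats = new NailStatsDemo();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    stats.nailStarted("Echo");
                    stats.nailFinished("Echo");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // Every start was paired with a finish, so nothing is still running.
        System.out.println(stats.runningCount("Echo"));
    }
}
```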

Can you please confirm whether it is a false positive of the race detector? Thanks!

Overhead of SecurityManager

Running javac (among other programs) inside a nailgun process suffers from very high contention caused by security checks. In our case it is mostly:

java.io.UnixFileSystem.canonicalize0(Native Method)
java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:172)
java.io.File.getCanonicalPath(File.java:618)
java.io.FilePermission$1.run(FilePermission.java:215)
java.io.FilePermission$1.run(FilePermission.java:203)
java.security.AccessController.doPrivileged(Native Method)
java.io.FilePermission.init(FilePermission.java:203)
java.io.FilePermission.<init>(FilePermission.java:277)
sun.net.www.protocol.file.FileURLConnection.getPermission(FileURLConnection.java:225)
sun.misc.URLClassPath.check(URLClassPath.java:604)

We iterated over several solutions to this and settled on a custom Java agent that "disables" the security manager. This keeps the nailgun protocol intact and, in our case, reduces compilation time within pantsbuild from 75 to 12 minutes.

package com.r9.nailgun;

import java.io.IOException;
import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.net.JarURLConnection;

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

/**
 * This class disables security manager without breaking nailgun protocol that depends on it.
 * Implementation notes:
 * Can't use transformer cause classes are already loaded.
 * Can't add fields and methods or mark fields public with redefinition.
 * SecurityManager is not accessible with reflection.
 * Tested with nailgun 0.9.1 and java 8, should be compatible with nailgun 0.9.3 (and java 9 ???)
 * Warning: This agent disables security manager. Use at your own risk.
 * 
 * @author Justinas Dabravolskas
 * @author Darius Prakaitis
 *
 */
public class ExitAgent {

    // This is a storage for securityManager that is used in Runtime.exit
    public static volatile java.lang.SecurityManager exitSecurity;

    public static void premain(String agentArgs, Instrumentation inst) {
        try {

            // make ExitAgent accessible to application, we probably load the second instance in
            // different class loader but who cares
            JarURLConnection connection = (JarURLConnection) ExitAgent.class.getClassLoader().getResource("com/r9/nailgun/ExitAgent.class").openConnection();
            inst.appendToBootstrapClassLoaderSearch(connection.getJarFile());

            inst.redefineClasses(new ClassDefinition(Class.forName("java.lang.System"), getSystemByteCode()));

            inst.redefineClasses(new ClassDefinition(Class.forName("java.lang.Runtime"), getRuntimeByteCode()));

        } catch (ClassNotFoundException e1) {
            e1.printStackTrace();
            throw new RuntimeException(e1);
        } catch (UnmodifiableClassException e1) {
            e1.printStackTrace();
            throw new RuntimeException(e1);
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

    private static byte[] getSystemByteCode() {
        try {
            ClassPool cp = ClassPool.getDefault();
            CtClass cc = cp.get("java.lang.System");
            // don't access volatile field at all
            CtMethod getSecurityManager = cc.getDeclaredMethod("getSecurityManager", new CtClass[] {});
            getSecurityManager.setBody("{return null;}");

            CtMethod setSecurityManager = cc.getDeclaredMethod("setSecurityManager", new CtClass[] { cp.get("java.lang.SecurityManager") });
            // make security manager available to Runtime.exit only
            setSecurityManager.setBody("{com.r9.nailgun.ExitAgent.exitSecurity=$1;}");
            byte[] byteCode = cc.toBytecode();
            cc.detach();
            return byteCode;
        } catch (Exception ex) {
            ex.printStackTrace();
            throw new RuntimeException(ex);
        }
    }

    private static byte[] getRuntimeByteCode() {
        try {
            ClassPool cp = ClassPool.getDefault();
            CtClass cc = cp.get("java.lang.Runtime");
            CtMethod m = cc.getDeclaredMethod("exit", new CtClass[] { CtClass.intType });
            m.setBody("{SecurityManager security = com.r9.nailgun.ExitAgent.exitSecurity; if(security != null){security.checkExit($1);} Shutdown.exit($1);}");

            byte[] byteCode = cc.toBytecode();
            cc.detach();
            return byteCode;
        } catch (Exception ex) {
            ex.printStackTrace();
            throw new RuntimeException(ex);
        }
    }

}

NGSecurityManager causes unnecessary slowdowns

nailgun massively slows down programs that do a lot of file I/O.

Java's File object calls SecurityManager.checkRead for various operations. In a normal environment with no SecurityManager, this is effectively free. Under nailgun, it calls the default implementation, which creates a FilePermission object and passes it to NGSecurityManager.checkPermission, which does nothing (if base is null). Constructing the FilePermission object can be expensive.

NGSecurityManager should implement checkRead and exit early if base is null.

We can submit a patch for this if it will be accepted.
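A minimal sketch of the proposed fix, assuming the security manager keeps the wrapped manager in a field called base as the issue describes. The class name FastPathSecurityManager is hypothetical and stands in for NGSecurityManager; only the parts relevant to the slowdown are shown. Note that SecurityManager is deprecated in recent JDKs, so this applies to the Java versions nailgun targeted.

```java
import java.io.FileDescriptor;
import java.security.Permission;

// Hypothetical stand-in for NGSecurityManager, not the project's actual class.
@SuppressWarnings({"deprecation", "removal"})
class FastPathSecurityManager extends SecurityManager {
    // The manager that was installed before this one; may be null.
    private final SecurityManager base;

    FastPathSecurityManager(SecurityManager base) {
        this.base = base;
    }

    @Override
    public void checkPermission(Permission perm) {
        if (base != null) {
            base.checkPermission(perm);
        }
    }

    // Overriding checkRead avoids the default implementation, which builds a
    // FilePermission before calling checkPermission.
    @Override
    public void checkRead(String file) {
        if (base == null) {
            return; // nothing to check, so skip the allocation entirely
        }
        base.checkRead(file);
    }

    @Override
    public void checkRead(String file, Object context) {
        if (base == null) {
            return;
        }
        base.checkRead(file, context);
    }

    @Override
    public void checkRead(FileDescriptor fd) {
        if (base == null) {
            return;
        }
        base.checkRead(fd);
    }
}
```

The same early-exit pattern would presumably apply to checkWrite and the other check methods that construct a permission object before delegating.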

How to properly terminate the nail?

I'm running the storescp tool from the dcm4che project. This utility works as a server: it listens on a port in an infinite loop and is interrupted by pressing CTRL+C. But how do I shut it down in nailgun? I tried closing the transport, but the nail keeps running. Sending CTRL+C via stdin doesn't work either. Thank you in advance for any help.

Limit access to the daemon to the same user

The Nailgun docs prominently note that:

Before you download it, be aware that it's not secure. Not even close. Although there are means to ensure that the client is connected to the server from the local machine, there is not yet any concept of a "user". Any programs that run in Nailgun are run with the same permissions as the server itself. You have been warned.

A standard approach to improve the security story would be to require that the client pass an authentication token that it reads from a file written by the server (this is often piggybacked on the file used for port discovery). This file can be restricted to be readable only by the current server user (locking down the file permissions is a bit fiddly to do in Java in a cross-platform way, but is possible with the NIO APIs).
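The token-file idea could be sketched roughly as follows. This is an illustration under stated assumptions, not anything nailgun implements: the class AuthTokenFile, its method names, and the token format are all hypothetical. Only the POSIX case is locked down here; a Windows version would need ACLs, which is the fiddly part the issue mentions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
class AuthTokenFile {
    /** Server side: generate a random token and store it in an owner-only file. */
    static String writeToken(Path file) {
        byte[] raw = new byte[32];
        new SecureRandom().nextBytes(raw);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        try {
            Files.deleteIfExists(file);
            if (FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
                // Create the file with rw------- so it is never world-readable.
                Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
                Files.createFile(file, PosixFilePermissions.asFileAttribute(ownerOnly));
            } else {
                // Non-POSIX file systems (e.g. Windows) need ACLs; not shown here.
                Files.createFile(file);
            }
            Files.write(file, token.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return token;
    }

    /** Client side: read the token that will be sent along with the request. */
    static String readToken(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Server side: compare tokens without an early exit on the first mismatch. */
    static boolean matches(String expected, String presented) {
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                presented.getBytes(StandardCharsets.UTF_8));
    }
}
```

The server would call writeToken at startup, and a client would call readToken and send the result with each request; how the token travels over the wire is left open here because it would require a protocol change.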

An alternative approach is to use Unix Domain Sockets / Windows Named Pipes (as is done in facebook/watchman), rather than a TCP socket on the loopback interface. This would require some platform-specific native code (or a library that wraps said native code) on the server side.

nailgun client exits 227 (NAILGUN_CONNECTION_BROKEN) after successful exit

I'm trying to run the Java CLI program FITS via its fits-ngserver.sh nailgun server launcher. On any successful invocation, the ng nailgun client always exits 227, i.e. NAILGUN_CONNECTION_BROKEN, even though the nailgun server log reports the command exiting 0. Running it directly via java exits 0 as expected.

When testing from the latest 0.9.2 commit, I noticed the same command shows up in the server log twice, which might provide some useful information:

NGSession 1: 127.0.0.1: edu.harvard.hul.ois.fits.Fits disconnected
NGSession 1: 127.0.0.1: edu.harvard.hul.ois.fits.Fits exited with status 0

I've tested with 0.7.1, 0.9.1, and the latest commit from the master branch, on Ubuntu and Mac OS X.

Windows client appears mute

We built clients for both Linux and Windows. The Linux one behaves as expected; the Windows one does not print the results of commands at all.

The Windows client happily prints the help content and that of version, but does not print things like classpath contents.

Stepping through the server we can see both clients executing code as expected and both clients receive network traffic so we're a little puzzled why one prints the returned value but the other does not. Continuing to investigate this...

Feature Request: Server Exit When Out of Memory

When out of memory or garbage collection errors occur, the server just hangs. I would really like the server to just exit since it can be run in an endless loop if needed and a hanging server does nothing good. It even hangs up the client, stopping anything that could fail gracefully if the server was borked.
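One way an application embedding the server could approximate this today is a default uncaught-exception handler that halts the JVM on OutOfMemoryError. The sketch below is an assumption-laden illustration: the class ExitOnOom and the exit code are hypothetical, and the handler only sees errors that actually propagate out of a thread, so it does nothing for errors that are caught and swallowed elsewhere. HotSpot's -XX:+ExitOnOutOfMemoryError flag is an alternative that needs no code.

```java
// Hypothetical helper; not part of nailgun.
class ExitOnOom {
    static final int EXIT_CODE = 3; // arbitrary non-zero status chosen for this sketch

    /** True if the throwable or one of its causes is an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable error) {
        int depth = 0;
        // Bound the walk so an unusual cyclic cause chain cannot loop forever.
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
        }
        return false;
    }

    /** Installs a default handler that halts the JVM when a thread dies from an OOM. */
    static void install() {
        final Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (isOutOfMemory(error)) {
                // halt() skips shutdown hooks, which may themselves need memory.
                Runtime.getRuntime().halt(EXIT_CODE);
            }
            if (previous != null) {
                previous.uncaughtException(thread, error);
            } else {
                error.printStackTrace();
            }
        });
    }
}
```

With the process exiting instead of hanging, an external restart loop can bring the server back up, which is the behaviour the request asks for.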

recv: Connection reset by peer

I'm getting this message on the client side after my job finishes. What might cause this?

recv: Connection reset by peer

This happens on the third or fourth time I send the same nail to the server. The first couple of runs are usually clean.

I use nailgun to run time- and memory-consuming jobs. The reason for using nailgun is to share memory that takes a while to bootstrap between the jobs. The jobs are multi-threaded.

The job is silent until it's finished. At the end it prints the result in just one line using Console.out.println (it's in Scala). I noticed that if I print another line, it is not passed over to the client, i.e. something happens between the two writes to stdout.

possible data race on NGServer.serversocket

Again, we got this race report when testing the K framework:

Data race on field com.martiansoftware.nailgun.NGServer.serversocket: {{{
    Concurrent read in thread T1 (locks held: {})
 ---->  at com.martiansoftware.nailgun.NGServer.getPort(NGServer.java:384)
        at org.kframework.kserver.KServerFrontEnd.run(KServerFrontEnd.java:74)
        at org.kframework.main.FrontEnd.main(FrontEnd.java:52)
        at org.kframework.main.Main.runApplication(Main.java:109)
        at org.kframework.main.Main.runApplication(Main.java:99)
        at org.kframework.main.Main.main(Main.java:51)
    T1 is the main thread

    Concurrent write in thread T13 (locks held: {})
 ---->  at com.martiansoftware.nailgun.NGServer.run(NGServer.java:413)
    T13 is created by T1
        at org.kframework.kserver.KServerFrontEnd.run(KServerFrontEnd.java:72)
}}}

It says the assignment serversocket = new ServerSocket(port, 0, addr) and the return statement return ((serversocket == null) ? port : serversocket.getLocalPort()); are in a data race. I think it's sufficient to declare serversocket as volatile.
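The suggested fix can be sketched in isolation. The class PortHolder and its bind and close methods are hypothetical stand-ins for the relevant part of NGServer; only the field name serversocket and the shape of getPort come from the report.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical reduction of the race; not the actual NGServer code.
class PortHolder {
    private final int port;

    // volatile makes the write in bind() visible to readers on other threads.
    private volatile ServerSocket serversocket;

    PortHolder(int port) {
        this.port = port;
    }

    /** Called on the server thread, like the assignment in NGServer.run. */
    void bind() {
        try {
            serversocket = new ServerSocket(port, 0, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** May be called from any thread, like NGServer.getPort. */
    int getPort() {
        ServerSocket current = serversocket; // read the volatile field once
        return (current == null) ? port : current.getLocalPort();
    }

    void close() {
        ServerSocket current = serversocket;
        if (current != null) {
            try {
                current.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Copying the field into a local before the null check is a small extra precaution: it keeps the check and the getLocalPort call on the same object even if another thread reassigns the field in between.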

IO redirection not working (OSX)?

I only tried this on OSX but it seems like IO redirection is not working properly. I built nailgun like it says on the wiki (configure/make in tools/nailgun).

I run the server with jruby --ng-server.
Then I tried running irb with jruby --ng -S irb. It displays the prompt but then I cannot get it to do anything more, it's like it's not getting my input. Input is taken from the server's terminal but echoed back on the client, and the prompt is displayed on the client but all other output is on the server.
I also tried ruby --ng -e "puts 'hello'; s = STDIN.getc; puts s", but hello is displayed on the server's terminal instead of the client's. Then getc also does not react to input on the client, and on the server's terminal I had to type several characters before it echoed the last one back to me. After the script is finished, the characters I typed on the client's terminal while the script was running are sent to and executed by the shell (e.g. if I typed ls <enter> then ls was executed).

This is pretty obviously not working for me; am I doing something wrong?

Is the nailgun C client up to date?

I wonder if the nailgun C client is up to date. If it's not, I guess it's better to remove it? If it is, what's the difference between using the C client and the python one?

NGSessionPool poolSize always zero?

I think this is a mistake.

NGSessionPool(NGServer server, int poolsize) {
    this.server = server;
    this.poolSize = Math.min(0, poolsize);

    pool = new NGSession[poolSize];
    poolEntries = 0;
}

If poolsize is negative, an exception is thrown. If poolsize is positive, poolSize is zero.
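Presumably Math.max was intended, so that negative requests are clamped to zero and positive ones are kept. A small sketch comparing the two (the class and method names are hypothetical):

```java
// Hypothetical demonstration of the clamp; not the actual NGSessionPool code.
class PoolSizing {
    /** The expression from the constructor as quoted in the issue. */
    static int asWritten(int requested) {
        return Math.min(0, requested); // never greater than 0
    }

    /** The likely intent: never less than 0. */
    static int clamped(int requested) {
        return Math.max(0, requested);
    }

    public static void main(String[] args) {
        for (int requested : new int[] {-1, 0, 10}) {
            System.out.println(requested
                    + " -> as written: " + asWritten(requested)
                    + ", clamped: " + clamped(requested));
        }
    }
}
```

With the expression as written, a request of 10 yields 0 and a request of -1 yields -1, which then makes new NGSession[poolSize] throw NegativeArraySizeException. With Math.max the results are 10 and 0 respectively.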

JavaOutOfMemoryError (on socket reading within current NGSession)

Hi Marty,

Some errors occurred while using apache-common-tools to start/stop the Nail Gun server in auto mode installed as a Windows Service.

I've made a list of file differences that may be useful in some cases to support a stable version of the Nail Gun Server.

More information can be found at the following address (zipped file):
http://www.myupload.dk/showfile/1vfsTu78p.7z

The main problem is related to reading and writing a buffer through the socket input/output streams within an active NGSession (JavaOutOfMemoryError).

Sincerely yours,
Alexander R.

feature request: allow nailgun client to start server

I have "uneducated" customers who could use the ng client from batch files. I would be interested in seeing support for the following use case:

 `ng --config nailgun-config.cfg arg1 arg2 arg3 ...`
  1. reads config file nailgun-config.cfg which looks like a jar-file manifest

    Class-Path: foo bar baz
    Main-Class: com.example.foo.FooClass
    Jvm-Args: [something here]
    Nailgun-Port: 12345

  2. attempts to contact nailgun server on requested port

  3. if attempt succeeds goto step 6

  4. attempt fails: launch server with given class path and JVM args, bound to localhost on Nailgun-Port

  5. if attempt still fails, abort with error

  6. launch com.example.foo.FooClass on server with arg1 arg2 arg3 ... for arguments

This essentially makes the process of running the client the same: the first time it may take a few seconds for the JVM to start up, but after that it should be quick. Then my "uneducated" customers don't need to worry about how to run the server properly.
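The control flow of steps 1 through 5 could look roughly like the sketch below. It is written in Java only to match the rest of this page; a real implementation would live in the C or Python client, since a Java launcher would pay the JVM startup cost it is meant to avoid. The class NgLauncherSketch and its methods are hypothetical, the parser handles only simple "Key: value" lines, and passing 127.0.0.1:port as the server argument is an assumption about the server's command line. The Class-Path entry is also assumed to include the nailgun server jar.

```java
import java.io.File;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the requested behaviour; not an existing nailgun feature.
class NgLauncherSketch {
    /** Step 1: parse "Key: value" lines (no manifest continuation lines). */
    static Map<String, String> parseConfig(List<String> lines) {
        Map<String, String> config = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank or malformed lines
            }
            config.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return config;
    }

    /** Steps 2 and 3: report whether something accepts connections on the port. */
    static boolean serverIsUp(int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("127.0.0.1", port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Step 4: build the command line that would start the server. */
    static List<String> serverCommand(Map<String, String> config) {
        List<String> command = new ArrayList<>();
        command.add("java");
        String jvmArgs = config.getOrDefault("Jvm-Args", "").trim();
        if (!jvmArgs.isEmpty()) {
            command.addAll(Arrays.asList(jvmArgs.split("\\s+")));
        }
        command.add("-cp");
        command.add(String.join(File.pathSeparator,
                config.get("Class-Path").trim().split("\\s+")));
        command.add("com.martiansoftware.nailgun.NGServer");
        command.add("127.0.0.1:" + config.get("Nailgun-Port")); // assumed argument format
        return command;
    }

    /** Steps 2 to 5: start the server if needed; false means "abort with error". */
    static boolean ensureServer(Map<String, String> config, int attempts)
            throws IOException, InterruptedException {
        int port = Integer.parseInt(config.get("Nailgun-Port"));
        if (serverIsUp(port, 500)) {
            return true;
        }
        new ProcessBuilder(serverCommand(config)).inheritIO().start();
        for (int i = 0; i < attempts; i++) {
            Thread.sleep(250); // give the JVM time to come up
            if (serverIsUp(port, 500)) {
                return true;
            }
        }
        return false;
    }
}
```

Step 6 is then the ordinary client invocation of Main-Class with the remaining arguments, which is unchanged from how ng is used today.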
