
Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20

License: GNU General Public License v3.0


hadoop-lzo's Introduction

Hadoop-LZO

Hadoop-LZO is a project to bring splittable LZO compression to Hadoop. LZO is an ideal compression format for Hadoop due to its combination of speed and compression size. However, LZO files are not natively splittable, meaning the parallelism that is the core of Hadoop is gone. This project re-enables that parallelism with LZO compressed files, and also comes with standard utilities (input/output streams, etc) for working with LZO files.

Origins

This project builds off the great work done at https://code.google.com/p/hadoop-gpl-compression. As of issue 41, the differences in this codebase are the following.

  • It fixes a few bugs in hadoop-gpl-compression -- notably, it allows the decompressor to read small or uncompressible lzo files, and fixes the compressor to follow the lzo standard when compressing small or uncompressible chunks. It also fixes a number of inconsistently caught and thrown exception cases that can occur when the lzo writer gets killed mid-stream, plus some other smaller issues (see the commit log).
  • It adds the ability to work with Hadoop streaming via the com.hadoop.mapred.DeprecatedLzoTextInputFormat class.
  • It adds an easier way to index lzo files (com.hadoop.compression.lzo.LzoIndexer).
  • It adds an even easier way to index lzo files, in a distributed manner (com.hadoop.compression.lzo.DistributedLzoIndexer).

Hadoop and LZO, Together at Last

LZO is a wonderful compression scheme to use with Hadoop because it's incredibly fast, and (with a bit of work) it's splittable. Gzip is decently fast, but cannot take advantage of Hadoop's natural map splits because it's impossible to start decompressing a gzip stream starting at a random offset in the file. LZO's block format makes it possible to start decompressing at certain specific offsets of the file -- those that start new LZO block boundaries. In addition to providing LZO decompression support, these classes provide an in-process indexer (com.hadoop.compression.lzo.LzoIndexer) and a map-reduce style indexer which will read a set of LZO files and output the offsets of LZO block boundaries that occur near the natural Hadoop block boundaries. This enables a large LZO file to be split into multiple mappers and processed in parallel. Because it is compressed, less data is read off disk, minimizing the number of IOPS required. And LZO decompression is so fast that the CPU stays ahead of the disk read, so there is no performance impact from having to decompress data as it's read off disk.

You can read more about Hadoop, LZO, and how we're using it at Twitter at https://www.cloudera.com/blog/2009/11/17/hadoop-at-twitter-part-1-splittable-lzo-compression/.

Building and Configuring

To get started, see https://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. This project is built exactly the same way; please follow the answer to "How do I configure Hadoop to use these classes?" on that page, or follow the summarized version here.

You need JDK 1.6 or higher to build hadoop-lzo (1.7 or higher on Mac OS).

LZO 2.x is required, and is most easily installed via your system's package manager. If you choose to install it manually for whatever reason (developer OSX machines are a common case), do so as follows:

  1. Download the latest LZO release from https://www.oberhumer.com/opensource/lzo/
  2. Configure LZO to build a shared library (required) and use a package-specific prefix (optional but recommended): ./configure --enable-shared --prefix /usr/local/lzo-2.10
  3. Build and install LZO: make && sudo make install
  4. On Windows, you can build lzo2.dll with this command: B\win64\vc_dll.bat

Now let's build hadoop-lzo.

C_INCLUDE_PATH=/usr/local/lzo-2.10/include \
LIBRARY_PATH=/usr/local/lzo-2.10/lib \
  mvn clean package

Running tests on Windows also requires setting PATH to include the location of lzo2.dll.

set PATH=C:\lzo-2.10;%PATH%

Additionally on Windows, the Hadoop core code requires setting HADOOP_HOME so that the tests can find winutils.exe. If you've built Hadoop trunk in directory C:\hdc, then the following would work.

set HADOOP_HOME=C:\hdc\hadoop-common-project\hadoop-common\target

Once the libs are built and installed, you may want to add them to the class paths and library paths. That is, in hadoop-env.sh, set

    export HADOOP_CLASSPATH=/path/to/your/hadoop-lzo-lib.jar
    export JAVA_LIBRARY_PATH=/path/to/hadoop-lzo-native-libs:/path/to/standard-hadoop-native-libs

Note that there seems to be a bug in /path/to/hadoop/bin/hadoop; comment out the line

    JAVA_LIBRARY_PATH=''

because it prevents Hadoop from preserving the change you made to JAVA_LIBRARY_PATH above. (Update: see https://issues.apache.org/jira/browse/HADOOP-6453.) Make sure you restart your jobtrackers and tasktrackers after uploading and changing configs so that they take effect.
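You will also typically need to register the codecs with Hadoop's compression framework. A common core-site.xml snippet for this (property names come from the standard Hadoop codec configuration; adjust the codec list to your distribution) looks like:

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>
    <property>
      <name>io.compression.codec.lzo.class</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>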

Build Troubleshooting

The following missing LZO header error suggests LZO was installed in a non-standard location and cannot be found at build time. Double-check that the environment variable C_INCLUDE_PATH points to the LZO include directory, for example C_INCLUDE_PATH=/usr/local/lzo-2.10/include.

[exec] checking lzo/lzo2a.h presence... no
[exec] checking for lzo/lzo2a.h... no
[exec] configure: error: lzo headers were not found...
[exec]                gpl-compression library needs lzo to build.
[exec]                Please install the requisite lzo development package.

The following Can't find library for '-llzo2' error suggests LZO was installed to a non-standard location and cannot be located at build time. This could be one of two issues:

  1. LZO was not built as a shared library. Double-check that the location where you installed LZO contains shared libraries (probably something like /usr/lib64/liblzo2.so.2 on Linux, or /usr/local/lzo-2.10/lib/liblzo2.dylib on OSX).

  2. LZO was not added to the library path. Double-check that the environment variable LIBRARY_PATH points at the LZO lib directory (for example LIBRARY_PATH=/usr/local/lzo-2.10/lib).

    [exec] checking lzo/lzo2a.h usability... yes
    [exec] checking lzo/lzo2a.h presence... yes
    [exec] checking for lzo/lzo2a.h... yes
    [exec] checking Checking for the 'actual' dynamic-library for '-llzo2'... configure: error: Can't find library for '-llzo2'

The following "Native java headers not found" error indicates the Java header files are not available.

[exec] checking jni.h presence... no
[exec] checking for jni.h... no
[exec] configure: error: Native java headers not found. Is $JAVA_HOME set correctly?

Header files are not available in all Java installs. Double-check you are using a JAVA_HOME that has an include directory. On OSX you may need to install a developer Java package.

$ ls -d /Library/Java/JavaVirtualMachines/1.6.0_29-b11-402.jdk/Contents/Home/include
/Library/Java/JavaVirtualMachines/1.6.0_29-b11-402.jdk/Contents/Home/include
$ ls -d /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/include
ls: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/include: No such file or directory

Maven repository

The hadoop-lzo package is available at https://maven.twttr.com/.

For example, if you are using ivy, add the repository in ivysettings.xml:

  <ibiblio name="twttr.com" m2compatible="true" root="https://maven.twttr.com/"/>

And include hadoop-lzo as a dependency:

  <dependency org="com.hadoop.gplcompression" name="hadoop-lzo" rev="0.4.17"/>
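
If you use Maven directly rather than ivy, the equivalent pom.xml configuration (same repository and coordinates as the ivy example above) would be roughly:

  <repositories>
    <repository>
      <id>twttr.com</id>
      <url>https://maven.twttr.com/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>com.hadoop.gplcompression</groupId>
      <artifactId>hadoop-lzo</artifactId>
      <version>0.4.17</version>
    </dependency>
  </dependencies>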

Using Hadoop and LZO

Reading and Writing LZO Data

The project provides LzoInputStream and LzoOutputStream wrapping regular streams, to allow you to easily read and write compressed LZO data.
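
For example, here is a minimal sketch of writing and then reading back an .lzo file on the local filesystem via LzopCodec's stream wrappers (the file path and Java 7 try-with-resources are illustrative; the native hadoop-lzo library must be on java.library.path at runtime):

    import java.io.*;
    import org.apache.hadoop.conf.Configuration;
    import com.hadoop.compression.lzo.LzopCodec;

    public class LzoStreamExample {
      public static void main(String[] args) throws IOException {
        LzopCodec codec = new LzopCodec();
        codec.setConf(new Configuration());

        // Write compressed data by wrapping a plain FileOutputStream.
        try (OutputStream out = codec.createOutputStream(new FileOutputStream("/tmp/example.txt.lzo"))) {
          out.write("hello, splittable lzo\n".getBytes("UTF-8"));
        }

        // Read it back by wrapping a plain FileInputStream.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
            codec.createInputStream(new FileInputStream("/tmp/example.txt.lzo")), "UTF-8"))) {
          System.out.println(in.readLine());
        }
      }
    }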

Indexing LZO Files

At this point, you should also be able to use the indexer to index lzo files in Hadoop (recall: this makes them splittable, so that they can be analyzed in parallel in a mapreduce job). Imagine that big_file.lzo is a 1 GB LZO file. You have two options:

  • index it in-process via:

      hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo
    
  • index it in a map-reduce job via:

      hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer big_file.lzo
    

Either way, after 10-20 seconds there will be a file named big_file.lzo.index. The newly-created index file tells the LzoTextInputFormat's getSplits function how to break the LZO file into splits that can be decompressed and processed in parallel. Alternatively, if you specify a directory instead of a filename, both indexers will recursively walk the directory structure looking for .lzo files, indexing any that do not already have corresponding .lzo.index files.

Running MR Jobs over Indexed Files

Now run any job, say wordcount, over the new file. In Java-based M/R jobs, just replace any uses of TextInputFormat by LzoTextInputFormat. In streaming jobs, add "-inputformat com.hadoop.mapred.DeprecatedLzoTextInputFormat" (streaming still uses the old APIs, and needs a class that inherits from org.apache.hadoop.mapred.InputFormat). Note that to use the DeprecatedLzoTextInputFormat properly with hadoop-streaming, you should also set the jobconf property stream.map.input.ignoreKey=true. That will replicate the behavior of the default TextInputFormat by stripping off the byte offset keys from the input lines that get piped to the mapper process. For Pig jobs, email me or check the pig list -- I have custom LZO loader classes that work but are not (yet) contributed back.
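
For the newer mapreduce API, the LZO-specific part of a job driver is just the input format swap. A minimal sketch (mapper/reducer setup omitted; assumes the new-API input format class com.hadoop.mapreduce.LzoTextInputFormat from this project):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import com.hadoop.mapreduce.LzoTextInputFormat;

    public class LzoWordCountDriver {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "wordcount-over-lzo");
        job.setJarByClass(LzoWordCountDriver.class);
        // The only LZO-specific change versus a plain-text job: swap the input format.
        job.setInputFormatClass(LzoTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. big_file.lzo
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // job.setMapperClass(...); job.setReducerClass(...); as in any wordcount
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }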

Note that if you forget to index an .lzo file, the job will work but will process the entire file in a single split, which will be less efficient.

hadoop-lzo's People

Contributors

abrock, alanbato, angushe, ash211, caniszczyk, cevaris, cnauroth, cutiechi, dependabot[bot], dvryaboy, eugenepig, geota, johnzzgithub, jrottinghuis, kevinweil, kwmonroe, miguno, rajitsaha, rangadi, seigel, sjlee, tarnfeld, themodernlife, toddlipcon, traviscrawford, yaojingguo, yuxutw, zman0900


hadoop-lzo's Issues

Unable to set the queuename

Hello,

I am indexing lzo files using map-reduce with the command below (on CDH 5.5.1):
hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer -Dmapreduce.job.queuename=root.test big_file.lzo

The problem is that the job does not use the queue name specified on the command line; it goes to the default queue. On the other hand, if I run the Pi example, it uses the specified queue:
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi -Dmapreduce.job.queuename=root.test 10 100

Can you please advise how I can use a non-default queue. Thanks.

Windows compilation: dependency on msbuild

I want to get the library compiled on Windows. I installed Visual Studio Community and .NET Framework 4.5.2, but I cannot find the 'msbuild' tool; it is simply not there despite having installed these tools. Thus, compilation fails.

Please document where 'msbuild' can be found or, better, remove that dependency.

Error in distributed indexer : Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

Trying to run the distributed indexer on a CDH4B1 cluster, I get the following error information:

12/10/24 18:08:56 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
12/10/24 18:08:56 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f]
12/10/24 18:08:56 INFO lzo.DistributedLzoIndexer: Adding LZO file /user/Terry/data2/test.txt.lzo to indexing list (no index currently exists)
12/10/24 18:08:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/10/24 18:08:57 INFO input.FileInputFormat: Total input paths to process : 1
12/10/24 18:08:58 INFO mapred.JobClient: Running job: job_201210241738_0002
12/10/24 18:08:59 INFO mapred.JobClient: map 0% reduce 0%
12/10/24 18:09:08 INFO mapred.JobClient: Task Id : attempt_201210241738_0002_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
12/10/24 18:09:14 INFO mapred.JobClient: Task Id : attempt_201210241738_0002_m_000000_1, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
12/10/24 18:09:20 INFO mapred.JobClient: Task Id : attempt_201210241738_0002_m_000000_2, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
12/10/24 18:09:29 INFO mapred.JobClient: Job complete: job_201210241738_0002
12/10/24 18:09:29 INFO mapred.JobClient: Counters: 7
12/10/24 18:09:29 INFO mapred.JobClient: Job Counters
12/10/24 18:09:29 INFO mapred.JobClient: Failed map tasks=1
12/10/24 18:09:29 INFO mapred.JobClient: Launched map tasks=4
12/10/24 18:09:29 INFO mapred.JobClient: Rack-local map tasks=4
12/10/24 18:09:29 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=15178
12/10/24 18:09:29 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
12/10/24 18:09:29 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/10/24 18:09:29 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

Can anyone explain this to me? Thank you!

with/without lzo index behave differently

I run Hive to query (select count(*) from table) data compressed with lzo, and the record count with an lzo index differs from the count without an index. Does anyone know if this is a known issue?

Can anyone help with this?

Thanks!
Leo

Compilation error

hadoop-lzo won't compile on two separate machines.

Linux XXXX 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Linux XXXX 2.6.32-131.2.1.el6.x86_64 #1 SMP Wed May 18 07:07:37 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Here is gcc -v for the second machine.

Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) 

And ant -v.

Apache Ant(TM) version 1.8.2 compiled on December 20 2010

I downloaded hadoop-lzo using git clone and I am using version 2.05 of lzo compiled from http://www.oberhumer.com/opensource/lzo/. I saw a similar issue in another thread (http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201001.mbox/%[email protected]%3E) and this issue was supposedly resolved by "upgrading" to version 2.02. I tried downgrading to version 2.02 of LZO but did not have any success (same error).

 [exec] libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I/home/michael/Downloads/kevinweil-hadoop-lzo-4537f94/src/native -I./impl -I/usr/lib/jvm/java-1.6.0-openjdk/include -I/usr/lib/jvm/java-1.6.0-openjdk/include/linux -I/home/michael/Downloads/kevinweil-hadoop-lzo-4537f94/src/native/impl -Isrc/com/hadoop/compression/lzo -g -Wall -fPIC -O2 -m64 -g -O2 -MT impl/lzo/LzoCompressor.lo -MD -MP -MF impl/lzo/.deps/LzoCompressor.Tpo -c /home/michael/Downloads/kevinweil-hadoop-lzo-4537f94/src/native/impl/lzo/LzoCompressor.c  -fPIC -DPIC -o impl/lzo/.libs/LzoCompressor.o
 [exec] /home/michael/Downloads/kevinweil-hadoop-lzo-4537f94/src/native/impl/lzo/LzoCompressor.c: In function ‘Java_com_hadoop_compression_lzo_LzoCompressor_initIDs’:
 [exec] /home/michael/Downloads/kevinweil-hadoop-lzo-4537f94/src/native/impl/lzo/LzoCompressor.c:125:37: error: expected expression before ‘,’ token
 [exec] /home/michael/Downloads/kevinweil-hadoop-lzo-4537f94/src/native/impl/lzo/LzoCompressor.c: In function ‘Java_com_hadoop_compression_lzo_LzoCompressor_compressBytesDirect’:
 [exec] /home/michael/Downloads/kevinweil-hadoop-lzo-4537f94/src/native/impl/lzo/LzoCompressor.c:274:3: warning: implicit declaration of function ‘strstr’
 [exec] /home/michael/Downloads/kevinweil-hadoop-lzo-4537f94/src/native/impl/lzo/LzoCompressor.c:274:14: warning: incompatible implicit declaration of built-in function ‘strstr’
 [exec] make: *** [impl/lzo/LzoCompressor.lo] Error 1

Enhancement: add LOG.warn() about insufficiently large io.compression.codec.lzo.buffersize

I found a bug in Pig's contrib/zebra: https://issues.apache.org/jira/browse/PIG-3208 that causes LZO decompression to fail, and I thought it might be helpful to print a warning on the Java side to help diagnose these type of bugs.

Currently, when the LZO decompressor fails in this way, you see that your task JVM failed with a return value of 134, and in the task's syslog, you may see something like:

 2013-02-17 14:47:50,897 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.InternalError: lzo1x_decompress returned: -4

which is not very informative.

Note that this is not due to a bug in hadoop-lzo itself, however, but rather in Pig's TFile class. Still, hadoop-lzo can help by diagnosing the problem and recommending a solution.
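
Here is a minimal sketch of the kind of warning being requested (a hypothetical helper, not actual hadoop-lzo code; only the property name io.compression.codec.lzo.buffersize is taken from the project):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    // Hypothetical illustration of the requested diagnostic.
    public class LzoBufferSizeCheck {
      private static final Log LOG = LogFactory.getLog(LzoBufferSizeCheck.class);

      // Warn when a block is larger than the configured buffer, which otherwise
      // surfaces only as "lzo1x_decompress returned: -4".
      public static void warnIfBlockTooLarge(int blockLen, int configuredBufferSize) {
        if (blockLen > configuredBufferSize) {
          LOG.warn("LZO block of " + blockLen + " bytes exceeds "
              + "io.compression.codec.lzo.buffersize=" + configuredBufferSize
              + "; increase the buffer size to at least the writer's setting.");
        }
      }
    }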

Hadoop LZO fails to build with Open JDK 1.6 & Ubuntu.

I could not find another way to contact you. I ran ant clean test in the hadoop-lzo directory and saw the following error.
[exec] ompressor.c:273:3: warning: implicit declaration of function ‘strstr’ [-Wimplicit-function-declaration]
[exec] /home/deepakkv/softwares/hadoop-lzo/src/native/impl/lzo/LzoCompressor.c:273:14: warning: incompatible implicit declaration of built-in function ‘strstr’ [enabled by default]
[exec] make: *** [impl/lzo/LzoCompressor.lo] Error 1

To fix the above, I included string.h in /home/deepakkv/softwares/hadoop-lzo/src/native/impl/lzo/LzoCompressor.c and ran into the error below:
[exec] /home/deepakkv/softwares/hadoop-lzo/src/native/impl/lzo/LzoCompressor.c: In function ‘Java_com_hadoop_compression_lzo_LzoCompressor_initIDs’:
[exec] /home/deepakkv/softwares/hadoop-lzo/src/native/impl/lzo/LzoCompressor.c:125:37: error: expected expression before ‘,’ token
[exec] make: *** [impl/lzo/LzoCompressor.lo] Error 1

Environment
LIBRARY_PATH=/usr/local/lzo-2.06/lib
JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64/
Linux ubuntu 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
C_INCLUDE_PATH=/usr/local/lzo-2.06/include

can't index lzo file if the filename length is 3 or less

I tried to index an lzo file, but an error occurred when the filename is 3 characters or fewer.
I think this is a bug, so I am reporting the steps to reproduce the error and the environment in which it occurred.

---- environment ----

OS: Fedora 64-bit (I use Cloudera's AMI cloudera-ec2-hadoop-images/cloudera-hadoop-fedora-20090623-x86_64 ami-2359bf4a)
LZO library version: 2.02
lzop version: v1.02rc1
hadoop version: hadoop-0.20.2+228

--- steps ----

$ hadoop jar /usr/lib/hadoop/hadoop-0.20.2+228-examples.jar teragen 10000000 1G
...
$ hadoop fs -copyToLocal 1G/part-00000 1G
$ lzop 1G
$ hadoop fs -copyFromLocal 1G.lzo .
$ hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-0.4.4.jar com.hadoop.compression.lzo.LzoIndexer 1G.lzo
10/07/09 09:06:57 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
10/07/09 09:06:57 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 5c25e00]
10/07/09 09:06:57 INFO lzo.LzoIndexer: LZO Indexing directory lzo...
10/07/09 09:06:57 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file hdfs://localhost:9000/user/hadoop/1G.lzo, size 0.18 GB...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at com.hadoop.compression.lzo.LzopInputStream.readInt(LzopInputStream.java:92)
at com.hadoop.compression.lzo.LzopInputStream.readHeaderItem(LzopInputStream.java:103)
at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:189)
at com.hadoop.compression.lzo.LzopInputStream.(LzopInputStream.java:55)
at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:70)
at com.hadoop.compression.lzo.LzoIndex.createIndex(LzoIndex.java:224)
at com.hadoop.compression.lzo.LzoIndexer.indexSingleFile(LzoIndexer.java:117)
at com.hadoop.compression.lzo.LzoIndexer.indexInternal(LzoIndexer.java:98)
at com.hadoop.compression.lzo.LzoIndexer.indexInternal(LzoIndexer.java:86)
at com.hadoop.compression.lzo.LzoIndexer.index(LzoIndexer.java:52)
at com.hadoop.compression.lzo.LzoIndexer.main(LzoIndexer.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

The length of a M/R job's name

In some situations we need to create indexes on many files at the same time, and the name of each file can be very long. When I pass an array containing 100 files, the name of the job invoked by "ToolRunner.run(new DistributedLzoIndexer(), args)" becomes very unwieldy on jobtracker.jsp.

Get correct results processing files, but errors processing directories?

I use Spark to process lzo files, with a.lzo and b.lzo in "/input/".
When processing them separately, with the filename given as "/input/a.lzo" or "/input/b.lzo", I get correct results.
When processing them together, with the filename given as "/input" or "/input/", I get errors like this:

14/09/05 13:01:32 WARN LzopInputStream: IOException in getCompressedData; likely LZO corruption.
java.io.IOException: Corrupted uncompressed block
at com.hadoop.compression.lzo.LzopInputStream.verifyChecksums(LzopInputStream.java:219)
at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:284)
at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:261)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:201)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:184)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$1.hasNext(Iterator.scala:847)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:133)
at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:130)
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571)
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
14/09/05 13:01:32 WARN LzopInputStream: Incorrect LZO file format: file did not end with four trailing zeroes.
java.io.IOException: Corrupted uncompressed block
at com.hadoop.compression.lzo.LzopInputStream.verifyChecksums(LzopInputStream.java:219)
at com.hadoop.compression.lzo.LzopInputStream.close(LzopInputStream.java:342)
at org.apache.hadoop.util.LineReader.close(LineReader.java:150)
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:241)
at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:211)
at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63)
at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:196)
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:204)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Anyone can help?

Very slow compression of uncompressible data

I performed a small benchmark of LzoCodec (benchmark hadoop job is here: https://gist.github.com/1600759).

The result is quite surprising: LZO is more than 3 times slower on random data than plain storage or the Snappy compressor.

On our cluster I get these numbers:

  • without compression: 122 seconds
  • Snappy: 126 seconds
  • LZO: 390 seconds

I use the latest revision of hadoop-lzo and lzo 2.04. Do you have any suggestions on how to fix this?

build difficulties

Hi, I'm having issues building the jar with ant. I am not much of a Java developer, but we are building a Hadoop system to work with our cluster's Torque batch scheduler and need to compile our own version. So far, elephant-bird and hadoop are working, but I can't get hadoop-lzo to build. I've tried passing the directory of the hadoop libraries to ant with

ant -noclasspath -lib $HOME/src/hadoop-0.20.2 -lib $HOME/hadoop-0.20.2/lib clean compile-native test tar

but it's failing during compile-native with "[javah] Error: Class org.apache.hadoop.conf.Configuration could not be found." It looks like the hadoop libraries aren't being found. The full command output is below.

Am I missing some kind of environment variable that should be set? How do I specify that the hadoop classes are included during the build process?

Any help would be appreciated,
Kameron Harris
University of Vermont / onehappybird.com


Buildfile: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build.xml

clean:
[delete] Deleting directory /gpfs1/home/k/h/kharris/src/hadoop-lzo/build

ivy-download:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.0.0-rc2/ivy-2.0.0-rc2.jar
[get] To: /gpfs1/home/k/h/kharris/src/hadoop-lzo/ivy/ivy-2.0.0-rc2.jar
[get] Not modified - so not downloaded

ivy-init-dirs:
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/ivy
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/ivy/lib
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/ivy/report
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/ivy/maven

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.0.0-rc2 - 20081028224207 :: http://ant.apache.org/ivy/ ::
:: loading settings :: file = /gpfs1/home/k/h/kharris/src/hadoop-lzo/ivy/ivysettings.xml

ivy-resolve-common:
[ivy:resolve] :: resolving dependencies :: com.hadoop.gplcompression#Hadoop-GPL-Compression;[email protected]
[ivy:resolve] confs: [common]
[ivy:resolve] found commons-logging#commons-logging;1.0.4 in maven2
[ivy:resolve] found junit#junit;3.8.1 in maven2
[ivy:resolve] found commons-logging#commons-logging-api;1.0.4 in maven2
[ivy:resolve] :: resolution report :: resolve 209ms :: artifacts dl 6ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| common | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------

ivy-retrieve-common:
[ivy:retrieve] :: retrieving :: com.hadoop.gplcompression#Hadoop-GPL-Compression
[ivy:retrieve] confs: [common]
[ivy:retrieve] 3 artifacts copied, 0 already retrieved (180kB/12ms)
No ivy:settings found for the default reference 'ivy.instance'. A default instance will be used
DEPRECATED: 'ivy.conf.file' is deprecated, use 'ivy.settings.file' instead
:: loading settings :: file = /gpfs1/home/k/h/kharris/src/hadoop-lzo/ivy/ivysettings.xml

init:
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/classes
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/src
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/test
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/test/classes

compile-java:
[javac] /gpfs1/home/k/h/kharris/src/hadoop-lzo/build.xml:216: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 24 source files to /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/classes
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] /gpfs1/home/k/h/kharris/src/hadoop-lzo/src/java/com/hadoop/mapred/DeprecatedLzoLineRecordReader.java:31: warning: [deprecation] FileSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.FileSplit;
[javac] ^
[javac] /gpfs1/home/k/h/kharris/src/hadoop-lzo/src/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java:34: warning: [deprecation] FileSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.FileSplit;
[javac] ^
[javac] /gpfs1/home/k/h/kharris/src/hadoop-lzo/src/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java:35: warning: [deprecation] InputSplit in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.InputSplit;
[javac] ^
[javac] /gpfs1/home/k/h/kharris/src/hadoop-lzo/src/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java:36: warning: [deprecation] JobConf in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.JobConf;
[javac] ^
[javac] /gpfs1/home/k/h/kharris/src/hadoop-lzo/src/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java:37: warning: [deprecation] JobConfigurable in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.JobConfigurable;
[javac] ^
[javac] /gpfs1/home/k/h/kharris/src/hadoop-lzo/src/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java:40: warning: [deprecation] TextInputFormat in org.apache.hadoop.mapred has been deprecated
[javac] import org.apache.hadoop.mapred.TextInputFormat;
[javac] ^
[javac] /gpfs1/home/k/h/kharris/src/hadoop-lzo/src/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java:67: warning: [deprecation] TextInputFormat in org.apache.hadoop.mapred has been deprecated
[javac] public class DeprecatedLzoTextInputFormat extends TextInputFormat {
[javac] ^
[javac] 8 warnings

check-native-uptodate:

compile-native:
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/native/Linux-amd64-64/lib
[mkdir] Created dir: /gpfs1/home/k/h/kharris/src/hadoop-lzo/build/native/Linux-amd64-64/src/com/hadoop/compression/lzo
[javah] Error: Class org.apache.hadoop.conf.Configuration could not be found.

BUILD FAILED
/gpfs1/home/k/h/kharris/src/hadoop-lzo/build.xml:242: compilation failed

Total time: 5 seconds

changing the group id

This issue was brought up during the maven conversion (#70).

Currently the group id is specified as "com.hadoop.gplcompression". But Chris points out that we cannot publish to the maven central with this group id, and that it needs to be "com.twitter".

There are implications in changing the group id, and I'm opening this issue to discuss the next steps.

java.io.IOException: Corrupted uncompressed block

java.io.IOException: Corrupted uncompressed block
at com.hadoop.compression.lzo.LzopInputStream.verifyChecksums(LzopInputStream.java:221)
at com.hadoop.compression.lzo.LzopInputStream.close(LzopInputStream.java:344)
at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:377)
at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:192)
at java.io.InputStreamReader.close(InputStreamReader.java:199)
at java.io.BufferedReader.close(BufferedReader.java:517)
at com.hadoop.compression.lzo.TestLzopInputStream.runTest(TestLzopInputStream.java:147)
at com.hadoop.compression.lzo.TestLzopInputStream.testTruncatedFile(TestLzopInputStream.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

Oct 30, 2014 3:17:57 PM com.hadoop.compression.lzo.TestLzopInputStream runTest

Thread Safety?

LzoCompressor.realloc() fails to free the old buffer via cleaner

I noticed that this in the log while running the unit tests:

WARNING: Couldn't realloc bytebuffer
java.lang.IllegalAccessException: Class com.hadoop.compression.lzo.LzoCompressor can not access a member of class java.nio.DirectByteBuffer with modifiers "public"
at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95)
at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:261)
at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:253)
at java.lang.reflect.Method.invoke(Method.java:594)
at com.hadoop.compression.lzo.LzoCompressor.realloc(LzoCompressor.java:249)
at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:264)
at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:216)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
at com.hadoop.compression.lzo.TestLzoCodec.testCodecPoolChangeBufferSize(TestLzoCodec.java:57)

The old buffer may be eventually freed by the garbage collector, but at least the portion of code that frees the direct buffer explicitly via the cleaner isn't working. I see this both on Linux (open JDK) and Mac.

I suspect this is because we're not setting the accessible flag to true, and can be remedied easily.

FYI, you won't see this stack trace in the current master because with hadoop 2.0 the log4j logging isn't coming out properly. But it still happens there too. If you go back one commit and run the unit tests, you'll easily see this in the log.
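
A minimal sketch of that remedy, calling the DirectByteBuffer cleaner reflectively with setAccessible(true) (a standalone illustration for pre-Java-9 JVMs, not the actual LzoCompressor.realloc code):

    import java.lang.reflect.Method;
    import java.nio.ByteBuffer;

    // Standalone illustration of freeing a direct buffer via its cleaner.
    public class DirectBufferFreer {
      public static void free(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
          return;
        }
        try {
          Method cleanerMethod = buffer.getClass().getMethod("cleaner");
          cleanerMethod.setAccessible(true);               // the step the report says is missing
          Object cleaner = cleanerMethod.invoke(buffer);
          Method cleanMethod = cleaner.getClass().getMethod("clean");
          cleanMethod.setAccessible(true);
          cleanMethod.invoke(cleaner);
        } catch (Exception e) {
          // Fall back to letting the garbage collector reclaim the buffer.
        }
      }
    }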

msbuild

build-native-win:
[mkdir] Created dir: c:\Users\Administrator\Desktop\hadoop-lzo\target\native\Windows_NT-${env.PLATFORM}\lib
[mkdir] Created dir: c:\Users\Administrator\Desktop\hadoop-lzo\target\classes\native\Windows_NT-${env.PLATFORM}\lib
[mkdir] Created dir: c:\Users\Administrator\Desktop\hadoop-lzo\target\native\Windows_NT-${env.PLATFORM}\src\com\hadoop\compression\lzo
[javah] [Forcefully writing file RegularFileObject[c:\Users\Administrator\Desktop\hadoop-lzo\target\native\Windows_NT-${env.PLATFORM}\src\com\hadoop\compression\lzo\com_hadoop_compression_lzo_LzoCompressor.h]]
[javah] [Forcefully writing file RegularFileObject[c:\Users\Administrator\Desktop\hadoop-lzo\target\native\Windows_NT-${env.PLATFORM}\src\com\hadoop\compression\lzo\com_hadoop_compression_lzo_LzoCompressor_CompressionStrategy.h]]
[javah] [Forcefully writing file RegularFileObject[c:\Users\Administrator\Desktop\hadoop-lzo\target\native\Windows_NT-${env.PLATFORM}\src\com\hadoop\compression\lzo\com_hadoop_compression_lzo_LzoDecompressor.h]]
[javah] [Forcefully writing file RegularFileObject[c:\Users\Administrator\Desktop\hadoop-lzo\target\native\Windows_NT-${env.PLATFORM}\src\com\hadoop\compression\lzo\com_hadoop_compression_lzo_LzoDecompressor_CompressionStrategy.h]]
[exec] Build started 2014/10/29 9:52:34.
[exec] Project "c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.sln" on node 1 (default targets).
[exec] ValidateSolutionConfiguration:
[exec] Building solution configuration "Release|Win32".
[exec] Project "c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.sln" (1) is building "c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.vcxproj" (2) on node 1 (default targets).
[exec] InitializeBuildStatus:
[exec] Creating "c:\Users\Administrator\Desktop\hadoop-lzo\target/native/Windows_NT-${env.PLATFORM}/gplcompression.unsuccessfulbuild" because "AlwaysCreate" was specified.
[exec] ClCompile:
[exec] C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\CL.exe /c /Zi /nologo /W3 /WX- /O2 /Oi /Ot /Oy- /GL /D WIN32 /D NDEBUG /D _WINDOWS /D _USRDLL /D _WINDLL /D _UNICODE /D UNICODE /Gm- /EHsc /MD /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fo"c:\Users\Administrator\Desktop\hadoop-lzo\target/native/Windows_NT-${env.PLATFORM}/" /Fd"c:\Users\Administrator\Desktop\hadoop-lzo\target/native/Windows_NT-${env.PLATFORM}/vc100.pdb" /Gd /TC /analyze- /errorReport:queue impl\lzo\LzoCompressor.c impl\lzo\LzoDecompressor.c /D HADOOP_LZO_LIBRARY=L"lzo2.dll"
[exec] LzoCompressor.c
[exec] c:\users\administrator\desktop\hadoop-lzo\src\main\native\impl\lzo/lzo.h(22): fatal error C1083: Cannot open include file: 'lzo/lzo1.h': No such file or directory [c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.vcxproj]
[exec] LzoDecompressor.c
[exec] c:\users\administrator\desktop\hadoop-lzo\src\main\native\impl\lzo/lzo.h(22): fatal error C1083: Cannot open include file: 'lzo/lzo1.h': No such file or directory [c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.vcxproj]
[exec] Done building project "c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.vcxproj" (default targets) -- FAILED.
[exec] Done building project "c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.sln" (default targets) -- FAILED.
[exec]
[exec] Build FAILED.
[exec]
[exec] "c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.sln" (default target) (1) ->
[exec] "c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.vcxproj" (default target) (2) ->
[exec] (ClCompile target) ->
[exec] c:\users\administrator\desktop\hadoop-lzo\src\main\native\impl\lzo/lzo.h(22): fatal error C1083: Cannot open include file: 'lzo/lzo1.h': No such file or directory [c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.vcxproj]
[exec] c:\users\administrator\desktop\hadoop-lzo\src\main\native\impl\lzo/lzo.h(22): fatal error C1083: Cannot open include file: 'lzo/lzo1.h': No such file or directory [c:\Users\Administrator\Desktop\hadoop-lzo\src\main\native\gplcompression.vcxproj]
[exec]
[exec] 0 Warning(s)
[exec] 2 Error(s)
[exec]
[exec] Time Elapsed 00:00:00.77
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.040 s
[INFO] Finished at: 2014-10-29T09:52:35+08:00
[INFO] Final Memory: 32M/181M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (build-native-win) on project hadoop-lzo: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...... @ 12:71 in c:\Users\Administrator\Desktop\hadoop-lzo\target\antrun\build-build-native-win.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

Administrator@PC201301012314 ~/Desktop/hadoop-lzo (master)

Why does it default to Release|Win32, and how do I change it to x64?

Why can't it include 'lzo/lzo1.h'?

Embed the native libraries in the hadoop-lzo jar

The snappy-java library has a nice feature: it embeds the native libraries in the jar and loads the correct one depending on the OS. That would be a great feature to add to hadoop-lzo and would make testing easier.
https://github.com/xerial/snappy-java

in particular:
https://github.com/xerial/snappy-java/blob/develop/src/main/java/org/xerial/snappy/SnappyLoader.java
https://github.com/xerial/snappy-java/tree/develop/src/main/resources/org/xerial/snappy/native

LZOP compression corrupts output for specific input

We're using hadoop-lzo 0.4.7 with the patch for the empty file infinite loop (kevinweil@9d06b25)

For a specific input string the LzopCodec seems to corrupt the compressed output. We have a repeatable test case demonstrating this. The output of the test case follows - the first block contains the output of the file prior to compression, the second block contains the corrupted contents of the compressed/decompressed file:

***************************************************
* Content being compressed
*   /tmp/lzop-test129499022709115218.cleartext
***************************************************
0.5 74  25425
0.9 200 25384
0.95    203 4
0.98    211 2
0.99    219 3
0.995   240 5
***************************************************

***************************************************
* Content after compression/decompression
*   compressed:   /tmp/lzop-test129499022709115218.cleartext.lzop
*   uncompressed: /tmp/lzop-test129499022709115218.cleartext.uncompressed
***************************************************
0.5 74  25425
0.9 200 25384t5 203 ?8  211 2u9H
                                                    9   3
0.995   240 5
***************************************************

If I use the lzop binary (LZOP(1)) to compress and decompress, it works as expected:

$ lzop -c /tmp/lzop-test129499022709115218.cleartext  > output.lzop
$ lzop -d output.lzop
$ cat output
0.5 74  25425
0.9 200 25384
0.95    203 4
0.98    211 2
0.99    219 3
0.995   240 5

One more interesting piece of data is that if we set the LZO buffer size to a small value, the output is not corrupted:

c.set(LzoCodec.LZO_BUFFER_SIZE_KEY, "1024");

This is the test code to reproduce the problem:

import com.hadoop.compression.lzo.LzoCodec;
import com.hadoop.compression.lzo.LzopCodec;
import org.apache.commons.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.io.*;
import java.net.URI;

public class LzoTestCorruptReduceOutput4 {

    static String inputData = "0.5\t74\t25425\n"+
            "0.9\t200\t25384\n"+
            "0.95\t203\t4\n"+
            "0.98\t211\t2\n"+
            "0.99\t219\t3\n"+
            "0.995\t240\t5";

    static LzopCodec codec = new LzopCodec();
    static RawLocalFileSystem fs = new RawLocalFileSystem();

    public static void main(String[] args) throws Exception {

        Configuration c = new Configuration();
        // uncommenting the following line seems to fix the corruption
//        c.set(LzoCodec.LZO_BUFFER_SIZE_KEY, "1024");

        codec.setConf(c);
        fs.setConf(new Configuration());
        fs.initialize(new URI("file:///"), new Configuration());

        File cleartextFile = File.createTempFile("lzop-test", ".cleartext");
        File cleartextUncompressedFile = new File(cleartextFile.getAbsoluteFile() + ".uncompressed");
        File compressedFile = new File(cleartextFile.getAbsoluteFile() + ".lzop");

        FileUtils.writeStringToFile(cleartextFile, inputData);

        System.out.println("");
        System.out.println("***************************************************");
        System.out.println("* Content being compressed");
        System.out.println("*   " + cleartextFile.getAbsolutePath());
        System.out.println("***************************************************");
        System.out.println(inputData);
        System.out.println("***************************************************");

        compress(cleartextFile, compressedFile);

        decompress(compressedFile, cleartextUncompressedFile);

        System.out.println("");
        System.out.println("***************************************************");
        System.out.println("* Content after compression/decompression");
        System.out.println("*   compressed:   " + compressedFile.getAbsolutePath());
        System.out.println("*   uncompressed: " + cleartextUncompressedFile.getAbsolutePath());
        System.out.println("***************************************************");
        System.out.println(FileUtils.readFileToString(cleartextUncompressedFile));
        System.out.println("***************************************************");
    }

    public static void compress(File input, File output) throws IOException {
        copyStream(
                fs.open(new Path(input.getAbsolutePath())),
                codec.createOutputStream(fs.create(new Path(output.getAbsolutePath()), true)));
    }

    public static void decompress(File input, File output) throws IOException {
        copyStream(
                codec.createInputStream(fs.open(new Path(input.getAbsolutePath()))),
                fs.create(new Path(output.getAbsolutePath()), true));
    }

    public static void copyStream( InputStream is, OutputStream os) throws IOException {
        IOUtils.copy(is, os);
        IOUtils.closeQuietly(os);
    }
}

Any assistance would be greatly appreciated.

Combined LZO files

Would it be possible to add support for combined LZO files? For example, if I compress two files and then concatenate the compressed versions, it'd be nice to be able to decompress the combined file and get the contents of both files back out. The lzop program supports this.

fail to build hadoop-lzo on Ubuntu 14.04 with oracle-java8

pigpigpig@pigpigpig:~/code/hadoop-lzo$ mvn clean package -Dmaven.test.skip=true
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building hadoop-lzo 0.4.20-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hadoop-lzo ---
[INFO] Deleting /home/pigpigpig/code/hadoop-lzo/target
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (check-platform) @ hadoop-lzo ---
[INFO] Executing tasks

check-platform:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (set-props-non-win) @ hadoop-lzo ---
[INFO] Executing tasks

set-props-non-win:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (set-props-win) @ hadoop-lzo ---
[INFO] Executing tasks

set-props-win:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-resources-plugin:2.3:resources (default-resources) @ hadoop-lzo ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/pigpigpig/code/hadoop-lzo/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ hadoop-lzo ---
[INFO] Compiling 25 source files to /home/pigpigpig/code/hadoop-lzo/target/classes
[WARNING] bootstrap class path not set in conjunction with -source 1.6
/home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/DistributedLzoIndexer.java:[41,20] [deprecation] isDir() in FileStatus has been deprecated
[WARNING] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/DistributedLzoIndexer.java:[87,14] [deprecation] Job(Configuration) in Job has been deprecated
[WARNING] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndexer.java:[82,18] [deprecation] isDir() in FileStatus has been deprecated
[WARNING] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/mapreduce/LzoIndexOutputFormat.java:[31,28] [deprecation] cleanupJob(JobContext) in OutputCommitter has been deprecated
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (build-info-non-win) @ hadoop-lzo ---
[INFO] Executing tasks

build-info-non-win:
[propertyfile] Creating new property file: /home/pigpigpig/code/hadoop-lzo/target/classes/build.properties
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (build-info-win) @ hadoop-lzo ---
[INFO] Executing tasks

build-info-win:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (check-native-uptodate-non-win) @ hadoop-lzo ---
[INFO] Executing tasks

check-native-uptodate-non-win:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (check-native-uptodate-win) @ hadoop-lzo ---
[INFO] Executing tasks

check-native-uptodate-win:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (build-native-non-win) @ hadoop-lzo ---
[INFO] Executing tasks

build-native-non-win:
[mkdir] Created dir: /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/lib
[mkdir] Created dir: /home/pigpigpig/code/hadoop-lzo/target/classes/native/Linux-amd64-64/lib
[mkdir] Created dir: /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/src/com/hadoop/compression/lzo
[javah] [Forcefully writing file RegularFileObject[/home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/src/com/hadoop/compression/lzo/com_hadoop_compression_lzo_LzoCompressor.h]]
[javah] [Forcefully writing file RegularFileObject[/home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/src/com/hadoop/compression/lzo/com_hadoop_compression_lzo_LzoCompressor_CompressionStrategy.h]]
[javah] [Forcefully writing file RegularFileObject[/home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/src/com/hadoop/compression/lzo/com_hadoop_compression_lzo_LzoDecompressor.h]]
[javah] [Forcefully writing file RegularFileObject[/home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/src/com/hadoop/compression/lzo/com_hadoop_compression_lzo_LzoDecompressor_CompressionStrategy.h]]
[exec] checking for a BSD-compatible install... /usr/bin/install -c
[exec] checking whether build environment is sane... yes
[exec] checking for a thread-safe mkdir -p... /bin/mkdir -p
[exec] checking for gawk... no
[exec] checking for mawk... mawk
[exec] checking whether make sets $(MAKE)... yes
[exec] checking whether to enable maintainer-specific portions of Makefiles... no
[exec] checking for style of include used by make... GNU
[exec] checking for gcc... gcc
[exec] checking whether the C compiler works... yes
[exec] checking for C compiler default output file name... a.out
[exec] checking for suffix of executables...
[exec] checking whether we are cross compiling... no
[exec] checking for suffix of object files... o
[exec] checking whether we are using the GNU C compiler... yes
[exec] checking whether gcc accepts -g... yes
[exec] checking for gcc option to accept ISO C89... none needed
[exec] checking dependency style of gcc... gcc3
[exec] checking how to run the C preprocessor... gcc -E
[exec] checking for grep that handles long lines and -e... /bin/grep
[exec] checking for egrep... /bin/grep -E
[exec] checking for ANSI C header files... yes
[exec] checking for sys/types.h... yes
[exec] checking for sys/stat.h... yes
[exec] checking for stdlib.h... yes
[exec] checking for string.h... yes
[exec] checking for memory.h... yes
[exec] checking for strings.h... yes
[exec] checking for inttypes.h... yes
[exec] checking for stdint.h... yes
[exec] checking for unistd.h... yes
[exec] checking minix/config.h usability... no
[exec] checking minix/config.h presence... no
[exec] checking for minix/config.h... no
[exec] checking whether it is safe to define EXTENSIONS... yes
[exec] checking for gcc... (cached) gcc
[exec] checking whether we are using the GNU C compiler... (cached) yes
[exec] checking whether gcc accepts -g... (cached) yes
[exec] checking for gcc option to accept ISO C89... (cached) none needed
[exec] checking dependency style of gcc... (cached) gcc3
[exec] checking build system type... x86_64-unknown-linux-gnu
[exec] checking host system type... x86_64-unknown-linux-gnu
[exec] checking for a sed that does not truncate output... /bin/sed
[exec] checking for fgrep... /bin/grep -F
[exec] checking for ld used by gcc... /usr/bin/ld
[exec] checking if the linker (/usr/bin/ld) is GNU ld... yes
[exec] checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
[exec] checking the name lister (/usr/bin/nm -B) interface... BSD nm
[exec] checking whether ln -s works... yes
[exec] checking the maximum length of command line arguments... 1572864
[exec] checking whether the shell understands some XSI constructs... yes
[exec] checking whether the shell understands "+="... yes
[exec] checking for /usr/bin/ld option to reload object files... -r
[exec] checking for objdump... objdump
[exec] checking how to recognize dependent libraries... pass_all
[exec] checking for ar... ar
[exec] checking for strip... strip
[exec] checking for ranlib... ranlib
[exec] checking command to parse /usr/bin/nm -B output from gcc object... ok
[exec] checking for dlfcn.h... yes
[exec] checking for objdir... .libs
[exec] checking if gcc supports -fno-rtti -fno-exceptions... no
[exec] checking for gcc option to produce PIC... -fPIC -DPIC
[exec] checking if gcc PIC flag -fPIC -DPIC works... yes
[exec] checking if gcc static flag -static works... yes
[exec] checking if gcc supports -c -o file.o... yes
[exec] checking if gcc supports -c -o file.o... (cached) yes
[exec] checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
[exec] checking whether -lc should be explicitly linked in... no
[exec] checking dynamic linker characteristics... GNU/Linux ld.so
[exec] checking how to hardcode library paths into programs... immediate
[exec] checking whether stripping libraries is possible... yes
[exec] checking if libtool supports shared libraries... yes
[exec] checking whether to build shared libraries... yes
[exec] checking whether to build static libraries... yes
[exec] checking for dlopen in -ldl... yes
[exec] checking for unistd.h... (cached) yes
[exec] checking stdio.h usability... yes
[exec] checking stdio.h presence... yes
[exec] checking for stdio.h... yes
[exec] checking stddef.h usability... yes
[exec] checking stddef.h presence... yes
[exec] checking for stddef.h... yes
[exec] checking lzo/lzo2a.h usability... yes
[exec] checking lzo/lzo2a.h presence... yes
[exec] checking for lzo/lzo2a.h... yes
[exec] checking Checking for the 'actual' dynamic-library for '-llzo2'... "liblzo2.so.2"
[exec] checking for special C compiler options needed for large files... no
[exec] checking for _FILE_OFFSET_BITS value needed for large files... no
[exec] checking for stdbool.h that conforms to C99... yes
[exec] checking for _Bool... yes
[exec] checking for an ANSI C-conforming const... yes
[exec] checking for off_t... yes
[exec] checking for size_t... yes
[exec] checking whether strerror_r is declared... yes
[exec] checking for strerror_r... yes
[exec] checking whether strerror_r returns char *... yes
[exec] checking for mkdir... yes
[exec] checking for uname... yes
[exec] checking for memset... yes
[exec] checking for JNI_GetCreatedJavaVMs in -ljvm... yes
[exec] checking jni.h usability... yes
[exec] checking jni.h presence... yes
[exec] checking for jni.h... yes
[exec] configure: creating ./config.status
[exec] config.status: creating Makefile
[exec] config.status: creating impl/config.h
[exec] config.status: executing depfiles commands
[exec] config.status: executing libtool commands
[exec] depbase=echo impl/lzo/LzoCompressor.lo | sed 's|[^/]*$|.deps/&|;s|.lo$||';
[exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I/home/pigpigpig/code/hadoop-lzo/src/main/native -I./impl -I/usr/lib/jvm/java-8-oracle/include -I/usr/lib/jvm/java-8-oracle/include/linux -I/home/pigpigpig/code/hadoop-lzo/src/main/native/impl -Isrc/com/hadoop/compression/lzo -g -Wall -fPIC -O2 -m64 -g -O2 -MT impl/lzo/LzoCompressor.lo -MD -MP -MF $depbase.Tpo -c -o impl/lzo/LzoCompressor.lo /home/pigpigpig/code/hadoop-lzo/src/main/native/impl/lzo/LzoCompressor.c &&
[exec] mv -f $depbase.Tpo $depbase.Plo
[exec] libtool: compile: gcc -DHAVE_CONFIG_H -I. -I/home/pigpigpig/code/hadoop-lzo/src/main/native -I./impl -I/usr/lib/jvm/java-8-oracle/include -I/usr/lib/jvm/java-8-oracle/include/linux -I/home/pigpigpig/code/hadoop-lzo/src/main/native/impl -Isrc/com/hadoop/compression/lzo -g -Wall -fPIC -O2 -m64 -g -O2 -MT impl/lzo/LzoCompressor.lo -MD -MP -MF impl/lzo/.deps/LzoCompressor.Tpo -c /home/pigpigpig/code/hadoop-lzo/src/main/native/impl/lzo/LzoCompressor.c -fPIC -DPIC -o impl/lzo/.libs/LzoCompressor.o
[exec] libtool: compile: gcc -DHAVE_CONFIG_H -I. -I/home/pigpigpig/code/hadoop-lzo/src/main/native -I./impl -I/usr/lib/jvm/java-8-oracle/include -I/usr/lib/jvm/java-8-oracle/include/linux -I/home/pigpigpig/code/hadoop-lzo/src/main/native/impl -Isrc/com/hadoop/compression/lzo -g -Wall -fPIC -O2 -m64 -g -O2 -MT impl/lzo/LzoCompressor.lo -MD -MP -MF impl/lzo/.deps/LzoCompressor.Tpo -c /home/pigpigpig/code/hadoop-lzo/src/main/native/impl/lzo/LzoCompressor.c -o impl/lzo/LzoCompressor.o >/dev/null 2>&1
[exec] depbase=echo impl/lzo/LzoDecompressor.lo | sed 's|[^/]*$|.deps/&|;s|.lo$||';
[exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I/home/pigpigpig/code/hadoop-lzo/src/main/native -I./impl -I/usr/lib/jvm/java-8-oracle/include -I/usr/lib/jvm/java-8-oracle/include/linux -I/home/pigpigpig/code/hadoop-lzo/src/main/native/impl -Isrc/com/hadoop/compression/lzo -g -Wall -fPIC -O2 -m64 -g -O2 -MT impl/lzo/LzoDecompressor.lo -MD -MP -MF $depbase.Tpo -c -o impl/lzo/LzoDecompressor.lo /home/pigpigpig/code/hadoop-lzo/src/main/native/impl/lzo/LzoDecompressor.c &&
[exec] mv -f $depbase.Tpo $depbase.Plo
[exec] libtool: compile: gcc -DHAVE_CONFIG_H -I. -I/home/pigpigpig/code/hadoop-lzo/src/main/native -I./impl -I/usr/lib/jvm/java-8-oracle/include -I/usr/lib/jvm/java-8-oracle/include/linux -I/home/pigpigpig/code/hadoop-lzo/src/main/native/impl -Isrc/com/hadoop/compression/lzo -g -Wall -fPIC -O2 -m64 -g -O2 -MT impl/lzo/LzoDecompressor.lo -MD -MP -MF impl/lzo/.deps/LzoDecompressor.Tpo -c /home/pigpigpig/code/hadoop-lzo/src/main/native/impl/lzo/LzoDecompressor.c -fPIC -DPIC -o impl/lzo/.libs/LzoDecompressor.o
[exec] libtool: compile: gcc -DHAVE_CONFIG_H -I. -I/home/pigpigpig/code/hadoop-lzo/src/main/native -I./impl -I/usr/lib/jvm/java-8-oracle/include -I/usr/lib/jvm/java-8-oracle/include/linux -I/home/pigpigpig/code/hadoop-lzo/src/main/native/impl -Isrc/com/hadoop/compression/lzo -g -Wall -fPIC -O2 -m64 -g -O2 -MT impl/lzo/LzoDecompressor.lo -MD -MP -MF impl/lzo/.deps/LzoDecompressor.Tpo -c /home/pigpigpig/code/hadoop-lzo/src/main/native/impl/lzo/LzoDecompressor.c -o impl/lzo/LzoDecompressor.o >/dev/null 2>&1
[exec] /bin/bash ./libtool --tag=CC --mode=link gcc -g -Wall -fPIC -O2 -m64 -g -O2 -L/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server -Wl,--no-as-needed -o libgplcompression.la -rpath /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/../install/lib impl/lzo/LzoCompressor.lo impl/lzo/LzoDecompressor.lo -ljvm -ldl
[exec] libtool: link: gcc -shared impl/lzo/.libs/LzoCompressor.o impl/lzo/.libs/LzoDecompressor.o -L/usr/lib/jvm/java-8-oracle/jre/lib/amd64/server -ljvm -ldl -m64 -Wl,--no-as-needed -Wl,-soname -Wl,libgplcompression.so.0 -o .libs/libgplcompression.so.0.0.0
[exec] libtool: link: (cd ".libs" && rm -f "libgplcompression.so.0" && ln -s "libgplcompression.so.0.0.0" "libgplcompression.so.0")
[exec] libtool: link: (cd ".libs" && rm -f "libgplcompression.so" && ln -s "libgplcompression.so.0.0.0" "libgplcompression.so")
[exec] libtool: link: ar cru .libs/libgplcompression.a impl/lzo/LzoCompressor.o impl/lzo/LzoDecompressor.o
[exec] libtool: link: ranlib .libs/libgplcompression.a
[exec] libtool: link: ( cd ".libs" && rm -f "libgplcompression.la" && ln -s "../libgplcompression.la" "libgplcompression.la" )
[exec] libtool: install: cp /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/.libs/libgplcompression.so.0.0.0 /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/lib/libgplcompression.so.0.0.0
[exec] libtool: install: warning: remember to run `libtool --finish /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/../install/lib'
[exec] libtool: install: (cd /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/lib && { ln -s -f libgplcompression.so.0.0.0 libgplcompression.so.0 || { rm -f libgplcompression.so.0 && ln -s libgplcompression.so.0.0.0 libgplcompression.so.0; }; })
[exec] libtool: install: (cd /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/lib && { ln -s -f libgplcompression.so.0.0.0 libgplcompression.so || { rm -f libgplcompression.so && ln -s libgplcompression.so.0.0.0 libgplcompression.so; }; })
[exec] libtool: install: cp /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/.libs/libgplcompression.lai /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/lib/libgplcompression.la
[exec] libtool: install: cp /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/.libs/libgplcompression.a /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/lib/libgplcompression.a
[exec] libtool: install: chmod 644 /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/lib/libgplcompression.a
[exec] libtool: install: ranlib /home/pigpigpig/code/hadoop-lzo/target/native/Linux-amd64-64/lib/libgplcompression.a
[copy] Copying 5 files to /home/pigpigpig/code/hadoop-lzo/target/classes/native/Linux-amd64-64/lib
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (build-native-win) @ hadoop-lzo ---
[INFO] Executing tasks

build-native-win:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-resources-plugin:2.3:testResources (default-testResources) @ hadoop-lzo ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 12 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ hadoop-lzo ---
[INFO] Not compiling test sources
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (prep-test) @ hadoop-lzo ---
[INFO] Executing tasks

prep-test:
[mkdir] Created dir: /home/pigpigpig/code/hadoop-lzo/target/test-classes/logs
[INFO] Executed tasks
[INFO]
[INFO] --- maven-surefire-plugin:2.14.1:test (default-test) @ hadoop-lzo ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hadoop-lzo ---
[INFO] Building jar: /home/pigpigpig/code/hadoop-lzo/target/hadoop-lzo-0.4.20-SNAPSHOT.jar
[INFO]
[INFO] >>> maven-source-plugin:2.2.1:jar (attach-sources) @ hadoop-lzo >>>
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (check-platform) @ hadoop-lzo ---
[INFO] Executing tasks

check-platform:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (set-props-non-win) @ hadoop-lzo ---
[INFO] Executing tasks

set-props-non-win:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (set-props-win) @ hadoop-lzo ---
[INFO] Executing tasks

set-props-win:
[INFO] Executed tasks
[INFO]
[INFO] <<< maven-source-plugin:2.2.1:jar (attach-sources) @ hadoop-lzo <<<
[INFO]
[INFO] --- maven-source-plugin:2.2.1:jar (attach-sources) @ hadoop-lzo ---
[INFO] Building jar: /home/pigpigpig/code/hadoop-lzo/target/hadoop-lzo-0.4.20-SNAPSHOT-sources.jar
[INFO]
[INFO] --- maven-javadoc-plugin:2.9:jar (attach-javadocs) @ hadoop-lzo ---
[INFO]
Loading source files for package com.hadoop.mapreduce...
Loading source files for package com.hadoop.compression.lzo...
Loading source files for package com.hadoop.compression.lzo.util...
Loading source files for package com.hadoop.mapred...
Loading source files for package com.quicklz...
Loading source files for package org.apache.hadoop.io.compress...
Constructing Javadoc information...
Standard Doclet version 1.8.0_20
Building tree for all the packages and classes...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/LzoIndexOutputFormat.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/LzoIndexRecordWriter.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/LzoLineRecordReader.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/LzoSplitInputFormat.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/LzoSplitRecordReader.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/LzoSplitRecordReader.Counters.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/LzoTextInputFormat.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/CChecksum.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/DChecksum.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/DistributedLzoIndexer.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/GPLNativeCodeLoader.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/LzoCodec.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/LzoIndex.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/LzoIndexer.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/LzoInputFormatCommon.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/LzopCodec.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/LzopDecompressor.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/LzopInputStream.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/LzopOutputStream.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/util/CompatibilityUtil.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapred/DeprecatedLzoLineRecordReader.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapred/DeprecatedLzoTextInputFormat.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/quicklz/QuickLZ.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/org/apache/hadoop/io/compress/LzoCodec.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/overview-frame.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/package-frame.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/package-summary.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/package-tree.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/util/package-frame.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/util/package-summary.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/util/package-tree.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapred/package-frame.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapred/package-summary.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapred/package-tree.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/package-frame.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/package-summary.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/package-tree.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/quicklz/package-frame.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/quicklz/package-summary.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/quicklz/package-tree.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/org/apache/hadoop/io/compress/package-frame.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/org/apache/hadoop/io/compress/package-summary.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/org/apache/hadoop/io/compress/package-tree.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/constant-values.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/class-use/LzoIndexRecordWriter.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/class-use/LzoSplitInputFormat.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/class-use/LzoLineRecordReader.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/class-use/LzoSplitRecordReader.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/class-use/LzoSplitRecordReader.Counters.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/class-use/LzoIndexOutputFormat.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/class-use/LzoTextInputFormat.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/LzopDecompressor.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/LzopInputStream.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/GPLNativeCodeLoader.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/LzopCodec.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/LzopOutputStream.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/LzoIndex.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/DistributedLzoIndexer.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/CChecksum.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/LzoInputFormatCommon.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/DChecksum.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/LzoIndexer.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/class-use/LzoCodec.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/util/class-use/CompatibilityUtil.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapred/class-use/DeprecatedLzoTextInputFormat.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapred/class-use/DeprecatedLzoLineRecordReader.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/quicklz/class-use/QuickLZ.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/org/apache/hadoop/io/compress/class-use/LzoCodec.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/package-use.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/compression/lzo/util/package-use.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapred/package-use.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/hadoop/mapreduce/package-use.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/com/quicklz/package-use.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/org/apache/hadoop/io/compress/package-use.html...
Building index for all the packages and classes...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/overview-tree.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/index-all.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/deprecated-list.html...
Building index for all classes...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/allclasses-frame.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/allclasses-noframe.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/index.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/overview-summary.html...
Generating /home/pigpigpig/code/hadoop-lzo/target/apidocs/help-doc.html...
7 errors
34 warnings
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.251s
[INFO] Finished at: Thu Sep 18 16:26:01 CST 2014
[INFO] Final Memory: 25M/216M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.9:jar (attach-javadocs) on project hadoop-lzo: MavenReportException: Error while creating archive:
[ERROR] Exit code: 1 - /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoCodec.java:86: error: bad HTML entity
[ERROR] * Check if native-lzo library is loaded & initialized.
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoCodec.java:89: error: bad HTML entity
[ERROR] * @return true if native-lzo library is loaded & initialized;
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndex.java:73: warning: no @return
[ERROR] public int getNumberOfBlocks() {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndex.java:79: warning: no description for @param
[ERROR] * @param block
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndex.java:82: error: malformed HTML
[ERROR] * The argument block should satisfy 0 <= block < getNumberOfBlocks().
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndex.java:82: error: malformed HTML
[ERROR] * The argument block should satisfy 0 <= block < getNumberOfBlocks().
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndex.java:166: warning: no description for @throws
[ERROR] * @throws IOException
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndex.java:168: warning: no @return
[ERROR] public static LzoIndex readIndex(FileSystem fs, Path lzoFile) throws IOException {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndex.java:203: warning: no description for @throws
[ERROR] * @throws IOException
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndexer.java:48: error: @param name not found
[ERROR] * @param lzoUri The file to index.
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndexer.java:49: warning: no description for @throws
[ERROR] * @throws IOException
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndexer.java:51: warning: no @param for lzoPath
[ERROR] public void index(Path lzoPath) throws IOException {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoIndexer.java:128: warning: no @param for args
[ERROR] public static void main(String[] args) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:42: warning: no @param for dflags
[ERROR] public void initHeaderFlags(EnumSet dflags,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:42: warning: no @param for cflags
[ERROR] public void initHeaderFlags(EnumSet dflags,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:97: warning: no @param for typ
[ERROR] public synchronized boolean verifyDChecksum(DChecksum typ, int checksum) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:97: warning: no @param for checksum
[ERROR] public synchronized boolean verifyDChecksum(DChecksum typ, int checksum) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:97: warning: no @return
[ERROR] public synchronized boolean verifyDChecksum(DChecksum typ, int checksum) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:105: warning: no @param for typ
[ERROR] public synchronized boolean verifyCChecksum(CChecksum typ, int checksum) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:105: warning: no @param for checksum
[ERROR] public synchronized boolean verifyCChecksum(CChecksum typ, int checksum) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:105: warning: no @return
[ERROR] public synchronized boolean verifyCChecksum(CChecksum typ, int checksum) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopDecompressor.java:35: warning: no @param for bufferSize
[ERROR] public LzopDecompressor(int bufferSize) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzoDecompressor.java:169: error: bad HTML entity
[ERROR] * @return true if lzo decompressors are loaded & initialized,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopInputStream.java:113: warning: no @param for in
[ERROR] protected void readHeader(InputStream in) throws IOException {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopInputStream.java:113: warning: no @throws for java.io.IOException
[ERROR] protected void readHeader(InputStream in) throws IOException {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopOutputStream.java:41: warning: no @param for out
[ERROR] protected static void writeLzopHeader(OutputStream out,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopOutputStream.java:41: warning: no @param for strategy
[ERROR] protected static void writeLzopHeader(OutputStream out,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/LzopOutputStream.java:41: warning: no @throws for java.io.IOException
[ERROR] protected static void writeLzopHeader(OutputStream out,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:87: warning: no @return
[ERROR] public static boolean isVersion2x() {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:107: warning: no @param for conf
[ERROR] public static TaskAttemptContext newTaskAttemptContext(Configuration conf,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:107: warning: no @param for id
[ERROR] public static TaskAttemptContext newTaskAttemptContext(Configuration conf,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:107: warning: no @return
[ERROR] public static TaskAttemptContext newTaskAttemptContext(Configuration conf,
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:127: warning: no @param for context
[ERROR] public static Configuration getConfiguration(JobContext context) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:127: warning: no @return
[ERROR] public static Configuration getConfiguration(JobContext context) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:135: warning: no @param for context
[ERROR] public static Counter getCounter(TaskInputOutputContext context, Enum<?> counter) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:135: warning: no @param for counter
[ERROR] public static Counter getCounter(TaskInputOutputContext context, Enum<?> counter) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:135: warning: no @return
[ERROR] public static Counter getCounter(TaskInputOutputContext context, Enum<?> counter) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:142: warning: no @param for counter
[ERROR] public static void incrementCounter(Counter counter, long increment) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:142: warning: no @param for increment
[ERROR] public static void incrementCounter(Counter counter, long increment) {
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:147: error: bad HTML entity
[ERROR] * Hadoop 1 & 2 compatible counter.getValue()
[ERROR] ^
[ERROR] /home/pigpigpig/code/hadoop-lzo/src/main/java/com/hadoop/compression/lzo/util/CompatibilityUtil.java:150: warning: no @param for counter
[ERROR] public static long getCounterValue(Counter counter) {
[ERROR] ^
[ERROR]
[ERROR] Command line was: /usr/lib/jvm/java-8-oracle/jre/../bin/javadoc @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in '/home/pigpigpig/code/hadoop-lzo/target/apidocs' dir.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
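The underlying cause is that javadoc in JDK 8 enables doclint by default, so a bare "&" or "<"/">" in a Javadoc comment is now a hard error rather than a warning. A minimal sketch of the source-side fix (illustrative only; the method bodies stay unchanged, only the comments in LzoCodec.java, LzoDecompressor.java, CompatibilityUtil.java, etc. need escaping):

    /**
     * Check if native-lzo library is loaded &amp; initialized.
     *
     * @param conf configuration
     * @return true if native-lzo library is loaded &amp; initialized;
     *         false otherwise
     */

Similarly, the comparison in the LzoIndex.java comment ("0 <= block < getNumberOfBlocks()") needs &lt; and &gt;. If you only need the build to complete, skipping the javadoc jar should also work, e.g. mvn clean install -Dmaven.javadoc.skip=true, assuming the -javadoc artifact is not required.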

Tests fail on FreeBSD

I'm on 10.2-RELEASE-p7 amd64. After replacing make with gmake and deleting the erroneous FreeBSD detection I was able to build hadoop-lzo, but the tests fail:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.hadoop.compression.lzo.TestLzopInputStream
2016-05-27 09:57:29,000 ERROR lzo.GPLNativeCodeLoader (GPLNativeCodeLoader.java:<clinit>(63)) - Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:54)
    at com.hadoop.compression.lzo.TestLzopInputStream.runTest(TestLzopInputStream.java:111)
    at com.hadoop.compression.lzo.TestLzopInputStream.testBigFile(TestLzopInputStream.java:59)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:243)
    at junit.framework.TestSuite.run(TestSuite.java:238)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.java:208)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:159)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:87)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:95)
2016-05-27 09:57:29,015 WARN  lzo.TestLzopInputStream (TestLzopInputStream.java:runTest(112)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:29,018 WARN  lzo.TestLzopInputStream (TestLzopInputStream.java:runTest(112)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:29,020 WARN  lzo.TestLzopInputStream (TestLzopInputStream.java:runTest(112)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:29,021 WARN  lzo.TestLzopInputStream (TestLzopInputStream.java:runTest(112)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:29,022 WARN  lzo.TestLzopInputStream (TestLzopInputStream.java:runTest(112)) - Cannot run this test without the native lzo libraries
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.218 sec
Running com.hadoop.compression.lzo.TestLzoCodec
2016-05-27 09:57:29,432 ERROR lzo.GPLNativeCodeLoader (GPLNativeCodeLoader.java:<clinit>(63)) - Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:54)
    at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
    at com.hadoop.compression.lzo.TestLzoCodec.testCodecPoolReuseWithoutConf(TestLzoCodec.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:243)
    at junit.framework.TestSuite.run(TestSuite.java:238)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.java:208)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:159)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:87)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:95)
2016-05-27 09:57:29,439 ERROR lzo.LzoCodec (LzoCodec.java:<clinit>(81)) - Cannot load native-lzo without native-hadoop
Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.29 sec <<< FAILURE!
testCodecPoolReuseWithoutConf(com.hadoop.compression.lzo.TestLzoCodec)  Time elapsed: 0.271 sec  <<< ERROR!
java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at com.hadoop.compression.lzo.TestLzoCodec.testCodecPoolReuseWithoutConf(TestLzoCodec.java:73)

testCodecPoolChangeBufferSize(com.hadoop.compression.lzo.TestLzoCodec)  Time elapsed: 0 sec  <<< ERROR!
java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
    at com.hadoop.compression.lzo.TestLzoCodec.testCodecPoolChangeBufferSize(TestLzoCodec.java:48)

testCodecPoolReinit(com.hadoop.compression.lzo.TestLzoCodec)  Time elapsed: 0.001 sec  <<< ERROR!
java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
    at com.hadoop.compression.lzo.TestLzoCodec.testCodecPoolReinit(TestLzoCodec.java:22)

Running com.hadoop.compression.lzo.TestLzoRandData
2016-05-27 09:57:29,983 ERROR lzo.GPLNativeCodeLoader (GPLNativeCodeLoader.java:<clinit>(63)) - Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:54)
    at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
    at com.hadoop.compression.lzo.TestLzoRandData.runTest(TestLzoRandData.java:52)
    at com.hadoop.compression.lzo.TestLzoRandData.testLzoRandDataHugeChunks(TestLzoRandData.java:48)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:243)
    at junit.framework.TestSuite.run(TestSuite.java:238)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.java:208)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:159)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:87)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:95)
2016-05-27 09:57:29,990 ERROR lzo.LzoCodec (LzoCodec.java:<clinit>(81)) - Cannot load native-lzo without native-hadoop
Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.32 sec <<< FAILURE!
testLzoRandDataHugeChunks(com.hadoop.compression.lzo.TestLzoRandData)  Time elapsed: 0.243 sec  <<< ERROR!
java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at com.hadoop.compression.lzo.LzopCodec.getCompressor(LzopCodec.java:171)
    at com.hadoop.compression.lzo.LzopCodec.createOutputStream(LzopCodec.java:72)
    at com.hadoop.compression.lzo.TestLzoRandData.runTest(TestLzoRandData.java:62)
    at com.hadoop.compression.lzo.TestLzoRandData.testLzoRandDataHugeChunks(TestLzoRandData.java:48)

testLzoRandDataLargeChunks(com.hadoop.compression.lzo.TestLzoRandData)  Time elapsed: 0.035 sec  <<< ERROR!
java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at com.hadoop.compression.lzo.LzopCodec.getCompressor(LzopCodec.java:171)
    at com.hadoop.compression.lzo.LzopCodec.createOutputStream(LzopCodec.java:72)
    at com.hadoop.compression.lzo.TestLzoRandData.runTest(TestLzoRandData.java:62)
    at com.hadoop.compression.lzo.TestLzoRandData.testLzoRandDataLargeChunks(TestLzoRandData.java:44)

testLzoRandData(com.hadoop.compression.lzo.TestLzoRandData)  Time elapsed: 0.023 sec  <<< ERROR!
java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at com.hadoop.compression.lzo.LzopCodec.getCompressor(LzopCodec.java:171)
    at com.hadoop.compression.lzo.LzopCodec.createOutputStream(LzopCodec.java:72)
    at com.hadoop.compression.lzo.TestLzoRandData.runTest(TestLzoRandData.java:62)
    at com.hadoop.compression.lzo.TestLzoRandData.testLzoRandData(TestLzoRandData.java:40)

Running com.hadoop.compression.lzo.TestLzopOutputStream
2016-05-27 09:57:30,941 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-05-27 09:57:31,306 ERROR lzo.GPLNativeCodeLoader (GPLNativeCodeLoader.java:<clinit>(63)) - Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:54)
    at com.hadoop.compression.lzo.TestLzopOutputStream.runTest(TestLzopOutputStream.java:121)
    at com.hadoop.compression.lzo.TestLzopOutputStream.testBigFile(TestLzopOutputStream.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:243)
    at junit.framework.TestSuite.run(TestSuite.java:238)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.java:208)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:159)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:87)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:95)
2016-05-27 09:57:31,313 WARN  lzo.TestLzopOutputStream (TestLzopOutputStream.java:runTest(122)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:31,350 WARN  lzo.TestLzopOutputStream (TestLzopOutputStream.java:runTest(122)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:31,382 WARN  lzo.TestLzopOutputStream (TestLzopOutputStream.java:runTest(122)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:31,412 WARN  lzo.TestLzopOutputStream (TestLzopOutputStream.java:runTest(122)) - Cannot run this test without the native lzo libraries
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.121 sec
Running com.hadoop.mapreduce.TestLzoTextInputFormat
2016-05-27 09:57:32,027 ERROR lzo.GPLNativeCodeLoader (GPLNativeCodeLoader.java:<clinit>(63)) - Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1864)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:54)
    at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
    at com.hadoop.mapreduce.TestLzoTextInputFormat.<init>(TestLzoTextInputFormat.java:67)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at junit.framework.TestSuite.createTest(TestSuite.java:61)
    at junit.framework.TestSuite.addTestMethod(TestSuite.java:294)
    at junit.framework.TestSuite.addTestsFromTestCase(TestSuite.java:150)
    at junit.framework.TestSuite.<init>(TestSuite.java:129)
    at org.junit.internal.runners.JUnit38ClassRunner.<init>(JUnit38ClassRunner.java:71)
    at org.junit.internal.builders.JUnit3Builder.runnerForClass(JUnit3Builder.java:14)
    at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
    at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:29)
    at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:57)
    at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray2(ReflectionUtils.java:208)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:159)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:87)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:95)
2016-05-27 09:57:32,042 ERROR lzo.LzoCodec (LzoCodec.java:<clinit>(81)) - Cannot load native-lzo without native-hadoop
2016-05-27 09:57:32,096 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTest(152)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,097 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTest(152)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,100 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTest(152)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,100 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTest(152)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,110 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTestIgnoreNonLzo(293)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,110 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTestIgnoreNonLzo(293)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,111 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTestIgnoreNonLzo(293)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,111 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTestIgnoreNonLzo(293)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,113 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTestIgnoreNonLzo(293)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,113 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTestIgnoreNonLzo(293)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,114 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTestIgnoreNonLzo(293)) - Cannot run this test without the native lzo libraries
2016-05-27 09:57:32,114 WARN  mapreduce.TestLzoTextInputFormat (TestLzoTextInputFormat.java:runTestIgnoreNonLzo(293)) - Cannot run this test without the native lzo libraries
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.314 sec

Results :

Tests in error: 
  TestLzoCodec.testCodecPoolReuseWithoutConf:73 ? Runtime native-lzo library not...
  TestLzoCodec.testCodecPoolChangeBufferSize:48 ? Runtime native-lzo library not...
  TestLzoCodec.testCodecPoolReinit:22 ? Runtime native-lzo library not available
  TestLzoRandData.testLzoRandDataHugeChunks:48->runTest:62 ? Runtime native-lzo ...
  TestLzoRandData.testLzoRandDataLargeChunks:44->runTest:62 ? Runtime native-lzo...
  TestLzoRandData.testLzoRandData:40->runTest:62 ? Runtime native-lzo library no...

Tests run: 20, Failures: 0, Errors: 6, Skipped: 0
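The failures themselves do not look FreeBSD-specific: every trace starts with UnsatisfiedLinkError: no gplcompression in java.library.path, i.e. the freshly built libgplcompression is not visible to the surefire JVM (pointing java.library.path at the built native library directory would presumably let the errored tests run). The tests that merely log warnings guard on native availability before doing any work; a minimal sketch of that guard, assuming GPLNativeCodeLoader.isNativeCodeLoaded() and LzoCodec.isNativeLzoLoaded() are the relevant checks:

    import org.apache.hadoop.conf.Configuration;
    import com.hadoop.compression.lzo.GPLNativeCodeLoader;
    import com.hadoop.compression.lzo.LzoCodec;

    public class NativeLzoCheck {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Mirrors what the skipping tests appear to do: bail out early instead of
        // letting CodecPool.getCompressor() throw "native-lzo library not available".
        if (!GPLNativeCodeLoader.isNativeCodeLoaded() || !LzoCodec.isNativeLzoLoaded(conf)) {
          System.err.println("Cannot run this test without the native lzo libraries");
          return;
        }
        System.out.println("native-lzo loaded; LZO tests can run");
      }
    }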

Potential thread safety issue with LzoDecompressor

The problem occurs when reading LZO-compressed files in Spark via sc.textFile(...), but the same dataset and job config work fine when using LzoTextInputFormat.

I encounter multiple errors like:

java.lang.InternalError: lzo1x_decompress_safe returned: -6
    at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
    at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:315)
    at com.hadoop.compression.lzo.LzopDecompressor.decompress(LzopDecompressor.java:122)
    at com.hadoop.compression.lzo.LzopInputStream.decompress(LzopInputStream.java:252)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)

And occasionally a few of these:

Compressed length 892154724 exceeds max block size 67108864 (probably corrupt file)
  at com.hadoop.compression.lzo.LzopInputStream.getCompressedData(LzopInputStream.java:291)

These errors only happen when there are multiple threads per JVM (multiple executor cores).
We are using a 0.4.20-SNAPSHOT version starting from this commit.

Thanks
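For what it's worth, the symptoms (failures only when several executor cores share one JVM, while LzoTextInputFormat is fine) are consistent with a single Decompressor or input stream being touched by more than one thread; the native LzoDecompressor keeps per-instance direct buffers that are not safe to share. A minimal per-thread usage sketch under that assumption (not a confirmed diagnosis of the Spark code path):

    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CodecPool;
    import org.apache.hadoop.io.compress.Decompressor;
    import com.hadoop.compression.lzo.LzopCodec;

    public class PerThreadLzopRead {
      // Called from each worker thread with its own 'out'; nothing below is shared
      // between threads, so no two threads ever drive the same native buffer.
      static void copyLzo(Configuration conf, Path src, OutputStream out) throws Exception {
        LzopCodec codec = new LzopCodec();
        codec.setConf(conf);
        FileSystem fs = src.getFileSystem(conf);
        Decompressor decompressor = CodecPool.getDecompressor(codec);  // one per thread
        try (InputStream in = codec.createInputStream(fs.open(src), decompressor)) {
          IOUtils.copyBytes(in, out, conf, false);  // stream the decompressed bytes
        } finally {
          CodecPool.returnDecompressor(decompressor);
        }
      }
    }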

Unable to process LzoProtobuffB64LinePigStore data with new hadoop-lzo elephant bird

Hi,

I have compiled the latest hadoop-lzo and elephant-bird, but I am unable to process data written with LzoProtobuffB64LinePigStore (the previous class name) in a Pig query.

I am passing the message proto as an argument.

e.g.,

LOAD '/tmp/mytest.lzo' using com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore('ad_data');

Has the LzoProtobuffB64LinePigStore class been deprecated or removed from the package? If so, which class is appropriate for this case?

Please help.

Thanks.

JVM crash in LZO

The JVM crashes in LZO while decompressing a likely corrupted file. Just wondering if the code could be hardened against this?

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fd2629c061b, pid=11675, tid=140541442496256
#
# JRE version: 6.0_24-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [liblzo2.so.2+0x1361b]  lzo1x_decompress+0x1eb

---------------  T H R E A D  ---------------

Current thread (0x00000000413cd800):  JavaThread "Background process" [_thread_in_native, id=12567, stack(0x00007fd25ab19000,0x00007fd25ac1a000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x000000004215f000

Registers:
RAX=0x00000000040b4eb9, RBX=0x000000004215f000, RCX=0x00007fd25ac17f78, RDX=0x0000000041a79000
RSP=0x00007fd25ac17eb0, RBP=0x00000000040b4eb8, RSI=0x0000000041a38001, RDI=0x000000004215ef65
R8 =0x0000000041a79024, R9 =0x00000000039ceed8, R10=0x0000000041a78f85, R11=0x0000000000000000
R12=0x00007fd25ac18038, R13=0x0000000041a79000, R14=0x00000000413d1bd0, R15=0x0000000041a38000
RIP=0x00007fd2629c061b, EFL=0x0000000000010206, CSGSFS=0x0000000000000033, ERR=0x0000000000000006
  TRAPNO=0x000000000000000e

Register to memory mapping:

RAX=0x00000000040b4eb9
0x00000000040b4eb9 is pointing to unknown location

RBX=0x000000004215f000
0x000000004215f000 is pointing to unknown location

RCX=0x00007fd25ac17f78
0x00007fd25ac17f78 is pointing into the stack for thread: 0x00000000413cd800
"Background process" prio=10 tid=0x00000000413cd800 nid=0x3117 runnable [0x00007fd25ac17000]
   java.lang.Thread.State: RUNNABLE

RDX=0x0000000041a79000
0x0000000041a79000 is pointing to unknown location

RSP=0x00007fd25ac17eb0
0x00007fd25ac17eb0 is pointing into the stack for thread: 0x00000000413cd800
"Background process" prio=10 tid=0x00000000413cd800 nid=0x3117 runnable [0x00007fd25ac17000]
   java.lang.Thread.State: RUNNABLE

RBP=0x00000000040b4eb8
0x00000000040b4eb8 is pointing to unknown location

RSI=0x0000000041a38001
0x0000000041a38001 is pointing to unknown location

RDI=0x000000004215ef65
0x000000004215ef65 is pointing to unknown location

R8 =0x0000000041a79024
0x0000000041a79024 is pointing to unknown location
R9 =0x00000000039ceed8
0x00000000039ceed8 is pointing to unknown location

R10=0x0000000041a78f85
0x0000000041a78f85 is pointing to unknown location

R11=0x0000000000000000
0x0000000000000000 is pointing to unknown location

R12=0x00007fd25ac18038
0x00007fd25ac18038 is pointing into the stack for thread: 0x00000000413cd800
"Background process" prio=10 tid=0x00000000413cd800 nid=0x3117 runnable [0x00007fd25ac17000]
   java.lang.Thread.State: RUNNABLE

R13=0x0000000041a79000
0x0000000041a79000 is pointing to unknown location

R14=0x00000000413d1bd0
0x00000000413d1bd0 is pointing to unknown location

R15=0x0000000041a38000
0x0000000041a38000 is pointing to unknown location

Top of Stack: (sp=0x00007fd25ac17eb0)
0x00007fd25ac17eb0:   00000000413cd9c8 00007fd25ac17fb0
0x00007fd25ac17ec0:   00007fd25ac18038 00007fd262bec174
0x00007fd25ac17ed0:   0000000100000000 00007fd262bec9b3
0x00007fd25ac17ee0:   00007fd2629c0430 0000000100040000
0x00007fd25ac17ef0:   00007fd200000001 00000000c3449cf8
0x00007fd25ac17f00:   0000000000000000 00007fd270ef8e00
0x00007fd25ac17f10:   0000000000000020 0000000000000001
0x00007fd25ac17f20:   00000000c3449cf8 00007fd26be3e720
0x00007fd25ac17f30:   00007fd25ac17fc0 00007fd26be2f9a9
0x00007fd25ac17f40:   00000000bec9ca78 00007fd26be2f9a9
0x00007fd25ac17f50:   0000000000000001 0000000000000000
0x00007fd25ac17f60:   00000000be082f20 00007fd26be5066d
0x00007fd25ac17f70:   0000000000040000 0000000000000000
0x00007fd25ac17f80:   00007fd25ac17f70 00000000beeec710
0x00007fd25ac17f90:   0000000000000000 00000000beeec710
0x00007fd25ac17fa0:   00007fd25ac18038 00000000413cd800
0x00007fd25ac17fb0:   00007fd25ac18010 00007fd26be40a8f
0x00007fd25ac17fc0:   00007fd26be2f9a9 0000000000040000
0x00007fd25ac17fd0:   00007fd25ac17fd0 0000000000000000
0x00007fd25ac17fe0:   00007fd25ac18038 00000000beeecd78
0x00007fd25ac17ff0:   0000000000000000 00000000beeec710
0x00007fd25ac18000:   0000000000000000 00007fd25ac18030
0x00007fd25ac18010:   00007fd25ac18090 00007fd26be2f9ee
0x00007fd25ac18020:   0000000000000000 00007fd26be3c296
0x00007fd25ac18030:   000000000000000c 00000000c348e138
0x00007fd25ac18040:   0000000000000029 00000000c348e138
0x00007fd25ac18050:   00007fd25ac18040 00000000beeebecb
0x00007fd25ac18060:   00007fd25ac180c0 00000000beeecd78
0x00007fd25ac18070:   0000000000000000 00000000beeebf60
0x00007fd25ac18080:   00007fd25ac18030 00007fd25ac180d0
0x00007fd25ac18090:   00000000ebfaaff0 00007fd26c003b88
0x00007fd25ac180a0:   0000000000000000 0000000000000001

Instructions: (pc=0x00007fd2629c061b)
0x00007fd2629c060b:   c3 0f 1f 40 00 44 8b 1f 49 83 e9 04 48 83 c7 04
0x00007fd2629c061b:   44 89 1b 48 83 c3 04 49 83 f9 03 77 e8 48 83 e8

Stack: [0x00007fd25ab19000,0x00007fd25ac1a000],  sp=0x00007fd25ac17eb0,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [liblzo2.so.2+0x1361b]  lzo1x_decompress+0x1eb

[error occurred during error reporting (printing native stack), id 0xb]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)

j  com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(I)I+0
j  com.hadoop.compression.lzo.LzoDecompressor.decompress([BII)I+171
J  org.apache.hadoop.io.compress.DecompressorStream.read()I
J  org.apache.hadoop.io.WritableUtils.readVInt(Ljava/io/DataInput;)I
...

Old LZO version

Hi,
I am trying to decompress an LZO archive with your library and I get the following error:

java.io.IOException: Compressed with incompatible lzo version: 0x2060 (expected 0x2050)

Is there a way to decompress older LZO versions with your API?
Thanks

Mathieu

Tags for versioning

Hi,

I see that build.xml has a version number and that it has been changed, which I assume means the code is in good shape to move to the next version. Would it be possible to add tags for those versions? That would be very helpful for fetching specific releases.

Thanks

how to get lzo loaded?

At the very beginning, I ran hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' successfully. Then I followed this link: http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ to install LZO, ran hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' again, and got errors:


java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
.....................................................................
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.....................................................................
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 22 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec
not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
.......
... 27 more
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec

    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          .....................................................................
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
    ... 29 more

Obviously the LZO library didn't get loaded, but when I run "ps -eaf | grep lzo", I get:


Djava.library.path=/usr/local/............/usr/local/hadoop/hadoop-0.20.2/bin/../lib/hadoop-lzo-0.4.4.jar:/usr/local/hadoo......:..


See, the LZO library exists in -Djava.library.path. Actually, this problem came up a month ago and I still haven't solved it; it is killing me. Here I post all my configuration files. Would you please help me dig the problem out? Thank you.
core-site.xml


    <property>
            <name>fs.default.name</name>
            <value>hdfs://AlexLuya:8020</value>
    </property>
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/alex/tmp</value>
    </property>
    <property>
            <name>io.compression.codecs</name>
            <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec
            </value>
    </property>
    <property>
            <name>io.compression.codec.lzo.class</name>
            <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

mapreduce.xml


    <property>
            <name>mapred.job.tracker</name>
            <value>AlexLuya:9001</value>
    </property>
    <property>
            <name>mapred.tasktracker.reduce.tasks.maximum</name>
            <value>1</value>
    </property>
    <property>
            <name>mapred.tasktracker.map.tasks.maximum</name>
            <value>1</value>
    </property>
    <property>
            <name>mapred.local.dir</name>
            <value>/home/alex/hadoop/mapred/local</value>
    </property>
    <property>
            <name>mapred.system.dir</name>
            <value>/tmp/hadoop/mapred/system</value>
    </property>
    <property>
            <name>mapreduce.map.output.compress</name>
            <value>true</value>
    </property>
    <property>
            <name>mapreduce.map.output.compress.codec</name>
            <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

hadoop-env.sh


# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/usr/local/hadoop/jdk1.6.0_21

# Extra Java CLASSPATH elements. Optional.
export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=200

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export HADOOP_TASKTRACKER_OPTS=

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS

# Extra ssh options. Empty by default.
export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from. Unset by default.
export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
export HADOOP_NICENESS=10


Build fails on ubuntu 12.04

Hi,

I have a problem that looks very similar to issue 33. It seems that HADOOP_LZO_LIBRARY is not being set correctly.

My system is as follows: ant 1.8.2, java 1.6.0_24

I've tried gcc 4.6 and 4.4 with the same results.

The compilation output error is as follows:

[exec] /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I/home/peter/compiled/twitter-hadoop-lzo-cc0cdbd/src/native -I./impl -I/usr/lib/jvm/java-6-openjdk-amd64/include -I/usr/lib/jvm/java-6-openjdk-amd64/include/linux -I/home/peter/compiled/twitter-hadoop-lzo-cc0cdbd/src/native/impl -Isrc/com/hadoop/compression/lzo -g -Wall -fPIC -O2 -m64 -g -O2 -MT impl/lzo/LzoCompressor.lo -MD -MP -MF $depbase.Tpo -c -o impl/lzo/LzoCompressor.lo
[exec] /home/peter/compiled/twitter-hadoop-lzo-cc0cdbd/src/native/impl/lzo/LzoCompressor.c: In function ‘Java_com_hadoop_compression_lzo_LzoCompressor_initIDs’:
[exec] /home/peter/compiled/twitter-hadoop-lzo-cc0cdbd/src/native/impl/lzo/LzoCompressor.c:125:37: error: expected expression before ‘,’ token

Note that in issue 33, one fix is to set the following flag:

LDFLAGS="-Wl,--no-as-needed"

however that does not seem to work for me.

thanks
Peter

Getting "No LZO codec found, cannot run"

I've installed the lzo jar on all the machines in my hadoop cluster but keep getting this exception in job runs...

java.io.IOException: No LZO codec found, cannot run.
    at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:53)
    at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:158)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:371)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)

It's quite misleading. Could anyone point me in the right direction?
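
If I read DeprecatedLzoLineRecordReader correctly, this error is raised when the record reader cannot resolve a codec for the .lzo file from the job configuration, which usually means com.hadoop.compression.lzo.LzopCodec is missing from io.compression.codecs on the task nodes. A small check along these lines (a hedged sketch; the file name is illustrative) shows what a given node actually resolves:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class LzoCodecCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        // getCodec() returning null for a .lzo path is exactly the "no LZO codec" case.
        CompressionCodec codec = factory.getCodec(new Path("example.lzo"));
        System.out.println("io.compression.codecs = " + conf.get("io.compression.codecs"));
        System.out.println("codec for .lzo = " + (codec == null ? "none found" : codec.getClass().getName()));
    }
}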

Build fails on Ubuntu 11.10 (changed ld default behavior)

I noticed that the build of hadoop-lzo fails on Ubuntu 11.10. The reason is that the default behavior of ld was changed in 11.10.

From Ubuntu 11.10 Release Notes:

The compiler passes by default two additional flags to the linker:

[...snipp...]

-Wl,--as-needed with this option the linker will only add a DT_NEEDED tag for a dynamic library mentioned on the command line if the library is actually used.

This was apparently planned to be changed already back in 11.04 but was eventually reverted in the final release. From 11.04 Toolchain Transition:

Also in Natty, ld runs with the --as-needed option enabled by default. This means that, in the example above, if no symbols from libwheel were needed by racetrack, then libwheel would not be linked even if it was explicitly included in the command-line compiler flags. NOTE: The ld --as-needed default was reverted for the final natty release, and will be re-enabled in the o-series.

Now in hadoop-lzo this manifests itself in the script src/native/configure. The following check will fail because of the changed ld default behavior:

  echo 'int main(int argc, char **argv){return 0;}' > conftest.c
  if test -z "`${CC} ${CFLAGS} ${LDFLAGS} -o conftest conftest.c -llzo2 2>&1`"; then
        if test ! -z "`which otool | grep -v 'no otool'`"; then
      ac_cv_libname_lzo2=\"`otool -L conftest | grep lzo2 | sed -e 's/^  *//' -e 's/ .*//'`\";
    elif test ! -z "`which objdump | grep -v 'no objdump'`"; then
      ac_cv_libname_lzo2="`objdump -p conftest | grep NEEDED | grep lzo2 | sed 's/\W*NEEDED\W*\(.*\)\W*$/\"\1\"/'`"
    elif test ! -z "`which ldd | grep -v 'no ldd'`"; then
      ac_cv_libname_lzo2="`ldd conftest | grep lzo2 | sed 's/^[^A-Za-z0-9]*\([A-Za-z0-9\.]*\)[^A-Za-z0-9]*=>.*$/\"\1\"/'`"
    else
      as_fn_error $? "Can't find either 'objdump' or 'ldd' to compute the dynamic library for '-llzo2'" "$LINENO" 5
    fi
  else

This line compiles a dummy C script and tells gcc to link it to the liblzo library (the native LZO library).

${CC} ${CFLAGS} ${LDFLAGS} -o conftest conftest.c -llzo2

Because of the changed ld behavior in 11.10, however, conftest will not link liblzo2. Hence the subsequent check will fail; since I do not have otool installed on my local dev box, the following code is run in my case:

 ac_cv_libname_lzo2="`objdump -p conftest | grep NEEDED | grep lzo2 | sed 's/\W*NEEDED\W*\(.*\)\W*$/\"\1\"/'`"

This command will return an empty string and assign it to the variable ac_cv_libname_lzo2. The eventual result is that the symbol HADOOP_LZO_LIBRARY will be assigned an empty string, too.

cat >>confdefs.h <<_ACEOF
#define HADOOP_LZO_LIBRARY ${ac_cv_libname_lzo2}
_ACEOF

Without a proper value for HADOOP_LZO_LIBRARY, however, the build of hadoop-lzo will fail.

HOW TO FIX

In our build setup we are using LDFLAGS to pass this option when invoking ant like so:

$ env LDFLAGS="-Wl,--no-as-needed" ant ...

In general though I'd think it would be best to fix this in build.xml:

    <exec dir="${build.native}" executable="sh" failonerror="true">
       <env key="OS_NAME" value="${os.name}"/>
       <env key="OS_ARCH" value="${os.arch}"/>
       <env key="LDFLAGS" value="-Wl,--no-as-needed"/>    <== ADD THIS LINE
       ...
    </exec>

Best,
Michael

Windows build.properties contains shell script error instead of commit ID in build_revision field.

hadoop-lzo logs version information when it loads. We've discovered that for Windows builds, we're instead getting error output from the get_build_revision.sh script. For example:

14/09/29 14:30:47 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev D:\w\b\project\hadoop-lzo/scripts/get_build_revision.sh: line 2: $'\r': command not found 
D:\w\b\project\hadoop-lzo/scripts/get_build_revision.sh: line 28: syntax error: unexpected end of file]

This information comes from the build.properties file. There are 2 problems with the way this file is generated on Windows:

  1. Running sh on a script, even on Windows, expects LF line endings. At least this is the case with the sh builds from GnuWin32 that I've used. However, depending on the git user, the files might get checked out with CRLF line endings on Windows.
  2. The argument passed to sh is the absolute path of the get_build_revision.sh, composed by concatenating the Maven basedir and the relative path to get_build_revision.sh. On Windows, basedir will contain back slashes. sh can handle this, but it writes a warning to stderr advising that it would be better to use forward slashes.

Incompatibility with Hadoop 2.0 API

The problem

Hadoop-LZO is not yet compatible with the new MapReduce API. An email thread detailing the problem can be found in the CDH4 user group [1].

A typical error message looks as follows:

Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:590)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)

How to reproduce

I ran a simple Pig script using ElephantBird against LZO-compressed input data. Here's the basic skeleton:

-- script to reproduce the bug
DEFINE StorageFormat com.twitter.elephantbird.pig.store.LzoPigStorage(' ');
raw_data = LOAD '/path/to/input.txt.lzo' USING StorageFormat;
DUMP raw_data;

Why does this error happen?

Taken from the email thread above:

CDH4 changed JobContext to be an interface. You'll have to modify the LZO input format to make it compatible with CDH4.

Why is this relevant for Hadoop-LZO and not only for the Cloudera folks?

From what I understand this is not a Cloudera-specific issue but rather caused by CDH4 being based on Hadoop 2.x (and not Hadoop 1.x / 0.20.x).

How to fix

The email thread above has some patches attached to it, which Cloudera has started to integrate into their fork of Hadoop-LZO. As of today, however, the patches are not yet in their "official" Hadoop-LZO repo (http://github.com/cloudera/hadoop-lzo / https://github.com/toddlipcon/hadoop-lzo) but in a branch of a repo forked by one of their employees (https://github.com/kambatla/hadoop-lzo/tree/cdh4-wip).

The relevant commits so far are:

Do you think it would be possible to get this fix into Twitter's Hadoop-LZO repo, too?

Oh, one more thing: From what I can tell the current fix will add compatibility with Hadoop 2.0 but it will break backwards-compatibility with Hadoop 1.0 at the same time. So some additional work might be needed.

Best,
Michael

[1] https://groups.google.com/a/cloudera.org/d/topic/cdh-user/XrNoLTMQjPU/discussion
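
For background, the IncompatibleClassChangeError comes from JobContext being a concrete class in Hadoop 1.x but an interface in Hadoop 2.x, so bytecode compiled against one cannot invoke its methods on the other. The compatibility patches referenced above presumably handle this properly; the following is only a rough sketch of the reflection-based shim pattern such layers typically use, not the actual patch:

import java.lang.reflect.Method;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.JobContext;

public final class JobContextCompat {
    private JobContextCompat() {}

    // Reflection sidesteps the invokevirtual/invokeinterface mismatch that
    // triggers IncompatibleClassChangeError across Hadoop 1.x and 2.x.
    public static Configuration getConfiguration(JobContext context) {
        try {
            Method m = context.getClass().getMethod("getConfiguration");
            return (Configuration) m.invoke(context);
        } catch (Exception e) {
            throw new RuntimeException("Unable to call getConfiguration() reflectively", e);
        }
    }
}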

About hadoop lzo

My problem is as follows:
$ bin/hadoop jar /home/hadoop/hadoop/lib/hadoop-lzo-0.4.15.jar com.hadoop.compression.lzo.LzoIndexer README.txt.lzo
12/07/24 17:16:52 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
12/07/24 17:16:52 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f]
12/07/24 17:16:53 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file README.txt.lzo, size 0.00 GB...
Exception in thread "main" java.lang.IllegalArgumentException: Compression codec
com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)
at com.hadoop.compression.lzo.LzoIndex.createIndex(LzoIndex.java:209)
at com.hadoop.compression.lzo.LzoIndexer.indexSingleFile(LzoIndexer.java:117)
at com.hadoop.compression.lzo.LzoIndexer.indexInternal(LzoIndexer.java:98)
at com.hadoop.compression.lzo.LzoIndexer.index(LzoIndexer.java:52)
at com.hadoop.compression.lzo.LzoIndexer.main(LzoIndexer.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException:
com.hadoop.compression.lzo.LzoCodec
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
... 11 more
Hadoop version: hadoop-0.20.203.0
Java version: jdk-6u31-linux-i586.bin
I have installed lzop-1.03 and lzo-2.06, and kevinweil-hadoop-lzo-6bb1b7f.zip built successfully. I copied hadoop-lzo-0.4.15.jar to /home/hadoop/hadoop/lib/lzo (my Hadoop home is /home/hadoop/hadoop) and copied lib/native/Linux-i386-32 to /home/hadoop/hadoop/lib/native/Linux-i386-32 (my system is 32-bit). My /etc/profile is:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0_31-sun
export CHUKWA_HOME=/home/hadoop/chukwa-0.4.0
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=/home/hadoop/hadoop/conf
export ANT_HOME=/home/hadoop/apache-ant-1.8.4
export PATH=$PATH:#HADOOP_HOME/bin:#ANT_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/hadoop-core-0.20.203.0.jar:$CHUKWA_HOME/lib/:$HADOOP_HOME/lib/hadoop-lzo-0.4.15.jar

my hadoop-env.sh is

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-1.6.0_31-sun
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

# Extra Java CLASSPATH elements. Optional.
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CLASSPATH=/home/hadoop/hadoop/lib/hadoop-lzo-0.4.15.jar
export LD_LIBRARY_PATH=/home/hadoop/hadoop/lib/native
export LIBRARY_PATH=/home/hadoop/hadoop/lib/native/Linux-i386-32/lib:/home/hadoop/hadoop/lib/native:/home/hadoop/hadoop/lib
export JAVA_LIBRARY_PATH=/home/hadoop/hadoop/lib/native:/home/hadoop/hadoop/lib/native/Linux-i386-32/lib
export HADOOP_LIBRARY_PATH=/home/hadoop/hadoop/lib/native/Linux-i386-32/lib:/home/hadoop/hadoop/lib/native:/home/hadoop/hadoop/lib

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export HADOOP_TASKTRACKER_OPTS=

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS

# Extra ssh options. Empty by default.
export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from. Unset by default.
export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
export HADOOP_NICENESS=10

No matter whether I try it on the cluster or on a single node, the problem always exists. I don't know why. Please help me!

lzo.index.tmp files not deleted

We use distributed lzo indexer on EMR (hadoop version: 1.0.3), files stored on Amazon s3.

Sometimes (observed twice by now) we had the following issue:

All the .lzo.index files are generated, but some of the .lzo.index.tmp files are not deleted and cause problems when processing them with Pig. No exception or error is thrown during indexing, and the job is reported as running successfully.

Problem with hadoop v1.0.3

Hi all,

I've followed all the steps from README.md, but now I get the following error:
/usr/local/hadoop/lib $ hadoop jar hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /lzoInput

13/06/18 11:42:34 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: /usr/local/hadoop-1.0.3/lib/native/Linux-amd64-64/libgplcompression.so: /usr/local/hadoop-1.0.3/lib/native/Linux-amd64-64/libgplcompression.so: undefined symbol: lzo1x_999_compress_level
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1807)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1732)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
....

Environment
hadoop 1.0.3
java version "1.6.0_32"
Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)

Thanks for any ideas.

Trying to index LZO files gives an exception: ClassCastException: com.hadoop.compression.lzo.LzopCodec$LzopDecompressor cannot be cast to com.hadoop.compression.lzo.LzopDecompressor at Indexer.main(Indexer.java:14)

Here is the code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

import com.hadoop.compression.lzo.LzoIndexer;

public class Indexer {
    public static void main(String... args) {
        // Index a local .lzo file so it can be split across mappers.
        LzoIndexer indexer = new LzoIndexer(new Configuration());
        try {
            indexer.index(new Path("/home/malpani/Desktop/input_mahabharat.txt.lzo"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


Output exception details:

INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
10/08/19 10:40:23 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library
10/08/19 10:40:23 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file /home/malpani/Desktop/input_mahabharat.txt.lzo, size 0.00 GB...
Exception in thread "main" java.lang.ClassCastException: com.hadoop.compression.lzo.LzopCodec$LzopDecompressor cannot be cast to com.hadoop.compression.lzo.LzopDecompressor
at com.hadoop.compression.lzo.LzoIndex.createIndex(LzoIndex.java:222)
at com.hadoop.compression.lzo.LzoIndexer.indexSingleFile(LzoIndexer.java:117)
at com.hadoop.compression.lzo.LzoIndexer.indexInternal(LzoIndexer.java:98)
at com.hadoop.compression.lzo.LzoIndexer.index(LzoIndexer.java:52)
at Indexer.main(Indexer.java:14)
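
A ClassCastException between two classes with the same fully qualified name is usually a sign that hadoop-lzo is on the classpath twice (or loaded through two different classloaders), so LzopCodec and the caller see different copies of LzopDecompressor. A quick, hedged diagnostic is to print where each class was actually loaded from:

import com.hadoop.compression.lzo.LzopCodec;
import com.hadoop.compression.lzo.LzopDecompressor;

public class WhichJar {
    public static void main(String[] args) {
        // Different jar locations or different classloaders here would explain the cast failure.
        System.out.println(LzopDecompressor.class.getProtectionDomain().getCodeSource().getLocation());
        System.out.println(LzopCodec.class.getProtectionDomain().getCodeSource().getLocation());
        System.out.println(LzopDecompressor.class.getClassLoader());
        System.out.println(LzopCodec.class.getClassLoader());
    }
}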

fix typo in last commit

I'm very sorry, I introduced a typo in my last pull request: #61. A semicolon was missing at the end of a line.

Since it cannot be used in Win7, what's the function of lzo2.dll?

hadoop-lzo\target\antrun\build-build-info-win.xml:7: Execute failed: java.io.IOException: Cannot run program "sh": CreateProcess error=2, ?????????

I have compiled lzo2.dll using cl on Win7 64-bit, but when I try to build the JNI bindings for LZO, it just fails.
How can I use LZO without Hadoop on Win7 64-bit?

different results with and without index

I'm running a Pig job over a couple of sets of identical data, with the only difference being one is raw, and the other LZO compressed. I'm seeing some discrepancies in the result set with the LZO indexes.

Raw text - Records written : 17244
LZO (indexed) - Records written : 17214
LZO (no indexes) - Records written : 17244

Do you know what could be causing this behavior? I'm not sure if I'd look here, and/or elephant-bird. Thanks for releasing this code.

Publish artifacts in public repo

I'm trying to build a project slated for open source outside of Twitter. It depends on elephant-bird, which in turn depends on hadoop-lzo. It appears we don't publish the hadoop-lzo artifacts. We should publish them for convenience in a public repo, perhaps using GitHub (similar to how elephant-bird does it) or the public Twitter repo.

Build failing in examples directory

I have downloaded the sources and am following the instructions in the README.md file. The platform is Windows.

The build fails while compiling the examples directory with the following error:
compile-protobuf:
[apply] --twadoop_out: protoc-gen-twadoop: The system cannot find the file specified.
