
matecat / matecat-filters


Convert any file to XLIFF and back with perfectly preserved formatting! Super easy API, plenty of supported formats and advanced segmentation.

Home Page: http://filters.matecat.com

License: GNU Lesser General Public License v3.0

Java 26.64% HTML 65.41% PHP 7.95%

matecat-filters's Introduction

Retirement

As of June 2020, this repository is in read-only mode (archived). Translated decided to close the sources of the MateCat Filters project. The source herein corresponds to version 1.2.5, based on Okapi version M36.

The MateCat Filters will still be usable via the hosted API. A basic free plan will allow trial and testing.

Convert any file to XLIFF and back

With MateCat Filters you can easily extract all the translatable contents from any file format into a convenient XLIFF file.

After you have translated your XLIFF, use Filters again to get back a completely translated file in the original format, with perfectly preserved formatting.

Fast, reliable and scalable, running every day inside MateCat, the popular open-source CAT tool.

Test it right now on matecat.com: when you upload your file and later download it translated, you are using Filters.

Main features

Plenty of supported formats

Among others, MateCat Filters fully supports Microsoft Office formats (legacy ones too), Open Office, PDF, hypertext, and even images of scanned documents thanks to automatic OCR (using the proper external library). See the full list in the Wiki.

Advanced segmentation

Filters uses segmentation rules defined by the Unicode consortium, plus another set of rules specifically designed for CAT Tools. This is why Filters can properly split sentences even in uncommon languages like Mongolian.

Hosted API

  • Instantly ready to use, zero installation / configuration.
  • Runs in MateCat's infrastructure, with instances constantly monitored, optimized, and updated.
  • Transparent versioning: automatically downgrades when you try to convert an XLIFF created with an older version of MateCat Filters.
  • Commercial dependencies included: you don't need to buy licenses for the commercial software MateCat Filters uses to support OCR, PDF and legacy MS Office formats.

Getting started

Navigate the wiki to learn how to use the API.

matecat-filters's Issues

Information lost when converting back to PO

Hi :)
I'm having problems when converting from XLIFF files back to PO, using the original2xliff endpoint. Specifically, the problem is about what Filters does to multi-line strings. So, for example, I have this msgstr string:

msgstr ""
"Por favor, active su cuenta. Debería haber recibido un correo electrónico "
"con el asunto \"Bienvenido a Unbabel 'con un enlace de activación."

When I convert to XLIFF and back using the original2xliff endpoint, I get this:

msgstr "Por favor, active su cuenta. Debería haber recibido un correo electrónico con el asunto \"Bienvenido a Unbabel 'con un enlace de activación."

(the string is all on the same line)

So, Filters doesn't maintain the formatting. Is this an intentional behavior?

(I also want to thank you for the great tool you have here 👍)
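
For context, the two forms in this report should carry the same string value: in the PO format, adjacent quoted chunks are concatenated, so what is lost is the line wrapping rather than the text. A minimal sketch of that check, assuming a hypothetical helper `po_string_value` and using Python's `ast.literal_eval` as an approximation of PO's C-style escapes (it is not a full PO parser):

```python
import ast


def po_string_value(lines):
    """Join the quoted chunks of one msgstr entry into a single string.

    Hypothetical helper for illustration only; handles just the simple
    'msgstr "..."' plus continuation-line layout shown in the report.
    """
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith("msgstr"):
            line = line[len("msgstr"):].strip()
        if line:
            # Each chunk is a double-quoted string; literal_eval decodes \" etc.
            chunks.append(ast.literal_eval(line))
    return "".join(chunks)


wrapped = r'''
msgstr ""
"Por favor, active su cuenta. Debería haber recibido un correo electrónico "
"con el asunto \"Bienvenido a Unbabel 'con un enlace de activación."
'''.strip().splitlines()

single = r'''msgstr "Por favor, active su cuenta. Debería haber recibido un correo electrónico con el asunto \"Bienvenido a Unbabel 'con un enlace de activación."'''.splitlines()

# Same decoded value, different physical wrapping.
print(po_string_value(wrapped) == po_string_value(single))
```

If the original wrapping matters (for example, to keep diffs small), it would have to be restored by a separate re-wrapping step after conversion.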

Missing License

Hey Guys,

Great job with this work together with the Win Convertor! Just wondering what licence both these pieces of code are released under.

Cheers,
Dave

Filters doesn't preserve trailing whitespaces inside CDATA

Consider this XML:

<xml>
    <element><![CDATA[This an example xml



    ]]></element>
</xml>

It has a lot of whitespace. The problem is that when I convert it to XLIFF using Matecat Filters and then convert it back to the original format using the translation, all the trailing whitespace disappears. This does not happen when the text is not inside CDATA.

The trailing spaces are also deleted in case of text inside tags inside CDATA. For example:

<xml>
    <element>
        <![CDATA[<tag>This an example xml </tag>]]>
    </element>
</xml>

The resulting XLIFF does not have that trailing whitespace at the end of the text. This does not happen when the <tag>This an example xml </tag> is not surrounded by CDATA.
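
As a point of comparison, a conforming XML parser treats CDATA content as ordinary character data, so trailing whitespace inside it is expected to survive parsing. The sketch below uses Python's standard `xml.etree.ElementTree` on the first sample from this report; the helper names are illustrative assumptions, and the code says nothing about how Filters itself handles CDATA:

```python
import xml.etree.ElementTree as ET


def element_text(xml_source, tag="element"):
    """Return the text of the first <element> child of the root."""
    return ET.fromstring(xml_source).find(tag).text


def trailing_whitespace(text):
    """Return only the whitespace at the end of the text."""
    return text[len(text.rstrip()):]


with_cdata = "<xml><element><![CDATA[This an example xml\n\n\n\n    ]]></element></xml>"
without_cdata = "<xml><element>This an example xml\n\n\n\n    </element></xml>"

# Both variants yield the same text, including the trailing newlines and spaces.
print(repr(trailing_whitespace(element_text(with_cdata))))
print(element_text(with_cdata) == element_text(without_cdata))
```

The same comparison can be run on a file before and after a round trip to confirm whether whitespace was dropped.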

"java.lang.RuntimeException: [was class java.util.zip.ZipException] invalid stored block lengths" on docx file

Hi,

10724.docx

This file is openable and readable using LibreOffice Writer, but when I tried to convert it to XLIFF I get the following:

[qtp1769597131-14] INFO com.matecat.converter.server.resources.ConvertToXliffResource - [CONVERSION REQUEST] 10724.docx:  to 
[qtp1769597131-14] INFO com.matecat.converter.core.project.ProjectFactory - [PROJECT CREATED] 10724.docx saved in /tmp/6171336572212321565
[qtp1769597131-14] INFO com.matecat.converter.core.okapiclient.OkapiClient - Using default segmentation
[qtp1769597131-14] INFO net.sf.okapi.common.pipelinedriver.PipelineDriver - Input: /tmp/6171336572212321565/10724.docx
[qtp1769597131-14] ERROR com.matecat.converter.server.resources.ConvertToXliffResource - [CONVERSION REQUEST FAILED] [was class java.util.zip.ZipException] invalid stored block lengths
java.lang.RuntimeException: [was class java.util.zip.ZipException] invalid stored block lengths
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.throwUnchecked(Stax2EventReaderImpl.java:470)
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.next(Stax2EventReaderImpl.java:262)
	at net.sf.okapi.filters.openxml.RunParser.processComplexCodes(RunParser.java:161)
	at net.sf.okapi.filters.openxml.RunParser.processRunBody(RunParser.java:148)
	at net.sf.okapi.filters.openxml.RunParser.parse(RunParser.java:79)
	at net.sf.okapi.filters.openxml.BlockParser.processRun(BlockParser.java:213)
	at net.sf.okapi.filters.openxml.BlockParser.parse(BlockParser.java:130)
	at net.sf.okapi.filters.openxml.StyledTextPartHandler.process(StyledTextPartHandler.java:139)
	at net.sf.okapi.filters.openxml.StyledTextPartHandler.open(StyledTextPartHandler.java:123)
	at net.sf.okapi.filters.openxml.StyledTextPartHandler.open(StyledTextPartHandler.java:115)
	at net.sf.okapi.filters.openxml.OpenXMLFilter.nextInZipFile(OpenXMLFilter.java:470)
	at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:262)
	at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:277)
	at net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(RawDocumentToFilterEventsStep.java:171)
	at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:123)
	at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:235)
	at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:205)
	at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:186)
	at com.matecat.converter.core.okapiclient.OkapiClient.generatePack(OkapiClient.java:343)
	at com.matecat.filters.basefilters.DefaultFilter.extractOkapiPack(DefaultFilter.java:56)
	at com.matecat.filters.basefilters.DefaultFilter.extract(DefaultFilter.java:35)
	at com.matecat.filters.basefilters.FiltersRouter.extract(FiltersRouter.java:36)
	at com.matecat.converter.server.resources.ConvertToXliffResource.convert(ConvertToXliffResource.java:90)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
	at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:308)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:291)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1140)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:403)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:816)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1113)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1047)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
	at org.eclipse.jetty.server.Server.handle(Server.java:517)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:302)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:238)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:57)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.zip.ZipException: invalid stored block lengths
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at com.ctc.wstx.io.MergedReader.read(MergedReader.java:105)
	at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:86)
	at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:56)
	at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1060)
	at com.ctc.wstx.sr.StreamScanner.loadMoreFromCurrent(StreamScanner.java:1070)
	at com.ctc.wstx.sr.StreamScanner.getNextCharFromCurrent(StreamScanner.java:810)
	at com.ctc.wstx.sr.BasicStreamReader.readEndElem(BasicStreamReader.java:3162)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2831)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1073)
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:255)
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.next(Stax2EventReaderImpl.java:260)
	... 65 more

Probably not an urgent problem, but I decided to report it anyway.

The file was created by a fuzzer built as a project for Udacity's course on Software Testing.
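
Since a .docx file is a ZIP container, a damaged archive of this kind can be detected before it is sent for conversion. A minimal sketch using Python's standard `zipfile` module; `is_readable_zip` and `make_zip` are hypothetical helpers, and the check only verifies the container, not the validity of the XML inside it:

```python
import io
import zipfile
import zlib


def is_readable_zip(data: bytes) -> bool:
    """Return True if every member of the archive can be read back."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() reads each member and returns the first bad name, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        return False


def make_zip(members):
    """Build a small in-memory archive for the demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()


good = make_zip({"word/document.xml": "<w:document/>"})
truncated = good[: len(good) // 2]  # drops the central directory

print(is_readable_zip(good))
print(is_readable_zip(truncated))
print(is_readable_zip(b"not a zip archive"))
```

A client could run such a check first and reject the upload with a clear message instead of relying on the converter's stack trace.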

Error 7

The services stop after a while; I get an "error 7" message and can no longer upload any files.

Maven failure when trying to build filters project

Hi,

Just wanted to play around with these filters. I am getting the errors listed below when trying to build the filters project as described in the docs. I am using "mvn clean package".
Any idea what the problem might be?

[INFO] ------------------------------------------------------------------------
[INFO] Building filters 1.2.0
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for net.sf.okapi:okapi-core:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.steps:okapi-step-common:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.steps:okapi-step-whitespace-correction:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.steps:okapi-step-segmentation:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.steps:okapi-step-encodingconversion:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.steps:okapi-step-rainbowkit:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-rainbowkit:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-html:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-plaintext:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-openxml:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-openoffice:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-php:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-its:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-properties:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-po:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-xliff:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-json:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-idml:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-icml:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-txml:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-yaml:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-mif:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-xmlstream:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-table:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-archive:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-xini:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-regex:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-ttx:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-ts:jar:0.30-SNAPSHOT is missing, no dependency information available
[WARNING] The POM for net.sf.okapi.filters:okapi-filter-dtd:jar:0.30-SNAPSHOT is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.379 s
[INFO] Finished at: 2016-09-17T23:30:45+02:00
[INFO] Final Memory: 8M/114M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project filters: Could not resolve dependencies for project com.matecat:filters:jar:1.2.0: The following artifacts could not be resolved: net.sf.okapi:okapi-core:jar:0.30-SNAPSHOT, net.sf.okapi.steps:okapi-step-common:jar:0.30-SNAPSHOT, net.sf.okapi.steps:okapi-step-whitespace-correction:jar:0.30-SNAPSHOT, net.sf.okapi.steps:okapi-step-segmentation:jar:0.30-SNAPSHOT, net.sf.okapi.steps:okapi-step-encodingconversion:jar:0.30-SNAPSHOT, net.sf.okapi.steps:okapi-step-rainbowkit:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-rainbowkit:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-html:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-plaintext:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-openxml:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-openoffice:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-php:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-its:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-properties:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-po:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-xliff:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-json:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-idml:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-icml:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-txml:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-yaml:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-mif:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-xmlstream:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-table:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-archive:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-xini:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-regex:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-ttx:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-ts:jar:0.30-SNAPSHOT, net.sf.okapi.filters:okapi-filter-dtd:jar:0.30-SNAPSHOT: Failure to find 
net.sf.okapi:okapi-core:jar:0.30-SNAPSHOT in http://repository-okapi.forge.cloudbees.com/release/ was cached in the local repository, resolution will not be reattempted until the update interval of okapi-release has elapsed or updates are forced -> [Help 1]

docx: Error opening zipped input file

Hi!

I have launched Filters following these instructions: https://github.com/matecat/MateCat-Filters/wiki/Build-and-run
I get the following error when trying to convert a .docx document. Am I right that .docx should be supported by Filters? How do I debug such an error?

ERROR qtp1844169442-83 [CONVERSION REQUEST FAILED] Error opening zipped input file.
net.sf.okapi.common.exceptions.OkapiIOException: Error opening zipped input file.
at net.sf.okapi.filters.openxml.OpenXMLFilter.openZipFile(OpenXMLFilter.java:436)
at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:260)
at net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(RawDocumentToFilterEventsStep.java:140)
at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:123)
at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:235)
at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:205)
at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:186)
at com.matecat.converter.core.okapiclient.OkapiClient.generatePack(OkapiClient.java:331)
at com.matecat.filters.basefilters.DefaultFilter.extractOkapiPack(DefaultFilter.java:56)
at com.matecat.filters.basefilters.DefaultFilter.extract(DefaultFilter.java:35)
at com.matecat.filters.basefilters.FiltersRouter.extract(FiltersRouter.java:29)
at com.matecat.converter.server.resources.ConvertToXliffResource.convert(ConvertToXliffResource.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:308)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:291)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1140)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:403)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:816)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1113)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1047)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
at org.eclipse.jetty.server.Server.handle(Server.java:517)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:302)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:238)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:57)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)

The character sequence "]]>" must not appear in content

We have this error in logs:

[qtp1638215613-11] INFO net.sf.okapi.common.pipelinedriver.PipelineDriver - Input: /opt/mc-filters/tmp/3511850666662201001/aaaaaaaaaMDM.xlsx
[Fatal Error] :18951:389: The character sequence "]]>" must not appear in content unless used to mark the end of a CDATA section.
org.xml.sax.SAXParseException; lineNumber: 18951; columnNumber: 389; The character sequence "]]>" must not appear in content unless used to mark the end of a CDATA section.

It may be connected with issue #16.

Can you help us with this issue?
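
For reference, the parser message refers to a general XML rule: the sequence "]]>" may only appear as the end of a CDATA section. When arbitrary text has to be placed inside CDATA, a common technique is to split the section wherever "]]>" occurs. The sketch below shows that technique with a hypothetical `wrap_in_cdata` helper and a made-up cell value; whether this is the actual cause of the failure in this issue is not known from the log:

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text):
    """Wrap text in CDATA, splitting the section at every ']]>'."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Made-up cell content that happens to contain the forbidden sequence.
payload = "if (a[b[0]]> 1) { ... }"

document = "<cell>" + wrap_in_cdata(payload) + "</cell>"

# The two adjacent CDATA sections are read back as one continuous text.
print(ET.fromstring(document).text == payload)
```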

Problems when converting from XLIFF to TSV using the translation

Hi,

When I try to convert an XLIFF (resulting from the conversion of a TSV) back to a TSV using the original2xliff endpoint, I get an "It was not possible to obtain the derived file from [name-of-the-file]" error. I've already tried with three different TSV files.

(I'm still using version 1.1.4, but from what I checked none of the changes would affect this behavior.)

Thanks. :)

PPTX failures

We've experienced rather frequent issues with Powerpoint files that are hard to pin down. In one case, it was a linebreak in a textbox, in another one an empty slide that caused the service to fail. The Java error message below doesn't provide much of a clue:
ERROR com.matecat.converter.server.resources.ConvertToXliffResource - [CONVERSION REQUEST FAILED] null
java.lang.NullPointerException
	at net.sf.okapi.filters.openxml.RunMerger.append(RunMerger.java:390)
	at net.sf.okapi.filters.openxml.BlockParser.parse(BlockParser.java:164)
	at net.sf.okapi.filters.openxml.StyledTextPartHandler.process(StyledTextPartHandler.java:168)
	at net.sf.okapi.filters.openxml.StyledTextPartHandler.open(StyledTextPartHandler.java:147)
	at net.sf.okapi.filters.openxml.StyledTextPartHandler.open(StyledTextPartHandler.java:139)
	at net.sf.okapi.filters.openxml.OpenXMLFilter.nextInZipFile(OpenXMLFilter.java:482)
	at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:261)
	at net.sf.okapi.filters.openxml.OpenXMLFilter.next(OpenXMLFilter.java:276)
	at net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(RawDocumentToFilterEventsStep.java:167)
	at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:119)
	at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:231)
	at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:201)
	at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:182)
	at com.matecat.converter.core.okapiclient.OkapiClient.generatePack(OkapiClient.java:343)
	at com.matecat.filters.basefilters.DefaultFilter.extractOkapiPack(DefaultFilter.java:56)
	at com.matecat.filters.basefilters.DefaultFilter.extract(DefaultFilter.java:35)
	at com.matecat.filters.basefilters.FiltersRouter.extract(FiltersRouter.java:36)
	at com.matecat.converter.server.resources.ConvertToXliffResource.convert(ConvertToXliffResource.java:91)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:316)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:298)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:268)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:841)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
	at org.eclipse.jetty.server.Server.handle(Server.java:564)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
	at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
	at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:673)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:591)
	at java.lang.Thread.run(Thread.java:748)

Bug when converting XLIFFs with escaped HTML markup?

Hi,
Consider that I have this XLIFF:

 <?xml version="1.0" encoding="utf-8" ?>
 <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
     <file source-language="en" target-language="pt" original="">
         <head></head>
         <body>
             <trans-unit id="1" size-unit="char" maxwidth="140">
                     <source>&lt;This is just a test&gt;</source>
                     <target></target>
             </trans-unit>
          </body>
     </file>
 </xliff>

When I follow this flow:

XLIFF -(original2xliff)-> XLIFF Matecat Filters -(xliff2original)-> XLIFF

The final XLIFF is different from the original one. Specifically, the &gt; entities come back unescaped, as a literal >.

<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
    <file source-language="en" target-language="it-it" original="">
        <head></head>
        <body>
            <trans-unit id="1" size-unit="char" maxwidth="140">
                    <source>&lt;This is just a test></source>
                    <target>&lt;This is just a test></target>
            </trans-unit>
         </body>
    </file>
</xliff>

Also happens with SDLXLIFF files.
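
For context: in XML element content a literal > is legal and parses to the same character as &gt;, so the two files above differ byte for byte but carry the same text. A minimal sketch of that comparison using the JDK's built-in DOM parser follows; the class and method names are illustrative and not part of Filters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XliffTextCompare {
    // Parses an XML string and returns the text content of its first <source> element.
    static String sourceText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("source").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down versions of the two <source> elements from the report.
        String original = "<trans-unit id=\"1\"><source>&lt;This is just a test&gt;</source></trans-unit>";
        String roundTripped = "<trans-unit id=\"1\"><source>&lt;This is just a test></source></trans-unit>";

        // Both parse to the same string: <This is just a test>
        System.out.println(sourceText(original).equals(sourceText(roundTripped)));
    }
}
```

A tool that compares the files as raw text, rather than as parsed XML, would still see them as different.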

Matecat Segmentation Issues

We have been experiencing segmentation issues with Matecat. Please see the following examples:

  1. When there are dots after numbers inside a sentence. This happens with both general and paragraph segmentation most of the time:
  2. When there is a quoted sentence inside the sentence:
    https://www.matecat.com/translate/quotesodt/tr-TR-en-GB/1710109-904edd6b46db#966996041

I'm aware that segmentation is not an easy task, but bad segmentation leads to messed-up TMs when translating between two syntactically different languages such as Turkish and English. For these 2 segments to be properly reflected in the translated document, we need to improvise with the segment translations, as you can see below:
image

I believe the most efficient and easy way to solve this problem is adding the capability to merge multiple segments into one from the UI. At the moment Matecat doesn't let you merge segments unless they were split before. It doesn't seem to be a complex technical task to achieve this. In return it'd have serious benefits. What do you think?

Error converting xliff to docx (TR->AR)

Hi,

I have a docx file (in Turkish) that I can upload and work on in MateCat, but whenever I try to download/preview the file, I'm getting the following error from Filters:

ERROR com.matecat.converter.core.okapiclient.OkapiClient - It was not possible to obtain the derived file from 1-.docx
net.sf.okapi.common.exceptions.OkapiNotImplementedException
...

(Full log details have been attached.)

The issue only comes up when I select Arabic as the target language. With English as the target language, I can preview/download the file.

Thank you!
filter-log.txt

Exception running the command: java -cp ".:filters-1.2.5.jar" com.matecat.converter.Main

I built the project following the instructions in the wiki, with all configuration left at the defaults: https://github.com/matecat/MateCat-Filters/wiki/Build-and-run

Running it produced the below exception. What could be the problem? Thanks in advance.

java -cp ".:filters-1.2.5.jar" com.matecat.converter.Main

Exception in thread "main" java.lang.ExceptionInInitializerError
at com.matecat.converter.server.MatecatConverterServer.(MatecatConverterServer.java:47)
at com.matecat.converter.Main.main(Main.java:21)
Caused by: java.lang.RuntimeException: Exception while loading config.properties.
at com.matecat.converter.core.util.Config.<clinit>(Config.java:104)
... 2 more
Caused by: java.lang.NullPointerException: inStream parameter is null
at java.base/java.util.Objects.requireNonNull(Objects.java:246)
at java.base/java.util.Properties.load(Properties.java:403)
at com.matecat.converter.core.util.Config.<clinit>(Config.java:41)
... 2 more
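
For context: the innermost cause, "inStream parameter is null", is raised by Properties.load(InputStream) when it is handed a null stream. One common way to end up there is a classpath lookup such as getResourceAsStream returning null because the file is not on the classpath. Whether Config.java loads config.properties this way is an assumption, not something the trace confirms. A minimal sketch of a loader that reports the missing resource explicitly follows; ConfigLoader is a hypothetical name, not a Filters class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

final class ConfigLoader {
    // Loads a properties file from the classpath and fails with a descriptive
    // message instead of a bare NullPointerException when it is missing.
    static Properties load(String resourceName) throws IOException {
        try (InputStream in = ConfigLoader.class.getClassLoader().getResourceAsStream(resourceName)) {
            if (in == null) {
                // getResourceAsStream returns null when the resource cannot be found.
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        }
    }

    public static void main(String[] args) {
        try {
            Properties props = load("config.properties");
            System.out.println("Loaded " + props.size() + " properties");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this is the cause, checking that config.properties sits in a location covered by the -cp value (here the current directory and the jar) would be a reasonable first step.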

TMX export - Tag issue

Hey there!
We discovered that when using either a link or TMXs exported out of our Matecat instance to create QA projects in lexiQA's platform, any segments with tags in them are not imported correctly.

This was not an issue in previous versions but appeared after our last update to the system. We discovered that tags are exported as <bx id="4"/>, for example, instead of &lt;bx id="4"/&gt;. Brackets should be escaped: &lt; for < and &gt; for >.
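
The escaping described above can be sketched in a few lines of Java. This only illustrates the expected output form; it is not how Matecat or Filters builds its TMX export, and the class and method names are hypothetical.

```java
final class TmxTagEscaper {
    // Escapes markup characters so a tag is carried as literal text inside XML.
    // The ampersand is replaced first so the entities produced below are not escaped again.
    static String escapeMarkup(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeMarkup("<bx id=\"4\"/>")); // prints &lt;bx id="4"/&gt;
    }
}
```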

What's the difference between /AutomationService/xliff2original and /AutomationService/xliff2source resources?

The documentation doesn't explain the difference.

The code is also confusing. The GenerateDerivedFileResource file, which corresponds to the resource /AutomationService/xliff2source, says that this resource takes "care of the extraction of the original file from the .XLF".

On the other hand, the ExtractOriginalFileResource file, which corresponds to the resource /AutomationService/xliff2original, says that the resource takes "care of the generation of the new file from the .XLF".

So, the /AutomationService/xliff2original does not return the original file? That's not really intuitive.

source/target language not correctly tagged in original2xliff result

The logs seem to show that the language combination is received correctly:

[CONVERSION REQUEST] xxxx.xlsx: en-US to da-DK
[PROJECT CREATED] Norwegian Translations.xlsx saved in /tmp/2092361387035352077
Using default segmentation
Input: /tmp/2092361387035352077/xxxxxlsx
[CONVERSION FINISHED] xxxx.xlsx: en-US to da-DK

However, in the XLIFF file the source and target languages are always tagged incorrectly, as source="en" and target="fr".

Any ideas?
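
One way to confirm what the returned file actually declares is to read the language attributes from its file element. The sketch below assumes the XLIFF 1.2 attribute names source-language and target-language, as in the XLIFF samples earlier on this page; the class name is illustrative and not part of Filters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class XliffLanguages {
    // Returns { source-language, target-language } from the first <file> element.
    static String[] languages(String xliff) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xliff)));
        Element file = (Element) doc.getElementsByTagName("file").item(0);
        return new String[] {
            file.getAttribute("source-language"),
            file.getAttribute("target-language")
        };
    }

    public static void main(String[] args) throws Exception {
        String xliff = "<xliff version=\"1.2\" xmlns=\"urn:oasis:names:tc:xliff:document:1.2\">"
                + "<file source-language=\"en-US\" target-language=\"da-DK\" original=\"\">"
                + "<body></body></file></xliff>";
        String[] langs = languages(xliff);
        System.out.println(langs[0] + " -> " + langs[1]); // prints en-US -> da-DK
    }
}
```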

Failed tests

Hi,
When I run the tests on this project there are some failures.

Results :

Failed tests: 
  ExtractOriginalFileResourceTest.testOriginalSuccess:53 expected:<200> but was:<500>
  GenerateDerivedFileResourceTest.testDeriveSuccess:55 expected:<200> but was:<500>
  ConvertToXliffResourceTest.testConvertSuccess:58 expected:<200> but was:<500>

Tests in error: 
  XliffProcessorTest.testGetOriginalFile:26 » ExceptionInInitializer
  XliffProcessorTest.testGetDerivedFile:47 » NoClassDefFound Could not initializ (..)
 (...)

Tests run: 53, Failures: 3, Errors: 29, Skipped: 0

But everything seems ok when testing manually: I can turn the server on and make requests to it.

(I'm new to Maven so I could be doing something wrong)

Error when building new version on Windows 8.1

Hi.

I followed the steps on the wiki and when I run the "mvn clean package" part, I get the following error:

[INFO] Scanning for projects...
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[ERROR] Child module C:\Users\Luis\Desktop\Matecat Filters 2.0\MateCat-Filters\custom-filters of C:\Users\Luis\Desktop\Matecat Filters 2.0\MateCat-Filters\pom.xml does not exist @
 @
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR]   The project com.matecat:filters-parent:1.2.0 (C:\Users\Luis\Desktop\Matecat Filters 2.0\MateCat-Filters\pom.xml) has 1 error
[ERROR]     Child module C:\Users\Luis\Desktop\Matecat Filters 2.0\MateCat-Filters\custom-filters of C:\Users\Luis\Desktop\Matecat Filters 2.0\MateCat-Filters\pom.xml does not exist
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException

Everything worked well with the previous version.
