evidenceprime / html-docx-js

Converts HTML documents to DOCX in the browser

Home Page: http://evidenceprime.github.io/html-docx-js/

License: MIT License

HTML 0.92% JavaScript 95.91% CoffeeScript 2.55% Smarty 0.62%

html-docx-js's Issues

Convert an HTML string to Docx XML

Hi guys, great library!

I have a use case where I only need to convert an HTML string, say a TABLE layout, to DOCX XML.

var strXml = htmlDocx.convert( strHtml );

Is there a way to do this with your lib?

Thanks,

--Jeff

SVG images not supported

When converting a document containing svg images, they are not converted correctly, as other images are. They simply remain inline in the html. It looks like this regex in utils.coffee is the culprit, since the mimetype for svg images has a non-word character:

inlinedSrcPattern = /"data:(\w+\/\w+);(\w+),(\S+)"/g;
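To illustrate the diagnosis above, here is a minimal sketch comparing the quoted pattern with a relaxed variant. The relaxed pattern is an assumption made for illustration, not the library's actual fix; it only shows that allowing `+`, `.` and `-` in the subtype lets `image/svg+xml` match. Whether Word then renders the embedded SVG is a separate question that this does not address.

```javascript
// Pattern quoted in the report (without the g flag, to keep test() stateless).
const originalPattern = /"data:(\w+\/\w+);(\w+),(\S+)"/;

// Hypothetical relaxed variant: the subtype may also contain "+", "." and "-".
const relaxedPattern = /"data:(\w+\/[\w.+-]+);(\w+),(\S+)"/;

const pngTag = '<img src="data:image/png;base64,iVBORw0KGgo=">';
const svgTag = '<img src="data:image/svg+xml;base64,PHN2Zz48L3N2Zz4=">';

// The original pattern accepts the PNG but rejects the SVG,
// because "+" in "svg+xml" is not a word character.
const originalMatchesPng = originalPattern.test(pngTag);
const originalMatchesSvg = originalPattern.test(svgTag);

// The relaxed pattern captures mime type, encoding and payload for the SVG.
const svgMatch = relaxedPattern.exec(svgTag);
const svgMimeType = svgMatch ? svgMatch[1] : null;
```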

Applying CSS in style tag

Hello there :)
First - your work is awesome!!!
I need the whole document to use the same font-family and font-size, but I run into trouble when I try to apply CSS to do it. What I do is add:

<style>
body, body *, h1, h2, h3, h4, h5, h6, p, span, div, p span{font-family: "Times New Roman", Times, serif !important}
*, body, body *, span, div, table, tr, td, font, p, p *{font-size: 25pt!important}
h3{font-size: 50pt!important}
</style>

in the . But after the generation of DOCX the CSS style is not applied.

How can I achieve the document to have the same font-family and size?
Thank you!

SaveAs error while converting html to docx

I have used the following code to generate docx file.
var htmlDocx = require('html-docx-js');
var content = 'businesscard.html';
var converted = htmlDocx.asBlob();
saveAs(converted, 'test.docx');

But I got an error saying that saveAs is not defined. I don't know what to do now.
I need some help.
Thanks in advance.

images resize on export

Mostly working great, but images seem to resize and crop themselves when they are sized other than full size, i.e. (height: xyz, width: yyz). Is this a known issue?

Not working?

This seems to be no longer working. The demo generates an empty .docx file, and trying to use it locally does the same.

Anyone still have it working?

HTML to DOCX results in empty DOCX document

Node.JS version: v0.12.7
html-docx-js version: 0.3.0
Tested on: LibreOffice Writer 4.2.8.2, Google Docs, WordPad

Code

        var fs = require('fs');
        var htmlDocxJs = require('html-docx-js');

        var html = '<!DOCTYPE html><html><head><title>hello!</title></head><body><h1>Hello!</h1></body></html>';
        var docx = htmlDocxJs.asBlob(html);
        fs.writeFileSync('/tmp/test.docx', docx);

Result
LibreOffice Writer 4.2.8.2, Google Docs, WordPad all display an empty document.

Notes

HTML validates against W3C validator
Unzipped the file, file contents looks OK

➜  /tmp  unzip -l /tmp/test.docx 
Archive:  /tmp/test.docx
  Length      Date    Time    Name
---------  ---------- -----   ----
      465  2015-10-20 17:51   [Content_Types].xml
        0  2015-10-20 17:51   _rels/
      330  2015-10-20 17:51   _rels/.rels
        0  2015-10-20 17:51   word/
     2056  2015-10-20 17:51   word/document.xml
      491  2015-10-20 17:51   word/afchunk.mht
        0  2015-10-20 17:51   word/_rels/
      306  2015-10-20 17:51   word/_rels/document.xml.rels
---------                     -------
     3648                     8 files

/tmp/test.docx Passes the Open XML validator - http://www.microsoft.com/en-us/download/details.aspx?id=5124
Unzipped the file and afchunk.mht contains the HTML and the HTML validates against W3C validator
GitHub won't allow the test.docx to be uploaded claiming it's not a docx
Resulting file - http://filebin.ca/2JhttwkoZurM/test.docx

"Document is damaged" on Word App

Thanks for this great library. We use it in one of our projects and it works great apart from one thing.

When we generate the docx from an iPad (with Safari on the latest version of iOS), we get a blank page returned. If we open this in the Word App (published by Microsoft), we get a « Document is damaged » warning.
We've tried the online demo http://evidenceprime.github.io/html-docx-js/test/sample.html as well, and the issue seems to appear there too.

Do you have any workaround to produce a Word document that works on iOS?

chinese font in the docx is garbled

hello
I put Chinese text into the content, like "你好". The text is converted to "浣犲ソ" in the docx.
Is there any way to solve it?
thanks.

Table in exported word document is not formatted

I'm using html-docx-js to convert a document from html to a word document. The issues seen in the resultant document are as listed below:

The exported file has the content of the html file. However, the content within tables does not retain its formatting (the content displayed isn't in sync with the column titles).
The string 'Column Settings' is prepended to each column title.
The title of the 1st column is prepended to the title of every subsequent column of the table.

The js code used is:

var contentDocument = document.getElementById(ids[i]);
var content = '' + contentDocument.outerHTML;
var converted = htmlDocx.asBlob(content, {orientation: 'landscape', margins: {top: 720}});

saveAs(converted, ids[i]+'.docx');

var link = document.createElement('a');
link.href = URL.createObjectURL(converted);
link.download = ids[i];
link.appendChild(
    document.createTextNode('Click here if your download has not started automatically'));
var downloadArea = document.getElementById('download-area');
downloadArea.innerHTML = '';
downloadArea.appendChild(link);

Can anyone help with any pointers towards resolving this issue?

minify? uglify?

html-docx.js is ~400kb
can we use minify? or uglify?
is there an official html-docx.min.js somewhere?

Special Characters Truncated

Hello,

When I create a DOCX file from a HTML file using the lib I can open it normally on MS Word 2011. However, when I open the same document on a more recent version, some special characters (áõê...) are truncated.

Is this a known issue with recent Word versions and, if so, do you guys know of a workaround?

Thanks!

Keeping Content from Breaking

Is there an option to prevent page breaking on elements? e.g.

page-break-after: auto;
page-break-inside: avoid;
page-break-before: auto;

Generated output does not take the css into consideration and the page will break.

Custom CSS fonts are not included in the export.

First off, thanks for your work!

I do have one gripe however; fonts, as declared in the CSS Stylesheet via @font-face are not exported into the Word DOCX archive.

Would be absolutely swell to have that feature! Thanks!

Using html-docx-js with electron

Hi,
I'm using electron 0.34.2, and when I try to export a docx file, it seems to work but the docx file only contains the following:
[object Blob]
Here is my code:

var converted = HtmlDocx.asBlob(html_document, {orientation: 'portrait'});

fs.writeFile(path, converted, function(err) {
 if (err) throw err;
});

When I run the same code using only nodejs it works correctly.
Cheers
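The "[object Blob]" content in the report above suggests that, inside Electron, the conversion returned a browser Blob, which fs.writeFile stringifies instead of writing as bytes. A minimal sketch of one way around this, assuming a runtime where Blob has an arrayBuffer() method (recent Electron and Node releases; older ones such as the version in the report would need FileReader instead). The helper name toNodeBuffer is hypothetical and not part of the library.

```javascript
// Hypothetical helper: normalise the conversion result into a Node Buffer.
// Accepts either a Buffer (plain Node) or a Blob (Electron renderer, browsers).
async function toNodeBuffer(converted) {
  if (Buffer.isBuffer(converted)) {
    return converted; // already writable as-is
  }
  if (converted && typeof converted.arrayBuffer === 'function') {
    // Blob -> ArrayBuffer -> Buffer (assumes Blob.prototype.arrayBuffer exists)
    return Buffer.from(await converted.arrayBuffer());
  }
  throw new TypeError('Expected a Buffer or a Blob');
}

// Usage with the names from the report (HtmlDocx, html_document, path, fs):
// var converted = HtmlDocx.asBlob(html_document, {orientation: 'portrait'});
// toNodeBuffer(converted).then(function (buffer) {
//   fs.writeFile(path, buffer, function (err) { if (err) throw err; });
// });
```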

convertImagesToBase64

To support images that are scaled by CSS or by attributes, the canvas should be sized to the image's naturalWidth/naturalHeight so that the image is not cropped in the Word document:

function convertImagesToBase64 (element) {

  // declared with var so it does not leak into the global scope
  var contentDocument = tinymce.get('content').getDoc();
  var regularImages = contentDocument.querySelectorAll("img");
  var canvas = document.createElement('canvas');
  var ctx = canvas.getContext('2d');
  [].forEach.call(regularImages, function (imgElement) {
    // preparing canvas for drawing
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    // change here start: size the canvas to the image's intrinsic dimensions
    canvas.width = imgElement.naturalWidth;
    canvas.height = imgElement.naturalHeight;
    // change here end
    ctx.drawImage(imgElement, 0, 0);
    // by default toDataURL() produces png image, but you can also export to jpeg
    // checkout function's documentation for more details
    var dataURL = canvas.toDataURL();
    imgElement.setAttribute('src', dataURL);
  });
  canvas.remove();
}

see: https://github.com/evidenceprime/html-docx-js/blob/master/test/sample.html

Problem when calling from an Electron app

Thanks for this awesome package, it will solve a lot of problems for me :)

Now, I am using it in NodeJs, and all is fine. The problem is that I don't know how to save the return value of .asBlob to a file; it comes out as something like [blob object]! Here is my code

                // Renders the HTML template
                var html = this.render(template, values);

                // Convert the html to open xml format
                var output = html2docx.asBlob(html, {
                    'margins': {'top': 200}
                });

                console.log(output);

                // Save the output
                fileSystem.writeFileSync(outputFile, output);

Here is the output object contents

What should i change to get it to save this "blob" to a file ?
Thanks

Accent characters are not rendering correctly in Word 2007

Hi,

I am trying to export some HTML (using IE11) and currently testing with Word 2007. I am seeing accent characters like "é" displaying as "Ã©" in the exported document. If I copy/paste the exact text on my HTML page into the same document, it renders fine.

Any help would be appreciated.

Thanks,
Steven
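The "é" to "Ã©" pattern described above is what appears when UTF-8 bytes are decoded as Latin-1. The sketch below only reproduces and reverses that pattern in Node; it is a diagnostic aid under the assumption that this is the mismatch involved, and the function names are hypothetical. It does not change how the library or Word handles encodings.

```javascript
// Reproduce the garbling: encode as UTF-8, then decode those bytes as Latin-1.
function misdecodeUtf8AsLatin1(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

// Reverse it: turn each Latin-1 character back into a byte, then decode as UTF-8.
function repairLatin1Mojibake(garbled) {
  return Buffer.from(garbled, 'latin1').toString('utf8');
}

const garbledAccent = misdecodeUtf8AsLatin1('\u00e9');   // "é" becomes two characters
const repairedAccent = repairLatin1Mojibake(garbledAccent);
```

If a string from the exported document survives this round trip, the export path is very likely mixing up the two encodings somewhere.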

The sample doesn't work

Hello,

I tested the sample and it shows nothing in LibreOffice. It seems to work with Word, but it would be good to be able to open it with LibreOffice (or OpenOffice) too.

Font is ignored in tables

Is there a way to have tables respect fonts? I have tried putting the style in the TABLE tag, the TH and TD tags, and in a SPAN around the content itself. It works on my mac, but on windows the content inside the table is showing up with the default Word font (Calibri).

Does not work in readers other than ms word

I need to convert docx for a whole different reason. I want html page in my java application to be converted to docx file and your script can do this but the issue is I cannot open it within the application which is the important part of the process. I want a docx file created which can be read using docx4j library i.e. a general docx. Please tell me if that's possible and how?

Header,footer and pagination

Hello, I'm using this plugin because of its very friendly approach. Is there any way to include a header, footer and pagination when exporting to Word?
Thanks..

Error when opening file created using the test page

Failed to open up a file created using the test tool at http://evidenceprime.github.io/html-docx-js/test/sample.html

Word 2003 Support?

Hi,

my client is not able to open the generated docx with Microsoft Word 2003. With my Word 2010 it works fine!

Best regards

How to convert to docx without zipping the file, to upload to Google Drive

empty row

hi all
I'm making a table with a tr inside a td. Here is the HTML:

<!DOCTYPE html><html><head><style>table{width: 100%;border-collapse: collapse;}th {   background-color: #a2d8f2;   color: white;}table, th, td{padding: 5px;border: 1px solid black;text-align: center;}.bold {text-align: left;font-weight: bold;width: 20%;}</style></head><body>
<table>
<thead>
  <tr>
  <th>Area</th>
  <th>Key</th>
  <th>CD#11</th>
  <th>CD#14</th>
  <th>CD#15</th>
  <th>CD#16</th>
  <th>CD#24</th>
  <th>Total</th>
  </tr>
 </thead>
 <tbody>
   <tr>
     <td rowspan="1">Application</td>
     <td colspan="7">No Data</td>
   </tr>
   <tr>
     <td rowspan="1">Wish</td>
     <td colspan="7">No Data</td>
   </tr>
   <tr>
     <td rowspan="1">Training</td>
     <td colspan="7">No Data</td>
   </tr>
   <tr>
     <td rowspan="1">Hardware</td>
     <td>Dot Engine</td>
     <td>1</td>
     <td>0</td>
     <td>0</td>
     <td>0</td>
     <td>0</td>
     <td>1</td>
   </tr>
   <tr>
     <td rowspan="1">Documentation</td>
     <td>Inline position display</td>
     <td>0</td>
     <td>0</td>
     <td>1</td>
     <td>0</td>
     <td>0</td>
     <td>1</td>
   </tr>
   <tr>
     <td rowspan="3">Software</td>
     <tr>
       <td>Dot Engine</td>
       <td>1</td>
       <td>0</td>
       <td>0</td>
       <td>0</td>
       <td>0</td>
       <td>1</td>
   </tr>
   <tr>
     <td>GSP</td>
    <td>0</td>
     <td>2</td>
     <td>0</td>
     <td>0</td>
     <td>0</td>
     <td>2</td>
   </tr>
  </tr>
  </tbody>
</table>
</body>
</html>

here is the result file
frag.docx

But as you can see, there is an empty row above the row inside the td element, and I couldn't figure out why.
i tried to write

and

around it with no result (it just made my table a mess)

any idea ?

Output file blank in Pages (Mac) and Google Documents

Wonderful library. I don't know if this is really an issue of html-docx-js, but I've tried both online demo and node.js version. Works fine with Word Online and MS Word for Windows but I get a blank sheet when opening the generated file with Pages or Google Documents.

Is this perhaps using a different format than the standard one?

I am getting fs.readFileSync is not a function error and I am not able to understand why the browser code is making a call to the 'fs' library?

Works fine in Nodejs environment but doesn't work on the browser. I am using React + Webpack. My code:

import htmlDocx from 'html-docx-js';
import { saveAs } from 'file-saver'

exportToWord = () => {
    var content = '<!DOCTYPE html>' + '<html>' + this.state.report + '</html>'
    var converted = htmlDocx.asBlob(content);
    saveAs(converted, 'test.docx');
  }

When I invoke the exportToWord() function I get the following error:

Uncaught TypeError: fs.readFileSync is not a function
    at Object.addFiles (http://localhost/22.showReport.42adb462f5c71e04f87e.js:24685:41)
    at Object.asBlob (http://localhost/22.showReport.42adb462f5c71e04f87e.js:12695:15)
    at ShowReport._this.exportToWord2 (http://localhost/22.showReport.42adb462f5c71e04f87e.js:10579:45)
    at EnhancedButton._this.handleClick (http://localhost/app.42adb462f5c71e04f87e.js:31743:22)
    at Object.ReactErrorUtils.invokeGuardedCallback (http://localhost/app.42adb462f5c71e04f87e.js:2651:17)
    at executeDispatch (http://localhost/app.42adb462f5c71e04f87e.js:2437:22)
    at Object.executeDispatchesInOrder (http://localhost/app.42adb462f5c71e04f87e.js:2460:6)
    at executeDispatchesAndRelease (http://localhost/app.42adb462f5c71e04f87e.js:1854:23)
    at executeDispatchesAndReleaseTopLevel (http://localhost/app.42adb462f5c71e04f87e.js:1865:11)
    at Array.forEach (native)
    at forEachAccumulated (http://localhost/app.42adb462f5c71e04f87e.js:2748:10)
    at Object.processEventQueue (http://localhost/app.42adb462f5c71e04f87e.js:2068:8)
    at runEventQueueInBatch (http://localhost/app.42adb462f5c71e04f87e.js:9498:19)
    at Object.handleTopLevel [as _handleTopLevel] (http://localhost/app.42adb462f5c71e04f87e.js:9509:6)
    at handleTopLevelImpl (http://localhost/app.42adb462f5c71e04f87e.js:14505:25)

Initially I was getting the Can't resolve 'fs' error which I resolved by adding

node: {
    fs: 'empty'
}

to webpack-config.

But now I am getting fs.readFileSync is not a function error and I am not able to understand why the browser code is making a call to the 'fs' library?

word header and footer

Hello,

The html-docx works really great. I have gotten really good results so far.
What would be the way to add a header and footer to the resulting Word document?

changing the .tpl file
using the htmlDocx.asBlob(content, header, footer) function and adding html for header and footer options/parameters
how to add page numbers to the footer

Thx for any hint,
Regards, Willi

A4 page

Hello everyone!
First I want to thank you for the great lib you have created! Awesome! :)))

I have an issue when I try to add content with a minimum height of one A4 page (29.7 cm). What I need in my DOCX is several blocks of text, each on a separate page. Example: Chapter I on the first page, Chapter II on the 2nd, Chapter III on the 3rd, etc. I wrapped each block in an element with a height of 29.7 cm (style="height: 29.7cm") but it didn't work. The block is only as tall as its content, not 29.7 cm.
How can I accomplish this, or is there another way to do it?
Thank you!

dute Nahui

Pidor

cyrillic text

Hello, unfortunately, if I type Cyrillic letters, the output comes back as something like "Р°РІС‹Р°РІС‹Р"

Add FileSaver.js to dependencies or mention it in the docs

The example in the docs uses the saveAs() function, which belongs to FileSaver.js. Without looking at the demo page sources, it's impossible to figure out why the example is not working.

So I suggest either adding it to the dependencies or at least mentioning it in the docs.

I prefer the first option, because saving the file is an important part of converting html to docx. Can we do anything with the converted object without saving it to a file?

Failed to execute 'createObjectURL' on 'URL'

I am using this awesome tool with Electron (Node.JS) but I am facing this error:
Failed to execute 'createObjectURL' on 'URL': No function was found that matched the signature provided.

Here is my code:

                // Generate a blob out of the passed HTML
                var docx = module_htmlToDocx.asBlob(frame, options);

                // Push the report to the browser, as .docx
                window.saveAs(docx, 'report.docx');

undocumented dependencies: coffeeify & brfs

I had to npm install these browserify transforms before I could get your html-docx-js to be required:
browserify: { transform: [ 'coffeeify', 'brfs' ] },

calculate page numbers after convert

Hi.
I have to show the number of pages after converting. Could this library help me find out how many pages were created in, for example, A4 format?

Node env Error: Can't resolve 'fs'

This is my full code :

import juice from 'juice'
import htmlDocx from 'html-docx-js'
import { saveAs } from 'file-saver'

export default {
	methods: {
		handleExport () {
			var content = htmlDocx.asBlob('hello')
			saveAs(content, 'test.docx')
		}
	}
}

When I run the webpack compiler, the command line shows an error:

Module not found: Error: Can't resolve 'fs' in 'F:\gittest\js-office-demo\node_modules\html-docx-js\build

Waiting online .... @anowak @kozborn @gpurgal

Sample Not working in Chrome

CSS3 Flexbox not working ?

Your lib is great, but I've tried to render a doc styled with flexbox and the layout didn't respond as I expected. Is it already a known issue or is it just me doing things wrong?
All the styling is in the HTML and not in <style> tags.
Left side is the render in html, right in docx with the lib.

Incorrect package.json "main" option?

It's now "src/index.coffee", and it's not working for me. Shouldn't we change that into "dist/html-docx.js"?

Downloads Empty File

I downloaded the file from the demo page and the .docx file is empty.
P.S I use Pages to view.

samples not working properly in Safari

Hi
Samples.html works perfectly in Firefox, IE and Chrome. However, Safari opens a new blank tab and is not able to download the docx file.
From the section "Compatibility" of the documentation, it says that it was tested on Google Chrome 36, Safari 7 and Internet Explorer 10.
I am using Safari 9.1.2.
Thanks.

Image support

I just realized that docx does not support embedded data URLs in . Do you have any ideas for this?

I was looking for a way to add images to the zip and reference them, but it seems tricky.
(I've managed to save images in the zip, but I'm not sure how to alter the altchunk.)

thanks.

Google Docs support

Files have no content when you drag them into Google Docs.

Populate document meta data

It would be nice if it was possible to populate the meta data of the document based on reasonable html properties. For example, the author could be pulled from here:

<meta name='author' content='John Doe'>

parameter for format like A4, etc. would be great

Thx for this great library. A parameter for format like A4, etc. would be great. Or to have a parameter for width and height to be set.
Thx, willi
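For context on what such a width/height parameter might hold: WordprocessingML expresses page sizes and margins in twentieths of a point, i.e. 1440 units per inch. The sketch below converts millimetres to that unit for A4. The helper name mmToTwips and the a4PageSize object are hypothetical; the only options shown on this page are orientation and margins, and nothing here confirms that the library accepts a page-size option.

```javascript
// Unit conversion only; no library calls are involved.
const TWIPS_PER_INCH = 1440; // 20 twips per point, 72 points per inch
const MM_PER_INCH = 25.4;

// Hypothetical helper: millimetres -> twentieths of a point, rounded.
function mmToTwips(mm) {
  return Math.round((mm / MM_PER_INCH) * TWIPS_PER_INCH);
}

// A4 is 210 mm x 297 mm (hypothetical shape for a page-size setting).
const a4PageSize = {
  width: mmToTwips(210),
  height: mmToTwips(297)
};
```

The same helper could be used for margins, which the options elsewhere on this page pass as plain numbers (for example top: 720).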

File type: Compatibility mode. Google Drive issues

I really appreciate the work done with html-docx-js!

However, the docx file opens in compatibility mode in Word (Mac) and cannot be opened in Google Drive, for instance. I suspect this is due to compatibility mode.

Are there any workarounds here, or plans to address this issue?