Comments (7)
Hi @JNimkarLS,
I tried extracting the image from J1.docx, and it was extracted successfully using the code below.
// Copyright 2022 FoxyUtils ehf. All rights reserved.
package main

import (
	"io"
	"log"
	"os"
	"strconv"

	"github.com/unidoc/unioffice/common/license"
	"github.com/unidoc/unioffice/document"
)

func init() {
	// Make sure to load your metered License API key prior to using the library.
	// If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
	err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
	if err != nil {
		panic(err)
	}
}

func main() {
	doc, err := document.Open("J1.docx")
	if err != nil {
		panic(err)
	}
	defer doc.Close()

	for i, img := range doc.Images {
		// Name each output file by its index and image format, e.g. 0.png.
		destImg := strconv.Itoa(i) + "." + img.Format()
		if err := extractImgFile(img.Path(), destImg); err != nil {
			panic(err)
		}

		// In case you want to read the image bytes.
		imgBytes, err := os.ReadFile(img.Path())
		if err != nil {
			panic(err)
		}
		log.Printf("bytes: %v\n", imgBytes)
	}
}

// extractImgFile copies the image file at src to dst.
func extractImgFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, in)
	return err
}
Is the file correct? I looked at your code, and it is trying to open a presentation file (PPTX) using presentation.Read().
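One way to check which kind of package a byte slice actually contains is to look inside the ZIP container. The following is a minimal sketch using only the Go standard library, not unioffice. The helper names detectOOXMLKind and buildSamplePackage are hypothetical, and the check assumes the conventional main part names word/document.xml and ppt/presentation.xml that Word and PowerPoint normally write.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// detectOOXMLKind (hypothetical helper) reports "docx", "pptx", or "unknown"
// depending on which conventional main part the ZIP archive contains.
func detectOOXMLKind(data []byte) (string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", fmt.Errorf("not a valid ZIP/OOXML package: %w", err)
	}
	for _, f := range zr.File {
		switch f.Name {
		case "word/document.xml":
			return "docx", nil
		case "ppt/presentation.xml":
			return "pptx", nil
		}
	}
	return "unknown", nil
}

// buildSamplePackage (hypothetical, demo only) creates an in-memory ZIP
// archive with the given entries so the example runs without a real file.
func buildSamplePackage(entries map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range entries {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(content); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Stand-in for the bytes the service receives; not a real Word document.
	data, err := buildSamplePackage(map[string][]byte{
		"word/document.xml": []byte("<document/>"),
	})
	if err != nil {
		panic(err)
	}
	kind, err := detectOOXMLKind(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("detected kind:", kind)
}
```

In a real service, the received byte slice would be passed to detectOOXMLKind before choosing between document.Read() and presentation.Read().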
from unioffice.
@sampila Thanks for the quick response. My apologies - I am actually using document.Read(). I accidentally sent the snippet that is scanning presentations. However, the code between the two is the same. Here is the code:
doc, err := document.Read(reader, reader.Size()) // <--- document comes in as a byte array
if err != nil {
	return "", nil, fmt.Errorf("document read failure with error: %v", err)
}
if doc == nil {
	return "", nil, fmt.Errorf("internal error: [document.Read] returned a nil pointer")
}
defer doc.Close()

for _, img := range doc.Images {
	ctx.Logger().Debug("image found in docx file - scanning image")
	if img.Path() == "" {
		ctx.Logger().Warn("received an image with an empty path")
		continue
	}
	raw, err := os.ReadFile(img.Path())
	if err != nil {
		ctx.Logger().Error("failed to read file: %s with error: %v", img.Path(), err)
		continue
	}
}
The issue I am having is that the service receives the MS Word document as a byte array, and the DOCX file is not stored locally, so I must use document.Read() rather than document.Open(). Can you reproduce the error with this code?
from unioffice.
Hi @JNimkarLS,
I modified the code to use document.Read() and extract the image. Here's my code:
// Copyright 2022 FoxyUtils ehf. All rights reserved.
package main

import (
	"io"
	"log"
	"os"
	"strconv"

	"github.com/unidoc/unioffice/common/license"
	"github.com/unidoc/unioffice/document"
)

func init() {
	// Make sure to load your metered License API key prior to using the library.
	// If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
	err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
	if err != nil {
		panic(err)
	}
}

func main() {
	filename := "J1.docx"
	docFile, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer docFile.Close()

	docFileInfo, err := os.Stat(filename)
	if err != nil {
		panic(err)
	}

	// Here the document is read from the open file and its size, not from a byte array.
	doc, err := document.Read(docFile, docFileInfo.Size())
	if err != nil {
		panic(err)
	}
	if doc == nil {
		// err is nil at this point, so panic with an explicit message instead.
		panic("document.Read returned a nil document")
	}
	defer doc.Close()

	for i, img := range doc.Images {
		destImg := strconv.Itoa(i) + "." + img.Format()
		if err := extractImgFile(img.Path(), destImg); err != nil {
			panic(err)
		}

		// In case you want to read the image bytes.
		imgBytes, err := os.ReadFile(img.Path())
		if err != nil {
			panic(err)
		}
		log.Printf("bytes: %v\n", imgBytes)
	}
}

// extractImgFile copies the image file at src to dst.
func extractImgFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, in)
	return err
}
I was able to extract the image from J1.docx without issue.
from unioffice.
@sampila Ok interesting. So why doesn't a solution like this work:
func main() {
	content, err := os.ReadFile(documentFilePath) // <-- path to J1.docx
	if err != nil {
		panic(err)
	}
	reader := bytes.NewReader(content)
	doc, err := document.Read(reader, reader.Size())
	if err != nil {
		panic(fmt.Errorf("document read failure with error: %v", err))
	}
	fmt.Println(len(doc.Images))
}
If I print the length of images for J1.docx it gives me 0.
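One way to narrow this down is to check whether the byte slice itself contains any embedded images before it reaches document.Read(). A DOCX file is a ZIP archive, and Word conventionally stores embedded images under word/media/. The sketch below uses only the Go standard library and does not call unioffice; listWordMedia and makeDemoZip are hypothetical helper names, and the demo archive is a stand-in, not a real Word document.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"sort"
	"strings"
)

// listWordMedia (hypothetical helper) returns the sorted names of all files
// stored under word/media/ in a DOCX package held in memory.
func listWordMedia(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, fmt.Errorf("not a valid ZIP/DOCX package: %w", err)
	}
	var names []string
	for _, f := range zr.File {
		// Skip directory entries, which end with a slash.
		if strings.HasPrefix(f.Name, "word/media/") && !strings.HasSuffix(f.Name, "/") {
			names = append(names, f.Name)
		}
	}
	sort.Strings(names)
	return names, nil
}

// makeDemoZip (hypothetical, demo only) builds an in-memory ZIP archive so
// the example runs without a file on disk.
func makeDemoZip(entries map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range entries {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(content); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Stand-in for the bytes the service receives.
	content, err := makeDemoZip(map[string][]byte{
		"word/document.xml":     []byte("<document/>"),
		"word/media/image1.png": []byte("fake image bytes"),
	})
	if err != nil {
		panic(err)
	}
	names, err := listWordMedia(content)
	if err != nil {
		panic(err)
	}
	fmt.Println("media entries:", len(names))
	for _, n := range names {
		fmt.Println(n)
	}
}
```

If a check like this lists media entries for the received bytes but len(doc.Images) is still 0, the bytes are probably intact and the difference is more likely in how the document is read. If it lists nothing, the bytes may have been altered or truncated before reaching the service.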
from unioffice.
Hi @JNimkarLS,
I tested your latest code. It works fine on my end, and I was able to extract the image.
// Copyright 2022 FoxyUtils ehf. All rights reserved.
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strconv"

	"github.com/unidoc/unioffice/common/license"
	"github.com/unidoc/unioffice/document"
)

func init() {
	// Make sure to load your metered License API key prior to using the library.
	// If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
	err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
	if err != nil {
		panic(err)
	}
}

func main() {
	filename := "J1.docx"
	content, err := os.ReadFile(filename) // <-- path to J1.docx
	if err != nil {
		panic(err)
	}

	reader := bytes.NewReader(content)
	doc, err := document.Read(reader, reader.Size())
	if err != nil {
		panic(fmt.Errorf("document read failure with error: %v", err))
	}
	if doc == nil {
		// Check for nil before touching doc.Images; err is nil here.
		panic("document.Read returned a nil document")
	}
	defer doc.Close()

	fmt.Println(len(doc.Images))

	for i, img := range doc.Images {
		destImg := strconv.Itoa(i) + "." + img.Format()
		if err := extractImgFile(img.Path(), destImg); err != nil {
			panic(err)
		}

		// In case you want to read the image bytes.
		_, err := os.ReadFile(img.Path())
		if err != nil {
			panic(err)
		}
		//log.Printf("bytes: %v\n", imgBytes)
	}
}

// extractImgFile copies the image file at src to dst.
func extractImgFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, in)
	return err
}
from unioffice.
Hi @JNimkarLS,
Have you checked the code and the file?
Does the code work on your end?
from unioffice.
Hi @JNimkarLS,
We are closing this issue for now. Feel free to re-open it if this is still not resolved.
Best regards,
Alip
from unioffice.