Current open-source solutions for converting documents (PDFs or XDOCs) into machine-readable data cannot preserve the original formatting during extraction; they are limited to plain-text extraction.
Multi-column layouts, table data, and graph data, all of which are common in Ford documents, are extracted by existing solutions with no regard for their format.