Good news!
I finally got around to obtaining a NEC PC-9821 laptop to develop this part of DOSLIB on. It's a 486DX at 50MHz with 10MB of RAM. Although the seller shipped it with no OS, I was able to reinstall the NEC PC-98 version of MS-DOS 6.2. Many PC-98 games, including Touhou Project 1-5, run perfectly fine on it, though without any sound.
I will be able to begin developing more code in the hw/necpc98 part of the project.
The RS-232 port on the laptop appears to be a proprietary connector rather than the familiar 9-pin RS-232C port. Do any adapters exist to bring it out to a standard RS-232C connector? I would love to port the remote control program to the PC-98 to aid development, if I can figure out how to program the serial port.
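On the serial-port programming side, PC-98 programming references commonly describe the RS-232C baud rate as coming from PIT channel 2 feeding the 8251 at 16x the bit rate, with a PIT input clock of 2.4576 MHz on the 5/10 MHz machine lineage and 1.9968 MHz on the 8 MHz lineage. Those figures are assumptions here, to be checked against the scanned docs and the laptop itself. Under them, the divisor arithmetic is a minimal sketch like the following; `pc98_rs232c_divisor` and the clock constants are hypothetical names, not existing DOSLIB code.

```c
#include <stdint.h>

/* ASSUMED PIT input clocks on PC-98 machines; confirm against the docs. */
#define PC98_PIT_CLOCK_5MHZ_LINE 2457600UL /* 2.4576 MHz */
#define PC98_PIT_CLOCK_8MHZ_LINE 1996800UL /* 1.9968 MHz */

/* ASSUMED: the 8251 runs in x16 clock mode, so PIT channel 2 must output
 * 16 * baud. Returns the 16-bit counter value, or 0 when the requested
 * baud rate cannot be produced exactly from this clock. */
uint16_t pc98_rs232c_divisor(unsigned long pit_clock, unsigned long baud)
{
    unsigned long rate, div;

    if (baud == 0UL)
        return 0;
    rate = baud * 16UL;
    if (pit_clock % rate != 0UL)
        return 0;               /* would produce an off-speed baud rate */
    div = pit_clock / rate;
    if (div == 0UL || div > 0xFFFFUL)
        return 0;               /* does not fit the PIT's 16-bit counter */
    return (uint16_t)div;
}
```

For example, 9600 baud gives a divisor of 16 from 2.4576 MHz and 13 from 1.9968 MHz, while 19200 baud does not divide evenly from 1.9968 MHz, so the sketch returns 0 rather than silently picking an off-rate divisor.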
from doslib.
@gingerbeardman That would be helpful, yes!
I've managed to gather a few PDF scans already that could use OCR. Some I found on the Internet Archive.
http://hackipedia.org/browse/Computer/Platform/PC,%20NEC%20PC-98/Collections
I have also updated Hackipedia.org with the PC-98 documentation I've found so far:
http://hackipedia.org/Platform/x86/NEC%20PC-98/
Next task for PC-98 development: Some quick one-off programs to play with keyboard input via INT 18h. Then, begin the 8251 library to demonstrate talking directly to the 8251 chips in the PC-98 platform that drive a) the keyboard and b) the RS-232C port.
I may have to finagle a bit, as the available documentation is in Japanese and not in an OCR'd format I can just copy and paste into Google Translate.
I'm reading in the docs I have that later PC-9821 systems have a proper 16550 UART but emulate the 8251 for backwards compatibility. Is that right?
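For the planned 8251 library, here is a minimal sketch of the byte-level protocol. It is based on the generic Intel 8251A datasheet rather than any PC-98-specific document: after three dummy zero writes and an Internal Reset, the first byte written to the command port is the mode instruction, and later writes are command instructions. The function names are hypothetical, and the sketch only builds the byte sequence. Actually writing it to the keyboard or RS-232C command port, with the short recovery delays the 8251 needs between writes, is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Baud rate factor field (mode bits 1-0); 0 would select synchronous mode. */
#define I8251_CLK_X1   1u
#define I8251_CLK_X16  2u
#define I8251_CLK_X64  3u

/* Command instruction bits (Intel 8251A). */
#define I8251_CMD_TXEN 0x01u /* transmit enable */
#define I8251_CMD_DTR  0x02u /* assert DTR */
#define I8251_CMD_RXE  0x04u /* receive enable */
#define I8251_CMD_SBRK 0x08u /* send break */
#define I8251_CMD_ER   0x10u /* reset error flags */
#define I8251_CMD_RTS  0x20u /* assert RTS */
#define I8251_CMD_IR   0x40u /* internal reset */
#define I8251_CMD_EH   0x80u /* enter hunt (sync mode only) */

/* Build an asynchronous mode instruction.
 * data_bits: 5..8; parity: 'N', 'O' or 'E';
 * stop_halves: 2 = 1 stop bit, 3 = 1.5, 4 = 2; clk: I8251_CLK_*. */
uint8_t i8251_async_mode(unsigned data_bits, char parity,
                         unsigned stop_halves, unsigned clk)
{
    unsigned m = clk & 3u;                  /* bits 1-0: clock factor */
    m |= ((data_bits - 5u) & 3u) << 2;      /* bits 3-2: character length */
    if (parity == 'E')
        m |= 0x30u;                         /* bit 4 parity enable, bit 5 even */
    else if (parity == 'O')
        m |= 0x10u;                         /* bit 4 parity enable, bit 5 clear = odd */
    m |= ((stop_halves - 1u) & 3u) << 6;    /* bits 7-6: stop bit length */
    return (uint8_t)m;
}

/* Fill out[] with the bytes to write, in order, to the 8251 command port:
 * three dummy zeros, Internal Reset, the mode byte, then the command byte.
 * Returns the number of bytes (always 6). */
size_t i8251_init_sequence(uint8_t out[6], uint8_t mode, uint8_t command)
{
    out[0] = 0x00;
    out[1] = 0x00;
    out[2] = 0x00;
    out[3] = (uint8_t)I8251_CMD_IR;
    out[4] = mode;
    out[5] = command;
    return 6;
}
```

For example, 8 data bits, no parity, and 1 stop bit at x16 gives mode byte 0x4E. Enabling TX, RX, DTR, and RTS plus an error reset gives the common command byte 0x37.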
If you need any Japanese documents run through OCR, let me know! I have software set up to do just that.
I'm also interested in any documentation concerning NEC's ANSI driver. It seems to have a direct interface via INT DCh, but I can only find some documentation on the "extensions" to the interface. Many games and utilities seem to call on it. One call I traced into appears to set/retrieve the function key row text.
I'll OCR them soon.
Also, have you contacted the author of np2kai? I'm sure he'd share documentation.
OK, here we go! This was some heavy work for my little old MBP.
Pre-process
- remove any existing OCR using PDFpenPro
- de-skew using "Enhance Scans" in Acrobat
- split large files in half by duplicating, then deleting the unwanted half from each
Post-process
- re-combine them afterwards, if required
Anyway, here are the OCR'd files. I'd keep them alongside the originals.
- Adobe Acrobat DC
  - typical awkward Adobe user experience
  - does not require split files
  - http://www.mediafire.com/file/2kz5nlw54ogdawb/PC98-OCR-Acrobat.7z
- ScanSnap (ABBYY Lite)
  - use Acrobat to de-skew first
  - requires split files
  - http://www.mediafire.com/file/dd4qx8h9wtxslky/PC98-OCR-ScanSnap.7z
I also tried unsuccessfully with:
- PDFpen Pro (got so very close)
- FineReader (ABBYY Pro)
Also, I'd like to point you to the Neo Kobe collection and the Tokugawa Corporate Forums.
Translation Aggregator is a great little app to get multiple translations of whatever you copy into the clipboard. Windows only, so I run it using Wine.
Let me know how you get on with these. Happy to redo/tweak.
I will place these OCR'd PDFs on the private copy of my Hackipedia site to work from. I assume you'd rather I not publish them on the site publicly.
I checked over the PDFs and I can confirm the text is selectable, and copying the text to Notepad (Windows) or Leafpad (Linux) shows text that resembles what is on the page. Considering that some of the kanji are fairly blurry, I'm impressed.
I don't mind what you do with them. Feel free to share them publicly. I claim no ownership.
The new files may contain slightly lower quality image data due to the way the OCR apps modify them, so it's still worth keeping the originals around. If I redo them I always work from the originals.
There's some very impressive OCR software available these days, though not every OCR app supports Japanese, and each has its own strengths and weaknesses.
As you work with them, I'd appreciate feedback on which set gives more consistent accuracy. Then in future I'll just use that one OCR app to save time!
Updated Translation Aggregator download link
I reinstalled PDFpenPro and managed to get some mediocre results:
http://www.mediafire.com/file/z30acwfyrc5y55a/PC98-OCR-PDFpenPro.7z
My thoughts on comparative quality, first is best:
- ScanSnap
- Acrobat
- PDFpenPro
Interestingly, that is also the order of ease of processing, so I'll stick with ScanSnap for now.
So far so good. The only OCR errors I see are cases where it can't distinguish between 1 (one), I (capital i), and l (lowercase L).
Great. I'll see if it's possible to tweak or spell check the text. Maybe use a custom dictionary. We'll see.
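Here is a sketch of one rule such a tweak pass could apply to the 1 / I / l confusion, assuming the OCR text has already been split into whitespace-separated tokens. `fix_ocr_numeric_token` is a hypothetical helper, not part of any of the OCR apps mentioned above. It only rewrites tokens that already contain a real digit, so register names and mnemonics such as INT or DCh survive untouched.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL heuristic, not part of any OCR tool: in a token that otherwise
 * looks like a decimal or hex number (optionally with an assembler-style
 * trailing 'h'), treat 'I' and 'l' as misread '1'. Tokens with no real digit,
 * such as "INT" or "DCh", are left alone. Returns the number of characters changed. */
int fix_ocr_numeric_token(char *tok)
{
    size_t len = strlen(tok);
    size_t end = len;
    size_t i;
    int digits = 0;
    int changed = 0;

    if (len > 1 && (tok[len - 1] == 'h' || tok[len - 1] == 'H'))
        end = len - 1;                      /* skip the hex suffix */

    for (i = 0; i < end; i++) {
        unsigned char c = (unsigned char)tok[i];
        if (isdigit(c))
            digits++;
        else if (!isxdigit(c) && c != 'I' && c != 'l')
            return 0;                       /* an ordinary word, not a number */
    }
    if (digits == 0)
        return 0;                           /* nothing anchors it as numeric */

    for (i = 0; i < end; i++) {
        if (tok[i] == 'I' || tok[i] == 'l') {
            tok[i] = '1';
            changed++;
        }
    }
    return changed;
}
```

This only covers numbers and hex constants. Inside ordinary words, the custom dictionary would still have to do the work.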