cl-babel / babel

Babel is a charset encoding/decoding library, not unlike GNU libiconv, written in pure Common Lisp.

Home Page: http://common-lisp.net/project/babel

License: Other

Common Lisp 99.82% Shell 0.18%

babel's People

Contributors

attila-lendvai, common-lisp-dev-copybara, darabi, fare, levin108, lovesan, luismbo, muyinliu, pfdietz, ralith, shinmera, sionescu, snmsts, zmyrgel

babel's Issues

external-format unused?

As far as I can tell, only the encoding for external formats is actually used when converting to octets:

CL-USER> (babel:string-to-octets "foo
bar
baz" :encoding (babel:make-external-format :ascii :eol-style :crlf))

=> #(102 111 111 10 98 97 114 10 98 97 122)

Am I misunderstanding how babel is meant to be used?
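
Until the :eol-style setting is honoured, one caller-side option is to convert line endings before encoding. The following is only a sketch of that idea in portable Common Lisp; lf-to-crlf is a hypothetical helper, not part of babel, and it assumes #\Linefeed and #\Newline are the same character, as they are on most implementations.

```lisp
;; Hypothetical helper (not part of babel): rewrite LF line endings as CRLF
;; so that the encoded octets contain 13 10 at each line break.
(defun lf-to-crlf (string)
  "Return a copy of STRING with every #\\Newline replaced by CR followed by LF."
  (with-output-to-string (out)
    (loop for char across string
          do (if (char= char #\Newline)
                 (progn (write-char #\Return out)
                        (write-char #\Linefeed out))
                 (write-char char out)))))
```

The result can then be passed on, for example as (babel:string-to-octets (lf-to-crlf text) :encoding :ascii).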

GBK broken?

I have the following test case that shows the GBK encoding to be broken:

echo "高怡雯" | iconv -t gbk -o /tmp/gao
# in Clozure CL
(let ((v (make-array 6 :element-type '(unsigned-byte 8))))
           (with-open-file (in "/tmp/gao"
                               :element-type '(unsigned-byte 8))
             (read-sequence v in)
             v))
→ #(184 223 226 249 246 169)
(babel:octets-to-string * :encoding :gbk)
→ "高恂霎"

Documentation

Where is the proper documentation? babel.texi contains mainly

Bla bla bla, bla bla bla.

[PATCH] Fix babel to work with :invert readtable case

Fix babel to work with :invert readtable case

Make symbol munging use the correct case by using the '#:~a-symbol-name syntax. Also fixed some references to T that should have been t.

Signed-off-by: Jyrki Jaakkola

diff --git a/src/enc-unicode.lisp b/src/enc-unicode.lisp
index 1a8375b..1c90a19 100644
--- a/src/enc-unicode.lisp
+++ b/src/enc-unicode.lisp
@@ -520,9 +520,9 @@ code points for each invalid byte."
   (check-type name keyword)
   (let ((swap-var (gensym "SWAP"))
         (code-point-counter-name
-          (intern (format nil "~a-CODE-POINT-COUNTER" name)))
-        (encoder-name (intern (format nil "~a-ENCODER" name)))
-        (decoder-name (intern (format nil "~a-DECODER" name))))
+          (intern (format nil (string '#:~a-code-point-counter) (string name))))
+        (encoder-name (intern (format nil (string '#:~a-encoder) (string name))))
+        (decoder-name (intern (format nil (string '#:~a-decoder) (string name)))))
     (labels ((make-bom-check-form (end start getter seq)
                (if (null endianness)
                  ``((,',swap-var
@@ -536,14 +536,14 @@ code points for each invalid byte."
                (case endianness
                  (:le ``(,,getter ,,src ,,i 2 :le))
                  (:be ``(,,getter ,,src ,,i 2 :be))
-                 (T ``(if ,',swap-var
+                 (t ``(if ,',swap-var
                         (,,getter ,,src ,,i 2 :re)
                         (,,getter ,,src ,,i 2 :ne)))))
              (make-setter-form (setter code dest di)
                (case endianness
                  (:be ``(,,setter ,,code ,,dest ,,di 2 :be))
                  (:le ``(,,setter ,,code ,,dest ,,di 2 :le))
-                 (T ``(,,setter ,,code ,,dest ,,di 2 :ne)))))
+                 (t ``(,,setter ,,code ,,dest ,,di 2 :ne)))))
       `(progn
          (define-octet-counter ,name (getter type)
            `(utf16-octet-counter ,getter ,type))
@@ -691,11 +691,11 @@ written in big-endian byte-order without a leading byte-order mark."
   (check-type endianness (or null (eql :le) (eql :be)))
   (let ((swap-var (gensym "SWAP"))
         (code-point-counter-name
-          (intern (format nil "~a-CODE-POINT-COUNTER" name)))
+          (intern (format nil (string '#:~a-code-point-counter) (string name))))
         (encoder-name
-          (intern (format nil "~a-ENCODER" name)))
+          (intern (format nil (string '#:~a-encoder) (string name))))
         (decoder-name
-          (intern (format nil "~a-DECODER" name))))
+          (intern (format nil (string '#:~a-decoder) (string name)))))
     (labels ((make-bom-check-form (end start getter src)
                (if (null endianness)
                  ``(when (not (zerop (- ,,end ,,start)))
@@ -703,8 +703,8 @@ written in big-endian byte-order without a leading byte-order mark."
                        (#.+byte-order-mark-code+
                          (incf ,,start ,',bytes) nil)
                        (#.+swapped-byte-order-mark-code-32+
-                        (incf ,,start ,',bytes) T)
-                       (T #+little-endian T)))
+                        (incf ,,start ,',bytes) t)
+                       (t #+little-endian t)))
                  '()))
              (make-setter-form (setter code dest di)
                ``(,,setter ,,code ,,dest ,,di ,',bytes

format-strings are not symbols; so can't compile on CCL

There are currently calls like (format-symbol t '#:~a-code-point-counter (string name)) (all in this file: https://github.com/cl-babel/babel/blob/master/src/enc-unicode.lisp#L527 ). Alexandria hands these off to format, and CCL's format raises a type-error.

Wrapping all those symbols in (string ...) will resolve the issue.

This affects the current (2013-06-15) Quicklisp release and, in turn, makes compiling difficult for those who grovel.
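
A minimal sketch of the suggested fix, independent of babel and Alexandria: convert the template symbol to a string before it reaches format. The helper name derived-symbol is hypothetical and only illustrates the (string ...) wrapping described above.

```lisp
;; Hypothetical helper: build a symbol name from a template that is written
;; as an uninterned symbol, so its case follows the current readtable.
(defun derived-symbol (control name)
  "Intern a symbol named by applying CONTROL, a symbol used as a format
template, to NAME. Both are converted with STRING before FORMAT sees them."
  (values (intern (format nil (string control) (string name)))))
```

Under the standard readtable, (derived-symbol '#:~a-encoder :utf-16) names the symbol UTF-16-ENCODER; format never receives a symbol as its control argument, which is what CCL and SBCL reject.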

Replacement character handling is inconsistent/incomplete

The following formats have decoders that can emit #\Replacement_Character even though their encoders don't accept it: :cp1251, :iso-8859-3, :iso-8859-6, :iso-8859-7, :iso-8859-8, :iso-8859-11. :ebcdic-international has a similar issue, but with #\U+FFFF instead. :ebcdic-us seems to substitute various Latin-1 code points such as the private use characters, but from what little I know about EBCDIC, that might actually be the correct behavior.

I would expect octets-to-string output to be valid input to string-to-octets, even if chaining the two need not result in the same bytes. It's not quite clear what the behavior should be because the only other encodings in babel that run into this edge case (:cp1252, :gbk, :eucjp, :cp932) lack error checks for it entirely. I actually have a patch more or less prepared for that already, but it should be consistent with the rest.

In my opinion, signalling an error is the right thing to do when errorp is set and otherwise the ASCII substitution byte (which seems to be available in all supported encodings) could be used. decoding-error conveniently does this out of the box.

Note that this overlaps heavily with the first half of #41. Both have the same underlying issue.
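
The behavior proposed above can be sketched as follows. This is not babel's encoder machinery: encode-latin-1 and unencodable-character are hypothetical names, Latin-1 is used only because it is short to write, and the sketch assumes that "the ASCII substitution byte" means SUB (#x1A).

```lisp
;; Assumption: the ASCII substitution byte is SUB, #x1A.
(defconstant +ascii-sub+ #x1A)

;; Hypothetical condition, standing in for whatever babel would signal.
(define-condition unencodable-character (error)
  ((culprit :initarg :culprit :reader unencodable-character-culprit))
  (:report (lambda (condition stream)
             (format stream "Cannot encode character with code ~D."
                     (char-code (unencodable-character-culprit condition))))))

(defun encode-latin-1 (string &key (errorp t))
  "Encode STRING as Latin-1 octets. Characters outside Latin-1 signal an
error when ERRORP is true and are replaced by +ASCII-SUB+ otherwise."
  (map '(vector (unsigned-byte 8))
       (lambda (char)
         (let ((code (char-code char)))
           (cond ((< code 256) code)
                 (errorp (error 'unencodable-character :culprit char))
                 (t +ascii-sub+))))
       string))
```

With this shape, a decoder's replacement character is always acceptable encoder input: it either raises a well-defined error or becomes a single substitution octet.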

with-simple-vector for "other implementations" doesn't check for fill-pointer

`(funcall (if (adjustable-array-p ,vector)

with-simple-vector doesn't check for arrays with fill-pointer, and calls the call-with-array-data/fast for such an array, which goes wrong (at least on LispWorks).

CL-USER 60 > (progn (setq  str (make-array 10 :fill-pointer 4 :element-type 'character)) 
                    (replace str "docs")
                    (BABEL:STRING-TO-OCTETS str))
#(244 130 155 134 0 0 0)

It can be fixed by changing the condition to (or (adjustable-array-p ,vector) (array-has-fill-pointer-p ,vector)), and with this it works as expected:

CL-USER 63 > (progn (setq  str (make-array 10 :fill-pointer 4 :element-type 'character)) 
                    (replace str "docs")
                    (BABEL:STRING-TO-OCTETS str))
#(100 111 99 115)

You get this problem if you do:

(ql:quickload "quri")
(quri:url-encode (quri:url-encode "docs"))

This happens because quri:url-encode returns a string with a fill-pointer.
I actually found it in the tests of cl-ses4, which make this double call:
https://github.com/Jach/cl-ses4/blob/14b9dc5ffb2fe93db82312e3eefbdd4164572b71/src/canonicalize.lisp#L49
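
Until the condition in with-simple-vector is changed, a caller-side workaround is to copy the active part of the string into a simple string first. This is a sketch in portable Common Lisp; active-simple-string and demo-fill-pointer-string are hypothetical helpers, and the copy costs one extra allocation.

```lisp
;; Hypothetical helper: SUBSEQ respects the fill pointer and returns a
;; fresh simple array, so only the active elements are copied.
(defun active-simple-string (string)
  "Return a fresh simple string holding only the active elements of STRING."
  (coerce (subseq string 0) 'simple-string))

;; Rebuilds the string from the report: capacity 10, fill pointer 4.
(defun demo-fill-pointer-string ()
  (let ((str (make-array 10 :fill-pointer 4 :element-type 'character)))
    (replace str "docs")
    str))
```

Passing (active-simple-string str) instead of str to babel:string-to-octets should avoid the fast path being applied to a non-simple vector.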

GBK-MAP in Lispworks

Gbk-map.lisp does not appear to be working in Lispworks 6.1.
Error: #\啊 is not of type BASE-CHAR.

Streaming API

So it happens more often than I'd like that I want to serialise longer pieces of text to/from an encoding. Having to round-trip through an array copy to do so is quite cumbersome. It would be great if there was instead an API that works either via callbacks, or even better, via a resumable state machine. The callback API and the current copying API could both be implemented in terms of the state machine API quite trivially, I think.

Naturally this would require refactoring most things, and is as such a big undertaking. Still, I feel like this is a very valuable feature, since having to copy megabytes if not gigabytes of text around is often not just slow, but also prohibitively taxing on memory. A state machine API would allow processing text in a streaming fashion, too, without needing to keep anything at all in memory.
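
To make the state-machine idea concrete, here is a rough sketch of a resumable decoder for UTF-8 only. It is not babel code: make-utf8-decoder is a hypothetical name, and the sketch deliberately skips overlong-form and surrogate checks. The decoder keeps its partial-character state between calls, so input can arrive in arbitrary chunks.

```lisp
;; Hypothetical sketch: a resumable UTF-8 decoder. EMIT is called once per
;; decoded character; the returned function accepts one chunk of octets and
;; returns how many continuation octets are still missing.
(defun make-utf8-decoder (emit)
  (let ((code 0)        ; code point accumulated so far
        (remaining 0))  ; continuation octets still expected
    (lambda (chunk)
      (loop for octet across chunk
            do (cond ((plusp remaining)
                      (unless (= (logand octet #xC0) #x80)
                        (error "Octet ~X is not a continuation octet." octet))
                      (setf code (logior (ash code 6) (logand octet #x3F)))
                      (when (zerop (decf remaining))
                        (funcall emit (code-char code))))
                     ((< octet #x80)
                      (funcall emit (code-char octet)))
                     ((= (logand octet #xE0) #xC0)
                      (setf code (logand octet #x1F) remaining 1))
                     ((= (logand octet #xF0) #xE0)
                      (setf code (logand octet #x0F) remaining 2))
                     ((= (logand octet #xF8) #xF0)
                      (setf code (logand octet #x07) remaining 3))
                     (t (error "Octet ~X cannot start a UTF-8 sequence." octet))))
      remaining)))
```

A copying API falls out of this by feeding the whole array as a single chunk and collecting the emitted characters, which is the layering suggested above.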

Calling write-string on a vector-output-stream leads to an error

Due to a missing stream-write-string method, I get such a stack trace in sldb:

There is no applicable method for the generic function
  #<STANDARD-GENERIC-FUNCTION STREAM-WRITE-STRING (5)>
when called with arguments
  (#<BABEL-STREAMS:VECTOR-OUTPUT-STREAM {10086524A3}>
   "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
   0 NIL).
   [Condition of type SIMPLE-ERROR]

Restarts:
 5: CONTINUE-ERROR-HANDLING Continue processing the error as if the debugger was not available
 4: RETRY                   Retry calling the generic function.
 3: RETRY-HANDLING-REQUEST  Try again handling this HTTP request
 2: *ABORT-SERVER-REQUEST   Abort processing request 1 by simply closing the network socket
 1: REMOVE-WORKER           Stop and remove worker #<WORKER {1008D0F2B3}>
 0: ABORT                   Abort thread (#<THREAD "http worker 0 / serving request 1 / HANDLE-LEVEL-1-ERROR / HANDLE-TOPLEVEL-ERROR" RUNNING {1008D0FAF3}>)

Backtrace:
  0: (HU.DWIM.UTIL::INVOKE-SLIME-DEBUGGER #<SIMPLE-ERROR "~@<There is no applicable method for the generic function ~2I~_~S~ ..)
  1: (HU.DWIM.UTIL:MAYBE-INVOKE-DEBUGGER #<SIMPLE-ERROR "~@<There is no applicable method for the generic function ~2I~_~S~ ..)
  2: ((SB-PCL::EMF HU.DWIM.WEB-SERVER:HANDLE-TOPLEVEL-ERROR) #<unavailable argument> #<unavailable argument> #<HU.DWIM.WEB-SERVER:BROKER-BASED-SERVER listen: 0.0.0.0/11080, 0.0.0.0/8443; brokers: 2 {100C47..
  3: ((:METHOD HU.DWIM.WEB-SERVER:HANDLE-TOPLEVEL-ERROR :AROUND (T T)) #<HU.DWIM.WEB-SERVER:BROKER-BASED-SERVER listen: 0.0.0.0/11080, 0.0.0.0/8443; brokers: 2 {100C47D493}> #<SIMPLE-ERROR "~@<There is no ..
  4: ((FLET HU.DWIM.WEB-SERVER::HANDLE-REQUEST-ERROR :IN HU.DWIM.WEB-SERVER::WORKER-LOOP/SERVE-ONE-REQUEST) #<SIMPLE-ERROR "~@<There is no applicable method for the generic function ~2I~_~S~ ..)
  5: ((LABELS HU.DWIM.UTIL::HANDLE-LEVEL-1-ERROR :IN HU.DWIM.UTIL::CALL-WITH-LAYERED-ERROR-HANDLERS) #<SIMPLE-ERROR "~@<There is no applicable method for the generic function ~2I~_~S~ ..)
  6: (SIGNAL #<SIMPLE-ERROR "~@<There is no applicable method for the generic function ~2I~_~S~ ..)
  7: (ERROR "~@<There is no applicable method for the generic function ~2I~_~S~ ..)
  8: ((:METHOD NO-APPLICABLE-METHOD (T)) #<STANDARD-GENERIC-FUNCTION STREAM-WRITE-STRING (5)> #<BABEL-STREAMS:VECTOR-OUTPUT-STREAM {10086524A3}> "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http:..
  9: (SB-PCL::CALL-NO-APPLICABLE-METHOD #<STANDARD-GENERIC-FUNCTION STREAM-WRITE-STRING (5)> (#<BABEL-STREAMS:VECTOR-OUTPUT-STREAM {10086524A3}> "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http:..
 10: (SB-IMPL::%WRITE-STRING "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">" #<BABEL-STREAMS:VECTOR-OUTPUT-STREAM {10086524A3}> 0 NIL)
      Locals:
        SB-DEBUG::ARG-0 = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
        SB-DEBUG::ARG-1 = #<BABEL-STREAMS:VECTOR-OUTPUT-STREAM {10086524A3}>
        SB-DEBUG::ARG-2 = 0
        SB-DEBUG::ARG-3 = NIL

Nonconforming code in utf8-decode-tests

That macro calls substitute to replace ? with a non-base-char in a string. This does not work if the string is a base-string. It is allowed, by the standard, for string constants to be simple-base-strings if all the characters in them are base-chars. The macro should coerce that string constant to a one dimensional simple array of characters.

This is relevant because a potential space optimization in SBCL is to make the double quote reader macro return a simple-base-string, when possible. If SBCL does this, this test will break. I understand Clasp already experienced this issue.
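
The suggested coercion can be sketched like this. substitute-in-literal is a hypothetical helper, not the actual test macro; it only shows that coercing to a character array first makes the substitution safe even when the literal is a simple-base-string.

```lisp
;; Hypothetical helper: SUBSTITUTE returns a sequence of the same kind as
;; its input, so the input is first coerced to a vector that can hold any
;; character, not just base-chars.
(defun substitute-in-literal (new old literal)
  "Replace OLD with NEW in LITERAL, allowing NEW to be a non-base-char."
  (substitute new old (coerce literal '(simple-array character (*)))))
```

For example, (substitute-in-literal (code-char #x3B1) #\? "a?b") yields a three-character string whose middle character is U+03B1, assuming the implementation supports that code point.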

Babel will not compile on sbcl 1.0.55

With a fresh checkout of babel from github, I get the following errors:

; file: /home/raison/quicklisp/local-projects/babel/src/enc-unicode.lisp
; in: DEFINE-UTF-16 :UTF-16
; (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UTF-16 :UTF-16))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UTF-16 :UTF-16LE
; (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16LE :LE)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UTF-16 :UTF-16LE ...))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UTF-16 :UTF-16BE
; (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16BE :BE)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UTF-16 :UTF-16BE ...))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UTF-32
; (BABEL-ENCODINGS::DEFINE-UCS :UTF-32 4)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UCS :UTF-32 ...))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UTF-32LE
; (BABEL-ENCODINGS::DEFINE-UCS :UTF-32LE 4 :LE)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UCS :UTF-32LE ...))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UTF-32BE
; (BABEL-ENCODINGS::DEFINE-UCS :UTF-32BE 4 :BE)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UCS :UTF-32BE ...))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2
; (BABEL-ENCODINGS::DEFINE-UCS :UCS-2 2 NIL 65536)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UCS :UCS-2 ...))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2LE
; (BABEL-ENCODINGS::DEFINE-UCS :UCS-2LE 2 :LE 65536)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UCS :UCS-2LE ...))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2BE
; (BABEL-ENCODINGS::DEFINE-UCS :UCS-2BE 2 :BE 65536)
;
; caught ERROR:
; (during macroexpansion of (DEFINE-UCS :UCS-2BE ...))
; #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
; Wanted one of (STRING SIMPLE-STRING).

Can't compile in sbcl

I saw some similar issues and tried to apply their resolution, but it is not working.

This is SBCL 1.0.57.0.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
* (ql:quickload "babel")
To load "babel":
  Load 1 ASDF system:
    babel
; Loading "babel"

; file: /home/recruiterbox/quicklisp/dists/quicklisp/software/babel-20121125-git/src/enc-unicode.lisp
; in: DEFINE-UTF-16 :UTF-16
;     (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UTF-16 :UTF-16))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UTF-16 :UTF-16LE
;     (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16LE :LE)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UTF-16 :UTF-16LE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UTF-16 :UTF-16BE
;     (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16BE :BE)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UTF-16 :UTF-16BE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).
.
; in: DEFINE-UCS :UTF-32
;     (BABEL-ENCODINGS::DEFINE-UCS :UTF-32 4)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UTF-32 ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UTF-32LE
;     (BABEL-ENCODINGS::DEFINE-UCS :UTF-32LE 4 :LE)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UTF-32LE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UTF-32BE
;     (BABEL-ENCODINGS::DEFINE-UCS :UTF-32BE 4 :BE)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UTF-32BE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2
;     (BABEL-ENCODINGS::DEFINE-UCS :UCS-2 2 NIL 65536)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UCS-2 ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2LE
;     (BABEL-ENCODINGS::DEFINE-UCS :UCS-2LE 2 :LE 65536)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UCS-2LE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2BE
;     (BABEL-ENCODINGS::DEFINE-UCS :UCS-2BE 2 :BE 65536)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UCS-2BE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

debugger invoked on a ASDF:COMPILE-ERROR in thread
#<THREAD "main thread" RUNNING {AAF87A1}>:
  Error while invoking #<COMPILE-OP (:VERBOSE NIL) {C8830E1}> on
  #<CL-SOURCE-FILE "babel" "src" "enc-unicode">

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RETRY ] Retry compiling #<CL-SOURCE-FILE "babel" "src" "enc-unicode">.
  1: [ACCEPT] Continue, treating
              compiling #<CL-SOURCE-FILE "babel" "src" "enc-unicode"> as having
              been successful.
  2: [ABORT ] Give up on "babel"
  3:          Exit debugger, returning to top level.

((SB-PCL::FAST-METHOD ASDF:PERFORM (ASDF:COMPILE-OP ASDF:CL-SOURCE-FILE))
 #<unavailable argument>
 #<unavailable argument>
 #<ASDF:COMPILE-OP (:VERBOSE NIL) {C8830E1}>
 #<ASDF:CL-SOURCE-FILE "babel" "src" "enc-unicode">)
0]

-- thanks

ps:

 (ql:update-all-dists)

1 dist to check.
You already have the latest version of "quicklisp": 2012-12-23.
NIL

cp932 error

Characters such as #\№ cannot be converted to octets.

CL-USER> (babel:string-to-octets "№あいう" :encoding :cp932)
; Evaluation aborted on #<TYPE-ERROR #xCE7F6BE>.
CL-USER> (ccl:encode-string-to-octets "№あいう" :external-format :cp932)
#(250 89 130 160 130 162 130 164)
8
CL-USER> (lisp-implementation-version)
"Version 1.10-r16196  (WindowsX8632)"
CL-USER> 

This patch seems to fix this error.

bash-3.2$ diff -u enc-jpn.lisp new-enc-jpn.lisp 
--- enc-jpn.lisp    2015-04-14 13:36:44.000000000 +0900
+++ new-enc-jpn.lisp    2015-05-19 22:04:36.000000000 +0900
@@ -43,8 +43,9 @@
                   (+ (ash mid 8) low))))))
   (dolist (i *eucjp*)
     (let ((cp932 (euc-cp932 (first i))))
-      (setf (gethash cp932 *cp932-to-ucs-hash*) (second i))
-      (setf (gethash (second i) *ucs-to-cp932-hash*) cp932))))
+      (when cp932
+        (setf (gethash cp932 *cp932-to-ucs-hash*) (second i))
+        (setf (gethash (second i) *ucs-to-cp932-hash*) cp932)))))

 ;ascii
 (loop for i from #x00 to #x7f do
bash-3.2$ 

Can't load latest babel in SBCL 1.1.1

CL-USER> (ql:quickload :babel)
To load "babel":
  Load 1 ASDF system:
    babel
; Loading "babel"

; file: /home/walker/lisp/babel/src/enc-unicode.lisp
; in: DEFINE-UTF-16 :UTF-16
;     (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UTF-16 :UTF-16))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UTF-16 :UTF-16LE
;     (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16LE :LE)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UTF-16 :UTF-16LE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UTF-16 :UTF-16BE
;     (BABEL-ENCODINGS::DEFINE-UTF-16 :UTF-16BE :BE)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UTF-16 :UTF-16BE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).
.
; in: DEFINE-UCS :UTF-32
;     (BABEL-ENCODINGS::DEFINE-UCS :UTF-32 4)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UTF-32 ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UTF-32LE
;     (BABEL-ENCODINGS::DEFINE-UCS :UTF-32LE 4 :LE)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UTF-32LE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UTF-32BE
;     (BABEL-ENCODINGS::DEFINE-UCS :UTF-32BE 4 :BE)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UTF-32BE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2
;     (BABEL-ENCODINGS::DEFINE-UCS :UCS-2 2 NIL 65536)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UCS-2 ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2LE
;     (BABEL-ENCODINGS::DEFINE-UCS :UCS-2LE 2 :LE 65536)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UCS-2LE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).

; in: DEFINE-UCS :UCS-2BE
;     (BABEL-ENCODINGS::DEFINE-UCS :UCS-2BE 2 :BE 65536)
; 
; caught ERROR:
;   (during macroexpansion of (DEFINE-UCS :UCS-2BE ...))
;   #:~A-CODE-POINT-COUNTER fell through ETYPECASE expression.
;   Wanted one of (STRING SIMPLE-STRING).
Error while invoking #<COMPILE-OP (:VERBOSE NIL) {100C4F0963}>
on #<CL-SOURCE-FILE "babel" "src" "enc-unicode">
   [Condition of type ASDF:COMPILE-ERROR]
; Evaluation aborted on NIL

Lispworks problem with octets-to-string

IT IS FIXED IN CURRENT RELEASE, SORRY, MY BAD!

The following example works out of the box in sbcl, but fails in lispworks unless I use a workaround (that is NOT going to work in all cases of course). I don't consider the workaround to be a fix, but a way to pinpoint a bug.

When I invoke the following without the patch (workaround) below:

(puri::decode-escaped-encoding "/tal%2Dstatic%2Dfullscreen%2Dforall/toc" t)

I get:
Error: Illegal :UTF-8 character starting at position 0.

with patch (workaround) it evaluates to:
"/tal-static-fullscreen-forall/toc"

I have babel-20140316-git from quicklisp and lispworks professional linux 6.1.1

My workaround for URLs specifically:

(in-package :babel)

(defparameter *ascii-codes*
  '((32 " ")(33 "!")(34 "\"")(35 "#")(36 "$")(37 "%")(38 "&")(39 "'")(40 "(")(41 ")")
    (42 "*")(43 "+")(44 ",")(45 "-")(46 ".")(47 "/")(48 "0")(49 "1")(50 "2")(51 "3")
    (52 "4")(53 "5")(54 "6")(55 "7")(56 "8")(57 "9")(58 ":")(59 ";")(60 "<")(61 "=")
    (62 ">")(63 "?")(64 "@")(65 "A")(66 "B")(67 "C")(68 "D")(69 "E")(70 "F")(71 "G")
    (72 "H")(73 "I")(74 "J")(75 "K")(76 "L")(77 "M")(78 "N")(79 "O")(80 "P")(81 "Q")
    (82 "R")(83 "S")(84 "T")(85 "U")(86 "V")(87 "W")(88 "X")(89 "Y")(90 "Z")(91 "[")
    (92 "\\")(93 "]")(94 "^")(95 "_")(96 "`")(97 "a")(98 "b")(99 "c")(100 "d")(101 "e")
    (102 "f")(103 "g")(104 "h")(105 "i")(106 "j")(107 "k")(108 "l")(109 "m")(110 "n")(111 "o")
    (112 "p")(113 "q")(114 "r")(115 "s")(116 "t")(117 "u")(118 "v")(119 "w")(120 "x")(121 "y")
    (122 "z")(123 "{")(124 "|")(125 "}")(126 "~")))

#+lispworks
(defun octets-to-string (vector &key (start 0) 
                  end
                  errorp
                  encoding )
  (let ((retval (make-array `(,(length vector)) :element-type 'character :initial-element #\Space)))
    (dotimes (i (length vector))
      (setf (aref retval i) (aref (second (assoc (aref vector i) *ascii-codes*)) 0)))
    retval))

Decoding Invalid Code Sequence Consistency

We want to know when a buffer has accumulated enough bytes to decode a character, depending on the current encoding.
babel doesn't provide a convenient (efficient) API to test that, but I hoped to be able to use OCTETS-TO-STRING for it.
Unfortunately, the handling of incomplete code sequences by the different encodings is not consistent.

cl-user> (babel:OCTETS-TO-STRING (coerce #(194 182) '(vector (unsigned-byte 8))) :start 0 :end 2 :errorp nil :encoding :utf-8)
"¶"
cl-user> (babel:OCTETS-TO-STRING (coerce #(194 182) '(vector (unsigned-byte 8))) :start 0 :end 1 :errorp nil :encoding :utf-8)
"�"
cl-user> (babel:OCTETS-TO-STRING (coerce #(194 182) '(vector (unsigned-byte 8))) :start 0 :end 2 :errorp nil :encoding :utf-16)
"슶"
cl-user> (babel:OCTETS-TO-STRING (coerce #(194 182) '(vector (unsigned-byte 8))) :start 0 :end 1 :errorp nil :encoding :utf-16)
> Debug: Failed assertion: (= babel-encodings::i babel-encodings::end)
> While executing: (:internal swank::invoke-default-debugger), in process new-repl-thread(1481).
> Type cmd-/ to continue, cmd-. to abort, cmd-\ for a list of available restarts.
> If continued: test the assertion again.
> Type :? for other options.
1 > :q
; Evaluation aborted on #<simple-error #x302006CBABDD>.
cl-user> (babel:octets-to-string (babel:string-to-octets "こんにちは 世界" :encoding :eucjp) :start 0 :end 2 :encoding :eucjp)
"こ"
cl-user> (babel:octets-to-string (babel:string-to-octets "こんにちは 世界" :encoding :eucjp) :start 0 :end 1 :encoding :eucjp)
> Debug: Illegal :eucjp character starting at position 0.
> While executing: (:internal swank::invoke-default-debugger), in process repl-thread(3921).
> Type cmd-. to abort, cmd-\ for a list of available restarts.
> Type :? for other options.
1 > :q
; Evaluation aborted on #<babel-encodings:end-of-input-in-character #x302006CA4EAD>.
cl-user>

I would suggest adding a keyword parameter to specify what to do in such a case:

| :on-invalid-code substitution-character | would insert the given substitution-character in place of the code. |
| :on-invalid-code :ignore                | would ignore the code and go on.                                    |
| :on-invalid-code :error                 | would signal a babel-encodings:character-decoding-error condition.  |

I would also propose providing an efficient function to query the length of the code sequence for the next character:

(babel:decode-character bytes &key start end encoding)
--> character ;
    sequence-valid-p ;
    length
  • character: if a character can be decoded, it is returned as the primary value; otherwise NIL.

  • sequence-valid-p: NIL if the code sequence is definitely invalid, else T. Notably, if it is just too short but could be a valid code sequence if completed, T should be returned.

  • length: if the character is decoded and returned, the length of the decoded code sequence; otherwise, if sequence-valid-p, the minimal length of a valid code sequence with the given prefix; otherwise the minimal length of any valid code sequence.

| character | sequence-valid-p | length                                                         |
|-----------+------------------+----------------------------------------------------------------|
| ch        | T                | length of the decoded sequence                                 |
| ch        | NIL              | --impossible--                                                 |
| NIL       | T                | minimal length of a valid code sequence with the given prefix. |
| NIL       | NIL              | minimal length of a valid code sequence.                       |

For example, in the case NIL T len, if len <= (- end start), then it means the given code sequence is valid, but the decoded code is not the code of a character. eg. #(#xED #xA0 #x80) is UTF-8 for 55296, but (code-char 55296) --> nil.

(babel:decode-character (coerce #(65 32 66) '(vector (unsigned-byte 8)))
                         :start 0 :end 3 :encoding :utf-8)
--> #\A
    T
    1

(babel:decode-character (coerce #(195 128 32 80 97 114 105 115) '(vector (unsigned-byte 8)))
                        :start 0 :end 3 :encoding :utf-8)
--> #\À
    T
    2

(babel:decode-character (coerce #(195 128 32 80 97 114 105 115) '(vector (unsigned-byte 8)))
                        :start 0 :end 1 :encoding :utf-8)
--> NIL
    T
    2

(babel:decode-character (coerce #(195 195 32 80 97 114 105 115) '(vector (unsigned-byte 8)))
                        :start 0 :end 1 :encoding :utf-8)
--> NIL
    T
    2

(babel:decode-character (coerce #(195 195 32 80 97 114 105 115) '(vector (unsigned-byte 8)))
                        :start 0 :end 2 :encoding :utf-8)
--> NIL
    NIL
    1

(babel:decode-character (coerce #(#xED #xA0 #x80) '(vector (unsigned-byte 8)))
                        :start 0 :end 3 :encoding :utf-8)
--> NIL
    T
    3
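
As an illustration of the proposal, here is a sketch restricted to UTF-8. It is not an existing babel function: decode-utf8-character and utf8-lead-length are hypothetical names, overlong forms are not rejected, and surrogate code points are treated as "valid sequence, no character" to match the last example above.

```lisp
(defun utf8-lead-length (octet)
  "Sequence length implied by a UTF-8 lead octet, or NIL if OCTET cannot
start a sequence."
  (cond ((< octet #x80) 1)
        ((= (logand octet #xE0) #xC0) 2)
        ((= (logand octet #xF0) #xE0) 3)
        ((= (logand octet #xF8) #xF0) 4)
        (t nil)))

(defun decode-utf8-character (octets &key (start 0) (end (length octets)))
  "Return three values: character or NIL, sequence-valid-p, and length."
  (when (>= start end)
    ;; Nothing to look at yet: any sequence needs at least one octet.
    (return-from decode-utf8-character (values nil t 1)))
  (let* ((lead (aref octets start))
         (needed (utf8-lead-length lead)))
    (unless needed
      (return-from decode-utf8-character (values nil nil 1)))
    (let ((code (logand lead (ecase needed (1 #x7F) (2 #x1F) (3 #x0F) (4 #x07)))))
      ;; Fold in whichever continuation octets are available.
      (loop for i from (1+ start) below (min end (+ start needed))
            for octet = (aref octets i)
            do (if (= (logand octet #xC0) #x80)
                   (setf code (logior (ash code 6) (logand octet #x3F)))
                   (return-from decode-utf8-character (values nil nil 1))))
      (cond ((< (- end start) needed) (values nil t needed))   ; too short so far
            ((or (<= #xD800 code #xDFFF) (>= code char-code-limit))
             (values nil t needed))                            ; valid code, no character
            (t (values (code-char code) t needed))))))
```

A caller buffering network input can use the third value to decide how many more octets to wait for before retrying.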

babel octets to string: type error

(defparameter b1 (babel:string-to-octets "string 1"))
(defparameter b2 (babel:string-to-octets "string 2"))
(defparameter delim (babel:string-to-octets "|"))
(defparameter b3 (concatenate 'vector b1 delim b2))

(babel:octets-to-string b3)
;; gives this error message:

The value of VECTOR is #(115 116 114 105 110 103 32 49 124 115
                         116 114 105 110 103 32 50), which is not of type (VECTOR
                                                                           (UNSIGNED-BYTE
                                                                            8)).
[Condition of type SIMPLE-TYPE-ERROR]
----

But it works with flexi-streams and cl-base64:
(flexi-streams:octets-to-string b3)
=> "string 1|string 2"
(cl-base64:base64-string-to-string (cl-base64:usb8-array-to-base64-string b3))
=> "string 1|string 2"
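
The error message points at the element type: (concatenate 'vector ...) builds a general vector whose element type is T, not (unsigned-byte 8). A sketch of the difference in portable Common Lisp, with ascii-octets and join-octets as hypothetical helpers standing in for the babel calls above:

```lisp
;; Hypothetical stand-in for babel:string-to-octets on ASCII-only input.
(defun ascii-octets (string)
  (map '(vector (unsigned-byte 8)) #'char-code string))

;; Asking CONCATENATE for an octet vector keeps the specialised element
;; type that the type check in the error message expects.
(defun join-octets (&rest vectors)
  (apply #'concatenate '(vector (unsigned-byte 8)) vectors))
```

Building b3 with (concatenate '(vector (unsigned-byte 8)) b1 delim b2) should therefore give a vector that satisfies the declared type.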

Warning on undefined variables first load

Hey all,

On first loading babel, I receive the following warnings regarding undefined variables:

; file: C:/Users/Zulu/quicklisp/dists/quicklisp/software/yason-20230214-git/encode.lisp
; in: DEFMETHOD YASON:ENCODE (SYMBOL)
;     (EQ YASON::OBJECT YASON:FALSE)
;
; caught WARNING:
;   undefined variable: YASON:FALSE

;     (EQ YASON::OBJECT YASON:TRUE)
;
; caught WARNING:
;   undefined variable: YASON:TRUE
;
; compilation unit finished
;   Undefined variables:
;     YASON:FALSE YASON:TRUE
;   caught 2 WARNING conditions
;   printed 1 note

I'm assuming it's an issue with the load order of the files in the project, as those variables do exist.

Licensing issues

The NOTES file admits to lifting some code from OpenMCL, which is LLGPL. Doesn't that conflict with the stated MIT license of Babel?

There is concern among some Lisp users that, since Babel is pulled in by CFFI, this licensing situation creates a problem for distributing application binaries that use foreign dependencies. See this thread for context.

Since the NOTES file marks this as an open issue, maybe it's possible to eventually close it? As I understand it, OpenMCL is CCL today, so maybe Clozure Associates can help resolve the legal uncertainty here?

cp932 non-round-trip mapping workaround

It would be very helpful if you could change babel-encodings::*ucs-to-cp932-hash* to incorporate the workaround described here: https://support.microsoft.com/en-us/kb/170559/en-us

BABEL> (setf *print-base* 16
             *print-radix* t)
T
BABEL> (mapcar #'char-code (coerce "ⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹ" 'list))
(#x2170 #x2171 #x2172 #x2173 #x2174 #x2175 #x2176 #x2177 #x2178 #x2179)
BABEL> (string-to-octets "ⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹ" :encoding :cp932)
#(#xEE #xEF #xEE #xF0 #xEE #xF1 #xEE #xF2 #xEE #xF3 #xEE #xF4 #xEE #xF5 #xEE #xF6 #xEE #xF7 #xEE #xF8)
BABEL> (load "babel-cp932-workaround.lisp")
#P"c:/lispbox-0.7/babel-cp932-workaround.lisp"
BABEL> (string-to-octets "ⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹ" :encoding :cp932)
#(#xFA #x40 #xFA #x41 #xFA #x42 #xFA #x43 #xFA #x44 #xFA #x45 #xFA #x46 #xFA #x47 #xFA #x48 #xFA #x49)
BABEL> 

"babel-cp932-workaround.lisp" is a patch that I use.

(in-package #:babel-encodings)

;; This is quoted from https://support.microsoft.com/en-us/kb/170559/en-us
(let ((kb170559 "0x8790   -> U+2252   -> 0x81e0   Approximately Equal To Or The Image Of
0x8791   -> U+2261   -> 0x81df   Identical To
0x8792   -> U+222b   -> 0x81e7   Integral
0x8795   -> U+221a   -> 0x81e3   Square Root
0x8796   -> U+22a5   -> 0x81db   Up Tack
0x8797   -> U+2220   -> 0x81da   Angle
0x879a   -> U+2235   -> 0x81e6   Because
0x879b   -> U+2229   -> 0x81bf   Intersection
0x879c   -> U+222a   -> 0x81be   Union
0xed40   -> U+7e8a   -> 0xfa5c   CJK Unified Ideograph
0xed41   -> U+891c   -> 0xfa5d   CJK Unified Ideograph
0xed42   -> U+9348   -> 0xfa5e   CJK Unified Ideograph
0xed43   -> U+9288   -> 0xfa5f   CJK Unified Ideograph
0xed44   -> U+84dc   -> 0xfa60   CJK Unified Ideograph
0xed45   -> U+4fc9   -> 0xfa61   CJK Unified Ideograph
0xed46   -> U+70bb   -> 0xfa62   CJK Unified Ideograph
0xed47   -> U+6631   -> 0xfa63   CJK Unified Ideograph
0xed48   -> U+68c8   -> 0xfa64   CJK Unified Ideograph
0xed49   -> U+92f9   -> 0xfa65   CJK Unified Ideograph
0xed4a   -> U+66fb   -> 0xfa66   CJK Unified Ideograph
0xed4b   -> U+5f45   -> 0xfa67   CJK Unified Ideograph
0xed4c   -> U+4e28   -> 0xfa68   CJK Unified Ideograph
0xed4d   -> U+4ee1   -> 0xfa69   CJK Unified Ideograph
0xed4e   -> U+4efc   -> 0xfa6a   CJK Unified Ideograph
0xed4f   -> U+4f00   -> 0xfa6b   CJK Unified Ideograph
0xed50   -> U+4f03   -> 0xfa6c   CJK Unified Ideograph
0xed51   -> U+4f39   -> 0xfa6d   CJK Unified Ideograph
0xed52   -> U+4f56   -> 0xfa6e   CJK Unified Ideograph
0xed53   -> U+4f92   -> 0xfa6f   CJK Unified Ideograph
0xed54   -> U+4f8a   -> 0xfa70   CJK Unified Ideograph
0xed55   -> U+4f9a   -> 0xfa71   CJK Unified Ideograph
0xed56   -> U+4f94   -> 0xfa72   CJK Unified Ideograph
0xed57   -> U+4fcd   -> 0xfa73   CJK Unified Ideograph
0xed58   -> U+5040   -> 0xfa74   CJK Unified Ideograph
0xed59   -> U+5022   -> 0xfa75   CJK Unified Ideograph
0xed5a   -> U+4fff   -> 0xfa76   CJK Unified Ideograph
0xed5b   -> U+501e   -> 0xfa77   CJK Unified Ideograph
0xed5c   -> U+5046   -> 0xfa78   CJK Unified Ideograph
0xed5d   -> U+5070   -> 0xfa79   CJK Unified Ideograph
0xed5e   -> U+5042   -> 0xfa7a   CJK Unified Ideograph
0xed5f   -> U+5094   -> 0xfa7b   CJK Unified Ideograph
0xed60   -> U+50f4   -> 0xfa7c   CJK Unified Ideograph
0xed61   -> U+50d8   -> 0xfa7d   CJK Unified Ideograph
0xed62   -> U+514a   -> 0xfa7e   CJK Unified Ideograph
0xed63   -> U+5164   -> 0xfa80   CJK Unified Ideograph
0xed64   -> U+519d   -> 0xfa81   CJK Unified Ideograph
0xed65   -> U+51be   -> 0xfa82   CJK Unified Ideograph
0xed66   -> U+51ec   -> 0xfa83   CJK Unified Ideograph
0xed67   -> U+5215   -> 0xfa84   CJK Unified Ideograph
0xed68   -> U+529c   -> 0xfa85   CJK Unified Ideograph
0xed69   -> U+52a6   -> 0xfa86   CJK Unified Ideograph
0xed6a   -> U+52c0   -> 0xfa87   CJK Unified Ideograph
0xed6b   -> U+52db   -> 0xfa88   CJK Unified Ideograph
0xed6c   -> U+5300   -> 0xfa89   CJK Unified Ideograph
0xed6d   -> U+5307   -> 0xfa8a   CJK Unified Ideograph
0xed6e   -> U+5324   -> 0xfa8b   CJK Unified Ideograph
0xed6f   -> U+5372   -> 0xfa8c   CJK Unified Ideograph
0xed70   -> U+5393   -> 0xfa8d   CJK Unified Ideograph
0xed71   -> U+53b2   -> 0xfa8e   CJK Unified Ideograph
0xed72   -> U+53dd   -> 0xfa8f   CJK Unified Ideograph
0xed73   -> U+fa0e   -> 0xfa90   CJK compatibility Ideograph
0xed74   -> U+549c   -> 0xfa91   CJK Unified Ideograph
0xed75   -> U+548a   -> 0xfa92   CJK Unified Ideograph
0xed76   -> U+54a9   -> 0xfa93   CJK Unified Ideograph
0xed77   -> U+54ff   -> 0xfa94   CJK Unified Ideograph
0xed78   -> U+5586   -> 0xfa95   CJK Unified Ideograph
0xed79   -> U+5759   -> 0xfa96   CJK Unified Ideograph
0xed7a   -> U+5765   -> 0xfa97   CJK Unified Ideograph
0xed7b   -> U+57ac   -> 0xfa98   CJK Unified Ideograph
0xed7c   -> U+57c8   -> 0xfa99   CJK Unified Ideograph
0xed7d   -> U+57c7   -> 0xfa9a   CJK Unified Ideograph
0xed7e   -> U+fa0f   -> 0xfa9b   CJK compatibility Ideograph
0xed80   -> U+fa10   -> 0xfa9c   CJK compatibility Ideograph
0xed81   -> U+589e   -> 0xfa9d   CJK Unified Ideograph
0xed82   -> U+58b2   -> 0xfa9e   CJK Unified Ideograph
0xed83   -> U+590b   -> 0xfa9f   CJK Unified Ideograph
0xed84   -> U+5953   -> 0xfaa0   CJK Unified Ideograph
0xed85   -> U+595b   -> 0xfaa1   CJK Unified Ideograph
0xed86   -> U+595d   -> 0xfaa2   CJK Unified Ideograph
0xed87   -> U+5963   -> 0xfaa3   CJK Unified Ideograph
0xed88   -> U+59a4   -> 0xfaa4   CJK Unified Ideograph
0xed89   -> U+59ba   -> 0xfaa5   CJK Unified Ideograph
0xed8a   -> U+5b56   -> 0xfaa6   CJK Unified Ideograph
0xed8b   -> U+5bc0   -> 0xfaa7   CJK Unified Ideograph
0xed8c   -> U+752f   -> 0xfaa8   CJK Unified Ideograph
0xed8d   -> U+5bd8   -> 0xfaa9   CJK Unified Ideograph
0xed8e   -> U+5bec   -> 0xfaaa   CJK Unified Ideograph
0xed8f   -> U+5c1e   -> 0xfaab   CJK Unified Ideograph
0xed90   -> U+5ca6   -> 0xfaac   CJK Unified Ideograph
0xed91   -> U+5cba   -> 0xfaad   CJK Unified Ideograph
0xed92   -> U+5cf5   -> 0xfaae   CJK Unified Ideograph
0xed93   -> U+5d27   -> 0xfaaf   CJK Unified Ideograph
0xed94   -> U+5d53   -> 0xfab0   CJK Unified Ideograph
0xed95   -> U+fa11   -> 0xfab1   CJK compatibility Ideograph
0xed96   -> U+5d42   -> 0xfab2   CJK Unified Ideograph
0xed97   -> U+5d6d   -> 0xfab3   CJK Unified Ideograph
0xed98   -> U+5db8   -> 0xfab4   CJK Unified Ideograph
0xed99   -> U+5db9   -> 0xfab5   CJK Unified Ideograph
0xed9a   -> U+5dd0   -> 0xfab6   CJK Unified Ideograph
0xed9b   -> U+5f21   -> 0xfab7   CJK Unified Ideograph
0xed9c   -> U+5f34   -> 0xfab8   CJK Unified Ideograph
0xed9d   -> U+5f67   -> 0xfab9   CJK Unified Ideograph
0xed9e   -> U+5fb7   -> 0xfaba   CJK Unified Ideograph
0xed9f   -> U+5fde   -> 0xfabb   CJK Unified Ideograph
0xeda0   -> U+605d   -> 0xfabc   CJK Unified Ideograph
0xeda1   -> U+6085   -> 0xfabd   CJK Unified Ideograph
0xeda2   -> U+608a   -> 0xfabe   CJK Unified Ideograph
0xeda3   -> U+60de   -> 0xfabf   CJK Unified Ideograph
0xeda4   -> U+60d5   -> 0xfac0   CJK Unified Ideograph
0xeda5   -> U+6120   -> 0xfac1   CJK Unified Ideograph
0xeda6   -> U+60f2   -> 0xfac2   CJK Unified Ideograph
0xeda7   -> U+6111   -> 0xfac3   CJK Unified Ideograph
0xeda8   -> U+6137   -> 0xfac4   CJK Unified Ideograph
0xeda9   -> U+6130   -> 0xfac5   CJK Unified Ideograph
0xedaa   -> U+6198   -> 0xfac6   CJK Unified Ideograph
0xedab   -> U+6213   -> 0xfac7   CJK Unified Ideograph
0xedac   -> U+62a6   -> 0xfac8   CJK Unified Ideograph
0xedad   -> U+63f5   -> 0xfac9   CJK Unified Ideograph
0xedae   -> U+6460   -> 0xfaca   CJK Unified Ideograph
0xedaf   -> U+649d   -> 0xfacb   CJK Unified Ideograph
0xedb0   -> U+64ce   -> 0xfacc   CJK Unified Ideograph
0xedb1   -> U+654e   -> 0xfacd   CJK Unified Ideograph
0xedb2   -> U+6600   -> 0xface   CJK Unified Ideograph
0xedb3   -> U+6615   -> 0xfacf   CJK Unified Ideograph
0xedb4   -> U+663b   -> 0xfad0   CJK Unified Ideograph
0xedb5   -> U+6609   -> 0xfad1   CJK Unified Ideograph
0xedb6   -> U+662e   -> 0xfad2   CJK Unified Ideograph
0xedb7   -> U+661e   -> 0xfad3   CJK Unified Ideograph
0xedb8   -> U+6624   -> 0xfad4   CJK Unified Ideograph
0xedb9   -> U+6665   -> 0xfad5   CJK Unified Ideograph
0xedba   -> U+6657   -> 0xfad6   CJK Unified Ideograph
0xedbb   -> U+6659   -> 0xfad7   CJK Unified Ideograph
0xedbc   -> U+fa12   -> 0xfad8   CJK compatibility Ideograph
0xedbd   -> U+6673   -> 0xfad9   CJK Unified Ideograph
0xedbe   -> U+6699   -> 0xfada   CJK Unified Ideograph
0xedbf   -> U+66a0   -> 0xfadb   CJK Unified Ideograph
0xedc0   -> U+66b2   -> 0xfadc   CJK Unified Ideograph
0xedc1   -> U+66bf   -> 0xfadd   CJK Unified Ideograph
0xedc2   -> U+66fa   -> 0xfade   CJK Unified Ideograph
0xedc3   -> U+670e   -> 0xfadf   CJK Unified Ideograph
0xedc4   -> U+f929   -> 0xfae0   CJK compatibility Ideograph
0xedc5   -> U+6766   -> 0xfae1   CJK Unified Ideograph
0xedc6   -> U+67bb   -> 0xfae2   CJK Unified Ideograph
0xedc7   -> U+6852   -> 0xfae3   CJK Unified Ideograph
0xedc8   -> U+67c0   -> 0xfae4   CJK Unified Ideograph
0xedc9   -> U+6801   -> 0xfae5   CJK Unified Ideograph
0xedca   -> U+6844   -> 0xfae6   CJK Unified Ideograph
0xedcb   -> U+68cf   -> 0xfae7   CJK Unified Ideograph
0xedcc   -> U+fa13   -> 0xfae8   CJK compatibility Ideograph
0xedcd   -> U+6968   -> 0xfae9   CJK Unified Ideograph
0xedce   -> U+fa14   -> 0xfaea   CJK compatibility Ideograph
0xedcf   -> U+6998   -> 0xfaeb   CJK Unified Ideograph
0xedd0   -> U+69e2   -> 0xfaec   CJK Unified Ideograph
0xedd1   -> U+6a30   -> 0xfaed   CJK Unified Ideograph
0xedd2   -> U+6a6b   -> 0xfaee   CJK Unified Ideograph
0xedd3   -> U+6a46   -> 0xfaef   CJK Unified Ideograph
0xedd4   -> U+6a73   -> 0xfaf0   CJK Unified Ideograph
0xedd5   -> U+6a7e   -> 0xfaf1   CJK Unified Ideograph
0xedd6   -> U+6ae2   -> 0xfaf2   CJK Unified Ideograph
0xedd7   -> U+6ae4   -> 0xfaf3   CJK Unified Ideograph
0xedd8   -> U+6bd6   -> 0xfaf4   CJK Unified Ideograph
0xedd9   -> U+6c3f   -> 0xfaf5   CJK Unified Ideograph
0xedda   -> U+6c5c   -> 0xfaf6   CJK Unified Ideograph
0xeddb   -> U+6c86   -> 0xfaf7   CJK Unified Ideograph
0xeddc   -> U+6c6f   -> 0xfaf8   CJK Unified Ideograph
0xeddd   -> U+6cda   -> 0xfaf9   CJK Unified Ideograph
0xedde   -> U+6d04   -> 0xfafa   CJK Unified Ideograph
0xeddf   -> U+6d87   -> 0xfafb   CJK Unified Ideograph
0xede0   -> U+6d6f   -> 0xfafc   CJK Unified Ideograph
0xede1   -> U+6d96   -> 0xfb40   CJK Unified Ideograph
0xede2   -> U+6dac   -> 0xfb41   CJK Unified Ideograph
0xede3   -> U+6dcf   -> 0xfb42   CJK Unified Ideograph
0xede4   -> U+6df8   -> 0xfb43   CJK Unified Ideograph
0xede5   -> U+6df2   -> 0xfb44   CJK Unified Ideograph
0xede6   -> U+6dfc   -> 0xfb45   CJK Unified Ideograph
0xede7   -> U+6e39   -> 0xfb46   CJK Unified Ideograph
0xede8   -> U+6e5c   -> 0xfb47   CJK Unified Ideograph
0xede9   -> U+6e27   -> 0xfb48   CJK Unified Ideograph
0xedea   -> U+6e3c   -> 0xfb49   CJK Unified Ideograph
0xedeb   -> U+6ebf   -> 0xfb4a   CJK Unified Ideograph
0xedec   -> U+6f88   -> 0xfb4b   CJK Unified Ideograph
0xeded   -> U+6fb5   -> 0xfb4c   CJK Unified Ideograph
0xedee   -> U+6ff5   -> 0xfb4d   CJK Unified Ideograph
0xedef   -> U+7005   -> 0xfb4e   CJK Unified Ideograph
0xedf0   -> U+7007   -> 0xfb4f   CJK Unified Ideograph
0xedf1   -> U+7028   -> 0xfb50   CJK Unified Ideograph
0xedf2   -> U+7085   -> 0xfb51   CJK Unified Ideograph
0xedf3   -> U+70ab   -> 0xfb52   CJK Unified Ideograph
0xedf4   -> U+710f   -> 0xfb53   CJK Unified Ideograph
0xedf5   -> U+7104   -> 0xfb54   CJK Unified Ideograph
0xedf6   -> U+715c   -> 0xfb55   CJK Unified Ideograph
0xedf7   -> U+7146   -> 0xfb56   CJK Unified Ideograph
0xedf8   -> U+7147   -> 0xfb57   CJK Unified Ideograph
0xedf9   -> U+fa15   -> 0xfb58   CJK compatibility Ideograph
0xedfa   -> U+71c1   -> 0xfb59   CJK Unified Ideograph
0xedfb   -> U+71fe   -> 0xfb5a   CJK Unified Ideograph
0xedfc   -> U+72b1   -> 0xfb5b   CJK Unified Ideograph
0xee40   -> U+72be   -> 0xfb5c   CJK Unified Ideograph
0xee41   -> U+7324   -> 0xfb5d   CJK Unified Ideograph
0xee42   -> U+fa16   -> 0xfb5e   CJK compatibility Ideograph
0xee43   -> U+7377   -> 0xfb5f   CJK Unified Ideograph
0xee44   -> U+73bd   -> 0xfb60   CJK Unified Ideograph
0xee45   -> U+73c9   -> 0xfb61   CJK Unified Ideograph
0xee46   -> U+73d6   -> 0xfb62   CJK Unified Ideograph
0xee47   -> U+73e3   -> 0xfb63   CJK Unified Ideograph
0xee48   -> U+73d2   -> 0xfb64   CJK Unified Ideograph
0xee49   -> U+7407   -> 0xfb65   CJK Unified Ideograph
0xee4a   -> U+73f5   -> 0xfb66   CJK Unified Ideograph
0xee4b   -> U+7426   -> 0xfb67   CJK Unified Ideograph
0xee4c   -> U+742a   -> 0xfb68   CJK Unified Ideograph
0xee4d   -> U+7429   -> 0xfb69   CJK Unified Ideograph
0xee4e   -> U+742e   -> 0xfb6a   CJK Unified Ideograph
0xee4f   -> U+7462   -> 0xfb6b   CJK Unified Ideograph
0xee50   -> U+7489   -> 0xfb6c   CJK Unified Ideograph
0xee51   -> U+749f   -> 0xfb6d   CJK Unified Ideograph
0xee52   -> U+7501   -> 0xfb6e   CJK Unified Ideograph
0xee53   -> U+756f   -> 0xfb6f   CJK Unified Ideograph
0xee54   -> U+7682   -> 0xfb70   CJK Unified Ideograph
0xee55   -> U+769c   -> 0xfb71   CJK Unified Ideograph
0xee56   -> U+769e   -> 0xfb72   CJK Unified Ideograph
0xee57   -> U+769b   -> 0xfb73   CJK Unified Ideograph
0xee58   -> U+76a6   -> 0xfb74   CJK Unified Ideograph
0xee59   -> U+fa17   -> 0xfb75   CJK compatibility Ideograph
0xee5a   -> U+7746   -> 0xfb76   CJK Unified Ideograph
0xee5b   -> U+52af   -> 0xfb77   CJK Unified Ideograph
0xee5c   -> U+7821   -> 0xfb78   CJK Unified Ideograph
0xee5d   -> U+784e   -> 0xfb79   CJK Unified Ideograph
0xee5e   -> U+7864   -> 0xfb7a   CJK Unified Ideograph
0xee5f   -> U+787a   -> 0xfb7b   CJK Unified Ideograph
0xee60   -> U+7930   -> 0xfb7c   CJK Unified Ideograph
0xee61   -> U+fa18   -> 0xfb7d   CJK compatibility Ideograph
0xee62   -> U+fa19   -> 0xfb7e   CJK compatibility Ideograph
0xee63   -> U+fa1a   -> 0xfb80   CJK compatibility Ideograph
0xee64   -> U+7994   -> 0xfb81   CJK Unified Ideograph
0xee65   -> U+fa1b   -> 0xfb82   CJK compatibility Ideograph
0xee66   -> U+799b   -> 0xfb83   CJK Unified Ideograph
0xee67   -> U+7ad1   -> 0xfb84   CJK Unified Ideograph
0xee68   -> U+7ae7   -> 0xfb85   CJK Unified Ideograph
0xee69   -> U+fa1c   -> 0xfb86   CJK compatibility Ideograph
0xee6a   -> U+7aeb   -> 0xfb87   CJK Unified Ideograph
0xee6b   -> U+7b9e   -> 0xfb88   CJK Unified Ideograph
0xee6c   -> U+fa1d   -> 0xfb89   CJK compatibility Ideograph
0xee6d   -> U+7d48   -> 0xfb8a   CJK Unified Ideograph
0xee6e   -> U+7d5c   -> 0xfb8b   CJK Unified Ideograph
0xee6f   -> U+7db7   -> 0xfb8c   CJK Unified Ideograph
0xee70   -> U+7da0   -> 0xfb8d   CJK Unified Ideograph
0xee71   -> U+7dd6   -> 0xfb8e   CJK Unified Ideograph
0xee72   -> U+7e52   -> 0xfb8f   CJK Unified Ideograph
0xee73   -> U+7f47   -> 0xfb90   CJK Unified Ideograph
0xee74   -> U+7fa1   -> 0xfb91   CJK Unified Ideograph
0xee75   -> U+fa1e   -> 0xfb92   CJK compatibility Ideograph
0xee76   -> U+8301   -> 0xfb93   CJK Unified Ideograph
0xee77   -> U+8362   -> 0xfb94   CJK Unified Ideograph
0xee78   -> U+837f   -> 0xfb95   CJK Unified Ideograph
0xee79   -> U+83c7   -> 0xfb96   CJK Unified Ideograph
0xee7a   -> U+83f6   -> 0xfb97   CJK Unified Ideograph
0xee7b   -> U+8448   -> 0xfb98   CJK Unified Ideograph
0xee7c   -> U+84b4   -> 0xfb99   CJK Unified Ideograph
0xee7d   -> U+8553   -> 0xfb9a   CJK Unified Ideograph
0xee7e   -> U+8559   -> 0xfb9b   CJK Unified Ideograph
0xee80   -> U+856b   -> 0xfb9c   CJK Unified Ideograph
0xee81   -> U+fa1f   -> 0xfb9d   CJK compatibility Ideograph
0xee82   -> U+85b0   -> 0xfb9e   CJK Unified Ideograph
0xee83   -> U+fa20   -> 0xfb9f   CJK compatibility Ideograph
0xee84   -> U+fa21   -> 0xfba0   CJK compatibility Ideograph
0xee85   -> U+8807   -> 0xfba1   CJK Unified Ideograph
0xee86   -> U+88f5   -> 0xfba2   CJK Unified Ideograph
0xee87   -> U+8a12   -> 0xfba3   CJK Unified Ideograph
0xee88   -> U+8a37   -> 0xfba4   CJK Unified Ideograph
0xee89   -> U+8a79   -> 0xfba5   CJK Unified Ideograph
0xee8a   -> U+8aa7   -> 0xfba6   CJK Unified Ideograph
0xee8b   -> U+8abe   -> 0xfba7   CJK Unified Ideograph
0xee8c   -> U+8adf   -> 0xfba8   CJK Unified Ideograph
0xee8d   -> U+fa22   -> 0xfba9   CJK compatibility Ideograph
0xee8e   -> U+8af6   -> 0xfbaa   CJK Unified Ideograph
0xee8f   -> U+8b53   -> 0xfbab   CJK Unified Ideograph
0xee90   -> U+8b7f   -> 0xfbac   CJK Unified Ideograph
0xee91   -> U+8cf0   -> 0xfbad   CJK Unified Ideograph
0xee92   -> U+8cf4   -> 0xfbae   CJK Unified Ideograph
0xee93   -> U+8d12   -> 0xfbaf   CJK Unified Ideograph
0xee94   -> U+8d76   -> 0xfbb0   CJK Unified Ideograph
0xee95   -> U+fa23   -> 0xfbb1   CJK compatibility Ideograph
0xee96   -> U+8ecf   -> 0xfbb2   CJK Unified Ideograph
0xee97   -> U+fa24   -> 0xfbb3   CJK compatibility Ideograph
0xee98   -> U+fa25   -> 0xfbb4   CJK compatibility Ideograph
0xee99   -> U+9067   -> 0xfbb5   CJK Unified Ideograph
0xee9a   -> U+90de   -> 0xfbb6   CJK Unified Ideograph
0xee9b   -> U+fa26   -> 0xfbb7   CJK compatibility Ideograph
0xee9c   -> U+9115   -> 0xfbb8   CJK Unified Ideograph
0xee9d   -> U+9127   -> 0xfbb9   CJK Unified Ideograph
0xee9e   -> U+91da   -> 0xfbba   CJK Unified Ideograph
0xee9f   -> U+91d7   -> 0xfbbb   CJK Unified Ideograph
0xeea0   -> U+91de   -> 0xfbbc   CJK Unified Ideograph
0xeea1   -> U+91ed   -> 0xfbbd   CJK Unified Ideograph
0xeea2   -> U+91ee   -> 0xfbbe   CJK Unified Ideograph
0xeea3   -> U+91e4   -> 0xfbbf   CJK Unified Ideograph
0xeea4   -> U+91e5   -> 0xfbc0   CJK Unified Ideograph
0xeea5   -> U+9206   -> 0xfbc1   CJK Unified Ideograph
0xeea6   -> U+9210   -> 0xfbc2   CJK Unified Ideograph
0xeea7   -> U+920a   -> 0xfbc3   CJK Unified Ideograph
0xeea8   -> U+923a   -> 0xfbc4   CJK Unified Ideograph
0xeea9   -> U+9240   -> 0xfbc5   CJK Unified Ideograph
0xeeaa   -> U+923c   -> 0xfbc6   CJK Unified Ideograph
0xeeab   -> U+924e   -> 0xfbc7   CJK Unified Ideograph
0xeeac   -> U+9259   -> 0xfbc8   CJK Unified Ideograph
0xeead   -> U+9251   -> 0xfbc9   CJK Unified Ideograph
0xeeae   -> U+9239   -> 0xfbca   CJK Unified Ideograph
0xeeaf   -> U+9267   -> 0xfbcb   CJK Unified Ideograph
0xeeb0   -> U+92a7   -> 0xfbcc   CJK Unified Ideograph
0xeeb1   -> U+9277   -> 0xfbcd   CJK Unified Ideograph
0xeeb2   -> U+9278   -> 0xfbce   CJK Unified Ideograph
0xeeb3   -> U+92e7   -> 0xfbcf   CJK Unified Ideograph
0xeeb4   -> U+92d7   -> 0xfbd0   CJK Unified Ideograph
0xeeb5   -> U+92d9   -> 0xfbd1   CJK Unified Ideograph
0xeeb6   -> U+92d0   -> 0xfbd2   CJK Unified Ideograph
0xeeb7   -> U+fa27   -> 0xfbd3   CJK compatibility Ideograph
0xeeb8   -> U+92d5   -> 0xfbd4   CJK Unified Ideograph
0xeeb9   -> U+92e0   -> 0xfbd5   CJK Unified Ideograph
0xeeba   -> U+92d3   -> 0xfbd6   CJK Unified Ideograph
0xeebb   -> U+9325   -> 0xfbd7   CJK Unified Ideograph
0xeebc   -> U+9321   -> 0xfbd8   CJK Unified Ideograph
0xeebd   -> U+92fb   -> 0xfbd9   CJK Unified Ideograph
0xeebe   -> U+fa28   -> 0xfbda   CJK compatibility Ideograph
0xeebf   -> U+931e   -> 0xfbdb   CJK Unified Ideograph
0xeec0   -> U+92ff   -> 0xfbdc   CJK Unified Ideograph
0xeec1   -> U+931d   -> 0xfbdd   CJK Unified Ideograph
0xeec2   -> U+9302   -> 0xfbde   CJK Unified Ideograph
0xeec3   -> U+9370   -> 0xfbdf   CJK Unified Ideograph
0xeec4   -> U+9357   -> 0xfbe0   CJK Unified Ideograph
0xeec5   -> U+93a4   -> 0xfbe1   CJK Unified Ideograph
0xeec6   -> U+93c6   -> 0xfbe2   CJK Unified Ideograph
0xeec7   -> U+93de   -> 0xfbe3   CJK Unified Ideograph
0xeec8   -> U+93f8   -> 0xfbe4   CJK Unified Ideograph
0xeec9   -> U+9431   -> 0xfbe5   CJK Unified Ideograph
0xeeca   -> U+9445   -> 0xfbe6   CJK Unified Ideograph
0xeecb   -> U+9448   -> 0xfbe7   CJK Unified Ideograph
0xeecc   -> U+9592   -> 0xfbe8   CJK Unified Ideograph
0xeecd   -> U+f9dc   -> 0xfbe9   CJK compatibility Ideograph
0xeece   -> U+fa29   -> 0xfbea   CJK compatibility Ideograph
0xeecf   -> U+969d   -> 0xfbeb   CJK Unified Ideograph
0xeed0   -> U+96af   -> 0xfbec   CJK Unified Ideograph
0xeed1   -> U+9733   -> 0xfbed   CJK Unified Ideograph
0xeed2   -> U+973b   -> 0xfbee   CJK Unified Ideograph
0xeed3   -> U+9743   -> 0xfbef   CJK Unified Ideograph
0xeed4   -> U+974d   -> 0xfbf0   CJK Unified Ideograph
0xeed5   -> U+974f   -> 0xfbf1   CJK Unified Ideograph
0xeed6   -> U+9751   -> 0xfbf2   CJK Unified Ideograph
0xeed7   -> U+9755   -> 0xfbf3   CJK Unified Ideograph
0xeed8   -> U+9857   -> 0xfbf4   CJK Unified Ideograph
0xeed9   -> U+9865   -> 0xfbf5   CJK Unified Ideograph
0xeeda   -> U+fa2a   -> 0xfbf6   CJK compatibility Ideograph
0xeedb   -> U+fa2b   -> 0xfbf7   CJK compatibility Ideograph
0xeedc   -> U+9927   -> 0xfbf8   CJK Unified Ideograph
0xeedd   -> U+fa2c   -> 0xfbf9   CJK compatibility Ideograph
0xeede   -> U+999e   -> 0xfbfa   CJK Unified Ideograph
0xeedf   -> U+9a4e   -> 0xfbfb   CJK Unified Ideograph
0xeee0   -> U+9ad9   -> 0xfbfc   CJK Unified Ideograph
0xeee1   -> U+9adc   -> 0xfc40   CJK Unified Ideograph
0xeee2   -> U+9b75   -> 0xfc41   CJK Unified Ideograph
0xeee3   -> U+9b72   -> 0xfc42   CJK Unified Ideograph
0xeee4   -> U+9b8f   -> 0xfc43   CJK Unified Ideograph
0xeee5   -> U+9bb1   -> 0xfc44   CJK Unified Ideograph
0xeee6   -> U+9bbb   -> 0xfc45   CJK Unified Ideograph
0xeee7   -> U+9c00   -> 0xfc46   CJK Unified Ideograph
0xeee8   -> U+9d70   -> 0xfc47   CJK Unified Ideograph
0xeee9   -> U+9d6b   -> 0xfc48   CJK Unified Ideograph
0xeeea   -> U+fa2d   -> 0xfc49   CJK compatibility Ideograph
0xeeeb   -> U+9e19   -> 0xfc4a   CJK Unified Ideograph
0xeeec   -> U+9ed1   -> 0xfc4b   CJK Unified Ideograph
0xeeef   -> U+2170   -> 0xfa40   Small Roman Numeral One
0xeef0   -> U+2171   -> 0xfa41   Small Roman Numeral Two
0xeef1   -> U+2172   -> 0xfa42   Small Roman Numeral Three
0xeef2   -> U+2173   -> 0xfa43   Small Roman Numeral Four
0xeef3   -> U+2174   -> 0xfa44   Small Roman Numeral Five
0xeef4   -> U+2175   -> 0xfa45   Small Roman Numeral Six
0xeef5   -> U+2176   -> 0xfa46   Small Roman Numeral Seven
0xeef6   -> U+2177   -> 0xfa47   Small Roman Numeral Eight
0xeef7   -> U+2178   -> 0xfa48   Small Roman Numeral Nine
0xeef8   -> U+2179   -> 0xfa49   Small Roman Numeral Ten
0xeef9   -> U+ffe2   -> 0x81ca   Fullwidth Not Sign
0xeefa   -> U+ffe4   -> 0xfa55   Fullwidth Broken Bar
0xeefb   -> U+ff07   -> 0xfa56   Fullwidth Apostrophe
0xeefc   -> U+ff02   -> 0xfa57   Fullwidth Quotation Mark
0xfa4a   -> U+2160   -> 0x8754   Roman Numeral One
0xfa4b   -> U+2161   -> 0x8755   Roman Numeral Two
0xfa4c   -> U+2162   -> 0x8756   Roman Numeral Three
0xfa4d   -> U+2163   -> 0x8757   Roman Numeral Four
0xfa4e   -> U+2164   -> 0x8758   Roman Numeral Five
0xfa4f   -> U+2165   -> 0x8759   Roman Numeral Six
0xfa50   -> U+2166   -> 0x875a   Roman Numeral Seven
0xfa51   -> U+2167   -> 0x875b   Roman Numeral Eight
0xfa52   -> U+2168   -> 0x875c   Roman Numeral Nine
0xfa53   -> U+2169   -> 0x875d   Roman Numeral Ten
0xfa54   -> U+ffe2   -> 0x81ca   Fullwidth Not Sign
0xfa58   -> U+3231   -> 0x878a   Parenthesized Ideograph Stock
0xfa59   -> U+2116   -> 0x8782   Numero Sign
0xfa5a   -> U+2121   -> 0x8784   Telephone Sign
0xfa5b   -> U+2235   -> 0x81e6   Because"))
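  ;; Every line has a fixed layout: the Unicode code point sits in columns
  ;; 14-17 and the preferred CP932 code in columns 26-29, both in hex.
  (with-input-from-string (s kb170559)
    (loop for line = (read-line s nil)
          while line
          do (let ((ucs (parse-integer line :start 14 :end 18 :radix 16))
                   (cp932 (parse-integer line :start 26 :end 30 :radix 16)))
               ;; Make the encoder map each code point to the preferred code.
               (setf (gethash ucs *ucs-to-cp932-hash*) cp932)))))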
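
The patch relies on the fixed column layout of the quoted table. As a rough illustration of that layout only, here is a standalone sketch that parses a single table line; parse-kb170559-line is a hypothetical name and does not touch Babel's tables:

```lisp
;; Hypothetical helper for illustration; it depends only on the fixed-width
;; layout of the table lines quoted in the patch above.
(defun parse-kb170559-line (line)
  "Return three values parsed from LINE: the original CP932 code,
the Unicode code point, and the preferred CP932 code."
  (flet ((hex (start end)
           (parse-integer line :start start :end end :radix 16)))
    (values (hex 2 6)        ; digits after the leading 0x
            (hex 14 18)      ; digits after U+
            (hex 26 30))))   ; digits after the second 0x
```

For the line that starts with 0xeeef, the three values are #xEEEF, #x2170 and #xFA40, which matches the octets shown in the REPL session before and after loading the patch.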

add X11 compound text?

Hi, I'm wondering whether it would be possible to add encoding/decoding of X11 compound text. There are some examples in the Swank backends.

UTF-8 encoder allows to encode codepoints in range #xD800 - #xDFFF

Such code points are surrogates and do not represent Unicode characters.
This also breaks the round trip for the :utf-8 encoding, since the decoder rejects what the encoder produced:

(babel:string-to-octets (string (code-char #xd800)))
; => #(237 160 128)
(babel:octets-to-string *)
; Evaluation aborted on #<BABEL-ENCODINGS:CHARACTER-OUT-OF-RANGE {10053D9533}>.

SBCL, for example, signals an error in this case:

(sb-ext:string-to-octets (string (code-char #xd800)))
; Evaluation aborted on #<SB-IMPL::OCTETS-ENCODING-ERROR {10013BEA23}>.

This seems to affect some other UTF/UCS encodings as well (such as :utf-16be or :utf-16le).
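
Until the encoders reject these code points, a caller could check strings before encoding them. A minimal sketch in standard Common Lisp; both function names are hypothetical and not part of Babel:

```lisp
;; Hypothetical helpers (not part of Babel) for detecting surrogates
;; before a string is handed to an encoder.
(defun surrogate-code-point-p (code)
  "True if CODE lies in the UTF-16 surrogate range #xD800-#xDFFF."
  (<= #xD800 code #xDFFF))

(defun find-surrogate (string)
  "Return the index of the first surrogate in STRING, or NIL if there is none."
  (position-if #'surrogate-code-point-p string :key #'char-code))
```

A caller might signal its own error when find-surrogate returns an index and call babel:string-to-octets only otherwise. Whether an implementation can create such a character with code-char at all varies.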
