utf8gen-1.1/0000755000175000017500000000000013331374125011421 5ustar paulpaulutf8gen-1.1/README0000644000175000017500000000450613331351244012303 0ustar paulpaulThis is the README file for the utf8gen package. This package contains the program utf8gen, a utility for reading in hexadecimal numbers from an input source, one per line, and printing them as UTF-8 byte sequences. Several options allow various forms of output. Consult the utf8gen(1) man page and the utf8gen Texinfo file for more information. Read the man page with the command man utf8gen following a "make install" step. Read the Texinfo user guide with the command info utf8gen Information about the latest version is in the NEWS file. If you downloaded this source package, instructions for building and installation can be found in the INSTALL file and license information is in the COPYING file. If you are a downstream maintainer porting this package to a new architecture, you can remove all files that Autotools added with the command autoreconf -f -i && ./configure && make orig In all other cases, typing the following command will usually build the software on your system: ./configure && make Then consult the INSTALL file for installation instructions. LICENSES -------- Licenses are contained in the COPYING file. A summary of these licenses appears below. Source Code License: This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . 
Documentation License: Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. You should have received a copy of the GNU Free Documentation License along with this program. If not, see . utf8gen-1.1/Makefile.am0000644000175000017500000000066213331247752013466 0ustar paulpaul## Process this file with automake to produce Makefile.in SUBDIRS = doc man src test # # Add "orig" target to remove all Autotools-added files left over from # # autoreconf && ./configure && make && make distclean # orig: distclean \rm -rf aclocal.m4 autom4te.cache build-aux configure *~ */*~ \ INSTALL Makefile.in doc/Makefile.in man/Makefile.in \ src/Makefile.in test/Makefile.in src/config.h.in doc/utf8gen.info utf8gen-1.1/man/0000755000175000017500000000000013331374125012174 5ustar paulpaulutf8gen-1.1/man/Makefile.am0000664000175000017500000000015113322700070014217 0ustar paulpaul## Process this file with automake to produce Makefile.in man_MANS = utf8gen.1 EXTRA_DIST = $(man_MANS) utf8gen-1.1/man/utf8gen.10000644000175000017500000000553013322455005013636 0ustar paulpaul.TH UTF8GEN 1 "2018 Jun 30" .SH NAME utf8gen \- Generate UTF-8 output from hexadecimal input .SH SYNOPSIS .br \fButf8gen\fP [ [-e \fIformat1\fP] | [-E \fIformat2\fP] ] [-r \fIformatr\fP] [ [-u \fIutf8_format\fP] | -n] [-c] [-s] [-i \fIinput_file\fP] [-o \fIoutput_file\fP] .SH DESCRIPTION .B utf8gen reads a list of hexadecimal ASCII values in the range 0 through 10FFFF, one per line, and prints the UTF-8 encoding of that number as a Unicode code point. .PP Each input line must begin with a hexadecimal number. A string may follow after that, which can be echoed to the output as the "remainder" (see the -r option below). The total input line length, including an ending newline, is limited to 4096 bytes. 
.SH OPTIONS .TP 6 \-c After the UTF-8 codes are printed, print a space followed by the character that the hexadecimal code point represents. .TP \-e Echo the input code point in one format, using the printf(3) format string \fIformat1\fP. .TP \-E Echo the input code point in two formats, using the printf(3) format string \fIformat2\fP. .TP \-n Do \fInot\fP print the UTF-8 byte values. This can be useful if only the printed character itself is desired; see the \-c option. .TP \-r Print the remainder of the input string after the initial hexadecimal digits, using the printf(3) format string \fIformatr\fP. .TP \-s Swap the order of output: print the UTF-8 output portion first, then print the input string portion. This can be useful for generating code containing a UTF-8 encoding followed by a comment that contains the input hexadecimal digits. .TP \-u Print the UTF-8 encoded value of the input hexadecimal number, as numeric codes for each UTF-8 byte, using the printf(3) format string \fIutf8_format\fP. If no string is specified, a default format of a backslash followed by three octal digits is printed for each byte. .SH EXAMPLES .RS .PP utf8gen -e "0x%04X " -u "\\%03o" .PP utf8gen -E "U+%04x = 0%02o = " .PP utf8gen -s -e " /* U+%04X */" -u "\\%03o" .RE .SH FILES Files contain lines that each begin with an ASCII hexadecimal code in the valid Unicode range 0 through 10FFFF, inclusive. This hexadecimal code may optionally be followed by a space followed by an arbitrary string ending with a newline, up to the limit of 4096 bytes per input line. An example line could be the following (with no indent): .PP .RS 41 Letter 'A' .RE .SH "SEE ALSO" For more detailed explanations and examples of common usage, consult the \fButf8gen\fP texinfo manual. .SH AUTHOR .B utf8gen was written by Paul Hardy. .SH LICENSE .B utf8gen is Copyright \(co 2018 Paul Hardy. 
.PP This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. .SH BUGS No known bugs exist. utf8gen-1.1/doc/0000755000175000017500000000000013331374125012166 5ustar paulpaulutf8gen-1.1/doc/utf8gen.texi0000644000175000017500000005271713322454672014463 0ustar paulpaul\input texinfo @c -*-texinfo-*- @c %**start of header @setfilename utf8gen.info @settitle utf8gen @setchapternewpage odd @c %**end of header @macro utf @w{UTF-8} @end macro @paragraphindent none @copying This manual describes @command{utf8gen}, a utility for converting Unicode hexadecimal code points into @utf{} as printable characters for immediate viewing and as byte sequences suitable for including in programs. Copyright @copyright{} 2018 Paul Hardy @quotation Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts and no Back-Cover Texts. @end quotation @end copying @dircategory Text @direntry * utf8gen: (utf8gen). 
A utility for converting hexadecimal numbers into @utf{} @end direntry @titlepage @title utf8gen @author Paul Hardy @page @vskip 0pt plus 1filll @insertcopying @end titlepage @contents @node Top, Introduction, (dir), (dir) @menu * Introduction:: General information * Unicode:: Overview of Unicode and @utf{} * Invoking @command{utf8gen}:: Common Use Cases for using @command{utf8gen} * @command{utf8gen} Reference:: Detailed description of the @command{utf8gen} utility @end menu @node Introduction, Unicode, Top, Top @chapter Introduction This document describes some typical uses for @command{utf8gen}, a utility to read ASCII hexadecimal numbers, interpret them as Unicode code points, and output Unicode Transformation Format -- @w{8-bit} (@utf{}). If you have questions, please email @email{unifoundry@@unifoundry.com}. You can check for the latest @command{utf8gen} news at @code{http://unifoundry.com/utf8gen/}. --- Paul Hardy (@email{unifoundry@@unifoundry.com}) 2018 @node Unicode, Invoking @command{utf8gen}, Introduction, Top @chapter Unicode @menu * Unicode Overview:: * Unicode Planes:: * UTF-8:: @end menu @node Unicode Overview, Unicode Planes, , Unicode @section Unicode Overview Unicode arose out of a practical need for a common encoding to represent all of the world's languages on computers. It has grown rapidly over the past 20+ years to contain more than 100,000 glyphs (characters). These glyphs are divided over multiple Unicode @dfn{planes}: @w{Plane 0} through @w{Plane 16} (decimal), for a total of 17 planes. Each plane contains 64k @dfn{code points}, which in hexadecimal is 10000 code points. @dfn{Code point} is a more general term than @dfn{character}, because Unicode contains more than just visible characters; for example, Unicode contains various code points for indicating variation selection for scripts that have multiple forms of a visible character. 
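The plane arithmetic described above can be sketched in C. This is only an illustration and is not taken from the utf8gen sources; the names unicode_plane, unicode_plane_offset and UNICODE_MAX are hypothetical. Because each plane holds 10000 (hexadecimal) code points, the plane number is simply the code point shifted right by 16 bits.

```c
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu   /* last valid code point (hypothetical name) */

/*
 * Return the plane number (0 through 16) of a code point,
 * or -1 if the value lies outside the Unicode range.
 */
int unicode_plane(uint32_t cp)
{
    if (cp > UNICODE_MAX)
        return -1;
    return (int)(cp >> 16);     /* each plane holds 0x10000 code points */
}

/* Return the offset of a code point within its plane (0 through 0xFFFF). */
uint32_t unicode_plane_offset(uint32_t cp)
{
    return cp & 0xFFFFu;
}
```

With these helpers, U+1D371 falls in Plane 1 at offset D371, and U+10FFF0 falls in Plane 16.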
@node Unicode Planes, UTF-8, Unicode Overview, Unicode @section Unicode Planes @w{Plane 0} contains most of the world's modern scripts. Code points in this range are denoted as Unicode code points U+0000 through U+FFFF, inclusive. This plane is also known as the Basic Multilingual Plane, or BMP. The ASCII code points are in the beginning of the BMP, from U+0000 through U+007F. The BMP is almost entirely allocated --- there are hardly any free code point ranges in the BMP for assigning new scripts. Fortunately, Unicode has 16 more planes beyond @w{Plane 0.} @w{Plane 1} contains many ancient scripts, and modern collections that were not assigned to @w{Plane 0} (for example, emoji). This plane is also known as the Supplementary Multilingual Plane, or SMP. Unicode code points in the SMP are in the range U+10000 through U+1FFFF, inclusive. @w{Plane 2} is the Supplementary Ideographic Plane, or SIP. It contains Chinese and Japanese ideographs that were not included in @w{Plane 0.} Unicode code points in @w{Plane 2} are in the range U+20000 through U+2FFFF, inclusive. These are the main planes with assigned visible characters. @w{Plane 14} is the Supplementary Special-purpose Plane, or SSP. Its code points are in the range U+E0000 through U+EFFFF. This plane contains specialized tags and other designators. Planes 15 and 16 are Private Use Area (PUA) planes. They can contain any user-defined characters and special-purpose codes. These planes span the Unicode range U+F0000 through U+10FFFF. @node UTF-8, , Unicode Planes, Unicode @section UTF-8 Thus the valid Unicode range is U+0000 through U+10FFFF, inclusive. Encoding the entire Unicode range takes from 7 bits for the ASCII range to 21 bits to encode anything in @w{Plane 16} (U+100000 through U+10FFFF). A problem with transmitting these multi-byte numbers is that different computer architectures order bytes in a multi-byte word differently. 
Today there are only two common orderings: big-endian, where the most significant byte is stored first, and little-endian, where the least significant byte is stored first. When transmitting information between computers of different architectures, a standard protocol had to be defined. The Unicode encoding that modern web browsers use is called Unicode Transformation Format -- @w{8-bit,} or @utf{}. It has also become the standard encoding for text documents that contain non-ASCII characters. @utf{} encoding has several desirable characteristics, which are described briefly below. The first byte in a @utf{} encoded character begins with a series of @samp{1}@tie{}bits, to indicate how many bytes the character requires, except a one-byte character starts with a @samp{0}@tie{}bit to designate the byte as ASCII. The ASCII range, U+0000 through U+007F, is encoded the same in @utf{}, as just one byte. Each byte after the first in a multi-byte character begins with the bits@tie{}@samp{10}. The number of bytes in a @utf{} encoded Unicode code point varies from one to four bytes. Thus it is an efficient encoding compared to one that would transmit the same number of bytes for every character across the entire Unicode range. No single-byte @utf{} character will ever begin with the pattern @samp{10}, as single-byte @utf{} characters always begin with a @samp{0}@tie{}bit. So string searching functions can skip bytes within a @utf{} byte string and if a byte currently being examined begins with the bits@tie{}@samp{10}, the search function knows it is past the beginning of a multi-byte character. Unicode code points are published in @dfn{code charts}, available at @url{http://unicode.org/}. These code charts number code points using hexadecimal. These hexadecimal numbers must be converted to @utf{} for transmission on web pages, storing in a text document, etc. Hence the creation of @command{utf8gen}. 
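The byte layout described in this section can be sketched as a small C function. This is a minimal illustration of the standard UTF-8 bit patterns, not the code used inside utf8gen; the function name utf8_encode is hypothetical, and rejecting the surrogate range D800 through DFFF is an assumption of this sketch that utf8gen itself may or may not share.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one code point as UTF-8 into out[], which must hold 4 bytes.
 * Returns the number of bytes written (1 through 4), or 0 if the value
 * is above 10FFFF or is a surrogate (assumption: surrogates rejected).
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

Each continuation byte carries six payload bits after its leading 10 bits, which is why a three-byte sequence can hold code points up to FFFF and a four-byte sequence reaches 10FFFF.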
@node Invoking @command{utf8gen}, @command{utf8gen} Reference, Unicode, Top @chapter Invoking @command{utf8gen} @menu * Motivation:: * Printing a Character:: * Code Generation:: * Use Case Summary:: @end menu @node Motivation, Printing a Character, , Invoking @command{utf8gen} @section Motivation This chapter provides examples of typical uses for @command{utf8gen} for programmers and end-users. I needed to generate hundreds of lines of source code containing different @utf{} characters for a set of programs. My searches did not find anything that performed the conversion as I wanted, so I wrote @command{utf8gen}. With the Unicode Standard specifying code point assignments in hexadecimal, it was natural to write software that took a hexadecimal number as input. There are numerous potential forms of output, especially considering the formatting syntax of different programming languages. The purpose of most of the options for @command{utf8gen} is to select various output options. @command{utf8gen} reads in hexadecimal numbers, one per input line. Each number can be followed by a space and a miscellaneous string to the end of the line. That @dfn{remainder} string can optionally be printed on output; more on that later. @node Printing a Character, Code Generation, Motivation, Invoking @command{utf8gen} @section Printing a Character The simplest thing an end-user might want to know is whether their computer has a font that supports a certain Unicode character. The easiest way to use @command{utf8gen} is interactively at a terminal, typing in hexadecimal numbers and looking at the character produced. To do this, run the command @example utf8gen -c -n @end example The @option{-c} option tells @command{utf8gen} to print the input hexadecimal number as a Unicode character on the screen. The @option{-n} option tells @command{utf8gen} to @emph{not} print the @utf{} byte sequence as a set of formatted numbers. 
Just enter one hexadecimal number in the range 0 through 10FFFF, inclusive, one number per line. When finished running @command{utf8gen} interactively in this way, end your input by typing @key{C-d}. @node Code Generation, Use Case Summary, Printing a Character, Invoking @command{utf8gen} @section Code Generation @menu * The Usefulness of Octal:: * Commenting Code:: * Remainder Strings:: * UTF-8 Output Format:: * Input and Output Files:: @end menu @node The Usefulness of Octal, Commenting Code, , Code Generation @subsection The Usefulness of Octal If converting hexadecimal numbers into a form that a programming language accepts, there are many possibilities. For this reason, @command{utf8gen} accepts format strings in the style of the C @code{printf} function. This was a natural choice, as @command{utf8gen} is written @w{in C.} With eight bits in a byte, and @utf{} encoded characters starting either with a @samp{0}@tie{}bit for ASCII or with @samp{10} for all but the first byte in a multi-byte sequence, it is convenient to look at Unicode code point numbers encoded as octal. If a byte in a @utf{} byte string begins with @samp{10}, this leaves six bits for the remainder of the byte. This is conveniently viewed as two octal digits. The default output of @command{utf8gen} is simply the sequence of octal digits in a @utf{} character, printed in the C style of a backslash followed by three octal digits per byte. This is handy for a quick copy and paste of a single @utf{} byte sequence into a program. If using the C-style backslashed octal number format, it can be reassuring to see what a Unicode code point is in octal (at least it was for me, when I first wrote the program and was verifying its proper operation). A simple way of doing this is to have @command{utf8gen} echo the input hexadecimal number you typed in as octal, and then print the @utf{} representation. 
To do this, run a command of the form @example utf8gen -e "%03o = " @end example For example, if you enter the hexadecimal number @kbd{2134} (the Unicode code point for @samp{Script Small Letter O} in the @samp{Letterlike Symbols} block), @command{utf8gen} will generate this output: @example 20464 = \342\204\264 @end example The hexadecimal number 2134 is 20464 in octal. Notice how two octal digits from the Unicode code point appear in each @utf{} byte except for the first byte. The leading octal digit of @samp{2} represents the leading two bits @samp{10} in a @utf{} multi-byte sequence. The first byte in a multi-byte @utf{} sequence starts with a string of @samp{1}@tie{}bits indicating how many bytes long the encoded character is. In this case, the @utf{} representation of U+2134 will take three bytes, so the first byte begins with the bit string @samp{1110}. That corresponds to the first two octal digits (@samp{34}) of the first byte in the sequence,@tie{}@samp{\342}. Looking at the sequence @samp{\342\204\264} again, it is easy to see the placement of the octal representation of this Unicode code point, 20464. In this way, verifying the proper conversion of the hexadecimal Unicode code point to @utf{} is straightforward. @node Commenting Code, Remainder Strings, The Usefulness of Octal, Code Generation @subsection Commenting Code Commenting code is of course useful, especially when dealing with something as arcane as raw @utf{} byte sequences. @command{utf8gen} provides various ways of doing this. A couple of examples should suffice to give you an idea of these capabilities. The simplest method for creating comments might be to follow an octal sequence with the Unicode code point in its canonical form. The @option{-e} option @emph{echoes} the input number to the output using the format string that follows. 
This will accomplish that: @example utf8gen -e "/* U+%04X */ " @end example For the hexadecimal input number 2134, this produces the output @example /* U+2134 */ \342\204\264 @end example The expectation is that a programmer will be able to use an editor that can take a string like @samp{\342\204\264} and easily convert it into a @dfn{print}-style command in the programming language of choice. It might be preferable to print the comment after the @utf{} byte sequence. The @option{-s} option allows this by @dfn{swapping} the default output string order. For example, the command @example utf8gen -e " /* U+%04X */" -s @end example produces the output (again, using 2134 as the input number) of @example \342\204\264 /* U+2134 */ @end example It might even be useful to output the initial hexadecimal number using two different bases. This is accomplished with the @option{-E} option, followed by the format string for echoing the input number in two ways. For example, the command @example utf8gen -E " /* U+%04X = 0%o */" -s @end example produces the output (with an input of @samp{2134}) @example \342\204\264 /* U+2134 = 020464 */ @end example @node Remainder Strings, UTF-8 Output Format, Commenting Code, Code Generation @subsection Remainder Strings One can only glean so much by looking at numbers though. A textual comment describing a Unicode code point can also help. @command{utf8gen} supports printing free-form text following an initial hexadecimal number followed by a space. This is done with the @option{-r} option, to print the @dfn{remainder} of the input line, using the format string that follows this option. The Unicode Consortium makes various data files available with a free use license. The first field is usually the Unicode code point in hexadecimal. Remaining fields will contain information about each code point. 
For example, given the following line of input: @example 2134 SCRIPT SMALL O @end example This command @example utf8gen -e " /* U+%04X " -s -r "%s */" @end example will produce this output: @example \342\204\264 /* U+2134 SCRIPT SMALL O */ @end example This can facilitate batch processing of large portions of a Unicode data file. @node UTF-8 Output Format, Input and Output Files, Remainder Strings, Code Generation @subsection UTF-8 Output Format @command{utf8gen} also allows specifying the format of the encoded @utf{} bytes with the @option{-u} option followed by a format string. For example, suppose the programming language you use will accept bytes in hexadecimal using the form @code{\x} followed by a hexadecimal number. If we take the previous example input line and provide it to @command{utf8gen} with the command @example utf8gen -u "\x%02x" -r " /* %s */" -s @end example this will produce the output @example \xe2\x84\xb4 /* SCRIPT SMALL O */ @end example If the @option{-r} option is selected but there is nothing after the hexadecimal number on an input line, no remainder content will be printed. Of course, you could also use the @option{-e} or @option{-E} options to echo back the input number in the desired output format(s) by adding it to the command line. @node Input and Output Files, , UTF-8 Output Format, Code Generation @subsection Input and Output Files This information can be extracted and provided as an input file to @command{utf8gen} using the @option{-i} option to specify an input file. Output can be written to a file using the @option{-o} option. @node Use Case Summary, , Code Generation, Invoking @command{utf8gen} @section Use Case Summary The descriptions in this chapter give a brief overview of all of the @command{utf8gen} options and how they might be used in practice. 
@command{utf8gen} tries to strike a balance between the basics that a programmer might find useful for bulk conversion of a large number of hexadecimal Unicode code points versus creeping featurism. While @command{utf8gen} won't write your program for you, it can make the bulk conversion of code points efficient. @node @command{utf8gen} Reference, , Invoking @command{utf8gen}, Top @chapter @command{utf8gen} Reference @comment TROFF INPUT: .TH UTF8GEN 1 "2018 Jun 30" @c @node utf8gen, , , @command{utf8gen} Reference @c @section utf8gen @c DEBUG: print_menu("@section") @menu * NAME:: * SYNOPSIS:: * DESCRIPTION:: * OPTIONS:: * EXAMPLES:: * FILES:: * AUTHOR:: * LICENSE:: * BUGS:: @end menu @comment TROFF INPUT: .SH NAME @node NAME, SYNOPSIS, , @command{utf8gen} Reference @section NAME @c DEBUG: print_menu("utf8gen NAME") utf8gen @minus{} Generate UTF-8 output from hexadecimal input @comment TROFF INPUT: .SH SYNOPSIS @node SYNOPSIS, DESCRIPTION, NAME, @command{utf8gen} Reference @section SYNOPSIS @c DEBUG: print_menu("utf8gen SYNOPSIS") @comment TROFF INPUT: .br @comment .br @b{utf8gen} [ [-e @i{format1}] | [-E @i{format2}] ] [-r @i{formatr}] [ [-u @i{utf8@t{_}format}] | -n] [-c] [-s] [-i @i{input@t{_}file}] [-o @i{output@t{_}file}] @comment TROFF INPUT: .SH DESCRIPTION @node DESCRIPTION, OPTIONS, SYNOPSIS, @command{utf8gen} Reference @section DESCRIPTION @c DEBUG: print_menu("utf8gen DESCRIPTION") @comment TROFF INPUT: .B utf8gen @b{utf8gen} reads a list of hexadecimal ASCII values in the range 0 through 10FFFF, one per line, and prints the UTF-8 encoding of that number as a Unicode code point. @comment TROFF INPUT: .PP Each input line must begin with a hexadecimal number. A string may follow after that, which can be echoed to the output as the "remainder" (see the @option{-r} option below). The total input line length, including an ending newline, is limited to 4096 bytes. 
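The input line format described above (a leading hexadecimal number, optionally followed by a space and a remainder string) could be parsed along the following lines. This is a hedged sketch built on the standard C function strtoul; it is not the parser used in utf8gen, and the name parse_line is hypothetical. Note that strtoul is more lenient than the description requires: it also accepts leading white space and an optional 0x prefix.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse one input line: a hexadecimal code point, then an optional
 * remainder string.  On success, store the code point in *cp, point
 * *remainder at the text after the number (skipping one separating
 * space), and return 0.  Return -1 if the line has no leading
 * hexadecimal number or the number exceeds 10FFFF.
 */
int parse_line(const char *line, unsigned long *cp, const char **remainder)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(line, &end, 16);       /* base 16: hexadecimal input */
    if (end == line || errno == ERANGE || value > 0x10FFFFul)
        return -1;

    if (*end == ' ')                       /* skip the single separator */
        end++;

    *cp = value;
    *remainder = end;                      /* may still end with a newline */
    return 0;
}
```

A caller would typically read each line with fgets into a 4096-byte buffer, matching the documented line length limit, and strip the trailing newline from the remainder before printing it.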
@comment TROFF INPUT: .SH OPTIONS @node OPTIONS, EXAMPLES, DESCRIPTION, @command{utf8gen} Reference @section OPTIONS @c DEBUG: print_menu("utf8gen OPTIONS") @comment TROFF INPUT: .TP 6 @c --------------------------------------------------------------------- @table @code @item @option{-c} After the UTF-8 codes are printed, print a space followed by the character that the hexadecimal code point represents. @comment TROFF INPUT: .TP @item @option{-e} Echo the input code point in one format, using the printf(3) format string @i{format1}. @comment TROFF INPUT: .TP @item @option{-E} Echo the input code point in two formats, using the printf(3) format string @i{format2}. @comment TROFF INPUT: .TP @item @option{-n} Do @i{not} print the UTF-8 byte values. This can be useful if only the printed character itself is desired; see the @option{-c} option. @comment TROFF INPUT: .TP @item @option{-r} Print the remainder of the input string after the initial hexadecimal digits, using the printf(3) format string @i{formatr}. @comment TROFF INPUT: .TP @item @option{-s} Swap the order of output: print the UTF-8 output portion first, then print the input string portion. This can be useful for generating code containing a UTF-8 encoding followed by a comment that contains the input hexadecimal digits. @comment TROFF INPUT: .SH EXAMPLES @item @option{-u} Print the UTF-8 encoded value of the input hexadecimal number, as numeric codes for each UTF-8 byte, using the printf(3) format string @i{utf8@t{_}format}. If no string is specified, a default format of a backslash followed by three octal digits is printed for each byte. 
@comment TROFF INPUT: .TP @end table @c --------------------------------------------------------------------- @node EXAMPLES, FILES, OPTIONS, @command{utf8gen} Reference @section EXAMPLES @c DEBUG: print_menu("utf8gen EXAMPLES") @comment TROFF INPUT: .RS @c --------------------------------------------------------------------- @quotation @comment TROFF INPUT: .PP @code{utf8gen -e "0x%04X " -u "\\%03o"} @comment TROFF INPUT: .PP @code{utf8gen -E "U+%04x = 0%02o = "} @comment TROFF INPUT: .PP @code{utf8gen -s -e " /* U+%04X */" -u "\\%03o"} @comment TROFF INPUT: .RE @end quotation @c --------------------------------------------------------------------- @comment TROFF INPUT: .SH FILES @node FILES, AUTHOR, EXAMPLES, @command{utf8gen} Reference @section FILES @c DEBUG: print_menu("utf8gen FILES") Files contain lines that each begin with an ASCII hexadecimal code in the valid Unicode range 0 through 10FFFF, inclusive. This hexadecimal code may optionally be followed by a space followed by an arbitrary string ending with a newline, up to the limit of 4096 bytes per input line. An example line could be the following (with no indent): @comment TROFF INPUT: .PP @comment TROFF INPUT: .RS @c --------------------------------------------------------------------- @quotation 41 Letter 'A' @comment TROFF INPUT: .RE @end quotation @c --------------------------------------------------------------------- @comment TROFF INPUT: .SH AUTHOR @node AUTHOR, LICENSE, FILES, @command{utf8gen} Reference @section AUTHOR @c DEBUG: print_menu("utf8gen AUTHOR") @comment TROFF INPUT: .B utf8gen @b{utf8gen} was written by Paul Hardy. @comment TROFF INPUT: .SH LICENSE @node LICENSE, BUGS, AUTHOR, @command{utf8gen} Reference @section LICENSE @c DEBUG: print_menu("utf8gen LICENSE") @comment TROFF INPUT: .B utf8gen @b{utf8gen} is Copyright @copyright{} 2018 Paul Hardy. 
@comment TROFF INPUT: .PP This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. @comment TROFF INPUT: .SH BUGS @node BUGS, , LICENSE, @command{utf8gen} Reference @section BUGS @c DEBUG: print_menu("utf8gen BUGS") No known bugs exist. @bye utf8gen-1.1/doc/Makefile.am0000644000175000017500000000013113322700035014206 0ustar paulpaul## Process this file with automake to produce Makefile.in info_TEXINFOS = utf8gen.texi utf8gen-1.1/configure.ac0000644000175000017500000000060113331224301013672 0ustar paulpaulAC_INIT([utf8gen], [1.1], [unifoundry@unifoundry.com], [utf8gen], [http://www.unifoundry.com/utf8gen/]) AC_PREREQ([2.68]) AC_CONFIG_SRCDIR([src/utf8gen.c]) AC_CONFIG_AUX_DIR([build-aux]) AM_INIT_AUTOMAKE([1.11 subdir-objects -Wall -Werror]) AC_CONFIG_HEADERS([src/config.h]) AC_CONFIG_FILES([Makefile doc/Makefile man/Makefile src/Makefile test/Makefile]) AC_PROG_CC AC_OUTPUT utf8gen-1.1/test/0000755000175000017500000000000013331374125012400 5ustar paulpaulutf8gen-1.1/test/sample2-out.txt0000644000175000017500000000106013316717456015321 0ustar paulpaul\056 . 
-- Full Stop (Period) \101 A -- Latin Letter Capital 'A' \172 z -- Latin Letter Small 'z' \316\221 Α -- Greek Letter Capital Alpha \317\211 ω -- Greek Letter Small Omega \320\251 Щ -- Cyrillic Capital Letter Shcha \340\244\204 ऄ -- Devanagari Letter Short A \341\232\240 ᚠ -- Runic Letter Fehu Feoh Fe F \342\235\244 ❤ -- Heavy Black Heart \346\227\245 日 -- CJK Ideographs Sun \360\220\202\205 𐂅 -- Linear B Ideogram B105M Stallion \360\235\215\261 𝍱 -- Counting Rod Tens Digit Nine \364\217\277\260 􏿰 -- Unicode Code Point U+10FFF0 utf8gen-1.1/test/Makefile.am0000644000175000017500000000050613322200736014431 0ustar paulpaulcheck_SCRIPTS=test1 test2 test3 TESTS=$(check_SCRIPTS) EXTRA_DIST=README-test $(check_SCRIPTS) test-all sample-in.txt \ sample1-out.txt sample2-out.txt sample3-out.txt AM_TESTS_ENVIRONMENT = utf8gen_path='$(abs_top_builddir)/src' ; \ export utf8gen_path ; installcheck-local: make utf8gen_bindir=${DESTDIR}${bindir} check utf8gen-1.1/test/test20000755000175000017500000000201713322177252013371 0ustar paulpaul#!/bin/sh set -e # # Create temporary directory for test # output if AUTOPKGTEST_TMP is undefined. # Debian GNU/Linux defines AUTOPKGTEST_TMP. # if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then TEST_TMP=$(mktemp -d) trap "\rm -rf ${TEST_TMP}" 0 INT QUIT ABRT PIPE TERM else TEST_TMP=${AUTOPKGTEST_TMP} fi # # Point to the source directory for test. # if [ "x${srcdir}" = "x" ] ; then srcdir=. fi # # Point to binary executable; utf8gen_bindir # should be defined for "make installcheck". # Otherwise, leave undefined for "make check". # if [ "x${utf8gen_bindir}" = "x" ] ; then utf8gen_bindir=../src fi ${utf8gen_bindir}/utf8gen -s -c -r " -- %s" \ < ${srcdir}/sample-in.txt \ > ${TEST_TMP}/test2-out.txt diff ${srcdir}/sample2-out.txt ${TEST_TMP}/test2-out.txt || \ (echo "test2 FAILED; output in ${TEST_TMP}/test2-out.txt" ; exit 1) # # If AUTOPKGTEST_TMP was defined, don't remove it; # a Debian calling process will take care of that. 
# if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then \rm -rf ${TEST_TMP} fi utf8gen-1.1/test/sample1-out.txt0000644000175000017500000000106013316051456015307 0ustar paulpaul\056 . -- Full Stop (Period) \101 A -- Latin Letter Capital 'A' \172 z -- Latin Letter Small 'z' \316\221 Α -- Greek Letter Capital Alpha \317\211 ω -- Greek Letter Small Omega \320\251 Щ -- Cyrillic Capital Letter Shcha \340\244\204 ऄ -- Devanagari Letter Short A \341\232\240 ᚠ -- Runic Letter Fehu Feoh Fe F \342\235\244 ❤ -- Heavy Black Heart \346\227\245 日 -- CJK Ideographs Sun \360\220\202\205 𐂅 -- Linear B Ideogram B105M Stallion \360\235\215\261 𝍱 -- Counting Rod Tens Digit Nine \364\217\277\260 􏿰 -- Unicode Code Point U+10FFF0 utf8gen-1.1/test/test30000755000175000017500000000201713322177260013371 0ustar paulpaul#!/bin/sh set -e # # Create temporary directory for test # output if AUTOPKGTEST_TMP is undefined. # Debian GNU/Linux defines AUTOPKGTEST_TMP. # if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then TEST_TMP=$(mktemp -d) trap "\rm -rf ${TEST_TMP}" 0 INT QUIT ABRT PIPE TERM else TEST_TMP=${AUTOPKGTEST_TMP} fi # # Point to the source directory for test. # if [ "x${srcdir}" = "x" ] ; then srcdir=. fi # # Point to binary executable; utf8gen_bindir # should be defined for "make installcheck". # Otherwise, leave undefined for "make check". # if [ "x${utf8gen_bindir}" = "x" ] ; then utf8gen_bindir=../src fi ${utf8gen_bindir}/utf8gen -s -c -r " -- %s" \ < ${srcdir}/sample-in.txt \ > ${TEST_TMP}/test3-out.txt diff ${srcdir}/sample3-out.txt ${TEST_TMP}/test3-out.txt || \ (echo "test3 FAILED; output in ${TEST_TMP}/test3-out.txt" ; exit 1) # # If AUTOPKGTEST_TMP was defined, don't remove it; # a Debian calling process will take care of that. # if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then \rm -rf ${TEST_TMP} fi utf8gen-1.1/test/sample3-out.txt0000644000175000017500000000106013316056516015313 0ustar paulpaul\056 . 
-- Full Stop (Period) \101 A -- Latin Letter Capital 'A' \172 z -- Latin Letter Small 'z' \316\221 Α -- Greek Letter Capital Alpha \317\211 ω -- Greek Letter Small Omega \320\251 Щ -- Cyrillic Capital Letter Shcha \340\244\204 ऄ -- Devanagari Letter Short A \341\232\240 ᚠ -- Runic Letter Fehu Feoh Fe F \342\235\244 ❤ -- Heavy Black Heart \346\227\245 日 -- CJK Ideographs Sun \360\220\202\205 𐂅 -- Linear B Ideogram B105M Stallion \360\235\215\261 𝍱 -- Counting Rod Tens Digit Nine \364\217\277\260 􏿰 -- Unicode Code Point U+10FFF0 utf8gen-1.1/test/test10000755000175000017500000000201713322177245013372 0ustar paulpaul#!/bin/sh set -e # # Create temporary directory for test # output if AUTOPKGTEST_TMP is undefined. # Debian GNU/Linux defines AUTOPKGTEST_TMP. # if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then TEST_TMP=$(mktemp -d) trap "\rm -rf ${AUTOPKGTEST_TMP}" 0 INT QUIT ABRT PIPE TERM else TEST_TMP=${AUTOPKGTEST_TMP} fi # # Point to the source directory for test. # if [ "x${srcdir}" = "x" ] ; then srcdir=. fi # # Point to binary executable; utf8gen_bindir # should be defined for "make installcheck". # Otherwise, leave undefined for "make check". # if [ "x${utf8gen_bindir}" = "x" ] ; then utf8gen_bindir=../src fi ${utf8gen_bindir}/utf8gen -s -c -r " -- %s" \ < ${srcdir}/sample-in.txt \ > ${TEST_TMP}/test1-out.txt diff ${srcdir}/sample1-out.txt ${TEST_TMP}/test1-out.txt || \ (echo "test1 FAILED; output in ${TEST_TMP}/test1-out.txt" ; exit 1) # # If AUTOPKGTEST_TMP was defined, don't remove it; # a Debian calling process will take care of that. 
#
if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then
   \rm -rf ${TEST_TMP}
fi
utf8gen-1.1/test/sample-in.txt0000644000175000017500000000061213316056402015023 0ustar paulpaul
002E Full Stop (Period)
41 Latin Letter Capital 'A'
7a Latin Letter Small 'z'
0391 Greek Letter Capital Alpha
03c9 Greek Letter Small Omega
429 Cyrillic Capital Letter Shcha
0904 Devanagari Letter Short A
16A0 Runic Letter Fehu Feoh Fe F
2764 Heavy Black Heart
65E5 CJK Ideographs Sun
10085 Linear B Ideogram B105M Stallion
1d371 Counting Rod Tens Digit Nine
10FFF0 Unicode Code Point U+10FFF0
utf8gen-1.1/test/test-all0000755000175000017500000000027213316720203014047 0ustar paulpaul
#!/bin/sh

echo "*** Running Tests..."
./test1 || exit 1
echo "Test 1 PASSED"
./test2 || exit 1
echo "Test 2 PASSED"
./test3 || exit 1
echo "Test 3 PASSED"
echo "*** Finished Tests"
utf8gen-1.1/test/README-test0000644000175000017500000000211213322200663014224 0ustar paulpaul
The shell script "test-all" can be run from this directory
after ../src/utf8gen has been built.

Instead of running "test-all" in this directory, it is better
to use Autotools to build the Makefiles, and then use the
Makefiles to test utf8gen.  To do this, perform these steps:

     1) cd ..  [so you are in the top-level directory]

     2) type the following commands:

          ./configure
          make
          make check

This will provide diagnostic output for any failed test.

If all goes well, the output from "make check" should contain
a series of lines like this:

     PASS: test1
     PASS: test2
     PASS: test3
     ==============================================...
     Testsuite summary for utf8gen <version>
     ==============================================...
     # TOTAL: 3
     # PASS:  3
     # SKIP:  0
     # XFAIL: 0
     # FAIL:  0
     # XPASS: 0
     # ERROR: 0
     ==============================================...

where <version> is the current version number of utf8gen.

If you see that the three tests passed, everything tested
correctly.
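The expected rows in the sample output files can also be spot-checked
by hand.  The following is a minimal sketch in C, not the code that
utf8gen itself uses: the function names encode_utf8 and format_octal
are hypothetical and exist only for this illustration.  It encodes one
code point as UTF-8 and renders the bytes as backslash-octal escapes,
the form shown in the sample output.  Like the tests, it assumes code
points no larger than 10FFFF and does not treat surrogates specially.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of utf8gen: encode one code point as
   UTF-8.  Returns the number of bytes stored in out[], or 0 if the
   code point is above 0x10FFFF. */
int
encode_utf8 (uint32_t cp, unsigned char out[4])
{
   if (cp < 0x80) {                    /* U+0000..U+007F: 1 byte */
      out[0] = (unsigned char) cp;
      return 1;
   }
   if (cp < 0x800) {                   /* U+0080..U+07FF: 2 bytes */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
   }
   if (cp < 0x10000) {                 /* U+0800..U+FFFF: 3 bytes */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
   }
   if (cp <= 0x10FFFF) {               /* U+10000..U+10FFFF: 4 bytes */
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
   }
   return 0;                           /* out of Unicode range */
}

/* Hypothetical helper: write the UTF-8 bytes of cp into buf as "\ooo"
   octal escapes.  A buffer of 17 bytes holds the longest result
   (four escapes of four characters each, plus the terminating null).
   Returns the UTF-8 byte count, or 0 if cp is out of range. */
int
format_octal (uint32_t cp, char *buf, size_t buflen)
{
   unsigned char bytes[4];
   int n = encode_utf8 (cp, bytes);
   size_t used = 0;
   int i;

   if (buflen > 0)
      buf[0] = '\0';
   for (i = 0; i < n && used < buflen; i++)
      used += (size_t) snprintf (buf + used, buflen - used,
                                 "\\%03o", (unsigned) bytes[i]);
   return n;
}
```

For example, format_octal (0x0391, buf, sizeof buf) leaves \316\221 in
buf, which agrees with the Greek Letter Capital Alpha row of the
sample output.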
utf8gen-1.1/src/0000755000175000017500000000000013331374125012210 5ustar paulpaul
utf8gen-1.1/src/Makefile.am0000644000175000017500000000015613322700111014232 0ustar paulpaul
## Process this file with automake to produce Makefile.in

bin_PROGRAMS = utf8gen
utf8gen_SOURCES = utf8gen.c
utf8gen-1.1/src/utf8gen.c0000644000175000017500000004652013321312236013735 0ustar paulpaul
/*
   utf8gen - convert hexadecimal input to UTF-8 numbers

   Author: Paul Hardy

   Date: June 2018

   Synopsis: utf8gen [ [-e <format1>] | [-E <format2>] ] [-r <formatr>]
                     [ [-u <utf8_format>] | -n] [-c] [-s]
                     [-i <input_file>] [-o <output_file>]

   Author: Paul Hardy, unifoundry unifoundry.com, June 2018

   Copyright (C) 2018 Paul Hardy

   LICENSE:

      This program is free software: you can redistribute it and/or modify
      it under the terms of the GNU General Public License as published by
      the Free Software Foundation, either version 2 of the License, or
      (at your option) any later version.

      This program is distributed in the hope that it will be useful,
      but WITHOUT ANY WARRANTY; without even the implied warranty of
      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      GNU General Public License for more details.

      You should have received a copy of the GNU General Public License
      along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <config.h>  /* created by autotools */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#include <unistd.h>

/* To check functionality for compiling on GNU-based systems */
#define _GNU_SOURCE

/* These two defines are for diagnostic & help output */
#ifdef PACKAGE_NAME
#define PROG_NAME PACKAGE_NAME
#else
#define PROG_NAME "utf8gen"
#endif

#ifdef PACKAGE_VERSION
#define PROG_VERSION PACKAGE_VERSION
#else
#define PROG_VERSION "1.0"
#endif

#define MAXSTRING 4098 /* maximum number of characters on an input line */

/* For handling errors in system functions */
extern int errno;


int
main (int argc, char *argv[])
{
   int i;                    /* loop variable */
   int in_formats=0;         /* number of times to print input number */
   uint32_t codept;          /* Unicode code point to convert */
   char instring[MAXSTRING]; /* input line */
   unsigned utf8_bytes[5];   /* encoded UTF-8 bytes, ending with null byte */
   int print_remainder=0;    /* =1 to print input string following code point */
   int print_char=0;         /* =1 to end output line with +UTF-8 character */
   int swap_order=0;         /* =1 to print UTF-8 first, then input format(s) */
   int print_codes=1;        /* print UTF-8 encoding; don't print if == 0 */
   int exit_status;          /* program exit status */

   /*
      Format strings for printing input number and output UTF-8.

      By default, do not print the input code point, but print the
      output UTF-8 character using the default_out format string.
*/ static char *default_out = "\\%03o"; /* default output UTF-8 format */ char *in_format=""; /* format to print input number */ char *rem_format=""; /* format for input remainder */ char *out_format = default_out; /* format to print output number */ void fatal_error (int, char *); void print_help (); int cvt2utf8 (uint32_t, unsigned *); void fprint_utf8 (FILE *, unsigned *, char *); void print_instring (int, int, char *, int, char *, char *, FILE *); void print_outstring (int, int, unsigned *, char *, FILE *); int interactive=1; /* =1 if reading from terminal, 0 otherwise */ FILE *infp = stdin; /* input file pointer; default is stdin */ FILE *outfp = stdout; /* output file pointer; default is stdout */ exit_status = EXIT_SUCCESS; interactive = isatty (fileno (stdin)) ? 1 : 0; for (i = 1; i < argc; i++) { /* Parse options. If an invalid command line argument was given, print a help menu and exit with error status. */ if (argv[i][0] == '-' && exit_status == EXIT_SUCCESS) { switch (argv[i][1]) { /* Echo input number one way before printing conversion (-e) */ case 'e': if (++i < argc) { in_formats = 1; in_format = argv[i]; } else { fatal_error (interactive, "Missing echo format string after -e"); } break; /* Echo input number two ways before printing conversion (-E) */ case 'E': if (++i < argc) { in_formats = 2; in_format = argv[i]; } else { fatal_error (interactive, "Missing echo format string pair after -E"); } break; /* print remaining string that followed the hexadecimal Unicode code point (-r) */ case 'r': if (++i < argc) { print_remainder = 1; rem_format = argv[i]; } else { fatal_error (interactive, "Missing remainder of string format after -r"); } break; /* UTF-8 output format for each encoded byte (-u) */ case 'u': if (++i < argc) { out_format = argv[i]; } else { fatal_error (interactive, "Missing format string after '-u'"); } break; /* do not print the UTF-8 byte codes (-n) */ case 'n': print_codes = 0; break; /* end line by printing + UTF-8 character (-c) */ 
            case 'c':
               print_char = 1;
               break;
            /* swap output order: print UTF-8 components, then input (-s) */
            case 's':
               swap_order = 1;
               break;
            /* input filename (-i) */
            case 'i':
               if (++i < argc) {
                  infp = fopen (argv[i], "r");
                  if (infp == NULL) {
                     fprintf (stderr, "%s: cannot open %s for input - %s\n\n",
                              PROG_NAME, argv[i], strerror (errno));
                     exit (EXIT_FAILURE);
                  }
               }
               else {
                  fatal_error (interactive, "No input filename given after '-i'");
               }
               break;
            /* output filename (-o) */
            case 'o':
               if (++i < argc) {
                  outfp = fopen (argv[i], "w");
                  if (outfp == NULL) {
                     fprintf (stderr, "%s: cannot open %s for output - %s\n\n",
                              PROG_NAME, argv[i], strerror (errno));
                     exit (EXIT_FAILURE);
                  }
               }
               else {
                  fatal_error (interactive, "No output filename given after '-o'");
               }
               break;
            /* Print help message (-h, -?) */
            case 'h':
            case '?':
               print_help ();
               exit (EXIT_SUCCESS);
               break;
            /* Option starts with "--"; look for "--help" and "--version" */
            case '-':
               /* (--help) */
               if (strcmp (&argv[i][2], "help") == 0) {
                  print_help ();
                  exit (EXIT_SUCCESS);
               }
               /* (--version) */
               else if (strcmp (&argv[i][2], "version") == 0) {
                  printf ("%s %s\n", PROG_NAME, PROG_VERSION);
                  printf ("Copyright (C) 2018 Paul Hardy\n");
                  printf ("License GPLv2+: GNU GPL version 2 or later \n");
                  printf ("This is free software: you are free to change and redistribute it.\n");
                  printf ("There is NO WARRANTY, to the extent permitted by law.\n\n");
                  exit (EXIT_SUCCESS);
               }
               else {
                  fatal_error (interactive, "Unrecognized option");
               }
               break;
            default:
               fatal_error (interactive, "Unrecognized option");
               break;
         }
      }
      else {
         if (infp == stdin)
            fprintf (stderr, "Unrecognized parameter %s\n\n", argv[i]);
         else
            fprintf (stderr, "%s: unrecognized parameter %s\n\n",
                     PROG_NAME, argv[i]);
         print_help ();
         exit (EXIT_FAILURE);
      }
   }

   /* Read one number per input line, possibly with following string */
   codept = 0; /* Initialize to avoid blank line input */
   while (fgets (instring, MAXSTRING, infp) != NULL) {
      /* Get Unicode code point at start of line */
      if (instring[0] != '\n' &&
          instring[0] != '\0') {
         sscanf (instring, "%X", &codept);
         if (cvt2utf8 (codept, utf8_bytes) > 0) { /* If in Unicode range */
            /* Non-swapped output (no option '-s'); echo input line first */
            if (swap_order == 0) { /* Print input values first */
               print_instring (codept, in_formats, in_format,
                               print_remainder, rem_format, instring, outfp);
            } /* swap_order == 0 */

            /* Print selected UTF-8 encoding output and/or the character itself */
            print_outstring (print_char, print_codes, utf8_bytes,
                             out_format, outfp);

            /* Swapped output (option '-s'); echo input line after other output */
            if (swap_order == 1) { /* Print input values last */
               print_instring (codept, in_formats, in_format,
                               print_remainder, rem_format, instring, outfp);
            } /* swap_order == 1 */

            fprintf (outfp, "\n"); /* Printed all output for this input line */
         } /* cvt2utf8 (codept, utf8_bytes) > 0 */
         else {
            if (interactive) { /* Print error, but keep going */
               fprintf (stderr, "Out of range Unicode value > 10FFFF\n");
            }
            else { /* Non-interactive -- abort */
               fatal_error (interactive, "Out of range Unicode value > 10FFFF");
            }
         } /* cvt2utf8 (codept, utf8_bytes) < 0 (invalid code point) */
         codept = 0; /* reset to zero in case next input is a blank line */
      } /* input line did not start with a newline or '\0' */
   } /* while not at end of input */

   fclose (outfp);

   exit (exit_status);
}


/*
   Print an error message, then quit with non-zero exit status.

   If running interactively (standard input is a terminal), do not
   print the program name, begin the message with an uppercase letter,
   and print the help menu after the message.  Otherwise, print the
   program name, begin the message with a lowercase letter, and
   do not print the help menu.
*/
void
fatal_error (int interactive, char *err_message)
{
   void print_help ();

   if (interactive) {
      fprintf (stderr, "%c%s\n\n",
               toupper (err_message[0]), &err_message[1]);
   }
   else {
      fprintf (stderr, "%s: %c%s\n\n", PROG_NAME,
               tolower (err_message[0]), &err_message[1]);
   }

   if (interactive) print_help ();

   exit (EXIT_FAILURE);
}


/*
   Print a help message.
   The message goes to stdout and lists the command syntax,
   each option, and a few examples.
*/
void
print_help ()
{
   fprintf (stdout, "Syntax: %s { [-e <format1>] | [-E <format2>] } ", PROG_NAME);
   fprintf (stdout, "[-r <formatr>]\n");
   fprintf (stdout, " [ [-u <utf8_format>] | -n] [-c] [-s]\n");
   fprintf (stdout, " [-i <input_file>] [-o <output_file>]\n\n");
   fprintf (stdout, " <format1>, <format2>, <formatr>, and <utf8_format>\n");
   fprintf (stdout, " are printf format strings\n\n");
   fprintf (stdout, " -e Echo input code point in one format\n\n");
   fprintf (stdout, " -E Echo input code point in two formats\n\n");
   fprintf (stdout, " -r Print remainder of input after code point\n\n");
   fprintf (stdout, " -u UTF-8 output format\n\n");
   fprintf (stdout, " -n Do not print UTF-8 codes\n\n");
   fprintf (stdout, " -c print +UTF-8 character after UTF-8 bytes\n\n");
   fprintf (stdout, " -s Swap order: print UTF-8 string first, then input value\n\n");
   fprintf (stdout, " -h\n");
   fprintf (stdout, " --help This help message\n\n");
   fprintf (stdout, " --version Program version information\n\n");
   fprintf (stdout, " Examples:\n\n");
   fprintf (stdout, " %s -e \"0x%%04X \" -u \"\\%%03o\"\n\n", PROG_NAME);
   fprintf (stdout, " %s -E \"U+%%04x = 0%%02o = \"\n\n", PROG_NAME);
   fprintf (stdout, " %s -s -e \" /* U+%%04X */\" -u \"\\%%03o\"\n\n", PROG_NAME);
   fprintf (stdout, " Valid Unicode values range from hexadecimal 0 through 10FFFF\n\n");

   return;
}


/*
   Convert a Unicode code point to a UTF-8 string.
   The allowable Unicode range is U+0000..U+10FFFF.

      codept      - the Unicode code point to encode
      utf8_bytes  - an array of 5 bytes to hold the UTF-8 encoded string;
                    the string will consist of up to 4 UTF-8-encoded bytes,
                    followed by null bytes through the end of the array,
                    utf8_bytes[4].
*/
int
cvt2utf8 (uint32_t codept, unsigned *utf8_bytes)
{
   int bin_length;  /* number of binary digits, for forming UTF-8 */
   int byte_length; /* number of bytes of UTF-8 */

   int bin_digits (uint32_t);

   /*
      If codept is within the valid Unicode range of
      0x0 through 0x10FFFF inclusive, convert it to UTF-8.
   */
   if (codept <= 0x10FFFF) {
      byte_length = 0;
      bin_length = bin_digits (codept);
      if (bin_length < 8) { /* U+0000..U+007F */
         byte_length = 1;
         utf8_bytes [0] = codept;
         utf8_bytes [1] = utf8_bytes [2] = utf8_bytes [3] = utf8_bytes [4] = 0;
      }
      else if (bin_length < 12) { /* U+0080..U+07FF */
         byte_length = 2;
         utf8_bytes [0] = 0xC0 | ((codept >> 6) & 0x1F);
         utf8_bytes [1] = 0x80 | ( codept & 0x3F);
         utf8_bytes [2] = utf8_bytes [3] = utf8_bytes [4] = 0;
      }
      else if (bin_length < 17) { /* U+0800..U+FFFF */
         byte_length = 3;
         utf8_bytes [0] = 0xE0 | ((codept >> 12) & 0x0F);
         utf8_bytes [1] = 0x80 | ((codept >> 6) & 0x3F);
         utf8_bytes [2] = 0x80 | ( codept & 0x3F);
         utf8_bytes [3] = utf8_bytes [4] = 0;
      }
      else if (bin_length < 22) { /* U+010000..U+10FFFF */
         byte_length = 4;
         utf8_bytes [0] = 0xF0 | ((codept >> 18) & 0x07);
         utf8_bytes [1] = 0x80 | ((codept >> 12) & 0x3F);
         utf8_bytes [2] = 0x80 | ((codept >> 6) & 0x3F);
         utf8_bytes [3] = 0x80 | ( codept & 0x3F);
         utf8_bytes [4] = 0;
      }
   } /* encoded output for valid Unicode code point */
   else { /* flag out of range Unicode code point */
      /*
         0xFF is never a valid byte in UTF-8, so testing for it
         will be an easy check of a valid return value.
      */
      byte_length = -1;
      utf8_bytes [0] = 0xFF;
      utf8_bytes [1] = 0xFF;
      utf8_bytes [2] = 0xFF;
      utf8_bytes [3] = 0xFF;
      utf8_bytes [4] = 0;
   }

   return byte_length;
}


/*
   Print an array of bytes comprising one UTF-8 encoded character.

      outfp       - the output stream file pointer
      utf8_bytes  - an array of 5 values holding a null-terminated UTF-8 string
      utf_format  - format for fprintf to use with each byte
*/
void
fprint_utf8 (FILE *outfp, unsigned *utf8_bytes, char *utf_format)
{
   int i; /* loop variable */

   for (i = 0; utf8_bytes[i] != 0x00; i++)
      fprintf (outfp, utf_format, utf8_bytes[i] & 0xFF);

   return;
}


/*
   Return the number of significant binary digits in an unsigned number.
*/
int
bin_digits (uint32_t itest)
{
   uint32_t i;
   int result;

   i = 0x80000000; /* mask highest uint32_t bit */
   result = 32;
   while ( (i != 0) && ((itest & i) == 0) ) {
      i >>= 1;
      result--;
   }

   return result;
}


/*
   Output the input line in the desired format.

      codept           The Unicode code point
      in_formats       Number of ways to format the input code point (1 or 2)
      in_format        The format string to use to print the code point
      print_remainder  1 if printing remainder of input string, 0 otherwise
      rem_format       The format string for the remainder of input
                       following the code point + space
      instring         The entire line of input, null terminated
      outfp            The output file pointer
*/
void
print_instring (int codept, int in_formats, char *in_format,
                int print_remainder, char *rem_format,
                char *instring, FILE *outfp)
{
   int i;      /* loop variable */
   size_t len; /* length of the input line */

   if (in_formats == 1) /* option '-e' specified */
      fprintf (outfp, in_format, codept);
   else if (in_formats == 2) /* option '-E' specified */
      fprintf (outfp, in_format, codept, codept);

   if (print_remainder == 1) { /* option '-r' specified */
      /* Remove the trailing newline, if the line has one */
      len = strlen (instring);
      if (len > 0 && instring [len - 1] == '\n')
         instring [len - 1] = '\0';
      /* Find start of hexadecimal number; isspace ('\0') is false,
         so this loop stops at the end of the string */
      for (i = 0; isspace ((unsigned char) instring[i]); i++);
      if (instring[i] != '\0') {
         /* Find space after hexadecimal number, without running
            past the end of the string */
         for (; instring[i] != '\0' &&
                !isspace ((unsigned char) instring[i]); i++);
         if (instring[i] != '\0')
            i++; /* Skip the space character */
         /* If a valid string, print it */
         if (instring[i] != '\0') {
            fprintf (outfp, rem_format, &instring[i]);
         }
      } /* instring[i] != '\0' */
   } /* print_remainder == 1 */

   return;
}


/*
   Print UTF-8 encoded output string and/or UTF-8 character

      print_char   1 to print the Unicode character itself
                   as a UTF-8 byte stream
      print_codes  1 to print UTF-8 encoded bytes numerically
      utf8_bytes   The byte string of UTF-8 values, null-terminated
      out_format   The output format to use for printing UTF-8 bytes
      outfp        The output file pointer
*/
void
print_outstring (int print_char, int print_codes,
                 unsigned *utf8_bytes, char *out_format, FILE *outfp)
{
   int i; /* loop variable */

   void fprint_utf8 (FILE *, unsigned *, char *);

   if (print_codes == 1) { /* no option '-n' specified */
      fprint_utf8 (outfp, utf8_bytes, out_format);
   }

   if (print_char == 1) { /* option '-c' */
      fputc (' ', outfp);
      for (i = 0; i < 4 && utf8_bytes[i] != '\0'; i++)
         fputc (utf8_bytes[i], outfp);
   }

   return;
}
utf8gen-1.1/ChangeLog0000644000175000017500000000043613331350403013167 0ustar paulpaul
2018-08-04 Paul Hardy

	* Version 1.1.  Added "orig" target in Makefile.am to remove
	all Autotools-generated files, returning the source package
	to its pristine state.

2018-07-14 Paul Hardy

	* Version 1.0.  Initial version.
utf8gen-1.1/AUTHORS0000644000175000017500000000013513322423110012455 0ustar paulpaul
The utf8gen package was written by Paul Hardy in June-July 2018.
utf8gen-1.1/NEWS0000644000175000017500000000052213331350271012113 0ustar paulpaul
2018-08-04
   Version 1.1.  Added "make orig" target to restore pristine
   pre-Autotools tarball.  This is to help downstream distros
   port packages to new architectures, as well as to guarantee
   that all files are built with the latest versions of
   Autotools programs.

2018-07-14
   Version 1.0.  Initial version of utf8gen.
utf8gen-1.1/COPYING0000644000175000017500000012014413316507655012467 0ustar paulpaul GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. 
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. 
The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. 
(Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. 
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. <signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. GNU Free Documentation License Version 1.3, 3 November 2008 Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. 0. PREAMBLE The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software. We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference. 1. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law. A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language. A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. 
(Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none. The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words. A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque". 
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only. The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text. The "publisher" means any person or entity that distributes copies of the Document to the public. A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition. The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License. 2. 
VERBATIM COPYING You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may also lend copies, under the same conditions stated above, and you may publicly display copies. 3. COPYING IN QUANTITY If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects. If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages. 
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document. 4. MODIFICATIONS You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version: A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission. B. 
List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement. C. State on the Title page the name of the publisher of the Modified Version, as the publisher. D. Preserve all the copyright notices of the Document. E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below. G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice. H. Include an unaltered copy of this License. I. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission. K. 
For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles. M. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version. N. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section. O. Preserve any Warranty Disclaimers. If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles. You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one. 
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version. 5. COMBINING DOCUMENTS You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers. The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work. In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements". 6. COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. 
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 7. AGGREGATION WITH INDEPENDENT WORKS A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document. If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate. 8. TRANSLATION Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. 
In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail. If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title. 9. TERMINATION You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License. However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it. 10. FUTURE REVISIONS OF THIS LICENSE The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See https://www.gnu.org/licenses/. 
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Document. 11. RELICENSING "Massive Multiauthor Collaboration Site" (or "MMC Site") means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A "Massive Multiauthor Collaboration" (or "MMC") contained in the site means any set of copyrightable works thus published on the MMC site. "CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization. "Incorporate" means to publish or republish a Document, in whole or in part, as part of another Document. An MMC is "eligible for relicensing" if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008. 
The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing. ADDENDUM: How to use this License for your documents To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page: Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this: with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation. If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.