utf8gen-1.1/README
This is the README file for the utf8gen package.
This package contains the program utf8gen, a utility
for reading in hexadecimal numbers from an input source,
one per line, and printing them as UTF-8 byte sequences.
Several options allow various forms of output. Consult
the utf8gen(1) man page and the utf8gen Texinfo file for
more information. Read the man page with the command
man utf8gen
following a "make install" step. Read the Texinfo user
guide with the command
info utf8gen
Information about the latest version is in the NEWS file.
If you downloaded this source package, instructions for
building and installation can be found in the INSTALL file
and license information is in the COPYING file.
If you are a downstream maintainer porting this package
to a new architecture, you can remove all files that
Autotools added with the command
autoreconf -f -i && ./configure && make orig
In all other cases, typing the following command will
usually build the software on your system:
./configure && make
Then consult the INSTALL file for installation instructions.
LICENSES
--------
Licenses are contained in the COPYING file. A summary of these licenses
appears below.
Source Code License:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see .
Documentation License:
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
You should have received a copy of the GNU Free Documentation License
along with this program. If not, see .
utf8gen-1.1/Makefile.am
## Process this file with automake to produce Makefile.in
SUBDIRS = doc man src test
#
# Add "orig" target to remove all Autotools-added files left over from
#
# autoreconf && ./configure && make && make distclean
#
orig: distclean
\rm -rf aclocal.m4 autom4te.cache build-aux configure *~ */*~ \
INSTALL Makefile.in doc/Makefile.in man/Makefile.in \
src/Makefile.in test/Makefile.in src/config.h.in doc/utf8gen.info
utf8gen-1.1/man/Makefile.am
## Process this file with automake to produce Makefile.in
man_MANS = utf8gen.1
EXTRA_DIST = $(man_MANS)
utf8gen-1.1/man/utf8gen.1
.TH UTF8GEN 1 "2018 Jun 30"
.SH NAME
utf8gen \- Generate UTF-8 output from hexadecimal input
.SH SYNOPSIS
.br
\fButf8gen\fP [ [-e \fIformat1\fP] | [-E \fIformat2\fP] ] [-r \fIformatr\fP]
[ [-u \fIutf8_format\fP] | -n] [-c] [-s]
[-i \fIinput_file\fP] [-o \fIoutput_file\fP]
.SH DESCRIPTION
.B utf8gen
reads a list of hexadecimal ASCII values in the range
0 through 10FFFF, one per line, and prints the UTF-8 encoding
of that number as a Unicode code point.
.PP
Each input line must begin with a hexadecimal number.
A string may follow after that, which can be echoed to the
output as the "remainder" (see the -r option below).
The total input line length, including an ending newline,
is limited to 4096 bytes.
.SH OPTIONS
.TP 6
\-c
After the UTF-8 codes are printed, print a space followed by
the character that the hexadecimal code point represents.
.TP
\-e
Echo the input code point in one format, using the
printf(3) format string \fIformat1\fP.
.TP
\-E
Echo the input code point in two formats, using the
printf(3) format string \fIformat2\fP.
.TP
\-n
Do \fInot\fP print the UTF-8 byte values. This can be useful
if only the printed character itself is desired; see the \-c option.
.TP
\-r
Print the remainder of the input string after the initial
hexadecimal digits, using the printf(3) format string \fIformatr\fP.
.TP
\-s
Swap the order of output: print the UTF-8 output portion first,
then print the input string portion. This can be useful for
generating code containing a UTF-8 encoding followed by a
comment that contains the input hexadecimal digits.
.TP
\-u
Print the UTF-8 encoded value of the input hexadecimal number,
as numeric codes for each UTF-8 byte, using the printf(3)
format string \fIutf8_format\fP. If no string is specified,
a default format of a backslash followed by three octal digits
is printed for each byte.
.SH EXAMPLES
.RS
.PP
utf8gen -e "0x%04X " -u "\\%03o"
.PP
utf8gen -E "U+%04x = 0%02o = "
.PP
utf8gen -s -e " /* U+%04X */" -u "\\%03o"
.RE
.SH FILES
Files contain lines that each begin with an ASCII hexadecimal
code in the valid Unicode range 0 through 10FFFF, inclusive.
This hexadecimal code may optionally be followed by a space
followed by an arbitrary string ending with a newline,
up to the limit of 4096 bytes per input line.
An example line could be the following (with no indent):
.PP
.RS
41 Letter 'A'
.RE
.SH "SEE ALSO"
For more detailed explanations and examples of common usage,
consult the \fButf8gen\fP Texinfo manual.
.SH AUTHOR
.B utf8gen
was written by Paul Hardy.
.SH LICENSE
.B utf8gen
is Copyright \(co 2018 Paul Hardy.
.PP
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
.SH BUGS
No known bugs exist.
utf8gen-1.1/doc/utf8gen.texi
\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename utf8gen.info
@settitle utf8gen
@setchapternewpage odd
@c %**end of header
@macro utf
@w{UTF-8}
@end macro
@paragraphindent none
@copying
This manual describes @command{utf8gen}, a utility for converting Unicode
hexadecimal code points into @utf{} as printable characters for immediate
viewing and as byte sequences suitable for including in programs.
Copyright @copyright{} 2018 Paul Hardy
@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts and no Back-Cover Texts.
@end quotation
@end copying
@dircategory Text
@direntry
* utf8gen: (utf8gen). A utility for converting hexadecimal numbers into @utf{}
@end direntry
@titlepage
@title utf8gen
@author Paul Hardy
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage
@contents
@node Top, Introduction, (dir), (dir)
@menu
* Introduction:: General information
* Unicode:: Overview of Unicode and @utf{}
* Invoking @command{utf8gen}:: Common Use Cases for using @command{utf8gen}
* @command{utf8gen} Reference:: Detailed description of the @command{utf8gen} utility
@end menu
@node Introduction, Unicode, Top, Top
@chapter Introduction
This document describes some typical uses for @command{utf8gen}, a utility
to read ASCII hexadecimal numbers, interpret them as Unicode
code points, and output Unicode Transformation Format --
@w{8-bit} (@utf{}).
If you have questions, please email
@email{unifoundry@@unifoundry.com}.
You can check for the latest @command{utf8gen} news at
@code{http://unifoundry.com/utf8gen/}.
--- Paul Hardy (@email{unifoundry@@unifoundry.com}) 2018
@node Unicode, Invoking @command{utf8gen}, Introduction, Top
@chapter Unicode
@menu
* Unicode Overview::
* Unicode Planes::
* UTF-8::
@end menu
@node Unicode Overview, Unicode Planes, , Unicode
@section Unicode Overview
Unicode arose out of a practical need for a common encoding
to represent all of the world's languages on computers.
It has grown rapidly over the past 20+ years to contain
more than 100,000 characters. These characters are
divided over multiple Unicode @dfn{planes}: @w{Plane 0} through
@w{Plane 16} (decimal), for a total of 17 planes. Each plane
contains 65,536 @dfn{code points}, which is 10000 in
hexadecimal. @dfn{Code point} is a more general
term than @dfn{character}, because Unicode contains more
than just visible characters; for example, Unicode contains
various code points for indicating variation selection
for scripts that have multiple forms of a visible character.
@node Unicode Planes, UTF-8, Unicode Overview, Unicode
@section Unicode Planes
@w{Plane 0} contains most of the world's modern scripts.
Code points in this range are denoted as Unicode code points
U+0000 through U+FFFF, inclusive. This plane is also known as
the Basic Multilingual Plane, or BMP. The ASCII code points
are in the beginning of the BMP, from U+0000 through U+007F.
The BMP is almost entirely allocated --- there are hardly any
free code point ranges in the BMP for assigning new scripts.
Fortunately, Unicode has 16 more planes beyond @w{Plane 0.}
@w{Plane 1} contains many ancient scripts, and modern collections
that were not assigned to @w{Plane 0} (for example, emoji). This
plane is also known as the Supplementary Multilingual Plane,
or SMP. Unicode code points in the SMP are in the range
U+10000 through U+1FFFF, inclusive.
@w{Plane 2} is the Supplementary Ideographic Plane, or SIP.
It contains Chinese and Japanese ideographs that were not
included in @w{Plane 0.} Unicode code points in @w{Plane 2}
are in the range U+20000 through U+2FFFF, inclusive.
These are the main planes with assigned visible characters.
@w{Plane 14} is the Supplementary Special-purpose Plane, or SSP.
Its code points are in the range U+E0000 through U+EFFFF.
This plane contains specialized tags and other designators.
Planes 15 and 16 are Private Use Area (PUA) planes. They
can contain any user-defined characters and special-purpose
codes. These planes span the Unicode range U+F0000 through
U+10FFFF.
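As a rough illustration of this layout, the plane number of a code
point is the code point divided by 10000 hexadecimal. The Python
sketch below is not part of @command{utf8gen}; the name
@code{plane_of} is made up for this example.
@example
```python
def plane_of(code_point):
    """Return the Unicode plane number (0 through 16) of a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    # Each plane holds 0x10000 code points, so the plane number
    # is whatever remains above the low 16 bits.
    return code_point >> 16

print(plane_of(0x0041))    # a code point in Plane 0, the BMP
print(plane_of(0x1F600))   # a code point in Plane 1, the SMP
print(plane_of(0x10FFFF))  # the last code point, in Plane 16
```
@end example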
@node UTF-8, , Unicode Planes, Unicode
@section UTF-8
Thus the valid Unicode range is U+0000 through U+10FFFF,
inclusive. Representing a code point as a plain binary number
takes from 7 bits for the ASCII range up to 21 bits for anything
in @w{Plane 16} (U+100000 through U+10FFFF), so most code points
need more than one byte.
A problem with transmitting these multi-byte numbers is
that different computer architectures order bytes in a
multi-byte word differently. Today there are only two
common orderings: big-endian, where the most significant byte is
stored first, and little-endian, where the least significant byte
is stored first.
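As a small illustration (a Python sketch, not part of
@command{utf8gen}), the same 16-bit value can be laid out in
either byte order:
@example
```python
value = 0x2134
big = value.to_bytes(2, "big")        # most significant byte first
little = value.to_bytes(2, "little")  # least significant byte first
print(big.hex())     # 2134
print(little.hex())  # 3421
```
@end example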
To transmit information between computers of
different architectures, a standard encoding had to
be defined. The Unicode encoding that modern web
browsers use is called Unicode Transformation Format
-- @w{8-bit,} or @utf{}. It has also become the standard
encoding for text documents that contain non-ASCII
characters.
@utf{} encoding has several desirable characteristics,
which are described briefly below.
The first byte in a multi-byte @utf{} encoded character begins
with a series of @samp{1}@tie{}bits followed by a @samp{0}@tie{}bit;
the number of @samp{1}@tie{}bits indicates how many bytes
the character requires. A one-byte character instead
starts with a @samp{0}@tie{}bit, which designates the byte as
ASCII. The ASCII range, U+0000 through U+007F,
is encoded the same in @utf{}, as just one byte.
Each byte after the first in a multi-byte character
begins with the bits@tie{}@samp{10}.
The number of bytes in a @utf{} encoded Unicode
code point varies from one to four bytes. Thus
it is an efficient encoding compared to one that
would transmit the same number of bytes for every
character across the entire Unicode range.
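The byte layouts described above can be sketched as follows.
This is a minimal Python illustration, not the code that
@command{utf8gen} itself uses (the program is written @w{in C});
the function name @code{utf8_encode} is hypothetical, and the
sketch does not reject the surrogate range U+D800 through U+DFFF.
@example
```python
def utf8_encode(cp):
    """Encode one Unicode code point as a list of UTF-8 byte values."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if cp < 0x80:        # 0xxxxxxx
        return [cp]
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return [0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]

# Print U+2134 as a backslash followed by three octal digits per byte.
print("".join("\\%03o" % b for b in utf8_encode(0x2134)))  # \342\204\264
```
@end example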
No first byte of a @utf{} character will ever begin with
the pattern @samp{10}: single-byte @utf{} characters
always begin with a @samp{0}@tie{}bit, and the first byte of a
multi-byte character always begins with @samp{11}. So string
searching functions can skip bytes within a @utf{} byte string,
and if a byte currently being examined begins with
the bits@tie{}@samp{10}, the search function knows it is
past the beginning of a multi-byte character.
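One way to use this property is to back up from an arbitrary
byte position to the start of the character that contains it.
The Python sketch below is only an illustration; the function
name @code{char_start} is hypothetical, and the input is assumed
to be well-formed @utf{}.
@example
```python
def char_start(data, index):
    """Back up from index to the first byte of the character containing it."""
    # Bytes after the first in a multi-byte character have the form 10xxxxxx.
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index

text = "A\u2134B".encode("utf-8")   # bytes: 41 e2 84 b4 42
print(char_start(text, 3))           # 1, the first byte of U+2134
```
@end example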
Unicode code points are published in @dfn{code charts},
available at @url{http://unicode.org/}.
These code charts number code points using
hexadecimal. These hexadecimal numbers must be
converted to @utf{} for transmission on web pages,
storing in a text document, etc. Hence the
creation of @command{utf8gen}.
@node Invoking @command{utf8gen}, @command{utf8gen} Reference, Unicode, Top
@chapter Invoking @command{utf8gen}
@menu
* Motivation::
* Printing a Character::
* Code Generation::
* Use Case Summary::
@end menu
@node Motivation, Printing a Character, , Invoking @command{utf8gen}
@section Motivation
This chapter provides examples of typical uses
for @command{utf8gen} for programmers and end-users.
I needed to generate hundreds of lines of source
code containing different @utf{} characters for
a set of programs. My searches did not find
anything that performed the conversion as I wanted,
so I wrote @command{utf8gen}.
With the Unicode Standard specifying code point
assignments in hexadecimal, it was natural to
write software that took a hexadecimal number
as input. There are numerous potential forms
of output, especially considering the formatting
syntax of different programming languages. The
purpose of most of the options for @command{utf8gen}
is to select various output options.
@command{utf8gen} reads in hexadecimal numbers,
one per input line. Each number can be followed
by a space and a miscellaneous string to the end
of the line. That @dfn{remainder} string can
optionally be printed on output; more on that
later.
@node Printing a Character, Code Generation, Motivation, Invoking @command{utf8gen}
@section Printing a Character
The simplest thing an end-user might want to know
is whether their computer has a font that supports
a certain Unicode character. The easiest way
to use @command{utf8gen} is interactively at a
terminal, typing in hexadecimal numbers and
looking at the character produced. To do this,
run the command
@example
utf8gen -c -n
@end example
The @option{-c} option tells @command{utf8gen} to
print the input hexadecimal number as a Unicode
character on the screen. The @option{-n} option
tells @command{utf8gen} to @emph{not} print the
@utf{} byte sequence as a set of formatted
numbers. Just enter one hexadecimal number
in the range 0 through 10FFFF, inclusive,
one number per line. When finished running
@command{utf8gen} interactively in this way,
end your input by typing @key{C-d}.
@node Code Generation, Use Case Summary, Printing a Character, Invoking @command{utf8gen}
@section Code Generation
@menu
* The Usefulness of Octal::
* Commenting Code::
* Remainder Strings::
* UTF-8 Output Format::
* Input and Output Files::
@end menu
@node The Usefulness of Octal, Commenting Code, , Code Generation
@subsection The Usefulness of Octal
If converting hexadecimal numbers into a form
that a programming language accepts, there are
many possibilities. For this reason, @command{utf8gen}
accepts format strings in the style of the C
@code{printf} function. This was a natural
choice, as @command{utf8gen} is written @w{in C.}
With eight bits in a byte, and @utf{} encoded
characters starting either with a @samp{0}@tie{}bit
for ASCII or with @samp{10} for all but the
first byte in a multi-byte sequence, it is
convenient to look at Unicode code point numbers
encoded as octal. If a byte in a @utf{} byte
string begins with @samp{10}, this leaves six
bits for the remainder of the byte. This is
conveniently viewed as two octal digits.
The default output of @command{utf8gen} is simply
the sequence of octal digits in a @utf{} character,
printed in the C style of a backslash followed
by three octal digits per byte. This is handy
for a quick copy and paste of a single @utf{}
byte sequence into a program.
If using the C-style backslashed octal number
format, it can be reassuring to see what a
Unicode code point is in octal (at least it
was for me, when I first wrote the program
and was verifying its proper operation).
A simple way of doing this is to have
@command{utf8gen} echo the input hexadecimal
number you typed in as octal, and then
print the @utf{} representation. To do
this, run a command of the form
@example
utf8gen -e "%03o = "
@end example
For example, if you enter the hexadecimal
number @kbd{2134} (the Unicode code point for
@samp{Script Small O} in the
@samp{Letterlike Symbols} block),
@command{utf8gen} will generate this output:
@example
20464 = \342\204\264
@end example
The hexadecimal number 2134 is 20464 in octal.
Notice how two octal digits from the Unicode
code point appear in each @utf{} byte except
for the first byte. The leading octal digit
of @samp{2} represents the leading two bits
@samp{10} in a @utf{} multi-byte sequence.
The first byte in a multi-byte @utf{} sequence
starts with a string of @samp{1}@tie{}bits indicating
how many bytes long the encoded character is.
In this case, the @utf{} representation of
U+2134 will take three bytes, so the first byte
begins with the bit string @samp{1110}. That bit string,
together with the next bit (@samp{0} here), makes up
the first two octal digits
(@samp{34}) of the first byte in the sequence,@tie{}@samp{\342}.
Looking at the sequence @samp{\342\204\264}
again, it is easy to see the placement of
the octal representation of this Unicode
code point, 20464. In this way, verifying
the proper conversion of the hexadecimal
Unicode code point to @utf{} is straightforward.
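The same check can be done mechanically. The Python sketch below
is an illustration only, assuming a code point that encodes as a
three-byte sequence; it strips the marker bits from each byte and
reassembles the octal digits of the code point.
@example
```python
cp = 0x2134
encoded = chr(cp).encode("utf-8")   # three bytes for this code point

# Remove the marker bits: 1110 from the first byte of a three-byte
# sequence, and 10 from each of the bytes that follow it.
lead = encoded[0] & 0x0F
rest = [b & 0x3F for b in encoded[1:]]

# Each following byte carries six bits, that is, two octal digits.
digits = "%o" % lead + "".join("%02o" % b for b in rest)
print(digits)      # 20464
print("%o" % cp)   # 20464
```
@end example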
@node Commenting Code, Remainder Strings, The Usefulness of Octal, Code Generation
@subsection Commenting Code
Commenting code is of course useful, especially
when dealing with something as arcane as raw
@utf{} byte sequences. @command{utf8gen} provides
various ways of doing this. A couple of examples
should suffice to give you an idea of these
capabilities.
The simplest method for creating comments might
be to follow an octal sequence with the Unicode
code point in its canonical form. The @option{-e}
option @emph{echoes} the input number to the output
using the format string that follows. This will
accomplish that:
@example
utf8gen -e "/* U+%04X */ "
@end example
For the hexadecimal input number 2134, this
produces the output
@example
/* U+2134 */ \342\204\264
@end example
The expectation is that a programmer will be
able to use an editor that can take a string
like @samp{\342\204\264} and easily convert
it into a @dfn{print}-style command in the
programming language of choice.
It might be preferable to print the comment
after the @utf{} byte sequence. The @option{-s}
option allows this by @dfn{swapping} the default
output string order. For example, the command
@example
utf8gen -e " /* U+%04X */" -s
@end example
produces the output (again, using 2134 as
the input number) of
@example
\342\204\264 /* U+2134 */
@end example
It might even be useful to output the initial
hexadecimal number using two different bases.
This is accomplished with the @option{-E} option,
followed by the format string for echoing the
input number in two ways. For example, the command
@example
utf8gen -E " /* U+%04X = 0%o */" -s
@end example
produces the output (with an input of @samp{2134})
@example
\342\204\264 /* U+2134 = 020464 */
@end example
@node Remainder Strings, UTF-8 Output Format, Commenting Code, Code Generation
@subsection Remainder Strings
One can only glean so much by looking at numbers
though. A textual comment describing a Unicode
code point can also help. @command{utf8gen} supports
printing free-form text following an initial
hexadecimal number followed by a space. This is
done with the @option{-r} option, to print the
@dfn{remainder} of the input line, using
the format string that follows this option.
The Unicode Consortium makes various data files
available with a free use license. The first
field is usually the Unicode code point in hexadecimal.
Remaining fields will contain information about
each code point. For example, given the following
line of input:
@example
2134 SCRIPT SMALL O
@end example
This command
@example
utf8gen -e " /* U+%04X " -s -r "%s */"
@end example
will produce this output:
@example
\342\204\264 /* U+2134 SCRIPT SMALL O */
@end example
This can facilitate batch processing of large
portions of a Unicode data file.
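To make the pieces of that output concrete, the Python sketch
below imitates the command above for a single input line. It is
not how @command{utf8gen} is implemented; the function name
@code{format_line} is hypothetical, the remainder is assumed to be
everything after the first space, and lines without a remainder
are not modeled.
@example
```python
def format_line(line):
    # Split the line into the hexadecimal number and the remainder.
    fields = line.rstrip("\n").split(" ", 1)
    cp = int(fields[0], 16)
    remainder = fields[1] if len(fields) > 1 else ""
    # UTF-8 bytes as a backslash and three octal digits each.
    octal = "".join("\\%03o" % b for b in chr(cp).encode("utf-8"))
    # Swapped order: UTF-8 bytes first, then the echo, then the remainder.
    return octal + " /* U+%04X " % cp + "%s */" % remainder

print(format_line("2134 SCRIPT SMALL O\n"))
```
@end example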
@node UTF-8 Output Format, Input and Output Files, Remainder Strings, Code Generation
@subsection UTF-8 Output Format
@command{utf8gen} also allows specifying the
format of the encoded @utf{} bytes with the
@option{-u} option followed by a format string.
For example, suppose the programming language you
use will accept bytes in hexadecimal using the form
@code{\x} followed by a hexadecimal number.
If we take the previous example input line and
provide it to @command{utf8gen} with the command
@example
utf8gen -u "\x%02x" -r " /* %s */" -s
@end example
this will produce the output
@example
\xe2\x84\xb4 /* SCRIPT SMALL O */
@end example
If the @option{-r} option is selected but there
is nothing after the hexadecimal number on an
input line, no remainder content will be printed.
Of course, you could also use the @option{-e}
or @option{-E} options to echo back the input number
in the desired output format(s) by adding it to
the command line.
@node Input and Output Files, , UTF-8 Output Format, Code Generation
@subsection Input and Output Files
Lines like these can be extracted from a Unicode data file and
provided as an input file to @command{utf8gen} using the @option{-i}
option to specify an input file. Output can be
written to a file using the @option{-o} option.
@node Use Case Summary, , Code Generation, Invoking @command{utf8gen}
@section Use Case Summary
The descriptions in this chapter give a brief
overview of all of the @command{utf8gen} options
and how they might be used in practice.
@command{utf8gen} tries to strike a balance between
the basics that a programmer might find useful
for bulk conversion of a large number of hexadecimal
Unicode code points versus creeping featurism.
While @command{utf8gen} won't write your program
for you, it can make the bulk conversion of
code points efficient.
@node @command{utf8gen} Reference, , Invoking @command{utf8gen}, Top
@chapter @command{utf8gen} Reference
@comment TROFF INPUT: .TH UTF8GEN 1 "2018 Jun 30"
@c @node utf8gen, , , @command{utf8gen} Reference
@c @section utf8gen
@c DEBUG: print_menu("@section")
@menu
* NAME::
* SYNOPSIS::
* DESCRIPTION::
* OPTIONS::
* EXAMPLES::
* FILES::
* AUTHOR::
* LICENSE::
* BUGS::
@end menu
@comment TROFF INPUT: .SH NAME
@node NAME, SYNOPSIS, , @command{utf8gen} Reference
@section NAME
@c DEBUG: print_menu("utf8gen NAME")
utf8gen @minus{} Generate UTF-8 output from hexadecimal input
@comment TROFF INPUT: .SH SYNOPSIS
@node SYNOPSIS, DESCRIPTION, NAME, @command{utf8gen} Reference
@section SYNOPSIS
@c DEBUG: print_menu("utf8gen SYNOPSIS")
@comment TROFF INPUT: .br
@comment .br
@b{utf8gen} [ [-e @i{format1}] | [-E @i{format2}] ] [-r @i{formatr}]
[ [-u @i{utf8@t{_}format}] | -n] [-c] [-s]
[-i @i{input@t{_}file}] [-o @i{output@t{_}file}]
@comment TROFF INPUT: .SH DESCRIPTION
@node DESCRIPTION, OPTIONS, SYNOPSIS, @command{utf8gen} Reference
@section DESCRIPTION
@c DEBUG: print_menu("utf8gen DESCRIPTION")
@comment TROFF INPUT: .B utf8gen
@b{utf8gen}
reads a list of hexadecimal ASCII values in the range
0 through 10FFFF, one per line, and prints the UTF-8 encoding
of that number as a Unicode code point.
@comment TROFF INPUT: .PP
Each input line must begin with a hexadecimal number.
A string may follow after that, which can be echoed to the
output as the "remainder" (see the @option{-r} option below).
The total input line length, including an ending newline,
is limited to 4096 bytes.
@comment TROFF INPUT: .SH OPTIONS
@node OPTIONS, EXAMPLES, DESCRIPTION, @command{utf8gen} Reference
@section OPTIONS
@c DEBUG: print_menu("utf8gen OPTIONS")
@comment TROFF INPUT: .TP 6
@c ---------------------------------------------------------------------
@table @code
@item @option{-c}
After the UTF-8 codes are printed, print a space followed by
the character that the hexadecimal code point represents.
@comment TROFF INPUT: .TP
@item @option{-e}
Echo the input code point in one format, using the
printf(3) format string @i{format1}.
@comment TROFF INPUT: .TP
@item @option{-E}
Echo the input code point in two formats, using the
printf(3) format string @i{format2}.
@comment TROFF INPUT: .TP
@item @option{-n}
Do @i{not} print the UTF-8 byte values. This can be useful
if only the printed character itself is desired; see the @option{-c} option.
@comment TROFF INPUT: .TP
@item @option{-r}
Print the remainder of the input string after the initial
hexadecimal digits, using the printf(3) format string @i{formatr}.
@comment TROFF INPUT: .TP
@item @option{-s}
Swap the order of output: print the UTF-8 output portion first,
then print the input string portion. This can be useful for
generating code containing a UTF-8 encoding followed by a
comment that contains the input hexadecimal digits.
@comment TROFF INPUT: .TP
@item @option{-u}
Print the UTF-8 encoded value of the input hexadecimal number,
as numeric codes for each UTF-8 byte, using the printf(3)
format string @i{utf8@t{_}format}. If no string is specified,
a default format of a backslash followed by three octal digits
is printed for each byte.
@comment TROFF INPUT: .SH EXAMPLES
@end table
@c ---------------------------------------------------------------------
@node EXAMPLES, FILES, OPTIONS, @command{utf8gen} Reference
@section EXAMPLES
@c DEBUG: print_menu("utf8gen EXAMPLES")
@comment TROFF INPUT: .RS
@c ---------------------------------------------------------------------
@quotation
@comment TROFF INPUT: .PP
@code{utf8gen -e "0x%04X " -u "\\%03o"}
@comment TROFF INPUT: .PP
@code{utf8gen -E "U+%04x = 0%02o = "}
@comment TROFF INPUT: .PP
@code{utf8gen -s -e " /* U+%04X */" -u "\\%03o"}
@comment TROFF INPUT: .RE
@end quotation
@c ---------------------------------------------------------------------
@comment TROFF INPUT: .SH FILES
@node FILES, AUTHOR, EXAMPLES, @command{utf8gen} Reference
@section FILES
@c DEBUG: print_menu("utf8gen FILES")
Files contain lines that each begin with an ASCII hexadecimal
code in the valid Unicode range 0 through 10FFFF, inclusive.
This hexadecimal code may optionally be followed by a space
followed by an arbitrary string ending with a newline,
up to the limit of 4096 bytes per input line.
An example line could be the following (with no indent):
@comment TROFF INPUT: .PP
@comment TROFF INPUT: .RS
@c ---------------------------------------------------------------------
@quotation
41 Letter 'A'
@comment TROFF INPUT: .RE
@end quotation
@c ---------------------------------------------------------------------
@comment TROFF INPUT: .SH AUTHOR
@node AUTHOR, LICENSE, FILES, @command{utf8gen} Reference
@section AUTHOR
@c DEBUG: print_menu("utf8gen AUTHOR")
@comment TROFF INPUT: .B utf8gen
@b{utf8gen}
was written by Paul Hardy.
@comment TROFF INPUT: .SH LICENSE
@node LICENSE, BUGS, AUTHOR, @command{utf8gen} Reference
@section LICENSE
@c DEBUG: print_menu("utf8gen LICENSE")
@comment TROFF INPUT: .B utf8gen
@b{utf8gen}
is Copyright @copyright{} 2018 Paul Hardy.
@comment TROFF INPUT: .PP
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
@comment TROFF INPUT: .SH BUGS
@node BUGS, , LICENSE, @command{utf8gen} Reference
@section BUGS
@c DEBUG: print_menu("utf8gen BUGS")
No known bugs exist.
@bye
utf8gen-1.1/doc/Makefile.am
## Process this file with automake to produce Makefile.in
info_TEXINFOS = utf8gen.texi
utf8gen-1.1/configure.ac
AC_INIT([utf8gen], [1.1], [unifoundry@unifoundry.com],
[utf8gen], [http://www.unifoundry.com/utf8gen/])
AC_PREREQ([2.68])
AC_CONFIG_SRCDIR([src/utf8gen.c])
AC_CONFIG_AUX_DIR([build-aux])
AM_INIT_AUTOMAKE([1.11 subdir-objects -Wall -Werror])
AC_CONFIG_HEADERS([src/config.h])
AC_CONFIG_FILES([Makefile doc/Makefile man/Makefile src/Makefile test/Makefile])
AC_PROG_CC
AC_OUTPUT
utf8gen-1.1/test/sample2-out.txt
\056 . -- Full Stop (Period)
\101 A -- Latin Letter Capital 'A'
\172 z -- Latin Letter Small 'z'
\316\221 Α -- Greek Letter Capital Alpha
\317\211 ω -- Greek Letter Small Omega
\320\251 Щ -- Cyrillic Capital Letter Shcha
\340\244\204 ऄ -- Devanagari Letter Short A
\341\232\240 ᚠ -- Runic Letter Fehu Feoh Fe F
\342\235\244 ❤ -- Heavy Black Heart
\346\227\245 日 -- CJK Ideographs Sun
\360\220\202\205 𐂅 -- Linear B Ideogram B105M Stallion
\360\235\215\261 𝍱 -- Counting Rod Tens Digit Nine
\364\217\277\260 -- Unicode Code Point U+10FFF0
utf8gen-1.1/test/Makefile.am
check_SCRIPTS=test1 test2 test3
TESTS=$(check_SCRIPTS)
EXTRA_DIST=README-test $(check_SCRIPTS) test-all sample-in.txt \
sample1-out.txt sample2-out.txt sample3-out.txt
AM_TESTS_ENVIRONMENT = utf8gen_path='$(abs_top_builddir)/src' ; \
export utf8gen_path ;
installcheck-local:
make utf8gen_bindir=${DESTDIR}${bindir} check
utf8gen-1.1/test/test2
#!/bin/sh
set -e
#
# Create temporary directory for test
# output if AUTOPKGTEST_TMP is undefined.
# Debian GNU/Linux defines AUTOPKGTEST_TMP.
#
if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then
TEST_TMP=$(mktemp -d)
trap "\rm -rf ${TEST_TMP}" 0 INT QUIT ABRT PIPE TERM
else
TEST_TMP=${AUTOPKGTEST_TMP}
fi
#
# Point to the source directory for test.
#
if [ "x${srcdir}" = "x" ] ; then
srcdir=.
fi
#
# Point to binary executable; utf8gen_bindir
# should be defined for "make installcheck".
# Otherwise, leave undefined for "make check".
#
if [ "x${utf8gen_bindir}" = "x" ] ; then
utf8gen_bindir=../src
fi
${utf8gen_bindir}/utf8gen -s -c -r " -- %s" \
< ${srcdir}/sample-in.txt \
> ${TEST_TMP}/test2-out.txt
diff ${srcdir}/sample2-out.txt ${TEST_TMP}/test2-out.txt || \
(echo "test2 FAILED; output in ${TEST_TMP}/test2-out.txt" ; exit 1)
#
# If AUTOPKGTEST_TMP was defined, don't remove it;
# a Debian calling process will take care of that.
#
if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then
\rm -rf ${TEST_TMP}
fi
utf8gen-1.1/test/sample1-out.txt
\056 . -- Full Stop (Period)
\101 A -- Latin Letter Capital 'A'
\172 z -- Latin Letter Small 'z'
\316\221 Α -- Greek Letter Capital Alpha
\317\211 ω -- Greek Letter Small Omega
\320\251 Щ -- Cyrillic Capital Letter Shcha
\340\244\204 ऄ -- Devanagari Letter Short A
\341\232\240 ᚠ -- Runic Letter Fehu Feoh Fe F
\342\235\244 ❤ -- Heavy Black Heart
\346\227\245 日 -- CJK Ideographs Sun
\360\220\202\205 𐂅 -- Linear B Ideogram B105M Stallion
\360\235\215\261 𝍱 -- Counting Rod Tens Digit Nine
\364\217\277\260 -- Unicode Code Point U+10FFF0
utf8gen-1.1/test/test3
#!/bin/sh
set -e
#
# Create temporary directory for test
# output if AUTOPKGTEST_TMP is undefined.
# Debian GNU/Linux defines AUTOPKGTEST_TMP.
#
if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then
TEST_TMP=$(mktemp -d)
trap "\rm -rf ${TEST_TMP}" 0 INT QUIT ABRT PIPE TERM
else
TEST_TMP=${AUTOPKGTEST_TMP}
fi
#
# Point to the source directory for test.
#
if [ "x${srcdir}" = "x" ] ; then
srcdir=.
fi
#
# Point to binary executable; utf8gen_bindir
# should be defined for "make installcheck".
# Otherwise, leave undefined for "make check".
#
if [ "x${utf8gen_bindir}" = "x" ] ; then
utf8gen_bindir=../src
fi
${utf8gen_bindir}/utf8gen -s -c -r " -- %s" \
< ${srcdir}/sample-in.txt \
> ${TEST_TMP}/test3-out.txt
diff ${srcdir}/sample3-out.txt ${TEST_TMP}/test3-out.txt || \
(echo "test3 FAILED; output in ${TEST_TMP}/test3-out.txt" ; exit 1)
#
# If AUTOPKGTEST_TMP was defined, don't remove it;
# a Debian calling process will take care of that.
#
if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then
\rm -rf ${TEST_TMP}
fi
utf8gen-1.1/test/sample3-out.txt 0000644 0001750 0001750 00000001060 13316056516 015313 0 ustar paul paul \056 . -- Full Stop (Period)
\101 A -- Latin Letter Capital 'A'
\172 z -- Latin Letter Small 'z'
\316\221 Α -- Greek Letter Capital Alpha
\317\211 ω -- Greek Letter Small Omega
\320\251 Щ -- Cyrillic Capital Letter Shcha
\340\244\204 ऄ -- Devanagari Letter Short A
\341\232\240 ᚠ -- Runic Letter Fehu Feoh Fe F
\342\235\244 ❤ -- Heavy Black Heart
\346\227\245 日 -- CJK Ideographs Sun
\360\220\202\205 𐂅 -- Linear B Ideogram B105M Stallion
\360\235\215\261 𝍱 -- Counting Rod Tens Digit Nine
\364\217\277\260 -- Unicode Code Point U+10FFF0
utf8gen-1.1/test/test1 0000755 0001750 0001750 00000002017 13322177245 013372 0 ustar paul paul #!/bin/sh
set -e
#
# Create temporary directory for test
# output if AUTOPKGTEST_TMP is undefined.
# Debian GNU/Linux defines AUTOPKGTEST_TMP.
#
if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then
TEST_TMP=$(mktemp -d)
trap "\rm -rf ${TEST_TMP}" 0 INT QUIT ABRT PIPE TERM
else
TEST_TMP=${AUTOPKGTEST_TMP}
fi
#
# Point to the source directory for test.
#
if [ "x${srcdir}" = "x" ] ; then
srcdir=.
fi
#
# Point to binary executable; utf8gen_bindir
# should be defined for "make installcheck".
# Otherwise, leave undefined for "make check".
#
if [ "x${utf8gen_bindir}" = "x" ] ; then
utf8gen_bindir=../src
fi
${utf8gen_bindir}/utf8gen -s -c -r " -- %s" \
< ${srcdir}/sample-in.txt \
> ${TEST_TMP}/test1-out.txt
diff ${srcdir}/sample1-out.txt ${TEST_TMP}/test1-out.txt || \
(echo "test1 FAILED; output in ${TEST_TMP}/test1-out.txt" ; exit 1)
#
# If AUTOPKGTEST_TMP was defined, don't remove it;
# a Debian calling process will take care of that.
#
if [ "x${AUTOPKGTEST_TMP}" = "x" ] ; then
\rm -rf ${TEST_TMP}
fi
utf8gen-1.1/test/sample-in.txt 0000644 0001750 0001750 00000000612 13316056402 015023 0 ustar paul paul 002E Full Stop (Period)
41 Latin Letter Capital 'A'
7a Latin Letter Small 'z'
0391 Greek Letter Capital Alpha
03c9 Greek Letter Small Omega
429 Cyrillic Capital Letter Shcha
0904 Devanagari Letter Short A
16A0 Runic Letter Fehu Feoh Fe F
2764 Heavy Black Heart
65E5 CJK Ideographs Sun
10085 Linear B Ideogram B105M Stallion
1d371 Counting Rod Tens Digit Nine
10FFF0 Unicode Code Point U+10FFF0
utf8gen-1.1/test/test-all 0000755 0001750 0001750 00000000272 13316720203 014047 0 ustar paul paul #!/bin/sh
echo "*** Running Tests..."
./test1 || exit 1
echo "Test 1 PASSED"
./test2 || exit 1
echo "Test 2 PASSED"
./test3 || exit 1
echo "Test 3 PASSED"
echo "*** Finished Tests"
utf8gen-1.1/test/README-test 0000644 0001750 0001750 00000002112 13322200663 014224 0 ustar paul paul The shell script "test-all" can be run from this
directory after ../src/utf8gen has been built.
Instead of running "test-all" in this directory,
it is better to use Autotools to build the Makefiles,
and then use the Makefiles to test utf8gen. To do
this, perform these steps:
1) cd .. [so you are in the top-level directory]
2) type the following commands:
./configure
make
make check
This will provide diagnostic output for any failed test.
If all goes well, the output from "make check" should
contain a series of lines like this:
PASS: test1
PASS: test2
PASS: test3
==============================================...
Testsuite summary for utf8gen
==============================================...
# TOTAL: 3
# PASS: 3
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
==============================================...
The "Testsuite summary" line also shows the current version
number of utf8gen. If you see that the three tests passed,
everything tested correctly.
utf8gen-1.1/src/ 0000755 0001750 0001750 00000000000 13331374125 012210 5 ustar paul paul utf8gen-1.1/src/Makefile.am 0000644 0001750 0001750 00000000156 13322700111 014232 0 ustar paul paul ## Process this file with automake to produce Makefile.in
bin_PROGRAMS = utf8gen
utf8gen_SOURCES = utf8gen.c
utf8gen-1.1/src/utf8gen.c 0000644 0001750 0001750 00000046520 13321312236 013735 0 ustar paul paul /*
utf8gen - convert hexadecimal input to UTF-8 numbers
Author: Paul Hardy
Date: June 2018
Synopsis: utf8gen [ [-e ] | [-E ] ] [-r ]
[ [-u ] | -n]
[-c] [-s]
[-i ] [-o ]
Author: Paul Hardy, unifoundry unifoundry.com, June 2018
Copyright (C) 2018 Paul Hardy
LICENSE:
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see https://www.gnu.org/licenses/.
*/
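#include <config.h> /* created by autotools */
#include <stdio.h> /* FILE, fprintf, fgets, sscanf */
#include <stdlib.h> /* exit, EXIT_SUCCESS, EXIT_FAILURE */
#include <stdint.h> /* uint32_t */
#include <string.h> /* strcmp, strlen, strerror */
#include <ctype.h> /* isspace, toupper, tolower */
#include <unistd.h> /* isatty */
#include <errno.h> /* errno */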
/* To check functionality for compiling on GNU-based systems */
#define _GNU_SOURCE
/* These two defines are for diagnostic & help output */
#ifdef PACKAGE_NAME
#define PROG_NAME PACKAGE_NAME
#else
#define PROG_NAME "utf8gen"
#endif
#ifdef PACKAGE_VERSION
#define PROG_VERSION PACKAGE_VERSION
#else
#define PROG_VERSION "1.0"
#endif
#define MAXSTRING 4098 /* maximum number of characters on an input line */
/* For handling errors in system functions */
extern int errno;
int
main (int argc, char *argv[])
{
int i; /* loop variable */
int in_formats=0; /* number of times to print input number */
uint32_t codept; /* Unicode code point to convert */
char instring[MAXSTRING]; /* input line */
unsigned utf8_bytes[5]; /* encoded UTF-8 bytes, ending with null byte */
int print_remainder=0; /* =1 to print input string following code point */
int print_char=0; /* =1 to end output line with +UTF-8 character */
int swap_order=0; /* =1 to print UTF-8 first, then input format(s) */
int print_codes=1; /* print UTF-8 encoding; don't print if == 0 */
int exit_status; /* program exit status */
/*
Format strings for printing input number and output UTF-8.
By default, do not print the input code point, but print the
output UTF-8 character using the default_out format string.
*/
static char *default_out = "\\%03o"; /* default output UTF-8 format */
char *in_format=""; /* format to print input number */
char *rem_format=""; /* format for input remainder */
char *out_format = default_out; /* format to print output number */
void fatal_error (int, char *);
void print_help ();
int cvt2utf8 (uint32_t, unsigned *);
void fprint_utf8 (FILE *, unsigned *, char *);
void print_instring (int, int, char *, int, char *, char *, FILE *);
void print_outstring (int, int, unsigned *, char *, FILE *);
int interactive=1; /* =1 if reading from terminal, 0 otherwise */
FILE *infp = stdin; /* input file pointer; default is stdin */
FILE *outfp = stdout; /* output file pointer; default is stdout */
exit_status = EXIT_SUCCESS;
interactive = isatty (fileno (stdin)) ? 1 : 0;
for (i = 1; i < argc; i++) {
/*
Parse options. If an invalid command line argument
was given, print a help menu and exit with error status.
*/
if (argv[i][0] == '-' && exit_status == EXIT_SUCCESS) {
switch (argv[i][1]) {
/* Echo input number one way before printing conversion (-e) */
case 'e':
if (++i < argc) {
in_formats = 1;
in_format = argv[i];
}
else {
fatal_error (interactive,
"Missing echo format string after -e");
}
break;
/* Echo input number two ways before printing conversion (-E) */
case 'E':
if (++i < argc) {
in_formats = 2;
in_format = argv[i];
}
else {
fatal_error (interactive,
"Missing echo format string pair after -E");
}
break;
/*
print remaining string that followed
the hexadecimal Unicode code point (-r)
*/
case 'r':
if (++i < argc) {
print_remainder = 1;
rem_format = argv[i];
}
else {
fatal_error (interactive,
"Missing remainder of string format after -r");
}
break;
/* UTF-8 output format for each encoded byte (-u) */
case 'u':
if (++i < argc) {
out_format = argv[i];
}
else {
fatal_error (interactive,
"Missing format string after '-u'");
}
break;
/* do not print the UTF-8 byte codes (-n) */
case 'n':
print_codes = 0;
break;
/* end line by printing + UTF-8 character (-c) */
case 'c':
print_char = 1;
break;
/* swap output order: print UTF-8 components, then input (-s) */
case 's':
swap_order = 1;
break;
/* input filename (-i) */
case 'i':
if (++i < argc) {
infp = fopen (argv[i], "r");
if (infp == NULL) {
fprintf (stderr,
"%s: cannot open %s for input - %s\n\n",
PROG_NAME, argv[i], strerror (errno));
exit (EXIT_FAILURE);
}
}
else {
fatal_error (interactive,
"No input filename given after '-i'");
}
break;
/* output filename (-o) */
case 'o':
if (++i < argc) {
outfp = fopen (argv[i], "w");
if (outfp == NULL) {
fprintf (stderr,
"%s: cannot open %s for output - %s\n\n",
PROG_NAME, argv[i], strerror (errno));
exit (EXIT_FAILURE);
}
}
else {
fatal_error (interactive,
"No output filename given after '-o'");
}
break;
/* Print help message (-h, -?) */
case 'h':
case '?':
print_help ();
exit (EXIT_SUCCESS);
break;
/* Option starts with "--"; look for "--help" and "--version" */
case '-':
/* (--help) */
if (strcmp (&argv[i][2], "help") == 0) {
print_help ();
exit (EXIT_SUCCESS);
}
/* (--version) */
else if (strcmp (&argv[i][2], "version") == 0) {
printf ("%s %s\n", PROG_NAME, PROG_VERSION);
printf ("Copyright (C) 2018 Paul Hardy\n");
printf ("License GPLv2+: GNU GPL version 2 or later \n");
printf ("This is free software: you are free to change and redistribute it.\n");
printf ("There is NO WARRANTY, to the extent permitted by law.\n\n");
exit (EXIT_SUCCESS);
}
else {
fatal_error (interactive, "Unrecognized option");
}
break;
default:
fatal_error (interactive, "Unrecognized option");
break;
}
}
else {
if (infp == stdin)
fprintf (stderr, "Unrecognized parameter %s\n\n",
argv[i]);
else
fprintf (stderr, "%s: unrecognized parameter %s\n\n",
PROG_NAME, argv[i]);
print_help ();
exit (EXIT_FAILURE);
}
}
/*
Read one number per input line, possibly with following string
*/
codept = 0; /* Initialize to avoid blank line input */
while (fgets (instring, MAXSTRING, infp) != NULL) {
/* Get Unicode code point at start of line */
if (instring[0] != '\n' && instring[0] != '\0') {
sscanf (instring, "%X", &codept);
if (cvt2utf8 (codept, utf8_bytes) > 0) { /* If in Unicode range */
/*
Non-swapped output (no option '-s'); echo input line first
*/
if (swap_order == 0) { /* Print input values first */
print_instring (codept, in_formats,in_format,
print_remainder, rem_format,
instring, outfp);
} /* swap_order == 0 */
/*
Print selected UTF-8 encoding output and/or the character itself
*/
print_outstring (print_char, print_codes, utf8_bytes, out_format, outfp);
/*
Swapped output (option '-s'); echo input line after other output
*/
if (swap_order == 1) { /* Print input values last */
print_instring (codept, in_formats,in_format,
print_remainder, rem_format,
instring, outfp);
} /* swap_order == 1 */
fprintf (outfp, "\n"); /* Printed all output for this input line */
} /* cvt2utf8 (codept, utf8_bytes) > 0 */
else {
if (interactive) { /* Print error, but keep going */
fprintf (stderr, "Out of range Unicode value > 10FFFF\n");
}
else { /* Non-interactive -- abort */
fatal_error (interactive,
"Out of range Unicode value > 10FFFF");
}
} /* cvt2utf8 (codept, utf8_bytes) < 0 (invalid code point) */
codept = 0; /* reset to zero in case next input is a blank line */
} /* input line did not start with a newline or '\0' */
} /* while not at end of input */
fclose (outfp);
exit (exit_status);
}
/*
Print an error message and quit with non-zero exit status.
If running interactively (standard input is a terminal),
do not print the program name, begin the message with an
uppercase letter, and print the help menu afterwards.
Otherwise, print the program name and begin the message
with a lowercase letter.
*/
void
fatal_error (int interactive, char *err_message)
{
void print_help ();
if (interactive) {
fprintf (stderr, "%c%s\n\n",
toupper (err_message[0]), &err_message[1]);
}
else {
fprintf (stderr, "%s: %c%s\n\n",
PROG_NAME, tolower (err_message[0]), &err_message[1]);
}
if (interactive) print_help ();
exit (EXIT_FAILURE);
}
/*
Print a help message showing the command syntax,
the options, and usage examples on standard output.
*/
void
print_help ()
{
fprintf (stdout, "Syntax: %s { [-e ] | [-E ] } ",
PROG_NAME);
fprintf (stdout, "[-r ]\n");
fprintf (stdout, " [ [-u ] | -n] [-c] [-s]\n");
fprintf (stdout, " [-i ] [-o ]\n\n");
fprintf (stdout, " , , , and \n");
fprintf (stdout, " are printf format strings\n\n");
fprintf (stdout, " -e Echo input code point in one format\n\n");
fprintf (stdout, " -E Echo input code point in two formats\n\n");
fprintf (stdout, " -r Print remainder of input after code point\n\n");
fprintf (stdout, " -u UTF-8 output format\n\n");
fprintf (stdout, " -n Do not print UTF-8 codes\n\n");
fprintf (stdout, " -c print +UTF-8 character after UTF-8 bytes\n\n");
fprintf (stdout, " -s Swap order: print UTF-8 string first, then input value\n\n");
fprintf (stdout, " -h\n");
fprintf (stdout, " --help This help message\n\n");
fprintf (stdout, " --version Program version information\n\n");
fprintf (stdout, " Examples:\n\n");
fprintf (stdout, " %s -e \"0x%%04X \" -u \"\\%%03o\"\n\n",
PROG_NAME);
fprintf (stdout, " %s -E \"U+%%04x = 0%%02o = \"\n\n",
PROG_NAME);
fprintf (stdout, " %s -s -e \" /* U+%%04X */\" -u \"\\%%03o\"\n\n",
PROG_NAME);
fprintf (stdout, " Valid Unicode values range from hexadecimal 0 through 10FFFF\n\n");
return;
}
/*
Convert a Unicode code point to a UTF-8 string.
The allowable Unicode range is U+0000..U+10FFFF.
codept - the Unicode code point to encode
utf8_bytes - an array of 5 elements to hold the UTF-8 encoded string;
the string will consist of up to 4 UTF-8-encoded bytes,
with null bytes filling the remaining elements through
the end of the array, utf8_bytes[4].
Returns the number of UTF-8 bytes (1 to 4), or -1 if codept
is out of range.
*/
int
cvt2utf8 (uint32_t codept, unsigned *utf8_bytes)
{
int bin_length; /* number of binary digits, for forming UTF-8 */
int byte_length; /* number of bytes of UTF-8 */
int bin_digits (uint32_t);
/*
If codept is within the valid Unicode range of
0x0 through 0x10FFFF inclusive, convert it to UTF-8.
*/
if (codept <= 0x10FFFF) {
byte_length = 0;
bin_length = bin_digits (codept);
if (bin_length < 8) { /* U+0000..U+007F */
byte_length = 1;
utf8_bytes [0] = codept;
utf8_bytes [1] =
utf8_bytes [2] =
utf8_bytes [3] =
utf8_bytes [4] = 0;
}
else if (bin_length < 12) { /* U+0080..U+07FF */
byte_length = 2;
utf8_bytes [0] = 0xC0 | ((codept >> 6) & 0x1F);
utf8_bytes [1] = 0x80 | ( codept & 0x3F);
utf8_bytes [2] =
utf8_bytes [3] =
utf8_bytes [4] = 0;
}
else if (bin_length < 17) { /* U+0800..U+FFFF */
byte_length = 3;
utf8_bytes [0] = 0xE0 | ((codept >> 12) & 0x0F);
utf8_bytes [1] = 0x80 | ((codept >> 6) & 0x3F);
utf8_bytes [2] = 0x80 | ( codept & 0x3F);
utf8_bytes [3] =
utf8_bytes [4] = 0;
}
else if (bin_length < 22) { /* U+010000..U+10FFFF */
byte_length = 4;
utf8_bytes [0] = 0xF0 | ((codept >> 18) & 0x07);
utf8_bytes [1] = 0x80 | ((codept >> 12) & 0x3F);
utf8_bytes [2] = 0x80 | ((codept >> 6) & 0x3F);
utf8_bytes [3] = 0x80 | ( codept & 0x3F);
utf8_bytes [4] = 0;
}
} /* encoded output for valid Unicode code point */
else { /* flag out of range Unicode code point */
/*
0xFF is never a valid UTF-8 byte, so testing
for it will be an easy check of a valid return value.
*/
byte_length = -1;
utf8_bytes [0] = 0xFF;
utf8_bytes [1] = 0xFF;
utf8_bytes [2] = 0xFF;
utf8_bytes [3] = 0xFF;
utf8_bytes [4] = 0;
}
return byte_length;
}
/*
Print an array of bytes comprising one UTF-8 encoded character.
outfp - the output stream file pointer
utf8_bytes - an array of 5 elements holding a null-terminated UTF-8 string
utf_format - format for fprintf to use with each byte
*/
void
fprint_utf8 (FILE *outfp,
unsigned *utf8_bytes,
char *utf_format)
{
int i; /* loop variable */
for (i = 0; utf8_bytes[i] != 0x00; i++)
fprintf (outfp, utf_format, utf8_bytes[i] & 0xFF);
return;
}
/*
Return the number of significant binary digits in an unsigned number.
*/
int
bin_digits (uint32_t itest)
{
uint32_t i;
int result;
i = 0x80000000; /* mask highest uint32_t bit */
result = 32;
while ( (i != 0) && ((itest & i) == 0) ) {
i >>= 1;
result--;
}
return result;
}
/*
Output the input line in the desired format.
codept The Unicode code point
in_formats Number of ways to format the input
code point (1 or 2)
in_format The format string to use to print the
code point
print_remainder 1 if printing remainder of input string,
0 otherwise
rem_format The format string for the remainder of input
following the code point + space
instring The entire line of input, null terminated
outfp The output file pointer
*/
void
print_instring (int codept,
int in_formats, char *in_format,
int print_remainder, char *rem_format,
char *instring, FILE *outfp)
{
int i; /* loop variable */
if (in_formats == 1) /* option '-e' specified */
fprintf (outfp, in_format, codept);
else if (in_formats == 2) /* option '-E' specified */
fprintf (outfp, in_format, codept, codept);
if (print_remainder == 1) { /* option '-r' specified */
size_t len = strlen (instring);
/* Remove the trailing newline, if present */
if (len > 0 && instring[len - 1] == '\n')
instring[len - 1] = '\0';
/* Find start of hexadecimal number */
for (i = 0;
isspace ((unsigned char) instring[i]);
i++);
if (instring[i] != '\0') {
/* Find space after hexadecimal number, stopping at end of string */
for (;
instring[i] != '\0' && !isspace ((unsigned char) instring[i]);
i++);
if (instring[i] != '\0')
i++; /* Skip the space character */
/* If a valid string, print it */
if (instring[i] != '\0') {
fprintf (outfp, rem_format, &instring[i]);
}
} /* instring[i] != '\0' */
} /* print_remainder == 1 */
return;
}
/*
Print UTF-8 encoded output string and/or UTF-8 character
print_char 1 to print the Unicode character itself as
a UTF-8 byte stream
print_codes 1 to print UTF-8 encoded bytes numerically
utf8_bytes The byte string of UTF-8 values, null-terminated
out_format The output format to use for printing UTF-8 bytes
outfp The output file pointer
*/
void
print_outstring (int print_char, int print_codes,
unsigned *utf8_bytes,
char *out_format, FILE *outfp)
{
int i; /* loop variable */
void fprint_utf8 (FILE *, unsigned *, char *);
if (print_codes == 1) { /* no option '-n' specified */
fprint_utf8 (outfp, utf8_bytes, out_format);
}
if (print_char == 1) { /* option '-c' */
fputc (' ', outfp);
for (i = 0; i < 4 && utf8_bytes[i] != '\0'; i++)
fputc (utf8_bytes[i], outfp);
}
return;
}
utf8gen-1.1/ChangeLog 0000644 0001750 0001750 00000000436 13331350403 013167 0 ustar paul paul 2018-08-04 Paul Hardy
* Version 1.1. Added "orig" target in Makefile.am to remove
all Autotools-generated files, returning the source package
to its pristine state.
2018-07-14 Paul Hardy
* Version 1.0. Initial version.
utf8gen-1.1/AUTHORS 0000644 0001750 0001750 00000000135 13322423110 012455 0 ustar paul paul The utf8gen package was written by
Paul Hardy
in June-July 2018.
utf8gen-1.1/NEWS 0000644 0001750 0001750 00000000522 13331350271 012113 0 ustar paul paul 2018-08-04 Version 1.1.
Added "make orig" target to restore pristine pre-Autotools tarball.
This is to help downstream distros port packages to new architectures,
as well as to guarantee that all files are built with the latest
versions of Autotools programs.
2018-07-14 Version 1.0.
Initial version of utf8gen.
utf8gen-1.1/COPYING 0000644 0001750 0001750 00000120144 13316507655 012467 0 ustar paul paul GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.
GNU Free Documentation License
Version 1.3, 3 November 2008
Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other
functional and useful document "free" in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.
This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense. It
complements the GNU General Public License, which is a copyleft
license designed for free software.
We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does. But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License
principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License. Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein. The "Document", below,
refers to any such manual or work. Any member of the public is a
licensee, and is addressed as "you". You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.
A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject. (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.) The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.
The "Invariant Sections" are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License. If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant. The Document may contain zero
Invariant Sections. If the Document does not identify any Invariant
Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License. A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters. A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML, PostScript or PDF designed for human modification. Examples of
transparent image formats include PNG, XCF and JPG. Opaque formats
include proprietary formats that can be read and edited only by
proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML, PostScript or PDF produced by some word
processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.
The "publisher" means any person or entity that distributes copies of
the Document to the public.
A section "Entitled XYZ" means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language. (Here XYZ stands for a
specific section name mentioned below, such as "Acknowledgements",
"Dedications", "Endorsements", or "History".) To "Preserve the Title"
of such a section when you modify the Document means that it remains a
section "Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document. These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no
other conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and
you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify
you as the publisher of these copies. The front cover must present
the full title with all words of the title equally prominent and
visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.
If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.
It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to
give them a chance to provide you with an updated version of the
Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document). You may use the same title as a previous version
if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.
C. State on the Title page the name of the publisher of the
Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If
there is no section Entitled "History" in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on. These may be placed in the "History" section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications",
Preserve the Title of the section, and preserve in the section all
the substance and tone of each of the contributor acknowledgements
and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section titles.
M. Delete any section Entitled "Endorsements". Such a section
may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements"
or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.
You may add a section Entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.
You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version. Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled "History"
in the various original documents, forming one section Entitled
"History"; likewise combine any sections Entitled "Acknowledgements",
and any sections Entitled "Dedications". You must delete all sections
Entitled "Endorsements".
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other
documents released under this License, and replace the individual
copies of this License in the various documents with a single copy
that is included in the collection, provided that you follow the rules
of this License for verbatim copying of each of the documents in all
other respects.
You may extract a single document from such a collection, and
distribute it individually under this License, provided you insert a
copy of this License into the extracted document, and follow this
License in all other respects regarding verbatim copying of that
document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an "aggregate" if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers. In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.
If a section in the Document is Entitled "Acknowledgements",
"Dedications", or "History", the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense, or distribute it is void, and
will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your license
from a particular copyright holder is reinstated (a) provisionally,
unless and until the copyright holder explicitly and finally
terminates your license, and (b) permanently, if the copyright holder
fails to notify you of the violation by some reasonable means prior to
60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, receipt of a copy of some or all of the same material does
not give you any rights to use it.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the
GNU Free Documentation License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in
detail to address new problems or concerns. See
https://www.gnu.org/licenses/.
Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License "or any later version" applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation. If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation. If the Document
specifies that a proxy can decide which future versions of this
License can be used, that proxy's public statement of acceptance of a
version permanently authorizes you to choose that version for the
Document.
11. RELICENSING
"Massive Multiauthor Collaboration Site" (or "MMC Site") means any
World Wide Web server that publishes copyrightable works and also
provides prominent facilities for anybody to edit those works. A
public wiki that anybody can edit is an example of such a server. A
"Massive Multiauthor Collaboration" (or "MMC") contained in the site
means any set of copyrightable works thus published on the MMC site.
"CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0
license published by Creative Commons Corporation, a not-for-profit
corporation with a principal place of business in San Francisco,
California, as well as future copyleft versions of that license
published by that same organization.
"Incorporate" means to publish or republish a Document, in whole or in
part, as part of another Document.
An MMC is "eligible for relicensing" if it is licensed under this
License, and if all works that were first published under this License
somewhere other than this MMC, and subsequently incorporated in whole or
in part into the MMC, (1) had no cover texts or invariant sections, and
(2) were thus incorporated prior to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the site
under CC-BY-SA on the same site at any time before August 1, 2009,
provided the MMC is eligible for relicensing.
ADDENDUM: How to use this License for your documents
To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:
Copyright (c) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the "with...Texts." line with this:
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.
If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.