hexpat-0.20.13/ 0000755 0000000 0000000 00000000000 13122604047 011333 5 ustar 00 0000000 0000000 hexpat-0.20.13/Setup.lhs 0000644 0000000 0000000 00000000116 13122604047 013141 0 ustar 00 0000000 0000000 #! /usr/bin/env runhaskell
> import Distribution.Simple
> main = defaultMain
hexpat-0.20.13/LICENSE 0000644 0000000 0000000 00000002611 13122604047 012340 0 ustar 00 0000000 0000000 Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the author nor the names of contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
hexpat-0.20.13/hexpat.cabal 0000644 0000000 0000000 00000015460 13122604047 013616 0 ustar 00 0000000 0000000 Cabal-Version: >= 1.6
Name: hexpat
Version: 0.20.13
Synopsis: XML parser/formatter based on expat
Description:
This package provides a general purpose Haskell XML library using Expat to
do its parsing (Expat is a fast stream-oriented XML
parser written in C). It is extensible to any string type, with @String@,
@ByteString@ and @Text@ provided out of the box.
.
Basic usage: Parsing a tree (/Tree/), formatting a tree (/Format/).
Other features: Helpers for processing XML trees (/Proc/), trees annotated with
XML source location (/Annotated/), extended XML trees with comments,
processing instructions, etc (/Extended/), XML cursors (/Cursor/),
SAX-style parse (/SAX/), and access to the low-level interface in case speed
is paramount (/Internal.IO/).
.
The design goals are speed, speed, speed, interface simplicity and modularity.
.
For introduction and examples, see the /Text.XML.Expat.Tree/ module. For benchmarks, see the package homepage.
.
If you want to do interactive I\/O, an obvious option is to use lazy parsing
with one of the lazy I\/O functions such as hGetContents. However, this can be
problematic in some applications because it doesn't handle I\/O errors properly
and can give no guarantee of timely resource cleanup. Because it is built on the
generalized list class from the @List@ package, Hexpat is designed to allow for
chunked I\/O, but as of this writing I haven't done a nice integration with
enumerator and friends.
.
/IO/ is filed under /Internal/ because it's low-level and most users won't want
it. The other /Internal/ modules are re-exported by /Annotated/, /Tree/ and /Extended/,
so you won't need to import them directly.
.
If you have trouble building on Windows, you can try the bundle flag. This will
make it build from the source of libexpat bundled inside the hexpat package:
cabal install -f bundle hexpat
.
Credits to Iavor Diatchki and the @xml@ (XML.Light) package for /Proc/ and /Cursor/.
Thanks to the many contributors.
.
ChangeLog: 0.15 changes intended to fix a (rare) \"error: a C finalizer called back into Haskell.\"
that seemed to happen only on ghc6.12.X; 0.15.1 Fix broken Annotated parse;
0.16 switch from mtl to transformers; 0.17 fix mapNodeContainer & rename some things.;
0.18 rename defaultEncoding to overrideEncoding. 0.18.3 formatG and indent were demanding list
items more than once (inefficient in chunked processing); 0.19 add Extended.hs;
0.19.1 fix a memory leak introduced in 0.19, delegate parsing to bound thread
if unbound (see note above); 0.19.2 include expat source code so \'cabal install\' just works
on Linux, Mac and Windows (thanks Jacob Stanley); 0.19.3 fix misconfiguration of expat
which broke entity parsing; 0.19.4 bump version constraint for text; 0.19.5 bump text
to < 0.12 and fix text-0.10.0.1 breakage; 0.19.6 dependency breakage with List;
0.19.7 ghc-7.2.1 compatibility; 0.19.8 fix space leak on lazy parse under ghc-7.2.1;
0.19.9 fix formatting of > character + improve performance; 0.19.10 ghc-7.4.x compatibility;
0.20.1 fix an unfortunate crash when used in parallel processing and greatly improve
performance; 0.20.2 make parseSaxG lazier; 0.20.3 minor build issues; 0.20.4 remove
dependency on extensible-exceptions; 0.20.5 bump text upper bound; 0.20.6 bump text again
to include 1.1.x.x; 0.20.7 bump text again for 1.2.x.x; 0.20.8 bump utf8-string dep;
0.20.9 bump deepseq dep/ghc-7.10 compatibility.; 0.20.10 increase dependency upper bounds;
0.20.11 update to libexpat-2.2.1 which includes several security fixes;
0.20.12 use the system libexpat by default, but provide a bundle flag to allow a bundled
copy of expat to be used, which might make life easier on Windows: cabal install -f bundle
hexpat; 0.20.13 Fix some mistakes made in 0.20.12 cabal file.
Category: XML
License: BSD3
License-File: LICENSE
Author:
Stephen Blackheath [blackh] (the primary author),
Doug Beardsley,
Gregory Collins,
Evan Martin (who started the project),
Matthew Pocock [drdozer],
Kevin Jardine,
Jacob Stanley,
Simon Hengel
Maintainer: Stephen Blackheath
Copyright:
(c) 2009 Doug Beardsley ,
(c) 2009-2012 Stephen Blackheath ,
(c) 2009 Gregory Collins,
(c) 2008 Evan Martin ,
(c) 2009 Matthew Pocock ,
(c) 2007-2009 Galois Inc.,
(c) 2010 Kevin Jardine,
(c) 2012 Simon Hengel
Homepage: http://haskell.org/haskellwiki/Hexpat/
Extra-Source-Files:
test/hexpat-tests.cabal,
test/test.xml,
test/suite/TestSuite.hs,
test/suite/Text/XML/Expat/Proc/Tests.hs,
test/suite/Text/XML/Expat/UnitTests.hs,
test/suite/Text/XML/Expat/Tests.hs,
test/suite/Text/XML/Expat/Cursor/Tests.hs,
test/suite/Text/XML/Expat/ParallelTest.hs,
test/suite/Text/XML/Expat/ParseFormat.hs,
test/thread-leak/build.sh,
test/thread-leak/callme.c,
test/thread-leak/cleak.c,
test/thread-leak/clean.sh,
test/thread-leak/thread-leak.hs,
test/hexpat-leak/instant-message.llsd,
test/hexpat-leak/Parse.hs,
test/hexpat-leak/run.sh,
test/hexpat-leak/build.sh
test/readRoads.hs,
test/ROADS.xml,
cbits/winconfig.h,
cbits/xmltok.h
cbits/winconfig.h
cbits/xmltok_ns.c
cbits/internal.h
cbits/utf8tab.h
cbits/siphash.h
cbits/latin1tab.h
cbits/xmltok.h
cbits/expat.h
cbits/xmltok.c
cbits/iasciitab.h
cbits/asciitab.h
cbits/README
cbits/xmlparse.c
cbits/xmltok_impl.h
cbits/xmltok_impl.c
cbits/xmlrole.c
cbits/xmlrole.h
cbits/expat_external.h
cbits/ascii.h
cbits/nametab.h
Build-Type: Simple
Stability: beta
source-repository head
type: git
location: https://github.com/the-real-blackh/hexpat
Flag bundle {
Description: Use bundled libexpat
Default: False
}
Library
Build-Depends:
base >= 3 && < 5,
bytestring,
transformers,
text >= 0.5.0.0 && < 1.3.0.0,
utf8-string >= 0.3 && < 1.1,
deepseq >= 1.1.0.0 && < 1.5.0.0,
containers,
List >= 0.4.2 && < 0.7
Exposed-Modules:
Text.XML.Expat.Annotated,
Text.XML.Expat.Cursor,
Text.XML.Expat.Extended,
Text.XML.Expat.Format,
Text.XML.Expat.Proc,
Text.XML.Expat.SAX,
Text.XML.Expat.Tree,
Text.XML.Expat.Internal.DocumentClass,
Text.XML.Expat.Internal.IO,
Text.XML.Expat.Internal.Namespaced,
Text.XML.Expat.Internal.NodeClass,
Text.XML.Expat.Internal.Qualified
ghc-options: -Wall -fno-warn-name-shadowing
if flag(bundle) {
include-dirs: cbits
c-sources:
cbits/xmlparse.c,
cbits/xmlrole.c,
cbits/xmltok.c,
cbits/xmltok_impl.c,
cbits/xmltok_ns.c,
Text/XML/Expat/Internal/Glue.c
cc-options: -DHAVE_MEMMOVE -DXML_NS -DXML_DTD
}
else {
c-sources:
Text/XML/Expat/Internal/Glue.c
extra-libraries: expat
}
hexpat-0.20.13/test/ 0000755 0000000 0000000 00000000000 13122604047 012312 5 ustar 00 0000000 0000000 hexpat-0.20.13/test/ROADS.xml 0000644 0000000 0000000 00000253105 13122604047 013712 0 ustar 00 0000000 0000000
hexpat-0.20.13/test/test.xml 0000644 0000000 0000000 00000010400 13122604047 014006 0 ustar 00 0000000 0000000
Gana Prajatantri Bangladesh
Bangladesci
Bangladesh
An Bhanglaidéis
Bangladesh - বাংলাদেশ
Bangladesia
ບັງກະລາເທດ
Bangladeş
Bangladesj
Bangladeša
Bangladešas
บังคลาเทศ
Бангладеш
బంగ్లాదేశ్
பங்களாதேஷ்
Bangladesch
Bangladesh
བངྒ་ལ་དེཤ
Bangladesh
ಬಾಂಗ್ಲಾದೇಶ
বাংলাদেশ
Μπανγκλαντές
Bangladeŝo
Bangladesh
孟加拉国
Bangladesh
Bangladesh
Bangladesh
Бангладеш
Bangladesh
Бангладэш
Бангладеш
Bangladesh
Bangladesh
Gonaoprojatontri Bangladesh
Bangladeš
バングラデシュ
Bangladèsh
Bangladesch
Бангладеш
Bangla Desh
Bangladesh
Bangladéš
بنګلهدیش
Bangladesh
Bangladesh
Bangladesz
Բանգլադեշ
Bangladeš
Bangladèch
Banglades
बंगलादेश
בנגלאדש
Bangladesh
ബംഗ്ലാദേശ്
Бангладеш
بنگلہ دیش
Bangladexx
Бангладеш
बांगलादेश
بېنگلا
Bangladesj
Bangladesh
Bangladess
ባንግላዲሽ
Bangladesh
Bangladesh
بنغلاديش
Bangladesh
Bangladesh
Bangladesh
बंगलादेश
Bangladesh
Bangladesh
Bangladesh
Bangladesh
Bangladesh
Bangaala-Deesh
Bangladesh
Bangladesh
Banglades
بنگلادش
Bangladesh
Bangladesj
ბანგლადეში
Бангладеш
Bangladeshi
방글라데시
Bangladesh
បង់ក្លាដេស្ហ
Bangladéš
Bangladeš
Bangladesh
Bangladeş
Bangladeš
Bangladesh
hexpat-0.20.13/test/readRoads.hs 0000644 0000000 0000000 00000001137 13122604047 014554 0 ustar 00 0000000 0000000 {-# LANGUAGE OverloadedStrings #-}
import Text.XML.Expat.Tree
import Control.Monad
import Data.Text (Text)
import qualified Data.Text as T
import Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy as L
import Data.Maybe
-- Reads the contents of ROADS.xml from stdin
main :: IO ()
main = do
    bs <- L.getContents
    let Element _ _ chs = parseThrowing defaultParseOptions bs :: UNode Text
    forM_ chs $ \ch -> do
        case ch of
            elt@(Element "shape" _ _) -> do
                putStrLn $ T.unpack $ fromMaybe "" $ getAttribute elt "FULL_NAME"
            _ -> return ()
hexpat-0.20.13/test/hexpat-tests.cabal 0000644 0000000 0000000 00000001042 13122604047 015724 0 ustar 00 0000000 0000000 Cabal-Version: >= 1.4
Name: hexpat-tests
Version: 0.11
Build-Type: Simple
Executable testsuite
hs-source-dirs: suite
main-is: TestSuite.hs
build-depends:
HUnit < 1.3,
QuickCheck >= 2.7.0.0,
base >= 3 && < 5,
bytestring,
containers,
transformers,
deepseq >= 1.1.0.0,
parallel >= 3.1.0.0,
test-framework,
test-framework-hunit,
test-framework-quickcheck2,
text >= 0.5,
utf8-string >= 0.3.3,
List >= 0.4.2,
mtl,
random,
hexpat
ghc-options: -Wall -fhpc -threaded
hexpat-0.20.13/test/hexpat-leak/ 0000755 0000000 0000000 00000000000 13122604047 014515 5 ustar 00 0000000 0000000 hexpat-0.20.13/test/hexpat-leak/instant-message.llsd 0000644 0000000 0000000 00000002627 13122604047 020506 0 ustar 00 0000000 0000000
hexpat-0.20.13/test/hexpat-leak/run.sh 0000644 0000000 0000000 00000000061 13122604047 015652 0 ustar 00 0000000 0000000 ./Parse instant-message.llsd 10 10000 >/dev/null
hexpat-0.20.13/test/hexpat-leak/build.sh 0000644 0000000 0000000 00000000057 13122604047 016152 0 ustar 00 0000000 0000000 rm -f Parse.o
ghc -O Parse.hs --make -threaded
hexpat-0.20.13/test/hexpat-leak/Parse.hs 0000644 0000000 0000000 00000001327 13122604047 016126 0 ustar 00 0000000 0000000 -- Thanks to Bryan O'Sullivan for this test case.
-- Before the fix in 0.19.1, hexpat would spawn zillions of threads here (seen
-- as huge virtual memory usage in top).
import Control.Concurrent
import Control.Monad
import qualified Data.ByteString as B
import Text.XML.Expat.Tree
import System.Environment
main = do
    [path, threads, reads] <- getArgs
    let nthreads = read threads
    qs <- newQSem 0
    replicateM_ nthreads $ do
        forkIO $ do
            replicateM_ (read reads) $ do
                bs <- B.readFile path
                case parse' defaultParseOptions bs of
                    Left err -> print err
                    Right p -> print (p :: UNode B.ByteString)
            signalQSem qs
    replicateM_ nthreads $ waitQSem qs
    putStrLn "done"
hexpat-0.20.13/test/suite/ 0000755 0000000 0000000 00000000000 13122604047 013443 5 ustar 00 0000000 0000000 hexpat-0.20.13/test/suite/TestSuite.hs 0000644 0000000 0000000 00000001613 13122604047 015731 0 ustar 00 0000000 0000000 module Main where
import qualified Text.XML.Expat.UnitTests
import qualified Text.XML.Expat.Cursor.Tests
import qualified Text.XML.Expat.Proc.Tests
import qualified Text.XML.Expat.ParseFormat
import qualified Text.XML.Expat.ParallelTest
import Test.Framework (defaultMain, testGroup)
main :: IO ()
main = defaultMain tests
where tests = [ testGroup "unit tests"
Text.XML.Expat.UnitTests.tests
, testGroup "Text.XML.Expat.Proc"
Text.XML.Expat.Proc.Tests.tests
, testGroup "Text.XML.Expat.Cursor"
Text.XML.Expat.Cursor.Tests.tests
, testGroup "Text.XML.Expat.ParseFormat"
Text.XML.Expat.ParseFormat.tests
, testGroup "Text.XML.Expat.ParallelTest"
Text.XML.Expat.ParallelTest.tests
]
hexpat-0.20.13/test/suite/Text/ 0000755 0000000 0000000 00000000000 13122604047 014367 5 ustar 00 0000000 0000000 hexpat-0.20.13/test/suite/Text/XML/ 0000755 0000000 0000000 00000000000 13122604047 015027 5 ustar 00 0000000 0000000 hexpat-0.20.13/test/suite/Text/XML/Expat/ 0000755 0000000 0000000 00000000000 13122604047 016110 5 ustar 00 0000000 0000000 hexpat-0.20.13/test/suite/Text/XML/Expat/Tests.hs 0000644 0000000 0000000 00000004126 13122604047 017551 0 ustar 00 0000000 0000000 {-# LANGUAGE OverloadedStrings, FlexibleInstances, TypeSynonymInstances #-}
module Text.XML.Expat.Tests
( TCursor
, TNode
, testTagSet
, testTextSet
, testAttrSet )
where
import Control.Applicative
import Control.Monad (liftM)
import Data.ByteString.Char8 (ByteString)
import qualified Data.Map as M
import Test.QuickCheck
import Text.XML.Expat.Cursor (Cursor)
import Text.XML.Expat.Tree
------------------------------------------------------------------------------
type TCursor = Cursor ByteString ByteString
type TNode = Node ByteString ByteString
testTagSet :: [ByteString]
testTagSet = [ "apple"
, "banana"
, "cauliflower"
, "duck"
, "eel"
, "ferret"
, "grape" ]
testTextSet :: [ByteString]
testTextSet = [ "zoo"
, "yellow"
, "xylophone"
, "wet"
, "vulture"
, "ululate"
, "tympani" ]
testAttrSet :: [ByteString]
testAttrSet = [ "sheep"
, "ram"
, "quail"
, "penguin"
, "ox"
, "narwhal" ]
instance Arbitrary TNode where
arbitrary = mkElem 0
where
depth :: Int -> Gen TNode
depth n = do
prob <- (choose (0, 1) :: Gen Float)
if prob < 0.75 then mkElem n else mkText
mkAttr = do
key <- elements testAttrSet
val <- elements testAttrSet
return (key,val)
mkText = liftM Text $ elements testTextSet
mkElem n = do
nchildren <- if n > 3
then return 0
else choose ((0,6) :: (Int,Int))
nattrs <- choose ((0,4) :: (Int,Int))
attrs <- M.toList . M.fromList -- remove duplicate attributes
<$> sequence (replicate nattrs mkAttr)
children <- sequence $ replicate nchildren (depth (n+1))
tagname <- elements testTagSet
return $ Element tagname attrs children
hexpat-0.20.13/test/suite/Text/XML/Expat/UnitTests.hs 0000644 0000000 0000000 00000036276 13122604047 020424 0 ustar 00 0000000 0000000 module Text.XML.Expat.UnitTests where
import Text.XML.Expat.Tree hiding (parse)
import qualified Text.XML.Expat.Tree as Tree
import Text.XML.Expat.SAX (SAXEvent(..))
import qualified Text.XML.Expat.SAX as SAX
import Text.XML.Expat.Cursor
import Text.XML.Expat.Format
import Text.XML.Expat.Extended (LDocument)
import qualified Text.XML.Expat.Extended as Extended
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.ByteString.Lazy as L
import qualified Data.Text as T
import Foreign
import Foreign.C
import Data.ByteString.Internal (c2w, w2c)
import Data.Char
import Data.Maybe
import Data.Monoid
import Data.IORef
import Control.Applicative
import Control.Arrow (first)
import Control.Exception as E
import Control.Monad
import Control.DeepSeq
import Test.HUnit hiding (Node)
import System.IO
import Test.Framework.Providers.HUnit (hUnitTestToTests)
toByteStringL :: String -> L.ByteString
toByteStringL = L.pack . map c2w
fromByteStringL :: L.ByteString -> String
fromByteStringL = map w2c . L.unpack
toByteString :: String -> B.ByteString
toByteString = B.pack . map c2w
fromByteString :: B.ByteString -> String
fromByteString = map w2c . B.unpack
testDoc :: (Show tag, Show text) =>
(ParseOptions tag text
-> bs
-> Either XMLParseError (Node tag text))
-> (Node tag text -> L.ByteString)
-> (String -> bs)
-> String
-> Int
-> String
-> IO ()
testDoc parseFn fmt toBS descr0 idx xml = do
let eTree = parseFn popts (toBS xml)
descr = descr0++" #"++show idx
case eTree of
Right tree -> do
let out = fromByteStringL $ fmt tree
assertEqual descr xml out
Left error -> do
hPutStrLn stderr $ "parse failed: "++show error
assertFailure descr
where
popts = defaultParseOptions { overrideEncoding = Just UTF8 }
eitherify f mEnc bs = do
case f mEnc bs of
(_, Just err) -> Left err
(doc, Nothing) -> Right doc
test_error1 :: IO ()
test_error1 = do
let eDoc = Tree.parse' defaultParseOptions (toByteString "") :: Either XMLParseError (UNode String)
assertEqual "error1" (Left $ XMLParseError "mismatched tag" (XMLParseLocation 1 9 9 0)) eDoc
test_error2 :: IO ()
test_error2 = do
assertEqual "error2" (
Element {eName = "hello", eAttributes = [], eChildren = []},
Just (XMLParseError "mismatched tag" (XMLParseLocation 1 9 9 0))
) (Tree.parse defaultParseOptions
(toByteStringL "") :: (UNode String, Maybe XMLParseError))
test_error3 :: IO ()
test_error3 =
assertEqual "error3" (
Element {eName = "open", eAttributes = [], eChildren = [
Element {eName = "test1", eAttributes = [], eChildren = [Text "Hello"]},
Element {eName = "hello", eAttributes = [], eChildren = []}
]},
Just (XMLParseError "mismatched tag" (XMLParseLocation 1 35 35 0))
) $ Tree.parse defaultParseOptions
(toByteStringL "Hello")
test_error4 :: IO ()
test_error4 = do
let eDoc = Tree.parse' defaultParseOptions (toByteString "!") :: Either XMLParseError (UNode String)
assertEqual "error4" (Left $ XMLParseError "not well-formed (invalid token)"
(XMLParseLocation 1 0 0 0)) eDoc
test_entities1 = do
assertEqual "parse error" merr Nothing
assertEqual "entity substitution" (Text "foo") c
where
xml = "&entity;"
popts = defaultParseOptions { entityDecoder = Just entityLookup }
(tree,merr) = Tree.parse popts $ toByteStringL xml
c = current $ fromJust $ firstChild $ fromTree tree
entityLookup b = if b == "entity"
then Just "foo"
else Nothing
test_entities2 = do
assertEqual "wrong answer" (Element "html" [] [Text "\228"], Nothing) pr
where
pr :: (UNode String, Maybe XMLParseError)
pr = Tree.parse opt $ LC.pack "ä"
where
opt = defaultParseOptions
{ entityDecoder = Just ed }
ed "auml" = Just "\228"
ed _ = Nothing
test_textContent = do
let tree = Element "cheese" [("type", "edam")]
[Text "You don't actually ",
Element "sub" [] [Text "have any "],
Text "cheese at all",
Text ", do you?"]
assertEqual "textContent" "You don't actually have any cheese at all, do you?" (textContent tree)
testXMLFile :: IO String
testXMLFile = do
s <- map w2c . B.unpack <$> B.readFile "test.xml"
-- Remove trailing newline
return (reverse . dropWhile (== '\n') . reverse $ s)
test_indent = do
let tests = [
("#1",
toByteString "\n",
toByteString "\n"),
("#2",
toByteString "\nWith some text in it",
toByteString "\nWith some text in it"),
("#3",
toByteString $ "\n"++
"",
toByteString $ "\n"++
"\n \n \n \n"),
("#4",
toByteString $ "\n"++
"strengthSlaveryPeace",
toByteString $ "\n"++
"\n strength\n Slavery\n Peace\n"),
("#5",
toByteString $ "\n"++
"Extra hereMinistry of TruthIn between"++
"Ministry of Love\n And some more"++
"Ministry of Plenty"++
"Ministry of PeaceEurasia"++
"strength",
toByteString $ "\n"++
"Extra here\n \n Ministry of TruthIn between"++
"\n Ministry of LoveAnd some more"++
"\n Ministry of Plenty"++
"\n Ministry of Peace\n Eurasia\n \n "++
"\n \n strength\n \n")
]
forM_ tests $ \(name, inp, outSB) -> do
let eTree = Tree.parse' defaultParseOptions inp :: Either XMLParseError (UNode String)
case eTree of
Left err -> assertFailure $ show err
Right tree -> do
let outIS = format' (indent 2 tree)
assertEqual name outSB outIS
test_setAttribute :: IO ()
test_setAttribute = do
assertEqual "#1" [("abc", "def")] $ getAttributes $
setAttribute "abc" "def"
(Element "test" [] [])
assertEqual "#2" [("abc", "def")] $ getAttributes $
setAttribute "abc" "def"
(Element "test" [("abc", "xyzzy")] [])
assertEqual "#2" [("abc", "def"), ("abc", "xyzzy")] $ getAttributes $
setAttribute "abc" "def"
(Element "test" [("abc", "zapf"), ("abc", "xyzzy")] [])
assertEqual "#3" [("zanzi", "zapf"), ("bar", "xyzzy"), ("abc", "def")] $ getAttributes $
setAttribute "abc" "def"
(Element "test" [("zanzi", "zapf"), ("bar", "xyzzy")] [])
assertEqual "#4" [("zanzi", "zapf"), ("bar", "xyzzy")] $ getAttributes $
deleteAttribute "abc"
(Element "test" [("zanzi", "zapf"), ("bar", "xyzzy"), ("abc", "def")] [])
assertEqual "#5" [("zanzi", "zapf"), ("abc", "def")] $ getAttributes $
deleteAttribute "bar"
(Element "test" [("zanzi", "zapf"), ("bar", "xyzzy"), ("abc", "def")] [])
assertEqual "#6" [("zanzi", "zapf"), ("bar", "xyzzy"), ("abc", "def")] $ getAttributes $
deleteAttribute "bumpf"
(Element "test" [("zanzi", "zapf"), ("bar", "xyzzy"), ("abc", "def")] [])
simpleDocs = [
"\n"++
"Cat & mouseDog & bone",
"\n"++
"Cat & mouseDog & boneRose & Crown",
"\nCat & mouse"
]
data ParseFormatTest = ParseFormatTest {
}
test_xmlDecl1 :: IO ()
test_xmlDecl1 = do
assertEqual "plain" [XMLDeclaration "1.0" Nothing Nothing,StartElement "hello" [],EndElement "hello"]
(SAX.parse defaultParseOptions (LC.pack "") :: [SAXEvent String String])
assertEqual "withEnc" [XMLDeclaration "1.0" (Just "UTF-8") Nothing,StartElement "hello" [],EndElement "hello"]
(SAX.parse defaultParseOptions (LC.pack "") :: [SAXEvent String String])
assertEqual "SA0" [XMLDeclaration "1.0" Nothing (Just False),StartElement "hello" [],EndElement "hello"]
(SAX.parse defaultParseOptions (LC.pack "") :: [SAXEvent String String])
assertEqual "SA1" [XMLDeclaration "1.0" Nothing (Just True),StartElement "hello" [],EndElement "hello"]
(SAX.parse defaultParseOptions (LC.pack "") :: [SAXEvent String String])
assertEqual "SA0enc" [XMLDeclaration "1.0" (Just "UTF-8") (Just False),StartElement "hello" [],EndElement "hello"]
(SAX.parse defaultParseOptions (LC.pack "") :: [SAXEvent String String])
assertEqual "SA1enc" [XMLDeclaration "1.0" (Just "UTF-8") (Just True),StartElement "hello" [],EndElement "hello"]
(SAX.parse defaultParseOptions (LC.pack "") :: [SAXEvent String String])
normalizeSAXText :: Monoid t => [SAXEvent s t] -> [SAXEvent s t]
normalizeSAXText (CharacterData a:CharacterData b:xs) = normalizeSAXText (CharacterData (a `mappend` b):xs)
normalizeSAXText (x:xs) = x:normalizeSAXText xs
normalizeSAXText [] = []
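-- Illustrative sketch, not part of the original suite: the name
-- 'test_normalizeSAXText' is hypothetical and the test is not registered in
-- 'tests'. It shows what 'normalizeSAXText' is for: the parser may deliver one
-- run of text as several 'CharacterData' events, so adjacent ones are merged
-- before comparing, while all other events pass through unchanged.
test_normalizeSAXText :: IO ()
test_normalizeSAXText =
    assertEqual "normalizeSAXText"
        [StartElement "a" [], CharacterData "foobar", EndElement "a"]
        (normalizeSAXText
            [ StartElement "a" []
            , CharacterData "foo"
            , CharacterData "bar"
            , EndElement "a"
            ] :: [SAXEvent String String])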
test_various :: IO ()
test_various =
assertEqual "var1" [
StartElement "test" [],
StartElement "sample" [("id","5")],
CharacterData "This \"text with quotations\" should be escaped.",
EndElement "sample",
StartElement "mytest" [],
CharacterData "//",
StartCData,
CharacterData "This \"text with quotations\" should not be escaped.//",
EndCData,
EndElement "mytest",
ProcessingInstruction "php" "somecode(); ",
Comment " this is a comment ",
EndElement "test"
]
(normalizeSAXText $ SAX.parse defaultParseOptions (LC.pack variousText) :: [SAXEvent String String])
variousText =
""++
"This \"text with quotations\" should be escaped."++
""++
"//"++
""++
""++
""++
""
quotationOut =
"This &quot;text with quotations&quot; should be escaped."++
"//"++
""
test_quotation =
assertEqual "quotation"
(quotationOut, Nothing)
$ first (C.unpack . mconcat . LC.toChunks . formatDocument)
$ (Extended.parse defaultParseOptions (LC.pack variousText) :: (LDocument String String, Maybe XMLParseError))
tests = hUnitTestToTests $
TestList [
t' ("String",
Tree.parse' :: ParseOptions String String
-> B.ByteString
-> Either XMLParseError (Node String String),
format),
t' ("ByteString",
Tree.parse' :: ParseOptions B.ByteString B.ByteString
-> B.ByteString
-> Either XMLParseError (Node B.ByteString B.ByteString),
format),
t' ("Text",
Tree.parse' :: ParseOptions T.Text T.Text
-> B.ByteString
-> Either XMLParseError (Node T.Text T.Text),
format),
t ("String/Lazy",
eitherify $ Tree.parse :: ParseOptions String String
-> L.ByteString
-> Either XMLParseError (Node String String),
format),
t ("ByteString/Lazy",
eitherify $ Tree.parse :: ParseOptions B.ByteString B.ByteString
-> L.ByteString
-> Either XMLParseError (Node B.ByteString B.ByteString),
format),
t ("Text/Lazy",
eitherify $ Tree.parse :: ParseOptions T.Text T.Text
-> L.ByteString
-> Either XMLParseError (Node T.Text T.Text),
format),
TestLabel "error1" $ TestCase $ test_error1,
TestLabel "error2" $ TestCase $ test_error2,
TestLabel "error3" $ TestCase $ test_error3,
TestLabel "error4" $ TestCase $ test_error4,
TestLabel "entities1" $ TestCase $ test_entities1,
TestLabel "entities2" $ TestCase $ test_entities2,
TestLabel "textContent" $ TestCase $ test_textContent,
TestLabel "indent" $ TestCase $ test_indent,
TestLabel "setAttribute" $ TestCase $ test_setAttribute,
TestLabel "xmlDecl1" $ TestCase $ test_xmlDecl1,
TestLabel "various" $ TestCase $ test_various,
TestLabel "quotation" $ TestCase $ test_quotation
]
where
t (descr, parse, fmt) = TestLabel descr $ TestCase $ do
f <- testXMLFile
let docs = f:simpleDocs
forM_ (zip [1..] docs) $ \(idx, doc) ->
testDoc parse fmt toByteStringL descr idx doc
t' (descr, parse, fmt) = TestLabel descr $ TestCase $ do
f <- testXMLFile
let docs = f:simpleDocs
forM_ (zip [1..] docs) $ \(idx, doc) ->
testDoc parse fmt toByteString descr idx doc
hexpat-0.20.13/test/suite/Text/XML/Expat/ParallelTest.hs 0000644 0000000 0000000 00000004420 13122604047 021040 0 ustar 00 0000000 0000000 {-# LANGUAGE OverloadedStrings #-}
-- | The purpose of this test is to make sure that if we run lots of parses on
-- multiple threads, that they all give the correct answers. This is important,
-- because this implementation is imperative code hidden inside an unsafePerformIO.
module Text.XML.Expat.ParallelTest where
import Text.XML.Expat.Tests -- Arbitrary instance
import Text.XML.Expat.ParseFormat (normalizeText)
import Text.XML.Expat.Tree
import Text.XML.Expat.Format
import Control.Concurrent
import Control.Exception
import Control.Monad.State.Strict
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L
import Test.QuickCheck
import Test.QuickCheck.Gen
import Test.QuickCheck.Random
import Test.HUnit hiding (Node)
import System.IO
import System.Random
import Test.Framework.Providers.HUnit (hUnitTestToTests)
import Prelude hiding (catch)
tests = hUnitTestToTests $
TestList [
TestLabel "parallel (forkIO)" $ TestCase (testParallel forkIO),
TestLabel "parallel (forkOS)" $ TestCase (testParallel forkOS)
]
chunkSize = 512
breakUp :: B.ByteString -> L.ByteString
breakUp = L.fromChunks . bu
where
bu bs | B.length bs < chunkSize = [bs]
bu bs = bs1:bu bs2
where
(bs1, bs2) = B.splitAt chunkSize bs
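-- Illustrative sketch, not part of the original suite: the name
-- 'prop_breakUpRoundTrip' is hypothetical. 'breakUp' only re-chunks its input,
-- so concatenating the chunks should give back the original bytes and no chunk
-- should exceed 'chunkSize'. (Lazy ByteStrings drop empty chunks, so an empty
-- input yields no chunks at all; the check below tolerates that.)
prop_breakUpRoundTrip :: B.ByteString -> Bool
prop_breakUpRoundTrip bs =
    B.concat chunks == bs && all ((<= chunkSize) . B.length) chunks
  where
    chunks = L.toChunks (breakUp bs)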
nthreads = 5
nloops = 500
testParallel :: (IO () -> IO ThreadId) -> IO ()
testParallel fork = do
    resultMVs <- replicateM nthreads $ do
        resultMV <- newEmptyMVar
        -- Run each batch of parses on its own thread, using the supplied fork function.
        fork $ do
            replicateM_ nloops $ do
                g <- newQCGen
                let treeIn = normalizeText $ unGen (arbitrary :: Gen TNode) g 30
                    xml = breakUp $ format' treeIn
                    treeOut = normalizeText $ parseThrowing defaultParseOptions xml
                assertEqual "tree match" treeIn treeOut
                  `catch` \exc -> do
                    putStrLn $ "failing XML: "++concat (map B.unpack $ L.toChunks xml)
                    throwIO (exc :: SomeException)
            putMVar resultMV Nothing
          `catch` \exc -> do
            putMVar resultMV $ Just (exc :: SomeException)
        return resultMV
    -- Wait for every thread and re-throw the first failure, if any.
    forM_ resultMVs $ \resultMV -> do
        mExc <- takeMVar resultMV
        case mExc of
            Just exc -> throwIO exc
            Nothing -> return ()
hexpat-0.20.13/test/suite/Text/XML/Expat/ParseFormat.hs 0000644 0000000 0000000 00000023754 13122604047 020702 0 ustar 00 0000000 0000000 {-# LANGUAGE OverloadedStrings, FlexibleContexts #-}
module Text.XML.Expat.ParseFormat where
import Text.XML.Expat.Extended
import Text.XML.Expat.Format
import qualified Text.XML.Expat.Tree as Tree
import qualified Text.XML.Expat.Annotated as Annotated
import Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L
import Data.List
import Data.Maybe
import Data.Monoid
import Data.Text (Text)
import Test.Framework.Providers.HUnit (hUnitTestToTests)
import Test.HUnit
tests = hUnitTestToTests $
TestList $ concatMap mkTests pfTests
pfTests :: [PFTest]
pfTests = [
PFTest {
pfName = "quotation",
pfXML = "\n" `mappend`
"This \"text with quotations\" should be escaped.\n" `mappend`
"\n" `mappend`
"//\n" `mappend`
"\n" `mappend`
"\n" `mappend`
"\n" `mappend`
"",
pfDoc = mkPlainDocument $
Element "test" [] [
Text "\n",
Element "sample" [("id","5")] [Text "This \"text with quotations\" should be escaped."] (),
Text "\n",
Element "mytest" [] [
Text "\n",
Text "//",
CData "\nThis \"text with quotations\" should not be escaped.\nAnother line goes here.\n\nAnd more.\n//",
Text "\n"
] (),
Text "\n",
Misc (ProcessingInstruction "php" "somecode(); "),
Text "\n",
Misc (Comment " this is a comment "),
Text "\n"
] (),
pfOutXML = [(Extended,
"\n" `mappend`
-- " gets translated into &quot; here but not inside CDATA.
"This &quot;text with quotations&quot; should be escaped.\n" `mappend`
"\n" `mappend`
"//\n" `mappend`
"\n" `mappend`
"\n" `mappend`
"\n" `mappend`
"" )],
pfImpls = [Extended]
},
PFTest {
pfName = "xmlDecl1",
pfXML = "\n",
pfDoc = Document (Just (XMLDeclaration "1.0" Nothing Nothing)) Nothing [] (Element "hello" [] [] ()),
pfOutXML = [],
pfImpls = [Extended]
},
PFTest {
pfName = "xmlDecl2",
pfXML = "\n",
pfDoc = Document (Just (XMLDeclaration "1.0" (Just "ISO-8859-1") Nothing)) Nothing [] (Element "hello" [] [] ()),
pfOutXML = [],
pfImpls = [Extended]
},
PFTest {
pfName = "xmlDecl3",
pfXML = "\n",
pfDoc = Document (Just (XMLDeclaration "1.0" Nothing (Just True))) Nothing [] (Element "hello" [] [] ()),
pfOutXML = [],
pfImpls = [Extended]
},
PFTest {
pfName = "xmlDecl4",
pfXML = "\n",
pfDoc = Document (Just (XMLDeclaration "1.0" Nothing (Just False))) Nothing [] (Element "hello" [] [] ()),
pfOutXML = [],
pfImpls = [Extended]
},
PFTest {
pfName = "topLevelMiscs1",
pfXML = "\n\n\n",
pfDoc = Document (Just (XMLDeclaration "1.0" Nothing Nothing)) Nothing [
ProcessingInstruction "process" "My code",
Comment " And a comment "
] (Element "hello" [] [] ()),
pfOutXML = [],
pfImpls = [Extended]
},
PFTest {
pfName = "topLevelMiscs2",
-- Test that we can read processing instructions and comments from after the root element.
pfXML = "\n\n\n" `mappend`
"\n\n",
pfDoc = Document (Just (XMLDeclaration "1.0" Nothing Nothing)) Nothing [
ProcessingInstruction "process" "My code",
Comment " And a comment ",
Comment " Also afterwards ",
ProcessingInstruction "php" "something();"
] (Element "hello" [] [] ()),
-- In the output they appear *before* the root element, however.
pfOutXML = [(Extended,
"\n\n\n" `mappend`
"\n\n"
)],
pfImpls = [Extended]
},
PFTest {
pfName = "basic",
pfXML = "\n" `mappend`
"Cat & mouseIn between" `mappend`
"Dog & bone" `mappend`
"Rose & Crown",
pfDoc = Document (Just (XMLDeclaration "1.0" (Just "UTF-8") Nothing)) Nothing [] (
Element "second" [] [Element "test" [] [Element "test1" [("type","expression")]
[Text "Cat ",Text "&",Text " mouse"] (),Text "In between",
Element "test2" [("type","communication"),("language","Rhyming slang")]
[Text "Dog &",Text " bone"] ()] (),Element "test" []
[Text "Ro", Text "se & Crown"] ()] ()), -- Test text normalization
pfOutXML = [],
pfImpls = [Tree, Annotated, Extended]
},
PFTest {
pfName = "escaping of >",
pfXML = "\n]]>",
pfDoc = Document (Just (XMLDeclaration "1.0" (Just "UTF-8") Nothing)) Nothing [] (
Element "text" [] [Text "]]>"] ()),
pfOutXML = [],
pfImpls = [Extended]
}
]
-- | Recursively append all adjacent Text nodes.
normalizeText :: (NodeClass n [], Monoid text) => n [] tag text -> n [] tag text
normalizeText = modifyChildren combine
where
combine (t1:t2:ns) | isText t1 && isText t2 = combine ((mkText $ getText t1 `mappend` getText t2):ns)
combine (e:ns) | isElement e = normalizeText e : combine ns
combine (n:ns) = n:combine ns
combine [] = []
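-- A minimal sketch of the effect (illustrative only, not part of the original
-- suite; it assumes plain @String@ tags and text). Adjacent 'Text' children
-- are merged into one node, and the same is done inside every child element:
--
-- > normalizeText (Element "t" [] [Text "Ro", Text "se & Crown"])
-- >   == Element "t" [] [Text "Rose & Crown"]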
mkTests :: PFTest -> [Test]
mkTests pf = flip concatMap (pfImpls pf) $ \impl ->
case impl of
Tree -> [
TestLabel (pfName pf ++ "-tree") $ TestCase $ do
case Tree.parse' defaultParseOptions (pfXML pf) of
Left err -> assertFailure $ "parse failed: "++show err
Right root0 -> do
let root = normalizeText root0
sbDoc = normalizeText $ fromElement (getRoot $ pfDoc pf)
assertEqual "parse match" sbDoc (root :: Tree.UNode Text)
let sb = fromMaybe (pfXML pf) (impl `lookup` pfOutXML pf)
bs = format' root
assertEqual "format match" sb bs
]
Annotated -> [
TestLabel (pfName pf ++ "-annotated") $ TestCase $ do
case Annotated.parse' defaultParseOptions (pfXML pf) of
Left err -> assertFailure $ "parse failed: "++show err
Right root0 -> do
let root = normalizeText $ Annotated.mapAnnotation (const ()) root0
sbDoc = normalizeText $ fromElement (getRoot $ pfDoc pf)
assertEqual "parse match" sbDoc (root :: Annotated.UNode () Text)
let sb = fromMaybe (pfXML pf) (impl `lookup` pfOutXML pf)
bs = format' root
assertEqual "format match" sb bs
]
Extended -> [
TestLabel (pfName pf ++ "-extended") $ TestCase $ do
case parse' defaultParseOptions (pfXML pf) of
Left err -> assertFailure $ "parse failed: "++show err
Right doc0 -> do
let doc = modifyRoot normalizeText $ mapDocumentAnnotation (const ()) doc0
assertEqual "parse match" (modifyRoot normalizeText $ pfDoc pf) doc
let sb = fromMaybe (pfXML pf) (impl `lookup` pfOutXML pf)
bs = formatDocument' (pfDoc pf)
assertEqual "format match" sb bs
]
data Impl = Tree | Annotated | Extended deriving (Eq, Ord, Show)
data PFTest = PFTest {
pfName :: String,
pfXML :: ByteString,
pfDoc :: UDocument () Text,
pfOutXML :: [(Impl, ByteString)], -- ^ Output XML where it differs from the input XML
pfImpls :: [Impl]
}
hexpat-0.20.13/test/suite/Text/XML/Expat/Proc/ 0000755 0000000 0000000 00000000000 13122604047 017013 5 ustar 00 0000000 0000000
hexpat-0.20.13/test/suite/Text/XML/Expat/Proc/Tests.hs 0000644 0000000 0000000 00000010707 13122604047 020456 0 ustar 00 0000000 0000000
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE OverloadedStrings #-}
module Text.XML.Expat.Proc.Tests (tests) where
import Data.Maybe
import Test.Framework (Test)
import Test.Framework.Providers.QuickCheck2
import Test.QuickCheck
import Text.XML.Expat.Tests
import Text.XML.Expat.Proc
import Text.XML.Expat.Tree
tests :: [Test]
tests = [ testProperty "onlyElems" prop_onlyElems
, testProperty "onlyText" prop_onlyText
, testProperty "findChildren" prop_findChildren
, testProperty "findChildren2" prop_findChildren2
, testProperty "findChildren3" prop_findChildren3
, testProperty "filterChildren" prop_filterChildren
, testProperty "findChild" prop_findChild
, testProperty "filterElements1" prop_filterElements1
, testProperty "filterElements2" prop_filterElements2
, testProperty "filterElements3" prop_filterElements3
, testProperty "filterElements4" prop_filterElements4
, testProperty "others" prop_others
]
prop_onlyElems :: [TNode] -> Bool
prop_onlyElems nodes = all isElement els
where
els = onlyElems nodes
prop_onlyText :: [TNode] -> Bool
prop_onlyText nodes = all isAText txts
where
txts = onlyText nodes
isAText b = elem b testTextSet
prop_findChildren :: TNode -> Bool
prop_findChildren node = all p ch
where
ch = findChildren "banana" node
p (Text _) = False
p (Element nm _ _) = nm == "banana"
prop_findChildren2 :: Bool
prop_findChildren2 = ch == [child1]
where
child1 :: TNode
child1 = Element "banana" [] []
child2 :: TNode
child2 = Element "rhubarb" [] []
node :: TNode
node = Element "root" [] [child1, child2]
ch = findChildren "banana" node
prop_findChildren3 :: Bool
prop_findChildren3 = null ch
where
child :: TNode
child = Text "foo"
!ch = findChildren "banana" child
prop_filterChildren :: TNode -> Bool
prop_filterChildren node = all p ch
where
ch = filterChildrenName (=="banana") node
p (Text _) = False
p (Element nm _ _) = nm == "banana"
prop_findChild :: TNode -> Property
prop_findChild node' = isElement node' ==> r == (Just child)
where
child :: TNode
child = Element "tag" [] []
node = node' { eChildren = child:(eChildren node') }
r = findChild "tag" node
-- test positive case
prop_filterElements1 :: TNode -> Bool
prop_filterElements1 n@(Text _) = filterElements isText n == [n]
prop_filterElements1 n@(Element nm _ _) = filterElements f n == [n]
where
f = isNamed nm
-- test that all results obey the predicate
prop_filterElements2 :: TNode -> Bool
prop_filterElements2 n = p1 && p2
where
l1 = filterElements isText n
l2 = filterElements isElement n
p1 = all isText l1
p2 = all isElement l2
-- test that we grab all elements
prop_filterElements3 :: TNode -> Bool
prop_filterElements3 n = p1 && p2
where
l1 = filterElements isText n
l2 = filterElements isElement n
p1 = all isText l1
p2 = all isElement l2
-- test that all children match & that we don't recurse into matching children
prop_filterElements4 :: Property
prop_filterElements4 = forAll gen $ \node ->
let ch = getChildren node
in ch == f node && ch == g node
where
gen = do
let node = Element "banana" [] [] :: TNode
let node'= Element "banana" [] [node] :: TNode
n <- choose(0,5)
let l = node':(replicate n node)
return $ Element "root" [] l
f = filterElements (isNamed "banana")
g = filterElementsName (=="banana")
-- other functions are all trivial, this property just gives us code coverage
prop_others :: Bool
prop_others = and [p1, p2, p3, p4]
where
child1 :: TNode
child1 = Element "banana" [] []
child2 :: TNode
child2 = Element "rhubarb" [] []
node :: TNode
node = Element "root" [] [child1, child2]
fc1 = filterChild (isNamed "rhubarb") node
fc2 = filterChildName (=="rhubarb") node
fc3 = findElement "banana" node
fc4 = filterElement (isNamed "root") node
fc5 = filterChild (isNamed "root") node
fc6 = filterElementName (=="root") node
p1 = all isJust [fc1, fc2, fc3, fc4, fc6]
p2 = all (not . isJust) [fc5]
p3 = findElements "banana" node == [child1]
p4 = filterElementsName (=="root") foo == []
foo :: TNode
foo = Text "foo"
hexpat-0.20.13/test/suite/Text/XML/Expat/Cursor/ 0000755 0000000 0000000 00000000000 13122604047 017365 5 ustar 00 0000000 0000000
hexpat-0.20.13/test/suite/Text/XML/Expat/Cursor/Tests.hs 0000644 0000000 0000000 00000037363 13122604047 021037 0 ustar 00 0000000 0000000
{-# LANGUAGE BangPatterns, FlexibleContexts, OverloadedStrings #-}
module Text.XML.Expat.Cursor.Tests (tests) where
import Control.Monad (replicateM)
import Data.Maybe
import Test.Framework (Test)
import Test.Framework.Providers.QuickCheck2
import Test.QuickCheck
import Text.XML.Expat.Tests
import Text.XML.Expat.Cursor
import Text.XML.Expat.Tree
tests :: [Test]
tests = [ testProperty "invertible" prop_invertible
, testProperty "invertible2" prop_invertible2
, testProperty "fromTag" prop_fromTag
, testProperty "fromForest" prop_fromForest
, testProperty "firstChild" prop_firstChild
, testProperty "firstChild2" prop_firstChild2
, testProperty "downUp" prop_downUp
, testProperty "leftRight" prop_leftRight
, testProperty "root" prop_root
, testProperty "lastChild" prop_lastChild
, testProperty "findLeft" prop_findLeft
, testProperty "findRight" prop_findRight
, testProperty "findChild" prop_findChild
, testProperty "findChild2" prop_findChild2
, testProperty "nextDF1" prop_nextDF1
, testProperty "nextDF2" prop_nextDF2
, testProperty "nextDF3" prop_nextDF3
, testProperty "findRec" prop_findRec
, testProperty "getNodeIndex" prop_getNodeIndex
, testProperty "emptyChild" prop_emptyChild
, testProperty "negativeChild" prop_negativeChild
, testProperty "isChild" prop_isChild
, testProperty "modifyContent" prop_modifyContent
, testProperty "modifyContentList" prop_modifyContentList
, testProperty "insertChildren" prop_insertChildren
, testProperty "insertLeftRight" prop_insertLeftRight
, testProperty "removeLeftRight" prop_removeLeftRight
, testProperty "insertGo" prop_insertGo
, testProperty "removeGo" prop_removeGo
]
------------------------------------------------------------------------------
satisfy :: (Arbitrary a) =>
(a -> Bool) -- ^ predicate that generated values must
-- satisfy
-> Gen a -- ^ generator
-> Gen a
satisfy f g = do
x <- g
if f x then return x else satisfy f g
currentsEq :: TCursor -> TCursor -> Bool
currentsEq a b = current a == current b
someNodes :: Gen [TNode]
someNodes = do
n <- choose (3::Int, 8::Int)
replicateM n arbitrary
const' :: a -> b -> a
const' x y = y `seq` x
allSame :: (Eq a) => [a] -> Bool
allSame [] = True
allSame xs = and $ map (uncurry (==)) (xs `zip` tail xs)
------------------------------------------------------------------------------
prop_invertible :: TNode -> Bool
prop_invertible n = p
where
p = toTree (fromTree n) == n
prop_invertible2 :: [TNode] -> Property
prop_invertible2 n = not (null n) ==> p
where
p = toForest (fromJust $ fromForest n :: TCursor) == n
-- this is stupid because the function is so trivial, but I lust after the
-- green bar
prop_fromTag :: TNode -> Property
prop_fromTag n = isElement n ==> fromTag (getTag n) (eChildren n) == n
prop_fromForest :: Bool
prop_fromForest = isNothing (fromForest [] :: Maybe TCursor)
prop_firstChild :: TNode -> Property
prop_firstChild node = isElement node ==> p1 && p2
where
child1 :: TNode
child1 = Element "gryphon" [] []
node' = node { eChildren= child1:(eChildren node) }
mbfc = do
c <- firstChild $ fromTree node'
return $ current c
p1 = isJust mbfc
p2 = maybe False (== child1) mbfc
prop_firstChild2 :: Bool
prop_firstChild2 = (isNothing $ firstChild c) && (isNothing $ firstChild c2)
where
node :: TNode
node = Element "root" [] []
txt :: TNode
txt = Text ""
c = fromTree node
c2 = fromTree txt
prop_downUp :: TNode -> Property
prop_downUp node = isElement node && (not $ null $ eChildren node) ==> p
where
p = p1 && p2
cursor = fromTree node
p1 = isNothing $ parent cursor
m = do
cur' <- firstChild cursor
pa <- parent cur'
return $ current pa
p2 = maybe False (== node) m
prop_leftRight :: Property
prop_leftRight = forAll gen p
where
gen :: Gen (TNode,TNode,TNode)
gen = do
ch1 <- arbitrary
ch2 <- arbitrary
chN1 <- arbitrary
chN <- arbitrary
n <- satisfy isElement (arbitrary :: Gen TNode)
let n' = n {eChildren = ([ch1,ch2] ++ (eChildren n) ++ [chN1,chN])}
return (n', ch1, chN)
p :: (TNode, TNode, TNode) -> Bool
p (node, ch1, chN) = p1 && p2
where
cursor = fromTree node
m1 = do
curFirst <- firstChild cursor
curLast <- lastChild cursor
let f = current curFirst
let l = current curLast
return $ (f == ch1) && (l == chN)
p1 = fromMaybe False m1
m2 = do
curFirst <- firstChild cursor
curLast <- lastChild cursor
l <- left curLast >>= left
r <- right l >>= right
a <- right curFirst >>= right
b <- left a >>= left
let bad1 = left curFirst
let bad2 = right curLast
let lch = current curLast
let x = current r
let fch = current curFirst
let y = current b
return (x == lch && y == fch && isNothing bad1 && isNothing bad2)
p2 = fromMaybe False m2
prop_root :: Property
prop_root = forAll gen f
where
gen :: Gen (TNode,TNode,TNode)
gen = do
ch1' <- satisfy isElement arbitrary
ch2 <- arbitrary
let ch1 = ch1' { eChildren = ch2:(eChildren ch1') }
n <- satisfy isElement (arbitrary :: Gen TNode)
let n' = n {eChildren = ch1:(eChildren n)}
return (n',ch1,ch2)
f (n,ch1,ch2) = do
fromMaybe False m
where
m = do c1 <- firstChild $ fromTree n
c2 <- firstChild c1
let r = root c2
return $ and [ current r == n
, current c1 == ch1
, current c2 == ch2 ]
prop_lastChild :: TNode -> Property
prop_lastChild n' = isElement n' ==> p
where
n = n' { eChildren=[] }
c = fromTree n
n2 = n { eChildren=[n'] }
c2 = fromTree n2
p = p1 && p2
p1 = isNothing $ lastChild c
mc = lastChild c2 >>= parent
p2 = maybe False (\x -> toTree x == n2) mc
prop_findLeft :: Property
prop_findLeft = forAll gen f
where
gen :: Gen (TCursor,TCursor)
gen = do
nodes <- someNodes
let ch = Element "halibut" [] []
let node = Element "root" [] (ch:nodes)
i <- choose (1,length nodes)
let cn = fromTree node
let c1 = fromMaybe (error "impossible") (getChild i cn)
let c2 = fromMaybe (error "impossible") (firstChild cn)
return (c1, c2)
f :: (TCursor,TCursor) -> Bool
f (c,c') = maybe False (currentsEq c') mbC
where
mbC = findLeft (\x -> isNamed "halibut" $ current x) c
prop_findRight :: Property
prop_findRight = forAll gen f
where
gen :: Gen (TCursor,TCursor)
gen = do
n <- choose (3::Int, 8::Int)
i <- choose (0,n-1)
nodes <- replicateM n arbitrary
let ch = Element "halibut" [] []
let node = Element "root" [] (nodes ++ [ch])
let cn = fromTree node
let c1 = fromMaybe (error "impossible") (getChild i cn)
let c2 = fromMaybe (error "impossible") (lastChild cn)
return (c1, c2)
f :: (TCursor,TCursor) -> Bool
f (c,c') = maybe False (currentsEq c') mbC
where
mbC = findRight (\x -> isNamed "halibut" $ current x) c
prop_findChild :: Property
prop_findChild = forAll gen f
where
gen :: Gen (TCursor,TCursor)
gen = do
nodes <- someNodes
let ch = Element "halibut" [] []
let n = length nodes
let node = Element "root" [] (nodes ++ [ch] ++ nodes ++ [ch])
let cn = fromTree node
let c1 = fromMaybe (error "impossible") (getChild n cn)
return (cn, c1)
f :: (TCursor,TCursor) -> Bool
f (c,c') = maybe False (currentsEq c') mbC
where
mbC = findChild (\x -> isNamed "halibut" $ current x) c
prop_findChild2 :: Property
prop_findChild2 = forAll gen f
where
gen :: Gen (TCursor,TCursor)
gen = do
nodes <- someNodes
let ch = Element "halibut" [] []
let node = Element "root" [] (ch:nodes)
let cn = fromTree node
let c1 = fromMaybe (error "impossible") (firstChild cn)
return (cn, c1)
f :: (TCursor,TCursor) -> Bool
f (c,c') = maybe False (currentsEq c') mbC
where
mbC = findChild (\x -> isNamed "halibut" $ current x) c
prop_nextDF1 :: Property
prop_nextDF1 = forAll gen $ uncurry currentsEq
where
gen :: Gen (TCursor, TCursor)
gen = do
nodes <- someNodes
let ch = Element "halibut" [] []
let node = Element "root" [] (ch:nodes)
let cn = fromJust $ fromForest [node]
let c1 = fromJust (firstChild cn)
let c2 = fromJust (nextDF cn)
return (c1, c2)
prop_nextDF2 :: Property
prop_nextDF2 = forAll gen $ uncurry currentsEq
where
gen :: Gen (TCursor, TCursor)
gen = do
nodes <- someNodes
let ch = Element "halibut" [] []
let node = Element "root" [] (ch:nodes)
let cn = fromJust $ fromForest [node]
let cc = fromJust (firstChild cn)
let c1 = fromJust (nextDF cc)
let c2 = fromJust (right cc)
return (c1, c2)
prop_nextDF3 :: Property
prop_nextDF3 = forAll gen $ uncurry currentsEq
where
gen :: Gen (TCursor, TCursor)
gen = do
nodes <- someNodes
let ch = Element "halibut" [] []
let ch2 = Element "pike" [] []
let node1 = Element "subtree1" [] [ch]
let node2 = Element "subtree2" [] (ch2:nodes)
let node = Element "root" [] [node1, node2]
let cn = fromJust $ fromForest [node] :: TCursor
let cc = fromJust (firstChild cn >>= firstChild)
let c1 = fromJust $ nextDF cc
let c2 = fromJust (firstChild cn >>= right)
return (c1, c2)
prop_findRec :: Property
prop_findRec = forAll gen $ uncurry currentsEq
where
gen :: Gen (TCursor, TCursor)
gen = do
nodes <- someNodes
let ch = Element "halibut" [] []
let ch2 = Element "pike" [] []
let node1 = Element "subtree1" [] [ch]
let node2 = Element "subtree2" [] (ch2:nodes)
let node = Element "root" [] [node1, node2]
let cn = fromTree node
let c1 = fromJust (firstChild cn >>= right >>= firstChild)
let c2 = fromJust $ findRec (isNamed "pike" . current) cn
return (c1, c2)
prop_emptyChild :: Bool
prop_emptyChild = isNothing m
where
tree :: TNode
tree = Element "root" [] []
m = getChild 0 $ fromTree tree
prop_negativeChild :: Property
prop_negativeChild = forAll gen f
where
gen = satisfy isElement (arbitrary :: Gen TNode)
f node = isNothing $ getChild (-1) (fromTree node)
prop_getNodeIndex :: Property
prop_getNodeIndex = forAll gen $ uncurry (==)
where
gen :: Gen (Int, Int)
gen = do
nodes <- replicateM 10 arbitrary
i <- choose (0,9)
let node = (Element "root" [] nodes)::TNode
let cn = fromTree node
let c1 = fromJust $ getChild i cn
let j = getNodeIndex c1
return (i,j)
prop_isChild :: TNode -> Bool
prop_isChild n = isChild c && hasChildren r && isFirst r && isLast c
where
node = Element "root" [] [n]
r = fromTree node
c = fromJust $ firstChild r
prop_modifyContent :: Property
prop_modifyContent = forAll gen allSame
where
gen :: Gen [TNode]
gen = do
nodes <- replicateM 10 arbitrary
let n1 = Element "apple" [] []
let n2 = Element "banana" [] []
let tree1 = Element "root" [] (n1:nodes)
let tree2 = Element "root" [] (n2:nodes)
let c = fromJust . firstChild . fromTree $ tree1
let c1 = modifyContent (const' n2) c
c2 <- modifyContentM (const' $ return n2) c
let tree3 = toTree c1
let tree4 = toTree c2
return [tree2, tree3, tree4]
prop_modifyContentList :: Property
prop_modifyContentList = forAll gen $ uncurry (==)
where
gen :: Gen (TNode,TNode)
gen = do
nodes <- replicateM 10 arbitrary
let n = Element "apple" [] []
let tree1 = Element "root" [] [n]
let tree2 = Element "root" [] nodes
let c = fromJust . firstChild . fromTree $ tree1
let c1 = fromJust $ modifyContentList (const' nodes) c
let treeResult = toTree c1
return (tree2, treeResult)
prop_insertChildren :: [TNode] -> Bool
prop_insertChildren ns = isNothing m1 && tree2 == tree3
where
tree = Element "root" [] ns
n1 = Element "alpha" [] []
n2 = Element "omega" [] []
n3 = Element "beta" [] []
n4 = Element "gamma" [] []
tree2 = Element "root" [] $ concat [[n1,n2], ns, [n3,n4]]
txt :: TNode
txt = Text "foo"
m1 = insertFirstChild n1 $ fromTree txt
top = fromTree tree
tree3 = fromJust $ do
c1 <- insertFirstChild n2 top
c2 <- insertLastChild n3 c1
c3 <- insertManyFirstChild [n1] c2
c4 <- insertManyLastChild [n4] c3
return $ toTree c4
prop_insertLeftRight :: (TNode,TNode) -> Bool
prop_insertLeftRight (n,n') = f == [n1, n', n, n2]
where
n1 = Element "alpha" [] []
n2 = Element "omega" [] []
c = insertRight n2 $ insertManyLeft [n'] $ insertLeft n1 $ fromTree n
f = toForest c
prop_removeLeftRight :: [TNode] -> Property
prop_removeLeftRight ns = not (null ns) ==> p1 && p2 && p3
where
n1 = Element "alpha" [] []
n2 = Element "omega" [] []
tree1 = Element "root" [] (n1:(ns ++ [n2]))
tree2 = Element "root" [] ns
c1 = fromJust $ firstChild (fromTree tree1)
m1 = removeLeft c1
c2 = fromJust $ right c1
(x1,c3) = fromJust $ removeLeft c2
c4 = fromJust (parent c3 >>= lastChild)
m4 = removeRight c4
c5 = fromJust $ left c4
(x2,c6) = fromJust $ removeRight c5
tree3 = toTree c6
p1 = tree2 == tree3
p2 = isNothing m1 && isNothing m4
p3 = n1 == x1 && n2 == x2
prop_insertGo :: [TNode] -> Property
prop_insertGo ns = not (null ns) ==> p1
where
n1 = Element "alpha" [] []
n2 = Element "omega" [] []
tree1 = Element "root" [] ns
tree2 = Element "root" [] $ [n1,n2] ++ ns
c1 = fromJust $ firstChild (fromTree tree1)
c2 = insertGoLeft n1 c1
c3 = insertGoRight n2 c2
tree3 = toTree c3
p1 = tree2 == tree3
prop_removeGo :: [TNode] -> Property
prop_removeGo ns = not (null ns) ==> p1 && p2 && p3
where
n1 = Element "alpha" [] []
n2 = Element "omega" [] []
tree1 = Element "root" [] $ [n1,n2] ++ ns
tree2 = Element "root" [] ns
top = fromTree tree1
c1 = fromJust $ firstChild top
c2 = fromJust $ lastChild top
m1 = removeGoLeft c1
m2 = removeGoRight c2
m3 = removeGoUp top
c3 = fromJust $ right c1
c4 = fromJust $ removeGoLeft c3
n3 = current c4
c5 = fromJust $ removeGoRight c4
c6 = fromJust $ removeGoUp c4
m4 = left c6
m5 = right c6
tree3 = toTree c5
tree4 = toTree c6
p1 = and $ map isNothing [m1,m2,m3,m4,m5]
p2 = tree2 == tree3 && tree2 == tree4
p3 = n1 == n3
hexpat-0.20.13/test/thread-leak/ 0000755 0000000 0000000 00000000000 13122604047 014473 5 ustar 00 0000000 0000000 hexpat-0.20.13/test/thread-leak/cleak.c 0000644 0000000 0000000 00000000001 13122604047 015705 0 ustar 00 0000000 0000000
hexpat-0.20.13/test/thread-leak/callme.c 0000644 0000000 0000000 00000000127 13122604047 016074 0 ustar 00 0000000 0000000
void callme(void (*cb)())
{
int i;
for (i = 0; i < 10; i++)
cb();
}
hexpat-0.20.13/test/thread-leak/build.sh 0000644 0000000 0000000 00000000055 13122604047 016126 0 ustar 00 0000000 0000000
ghc thread-leak.hs callme.c --make -threaded
hexpat-0.20.13/test/thread-leak/thread-leak.hs 0000644 0000000 0000000 00000002613 13122604047 017212 0 ustar 00 0000000 0000000
{-# LANGUAGE ForeignFunctionInterface, CPP #-}
-- | In ghc 6.12.3, this program spawns lots of threads when os = False.
-- If you set os = True, then it doesn't.
--
-- You can observe this either by seeing the virtual memory go crazy in top,
-- or by running in gdb and pressing ctrl-C.
import Control.Concurrent
import Control.Exception
import Control.Monad
import qualified Data.ByteString as B
import Text.XML.Expat.Tree
import System.Environment
import Data.IORef
import Foreign
os = False
foreign import ccall safe "callme" callme :: FunPtr (IO ()) -> IO ()
foreign import ccall safe "wrapper" mkPlain :: IO () -> IO (FunPtr (IO ()))
main = do
args <- getArgs
let (nthreads, nloops) = case args of
threads : loops : _ -> (read threads, read loops)
_ -> (10, 10000)
putStrLn $ show nthreads++" threads with "++show nloops++" loops each"++
", using '"++(if os then "forkOS" else "forkIO")++"'"
qs <- newQSem 0
replicateM_ nthreads $ do
(if os then forkOS else forkIO) $ do
cRef <- newIORef 0
cb <- mkPlain $ modifyIORef cRef $ \x -> x `seq` (x+1)
replicateM_ nloops $ callme cb
freeHaskellFunPtr cb
c <- readIORef cRef
-- 'callme' calls us back 10 times
when (c /= nloops*10) $ fail $ "went really wrong: "++show (c, nloops*10)
signalQSem qs
replicateM_ nthreads $ waitQSem qs
putStrLn "done"
hexpat-0.20.13/test/thread-leak/clean.sh 0000644 0000000 0000000 00000000056 13122604047 016112 0 ustar 00 0000000 0000000
rm -f *.hi *.o thread-leak thread-leak_stub.*
hexpat-0.20.13/Text/ 0000755 0000000 0000000 00000000000 13122604047 012257 5 ustar 00 0000000 0000000
hexpat-0.20.13/Text/XML/ 0000755 0000000 0000000 00000000000 13122604047 012717 5 ustar 00 0000000 0000000
hexpat-0.20.13/Text/XML/Expat/ 0000755 0000000 0000000 00000000000 13122604047 014000 5 ustar 00 0000000 0000000
hexpat-0.20.13/Text/XML/Expat/Cursor.hs 0000644 0000000 0000000 00000054243 13122604047 015621 0 ustar 00 0000000 0000000
{-# LANGUAGE FlexibleContexts, UndecidableInstances #-}
--------------------------------------------------------------------
-- |
-- Module : Text.XML.Expat.Cursor
--
-- This module is ported from Text.XML.Light.Cursor
--
-- XML cursors for working with XML content within the context of
-- an XML document. This implementation is based on the general
-- tree zipper written by Krasimir Angelov and Iavor S. Diatchki.
--
-- With the exception of 'modifyContentM', the M-suffixed functions are
-- for use with monadic node types, as used when dealing with chunked I\/O
-- with the /hexpat-iteratee/ package. In the more common pure case, you
-- wouldn't need these *M functions.
module Text.XML.Expat.Cursor
(
-- * Types
Cursor, CursorG(..), Path, PathG
, Tag(..), getTag, fromTag
-- * Conversions
, fromTree
, fromForest
, toForest
, toTree
-- * Moving around
, parent
, root
, getChild
, getChildM
, firstChild
, firstChildM
, lastChild
, lastChildM
, left
, leftM
, right
, rightM
, nextDF
, nextDFM
-- ** Searching
, findChild
, findLeft
, findRight
, findRec
, findRecM
-- * Node classification
, isRoot
, isFirst
, isFirstM
, isLast
, isLastM
, isLeaf
, isChild
, hasChildren
, getNodeIndex
-- * Updates
, setContent
, modifyContent
, modifyContentList
, modifyContentListM
, modifyContentM
-- ** Inserting content
, insertLeft
, insertRight
, insertManyLeft
, insertManyRight
, insertFirstChild
, insertLastChild
, insertManyFirstChild
, insertManyLastChild
, insertGoLeft
, insertGoRight
-- ** Removing content
, removeLeft
, removeLeftM
, removeRight
, removeRightM
, removeGoLeft
, removeGoLeftM
, removeGoRight
, removeGoRightM
, removeGoUp
) where
import Text.XML.Expat.Tree
import Control.Monad (mzero, mplus)
import Data.Maybe(isNothing)
import Data.Monoid
import Data.Functor.Identity
import Data.List.Class (List(..), ListItem(..), cons, foldlL, lengthL)
data Tag tag text = Tag { tagName :: tag
, tagAttribs :: Attributes tag text
} deriving (Show)
{-
setTag :: Tag -> Element -> Element
setTag t e = fromTag t (elContent e)
-}
fromTag :: MkElementClass n c => Tag tag text -> c (n c tag text) -> n c tag text
fromTag t cs = mkElement (tagName t) (tagAttribs t) cs
-- | Generalized path within an XML document.
type PathG n c tag text = [(c (n c tag text),Tag tag text,c (n c tag text))]
-- | A path specific to @Text.XML.Expat.Tree.Node@ trees.
type Path tag text = PathG NodeG [] tag text
-- | Generalized cursor: The position of a piece of content in an XML document.
-- @n@ is the Node type and @c@ is the list type, which would usually be [],
-- except when you're using chunked I\/O.
data CursorG n c tag text = Cur
{ current :: n c tag text -- ^ The currently selected content.
, lefts :: c (n c tag text) -- ^ Siblings on the left, closest first.
, rights :: c (n c tag text) -- ^ Siblings on the right, closest first.
, parents :: PathG n c tag text -- ^ The contexts of the parent elements of this location.
}
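-- A minimal sketch of how these fields evolve while moving (illustrative
-- only; @tree@ and @c@ are hypothetical names, and plain @Node String String@
-- trees are assumed). Starting at the first child and stepping right twice
-- pushes each node passed onto the front of 'lefts', which is why that list
-- is "closest first":
--
-- > tree :: Node String String
-- > tree = Element "root" [] [Element "a" [] [], Element "b" [] [], Element "c" [] []]
-- >
-- > c :: Maybe (Cursor String String)
-- > c = firstChild (fromTree tree) >>= right >>= right
-- >
-- > -- fmap current c == Just (Element "c" [] [])
-- > -- fmap lefts   c == Just [Element "b" [] [], Element "a" [] []]
-- > -- fmap rights  c == Just []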
instance (Show (n c tag text), Show (c (n c tag text)), Show tag, Show text)
=> Show (CursorG n c tag text) where
show (Cur c l r p) = "Cur { current="++show c++
", lefts="++show l++
", rights="++show r++
", parents="++show p++" }"
-- | A cursor specific to @Text.XML.Expat.Tree.Node@ trees.
type Cursor tag text = CursorG NodeG [] tag text
-- Moving around ---------------------------------------------------------------
-- | The parent of the given location.
parent :: MkElementClass n c => CursorG n c tag text -> Maybe (CursorG n c tag text)
parent loc =
case parents loc of
(pls,v,prs) : ps -> Just
Cur { current = (fromTag v
(combChildren (lefts loc) (current loc) (rights loc)))
, lefts = pls, rights = prs, parents = ps
}
[] -> Nothing
-- | The top-most parent of the given location.
root :: MkElementClass n c => CursorG n c tag text -> CursorG n c tag text
root loc = maybe loc root (parent loc)
-- | The left sibling of the given location - pure version.
left :: CursorG n [] tag text -> Maybe (CursorG n [] tag text)
left loc = runIdentity $ leftM loc
-- | The left sibling of the given location - used for monadic node types.
leftM :: List c => CursorG n c tag text -> ItemM c (Maybe (CursorG n c tag text))
leftM loc = do
let l = lefts loc
li <- runList l
case li of
Nil -> return Nothing
Cons t ts -> return $ Just loc { current = t, lefts = ts
, rights = cons (current loc) (rights loc) }
-- | The right sibling of the given location - pure version.
right :: CursorG n [] tag text -> Maybe (CursorG n [] tag text)
right loc = runIdentity $ rightM loc
-- | The right sibling of the given location - used for monadic node types.
rightM :: List c => CursorG n c tag text -> ItemM c (Maybe (CursorG n c tag text))
rightM loc = do
let r = rights loc
li <- runList r
case li of
Nil -> return Nothing
Cons t ts -> return $ Just loc { current = t, lefts = cons (current loc) (lefts loc)
, rights = ts }
-- | The first child of the given location - pure version.
firstChild :: (NodeClass n [], Monoid tag) => CursorG n [] tag text -> Maybe (CursorG n [] tag text)
firstChild loc = runIdentity $ firstChildM loc
-- | The first child of the given location - used for monadic node types.
firstChildM :: (NodeClass n c, Monoid tag) => CursorG n c tag text -> ItemM c (Maybe (CursorG n c tag text))
firstChildM loc = do
case downParents loc of
Just (l, ps) -> do
li <- runList l
return $ case li of
Cons t ts -> Just $ Cur { current = t, lefts = mzero, rights = ts , parents = ps }
Nil -> Nothing
Nothing -> return $ Nothing
-- | The last child of the given location - pure version.
lastChild :: (NodeClass n [], Monoid tag) => CursorG n [] tag text -> Maybe (CursorG n [] tag text)
lastChild loc = runIdentity $ lastChildM loc
-- | The last child of the given location - used for monadic node types.
lastChildM :: (NodeClass n c, Monoid tag) => CursorG n c tag text -> ItemM c (Maybe (CursorG n c tag text))
lastChildM loc = do
case downParents loc of
Just (l, ps) -> do
li <- runList (reverseL l)
return $ case li of
Cons t ts -> Just $ Cur { current = t, lefts = ts, rights = mzero , parents = ps }
Nil -> Nothing
Nothing -> return $ Nothing
-- | Find the next left sibling that satisfies a predicate - pure version.
findLeft :: NodeClass n [] =>
(CursorG n [] tag text -> Bool)
-> CursorG n [] tag text
-> Maybe (CursorG n [] tag text)
findLeft p loc = runIdentity (findLeftM p loc)
-- | Find the next left sibling that satisfies a predicate - used for monadic node types.
findLeftM :: NodeClass n c =>
(CursorG n c tag text -> Bool)
-> CursorG n c tag text
-> ItemM c (Maybe (CursorG n c tag text))
findLeftM p loc = do
mLoc1 <- leftM loc
case mLoc1 of
Just loc1 -> if p loc1 then return (Just loc1) else findLeftM p loc1
Nothing -> return Nothing
-- | Find the next right sibling that satisfies a predicate - pure version.
findRight :: (CursorG n [] tag text -> Bool)
-> CursorG n [] tag text
-> Maybe (CursorG n [] tag text)
findRight p loc = runIdentity $ findRightM p loc
-- | Find the next right sibling that satisfies a predicate - used for monadic node types.
findRightM :: List c =>
(CursorG n c tag text -> Bool)
-> CursorG n c tag text
-> ItemM c (Maybe (CursorG n c tag text))
findRightM p loc = do
mLoc1 <- rightM loc
case mLoc1 of
Just loc1 -> if p loc1 then return $ Just loc1 else findRightM p loc1
Nothing -> return Nothing
-- | The first child that satisfies a predicate - pure version.
findChild :: (NodeClass n [], Monoid tag) =>
(CursorG n [] tag text -> Bool)
-> CursorG n [] tag text
-> Maybe (CursorG n [] tag text)
findChild p loc = runIdentity $ findChildM p loc
-- | The first child that satisfies a predicate - used for monadic node types.
findChildM :: (NodeClass n c, Monoid tag) =>
(CursorG n c tag text -> Bool)
-> CursorG n c tag text
-> ItemM c (Maybe (CursorG n c tag text))
findChildM p loc = do
mLoc1 <- firstChildM loc
case mLoc1 of
Just loc1 -> if p loc1 then return $ Just loc1 else findRightM p loc1
Nothing -> return Nothing
-- | The next position in a left-to-right depth-first traversal of a document:
-- either the first child, right sibling, or the right sibling of a parent that
-- has one. Pure version.
nextDF :: (MkElementClass n [], Monoid tag) => CursorG n [] tag text -> Maybe (CursorG n [] tag text)
nextDF c = runIdentity $ nextDFM c
-- | The next position in a left-to-right depth-first traversal of a document:
-- either the first child, right sibling, or the right sibling of a parent that
-- has one. Used for monadic node types.
nextDFM :: (MkElementClass n c, Monoid tag) => CursorG n c tag text -> ItemM c (Maybe (CursorG n c tag text))
nextDFM c = do
mFirst <- firstChildM c
case mFirst of
Just c' -> return $ Just c'
Nothing -> up c
where
up x = do
mRight <- rightM x
case mRight of
Just c' -> return $ Just c'
Nothing ->
case parent x of
Just p -> up p
Nothing -> return Nothing
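-- A minimal sketch of the traversal order (illustrative only; @dfOrder@ and
-- @tree@ are hypothetical names, and plain @Node String String@ trees are
-- assumed). Repeatedly applying 'nextDF' from the root visits the nodes in
-- document order and stops once no parent has a right sibling left:
--
-- > dfOrder :: Cursor String String -> [Node String String]
-- > dfOrder c = current c : maybe [] dfOrder (nextDF c)
-- >
-- > tree :: Node String String
-- > tree = Element "root" []
-- >     [ Element "subtree1" [] [Element "halibut" [] []]
-- >     , Element "subtree2" [] [Element "pike" [] []] ]
-- >
-- > -- map eName (dfOrder (fromTree tree))
-- > --   == ["root","subtree1","halibut","subtree2","pike"]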
-- | Perform a depth first search for a descendant that satisfies the
-- given predicate. Pure version.
findRec :: (MkElementClass n [], Monoid tag) =>
(CursorG n [] tag text -> Bool)
-> CursorG n [] tag text
-> Maybe (CursorG n [] tag text)
findRec p c = runIdentity $ findRecM (return . p) c
-- | Perform a depth first search for a descendant that satisfies the
-- given predicate. Used for monadic node types.
findRecM :: (MkElementClass n c, Monoid tag) =>
(CursorG n c tag text -> ItemM c Bool)
-> CursorG n c tag text
-> ItemM c (Maybe (CursorG n c tag text))
findRecM p c = do
found <- p c
if found
then return $ Just c
else do
mC' <- nextDFM c
case mC' of
Just c' -> findRecM p c'
Nothing -> return Nothing
-- | The child with the given index (starting from 0) - pure version.
getChild :: (NodeClass n [], Monoid tag) => Int -> CursorG n [] tag text -> Maybe (CursorG n [] tag text)
getChild n loc = runIdentity $ getChildM n loc
-- | The child with the given index (starting from 0) - used for monadic node types.
getChildM :: (NodeClass n c, Monoid tag) =>
Int
-> CursorG n c tag text
-> ItemM c (Maybe (CursorG n c tag text))
getChildM n loc = do
let mParents = downParents loc
case mParents of
Just (ts, ps) -> do
mSplit <- splitChildrenM ts n
case mSplit of
Just (ls,t,rs) -> return $ Just $
Cur { current = t, lefts = ls, rights = rs, parents = ps }
Nothing -> return Nothing
Nothing -> return Nothing
-- | private: computes the parent for "down" operations.
downParents :: (NodeClass n c, Monoid tag) => CursorG n c tag text -> Maybe (c (n c tag text), PathG n c tag text)
downParents loc =
case current loc of
e | isElement e ->
let n = getName e
a = getAttributes e
c = getChildren e
in Just ( c
, cons (lefts loc, Tag n a, rights loc) (parents loc)
)
_ -> Nothing
getTag :: Node tag text -> Tag tag text
getTag e = Tag { tagName = eName e
, tagAttribs = eAttributes e
}
-- Conversions -----------------------------------------------------------------
-- | A cursor for the given content.
fromTree :: List c => n c tag text -> CursorG n c tag text
fromTree t = Cur { current = t, lefts = mzero, rights = mzero, parents = [] }
-- | The location of the first tree in a forest - pure version.
fromForest :: NodeClass n [] => [n [] tag text] -> Maybe (CursorG n [] tag text)
fromForest l = runIdentity $ fromForestM l
-- | The location of the first tree in a forest - used with monadic node types.
fromForestM :: List c => c (n c tag text) -> ItemM c (Maybe (CursorG n c tag text))
fromForestM l = do
li <- runList l
return $ case li of
Cons t ts -> Just Cur { current = t, lefts = mzero, rights = ts
, parents = [] }
Nil -> Nothing
-- | Computes the tree containing this location.
toTree :: MkElementClass n c => CursorG n c tag text -> n c tag text
toTree loc = current (root loc)
-- | Computes the forest containing this location.
toForest :: MkElementClass n c => CursorG n c tag text -> c (n c tag text)
toForest loc = let r = root loc in combChildren (lefts r) (current r) (rights r)
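-- Example (an illustrative sketch, not part of the library). It assumes the
-- pure 'Node' type from "Text.XML.Expat.Tree"; @doc@ is a made-up document.
-- It shows the usual round trip: wrap a tree with 'fromTree', move and edit
-- with the cursor, then rebuild the whole document with 'toTree'.
--
-- > import Text.XML.Expat.Tree
-- > import Text.XML.Expat.Cursor
-- >
-- > doc :: UNode String
-- > doc = Element "a" [] [Element "b" [] [], Text "x"]
-- >
-- > main :: IO ()
-- > main = do
-- >     let mEdited = do
-- >             child <- getChild 1 (fromTree doc)   -- move to the second child
-- >             -- Replace the node under the cursor, then rebuild from the root.
-- >             return $ toTree (setContent (Text "y") child)
-- >     -- Nothing would mean that the root has no child at index 1.
-- >     print mEdited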
-- Queries ---------------------------------------------------------------------
-- | Are we at the top of the document?
isRoot :: CursorG n c tag text -> Bool
isRoot loc = null (parents loc)
-- | Are we at the left end of the document? (Pure version.)
isFirst :: CursorG n [] tag text -> Bool
isFirst loc = runIdentity $ isFirstM loc
-- | Are we at the left end of the document? (Used for monadic node types.)
isFirstM :: List c => CursorG n c tag text -> ItemM c Bool
isFirstM loc = do
li <- runList (lefts loc)
return $ case li of
Nil -> True
_ -> False
-- | Are we at the right end of the document? (Pure version.)
isLast :: CursorG n [] tag text -> Bool
isLast loc = runIdentity $ isLastM loc
-- | Are we at the right end of the document? (Used for monadic node types.)
isLastM :: List c => CursorG n c tag text -> ItemM c Bool
isLastM loc = do
li <- runList (rights loc)
return $ case li of
Nil -> True
_ -> False
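-- | Are we at the bottom of the document? True when the current node is not
-- an element; an element counts as a non-leaf even if its child list is empty.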
isLeaf :: (NodeClass n c, Monoid tag) => CursorG n c tag text -> Bool
isLeaf loc = isNothing (downParents loc)
-- | Do we have a parent?
isChild :: CursorG n c tag text -> Bool
isChild loc = not (isRoot loc)
-- | Get the node index inside the sequence of children - pure version.
getNodeIndex :: CursorG n [] tag text -> Int
getNodeIndex loc = runIdentity $ getNodeIndexM loc
-- | Get the node index inside the sequence of children - used for monadic node types.
getNodeIndexM :: List c => CursorG n c tag text -> ItemM c Int
getNodeIndexM loc = lengthL (lefts loc)
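-- | Do we have children? This is the negation of 'isLeaf', so it is True for
-- any element node, including one whose child list is empty.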
hasChildren :: (NodeClass n c, Monoid tag) => CursorG n c tag text -> Bool
hasChildren loc = not (isLeaf loc)
-- Updates ---------------------------------------------------------------------
-- | Change the current content.
setContent :: n c tag text -> CursorG n c tag text -> CursorG n c tag text
setContent t loc = loc { current = t }
-- | Modify the current content.
modifyContent :: (n c tag text -> n c tag text) -> CursorG n c tag text -> CursorG n c tag text
modifyContent f loc = setContent (f (current loc)) loc
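-- | Replace the current node with the list of nodes produced by the function.
-- The new position is the first of the new nodes or, if the list is empty, the
-- old right sibling; the result is Nothing if there is neither. Pure version.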
modifyContentList :: NodeClass n [] =>
(n [] tag text -> [n [] tag text]) -> CursorG n [] tag text -> Maybe (CursorG n [] tag text)
modifyContentList f loc = runIdentity $ modifyContentListM f loc
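-- | Replace the current node with the list of nodes produced by the function.
-- The new position is the first of the new nodes or, if the list is empty, the
-- old right sibling; the result is Nothing if there is neither. Used for
-- monadic node types.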
modifyContentListM :: NodeClass n c =>
(n c tag text -> c (n c tag text))
-> CursorG n c tag text
-> ItemM c (Maybe (CursorG n c tag text))
modifyContentListM f loc = removeGoRightM $ insertManyRight (f $ current loc) loc
-- | Modify the current content, allowing for an effect.
modifyContentM :: Monad m => (n [] tag text -> m (n [] tag text)) -> CursorG n [] tag text -> m (CursorG n [] tag text)
modifyContentM f loc = do x <- f (current loc)
return (setContent x loc)
-- | Insert content to the left of the current position.
insertLeft :: List c => n c tag text -> CursorG n c tag text -> CursorG n c tag text
insertLeft t loc = loc { lefts = t `cons` lefts loc }
-- | Insert content to the right of the current position.
insertRight :: List c => n c tag text -> CursorG n c tag text -> CursorG n c tag text
insertRight t loc = loc { rights = t `cons` rights loc }
-- | Insert a list of content to the left of the current position.
insertManyLeft :: List c => c (n c tag text) -> CursorG n c tag text -> CursorG n c tag text
insertManyLeft t loc = loc { lefts = reverseL t `mplus` lefts loc }
-- | Insert a list of content to the right of the current position.
insertManyRight :: List c => c (n c tag text) -> CursorG n c tag text -> CursorG n c tag text
insertManyRight t loc = loc { rights = t `mplus` rights loc }
-- | Apply a function to the list of children of the current node.
-- Returns Nothing if the current node is not an element.
mapChildren :: NodeClass n c => (c (n c tag text) -> c (n c tag text))
-> CursorG n c tag text
-> Maybe (CursorG n c tag text)
mapChildren f loc = let e = current loc in
if isElement e then
Just $ loc { current = modifyChildren f e }
else
Nothing
-- | Insert content as the first child of the current position.
insertFirstChild :: NodeClass n c => n c tag text -> CursorG n c tag text -> Maybe (CursorG n c tag text)
insertFirstChild t = mapChildren (t `cons`)
-- | Insert content as the last child of the current position.
insertLastChild :: NodeClass n c => n c tag text -> CursorG n c tag text -> Maybe (CursorG n c tag text)
insertLastChild t = mapChildren (`mplus` return t)
-- | Insert a list of content as the first children of the current position.
insertManyFirstChild :: NodeClass n c => c (n c tag text) -> CursorG n c tag text -> Maybe (CursorG n c tag text)
insertManyFirstChild t = mapChildren (t `mplus`)
-- | Insert a list of content as the last children of the current position.
insertManyLastChild :: NodeClass n c => c (n c tag text) -> CursorG n c tag text -> Maybe (CursorG n c tag text)
insertManyLastChild t = mapChildren (`mplus` t)
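-- Example (an illustrative sketch, not part of the library). It assumes the
-- pure 'Node' type from "Text.XML.Expat.Tree"; the nodes are made up. The
-- child-insertion functions return a 'Maybe' because they are built on
-- 'mapChildren', which only succeeds when the current node is an element.
--
-- > import Text.XML.Expat.Tree
-- > import Text.XML.Expat.Cursor
-- >
-- > main :: IO ()
-- > main = do
-- >     let elt = fromTree (Element "list" [] [Text "b"] :: UNode String)
-- >         txt = fromTree (Text "plain" :: UNode String)
-- >     -- On an element, the new nodes are added around the existing child;
-- >     -- the cursor stays on the element itself.
-- >     print $ fmap current $
-- >         insertFirstChild (Text "a") elt >>= insertLastChild (Text "c")
-- >     -- A text node has no children to modify, so the result is Nothing.
-- >     print $ fmap current $ insertFirstChild (Text "a") txt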
-- | Remove the content on the left of the current position, if any - pure version.
removeLeft :: CursorG n [] tag text -> Maybe (n [] tag text, CursorG n [] tag text)
removeLeft loc = runIdentity $ removeLeftM loc
-- | Remove the content on the left of the current position, if any - used for monadic node types.
removeLeftM :: List c => CursorG n c tag text -> ItemM c (Maybe (n c tag text, CursorG n c tag text))
removeLeftM loc = do
li <- runList (lefts loc)
return $ case li of
Cons l ls -> Just $ (l,loc { lefts = ls })
Nil -> Nothing
-- | Remove the content on the right of the current position, if any - pure version.
removeRight :: CursorG n [] tag text -> Maybe (n [] tag text, CursorG n [] tag text)
removeRight loc = runIdentity $ removeRightM loc
-- | Remove the content on the right of the current position, if any - used for monadic node types.
removeRightM :: List c => CursorG n c tag text -> ItemM c (Maybe (n c tag text, CursorG n c tag text))
removeRightM loc = do
li <- runList (rights loc)
return $ case li of
Cons l ls -> Just $ (l,loc { rights = ls })
Nil -> Nothing
-- | Insert content to the left of the current position.
-- The new content becomes the current position.
insertGoLeft :: List c => n c tag text -> CursorG n c tag text -> CursorG n c tag text
insertGoLeft t loc = loc { current = t, rights = current loc `cons` rights loc }
-- | Insert content to the right of the current position.
-- The new content becomes the current position.
insertGoRight :: List c => n c tag text -> CursorG n c tag text -> CursorG n c tag text
insertGoRight t loc = loc { current = t, lefts = current loc `cons` lefts loc }
-- | Remove the current element.
-- The new position is the one on the left. Pure version.
removeGoLeft :: CursorG n [] tag text -> Maybe (CursorG n [] tag text)
removeGoLeft loc = case lefts loc of
l : ls -> Just loc { current = l, lefts = ls }
[] -> Nothing
-- | Remove the current element.
-- The new position is the one on the left. Used for monadic node types.
removeGoLeftM :: List c => CursorG n c tag text -> ItemM c (Maybe (CursorG n c tag text))
removeGoLeftM loc = do
li <- runList (lefts loc)
return $ case li of
Cons l ls -> Just loc { current = l, lefts = ls }
Nil -> Nothing
-- | Remove the current element.
-- The new position is the one on the right. Pure version.
removeGoRight :: CursorG n [] tag text -> Maybe (CursorG n [] tag text)
removeGoRight loc = runIdentity $ removeGoRightM loc
-- | Remove the current element.
-- The new position is the one on the right. Used for monadic node types.
removeGoRightM :: List c => CursorG n c tag text -> ItemM c (Maybe (CursorG n c tag text))
removeGoRightM loc = do
li <- runList (rights loc)
return $ case li of
Cons l ls -> Just loc { current = l, rights = ls }
Nil -> Nothing
-- | Remove the current element.
-- The new position is the parent of the old position.
removeGoUp :: MkElementClass n c => CursorG n c tag text -> Maybe (CursorG n c tag text)
removeGoUp loc =
case (parents loc) of
[] -> Nothing
(pls, v, prs):ps -> Just $
Cur { current = fromTag v (reverseL (lefts loc) `mplus` rights loc)
, lefts = pls, rights = prs, parents = ps
}
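-- Example (an illustrative sketch, not part of the library). It assumes the
-- pure 'Node' type from "Text.XML.Expat.Tree"; @doc@ is a made-up document.
-- 'removeGoUp' deletes the node under the cursor and rebuilds the parent from
-- the remaining siblings, so it fails (Nothing) only at the root.
--
-- > import Text.XML.Expat.Tree
-- > import Text.XML.Expat.Cursor
-- >
-- > doc :: UNode String
-- > doc = Element "a" [] [Element "b" [] [], Element "c" [] []]
-- >
-- > main :: IO ()
-- > main = do
-- >     let mTree = do
-- >             b  <- firstChild (fromTree doc)   -- position on the first child
-- >             up <- removeGoUp b                -- drop it, return to the parent
-- >             return (toTree up)
-- >     -- The rebuilt document no longer contains the "b" element.
-- >     print mTree
-- >     -- At the root there is no parent to go up to.
-- >     print $ fmap current $ removeGoUp (fromTree doc)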
-- | private: Gets the given element of a list.
-- Also returns the preceding elements (reversed) and the following elements.
splitChildrenM :: List c => c a -> Int -> ItemM c (Maybe (c a,a,c a))
splitChildrenM _ n | n < 0 = return Nothing
splitChildrenM cs pos = loop mzero cs pos
where
loop acc l n = do
li <- runList l
case li of
Nil -> return Nothing
Cons x l' -> if n == 0
then return $ Just (acc, x, l')
else loop (cons x acc) l' $! n-1
-- | private: combChildren ls x rs = reverse ls ++ [x] ++ rs
combChildren :: List c =>
c a -- ^ ls
-> a -- ^ x
-> c a -- ^ rs
-> c a
combChildren ls t rs = joinL $ foldlL (flip cons) (cons t rs) ls
reverseL :: List c => c a -> c a
reverseL = joinL . foldlL (flip cons) mzero
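-- Example (an illustrative sketch, not part of the library). These are
-- plain-list analogues of the private helpers 'splitChildrenM' and
-- 'combChildren', written against ordinary Haskell lists rather than the
-- 'List' class, to show the zipper convention used above: the elements to the
-- left of the focus are stored in reverse order.
--
-- > -- Split a list at index n into (reversed prefix, element, suffix).
-- > splitChildren :: [a] -> Int -> Maybe ([a], a, [a])
-- > splitChildren _ n | n < 0 = Nothing
-- > splitChildren cs n = go [] cs n
-- >   where
-- >     go _   []     _ = Nothing                  -- index out of range
-- >     go acc (x:xs) 0 = Just (acc, x, xs)
-- >     go acc (x:xs) k = go (x:acc) xs (k - 1)    -- push onto reversed prefix
-- >
-- > -- Undo the split: reverse the prefix back onto the element and suffix.
-- > combChildren :: [a] -> a -> [a] -> [a]
-- > combChildren ls x rs = foldl (flip (:)) (x:rs) ls
-- >
-- > main :: IO ()
-- > main = do
-- >     -- The prefix comes back reversed.
-- >     print $ splitChildren "abcde" 2
-- >     -- Splitting and recombining gives back the original list.
-- >     print $ fmap (\(ls, x, rs) -> combChildren ls x rs) (splitChildren "abcde" 2)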
hexpat-0.20.13/Text/XML/Expat/Annotated.hs 0000644 0000000 0000000 00000031247 13122604047 016260 0 ustar 00 0000000 0000000
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, TypeFamilies,
FlexibleContexts, ScopedTypeVariables #-}
-- | A variant of /Node/ in which Element nodes have an annotation of any type,
-- and some concrete functions that annotate with the XML parse location.
--
-- The names conflict with those in /Tree/ so you must use qualified import
-- if you want to use both modules.
module Text.XML.Expat.Annotated (
-- * Tree structure
Node,
NodeG(..),
UNode,
LNode,
ULNode,
-- * Generic node manipulation
module Text.XML.Expat.Internal.NodeClass,
-- * Annotation-specific
modifyAnnotation,
mapAnnotation,
-- * Qualified nodes
QNode,
QLNode,
module Text.XML.Expat.Internal.Qualified,
-- * Namespaced nodes
NNode,
NLNode,
module Text.XML.Expat.Internal.Namespaced,
-- * Parse to tree
ParseOptions(..),
defaultParseOptions,
Encoding(..),
parse,
parse',
parseG,
XMLParseError(..),
XMLParseLocation(..),
-- * Variant that throws exceptions
parseThrowing,
XMLParseException(..),
-- * Convert from SAX
saxToTree,
saxToTreeG,
-- * Abstraction of string types
GenericXMLString(..)
) where
import Control.Arrow (first)
import Text.XML.Expat.SAX ( Encoding(..)
, GenericXMLString(..)
, ParseOptions(..)
, defaultParseOptions
, SAXEvent(..)
, XMLParseError(..)
, XMLParseException(..)
, XMLParseLocation(..) )
import qualified Text.XML.Expat.SAX as SAX
import Text.XML.Expat.Internal.Namespaced
import Text.XML.Expat.Internal.NodeClass
import Text.XML.Expat.Internal.Qualified
import Control.Monad (mplus, mzero)
import Control.DeepSeq
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.List.Class (List(..), ListItem(..), cons, foldlL, joinM)
import Data.Monoid
-- | Annotated variant of the tree representation of the XML document, meaning
-- that it has an extra piece of information of your choice attached to each
-- Element.
--
-- @c@ is the container type for the element's children, which would normally be [],
-- but could potentially be a monadic list type to allow for chunked I/O.
--
-- @tag@ is the tag type, which can either be one of several string types,
-- or a special type from the @Text.XML.Expat.Internal.Namespaced@ or
-- @Text.XML.Expat.Internal.Qualified@ modules.
--
-- @text@ is the string type for text content.
--
-- @a@ is the type of the annotation. One of the things this can be used for
-- is to store the XML parse location, which is useful for error handling.
--
-- Note that some functions in the @Text.XML.Expat.Cursor@ module need to create
-- new nodes through the 'MkElementClass' type class. Normally this can only be done
-- if @a@ is a Maybe type or () (so it can provide the Nothing value for the annotation
-- on newly created nodes). Or, you can write your own 'MkElementClass' instance.
-- Apart from that, there is no requirement for @a@ to be a Maybe type.
data NodeG a c tag text =
Element {
eName :: !tag,
eAttributes :: ![(tag,text)],
eChildren :: c (NodeG a c tag text),
eAnn :: a
} |
Text !text
type instance ListOf (NodeG a c tag text) = c (NodeG a c tag text)
-- | A pure tree representation that uses a list as its container type,
-- annotated variant.
--
-- In this module, a list of nodes has the type @[Node a tag text]@, but note
-- that you can also use the more general type function 'ListOf' to give a list of
-- any node type, using that node's associated list type, e.g.
-- @ListOf (UNode a Text)@.
type Node a tag text = NodeG a [] tag text
instance (Show tag, Show text, Show a) => Show (NodeG a [] tag text) where
showsPrec d (Element na at ch an) = showParen (d > 10) $
("Element "++) . showsPrec 11 na . (" "++) .
showsPrec 11 at . (" "++) .
showsPrec 11 ch . (" "++) .
showsPrec 11 an
showsPrec d (Text t) = showParen (d > 10) $ ("Text "++) . showsPrec 11 t
instance (Eq tag, Eq text, Eq a) => Eq (NodeG a [] tag text) where
Element na1 at1 ch1 an1 == Element na2 at2 ch2 an2 =
na1 == na2 &&
at1 == at2 &&
ch1 == ch2 &&
an1 == an2
Text t1 == Text t2 = t1 == t2
_ == _ = False
instance (NFData tag, NFData text, NFData a) => NFData (NodeG a [] tag text) where
rnf (Element nam att chi ann) = rnf (nam, att, chi, ann)
rnf (Text txt) = rnf txt
instance (Functor c, List c) => NodeClass (NodeG a) c where
textContentM (Element _ _ children _) = foldlL mappend mempty $ joinM $ fmap textContentM children
textContentM (Text txt) = return txt
isElement (Element _ _ _ _) = True
isElement _ = False
isText (Text _) = True
isText _ = False
isCData _ = False
isProcessingInstruction _ = False
isComment _ = False
isNamed _ (Text _) = False
isNamed nm (Element nm' _ _ _) = nm == nm'
getName (Text _) = mempty
getName (Element name _ _ _) = name
hasTarget _ _ = False
getTarget _ = mempty
getAttributes (Text _) = []
getAttributes (Element _ attrs _ _) = attrs
getChildren (Text _) = mzero
getChildren (Element _ _ ch _) = ch
getText (Text txt) = txt
getText (Element _ _ _ _) = mempty
modifyName _ node@(Text _) = node
modifyName f (Element n a c ann) = Element (f n) a c ann
modifyAttributes _ node@(Text _) = node
modifyAttributes f (Element n a c ann) = Element n (f a) c ann
modifyChildren _ node@(Text _) = node
modifyChildren f (Element n a c ann) = Element n a (f c) ann
mapAllTags _ (Text t) = Text t
mapAllTags f (Element n a c ann) = Element (f n) (map (first f) a) (fmap (mapAllTags f) c) ann
modifyElement _ (Text t) = Text t
modifyElement f (Element n a c ann) =
let (n', a', c') = f (n, a, c)
in Element n' a' c' ann
mapNodeContainer f (Element n a ch an) = do
ch' <- mapNodeListContainer f ch
return $ Element n a ch' an
mapNodeContainer _ (Text t) = return $ Text t
mkText = Text
instance (Functor c, List c) => MkElementClass (NodeG (Maybe a)) c where
mkElement name attrs children = Element name attrs children Nothing
instance (Functor c, List c) => MkElementClass (NodeG ()) c where
mkElement name attrs children = Element name attrs children ()
-- | Type alias for an annotated node with unqualified tag names where
-- tag and text are the same string type
type UNode a text = Node a text text
-- | Type alias for an annotated node, annotated with parse location
type LNode tag text = Node XMLParseLocation tag text
-- | Type alias for an annotated node with unqualified tag names where
-- tag and text are the same string type, annotated with parse location
type ULNode text = LNode text text
-- | Type alias for an annotated node where qualified names are used for tags
type QNode a text = Node a (QName text) text
-- | Type alias for an annotated node where qualified names are used for tags, annotated with parse location
type QLNode text = LNode (QName text) text
-- | Type alias for an annotated node where namespaced names are used for tags
type NNode a text = Node a (NName text) text
-- | Type alias for an annotated node where namespaced names are used for tags, annotated with parse location
type NLNode text = LNode (NName text) text
-- | Modify this node's annotation (non-recursively) if it's an element, otherwise no-op.
modifyAnnotation :: (a -> a) -> Node a tag text -> Node a tag text
f `modifyAnnotation` Element na at ch an = Element na at ch (f an)
_ `modifyAnnotation` Text t = Text t
-- | Modify this node's annotation and those of all its descendants recursively if it's an element, otherwise no-op.
mapAnnotation :: (a -> b) -> Node a tag text -> Node b tag text
f `mapAnnotation` Element na at ch an = Element na at (map (f `mapAnnotation`) ch) (f an)
_ `mapAnnotation` Text t = Text t
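-- Example (an illustrative sketch, not part of the library). The document
-- @doc@ and its 'Int' annotations are made up; any annotation type with a
-- 'Show' instance would do for printing. It contrasts the non-recursive
-- 'modifyAnnotation' with the recursive, type-changing 'mapAnnotation'.
--
-- > import Text.XML.Expat.Annotated
-- >
-- > -- <a><b/>x</a>, with element "a" annotated 1 and element "b" annotated 2.
-- > doc :: UNode Int String
-- > doc = Element "a" [] [Element "b" [] [] 2, Text "x"] 1
-- >
-- > main :: IO ()
-- > main = do
-- >     -- Only the annotation of the top element changes.
-- >     print $ modifyAnnotation (+ 10) doc
-- >     -- Every element's annotation is converted, here from Int to String.
-- >     print $ mapAnnotation show doc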
-- | A lower level function that lazily converts a SAX stream into a tree structure.
-- Variant that takes annotations for start tags.
saxToTree :: GenericXMLString tag =>
[(SAXEvent tag text, a)]
-> (Node a tag text, Maybe XMLParseError)
saxToTree events =
let (nodes, mError, _) = ptl events
in (findRoot nodes, mError)
where
findRoot (elt@(Element _ _ _ _):_) = elt
findRoot (_:nodes) = findRoot nodes
findRoot [] = Element (gxFromString "") [] [] (error "saxToTree null annotation")
ptl ((StartElement name attrs, ann):rema) =
let (children, err1, rema') = ptl rema
elt = Element name attrs children ann
(out, err2, rema'') = ptl rema'
in (elt:out, err1 `mplus` err2, rema'')
ptl ((EndElement _, _):rema) = ([], Nothing, rema)
ptl ((CharacterData txt, _):rema) =
let (out, err, rema') = ptl rema
in (Text txt:out, err, rema')
ptl ((FailDocument err, _):_) = ([], Just err, [])
ptl (_:rema) = ptl rema -- extended node types not supported in this tree type
ptl [] = ([], Nothing, [])
-- | A lower level function that converts a generalized SAX stream into a tree structure.
-- Ignores parse errors.
saxToTreeG :: forall l a tag text . (GenericXMLString tag, List l, Monad (ItemM l)) =>
l (SAXEvent tag text, a)
-> ItemM l (NodeG a l tag text)
saxToTreeG events = do
(elts, _) <- process events
findRoot elts
where
findRoot :: l (NodeG a l tag text) -> ItemM l (NodeG a l tag text)
findRoot elts = do
li <- runList elts
case li of
Cons elt@(Element _ _ _ _) _ -> return elt
Cons _ rema -> findRoot rema
Nil -> return $ Element (gxFromString "") mzero mzero (error "saxToTree null annotation")
process :: l (SAXEvent tag text, a)
-> ItemM l (l (NodeG a l tag text), l (SAXEvent tag text, a))
process events = do
li <- runList events
case li of
Nil -> return (mzero, mzero)
Cons (StartElement name attrs, ann) rema -> do
(children, rema') <- process rema
(out, rema'') <- process rema'
return (Element name attrs children ann `cons` out, rema'')
Cons (EndElement _, _) rema -> return (mzero, rema)
Cons (CharacterData txt, _) rema -> do
(out, rema') <- process rema
return (Text txt `cons` out, rema')
--Cons (FailDocument err) rema = (mzero, mzero)
Cons _ rema -> process rema
-- | Lazily parse XML to tree. Note that forcing the XMLParseError return value
-- will force the entire parse. Therefore, to ensure lazy operation, don't
-- check the error status until you have processed the tree.
parse :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> (LNode tag text, Maybe XMLParseError)
parse opts bs = saxToTree $ SAX.parseLocations opts bs
-- | Parse a generalized list to a tree, ignoring parse errors.
-- This function allows for a parse from an enumerator/iteratee to a "lazy"
-- tree structure using the @List-enumerator@ package.
parseG :: (GenericXMLString tag, GenericXMLString text, List l) =>
ParseOptions tag text -- ^ Parse options
-> l ByteString -- ^ Input text as a generalized list of blocks
-> ItemM l (NodeG XMLParseLocation l tag text)
parseG opts = saxToTreeG . SAX.parseLocationsG opts
-- | Lazily parse XML to tree. In the event of an error, throw 'XMLParseException'.
--
-- @parseThrowing@ can throw an exception from pure code, which is generally a bad
-- way to handle errors, because Haskell\'s lazy evaluation means it\'s hard to
-- predict where it will be thrown from. However, it may be acceptable in
-- situations where a parse failure is not expected during normal operation,
-- depending on the design of your program.
parseThrowing :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> LNode tag text
parseThrowing opts bs = fst $ saxToTree $ SAX.parseLocationsThrowing opts bs
-- | Strictly parse XML to tree. Returns error message or valid parsed tree.
parse' :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> B.ByteString -- ^ Input text (a strict ByteString)
-> Either XMLParseError (LNode tag text)
parse' opts doc = case parse opts (L.fromChunks [doc]) of
(xml, Nothing) -> Right xml
(_, Just err) -> Left err
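-- Example (an illustrative sketch, not part of the library). The input
-- document is made up, and @report@ is a hypothetical helper. It shows a
-- strict parse with this module's 'parse'', where each element carries the
-- 'XMLParseLocation' at which it was found.
--
-- > import Text.XML.Expat.Annotated
-- > import qualified Data.ByteString.Char8 as B
-- >
-- > main :: IO ()
-- > main =
-- >     case parse' defaultParseOptions (B.pack "<a>\n  <b/>\n</a>") of
-- >         Left err   -> putStrLn $ "XML parse failed: " ++ show err
-- >         -- Report the root element and each of its direct children.
-- >         Right tree -> mapM_ report (tree : eChildren (tree :: ULNode String))
-- >   where
-- >     -- Print an element's name with its parse location; skip text nodes.
-- >     report (Element name _ _ loc) = putStrLn $ name ++ ": " ++ show loc
-- >     report (Text _)               = return ()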
hexpat-0.20.13/Text/XML/Expat/Tree.hs 0000644 0000000 0000000 00000033430 13122604047 015236 0 ustar 00 0000000 0000000
{-# LANGUAGE DeriveDataTypeable, TypeSynonymInstances, FlexibleInstances,
MultiParamTypeClasses, TypeFamilies, ScopedTypeVariables #-}
-- hexpat, a Haskell wrapper for expat
-- Copyright (C) 2008 Evan Martin
-- Copyright (C) 2009 Stephen Blackheath
-- | This module provides functions to parse an XML document to a tree structure,
-- either strictly or lazily.
--
-- The 'GenericXMLString' type class allows you to use any string type. Three
-- string types are supported out of the box: 'String', 'ByteString' and 'Text'.
--
-- Here is a complete example to get you started:
--
-- > -- | A "hello world" example of hexpat that lazily parses a document, printing
-- > -- it to standard out.
-- >
-- > import Text.XML.Expat.Tree
-- > import Text.XML.Expat.Format
-- > import System.Environment
-- > import System.Exit
-- > import System.IO
-- > import qualified Data.ByteString.Lazy as L
-- >
-- > main = do
-- >     args <- getArgs
-- >     case args of
-- >         [filename] -> process filename
-- >         _          -> do
-- >             hPutStrLn stderr "Usage: helloworld "
-- >             exitWith $ ExitFailure 1
-- >
-- > process :: String -> IO ()
-- > process filename = do
-- >     inputText <- L.readFile filename
-- >     -- Note: Because we're not using the tree, Haskell can't infer the type of
-- >     -- strings we're using so we need to tell it explicitly with a type signature.
-- >     let (xml, mErr) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)
-- >     -- Process document before handling error, so we get lazy processing.
-- >     L.hPutStr stdout $ format xml
-- >     putStrLn ""
-- >     case mErr of
-- >         Nothing -> return ()
-- >         Just err -> do
-- >             hPutStrLn stderr $ "XML parse failed: "++show err
-- >             exitWith $ ExitFailure 2
--
-- Error handling in strict parses is very straightforward - just check the
-- 'Either' return value. Lazy parses are not so simple. Here are two working
-- examples that illustrate the ways to handle errors:
--
-- Way no. 1 - Using a Maybe value
--
-- > import Text.XML.Expat.Tree
-- > import qualified Data.ByteString.Lazy as L
-- > import Data.ByteString.Internal (c2w)
-- >
-- > -- This is the recommended way to handle errors in lazy parses
-- > main = do
-- >     let (tree, mError) = parse defaultParseOptions
-- >                    (L.pack $ map c2w $ "")
-- >     print (tree :: UNode String)
-- >
-- >     -- Note: We check the error _after_ we have finished our processing
-- >     -- on the tree.
-- >     case mError of
-- >         Just err -> putStrLn $ "It failed : "++show err
-- >         Nothing -> putStrLn "Success!"
--
-- Way no. 2 - Using exceptions
--
-- 'parseThrowing' can throw an exception from pure code, which is generally a bad
-- way to handle errors, because Haskell\'s lazy evaluation means it\'s hard to
-- predict where it will be thrown from. However, it may be acceptable in
-- situations where a parse failure is not expected during normal operation,
-- depending on the design of your program.
--
-- > ...
-- > import Control.Exception.Extensible as E
-- >
-- > -- This is not the recommended way to handle errors.
-- > main = do
-- >     do
-- >         let tree = parseThrowing defaultParseOptions
-- >                        (L.pack $ map c2w $ "")
-- >         print (tree :: UNode String)
-- >         -- Because of lazy evaluation, you should not process the tree outside
-- >         -- the 'do' block, or exceptions could be thrown that won't get caught.
-- >       `E.catch` (\exc ->
-- >         case E.fromException exc of
-- >             Just (XMLParseException err) -> putStrLn $ "It failed : "++show err
-- >             Nothing -> E.throwIO exc)
module Text.XML.Expat.Tree (
-- * Tree structure
Node,
NodeG(..),
UNode,
-- * Generic node manipulation
module Text.XML.Expat.Internal.NodeClass,
-- * Qualified nodes
QNode,
module Text.XML.Expat.Internal.Qualified,
-- * Namespaced nodes
NNode,
module Text.XML.Expat.Internal.Namespaced,
-- * Parse to tree
ParseOptions(..),
defaultParseOptions,
Encoding(..),
parse,
parse',
parseG,
XMLParseError(..),
XMLParseLocation(..),
-- * Variant that throws exceptions
parseThrowing,
XMLParseException(..),
-- * Convert from SAX
saxToTree,
saxToTreeG,
-- * Abstraction of string types
GenericXMLString(..)
) where
import Text.XML.Expat.SAX ( Encoding(..)
, GenericXMLString(..)
, ParseOptions(..)
, defaultParseOptions
, SAXEvent(..)
, XMLParseError(..)
, XMLParseException(..)
, XMLParseLocation(..) )
import qualified Text.XML.Expat.SAX as SAX
import Text.XML.Expat.Internal.Namespaced
import Text.XML.Expat.Internal.NodeClass
import Text.XML.Expat.Internal.Qualified
import Control.Arrow
import Control.Monad (mplus, mzero)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy as L
import Data.List.Class
import Data.Monoid (Monoid,mempty,mappend)
import Control.DeepSeq
-- | The tree representation of the XML document.
--
-- @c@ is the container type for the element's children, which would normally be [],
-- but could potentially be a monadic list type to allow for chunked I/O.
--
-- @tag@ is the tag type, which can either be one of several string types,
-- or a special type from the @Text.XML.Expat.Internal.Namespaced@ or
-- @Text.XML.Expat.Internal.Qualified@ modules.
--
-- @text@ is the string type for text content.
data NodeG c tag text =
Element {
eName :: !tag,
eAttributes :: ![(tag,text)],
eChildren :: c (NodeG c tag text)
} |
Text !text
type instance ListOf (NodeG c tag text) = c (NodeG c tag text)
instance (Show tag, Show text) => Show (NodeG [] tag text) where
showsPrec d (Element na at ch) = showParen (d > 10) $
("Element "++) . showsPrec 11 na . (" "++) .
showsPrec 11 at . (" "++) .
showsPrec 11 ch
showsPrec d (Text t) = showParen (d > 10) $ ("Text "++) . showsPrec 11 t
instance (Eq tag, Eq text) => Eq (NodeG [] tag text) where
Element na1 at1 ch1 == Element na2 at2 ch2 =
na1 == na2 &&
at1 == at2 &&
ch1 == ch2
Text t1 == Text t2 = t1 == t2
_ == _ = False
-- | A pure tree representation that uses a list as its container type.
--
-- In the @hexpat@ package, a list of nodes has the type @[Node tag text]@, but note
-- that you can also use the more general type function 'ListOf' to give a list of
-- any node type, using that node's associated list type, e.g.
-- @ListOf (UNode Text)@.
type Node tag text = NodeG [] tag text
instance (NFData tag, NFData text) => NFData (NodeG [] tag text) where
rnf (Element nam att chi) = rnf (nam, att, chi)
rnf (Text txt) = rnf txt
-- | Type alias for a node with unqualified tag names where tag and
-- text are the same string type.
type UNode text = Node text text
-- | Type alias for a node where qualified names are used for tags
type QNode text = Node (QName text) text
-- | Type alias for a node where namespaced names are used for tags
type NNode text = Node (NName text) text
instance (Functor c, List c) => NodeClass NodeG c where
textContentM (Element _ _ children) = foldlL mappend mempty $ joinM $ fmap textContentM children
textContentM (Text txt) = return txt
isElement (Element _ _ _) = True
isElement _ = False
isText (Text _) = True
isText _ = False
isCData _ = False
isProcessingInstruction _ = False
isComment _ = False
isNamed _ (Text _) = False
isNamed nm (Element nm' _ _) = nm == nm'
getName (Text _) = mempty
getName (Element name _ _) = name
hasTarget _ _ = False
getTarget _ = mempty
getAttributes (Text _) = []
getAttributes (Element _ attrs _) = attrs
getChildren (Text _) = mzero
getChildren (Element _ _ ch) = ch
getText (Text txt) = txt
getText (Element _ _ _) = mempty
modifyName _ node@(Text _) = node
modifyName f (Element n a c) = Element (f n) a c
modifyAttributes _ node@(Text _) = node
modifyAttributes f (Element n a c) = Element n (f a) c
modifyChildren _ node@(Text _) = node
modifyChildren f (Element n a c) = Element n a (f c)
mapAllTags _ (Text t) = Text t
mapAllTags f (Element n a c) = Element (f n) (map (first f) a) (fmap (mapAllTags f) c)
modifyElement _ (Text t) = Text t
modifyElement f (Element n a c) =
let (n', a', c') = f (n, a, c)
in Element n' a' c'
mapNodeContainer f (Element n a ch) = do
ch' <- mapNodeListContainer f ch
return $ Element n a ch'
mapNodeContainer _ (Text t) = return $ Text t
mkText = Text
instance (Functor c, List c) => MkElementClass NodeG c where
mkElement name attrs children = Element name attrs children
-- | Strictly parse XML to tree. Returns error message or valid parsed tree.
parse' :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> ByteString -- ^ Input text (a strict ByteString)
-> Either XMLParseError (Node tag text)
parse' opts doc = case parse opts (L.fromChunks [doc]) of
(xml, Nothing) -> Right xml
(_, Just err) -> Left err
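-- Example (an illustrative sketch, not part of the library). The input
-- document is made up. It shows the strict case described in the module
-- header: 'parse'' returns an 'Either', so error handling is a single
-- pattern match and no care about evaluation order is needed.
--
-- > import Text.XML.Expat.Tree
-- > import qualified Data.ByteString.Char8 as B
-- >
-- > main :: IO ()
-- > main =
-- >     case parse' defaultParseOptions (B.pack "<greeting>hello</greeting>") of
-- >         Left err  -> putStrLn $ "XML parse failed: " ++ show err
-- >         -- The type annotation picks String for both tags and text.
-- >         Right xml -> print (xml :: UNode String)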
-- | A lower level function that lazily converts a SAX stream into a tree structure.
saxToTree :: GenericXMLString tag =>
[SAXEvent tag text]
-> (Node tag text, Maybe XMLParseError)
saxToTree events =
let (nodes, mError, _) = ptl events
in (findRoot nodes, mError)
where
findRoot (elt@(Element _ _ _):_) = elt
findRoot (_:nodes) = findRoot nodes
findRoot [] = Element (gxFromString "") [] []
ptl (StartElement name attrs:rema) =
let (children, err1, rema') = ptl rema
elt = Element name attrs children
(out, err2, rema'') = ptl rema'
in (elt:out, err1 `mplus` err2, rema'')
ptl (EndElement _:rema) = ([], Nothing, rema)
ptl (CharacterData txt:rema) =
let (out, err, rema') = ptl rema
in (Text txt:out, err, rema')
ptl (FailDocument err:_) = ([], Just err, [])
ptl (_:rema) = ptl rema -- extended node types not supported in this tree type
ptl [] = ([], Nothing, [])
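-- Usage sketch (not part of the original source): 'saxToTree' can also be fed
-- a hand-built event list, which may help when testing code that consumes
-- trees. The events below are invented for illustration.
--
-- > import Text.XML.Expat.SAX (SAXEvent(..))
-- > import Text.XML.Expat.Tree
-- >
-- > example :: (UNode String, Maybe XMLParseError)
-- > example = saxToTree
-- >   [ StartElement "a" [("k", "v")]
-- >   , CharacterData "hi"
-- >   , EndElement "a"
-- >   ]
-- > -- fst example is Element "a" [("k","v")] [Text "hi"]; snd example is Nothing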
-- | A lower level function that converts a generalized SAX stream into a tree structure.
-- Ignores parse errors.
saxToTreeG :: forall tag text l . (GenericXMLString tag, List l) =>
l (SAXEvent tag text)
-> ItemM l (NodeG l tag text)
saxToTreeG events = do
li <- runList (process events)
case li of
Cons elt@(Element _ _ _ ) _ -> return elt
_ -> return $ Element (gxFromString "") mzero mzero
where
process :: l (SAXEvent tag text) -> l (NodeG l tag text)
process events = joinL $ process_ events
where
process_ :: l (SAXEvent tag text) -> ItemM l (l (NodeG l tag text))
process_ events = do
li <- runList events
case li of
Nil -> return mzero
Cons (StartElement name attrs) rema -> do
return $ Element name attrs (process rema) `cons` process (stripElement rema)
Cons (EndElement _) _ -> return mzero
Cons (CharacterData txt) rema -> return $ Text txt `cons` process rema
Cons _ rema -> process_ rema
stripElement :: l (SAXEvent tag text) -> l (SAXEvent tag text)
stripElement events = joinL $ stripElement_ 0 events
where
stripElement_ :: Int -> l (SAXEvent tag text) -> ItemM l (l (SAXEvent tag text))
stripElement_ level events = level `seq` do
li <- runList events
case li of
Nil -> return mzero
Cons (StartElement _ _) rema -> stripElement_ (level+1) rema
Cons (EndElement _) rema -> if level == 0 then return rema
else stripElement_ (level-1) rema
Cons _ rema -> stripElement_ level rema
-- | Lazily parse XML to tree. Note that forcing the XMLParseError return value
-- will force the entire parse. Therefore, to ensure lazy operation, don't
-- check the error status until you have processed the tree.
parse :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> (Node tag text, Maybe XMLParseError)
parse opts bs = saxToTree $ SAX.parse opts bs
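-- Usage sketch (not part of the original source): one way to keep the parse
-- lazy is to consume the tree first and only then inspect the error. This
-- assumes stdin carries the document, and uses 'format' from
-- "Text.XML.Expat.Format" as a stand-in for real processing.
--
-- > import qualified Data.ByteString.Lazy as L
-- > import System.IO (hPutStrLn, stderr)
-- > import Text.XML.Expat.Format (format)
-- > import Text.XML.Expat.Tree
-- >
-- > main :: IO ()
-- > main = do
-- >   input <- L.getContents
-- >   let (tree, mErr) = parse defaultParseOptions input
-- >                        :: (UNode String, Maybe XMLParseError)
-- >   L.putStr (format tree)            -- consume the tree first
-- >   case mErr of                      -- only now force the error status
-- >     Nothing  -> return ()
-- >     Just err -> hPutStrLn stderr ("parse error: " ++ show err)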
-- | Parse a generalized list to a tree, ignoring parse errors.
-- This function allows for a parse from an enumerator/iteratee to a "lazy"
-- tree structure using the @List-enumerator@ package.
parseG :: (GenericXMLString tag, GenericXMLString text, List l) =>
ParseOptions tag text -- ^ Parse options
-> l ByteString -- ^ Input text as a generalized list of blocks
-> ItemM l (NodeG l tag text)
parseG opts = saxToTreeG . SAX.parseG opts
-- | Lazily parse XML to tree. In the event of an error, throw 'XMLParseException'.
--
-- @parseThrowing@ can throw an exception from pure code, which is generally a bad
-- way to handle errors, because Haskell\'s lazy evaluation means it\'s hard to
-- predict where it will be thrown from. However, it may be acceptable in
-- situations where it's not expected during normal operation, depending on the
-- design of your program.
parseThrowing :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> Node tag text
parseThrowing opts bs = fst $ saxToTree $ SAX.parseThrowing opts bs
hexpat-0.20.13/Text/XML/Expat/SAX.hs 0000644 0000000 0000000 00000036422 13122604047 014776 0 ustar 00 0000000 0000000 {-# LANGUAGE DeriveDataTypeable, TypeSynonymInstances, CPP, ScopedTypeVariables, FlexibleInstances, GADTs #-}
{-# OPTIONS_GHC -fno-cse -fno-full-laziness #-}
-- hexpat, a Haskell wrapper for expat
-- Copyright (C) 2008 Evan Martin
-- Copyright (C) 2009 Stephen Blackheath
-- | This module provides functions to parse an XML document to a lazy
-- stream of SAX events.
module Text.XML.Expat.SAX (
-- * XML primitives
Encoding(..),
XMLParseError(..),
XMLParseLocation(..),
-- * SAX-style parse
ParseOptions(..),
SAXEvent(..),
parse,
parseG,
parseLocations,
parseLocationsG,
parseLocationsThrowing,
parseThrowing,
defaultParseOptions,
-- * Variants that throw exceptions
XMLParseException(..),
-- * Abstraction of string types
GenericXMLString(..)
) where
import Control.Concurrent.MVar
import Control.Exception as Exc
import Text.XML.Expat.Internal.IO
import Data.Bits
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Internal as I
import Data.Int
import Data.ByteString.Internal (c2w, w2c, c_strlen)
import qualified Data.Monoid as M
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Codec.Binary.UTF8.String as U8
import Data.List.Class (List(..), ListItem(..), cons, fromList, mapL)
import Data.Typeable
import Data.Word
import Control.Applicative
import Control.DeepSeq
import Control.Monad
import System.IO.Unsafe
import Foreign.C
import Foreign.ForeignPtr
import Foreign.Marshal.Array
import Foreign.Ptr
import Foreign.Storable
data ParseOptions tag text = ParseOptions
{ overrideEncoding :: Maybe Encoding
-- ^ The encoding parameter, if provided, overrides the document's
-- encoding declaration.
, entityDecoder :: Maybe (tag -> Maybe text)
-- ^ If provided, entity references (i.e. @&nbsp;@ and friends) will
-- be decoded into text using the supplied lookup function
}
defaultParseOptions :: ParseOptions tag text
defaultParseOptions = ParseOptions Nothing Nothing
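-- Usage sketch (not part of the original source): supplying an entity decoder
-- by record update. The entity table and the name @htmlishOptions@ are made up
-- for illustration; entities the function does not know yield 'Nothing'.
--
-- > import Text.XML.Expat.SAX
-- >
-- > htmlishOptions :: ParseOptions String String
-- > htmlishOptions = defaultParseOptions { entityDecoder = Just decode }
-- >   where
-- >     decode "nbsp" = Just "\160"   -- U+00A0, no-break space
-- >     decode "copy" = Just "\169"   -- U+00A9, copyright sign
-- >     decode _      = Nothing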
-- | An abstraction for any string type you want to use as xml text (that is,
-- attribute values or element text content). If you want to use a
-- new string type with /hexpat/, you must make it an instance of
-- 'GenericXMLString'.
class (M.Monoid s, Eq s) => GenericXMLString s where
gxNullString :: s -> Bool
gxToString :: s -> String
gxFromString :: String -> s
gxFromChar :: Char -> s
gxHead :: s -> Char
gxTail :: s -> s
gxBreakOn :: Char -> s -> (s, s)
gxFromByteString :: B.ByteString -> s
gxToByteString :: s -> B.ByteString
instance GenericXMLString String where
gxNullString = null
gxToString = id
gxFromString = id
gxFromChar c = [c]
gxHead = head
gxTail = tail
gxBreakOn c = break (==c)
gxFromByteString = U8.decode . B.unpack
gxToByteString = B.pack . map c2w . U8.encodeString
instance GenericXMLString B.ByteString where
gxNullString = B.null
gxToString = U8.decodeString . map w2c . B.unpack
gxFromString = B.pack . map c2w . U8.encodeString
gxFromChar = B.singleton . c2w
gxHead = w2c . B.head
gxTail = B.tail
gxBreakOn c = B.break (== c2w c)
gxFromByteString = id
gxToByteString = id
instance GenericXMLString T.Text where
gxNullString = T.null
gxToString = T.unpack
gxFromString = T.pack
gxFromChar = T.singleton
gxHead = T.head
gxTail = T.tail
#if MIN_VERSION_text(0,11,0)
gxBreakOn c = T.break (==c)
#elif MIN_VERSION_text(0,10,0)
-- breakBy gets renamed to break between 0.10.0.0 and 0.10.0.1.
-- There's no 'break' function that is consistent between these two
-- versions so we work around it using other functions.
gxBreakOn c t = (T.takeWhile (/=c) t, T.dropWhile (/=c) t)
#else
gxBreakOn c = T.breakBy (==c)
#endif
gxFromByteString = TE.decodeUtf8
gxToByteString = TE.encodeUtf8
data SAXEvent tag text =
XMLDeclaration text (Maybe text) (Maybe Bool) |
StartElement tag [(tag, text)] |
EndElement tag |
CharacterData text |
StartCData |
EndCData |
ProcessingInstruction text text |
Comment text |
FailDocument XMLParseError
deriving (Eq, Show)
instance (NFData tag, NFData text) => NFData (SAXEvent tag text) where
rnf (XMLDeclaration ver mEnc mSD) = rnf ver `seq` rnf mEnc `seq` rnf mSD
rnf (StartElement tag atts) = rnf tag `seq` rnf atts
rnf (EndElement tag) = rnf tag
rnf (CharacterData text) = rnf text
rnf StartCData = ()
rnf EndCData = ()
rnf (ProcessingInstruction target text) = rnf target `seq` rnf text
rnf (Comment text) = rnf text
rnf (FailDocument err) = rnf err
-- | Parse a generalized list of ByteStrings containing XML to SAX events.
-- In the event of an error, FailDocument is the last element of the output list.
parseG :: forall tag text l . (GenericXMLString tag, GenericXMLString text, List l) =>
ParseOptions tag text -- ^ Parse options
-> l ByteString -- ^ Input text as a generalized list of blocks
-> l (SAXEvent tag text)
{-# NOINLINE parseG #-}
parseG opts inputBlocks = mapL (return . fst) $ parseImpl opts inputBlocks False noExtra failureA
where noExtra _ offset = return ((), offset)
failureA _ = return ()
-- | Parse a generalized list of ByteStrings containing XML to SAX events,
-- each paired with its document location.
-- In the event of an error, FailDocument is the last element of the output list.
parseLocationsG :: forall tag text l . (GenericXMLString tag, GenericXMLString text, List l) =>
ParseOptions tag text -- ^ Parse options
-> l ByteString -- ^ Input text as a generalized list of blocks
-> l (SAXEvent tag text, XMLParseLocation)
{-# NOINLINE parseLocationsG #-}
parseLocationsG opts inputBlocks = parseImpl opts inputBlocks True fetchLocation id
where
fetchLocation pBuf offset = do
[a, b, c, d] <- peekArray 4 (pBuf `plusPtr` offset :: Ptr Int64)
return (XMLParseLocation a b c d, offset + 32)
parseImpl :: forall a tag text l . (GenericXMLString tag, GenericXMLString text, List l) =>
ParseOptions tag text -- ^ Parse options
-> l ByteString -- ^ Input text as a generalized list of blocks
-> Bool -- ^ True to add locations to binary output
-> (Ptr Word8 -> Int -> IO (a, Int)) -- ^ Fetch extra data
-> (IO XMLParseLocation -> IO a) -- ^ Fetch a value for failure case
-> l (SAXEvent tag text, a)
parseImpl opts inputBlocks addLocations extra failureA = runParser inputBlocks parse cacheRef
where
(parse, getLocation, cacheRef) = unsafePerformIO $ do
(parse, getLocation) <- hexpatNewParser
(overrideEncoding opts)
((\decode -> fmap gxToByteString . decode . gxFromByteString) <$> entityDecoder opts)
addLocations
cacheRef <- newMVar Nothing
return (parse, getLocation, cacheRef)
runParser iblks parse cacheRef = joinL $ do
li <- runList iblks
return $ unsafePerformIO $ do
mCached <- takeMVar cacheRef
case mCached of
Just l -> do
putMVar cacheRef mCached
return l
Nothing -> do
(saxen, rema) <- case li of
Nil -> do
(buf, len, mError) <- parse B.empty True
saxen <- parseBuf buf len extra
rema <- handleFailure mError mzero
return (saxen, rema)
Cons blk t -> {-unsafeInterleaveIO $-} do
(buf, len, mError) <- parse blk False
saxen <- parseBuf buf len extra
cacheRef' <- newMVar Nothing
rema <- handleFailure mError (runParser t parse cacheRef')
return (saxen, rema)
let l = fromList saxen `mplus` rema
putMVar cacheRef (Just l)
return l
where
handleFailure (Just err) _ = do a <- failureA getLocation
return $ (FailDocument err, a) `cons` mzero
handleFailure Nothing l = return l
parseBuf :: (GenericXMLString tag, GenericXMLString text) =>
ForeignPtr Word8 -> CInt -> (Ptr Word8 -> Int -> IO (a, Int)) -> IO [(SAXEvent tag text, a)]
parseBuf buf _ processExtra = withForeignPtr buf $ \pBuf -> doit [] pBuf 0
where
roundUp32 offset = (offset + 3) .&. complement 3
doit acc pBuf offset0 = offset0 `seq` do
typ <- peek (pBuf `plusPtr` offset0 :: Ptr Word32)
(a, offset) <- processExtra pBuf (offset0 + 4)
case typ of
0 -> return (reverse acc)
1 -> do
nAtts <- peek (pBuf `plusPtr` offset :: Ptr Word32)
let pName = pBuf `plusPtr` (offset + 4)
lName <- fromIntegral <$> c_strlen pName
let name = gxFromByteString $ I.fromForeignPtr buf (offset + 4) lName
(atts, offset') <- foldM (\(atts, offset) _ -> do
let pAtt = pBuf `plusPtr` offset
lAtt <- fromIntegral <$> c_strlen pAtt
let att = gxFromByteString $ I.fromForeignPtr buf offset lAtt
offset' = offset + lAtt + 1
pValue = pBuf `plusPtr` offset'
lValue <- fromIntegral <$> c_strlen pValue
let value = gxFromByteString $ I.fromForeignPtr buf offset' lValue
return ((att, value):atts, offset' + lValue + 1)
) ([], offset + 4 + lName + 1) [1,3..nAtts]
doit ((StartElement name (reverse atts), a) : acc) pBuf (roundUp32 offset')
2 -> do
let pName = pBuf `plusPtr` offset
lName <- fromIntegral <$> c_strlen pName
let name = gxFromByteString $ I.fromForeignPtr buf offset lName
offset' = offset + lName + 1
doit ((EndElement name, a) : acc) pBuf (roundUp32 offset')
3 -> do
len <- fromIntegral <$> peek (pBuf `plusPtr` offset :: Ptr Word32)
let text = gxFromByteString $ I.fromForeignPtr buf (offset + 4) len
offset' = offset + 4 + len
doit ((CharacterData text, a) : acc) pBuf (roundUp32 offset')
4 -> do
let pEnc = pBuf `plusPtr` offset
lEnc <- fromIntegral <$> c_strlen pEnc
let enc = gxFromByteString $ I.fromForeignPtr buf offset lEnc
offset' = offset + lEnc + 1
pVer = pBuf `plusPtr` offset'
pVerFirst <- peek (castPtr pVer :: Ptr Word8)
(mVer, offset'') <- case pVerFirst of
0 -> return (Nothing, offset' + 1)
1 -> do
lVer <- fromIntegral <$> c_strlen (pVer `plusPtr` 1)
return (Just $ gxFromByteString $ I.fromForeignPtr buf (offset' + 1) lVer, offset' + 1 + lVer + 1)
_ -> error "hexpat: bad data from C land"
cSta <- peek (pBuf `plusPtr` offset'' :: Ptr Int8)
let sta = if cSta < 0 then Nothing else
if cSta == 0 then Just False else
Just True
doit ((XMLDeclaration enc mVer sta, a) : acc) pBuf (roundUp32 (offset'' + 1))
5 -> doit ((StartCData, a) : acc) pBuf offset
6 -> doit ((EndCData, a) : acc) pBuf offset
7 -> do
let pTarget = pBuf `plusPtr` offset
lTarget <- fromIntegral <$> c_strlen pTarget
let target = gxFromByteString $ I.fromForeignPtr buf offset lTarget
offset' = offset + lTarget + 1
pData = pBuf `plusPtr` offset'
lData <- fromIntegral <$> c_strlen pData
let dat = gxFromByteString $ I.fromForeignPtr buf offset' lData
doit ((ProcessingInstruction target dat, a) : acc) pBuf (roundUp32 (offset' + lData + 1))
8 -> do
let pText = pBuf `plusPtr` offset
lText <- fromIntegral <$> c_strlen pText
let text = gxFromByteString $ I.fromForeignPtr buf offset lText
doit ((Comment text, a) : acc) pBuf (roundUp32 (offset + lText + 1))
_ -> error "hexpat: bad data from C land"
-- | Lazily parse XML to SAX events. In the event of an error, FailDocument is
-- the last element of the output list.
parse :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> [SAXEvent tag text]
parse opts input = parseG opts (L.toChunks input)
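-- Usage sketch (not part of the original source): walking the SAX stream
-- directly, assuming the document arrives on stdin. Note that traversing
-- @events@ twice, as done here for brevity, keeps the whole list in memory; a
-- single fold would stream better.
--
-- > import qualified Data.ByteString.Lazy as L
-- > import Text.XML.Expat.SAX
-- >
-- > main :: IO ()
-- > main = do
-- >   input <- L.getContents
-- >   let events = parse defaultParseOptions input :: [SAXEvent String String]
-- >   -- count the start tags
-- >   print (length [() | StartElement _ _ <- events])
-- >   -- a parse error, if any, shows up as a trailing FailDocument
-- >   mapM_ print [err | FailDocument err <- events]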
-- | An exception indicating an XML parse error, used by the /..Throwing/ variants.
data XMLParseException = XMLParseException XMLParseError
deriving (Eq, Show, Typeable)
instance Exception XMLParseException where
-- | A variant of 'parse' that gives a document location with each SAX event.
parseLocations :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> [(SAXEvent tag text, XMLParseLocation)]
parseLocations opts input = parseLocationsG opts (L.toChunks input)
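-- Usage sketch (not part of the original source): reporting the line on which
-- each element starts. This assumes 'XMLParseLocation' exposes a record field
-- named @xmlLineNumber@ (its definition lives in Text.XML.Expat.Internal.IO
-- and is not shown here); the document literal is invented.
--
-- > import qualified Data.ByteString.Lazy.Char8 as LC
-- > import Text.XML.Expat.SAX
-- >
-- > main :: IO ()
-- > main =
-- >   mapM_ report (parseLocations defaultParseOptions (LC.pack "<a>\n<b/>\n</a>"))
-- >   where
-- >     report :: (SAXEvent String String, XMLParseLocation) -> IO ()
-- >     report (StartElement name _, loc) =
-- >       putStrLn (name ++ " starts on line " ++ show (xmlLineNumber loc))
-- >     report _ = return ()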
-- | Lazily parse XML to SAX events. In the event of an error, throw
-- 'XMLParseException'.
--
-- @parseThrowing@ can throw an exception from pure code, which is generally a bad
-- way to handle errors, because Haskell\'s lazy evaluation means it\'s hard to
-- predict where it will be thrown from. However, it may be acceptable in
-- situations where it's not expected during normal operation, depending on the
-- design of your program.
parseThrowing :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> [SAXEvent tag text]
parseThrowing opts bs = map freakOut $ parse opts bs
where
freakOut (FailDocument err) = Exc.throw $ XMLParseException err
freakOut other = other
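-- Usage sketch (not part of the original source): catching the exception in
-- IO. Because the throw is hidden inside individual list elements, forcing
-- only the spine (for example with 'length') would not trigger it, so this
-- sketch evaluates every event to weak head normal form. The malformed
-- document literal is invented.
--
-- > import Control.Exception (evaluate, try)
-- > import qualified Data.ByteString.Lazy.Char8 as LC
-- > import Text.XML.Expat.SAX
-- >
-- > main :: IO ()
-- > main = do
-- >   let events = parseThrowing defaultParseOptions (LC.pack "<a><b></a>")
-- >                  :: [SAXEvent String String]
-- >   result <- try (mapM_ evaluate events)
-- >   case result of
-- >     Left (XMLParseException err) -> putStrLn ("caught: " ++ show err)
-- >     Right ()                     -> putStrLn "parsed cleanly"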
-- | A variant of 'parse' that gives a document location with each SAX event.
-- In the event of an error, throw 'XMLParseException'.
--
-- @parseLocationsThrowing@ can throw an exception from pure code, which is generally a bad
-- way to handle errors, because Haskell\'s lazy evaluation means it\'s hard to
-- predict where it will be thrown from. However, it may be acceptable in
-- situations where it's not expected during normal operation, depending on the
-- design of your program.
parseLocationsThrowing :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> [(SAXEvent tag text, XMLParseLocation)]
parseLocationsThrowing opts bs = map freakOut $ parseLocations opts bs
where
freakOut (FailDocument err, _) = Exc.throw $ XMLParseException err
freakOut other = other
hexpat-0.20.13/Text/XML/Expat/Proc.hs 0000644 0000000 0000000 00000007715 13122604047 015251 0 ustar 00 0000000 0000000 {-# LANGUAGE FlexibleContexts #-}
-- | This module ported from Text.XML.Light.Proc
module Text.XML.Expat.Proc where
import Text.XML.Expat.Internal.NodeClass
import Text.XML.Expat.SAX
import Control.Monad
import Data.List.Class (filter)
import Data.Maybe(listToMaybe)
import Data.Monoid
import Prelude hiding (filter)
-- | Select only the elements from a list of XML content.
onlyElems :: NodeClass n c => c (n c tag text) -> c (n c tag text)
onlyElems = filter isElement
-- | Select only the text from a list of XML content.
onlyText :: (NodeClass n c, Monoid text) => c (n c tag text) -> c text
onlyText = fmap getText . filter isText
-- | Find all immediate children with the given name.
findChildren :: (NodeClass n c, Eq tag, Monoid tag) => tag -> n c tag text -> c (n c tag text)
findChildren q e = filterChildren ((q ==) . getName) e
-- | Filter all immediate children wrt a given predicate.
filterChildren :: NodeClass n c => (n c tag text -> Bool) -> n c tag text -> c (n c tag text)
filterChildren p e | isElement e = filter p (onlyElems (getChildren e))
filterChildren _ _ = mzero
-- | Filter all immediate children wrt a given predicate over their names.
filterChildrenName :: (NodeClass n c, Monoid tag) => (tag -> Bool) -> n c tag text -> c (n c tag text)
filterChildrenName p e | isElement e = filter (p . getName) (onlyElems (getChildren e))
filterChildrenName _ _ = mzero
-- | Find an immediate child with the given name.
findChild :: (NodeClass n [], GenericXMLString tag) => tag -> n [] tag text -> Maybe (n [] tag text)
findChild q e = listToMaybe (findChildren q e)
-- | Find an immediate child matching a predicate.
filterChild :: NodeClass n [] => (n [] tag text -> Bool) -> n [] tag text -> Maybe (n [] tag text)
filterChild p e = listToMaybe (filterChildren p e)
-- | Find an immediate child with name matching a predicate.
filterChildName :: (NodeClass n [], Monoid tag) => (tag -> Bool) -> n [] tag text -> Maybe (n [] tag text)
filterChildName p e = listToMaybe (filterChildrenName p e)
-- | Find the left-most occurrence of an element matching given name.
findElement :: (NodeClass n [], Eq tag, Monoid tag) => tag -> n [] tag text -> Maybe (n [] tag text)
findElement q e = listToMaybe (findElements q e)
-- | Filter the left-most occurrence of an element wrt. given predicate.
filterElement :: NodeClass n [] => (n [] tag text -> Bool) -> n [] tag text -> Maybe (n [] tag text)
filterElement p e = listToMaybe (filterElements p e)
-- | Filter the left-most occurrence of an element wrt. a predicate over element names.
filterElementName :: (NodeClass n [], Monoid tag) => (tag -> Bool) -> n [] tag text -> Maybe (n [] tag text)
filterElementName p e = listToMaybe (filterElementsName p e)
-- | Find all non-nested occurrences of an element.
-- (i.e., once we have found an element, we do not search
-- for more occurrences among the element's children).
findElements :: (NodeClass n c, Eq tag, Monoid tag) => tag -> n c tag text -> c (n c tag text)
findElements qn e = filterElementsName (qn==) e
-- | Find all non-nested occurrences of an element wrt. given predicate.
-- (i.e., once we have found an element, we do not search
-- for more occurrences among the element's children).
filterElements :: NodeClass n c => (n c tag text -> Bool) -> n c tag text -> c (n c tag text)
filterElements p e
| p e = return e
| isElement e = join $ fmap (filterElements p) $ onlyElems $ getChildren e
| otherwise = mzero
-- | Find all non-nested occurrences of an element wrt. a predicate over element names.
-- (i.e., once we have found an element, we do not search
-- for more occurrences among the element's children).
filterElementsName :: (NodeClass n c, Monoid tag) => (tag -> Bool) -> n c tag text -> c (n c tag text)
filterElementsName p e | isElement e = filterElements (p . getName) e
filterElementsName _ _ = mzero
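-- Usage sketch (not part of the original source): the difference between
-- searching immediate children and searching the whole subtree. It assumes
-- "Text.XML.Expat.Tree" supplies the parser and the 'UNode' type; the
-- document literal is invented.
--
-- > import qualified Data.ByteString.Char8 as BC
-- > import Text.XML.Expat.Proc (findChildren, findElements)
-- > import Text.XML.Expat.Tree
-- >
-- > main :: IO ()
-- > main =
-- >   case parse' defaultParseOptions (BC.pack "<r><x>1</x><y><x>2</x></y></r>") of
-- >     Left err   -> print err
-- >     Right root -> do
-- >       let r = root :: UNode String
-- >       print (length (findChildren "x" r))  -- 1: only the direct child
-- >       print (length (findElements "x" r))  -- 2: also the one inside <y>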
hexpat-0.20.13/Text/XML/Expat/Format.hs 0000644 0000000 0000000 00000035224 13122604047 015572 0 ustar 00 0000000 0000000 {-# LANGUAGE FlexibleContexts, ScopedTypeVariables #-}
-- hexpat, a Haskell wrapper for expat
-- Copyright (C) 2008 Evan Martin
-- Copyright (C) 2009 Stephen Blackheath
-- | This module provides functions to format a tree
-- structure or SAX stream as UTF-8 encoded XML.
--
-- The formatting functions always output only UTF-8, regardless
-- of what encoding is specified in the document's 'Doc.XMLDeclaration'.
-- If you want to output a document in another encoding, then make sure the
-- 'Doc.XMLDeclaration' agrees with the final output encoding, then format the
-- document, and convert from UTF-8 to your desired encoding using some text
-- conversion library.
--
-- The lazy 'L.ByteString' representation of the output is generated with very
-- small chunks, so in some applications you may want to combine them into
-- larger chunks to get better efficiency.
module Text.XML.Expat.Format (
-- * High level
format,
format',
formatG,
formatNode,
formatNode',
formatNodeG,
-- * Format document (for use with Extended.hs)
formatDocument,
formatDocument',
formatDocumentG,
-- * Low level
xmlHeader,
treeToSAX,
documentToSAX,
formatSAX,
formatSAX',
formatSAXG,
-- * Indentation
indent,
indent_
) where
import qualified Text.XML.Expat.Internal.DocumentClass as Doc
import Text.XML.Expat.Internal.NodeClass
import Text.XML.Expat.SAX
import Control.Monad
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Internal (c2w, w2c)
import Data.Char (isSpace)
import Data.List.Class (List(..), ListItem(..), fromList)
import Data.Monoid
import Data.Word
import Data.Text (Text)
import Text.XML.Expat.Tree (UNode)
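-- | Format document with <?xml.. header - lazy variant that returns lazy ByteString.
format :: (NodeClass n [], GenericXMLString tag, GenericXMLString text) =>
n [] tag text
-> L.ByteString
format node = L.fromChunks (xmlHeader : formatNodeG node)
{-# SPECIALIZE format :: UNode Text -> L.ByteString #-}
-- | Format document with <?xml.. header - generalized variant that returns a generic
-- list of strict ByteStrings.
formatG :: (NodeClass n c, GenericXMLString tag, GenericXMLString text) =>
n c tag text
-> c B.ByteString
formatG node = cons xmlHeader $ formatNodeG node
-- | Format document with <?xml.. header - strict variant that returns strict ByteString.
format' :: (NodeClass n [], GenericXMLString tag, GenericXMLString text) =>
n [] tag text
-> B.ByteString
format' = B.concat . L.toChunks . format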
-- | Format XML node with no header - lazy variant that returns lazy ByteString.
formatNode :: (NodeClass n [], GenericXMLString tag, GenericXMLString text) =>
n [] tag text
-> L.ByteString
formatNode = formatSAX . treeToSAX
-- | Format XML node with no header - strict variant that returns strict ByteString.
formatNode' :: (NodeClass n [], GenericXMLString tag, GenericXMLString text) =>
n [] tag text
-> B.ByteString
formatNode' = B.concat . L.toChunks . formatNode
-- | Format XML node with no header - generalized variant that returns a generic
-- list of strict ByteStrings.
formatNodeG :: (NodeClass n c, GenericXMLString tag, GenericXMLString text) =>
n c tag text
-> c B.ByteString
formatNodeG = formatSAXG . treeToSAX
{-# SPECIALIZE formatNodeG :: UNode Text -> [B.ByteString] #-}
-- | Format an XML document - lazy variant that returns lazy ByteString.
formatDocument :: (Doc.DocumentClass d [], GenericXMLString tag, GenericXMLString text) =>
d [] tag text
-> L.ByteString
formatDocument = formatSAX . documentToSAX
-- | Format an XML document - strict variant that returns strict ByteString.
formatDocument' :: (Doc.DocumentClass d [], GenericXMLString tag, GenericXMLString text) =>
d [] tag text
-> B.ByteString
formatDocument' = B.concat . L.toChunks . formatDocument
-- | Format an XML document - generalized variant that returns a generic
-- list of strict ByteStrings.
formatDocumentG :: (Doc.DocumentClass d c, GenericXMLString tag, GenericXMLString text) =>
d c tag text
-> c B.ByteString
formatDocumentG = formatSAXG . documentToSAX
-- | The standard XML header with UTF-8 encoding.
xmlHeader :: B.ByteString
xmlHeader = B.pack $ map c2w "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
documentToSAX :: forall tag text d c . (GenericXMLString tag, GenericXMLString text,
Monoid text, Doc.DocumentClass d c) =>
d c tag text -> c (SAXEvent tag text)
documentToSAX doc =
(case Doc.getXMLDeclaration doc of
Just (Doc.XMLDeclaration ver mEnc sd) -> fromList [
XMLDeclaration ver mEnc sd, CharacterData (gxFromString "\n")]
Nothing -> mzero) `mplus`
join (fmap (\misc -> fromList [case misc of
Doc.ProcessingInstruction target text -> ProcessingInstruction target text
Doc.Comment text -> Comment text,
CharacterData (gxFromString "\n")]
) (Doc.getTopLevelMiscs doc)) `mplus`
treeToSAX (Doc.getRoot doc)
-- | Flatten a tree structure into SAX events, monadic version.
treeToSAX :: forall tag text n c . (GenericXMLString tag, GenericXMLString text,
Monoid text, NodeClass n c) =>
n c tag text -> c (SAXEvent tag text)
treeToSAX node
| isElement node =
let name = getName node
atts = getAttributes node
children = getChildren node
postpend :: c (SAXEvent tag text) -> c (SAXEvent tag text)
postpend l = joinL $ do
li <- runList l
return $ case li of
Nil -> singleton (EndElement name)
Cons n l' -> cons n (postpend l')
in cons (StartElement name atts) $
postpend (concatL $ fmap treeToSAX children)
| isCData node =
cons StartCData (cons (CharacterData $ getText node) (singleton EndCData))
| isText node =
singleton (CharacterData $ getText node)
| isProcessingInstruction node =
singleton (ProcessingInstruction (getTarget node) (getText node))
| isComment node =
singleton (Comment $ getText node)
| otherwise = mzero
where
singleton = return
concatL = join
{-# SPECIALIZE treeToSAX :: UNode Text -> [(SAXEvent Text Text)] #-}
-- | Format SAX events with no header - lazy variant that returns lazy ByteString.
formatSAX :: (GenericXMLString tag, GenericXMLString text) =>
[SAXEvent tag text]
-> L.ByteString
formatSAX = L.fromChunks . formatSAXG
-- | Format SAX events with no header - strict variant that returns strict ByteString.
formatSAX' :: (GenericXMLString tag, GenericXMLString text) =>
[SAXEvent tag text]
-> B.ByteString
formatSAX' = B.concat . formatSAXG
-- Do start tag and attributes but omit closing >
startTagHelper :: (GenericXMLString tag, GenericXMLString text) =>
tag
-> [(tag, text)]
-> [B.ByteString]
startTagHelper name atts =
B.singleton (c2w '<'):
gxToByteString name:
Prelude.concatMap (
\(aname, avalue) ->
B.singleton (c2w ' '):
gxToByteString aname:
pack "=\"":
escapeText (gxToByteString avalue)++
[B.singleton (c2w '"')]
) atts
-- | Format SAX events with no header - generalized variant that uses generic
-- list.
formatSAXG :: forall c tag text . (List c, GenericXMLString tag,
GenericXMLString text) =>
c (SAXEvent tag text) -- ^ SAX events
-> c B.ByteString
formatSAXG l1 = formatSAXGb l1 False
{-# SPECIALIZE formatSAXG :: [SAXEvent Text Text] -> [B.ByteString] #-}
formatSAXGb :: forall c tag text . (List c, GenericXMLString tag,
GenericXMLString text) =>
c (SAXEvent tag text) -- ^ SAX events
-> Bool -- ^ True if processing CDATA
-> c B.ByteString
formatSAXGb l1 cd = joinL $ do
it1 <- runList l1
return $ formatItem it1
where
formatItem it1 = case it1 of
Nil -> mzero
Cons (XMLDeclaration ver mEnc mSD) l2 ->
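return (pack "<?xml version=\"") `mplus`
fromList (escapeText (gxToByteString ver)) `mplus`
return (pack "\"") `mplus`
(
case mEnc of
Nothing -> mzero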
Just enc ->
return (pack " encoding=\"") `mplus`
fromList (escapeText (gxToByteString enc)) `mplus`
return (pack "\"")
) `mplus`
(
case mSD of
Nothing -> mzero
Just True -> return (pack " standalone=\"yes\"")
Just False -> return (pack " standalone=\"no\"")
) `mplus`
return (pack ("?>"))
`mplus`
formatSAXGb l2 cd
Cons (StartElement name attrs) l2 ->
fromList (startTagHelper name attrs)
`mplus` (
joinL $ do
it2 <- runList l2
return $ case it2 of
Cons (EndElement _) l3 ->
cons (pack "/>") $
formatSAXGb l3 cd
_ ->
cons (B.singleton (c2w '>')) $
formatItem it2
)
Cons (EndElement name) l2 ->
cons (pack "</") $
cons (gxToByteString name) $
cons (B.singleton (c2w '>')) $
formatSAXGb l2 cd
Cons (CharacterData txt) l2 ->
(if cd then
fromList [gxToByteString txt]
else
fromList (escapeText (gxToByteString txt))
) `mplus` (formatSAXGb l2 cd)
Cons StartCData l2 ->
cons(pack "<![CDATA[") $
formatSAXGb l2 True
Cons EndCData l2 ->
cons(pack "]]>") $
formatSAXGb l2 False
Cons (ProcessingInstruction target txt) l2 ->
cons (pack "<?") $
cons (gxToByteString target) $
cons (pack " ") $
cons (gxToByteString txt) $
cons (pack "?>") $
formatSAXGb l2 cd
Cons (Comment txt) l2 ->
cons (pack "<!--") $
cons (gxToByteString txt) $
cons (pack "-->") $
formatSAXGb l2 cd
Cons (FailDocument _) l2 ->
formatSAXGb l2 cd
{-# SPECIALIZE formatSAXGb :: [SAXEvent Text Text] -> Bool -> [B.ByteString] #-}
pack :: String -> B.ByteString
pack = B.pack . map c2w
isSafeChar :: Word8 -> Bool
isSafeChar c =
(c /= c2w '&')
&& (c /= c2w '<')
&& (c /= c2w '>')
&& (c /= c2w '"')
&& (c /= c2w '\'')
{-# INLINE isSafeChar #-}
escapeText :: B.ByteString -> [B.ByteString]
escapeText str | B.null str = []
escapeText str =
let (good, bad) = B.span isSafeChar str
in if B.null good
then case w2c $ B.head str of
'&' -> pack "&amp;":escapeText rema
'<' -> pack "&lt;":escapeText rema
'>' -> pack "&gt;":escapeText rema
'"' -> pack "&quot;":escapeText rema
'\'' -> pack "&#39;":escapeText rema
_ -> error "hexpat: impossible"
else good:escapeText bad
where
rema = B.tail str
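-- Usage sketch (not part of the original source): escaping is applied to both
-- attribute values and character data when a node is formatted. The node
-- below is invented, and the constructors are assumed to come from
-- "Text.XML.Expat.Tree".
--
-- > import qualified Data.ByteString.Char8 as BC
-- > import Text.XML.Expat.Format (formatNode')
-- > import Text.XML.Expat.Tree
-- >
-- > main :: IO ()
-- > main = BC.putStrLn (formatNode' node)
-- >   where
-- >     node :: UNode String
-- >     node = Element "a" [("k", "x\"y")] [Text "1 < 2 & 3"]
-- > -- should print: <a k="x&quot;y">1 &lt; 2 &amp; 3</a>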
-- | Make the output prettier by adding indentation.
indent :: (NodeClass n c, GenericXMLString tag, GenericXMLString text) =>
Int -- ^ Number of indentation spaces per nesting level
-> n c tag text
-> n c tag text
indent = indent_ 0
-- | Make the output prettier by adding indentation, specifying initial indent.
indent_ :: forall n c tag text . (NodeClass n c, GenericXMLString tag, GenericXMLString text) =>
Int -- ^ Initial indent (spaces)
-> Int -- ^ Number of indentation spaces per nesting level
-> n c tag text
-> n c tag text
indent_ cur perLevel elt | isElement elt =
flip modifyChildren elt $ \chs -> joinL $ do
(anyElts, chs') <- anyElements [] chs
-- The new list chs' is the same as the old list chs, but some of its
-- nodes have been loaded into memory. This is to avoid evaluating
-- list elements twice.
if anyElts
then addSpace True chs'
else return chs'
where
addSpace :: Bool -> c (n c tag text) -> ItemM c (c (n c tag text))
addSpace startOfText l = do
ch <- runList l
case ch of
Nil -> return $ singleton (mkText $ gxFromString ('\n':replicate cur ' '))
Cons elt l' | isElement elt -> do
let cur' = cur + perLevel
return $
cons (mkText $ gxFromString ('\n':replicate cur' ' ')) $
cons (indent_ cur' perLevel elt) $
joinL (addSpace True l')
Cons tx l' | isText tx && startOfText ->
case strip (getText tx) of
Nothing -> addSpace True l'
Just t' -> return $
cons (mkText t') $
joinL $ addSpace False l'
Cons n l' ->
return $
cons n $
joinL $ addSpace False l'
-- acc is used to keep the nodes we've scanned into memory.
-- We then construct a new list that looks the same as the old list, but
-- which starts with the nodes in memory, to prevent the list being
-- demanded more than once (in case it's monadic and it's expensive to
-- evaluate).
anyElements :: [n c tag text] -- ^ Accumulator for tags we've looked at.
-> c (n c tag text)
-> ItemM c (Bool, c (n c tag text))
anyElements acc l = do
n <- runList l
case n of
Nil -> return (False, instantiatedList acc mzero)
Cons n l' | isElement n -> return (True, instantiatedList (n:acc) l')
Cons n l' -> anyElements (n:acc) l'
where
instantiatedList :: [n c tag text] -> c (n c tag text) -> c (n c tag text)
instantiatedList acc l' = reverse acc `prepend` l'
prepend :: forall a . [a] -> c a -> c a
prepend xs l = foldr cons l xs
strip t | gxNullString t = Nothing
strip t | isSpace (gxHead t) = strip (gxTail t)
strip t = Just t
singleton = return
indent_ _ _ n = n
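-- Usage sketch (not part of the original source): pretty-printing a parsed
-- document by indenting the tree before formatting it. Two spaces per level
-- is an arbitrary choice, and the document literal is invented.
--
-- > import qualified Data.ByteString.Char8 as BC
-- > import qualified Data.ByteString.Lazy as L
-- > import Text.XML.Expat.Format (format, indent)
-- > import Text.XML.Expat.Tree
-- >
-- > main :: IO ()
-- > main =
-- >   case parse' defaultParseOptions (BC.pack "<a><b>hi</b><c/></a>") of
-- >     Left err   -> print err
-- >     Right tree -> L.putStr (format (indent 2 (tree :: UNode String)))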
hexpat-0.20.13/Text/XML/Expat/Extended.hs 0000644 0000000 0000000 00000042044 13122604047 016100 0 ustar 00 0000000 0000000 {-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, TypeFamilies,
FlexibleContexts, EmptyDataDecls #-}
-- | An extended variant of /Node/ intended to implement the entire XML
-- specification. DTDs are not yet supported, however.
--
-- The names conflict with those in /Tree/ so you must use qualified import
-- if you want to use both modules.
module Text.XML.Expat.Extended (
-- * Tree structure
Document,
DocumentG(..),
Node,
NodeG(..),
UDocument,
LDocument,
ULDocument,
UNode,
LNode,
ULNode,
-- * Generic document/node manipulation
module Text.XML.Expat.Internal.DocumentClass,
module Text.XML.Expat.Internal.NodeClass,
-- * Annotation-specific
modifyAnnotation,
mapAnnotation,
mapDocumentAnnotation,
-- * Qualified nodes
QDocument,
QLDocument,
QNode,
QLNode,
module Text.XML.Expat.Internal.Qualified,
-- * Namespaced nodes
NDocument,
NLDocument,
NNode,
NLNode,
module Text.XML.Expat.Internal.Namespaced,
-- * Parse to tree
ParseOptions(..),
defaultParseOptions,
Encoding(..),
parse,
parse',
XMLParseError(..),
XMLParseLocation(..),
-- * Variant that throws exceptions
parseThrowing,
XMLParseException(..),
-- * Convert from SAX
saxToTree,
-- * Abstraction of string types
GenericXMLString(..)
) where
import Control.Arrow
import Text.XML.Expat.SAX ( Encoding(..)
, GenericXMLString(..)
, ParseOptions(..)
, defaultParseOptions
, SAXEvent
, XMLParseError(..)
, XMLParseException(..)
, XMLParseLocation(..) )
import qualified Text.XML.Expat.SAX as SAX
import Text.XML.Expat.Internal.DocumentClass
import Text.XML.Expat.Internal.Namespaced
import Text.XML.Expat.Internal.NodeClass
import Text.XML.Expat.Internal.Qualified
import Control.Monad (mplus, mzero)
import Control.DeepSeq
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.List.Class (List, foldlL, joinM)
import Data.Maybe
import Data.Monoid
-- | Document representation of the XML document, intended to support the entire
-- XML specification. DTDs are not yet supported, however.
data DocumentG a c tag text = Document {
dXMLDeclaration :: Maybe (XMLDeclaration text),
dDocumentTypeDeclaration :: Maybe (DocumentTypeDeclaration c tag text),
dTopLevelMiscs :: c (Misc text),
dRoot :: NodeG a c tag text
}
instance (Show tag, Show text, Show a) => Show (DocumentG a [] tag text) where
showsPrec d (Document xd dtd m r) = showParen (d > 10) $
("Document "++) . showsPrec 11 xd . (" "++) .
showsPrec 11 dtd . (" "++) .
showsPrec 11 m . (" "++) .
showsPrec 11 r
instance (Eq tag, Eq text, Eq a) => Eq (DocumentG a [] tag text) where
Document xd1 dtd1 m1 r1 == Document xd2 dtd2 m2 r2 =
xd1 == xd2 &&
dtd1 == dtd2 &&
m1 == m2 &&
r1 == r2
-- | A pure representation of an XML document that uses a list as its container type.
type Document a tag text = DocumentG a [] tag text
type instance NodeType (DocumentG ann) = NodeG ann
instance (Functor c, List c) => DocumentClass (DocumentG ann) c where
getXMLDeclaration = dXMLDeclaration
getDocumentTypeDeclaration = dDocumentTypeDeclaration
getTopLevelMiscs = dTopLevelMiscs
getRoot = dRoot
mkDocument = Document
-- | Extended variant of the tree representation of the XML document, intended
-- to support the entire XML specification. DTDs are not yet supported, however.
--
-- @c@ is the container type for the element's children, which is [] in the
-- @hexpat@ package, and a monadic list type for @hexpat-iteratee@.
--
-- @tag@ is the tag type, which can either be one of several string types,
-- or a special type from the @Text.XML.Expat.Namespaced@ or
-- @Text.XML.Expat.Qualified@ modules.
--
-- @text@ is the string type for text content.
--
-- @a@ is the type of the annotation. One of the things this can be used for
-- is to store the XML parse location, which is useful for error handling.
--
-- Note that some functions in the @Text.XML.Expat.Cursor@ module need to create
-- new nodes through the 'MkElementClass' type class. Normally this can only be done
-- if @a@ is a Maybe type or () (so it can provide the Nothing value for the annotation
-- on newly created nodes). Or, you can write your own 'MkElementClass' instance.
-- Apart from that, there is no requirement for @a@ to be a Maybe type.
data NodeG a c tag text =
Element {
eName :: !tag,
eAttributes :: ![(tag,text)],
eChildren :: c (NodeG a c tag text),
eAnn :: a
} |
Text !text |
CData !text |
Misc (Misc text)
type instance ListOf (NodeG a c tag text) = c (NodeG a c tag text)
-- | A pure tree representation that uses a list as its container type,
-- extended variant.
--
-- In the @hexpat@ package, a list of nodes has the type @[Node tag text]@, but note
-- that you can also use the more general type function 'ListOf' to give a list of
-- any node type, using that node's associated list type, e.g.
-- @ListOf (UNode Text)@.
type Node a tag text = NodeG a [] tag text
instance (Show tag, Show text, Show a) => Show (NodeG a [] tag text) where
showsPrec d (Element na at ch an) = showParen (d > 10) $
("Element "++) . showsPrec 11 na . (" "++) .
showsPrec 11 at . (" "++) .
showsPrec 11 ch . (" "++) .
showsPrec 11 an
showsPrec d (Text t) = showParen (d > 10) $ ("Text "++) . showsPrec 11 t
showsPrec d (CData t) = showParen (d > 10) $ ("CData "++) . showsPrec 11 t
showsPrec d (Misc m) = showParen (d > 10) $ ("Misc "++) . showsPrec 11 m
instance (Eq tag, Eq text, Eq a) => Eq (NodeG a [] tag text) where
Element na1 at1 ch1 an1 == Element na2 at2 ch2 an2 =
na1 == na2 &&
at1 == at2 &&
ch1 == ch2 &&
an1 == an2
Text t1 == Text t2 = t1 == t2
CData t1 == CData t2 = t1 == t2
Misc t1 == Misc t2 = t1 == t2
_ == _ = False
instance (NFData tag, NFData text, NFData a) => NFData (NodeG a [] tag text) where
rnf (Element nam att chi ann) = rnf (nam, att, chi, ann)
rnf (Text txt) = rnf txt
rnf (CData txt) = rnf txt
rnf (Misc m) = rnf m
instance (Functor c, List c) => NodeClass (NodeG a) c where
textContentM (Element _ _ children _) = foldlL mappend mempty $ joinM $ fmap textContentM children
textContentM (Text txt) = return txt
textContentM (CData txt) = return txt
textContentM (Misc (ProcessingInstruction _ _)) = return mempty
textContentM (Misc (Comment _)) = return mempty
isElement (Element _ _ _ _) = True
isElement _ = False
isText (Text _) = True
isText (CData _) = True
isText _ = False
isCData (CData _) = True
isCData _ = False
isProcessingInstruction (Misc (ProcessingInstruction _ _)) = True
isProcessingInstruction _ = False
isComment (Misc (Comment _)) = True
isComment _ = False
isNamed nm (Element nm' _ _ _) = nm == nm'
isNamed _ _ = False
getName (Element name _ _ _) = name
getName _ = mempty
hasTarget t (Misc (ProcessingInstruction t' _ )) = t == t'
hasTarget _ _ = False
getTarget (Misc (ProcessingInstruction target _)) = target
getTarget _ = mempty
getAttributes (Element _ attrs _ _) = attrs
getAttributes _ = []
getChildren (Element _ _ ch _) = ch
getChildren _ = mzero
getText (Text txt) = txt
getText (CData txt) = txt
getText (Misc (ProcessingInstruction _ txt)) = txt
getText (Misc (Comment txt)) = txt
getText (Element _ _ _ _) = mempty
modifyName f (Element n a c ann) = Element (f n) a c ann
modifyName _ node = node
modifyAttributes f (Element n a c ann) = Element n (f a) c ann
modifyAttributes _ node = node
modifyChildren f (Element n a c ann) = Element n a (f c) ann
modifyChildren _ node = node
mapAllTags f (Element n a c ann) = Element (f n) (map (first f) a) (fmap (mapAllTags f) c) ann
mapAllTags _ (Text txt) = Text txt
mapAllTags _ (CData txt) = CData txt
mapAllTags _ (Misc (ProcessingInstruction n txt)) = Misc (ProcessingInstruction n txt)
mapAllTags _ (Misc (Comment txt)) = Misc (Comment txt)
modifyElement f (Element n a c ann) =
let (n', a', c') = f (n, a, c)
in Element n' a' c' ann
modifyElement _ (Text txt) = Text txt
modifyElement _ (CData txt) = CData txt
modifyElement _ (Misc (ProcessingInstruction n txt)) = Misc (ProcessingInstruction n txt)
modifyElement _ (Misc (Comment txt)) = Misc (Comment txt)
mapNodeContainer f (Element n a ch an) = do
ch' <- mapNodeListContainer f ch
return $ Element n a ch' an
mapNodeContainer _ (Text txt) = return $ (Text txt)
mapNodeContainer _ (CData txt) = return $ (CData txt)
mapNodeContainer _ (Misc (ProcessingInstruction n txt)) = return $ Misc (ProcessingInstruction n txt)
mapNodeContainer _ (Misc (Comment txt)) = return $ Misc (Comment txt)
mkText = Text
instance (Functor c, List c) => MkElementClass (NodeG (Maybe a)) c where
mkElement name attrs children = Element name attrs children Nothing
instance (Functor c, List c) => MkElementClass (NodeG ()) c where
mkElement name attrs children = Element name attrs children ()
-- | Type alias for an extended document with unqualified tag names where
-- tag and text are the same string type
type UDocument a text = Document a text text
-- | Type alias for an extended document, annotated with parse location
type LDocument tag text = Document XMLParseLocation tag text
-- | Type alias for an extended document with unqualified tag names where
-- tag and text are the same string type, annotated with parse location
type ULDocument text = Document XMLParseLocation text text
-- | Type alias for an extended document where qualified names are used for tags
type QDocument a text = Document a (QName text) text
-- | Type alias for an extended document where qualified names are used for tags, annotated with parse location
type QLDocument text = Document XMLParseLocation (QName text) text
-- | Type alias for an extended document where namespaced names are used for tags
type NDocument a text = Document a (NName text) text
-- | Type alias for an extended document where namespaced names are used for tags, annotated with parse location
type NLDocument text = Document XMLParseLocation (NName text) text
-- | Type alias for an extended node with unqualified tag names where
-- tag and text are the same string type
type UNode a text = Node a text text
-- | Type alias for an extended node, annotated with parse location
type LNode tag text = Node XMLParseLocation tag text
-- | Type alias for an extended node with unqualified tag names where
-- tag and text are the same string type, annotated with parse location
type ULNode text = LNode text text
-- | Type alias for an extended node where qualified names are used for tags
type QNode a text = Node a (QName text) text
-- | Type alias for an extended node where qualified names are used for tags, annotated with parse location
type QLNode text = LNode (QName text) text
-- | Type alias for an extended node where namespaced names are used for tags
type NNode a text = Node a (NName text) text
-- | Type alias for an extended node where namespaced names are used for tags, annotated with parse location
type NLNode text = LNode (NName text) text
-- | Modify this node's annotation (non-recursively) if it's an element, otherwise no-op.
modifyAnnotation :: (a -> a) -> Node a tag text -> Node a tag text
f `modifyAnnotation` Element na at ch an = Element na at ch (f an)
_ `modifyAnnotation` Text t = Text t
_ `modifyAnnotation` CData t = CData t
_ `modifyAnnotation` Misc (ProcessingInstruction n t) = Misc (ProcessingInstruction n t)
_ `modifyAnnotation` Misc (Comment t) = Misc (Comment t)
-- | Modify this node's annotation and those of all its descendants recursively if it's an element, otherwise no-op.
mapAnnotation :: (a -> b) -> Node a tag text -> Node b tag text
f `mapAnnotation` Element na at ch an = Element na at (map (f `mapAnnotation`) ch) (f an)
_ `mapAnnotation` Text t = Text t
_ `mapAnnotation` CData t = CData t
_ `mapAnnotation` Misc (ProcessingInstruction n t) = Misc (ProcessingInstruction n t)
_ `mapAnnotation` Misc (Comment t) = Misc (Comment t)
-- | Modify the annotation of every node in the document recursively.
mapDocumentAnnotation :: (a -> b) -> Document a tag text -> Document b tag text
mapDocumentAnnotation f doc = Document {
dXMLDeclaration = dXMLDeclaration doc,
dDocumentTypeDeclaration = dDocumentTypeDeclaration doc,
dTopLevelMiscs = dTopLevelMiscs doc,
dRoot = mapAnnotation f (dRoot doc)
}
-- | A lower-level function that lazily converts a SAX stream into a tree structure.
-- Variant that takes annotations for start tags.
saxToTree :: (GenericXMLString tag, Monoid text) =>
[(SAXEvent tag text, a)]
-> (Document a tag text, Maybe XMLParseError)
saxToTree ((SAX.XMLDeclaration ver mEnc mSD, _):events) =
let (doc, mErr) = saxToTree events
in (doc {
dXMLDeclaration = Just $ XMLDeclaration ver mEnc mSD
}, mErr)
saxToTree events =
let (nodes, mError, _) = ptl events False []
doc = Document {
dXMLDeclaration = Nothing,
dDocumentTypeDeclaration = Nothing,
dTopLevelMiscs = findTopLevelMiscs nodes,
dRoot = findRoot nodes
}
in (doc, mError)
where
findRoot (elt@(Element _ _ _ _):_) = elt
findRoot (_:nodes) = findRoot nodes
findRoot [] = Element (gxFromString "") [] [] (error "saxToTree null annotation")
findTopLevelMiscs = mapMaybe $ \node -> case node of
Misc m -> Just m
_ -> Nothing
ptl ((SAX.StartElement name attrs,ann):rema) isCD cd =
let (children, err1, rema') = ptl rema isCD cd
elt = Element name attrs children ann
(out, err2, rema'') = ptl rema' isCD cd
in (elt:out, err1 `mplus` err2, rema'')
ptl ((SAX.EndElement _, _):rema) _ _ = ([], Nothing, rema)
ptl ((SAX.CharacterData txt, _):rema) isCD cd =
if isCD then
ptl rema isCD (txt:cd)
else
let (out, err, rema') = ptl rema isCD cd
in (Text txt:out, err, rema')
ptl ((SAX.StartCData,_) :rema) _ _ =
ptl rema True mzero
ptl ((SAX.EndCData, _) :rema) _ cd =
let (out, err, rema') = ptl rema False mzero
in (CData (mconcat $ reverse cd):out, err, rema')
ptl ((SAX.Comment txt, _):rema) isCD cd =
let (out, err, rema') = ptl rema isCD cd
in (Misc (Comment txt):out, err, rema')
ptl ((SAX.ProcessingInstruction target txt, _):rema) isCD cd =
let (out, err, rema') = ptl rema isCD cd
in (Misc (ProcessingInstruction target txt):out, err, rema')
ptl ((SAX.FailDocument err, _):_) _ _ = ([], Just err, [])
ptl ((SAX.XMLDeclaration _ _ _, _):rema) isCD cd = ptl rema isCD cd -- doesn't appear in the middle of a document
ptl [] _ _ = ([], Nothing, [])
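{- Illustrative sketch (not part of the original module). The recursion in
@ptl@ can be hard to follow with CDATA state and error handling mixed in.
The simplified version below keeps only start tags, end tags and text, and
uses hypothetical types @Event@ and @Tree@ rather than the real 'SAXEvent'
and 'NodeG'. It is meant only to show how one call parses an element's
children and a second call parses its following siblings:

```haskell
-- Simplified stand-ins for SAX events and tree nodes (assumptions).
data Event = Start String | End | Chars String

data Tree = Elem String [Tree] | Txt String
  deriving (Eq, Show)

-- Build a forest from an event stream, returning the unconsumed events.
-- The let bindings are lazy, so the head of the result is available before
-- the rest of the stream has been examined.
buildForest :: [Event] -> ([Tree], [Event])
buildForest (Start name : rest) =
  let (children, rest')  = buildForest rest   -- up to the matching End
      (siblings, rest'') = buildForest rest'  -- whatever follows the element
  in (Elem name children : siblings, rest'')
buildForest (End : rest) = ([], rest)
buildForest (Chars t : rest) =
  let (siblings, rest') = buildForest rest
  in (Txt t : siblings, rest')
buildForest [] = ([], [])
```

Because the result is produced lazily, a consumer can read the name of the
first element while the tail of the event list is still unevaluated. This is
the same reason the error value returned by 'saxToTree' should be inspected
last.
-}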
-- | Lazily parse XML to tree. Note that forcing the XMLParseError return value
-- will force the entire parse. Therefore, to ensure lazy operation, don't
-- check the error status until you have processed the tree.
parse :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> (LDocument tag text, Maybe XMLParseError)
parse opts bs = saxToTree $ SAX.parseLocations opts bs
-- | Lazily parse XML to tree. In the event of an error, throw 'XMLParseException'.
--
-- @parseThrowing@ can throw an exception from pure code, which is generally a bad
-- way to handle errors, because Haskell\'s lazy evaluation means it\'s hard to
-- predict where it will be thrown from. However, it may be acceptable in
-- situations where it's not expected during normal operation, depending on the
-- design of your program.
parseThrowing :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> L.ByteString -- ^ Input text (a lazy ByteString)
-> LDocument tag text
parseThrowing opts bs = fst $ saxToTree $ SAX.parseLocationsThrowing opts bs
-- | Strictly parse XML to tree. Returns error message or valid parsed tree.
parse' :: (GenericXMLString tag, GenericXMLString text) =>
ParseOptions tag text -- ^ Parse options
-> B.ByteString -- ^ Input text (a strict ByteString)
-> Either XMLParseError (LDocument tag text)
parse' opts bs = case parse opts (L.fromChunks [bs]) of
(_, Just err) -> Left err
(root, Nothing) -> Right root
hexpat-0.20.13/Text/XML/Expat/Internal/ 0000755 0000000 0000000 00000000000 13122604047 015554 5 ustar 00 0000000 0000000 hexpat-0.20.13/Text/XML/Expat/Internal/Glue.c 0000644 0000000 0000000 00000014666 13122604047 016631 0 ustar 00 0000000 0000000 #include <expat.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
typedef struct {
XML_Parser parser;
XML_Char* (*decoder)(const XML_Char*);
int locations;
} MyParser;
typedef struct {
size_t offset;
size_t capacity;
uint8_t* block;
MyParser* mp;
} Block;
/*!
* Allocate the specified amount of room in the buffer.
*/
static void* alloc(Block* blk, size_t room)
{
size_t minCapacity = blk->offset + room;
size_t start = blk->offset;
{
int changed = 0;
size_t capacity = blk->capacity;
while (capacity < minCapacity) {
capacity = capacity * 2;
changed = 1;
}
if (changed) {
blk->capacity = capacity;
blk->block = realloc(blk->block, capacity);
}
}
blk->offset += room;
return blk->block + start;
}
static void pushType(Block* blk, uint32_t type)
{
*(uint32_t*)alloc(blk, 4) = type;
if (blk->mp->locations) {
int64_t* loc = alloc(blk, 32);
loc[0] = (int64_t)XML_GetCurrentLineNumber(blk->mp->parser);
loc[1] = (int64_t)XML_GetCurrentColumnNumber(blk->mp->parser);
loc[2] = (int64_t)XML_GetCurrentByteIndex(blk->mp->parser);
loc[3] = (int64_t)XML_GetCurrentByteCount(blk->mp->parser);
}
}
#define ROUND_UP_32(x) (((x) + 3) & ~3)
static void startElement(
void *userData,
const XML_Char *name,
const XML_Char **atts)
{
Block* blk = userData;
size_t nameLen = strlen(name) + 1;
size_t nAtts, i;
for (nAtts = 0; atts[nAtts] != NULL; nAtts += 2) ;
pushType(blk, 1);
*(uint32_t*)alloc(blk, 4) = nAtts;
memcpy(alloc(blk, nameLen), name, nameLen);
for (i = 0; i < nAtts; i++) {
size_t attLen = strlen(atts[i]) + 1;
memcpy(alloc(blk, attLen), atts[i], attLen);
}
blk->offset = ROUND_UP_32(blk->offset);
}
static void endElement(void *userData, const XML_Char *name)
{
Block* blk = userData;
size_t nameLen = strlen(name) + 1;
pushType(blk, 2);
memcpy(alloc(blk, nameLen), name, nameLen);
blk->offset = ROUND_UP_32(blk->offset);
}
static void characterData(
void *userData,
const XML_Char *s,
int len)
{
Block* blk = userData;
pushType(blk, 3);
*(uint32_t*)alloc(blk, 4) = len;
memcpy(alloc(blk, (size_t)len), s, (size_t)len);
blk->offset = ROUND_UP_32(blk->offset);
}
static void xmlDeclHandler(void *userData,
const XML_Char *version,
const XML_Char *encoding,
int standalone)
{
Block* blk = userData;
int verLen = strlen(version) + 1;
pushType(blk, 4);
memcpy(alloc(blk, verLen), version, verLen);
if (encoding != NULL) {
int encLen = strlen(encoding)+1;
uint8_t* pEnc = alloc(blk, encLen+1);
pEnc[0] = 1;
memcpy(pEnc+1, encoding, encLen);
}
else
*(uint8_t*)alloc(blk, 1) = 0;
*(int8_t*)alloc(blk, 1) = (int8_t)standalone;
blk->offset = ROUND_UP_32(blk->offset);
}
static void startCData(void* userData)
{
Block* blk = userData;
pushType(blk, 5);
}
static void endCData(void* userData)
{
Block* blk = userData;
pushType(blk, 6);
}
static void processingInstruction(void *userData, const XML_Char *target, const XML_Char *data)
{
Block* blk = userData;
int targetLen = strlen(target) + 1;
int dataLen = strlen(data) + 1;
pushType(blk, 7);
memcpy(alloc(blk, targetLen), target, targetLen);
memcpy(alloc(blk, dataLen), data, dataLen);
blk->offset = ROUND_UP_32(blk->offset);
}
static void comment(void* userData, const XML_Char *text)
{
Block* blk = userData;
int textLen = strlen(text) + 1;
pushType(blk, 8);
memcpy(alloc(blk, textLen), text, textLen);
blk->offset = ROUND_UP_32(blk->offset);
}
MyParser* hexpatNewParser(const XML_Char* encoding, int locations)
{
MyParser* mp = malloc(sizeof(MyParser));
XML_Parser p = XML_ParserCreate(encoding);
XML_SetStartElementHandler(p, startElement);
XML_SetEndElementHandler(p, endElement);
XML_SetCharacterDataHandler(p, characterData);
XML_SetXmlDeclHandler(p, xmlDeclHandler);
XML_SetCdataSectionHandler(p, startCData, endCData);
XML_SetProcessingInstructionHandler(p, processingInstruction);
XML_SetCommentHandler(p, comment);
mp->parser = p;
mp->locations = locations;
return mp;
}
void hexpatFreeParser(MyParser* mp)
{
XML_ParserFree(mp->parser);
free(mp);
}
static int externalEntityRef(XML_Parser parser,
const XML_Char *context,
const XML_Char *base,
const XML_Char *systemId,
const XML_Char *publicId)
{
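if (systemId == NULL && publicId == NULL) {
XML_Parser eep = XML_ExternalEntityParserCreate(parser, context, NULL);
enum XML_Status ret = XML_Parse(eep, "", 0, XML_TRUE);
XML_ParserFree(eep);
if (ret == XML_STATUS_OK)
return XML_STATUS_OK; /* the empty foreign DTD parsed successfully */
}
/* Real external entities are not supported, and a failed parse is fatal:
* stop the parser and report the error to expat. The handler is declared
* to return int, so every path must return a value. */
XML_StopParser(parser, 0);
return XML_STATUS_ERROR;
}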
static void skippedEntity(void *userData,
const XML_Char *entityName,
int is_parameter_entity)
{
Block* blk = userData;
if (is_parameter_entity)
XML_StopParser(blk->mp->parser, 0);
else {
XML_Char* out = blk->mp->decoder(entityName);
if (out != NULL) {
characterData(blk, out, strlen(out));
free(out);
}
else
XML_StopParser(blk->mp->parser, 0);
}
}
enum XML_Status hexpatParse(
MyParser* mp,
const char* s,
int len,
int isFinal,
uint8_t** buffer,
int* length)
{
enum XML_Status ret;
Block blk;
blk.offset = 0;
blk.capacity = 256;
blk.block = malloc(blk.capacity);
blk.mp = mp;
XML_SetUserData(mp->parser, &blk);
ret = XML_Parse(mp->parser, s, len, isFinal);
*(uint32_t*)alloc(&blk, 4) = 0;
*buffer = blk.block;
*length = (int)blk.offset;
return ret;
}
void hexpatSetEntityHandler(
MyParser* mp,
XML_Char* (*decoder)(const XML_Char*))
{
mp->decoder = decoder;
XML_UseForeignDTD(mp->parser, XML_TRUE);
XML_SetExternalEntityRefHandler(mp->parser, externalEntityRef);
XML_SetSkippedEntityHandler(mp->parser, skippedEntity);
}
XML_Parser hexpatGetParser(MyParser* mp)
{
return mp->parser;
}
hexpat-0.20.13/Text/XML/Expat/Internal/IO.hs 0000644 0000000 0000000 00000020421 13122604047 016416 0 ustar 00 0000000 0000000 {-# LANGUAGE ForeignFunctionInterface, EmptyDataDecls #-}
{-# OPTIONS_GHC -fno-cse -fno-full-laziness #-}
-- | Low-level interface to Expat. Unless speed is paramount, this should
-- normally be avoided in favour of the interfaces provided by
-- 'Text.XML.Expat.SAX' and 'Text.XML.Expat.Tree', etc.
module Text.XML.Expat.Internal.IO (
HParser,
hexpatNewParser,
encodingToString,
Encoding(..),
XMLParseError(..),
XMLParseLocation(..)
) where
import Control.Applicative
import Control.DeepSeq
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as I
import Data.Int
import Data.Word
import Foreign
import Foreign.C
data Parser_struct
type ParserPtr = Ptr Parser_struct
data Encoding = ASCII | UTF8 | UTF16 | ISO88591
encodingToString :: Encoding -> String
encodingToString ASCII = "US-ASCII"
encodingToString UTF8 = "UTF-8"
encodingToString UTF16 = "UTF-16"
encodingToString ISO88591 = "ISO-8859-1"
withOptEncoding :: Maybe Encoding -> (CString -> IO a) -> IO a
withOptEncoding Nothing f = f nullPtr
withOptEncoding (Just enc) f = withCString (encodingToString enc) f
-- ByteString.useAsCStringLen is almost what we need, but C2HS wants a CInt
-- instead of an Int.
withBStringLen :: B.ByteString -> ((CString, CInt) -> IO a) -> IO a
withBStringLen bs f = do
B.useAsCStringLen bs $ \(str, len) -> f (str, fromIntegral len)
unStatus :: CInt -> Bool
unStatus 0 = False
unStatus _ = True
getError :: ParserPtr -> IO XMLParseError
getError pp = do
code <- xmlGetErrorCode pp
cerr <- xmlErrorString code
err <- peekCString cerr
loc <- getParseLocation pp
return $ XMLParseError err loc
-- |Obtain C value from Haskell 'Bool'.
--
cFromBool :: Num a => Bool -> a
cFromBool = fromBool
-- | Parse error, consisting of message text and error location
data XMLParseError = XMLParseError String XMLParseLocation deriving (Eq, Show)
instance NFData XMLParseError where
rnf (XMLParseError msg loc) = rnf (msg, loc)
-- | Specifies a location of an event within the input text
data XMLParseLocation = XMLParseLocation {
xmlLineNumber :: Int64, -- ^ Line number of the event
xmlColumnNumber :: Int64, -- ^ Column number of the event
xmlByteIndex :: Int64, -- ^ Byte index of event from start of document
xmlByteCount :: Int64 -- ^ The number of bytes in the event
}
deriving (Eq, Show)
instance NFData XMLParseLocation where
rnf (XMLParseLocation lin col ind cou) = rnf (lin, col, ind, cou)
getParseLocation :: ParserPtr -> IO XMLParseLocation
getParseLocation pp = do
line <- xmlGetCurrentLineNumber pp
col <- xmlGetCurrentColumnNumber pp
index <- xmlGetCurrentByteIndex pp
count <- xmlGetCurrentByteCount pp
return $ XMLParseLocation {
xmlLineNumber = fromIntegral line,
xmlColumnNumber = fromIntegral col,
xmlByteIndex = fromIntegral index,
xmlByteCount = fromIntegral count
}
-- Note on word sizes:
--
-- on expat 2.0:
-- XML_GetCurrentLineNumber returns XML_Size
-- XML_GetCurrentColumnNumber returns XML_Size
-- XML_GetCurrentByteIndex returns XML_Index
-- These are defined in expat_external.h
--
-- debian-i386 says XML_Size and XML_Index are 4 bytes.
-- ubuntu-amd64 says XML_Size and XML_Index are 8 bytes.
-- These two systems do NOT define XML_LARGE_SIZE, which would force these types
-- to be 64-bit.
--
-- If we guess the word size too small, it shouldn't matter: We will just discard
-- the most significant part. If we get the word size too large, we will get
-- garbage (very bad).
--
-- So - what I will do is use CLong and CULong, which correspond to what expat
-- is using when XML_LARGE_SIZE is disabled, and give the correct sizes on the
-- two machines mentioned above. At the absolute worst the word size will be too
-- short.
foreign import ccall unsafe "expat.h XML_GetErrorCode" xmlGetErrorCode
:: ParserPtr -> IO CInt
foreign import ccall unsafe "expat.h XML_GetCurrentLineNumber" xmlGetCurrentLineNumber
:: ParserPtr -> IO CULong
foreign import ccall unsafe "expat.h XML_GetCurrentColumnNumber" xmlGetCurrentColumnNumber
:: ParserPtr -> IO CULong
foreign import ccall unsafe "expat.h XML_GetCurrentByteIndex" xmlGetCurrentByteIndex
:: ParserPtr -> IO CLong
foreign import ccall unsafe "expat.h XML_GetCurrentByteCount" xmlGetCurrentByteCount
:: ParserPtr -> IO CInt
foreign import ccall unsafe "expat.h XML_ErrorString" xmlErrorString
:: CInt -> IO CString
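{- Illustrative sketch (not part of the original module). The note on word
sizes relies on 'fromIntegral' to widen the C types into the 'Int64' fields
of 'XMLParseLocation'. The helper names below are hypothetical and only
show the conversions involved:

```haskell
import Data.Int (Int64)
import Foreign.C.Types (CLong, CULong)

-- Line and column numbers arrive as unsigned values.
widenSize :: CULong -> Int64
widenSize = fromIntegral

-- The byte index is signed, so a negative value keeps its sign.
widenIndex :: CLong -> Int64
widenIndex = fromIntegral
```

On platforms where @long@ is 32 bits the conversion simply widens the value.
Where it is already 64 bits, a 'CULong' above the 'Int64' range would wrap,
which is not a practical concern for line and column counts.
-}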
type HParser = B.ByteString -> Bool -> IO (ForeignPtr Word8, CInt, Maybe XMLParseError)
foreign import ccall unsafe "hexpatNewParser"
_hexpatNewParser :: Ptr CChar -> CInt -> IO MyParserPtr
foreign import ccall unsafe "hexpatGetParser"
_hexpatGetParser :: MyParserPtr -> ParserPtr
data MyParser_struct
type MyParserPtr = Ptr MyParser_struct
foreign import ccall "&hexpatFreeParser" hexpatFreeParser :: FunPtr (MyParserPtr -> IO ())
hexpatNewParser :: Maybe Encoding
-> Maybe (B.ByteString -> Maybe B.ByteString) -- ^ Entity decoder
-> Bool -- ^ Whether to include input locations
-> IO (HParser, IO XMLParseLocation)
hexpatNewParser enc mDecoder locations =
withOptEncoding enc $ \cEnc -> do
parser <- newForeignPtr hexpatFreeParser =<< _hexpatNewParser cEnc (cFromBool locations)
return (parse parser, withForeignPtr parser $ \mp -> getParseLocation $ _hexpatGetParser mp)
where
parse parser = case mDecoder of
Nothing -> \text final ->
alloca $ \ppData ->
alloca $ \pLen ->
withBStringLen text $ \(textBuf, textLen) ->
withForeignPtr parser $ \pp -> do
ok <- unStatus <$> _hexpatParseUnsafe pp textBuf textLen (cFromBool final) ppData pLen
pData <- peek ppData
len <- peek pLen
err <- if ok
then return Nothing
else Just <$> getError (_hexpatGetParser pp)
fpData <- newForeignPtr funPtrFree pData
return (fpData, len, err)
Just decoder -> \text final ->
alloca $ \ppData ->
alloca $ \pLen ->
withBStringLen text $ \(textBuf, textLen) ->
withForeignPtr parser $ \pp -> do
eh <- mkCEntityHandler . wrapCEntityHandler $ decoder
_hexpatSetEntityHandler pp eh
ok <- unStatus <$> _hexpatParseSafe pp textBuf textLen (cFromBool final) ppData pLen
freeHaskellFunPtr eh
pData <- peek ppData
len <- peek pLen
err <- if ok
then return Nothing
else Just <$> getError (_hexpatGetParser pp)
fpData <- newForeignPtr funPtrFree pData
return (fpData, len, err)
foreign import ccall unsafe "hexpatParse"
_hexpatParseUnsafe :: MyParserPtr -> Ptr CChar -> CInt -> CInt -> Ptr (Ptr Word8) -> Ptr CInt -> IO CInt
foreign import ccall safe "hexpatParse"
_hexpatParseSafe :: MyParserPtr -> Ptr CChar -> CInt -> CInt -> Ptr (Ptr Word8) -> Ptr CInt -> IO CInt
type CEntityHandler = Ptr CChar -> IO (Ptr CChar)
foreign import ccall safe "wrapper"
mkCEntityHandler :: CEntityHandler
-> IO (FunPtr CEntityHandler)
peekByteStringLen :: CStringLen -> IO B.ByteString
{-# INLINE peekByteStringLen #-}
peekByteStringLen (cstr, len) =
I.create (fromIntegral len) $ \ptr ->
I.memcpy ptr (castPtr cstr) (fromIntegral len)
wrapCEntityHandler :: (B.ByteString -> Maybe B.ByteString) -> CEntityHandler
wrapCEntityHandler handler = h
where
h cname = do
sz <- fromIntegral <$> I.c_strlen cname
name <- peekByteStringLen (cname, sz)
case handler name of
Just text -> do
let (fp, offset, len) = I.toForeignPtr text
withForeignPtr fp $ \ctextBS -> do
ctext <- mallocBytes (len + 1) :: IO CString
I.memcpy (castPtr ctext) (ctextBS `plusPtr` offset) (fromIntegral len)
poke (ctext `plusPtr` len) (0 :: CChar)
return ctext
Nothing -> return nullPtr
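{- Illustrative sketch (not part of the original module). The handler above
hands the C side a freshly allocated, NUL-terminated copy of the decoded
entity text, which @skippedEntity@ in Glue.c later releases with @free@.
Using only public @bytestring@ and @Foreign@ functions, and the hypothetical
names @toMallocedCString@ and @roundTrip@, the copying step might be written
as:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Foreign.C.String (CString, peekCString)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Storable (pokeByteOff)

-- Copy a ByteString into malloc'd memory and append a terminating NUL.
-- The caller owns the result and must release it with free.
toMallocedCString :: B.ByteString -> IO CString
toMallocedCString bs =
  B.useAsCStringLen bs $ \(src, len) -> do
    dst <- mallocBytes (len + 1)
    copyBytes dst src len
    pokeByteOff dst len (0 :: CChar)
    return dst

-- Convert to a C string and back again, releasing the buffer afterwards.
roundTrip :: String -> IO String
roundTrip s = do
  p <- toMallocedCString (BC.pack s)
  r <- peekCString p
  free p
  return r
```

The buffer must come from @malloc@ rather than from 'alloca' or a pinned
ByteString, because it outlives the Haskell callback and is freed by C code.
-}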
foreign import ccall unsafe "hexpatSetEntityHandler"
_hexpatSetEntityHandler :: MyParserPtr -> FunPtr CEntityHandler -> IO ()
foreign import ccall "&free" funPtrFree :: FunPtr (Ptr Word8 -> IO ())
hexpat-0.20.13/Text/XML/Expat/Internal/Namespaced.hs 0000644 0000000 0000000 00000014266 13122604047 020161 0 ustar 00 0000000 0000000 {-# LANGUAGE FlexibleContexts #-}
module Text.XML.Expat.Internal.Namespaced
( NName (..)
, NAttributes
, mkNName
, mkAnNName
, toNamespaced
, fromNamespaced
, xmlnsUri
, xmlns
) where
import Text.XML.Expat.Internal.NodeClass
import Text.XML.Expat.Internal.Qualified
import Text.XML.Expat.SAX
import Control.DeepSeq
import qualified Data.Map as M
import qualified Data.Maybe as DM
import qualified Data.List as L
-- | A namespace-qualified tag.
--
-- NName has two components, a local part and an optional namespace. The local part is the
-- name of the tag. The namespace is the URI identifying collections of declared tags.
-- Tags with the same local part but from different namespaces are distinct. Unqualified tags
-- are those with no namespace. They are in the default namespace, and all uses of an
-- unqualified tag are equivalent.
data NName text =
NName {
nnNamespace :: Maybe text,
nnLocalPart :: !text
}
deriving (Eq,Show)
instance NFData text => NFData (NName text) where
rnf (NName ns loc) = rnf (ns, loc)
-- | Type shortcut for attributes with namespaced names
type NAttributes text = Attributes (NName text) text
-- | Make a new NName from a namespace and a local part. The first argument is
-- named @prefix@ but is stored as the namespace ('nnNamespace').
mkNName :: text -> text -> NName text
mkNName prefix localPart = NName (Just prefix) localPart
-- | Make a new NName with no namespace.
mkAnNName :: text -> NName text
mkAnNName localPart = NName Nothing localPart
type NsPrefixMap text = M.Map (Maybe text) (Maybe text)
type PrefixNsMap text = M.Map (Maybe text) (Maybe text)
xmlUri :: (GenericXMLString text) => text
xmlUri = gxFromString "http://www.w3.org/XML/1998/namespace"
xml :: (GenericXMLString text) => text
xml = gxFromString "xml"
xmlnsUri :: (GenericXMLString text) => text
xmlnsUri = gxFromString "http://www.w3.org/2000/xmlns/"
xmlns :: (GenericXMLString text) => text
xmlns = gxFromString "xmlns"
baseNsBindings :: (GenericXMLString text, Ord text)
=> NsPrefixMap text
baseNsBindings = M.fromList
[ (Nothing, Nothing)
, (Just xml, Just xmlUri)
, (Just xmlns, Just xmlnsUri)
]
basePfBindings :: (GenericXMLString text, Ord text)
=> PrefixNsMap text
basePfBindings = M.fromList
[ (Nothing, Nothing)
, (Just xmlUri, Just xml)
, (Just xmlnsUri, Just xmlns)
]
toNamespaced :: (NodeClass n c, GenericXMLString text, Ord text, Show text)
=> n c (QName text) text -> n c (NName text) text
toNamespaced = nodeWithNamespaces baseNsBindings
nodeWithNamespaces :: (NodeClass n c, GenericXMLString text, Ord text, Show text)
=> NsPrefixMap text -> n c (QName text) text -> n c (NName text) text
nodeWithNamespaces bindings = modifyElement namespaceify
where
namespaceify (qname, qattrs, qchildren) = (nname, nattrs, nchildren)
where
for = flip map
ffor = flip fmap
(nsAtts, otherAtts) = L.partition ((== Just xmlns) . qnPrefix . fst) qattrs
(dfAtt, normalAtts) = L.partition ((== QName Nothing xmlns) . fst) otherAtts
nsMap = M.fromList $ for nsAtts $ \((QName _ lp), uri) -> (Just lp, Just uri)
-- fixme: when snd q is null, use Nothing
dfMap = M.fromList $ for dfAtt $ \q -> (Nothing, Just $ snd q)
chldBs = M.unions [dfMap, nsMap, bindings]
trans bs (QName pref qual) = case pref `M.lookup` bs of
Nothing -> error
$ "Namespace prefix referenced but never bound: '"
++ (show . DM.fromJust) pref
++ "'"
Just mUri -> NName mUri qual
nname = trans chldBs qname
-- attributes with no prefix are in the same namespace as the element
attBs = M.insert Nothing (nnNamespace nname) chldBs
transAt (qn, v) = (trans attBs qn, v)
nNsAtts = map transAt nsAtts
nDfAtt = map transAt dfAtt
nNormalAtts = map transAt normalAtts
nattrs = concat [nNsAtts, nDfAtt, nNormalAtts]
nchildren = ffor qchildren $ nodeWithNamespaces chldBs
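{- Illustrative sketch (not part of the original module). 'nodeWithNamespaces'
resolves each prefix against a map in which an element's own @xmlns@
declarations take precedence over inherited ones. The version below uses
'String' and hypothetical names (@Bindings@, @childBindings@, @resolve@),
and returns 'Either' where the original calls 'error', to show the lookup
and the shadowing rule in isolation:

```haskell
import qualified Data.Map as M

-- Prefix -> namespace URI; the Nothing key stands for "no prefix".
type Bindings = M.Map (Maybe String) (Maybe String)

-- Bindings visible inside an element. M.unions is left-biased, so the
-- element's own declarations shadow the inherited ones.
childBindings :: Bindings -> Bindings -> Bindings
childBindings declared inherited = M.unions [declared, inherited]

-- Resolve a (prefix, local part) pair to (namespace, local part).
resolve :: Bindings -> (Maybe String, String) -> Either String (Maybe String, String)
resolve bs (pfx, local) =
  case M.lookup pfx bs of
    Nothing  -> Left ("unbound prefix: " ++ show pfx)
    Just uri -> Right (uri, local)
```

Placing the newly declared maps first in the list passed to @M.unions@ is
what gives inner declarations priority, mirroring
@M.unions [dfMap, nsMap, bindings]@ above.
-}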
fromNamespaced :: (NodeClass n c, GenericXMLString text, Ord text, Functor c) =>
n c (NName text) text -> n c (QName text) text
fromNamespaced = nodeWithQualifiers 1 basePfBindings
nodeWithQualifiers :: (NodeClass n c, GenericXMLString text, Ord text, Functor c) =>
Int
-> PrefixNsMap text
-> n c (NName text) text
-> n c (QName text) text
nodeWithQualifiers cntr bindings = modifyElement namespaceify
where
namespaceify (nname, nattrs, nchildren) = (qname, qattrs, qchildren)
where
for = flip map
ffor = flip fmap
(nsAtts, otherAtts) = L.partition ((== Just xmlnsUri) . nnNamespace . fst) nattrs
(dfAtt, normalAtts) = L.partition ((== NName Nothing xmlns) . fst) otherAtts
nsMap = M.fromList $ for nsAtts $ \((NName _ lp), uri) -> (Just uri, Just lp)
dfMap = M.fromList $ for dfAtt $ \(_, uri) -> (Just uri, Just xmlns)
chldBs = M.unions [dfMap, nsMap, bindings]
trans (i, bs, as) (NName nspace qual) =
case nspace `M.lookup` bs of
Nothing -> let
pfx = gxFromString $ "ns" ++ show i
bsN = M.insert nspace (Just pfx) bs
asN = (NName (Just xmlnsUri) pfx, DM.fromJust nspace) : as
in trans (i+1, bsN, asN) (NName nspace qual)
Just pfx -> ((i, bs, as), QName pfx qual)
transAt ibs (nn, v) = let (ibs', qn) = trans ibs nn
in (ibs', (qn, v))
((i', bs', as'), qname) = trans (cntr, chldBs, []) nname
((i'', bs'', as''), qNsAtts) = L.mapAccumL transAt (i', bs', as') nsAtts
((i''', bs''', as'''), qDfAtt) = L.mapAccumL transAt (i'', bs'', as'') dfAtt
((i'''', bs'''', as''''), qNormalAtts) = L.mapAccumL transAt (i''', bs''', as''') normalAtts
(_, qas) = L.mapAccumL transAt (i'''', bs'''', as'''') as''''
qattrs = concat [qNsAtts, qDfAtt, qNormalAtts, qas]
qchildren = ffor nchildren $ nodeWithQualifiers i'''' bs''''
hexpat-0.20.13/Text/XML/Expat/Internal/NodeClass.hs 0000644 0000000 0000000 00000022257 13122604047 017773 0 ustar 00 0000000 0000000 {-# LANGUAGE MultiParamTypeClasses, FlexibleContexts, TypeFamilies,
ScopedTypeVariables, Rank2Types #-}
-- | Type classes to allow for XML handling functions to be generalized to
-- work with different node types, including the ones defined in /Tree/ and
-- /Annotated/.
module Text.XML.Expat.Internal.NodeClass where
import Control.Monad (mzero, liftM)
import Data.Functor.Identity
import Data.List.Class (List(..), ListItem(..), cons, fromList, mapL, toList)
import Data.Monoid (Monoid)
import Text.XML.Expat.SAX (GenericXMLString)
-- | Type shortcut for attributes
type Attributes tag text = [(tag, text)]
-- | Type shortcut for attributes with unqualified names where tag and
-- text are the same string type.
type UAttributes text = Attributes text text
-- | Extract all text content from inside a tag into a single string, including
-- any text contained in children. This /excludes/ the contents of /comments/ or
-- /processing instructions/. To get the text for these node types, use 'getText'.
textContent :: (NodeClass n [], Monoid text) => n [] tag text -> text
textContent node = runIdentity $ textContentM node
-- | A type function to give the type of a list of nodes, using the appropriate
-- list type for the specified node type, e.g. @ListOf (UNode Text)@
type family ListOf n
class (Functor c, List c) => NodeClass (n :: (* -> *) -> * -> * -> *) c where
-- | Is the given node an element?
isElement :: n c tag text -> Bool
-- | Is the given node text?
isText :: n c tag text -> Bool
-- | Is the given node CData?
isCData :: n c tag text -> Bool
-- | Is the given node a processing instruction?
isProcessingInstruction :: n c tag text -> Bool
-- | Is the given node a comment?
isComment :: n c tag text -> Bool
-- | Extract all text content from inside a tag into a single string, including
-- any text contained in children. This /excludes/ the contents of /comments/ or
-- /processing instructions/. To get the text for these node types, use 'getText'.
textContentM :: Monoid text => n c tag text -> ItemM c text
-- | Is the given node a tag with the given name?
isNamed :: Eq tag => tag -> n c tag text -> Bool
-- | Get the name of this node if it's an element, return empty string otherwise.
getName :: Monoid tag => n c tag text -> tag
-- | Is the given node a Processing Instruction with the given target?
hasTarget :: Eq text => text -> n c tag text -> Bool
-- | Get the target of this node if it's a Processing Instruction, return empty string otherwise.
getTarget :: Monoid text => n c tag text -> text
-- | Get the attributes of a node if it's an element, return empty list otherwise.
getAttributes :: n c tag text -> [(tag,text)]
-- | Get children of a node if it's an element, return empty list otherwise.
getChildren :: n c tag text -> c (n c tag text)
-- | Get this node's text if it's a text node, comment, or processing instruction,
-- return empty text otherwise.
getText :: Monoid text => n c tag text -> text
-- | Modify name if it's an element, no-op otherwise.
modifyName :: (tag -> tag)
-> n c tag text
-> n c tag text
-- | Modify attributes if it's an element, no-op otherwise.
modifyAttributes :: ([(tag, text)] -> [(tag, text)])
-> n c tag text
-> n c tag text
-- | Modify children (non-recursively) if it's an element, no-op otherwise.
modifyChildren :: (c (n c tag text) -> c (n c tag text))
-> n c tag text
-> n c tag text
-- | Map an element non-recursively, allowing the tag type to be changed.
modifyElement :: ((tag, [(tag, text)], c (n c tag text))
-> (tag', [(tag', text)], c (n c tag' text)))
-> n c tag text
-> n c tag' text
-- | Map all tags (both tag names and attribute names) recursively.
mapAllTags :: (tag -> tag')
-> n c tag text
-> n c tag' text
-- | Change a node recursively from one container type to another, with a
-- specified function to convert the container type.
mapNodeContainer :: List c' =>
(forall a . c a -> ItemM c (c' a))
-> n c tag text
-> ItemM c (n c' tag text)
-- | Generic text node constructor.
mkText :: text -> n c tag text
-- | Change a list of nodes recursively from one container type to another, with
-- a specified function to convert the container type.
mapNodeListContainer :: (NodeClass n c, List c') =>
(forall a . c a -> ItemM c (c' a))
-> c (n c tag text)
-> ItemM c (c' (n c' tag text))
mapNodeListContainer f = f . mapL (mapNodeContainer f)
-- | Change a node recursively from one container type to another. This
-- extracts the entire tree contents to standard lists and re-constructs them
-- with the new container type. For monadic list types used in
-- @hexpat-iteratee@ this operation forces evaluation.
fromNodeContainer :: (NodeClass n c, List c') =>
n c tag text
-> ItemM c (n c' tag text)
fromNodeContainer = mapNodeContainer (\l -> fromList `liftM` toList l)
-- | Change a list of nodes recursively from one container type to another. This
-- extracts the entire tree contents to standard lists and re-constructs them
-- with the new container type. For monadic list types used in
-- @hexpat-iteratee@ this operation forces evaluation.
fromNodeListContainer :: (NodeClass n c, List c') =>
c (n c tag text)
-> ItemM c (c' (n c' tag text))
fromNodeListContainer = mapNodeListContainer (\l -> fromList `liftM` toList l)
-- | A class of node types where an Element can be constructed given a tag,
-- attributes and children.
class NodeClass n c => MkElementClass n c where
-- | Generic element constructor.
mkElement :: tag -> Attributes tag text -> c (n c tag text) -> n c tag text
-- | Get the value of the attribute having the specified name.
getAttribute :: (NodeClass n c, GenericXMLString tag) => n c tag text -> tag -> Maybe text
getAttribute n t = lookup t $ getAttributes n
-- | Set the value of the attribute with the specified name to the value, overwriting
-- the first existing attribute with that name if present.
setAttribute :: (Eq tag, NodeClass n c, GenericXMLString tag) => tag -> text -> n c tag text -> n c tag text
setAttribute t newValue = modifyAttributes set
where
set [] = [(t, newValue)]
set ((name, _):atts) | name == t = (name, newValue):atts
set (att:atts) = att:set atts
-- | Delete the first attribute matching the specified name.
deleteAttribute :: (Eq tag, NodeClass n c, GenericXMLString tag) => tag -> n c tag text -> n c tag text
deleteAttribute t = modifyAttributes del
where
del [] = []
del ((name, _):atts) | name == t = atts
del (att:atts) = att:del atts
-- | setAttribute if /Just/, deleteAttribute if /Nothing/.
alterAttribute :: (Eq tag, NodeClass n c, GenericXMLString tag) => tag -> Maybe text -> n c tag text -> n c tag text
alterAttribute t (Just newValue) = setAttribute t newValue
alterAttribute t Nothing = deleteAttribute t
-- | Generically convert an element of one node type to another. Useful for
-- adding or removing annotations.
fromElement :: (NodeClass n c, MkElementClass n' c, Monoid tag, Monoid text) =>
n c tag text
-> n' c tag text
fromElement = fromElement_ mkElement
-- | Generically convert an element of one node type to another, using
-- the specified element constructor. Useful for adding or removing annotations.
fromElement_ :: (NodeClass n c, NodeClass n' c, Monoid tag, Monoid text) =>
(tag -> Attributes tag text -> c (n' c tag text) -> n' c tag text) -- ^ Element constructor
-> n c tag text
-> n' c tag text
fromElement_ mkElement elt | isElement elt =
mkElement (getName elt) (getAttributes elt) (fromNodes_ mkElement $ getChildren elt)
fromElement_ _ _ = error "fromElement requires an Element"
-- | Generically convert a list of nodes from one node type to another. Useful for
-- adding or removing annotations.
fromNodes :: (NodeClass n c, MkElementClass n' c, Monoid tag, Monoid text) =>
c (n c tag text)
-> c (n' c tag text)
fromNodes = fromNodes_ mkElement
-- | Generically convert a list of nodes from one node type to another, using
-- the specified element constructor. Useful for adding or removing annotations.
fromNodes_ :: (NodeClass n c, NodeClass n' c, Monoid tag, Monoid text) =>
(tag -> Attributes tag text -> c (n' c tag text) -> n' c tag text) -- ^ Element constructor
-> c (n c tag text)
-> c (n' c tag text)
fromNodes_ mkElement l = joinL $ do
li <- runList l
return $ case li of
Nil -> mzero
Cons elt l' | isElement elt -> fromElement_ mkElement elt `cons` fromNodes_ mkElement l'
Cons txt l' | isText txt -> mkText (getText txt) `cons` fromNodes_ mkElement l'
-- Future node types may include other kinds of nodes, which we discard here.
Cons _ l' -> fromNodes_ mkElement l'
hexpat-0.20.13/Text/XML/Expat/Internal/Qualified.hs 0000644 0000000 0000000 00000004563 13122604047 020023 0 ustar 00 0000000 0000000 -- hexpat, a Haskell wrapper for expat
-- Copyright (C) 2008 Evan Martin
-- Copyright (C) 2009 Stephen Blackheath
-- | In the default representation, qualified tag and attribute names such as
-- \<abc:hello\> are represented just as a string containing a colon, e.g.
-- \"abc:hello\".
--
-- This module provides functionality to handle these more intelligently, splitting
-- all tag and attribute names into their Prefix and LocalPart components.
module Text.XML.Expat.Internal.Qualified (
QName(..),
QAttributes,
mkQName,
mkAnQName,
toQualified,
fromQualified
) where
import Text.XML.Expat.Internal.NodeClass
import Text.XML.Expat.SAX
import Control.DeepSeq
import Data.Monoid
-- | A qualified name.
--
-- Qualified names have two parts, a prefix and a local part. The local part
-- is the name of the tag. The prefix scopes that name to a particular
-- group of legal tags.
--
-- The prefix will usually be associated with a namespace URI. This is usually
-- achieved by using xmlns attributes to bind prefixes to URIs.
data QName text =
QName {
qnPrefix :: Maybe text,
qnLocalPart :: !text
}
deriving (Eq,Show)
instance NFData text => NFData (QName text) where
rnf (QName pre loc) = rnf (pre, loc)
-- | Type shortcut for attributes with qualified names
type QAttributes text = Attributes (QName text) text
-- | Make a new QName from a prefix and localPart.
mkQName :: text -> text -> QName text
mkQName prefix localPart = QName (Just prefix) localPart
-- | Make a new QName with no prefix.
mkAnQName :: text -> QName text
mkAnQName localPart = QName Nothing localPart
toQualified :: (NodeClass n c, GenericXMLString text) => n c text text -> n c (QName text) text
toQualified = mapAllTags qual
where
qual ident =
case gxBreakOn ':' ident of
(prefix, _local) | not (gxNullString _local)
&& gxHead _local == ':'
-> QName (Just prefix) (gxTail _local)
_ -> QName Nothing ident
fromQualified :: (NodeClass n c, GenericXMLString text) => n c (QName text) text -> n c text text
fromQualified = mapAllTags tag
where
tag (QName (Just prefix) local) = prefix `mappend` gxFromChar ':' `mappend` local
tag (QName Nothing local) = local
hexpat-0.20.13/Text/XML/Expat/Internal/DocumentClass.hs 0000644 0000000 0000000 00000010233 13122604047 020653 0 ustar 00 0000000 0000000 {-# LANGUAGE MultiParamTypeClasses, TypeFamilies, FlexibleContexts #-}
-- | Type classes to allow for XML handling functions to be generalized to
-- work with different document types.
module Text.XML.Expat.Internal.DocumentClass where
import Text.XML.Expat.Internal.NodeClass (NodeClass)
import Control.DeepSeq
import Control.Monad (mzero)
import Data.List.Class (List)
-- | XML declaration, consisting of version, encoding and standalone.
--
-- The formatting functions always output only UTF-8, regardless
-- of what encoding is specified here. If you want to produce a document in a
-- different encoding, then set the encoding here, format the document, and then
-- convert the output text from UTF-8 to your desired encoding using some
-- text conversion library.
data XMLDeclaration text = XMLDeclaration text (Maybe text) (Maybe Bool) deriving (Eq, Show)
-- | Stub for future expansion.
data DocumentTypeDeclaration (c :: * -> *) tag text = DocumentTypeDeclaration deriving (Eq, Show)
data Misc text =
Comment !text |
ProcessingInstruction !text !text
instance Show text => Show (Misc text) where
showsPrec d (ProcessingInstruction t txt) = showParen (d > 10) $
("ProcessingInstruction "++) . showsPrec 11 t . (" "++) . showsPrec 11 txt
showsPrec d (Comment t) = showParen (d > 10) $ ("Comment "++) . showsPrec 11 t
instance Eq text => Eq (Misc text) where
ProcessingInstruction t1 d1 == ProcessingInstruction t2 d2 =
t1 == t2 &&
d1 == d2
Comment t1 == Comment t2 = t1 == t2
_ == _ = False
instance NFData text => NFData (Misc text) where
rnf (ProcessingInstruction target txt) = rnf (target, txt)
rnf (Comment txt) = rnf txt
type family NodeType (d :: (* -> *) -> * -> * -> *) :: (* -> *) -> * -> * -> *
class (Functor c, List c, NodeClass (NodeType d) c) => DocumentClass d (c :: * -> *) where
-- | Get the XML declaration for this document.
getXMLDeclaration :: d c tag text -> Maybe (XMLDeclaration text)
-- | Get the Document Type Declaration (DTD) for this document.
getDocumentTypeDeclaration :: d c tag text -> Maybe (DocumentTypeDeclaration c tag text)
-- | Get the top-level 'Misc' nodes for this document.
getTopLevelMiscs :: d c tag text -> c (Misc text)
-- | Get the root element for this document.
getRoot :: d c tag text -> NodeType d c tag text
-- | Make a document with the specified fields.
mkDocument :: Maybe (XMLDeclaration text)
-> Maybe (DocumentTypeDeclaration c tag text)
-> c (Misc text)
-> NodeType d c tag text
-> d c tag text
-- | Make a document with the specified root node and all other information
-- set to defaults.
mkPlainDocument :: DocumentClass d c => NodeType d c tag text -> d c tag text
mkPlainDocument = mkDocument Nothing Nothing mzero
modifyXMLDeclaration :: DocumentClass d c =>
(Maybe (XMLDeclaration text) -> Maybe (XMLDeclaration text))
-> d c tag text
-> d c tag text
modifyXMLDeclaration f doc = mkDocument (f $ getXMLDeclaration doc) (getDocumentTypeDeclaration doc)
(getTopLevelMiscs doc) (getRoot doc)
modifyDocumentTypeDeclaration :: DocumentClass d c =>
(Maybe (DocumentTypeDeclaration c tag text) -> Maybe (DocumentTypeDeclaration c tag text))
-> d c tag text
-> d c tag text
modifyDocumentTypeDeclaration f doc = mkDocument (getXMLDeclaration doc) (f $ getDocumentTypeDeclaration doc)
(getTopLevelMiscs doc) (getRoot doc)
modifyTopLevelMiscs :: DocumentClass d c =>
(c (Misc text) -> c (Misc text))
-> d c tag text
-> d c tag text
modifyTopLevelMiscs f doc = mkDocument (getXMLDeclaration doc) (getDocumentTypeDeclaration doc)
(f $ getTopLevelMiscs doc) (getRoot doc)
modifyRoot :: DocumentClass d c =>
(NodeType d c tag text -> NodeType d c tag text)
-> d c tag text
-> d c tag text
modifyRoot f doc = mkDocument (getXMLDeclaration doc) (getDocumentTypeDeclaration doc)
(getTopLevelMiscs doc) (f $ getRoot doc)
hexpat-0.20.13/cbits/ 0000755 0000000 0000000 00000000000 13122604047 012437 5 ustar 00 0000000 0000000 hexpat-0.20.13/cbits/winconfig.h 0000644 0000000 0000000 00000001557 13122604047 014603 0 ustar 00 0000000 0000000 /*================================================================
** Copyright 2000, Clark Cooper
** All rights reserved.
**
** This is free software. You are permitted to copy, distribute, or modify
** it under the terms of the MIT/X license (contained in the COPYING file
** with this distribution.)
*/
#ifndef WINCONFIG_H
#define WINCONFIG_H
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#undef WIN32_LEAN_AND_MEAN
#include <memory.h>
#include <string.h>
#if defined(HAVE_EXPAT_CONFIG_H) /* e.g. MinGW */
# include <expat_config.h>
#else /* !defined(HAVE_EXPAT_CONFIG_H) */
#define XML_NS 1
#define XML_DTD 1
#define XML_CONTEXT_BYTES 1024
/* we will assume all Windows platforms are little endian */
#define BYTEORDER 1234
/* Windows has memmove() available. */
#define HAVE_MEMMOVE
#endif /* !defined(HAVE_EXPAT_CONFIG_H) */
#endif /* ndef WINCONFIG_H */
hexpat-0.20.13/cbits/xmltok_ns.c 0000644 0000000 0000000 00000006110 13122604047 014617 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
/* This file is included! */
#ifdef XML_TOK_NS_C
const ENCODING *
NS(XmlGetUtf8InternalEncoding)(void)
{
return &ns(internal_utf8_encoding).enc;
}
const ENCODING *
NS(XmlGetUtf16InternalEncoding)(void)
{
#if BYTEORDER == 1234
return &ns(internal_little2_encoding).enc;
#elif BYTEORDER == 4321
return &ns(internal_big2_encoding).enc;
#else
const short n = 1;
return (*(const char *)&n
? &ns(internal_little2_encoding).enc
: &ns(internal_big2_encoding).enc);
#endif
}
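/* Illustration only, not part of Expat: when BYTEORDER is not known at
   compile time, the fallback branch above probes the byte order at run time
   by inspecting the first byte of a short that holds 1. A minimal standalone
   sketch of the same probe follows; the name host_is_little_endian is
   hypothetical.

```c
#include <stdint.h>
#include <string.h>

// Returns 1 when the least significant byte of a multi-byte integer is
// stored at the lowest address (little endian), and 0 otherwise.
int host_is_little_endian(void)
{
  const uint16_t n = 1;
  unsigned char first;
  memcpy(&first, &n, 1);   // read the byte at the lowest address
  return first == 1;
}
```

   A result of 1 corresponds to the branch above that selects
   internal_little2_encoding; 0 corresponds to internal_big2_encoding. */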
static const ENCODING * const NS(encodings)[] = {
&ns(latin1_encoding).enc,
&ns(ascii_encoding).enc,
&ns(utf8_encoding).enc,
&ns(big2_encoding).enc,
&ns(big2_encoding).enc,
&ns(little2_encoding).enc,
&ns(utf8_encoding).enc /* NO_ENC */
};
static int PTRCALL
NS(initScanProlog)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
return initScan(NS(encodings), (const INIT_ENCODING *)enc,
XML_PROLOG_STATE, ptr, end, nextTokPtr);
}
static int PTRCALL
NS(initScanContent)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
return initScan(NS(encodings), (const INIT_ENCODING *)enc,
XML_CONTENT_STATE, ptr, end, nextTokPtr);
}
int
NS(XmlInitEncoding)(INIT_ENCODING *p, const ENCODING **encPtr,
const char *name)
{
int i = getEncodingIndex(name);
if (i == UNKNOWN_ENC)
return 0;
SET_INIT_ENC_INDEX(p, i);
p->initEnc.scanners[XML_PROLOG_STATE] = NS(initScanProlog);
p->initEnc.scanners[XML_CONTENT_STATE] = NS(initScanContent);
p->initEnc.updatePosition = initUpdatePosition;
p->encPtr = encPtr;
*encPtr = &(p->initEnc);
return 1;
}
static const ENCODING *
NS(findEncoding)(const ENCODING *enc, const char *ptr, const char *end)
{
#define ENCODING_MAX 128
char buf[ENCODING_MAX];
char *p = buf;
int i;
XmlUtf8Convert(enc, &ptr, end, &p, p + ENCODING_MAX - 1);
if (ptr != end)
return 0;
*p = 0;
if (streqci(buf, KW_UTF_16) && enc->minBytesPerChar == 2)
return enc;
i = getEncodingIndex(buf);
if (i == UNKNOWN_ENC)
return 0;
return NS(encodings)[i];
}
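/* Illustration only, not part of Expat: findEncoding above copies the
   declared encoding name into a bounded buffer and then maps it to an index
   into NS(encodings). The helpers it calls (streqci, getEncodingIndex) are
   defined elsewhere, so the sketch below is only an assumption about the
   general shape of such a lookup. The names encoding_names, ascii_equal_ci
   and encoding_index are hypothetical, and the name table is assumed to
   follow the order of the first six NS(encodings) entries.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical name table, one entry per encoding slot.
static const char *const encoding_names[] = {
  "ISO-8859-1", "US-ASCII", "UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"
};

// Case-insensitive comparison of two NUL-terminated ASCII strings.
static int ascii_equal_ci(const char *a, const char *b)
{
  while (*a && *b) {
    if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;   // equal only if both strings ended together
}

// Returns the table index for name, or -1 when the name is not listed.
int encoding_index(const char *name)
{
  size_t i;
  for (i = 0; i < sizeof encoding_names / sizeof encoding_names[0]; i++)
    if (ascii_equal_ci(name, encoding_names[i]))
      return (int)i;
  return -1;
}
```

   In this sketch an unknown name yields -1, playing the role that
   UNKNOWN_ENC plays in the code above. */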
int
NS(XmlParseXmlDecl)(int isGeneralTextEntity,
const ENCODING *enc,
const char *ptr,
const char *end,
const char **badPtr,
const char **versionPtr,
const char **versionEndPtr,
const char **encodingName,
const ENCODING **encoding,
int *standalone)
{
return doParseXmlDecl(NS(findEncoding),
isGeneralTextEntity,
enc,
ptr,
end,
badPtr,
versionPtr,
versionEndPtr,
encodingName,
encoding,
standalone);
}
#endif /* XML_TOK_NS_C */
hexpat-0.20.13/cbits/internal.h 0000644 0000000 0000000 00000004447 13122604047 014435 0 ustar 00 0000000 0000000 /* internal.h
Internal definitions used by Expat. This is not needed to compile
client code.
The following calling convention macros are defined for frequently
called functions:
FASTCALL - Used for those internal functions that have a simple
body and a low number of arguments and local variables.
PTRCALL - Used for functions called through function pointers.
PTRFASTCALL - Like PTRCALL, but for low number of arguments.
inline - Used for selected internal functions for which inlining
may improve performance on some platforms.
Note: Use of these macros is based on judgement, not hard rules,
and therefore subject to change.
*/
#if defined(__GNUC__) && defined(__i386__) && !defined(__MINGW32__)
/* We'll use this version by default only where we know it helps.
regparm() generates warnings on Solaris boxes. See SF bug #692878.
Instability reported with egcs on a RedHat Linux 7.3.
Let's comment out:
#define FASTCALL __attribute__((stdcall, regparm(3)))
and let's try this:
*/
#define FASTCALL __attribute__((regparm(3)))
#define PTRFASTCALL __attribute__((regparm(3)))
#endif
/* Using __fastcall seems to have an unexpected negative effect under
MS VC++, especially for function pointers, so we won't use it for
now on that platform. It may be reconsidered for a future release
if it can be made more effective.
Likely reason: __fastcall on Windows is like stdcall, therefore
the compiler cannot perform stack optimizations for call clusters.
*/
/* Make sure all of these are defined if they aren't already. */
#ifndef FASTCALL
#define FASTCALL
#endif
#ifndef PTRCALL
#define PTRCALL
#endif
#ifndef PTRFASTCALL
#define PTRFASTCALL
#endif
#ifndef XML_MIN_SIZE
#if !defined(__cplusplus) && !defined(inline)
#ifdef __GNUC__
#define inline __inline
#endif /* __GNUC__ */
#endif
#endif /* XML_MIN_SIZE */
#ifdef __cplusplus
#define inline inline
#else
#ifndef inline
#define inline
#endif
#endif
#ifndef UNUSED_P
# ifdef __GNUC__
# define UNUSED_P(p) UNUSED_ ## p __attribute__((__unused__))
# else
# define UNUSED_P(p) UNUSED_ ## p
# endif
#endif
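/* Illustration only, not part of Expat: UNUSED_P renames a parameter by
   prefixing it with UNUSED_ and, under GCC-compatible compilers, marks it
   unused so that no warning is emitted. A minimal sketch of how a callback
   might use it follows; the function count_newlines is hypothetical, and the
   macro definition is repeated only so the sketch compiles on its own.

```c
#ifndef UNUSED_P
# ifdef __GNUC__
#  define UNUSED_P(p) UNUSED_ ## p __attribute__((__unused__))
# else
#  define UNUSED_P(p) UNUSED_ ## p
# endif
#endif

// Hypothetical handler: it must accept user_data to match a callback
// signature but never reads it. Because the parameter is really named
// UNUSED_user_data, an accidental use of user_data fails to compile.
int count_newlines(void *UNUSED_P(user_data), const char *text, int len)
{
  int i, n = 0;
  for (i = 0; i < len; i++)
    if (text[i] == '\n')
      n++;
  return n;
}
```
*/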
#ifdef __cplusplus
extern "C" {
#endif
void
align_limit_to_full_utf8_characters(const char * from, const char ** fromLimRef);
#ifdef __cplusplus
}
#endif
hexpat-0.20.13/cbits/utf8tab.h 0000644 0000000 0000000 00000003343 13122604047 014170 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
/* 0x80 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x84 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x88 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x8C */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x90 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x94 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x98 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0x9C */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xA0 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xA4 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xA8 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xAC */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xB0 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xB4 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xB8 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xBC */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL,
/* 0xC0 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xC4 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xC8 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xCC */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xD0 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xD4 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xD8 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xDC */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2,
/* 0xE0 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3,
/* 0xE4 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3,
/* 0xE8 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3,
/* 0xEC */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3,
/* 0xF0 */ BT_LEAD4, BT_LEAD4, BT_LEAD4, BT_LEAD4,
/* 0xF4 */ BT_LEAD4, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0xF8 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0xFC */ BT_NONXML, BT_NONXML, BT_MALFORM, BT_MALFORM,
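/* Illustration only, not part of Expat: the table above assigns a byte type
   to every byte from 0x80 to 0xFF. The same boundaries can be written as
   range checks, which may be easier to read than 128 table entries. The
   enum and function names below are hypothetical; Expat's own BT_*
   constants are defined elsewhere.

```c
#include <assert.h>

enum byte_class { TRAIL, LEAD2, LEAD3, LEAD4, NONXML, MALFORM };

// Classifies a byte in the range 0x80..0xFF using the same boundaries as
// the table. Bytes below 0x80 are outside the scope of the table.
enum byte_class classify_high_byte(unsigned char b)
{
  assert(b >= 0x80);
  if (b <= 0xBF) return TRAIL;    // 10xxxxxx continuation byte
  if (b <= 0xDF) return LEAD2;    // 110xxxxx starts a 2-byte sequence
  if (b <= 0xEF) return LEAD3;    // 1110xxxx starts a 3-byte sequence
  if (b <= 0xF4) return LEAD4;    // 0xF0..0xF4 start a 4-byte sequence
  if (b <= 0xFD) return NONXML;   // would encode beyond U+10FFFF
  return MALFORM;                 // 0xFE and 0xFF never occur in UTF-8
}
```
*/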
hexpat-0.20.13/cbits/siphash.h 0000644 0000000 0000000 00000025264 13122604047 014260 0 ustar 00 0000000 0000000 /* ==========================================================================
* siphash.h - SipHash-2-4 in a single header file
* --------------------------------------------------------------------------
* Derived by William Ahern from the reference implementation[1] published[2]
* by Jean-Philippe Aumasson and Daniel J. Bernstein.
* Minimal changes by Sebastian Pipping on top, details below.
* Licensed under the CC0 Public Domain Dedication license.
*
* 1. https://www.131002.net/siphash/siphash24.c
* 2. https://www.131002.net/siphash/
* --------------------------------------------------------------------------
* HISTORY:
*
* 2017-06-10 (Sebastian Pipping)
* - Clarify license note in the header
* - Address C89 issues:
* - Stop using inline keyword (and let compiler decide)
* - Turn integer suffix ULL to UL
* - Replace _Bool by int
* - Turn macro siphash24 into a function
* - Address invalid conversion (void pointer) by explicit cast
* - Always expose sip24_valid (for self-tests)
*
* 2012-11-04 - Born. (William Ahern)
* --------------------------------------------------------------------------
* USAGE:
*
* SipHash-2-4 takes as input two 64-bit words as the key, some number of
* message bytes, and outputs a 64-bit word as the message digest. This
* implementation employs two data structures: a struct sipkey for
* representing the key, and a struct siphash for representing the hash
* state.
*
* For converting a 16-byte unsigned char array to a key, use either the
* macro sip_keyof or the routine sip_tokey. The former instantiates a
* compound literal key, while the latter requires a key object as a
* parameter.
*
* unsigned char secret[16];
* arc4random_buf(secret, sizeof secret);
* struct sipkey *key = sip_keyof(secret);
*
* For hashing a message, use either the convenience function siphash24 or the
* routines sip24_init, sip24_update, and sip24_final.
*
* struct siphash state;
* void *msg;
* size_t len;
* uint64_t hash;
*
* sip24_init(&state, key);
* sip24_update(&state, msg, len);
* hash = sip24_final(&state);
*
* or
*
* hash = siphash24(msg, len, key);
*
* To convert the 64-bit hash value to a canonical 8-byte little-endian
* binary representation, use either the macro sip_binof or the routine
* sip_tobin. The former instantiates and returns a compound literal array,
* while the latter requires an array object as a parameter.
* --------------------------------------------------------------------------
* NOTES:
*
* o Neither sip_keyof nor sip_binof will work with compilers lacking
* compound literal support. Instead, you must use the lower-level
* interfaces which take as parameters the temporary state objects.
*
* o Uppercase macros may evaluate parameters more than once. Lowercase
* macros should not exhibit any such side effects.
* ==========================================================================
*/
#ifndef SIPHASH_H
#define SIPHASH_H
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint64_t uint32_t uint8_t */
#define SIP_ROTL(x, b) (uint64_t)(((x) << (b)) | ( (x) >> (64 - (b))))
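/* Illustration only, not part of this header: SIP_ROTL rotates a 64-bit
   value left by b bits. The macro evaluates its arguments more than once
   and is only meaningful for 0 < b < 64, which holds for the rotation
   counts used in sip_round (13, 16, 17, 21 and 32). A function form of the
   same operation, under the hypothetical name rotl64, might look like this:

```c
#include <assert.h>
#include <stdint.h>

// Rotates x left by b bits. b must be in 1..63 because a right shift of a
// 64-bit value by 64 would be undefined behaviour.
uint64_t rotl64(uint64_t x, unsigned b)
{
  assert(b >= 1 && b <= 63);
  return (x << b) | (x >> (64 - b));
}
```
*/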
#define SIP_U32TO8_LE(p, v) \
(p)[0] = (uint8_t)((v) >> 0); (p)[1] = (uint8_t)((v) >> 8); \
(p)[2] = (uint8_t)((v) >> 16); (p)[3] = (uint8_t)((v) >> 24);
#define SIP_U64TO8_LE(p, v) \
SIP_U32TO8_LE((p) + 0, (uint32_t)((v) >> 0)); \
SIP_U32TO8_LE((p) + 4, (uint32_t)((v) >> 32));
#define SIP_U8TO64_LE(p) \
(((uint64_t)((p)[0]) << 0) | \
((uint64_t)((p)[1]) << 8) | \
((uint64_t)((p)[2]) << 16) | \
((uint64_t)((p)[3]) << 24) | \
((uint64_t)((p)[4]) << 32) | \
((uint64_t)((p)[5]) << 40) | \
((uint64_t)((p)[6]) << 48) | \
((uint64_t)((p)[7]) << 56))
#define SIPHASH_INITIALIZER { 0, 0, 0, 0, { 0 }, 0, 0 }
struct siphash {
uint64_t v0, v1, v2, v3;
unsigned char buf[8], *p;
uint64_t c;
}; /* struct siphash */
#define SIP_KEYLEN 16
struct sipkey {
uint64_t k[2];
}; /* struct sipkey */
#define sip_keyof(k) sip_tokey(&(struct sipkey){ { 0 } }, (k))
static struct sipkey *sip_tokey(struct sipkey *key, const void *src) {
key->k[0] = SIP_U8TO64_LE((const unsigned char *)src);
key->k[1] = SIP_U8TO64_LE((const unsigned char *)src + 8);
return key;
} /* sip_tokey() */
#define sip_binof(v) sip_tobin((unsigned char[8]){ 0 }, (v))
static void *sip_tobin(void *dst, uint64_t u64) {
SIP_U64TO8_LE((unsigned char *)dst, u64);
return dst;
} /* sip_tobin() */
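/* Illustration only, not part of this header: sip_tokey and sip_tobin rely
   on the SIP_U8TO64_LE and SIP_U64TO8_LE macros to convert between byte
   arrays and 64-bit words in little-endian order, independent of the host
   byte order. The sketch below expresses the same conversions as loops;
   the names store_u64_le and load_u64_le are hypothetical.

```c
#include <stdint.h>

// Writes v to dst[0..7], least significant byte first.
void store_u64_le(unsigned char *dst, uint64_t v)
{
  int i;
  for (i = 0; i < 8; i++)
    dst[i] = (unsigned char)(v >> (8 * i));
}

// Reads eight bytes from src, least significant byte first.
uint64_t load_u64_le(const unsigned char *src)
{
  uint64_t v = 0;
  int i;
  for (i = 0; i < 8; i++)
    v |= (uint64_t)src[i] << (8 * i);
  return v;
}
```

   Loading and then storing a value gives back the original bytes, which is
   why a hash can be compared against an 8-byte test vector on any host. */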
static void sip_round(struct siphash *H, const int rounds) {
int i;
for (i = 0; i < rounds; i++) {
H->v0 += H->v1;
H->v1 = SIP_ROTL(H->v1, 13);
H->v1 ^= H->v0;
H->v0 = SIP_ROTL(H->v0, 32);
H->v2 += H->v3;
H->v3 = SIP_ROTL(H->v3, 16);
H->v3 ^= H->v2;
H->v0 += H->v3;
H->v3 = SIP_ROTL(H->v3, 21);
H->v3 ^= H->v0;
H->v2 += H->v1;
H->v1 = SIP_ROTL(H->v1, 17);
H->v1 ^= H->v2;
H->v2 = SIP_ROTL(H->v2, 32);
}
} /* sip_round() */
static struct siphash *sip24_init(struct siphash *H, const struct sipkey *key) {
H->v0 = 0x736f6d6570736575UL ^ key->k[0];
H->v1 = 0x646f72616e646f6dUL ^ key->k[1];
H->v2 = 0x6c7967656e657261UL ^ key->k[0];
H->v3 = 0x7465646279746573UL ^ key->k[1];
H->p = H->buf;
H->c = 0;
return H;
} /* sip24_init() */
#define sip_endof(a) (&(a)[sizeof (a) / sizeof *(a)])
static struct siphash *sip24_update(struct siphash *H, const void *src, size_t len) {
const unsigned char *p = (const unsigned char *)src, *pe = p + len;
uint64_t m;
do {
while (p < pe && H->p < sip_endof(H->buf))
*H->p++ = *p++;
if (H->p < sip_endof(H->buf))
break;
m = SIP_U8TO64_LE(H->buf);
H->v3 ^= m;
sip_round(H, 2);
H->v0 ^= m;
H->p = H->buf;
H->c += 8;
} while (p < pe);
return H;
} /* sip24_update() */
static uint64_t sip24_final(struct siphash *H) {
char left = H->p - H->buf;
uint64_t b = (H->c + left) << 56;
switch (left) {
case 7: b |= (uint64_t)H->buf[6] << 48;
case 6: b |= (uint64_t)H->buf[5] << 40;
case 5: b |= (uint64_t)H->buf[4] << 32;
case 4: b |= (uint64_t)H->buf[3] << 24;
case 3: b |= (uint64_t)H->buf[2] << 16;
case 2: b |= (uint64_t)H->buf[1] << 8;
case 1: b |= (uint64_t)H->buf[0] << 0;
case 0: break;
}
H->v3 ^= b;
sip_round(H, 2);
H->v0 ^= b;
H->v2 ^= 0xff;
sip_round(H, 4);
return H->v0 ^ H->v1 ^ H->v2 ^ H->v3;
} /* sip24_final() */
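/* Illustration only, not part of this header: sip24_final builds the last
   message word from the 0 to 7 bytes still buffered plus the total message
   length, where H->c counts the bytes already consumed in full 8-byte
   blocks. The fall-through switch above can also be written as a loop; the
   name pack_final_block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Packs the last `left` (0..7) buffered bytes and the total message length
// into one 64-bit word: the bytes fill the low positions in little-endian
// order and the length modulo 256 occupies the top byte.
uint64_t pack_final_block(const unsigned char *tail, size_t left,
                          uint64_t total_len)
{
  uint64_t b = total_len << 56;
  size_t i;
  assert(left < 8);
  for (i = 0; i < left; i++)
    b |= (uint64_t)tail[i] << (8 * i);
  return b;
}
```

   Only the low 8 bits of the length survive the shift by 56, so a 256-byte
   message with no leftover bytes packs to zero. */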
static uint64_t siphash24(const void *src, size_t len, const struct sipkey *key) {
struct siphash state = SIPHASH_INITIALIZER;
return sip24_final(sip24_update(sip24_init(&state, key), src, len));
} /* siphash24() */
/*
* SipHash-2-4 output with
* k = 00 01 02 ...
* and
* in = (empty string)
* in = 00 (1 byte)
* in = 00 01 (2 bytes)
* in = 00 01 02 (3 bytes)
* ...
* in = 00 01 02 ... 3e (63 bytes)
*/
static int sip24_valid(void) {
static const unsigned char vectors[64][8] = {
{ 0x31, 0x0e, 0x0e, 0xdd, 0x47, 0xdb, 0x6f, 0x72, },
{ 0xfd, 0x67, 0xdc, 0x93, 0xc5, 0x39, 0xf8, 0x74, },
{ 0x5a, 0x4f, 0xa9, 0xd9, 0x09, 0x80, 0x6c, 0x0d, },
{ 0x2d, 0x7e, 0xfb, 0xd7, 0x96, 0x66, 0x67, 0x85, },
{ 0xb7, 0x87, 0x71, 0x27, 0xe0, 0x94, 0x27, 0xcf, },
{ 0x8d, 0xa6, 0x99, 0xcd, 0x64, 0x55, 0x76, 0x18, },
{ 0xce, 0xe3, 0xfe, 0x58, 0x6e, 0x46, 0xc9, 0xcb, },
{ 0x37, 0xd1, 0x01, 0x8b, 0xf5, 0x00, 0x02, 0xab, },
{ 0x62, 0x24, 0x93, 0x9a, 0x79, 0xf5, 0xf5, 0x93, },
{ 0xb0, 0xe4, 0xa9, 0x0b, 0xdf, 0x82, 0x00, 0x9e, },
{ 0xf3, 0xb9, 0xdd, 0x94, 0xc5, 0xbb, 0x5d, 0x7a, },
{ 0xa7, 0xad, 0x6b, 0x22, 0x46, 0x2f, 0xb3, 0xf4, },
{ 0xfb, 0xe5, 0x0e, 0x86, 0xbc, 0x8f, 0x1e, 0x75, },
{ 0x90, 0x3d, 0x84, 0xc0, 0x27, 0x56, 0xea, 0x14, },
{ 0xee, 0xf2, 0x7a, 0x8e, 0x90, 0xca, 0x23, 0xf7, },
{ 0xe5, 0x45, 0xbe, 0x49, 0x61, 0xca, 0x29, 0xa1, },
{ 0xdb, 0x9b, 0xc2, 0x57, 0x7f, 0xcc, 0x2a, 0x3f, },
{ 0x94, 0x47, 0xbe, 0x2c, 0xf5, 0xe9, 0x9a, 0x69, },
{ 0x9c, 0xd3, 0x8d, 0x96, 0xf0, 0xb3, 0xc1, 0x4b, },
{ 0xbd, 0x61, 0x79, 0xa7, 0x1d, 0xc9, 0x6d, 0xbb, },
{ 0x98, 0xee, 0xa2, 0x1a, 0xf2, 0x5c, 0xd6, 0xbe, },
{ 0xc7, 0x67, 0x3b, 0x2e, 0xb0, 0xcb, 0xf2, 0xd0, },
{ 0x88, 0x3e, 0xa3, 0xe3, 0x95, 0x67, 0x53, 0x93, },
{ 0xc8, 0xce, 0x5c, 0xcd, 0x8c, 0x03, 0x0c, 0xa8, },
{ 0x94, 0xaf, 0x49, 0xf6, 0xc6, 0x50, 0xad, 0xb8, },
{ 0xea, 0xb8, 0x85, 0x8a, 0xde, 0x92, 0xe1, 0xbc, },
{ 0xf3, 0x15, 0xbb, 0x5b, 0xb8, 0x35, 0xd8, 0x17, },
{ 0xad, 0xcf, 0x6b, 0x07, 0x63, 0x61, 0x2e, 0x2f, },
{ 0xa5, 0xc9, 0x1d, 0xa7, 0xac, 0xaa, 0x4d, 0xde, },
{ 0x71, 0x65, 0x95, 0x87, 0x66, 0x50, 0xa2, 0xa6, },
{ 0x28, 0xef, 0x49, 0x5c, 0x53, 0xa3, 0x87, 0xad, },
{ 0x42, 0xc3, 0x41, 0xd8, 0xfa, 0x92, 0xd8, 0x32, },
{ 0xce, 0x7c, 0xf2, 0x72, 0x2f, 0x51, 0x27, 0x71, },
{ 0xe3, 0x78, 0x59, 0xf9, 0x46, 0x23, 0xf3, 0xa7, },
{ 0x38, 0x12, 0x05, 0xbb, 0x1a, 0xb0, 0xe0, 0x12, },
{ 0xae, 0x97, 0xa1, 0x0f, 0xd4, 0x34, 0xe0, 0x15, },
{ 0xb4, 0xa3, 0x15, 0x08, 0xbe, 0xff, 0x4d, 0x31, },
{ 0x81, 0x39, 0x62, 0x29, 0xf0, 0x90, 0x79, 0x02, },
{ 0x4d, 0x0c, 0xf4, 0x9e, 0xe5, 0xd4, 0xdc, 0xca, },
{ 0x5c, 0x73, 0x33, 0x6a, 0x76, 0xd8, 0xbf, 0x9a, },
{ 0xd0, 0xa7, 0x04, 0x53, 0x6b, 0xa9, 0x3e, 0x0e, },
{ 0x92, 0x59, 0x58, 0xfc, 0xd6, 0x42, 0x0c, 0xad, },
{ 0xa9, 0x15, 0xc2, 0x9b, 0xc8, 0x06, 0x73, 0x18, },
{ 0x95, 0x2b, 0x79, 0xf3, 0xbc, 0x0a, 0xa6, 0xd4, },
{ 0xf2, 0x1d, 0xf2, 0xe4, 0x1d, 0x45, 0x35, 0xf9, },
{ 0x87, 0x57, 0x75, 0x19, 0x04, 0x8f, 0x53, 0xa9, },
{ 0x10, 0xa5, 0x6c, 0xf5, 0xdf, 0xcd, 0x9a, 0xdb, },
{ 0xeb, 0x75, 0x09, 0x5c, 0xcd, 0x98, 0x6c, 0xd0, },
{ 0x51, 0xa9, 0xcb, 0x9e, 0xcb, 0xa3, 0x12, 0xe6, },
{ 0x96, 0xaf, 0xad, 0xfc, 0x2c, 0xe6, 0x66, 0xc7, },
{ 0x72, 0xfe, 0x52, 0x97, 0x5a, 0x43, 0x64, 0xee, },
{ 0x5a, 0x16, 0x45, 0xb2, 0x76, 0xd5, 0x92, 0xa1, },
{ 0xb2, 0x74, 0xcb, 0x8e, 0xbf, 0x87, 0x87, 0x0a, },
{ 0x6f, 0x9b, 0xb4, 0x20, 0x3d, 0xe7, 0xb3, 0x81, },
{ 0xea, 0xec, 0xb2, 0xa3, 0x0b, 0x22, 0xa8, 0x7f, },
{ 0x99, 0x24, 0xa4, 0x3c, 0xc1, 0x31, 0x57, 0x24, },
{ 0xbd, 0x83, 0x8d, 0x3a, 0xaf, 0xbf, 0x8d, 0xb7, },
{ 0x0b, 0x1a, 0x2a, 0x32, 0x65, 0xd5, 0x1a, 0xea, },
{ 0x13, 0x50, 0x79, 0xa3, 0x23, 0x1c, 0xe6, 0x60, },
{ 0x93, 0x2b, 0x28, 0x46, 0xe4, 0xd7, 0x06, 0x66, },
{ 0xe1, 0x91, 0x5f, 0x5c, 0xb1, 0xec, 0xa4, 0x6c, },
{ 0xf3, 0x25, 0x96, 0x5c, 0xa1, 0x6d, 0x62, 0x9f, },
{ 0x57, 0x5f, 0xf2, 0x8e, 0x60, 0x38, 0x1b, 0xe5, },
{ 0x72, 0x45, 0x06, 0xeb, 0x4c, 0x32, 0x8a, 0x95, }
};
unsigned char in[64];
struct sipkey k;
size_t i;
sip_tokey(&k, "\000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017");
for (i = 0; i < sizeof in; ++i) {
in[i] = i;
if (siphash24(in, i, &k) != SIP_U8TO64_LE(vectors[i]))
return 0;
}
return 1;
} /* sip24_valid() */
#if SIPHASH_MAIN
#include <stdio.h>
int main(void) {
int ok = sip24_valid();
if (ok)
puts("OK");
else
puts("FAIL");
return !ok;
} /* main() */
#endif /* SIPHASH_MAIN */
#endif /* SIPHASH_H */
hexpat-0.20.13/cbits/latin1tab.h 0000644 0000000 0000000 00000003425 13122604047 014473 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
/* 0x80 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x84 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x88 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x8C */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x90 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x94 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x98 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0x9C */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xA0 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xA4 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xA8 */ BT_OTHER, BT_OTHER, BT_NMSTRT, BT_OTHER,
/* 0xAC */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xB0 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xB4 */ BT_OTHER, BT_NMSTRT, BT_OTHER, BT_NAME,
/* 0xB8 */ BT_OTHER, BT_OTHER, BT_NMSTRT, BT_OTHER,
/* 0xBC */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER,
/* 0xC0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xC4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xC8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xCC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xD0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xD4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER,
/* 0xD8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xDC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xE0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xE4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xE8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xEC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xF0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xF4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER,
/* 0xF8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0xFC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
hexpat-0.20.13/cbits/xmltok.h 0000644 0000000 0000000 00000026221 13122604047 014131 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
#ifndef XmlTok_INCLUDED
#define XmlTok_INCLUDED 1
#ifdef __cplusplus
extern "C" {
#endif
/* The following token may be returned by XmlContentTok */
#define XML_TOK_TRAILING_RSQB -5 /* ] or ]] at the end of the scan; might be
start of illegal ]]> sequence */
/* The following tokens may be returned by both XmlPrologTok and
XmlContentTok.
*/
#define XML_TOK_NONE -4 /* The string to be scanned is empty */
#define XML_TOK_TRAILING_CR -3 /* A CR at the end of the scan;
might be part of CRLF sequence */
#define XML_TOK_PARTIAL_CHAR -2 /* only part of a multibyte sequence */
#define XML_TOK_PARTIAL -1 /* only part of a token */
#define XML_TOK_INVALID 0
/* The following tokens are returned by XmlContentTok; some are also
returned by XmlAttributeValueTok, XmlEntityTok, XmlCdataSectionTok.
*/
#define XML_TOK_START_TAG_WITH_ATTS 1
#define XML_TOK_START_TAG_NO_ATTS 2
#define XML_TOK_EMPTY_ELEMENT_WITH_ATTS 3 /* empty element tag */
#define XML_TOK_EMPTY_ELEMENT_NO_ATTS 4
#define XML_TOK_END_TAG 5
#define XML_TOK_DATA_CHARS 6
#define XML_TOK_DATA_NEWLINE 7
#define XML_TOK_CDATA_SECT_OPEN 8
#define XML_TOK_ENTITY_REF 9
#define XML_TOK_CHAR_REF 10 /* numeric character reference */
/* The following tokens may be returned by both XmlPrologTok and
XmlContentTok.
*/
#define XML_TOK_PI 11 /* processing instruction */
#define XML_TOK_XML_DECL 12 /* XML decl or text decl */
#define XML_TOK_COMMENT 13
#define XML_TOK_BOM 14 /* Byte order mark */
/* The following tokens are returned only by XmlPrologTok */
#define XML_TOK_PROLOG_S 15
#define XML_TOK_DECL_OPEN 16 /* <!foo */
#define XML_TOK_DECL_CLOSE 17 /* > */
#define XML_TOK_NAME 18
#define XML_TOK_NMTOKEN 19
#define XML_TOK_POUND_NAME 20 /* #name */
#define XML_TOK_OR 21 /* | */
#define XML_TOK_PERCENT 22
#define XML_TOK_OPEN_PAREN 23
#define XML_TOK_CLOSE_PAREN 24
#define XML_TOK_OPEN_BRACKET 25
#define XML_TOK_CLOSE_BRACKET 26
#define XML_TOK_LITERAL 27
#define XML_TOK_PARAM_ENTITY_REF 28
#define XML_TOK_INSTANCE_START 29
/* The following occur only in element type declarations */
#define XML_TOK_NAME_QUESTION 30 /* name? */
#define XML_TOK_NAME_ASTERISK 31 /* name* */
#define XML_TOK_NAME_PLUS 32 /* name+ */
#define XML_TOK_COND_SECT_OPEN 33 /* <![ */
#define XML_TOK_COND_SECT_CLOSE 34 /* ]]> */
#define XML_TOK_CLOSE_PAREN_QUESTION 35 /* )? */
#define XML_TOK_CLOSE_PAREN_ASTERISK 36 /* )* */
#define XML_TOK_CLOSE_PAREN_PLUS 37 /* )+ */
#define XML_TOK_COMMA 38
/* The following token is returned only by XmlAttributeValueTok */
#define XML_TOK_ATTRIBUTE_VALUE_S 39
/* The following token is returned only by XmlCdataSectionTok */
#define XML_TOK_CDATA_SECT_CLOSE 40
/* With namespace processing this is returned by XmlPrologTok for a
name with a colon.
*/
#define XML_TOK_PREFIXED_NAME 41
#ifdef XML_DTD
#define XML_TOK_IGNORE_SECT 42
#endif /* XML_DTD */
#ifdef XML_DTD
#define XML_N_STATES 4
#else /* not XML_DTD */
#define XML_N_STATES 3
#endif /* not XML_DTD */
#define XML_PROLOG_STATE 0
#define XML_CONTENT_STATE 1
#define XML_CDATA_SECTION_STATE 2
#ifdef XML_DTD
#define XML_IGNORE_SECTION_STATE 3
#endif /* XML_DTD */
#define XML_N_LITERAL_TYPES 2
#define XML_ATTRIBUTE_VALUE_LITERAL 0
#define XML_ENTITY_VALUE_LITERAL 1
/* The size of the buffer passed to XmlUtf8Encode must be at least this. */
#define XML_UTF8_ENCODE_MAX 4
/* The size of the buffer passed to XmlUtf16Encode must be at least this. */
#define XML_UTF16_ENCODE_MAX 2
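/* Illustration only: why a 4-byte buffer is enough for one UTF-8 encoded
   character. This is a generic sketch, not Expat's XmlUtf8Encode; the
   names UTF8_MAX and utf8_encode_sketch are hypothetical, and rejecting
   surrogates is a choice made here, not a documented Expat behavior. */

```c
#include <assert.h>

#define UTF8_MAX 4 /* same value as XML_UTF8_ENCODE_MAX */

/* Writes the UTF-8 form of code point c into buf (at least UTF8_MAX
 * bytes) and returns the number of bytes written, or 0 if c is negative,
 * a surrogate, or above 0x10FFFF. */
static int utf8_encode_sketch(int c, char *buf) {
  assert(buf != 0);
  if (c < 0 || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  if (c < 0x80) {
    buf[0] = (char)c;
    return 1;
  }
  if (c < 0x800) {
    buf[0] = (char)(0xC0 | (c >> 6));
    buf[1] = (char)(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000) {
    buf[0] = (char)(0xE0 | (c >> 12));
    buf[1] = (char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (char)(0x80 | (c & 0x3F));
    return 3;
  }
  buf[0] = (char)(0xF0 | (c >> 18));
  buf[1] = (char)(0x80 | ((c >> 12) & 0x3F));
  buf[2] = (char)(0x80 | ((c >> 6) & 0x3F));
  buf[3] = (char)(0x80 | (c & 0x3F));
  return 4;
}
```

/* U+20AC encodes to the three bytes E2 82 AC; code points above U+FFFF
   use all four bytes of the buffer. */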
typedef struct position {
/* first line and first column are 0 not 1 */
XML_Size lineNumber;
XML_Size columnNumber;
} POSITION;
typedef struct {
const char *name;
const char *valuePtr;
const char *valueEnd;
char normalized;
} ATTRIBUTE;
struct encoding;
typedef struct encoding ENCODING;
typedef int (PTRCALL *SCANNER)(const ENCODING *,
const char *,
const char *,
const char **);
enum XML_Convert_Result {
XML_CONVERT_COMPLETED = 0,
XML_CONVERT_INPUT_INCOMPLETE = 1,
XML_CONVERT_OUTPUT_EXHAUSTED = 2 /* and therefore potentially input remaining as well */
};
struct encoding {
SCANNER scanners[XML_N_STATES];
SCANNER literalScanners[XML_N_LITERAL_TYPES];
int (PTRCALL *sameName)(const ENCODING *,
const char *,
const char *);
int (PTRCALL *nameMatchesAscii)(const ENCODING *,
const char *,
const char *,
const char *);
int (PTRFASTCALL *nameLength)(const ENCODING *, const char *);
const char *(PTRFASTCALL *skipS)(const ENCODING *, const char *);
int (PTRCALL *getAtts)(const ENCODING *enc,
const char *ptr,
int attsMax,
ATTRIBUTE *atts);
int (PTRFASTCALL *charRefNumber)(const ENCODING *enc, const char *ptr);
int (PTRCALL *predefinedEntityName)(const ENCODING *,
const char *,
const char *);
void (PTRCALL *updatePosition)(const ENCODING *,
const char *ptr,
const char *end,
POSITION *);
int (PTRCALL *isPublicId)(const ENCODING *enc,
const char *ptr,
const char *end,
const char **badPtr);
enum XML_Convert_Result (PTRCALL *utf8Convert)(const ENCODING *enc,
const char **fromP,
const char *fromLim,
char **toP,
const char *toLim);
enum XML_Convert_Result (PTRCALL *utf16Convert)(const ENCODING *enc,
const char **fromP,
const char *fromLim,
unsigned short **toP,
const unsigned short *toLim);
int minBytesPerChar;
char isUtf8;
char isUtf16;
};
/* Scan the string starting at ptr until the end of the next complete
token, but do not scan past eptr. Return an integer giving the
type of token.
Return XML_TOK_NONE when ptr == eptr; nextTokPtr will not be set.
Return XML_TOK_PARTIAL when the string does not contain a complete
token; nextTokPtr will not be set.
Return XML_TOK_INVALID when the string does not start a valid
token; nextTokPtr will be set to point to the character which made
the token invalid.
Otherwise the string starts with a valid token; nextTokPtr will be
set to point to the character following the end of that token.
Each data character counts as a single token, but adjacent data
characters may be returned together. Similarly for characters in
the prolog outside literals, comments and processing instructions.
*/
#define XmlTok(enc, state, ptr, end, nextTokPtr) \
(((enc)->scanners[state])(enc, ptr, end, nextTokPtr))
#define XmlPrologTok(enc, ptr, end, nextTokPtr) \
XmlTok(enc, XML_PROLOG_STATE, ptr, end, nextTokPtr)
#define XmlContentTok(enc, ptr, end, nextTokPtr) \
XmlTok(enc, XML_CONTENT_STATE, ptr, end, nextTokPtr)
#define XmlCdataSectionTok(enc, ptr, end, nextTokPtr) \
XmlTok(enc, XML_CDATA_SECTION_STATE, ptr, end, nextTokPtr)
#ifdef XML_DTD
#define XmlIgnoreSectionTok(enc, ptr, end, nextTokPtr) \
XmlTok(enc, XML_IGNORE_SECTION_STATE, ptr, end, nextTokPtr)
#endif /* XML_DTD */
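/* Illustration only: the XmlTok family of macros dispatches through a
   per-encoding table of scanner functions indexed by tokenizer state.
   The sketch below reproduces that shape with toy types so that it
   compiles on its own. Everything prefixed toy_ or Toy is hypothetical,
   and the toy scanner handles plain character data only. */

```c
#include <assert.h>

enum { TOY_PROLOG_STATE = 0, TOY_CONTENT_STATE = 1, TOY_N_STATES = 2 };

#define TOY_TOK_NONE (-4)  /* same value as XML_TOK_NONE */
#define TOY_TOK_INVALID 0  /* same value as XML_TOK_INVALID */
#define TOY_TOK_DATA 6     /* same value as XML_TOK_DATA_CHARS */

struct toy_encoding;
typedef int (*toy_scanner)(const struct toy_encoding *, const char *,
                           const char *, const char **);
struct toy_encoding {
  toy_scanner scanners[TOY_N_STATES];
};

/* Same shape as XmlTok: look up the scanner for the state and call it. */
#define ToyTok(enc, state, ptr, end, nextTokPtr) \
  (((enc)->scanners[state])(enc, ptr, end, nextTokPtr))

/* Returns a run of data characters up to the next '<' or the end. */
static int toy_content_tok(const struct toy_encoding *enc, const char *ptr,
                           const char *end, const char **next) {
  (void)enc;
  assert(next != 0);
  if (ptr == end)
    return TOY_TOK_NONE; /* *next is left untouched, as documented above */
  if (*ptr == '<') {
    *next = ptr; /* points at the character this toy scanner rejects */
    return TOY_TOK_INVALID;
  }
  while (ptr != end && *ptr != '<')
    ++ptr;
  *next = ptr; /* first character after the token */
  return TOY_TOK_DATA;
}

static const struct toy_encoding toy_enc = {
  { toy_content_tok, toy_content_tok }
};
```

/* Scanning "abc<d" in content state returns TOY_TOK_DATA with the next
   pointer at the '<'; an empty range returns TOY_TOK_NONE. */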
/* This is used for performing a 2nd-level tokenization on the content
of a literal that has already been returned by XmlTok.
*/
#define XmlLiteralTok(enc, literalType, ptr, end, nextTokPtr) \
(((enc)->literalScanners[literalType])(enc, ptr, end, nextTokPtr))
#define XmlAttributeValueTok(enc, ptr, end, nextTokPtr) \
XmlLiteralTok(enc, XML_ATTRIBUTE_VALUE_LITERAL, ptr, end, nextTokPtr)
#define XmlEntityValueTok(enc, ptr, end, nextTokPtr) \
XmlLiteralTok(enc, XML_ENTITY_VALUE_LITERAL, ptr, end, nextTokPtr)
#define XmlSameName(enc, ptr1, ptr2) (((enc)->sameName)(enc, ptr1, ptr2))
#define XmlNameMatchesAscii(enc, ptr1, end1, ptr2) \
(((enc)->nameMatchesAscii)(enc, ptr1, end1, ptr2))
#define XmlNameLength(enc, ptr) \
(((enc)->nameLength)(enc, ptr))
#define XmlSkipS(enc, ptr) \
(((enc)->skipS)(enc, ptr))
#define XmlGetAttributes(enc, ptr, attsMax, atts) \
(((enc)->getAtts)(enc, ptr, attsMax, atts))
#define XmlCharRefNumber(enc, ptr) \
(((enc)->charRefNumber)(enc, ptr))
#define XmlPredefinedEntityName(enc, ptr, end) \
(((enc)->predefinedEntityName)(enc, ptr, end))
#define XmlUpdatePosition(enc, ptr, end, pos) \
(((enc)->updatePosition)(enc, ptr, end, pos))
#define XmlIsPublicId(enc, ptr, end, badPtr) \
(((enc)->isPublicId)(enc, ptr, end, badPtr))
#define XmlUtf8Convert(enc, fromP, fromLim, toP, toLim) \
(((enc)->utf8Convert)(enc, fromP, fromLim, toP, toLim))
#define XmlUtf16Convert(enc, fromP, fromLim, toP, toLim) \
(((enc)->utf16Convert)(enc, fromP, fromLim, toP, toLim))
typedef struct {
ENCODING initEnc;
const ENCODING **encPtr;
} INIT_ENCODING;
int XmlParseXmlDecl(int isGeneralTextEntity,
const ENCODING *enc,
const char *ptr,
const char *end,
const char **badPtr,
const char **versionPtr,
const char **versionEndPtr,
const char **encodingNamePtr,
const ENCODING **namedEncodingPtr,
int *standalonePtr);
int XmlInitEncoding(INIT_ENCODING *, const ENCODING **, const char *name);
const ENCODING *XmlGetUtf8InternalEncoding(void);
const ENCODING *XmlGetUtf16InternalEncoding(void);
int FASTCALL XmlUtf8Encode(int charNumber, char *buf);
int FASTCALL XmlUtf16Encode(int charNumber, unsigned short *buf);
int XmlSizeOfUnknownEncoding(void);
typedef int (XMLCALL *CONVERTER) (void *userData, const char *p);
ENCODING *
XmlInitUnknownEncoding(void *mem,
int *table,
CONVERTER convert,
void *userData);
int XmlParseXmlDeclNS(int isGeneralTextEntity,
const ENCODING *enc,
const char *ptr,
const char *end,
const char **badPtr,
const char **versionPtr,
const char **versionEndPtr,
const char **encodingNamePtr,
const ENCODING **namedEncodingPtr,
int *standalonePtr);
int XmlInitEncodingNS(INIT_ENCODING *, const ENCODING **, const char *name);
const ENCODING *XmlGetUtf8InternalEncodingNS(void);
const ENCODING *XmlGetUtf16InternalEncodingNS(void);
ENCODING *
XmlInitUnknownEncodingNS(void *mem,
int *table,
CONVERTER convert,
void *userData);
#ifdef __cplusplus
}
#endif
#endif /* not XmlTok_INCLUDED */
hexpat-0.20.13/cbits/expat.h 0000644 0000000 0000000 00000122207 13122604047 013735 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
#ifndef Expat_INCLUDED
#define Expat_INCLUDED 1
#ifdef __VMS
/* 0 1 2 3 0 1 2 3
1234567890123456789012345678901 1234567890123456789012345678901 */
#define XML_SetProcessingInstructionHandler XML_SetProcessingInstrHandler
#define XML_SetUnparsedEntityDeclHandler XML_SetUnparsedEntDeclHandler
#define XML_SetStartNamespaceDeclHandler XML_SetStartNamespcDeclHandler
#define XML_SetExternalEntityRefHandlerArg XML_SetExternalEntRefHandlerArg
#endif
#include <stdlib.h>
#include "expat_external.h"
#ifdef __cplusplus
extern "C" {
#endif
struct XML_ParserStruct;
typedef struct XML_ParserStruct *XML_Parser;
/* Should this be defined using stdbool.h when C99 is available? */
typedef unsigned char XML_Bool;
#define XML_TRUE ((XML_Bool) 1)
#define XML_FALSE ((XML_Bool) 0)
/* The XML_Status enum gives the possible return values for several
API functions. The preprocessor #defines are included so this
stanza can be added to code that still needs to support older
versions of Expat 1.95.x:
#ifndef XML_STATUS_OK
#define XML_STATUS_OK 1
#define XML_STATUS_ERROR 0
#endif
Otherwise, the #define hackery is quite ugly and would have been
dropped.
*/
enum XML_Status {
XML_STATUS_ERROR = 0,
#define XML_STATUS_ERROR XML_STATUS_ERROR
XML_STATUS_OK = 1,
#define XML_STATUS_OK XML_STATUS_OK
XML_STATUS_SUSPENDED = 2
#define XML_STATUS_SUSPENDED XML_STATUS_SUSPENDED
};
enum XML_Error {
XML_ERROR_NONE,
XML_ERROR_NO_MEMORY,
XML_ERROR_SYNTAX,
XML_ERROR_NO_ELEMENTS,
XML_ERROR_INVALID_TOKEN,
XML_ERROR_UNCLOSED_TOKEN,
XML_ERROR_PARTIAL_CHAR,
XML_ERROR_TAG_MISMATCH,
XML_ERROR_DUPLICATE_ATTRIBUTE,
XML_ERROR_JUNK_AFTER_DOC_ELEMENT,
XML_ERROR_PARAM_ENTITY_REF,
XML_ERROR_UNDEFINED_ENTITY,
XML_ERROR_RECURSIVE_ENTITY_REF,
XML_ERROR_ASYNC_ENTITY,
XML_ERROR_BAD_CHAR_REF,
XML_ERROR_BINARY_ENTITY_REF,
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF,
XML_ERROR_MISPLACED_XML_PI,
XML_ERROR_UNKNOWN_ENCODING,
XML_ERROR_INCORRECT_ENCODING,
XML_ERROR_UNCLOSED_CDATA_SECTION,
XML_ERROR_EXTERNAL_ENTITY_HANDLING,
XML_ERROR_NOT_STANDALONE,
XML_ERROR_UNEXPECTED_STATE,
XML_ERROR_ENTITY_DECLARED_IN_PE,
XML_ERROR_FEATURE_REQUIRES_XML_DTD,
XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING,
/* Added in 1.95.7. */
XML_ERROR_UNBOUND_PREFIX,
/* Added in 1.95.8. */
XML_ERROR_UNDECLARING_PREFIX,
XML_ERROR_INCOMPLETE_PE,
XML_ERROR_XML_DECL,
XML_ERROR_TEXT_DECL,
XML_ERROR_PUBLICID,
XML_ERROR_SUSPENDED,
XML_ERROR_NOT_SUSPENDED,
XML_ERROR_ABORTED,
XML_ERROR_FINISHED,
XML_ERROR_SUSPEND_PE,
/* Added in 2.0. */
XML_ERROR_RESERVED_PREFIX_XML,
XML_ERROR_RESERVED_PREFIX_XMLNS,
XML_ERROR_RESERVED_NAMESPACE_URI,
/* Added in 2.2.1. */
XML_ERROR_INVALID_ARGUMENT
};
enum XML_Content_Type {
XML_CTYPE_EMPTY = 1,
XML_CTYPE_ANY,
XML_CTYPE_MIXED,
XML_CTYPE_NAME,
XML_CTYPE_CHOICE,
XML_CTYPE_SEQ
};
enum XML_Content_Quant {
XML_CQUANT_NONE,
XML_CQUANT_OPT,
XML_CQUANT_REP,
XML_CQUANT_PLUS
};
/* If type == XML_CTYPE_EMPTY or XML_CTYPE_ANY, then quant will be
XML_CQUANT_NONE, and the other fields will be zero or NULL.
If type == XML_CTYPE_MIXED, then quant will be NONE or REP and
numchildren will contain number of elements that may be mixed in
and children point to an array of XML_Content cells that will be
all of XML_CTYPE_NAME type with no quantification.
If type == XML_CTYPE_NAME, then the name points to the name, and
the numchildren field will be zero and children will be NULL. The
quant field indicates any quantifiers placed on the name.
CHOICE and SEQ will have name NULL, the number of children in
numchildren and children will point, recursively, to an array
of XML_Content cells.
The EMPTY, ANY, and MIXED types will only occur at top level.
*/
typedef struct XML_cp XML_Content;
struct XML_cp {
enum XML_Content_Type type;
enum XML_Content_Quant quant;
XML_Char * name;
unsigned int numchildren;
XML_Content * children;
};
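/* Illustration only: the content model is a tree, so it is usually
   processed recursively. The sketch below counts the element names in a
   model. It uses local mirror types (toy_ctype, toy_content) so that it
   compiles without this header; these names are hypothetical and the
   quant field is left out for brevity. */

```c
#include <assert.h>
#include <stddef.h>

enum toy_ctype {
  TOY_EMPTY = 1, TOY_ANY, TOY_MIXED, TOY_NAME, TOY_CHOICE, TOY_SEQ
};

struct toy_content {
  enum toy_ctype type;
  const char *name;              /* non-NULL only for TOY_NAME */
  unsigned int numchildren;
  struct toy_content *children;  /* array of numchildren cells */
};

/* Counts the TOY_NAME cells reachable from m. */
static unsigned int count_names(const struct toy_content *m) {
  unsigned int i, n = 0;
  assert(m != NULL);
  if (m->type == TOY_NAME)
    return 1; /* a name cell has no children */
  for (i = 0; i < m->numchildren; ++i)
    n += count_names(&m->children[i]);
  return n;
}
```

/* For a model shaped like (a, (b | c)) the walk visits one SEQ cell, one
   CHOICE cell and three NAME cells, so count_names returns 3. */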
/* This is called for an element declaration. See above for
description of the model argument. It's the caller's responsibility
to free model when finished with it.
*/
typedef void (XMLCALL *XML_ElementDeclHandler) (void *userData,
const XML_Char *name,
XML_Content *model);
XMLPARSEAPI(void)
XML_SetElementDeclHandler(XML_Parser parser,
XML_ElementDeclHandler eldecl);
/* The Attlist declaration handler is called for *each* attribute. So
a single Attlist declaration with multiple attributes declared will
generate multiple calls to this handler. The "default" parameter
may be NULL in the case of the "#IMPLIED" or "#REQUIRED"
keyword. The "isrequired" parameter will be true and the default
value will be NULL in the case of "#REQUIRED". If "isrequired" is
true and default is non-NULL, then this is a "#FIXED" default.
*/
typedef void (XMLCALL *XML_AttlistDeclHandler) (
void *userData,
const XML_Char *elname,
const XML_Char *attname,
const XML_Char *att_type,
const XML_Char *dflt,
int isrequired);
XMLPARSEAPI(void)
XML_SetAttlistDeclHandler(XML_Parser parser,
XML_AttlistDeclHandler attdecl);
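/* Illustration only: how a handler might classify the default of an
   attribute from the dflt and isrequired arguments described above. The
   enum and function names are hypothetical. The ATT_DEFAULT case
   (non-NULL default, not required) is inferred as an ordinary default
   value; the comment above does not spell it out. */

```c
#include <assert.h>
#include <stddef.h>

enum att_default_kind { ATT_IMPLIED, ATT_REQUIRED, ATT_FIXED, ATT_DEFAULT };

/* Maps the (dflt, isrequired) pair onto the four declaration forms. */
static enum att_default_kind classify_att_default(const char *dflt,
                                                  int isrequired) {
  if (dflt == NULL)
    return isrequired ? ATT_REQUIRED : ATT_IMPLIED;
  return isrequired ? ATT_FIXED : ATT_DEFAULT;
}
```

/* This assumes XML_Char is plain char; a build with a wider XML_Char
   would need the matching pointer type. */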
/* The XML declaration handler is called for *both* XML declarations
and text declarations. The way to distinguish is that the version
parameter will be NULL for text declarations. The encoding
parameter may be NULL for XML declarations. The standalone
parameter will be -1, 0, or 1 indicating respectively that there
was no standalone parameter in the declaration, that it was given
as no, or that it was given as yes.
*/
typedef void (XMLCALL *XML_XmlDeclHandler) (void *userData,
const XML_Char *version,
const XML_Char *encoding,
int standalone);
XMLPARSEAPI(void)
XML_SetXmlDeclHandler(XML_Parser parser,
XML_XmlDeclHandler xmldecl);
typedef struct {
void *(*malloc_fcn)(size_t size);
void *(*realloc_fcn)(void *ptr, size_t size);
void (*free_fcn)(void *ptr);
} XML_Memory_Handling_Suite;
/* Constructs a new parser; encoding is the encoding specified by the
external protocol or NULL if there is none specified.
*/
XMLPARSEAPI(XML_Parser)
XML_ParserCreate(const XML_Char *encoding);
/* Constructs a new parser and namespace processor. Element type
names and attribute names that belong to a namespace will be
expanded; unprefixed attribute names are never expanded; unprefixed
element type names are expanded only if there is a default
namespace. The expanded name is the concatenation of the namespace
URI, the namespace separator character, and the local part of the
name. If the namespace separator is '\0' then the namespace URI
and the local part will be concatenated without any separator.
It is a programming error to use the separator '\0' with namespace
triplets (see XML_SetReturnNSTriplet).
*/
XMLPARSEAPI(XML_Parser)
XML_ParserCreateNS(const XML_Char *encoding, XML_Char namespaceSeparator);
/* Constructs a new parser using the memory management suite referred to
by memsuite. If memsuite is NULL, then use the standard library memory
suite. If namespaceSeparator is non-NULL it creates a parser with
namespace processing as described above. The character pointed at
will serve as the namespace separator.
All further memory operations used for the created parser will come from
the given suite.
*/
XMLPARSEAPI(XML_Parser)
XML_ParserCreate_MM(const XML_Char *encoding,
const XML_Memory_Handling_Suite *memsuite,
const XML_Char *namespaceSeparator);
/* Prepare a parser object to be re-used. This is particularly
valuable when memory allocation overhead is disproportionately high,
such as when a large number of small documents need to be parsed.
All handlers are cleared from the parser, except for the
unknownEncodingHandler. The parser's external state is re-initialized
except for the values of ns and ns_triplets.
Added in Expat 1.95.3.
*/
XMLPARSEAPI(XML_Bool)
XML_ParserReset(XML_Parser parser, const XML_Char *encoding);
/* atts is an array of name/value pairs, terminated by 0;
names and values are 0 terminated.
*/
typedef void (XMLCALL *XML_StartElementHandler) (void *userData,
const XML_Char *name,
const XML_Char **atts);
typedef void (XMLCALL *XML_EndElementHandler) (void *userData,
const XML_Char *name);
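/* Illustration only: walking the atts array passed to a start element
   handler. Names sit at even indices, values at odd indices, and a 0
   entry ends the array. The helper name find_att is hypothetical and the
   sketch assumes XML_Char is plain char. */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of the attribute called name, or NULL if absent. */
static const char *find_att(const char **atts, const char *name) {
  size_t i;
  assert(atts != NULL && name != NULL);
  for (i = 0; atts[i] != NULL; i += 2) {
    if (strcmp(atts[i], name) == 0)
      return atts[i + 1]; /* the value follows its name */
  }
  return NULL;
}
```

/* The returned pointer refers to storage owned by the caller of the
   handler, so a real handler would copy the value if it needs to keep
   it after returning. */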
/* s is not 0 terminated. */
typedef void (XMLCALL *XML_CharacterDataHandler) (void *userData,
const XML_Char *s,
int len);
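/* Illustration only: because the character data is not 0 terminated, a
   handler has to rely on len. The sketch below appends each chunk to a
   fixed-size buffer and keeps that buffer terminated itself. The struct
   text_buf and the function collect_text are hypothetical, and XML_Char
   is assumed to be plain char. */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct text_buf {
  char data[256];
  size_t used; /* bytes stored, not counting the terminator */
};

/* Has the parameter shape of a character data handler: copies at most
 * len bytes and silently truncates once the buffer is full. */
static void collect_text(void *userData, const char *s, int len) {
  struct text_buf *tb = (struct text_buf *)userData;
  size_t room, n;
  assert(tb != NULL && s != NULL && len >= 0);
  room = sizeof tb->data - 1 - tb->used; /* keep one byte for '\0' */
  n = (size_t)len < room ? (size_t)len : room;
  memcpy(tb->data + tb->used, s, n);
  tb->used += n;
  tb->data[tb->used] = '\0';
}
```

/* A parser may deliver one run of text in several calls, so appending
   rather than overwriting is the safer default. */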
/* target and data are 0 terminated */
typedef void (XMLCALL *XML_ProcessingInstructionHandler) (
void *userData,
const XML_Char *target,
const XML_Char *data);
/* data is 0 terminated */
typedef void (XMLCALL *XML_CommentHandler) (void *userData,
const XML_Char *data);
typedef void (XMLCALL *XML_StartCdataSectionHandler) (void *userData);
typedef void (XMLCALL *XML_EndCdataSectionHandler) (void *userData);
/* This is called for any characters in the XML document for which
there is no applicable handler. This includes both characters that
are part of markup which is of a kind that is not reported
(comments, markup declarations), or characters that are part of a
construct which could be reported but for which no handler has been
supplied. The characters are passed exactly as they were in the XML
document except that they will be encoded in UTF-8 or UTF-16.
Line boundaries are not normalized. Note that a byte order mark
character is not passed to the default handler. There are no
guarantees about how characters are divided between calls to the
default handler: for example, a comment might be split between
multiple calls.
*/
typedef void (XMLCALL *XML_DefaultHandler) (void *userData,
const XML_Char *s,
int len);
/* This is called for the start of the DOCTYPE declaration, before
any DTD or internal subset is parsed.
*/
typedef void (XMLCALL *XML_StartDoctypeDeclHandler) (
void *userData,
const XML_Char *doctypeName,
const XML_Char *sysid,
const XML_Char *pubid,
int has_internal_subset);
/* This is called for the end of the DOCTYPE declaration when the
closing > is encountered, but after processing any external
subset.
*/
typedef void (XMLCALL *XML_EndDoctypeDeclHandler)(void *userData);
/* This is called for entity declarations. The is_parameter_entity
argument will be non-zero if the entity is a parameter entity, zero
otherwise.
For internal entities (<!ENTITY foo "bar">), value will
be non-NULL and systemId, publicID, and notationName will be NULL.
The value string is NOT nul-terminated; the length is provided in
the value_length argument. Since it is legal to have zero-length
values, do not use this argument to test for internal entities.
For external entities, value will be NULL and systemId will be
non-NULL. The publicId argument will be NULL unless a public
identifier was provided. The notationName argument will have a
non-NULL value only for unparsed entity declarations.
Note that is_parameter_entity can't be changed to XML_Bool, since
that would break binary compatibility.
*/
typedef void (XMLCALL *XML_EntityDeclHandler) (
void *userData,
const XML_Char *entityName,
int is_parameter_entity,
const XML_Char *value,
int value_length,
const XML_Char *base,
const XML_Char *systemId,
const XML_Char *publicId,
const XML_Char *notationName);
XMLPARSEAPI(void)
XML_SetEntityDeclHandler(XML_Parser parser,
XML_EntityDeclHandler handler);
/* OBSOLETE -- OBSOLETE -- OBSOLETE
This handler has been superseded by the EntityDeclHandler above.
It is provided here for backward compatibility.
This is called for a declaration of an unparsed (NDATA) entity.
The base argument is whatever was set by XML_SetBase. The
entityName, systemId and notationName arguments will never be
NULL. The other arguments may be.
*/
typedef void (XMLCALL *XML_UnparsedEntityDeclHandler) (
void *userData,
const XML_Char *entityName,
const XML_Char *base,
const XML_Char *systemId,
const XML_Char *publicId,
const XML_Char *notationName);
/* This is called for a declaration of notation. The base argument is
whatever was set by XML_SetBase. The notationName will never be
NULL. The other arguments can be.
*/
typedef void (XMLCALL *XML_NotationDeclHandler) (
void *userData,
const XML_Char *notationName,
const XML_Char *base,
const XML_Char *systemId,
const XML_Char *publicId);
/* When namespace processing is enabled, these are called once for
each namespace declaration. The call to the start and end element
handlers occur between the calls to the start and end namespace
declaration handlers. For an xmlns attribute, prefix will be
NULL. For an xmlns="" attribute, uri will be NULL.
*/
typedef void (XMLCALL *XML_StartNamespaceDeclHandler) (
void *userData,
const XML_Char *prefix,
const XML_Char *uri);
typedef void (XMLCALL *XML_EndNamespaceDeclHandler) (
void *userData,
const XML_Char *prefix);
/* This is called if the document is not standalone, that is, it has an
external subset or a reference to a parameter entity, but does not
have standalone="yes". If this handler returns XML_STATUS_ERROR,
then processing will not continue, and the parser will return an
XML_ERROR_NOT_STANDALONE error.
If parameter entity parsing is enabled, then in addition to the
conditions above this handler will only be called if the referenced
entity was actually read.
*/
typedef int (XMLCALL *XML_NotStandaloneHandler) (void *userData);
/* This is called for a reference to an external parsed general
entity. The referenced entity is not automatically parsed. The
application can parse it immediately or later using
XML_ExternalEntityParserCreate.
The parser argument is the parser parsing the entity containing the
reference; it can be passed as the parser argument to
XML_ExternalEntityParserCreate. The systemId argument is the
system identifier as specified in the entity declaration; it will
not be NULL.
The base argument is the system identifier that should be used as
the base for resolving systemId if systemId was relative; this is
set by XML_SetBase; it may be NULL.
The publicId argument is the public identifier as specified in the
entity declaration, or NULL if none was specified; the whitespace
in the public identifier will have been normalized as required by
the XML spec.
The context argument specifies the parsing context in the format
expected by the context argument to XML_ExternalEntityParserCreate;
context is valid only until the handler returns, so if the
referenced entity is to be parsed later, it must be copied.
context is NULL only when the entity is a parameter entity.
The handler should return XML_STATUS_ERROR if processing should not
continue because of a fatal error in the handling of the external
entity. In this case the calling parser will return an
XML_ERROR_EXTERNAL_ENTITY_HANDLING error.
Note that unlike other handlers the first argument is the parser,
not userData.
*/
typedef int (XMLCALL *XML_ExternalEntityRefHandler) (
XML_Parser parser,
const XML_Char *context,
const XML_Char *base,
const XML_Char *systemId,
const XML_Char *publicId);
/* This is called in two situations:
1) An entity reference is encountered for which no declaration
has been read *and* this is not an error.
2) An internal entity reference is read, but not expanded, because
XML_SetDefaultHandler has been called.
Note: skipped parameter entities in declarations and skipped general
entities in attribute values cannot be reported, because
the event would be out of sync with the reporting of the
declarations or attribute values
*/
typedef void (XMLCALL *XML_SkippedEntityHandler) (
void *userData,
const XML_Char *entityName,
int is_parameter_entity);
/* This structure is filled in by the XML_UnknownEncodingHandler to
provide information to the parser about encodings that are unknown
to the parser.
The map[b] member gives information about byte sequences whose
first byte is b.
If map[b] is c where c is >= 0, then b by itself encodes the
Unicode scalar value c.
If map[b] is -1, then the byte sequence is malformed.
If map[b] is -n, where n >= 2, then b is the first byte of an
n-byte sequence that encodes a single Unicode scalar value.
The data member will be passed as the first argument to the convert
function.
The convert function is used to convert multibyte sequences; s will
point to an n-byte sequence where map[(unsigned char)*s] == -n. The
convert function must return the Unicode scalar value represented
by this byte sequence or -1 if the byte sequence is malformed.
The convert function may be NULL if the encoding is a single-byte
encoding, that is if map[b] >= -1 for all bytes b.
When the parser is finished with the encoding, then if release is
not NULL, it will call release passing it the data member; once
release has been called, the convert function will not be called
again.
Expat places certain restrictions on the encodings that are supported
using this mechanism.
1. Every ASCII character that can appear in a well-formed XML document,
other than the characters
$@\^`{}~
must be represented by a single byte, and that byte must be the
same byte that represents that character in ASCII.
2. No character may require more than 4 bytes to encode.
3. All characters encoded must have Unicode scalar values <=
0xFFFF, (i.e., characters that would be encoded by surrogates in
UTF-16 are not allowed). Note that this restriction doesn't
apply to the built-in support for UTF-8 and UTF-16.
4. No Unicode character may be encoded by more than one distinct
sequence of bytes.
*/
typedef struct {
int map[256];
void *data;
int (XMLCALL *convert)(void *data, const char *s);
void (XMLCALL *release)(void *data);
} XML_Encoding;
/* This is called for an encoding that is unknown to the parser.
The encodingHandlerData argument is that which was passed as the
second argument to XML_SetUnknownEncodingHandler.
The name argument gives the name of the encoding as specified in
the encoding declaration.
If the callback can provide information about the encoding, it must
fill in the XML_Encoding structure, and return XML_STATUS_OK.
Otherwise it must return XML_STATUS_ERROR.
If info does not describe a suitable encoding, then the parser will
return an XML_UNKNOWN_ENCODING error.
*/
typedef int (XMLCALL *XML_UnknownEncodingHandler) (
void *encodingHandlerData,
const XML_Char *name,
XML_Encoding *info);
XMLPARSEAPI(void)
XML_SetElementHandler(XML_Parser parser,
XML_StartElementHandler start,
XML_EndElementHandler end);
XMLPARSEAPI(void)
XML_SetStartElementHandler(XML_Parser parser,
XML_StartElementHandler handler);
XMLPARSEAPI(void)
XML_SetEndElementHandler(XML_Parser parser,
XML_EndElementHandler handler);
XMLPARSEAPI(void)
XML_SetCharacterDataHandler(XML_Parser parser,
XML_CharacterDataHandler handler);
XMLPARSEAPI(void)
XML_SetProcessingInstructionHandler(XML_Parser parser,
XML_ProcessingInstructionHandler handler);
XMLPARSEAPI(void)
XML_SetCommentHandler(XML_Parser parser,
XML_CommentHandler handler);
XMLPARSEAPI(void)
XML_SetCdataSectionHandler(XML_Parser parser,
XML_StartCdataSectionHandler start,
XML_EndCdataSectionHandler end);
XMLPARSEAPI(void)
XML_SetStartCdataSectionHandler(XML_Parser parser,
XML_StartCdataSectionHandler start);
XMLPARSEAPI(void)
XML_SetEndCdataSectionHandler(XML_Parser parser,
XML_EndCdataSectionHandler end);
/* This sets the default handler and also inhibits expansion of
internal entities. These entity references will be passed to the
default handler, or to the skipped entity handler, if one is set.
*/
XMLPARSEAPI(void)
XML_SetDefaultHandler(XML_Parser parser,
XML_DefaultHandler handler);
/* This sets the default handler but does not inhibit expansion of
internal entities. The entity reference will not be passed to the
default handler.
*/
XMLPARSEAPI(void)
XML_SetDefaultHandlerExpand(XML_Parser parser,
XML_DefaultHandler handler);
XMLPARSEAPI(void)
XML_SetDoctypeDeclHandler(XML_Parser parser,
XML_StartDoctypeDeclHandler start,
XML_EndDoctypeDeclHandler end);
XMLPARSEAPI(void)
XML_SetStartDoctypeDeclHandler(XML_Parser parser,
XML_StartDoctypeDeclHandler start);
XMLPARSEAPI(void)
XML_SetEndDoctypeDeclHandler(XML_Parser parser,
XML_EndDoctypeDeclHandler end);
XMLPARSEAPI(void)
XML_SetUnparsedEntityDeclHandler(XML_Parser parser,
XML_UnparsedEntityDeclHandler handler);
XMLPARSEAPI(void)
XML_SetNotationDeclHandler(XML_Parser parser,
XML_NotationDeclHandler handler);
XMLPARSEAPI(void)
XML_SetNamespaceDeclHandler(XML_Parser parser,
XML_StartNamespaceDeclHandler start,
XML_EndNamespaceDeclHandler end);
XMLPARSEAPI(void)
XML_SetStartNamespaceDeclHandler(XML_Parser parser,
XML_StartNamespaceDeclHandler start);
XMLPARSEAPI(void)
XML_SetEndNamespaceDeclHandler(XML_Parser parser,
XML_EndNamespaceDeclHandler end);
XMLPARSEAPI(void)
XML_SetNotStandaloneHandler(XML_Parser parser,
XML_NotStandaloneHandler handler);
XMLPARSEAPI(void)
XML_SetExternalEntityRefHandler(XML_Parser parser,
XML_ExternalEntityRefHandler handler);
/* If a non-NULL value for arg is specified here, then it will be
passed as the first argument to the external entity ref handler
instead of the parser object.
*/
XMLPARSEAPI(void)
XML_SetExternalEntityRefHandlerArg(XML_Parser parser,
void *arg);
XMLPARSEAPI(void)
XML_SetSkippedEntityHandler(XML_Parser parser,
XML_SkippedEntityHandler handler);
XMLPARSEAPI(void)
XML_SetUnknownEncodingHandler(XML_Parser parser,
XML_UnknownEncodingHandler handler,
void *encodingHandlerData);
/* This can be called within a handler for a start element, end
element, processing instruction or character data. It causes the
corresponding markup to be passed to the default handler.
*/
XMLPARSEAPI(void)
XML_DefaultCurrent(XML_Parser parser);
/* If do_nst is non-zero, and namespace processing is in effect, and
a name has a prefix (i.e. an explicit namespace qualifier) then
that name is returned as a triplet in a single string separated by
the separator character specified when the parser was created: URI
+ sep + local_name + sep + prefix.
If do_nst is zero, then namespace information is returned in the
default manner (URI + sep + local_name) whether or not the name
has a prefix.
Note: Calling XML_SetReturnNSTriplet after XML_Parse or
XML_ParseBuffer has no effect.
*/
XMLPARSEAPI(void)
XML_SetReturnNSTriplet(XML_Parser parser, int do_nst);
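/* Example (an illustrative sketch, not part of Expat): splitting a name
of the form URI + sep + local_name + sep + prefix into its parts. The
struct ns_name and the function split_ns_name are hypothetical. The sketch
assumes the separator does not occur inside the URI and that the caller
works on a writable copy, since the names passed to handlers are const.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ns_name {
  char *uri;     // NULL when the name carries no namespace
  char *local;   // always set
  char *prefix;  // NULL unless a triplet was returned
};

// Splits name in place by overwriting each separator with '\0'.
void split_ns_name(char *name, char sep, struct ns_name *out)
{
  char *first;
  char *second;
  assert(name != NULL && out != NULL && sep != '\0');
  out->uri = NULL;
  out->local = name;
  out->prefix = NULL;
  first = strchr(name, sep);
  if (first == NULL)
    return;                 // plain name, no namespace information
  *first = '\0';
  out->uri = name;
  out->local = first + 1;
  second = strchr(out->local, sep);
  if (second != NULL) {     // triplet form: a prefix follows
    *second = '\0';
    out->prefix = second + 1;
  }
}
```

With do_nst set to zero only the first separator is present, so prefix
stays NULL and the same helper handles both forms.
*/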
/* This value is passed as the userData argument to callbacks. */
XMLPARSEAPI(void)
XML_SetUserData(XML_Parser parser, void *userData);
/* Returns the last value set by XML_SetUserData or NULL. */
#define XML_GetUserData(parser) (*(void **)(parser))
/* This is equivalent to supplying an encoding argument to
XML_ParserCreate. On success XML_SetEncoding returns non-zero,
zero otherwise.
Note: Calling XML_SetEncoding after XML_Parse or XML_ParseBuffer
has no effect and returns XML_STATUS_ERROR.
*/
XMLPARSEAPI(enum XML_Status)
XML_SetEncoding(XML_Parser parser, const XML_Char *encoding);
/* If this function is called, then the parser will be passed as the
first argument to callbacks instead of userData. The userData will
still be accessible using XML_GetUserData.
*/
XMLPARSEAPI(void)
XML_UseParserAsHandlerArg(XML_Parser parser);
/* If useDTD == XML_TRUE is passed to this function, then the parser
will assume that there is an external subset, even if none is
specified in the document. In such a case the parser will call the
externalEntityRefHandler with a value of NULL for the systemId
argument (the publicId and context arguments will be NULL as well).
Note: For the purpose of checking WFC: Entity Declared, passing
useDTD == XML_TRUE will make the parser behave as if the document
had a DTD with an external subset.
Note: If this function is called, then this must be done before
the first call to XML_Parse or XML_ParseBuffer, since it will
have no effect after that; in that case it returns
XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING.
Note: If the document does not have a DOCTYPE declaration at all,
then startDoctypeDeclHandler and endDoctypeDeclHandler will not
be called, despite an external subset being parsed.
Note: If XML_DTD is not defined when Expat is compiled, returns
XML_ERROR_FEATURE_REQUIRES_XML_DTD.
Note: If parser == NULL, returns XML_ERROR_INVALID_ARGUMENT.
*/
XMLPARSEAPI(enum XML_Error)
XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD);
/* Sets the base to be used for resolving relative URIs in system
identifiers in declarations. Resolving relative identifiers is
left to the application: this value will be passed through as the
base argument to the XML_ExternalEntityRefHandler,
XML_NotationDeclHandler and XML_UnparsedEntityDeclHandler. The base
argument will be copied. Returns XML_STATUS_ERROR if out of memory,
XML_STATUS_OK otherwise.
*/
XMLPARSEAPI(enum XML_Status)
XML_SetBase(XML_Parser parser, const XML_Char *base);
XMLPARSEAPI(const XML_Char *)
XML_GetBase(XML_Parser parser);
/* Returns the number of the attribute/value pairs passed in the last call
to the XML_StartElementHandler that were specified in the start-tag
rather than defaulted. Each attribute/value pair counts as 2; thus
this corresponds to an index into the atts array passed to the
XML_StartElementHandler. Returns -1 if parser == NULL.
*/
XMLPARSEAPI(int)
XML_GetSpecifiedAttributeCount(XML_Parser parser);
/* Returns the index of the ID attribute passed in the last call to
XML_StartElementHandler, or -1 if there is no ID attribute or
parser == NULL. Each attribute/value pair counts as 2; thus this
corresponds to an index into the atts array passed to the
XML_StartElementHandler.
*/
XMLPARSEAPI(int)
XML_GetIdAttributeIndex(XML_Parser parser);
#ifdef XML_ATTR_INFO
/* Source file byte offsets for the start and end of attribute names and values.
The value indices are exclusive of surrounding quotes; thus in a UTF-8 source
file an attribute value of "blah" will yield:
info->valueEnd - info->valueStart = 4 bytes.
*/
typedef struct {
XML_Index nameStart; /* Offset to beginning of the attribute name. */
XML_Index nameEnd; /* Offset after the attribute name's last byte. */
XML_Index valueStart; /* Offset to beginning of the attribute value. */
XML_Index valueEnd; /* Offset after the attribute value's last byte. */
} XML_AttrInfo;
/* Returns an array of XML_AttrInfo structures for the attribute/value pairs
passed in the last call to the XML_StartElementHandler that were specified
in the start-tag rather than defaulted. Each attribute/value pair counts
as 1; thus the number of entries in the array is
XML_GetSpecifiedAttributeCount(parser) / 2.
*/
XMLPARSEAPI(const XML_AttrInfo *)
XML_GetAttributeInfo(XML_Parser parser);
#endif
/* Parses some input. Returns XML_STATUS_ERROR if a fatal error is
detected. The last call to XML_Parse must have isFinal true; len
may be zero for this call (or any other).
Though the return values for these functions have always been
described as a Boolean value, the implementation, at least for the
1.95.x series, has always returned exactly one of the XML_Status
values.
*/
XMLPARSEAPI(enum XML_Status)
XML_Parse(XML_Parser parser, const char *s, int len, int isFinal);
XMLPARSEAPI(void *)
XML_GetBuffer(XML_Parser parser, int len);
XMLPARSEAPI(enum XML_Status)
XML_ParseBuffer(XML_Parser parser, int len, int isFinal);
/* Stops parsing, causing XML_Parse() or XML_ParseBuffer() to return.
Must be called from within a call-back handler, except when aborting
(resumable = 0) an already suspended parser. Some call-backs may
still follow because they would otherwise get lost. Examples:
- endElementHandler() for empty elements when stopped in
startElementHandler(),
- endNameSpaceDeclHandler() when stopped in endElementHandler(),
and possibly others.
Can be called from most handlers, including DTD related call-backs,
except when parsing an external parameter entity and resumable != 0.
Returns XML_STATUS_OK when successful, XML_STATUS_ERROR otherwise.
Possible error codes:
- XML_ERROR_SUSPENDED: when suspending an already suspended parser.
- XML_ERROR_FINISHED: when the parser has already finished.
- XML_ERROR_SUSPEND_PE: when suspending while parsing an external PE.
When resumable != 0 (true) then parsing is suspended, that is,
XML_Parse() and XML_ParseBuffer() return XML_STATUS_SUSPENDED.
Otherwise, parsing is aborted, that is, XML_Parse() and XML_ParseBuffer()
return XML_STATUS_ERROR with error code XML_ERROR_ABORTED.
*Note*:
This will be applied to the current parser instance only, that is, if
there is a parent parser then it will continue parsing when the
externalEntityRefHandler() returns. It is up to the implementation of
the externalEntityRefHandler() to call XML_StopParser() on the parent
parser (recursively), if one wants to stop parsing altogether.
When suspended, parsing can be resumed by calling XML_ResumeParser().
*/
XMLPARSEAPI(enum XML_Status)
XML_StopParser(XML_Parser parser, XML_Bool resumable);
/* Resumes parsing after it has been suspended with XML_StopParser().
Must not be called from within a handler call-back. Returns same
status codes as XML_Parse() or XML_ParseBuffer().
Additional error code XML_ERROR_NOT_SUSPENDED possible.
*Note*:
This must be called on the most deeply nested child parser instance
first, and on its parent parser only after the child parser has finished,
to be applied recursively until the document entity's parser is restarted.
That is, the parent parser will not resume by itself and it is up to the
application to call XML_ResumeParser() on it at the appropriate moment.
*/
XMLPARSEAPI(enum XML_Status)
XML_ResumeParser(XML_Parser parser);
enum XML_Parsing {
XML_INITIALIZED,
XML_PARSING,
XML_FINISHED,
XML_SUSPENDED
};
typedef struct {
enum XML_Parsing parsing;
XML_Bool finalBuffer;
} XML_ParsingStatus;
/* Returns status of parser with respect to being initialized, parsing,
finished, or suspended and processing the final buffer.
XXX XML_Parse() and XML_ParseBuffer() should return XML_ParsingStatus,
XXX with XML_FINISHED_OK or XML_FINISHED_ERROR replacing XML_FINISHED
*/
XMLPARSEAPI(void)
XML_GetParsingStatus(XML_Parser parser, XML_ParsingStatus *status);
/* Creates an XML_Parser object that can parse an external general
entity; context is a '\0'-terminated string specifying the parse
context; encoding is a '\0'-terminated string giving the name of
the externally specified encoding, or NULL if there is no
externally specified encoding. The context string consists of a
sequence of tokens separated by formfeeds (\f); a token consisting
of a name specifies that the general entity of the name is open; a
token of the form prefix=uri specifies the namespace for a
particular prefix; a token of the form =uri specifies the default
namespace. This can be called at any point after the first call to
an ExternalEntityRefHandler so long as the parser has not yet
been freed. The new parser is completely independent and may
safely be used in a separate thread. The handlers and userData are
initialized from the parser argument. Returns NULL if out of memory.
Otherwise returns a new XML_Parser object.
*/
XMLPARSEAPI(XML_Parser)
XML_ExternalEntityParserCreate(XML_Parser parser,
const XML_Char *context,
const XML_Char *encoding);
enum XML_ParamEntityParsing {
XML_PARAM_ENTITY_PARSING_NEVER,
XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE,
XML_PARAM_ENTITY_PARSING_ALWAYS
};
/* Controls parsing of parameter entities (including the external DTD
subset). If parsing of parameter entities is enabled, then
references to external parameter entities (including the external
DTD subset) will be passed to the handler set with
XML_SetExternalEntityRefHandler. The context passed will be 0.
Unlike external general entities, external parameter entities can
only be parsed synchronously. If the external parameter entity is
to be parsed, it must be parsed during the call to the external
entity ref handler: the complete sequence of
XML_ExternalEntityParserCreate, XML_Parse/XML_ParseBuffer and
XML_ParserFree calls must be made during this call. After
XML_ExternalEntityParserCreate has been called to create the parser
for the external parameter entity (context must be 0 for this
call), it is illegal to make any calls on the old parser until
XML_ParserFree has been called on the newly created parser.
If the library has been compiled without support for parameter
entity parsing (i.e. without XML_DTD being defined), then
XML_SetParamEntityParsing will return 0 if parsing of parameter
entities is requested; otherwise it will return non-zero.
Note: If XML_SetParamEntityParsing is called after XML_Parse or
XML_ParseBuffer, then it has no effect and will always return 0.
Note: If parser == NULL, the function will do nothing and return 0.
*/
XMLPARSEAPI(int)
XML_SetParamEntityParsing(XML_Parser parser,
enum XML_ParamEntityParsing parsing);
/* Sets the hash salt to use for internal hash calculations.
Helps in preventing DoS attacks based on predicting hash
function behavior. This must be called before parsing is started.
Returns 1 if successful, 0 when called after parsing has started.
Note: If parser == NULL, the function will do nothing and return 0.
*/
XMLPARSEAPI(int)
XML_SetHashSalt(XML_Parser parser,
unsigned long hash_salt);
/* If XML_Parse or XML_ParseBuffer have returned XML_STATUS_ERROR, then
XML_GetErrorCode returns information about the error.
*/
XMLPARSEAPI(enum XML_Error)
XML_GetErrorCode(XML_Parser parser);
/* These functions return information about the current parse
location. They may be called from any callback called to report
some parse event; in this case the location is the location of the
first of the sequence of characters that generated the event. When
called from callbacks generated by declarations in the document
prologue, the location identified isn't as neatly defined, but will
be within the relevant markup. When called outside of the callback
functions, the position indicated will be just past the last parse
event (regardless of whether there was an associated callback).
They may also be called after returning from a call to XML_Parse
or XML_ParseBuffer. If the return value is XML_STATUS_ERROR then
the location is the location of the character at which the error
was detected; otherwise the location is the location of the last
parse event, as described above.
Note: XML_GetCurrentLineNumber and XML_GetCurrentColumnNumber
return 0 to indicate an error.
Note: XML_GetCurrentByteIndex returns -1 to indicate an error.
*/
XMLPARSEAPI(XML_Size) XML_GetCurrentLineNumber(XML_Parser parser);
XMLPARSEAPI(XML_Size) XML_GetCurrentColumnNumber(XML_Parser parser);
XMLPARSEAPI(XML_Index) XML_GetCurrentByteIndex(XML_Parser parser);
/* Return the number of bytes in the current event.
Returns 0 if the event is in an internal entity.
*/
XMLPARSEAPI(int)
XML_GetCurrentByteCount(XML_Parser parser);
/* If XML_CONTEXT_BYTES is defined, returns the input buffer, sets
the integer pointed to by offset to the offset within this buffer
of the current parse position, and sets the integer pointed to by size
to the size of this buffer (the number of input bytes). Otherwise
returns a NULL pointer. Also returns a NULL pointer if a parse isn't
active.
NOTE: The character pointer returned should not be used outside
the handler that makes the call.
*/
XMLPARSEAPI(const char *)
XML_GetInputContext(XML_Parser parser,
int *offset,
int *size);
/* For backwards compatibility with previous versions. */
#define XML_GetErrorLineNumber XML_GetCurrentLineNumber
#define XML_GetErrorColumnNumber XML_GetCurrentColumnNumber
#define XML_GetErrorByteIndex XML_GetCurrentByteIndex
/* Frees the content model passed to the element declaration handler */
XMLPARSEAPI(void)
XML_FreeContentModel(XML_Parser parser, XML_Content *model);
/* Exposing the memory handling functions used in Expat */
XMLPARSEAPI(void *)
XML_ATTR_MALLOC
XML_ATTR_ALLOC_SIZE(2)
XML_MemMalloc(XML_Parser parser, size_t size);
XMLPARSEAPI(void *)
XML_ATTR_ALLOC_SIZE(3)
XML_MemRealloc(XML_Parser parser, void *ptr, size_t size);
XMLPARSEAPI(void)
XML_MemFree(XML_Parser parser, void *ptr);
/* Frees memory used by the parser. */
XMLPARSEAPI(void)
XML_ParserFree(XML_Parser parser);
/* Returns a string describing the error. */
XMLPARSEAPI(const XML_LChar *)
XML_ErrorString(enum XML_Error code);
/* Return a string containing the version number of this expat */
XMLPARSEAPI(const XML_LChar *)
XML_ExpatVersion(void);
typedef struct {
int major;
int minor;
int micro;
} XML_Expat_Version;
/* Return an XML_Expat_Version structure containing numeric version
number information for this version of expat.
*/
XMLPARSEAPI(XML_Expat_Version)
XML_ExpatVersionInfo(void);
/* Added in Expat 1.95.5. */
enum XML_FeatureEnum {
XML_FEATURE_END = 0,
XML_FEATURE_UNICODE,
XML_FEATURE_UNICODE_WCHAR_T,
XML_FEATURE_DTD,
XML_FEATURE_CONTEXT_BYTES,
XML_FEATURE_MIN_SIZE,
XML_FEATURE_SIZEOF_XML_CHAR,
XML_FEATURE_SIZEOF_XML_LCHAR,
XML_FEATURE_NS,
XML_FEATURE_LARGE_SIZE,
XML_FEATURE_ATTR_INFO
/* Additional features must be added to the end of this enum. */
};
typedef struct {
enum XML_FeatureEnum feature;
const XML_LChar *name;
long int value;
} XML_Feature;
XMLPARSEAPI(const XML_Feature *)
XML_GetFeatureList(void);
/* Expat follows the semantic versioning convention.
See http://semver.org.
*/
#define XML_MAJOR_VERSION 2
#define XML_MINOR_VERSION 2
#define XML_MICRO_VERSION 1
#ifdef __cplusplus
}
#endif
#endif /* not Expat_INCLUDED */
hexpat-0.20.13/cbits/xmltok.c 0000644 0000000 0000000 00000127440 13122604047 014131 0 ustar 00 0000000 0000000
/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
#include <stddef.h>
#ifdef _WIN32
#include "winconfig.h"
#else
#ifdef HAVE_EXPAT_CONFIG_H
#include <expat_config.h>
#endif
#endif /* ndef _WIN32 */
#include "expat_external.h"
#include "internal.h"
#include "xmltok.h"
#include "nametab.h"
#ifdef XML_DTD
#define IGNORE_SECTION_TOK_VTABLE , PREFIX(ignoreSectionTok)
#else
#define IGNORE_SECTION_TOK_VTABLE /* as nothing */
#endif
#define VTABLE1 \
{ PREFIX(prologTok), PREFIX(contentTok), \
PREFIX(cdataSectionTok) IGNORE_SECTION_TOK_VTABLE }, \
{ PREFIX(attributeValueTok), PREFIX(entityValueTok) }, \
PREFIX(sameName), \
PREFIX(nameMatchesAscii), \
PREFIX(nameLength), \
PREFIX(skipS), \
PREFIX(getAtts), \
PREFIX(charRefNumber), \
PREFIX(predefinedEntityName), \
PREFIX(updatePosition), \
PREFIX(isPublicId)
#define VTABLE VTABLE1, PREFIX(toUtf8), PREFIX(toUtf16)
#define UCS2_GET_NAMING(pages, hi, lo) \
(namingBitmap[(pages[hi] << 3) + ((lo) >> 5)] & (1u << ((lo) & 0x1F)))
/* A 2 byte UTF-8 representation splits the character's 11 bits between
the bottom 5 and 6 bits of the bytes. We need 8 bits to index into
pages, 3 bits to add to that index and 5 bits to generate the mask.
*/
#define UTF8_GET_NAMING2(pages, byte) \
(namingBitmap[((pages)[(((byte)[0]) >> 2) & 7] << 3) \
+ ((((byte)[0]) & 3) << 1) \
+ ((((byte)[1]) >> 5) & 1)] \
& (1u << (((byte)[1]) & 0x1F)))
/* A 3 byte UTF-8 representation splits the character's 16 bits between
the bottom 4, 6 and 6 bits of the bytes. We need 8 bits to index
into pages, 3 bits to add to that index and 5 bits to generate the
mask.
*/
#define UTF8_GET_NAMING3(pages, byte) \
(namingBitmap[((pages)[((((byte)[0]) & 0xF) << 4) \
+ ((((byte)[1]) >> 2) & 0xF)] \
<< 3) \
+ ((((byte)[1]) & 3) << 1) \
+ ((((byte)[2]) >> 5) & 1)] \
& (1u << (((byte)[2]) & 0x1F)))
#define UTF8_GET_NAMING(pages, p, n) \
((n) == 2 \
? UTF8_GET_NAMING2(pages, (const unsigned char *)(p)) \
: ((n) == 3 \
? UTF8_GET_NAMING3(pages, (const unsigned char *)(p)) \
: 0))
/* Detection of invalid UTF-8 sequences is based on Table 3.1B
of Unicode 3.2: http://www.unicode.org/unicode/reports/tr28/
with the additional restriction of not allowing the Unicode
code points 0xFFFF and 0xFFFE (sequences EF,BF,BF and EF,BF,BE).
Implementation details:
(A & 0x80) == 0 means A < 0x80
and
(A & 0xC0) == 0xC0 means A > 0xBF
*/
#define UTF8_INVALID2(p) \
((*p) < 0xC2 || ((p)[1] & 0x80) == 0 || ((p)[1] & 0xC0) == 0xC0)
#define UTF8_INVALID3(p) \
(((p)[2] & 0x80) == 0 \
|| \
((*p) == 0xEF && (p)[1] == 0xBF \
? \
(p)[2] > 0xBD \
: \
((p)[2] & 0xC0) == 0xC0) \
|| \
((*p) == 0xE0 \
? \
(p)[1] < 0xA0 || ((p)[1] & 0xC0) == 0xC0 \
: \
((p)[1] & 0x80) == 0 \
|| \
((*p) == 0xED ? (p)[1] > 0x9F : ((p)[1] & 0xC0) == 0xC0)))
#define UTF8_INVALID4(p) \
(((p)[3] & 0x80) == 0 || ((p)[3] & 0xC0) == 0xC0 \
|| \
((p)[2] & 0x80) == 0 || ((p)[2] & 0xC0) == 0xC0 \
|| \
((*p) == 0xF0 \
? \
(p)[1] < 0x90 || ((p)[1] & 0xC0) == 0xC0 \
: \
((p)[1] & 0x80) == 0 \
|| \
((*p) == 0xF4 ? (p)[1] > 0x8F : ((p)[1] & 0xC0) == 0xC0)))
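/* Example (an illustrative sketch, not part of Expat): the two-byte
check written as a standalone function. The name utf8_2byte_is_valid is
hypothetical. Unlike UTF8_INVALID2, which relies on the caller having
already classified the first byte as a two-byte lead, this sketch also
checks the upper bound of the lead byte so that it is self-contained.

```c
#include <assert.h>
#include <stddef.h>

// Returns non-zero when p[0], p[1] form a well-formed two-byte UTF-8
// sequence: lead byte in 0xC2..0xDF and trail byte in 0x80..0xBF.
int utf8_2byte_is_valid(const unsigned char *p)
{
  assert(p != NULL);
  if (p[0] < 0xC2 || p[0] > 0xDF)
    return 0;                      // 0xC0 and 0xC1 would be overlong
  return (p[1] & 0xC0) == 0x80;    // trail byte must match 10xxxxxx
}
```

The trail test (p[1] & 0xC0) == 0x80 is the negation of the two
conditions in the macro: (A & 0x80) == 0 rejects A < 0x80 and
(A & 0xC0) == 0xC0 rejects A > 0xBF.
*/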
static int PTRFASTCALL
isNever(const ENCODING *UNUSED_P(enc), const char *UNUSED_P(p))
{
return 0;
}
static int PTRFASTCALL
utf8_isName2(const ENCODING *UNUSED_P(enc), const char *p)
{
return UTF8_GET_NAMING2(namePages, (const unsigned char *)p);
}
static int PTRFASTCALL
utf8_isName3(const ENCODING *UNUSED_P(enc), const char *p)
{
return UTF8_GET_NAMING3(namePages, (const unsigned char *)p);
}
#define utf8_isName4 isNever
static int PTRFASTCALL
utf8_isNmstrt2(const ENCODING *UNUSED_P(enc), const char *p)
{
return UTF8_GET_NAMING2(nmstrtPages, (const unsigned char *)p);
}
static int PTRFASTCALL
utf8_isNmstrt3(const ENCODING *UNUSED_P(enc), const char *p)
{
return UTF8_GET_NAMING3(nmstrtPages, (const unsigned char *)p);
}
#define utf8_isNmstrt4 isNever
static int PTRFASTCALL
utf8_isInvalid2(const ENCODING *UNUSED_P(enc), const char *p)
{
return UTF8_INVALID2((const unsigned char *)p);
}
static int PTRFASTCALL
utf8_isInvalid3(const ENCODING *UNUSED_P(enc), const char *p)
{
return UTF8_INVALID3((const unsigned char *)p);
}
static int PTRFASTCALL
utf8_isInvalid4(const ENCODING *UNUSED_P(enc), const char *p)
{
return UTF8_INVALID4((const unsigned char *)p);
}
struct normal_encoding {
ENCODING enc;
unsigned char type[256];
#ifdef XML_MIN_SIZE
int (PTRFASTCALL *byteType)(const ENCODING *, const char *);
int (PTRFASTCALL *isNameMin)(const ENCODING *, const char *);
int (PTRFASTCALL *isNmstrtMin)(const ENCODING *, const char *);
int (PTRFASTCALL *byteToAscii)(const ENCODING *, const char *);
int (PTRCALL *charMatches)(const ENCODING *, const char *, int);
#endif /* XML_MIN_SIZE */
int (PTRFASTCALL *isName2)(const ENCODING *, const char *);
int (PTRFASTCALL *isName3)(const ENCODING *, const char *);
int (PTRFASTCALL *isName4)(const ENCODING *, const char *);
int (PTRFASTCALL *isNmstrt2)(const ENCODING *, const char *);
int (PTRFASTCALL *isNmstrt3)(const ENCODING *, const char *);
int (PTRFASTCALL *isNmstrt4)(const ENCODING *, const char *);
int (PTRFASTCALL *isInvalid2)(const ENCODING *, const char *);
int (PTRFASTCALL *isInvalid3)(const ENCODING *, const char *);
int (PTRFASTCALL *isInvalid4)(const ENCODING *, const char *);
};
#define AS_NORMAL_ENCODING(enc) ((const struct normal_encoding *) (enc))
#ifdef XML_MIN_SIZE
#define STANDARD_VTABLE(E) \
E ## byteType, \
E ## isNameMin, \
E ## isNmstrtMin, \
E ## byteToAscii, \
E ## charMatches,
#else
#define STANDARD_VTABLE(E) /* as nothing */
#endif
#define NORMAL_VTABLE(E) \
E ## isName2, \
E ## isName3, \
E ## isName4, \
E ## isNmstrt2, \
E ## isNmstrt3, \
E ## isNmstrt4, \
E ## isInvalid2, \
E ## isInvalid3, \
E ## isInvalid4
#define NULL_VTABLE \
/* isName2 */ NULL, \
/* isName3 */ NULL, \
/* isName4 */ NULL, \
/* isNmstrt2 */ NULL, \
/* isNmstrt3 */ NULL, \
/* isNmstrt4 */ NULL, \
/* isInvalid2 */ NULL, \
/* isInvalid3 */ NULL, \
/* isInvalid4 */ NULL
static int FASTCALL checkCharRefNumber(int);
#include "xmltok_impl.h"
#include "ascii.h"
#ifdef XML_MIN_SIZE
#define sb_isNameMin isNever
#define sb_isNmstrtMin isNever
#endif
#ifdef XML_MIN_SIZE
#define MINBPC(enc) ((enc)->minBytesPerChar)
#else
/* minimum bytes per character */
#define MINBPC(enc) 1
#endif
#define SB_BYTE_TYPE(enc, p) \
(((struct normal_encoding *)(enc))->type[(unsigned char)*(p)])
#ifdef XML_MIN_SIZE
static int PTRFASTCALL
sb_byteType(const ENCODING *enc, const char *p)
{
return SB_BYTE_TYPE(enc, p);
}
#define BYTE_TYPE(enc, p) \
(AS_NORMAL_ENCODING(enc)->byteType(enc, p))
#else
#define BYTE_TYPE(enc, p) SB_BYTE_TYPE(enc, p)
#endif
#ifdef XML_MIN_SIZE
#define BYTE_TO_ASCII(enc, p) \
(AS_NORMAL_ENCODING(enc)->byteToAscii(enc, p))
static int PTRFASTCALL
sb_byteToAscii(const ENCODING *enc, const char *p)
{
return *p;
}
#else
#define BYTE_TO_ASCII(enc, p) (*(p))
#endif
#define IS_NAME_CHAR(enc, p, n) \
(AS_NORMAL_ENCODING(enc)->isName ## n(enc, p))
#define IS_NMSTRT_CHAR(enc, p, n) \
(AS_NORMAL_ENCODING(enc)->isNmstrt ## n(enc, p))
#define IS_INVALID_CHAR(enc, p, n) \
(AS_NORMAL_ENCODING(enc)->isInvalid ## n(enc, p))
#ifdef XML_MIN_SIZE
#define IS_NAME_CHAR_MINBPC(enc, p) \
(AS_NORMAL_ENCODING(enc)->isNameMin(enc, p))
#define IS_NMSTRT_CHAR_MINBPC(enc, p) \
(AS_NORMAL_ENCODING(enc)->isNmstrtMin(enc, p))
#else
#define IS_NAME_CHAR_MINBPC(enc, p) (0)
#define IS_NMSTRT_CHAR_MINBPC(enc, p) (0)
#endif
#ifdef XML_MIN_SIZE
#define CHAR_MATCHES(enc, p, c) \
(AS_NORMAL_ENCODING(enc)->charMatches(enc, p, c))
static int PTRCALL
sb_charMatches(const ENCODING *enc, const char *p, int c)
{
return *p == c;
}
#else
/* c is an ASCII character */
#define CHAR_MATCHES(enc, p, c) (*(p) == c)
#endif
#define PREFIX(ident) normal_ ## ident
#define XML_TOK_IMPL_C
#include "xmltok_impl.c"
#undef XML_TOK_IMPL_C
#undef MINBPC
#undef BYTE_TYPE
#undef BYTE_TO_ASCII
#undef CHAR_MATCHES
#undef IS_NAME_CHAR
#undef IS_NAME_CHAR_MINBPC
#undef IS_NMSTRT_CHAR
#undef IS_NMSTRT_CHAR_MINBPC
#undef IS_INVALID_CHAR
enum { /* UTF8_cvalN is value of masked first byte of N byte sequence */
UTF8_cval1 = 0x00,
UTF8_cval2 = 0xc0,
UTF8_cval3 = 0xe0,
UTF8_cval4 = 0xf0
};
void
align_limit_to_full_utf8_characters(const char * from, const char ** fromLimRef)
{
const char * fromLim = *fromLimRef;
size_t walked = 0;
for (; fromLim > from; fromLim--, walked++) {
const unsigned char prev = (unsigned char)fromLim[-1];
if ((prev & 0xf8u) == 0xf0u) { /* 4-byte character, led by 0b11110xxx byte */
if (walked + 1 >= 4) {
fromLim += 4 - 1;
break;
} else {
walked = 0;
}
} else if ((prev & 0xf0u) == 0xe0u) { /* 3-byte character, led by 0b1110xxxx byte */
if (walked + 1 >= 3) {
fromLim += 3 - 1;
break;
} else {
walked = 0;
}
} else if ((prev & 0xe0u) == 0xc0u) { /* 2-byte character, led by 0b110xxxxx byte */
if (walked + 1 >= 2) {
fromLim += 2 - 1;
break;
} else {
walked = 0;
}
} else if ((prev & 0x80u) == 0x00u) { /* 1-byte character, matching 0b0xxxxxxx */
break;
}
}
*fromLimRef = fromLim;
}
static enum XML_Convert_Result PTRCALL
utf8_toUtf8(const ENCODING *UNUSED_P(enc),
const char **fromP, const char *fromLim,
char **toP, const char *toLim)
{
char *to;
const char *from;
const char *fromLimInitial = fromLim;
/* Avoid copying partial characters. */
align_limit_to_full_utf8_characters(*fromP, &fromLim);
for (to = *toP, from = *fromP; (from < fromLim) && (to < toLim); from++, to++)
*to = *from;
*fromP = from;
*toP = to;
if (fromLim < fromLimInitial)
return XML_CONVERT_INPUT_INCOMPLETE;
else if ((to == toLim) && (from < fromLim))
return XML_CONVERT_OUTPUT_EXHAUSTED;
else
return XML_CONVERT_COMPLETED;
}
static enum XML_Convert_Result PTRCALL
utf8_toUtf16(const ENCODING *enc,
const char **fromP, const char *fromLim,
unsigned short **toP, const unsigned short *toLim)
{
enum XML_Convert_Result res = XML_CONVERT_COMPLETED;
unsigned short *to = *toP;
const char *from = *fromP;
while (from < fromLim && to < toLim) {
switch (((struct normal_encoding *)enc)->type[(unsigned char)*from]) {
case BT_LEAD2:
if (fromLim - from < 2) {
res = XML_CONVERT_INPUT_INCOMPLETE;
goto after;
}
*to++ = (unsigned short)(((from[0] & 0x1f) << 6) | (from[1] & 0x3f));
from += 2;
break;
case BT_LEAD3:
if (fromLim - from < 3) {
res = XML_CONVERT_INPUT_INCOMPLETE;
goto after;
}
*to++ = (unsigned short)(((from[0] & 0xf) << 12)
| ((from[1] & 0x3f) << 6) | (from[2] & 0x3f));
from += 3;
break;
case BT_LEAD4:
{
unsigned long n;
if (toLim - to < 2) {
res = XML_CONVERT_OUTPUT_EXHAUSTED;
goto after;
}
if (fromLim - from < 4) {
res = XML_CONVERT_INPUT_INCOMPLETE;
goto after;
}
n = ((from[0] & 0x7) << 18) | ((from[1] & 0x3f) << 12)
| ((from[2] & 0x3f) << 6) | (from[3] & 0x3f);
n -= 0x10000;
to[0] = (unsigned short)((n >> 10) | 0xD800);
to[1] = (unsigned short)((n & 0x3FF) | 0xDC00);
to += 2;
from += 4;
}
break;
default:
*to++ = *from++;
break;
}
}
if (from < fromLim)
res = XML_CONVERT_OUTPUT_EXHAUSTED;
after:
*fromP = from;
*toP = to;
return res;
}
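/* Example (an illustrative sketch, not part of Expat): the BT_LEAD4
branch of utf8_toUtf16 in isolation. The name utf8_4byte_to_surrogates
is hypothetical, and the sketch assumes the four input bytes have
already been validated as a well-formed four-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

// Decodes one four-byte UTF-8 sequence and writes the corresponding
// UTF-16 surrogate pair (high surrogate first) to out.
void utf8_4byte_to_surrogates(const unsigned char s[4], unsigned short out[2])
{
  unsigned long n;
  assert(s != NULL && out != NULL);
  n = ((unsigned long)(s[0] & 0x07) << 18)
      | ((unsigned long)(s[1] & 0x3f) << 12)
      | ((unsigned long)(s[2] & 0x3f) << 6)
      | (unsigned long)(s[3] & 0x3f);
  n -= 0x10000UL;                                  // 20 bits remain
  out[0] = (unsigned short)((n >> 10) | 0xD800);   // top 10 bits
  out[1] = (unsigned short)((n & 0x3FF) | 0xDC00); // bottom 10 bits
}
```

For the bytes F0 9F 98 80 (U+1F600) this yields the pair D83D DE00.
Because one input character produces two output units, the caller
needs room for two unsigned shorts before decoding.
*/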
#ifdef XML_NS
static const struct normal_encoding utf8_encoding_ns = {
{ VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 },
{
#include "asciitab.h"
#include "utf8tab.h"
},
STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_)
};
#endif
static const struct normal_encoding utf8_encoding = {
{ VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 },
{
#define BT_COLON BT_NMSTRT
#include "asciitab.h"
#undef BT_COLON
#include "utf8tab.h"
},
STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_)
};
#ifdef XML_NS
static const struct normal_encoding internal_utf8_encoding_ns = {
{ VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 },
{
#include "iasciitab.h"
#include "utf8tab.h"
},
STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_)
};
#endif
static const struct normal_encoding internal_utf8_encoding = {
{ VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 },
{
#define BT_COLON BT_NMSTRT
#include "iasciitab.h"
#undef BT_COLON
#include "utf8tab.h"
},
STANDARD_VTABLE(sb_) NORMAL_VTABLE(utf8_)
};
static enum XML_Convert_Result PTRCALL
latin1_toUtf8(const ENCODING *UNUSED_P(enc),
const char **fromP, const char *fromLim,
char **toP, const char *toLim)
{
for (;;) {
unsigned char c;
if (*fromP == fromLim)
return XML_CONVERT_COMPLETED;
c = (unsigned char)**fromP;
if (c & 0x80) {
if (toLim - *toP < 2)
return XML_CONVERT_OUTPUT_EXHAUSTED;
*(*toP)++ = (char)((c >> 6) | UTF8_cval2);
*(*toP)++ = (char)((c & 0x3f) | 0x80);
(*fromP)++;
}
else {
if (*toP == toLim)
return XML_CONVERT_OUTPUT_EXHAUSTED;
*(*toP)++ = *(*fromP)++;
}
}
}
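/* Example (an illustrative sketch, not part of Expat): the per-byte
step of latin1_toUtf8 as a standalone function. The name
latin1_byte_to_utf8 and its return convention (bytes written, or 0
when the output has too little room) are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

// Encodes one Latin-1 byte as UTF-8 into to, which has room bytes free.
// Returns the number of bytes written (1 or 2), or 0 if room is too small.
size_t latin1_byte_to_utf8(unsigned char c, char *to, size_t room)
{
  assert(to != NULL);
  if (c & 0x80) {
    if (room < 2)
      return 0;
    to[0] = (char)((c >> 6) | 0xC0);    // 110000xx: top two bits of c
    to[1] = (char)((c & 0x3f) | 0x80);  // 10xxxxxx: low six bits of c
    return 2;
  }
  if (room < 1)
    return 0;
  to[0] = (char)c;                      // ASCII passes through unchanged
  return 1;
}
```

For example, the Latin-1 byte 0xE9 becomes the two bytes C3 A9. The
room check comes before any write, which mirrors how latin1_toUtf8
returns XML_CONVERT_OUTPUT_EXHAUSTED without consuming the input byte.
*/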
static enum XML_Convert_Result PTRCALL
latin1_toUtf16(const ENCODING *UNUSED_P(enc),
const char **fromP, const char *fromLim,
unsigned short **toP, const unsigned short *toLim)
{
while (*fromP < fromLim && *toP < toLim)
*(*toP)++ = (unsigned char)*(*fromP)++;
if ((*toP == toLim) && (*fromP < fromLim))
return XML_CONVERT_OUTPUT_EXHAUSTED;
else
return XML_CONVERT_COMPLETED;
}
#ifdef XML_NS
static const struct normal_encoding latin1_encoding_ns = {
{ VTABLE1, latin1_toUtf8, latin1_toUtf16, 1, 0, 0 },
{
#include "asciitab.h"
#include "latin1tab.h"
},
STANDARD_VTABLE(sb_) NULL_VTABLE
};
#endif
static const struct normal_encoding latin1_encoding = {
{ VTABLE1, latin1_toUtf8, latin1_toUtf16, 1, 0, 0 },
{
#define BT_COLON BT_NMSTRT
#include "asciitab.h"
#undef BT_COLON
#include "latin1tab.h"
},
STANDARD_VTABLE(sb_) NULL_VTABLE
};
static enum XML_Convert_Result PTRCALL
ascii_toUtf8(const ENCODING *UNUSED_P(enc),
const char **fromP, const char *fromLim,
char **toP, const char *toLim)
{
while (*fromP < fromLim && *toP < toLim)
*(*toP)++ = *(*fromP)++;
if ((*toP == toLim) && (*fromP < fromLim))
return XML_CONVERT_OUTPUT_EXHAUSTED;
else
return XML_CONVERT_COMPLETED;
}
#ifdef XML_NS
static const struct normal_encoding ascii_encoding_ns = {
{ VTABLE1, ascii_toUtf8, latin1_toUtf16, 1, 1, 0 },
{
#include "asciitab.h"
/* BT_NONXML == 0 */
},
STANDARD_VTABLE(sb_) NULL_VTABLE
};
#endif
static const struct normal_encoding ascii_encoding = {
{ VTABLE1, ascii_toUtf8, latin1_toUtf16, 1, 1, 0 },
{
#define BT_COLON BT_NMSTRT
#include "asciitab.h"
#undef BT_COLON
/* BT_NONXML == 0 */
},
STANDARD_VTABLE(sb_) NULL_VTABLE
};
static int PTRFASTCALL
unicode_byte_type(char hi, char lo)
{
switch ((unsigned char)hi) {
case 0xD8: case 0xD9: case 0xDA: case 0xDB:
return BT_LEAD4;
case 0xDC: case 0xDD: case 0xDE: case 0xDF:
return BT_TRAIL;
case 0xFF:
switch ((unsigned char)lo) {
case 0xFF:
case 0xFE:
return BT_NONXML;
}
break;
}
return BT_NONASCII;
}
#define DEFINE_UTF16_TO_UTF8(E) \
static enum XML_Convert_Result PTRCALL \
E ## toUtf8(const ENCODING *UNUSED_P(enc), \
const char **fromP, const char *fromLim, \
char **toP, const char *toLim) \
{ \
const char *from = *fromP; \
fromLim = from + (((fromLim - from) >> 1) << 1); /* shrink to even */ \
for (; from < fromLim; from += 2) { \
int plane; \
unsigned char lo2; \
unsigned char lo = GET_LO(from); \
unsigned char hi = GET_HI(from); \
switch (hi) { \
case 0: \
if (lo < 0x80) { \
if (*toP == toLim) { \
*fromP = from; \
return XML_CONVERT_OUTPUT_EXHAUSTED; \
} \
*(*toP)++ = lo; \
break; \
} \
/* fall through */ \
case 0x1: case 0x2: case 0x3: \
case 0x4: case 0x5: case 0x6: case 0x7: \
if (toLim - *toP < 2) { \
*fromP = from; \
return XML_CONVERT_OUTPUT_EXHAUSTED; \
} \
*(*toP)++ = ((lo >> 6) | (hi << 2) | UTF8_cval2); \
*(*toP)++ = ((lo & 0x3f) | 0x80); \
break; \
default: \
if (toLim - *toP < 3) { \
*fromP = from; \
return XML_CONVERT_OUTPUT_EXHAUSTED; \
} \
/* 16 bits divided 4, 6, 6 amongst 3 bytes */ \
*(*toP)++ = ((hi >> 4) | UTF8_cval3); \
*(*toP)++ = (((hi & 0xf) << 2) | (lo >> 6) | 0x80); \
*(*toP)++ = ((lo & 0x3f) | 0x80); \
break; \
case 0xD8: case 0xD9: case 0xDA: case 0xDB: \
if (toLim - *toP < 4) { \
*fromP = from; \
return XML_CONVERT_OUTPUT_EXHAUSTED; \
} \
if (fromLim - from < 4) { \
*fromP = from; \
return XML_CONVERT_INPUT_INCOMPLETE; \
} \
plane = (((hi & 0x3) << 2) | ((lo >> 6) & 0x3)) + 1; \
*(*toP)++ = ((plane >> 2) | UTF8_cval4); \
*(*toP)++ = (((lo >> 2) & 0xF) | ((plane & 0x3) << 4) | 0x80); \
from += 2; \
lo2 = GET_LO(from); \
*(*toP)++ = (((lo & 0x3) << 4) \
| ((GET_HI(from) & 0x3) << 2) \
| (lo2 >> 6) \
| 0x80); \
*(*toP)++ = ((lo2 & 0x3f) | 0x80); \
break; \
} \
} \
*fromP = from; \
if (from < fromLim) \
return XML_CONVERT_INPUT_INCOMPLETE; \
else \
return XML_CONVERT_COMPLETED; \
}
#define DEFINE_UTF16_TO_UTF16(E) \
static enum XML_Convert_Result PTRCALL \
E ## toUtf16(const ENCODING *UNUSED_P(enc), \
const char **fromP, const char *fromLim, \
unsigned short **toP, const unsigned short *toLim) \
{ \
enum XML_Convert_Result res = XML_CONVERT_COMPLETED; \
fromLim = *fromP + (((fromLim - *fromP) >> 1) << 1); /* shrink to even */ \
/* Avoid copying first half only of surrogate */ \
if (fromLim - *fromP > ((toLim - *toP) << 1) \
&& (GET_HI(fromLim - 2) & 0xF8) == 0xD8) { \
fromLim -= 2; \
res = XML_CONVERT_INPUT_INCOMPLETE; \
} \
for (; *fromP < fromLim && *toP < toLim; *fromP += 2) \
*(*toP)++ = (GET_HI(*fromP) << 8) | GET_LO(*fromP); \
if ((*toP == toLim) && (*fromP < fromLim)) \
return XML_CONVERT_OUTPUT_EXHAUSTED; \
else \
return res; \
}
#define SET2(ptr, ch) \
(((ptr)[0] = ((ch) & 0xff)), ((ptr)[1] = ((ch) >> 8)))
#define GET_LO(ptr) ((unsigned char)(ptr)[0])
#define GET_HI(ptr) ((unsigned char)(ptr)[1])
DEFINE_UTF16_TO_UTF8(little2_)
DEFINE_UTF16_TO_UTF16(little2_)
#undef SET2
#undef GET_LO
#undef GET_HI
#define SET2(ptr, ch) \
(((ptr)[0] = ((ch) >> 8)), ((ptr)[1] = ((ch) & 0xFF)))
#define GET_LO(ptr) ((unsigned char)(ptr)[1])
#define GET_HI(ptr) ((unsigned char)(ptr)[0])
DEFINE_UTF16_TO_UTF8(big2_)
DEFINE_UTF16_TO_UTF16(big2_)
#undef SET2
#undef GET_LO
#undef GET_HI
#define LITTLE2_BYTE_TYPE(enc, p) \
((p)[1] == 0 \
? ((struct normal_encoding *)(enc))->type[(unsigned char)*(p)] \
: unicode_byte_type((p)[1], (p)[0]))
#define LITTLE2_BYTE_TO_ASCII(enc, p) ((p)[1] == 0 ? (p)[0] : -1)
#define LITTLE2_CHAR_MATCHES(enc, p, c) ((p)[1] == 0 && (p)[0] == c)
#define LITTLE2_IS_NAME_CHAR_MINBPC(enc, p) \
UCS2_GET_NAMING(namePages, (unsigned char)p[1], (unsigned char)p[0])
#define LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p) \
UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[1], (unsigned char)p[0])
#ifdef XML_MIN_SIZE
static int PTRFASTCALL
little2_byteType(const ENCODING *enc, const char *p)
{
return LITTLE2_BYTE_TYPE(enc, p);
}
static int PTRFASTCALL
little2_byteToAscii(const ENCODING *enc, const char *p)
{
return LITTLE2_BYTE_TO_ASCII(enc, p);
}
static int PTRCALL
little2_charMatches(const ENCODING *enc, const char *p, int c)
{
return LITTLE2_CHAR_MATCHES(enc, p, c);
}
static int PTRFASTCALL
little2_isNameMin(const ENCODING *enc, const char *p)
{
return LITTLE2_IS_NAME_CHAR_MINBPC(enc, p);
}
static int PTRFASTCALL
little2_isNmstrtMin(const ENCODING *enc, const char *p)
{
return LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p);
}
#undef VTABLE
#define VTABLE VTABLE1, little2_toUtf8, little2_toUtf16
#else /* not XML_MIN_SIZE */
#undef PREFIX
#define PREFIX(ident) little2_ ## ident
#define MINBPC(enc) 2
/* CHAR_MATCHES is guaranteed to have MINBPC bytes available. */
#define BYTE_TYPE(enc, p) LITTLE2_BYTE_TYPE(enc, p)
#define BYTE_TO_ASCII(enc, p) LITTLE2_BYTE_TO_ASCII(enc, p)
#define CHAR_MATCHES(enc, p, c) LITTLE2_CHAR_MATCHES(enc, p, c)
#define IS_NAME_CHAR(enc, p, n) 0
#define IS_NAME_CHAR_MINBPC(enc, p) LITTLE2_IS_NAME_CHAR_MINBPC(enc, p)
#define IS_NMSTRT_CHAR(enc, p, n) (0)
#define IS_NMSTRT_CHAR_MINBPC(enc, p) LITTLE2_IS_NMSTRT_CHAR_MINBPC(enc, p)
#define XML_TOK_IMPL_C
#include "xmltok_impl.c"
#undef XML_TOK_IMPL_C
#undef MINBPC
#undef BYTE_TYPE
#undef BYTE_TO_ASCII
#undef CHAR_MATCHES
#undef IS_NAME_CHAR
#undef IS_NAME_CHAR_MINBPC
#undef IS_NMSTRT_CHAR
#undef IS_NMSTRT_CHAR_MINBPC
#undef IS_INVALID_CHAR
#endif /* not XML_MIN_SIZE */
#ifdef XML_NS
static const struct normal_encoding little2_encoding_ns = {
{ VTABLE, 2, 0,
#if BYTEORDER == 1234
1
#else
0
#endif
},
{
#include "asciitab.h"
#include "latin1tab.h"
},
STANDARD_VTABLE(little2_) NULL_VTABLE
};
#endif
static const struct normal_encoding little2_encoding = {
{ VTABLE, 2, 0,
#if BYTEORDER == 1234
1
#else
0
#endif
},
{
#define BT_COLON BT_NMSTRT
#include "asciitab.h"
#undef BT_COLON
#include "latin1tab.h"
},
STANDARD_VTABLE(little2_) NULL_VTABLE
};
#if BYTEORDER != 4321
#ifdef XML_NS
static const struct normal_encoding internal_little2_encoding_ns = {
{ VTABLE, 2, 0, 1 },
{
#include "iasciitab.h"
#include "latin1tab.h"
},
STANDARD_VTABLE(little2_) NULL_VTABLE
};
#endif
static const struct normal_encoding internal_little2_encoding = {
{ VTABLE, 2, 0, 1 },
{
#define BT_COLON BT_NMSTRT
#include "iasciitab.h"
#undef BT_COLON
#include "latin1tab.h"
},
STANDARD_VTABLE(little2_) NULL_VTABLE
};
#endif
#define BIG2_BYTE_TYPE(enc, p) \
((p)[0] == 0 \
? ((struct normal_encoding *)(enc))->type[(unsigned char)(p)[1]] \
: unicode_byte_type((p)[0], (p)[1]))
#define BIG2_BYTE_TO_ASCII(enc, p) ((p)[0] == 0 ? (p)[1] : -1)
#define BIG2_CHAR_MATCHES(enc, p, c) ((p)[0] == 0 && (p)[1] == c)
#define BIG2_IS_NAME_CHAR_MINBPC(enc, p) \
UCS2_GET_NAMING(namePages, (unsigned char)p[0], (unsigned char)p[1])
#define BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p) \
UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[0], (unsigned char)p[1])
#ifdef XML_MIN_SIZE
static int PTRFASTCALL
big2_byteType(const ENCODING *enc, const char *p)
{
return BIG2_BYTE_TYPE(enc, p);
}
static int PTRFASTCALL
big2_byteToAscii(const ENCODING *enc, const char *p)
{
return BIG2_BYTE_TO_ASCII(enc, p);
}
static int PTRCALL
big2_charMatches(const ENCODING *enc, const char *p, int c)
{
return BIG2_CHAR_MATCHES(enc, p, c);
}
static int PTRFASTCALL
big2_isNameMin(const ENCODING *enc, const char *p)
{
return BIG2_IS_NAME_CHAR_MINBPC(enc, p);
}
static int PTRFASTCALL
big2_isNmstrtMin(const ENCODING *enc, const char *p)
{
return BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p);
}
#undef VTABLE
#define VTABLE VTABLE1, big2_toUtf8, big2_toUtf16
#else /* not XML_MIN_SIZE */
#undef PREFIX
#define PREFIX(ident) big2_ ## ident
#define MINBPC(enc) 2
/* CHAR_MATCHES is guaranteed to have MINBPC bytes available. */
#define BYTE_TYPE(enc, p) BIG2_BYTE_TYPE(enc, p)
#define BYTE_TO_ASCII(enc, p) BIG2_BYTE_TO_ASCII(enc, p)
#define CHAR_MATCHES(enc, p, c) BIG2_CHAR_MATCHES(enc, p, c)
#define IS_NAME_CHAR(enc, p, n) 0
#define IS_NAME_CHAR_MINBPC(enc, p) BIG2_IS_NAME_CHAR_MINBPC(enc, p)
#define IS_NMSTRT_CHAR(enc, p, n) (0)
#define IS_NMSTRT_CHAR_MINBPC(enc, p) BIG2_IS_NMSTRT_CHAR_MINBPC(enc, p)
#define XML_TOK_IMPL_C
#include "xmltok_impl.c"
#undef XML_TOK_IMPL_C
#undef MINBPC
#undef BYTE_TYPE
#undef BYTE_TO_ASCII
#undef CHAR_MATCHES
#undef IS_NAME_CHAR
#undef IS_NAME_CHAR_MINBPC
#undef IS_NMSTRT_CHAR
#undef IS_NMSTRT_CHAR_MINBPC
#undef IS_INVALID_CHAR
#endif /* not XML_MIN_SIZE */
#ifdef XML_NS
static const struct normal_encoding big2_encoding_ns = {
{ VTABLE, 2, 0,
#if BYTEORDER == 4321
1
#else
0
#endif
},
{
#include "asciitab.h"
#include "latin1tab.h"
},
STANDARD_VTABLE(big2_) NULL_VTABLE
};
#endif
static const struct normal_encoding big2_encoding = {
{ VTABLE, 2, 0,
#if BYTEORDER == 4321
1
#else
0
#endif
},
{
#define BT_COLON BT_NMSTRT
#include "asciitab.h"
#undef BT_COLON
#include "latin1tab.h"
},
STANDARD_VTABLE(big2_) NULL_VTABLE
};
#if BYTEORDER != 1234
#ifdef XML_NS
static const struct normal_encoding internal_big2_encoding_ns = {
{ VTABLE, 2, 0, 1 },
{
#include "iasciitab.h"
#include "latin1tab.h"
},
STANDARD_VTABLE(big2_) NULL_VTABLE
};
#endif
static const struct normal_encoding internal_big2_encoding = {
{ VTABLE, 2, 0, 1 },
{
#define BT_COLON BT_NMSTRT
#include "iasciitab.h"
#undef BT_COLON
#include "latin1tab.h"
},
STANDARD_VTABLE(big2_) NULL_VTABLE
};
#endif
#undef PREFIX
static int FASTCALL
streqci(const char *s1, const char *s2)
{
for (;;) {
char c1 = *s1++;
char c2 = *s2++;
if (ASCII_a <= c1 && c1 <= ASCII_z)
c1 += ASCII_A - ASCII_a;
if (ASCII_a <= c2 && c2 <= ASCII_z)
c2 += ASCII_A - ASCII_a;
if (c1 != c2)
return 0;
if (!c1)
break;
}
return 1;
}
static void PTRCALL
initUpdatePosition(const ENCODING *UNUSED_P(enc), const char *ptr,
const char *end, POSITION *pos)
{
normal_updatePosition(&utf8_encoding.enc, ptr, end, pos);
}
static int
toAscii(const ENCODING *enc, const char *ptr, const char *end)
{
char buf[1];
char *p = buf;
XmlUtf8Convert(enc, &ptr, end, &p, p + 1);
if (p == buf)
return -1;
else
return buf[0];
}
static int FASTCALL
isSpace(int c)
{
switch (c) {
case 0x20:
case 0xD:
case 0xA:
case 0x9:
return 1;
}
return 0;
}
/* Return 1 if there's just optional white space or there's an S
followed by name=val.
*/
static int
parsePseudoAttribute(const ENCODING *enc,
const char *ptr,
const char *end,
const char **namePtr,
const char **nameEndPtr,
const char **valPtr,
const char **nextTokPtr)
{
int c;
char open;
if (ptr == end) {
*namePtr = NULL;
return 1;
}
if (!isSpace(toAscii(enc, ptr, end))) {
*nextTokPtr = ptr;
return 0;
}
do {
ptr += enc->minBytesPerChar;
} while (isSpace(toAscii(enc, ptr, end)));
if (ptr == end) {
*namePtr = NULL;
return 1;
}
*namePtr = ptr;
for (;;) {
c = toAscii(enc, ptr, end);
if (c == -1) {
*nextTokPtr = ptr;
return 0;
}
if (c == ASCII_EQUALS) {
*nameEndPtr = ptr;
break;
}
if (isSpace(c)) {
*nameEndPtr = ptr;
do {
ptr += enc->minBytesPerChar;
} while (isSpace(c = toAscii(enc, ptr, end)));
if (c != ASCII_EQUALS) {
*nextTokPtr = ptr;
return 0;
}
break;
}
ptr += enc->minBytesPerChar;
}
if (ptr == *namePtr) {
*nextTokPtr = ptr;
return 0;
}
ptr += enc->minBytesPerChar;
c = toAscii(enc, ptr, end);
while (isSpace(c)) {
ptr += enc->minBytesPerChar;
c = toAscii(enc, ptr, end);
}
if (c != ASCII_QUOT && c != ASCII_APOS) {
*nextTokPtr = ptr;
return 0;
}
open = (char)c;
ptr += enc->minBytesPerChar;
*valPtr = ptr;
for (;; ptr += enc->minBytesPerChar) {
c = toAscii(enc, ptr, end);
if (c == open)
break;
if (!(ASCII_a <= c && c <= ASCII_z)
&& !(ASCII_A <= c && c <= ASCII_Z)
&& !(ASCII_0 <= c && c <= ASCII_9)
&& c != ASCII_PERIOD
&& c != ASCII_MINUS
&& c != ASCII_UNDERSCORE) {
*nextTokPtr = ptr;
return 0;
}
}
*nextTokPtr = ptr + enc->minBytesPerChar;
return 1;
}
static const char KW_version[] = {
ASCII_v, ASCII_e, ASCII_r, ASCII_s, ASCII_i, ASCII_o, ASCII_n, '\0'
};
static const char KW_encoding[] = {
ASCII_e, ASCII_n, ASCII_c, ASCII_o, ASCII_d, ASCII_i, ASCII_n, ASCII_g, '\0'
};
static const char KW_standalone[] = {
ASCII_s, ASCII_t, ASCII_a, ASCII_n, ASCII_d, ASCII_a, ASCII_l, ASCII_o,
ASCII_n, ASCII_e, '\0'
};
static const char KW_yes[] = {
ASCII_y, ASCII_e, ASCII_s, '\0'
};
static const char KW_no[] = {
ASCII_n, ASCII_o, '\0'
};
static int
doParseXmlDecl(const ENCODING *(*encodingFinder)(const ENCODING *,
const char *,
const char *),
int isGeneralTextEntity,
const ENCODING *enc,
const char *ptr,
const char *end,
const char **badPtr,
const char **versionPtr,
const char **versionEndPtr,
const char **encodingName,
const ENCODING **encoding,
int *standalone)
{
const char *val = NULL;
const char *name = NULL;
const char *nameEnd = NULL;
ptr += 5 * enc->minBytesPerChar;
end -= 2 * enc->minBytesPerChar;
if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr)
|| !name) {
*badPtr = ptr;
return 0;
}
if (!XmlNameMatchesAscii(enc, name, nameEnd, KW_version)) {
if (!isGeneralTextEntity) {
*badPtr = name;
return 0;
}
}
else {
if (versionPtr)
*versionPtr = val;
if (versionEndPtr)
*versionEndPtr = ptr;
if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr)) {
*badPtr = ptr;
return 0;
}
if (!name) {
if (isGeneralTextEntity) {
/* a TextDecl must have an EncodingDecl */
*badPtr = ptr;
return 0;
}
return 1;
}
}
if (XmlNameMatchesAscii(enc, name, nameEnd, KW_encoding)) {
int c = toAscii(enc, val, end);
if (!(ASCII_a <= c && c <= ASCII_z) && !(ASCII_A <= c && c <= ASCII_Z)) {
*badPtr = val;
return 0;
}
if (encodingName)
*encodingName = val;
if (encoding)
*encoding = encodingFinder(enc, val, ptr - enc->minBytesPerChar);
if (!parsePseudoAttribute(enc, ptr, end, &name, &nameEnd, &val, &ptr)) {
*badPtr = ptr;
return 0;
}
if (!name)
return 1;
}
if (!XmlNameMatchesAscii(enc, name, nameEnd, KW_standalone)
|| isGeneralTextEntity) {
*badPtr = name;
return 0;
}
if (XmlNameMatchesAscii(enc, val, ptr - enc->minBytesPerChar, KW_yes)) {
if (standalone)
*standalone = 1;
}
else if (XmlNameMatchesAscii(enc, val, ptr - enc->minBytesPerChar, KW_no)) {
if (standalone)
*standalone = 0;
}
else {
*badPtr = val;
return 0;
}
while (isSpace(toAscii(enc, ptr, end)))
ptr += enc->minBytesPerChar;
if (ptr != end) {
*badPtr = ptr;
return 0;
}
return 1;
}
static int FASTCALL
checkCharRefNumber(int result)
{
switch (result >> 8) {
case 0xD8: case 0xD9: case 0xDA: case 0xDB:
case 0xDC: case 0xDD: case 0xDE: case 0xDF:
return -1;
case 0:
if (latin1_encoding.type[result] == BT_NONXML)
return -1;
break;
case 0xFF:
if (result == 0xFFFE || result == 0xFFFF)
return -1;
break;
}
return result;
}
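/* Illustrative sketch, not part of Expat: the check above rejects
   surrogates, U+FFFE, U+FFFF and the bytes that the Latin-1 type table
   marks BT_NONXML. Assuming that table marks only the C0 controls other
   than tab, LF and CR as BT_NONXML, the same rule can be written directly
   from the XML 1.0 Char production. The name char_ref_ok_sketch is
   hypothetical, and the explicit upper bound is an addition that the
   function above leaves to its callers.

```c
#include <assert.h>

// Returns 1 if c is a legal XML 1.0 character, 0 otherwise.
static int char_ref_ok_sketch(long c)
{
  if (c < 0 || c > 0x10FFFF)
    return 0;                       // outside the Unicode range
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;                       // surrogate halves are not characters
  if (c == 0xFFFE || c == 0xFFFF)
    return 0;                       // noncharacters excluded by XML 1.0
  if (c < 0x20)
    return c == 0x9 || c == 0xA || c == 0xD;
  return 1;
}

static void char_ref_sketch_check(void)
{
  assert(char_ref_ok_sketch(0x41));
  assert(char_ref_ok_sketch(0x9));
  assert(!char_ref_ok_sketch(0x0));
  assert(!char_ref_ok_sketch(0xD800));
  assert(!char_ref_ok_sketch(0xFFFF));
  assert(char_ref_ok_sketch(0x10000));
}
```
*/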
int FASTCALL
XmlUtf8Encode(int c, char *buf)
{
enum {
/* minN is minimum legal resulting value for N byte sequence */
min2 = 0x80,
min3 = 0x800,
min4 = 0x10000
};
if (c < 0)
return 0;
if (c < min2) {
buf[0] = (char)(c | UTF8_cval1);
return 1;
}
if (c < min3) {
buf[0] = (char)((c >> 6) | UTF8_cval2);
buf[1] = (char)((c & 0x3f) | 0x80);
return 2;
}
if (c < min4) {
buf[0] = (char)((c >> 12) | UTF8_cval3);
buf[1] = (char)(((c >> 6) & 0x3f) | 0x80);
buf[2] = (char)((c & 0x3f) | 0x80);
return 3;
}
if (c < 0x110000) {
buf[0] = (char)((c >> 18) | UTF8_cval4);
buf[1] = (char)(((c >> 12) & 0x3f) | 0x80);
buf[2] = (char)(((c >> 6) & 0x3f) | 0x80);
buf[3] = (char)((c & 0x3f) | 0x80);
return 4;
}
return 0;
}
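/* Illustrative sketch, not part of Expat: a stand-alone version of the
   encoder above, useful for checking the bit layout by hand. The lead-byte
   markers 0xC0, 0xE0 and 0xF0 are written out as literals here. The names
   utf8_encode_sketch and utf8_sketch_check are hypothetical. Like the
   routine above, the sketch does not reject surrogate code points and
   assumes callers filter those.

```c
#include <assert.h>

// Writes the UTF-8 form of c into buf (at least 4 bytes).
// Returns the byte count, or 0 if c is negative or above 0x10FFFF.
static int utf8_encode_sketch(long c, unsigned char *buf)
{
  if (c < 0)
    return 0;
  if (c < 0x80) {                   // 0xxxxxxx
    buf[0] = (unsigned char)c;
    return 1;
  }
  if (c < 0x800) {                  // 110xxxxx 10xxxxxx
    buf[0] = (unsigned char)(0xC0 | (c >> 6));
    buf[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000) {                // 1110xxxx 10xxxxxx 10xxxxxx
    buf[0] = (unsigned char)(0xE0 | (c >> 12));
    buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
  }
  if (c < 0x110000) {               // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    buf[0] = (unsigned char)(0xF0 | (c >> 18));
    buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
  }
  return 0;
}

static void utf8_sketch_check(void)
{
  unsigned char b[4];
  assert(utf8_encode_sketch(0x20AC, b) == 3);   // euro sign
  assert(b[0] == 0xE2 && b[1] == 0x82 && b[2] == 0xAC);
  assert(utf8_encode_sketch(0x1F600, b) == 4);
  assert(b[0] == 0xF0 && b[1] == 0x9F && b[2] == 0x98 && b[3] == 0x80);
  assert(utf8_encode_sketch(0x110000, b) == 0);
}
```
*/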
int FASTCALL
XmlUtf16Encode(int charNum, unsigned short *buf)
{
if (charNum < 0)
return 0;
if (charNum < 0x10000) {
buf[0] = (unsigned short)charNum;
return 1;
}
if (charNum < 0x110000) {
charNum -= 0x10000;
buf[0] = (unsigned short)((charNum >> 10) + 0xD800);
buf[1] = (unsigned short)((charNum & 0x3FF) + 0xDC00);
return 2;
}
return 0;
}
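/* Illustrative sketch, not part of Expat: the surrogate arithmetic above,
   together with its inverse, as a stand-alone round trip. The names
   utf16_encode_sketch, utf16_combine_sketch and utf16_sketch_check are
   hypothetical. The combine helper assumes its arguments are a valid
   high/low surrogate pair and does not validate them.

```c
#include <assert.h>

// Writes c as one or two UTF-16 code units; returns the unit count,
// or 0 if c is negative or above 0x10FFFF.
static int utf16_encode_sketch(long c, unsigned short *buf)
{
  if (c < 0)
    return 0;
  if (c < 0x10000) {
    buf[0] = (unsigned short)c;
    return 1;
  }
  if (c < 0x110000) {
    c -= 0x10000;                   // 20 bits remain
    buf[0] = (unsigned short)(0xD800 + (c >> 10));    // high 10 bits
    buf[1] = (unsigned short)(0xDC00 + (c & 0x3FF));  // low 10 bits
    return 2;
  }
  return 0;
}

// Inverse of the two-unit case above.
static long utf16_combine_sketch(unsigned short hi, unsigned short lo)
{
  return 0x10000L + (((long)(hi - 0xD800)) << 10) + (long)(lo - 0xDC00);
}

static void utf16_sketch_check(void)
{
  unsigned short u[2];
  assert(utf16_encode_sketch(0x1F600, u) == 2);
  assert(u[0] == 0xD83D && u[1] == 0xDE00);
  assert(utf16_combine_sketch(u[0], u[1]) == 0x1F600);
}
```
*/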
struct unknown_encoding {
struct normal_encoding normal;
CONVERTER convert;
void *userData;
unsigned short utf16[256];
char utf8[256][4];
};
#define AS_UNKNOWN_ENCODING(enc) ((const struct unknown_encoding *) (enc))
int
XmlSizeOfUnknownEncoding(void)
{
return sizeof(struct unknown_encoding);
}
static int PTRFASTCALL
unknown_isName(const ENCODING *enc, const char *p)
{
const struct unknown_encoding *uenc = AS_UNKNOWN_ENCODING(enc);
int c = uenc->convert(uenc->userData, p);
if (c & ~0xFFFF)
return 0;
return UCS2_GET_NAMING(namePages, c >> 8, c & 0xFF);
}
static int PTRFASTCALL
unknown_isNmstrt(const ENCODING *enc, const char *p)
{
const struct unknown_encoding *uenc = AS_UNKNOWN_ENCODING(enc);
int c = uenc->convert(uenc->userData, p);
if (c & ~0xFFFF)
return 0;
return UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xFF);
}
static int PTRFASTCALL
unknown_isInvalid(const ENCODING *enc, const char *p)
{
const struct unknown_encoding *uenc = AS_UNKNOWN_ENCODING(enc);
int c = uenc->convert(uenc->userData, p);
return (c & ~0xFFFF) || checkCharRefNumber(c) < 0;
}
static enum XML_Convert_Result PTRCALL
unknown_toUtf8(const ENCODING *enc,
const char **fromP, const char *fromLim,
char **toP, const char *toLim)
{
const struct unknown_encoding *uenc = AS_UNKNOWN_ENCODING(enc);
char buf[XML_UTF8_ENCODE_MAX];
for (;;) {
const char *utf8;
int n;
if (*fromP == fromLim)
return XML_CONVERT_COMPLETED;
utf8 = uenc->utf8[(unsigned char)**fromP];
n = *utf8++;
if (n == 0) {
int c = uenc->convert(uenc->userData, *fromP);
n = XmlUtf8Encode(c, buf);
if (n > toLim - *toP)
return XML_CONVERT_OUTPUT_EXHAUSTED;
utf8 = buf;
*fromP += (AS_NORMAL_ENCODING(enc)->type[(unsigned char)**fromP]
- (BT_LEAD2 - 2));
}
else {
if (n > toLim - *toP)
return XML_CONVERT_OUTPUT_EXHAUSTED;
(*fromP)++;
}
do {
*(*toP)++ = *utf8++;
} while (--n != 0);
}
}
static enum XML_Convert_Result PTRCALL
unknown_toUtf16(const ENCODING *enc,
const char **fromP, const char *fromLim,
unsigned short **toP, const unsigned short *toLim)
{
const struct unknown_encoding *uenc = AS_UNKNOWN_ENCODING(enc);
while (*fromP < fromLim && *toP < toLim) {
unsigned short c = uenc->utf16[(unsigned char)**fromP];
if (c == 0) {
c = (unsigned short)
uenc->convert(uenc->userData, *fromP);
*fromP += (AS_NORMAL_ENCODING(enc)->type[(unsigned char)**fromP]
- (BT_LEAD2 - 2));
}
else
(*fromP)++;
*(*toP)++ = c;
}
if ((*toP == toLim) && (*fromP < fromLim))
return XML_CONVERT_OUTPUT_EXHAUSTED;
else
return XML_CONVERT_COMPLETED;
}
ENCODING *
XmlInitUnknownEncoding(void *mem,
int *table,
CONVERTER convert,
void *userData)
{
int i;
struct unknown_encoding *e = (struct unknown_encoding *)mem;
for (i = 0; i < (int)sizeof(struct normal_encoding); i++)
((char *)mem)[i] = ((char *)&latin1_encoding)[i];
for (i = 0; i < 128; i++)
if (latin1_encoding.type[i] != BT_OTHER
&& latin1_encoding.type[i] != BT_NONXML
&& table[i] != i)
return 0;
for (i = 0; i < 256; i++) {
int c = table[i];
if (c == -1) {
e->normal.type[i] = BT_MALFORM;
/* This shouldn't really get used. */
e->utf16[i] = 0xFFFF;
e->utf8[i][0] = 1;
e->utf8[i][1] = 0;
}
else if (c < 0) {
if (c < -4)
return 0;
e->normal.type[i] = (unsigned char)(BT_LEAD2 - (c + 2));
e->utf8[i][0] = 0;
e->utf16[i] = 0;
}
else if (c < 0x80) {
if (latin1_encoding.type[c] != BT_OTHER
&& latin1_encoding.type[c] != BT_NONXML
&& c != i)
return 0;
e->normal.type[i] = latin1_encoding.type[c];
e->utf8[i][0] = 1;
e->utf8[i][1] = (char)c;
e->utf16[i] = (unsigned short)(c == 0 ? 0xFFFF : c);
}
else if (checkCharRefNumber(c) < 0) {
e->normal.type[i] = BT_NONXML;
/* This shouldn't really get used. */
e->utf16[i] = 0xFFFF;
e->utf8[i][0] = 1;
e->utf8[i][1] = 0;
}
else {
if (c > 0xFFFF)
return 0;
if (UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xff))
e->normal.type[i] = BT_NMSTRT;
else if (UCS2_GET_NAMING(namePages, c >> 8, c & 0xff))
e->normal.type[i] = BT_NAME;
else
e->normal.type[i] = BT_OTHER;
e->utf8[i][0] = (char)XmlUtf8Encode(c, e->utf8[i] + 1);
e->utf16[i] = (unsigned short)c;
}
}
e->userData = userData;
e->convert = convert;
if (convert) {
e->normal.isName2 = unknown_isName;
e->normal.isName3 = unknown_isName;
e->normal.isName4 = unknown_isName;
e->normal.isNmstrt2 = unknown_isNmstrt;
e->normal.isNmstrt3 = unknown_isNmstrt;
e->normal.isNmstrt4 = unknown_isNmstrt;
e->normal.isInvalid2 = unknown_isInvalid;
e->normal.isInvalid3 = unknown_isInvalid;
e->normal.isInvalid4 = unknown_isInvalid;
}
e->normal.enc.utf8Convert = unknown_toUtf8;
e->normal.enc.utf16Convert = unknown_toUtf16;
return &(e->normal.enc);
}
/* If this enumeration is changed, getEncodingIndex and encodings
must also be changed. */
enum {
UNKNOWN_ENC = -1,
ISO_8859_1_ENC = 0,
US_ASCII_ENC,
UTF_8_ENC,
UTF_16_ENC,
UTF_16BE_ENC,
UTF_16LE_ENC,
/* must match encodingNames up to here */
NO_ENC
};
static const char KW_ISO_8859_1[] = {
ASCII_I, ASCII_S, ASCII_O, ASCII_MINUS, ASCII_8, ASCII_8, ASCII_5, ASCII_9,
ASCII_MINUS, ASCII_1, '\0'
};
static const char KW_US_ASCII[] = {
ASCII_U, ASCII_S, ASCII_MINUS, ASCII_A, ASCII_S, ASCII_C, ASCII_I, ASCII_I,
'\0'
};
static const char KW_UTF_8[] = {
ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_8, '\0'
};
static const char KW_UTF_16[] = {
ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, '\0'
};
static const char KW_UTF_16BE[] = {
ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, ASCII_B, ASCII_E,
'\0'
};
static const char KW_UTF_16LE[] = {
ASCII_U, ASCII_T, ASCII_F, ASCII_MINUS, ASCII_1, ASCII_6, ASCII_L, ASCII_E,
'\0'
};
static int FASTCALL
getEncodingIndex(const char *name)
{
static const char * const encodingNames[] = {
KW_ISO_8859_1,
KW_US_ASCII,
KW_UTF_8,
KW_UTF_16,
KW_UTF_16BE,
KW_UTF_16LE,
};
int i;
if (name == NULL)
return NO_ENC;
for (i = 0; i < (int)(sizeof(encodingNames)/sizeof(encodingNames[0])); i++)
if (streqci(name, encodingNames[i]))
return i;
return UNKNOWN_ENC;
}
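/* Illustrative sketch, not part of Expat: the lookup above compares the
   requested name against a fixed table, ignoring ASCII case, and maps a
   NULL name to the "no encoding" index. The stand-alone version below uses
   ordinary string literals, so it assumes an ASCII execution character
   set, whereas the code above spells names with ASCII_* constants. All
   names ending in _sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// ASCII-only case folding, independent of the C locale.
static int ascii_upper_sketch(int c)
{
  return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

// Returns 1 if a and b are equal ignoring ASCII case, 0 otherwise.
static int ascii_streqci_sketch(const char *a, const char *b)
{
  for (;; a++, b++) {
    if (ascii_upper_sketch((unsigned char)*a)
        != ascii_upper_sketch((unsigned char)*b))
      return 0;
    if (*a == '\0')
      return 1;
  }
}

// Returns the table index of name, -1 if it is unknown, or the table
// size (6) if name is NULL, meaning no encoding was specified.
static int encoding_index_sketch(const char *name)
{
  static const char *const names[] = {
    "ISO-8859-1", "US-ASCII", "UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"
  };
  const int count = (int)(sizeof names / sizeof names[0]);
  int i;
  if (name == NULL)
    return count;
  for (i = 0; i < count; i++)
    if (ascii_streqci_sketch(name, names[i]))
      return i;
  return -1;
}
```
*/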
/* For binary compatibility, we store the index of the encoding
specified at initialization in the isUtf16 member.
*/
#define INIT_ENC_INDEX(enc) ((int)(enc)->initEnc.isUtf16)
#define SET_INIT_ENC_INDEX(enc, i) ((enc)->initEnc.isUtf16 = (char)i)
/* This is what detects the encoding. encodingTable maps from
encoding indices to encodings; INIT_ENC_INDEX(enc) is the index of
the external (protocol) specified encoding; state is
XML_CONTENT_STATE if we're parsing an external text entity, and
XML_PROLOG_STATE otherwise.
*/
static int
initScan(const ENCODING * const *encodingTable,
const INIT_ENCODING *enc,
int state,
const char *ptr,
const char *end,
const char **nextTokPtr)
{
const ENCODING **encPtr;
if (ptr >= end)
return XML_TOK_NONE;
encPtr = enc->encPtr;
if (ptr + 1 == end) {
/* only a single byte available for auto-detection */
#ifndef XML_DTD /* FIXME */
/* a well-formed document entity must have more than one byte */
if (state != XML_CONTENT_STATE)
return XML_TOK_PARTIAL;
#endif
/* so we're parsing an external text entity... */
/* if UTF-16 was externally specified, then we need at least 2 bytes */
switch (INIT_ENC_INDEX(enc)) {
case UTF_16_ENC:
case UTF_16LE_ENC:
case UTF_16BE_ENC:
return XML_TOK_PARTIAL;
}
switch ((unsigned char)*ptr) {
case 0xFE:
case 0xFF:
case 0xEF: /* possibly first byte of UTF-8 BOM */
if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC
&& state == XML_CONTENT_STATE)
break;
/* fall through */
case 0x00:
case 0x3C:
return XML_TOK_PARTIAL;
}
}
else {
switch (((unsigned char)ptr[0] << 8) | (unsigned char)ptr[1]) {
case 0xFEFF:
if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC
&& state == XML_CONTENT_STATE)
break;
*nextTokPtr = ptr + 2;
*encPtr = encodingTable[UTF_16BE_ENC];
return XML_TOK_BOM;
/* 00 3C is handled in the default case */
case 0x3C00:
if ((INIT_ENC_INDEX(enc) == UTF_16BE_ENC
|| INIT_ENC_INDEX(enc) == UTF_16_ENC)
&& state == XML_CONTENT_STATE)
break;
*encPtr = encodingTable[UTF_16LE_ENC];
return XmlTok(*encPtr, state, ptr, end, nextTokPtr);
case 0xFFFE:
if (INIT_ENC_INDEX(enc) == ISO_8859_1_ENC
&& state == XML_CONTENT_STATE)
break;
*nextTokPtr = ptr + 2;
*encPtr = encodingTable[UTF_16LE_ENC];
return XML_TOK_BOM;
case 0xEFBB:
/* Maybe a UTF-8 BOM (EF BB BF) */
/* If there's an explicitly specified (external) encoding
of ISO-8859-1 or some flavour of UTF-16
and this is an external text entity,
don't look for the BOM,
because it might be legal data.
*/
if (state == XML_CONTENT_STATE) {
int e = INIT_ENC_INDEX(enc);
if (e == ISO_8859_1_ENC || e == UTF_16BE_ENC
|| e == UTF_16LE_ENC || e == UTF_16_ENC)
break;
}
if (ptr + 2 == end)
return XML_TOK_PARTIAL;
if ((unsigned char)ptr[2] == 0xBF) {
*nextTokPtr = ptr + 3;
*encPtr = encodingTable[UTF_8_ENC];
return XML_TOK_BOM;
}
break;
default:
if (ptr[0] == '\0') {
/* 0 isn't a legal data character. Furthermore a document
entity can only start with ASCII characters. So the only
way this can fail to be big-endian UTF-16 is if it's an
external parsed general entity that's labelled as
UTF-16LE.
*/
if (state == XML_CONTENT_STATE && INIT_ENC_INDEX(enc) == UTF_16LE_ENC)
break;
*encPtr = encodingTable[UTF_16BE_ENC];
return XmlTok(*encPtr, state, ptr, end, nextTokPtr);
}
else if (ptr[1] == '\0') {
/* We could recover here in the case:
- parsing an external entity
- second byte is 0
- no externally specified encoding
- no encoding declaration
by assuming UTF-16LE. But we don't, because this would mean when
presented just with a single byte, we couldn't reliably determine
whether we needed further bytes.
*/
if (state == XML_CONTENT_STATE)
break;
*encPtr = encodingTable[UTF_16LE_ENC];
return XmlTok(*encPtr, state, ptr, end, nextTokPtr);
}
break;
}
}
*encPtr = encodingTable[INIT_ENC_INDEX(enc)];
return XmlTok(*encPtr, state, ptr, end, nextTokPtr);
}
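/* Illustrative sketch, not part of Expat: the core of the detection above
   is a look at the first two or three bytes. The simplified version below
   keeps only that part. It ignores the parser state and any externally
   specified encoding, and it asks for more input whenever fewer than two
   bytes are available, which is stricter than the routine above. The enum
   and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum sniff_result {
  SNIFF_PARTIAL,   // need more bytes before deciding
  SNIFF_DEFAULT,   // no evidence; fall back to the declared encoding
  SNIFF_UTF8,
  SNIFF_UTF16BE,
  SNIFF_UTF16LE
};

// Inspects the first bytes of p; *skip receives the BOM length, if any.
static enum sniff_result sniff_encoding_sketch(const unsigned char *p,
                                               size_t n, size_t *skip)
{
  *skip = 0;
  if (n < 2)
    return SNIFF_PARTIAL;
  switch ((p[0] << 8) | p[1]) {
  case 0xFEFF:                      // UTF-16 BOM, big-endian order
    *skip = 2;
    return SNIFF_UTF16BE;
  case 0xFFFE:                      // UTF-16 BOM, little-endian order
    *skip = 2;
    return SNIFF_UTF16LE;
  case 0xEFBB:                      // possibly the UTF-8 BOM EF BB BF
    if (n < 3)
      return SNIFF_PARTIAL;
    if (p[2] == 0xBF) {
      *skip = 3;
      return SNIFF_UTF8;
    }
    return SNIFF_DEFAULT;
  }
  if (p[0] == 0)                    // 00 xx: ASCII character in UTF-16BE
    return SNIFF_UTF16BE;
  if (p[1] == 0)                    // xx 00: ASCII character in UTF-16LE
    return SNIFF_UTF16LE;
  return SNIFF_DEFAULT;
}
```
*/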
#define NS(x) x
#define ns(x) x
#define XML_TOK_NS_C
#include "xmltok_ns.c"
#undef XML_TOK_NS_C
#undef NS
#undef ns
#ifdef XML_NS
#define NS(x) x ## NS
#define ns(x) x ## _ns
#define XML_TOK_NS_C
#include "xmltok_ns.c"
#undef XML_TOK_NS_C
#undef NS
#undef ns
ENCODING *
XmlInitUnknownEncodingNS(void *mem,
int *table,
CONVERTER convert,
void *userData)
{
ENCODING *enc = XmlInitUnknownEncoding(mem, table, convert, userData);
if (enc)
((struct normal_encoding *)enc)->type[ASCII_COLON] = BT_COLON;
return enc;
}
#endif /* XML_NS */
hexpat-0.20.13/cbits/iasciitab.h 0000644 0000000 0000000 00000003446 13122604047 014547 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
/* Like asciitab.h, except that 0xD has code BT_S rather than BT_CR */
/* 0x00 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x04 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x08 */ BT_NONXML, BT_S, BT_LF, BT_NONXML,
/* 0x0C */ BT_NONXML, BT_S, BT_NONXML, BT_NONXML,
/* 0x10 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x14 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x18 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x1C */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x20 */ BT_S, BT_EXCL, BT_QUOT, BT_NUM,
/* 0x24 */ BT_OTHER, BT_PERCNT, BT_AMP, BT_APOS,
/* 0x28 */ BT_LPAR, BT_RPAR, BT_AST, BT_PLUS,
/* 0x2C */ BT_COMMA, BT_MINUS, BT_NAME, BT_SOL,
/* 0x30 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT,
/* 0x34 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT,
/* 0x38 */ BT_DIGIT, BT_DIGIT, BT_COLON, BT_SEMI,
/* 0x3C */ BT_LT, BT_EQUALS, BT_GT, BT_QUEST,
/* 0x40 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX,
/* 0x44 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT,
/* 0x48 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x4C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x50 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x54 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x58 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_LSQB,
/* 0x5C */ BT_OTHER, BT_RSQB, BT_OTHER, BT_NMSTRT,
/* 0x60 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX,
/* 0x64 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT,
/* 0x68 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x6C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x70 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x74 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x78 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER,
/* 0x7C */ BT_VERBAR, BT_OTHER, BT_OTHER, BT_OTHER,
hexpat-0.20.13/cbits/asciitab.h 0000644 0000000 0000000 00000003340 13122604047 014367 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
/* 0x00 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x04 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x08 */ BT_NONXML, BT_S, BT_LF, BT_NONXML,
/* 0x0C */ BT_NONXML, BT_CR, BT_NONXML, BT_NONXML,
/* 0x10 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x14 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x18 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x1C */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML,
/* 0x20 */ BT_S, BT_EXCL, BT_QUOT, BT_NUM,
/* 0x24 */ BT_OTHER, BT_PERCNT, BT_AMP, BT_APOS,
/* 0x28 */ BT_LPAR, BT_RPAR, BT_AST, BT_PLUS,
/* 0x2C */ BT_COMMA, BT_MINUS, BT_NAME, BT_SOL,
/* 0x30 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT,
/* 0x34 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT,
/* 0x38 */ BT_DIGIT, BT_DIGIT, BT_COLON, BT_SEMI,
/* 0x3C */ BT_LT, BT_EQUALS, BT_GT, BT_QUEST,
/* 0x40 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX,
/* 0x44 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT,
/* 0x48 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x4C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x50 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x54 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x58 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_LSQB,
/* 0x5C */ BT_OTHER, BT_RSQB, BT_OTHER, BT_NMSTRT,
/* 0x60 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX,
/* 0x64 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT,
/* 0x68 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x6C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x70 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x74 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT,
/* 0x78 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER,
/* 0x7C */ BT_VERBAR, BT_OTHER, BT_OTHER, BT_OTHER,
hexpat-0.20.13/cbits/README 0000644 0000000 0000000 00000013235 13122604047 013323 0 ustar 00 0000000 0000000
Expat, Release 2.2.1
This is Expat, a C library for parsing XML, written by James Clark.
Expat is a stream-oriented XML parser. This means that you register
handlers with the parser before starting the parse. These handlers
are called when the parser discovers the associated structures in the
document being parsed. A start tag is an example of the kind of
structures for which you may register handlers.
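The handler model described above can be illustrated with a toy scanner.
The sketch below is NOT Expat's API: toy_parser, toy_parse and the handler
type are hypothetical names invented for this example, and the scanner only
recognizes start tags (it knows nothing about comments, CDATA sections or
attribute values). It shows the pattern only: register a callback first,
then feed the document, and the callback fires as each structure is found.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical callback type: receives the user data and the tag name.
typedef void (*toy_start_handler)(void *userData, const char *name, size_t len);

struct toy_parser {
  toy_start_handler onStart;   // registered before parsing begins
  void *userData;
};

static int toy_is_name_start(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || c == '_' || c == ':';
}

static int toy_is_name_char(char c)
{
  return toy_is_name_start(c) || (c >= '0' && c <= '9')
      || c == '-' || c == '.';
}

// Scans s and calls the registered handler once per start tag.
static void toy_parse(const struct toy_parser *p, const char *s, size_t n)
{
  size_t i = 0;
  while (i < n) {
    if (s[i] == '<' && i + 1 < n && toy_is_name_start(s[i + 1])) {
      size_t begin = i + 1;
      size_t end = begin;
      while (end < n && toy_is_name_char(s[end]))
        end++;
      if (p->onStart)
        p->onStart(p->userData, s + begin, end - begin);
      i = end;
    } else {
      i++;
    }
  }
}

// Example handler: counts start tags through the user data pointer.
static void toy_count_start(void *userData, const char *name, size_t len)
{
  (void)name;
  (void)len;
  ++*(int *)userData;
}

static void toy_check(void)
{
  const char doc[] = "<doc><item/><item/></doc>";
  int count = 0;
  struct toy_parser p;
  p.onStart = toy_count_start;
  p.userData = &count;
  toy_parse(&p, doc, sizeof doc - 1);
  assert(count == 3);
}
```
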
Windows users should use the expat_win32bin package, which includes
both precompiled libraries and executables, and source code for
developers.
Expat is free software. You may copy, distribute, and modify it under
the terms of the License contained in the file COPYING distributed
with this package. This license is the same as the MIT/X Consortium
license.
Versions of Expat that have an odd minor version (the middle number in
the release above) are development releases and should be considered
beta software. Releases with even minor version numbers are
intended to be production-grade software.
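The numbering rule above is easy to check mechanically. The sketch below
assumes a version string of the form major.minor.micro; the function name
is hypothetical and is not provided by Expat.

```c
#include <assert.h>
#include <stdio.h>

// Returns 1 if the minor version is even (production release),
// 0 if it is odd (development release), -1 if the string cannot be parsed.
static int is_production_release(const char *version)
{
  int major, minor, micro;
  if (sscanf(version, "%d.%d.%d", &major, &minor, &micro) != 3)
    return -1;
  return minor % 2 == 0;
}

static void version_check(void)
{
  assert(is_production_release("2.2.1") == 1);  // this release
  assert(is_production_release("2.1.0") == 0);
}
```
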
If you are building Expat from a check-out from the CVS repository,
you need to run a script that generates the configure script using the
GNU autoconf and libtool tools. To do this, you need to have
autoconf 2.58 or newer. Run the script like this:
./buildconf.sh
Once this has been done, follow the same instructions as for building
from a source distribution.
To build Expat from a source distribution, you first run the
configuration shell script in the top level distribution directory:
./configure
There are many options which you may provide to configure (which you
can discover by running configure with the --help option). But the
one of most interest is the one that sets the installation directory.
By default, the configure script will set things up to install
libexpat into /usr/local/lib, expat.h into /usr/local/include, and
xmlwf into /usr/local/bin. If, for example, you'd prefer to install
into /home/me/mystuff/lib, /home/me/mystuff/include, and
/home/me/mystuff/bin, you can tell configure about that with:
./configure --prefix=/home/me/mystuff
Another interesting option is to enable 64-bit integer support for
line and column numbers and the over-all byte index:
./configure CPPFLAGS=-DXML_LARGE_SIZE
However, such a modification would be a breaking change to the ABI
and is therefore not recommended for general use - e.g. as part of
a Linux distribution - but rather for builds with special requirements.
After running the configure script, the "make" command will build
things and "make install" will install things into their proper
location. Have a look at the "Makefile" to learn about additional
"make" options. Note that you need to have write permission into
the directories into which things will be installed.
If you are interested in building Expat to provide document
information in UTF-16 encoding rather than the default UTF-8, follow
these instructions (after having run "make distclean"):
1. For UTF-16 output as unsigned short (and version/error
strings as char), run:
./configure CPPFLAGS=-DXML_UNICODE
For UTF-16 output as wchar_t (incl. version/error strings),
run:
./configure CFLAGS="-g -O2 -fshort-wchar" \
CPPFLAGS=-DXML_UNICODE_WCHAR_T
2. Edit the Makefile, changing:
LIBRARY = libexpat.la
to:
LIBRARY = libexpatw.la
(Note the additional "w" in the library name.)
3. Run "make buildlib" (which builds the library only).
Or, to save step 2, run "make buildlib LIBRARY=libexpatw.la".
4. Run "make installlib" (which installs the library only).
Or, if step 2 was omitted, run "make installlib LIBRARY=libexpatw.la".
Both DESTDIR and INSTALL_ROOT are supported. INSTALL_ROOT supplies the
default value for DESTDIR, and the rest of the makefile uses only DESTDIR.
It works as follows:
$ make install DESTDIR=/path/to/image
overrides the in-makefile set DESTDIR, while both
$ INSTALL_ROOT=/path/to/image make install
$ make install INSTALL_ROOT=/path/to/image
use DESTDIR=$(INSTALL_ROOT), even if DESTDIR happens to be defined in the
environment, because the variable-setting priority is
1) commandline
2) in-makefile
3) environment
Note: This only applies to the Expat library itself; building UTF-16 versions
of xmlwf and the tests is currently not supported.
Note for Solaris users: The "ar" command is usually located in
"/usr/ccs/bin", which is not in the default PATH. You will need to
add this to your path for the "make" command, and probably also switch
to GNU make (the "make" found in /usr/ccs/bin does not seem to work
properly -- apparently it does not understand .PHONY directives). If
you're using ksh or bash, use this command to build:
PATH=/usr/ccs/bin:$PATH make
When using Expat with a project using autoconf for configuration, you
can use the probing macro in conftools/expat.m4 to determine how to
include Expat. See the comments at the top of that file for more
information.
A reference manual is available in the file doc/reference.html in this
distribution.
The homepage for this project is http://www.libexpat.org/. There
are links there to connect you to the bug reports page. If you need
to report a bug when you don't have access to a browser, you may also
send a bug report by email to expat-bugs@mail.libexpat.org.
Discussion related to the direction of future expat development takes
place on expat-discuss@mail.libexpat.org. Archives of this list and
other Expat-related lists may be found at:
http://mail.libexpat.org/mailman/listinfo/
hexpat-0.20.13/cbits/xmlparse.c 0000644 0000000 0000000 00000631360 13122604047 014447 0 ustar 00 0000000 0000000
/* Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
77fea421d361dca90041d0040ecf1dca651167fadf2af79e990e35168d70d933 (2.2.1+)
*/
#define _GNU_SOURCE /* syscall prototype */
#include <stddef.h>
#include <string.h> /* memset(), memcpy() */
#include <assert.h>
#include <limits.h> /* UINT_MAX */
#include <stdio.h> /* fprintf */
#include <stdlib.h> /* getenv */
#ifdef _WIN32
#define getpid GetCurrentProcessId
#else
#include <sys/time.h> /* gettimeofday() */
#include <sys/types.h> /* getpid() */
#include <unistd.h> /* getpid() */
#endif
#define XML_BUILDING_EXPAT 1
#ifdef _WIN32
#include "winconfig.h"
#elif defined(HAVE_EXPAT_CONFIG_H)
#include <expat_config.h>
#endif /* ndef _WIN32 */
#include "ascii.h"
#include "expat.h"
#include "siphash.h"
#ifdef XML_UNICODE
#define XML_ENCODE_MAX XML_UTF16_ENCODE_MAX
#define XmlConvert XmlUtf16Convert
#define XmlGetInternalEncoding XmlGetUtf16InternalEncoding
#define XmlGetInternalEncodingNS XmlGetUtf16InternalEncodingNS
#define XmlEncode XmlUtf16Encode
/* Using pointer subtraction to convert to integer type. */
#define MUST_CONVERT(enc, s) (!(enc)->isUtf16 || (((char *)(s) - (char *)NULL) & 1))
typedef unsigned short ICHAR;
#else
#define XML_ENCODE_MAX XML_UTF8_ENCODE_MAX
#define XmlConvert XmlUtf8Convert
#define XmlGetInternalEncoding XmlGetUtf8InternalEncoding
#define XmlGetInternalEncodingNS XmlGetUtf8InternalEncodingNS
#define XmlEncode XmlUtf8Encode
#define MUST_CONVERT(enc, s) (!(enc)->isUtf8)
typedef char ICHAR;
#endif
#ifndef XML_NS
#define XmlInitEncodingNS XmlInitEncoding
#define XmlInitUnknownEncodingNS XmlInitUnknownEncoding
#undef XmlGetInternalEncodingNS
#define XmlGetInternalEncodingNS XmlGetInternalEncoding
#define XmlParseXmlDeclNS XmlParseXmlDecl
#endif
#ifdef XML_UNICODE
#ifdef XML_UNICODE_WCHAR_T
#define XML_T(x) (const wchar_t)x
#define XML_L(x) L ## x
#else
#define XML_T(x) (const unsigned short)x
#define XML_L(x) x
#endif
#else
#define XML_T(x) x
#define XML_L(x) x
#endif
/* Round up n to be a multiple of sz, where sz is a power of 2. */
#define ROUND_UP(n, sz) (((n) + ((sz) - 1)) & ~((sz) - 1))
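As a rough illustration of the rounding trick above (not part of Expat itself), the sketch below wraps the same expression in a helper. The name `round_up` and the power-of-two assertion are assumptions added for the example. Adding `sz - 1` pushes any non-multiple past the next boundary, and masking with `~(sz - 1)` clears the low bits again.

```c
#include <assert.h>
#include <stddef.h>

/* Same expression as the ROUND_UP macro in the source. */
#define ROUND_UP(n, sz) (((n) + ((sz) - 1)) & ~((sz) - 1))

/* Hypothetical wrapper, added only to make the macro easy to call.
   The mask trick is valid only when sz is a power of 2, so check that. */
static size_t round_up(size_t n, size_t sz)
{
    assert(sz != 0 && (sz & (sz - 1)) == 0);
    return ROUND_UP(n, sz);
}
```

For instance, with `sz` equal to 8, the value 9 becomes 16 while 8 stays 8.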
/* Handle the case where memmove() doesn't exist. */
#ifndef HAVE_MEMMOVE
#ifdef HAVE_BCOPY
#define memmove(d,s,l) bcopy((s),(d),(l))
#else
#error memmove does not exist on this platform, nor is a substitute available
#endif /* HAVE_BCOPY */
#endif /* HAVE_MEMMOVE */
#include "internal.h"
#include "xmltok.h"
#include "xmlrole.h"
typedef const XML_Char *KEY;
typedef struct {
KEY name;
} NAMED;
typedef struct {
NAMED **v;
unsigned char power;
size_t size;
size_t used;
const XML_Memory_Handling_Suite *mem;
} HASH_TABLE;
static size_t
keylen(KEY s);
static void
copy_salt_to_sipkey(XML_Parser parser, struct sipkey * key);
/* For probing (after a collision) we need a step size relatively prime
to the hash table size, which is a power of 2. We use double-hashing,
since we can calculate a second hash value cheaply by taking those bits
of the first hash value that were discarded (masked out) when the table
index was calculated: index = hash & mask, where mask = table->size - 1.
We limit the maximum step size to table->size / 4 (mask >> 2) and make
it odd, since odd numbers are always relatively prime to a power of 2.
*/
#define SECOND_HASH(hash, mask, power) \
((((hash) & ~(mask)) >> ((power) - 1)) & ((mask) >> 2))
#define PROBE_STEP(hash, mask, power) \
((unsigned char)((SECOND_HASH(hash, mask, power)) | 1))
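The property described in the comment above can be checked with a small experiment. The sketch below copies the two macros and adds a hypothetical helper, `slots_visited`, that walks a table of 2^power slots using a wrap-around step modelled on the description above. It is an illustration under those assumptions, not the parser's actual lookup routine. Because the step is odd, the walk should reach every slot exactly once before repeating.

```c
#include <assert.h>
#include <stddef.h>

#define SECOND_HASH(hash, mask, power) \
  ((((hash) & ~(mask)) >> ((power) - 1)) & ((mask) >> 2))
#define PROBE_STEP(hash, mask, power) \
  ((unsigned char)((SECOND_HASH(hash, mask, power)) | 1))

/* Hypothetical helper: count the distinct slots reached in `size` probes,
   starting at hash & mask and stepping downward with wrap-around. */
static size_t slots_visited(unsigned long hash, unsigned char power)
{
    unsigned char seen[256] = {0};   /* enough for power <= 8 */
    size_t size, i, n, count = 0;
    unsigned long mask;
    unsigned char step;

    assert(power >= 1 && power <= 8);
    size = (size_t)1 << power;
    mask = (unsigned long)size - 1;
    i = (size_t)(hash & mask);
    step = PROBE_STEP(hash, mask, power);

    for (n = 0; n < size; n++) {
        if (!seen[i]) {
            seen[i] = 1;
            count++;
        }
        /* step is odd and at most size / 4, so this stays in range */
        i = (i < step) ? i + size - step : i - step;
    }
    return count;
}
```

A step that shared a factor with the table size would cycle through only a fraction of the slots, which is why the macro forces the low bit on.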
typedef struct {
NAMED **p;
NAMED **end;
} HASH_TABLE_ITER;
#define INIT_TAG_BUF_SIZE 32 /* must be a multiple of sizeof(XML_Char) */
#define INIT_DATA_BUF_SIZE 1024
#define INIT_ATTS_SIZE 16
#define INIT_ATTS_VERSION 0xFFFFFFFF
#define INIT_BLOCK_SIZE 1024
#define INIT_BUFFER_SIZE 1024
#define EXPAND_SPARE 24
typedef struct binding {
struct prefix *prefix;
struct binding *nextTagBinding;
struct binding *prevPrefixBinding;
const struct attribute_id *attId;
XML_Char *uri;
int uriLen;
int uriAlloc;
} BINDING;
typedef struct prefix {
const XML_Char *name;
BINDING *binding;
} PREFIX;
typedef struct {
const XML_Char *str;
const XML_Char *localPart;
const XML_Char *prefix;
int strLen;
int uriLen;
int prefixLen;
} TAG_NAME;
/* TAG represents an open element.
The name of the element is stored in both the document and API
encodings. The memory buffer 'buf' is a separately-allocated
memory area which stores the name. During the XML_Parse()/
XML_ParseBuffer() when the element is open, the memory for the 'raw'
version of the name (in the document encoding) is shared with the
document buffer. If the element is open across calls to
XML_Parse()/XML_ParseBuffer(), the buffer is re-allocated to
contain the 'raw' name as well.
A parser re-uses these structures, maintaining a list of allocated
TAG objects in a free list.
*/
typedef struct tag {
struct tag *parent; /* parent of this element */
const char *rawName; /* tagName in the original encoding */
int rawNameLength;
TAG_NAME name; /* tagName in the API encoding */
char *buf; /* buffer for name components */
char *bufEnd; /* end of the buffer */
BINDING *bindings;
} TAG;
typedef struct {
const XML_Char *name;
const XML_Char *textPtr;
int textLen; /* length in XML_Chars */
int processed; /* # of processed bytes - when suspended */
const XML_Char *systemId;
const XML_Char *base;
const XML_Char *publicId;
const XML_Char *notation;
XML_Bool open;
XML_Bool is_param;
XML_Bool is_internal; /* true if declared in internal subset outside PE */
} ENTITY;
typedef struct {
enum XML_Content_Type type;
enum XML_Content_Quant quant;
const XML_Char * name;
int firstchild;
int lastchild;
int childcnt;
int nextsib;
} CONTENT_SCAFFOLD;
#define INIT_SCAFFOLD_ELEMENTS 32
typedef struct block {
struct block *next;
int size;
XML_Char s[1];
} BLOCK;
typedef struct {
BLOCK *blocks;
BLOCK *freeBlocks;
const XML_Char *end;
XML_Char *ptr;
XML_Char *start;
const XML_Memory_Handling_Suite *mem;
} STRING_POOL;
/* The XML_Char before the name is used to determine whether
an attribute has been specified. */
typedef struct attribute_id {
XML_Char *name;
PREFIX *prefix;
XML_Bool maybeTokenized;
XML_Bool xmlns;
} ATTRIBUTE_ID;
typedef struct {
const ATTRIBUTE_ID *id;
XML_Bool isCdata;
const XML_Char *value;
} DEFAULT_ATTRIBUTE;
typedef struct {
unsigned long version;
unsigned long hash;
const XML_Char *uriName;
} NS_ATT;
typedef struct {
const XML_Char *name;
PREFIX *prefix;
const ATTRIBUTE_ID *idAtt;
int nDefaultAtts;
int allocDefaultAtts;
DEFAULT_ATTRIBUTE *defaultAtts;
} ELEMENT_TYPE;
typedef struct {
HASH_TABLE generalEntities;
HASH_TABLE elementTypes;
HASH_TABLE attributeIds;
HASH_TABLE prefixes;
STRING_POOL pool;
STRING_POOL entityValuePool;
/* false once a parameter entity reference has been skipped */
XML_Bool keepProcessing;
/* true once an internal or external PE reference has been encountered;
this includes the reference to an external subset */
XML_Bool hasParamEntityRefs;
XML_Bool standalone;
#ifdef XML_DTD
/* indicates if external PE has been read */
XML_Bool paramEntityRead;
HASH_TABLE paramEntities;
#endif /* XML_DTD */
PREFIX defaultPrefix;
/* === scaffolding for building content model === */
XML_Bool in_eldecl;
CONTENT_SCAFFOLD *scaffold;
unsigned contentStringLen;
unsigned scaffSize;
unsigned scaffCount;
int scaffLevel;
int *scaffIndex;
} DTD;
typedef struct open_internal_entity {
const char *internalEventPtr;
const char *internalEventEndPtr;
struct open_internal_entity *next;
ENTITY *entity;
int startTagLevel;
XML_Bool betweenDecl; /* WFC: PE Between Declarations */
} OPEN_INTERNAL_ENTITY;
typedef enum XML_Error PTRCALL Processor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr);
static Processor prologProcessor;
static Processor prologInitProcessor;
static Processor contentProcessor;
static Processor cdataSectionProcessor;
#ifdef XML_DTD
static Processor ignoreSectionProcessor;
static Processor externalParEntProcessor;
static Processor externalParEntInitProcessor;
static Processor entityValueProcessor;
static Processor entityValueInitProcessor;
#endif /* XML_DTD */
static Processor epilogProcessor;
static Processor errorProcessor;
static Processor externalEntityInitProcessor;
static Processor externalEntityInitProcessor2;
static Processor externalEntityInitProcessor3;
static Processor externalEntityContentProcessor;
static Processor internalEntityProcessor;
static enum XML_Error
handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName);
static enum XML_Error
processXmlDecl(XML_Parser parser, int isGeneralTextEntity,
const char *s, const char *next);
static enum XML_Error
initializeEncoding(XML_Parser parser);
static enum XML_Error
doProlog(XML_Parser parser, const ENCODING *enc, const char *s,
const char *end, int tok, const char *next, const char **nextPtr,
XML_Bool haveMore);
static enum XML_Error
processInternalEntity(XML_Parser parser, ENTITY *entity,
XML_Bool betweenDecl);
static enum XML_Error
doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc,
const char *start, const char *end, const char **endPtr,
XML_Bool haveMore);
static enum XML_Error
doCdataSection(XML_Parser parser, const ENCODING *, const char **startPtr,
const char *end, const char **nextPtr, XML_Bool haveMore);
#ifdef XML_DTD
static enum XML_Error
doIgnoreSection(XML_Parser parser, const ENCODING *, const char **startPtr,
const char *end, const char **nextPtr, XML_Bool haveMore);
#endif /* XML_DTD */
static void
freeBindings(XML_Parser parser, BINDING *bindings);
static enum XML_Error
storeAtts(XML_Parser parser, const ENCODING *, const char *s,
TAG_NAME *tagNamePtr, BINDING **bindingsPtr);
static enum XML_Error
addBinding(XML_Parser parser, PREFIX *prefix, const ATTRIBUTE_ID *attId,
const XML_Char *uri, BINDING **bindingsPtr);
static int
defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *, XML_Bool isCdata,
XML_Bool isId, const XML_Char *dfltValue, XML_Parser parser);
static enum XML_Error
storeAttributeValue(XML_Parser parser, const ENCODING *, XML_Bool isCdata,
const char *, const char *, STRING_POOL *);
static enum XML_Error
appendAttributeValue(XML_Parser parser, const ENCODING *, XML_Bool isCdata,
const char *, const char *, STRING_POOL *);
static ATTRIBUTE_ID *
getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start,
const char *end);
static int
setElementTypePrefix(XML_Parser parser, ELEMENT_TYPE *);
static enum XML_Error
storeEntityValue(XML_Parser parser, const ENCODING *enc, const char *start,
const char *end);
static int
reportProcessingInstruction(XML_Parser parser, const ENCODING *enc,
const char *start, const char *end);
static int
reportComment(XML_Parser parser, const ENCODING *enc, const char *start,
const char *end);
static void
reportDefault(XML_Parser parser, const ENCODING *enc, const char *start,
const char *end);
static const XML_Char * getContext(XML_Parser parser);
static XML_Bool
setContext(XML_Parser parser, const XML_Char *context);
static void FASTCALL normalizePublicId(XML_Char *s);
static DTD * dtdCreate(const XML_Memory_Handling_Suite *ms);
/* do not call if parentParser != NULL */
static void dtdReset(DTD *p, const XML_Memory_Handling_Suite *ms);
static void
dtdDestroy(DTD *p, XML_Bool isDocEntity, const XML_Memory_Handling_Suite *ms);
static int
dtdCopy(XML_Parser oldParser,
DTD *newDtd, const DTD *oldDtd, const XML_Memory_Handling_Suite *ms);
static int
copyEntityTable(XML_Parser oldParser,
HASH_TABLE *, STRING_POOL *, const HASH_TABLE *);
static NAMED *
lookup(XML_Parser parser, HASH_TABLE *table, KEY name, size_t createSize);
static void FASTCALL
hashTableInit(HASH_TABLE *, const XML_Memory_Handling_Suite *ms);
static void FASTCALL hashTableClear(HASH_TABLE *);
static void FASTCALL hashTableDestroy(HASH_TABLE *);
static void FASTCALL
hashTableIterInit(HASH_TABLE_ITER *, const HASH_TABLE *);
static NAMED * FASTCALL hashTableIterNext(HASH_TABLE_ITER *);
static void FASTCALL
poolInit(STRING_POOL *, const XML_Memory_Handling_Suite *ms);
static void FASTCALL poolClear(STRING_POOL *);
static void FASTCALL poolDestroy(STRING_POOL *);
static XML_Char *
poolAppend(STRING_POOL *pool, const ENCODING *enc,
const char *ptr, const char *end);
static XML_Char *
poolStoreString(STRING_POOL *pool, const ENCODING *enc,
const char *ptr, const char *end);
static XML_Bool FASTCALL poolGrow(STRING_POOL *pool);
static const XML_Char * FASTCALL
poolCopyString(STRING_POOL *pool, const XML_Char *s);
static const XML_Char *
poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n);
static const XML_Char * FASTCALL
poolAppendString(STRING_POOL *pool, const XML_Char *s);
static int FASTCALL nextScaffoldPart(XML_Parser parser);
static XML_Content * build_model(XML_Parser parser);
static ELEMENT_TYPE *
getElementType(XML_Parser parser, const ENCODING *enc,
const char *ptr, const char *end);
static unsigned long generate_hash_secret_salt(XML_Parser parser);
static XML_Bool startParsing(XML_Parser parser);
static XML_Parser
parserCreate(const XML_Char *encodingName,
const XML_Memory_Handling_Suite *memsuite,
const XML_Char *nameSep,
DTD *dtd);
static void
parserInit(XML_Parser parser, const XML_Char *encodingName);
#define poolStart(pool) ((pool)->start)
#define poolEnd(pool) ((pool)->ptr)
#define poolLength(pool) ((pool)->ptr - (pool)->start)
#define poolChop(pool) ((void)--(pool->ptr))
#define poolLastChar(pool) (((pool)->ptr)[-1])
#define poolDiscard(pool) ((pool)->ptr = (pool)->start)
#define poolFinish(pool) ((pool)->start = (pool)->ptr)
#define poolAppendChar(pool, c) \
(((pool)->ptr == (pool)->end && !poolGrow(pool)) \
? 0 \
: ((*((pool)->ptr)++ = c), 1))
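To make the start/ptr/end bookkeeping behind these macros concrete, here is a deliberately simplified sketch. It assumes a single growable array, whereas the real STRING_POOL chains BLOCK structures, and every `toy_pool` name is hypothetical. Only the append, length, and discard operations are modelled; `poolFinish` is left out because it relies on the block list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy single-buffer pool; field names mirror STRING_POOL. */
typedef struct {
    char *start; /* first character of the string being built */
    char *ptr;   /* next free position */
    char *end;   /* one past the end of the allocation */
} toy_pool;

/* Double the capacity (16 bytes at first); returns 0 on failure. */
static int toy_pool_grow(toy_pool *pool)
{
    size_t used = pool->start ? (size_t)(pool->ptr - pool->start) : 0;
    size_t cap = pool->start ? (size_t)(pool->end - pool->start) : 0;
    size_t new_cap = cap ? cap * 2 : 16;
    char *p = (char *)realloc(pool->start, new_cap);
    if (p == NULL)
        return 0;
    pool->start = p;
    pool->ptr = p + used;
    pool->end = p + new_cap;
    return 1;
}

/* Same shape as poolAppendChar: grow only when full, then store. */
static int toy_pool_append_char(toy_pool *pool, char c)
{
    assert(pool != NULL);
    if (pool->ptr == pool->end && !toy_pool_grow(pool))
        return 0;
    *pool->ptr++ = c;
    return 1;
}

/* Counterpart of poolLength. */
static size_t toy_pool_length(const toy_pool *pool)
{
    return pool->start ? (size_t)(pool->ptr - pool->start) : 0;
}

/* Counterpart of poolDiscard: forget the string under construction. */
static void toy_pool_discard(toy_pool *pool)
{
    pool->ptr = pool->start;
}
```

A caller would start from a zero-initialized `toy_pool`, append characters while checking the return value, and release `start` with `free` when done.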
struct XML_ParserStruct {
/* The first member must be userData so that the XML_GetUserData
macro works. */
void *m_userData;
void *m_handlerArg;
char *m_buffer;
const XML_Memory_Handling_Suite m_mem;
/* first character to be parsed */
const char *m_bufferPtr;
/* past last character to be parsed */
char *m_bufferEnd;
/* allocated end of buffer */
const char *m_bufferLim;
XML_Index m_parseEndByteIndex;
const char *m_parseEndPtr;
XML_Char *m_dataBuf;
XML_Char *m_dataBufEnd;
XML_StartElementHandler m_startElementHandler;
XML_EndElementHandler m_endElementHandler;
XML_CharacterDataHandler m_characterDataHandler;
XML_ProcessingInstructionHandler m_processingInstructionHandler;
XML_CommentHandler m_commentHandler;
XML_StartCdataSectionHandler m_startCdataSectionHandler;
XML_EndCdataSectionHandler m_endCdataSectionHandler;
XML_DefaultHandler m_defaultHandler;
XML_StartDoctypeDeclHandler m_startDoctypeDeclHandler;
XML_EndDoctypeDeclHandler m_endDoctypeDeclHandler;
XML_UnparsedEntityDeclHandler m_unparsedEntityDeclHandler;
XML_NotationDeclHandler m_notationDeclHandler;
XML_StartNamespaceDeclHandler m_startNamespaceDeclHandler;
XML_EndNamespaceDeclHandler m_endNamespaceDeclHandler;
XML_NotStandaloneHandler m_notStandaloneHandler;
XML_ExternalEntityRefHandler m_externalEntityRefHandler;
XML_Parser m_externalEntityRefHandlerArg;
XML_SkippedEntityHandler m_skippedEntityHandler;
XML_UnknownEncodingHandler m_unknownEncodingHandler;
XML_ElementDeclHandler m_elementDeclHandler;
XML_AttlistDeclHandler m_attlistDeclHandler;
XML_EntityDeclHandler m_entityDeclHandler;
XML_XmlDeclHandler m_xmlDeclHandler;
const ENCODING *m_encoding;
INIT_ENCODING m_initEncoding;
const ENCODING *m_internalEncoding;
const XML_Char *m_protocolEncodingName;
XML_Bool m_ns;
XML_Bool m_ns_triplets;
void *m_unknownEncodingMem;
void *m_unknownEncodingData;
void *m_unknownEncodingHandlerData;
void (XMLCALL *m_unknownEncodingRelease)(void *);
PROLOG_STATE m_prologState;
Processor *m_processor;
enum XML_Error m_errorCode;
const char *m_eventPtr;
const char *m_eventEndPtr;
const char *m_positionPtr;
OPEN_INTERNAL_ENTITY *m_openInternalEntities;
OPEN_INTERNAL_ENTITY *m_freeInternalEntities;
XML_Bool m_defaultExpandInternalEntities;
int m_tagLevel;
ENTITY *m_declEntity;
const XML_Char *m_doctypeName;
const XML_Char *m_doctypeSysid;
const XML_Char *m_doctypePubid;
const XML_Char *m_declAttributeType;
const XML_Char *m_declNotationName;
const XML_Char *m_declNotationPublicId;
ELEMENT_TYPE *m_declElementType;
ATTRIBUTE_ID *m_declAttributeId;
XML_Bool m_declAttributeIsCdata;
XML_Bool m_declAttributeIsId;
DTD *m_dtd;
const XML_Char *m_curBase;
TAG *m_tagStack;
TAG *m_freeTagList;
BINDING *m_inheritedBindings;
BINDING *m_freeBindingList;
int m_attsSize;
int m_nSpecifiedAtts;
int m_idAttIndex;
ATTRIBUTE *m_atts;
NS_ATT *m_nsAtts;
unsigned long m_nsAttsVersion;
unsigned char m_nsAttsPower;
#ifdef XML_ATTR_INFO
XML_AttrInfo *m_attInfo;
#endif
POSITION m_position;
STRING_POOL m_tempPool;
STRING_POOL m_temp2Pool;
char *m_groupConnector;
unsigned int m_groupSize;
XML_Char m_namespaceSeparator;
XML_Parser m_parentParser;
XML_ParsingStatus m_parsingStatus;
#ifdef XML_DTD
XML_Bool m_isParamEntity;
XML_Bool m_useForeignDTD;
enum XML_ParamEntityParsing m_paramEntityParsing;
#endif
unsigned long m_hash_secret_salt;
};
#define MALLOC(s) (parser->m_mem.malloc_fcn((s)))
#define REALLOC(p,s) (parser->m_mem.realloc_fcn((p),(s)))
#define FREE(p) (parser->m_mem.free_fcn((p)))
#define userData (parser->m_userData)
#define handlerArg (parser->m_handlerArg)
#define startElementHandler (parser->m_startElementHandler)
#define endElementHandler (parser->m_endElementHandler)
#define characterDataHandler (parser->m_characterDataHandler)
#define processingInstructionHandler \
(parser->m_processingInstructionHandler)
#define commentHandler (parser->m_commentHandler)
#define startCdataSectionHandler \
(parser->m_startCdataSectionHandler)
#define endCdataSectionHandler (parser->m_endCdataSectionHandler)
#define defaultHandler (parser->m_defaultHandler)
#define startDoctypeDeclHandler (parser->m_startDoctypeDeclHandler)
#define endDoctypeDeclHandler (parser->m_endDoctypeDeclHandler)
#define unparsedEntityDeclHandler \
(parser->m_unparsedEntityDeclHandler)
#define notationDeclHandler (parser->m_notationDeclHandler)
#define startNamespaceDeclHandler \
(parser->m_startNamespaceDeclHandler)
#define endNamespaceDeclHandler (parser->m_endNamespaceDeclHandler)
#define notStandaloneHandler (parser->m_notStandaloneHandler)
#define externalEntityRefHandler \
(parser->m_externalEntityRefHandler)
#define externalEntityRefHandlerArg \
(parser->m_externalEntityRefHandlerArg)
#define internalEntityRefHandler \
(parser->m_internalEntityRefHandler)
#define skippedEntityHandler (parser->m_skippedEntityHandler)
#define unknownEncodingHandler (parser->m_unknownEncodingHandler)
#define elementDeclHandler (parser->m_elementDeclHandler)
#define attlistDeclHandler (parser->m_attlistDeclHandler)
#define entityDeclHandler (parser->m_entityDeclHandler)
#define xmlDeclHandler (parser->m_xmlDeclHandler)
#define encoding (parser->m_encoding)
#define initEncoding (parser->m_initEncoding)
#define internalEncoding (parser->m_internalEncoding)
#define unknownEncodingMem (parser->m_unknownEncodingMem)
#define unknownEncodingData (parser->m_unknownEncodingData)
#define unknownEncodingHandlerData \
(parser->m_unknownEncodingHandlerData)
#define unknownEncodingRelease (parser->m_unknownEncodingRelease)
#define protocolEncodingName (parser->m_protocolEncodingName)
#define ns (parser->m_ns)
#define ns_triplets (parser->m_ns_triplets)
#define prologState (parser->m_prologState)
#define processor (parser->m_processor)
#define errorCode (parser->m_errorCode)
#define eventPtr (parser->m_eventPtr)
#define eventEndPtr (parser->m_eventEndPtr)
#define positionPtr (parser->m_positionPtr)
#define position (parser->m_position)
#define openInternalEntities (parser->m_openInternalEntities)
#define freeInternalEntities (parser->m_freeInternalEntities)
#define defaultExpandInternalEntities \
(parser->m_defaultExpandInternalEntities)
#define tagLevel (parser->m_tagLevel)
#define buffer (parser->m_buffer)
#define bufferPtr (parser->m_bufferPtr)
#define bufferEnd (parser->m_bufferEnd)
#define parseEndByteIndex (parser->m_parseEndByteIndex)
#define parseEndPtr (parser->m_parseEndPtr)
#define bufferLim (parser->m_bufferLim)
#define dataBuf (parser->m_dataBuf)
#define dataBufEnd (parser->m_dataBufEnd)
#define _dtd (parser->m_dtd)
#define curBase (parser->m_curBase)
#define declEntity (parser->m_declEntity)
#define doctypeName (parser->m_doctypeName)
#define doctypeSysid (parser->m_doctypeSysid)
#define doctypePubid (parser->m_doctypePubid)
#define declAttributeType (parser->m_declAttributeType)
#define declNotationName (parser->m_declNotationName)
#define declNotationPublicId (parser->m_declNotationPublicId)
#define declElementType (parser->m_declElementType)
#define declAttributeId (parser->m_declAttributeId)
#define declAttributeIsCdata (parser->m_declAttributeIsCdata)
#define declAttributeIsId (parser->m_declAttributeIsId)
#define freeTagList (parser->m_freeTagList)
#define freeBindingList (parser->m_freeBindingList)
#define inheritedBindings (parser->m_inheritedBindings)
#define tagStack (parser->m_tagStack)
#define atts (parser->m_atts)
#define attsSize (parser->m_attsSize)
#define nSpecifiedAtts (parser->m_nSpecifiedAtts)
#define idAttIndex (parser->m_idAttIndex)
#define nsAtts (parser->m_nsAtts)
#define nsAttsVersion (parser->m_nsAttsVersion)
#define nsAttsPower (parser->m_nsAttsPower)
#define attInfo (parser->m_attInfo)
#define tempPool (parser->m_tempPool)
#define temp2Pool (parser->m_temp2Pool)
#define groupConnector (parser->m_groupConnector)
#define groupSize (parser->m_groupSize)
#define namespaceSeparator (parser->m_namespaceSeparator)
#define parentParser (parser->m_parentParser)
#define ps_parsing (parser->m_parsingStatus.parsing)
#define ps_finalBuffer (parser->m_parsingStatus.finalBuffer)
#ifdef XML_DTD
#define isParamEntity (parser->m_isParamEntity)
#define useForeignDTD (parser->m_useForeignDTD)
#define paramEntityParsing (parser->m_paramEntityParsing)
#endif /* XML_DTD */
#define hash_secret_salt (parser->m_hash_secret_salt)
XML_Parser XMLCALL
XML_ParserCreate(const XML_Char *encodingName)
{
return XML_ParserCreate_MM(encodingName, NULL, NULL);
}
XML_Parser XMLCALL
XML_ParserCreateNS(const XML_Char *encodingName, XML_Char nsSep)
{
XML_Char tmp[2];
*tmp = nsSep;
return XML_ParserCreate_MM(encodingName, NULL, tmp);
}
static const XML_Char implicitContext[] = {
ASCII_x, ASCII_m, ASCII_l, ASCII_EQUALS, ASCII_h, ASCII_t, ASCII_t, ASCII_p,
ASCII_COLON, ASCII_SLASH, ASCII_SLASH, ASCII_w, ASCII_w, ASCII_w,
ASCII_PERIOD, ASCII_w, ASCII_3, ASCII_PERIOD, ASCII_o, ASCII_r, ASCII_g,
ASCII_SLASH, ASCII_X, ASCII_M, ASCII_L, ASCII_SLASH, ASCII_1, ASCII_9,
ASCII_9, ASCII_8, ASCII_SLASH, ASCII_n, ASCII_a, ASCII_m, ASCII_e,
ASCII_s, ASCII_p, ASCII_a, ASCII_c, ASCII_e, '\0'
};
#if defined(HAVE_GETRANDOM) || defined(HAVE_SYSCALL_GETRANDOM)
# include <errno.h>
# if defined(HAVE_GETRANDOM)
# include <sys/random.h> /* getrandom */
# else
# include <unistd.h> /* syscall */
# include <sys/syscall.h> /* SYS_getrandom */
# endif
/* Obtain entropy on Linux 3.17+ */
static int
writeRandomBytes_getrandom(void * target, size_t count) {
int success = 0; /* full count bytes written? */
size_t bytesWrittenTotal = 0;
const unsigned int getrandomFlags = 0;
do {
void * const currentTarget = (void*)((char*)target + bytesWrittenTotal);
const size_t bytesToWrite = count - bytesWrittenTotal;
const int bytesWrittenMore =
#if defined(HAVE_GETRANDOM)
getrandom(currentTarget, bytesToWrite, getrandomFlags);
#else
syscall(SYS_getrandom, currentTarget, bytesToWrite, getrandomFlags);
#endif
if (bytesWrittenMore > 0) {
bytesWrittenTotal += bytesWrittenMore;
if (bytesWrittenTotal >= count)
success = 1;
}
} while (! success && (errno == EINTR || errno == EAGAIN));
return success;
}
#endif /* defined(HAVE_GETRANDOM) || defined(HAVE_SYSCALL_GETRANDOM) */
#ifdef _WIN32
typedef BOOLEAN (APIENTRY *RTLGENRANDOM_FUNC)(PVOID, ULONG);
/* Obtain entropy on Windows XP / Windows Server 2003 and later.
* Hint on RtlGenRandom and the following article from libsodium.
*
* Michael Howard: Cryptographically Secure Random number on Windows without using CryptoAPI
* https://blogs.msdn.microsoft.com/michael_howard/2005/01/14/cryptographically-secure-random-number-on-windows-without-using-cryptoapi/
*/
static int
writeRandomBytes_RtlGenRandom(void * target, size_t count) {
int success = 0; /* full count bytes written? */
const HMODULE advapi32 = LoadLibrary("ADVAPI32.DLL");
if (advapi32) {
const RTLGENRANDOM_FUNC RtlGenRandom
= (RTLGENRANDOM_FUNC)GetProcAddress(advapi32, "SystemFunction036");
if (RtlGenRandom) {
if (RtlGenRandom((PVOID)target, (ULONG)count) == TRUE) {
success = 1;
}
}
FreeLibrary(advapi32);
}
return success;
}
#endif /* _WIN32 */
static unsigned long
gather_time_entropy(void)
{
#ifdef _WIN32
FILETIME ft;
GetSystemTimeAsFileTime(&ft); /* never fails */
return ft.dwHighDateTime ^ ft.dwLowDateTime;
#else
struct timeval tv;
int gettimeofday_res;
gettimeofday_res = gettimeofday(&tv, NULL);
assert (gettimeofday_res == 0);
/* Microseconds time is <20 bits entropy */
return tv.tv_usec;
#endif
}
#if defined(HAVE_ARC4RANDOM_BUF) && defined(HAVE_LIBBSD)
# include <bsd/stdlib.h>
#endif
static unsigned long
ENTROPY_DEBUG(const char * label, unsigned long entropy) {
const char * const EXPAT_ENTROPY_DEBUG = getenv("EXPAT_ENTROPY_DEBUG");
if (EXPAT_ENTROPY_DEBUG && ! strcmp(EXPAT_ENTROPY_DEBUG, "1")) {
fprintf(stderr, "Entropy: %s --> 0x%0*lx (%lu bytes)\n",
label,
(int)sizeof(entropy) * 2, entropy,
(unsigned long)sizeof(entropy));
}
return entropy;
}
static unsigned long
generate_hash_secret_salt(XML_Parser parser)
{
unsigned long entropy;
(void)parser;
#if defined(HAVE_ARC4RANDOM_BUF) || defined(__CloudABI__)
(void)gather_time_entropy;
arc4random_buf(&entropy, sizeof(entropy));
return ENTROPY_DEBUG("arc4random_buf", entropy);
#else
/* Try high quality providers first .. */
#ifdef _WIN32
if (writeRandomBytes_RtlGenRandom((void *)&entropy, sizeof(entropy))) {
return ENTROPY_DEBUG("RtlGenRandom", entropy);
}
#elif defined(HAVE_GETRANDOM) || defined(HAVE_SYSCALL_GETRANDOM)
if (writeRandomBytes_getrandom((void *)&entropy, sizeof(entropy))) {
return ENTROPY_DEBUG("getrandom", entropy);
}
#endif
/* .. and self-made low quality for backup: */
/* Process ID is 0 bits entropy if attacker has local access */
entropy = gather_time_entropy() ^ getpid();
/* Factors are 2^31-1 and 2^61-1 (Mersenne primes M31 and M61) */
if (sizeof(unsigned long) == 4) {
return ENTROPY_DEBUG("fallback(4)", entropy * 2147483647);
} else {
return ENTROPY_DEBUG("fallback(8)",
entropy * (unsigned long)2305843009213693951);
}
#endif
}
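The two multipliers in the fallback path can be checked arithmetically. The sketch below is only an illustration: `mersenne` and `fallback_mix` are hypothetical names, and fixed-width types stand in for `unsigned long` so that both word sizes can be exercised on one machine. It confirms that the literals equal 2^31-1 and 2^61-1 and shows the wrap-around multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 2^bits - 1 for 0 < bits < 64. */
static uint64_t mersenne(unsigned bits)
{
    assert(bits > 0 && bits < 64);
    return (UINT64_C(1) << bits) - 1;
}

/* Sketch of the fallback mix: multiply by the Mersenne prime chosen for
   the word size. Unsigned multiplication wraps modulo 2^32 or 2^64. */
static uint64_t fallback_mix(uint64_t entropy, unsigned word_bytes)
{
    assert(word_bytes == 4 || word_bytes == 8);
    if (word_bytes == 4)
        return (uint32_t)((uint32_t)entropy * (uint32_t)mersenne(31));
    return entropy * mersenne(61);
}
```

Multiplying by a large odd constant spreads the few varying low bits of the time and process ID across the word, but it adds no entropy of its own, which is why the source labels this path low quality.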
static unsigned long
get_hash_secret_salt(XML_Parser parser) {
if (parser->m_parentParser != NULL)
return get_hash_secret_salt(parser->m_parentParser);
return parser->m_hash_secret_salt;
}
static XML_Bool /* only valid for root parser */
startParsing(XML_Parser parser)
{
/* hash functions must be initialized before setContext() is called */
if (hash_secret_salt == 0)
hash_secret_salt = generate_hash_secret_salt(parser);
if (ns) {
/* implicit context only set for root parser, since child
parsers (i.e. external entity parsers) will inherit it
*/
return setContext(parser, implicitContext);
}
return XML_TRUE;
}
XML_Parser XMLCALL
XML_ParserCreate_MM(const XML_Char *encodingName,
const XML_Memory_Handling_Suite *memsuite,
const XML_Char *nameSep)
{
return parserCreate(encodingName, memsuite, nameSep, NULL);
}
static XML_Parser
parserCreate(const XML_Char *encodingName,
const XML_Memory_Handling_Suite *memsuite,
const XML_Char *nameSep,
DTD *dtd)
{
XML_Parser parser;
if (memsuite) {
XML_Memory_Handling_Suite *mtemp;
parser = (XML_Parser)
memsuite->malloc_fcn(sizeof(struct XML_ParserStruct));
if (parser != NULL) {
mtemp = (XML_Memory_Handling_Suite *)&(parser->m_mem);
mtemp->malloc_fcn = memsuite->malloc_fcn;
mtemp->realloc_fcn = memsuite->realloc_fcn;
mtemp->free_fcn = memsuite->free_fcn;
}
}
else {
XML_Memory_Handling_Suite *mtemp;
parser = (XML_Parser)malloc(sizeof(struct XML_ParserStruct));
if (parser != NULL) {
mtemp = (XML_Memory_Handling_Suite *)&(parser->m_mem);
mtemp->malloc_fcn = malloc;
mtemp->realloc_fcn = realloc;
mtemp->free_fcn = free;
}
}
if (!parser)
return parser;
buffer = NULL;
bufferLim = NULL;
attsSize = INIT_ATTS_SIZE;
atts = (ATTRIBUTE *)MALLOC(attsSize * sizeof(ATTRIBUTE));
if (atts == NULL) {
FREE(parser);
return NULL;
}
#ifdef XML_ATTR_INFO
attInfo = (XML_AttrInfo*)MALLOC(attsSize * sizeof(XML_AttrInfo));
if (attInfo == NULL) {
FREE(atts);
FREE(parser);
return NULL;
}
#endif
dataBuf = (XML_Char *)MALLOC(INIT_DATA_BUF_SIZE * sizeof(XML_Char));
if (dataBuf == NULL) {
FREE(atts);
#ifdef XML_ATTR_INFO
FREE(attInfo);
#endif
FREE(parser);
return NULL;
}
dataBufEnd = dataBuf + INIT_DATA_BUF_SIZE;
if (dtd)
_dtd = dtd;
else {
_dtd = dtdCreate(&parser->m_mem);
if (_dtd == NULL) {
FREE(dataBuf);
FREE(atts);
#ifdef XML_ATTR_INFO
FREE(attInfo);
#endif
FREE(parser);
return NULL;
}
}
freeBindingList = NULL;
freeTagList = NULL;
freeInternalEntities = NULL;
groupSize = 0;
groupConnector = NULL;
unknownEncodingHandler = NULL;
unknownEncodingHandlerData = NULL;
namespaceSeparator = ASCII_EXCL;
ns = XML_FALSE;
ns_triplets = XML_FALSE;
nsAtts = NULL;
nsAttsVersion = 0;
nsAttsPower = 0;
poolInit(&tempPool, &(parser->m_mem));
poolInit(&temp2Pool, &(parser->m_mem));
parserInit(parser, encodingName);
if (encodingName && !protocolEncodingName) {
XML_ParserFree(parser);
return NULL;
}
if (nameSep) {
ns = XML_TRUE;
internalEncoding = XmlGetInternalEncodingNS();
namespaceSeparator = *nameSep;
}
else {
internalEncoding = XmlGetInternalEncoding();
}
return parser;
}
static void
parserInit(XML_Parser parser, const XML_Char *encodingName)
{
processor = prologInitProcessor;
XmlPrologStateInit(&prologState);
protocolEncodingName = (encodingName != NULL
? poolCopyString(&tempPool, encodingName)
: NULL);
curBase = NULL;
XmlInitEncoding(&initEncoding, &encoding, 0);
userData = NULL;
handlerArg = NULL;
startElementHandler = NULL;
endElementHandler = NULL;
characterDataHandler = NULL;
processingInstructionHandler = NULL;
commentHandler = NULL;
startCdataSectionHandler = NULL;
endCdataSectionHandler = NULL;
defaultHandler = NULL;
startDoctypeDeclHandler = NULL;
endDoctypeDeclHandler = NULL;
unparsedEntityDeclHandler = NULL;
notationDeclHandler = NULL;
startNamespaceDeclHandler = NULL;
endNamespaceDeclHandler = NULL;
notStandaloneHandler = NULL;
externalEntityRefHandler = NULL;
externalEntityRefHandlerArg = parser;
skippedEntityHandler = NULL;
elementDeclHandler = NULL;
attlistDeclHandler = NULL;
entityDeclHandler = NULL;
xmlDeclHandler = NULL;
bufferPtr = buffer;
bufferEnd = buffer;
parseEndByteIndex = 0;
parseEndPtr = NULL;
declElementType = NULL;
declAttributeId = NULL;
declEntity = NULL;
doctypeName = NULL;
doctypeSysid = NULL;
doctypePubid = NULL;
declAttributeType = NULL;
declNotationName = NULL;
declNotationPublicId = NULL;
declAttributeIsCdata = XML_FALSE;
declAttributeIsId = XML_FALSE;
memset(&position, 0, sizeof(POSITION));
errorCode = XML_ERROR_NONE;
eventPtr = NULL;
eventEndPtr = NULL;
positionPtr = NULL;
openInternalEntities = NULL;
defaultExpandInternalEntities = XML_TRUE;
tagLevel = 0;
tagStack = NULL;
inheritedBindings = NULL;
nSpecifiedAtts = 0;
unknownEncodingMem = NULL;
unknownEncodingRelease = NULL;
unknownEncodingData = NULL;
parentParser = NULL;
ps_parsing = XML_INITIALIZED;
#ifdef XML_DTD
isParamEntity = XML_FALSE;
useForeignDTD = XML_FALSE;
paramEntityParsing = XML_PARAM_ENTITY_PARSING_NEVER;
#endif
hash_secret_salt = 0;
}
/* moves list of bindings to freeBindingList */
static void FASTCALL
moveToFreeBindingList(XML_Parser parser, BINDING *bindings)
{
while (bindings) {
BINDING *b = bindings;
bindings = bindings->nextTagBinding;
b->nextTagBinding = freeBindingList;
freeBindingList = b;
}
}
XML_Bool XMLCALL
XML_ParserReset(XML_Parser parser, const XML_Char *encodingName)
{
TAG *tStk;
OPEN_INTERNAL_ENTITY *openEntityList;
if (parser == NULL)
return XML_FALSE;
if (parentParser)
return XML_FALSE;
/* move tagStack to freeTagList */
tStk = tagStack;
while (tStk) {
TAG *tag = tStk;
tStk = tStk->parent;
tag->parent = freeTagList;
moveToFreeBindingList(parser, tag->bindings);
tag->bindings = NULL;
freeTagList = tag;
}
/* move openInternalEntities to freeInternalEntities */
openEntityList = openInternalEntities;
while (openEntityList) {
OPEN_INTERNAL_ENTITY *openEntity = openEntityList;
openEntityList = openEntity->next;
openEntity->next = freeInternalEntities;
freeInternalEntities = openEntity;
}
moveToFreeBindingList(parser, inheritedBindings);
FREE(unknownEncodingMem);
if (unknownEncodingRelease)
unknownEncodingRelease(unknownEncodingData);
poolClear(&tempPool);
poolClear(&temp2Pool);
parserInit(parser, encodingName);
dtdReset(_dtd, &parser->m_mem);
return XML_TRUE;
}
enum XML_Status XMLCALL
XML_SetEncoding(XML_Parser parser, const XML_Char *encodingName)
{
if (parser == NULL)
return XML_STATUS_ERROR;
/* Block after XML_Parse()/XML_ParseBuffer() has been called.
XXX There's no way for the caller to determine which of the
XXX possible error cases caused the XML_STATUS_ERROR return.
*/
if (ps_parsing == XML_PARSING || ps_parsing == XML_SUSPENDED)
return XML_STATUS_ERROR;
if (encodingName == NULL)
protocolEncodingName = NULL;
else {
protocolEncodingName = poolCopyString(&tempPool, encodingName);
if (!protocolEncodingName)
return XML_STATUS_ERROR;
}
return XML_STATUS_OK;
}
XML_Parser XMLCALL
XML_ExternalEntityParserCreate(XML_Parser oldParser,
const XML_Char *context,
const XML_Char *encodingName)
{
XML_Parser parser = oldParser;
DTD *newDtd = NULL;
DTD *oldDtd;
XML_StartElementHandler oldStartElementHandler;
XML_EndElementHandler oldEndElementHandler;
XML_CharacterDataHandler oldCharacterDataHandler;
XML_ProcessingInstructionHandler oldProcessingInstructionHandler;
XML_CommentHandler oldCommentHandler;
XML_StartCdataSectionHandler oldStartCdataSectionHandler;
XML_EndCdataSectionHandler oldEndCdataSectionHandler;
XML_DefaultHandler oldDefaultHandler;
XML_UnparsedEntityDeclHandler oldUnparsedEntityDeclHandler;
XML_NotationDeclHandler oldNotationDeclHandler;
XML_StartNamespaceDeclHandler oldStartNamespaceDeclHandler;
XML_EndNamespaceDeclHandler oldEndNamespaceDeclHandler;
XML_NotStandaloneHandler oldNotStandaloneHandler;
XML_ExternalEntityRefHandler oldExternalEntityRefHandler;
XML_SkippedEntityHandler oldSkippedEntityHandler;
XML_UnknownEncodingHandler oldUnknownEncodingHandler;
XML_ElementDeclHandler oldElementDeclHandler;
XML_AttlistDeclHandler oldAttlistDeclHandler;
XML_EntityDeclHandler oldEntityDeclHandler;
XML_XmlDeclHandler oldXmlDeclHandler;
ELEMENT_TYPE * oldDeclElementType;
void *oldUserData;
void *oldHandlerArg;
XML_Bool oldDefaultExpandInternalEntities;
XML_Parser oldExternalEntityRefHandlerArg;
#ifdef XML_DTD
enum XML_ParamEntityParsing oldParamEntityParsing;
int oldInEntityValue;
#endif
XML_Bool oldns_triplets;
/* Note that the new parser shares the same hash secret as the old
parser, so that dtdCopy and copyEntityTable can look up values
from hash tables associated with either parser without us having
to worry which hash secrets each table has.
*/
unsigned long oldhash_secret_salt;
/* Validate the oldParser parameter before we pull everything out of it */
if (oldParser == NULL)
return NULL;
/* Stash the original parser contents on the stack */
oldDtd = _dtd;
oldStartElementHandler = startElementHandler;
oldEndElementHandler = endElementHandler;
oldCharacterDataHandler = characterDataHandler;
oldProcessingInstructionHandler = processingInstructionHandler;
oldCommentHandler = commentHandler;
oldStartCdataSectionHandler = startCdataSectionHandler;
oldEndCdataSectionHandler = endCdataSectionHandler;
oldDefaultHandler = defaultHandler;
oldUnparsedEntityDeclHandler = unparsedEntityDeclHandler;
oldNotationDeclHandler = notationDeclHandler;
oldStartNamespaceDeclHandler = startNamespaceDeclHandler;
oldEndNamespaceDeclHandler = endNamespaceDeclHandler;
oldNotStandaloneHandler = notStandaloneHandler;
oldExternalEntityRefHandler = externalEntityRefHandler;
oldSkippedEntityHandler = skippedEntityHandler;
oldUnknownEncodingHandler = unknownEncodingHandler;
oldElementDeclHandler = elementDeclHandler;
oldAttlistDeclHandler = attlistDeclHandler;
oldEntityDeclHandler = entityDeclHandler;
oldXmlDeclHandler = xmlDeclHandler;
oldDeclElementType = declElementType;
oldUserData = userData;
oldHandlerArg = handlerArg;
oldDefaultExpandInternalEntities = defaultExpandInternalEntities;
oldExternalEntityRefHandlerArg = externalEntityRefHandlerArg;
#ifdef XML_DTD
oldParamEntityParsing = paramEntityParsing;
oldInEntityValue = prologState.inEntityValue;
#endif
oldns_triplets = ns_triplets;
/* Note that the new parser shares the same hash secret as the old
parser, so that dtdCopy and copyEntityTable can look up values
from hash tables associated with either parser without us having
to worry which hash secrets each table has.
*/
oldhash_secret_salt = hash_secret_salt;
#ifdef XML_DTD
if (!context)
newDtd = oldDtd;
#endif /* XML_DTD */
/* Note that the magical uses of the pre-processor to make field
access look more like C++ require that `parser' be overwritten
here. This makes this function more painful to follow than it
would be otherwise.
*/
if (ns) {
XML_Char tmp[2];
*tmp = namespaceSeparator;
parser = parserCreate(encodingName, &parser->m_mem, tmp, newDtd);
}
else {
parser = parserCreate(encodingName, &parser->m_mem, NULL, newDtd);
}
if (!parser)
return NULL;
startElementHandler = oldStartElementHandler;
endElementHandler = oldEndElementHandler;
characterDataHandler = oldCharacterDataHandler;
processingInstructionHandler = oldProcessingInstructionHandler;
commentHandler = oldCommentHandler;
startCdataSectionHandler = oldStartCdataSectionHandler;
endCdataSectionHandler = oldEndCdataSectionHandler;
defaultHandler = oldDefaultHandler;
unparsedEntityDeclHandler = oldUnparsedEntityDeclHandler;
notationDeclHandler = oldNotationDeclHandler;
startNamespaceDeclHandler = oldStartNamespaceDeclHandler;
endNamespaceDeclHandler = oldEndNamespaceDeclHandler;
notStandaloneHandler = oldNotStandaloneHandler;
externalEntityRefHandler = oldExternalEntityRefHandler;
skippedEntityHandler = oldSkippedEntityHandler;
unknownEncodingHandler = oldUnknownEncodingHandler;
elementDeclHandler = oldElementDeclHandler;
attlistDeclHandler = oldAttlistDeclHandler;
entityDeclHandler = oldEntityDeclHandler;
xmlDeclHandler = oldXmlDeclHandler;
declElementType = oldDeclElementType;
userData = oldUserData;
if (oldUserData == oldHandlerArg)
handlerArg = userData;
else
handlerArg = parser;
if (oldExternalEntityRefHandlerArg != oldParser)
externalEntityRefHandlerArg = oldExternalEntityRefHandlerArg;
defaultExpandInternalEntities = oldDefaultExpandInternalEntities;
ns_triplets = oldns_triplets;
hash_secret_salt = oldhash_secret_salt;
parentParser = oldParser;
#ifdef XML_DTD
paramEntityParsing = oldParamEntityParsing;
prologState.inEntityValue = oldInEntityValue;
if (context) {
#endif /* XML_DTD */
if (!dtdCopy(oldParser, _dtd, oldDtd, &parser->m_mem)
|| !setContext(parser, context)) {
XML_ParserFree(parser);
return NULL;
}
processor = externalEntityInitProcessor;
#ifdef XML_DTD
}
else {
/* The DTD instance referenced by _dtd is shared between the document's
root parser and external PE parsers, therefore one does not need to
call setContext. In addition, one also *must* not call setContext,
because this would overwrite existing prefix->binding pointers in
_dtd with ones that get destroyed with the external PE parser.
This would leave those prefixes with dangling pointers.
*/
isParamEntity = XML_TRUE;
XmlPrologStateInitExternalEntity(&prologState);
processor = externalParEntInitProcessor;
}
#endif /* XML_DTD */
return parser;
}
static void FASTCALL
destroyBindings(BINDING *bindings, XML_Parser parser)
{
for (;;) {
BINDING *b = bindings;
if (!b)
break;
bindings = b->nextTagBinding;
FREE(b->uri);
FREE(b);
}
}
void XMLCALL
XML_ParserFree(XML_Parser parser)
{
TAG *tagList;
OPEN_INTERNAL_ENTITY *entityList;
if (parser == NULL)
return;
/* free tagStack and freeTagList */
tagList = tagStack;
for (;;) {
TAG *p;
if (tagList == NULL) {
if (freeTagList == NULL)
break;
tagList = freeTagList;
freeTagList = NULL;
}
p = tagList;
tagList = tagList->parent;
FREE(p->buf);
destroyBindings(p->bindings, parser);
FREE(p);
}
/* free openInternalEntities and freeInternalEntities */
entityList = openInternalEntities;
for (;;) {
OPEN_INTERNAL_ENTITY *openEntity;
if (entityList == NULL) {
if (freeInternalEntities == NULL)
break;
entityList = freeInternalEntities;
freeInternalEntities = NULL;
}
openEntity = entityList;
entityList = entityList->next;
FREE(openEntity);
}
destroyBindings(freeBindingList, parser);
destroyBindings(inheritedBindings, parser);
poolDestroy(&tempPool);
poolDestroy(&temp2Pool);
#ifdef XML_DTD
/* external parameter entity parsers share the DTD structure
parser->m_dtd with the root parser, so we must not destroy it
*/
if (!isParamEntity && _dtd)
#else
if (_dtd)
#endif /* XML_DTD */
dtdDestroy(_dtd, (XML_Bool)!parentParser, &parser->m_mem);
FREE((void *)atts);
#ifdef XML_ATTR_INFO
FREE((void *)attInfo);
#endif
FREE(groupConnector);
FREE(buffer);
FREE(dataBuf);
FREE(nsAtts);
FREE(unknownEncodingMem);
if (unknownEncodingRelease)
unknownEncodingRelease(unknownEncodingData);
FREE(parser);
}
void XMLCALL
XML_UseParserAsHandlerArg(XML_Parser parser)
{
if (parser != NULL)
handlerArg = parser;
}
enum XML_Error XMLCALL
XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD)
{
if (parser == NULL)
return XML_ERROR_INVALID_ARGUMENT;
#ifdef XML_DTD
/* block after XML_Parse()/XML_ParseBuffer() has been called */
if (ps_parsing == XML_PARSING || ps_parsing == XML_SUSPENDED)
return XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING;
useForeignDTD = useDTD;
return XML_ERROR_NONE;
#else
return XML_ERROR_FEATURE_REQUIRES_XML_DTD;
#endif
}
void XMLCALL
XML_SetReturnNSTriplet(XML_Parser parser, int do_nst)
{
if (parser == NULL)
return;
/* block after XML_Parse()/XML_ParseBuffer() has been called */
if (ps_parsing == XML_PARSING || ps_parsing == XML_SUSPENDED)
return;
ns_triplets = do_nst ? XML_TRUE : XML_FALSE;
}
void XMLCALL
XML_SetUserData(XML_Parser parser, void *p)
{
if (parser == NULL)
return;
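/* If handlers currently receive userData, keep handlerArg in step with
   the new value; otherwise (for example after XML_UseParserAsHandlerArg)
   leave handlerArg alone and only update userData.
*/
if (handlerArg == userData)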
handlerArg = userData = p;
else
userData = p;
}
enum XML_Status XMLCALL
XML_SetBase(XML_Parser parser, const XML_Char *p)
{
if (parser == NULL)
return XML_STATUS_ERROR;
if (p) {
p = poolCopyString(&_dtd->pool, p);
if (!p)
return XML_STATUS_ERROR;
curBase = p;
}
else
curBase = NULL;
return XML_STATUS_OK;
}
const XML_Char * XMLCALL
XML_GetBase(XML_Parser parser)
{
if (parser == NULL)
return NULL;
return curBase;
}
int XMLCALL
XML_GetSpecifiedAttributeCount(XML_Parser parser)
{
if (parser == NULL)
return -1;
return nSpecifiedAtts;
}
int XMLCALL
XML_GetIdAttributeIndex(XML_Parser parser)
{
if (parser == NULL)
return -1;
return idAttIndex;
}
#ifdef XML_ATTR_INFO
const XML_AttrInfo * XMLCALL
XML_GetAttributeInfo(XML_Parser parser)
{
if (parser == NULL)
return NULL;
return attInfo;
}
#endif
void XMLCALL
XML_SetElementHandler(XML_Parser parser,
XML_StartElementHandler start,
XML_EndElementHandler end)
{
if (parser == NULL)
return;
startElementHandler = start;
endElementHandler = end;
}
void XMLCALL
XML_SetStartElementHandler(XML_Parser parser,
XML_StartElementHandler start) {
if (parser != NULL)
startElementHandler = start;
}
void XMLCALL
XML_SetEndElementHandler(XML_Parser parser,
XML_EndElementHandler end) {
if (parser != NULL)
endElementHandler = end;
}
void XMLCALL
XML_SetCharacterDataHandler(XML_Parser parser,
XML_CharacterDataHandler handler)
{
if (parser != NULL)
characterDataHandler = handler;
}
void XMLCALL
XML_SetProcessingInstructionHandler(XML_Parser parser,
XML_ProcessingInstructionHandler handler)
{
if (parser != NULL)
processingInstructionHandler = handler;
}
void XMLCALL
XML_SetCommentHandler(XML_Parser parser,
XML_CommentHandler handler)
{
if (parser != NULL)
commentHandler = handler;
}
void XMLCALL
XML_SetCdataSectionHandler(XML_Parser parser,
XML_StartCdataSectionHandler start,
XML_EndCdataSectionHandler end)
{
if (parser == NULL)
return;
startCdataSectionHandler = start;
endCdataSectionHandler = end;
}
void XMLCALL
XML_SetStartCdataSectionHandler(XML_Parser parser,
XML_StartCdataSectionHandler start) {
if (parser != NULL)
startCdataSectionHandler = start;
}
void XMLCALL
XML_SetEndCdataSectionHandler(XML_Parser parser,
XML_EndCdataSectionHandler end) {
if (parser != NULL)
endCdataSectionHandler = end;
}
void XMLCALL
XML_SetDefaultHandler(XML_Parser parser,
XML_DefaultHandler handler)
{
if (parser == NULL)
return;
defaultHandler = handler;
defaultExpandInternalEntities = XML_FALSE;
}
void XMLCALL
XML_SetDefaultHandlerExpand(XML_Parser parser,
XML_DefaultHandler handler)
{
if (parser == NULL)
return;
defaultHandler = handler;
defaultExpandInternalEntities = XML_TRUE;
}
void XMLCALL
XML_SetDoctypeDeclHandler(XML_Parser parser,
XML_StartDoctypeDeclHandler start,
XML_EndDoctypeDeclHandler end)
{
if (parser == NULL)
return;
startDoctypeDeclHandler = start;
endDoctypeDeclHandler = end;
}
void XMLCALL
XML_SetStartDoctypeDeclHandler(XML_Parser parser,
XML_StartDoctypeDeclHandler start) {
if (parser != NULL)
startDoctypeDeclHandler = start;
}
void XMLCALL
XML_SetEndDoctypeDeclHandler(XML_Parser parser,
XML_EndDoctypeDeclHandler end) {
if (parser != NULL)
endDoctypeDeclHandler = end;
}
void XMLCALL
XML_SetUnparsedEntityDeclHandler(XML_Parser parser,
XML_UnparsedEntityDeclHandler handler)
{
if (parser != NULL)
unparsedEntityDeclHandler = handler;
}
void XMLCALL
XML_SetNotationDeclHandler(XML_Parser parser,
XML_NotationDeclHandler handler)
{
if (parser != NULL)
notationDeclHandler = handler;
}
void XMLCALL
XML_SetNamespaceDeclHandler(XML_Parser parser,
XML_StartNamespaceDeclHandler start,
XML_EndNamespaceDeclHandler end)
{
if (parser == NULL)
return;
startNamespaceDeclHandler = start;
endNamespaceDeclHandler = end;
}
void XMLCALL
XML_SetStartNamespaceDeclHandler(XML_Parser parser,
XML_StartNamespaceDeclHandler start) {
if (parser != NULL)
startNamespaceDeclHandler = start;
}
void XMLCALL
XML_SetEndNamespaceDeclHandler(XML_Parser parser,
XML_EndNamespaceDeclHandler end) {
if (parser != NULL)
endNamespaceDeclHandler = end;
}
void XMLCALL
XML_SetNotStandaloneHandler(XML_Parser parser,
XML_NotStandaloneHandler handler)
{
if (parser != NULL)
notStandaloneHandler = handler;
}
void XMLCALL
XML_SetExternalEntityRefHandler(XML_Parser parser,
XML_ExternalEntityRefHandler handler)
{
if (parser != NULL)
externalEntityRefHandler = handler;
}
void XMLCALL
XML_SetExternalEntityRefHandlerArg(XML_Parser parser, void *arg)
{
if (parser == NULL)
return;
/* a NULL arg restores the default, which is the parser itself */
if (arg)
externalEntityRefHandlerArg = (XML_Parser)arg;
else
externalEntityRefHandlerArg = parser;
}
void XMLCALL
XML_SetSkippedEntityHandler(XML_Parser parser,
XML_SkippedEntityHandler handler)
{
if (parser != NULL)
skippedEntityHandler = handler;
}
void XMLCALL
XML_SetUnknownEncodingHandler(XML_Parser parser,
XML_UnknownEncodingHandler handler,
void *data)
{
if (parser == NULL)
return;
unknownEncodingHandler = handler;
unknownEncodingHandlerData = data;
}
void XMLCALL
XML_SetElementDeclHandler(XML_Parser parser,
XML_ElementDeclHandler eldecl)
{
if (parser != NULL)
elementDeclHandler = eldecl;
}
void XMLCALL
XML_SetAttlistDeclHandler(XML_Parser parser,
XML_AttlistDeclHandler attdecl)
{
if (parser != NULL)
attlistDeclHandler = attdecl;
}
void XMLCALL
XML_SetEntityDeclHandler(XML_Parser parser,
XML_EntityDeclHandler handler)
{
if (parser != NULL)
entityDeclHandler = handler;
}
void XMLCALL
XML_SetXmlDeclHandler(XML_Parser parser,
XML_XmlDeclHandler handler) {
if (parser != NULL)
xmlDeclHandler = handler;
}
int XMLCALL
XML_SetParamEntityParsing(XML_Parser parser,
enum XML_ParamEntityParsing peParsing)
{
if (parser == NULL)
return 0;
/* block after XML_Parse()/XML_ParseBuffer() has been called */
if (ps_parsing == XML_PARSING || ps_parsing == XML_SUSPENDED)
return 0;
#ifdef XML_DTD
paramEntityParsing = peParsing;
return 1;
#else
return peParsing == XML_PARAM_ENTITY_PARSING_NEVER;
#endif
}
int XMLCALL
XML_SetHashSalt(XML_Parser parser,
unsigned long hash_salt)
{
if (parser == NULL)
return 0;
if (parser->m_parentParser)
return XML_SetHashSalt(parser->m_parentParser, hash_salt);
/* block after XML_Parse()/XML_ParseBuffer() has been called */
if (ps_parsing == XML_PARSING || ps_parsing == XML_SUSPENDED)
return 0;
hash_secret_salt = hash_salt;
return 1;
}
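/* Usage sketch (illustrative only, kept out of the build with #if 0 and
   intended for a caller's own source file): XML_SetHashSalt() above
   refuses to change the salt once parsing is under way, so a caller that
   wants its own salt would set it right after creating the parser.
   Judging by startParsing(), a salt of 0 means "generate one when parsing
   starts". The helper name create_parser_with_salt is an assumption made
   for this example.
*/
#if 0
#include "expat.h"

static XML_Parser
create_parser_with_salt(unsigned long salt)
{
  XML_Parser p = XML_ParserCreate(NULL);
  if (p == NULL)
    return NULL;
  /* XML_SetHashSalt returns 1 on success and 0 on failure */
  if (!XML_SetHashSalt(p, salt)) {
    XML_ParserFree(p);
    return NULL;
  }
  return p;
}
#endif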
enum XML_Status XMLCALL
XML_Parse(XML_Parser parser, const char *s, int len, int isFinal)
{
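if ((parser == NULL) || (len < 0) || ((s == NULL) && (len != 0))) {
/* errorCode is a field of *parser, so it can only be recorded when
   parser is non-NULL
*/
if (parser != NULL)
errorCode = XML_ERROR_INVALID_ARGUMENT;
return XML_STATUS_ERROR;
}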
switch (ps_parsing) {
case XML_SUSPENDED:
errorCode = XML_ERROR_SUSPENDED;
return XML_STATUS_ERROR;
case XML_FINISHED:
errorCode = XML_ERROR_FINISHED;
return XML_STATUS_ERROR;
case XML_INITIALIZED:
if (parentParser == NULL && !startParsing(parser)) {
errorCode = XML_ERROR_NO_MEMORY;
return XML_STATUS_ERROR;
}
/* fall through */
default:
ps_parsing = XML_PARSING;
}
if (len == 0) {
ps_finalBuffer = (XML_Bool)isFinal;
if (!isFinal)
return XML_STATUS_OK;
positionPtr = bufferPtr;
parseEndPtr = bufferEnd;
/* If data are left over from last buffer, and we now know that these
data are the final chunk of input, then we have to check them again
to detect errors based on that fact.
*/
errorCode = processor(parser, bufferPtr, parseEndPtr, &bufferPtr);
if (errorCode == XML_ERROR_NONE) {
switch (ps_parsing) {
case XML_SUSPENDED:
XmlUpdatePosition(encoding, positionPtr, bufferPtr, &position);
positionPtr = bufferPtr;
return XML_STATUS_SUSPENDED;
case XML_INITIALIZED:
case XML_PARSING:
ps_parsing = XML_FINISHED;
/* fall through */
default:
return XML_STATUS_OK;
}
}
eventEndPtr = eventPtr;
processor = errorProcessor;
return XML_STATUS_ERROR;
}
#ifndef XML_CONTEXT_BYTES
else if (bufferPtr == bufferEnd) {
const char *end;
int nLeftOver;
enum XML_Status result;
/* Detect overflow (a+b > MAX <==> b > MAX-a) */
if (len > ((XML_Size)-1) / 2 - parseEndByteIndex) {
errorCode = XML_ERROR_NO_MEMORY;
eventPtr = eventEndPtr = NULL;
processor = errorProcessor;
return XML_STATUS_ERROR;
}
parseEndByteIndex += len;
positionPtr = s;
ps_finalBuffer = (XML_Bool)isFinal;
errorCode = processor(parser, s, parseEndPtr = s + len, &end);
if (errorCode != XML_ERROR_NONE) {
eventEndPtr = eventPtr;
processor = errorProcessor;
return XML_STATUS_ERROR;
}
else {
switch (ps_parsing) {
case XML_SUSPENDED:
result = XML_STATUS_SUSPENDED;
break;
case XML_INITIALIZED:
case XML_PARSING:
if (isFinal) {
ps_parsing = XML_FINISHED;
return XML_STATUS_OK;
}
/* fall through */
default:
result = XML_STATUS_OK;
}
}
XmlUpdatePosition(encoding, positionPtr, end, &position);
nLeftOver = s + len - end;
if (nLeftOver) {
if (buffer == NULL || nLeftOver > bufferLim - buffer) {
/* avoid _signed_ integer overflow */
char *temp = NULL;
const int bytesToAllocate = (int)((unsigned)len * 2U);
if (bytesToAllocate > 0) {
temp = (buffer == NULL
? (char *)MALLOC(bytesToAllocate)
: (char *)REALLOC(buffer, bytesToAllocate));
}
if (temp == NULL) {
errorCode = XML_ERROR_NO_MEMORY;
eventPtr = eventEndPtr = NULL;
processor = errorProcessor;
return XML_STATUS_ERROR;
}
buffer = temp;
bufferLim = buffer + bytesToAllocate;
}
memcpy(buffer, end, nLeftOver);
}
bufferPtr = buffer;
bufferEnd = buffer + nLeftOver;
positionPtr = bufferPtr;
parseEndPtr = bufferEnd;
eventPtr = bufferPtr;
eventEndPtr = bufferPtr;
return result;
}
#endif /* not defined XML_CONTEXT_BYTES */
else {
void *buff = XML_GetBuffer(parser, len);
if (buff == NULL)
return XML_STATUS_ERROR;
else {
memcpy(buff, s, len);
return XML_ParseBuffer(parser, len, isFinal);
}
}
}
enum XML_Status XMLCALL
XML_ParseBuffer(XML_Parser parser, int len, int isFinal)
{
const char *start;
enum XML_Status result = XML_STATUS_OK;
if (parser == NULL)
return XML_STATUS_ERROR;
switch (ps_parsing) {
case XML_SUSPENDED:
errorCode = XML_ERROR_SUSPENDED;
return XML_STATUS_ERROR;
case XML_FINISHED:
errorCode = XML_ERROR_FINISHED;
return XML_STATUS_ERROR;
case XML_INITIALIZED:
if (parentParser == NULL && !startParsing(parser)) {
errorCode = XML_ERROR_NO_MEMORY;
return XML_STATUS_ERROR;
}
/* fall through */
default:
ps_parsing = XML_PARSING;
}
start = bufferPtr;
positionPtr = start;
bufferEnd += len;
parseEndPtr = bufferEnd;
parseEndByteIndex += len;
ps_finalBuffer = (XML_Bool)isFinal;
errorCode = processor(parser, start, parseEndPtr, &bufferPtr);
if (errorCode != XML_ERROR_NONE) {
eventEndPtr = eventPtr;
processor = errorProcessor;
return XML_STATUS_ERROR;
}
else {
switch (ps_parsing) {
case XML_SUSPENDED:
result = XML_STATUS_SUSPENDED;
break;
case XML_INITIALIZED:
case XML_PARSING:
if (isFinal) {
ps_parsing = XML_FINISHED;
return result;
}
/* fall through */
default: ; /* should not happen */
}
}
XmlUpdatePosition(encoding, positionPtr, bufferPtr, &position);
positionPtr = bufferPtr;
return result;
}
void * XMLCALL
XML_GetBuffer(XML_Parser parser, int len)
{
if (parser == NULL)
return NULL;
if (len < 0) {
errorCode = XML_ERROR_NO_MEMORY;
return NULL;
}
switch (ps_parsing) {
case XML_SUSPENDED:
errorCode = XML_ERROR_SUSPENDED;
return NULL;
case XML_FINISHED:
errorCode = XML_ERROR_FINISHED;
return NULL;
default: ;
}
if (len > bufferLim - bufferEnd) {
#ifdef XML_CONTEXT_BYTES
int keep;
#endif /* defined XML_CONTEXT_BYTES */
/* Do not invoke signed arithmetic overflow: */
int neededSize = (int) ((unsigned)len + (unsigned)(bufferEnd - bufferPtr));
if (neededSize < 0) {
errorCode = XML_ERROR_NO_MEMORY;
return NULL;
}
#ifdef XML_CONTEXT_BYTES
keep = (int)(bufferPtr - buffer);
if (keep > XML_CONTEXT_BYTES)
keep = XML_CONTEXT_BYTES;
neededSize += keep;
#endif /* defined XML_CONTEXT_BYTES */
if (neededSize <= bufferLim - buffer) {
#ifdef XML_CONTEXT_BYTES
if (keep < bufferPtr - buffer) {
int offset = (int)(bufferPtr - buffer) - keep;
memmove(buffer, &buffer[offset], bufferEnd - bufferPtr + keep);
bufferEnd -= offset;
bufferPtr -= offset;
}
#else
memmove(buffer, bufferPtr, bufferEnd - bufferPtr);
bufferEnd = buffer + (bufferEnd - bufferPtr);
bufferPtr = buffer;
#endif /* not defined XML_CONTEXT_BYTES */
}
else {
char *newBuf;
int bufferSize = (int)(bufferLim - bufferPtr);
if (bufferSize == 0)
bufferSize = INIT_BUFFER_SIZE;
do {
/* Do not invoke signed arithmetic overflow: */
bufferSize = (int) (2U * (unsigned) bufferSize);
} while (bufferSize < neededSize && bufferSize > 0);
if (bufferSize <= 0) {
errorCode = XML_ERROR_NO_MEMORY;
return NULL;
}
newBuf = (char *)MALLOC(bufferSize);
if (newBuf == 0) {
errorCode = XML_ERROR_NO_MEMORY;
return NULL;
}
bufferLim = newBuf + bufferSize;
#ifdef XML_CONTEXT_BYTES
if (bufferPtr) {
int keep = (int)(bufferPtr - buffer);
if (keep > XML_CONTEXT_BYTES)
keep = XML_CONTEXT_BYTES;
memcpy(newBuf, &bufferPtr[-keep], bufferEnd - bufferPtr + keep);
FREE(buffer);
buffer = newBuf;
bufferEnd = buffer + (bufferEnd - bufferPtr) + keep;
bufferPtr = buffer + keep;
}
else {
bufferEnd = newBuf + (bufferEnd - bufferPtr);
bufferPtr = buffer = newBuf;
}
#else
if (bufferPtr) {
memcpy(newBuf, bufferPtr, bufferEnd - bufferPtr);
FREE(buffer);
}
bufferEnd = newBuf + (bufferEnd - bufferPtr);
bufferPtr = buffer = newBuf;
#endif /* not defined XML_CONTEXT_BYTES */
}
eventPtr = eventEndPtr = NULL;
positionPtr = NULL;
}
return bufferEnd;
}
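/* Usage sketch (illustrative only, kept out of the build with #if 0 and
   intended for a caller's own source file): one way a caller might drive
   XML_GetBuffer() and XML_ParseBuffer() from a stdio stream, so that
   input is read straight into the parser's buffer without an extra copy.
   The names parse_stream and CHUNK_SIZE are assumptions made for this
   example. If a handler suspends the parser, the next XML_GetBuffer()
   call returns NULL and this helper reports failure; a caller that uses
   XML_StopParser() would need its own resume logic.
*/
#if 0
#include <stdio.h>
#include "expat.h"

#define CHUNK_SIZE 8192

/* Returns 1 if the whole stream parsed cleanly, 0 otherwise. */
static int
parse_stream(XML_Parser p, FILE *in)
{
  for (;;) {
    size_t nread;
    int done;
    void *chunk = XML_GetBuffer(p, CHUNK_SIZE);
    if (chunk == NULL)
      return 0; /* out of memory, or parser suspended or finished */
    nread = fread(chunk, 1, CHUNK_SIZE, in);
    if (ferror(in))
      return 0;
    done = feof(in);
    /* nread may be 0 on the final call; that is a valid length */
    if (XML_ParseBuffer(p, (int)nread, done) == XML_STATUS_ERROR)
      return 0; /* details are available via XML_GetErrorCode(p) */
    if (done)
      return 1;
  }
}
#endif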
enum XML_Status XMLCALL
XML_StopParser(XML_Parser parser, XML_Bool resumable)
{
if (parser == NULL)
return XML_STATUS_ERROR;
switch (ps_parsing) {
case XML_SUSPENDED:
if (resumable) {
errorCode = XML_ERROR_SUSPENDED;
return XML_STATUS_ERROR;
}
ps_parsing = XML_FINISHED;
break;
case XML_FINISHED:
errorCode = XML_ERROR_FINISHED;
return XML_STATUS_ERROR;
default:
if (resumable) {
#ifdef XML_DTD
if (isParamEntity) {
errorCode = XML_ERROR_SUSPEND_PE;
return XML_STATUS_ERROR;
}
#endif
ps_parsing = XML_SUSPENDED;
}
else
ps_parsing = XML_FINISHED;
}
return XML_STATUS_OK;
}
enum XML_Status XMLCALL
XML_ResumeParser(XML_Parser parser)
{
enum XML_Status result = XML_STATUS_OK;
if (parser == NULL)
return XML_STATUS_ERROR;
if (ps_parsing != XML_SUSPENDED) {
errorCode = XML_ERROR_NOT_SUSPENDED;
return XML_STATUS_ERROR;
}
ps_parsing = XML_PARSING;
errorCode = processor(parser, bufferPtr, parseEndPtr, &bufferPtr);
if (errorCode != XML_ERROR_NONE) {
eventEndPtr = eventPtr;
processor = errorProcessor;
return XML_STATUS_ERROR;
}
else {
switch (ps_parsing) {
case XML_SUSPENDED:
result = XML_STATUS_SUSPENDED;
break;
case XML_INITIALIZED:
case XML_PARSING:
if (ps_finalBuffer) {
ps_parsing = XML_FINISHED;
return result;
}
/* fall through */
default: ;
}
}
XmlUpdatePosition(encoding, positionPtr, bufferPtr, &position);
positionPtr = bufferPtr;
return result;
}
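/* Usage sketch (illustrative only, kept out of the build with #if 0 and
   intended for a caller's own source file): how XML_StopParser() and
   XML_ResumeParser() might be combined. A start-element handler suspends
   the parser at every element, and the caller resumes until the status
   is no longer XML_STATUS_SUSPENDED. The names pause_on_start and
   parse_with_pauses are assumptions made for this example.
*/
#if 0
#include "expat.h"

static void XMLCALL
pause_on_start(void *arg, const XML_Char *name, const XML_Char **attrs)
{
  (void)name;
  (void)attrs;
  /* arg is the parser because of XML_UseParserAsHandlerArg() below */
  XML_StopParser((XML_Parser)arg, XML_TRUE);
}

static enum XML_Status
parse_with_pauses(XML_Parser p, const char *doc, int len)
{
  enum XML_Status st;
  XML_UseParserAsHandlerArg(p);
  XML_SetStartElementHandler(p, pause_on_start);
  st = XML_Parse(p, doc, len, 1);
  while (st == XML_STATUS_SUSPENDED) {
    /* the caller would do its own work here before continuing */
    st = XML_ResumeParser(p);
  }
  return st; /* XML_STATUS_OK or XML_STATUS_ERROR */
}
#endif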
void XMLCALL
XML_GetParsingStatus(XML_Parser parser, XML_ParsingStatus *status)
{
if (parser == NULL)
return;
assert(status != NULL);
*status = parser->m_parsingStatus;
}
enum XML_Error XMLCALL
XML_GetErrorCode(XML_Parser parser)
{
if (parser == NULL)
return XML_ERROR_INVALID_ARGUMENT;
return errorCode;
}
XML_Index XMLCALL
XML_GetCurrentByteIndex(XML_Parser parser)
{
if (parser == NULL)
return -1;
if (eventPtr)
return (XML_Index)(parseEndByteIndex - (parseEndPtr - eventPtr));
return -1;
}
int XMLCALL
XML_GetCurrentByteCount(XML_Parser parser)
{
if (parser == NULL)
return 0;
if (eventEndPtr && eventPtr)
return (int)(eventEndPtr - eventPtr);
return 0;
}
const char * XMLCALL
XML_GetInputContext(XML_Parser parser, int *offset, int *size)
{
#ifdef XML_CONTEXT_BYTES
if (parser == NULL)
return NULL;
if (eventPtr && buffer) {
if (offset != NULL)
*offset = (int)(eventPtr - buffer);
if (size != NULL)
*size = (int)(bufferEnd - buffer);
return buffer;
}
#else
(void)parser;
(void)offset;
(void)size;
#endif /* defined XML_CONTEXT_BYTES */
return (char *) 0;
}
XML_Size XMLCALL
XML_GetCurrentLineNumber(XML_Parser parser)
{
if (parser == NULL)
return 0;
if (eventPtr && eventPtr >= positionPtr) {
XmlUpdatePosition(encoding, positionPtr, eventPtr, &position);
positionPtr = eventPtr;
}
return position.lineNumber + 1;
}
XML_Size XMLCALL
XML_GetCurrentColumnNumber(XML_Parser parser)
{
if (parser == NULL)
return 0;
if (eventPtr && eventPtr >= positionPtr) {
XmlUpdatePosition(encoding, positionPtr, eventPtr, &position);
positionPtr = eventPtr;
}
return position.columnNumber;
}
void XMLCALL
XML_FreeContentModel(XML_Parser parser, XML_Content *model)
{
if (parser != NULL)
FREE(model);
}
void * XMLCALL
XML_MemMalloc(XML_Parser parser, size_t size)
{
if (parser == NULL)
return NULL;
return MALLOC(size);
}
void * XMLCALL
XML_MemRealloc(XML_Parser parser, void *ptr, size_t size)
{
if (parser == NULL)
return NULL;
return REALLOC(ptr, size);
}
void XMLCALL
XML_MemFree(XML_Parser parser, void *ptr)
{
if (parser != NULL)
FREE(ptr);
}
void XMLCALL
XML_DefaultCurrent(XML_Parser parser)
{
if (parser == NULL)
return;
if (defaultHandler) {
if (openInternalEntities)
reportDefault(parser,
internalEncoding,
openInternalEntities->internalEventPtr,
openInternalEntities->internalEventEndPtr);
else
reportDefault(parser, encoding, eventPtr, eventEndPtr);
}
}
const XML_LChar * XMLCALL
XML_ErrorString(enum XML_Error code)
{
static const XML_LChar* const message[] = {
0,
XML_L("out of memory"),
XML_L("syntax error"),
XML_L("no element found"),
XML_L("not well-formed (invalid token)"),
XML_L("unclosed token"),
XML_L("partial character"),
XML_L("mismatched tag"),
XML_L("duplicate attribute"),
XML_L("junk after document element"),
XML_L("illegal parameter entity reference"),
XML_L("undefined entity"),
XML_L("recursive entity reference"),
XML_L("asynchronous entity"),
XML_L("reference to invalid character number"),
XML_L("reference to binary entity"),
XML_L("reference to external entity in attribute"),
XML_L("XML or text declaration not at start of entity"),
XML_L("unknown encoding"),
XML_L("encoding specified in XML declaration is incorrect"),
XML_L("unclosed CDATA section"),
XML_L("error in processing external entity reference"),
XML_L("document is not standalone"),
XML_L("unexpected parser state - please send a bug report"),
XML_L("entity declared in parameter entity"),
XML_L("requested feature requires XML_DTD support in Expat"),
XML_L("cannot change setting once parsing has begun"),
XML_L("unbound prefix"),
XML_L("must not undeclare prefix"),
XML_L("incomplete markup in parameter entity"),
XML_L("XML declaration not well-formed"),
XML_L("text declaration not well-formed"),
XML_L("illegal character(s) in public id"),
XML_L("parser suspended"),
XML_L("parser not suspended"),
XML_L("parsing aborted"),
XML_L("parsing finished"),
XML_L("cannot suspend in external parameter entity"),
XML_L("reserved prefix (xml) must not be undeclared or bound to another namespace name"),
XML_L("reserved prefix (xmlns) must not be declared or undeclared"),
XML_L("prefix must not be bound to one of the reserved namespace names")
};
if (code > 0 && code < sizeof(message)/sizeof(message[0]))
return message[code];
return NULL;
}
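/* Usage sketch (illustrative only, compiled out): one way an application
   might combine XML_ErrorString() with the position accessors above to
   report a failed parse. The helper name example_report_error is
   hypothetical; the "%s" format assumes XML_LChar is char (that is, no
   XML_UNICODE_WCHAR_T), and the casts assume the positions fit in an
   unsigned long.
*/
#if 0
#include <stdio.h>
#include "expat.h"

static int
example_report_error(XML_Parser parser, const char *buf, int len)
{
  /* Parse the whole document in one call (isFinal = XML_TRUE). */
  if (XML_Parse(parser, buf, len, XML_TRUE) == XML_STATUS_ERROR) {
    fprintf(stderr, "%s at line %lu, column %lu\n",
            XML_ErrorString(XML_GetErrorCode(parser)),
            (unsigned long)XML_GetCurrentLineNumber(parser),
            (unsigned long)XML_GetCurrentColumnNumber(parser));
    return 0;
  }
  return 1;
}
#endif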
const XML_LChar * XMLCALL
XML_ExpatVersion(void) {
/* V1 is used to string-ize the version number. However, it would
string-ize the actual version macro *names* unless we get them
substituted before being passed to V1. CPP is defined to expand
a macro, then rescan for more expansions. Thus, we use V2 to expand
the version macros, then CPP will expand the resulting V1() macro
with the correct numerals. */
/* ### I'm assuming cpp is portable in this respect... */
#define V1(a,b,c) XML_L(#a)XML_L(".")XML_L(#b)XML_L(".")XML_L(#c)
#define V2(a,b,c) XML_L("expat_")V1(a,b,c)
return V2(XML_MAJOR_VERSION, XML_MINOR_VERSION, XML_MICRO_VERSION);
#undef V1
#undef V2
}
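/* Illustrative sketch (compiled out) of the two-level expansion described
   in XML_ExpatVersion() above, using plain string literals instead of
   XML_L(). All EX_* names and example_version are hypothetical and exist
   only in this sketch.
*/
#if 0
#define EX_MAJOR 2
#define EX_MINOR 2
#define EX_MICRO 1
/* EX_V1 stringizes its arguments exactly as written. */
#define EX_V1(a,b,c) #a "." #b "." #c
/* EX_V2 does not apply # to its arguments, so they are macro-expanded
   before the resulting EX_V1() invocation is rescanned.
*/
#define EX_V2(a,b,c) "expat_" EX_V1(a,b,c)

static const char *
example_version(void)
{
  /* Expands to "expat_" "2" "." "2" "." "1", i.e. "expat_2.2.1".
     Calling EX_V1(EX_MAJOR, EX_MINOR, EX_MICRO) directly would instead
     yield "EX_MAJOR.EX_MINOR.EX_MICRO".
  */
  return EX_V2(EX_MAJOR, EX_MINOR, EX_MICRO);
}
#endif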
XML_Expat_Version XMLCALL
XML_ExpatVersionInfo(void)
{
XML_Expat_Version version;
version.major = XML_MAJOR_VERSION;
version.minor = XML_MINOR_VERSION;
version.micro = XML_MICRO_VERSION;
return version;
}
const XML_Feature * XMLCALL
XML_GetFeatureList(void)
{
static const XML_Feature features[] = {
{XML_FEATURE_SIZEOF_XML_CHAR, XML_L("sizeof(XML_Char)"),
sizeof(XML_Char)},
{XML_FEATURE_SIZEOF_XML_LCHAR, XML_L("sizeof(XML_LChar)"),
sizeof(XML_LChar)},
#ifdef XML_UNICODE
{XML_FEATURE_UNICODE, XML_L("XML_UNICODE"), 0},
#endif
#ifdef XML_UNICODE_WCHAR_T
{XML_FEATURE_UNICODE_WCHAR_T, XML_L("XML_UNICODE_WCHAR_T"), 0},
#endif
#ifdef XML_DTD
{XML_FEATURE_DTD, XML_L("XML_DTD"), 0},
#endif
#ifdef XML_CONTEXT_BYTES
{XML_FEATURE_CONTEXT_BYTES, XML_L("XML_CONTEXT_BYTES"),
XML_CONTEXT_BYTES},
#endif
#ifdef XML_MIN_SIZE
{XML_FEATURE_MIN_SIZE, XML_L("XML_MIN_SIZE"), 0},
#endif
#ifdef XML_NS
{XML_FEATURE_NS, XML_L("XML_NS"), 0},
#endif
#ifdef XML_LARGE_SIZE
{XML_FEATURE_LARGE_SIZE, XML_L("XML_LARGE_SIZE"), 0},
#endif
#ifdef XML_ATTR_INFO
{XML_FEATURE_ATTR_INFO, XML_L("XML_ATTR_INFO"), 0},
#endif
{XML_FEATURE_END, NULL, 0}
};
return features;
}
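/* Usage sketch (illustrative only, compiled out): walking the array
   returned by XML_GetFeatureList() up to its XML_FEATURE_END terminator.
   The helper name example_print_features is hypothetical, and "%s"
   assumes XML_LChar is char.
*/
#if 0
#include <stdio.h>
#include "expat.h"

static void
example_print_features(void)
{
  const XML_Feature *f;
  /* The last entry has feature == XML_FEATURE_END and a NULL name. */
  for (f = XML_GetFeatureList(); f->feature != XML_FEATURE_END; f++)
    printf("%s = %ld\n", f->name, f->value);
}
#endif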
/* Initially tag->rawName always points into the parse buffer;
for those TAG instances opened while the current parse buffer was
processed, and not yet closed, we need to store tag->rawName in a more
permanent location, since the parse buffer is about to be discarded.
*/
static XML_Bool
storeRawNames(XML_Parser parser)
{
TAG *tag = tagStack;
while (tag) {
int bufSize;
int nameLen = sizeof(XML_Char) * (tag->name.strLen + 1);
char *rawNameBuf = tag->buf + nameLen;
/* Stop if already stored. Since tagStack is a stack, we can stop
at the first entry that has already been copied; everything
below it in the stack has already been accounted for in a
previous call to this function.
*/
if (tag->rawName == rawNameBuf)
break;
/* For re-use purposes we need to ensure that the
size of tag->buf is a multiple of sizeof(XML_Char).
*/
bufSize = nameLen + ROUND_UP(tag->rawNameLength, sizeof(XML_Char));
if (bufSize > tag->bufEnd - tag->buf) {
char *temp = (char *)REALLOC(tag->buf, bufSize);
if (temp == NULL)
return XML_FALSE;
/* if tag->name.str points to tag->buf (only when namespace
processing is off) then we have to update it
*/
if (tag->name.str == (XML_Char *)tag->buf)
tag->name.str = (XML_Char *)temp;
/* if tag->name.localPart is set (when namespace processing is on)
then update it as well, since it will always point into tag->buf
*/
if (tag->name.localPart)
tag->name.localPart = (XML_Char *)temp + (tag->name.localPart -
(XML_Char *)tag->buf);
tag->buf = temp;
tag->bufEnd = temp + bufSize;
rawNameBuf = temp + nameLen;
}
memcpy(rawNameBuf, tag->rawName, tag->rawNameLength);
tag->rawName = rawNameBuf;
tag = tag->parent;
}
return XML_TRUE;
}
static enum XML_Error PTRCALL
contentProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
enum XML_Error result = doContent(parser, 0, encoding, start, end,
endPtr, (XML_Bool)!ps_finalBuffer);
if (result == XML_ERROR_NONE) {
if (!storeRawNames(parser))
return XML_ERROR_NO_MEMORY;
}
return result;
}
static enum XML_Error PTRCALL
externalEntityInitProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
enum XML_Error result = initializeEncoding(parser);
if (result != XML_ERROR_NONE)
return result;
processor = externalEntityInitProcessor2;
return externalEntityInitProcessor2(parser, start, end, endPtr);
}
static enum XML_Error PTRCALL
externalEntityInitProcessor2(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
const char *next = start; /* XmlContentTok doesn't always set the last arg */
int tok = XmlContentTok(encoding, start, end, &next);
switch (tok) {
case XML_TOK_BOM:
/* If we are at the end of the buffer, this would cause the next stage,
i.e. externalEntityInitProcessor3, to pass control directly to
doContent (by detecting XML_TOK_NONE) without processing any xml text
declaration - causing the error XML_ERROR_MISPLACED_XML_PI in doContent.
*/
if (next == end && !ps_finalBuffer) {
*endPtr = next;
return XML_ERROR_NONE;
}
start = next;
break;
case XML_TOK_PARTIAL:
if (!ps_finalBuffer) {
*endPtr = start;
return XML_ERROR_NONE;
}
eventPtr = start;
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (!ps_finalBuffer) {
*endPtr = start;
return XML_ERROR_NONE;
}
eventPtr = start;
return XML_ERROR_PARTIAL_CHAR;
}
processor = externalEntityInitProcessor3;
return externalEntityInitProcessor3(parser, start, end, endPtr);
}
static enum XML_Error PTRCALL
externalEntityInitProcessor3(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
int tok;
const char *next = start; /* XmlContentTok doesn't always set the last arg */
eventPtr = start;
tok = XmlContentTok(encoding, start, end, &next);
eventEndPtr = next;
switch (tok) {
case XML_TOK_XML_DECL:
{
enum XML_Error result;
result = processXmlDecl(parser, 1, start, next);
if (result != XML_ERROR_NONE)
return result;
switch (ps_parsing) {
case XML_SUSPENDED:
*endPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
return XML_ERROR_ABORTED;
default:
start = next;
}
}
break;
case XML_TOK_PARTIAL:
if (!ps_finalBuffer) {
*endPtr = start;
return XML_ERROR_NONE;
}
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (!ps_finalBuffer) {
*endPtr = start;
return XML_ERROR_NONE;
}
return XML_ERROR_PARTIAL_CHAR;
}
processor = externalEntityContentProcessor;
tagLevel = 1;
return externalEntityContentProcessor(parser, start, end, endPtr);
}
static enum XML_Error PTRCALL
externalEntityContentProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
enum XML_Error result = doContent(parser, 1, encoding, start, end,
endPtr, (XML_Bool)!ps_finalBuffer);
if (result == XML_ERROR_NONE) {
if (!storeRawNames(parser))
return XML_ERROR_NO_MEMORY;
}
return result;
}
static enum XML_Error
doContent(XML_Parser parser,
int startTagLevel,
const ENCODING *enc,
const char *s,
const char *end,
const char **nextPtr,
XML_Bool haveMore)
{
/* save one level of indirection */
DTD * const dtd = _dtd;
const char **eventPP;
const char **eventEndPP;
if (enc == encoding) {
eventPP = &eventPtr;
eventEndPP = &eventEndPtr;
}
else {
eventPP = &(openInternalEntities->internalEventPtr);
eventEndPP = &(openInternalEntities->internalEventEndPtr);
}
*eventPP = s;
for (;;) {
const char *next = s; /* XmlContentTok doesn't always set the last arg */
int tok = XmlContentTok(enc, s, end, &next);
*eventEndPP = next;
switch (tok) {
case XML_TOK_TRAILING_CR:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
*eventEndPP = end;
if (characterDataHandler) {
XML_Char c = 0xA;
characterDataHandler(handlerArg, &c, 1);
}
else if (defaultHandler)
reportDefault(parser, enc, s, end);
/* We are at the end of the final buffer, should we check for
XML_SUSPENDED, XML_FINISHED?
*/
if (startTagLevel == 0)
return XML_ERROR_NO_ELEMENTS;
if (tagLevel != startTagLevel)
return XML_ERROR_ASYNC_ENTITY;
*nextPtr = end;
return XML_ERROR_NONE;
case XML_TOK_NONE:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
if (startTagLevel > 0) {
if (tagLevel != startTagLevel)
return XML_ERROR_ASYNC_ENTITY;
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_NO_ELEMENTS;
case XML_TOK_INVALID:
*eventPP = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_PARTIAL_CHAR;
case XML_TOK_ENTITY_REF:
{
const XML_Char *name;
ENTITY *entity;
XML_Char ch = (XML_Char) XmlPredefinedEntityName(enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (ch) {
if (characterDataHandler)
characterDataHandler(handlerArg, &ch, 1);
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
}
name = poolStoreString(&dtd->pool, enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!name)
return XML_ERROR_NO_MEMORY;
entity = (ENTITY *)lookup(parser, &dtd->generalEntities, name, 0);
poolDiscard(&dtd->pool);
/* First, determine if a check for an existing declaration is needed;
if yes, check that the entity exists, and that it is internal,
otherwise call the skipped entity or default handler.
*/
if (!dtd->hasParamEntityRefs || dtd->standalone) {
if (!entity)
return XML_ERROR_UNDEFINED_ENTITY;
else if (!entity->is_internal)
return XML_ERROR_ENTITY_DECLARED_IN_PE;
}
else if (!entity) {
if (skippedEntityHandler)
skippedEntityHandler(handlerArg, name, 0);
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
}
if (entity->open)
return XML_ERROR_RECURSIVE_ENTITY_REF;
if (entity->notation)
return XML_ERROR_BINARY_ENTITY_REF;
if (entity->textPtr) {
enum XML_Error result;
if (!defaultExpandInternalEntities) {
if (skippedEntityHandler)
skippedEntityHandler(handlerArg, entity->name, 0);
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
}
result = processInternalEntity(parser, entity, XML_FALSE);
if (result != XML_ERROR_NONE)
return result;
}
else if (externalEntityRefHandler) {
const XML_Char *context;
entity->open = XML_TRUE;
context = getContext(parser);
entity->open = XML_FALSE;
if (!context)
return XML_ERROR_NO_MEMORY;
if (!externalEntityRefHandler(externalEntityRefHandlerArg,
context,
entity->base,
entity->systemId,
entity->publicId))
return XML_ERROR_EXTERNAL_ENTITY_HANDLING;
poolDiscard(&tempPool);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
}
case XML_TOK_START_TAG_NO_ATTS:
/* fall through */
case XML_TOK_START_TAG_WITH_ATTS:
{
TAG *tag;
enum XML_Error result;
XML_Char *toPtr;
if (freeTagList) {
tag = freeTagList;
freeTagList = freeTagList->parent;
}
else {
tag = (TAG *)MALLOC(sizeof(TAG));
if (!tag)
return XML_ERROR_NO_MEMORY;
tag->buf = (char *)MALLOC(INIT_TAG_BUF_SIZE);
if (!tag->buf) {
FREE(tag);
return XML_ERROR_NO_MEMORY;
}
tag->bufEnd = tag->buf + INIT_TAG_BUF_SIZE;
}
tag->bindings = NULL;
tag->parent = tagStack;
tagStack = tag;
tag->name.localPart = NULL;
tag->name.prefix = NULL;
tag->rawName = s + enc->minBytesPerChar;
tag->rawNameLength = XmlNameLength(enc, tag->rawName);
++tagLevel;
{
const char *rawNameEnd = tag->rawName + tag->rawNameLength;
const char *fromPtr = tag->rawName;
toPtr = (XML_Char *)tag->buf;
for (;;) {
int bufSize;
int convLen;
const enum XML_Convert_Result convert_res = XmlConvert(enc,
&fromPtr, rawNameEnd,
(ICHAR **)&toPtr, (ICHAR *)tag->bufEnd - 1);
convLen = (int)(toPtr - (XML_Char *)tag->buf);
if ((fromPtr >= rawNameEnd) || (convert_res == XML_CONVERT_INPUT_INCOMPLETE)) {
tag->name.strLen = convLen;
break;
}
bufSize = (int)(tag->bufEnd - tag->buf) << 1;
{
char *temp = (char *)REALLOC(tag->buf, bufSize);
if (temp == NULL)
return XML_ERROR_NO_MEMORY;
tag->buf = temp;
tag->bufEnd = temp + bufSize;
toPtr = (XML_Char *)temp + convLen;
}
}
}
tag->name.str = (XML_Char *)tag->buf;
*toPtr = XML_T('\0');
result = storeAtts(parser, enc, s, &(tag->name), &(tag->bindings));
if (result)
return result;
if (startElementHandler)
startElementHandler(handlerArg, tag->name.str,
(const XML_Char **)atts);
else if (defaultHandler)
reportDefault(parser, enc, s, next);
poolClear(&tempPool);
break;
}
case XML_TOK_EMPTY_ELEMENT_NO_ATTS:
/* fall through */
case XML_TOK_EMPTY_ELEMENT_WITH_ATTS:
{
const char *rawName = s + enc->minBytesPerChar;
enum XML_Error result;
BINDING *bindings = NULL;
XML_Bool noElmHandlers = XML_TRUE;
TAG_NAME name;
name.str = poolStoreString(&tempPool, enc, rawName,
rawName + XmlNameLength(enc, rawName));
if (!name.str)
return XML_ERROR_NO_MEMORY;
poolFinish(&tempPool);
result = storeAtts(parser, enc, s, &name, &bindings);
if (result != XML_ERROR_NONE) {
freeBindings(parser, bindings);
return result;
}
poolFinish(&tempPool);
if (startElementHandler) {
startElementHandler(handlerArg, name.str, (const XML_Char **)atts);
noElmHandlers = XML_FALSE;
}
if (endElementHandler) {
if (startElementHandler)
*eventPP = *eventEndPP;
endElementHandler(handlerArg, name.str);
noElmHandlers = XML_FALSE;
}
if (noElmHandlers && defaultHandler)
reportDefault(parser, enc, s, next);
poolClear(&tempPool);
freeBindings(parser, bindings);
}
if (tagLevel == 0)
return epilogProcessor(parser, next, end, nextPtr);
break;
case XML_TOK_END_TAG:
if (tagLevel == startTagLevel)
return XML_ERROR_ASYNC_ENTITY;
else {
int len;
const char *rawName;
TAG *tag = tagStack;
tagStack = tag->parent;
tag->parent = freeTagList;
freeTagList = tag;
rawName = s + enc->minBytesPerChar*2;
len = XmlNameLength(enc, rawName);
if (len != tag->rawNameLength
|| memcmp(tag->rawName, rawName, len) != 0) {
*eventPP = rawName;
return XML_ERROR_TAG_MISMATCH;
}
--tagLevel;
if (endElementHandler) {
const XML_Char *localPart;
const XML_Char *prefix;
XML_Char *uri;
localPart = tag->name.localPart;
if (ns && localPart) {
/* localPart and prefix may have been overwritten in
tag->name.str, since this points to the binding->uri
buffer which gets re-used; so we have to add them again
*/
uri = (XML_Char *)tag->name.str + tag->name.uriLen;
/* don't need to check for space - already done in storeAtts() */
while (*localPart) *uri++ = *localPart++;
prefix = (XML_Char *)tag->name.prefix;
if (ns_triplets && prefix) {
*uri++ = namespaceSeparator;
while (*prefix) *uri++ = *prefix++;
}
*uri = XML_T('\0');
}
endElementHandler(handlerArg, tag->name.str);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
while (tag->bindings) {
BINDING *b = tag->bindings;
if (endNamespaceDeclHandler)
endNamespaceDeclHandler(handlerArg, b->prefix->name);
tag->bindings = tag->bindings->nextTagBinding;
b->nextTagBinding = freeBindingList;
freeBindingList = b;
b->prefix->binding = b->prevPrefixBinding;
}
if (tagLevel == 0)
return epilogProcessor(parser, next, end, nextPtr);
}
break;
case XML_TOK_CHAR_REF:
{
int n = XmlCharRefNumber(enc, s);
if (n < 0)
return XML_ERROR_BAD_CHAR_REF;
if (characterDataHandler) {
XML_Char buf[XML_ENCODE_MAX];
characterDataHandler(handlerArg, buf, XmlEncode(n, (ICHAR *)buf));
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
}
break;
case XML_TOK_XML_DECL:
return XML_ERROR_MISPLACED_XML_PI;
case XML_TOK_DATA_NEWLINE:
if (characterDataHandler) {
XML_Char c = 0xA;
characterDataHandler(handlerArg, &c, 1);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
case XML_TOK_CDATA_SECT_OPEN:
{
enum XML_Error result;
if (startCdataSectionHandler)
startCdataSectionHandler(handlerArg);
#if 0
/* Suppose you are doing a transformation on a document that involves
changing only the character data. You set up a defaultHandler
and a characterDataHandler. The defaultHandler simply copies
characters through. The characterDataHandler does the
transformation and writes the characters out escaping them as
necessary. This case will fail to work if we leave out the
following two lines (because & and < inside CDATA sections will
be incorrectly escaped).
However, now we have a start/endCdataSectionHandler, so it seems
easier to let the user deal with this.
*/
else if (characterDataHandler)
characterDataHandler(handlerArg, dataBuf, 0);
#endif
else if (defaultHandler)
reportDefault(parser, enc, s, next);
result = doCdataSection(parser, enc, &next, end, nextPtr, haveMore);
if (result != XML_ERROR_NONE)
return result;
else if (!next) {
processor = cdataSectionProcessor;
return result;
}
}
break;
case XML_TOK_TRAILING_RSQB:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
if (characterDataHandler) {
if (MUST_CONVERT(enc, s)) {
ICHAR *dataPtr = (ICHAR *)dataBuf;
XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd);
characterDataHandler(handlerArg, dataBuf,
(int)(dataPtr - (ICHAR *)dataBuf));
}
else
characterDataHandler(handlerArg,
(XML_Char *)s,
(int)((XML_Char *)end - (XML_Char *)s));
}
else if (defaultHandler)
reportDefault(parser, enc, s, end);
/* We are at the end of the final buffer, should we check for
XML_SUSPENDED, XML_FINISHED?
*/
if (startTagLevel == 0) {
*eventPP = end;
return XML_ERROR_NO_ELEMENTS;
}
if (tagLevel != startTagLevel) {
*eventPP = end;
return XML_ERROR_ASYNC_ENTITY;
}
*nextPtr = end;
return XML_ERROR_NONE;
case XML_TOK_DATA_CHARS:
{
XML_CharacterDataHandler charDataHandler = characterDataHandler;
if (charDataHandler) {
if (MUST_CONVERT(enc, s)) {
for (;;) {
ICHAR *dataPtr = (ICHAR *)dataBuf;
const enum XML_Convert_Result convert_res = XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd);
*eventEndPP = s;
charDataHandler(handlerArg, dataBuf,
(int)(dataPtr - (ICHAR *)dataBuf));
if ((convert_res == XML_CONVERT_COMPLETED) || (convert_res == XML_CONVERT_INPUT_INCOMPLETE))
break;
*eventPP = s;
}
}
else
charDataHandler(handlerArg,
(XML_Char *)s,
(int)((XML_Char *)next - (XML_Char *)s));
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
}
break;
case XML_TOK_PI:
if (!reportProcessingInstruction(parser, enc, s, next))
return XML_ERROR_NO_MEMORY;
break;
case XML_TOK_COMMENT:
if (!reportComment(parser, enc, s, next))
return XML_ERROR_NO_MEMORY;
break;
default:
if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
}
*eventPP = s = next;
switch (ps_parsing) {
case XML_SUSPENDED:
*nextPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
return XML_ERROR_ABORTED;
default: ;
}
}
/* not reached */
}
/* This function does not call free() on the allocated memory, merely
* moving it to the parser's freeBindingList where it can be freed or
* reused as appropriate.
*/
static void
freeBindings(XML_Parser parser, BINDING *bindings)
{
while (bindings) {
BINDING *b = bindings;
/* startNamespaceDeclHandler will have been called for this
* binding in addBindings(), so call the end handler now.
*/
if (endNamespaceDeclHandler)
endNamespaceDeclHandler(handlerArg, b->prefix->name);
bindings = bindings->nextTagBinding;
b->nextTagBinding = freeBindingList;
freeBindingList = b;
b->prefix->binding = b->prevPrefixBinding;
}
}
/* Precondition: all arguments must be non-NULL.
Purpose:
- normalize attributes
- check attributes for well-formedness
- generate namespace aware attribute names (URI, prefix)
- build list of attributes for startElementHandler
- default attributes
- process namespace declarations (check and report them)
- generate namespace aware element name (URI, prefix)
*/
static enum XML_Error
storeAtts(XML_Parser parser, const ENCODING *enc,
const char *attStr, TAG_NAME *tagNamePtr,
BINDING **bindingsPtr)
{
DTD * const dtd = _dtd; /* save one level of indirection */
ELEMENT_TYPE *elementType;
int nDefaultAtts;
const XML_Char **appAtts; /* the attribute list for the application */
int attIndex = 0;
int prefixLen;
int i;
int n;
XML_Char *uri;
int nPrefixes = 0;
BINDING *binding;
const XML_Char *localPart;
/* lookup the element type name */
elementType = (ELEMENT_TYPE *)lookup(parser, &dtd->elementTypes, tagNamePtr->str,0);
if (!elementType) {
const XML_Char *name = poolCopyString(&dtd->pool, tagNamePtr->str);
if (!name)
return XML_ERROR_NO_MEMORY;
elementType = (ELEMENT_TYPE *)lookup(parser, &dtd->elementTypes, name,
sizeof(ELEMENT_TYPE));
if (!elementType)
return XML_ERROR_NO_MEMORY;
if (ns && !setElementTypePrefix(parser, elementType))
return XML_ERROR_NO_MEMORY;
}
nDefaultAtts = elementType->nDefaultAtts;
/* get the attributes from the tokenizer */
n = XmlGetAttributes(enc, attStr, attsSize, atts);
if (n + nDefaultAtts > attsSize) {
int oldAttsSize = attsSize;
ATTRIBUTE *temp;
#ifdef XML_ATTR_INFO
XML_AttrInfo *temp2;
#endif
attsSize = n + nDefaultAtts + INIT_ATTS_SIZE;
temp = (ATTRIBUTE *)REALLOC((void *)atts, attsSize * sizeof(ATTRIBUTE));
if (temp == NULL)
return XML_ERROR_NO_MEMORY;
atts = temp;
#ifdef XML_ATTR_INFO
temp2 = (XML_AttrInfo *)REALLOC((void *)attInfo, attsSize * sizeof(XML_AttrInfo));
if (temp2 == NULL)
return XML_ERROR_NO_MEMORY;
attInfo = temp2;
#endif
if (n > oldAttsSize)
XmlGetAttributes(enc, attStr, n, atts);
}
appAtts = (const XML_Char **)atts;
for (i = 0; i < n; i++) {
ATTRIBUTE *currAtt = &atts[i];
#ifdef XML_ATTR_INFO
XML_AttrInfo *currAttInfo = &attInfo[i];
#endif
/* add the name and value to the attribute list */
ATTRIBUTE_ID *attId = getAttributeId(parser, enc, currAtt->name,
currAtt->name
+ XmlNameLength(enc, currAtt->name));
if (!attId)
return XML_ERROR_NO_MEMORY;
#ifdef XML_ATTR_INFO
currAttInfo->nameStart = parseEndByteIndex - (parseEndPtr - currAtt->name);
currAttInfo->nameEnd = currAttInfo->nameStart +
XmlNameLength(enc, currAtt->name);
currAttInfo->valueStart = parseEndByteIndex -
(parseEndPtr - currAtt->valuePtr);
currAttInfo->valueEnd = parseEndByteIndex - (parseEndPtr - currAtt->valueEnd);
#endif
/* Detect duplicate attributes by their QNames. This does not work when
namespace processing is turned on and different prefixes for the same
namespace are used. For this case we have a check further down.
*/
if ((attId->name)[-1]) {
if (enc == encoding)
eventPtr = atts[i].name;
return XML_ERROR_DUPLICATE_ATTRIBUTE;
}
(attId->name)[-1] = 1;
appAtts[attIndex++] = attId->name;
if (!atts[i].normalized) {
enum XML_Error result;
XML_Bool isCdata = XML_TRUE;
/* figure out whether declared as other than CDATA */
if (attId->maybeTokenized) {
int j;
for (j = 0; j < nDefaultAtts; j++) {
if (attId == elementType->defaultAtts[j].id) {
isCdata = elementType->defaultAtts[j].isCdata;
break;
}
}
}
/* normalize the attribute value */
result = storeAttributeValue(parser, enc, isCdata,
atts[i].valuePtr, atts[i].valueEnd,
&tempPool);
if (result)
return result;
appAtts[attIndex] = poolStart(&tempPool);
poolFinish(&tempPool);
}
else {
/* the value did not need normalizing */
appAtts[attIndex] = poolStoreString(&tempPool, enc, atts[i].valuePtr,
atts[i].valueEnd);
if (appAtts[attIndex] == 0)
return XML_ERROR_NO_MEMORY;
poolFinish(&tempPool);
}
/* handle prefixed attribute names */
if (attId->prefix) {
if (attId->xmlns) {
/* deal with namespace declarations here */
enum XML_Error result = addBinding(parser, attId->prefix, attId,
appAtts[attIndex], bindingsPtr);
if (result)
return result;
--attIndex;
}
else {
/* deal with other prefixed names later */
attIndex++;
nPrefixes++;
(attId->name)[-1] = 2;
}
}
else
attIndex++;
}
/* set-up for XML_GetSpecifiedAttributeCount and XML_GetIdAttributeIndex */
nSpecifiedAtts = attIndex;
if (elementType->idAtt && (elementType->idAtt->name)[-1]) {
for (i = 0; i < attIndex; i += 2)
if (appAtts[i] == elementType->idAtt->name) {
idAttIndex = i;
break;
}
}
else
idAttIndex = -1;
/* do attribute defaulting */
for (i = 0; i < nDefaultAtts; i++) {
const DEFAULT_ATTRIBUTE *da = elementType->defaultAtts + i;
if (!(da->id->name)[-1] && da->value) {
if (da->id->prefix) {
if (da->id->xmlns) {
enum XML_Error result = addBinding(parser, da->id->prefix, da->id,
da->value, bindingsPtr);
if (result)
return result;
}
else {
(da->id->name)[-1] = 2;
nPrefixes++;
appAtts[attIndex++] = da->id->name;
appAtts[attIndex++] = da->value;
}
}
else {
(da->id->name)[-1] = 1;
appAtts[attIndex++] = da->id->name;
appAtts[attIndex++] = da->value;
}
}
}
appAtts[attIndex] = 0;
/* expand prefixed attribute names, check for duplicates,
and clear flags that say whether attributes were specified */
i = 0;
if (nPrefixes) {
int j; /* hash table index */
unsigned long version = nsAttsVersion;
int nsAttsSize = (int)1 << nsAttsPower;
/* size of hash table must be at least 2 * (# of prefixed attributes) */
if ((nPrefixes << 1) >> nsAttsPower) { /* true for nsAttsPower = 0 */
NS_ATT *temp;
/* hash table size must also be a power of 2 and >= 8 */
while (nPrefixes >> nsAttsPower++);
if (nsAttsPower < 3)
nsAttsPower = 3;
nsAttsSize = (int)1 << nsAttsPower;
temp = (NS_ATT *)REALLOC(nsAtts, nsAttsSize * sizeof(NS_ATT));
if (!temp)
return XML_ERROR_NO_MEMORY;
nsAtts = temp;
version = 0; /* force re-initialization of nsAtts hash table */
}
/* using a version flag saves us from initializing nsAtts every time */
if (!version) { /* initialize version flags when version wraps around */
version = INIT_ATTS_VERSION;
for (j = nsAttsSize; j != 0; )
nsAtts[--j].version = version;
}
nsAttsVersion = --version;
/* expand prefixed names and check for duplicates */
for (; i < attIndex; i += 2) {
const XML_Char *s = appAtts[i];
if (s[-1] == 2) { /* prefixed */
ATTRIBUTE_ID *id;
const BINDING *b;
unsigned long uriHash;
struct siphash sip_state;
struct sipkey sip_key;
copy_salt_to_sipkey(parser, &sip_key);
sip24_init(&sip_state, &sip_key);
((XML_Char *)s)[-1] = 0; /* clear flag */
id = (ATTRIBUTE_ID *)lookup(parser, &dtd->attributeIds, s, 0);
if (!id || !id->prefix)
return XML_ERROR_NO_MEMORY;
b = id->prefix->binding;
if (!b)
return XML_ERROR_UNBOUND_PREFIX;
for (j = 0; j < b->uriLen; j++) {
const XML_Char c = b->uri[j];
if (!poolAppendChar(&tempPool, c))
return XML_ERROR_NO_MEMORY;
}
sip24_update(&sip_state, b->uri, b->uriLen * sizeof(XML_Char));
while (*s++ != XML_T(ASCII_COLON))
;
sip24_update(&sip_state, s, keylen(s) * sizeof(XML_Char));
do { /* copies null terminator */
if (!poolAppendChar(&tempPool, *s))
return XML_ERROR_NO_MEMORY;
} while (*s++);
uriHash = (unsigned long)sip24_final(&sip_state);
{ /* Check hash table for duplicate of expanded name (uriName).
Derived from code in lookup(parser, HASH_TABLE *table, ...).
*/
unsigned char step = 0;
unsigned long mask = nsAttsSize - 1;
j = uriHash & mask; /* index into hash table */
while (nsAtts[j].version == version) {
/* for speed we compare stored hash values first */
if (uriHash == nsAtts[j].hash) {
const XML_Char *s1 = poolStart(&tempPool);
const XML_Char *s2 = nsAtts[j].uriName;
/* s1 is null terminated, but not s2 */
for (; *s1 == *s2 && *s1 != 0; s1++, s2++);
if (*s1 == 0)
return XML_ERROR_DUPLICATE_ATTRIBUTE;
}
if (!step)
step = PROBE_STEP(uriHash, mask, nsAttsPower);
j < step ? (j += nsAttsSize - step) : (j -= step);
}
}
if (ns_triplets) { /* append namespace separator and prefix */
tempPool.ptr[-1] = namespaceSeparator;
s = b->prefix->name;
do {
if (!poolAppendChar(&tempPool, *s))
return XML_ERROR_NO_MEMORY;
} while (*s++);
}
/* store expanded name in attribute list */
s = poolStart(&tempPool);
poolFinish(&tempPool);
appAtts[i] = s;
/* fill empty slot with new version, uriName and hash value */
nsAtts[j].version = version;
nsAtts[j].hash = uriHash;
nsAtts[j].uriName = s;
if (!--nPrefixes) {
i += 2;
break;
}
}
else /* not prefixed */
((XML_Char *)s)[-1] = 0; /* clear flag */
}
}
/* clear flags for the remaining attributes */
for (; i < attIndex; i += 2)
((XML_Char *)(appAtts[i]))[-1] = 0;
for (binding = *bindingsPtr; binding; binding = binding->nextTagBinding)
binding->attId->name[-1] = 0;
if (!ns)
return XML_ERROR_NONE;
/* expand the element type name */
if (elementType->prefix) {
binding = elementType->prefix->binding;
if (!binding)
return XML_ERROR_UNBOUND_PREFIX;
localPart = tagNamePtr->str;
while (*localPart++ != XML_T(ASCII_COLON))
;
}
else if (dtd->defaultPrefix.binding) {
binding = dtd->defaultPrefix.binding;
localPart = tagNamePtr->str;
}
else
return XML_ERROR_NONE;
prefixLen = 0;
if (ns_triplets && binding->prefix->name) {
for (; binding->prefix->name[prefixLen++];)
; /* prefixLen includes null terminator */
}
tagNamePtr->localPart = localPart;
tagNamePtr->uriLen = binding->uriLen;
tagNamePtr->prefix = binding->prefix->name;
tagNamePtr->prefixLen = prefixLen;
for (i = 0; localPart[i++];)
; /* i includes null terminator */
n = i + binding->uriLen + prefixLen;
if (n > binding->uriAlloc) {
TAG *p;
uri = (XML_Char *)MALLOC((n + EXPAND_SPARE) * sizeof(XML_Char));
if (!uri)
return XML_ERROR_NO_MEMORY;
binding->uriAlloc = n + EXPAND_SPARE;
memcpy(uri, binding->uri, binding->uriLen * sizeof(XML_Char));
for (p = tagStack; p; p = p->parent)
if (p->name.str == binding->uri)
p->name.str = uri;
FREE(binding->uri);
binding->uri = uri;
}
/* if namespaceSeparator != '\0' then uri includes it already */
uri = binding->uri + binding->uriLen;
memcpy(uri, localPart, i * sizeof(XML_Char));
/* we always have a namespace separator between localPart and prefix */
if (prefixLen) {
uri += i - 1;
*uri = namespaceSeparator; /* replace null terminator */
memcpy(uri + 1, binding->prefix->name, prefixLen * sizeof(XML_Char));
}
tagNamePtr->str = binding->uri;
return XML_ERROR_NONE;
}
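/* Illustrative sketch (compiled out) of the version-flag idea that
   storeAtts() uses for the nsAtts table: a slot counts as occupied only
   if its stamp equals the current version, so starting a new element
   needs one decrement rather than clearing the whole table. This is a
   simplified stand-in, not the code above: it uses a fixed-size table,
   integer keys and linear probing, and every EX_ and ex_ name is
   hypothetical. It assumes fewer than EX_TABLE_SIZE insertions per
   generation.
*/
#if 0
#define EX_TABLE_SIZE 8                 /* must be a power of two */
#define EX_INIT_VERSION 0xFFFFFFFFul

typedef struct {
  unsigned long version;                /* generation that filled the slot */
  unsigned long key;
} EX_SLOT;

static EX_SLOT ex_table[EX_TABLE_SIZE];
static unsigned long ex_version = 0;    /* 0 forces (re)initialization */

/* Begin a new generation; afterwards every slot is logically empty. */
static void
ex_new_generation(void)
{
  if (!ex_version) {                    /* first use, or counter wrapped */
    int j;
    ex_version = EX_INIT_VERSION;
    for (j = 0; j < EX_TABLE_SIZE; j++)
      ex_table[j].version = ex_version;
  }
  --ex_version;                         /* no slot carries this stamp yet */
}

/* Returns 1 if key was already inserted in this generation, else 0. */
static int
ex_insert(unsigned long key)
{
  unsigned long j = key & (EX_TABLE_SIZE - 1);
  while (ex_table[j].version == ex_version) {
    if (ex_table[j].key == key)
      return 1;                         /* duplicate */
    j = (j + 1) & (EX_TABLE_SIZE - 1);  /* probe the next slot */
  }
  ex_table[j].version = ex_version;     /* claim the stale or empty slot */
  ex_table[j].key = key;
  return 0;
}
#endif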
/* addBinding() overwrites the value of prefix->binding without checking.
Therefore one must keep track of the old value outside of addBinding().
*/
static enum XML_Error
addBinding(XML_Parser parser, PREFIX *prefix, const ATTRIBUTE_ID *attId,
const XML_Char *uri, BINDING **bindingsPtr)
{
static const XML_Char xmlNamespace[] = {
ASCII_h, ASCII_t, ASCII_t, ASCII_p, ASCII_COLON, ASCII_SLASH, ASCII_SLASH,
ASCII_w, ASCII_w, ASCII_w, ASCII_PERIOD, ASCII_w, ASCII_3, ASCII_PERIOD,
ASCII_o, ASCII_r, ASCII_g, ASCII_SLASH, ASCII_X, ASCII_M, ASCII_L,
ASCII_SLASH, ASCII_1, ASCII_9, ASCII_9, ASCII_8, ASCII_SLASH,
ASCII_n, ASCII_a, ASCII_m, ASCII_e, ASCII_s, ASCII_p, ASCII_a, ASCII_c,
ASCII_e, '\0'
};
static const int xmlLen =
(int)sizeof(xmlNamespace)/sizeof(XML_Char) - 1;
static const XML_Char xmlnsNamespace[] = {
ASCII_h, ASCII_t, ASCII_t, ASCII_p, ASCII_COLON, ASCII_SLASH, ASCII_SLASH,
ASCII_w, ASCII_w, ASCII_w, ASCII_PERIOD, ASCII_w, ASCII_3, ASCII_PERIOD,
ASCII_o, ASCII_r, ASCII_g, ASCII_SLASH, ASCII_2, ASCII_0, ASCII_0,
ASCII_0, ASCII_SLASH, ASCII_x, ASCII_m, ASCII_l, ASCII_n, ASCII_s,
ASCII_SLASH, '\0'
};
static const int xmlnsLen =
(int)sizeof(xmlnsNamespace)/sizeof(XML_Char) - 1;
XML_Bool mustBeXML = XML_FALSE;
XML_Bool isXML = XML_TRUE;
XML_Bool isXMLNS = XML_TRUE;
BINDING *b;
int len;
/* empty URI is only valid for default namespace per XML NS 1.0 (not 1.1) */
if (*uri == XML_T('\0') && prefix->name)
return XML_ERROR_UNDECLARING_PREFIX;
if (prefix->name
&& prefix->name[0] == XML_T(ASCII_x)
&& prefix->name[1] == XML_T(ASCII_m)
&& prefix->name[2] == XML_T(ASCII_l)) {
/* Not allowed to bind xmlns */
if (prefix->name[3] == XML_T(ASCII_n)
&& prefix->name[4] == XML_T(ASCII_s)
&& prefix->name[5] == XML_T('\0'))
return XML_ERROR_RESERVED_PREFIX_XMLNS;
if (prefix->name[3] == XML_T('\0'))
mustBeXML = XML_TRUE;
}
for (len = 0; uri[len]; len++) {
if (isXML && (len > xmlLen || uri[len] != xmlNamespace[len]))
isXML = XML_FALSE;
if (!mustBeXML && isXMLNS
&& (len > xmlnsLen || uri[len] != xmlnsNamespace[len]))
isXMLNS = XML_FALSE;
}
isXML = isXML && len == xmlLen;
isXMLNS = isXMLNS && len == xmlnsLen;
if (mustBeXML != isXML)
return mustBeXML ? XML_ERROR_RESERVED_PREFIX_XML
: XML_ERROR_RESERVED_NAMESPACE_URI;
if (isXMLNS)
return XML_ERROR_RESERVED_NAMESPACE_URI;
if (namespaceSeparator)
len++;
if (freeBindingList) {
b = freeBindingList;
if (len > b->uriAlloc) {
XML_Char *temp = (XML_Char *)REALLOC(b->uri,
sizeof(XML_Char) * (len + EXPAND_SPARE));
if (temp == NULL)
return XML_ERROR_NO_MEMORY;
b->uri = temp;
b->uriAlloc = len + EXPAND_SPARE;
}
freeBindingList = b->nextTagBinding;
}
else {
b = (BINDING *)MALLOC(sizeof(BINDING));
if (!b)
return XML_ERROR_NO_MEMORY;
b->uri = (XML_Char *)MALLOC(sizeof(XML_Char) * (len + EXPAND_SPARE));
if (!b->uri) {
FREE(b);
return XML_ERROR_NO_MEMORY;
}
b->uriAlloc = len + EXPAND_SPARE;
}
b->uriLen = len;
memcpy(b->uri, uri, len * sizeof(XML_Char));
if (namespaceSeparator)
b->uri[len - 1] = namespaceSeparator;
b->prefix = prefix;
b->attId = attId;
b->prevPrefixBinding = prefix->binding;
/* NULL binding when default namespace undeclared */
if (*uri == XML_T('\0') && prefix == &_dtd->defaultPrefix)
prefix->binding = NULL;
else
prefix->binding = b;
b->nextTagBinding = *bindingsPtr;
*bindingsPtr = b;
/* if attId == NULL then we are not starting a namespace scope */
if (attId && startNamespaceDeclHandler)
startNamespaceDeclHandler(handlerArg, prefix->name,
prefix->binding ? uri : 0);
return XML_ERROR_NONE;
}
/* The idea here is to avoid using stack for each CDATA section when
the whole file is parsed with one call.
*/
static enum XML_Error PTRCALL
cdataSectionProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
enum XML_Error result = doCdataSection(parser, encoding, &start, end,
endPtr, (XML_Bool)!ps_finalBuffer);
if (result != XML_ERROR_NONE)
return result;
if (start) {
if (parentParser) { /* we are parsing an external entity */
processor = externalEntityContentProcessor;
return externalEntityContentProcessor(parser, start, end, endPtr);
}
else {
processor = contentProcessor;
return contentProcessor(parser, start, end, endPtr);
}
}
return result;
}
/* startPtr gets set to non-null if the section is closed, and to null if
the section is not yet closed.
*/
static enum XML_Error
doCdataSection(XML_Parser parser,
const ENCODING *enc,
const char **startPtr,
const char *end,
const char **nextPtr,
XML_Bool haveMore)
{
const char *s = *startPtr;
const char **eventPP;
const char **eventEndPP;
if (enc == encoding) {
eventPP = &eventPtr;
*eventPP = s;
eventEndPP = &eventEndPtr;
}
else {
eventPP = &(openInternalEntities->internalEventPtr);
eventEndPP = &(openInternalEntities->internalEventEndPtr);
}
*eventPP = s;
*startPtr = NULL;
for (;;) {
const char *next;
int tok = XmlCdataSectionTok(enc, s, end, &next);
*eventEndPP = next;
switch (tok) {
case XML_TOK_CDATA_SECT_CLOSE:
if (endCdataSectionHandler)
endCdataSectionHandler(handlerArg);
#if 0
/* see comment under XML_TOK_CDATA_SECT_OPEN */
else if (characterDataHandler)
characterDataHandler(handlerArg, dataBuf, 0);
#endif
else if (defaultHandler)
reportDefault(parser, enc, s, next);
*startPtr = next;
*nextPtr = next;
if (ps_parsing == XML_FINISHED)
return XML_ERROR_ABORTED;
else
return XML_ERROR_NONE;
case XML_TOK_DATA_NEWLINE:
if (characterDataHandler) {
XML_Char c = 0xA;
characterDataHandler(handlerArg, &c, 1);
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
break;
case XML_TOK_DATA_CHARS:
{
XML_CharacterDataHandler charDataHandler = characterDataHandler;
if (charDataHandler) {
if (MUST_CONVERT(enc, s)) {
for (;;) {
ICHAR *dataPtr = (ICHAR *)dataBuf;
const enum XML_Convert_Result convert_res =
XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd);
*eventEndPP = next;
charDataHandler(handlerArg, dataBuf,
(int)(dataPtr - (ICHAR *)dataBuf));
if ((convert_res == XML_CONVERT_COMPLETED)
|| (convert_res == XML_CONVERT_INPUT_INCOMPLETE))
break;
*eventPP = s;
}
}
else
charDataHandler(handlerArg,
(XML_Char *)s,
(int)((XML_Char *)next - (XML_Char *)s));
}
else if (defaultHandler)
reportDefault(parser, enc, s, next);
}
break;
case XML_TOK_INVALID:
*eventPP = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_PARTIAL_CHAR;
case XML_TOK_PARTIAL:
case XML_TOK_NONE:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_UNCLOSED_CDATA_SECTION;
default:
*eventPP = next;
return XML_ERROR_UNEXPECTED_STATE;
}
*eventPP = s = next;
switch (ps_parsing) {
case XML_SUSPENDED:
*nextPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
return XML_ERROR_ABORTED;
default: ;
}
}
/* not reached */
}
#ifdef XML_DTD
/* The idea here is to avoid using stack for each IGNORE section when
the whole file is parsed with one call.
*/
static enum XML_Error PTRCALL
ignoreSectionProcessor(XML_Parser parser,
const char *start,
const char *end,
const char **endPtr)
{
enum XML_Error result = doIgnoreSection(parser, encoding, &start, end,
endPtr, (XML_Bool)!ps_finalBuffer);
if (result != XML_ERROR_NONE)
return result;
if (start) {
processor = prologProcessor;
return prologProcessor(parser, start, end, endPtr);
}
return result;
}
/* startPtr gets set to non-null if the section is closed, and to null
if the section is not yet closed.
*/
static enum XML_Error
doIgnoreSection(XML_Parser parser,
const ENCODING *enc,
const char **startPtr,
const char *end,
const char **nextPtr,
XML_Bool haveMore)
{
const char *next;
int tok;
const char *s = *startPtr;
const char **eventPP;
const char **eventEndPP;
if (enc == encoding) {
eventPP = &eventPtr;
*eventPP = s;
eventEndPP = &eventEndPtr;
}
else {
eventPP = &(openInternalEntities->internalEventPtr);
eventEndPP = &(openInternalEntities->internalEventEndPtr);
}
*eventPP = s;
*startPtr = NULL;
tok = XmlIgnoreSectionTok(enc, s, end, &next);
*eventEndPP = next;
switch (tok) {
case XML_TOK_IGNORE_SECT:
if (defaultHandler)
reportDefault(parser, enc, s, next);
*startPtr = next;
*nextPtr = next;
if (ps_parsing == XML_FINISHED)
return XML_ERROR_ABORTED;
else
return XML_ERROR_NONE;
case XML_TOK_INVALID:
*eventPP = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_PARTIAL_CHAR;
case XML_TOK_PARTIAL:
case XML_TOK_NONE:
if (haveMore) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_SYNTAX; /* XML_ERROR_UNCLOSED_IGNORE_SECTION */
default:
*eventPP = next;
return XML_ERROR_UNEXPECTED_STATE;
}
/* not reached */
}
#endif /* XML_DTD */
/* Set up the initial encoding from protocolEncodingName (which may be NULL).
In XML_UNICODE builds the name is first narrowed to char; a name that is
too long or contains non-ASCII characters is replaced by an empty string.
Names the tokenizer rejects are passed on to handleUnknownEncoding().
*/
static enum XML_Error
initializeEncoding(XML_Parser parser)
{
const char *s;
#ifdef XML_UNICODE
char encodingBuf[128];
if (!protocolEncodingName)
s = NULL;
else {
int i;
for (i = 0; protocolEncodingName[i]; i++) {
if (i == sizeof(encodingBuf) - 1
|| (protocolEncodingName[i] & ~0x7f) != 0) {
encodingBuf[0] = '\0';
break;
}
encodingBuf[i] = (char)protocolEncodingName[i];
}
encodingBuf[i] = '\0';
s = encodingBuf;
}
#else
s = protocolEncodingName;
#endif
if ((ns ? XmlInitEncodingNS : XmlInitEncoding)(&initEncoding, &encoding, s))
return XML_ERROR_NONE;
return handleUnknownEncoding(parser, protocolEncodingName);
}
/* Parse an XML declaration or, when isGeneralTextEntity is non-zero, a text
declaration. The result is reported to xmlDeclHandler if set, otherwise to
the default handler. If no protocol encoding was supplied, the declared
encoding is adopted; names the tokenizer does not recognize are passed on
to handleUnknownEncoding().
*/
static enum XML_Error
processXmlDecl(XML_Parser parser, int isGeneralTextEntity,
const char *s, const char *next)
{
const char *encodingName = NULL;
const XML_Char *storedEncName = NULL;
const ENCODING *newEncoding = NULL;
const char *version = NULL;
const char *versionend;
const XML_Char *storedversion = NULL;
int standalone = -1;
if (!(ns
? XmlParseXmlDeclNS
: XmlParseXmlDecl)(isGeneralTextEntity,
encoding,
s,
next,
&eventPtr,
&version,
&versionend,
&encodingName,
&newEncoding,
&standalone)) {
if (isGeneralTextEntity)
return XML_ERROR_TEXT_DECL;
else
return XML_ERROR_XML_DECL;
}
if (!isGeneralTextEntity && standalone == 1) {
_dtd->standalone = XML_TRUE;
#ifdef XML_DTD
if (paramEntityParsing == XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
paramEntityParsing = XML_PARAM_ENTITY_PARSING_NEVER;
#endif /* XML_DTD */
}
if (xmlDeclHandler) {
if (encodingName != NULL) {
storedEncName = poolStoreString(&temp2Pool,
encoding,
encodingName,
encodingName
+ XmlNameLength(encoding, encodingName));
if (!storedEncName)
return XML_ERROR_NO_MEMORY;
poolFinish(&temp2Pool);
}
if (version) {
storedversion = poolStoreString(&temp2Pool,
encoding,
version,
versionend - encoding->minBytesPerChar);
if (!storedversion)
return XML_ERROR_NO_MEMORY;
}
xmlDeclHandler(handlerArg, storedversion, storedEncName, standalone);
}
else if (defaultHandler)
reportDefault(parser, encoding, s, next);
if (protocolEncodingName == NULL) {
if (newEncoding) {
if (newEncoding->minBytesPerChar != encoding->minBytesPerChar) {
eventPtr = encodingName;
return XML_ERROR_INCORRECT_ENCODING;
}
encoding = newEncoding;
}
else if (encodingName) {
enum XML_Error result;
if (!storedEncName) {
storedEncName = poolStoreString(
&temp2Pool, encoding, encodingName,
encodingName + XmlNameLength(encoding, encodingName));
if (!storedEncName)
return XML_ERROR_NO_MEMORY;
}
result = handleUnknownEncoding(parser, storedEncName);
poolClear(&temp2Pool);
if (result == XML_ERROR_UNKNOWN_ENCODING)
eventPtr = encodingName;
return result;
}
}
if (storedEncName || storedversion)
poolClear(&temp2Pool);
return XML_ERROR_NONE;
}
/* Ask the application's unknownEncodingHandler, if any, to describe
encodingName. On success the parser switches to the resulting encoding and
takes ownership of info.data and info.release. On any failure path
info.release (if set) is called before returning an error.
*/
static enum XML_Error
handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName)
{
if (unknownEncodingHandler) {
XML_Encoding info;
int i;
for (i = 0; i < 256; i++)
info.map[i] = -1;
info.convert = NULL;
info.data = NULL;
info.release = NULL;
if (unknownEncodingHandler(unknownEncodingHandlerData, encodingName,
&info)) {
ENCODING *enc;
unknownEncodingMem = MALLOC(XmlSizeOfUnknownEncoding());
if (!unknownEncodingMem) {
if (info.release)
info.release(info.data);
return XML_ERROR_NO_MEMORY;
}
enc = (ns
? XmlInitUnknownEncodingNS
: XmlInitUnknownEncoding)(unknownEncodingMem,
info.map,
info.convert,
info.data);
if (enc) {
unknownEncodingData = info.data;
unknownEncodingRelease = info.release;
encoding = enc;
return XML_ERROR_NONE;
}
}
if (info.release != NULL)
info.release(info.data);
}
return XML_ERROR_UNKNOWN_ENCODING;
}
static enum XML_Error PTRCALL
prologInitProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
enum XML_Error result = initializeEncoding(parser);
if (result != XML_ERROR_NONE)
return result;
processor = prologProcessor;
return prologProcessor(parser, s, end, nextPtr);
}
#ifdef XML_DTD
static enum XML_Error PTRCALL
externalParEntInitProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
enum XML_Error result = initializeEncoding(parser);
if (result != XML_ERROR_NONE)
return result;
/* we know now that XML_Parse(Buffer) has been called,
so we consider the external parameter entity read */
_dtd->paramEntityRead = XML_TRUE;
if (prologState.inEntityValue) {
processor = entityValueInitProcessor;
return entityValueInitProcessor(parser, s, end, nextPtr);
}
else {
processor = externalParEntProcessor;
return externalParEntProcessor(parser, s, end, nextPtr);
}
}
static enum XML_Error PTRCALL
entityValueInitProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
int tok;
const char *start = s;
const char *next = start;
eventPtr = start;
for (;;) {
tok = XmlPrologTok(encoding, start, end, &next);
eventEndPtr = next;
if (tok <= 0) {
if (!ps_finalBuffer && tok != XML_TOK_INVALID) {
*nextPtr = s;
return XML_ERROR_NONE;
}
switch (tok) {
case XML_TOK_INVALID:
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL:
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
return XML_ERROR_PARTIAL_CHAR;
case XML_TOK_NONE: /* start == end */
default:
break;
}
/* found end of entity value - can store it now */
return storeEntityValue(parser, encoding, s, end);
}
else if (tok == XML_TOK_XML_DECL) {
enum XML_Error result;
result = processXmlDecl(parser, 0, start, next);
if (result != XML_ERROR_NONE)
return result;
switch (ps_parsing) {
case XML_SUSPENDED:
*nextPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
return XML_ERROR_ABORTED;
default:
*nextPtr = next;
}
/* stop scanning for text declaration - we found one */
processor = entityValueProcessor;
return entityValueProcessor(parser, next, end, nextPtr);
}
/* If we are at the end of the buffer, this would cause XmlPrologTok to
return XML_TOK_NONE on the next call, which would then cause the
function to exit with *nextPtr set to s - that is what we want for other
tokens, but not for the BOM - we would rather like to skip it;
then, when this routine is entered the next time, XmlPrologTok will
return XML_TOK_INVALID, since the BOM is still in the buffer
*/
else if (tok == XML_TOK_BOM && next == end && !ps_finalBuffer) {
*nextPtr = next;
return XML_ERROR_NONE;
}
/* If we get this token, we have the start of what might be a
normal tag, but not a declaration (i.e. it doesn't begin with
"<!").
*/
/* ... */
eventPP = &(openInternalEntities->internalEventPtr);
eventEndPP = &(openInternalEntities->internalEventEndPtr);
}
for (;;) {
int role;
XML_Bool handleDefault = XML_TRUE;
*eventPP = s;
*eventEndPP = next;
if (tok <= 0) {
if (haveMore && tok != XML_TOK_INVALID) {
*nextPtr = s;
return XML_ERROR_NONE;
}
switch (tok) {
case XML_TOK_INVALID:
*eventPP = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL:
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
return XML_ERROR_PARTIAL_CHAR;
case -XML_TOK_PROLOG_S:
tok = -tok;
break;
case XML_TOK_NONE:
#ifdef XML_DTD
/* for internal PE NOT referenced between declarations */
if (enc != encoding && !openInternalEntities->betweenDecl) {
*nextPtr = s;
return XML_ERROR_NONE;
}
/* WFC: PE Between Declarations - must check that PE contains
complete markup, not only for external PEs, but also for
internal PEs if the reference occurs between declarations.
*/
if (isParamEntity || enc != encoding) {
if (XmlTokenRole(&prologState, XML_TOK_NONE, end, end, enc)
== XML_ROLE_ERROR)
return XML_ERROR_INCOMPLETE_PE;
*nextPtr = s;
return XML_ERROR_NONE;
}
#endif /* XML_DTD */
return XML_ERROR_NO_ELEMENTS;
default:
tok = -tok;
next = end;
break;
}
}
role = XmlTokenRole(&prologState, tok, s, next, enc);
switch (role) {
case XML_ROLE_XML_DECL:
{
enum XML_Error result = processXmlDecl(parser, 0, s, next);
if (result != XML_ERROR_NONE)
return result;
enc = encoding;
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_DOCTYPE_NAME:
if (startDoctypeDeclHandler) {
doctypeName = poolStoreString(&tempPool, enc, s, next);
if (!doctypeName)
return XML_ERROR_NO_MEMORY;
poolFinish(&tempPool);
doctypePubid = NULL;
handleDefault = XML_FALSE;
}
doctypeSysid = NULL; /* always initialize to NULL */
break;
case XML_ROLE_DOCTYPE_INTERNAL_SUBSET:
if (startDoctypeDeclHandler) {
startDoctypeDeclHandler(handlerArg, doctypeName, doctypeSysid,
doctypePubid, 1);
doctypeName = NULL;
poolClear(&tempPool);
handleDefault = XML_FALSE;
}
break;
#ifdef XML_DTD
case XML_ROLE_TEXT_DECL:
{
enum XML_Error result = processXmlDecl(parser, 1, s, next);
if (result != XML_ERROR_NONE)
return result;
enc = encoding;
handleDefault = XML_FALSE;
}
break;
#endif /* XML_DTD */
case XML_ROLE_DOCTYPE_PUBLIC_ID:
#ifdef XML_DTD
useForeignDTD = XML_FALSE;
declEntity = (ENTITY *)lookup(parser,
&dtd->paramEntities,
externalSubsetName,
sizeof(ENTITY));
if (!declEntity)
return XML_ERROR_NO_MEMORY;
#endif /* XML_DTD */
dtd->hasParamEntityRefs = XML_TRUE;
if (startDoctypeDeclHandler) {
XML_Char *pubId;
if (!XmlIsPublicId(enc, s, next, eventPP))
return XML_ERROR_PUBLICID;
pubId = poolStoreString(&tempPool, enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!pubId)
return XML_ERROR_NO_MEMORY;
normalizePublicId(pubId);
poolFinish(&tempPool);
doctypePubid = pubId;
handleDefault = XML_FALSE;
goto alreadyChecked;
}
/* fall through */
case XML_ROLE_ENTITY_PUBLIC_ID:
if (!XmlIsPublicId(enc, s, next, eventPP))
return XML_ERROR_PUBLICID;
alreadyChecked:
if (dtd->keepProcessing && declEntity) {
XML_Char *tem = poolStoreString(&dtd->pool,
enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!tem)
return XML_ERROR_NO_MEMORY;
normalizePublicId(tem);
declEntity->publicId = tem;
poolFinish(&dtd->pool);
if (entityDeclHandler)
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_DOCTYPE_CLOSE:
if (doctypeName) {
startDoctypeDeclHandler(handlerArg, doctypeName,
doctypeSysid, doctypePubid, 0);
poolClear(&tempPool);
handleDefault = XML_FALSE;
}
/* doctypeSysid will be non-NULL in the case of a previous
XML_ROLE_DOCTYPE_SYSTEM_ID, even if startDoctypeDeclHandler
was not set, indicating an external subset
*/
#ifdef XML_DTD
if (doctypeSysid || useForeignDTD) {
XML_Bool hadParamEntityRefs = dtd->hasParamEntityRefs;
dtd->hasParamEntityRefs = XML_TRUE;
if (paramEntityParsing && externalEntityRefHandler) {
ENTITY *entity = (ENTITY *)lookup(parser,
&dtd->paramEntities,
externalSubsetName,
sizeof(ENTITY));
if (!entity)
return XML_ERROR_NO_MEMORY;
if (useForeignDTD)
entity->base = curBase;
dtd->paramEntityRead = XML_FALSE;
if (!externalEntityRefHandler(externalEntityRefHandlerArg,
0,
entity->base,
entity->systemId,
entity->publicId))
return XML_ERROR_EXTERNAL_ENTITY_HANDLING;
if (dtd->paramEntityRead) {
if (!dtd->standalone &&
notStandaloneHandler &&
!notStandaloneHandler(handlerArg))
return XML_ERROR_NOT_STANDALONE;
}
/* if we didn't read the foreign DTD then this means that there
is no external subset and we must reset dtd->hasParamEntityRefs
*/
else if (!doctypeSysid)
dtd->hasParamEntityRefs = hadParamEntityRefs;
/* end of DTD - no need to update dtd->keepProcessing */
}
useForeignDTD = XML_FALSE;
}
#endif /* XML_DTD */
if (endDoctypeDeclHandler) {
endDoctypeDeclHandler(handlerArg);
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_INSTANCE_START:
#ifdef XML_DTD
/* if there is no DOCTYPE declaration then now is the
last chance to read the foreign DTD
*/
if (useForeignDTD) {
XML_Bool hadParamEntityRefs = dtd->hasParamEntityRefs;
dtd->hasParamEntityRefs = XML_TRUE;
if (paramEntityParsing && externalEntityRefHandler) {
ENTITY *entity = (ENTITY *)lookup(parser, &dtd->paramEntities,
externalSubsetName,
sizeof(ENTITY));
if (!entity)
return XML_ERROR_NO_MEMORY;
entity->base = curBase;
dtd->paramEntityRead = XML_FALSE;
if (!externalEntityRefHandler(externalEntityRefHandlerArg,
0,
entity->base,
entity->systemId,
entity->publicId))
return XML_ERROR_EXTERNAL_ENTITY_HANDLING;
if (dtd->paramEntityRead) {
if (!dtd->standalone &&
notStandaloneHandler &&
!notStandaloneHandler(handlerArg))
return XML_ERROR_NOT_STANDALONE;
}
/* if we didn't read the foreign DTD then this means that there
is no external subset and we must reset dtd->hasParamEntityRefs
*/
else
dtd->hasParamEntityRefs = hadParamEntityRefs;
/* end of DTD - no need to update dtd->keepProcessing */
}
}
#endif /* XML_DTD */
processor = contentProcessor;
return contentProcessor(parser, s, end, nextPtr);
case XML_ROLE_ATTLIST_ELEMENT_NAME:
declElementType = getElementType(parser, enc, s, next);
if (!declElementType)
return XML_ERROR_NO_MEMORY;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_NAME:
declAttributeId = getAttributeId(parser, enc, s, next);
if (!declAttributeId)
return XML_ERROR_NO_MEMORY;
declAttributeIsCdata = XML_FALSE;
declAttributeType = NULL;
declAttributeIsId = XML_FALSE;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_TYPE_CDATA:
declAttributeIsCdata = XML_TRUE;
declAttributeType = atypeCDATA;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_TYPE_ID:
declAttributeIsId = XML_TRUE;
declAttributeType = atypeID;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_TYPE_IDREF:
declAttributeType = atypeIDREF;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_TYPE_IDREFS:
declAttributeType = atypeIDREFS;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_TYPE_ENTITY:
declAttributeType = atypeENTITY;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_TYPE_ENTITIES:
declAttributeType = atypeENTITIES;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_TYPE_NMTOKEN:
declAttributeType = atypeNMTOKEN;
goto checkAttListDeclHandler;
case XML_ROLE_ATTRIBUTE_TYPE_NMTOKENS:
declAttributeType = atypeNMTOKENS;
checkAttListDeclHandler:
if (dtd->keepProcessing && attlistDeclHandler)
handleDefault = XML_FALSE;
break;
case XML_ROLE_ATTRIBUTE_ENUM_VALUE:
case XML_ROLE_ATTRIBUTE_NOTATION_VALUE:
if (dtd->keepProcessing && attlistDeclHandler) {
const XML_Char *prefix;
if (declAttributeType) {
prefix = enumValueSep;
}
else {
prefix = (role == XML_ROLE_ATTRIBUTE_NOTATION_VALUE
? notationPrefix
: enumValueStart);
}
if (!poolAppendString(&tempPool, prefix))
return XML_ERROR_NO_MEMORY;
if (!poolAppend(&tempPool, enc, s, next))
return XML_ERROR_NO_MEMORY;
declAttributeType = tempPool.start;
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_IMPLIED_ATTRIBUTE_VALUE:
case XML_ROLE_REQUIRED_ATTRIBUTE_VALUE:
if (dtd->keepProcessing) {
if (!defineAttribute(declElementType, declAttributeId,
declAttributeIsCdata, declAttributeIsId,
0, parser))
return XML_ERROR_NO_MEMORY;
if (attlistDeclHandler && declAttributeType) {
if (*declAttributeType == XML_T(ASCII_LPAREN)
|| (*declAttributeType == XML_T(ASCII_N)
&& declAttributeType[1] == XML_T(ASCII_O))) {
/* Enumerated or Notation type */
if (!poolAppendChar(&tempPool, XML_T(ASCII_RPAREN))
|| !poolAppendChar(&tempPool, XML_T('\0')))
return XML_ERROR_NO_MEMORY;
declAttributeType = tempPool.start;
poolFinish(&tempPool);
}
*eventEndPP = s;
attlistDeclHandler(handlerArg, declElementType->name,
declAttributeId->name, declAttributeType,
0, role == XML_ROLE_REQUIRED_ATTRIBUTE_VALUE);
poolClear(&tempPool);
handleDefault = XML_FALSE;
}
}
break;
case XML_ROLE_DEFAULT_ATTRIBUTE_VALUE:
case XML_ROLE_FIXED_ATTRIBUTE_VALUE:
if (dtd->keepProcessing) {
const XML_Char *attVal;
enum XML_Error result =
storeAttributeValue(parser, enc, declAttributeIsCdata,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar,
&dtd->pool);
if (result)
return result;
attVal = poolStart(&dtd->pool);
poolFinish(&dtd->pool);
/* ID attributes aren't allowed to have a default */
if (!defineAttribute(declElementType, declAttributeId,
declAttributeIsCdata, XML_FALSE, attVal, parser))
return XML_ERROR_NO_MEMORY;
if (attlistDeclHandler && declAttributeType) {
if (*declAttributeType == XML_T(ASCII_LPAREN)
|| (*declAttributeType == XML_T(ASCII_N)
&& declAttributeType[1] == XML_T(ASCII_O))) {
/* Enumerated or Notation type */
if (!poolAppendChar(&tempPool, XML_T(ASCII_RPAREN))
|| !poolAppendChar(&tempPool, XML_T('\0')))
return XML_ERROR_NO_MEMORY;
declAttributeType = tempPool.start;
poolFinish(&tempPool);
}
*eventEndPP = s;
attlistDeclHandler(handlerArg, declElementType->name,
declAttributeId->name, declAttributeType,
attVal,
role == XML_ROLE_FIXED_ATTRIBUTE_VALUE);
poolClear(&tempPool);
handleDefault = XML_FALSE;
}
}
break;
case XML_ROLE_ENTITY_VALUE:
if (dtd->keepProcessing) {
enum XML_Error result = storeEntityValue(parser, enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (declEntity) {
declEntity->textPtr = poolStart(&dtd->entityValuePool);
declEntity->textLen = (int)(poolLength(&dtd->entityValuePool));
poolFinish(&dtd->entityValuePool);
if (entityDeclHandler) {
*eventEndPP = s;
entityDeclHandler(handlerArg,
declEntity->name,
declEntity->is_param,
declEntity->textPtr,
declEntity->textLen,
curBase, 0, 0, 0);
handleDefault = XML_FALSE;
}
}
else
poolDiscard(&dtd->entityValuePool);
if (result != XML_ERROR_NONE)
return result;
}
break;
case XML_ROLE_DOCTYPE_SYSTEM_ID:
#ifdef XML_DTD
useForeignDTD = XML_FALSE;
#endif /* XML_DTD */
dtd->hasParamEntityRefs = XML_TRUE;
if (startDoctypeDeclHandler) {
doctypeSysid = poolStoreString(&tempPool, enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (doctypeSysid == NULL)
return XML_ERROR_NO_MEMORY;
poolFinish(&tempPool);
handleDefault = XML_FALSE;
}
#ifdef XML_DTD
else
/* use externalSubsetName to make doctypeSysid non-NULL
for the case where no startDoctypeDeclHandler is set */
doctypeSysid = externalSubsetName;
#endif /* XML_DTD */
if (!dtd->standalone
#ifdef XML_DTD
&& !paramEntityParsing
#endif /* XML_DTD */
&& notStandaloneHandler
&& !notStandaloneHandler(handlerArg))
return XML_ERROR_NOT_STANDALONE;
#ifndef XML_DTD
break;
#else /* XML_DTD */
if (!declEntity) {
declEntity = (ENTITY *)lookup(parser,
&dtd->paramEntities,
externalSubsetName,
sizeof(ENTITY));
if (!declEntity)
return XML_ERROR_NO_MEMORY;
declEntity->publicId = NULL;
}
/* fall through */
#endif /* XML_DTD */
case XML_ROLE_ENTITY_SYSTEM_ID:
if (dtd->keepProcessing && declEntity) {
declEntity->systemId = poolStoreString(&dtd->pool, enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!declEntity->systemId)
return XML_ERROR_NO_MEMORY;
declEntity->base = curBase;
poolFinish(&dtd->pool);
if (entityDeclHandler)
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_ENTITY_COMPLETE:
if (dtd->keepProcessing && declEntity && entityDeclHandler) {
*eventEndPP = s;
entityDeclHandler(handlerArg,
declEntity->name,
declEntity->is_param,
0,0,
declEntity->base,
declEntity->systemId,
declEntity->publicId,
0);
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_ENTITY_NOTATION_NAME:
if (dtd->keepProcessing && declEntity) {
declEntity->notation = poolStoreString(&dtd->pool, enc, s, next);
if (!declEntity->notation)
return XML_ERROR_NO_MEMORY;
poolFinish(&dtd->pool);
if (unparsedEntityDeclHandler) {
*eventEndPP = s;
unparsedEntityDeclHandler(handlerArg,
declEntity->name,
declEntity->base,
declEntity->systemId,
declEntity->publicId,
declEntity->notation);
handleDefault = XML_FALSE;
}
else if (entityDeclHandler) {
*eventEndPP = s;
entityDeclHandler(handlerArg,
declEntity->name,
0,0,0,
declEntity->base,
declEntity->systemId,
declEntity->publicId,
declEntity->notation);
handleDefault = XML_FALSE;
}
}
break;
case XML_ROLE_GENERAL_ENTITY_NAME:
{
if (XmlPredefinedEntityName(enc, s, next)) {
declEntity = NULL;
break;
}
if (dtd->keepProcessing) {
const XML_Char *name = poolStoreString(&dtd->pool, enc, s, next);
if (!name)
return XML_ERROR_NO_MEMORY;
declEntity = (ENTITY *)lookup(parser, &dtd->generalEntities, name,
sizeof(ENTITY));
if (!declEntity)
return XML_ERROR_NO_MEMORY;
if (declEntity->name != name) {
poolDiscard(&dtd->pool);
declEntity = NULL;
}
else {
poolFinish(&dtd->pool);
declEntity->publicId = NULL;
declEntity->is_param = XML_FALSE;
/* if we have a parent parser or are reading an internal parameter
entity, then the entity declaration is not considered "internal"
*/
declEntity->is_internal = !(parentParser || openInternalEntities);
if (entityDeclHandler)
handleDefault = XML_FALSE;
}
}
else {
poolDiscard(&dtd->pool);
declEntity = NULL;
}
}
break;
case XML_ROLE_PARAM_ENTITY_NAME:
#ifdef XML_DTD
if (dtd->keepProcessing) {
const XML_Char *name = poolStoreString(&dtd->pool, enc, s, next);
if (!name)
return XML_ERROR_NO_MEMORY;
declEntity = (ENTITY *)lookup(parser, &dtd->paramEntities,
name, sizeof(ENTITY));
if (!declEntity)
return XML_ERROR_NO_MEMORY;
if (declEntity->name != name) {
poolDiscard(&dtd->pool);
declEntity = NULL;
}
else {
poolFinish(&dtd->pool);
declEntity->publicId = NULL;
declEntity->is_param = XML_TRUE;
/* if we have a parent parser or are reading an internal parameter
entity, then the entity declaration is not considered "internal"
*/
declEntity->is_internal = !(parentParser || openInternalEntities);
if (entityDeclHandler)
handleDefault = XML_FALSE;
}
}
else {
poolDiscard(&dtd->pool);
declEntity = NULL;
}
#else /* not XML_DTD */
declEntity = NULL;
#endif /* XML_DTD */
break;
case XML_ROLE_NOTATION_NAME:
declNotationPublicId = NULL;
declNotationName = NULL;
if (notationDeclHandler) {
declNotationName = poolStoreString(&tempPool, enc, s, next);
if (!declNotationName)
return XML_ERROR_NO_MEMORY;
poolFinish(&tempPool);
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_NOTATION_PUBLIC_ID:
if (!XmlIsPublicId(enc, s, next, eventPP))
return XML_ERROR_PUBLICID;
if (declNotationName) { /* means notationDeclHandler != NULL */
XML_Char *tem = poolStoreString(&tempPool,
enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!tem)
return XML_ERROR_NO_MEMORY;
normalizePublicId(tem);
declNotationPublicId = tem;
poolFinish(&tempPool);
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_NOTATION_SYSTEM_ID:
if (declNotationName && notationDeclHandler) {
const XML_Char *systemId
= poolStoreString(&tempPool, enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!systemId)
return XML_ERROR_NO_MEMORY;
*eventEndPP = s;
notationDeclHandler(handlerArg,
declNotationName,
curBase,
systemId,
declNotationPublicId);
handleDefault = XML_FALSE;
}
poolClear(&tempPool);
break;
case XML_ROLE_NOTATION_NO_SYSTEM_ID:
if (declNotationPublicId && notationDeclHandler) {
*eventEndPP = s;
notationDeclHandler(handlerArg,
declNotationName,
curBase,
0,
declNotationPublicId);
handleDefault = XML_FALSE;
}
poolClear(&tempPool);
break;
case XML_ROLE_ERROR:
switch (tok) {
case XML_TOK_PARAM_ENTITY_REF:
/* PE references in internal subset are
not allowed within declarations. */
return XML_ERROR_PARAM_ENTITY_REF;
case XML_TOK_XML_DECL:
return XML_ERROR_MISPLACED_XML_PI;
default:
return XML_ERROR_SYNTAX;
}
#ifdef XML_DTD
case XML_ROLE_IGNORE_SECT:
{
enum XML_Error result;
if (defaultHandler)
reportDefault(parser, enc, s, next);
handleDefault = XML_FALSE;
result = doIgnoreSection(parser, enc, &next, end, nextPtr, haveMore);
if (result != XML_ERROR_NONE)
return result;
else if (!next) {
processor = ignoreSectionProcessor;
return result;
}
}
break;
#endif /* XML_DTD */
case XML_ROLE_GROUP_OPEN:
if (prologState.level >= groupSize) {
if (groupSize) {
char *temp = (char *)REALLOC(groupConnector, groupSize *= 2);
if (temp == NULL)
return XML_ERROR_NO_MEMORY;
groupConnector = temp;
if (dtd->scaffIndex) {
int *temp = (int *)REALLOC(dtd->scaffIndex,
groupSize * sizeof(int));
if (temp == NULL)
return XML_ERROR_NO_MEMORY;
dtd->scaffIndex = temp;
}
}
else {
groupConnector = (char *)MALLOC(groupSize = 32);
if (!groupConnector)
return XML_ERROR_NO_MEMORY;
}
}
groupConnector[prologState.level] = 0;
if (dtd->in_eldecl) {
int myindex = nextScaffoldPart(parser);
if (myindex < 0)
return XML_ERROR_NO_MEMORY;
dtd->scaffIndex[dtd->scaffLevel] = myindex;
dtd->scaffLevel++;
dtd->scaffold[myindex].type = XML_CTYPE_SEQ;
if (elementDeclHandler)
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_GROUP_SEQUENCE:
if (groupConnector[prologState.level] == ASCII_PIPE)
return XML_ERROR_SYNTAX;
groupConnector[prologState.level] = ASCII_COMMA;
if (dtd->in_eldecl && elementDeclHandler)
handleDefault = XML_FALSE;
break;
case XML_ROLE_GROUP_CHOICE:
if (groupConnector[prologState.level] == ASCII_COMMA)
return XML_ERROR_SYNTAX;
if (dtd->in_eldecl
&& !groupConnector[prologState.level]
&& (dtd->scaffold[dtd->scaffIndex[dtd->scaffLevel - 1]].type
!= XML_CTYPE_MIXED)
) {
dtd->scaffold[dtd->scaffIndex[dtd->scaffLevel - 1]].type
= XML_CTYPE_CHOICE;
if (elementDeclHandler)
handleDefault = XML_FALSE;
}
groupConnector[prologState.level] = ASCII_PIPE;
break;
case XML_ROLE_PARAM_ENTITY_REF:
#ifdef XML_DTD
case XML_ROLE_INNER_PARAM_ENTITY_REF:
dtd->hasParamEntityRefs = XML_TRUE;
if (!paramEntityParsing)
dtd->keepProcessing = dtd->standalone;
else {
const XML_Char *name;
ENTITY *entity;
name = poolStoreString(&dtd->pool, enc,
s + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!name)
return XML_ERROR_NO_MEMORY;
entity = (ENTITY *)lookup(parser, &dtd->paramEntities, name, 0);
poolDiscard(&dtd->pool);
/* first, determine if a check for an existing declaration is needed;
if yes, check that the entity exists, and that it is internal,
otherwise call the skipped entity handler
*/
if (prologState.documentEntity &&
(dtd->standalone
? !openInternalEntities
: !dtd->hasParamEntityRefs)) {
if (!entity)
return XML_ERROR_UNDEFINED_ENTITY;
else if (!entity->is_internal)
return XML_ERROR_ENTITY_DECLARED_IN_PE;
}
else if (!entity) {
dtd->keepProcessing = dtd->standalone;
/* cannot report skipped entities in declarations */
if ((role == XML_ROLE_PARAM_ENTITY_REF) && skippedEntityHandler) {
skippedEntityHandler(handlerArg, name, 1);
handleDefault = XML_FALSE;
}
break;
}
if (entity->open)
return XML_ERROR_RECURSIVE_ENTITY_REF;
if (entity->textPtr) {
enum XML_Error result;
XML_Bool betweenDecl =
(role == XML_ROLE_PARAM_ENTITY_REF ? XML_TRUE : XML_FALSE);
result = processInternalEntity(parser, entity, betweenDecl);
if (result != XML_ERROR_NONE)
return result;
handleDefault = XML_FALSE;
break;
}
if (externalEntityRefHandler) {
dtd->paramEntityRead = XML_FALSE;
entity->open = XML_TRUE;
if (!externalEntityRefHandler(externalEntityRefHandlerArg,
0,
entity->base,
entity->systemId,
entity->publicId)) {
entity->open = XML_FALSE;
return XML_ERROR_EXTERNAL_ENTITY_HANDLING;
}
entity->open = XML_FALSE;
handleDefault = XML_FALSE;
if (!dtd->paramEntityRead) {
dtd->keepProcessing = dtd->standalone;
break;
}
}
else {
dtd->keepProcessing = dtd->standalone;
break;
}
}
#endif /* XML_DTD */
if (!dtd->standalone &&
notStandaloneHandler &&
!notStandaloneHandler(handlerArg))
return XML_ERROR_NOT_STANDALONE;
break;
/* Element declaration stuff */
case XML_ROLE_ELEMENT_NAME:
if (elementDeclHandler) {
declElementType = getElementType(parser, enc, s, next);
if (!declElementType)
return XML_ERROR_NO_MEMORY;
dtd->scaffLevel = 0;
dtd->scaffCount = 0;
dtd->in_eldecl = XML_TRUE;
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_CONTENT_ANY:
case XML_ROLE_CONTENT_EMPTY:
if (dtd->in_eldecl) {
if (elementDeclHandler) {
XML_Content * content = (XML_Content *) MALLOC(sizeof(XML_Content));
if (!content)
return XML_ERROR_NO_MEMORY;
content->quant = XML_CQUANT_NONE;
content->name = NULL;
content->numchildren = 0;
content->children = NULL;
content->type = ((role == XML_ROLE_CONTENT_ANY) ?
XML_CTYPE_ANY :
XML_CTYPE_EMPTY);
*eventEndPP = s;
elementDeclHandler(handlerArg, declElementType->name, content);
handleDefault = XML_FALSE;
}
dtd->in_eldecl = XML_FALSE;
}
break;
case XML_ROLE_CONTENT_PCDATA:
if (dtd->in_eldecl) {
dtd->scaffold[dtd->scaffIndex[dtd->scaffLevel - 1]].type
= XML_CTYPE_MIXED;
if (elementDeclHandler)
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_CONTENT_ELEMENT:
quant = XML_CQUANT_NONE;
goto elementContent;
case XML_ROLE_CONTENT_ELEMENT_OPT:
quant = XML_CQUANT_OPT;
goto elementContent;
case XML_ROLE_CONTENT_ELEMENT_REP:
quant = XML_CQUANT_REP;
goto elementContent;
case XML_ROLE_CONTENT_ELEMENT_PLUS:
quant = XML_CQUANT_PLUS;
elementContent:
if (dtd->in_eldecl) {
ELEMENT_TYPE *el;
const XML_Char *name;
int nameLen;
const char *nxt = (quant == XML_CQUANT_NONE
? next
: next - enc->minBytesPerChar);
int myindex = nextScaffoldPart(parser);
if (myindex < 0)
return XML_ERROR_NO_MEMORY;
dtd->scaffold[myindex].type = XML_CTYPE_NAME;
dtd->scaffold[myindex].quant = quant;
el = getElementType(parser, enc, s, nxt);
if (!el)
return XML_ERROR_NO_MEMORY;
name = el->name;
dtd->scaffold[myindex].name = name;
nameLen = 0;
/* Count the characters of the name, including the terminating NUL. */
for (; name[nameLen++]; );
dtd->contentStringLen += nameLen;
if (elementDeclHandler)
handleDefault = XML_FALSE;
}
break;
case XML_ROLE_GROUP_CLOSE:
quant = XML_CQUANT_NONE;
goto closeGroup;
case XML_ROLE_GROUP_CLOSE_OPT:
quant = XML_CQUANT_OPT;
goto closeGroup;
case XML_ROLE_GROUP_CLOSE_REP:
quant = XML_CQUANT_REP;
goto closeGroup;
case XML_ROLE_GROUP_CLOSE_PLUS:
quant = XML_CQUANT_PLUS;
closeGroup:
if (dtd->in_eldecl) {
if (elementDeclHandler)
handleDefault = XML_FALSE;
dtd->scaffLevel--;
dtd->scaffold[dtd->scaffIndex[dtd->scaffLevel]].quant = quant;
if (dtd->scaffLevel == 0) {
if (!handleDefault) {
XML_Content *model = build_model(parser);
if (!model)
return XML_ERROR_NO_MEMORY;
*eventEndPP = s;
elementDeclHandler(handlerArg, declElementType->name, model);
}
dtd->in_eldecl = XML_FALSE;
dtd->contentStringLen = 0;
}
}
break;
/* End element declaration stuff */
case XML_ROLE_PI:
if (!reportProcessingInstruction(parser, enc, s, next))
return XML_ERROR_NO_MEMORY;
handleDefault = XML_FALSE;
break;
case XML_ROLE_COMMENT:
if (!reportComment(parser, enc, s, next))
return XML_ERROR_NO_MEMORY;
handleDefault = XML_FALSE;
break;
case XML_ROLE_NONE:
switch (tok) {
case XML_TOK_BOM:
handleDefault = XML_FALSE;
break;
}
break;
case XML_ROLE_DOCTYPE_NONE:
if (startDoctypeDeclHandler)
handleDefault = XML_FALSE;
break;
case XML_ROLE_ENTITY_NONE:
if (dtd->keepProcessing && entityDeclHandler)
handleDefault = XML_FALSE;
break;
case XML_ROLE_NOTATION_NONE:
if (notationDeclHandler)
handleDefault = XML_FALSE;
break;
case XML_ROLE_ATTLIST_NONE:
if (dtd->keepProcessing && attlistDeclHandler)
handleDefault = XML_FALSE;
break;
case XML_ROLE_ELEMENT_NONE:
if (elementDeclHandler)
handleDefault = XML_FALSE;
break;
} /* end of big switch */
if (handleDefault && defaultHandler)
reportDefault(parser, enc, s, next);
switch (ps_parsing) {
case XML_SUSPENDED:
*nextPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
return XML_ERROR_ABORTED;
default:
s = next;
tok = XmlPrologTok(enc, s, end, &next);
}
}
/* not reached */
}
static enum XML_Error PTRCALL
epilogProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
processor = epilogProcessor;
eventPtr = s;
for (;;) {
const char *next = NULL;
int tok = XmlPrologTok(encoding, s, end, &next);
eventEndPtr = next;
switch (tok) {
/* report partial linebreak - it might be the last token */
case -XML_TOK_PROLOG_S:
if (defaultHandler) {
reportDefault(parser, encoding, s, next);
if (ps_parsing == XML_FINISHED)
return XML_ERROR_ABORTED;
}
*nextPtr = next;
return XML_ERROR_NONE;
case XML_TOK_NONE:
*nextPtr = s;
return XML_ERROR_NONE;
case XML_TOK_PROLOG_S:
if (defaultHandler)
reportDefault(parser, encoding, s, next);
break;
case XML_TOK_PI:
if (!reportProcessingInstruction(parser, encoding, s, next))
return XML_ERROR_NO_MEMORY;
break;
case XML_TOK_COMMENT:
if (!reportComment(parser, encoding, s, next))
return XML_ERROR_NO_MEMORY;
break;
case XML_TOK_INVALID:
eventPtr = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL:
if (!ps_finalBuffer) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_UNCLOSED_TOKEN;
case XML_TOK_PARTIAL_CHAR:
if (!ps_finalBuffer) {
*nextPtr = s;
return XML_ERROR_NONE;
}
return XML_ERROR_PARTIAL_CHAR;
default:
return XML_ERROR_JUNK_AFTER_DOC_ELEMENT;
}
eventPtr = s = next;
switch (ps_parsing) {
case XML_SUSPENDED:
*nextPtr = next;
return XML_ERROR_NONE;
case XML_FINISHED:
return XML_ERROR_ABORTED;
default: ;
}
}
}
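/* Push 'entity' onto the list of open internal entities and parse its
   replacement text. If the parse is suspended before the text has been
   consumed, internalEntityProcessor is installed to resume it later.
*/
static enum XML_Error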
processInternalEntity(XML_Parser parser, ENTITY *entity,
XML_Bool betweenDecl)
{
const char *textStart, *textEnd;
const char *next;
enum XML_Error result;
OPEN_INTERNAL_ENTITY *openEntity;
if (freeInternalEntities) {
openEntity = freeInternalEntities;
freeInternalEntities = openEntity->next;
}
else {
openEntity = (OPEN_INTERNAL_ENTITY *)MALLOC(sizeof(OPEN_INTERNAL_ENTITY));
if (!openEntity)
return XML_ERROR_NO_MEMORY;
}
entity->open = XML_TRUE;
entity->processed = 0;
openEntity->next = openInternalEntities;
openInternalEntities = openEntity;
openEntity->entity = entity;
openEntity->startTagLevel = tagLevel;
openEntity->betweenDecl = betweenDecl;
openEntity->internalEventPtr = NULL;
openEntity->internalEventEndPtr = NULL;
textStart = (char *)entity->textPtr;
textEnd = (char *)(entity->textPtr + entity->textLen);
/* Set a safe default value in case 'next' does not get set */
next = textStart;
#ifdef XML_DTD
if (entity->is_param) {
int tok = XmlPrologTok(internalEncoding, textStart, textEnd, &next);
result = doProlog(parser, internalEncoding, textStart, textEnd, tok,
next, &next, XML_FALSE);
}
else
#endif /* XML_DTD */
result = doContent(parser, tagLevel, internalEncoding, textStart,
textEnd, &next, XML_FALSE);
if (result == XML_ERROR_NONE) {
if (textEnd != next && ps_parsing == XML_SUSPENDED) {
entity->processed = (int)(next - textStart);
processor = internalEntityProcessor;
}
else {
entity->open = XML_FALSE;
openInternalEntities = openEntity->next;
/* put openEntity back in list of free instances */
openEntity->next = freeInternalEntities;
freeInternalEntities = openEntity;
}
}
return result;
}
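/* Illustration only, not part of expat: processInternalEntity() above
   recycles OPEN_INTERNAL_ENTITY records through a free list instead of
   allocating a new one for every entity reference. A minimal sketch of
   that pattern follows; struct node, struct node_lists, node_open and
   node_close are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical record standing in for OPEN_INTERNAL_ENTITY.
struct node {
  struct node *next;
  int payload;
};

struct node_lists {
  struct node *in_use;  // stack of open records, most recent first
  struct node *spare;   // recycled records waiting to be reused
};

// Reuse a spare record if there is one, otherwise allocate, then push
// the record on the in_use stack. Returns NULL if allocation fails.
static struct node *
node_open(struct node_lists *l, int payload)
{
  struct node *n;
  assert(l != NULL);
  if (l->spare) {
    n = l->spare;
    l->spare = n->next;
  }
  else {
    n = malloc(sizeof *n);
    if (!n)
      return NULL;
  }
  n->payload = payload;
  n->next = l->in_use;
  l->in_use = n;
  return n;
}

// Pop the most recently opened record and park it on the spare list.
static void
node_close(struct node_lists *l)
{
  struct node *n;
  assert(l != NULL && l->in_use != NULL);
  n = l->in_use;
  l->in_use = n->next;
  n->next = l->spare;
  l->spare = n;
}
```

   Opening, closing and opening again hands back the same record, so
   nested entity expansion costs one allocation per nesting depth rather
   than one per reference.
*/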
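/* Resume parsing of the innermost open internal entity after a
   suspension. Once its text is consumed, close the entity and continue
   with the enclosing input in [s, end).
*/
static enum XML_Error PTRCALL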
internalEntityProcessor(XML_Parser parser,
const char *s,
const char *end,
const char **nextPtr)
{
ENTITY *entity;
const char *textStart, *textEnd;
const char *next;
enum XML_Error result;
OPEN_INTERNAL_ENTITY *openEntity = openInternalEntities;
if (!openEntity)
return XML_ERROR_UNEXPECTED_STATE;
entity = openEntity->entity;
textStart = ((char *)entity->textPtr) + entity->processed;
textEnd = (char *)(entity->textPtr + entity->textLen);
/* Set a safe default value in case 'next' does not get set */
next = textStart;
#ifdef XML_DTD
if (entity->is_param) {
int tok = XmlPrologTok(internalEncoding, textStart, textEnd, &next);
result = doProlog(parser, internalEncoding, textStart, textEnd, tok,
next, &next, XML_FALSE);
}
else
#endif /* XML_DTD */
result = doContent(parser, openEntity->startTagLevel, internalEncoding,
textStart, textEnd, &next, XML_FALSE);
if (result != XML_ERROR_NONE)
return result;
else if (textEnd != next && ps_parsing == XML_SUSPENDED) {
entity->processed = (int)(next - (char *)entity->textPtr);
return result;
}
else {
entity->open = XML_FALSE;
openInternalEntities = openEntity->next;
/* put openEntity back in list of free instances */
openEntity->next = freeInternalEntities;
freeInternalEntities = openEntity;
}
#ifdef XML_DTD
if (entity->is_param) {
int tok;
processor = prologProcessor;
tok = XmlPrologTok(encoding, s, end, &next);
return doProlog(parser, encoding, s, end, tok, next, nextPtr,
(XML_Bool)!ps_finalBuffer);
}
else
#endif /* XML_DTD */
{
processor = contentProcessor;
/* see externalEntityContentProcessor vs contentProcessor */
return doContent(parser, parentParser ? 1 : 0, encoding, s, end,
nextPtr, (XML_Bool)!ps_finalBuffer);
}
}
static enum XML_Error PTRCALL
errorProcessor(XML_Parser parser,
const char *UNUSED_P(s),
const char *UNUSED_P(end),
const char **UNUSED_P(nextPtr))
{
return errorCode;
}
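/* Append the normalized attribute value in [ptr, end) to 'pool' and
   NUL-terminate it. For non-CDATA attributes a trailing space is removed.
*/
static enum XML_Error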
storeAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
const char *ptr, const char *end,
STRING_POOL *pool)
{
enum XML_Error result = appendAttributeValue(parser, enc, isCdata, ptr,
end, pool);
if (result)
return result;
if (!isCdata && poolLength(pool) && poolLastChar(pool) == 0x20)
poolChop(pool);
if (!poolAppendChar(pool, XML_T('\0')))
return XML_ERROR_NO_MEMORY;
return XML_ERROR_NONE;
}
static enum XML_Error
appendAttributeValue(XML_Parser parser, const ENCODING *enc, XML_Bool isCdata,
const char *ptr, const char *end,
STRING_POOL *pool)
{
DTD * const dtd = _dtd; /* save one level of indirection */
for (;;) {
const char *next;
int tok = XmlAttributeValueTok(enc, ptr, end, &next);
switch (tok) {
case XML_TOK_NONE:
return XML_ERROR_NONE;
case XML_TOK_INVALID:
if (enc == encoding)
eventPtr = next;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_PARTIAL:
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_INVALID_TOKEN;
case XML_TOK_CHAR_REF:
{
XML_Char buf[XML_ENCODE_MAX];
int i;
int n = XmlCharRefNumber(enc, ptr);
if (n < 0) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_BAD_CHAR_REF;
}
if (!isCdata
&& n == 0x20 /* space */
&& (poolLength(pool) == 0 || poolLastChar(pool) == 0x20))
break;
n = XmlEncode(n, (ICHAR *)buf);
if (!n) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_BAD_CHAR_REF;
}
for (i = 0; i < n; i++) {
if (!poolAppendChar(pool, buf[i]))
return XML_ERROR_NO_MEMORY;
}
}
break;
case XML_TOK_DATA_CHARS:
if (!poolAppend(pool, enc, ptr, next))
return XML_ERROR_NO_MEMORY;
break;
case XML_TOK_TRAILING_CR:
next = ptr + enc->minBytesPerChar;
/* fall through */
case XML_TOK_ATTRIBUTE_VALUE_S:
case XML_TOK_DATA_NEWLINE:
if (!isCdata && (poolLength(pool) == 0 || poolLastChar(pool) == 0x20))
break;
if (!poolAppendChar(pool, 0x20))
return XML_ERROR_NO_MEMORY;
break;
case XML_TOK_ENTITY_REF:
{
const XML_Char *name;
ENTITY *entity;
char checkEntityDecl;
XML_Char ch = (XML_Char) XmlPredefinedEntityName(enc,
ptr + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (ch) {
if (!poolAppendChar(pool, ch))
return XML_ERROR_NO_MEMORY;
break;
}
name = poolStoreString(&temp2Pool, enc,
ptr + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!name)
return XML_ERROR_NO_MEMORY;
entity = (ENTITY *)lookup(parser, &dtd->generalEntities, name, 0);
poolDiscard(&temp2Pool);
/* First, determine if a check for an existing declaration is needed;
if yes, check that the entity exists, and that it is internal.
*/
if (pool == &dtd->pool) /* are we called from prolog? */
checkEntityDecl =
#ifdef XML_DTD
prologState.documentEntity &&
#endif /* XML_DTD */
(dtd->standalone
? !openInternalEntities
: !dtd->hasParamEntityRefs);
else /* if (pool == &tempPool): we are called from content */
checkEntityDecl = !dtd->hasParamEntityRefs || dtd->standalone;
if (checkEntityDecl) {
if (!entity)
return XML_ERROR_UNDEFINED_ENTITY;
else if (!entity->is_internal)
return XML_ERROR_ENTITY_DECLARED_IN_PE;
}
else if (!entity) {
/* Cannot report skipped entity here - see comments on
skippedEntityHandler.
if (skippedEntityHandler)
skippedEntityHandler(handlerArg, name, 0);
*/
/* Cannot call the default handler because this would be
out of sync with the call to the startElementHandler.
if ((pool == &tempPool) && defaultHandler)
reportDefault(parser, enc, ptr, next);
*/
break;
}
if (entity->open) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_RECURSIVE_ENTITY_REF;
}
if (entity->notation) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_BINARY_ENTITY_REF;
}
if (!entity->textPtr) {
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF;
}
else {
enum XML_Error result;
const XML_Char *textEnd = entity->textPtr + entity->textLen;
entity->open = XML_TRUE;
result = appendAttributeValue(parser, internalEncoding, isCdata,
(char *)entity->textPtr,
(char *)textEnd, pool);
entity->open = XML_FALSE;
if (result)
return result;
}
}
break;
default:
if (enc == encoding)
eventPtr = ptr;
return XML_ERROR_UNEXPECTED_STATE;
}
ptr = next;
}
/* not reached */
}
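/* Illustration only, not part of expat: the whitespace handling in
   appendAttributeValue() and storeAttributeValue() above can be sketched
   on a plain ASCII string. The helper name normalize_attr_value is
   hypothetical, and the sketch assumes line endings were already
   converted to "\n"; references and encodings are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Writes the normalized form of 'in' to 'out', which must have room for
// at least strlen(in) + 1 bytes.
static void
normalize_attr_value(const char *in, char *out, int is_cdata)
{
  size_t len = 0;
  assert(in != NULL && out != NULL);
  for (; *in; in++) {
    if (*in == ' ' || *in == '\t' || *in == '\n') {
      // Non-CDATA: drop leading spaces and collapse runs of spaces.
      if (!is_cdata && (len == 0 || out[len - 1] == ' '))
        continue;
      out[len++] = ' ';
    }
    else
      out[len++] = *in;
  }
  // Non-CDATA: remove the one trailing space that may remain.
  if (!is_cdata && len > 0 && out[len - 1] == ' ')
    len--;
  out[len] = '\0';
}
```

   With is_cdata set, every whitespace character becomes one space and
   nothing is collapsed or trimmed.
*/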
static enum XML_Error
storeEntityValue(XML_Parser parser,
const ENCODING *enc,
const char *entityTextPtr,
const char *entityTextEnd)
{
DTD * const dtd = _dtd; /* save one level of indirection */
STRING_POOL *pool = &(dtd->entityValuePool);
enum XML_Error result = XML_ERROR_NONE;
#ifdef XML_DTD
int oldInEntityValue = prologState.inEntityValue;
prologState.inEntityValue = 1;
#endif /* XML_DTD */
/* Never pass NULL as the value argument to the EntityDeclHandler,
since that would indicate an external entity; therefore we have to
make sure that entityValuePool.start is not NULL. */
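if (!pool->blocks) {
if (!poolGrow(pool)) {
/* leave through endEntityValue so prologState.inEntityValue is restored */
result = XML_ERROR_NO_MEMORY;
goto endEntityValue;
}
}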
for (;;) {
const char *next;
int tok = XmlEntityValueTok(enc, entityTextPtr, entityTextEnd, &next);
switch (tok) {
case XML_TOK_PARAM_ENTITY_REF:
#ifdef XML_DTD
if (isParamEntity || enc != encoding) {
const XML_Char *name;
ENTITY *entity;
name = poolStoreString(&tempPool, enc,
entityTextPtr + enc->minBytesPerChar,
next - enc->minBytesPerChar);
if (!name) {
result = XML_ERROR_NO_MEMORY;
goto endEntityValue;
}
entity = (ENTITY *)lookup(parser, &dtd->paramEntities, name, 0);
poolDiscard(&tempPool);
if (!entity) {
/* not a well-formedness error - see XML 1.0: WFC Entity Declared */
/* cannot report skipped entity here - see comments on
skippedEntityHandler
if (skippedEntityHandler)
skippedEntityHandler(handlerArg, name, 0);
*/
dtd->keepProcessing = dtd->standalone;
goto endEntityValue;
}
if (entity->open) {
if (enc == encoding)
eventPtr = entityTextPtr;
result = XML_ERROR_RECURSIVE_ENTITY_REF;
goto endEntityValue;
}
if (entity->systemId) {
if (externalEntityRefHandler) {
dtd->paramEntityRead = XML_FALSE;
entity->open = XML_TRUE;
if (!externalEntityRefHandler(externalEntityRefHandlerArg,
0,
entity->base,
entity->systemId,
entity->publicId)) {
entity->open = XML_FALSE;
result = XML_ERROR_EXTERNAL_ENTITY_HANDLING;
goto endEntityValue;
}
entity->open = XML_FALSE;
if (!dtd->paramEntityRead)
dtd->keepProcessing = dtd->standalone;
}
else
dtd->keepProcessing = dtd->standalone;
}
else {
entity->open = XML_TRUE;
result = storeEntityValue(parser,
internalEncoding,
(char *)entity->textPtr,
(char *)(entity->textPtr
+ entity->textLen));
entity->open = XML_FALSE;
if (result)
goto endEntityValue;
}
break;
}
#endif /* XML_DTD */
/* In the internal subset, PE references are not legal
within markup declarations, e.g. entity values in this case. */
eventPtr = entityTextPtr;
result = XML_ERROR_PARAM_ENTITY_REF;
goto endEntityValue;
case XML_TOK_NONE:
result = XML_ERROR_NONE;
goto endEntityValue;
case XML_TOK_ENTITY_REF:
case XML_TOK_DATA_CHARS:
if (!poolAppend(pool, enc, entityTextPtr, next)) {
result = XML_ERROR_NO_MEMORY;
goto endEntityValue;
}
break;
case XML_TOK_TRAILING_CR:
next = entityTextPtr + enc->minBytesPerChar;
/* fall through */
case XML_TOK_DATA_NEWLINE:
if (pool->end == pool->ptr && !poolGrow(pool)) {
result = XML_ERROR_NO_MEMORY;
goto endEntityValue;
}
*(pool->ptr)++ = 0xA;
break;
case XML_TOK_CHAR_REF:
{
XML_Char buf[XML_ENCODE_MAX];
int i;
int n = XmlCharRefNumber(enc, entityTextPtr);
if (n < 0) {
if (enc == encoding)
eventPtr = entityTextPtr;
result = XML_ERROR_BAD_CHAR_REF;
goto endEntityValue;
}
n = XmlEncode(n, (ICHAR *)buf);
if (!n) {
if (enc == encoding)
eventPtr = entityTextPtr;
result = XML_ERROR_BAD_CHAR_REF;
goto endEntityValue;
}
for (i = 0; i < n; i++) {
if (pool->end == pool->ptr && !poolGrow(pool)) {
result = XML_ERROR_NO_MEMORY;
goto endEntityValue;
}
*(pool->ptr)++ = buf[i];
}
}
break;
case XML_TOK_PARTIAL:
if (enc == encoding)
eventPtr = entityTextPtr;
result = XML_ERROR_INVALID_TOKEN;
goto endEntityValue;
case XML_TOK_INVALID:
if (enc == encoding)
eventPtr = next;
result = XML_ERROR_INVALID_TOKEN;
goto endEntityValue;
default:
if (enc == encoding)
eventPtr = entityTextPtr;
result = XML_ERROR_UNEXPECTED_STATE;
goto endEntityValue;
}
entityTextPtr = next;
}
endEntityValue:
#ifdef XML_DTD
prologState.inEntityValue = oldInEntityValue;
#endif /* XML_DTD */
return result;
}
/* Convert CR and CRLF line endings in 's' to LF, in place. */
static void FASTCALL
normalizeLines(XML_Char *s)
{
XML_Char *p;
for (;; s++) {
if (*s == XML_T('\0'))
return;
if (*s == 0xD)
break;
}
p = s;
do {
if (*s == 0xD) {
*p++ = 0xA;
if (*++s == 0xA)
s++;
}
else
*p++ = *s++;
} while (*s);
*p = XML_T('\0');
}
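/* Illustration only, not part of expat: normalizeLines() above can be
   tried on its own with plain char instead of XML_Char. The name
   normalize_lines is hypothetical; the sketch uses strchr() to find the
   first carriage return, otherwise the steps are the same.

```c
#include <assert.h>
#include <string.h>

static void
normalize_lines(char *s)
{
  char *p;
  assert(s != NULL);
  // Nothing needs rewriting before the first carriage return.
  s = strchr(s, '\r');
  if (s == NULL)
    return;
  p = s;
  while (*s) {
    if (*s == '\r') {
      *p++ = '\n';
      s++;
      if (*s == '\n')  // swallow the LF of a CRLF pair
        s++;
    }
    else
      *p++ = *s++;
  }
  *p = '\0';
}
```

   The write pointer never overtakes the read pointer, so the rewrite is
   safe in place and the result is never longer than the input.
*/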
static int
reportProcessingInstruction(XML_Parser parser, const ENCODING *enc,
const char *start, const char *end)
{
const XML_Char *target;
XML_Char *data;
const char *tem;
if (!processingInstructionHandler) {
if (defaultHandler)
reportDefault(parser, enc, start, end);
return 1;
}
start += enc->minBytesPerChar * 2;
tem = start + XmlNameLength(enc, start);
target = poolStoreString(&tempPool, enc, start, tem);
if (!target)
return 0;
poolFinish(&tempPool);
data = poolStoreString(&tempPool, enc,
XmlSkipS(enc, tem),
end - enc->minBytesPerChar*2);
if (!data)
return 0;
normalizeLines(data);
processingInstructionHandler(handlerArg, target, data);
poolClear(&tempPool);
return 1;
}
static int
reportComment(XML_Parser parser, const ENCODING *enc,
const char *start, const char *end)
{
XML_Char *data;
if (!commentHandler) {
if (defaultHandler)
reportDefault(parser, enc, start, end);
return 1;
}
data = poolStoreString(&tempPool,
enc,
start + enc->minBytesPerChar * 4,
end - enc->minBytesPerChar * 3);
if (!data)
return 0;
normalizeLines(data);
commentHandler(handlerArg, data);
poolClear(&tempPool);
return 1;
}
static void
reportDefault(XML_Parser parser, const ENCODING *enc,
const char *s, const char *end)
{
if (MUST_CONVERT(enc, s)) {
enum XML_Convert_Result convert_res;
const char **eventPP;
const char **eventEndPP;
if (enc == encoding) {
eventPP = &eventPtr;
eventEndPP = &eventEndPtr;
}
else {
eventPP = &(openInternalEntities->internalEventPtr);
eventEndPP = &(openInternalEntities->internalEventEndPtr);
}
do {
ICHAR *dataPtr = (ICHAR *)dataBuf;
convert_res = XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd);
*eventEndPP = s;
defaultHandler(handlerArg, dataBuf, (int)(dataPtr - (ICHAR *)dataBuf));
*eventPP = s;
} while ((convert_res != XML_CONVERT_COMPLETED) && (convert_res != XML_CONVERT_INPUT_INCOMPLETE));
}
else
defaultHandler(handlerArg, (XML_Char *)s, (int)((XML_Char *)end - (XML_Char *)s));
}
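/* Record a default attribute for element type 'type'. A repeated
   declaration of an attribute that has a default value or ID type is
   ignored. Returns 0 if out of memory, 1 otherwise.
*/
static int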
defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *attId, XML_Bool isCdata,
XML_Bool isId, const XML_Char *value, XML_Parser parser)
{
DEFAULT_ATTRIBUTE *att;
if (value || isId) {
/* The handling of default attributes gets messed up if we have
a default which duplicates a non-default. */
int i;
for (i = 0; i < type->nDefaultAtts; i++)
if (attId == type->defaultAtts[i].id)
return 1;
if (isId && !type->idAtt && !attId->xmlns)
type->idAtt = attId;
}
if (type->nDefaultAtts == type->allocDefaultAtts) {
if (type->allocDefaultAtts == 0) {
type->allocDefaultAtts = 8;
type->defaultAtts = (DEFAULT_ATTRIBUTE *)MALLOC(type->allocDefaultAtts
* sizeof(DEFAULT_ATTRIBUTE));
if (!type->defaultAtts)
return 0;
}
else {
DEFAULT_ATTRIBUTE *temp;
int count = type->allocDefaultAtts * 2;
temp = (DEFAULT_ATTRIBUTE *)
REALLOC(type->defaultAtts, (count * sizeof(DEFAULT_ATTRIBUTE)));
if (temp == NULL)
return 0;
type->allocDefaultAtts = count;
type->defaultAtts = temp;
}
}
att = type->defaultAtts + type->nDefaultAtts;
att->id = attId;
att->value = value;
att->isCdata = isCdata;
if (!isCdata)
attId->maybeTokenized = XML_TRUE;
type->nDefaultAtts += 1;
return 1;
}
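/* Illustration only, not part of expat: defineAttribute() above grows
   its array from 8 slots by doubling. A minimal sketch of that growth
   policy, with the hypothetical names struct int_vec and int_vec_push;
   overflow of the capacity is not handled here.

```c
#include <assert.h>
#include <stdlib.h>

struct int_vec {
  int *items;
  int count;
  int capacity;
};

// Append 'value'. Returns 1 on success, or 0 if memory runs out, in
// which case the vector is left unchanged.
static int
int_vec_push(struct int_vec *v, int value)
{
  assert(v != NULL);
  if (v->count == v->capacity) {
    int new_cap = v->capacity == 0 ? 8 : v->capacity * 2;
    // realloc(NULL, n) behaves like malloc(n), so one call covers both.
    int *tmp = realloc(v->items, (size_t)new_cap * sizeof *tmp);
    if (tmp == NULL)
      return 0;
    v->items = tmp;
    v->capacity = new_cap;
  }
  v->items[v->count++] = value;
  return 1;
}
```

   Assigning the realloc() result to a temporary first, as the original
   does, keeps the old block reachable when the call fails.
*/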
static int
setElementTypePrefix(XML_Parser parser, ELEMENT_TYPE *elementType)
{
DTD * const dtd = _dtd; /* save one level of indirection */
const XML_Char *name;
for (name = elementType->name; *name; name++) {
if (*name == XML_T(ASCII_COLON)) {
PREFIX *prefix;
const XML_Char *s;
for (s = elementType->name; s != name; s++) {
if (!poolAppendChar(&dtd->pool, *s))
return 0;
}
if (!poolAppendChar(&dtd->pool, XML_T('\0')))
return 0;
prefix = (PREFIX *)lookup(parser, &dtd->prefixes, poolStart(&dtd->pool),
sizeof(PREFIX));
if (!prefix)
return 0;
if (prefix->name == poolStart(&dtd->pool))
poolFinish(&dtd->pool);
else
poolDiscard(&dtd->pool);
elementType->prefix = prefix;
}
}
return 1;
}
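/* Look up, or create, the ATTRIBUTE_ID for the name in [start, end).
   With namespace processing enabled, also record the prefix of the
   attribute and whether it is an xmlns declaration. Returns NULL if out
   of memory.
*/
static ATTRIBUTE_ID *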
getAttributeId(XML_Parser parser, const ENCODING *enc,
const char *start, const char *end)
{
DTD * const dtd = _dtd; /* save one level of indirection */
ATTRIBUTE_ID *id;
const XML_Char *name;
if (!poolAppendChar(&dtd->pool, XML_T('\0')))
return NULL;
name = poolStoreString(&dtd->pool, enc, start, end);
if (!name)
return NULL;
/* skip the scratch byte appended above - its storage will be re-used
(it is accessed as name[-1]) */
++name;
id = (ATTRIBUTE_ID *)lookup(parser, &dtd->attributeIds, name, sizeof(ATTRIBUTE_ID));
if (!id)
return NULL;
if (id->name != name)
poolDiscard(&dtd->pool);
else {
poolFinish(&dtd->pool);
if (!ns)
;
else if (name[0] == XML_T(ASCII_x)
&& name[1] == XML_T(ASCII_m)
&& name[2] == XML_T(ASCII_l)
&& name[3] == XML_T(ASCII_n)
&& name[4] == XML_T(ASCII_s)
&& (name[5] == XML_T('\0') || name[5] == XML_T(ASCII_COLON))) {
if (name[5] == XML_T('\0'))
id->prefix = &dtd->defaultPrefix;
else
id->prefix = (PREFIX *)lookup(parser, &dtd->prefixes, name + 6, sizeof(PREFIX));
id->xmlns = XML_TRUE;
}
else {
int i;
for (i = 0; name[i]; i++) {
/* attributes without prefix are *not* in the default namespace */
if (name[i] == XML_T(ASCII_COLON)) {
int j;
for (j = 0; j < i; j++) {
if (!poolAppendChar(&dtd->pool, name[j]))
return NULL;
}
if (!poolAppendChar(&dtd->pool, XML_T('\0')))
return NULL;
id->prefix = (PREFIX *)lookup(parser, &dtd->prefixes, poolStart(&dtd->pool),
sizeof(PREFIX));
if (!id->prefix)
return NULL;
if (id->prefix->name == poolStart(&dtd->pool))
poolFinish(&dtd->pool);
else
poolDiscard(&dtd->pool);
break;
}
}
}
}
return id;
}
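/* The context string built by getContext and parsed by setContext is a
   list of entries separated by CONTEXT_SEP. An entry is either
   "prefix=uri" (an empty prefix denotes the default namespace) or the
   name of an open general entity.
*/
#define CONTEXT_SEP XML_T(ASCII_FF)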
static const XML_Char *
getContext(XML_Parser parser)
{
DTD * const dtd = _dtd; /* save one level of indirection */
HASH_TABLE_ITER iter;
XML_Bool needSep = XML_FALSE;
if (dtd->defaultPrefix.binding) {
int i;
int len;
if (!poolAppendChar(&tempPool, XML_T(ASCII_EQUALS)))
return NULL;
len = dtd->defaultPrefix.binding->uriLen;
if (namespaceSeparator)
len--;
for (i = 0; i < len; i++)
if (!poolAppendChar(&tempPool, dtd->defaultPrefix.binding->uri[i]))
return NULL;
needSep = XML_TRUE;
}
hashTableIterInit(&iter, &(dtd->prefixes));
for (;;) {
int i;
int len;
const XML_Char *s;
PREFIX *prefix = (PREFIX *)hashTableIterNext(&iter);
if (!prefix)
break;
if (!prefix->binding)
continue;
if (needSep && !poolAppendChar(&tempPool, CONTEXT_SEP))
return NULL;
for (s = prefix->name; *s; s++)
if (!poolAppendChar(&tempPool, *s))
return NULL;
if (!poolAppendChar(&tempPool, XML_T(ASCII_EQUALS)))
return NULL;
len = prefix->binding->uriLen;
if (namespaceSeparator)
len--;
for (i = 0; i < len; i++)
if (!poolAppendChar(&tempPool, prefix->binding->uri[i]))
return NULL;
needSep = XML_TRUE;
}
hashTableIterInit(&iter, &(dtd->generalEntities));
for (;;) {
const XML_Char *s;
ENTITY *e = (ENTITY *)hashTableIterNext(&iter);
if (!e)
break;
if (!e->open)
continue;
if (needSep && !poolAppendChar(&tempPool, CONTEXT_SEP))
return NULL;
for (s = e->name; *s; s++)
if (!poolAppendChar(&tempPool, *s))
return NULL;
needSep = XML_TRUE;
}
if (!poolAppendChar(&tempPool, XML_T('\0')))
return NULL;
return tempPool.start;
}
static XML_Bool
setContext(XML_Parser parser, const XML_Char *context)
{
DTD * const dtd = _dtd; /* save one level of indirection */
const XML_Char *s = context;
while (*context != XML_T('\0')) {
if (*s == CONTEXT_SEP || *s == XML_T('\0')) {
ENTITY *e;
if (!poolAppendChar(&tempPool, XML_T('\0')))
return XML_FALSE;
e = (ENTITY *)lookup(parser, &dtd->generalEntities, poolStart(&tempPool), 0);
if (e)
e->open = XML_TRUE;
if (*s != XML_T('\0'))
s++;
context = s;
poolDiscard(&tempPool);
}
else if (*s == XML_T(ASCII_EQUALS)) {
PREFIX *prefix;
if (poolLength(&tempPool) == 0)
prefix = &dtd->defaultPrefix;
else {
if (!poolAppendChar(&tempPool, XML_T('\0')))
return XML_FALSE;
prefix = (PREFIX *)lookup(parser, &dtd->prefixes, poolStart(&tempPool),
sizeof(PREFIX));
if (!prefix)
return XML_FALSE;
if (prefix->name == poolStart(&tempPool)) {
prefix->name = poolCopyString(&dtd->pool, prefix->name);
if (!prefix->name)
return XML_FALSE;
}
poolDiscard(&tempPool);
}
for (context = s + 1;
*context != CONTEXT_SEP && *context != XML_T('\0');
context++)
if (!poolAppendChar(&tempPool, *context))
return XML_FALSE;
if (!poolAppendChar(&tempPool, XML_T('\0')))
return XML_FALSE;
if (addBinding(parser, prefix, NULL, poolStart(&tempPool),
&inheritedBindings) != XML_ERROR_NONE)
return XML_FALSE;
poolDiscard(&tempPool);
if (*context != XML_T('\0'))
++context;
s = context;
}
else {
if (!poolAppendChar(&tempPool, *s))
return XML_FALSE;
s++;
}
}
return XML_TRUE;
}
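/* Illustration only, not part of expat: the context string handled by
   getContext() and setContext() above is a form-feed separated list in
   which "prefix=uri" entries are namespace bindings and bare names are
   open general entities. A small sketch that classifies the entries;
   struct context_summary and summarize_context are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct context_summary {
  int bindings;       // "prefix=uri" entries, empty prefix included
  int open_entities;  // bare entity names
};

static struct context_summary
summarize_context(const char *context)
{
  struct context_summary sum = {0, 0};
  assert(context != NULL);
  while (*context) {
    // An entry runs up to the next form feed or the end of the string.
    size_t len = strcspn(context, "\f");
    if (memchr(context, '=', len) != NULL)
      sum.bindings++;
    else
      sum.open_entities++;
    context += len;
    if (*context == '\f')
      context++;
  }
  return sum;
}
```

   For example, "=http://a" followed by a form feed and "ent" describes
   one default-namespace binding and one open entity named ent.
*/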
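/* Normalize a public identifier in place: collapse each run of space,
   CR and LF characters into a single space and strip leading and
   trailing spaces.
*/
static void FASTCALL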
normalizePublicId(XML_Char *publicId)
{
XML_Char *p = publicId;
XML_Char *s;
for (s = publicId; *s; s++) {
switch (*s) {
case 0x20:
case 0xD:
case 0xA:
if (p != publicId && p[-1] != 0x20)
*p++ = 0x20;
break;
default:
*p++ = *s;
}
}
if (p != publicId && p[-1] == 0x20)
--p;
*p = XML_T('\0');
}
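/* Illustration only, not part of expat: normalizePublicId() above can be
   exercised on its own with plain char instead of XML_Char. The name
   normalize_public_id is hypothetical; the logic mirrors the function
   above.

```c
#include <assert.h>
#include <string.h>

static void
normalize_public_id(char *public_id)
{
  char *p = public_id;
  const char *s;
  assert(public_id != NULL);
  for (s = public_id; *s; s++) {
    if (*s == ' ' || *s == '\r' || *s == '\n') {
      // Emit one space per run, and none at the very start.
      if (p != public_id && p[-1] != ' ')
        *p++ = ' ';
    }
    else
      *p++ = *s;
  }
  // A run at the end leaves exactly one space behind; remove it.
  if (p != public_id && p[-1] == ' ')
    --p;
  *p = '\0';
}
```

   An identifier padded with spaces and containing a line break inside
   comes out with single spaces between its words and none at the ends.
*/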
static DTD *
dtdCreate(const XML_Memory_Handling_Suite *ms)
{
DTD *p = (DTD *)ms->malloc_fcn(sizeof(DTD));
if (p == NULL)
return p;
poolInit(&(p->pool), ms);
poolInit(&(p->entityValuePool), ms);
hashTableInit(&(p->generalEntities), ms);
hashTableInit(&(p->elementTypes), ms);
hashTableInit(&(p->attributeIds), ms);
hashTableInit(&(p->prefixes), ms);
#ifdef XML_DTD
p->paramEntityRead = XML_FALSE;
hashTableInit(&(p->paramEntities), ms);
#endif /* XML_DTD */
p->defaultPrefix.name = NULL;
p->defaultPrefix.binding = NULL;
p->in_eldecl = XML_FALSE;
p->scaffIndex = NULL;
p->scaffold = NULL;
p->scaffLevel = 0;
p->scaffSize = 0;
p->scaffCount = 0;
p->contentStringLen = 0;
p->keepProcessing = XML_TRUE;
p->hasParamEntityRefs = XML_FALSE;
p->standalone = XML_FALSE;
return p;
}
static void
dtdReset(DTD *p, const XML_Memory_Handling_Suite *ms)
{
HASH_TABLE_ITER iter;
hashTableIterInit(&iter, &(p->elementTypes));
for (;;) {
ELEMENT_TYPE *e = (ELEMENT_TYPE *)hashTableIterNext(&iter);
if (!e)
break;
if (e->allocDefaultAtts != 0)
ms->free_fcn(e->defaultAtts);
}
hashTableClear(&(p->generalEntities));
#ifdef XML_DTD
p->paramEntityRead = XML_FALSE;
hashTableClear(&(p->paramEntities));
#endif /* XML_DTD */
hashTableClear(&(p->elementTypes));
hashTableClear(&(p->attributeIds));
hashTableClear(&(p->prefixes));
poolClear(&(p->pool));
poolClear(&(p->entityValuePool));
p->defaultPrefix.name = NULL;
p->defaultPrefix.binding = NULL;
p->in_eldecl = XML_FALSE;
ms->free_fcn(p->scaffIndex);
p->scaffIndex = NULL;
ms->free_fcn(p->scaffold);
p->scaffold = NULL;
p->scaffLevel = 0;
p->scaffSize = 0;
p->scaffCount = 0;
p->contentStringLen = 0;
p->keepProcessing = XML_TRUE;
p->hasParamEntityRefs = XML_FALSE;
p->standalone = XML_FALSE;
}
static void
dtdDestroy(DTD *p, XML_Bool isDocEntity, const XML_Memory_Handling_Suite *ms)
{
HASH_TABLE_ITER iter;
hashTableIterInit(&iter, &(p->elementTypes));
for (;;) {
ELEMENT_TYPE *e = (ELEMENT_TYPE *)hashTableIterNext(&iter);
if (!e)
break;
if (e->allocDefaultAtts != 0)
ms->free_fcn(e->defaultAtts);
}
hashTableDestroy(&(p->generalEntities));
#ifdef XML_DTD
hashTableDestroy(&(p->paramEntities));
#endif /* XML_DTD */
hashTableDestroy(&(p->elementTypes));
hashTableDestroy(&(p->attributeIds));
hashTableDestroy(&(p->prefixes));
poolDestroy(&(p->pool));
poolDestroy(&(p->entityValuePool));
if (isDocEntity) {
ms->free_fcn(p->scaffIndex);
ms->free_fcn(p->scaffold);
}
ms->free_fcn(p);
}
/* Do a deep copy of the DTD. Return 0 for out of memory, non-zero otherwise.
The new DTD has already been initialized.
*/
static int
dtdCopy(XML_Parser oldParser, DTD *newDtd, const DTD *oldDtd, const XML_Memory_Handling_Suite *ms)
{
HASH_TABLE_ITER iter;
/* Copy the prefix table. */
hashTableIterInit(&iter, &(oldDtd->prefixes));
for (;;) {
const XML_Char *name;
const PREFIX *oldP = (PREFIX *)hashTableIterNext(&iter);
if (!oldP)
break;
name = poolCopyString(&(newDtd->pool), oldP->name);
if (!name)
return 0;
if (!lookup(oldParser, &(newDtd->prefixes), name, sizeof(PREFIX)))
return 0;
}
/* Copy the attribute id table. */
hashTableIterInit(&iter, &(oldDtd->attributeIds));
for (;;) {
ATTRIBUTE_ID *newA;
const XML_Char *name;
const ATTRIBUTE_ID *oldA = (ATTRIBUTE_ID *)hashTableIterNext(&iter);
if (!oldA)
break;
/* Remember to allocate the scratch byte before the name. */
if (!poolAppendChar(&(newDtd->pool), XML_T('\0')))
return 0;
name = poolCopyString(&(newDtd->pool), oldA->name);
if (!name)
return 0;
++name;
newA = (ATTRIBUTE_ID *)lookup(oldParser, &(newDtd->attributeIds), name,
sizeof(ATTRIBUTE_ID));
if (!newA)
return 0;
newA->maybeTokenized = oldA->maybeTokenized;
if (oldA->prefix) {
newA->xmlns = oldA->xmlns;
if (oldA->prefix == &oldDtd->defaultPrefix)
newA->prefix = &newDtd->defaultPrefix;
else
newA->prefix = (PREFIX *)lookup(oldParser, &(newDtd->prefixes),
oldA->prefix->name, 0);
}
}
/* Copy the element type table. */
hashTableIterInit(&iter, &(oldDtd->elementTypes));
for (;;) {
int i;
ELEMENT_TYPE *newE;
const XML_Char *name;
const ELEMENT_TYPE *oldE = (ELEMENT_TYPE *)hashTableIterNext(&iter);
if (!oldE)
break;
name = poolCopyString(&(newDtd->pool), oldE->name);
if (!name)
return 0;
newE = (ELEMENT_TYPE *)lookup(oldParser, &(newDtd->elementTypes), name,
sizeof(ELEMENT_TYPE));
if (!newE)
return 0;
if (oldE->nDefaultAtts) {
newE->defaultAtts = (DEFAULT_ATTRIBUTE *)
ms->malloc_fcn(oldE->nDefaultAtts * sizeof(DEFAULT_ATTRIBUTE));
if (!newE->defaultAtts) {
return 0;
}
}
if (oldE->idAtt)
newE->idAtt = (ATTRIBUTE_ID *)
lookup(oldParser, &(newDtd->attributeIds), oldE->idAtt->name, 0);
newE->allocDefaultAtts = newE->nDefaultAtts = oldE->nDefaultAtts;
if (oldE->prefix)
newE->prefix = (PREFIX *)lookup(oldParser, &(newDtd->prefixes),
oldE->prefix->name, 0);
for (i = 0; i < newE->nDefaultAtts; i++) {
newE->defaultAtts[i].id = (ATTRIBUTE_ID *)
lookup(oldParser, &(newDtd->attributeIds), oldE->defaultAtts[i].id->name, 0);
newE->defaultAtts[i].isCdata = oldE->defaultAtts[i].isCdata;
if (oldE->defaultAtts[i].value) {
newE->defaultAtts[i].value
= poolCopyString(&(newDtd->pool), oldE->defaultAtts[i].value);
if (!newE->defaultAtts[i].value)
return 0;
}
else
newE->defaultAtts[i].value = NULL;
}
}
/* Copy the entity tables. */
if (!copyEntityTable(oldParser,
&(newDtd->generalEntities),
&(newDtd->pool),
&(oldDtd->generalEntities)))
return 0;
#ifdef XML_DTD
if (!copyEntityTable(oldParser,
&(newDtd->paramEntities),
&(newDtd->pool),
&(oldDtd->paramEntities)))
return 0;
newDtd->paramEntityRead = oldDtd->paramEntityRead;
#endif /* XML_DTD */
newDtd->keepProcessing = oldDtd->keepProcessing;
newDtd->hasParamEntityRefs = oldDtd->hasParamEntityRefs;
newDtd->standalone = oldDtd->standalone;
/* Don't want deep copying for scaffolding */
newDtd->in_eldecl = oldDtd->in_eldecl;
newDtd->scaffold = oldDtd->scaffold;
newDtd->contentStringLen = oldDtd->contentStringLen;
newDtd->scaffSize = oldDtd->scaffSize;
newDtd->scaffLevel = oldDtd->scaffLevel;
newDtd->scaffIndex = oldDtd->scaffIndex;
return 1;
} /* End dtdCopy */
static int
copyEntityTable(XML_Parser oldParser,
HASH_TABLE *newTable,
STRING_POOL *newPool,
const HASH_TABLE *oldTable)
{
HASH_TABLE_ITER iter;
const XML_Char *cachedOldBase = NULL;
const XML_Char *cachedNewBase = NULL;
hashTableIterInit(&iter, oldTable);
for (;;) {
ENTITY *newE;
const XML_Char *name;
const ENTITY *oldE = (ENTITY *)hashTableIterNext(&iter);
if (!oldE)
break;
name = poolCopyString(newPool, oldE->name);
if (!name)
return 0;
newE = (ENTITY *)lookup(oldParser, newTable, name, sizeof(ENTITY));
if (!newE)
return 0;
if (oldE->systemId) {
const XML_Char *tem = poolCopyString(newPool, oldE->systemId);
if (!tem)
return 0;
newE->systemId = tem;
if (oldE->base) {
if (oldE->base == cachedOldBase)
newE->base = cachedNewBase;
else {
cachedOldBase = oldE->base;
tem = poolCopyString(newPool, cachedOldBase);
if (!tem)
return 0;
cachedNewBase = newE->base = tem;
}
}
if (oldE->publicId) {
tem = poolCopyString(newPool, oldE->publicId);
if (!tem)
return 0;
newE->publicId = tem;
}
}
else {
const XML_Char *tem = poolCopyStringN(newPool, oldE->textPtr,
oldE->textLen);
if (!tem)
return 0;
newE->textPtr = tem;
newE->textLen = oldE->textLen;
}
if (oldE->notation) {
const XML_Char *tem = poolCopyString(newPool, oldE->notation);
if (!tem)
return 0;
newE->notation = tem;
}
newE->is_param = oldE->is_param;
newE->is_internal = oldE->is_internal;
}
return 1;
}
#define INIT_POWER 6
static XML_Bool FASTCALL
keyeq(KEY s1, KEY s2)
{
for (; *s1 == *s2; s1++, s2++)
if (*s1 == 0)
return XML_TRUE;
return XML_FALSE;
}
static size_t
keylen(KEY s)
{
size_t len = 0;
for (; *s; s++, len++);
return len;
}
static void
copy_salt_to_sipkey(XML_Parser parser, struct sipkey * key)
{
key->k[0] = 0;
key->k[1] = get_hash_secret_salt(parser);
}
static unsigned long FASTCALL
hash(XML_Parser parser, KEY s)
{
struct siphash state;
struct sipkey key;
(void)sip_tobin;
(void)sip24_valid;
copy_salt_to_sipkey(parser, &key);
sip24_init(&state, &key);
sip24_update(&state, s, keylen(s) * sizeof(XML_Char));
return (unsigned long)sip24_final(&state);
}
static NAMED *
lookup(XML_Parser parser, HASH_TABLE *table, KEY name, size_t createSize)
{
size_t i;
if (table->size == 0) {
size_t tsize;
if (!createSize)
return NULL;
table->power = INIT_POWER;
/* table->size is a power of 2 */
table->size = (size_t)1 << INIT_POWER;
tsize = table->size * sizeof(NAMED *);
table->v = (NAMED **)table->mem->malloc_fcn(tsize);
if (!table->v) {
table->size = 0;
return NULL;
}
memset(table->v, 0, tsize);
i = hash(parser, name) & ((unsigned long)table->size - 1);
}
else {
unsigned long h = hash(parser, name);
unsigned long mask = (unsigned long)table->size - 1;
unsigned char step = 0;
i = h & mask;
while (table->v[i]) {
if (keyeq(name, table->v[i]->name))
return table->v[i];
if (!step)
step = PROBE_STEP(h, mask, table->power);
i < step ? (i += table->size - step) : (i -= step);
}
if (!createSize)
return NULL;
/* check for overflow (table is half full) */
if (table->used >> (table->power - 1)) {
unsigned char newPower = table->power + 1;
size_t newSize = (size_t)1 << newPower;
unsigned long newMask = (unsigned long)newSize - 1;
size_t tsize = newSize * sizeof(NAMED *);
NAMED **newV = (NAMED **)table->mem->malloc_fcn(tsize);
if (!newV)
return NULL;
memset(newV, 0, tsize);
for (i = 0; i < table->size; i++)
if (table->v[i]) {
unsigned long newHash = hash(parser, table->v[i]->name);
size_t j = newHash & newMask;
step = 0;
while (newV[j]) {
if (!step)
step = PROBE_STEP(newHash, newMask, newPower);
j < step ? (j += newSize - step) : (j -= step);
}
newV[j] = table->v[i];
}
table->mem->free_fcn(table->v);
table->v = newV;
table->power = newPower;
table->size = newSize;
i = h & newMask;
step = 0;
while (table->v[i]) {
if (!step)
step = PROBE_STEP(h, newMask, newPower);
i < step ? (i += newSize - step) : (i -= step);
}
}
}
table->v[i] = (NAMED *)table->mem->malloc_fcn(createSize);
if (!table->v[i])
return NULL;
memset(table->v[i], 0, createSize);
table->v[i]->name = name;
(table->used)++;
return table->v[i];
}
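/* Illustrative sketch, not part of expat: why the probe loop in lookup()
   can reach every slot. The table size is a power of two, so any odd step
   is coprime with it and the wrap-around walk visits each slot once before
   repeating. The name slots_visited is hypothetical, and the sketch assumes
   0 < step < table size. It is kept out of the build with #if 0. */
#if 0
```c
#include <stddef.h>

/* Walks i, i - step, i - 2*step, ... (mod 2^power) from start and
   returns how many distinct slots are seen before the walk returns
   to its starting slot. */
static size_t
slots_visited(unsigned power, size_t start, size_t step)
{
  const size_t size = (size_t)1 << power;
  size_t i = start & (size - 1);
  const size_t first = i;
  size_t count = 0;
  do {
    count++;
    /* same wrap-around update as the probe loops in lookup() */
    i < step ? (i += size - step) : (i -= step);
  } while (i != first);
  return count;
}
```
#endif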
static void FASTCALL
hashTableClear(HASH_TABLE *table)
{
size_t i;
for (i = 0; i < table->size; i++) {
table->mem->free_fcn(table->v[i]);
table->v[i] = NULL;
}
table->used = 0;
}
static void FASTCALL
hashTableDestroy(HASH_TABLE *table)
{
size_t i;
for (i = 0; i < table->size; i++)
table->mem->free_fcn(table->v[i]);
table->mem->free_fcn(table->v);
}
static void FASTCALL
hashTableInit(HASH_TABLE *p, const XML_Memory_Handling_Suite *ms)
{
p->power = 0;
p->size = 0;
p->used = 0;
p->v = NULL;
p->mem = ms;
}
static void FASTCALL
hashTableIterInit(HASH_TABLE_ITER *iter, const HASH_TABLE *table)
{
iter->p = table->v;
iter->end = iter->p + table->size;
}
static NAMED * FASTCALL
hashTableIterNext(HASH_TABLE_ITER *iter)
{
while (iter->p != iter->end) {
NAMED *tem = *(iter->p)++;
if (tem)
return tem;
}
return NULL;
}
static void FASTCALL
poolInit(STRING_POOL *pool, const XML_Memory_Handling_Suite *ms)
{
pool->blocks = NULL;
pool->freeBlocks = NULL;
pool->start = NULL;
pool->ptr = NULL;
pool->end = NULL;
pool->mem = ms;
}
static void FASTCALL
poolClear(STRING_POOL *pool)
{
if (!pool->freeBlocks)
pool->freeBlocks = pool->blocks;
else {
BLOCK *p = pool->blocks;
while (p) {
BLOCK *tem = p->next;
p->next = pool->freeBlocks;
pool->freeBlocks = p;
p = tem;
}
}
pool->blocks = NULL;
pool->start = NULL;
pool->ptr = NULL;
pool->end = NULL;
}
static void FASTCALL
poolDestroy(STRING_POOL *pool)
{
BLOCK *p = pool->blocks;
while (p) {
BLOCK *tem = p->next;
pool->mem->free_fcn(p);
p = tem;
}
p = pool->freeBlocks;
while (p) {
BLOCK *tem = p->next;
pool->mem->free_fcn(p);
p = tem;
}
}
static XML_Char *
poolAppend(STRING_POOL *pool, const ENCODING *enc,
const char *ptr, const char *end)
{
if (!pool->ptr && !poolGrow(pool))
return NULL;
for (;;) {
const enum XML_Convert_Result convert_res = XmlConvert(enc, &ptr, end, (ICHAR **)&(pool->ptr), (ICHAR *)pool->end);
if ((convert_res == XML_CONVERT_COMPLETED) || (convert_res == XML_CONVERT_INPUT_INCOMPLETE))
break;
if (!poolGrow(pool))
return NULL;
}
return pool->start;
}
static const XML_Char * FASTCALL
poolCopyString(STRING_POOL *pool, const XML_Char *s)
{
do {
if (!poolAppendChar(pool, *s))
return NULL;
} while (*s++);
s = pool->start;
poolFinish(pool);
return s;
}
static const XML_Char *
poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n)
{
if (!pool->ptr && !poolGrow(pool))
return NULL;
for (; n > 0; --n, s++) {
if (!poolAppendChar(pool, *s))
return NULL;
}
s = pool->start;
poolFinish(pool);
return s;
}
static const XML_Char * FASTCALL
poolAppendString(STRING_POOL *pool, const XML_Char *s)
{
while (*s) {
if (!poolAppendChar(pool, *s))
return NULL;
s++;
}
return pool->start;
}
static XML_Char *
poolStoreString(STRING_POOL *pool, const ENCODING *enc,
const char *ptr, const char *end)
{
if (!poolAppend(pool, enc, ptr, end))
return NULL;
if (pool->ptr == pool->end && !poolGrow(pool))
return NULL;
*(pool->ptr)++ = 0;
return pool->start;
}
static size_t
poolBytesToAllocateFor(int blockSize)
{
/* Unprotected math would be:
** return offsetof(BLOCK, s) + blockSize * sizeof(XML_Char);
**
** Detect overflow, avoiding _signed_ overflow undefined behavior
** For a + b * c we check b * c in isolation first, so that addition of a
** on top has no chance of making us accept a small non-negative number
*/
const size_t stretch = sizeof(XML_Char); /* can be 4 bytes */
if (blockSize <= 0)
return 0;
if (blockSize > (int)(INT_MAX / stretch))
return 0;
{
const int stretchedBlockSize = blockSize * (int)stretch;
const int bytesToAllocate = (int)(
offsetof(BLOCK, s) + (unsigned)stretchedBlockSize);
if (bytesToAllocate < 0)
return 0;
return (size_t)bytesToAllocate;
}
}
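/* Illustrative sketch, not part of expat: the "a + b * c" check described
   in poolBytesToAllocateFor, restated with unsigned arithmetic only. The
   name checked_bytes and its parameters are hypothetical. It is kept out
   of the build with #if 0. */
#if 0
```c
#include <limits.h>
#include <stddef.h>

/* Returns header + count * width, or 0 when count is not positive or
   the result would exceed INT_MAX. */
static size_t
checked_bytes(size_t header, int count, size_t width)
{
  if (count <= 0 || width == 0)
    return 0;
  /* check b * c in isolation first */
  if ((size_t)count > (size_t)INT_MAX / width)
    return 0;
  {
    const size_t product = (size_t)count * width;
    /* only then check that adding a stays within INT_MAX */
    if (header > (size_t)INT_MAX - product)
      return 0;
    return header + product;
  }
}
```
#endif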
static XML_Bool FASTCALL
poolGrow(STRING_POOL *pool)
{
if (pool->freeBlocks) {
if (pool->start == 0) {
pool->blocks = pool->freeBlocks;
pool->freeBlocks = pool->freeBlocks->next;
pool->blocks->next = NULL;
pool->start = pool->blocks->s;
pool->end = pool->start + pool->blocks->size;
pool->ptr = pool->start;
return XML_TRUE;
}
if (pool->end - pool->start < pool->freeBlocks->size) {
BLOCK *tem = pool->freeBlocks->next;
pool->freeBlocks->next = pool->blocks;
pool->blocks = pool->freeBlocks;
pool->freeBlocks = tem;
memcpy(pool->blocks->s, pool->start,
(pool->end - pool->start) * sizeof(XML_Char));
pool->ptr = pool->blocks->s + (pool->ptr - pool->start);
pool->start = pool->blocks->s;
pool->end = pool->start + pool->blocks->size;
return XML_TRUE;
}
}
if (pool->blocks && pool->start == pool->blocks->s) {
BLOCK *temp;
int blockSize = (int)((unsigned)(pool->end - pool->start)*2U);
size_t bytesToAllocate;
if (blockSize < 0)
return XML_FALSE;
bytesToAllocate = poolBytesToAllocateFor(blockSize);
if (bytesToAllocate == 0)
return XML_FALSE;
temp = (BLOCK *)
pool->mem->realloc_fcn(pool->blocks, (unsigned)bytesToAllocate);
if (temp == NULL)
return XML_FALSE;
pool->blocks = temp;
pool->blocks->size = blockSize;
pool->ptr = pool->blocks->s + (pool->ptr - pool->start);
pool->start = pool->blocks->s;
pool->end = pool->start + blockSize;
}
else {
BLOCK *tem;
int blockSize = (int)(pool->end - pool->start);
size_t bytesToAllocate;
if (blockSize < 0)
return XML_FALSE;
if (blockSize < INIT_BLOCK_SIZE)
blockSize = INIT_BLOCK_SIZE;
else {
/* Detect overflow, avoiding _signed_ overflow undefined behavior */
if ((int)((unsigned)blockSize * 2U) < 0) {
return XML_FALSE;
}
blockSize *= 2;
}
bytesToAllocate = poolBytesToAllocateFor(blockSize);
if (bytesToAllocate == 0)
return XML_FALSE;
tem = (BLOCK *)pool->mem->malloc_fcn(bytesToAllocate);
if (!tem)
return XML_FALSE;
tem->size = blockSize;
tem->next = pool->blocks;
pool->blocks = tem;
if (pool->ptr != pool->start)
memcpy(tem->s, pool->start,
(pool->ptr - pool->start) * sizeof(XML_Char));
pool->ptr = tem->s + (pool->ptr - pool->start);
pool->start = tem->s;
pool->end = tem->s + blockSize;
}
return XML_TRUE;
}
static int FASTCALL
nextScaffoldPart(XML_Parser parser)
{
DTD * const dtd = _dtd; /* save one level of indirection */
CONTENT_SCAFFOLD * me;
int next;
if (!dtd->scaffIndex) {
dtd->scaffIndex = (int *)MALLOC(groupSize * sizeof(int));
if (!dtd->scaffIndex)
return -1;
dtd->scaffIndex[0] = 0;
}
if (dtd->scaffCount >= dtd->scaffSize) {
CONTENT_SCAFFOLD *temp;
if (dtd->scaffold) {
temp = (CONTENT_SCAFFOLD *)
REALLOC(dtd->scaffold, dtd->scaffSize * 2 * sizeof(CONTENT_SCAFFOLD));
if (temp == NULL)
return -1;
dtd->scaffSize *= 2;
}
else {
temp = (CONTENT_SCAFFOLD *)MALLOC(INIT_SCAFFOLD_ELEMENTS
* sizeof(CONTENT_SCAFFOLD));
if (temp == NULL)
return -1;
dtd->scaffSize = INIT_SCAFFOLD_ELEMENTS;
}
dtd->scaffold = temp;
}
next = dtd->scaffCount++;
me = &dtd->scaffold[next];
if (dtd->scaffLevel) {
CONTENT_SCAFFOLD *parent = &dtd->scaffold[dtd->scaffIndex[dtd->scaffLevel-1]];
if (parent->lastchild) {
dtd->scaffold[parent->lastchild].nextsib = next;
}
if (!parent->childcnt)
parent->firstchild = next;
parent->lastchild = next;
parent->childcnt++;
}
me->firstchild = me->lastchild = me->childcnt = me->nextsib = 0;
return next;
}
static void
build_node(XML_Parser parser,
int src_node,
XML_Content *dest,
XML_Content **contpos,
XML_Char **strpos)
{
DTD * const dtd = _dtd; /* save one level of indirection */
dest->type = dtd->scaffold[src_node].type;
dest->quant = dtd->scaffold[src_node].quant;
if (dest->type == XML_CTYPE_NAME) {
const XML_Char *src;
dest->name = *strpos;
src = dtd->scaffold[src_node].name;
for (;;) {
*(*strpos)++ = *src;
if (!*src)
break;
src++;
}
dest->numchildren = 0;
dest->children = NULL;
}
else {
unsigned int i;
int cn;
dest->numchildren = dtd->scaffold[src_node].childcnt;
dest->children = *contpos;
*contpos += dest->numchildren;
for (i = 0, cn = dtd->scaffold[src_node].firstchild;
i < dest->numchildren;
i++, cn = dtd->scaffold[cn].nextsib) {
build_node(parser, cn, &(dest->children[i]), contpos, strpos);
}
dest->name = NULL;
}
}
static XML_Content *
build_model (XML_Parser parser)
{
DTD * const dtd = _dtd; /* save one level of indirection */
XML_Content *ret;
XML_Content *cpos;
XML_Char * str;
int allocsize = (dtd->scaffCount * sizeof(XML_Content)
+ (dtd->contentStringLen * sizeof(XML_Char)));
ret = (XML_Content *)MALLOC(allocsize);
if (!ret)
return NULL;
str = (XML_Char *) (&ret[dtd->scaffCount]);
cpos = &ret[1];
build_node(parser, 0, ret, &cpos, &str);
return ret;
}
static ELEMENT_TYPE *
getElementType(XML_Parser parser,
const ENCODING *enc,
const char *ptr,
const char *end)
{
DTD * const dtd = _dtd; /* save one level of indirection */
const XML_Char *name = poolStoreString(&dtd->pool, enc, ptr, end);
ELEMENT_TYPE *ret;
if (!name)
return NULL;
ret = (ELEMENT_TYPE *) lookup(parser, &dtd->elementTypes, name, sizeof(ELEMENT_TYPE));
if (!ret)
return NULL;
if (ret->name != name)
poolDiscard(&dtd->pool);
else {
poolFinish(&dtd->pool);
if (!setElementTypePrefix(parser, ret))
return NULL;
}
return ret;
}
hexpat-0.20.13/cbits/xmltok_impl.h 0000644 0000000 0000000 00000001225 13122604047 015147 0 ustar 00 0000000 0000000 /*
Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
enum {
BT_NONXML,
BT_MALFORM,
BT_LT,
BT_AMP,
BT_RSQB,
BT_LEAD2,
BT_LEAD3,
BT_LEAD4,
BT_TRAIL,
BT_CR,
BT_LF,
BT_GT,
BT_QUOT,
BT_APOS,
BT_EQUALS,
BT_QUEST,
BT_EXCL,
BT_SOL,
BT_SEMI,
BT_NUM,
BT_LSQB,
BT_S,
BT_NMSTRT,
BT_COLON,
BT_HEX,
BT_DIGIT,
BT_NAME,
BT_MINUS,
BT_OTHER, /* known not to be a name or name start character */
BT_NONASCII, /* might be a name or name start character */
BT_PERCNT,
BT_LPAR,
BT_RPAR,
BT_AST,
BT_PLUS,
BT_COMMA,
BT_VERBAR
};
#include <stddef.h>
hexpat-0.20.13/cbits/xmltok_impl.c 0000644 0000000 0000000 00000127751 13122604047 015157 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
/* This file is included! */
#ifdef XML_TOK_IMPL_C
#ifndef IS_INVALID_CHAR
#define IS_INVALID_CHAR(enc, ptr, n) (0)
#endif
#define INVALID_LEAD_CASE(n, ptr, nextTokPtr) \
case BT_LEAD ## n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
if (IS_INVALID_CHAR(enc, ptr, n)) { \
*(nextTokPtr) = (ptr); \
return XML_TOK_INVALID; \
} \
ptr += n; \
break;
#define INVALID_CASES(ptr, nextTokPtr) \
INVALID_LEAD_CASE(2, ptr, nextTokPtr) \
INVALID_LEAD_CASE(3, ptr, nextTokPtr) \
INVALID_LEAD_CASE(4, ptr, nextTokPtr) \
case BT_NONXML: \
case BT_MALFORM: \
case BT_TRAIL: \
*(nextTokPtr) = (ptr); \
return XML_TOK_INVALID;
#define CHECK_NAME_CASE(n, enc, ptr, end, nextTokPtr) \
case BT_LEAD ## n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
if (!IS_NAME_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
ptr += n; \
break;
#define CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) \
case BT_NONASCII: \
if (!IS_NAME_CHAR_MINBPC(enc, ptr)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
case BT_NMSTRT: \
case BT_HEX: \
case BT_DIGIT: \
case BT_NAME: \
case BT_MINUS: \
ptr += MINBPC(enc); \
break; \
CHECK_NAME_CASE(2, enc, ptr, end, nextTokPtr) \
CHECK_NAME_CASE(3, enc, ptr, end, nextTokPtr) \
CHECK_NAME_CASE(4, enc, ptr, end, nextTokPtr)
#define CHECK_NMSTRT_CASE(n, enc, ptr, end, nextTokPtr) \
case BT_LEAD ## n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
if (!IS_NMSTRT_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
ptr += n; \
break;
#define CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) \
case BT_NONASCII: \
if (!IS_NMSTRT_CHAR_MINBPC(enc, ptr)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
case BT_NMSTRT: \
case BT_HEX: \
ptr += MINBPC(enc); \
break; \
CHECK_NMSTRT_CASE(2, enc, ptr, end, nextTokPtr) \
CHECK_NMSTRT_CASE(3, enc, ptr, end, nextTokPtr) \
CHECK_NMSTRT_CASE(4, enc, ptr, end, nextTokPtr)
#ifndef PREFIX
#define PREFIX(ident) ident
#endif
#define HAS_CHARS(enc, ptr, end, count) \
(end - ptr >= count * MINBPC(enc))
#define HAS_CHAR(enc, ptr, end) \
HAS_CHARS(enc, ptr, end, 1)
#define REQUIRE_CHARS(enc, ptr, end, count) \
{ \
if (! HAS_CHARS(enc, ptr, end, count)) { \
return XML_TOK_PARTIAL; \
} \
}
#define REQUIRE_CHAR(enc, ptr, end) \
REQUIRE_CHARS(enc, ptr, end, 1)
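/* ptr points to character following "<!-" */
static int PTRCALL
PREFIX(scanComment)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
if (HAS_CHAR(enc, ptr, end)) {
if (!CHAR_MATCHES(enc, ptr, ASCII_MINUS)) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
ptr += MINBPC(enc);
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
INVALID_CASES(ptr, nextTokPtr)
case BT_MINUS:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (CHAR_MATCHES(enc, ptr, ASCII_MINUS)) {
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (!CHAR_MATCHES(enc, ptr, ASCII_GT)) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_COMMENT;
}
break;
default:
ptr += MINBPC(enc);
break;
}
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "<!" */
static int PTRCALL
PREFIX(scanDecl)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
case BT_MINUS:
return PREFIX(scanComment)(enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_LSQB:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_COND_SECT_OPEN;
case BT_NMSTRT:
case BT_HEX:
ptr += MINBPC(enc);
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_PERCNT:
REQUIRE_CHARS(enc, ptr, end, 2);
/* don't allow <!ENTITY% foo "whatever"> */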
switch (BYTE_TYPE(enc, ptr + MINBPC(enc))) {
case BT_S: case BT_CR: case BT_LF: case BT_PERCNT:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
/* fall through */
case BT_S: case BT_CR: case BT_LF:
*nextTokPtr = ptr;
return XML_TOK_DECL_OPEN;
case BT_NMSTRT:
case BT_HEX:
ptr += MINBPC(enc);
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static int PTRCALL
PREFIX(checkPiTarget)(const ENCODING *UNUSED_P(enc), const char *ptr,
const char *end, int *tokPtr)
{
int upper = 0;
*tokPtr = XML_TOK_PI;
if (end - ptr != MINBPC(enc)*3)
return 1;
switch (BYTE_TO_ASCII(enc, ptr)) {
case ASCII_x:
break;
case ASCII_X:
upper = 1;
break;
default:
return 1;
}
ptr += MINBPC(enc);
switch (BYTE_TO_ASCII(enc, ptr)) {
case ASCII_m:
break;
case ASCII_M:
upper = 1;
break;
default:
return 1;
}
ptr += MINBPC(enc);
switch (BYTE_TO_ASCII(enc, ptr)) {
case ASCII_l:
break;
case ASCII_L:
upper = 1;
break;
default:
return 1;
}
if (upper)
return 0;
*tokPtr = XML_TOK_XML_DECL;
return 1;
}
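/* Illustrative sketch, not part of expat: the decision made by
   checkPiTarget, restated for a plain ASCII buffer. A three-character
   target spelled exactly "xml" marks an XML declaration, any other case
   variant of "xml" is rejected as reserved, and everything else is an
   ordinary processing instruction. The enum pi_kind and the function
   classify_pi_target are hypothetical names. It is kept out of the build
   with #if 0. */
#if 0
```c
#include <stddef.h>

enum pi_kind { PI_ORDINARY, PI_XML_DECL, PI_RESERVED };

static enum pi_kind
classify_pi_target(const char *target, size_t len)
{
  static const char lower[] = "xml";
  static const char upper[] = "XML";
  int sawUpper = 0;
  size_t i;
  if (len != 3)
    return PI_ORDINARY;
  for (i = 0; i < 3; i++) {
    if (target[i] == upper[i])
      sawUpper = 1;
    else if (target[i] != lower[i])
      return PI_ORDINARY;
  }
  /* any upper-case letter makes the name a reserved variant of "xml" */
  return sawUpper ? PI_RESERVED : PI_XML_DECL;
}
```
#endif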
/* ptr points to character following "<?" */
static int PTRCALL
PREFIX(scanPi)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
int tok;
const char *target = ptr;
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_CR: case BT_LF:
if (!PREFIX(checkPiTarget)(enc, target, ptr, &tok)) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
ptr += MINBPC(enc);
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
INVALID_CASES(ptr, nextTokPtr)
case BT_QUEST:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (CHAR_MATCHES(enc, ptr, ASCII_GT)) {
*nextTokPtr = ptr + MINBPC(enc);
return tok;
}
break;
default:
ptr += MINBPC(enc);
break;
}
}
return XML_TOK_PARTIAL;
case BT_QUEST:
if (!PREFIX(checkPiTarget)(enc, target, ptr, &tok)) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (CHAR_MATCHES(enc, ptr, ASCII_GT)) {
*nextTokPtr = ptr + MINBPC(enc);
return tok;
}
/* fall through */
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static int PTRCALL
PREFIX(scanCdataSection)(const ENCODING *UNUSED_P(enc), const char *ptr,
const char *end, const char **nextTokPtr)
{
static const char CDATA_LSQB[] = { ASCII_C, ASCII_D, ASCII_A,
ASCII_T, ASCII_A, ASCII_LSQB };
int i;
/* CDATA[ */
REQUIRE_CHARS(enc, ptr, end, 6);
for (i = 0; i < 6; i++, ptr += MINBPC(enc)) {
if (!CHAR_MATCHES(enc, ptr, CDATA_LSQB[i])) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
*nextTokPtr = ptr;
return XML_TOK_CDATA_SECT_OPEN;
}
static int PTRCALL
PREFIX(cdataSectionTok)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
if (ptr >= end)
return XML_TOK_NONE;
if (MINBPC(enc) > 1) {
size_t n = end - ptr;
if (n & (MINBPC(enc) - 1)) {
n &= ~(MINBPC(enc) - 1);
if (n == 0)
return XML_TOK_PARTIAL;
end = ptr + n;
}
}
switch (BYTE_TYPE(enc, ptr)) {
case BT_RSQB:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (!CHAR_MATCHES(enc, ptr, ASCII_RSQB))
break;
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (!CHAR_MATCHES(enc, ptr, ASCII_GT)) {
ptr -= MINBPC(enc);
break;
}
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_CDATA_SECT_CLOSE;
case BT_CR:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC(enc);
*nextTokPtr = ptr;
return XML_TOK_DATA_NEWLINE;
case BT_LF:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_DATA_NEWLINE;
INVALID_CASES(ptr, nextTokPtr)
default:
ptr += MINBPC(enc);
break;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
if (end - ptr < n || IS_INVALID_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_DATA_CHARS; \
} \
ptr += n; \
break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_NONXML:
case BT_MALFORM:
case BT_TRAIL:
case BT_CR:
case BT_LF:
case BT_RSQB:
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
default:
ptr += MINBPC(enc);
break;
}
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
}
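/* Illustrative sketch, not part of expat: the length adjustment at the top
   of cdataSectionTok. When the encoding uses more than one byte per code
   unit, the byte count is rounded down to a whole number of units, so a
   trailing partial unit is left for the next call. The name whole_units is
   hypothetical, and minbpc is assumed to be a power of two (1, 2 or 4).
   It is kept out of the build with #if 0. */
#if 0
```c
#include <stddef.h>

/* Rounds nbytes down to a multiple of minbpc. */
static size_t
whole_units(size_t nbytes, size_t minbpc)
{
  return nbytes & ~(minbpc - 1);
}
```
#endif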
/* ptr points to character following "</" */
static int PTRCALL
PREFIX(scanEndTag)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_CR: case BT_LF:
for (ptr += MINBPC(enc); HAS_CHAR(enc, ptr, end); ptr += MINBPC(enc)) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_S: case BT_CR: case BT_LF:
break;
case BT_GT:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_END_TAG;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
#ifdef XML_NS
case BT_COLON:
/* no need to check qname syntax here,
since end-tag must match exactly */
ptr += MINBPC(enc);
break;
#endif
case BT_GT:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_END_TAG;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "&#X" */
static int PTRCALL
PREFIX(scanHexCharRef)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
if (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
case BT_HEX:
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
for (ptr += MINBPC(enc); HAS_CHAR(enc, ptr, end); ptr += MINBPC(enc)) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
case BT_HEX:
break;
case BT_SEMI:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_CHAR_REF;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "&#" */
static int PTRCALL
PREFIX(scanCharRef)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
if (HAS_CHAR(enc, ptr, end)) {
if (CHAR_MATCHES(enc, ptr, ASCII_x))
return PREFIX(scanHexCharRef)(enc, ptr + MINBPC(enc), end, nextTokPtr);
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
for (ptr += MINBPC(enc); HAS_CHAR(enc, ptr, end); ptr += MINBPC(enc)) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
break;
case BT_SEMI:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_CHAR_REF;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
}
return XML_TOK_PARTIAL;
}
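/* Illustrative sketch, not part of expat: the three outcomes of
   scanCharRef and scanHexCharRef (complete, invalid, need more input),
   restated for a single-byte ASCII buffer that starts just after "&#".
   The name scan_char_ref and its return convention are hypothetical.
   It is kept out of the build with #if 0. */
#if 0
```c
#include <ctype.h>
#include <stddef.h>

/* Returns the number of bytes consumed up to and including ';',
   0 if the reference is invalid, or -1 if the buffer ends too early. */
static int
scan_char_ref(const char *p, size_t n)
{
  size_t i = 0;
  int hex = 0;
  if (n == 0)
    return -1;
  if (p[0] == 'x') {
    hex = 1;
    i = 1;
    if (i == n)
      return -1;
  }
  /* at least one digit is required before the semicolon */
  if (!(hex ? isxdigit((unsigned char)p[i]) : isdigit((unsigned char)p[i])))
    return 0;
  for (i++; i < n; i++) {
    if (p[i] == ';')
      return (int)(i + 1);
    if (!(hex ? isxdigit((unsigned char)p[i]) : isdigit((unsigned char)p[i])))
      return 0;
  }
  return -1;
}
```
#endif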
/* ptr points to character following "&" */
static int PTRCALL
PREFIX(scanRef)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_NUM:
return PREFIX(scanCharRef)(enc, ptr + MINBPC(enc), end, nextTokPtr);
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_SEMI:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_ENTITY_REF;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following first character of attribute name */
static int PTRCALL
PREFIX(scanAtts)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
#ifdef XML_NS
int hadColon = 0;
#endif
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
#ifdef XML_NS
case BT_COLON:
if (hadColon) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
hadColon = 1;
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
break;
#endif
case BT_S: case BT_CR: case BT_LF:
for (;;) {
int t;
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
t = BYTE_TYPE(enc, ptr);
if (t == BT_EQUALS)
break;
switch (t) {
case BT_S:
case BT_LF:
case BT_CR:
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
/* fall through */
case BT_EQUALS:
{
int open;
#ifdef XML_NS
hadColon = 0;
#endif
for (;;) {
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
open = BYTE_TYPE(enc, ptr);
if (open == BT_QUOT || open == BT_APOS)
break;
switch (open) {
case BT_S:
case BT_LF:
case BT_CR:
break;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
ptr += MINBPC(enc);
/* in attribute value */
for (;;) {
int t;
REQUIRE_CHAR(enc, ptr, end);
t = BYTE_TYPE(enc, ptr);
if (t == open)
break;
switch (t) {
INVALID_CASES(ptr, nextTokPtr)
case BT_AMP:
{
int tok = PREFIX(scanRef)(enc, ptr + MINBPC(enc), end, &ptr);
if (tok <= 0) {
if (tok == XML_TOK_INVALID)
*nextTokPtr = ptr;
return tok;
}
break;
}
case BT_LT:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
default:
ptr += MINBPC(enc);
break;
}
}
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
case BT_S:
case BT_CR:
case BT_LF:
break;
case BT_SOL:
goto sol;
case BT_GT:
goto gt;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
/* ptr points to closing quote */
for (;;) {
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_CR: case BT_LF:
continue;
case BT_GT:
gt:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_START_TAG_WITH_ATTS;
case BT_SOL:
sol:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (!CHAR_MATCHES(enc, ptr, ASCII_GT)) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_EMPTY_ELEMENT_WITH_ATTS;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
break;
}
break;
}
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
/* ptr points to character following "<" */
static int PTRCALL
PREFIX(scanLt)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
#ifdef XML_NS
int hadColon;
#endif
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_EXCL:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
case BT_MINUS:
return PREFIX(scanComment)(enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_LSQB:
return PREFIX(scanCdataSection)(enc, ptr + MINBPC(enc),
end, nextTokPtr);
}
*nextTokPtr = ptr;
return XML_TOK_INVALID;
case BT_QUEST:
return PREFIX(scanPi)(enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_SOL:
return PREFIX(scanEndTag)(enc, ptr + MINBPC(enc), end, nextTokPtr);
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
#ifdef XML_NS
hadColon = 0;
#endif
/* we have a start-tag */
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
#ifdef XML_NS
case BT_COLON:
if (hadColon) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
hadColon = 1;
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
break;
#endif
case BT_S: case BT_CR: case BT_LF:
{
ptr += MINBPC(enc);
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_GT:
goto gt;
case BT_SOL:
goto sol;
case BT_S: case BT_CR: case BT_LF:
ptr += MINBPC(enc);
continue;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
return PREFIX(scanAtts)(enc, ptr, end, nextTokPtr);
}
return XML_TOK_PARTIAL;
}
case BT_GT:
gt:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_START_TAG_NO_ATTS;
case BT_SOL:
sol:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (!CHAR_MATCHES(enc, ptr, ASCII_GT)) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_EMPTY_ELEMENT_NO_ATTS;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static int PTRCALL
PREFIX(contentTok)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
if (ptr >= end)
return XML_TOK_NONE;
if (MINBPC(enc) > 1) {
size_t n = end - ptr;
if (n & (MINBPC(enc) - 1)) {
n &= ~(MINBPC(enc) - 1);
if (n == 0)
return XML_TOK_PARTIAL;
end = ptr + n;
}
}
switch (BYTE_TYPE(enc, ptr)) {
case BT_LT:
return PREFIX(scanLt)(enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_AMP:
return PREFIX(scanRef)(enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_CR:
ptr += MINBPC(enc);
if (! HAS_CHAR(enc, ptr, end))
return XML_TOK_TRAILING_CR;
if (BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC(enc);
*nextTokPtr = ptr;
return XML_TOK_DATA_NEWLINE;
case BT_LF:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_DATA_NEWLINE;
case BT_RSQB:
ptr += MINBPC(enc);
if (! HAS_CHAR(enc, ptr, end))
return XML_TOK_TRAILING_RSQB;
if (!CHAR_MATCHES(enc, ptr, ASCII_RSQB))
break;
ptr += MINBPC(enc);
if (! HAS_CHAR(enc, ptr, end))
return XML_TOK_TRAILING_RSQB;
if (!CHAR_MATCHES(enc, ptr, ASCII_GT)) {
ptr -= MINBPC(enc);
break;
}
*nextTokPtr = ptr;
return XML_TOK_INVALID;
INVALID_CASES(ptr, nextTokPtr)
default:
ptr += MINBPC(enc);
break;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
if (end - ptr < n || IS_INVALID_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_DATA_CHARS; \
} \
ptr += n; \
break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_RSQB:
if (HAS_CHARS(enc, ptr, end, 2)) {
if (!CHAR_MATCHES(enc, ptr + MINBPC(enc), ASCII_RSQB)) {
ptr += MINBPC(enc);
break;
}
if (HAS_CHARS(enc, ptr, end, 3)) {
if (!CHAR_MATCHES(enc, ptr + 2*MINBPC(enc), ASCII_GT)) {
ptr += MINBPC(enc);
break;
}
*nextTokPtr = ptr + 2*MINBPC(enc);
return XML_TOK_INVALID;
}
}
/* fall through */
case BT_AMP:
case BT_LT:
case BT_NONXML:
case BT_MALFORM:
case BT_TRAIL:
case BT_CR:
case BT_LF:
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
default:
ptr += MINBPC(enc);
break;
}
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
}
/* ptr points to character following "%" */
static int PTRCALL
PREFIX(scanPercent)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
case BT_S: case BT_LF: case BT_CR: case BT_PERCNT:
*nextTokPtr = ptr;
return XML_TOK_PERCENT;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_SEMI:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_PARAM_ENTITY_REF;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return XML_TOK_PARTIAL;
}
static int PTRCALL
PREFIX(scanPoundName)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr)
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_CR: case BT_LF: case BT_S:
case BT_RPAR: case BT_GT: case BT_PERCNT: case BT_VERBAR:
*nextTokPtr = ptr;
return XML_TOK_POUND_NAME;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return -XML_TOK_POUND_NAME;
}
static int PTRCALL
PREFIX(scanLit)(int open, const ENCODING *enc,
const char *ptr, const char *end,
const char **nextTokPtr)
{
while (HAS_CHAR(enc, ptr, end)) {
int t = BYTE_TYPE(enc, ptr);
switch (t) {
INVALID_CASES(ptr, nextTokPtr)
case BT_QUOT:
case BT_APOS:
ptr += MINBPC(enc);
if (t != open)
break;
if (! HAS_CHAR(enc, ptr, end))
return -XML_TOK_LITERAL;
*nextTokPtr = ptr;
switch (BYTE_TYPE(enc, ptr)) {
case BT_S: case BT_CR: case BT_LF:
case BT_GT: case BT_PERCNT: case BT_LSQB:
return XML_TOK_LITERAL;
default:
return XML_TOK_INVALID;
}
default:
ptr += MINBPC(enc);
break;
}
}
return XML_TOK_PARTIAL;
}
static int PTRCALL
PREFIX(prologTok)(const ENCODING *enc, const char *ptr, const char *end,
const char **nextTokPtr)
{
int tok;
if (ptr >= end)
return XML_TOK_NONE;
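/* For multi-byte encodings, ignore trailing bytes that do not make up
a whole MINBPC(enc)-sized unit. */
if (MINBPC(enc) > 1) {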
size_t n = end - ptr;
if (n & (MINBPC(enc) - 1)) {
n &= ~(MINBPC(enc) - 1);
if (n == 0)
return XML_TOK_PARTIAL;
end = ptr + n;
}
}
switch (BYTE_TYPE(enc, ptr)) {
case BT_QUOT:
return PREFIX(scanLit)(BT_QUOT, enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_APOS:
return PREFIX(scanLit)(BT_APOS, enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_LT:
{
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
switch (BYTE_TYPE(enc, ptr)) {
case BT_EXCL:
return PREFIX(scanDecl)(enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_QUEST:
return PREFIX(scanPi)(enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_NMSTRT:
case BT_HEX:
case BT_NONASCII:
case BT_LEAD2:
case BT_LEAD3:
case BT_LEAD4:
*nextTokPtr = ptr - MINBPC(enc);
return XML_TOK_INSTANCE_START;
}
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
case BT_CR:
if (ptr + MINBPC(enc) == end) {
*nextTokPtr = end;
/* indicate that this might be part of a CR/LF pair */
return -XML_TOK_PROLOG_S;
}
/* fall through */
case BT_S: case BT_LF:
for (;;) {
ptr += MINBPC(enc);
if (! HAS_CHAR(enc, ptr, end))
break;
switch (BYTE_TYPE(enc, ptr)) {
case BT_S: case BT_LF:
break;
case BT_CR:
/* don't split CR/LF pair */
if (ptr + MINBPC(enc) != end)
break;
/* fall through */
default:
*nextTokPtr = ptr;
return XML_TOK_PROLOG_S;
}
}
*nextTokPtr = ptr;
return XML_TOK_PROLOG_S;
case BT_PERCNT:
return PREFIX(scanPercent)(enc, ptr + MINBPC(enc), end, nextTokPtr);
case BT_COMMA:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_COMMA;
case BT_LSQB:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_OPEN_BRACKET;
case BT_RSQB:
ptr += MINBPC(enc);
if (! HAS_CHAR(enc, ptr, end))
return -XML_TOK_CLOSE_BRACKET;
if (CHAR_MATCHES(enc, ptr, ASCII_RSQB)) {
REQUIRE_CHARS(enc, ptr, end, 2);
if (CHAR_MATCHES(enc, ptr + MINBPC(enc), ASCII_GT)) {
*nextTokPtr = ptr + 2*MINBPC(enc);
return XML_TOK_COND_SECT_CLOSE;
}
}
*nextTokPtr = ptr;
return XML_TOK_CLOSE_BRACKET;
case BT_LPAR:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_OPEN_PAREN;
case BT_RPAR:
ptr += MINBPC(enc);
if (! HAS_CHAR(enc, ptr, end))
return -XML_TOK_CLOSE_PAREN;
switch (BYTE_TYPE(enc, ptr)) {
case BT_AST:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_CLOSE_PAREN_ASTERISK;
case BT_QUEST:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_CLOSE_PAREN_QUESTION;
case BT_PLUS:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_CLOSE_PAREN_PLUS;
case BT_CR: case BT_LF: case BT_S:
case BT_GT: case BT_COMMA: case BT_VERBAR:
case BT_RPAR:
*nextTokPtr = ptr;
return XML_TOK_CLOSE_PAREN;
}
*nextTokPtr = ptr;
return XML_TOK_INVALID;
case BT_VERBAR:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_OR;
case BT_GT:
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_DECL_CLOSE;
case BT_NUM:
return PREFIX(scanPoundName)(enc, ptr + MINBPC(enc), end, nextTokPtr);
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
if (IS_NMSTRT_CHAR(enc, ptr, n)) { \
ptr += n; \
tok = XML_TOK_NAME; \
break; \
} \
if (IS_NAME_CHAR(enc, ptr, n)) { \
ptr += n; \
tok = XML_TOK_NMTOKEN; \
break; \
} \
*nextTokPtr = ptr; \
return XML_TOK_INVALID;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_NMSTRT:
case BT_HEX:
tok = XML_TOK_NAME;
ptr += MINBPC(enc);
break;
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
#ifdef XML_NS
case BT_COLON:
#endif
tok = XML_TOK_NMTOKEN;
ptr += MINBPC(enc);
break;
case BT_NONASCII:
if (IS_NMSTRT_CHAR_MINBPC(enc, ptr)) {
ptr += MINBPC(enc);
tok = XML_TOK_NAME;
break;
}
if (IS_NAME_CHAR_MINBPC(enc, ptr)) {
ptr += MINBPC(enc);
tok = XML_TOK_NMTOKEN;
break;
}
/* fall through */
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
case BT_GT: case BT_RPAR: case BT_COMMA:
case BT_VERBAR: case BT_LSQB: case BT_PERCNT:
case BT_S: case BT_CR: case BT_LF:
*nextTokPtr = ptr;
return tok;
#ifdef XML_NS
case BT_COLON:
ptr += MINBPC(enc);
switch (tok) {
case XML_TOK_NAME:
REQUIRE_CHAR(enc, ptr, end);
tok = XML_TOK_PREFIXED_NAME;
switch (BYTE_TYPE(enc, ptr)) {
CHECK_NAME_CASES(enc, ptr, end, nextTokPtr)
default:
tok = XML_TOK_NMTOKEN;
break;
}
break;
case XML_TOK_PREFIXED_NAME:
tok = XML_TOK_NMTOKEN;
break;
}
break;
#endif
case BT_PLUS:
if (tok == XML_TOK_NMTOKEN) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_NAME_PLUS;
case BT_AST:
if (tok == XML_TOK_NMTOKEN) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_NAME_ASTERISK;
case BT_QUEST:
if (tok == XML_TOK_NMTOKEN) {
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_NAME_QUESTION;
default:
*nextTokPtr = ptr;
return XML_TOK_INVALID;
}
}
return -tok;
}
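/* Scans one token inside an attribute value: a reference, a newline,
a single white space character, or a run of other data characters. */
static int PTRCALL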
PREFIX(attributeValueTok)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
const char *start;
if (ptr >= end)
return XML_TOK_NONE;
else if (! HAS_CHAR(enc, ptr, end))
return XML_TOK_PARTIAL;
start = ptr;
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: ptr += n; break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_AMP:
if (ptr == start)
return PREFIX(scanRef)(enc, ptr + MINBPC(enc), end, nextTokPtr);
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_LT:
/* this is for inside entity references */
*nextTokPtr = ptr;
return XML_TOK_INVALID;
case BT_LF:
if (ptr == start) {
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_DATA_NEWLINE;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_CR:
if (ptr == start) {
ptr += MINBPC(enc);
if (! HAS_CHAR(enc, ptr, end))
return XML_TOK_TRAILING_CR;
if (BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC(enc);
*nextTokPtr = ptr;
return XML_TOK_DATA_NEWLINE;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_S:
if (ptr == start) {
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_ATTRIBUTE_VALUE_S;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
default:
ptr += MINBPC(enc);
break;
}
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
}
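/* Scans one token inside an entity value: a general or parameter
entity reference, a newline, or a run of other data characters. */
static int PTRCALL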
PREFIX(entityValueTok)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
const char *start;
if (ptr >= end)
return XML_TOK_NONE;
else if (! HAS_CHAR(enc, ptr, end))
return XML_TOK_PARTIAL;
start = ptr;
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: ptr += n; break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_AMP:
if (ptr == start)
return PREFIX(scanRef)(enc, ptr + MINBPC(enc), end, nextTokPtr);
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_PERCNT:
if (ptr == start) {
int tok = PREFIX(scanPercent)(enc, ptr + MINBPC(enc),
end, nextTokPtr);
/* a "%" that does not start a parameter entity reference is invalid here */
return (tok == XML_TOK_PERCENT) ? XML_TOK_INVALID : tok;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_LF:
if (ptr == start) {
*nextTokPtr = ptr + MINBPC(enc);
return XML_TOK_DATA_NEWLINE;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
case BT_CR:
if (ptr == start) {
ptr += MINBPC(enc);
if (! HAS_CHAR(enc, ptr, end))
return XML_TOK_TRAILING_CR;
if (BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC(enc);
*nextTokPtr = ptr;
return XML_TOK_DATA_NEWLINE;
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
default:
ptr += MINBPC(enc);
break;
}
}
*nextTokPtr = ptr;
return XML_TOK_DATA_CHARS;
}
#ifdef XML_DTD
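/* Skips the body of an IGNORE conditional section. level counts the
nested "<![" openers that still need a matching "]]>". */
static int PTRCALL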
PREFIX(ignoreSectionTok)(const ENCODING *enc, const char *ptr,
const char *end, const char **nextTokPtr)
{
int level = 0;
if (MINBPC(enc) > 1) {
size_t n = end - ptr;
if (n & (MINBPC(enc) - 1)) {
n &= ~(MINBPC(enc) - 1);
end = ptr + n;
}
}
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
INVALID_CASES(ptr, nextTokPtr)
case BT_LT:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (CHAR_MATCHES(enc, ptr, ASCII_EXCL)) {
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (CHAR_MATCHES(enc, ptr, ASCII_LSQB)) {
++level;
ptr += MINBPC(enc);
}
}
break;
case BT_RSQB:
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (CHAR_MATCHES(enc, ptr, ASCII_RSQB)) {
ptr += MINBPC(enc);
REQUIRE_CHAR(enc, ptr, end);
if (CHAR_MATCHES(enc, ptr, ASCII_GT)) {
ptr += MINBPC(enc);
if (level == 0) {
*nextTokPtr = ptr;
return XML_TOK_IGNORE_SECT;
}
--level;
}
}
break;
default:
ptr += MINBPC(enc);
break;
}
}
return XML_TOK_PARTIAL;
}
#endif /* XML_DTD */
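/* Illustrative sketch, not part of expat: the nesting count used by
ignoreSectionTok above, reduced to single-byte input so it can be read
without the encoding macros. The name skip_ignore_section is
hypothetical. It returns NULL where the original returns
XML_TOK_PARTIAL, and it rescans overlapping bytes, so edge cases such
as "]]]>" may be treated differently from the original.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Returns a pointer just past the "]]>" that closes the section,
// or NULL if the input ends before that point.
const char *skip_ignore_section(const char *p, const char *end)
{
  int level = 0;  // number of nested "<![" openers still unclosed
  while (p < end) {
    if (end - p >= 3 && p[0] == '<' && p[1] == '!' && p[2] == '[') {
      level++;
      p += 3;
    }
    else if (end - p >= 3 && p[0] == ']' && p[1] == ']' && p[2] == '>') {
      p += 3;
      if (level == 0)
        return p;
      level--;
    }
    else
      p++;
  }
  return NULL;
}

void skip_ignore_section_example(void)
{
  const char *s = "a<![x]]>b]]>c";
  // the inner "]]>" closes the nested opener, the outer one ends the section
  assert(skip_ignore_section(s, s + strlen(s)) == s + 12);
  assert(skip_ignore_section(s, s + 8) == NULL);
}
```

The second call stops after the inner section only, so the outer
section is still open and the sketch reports incomplete input.
*/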
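/* ptr and end delimit a public identifier literal including its
quotes. Returns 1 if every character between the quotes is allowed in
a public identifier; otherwise stores the offending position in
*badPtr and returns 0. */
static int PTRCALL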
PREFIX(isPublicId)(const ENCODING *enc, const char *ptr, const char *end,
const char **badPtr)
{
ptr += MINBPC(enc);
end -= MINBPC(enc);
for (; HAS_CHAR(enc, ptr, end); ptr += MINBPC(enc)) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_DIGIT:
case BT_HEX:
case BT_MINUS:
case BT_APOS:
case BT_LPAR:
case BT_RPAR:
case BT_PLUS:
case BT_COMMA:
case BT_SOL:
case BT_EQUALS:
case BT_QUEST:
case BT_CR:
case BT_LF:
case BT_SEMI:
case BT_EXCL:
case BT_AST:
case BT_PERCNT:
case BT_NUM:
#ifdef XML_NS
case BT_COLON:
#endif
break;
case BT_S:
if (CHAR_MATCHES(enc, ptr, ASCII_TAB)) {
*badPtr = ptr;
return 0;
}
break;
case BT_NAME:
case BT_NMSTRT:
if (!(BYTE_TO_ASCII(enc, ptr) & ~0x7f))
break;
/* fall through */
default:
switch (BYTE_TO_ASCII(enc, ptr)) {
case 0x24: /* $ */
case 0x40: /* @ */
break;
default:
*badPtr = ptr;
return 0;
}
break;
}
}
return 1;
}
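/* Illustrative sketch, not part of expat: the character test that
isPublicId above performs through byte types, written out for a plain
NUL-terminated ASCII string. The accepted set follows the PubidChar
production of the XML 1.0 specification; the names is_pubid_char and
first_bad_pubid_char are hypothetical. Unlike the function above, the
sketch expects the text between the quotes, not the quoted literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Nonzero if c may appear in a public identifier.
int is_pubid_char(char c)
{
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || (c >= '0' && c <= '9'))
    return 1;
  // strchr would also match the terminating NUL, so exclude it first
  return c != '\0' && strchr(" \r\n-'()+,./:=?;!*#@$_%", c) != NULL;
}

// Returns NULL if s is a valid public identifier body,
// otherwise a pointer to the first character that is not allowed.
const char *first_bad_pubid_char(const char *s)
{
  for (; *s; s++) {
    if (!is_pubid_char(*s))
      return s;
  }
  return NULL;
}

void pubid_example(void)
{
  const char *bad = "a\tb";
  assert(first_bad_pubid_char("-//W3C//DTD XHTML 1.0//EN") == NULL);
  assert(first_bad_pubid_char(bad) == bad + 1);  // tab is rejected
}
```
*/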
/* This must only be called for a well-formed start-tag or empty
element tag. Returns the number of attributes. Pointers to the
first attsMax attributes are stored in atts.
*/
static int PTRCALL
PREFIX(getAtts)(const ENCODING *enc, const char *ptr,
int attsMax, ATTRIBUTE *atts)
{
enum { other, inName, inValue } state = inName;
int nAtts = 0;
int open = 0; /* defined when state == inValue;
initialization just to shut up compilers */
for (ptr += MINBPC(enc);; ptr += MINBPC(enc)) {
switch (BYTE_TYPE(enc, ptr)) {
#define START_NAME \
if (state == other) { \
if (nAtts < attsMax) { \
atts[nAtts].name = ptr; \
atts[nAtts].normalized = 1; \
} \
state = inName; \
}
#define LEAD_CASE(n) \
case BT_LEAD ## n: START_NAME ptr += (n - MINBPC(enc)); break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_NONASCII:
case BT_NMSTRT:
case BT_HEX:
START_NAME
break;
#undef START_NAME
case BT_QUOT:
if (state != inValue) {
if (nAtts < attsMax)
atts[nAtts].valuePtr = ptr + MINBPC(enc);
state = inValue;
open = BT_QUOT;
}
else if (open == BT_QUOT) {
state = other;
if (nAtts < attsMax)
atts[nAtts].valueEnd = ptr;
nAtts++;
}
break;
case BT_APOS:
if (state != inValue) {
if (nAtts < attsMax)
atts[nAtts].valuePtr = ptr + MINBPC(enc);
state = inValue;
open = BT_APOS;
}
else if (open == BT_APOS) {
state = other;
if (nAtts < attsMax)
atts[nAtts].valueEnd = ptr;
nAtts++;
}
break;
case BT_AMP:
if (nAtts < attsMax)
atts[nAtts].normalized = 0;
break;
case BT_S:
if (state == inName)
state = other;
else if (state == inValue
&& nAtts < attsMax
&& atts[nAtts].normalized
&& (ptr == atts[nAtts].valuePtr
|| BYTE_TO_ASCII(enc, ptr) != ASCII_SPACE
|| BYTE_TO_ASCII(enc, ptr + MINBPC(enc)) == ASCII_SPACE
|| BYTE_TYPE(enc, ptr + MINBPC(enc)) == open))
atts[nAtts].normalized = 0;
break;
case BT_CR: case BT_LF:
/* This case ensures that the first attribute name is counted.
Apart from that we could just change state on the quote. */
if (state == inName)
state = other;
else if (state == inValue && nAtts < attsMax)
atts[nAtts].normalized = 0;
break;
case BT_GT:
case BT_SOL:
if (state != inValue)
return nAtts;
break;
default:
break;
}
}
/* not reached */
}
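/* ptr points to the "&" of a well-formed numeric character reference.
Returns the referenced code point, or -1 if it is not a legal
character. */
static int PTRFASTCALL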
PREFIX(charRefNumber)(const ENCODING *UNUSED_P(enc), const char *ptr)
{
int result = 0;
/* skip &# */
ptr += 2*MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_x)) {
for (ptr += MINBPC(enc);
!CHAR_MATCHES(enc, ptr, ASCII_SEMI);
ptr += MINBPC(enc)) {
int c = BYTE_TO_ASCII(enc, ptr);
switch (c) {
case ASCII_0: case ASCII_1: case ASCII_2: case ASCII_3: case ASCII_4:
case ASCII_5: case ASCII_6: case ASCII_7: case ASCII_8: case ASCII_9:
result <<= 4;
result |= (c - ASCII_0);
break;
case ASCII_A: case ASCII_B: case ASCII_C:
case ASCII_D: case ASCII_E: case ASCII_F:
result <<= 4;
result += 10 + (c - ASCII_A);
break;
case ASCII_a: case ASCII_b: case ASCII_c:
case ASCII_d: case ASCII_e: case ASCII_f:
result <<= 4;
result += 10 + (c - ASCII_a);
break;
}
if (result >= 0x110000)
return -1;
}
}
else {
for (; !CHAR_MATCHES(enc, ptr, ASCII_SEMI); ptr += MINBPC(enc)) {
int c = BYTE_TO_ASCII(enc, ptr);
result *= 10;
result += (c - ASCII_0);
if (result >= 0x110000)
return -1;
}
}
return checkCharRefNumber(result);
}
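/* Illustrative sketch, not part of expat: the decoding done by
charRefNumber above, reduced to a NUL-terminated ASCII string so that
it can be read without the encoding macros. The name decode_char_ref
is hypothetical, the input is assumed to be a well-formed reference
already, and the final validity check that the function above
delegates to checkCharRefNumber is left out.

```c
#include <assert.h>

// s points at the '&' of "&#NNN;" or "&#xHHH;".
// Returns the code point, or -1 if it exceeds 0x10FFFF.
int decode_char_ref(const char *s)
{
  int result = 0;
  s += 2;  // skip "&#"
  if (*s == 'x') {
    for (s++; *s != ';'; s++) {
      int c = *s;
      result <<= 4;
      if (c >= '0' && c <= '9')
        result |= c - '0';
      else if (c >= 'A' && c <= 'F')
        result += 10 + (c - 'A');
      else
        result += 10 + (c - 'a');  // 'a'..'f', given well-formed input
      // checking on every digit keeps result far below INT_MAX
      if (result >= 0x110000)
        return -1;
    }
  }
  else {
    for (; *s != ';'; s++) {
      result = result * 10 + (*s - '0');
      if (result >= 0x110000)
        return -1;
    }
  }
  return result;
}

void decode_char_ref_example(void)
{
  assert(decode_char_ref("&#65;") == 65);
  assert(decode_char_ref("&#x3bb;") == 0x3BB);
  assert(decode_char_ref("&#x110000;") == -1);
}
```

Testing the range after every digit, as the original does, means an
arbitrarily long run of digits cannot overflow the accumulator.
*/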
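/* ptr and end delimit an entity name. Returns the character for one
of the predefined entities lt, gt, amp, quot and apos, or 0 for any
other name. */
static int PTRCALL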
PREFIX(predefinedEntityName)(const ENCODING *UNUSED_P(enc), const char *ptr,
const char *end)
{
switch ((end - ptr)/MINBPC(enc)) {
case 2:
if (CHAR_MATCHES(enc, ptr + MINBPC(enc), ASCII_t)) {
switch (BYTE_TO_ASCII(enc, ptr)) {
case ASCII_l:
return ASCII_LT;
case ASCII_g:
return ASCII_GT;
}
}
break;
case 3:
if (CHAR_MATCHES(enc, ptr, ASCII_a)) {
ptr += MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_m)) {
ptr += MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_p))
return ASCII_AMP;
}
}
break;
case 4:
switch (BYTE_TO_ASCII(enc, ptr)) {
case ASCII_q:
ptr += MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_u)) {
ptr += MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_o)) {
ptr += MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_t))
return ASCII_QUOT;
}
}
break;
case ASCII_a:
ptr += MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_p)) {
ptr += MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_o)) {
ptr += MINBPC(enc);
if (CHAR_MATCHES(enc, ptr, ASCII_s))
return ASCII_APOS;
}
}
break;
}
}
return 0;
}
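/* Illustrative sketch, not part of expat: the same length-first
lookup as predefinedEntityName above, for a plain ASCII name. The name
predefined_entity and its (pointer, length) signature are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// name points at len ASCII bytes, without the '&' and ';'.
// Returns the replacement character, or 0 if the name is not predefined.
int predefined_entity(const char *name, size_t len)
{
  switch (len) {
  case 2:
    if (name[1] == 't') {
      if (name[0] == 'l')
        return '<';
      if (name[0] == 'g')
        return '>';
    }
    break;
  case 3:
    if (memcmp(name, "amp", 3) == 0)
      return '&';
    break;
  case 4:
    if (memcmp(name, "quot", 4) == 0)
      return '"';
    if (memcmp(name, "apos", 4) == 0)
      return '\'';
    break;
  }
  return 0;
}

void predefined_entity_example(void)
{
  assert(predefined_entity("lt", 2) == '<');
  assert(predefined_entity("apos", 4) == '\'');
  assert(predefined_entity("nbsp", 4) == 0);
}
```

Switching on the length first rejects most names without comparing
any characters.
*/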
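/* Returns 1 if the names starting at ptr1 and ptr2 are identical and
0 otherwise. Each name ends at its first non-name character, so both
pointers must point into already tokenized input. */
static int PTRCALL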
PREFIX(sameName)(const ENCODING *enc, const char *ptr1, const char *ptr2)
{
for (;;) {
switch (BYTE_TYPE(enc, ptr1)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
if (*ptr1++ != *ptr2++) \
return 0;
LEAD_CASE(4) LEAD_CASE(3) LEAD_CASE(2)
#undef LEAD_CASE
/* fall through */
if (*ptr1++ != *ptr2++)
return 0;
break;
case BT_NONASCII:
case BT_NMSTRT:
#ifdef XML_NS
case BT_COLON:
#endif
case BT_HEX:
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
if (*ptr2++ != *ptr1++)
return 0;
if (MINBPC(enc) > 1) {
if (*ptr2++ != *ptr1++)
return 0;
if (MINBPC(enc) > 2) {
if (*ptr2++ != *ptr1++)
return 0;
if (MINBPC(enc) > 3) {
if (*ptr2++ != *ptr1++)
return 0;
}
}
}
break;
default:
if (MINBPC(enc) == 1 && *ptr1 == *ptr2)
return 1;
switch (BYTE_TYPE(enc, ptr2)) {
case BT_LEAD2:
case BT_LEAD3:
case BT_LEAD4:
case BT_NONASCII:
case BT_NMSTRT:
#ifdef XML_NS
case BT_COLON:
#endif
case BT_HEX:
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
return 0;
default:
return 1;
}
}
}
/* not reached */
}
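/* Returns nonzero if the name delimited by ptr1 and end1 equals the
NUL-terminated ASCII string ptr2. */
static int PTRCALL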
PREFIX(nameMatchesAscii)(const ENCODING *UNUSED_P(enc), const char *ptr1,
const char *end1, const char *ptr2)
{
for (; *ptr2; ptr1 += MINBPC(enc), ptr2++) {
if (end1 - ptr1 < MINBPC(enc))
return 0;
if (!CHAR_MATCHES(enc, ptr1, *ptr2))
return 0;
}
return ptr1 == end1;
}
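/* Illustrative sketch, not part of expat: what the comparison in
nameMatchesAscii above amounts to for one concrete multi-byte
encoding. UTF-16LE is chosen here as an assumption, with two bytes per
character and the low byte first; the function name is hypothetical.

```c
#include <assert.h>

// Nonzero if the UTF-16LE name in [p, end) equals the ASCII string.
int utf16le_name_matches_ascii(const char *p, const char *end,
                               const char *ascii)
{
  for (; *ascii; p += 2, ascii++) {
    if (end - p < 2)
      return 0;  // name is shorter than the keyword
    if (p[1] != 0 || p[0] != *ascii)
      return 0;  // not ASCII, or a different character
  }
  return p == end;  // name must not be longer than the keyword
}

void utf16le_example(void)
{
  const char name[] = { 'I', 0, 'D', 0 };
  assert(utf16le_name_matches_ascii(name, name + 4, "ID") == 1);
  assert(utf16le_name_matches_ascii(name, name + 4, "IDREF") == 0);
  assert(utf16le_name_matches_ascii(name, name + 4, "I") == 0);
}
```
*/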
/* Returns the length in bytes of the name starting at ptr. */
static int PTRFASTCALL
PREFIX(nameLength)(const ENCODING *enc, const char *ptr)
{
const char *start = ptr;
for (;;) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: ptr += n; break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_NONASCII:
case BT_NMSTRT:
#ifdef XML_NS
case BT_COLON:
#endif
case BT_HEX:
case BT_DIGIT:
case BT_NAME:
case BT_MINUS:
ptr += MINBPC(enc);
break;
default:
return (int)(ptr - start);
}
}
}
/* Returns a pointer to the first character at or after ptr that is
not white space. */
static const char * PTRFASTCALL
PREFIX(skipS)(const ENCODING *enc, const char *ptr)
{
for (;;) {
switch (BYTE_TYPE(enc, ptr)) {
case BT_LF:
case BT_CR:
case BT_S:
ptr += MINBPC(enc);
break;
default:
return ptr;
}
}
}
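/* Advances pos over the characters between ptr and end. LF, CR and
CR LF each count as one line break and reset the column to 0; every
other character advances the column by one. */
static void PTRCALL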
PREFIX(updatePosition)(const ENCODING *enc,
const char *ptr,
const char *end,
POSITION *pos)
{
while (HAS_CHAR(enc, ptr, end)) {
switch (BYTE_TYPE(enc, ptr)) {
#define LEAD_CASE(n) \
case BT_LEAD ## n: \
ptr += n; \
break;
LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
case BT_LF:
pos->columnNumber = (XML_Size)-1;
pos->lineNumber++;
ptr += MINBPC(enc);
break;
case BT_CR:
pos->lineNumber++;
ptr += MINBPC(enc);
if (HAS_CHAR(enc, ptr, end) && BYTE_TYPE(enc, ptr) == BT_LF)
ptr += MINBPC(enc);
pos->columnNumber = (XML_Size)-1;
break;
default:
ptr += MINBPC(enc);
break;
}
pos->columnNumber++;
}
}
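/* Illustrative sketch, not part of expat: the line and column
bookkeeping of updatePosition above for single-byte input. The Pos
type and the name advance_position are hypothetical stand-ins. The
sketch sets the column to 0 directly where the original stores
(XML_Size)-1 and lets the increment at the end of the loop wrap it
to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  unsigned long line;    // zero-based line number
  unsigned long column;  // zero-based column number
} Pos;

void advance_position(const char *p, const char *end, Pos *pos)
{
  while (p < end) {
    if (*p == '\n') {
      pos->line++;
      pos->column = 0;
      p++;
    }
    else if (*p == '\r') {
      pos->line++;
      pos->column = 0;
      p++;
      if (p < end && *p == '\n')
        p++;  // CR LF counts as a single line break
    }
    else {
      pos->column++;
      p++;
    }
  }
}

void advance_position_example(void)
{
  const char *s = "ab\r\ncd\nx";
  Pos pos = { 0, 0 };
  advance_position(s, s + strlen(s), &pos);
  assert(pos.line == 2 && pos.column == 1);
}
```
*/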
#undef DO_LEAD_CASE
#undef MULTIBYTE_CASES
#undef INVALID_CASES
#undef CHECK_NAME_CASE
#undef CHECK_NAME_CASES
#undef CHECK_NMSTRT_CASE
#undef CHECK_NMSTRT_CASES
#endif /* XML_TOK_IMPL_C */
hexpat-0.20.13/cbits/xmlrole.c 0000644 0000000 0000000 00000101164 13122604047 014270 0 ustar 00 0000000 0000000
/* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
#include <stddef.h>
#ifdef _WIN32
#include "winconfig.h"
#else
#ifdef HAVE_EXPAT_CONFIG_H
#include <expat_config.h>
#endif
#endif /* ndef _WIN32 */
#include "expat_external.h"
#include "internal.h"
#include "xmlrole.h"
#include "ascii.h"
/* This module doesn't check:
that "," and "|" are not mixed in a model group
the content of literals
*/
static const char KW_ANY[] = {
ASCII_A, ASCII_N, ASCII_Y, '\0' };
static const char KW_ATTLIST[] = {
ASCII_A, ASCII_T, ASCII_T, ASCII_L, ASCII_I, ASCII_S, ASCII_T, '\0' };
static const char KW_CDATA[] = {
ASCII_C, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' };
static const char KW_DOCTYPE[] = {
ASCII_D, ASCII_O, ASCII_C, ASCII_T, ASCII_Y, ASCII_P, ASCII_E, '\0' };
static const char KW_ELEMENT[] = {
ASCII_E, ASCII_L, ASCII_E, ASCII_M, ASCII_E, ASCII_N, ASCII_T, '\0' };
static const char KW_EMPTY[] = {
ASCII_E, ASCII_M, ASCII_P, ASCII_T, ASCII_Y, '\0' };
static const char KW_ENTITIES[] = {
ASCII_E, ASCII_N, ASCII_T, ASCII_I, ASCII_T, ASCII_I, ASCII_E, ASCII_S,
'\0' };
static const char KW_ENTITY[] = {
ASCII_E, ASCII_N, ASCII_T, ASCII_I, ASCII_T, ASCII_Y, '\0' };
static const char KW_FIXED[] = {
ASCII_F, ASCII_I, ASCII_X, ASCII_E, ASCII_D, '\0' };
static const char KW_ID[] = {
ASCII_I, ASCII_D, '\0' };
static const char KW_IDREF[] = {
ASCII_I, ASCII_D, ASCII_R, ASCII_E, ASCII_F, '\0' };
static const char KW_IDREFS[] = {
ASCII_I, ASCII_D, ASCII_R, ASCII_E, ASCII_F, ASCII_S, '\0' };
#ifdef XML_DTD
static const char KW_IGNORE[] = {
ASCII_I, ASCII_G, ASCII_N, ASCII_O, ASCII_R, ASCII_E, '\0' };
#endif
static const char KW_IMPLIED[] = {
ASCII_I, ASCII_M, ASCII_P, ASCII_L, ASCII_I, ASCII_E, ASCII_D, '\0' };
#ifdef XML_DTD
static const char KW_INCLUDE[] = {
ASCII_I, ASCII_N, ASCII_C, ASCII_L, ASCII_U, ASCII_D, ASCII_E, '\0' };
#endif
static const char KW_NDATA[] = {
ASCII_N, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' };
static const char KW_NMTOKEN[] = {
ASCII_N, ASCII_M, ASCII_T, ASCII_O, ASCII_K, ASCII_E, ASCII_N, '\0' };
static const char KW_NMTOKENS[] = {
ASCII_N, ASCII_M, ASCII_T, ASCII_O, ASCII_K, ASCII_E, ASCII_N, ASCII_S,
'\0' };
static const char KW_NOTATION[] =
{ ASCII_N, ASCII_O, ASCII_T, ASCII_A, ASCII_T, ASCII_I, ASCII_O, ASCII_N,
'\0' };
static const char KW_PCDATA[] = {
ASCII_P, ASCII_C, ASCII_D, ASCII_A, ASCII_T, ASCII_A, '\0' };
static const char KW_PUBLIC[] = {
ASCII_P, ASCII_U, ASCII_B, ASCII_L, ASCII_I, ASCII_C, '\0' };
static const char KW_REQUIRED[] = {
ASCII_R, ASCII_E, ASCII_Q, ASCII_U, ASCII_I, ASCII_R, ASCII_E, ASCII_D,
'\0' };
static const char KW_SYSTEM[] = {
ASCII_S, ASCII_Y, ASCII_S, ASCII_T, ASCII_E, ASCII_M, '\0' };
#ifndef MIN_BYTES_PER_CHAR
#define MIN_BYTES_PER_CHAR(enc) ((enc)->minBytesPerChar)
#endif
#ifdef XML_DTD
#define setTopLevel(state) \
((state)->handler = ((state)->documentEntity \
? internalSubset \
: externalSubset1))
#else /* not XML_DTD */
#define setTopLevel(state) ((state)->handler = internalSubset)
#endif /* not XML_DTD */
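/* A PROLOG_HANDLER maps the token just scanned to an XML_ROLE_* value
and, where the grammar moves on, stores the handler for the next token
in state->handler. */
typedef int PTRCALL PROLOG_HANDLER(PROLOG_STATE *state,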
int tok,
const char *ptr,
const char *end,
const ENCODING *enc);
static PROLOG_HANDLER
prolog0, prolog1, prolog2,
doctype0, doctype1, doctype2, doctype3, doctype4, doctype5,
internalSubset,
entity0, entity1, entity2, entity3, entity4, entity5, entity6,
entity7, entity8, entity9, entity10,
notation0, notation1, notation2, notation3, notation4,
attlist0, attlist1, attlist2, attlist3, attlist4, attlist5, attlist6,
attlist7, attlist8, attlist9,
element0, element1, element2, element3, element4, element5, element6,
element7,
#ifdef XML_DTD
externalSubset0, externalSubset1,
condSect0, condSect1, condSect2,
#endif /* XML_DTD */
declClose,
error;
static int FASTCALL common(PROLOG_STATE *state, int tok);
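/* Illustrative sketch, not part of expat: the handler-pointer state
machine declared above, cut down to two states. Every name here
(State, Handler, the TOK_ and ROLE_ constants, expect_name,
expect_close, fail) is hypothetical, and switching to a failing
handler after an unexpected token is an assumption of the sketch.

```c
#include <assert.h>

typedef struct State State;
typedef int Handler(State *state, int tok);
struct State { Handler *handler; };

enum { TOK_SPACE, TOK_NAME, TOK_CLOSE };
enum { ROLE_ERROR = -1, ROLE_NONE, ROLE_NAME, ROLE_CLOSE };

// forward declarations through the function typedef, as in the source
static Handler expect_name, expect_close, fail;

static int fail(State *state, int tok)
{
  (void)state;
  (void)tok;
  return ROLE_ERROR;
}

static int expect_name(State *state, int tok)
{
  switch (tok) {
  case TOK_SPACE:
    return ROLE_NONE;  // white space does not change the state
  case TOK_NAME:
    state->handler = expect_close;
    return ROLE_NAME;
  }
  state->handler = fail;
  return ROLE_ERROR;
}

static int expect_close(State *state, int tok)
{
  switch (tok) {
  case TOK_SPACE:
    return ROLE_NONE;
  case TOK_CLOSE:
    state->handler = fail;  // nothing may follow in this sketch
    return ROLE_CLOSE;
  }
  state->handler = fail;
  return ROLE_ERROR;
}

void state_init(State *state) { state->handler = expect_name; }

int next_role(State *state, int tok) { return state->handler(state, tok); }

void handler_example(void)
{
  State s;
  state_init(&s);
  assert(next_role(&s, TOK_SPACE) == ROLE_NONE);
  assert(next_role(&s, TOK_NAME) == ROLE_NAME);
  assert(next_role(&s, TOK_CLOSE) == ROLE_CLOSE);
  assert(next_role(&s, TOK_NAME) == ROLE_ERROR);
}
```

Keeping the current state as a function pointer lets each handler list
only the tokens that are legal at that point of the grammar.
*/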
/* Initial state: the only state in which an XML declaration is
accepted. */
static int PTRCALL
prolog0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
state->handler = prolog1;
return XML_ROLE_NONE;
case XML_TOK_XML_DECL:
state->handler = prolog1;
return XML_ROLE_XML_DECL;
case XML_TOK_PI:
state->handler = prolog1;
return XML_ROLE_PI;
case XML_TOK_COMMENT:
state->handler = prolog1;
return XML_ROLE_COMMENT;
case XML_TOK_BOM:
return XML_ROLE_NONE;
case XML_TOK_DECL_OPEN:
if (!XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
end,
KW_DOCTYPE))
break;
state->handler = doctype0;
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_INSTANCE_START:
state->handler = error;
return XML_ROLE_INSTANCE_START;
}
return common(state, tok);
}
/* Prolog after its first token and before any doctype declaration. */
static int PTRCALL
prolog1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_PI:
return XML_ROLE_PI;
case XML_TOK_COMMENT:
return XML_ROLE_COMMENT;
case XML_TOK_BOM:
return XML_ROLE_NONE;
case XML_TOK_DECL_OPEN:
if (!XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
end,
KW_DOCTYPE))
break;
state->handler = doctype0;
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_INSTANCE_START:
state->handler = error;
return XML_ROLE_INSTANCE_START;
}
return common(state, tok);
}
/* Prolog after the doctype declaration has been closed. */
static int PTRCALL
prolog2(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_PI:
return XML_ROLE_PI;
case XML_TOK_COMMENT:
return XML_ROLE_COMMENT;
case XML_TOK_INSTANCE_START:
state->handler = error;
return XML_ROLE_INSTANCE_START;
}
return common(state, tok);
}
/* After "<!DOCTYPE": expects the document type name. */
static int PTRCALL
doctype0(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_NAME:
case XML_TOK_PREFIXED_NAME:
state->handler = doctype1;
return XML_ROLE_DOCTYPE_NAME;
}
return common(state, tok);
}
/* After the document type name: expects SYSTEM, PUBLIC, "[" or ">". */
static int PTRCALL
doctype1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_OPEN_BRACKET:
state->handler = internalSubset;
return XML_ROLE_DOCTYPE_INTERNAL_SUBSET;
case XML_TOK_DECL_CLOSE:
state->handler = prolog2;
return XML_ROLE_DOCTYPE_CLOSE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) {
state->handler = doctype3;
return XML_ROLE_DOCTYPE_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) {
state->handler = doctype2;
return XML_ROLE_DOCTYPE_NONE;
}
break;
}
return common(state, tok);
}
static int PTRCALL
doctype2(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_LITERAL:
state->handler = doctype3;
return XML_ROLE_DOCTYPE_PUBLIC_ID;
}
return common(state, tok);
}
static int PTRCALL
doctype3(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_LITERAL:
state->handler = doctype4;
return XML_ROLE_DOCTYPE_SYSTEM_ID;
}
return common(state, tok);
}
static int PTRCALL
doctype4(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_OPEN_BRACKET:
state->handler = internalSubset;
return XML_ROLE_DOCTYPE_INTERNAL_SUBSET;
case XML_TOK_DECL_CLOSE:
state->handler = prolog2;
return XML_ROLE_DOCTYPE_CLOSE;
}
return common(state, tok);
}
static int PTRCALL
doctype5(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_DECL_CLOSE:
state->handler = prolog2;
return XML_ROLE_DOCTYPE_CLOSE;
}
return common(state, tok);
}
static int PTRCALL
internalSubset(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_DECL_OPEN:
if (XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
end,
KW_ENTITY)) {
state->handler = entity0;
return XML_ROLE_ENTITY_NONE;
}
if (XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
end,
KW_ATTLIST)) {
state->handler = attlist0;
return XML_ROLE_ATTLIST_NONE;
}
if (XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
end,
KW_ELEMENT)) {
state->handler = element0;
return XML_ROLE_ELEMENT_NONE;
}
if (XmlNameMatchesAscii(enc,
ptr + 2 * MIN_BYTES_PER_CHAR(enc),
end,
KW_NOTATION)) {
state->handler = notation0;
return XML_ROLE_NOTATION_NONE;
}
break;
case XML_TOK_PI:
return XML_ROLE_PI;
case XML_TOK_COMMENT:
return XML_ROLE_COMMENT;
case XML_TOK_PARAM_ENTITY_REF:
return XML_ROLE_PARAM_ENTITY_REF;
case XML_TOK_CLOSE_BRACKET:
state->handler = doctype5;
return XML_ROLE_DOCTYPE_NONE;
case XML_TOK_NONE:
return XML_ROLE_NONE;
}
return common(state, tok);
}
#ifdef XML_DTD
static int PTRCALL
externalSubset0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
state->handler = externalSubset1;
if (tok == XML_TOK_XML_DECL)
return XML_ROLE_TEXT_DECL;
return externalSubset1(state, tok, ptr, end, enc);
}
static int PTRCALL
externalSubset1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_COND_SECT_OPEN:
state->handler = condSect0;
return XML_ROLE_NONE;
case XML_TOK_COND_SECT_CLOSE:
if (state->includeLevel == 0)
break;
state->includeLevel -= 1;
return XML_ROLE_NONE;
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_CLOSE_BRACKET:
break;
case XML_TOK_NONE:
if (state->includeLevel)
break;
return XML_ROLE_NONE;
default:
return internalSubset(state, tok, ptr, end, enc);
}
return common(state, tok);
}
#endif /* XML_DTD */
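/* After "<!ENTITY": "%" starts a parameter entity declaration, a name
starts a general entity declaration. */
static int PTRCALL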
entity0(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_PERCENT:
state->handler = entity1;
return XML_ROLE_ENTITY_NONE;
case XML_TOK_NAME:
state->handler = entity2;
return XML_ROLE_GENERAL_ENTITY_NAME;
}
return common(state, tok);
}
static int PTRCALL
entity1(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_NAME:
state->handler = entity7;
return XML_ROLE_PARAM_ENTITY_NAME;
}
return common(state, tok);
}
static int PTRCALL
entity2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) {
state->handler = entity4;
return XML_ROLE_ENTITY_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) {
state->handler = entity3;
return XML_ROLE_ENTITY_NONE;
}
break;
case XML_TOK_LITERAL:
state->handler = declClose;
state->role_none = XML_ROLE_ENTITY_NONE;
return XML_ROLE_ENTITY_VALUE;
}
return common(state, tok);
}
static int PTRCALL
entity3(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_LITERAL:
state->handler = entity4;
return XML_ROLE_ENTITY_PUBLIC_ID;
}
return common(state, tok);
}
static int PTRCALL
entity4(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_LITERAL:
state->handler = entity5;
return XML_ROLE_ENTITY_SYSTEM_ID;
}
return common(state, tok);
}
static int PTRCALL
entity5(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_DECL_CLOSE:
setTopLevel(state);
return XML_ROLE_ENTITY_COMPLETE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, end, KW_NDATA)) {
state->handler = entity6;
return XML_ROLE_ENTITY_NONE;
}
break;
}
return common(state, tok);
}
static int PTRCALL
entity6(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_NAME:
state->handler = declClose;
state->role_none = XML_ROLE_ENTITY_NONE;
return XML_ROLE_ENTITY_NOTATION_NAME;
}
return common(state, tok);
}
static int PTRCALL
entity7(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) {
state->handler = entity9;
return XML_ROLE_ENTITY_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) {
state->handler = entity8;
return XML_ROLE_ENTITY_NONE;
}
break;
case XML_TOK_LITERAL:
state->handler = declClose;
state->role_none = XML_ROLE_ENTITY_NONE;
return XML_ROLE_ENTITY_VALUE;
}
return common(state, tok);
}
static int PTRCALL
entity8(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_LITERAL:
state->handler = entity9;
return XML_ROLE_ENTITY_PUBLIC_ID;
}
return common(state, tok);
}
static int PTRCALL
entity9(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_LITERAL:
state->handler = entity10;
return XML_ROLE_ENTITY_SYSTEM_ID;
}
return common(state, tok);
}
static int PTRCALL
entity10(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ENTITY_NONE;
case XML_TOK_DECL_CLOSE:
setTopLevel(state);
return XML_ROLE_ENTITY_COMPLETE;
}
return common(state, tok);
}
static int PTRCALL
notation0(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NOTATION_NONE;
case XML_TOK_NAME:
state->handler = notation1;
return XML_ROLE_NOTATION_NAME;
}
return common(state, tok);
}
static int PTRCALL
notation1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NOTATION_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, end, KW_SYSTEM)) {
state->handler = notation3;
return XML_ROLE_NOTATION_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, end, KW_PUBLIC)) {
state->handler = notation2;
return XML_ROLE_NOTATION_NONE;
}
break;
}
return common(state, tok);
}
static int PTRCALL
notation2(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NOTATION_NONE;
case XML_TOK_LITERAL:
state->handler = notation4;
return XML_ROLE_NOTATION_PUBLIC_ID;
}
return common(state, tok);
}
static int PTRCALL
notation3(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NOTATION_NONE;
case XML_TOK_LITERAL:
state->handler = declClose;
state->role_none = XML_ROLE_NOTATION_NONE;
return XML_ROLE_NOTATION_SYSTEM_ID;
}
return common(state, tok);
}
static int PTRCALL
notation4(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NOTATION_NONE;
case XML_TOK_LITERAL:
state->handler = declClose;
state->role_none = XML_ROLE_NOTATION_NONE;
return XML_ROLE_NOTATION_SYSTEM_ID;
case XML_TOK_DECL_CLOSE:
setTopLevel(state);
return XML_ROLE_NOTATION_NO_SYSTEM_ID;
}
return common(state, tok);
}
static int PTRCALL
attlist0(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_NAME:
case XML_TOK_PREFIXED_NAME:
state->handler = attlist1;
return XML_ROLE_ATTLIST_ELEMENT_NAME;
}
return common(state, tok);
}
static int PTRCALL
attlist1(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_DECL_CLOSE:
setTopLevel(state);
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_NAME:
case XML_TOK_PREFIXED_NAME:
state->handler = attlist2;
return XML_ROLE_ATTRIBUTE_NAME;
}
return common(state, tok);
}
static int PTRCALL
attlist2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_NAME:
{
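/* Order must match XML_ROLE_ATTRIBUTE_TYPE_CDATA .. _NMTOKENS in xmlrole.h:
the index of the matching keyword is added to XML_ROLE_ATTRIBUTE_TYPE_CDATA. */
static const char * const types[] = {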
KW_CDATA,
KW_ID,
KW_IDREF,
KW_IDREFS,
KW_ENTITY,
KW_ENTITIES,
KW_NMTOKEN,
KW_NMTOKENS,
};
int i;
for (i = 0; i < (int)(sizeof(types)/sizeof(types[0])); i++)
if (XmlNameMatchesAscii(enc, ptr, end, types[i])) {
state->handler = attlist8;
return XML_ROLE_ATTRIBUTE_TYPE_CDATA + i;
}
}
if (XmlNameMatchesAscii(enc, ptr, end, KW_NOTATION)) {
state->handler = attlist5;
return XML_ROLE_ATTLIST_NONE;
}
break;
case XML_TOK_OPEN_PAREN:
state->handler = attlist3;
return XML_ROLE_ATTLIST_NONE;
}
return common(state, tok);
}
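/* Illustrative sketch, not part of expat. attlist2 above returns
XML_ROLE_ATTRIBUTE_TYPE_CDATA + i, which only works because the keyword
table and the role enum list the eight attribute types in the same order.
The block below shows that idea in isolation. The ROLE_ names, the function
attribute_type_role and the use of strcmp are hypothetical stand-ins; the
real code compares encoded input with XmlNameMatchesAscii.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical role codes; only their consecutive order matters here.
enum {
  ROLE_NONE = 0,
  ROLE_TYPE_CDATA,
  ROLE_TYPE_ID,
  ROLE_TYPE_IDREF,
  ROLE_TYPE_IDREFS,
  ROLE_TYPE_ENTITY,
  ROLE_TYPE_ENTITIES,
  ROLE_TYPE_NMTOKEN,
  ROLE_TYPE_NMTOKENS
};

// Map an attribute type keyword to its role, or ROLE_NONE if unknown.
// The table order must match the enum order above.
static int attribute_type_role(const char *name)
{
  static const char *const types[] = {
    "CDATA", "ID", "IDREF", "IDREFS",
    "ENTITY", "ENTITIES", "NMTOKEN", "NMTOKENS",
  };
  size_t i;
  assert(name != NULL);
  for (i = 0; i < sizeof(types) / sizeof(types[0]); i++)
    if (strcmp(name, types[i]) == 0)
      return ROLE_TYPE_CDATA + (int)i;
  return ROLE_NONE;
}
```

Reordering either the table or the enum without the other would silently
return the wrong role, which is the trade-off of this compact encoding.
*/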
static int PTRCALL
attlist3(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_NMTOKEN:
case XML_TOK_NAME:
case XML_TOK_PREFIXED_NAME:
state->handler = attlist4;
return XML_ROLE_ATTRIBUTE_ENUM_VALUE;
}
return common(state, tok);
}
static int PTRCALL
attlist4(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_CLOSE_PAREN:
state->handler = attlist8;
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_OR:
state->handler = attlist3;
return XML_ROLE_ATTLIST_NONE;
}
return common(state, tok);
}
static int PTRCALL
attlist5(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_OPEN_PAREN:
state->handler = attlist6;
return XML_ROLE_ATTLIST_NONE;
}
return common(state, tok);
}
static int PTRCALL
attlist6(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_NAME:
state->handler = attlist7;
return XML_ROLE_ATTRIBUTE_NOTATION_VALUE;
}
return common(state, tok);
}
static int PTRCALL
attlist7(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_CLOSE_PAREN:
state->handler = attlist8;
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_OR:
state->handler = attlist6;
return XML_ROLE_ATTLIST_NONE;
}
return common(state, tok);
}
/* default value */
static int PTRCALL
attlist8(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_POUND_NAME:
if (XmlNameMatchesAscii(enc,
ptr + MIN_BYTES_PER_CHAR(enc),
end,
KW_IMPLIED)) {
state->handler = attlist1;
return XML_ROLE_IMPLIED_ATTRIBUTE_VALUE;
}
if (XmlNameMatchesAscii(enc,
ptr + MIN_BYTES_PER_CHAR(enc),
end,
KW_REQUIRED)) {
state->handler = attlist1;
return XML_ROLE_REQUIRED_ATTRIBUTE_VALUE;
}
if (XmlNameMatchesAscii(enc,
ptr + MIN_BYTES_PER_CHAR(enc),
end,
KW_FIXED)) {
state->handler = attlist9;
return XML_ROLE_ATTLIST_NONE;
}
break;
case XML_TOK_LITERAL:
state->handler = attlist1;
return XML_ROLE_DEFAULT_ATTRIBUTE_VALUE;
}
return common(state, tok);
}
static int PTRCALL
attlist9(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ATTLIST_NONE;
case XML_TOK_LITERAL:
state->handler = attlist1;
return XML_ROLE_FIXED_ATTRIBUTE_VALUE;
}
return common(state, tok);
}
static int PTRCALL
element0(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ELEMENT_NONE;
case XML_TOK_NAME:
case XML_TOK_PREFIXED_NAME:
state->handler = element1;
return XML_ROLE_ELEMENT_NAME;
}
return common(state, tok);
}
static int PTRCALL
element1(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ELEMENT_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, end, KW_EMPTY)) {
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
return XML_ROLE_CONTENT_EMPTY;
}
if (XmlNameMatchesAscii(enc, ptr, end, KW_ANY)) {
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
return XML_ROLE_CONTENT_ANY;
}
break;
case XML_TOK_OPEN_PAREN:
state->handler = element2;
state->level = 1;
return XML_ROLE_GROUP_OPEN;
}
return common(state, tok);
}
static int PTRCALL
element2(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ELEMENT_NONE;
case XML_TOK_POUND_NAME:
if (XmlNameMatchesAscii(enc,
ptr + MIN_BYTES_PER_CHAR(enc),
end,
KW_PCDATA)) {
state->handler = element3;
return XML_ROLE_CONTENT_PCDATA;
}
break;
case XML_TOK_OPEN_PAREN:
state->level = 2;
state->handler = element6;
return XML_ROLE_GROUP_OPEN;
case XML_TOK_NAME:
case XML_TOK_PREFIXED_NAME:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT;
case XML_TOK_NAME_QUESTION:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_OPT;
case XML_TOK_NAME_ASTERISK:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_REP;
case XML_TOK_NAME_PLUS:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_PLUS;
}
return common(state, tok);
}
static int PTRCALL
element3(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ELEMENT_NONE;
case XML_TOK_CLOSE_PAREN:
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
return XML_ROLE_GROUP_CLOSE;
case XML_TOK_CLOSE_PAREN_ASTERISK:
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
return XML_ROLE_GROUP_CLOSE_REP;
case XML_TOK_OR:
state->handler = element4;
return XML_ROLE_ELEMENT_NONE;
}
return common(state, tok);
}
static int PTRCALL
element4(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ELEMENT_NONE;
case XML_TOK_NAME:
case XML_TOK_PREFIXED_NAME:
state->handler = element5;
return XML_ROLE_CONTENT_ELEMENT;
}
return common(state, tok);
}
static int PTRCALL
element5(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ELEMENT_NONE;
case XML_TOK_CLOSE_PAREN_ASTERISK:
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
return XML_ROLE_GROUP_CLOSE_REP;
case XML_TOK_OR:
state->handler = element4;
return XML_ROLE_ELEMENT_NONE;
}
return common(state, tok);
}
static int PTRCALL
element6(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ELEMENT_NONE;
case XML_TOK_OPEN_PAREN:
state->level += 1;
return XML_ROLE_GROUP_OPEN;
case XML_TOK_NAME:
case XML_TOK_PREFIXED_NAME:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT;
case XML_TOK_NAME_QUESTION:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_OPT;
case XML_TOK_NAME_ASTERISK:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_REP;
case XML_TOK_NAME_PLUS:
state->handler = element7;
return XML_ROLE_CONTENT_ELEMENT_PLUS;
}
return common(state, tok);
}
static int PTRCALL
element7(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_ELEMENT_NONE;
case XML_TOK_CLOSE_PAREN:
state->level -= 1;
if (state->level == 0) {
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
}
return XML_ROLE_GROUP_CLOSE;
case XML_TOK_CLOSE_PAREN_ASTERISK:
state->level -= 1;
if (state->level == 0) {
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
}
return XML_ROLE_GROUP_CLOSE_REP;
case XML_TOK_CLOSE_PAREN_QUESTION:
state->level -= 1;
if (state->level == 0) {
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
}
return XML_ROLE_GROUP_CLOSE_OPT;
case XML_TOK_CLOSE_PAREN_PLUS:
state->level -= 1;
if (state->level == 0) {
state->handler = declClose;
state->role_none = XML_ROLE_ELEMENT_NONE;
}
return XML_ROLE_GROUP_CLOSE_PLUS;
case XML_TOK_COMMA:
state->handler = element6;
return XML_ROLE_GROUP_SEQUENCE;
case XML_TOK_OR:
state->handler = element6;
return XML_ROLE_GROUP_CHOICE;
}
return common(state, tok);
}
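/* Illustrative sketch, not part of expat. element1, element2, element6 and
element7 above track nested content-model groups with state->level: each
open parenthesis raises it, each close lowers it, and reaching zero hands
over to declClose. The block below isolates that counter on a plain string.
The function groups_balanced is a hypothetical helper; it checks nesting
depth only and ignores the rest of the content-model grammar, including
occurrence suffixes after a closing parenthesis.

```c
#include <assert.h>
#include <stddef.h>

// Returns 1 if every '(' in model is closed and the outermost group closes
// exactly at the end of the string, 0 otherwise.
static int groups_balanced(const char *model)
{
  unsigned level = 0;
  int closed = 0;   // set once the outermost group has closed
  size_t i;
  assert(model != NULL);
  for (i = 0; model[i] != '\0'; i++) {
    if (closed)
      return 0;     // nothing may follow the outermost ')' in this toy
    if (model[i] == '(') {
      level += 1;
    } else if (model[i] == ')') {
      if (level == 0)
        return 0;   // ')' without a matching '('
      level -= 1;
      if (level == 0)
        closed = 1; // analogous to switching the handler to declClose
    }
  }
  return closed;
}
```

A single counter is enough here because the grammar never needs to remember
what kind of group was opened, only how many are still open.
*/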
#ifdef XML_DTD
static int PTRCALL
condSect0(PROLOG_STATE *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc)
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_NAME:
if (XmlNameMatchesAscii(enc, ptr, end, KW_INCLUDE)) {
state->handler = condSect1;
return XML_ROLE_NONE;
}
if (XmlNameMatchesAscii(enc, ptr, end, KW_IGNORE)) {
state->handler = condSect2;
return XML_ROLE_NONE;
}
break;
}
return common(state, tok);
}
static int PTRCALL
condSect1(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_OPEN_BRACKET:
state->handler = externalSubset1;
state->includeLevel += 1;
return XML_ROLE_NONE;
}
return common(state, tok);
}
static int PTRCALL
condSect2(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return XML_ROLE_NONE;
case XML_TOK_OPEN_BRACKET:
state->handler = externalSubset1;
return XML_ROLE_IGNORE_SECT;
}
return common(state, tok);
}
#endif /* XML_DTD */
static int PTRCALL
declClose(PROLOG_STATE *state,
int tok,
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
switch (tok) {
case XML_TOK_PROLOG_S:
return state->role_none;
case XML_TOK_DECL_CLOSE:
setTopLevel(state);
return state->role_none;
}
return common(state, tok);
}
static int PTRCALL
error(PROLOG_STATE *UNUSED_P(state),
int UNUSED_P(tok),
const char *UNUSED_P(ptr),
const char *UNUSED_P(end),
const ENCODING *UNUSED_P(enc))
{
return XML_ROLE_NONE;
}
static int FASTCALL
common(PROLOG_STATE *state, int tok)
{
#ifdef XML_DTD
if (!state->documentEntity && tok == XML_TOK_PARAM_ENTITY_REF)
return XML_ROLE_INNER_PARAM_ENTITY_REF;
#endif
state->handler = error;
return XML_ROLE_ERROR;
}
void
XmlPrologStateInit(PROLOG_STATE *state)
{
state->handler = prolog0;
#ifdef XML_DTD
state->documentEntity = 1;
state->includeLevel = 0;
state->inEntityValue = 0;
#endif /* XML_DTD */
}
#ifdef XML_DTD
void
XmlPrologStateInitExternalEntity(PROLOG_STATE *state)
{
state->handler = externalSubset0;
state->documentEntity = 0;
state->includeLevel = 0;
}
#endif /* XML_DTD */
hexpat-0.20.13/cbits/xmlrole.h 0000644 0000000 0000000 00000005717 13122604047 014304 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
#ifndef XmlRole_INCLUDED
#define XmlRole_INCLUDED 1
#ifdef __VMS
/* 0        1         2         3  0        1         2         3
   1234567890123456789012345678901 1234567890123456789012345678901 */
#define XmlPrologStateInitExternalEntity XmlPrologStateInitExternalEnt
#endif
#include "xmltok.h"
#ifdef __cplusplus
extern "C" {
#endif
enum {
XML_ROLE_ERROR = -1,
XML_ROLE_NONE = 0,
XML_ROLE_XML_DECL,
XML_ROLE_INSTANCE_START,
XML_ROLE_DOCTYPE_NONE,
XML_ROLE_DOCTYPE_NAME,
XML_ROLE_DOCTYPE_SYSTEM_ID,
XML_ROLE_DOCTYPE_PUBLIC_ID,
XML_ROLE_DOCTYPE_INTERNAL_SUBSET,
XML_ROLE_DOCTYPE_CLOSE,
XML_ROLE_GENERAL_ENTITY_NAME,
XML_ROLE_PARAM_ENTITY_NAME,
XML_ROLE_ENTITY_NONE,
XML_ROLE_ENTITY_VALUE,
XML_ROLE_ENTITY_SYSTEM_ID,
XML_ROLE_ENTITY_PUBLIC_ID,
XML_ROLE_ENTITY_COMPLETE,
XML_ROLE_ENTITY_NOTATION_NAME,
XML_ROLE_NOTATION_NONE,
XML_ROLE_NOTATION_NAME,
XML_ROLE_NOTATION_SYSTEM_ID,
XML_ROLE_NOTATION_NO_SYSTEM_ID,
XML_ROLE_NOTATION_PUBLIC_ID,
XML_ROLE_ATTRIBUTE_NAME,
XML_ROLE_ATTRIBUTE_TYPE_CDATA,
XML_ROLE_ATTRIBUTE_TYPE_ID,
XML_ROLE_ATTRIBUTE_TYPE_IDREF,
XML_ROLE_ATTRIBUTE_TYPE_IDREFS,
XML_ROLE_ATTRIBUTE_TYPE_ENTITY,
XML_ROLE_ATTRIBUTE_TYPE_ENTITIES,
XML_ROLE_ATTRIBUTE_TYPE_NMTOKEN,
XML_ROLE_ATTRIBUTE_TYPE_NMTOKENS,
XML_ROLE_ATTRIBUTE_ENUM_VALUE,
XML_ROLE_ATTRIBUTE_NOTATION_VALUE,
XML_ROLE_ATTLIST_NONE,
XML_ROLE_ATTLIST_ELEMENT_NAME,
XML_ROLE_IMPLIED_ATTRIBUTE_VALUE,
XML_ROLE_REQUIRED_ATTRIBUTE_VALUE,
XML_ROLE_DEFAULT_ATTRIBUTE_VALUE,
XML_ROLE_FIXED_ATTRIBUTE_VALUE,
XML_ROLE_ELEMENT_NONE,
XML_ROLE_ELEMENT_NAME,
XML_ROLE_CONTENT_ANY,
XML_ROLE_CONTENT_EMPTY,
XML_ROLE_CONTENT_PCDATA,
XML_ROLE_GROUP_OPEN,
XML_ROLE_GROUP_CLOSE,
XML_ROLE_GROUP_CLOSE_REP,
XML_ROLE_GROUP_CLOSE_OPT,
XML_ROLE_GROUP_CLOSE_PLUS,
XML_ROLE_GROUP_CHOICE,
XML_ROLE_GROUP_SEQUENCE,
XML_ROLE_CONTENT_ELEMENT,
XML_ROLE_CONTENT_ELEMENT_REP,
XML_ROLE_CONTENT_ELEMENT_OPT,
XML_ROLE_CONTENT_ELEMENT_PLUS,
XML_ROLE_PI,
XML_ROLE_COMMENT,
#ifdef XML_DTD
XML_ROLE_TEXT_DECL,
XML_ROLE_IGNORE_SECT,
XML_ROLE_INNER_PARAM_ENTITY_REF,
#endif /* XML_DTD */
XML_ROLE_PARAM_ENTITY_REF
};
typedef struct prolog_state {
int (PTRCALL *handler) (struct prolog_state *state,
int tok,
const char *ptr,
const char *end,
const ENCODING *enc);
unsigned level;
int role_none;
#ifdef XML_DTD
unsigned includeLevel;
int documentEntity;
int inEntityValue;
#endif /* XML_DTD */
} PROLOG_STATE;
void XmlPrologStateInit(PROLOG_STATE *);
#ifdef XML_DTD
void XmlPrologStateInitExternalEntity(PROLOG_STATE *);
#endif /* XML_DTD */
#define XmlTokenRole(state, tok, ptr, end, enc) \
(((state)->handler)(state, tok, ptr, end, enc))
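/* Illustrative sketch, not part of expat. XmlTokenRole dispatches each token
to whatever handler the state currently points at; every handler returns a
role and may install the next handler, and unexpected tokens fall through
to a shared routine that parks the machine in an error sink. The block
below reproduces that pattern for a toy declaration of the form
NAME LITERAL '>'. All TOY_, TOK_, ROLE_ and toy_ names are hypothetical,
and tokens are single characters purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

enum { TOK_SPACE = ' ', TOK_NAME = 'n', TOK_LITERAL = 'l', TOK_CLOSE = '>' };
enum { ROLE_ERROR = -1, ROLE_NONE = 0, ROLE_NAME, ROLE_VALUE, ROLE_DONE };

typedef struct toy_state {
  int (*handler)(struct toy_state *state, int tok);
} TOY_STATE;

// Same shape as XmlTokenRole: call whatever handler is current.
#define ToyTokenRole(state, tok) (((state)->handler)(state, tok))

static int toy_name(TOY_STATE *state, int tok);

// Sink state: once an error was reported, further tokens are ignored.
static int toy_error(TOY_STATE *state, int tok)
{
  (void)state;
  (void)tok;
  return ROLE_NONE;
}

// Shared fallback for any token the current state does not expect.
static int toy_common(TOY_STATE *state)
{
  state->handler = toy_error;
  return ROLE_ERROR;
}

static int toy_close(TOY_STATE *state, int tok)
{
  switch (tok) {
  case TOK_SPACE:
    return ROLE_NONE;
  case TOK_CLOSE:
    state->handler = toy_name;  // back to the top level
    return ROLE_DONE;
  }
  return toy_common(state);
}

static int toy_value(TOY_STATE *state, int tok)
{
  switch (tok) {
  case TOK_SPACE:
    return ROLE_NONE;
  case TOK_LITERAL:
    state->handler = toy_close;
    return ROLE_VALUE;
  }
  return toy_common(state);
}

static int toy_name(TOY_STATE *state, int tok)
{
  switch (tok) {
  case TOK_SPACE:
    return ROLE_NONE;
  case TOK_NAME:
    state->handler = toy_value;
    return ROLE_NAME;
  }
  return toy_common(state);
}

// Feed a string of token characters and return the role of the last one.
static int toy_last_role(const char *toks)
{
  TOY_STATE state;
  int role = ROLE_NONE;
  size_t i;
  assert(toks != NULL);
  state.handler = toy_name;
  for (i = 0; toks[i] != '\0'; i++)
    role = ToyTokenRole(&state, toks[i]);
  return role;
}
```

Feeding "n l >" ends in ROLE_DONE, while a literal in first position yields
ROLE_ERROR once and ROLE_NONE for every token after it.
*/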
#ifdef __cplusplus
}
#endif
#endif /* not XmlRole_INCLUDED */
hexpat-0.20.13/cbits/expat_external.h 0000644 0000000 0000000 00000007651 13122604047 015644 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
#ifndef Expat_External_INCLUDED
#define Expat_External_INCLUDED 1
/* External API definitions */
#if defined(_MSC_EXTENSIONS) && !defined(__BEOS__) && !defined(__CYGWIN__)
#define XML_USE_MSC_EXTENSIONS 1
#endif
/* Expat tries very hard to make the API boundary very specifically
defined. There are two macros defined to control this boundary;
each of these can be defined before including this header to
achieve some different behavior, but doing so is not recommended or
tested frequently.
XMLCALL - The calling convention to use for all calls across the
"library boundary." This will default to cdecl, and
try really hard to tell the compiler that's what we
want.
XMLIMPORT - Whatever magic is needed to note that a function is
to be imported from a dynamically loaded library
(.dll, .so, or .sl, depending on your platform).
The XMLCALL macro was added in Expat 1.95.7. The only one which is
expected to be directly useful in client code is XMLCALL.
Note that on at least some Unix versions, the Expat library must be
compiled with the cdecl calling convention as the default since
system headers may assume the cdecl convention.
*/
#ifndef XMLCALL
#if defined(_MSC_VER)
#define XMLCALL __cdecl
#elif defined(__GNUC__) && defined(__i386) && !defined(__INTEL_COMPILER)
#define XMLCALL __attribute__((cdecl))
#else
/* For any platform which uses this definition and supports more than
one calling convention, we need to extend this definition to
declare the convention used on that platform, if it's possible to
do so.
If this is the case for your platform, please file a bug report
with information on how to identify your platform via the C
pre-processor and how to specify the same calling convention as the
platform's malloc() implementation.
*/
#define XMLCALL
#endif
#endif /* not defined XMLCALL */
#if !defined(XML_STATIC) && !defined(XMLIMPORT)
#ifndef XML_BUILDING_EXPAT
/* using Expat from an application */
#ifdef XML_USE_MSC_EXTENSIONS
#define XMLIMPORT __declspec(dllimport)
#endif
#endif
#endif /* not defined XML_STATIC */
#if !defined(XMLIMPORT) && defined(__GNUC__) && (__GNUC__ >= 4)
#define XMLIMPORT __attribute__ ((visibility ("default")))
#endif
/* If we didn't define it above, define it away: */
#ifndef XMLIMPORT
#define XMLIMPORT
#endif
#if defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 96))
#define XML_ATTR_MALLOC __attribute__((__malloc__))
#else
#define XML_ATTR_MALLOC
#endif
#if defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#define XML_ATTR_ALLOC_SIZE(x) __attribute__((__alloc_size__(x)))
#else
#define XML_ATTR_ALLOC_SIZE(x)
#endif
#define XMLPARSEAPI(type) XMLIMPORT type XMLCALL
#ifdef __cplusplus
extern "C" {
#endif
#ifdef XML_UNICODE_WCHAR_T
# define XML_UNICODE
# if defined(__SIZEOF_WCHAR_T__) && (__SIZEOF_WCHAR_T__ != 2)
# error "sizeof(wchar_t) != 2; Need -fshort-wchar for both Expat and libc"
# endif
#endif
#ifdef XML_UNICODE /* Information is UTF-16 encoded. */
#ifdef XML_UNICODE_WCHAR_T
typedef wchar_t XML_Char;
typedef wchar_t XML_LChar;
#else
typedef unsigned short XML_Char;
typedef char XML_LChar;
#endif /* XML_UNICODE_WCHAR_T */
#else /* Information is UTF-8 encoded. */
typedef char XML_Char;
typedef char XML_LChar;
#endif /* XML_UNICODE */
#ifdef XML_LARGE_SIZE /* Use large integers for file/stream positions. */
#if defined(XML_USE_MSC_EXTENSIONS) && _MSC_VER < 1400
typedef __int64 XML_Index;
typedef unsigned __int64 XML_Size;
#else
typedef long long XML_Index;
typedef unsigned long long XML_Size;
#endif
#else
typedef long XML_Index;
typedef unsigned long XML_Size;
#endif /* XML_LARGE_SIZE */
#ifdef __cplusplus
}
#endif
#endif /* not Expat_External_INCLUDED */
hexpat-0.20.13/cbits/ascii.h 0000644 0000000 0000000 00000003701 13122604047 013701 0 ustar 00 0000000 0000000 /* Copyright (c) 1998, 1999 Thai Open Source Software Center Ltd
See the file COPYING for copying permission.
*/
#define ASCII_A 0x41
#define ASCII_B 0x42
#define ASCII_C 0x43
#define ASCII_D 0x44
#define ASCII_E 0x45
#define ASCII_F 0x46
#define ASCII_G 0x47
#define ASCII_H 0x48
#define ASCII_I 0x49
#define ASCII_J 0x4A
#define ASCII_K 0x4B
#define ASCII_L 0x4C
#define ASCII_M 0x4D
#define ASCII_N 0x4E
#define ASCII_O 0x4F
#define ASCII_P 0x50
#define ASCII_Q 0x51
#define ASCII_R 0x52
#define ASCII_S 0x53
#define ASCII_T 0x54
#define ASCII_U 0x55
#define ASCII_V 0x56
#define ASCII_W 0x57
#define ASCII_X 0x58
#define ASCII_Y 0x59
#define ASCII_Z 0x5A
#define ASCII_a 0x61
#define ASCII_b 0x62
#define ASCII_c 0x63
#define ASCII_d 0x64
#define ASCII_e 0x65
#define ASCII_f 0x66
#define ASCII_g 0x67
#define ASCII_h 0x68
#define ASCII_i 0x69
#define ASCII_j 0x6A
#define ASCII_k 0x6B
#define ASCII_l 0x6C
#define ASCII_m 0x6D
#define ASCII_n 0x6E
#define ASCII_o 0x6F
#define ASCII_p 0x70
#define ASCII_q 0x71
#define ASCII_r 0x72
#define ASCII_s 0x73
#define ASCII_t 0x74
#define ASCII_u 0x75
#define ASCII_v 0x76
#define ASCII_w 0x77
#define ASCII_x 0x78
#define ASCII_y 0x79
#define ASCII_z 0x7A
#define ASCII_0 0x30
#define ASCII_1 0x31
#define ASCII_2 0x32
#define ASCII_3 0x33
#define ASCII_4 0x34
#define ASCII_5 0x35
#define ASCII_6 0x36
#define ASCII_7 0x37
#define ASCII_8 0x38
#define ASCII_9 0x39
#define ASCII_TAB 0x09
#define ASCII_SPACE 0x20
#define ASCII_EXCL 0x21
#define ASCII_QUOT 0x22
#define ASCII_AMP 0x26
#define ASCII_APOS 0x27
#define ASCII_MINUS 0x2D
#define ASCII_PERIOD 0x2E
#define ASCII_COLON 0x3A
#define ASCII_SEMI 0x3B
#define ASCII_LT 0x3C
#define ASCII_EQUALS 0x3D
#define ASCII_GT 0x3E
#define ASCII_LSQB 0x5B
#define ASCII_RSQB 0x5D
#define ASCII_UNDERSCORE 0x5F
#define ASCII_LPAREN 0x28
#define ASCII_RPAREN 0x29
#define ASCII_FF 0x0C
#define ASCII_SLASH 0x2F
#define ASCII_HASH 0x23
#define ASCII_PIPE 0x7C
#define ASCII_COMMA 0x2C
hexpat-0.20.13/cbits/nametab.h 0000644 0000000 0000000 00000015612 13122604047 014224 0 ustar 00 0000000 0000000 static const unsigned namingBitmap[] = {
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0x00000000, 0x04000000, 0x87FFFFFE, 0x07FFFFFE,
0x00000000, 0x00000000, 0xFF7FFFFF, 0xFF7FFFFF,
0xFFFFFFFF, 0x7FF3FFFF, 0xFFFFFDFE, 0x7FFFFFFF,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFE00F, 0xFC31FFFF,
0x00FFFFFF, 0x00000000, 0xFFFF0000, 0xFFFFFFFF,
0xFFFFFFFF, 0xF80001FF, 0x00000003, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFD740, 0xFFFFFFFB, 0x547F7FFF, 0x000FFFFD,
0xFFFFDFFE, 0xFFFFFFFF, 0xDFFEFFFF, 0xFFFFFFFF,
0xFFFF0003, 0xFFFFFFFF, 0xFFFF199F, 0x033FCFFF,
0x00000000, 0xFFFE0000, 0x027FFFFF, 0xFFFFFFFE,
0x0000007F, 0x00000000, 0xFFFF0000, 0x000707FF,
0x00000000, 0x07FFFFFE, 0x000007FE, 0xFFFE0000,
0xFFFFFFFF, 0x7CFFFFFF, 0x002F7FFF, 0x00000060,
0xFFFFFFE0, 0x23FFFFFF, 0xFF000000, 0x00000003,
0xFFF99FE0, 0x03C5FDFF, 0xB0000000, 0x00030003,
0xFFF987E0, 0x036DFDFF, 0x5E000000, 0x001C0000,
0xFFFBAFE0, 0x23EDFDFF, 0x00000000, 0x00000001,
0xFFF99FE0, 0x23CDFDFF, 0xB0000000, 0x00000003,
0xD63DC7E0, 0x03BFC718, 0x00000000, 0x00000000,
0xFFFDDFE0, 0x03EFFDFF, 0x00000000, 0x00000003,
0xFFFDDFE0, 0x03EFFDFF, 0x40000000, 0x00000003,
0xFFFDDFE0, 0x03FFFDFF, 0x00000000, 0x00000003,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFFFFE, 0x000D7FFF, 0x0000003F, 0x00000000,
0xFEF02596, 0x200D6CAE, 0x0000001F, 0x00000000,
0x00000000, 0x00000000, 0xFFFFFEFF, 0x000003FF,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0xFFFFFFFF, 0xFFFF003F, 0x007FFFFF,
0x0007DAED, 0x50000000, 0x82315001, 0x002C62AB,
0x40000000, 0xF580C900, 0x00000007, 0x02010800,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0x0FFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x03FFFFFF,
0x3F3FFFFF, 0xFFFFFFFF, 0xAAFF3F3F, 0x3FFFFFFF,
0xFFFFFFFF, 0x5FDFFFFF, 0x0FCF1FDC, 0x1FDC1FFF,
0x00000000, 0x00004C40, 0x00000000, 0x00000000,
0x00000007, 0x00000000, 0x00000000, 0x00000000,
0x00000080, 0x000003FE, 0xFFFFFFFE, 0xFFFFFFFF,
0x001FFFFF, 0xFFFFFFFE, 0xFFFFFFFF, 0x07FFFFFF,
0xFFFFFFE0, 0x00001FFF, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0x0000003F, 0x00000000, 0x00000000,
0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF,
0xFFFFFFFF, 0x0000000F, 0x00000000, 0x00000000,
0x00000000, 0x07FF6000, 0x87FFFFFE, 0x07FFFFFE,
0x00000000, 0x00800000, 0xFF7FFFFF, 0xFF7FFFFF,
0x00FFFFFF, 0x00000000, 0xFFFF0000, 0xFFFFFFFF,
0xFFFFFFFF, 0xF80001FF, 0x00030003, 0x00000000,
0xFFFFFFFF, 0xFFFFFFFF, 0x0000003F, 0x00000003,
0xFFFFD7C0, 0xFFFFFFFB, 0x547F7FFF, 0x000FFFFD,
0xFFFFDFFE, 0xFFFFFFFF, 0xDFFEFFFF, 0xFFFFFFFF,
0xFFFF007B, 0xFFFFFFFF, 0xFFFF199F, 0x033FCFFF,
0x00000000, 0xFFFE0000, 0x027FFFFF, 0xFFFFFFFE,
0xFFFE007F, 0xBBFFFFFB, 0xFFFF0016, 0x000707FF,
0x00000000, 0x07FFFFFE, 0x0007FFFF, 0xFFFF03FF,
0xFFFFFFFF, 0x7CFFFFFF, 0xFFEF7FFF, 0x03FF3DFF,
0xFFFFFFEE, 0xF3FFFFFF, 0xFF1E3FFF, 0x0000FFCF,
0xFFF99FEE, 0xD3C5FDFF, 0xB080399F, 0x0003FFCF,
0xFFF987E4, 0xD36DFDFF, 0x5E003987, 0x001FFFC0,
0xFFFBAFEE, 0xF3EDFDFF, 0x00003BBF, 0x0000FFC1,
0xFFF99FEE, 0xF3CDFDFF, 0xB0C0398F, 0x0000FFC3,
0xD63DC7EC, 0xC3BFC718, 0x00803DC7, 0x0000FF80,
0xFFFDDFEE, 0xC3EFFDFF, 0x00603DDF, 0x0000FFC3,
0xFFFDDFEC, 0xC3EFFDFF, 0x40603DDF, 0x0000FFC3,
0xFFFDDFEC, 0xC3FFFDFF, 0x00803DCF, 0x0000FFC3,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0xFFFFFFFE, 0x07FF7FFF, 0x03FF7FFF, 0x00000000,
0xFEF02596, 0x3BFF6CAE, 0x03FF3F5F, 0x00000000,
0x03000000, 0xC2A003FF, 0xFFFFFEFF, 0xFFFE03FF,
0xFEBF0FDF, 0x02FE3FFF, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x1FFF0000, 0x00000002,
0x000000A0, 0x003EFFFE, 0xFFFFFFFE, 0xFFFFFFFF,
0x661FFFFF, 0xFFFFFFFE, 0xFFFFFFFF, 0x77FFFFFF,
};
static const unsigned char nmstrtPages[] = {
0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x00,
0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
0x10, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x13,
0x00, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x15, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x17,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x18,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
static const unsigned char namePages[] = {
0x19, 0x03, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x00,
0x00, 0x1F, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25,
0x10, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x13,
0x26, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x27, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x17,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x18,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
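/* Illustrative sketch, not part of this file. The tables above form a
two-level lookup: nmstrtPages and namePages map the high byte of a 16-bit
character to a page number, and each page is a run of 8 words (256 bits) in
namingBitmap indexed by the low byte. The code that consumes the tables
lives elsewhere, so the lookup expression below is an assumption about how
they are meant to be read. The names toyBitmap, toy_lookup,
is_latin1_name_start and is_latin1_name_char are hypothetical. Only the
pages selected by nmstrtPages[0] (0x02) and namePages[0] (0x19) are copied,
renumbered 1 and 2 after an all-zero page 0.

```c
#include <assert.h>

static const unsigned toyBitmap[] = {
  // toy page 0: no bits set
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  // toy page 1: copy of namingBitmap page 0x02 (name-start, high byte 0)
  0x00000000, 0x04000000, 0x87FFFFFE, 0x07FFFFFE,
  0x00000000, 0x00000000, 0xFF7FFFFF, 0xFF7FFFFF,
  // toy page 2: copy of namingBitmap page 0x19 (name chars, high byte 0)
  0x00000000, 0x07FF6000, 0x87FFFFFE, 0x07FFFFFE,
  0x00000000, 0x00800000, 0xFF7FFFFF, 0xFF7FFFFF,
};

// Assumed lookup: the page selects a block of 8 words, the upper 3 bits of
// the low byte select a word, and the lower 5 bits select a bit in it.
static int toy_lookup(unsigned page, unsigned lo)
{
  assert(page < 3 && lo < 256);
  return (toyBitmap[(page << 3) + (lo >> 5)] & (1u << (lo & 0x1F))) != 0;
}

static int is_latin1_name_start(unsigned c) { return toy_lookup(1, c); }
static int is_latin1_name_char(unsigned c)  { return toy_lookup(2, c); }
```

Sharing pages between the two index tables keeps the data small: a page that
is all zeros or all ones is stored once and referenced many times.
*/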