sissaschool-elementpath-d3688c7/.coveragerc

[run]
branch = True
source = elementpath/
omit =
    elementpath/protocols.py
    elementpath/regex/generate_categories.py

[report]
exclude_lines =
    pragma: no cover
    raise NotImplementedError()
    if TYPE_CHECKING:

sissaschool-elementpath-d3688c7/.github/workflows/test-elementpath.yml

name: elementpath

on:
  push:
    branches: [master, develop]
  pull_request:
    branches: [master, develop]

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: [3.8, 3.9, "3.10", 3.11, 3.12, "3.13.0", "3.14.0-alpha.5", "pypy-3.10"]
        exclude:
          - os: macos-latest
            python-version: 3.8
          - os: windows-latest
            python-version: 3.8
          - os: macos-latest
            python-version: 3.9
          - os: windows-latest
            python-version: 3.9

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install pip and setuptools
        run: |
          python -m pip install --upgrade pip
          pip install setuptools
      - name: Lint with flake8
        run: |
          pip install flake8
          flake8 elementpath --max-line-length=100 --statistics
      - name: Lint with mypy
        if: ${{ matrix.python-version != '3.8' }}
        run: |
          pip install mypy==1.15.0 xmlschema lxml-stubs
          mypy --show-error-codes --strict elementpath
      - name: Install optional dependencies
        if: ${{ matrix.python-version != '3.14.0-alpha.5' }}
        run: pip install lxml
      - name: Test with unittest
        run: |
          python -m unittest

sissaschool-elementpath-d3688c7/.gitignore

*.pyc
*.pyo
*~
*.so
*.swp
*.egg-info
.idea/
.project
.ipynb_checkpoints/
.tox/
.mypy_cache/
.coverage*
!.coveragerc
doc/_*
__pycache__/
dist/
build/
development/
out/
profiling/out/

sissaschool-elementpath-d3688c7/.readthedocs.yml

version: 2

build:
  os: "ubuntu-22.04"
  tools:
    python: "3.12"

formats:
  - pdf

sphinx:
  configuration: doc/conf.py

python:
  install:
    - requirements: doc/requirements.txt

sissaschool-elementpath-d3688c7/CHANGELOG.rst

*********
CHANGELOG
*********

`v4.8.0`_ (2025-03-03)
======================
* Add full PSVI type labeling in XDM to solve type errors with XSD 1.1 assertions
* Add *schema* optional argument to dynamic context
* Add a RootToken as a proxy of the parsed token tree for compatibility with xmlschema<=3.4.3
* Extend XDM to split ElementTree/lxml processing from schema nodes and to allow future extensions

`v4.7.0`_ (2024-12-20)
======================
* Fix *fragment* argument usage (issue #81)
* Fix constructors nud() to skip argument check with XP31+ arrow operator (issue #83)

`v4.6.0`_ (2024-10-27)
======================
* Fix XsdAttributeGroupProtocol
* Improve Unicode support with installable UnicodeData.txt versions
* Extend names disambiguation with a fix for issue #78
* Refactor tree builders to fix document position of tails (issue #79)

`v4.5.0`_ (2024-09-09)
======================
* Fix and clean node trees iteration methods (issue #72)
* Fix missing raw string for '[^\r\n]' (pull request #76)
* Full and more specific type annotations

`v4.4.0`_ (2024-03-11)
======================
*
  Improve stand-alone XPath functions builder (issue #70)
* Update tokens and parsers __repr__
* Fix static typing protocols to work with etree and XSD elements

`v4.3.0`_ (2024-02-17)
======================
* Change the purpose of the evaluation with a dynamic schema context
* Add a tox.ini testenv with Python 3.13 pre-releases

`v4.2.1`_ (2024-02-10)
======================
* Fix dynamic context initialization with lxml a non-root element (issue #71)
* Fix XP30+ function fn:function-lookup
* Fix XP30+ fn:unparsed-text, fn:unparsed-text-lines and fn:unparsed-text-available

`v4.2.0`_ (2024-02-03)
======================
* Drop support for Python 3.7
* Add *uri* and *fragment* options to dynamic context
* Make context root node not mandatory (issue #63)
* Add function objects constructor (issue #70)

`v4.1.5`_ (2023-07-25)
======================
* Fix typed value of ElementNode() if self.elem.text is None

`v4.1.4`_ (2023-06-26)
======================
* Fix select of prefixed names (issue #68)
* Fix zero length *xs:base64Binary* (pull request #69)

`v4.1.3`_ (2023-06-17)
======================
* Fix XP30+ fn:path (issue #67)
* Fix weak tests (issues #64 and #66)

`v4.1.2`_ (2023-04-28)
======================
* Add support for Python 3.12
* Fix self shortcut operator (adding is_schema_node() to node classes)

`v4.1.1`_ (2023-04-11)
======================
* Simplify type annotations for XSD datatypes
* Full test coverage of sequence type functions with bugfixes

`v4.1.0`_ (2023-03-21)
======================
* Refactor XPath function call (context=None only as keyword argument)
* Add external function support (issue #60)
* Some fixes to string representation and source property of tokens
* Extend documentation and tests
* Clean XSD datatypes hierarchy

`v4.0.1`_ (2023-02-02)
======================
* Fix packaging: include py.typed in package data
* Revert to comparison between xs:QName instances and strings

`v4.0.0`_ (2023-02-01)
======================
* First XPath 3.1
  implementation (without UCA collation support)

`v3.0.2`_ (2022-08-12)
======================
* Extend root concept to subtrees used as root (e.g. XSD 1.1 assertions)
* Begin XPath 3.1 implementation adding XPathMap and XPathArray

`v3.0.1`_ (2022-07-23)
======================
* Fix of descendant path operator (issue #51)
* Add support for Python 3.11

`v3.0.0`_ (2022-07-16)
======================
* Transition to full XPath node implementation (more memory usage but better control and overall faster)
* Add etree.py module with a safe XML parser (ported from xmlschema)

`v2.5.3`_ (2022-05-30)
======================
* Fix unary path step operator (issue #46)
* Fix sphinx warnings *'reference target not found'* (issue #45)

`v2.5.2`_ (2022-05-17)
======================
* Include PR #43 with fixes for `XPathContext.iter_siblings()` (issues #42 and #44)

`v2.5.1`_ (2022-04-28)
======================
* Fix for failed floats equality tests (issue #41)
* Static typing tested with mypy==0.950

`v2.5.0`_ (2022-03-04)
======================
* Add XPath 3.0 support
* Better use of lxml.etree features
* Full coverage of W3C tests
* Drop support for Python 3.6

`v2.4.0`_ (2021-11-09)
======================
* Fix type annotations and going strict on parsers and other public classes
* Add XPathConstructor token class (subclass of XPathFunction)
* Last release for Python 3.6

`v2.3.2`_ (2021-09-16)
======================
* Make ElementProtocol and LxmlElementProtocol runtime checkable (only for Python 3.8+)
* Type annotations for all package public APIs

`v2.3.1`_ (2021-09-07)
======================
* Add LxmlElementProtocol
* Add pytest env to tox.ini (test issue #39)

`v2.3.0`_ (2021-09-01)
======================
* Add inline type annotations check support
* Add structural Protocol based type checks (effective for Python 3.8+)

`v2.2.3`_ (2021-06-16)
======================
* Add Python 3.10 in Tox and CI tests
* Apply __slots__ to TDOP and regex classes

`v2.2.2`_ (2021-05-03)
======================
* Fix issue sissaschool/xmlschema#243 (assert with xsi:nil usage)
* First implementation of XPath 3.0 fn:format-integer

`v2.2.1`_ (2021-03-24)
======================
* Add function signatures at token registration
* Some fixes to XPath tokens and more XPath 3.0 implementations

`v2.2.0`_ (2021-03-01)
======================
* Optimize TDOP parser's tokenizer
* Resolve ambiguities with operators and statements that are also names
* Merge with XPath 3.0/3.1 develop (to be completed)

`v2.1.4`_ (2021-02-09)
======================
* Add tests and apply small fixes to TDOP parser
* Fix wildcard selection of attributes (issue #35)

`v2.1.3`_ (2021-01-30)
======================
* Extend tests for XPath 2.0 with minor fixes
* Fix fn:round-half-to-even (issue #33)

`v2.1.2`_ (2021-01-22)
======================
* Extend tests for XPath 1.0/2.0 with minor fixes
* Fix for +/- prefix operators
* Fix for regex patterns anchors and binary datatypes

`v2.1.1`_ (2021-01-06)
======================
* Fix for issue #32 (test failure on missing locale setting)
* Extend tests for XPath 1.0 with minor fixes

`v2.1.0`_ (2021-01-05)
======================
* Create custom class hierarchy for XPath nodes that replaces named-tuples
* Bind attribute nodes, text nodes and namespace nodes to parent element (issue #31)

`v2.0.5`_ (2020-12-02)
======================
* Increase the speed of path step selection on large trees
* More tests and small fixes to XSD builtin datatypes

`v2.0.4`_ (2020-10-30)
======================
* Lazy tokenizer for parser classes in order to minimize import time

`v2.0.3`_ (2020-09-13)
======================
* Fix context handling in cycle statements
* Change constructor's label to 'constructor function'

`v2.0.2`_ (2020-09-03)
======================
* Add regex translator to package API
* More than 99% of W3C XPath 2.0 tests pass

`v2.0.1`_ (2020-08-24)
======================
* Add regex transpiler (for XPath/XQuery and XML Schema regular expressions)
*
  Hotfix for issue #30

`v2.0.0`_ (2020-08-13)
======================
* Extensive testing with W3C XPath 2.0 tests (~98% passed)
* Split context variables from in-scope variables (types)
* Add other XSD builtin atomic types

`v1.4.6`_ (2020-06-15)
======================
* Fix XPathContext to let the subclasses replace the XPath nodes iterator function

`v1.4.5`_ (2020-05-22)
======================
* Fix tokenizer and parsers for ambiguities between symbols and names

`v1.4.4`_ (2020-04-23)
======================
* Improve XPath context and axes processing
* Integrate pull requests and fix bug on predicate selector

`v1.4.3`_ (2020-03-18)
======================
* Fix PyPy 3 tests on xs:base64Binary and xs:hexBinary
* Separated the tests of schema proxy API and other schemas based tests

`v1.4.2`_ (2020-03-13)
======================
* Multiple XSD type associations on a token
* Extend xs:untypedAtomic type usage
* Increase the tests coverage to 95%

`v1.4.1`_ (2020-01-28)
======================
* Fix for node kind tests
* Fix for issue #17
* Update test dependencies
* Add PyPy3 to tests

`v1.4.0`_ (2019-12-31)
======================
* Remove Python 2 support
* Add TextNode node type
* Fix for issue #15 and for errors related to PR #16

`v1.3.3`_ (2019-12-17)
======================
* Fix 'attribute' multi-role token (axis and kind test)
* Fixes for issues #13 and #14

`v1.3.2`_ (2019-12-10)
======================
* Add token labels 'sequence types' and 'kind test' for callables that are not XPath functions
* Add missing XPath 2.0 functions
* Fix for issue #12

`v1.3.1`_ (2019-10-21)
======================
* Add test module for TDOP parser
* Fix for issue #10

`v1.3.0`_ (2019-10-11)
======================
* Improved schema proxy
* Improved XSD type matching using paths
* Cached parent path for XPathContext (only Python 3)
* Improve typed selection with TypedAttribute and TypedElement named-tuples
* Add iter_results to XPathContext
* Remove XMLSchemaProxy from package
* Fix
  descendant shortcut operator '//'
* Fix text() function
* Fix typed select of '(name)' token
* Fix 24-hour time for DateTime

`v1.2.1`_ (2019-08-30)
======================
* Hashable XSD datatypes classes
* Fix Duration types comparison

`v1.2.0`_ (2019-08-14)
======================
* Added special XSD datatypes
* Better handling of schema contexts
* Added validators for numeric types
* Fixed function conversion rules
* Fixed tests with lxml and XPath 1.0
* Added tests for uncovered code

`v1.1.8`_ (2019-05-20)
======================
* Added code coverage and flake8 checks
* Drop Python 3.4 support
* Use more specific XPath errors for functions and namespace resolving
* Fix for issue #4

`v1.1.7`_ (2019-04-25)
======================
* Added Parser.is_spaced() method for checking if the current token has extra spaces before or after
* Fixes for '/' and ':' tokens
* Fixes for fn:max() and fn:min() functions

`v1.1.6`_ (2019-03-28)
======================
* Fixes for XSD datatypes
* Minor fixes after a first test run with Python v3.8a3

`v1.1.5`_ (2019-02-23)
======================
* Differentiated unordered XPath gregorian types from ordered types for XSD
* Fix issue #2

`v1.1.4`_ (2019-02-21)
======================
* Implementation of a full Static Analysis Phase at parse() level
* Schema-based static analysis for XPath 2.0 parsers using schema contexts
* Added ``XPathSchemaContext`` class for processing schema contexts
* Added atomization() and get_atomized_operand() helpers to XPathToken
* Fix value comparison operators

`v1.1.3`_ (2019-02-06)
======================
* Fix for issue #1
* Added fn:static-base-uri() and fn:resolve-uri()
* Fixes to XPath 1.0 functions for compatibility mode

`v1.1.2`_ (2019-01-30)
======================
* Fixes for XSD datatypes
* Change the default value of *default_namespace* argument of XPath2Parser to ``None``

`v1.1.1`_ (2019-01-19)
======================
* Improvements and fixes for XSD datatypes
* Rewritten AbstractDateTime for supporting
  years with value > 9999
* Added fn:dateTime()

`v1.1.0`_ (2018-12-23)
======================
* Almost full implementation of XPath 2.0
* Extended XPath errors management
* Add XSD datatypes for data/time builtins
* Add constructors for XSD builtins

`v1.0.12`_ (2018-09-01)
=======================
* Fixed the default namespace use for names without prefix.

`v1.0.11`_ (2018-07-25)
=======================
* Added two recursive protected methods to context class
* Minor fixes for context and helpers

`v1.0.10`_ (2018-06-15)
=======================
* Updated TDOP parser and implemented token classes serialization

`v1.0.8`_ (2018-06-13)
======================
* Fixed token classes creation for parsers serialization

`v1.0.7`_ (2018-05-07)
======================
* Added autodoc based manual with Sphinx

`v1.0.6`_ (2018-05-02)
======================
* Added tox testing
* Improved the parser class with raw_advance method

`v1.0.5`_ (2018-03-31)
======================
* Added n.10 XPath 2.0 functions for strings
* Fix README.rst for right rendering in PyPI
* Added ElementPathMissingContextError exception for a correct handling of static context evaluation

`v1.0.4`_ (2018-03-27)
======================
* Fixed packaging ('packages' argument in setup.py).

`v1.0.3`_ (2018-03-27)
======================
* Fixed the effective boolean value for a list containing an empty string.

`v1.0.2`_ (2018-03-27)
======================
* Add QName parsing like in the ElementPath library (usage regulated by a *strict* flag).

`v1.0.1`_ (2018-03-27)
======================
* Some bug fixes for attributes selection.

`v1.0.0`_ (2018-03-26)
======================
* First stable version.


.. _v1.0.0: https://github.com/sissaschool/elementpath/commit/b28da83
.. _v1.0.1: https://github.com/sissaschool/elementpath/compare/v1.0.0...v1.0.1
.. _v1.0.2: https://github.com/sissaschool/elementpath/compare/v1.0.1...v1.0.2
.. _v1.0.3: https://github.com/sissaschool/elementpath/compare/v1.0.2...v1.0.3
.. _v1.0.4: https://github.com/sissaschool/elementpath/compare/v1.0.3...v1.0.4
.. _v1.0.5: https://github.com/sissaschool/elementpath/compare/v1.0.4...v1.0.5
.. _v1.0.6: https://github.com/sissaschool/elementpath/compare/v1.0.5...v1.0.6
.. _v1.0.7: https://github.com/sissaschool/elementpath/compare/v1.0.6...v1.0.7
.. _v1.0.8: https://github.com/sissaschool/elementpath/compare/v1.0.7...v1.0.8
.. _v1.0.10: https://github.com/sissaschool/elementpath/compare/v1.0.8...v1.0.10
.. _v1.0.11: https://github.com/sissaschool/elementpath/compare/v1.0.10...v1.0.11
.. _v1.0.12: https://github.com/sissaschool/elementpath/compare/v1.0.11...v1.0.12
.. _v1.1.0: https://github.com/sissaschool/elementpath/compare/v1.0.12...v1.1.0
.. _v1.1.1: https://github.com/sissaschool/elementpath/compare/v1.1.0...v1.1.1
.. _v1.1.2: https://github.com/sissaschool/elementpath/compare/v1.1.1...v1.1.2
.. _v1.1.3: https://github.com/sissaschool/elementpath/compare/v1.1.2...v1.1.3
.. _v1.1.4: https://github.com/sissaschool/elementpath/compare/v1.1.3...v1.1.4
.. _v1.1.5: https://github.com/sissaschool/elementpath/compare/v1.1.4...v1.1.5
.. _v1.1.6: https://github.com/sissaschool/elementpath/compare/v1.1.5...v1.1.6
.. _v1.1.7: https://github.com/sissaschool/elementpath/compare/v1.1.6...v1.1.7
.. _v1.1.8: https://github.com/sissaschool/elementpath/compare/v1.1.7...v1.1.8
.. _v1.1.9: https://github.com/sissaschool/elementpath/compare/v1.1.8...v1.1.9
.. _v1.2.0: https://github.com/sissaschool/elementpath/compare/v1.1.9...v1.2.0
.. _v1.2.1: https://github.com/sissaschool/elementpath/compare/v1.2.0...v1.2.1
.. _v1.3.0: https://github.com/sissaschool/elementpath/compare/v1.2.1...v1.3.0
.. _v1.3.1: https://github.com/sissaschool/elementpath/compare/v1.3.0...v1.3.1
.. _v1.3.2: https://github.com/sissaschool/elementpath/compare/v1.3.1...v1.3.2
.. _v1.3.3: https://github.com/sissaschool/elementpath/compare/v1.3.2...v1.3.3
.. _v1.4.0: https://github.com/sissaschool/elementpath/compare/v1.3.3...v1.4.0
.. _v1.4.1: https://github.com/sissaschool/elementpath/compare/v1.4.0...v1.4.1
.. _v1.4.2: https://github.com/sissaschool/elementpath/compare/v1.4.1...v1.4.2
.. _v1.4.3: https://github.com/sissaschool/elementpath/compare/v1.4.2...v1.4.3
.. _v1.4.4: https://github.com/sissaschool/elementpath/compare/v1.4.3...v1.4.4
.. _v1.4.5: https://github.com/sissaschool/elementpath/compare/v1.4.4...v1.4.5
.. _v1.4.6: https://github.com/sissaschool/elementpath/compare/v1.4.5...v1.4.6
.. _v2.0.0: https://github.com/sissaschool/elementpath/compare/v1.4.6...v2.0.0
.. _v2.0.1: https://github.com/sissaschool/elementpath/compare/v2.0.0...v2.0.1
.. _v2.0.2: https://github.com/sissaschool/elementpath/compare/v2.0.1...v2.0.2
.. _v2.0.3: https://github.com/sissaschool/elementpath/compare/v2.0.2...v2.0.3
.. _v2.0.4: https://github.com/sissaschool/elementpath/compare/v2.0.3...v2.0.4
.. _v2.0.5: https://github.com/sissaschool/elementpath/compare/v2.0.4...v2.0.5
.. _v2.1.0: https://github.com/sissaschool/elementpath/compare/v2.0.5...v2.1.0
.. _v2.1.1: https://github.com/sissaschool/elementpath/compare/v2.1.0...v2.1.1
.. _v2.1.2: https://github.com/sissaschool/elementpath/compare/v2.1.1...v2.1.2
.. _v2.1.3: https://github.com/sissaschool/elementpath/compare/v2.1.2...v2.1.3
.. _v2.1.4: https://github.com/sissaschool/elementpath/compare/v2.1.3...v2.1.4
.. _v2.2.0: https://github.com/sissaschool/elementpath/compare/v2.1.4...v2.2.0
.. _v2.2.1: https://github.com/sissaschool/elementpath/compare/v2.2.0...v2.2.1
.. _v2.2.2: https://github.com/sissaschool/elementpath/compare/v2.2.1...v2.2.2
.. _v2.2.3: https://github.com/sissaschool/elementpath/compare/v2.2.2...v2.2.3
.. _v2.3.0: https://github.com/sissaschool/elementpath/compare/v2.2.3...v2.3.0
.. _v2.3.1: https://github.com/sissaschool/elementpath/compare/v2.3.0...v2.3.1
.. _v2.3.2: https://github.com/sissaschool/elementpath/compare/v2.3.1...v2.3.2
.. _v2.4.0: https://github.com/sissaschool/elementpath/compare/v2.3.3...v2.4.0
.. _v2.5.0: https://github.com/sissaschool/elementpath/compare/v2.4.0...v2.5.0
.. _v2.5.1: https://github.com/sissaschool/elementpath/compare/v2.5.0...v2.5.1
.. _v2.5.2: https://github.com/sissaschool/elementpath/compare/v2.5.1...v2.5.2
.. _v2.5.3: https://github.com/sissaschool/elementpath/compare/v2.5.2...v2.5.3
.. _v3.0.0: https://github.com/sissaschool/elementpath/compare/v2.5.3...v3.0.0
.. _v3.0.1: https://github.com/sissaschool/elementpath/compare/v3.0.0...v3.0.1
.. _v3.0.2: https://github.com/sissaschool/elementpath/compare/v3.0.1...v3.0.2
.. _v4.0.0: https://github.com/sissaschool/elementpath/compare/v3.0.2...v4.0.0
.. _v4.0.1: https://github.com/sissaschool/elementpath/compare/v4.0.0...v4.0.1
.. _v4.1.0: https://github.com/sissaschool/elementpath/compare/v4.0.1...v4.1.0
.. _v4.1.1: https://github.com/sissaschool/elementpath/compare/v4.1.0...v4.1.1
.. _v4.1.2: https://github.com/sissaschool/elementpath/compare/v4.1.1...v4.1.2
.. _v4.1.3: https://github.com/sissaschool/elementpath/compare/v4.1.2...v4.1.3
.. _v4.1.4: https://github.com/sissaschool/elementpath/compare/v4.1.3...v4.1.4
.. _v4.1.5: https://github.com/sissaschool/elementpath/compare/v4.1.4...v4.1.5
.. _v4.2.0: https://github.com/sissaschool/elementpath/compare/v4.1.5...v4.2.0
.. _v4.2.1: https://github.com/sissaschool/elementpath/compare/v4.2.0...v4.2.1
.. _v4.3.0: https://github.com/sissaschool/elementpath/compare/v4.2.1...v4.3.0
.. _v4.4.0: https://github.com/sissaschool/elementpath/compare/v4.3.0...v4.4.0
.. _v4.5.0: https://github.com/sissaschool/elementpath/compare/v4.4.0...v4.5.0
.. _v4.6.0: https://github.com/sissaschool/elementpath/compare/v4.5.0...v4.6.0
.. _v4.7.0: https://github.com/sissaschool/elementpath/compare/v4.6.0...v4.7.0
.. _v4.8.0: https://github.com/sissaschool/elementpath/compare/v4.7.0...v4.8.0

sissaschool-elementpath-d3688c7/LICENSE

The MIT License (MIT)

Copyright (c), 2018-2021, SISSA (Scuola Internazionale Superiore di Studi Avanzati)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

sissaschool-elementpath-d3688c7/MANIFEST.in

include LICENSE
include MANIFEST.in
include README.rst
include CHANGELOG.rst
include setup.py
include setup.cfg
include requirements-dev.txt
include tox.ini
include .coveragerc
include mypy.ini
include doc/*
recursive-include elementpath *
recursive-include scripts *
recursive-include tests *
recursive-exclude tests/.mypy_cache *
global-exclude *.py[cod]

sissaschool-elementpath-d3688c7/README.rst

***********
elementpath
***********

.. image:: https://img.shields.io/pypi/v/elementpath.svg
   :target: https://pypi.python.org/pypi/elementpath/
.. image:: https://img.shields.io/pypi/pyversions/elementpath.svg
   :target: https://pypi.python.org/pypi/elementpath/
.. image:: https://img.shields.io/pypi/implementation/elementpath.svg
   :target: https://pypi.python.org/pypi/elementpath/
.. image:: https://img.shields.io/badge/License-MIT-blue.svg
   :alt: MIT License
   :target: https://lbesson.mit-license.org/
.. image:: https://img.shields.io/pypi/dm/elementpath.svg
   :target: https://pypi.python.org/pypi/elementpath/

.. elementpath-introduction

The purpose of this package is to provide XPath 1.0, 2.0, 3.0 and 3.1 selectors
for ElementTree XML data structures, both for the standard ElementTree library
and for the `lxml.etree `_ library.

For `lxml.etree `_ this package can be useful for providing XPath 2.0/3.0/3.1
selectors, because `lxml.etree `_ already has its own implementation of XPath 1.0.


Installation and usage
======================

You can install the package with *pip* in a Python 3.8+ environment::

    pip install elementpath

To use it, import the package and apply the selectors to ElementTree nodes:

.. code-block:: pycon

    >>> import elementpath
    >>> from xml.etree import ElementTree
    >>> root = ElementTree.XML('')
    >>> elementpath.select(root, '/A/B2/*')
    [, , ]

The *select* API provides the standard XPath result format, which is a list or an
elementary datatype's value. If you only want to iterate over the results, you can
use the generator function *iter_select*, which accepts the same arguments as *select*.

The selectors API also works with XML data trees based on the `lxml.etree `_ library:

.. code-block:: pycon

    >>> import elementpath
    >>> import lxml.etree as etree
    >>> root = etree.XML('')
    >>> elementpath.select(root, '/A/B2/*')
    [, , ]

When you need to apply the same XPath expression to several XML data trees, you can
also use the *Selector* class, creating an instance and then using it to apply the
path to distinct XML data:

.. code-block:: pycon

    >>> import elementpath
    >>> import lxml.etree as etree
    >>> selector = elementpath.Selector('/A/*/*')
    >>> root = etree.XML('')
    >>> selector.select(root)
    [, , ]
    >>> root = etree.XML('')
    >>> selector.select(root)
    [, , , ]

Public API classes and functions are described in the
`elementpath manual on the "Read the Docs" site `_.

By default the XPath 2.0 parser is used. If you need the XPath 1.0 parser, provide
the *parser* argument:

.. code-block:: pycon

    >>> from elementpath import select, XPath1Parser
    >>> from xml.etree import ElementTree
    >>> root = ElementTree.XML('')
    >>> select(root, '/A/B2/*', parser=XPath1Parser)
    [, , ]

For XPath 3.0/3.1 import the parser from the *elementpath.xpath3* subpackage, which
is not loaded by default:

.. code-block:: pycon

    >>> from elementpath.xpath3 import XPath3Parser
    >>> select(root, 'math:atan(1.0e0)', parser=XPath3Parser)
    0.7853981633974483

Note: *XPath3Parser* is an alias of *XPath31Parser*.

If you need only XPath 3.0 you can also use a more specific subpackage, which avoids
loading the XPath 3.1 implementation:

.. code-block:: pycon

    >>> from elementpath.xpath30 import XPath30Parser
    >>> select(root, 'math:atan(1.0e0)', parser=XPath30Parser)
    0.7853981633974483


Contributing
============

You can contribute to this package by reporting bugs through the issue tracker or by
opening a pull request. If you open an issue, please try to provide a test or test
data that reproduces the wrong behaviour. The provided testing code will be added to
the tests of the package.

The XPath parsers are based on an implementation of Pratt's Top Down Operator Precedence parser.
The implemented parser includes some look-ahead features, helpers for registering
tokens and for extending language implementations. Also the token class has been
generalized using a `MutableSequence` as base class. See *tdop.py* for the basic
internal classes and *xpath1_parser.py* for extensions and for a basic usage of the parser.

If you like, you can use the basic parser and tokens provided by the *tdop.py* module
to implement other types of parsers (I think it could also be a fun exercise!).


License
=======

This software is distributed under the terms of the MIT License.
See the file 'LICENSE' in the root directory of the present
distribution, or http://opensource.org/licenses/MIT.

sissaschool-elementpath-d3688c7/doc/Makefile

# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = elementpath
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

sissaschool-elementpath-d3688c7/doc/advanced.rst

***************
Advanced topics
***************

.. testsetup::

    from xml.etree import ElementTree
    from elementpath import XPath2Parser, XPathToken, XPathContext, get_node_tree


Parsing expressions
===================

An XPath expression (the *path*) is parsed by a parser instance,
which produces a tree of tokens:

.. doctest::

    >>> from elementpath import XPath2Parser, XPathToken
    >>>
    >>> parser = XPath2Parser()
    >>> token = parser.parse('/root/(: comment :) child[@attr]')
    >>> isinstance(token, XPathToken)
    True

That token is a proxy token for the tree produced by TDOP parsing:

.. doctest::

    >>> token
    <_SolidusOperator object at 0x...
    >>> str(token)
    "'/' operator"
    >>> token.tree
    '(/ (/ (root)) ([ (child) (@ (attr))))'
    >>> token.source
    '/root/child[@attr]'

If a wrong expression is provided, an error is raised:

.. doctest::

    >>> token = parser.parse('/root/#child2/@attr')
    Traceback (most recent call last):
      .........
    elementpath.exceptions.ElementPathSyntaxError: '#' unknown at line 1, column 7: [err:XPST0003] unknown symbol '#'

The result tree is also checked with a static evaluation, which uses only the
information provided by the parser instance (e.g. statically known namespaces).
In *elementpath* a parser instance represents the `XPath static context `_.
Static evaluation is not based on any XML input data, but it makes it possible to
find many errors related to operators and function arguments:

.. doctest::

    >>> token = parser.parse('1 + "1"')
    Traceback (most recent call last):
      File "", line 1, in
      File ".../elementpath/xpath2/xpath2_parser.py", ..., in parse
        root_token.evaluate()  # Static context evaluation
      .........
    elementpath.exceptions.ElementPathTypeError: '+' operator at line 1, column 3: [err:XPTY0004] ...


Dynamic evaluation
==================

Evaluation on XML data is performed using the `XPath dynamic context `_,
represented by *XPathContext* objects.

.. doctest::

    >>> from xml.etree import ElementTree
    >>> from elementpath import XPathContext
    >>>
    >>> root = ElementTree.XML('')
    >>> context = XPathContext(root)
    >>> token.evaluate(context)
    [EtreeElementNode(elem=>> token.evaluate()
    Traceback (most recent call last):
      .........
elementpath.exceptions.MissingContextError: '/' operator at line 1, column 6: [err:XPDY0002] Dynamic context required for evaluate Expressions that not depend on XML data can be evaluated also without a context: .. doctest:: >>> token = parser.parse('concat("foo", " ", "bar")') >>> token.evaluate() 'foo bar' For more details on parsing and evaluation of XPath expressions see the `XPath processing model `_. Node trees ========== In the `XPath Data Model `_ there are `seven kinds of nodes `_: document, element, attribute, text, namespace, processing instruction, and comment. For a fully compliant XPath processing all the seven node kinds have to be represented and processed, considering theirs properties (called accessors) and their position in the belonging document. But the ElementTree components don’t implement all the necessary characteristics, forcing to use workaround tricks, that make the code more complex. So since version v3.0 the data processing is based on XPath node types, that act as wrappers of elements of the input ElementTree structures. Node trees building requires more time and memory for handling dynamic context and for iterating the trees, but is overall fast because simplify the rest of the code. Node trees are automatically created at dynamic context initialization: .. doctest:: >>> from xml.etree import ElementTree >>> from elementpath import XPathContext, get_node_tree >>> >>> root = ElementTree.XML('') >>> context = XPathContext(root) >>> context.root EtreeElementNode(elem=) >>> context.root.children [EtreeElementNode(elem=), EtreeElementNode(elem=)] If the same XML data is applied several times for dynamic evaluation it maybe convenient to build the node tree before, in the way to create it only once: .. 
doctest::

    >>> root_node = get_node_tree(root)
    >>> context = XPathContext(root_node)
    >>> context.root is root_node
    True

The context root and the context item
=====================================

The selector functions and the selector class simplify XML data processing.
Often you only have to provide the root element and the path expression.

Other keyword arguments, related to parser or context initialization, can also
be provided. Of these arguments, *item* is particularly relevant because it
defines the initial context item for dynamic evaluation.

If you have this XML data:

.. doctest::

    >>> from xml.etree import ElementTree
    >>> from elementpath import select
    >>>
    >>> root = ElementTree.XML('')

using a select on it with the self-shortcut expression gives back the root element:

.. doctest::

    >>> select(root, '.')
    []

But if you want to use a specific child as the initial context item, you have to
provide the extra argument *item*:

.. doctest::

    >>> select(root, '.', item=root[1])
    []

The same result can be obtained by providing the same child element as the *root* argument:

.. doctest::

    >>> select(root[1], '.')
    []

This is not always the case, though, because in the latter case the evaluation is
done on a subtree of nodes:

.. doctest::

    >>> select(root, 'root()', item=root[1])
    []
    >>> select(root[1], 'root()')
    []

Both choices can be useful, depending on whether you need to keep the whole tree
or restrict the scope to a subtree.

The context *item* can be set with an XPath node, an atomic value or an XPath function.

.. note::
    Since release v4.2.0 the *root* is optional. If the *root* argument is absent,
    the *item* argument is mandatory and the dynamic context remains without a root.

The root document and the root element
======================================

.. warning::
    The initialization of the context root and item changed in release v4.2.0.
    Since then the provided XML is still considered a document by default, but the
    item is set to the root instead of `None` and the new attribute *document* is
    set to a dummy document for handling the document position. The dummy document
    is not referenced by the root element and is discarded from results.

Canonically, dynamic evaluation is performed on an XML document created from an
ElementTree instance:

.. doctest::

    >>> from xml.etree import ElementTree
    >>> from io import StringIO
    >>> from elementpath import select, XPathContext
    >>>
    >>> doc = ElementTree.parse(StringIO(''))
    >>> doc

In this case a document node is created at context initialization and the context
item is set to the context root:

.. doctest::

    >>> context = XPathContext(doc)
    >>> context.root
    EtreeDocumentNode(document=)
    >>> context.item is context.root
    True
    >>> context.document is context.root
    True

If a root element is provided, the document node is not created and the context
item is set to the root element node. In this case the context document is a
dummy document:

.. doctest::

    >>> root = ElementTree.XML('')
    >>> context = XPathContext(root)
    >>> context.root
    EtreeElementNode(elem=)
    >>> context.item is context.root
    True
    >>> context.document
    EtreeDocumentNode(document=)
    >>> context.root.parent is None
    True

An exception to this is when the XML data root has siblings and you process the
data with lxml:

.. doctest::

    >>> import lxml.etree as etree
    >>> root = etree.XML('')
    >>> context = XPathContext(root)
    >>> context.root
    EtreeDocumentNode(document=)
    >>> context.item is context.root
    True
    >>> context.document is context.root
    True

Set the *fragment* option to `True` to process an XML root element as a fragment.
In this case a dummy document is not created and the context document is set to `None`:

..
doctest:: >>> root = ElementTree.XML('') >>> context = XPathContext(root, fragment=True) >>> context.root EtreeElementNode(elem=) >>> context.item is context.root True >>> context.document is None True sissaschool-elementpath-d3688c7/doc/conf.py000066400000000000000000000133521476131650400207770ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Configuration file for the Sphinx documentation builder. # # This file does only contain a selection of the most common options. For a # full list see the documentation: # http://www.sphinx-doc.org/en/stable/config # -- Path setup -------------------------------------------------------------- # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. # # import os # import sys # sys.path.insert(0, os.path.abspath('.')) # Extends the path with parent directory in order to import elementpath from # the project directory also if it's installed. import sys import os sys.path.insert(0, os.path.abspath('..')) # -- Project information ----------------------------------------------------- project = 'elementpath' copyright = '2018-2025, SISSA (International School for Advanced Studies)' author = 'Davide Brunato' # The short X.Y version version = '4.8' # The full version, including alpha/beta/rc tags release = '4.8.0' # -- General configuration --------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. # # needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.doctest', ] # Options for autodoc add_module_names = False # do not add module name as prefix to classes or functions. 
autodoc_typehints = 'none' # do not add type annotations nitpick_ignore = [ ('py:class', 'XMLSchemaProxy') ] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix(es) of source filenames. # You can specify multiple suffix as a list of string: # # source_suffix = ['.rst', '.md'] source_suffix = '.rst' # The master toctree document. master_doc = 'index' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. # language = None language = 'en' # required by Sphinx v5.0.0 # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This pattern also affects html_static_path and html_extra_path . exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # -- Options for HTML output ------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # html_theme = 'alabaster' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. # # html_theme_options = {} # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # Custom sidebar templates, must be a dictionary that maps document names # to template names. # # The default sidebars (for documents that don't match any pattern) are # defined by theme itself. 
Builtin themes are using these templates by # default: ``['localtoc.html', 'relations.html', 'sourcelink.html', # 'searchbox.html']``. # # html_sidebars = {} # -- Options for HTMLHelp output --------------------------------------------- # Output file base name for HTML help builder. htmlhelp_basename = 'elementpathdoc' # -- Options for LaTeX output ------------------------------------------------ latex_elements = { # The paper size ('letterpaper' or 'a4paper'). # # 'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). # # 'pointsize': '10pt', # Additional stuff for the LaTeX preamble. # # 'preamble': '', # Latex figure (float) alignment # # 'figure_align': 'htbp', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, # author, documentclass [howto, manual, or own class]). latex_documents = [ (master_doc, 'elementpath.tex', 'elementpath Manual', 'Davide Brunato', 'manual'), ] # -- Options for manual page output ------------------------------------------ # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ (master_doc, 'elementpath', 'elementpath Manual', [author], 1) ] # -- Options for Texinfo output ---------------------------------------------- # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ (master_doc, 'elementpath', 'elementpath Manual', author, 'elementpath', 'One line description of project.', 'Miscellaneous'), ] # -- Options for Epub output ------------------------------------------------- # Bibliographic Dublin Core info. epub_title = project epub_author = author epub_publisher = author epub_copyright = copyright # The unique identifier of the text. This can be a ISBN number # or the project homepage. # # epub_identifier = '' # A unique identification for the text. 
# # epub_uid = '' # A list of files that should not be packed into the epub file. epub_exclude_files = ['search.html'] # -- Extension configuration ------------------------------------------------- sissaschool-elementpath-d3688c7/doc/index.rst000066400000000000000000000005321476131650400213350ustar00rootroot00000000000000.. elementpath documentation master file, created by sphinx-quickstart on Fri May 4 19:54:35 2018. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. elementpath manual ================== .. toctree:: :maxdepth: 2 introduction advanced xpath_api pratt_api sissaschool-elementpath-d3688c7/doc/introduction.rst000066400000000000000000000001561476131650400227510ustar00rootroot00000000000000************ Introduction ************ .. include:: ../README.rst :start-after: elementpath-introduction sissaschool-elementpath-d3688c7/doc/make.bat000066400000000000000000000014571476131650400211100ustar00rootroot00000000000000@ECHO OFF pushd %~dp0 REM Command file for Sphinx documentation if "%SPHINXBUILD%" == "" ( set SPHINXBUILD=sphinx-build ) set SOURCEDIR=. set BUILDDIR=_build set SPHINXPROJ=elementpath if "%1" == "" goto help %SPHINXBUILD% >NUL 2>NUL if errorlevel 9009 ( echo. echo.The 'sphinx-build' command was not found. Make sure you have Sphinx echo.installed, then set the SPHINXBUILD environment variable to point echo.to the full path of the 'sphinx-build' executable. Alternatively you echo.may add the Sphinx directory to PATH. echo. 
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%

:end
popd
sissaschool-elementpath-d3688c7/doc/pratt_api.rst000066400000000000000000000036121476131650400222130ustar00rootroot00000000000000
******************
Pratt's parser API
******************

The TDOP (Top Down Operator Precedence) parser implemented within this library is
a variant of the original Pratt's parser, based on a class for the parser and
meta-classes for tokens.

The parser base class includes helper functions for registering token classes,
Pratt's methods and a regexp-based tokenizer builder. There are also additional
methods and attributes that help in developing new parsers. Parsers can be defined
by deriving a class and following a token registration procedure. These classes
are not available at package level but only within the module `elementpath.tdop`.

Token base class
================

.. autoclass:: elementpath.tdop.Token

    .. autoattribute:: arity
    .. autoattribute:: tree
    .. autoattribute:: source

    .. automethod:: nud
    .. automethod:: led
    .. automethod:: evaluate
    .. automethod:: iter

    Helper methods for checking symbols and for error raising:

    .. automethod:: expected
    .. automethod:: unexpected
    .. automethod:: wrong_syntax
    .. automethod:: wrong_value
    .. automethod:: wrong_type

Parser base class
=================

.. autoclass:: elementpath.tdop.Parser

    .. autoattribute:: position

    Parsing methods:

    .. automethod:: parse
    .. automethod:: advance
    .. automethod:: advance_until
    .. automethod:: expression

    Helper methods for checking parser status:

    .. automethod:: is_source_start
    .. automethod:: is_line_start
    .. automethod:: is_spaced

    Helper methods for building new parsers:

    .. automethod:: register
    .. automethod:: unregister
    .. automethod:: duplicate
    .. automethod:: literal
    .. automethod:: nullary
    .. automethod:: prefix
    ..
automethod:: postfix
    .. automethod:: infix
    .. automethod:: infixr
    .. automethod:: method
    .. automethod:: build
    .. automethod:: create_tokenizer
sissaschool-elementpath-d3688c7/doc/requirements.txt000066400000000000000000000000571476131650400227620ustar00rootroot00000000000000
Sphinx==7.2.6
readthedocs-sphinx-search==0.3.2
sissaschool-elementpath-d3688c7/doc/xpath_api.rst000066400000000000000000000113251476131650400222050ustar00rootroot00000000000000
****************
Public XPath API
****************

The package includes some classes and functions that implement XPath selectors,
parsers, tokens, contexts and schema proxy.

XPath selectors
===============

.. autofunction:: elementpath.select

.. autofunction:: elementpath.iter_select

.. autoclass:: elementpath.Selector

    .. autoattribute:: namespaces

    .. automethod:: select
    .. automethod:: iter_select

XPath parsers
=============

.. autoclass:: elementpath.XPath1Parser

    .. autoattribute:: DEFAULT_NAMESPACES
    .. autoattribute:: version

    Helper methods for defining token classes:

    .. automethod:: axis
    .. automethod:: function

.. autoclass:: elementpath.XPath2Parser

.. autoclass:: elementpath.xpath3.XPath30Parser

.. autoclass:: elementpath.xpath3.XPath31Parser

XPath tokens
============

.. autoclass:: elementpath.XPathToken

    .. automethod:: evaluate
    .. automethod:: select

    Context manipulation helpers:

    .. automethod:: get_argument
    .. automethod:: atomization
    .. automethod:: get_atomized_operand
    .. automethod:: iter_comparison_data
    .. automethod:: get_operands
    .. automethod:: get_results
    .. automethod:: select_results
    .. automethod:: adjust_datetime

    Schema context methods

    .. automethod:: select_xsd_nodes
    .. automethod:: add_xsd_type
    .. automethod:: get_xsd_type
    .. automethod:: get_typed_node

    Data accessor helpers

    .. automethod:: data_value
    .. automethod:: boolean_value
    .. automethod:: string_value
    .. automethod:: number_value
    .. automethod:: schema_node_value

    Error management helper:

    ..
automethod:: error

XPath contexts
==============

.. autoclass:: elementpath.XPathContext

.. autoclass:: elementpath.XPathSchemaContext

XML Schema proxy
================

The XPath 2.0 parser can be interfaced with an XML Schema processor through a
schema proxy. An :class:`XMLSchemaProxy` class is defined for interfacing schemas
created with the *xmlschema* package. This class is based on an abstract class
:class:`elementpath.AbstractSchemaProxy`, which can be used for implementing
concrete interfaces to other types of XML Schema processors.

.. autoclass:: elementpath.AbstractSchemaProxy

    .. automethod:: bind_parser
    .. automethod:: get_context
    .. automethod:: find
    .. automethod:: get_type
    .. automethod:: get_attribute
    .. automethod:: get_element
    .. automethod:: is_instance
    .. automethod:: cast_as
    .. automethod:: iter_atomic_types

XPath nodes
===========

XPath nodes are processed using a set of classes derived from
:class:`elementpath.XPathNode`. This class hierarchy is as simple as possible,
with a focus on speed and low memory consumption.

.. autoclass:: elementpath.XPathNode

The seven XPath node types:

.. autoclass:: elementpath.AttributeNode
.. autoclass:: elementpath.NamespaceNode
.. autoclass:: elementpath.TextNode
.. autoclass:: elementpath.CommentNode
.. autoclass:: elementpath.ProcessingInstructionNode
.. autoclass:: elementpath.ElementNode
.. autoclass:: elementpath.DocumentNode

There are also two other specialized versions of ElementNode for use in specific cases:

.. autoclass:: elementpath.LazyElementNode
.. autoclass:: elementpath.SchemaElementNode

Node tree builders
==================

Node trees are automatically created during the initialization of an
:class:`elementpath.XPathContext`. If you need to process the same XML data
multiple times, there is a helper API for creating document-based or
element-based node trees:

.. autofunction:: elementpath.get_node_tree
.. autofunction:: elementpath.build_node_tree
.. autofunction:: elementpath.build_lxml_node_tree
..
autofunction:: elementpath.build_schema_node_tree

XPath regular expressions
=========================

.. autofunction:: elementpath.translate_pattern
.. autofunction:: elementpath.install_unicode_data
.. autofunction:: elementpath.unicode_version

Exception classes
=================

.. autoexception:: elementpath.ElementPathError
.. autoexception:: elementpath.MissingContextError
.. autoexception:: elementpath.UnsupportedFeatureError
.. autoexception:: elementpath.RegexError
.. autoexception:: elementpath.ElementPathLocaleError

There are also other exceptions, derived by multiple inheritance from the base
exception :class:`elementpath.ElementPathError` and from Python built-in exceptions:

.. autoexception:: elementpath.ElementPathKeyError
.. autoexception:: elementpath.ElementPathNameError
.. autoexception:: elementpath.ElementPathOverflowError
.. autoexception:: elementpath.ElementPathRuntimeError
.. autoexception:: elementpath.ElementPathSyntaxError
.. autoexception:: elementpath.ElementPathTypeError
.. autoexception:: elementpath.ElementPathValueError
.. autoexception:: elementpath.ElementPathZeroDivisionError
sissaschool-elementpath-d3688c7/elementpath/000077500000000000000000000000001476131650400212355ustar00rootroot00000000000000
sissaschool-elementpath-d3688c7/elementpath/__init__.py000066400000000000000000000052621476131650400233530ustar00rootroot00000000000000
#
# Copyright (c), 2018-2022, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
__version__ = '4.8.0'
__author__ = "Davide Brunato"
__contact__ = "brunato@sissa.it"
__copyright__ = "Copyright 2018-2025, SISSA"
__license__ = "MIT"
__status__ = "Production/Stable"

# Imports here are considered as stable API, other internal calls may change.
from . import datatypes  # XSD datatypes
from .
import etree # Safe parser and helper functions for ElementTree from . import protocols # Protocols for type annotations from .exceptions import ElementPathError, MissingContextError, ElementPathKeyError, \ ElementPathZeroDivisionError, ElementPathNameError, ElementPathOverflowError, \ ElementPathRuntimeError, ElementPathSyntaxError, ElementPathTypeError, \ ElementPathValueError, ElementPathLocaleError, UnsupportedFeatureError from .xpath_context import XPathContext, XPathSchemaContext from .xpath_nodes import XPathNode, AttributeNode, NamespaceNode, CommentNode, \ ProcessingInstructionNode, TextNode, ElementNode, LazyElementNode, \ SchemaElementNode, DocumentNode from .tree_builders import get_node_tree, build_node_tree, build_lxml_node_tree, \ build_schema_node_tree from .xpath_tokens import XPathToken, XPathFunction from .xpath1 import XPath1Parser from .xpath2 import XPath2Parser from .xpath_selectors import select, iter_select, Selector from .schema_proxy import AbstractSchemaProxy from .regex import RegexError, translate_pattern, install_unicode_data, unicode_version __all__ = ['datatypes', 'protocols', 'etree', 'ElementPathError', 'MissingContextError', 'UnsupportedFeatureError', 'ElementPathKeyError', 'ElementPathLocaleError', 'ElementPathZeroDivisionError', 'ElementPathNameError', 'ElementPathOverflowError', 'ElementPathRuntimeError', 'ElementPathSyntaxError', 'ElementPathTypeError', 'ElementPathValueError', 'XPathContext', 'XPathSchemaContext', 'XPathNode', 'AttributeNode', 'NamespaceNode', 'CommentNode', 'ProcessingInstructionNode', 'TextNode', 'ElementNode', 'LazyElementNode', 'SchemaElementNode', 'DocumentNode', 'get_node_tree', 'build_node_tree', 'build_lxml_node_tree', 'build_schema_node_tree', 'XPathToken', 'XPathFunction', 'XPath1Parser', 'XPath2Parser', 'select', 'iter_select', 'Selector', 'AbstractSchemaProxy', 'RegexError', 'translate_pattern', 'install_unicode_data', 'unicode_version'] 
sissaschool-elementpath-d3688c7/elementpath/_typing.py000066400000000000000000000020171476131650400232600ustar00rootroot00000000000000# # Copyright (c), 2024, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Version related imports for subscriptable types for type annotations (no builtins). """ import sys if sys.version_info < (3, 9): from typing import Callable, Counter, Deque, Iterable, Iterator, \ Mapping, Match, MutableMapping, MutableSequence, MutableSet, Pattern, Sequence else: from collections import deque as Deque, Counter # noqa from collections.abc import Callable, Iterable, Iterator, Mapping, MutableMapping, \ MutableSequence, MutableSet, Sequence from re import Match, Pattern __all__ = ['Callable', 'Counter', 'Deque', 'Iterable', 'Iterator', 'Match', 'Mapping', 'MutableMapping', 'MutableSequence', 'MutableSet', 'Pattern', 'Sequence'] sissaschool-elementpath-d3688c7/elementpath/aliases.py000066400000000000000000000033611476131650400232330ustar00rootroot00000000000000# # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Common type hints aliases for elementpath. 
""" from typing import Any, List, Optional, NoReturn, Tuple, Type, TYPE_CHECKING, TypeVar, Union from elementpath._typing import MutableMapping ## # Type aliases NamespacesType = MutableMapping[str, str] NsmapType = MutableMapping[Optional[str], str] # compatible with the nsmap of lxml Element AnyNsmapType = Union[NamespacesType, NsmapType, None] # for composition and function arguments NargsType = Optional[Union[int, Tuple[int, Optional[int]]]] ClassCheckType = Union[Type[Any], Tuple[Type[Any], ...]] T = TypeVar('T') Emptiable = Union[T, List[NoReturn]] SequenceType = Union[T, List[T]] InputType = Union[None, T, List[T], Tuple[T, ...]] if TYPE_CHECKING: from elementpath.datatypes import AtomicType, ArithmeticType, NumericType from elementpath.xpath_nodes import ChildNodeType, ParentNodeType, RootArgType from elementpath.xpath_context import ContextType, FunctionArgType, ItemType, \ ItemArgType, ValueType from elementpath.xpath_tokens import XPathParserType, XPathTokenType __all__ = ['NamespacesType', 'NsmapType', 'AnyNsmapType', 'NargsType', 'ClassCheckType', 'Emptiable', 'SequenceType', 'InputType', 'AtomicType', 'ArithmeticType', 'NumericType', 'ChildNodeType', 'ParentNodeType', 'RootArgType', 'ContextType', 'FunctionArgType', 'ItemType', 'ItemArgType', 'ValueType', 'XPathParserType', 'XPathTokenType'] sissaschool-elementpath-d3688c7/elementpath/collations.py000066400000000000000000000150711476131650400237620ustar00rootroot00000000000000# # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import locale import threading from contextlib import AbstractContextManager from types import TracebackType from typing import TYPE_CHECKING, Any, Optional, Tuple, Type, Union from urllib.parse import urljoin, urlsplit from elementpath.exceptions import xpath_error if TYPE_CHECKING: from .xpath_tokens import XPathToken context_class_base = AbstractContextManager[Any] else: context_class_base = AbstractContextManager UNICODE_COLLATION_BASE_URI = "http://www.w3.org/2013/collation/UCA" UNICODE_CODEPOINT_COLLATION = \ "http://www.w3.org/2005/xpath-functions/collation/codepoint" HTML_ASCII_CASE_INSENSITIVE_COLLATION = \ "http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive" XQUERY_TEST_SUITE_CASEBLIND_COLLATION = \ "http://www.w3.org/2010/09/qt-fots-catalog/collation/caseblind" _locale_collate_lock = threading.Lock() def get_locale_category(category: int) -> str: """ Gets the current value of a locale category. A replacement of locale.getdefaultlocale(), deprecated since Python 3.11. """ _locale = locale.setlocale(category, None) if _locale == 'C': # locale category does not seem to be configured, so get the user # preferred locale and then restore the previous state _locale = locale.setlocale(category, '') locale.setlocale(category, 'C') return _locale def unicode_codepoint_strcoll(s1: str, s2: str) -> int: return 0 if s1 == s2 else -1 if s1 < s2 else 1 def unicode_codepoint_strxfrm(s: str) -> str: return s def case_insensitive_strcoll(s1: str, s2: str) -> int: if s1.casefold() == s2.casefold(): return 0 elif s1.casefold() < s2.casefold(): return -1 else: return 1 def case_insensitive_strxfrm(s: str) -> str: return s.casefold() class CollationManager(context_class_base): """ Context Manager for collations. Provide helper operators as methods. 
""" lc_collate: Union[None, str, Tuple[Optional[str], Optional[str]]] fallback: bool = False _current_lc_collate: Optional[Tuple[Optional[str], Optional[str]]] = None def __init__(self, collation: Optional[str], token: Optional['XPathToken'] = None) -> None: self.collation = collation self.token = token self.strcoll = locale.strcoll self.strxfrm = locale.strxfrm if collation is None: msg = 'collation cannot be an empty sequence' raise xpath_error('XPTY0004', msg, self.token) elif not urlsplit(collation).scheme and token is not None: # Collation is a relative URI: try to complete with the static base URI base_uri = token.parser.base_uri if base_uri: collation = urljoin(base_uri, collation) if collation == UNICODE_CODEPOINT_COLLATION: self.lc_collate = None self.strcoll = unicode_codepoint_strcoll self.strxfrm = unicode_codepoint_strxfrm elif collation == HTML_ASCII_CASE_INSENSITIVE_COLLATION: self.lc_collate = None self.strcoll = case_insensitive_strcoll self.strxfrm = case_insensitive_strxfrm elif collation == XQUERY_TEST_SUITE_CASEBLIND_COLLATION: self.lc_collate = None self.strcoll = case_insensitive_strcoll self.strxfrm = case_insensitive_strxfrm elif collation.startswith(UNICODE_COLLATION_BASE_URI): self.lc_collate = 'en_US.UTF-8' self.fallback = True for param in urlsplit(collation).query.split(';'): assert isinstance(param, str) if param.startswith('lang='): # Language code: should be a string in lexical space of xs:language, # but in implementations '_' can be used instead of hyphens and '.' # is used to provide the encoding. Use UTF-8 as default encoding. lang = param[5:] self.lc_collate = lang if '.' 
in lang else (lang, 'UTF-8') elif param.startswith('fallback='): if param.endswith('yes'): self.fallback = True elif param.endswith('no'): self.fallback = False else: # Other compatible collations locale lib specs (e.g.: it_IT.UTF-8) self.lc_collate = collation def __enter__(self) -> 'CollationManager': if self.lc_collate is not None: # Only one locale set can be used at a time _locale_collate_lock.acquire() self._current_lc_collate = locale.getlocale(locale.LC_COLLATE) try: locale.setlocale(locale.LC_COLLATE, self.lc_collate) except locale.Error: if not self.fallback: self._current_lc_collate = None _locale_collate_lock.release() msg = f"Unsupported collation {self.collation!r}" raise xpath_error('FOCH0002', msg, self.token) from None locale.setlocale(locale.LC_COLLATE, 'en_US.UTF-8') return self def __exit__(self, exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException], exc_tb: Optional[TracebackType]) -> None: if self._current_lc_collate is not None: locale.setlocale(locale.LC_COLLATE, self._current_lc_collate) self._current_lc_collate = None _locale_collate_lock.release() def eq(self, a: Any, b: Any) -> bool: if not isinstance(a, str) or not isinstance(b, str): return bool(a == b) return self.strcoll(a, b) == 0 def ne(self, a: Any, b: Any) -> bool: if not isinstance(a, str) or not isinstance(b, str): return bool(a != b) return self.strcoll(a, b) != 0 def contains(self, a: str, b: str) -> bool: return self.strxfrm(b) in self.strxfrm(a) def find(self, a: str, b: str) -> int: return self.strxfrm(a).find(self.strxfrm(b)) def startswith(self, a: str, b: str) -> bool: return self.strxfrm(a).startswith(self.strxfrm(b)) def endswith(self, a: str, b: str) -> bool: return self.strxfrm(a).endswith(self.strxfrm(b)) sissaschool-elementpath-d3688c7/elementpath/compare.py000066400000000000000000000407371476131650400232500ustar00rootroot00000000000000# # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. 
# This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import math from decimal import Decimal from functools import cmp_to_key from itertools import zip_longest from typing import Any, Optional from elementpath._typing import Callable, Iterable, Iterator from elementpath.protocols import ElementProtocol from elementpath.exceptions import xpath_error from elementpath.datatypes import UntypedAtomic, AnyURI, AbstractQName from elementpath.collations import UNICODE_CODEPOINT_COLLATION, CollationManager from elementpath.xpath_nodes import XPathNode, EtreeElementNode, TextAttributeNode, \ NamespaceNode, TextNode, CommentNode, ProcessingInstructionNode, EtreeDocumentNode from elementpath.xpath_tokens import XPathToken, XPathFunction, XPathMap, XPathArray def deep_equal(seq1: Iterable[Any], seq2: Iterable[Any], collation: Optional[str] = None, token: Optional[XPathToken] = None) -> bool: etree_node_types = (EtreeElementNode, CommentNode, ProcessingInstructionNode) def etree_deep_equal(e1: ElementProtocol, e2: ElementProtocol) -> bool: if cm.ne(e1.tag, e2.tag): return False elif cm.ne((e1.text or '').strip(), (e2.text or '').strip()): return False elif cm.ne((e1.tail or '').strip(), (e2.tail or '').strip()): return False elif len(e1) != len(e2) or len(e1.attrib) != len(e2.attrib): return False try: items1 = {(cm.strxfrm(k or ''), cm.strxfrm(v)) # type: ignore[arg-type] for k, v in e1.attrib.items()} items2 = {(cm.strxfrm(k or ''), cm.strxfrm(v)) # type: ignore[arg-type] for k, v in e2.attrib.items()} except TypeError: return False if items1 != items2: return False return all(etree_deep_equal(c1, c2) for c1, c2 in zip(e1, e2)) if collation is None: collation = UNICODE_CODEPOINT_COLLATION with CollationManager(collation, token=token) as cm: for value1, value2 in zip_longest(seq1, seq2): if isinstance(value1, XPathFunction) 
and \ not isinstance(value1, (XPathMap, XPathArray)): raise xpath_error('FOTY0015', token=token) if isinstance(value2, XPathFunction) and \ not isinstance(value2, (XPathMap, XPathArray)): raise xpath_error('FOTY0015', token=token) if (value1 is None) ^ (value2 is None): return False elif value1 is None: return True elif isinstance(value1, XPathNode) ^ isinstance(value2, XPathNode): return False elif isinstance(value1, XPathNode): assert isinstance(value2, XPathNode) if value1.__class__ != value2.__class__: return False elif isinstance(value1, etree_node_types): assert isinstance(value2, etree_node_types) if not etree_deep_equal(value1.obj, value2.obj): return False elif isinstance(value1, EtreeDocumentNode): assert isinstance(value2, EtreeDocumentNode) for child1, child2 in zip_longest(value1, value2): if child1 is None or child2 is None: return False elif child1.__class__ != child2.__class__: return False elif isinstance(child1, etree_node_types): assert isinstance(child2, etree_node_types) if not etree_deep_equal(child1.obj, child2.obj): return False elif isinstance(child1, TextNode): assert isinstance(child2, TextNode) if cm.ne(child1.obj, child2.obj): return False elif cm.ne(value1.obj, value2.obj): return False elif isinstance(value1, TextAttributeNode): if cm.ne(value1.name, value2.name): return False elif isinstance(value1, NamespaceNode): assert isinstance(value2, NamespaceNode) if cm.ne(value1.prefix, value2.prefix): return False else: try: if isinstance(value1, bool): if not isinstance(value2, bool) or value1 is not value2: return False elif isinstance(value2, bool): return False if isinstance(value1, AbstractQName): if not isinstance(value2, AbstractQName) or value1 != value2: return False elif isinstance(value2, AbstractQName): return False elif isinstance(value1, (str, AnyURI, UntypedAtomic)) \ and isinstance(value2, (str, AnyURI, UntypedAtomic)): if cm.strcoll(str(value1), str(value2)): return False elif isinstance(value1, UntypedAtomic) \ or 
isinstance(value2, UntypedAtomic): return False elif isinstance(value1, float): if math.isnan(value1): if not math.isnan(value2): return False elif math.isinf(value1): if value1 != value2: return False elif isinstance(value2, Decimal): if value1 != float(value2): return False elif not isinstance(value2, (value1.__class__, int)): return False elif value1 != value2: return False elif isinstance(value2, float): if math.isnan(value2): return False elif math.isinf(value2): if value1 != value2: return False elif isinstance(value1, Decimal): if value2 != float(value1): return False elif not isinstance(value1, (value2.__class__, int)): return False elif value1 != value2: return False elif value1 != value2: return False except TypeError: return False return True def is_empty_sequence(x: Any) -> bool: return not x and isinstance(x, list) def deep_compare(obj1: Any, obj2: Any, collation: Optional[str] = None, token: Optional[XPathToken] = None) -> int: msg_tmpl = "Sorting failed, cannot compare {!r} with {!r}" etree_node_types = (EtreeElementNode, CommentNode, ProcessingInstructionNode) result: int = 0 def iter_object(obj: Any) -> Iterator[Any]: if isinstance(obj, XPathArray): yield from obj.items() elif isinstance(obj, (list, Iterator)): yield from obj else: yield obj def etree_deep_compare(e1: ElementProtocol, e2: ElementProtocol) -> int: nonlocal result result = cm.strcoll(e1.tag, e2.tag) if result: return result result = cm.strcoll((e1.text or '').strip(), (e2.text or '').strip()) if result: return result for a1, a2 in zip_longest(e1.attrib.items(), e2.attrib.items()): if a1 is None: return 1 elif a2 is None: return -1 result = cm.strcoll(a1[0], a2[0]) or cm.strcoll(a1[1], a2[1]) if result: return result for c1, c2 in zip_longest(e1, e2): if c1 is None: return 1 elif c2 is None: return -1 result = etree_deep_compare(c1, c2) if result: return result else: result = cm.strcoll((e1.tail or '').strip(), (e2.tail or '').strip()) if result: return result return 0 if collation is 
None: collation = UNICODE_CODEPOINT_COLLATION with CollationManager(collation, token=token) as cm: for value1, value2 in zip_longest(iter_object(obj1), iter_object(obj2)): if isinstance(value1, XPathFunction) and \ not isinstance(value1, XPathArray): raise xpath_error('FOTY0013', token=token) if isinstance(value2, XPathFunction) and \ not isinstance(value2, XPathArray): raise xpath_error('FOTY0013', token=token) if (value1 is None) ^ (value2 is None): return -1 if value1 is None else 1 if is_empty_sequence(value1) ^ is_empty_sequence(value2): return -1 if is_empty_sequence(value1) else 1 if isinstance(value1, XPathNode) ^ isinstance(value2, XPathNode): msg = f"cannot compare {type(value1)} with {type(value2)}" raise xpath_error('XPTY0004', msg, token=token) elif isinstance(value1, XPathNode): assert isinstance(value2, XPathNode) if value1.__class__ != value2.__class__: msg = f"cannot compare {type(value1)} with {type(value2)}" raise xpath_error('XPTY0004', msg, token=token) elif isinstance(value1, etree_node_types): assert isinstance(value2, etree_node_types) result = etree_deep_compare(value1.obj, value2.obj) if result: return result elif isinstance(value1, EtreeDocumentNode): assert isinstance(value2, EtreeDocumentNode) for child1, child2 in zip_longest(value1, value2): if child1 is None: return -1 elif child2 is None: return 1 elif child1.__class__ != child2.__class__: msg = f"cannot compare {type(child1)} with {type(child2)}" raise xpath_error('XPTY0004', msg, token=token) elif isinstance(child1, etree_node_types): assert isinstance(child2, etree_node_types) result = etree_deep_compare(child1.obj, child2.obj) if result: return result elif isinstance(child1, TextNode): assert isinstance(child2, TextNode) result = cm.strcoll( child1.obj.strip(), child2.obj.strip() ) if result: return result elif isinstance(value1, TextNode): assert isinstance(value2, TextNode) result = cm.strcoll(value1.obj, value2.obj) if result: return result elif isinstance(value1, 
TextAttributeNode): assert isinstance(value2, TextAttributeNode) result = cm.strcoll(value1.name or '', value2.name or '') if result: return result elif isinstance(value1, NamespaceNode): assert isinstance(value2, NamespaceNode) result = cm.strcoll(value1.prefix or '', value2.prefix or '') if result: return result else: try: if isinstance(value1, bool): if not isinstance(value2, bool): return -1 elif value1 is not value2: return -1 if value1 else 1 elif isinstance(value2, bool): return -1 elif isinstance(value1, UntypedAtomic): if isinstance(value2, UntypedAtomic): result = cm.strcoll(str(value1), str(value2)) if result: return result else: msg = msg_tmpl.format(value1, value2) raise xpath_error('XPTY0004', msg, token) elif isinstance(value2, UntypedAtomic): msg = msg_tmpl.format(value1, value2) raise xpath_error('XPTY0004', msg, token) elif isinstance(value1, float): if math.isnan(value1): if not math.isnan(value2): return -1 elif math.isinf(value1): if value1 != value2: return -1 if value1 < value2 else 1 elif isinstance(value2, Decimal): if value1 != float(value2): return -1 if value1 < float(value2) else 1 elif not isinstance(value2, (value1.__class__, int)): return -1 elif value1 != value2: return -1 if value1 < value2 else 1 elif isinstance(value2, float): if math.isnan(value2): return -1 elif math.isinf(value2): if value1 != value2: return -1 if value1 < value2 else 1 elif isinstance(value1, Decimal): if value2 != float(value1): return -1 if float(value1) < value2 else 1 elif not isinstance(value1, (value2.__class__, int)): return -1 elif value1 != value2: return -1 if value1 < value2 else 1 elif isinstance(value1, (str, AnyURI, UntypedAtomic)) \ and isinstance(value2, (str, AnyURI, UntypedAtomic)): result = cm.strcoll(str(value1), str(value2)) if result: return result elif value1 != value2: return -1 if value1 < value2 else 1 except TypeError as err: raise xpath_error('XPTY0004', message_or_error=err, token=token) return 0 def get_key_function(collation: 
Optional[str] = None, key_func: Optional[Callable[[Any], Any]] = None, token: Optional[XPathToken] = None) -> Any: def compare_func(obj1: Any, obj2: Any) -> int: if key_func is not None: if isinstance(obj1, (list, Iterator)): obj1 = map(key_func, obj1) else: obj1 = key_func(obj1) if isinstance(obj2, (list, Iterator)): obj2 = map(key_func, obj2) else: obj2 = key_func(obj2) return deep_compare(obj1, obj2, collation, token) return cmp_to_key(compare_func) def same_key(k1: Any, k2: Any) -> bool: if isinstance(k1, (str, AnyURI, UntypedAtomic)): if not isinstance(k2, (str, AnyURI, UntypedAtomic)): return False return str(k1) == str(k2) elif isinstance(k1, float) and math.isnan(k1): return isinstance(k2, float) and math.isnan(k2) elif isinstance(k1, AbstractQName) ^ isinstance(k2, AbstractQName): return False try: return True if k1 == k2 else False except TypeError: return False # EAFP :) sissaschool-elementpath-d3688c7/elementpath/datatypes/000077500000000000000000000000001476131650400232335ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/elementpath/datatypes/__init__.py000066400000000000000000000061601476131650400253470ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XSD atomic datatypes subpackage. Includes a class for UntypedAtomic data and classes for other XSD built-in types. This subpackage raises only built-in exceptions in order to be reusable in other packages. 
""" from decimal import Decimal from typing import Union from .atomic_types import xsd10_atomic_types, xsd11_atomic_types, \ AtomicTypeMeta, AnyAtomicType from .untyped import UntypedAtomic from .qname import AbstractQName, QName, Notation from .numeric import Float10, Float, Integer, Int, NegativeInteger, \ PositiveInteger, NonNegativeInteger, NonPositiveInteger, Long, \ Short, Byte, UnsignedByte, UnsignedInt, UnsignedLong, UnsignedShort from .string import NormalizedString, XsdToken, Name, NCName, NMToken, Id, \ Idref, Language, Entity from .uri import AnyURI from .binary import AbstractBinary, Base64Binary, HexBinary from .datetime import AbstractDateTime, DateTime10, DateTime, DateTimeStamp, \ Date10, Date, GregorianDay, GregorianMonth, GregorianYear, GregorianYear10, \ GregorianMonthDay, GregorianYearMonth, GregorianYearMonth10, Time, Timezone, \ Duration, DayTimeDuration, YearMonthDuration, OrderedDateTime from .proxies import BooleanProxy, DecimalProxy, DoubleProxy10, DoubleProxy, \ StringProxy, NumericProxy, ArithmeticProxy xsd11_atomic_types.update( (k, v) for k, v in xsd10_atomic_types.items() if k not in xsd11_atomic_types ) ### # Aliases for type annotations AtomicType = Union[str, int, float, Decimal, bool, AnyAtomicType] NumericType = Union[int, float, Decimal] ArithmeticType = Union[NumericType, AbstractDateTime, Duration, UntypedAtomic] DatetimeValueType = AbstractDateTime # keep until v5.0 for backward compatibility __all__ = ['xsd10_atomic_types', 'xsd11_atomic_types', 'AtomicTypeMeta', 'AnyAtomicType', 'NumericProxy', 'ArithmeticProxy', 'AbstractDateTime', 'DateTime10', 'DateTime', 'DateTimeStamp', 'Date10', 'Date', 'Time', 'GregorianDay', 'GregorianMonth', 'GregorianMonthDay', 'GregorianYear10', 'GregorianYear', 'GregorianYearMonth10', 'GregorianYearMonth', 'Timezone', 'Duration', 'YearMonthDuration', 'DayTimeDuration', 'StringProxy', 'NormalizedString', 'XsdToken', 'Language', 'Name', 'NCName', 'Id', 'Idref', 'Entity', 'NMToken', 
'Base64Binary', 'HexBinary', 'Float10', 'Float', 'Integer', 'NonPositiveInteger', 'NegativeInteger', 'Long', 'Int', 'Short', 'Byte', 'NonNegativeInteger', 'PositiveInteger', 'UnsignedLong', 'UnsignedInt', 'UnsignedShort', 'UnsignedByte', 'AnyURI', 'Notation', 'QName', 'BooleanProxy', 'DecimalProxy', 'DoubleProxy10', 'DoubleProxy', 'UntypedAtomic', 'AbstractBinary', 'AtomicType', 'DatetimeValueType', 'OrderedDateTime', 'AbstractQName', 'NumericType', 'ArithmeticType'] sissaschool-elementpath-d3688c7/elementpath/datatypes/atomic_types.py000066400000000000000000000073431476131650400263140ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from abc import ABCMeta, abstractmethod from typing import Any, Dict, Optional, Tuple, Type import re from elementpath._typing import Pattern XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema" ### # Classes for XSD built-in atomic types. All defined classes use a # metaclass that adds some common methods and registers each class # into a dictionary. Some classes of XSD primitive types are defined # as proxies of basic Python datatypes. xsd10_atomic_types: Dict[Optional[str], 'AtomicTypeMeta'] = {} """Dictionary of builtin XSD 1.0 atomic types.""" xsd11_atomic_types: Dict[Optional[str], 'AtomicTypeMeta'] = {} """Dictionary of builtin XSD 1.1 atomic types.""" class AtomicTypeMeta(ABCMeta): """ Metaclass for creating XSD atomic types. The created classes are decorated with missing attributes and methods. When a name attribute is provided the class is registered into a global map of XSD atomic types and also the expanded name is added. 
""" xsd_version: str pattern: Pattern[str] name: Optional[str] = None def __new__(mcs, class_name: str, bases: Tuple[Type[Any], ...], dict_: Dict[str, Any]) \ -> 'AtomicTypeMeta': try: name = dict_['name'] except KeyError: name = dict_['name'] = None # do not inherit name if name is not None and not isinstance(name, str): raise TypeError("attribute 'name' must be a string or None") dict_['is_valid'] = classmethod(mcs.is_valid) dict_['invalid_type'] = classmethod(mcs.invalid_type) dict_['invalid_value'] = classmethod(mcs.invalid_value) cls = super(AtomicTypeMeta, mcs).__new__(mcs, class_name, bases, dict_) # Add missing attributes and methods if not hasattr(cls, 'xsd_version'): cls.xsd_version = '1.0' if not hasattr(cls, 'pattern'): cls.pattern = re.compile(r'^$') # Register class with a name if name: expanded_name = '{%s}%s' % (XSD_NAMESPACE, name) if cls.xsd_version == '1.0': xsd10_atomic_types[name] = xsd10_atomic_types[expanded_name] = cls else: xsd11_atomic_types[name] = xsd11_atomic_types[expanded_name] = cls return cls def validate(cls: Type[Any], value: object) -> None: if isinstance(value, cls): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) def is_valid(cls: Type[Any], value: object) -> bool: try: cls.validate(value) except (TypeError, ValueError): return False else: return True def invalid_type(cls: Type[Any], value: object) -> TypeError: if cls.name: return TypeError('invalid type {!r} for xs:{}'.format(type(value), cls.name)) return TypeError('invalid type {!r} for {!r}'.format(type(value), cls)) def invalid_value(cls: Type[Any], value: object) -> ValueError: if cls.name: return ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return ValueError('invalid value {!r} for {!r}'.format(value, cls)) class AnyAtomicType(metaclass=AtomicTypeMeta): name = 'anyAtomicType' @abstractmethod def __init__(self, value: Any) -> None: raise NotImplementedError() 
sissaschool-elementpath-d3688c7/elementpath/datatypes/binary.py000066400000000000000000000136011476131650400250720ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re from abc import abstractmethod from typing import Any, Callable, Union import codecs from elementpath.helpers import collapse_white_spaces from .atomic_types import AnyAtomicType from .untyped import UntypedAtomic class AbstractBinary(AnyAtomicType): """ Abstract base class for xs:base64Binary and xs:hexBinary data. :param value: a string, binary data or an untyped atomic instance. :param ordered: a boolean that enables total ordering for the instance, `False` by default. """ value: bytes invalid_type: Callable[[Any], TypeError] def __init__(self, value: Union[str, bytes, UntypedAtomic, 'AbstractBinary'], ordered: bool = False) -> None: self.ordered = ordered if isinstance(value, self.__class__): self.value = value.value elif isinstance(value, AbstractBinary): self.value = self.encoder(value.decode()) else: if isinstance(value, UntypedAtomic): value = collapse_white_spaces(value.value) elif isinstance(value, str): value = collapse_white_spaces(value) elif isinstance(value, bytes): value = collapse_white_spaces(value.decode('utf-8')) else: raise self.invalid_type(value) self.validate(value) self.value = value.replace(' ', '').encode('ascii') def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, self.value) def __bytes__(self) -> bytes: return self.value @classmethod def validate(cls, value: object) -> None: raise NotImplementedError() @staticmethod @abstractmethod def encoder(value: bytes) -> bytes: raise NotImplementedError() @abstractmethod def decode(self) -> bytes: raise NotImplementedError() def
__eq__(self, other: object) -> bool: if isinstance(other, AbstractBinary): return self.decode() == other.decode() else: return NotImplemented def __lt__(self, other: object) -> bool: if not self.ordered or not isinstance(other, AbstractBinary): return NotImplemented for oct1, oct2 in zip(self.decode(), other.decode()): if oct1 != oct2: return oct1 < oct2 return len(self.decode()) < len(other.decode()) def __le__(self, other: object) -> bool: if not self.ordered or not isinstance(other, AbstractBinary): return NotImplemented for oct1, oct2 in zip(self.decode(), other.decode()): if oct1 != oct2: return oct1 < oct2 return len(self.decode()) <= len(other.decode()) def __gt__(self, other: object) -> bool: if not self.ordered or not isinstance(other, AbstractBinary): return NotImplemented for oct1, oct2 in zip(self.decode(), other.decode()): if oct1 != oct2: return oct1 > oct2 return len(self.decode()) > len(other.decode()) def __ge__(self, other: object) -> bool: if not self.ordered or not isinstance(other, AbstractBinary): return NotImplemented for oct1, oct2 in zip(self.decode(), other.decode()): if oct1 != oct2: return oct1 > oct2 return len(self.decode()) >= len(other.decode()) class Base64Binary(AbstractBinary): name = 'base64Binary' pattern = re.compile( r'((?:(?:[A-Za-z0-9+/] ?){4})*(?:(?:[A-Za-z0-9+/] ?){3}[A-Za-z0-9+/]|' r'(?:[A-Za-z0-9+/] ?){2}' r'[AEIMQUYcgkosw048] ?=|[A-Za-z0-9+/] ?[AQgw] ?= ?=))?' 
) @classmethod def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, bytes): value = value.decode() elif not isinstance(value, str): raise cls.invalid_type(value) value = value.replace(' ', '') if value: match = cls.pattern.match(value) if match is None or match.group(0) != value: raise cls.invalid_value(value) def __str__(self) -> str: return self.value.decode('utf-8') def __hash__(self) -> int: return hash(self.value) def __len__(self) -> int: length = len(self.value) if length == 0: return 0 elif self.value[-2] == ord('='): return length // 4 * 3 - 2 elif self.value[-1] == ord('='): return length // 4 * 3 - 1 return length // 4 * 3 @staticmethod def encoder(value: bytes) -> bytes: return codecs.encode(value, 'base64').rstrip(b'\n') def decode(self) -> bytes: return codecs.decode(self.value, 'base64') class HexBinary(AbstractBinary): name = 'hexBinary' pattern = re.compile(r'^([0-9a-fA-F]{2})*$') @classmethod def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, bytes): value = value.decode() elif not isinstance(value, str): raise cls.invalid_type(value) value = value.strip() if cls.pattern.match(value) is None: raise cls.invalid_value(value) @staticmethod def encoder(value: bytes) -> bytes: return codecs.encode(value, 'hex') def decode(self) -> bytes: return codecs.decode(self.value, 'hex') def __str__(self) -> str: return self.value.decode('utf-8').upper() def __hash__(self) -> int: return hash(self.value.upper()) def __len__(self) -> int: return len(self.value) // 2 sissaschool-elementpath-d3688c7/elementpath/datatypes/datetime.py000066400000000000000000001171771476131650400254170ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from abc import abstractmethod import re import math import operator import datetime from calendar import isleap from decimal import Decimal, Context from typing import cast, Any, Dict, Optional, Tuple, Type, TypeVar, Union from elementpath._typing import Callable from elementpath.helpers import MONTH_DAYS_LEAP, MONTH_DAYS, DAYS_IN_4Y, \ DAYS_IN_100Y, DAYS_IN_400Y, days_from_common_era, adjust_day, \ normalized_seconds, months2days, round_number from .atomic_types import AnyAtomicType from .untyped import UntypedAtomic class Timezone(datetime.tzinfo): """ A tzinfo implementation for XSD timezone offsets. Offsets must be specified between -14:00 and +14:00. :param offset: a timedelta instance. Use the `fromstring()` class method to \ create an instance from an XSD timezone formatted string. """ _maxoffset = datetime.timedelta(hours=14, minutes=0) _minoffset = -_maxoffset def __init__(self, offset: datetime.timedelta) -> None: super(Timezone, self).__init__() if not isinstance(offset, datetime.timedelta): raise TypeError("offset must be a datetime.timedelta") if offset < self._minoffset or offset > self._maxoffset: raise ValueError("offset must be between -14:00 and +14:00") self.offset = offset @classmethod def fromstring(cls, text: str) -> 'Timezone': try: hours, minutes = text.strip().split(':') if hours.startswith('-'): return cls(datetime.timedelta(hours=int(hours), minutes=-int(minutes))) else: return cls(datetime.timedelta(hours=int(hours), minutes=int(minutes))) except AttributeError: raise TypeError("argument is not a string") except ValueError: if text.strip() == 'Z': return cls(datetime.timedelta(0)) raise ValueError("%r: not an XSD timezone formatted string" % text) from None @classmethod def fromduration(cls, duration: 'Duration') -> 'Timezone': if duration.seconds % 60 != 0: raise ValueError("{!r} has not an integral number of minutes".format(duration)) return
cls(datetime.timedelta(seconds=int(duration.seconds))) def __getinitargs__(self) -> Tuple[datetime.timedelta]: return self.offset, def __hash__(self) -> int: return hash(self.offset) def __eq__(self, other: object) -> bool: return isinstance(other, Timezone) and self.offset == other.offset def __ne__(self, other: object) -> bool: return not isinstance(other, Timezone) or self.offset != other.offset def __repr__(self) -> str: return "%s(%r)" % (self.__class__.__name__, self.offset) def __str__(self) -> str: return self.tzname(None) def utcoffset(self, dt: Optional[datetime.datetime]) -> datetime.timedelta: if not isinstance(dt, datetime.datetime) and dt is not None: raise TypeError("utcoffset() argument must be a " "datetime.datetime instance or None") return self.offset def tzname(self, dt: Optional[datetime.datetime]) -> str: if not isinstance(dt, datetime.datetime) and dt is not None: raise TypeError("tzname() argument must be a " "datetime.datetime instance or None") if not self.offset: return 'Z' elif self.offset < datetime.timedelta(0): sign, offset = '-', -self.offset else: sign, offset = '+', self.offset hours, minutes = offset.seconds // 3600, offset.seconds // 60 % 60 return '{}{:02d}:{:02d}'.format(sign, hours, minutes) def dst(self, dt: Optional[datetime.datetime]) -> None: if not isinstance(dt, datetime.datetime) and dt is not None: raise TypeError("dst() argument must be a " "datetime.datetime instance or None") def fromutc(self, dt: datetime.datetime) -> datetime.datetime: if isinstance(dt, datetime.datetime): return dt + self.offset raise TypeError("fromutc() argument must be a datetime.datetime instance") _DT = TypeVar('_DT', bound='AbstractDateTime') class AbstractDateTime(AnyAtomicType): """ A class for representing XSD date/time objects. It uses an internal datetime.datetime attribute and an integer attribute for processing BCE years or for years after 9999 CE.
""" xsd_version = '1.0' pattern = re.compile(r'^$') _utc_timezone = Timezone(datetime.timedelta(0)) _year = None def __init__(self, year: int = 2000, month: int = 1, day: int = 1, hour: int = 0, minute: int = 0, second: int = 0, microsecond: int = 0, tzinfo: Optional[datetime.tzinfo] = None) -> None: if hour == 24 and minute == second == microsecond == 0: hour = 0 if year == 9999 and month == 12 and day == 31: delta = datetime.timedelta(0) year = 10000 month = 1 day = 1 else: delta = datetime.timedelta(days=1) hour = 0 else: delta = datetime.timedelta(0) if 1 <= year <= 9999: self._dt = datetime.datetime(year, month, day, hour, minute, second, microsecond, tzinfo) elif year == 0: raise ValueError('0 is an illegal value for year') elif not isinstance(year, int): raise TypeError("invalid type %r for year" % type(year)) elif abs(year) > 2 ** 31: raise OverflowError("year overflow") else: self._year = year if isleap(year + bool(self.xsd_version != '1.0')): self._dt = datetime.datetime(4, month, day, hour, minute, second, microsecond, tzinfo) else: self._dt = datetime.datetime(6, month, day, hour, minute, second, microsecond, tzinfo) if delta: self._dt += delta def __repr__(self) -> str: fields = self.pattern.groupindex.keys() arg_string = ', '.join( str(getattr(self, k)) for k in ['year', 'month', 'day', 'hour', 'minute'] if k in fields ) if 'second' in fields: if self.microsecond: arg_string += ', %d.%06d' % (self.second, self.microsecond) else: arg_string += ', %d' % self.second if self.tzinfo is not None: arg_string += ', tzinfo=%r' % self.tzinfo return '%s(%s)' % (self.__class__.__name__, arg_string) @abstractmethod def __str__(self) -> str: raise NotImplementedError @abstractmethod def __lt__(self, other: object) -> bool: raise NotImplementedError @abstractmethod def __le__(self, other: object) -> bool: raise NotImplementedError @abstractmethod def __gt__(self, other: object) -> bool: raise NotImplementedError @abstractmethod def __ge__(self, other: object) -> 
bool: raise NotImplementedError @abstractmethod def __add__(self, other: object) -> Any: raise NotImplementedError @abstractmethod def __sub__(self, other: object) -> Any: raise NotImplementedError @property def year(self) -> int: return self._year or self._dt.year @property def bce(self) -> bool: return self._year is not None and self._year < 0 @property def iso_year(self) -> str: """The ISO string representation of the year field.""" year = self.year if -9999 <= year < -1: return '{:05}'.format(year if self.xsd_version == '1.0' else year + 1) elif year == -1: return '-0001' if self.xsd_version == '1.0' else '0000' elif 0 <= year <= 9999: return '{:04}'.format(year) else: return str(year) @property def month(self) -> int: return self._dt.month @property def day(self) -> int: return self._dt.day @property def hour(self) -> int: return self._dt.hour @property def minute(self) -> int: return self._dt.minute @property def second(self) -> int: return self._dt.second @property def microsecond(self) -> int: return self._dt.microsecond @property def tzinfo(self) -> Optional[Timezone]: return cast(Timezone, self._dt.tzinfo) @tzinfo.setter def tzinfo(self, tz: Timezone) -> None: self._dt = self._dt.replace(tzinfo=tz) def tzname(self) -> Optional[str]: return self._dt.tzname() def astimezone(self, tz: Optional[datetime.tzinfo] = None) -> datetime.datetime: return self._dt.astimezone(tz) def isocalendar(self) -> Tuple[int, int, int]: return self._dt.isocalendar() @classmethod def fromstring(cls: Type[_DT], datetime_string: str, tzinfo: Optional[Timezone] = None) \ -> _DT: """ Creates an XSD date/time instance from a string formatted value. :param datetime_string: a string containing an XSD formatted date/time specification. :param tzinfo: optional implicit timezone information, must be a `Timezone` instance. :return: an AbstractDateTime concrete subclass instance. 
""" if not isinstance(datetime_string, str): msg = '1st argument has an invalid type {!r}' raise TypeError(msg.format(type(datetime_string))) elif tzinfo and not isinstance(tzinfo, Timezone): msg = '2nd argument has an invalid type {!r}' raise TypeError(msg.format(type(tzinfo))) match = cls.pattern.match(datetime_string.strip()) if match is None: msg = 'Invalid datetime string {!r} for {!r}' raise ValueError(msg.format(datetime_string, cls)) match_dict = match.groupdict() kwargs: Dict[str, int] = { k: int(v) for k, v in match_dict.items() if k != 'tzinfo' and v is not None } if match_dict['tzinfo'] is not None: tzinfo = Timezone.fromstring(match_dict['tzinfo']) if 'microsecond' in kwargs: microseconds = match_dict['microsecond'] if len(microseconds) != 6: microseconds += '0' * (6 - len(microseconds)) kwargs['microsecond'] = int(microseconds[:6]) if 'year' in kwargs: year_digits = match_dict['year'].lstrip('-') if year_digits.startswith('0') and len(year_digits) > 4: msg = "Invalid datetime string {!r} for {!r} (when year " \ "exceeds 4 digits leading zeroes are not allowed)" raise ValueError(msg.format(datetime_string, cls)) if cls.xsd_version == '1.0': if kwargs['year'] == 0: raise ValueError("year '0000' is an illegal value for XSD 1.0") elif kwargs['year'] <= 0: kwargs['year'] -= 1 return cls(tzinfo=tzinfo, **kwargs) @classmethod def fromdatetime(cls: Type[_DT], dt: Union[datetime.datetime, datetime.date, datetime.time], year: Optional[int] = None) -> _DT: """ Creates an XSD date/time instance from a datetime.datetime/date/time instance. :param dt: the datetime, date or time instance that stores the XSD Date/Time value. :param year: if a year is provided the created instance refers to it and the \ possibly present *dt.year* part is ignored. :return: an AbstractDateTime concrete subclass instance. 
""" if not isinstance(dt, (datetime.datetime, datetime.date, datetime.time)): raise TypeError('1st argument has an invalid type %r' % type(dt)) elif year is not None and not isinstance(year, int): raise TypeError('2nd argument has an invalid type %r' % type(year)) kwargs = {k: getattr(dt, k) for k in cls.pattern.groupindex.keys() if hasattr(dt, k)} if year is not None: kwargs['year'] = year return cls(**kwargs) # Python can't compare offset-naive and offset-aware datetimes def _get_operands(self, other: object) -> Tuple[datetime.datetime, datetime.datetime]: if isinstance(other, (self.__class__, datetime.datetime)) or \ isinstance(self, other.__class__): dt: datetime.datetime = getattr(other, '_dt', cast(datetime.datetime, other)) if self._dt.tzinfo is dt.tzinfo: return self._dt, dt elif self.tzinfo is None: return self._dt.replace(tzinfo=self._utc_timezone), dt elif dt.tzinfo is None: return self._dt, dt.replace(tzinfo=self._utc_timezone) else: return self._dt, dt else: raise TypeError("wrong type %r for operand %r" % (type(other), other)) def __hash__(self) -> int: return hash((self._dt, self._year)) def __eq__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return False try: return operator.eq(*self._get_operands(other)) and self.year == other.year except TypeError: return False def __ne__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return True try: return operator.ne(*self._get_operands(other)) or self.year != other.year except TypeError: return True class OrderedDateTime(AbstractDateTime): @abstractmethod def __str__(self) -> str: raise NotImplementedError @classmethod def fromdelta(cls, delta: datetime.timedelta, adjust_timezone: bool = False) \ -> 'OrderedDateTime': """ Creates an XSD dateTime/date instance from a datetime.timedelta related to 0001-01-01T00:00:00 CE. In case of a date the time part is not counted. :param delta: a datetime.timedelta instance. 
:param adjust_timezone: if `True` adjusts the timezone of Date objects \ using the hours and minutes of the result, if present. """ try: dt = datetime.datetime(1, 1, 1) + delta except OverflowError: days = delta.days if days > 0: y400, days = divmod(days, DAYS_IN_400Y) y100, days = divmod(days, DAYS_IN_100Y) y4, days = divmod(days, DAYS_IN_4Y) y1, days = divmod(days, 365) year = y400 * 400 + y100 * 100 + y4 * 4 + y1 + 1 if y1 == 4 or y100 == 4: year -= 1 days = 365 td = datetime.timedelta(days=days, seconds=delta.seconds, microseconds=delta.microseconds) dt = datetime.datetime(4 if isleap(year) else 6, 1, 1) + td elif days >= -366: year = -1 td = datetime.timedelta(days=days, seconds=delta.seconds, microseconds=delta.microseconds) dt = datetime.datetime(5, 1, 1) + td else: days = -days - 366 y400, days = divmod(days, DAYS_IN_400Y) y100, days = divmod(days, DAYS_IN_100Y) y4, days = divmod(days, DAYS_IN_4Y) y1, days = divmod(days, 365) year = -y400 * 400 - y100 * 100 - y4 * 4 - y1 - 2 if y1 == 4 or y100 == 4: year += 1 days = 365 td = datetime.timedelta(days=-days, seconds=delta.seconds, microseconds=delta.microseconds) if not td: dt = datetime.datetime(4 if isleap(year + 1) else 6, 1, 1) year += 1 else: dt = datetime.datetime(5 if isleap(year + 1) else 7, 1, 1) + td else: year = dt.year if issubclass(cls, Date10): if adjust_timezone and (dt.hour or dt.minute): assert dt.tzinfo is None hour, minute = dt.hour, dt.minute if hour < 14 or hour == 14 and minute == 0: tz = Timezone(datetime.timedelta(hours=-hour, minutes=-minute)) dt = dt.replace(tzinfo=tz) else: tz = Timezone(datetime.timedelta(hours=-dt.hour + 24, minutes=-minute)) dt = dt.replace(tzinfo=tz) dt += datetime.timedelta(days=1) return cls(year, dt.month, dt.day, tzinfo=dt.tzinfo) return cls(year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) def todelta(self) -> datetime.timedelta: """Returns the datetime.timedelta from 0001-01-01T00:00:00 CE.""" if self._year is None: delta =
operator.sub(*self._get_operands(datetime.datetime(1, 1, 1))) return cast(datetime.timedelta, delta) year, dt = self.year, self._dt tzinfo = None if dt.tzinfo is None else self._utc_timezone if year > 0: m_days = MONTH_DAYS_LEAP if isleap(year) else MONTH_DAYS days = days_from_common_era(year - 1) + sum(m_days[m] for m in range(1, dt.month)) else: m_days = MONTH_DAYS_LEAP if isleap(year + 1) else MONTH_DAYS days = days_from_common_era(year) + sum(m_days[m] for m in range(1, dt.month)) delta = (dt - datetime.datetime(dt.year, dt.month, day=1, tzinfo=tzinfo)) return datetime.timedelta(days=days, seconds=delta.total_seconds()) def _date_operator(self, op: Callable[[Any, Any], Any], other: object) \ -> Union['DayTimeDuration', 'OrderedDateTime']: if isinstance(other, self.__class__): dt1, dt2 = self._get_operands(other) if self._year is None and other._year is None: return DayTimeDuration.fromtimedelta(dt1 - dt2) return DayTimeDuration.fromtimedelta(self.todelta() - other.todelta()) elif isinstance(other, datetime.timedelta): delta = op(self.todelta(), other) return type(self).fromdelta(delta, adjust_timezone=True) elif isinstance(other, DayTimeDuration): delta = op(self.todelta(), other.get_timedelta()) tzinfo = cast(Optional[Timezone], self._dt.tzinfo) if tzinfo is None: return type(self).fromdelta(delta) value = type(self).fromdelta(delta + tzinfo.offset) value.tzinfo = tzinfo return value elif isinstance(other, YearMonthDuration): month = op(self._dt.month - 1, other.months) % 12 + 1 year = self.year + op(self._dt.month - 1, other.months) // 12 day = adjust_day(year, month, self._dt.day) if year > 0: dt = self._dt.replace(year=year, month=month, day=day) elif isleap(year): dt = self._dt.replace(year=4, month=month, day=day) else: dt = self._dt.replace(year=6, month=month, day=day) kwargs = {k: getattr(dt, k) for k in self.pattern.groupindex.keys()} if year <= 0: kwargs['year'] = year return type(self)(**kwargs) else: raise TypeError("wrong type %r for operand %r" % 
(type(other), other)) def __lt__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return NotImplemented dt1, dt2 = self._get_operands(other) y1, y2 = self.year, other.year return y1 < y2 or y1 == y2 and dt1 < dt2 def __le__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return NotImplemented dt1, dt2 = self._get_operands(other) y1, y2 = self.year, other.year return y1 < y2 or y1 == y2 and dt1 <= dt2 def __gt__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return NotImplemented dt1, dt2 = self._get_operands(other) y1, y2 = self.year, other.year return y1 > y2 or y1 == y2 and dt1 > dt2 def __ge__(self, other: object) -> bool: if not isinstance(other, (AbstractDateTime, datetime.datetime)): return NotImplemented dt1, dt2 = self._get_operands(other) y1, y2 = self.year, other.year return y1 > y2 or y1 == y2 and dt1 >= dt2 def __add__(self, other: object) -> Union['DayTimeDuration', 'OrderedDateTime']: if isinstance(other, OrderedDateTime): raise TypeError("wrong type %r for operand %r" % (type(other), other)) return self._date_operator(operator.add, other) def __sub__(self, other: object) -> Union['DayTimeDuration', 'OrderedDateTime']: return self._date_operator(operator.sub, other) class DateTime10(OrderedDateTime): """XSD 1.0 xs:dateTime builtin type""" name = 'dateTime' pattern = re.compile( r'^(?P<year>-?[0-9]*[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})' r'(T(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):' r'(?P<second>[0-9]{2})(?:\.(?P<microsecond>[0-9]+))?)' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, year: int, month: int, day: int, hour: int = 0, minute: int = 0, second: int = 0, microsecond: int = 0, tzinfo: Optional[datetime.tzinfo] = None) -> None: super(DateTime10, self).__init__( year, month, day, hour, minute, second, microsecond, tzinfo ) def __str__(self) -> str: if self.microsecond: return '{}-{:02}-{:02}T{:02}:{:02}:{:02}.{}{}'.format(
self.iso_year, self.month, self.day, self.hour, self.minute, self.second, '{:06}'.format(self.microsecond).rstrip('0'), str(self.tzinfo or '') ) return '{}-{:02}-{:02}T{:02}:{:02}:{:02}{}'.format( self.iso_year, self.month, self.day, self.hour, self.minute, self.second, str(self.tzinfo or '') ) class DateTime(DateTime10): """XSD 1.1 xs:dateTime builtin type""" name = 'dateTime' xsd_version = '1.1' class DateTimeStamp(DateTime): """XSD 1.1 xs:dateTimeStamp builtin type""" name = 'dateTimeStamp' pattern = re.compile( r'^(?P<year>-?[0-9]*[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})' r'(T(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):' r'(?P<second>[0-9]{2})(?:\.(?P<microsecond>[0-9]+))?)' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))$') class Date10(OrderedDateTime): """XSD 1.0 xs:date builtin type""" name = 'date' pattern = re.compile(r'^(?P<year>-?[0-9]*[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, year: int, month: int, day: int, tzinfo: Optional[datetime.tzinfo] = None) -> None: super(Date10, self).__init__(year, month, day, tzinfo=tzinfo) def __str__(self) -> str: return '{}-{:02}-{:02}{}'.format( self.iso_year, self.month, self.day, str(self.tzinfo or '') ) class Date(Date10): """XSD 1.1 xs:date builtin type""" name = 'date' xsd_version = '1.1' class GregorianDay(OrderedDateTime): """XSD xs:gDay builtin type""" name = 'gDay' pattern = re.compile(r'^---(?P<day>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, day: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianDay, self).__init__(day=day, tzinfo=tzinfo) def __str__(self) -> str: return '---{:02}{}'.format(self.day, str(self.tzinfo or '')) class GregorianMonth(OrderedDateTime): """XSD xs:gMonth builtin type""" name = 'gMonth' pattern = re.compile(r'^--(?P<month>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, month: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianMonth, self).__init__(month=month, tzinfo=tzinfo) def
__str__(self) -> str: return '--{:02}{}'.format(self.month, str(self.tzinfo or '')) class GregorianMonthDay(OrderedDateTime): """XSD xs:gMonthDay builtin type""" name = 'gMonthDay' pattern = re.compile(r'^--(?P<month>[0-9]{2})-(?P<day>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, month: int, day: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianMonthDay, self).__init__(month=month, day=day, tzinfo=tzinfo) def __str__(self) -> str: return '--{:02}-{:02}{}'.format(self.month, self.day, str(self.tzinfo or '')) class GregorianYear10(OrderedDateTime): """XSD 1.0 xs:gYear builtin type""" name = 'gYear' pattern = re.compile(r'^(?P<year>-?[0-9]*[0-9]{4})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, year: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianYear10, self).__init__(year, tzinfo=tzinfo) def __str__(self) -> str: return '{}{}'.format(self.iso_year, str(self.tzinfo or '')) class GregorianYear(GregorianYear10): """XSD 1.1 xs:gYear builtin type""" name = 'gYear' xsd_version = '1.1' class GregorianYearMonth10(OrderedDateTime): """XSD 1.0 xs:gYearMonth builtin type""" name = 'gYearMonth' pattern = re.compile(r'^(?P<year>-?[0-9]*[0-9]{4})-(?P<month>[0-9]{2})' r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, year: int, month: int, tzinfo: Optional[Timezone] = None) -> None: super(GregorianYearMonth10, self).__init__(year, month, tzinfo=tzinfo) def __str__(self) -> str: return '{}-{:02}{}'.format(self.iso_year, self.month, str(self.tzinfo or '')) class GregorianYearMonth(GregorianYearMonth10): """XSD 1.1 xs:gYearMonth builtin type""" name = 'gYearMonth' xsd_version = '1.1' class Time(AbstractDateTime): """XSD xs:time builtin type""" name = 'time' pattern = re.compile( r'^(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2}):' r'(?P<second>[0-9]{2})(?:\.(?P<microsecond>[0-9]+))?'
r'(?P<tzinfo>Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?$') def __init__(self, hour: int = 0, minute: int = 0, second: int = 0, microsecond: int = 0, tzinfo: Union[None, Timezone, datetime.tzinfo] = None) -> None: if hour == 24 and minute == second == microsecond == 0: hour = 0 super(Time, self).__init__( hour=hour, minute=minute, second=second, microsecond=microsecond, tzinfo=tzinfo ) def __str__(self) -> str: if self.microsecond: return '{:02}:{:02}:{:02}.{}{}'.format( self.hour, self.minute, self.second, '{:06}'.format(self.microsecond).rstrip('0'), str(self.tzinfo or '') ) return '{:02}:{:02}:{:02}{}'.format( self.hour, self.minute, self.second, str(self.tzinfo or '') ) def __lt__(self, other: object) -> bool: return cast(bool, operator.lt(*self._get_operands(other))) def __le__(self, other: object) -> bool: return cast(bool, operator.le(*self._get_operands(other))) def __gt__(self, other: object) -> bool: return cast(bool, operator.gt(*self._get_operands(other))) def __ge__(self, other: object) -> bool: return cast(bool, operator.ge(*self._get_operands(other))) def __add__(self, other: object) -> 'Time': if isinstance(other, DayTimeDuration): dt = self._dt + other.get_timedelta() elif isinstance(other, datetime.timedelta): dt = self._dt + other else: raise TypeError("wrong type %r for operand %r" % (type(other), other)) return Time(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) def __sub__(self, other: object) -> Union['DayTimeDuration', 'Time']: if isinstance(other, self.__class__): delta = operator.sub(*self._get_operands(other)) return DayTimeDuration.fromtimedelta(delta) elif isinstance(other, DayTimeDuration): dt = self._dt - other.get_timedelta() return Time(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) elif isinstance(other, datetime.timedelta): dt = self._dt - other return Time(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) else: raise TypeError("wrong type %r for operand %r" % (type(other), other)) _D = TypeVar('_D',
bound='Duration') class Duration(AnyAtomicType): """ Base class for the XSD duration types. :param months: an integer value that represents years and months. :param seconds: a decimal or an integer instance that represents \ days, hours, minutes, seconds and fractions of seconds. """ name = 'duration' pattern = re.compile( r'^(-)?P(?=[0-9]|T)(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?' r'(?:T(?=[0-9])(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]+)?)S)?)?$' ) def __init__(self, months: int = 0, seconds: Union[Decimal, int] = 0) -> None: if seconds < 0 < months or months < 0 < seconds: raise ValueError('signs differ: (months=%d, seconds=%d)' % (months, seconds)) elif abs(months) > 2 ** 31: raise OverflowError("months duration overflow") elif abs(seconds) > 2 ** 63: # type: ignore[operator] raise OverflowError("seconds duration overflow") self.months = months self.seconds = Decimal(seconds).quantize(Decimal('1.000000', context=Context(prec=30))) def __repr__(self) -> str: return '{}(months={!r}, seconds={})'.format( self.__class__.__name__, self.months, normalized_seconds(self.seconds) ) def __str__(self) -> str: m = abs(self.months) years, months = m // 12, m % 12 s = self.seconds.copy_abs() days = int(s // 86400) hours = int(s // 3600 % 24) minutes = int(s // 60 % 60) seconds = s % 60 value = '-P' if self.sign else 'P' if years or months or days: if years: value += '%dY' % years if months: value += '%dM' % months if days: value += '%dD' % days if hours or minutes or seconds: value += 'T' if hours: value += '%dH' % hours if minutes: value += '%dM' % minutes if seconds: value += '%sS' % normalized_seconds(seconds) elif value[-1] == 'P': value += 'T0S' return value @classmethod def fromstring(cls: Type[_D], text: str) -> _D: """ Creates a Duration instance from a formatted XSD duration string. :param text: an ISO 8601 representation without week fragment and an optional decimal part \ only for seconds fragment. 
""" if not isinstance(text, str): msg = 'argument has an invalid type {!r}' raise TypeError(msg.format(type(text))) match = cls.pattern.match(text.strip()) if match is None: raise ValueError('%r is not an xs:duration value' % text) sign, y, mo, d, h, mi, s = match.groups() seconds = Decimal(s or 0) minutes = int(mi or 0) + int(seconds // 60) seconds = seconds % 60 hours = int(h or 0) + minutes // 60 minutes = minutes % 60 days = int(d or 0) + hours // 24 hours = hours % 24 months = int(mo or 0) + 12 * int(y or 0) if sign is None: seconds = seconds + (days * 24 + hours) * 3600 + minutes * 60 else: months = -months seconds = -seconds - (days * 24 + hours) * 3600 - minutes * 60 if cls is DayTimeDuration: if months: raise ValueError('months must be 0 for %r' % cls.__name__) return cls(seconds=seconds) elif cls is YearMonthDuration: if seconds: raise ValueError('seconds must be 0 for %r' % cls.__name__) return cls(months=months) return cls(months=months, seconds=seconds) @property def sign(self) -> str: return '-' if self.months < 0 or self.seconds < 0 else '' def _compare_durations(self, other: object, op: Callable[[Any, Any], Any]) -> bool: """ Ordering is defined through comparison of four datetime.datetime values. 
Ref: https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/#duration """ if not isinstance(other, self.__class__): raise TypeError("wrong type %r for operand %r" % (type(other), other)) m1, s1 = self.months, int(self.seconds) m2, s2 = other.months, int(other.seconds) ms1, ms2 = int((self.seconds - s1) * 1000000), int((other.seconds - s2) * 1000000) return all([ op(datetime.timedelta(months2days(1696, 9, m1), s1, ms1), datetime.timedelta(months2days(1696, 9, m2), s2, ms2)), op(datetime.timedelta(months2days(1697, 2, m1), s1, ms1), datetime.timedelta(months2days(1697, 2, m2), s2, ms2)), op(datetime.timedelta(months2days(1903, 3, m1), s1, ms1), datetime.timedelta(months2days(1903, 3, m2), s2, ms2)), op(datetime.timedelta(months2days(1903, 7, m1), s1, ms1), datetime.timedelta(months2days(1903, 7, m2), s2, ms2)), ]) def __hash__(self) -> int: return hash((self.months, self.seconds)) def __eq__(self, other: object) -> bool: if isinstance(other, self.__class__): return self.months == other.months and self.seconds == other.seconds elif isinstance(other, UntypedAtomic): return self.__eq__(self.fromstring(other.value)) else: return other == (self.months, self.seconds) def __ne__(self, other: object) -> bool: if isinstance(other, self.__class__): return self.months != other.months or self.seconds != other.seconds elif isinstance(other, UntypedAtomic): return self.__ne__(self.fromstring(other.value)) else: return other != (self.months, self.seconds) def __lt__(self, other: object) -> bool: return self._compare_durations(other, operator.lt) def __le__(self, other: object) -> bool: return self == other or self._compare_durations(other, operator.le) def __gt__(self, other: object) -> bool: return self._compare_durations(other, operator.gt) def __ge__(self, other: object) -> bool: return self == other or self._compare_durations(other, operator.ge) class YearMonthDuration(Duration): name = 'yearMonthDuration' def __init__(self, months: int = 0) -> None: super(YearMonthDuration, 
self).__init__(months, 0) def __repr__(self) -> str: return '%s(months=%r)' % (self.__class__.__name__, self.months) def __str__(self) -> str: m = abs(self.months) years, months = m // 12, m % 12 if not years: return '-P%dM' % months if self.months < 0 else 'P%dM' % months elif not months: return '-P%dY' % years if self.months < 0 else 'P%dY' % years elif self.months < 0: return '-P%dY%dM' % (years, months) else: return 'P%dY%dM' % (years, months) def __add__(self, other: object) \ -> Union['YearMonthDuration', 'DayTimeDuration', 'OrderedDateTime']: if isinstance(other, self.__class__): return YearMonthDuration(months=self.months + other.months) elif isinstance(other, (DateTime10, Date10)): return other + self raise TypeError("cannot add %r to %r" % (type(other), type(self))) def __sub__(self, other: object) -> 'YearMonthDuration': if not isinstance(other, self.__class__): raise TypeError("cannot subtract %r from %r" % (type(other), type(self))) return YearMonthDuration(months=self.months - other.months) def __mul__(self, other: object) -> 'YearMonthDuration': if not isinstance(other, (float, int, Decimal)): raise TypeError("cannot multiply a %r by %r" % (type(self), type(other))) return YearMonthDuration(months=int(round_number(self.months * other))) def __truediv__(self, other: object) -> Union[float, 'YearMonthDuration']: if isinstance(other, self.__class__): return self.months / other.months elif isinstance(other, (float, int, Decimal)): return YearMonthDuration(months=int(round_number(self.months / other))) else: raise TypeError("cannot divide a %r by %r" % (type(self), type(other))) class DayTimeDuration(Duration): name = 'dayTimeDuration' def __init__(self, seconds: Union[Decimal, int] = 0) -> None: super(DayTimeDuration, self).__init__(0, seconds) @classmethod def fromtimedelta(cls, td: datetime.timedelta) -> 'DayTimeDuration': return cls(seconds=Decimal( '{}.{:06}'.format(td.days * 86400 + td.seconds, td.microseconds) )) def get_timedelta(self) -> 
datetime.timedelta: return datetime.timedelta( seconds=int(self.seconds), microseconds=int(self.seconds % 1 * 1000000) ) def __repr__(self) -> str: return '%s(seconds=%s)' % (self.__class__.__name__, normalized_seconds(self.seconds)) def __add__(self, other: object) -> Union['DayTimeDuration', Time, OrderedDateTime]: if isinstance(other, (Time, Date10)): return other + self elif isinstance(other, self.__class__): return DayTimeDuration(self.seconds + other.seconds) raise TypeError("cannot add %r to %r" % (type(other), type(self))) def __sub__(self, other: object) -> 'DayTimeDuration': if not isinstance(other, self.__class__): raise TypeError("cannot subtract %r from %r" % (type(other), type(self))) return DayTimeDuration(seconds=self.seconds - other.seconds) def __mul__(self, other: object) -> 'DayTimeDuration': if isinstance(other, (float, int, Decimal)): if math.isnan(other): raise ValueError("cannot multiply a %r by NaN" % type(self)) if isinstance(other, (int, Decimal)): seconds = self.seconds * other else: seconds = self.seconds * Decimal.from_float(other) return DayTimeDuration(seconds) else: raise TypeError("cannot multiply a %r by %r" % (type(self), type(other))) def __truediv__(self, other: object) -> Union[Decimal, 'DayTimeDuration']: if isinstance(other, self.__class__): return self.seconds / other.seconds elif isinstance(other, (float, int, Decimal)): if math.isnan(other): raise ValueError("cannot divide a %r by NaN" % type(self)) if isinstance(other, (int, Decimal)): seconds = self.seconds / other else: seconds = self.seconds / Decimal.from_float(other) return DayTimeDuration(seconds) else: raise TypeError("cannot divide a %r by %r" % (type(self), type(other))) sissaschool-elementpath-d3688c7/elementpath/datatypes/numeric.py000066400000000000000000000213271476131650400252540ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. 
# This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import math import re from typing import Any, Optional, SupportsFloat, SupportsInt, Union, Type from elementpath.helpers import NUMERIC_INF_OR_NAN, INVALID_NUMERIC, collapse_white_spaces from .atomic_types import AnyAtomicType class Float10(float, AnyAtomicType): name = 'float' xsd_version = '1.0' pattern = re.compile( r'^(?:[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)? |[+-]?INF|NaN)$' ) def __new__(cls, value: Union[str, SupportsFloat]) -> 'Float10': if isinstance(value, str): value = collapse_white_spaces(value) if value in NUMERIC_INF_OR_NAN or cls.xsd_version != '1.0' and value == '+INF': if value == 'NaN': try: return float_nan except NameError: pass elif value.lower() in INVALID_NUMERIC: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) elif math.isnan(value): try: return float_nan except NameError: # pragma: no cover pass _value = super().__new__(cls, value) if _value > 3.4028235E38: return super().__new__(cls, 'INF') elif _value < -3.4028235E38: return super().__new__(cls, '-INF') elif -1e-37 < _value < 1e-37: return super().__new__(cls, -0.0 if str(_value).startswith('-') else 0.0) return _value def __init__(self, value: Union[str, SupportsFloat]) -> None: float.__init__(self) def __hash__(self) -> int: return super(Float10, self).__hash__() def __eq__(self, other: object) -> bool: if isinstance(other, self.__class__): if super(Float10, self).__eq__(other): return True return math.isclose(self, other, rel_tol=1e-7, abs_tol=0.0) return super(Float10, self).__eq__(other) def __ne__(self, other: object) -> bool: if isinstance(other, self.__class__): if super(Float10, self).__eq__(other): return False return not math.isclose(self, other, rel_tol=1e-7, abs_tol=0.0) return super(Float10, self).__ne__(other) def 
__add__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__add__(other)) elif isinstance(other, float): return super(Float10, self).__add__(other) return NotImplemented def __radd__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__radd__(other)) elif isinstance(other, float): return super(Float10, self).__radd__(other) return NotImplemented def __sub__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__sub__(other)) elif isinstance(other, float): return super(Float10, self).__sub__(other) return NotImplemented def __rsub__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__rsub__(other)) elif isinstance(other, float): return super(Float10, self).__rsub__(other) return NotImplemented def __mul__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__mul__(other)) elif isinstance(other, float): return super(Float10, self).__mul__(other) return NotImplemented def __rmul__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__rmul__(other)) elif isinstance(other, float): return super(Float10, self).__rmul__(other) return NotImplemented def __truediv__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return 
self.__class__(super(Float10, self).__truediv__(other)) elif isinstance(other, float): return super(Float10, self).__truediv__(other) return NotImplemented def __rtruediv__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__rtruediv__(other)) elif isinstance(other, float): return super(Float10, self).__rtruediv__(other) return NotImplemented def __mod__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__mod__(other)) elif isinstance(other, float): return super(Float10, self).__mod__(other) return NotImplemented def __rmod__(self, other: object) -> Union[float, 'Float10', 'Float']: if isinstance(other, (self.__class__, int)) and not isinstance(other, bool): return self.__class__(super(Float10, self).__rmod__(other)) elif isinstance(other, float): return super(Float10, self).__rmod__(other) return NotImplemented def __abs__(self) -> Union['Float10', 'Float']: return self.__class__(super(Float10, self).__abs__()) class Float(Float10): name = 'float' xsd_version = '1.1' # The instance used for xs:float NaN values in order to keep identity float_nan = Float10('NaN') class Integer(int, AnyAtomicType): """A wrapper for emulating xs:integer and limited integer types.""" name = 'integer' pattern = re.compile(r'^[\-+]?[0-9]+$') lower_bound: Optional[int] = None higher_bound: Optional[int] = None def __init__(self, value: Union[str, SupportsInt]) -> None: if self.lower_bound is not None and self < self.lower_bound: raise ValueError("value {} is too low for {!r}".format(value, self.__class__)) elif self.higher_bound is not None and self >= self.higher_bound: raise ValueError("value {} is too high for {!r}".format(value, self.__class__)) int.__init__(self) @classmethod def __subclasshook__(cls, subclass: Type[Any]) -> bool: 
if cls is Integer: return issubclass(subclass, int) and not issubclass(subclass, bool) return NotImplemented # type: ignore[no-any-return,unused-ignore] @classmethod def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) class NonPositiveInteger(Integer): name = 'nonPositiveInteger' lower_bound, higher_bound = None, 1 class NegativeInteger(NonPositiveInteger): name = 'negativeInteger' lower_bound, higher_bound = None, 0 class Long(Integer): name = 'long' lower_bound, higher_bound = -2**63, 2**63 class Int(Long): name = 'int' lower_bound, higher_bound = -2**31, 2**31 class Short(Int): name = 'short' lower_bound, higher_bound = -2**15, 2**15 class Byte(Short): name = 'byte' lower_bound, higher_bound = -2**7, 2**7 class NonNegativeInteger(Integer): name = 'nonNegativeInteger' lower_bound = 0 higher_bound: Optional[int] = None class PositiveInteger(NonNegativeInteger): name = 'positiveInteger' lower_bound, higher_bound = 1, None class UnsignedLong(NonNegativeInteger): name = 'unsignedLong' lower_bound, higher_bound = 0, 2**64 class UnsignedInt(UnsignedLong): name = 'unsignedInt' lower_bound, higher_bound = 0, 2**32 class UnsignedShort(UnsignedInt): name = 'unsignedShort' lower_bound, higher_bound = 0, 2**16 class UnsignedByte(UnsignedShort): name = 'unsignedByte' lower_bound, higher_bound = 0, 2**8 sissaschool-elementpath-d3688c7/elementpath/datatypes/proxies.py000066400000000000000000000154671476131650400253130ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import re import math from decimal import Decimal from typing import Any, Union, SupportsFloat from elementpath.helpers import BOOLEAN_VALUES, collapse_white_spaces, get_double from .atomic_types import AnyAtomicType from .untyped import UntypedAtomic from .numeric import Float10, Integer from .datetime import AbstractDateTime, Duration FloatArgType = Union[SupportsFloat, str, bytes] #### # Type proxies for basic Python datatypes: a proxy class creates # and validates its Python datatype and virtual registered types. class BooleanProxy(AnyAtomicType): name = 'boolean' pattern = re.compile(r'^(?:true|false|1|0)$') def __new__(cls, value: object) -> bool: # type: ignore[misc] if isinstance(value, bool): return value elif isinstance(value, (int, float, Decimal)): if math.isnan(value): return False return bool(value) elif isinstance(value, UntypedAtomic): value = value.value elif not isinstance(value, str): raise TypeError('invalid type {!r} for xs:{}'.format(type(value), cls.name)) if value.strip() not in BOOLEAN_VALUES: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return 't' in value or '1' in value def __init__(self, value: object) -> None: bool.__init__(self) @classmethod def __subclasshook__(cls, subclass: type) -> bool: return issubclass(subclass, bool) @classmethod def validate(cls, value: object) -> None: if isinstance(value, bool): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) class DecimalProxy(AnyAtomicType): name = 'decimal' pattern = re.compile(r'^[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)$') def __new__(cls, value: Any) -> Decimal: # type: ignore[misc] if isinstance(value, (str, UntypedAtomic)): value = collapse_white_spaces(str(value)).replace(' ', '') if cls.pattern.match(value) is None: raise cls.invalid_value(value) elif isinstance(value, (float, Float10, Decimal)): if math.isinf(value) or math.isnan(value): 
raise cls.invalid_value(value) try: return Decimal(value) except (ValueError, ArithmeticError): msg = 'invalid value {!r} for xs:{}' raise ArithmeticError(msg.format(value, cls.name)) from None def __init__(self, value: Any) -> None: pass @classmethod def __subclasshook__(cls, subclass: type) -> bool: return issubclass(subclass, (int, Decimal, Integer)) and not issubclass(subclass, bool) @classmethod def validate(cls, value: object) -> None: if isinstance(value, Decimal): if math.isnan(value) or math.isinf(value): raise cls.invalid_value(value) elif isinstance(value, (int, Integer)) and not isinstance(value, bool): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) class DoubleProxy10(AnyAtomicType): name = 'double' xsd_version = '1.0' pattern = re.compile( r'^(?:[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?|[+-]?INF|NaN)$' ) def __new__(cls, value: Union[SupportsFloat, str]) -> float: # type: ignore[misc] return get_double(value, cls.xsd_version) def __init__(self, value: Union[SupportsFloat, str]) -> None: float.__init__(self) @classmethod def __subclasshook__(cls, subclass: type) -> bool: return issubclass(subclass, float) and not issubclass(subclass, Float10) @classmethod def validate(cls, value: object) -> None: if isinstance(value, float) and not isinstance(value, Float10): return elif isinstance(value, str): if cls.pattern.match(value) is None: raise cls.invalid_value(value) else: raise cls.invalid_type(value) class DoubleProxy(DoubleProxy10): name = 'double' xsd_version = '1.1' class StringProxy(AnyAtomicType): name = 'string' def __new__(cls, *args: object, **kwargs: object) -> str: # type: ignore[misc] return str(*args, **kwargs) def __init__(self, *args: object, **kwargs: object) -> None: str.__init__(self) @classmethod def __subclasshook__(cls, subclass: type) -> bool: return issubclass(subclass, str) @classmethod def validate(cls, value: object) -> 
None: if not isinstance(value, str): raise cls.invalid_type(value) #### # Type proxies for multiple type-checking in XPath expressions class NumericTypeMeta(type): """Metaclass for checking numeric classes and instances.""" def __instancecheck__(cls, instance: object) -> bool: return isinstance(instance, (int, float, Decimal)) and not isinstance(instance, bool) def __subclasscheck__(cls, subclass: type) -> bool: if issubclass(subclass, bool): return False return issubclass(subclass, int) or issubclass(subclass, float) \ or issubclass(subclass, Decimal) class NumericProxy(metaclass=NumericTypeMeta): """Proxy for xs:numeric related types. Builds xs:float instances.""" def __new__(cls, *args: FloatArgType, **kwargs: FloatArgType) -> float: # type: ignore[misc] return float(*args, **kwargs) class ArithmeticTypeMeta(type): """Metaclass for checking numeric, datetime and duration classes/instances.""" def __instancecheck__(cls, instance: object) -> bool: return isinstance( instance, (int, float, Decimal, AbstractDateTime, Duration, UntypedAtomic) ) and not isinstance(instance, bool) def __subclasscheck__(cls, subclass: type) -> bool: if issubclass(subclass, bool): return False return issubclass(subclass, int) or issubclass(subclass, float) or \ issubclass(subclass, Decimal) or issubclass(subclass, Duration) \ or issubclass(subclass, AbstractDateTime) or issubclass(subclass, UntypedAtomic) class ArithmeticProxy(metaclass=ArithmeticTypeMeta): """Proxy for arithmetic related types. Builds xs:float instances.""" def __new__(cls, *args: FloatArgType, **kwargs: FloatArgType) -> float: # type: ignore[misc] return float(*args, **kwargs) sissaschool-elementpath-d3688c7/elementpath/datatypes/qname.py000066400000000000000000000057741476131650400247230ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re from typing import Any, Optional from .atomic_types import AnyAtomicType from .untyped import UntypedAtomic class AbstractQName(AnyAtomicType): """ XPath compliant QName, bound with a prefix and a namespace. :param uri: the bound namespace URI, must be a not empty \ URI if a prefixed name is provided for the 2nd argument. :param qname: the prefixed name or a local name. """ pattern = re.compile( r'^(?:(?P<prefix>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*):)?' r'(?P<local>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*)$', ) def __new__(cls, *args: Any, **kwargs: Any) -> 'AbstractQName': if cls.__name__ == 'Notation': raise TypeError("can't instantiate xs:NOTATION objects") return super().__new__(cls) def __init__(self, uri: Optional[str], qname: str) -> None: if uri is None: self.uri = '' elif isinstance(uri, str): self.uri = uri else: raise TypeError('the 1st argument has an invalid type %r' % type(uri)) if not isinstance(qname, str): raise TypeError('the 2nd argument has an invalid type %r' % type(qname)) self.qname = qname.strip() match = self.pattern.match(self.qname) if match is None: raise ValueError('invalid value {!r} for an xs:QName'.format(self.qname)) self.prefix = match.groupdict()['prefix'] self.local_name = match.groupdict()['local'] if not uri and self.prefix: msg = '{!r}: cannot associate a non-empty prefix with no namespace' raise ValueError(msg.format(self)) @property def namespace(self) -> str: return self.uri @property def expanded_name(self) -> str: return '{%s}%s' % (self.uri, self.local_name) if self.uri else self.local_name @property def braced_uri_name(self) -> str: return 'Q{%s}%s' % (self.uri, self.local_name) def __repr__(self) -> str: return '%s(uri=%r, qname=%r)' % (self.__class__.__name__, self.uri, self.qname) def __str__(self) -> str: return self.qname def 
__hash__(self) -> int: return hash((self.uri, self.local_name)) def __eq__(self, other: object) -> bool: if isinstance(other, AbstractQName): return self.uri == other.uri and self.local_name == other.local_name elif isinstance(other, (str, UntypedAtomic)): return other == self.qname raise TypeError("cannot compare {!r} to {!r}".format(type(self), type(other))) class QName(AbstractQName): name = 'QName' class Notation(AbstractQName): name = 'NOTATION' sissaschool-elementpath-d3688c7/elementpath/datatypes/string.py000066400000000000000000000046641476131650400251250ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re from typing import Any from elementpath.helpers import collapse_white_spaces, Patterns from .atomic_types import AnyAtomicType class NormalizedString(str, AnyAtomicType): name = 'normalizedString' pattern = re.compile('^[^\t\r]*$') def __new__(cls, obj: Any) -> 'NormalizedString': try: return super().__new__(cls, Patterns.normalize.sub(' ', obj)) except TypeError: return super().__new__(cls, obj) def __init__(self, obj: Any) -> None: str.__init__(self) class XsdToken(NormalizedString): name = 'token' pattern = re.compile(r'^[\S\xa0]*(?: [\S\xa0]+)*$') def __new__(cls, value: Any) -> 'XsdToken': if not isinstance(value, str): value = str(value) else: value = collapse_white_spaces(value) match = cls.pattern.match(value) if match is None: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return super(NormalizedString, cls).__new__(cls, value) # noqa class Language(XsdToken): name = 'language' pattern = re.compile(r'^[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*$') def __new__(cls, value: Any) -> 'Language': if isinstance(value, bool): value = 'true' if 
value else 'false' elif not isinstance(value, str): value = str(value) else: value = collapse_white_spaces(value) match = cls.pattern.match(value) if match is None: raise ValueError('invalid value {!r} for xs:{}'.format(value, cls.name)) return super(NormalizedString, cls).__new__(cls, value) # noqa class Name(XsdToken): name = 'Name' pattern = re.compile(r'^(?:[^\d\W]|:)[\w.\-:\u00B7\u0300-\u036F\u203F\u2040]*$') class NCName(Name): name = 'NCName' pattern = re.compile(r'^[^\d\W][\w.\-\u00B7\u0300-\u036F\u203F\u2040]*$') class Id(NCName): name = 'ID' class Idref(NCName): name = 'IDREF' class Entity(NCName): name = 'ENTITY' class NMToken(XsdToken): name = 'NMTOKEN' pattern = re.compile(r'^[\w.\-:\u00B7\u0300-\u036F\u203F\u2040]+$') sissaschool-elementpath-d3688c7/elementpath/datatypes/untyped.py000066400000000000000000000123351476131650400253010ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import operator from decimal import Decimal from typing import Any, Optional, Tuple, Union from elementpath.helpers import BOOLEAN_VALUES, get_double from .atomic_types import AnyAtomicType class UntypedAtomic(AnyAtomicType): """ Class for xs:untypedAtomic data. Provides special methods for comparing and converting to basic data types. :param value: the untyped value, usually a string. 
""" name = 'untypedAtomic' value: str @classmethod def validate(cls, value: object) -> None: if not isinstance(value, cls): raise cls.invalid_type(value) def __init__(self, value: Union[str, bytes, bool, float, Decimal, 'UntypedAtomic', AnyAtomicType]) -> None: if isinstance(value, str): self.value = value elif isinstance(value, bytes): self.value = value.decode('utf-8') elif isinstance(value, bool): self.value = 'true' if value else 'false' elif isinstance(value, float): self.value = str(value).rstrip('0').rstrip('.') elif isinstance(value, Decimal): self.value = str(value.normalize()) elif isinstance(value, UntypedAtomic): self.value = value.value elif isinstance(value, AnyAtomicType): self.value = str(value) else: raise TypeError("{!r} is not an atomic value".format(value)) def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, self.value) def _get_operands(self, other: Any, force_float: bool = True) -> Tuple[Any, Any]: """ Returns a couple of operands, applying a cast to the instance value based on the type of the *other* argument. :param other: The other operand, that determines the cast for the untyped instance. :param force_float: Force a conversion to float if *other* is an UntypedAtomic instance. :return: A couple of values. 
""" if isinstance(other, UntypedAtomic): if force_float: return get_double(self.value), get_double(other.value) return self.value, other.value elif isinstance(other, bool): # Cast to xs:boolean value = self.value.strip() if value not in BOOLEAN_VALUES: raise ValueError("{!r} cannot be cast to xs:boolean".format(self.value)) return value in ('1', 'true'), other elif isinstance(other, int): return get_double(self.value), other elif other is None or isinstance(other, (str, list)): return self.value, other if hasattr(other, 'fromstring'): return type(other).fromstring(self.value), other elif hasattr(other, 'ordered'): return type(other)(self.value, other.ordered), other else: return type(other)(self.value), other def __hash__(self) -> int: return hash(self.value) def __eq__(self, other: Any) -> Any: return operator.eq(*self._get_operands(other, force_float=False)) def __ne__(self, other: Any) -> Any: return not operator.eq(*self._get_operands(other, force_float=False)) def __lt__(self, other: Any) -> Any: return operator.lt(*self._get_operands(other)) def __le__(self, other: Any) -> Any: return operator.le(*self._get_operands(other)) def __gt__(self, other: Any) -> Any: return operator.gt(*self._get_operands(other)) def __ge__(self, other: Any) -> Any: return operator.ge(*self._get_operands(other)) def __add__(self, other: Any) -> Any: return operator.add(*self._get_operands(other)) __radd__ = __add__ def __sub__(self, other: Any) -> Any: return operator.sub(*self._get_operands(other)) def __rsub__(self, other: Any) -> Any: return operator.sub(*reversed(self._get_operands(other))) def __mul__(self, other: Any) -> Any: return operator.mul(*self._get_operands(other)) __rmul__ = __mul__ def __truediv__(self, other: Any) -> Any: return operator.truediv(*self._get_operands(other)) def __rtruediv__(self, other: Any) -> Any: return operator.truediv(*reversed(self._get_operands(other))) def __int__(self) -> int: return int(self.value) def __float__(self) -> float: return 
get_double(self.value, xsd_version='1.1') def __bool__(self) -> bool: return bool(self.value) # For effective boolean value, not for cast to xs:boolean. def __abs__(self) -> Decimal: return abs(Decimal(self.value)) def __mod__(self, other: Any) -> Any: return operator.mod(*self._get_operands(other)) def __round__(self, n: Optional[int] = None) -> float: return round(float(self.value), ndigits=n) def __str__(self) -> str: return self.value def __bytes__(self) -> bytes: return bytes(self.value, encoding='utf-8') sissaschool-elementpath-d3688c7/elementpath/datatypes/uri.py000066400000000000000000000103151476131650400244040ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from decimal import Decimal from urllib.parse import urlparse from typing import Union from elementpath.helpers import collapse_white_spaces, Patterns from .atomic_types import AnyAtomicType from .untyped import UntypedAtomic from .numeric import Integer class AnyURI(AnyAtomicType): """ Class for xs:anyURI data. :param value: a string or an untyped atomic instance. 
""" value: str name = 'anyURI' def __init__(self, value: Union[str, bytes, UntypedAtomic, 'AnyURI']) -> None: if isinstance(value, str): self.value = collapse_white_spaces(value) elif isinstance(value, bytes): self.value = collapse_white_spaces(value.decode('utf-8')) elif isinstance(value, self.__class__): self.value = value.value elif isinstance(value, UntypedAtomic): self.value = collapse_white_spaces(value.value) else: raise TypeError('the argument has an invalid type %r' % type(value)) self.validate(self.value) def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, self.value) def __str__(self) -> str: return self.value def __bool__(self) -> bool: return bool(self.value) # For effective boolean value def __hash__(self) -> int: return hash(self.value) def __contains__(self, item: str) -> bool: return item in self.value def __eq__(self, other: object) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value == other.value elif isinstance(other, (bool, float, Decimal, Integer)): raise TypeError("cannot compare {} with xs:{}".format(type(other), self.name)) return self.value == other def __ne__(self, other: object) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value != other.value elif isinstance(other, (bool, float, Decimal, Integer)): raise TypeError("cannot compare {} with xs:{}".format(type(other), self.name)) return self.value != other def __lt__(self, other: Union[str, 'AnyURI', UntypedAtomic]) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value < other.value return self.value < other def __le__(self, other: Union[str, 'AnyURI', UntypedAtomic]) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value <= other.value return self.value <= other def __gt__(self, other: Union[str, 'AnyURI', UntypedAtomic]) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value > other.value return self.value > other def __ge__(self, other: Union[str, 'AnyURI', 
UntypedAtomic]) -> bool: if isinstance(other, (AnyURI, UntypedAtomic)): return self.value >= other.value return self.value >= other @classmethod def validate(cls, value: object) -> None: if isinstance(value, cls): return elif isinstance(value, bytes): value = value.decode() elif not isinstance(value, str): raise cls.invalid_type(value) try: url_parts = urlparse(value) _ = url_parts.port # check invalid port! except ValueError as err: msg = 'invalid value {!r} for xs:{} ({})' raise ValueError(msg.format(value, cls.name, str(err))) from None else: if url_parts.path.startswith(':'): raise cls.invalid_value(value) elif value.count('#') > 1: msg = 'invalid value {!r} for xs:{} (too many # characters)' raise ValueError(msg.format(value, cls.name)) elif Patterns.wrong_escape.search(value) is not None: msg = 'invalid value {!r} for xs:{} (wrong escaping)' raise ValueError(msg.format(value, cls.name)) sissaschool-elementpath-d3688c7/elementpath/decoder.py000066400000000000000000000172471476131650400232270ustar00rootroot00000000000000# # Copyright (c), 2024, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XSD atomic datatypes subpackage. Includes a class for UntypedAtomic data and classes for other XSD built-in types. This subpackage raises only built-in exceptions in order to be reusable in other packages. 
""" from collections import namedtuple from decimal import Decimal from functools import lru_cache from typing import List, Optional, Type from elementpath._typing import Callable, Iterator, MutableMapping from elementpath.aliases import AnyNsmapType from elementpath.datatypes import AtomicType from elementpath.protocols import XsdTypeProtocol from elementpath.exceptions import xpath_error from elementpath.namespaces import XSD_NAMESPACE import elementpath.datatypes as dt DecoderType = Callable[[str, AnyNsmapType], AtomicType] Builder = namedtuple('Builder', 'cls text nsmap', defaults=(None, None)) class _Notation(dt.Notation): """An instantiable xs:NOTATION.""" # noinspection PyArgumentList ATOMIC_BUILDERS: MutableMapping[Optional[str], Builder] = { f'{{{XSD_NAMESPACE}}}untypedAtomic': Builder(dt.UntypedAtomic, '1'), f'{{{XSD_NAMESPACE}}}anyType': Builder(dt.UntypedAtomic, '1'), f'{{{XSD_NAMESPACE}}}anySimpleType': Builder(dt.UntypedAtomic, '1'), f'{{{XSD_NAMESPACE}}}anyAtomicType': Builder(dt.UntypedAtomic, '1'), f'{{{XSD_NAMESPACE}}}boolean': Builder(bool, 'true'), f'{{{XSD_NAMESPACE}}}decimal': Builder(Decimal, '1.0'), f'{{{XSD_NAMESPACE}}}double': Builder(float, '1.0'), f'{{{XSD_NAMESPACE}}}float': Builder(dt.Float10, '1.0'), f'{{{XSD_NAMESPACE}}}string': Builder(str, ' alpha\t'), f'{{{XSD_NAMESPACE}}}date': Builder(dt.Date, '2000-01-01'), f'{{{XSD_NAMESPACE}}}dateTime': Builder(dt.DateTime, '2000-01-01T12:00:00'), f'{{{XSD_NAMESPACE}}}gDay': Builder(dt.GregorianDay, '---31'), f'{{{XSD_NAMESPACE}}}gMonth': Builder(dt.GregorianMonth, '--12'), f'{{{XSD_NAMESPACE}}}gMonthDay': Builder(dt.GregorianMonthDay, '--12-01'), f'{{{XSD_NAMESPACE}}}gYear': Builder(dt.GregorianYear, '1999'), f'{{{XSD_NAMESPACE}}}gYearMonth': Builder(dt.GregorianYearMonth, '1999-09'), f'{{{XSD_NAMESPACE}}}time': Builder(dt.Time, '09:26:54'), f'{{{XSD_NAMESPACE}}}duration': Builder(dt.Duration, 'P1MT1S'), f'{{{XSD_NAMESPACE}}}dayTimeDuration': Builder(dt.DayTimeDuration, 'P1DT1S'), 
f'{{{XSD_NAMESPACE}}}yearMonthDuration': Builder(dt.YearMonthDuration, 'P1Y1M'), f'{{{XSD_NAMESPACE}}}QName': Builder(dt.QName, 'xs:element'), f'{{{XSD_NAMESPACE}}}NOTATION': Builder(_Notation, 'xs:element'), f'{{{XSD_NAMESPACE}}}anyURI': Builder(dt.AnyURI, 'https://example.com'), f'{{{XSD_NAMESPACE}}}normalizedString': Builder(dt.NormalizedString, ' alpha '), f'{{{XSD_NAMESPACE}}}token': Builder(dt.XsdToken, 'a token'), f'{{{XSD_NAMESPACE}}}language': Builder(dt.Language, 'en-US'), f'{{{XSD_NAMESPACE}}}Name': Builder(dt.Name, '_a.name::'), f'{{{XSD_NAMESPACE}}}NCName': Builder(dt.NCName, 'nc-name'), f'{{{XSD_NAMESPACE}}}ID': Builder(dt.Id, 'id1'), f'{{{XSD_NAMESPACE}}}IDREF': Builder(dt.Idref, 'id_ref1'), f'{{{XSD_NAMESPACE}}}ENTITY': Builder(dt.Entity, 'entity1'), f'{{{XSD_NAMESPACE}}}NMTOKEN': Builder(dt.NMToken, 'a_token'), f'{{{XSD_NAMESPACE}}}base64Binary': Builder(dt.Base64Binary, 'YWxwaGE='), f'{{{XSD_NAMESPACE}}}hexBinary': Builder(dt.HexBinary, '31'), f'{{{XSD_NAMESPACE}}}dateTimeStamp': Builder(dt.DateTimeStamp.fromstring, '2000-01-01T12:00:00+01:00'), f'{{{XSD_NAMESPACE}}}integer': Builder(dt.Integer, '1'), f'{{{XSD_NAMESPACE}}}long': Builder(dt.Long, '1'), f'{{{XSD_NAMESPACE}}}int': Builder(dt.Int, '1'), f'{{{XSD_NAMESPACE}}}short': Builder(dt.Short, '1'), f'{{{XSD_NAMESPACE}}}byte': Builder(dt.Byte, '1'), f'{{{XSD_NAMESPACE}}}positiveInteger': Builder(dt.PositiveInteger, '1'), f'{{{XSD_NAMESPACE}}}negativeInteger': Builder(dt.NegativeInteger, '-1'), f'{{{XSD_NAMESPACE}}}nonPositiveInteger': Builder(dt.NonPositiveInteger, '0'), f'{{{XSD_NAMESPACE}}}nonNegativeInteger': Builder(dt.NonNegativeInteger, '0'), f'{{{XSD_NAMESPACE}}}unsignedLong': Builder(dt.UnsignedLong, '1'), f'{{{XSD_NAMESPACE}}}unsignedInt': Builder(dt.UnsignedInt, '1'), f'{{{XSD_NAMESPACE}}}unsignedShort': Builder(dt.UnsignedShort, '1'), f'{{{XSD_NAMESPACE}}}unsignedByte': Builder(dt.UnsignedByte, '1'), } @lru_cache(maxsize=None) def get_builders(xsd_type: XsdTypeProtocol) -> 
List[Builder]: """ Returns a list of atomic builtin XSD types that are in the base type of the XSD type argument. """ def iter_builders(root_type: XsdTypeProtocol, depth: int) -> Iterator[Builder]: if depth > 15: return if root_type.name in ATOMIC_BUILDERS: yield ATOMIC_BUILDERS[root_type.name] elif hasattr(root_type, 'member_types'): for member_type in root_type.member_types: yield from iter_builders(member_type, depth + 1) if xsd_type.name in ATOMIC_BUILDERS: return [ATOMIC_BUILDERS[xsd_type.name]] elif xsd_type.is_simple() or (simple_type := xsd_type.simple_type) is None: return [builder for builder in iter_builders(xsd_type.root_type, 1)] elif simple_type.name in ATOMIC_BUILDERS: return [ATOMIC_BUILDERS[simple_type.name]] return [builder for builder in iter_builders(simple_type.root_type, 1)] def get_atomic_sequence(xsd_type: Optional[XsdTypeProtocol], text: object = None, namespaces: AnyNsmapType = None) -> Iterator[dt.AtomicType]: """Returns a decoder function for atomic values of an XSD type instance.""" def decode(value: str) -> dt.AtomicType: if issubclass(cls, (dt.AbstractDateTime, dt.Duration)): return cls.fromstring(value) elif not issubclass(cls, dt.AbstractQName): return cls(value) else: nonlocal namespaces if namespaces is None: namespaces = {'xs': XSD_NAMESPACE} if ':' not in value: return cls(namespaces.get(''), value) else: return cls(namespaces[value.split(':')[0]], value) if xsd_type is None: yield dt.UntypedAtomic(text if isinstance(text, str) else '') return for k, builder in enumerate(get_builders(xsd_type), start=1): cls: Type[dt.AtomicType] = builder.cls _text = text if isinstance(text, str) else builder.text if len(builder) < k and not xsd_type.is_valid(text, namespaces=namespaces): continue try: if xsd_type.is_list(): for item in _text.split(): yield decode(item) else: yield decode(_text) except ValueError as err: raise xpath_error('FORG0001', err, namespaces=namespaces) except ArithmeticError as err: if issubclass(cls, 
dt.AbstractDateTime): raise xpath_error('FODT0001', err, namespaces=namespaces) elif issubclass(cls, dt.Duration): raise xpath_error('FODT0002', err, namespaces=namespaces) else: raise xpath_error('FOCA0002', err, namespaces=namespaces) else: return else: if hasattr(xsd_type, 'decode'): yield xsd_type.decode(text if isinstance(text, str) else '') else: yield dt.UntypedAtomic(text if isinstance(text, str) else '') __all__ = ['ATOMIC_BUILDERS', 'get_atomic_sequence'] sissaschool-elementpath-d3688c7/elementpath/etree.py000066400000000000000000000277231476131650400227260ustar00rootroot00000000000000# # Copyright (c), 2016-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ A unified loader module for ElementTree with a safe parser and helper functions. """ import sys import re import io import importlib from typing import cast, Any, Optional, Tuple, Union from elementpath._typing import Counter, Iterator, MutableMapping from elementpath.protocols import ElementProtocol, DocumentProtocol ### # Programmatic import of the pure Python ElementTree module. # # In Python 3 the pure Python implementation is overwritten by the C module API, # so use a programmatic re-import to obtain the pure Python module, necessary for # defining a safer XMLParser. 
### # Temporary remove the loaded modules import xml.etree.ElementTree as ElementTree sys.modules.pop('xml.etree.ElementTree') _cmod = sys.modules.pop('_elementtree', None) # Load the pure Python module sys.modules['_elementtree'] = None # type: ignore[assignment] import xml.etree.ElementTree as PyElementTree # noqa import xml.etree # noqa # Restore original modules if _cmod is not None: # pragma: no cover sys.modules['_elementtree'] = _cmod xml.etree.ElementTree = ElementTree sys.modules['xml.etree.ElementTree'] = ElementTree class SafeXMLParser(PyElementTree.XMLParser): """ An XMLParser that forbids entities processing. Drops the *html* argument that is deprecated since version 3.4. :param target: the target object called by the `feed()` method of the \ parser, that defaults to `TreeBuilder`. :param encoding: if provided, its value overrides the encoding specified \ in the XML file. """ def __init__(self, target: Optional[Any] = None, encoding: Optional[str] = None) -> None: super(SafeXMLParser, self).__init__(target=target, encoding=encoding) self.parser.EntityDeclHandler = self.entity_declaration self.parser.UnparsedEntityDeclHandler = self.unparsed_entity_declaration self.parser.ExternalEntityRefHandler = self.external_entity_reference def entity_declaration(self, entity_name, is_parameter_entity, value, base, # type: ignore system_id, public_id, notation_name): raise PyElementTree.ParseError( "Entities are forbidden (entity_name={!r})".format(entity_name) ) def unparsed_entity_declaration(self, entity_name, base, system_id, # type: ignore public_id, notation_name): raise PyElementTree.ParseError( "Unparsed entities are forbidden (entity_name={!r})".format(entity_name) ) def external_entity_reference(self, context, base, system_id, public_id): # type: ignore raise PyElementTree.ParseError( "External references are forbidden (system_id={!r}, " "public_id={!r})".format(system_id, public_id) ) # pragma: no cover (EntityDeclHandler is called before) def 
defuse_xml(xml_source: Union[str, bytes]) -> Union[str, bytes]: resource: Any if isinstance(xml_source, str): resource = io.StringIO(xml_source) else: resource = io.BytesIO(xml_source) safe_parser = SafeXMLParser(target=PyElementTree.TreeBuilder()) try: for _ in PyElementTree.iterparse(resource, ('start',), safe_parser): # pragma: no cover break except PyElementTree.ParseError as err: msg = str(err) if "Entities are forbidden" in msg or \ "Unparsed entities are forbidden" in msg or \ "External references are forbidden" in msg: raise return xml_source def is_etree_element(obj: Any) -> bool: return hasattr(obj, 'tag') and hasattr(obj, 'attrib') and hasattr(obj, 'text') def is_lxml_etree_element(obj: Any) -> bool: return is_etree_element(obj) and \ hasattr(obj, 'getparent') and \ hasattr(obj, 'nsmap') and \ obj.__class__.__module__ in ('lxml.etree', 'lxml.html') def is_etree_element_instance(obj: Any) -> bool: """Strictly checks that the objects is an ElementTree or lxml.etree Element.""" return isinstance(obj, ElementTree.Element) or \ isinstance(obj, PyElementTree.Element) or \ is_lxml_etree_element(obj) def is_etree_document(obj: Any) -> bool: return hasattr(obj, 'getroot') and hasattr(obj, 'parse') and hasattr(obj, 'iter') def is_lxml_etree_document(obj: Any) -> bool: return is_etree_document(obj) and \ hasattr(obj, 'xpath') and \ hasattr(obj, 'xslt') and \ obj.__class__.__module__ in ('lxml.etree', 'lxml.html') def is_etree_document_instance(obj: Any) -> bool: """Strictly checks that the objects is an ElementTree or lxml.etree document.""" return isinstance(obj, ElementTree.ElementTree) or \ isinstance(obj, PyElementTree.ElementTree) or \ is_lxml_etree_document(obj) def etree_iter_strings(elem: Union[DocumentProtocol, ElementProtocol], normalize: bool = False) -> Iterator[str]: e: ElementProtocol if normalize: if hasattr(elem, 'getroot'): root = cast(DocumentProtocol, elem).getroot() if root is None: return else: root = elem for e in elem.iter(): if 
callable(e.tag): continue if e.text is not None: yield e.text.strip() if e is root else e.text if e.tail is not None and e is not root: yield e.tail.strip() if e in root else e.tail else: for e in elem.iter(): if callable(e.tag): continue if e.text is not None: yield e.text if e.tail is not None and e is not elem: yield e.tail def etree_deep_equal(e1: ElementProtocol, e2: ElementProtocol) -> bool: if e1.tag != e2.tag: return False elif (e1.text or '').strip() != (e2.text or '').strip(): return False elif (e1.tail or '').strip() != (e2.tail or '').strip(): return False elif e1.attrib != e2.attrib: return False elif len(e1) != len(e2): return False return all(etree_deep_equal(c1, c2) for c1, c2 in zip(e1, e2)) def etree_iter_paths(elem: ElementProtocol, path: str = '.') \ -> Iterator[Tuple[ElementProtocol, str]]: yield elem, path comment_nodes = 0 pi_nodes = Counter[Optional[str]]() positions = Counter[Optional[str]]() for child in elem: if callable(child.tag): if child.tag.__name__ == 'Comment': # type: ignore[attr-defined] comment_nodes += 1 yield child, f'{path}/comment()[{comment_nodes}]' continue try: name = cast(str, child.target) # type: ignore[attr-defined] except AttributeError: assert child.text is not None name = child.text.split(' ', maxsplit=1)[0] pi_nodes[name] += 1 yield child, f'{path}/processing-instruction({name})[{pi_nodes[name]}]' continue if child.tag.startswith('{'): tag = f'Q{child.tag}' else: tag = f'Q{{}}{child.tag}' if path == '/': child_path = f'/{tag}' elif path: child_path = '/'.join((path, tag)) else: child_path = tag positions[child.tag] += 1 child_path += f'[{positions[child.tag]}]' yield from etree_iter_paths(child, child_path) def etree_tostring(elem: ElementProtocol, namespaces: Optional[MutableMapping[str, str]] = None, indent: str = '', max_lines: Optional[int] = None, spaces_for_tab: Optional[int] = 4, xml_declaration: Optional[bool] = None, encoding: str = 'unicode', method: str = 'xml') -> Union[str, bytes]: """ Serialize an 
Element tree to a string. :param elem: the Element instance. :param namespaces: is an optional mapping from namespace prefix to URI. \ Provided namespaces are registered before serialization. Ignored if the \ provided *elem* argument is an lxml Element instance. :param indent: the base line indentation. :param max_lines: if provided, truncates serialization after this number of lines \ (default: do not truncate). :param spaces_for_tab: number of spaces for replacing tab characters. By \ default tabs are replaced with 4 spaces, provide `None` to keep tab characters. :param xml_declaration: if set to `True` inserts the XML declaration at the head. :param encoding: if "unicode" (the default) the output is a string, \ otherwise it's binary. :param method: is either "xml" (the default), "html" or "text". :return: a Unicode string. """ def reindent(line: str) -> str: if not line: return line elif line.startswith(min_indent): return line[start:] if start >= 0 else indent[start:] + line else: return indent + line etree_module: Any if not is_etree_element_instance(elem): raise TypeError(f"{elem!r} is not an Element") elif isinstance(elem, PyElementTree.Element): etree_module = PyElementTree elif not hasattr(elem, 'nsmap'): etree_module = ElementTree else: etree_module = importlib.import_module('lxml.etree') if namespaces and not hasattr(elem, 'nsmap'): default_namespace = namespaces.get('') for prefix, uri in namespaces.items(): if prefix and not re.match(r'ns\d+$', prefix): etree_module.register_namespace(prefix, uri) if uri == default_namespace: default_namespace = None if default_namespace: etree_module.register_namespace('', default_namespace) xml_text = etree_module.tostring(elem, encoding=encoding, method=method) if isinstance(xml_text, bytes): xml_text = xml_text.decode('utf-8') if spaces_for_tab is not None: xml_text = xml_text.replace('\t', ' ' * spaces_for_tab) if xml_text.startswith('<?xml '): if xml_declaration is False: lines = xml_text.splitlines()[1:] else: lines = xml_text.splitlines() elif xml_declaration and encoding.lower() != 'unicode': lines = ['<?xml version="1.0" encoding="{}"?>'.format(encoding)] lines.extend(xml_text.splitlines()) else: lines = xml_text.splitlines() # 
Clear ending empty lines while lines and not lines[-1].strip(): lines.pop(-1) if not lines or method == 'text' or (not indent and not max_lines): if encoding == 'unicode': return '\n'.join(lines) return '\n'.join(lines).encode(encoding) last_indent = ' ' * min(k for k in range(len(lines[-1])) if lines[-1][k] != ' ') if len(lines) > 2: try: child_indent = ' ' * min( k for line in lines[1:-1] for k in range(len(line)) if line[k] != ' ' ) except ValueError: child_indent = '' min_indent = min(child_indent, last_indent) else: min_indent = child_indent = last_indent start = len(min_indent) - len(indent) if max_lines is not None and len(lines) > max_lines + 2: lines = lines[:max_lines] + [child_indent + '...'] * 2 + lines[-1:] if encoding == 'unicode': return '\n'.join(reindent(line) for line in lines) return '\n'.join(reindent(line) for line in lines).encode(encoding) __all__ = ['ElementTree', 'PyElementTree', 'SafeXMLParser', 'defuse_xml', 'is_etree_element', 'is_lxml_etree_element', 'is_etree_element_instance', 'is_etree_document', 'is_lxml_etree_document', 'is_etree_document_instance', 'etree_iter_strings', 'etree_deep_equal', 'etree_iter_paths', 'etree_tostring'] sissaschool-elementpath-d3688c7/elementpath/exceptions.py000066400000000000000000000353651476131650400240040ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import locale from typing import TYPE_CHECKING, Any, Optional, Union if TYPE_CHECKING: from elementpath.tdop import Token from elementpath.aliases import AnyNsmapType from elementpath.namespaces import XQT_ERRORS_NAMESPACE from elementpath import datatypes class ElementPathError(Exception): """ Base exception class for elementpath package. 
:param message: the message related to the error. :param code: an optional error code. :param token: an optional token instance related with the error. """ def __init__(self, message: str, code: Optional[str] = None, token: Optional['Token[Any]'] = None) -> None: super(ElementPathError, self).__init__(message) self.message = message self.code = code self.token = token def __str__(self) -> str: if self.token is None or not isinstance(self.token.value, (str, bytes)): if not self.code: return self.message return '[{}] {}'.format(self.code, self.message) elif not self.code: return '{1} at line {2}, column {3}: {0}'.format( self.message, self.token, *self.token.position ) return '{2} at line {3}, column {4}: [{1}] {0}'.format( self.message, self.code, self.token, *self.token.position ) class MissingContextError(ElementPathError): """Raised when the dynamic context is required for evaluate the XPath expression.""" class UnsupportedFeatureError(ElementPathError, NotImplementedError): """Raised when an XPath feature is not supported in the current context.""" class ElementPathKeyError(ElementPathError, KeyError): pass class ElementPathZeroDivisionError(ElementPathError, ZeroDivisionError): pass class ElementPathNameError(ElementPathError, NameError): pass class ElementPathOverflowError(ElementPathError, OverflowError): pass class ElementPathRuntimeError(ElementPathError, RuntimeError): pass class ElementPathSyntaxError(ElementPathError, SyntaxError): pass class ElementPathTypeError(ElementPathError, TypeError): pass class ElementPathValueError(ElementPathError, ValueError): pass class ElementPathLocaleError(ElementPathError, locale.Error): pass XPATH_ERROR_CODES = { # XPath 2.0 parser errors (https://www.w3.org/TR/xpath20/#id-errors) 'XPST0001': (ElementPathValueError, 'Parser not bound to a schema'), 'XPST0003': (ElementPathSyntaxError, 'Invalid XPath expression'), 'XPDY0002': (MissingContextError, 'Dynamic context required for evaluate'), 'XPTY0004': 
(ElementPathTypeError, 'Type is not appropriate for the context'), 'XPST0005': (ElementPathValueError, 'A not empty sequence required'), 'XPST0008': (ElementPathNameError, 'Name not found'), 'XPST0010': (ElementPathNameError, 'Axis not found'), 'XPST0017': (ElementPathTypeError, 'Wrong number of arguments'), 'XPTY0018': (ElementPathTypeError, 'Step result contains both nodes and atomic values'), 'XPTY0019': (ElementPathTypeError, 'Intermediate step contains an atomic value'), 'XPTY0020': (ElementPathTypeError, 'Context item is not a node'), 'XPDY0050': (ElementPathTypeError, 'Type does not match sequence type'), 'XPST0051': (ElementPathNameError, 'Unknown atomic type'), 'XPST0080': (ElementPathNameError, 'Target type cannot be xs:NOTATION or xs:anyAtomicType'), 'XPST0081': (ElementPathNameError, 'Unknown namespace'), # Data types and functions errors 'FOER0000': (ElementPathError, 'Unidentified error'), 'FOAR0001': (ElementPathZeroDivisionError, 'Division by zero'), 'FOAR0002': (ElementPathOverflowError, 'Numeric operation overflow/underflow'), 'FOCA0001': (ElementPathValueError, 'Input value too large for decimal'), 'FOCA0002': (ElementPathValueError, 'Invalid lexical value'), 'FOCA0003': (ElementPathValueError, 'Input value too large for integer'), 'FOCA0005': (ElementPathValueError, 'NaN supplied as float/double value'), 'FOCA0006': (ElementPathValueError, 'String to be cast to decimal has too many digits of precision'), 'FOCH0001': (ElementPathValueError, 'Code point not valid'), 'FOCH0002': (ElementPathLocaleError, 'Unsupported collation'), 'FOCH0003': (ElementPathValueError, 'Unsupported normalization form'), 'FOCH0004': (ElementPathLocaleError, 'Collation does not support collation units'), 'FODC0001': (ElementPathValueError, 'No context document'), 'FODC0002': (ElementPathValueError, 'Error retrieving resource'), 'FODC0003': (ElementPathValueError, 'Function stability not defined'), 'FODC0004': (ElementPathValueError, 'Invalid argument to fn:collection'), 
'FODC0005': (ElementPathValueError, 'Invalid argument to fn:doc or fn:doc-available'), 'FODT0001': (ElementPathOverflowError, 'Overflow/underflow in date/time operation'), 'FODT0002': (ElementPathOverflowError, 'Overflow/underflow in duration operation'), 'FODT0003': (ElementPathValueError, 'Invalid timezone value'), 'FONS0004': (ElementPathKeyError, 'No namespace found for prefix'), 'FONS0005': (ElementPathValueError, 'Base-uri not defined in the static context'), 'FORG0001': (ElementPathValueError, 'Invalid value for cast/constructor'), 'FORG0002': (ElementPathValueError, 'Invalid argument to fn:resolve-uri()'), 'FORG0003': (ElementPathValueError, 'fn:zero-or-one called with a sequence containing more than one item'), 'FORG0004': (ElementPathValueError, 'fn:one-or-more called with a sequence containing no items'), 'FORG0005': (ElementPathValueError, 'fn:exactly-one called with a sequence containing zero or more than one item'), 'FORG0006': (ElementPathTypeError, 'Invalid argument type'), 'FORG0008': (ElementPathValueError, 'The two arguments to fn:dateTime have inconsistent timezones'), 'FORG0009': (ElementPathValueError, 'Error in resolving a relative URI against a base URI in fn:resolve-uri'), 'FORX0001': (ElementPathValueError, 'Invalid regular expression flags'), 'FORX0002': (ElementPathValueError, 'Invalid regular expression'), 'FORX0003': (ElementPathValueError, 'Regular expression matches zero-length string'), 'FORX0004': (ElementPathValueError, 'Invalid replacement string'), 'FOTY0012': (ElementPathValueError, 'Argument node does not have a typed value'), # XPath 3.0+ errors 'XQST0039': (ElementPathTypeError, 'Duplicate parameter name in inline function expression'), 'XQST0046': (ElementPathTypeError, 'The namespace part of the EQName is not a valid URI'), 'XQST0052': (ElementPathNameError, 'The name of an in-scope simple schema type required'), 'XQST0070': (ElementPathNameError, 'Illegal use of a predefined namespace'), 'FOTY0013': (ElementPathTypeError, 
'The argument to fn:data() contains a function item'), 'FOTY0014': (ElementPathTypeError, 'The argument to fn:string() is a function item'), 'FOTY0015': (ElementPathTypeError, 'An argument to fn:deep-equal() contains a function item'), 'FODC0006': (ElementPathValueError, 'String passed to fn:parse-xml is not a well-formed XML document'), 'FODC0010': (ElementPathRuntimeError, 'The processor does not support serialization'), 'FOUT1170': (ElementPathValueError, 'Invalid $href argument to fn:unparsed-text()'), 'FOUT1190': (ElementPathValueError, 'Cannot decode resource retrieved by fn:unparsed-text()'), 'FOUT1200': (ElementPathValueError, 'Cannot infer encoding of resource retrieved by fn:unparsed-text()'), 'FODF1280': (ElementPathValueError, 'Invalid decimal format name'), 'FODF1310': (ElementPathValueError, 'Invalid decimal format picture string'), 'FOFD1340': (ElementPathValueError, 'Invalid date/time formatting parameters'), 'FOFD1350': (ElementPathValueError, 'Invalid date/time formatting component'), 'XPTY0117': (ElementPathTypeError, 'Item type is xs:untypedAtomic and the expected type is namespace-sensitive'), 'XPDY0130': (ElementPathValueError, 'An implementation-defined limit has been exceeded'), 'XPST0133': (ElementPathValueError, 'The namespace URI for EQName is http://www.w3.org/2000/xmlns/'), # XSLT and XQuery Serialization errors # (the complete list: https://www.w3.org/TR/xslt-xquery-serialization/#id-errors) 'SENR0001': (ElementPathTypeError, 'item is an attribute node or a namespace node'), 'SEPM0016': (ElementPathValueError, 'parameter value is invalid for the defined domain'), 'SEPM0017': (ElementPathValueError, 'error during extraction of serialization parameters'), 'SEPM0018': (ElementPathTypeError, 'use-character-maps serialization parameter in ' 'a sequence of length greater than one'), 'SEPM0019': (ElementPathValueError, 'same serialization parameter appears more than once'), 'SERE0020': (ElementPathTypeError, 'a numeric value being serialized 
using the JSON output ' 'method cannot be represented in the JSON grammar'), 'SERE0021': (ElementPathTypeError, 'a sequence being serialized using the JSON output ' 'method includes items for which no rules are provided ' 'in the appropriate section of the serialization rules'), 'SERE0022': (ElementPathValueError, 'a map being serialized using the JSON output method ' 'has two keys with the same string value'), 'SERE0023': (ElementPathTypeError, 'a sequence being serialized using the JSON output ' 'method is of length greater than one'), # XPath 3.1+ errors 'FOJS0001': (ElementPathSyntaxError, 'JSON syntax error'), 'FOJS0003': (ElementPathValueError, 'JSON duplicate keys'), 'FOJS0004': (ElementPathRuntimeError, 'JSON: not schema-aware'), 'FOJS0005': (ElementPathValueError, 'Invalid options'), 'FOJS0006': (ElementPathValueError, 'Invalid XML representation of JSON'), 'FOJS0007': (ElementPathValueError, 'Bad JSON escape sequence'), 'FOAY0001': (ElementPathValueError, 'Array index out of bounds'), 'FOAY0002': (ElementPathValueError, 'Negative array length'), 'FOQM0001': (ElementPathValueError, 'Module URI is a zero-length string'), 'FOQM0002': (ElementPathRuntimeError, 'Module URI not found'), 'FOQM0003': (ElementPathRuntimeError, 'Static error in dynamically-loaded XQuery module'), 'FOQM0005': (ElementPathValueError, 'Parameter for dynamically-loaded ' 'XQuery module has incorrect type'), 'FOQM0006': (ElementPathRuntimeError, 'No suitable XQuery processor available'), 'FOXT0001': (ElementPathRuntimeError, 'No suitable XSLT processor available'), 'FOXT0002': (ElementPathValueError, 'Invalid parameters to XSLT transformation'), 'FOXT0003': (ElementPathRuntimeError, 'XSLT transformation failed'), 'FOXT0004': (ElementPathRuntimeError, 'XSLT transformation has been disabled'), 'FOXT0006': (ElementPathValueError, 'XSLT output contains non-accepted characters'), 'FOAP0001': (ElementPathTypeError, 'Wrong number of arguments'), 'FORG0010': (ElementPathValueError, 'Invalid 
date/time'), 'XQDY0137': (ElementPathValueError, 'No two keys in a map may have the same key value'), } def xpath_error(code: Union[str, 'datatypes.QName'], message_or_error: Union[None, str, Exception] = None, token: Optional['Token[Any]'] = None, namespaces: AnyNsmapType = None) -> ElementPathError: """ Returns an XPath error instance related to a code. An XPath/XQuery/XSLT error code (ref: http://www.w3.org/2005/xqt-errors) is an alphanumeric token starting with four uppercase letters and ending with four digits. :param code: the error code. :param message_or_error: an optional custom message or related exception. :param token: an optional token instance. :param namespaces: an optional namespace mapping for finding the prefix \ related to the namespace 'http://www.w3.org/2005/xqt-errors'. By default the prefix 'err' is used. """ if isinstance(code, datatypes.QName): namespace = code.uri if namespace: pcode, code = code.qname, code.local_name else: pcode, code = code.braced_uri_name, code.local_name else: namespace = XQT_ERRORS_NAMESPACE prefix: Optional[str] if not namespaces or namespaces.get('err') == XQT_ERRORS_NAMESPACE: prefix = 'err' else: for prefix, uri in namespaces.items(): if uri == XQT_ERRORS_NAMESPACE: break else: prefix = 'err' if code.startswith('{'): try: namespace, code = code[1:].split('}') except ValueError: message = '{!r} is not an xs:QName'.format(code) raise ElementPathValueError(message, 'err:XPTY0004', token) else: pcode = f'{prefix}:{code}' if prefix else code elif ':' not in code: pcode = f'{prefix}:{code}' if prefix else code elif code.startswith(f'{prefix}:') and code.count(':') == 1: pcode, code = code, code.split(':')[1] else: message = '%r is not an XPath error code' % code raise ElementPathValueError(message, 'err:XPTY0004', token) if namespace != XQT_ERRORS_NAMESPACE: message = 'invalid namespace {!r}'.format(namespace) raise ElementPathValueError(message, 'err:XPTY0004', token) try: error_class, default_message =
XPATH_ERROR_CODES[code] except KeyError: if namespace == XQT_ERRORS_NAMESPACE: message = f'unknown XPath error code {code}' raise ElementPathValueError(message, 'err:XPTY0004', token) from None else: error_class = ElementPathError default_message = 'custom XPath error' if message_or_error is None: message = default_message elif isinstance(message_or_error, str): message = message_or_error elif isinstance(message_or_error, ElementPathError): message = message_or_error.message else: message = str(message_or_error) return error_class(message, pcode, token) sissaschool-elementpath-d3688c7/elementpath/helpers.py000066400000000000000000000251241476131650400232550ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re import math from calendar import isleap, leapdays from decimal import Decimal from operator import attrgetter from typing import Any, List, Optional, overload, SupportsFloat, Type, Union from urllib.parse import urlsplit from elementpath._typing import Iterator, Match, Pattern ### # Common sets constants OCCURRENCE_INDICATORS = frozenset(('?', '*', '+')) BOOLEAN_VALUES = frozenset(('true', 'false', '1', '0')) NUMERIC_INF_OR_NAN = frozenset(('INF', '-INF', 'NaN')) INVALID_NUMERIC = frozenset( ('inf', '+inf', '-inf', 'nan', 'infinity', '+infinity', '-infinity') ) ### # Data validation patterns class LazyPattern: """ A descriptor for creating lazy regexp patterns. The compiled pattern is built only when the descriptor attribute is accessed (e.g. a hasattr() call).
""" _compiled: Pattern[str] def __init__(self, pattern: str, flags: Union[int, re.RegexFlag] = 0) -> None: self._pattern = pattern self._flags = flags def __set_name__(self, owner: Type[Any], name: str) -> None: self._name = name @overload def __get__(self, instance: None, owner: Type[Any]) -> Pattern[str]: ... @overload def __get__(self, instance: Any, owner: Type[Any]) -> Pattern[str]: ... def __get__(self, instance: Optional[Any], owner: Type[Any]) -> Pattern[str]: try: return self._compiled except AttributeError: self._compiled = re.compile(self._pattern, self._flags) return self._compiled def __set__(self, instance: Any, value: Any) -> None: raise AttributeError("Can't set attribute {}".format(self._name)) def __delete__(self, instance: Any) -> None: raise AttributeError("Can't delete attribute {}".format(self._name)) class Patterns: """ Helper patterns, the ones that aren't used at import time are defined lazily. """ whitespaces = re.compile(r'[^\S\xa0]+') # whitespace except U+00A0 (non-breaking space) normalize = LazyPattern(r'[^\S\xa0]') ncname = LazyPattern(r'^[^\d\W][\w.\-\u00B7\u0300-\u036F\u203F\u2040]*$') extended_qname = LazyPattern( r'^(?:Q{(?P<namespace>[^}]+)}|' r'(?P<prefix>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*):)?' r'(?P<local>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*)$', ) replacement = LazyPattern(r'^([^\\$]|\\{2}|\\\$|\$\d+)*$') sequence_type = LazyPattern(r'\s?([()?*+,])\s?') unicode_escape = LazyPattern(r'(?:\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8}))') wrong_escape = LazyPattern(r'%(?![a-fA-F\d]{2})') xml_newlines = LazyPattern('\r\n|\r|\n') # Regex patterns related to names and namespaces namespace_uri = LazyPattern(r'{([^}]+)}') expanded_name = LazyPattern( r'^(?:{(?P<namespace>[^}]+)})?'
r'(?P<local>[^\d\W][\w\-.\u00B7\u0300-\u036F\u0387\u06DD\u06DE\u203F\u2040]*)$', ) def upper_camel_case(s: str) -> str: return re.sub(r'^\d+', '', re.sub(r'[\W_]', '', s.title())) def collapse_white_spaces(s: str) -> str: return Patterns.whitespaces.sub(' ', s).strip(' ') def is_ncname(s: str) -> bool: return Patterns.ncname.match(s) is not None def is_idrefs(value: Optional[str]) -> bool: return isinstance(value, str) and \ all(Patterns.ncname.match(x) is not None for x in value.split()) node_position = attrgetter('position') ### # Date/Time helpers MONTH_DAYS = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31] MONTH_DAYS_LEAP = [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31] def adjust_day(year: int, month: int, day: int) -> int: if month in (1, 3, 5, 7, 8, 10, 12): return day elif month in (4, 6, 9, 11): return min(day, 30) else: return min(day, 29) if isleap(year) else min(day, 28) def days_from_common_era(year: int) -> int: """ Returns the number of days from 0001-01-01 to the provided year. For a common era year the days are counted until the last day of December, for a BCE year the days are counted down from the end to the 1st of January. """ if year > 0: return year * 365 + year // 4 - year // 100 + year // 400 elif year >= -1: return year * 366 else: year = -year - 1 return -(366 + year * 365 + year // 4 - year // 100 + year // 400) DAYS_IN_4Y = days_from_common_era(4) DAYS_IN_100Y = days_from_common_era(100) DAYS_IN_400Y = days_from_common_era(400) def months2days(year: int, month: int, months_delta: int) -> int: """ Converts a delta of months to a delta of days, counting from the 1st day of the month, relative to the year and the month passed as arguments. :param year: the reference start year, a negative or zero value means a BCE year \ (0 is 1 BCE, -1 is 2 BCE, -2 is 3 BCE, etc.). :param month: the starting month (1-12). :param months_delta: the number of months, if negative count backwards.
""" if not months_delta: return 0 total_months = month - 1 + months_delta target_year = year + total_months // 12 target_month = total_months % 12 + 1 if month <= 2: y_days = 365 * (target_year - year) + leapdays(year, target_year) else: y_days = 365 * (target_year - year) + leapdays(year + 1, target_year + 1) months_days = MONTH_DAYS_LEAP if isleap(target_year) else MONTH_DAYS if target_month >= month: m_days = sum(months_days[m] for m in range(month, target_month)) return y_days + m_days if y_days >= 0 else y_days + m_days else: m_days = sum(months_days[m] for m in range(target_month, month)) return y_days - m_days if y_days >= 0 else y_days - m_days def round_number(value: Union[float, int, Decimal]) -> Union[float, int, Decimal]: if math.isnan(value) or math.isinf(value): return value number = Decimal(value) if number > 0: return type(value)(number.quantize(Decimal('1'), rounding='ROUND_HALF_UP')) else: return type(value)(number.quantize(Decimal('1'), rounding='ROUND_HALF_DOWN')) def normalized_seconds(seconds: Union[int, Decimal]) -> str: # Decimal.normalize() does not remove exp every time: eg. 
Decimal('1E+1') return '{:.6f}'.format(seconds).rstrip('0').rstrip('.') def is_xml_codepoint(cp: int) -> bool: return cp in (0x9, 0xA, 0xD) or \ 0x20 <= cp <= 0xD7FF or \ 0xE000 <= cp <= 0xFFFD or \ 0x10000 <= cp <= 0x10FFFF def ordinal(n: int) -> str: if n in (11, 12, 13): return '%dth' % n least_significant_digit = n % 10 if least_significant_digit == 1: return '%dst' % n elif least_significant_digit == 2: return '%dnd' % n elif least_significant_digit == 3: return '%drd' % n else: return '%dth' % n def get_double(value: Union[SupportsFloat, str], xsd_version: str = '1.0') -> float: if isinstance(value, str): value = collapse_white_spaces(value) if value in NUMERIC_INF_OR_NAN or xsd_version != '1.0' and value == '+INF': if value == 'NaN': return math.nan # for NaN use the predefined instance to keep identity elif value.lower() in INVALID_NUMERIC: raise ValueError(f'invalid value {value!r} for xs:double/xs:float') elif math.isnan(value): return math.nan return float(value) def numeric_equal(op1: SupportsFloat, op2: SupportsFloat) -> bool: if op1 == op2: return True return math.isclose(op1, op2, rel_tol=1e-7, abs_tol=0.0) def numeric_not_equal(op1: SupportsFloat, op2: SupportsFloat) -> bool: if op1 == op2: return False return not math.isclose(op1, op2, rel_tol=1e-7, abs_tol=0.0) def equal(op1: Any, op2: Any) -> bool: if isinstance(op1, float) and math.isnan(op1): return isinstance(op2, float) and math.isnan(op2) return bool(op1 == op2) def not_equal(op1: Any, op2: Any) -> bool: if isinstance(op1, float) and math.isnan(op1): return not isinstance(op2, float) or not math.isnan(op2) return bool(op1 != op2) def match_wildcard(name: str, wildcard: str) -> bool: if wildcard == '*' or wildcard == '*:*': return True elif wildcard.startswith('*:'): if name.startswith('{'): return name.endswith(f'}}{wildcard[2:]}') else: return name == wildcard[2:] elif wildcard.startswith('{') and wildcard.endswith('}*') or wildcard.endswith(':*'): return name.startswith(wildcard[:-1]) 
else: return False def escape_json_string(s: str, escaped: bool = False) -> str: if escaped: s = s.replace('\\"', '"') else: s = s.replace('\\', '\\\\') s = s.replace('\"', '\\"').\ replace('\b', r'\b').\ replace('\r', r'\r').\ replace('\n', r'\n').\ replace('\t', r'\t').\ replace('\f', r'\f').\ replace('/', r'\/') return ''.join( rf'\u{ord(x):04X}' if 1 <= ord(x) <= 31 or 127 <= ord(x) <= 159 else x for x in s ) def unescape_json_string(s: str) -> str: def unicode_escape_callback(match: Match[str]) -> str: group = match.group(1) or match.group(2) return chr(int(group.upper(), 16)) s = s.replace('\\"', '\"').\ replace(r'\b', '\b').\ replace(r'\r', '\r').\ replace(r'\n', '\n').\ replace(r'\t', '\t').\ replace(r'\f', '\f').\ replace(r'\/', '/').\ replace('\\\\', '\\') return Patterns.unicode_escape.sub(unicode_escape_callback, s) def iter_sequence(obj: Any) -> Iterator[Any]: if obj is None: return elif isinstance(obj, list): for item in obj: yield from iter_sequence(item) else: yield obj def split_function_test(function_test: str) -> List[str]: if not function_test.startswith('function('): return [] elif function_test == 'function(*)': return ['*'] parts = function_test[9:].partition(') as ') if parts[0]: sequence_types = parts[0].split(', ') sequence_types.append(parts[2]) else: sequence_types = [parts[2]] return sequence_types def is_absolute_uri(uri: str) -> bool: try: parts = urlsplit(uri.strip()) except ValueError: return False else: return parts.scheme == 'urn' or \ parts.scheme != '' and parts.netloc != '' or \ parts.path.startswith('/') sissaschool-elementpath-d3688c7/elementpath/namespaces.py000066400000000000000000000123041476131650400237260ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from typing import cast, Tuple, Union from elementpath.aliases import NamespacesType, NsmapType from elementpath.helpers import Patterns # Namespaces XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace" XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/" # Used in DOM for xmlns declarations XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema" XSI_NAMESPACE = "http://www.w3.org/2001/XMLSchema-instance" XLINK_NAMESPACE = "http://www.w3.org/1999/xlink" # XPath/XQuery namespaces XPATH_FUNCTIONS_NAMESPACE = "http://www.w3.org/2005/xpath-functions" XQT_ERRORS_NAMESPACE = "http://www.w3.org/2005/xqt-errors" XPATH_MATH_FUNCTIONS_NAMESPACE = "http://www.w3.org/2005/xpath-functions/math" XPATH_MAP_FUNCTIONS_NAMESPACE = "http://www.w3.org/2005/xpath-functions/map" XPATH_ARRAY_FUNCTIONS_NAMESPACE = "http://www.w3.org/2005/xpath-functions/array" XSLT_XQUERY_SERIALIZATION_NAMESPACE = "http://www.w3.org/2010/xslt-xquery-serialization" # XML namespace attributes XML_BASE = '{%s}base' % XML_NAMESPACE XML_LANG = '{%s}lang' % XML_NAMESPACE XML_SPACE = '{%s}space' % XML_NAMESPACE XML_ID = '{%s}id' % XML_NAMESPACE # XML Schema Instance namespace attributes XSI_TYPE = '{%s}type' % XSI_NAMESPACE XSI_NIL = '{%s}nil' % XSI_NAMESPACE XSI_SCHEMA_LOCATION = '{%s}schemaLocation' % XSI_NAMESPACE XSI_NONS_SCHEMA_LOCATION = '{%s}noNamespaceSchemaLocation' % XSI_NAMESPACE # XML Schema tags (schema and types) XSD_SCHEMA = '{%s}schema' % XSD_NAMESPACE XSD_ANY_TYPE = '{%s}anyType' % XSD_NAMESPACE XSD_ANY_SIMPLE_TYPE = '{%s}anySimpleType' % XSD_NAMESPACE XSD_ANY_ATOMIC_TYPE = '{%s}anyAtomicType' % XSD_NAMESPACE XSD_NOTATION = '{%s}NOTATION' % XSD_NAMESPACE XSD_ID = '{%s}ID' % XSD_NAMESPACE XSD_IDREF = '{%s}IDREF' % XSD_NAMESPACE XSD_IDREFS = '{%s}IDREFS' % XSD_NAMESPACE XSD_STRING = '{%s}string' % XSD_NAMESPACE XSD_FLOAT = '{%s}float' % XSD_NAMESPACE XSD_DOUBLE = '{%s}double' % XSD_NAMESPACE XSD_DECIMAL = '{%s}decimal' % XSD_NAMESPACE # XPath type labels defined in XSD namespace that
are not XSD builtin types XSD_UNTYPED = '{%s}untyped' % XSD_NAMESPACE XSD_UNTYPED_ATOMIC = '{%s}untypedAtomic' % XSD_NAMESPACE XSD_ERROR = '{%s}error' % XSD_NAMESPACE XSD_NUMERIC = '{%s}numeric' % XSD_NAMESPACE def get_namespace(name: str) -> str: try: return Patterns.namespace_uri.match(name).group(1) # type: ignore[union-attr] except AttributeError: return '' def split_expanded_name(name: str) -> Tuple[str, str]: match = Patterns.expanded_name.match(name) if match is None: raise ValueError(f"{name!r} is not an expanded QName") namespace, local_name = match.groups() return namespace or '', local_name def get_prefixed_name(name: str, namespaces: Union[NamespacesType, NsmapType]) -> str: """ Get the prefixed form of a QName, using a namespace map. :param name: an extended QName or a local name or a prefixed QName. :param namespaces: a dictionary with a map from prefixes to namespace URIs. """ try: if not name.startswith(('{', 'Q{')): return name elif name[0] == '{': ns_uri, local_name = name[1:].split('}') else: ns_uri, local_name = name[2:].split('}') except (ValueError, TypeError): raise ValueError(f"{name!r} is not a QName") for prefix, uri in sorted(namespaces.items(), reverse=True, key=lambda x: x if x[0] is not None else ('', x[1])): if uri == ns_uri: return f'{prefix}:{local_name}' if prefix else local_name else: if ns_uri == XML_NAMESPACE: return f'xml:{local_name}' return name def get_expanded_name(name: str, namespaces: Union[NamespacesType, NsmapType]) -> str: """ Get the expanded form of a QName, using a namespace map. Local names are mapped to the default namespace. :param name: a prefixed QName or a local name or an extended QName. :param namespaces: a dictionary with a map from prefixes to namespace URIs. :return: the expanded format of a QName or a local name. 
""" if not name or name.startswith('{'): return name elif name.startswith('Q{'): return name[1:] try: prefix, local_name = name.split(':') except ValueError: if ':' in name: raise ValueError(f"wrong format for prefixed QName {name!r}") elif '' in namespaces: uri = namespaces[''] elif None in namespaces: uri = cast(NsmapType, namespaces)[None] # lxml nsmap else: return name return f'{{{uri}}}{name}' if uri else name else: if not prefix or not local_name: raise ValueError(f"wrong format for reference name {name!r}") elif prefix == 'xml': return f'{{{XML_NAMESPACE}}}{local_name}' uri = namespaces[prefix] if not uri: raise ValueError(f"prefix {prefix!r} is mapped to an empty URI") return f'{{{uri}}}{local_name}' sissaschool-elementpath-d3688c7/elementpath/protocols.py000066400000000000000000000301521476131650400236340ustar00rootroot00000000000000# # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Define type hints protocols for XPath related objects. """ from typing import overload, Any, Dict, Iterator, Iterable, Optional, Sequence, ItemsView, \ Protocol, Sized, Hashable, Union, TypeVar, Mapping, Tuple, TYPE_CHECKING, Set from xml.etree.ElementTree import Element, ElementTree from elementpath._typing import MutableMapping from elementpath.aliases import NamespacesType, NsmapType if TYPE_CHECKING: from elementpath.schema_proxy import AbstractSchemaProxy _T = TypeVar("_T") _AnyStr = Union[str, bytes] class LxmlQNameProtocol(Protocol): localname: _AnyStr namespace: _AnyStr text: _AnyStr LxmlKeyType = Union[str, bytes, LxmlQNameProtocol] class LxmlAttribProtocol(Protocol): """A minimal protocol for attributes of lxml Element objects.""" def get(self, *args: Any, **kwargs: Any) -> Optional[str]: ... 
def items(self) -> Sequence[Tuple[Any, Any]]: ... def __contains__(self, key: Any) -> bool: ... def __getitem__(self, key: Any) -> Any: ... def __iter__(self) -> Iterator[Any]: ... def __len__(self) -> int: ... class ElementProtocol(Sized, Hashable, Protocol): """A protocol for generic ElementTree elements.""" def __iter__(self) -> Iterator['ElementProtocol']: ... def find( self, path: str, namespaces: Optional[Dict[str, str]] = ... ) -> Optional['ElementProtocol']: ... def iter(self, tag: Optional[str] = ...) -> Iterator['ElementProtocol']: ... @overload def get(self, key: str) -> Optional[str]: ... @overload def get(self, key: str, default: _T) -> Union[str, _T]: ... def get(self, key: str, default: Optional[_T] = None) -> Union[str, _T, None]: ... @property def tag(self) -> str: ... @property def text(self) -> Optional[str]: ... @property def tail(self) -> Optional[str]: ... @property def attrib(self) -> 'AttribType': ... class EtreeElementProtocol(ElementProtocol, Protocol): """A protocol for xml.etree.ElementTree elements.""" def __iter__(self) -> Iterator['EtreeElementProtocol']: ... def find( self, path: str, namespaces: Optional[Dict[str, str]] = ... ) -> Optional['EtreeElementProtocol']: ... def iter(self, tag: Optional[str] = ...) -> Iterator['EtreeElementProtocol']: ... @property def attrib(self) -> Dict[str, str]: ... class LxmlElementProtocol(ElementProtocol, Protocol): """A protocol for lxml.etree elements.""" def __iter__(self) -> Iterator['LxmlElementProtocol']: ... def find( self, path: str, namespaces: Optional[MutableMapping[str, str]] = ... ) -> Optional['LxmlElementProtocol']: ... def iter(self, tag: Optional[str] = ...) -> Iterator['LxmlElementProtocol']: ... def getroottree(self) -> 'LxmlDocumentProtocol': ... def getnext(self) -> Optional['LxmlElementProtocol']: ... def getparent(self) -> Optional['LxmlElementProtocol']: ... def getprevious(self) -> Optional['LxmlElementProtocol']: ... 
def itersiblings(self, tag: Optional[str] = ..., *tags: str, preceding: bool = False) -> Iterable['LxmlElementProtocol']: ... @property def nsmap(self) -> NsmapType: ... @property def attrib(self) -> LxmlAttribProtocol: ... class DocumentProtocol(Hashable, Protocol): def getroot(self) -> Optional[ElementProtocol]: ... def parse(self, source: Any, *args: Any, **kwargs: Any) -> ElementProtocol: ... def iter(self, tag: Optional[str] = ...) -> Iterator[ElementProtocol]: ... class LxmlDocumentProtocol(Hashable, Protocol): def getroot(self) -> Optional[LxmlElementProtocol]: ... def parse(self, source: Any, *args: Any, **kwargs: Any) -> LxmlElementProtocol: ... def iter(self, tag: Optional[str] = ...) -> Iterator[LxmlElementProtocol]: ... class XsdValidatorProtocol(Hashable, Protocol): def is_matching(self, name: Optional[str], default_namespace: Optional[str] = None) -> bool: ... @property def name(self) -> Optional[str]: ... @property def xsd_version(self) -> str: ... @property def maps(self) -> 'GlobalMapsProtocol': ... class XsdComponentProtocol(XsdValidatorProtocol, Protocol): @property def parent(self) -> Optional['XsdComponentProtocol']: ... class XsdTypeProtocol(XsdComponentProtocol, Protocol): def is_simple(self) -> bool: """Returns `True` if it's a simpleType instance, `False` if it's a complexType.""" ... def is_empty(self) -> bool: """ Returns `True` if it's a simpleType instance or a complexType with empty content, `False` otherwise. """ ... def has_simple_content(self) -> bool: """ Returns `True` if it's a simpleType instance or a complexType with simple content, `False` otherwise. """ ... def has_mixed_content(self) -> bool: """ Returns `True` if it's a complexType with mixed content, `False` otherwise. """ ... def is_element_only(self) -> bool: """ Returns `True` if it's a complexType with element-only content, `False` otherwise. """ ... def is_atomic(self) -> bool: """Returns `True` if the instance is an atomic simpleType, `False` otherwise.""" ... 
def is_list(self) -> bool: """Returns `True` if the instance is a list simpleType, `False` otherwise.""" ... def is_union(self) -> bool: """Returns `True` if the instance is a union simpleType, `False` otherwise.""" ... def is_key(self) -> bool: """Returns `True` if it's a simpleType derived from xs:ID, `False` otherwise.""" ... def is_qname(self) -> bool: """Returns `True` if it's a simpleType derived from xs:QName, `False` otherwise.""" ... def is_notation(self) -> bool: """Returns `True` if it's a simpleType derived from xs:NOTATION, `False` otherwise.""" ... @overload def is_valid(self, obj: Any, use_defaults: bool = True, namespaces: Optional[NamespacesType] = None, *args: Any, **kwargs: Any) -> bool: ... @overload def is_valid(self, obj: Any, *args: Any, **kwargs: Any) -> bool: ... def is_valid(self, obj: Any, *args: Any, **kwargs: Any) -> bool: """ Validates an XML object node using the XSD type. The argument *obj* is an element for complex type nodes or a text value for simple type nodes. Returns `True` if the argument is valid, `False` otherwise. """ ... @overload def validate(self, obj: Any, use_defaults: bool = True, namespaces: Optional[NamespacesType] = None, *args: Any, **kwargs: Any) -> None: ... @overload def validate(self, obj: Any, *args: Any, **kwargs: Any) -> None: ... def validate(self, obj: Any, *args: Any, **kwargs: Any) -> None: """ Validates an XML object node using the XSD type. The argument *obj* is an element for complex type nodes or a text value for simple type nodes. Raises a `ValueError` compatible exception (a `ValueError` or a subclass of it) if the argument is not valid. """ ... def decode(self, obj: Any, *args: Any, **kwargs: Any) -> Any: """ Decodes an XML object node using the XSD type. The argument *obj* is an element for complex type nodes or a text value for simple type nodes. Raises a `ValueError` or a `TypeError` compatible exception if the argument is not valid. """ ...
@property def root_type(self) -> 'XsdTypeProtocol': """ The type at the base of the definition of the XSD type. For a special type it is the type itself. For an atomic type it is the primitive type. For a list it is the primitive type of the item. For a union it is the base union type. For a complex type it is xs:anyType. """ ... @property def simple_type(self) -> Optional['XsdTypeProtocol']: """ The instance itself if it's a simpleType instance, or the simpleType instance used for deriving a complexType with simple content, `None` otherwise. """ ... class XsdAttributeProtocol(XsdComponentProtocol, Protocol): @property def type(self) -> Optional[XsdTypeProtocol]: ... @property def ref(self) -> Optional[Any]: ... XsdXPathNodeType = Union['XsdSchemaProtocol', 'XsdElementProtocol'] class XsdAttributeGroupProtocol(XsdComponentProtocol, Protocol): @overload def get(self, key: Optional[str]) -> Optional[XsdAttributeProtocol]: ... @overload def get(self, key: Optional[str], default: _T) -> Union[XsdAttributeProtocol, _T]: ... def get(self, key: Optional[str], default: Optional[_T] = None) \ -> Union[XsdAttributeProtocol, _T, None]: ... def items(self) -> ItemsView[Optional[str], XsdAttributeProtocol]: ... def __contains__(self, key: Optional[str]) -> bool: ... def __getitem__(self, key: Optional[str]) -> XsdAttributeProtocol: ... def __iter__(self) -> Iterator[Optional[str]]: ... def __len__(self) -> int: ... def __hash__(self) -> int: ... @property def ref(self) -> Optional[Any]: ... class XsdElementProtocol(XsdComponentProtocol, ElementProtocol, Protocol): def __iter__(self) -> Iterator['XsdElementProtocol']: ... def find( self, path: str, namespaces: Optional[NamespacesType] = ... ) -> Optional[XsdXPathNodeType]: ... def iter(self, tag: Optional[str] = ...) -> Iterator['XsdElementProtocol']: ... @property def name(self) -> Optional[str]: ... @property def type(self) -> Optional[XsdTypeProtocol]: ... @property def ref(self) -> Optional[Any]: ...
@property def attrib(self) -> XsdAttributeGroupProtocol: ... @property def xpath_proxy(self) -> 'AbstractSchemaProxy': ... GT = TypeVar("GT") XsdGlobalValue = Union[GT, Tuple[ElementProtocol, Any]] class GlobalMapsProtocol(Protocol): @property def types(self) -> Mapping[str, XsdGlobalValue[XsdTypeProtocol]]: ... @property def attributes(self) -> Mapping[str, XsdGlobalValue[XsdAttributeProtocol]]: ... @property def elements(self) -> Mapping[str, XsdGlobalValue[XsdElementProtocol]]: ... @property def substitution_groups(self) -> Mapping[str, Set[Any]]: ... class XsdSchemaProtocol(XsdValidatorProtocol, ElementProtocol, Protocol): def __iter__(self) -> Iterator[XsdXPathNodeType]: ... def find( self, path: str, namespaces: Optional[NamespacesType] = ... ) -> Optional[XsdXPathNodeType]: ... def iter(self, tag: Optional[str] = ...) -> Iterator[XsdXPathNodeType]: ... @property def validity(self) -> str: ... @property def validation_attempted(self) -> str: ... @property def tag(self) -> str: ... @property def attrib(self) -> MutableMapping[Optional[str], 'XsdAttributeProtocol']: ... @property def xpath_proxy(self) -> 'AbstractSchemaProxy': ... 
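# Note on the protocols above: they describe objects by structure, so a class
# satisfies one by exposing the right members, not by inheriting from it.
# A minimal sketch follows, assuming a hypothetical reduced protocol
# `NamedValidator` and an illustrative class `FakeElement`. Neither name exists
# in elementpath, and `runtime_checkable` is added here only to make the
# structural check visible at runtime.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class NamedValidator(Protocol):
    """Hypothetical reduced protocol: a name plus a matching test."""

    @property
    def name(self) -> Optional[str]: ...

    def is_matching(self, name: Optional[str],
                    default_namespace: Optional[str] = None) -> bool: ...


class FakeElement:
    """Illustrative class that never inherits from NamedValidator."""

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def name(self) -> Optional[str]:
        return self._name

    def is_matching(self, name: Optional[str],
                    default_namespace: Optional[str] = None) -> bool:
        # Simplified rule for the sketch: plain string equality.
        return name == self._name


def describe(component: NamedValidator) -> str:
    # A static type checker accepts any object exposing the protocol members.
    return 'component %r' % component.name


elem = FakeElement('root')
```

# The runtime check only verifies that the members exist; signatures and return
# types are checked by a static type checker such as mypy, not by isinstance().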
DocumentType = Union[ElementTree, DocumentProtocol] ElementType = Union[Element, ElementProtocol] SchemaElemType = Union[XsdSchemaProtocol, XsdElementProtocol] CommentType = Union[Element, ElementProtocol] ProcessingInstructionType = Union[Element, ElementProtocol] AttribType = Union[ MutableMapping[str, Any], MutableMapping[Optional[str], Any], LxmlAttribProtocol, XsdAttributeGroupProtocol ] __all__ = ['ElementProtocol', 'EtreeElementProtocol', 'LxmlAttribProtocol', 'LxmlElementProtocol', 'DocumentProtocol', 'LxmlDocumentProtocol', 'XsdValidatorProtocol', 'XsdComponentProtocol', 'XsdTypeProtocol', 'XsdAttributeProtocol', 'XsdAttributeGroupProtocol', 'XsdElementProtocol', 'GlobalMapsProtocol', 'XsdSchemaProtocol', 'DocumentType', 'ElementType', 'SchemaElemType', 'CommentType', 'ProcessingInstructionType', 'AttribType'] sissaschool-elementpath-d3688c7/elementpath/py.typed000066400000000000000000000000001476131650400227220ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/elementpath/regex/000077500000000000000000000000001476131650400223475ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/elementpath/regex/__init__.py000066400000000000000000000021641476131650400244630ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Subpackage for processing XML regular expressions and for converting them to Python-compatible regexps. XPath/XQuery/XML-Schema regexp flavors are supported through translate_pattern() API options. Default options process XPath/XQuery patterns. 
""" from .codepoints import RegexError, iter_code_points from .unicode_subsets import UnicodeSubset, UnicodeData, install_unicode_data, \ unicode_version, unicode_subset, lazy_subset, unicode_category, unicode_block from .character_classes import CharacterClass from .patterns import translate_pattern __all__ = ['translate_pattern', 'RegexError', 'UnicodeSubset', 'UnicodeData', 'install_unicode_data', 'unicode_version', 'unicode_subset', 'lazy_subset', 'unicode_category', 'unicode_block', 'CharacterClass', 'iter_code_points'] sissaschool-elementpath-d3688c7/elementpath/regex/character_classes.py000066400000000000000000000204241476131650400263740ustar00rootroot00000000000000# # Copyright (c), 2016-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import re from sys import maxunicode from collections import Counter from itertools import chain from typing import AbstractSet, Any, Callable, Dict, Optional, Union from elementpath._typing import Iterator, MutableSet from .codepoints import RegexError from .unicode_subsets import UnicodeSubset, lazy_subset, unicode_subset, unicode_category I_SHORTCUT_REPLACE = ( ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF" "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD" ) C_SHORTCUT_REPLACE = ( "-.0-9:A-Z_a-z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-" "\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD" ) @lazy_subset def c_shortcut() -> UnicodeSubset: return UnicodeSubset(C_SHORTCUT_REPLACE) @lazy_subset def i_shortcut() -> UnicodeSubset: return UnicodeSubset(I_SHORTCUT_REPLACE) @lazy_subset def s_shortcut() -> UnicodeSubset: return UnicodeSubset(' \t\n\r') @lazy_subset def d_shortcut() 
-> UnicodeSubset: return unicode_category('Nd') @lazy_subset def w_shortcut() -> UnicodeSubset: return UnicodeSubset(chain.from_iterable(unicode_category(x) for x in 'LMNS')) # Single and Multi character escapes CHARACTER_ESCAPES: Dict[str, Union[str, Callable[[], UnicodeSubset]]] = { # Single-character escapes '\\n': '\n', '\\r': '\r', '\\t': '\t', '\\|': '|', '\\.': '.', '\\-': '-', '\\^': '^', '\\?': '?', '\\*': '*', '\\+': '+', '\\{': '{', '\\}': '}', '\\(': '(', '\\)': ')', '\\[': '[', '\\]': ']', '\\\\': '\\', # Multi-character escapes '\\s': s_shortcut, '\\S': s_shortcut, '\\d': d_shortcut, '\\D': d_shortcut, '\\i': i_shortcut, '\\I': i_shortcut, '\\c': c_shortcut, '\\C': c_shortcut, '\\w': w_shortcut, '\\W': w_shortcut, } class CharacterClass(MutableSet[int]): """ A set class to represent XML Schema/XQuery/XPath regex character class. :param charset: a string with formatted character set. :param xsd_version: the reference XSD version for syntax variants. Defaults to '1.0'. TODO: implement __ior__, __iand__, __ixor__ operators for a full mutable set class. """ _re_char_set = re.compile(r'(? 
None: self.xsd_version = xsd_version self.positive = UnicodeSubset() self.negative = UnicodeSubset() if charset: self.add(charset) def __repr__(self) -> str: return '%s(%s)' % (self.__class__.__name__, str(self)) def __str__(self) -> str: if not self.negative: return '[%s]' % str(self.positive) elif not self.positive: return '[^%s]' % str(self.negative) else: return '[%s%s]' % ( str(UnicodeSubset(self.negative.complement())), str(self.positive) ) def __copy__(self) -> 'CharacterClass': obj = CharacterClass(xsd_version=self.xsd_version) obj.positive.update(self.positive) obj.negative.update(self.negative) return obj def __contains__(self, item: object) -> bool: if isinstance(item, str): item = ord(item) elif not isinstance(item, int): return False if self.negative: return item not in self.negative or item in self.positive return item in self.positive def __iter__(self) -> Iterator[int]: if self.negative: return ( cp for cp in range(maxunicode + 1) if cp in self.positive or cp not in self.negative ) return iter(sorted(self.positive)) # type: ignore[arg-type] def __len__(self) -> int: if self.negative: not_in_positive = Counter(x not in self.positive for x in self.negative)[True] return maxunicode + 1 - not_in_positive return len(self.positive) def __isub__(self, other: AbstractSet[Any]) -> 'CharacterClass': if isinstance(other, CharacterClass): if self.negative: if other.negative: self.positive |= (other.negative - self.negative) self.negative.clear() self.negative |= other.positive elif other.negative: self.positive &= other.negative self.positive -= other.positive return self return NotImplemented def __sub__(self, other: AbstractSet[Any]) -> 'CharacterClass': obj = self.__copy__() return obj.__isub__(other) def add(self, charset: Union[int, str]) -> None: if isinstance(charset, int): charset = chr(charset) for part in self._re_char_set.split(charset): if part in CHARACTER_ESCAPES: value = CHARACTER_ESCAPES[part] if isinstance(value, str):
self.positive.update(value) elif part[-1].islower(): self.positive |= value() else: self.negative |= value() elif part.startswith('\\p') or part.startswith('\\P'): if self._re_unicode_ref.search(part) is None: raise RegexError("wrong Unicode block specification %r" % part) try: subset = unicode_subset(part[3:-1]) except RegexError: # XSD 1.1 supports Is prefix to match Unicode blocks if not self.xsd_version or not part[3:].startswith('Is'): raise self.positive |= UnicodeSubset([(0, maxunicode + 1)]) else: if part.startswith('\\p'): self.positive |= subset else: self.negative |= subset else: self.positive.update(part) def discard(self, charset: Union[int, str]) -> None: if isinstance(charset, int): charset = chr(charset) for part in self._re_char_set.split(charset): if part in CHARACTER_ESCAPES: value = CHARACTER_ESCAPES[part] if isinstance(value, str): self.positive.difference_update(value) if self.negative: self.negative.update(value) elif part[-1].islower(): self.positive -= value() if self.negative: self.negative |= value() else: self.positive &= value() self.negative.clear() elif part.startswith('\\p') or part.startswith('\\P'): if self._re_unicode_ref.search(part) is None: raise RegexError("wrong Unicode block specification %r" % part) try: subset = unicode_subset(part[3:-1]) except RegexError: # XSD 1.1 supports Is prefix to match Unicode blocks if not self.xsd_version or not part[3:].startswith('Is'): raise self.positive -= UnicodeSubset([(0, maxunicode + 1)]) else: if part.startswith('\\p'): self.positive -= subset else: self.negative -= subset else: self.positive.difference_update(part) def clear(self) -> None: self.positive.clear() self.negative.clear() def complement(self) -> None: if self.positive or self.negative: self.positive, self.negative = self.negative, self.positive else: self.positive.codepoints = [(0, maxunicode + 1)] 
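# The membership and length rules used by CharacterClass above (a positive
# subset plus an optional negated subset) can be sketched with plain Python sets.
# This is a simplified illustration under stated assumptions, not the library
# code: `TinyCharClass` is a hypothetical name, it stores single code points
# instead of UnicodeSubset ranges, and it skips escapes and Unicode blocks.

```python
from sys import maxunicode


class TinyCharClass:
    """Hypothetical stand-in for CharacterClass based on plain sets."""

    def __init__(self, positive=(), negative=()):
        self.positive = {ord(c) for c in positive}
        self.negative = {ord(c) for c in negative}

    def __contains__(self, item):
        if isinstance(item, str):
            item = ord(item)
        elif not isinstance(item, int):
            return False
        if self.negative:
            # Everything outside the negated set matches, plus explicit positives.
            return item not in self.negative or item in self.positive
        return item in self.positive

    def __len__(self):
        if self.negative:
            # Only negated code points not re-admitted by the positive set
            # are excluded from the full Unicode range.
            excluded = sum(1 for cp in self.negative if cp not in self.positive)
            return maxunicode + 1 - excluded
        return len(self.positive)

    def complement(self):
        # Swapping the two sets negates the class. The empty-class case, which
        # the original handles separately, is left out of this sketch.
        self.positive, self.negative = self.negative, self.positive


cc = TinyCharClass(positive='abc')
in_before = ('a' in cc, 'z' in cc)
cc.complement()
in_after = ('a' in cc, 'z' in cc)
```

# Keeping a negated subset avoids materializing the complement over the whole
# Unicode range, so negating a class stays a cheap swap.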
sissaschool-elementpath-d3688c7/elementpath/regex/codepoints.py000066400000000000000000000150101476131650400250650ustar00rootroot00000000000000# # Copyright (c), 2016-2024, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ This module defines common definitions and helper functions for regex subpackage. """ from typing import Set, Tuple, Union from elementpath._typing import Iterable, Iterator CHARACTER_CLASS_ESCAPED: Set[int] = {ord(c) for c in r'-|.^?*+{}()[]\\'} """Code Points of escaped chars in a character class.""" CodePoint = Union[int, Tuple[int, int]] class RegexError(Exception): """ Error in a regular expression or in a character class specification. This exception is derived from `Exception` base class and is raised only by the regex subpackage. """ def code_point_order(cp: CodePoint) -> int: """Ordering function for code points.""" return cp if isinstance(cp, int) else cp[0] def code_point_reverse_order(cp: CodePoint) -> int: """Reverse ordering function for code points.""" return cp if isinstance(cp, int) else cp[1] - 1 def iter_code_points(codepoints: Iterable[CodePoint], reverse: bool = False) \ -> Iterator[CodePoint]: """ Iterates a code points sequence. Three or more consecutive code points are merged in a range. :param codepoints: an iterable with code points and code point ranges. :param reverse: if `True` reverses the order of the sequence. :return: yields code points or code point ranges. 
""" start_cp = end_cp = 0 if reverse: codepoints = sorted(codepoints, key=code_point_reverse_order, reverse=True) else: codepoints = sorted(codepoints, key=code_point_order) for cp in codepoints: if isinstance(cp, int): cp0 = cp cp1 = cp + 1 else: cp0, cp1 = cp if not end_cp: start_cp = cp0 end_cp = cp1 continue elif reverse: if start_cp <= cp1: if start_cp > cp0: start_cp = cp0 continue elif end_cp >= cp0: if end_cp < cp1: end_cp = cp1 continue if end_cp > start_cp + 1: yield start_cp, end_cp else: yield start_cp start_cp = cp0 end_cp = cp1 else: if end_cp: if end_cp > start_cp + 1: yield start_cp, end_cp else: yield start_cp def code_point_repr(cp: CodePoint) -> str: """ Returns the string representation of a code point. :param cp: an integer or a tuple with at least two integers. \ Values must be in interval [0, sys.maxunicode]. """ if isinstance(cp, int): if cp in CHARACTER_CLASS_ESCAPED: return r'\%s' % chr(cp) return chr(cp) if cp[0] in CHARACTER_CLASS_ESCAPED: start_char = r'\%s' % chr(cp[0]) else: start_char = chr(cp[0]) end_cp = cp[1] - 1 # Character ranges include the right bound if end_cp in CHARACTER_CLASS_ESCAPED: end_char = r'\%s' % chr(end_cp) else: end_char = chr(end_cp) if end_cp > cp[0] + 1: return '%s-%s' % (start_char, end_char) else: return start_char + end_char def iterparse_character_subset(s: str, expand_ranges: bool = False) -> Iterator[CodePoint]: """ Parses a regex character subset, generating a sequence of code points and code points ranges. An unescaped hyphen (-) that is not at the start or at the end is interpreted as range specifier. :param s: a string representing the character subset. :param expand_ranges: if set to `True` then expands character ranges. :return: yields integers or couples of integers. 
""" escaped = False on_range = False char = '' length = len(s) subset_index_iterator = iter(range(len(s))) for k in subset_index_iterator: if k == 0: char = s[0] if char == '\\': escaped = True elif char in r'[]' and length > 1: raise RegexError("bad character %r at position 0" % char) elif expand_ranges: yield ord(char) elif length <= 2 or s[1] != '-': yield ord(char) elif s[k] == '-': if escaped or (k == length - 1): char = s[k] yield ord(char) escaped = False elif on_range: char = s[k] yield ord(char) on_range = False else: # Parse character range on_range = True k = next(subset_index_iterator) end_char = s[k] if end_char == '\\' and (k < length - 1): if s[k + 1] in r'-|.^?*+{}()[]': k = next(subset_index_iterator) end_char = s[k] elif s[k + 1] in r'sSdDiIcCwWpP': msg = "bad character range '%s-\\%s' at position %d: %r" raise RegexError(msg % (char, s[k + 1], k - 2, s)) if ord(char) > ord(end_char): msg = "bad character range '%s-%s' at position %d: %r" raise RegexError(msg % (char, end_char, k - 2, s)) elif expand_ranges: yield from range(ord(char) + 1, ord(end_char) + 1) else: yield ord(char), ord(end_char) + 1 elif s[k] in r'|.^?*+{}()': if escaped: escaped = False on_range = False char = s[k] yield ord(char) elif s[k] in r'[]': if not escaped and length > 1: raise RegexError("bad character %r at position %d" % (s[k], k)) escaped = on_range = False char = s[k] if k >= length - 2 or s[k + 1] != '-': yield ord(char) elif s[k] == '\\': if escaped: escaped = on_range = False char = '\\' yield ord(char) else: escaped = True else: if escaped: escaped = False yield ord('\\') on_range = False char = s[k] if k >= length - 2 or s[k + 1] != '-': yield ord(char) if escaped: yield ord('\\') sissaschool-elementpath-d3688c7/elementpath/regex/patterns.py000066400000000000000000000263251476131650400245710ustar00rootroot00000000000000# # Copyright (c), 2016-2020, SISSA (International School for Advanced Studies). # All rights reserved. 
# This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Parse and translate XML Schema regular expressions to Python regex syntax. """ import re from sys import maxunicode from .codepoints import RegexError from .unicode_subsets import UnicodeSubset, unicode_subset from .character_classes import CharacterClass, I_SHORTCUT_REPLACE, C_SHORTCUT_REPLACE HYPHENS_PATTERN = re.compile(r'(? str: """ Translates a pattern regex expression to a Python regex pattern. With default options the translator processes XPath 2.0/XQuery 1.0 regex patterns. For XML Schema patterns set all boolean options to `False`. :param pattern: the source XML Schema regular expression. :param flags: regex flags as represented by Python's re module. :param xsd_version: apply regex rules of a specific XSD version, '1.0' for default. :param back_references: if `True` supports back-references and capturing groups. :param lazy_quantifiers: if `True` supports lazy quantifiers (\\*?, +?). :param anchors: if `True` supports ^ and $ anchors, otherwise the translated \ pattern is anchored to its boundaries and anchors are treated as normal characters. 
""" pos: int msg: str def parse_character_class() -> CharacterClass: nonlocal pos nonlocal msg pos += 1 if pattern[pos] == '^': pos += 1 negative = True else: negative = False char_class_pos = pos while True: if pattern[pos] == '[': msg = "invalid character '[' at position {}: {!r}" raise RegexError(msg.format(pos, pattern)) elif pattern[pos] == '\\': if pattern[pos + 1].isdigit(): msg = "illegal back-reference in character class at position {}: {!r}" raise RegexError(msg.format(pos, pattern)) pos += 2 elif pattern[pos] == ']' or pattern[pos:pos + 2] == '-[': if pos == char_class_pos: msg = "empty character class at position {}: {!r}" raise RegexError(msg.format(pos, pattern)) char_class_pattern = pattern[char_class_pos:pos] if HYPHENS_PATTERN.search(char_class_pattern) and pos - char_class_pos > 2: msg = "invalid character range '--' at position {}: {!r}" raise RegexError(msg.format(pos, pattern)) if xsd_version == '1.0': hyphen_match = INVALID_HYPHEN_PATTERN.search(char_class_pattern) if hyphen_match is not None: hyphen_pos = char_class_pos + hyphen_match.span()[1] - 2 msg = "unescaped character '-' at position {}: {!r}" raise RegexError(msg.format(hyphen_pos, pattern)) char_class = CharacterClass(char_class_pattern, xsd_version) if negative: char_class.complement() break # pragma: no cover else: pos += 1 if pattern[pos] != ']': # Parse a group subtraction pos += 1 subtracted_class = parse_character_class() pos += 1 char_class -= subtracted_class return char_class group_open_char = '(' if back_references else '(?:' regex = [] if anchors else ['^%s' % group_open_char] pos = 0 pattern_len = len(pattern) total_groups = 0 nested_groups = 0 dot_all = flags & re.DOTALL if back_references: match = FORBIDDEN_ESCAPES_REF_PATTERN.search(pattern) else: match = FORBIDDEN_ESCAPES_NOREF_PATTERN.search(pattern) if match: msg = "not allowed escape sequence {!r} at position {}: {!r}" raise RegexError(msg.format(match.group(), match.span()[0], pattern)) while pos < pattern_len: ch 
= pattern[pos] if ch == '.': regex.append(ch if dot_all else r'[^\r\n]') elif ch in ('^', '$'): if not anchors: regex.append(r'\%s' % ch) elif ch == '^': regex.append(r'(?= pattern_len: regex.append('\\') elif pattern[pos].isdigit(): regex.append('\\%s' % pattern[pos]) reference = DIGITS_PATTERN.match(pattern[pos:]).group() # type: ignore[union-attr] if len(reference) > 1: k = 0 for k in range(1, len(reference)): if total_groups < int(reference[:k + 1]): regex.append('[%s]' % pattern[pos + k]) break else: regex.append(pattern[pos + k]) pos += k # pragma: no cover elif pattern[pos] == 'i': regex.append('[%s]' % I_SHORTCUT_REPLACE) elif pattern[pos] == 'I': regex.append('[^%s]' % I_SHORTCUT_REPLACE) elif pattern[pos] == 'c': regex.append('[%s]' % C_SHORTCUT_REPLACE) elif pattern[pos] == 'C': regex.append('[^%s]' % C_SHORTCUT_REPLACE) elif pattern[pos] in 'pP': block_pos = pos - 1 try: if pattern[pos + 1] != '{': raise RegexError("a '{' expected, found %r." % pattern[pos + 1]) while pattern[pos] != '}': pos += 1 except (IndexError, ValueError): msg = "truncated unicode block escape at position {}: {!r}" raise RegexError(msg.format(block_pos, pattern)) block_name = pattern[block_pos + 3:pos] if flags & re.VERBOSE: # spaces are completely collapsed in verbose regex patterns block_name = block_name.replace(' ', '') try: p_shortcut_set = unicode_subset(block_name) except RegexError: # XSD 1.1 supports Is prefix to match Unicode blocks if xsd_version == '1.0' or not block_name.startswith('Is'): raise p_shortcut_group = '[%s]' % UnicodeSubset([(0, maxunicode)]) else: if pattern[block_pos + 1] == 'p': p_shortcut_group = '[%s]' % p_shortcut_set else: p_shortcut_group = '[^%s]' % p_shortcut_set if flags & re.IGNORECASE: regex.append('(?-i:%s)' % p_shortcut_group) else: regex.append(p_shortcut_group) else: regex.append('\\%s' % pattern[pos]) else: regex.append(ch) pos += 1 if nested_groups > 0: raise RegexError("unterminated subpattern in expression: %r" % pattern) if not 
anchors: regex.append(r')$(?!\n\Z)') return ''.join(regex) sissaschool-elementpath-d3688c7/elementpath/regex/unicode_blocks.py000066400000000000000000000420501476131650400257050ustar00rootroot00000000000000# # Copyright (c), 2018-2024, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or https://opensource.org/licenses/MIT. # # @author Davide Brunato # # --- Auto-generated code: don't edit this file --- # UNICODE_BLOCKS_VER_2_0_0 = { 'Basic Latin': '\u0000-\u007F', 'Latin-1 Supplement': '\u0080-\u00FF', 'Latin Extended-A': '\u0100-\u017F', 'Latin Extended-B': '\u0180-\u024F', 'IPA Extensions': '\u0250-\u02AF', 'Spacing Modifier Letters': '\u02B0-\u02FF', 'Combining Diacritical Marks': '\u0300-\u036F', 'Greek': '\u0370-\u03FF', 'Cyrillic': '\u0400-\u04FF', 'Armenian': '\u0530-\u058F', 'Hebrew': '\u0590-\u05FF', 'Arabic': '\u0600-\u06FF', 'Devanagari': '\u0900-\u097F', 'Bengali': '\u0980-\u09FF', 'Gurmukhi': '\u0A00-\u0A7F', 'Gujarati': '\u0A80-\u0AFF', 'Oriya': '\u0B00-\u0B7F', 'Tamil': '\u0B80-\u0BFF', 'Telugu': '\u0C00-\u0C7F', 'Kannada': '\u0C80-\u0CFF', 'Malayalam': '\u0D00-\u0D7F', 'Thai': '\u0E00-\u0E7F', 'Lao': '\u0E80-\u0EFF', 'Tibetan': '\u0F00-\u0FBF', 'Georgian': '\u10A0-\u10FF', 'Hangul Jamo': '\u1100-\u11FF', 'Latin Extended Additional': '\u1E00-\u1EFF', 'Greek Extended': '\u1F00-\u1FFF', 'General Punctuation': '\u2000-\u206F', 'Superscripts and Subscripts': '\u2070-\u209F', 'Currency Symbols': '\u20A0-\u20CF', 'Combining Marks for Symbols': '\u20D0-\u20FF', 'Letterlike Symbols': '\u2100-\u214F', 'Number Forms': '\u2150-\u218F', 'Arrows': '\u2190-\u21FF', 'Mathematical Operators': '\u2200-\u22FF', 'Miscellaneous Technical': '\u2300-\u23FF', 'Control Pictures': '\u2400-\u243F', 'Optical Character Recognition': '\u2440-\u245F', 'Enclosed Alphanumerics': '\u2460-\u24FF', 'Box Drawing': 
'\u2500-\u257F', 'Block Elements': '\u2580-\u259F', 'Geometric Shapes': '\u25A0-\u25FF', 'Miscellaneous Symbols': '\u2600-\u26FF', 'Dingbats': '\u2700-\u27BF', 'CJK Symbols and Punctuation': '\u3000-\u303F', 'Hiragana': '\u3040-\u309F', 'Katakana': '\u30A0-\u30FF', 'Bopomofo': '\u3100-\u312F', 'Hangul Compatibility Jamo': '\u3130-\u318F', 'Kanbun': '\u3190-\u319F', 'Enclosed CJK Letters and Months': '\u3200-\u32FF', 'CJK Compatibility': '\u3300-\u33FF', 'CJK Unified Ideographs': '\u4E00-\u9FFF', 'Hangul Syllables': '\uAC00-\uD7A3', 'High Surrogates': '\uD800-\uDB7F', 'High Private Use Surrogates': '\uDB80-\uDBFF', 'Low Surrogates': '\uDC00-\uDFFF', 'Private Use': '\uE000-\uF8FF', 'CJK Compatibility Ideographs': '\uF900-\uFAFF', 'Alphabetic Presentation Forms': '\uFB00-\uFB4F', 'Arabic Presentation Forms-A': '\uFB50-\uFDFF', 'Combining Half Marks': '\uFE20-\uFE2F', 'CJK Compatibility Forms': '\uFE30-\uFE4F', 'Small Form Variants': '\uFE50-\uFE6F', 'Arabic Presentation Forms-B': '\uFE70-\uFEFF', 'Halfwidth and Fullwidth Forms': '\uFF00-\uFFEF', 'Specials': '\uFEFF-\uFEFF\uFFF0-\uFFFF' } UPDATE_BLOCKS_VER_2_1_9 = { 'Arabic Presentation Forms-B': '\uFE70-\uFEFE', 'Specials': '\uFEFF-\uFEFF\uFFF0-\uFFFD' } UPDATE_BLOCKS_VER_3_0_0 = { 'Syriac': '\u0700-\u074F', 'Thaana': '\u0780-\u07BF', 'Sinhala': '\u0D80-\u0DFF', 'Tibetan': '\u0F00-\u0FFF', 'Myanmar': '\u1000-\u109F', 'Ethiopic': '\u1200-\u137F', 'Cherokee': '\u13A0-\u13FF', 'Unified Canadian Aboriginal Syllabics': '\u1400-\u167F', 'Ogham': '\u1680-\u169F', 'Runic': '\u16A0-\u16FF', 'Khmer': '\u1780-\u17FF', 'Mongolian': '\u1800-\u18AF', 'Braille Patterns': '\u2800-\u28FF', 'CJK Radicals Supplement': '\u2E80-\u2EFF', 'Kangxi Radicals': '\u2F00-\u2FDF', 'Ideographic Description Characters': '\u2FF0-\u2FFF', 'Bopomofo Extended': '\u31A0-\u31BF', 'CJK Unified Ideographs Extension A': '\u3400-\u4DB5', 'Yi Syllables': '\uA000-\uA48F', 'Yi Radicals': '\uA490-\uA4CF' } UPDATE_BLOCKS_VER_3_1_0 = { 'Private Use': 
'\uE000-\uF8FF\U000F0000-\U000FFFFD\U00100000-\U0010FFFD', 'Old Italic': '\U00010300-\U0001032F', 'Gothic': '\U00010330-\U0001034F', 'Deseret': '\U00010400-\U0001044F', 'Byzantine Musical Symbols': '\U0001D000-\U0001D0FF', 'Musical Symbols': '\U0001D100-\U0001D1FF', 'Mathematical Alphanumeric Symbols': '\U0001D400-\U0001D7FF', 'CJK Unified Ideographs Extension B': '\U00020000-\U0002A6D6', 'CJK Compatibility Ideographs Supplement': '\U0002F800-\U0002FA1F', 'Tags': '\U000E0000-\U000E007F' } REMOVED_BLOCKS_VER_3_1_1 = [ 'Greek', 'Combining Marks for Symbols', 'Private Use' ] UPDATE_BLOCKS_VER_3_1_1 = { 'Greek and Coptic': '\u0370-\u03FF', 'Cyrillic Supplementary': '\u0500-\u052F', 'Tagalog': '\u1700-\u171F', 'Hanunoo': '\u1720-\u173F', 'Buhid': '\u1740-\u175F', 'Tagbanwa': '\u1760-\u177F', 'Combining Diacritical Marks for Symbols': '\u20D0-\u20FF', 'Miscellaneous Mathematical Symbols-A': '\u27C0-\u27EF', 'Supplemental Arrows-A': '\u27F0-\u27FF', 'Supplemental Arrows-B': '\u2900-\u297F', 'Miscellaneous Mathematical Symbols-B': '\u2980-\u29FF', 'Supplemental Mathematical Operators': '\u2A00-\u2AFF', 'Katakana Phonetic Extensions': '\u31F0-\u31FF', 'CJK Unified Ideographs Extension A': '\u3400-\u4DBF', 'Hangul Syllables': '\uAC00-\uD7AF', 'Private Use Area': '\uE000-\uF8FF', 'Variation Selectors': '\uFE00-\uFE0F', 'Arabic Presentation Forms-B': '\uFE70-\uFEFF', 'Specials': '\uFFF0-\uFFFF', 'CJK Unified Ideographs Extension B': '\U00020000-\U0002A6DF', 'Supplementary Private Use Area-A': '\U000F0000-\U000FFFFF', 'Supplementary Private Use Area-B': '\U00100000-\U0010FFFF' } UPDATE_BLOCKS_VER_4_0_0 = { 'Limbu': '\u1900-\u194F', 'Tai Le': '\u1950-\u197F', 'Khmer Symbols': '\u19E0-\u19FF', 'Phonetic Extensions': '\u1D00-\u1D7F', 'Miscellaneous Symbols and Arrows': '\u2B00-\u2BFF', 'Yijing Hexagram Symbols': '\u4DC0-\u4DFF', 'Linear B Syllabary': '\U00010000-\U0001007F', 'Linear B Ideograms': '\U00010080-\U000100FF', 'Aegean Numbers': '\U00010100-\U0001013F', 'Ugaritic': 
'\U00010380-\U0001039F', 'Shavian': '\U00010450-\U0001047F', 'Osmanya': '\U00010480-\U000104AF', 'Cypriot Syllabary': '\U00010800-\U0001083F', 'Tai Xuan Jing Symbols': '\U0001D300-\U0001D35F', 'Variation Selectors Supplement': '\U000E0100-\U000E01EF' } REMOVED_BLOCKS_VER_4_0_1 = [ 'Cyrillic Supplementary' ] UPDATE_BLOCKS_VER_4_0_1 = { 'Cyrillic Supplement': '\u0500-\u052F' } UPDATE_BLOCKS_VER_4_1_0 = { 'Arabic Supplement': '\u0750-\u077F', 'Ethiopic Supplement': '\u1380-\u139F', 'New Tai Lue': '\u1980-\u19DF', 'Buginese': '\u1A00-\u1A1F', 'Phonetic Extensions Supplement': '\u1D80-\u1DBF', 'Combining Diacritical Marks Supplement': '\u1DC0-\u1DFF', 'Glagolitic': '\u2C00-\u2C5F', 'Coptic': '\u2C80-\u2CFF', 'Georgian Supplement': '\u2D00-\u2D2F', 'Tifinagh': '\u2D30-\u2D7F', 'Ethiopic Extended': '\u2D80-\u2DDF', 'Supplemental Punctuation': '\u2E00-\u2E7F', 'CJK Strokes': '\u31C0-\u31EF', 'Modifier Tone Letters': '\uA700-\uA71F', 'Syloti Nagri': '\uA800-\uA82F', 'Vertical Forms': '\uFE10-\uFE1F', 'Ancient Greek Numbers': '\U00010140-\U0001018F', 'Old Persian': '\U000103A0-\U000103DF', 'Kharoshthi': '\U00010A00-\U00010A5F', 'Ancient Greek Musical Notation': '\U0001D200-\U0001D24F' } UPDATE_BLOCKS_VER_5_0_0 = { 'NKo': '\u07C0-\u07FF', 'Balinese': '\u1B00-\u1B7F', 'Latin Extended-C': '\u2C60-\u2C7F', 'Latin Extended-D': '\uA720-\uA7FF', 'Phags-pa': '\uA840-\uA87F', 'Phoenician': '\U00010900-\U0001091F', 'Cuneiform': '\U00012000-\U000123FF', 'Cuneiform Numbers and Punctuation': '\U00012400-\U0001247F', 'Counting Rod Numerals': '\U0001D360-\U0001D37F' } UPDATE_BLOCKS_VER_5_1_0 = { 'Sundanese': '\u1B80-\u1BBF', 'Lepcha': '\u1C00-\u1C4F', 'Ol Chiki': '\u1C50-\u1C7F', 'Cyrillic Extended-A': '\u2DE0-\u2DFF', 'Vai': '\uA500-\uA63F', 'Cyrillic Extended-B': '\uA640-\uA69F', 'Saurashtra': '\uA880-\uA8DF', 'Kayah Li': '\uA900-\uA92F', 'Rejang': '\uA930-\uA95F', 'Cham': '\uAA00-\uAA5F', 'Ancient Symbols': '\U00010190-\U000101CF', 'Phaistos Disc': '\U000101D0-\U000101FF', 'Lycian': 
'\U00010280-\U0001029F', 'Carian': '\U000102A0-\U000102DF', 'Lydian': '\U00010920-\U0001093F', 'Mahjong Tiles': '\U0001F000-\U0001F02F', 'Domino Tiles': '\U0001F030-\U0001F09F' } UPDATE_BLOCKS_VER_5_2_0 = { 'Samaritan': '\u0800-\u083F', 'Unified Canadian Aboriginal Syllabics Extended': '\u18B0-\u18FF', 'Tai Tham': '\u1A20-\u1AAF', 'Vedic Extensions': '\u1CD0-\u1CFF', 'Lisu': '\uA4D0-\uA4FF', 'Bamum': '\uA6A0-\uA6FF', 'Common Indic Number Forms': '\uA830-\uA83F', 'Devanagari Extended': '\uA8E0-\uA8FF', 'Hangul Jamo Extended-A': '\uA960-\uA97F', 'Javanese': '\uA980-\uA9DF', 'Myanmar Extended-A': '\uAA60-\uAA7F', 'Tai Viet': '\uAA80-\uAADF', 'Meetei Mayek': '\uABC0-\uABFF', 'Hangul Jamo Extended-B': '\uD7B0-\uD7FF', 'Imperial Aramaic': '\U00010840-\U0001085F', 'Old South Arabian': '\U00010A60-\U00010A7F', 'Avestan': '\U00010B00-\U00010B3F', 'Inscriptional Parthian': '\U00010B40-\U00010B5F', 'Inscriptional Pahlavi': '\U00010B60-\U00010B7F', 'Old Turkic': '\U00010C00-\U00010C4F', 'Rumi Numeral Symbols': '\U00010E60-\U00010E7F', 'Kaithi': '\U00011080-\U000110CF', 'Egyptian Hieroglyphs': '\U00013000-\U0001342F', 'Enclosed Alphanumeric Supplement': '\U0001F100-\U0001F1FF', 'Enclosed Ideographic Supplement': '\U0001F200-\U0001F2FF', 'CJK Unified Ideographs Extension C': '\U0002A700-\U0002B73F' } UPDATE_BLOCKS_VER_6_0_0 = { 'Mandaic': '\u0840-\u085F', 'Batak': '\u1BC0-\u1BFF', 'Ethiopic Extended-A': '\uAB00-\uAB2F', 'Brahmi': '\U00011000-\U0001107F', 'Bamum Supplement': '\U00016800-\U00016A3F', 'Kana Supplement': '\U0001B000-\U0001B0FF', 'Playing Cards': '\U0001F0A0-\U0001F0FF', 'Miscellaneous Symbols And Pictographs': '\U0001F300-\U0001F5FF', 'Emoticons': '\U0001F600-\U0001F64F', 'Transport And Map Symbols': '\U0001F680-\U0001F6FF', 'Alchemical Symbols': '\U0001F700-\U0001F77F', 'CJK Unified Ideographs Extension D': '\U0002B740-\U0002B81F' } UPDATE_BLOCKS_VER_6_1_0 = { 'Arabic Extended-A': '\u08A0-\u08FF', 'Sundanese Supplement': '\u1CC0-\u1CCF', 'Meetei Mayek Extensions': 
'\uAAE0-\uAAFF', 'Meroitic Hieroglyphs': '\U00010980-\U0001099F', 'Meroitic Cursive': '\U000109A0-\U000109FF', 'Sora Sompeng': '\U000110D0-\U000110FF', 'Chakma': '\U00011100-\U0001114F', 'Sharada': '\U00011180-\U000111DF', 'Takri': '\U00011680-\U000116CF', 'Miao': '\U00016F00-\U00016F9F', 'Arabic Mathematical Alphabetic Symbols': '\U0001EE00-\U0001EEFF' } REMOVED_BLOCKS_VER_7_0_0 = [ 'Miscellaneous Symbols And Pictographs', 'Transport And Map Symbols' ] UPDATE_BLOCKS_VER_7_0_0 = { 'Combining Diacritical Marks Extended': '\u1AB0-\u1AFF', 'Myanmar Extended-B': '\uA9E0-\uA9FF', 'Latin Extended-E': '\uAB30-\uAB6F', 'Coptic Epact Numbers': '\U000102E0-\U000102FF', 'Old Permic': '\U00010350-\U0001037F', 'Elbasan': '\U00010500-\U0001052F', 'Caucasian Albanian': '\U00010530-\U0001056F', 'Linear A': '\U00010600-\U0001077F', 'Palmyrene': '\U00010860-\U0001087F', 'Nabataean': '\U00010880-\U000108AF', 'Old North Arabian': '\U00010A80-\U00010A9F', 'Manichaean': '\U00010AC0-\U00010AFF', 'Psalter Pahlavi': '\U00010B80-\U00010BAF', 'Mahajani': '\U00011150-\U0001117F', 'Sinhala Archaic Numbers': '\U000111E0-\U000111FF', 'Khojki': '\U00011200-\U0001124F', 'Khudawadi': '\U000112B0-\U000112FF', 'Grantha': '\U00011300-\U0001137F', 'Tirhuta': '\U00011480-\U000114DF', 'Siddham': '\U00011580-\U000115FF', 'Modi': '\U00011600-\U0001165F', 'Warang Citi': '\U000118A0-\U000118FF', 'Pau Cin Hau': '\U00011AC0-\U00011AFF', 'Mro': '\U00016A40-\U00016A6F', 'Bassa Vah': '\U00016AD0-\U00016AFF', 'Pahawh Hmong': '\U00016B00-\U00016B8F', 'Duployan': '\U0001BC00-\U0001BC9F', 'Shorthand Format Controls': '\U0001BCA0-\U0001BCAF', 'Mende Kikakui': '\U0001E800-\U0001E8DF', 'Miscellaneous Symbols and Pictographs': '\U0001F300-\U0001F5FF', 'Ornamental Dingbats': '\U0001F650-\U0001F67F', 'Transport and Map Symbols': '\U0001F680-\U0001F6FF', 'Geometric Shapes Extended': '\U0001F780-\U0001F7FF', 'Supplemental Arrows-C': '\U0001F800-\U0001F8FF' } UPDATE_BLOCKS_VER_8_0_0 = { 'Cherokee Supplement': '\uAB70-\uABBF', 
'Hatran': '\U000108E0-\U000108FF', 'Old Hungarian': '\U00010C80-\U00010CFF', 'Multani': '\U00011280-\U000112AF', 'Ahom': '\U00011700-\U0001173F', 'Early Dynastic Cuneiform': '\U00012480-\U0001254F', 'Anatolian Hieroglyphs': '\U00014400-\U0001467F', 'Sutton SignWriting': '\U0001D800-\U0001DAAF', 'Supplemental Symbols and Pictographs': '\U0001F900-\U0001F9FF', 'CJK Unified Ideographs Extension E': '\U0002B820-\U0002CEAF' } UPDATE_BLOCKS_VER_9_0_0 = { 'Cyrillic Extended-C': '\u1C80-\u1C8F', 'Osage': '\U000104B0-\U000104FF', 'Newa': '\U00011400-\U0001147F', 'Mongolian Supplement': '\U00011660-\U0001167F', 'Bhaiksuki': '\U00011C00-\U00011C6F', 'Marchen': '\U00011C70-\U00011CBF', 'Ideographic Symbols and Punctuation': '\U00016FE0-\U00016FFF', 'Tangut': '\U00017000-\U000187FF', 'Tangut Components': '\U00018800-\U00018AFF', 'Glagolitic Supplement': '\U0001E000-\U0001E02F', 'Adlam': '\U0001E900-\U0001E95F' } UPDATE_BLOCKS_VER_10_0_0 = { 'Syriac Supplement': '\u0860-\u086F', 'Zanabazar Square': '\U00011A00-\U00011A4F', 'Soyombo': '\U00011A50-\U00011AAF', 'Masaram Gondi': '\U00011D00-\U00011D5F', 'Kana Extended-A': '\U0001B100-\U0001B12F', 'Nushu': '\U0001B170-\U0001B2FF', 'CJK Unified Ideographs Extension F': '\U0002CEB0-\U0002EBEF' } UPDATE_BLOCKS_VER_11_0_0 = { 'Georgian Extended': '\u1C90-\u1CBF', 'Hanifi Rohingya': '\U00010D00-\U00010D3F', 'Old Sogdian': '\U00010F00-\U00010F2F', 'Sogdian': '\U00010F30-\U00010F6F', 'Dogra': '\U00011800-\U0001184F', 'Gunjala Gondi': '\U00011D60-\U00011DAF', 'Makasar': '\U00011EE0-\U00011EFF', 'Medefaidrin': '\U00016E40-\U00016E9F', 'Mayan Numerals': '\U0001D2E0-\U0001D2FF', 'Indic Siyaq Numbers': '\U0001EC70-\U0001ECBF', 'Chess Symbols': '\U0001FA00-\U0001FA6F' } UPDATE_BLOCKS_VER_12_0_0 = { 'Elymaic': '\U00010FE0-\U00010FFF', 'Nandinagari': '\U000119A0-\U000119FF', 'Tamil Supplement': '\U00011FC0-\U00011FFF', 'Egyptian Hieroglyph Format Controls': '\U00013430-\U0001343F', 'Small Kana Extension': '\U0001B130-\U0001B16F', 'Nyiakeng Puachue 
Hmong': '\U0001E100-\U0001E14F', 'Wancho': '\U0001E2C0-\U0001E2FF', 'Ottoman Siyaq Numbers': '\U0001ED00-\U0001ED4F', 'Symbols and Pictographs Extended-A': '\U0001FA70-\U0001FAFF' } UPDATE_BLOCKS_VER_13_0_0 = { 'Yezidi': '\U00010E80-\U00010EBF', 'Chorasmian': '\U00010FB0-\U00010FDF', 'Dives Akuru': '\U00011900-\U0001195F', 'Lisu Supplement': '\U00011FB0-\U00011FBF', 'Khitan Small Script': '\U00018B00-\U00018CFF', 'Tangut Supplement': '\U00018D00-\U00018D8F', 'Symbols for Legacy Computing': '\U0001FB00-\U0001FBFF', 'CJK Unified Ideographs Extension G': '\U00030000-\U0003134F' } UPDATE_BLOCKS_VER_14_0_0 = { 'Arabic Extended-B': '\u0870-\u089F', 'Vithkuqi': '\U00010570-\U000105BF', 'Latin Extended-F': '\U00010780-\U000107BF', 'Old Uyghur': '\U00010F70-\U00010FAF', 'Ahom': '\U00011700-\U0001174F', 'Unified Canadian Aboriginal Syllabics Extended-A': '\U00011AB0-\U00011ABF', 'Cypro-Minoan': '\U00012F90-\U00012FFF', 'Tangsa': '\U00016A70-\U00016ACF', 'Tangut Supplement': '\U00018D00-\U00018D7F', 'Kana Extended-B': '\U0001AFF0-\U0001AFFF', 'Znamenny Musical Notation': '\U0001CF00-\U0001CFCF', 'Latin Extended-G': '\U0001DF00-\U0001DFFF', 'Toto': '\U0001E290-\U0001E2BF', 'Ethiopic Extended-B': '\U0001E7E0-\U0001E7FF' } UPDATE_BLOCKS_VER_15_0_0 = { 'Arabic Extended-C': '\U00010EC0-\U00010EFF', 'Devanagari Extended-A': '\U00011B00-\U00011B5F', 'Kawi': '\U00011F00-\U00011F5F', 'Egyptian Hieroglyph Format Controls': '\U00013430-\U0001345F', 'Kaktovik Numerals': '\U0001D2C0-\U0001D2DF', 'Cyrillic Extended-D': '\U0001E030-\U0001E08F', 'Nag Mundari': '\U0001E4D0-\U0001E4FF', 'CJK Unified Ideographs Extension H': '\U00031350-\U000323AF' } UPDATE_BLOCKS_VER_15_1_0 = { 'CJK Unified Ideographs Extension I': '\U0002EBF0-\U0002EE5F' } UPDATE_BLOCKS_VER_16_0_0 = { 'Todhri': '\U000105C0-\U000105FF', 'Garay': '\U00010D40-\U00010D8F', 'Tulu-Tigalari': '\U00011380-\U000113FF', 'Myanmar Extended-C': '\U000116D0-\U000116FF', 'Sunuwar': '\U00011BC0-\U00011BFF', 'Egyptian Hieroglyphs Extended-A': 
'\U00013460-\U000143FF', 'Gurung Khema': '\U00016100-\U0001613F', 'Kirat Rai': '\U00016D40-\U00016D7F', 'Symbols for Legacy Computing Supplement': '\U0001CC00-\U0001CEBF', 'Ol Onal': '\U0001E5D0-\U0001E5FF' } sissaschool-elementpath-d3688c7/elementpath/regex/unicode_categories.py000066400000000000000000003205161476131650400265630ustar00rootroot00000000000000# # Copyright (c), 2018-2024, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or https://opensource.org/licenses/MIT. # # @author Davide Brunato # # --- Auto-generated code: don't edit this file --- # UNICODE_VERSIONS = [ '12.1.0', '13.0.0', '14.0.0', '15.0.0', '15.1.0', '16.0.0' ] UNICODE_CATEGORIES = { 'C': [(0, 32), (127, 160), 173, (888, 890), (896, 900), 907, 909, 930, 1328, (1367, 1369), (1419, 1421), 1424, (1480, 1488), (1515, 1519), (1525, 1542), (1564, 1566), 1757, (1806, 1808), (1867, 1869), (1970, 1984), (2043, 2045), (2094, 2096), 2111, (2140, 2142), 2143, (2155, 2208), 2229, (2238, 2259), 2274, 2436, (2445, 2447), (2449, 2451), 2473, 2481, (2483, 2486), (2490, 2492), (2501, 2503), (2505, 2507), (2511, 2519), (2520, 2524), 2526, (2532, 2534), (2559, 2561), 2564, (2571, 2575), (2577, 2579), 2601, 2609, 2612, 2615, (2618, 2620), 2621, (2627, 2631), (2633, 2635), (2638, 2641), (2642, 2649), 2653, (2655, 2662), (2679, 2689), 2692, 2702, 2706, 2729, 2737, 2740, (2746, 2748), 2758, 2762, (2766, 2768), (2769, 2784), (2788, 2790), (2802, 2809), 2816, 2820, (2829, 2831), (2833, 2835), 2857, 2865, 2868, (2874, 2876), (2885, 2887), (2889, 2891), (2894, 2902), (2904, 2908), 2910, (2916, 2918), (2936, 2946), 2948, (2955, 2958), 2961, (2966, 2969), 2971, 2973, (2976, 2979), (2981, 2984), (2987, 2990), (3002, 3006), (3011, 3014), 3017, (3022, 3024), (3025, 3031), (3032, 3046), (3067, 3072), 3085, 3089, 3113, (3130, 3133), 3141, 3145, (3150, 3157), 
3159, (3163, 3168), (3172, 3174), (3184, 3191), 3213, 3217, 3241, 3252, (3258, 3260), 3269, 3273, (3278, 3285), (3287, 3294), 3295, (3300, 3302), 3312, (3315, 3328), 3332, 3341, 3345, 3397, 3401, (3408, 3412), (3428, 3430), (3456, 3458), 3460, (3479, 3482), 3506, 3516, (3518, 3520), (3527, 3530), (3531, 3535), 3541, 3543, (3552, 3558), (3568, 3570), (3573, 3585), (3643, 3647), (3676, 3713), 3715, 3717, 3723, 3748, 3750, (3774, 3776), 3781, 3783, (3790, 3792), (3802, 3804), (3808, 3840), 3912, (3949, 3953), 3992, 4029, 4045, (4059, 4096), 4294, (4296, 4301), (4302, 4304), 4681, (4686, 4688), 4695, 4697, (4702, 4704), 4745, (4750, 4752), 4785, (4790, 4792), 4799, 4801, (4806, 4808), 4823, 4881, (4886, 4888), (4955, 4957), (4989, 4992), (5018, 5024), (5110, 5112), (5118, 5120), (5789, 5792), (5881, 5888), 5901, (5909, 5920), (5943, 5952), (5972, 5984), 5997, 6001, (6004, 6016), (6110, 6112), (6122, 6128), (6138, 6144), (6158, 6160), (6170, 6176), (6265, 6272), (6315, 6320), (6390, 6400), 6431, (6444, 6448), (6460, 6464), (6465, 6468), (6510, 6512), (6517, 6528), (6572, 6576), (6602, 6608), (6619, 6622), (6684, 6686), 6751, (6781, 6783), (6794, 6800), (6810, 6816), (6830, 6832), (6847, 6912), (6988, 6992), (7037, 7040), (7156, 7164), (7224, 7227), (7242, 7245), (7305, 7312), (7355, 7357), (7368, 7376), (7419, 7424), 7674, (7958, 7960), (7966, 7968), (8006, 8008), (8014, 8016), 8024, 8026, 8028, 8030, (8062, 8064), 8117, 8133, (8148, 8150), 8156, (8176, 8178), 8181, 8191, (8203, 8208), (8234, 8239), (8288, 8304), (8306, 8308), 8335, (8349, 8352), (8384, 8400), (8433, 8448), (8588, 8592), (9255, 9280), (9291, 9312), (11124, 11126), (11158, 11160), 11311, 11359, (11508, 11513), 11558, (11560, 11565), (11566, 11568), (11624, 11631), (11633, 11647), (11671, 11680), 11687, 11695, 11703, 11711, 11719, 11727, 11735, 11743, (11856, 11904), 11930, (12020, 12032), (12246, 12272), (12284, 12288), 12352, (12439, 12441), (12544, 12549), 12592, 12687, (12731, 12736), (12772, 12784), 
12831, (19894, 19904), (40944, 40960), (42125, 42128), (42183, 42192), (42540, 42560), (42744, 42752), (42944, 42946), (42951, 42999), (43052, 43056), (43066, 43072), (43128, 43136), (43206, 43214), (43226, 43232), (43348, 43359), (43389, 43392), 43470, (43482, 43486), 43519, (43575, 43584), (43598, 43600), (43610, 43612), (43715, 43739), (43767, 43777), (43783, 43785), (43791, 43793), (43799, 43808), 43815, 43823, (43880, 43888), (44014, 44016), (44026, 44032), (55204, 55216), (55239, 55243), (55292, 63744), (64110, 64112), (64218, 64256), (64263, 64275), (64280, 64285), 64311, 64317, 64319, 64322, 64325, (64450, 64467), (64832, 64848), (64912, 64914), (64968, 65008), (65022, 65024), (65050, 65056), 65107, 65127, (65132, 65136), 65141, (65277, 65281), (65471, 65474), (65480, 65482), (65488, 65490), (65496, 65498), (65501, 65504), 65511, (65519, 65532), (65534, 65536), 65548, 65575, 65595, 65598, (65614, 65616), (65630, 65664), (65787, 65792), (65795, 65799), (65844, 65847), 65935, (65948, 65952), (65953, 66000), (66046, 66176), (66205, 66208), (66257, 66272), (66300, 66304), (66340, 66349), (66379, 66384), (66427, 66432), 66462, (66500, 66504), (66518, 66560), (66718, 66720), (66730, 66736), (66772, 66776), (66812, 66816), (66856, 66864), (66916, 66927), (66928, 67072), (67383, 67392), (67414, 67424), (67432, 67584), (67590, 67592), 67593, 67638, (67641, 67644), (67645, 67647), 67670, (67743, 67751), (67760, 67808), 67827, (67830, 67835), (67868, 67871), (67898, 67903), (67904, 67968), (68024, 68028), (68048, 68050), 68100, (68103, 68108), 68116, 68120, (68150, 68152), (68155, 68159), (68169, 68176), (68185, 68192), (68256, 68288), (68327, 68331), (68343, 68352), (68406, 68409), (68438, 68440), (68467, 68472), (68498, 68505), (68509, 68521), (68528, 68608), (68681, 68736), (68787, 68800), (68851, 68858), (68904, 68912), (68922, 69216), (69247, 69376), (69416, 69424), (69466, 69600), (69623, 69632), (69710, 69714), (69744, 69759), 69821, (69826, 69840), (69865, 
69872), (69882, 69888), 69941, (69959, 69968), (70007, 70016), (70094, 70096), 70112, (70133, 70144), 70162, (70207, 70272), 70279, 70281, 70286, 70302, (70314, 70320), (70379, 70384), (70394, 70400), 70404, (70413, 70415), (70417, 70419), 70441, 70449, 70452, 70458, (70469, 70471), (70473, 70475), (70478, 70480), (70481, 70487), (70488, 70493), (70500, 70502), (70509, 70512), (70517, 70656), 70746, 70748, (70752, 70784), (70856, 70864), (70874, 71040), (71094, 71096), (71134, 71168), (71237, 71248), (71258, 71264), (71277, 71296), (71353, 71360), (71370, 71424), (71451, 71453), (71468, 71472), (71488, 71680), (71740, 71840), (71923, 71935), (71936, 72096), (72104, 72106), (72152, 72154), (72165, 72192), (72264, 72272), (72355, 72384), (72441, 72704), 72713, 72759, (72774, 72784), (72813, 72816), (72848, 72850), 72872, (72887, 72960), 72967, 72970, (73015, 73018), 73019, 73022, (73032, 73040), (73050, 73056), 73062, 73065, 73103, 73106, (73113, 73120), (73130, 73440), (73465, 73664), (73714, 73727), (74650, 74752), 74863, (74869, 74880), (75076, 77824), (78895, 82944), (83527, 92160), (92729, 92736), 92767, (92778, 92782), (92784, 92880), (92910, 92912), (92918, 92928), (92998, 93008), 93018, 93026, (93048, 93053), (93072, 93760), (93851, 93952), (94027, 94031), (94088, 94095), (94112, 94176), (94180, 94208), (100344, 100352), (101107, 110592), (110879, 110928), (110931, 110948), (110952, 110960), (111356, 113664), (113771, 113776), (113789, 113792), (113801, 113808), (113818, 113820), (113824, 118784), (119030, 119040), (119079, 119081), (119155, 119163), (119273, 119296), (119366, 119520), (119540, 119552), (119639, 119648), (119673, 119808), 119893, 119965, (119968, 119970), (119971, 119973), (119975, 119977), 119981, 119994, 119996, 120004, 120070, (120075, 120077), 120085, 120093, 120122, 120127, 120133, (120135, 120138), 120145, (120486, 120488), (120780, 120782), (121484, 121499), 121504, (121520, 122880), 122887, (122905, 122907), 122914, 122917, (122923, 
123136), (123181, 123184), (123198, 123200), (123210, 123214), (123216, 123584), (123642, 123647), (123648, 124928), (125125, 125127), (125143, 125184), (125260, 125264), (125274, 125278), (125280, 126065), (126133, 126209), (126270, 126464), 126468, 126496, 126499, (126501, 126503), 126504, 126515, 126520, 126522, (126524, 126530), (126531, 126535), 126536, 126538, 126540, 126544, 126547, (126549, 126551), 126552, 126554, 126556, 126558, 126560, 126563, (126565, 126567), 126571, 126579, 126584, 126589, 126591, 126602, (126620, 126625), 126628, 126634, (126652, 126704), (126706, 126976), (127020, 127024), (127124, 127136), (127151, 127153), 127168, 127184, (127222, 127232), (127245, 127248), (127341, 127344), (127405, 127462), (127491, 127504), (127548, 127552), (127561, 127568), (127570, 127584), (127590, 127744), (128726, 128736), (128749, 128752), (128763, 128768), (128884, 128896), (128985, 128992), (129004, 129024), (129036, 129040), (129096, 129104), (129114, 129120), (129160, 129168), (129198, 129280), 129292, 129394, (129399, 129402), (129443, 129445), (129451, 129454), (129483, 129485), (129620, 129632), (129646, 129648), (129652, 129656), (129659, 129664), (129667, 129680), (129686, 131072), (173783, 173824), (177973, 177984), (178206, 178208), (183970, 183984), (191457, 194560), (195102, 917760), (918000, 1114112)], 'Cc': [(0, 32), (127, 160)], 'Cf': [173, (1536, 1542), 1564, 1757, 1807, 2274, 6158, (8203, 8208), (8234, 8239), (8288, 8293), (8294, 8304), 65279, (65529, 65532), 69821, 69837, (78896, 78905), (113824, 113828), (119155, 119163), 917505, (917536, 917632)], 'Cn': [(888, 890), (896, 900), 907, 909, 930, 1328, (1367, 1369), (1419, 1421), 1424, (1480, 1488), (1515, 1519), (1525, 1536), 1565, 1806, (1867, 1869), (1970, 1984), (2043, 2045), (2094, 2096), 2111, (2140, 2142), 2143, (2155, 2208), 2229, (2238, 2259), 2436, (2445, 2447), (2449, 2451), 2473, 2481, (2483, 2486), (2490, 2492), (2501, 2503), (2505, 2507), (2511, 2519), (2520, 2524), 2526, 
(2532, 2534), (2559, 2561), 2564, (2571, 2575), (2577, 2579), 2601, 2609, 2612, 2615, (2618, 2620), 2621, (2627, 2631), (2633, 2635), (2638, 2641), (2642, 2649), 2653, (2655, 2662), (2679, 2689), 2692, 2702, 2706, 2729, 2737, 2740, (2746, 2748), 2758, 2762, (2766, 2768), (2769, 2784), (2788, 2790), (2802, 2809), 2816, 2820, (2829, 2831), (2833, 2835), 2857, 2865, 2868, (2874, 2876), (2885, 2887), (2889, 2891), (2894, 2902), (2904, 2908), 2910, (2916, 2918), (2936, 2946), 2948, (2955, 2958), 2961, (2966, 2969), 2971, 2973, (2976, 2979), (2981, 2984), (2987, 2990), (3002, 3006), (3011, 3014), 3017, (3022, 3024), (3025, 3031), (3032, 3046), (3067, 3072), 3085, 3089, 3113, (3130, 3133), 3141, 3145, (3150, 3157), 3159, (3163, 3168), (3172, 3174), (3184, 3191), 3213, 3217, 3241, 3252, (3258, 3260), 3269, 3273, (3278, 3285), (3287, 3294), 3295, (3300, 3302), 3312, (3315, 3328), 3332, 3341, 3345, 3397, 3401, (3408, 3412), (3428, 3430), (3456, 3458), 3460, (3479, 3482), 3506, 3516, (3518, 3520), (3527, 3530), (3531, 3535), 3541, 3543, (3552, 3558), (3568, 3570), (3573, 3585), (3643, 3647), (3676, 3713), 3715, 3717, 3723, 3748, 3750, (3774, 3776), 3781, 3783, (3790, 3792), (3802, 3804), (3808, 3840), 3912, (3949, 3953), 3992, 4029, 4045, (4059, 4096), 4294, (4296, 4301), (4302, 4304), 4681, (4686, 4688), 4695, 4697, (4702, 4704), 4745, (4750, 4752), 4785, (4790, 4792), 4799, 4801, (4806, 4808), 4823, 4881, (4886, 4888), (4955, 4957), (4989, 4992), (5018, 5024), (5110, 5112), (5118, 5120), (5789, 5792), (5881, 5888), 5901, (5909, 5920), (5943, 5952), (5972, 5984), 5997, 6001, (6004, 6016), (6110, 6112), (6122, 6128), (6138, 6144), 6159, (6170, 6176), (6265, 6272), (6315, 6320), (6390, 6400), 6431, (6444, 6448), (6460, 6464), (6465, 6468), (6510, 6512), (6517, 6528), (6572, 6576), (6602, 6608), (6619, 6622), (6684, 6686), 6751, (6781, 6783), (6794, 6800), (6810, 6816), (6830, 6832), (6847, 6912), (6988, 6992), (7037, 7040), (7156, 7164), (7224, 7227), (7242, 7245), (7305, 
7312), (7355, 7357), (7368, 7376), (7419, 7424), 7674, (7958, 7960), (7966, 7968), (8006, 8008), (8014, 8016), 8024, 8026, 8028, 8030, (8062, 8064), 8117, 8133, (8148, 8150), 8156, (8176, 8178), 8181, 8191, 8293, (8306, 8308), 8335, (8349, 8352), (8384, 8400), (8433, 8448), (8588, 8592), (9255, 9280), (9291, 9312), (11124, 11126), (11158, 11160), 11311, 11359, (11508, 11513), 11558, (11560, 11565), (11566, 11568), (11624, 11631), (11633, 11647), (11671, 11680), 11687, 11695, 11703, 11711, 11719, 11727, 11735, 11743, (11856, 11904), 11930, (12020, 12032), (12246, 12272), (12284, 12288), 12352, (12439, 12441), (12544, 12549), 12592, 12687, (12731, 12736), (12772, 12784), 12831, (19894, 19904), (40944, 40960), (42125, 42128), (42183, 42192), (42540, 42560), (42744, 42752), (42944, 42946), (42951, 42999), (43052, 43056), (43066, 43072), (43128, 43136), (43206, 43214), (43226, 43232), (43348, 43359), (43389, 43392), 43470, (43482, 43486), 43519, (43575, 43584), (43598, 43600), (43610, 43612), (43715, 43739), (43767, 43777), (43783, 43785), (43791, 43793), (43799, 43808), 43815, 43823, (43880, 43888), (44014, 44016), (44026, 44032), (55204, 55216), (55239, 55243), (55292, 55296), (64110, 64112), (64218, 64256), (64263, 64275), (64280, 64285), 64311, 64317, 64319, 64322, 64325, (64450, 64467), (64832, 64848), (64912, 64914), (64968, 65008), (65022, 65024), (65050, 65056), 65107, 65127, (65132, 65136), 65141, (65277, 65279), 65280, (65471, 65474), (65480, 65482), (65488, 65490), (65496, 65498), (65501, 65504), 65511, (65519, 65529), (65534, 65536), 65548, 65575, 65595, 65598, (65614, 65616), (65630, 65664), (65787, 65792), (65795, 65799), (65844, 65847), 65935, (65948, 65952), (65953, 66000), (66046, 66176), (66205, 66208), (66257, 66272), (66300, 66304), (66340, 66349), (66379, 66384), (66427, 66432), 66462, (66500, 66504), (66518, 66560), (66718, 66720), (66730, 66736), (66772, 66776), (66812, 66816), (66856, 66864), (66916, 66927), (66928, 67072), (67383, 67392), 
(67414, 67424), (67432, 67584), (67590, 67592), 67593, 67638, (67641, 67644), (67645, 67647), 67670, (67743, 67751), (67760, 67808), 67827, (67830, 67835), (67868, 67871), (67898, 67903), (67904, 67968), (68024, 68028), (68048, 68050), 68100, (68103, 68108), 68116, 68120, (68150, 68152), (68155, 68159), (68169, 68176), (68185, 68192), (68256, 68288), (68327, 68331), (68343, 68352), (68406, 68409), (68438, 68440), (68467, 68472), (68498, 68505), (68509, 68521), (68528, 68608), (68681, 68736), (68787, 68800), (68851, 68858), (68904, 68912), (68922, 69216), (69247, 69376), (69416, 69424), (69466, 69600), (69623, 69632), (69710, 69714), (69744, 69759), (69826, 69837), (69838, 69840), (69865, 69872), (69882, 69888), 69941, (69959, 69968), (70007, 70016), (70094, 70096), 70112, (70133, 70144), 70162, (70207, 70272), 70279, 70281, 70286, 70302, (70314, 70320), (70379, 70384), (70394, 70400), 70404, (70413, 70415), (70417, 70419), 70441, 70449, 70452, 70458, (70469, 70471), (70473, 70475), (70478, 70480), (70481, 70487), (70488, 70493), (70500, 70502), (70509, 70512), (70517, 70656), 70746, 70748, (70752, 70784), (70856, 70864), (70874, 71040), (71094, 71096), (71134, 71168), (71237, 71248), (71258, 71264), (71277, 71296), (71353, 71360), (71370, 71424), (71451, 71453), (71468, 71472), (71488, 71680), (71740, 71840), (71923, 71935), (71936, 72096), (72104, 72106), (72152, 72154), (72165, 72192), (72264, 72272), (72355, 72384), (72441, 72704), 72713, 72759, (72774, 72784), (72813, 72816), (72848, 72850), 72872, (72887, 72960), 72967, 72970, (73015, 73018), 73019, 73022, (73032, 73040), (73050, 73056), 73062, 73065, 73103, 73106, (73113, 73120), (73130, 73440), (73465, 73664), (73714, 73727), (74650, 74752), 74863, (74869, 74880), (75076, 77824), 78895, (78905, 82944), (83527, 92160), (92729, 92736), 92767, (92778, 92782), (92784, 92880), (92910, 92912), (92918, 92928), (92998, 93008), 93018, 93026, (93048, 93053), (93072, 93760), (93851, 93952), (94027, 94031), (94088, 
94095), (94112, 94176), (94180, 94208), (100344, 100352), (101107, 110592), (110879, 110928), (110931, 110948), (110952, 110960), (111356, 113664), (113771, 113776), (113789, 113792), (113801, 113808), (113818, 113820), (113828, 118784), (119030, 119040), (119079, 119081), (119273, 119296), (119366, 119520), (119540, 119552), (119639, 119648), (119673, 119808), 119893, 119965, (119968, 119970), (119971, 119973), (119975, 119977), 119981, 119994, 119996, 120004, 120070, (120075, 120077), 120085, 120093, 120122, 120127, 120133, (120135, 120138), 120145, (120486, 120488), (120780, 120782), (121484, 121499), 121504, (121520, 122880), 122887, (122905, 122907), 122914, 122917, (122923, 123136), (123181, 123184), (123198, 123200), (123210, 123214), (123216, 123584), (123642, 123647), (123648, 124928), (125125, 125127), (125143, 125184), (125260, 125264), (125274, 125278), (125280, 126065), (126133, 126209), (126270, 126464), 126468, 126496, 126499, (126501, 126503), 126504, 126515, 126520, 126522, (126524, 126530), (126531, 126535), 126536, 126538, 126540, 126544, 126547, (126549, 126551), 126552, 126554, 126556, 126558, 126560, 126563, (126565, 126567), 126571, 126579, 126584, 126589, 126591, 126602, (126620, 126625), 126628, 126634, (126652, 126704), (126706, 126976), (127020, 127024), (127124, 127136), (127151, 127153), 127168, 127184, (127222, 127232), (127245, 127248), (127341, 127344), (127405, 127462), (127491, 127504), (127548, 127552), (127561, 127568), (127570, 127584), (127590, 127744), (128726, 128736), (128749, 128752), (128763, 128768), (128884, 128896), (128985, 128992), (129004, 129024), (129036, 129040), (129096, 129104), (129114, 129120), (129160, 129168), (129198, 129280), 129292, 129394, (129399, 129402), (129443, 129445), (129451, 129454), (129483, 129485), (129620, 129632), (129646, 129648), (129652, 129656), (129659, 129664), (129667, 129680), (129686, 131072), (173783, 173824), (177973, 177984), (178206, 178208), (183970, 183984), (191457, 194560), 
(195102, 917505), (917506, 917536), (917632, 917760), (918000, 983040), (1048574, 1048576), (1114110, 1114112)], 'Co': [(57344, 63744), (983040, 1048574), (1048576, 1114110)], 'Cs': [(55296, 57344)], 'L': [(65, 91), (97, 123), 170, 181, 186, (192, 215), (216, 247), (248, 706), (710, 722), (736, 741), 748, 750, (880, 885), (886, 888), (890, 894), 895, 902, (904, 907), 908, (910, 930), (931, 1014), (1015, 1154), (1162, 1328), (1329, 1367), 1369, (1376, 1417), (1488, 1515), (1519, 1523), (1568, 1611), (1646, 1648), (1649, 1748), 1749, (1765, 1767), (1774, 1776), (1786, 1789), 1791, 1808, (1810, 1840), (1869, 1958), 1969, (1994, 2027), (2036, 2038), 2042, (2048, 2070), 2074, 2084, 2088, (2112, 2137), (2144, 2155), (2208, 2229), (2230, 2238), (2308, 2362), 2365, 2384, (2392, 2402), (2417, 2433), (2437, 2445), (2447, 2449), (2451, 2473), (2474, 2481), 2482, (2486, 2490), 2493, 2510, (2524, 2526), (2527, 2530), (2544, 2546), 2556, (2565, 2571), (2575, 2577), (2579, 2601), (2602, 2609), (2610, 2612), (2613, 2615), (2616, 2618), (2649, 2653), 2654, (2674, 2677), (2693, 2702), (2703, 2706), (2707, 2729), (2730, 2737), (2738, 2740), (2741, 2746), 2749, 2768, (2784, 2786), 2809, (2821, 2829), (2831, 2833), (2835, 2857), (2858, 2865), (2866, 2868), (2869, 2874), 2877, (2908, 2910), (2911, 2914), 2929, 2947, (2949, 2955), (2958, 2961), (2962, 2966), (2969, 2971), 2972, (2974, 2976), (2979, 2981), (2984, 2987), (2990, 3002), 3024, (3077, 3085), (3086, 3089), (3090, 3113), (3114, 3130), 3133, (3160, 3163), (3168, 3170), 3200, (3205, 3213), (3214, 3217), (3218, 3241), (3242, 3252), (3253, 3258), 3261, 3294, (3296, 3298), (3313, 3315), (3333, 3341), (3342, 3345), (3346, 3387), 3389, 3406, (3412, 3415), (3423, 3426), (3450, 3456), (3461, 3479), (3482, 3506), (3507, 3516), 3517, (3520, 3527), (3585, 3633), (3634, 3636), (3648, 3655), (3713, 3715), 3716, (3718, 3723), (3724, 3748), 3749, (3751, 3761), (3762, 3764), 3773, (3776, 3781), 3782, (3804, 3808), 3840, (3904, 3912), (3913, 
3949), (3976, 3981), (4096, 4139), 4159, (4176, 4182), (4186, 4190), 4193, (4197, 4199), (4206, 4209), (4213, 4226), 4238, (4256, 4294), 4295, 4301, (4304, 4347), (4348, 4681), (4682, 4686), (4688, 4695), 4696, (4698, 4702), (4704, 4745), (4746, 4750), (4752, 4785), (4786, 4790), (4792, 4799), 4800, (4802, 4806), (4808, 4823), (4824, 4881), (4882, 4886), (4888, 4955), (4992, 5008), (5024, 5110), (5112, 5118), (5121, 5741), (5743, 5760), (5761, 5787), (5792, 5867), (5873, 5881), (5888, 5901), (5902, 5906), (5920, 5938), (5952, 5970), (5984, 5997), (5998, 6001), (6016, 6068), 6103, 6108, (6176, 6265), (6272, 6277), (6279, 6313), 6314, (6320, 6390), (6400, 6431), (6480, 6510), (6512, 6517), (6528, 6572), (6576, 6602), (6656, 6679), (6688, 6741), 6823, (6917, 6964), (6981, 6988), (7043, 7073), (7086, 7088), (7098, 7142), (7168, 7204), (7245, 7248), (7258, 7294), (7296, 7305), (7312, 7355), (7357, 7360), (7401, 7405), (7406, 7412), (7413, 7415), 7418, (7424, 7616), (7680, 7958), (7960, 7966), (7968, 8006), (8008, 8014), (8016, 8024), 8025, 8027, 8029, (8031, 8062), (8064, 8117), (8118, 8125), 8126, (8130, 8133), (8134, 8141), (8144, 8148), (8150, 8156), (8160, 8173), (8178, 8181), (8182, 8189), 8305, 8319, (8336, 8349), 8450, 8455, (8458, 8468), 8469, (8473, 8478), 8484, 8486, 8488, (8490, 8494), (8495, 8506), (8508, 8512), (8517, 8522), 8526, (8579, 8581), (11264, 11311), (11312, 11359), (11360, 11493), (11499, 11503), (11506, 11508), (11520, 11558), 11559, 11565, (11568, 11624), 11631, (11648, 11671), (11680, 11687), (11688, 11695), (11696, 11703), (11704, 11711), (11712, 11719), (11720, 11727), (11728, 11735), (11736, 11743), 11823, (12293, 12295), (12337, 12342), (12347, 12349), (12353, 12439), (12445, 12448), (12449, 12539), (12540, 12544), (12549, 12592), (12593, 12687), (12704, 12731), (12784, 12800), (13312, 19894), (19968, 40944), (40960, 42125), (42192, 42238), (42240, 42509), (42512, 42528), (42538, 42540), (42560, 42607), (42623, 42654), (42656, 42726), 
(42775, 42784), (42786, 42889), (42891, 42944), (42946, 42951), (42999, 43010), (43011, 43014), (43015, 43019), (43020, 43043), (43072, 43124), (43138, 43188), (43250, 43256), 43259, (43261, 43263), (43274, 43302), (43312, 43335), (43360, 43389), (43396, 43443), 43471, (43488, 43493), (43494, 43504), (43514, 43519), (43520, 43561), (43584, 43587), (43588, 43596), (43616, 43639), 43642, (43646, 43696), 43697, (43701, 43703), (43705, 43710), 43712, 43714, (43739, 43742), (43744, 43755), (43762, 43765), (43777, 43783), (43785, 43791), (43793, 43799), (43808, 43815), (43816, 43823), (43824, 43867), (43868, 43880), (43888, 44003), (44032, 55204), (55216, 55239), (55243, 55292), (63744, 64110), (64112, 64218), (64256, 64263), (64275, 64280), 64285, (64287, 64297), (64298, 64311), (64312, 64317), 64318, (64320, 64322), (64323, 64325), (64326, 64434), (64467, 64830), (64848, 64912), (64914, 64968), (65008, 65020), (65136, 65141), (65142, 65277), (65313, 65339), (65345, 65371), (65382, 65471), (65474, 65480), (65482, 65488), (65490, 65496), (65498, 65501), (65536, 65548), (65549, 65575), (65576, 65595), (65596, 65598), (65599, 65614), (65616, 65630), (65664, 65787), (66176, 66205), (66208, 66257), (66304, 66336), (66349, 66369), (66370, 66378), (66384, 66422), (66432, 66462), (66464, 66500), (66504, 66512), (66560, 66718), (66736, 66772), (66776, 66812), (66816, 66856), (66864, 66916), (67072, 67383), (67392, 67414), (67424, 67432), (67584, 67590), 67592, (67594, 67638), (67639, 67641), 67644, (67647, 67670), (67680, 67703), (67712, 67743), (67808, 67827), (67828, 67830), (67840, 67862), (67872, 67898), (67968, 68024), (68030, 68032), 68096, (68112, 68116), (68117, 68120), (68121, 68150), (68192, 68221), (68224, 68253), (68288, 68296), (68297, 68325), (68352, 68406), (68416, 68438), (68448, 68467), (68480, 68498), (68608, 68681), (68736, 68787), (68800, 68851), (68864, 68900), (69376, 69405), 69415, (69424, 69446), (69600, 69623), (69635, 69688), (69763, 69808), (69840, 
69865), (69891, 69927), 69956, (69968, 70003), 70006, (70019, 70067), (70081, 70085), 70106, 70108, (70144, 70162), (70163, 70188), (70272, 70279), 70280, (70282, 70286), (70287, 70302), (70303, 70313), (70320, 70367), (70405, 70413), (70415, 70417), (70419, 70441), (70442, 70449), (70450, 70452), (70453, 70458), 70461, 70480, (70493, 70498), (70656, 70709), (70727, 70731), 70751, (70784, 70832), (70852, 70854), 70855, (71040, 71087), (71128, 71132), (71168, 71216), 71236, (71296, 71339), 71352, (71424, 71451), (71680, 71724), (71840, 71904), 71935, (72096, 72104), (72106, 72145), 72161, 72163, 72192, (72203, 72243), 72250, 72272, (72284, 72330), 72349, (72384, 72441), (72704, 72713), (72714, 72751), 72768, (72818, 72848), (72960, 72967), (72968, 72970), (72971, 73009), 73030, (73056, 73062), (73063, 73065), (73066, 73098), 73112, (73440, 73459), (73728, 74650), (74880, 75076), (77824, 78895), (82944, 83527), (92160, 92729), (92736, 92767), (92880, 92910), (92928, 92976), (92992, 92996), (93027, 93048), (93053, 93072), (93760, 93824), (93952, 94027), 94032, (94099, 94112), (94176, 94178), 94179, (94208, 100344), (100352, 101107), (110592, 110879), (110928, 110931), (110948, 110952), (110960, 111356), (113664, 113771), (113776, 113789), (113792, 113801), (113808, 113818), (119808, 119893), (119894, 119965), (119966, 119968), 119970, (119973, 119975), (119977, 119981), (119982, 119994), 119995, (119997, 120004), (120005, 120070), (120071, 120075), (120077, 120085), (120086, 120093), (120094, 120122), (120123, 120127), (120128, 120133), 120134, (120138, 120145), (120146, 120486), (120488, 120513), (120514, 120539), (120540, 120571), (120572, 120597), (120598, 120629), (120630, 120655), (120656, 120687), (120688, 120713), (120714, 120745), (120746, 120771), (120772, 120780), (123136, 123181), (123191, 123198), 123214, (123584, 123628), (124928, 125125), (125184, 125252), 125259, (126464, 126468), (126469, 126496), (126497, 126499), 126500, 126503, (126505, 126515), 
(126516, 126520), 126521, 126523, 126530, 126535, 126537, 126539, (126541, 126544), (126545, 126547), 126548, 126551, 126553, 126555, 126557, 126559, (126561, 126563), 126564, (126567, 126571), (126572, 126579), (126580, 126584), (126585, 126589), 126590, (126592, 126602), (126603, 126620), (126625, 126628), (126629, 126634), (126635, 126652), (131072, 173783), (173824, 177973), (177984, 178206), (178208, 183970), (183984, 191457), (194560, 195102)], 'Ll': [(97, 123), 181, (223, 247), (248, 256), 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, (311, 313), 314, 316, 318, 320, 322, 324, 326, (328, 330), 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, (382, 385), 387, 389, 392, (396, 398), 402, 405, (409, 412), 414, 417, 419, 421, 424, (426, 428), 429, 432, 436, 438, (441, 443), (445, 448), 454, 457, 460, 462, 464, 466, 468, 470, 472, 474, (476, 478), 479, 481, 483, 485, 487, 489, 491, 493, (495, 497), 499, 501, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, (563, 570), 572, (575, 577), 578, 583, 585, 587, 589, (591, 660), (661, 688), 881, 883, 887, (891, 894), 912, (940, 975), (976, 978), (981, 984), 985, 987, 989, 991, 993, 995, 997, 999, 1001, 1003, 1005, (1007, 1012), 1013, 1016, (1019, 1021), (1072, 1120), 1121, 1123, 1125, 1127, 1129, 1131, 1133, 1135, 1137, 1139, 1141, 1143, 1145, 1147, 1149, 1151, 1153, 1163, 1165, 1167, 1169, 1171, 1173, 1175, 1177, 1179, 1181, 1183, 1185, 1187, 1189, 1191, 1193, 1195, 1197, 1199, 1201, 1203, 1205, 1207, 1209, 1211, 1213, 1215, 1218, 1220, 1222, 1224, 1226, 1228, (1230, 1232), 1233, 1235, 1237, 1239, 1241, 1243, 1245, 1247, 1249, 1251, 1253, 1255, 1257, 1259, 1261, 1263, 1265, 1267, 1269, 1271, 1273, 1275, 1277, 1279, 1281, 1283, 1285, 1287, 1289, 1291, 1293, 1295, 
1297, 1299, 1301, 1303, 1305, 1307, 1309, 1311, 1313, 1315, 1317, 1319, 1321, 1323, 1325, 1327, (1376, 1417), (4304, 4347), (4349, 4352), (5112, 5118), (7296, 7305), (7424, 7468), (7531, 7544), (7545, 7579), 7681, 7683, 7685, 7687, 7689, 7691, 7693, 7695, 7697, 7699, 7701, 7703, 7705, 7707, 7709, 7711, 7713, 7715, 7717, 7719, 7721, 7723, 7725, 7727, 7729, 7731, 7733, 7735, 7737, 7739, 7741, 7743, 7745, 7747, 7749, 7751, 7753, 7755, 7757, 7759, 7761, 7763, 7765, 7767, 7769, 7771, 7773, 7775, 7777, 7779, 7781, 7783, 7785, 7787, 7789, 7791, 7793, 7795, 7797, 7799, 7801, 7803, 7805, 7807, 7809, 7811, 7813, 7815, 7817, 7819, 7821, 7823, 7825, 7827, (7829, 7838), 7839, 7841, 7843, 7845, 7847, 7849, 7851, 7853, 7855, 7857, 7859, 7861, 7863, 7865, 7867, 7869, 7871, 7873, 7875, 7877, 7879, 7881, 7883, 7885, 7887, 7889, 7891, 7893, 7895, 7897, 7899, 7901, 7903, 7905, 7907, 7909, 7911, 7913, 7915, 7917, 7919, 7921, 7923, 7925, 7927, 7929, 7931, 7933, (7935, 7944), (7952, 7958), (7968, 7976), (7984, 7992), (8000, 8006), (8016, 8024), (8032, 8040), (8048, 8062), (8064, 8072), (8080, 8088), (8096, 8104), (8112, 8117), (8118, 8120), 8126, (8130, 8133), (8134, 8136), (8144, 8148), (8150, 8152), (8160, 8168), (8178, 8181), (8182, 8184), 8458, (8462, 8464), 8467, 8495, 8500, 8505, (8508, 8510), (8518, 8522), 8526, 8580, (11312, 11359), 11361, (11365, 11367), 11368, 11370, 11372, 11377, (11379, 11381), (11382, 11388), 11393, 11395, 11397, 11399, 11401, 11403, 11405, 11407, 11409, 11411, 11413, 11415, 11417, 11419, 11421, 11423, 11425, 11427, 11429, 11431, 11433, 11435, 11437, 11439, 11441, 11443, 11445, 11447, 11449, 11451, 11453, 11455, 11457, 11459, 11461, 11463, 11465, 11467, 11469, 11471, 11473, 11475, 11477, 11479, 11481, 11483, 11485, 11487, 11489, (11491, 11493), 11500, 11502, 11507, (11520, 11558), 11559, 11565, 42561, 42563, 42565, 42567, 42569, 42571, 42573, 42575, 42577, 42579, 42581, 42583, 42585, 42587, 42589, 42591, 42593, 42595, 42597, 42599, 42601, 42603, 42605, 
42625, 42627, 42629, 42631, 42633, 42635, 42637, 42639, 42641, 42643, 42645, 42647, 42649, 42651, 42787, 42789, 42791, 42793, 42795, 42797, (42799, 42802), 42803, 42805, 42807, 42809, 42811, 42813, 42815, 42817, 42819, 42821, 42823, 42825, 42827, 42829, 42831, 42833, 42835, 42837, 42839, 42841, 42843, 42845, 42847, 42849, 42851, 42853, 42855, 42857, 42859, 42861, 42863, (42865, 42873), 42874, 42876, 42879, 42881, 42883, 42885, 42887, 42892, 42894, 42897, (42899, 42902), 42903, 42905, 42907, 42909, 42911, 42913, 42915, 42917, 42919, 42921, 42927, 42933, 42935, 42937, 42939, 42941, 42943, 42947, 43002, (43824, 43867), (43872, 43880), (43888, 43968), (64256, 64263), (64275, 64280), (65345, 65371), (66600, 66640), (66776, 66812), (68800, 68851), (71872, 71904), (93792, 93824), (119834, 119860), (119886, 119893), (119894, 119912), (119938, 119964), (119990, 119994), 119995, (119997, 120004), (120005, 120016), (120042, 120068), (120094, 120120), (120146, 120172), (120198, 120224), (120250, 120276), (120302, 120328), (120354, 120380), (120406, 120432), (120458, 120486), (120514, 120539), (120540, 120546), (120572, 120597), (120598, 120604), (120630, 120655), (120656, 120662), (120688, 120713), (120714, 120720), (120746, 120771), (120772, 120778), 120779, (125218, 125252)], 'Lm': [(688, 706), (710, 722), (736, 741), 748, 750, 884, 890, 1369, 1600, (1765, 1767), (2036, 2038), 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, (7288, 7294), (7468, 7531), 7544, (7579, 7616), 8305, 8319, (8336, 8349), (11388, 11390), 11631, 11823, 12293, (12337, 12342), 12347, (12445, 12447), (12540, 12543), 40981, (42232, 42238), 42508, 42623, (42652, 42654), (42775, 42784), 42864, 42888, (43000, 43002), 43471, 43494, 43632, 43741, (43763, 43765), (43868, 43872), 65392, (65438, 65440), (92992, 92996), (94099, 94112), (94176, 94178), 94179, (123191, 123198), 125259], 'Lo': [170, 186, 443, (448, 452), 660, (1488, 1515), (1519, 1523), (1568, 1600), (1601, 1611), (1646, 1648), 
(1649, 1748), 1749, (1774, 1776), (1786, 1789), 1791, 1808, (1810, 1840), (1869, 1958), 1969, (1994, 2027), (2048, 2070), (2112, 2137), (2144, 2155), (2208, 2229), (2230, 2238), (2308, 2362), 2365, 2384, (2392, 2402), (2418, 2433), (2437, 2445), (2447, 2449), (2451, 2473), (2474, 2481), 2482, (2486, 2490), 2493, 2510, (2524, 2526), (2527, 2530), (2544, 2546), 2556, (2565, 2571), (2575, 2577), (2579, 2601), (2602, 2609), (2610, 2612), (2613, 2615), (2616, 2618), (2649, 2653), 2654, (2674, 2677), (2693, 2702), (2703, 2706), (2707, 2729), (2730, 2737), (2738, 2740), (2741, 2746), 2749, 2768, (2784, 2786), 2809, (2821, 2829), (2831, 2833), (2835, 2857), (2858, 2865), (2866, 2868), (2869, 2874), 2877, (2908, 2910), (2911, 2914), 2929, 2947, (2949, 2955), (2958, 2961), (2962, 2966), (2969, 2971), 2972, (2974, 2976), (2979, 2981), (2984, 2987), (2990, 3002), 3024, (3077, 3085), (3086, 3089), (3090, 3113), (3114, 3130), 3133, (3160, 3163), (3168, 3170), 3200, (3205, 3213), (3214, 3217), (3218, 3241), (3242, 3252), (3253, 3258), 3261, 3294, (3296, 3298), (3313, 3315), (3333, 3341), (3342, 3345), (3346, 3387), 3389, 3406, (3412, 3415), (3423, 3426), (3450, 3456), (3461, 3479), (3482, 3506), (3507, 3516), 3517, (3520, 3527), (3585, 3633), (3634, 3636), (3648, 3654), (3713, 3715), 3716, (3718, 3723), (3724, 3748), 3749, (3751, 3761), (3762, 3764), 3773, (3776, 3781), (3804, 3808), 3840, (3904, 3912), (3913, 3949), (3976, 3981), (4096, 4139), 4159, (4176, 4182), (4186, 4190), 4193, (4197, 4199), (4206, 4209), (4213, 4226), 4238, (4352, 4681), (4682, 4686), (4688, 4695), 4696, (4698, 4702), (4704, 4745), (4746, 4750), (4752, 4785), (4786, 4790), (4792, 4799), 4800, (4802, 4806), (4808, 4823), (4824, 4881), (4882, 4886), (4888, 4955), (4992, 5008), (5121, 5741), (5743, 5760), (5761, 5787), (5792, 5867), (5873, 5881), (5888, 5901), (5902, 5906), (5920, 5938), (5952, 5970), (5984, 5997), (5998, 6001), (6016, 6068), 6108, (6176, 6211), (6212, 6265), (6272, 6277), (6279, 6313), 6314, 
(6320, 6390), (6400, 6431), (6480, 6510), (6512, 6517), (6528, 6572), (6576, 6602), (6656, 6679), (6688, 6741), (6917, 6964), (6981, 6988), (7043, 7073), (7086, 7088), (7098, 7142), (7168, 7204), (7245, 7248), (7258, 7288), (7401, 7405), (7406, 7412), (7413, 7415), 7418, (8501, 8505), (11568, 11624), (11648, 11671), (11680, 11687), (11688, 11695), (11696, 11703), (11704, 11711), (11712, 11719), (11720, 11727), (11728, 11735), (11736, 11743), 12294, 12348, (12353, 12439), 12447, (12449, 12539), 12543, (12549, 12592), (12593, 12687), (12704, 12731), (12784, 12800), (13312, 19894), (19968, 40944), (40960, 40981), (40982, 42125), (42192, 42232), (42240, 42508), (42512, 42528), (42538, 42540), 42606, (42656, 42726), 42895, 42999, (43003, 43010), (43011, 43014), (43015, 43019), (43020, 43043), (43072, 43124), (43138, 43188), (43250, 43256), 43259, (43261, 43263), (43274, 43302), (43312, 43335), (43360, 43389), (43396, 43443), (43488, 43493), (43495, 43504), (43514, 43519), (43520, 43561), (43584, 43587), (43588, 43596), (43616, 43632), (43633, 43639), 43642, (43646, 43696), 43697, (43701, 43703), (43705, 43710), 43712, 43714, (43739, 43741), (43744, 43755), 43762, (43777, 43783), (43785, 43791), (43793, 43799), (43808, 43815), (43816, 43823), (43968, 44003), (44032, 55204), (55216, 55239), (55243, 55292), (63744, 64110), (64112, 64218), 64285, (64287, 64297), (64298, 64311), (64312, 64317), 64318, (64320, 64322), (64323, 64325), (64326, 64434), (64467, 64830), (64848, 64912), (64914, 64968), (65008, 65020), (65136, 65141), (65142, 65277), (65382, 65392), (65393, 65438), (65440, 65471), (65474, 65480), (65482, 65488), (65490, 65496), (65498, 65501), (65536, 65548), (65549, 65575), (65576, 65595), (65596, 65598), (65599, 65614), (65616, 65630), (65664, 65787), (66176, 66205), (66208, 66257), (66304, 66336), (66349, 66369), (66370, 66378), (66384, 66422), (66432, 66462), (66464, 66500), (66504, 66512), (66640, 66718), (66816, 66856), (66864, 66916), (67072, 67383), (67392, 
67414), (67424, 67432), (67584, 67590), 67592, (67594, 67638), (67639, 67641), 67644, (67647, 67670), (67680, 67703), (67712, 67743), (67808, 67827), (67828, 67830), (67840, 67862), (67872, 67898), (67968, 68024), (68030, 68032), 68096, (68112, 68116), (68117, 68120), (68121, 68150), (68192, 68221), (68224, 68253), (68288, 68296), (68297, 68325), (68352, 68406), (68416, 68438), (68448, 68467), (68480, 68498), (68608, 68681), (68864, 68900), (69376, 69405), 69415, (69424, 69446), (69600, 69623), (69635, 69688), (69763, 69808), (69840, 69865), (69891, 69927), 69956, (69968, 70003), 70006, (70019, 70067), (70081, 70085), 70106, 70108, (70144, 70162), (70163, 70188), (70272, 70279), 70280, (70282, 70286), (70287, 70302), (70303, 70313), (70320, 70367), (70405, 70413), (70415, 70417), (70419, 70441), (70442, 70449), (70450, 70452), (70453, 70458), 70461, 70480, (70493, 70498), (70656, 70709), (70727, 70731), 70751, (70784, 70832), (70852, 70854), 70855, (71040, 71087), (71128, 71132), (71168, 71216), 71236, (71296, 71339), 71352, (71424, 71451), (71680, 71724), 71935, (72096, 72104), (72106, 72145), 72161, 72163, 72192, (72203, 72243), 72250, 72272, (72284, 72330), 72349, (72384, 72441), (72704, 72713), (72714, 72751), 72768, (72818, 72848), (72960, 72967), (72968, 72970), (72971, 73009), 73030, (73056, 73062), (73063, 73065), (73066, 73098), 73112, (73440, 73459), (73728, 74650), (74880, 75076), (77824, 78895), (82944, 83527), (92160, 92729), (92736, 92767), (92880, 92910), (92928, 92976), (93027, 93048), (93053, 93072), (93952, 94027), 94032, (94208, 100344), (100352, 101107), (110592, 110879), (110928, 110931), (110948, 110952), (110960, 111356), (113664, 113771), (113776, 113789), (113792, 113801), (113808, 113818), (123136, 123181), 123214, (123584, 123628), (124928, 125125), (126464, 126468), (126469, 126496), (126497, 126499), 126500, 126503, (126505, 126515), (126516, 126520), 126521, 126523, 126530, 126535, 126537, 126539, (126541, 126544), (126545, 126547), 
126548, 126551, 126553, 126555, 126557, 126559, (126561, 126563), 126564, (126567, 126571), (126572, 126579), (126580, 126584), (126585, 126589), 126590, (126592, 126602), (126603, 126620), (126625, 126628), (126629, 126634), (126635, 126652), (131072, 173783), (173824, 177973), (177984, 178206), (178208, 183970), (183984, 191457), (194560, 195102)], 'Lt': [453, 456, 459, 498, (8072, 8080), (8088, 8096), (8104, 8112), 8124, 8140, 8188], 'Lu': [(65, 91), (192, 215), (216, 223), 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, (376, 378), 379, 381, (385, 387), 388, (390, 392), (393, 396), (398, 402), (403, 405), (406, 409), (412, 414), (415, 417), 418, 420, (422, 424), 425, 428, (430, 432), (433, 436), 437, (439, 441), 444, 452, 455, 458, 461, 463, 465, 467, 469, 471, 473, 475, 478, 480, 482, 484, 486, 488, 490, 492, 494, 497, 500, (502, 505), 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, (570, 572), (573, 575), 577, (579, 583), 584, 586, 588, 590, 880, 882, 886, 895, 902, (904, 907), 908, (910, 912), (913, 930), (931, 940), 975, (978, 981), 984, 986, 988, 990, 992, 994, 996, 998, 1000, 1002, 1004, 1006, 1012, 1015, (1017, 1019), (1021, 1072), 1120, 1122, 1124, 1126, 1128, 1130, 1132, 1134, 1136, 1138, 1140, 1142, 1144, 1146, 1148, 1150, 1152, 1162, 1164, 1166, 1168, 1170, 1172, 1174, 1176, 1178, 1180, 1182, 1184, 1186, 1188, 1190, 1192, 1194, 1196, 1198, 1200, 1202, 1204, 1206, 1208, 1210, 1212, 1214, (1216, 1218), 1219, 1221, 1223, 1225, 1227, 1229, 1232, 1234, 1236, 1238, 1240, 1242, 1244, 1246, 1248, 1250, 1252, 1254, 1256, 1258, 1260, 1262, 1264, 1266, 1268, 1270, 1272, 1274, 1276, 1278, 1280, 1282, 1284, 1286, 1288, 1290, 
1292, 1294, 1296, 1298, 1300, 1302, 1304, 1306, 1308, 1310, 1312, 1314, 1316, 1318, 1320, 1322, 1324, 1326, (1329, 1367), (4256, 4294), 4295, 4301, (5024, 5110), (7312, 7355), (7357, 7360), 7680, 7682, 7684, 7686, 7688, 7690, 7692, 7694, 7696, 7698, 7700, 7702, 7704, 7706, 7708, 7710, 7712, 7714, 7716, 7718, 7720, 7722, 7724, 7726, 7728, 7730, 7732, 7734, 7736, 7738, 7740, 7742, 7744, 7746, 7748, 7750, 7752, 7754, 7756, 7758, 7760, 7762, 7764, 7766, 7768, 7770, 7772, 7774, 7776, 7778, 7780, 7782, 7784, 7786, 7788, 7790, 7792, 7794, 7796, 7798, 7800, 7802, 7804, 7806, 7808, 7810, 7812, 7814, 7816, 7818, 7820, 7822, 7824, 7826, 7828, 7838, 7840, 7842, 7844, 7846, 7848, 7850, 7852, 7854, 7856, 7858, 7860, 7862, 7864, 7866, 7868, 7870, 7872, 7874, 7876, 7878, 7880, 7882, 7884, 7886, 7888, 7890, 7892, 7894, 7896, 7898, 7900, 7902, 7904, 7906, 7908, 7910, 7912, 7914, 7916, 7918, 7920, 7922, 7924, 7926, 7928, 7930, 7932, 7934, (7944, 7952), (7960, 7966), (7976, 7984), (7992, 8000), (8008, 8014), 8025, 8027, 8029, 8031, (8040, 8048), (8120, 8124), (8136, 8140), (8152, 8156), (8168, 8173), (8184, 8188), 8450, 8455, (8459, 8462), (8464, 8467), 8469, (8473, 8478), 8484, 8486, 8488, (8490, 8494), (8496, 8500), (8510, 8512), 8517, 8579, (11264, 11311), 11360, (11362, 11365), 11367, 11369, 11371, (11373, 11377), 11378, 11381, (11390, 11393), 11394, 11396, 11398, 11400, 11402, 11404, 11406, 11408, 11410, 11412, 11414, 11416, 11418, 11420, 11422, 11424, 11426, 11428, 11430, 11432, 11434, 11436, 11438, 11440, 11442, 11444, 11446, 11448, 11450, 11452, 11454, 11456, 11458, 11460, 11462, 11464, 11466, 11468, 11470, 11472, 11474, 11476, 11478, 11480, 11482, 11484, 11486, 11488, 11490, 11499, 11501, 11506, 42560, 42562, 42564, 42566, 42568, 42570, 42572, 42574, 42576, 42578, 42580, 42582, 42584, 42586, 42588, 42590, 42592, 42594, 42596, 42598, 42600, 42602, 42604, 42624, 42626, 42628, 42630, 42632, 42634, 42636, 42638, 42640, 42642, 42644, 42646, 42648, 42650, 42786, 42788, 42790, 
42792, 42794, 42796, 42798, 42802, 42804, 42806, 42808, 42810, 42812, 42814, 42816, 42818, 42820, 42822, 42824, 42826, 42828, 42830, 42832, 42834, 42836, 42838, 42840, 42842, 42844, 42846, 42848, 42850, 42852, 42854, 42856, 42858, 42860, 42862, 42873, 42875, (42877, 42879), 42880, 42882, 42884, 42886, 42891, 42893, 42896, 42898, 42902, 42904, 42906, 42908, 42910, 42912, 42914, 42916, 42918, 42920, (42922, 42927), (42928, 42933), 42934, 42936, 42938, 42940, 42942, 42946, (42948, 42951), (65313, 65339), (66560, 66600), (66736, 66772), (68736, 68787), (71840, 71872), (93760, 93792), (119808, 119834), (119860, 119886), (119912, 119938), 119964, (119966, 119968), 119970, (119973, 119975), (119977, 119981), (119982, 119990), (120016, 120042), (120068, 120070), (120071, 120075), (120077, 120085), (120086, 120093), (120120, 120122), (120123, 120127), (120128, 120133), 120134, (120138, 120145), (120172, 120198), (120224, 120250), (120276, 120302), (120328, 120354), (120380, 120406), (120432, 120458), (120488, 120513), (120546, 120571), (120604, 120629), (120662, 120687), (120720, 120745), 120778, (125184, 125218)], 'M': [(768, 880), (1155, 1162), (1425, 1470), 1471, (1473, 1475), (1476, 1478), 1479, (1552, 1563), (1611, 1632), 1648, (1750, 1757), (1759, 1765), (1767, 1769), (1770, 1774), 1809, (1840, 1867), (1958, 1969), (2027, 2036), 2045, (2070, 2074), (2075, 2084), (2085, 2088), (2089, 2094), (2137, 2140), (2259, 2274), (2275, 2308), (2362, 2365), (2366, 2384), (2385, 2392), (2402, 2404), (2433, 2436), 2492, (2494, 2501), (2503, 2505), (2507, 2510), 2519, (2530, 2532), 2558, (2561, 2564), 2620, (2622, 2627), (2631, 2633), (2635, 2638), 2641, (2672, 2674), 2677, (2689, 2692), 2748, (2750, 2758), (2759, 2762), (2763, 2766), (2786, 2788), (2810, 2816), (2817, 2820), 2876, (2878, 2885), (2887, 2889), (2891, 2894), (2902, 2904), (2914, 2916), 2946, (3006, 3011), (3014, 3017), (3018, 3022), 3031, (3072, 3077), (3134, 3141), (3142, 3145), (3146, 3150), (3157, 3159), (3170, 
3172), (3201, 3204), 3260, (3262, 3269), (3270, 3273), (3274, 3278), (3285, 3287), (3298, 3300), (3328, 3332), (3387, 3389), (3390, 3397), (3398, 3401), (3402, 3406), 3415, (3426, 3428), (3458, 3460), 3530, (3535, 3541), 3542, (3544, 3552), (3570, 3572), 3633, (3636, 3643), (3655, 3663), 3761, (3764, 3773), (3784, 3790), (3864, 3866), 3893, 3895, 3897, (3902, 3904), (3953, 3973), (3974, 3976), (3981, 3992), (3993, 4029), 4038, (4139, 4159), (4182, 4186), (4190, 4193), (4194, 4197), (4199, 4206), (4209, 4213), (4226, 4238), 4239, (4250, 4254), (4957, 4960), (5906, 5909), (5938, 5941), (5970, 5972), (6002, 6004), (6068, 6100), 6109, (6155, 6158), (6277, 6279), 6313, (6432, 6444), (6448, 6460), (6679, 6684), (6741, 6751), (6752, 6781), 6783, (6832, 6847), (6912, 6917), (6964, 6981), (7019, 7028), (7040, 7043), (7073, 7086), (7142, 7156), (7204, 7224), (7376, 7379), (7380, 7401), 7405, 7412, (7415, 7418), (7616, 7674), (7675, 7680), (8400, 8433), (11503, 11506), 11647, (11744, 11776), (12330, 12336), (12441, 12443), (42607, 42611), (42612, 42622), (42654, 42656), (42736, 42738), 43010, 43014, 43019, (43043, 43048), (43136, 43138), (43188, 43206), (43232, 43250), 43263, (43302, 43310), (43335, 43348), (43392, 43396), (43443, 43457), 43493, (43561, 43575), 43587, (43596, 43598), (43643, 43646), 43696, (43698, 43701), (43703, 43705), (43710, 43712), 43713, (43755, 43760), (43765, 43767), (44003, 44011), (44012, 44014), 64286, (65024, 65040), (65056, 65072), 66045, 66272, (66422, 66427), (68097, 68100), (68101, 68103), (68108, 68112), (68152, 68155), 68159, (68325, 68327), (68900, 68904), (69446, 69457), (69632, 69635), (69688, 69703), (69759, 69763), (69808, 69819), (69888, 69891), (69927, 69941), (69957, 69959), 70003, (70016, 70019), (70067, 70081), (70089, 70093), (70188, 70200), 70206, (70367, 70379), (70400, 70404), (70459, 70461), (70462, 70469), (70471, 70473), (70475, 70478), 70487, (70498, 70500), (70502, 70509), (70512, 70517), (70709, 70727), 70750, (70832, 
70852), (71087, 71094), (71096, 71105), (71132, 71134), (71216, 71233), (71339, 71352), (71453, 71468), (71724, 71739), (72145, 72152), (72154, 72161), 72164, (72193, 72203), (72243, 72250), (72251, 72255), 72263, (72273, 72284), (72330, 72346), (72751, 72759), (72760, 72768), (72850, 72872), (72873, 72887), (73009, 73015), 73018, (73020, 73022), (73023, 73030), 73031, (73098, 73103), (73104, 73106), (73107, 73112), (73459, 73463), (92912, 92917), (92976, 92983), 94031, (94033, 94088), (94095, 94099), (113821, 113823), (119141, 119146), (119149, 119155), (119163, 119171), (119173, 119180), (119210, 119214), (119362, 119365), (121344, 121399), (121403, 121453), 121461, 121476, (121499, 121504), (121505, 121520), (122880, 122887), (122888, 122905), (122907, 122914), (122915, 122917), (122918, 122923), (123184, 123191), (123628, 123632), (125136, 125143), (125252, 125259), (917760, 918000)], 'Mc': [2307, 2363, (2366, 2369), (2377, 2381), (2382, 2384), (2434, 2436), (2494, 2497), (2503, 2505), (2507, 2509), 2519, 2563, (2622, 2625), 2691, (2750, 2753), 2761, (2763, 2765), (2818, 2820), 2878, 2880, (2887, 2889), (2891, 2893), 2903, (3006, 3008), (3009, 3011), (3014, 3017), (3018, 3021), 3031, (3073, 3076), (3137, 3141), (3202, 3204), 3262, (3264, 3269), (3271, 3273), (3274, 3276), (3285, 3287), (3330, 3332), (3390, 3393), (3398, 3401), (3402, 3405), 3415, (3458, 3460), (3535, 3538), (3544, 3552), (3570, 3572), (3902, 3904), 3967, (4139, 4141), 4145, 4152, (4155, 4157), (4182, 4184), (4194, 4197), (4199, 4206), (4227, 4229), (4231, 4237), 4239, (4250, 4253), 6070, (6078, 6086), (6087, 6089), (6435, 6439), (6441, 6444), (6448, 6450), (6451, 6457), (6681, 6683), 6741, 6743, 6753, (6755, 6757), (6765, 6771), 6916, 6965, 6971, (6973, 6978), (6979, 6981), 7042, 7073, (7078, 7080), 7082, 7143, (7146, 7149), 7150, (7154, 7156), (7204, 7212), (7220, 7222), 7393, 7415, (12334, 12336), (43043, 43045), 43047, (43136, 43138), (43188, 43204), (43346, 43348), 43395, (43444, 43446), 
(43450, 43452), (43454, 43457), (43567, 43569), (43571, 43573), 43597, 43643, 43645, 43755, (43758, 43760), 43765, (44003, 44005), (44006, 44008), (44009, 44011), 44012, 69632, 69634, 69762, (69808, 69811), (69815, 69817), 69932, (69957, 69959), 70018, (70067, 70070), (70079, 70081), (70188, 70191), (70194, 70196), 70197, (70368, 70371), (70402, 70404), (70462, 70464), (70465, 70469), (70471, 70473), (70475, 70478), 70487, (70498, 70500), (70709, 70712), (70720, 70722), 70725, (70832, 70835), 70841, (70843, 70847), 70849, (71087, 71090), (71096, 71100), 71102, (71216, 71219), (71227, 71229), 71230, 71340, (71342, 71344), 71350, (71456, 71458), 71462, (71724, 71727), 71736, (72145, 72148), (72156, 72160), 72164, 72249, (72279, 72281), 72343, 72751, 72766, 72873, 72881, 72884, (73098, 73103), (73107, 73109), 73110, (73461, 73463), (94033, 94088), (119141, 119143), (119149, 119155)], 'Me': [(1160, 1162), 6846, (8413, 8417), (8418, 8421), (42608, 42611)], 'Mn': [(768, 880), (1155, 1160), (1425, 1470), 1471, (1473, 1475), (1476, 1478), 1479, (1552, 1563), (1611, 1632), 1648, (1750, 1757), (1759, 1765), (1767, 1769), (1770, 1774), 1809, (1840, 1867), (1958, 1969), (2027, 2036), 2045, (2070, 2074), (2075, 2084), (2085, 2088), (2089, 2094), (2137, 2140), (2259, 2274), (2275, 2307), 2362, 2364, (2369, 2377), 2381, (2385, 2392), (2402, 2404), 2433, 2492, (2497, 2501), 2509, (2530, 2532), 2558, (2561, 2563), 2620, (2625, 2627), (2631, 2633), (2635, 2638), 2641, (2672, 2674), 2677, (2689, 2691), 2748, (2753, 2758), (2759, 2761), 2765, (2786, 2788), (2810, 2816), 2817, 2876, 2879, (2881, 2885), 2893, 2902, (2914, 2916), 2946, 3008, 3021, 3072, 3076, (3134, 3137), (3142, 3145), (3146, 3150), (3157, 3159), (3170, 3172), 3201, 3260, 3263, 3270, (3276, 3278), (3298, 3300), (3328, 3330), (3387, 3389), (3393, 3397), 3405, (3426, 3428), 3530, (3538, 3541), 3542, 3633, (3636, 3643), (3655, 3663), 3761, (3764, 3773), (3784, 3790), (3864, 3866), 3893, 3895, 3897, (3953, 3967), (3968, 
3973), (3974, 3976), (3981, 3992), (3993, 4029), 4038, (4141, 4145), (4146, 4152), (4153, 4155), (4157, 4159), (4184, 4186), (4190, 4193), (4209, 4213), 4226, (4229, 4231), 4237, 4253, (4957, 4960), (5906, 5909), (5938, 5941), (5970, 5972), (6002, 6004), (6068, 6070), (6071, 6078), 6086, (6089, 6100), 6109, (6155, 6158), (6277, 6279), 6313, (6432, 6435), (6439, 6441), 6450, (6457, 6460), (6679, 6681), 6683, 6742, (6744, 6751), 6752, 6754, (6757, 6765), (6771, 6781), 6783, (6832, 6846), (6912, 6916), 6964, (6966, 6971), 6972, 6978, (7019, 7028), (7040, 7042), (7074, 7078), (7080, 7082), (7083, 7086), 7142, (7144, 7146), 7149, (7151, 7154), (7212, 7220), (7222, 7224), (7376, 7379), (7380, 7393), (7394, 7401), 7405, 7412, (7416, 7418), (7616, 7674), (7675, 7680), (8400, 8413), 8417, (8421, 8433), (11503, 11506), 11647, (11744, 11776), (12330, 12334), (12441, 12443), 42607, (42612, 42622), (42654, 42656), (42736, 42738), 43010, 43014, 43019, (43045, 43047), (43204, 43206), (43232, 43250), 43263, (43302, 43310), (43335, 43346), (43392, 43395), 43443, (43446, 43450), (43452, 43454), 43493, (43561, 43567), (43569, 43571), (43573, 43575), 43587, 43596, 43644, 43696, (43698, 43701), (43703, 43705), (43710, 43712), 43713, (43756, 43758), 43766, 44005, 44008, 44013, 64286, (65024, 65040), (65056, 65072), 66045, 66272, (66422, 66427), (68097, 68100), (68101, 68103), (68108, 68112), (68152, 68155), 68159, (68325, 68327), (68900, 68904), (69446, 69457), 69633, (69688, 69703), (69759, 69762), (69811, 69815), (69817, 69819), (69888, 69891), (69927, 69932), (69933, 69941), 70003, (70016, 70018), (70070, 70079), (70089, 70093), (70191, 70194), 70196, (70198, 70200), 70206, 70367, (70371, 70379), (70400, 70402), (70459, 70461), 70464, (70502, 70509), (70512, 70517), (70712, 70720), (70722, 70725), 70726, 70750, (70835, 70841), 70842, (70847, 70849), (70850, 70852), (71090, 71094), (71100, 71102), (71103, 71105), (71132, 71134), (71219, 71227), 71229, (71231, 71233), 71339, 71341, 
(71344, 71350), 71351, (71453, 71456), (71458, 71462), (71463, 71468), (71727, 71736), (71737, 71739), (72148, 72152), (72154, 72156), 72160, (72193, 72203), (72243, 72249), (72251, 72255), 72263, (72273, 72279), (72281, 72284), (72330, 72343), (72344, 72346), (72752, 72759), (72760, 72766), 72767, (72850, 72872), (72874, 72881), (72882, 72884), (72885, 72887), (73009, 73015), 73018, (73020, 73022), (73023, 73030), 73031, (73104, 73106), 73109, 73111, (73459, 73461), (92912, 92917), (92976, 92983), 94031, (94095, 94099), (113821, 113823), (119143, 119146), (119163, 119171), (119173, 119180), (119210, 119214), (119362, 119365), (121344, 121399), (121403, 121453), 121461, 121476, (121499, 121504), (121505, 121520), (122880, 122887), (122888, 122905), (122907, 122914), (122915, 122917), (122918, 122923), (123184, 123191), (123628, 123632), (125136, 125143), (125252, 125259), (917760, 918000)], 'N': [(48, 58), (178, 180), 185, (188, 191), (1632, 1642), (1776, 1786), (1984, 1994), (2406, 2416), (2534, 2544), (2548, 2554), (2662, 2672), (2790, 2800), (2918, 2928), (2930, 2936), (3046, 3059), (3174, 3184), (3192, 3199), (3302, 3312), (3416, 3423), (3430, 3449), (3558, 3568), (3664, 3674), (3792, 3802), (3872, 3892), (4160, 4170), (4240, 4250), (4969, 4989), (5870, 5873), (6112, 6122), (6128, 6138), (6160, 6170), (6470, 6480), (6608, 6619), (6784, 6794), (6800, 6810), (6992, 7002), (7088, 7098), (7232, 7242), (7248, 7258), 8304, (8308, 8314), (8320, 8330), (8528, 8579), (8581, 8586), (9312, 9372), (9450, 9472), (10102, 10132), 11517, 12295, (12321, 12330), (12344, 12347), (12690, 12694), (12832, 12842), (12872, 12880), (12881, 12896), (12928, 12938), (12977, 12992), (42528, 42538), (42726, 42736), (43056, 43062), (43216, 43226), (43264, 43274), (43472, 43482), (43504, 43514), (43600, 43610), (44016, 44026), (65296, 65306), (65799, 65844), (65856, 65913), (65930, 65932), (66273, 66300), (66336, 66340), 66369, 66378, (66513, 66518), (66720, 66730), (67672, 67680), (67705, 
67712), (67751, 67760), (67835, 67840), (67862, 67868), (68028, 68030), (68032, 68048), (68050, 68096), (68160, 68169), (68221, 68223), (68253, 68256), (68331, 68336), (68440, 68448), (68472, 68480), (68521, 68528), (68858, 68864), (68912, 68922), (69216, 69247), (69405, 69415), (69457, 69461), (69714, 69744), (69872, 69882), (69942, 69952), (70096, 70106), (70113, 70133), (70384, 70394), (70736, 70746), (70864, 70874), (71248, 71258), (71360, 71370), (71472, 71484), (71904, 71923), (72784, 72813), (73040, 73050), (73120, 73130), (73664, 73685), (74752, 74863), (92768, 92778), (93008, 93018), (93019, 93026), (93824, 93847), (119520, 119540), (119648, 119673), (120782, 120832), (123200, 123210), (123632, 123642), (125127, 125136), (125264, 125274), (126065, 126124), (126125, 126128), (126129, 126133), (126209, 126254), (126255, 126270), (127232, 127245)], 'Nd': [(48, 58), (1632, 1642), (1776, 1786), (1984, 1994), (2406, 2416), (2534, 2544), (2662, 2672), (2790, 2800), (2918, 2928), (3046, 3056), (3174, 3184), (3302, 3312), (3430, 3440), (3558, 3568), (3664, 3674), (3792, 3802), (3872, 3882), (4160, 4170), (4240, 4250), (6112, 6122), (6160, 6170), (6470, 6480), (6608, 6618), (6784, 6794), (6800, 6810), (6992, 7002), (7088, 7098), (7232, 7242), (7248, 7258), (42528, 42538), (43216, 43226), (43264, 43274), (43472, 43482), (43504, 43514), (43600, 43610), (44016, 44026), (65296, 65306), (66720, 66730), (68912, 68922), (69734, 69744), (69872, 69882), (69942, 69952), (70096, 70106), (70384, 70394), (70736, 70746), (70864, 70874), (71248, 71258), (71360, 71370), (71472, 71482), (71904, 71914), (72784, 72794), (73040, 73050), (73120, 73130), (92768, 92778), (93008, 93018), (120782, 120832), (123200, 123210), (123632, 123642), (125264, 125274)], 'Nl': [(5870, 5873), (8544, 8579), (8581, 8585), 12295, (12321, 12330), (12344, 12347), (42726, 42736), (65856, 65909), 66369, 66378, (66513, 66518), (74752, 74863)], 'No': [(178, 180), 185, (188, 191), (2548, 2554), (2930, 2936), 
(3056, 3059), (3192, 3199), (3416, 3423), (3440, 3449), (3882, 3892), (4969, 4989), (6128, 6138), 6618, 8304, (8308, 8314), (8320, 8330), (8528, 8544), 8585, (9312, 9372), (9450, 9472), (10102, 10132), 11517, (12690, 12694), (12832, 12842), (12872, 12880), (12881, 12896), (12928, 12938), (12977, 12992), (43056, 43062), (65799, 65844), (65909, 65913), (65930, 65932), (66273, 66300), (66336, 66340), (67672, 67680), (67705, 67712), (67751, 67760), (67835, 67840), (67862, 67868), (68028, 68030), (68032, 68048), (68050, 68096), (68160, 68169), (68221, 68223), (68253, 68256), (68331, 68336), (68440, 68448), (68472, 68480), (68521, 68528), (68858, 68864), (69216, 69247), (69405, 69415), (69457, 69461), (69714, 69734), (70113, 70133), (71482, 71484), (71914, 71923), (72794, 72813), (73664, 73685), (93019, 93026), (93824, 93847), (119520, 119540), (119648, 119673), (125127, 125136), (126065, 126124), (126125, 126128), (126129, 126133), (126209, 126254), (126255, 126270), (127232, 127245)], 'P': [(33, 36), (37, 43), (44, 48), (58, 60), (63, 65), (91, 94), 95, 123, 125, 161, 167, 171, (182, 184), 187, 191, 894, 903, (1370, 1376), (1417, 1419), 1470, 1472, 1475, 1478, (1523, 1525), (1545, 1547), (1548, 1550), 1563, (1566, 1568), (1642, 1646), 1748, (1792, 1806), (2039, 2042), (2096, 2111), 2142, (2404, 2406), 2416, 2557, 2678, 2800, 3191, 3204, 3572, 3663, (3674, 3676), (3844, 3859), 3860, (3898, 3902), 3973, (4048, 4053), (4057, 4059), (4170, 4176), 4347, (4960, 4969), 5120, 5742, (5787, 5789), (5867, 5870), (5941, 5943), (6100, 6103), (6104, 6107), (6144, 6155), (6468, 6470), (6686, 6688), (6816, 6823), (6824, 6830), (7002, 7009), (7164, 7168), (7227, 7232), (7294, 7296), (7360, 7368), 7379, (8208, 8232), (8240, 8260), (8261, 8274), (8275, 8287), (8317, 8319), (8333, 8335), (8968, 8972), (9001, 9003), (10088, 10102), (10181, 10183), (10214, 10224), (10627, 10649), (10712, 10716), (10748, 10750), (11513, 11517), (11518, 11520), 11632, (11776, 11823), (11824, 11856), (12289, 
12292), (12296, 12306), (12308, 12320), 12336, 12349, 12448, 12539, (42238, 42240), (42509, 42512), 42611, 42622, (42738, 42744), (43124, 43128), (43214, 43216), (43256, 43259), 43260, (43310, 43312), 43359, (43457, 43470), (43486, 43488), (43612, 43616), (43742, 43744), (43760, 43762), 44011, (64830, 64832), (65040, 65050), (65072, 65107), (65108, 65122), 65123, 65128, (65130, 65132), (65281, 65284), (65285, 65291), (65292, 65296), (65306, 65308), (65311, 65313), (65339, 65342), 65343, 65371, 65373, (65375, 65382), (65792, 65795), 66463, 66512, 66927, 67671, 67871, 67903, (68176, 68185), 68223, (68336, 68343), (68409, 68416), (68505, 68509), (69461, 69466), (69703, 69710), (69819, 69821), (69822, 69826), (69952, 69956), (70004, 70006), (70085, 70089), 70093, 70107, (70109, 70112), (70200, 70206), 70313, (70731, 70736), 70747, 70749, 70854, (71105, 71128), (71233, 71236), (71264, 71277), (71484, 71487), 71739, 72162, (72255, 72263), (72346, 72349), (72350, 72355), (72769, 72774), (72816, 72818), (73463, 73465), 73727, (74864, 74869), (92782, 92784), 92917, (92983, 92988), 92996, (93847, 93851), 94178, 113823, (121479, 121484), (125278, 125280)], 'Pc': [95, (8255, 8257), 8276, (65075, 65077), (65101, 65104), 65343], 'Pd': [45, 1418, 1470, 5120, 6150, (8208, 8214), 11799, 11802, (11834, 11836), 11840, 12316, 12336, 12448, (65073, 65075), 65112, 65123, 65293], 'Pe': [41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, (12318, 12320), 64830, 65048, 65078, 65080, 65082, 65084, 65086, 65088, 65090, 65092, 65096, 65114, 65116, 65118, 65289, 65341, 65373, 65376, 65379], 'Pf': [187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809], 'Pi': [171, 8216, (8219, 8221), 8223, 
8249, 11778, 11780, 11785, 11788, 11804, 11808], 'Po': [(33, 36), (37, 40), 42, 44, (46, 48), (58, 60), (63, 65), 92, 161, 167, (182, 184), 191, 894, 903, (1370, 1376), 1417, 1472, 1475, 1478, (1523, 1525), (1545, 1547), (1548, 1550), 1563, (1566, 1568), (1642, 1646), 1748, (1792, 1806), (2039, 2042), (2096, 2111), 2142, (2404, 2406), 2416, 2557, 2678, 2800, 3191, 3204, 3572, 3663, (3674, 3676), (3844, 3859), 3860, 3973, (4048, 4053), (4057, 4059), (4170, 4176), 4347, (4960, 4969), 5742, (5867, 5870), (5941, 5943), (6100, 6103), (6104, 6107), (6144, 6150), (6151, 6155), (6468, 6470), (6686, 6688), (6816, 6823), (6824, 6830), (7002, 7009), (7164, 7168), (7227, 7232), (7294, 7296), (7360, 7368), 7379, (8214, 8216), (8224, 8232), (8240, 8249), (8251, 8255), (8257, 8260), (8263, 8274), 8275, (8277, 8287), (11513, 11517), (11518, 11520), 11632, (11776, 11778), (11782, 11785), 11787, (11790, 11799), (11800, 11802), 11803, (11806, 11808), (11818, 11823), (11824, 11834), (11836, 11840), 11841, (11843, 11856), (12289, 12292), 12349, 12539, (42238, 42240), (42509, 42512), 42611, 42622, (42738, 42744), (43124, 43128), (43214, 43216), (43256, 43259), 43260, (43310, 43312), 43359, (43457, 43470), (43486, 43488), (43612, 43616), (43742, 43744), (43760, 43762), 44011, (65040, 65047), 65049, 65072, (65093, 65095), (65097, 65101), (65104, 65107), (65108, 65112), (65119, 65122), 65128, (65130, 65132), (65281, 65284), (65285, 65288), 65290, 65292, (65294, 65296), (65306, 65308), (65311, 65313), 65340, 65377, (65380, 65382), (65792, 65795), 66463, 66512, 66927, 67671, 67871, 67903, (68176, 68185), 68223, (68336, 68343), (68409, 68416), (68505, 68509), (69461, 69466), (69703, 69710), (69819, 69821), (69822, 69826), (69952, 69956), (70004, 70006), (70085, 70089), 70093, 70107, (70109, 70112), (70200, 70206), 70313, (70731, 70736), 70747, 70749, 70854, (71105, 71128), (71233, 71236), (71264, 71277), (71484, 71487), 71739, 72162, (72255, 72263), (72346, 72349), (72350, 72355), (72769, 
72774), (72816, 72818), (73463, 73465), 73727, (74864, 74869), (92782, 92784), 92917, (92983, 92988), 92996, (93847, 93851), 94178, 113823, (121479, 121484), (125278, 125280)], 'Ps': [40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317, 64831, 65047, 65077, 65079, 65081, 65083, 65085, 65087, 65089, 65091, 65095, 65113, 65115, 65117, 65288, 65339, 65371, 65375, 65378], 'S': [36, 43, (60, 63), 94, 96, 124, 126, (162, 167), (168, 170), 172, (174, 178), 180, 184, 215, 247, (706, 710), (722, 736), (741, 748), 749, (751, 768), 885, (900, 902), 1014, 1154, (1421, 1424), (1542, 1545), 1547, (1550, 1552), 1758, 1769, (1789, 1791), 2038, (2046, 2048), (2546, 2548), (2554, 2556), 2801, 2928, (3059, 3067), 3199, 3407, 3449, 3647, (3841, 3844), 3859, (3861, 3864), (3866, 3872), 3892, 3894, 3896, (4030, 4038), (4039, 4045), (4046, 4048), (4053, 4057), (4254, 4256), (5008, 5018), 5741, 6107, 6464, (6622, 6656), (7009, 7019), (7028, 7037), 8125, (8127, 8130), (8141, 8144), (8157, 8160), (8173, 8176), (8189, 8191), 8260, 8274, (8314, 8317), (8330, 8333), (8352, 8384), (8448, 8450), (8451, 8455), (8456, 8458), 8468, (8470, 8473), (8478, 8484), 8485, 8487, 8489, 8494, (8506, 8508), (8512, 8517), (8522, 8526), 8527, (8586, 8588), (8592, 8968), (8972, 9001), (9003, 9255), (9280, 9291), (9372, 9450), (9472, 10088), (10132, 10181), (10183, 10214), (10224, 10627), (10649, 10712), (10716, 10748), (10750, 11124), (11126, 11158), (11160, 11264), (11493, 11499), (11904, 11930), (11931, 12020), (12032, 12246), (12272, 12284), 12292, (12306, 12308), 12320, (12342, 12344), (12350, 12352), (12443, 12445), (12688, 12690), (12694, 12704), (12736, 12772), (12800, 12831), (12842, 
12872), 12880, (12896, 12928), (12938, 12977), (12992, 13312), (19904, 19968), (42128, 42183), (42752, 42775), (42784, 42786), (42889, 42891), (43048, 43052), (43062, 43066), (43639, 43642), 43867, 64297, (64434, 64450), (65020, 65022), 65122, (65124, 65127), 65129, 65284, 65291, (65308, 65311), 65342, 65344, 65372, 65374, (65504, 65511), (65512, 65519), (65532, 65534), (65847, 65856), (65913, 65930), (65932, 65935), (65936, 65948), 65952, (66000, 66045), (67703, 67705), 68296, 71487, (73685, 73714), (92988, 92992), 92997, 113820, (118784, 119030), (119040, 119079), (119081, 119141), (119146, 119149), (119171, 119173), (119180, 119210), (119214, 119273), (119296, 119362), 119365, (119552, 119639), 120513, 120539, 120571, 120597, 120629, 120655, 120687, 120713, 120745, 120771, (120832, 121344), (121399, 121403), (121453, 121461), (121462, 121476), (121477, 121479), 123215, 123647, 126124, 126128, 126254, (126704, 126706), (126976, 127020), (127024, 127124), (127136, 127151), (127153, 127168), (127169, 127184), (127185, 127222), (127248, 127341), (127344, 127405), (127462, 127491), (127504, 127548), (127552, 127561), (127568, 127570), (127584, 127590), (127744, 128726), (128736, 128749), (128752, 128763), (128768, 128884), (128896, 128985), (128992, 129004), (129024, 129036), (129040, 129096), (129104, 129114), (129120, 129160), (129168, 129198), (129280, 129292), (129293, 129394), (129395, 129399), (129402, 129443), (129445, 129451), (129454, 129483), (129485, 129620), (129632, 129646), (129648, 129652), (129656, 129659), (129664, 129667), (129680, 129686)], 'Sc': [36, (162, 166), 1423, 1547, (2046, 2048), (2546, 2548), 2555, 2801, 3065, 3647, 6107, (8352, 8384), 43064, 65020, 65129, 65284, (65504, 65506), (65509, 65511), (73693, 73697), 123647, 126128], 'Sk': [94, 96, 168, 175, 180, 184, (706, 710), (722, 736), (741, 748), 749, (751, 768), 885, (900, 902), 8125, (8127, 8130), (8141, 8144), (8157, 8160), (8173, 8176), (8189, 8191), (12443, 12445), (42752, 42775), 
(42784, 42786), (42889, 42891), 43867, (64434, 64450), 65342, 65344, 65507, (127995, 128000)], 'Sm': [43, (60, 63), 124, 126, 172, 177, 215, 247, 1014, (1542, 1545), 8260, 8274, (8314, 8317), (8330, 8333), 8472, (8512, 8517), 8523, (8592, 8597), (8602, 8604), 8608, 8611, 8614, 8622, (8654, 8656), 8658, 8660, (8692, 8960), (8992, 8994), 9084, (9115, 9140), (9180, 9186), 9655, 9665, (9720, 9728), 9839, (10176, 10181), (10183, 10214), (10224, 10240), (10496, 10627), (10649, 10712), (10716, 10748), (10750, 11008), (11056, 11077), (11079, 11085), 64297, 65122, (65124, 65127), 65291, (65308, 65311), 65372, 65374, 65506, (65513, 65517), 120513, 120539, 120571, 120597, 120629, 120655, 120687, 120713, 120745, 120771, (126704, 126706)], 'So': [166, 169, 174, 176, 1154, (1421, 1423), (1550, 1552), 1758, 1769, (1789, 1791), 2038, 2554, 2928, (3059, 3065), 3066, 3199, 3407, 3449, (3841, 3844), 3859, (3861, 3864), (3866, 3872), 3892, 3894, 3896, (4030, 4038), (4039, 4045), (4046, 4048), (4053, 4057), (4254, 4256), (5008, 5018), 5741, 6464, (6622, 6656), (7009, 7019), (7028, 7037), (8448, 8450), (8451, 8455), (8456, 8458), 8468, (8470, 8472), (8478, 8484), 8485, 8487, 8489, 8494, (8506, 8508), 8522, (8524, 8526), 8527, (8586, 8588), (8597, 8602), (8604, 8608), (8609, 8611), (8612, 8614), (8615, 8622), (8623, 8654), (8656, 8658), 8659, (8661, 8692), (8960, 8968), (8972, 8992), (8994, 9001), (9003, 9084), (9085, 9115), (9140, 9180), (9186, 9255), (9280, 9291), (9372, 9450), (9472, 9655), (9656, 9665), (9666, 9720), (9728, 9839), (9840, 10088), (10132, 10176), (10240, 10496), (11008, 11056), (11077, 11079), (11085, 11124), (11126, 11158), (11160, 11264), (11493, 11499), (11904, 11930), (11931, 12020), (12032, 12246), (12272, 12284), 12292, (12306, 12308), 12320, (12342, 12344), (12350, 12352), (12688, 12690), (12694, 12704), (12736, 12772), (12800, 12831), (12842, 12872), 12880, (12896, 12928), (12938, 12977), (12992, 13312), (19904, 19968), (42128, 42183), (43048, 43052), (43062, 
43064), 43065, (43639, 43642), 65021, 65508, 65512, (65517, 65519), (65532, 65534), (65847, 65856), (65913, 65930), (65932, 65935), (65936, 65948), 65952, (66000, 66045), (67703, 67705), 68296, 71487, (73685, 73693), (73697, 73714), (92988, 92992), 92997, 113820, (118784, 119030), (119040, 119079), (119081, 119141), (119146, 119149), (119171, 119173), (119180, 119210), (119214, 119273), (119296, 119362), 119365, (119552, 119639), (120832, 121344), (121399, 121403), (121453, 121461), (121462, 121476), (121477, 121479), 123215, 126124, 126254, (126976, 127020), (127024, 127124), (127136, 127151), (127153, 127168), (127169, 127184), (127185, 127222), (127248, 127341), (127344, 127405), (127462, 127491), (127504, 127548), (127552, 127561), (127568, 127570), (127584, 127590), (127744, 127995), (128000, 128726), (128736, 128749), (128752, 128763), (128768, 128884), (128896, 128985), (128992, 129004), (129024, 129036), (129040, 129096), (129104, 129114), (129120, 129160), (129168, 129198), (129280, 129292), (129293, 129394), (129395, 129399), (129402, 129443), (129445, 129451), (129454, 129483), (129485, 129620), (129632, 129646), (129648, 129652), (129656, 129659), (129664, 129667), (129680, 129686)], 'Z': [32, 160, 5760, (8192, 8203), (8232, 8234), 8239, 8287, 12288], 'Zl': [8232], 'Zp': [8233], 'Zs': [32, 160, 5760, (8192, 8203), 8239, 8287, 12288] } DIFF_CATEGORIES_VER_13_0_0 = { 'C': ([(2238, 2259), (2894, 2902), 3332, (3456, 3458), (6847, 6912), (11158, 11160), (11856, 11904), (12731, 12736), (19894, 19904), (40944, 40960), (42951, 42999), (43052, 43056), (43880, 43888), (65948, 65952), (69247, 69376), (69466, 69600), (69959, 69968), (70094, 70096), 70746, (70752, 70784), (71936, 72096), (73465, 73664), (94180, 94208), (101107, 110592), (127245, 127248), (127341, 127344), (127405, 127462), (128726, 128736), (128763, 128768), (129198, 129280), 129292, 129394, (129399, 129402), (129443, 129445), (129451, 129454), (129483, 129485), (129652, 129656), (129667, 129680), 
(129686, 131072), (173783, 173824), (195102, 917760)], [(2248, 2259), (2894, 2901), 3456, (6849, 6912), 11158, (11859, 11904), (40957, 40960), (42955, 42997), (43053, 43056), (43884, 43888), (65949, 65952), 69247, 69290, (69294, 69296), (69298, 69376), (69466, 69552), (69580, 69600), (69960, 69968), (70754, 70784), (71943, 71945), (71946, 71948), 71956, 71959, 71990, (71993, 71995), (72007, 72016), (72026, 72096), (73465, 73648), (73649, 73664), (94181, 94192), (94194, 94208), (101590, 101632), (101641, 110592), (127406, 127462), (128728, 128736), (128765, 128768), (129198, 129200), (129202, 129280), 129401, 129484, (129653, 129656), (129671, 129680), (129705, 129712), (129719, 129728), (129731, 129744), (129751, 129792), 129939, (129995, 130032), (130042, 131072), (173790, 173824), (195102, 196608), (201547, 917760)]), 'Cn': ([(2238, 2259), (2894, 2902), 3332, (3456, 3458), (6847, 6912), (11158, 11160), (11856, 11904), (12731, 12736), (19894, 19904), (40944, 40960), (42951, 42999), (43052, 43056), (43880, 43888), (65948, 65952), (69247, 69376), (69466, 69600), (69959, 69968), (70094, 70096), 70746, (70752, 70784), (71936, 72096), (73465, 73664), (94180, 94208), (101107, 110592), (127245, 127248), (127341, 127344), (127405, 127462), (128726, 128736), (128763, 128768), (129198, 129280), 129292, 129394, (129399, 129402), (129443, 129445), (129451, 129454), (129483, 129485), (129652, 129656), (129667, 129680), (129686, 131072), (173783, 173824), (195102, 917505)], [(2248, 2259), (2894, 2901), 3456, (6849, 6912), 11158, (11859, 11904), (40957, 40960), (42955, 42997), (43053, 43056), (43884, 43888), (65949, 65952), 69247, 69290, (69294, 69296), (69298, 69376), (69466, 69552), (69580, 69600), (69960, 69968), (70754, 70784), (71943, 71945), (71946, 71948), 71956, 71959, 71990, (71993, 71995), (72007, 72016), (72026, 72096), (73465, 73648), (73649, 73664), (94181, 94192), (94194, 94208), (101590, 101632), (101641, 110592), (127406, 127462), (128728, 128736), (128765, 
128768), (129198, 129200), (129202, 129280), 129401, 129484, (129653, 129656), (129671, 129680), (129705, 129712), (129719, 129728), (129731, 129744), (129751, 129792), 129939, (129995, 130032), (130042, 131072), (173790, 173824), (195102, 196608), (201547, 917505)]), 'L': ([(2230, 2238), (3333, 3341), (12704, 12731), (13312, 19894), (19968, 40944), (42946, 42951), (42999, 43010), (43868, 43880), 70751, 71935, (100352, 101107), (131072, 173783)], [(2230, 2248), (3332, 3341), (12704, 12736), (13312, 19904), (19968, 40957), (42946, 42955), (42997, 43010), (43868, 43882), (69248, 69290), (69296, 69298), (69552, 69573), 69959, (70751, 70754), (71935, 71943), 71945, (71948, 71956), (71957, 71959), (71960, 71984), 71999, 72001, 73648, (100352, 101590), (101632, 101641), (131072, 173790), (196608, 201547)]), 'Ll': ([(43872, 43880)], [42952, 42954, 42998, (43872, 43881)]), 'Lm': ([], [43881]), 'Lo': ([(2230, 2238), (3333, 3341), (12704, 12731), (13312, 19894), (19968, 40944), 70751, 71935, (100352, 101107), (131072, 173783)], [(2230, 2248), (3332, 3341), (12704, 12736), (13312, 19904), (19968, 40957), (69248, 69290), (69296, 69298), (69552, 69573), 69959, (70751, 70754), (71935, 71943), 71945, (71948, 71956), (71957, 71959), (71960, 71984), 71999, 72001, 73648, (100352, 101590), (101632, 101641), (131072, 173790), (196608, 201547)]), 'Lu': ([(42948, 42951)], [(42948, 42952), 42953, 42997]), 'M': ([(2902, 2904), (3458, 3460), (6832, 6847)], [(2901, 2904), (3457, 3460), (6832, 6849), 43052, (69291, 69293), (70094, 70096), (71984, 71990), (71991, 71993), (71995, 71999), 72000, (72002, 72004), 94180, (94192, 94194)]), 'Mc': ([], [70094, (71984, 71990), (71991, 71993), 71997, 72000, 72002, (94192, 94194)]), 'Mn': ([2902], [(2901, 2903), 3457, (6847, 6849), 43052, (69291, 69293), 70095, (71995, 71997), 71998, 72003, 94180]), 'N': ([], [(69573, 69580), (72016, 72026), (130032, 130042)]), 'Nd': ([], [(72016, 72026), (130032, 130042)]), 'No': ([], [(69573, 69580)]), 'P': ([70747], 
[11858, 69293, (70746, 70748), (72004, 72007)]), 'Pd': ([], [69293]), 'Po': ([70747], [11858, (70746, 70748), (72004, 72007)]), 'S': ([(11160, 11264), (65936, 65948), (127248, 127341), (127344, 127405), (127744, 128726), (128752, 128763), (129280, 129292), (129293, 129394), (129395, 129399), (129402, 129443), (129445, 129451), (129454, 129483), (129648, 129652), (129664, 129667), (129680, 129686)], [(11159, 11264), (11856, 11858), (43882, 43884), (65936, 65949), (127245, 127406), (127744, 128728), (128752, 128765), (129200, 129202), (129280, 129401), (129402, 129484), (129648, 129653), (129664, 129671), (129680, 129705), (129712, 129719), (129728, 129731), (129744, 129751), (129792, 129939), (129940, 129995)]), 'Sk': ([], [(43882, 43884)]), 'So': ([(11160, 11264), (65936, 65948), (127248, 127341), (127344, 127405), (128000, 128726), (128752, 128763), (129280, 129292), (129293, 129394), (129395, 129399), (129402, 129443), (129445, 129451), (129454, 129483), (129648, 129652), (129664, 129667), (129680, 129686)], [(11159, 11264), (11856, 11858), (65936, 65949), (127245, 127406), (128000, 128728), (128752, 128765), (129200, 129202), (129280, 129401), (129402, 129484), (129648, 129653), (129664, 129671), (129680, 129705), (129712, 129719), (129728, 129731), (129744, 129751), (129792, 129939), (129940, 129995)]) } DIFF_CATEGORIES_VER_14_0_0 = { 'C': ([(1564, 1566), (2155, 2208), 2229, (2248, 2259), (3130, 3133), (3163, 3168), (3287, 3294), 5901, (5909, 5920), (6158, 6160), (6849, 6912), (6988, 6992), (7037, 7040), 7674, (8384, 8400), 11311, 11359, (11859, 11904), (40957, 40960), (42944, 42946), (42955, 42997), (64450, 64467), (64832, 64848), (64968, 65008), (65022, 65024), (66928, 67072), (67432, 67584), (69466, 69552), (69744, 69759), (69826, 69840), (71353, 71360), (71488, 71680), (72355, 72384), (75076, 77824), (92784, 92880), (101641, 110592), (110879, 110928), (113824, 118784), (119273, 119296), (121520, 122880), (123216, 123584), (123648, 124928), (128728, 128736), 
(129004, 129024), 129401, 129484, (129659, 129664), (129705, 129712), (129719, 129728), (129731, 129744), (129751, 129792), (173790, 173824), (177973, 177984)], [1564, (2155, 2160), (2191, 2200), (3130, 3132), (3163, 3165), (3166, 3168), (3287, 3293), (5910, 5919), 6158, (6863, 6912), (6989, 6992), 7039, (8385, 8400), (11870, 11904), (42955, 42960), 42962, 42964, (42970, 42994), (64451, 64467), (64968, 64975), (64976, 65008), 66939, 66955, 66963, 66966, 66978, 66994, 67002, (67005, 67072), (67432, 67456), 67462, 67505, (67515, 67584), (69466, 69488), (69514, 69552), (69750, 69759), (69827, 69840), (71354, 71360), (71495, 71680), (72355, 72368), (75076, 77712), (77811, 77824), 92863, (92874, 92880), (101641, 110576), 110580, 110588, 110591, (110883, 110928), (113824, 118528), (118574, 118576), (118599, 118608), (118724, 118784), (119275, 119296), (121520, 122624), (122655, 122880), (123216, 123536), (123567, 123584), (123648, 124896), 124903, 124908, 124911, 124927, (128728, 128733), (129004, 129008), (129009, 129024), (129661, 129664), (129709, 129712), (129723, 129728), (129734, 129744), (129754, 129760), (129768, 129776), (129783, 129792), (173792, 173824), (177977, 177984)]), 'Cf': ([], [(2192, 2194)]), 'Cn': ([1565, (2155, 2208), 2229, (2248, 2259), (3130, 3133), (3163, 3168), (3287, 3294), 5901, (5909, 5920), 6159, (6849, 6912), (6988, 6992), (7037, 7040), 7674, (8384, 8400), 11311, 11359, (11859, 11904), (40957, 40960), (42944, 42946), (42955, 42997), (64450, 64467), (64832, 64848), (64968, 65008), (65022, 65024), (66928, 67072), (67432, 67584), (69466, 69552), (69744, 69759), (69826, 69837), (71353, 71360), (71488, 71680), (72355, 72384), (75076, 77824), (92784, 92880), (101641, 110592), (110879, 110928), (113828, 118784), (119273, 119296), (121520, 122880), (123216, 123584), (123648, 124928), (128728, 128736), (129004, 129024), 129401, 129484, (129659, 129664), (129705, 129712), (129719, 129728), (129731, 129744), (129751, 129792), (173790, 173824), 
(177973, 177984)], [(2155, 2160), 2191, (2194, 2200), (3130, 3132), (3163, 3165), (3166, 3168), (3287, 3293), (5910, 5919), (6863, 6912), (6989, 6992), 7039, (8385, 8400), (11870, 11904), (42955, 42960), 42962, 42964, (42970, 42994), (64451, 64467), (64968, 64975), (64976, 65008), 66939, 66955, 66963, 66966, 66978, 66994, 67002, (67005, 67072), (67432, 67456), 67462, 67505, (67515, 67584), (69466, 69488), (69514, 69552), (69750, 69759), (69827, 69837), (71354, 71360), (71495, 71680), (72355, 72368), (75076, 77712), (77811, 77824), 92863, (92874, 92880), (101641, 110576), 110580, 110588, 110591, (110883, 110928), (113828, 118528), (118574, 118576), (118599, 118608), (118724, 118784), (119275, 119296), (121520, 122624), (122655, 122880), (123216, 123536), (123567, 123584), (123648, 124896), 124903, 124908, 124911, 124927, (128728, 128733), (129004, 129008), (129009, 129024), (129661, 129664), (129709, 129712), (129723, 129728), (129734, 129744), (129754, 129760), (129768, 129776), (129783, 129792), (173792, 173824), (177977, 177984)]), 'L': ([(2208, 2229), (2230, 2248), 3294, (5888, 5901), (5902, 5906), (5920, 5938), (6981, 6988), (11264, 11311), (11312, 11359), (11360, 11493), (19968, 40957), (40960, 42125), (42891, 42944), (42946, 42955), (42997, 43010), (72384, 72441), (110592, 110879), (131072, 173790), (173824, 177973)], [(2160, 2184), (2185, 2191), (2208, 2250), 3165, (3293, 3295), (5888, 5906), (5919, 5938), (6981, 6989), (11264, 11493), (19968, 42125), (42891, 42955), (42960, 42962), 42963, (42965, 42970), (42994, 43010), (66928, 66939), (66940, 66955), (66956, 66963), (66964, 66966), (66967, 66978), (66979, 66994), (66995, 67002), (67003, 67005), (67456, 67462), (67463, 67505), (67506, 67515), (69488, 69506), (69745, 69747), 69749, (71488, 71495), (72368, 72441), (77712, 77809), (92784, 92863), (110576, 110580), (110581, 110588), (110589, 110591), (110592, 110883), (122624, 122655), (123536, 123566), (124896, 124903), (124904, 124908), (124909, 124911), 
(124912, 124927), (131072, 173792), (173824, 177977)]), 'Ll': ([(11312, 11359)], [(11312, 11360), 42945, 42961, 42963, 42965, 42967, 42969, (66967, 66978), (66979, 66994), (66995, 67002), (67003, 67005), (122624, 122634), (122635, 122655)]), 'Lm': ([], [2249, (42994, 42997), (67456, 67462), (67463, 67505), (67506, 67515), (110576, 110580), (110581, 110588), (110589, 110591)]), 'Lo': ([(2208, 2229), (2230, 2248), 3294, (5888, 5901), (5902, 5906), (5920, 5938), (6981, 6988), (19968, 40957), (40960, 40981), (72384, 72441), (110592, 110879), (131072, 173790), (173824, 177973)], [(2160, 2184), (2185, 2191), (2208, 2249), 3165, (3293, 3295), (5888, 5906), (5919, 5938), (6981, 6989), (19968, 40981), (69488, 69506), (69745, 69747), 69749, (71488, 71495), (72368, 72441), (77712, 77809), (92784, 92863), (110592, 110883), 122634, (123536, 123566), (124896, 124903), (124904, 124908), (124909, 124911), (124912, 124927), (131072, 173792), (173824, 177977)]), 'Lu': ([(11264, 11311)], [(11264, 11312), 42944, 42960, 42966, 42968, (66928, 66939), (66940, 66955), (66956, 66963), (66964, 66966)]), 'M': ([(2259, 2274), (5906, 5909), (6832, 6849), (7616, 7674), (7675, 7680)], [(2200, 2208), (2250, 2274), 3132, (5906, 5910), 6159, (6832, 6863), (7616, 7680), (69506, 69510), 69744, (69747, 69749), 69826, (118528, 118574), (118576, 118599), 123566]), 'Mc': ([], [5909, 5940]), 'Mn': ([(2259, 2274), (5938, 5941), (6847, 6849), (7616, 7674), (7675, 7680)], [(2200, 2208), (2250, 2274), 3132, (5938, 5940), 6159, (6847, 6863), (7616, 7680), (69506, 69510), 69744, (69747, 69749), 69826, (118528, 118574), (118576, 118599), 123566]), 'N': ([], [(92864, 92874)]), 'Nd': ([], [(92864, 92874)]), 'P': ([(1566, 1568), 11858], [(1565, 1568), (7037, 7039), (11858, 11870), (69510, 69514), 71353, (77809, 77811)]), 'Pd': ([], [11869]), 'Pe': ([], [11862, 11864, 11866, 11868]), 'Po': ([(1566, 1568), 11858], [(1565, 1568), (7037, 7039), (11858, 11861), (69510, 69514), 71353, (77809, 77811)]), 'Ps': ([], [11861, 
11863, 11865, 11867]), 'S': ([(8352, 8384), (64434, 64450), (65020, 65022), (119214, 119273), (128736, 128749), (129280, 129401), (129402, 129484), (129485, 129620), (129656, 129659), (129680, 129705), (129712, 129719), (129728, 129731), (129744, 129751)], [2184, (8352, 8385), (64434, 64451), (64832, 64848), 64975, (65020, 65024), (118608, 118724), (119214, 119275), (128733, 128749), 129008, (129280, 129620), (129656, 129661), (129680, 129709), (129712, 129723), (129728, 129734), (129744, 129754), (129760, 129768), (129776, 129783)]), 'Sc': ([(8352, 8384)], [(8352, 8385)]), 'Sk': ([(64434, 64450)], [2184, (64434, 64451)]), 'So': ([65021, (119214, 119273), (128736, 128749), (129280, 129401), (129402, 129484), (129485, 129620), (129656, 129659), (129680, 129705), (129712, 129719), (129728, 129731), (129744, 129751)], [(64832, 64848), 64975, (65021, 65024), (118608, 118724), (119214, 119275), (128733, 128749), 129008, (129280, 129620), (129656, 129661), (129680, 129709), (129712, 129723), (129728, 129734), (129744, 129754), (129760, 129768), (129776, 129783)]) } DIFF_CATEGORIES_VER_15_0_0 = { 'C': ([(3315, 3328), (3790, 3792), (69298, 69376), (70207, 70272), (72441, 72704), (73465, 73648), (78895, 82944), (110883, 110928), (110931, 110948), (119366, 119520), (122655, 122880), (122923, 123136), (123648, 124896), (128728, 128733), (128884, 128896), (128985, 128992), (129653, 129656), (129671, 129680), (129709, 129712), (129723, 129728), (129734, 129744), (129754, 129760), (129768, 129776), (129783, 129792), (177977, 177984), (201547, 917760)], [(3316, 3328), 3791, (69298, 69373), (70210, 70272), (72441, 72448), (72458, 72704), (73465, 73472), 73489, (73531, 73534), (73562, 73648), (78896, 78912), (78934, 82944), (110883, 110898), (110899, 110928), (110931, 110933), (110934, 110948), (119366, 119488), (119508, 119520), (122655, 122661), (122667, 122880), (122923, 122928), (122990, 123023), (123024, 123136), (123648, 124112), (124154, 124896), (128728, 128732), (128887, 
128891), (128986, 128992), (129673, 129680), 129726, (129734, 129742), (129756, 129760), (129769, 129776), (129785, 129792), (177978, 177984), (201547, 201552), (205744, 917760)]), 'Cf': ([(78896, 78905)], [(78896, 78912)]), 'Cn': ([(3315, 3328), (3790, 3792), (69298, 69376), (70207, 70272), (72441, 72704), (73465, 73648), 78895, (78905, 82944), (110883, 110928), (110931, 110948), (119366, 119520), (122655, 122880), (122923, 123136), (123648, 124896), (128728, 128733), (128884, 128896), (128985, 128992), (129653, 129656), (129671, 129680), (129709, 129712), (129723, 129728), (129734, 129744), (129754, 129760), (129768, 129776), (129783, 129792), (177977, 177984), (201547, 917505)], [(3316, 3328), 3791, (69298, 69373), (70210, 70272), (72441, 72448), (72458, 72704), (73465, 73472), 73489, (73531, 73534), (73562, 73648), (78934, 82944), (110883, 110898), (110899, 110928), (110931, 110933), (110934, 110948), (119366, 119488), (119508, 119520), (122655, 122661), (122667, 122880), (122923, 122928), (122990, 123023), (123024, 123136), (123648, 124112), (124154, 124896), (128728, 128732), (128887, 128891), (128986, 128992), (129673, 129680), 129726, (129734, 129742), (129756, 129760), (129769, 129776), (129785, 129792), (177978, 177984), (201547, 201552), (205744, 917505)]), 'L': ([(77824, 78895), (173824, 177977)], [(70207, 70209), 73474, (73476, 73489), (73490, 73524), (77824, 78896), (78913, 78919), 110898, 110933, (122661, 122667), (122928, 122990), (124112, 124140), (173824, 177978), (201552, 205744)]), 'Ll': ([], [(122661, 122667)]), 'Lm': ([], [(122928, 122990), 124139]), 'Lo': ([(77824, 78895), (173824, 177977)], [(70207, 70209), 73474, (73476, 73489), (73490, 73524), (77824, 78896), (78913, 78919), 110898, 110933, (124112, 124139), (173824, 177978), (201552, 205744)]), 'M': ([(3784, 3790)], [3315, (3784, 3791), (69373, 69376), 70209, (73472, 73474), 73475, (73524, 73531), (73534, 73539), 78912, (78919, 78934), 123023, (124140, 124144)]), 'Mc': ([], [3315, 73475, 
(73524, 73526), (73534, 73536), 73537]), 'Mn': ([(3784, 3790)], [(3784, 3791), (69373, 69376), 70209, (73472, 73474), (73526, 73531), 73536, 73538, 78912, (78919, 78934), 123023, (124140, 124144)]), 'N': ([], [(73552, 73562), (119488, 119508), (124144, 124154)]), 'Nd': ([], [(73552, 73562), (124144, 124154)]), 'No': ([], [(119488, 119508)]), 'P': ([], [(72448, 72458), (73539, 73552)]), 'Po': ([], [(72448, 72458), (73539, 73552)]), 'S': ([(128733, 128749), (128768, 128884), (128896, 128985), (129648, 129653), (129656, 129661), (129664, 129671), (129680, 129709), (129712, 129723), (129728, 129734), (129744, 129754), (129760, 129768), (129776, 129783)], [(128732, 128749), (128768, 128887), (128891, 128986), (129648, 129661), (129664, 129673), (129680, 129726), (129727, 129734), (129742, 129756), (129760, 129769), (129776, 129785)]), 'So': ([(128733, 128749), (128768, 128884), (128896, 128985), (129648, 129653), (129656, 129661), (129664, 129671), (129680, 129709), (129712, 129723), (129728, 129734), (129744, 129754), (129760, 129768), (129776, 129783)], [(128732, 128749), (128768, 128887), (128891, 128986), (129648, 129661), (129664, 129673), (129680, 129726), (129727, 129734), (129742, 129756), (129760, 129769), (129776, 129785)]) } DIFF_CATEGORIES_VER_15_1_0 = { 'C': ([(12284, 12288), (12772, 12784), (191457, 194560)], [(12772, 12783), (191457, 191472), (192094, 194560)]), 'Cn': ([(12284, 12288), (12772, 12784), (191457, 194560)], [(12772, 12783), (191457, 191472), (192094, 194560)]), 'L': ([], [(191472, 192094)]), 'Lo': ([], [(191472, 192094)]), 'S': ([(12272, 12284)], [(12272, 12288), 12783]), 'So': ([(12272, 12284)], [(12272, 12288), 12783]) } DIFF_CATEGORIES_VER_16_0_0 = { 'C': ([(2191, 2200), (6989, 6992), 7039, (7305, 7312), (9255, 9280), (12772, 12783), (42955, 42960), (42970, 42994), (67005, 67072), (68922, 69216), (69298, 69373), (70517, 70656), (71370, 71424), (72458, 72704), (73562, 73648), (78934, 82944), (83527, 92160), (93072, 93760), (101590, 101632), 
(113824, 118528), (124154, 124896), (129202, 129280), (129673, 129680), 129726, (129734, 129742), (129756, 129760), (129769, 129776), (129995, 130032)], [(2191, 2199), 6989, (7307, 7312), (9258, 9280), (12774, 12783), (42958, 42960), (42973, 42994), (67005, 67008), (67060, 67072), (68922, 68928), (68966, 68969), (68998, 69006), (69008, 69216), (69298, 69314), (69317, 69372), (70517, 70528), 70538, (70540, 70542), 70543, 70582, 70593, (70595, 70597), 70598, 70603, 70614, (70617, 70625), (70627, 70656), (71370, 71376), (71396, 71424), (72458, 72640), (72674, 72688), (72698, 72704), (73563, 73648), (78934, 78944), (82939, 82944), (83527, 90368), (90426, 92160), (93072, 93504), (93562, 93760), (101590, 101631), (113824, 117760), (118010, 118016), (118452, 118528), (124154, 124368), (124411, 124415), (124416, 124896), (129212, 129216), (129218, 129280), (129674, 129679), (129735, 129742), (129757, 129759), (129770, 129776)]), 'Cn': ([(2194, 2200), (6989, 6992), 7039, (7305, 7312), (9255, 9280), (12772, 12783), (42955, 42960), (42970, 42994), (67005, 67072), (68922, 69216), (69298, 69373), (70517, 70656), (71370, 71424), (72458, 72704), (73562, 73648), (78934, 82944), (83527, 92160), (93072, 93760), (101590, 101632), (113828, 118528), (124154, 124896), (129202, 129280), (129673, 129680), 129726, (129734, 129742), (129756, 129760), (129769, 129776), (129995, 130032)], [(2194, 2199), 6989, (7307, 7312), (9258, 9280), (12774, 12783), (42958, 42960), (42973, 42994), (67005, 67008), (67060, 67072), (68922, 68928), (68966, 68969), (68998, 69006), (69008, 69216), (69298, 69314), (69317, 69372), (70517, 70528), 70538, (70540, 70542), 70543, 70582, 70593, (70595, 70597), 70598, 70603, 70614, (70617, 70625), (70627, 70656), (71370, 71376), (71396, 71424), (72458, 72640), (72674, 72688), (72698, 72704), (73563, 73648), (78934, 78944), (82939, 82944), (83527, 90368), (90426, 92160), (93072, 93504), (93562, 93760), (101590, 101631), (113828, 117760), (118010, 118016), (118452, 
118528), (124154, 124368), (124411, 124415), (124416, 124896), (129212, 129216), (129218, 129280), (129674, 129679), (129735, 129742), (129757, 129759), (129770, 129776)]), 'L': ([(7296, 7305), (42891, 42955), (42965, 42970), (101632, 101641)], [(7296, 7307), (42891, 42958), (42965, 42973), (67008, 67060), (68938, 68966), (68975, 68998), (69314, 69317), (70528, 70538), 70539, 70542, (70544, 70582), 70583, 70609, 70611, (72640, 72673), (78944, 82939), (90368, 90398), (93504, 93549), (101631, 101641), (124368, 124398), 124400]), 'Ll': ([], [7306, 42957, 42971, (68976, 68998)]), 'Lm': ([], [68942, 68975, (93504, 93507), (93547, 93549)]), 'Lo': ([(101632, 101641)], [(67008, 67060), (68938, 68942), 68943, (69314, 69317), (70528, 70538), 70539, 70542, (70544, 70582), 70583, 70609, 70611, (72640, 72673), (78944, 82939), (90368, 90398), (93507, 93547), (101631, 101641), (124368, 124398), 124400]), 'Lu': ([], [7305, (42955, 42957), 42970, 42972, (68944, 68966)]), 'M': ([(2200, 2208), (69373, 69376)], [(2199, 2208), (68969, 68974), (69372, 69376), (70584, 70593), 70594, 70597, (70599, 70603), (70604, 70609), 70610, (70625, 70627), 73562, (90398, 90416), (124398, 124400)]), 'Mc': ([], [(70584, 70587), 70594, 70597, (70599, 70603), (70604, 70606), 70607, 71454, (90410, 90413)]), 'Mn': ([(2200, 2208), (69373, 69376), (71453, 71456)], [(2199, 2208), (68969, 68974), (69372, 69376), (70587, 70593), 70606, 70608, 70610, (70625, 70627), 71453, 71455, 73562, (90398, 90410), (90413, 90416), (124398, 124400)]), 'N': ([], [(68928, 68938), (71376, 71396), (72688, 72698), (90416, 90426), (93552, 93562), (118000, 118010), (124401, 124411)]), 'Nd': ([], [(68928, 68938), (71376, 71396), (72688, 72698), (90416, 90426), (93552, 93562), (118000, 118010), (124401, 124411)]), 'P': ([(7037, 7039)], [(6990, 6992), (7037, 7040), 68974, (70612, 70614), (70615, 70617), 72673, (93549, 93552), 124415]), 'Pd': ([], [68974]), 'Po': ([(7037, 7039)], [(6990, 6992), (7037, 7040), (70612, 70614), (70615, 
70617), 72673, (93549, 93552), 124415]), 'S': ([(9003, 9255), (12736, 12772), (129200, 129202), (129664, 129673), (129680, 129726), (129727, 129734), (129742, 129756), (129760, 129769), (129940, 129995)], [(9003, 9258), (12736, 12774), (69006, 69008), (117760, 118000), (118016, 118452), (129200, 129212), (129216, 129218), (129664, 129674), (129679, 129735), (129742, 129757), (129759, 129770), (129940, 130032)]), 'Sm': ([], [(69006, 69008)]), 'So': ([(9186, 9255), (12736, 12772), (129200, 129202), (129664, 129673), (129680, 129726), (129727, 129734), (129742, 129756), (129760, 129769), (129940, 129995)], [(9186, 9258), (12736, 12774), (117760, 118000), (118016, 118452), (129200, 129212), (129216, 129218), (129664, 129674), (129679, 129735), (129742, 129757), (129759, 129770), (129940, 130032)]) } sissaschool-elementpath-d3688c7/elementpath/regex/unicode_subsets.py000066400000000000000000000542201476131650400261220ustar00rootroot00000000000000# # Copyright (c), 2016-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ This module defines a class for handling Unicode subsets with less usage of memory. """ from collections import defaultdict from functools import wraps from sys import maxunicode from types import ModuleType from typing import cast, Callable, Dict, List, Optional, Tuple, Union from unicodedata import unidata_version from elementpath._typing import Iterable, Iterator, MutableSet from .codepoints import RegexError, CodePoint, code_point_order, \ code_point_repr, iter_code_points, iterparse_character_subset from . import unicode_blocks from . 
import unicode_categories __all__ = ['UnicodeSubset', 'UnicodeData', 'install_unicode_data', 'unicode_version', 'lazy_subset', 'unicode_subset', 'unicode_category', 'unicode_block'] UNICODE_VERSIONS = ( '16.0.0', '15.1.0', '15.0.0', '14.0.0', '13.0.0', '12.1.0', '12.0.0', '11.0.0', '10.0.0', '9.0.0', '8.0.0', '7.0.0', '6.3.0', '6.2.0', '6.1.0', '6.0.0', '5.2.0', '5.1.0', '5.0.0', '4.1.0', '4.0.1', '4.0.0', '3.2.0', '3.1.1', '3.1.0', '3.0.1', '3.0.0', '2.1.9', '2.1.8', '2.1.5', '2.1.2', '2.0.0' ) CodePointsArgType = Union[None, str, 'UnicodeSubset', List[CodePoint], Iterable[CodePoint]] class UnicodeSubset(MutableSet[CodePoint]): """ Represents a subset of Unicode code points, implemented with an ordered list of integer values and ranges. Codepoints can be added or discarded using sequences of integer values and ranges or with strings equivalent to a regex character set. :param codepoints: a sequence of integer values and ranges, another UnicodeSubset \ instance or a string equivalent of a regex character set. 
""" __slots__ = '_codepoints', _codepoints: List[CodePoint] def __init__(self, codepoints: CodePointsArgType = None) -> None: if not codepoints: self._codepoints = list() elif isinstance(codepoints, list): self._codepoints = sorted(codepoints, key=code_point_order) elif isinstance(codepoints, UnicodeSubset): self._codepoints = codepoints._codepoints.copy() else: self._codepoints = list() self.update(codepoints) @property def codepoints(self) -> List[CodePoint]: return self._codepoints @codepoints.setter def codepoints(self, codepoints: Iterable[CodePoint]) -> None: self._codepoints = sorted(codepoints, key=code_point_order) def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, str(self)) def __str__(self) -> str: return ''.join(code_point_repr(cp) for cp in self._codepoints) def copy(self) -> 'UnicodeSubset': return self.__copy__() def __copy__(self) -> 'UnicodeSubset': subset = self.__class__() subset._codepoints = self._codepoints.copy() return subset def __reversed__(self) -> Iterator[int]: for item in reversed(self._codepoints): if isinstance(item, int): yield item else: yield from reversed(range(item[0], item[1])) def complement(self) -> Iterator[CodePoint]: last_cp = 0 for cp in self._codepoints: if isinstance(cp, int): cp0 = cp cp1 = cp + 1 else: cp0, cp1 = cp diff = cp0 - last_cp if diff > 2: yield last_cp, cp0 elif diff == 2: yield last_cp yield last_cp + 1 elif diff == 1: yield last_cp elif diff: raise ValueError("unordered code points found in {!r}".format(self)) last_cp = cp1 if last_cp < maxunicode: yield last_cp, maxunicode + 1 elif last_cp == maxunicode: yield maxunicode def iter_characters(self) -> Iterator[str]: return map(chr, self.__iter__()) # # MutableSet's abstract methods implementation def __contains__(self, value: object) -> bool: if not isinstance(value, int): try: value = ord(value) # type: ignore[arg-type] except TypeError: return False for cp in self._codepoints: if not isinstance(cp, int): if cp[0] > value: return 
False elif cp[1] <= value: continue else: return True elif cp > value: return False elif cp == value: return True return False def __iter__(self) -> Iterator[int]: for cp in self._codepoints: if isinstance(cp, int): yield cp else: yield from range(*cp) def __len__(self) -> int: k = 0 for _ in self: k += 1 return k def update(self, *others: Union[str, Iterable[CodePoint]]) -> None: for value in others: if isinstance(value, str): for cp in iter_code_points(iterparse_character_subset(value), reverse=True): self.add(cp) else: for cp in iter_code_points(value, reverse=True): self.add(cp) def add(self, value: CodePoint) -> None: if isinstance(value, int): if 0 <= value <= maxunicode: start_cp = value end_cp = value + 1 else: raise ValueError(f"{value!r} is not a Unicode code point value") elif 0 <= value[0] < value[1] <= maxunicode + 1: start_cp, end_cp = value else: raise ValueError(f"{value!r} is not a Unicode code point range") code_points = self._codepoints last_index = len(code_points) - 1 for k, cp in enumerate(code_points): if isinstance(cp, int): cp0 = cp cp1 = cp + 1 else: cp0, cp1 = cp if end_cp < cp0: code_points.insert(k, value) elif start_cp > cp1: continue elif end_cp > cp1: if k == last_index: code_points[k] = min(cp0, start_cp), end_cp else: next_cp = code_points[k + 1] higher_bound = next_cp if isinstance(next_cp, int) else next_cp[0] if end_cp <= higher_bound: code_points[k] = min(cp0, start_cp), end_cp else: code_points[k] = min(cp0, start_cp), higher_bound start_cp = higher_bound continue elif start_cp < cp0: code_points[k] = start_cp, cp1 break else: self._codepoints.append(value) def difference(self, other: 'UnicodeSubset') -> 'UnicodeSubset': subset = self.__copy__() subset.difference_update(other) return subset def difference_update(self, *others: Union[str, Iterable[CodePoint]]) -> None: for value in others: if isinstance(value, str): for cp in iter_code_points(iterparse_character_subset(value), reverse=True): self.discard(cp) else: for cp in 
iter_code_points(value, reverse=True): self.discard(cp) def discard(self, value: CodePoint) -> None: if isinstance(value, int): if 0 <= value <= maxunicode: start_cp = value end_cp = value + 1 else: raise ValueError(f"{value!r} is not a Unicode code point value") elif 0 <= value[0] < value[1] <= maxunicode + 1: start_cp, end_cp = value else: raise ValueError(f"{value!r} is not a Unicode code point range") codepoints = self._codepoints for k in reversed(range(len(codepoints))): cp = codepoints[k] if isinstance(cp, int): cp0 = cp cp1 = cp + 1 else: cp0, cp1 = cp if start_cp >= cp1: break elif end_cp >= cp1: if start_cp <= cp0: del codepoints[k] elif start_cp - cp0 > 1: codepoints[k] = cp0, start_cp else: codepoints[k] = cp0 elif end_cp > cp0: if start_cp <= cp0: if cp1 - end_cp > 1: codepoints[k] = end_cp, cp1 else: codepoints[k] = cp1 - 1 else: if cp1 - end_cp > 1: codepoints.insert(k + 1, (end_cp, cp1)) else: codepoints.insert(k + 1, cp1 - 1) if start_cp - cp0 > 1: codepoints[k] = cp0, start_cp else: codepoints[k] = cp0 # # MutableSet's mixin methods override def clear(self) -> None: del self._codepoints[:] def __eq__(self, other: object) -> bool: if not isinstance(other, Iterable): return NotImplemented elif isinstance(other, UnicodeSubset): return self._codepoints == other._codepoints else: return self._codepoints == other def __ior__(self, other: object) -> 'UnicodeSubset': if not isinstance(other, Iterable): return NotImplemented elif isinstance(other, UnicodeSubset): other = reversed(other._codepoints) elif isinstance(other, str): other = reversed(UnicodeSubset(other)._codepoints) else: other = iter_code_points(other, reverse=True) for cp in other: self.add(cp) return self def __or__(self, other: object) -> 'UnicodeSubset': obj = self.__copy__() return obj.__ior__(other) def __isub__(self, other: object) -> 'UnicodeSubset': if not isinstance(other, Iterable): return NotImplemented elif isinstance(other, UnicodeSubset): other = reversed(other._codepoints) elif 
isinstance(other, str): other = reversed(UnicodeSubset(other)._codepoints) else: other = iter_code_points(other, reverse=True) for cp in other: self.discard(cp) return self def __sub__(self, other: object) -> 'UnicodeSubset': obj = self.__copy__() return obj.__isub__(other) __rsub__ = __sub__ def __iand__(self, other: object) -> 'UnicodeSubset': if not isinstance(other, Iterable): return NotImplemented for value in (self - other): self.discard(value) return self def __and__(self, other: object) -> 'UnicodeSubset': obj = self.__copy__() return obj.__iand__(other) def __ixor__(self, other: object) -> 'UnicodeSubset': if other is self: self.clear() return self elif not isinstance(other, Iterable): return NotImplemented elif not isinstance(other, UnicodeSubset): other = UnicodeSubset(cast(Union[str, Iterable[CodePoint]], other)) for value in other: if value in self: self.discard(value) else: self.add(value) return self def __xor__(self, other: object) -> 'UnicodeSubset': obj = self.__copy__() return obj.__ixor__(other) def iterparse_unicode_data(url: str) -> Iterator[Tuple[int, str]]: """Iterate UnicodeData.txt source giving back codepoints and categories.""" from urllib.request import urlopen with urlopen(url) as res: prev_cp = -1 for line in res.readlines(): fields = line.split(b';') cp = int(fields[0], 16) cat = fields[2].decode('utf-8') if cp - prev_cp > 1: if fields[1].endswith(b', Last>'): # Ranges of codepoints expressed with First and then Last for x in range(prev_cp + 1, cp): yield x, cat else: # For default is 'Cn' that means 'Other, not assigned' for x in range(prev_cp + 1, cp): yield x, 'Cn' prev_cp = cp yield cp, cat while cp < maxunicode: cp += 1 yield cp, 'Cn' def get_categories_from_url(url: str) -> Dict[str, UnicodeSubset]: categories: Dict[str, List[CodePoint]] = defaultdict(list) major_category = 'C' major_start_cp, major_next_cp = 0, 1 minor_category = 'Cc' minor_start_cp, minor_next_cp = 0, 1 for cp, cat in iterparse_unicode_data(url): if cat[0] != 
major_category: if cp > major_next_cp: categories[major_category].append((major_start_cp, cp)) else: categories[major_category].append(major_start_cp) major_category = cat[0] major_start_cp, major_next_cp = cp, cp + 1 if cat != minor_category: if cp > minor_next_cp: categories[minor_category].append((minor_start_cp, cp)) else: categories[minor_category].append(minor_start_cp) minor_category = cat minor_start_cp, minor_next_cp = cp, cp + 1 else: if major_next_cp == maxunicode + 1: categories[major_category].append(major_start_cp) else: categories[major_category].append((major_start_cp, maxunicode + 1)) if minor_next_cp == maxunicode + 1: categories[minor_category].append(minor_start_cp) else: categories[minor_category].append((minor_start_cp, maxunicode + 1)) return {k: UnicodeSubset(v) for k, v in categories.items()} def get_categories(version_info: Tuple[int, ...], module: ModuleType) -> Dict[str, UnicodeSubset]: categories = {k: v.copy() for k, v in module.UNICODE_CATEGORIES.items()} for name in module.__dict__: if not name.startswith('DIFF_CATEGORIES_VER_'): continue diff_version = name[20:].replace('_', '.') if version_info < tuple(int(x) for x in diff_version.split('.')): break for k, (exclude_cps, insert_cps) in getattr(unicode_categories, name).items(): values = [] additional = iter(insert_cps) cpa = next(additional, None) cpa_int = cpa[0] if isinstance(cpa, tuple) else cpa for cp in categories[k]: if cp in exclude_cps: continue cp_int = cp[0] if isinstance(cp, tuple) else cp while cpa_int is not None and cpa_int <= cp_int: values.append(cpa) cpa = next(additional, None) cpa_int = cpa[0] if isinstance(cpa, tuple) else cpa else: values.append(cp) else: if cpa is not None: values.append(cpa) values.extend(additional) categories[k] = values return {k: UnicodeSubset(v) for k, v in categories.items()} class UnicodeData: _blocks: Dict[str, Union[str, UnicodeSubset]] @staticmethod def _unicode_block_key(name: str) -> str: return name.upper().replace(' ', 
'').replace('_', '').replace('-', '') def __init__(self, version: Optional[str] = None, categories: Optional[Dict[str, UnicodeSubset]] = None) -> None: if version is None: version = unidata_version elif version not in UNICODE_VERSIONS: raise ValueError("argument is not a valid Unicode version") self.version = version version_info = tuple(int(x) for x in version.split('.')) if categories is not None: self._categories = categories elif self.version in unicode_categories.UNICODE_VERSIONS: self._categories = get_categories(version_info, unicode_categories) else: raise TypeError(f"can't get version {version} from module") # Build blocks dict for version superseded_blocks = [] blocks = unicode_blocks.UNICODE_BLOCKS_VER_2_0_0.copy() for name in unicode_blocks.__dict__: # noqa if name.startswith('UPDATE_BLOCKS_VER_'): diff_version = name[18:].replace('_', '.') if version_info < tuple(int(x) for x in diff_version.split('.')): break blocks.update(getattr(unicode_blocks, name)) elif name.startswith('REMOVED_BLOCKS_VER_'): diff_version = name[19:].replace('_', '.') if version_info < tuple(int(x) for x in diff_version.split('.')): break superseded_blocks.extend(getattr(unicode_blocks, name)) # Following naming rules: https://www.w3.org/TR/xmlschema11-2/#cces-blockesc self._blocks = {k.replace(' ', '').replace('_', ''): v for k, v in blocks.items()} # Additional map for lookup using normalization for Unicode naming rules, # doesn't include superseded blocks. 
self._unicode_blocks = { k.upper().replace(' ', '').replace('_', '').replace('-', ''): k for k in blocks if k not in superseded_blocks } def category(self, name: str) -> UnicodeSubset: return self._categories[name] def block(self, name: str, normalize: bool = False) -> UnicodeSubset: if normalize: key = name.upper().replace(' ', '').replace('_', '').replace('-', '') try: name = self._unicode_blocks[key] except KeyError: if key != 'NOBLOCK': raise name = 'NoBlock' try: subset = self._blocks[name] except KeyError: if name != 'NoBlock': raise # Define the special block "No_Block", that contains all the other codepoints not # belonging to a defined block (https://www.unicode.org/Public/UNIDATA/Blocks.txt) no_block = UnicodeSubset([(0, maxunicode + 1)]) for v in self._blocks.values(): no_block -= v self._blocks['NoBlock'] = no_block self._unicode_blocks['NOBLOCK'] = 'NoBlock' return no_block else: if not isinstance(subset, UnicodeSubset): subset = self._blocks[name] = UnicodeSubset(subset) return subset ### # Installed Unicode Data instance and accessors __unicode_data = UnicodeData() # Simple cache for Unicode subsets defined using callables with no-arguments, that # can include subsets defined on versioned Unicode data or fixed codepoints. This # cache is cleared if Unicode data is reinstalled. __subsets_cache: Dict[Callable[[], UnicodeSubset], UnicodeSubset] = {} def install_unicode_data(version: Optional[str] = None, name_or_url: Optional[str] = None) -> None: """ Install a different version of UnicodeData. By default the package installs the version that matches `unicodedata.unidata_version`. Call without parameters to restore the default version. :param version: Unicode version to install. It's required if a *name_or_url* is provided. :param name_or_url: Import name of an additional module or a URL to raw UnicodeData.txt. 
""" global __unicode_data if name_or_url is None: __unicode_data = UnicodeData(version) elif version is None: raise TypeError("you must specify a version to install") elif name_or_url.endswith('unicode_categories'): import importlib module = importlib.import_module(name_or_url) version_info = tuple(int(x) for x in version.split('.')) categories = get_categories(version_info, module) __unicode_data = UnicodeData(version, categories) else: categories = get_categories_from_url(name_or_url) __unicode_data = UnicodeData(version, categories) __subsets_cache.clear() def unicode_version() -> str: """Returns the installed UnicodeData version.""" return __unicode_data.version def lazy_subset(func: Callable[[], UnicodeSubset]) -> Callable[[], UnicodeSubset]: """ Defines a lazy UnicodeSubset wrapping its definition in a callable with no arguments. """ @wraps(func) def wrapper() -> UnicodeSubset: try: return __subsets_cache[func] except KeyError: __subsets_cache[func] = func() return __subsets_cache[func] return wrapper def unicode_subset(name: str) -> UnicodeSubset: """Retrieve a Unicode subset by name, raising a RegexError if it cannot be retrieved.""" if name[:2] == 'Is': try: return __unicode_data.block(name[2:]) except KeyError: raise RegexError(f"{name!r} doesn't match any Unicode block") else: try: return __unicode_data.category(name) except KeyError: raise RegexError(f"{name!r} doesn't match any Unicode category") def unicode_category(name: str) -> UnicodeSubset: """ Returns the Unicode Character Category subset addressed by the provided name, raising a KeyError if it's not found. """ return __unicode_data.category(name) def unicode_block(name: str, normalize: bool = False) -> UnicodeSubset: """ Returns the Unicode block subset addressed by the provided name, raising a KeyError if it's not found. For default the lookup is done following the XSD naming rules for blocks and keeping superseded blocks (e.g. 
Greek), otherwise the name is normalized following the Unicode standard rules, without considering the casing, spaces, hyphens and underscores and the lookup is restricted to blocks defined on installed version. """ return __unicode_data.block(name, normalize) sissaschool-elementpath-d3688c7/elementpath/schema_proxy.py000066400000000000000000000211751476131650400243160ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from abc import ABCMeta, abstractmethod from functools import lru_cache from typing import TYPE_CHECKING, Any, Dict, Optional, Set, Union from elementpath._typing import Iterator from elementpath.exceptions import ElementPathTypeError from elementpath.protocols import XsdTypeProtocol, XsdAttributeProtocol, \ XsdElementProtocol, XsdSchemaProtocol from elementpath.datatypes import AtomicType from elementpath.etree import is_etree_element from elementpath.xpath_context import XPathSchemaContext if TYPE_CHECKING: from elementpath.xpath_tokens import XPath2ParserType PathResult = Union[XsdSchemaProtocol, XsdElementProtocol, XsdAttributeProtocol] class AbstractSchemaProxy(metaclass=ABCMeta): """ Abstract base class for defining schema proxies. An implementation can override initialization type annotations :param schema: a schema instance compatible with the XsdSchemaProtocol. :param base_element: the schema element used as base item for static analysis. 
""" __slots__ = ('_schema', '_base_element', '_find', '_is_fully_valid') def __init__(self, schema: XsdSchemaProtocol, base_element: Optional[XsdElementProtocol] = None) -> None: if not is_etree_element(schema): raise ElementPathTypeError( "argument {!r} is not a compatible schema instance".format(schema) ) if base_element is not None and not is_etree_element(base_element): raise ElementPathTypeError( "argument 'base_element' is not a compatible element instance" ) self._schema = schema self._base_element: Optional[XsdElementProtocol] = base_element if self._base_element is not None: self._find = self._base_element.find else: self._find = self._schema.find self._is_fully_valid = False @property def schema(self) -> XsdSchemaProtocol: return self._schema @property def base_element(self) -> Optional[XsdElementProtocol]: return self._base_element @property def validity(self) -> str: validity = self._schema.validity if validity != 'valid': self._is_fully_valid = False return validity @property def validation_attempted(self) -> str: validation_attempted = self._schema.validation_attempted if validation_attempted != 'full': self._is_fully_valid = False return validation_attempted def is_fully_valid(self) -> bool: if self._is_fully_valid: return True self._is_fully_valid = self.validity == 'valid' and self.validation_attempted == 'full' return self._is_fully_valid def bind_parser(self, parser: 'XPath2ParserType') -> None: """ Binds a parser instance with schema proxy adding the schema's atomic types constructors. This method can be redefined in a concrete proxy to optimize schema bindings. :param parser: a parser instance. """ if parser.schema is not self: parser.schema = self for xsd_type in self.iter_atomic_types(): if xsd_type.name is not None: # pragma: no cover parser.schema_constructor(xsd_type.name) def get_context(self) -> XPathSchemaContext: """ Get a context instance for static analysis phase. :returns: an `XPathSchemaContext` instance. 
""" return XPathSchemaContext(root=self._schema, item=self._base_element, schema=self) def find(self, path: str, namespaces: Optional[Dict[str, str]] = None) \ -> Optional[PathResult]: """ Find a schema element or attribute using an XPath expression. :param path: an XPath expression that selects an element or an attribute node. :param namespaces: an optional mapping from namespace prefix to namespace URI. :return: The first matching schema component, or ``None`` if there is no match. """ return self._find(path, namespaces) @lru_cache(maxsize=None) def cached_find(self, expanded_path: str) -> Optional[PathResult]: """ Find a schema element or attribute using an expanded path as XPath expression. :param expanded_path: an XPath expression with qualified names already resolved \ to expanded form. :return: The first matching schema component, or ``None`` if there is no match. """ return self._find(expanded_path) @property def xsd_version(self) -> str: """The XSD version, returns '1.0' or '1.1'.""" return self._schema.xsd_version def is_assertion_based(self) -> bool: return self._base_element is not None and \ self._base_element.parent is self._base_element.type def get_type(self, qname: str) -> Optional[XsdTypeProtocol]: """ Get the XSD global type from the schema's scope. A concrete implementation must return an object that supports the protocols `XsdTypeProtocol`, or `None` if the global type is not found. :param qname: the fully qualified name of the type to retrieve. :returns: an object that represents an XSD type or `None`. """ xsd_type = self._schema.maps.types.get(qname) if isinstance(xsd_type, tuple): return None return xsd_type def get_attribute(self, qname: str) -> Optional[XsdAttributeProtocol]: """ Get the XSD global attribute from the schema's scope. A concrete implementation must return an object that supports the protocol `XsdAttributeProtocol`, or `None` if the global attribute is not found. 
:param qname: the fully qualified name of the attribute to retrieve. :returns: an object that represents an XSD attribute or `None`. """ xsd_attribute = self._schema.maps.attributes.get(qname) if isinstance(xsd_attribute, tuple): return None return xsd_attribute def get_element(self, qname: str) -> Optional[XsdElementProtocol]: """ Get the XSD global element from the schema's scope. A concrete implementation must return an object that supports the `XsdElementProtocol` interface, or `None` if the global element is not found. :param qname: the fully qualified name of the element to retrieve. :returns: an object that represents an XSD element or `None`. """ xsd_element = self._schema.maps.elements.get(qname) if isinstance(xsd_element, tuple): return None return xsd_element def get_substitution_group(self, qname: str) -> Optional[Set[XsdElementProtocol]]: """ Get a substitution group. A concrete implementation must return a list containing substitution elements or `None` if the substitution group is not found. Moreover, each item of the returned list must be an object that implements the `AbstractXsdElement` interface. :param qname: the fully qualified name of the substitution group to retrieve. :returns: a list containing substitution elements or `None`. """ return self._schema.maps.substitution_groups.get(qname) @abstractmethod def is_instance(self, obj: Any, type_qname: str) -> bool: """ Returns `True` if *obj* is an instance of the XSD global type, `False` if not. :param obj: the instance to be tested. :param type_qname: the fully qualified name of the type used to test the instance. """ @abstractmethod def cast_as(self, obj: Any, type_qname: str) -> AtomicType: """ Converts *obj* to the Python type associated with an XSD global type. A concrete implementation must raise a `ValueError` or `TypeError` in case of a decoding error or a `KeyError` if the type is not bound to the schema's scope. :param obj: the instance to be cast. 
:param type_qname: the fully qualified name of the type used to convert the instance. """ @abstractmethod def iter_atomic_types(self) -> Iterator[XsdTypeProtocol]: """ Returns an iterator for not builtin atomic types defined in the schema's scope. A concrete implementation must yield objects that implement the protocol `XsdTypeProtocol`. """ __all__ = ['PathResult', 'AbstractSchemaProxy'] sissaschool-elementpath-d3688c7/elementpath/sequence_types.py000066400000000000000000000330371476131650400246510ustar00rootroot00000000000000# # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from itertools import zip_longest from typing import TYPE_CHECKING, cast, Any, Optional from elementpath.exceptions import ElementPathKeyError, xpath_error from elementpath.helpers import collapse_white_spaces, OCCURRENCE_INDICATORS, Patterns from elementpath.namespaces import XSD_NAMESPACE, XSD_ERROR, XSD_ANY_SIMPLE_TYPE, XSD_NUMERIC, \ get_expanded_name, XSD_UNTYPED, XSD_UNTYPED_ATOMIC from elementpath.datatypes import xsd10_atomic_types, xsd11_atomic_types, AnyAtomicType, \ QName, NumericProxy from elementpath.xpath_nodes import XPathNode, DocumentNode, ElementNode, AttributeNode from elementpath import xpath_tokens if TYPE_CHECKING: from elementpath.xpath_tokens import XPathParserType XSD_EXTENDED_PREFIX = f'{{{XSD_NAMESPACE}}}' COMMON_SEQUENCE_TYPES = { 'xs:anyType', 'xs:anySimpleType', 'xs:anyAtomicType', 'xs:boolean', 'xs:decimal', 'xs:double', 'xs:float', 'xs:string', 'xs:date', 'xs:dateTime', 'xs:gDay', 'xs:gMonth', 'xs:gMonthDay', 'xs:gYear', 'xs:gYearMonth', 'xs:time', 'xs:duration', 'xs:dayTimeDuration', 'xs:yearMonthDuration', 'xs:QName', 'xs:anyURI', 'xs:normalizedString', 'xs:token', 'xs:language', 'xs:Name', 'xs:NCName', 
'xs:ID', 'xs:IDREF', 'xs:ENTITY', 'xs:NMTOKEN', 'xs:base64Binary', 'xs:hexBinary', 'xs:integer', 'xs:long', 'xs:int', 'xs:short', 'xs:byte', 'xs:positiveInteger', 'xs:negativeInteger', 'xs:numeric', 'xs:nonPositiveInteger', 'xs:nonNegativeInteger', 'xs:unsignedLong', 'xs:unsignedInt', 'xs:unsignedShort', 'xs:unsignedByte', 'xs:untyped', 'xs:untypedAtomic', 'attribute()', 'attribute(*)', 'element()', 'element(*)', 'text()', 'document-node()', 'comment()', 'processing-instruction()', 'item()', 'node()', 'numeric' } ### # Sequence type checking def normalize_sequence_type(sequence_type: str) -> str: sequence_type = collapse_white_spaces(sequence_type) sequence_type = Patterns.sequence_type.sub(r'\1', sequence_type) return sequence_type.replace(',', ', ').replace(')as', ') as') def is_sequence_type_restriction(st1: str, st2: str) -> bool: """Returns `True` if st2 is a restriction of st1.""" st1, st2 = normalize_sequence_type(st1), normalize_sequence_type(st2) if st2 in ('empty-sequence()', 'none') and \ (st1 in ('empty-sequence()', 'none') or st1.endswith(('?', '*'))): return True # check occurrences if st1[-1] not in '?+*': if st2[-1] in '+*': return False elif st2[-1] == '?': st2 = st2[:-1] elif st1[-1] == '+': st1 = st1[:-1] if st2[-1] in '?*': return False elif st2[-1] == '+': st2 = st2[:-1] elif st1[-1] == '*': st1 = st1[:-1] if st2[-1] in '?+': return False elif st2[-1] == '*': st2 = st2[:-1] else: st1 = st1[:-1] if st2[-1] in '+*': return False elif st2[-1] == '?': st2 = st2[:-1] if st1 == st2: return True elif st1 == 'item()': return True elif st2 == 'item()': return False elif st1 == 'node()': return st2.startswith(('element(', 'attribute(', 'comment(', 'text(', 'processing-instruction(', 'document(', 'namespace(')) elif st2 == 'node()': return False elif st1 == 'xs:anyAtomicType': try: return issubclass(xsd11_atomic_types[st2[3:]], AnyAtomicType) except KeyError: return False elif st1.startswith('xs:'): if st2 == 'xs:anyAtomicType': return True try: return 
issubclass(xsd11_atomic_types[st2[3:]], xsd11_atomic_types[st1[3:]]) except KeyError: return False elif not st1.startswith('function('): return False if st1 == 'function(*)': return st2.startswith('function(') parts1 = st1[9:].partition(') as ') parts2 = st2[9:].partition(') as ') for st1, st2 in zip_longest(parts1[0].split(', '), parts2[0].split(', ')): if st1 is None or st2 is None: return False if not is_sequence_type_restriction(st2, st1): return False else: if not is_sequence_type_restriction(parts1[2], parts2[2]): return False return True def is_instance(obj: Any, type_qname: str, parser: Optional['XPathParserType'] = None) -> bool: """Checks an instance against an XSD type.""" xsd_version = getattr(parser, 'xsd_version', '1.0') if not type_qname.startswith('{'): if parser is not None: type_qname = get_expanded_name(type_qname, parser.namespaces) elif type_qname.startswith('xs:'): type_qname = type_qname.replace('xs:', XSD_EXTENDED_PREFIX, 1) if type_qname.startswith(XSD_EXTENDED_PREFIX): try: if xsd_version == '1.1': return isinstance(obj, xsd11_atomic_types[type_qname]) return isinstance(obj, xsd10_atomic_types[type_qname]) except KeyError: pass if type_qname == XSD_ERROR: return obj is None or obj == [] elif type_qname == XSD_ANY_SIMPLE_TYPE: return isinstance(obj, AnyAtomicType) or \ isinstance(obj, list) and \ all(isinstance(x, AnyAtomicType) for x in obj) elif type_qname in ('numeric', XSD_NUMERIC): return isinstance(obj, NumericProxy) if parser is not None and parser.schema is not None: try: return parser.schema.is_instance(obj, type_qname) except KeyError: pass raise ElementPathKeyError("unknown type %r" % type_qname) def is_sequence_type(value: Any, parser: Optional['XPathParserType'] = None) -> bool: """Checks if a string is a sequence type specification.""" def is_st(st: str) -> bool: if not st: return False elif st == 'empty-sequence()' or st == 'none': return True elif st[-1] in OCCURRENCE_INDICATORS: st = st[:-1] if st in COMMON_SEQUENCE_TYPES: 
return True elif st.startswith(('map(', 'array(')): if parser and parser.version < '3.1' or not st.endswith(')'): return False if st in ('map(*)', 'array(*)'): return True if st.startswith('map('): key_type, _, value_type = st[4:-1].partition(', ') return key_type.startswith('xs:') and \ not key_type.endswith(('+', '*')) and \ is_st(key_type) and \ is_st(value_type) else: return is_st(st[6:-1]) elif st.startswith('element(') and st.endswith(')'): if ',' not in st: return Patterns.extended_qname.match(st[8:-1]) is not None try: arg1, arg2 = st[8:-1].split(', ') except ValueError: return False else: return (arg1 == '*' or Patterns.extended_qname.match(arg1) is not None) \ and Patterns.extended_qname.match(arg2) is not None elif st.startswith('document-node(') and st.endswith(')'): if not st.startswith('document-node(element('): return False return is_st(st[14:-1]) elif st.startswith('function('): if parser and parser.version < '3.0': return False elif st == 'function(*)': return True elif ' as ' in st: pass elif not st.endswith(')'): return False else: return is_st(st[9:-1]) st, return_type = st.rsplit(' as ', 1) if not is_st(return_type): return False elif st == 'function()': return True st = st[9:-1] if st.endswith(', ...'): st = st[:-5] if 'function(' not in st: return all(is_st(x) for x in st.split(', ')) elif st.startswith('function(*)') and 'function(' not in st[11:]: return all(is_st(x) for x in st.split(', ')) # Cover only if function() spec is the last argument k = st.index('function(') if not is_st(st[k:]): return False return all(is_st(x) for x in st[:k].split(', ') if x) elif QName.pattern.match(st) is None: return False if parser is None: return False try: is_instance(None, st, parser) except (KeyError, ValueError): return False else: return True if not isinstance(value, str): return False return is_st(normalize_sequence_type(value)) def match_sequence_type(value: Any, sequence_type: str, parser: Optional['XPathParserType'] = None, strict: bool = True) -> 
bool: """ Checks a value instance against a sequence type. :param value: the instance to check. :param sequence_type: a string containing the sequence type spec. :param parser: an optional parser instance for type checking. :param strict: if `False` match xs:anyURI with strings. """ def match_st(v: Any, st: str, occurrence: Optional[str] = None) -> bool: if st[-1] in OCCURRENCE_INDICATORS and ') as ' not in st: return match_st(v, st[:-1], st[-1]) elif v is None or isinstance(v, list) and v == []: return st in ('empty-sequence()', 'none') or occurrence in ('?', '*') elif st in ('empty-sequence()', 'none'): return False elif isinstance(v, list): if len(v) == 1: return match_st(v[0], st) elif occurrence is None or occurrence == '?': return False else: return all(match_st(x, st) for x in v) elif st == 'item()': return isinstance(v, (XPathNode, AnyAtomicType, list, xpath_tokens.XPathFunction)) elif st == 'numeric' or st == 'xs:numeric': return isinstance(v, NumericProxy) elif st.startswith('function('): if not isinstance(v, xpath_tokens.XPathFunction): return False return v.match_function_test(st) elif st.startswith('array('): if not isinstance(v, xpath_tokens.XPathArray): return False if st == 'array(*)': return True item_st = st[6:-1] return all(match_st(x, item_st) for x in v.items()) elif st.startswith('map('): if not isinstance(v, xpath_tokens.XPathMap): return False if st == 'map(*)': return True key_st, _, value_st = st[4:-1].partition(', ') if key_st.endswith(('+', '*')): raise xpath_error('XPST0003', 'no multiples occurs for a map key') return all(match_st(k, key_st) and match_st(v, value_st) for k, v in v.items()) if isinstance(v, XPathNode): node_kind = v.node_kind elif '(' in st: return False elif not strict and st == 'xs:anyURI' and isinstance(v, str): return True else: try: return is_instance(v, st, parser) except (KeyError, ValueError): raise xpath_error('XPST0051') if st == 'node()': return True elif not st.startswith(node_kind) or not st.endswith(')'): 
return False elif st == f'{node_kind}()': return True elif node_kind == 'document': element_test = st[14:-1] if not element_test: return True document = cast(DocumentNode, v) return any( match_st(e, element_test) for e in document if isinstance(e, ElementNode) ) elif node_kind not in ('element', 'attribute'): return False _, params = st[:-1].split('(') if ', ' not in st: name = params else: name, type_name = params.rsplit(', ', 1) if type_name.endswith('?'): type_name = type_name[:-1] elif isinstance(v, ElementNode) and v.nilled: return False if type_name == 'xs:untyped': if isinstance(v, AttributeNode) and v.type_name != XSD_UNTYPED_ATOMIC: return False if isinstance(v, ElementNode) and v.type_name != XSD_UNTYPED: return False else: try: if not is_instance(v.typed_value, type_name, parser): return False except (KeyError, ValueError): raise xpath_error('XPST0051') if name == '*': return True try: exp_name = get_expanded_name(name, parser.namespaces) # type: ignore[union-attr] except (KeyError, ValueError): return False except AttributeError: return True if v.name == name else False else: return True if v.name == exp_name else False return match_st(value, normalize_sequence_type(sequence_type)) sissaschool-elementpath-d3688c7/elementpath/serialization.py000066400000000000000000000422141476131650400244670ustar00rootroot00000000000000# # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
#
# @author Davide Brunato
#
import json
from decimal import Decimal, ROUND_UP
from types import ModuleType
from typing import cast, Any, Dict, Optional, Set, Union, Tuple
from xml.etree import ElementTree

from elementpath._typing import Iterator, Iterable
from elementpath.exceptions import ElementPathError, xpath_error
from elementpath.namespaces import XSLT_XQUERY_SERIALIZATION_NAMESPACE
from elementpath.datatypes import AnyAtomicType, AnyURI, AbstractDateTime, \
    AbstractBinary, UntypedAtomic, QName
from elementpath.xpath_nodes import XPathNode, ElementNode, AttributeNode, DocumentNode, \
    NamespaceNode, TextNode, CommentNode
from elementpath.xpath_tokens import XPathToken, XPathMap, XPathArray
from elementpath.protocols import EtreeElementProtocol, LxmlElementProtocol

# XSLT and XQuery Serialization parameters
SERIALIZATION_PARAMS = '{%s}serialization-parameters' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_OMIT_XML_DECLARATION = '{%s}omit-xml-declaration' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_USE_CHARACTER_MAPS = '{%s}use-character-maps' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_CHARACTER_MAP = '{%s}character-map' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_METHOD = '{%s}method' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_INDENT = '{%s}indent' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_VERSION = '{%s}version' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_CDATA = '{%s}cdata-section-elements' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_NO_INDENT = '{%s}suppress-indentation' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_STANDALONE = '{%s}standalone' % XSLT_XQUERY_SERIALIZATION_NAMESPACE
SER_PARAM_ITEM_SEPARATOR = '{%s}item-separator' % XSLT_XQUERY_SERIALIZATION_NAMESPACE


def get_serialization_params(params: Union[None, ElementNode, XPathMap] = None,
                             token: Optional[XPathToken] = None) -> Dict[str, Any]:
    kwargs: Dict[str, Any] = {}
    character_map: Dict[str, str]
    value: Any

    if isinstance(params, XPathMap):
        if len(params[:]) >
len(params.keys()): # pragma: no cover raise xpath_error('SEPM0019', token=token) for key, value in params.items(): if not isinstance(key, str) or value is None: continue elif isinstance(value, UntypedAtomic): value = str(value) if value == 'true': value = True elif value == 'false': value = False if key == 'omit-xml-declaration': if not isinstance(value, bool): raise xpath_error('XPTY0004', token=token) kwargs['xml_declaration'] = not value elif key == 'cdata-section-elements': # TODO: doesn't work within element nodes if isinstance(value, XPathArray): value = value.items() if not isinstance(value, list) or not all(isinstance(x, QName) for x in value): raise xpath_error('XPTY0004', token=token) kwargs['cdata_section'] = value elif key == 'method': if value not in ('html', 'xml', 'xhtml', 'text', 'adaptive', 'json'): raise xpath_error('SEPM0017', token=token) kwargs[key] = value if value != 'xhtml' else 'html' elif key == 'indent': if not isinstance(value, bool): raise xpath_error('XPTY0004', token=token) kwargs[key] = value elif key == 'item-separator': if not isinstance(value, str): raise xpath_error('XPTY0004', token=token) kwargs['item_separator'] = value elif key == 'use-character-maps': if not isinstance(value, XPathMap): raise xpath_error('XPTY0004', token=token) kwargs['character_map'] = character_map = {} for k, v in value.items(): if not isinstance(k, str) or not isinstance(v, str): raise xpath_error('XPTY0004', token=token) elif len(k) != 1: msg = f'invalid character {k!r} in character map' raise xpath_error('SEPM0016', msg, token) else: character_map[k] = v elif key == 'suppress-indentation': # pragma: no cover if isinstance(value, QName) or isinstance(value, list) \ and all(isinstance(x, QName) for x in value): kwargs[key] = value else: raise xpath_error('XPTY0004', token=token) elif key == 'standalone': if not value and isinstance(value, list): pass elif isinstance(value, bool): kwargs['standalone'] = value else: if value not in ('yes', 'no', 'omit'): 
raise xpath_error('XPTY0004', token=token) if value != 'omit': kwargs['standalone'] = value == 'yes' elif key == 'json-node-output-method': if not isinstance(value, (str, QName)): raise xpath_error('XPTY0004', token=token) kwargs[key] = value elif key == 'allow-duplicate-names': if value is not None and not isinstance(value, bool): raise xpath_error('XPTY0004', token=token) kwargs['allow_duplicate_names'] = value elif key == 'encoding': if not isinstance(value, str): raise xpath_error('XPTY0004', token=token) kwargs[key] = value elif key == 'html-version': if not isinstance(value, (int, Decimal)): raise xpath_error('XPTY0004', token=token) kwargs[key] = value elif isinstance(params, ElementNode): root = cast(Union[EtreeElementProtocol, LxmlElementProtocol], params.obj) if root.tag != SERIALIZATION_PARAMS: msg = 'output:serialization-parameters tag expected' raise xpath_error('XPTY0004', msg, token) if len(root) > len({e.tag for e in root}): raise xpath_error('SEPM0019', token=token) for child in root: if child.tag == SER_PARAM_OMIT_XML_DECLARATION: value = child.get('value') if value not in ('yes', 'no') or len(child.attrib) > 1: raise xpath_error('SEPM0017', token=token) elif value == 'no': kwargs['xml_declaration'] = True elif child.tag == SER_PARAM_USE_CHARACTER_MAPS: if len(child.attrib): raise xpath_error('SEPM0017', token=token) kwargs['character_map'] = character_map = {} for e in child: if e.tag != SER_PARAM_CHARACTER_MAP: raise xpath_error('SEPM0017', token=token) try: character = e.attrib['character'] if character in character_map: msg = 'duplicate character {!r} in character map' raise xpath_error('SEPM0018', msg.format(character), token) elif len(character) != 1: msg = 'invalid character {!r} in character map' raise xpath_error('SEPM0017', msg.format(character), token) character_map[character] = e.attrib['map-string'] except KeyError as key: msg = "missing {} in character map" raise xpath_error('SEPM0017', msg.format(key)) from None else: if 
len(e.attrib) > 2: msg = "invalid attribute in character map" raise xpath_error('SEPM0017', msg) elif child.tag == SER_PARAM_METHOD: value = child.get('value') if value not in ('html', 'xml', 'xhtml', 'text') or len(child.attrib) > 1: raise xpath_error('SEPM0017', token=token) kwargs['method'] = value if value != 'xhtml' else 'html' elif child.tag == SER_PARAM_INDENT: value = child.attrib.get('value', '') assert isinstance(value, str) value = value.strip() if value not in ('yes', 'no') or len(child.attrib) > 1: raise xpath_error('SEPM0017', token=token) elif child.tag == SER_PARAM_ITEM_SEPARATOR: try: kwargs['item_separator'] = child.attrib['value'] except KeyError: raise xpath_error('SEPM0017', token=token) from None elif child.tag == SER_PARAM_CDATA: pass # TODO param elif child.tag == SER_PARAM_NO_INDENT: pass # TODO param elif child.tag == SER_PARAM_STANDALONE: value = child.attrib.get('value', '') assert isinstance(value, str) value = value.strip() if value not in ('yes', 'no', 'omit') or len(child.attrib) > 1: raise xpath_error('SEPM0017', token=token) if value != 'omit': kwargs['standalone'] = value == 'yes' elif child.tag.startswith(f'{{{XSLT_XQUERY_SERIALIZATION_NAMESPACE}'): raise xpath_error('SEPM0017', token=token) elif not child.tag.startswith('{'): # no-namespace not allowed raise xpath_error('SEPM0017', token=token) return kwargs def iter_normalized(elements: Iterable[Any], item_separator: Optional[str] = None) -> Iterator[Any]: chunks = [] sep = ' ' if item_separator is None else item_separator for item in elements: if isinstance(item, XPathArray): for _item in item.iter_flatten(): if isinstance(_item, bool): chunks.append('true' if _item else 'false') elif isinstance(_item, AnyAtomicType): chunks.append(str(_item)) else: if chunks: yield sep.join(chunks) chunks.clear() if isinstance(_item, DocumentNode): yield from _item.children else: yield _item elif isinstance(item, bool): chunks.append('true' if item else 'false') elif isinstance(item, 
AnyAtomicType): chunks.append(str(item)) else: if chunks: yield sep.join(chunks) chunks.clear() if isinstance(item, DocumentNode): yield from item.children else: yield item else: if chunks: yield sep.join(chunks) def serialize_to_xml(elements: Iterable[Any], etree_module: Optional[ModuleType] = None, token: Optional['XPathToken'] = None, **params: Any) -> str: if etree_module is None: etree_module = ElementTree item_separator = params.get('item_separator') character_map = params.get('character_map') cdata_section: Union[Set[str], Tuple[()]] kwargs = {} if 'xml_declaration' in params: kwargs['xml_declaration'] = params['xml_declaration'] if 'standalone' in params: kwargs['standalone'] = params['standalone'] if 'cdata_section' in params: cdata_section = {x.expanded_name for x in params['cdata_section']} else: cdata_section = () method = kwargs.get('method', 'xml') if method == 'xhtml': method = 'html' chunks = [] for item in iter_normalized(elements, item_separator): if isinstance(item, ElementNode): item = item.obj elif isinstance(item, (AttributeNode, NamespaceNode)): raise xpath_error('SENR0001', token=token) elif isinstance(item, TextNode): if item.parent is not None and item.parent.name in cdata_section: chunks.append(f'') else: chunks.append(item.obj) continue elif not isinstance(item, str): raise xpath_error('SENR0001', token=token) else: chunks.append(item) continue try: cks = etree_module.tostringlist( item, encoding='utf-8', method=method, **kwargs ) except TypeError: ck = etree_module.tostring(item, encoding='utf-8', method=method) chunks.append(ck.decode('utf-8').rstrip(item.tail)) else: if cks and cks[0].startswith(b' str: if etree_module is None: etree_module = ElementTree class MapEncodingDict(dict): # type: ignore[type-arg] def __init__(self, items: Any) -> None: self[None] = None self._items = items def items(self) -> Any: return self._items class XPathEncoder(json.JSONEncoder): def default(self, obj: Any) -> Any: if isinstance(obj, XPathNode): if 
isinstance(obj, DocumentNode): return ''.join(self.default(child) for child in obj) elif isinstance(obj, ElementNode): elem = obj.obj assert etree_module is not None try: chunks = etree_module.tostringlist(elem, encoding='utf-8') except TypeError: chunk = etree_module.tostring(elem, encoding='utf-8') return cast(str, chunk.decode('utf-8')) else: if chunks and chunks[0].startswith(b'' else: return f'' elif isinstance(obj, XPathMap): if any(isinstance(v, list) and len(v) > 1 for v in obj.values()): raise xpath_error('SERE0023', token=token) map_keys = set() map_items = [] k: Any for k, v in obj.items(): if isinstance(k, QName): k = str(k) map_items.append((k, v)) if k not in map_keys: map_keys.add(k) elif not params.get('allow_duplicate_names'): raise xpath_error('SERE0022', token=token) return MapEncodingDict(map_items) elif isinstance(obj, XPathArray): return [v if v or not isinstance(v, list) else None for v in obj.items()] elif isinstance(obj, (AbstractBinary, AbstractDateTime, AnyURI, UntypedAtomic)): return str(obj) elif isinstance(obj, Decimal): return float(Decimal(obj).quantize(Decimal("0.01"), ROUND_UP)) else: return super().default(obj) kwargs: Dict[str, Any] = { 'cls': XPathEncoder, 'ensure_ascii': True, 'separators': (',', ':'), 'allow_nan': False, } try: parts = [json.dumps(x, **kwargs) for x in elements] except ElementPathError: raise except ValueError: raise xpath_error('SERE0020', token=token) except TypeError: raise xpath_error('SERE0021', token=token) if not parts: return 'null' elif len(parts) > 1: raise xpath_error('SERE0023', token=token) result = parts[0].replace('/', '\\/') if 'encoding' in params: return result.encode('utf-8').decode(params['encoding']) return result sissaschool-elementpath-d3688c7/elementpath/tdop.py000066400000000000000000001042401476131650400225560ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. 
# This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ This module contains base classes and helper functions for defining Pratt parsers. """ import sys import re from abc import ABCMeta from unicodedata import name as unicode_name from decimal import Decimal, DecimalException from typing import Any, cast, Dict, List, overload, Generic, Optional, Union, \ Tuple, Type, Iterator, TypeVar from elementpath._typing import Callable, Match, MutableMapping, \ MutableSequence, Pattern # # Simple top-down parser based on Vaughan Pratt's algorithm (Top Down Operator Precedence). # # References: # # https://tdop.github.io/ (Vaughan R. Pratt's "Top Down Operator Precedence" - 1973) # http://crockford.com/javascript/tdop/tdop.html (Douglas Crockford - 2007) # http://effbot.org/zone/simple-top-down-parsing.htm (Fredrik Lundh - 2008) # # This implementation is based on a base class for tokens and a base class for parsers. # A real parser is built with a derivation of the base parser class followed by the # registrations of token classes for the symbols of the language. # # A parser can be extended by derivation, copying the reusable token classes and # defining the additional ones. See the files xpath1_parser.py and xpath2_parser.py # for a full implementation example of a real parser. # # Parser special symbols set, that includes the special symbols of TDOP plus two # additional special symbols for managing invalid literals and unknown symbols # and source start. SPECIAL_SYMBOLS = frozenset(( '(start)', '(end)', '(string)', '(float)', '(decimal)', '(integer)', '(name)', '(invalid)', '(unknown)', )) class ParseError(SyntaxError): """An error when parsing source with TDOP parser.""" def _symbol_to_classname(symbol: str) -> str: """ Converts a symbol string to an identifier (only alphanumeric and '_'). 
""" def get_id_name(c: str) -> str: if c.isalnum() or c == '_': return c else: return '%s_' % unicode_name(str(c)).title() if symbol.isalnum(): return symbol.title() elif symbol in SPECIAL_SYMBOLS: return symbol[1:-1].title() elif all(c in '-_' for c in symbol): value = ' '.join(unicode_name(c) for c in symbol) return value.title().replace(' ', '').replace('-', '').replace('_', '') value = symbol.replace('-', '_') if value.isidentifier(): return value.title().replace('_', '') value = ''.join(get_id_name(c) for c in symbol) return value.replace(' ', '').replace('-', '').replace('_', '') class MultiLabel: """ Helper class for defining multi-value label for tokens. Useful when a symbol has more roles. A label of this type has equivalence with each of its values. Example: label = MultiLabel('function', 'operator') label == 'symbol' # False label == 'function' # True label == 'operator' # True """ def __init__(self, *values: str) -> None: self.values = values def __eq__(self, other: object) -> bool: return any(other == v for v in self.values) def __ne__(self, other: object) -> bool: return all(other != v for v in self.values) def __repr__(self) -> str: return '%s%s' % (self.__class__.__name__, self.values) def __str__(self) -> str: return '__'.join(self.values).replace(' ', '_') def __hash__(self) -> int: return hash(self.values) def __contains__(self, item: str) -> bool: return any(item in v for v in self.values) def startswith(self, s: str) -> bool: return any(v.startswith(s) for v in self.values) def endswith(self, s: str) -> bool: return any(v.endswith(s) for v in self.values) TK = TypeVar('TK', bound='Token[Any]') class Token(MutableSequence[TK]): """ Token base class for defining a parser based on Pratt's method. Each token instance is a list-like object. The number of token's items is the arity of the represented operator, where token's items are the operands. Nullary operators are used for symbols, names and literals. 
    Tokens with items represent the other operators (unary, binary and so on).

    Each token class has a *symbol*, an lbp (left binding power) value and an rbp
    (right binding power) value, which are used in the sense described by Pratt's
    method. This implementation of Pratt tokens includes two extra attributes,
    *pattern* and *label*, that can be used to simplify the parsing of symbols
    in a concrete parser.

    :param parser: The parser instance that creates the token instance.
    :param value: The token value. If not provided, defaults to the token symbol.
    :cvar symbol: the symbol of the token class.
    :cvar lbp: Pratt's left binding power, defaults to 0.
    :cvar rbp: Pratt's right binding power, defaults to 0.
    :cvar pattern: the regex pattern used for the token class. Defaults to the \
    escaped symbol. Can be customized to match more detailed conditions (e.g. a \
    function with its left round bracket), in order to simplify the related code.
    :cvar label: defines the typology of the token class. Its value is used in \
    representations of the token instance and can be used to restrict code choices \
    without more complicated analysis. The label value can be set as needed by the \
    parser implementation (e.g. 'function', 'axis', 'constructor function' are used by \
    the XPath parsers). In the base parser class it defaults to 'symbol', with 'literal' \
    and 'operator' as possible alternatives. If set to a tuple of values, the token \
    class label is transformed into a multi-value label, meaning the token class can \
    cover multiple roles (e.g. as XPath function or axis). In those cases the definitive \
    role is defined at parse time (nud and/or led methods) after the token instance creation.
    """
    lbp: int = 0  # left binding power
    rbp: int = 0  # right binding power
    symbol: str = ''  # the token identifier
    lookup_name: str = ''  # the key in the symbol table, usually matches the symbol.
    label: Union[str, MultiLabel] = 'symbol'  # the label, that usually means a class of tokens.
pattern: Optional[str] = None # a custom regex pattern for building the tokenizer __slots__ = '_items', 'parser', 'value', 'span' _items: List[TK] parser: 'Parser[TK]' value: Any span: Tuple[int, int] def __init__(self, parser: 'Parser[TK]', value: Optional[Any] = None) -> None: self._items = [] self.parser = parser self.value = value if value is not None else self.symbol self.span = (0, 0) if parser.next_match is None else parser.next_match.span() @overload def __getitem__(self, i: int) -> TK: ... # pragma: no cover @overload def __getitem__(self, s: slice) -> MutableSequence[TK]: ... # pragma: no cover def __getitem__(self, i: Union[int, slice]) \ -> Union[TK, MutableSequence[TK]]: return self._items[i] def __setitem__(self, i: Union[int, slice], o: Any) -> None: self._items[i] = o def __delitem__(self, i: Union[int, slice]) -> None: del self._items[i] def __len__(self) -> int: return len(self._items) def insert(self, i: int, item: TK) -> None: self._items.insert(i, item) def __str__(self) -> str: if self.symbol in SPECIAL_SYMBOLS: return '%r %s' % (self.value, self.symbol[1:-1]) else: return '%r %s' % (self.symbol, str(self.label)) def __repr__(self) -> str: return '<%s object at %#x>' % (self.__class__.__name__, id(self)) def __eq__(self, other: object) -> bool: if isinstance(other, Token): return self.symbol == other.symbol and self.value == other.value return False @property def arity(self) -> int: return len(self) @property def tree(self) -> str: """Returns a tree representation string.""" if self.symbol == '(name)': return '(%s)' % self.value elif self.symbol in SPECIAL_SYMBOLS: return '(%r)' % self.value elif self.symbol == '(': if len(self) == 1: return self[0].tree return f"({' '.join(item.tree for item in self)})" elif not self: return '(%s)' % self.symbol else: return f"({self.symbol} {' '.join(item.tree for item in self)})" @property def source(self) -> str: """Returns the source representation string.""" symbol = self.symbol if symbol == '(name)': 
return cast(str, self.value) elif symbol == '(decimal)': return str(self.value) elif symbol in SPECIAL_SYMBOLS: return repr(self.value).replace(r'\\', '\\') else: length = len(self) if not length: return symbol elif length == 1: if 'postfix' in self.label: return '%s %s' % (self[0].source, symbol) return '%s %s' % (symbol, self[0].source) elif length == 2: return '%s %s %s' % (self[0].source, symbol, self[1].source) else: return '%s %s' % (symbol, ' '.join(item.source for item in self)) @property def position(self) -> Tuple[int, int]: """A tuple with the position of the token in terms of line and column.""" token_index = self.span[0] line = self.parser.source[:token_index].count('\n') + 1 if line == 1: return 1, token_index + 1 return line, token_index - self.parser.source[:token_index].rindex('\n') def as_name(self) -> TK: """Returns a new '(name)' token for resolving ambiguous states.""" assert self.parser.name_pattern.match(self.symbol) is not None, \ "Token symbol is not compatible with the name pattern!" token = self.parser.symbol_table['(name)'](self.parser, self.symbol) token.span = self.span return token def is_source_start(self) -> bool: """ Returns `True` if the token is positioned at the start of the source, ignoring the spaces. """ return not bool(self.parser.source[0:self.span[0]].strip()) def is_line_start(self) -> bool: """ Returns `True` if the token is positioned at the start of a source line, ignoring the spaces. """ token_index = self.span[0] try: line_start = self.parser.source[:token_index].rindex('\n') + 1 except ValueError: return not bool(self.parser.source[:token_index].strip()) else: return not bool(self.parser.source[line_start:token_index].strip()) def is_spaced(self, before: bool = True, after: bool = True) -> bool: """ Returns `True` if the token has extra spaces (whitespace, tab or newline) immediately before or after it. :param before: if `True` considers also the extra spaces before the token. 
:param after: if `True` considers also the extra spaces after the token. """ start, end = self.span try: if before and start > 0 and self.parser.source[start - 1] in ' \t\n': return True return after and self.parser.source[end] in ' \t\n' except IndexError: return False def nud(self) -> TK: """Pratt's null denotation method""" raise self.wrong_syntax() def led(self, left: TK) -> TK: """Pratt's left denotation method""" raise self.wrong_syntax() def evaluate(self) -> Any: """Evaluation method""" return self.value def iter(self: TK, *symbols: str) -> Iterator[TK]: """Returns a generator for iterating the token's tree.""" status: List[Tuple[Optional[TK], Iterator[TK]]] = [] parent: Optional[TK] = self children: Iterator[TK] = iter(self) tk: TK while True: for tk in children: if parent is not None and len(parent._items) == 1: if not symbols or parent.symbol in symbols: yield parent parent = None if not tk._items: if not symbols or tk.symbol in symbols: yield tk if parent is not None: if not symbols or parent.symbol in symbols: yield parent parent = None continue status.append((parent, children)) parent, children = tk, iter(tk) break else: try: parent, children = status.pop() except IndexError: if parent is not None: if not symbols or parent.symbol in symbols: yield parent return else: if parent is not None: if not symbols or parent.symbol in symbols: yield parent parent = None def expected(self, *symbols: str, message: Optional[str] = None) -> None: if symbols and self.symbol not in symbols: raise self.wrong_syntax(message) def unexpected(self, *symbols: str, message: Optional[str] = None) -> None: if not symbols or self.symbol in symbols: raise self.wrong_syntax(message) def wrong_syntax(self, message: Optional[str] = None) -> ParseError: if message: return ParseError(message) elif self.symbol not in SPECIAL_SYMBOLS: return ParseError('unexpected %s' % self) elif self.symbol == '(invalid)': return ParseError('invalid literal %r' % self.value) elif self.symbol == 
'(unknown)': return ParseError('unknown symbol %r' % self.value) elif self.symbol == '(name)': return ParseError('unexpected name %r' % self.value) elif self.symbol != '(end)': return ParseError('unexpected literal %r' % self.value) elif self.parser.token.symbol == '(start)': return ParseError('source is empty') else: return ParseError('unexpected end of source') def wrong_type(self, message: str = 'invalid type') -> TypeError: return TypeError(message) def wrong_value(self, message: str = 'invalid value') -> ValueError: return ValueError(message) class ParserMeta(ABCMeta): token_base_class: Type[Any] literals_pattern: Pattern[str] name_pattern: Pattern[str] tokenizer: Optional[Pattern[str]] symbol_table: MutableMapping[str, Type[Any]] def __new__(mcs, name: str, bases: Tuple[Type[Any], ...], namespace: Dict[str, Any]) \ -> 'ParserMeta': cls = super(ParserMeta, mcs).__new__(mcs, name, bases, namespace) # Avoids more parsers definitions for a single module for k, v in sys.modules[cls.__module__].__dict__.items(): if isinstance(v, ParserMeta) and v.__module__ == cls.__module__: raise RuntimeError("Multiple parser class definitions per module are not allowed") # Checks and initializes class attributes if not hasattr(cls, 'token_base_class'): cls.token_base_class = Token if not hasattr(cls, 'literals_pattern'): cls.literals_pattern = re.compile( r"""'[^']*'|"[^"]*"|(?:\d+|\.\d+)(?:\.\d*)?(?:[Ee][+-]?\d+)?""" ) if not hasattr(cls, 'name_pattern'): cls.name_pattern = re.compile(r'[A-Za-z0-9_]+') if 'tokenizer' not in namespace: cls.tokenizer = None if 'symbol_table' not in namespace: cls.symbol_table = {} for base_class in bases: if hasattr(base_class, 'symbol_table'): cls.symbol_table.update(base_class.symbol_table) break return cls TK_co = TypeVar('TK_co', bound=Token[Any], covariant=True) RT = TypeVar('RT') class Parser(Generic[TK_co], metaclass=ParserMeta): """ Parser class for implementing a Top-Down Operator Precedence parser. 
    :cvar symbol_table: a dictionary that stores the token classes defined for the language.
    :cvar token_base_class: the base class for creating the language's token classes.
    :cvar tokenizer: the compiled regexp used as the language tokenizer.
    """
    token_base_class = Token
    tokenizer: Optional[Pattern[str]] = None
    symbol_table: Dict[str, Type[TK_co]] = {}

    _start_token: TK_co
    source: str
    tokens: Iterator[Match[str]]
    token: TK_co
    next_token: TK_co
    next_match: Optional[Match[str]]
    literals_pattern: Pattern[str]
    name_pattern: Pattern[str]

    __slots__ = 'source', 'tokens', 'next_match', '_start_token', 'token', 'next_token'

    def __init__(self) -> None:
        if self.tokenizer is None:
            self.build()
        self.source = ''
        self.tokens = iter(())
        self.next_match = None
        self._start_token = self.symbol_table['(start)'](self)
        self.token = self.next_token = self._start_token

    def __repr__(self) -> str:
        return '<%s object at %#x>' % (self.__class__.__name__, id(self))

    def __str__(self) -> str:
        return f'{self.__class__.__name__}()'

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Parser) and \
            self.token_base_class is other.token_base_class and \
            self.symbol_table == other.symbol_table

    def parse(self, source: str) -> TK_co:
        """
        Parses source code of the formal language. This is the main method
        to call on a parser instance.

        :param source: The source string.
        :return: The root of the token tree that represents the parsed source.
        """
        assert self.tokenizer, "Parser tokenizer is not built!"
try: try: self.tokens = iter(self.tokenizer.finditer(source)) except TypeError as err: token = self.symbol_table['(invalid)'](self, source) raise token.wrong_syntax('invalid source type, {}'.format(err)) self.source = source self.advance() root_token = self.expression() self.next_token.expected('(end)') return root_token finally: self.tokens = iter(()) self.next_match = None self.token = self.next_token = self._start_token def advance(self, *symbols: str, message: Optional[str] = None) -> TK_co: """ The Pratt's function for advancing to next token. :param symbols: Optional arguments tuple. If not empty one of the provided \ symbols is expected. If the next token's symbol differs the parser raises a \ parse error. :param message: Optional custom message for unexpected symbols. :return: The current token instance. """ value: Any if self.next_token.symbol == '(end)': raise self.next_token.wrong_syntax() elif symbols and self.next_token.symbol not in symbols: raise self.next_token.wrong_syntax(message) self.token = self.next_token for self.next_match in self.tokens: assert self.next_match is not None if not self.next_match.group().isspace(): break else: self.next_token = self.symbol_table['(end)'](self) return self.token literal, symbol, name, unknown = self.next_match.groups() if symbol is not None: if symbol in self.symbol_table: self.next_token = self.symbol_table[symbol](self) elif self.name_pattern.match(symbol) is not None: self.next_token = self.symbol_table['(name)'](self, symbol) else: self.next_token = self.symbol_table['(unknown)'](self, symbol) raise self.next_token.wrong_syntax() elif literal is not None: if literal[0] in '\'"': value = self.unescape(literal) self.next_token = self.symbol_table['(string)'](self, value) elif 'e' in literal or 'E' in literal: try: value = float(literal) except ValueError as err: self.next_token = self.symbol_table['(invalid)'](self, literal) raise self.next_token.wrong_syntax(message=str(err)) else: self.next_token = 
self.symbol_table['(float)'](self, value) elif '.' in literal: try: value = Decimal(literal) except DecimalException as err: self.next_token = self.symbol_table['(invalid)'](self, literal) raise self.next_token.wrong_syntax(message=str(err)) else: self.next_token = self.symbol_table['(decimal)'](self, value) else: self.next_token = self.symbol_table['(integer)'](self, int(literal)) elif name is not None: self.next_token = self.symbol_table['(name)'](self, name) elif unknown is not None: self.next_token = self.symbol_table['(unknown)'](self, unknown) else: msg = "unexpected matching %r: incompatible tokenizer" raise RuntimeError(msg % self.next_match.group()) return self.token def advance_until(self, *stop_symbols: str) -> str: """ Advances until one of the symbols is found or the end of source is reached, returning the raw source string placed before. Useful for raw parsing of comments and references enclosed between specific symbols. :param stop_symbols: The symbols that have to be found for stopping advance. :return: The source string chunk enclosed between the initial position \ and the first stop symbol. """ if not stop_symbols: raise self.next_token.wrong_type("at least a stop symbol required!") elif self.next_token.symbol == '(end)': raise self.next_token.wrong_syntax() self.token = self.next_token source_chunk: List[str] = [] while True: try: self.next_match = next(self.tokens) except StopIteration: self.next_token = self.symbol_table['(end)'](self) break else: symbol = self.next_match.group(2) if symbol is not None: symbol = symbol.strip() if symbol not in stop_symbols: source_chunk.append(symbol) else: try: self.next_token = self.symbol_table[symbol](self) break except KeyError: self.next_token = self.symbol_table['(unknown)'](self) raise self.next_token.wrong_syntax() else: source_chunk.append(self.next_match.group()) return ''.join(source_chunk) def expression(self, rbp: int = 0) -> TK_co: """ Pratt's function for parsing an expression. 
        It calls token.nud() and then, while the right binding power is less than
        the left binding power of the next token, advances and invokes the led()
        method on the new current token.

        :param rbp: right binding power for the expression.
        :return: left token.
        """
        self.advance()
        left = self.token.nud()
        while rbp < self.next_token.lbp:
            self.advance()
            left = self.token.led(left)
        return cast(TK_co, left)

    @property
    def position(self) -> Tuple[int, int]:
        """Property that returns the current line and column indexes."""
        return self.token.position

    def is_source_start(self) -> bool:
        """
        Returns `True` if the parser is positioned at the start
        of the source, ignoring the spaces.
        """
        return self.token.is_source_start()

    def is_line_start(self) -> bool:
        """
        Returns `True` if the parser is positioned at the start
        of a source line, ignoring the spaces.
        """
        return self.token.is_line_start()

    def is_spaced(self, before: bool = True, after: bool = True) -> bool:
        """
        Returns `True` if the source has an extra space (whitespace, tab or newline)
        immediately before or after the current position of the parser.

        :param before: if `True` also considers the extra spaces before \
        the current token symbol.
        :param after: if `True` also considers the extra spaces after \
        the current token symbol.
        """
        return self.token.is_spaced(before, after)

    @staticmethod
    def unescape(string_literal: str) -> str:
        return string_literal[1:-1].replace("\\'", "'").replace('\\"', '"')

    @classmethod
    def register(cls, symbol: Union[str, Type[TK_co]], **kwargs: Any) -> Type[TK_co]:
        """
        Register/update a token class in the symbol table.

        :param symbol: The identifier symbol for a new class or an existing token class.
        :param kwargs: Optional attributes/methods for the token class.
        :return: A token class.
""" token_class: Type[TK_co] if isinstance(symbol, str): if ' ' in symbol: raise ValueError("%r: a symbol can't contain whitespaces" % symbol) lookup_name = kwargs.get('lookup_name', symbol) try: token_class = cls.symbol_table[lookup_name] except KeyError: # Register a new symbol and create a new custom class. The new token # class is registered globally in the module of the parser class. kwargs['symbol'] = symbol kwargs['lookup_name'] = lookup_name label = kwargs.get('label', 'symbol') if isinstance(label, tuple): label = kwargs['label'] = MultiLabel(*label) if 'class_name' in kwargs: token_class_name = kwargs.pop('class_name') else: token_class_name = "_%s%s" % ( _symbol_to_classname(symbol), str(label).title().replace(' ', '') ) token_class_bases = kwargs.get('bases', (cls.token_base_class,)) kwargs.update({ '__module__': cls.__module__, '__qualname__': token_class_name, '__return__': None }) token_class = cast( Type[TK_co], ABCMeta(token_class_name, token_class_bases, kwargs) ) cls.symbol_table[lookup_name] = token_class setattr(sys.modules[cls.__module__], token_class_name, token_class) elif not isinstance(symbol, type) or not issubclass(symbol, Token): raise TypeError("A string or a %r subclass requested, not %r." % (Token, symbol)) else: token_class = symbol if cls.symbol_table.get(symbol.lookup_name) is not token_class: raise ValueError("Token class %r is not registered." 
% token_class) for key, value in kwargs.items(): if key == 'lbp' and value > token_class.lbp: token_class.lbp = value elif key == 'rbp' and value > token_class.rbp: token_class.rbp = value elif callable(value): setattr(token_class, key, value) return token_class @classmethod def unregister(cls, symbol: str) -> None: """Unregister a token class from the symbol table.""" del cls.symbol_table[symbol.strip()] @classmethod def duplicate(cls, symbol: str, new_symbol: str, **kwargs: Any) -> Type[TK_co]: """Duplicate a token class with a new symbol.""" token_class = cls.symbol_table[symbol] new_token_class = cls.register(new_symbol, **kwargs) for key, value in token_class.__dict__.items(): if key in kwargs or key in ('symbol', 'pattern') or key.startswith('_'): continue setattr(new_token_class, key, value) return new_token_class @classmethod def literal(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents a *literal*.""" def nud(self: Token[TK_co]) -> Token[TK_co]: return self def evaluate(self: Token[TK_co], *_args: Any, **_kwargs: Any) -> Any: return self.value return cls.register(symbol, label='literal', lbp=bp, evaluate=evaluate, nud=nud) @classmethod def nullary(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents a *nullary* operator.""" def nud(self: Token[TK_co]) -> Token[TK_co]: return self return cls.register(symbol, label='operator', lbp=bp, nud=nud) @classmethod def prefix(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents a *prefix* unary operator.""" def nud(self: Token[TK_co]) -> Token[TK_co]: self[:] = self.parser.expression(rbp=bp), return self return cls.register(symbol, label='prefix operator', lbp=bp, rbp=bp, nud=nud) @classmethod def postfix(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents a *postfix* unary operator.""" def led(self: Token[TK_co], left: Token[TK_co]) -> 
Token[TK_co]: self[:] = left, return self return cls.register(symbol, label='postfix operator', lbp=bp, rbp=bp, led=led) @classmethod def infix(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents an *infix* binary operator.""" def led(self: Token[TK_co], left: Token[TK_co]) -> Token[TK_co]: self[:] = left, self.parser.expression(rbp=bp) return self return cls.register(symbol, label='operator', lbp=bp, rbp=bp, led=led) @classmethod def infixr(cls, symbol: str, bp: int = 0) -> Type[TK_co]: """Register a token for a symbol that represents an *infixr* binary operator.""" def led(self: Token[TK_co], left: Token[TK_co]) -> Token[TK_co]: self[:] = left, self.parser.expression(rbp=bp - 1) return self return cls.register(symbol, label='operator', lbp=bp, rbp=bp - 1, led=led) @classmethod def method(cls, symbol: Union[str, Type[TK_co]], bp: int = 0) \ -> Callable[[Callable[..., RT]], Callable[..., RT]]: """ Register a token for a symbol that represents a custom operator or redefine a method for an existing token. """ token_class = cls.register(symbol, label='operator', lbp=bp, rbp=bp) def bind(func: Callable[..., Any]) -> Callable[..., Any]: method_name = func.__name__.partition('_')[0] if not callable(getattr(token_class, method_name)): raise TypeError(f"{method_name!r} is not a method of {token_class}") setattr(token_class, method_name, func) return func return bind @classmethod def build(cls) -> None: """ Builds the parser class. Checks if all declared symbols are defined and builds the regex tokenizer using the symbol related patterns. 
""" # Register a minimal set of special tokens if '(start)' not in cls.symbol_table: cls.register('(start)') if '(end)' not in cls.symbol_table: cls.register('(end)') if '(invalid)' not in cls.symbol_table: cls.register('(invalid)') if '(unknown)' not in cls.symbol_table: cls.register('(unknown)') cls.tokenizer = cls.create_tokenizer(cls.symbol_table) @classmethod def create_tokenizer(cls, symbol_table: MutableMapping[str, Type[TK_co]]) -> Pattern[str]: """ Returns a regex based tokenizer built from a symbol table of token classes. The returned tokenizer skips extra spaces between symbols. A regular expression is created from the symbol table of the parser using a template. The symbols are inserted in the template putting the longer symbols first. Symbols and their patterns can't contain spaces. :param symbol_table: a dictionary containing the token classes of the formal language. """ character_patterns = [] string_patterns = [] name_patterns = [] custom_patterns = set() for symbol, token_class in symbol_table.items(): if symbol in SPECIAL_SYMBOLS: continue elif token_class.pattern is not None: custom_patterns.add(token_class.pattern) elif cls.name_pattern.match(symbol) is not None: name_patterns.append(re.escape(symbol)) elif len(symbol) == 1: character_patterns.append(re.escape(symbol)) else: string_patterns.append(re.escape(symbol)) symbols_patterns: List[str] = [] if string_patterns: symbols_patterns.append('|'.join(sorted(string_patterns, key=lambda x: -len(x)))) if character_patterns: symbols_patterns.append('[{}]'.format(''.join(character_patterns))) if name_patterns: symbols_patterns.append(r'\b(?:{})\b(?![\-\.])'.format( '|'.join(sorted(name_patterns, key=lambda x: -len(x))) )) if custom_patterns: symbols_patterns.append('|'.join(custom_patterns)) tokenizer_pattern = r"({})|({})|({})|(\S)|\s+".format( cls.literals_pattern.pattern, '|'.join(symbols_patterns), cls.name_pattern.pattern ) return re.compile(tokenizer_pattern) 
sissaschool-elementpath-d3688c7/elementpath/tree_builders.py000066400000000000000000000427001476131650400244420ustar00rootroot00000000000000# # Copyright (c), 2018-2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from typing import cast, Any, List, Optional, TYPE_CHECKING, Union from elementpath._typing import Iterator from elementpath.aliases import NamespacesType, NsmapType from elementpath.exceptions import ElementPathTypeError from elementpath.protocols import LxmlElementProtocol, DocumentProtocol, \ LxmlDocumentProtocol, XsdElementProtocol, DocumentType, ElementType, \ SchemaElemType from elementpath.etree import is_etree_document, is_etree_element, is_etree_element_instance from elementpath.xpath_nodes import ChildNodeType, ElementMapType, TextNode, \ ElementNode, SchemaElementNode, DocumentNode, RootNodeType, RootArgType, \ EtreeElementNode, EtreeDocumentNode, CommentNode, ProcessingInstructionNode if TYPE_CHECKING: from elementpath.schema_proxy import AbstractSchemaProxy __all__ = ['get_node_tree', 'build_node_tree', 'build_lxml_node_tree', 'build_schema_node_tree'] ElementTreeRootType = Union[DocumentType, ElementType] LxmlRootType = Union[LxmlDocumentProtocol, LxmlElementProtocol] def is_schema(obj: Any) -> bool: return hasattr(obj, 'xsd_version') and hasattr(obj, 'maps') and not hasattr(obj, 'parent') def get_node_tree(root: RootArgType, namespaces: Optional[NamespacesType] = None, uri: Optional[str] = None, fragment: Optional[bool] = None) -> RootNodeType: """ Returns a tree of XPath nodes that wrap the provided root tree. :param root: an Element or an ElementTree or a schema or a schema element. 
    :param namespaces: an optional mapping from prefixes to namespace URIs. \
    Ignored if root is an lxml etree or a schema structure.
    :param uri: an optional URI associated with the root element or the document.
    :param fragment: if `True` is provided the root is considered a fragment. In this \
    case, if `root` is an ElementTree instance, it is skipped and its root Element is \
    used. If `False` is provided, a dummy document is created when the root is an \
    Element instance. By default the root node kind is preserved.
    """
    root_node: RootNodeType

    if isinstance(root, (DocumentNode, ElementNode)):
        if uri is not None and root.uri is None:
            root.uri = uri

        if fragment:
            if isinstance(root, DocumentNode):
                root_node = root.getroot()
                if root_node.uri is None:
                    root_node.uri = root.uri
                return root_node
        elif fragment is False and \
                isinstance(root, ElementNode) and \
                is_etree_element_instance(root.obj):
            return root.get_document_node(replace=False)

        return root

    if not is_etree_document(root) and \
            (not is_etree_element(root) or callable(cast(ElementType, root).tag)):
        msg = "invalid root {!r}, an Element or an ElementTree or a schema node required"
        raise ElementPathTypeError(msg.format(root))
    elif hasattr(root, 'xpath'):
        # an lxml element tree data
        return build_lxml_node_tree(
            cast(LxmlRootType, root), uri, fragment
        )
    elif hasattr(root, 'xsd_version') and hasattr(root, 'maps'):
        # a schema or a schema node
        return build_schema_node_tree(
            cast(SchemaElemType, root), uri
        )
    else:
        return build_node_tree(
            cast(ElementTreeRootType, root), namespaces, uri, fragment
        )


def build_node_tree(root: ElementTreeRootType,
                    namespaces: Optional[NamespacesType] = None,
                    uri: Optional[str] = None,
                    fragment: Optional[bool] = None,
                    schema: Optional['AbstractSchemaProxy'] = None) -> RootNodeType:
    """
    Returns a tree of XPath nodes that wrap the provided root tree.

    :param root: an Element or an ElementTree.
    :param namespaces: an optional mapping from prefixes to namespace URIs.
:param uri: an optional URI associated with the document or the root element. :param fragment: if `True` is provided the root is considered a fragment. In this \ case if `root` is an ElementTree instance skips it and use the root Element. If \ `False` is provided creates a dummy document when the root is an Element instance. \ For default the root node kind is preserved. :param schema: an optional schema proxy instance for applying XSD type annotations \ on element and attribute nodes. """ elem: ElementType parent: Any elements: Any child: ChildNodeType children: Iterator[Any] document: Optional[DocumentProtocol] position = 1 nsmap: Optional[NsmapType] if namespaces: nsmap = {k: v for k, v in namespaces.items()} elem_pos_offset = len(namespaces) + int('xml' not in namespaces) + 1 else: nsmap = {} elem_pos_offset = 2 if hasattr(root, 'parse'): document = cast(DocumentProtocol, root) root_elem = document.getroot() else: document = None root_elem = root if fragment and root_elem is not None: document = None # Explicitly requested a fragment: don't create a document node if document is not None: document_node = EtreeDocumentNode(document, uri, position) position += 1 if root_elem is None: return document_node elem = root_elem root_node = EtreeElementNode(elem, document_node, position, nsmap) elements = document_node.elements document_node.children.append(root_node) else: assert root_elem is not None document_node = None elem = root_elem root_node = EtreeElementNode(elem, None, position, nsmap) root_node.elements = elements = {} if uri is not None: root_node.uri = uri # Complete the root element node build elements[elem] = root_node position += elem_pos_offset + len(elem.attrib) if elem.text is not None: root_node.children.append(TextNode(elem.text, root_node, position)) position += 1 children = iter(elem) iterators: List[Any] = [] ancestors: List[Any] = [] parent = root_node while True: for elem in children: if not callable(elem.tag): child = EtreeElementNode(elem, 
parent, position, nsmap) position += elem_pos_offset + len(elem.attrib) if elem.text is not None: child.children.append(TextNode(elem.text, child, position)) position += 1 elif elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] child = CommentNode(elem, parent, position) position += 1 else: child = ProcessingInstructionNode(elem, None, parent, position) position += 1 elements[elem] = child parent.children.append(child) if len(elem): ancestors.append(parent) parent = child iterators.append(children) children = iter(elem) break if elem.tail is not None: parent.children.append(TextNode(elem.tail, parent, position)) position += 1 else: try: children, parent = iterators.pop(), ancestors.pop() except IndexError: if document_node is not None: return document_node elif fragment is False and \ isinstance(root_node, ElementNode) and \ is_etree_element_instance(root_node.elem): return root_node.get_document_node(replace=False) else: return root_node else: if (tail := parent.children[-1].elem.tail) is not None: parent.children.append(TextNode(tail, parent, position)) position += 1 def build_lxml_node_tree(root: LxmlRootType, uri: Optional[str] = None, fragment: Optional[bool] = None) -> RootNodeType: """ Returns a tree of XPath nodes that wrap the provided lxml root tree. :param root: a lxml Element or a lxml ElementTree. :param uri: an optional URI associated with the document or the root element. :param fragment: if `True` is provided the root is considered a fragment. In this \ case if `root` is an ElementTree instance skips it and use the root Element. If \ `False` is provided creates a dummy document when the root is an Element instance. \ For default the root node kind is preserved. 
""" root_node: RootNodeType document: Optional[LxmlDocumentProtocol] parent: Any elements: Any child: ChildNodeType children: Iterator[Any] position = 1 if fragment: document = None # Explicitly requested a fragment: don't create a document node elif hasattr(root, 'parse'): document = cast(LxmlDocumentProtocol, root) elif fragment is False or root.getparent() is None and ( any(True for _sibling in root.itersiblings(preceding=True)) or any(True for _sibling in root.itersiblings())): # Despite a document is not explicitly requested create a dummy document # because the root element has siblings document = root.getroottree() else: document = None if document is not None: document_node = EtreeDocumentNode(document, uri, position) elements = document_node.elements position += 1 root_elem = document.getroot() if root_elem is None: return document_node # Add root siblings (comments and processing instructions) for elem in reversed([x for x in root_elem.itersiblings(preceding=True)]): if elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] child = CommentNode(elem, document_node, position) else: child = ProcessingInstructionNode(elem, None, document_node, position) elements[elem] = child document_node.children.append(child) position += 1 root_node = EtreeElementNode(root_elem, document_node, position, root_elem.nsmap) document_node.children.append(root_node) else: if hasattr(root, 'parse'): root_elem = cast(LxmlDocumentProtocol, root).getroot() else: root_elem = root if root_elem is None: if fragment: msg = "requested a fragment of an empty ElementTree document" else: msg = "root argument is neither an lxml ElementTree nor a lxml Element" raise ElementPathTypeError(msg) document_node = None root_node = EtreeElementNode(root_elem, None, position, root_elem.nsmap) root_node.elements = elements = {} if uri is not None: root_node.uri = uri # Complete the root element node build elements[root_elem] = root_node if 'xml' in root_elem.nsmap: position += 
len(root_elem.nsmap) + len(root_elem.attrib) + 1 else: position += len(root_elem.nsmap) + len(root_elem.attrib) + 2 if root_elem.text is not None: root_node.children.append(TextNode(root_elem.text, root_node, position)) position += 1 children = iter(root_elem) iterators: List[Any] = [] ancestors: List[Any] = [] parent = root_node while True: for elem in children: if not callable(elem.tag): child = EtreeElementNode(elem, parent, position, elem.nsmap) if 'xml' in elem.nsmap: position += len(elem.nsmap) + len(elem.attrib) + 1 else: position += len(elem.nsmap) + len(elem.attrib) + 2 if elem.text is not None: child.children.append(TextNode(elem.text, child, position)) position += 1 elif elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] child = CommentNode(elem, parent, position) position += 1 else: child = ProcessingInstructionNode(elem, None, parent, position) position += 1 elements[elem] = child parent.children.append(child) if len(elem): ancestors.append(parent) parent = child iterators.append(children) children = iter(elem) break if elem.tail is not None: parent.children.append(TextNode(elem.tail, parent, position)) position += 1 else: try: children, parent = iterators.pop(), ancestors.pop() except IndexError: if document_node is None: return root_node # Add root following siblings (comments and processing instructions) for elem in root_elem.itersiblings(): if elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] child = CommentNode(elem, document_node, position) else: child = ProcessingInstructionNode(elem, None, document_node, position) elements[elem] = child document_node.children.append(child) position += 1 return document_node else: if (tail := parent.children[-1].elem.tail) is not None: parent.children.append(TextNode(tail, parent, position)) position += 1 def build_schema_node_tree(root: SchemaElemType, uri: Optional[str] = None, elements: Optional[ElementMapType] = None, global_elements: Optional[List[ChildNodeType]] = None) \ -> 
SchemaElementNode: """ Returns a graph of XPath nodes that wrap the provided XSD schema structure. The elements dictionary is shared between all nodes to keep all of them, globals and local, linked in a single structure. :param root: a schema or a schema element. :param uri: an optional URI associated with the root element. :param elements: a shared map from XSD elements to tree nodes. Provided for \ linking together parts of the same schema or other schemas. :param global_elements: a list for schema global elements, used for linking \ the elements declared by reference. """ parent: Any elem: Any child: SchemaElementNode children: Iterator[Any] position = 1 _elements = {} if elements is None else elements nsmap: Optional[NsmapType] = getattr(root, 'namespaces', None) if nsmap: elem_pos_offset = len(nsmap) + int('xml' not in nsmap) + 1 else: elem_pos_offset = 2 root_node = SchemaElementNode(root, None, position, nsmap) _elements[root] = root_node root_node.elements = _elements position += elem_pos_offset + len(root.attrib) if uri is not None: root_node.uri = uri if global_elements is not None: global_elements.append(root_node) elif is_schema(root): global_elements = root_node.children else: # Track global elements even if the initial root is not a schema to avoid circularity global_elements = [] local_nodes = {root: root_node} # Irrelevant even if it's the schema ref_nodes: List[SchemaElementNode] = [] children = iter(root) iterators: List[Any] = [] ancestors: List[Any] = [] parent = root_node while True: for elem in children: child = SchemaElementNode(elem, parent, position, elem.namespaces) position += elem_pos_offset + len(elem.attrib) _elements[elem] = child child.elements = _elements parent.children.append(child) if elem in local_nodes: if elem.ref is None: child.children = local_nodes[elem].children else: ref_nodes.append(child) else: local_nodes[elem] = child if elem.ref is None: ancestors.append(parent) parent = child iterators.append(children) children = 
iter(elem) break else: ref_nodes.append(child) else: try: children, parent = iterators.pop(), ancestors.pop() except IndexError: # connect references to proper nodes for element_node in ref_nodes: elem = element_node.elem ref = cast(XsdElementProtocol, elem.ref) other: Any for other in global_elements: if other.elem is ref: element_node.ref = other break else: # Extend node tree with other globals element_node.ref = build_schema_node_tree( ref, elements=_elements, global_elements=global_elements ) return root_node sissaschool-elementpath-d3688c7/elementpath/validators/000077500000000000000000000000001476131650400234055ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/elementpath/validators/__init__.py000066400000000000000000000034421476131650400255210ustar00rootroot00000000000000# # Copyright (c), 2018-2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Subpackage for validating against XPath standard schemas. 
""" import pathlib from xml.etree.ElementTree import Element from typing import Optional from elementpath.exceptions import ElementPathRuntimeError try: import xmlschema except ImportError: # pragma: no cover from ..exceptions import xpath_error def validate_analyzed_string(root: Element) -> None: raise ElementPathRuntimeError('not schema-aware') def validate_json_to_xml(root: Element) -> None: raise xpath_error('FOJS0004') else: from ..namespaces import XPATH_FUNCTIONS_NAMESPACE analyzed_string_schema: Optional[xmlschema.XMLSchemaBase] = None json_to_xml_schema: Optional[xmlschema.XMLSchemaBase] = None __all__ = ['validate_analyzed_string', 'validate_json_to_xml'] def validate_analyzed_string(root: Element) -> None: global analyzed_string_schema if analyzed_string_schema is None: xsd_file = pathlib.Path(__file__).parent.joinpath('analyze-string.xsd') analyzed_string_schema = xmlschema.XMLSchema(xsd_file) analyzed_string_schema.validate(root) def validate_json_to_xml(root: Element) -> None: global json_to_xml_schema if json_to_xml_schema is None: xsd_file = pathlib.Path(__file__).parent.joinpath('schema-for-json.xsd') json_to_xml_schema = xmlschema.XMLSchema(xsd_file) json_to_xml_schema.validate(root, namespaces={'j': XPATH_FUNCTIONS_NAMESPACE}) sissaschool-elementpath-d3688c7/elementpath/validators/analyze-string.xsd000066400000000000000000000030541476131650400270760ustar00rootroot00000000000000 sissaschool-elementpath-d3688c7/elementpath/validators/schema-for-json.xsd000066400000000000000000000124151476131650400271230ustar00rootroot00000000000000 sissaschool-elementpath-d3688c7/elementpath/xpath1/000077500000000000000000000000001476131650400224425ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/elementpath/xpath1/__init__.py000066400000000000000000000007711476131650400245600ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. 
# This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from typing import TYPE_CHECKING if TYPE_CHECKING: from .xpath1_parser import XPath1Parser else: from ._xpath1_axes import XPath1Parser __all__ = ['XPath1Parser'] sissaschool-elementpath-d3688c7/elementpath/xpath1/_xpath1_axes.py000066400000000000000000000115201476131650400253770ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XPath 1.0 implementation - part 4 (axes) """ from typing import cast from elementpath._typing import Iterator from elementpath.xpath_nodes import ChildNodeType, AttributeNode, ElementNode, \ NamespaceNode, XPathNode, ParentNodeType from elementpath.xpath_context import ContextType, ItemType, XPathSchemaContext from elementpath.xpath_tokens import XPathAxis from ._xpath1_functions import XPath1Parser register = XPath1Parser.register method = XPath1Parser.method axis = XPath1Parser.axis @method(register('@', lbp=80, rbp=80, label="attribute reference")) def nud_attribute_reference(self: XPathAxis) -> XPathAxis: self.parser.expected_next( '*', '(name)', ':', '{', 'Q{', message="invalid attribute specification") self[:] = self.parser.expression(rbp=80), return self @method('@') @method(axis('attribute')) def select_attribute_reference_or_axis(self: XPathAxis, context: ContextType = None) \ -> Iterator[AttributeNode]: if context is None: raise self.missing_context() for _ in context.iter_attributes(): yield from cast(Iterator[AttributeNode], self[0].select(context)) @method(axis('namespace')) def select_namespace_axis(self: XPathAxis, context: 
ContextType = None) \ -> Iterator[NamespaceNode]: if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return # deprecated for XP20+ and not needed for schema analysis elif isinstance(context.item, ElementNode): elem = context.item if self[0].symbol != 'namespace-node': name = self[0].value else: name = '*' for context.item in elem.namespace_nodes: if name == '*' or name == context.item.prefix: yield context.item @method(axis('self')) def select_self_axis(self: XPathAxis, context: ContextType = None) \ -> Iterator[ItemType]: if context is None: raise self.missing_context() else: for _ in context.iter_self(): yield from self[0].select(context) @method(axis('child')) def select_child_axis(self: XPathAxis, context: ContextType = None) \ -> Iterator[ItemType]: if context is None: raise self.missing_context() else: for _ in context.iter_children_or_self(): yield from self[0].select(context) @method(axis('parent', reverse_axis=True)) def select_parent_axis(self: XPathAxis, context: ContextType = None) \ -> Iterator[ParentNodeType]: if context is None: raise self.missing_context() else: for _ in context.iter_parent(): yield from cast(Iterator[ParentNodeType], self[0].select(context)) @method(axis('following-sibling')) @method(axis('preceding-sibling', reverse_axis=True)) def select_sibling_axes(self: XPathAxis, context: ContextType = None) \ -> Iterator[ChildNodeType]: if context is None: raise self.missing_context() else: for _ in context.iter_siblings(axis=self.symbol): yield from cast(Iterator[ChildNodeType], self[0].select(context)) @method(axis('ancestor', reverse_axis=True)) @method(axis('ancestor-or-self', reverse_axis=True)) def select_ancestor_axes(self: XPathAxis, context: ContextType = None) \ -> Iterator[ParentNodeType]: if context is None: raise self.missing_context() else: for _ in context.iter_ancestors(axis=self.symbol): yield from cast(Iterator[ParentNodeType], self[0].select(context)) @method(axis('descendant')) 
@method(axis('descendant-or-self')) def select_descendant_axes(self: XPathAxis, context: ContextType = None) \ -> Iterator[XPathNode]: if context is None: raise self.missing_context() else: for _ in context.iter_descendants(axis=self.symbol): yield from cast(Iterator[XPathNode], self[0].select(context)) @method(axis('following')) def select_following_axis(self: XPathAxis, context: ContextType = None) \ -> Iterator[ChildNodeType]: if context is None: raise self.missing_context() else: for _ in context.iter_followings(): yield from cast(Iterator[ChildNodeType], self[0].select(context)) @method(axis('preceding', reverse_axis=True)) def select_preceding_axis(self: XPathAxis, context: ContextType = None) \ -> Iterator[ChildNodeType]: if context is None: raise self.missing_context() else: for _ in context.iter_preceding(): yield from cast(Iterator[ChildNodeType], self[0].select(context)) sissaschool-elementpath-d3688c7/elementpath/xpath1/_xpath1_functions.py000066400000000000000000000464161476131650400264630ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 1.0 implementation - part 3 (functions) """ import math import decimal from typing import Any, List, Union from elementpath._typing import Iterator from elementpath.aliases import Emptiable from elementpath.helpers import get_double from elementpath.datatypes import Duration, DayTimeDuration, YearMonthDuration, \ StringProxy, AnyURI, Float10, AnyAtomicType, AtomicType, NumericType from elementpath.namespaces import XML_ID, XML_LANG from elementpath.xpath_nodes import XPathNode, ElementNode, TextNode, CommentNode, \ ProcessingInstructionNode, DocumentNode, EtreeElementNode from elementpath.xpath_context import ContextType, XPathSchemaContext from elementpath.xpath_tokens import XPathFunction from ._xpath1_operators import XPath1Parser __all__ = ['XPath1Parser'] method = XPath1Parser.method function = XPath1Parser.function ### # Kind tests (for matching of node types in XPath 1.0 or sequence types in XPath 2.0) @method(function('node', nargs=0, label='kind test')) def select_node_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[XPathNode]: if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if isinstance(item, XPathNode): if not isinstance(item, DocumentNode) or item is context.root: yield item @method('node') def nud_item_sequence_type(self: XPathFunction) -> XPathFunction: XPathFunction.nud(self) if self.parser.next_token.symbol in ('*', '+', '?'): self.occurrence = self.parser.next_token.symbol self.parser.advance() return self @method(function('processing-instruction', nargs=(0, 1), bp=79, label='kind test')) def select_pi_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[ProcessingInstructionNode]: if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if isinstance(item, ProcessingInstructionNode): if not self: yield item else: name = self[0].value assert isinstance(name, str) if item.name == ' 
'.join(name.strip().split()): yield item @method('processing-instruction') def nud_pi_kind_test(self: XPathFunction) -> XPathFunction: self.parser.advance('(') if self.parser.next_token.symbol != ')': self.parser.next_token.expected('(name)', '(string)') self[0:] = self.parser.expression(5), self.parser.advance(')') return self @method(function('comment', nargs=0, label='kind test')) def select_comment_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[CommentNode]: if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if isinstance(item, CommentNode): yield item @method(function('text', nargs=0, label='kind test')) def select_text_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[TextNode]: if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if isinstance(item, TextNode): yield item ### # Node set functions @method(function('last', nargs=0, sequence_types=('xs:integer',))) def evaluate_last_function(self: XPathFunction, context: ContextType = None) -> int: if self.context is not None: context = self.context elif context is None: raise self.missing_context() return context.size @method(function('position', nargs=0, sequence_types=('xs:integer',))) def evaluate_position_function(self: XPathFunction, context: ContextType = None) -> int: if self.context is not None: context = self.context elif context is None: raise self.missing_context() return context.position @method(function('count', nargs=1, sequence_types=('item()*', 'xs:integer'))) def evaluate_count_function(self: XPathFunction, context: ContextType = None) -> int: return len([x for x in self[0].select(self.context or context)]) @method(function('id', nargs=1, sequence_types=('xs:string*', 'element()*'))) def select_id_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ElementNode]: if self.context is not None: context = self.context elif context is None: raise 
self.missing_context() value = self[0].evaluate(context) item = context.item if item is None: item = context.root if isinstance(item, (ElementNode, DocumentNode)): for element in item.iter_descendants(): if isinstance(element, EtreeElementNode) and element.obj.get(XML_ID) == value: yield element @method(function('name', nargs=(0, 1), sequence_types=('node()?', 'xs:string'))) @method(function('local-name', nargs=(0, 1), sequence_types=('node()?', 'xs:string'))) @method(function('namespace-uri', nargs=(0, 1), sequence_types=('node()?', 'xs:anyURI'))) def evaluate_name_related_functions(self: XPathFunction, context: ContextType = None) \ -> Union[str, AnyURI]: if self.context is not None: context = self.context elif context is None: raise self.missing_context() arg = self.get_argument(context, default_to_context=True) if arg is None: return '' elif not isinstance(arg, XPathNode): raise self.error('XPTY0004') name = arg.name if name is None: return '' symbol = self.symbol if symbol == 'name': node_name = arg.node_name if node_name is None: return '' return node_name.qname elif symbol == 'local-name': return name if not name or name[0] != '{' else name.split('}')[1] elif self.parser.version == '1.0': return '' if not name or name[0] != '{' else name.split('}')[0][1:] else: return AnyURI('') if not name or name[0] != '{' else AnyURI(name.split('}')[0][1:]) ### # String functions @method(function('string', nargs=(0, 1), sequence_types=('item()?', 'xs:string'))) def evaluate_string_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context if not self: if context is None: raise self.missing_context() return self.string_value(context.item) return self.string_value(self.get_argument(context)) @method(function('contains', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:boolean'))) def evaluate_contains_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = 
self.context arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) return arg2 in arg1 @method(function('concat', nargs=(2, None), sequence_types=('xs:anyAtomicType?', 'xs:anyAtomicType?', 'xs:string'))) def evaluate_concat_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context return ''.join( self.string_value(self.get_argument(context, index=k)) for k in range(len(self)) ) @method(function('string-length', nargs=(0, 1), sequence_types=('xs:string?', 'xs:integer'))) def evaluate_string_length_function(self: XPathFunction, context: ContextType = None) -> int: if self.context is not None: context = self.context if self: return len(self.get_argument(context, default_to_context=True, default='', cls=str)) elif context is None: raise self.missing_context() else: return len(self.string_value(context.item)) @method(function('normalize-space', nargs=(0, 1), sequence_types=('xs:string?', 'xs:string'))) def evaluate_normalize_space_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context if self.parser.version == '1.0' or not self: arg = self.string_value(self.get_argument(context, default_to_context=True, default='')) else: arg = self.get_argument(context, default_to_context=True, default='', cls=str) return ' '.join(arg.strip().split()) @method(function('starts-with', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:boolean'))) def evaluate_starts_with_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context arg1: str = self.get_argument(context, default='', cls=str) arg2: str = self.get_argument(context, index=1, default='', cls=str) return arg1.startswith(arg2) @method(function('translate', nargs=3, sequence_types=('xs:string?', 'xs:string', 'xs:string', 'xs:string'))) def evaluate_translate_function(self: 
XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context arg: str = self.get_argument(context, default='', cls=str) map_string: str = self.get_argument(context, index=1, cls=str) if map_string is None: message = "the 2nd argument of fn:translate() cannot be the empty sequence" raise self.error('XPTY0004', message) trans_string: str = self.get_argument(context, index=2, cls=str) if trans_string is None: message = "the 3rd argument of fn:translate() cannot be the empty sequence" raise self.error('XPTY0004', message) if len(map_string) == len(trans_string): return arg.translate(str.maketrans(map_string, trans_string)) elif len(map_string) > len(trans_string): k = len(trans_string) return arg.translate(str.maketrans(map_string[:k], trans_string, map_string[k:])) else: return arg.translate(str.maketrans(map_string, trans_string[:len(map_string)])) @method(function('substring', nargs=(2, 3), sequence_types=('xs:string?', 'xs:double', 'xs:double', 'xs:string'))) def evaluate_substring_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context item: str = self.get_argument(context, default='', cls=str) try: start = self.get_argument(context, index=1, required=True) if math.isnan(start) or math.isinf(start): return '' except TypeError: if isinstance(context, XPathSchemaContext): start = 0 else: raise self.error('FORG0006', "the second argument must be xs:numeric") from None else: start = int(round(start)) - 1 if len(self) == 2: return item[max(start, 0):] else: try: length = self.get_argument(context, index=2, required=True) if math.isnan(length) or length <= 0: return '' except TypeError: if isinstance(context, XPathSchemaContext): length = len(item) else: raise self.error('FORG0006', "the third argument must be xs:numeric") from None if math.isinf(length): return item[max(start, 0):] else: stop = start + int(round(length)) return item[slice(max(start, 0), max(stop, 
0))] @method(function('substring-before', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:string'))) @method(function('substring-after', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:string'))) def evaluate_substring_before_or_after_functions( self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context arg1: str = self.get_argument(context, default='', cls=str) arg2: str = self.get_argument(context, index=1, default='', cls=str) index = arg1.find(arg2) if index < 0: return '' if self.symbol == 'substring-before': return arg1[:index] else: return arg1[index + len(arg2):] ### # Boolean functions @method(function('boolean', nargs=1, sequence_types=('item()*', 'xs:boolean'))) def evaluate_boolean_function(self: XPathFunction, context: ContextType = None) -> bool: return self.boolean_value(self[0].select(self.context or context)) @method(function('not', nargs=1, sequence_types=('item()*', 'xs:boolean'))) def evaluate_not_function(self: XPathFunction, context: ContextType = None) -> bool: return not self.boolean_value(self[0].select(self.context or context)) @method(function('true', nargs=0, sequence_types=('xs:boolean',))) def evaluate_true_function(self: XPathFunction, context: ContextType = None) -> bool: return True @method(function('false', nargs=0, sequence_types=('xs:boolean',))) def evaluate_false_function(self: XPathFunction, context: ContextType = None) -> bool: return False @method(function('lang', nargs=1, sequence_types=('xs:string?', 'xs:boolean'))) def evaluate_lang_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context elif context is None: raise self.missing_context() if not isinstance(context.item, EtreeElementNode): return False else: try: attr = context.item.obj.attrib[XML_LANG] except KeyError: for e in context.iter_ancestors(): if isinstance(e, EtreeElementNode) and XML_LANG in e.obj.attrib: lang = 
e.obj.attrib[XML_LANG] if not isinstance(lang, str): return False break else: return False else: if not isinstance(attr, str): return False lang = attr.strip() if '-' in lang: lang, _ = lang.split('-') value = self[0].evaluate() if not isinstance(value, str): return False return lang.lower() == value.lower() ### # Number functions @method(function('number', nargs=(0, 1), sequence_types=('xs:anyAtomicType?', 'xs:double'))) def evaluate_number_function(self: XPathFunction, context: ContextType = None) -> float: arg = self.get_argument(self.context or context, default_to_context=True) return self.number_value(arg) @method(function('sum', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:anyAtomicType?', 'xs:anyAtomicType?'))) def evaluate_sum_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AtomicType]: if self.context is not None: context = self.context xsd_version = self.parser.xsd_version values: List[Any] try: values = [get_double(self.string_value(x), xsd_version) if isinstance(x, XPathNode) else x for x in self[0].select_flatten(context)] except (TypeError, ValueError): if self.parser.version == '1.0': return math.nan elif isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0006') from None if not values: zero = 0 if len(self) == 1 else self.get_argument(context, index=1) return [] if zero is None else zero if all(isinstance(x, (decimal.Decimal, int)) for x in values): result = sum(values) if len(values) > 1 else values[0] elif all(isinstance(x, DayTimeDuration) for x in values) or \ all(isinstance(x, YearMonthDuration) for x in values): result = sum(values[1:], start=values[0]) elif any(isinstance(x, Duration) for x in values): raise self.error('FORG0006', 'invalid sum of duration values') elif any(isinstance(x, (StringProxy, AnyURI)) for x in values): raise self.error('FORG0006', 'cannot apply fn:sum() to string-based types') elif any(isinstance(x, float) and math.isnan(x) for x in values): return math.nan 
elif all(isinstance(x, Float10) for x in values): result = sum(values) else: try: result = sum(self.number_value(x) for x in values) except TypeError: if self.parser.version == '1.0': return math.nan elif isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0006') from None assert isinstance(result, AnyAtomicType) return result @method(function('ceiling', nargs=1, sequence_types=('xs:numeric?', 'xs:numeric?'))) @method(function('floor', nargs=1, sequence_types=('xs:numeric?', 'xs:numeric?'))) def evaluate_ceiling_and_floor_functions(self: XPathFunction, context: ContextType = None) \ -> Emptiable[NumericType]: if self.context is not None: context = self.context arg = self.get_argument(context) if arg is None: return math.nan if self.parser.version == '1.0' else [] elif isinstance(arg, XPathNode) or self.parser.compatibility_mode: arg = self.number_value(arg) try: if math.isnan(arg) or math.isinf(arg): assert isinstance(arg, (int, float, decimal.Decimal)) return arg assert isinstance(arg, (int, float, decimal.Decimal)) if self.symbol == 'floor': return type(arg)(math.floor(arg)) else: return type(arg)(math.ceil(arg)) except TypeError as err: if isinstance(context, XPathSchemaContext): return [] elif isinstance(arg, str): raise self.error('XPTY0004', err) from None raise self.error('FORG0006', err) from None @method(function('round', nargs=1, sequence_types=('xs:numeric?', 'xs:numeric?'))) def evaluate_round_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[NumericType]: if self.context is not None: context = self.context arg = self.get_argument(context) if arg is None: return math.nan if self.parser.version == '1.0' else [] elif isinstance(arg, XPathNode) or self.parser.compatibility_mode: arg = self.number_value(arg) if isinstance(arg, float) and (math.isnan(arg) or math.isinf(arg)): return arg try: number = decimal.Decimal(arg) assert isinstance(arg, (int, float, decimal.Decimal)) if number > 0: return 
type(arg)(number.quantize(decimal.Decimal('1'), rounding='ROUND_HALF_UP')) else: return type(arg)(number.quantize(decimal.Decimal('1'), rounding='ROUND_HALF_DOWN')) except TypeError as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0006', err) from None except decimal.InvalidOperation: if not isinstance(arg, str): assert isinstance(arg, (int, float, decimal.Decimal)) return round(arg) elif isinstance(context, XPathSchemaContext): return [] raise self.error('XPTY0004') from None except decimal.DecimalException as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FOCA0002', err) from None # XPath 1.0 definitions continue into module xpath1_axes
sissaschool-elementpath-d3688c7/elementpath/xpath1/_xpath1_operators.py
# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 1.0 implementation - part 2 (operators and expressions) """ import math import decimal import operator from copy import copy from typing import Any, cast, List, NoReturn, Optional, Set, Type, Union from elementpath._typing import Iterator, Sequence from elementpath.exceptions import ElementPathKeyError, ElementPathTypeError from elementpath.helpers import collapse_white_spaces, node_position from elementpath.datatypes import AbstractDateTime, AnyURI, Duration, DayTimeDuration, \ YearMonthDuration, NumericProxy, ArithmeticProxy, NumericType, ArithmeticType from elementpath.xpath_context import ContextType, ItemType, XPathSchemaContext from elementpath.namespaces import XMLNS_NAMESPACE, XSD_NAMESPACE from elementpath.xpath_nodes import ParentNodeType, XPathNode, \ ElementNode, AttributeNode, DocumentNode from elementpath.xpath_tokens import XPathParserType, XPathToken, XPathTokenType from .xpath1_parser import XPath1Parser __all__ = ['XPath1Parser'] OPERATORS_MAP = { '=': operator.eq, '!=': operator.ne, '>': operator.gt, '>=': operator.ge, '<': operator.lt, '<=': operator.le, } register = XPath1Parser.register nullary = XPath1Parser.nullary infix = XPath1Parser.infix method = XPath1Parser.method @method(register('(name)', bp=10, label='literal')) def nud_name_literal(self: XPathToken) -> XPathToken: if self.parser.next_token.symbol == '::': msg = "axis '%s::' not found" % self.value if self.parser.compatibility_mode: raise self.error('XPST0010', msg) raise self.error('XPST0003', msg) elif self.parser.next_token.symbol == '(': if self.parser.version >= '2.0': pass # XP30+ has led() for '(' operator that can check this elif self.namespace == XSD_NAMESPACE: raise self.error('XPST0017', 'unknown constructor function {!r}'.format(self.value)) elif self.namespace or self.value not in self.parser.RESERVED_FUNCTION_NAMES: raise self.error('XPST0017', 'unknown function {!r}'.format(self.value)) else: msg = f"{self.value!r} is not allowed 
as function name" raise self.error('XPST0003', msg) return self @method('(name)') def evaluate_name_literal(self: XPathToken, context: ContextType = None) \ -> List[ItemType]: return [x for x in self.select(context)] @method('(name)') def select_name_literal(self: XPathToken, context: ContextType = None) \ -> Iterator[ItemType]: if context is None: raise self.missing_context() if isinstance(self.value, str): yield from context.iter_matching_nodes(self.value, self.parser.default_namespace) ### # Prefixed reference (name or function) class _PrefixedReferenceToken(XPathToken): symbol = lookup_name = ':' lbp = 95 rbp = 95 value: str def __init__(self, parser: XPathParserType, value: Optional[Any] = None) -> None: super().__init__(parser, value) # Change bind powers if it cannot be a namespace related token if self.is_spaced(): self.lbp = self.rbp = 0 elif self.parser.token.symbol not in ('*', '(name)', 'array'): self.lbp = self.rbp = 0 def __str__(self) -> str: if len(self) < 2: return 'unparsed prefixed reference' elif self[1].label.endswith('function'): return f"{self.value!r} {self[1].label}" elif '*' in self.value: return f"{self.value!r} prefixed wildcard" else: return f"{self.value!r} prefixed name" @property def source(self: XPathToken) -> str: if self.occurrence: return ':'.join(tk.source for tk in self) + self.occurrence else: return ':'.join(tk.source for tk in self) @property def name(self) -> str: prefix = self[0].value assert isinstance(prefix, str) if prefix == '*': return '*:%s' % self[1].value else: return f'{{{self.get_namespace(prefix)}}}{self[1].value}' def led(self: XPathToken, left: XPathToken) -> XPathToken: version = self.parser.version if self.is_spaced(): if version <= '3.0': raise self.wrong_syntax("a QName cannot contain spaces before or after ':'") return left if version == '1.0': left.expected('(name)') elif version <= '3.0': left.expected('(name)', '*') elif left.symbol not in ('(name)', '*'): return left if not 
self.parser.next_token.label.endswith('function'): self.parser.expected_next('(name)', '*') if left.symbol == '(name)': try: namespace = self.get_namespace(cast(str, left.value)) except ElementPathKeyError: self.parser.advance() # Assure there isn't a following incomplete comment self[:] = left, self.parser.token msg = "prefix {!r} is not declared".format(left.value) raise self.error('XPST0081', msg) from None else: self.parser.next_token.bind_namespace(namespace) elif self.parser.next_token.symbol != '(name)': raise self.wrong_syntax() self[:] = left, self.parser.expression(95) if self[1].label.endswith('function'): self.value = f'{self[0].value}:{self[1].symbol}' else: self.value = f'{self[0].value}:{self[1].value}' return self def evaluate(self: XPathToken, context: ContextType = None) \ -> Union[ItemType, List[ItemType]]: if self[1].label.endswith('function'): return self[1].evaluate(context) return [x for x in self.select(context)] def select(self, context: ContextType = None) -> Iterator[ItemType]: if self[1].label.endswith('function'): value = self[1].evaluate(context) if isinstance(value, list): yield from value elif value is not None: yield value return if context is None: raise self.missing_context() yield from context.iter_matching_nodes(self.name) XPath1Parser.symbol_table[':'] = _PrefixedReferenceToken ### # Namespace URI as in ElementPath @method('{', bp=95) def nud_namespace_uri(self: XPathToken) -> XPathToken: if self.parser.strict and self.symbol == '{': raise self.wrong_syntax("not allowed symbol if parser has strict=True") self.parser.next_token.unexpected('{') if self.parser.next_token.symbol == '}': namespace = '' else: value = self.parser.next_token.value assert isinstance(value, str) namespace = value + self.parser.advance_until('}') namespace = collapse_white_spaces(namespace) try: AnyURI(namespace) except ValueError as err: msg = f"invalid URI in an EQName: {str(err)}" raise self.error('XQST0046', msg) from None if namespace == 
XMLNS_NAMESPACE: msg = f"cannot use the URI {XMLNS_NAMESPACE!r} in an EQName" raise self.error('XQST0070', msg) self.parser.advance() if not self.parser.next_token.label.endswith('function'): self.parser.expected_next('(name)', '*') self.parser.next_token.bind_namespace(namespace) cls: Type[XPathToken] = self.parser.symbol_table['(string)'] self[:] = cls(self.parser, namespace), self.parser.expression(90) if not self[0].value: self.value = self[1].value else: self.value = f'{{{self[0].value}}}{self[1].value}' return self @method('{') def evaluate_namespace_uri(self: XPathToken, context: ContextType = None) \ -> Union[ItemType, List[ItemType]]: if self[1].label.endswith('function'): return self[1].evaluate(context) return [x for x in self.select(context)] @method('{') def select_namespace_uri(self: XPathToken, context: ContextType = None) \ -> Iterator[Union[ItemType, List[ItemType]]]: if self[1].label.endswith('function'): yield self[1].evaluate(context) return elif context is None: raise self.missing_context() if isinstance(self.value, str): yield from context.iter_matching_nodes(self.value) ### # Variables @method('$', bp=90) def nud_variable_reference(self: XPathToken) -> XPathToken: self.parser.expected_next('(name)') self[:] = self.parser.expression(rbp=90), if not isinstance(self[0].value, str) or ':' in self[0].value: raise self[0].wrong_syntax("variable reference requires a simple reference name") return self @method('$') def evaluate_variable_reference(self: XPathToken, context: ContextType = None) \ -> Union[ItemType, List[ItemType]]: if context is None: raise self.missing_context() try: value = context.variables[cast(str, self[0].value)] except KeyError as err: raise self.error('XPST0008', 'unknown variable %r' % str(err)) from None else: return value if value is not None else [] ### # Nullary operators (use only the context) @method(nullary('*')) def select_wildcard(self: XPathToken, context: ContextType = None) -> Iterator[ItemType]: if self: # 
Product operator item = self.evaluate(context) if not isinstance(item, list): if context is not None: context.item = item yield item elif context is not None: for context.item in item: yield context.item else: yield from item return elif context is None: raise self.missing_context() # Wildcard literal if self.parser.schema is None: for item in context.iter_children_or_self(): if item is None: pass # '*' wildcard doesn't match document nodes elif context.axis == 'attribute': if isinstance(item, AttributeNode): yield item elif isinstance(item, ElementNode): yield item else: # XSD typed selection for item in context.iter_children_or_self(): if context.is_principal_node_kind(): if isinstance(item, (ElementNode, AttributeNode)): yield item @method(nullary('.')) def select_self_shortcut(self: XPathToken, context: ContextType = None) -> Iterator[ItemType]: if context is None: raise self.missing_context() yield from context.iter_self() @method(nullary('..')) def select_parent_shortcut(self: XPathToken, context: ContextType = None) \ -> Iterator[ParentNodeType]: if context is None: raise self.missing_context() yield from context.iter_parent() ### # Logical Operators @method(infix('or', bp=20)) def evaluate_or_operator(self: XPathToken, context: ContextType = None) -> bool: if isinstance(context, XPathSchemaContext): op1 = self.boolean_value(self[0].select(copy(context))) op2 = self.boolean_value(self[1].select(copy(context))) return op1 or op2 return self.boolean_value(self[0].select(copy(context))) or \ self.boolean_value(self[1].select(copy(context))) @method(infix('and', bp=25)) def evaluate_and_operator(self: XPathToken, context: ContextType = None) -> bool: if isinstance(context, XPathSchemaContext): op1 = self.boolean_value(self[0].select(copy(context))) op2 = self.boolean_value(self[1].select(copy(context))) return op1 and op2 return self.boolean_value(self[0].select(copy(context))) and \ self.boolean_value(self[1].select(copy(context))) ### # Comparison operators 
@method('=', bp=30) @method('!=', bp=30) @method('<', bp=30) @method('>', bp=30) @method('<=', bp=30) @method('>=', bp=30) def led_comparison_operators(self: XPathToken, left: XPathToken) -> XPathToken: if left.symbol in OPERATORS_MAP: raise self.wrong_syntax() self[:] = left, self.parser.expression(rbp=30) return self @method('=') @method('!=') @method('<') @method('>') @method('<=') @method('>=') def evaluate_comparison_operators(self: XPathToken, context: ContextType = None) -> bool: op = OPERATORS_MAP[self.symbol] try: return any(op(x1, x2) for x1, x2 in self.iter_comparison_data(context)) except (TypeError, ValueError) as err: if isinstance(context, XPathSchemaContext): return False elif isinstance(err, ElementPathTypeError): raise elif isinstance(err, TypeError): raise self.error('XPTY0004', err) from None else: raise self.error('FORG0001', err) from None ### # Numerical operators @method(infix('+', bp=40)) def evaluate_plus_operator(self: XPathToken, context: ContextType = None) \ -> Union[List[NoReturn], ArithmeticType]: if len(self) == 1: arg: NumericType = self.get_argument(context, cls=NumericProxy) return [] if arg is None else +arg else: op1: Optional[ArithmeticType] op2: ArithmeticType op1, op2 = self.get_operands(context, cls=ArithmeticProxy) if op1 is None: return [] try: return op1 + op2 # type:ignore[operator] except (TypeError, OverflowError) as err: if isinstance(context, XPathSchemaContext): return [] elif isinstance(err, TypeError): raise self.error('XPTY0004', err) from None elif isinstance(op1, AbstractDateTime): raise self.error('FODT0001', err) from None elif isinstance(op1, Duration): raise self.error('FODT0002', err) from None else: raise self.error('FOAR0002', err) from None @method(infix('-', bp=40)) def evaluate_minus_operator(self: XPathToken, context: ContextType = None) \ -> Union[List[NoReturn], ArithmeticType]: if len(self) == 1: arg: NumericType = self.get_argument(context, cls=NumericProxy) return [] if arg is None else -arg 
else: op1: Optional[ArithmeticType] op2: ArithmeticType op1, op2 = self.get_operands(context, cls=ArithmeticProxy) if op1 is None: return [] try: return op1 - op2 # type:ignore[operator] except (TypeError, OverflowError) as err: if isinstance(context, XPathSchemaContext): return [] elif isinstance(err, TypeError): raise self.error('XPTY0004', err) from None elif isinstance(op1, AbstractDateTime): raise self.error('FODT0001', err) from None elif isinstance(op1, Duration): raise self.error('FODT0002', err) from None else: raise self.error('FOAR0002', err) from None @method('+') @method('-') def nud_plus_minus_operators(self: XPathToken) -> XPathToken: self[:] = self.parser.expression(rbp=70), return self @method(infix('*', bp=45)) def evaluate_multiply_operator(self: XPathToken, context: ContextType = None) \ -> Union[ArithmeticType, List[ItemType]]: op1: Optional[ArithmeticType] op2: ArithmeticType if self: op1, op2 = self.get_operands(context, cls=ArithmeticProxy) if op1 is None: return [] try: if isinstance(op2, (YearMonthDuration, DayTimeDuration)): return op2 * op1 return op1 * op2 # type:ignore[operator] except TypeError as err: if isinstance(context, XPathSchemaContext): return [] if isinstance(op1, (float, decimal.Decimal)): if math.isnan(op1): raise self.error('FOCA0005') from None elif math.isinf(op1): raise self.error('FODT0002') from None if isinstance(op2, (float, decimal.Decimal)): if math.isnan(op2): raise self.error('FOCA0005') from None elif math.isinf(op2): raise self.error('FODT0002') from None raise self.error('XPTY0004', err) from None except ValueError as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FOCA0005', err) from None except OverflowError as err: if isinstance(context, XPathSchemaContext): return [] elif isinstance(op1, AbstractDateTime): raise self.error('FODT0001', err) from None elif isinstance(op1, Duration): raise self.error('FODT0002', err) from None else: raise self.error('FOAR0002', err) from None 
else: # This is not a multiplication operator but a wildcard select statement return [x for x in self.select(context)] @method(infix('div', bp=45)) def evaluate_div_operator(self: XPathToken, context: ContextType = None) \ -> Union[int, float, decimal.Decimal, List[Any]]: dividend: Optional[ArithmeticType] divisor: ArithmeticType dividend, divisor = self.get_operands(context, cls=ArithmeticProxy) if dividend is None: return [] elif divisor != 0: try: if isinstance(dividend, int) and isinstance(divisor, int): return decimal.Decimal(dividend) / decimal.Decimal(divisor) return dividend / divisor # type:ignore[operator] except TypeError as err: raise self.error('XPTY0004', err) from None except ValueError as err: raise self.error('FOCA0005', err) from None except OverflowError as err: raise self.error('FOAR0002', err) from None except (ZeroDivisionError, decimal.DivisionByZero): raise self.error('FOAR0001') from None elif isinstance(dividend, AbstractDateTime): raise self.error('FODT0001') elif isinstance(dividend, Duration): raise self.error('FODT0002') elif not self.parser.compatibility_mode and \ isinstance(dividend, (int, decimal.Decimal)) and \ isinstance(divisor, (int, decimal.Decimal)): raise self.error('FOAR0001') elif dividend == 0: return math.nan elif dividend > 0: return float('-inf') if str(divisor).startswith('-') else float('inf') else: return float('inf') if str(divisor).startswith('-') else float('-inf') @method(infix('mod', bp=45)) def evaluate_mod_operator(self: XPathToken, context: ContextType = None) \ -> Union[List[NoReturn], ArithmeticType]: op1: Optional[NumericType] op2: Optional[NumericType] op1, op2 = self.get_operands(context, cls=NumericProxy) if op1 is None: return [] elif op2 is None: raise self.error('XPTY0004', '2nd operand is an empty sequence') elif op2 == 0 and isinstance(op2, float): return math.nan elif math.isinf(op2) and not math.isinf(op1) and op1 != 0: return op1 if self.parser.version != '1.0' else math.nan try: if 
isinstance(op1, int) and isinstance(op2, int): return op1 % op2 if op1 * op2 >= 0 else -(abs(op1) % op2) return op1 % op2 # type: ignore[operator] except TypeError as err: raise self.error('FORG0006', err) from None except (ZeroDivisionError, decimal.InvalidOperation): raise self.error('FOAR0001') from None # Resolve the intrinsic ambiguity of some infix operators @method('or') @method('and') @method('div') @method('mod') def nud_disambiguation_of_infix_operators(self: XPathToken) -> XPathTokenType: return self.as_name() ### # Union expressions @method('|', bp=50) def led_union_operator(self: XPathToken, left: XPathToken) -> XPathToken: if left.symbol in ('|', 'union'): left.concatenated = True self[:] = left, self.parser.expression(rbp=50) return self @method('|') def select_union_operator(self: XPathToken, context: ContextType = None) \ -> Iterator[XPathNode]: if context is None: raise self.missing_context() results = {item for k in range(2) for item in self[k].select(copy(context))} if any(not isinstance(x, XPathNode) for x in results): raise self.error('XPTY0004', 'only XPath nodes are allowed') elif self.concatenated: yield from cast(Set[XPathNode], results) else: yield from cast(List[XPathNode], sorted(results, key=node_position)) ### # Path expressions @method('//', bp=75) def nud_descendant_path(self: XPathToken) -> XPathToken: if self.parser.next_token.label not in self.parser.PATH_STEP_LABELS: self.parser.expected_next(*self.parser.PATH_STEP_SYMBOLS) self[:] = self.parser.expression(75), return self @method('/', bp=75) def nud_child_path(self: XPathToken) -> XPathToken: if self.parser.next_token.label not in self.parser.PATH_STEP_LABELS: try: self.parser.expected_next(*self.parser.PATH_STEP_SYMBOLS) except SyntaxError: return self self[:] = self.parser.expression(75), return self @method('//') @method('/') def led_child_or_descendant_path(self: XPathToken, left: XPathToken) -> XPathToken: if left.symbol in ('/', '//', ':', '[', '$'): pass elif left.label 
not in self.parser.PATH_STEP_LABELS and \
            left.symbol not in self.parser.PATH_STEP_SYMBOLS:
        raise self.wrong_syntax()

    if self.parser.next_token.label not in self.parser.PATH_STEP_LABELS:
        self.parser.expected_next(*self.parser.PATH_STEP_SYMBOLS)

    self[:] = left, self.parser.expression(75)
    return self


@method('/')
def select_child_path(self: XPathToken, context: ContextType = None) \
        -> Iterator[ItemType]:
    """
    Child path expression. Selects child:: axis as default (when bound to '*' or '(name)').
    """
    if context is None:
        raise self.missing_context()
    elif not self:
        if isinstance(context.root, DocumentNode):
            yield context.root
    elif len(self) == 1:
        if isinstance(context.document, DocumentNode):
            context.item = context.document
        elif context.root is None or isinstance(context.root.parent, ElementNode):
            return  # No root or a rooted subtree -> document root produces []
        else:
            context.item = context.root  # A fragment or a schema node
        yield from self[0].select(context)
    else:
        items: Set[ItemType] = set()
        for _ in context.inner_focus_select(self[0]):
            if not isinstance(context.item, XPathNode):
                msg = f"Intermediate step contains an atomic value {context.item!r}"
                raise self.error('XPTY0019', msg)

            for result in self[1].select(context):
                if not isinstance(result, XPathNode):
                    yield result
                elif result in items:
                    pass
                elif isinstance(result, ElementNode):
                    if result.obj not in items:
                        items.add(result)
                        yield result
                else:
                    items.add(result)
                    yield result


@method('//')
def select_descendant_path(self: XPathToken, context: ContextType = None) \
        -> Iterator[ItemType]:
    """Operator '//' is a short equivalent to /descendant-or-self::node()/"""
    if context is None:
        raise self.missing_context()
    elif len(self) == 2:
        items: Set[ItemType] = set()
        for _ in context.inner_focus_select(self[0]):
            if not isinstance(context.item, XPathNode):
                raise self.error('XPTY0019')

            for _ in context.iter_descendants():
                for result in self[1].select(context):
                    if not isinstance(result, XPathNode):
                        yield result
                    elif result in items:
pass elif isinstance(result, ElementNode): if result.obj not in items: items.add(result) yield result else: items.add(result) yield result else: if isinstance(context.document, DocumentNode): context.item = context.document elif context.root is None or isinstance(context.root.parent, ElementNode): return # No root or a rooted subtree -> document root produce [] else: context.item = context.root # A fragment or a schema node items = set() for _ in context.iter_descendants(): for result in self[0].select(context): if not isinstance(result, XPathNode): items.add(result) elif result in items: pass elif isinstance(result, ElementNode): if result.obj not in items: items.add(result) else: items.add(result) yield from sorted(items, key=node_position) ### # Predicate filters @method('[', bp=80) def led_predicate(self: XPathToken, left: XPathToken) -> XPathToken: self[:] = left, self.parser.expression() self.parser.advance(']') return self @method('[') def select_predicate(self: XPathToken, context: ContextType = None) -> Iterator[ItemType]: if context is None: raise self.missing_context() for _ in context.inner_focus_select(self[0], True): if (self[1].label in ('axis', 'kind test') or self[1].symbol == '..') \ and not isinstance(context.item, XPathNode): raise self.error('XPTY0020') predicate: Sequence[NumericType] predicate = [x for x in cast(Iterator[NumericType], self[1].select(copy(context)))] if len(predicate) == 1 and isinstance(predicate[0], NumericProxy): if context.position == predicate[0]: yield context.item elif self.boolean_value(predicate): yield context.item ### # Parenthesized expressions @method('(', bp=100) def nud_parenthesized_expr(self: XPathToken) -> XPathToken: self[:] = self.parser.expression(), self.parser.advance(')') return self @method('(') def evaluate_parenthesized_expr(self: XPathToken, context: ContextType = None) -> Any: return self[0].evaluate(context) @method('(') def select_parenthesized_expr(self: XPathToken, context: ContextType = None) 
-> Iterator[Any]: return self[0].select(context) # XPath 1.0 definitions continue into module xpath1_functions sissaschool-elementpath-d3688c7/elementpath/xpath1/xpath1_parser.py000066400000000000000000000377231476131650400256110ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XPath 1.0 implementation - part 1 (parser class and symbols) """ import re from abc import ABCMeta from typing import cast, Any, ClassVar, Dict, List, Optional, Set, Tuple, Type, Union from elementpath._typing import Callable, MutableMapping, Sequence from elementpath.aliases import NamespacesType, NargsType from elementpath.exceptions import xpath_error, UnsupportedFeatureError, \ ElementPathValueError, ElementPathNameError, ElementPathKeyError, MissingContextError from elementpath.helpers import upper_camel_case from elementpath.collations import UNICODE_CODEPOINT_COLLATION from elementpath.datatypes import QName from elementpath.tdop import Parser from elementpath.namespaces import XML_NAMESPACE, XSD_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE from elementpath.sequence_types import match_sequence_type from elementpath.schema_proxy import AbstractSchemaProxy from elementpath.xpath_context import ContextType from elementpath.xpath_tokens import XPathTokenType, XPathToken, XPathAxis, \ XPathFunction, ProxyToken, RootToken class XPath1Parser(Parser[XPathTokenType]): """ XPath 1.0 expression parser class. Provide a *namespaces* dictionary argument for mapping namespace prefixes to URI inside expressions. If *strict* is set to `False` the parser enables also the parsing of QNames, like the ElementPath library. :param namespaces: a dictionary with mapping from namespace prefixes into URIs. 
    :param strict: if `False` the parser also enables parsing of QNames \
    in extended format, like Python's ElementPath library. Default is `True`.
    """
    version = '1.0'
    """The XPath version string."""

    token_base_class = XPathToken  # type: ignore[assignment, unused-ignore]
    literals_pattern = re.compile(
        r"""'(?:[^']|'')*'|"(?:[^"]|"")*"|(?:\d+|\.\d+)(?:\.\d*)?(?:[Ee][+-]?\d+)?"""
    )
    name_pattern = re.compile(r'[^\d\W][\w.\-\xb7\u0300-\u036F\u203F\u2040]*')

    RESERVED_FUNCTION_NAMES = {
        'comment', 'element', 'node', 'processing-instruction', 'text'
    }

    DEFAULT_NAMESPACES: ClassVar[Dict[str, str]] = {'xml': XML_NAMESPACE}
    """Namespaces known statically by default."""

    # Labels and symbols admitted after a path step
    PATH_STEP_LABELS: ClassVar[Tuple[str, ...]] = ('axis', 'kind test')
    PATH_STEP_SYMBOLS: ClassVar[Set[str]] = {
        '(integer)', '(string)', '(float)', '(decimal)', '(name)', '*', '@', '..', '.', '{'
    }

    # Class attributes for compatibility with XPath 2.0+
    schema: Optional[AbstractSchemaProxy] = None
    variable_types: Optional[Dict[str, str]] = None
    document_types: Optional[Dict[str, str]] = None
    collection_types: Optional[NamespacesType] = None
    default_collection_type: str = 'node()*'
    base_uri: Optional[str] = None
    function_namespace = XPATH_FUNCTIONS_NAMESPACE
    function_signatures: Dict[Tuple[QName, int], str] = {}
    decimal_formats: Dict[Optional[str], Any] = {}
    parse_arguments: bool = True
    defuse_xml: bool = True

    compatibility_mode: bool = True
    """XPath 1.0 compatibility mode."""

    default_namespace: Optional[str] = None
    """
    The default namespace. For XPath 1.0 this value is always `None` because the default
    namespace is ignored (see https://www.w3.org/TR/1999/REC-xpath-19991116/#node-tests).
    """

    default_collation = UNICODE_CODEPOINT_COLLATION

    @staticmethod
    def tracer(trace_data: str) -> None:
        """Trace data collector"""

    def __init__(self, namespaces: Optional[NamespacesType] = None,
                 strict: bool = True) -> None:
        super(XPath1Parser, self).__init__()
        self.namespaces: Dict[str, str] = self.DEFAULT_NAMESPACES.copy()
        if namespaces is not None:
            self.namespaces.update(namespaces)
        self.strict: bool = strict

    def __str__(self) -> str:
        args = []
        if self.namespaces != self.DEFAULT_NAMESPACES:
            args.append(str(self.other_namespaces))
        if not self.strict:
            args.append('strict=False')
        return f"{self.__class__.__name__}({', '.join(args)})"

    @property
    def other_namespaces(self) -> Dict[str, str]:
        """The subset of namespaces not known by default."""
        return {k: v for k, v in self.namespaces.items()
                if k not in self.DEFAULT_NAMESPACES or self.DEFAULT_NAMESPACES[k] != v}

    @property
    def xsd_version(self) -> str:
        return '1.0'  # Use XSD 1.0 datatypes by default

    def is_schema_bound(self) -> bool:
        return False

    def xsd_qname(self, local_name: str) -> str:
        """Returns a prefixed QName string for XSD namespace."""
        if self.namespaces.get('xs') == XSD_NAMESPACE:
            return 'xs:%s' % local_name

        for pfx, uri in self.namespaces.items():
            if uri == XSD_NAMESPACE:
                return '%s:%s' % (pfx, local_name) if pfx else local_name

        raise xpath_error('XPST0081', 'Missing XSD namespace registration')

    @classmethod
    def create_restricted_parser(cls, name: str, symbols: Sequence[str]) \
            -> Type['XPath1Parser']:
        """Get a parser subclass with a restricted set of symbols."""
        symbol_table = {
            k: v for k, v in cls.symbol_table.items() if k in symbols
        }
        return cast(Type['XPath1Parser'], ABCMeta(
            f"{name}{cls.__name__}", (cls,), {'symbol_table': symbol_table}
        ))

    @staticmethod
    def unescape(string_literal: str) -> str:
        if string_literal.startswith("'"):
            return string_literal[1:-1].replace("''", "'")
        else:
            return string_literal[1:-1].replace('""', '"')

    @classmethod
    def proxy(cls, symbol: str, label: str = 'proxy', bp: int =
90) -> Type[ProxyToken]: """Register a proxy token class for a symbol.""" if symbol in cls.symbol_table and not issubclass(cls.symbol_table[symbol], ProxyToken): # Move the token class before register the proxy token token_cls = cls.symbol_table.pop(symbol) cls.symbol_table[f'{{{token_cls.namespace}}}{symbol}'] = token_cls token_class_name = "_%s%sProxy" % ( upper_camel_case(symbol), str(label).title().replace(' ', '') ) token_class = cls.register( symbol, label='function', class_name=token_class_name, bases=(ProxyToken,), lbp=bp, rbp=bp ) assert issubclass(token_class, ProxyToken) return token_class @classmethod def axis(cls, symbol: str, reverse_axis: bool = False, bp: int = 80) -> Type[XPathAxis]: """Register a token class for a symbol that represents an XPath *axis*.""" token_class = cls.register(symbol, bases=(XPathAxis,), reverse_axis=reverse_axis, lbp=bp, rbp=bp) assert issubclass(token_class, XPathAxis) return token_class @classmethod def function(cls, symbol: str, prefix: Optional[str] = None, label: str = 'function', nargs: NargsType = None, sequence_types: Tuple[str, ...] = (), bp: int = 90) -> Type[XPathFunction]: """ Registers a token class for a symbol that represents an XPath function. 
""" kwargs = { 'bases': (XPathFunction,), 'label': label, 'nargs': nargs, 'lbp': bp, 'rbp': bp, } if 'function' not in label: # kind test or sequence type return cast(Type[XPathFunction], cls.register(symbol, **kwargs)) elif symbol in cls.RESERVED_FUNCTION_NAMES: raise ElementPathValueError(f'{symbol!r} is a reserved function name') if prefix: namespace = cls.DEFAULT_NAMESPACES[prefix] qname = QName(namespace, '%s:%s' % (prefix, symbol)) kwargs['lookup_name'] = qname.expanded_name kwargs['class_name'] = '_%s%s%s' % ( prefix.capitalize(), symbol.capitalize(), str(label).title().replace(' ', '') ) kwargs['namespace'] = namespace cls.proxy(symbol, label='function', bp=bp) else: qname = QName(XPATH_FUNCTIONS_NAMESPACE, 'fn:%s' % symbol) kwargs['namespace'] = XPATH_FUNCTIONS_NAMESPACE if sequence_types: # Register function signature(s) kwargs['sequence_types'] = sequence_types if nargs is None: pass # pragma: no cover elif isinstance(nargs, int): assert len(sequence_types) == nargs + 1 cls.function_signatures[(qname, nargs)] = 'function({}) as {}'.format( ', '.join(sequence_types[:-1]), sequence_types[-1] ) elif nargs[1] is None: assert len(sequence_types) == nargs[0] + 1 cls.function_signatures[(qname, nargs[0])] = 'function({}, ...) 
as {}'.format(
                    ', '.join(sequence_types[:-1]), sequence_types[-1]
                )
            else:
                assert len(sequence_types) == nargs[1] + 1
                for arity in range(nargs[0], nargs[1] + 1):
                    cls.function_signatures[(qname, arity)] = 'function({}) as {}'.format(
                        ', '.join(sequence_types[:arity]), sequence_types[-1]
                    )

        return cast(Type[XPathFunction], cls.register(symbol, **kwargs))

    def parse(self, source: str) -> XPathToken:
        if self.tokenizer is None:
            self.tokenizer = self.create_tokenizer(self.symbol_table)

        root_token = super().parse(source)
        if root_token.label in ('sequence type', 'function test'):
            raise root_token.error('XPST0003', "not allowed in XPath expression")

        try:
            root_token.evaluate()  # Static context evaluation
        except MissingContextError:
            pass

        if self.schema is not None:
            # Static evaluation using a schema context
            context = self.schema.get_context()
            for _ in root_token.select(context):
                pass
            return RootToken(root_token)
        elif self.__class__.__module__.startswith('xmlschema.'):
            # Workaround for xmlschema < 4.0: returns a root token for sharing schema
            return RootToken(root_token)

        return root_token

    def expected_next(self, *symbols: str, message: Optional[str] = None) -> None:
        """
        Checks the next token against a list of symbols. Replaces the next token with
        a '(name)' token if the check fails and the next token can be a name,
        otherwise raises a syntax error.

        :param symbols: a sequence of symbols.
        :param message: optional error message.
        """
        if self.next_token.symbol in symbols:
            return
        elif '(name)' in symbols and \
                not isinstance(self.next_token, (XPathFunction, XPathAxis)) and \
                self.name_pattern.match(self.next_token.symbol) is not None:
            # Disambiguation replacing the next token with a '(name)' token
            cls = cast(Type[XPathToken], self.symbol_table['(name)'])
            self.next_token = cls(self, self.next_token.symbol)
        else:
            raise self.next_token.wrong_syntax(message)

    def check_variables(self, values: MutableMapping[str, Any]) -> None:
        """Checks the sequence types of the XPath dynamic context's variables."""
        for varname, value in values.items():
            if not match_sequence_type(
                value, 'item()*' if isinstance(value, list) else 'item()', self
            ):
                message = "Unmatched sequence type for variable {!r}".format(varname)
                raise xpath_error('XPDY0050', message)

    def get_function(self, name: str, arity: Optional[int],
                     context: ContextType = None) -> XPathFunction:
        """
        Returns an XPathFunction object suitable for stand-alone usage.

        :param name: the name of the function.
        :param arity: the arity of the function object, must be compatible \
        with the signature of the XPath function.
        :param context: an optional context to bind to the function.
""" if ':' not in name: qname = QName(XPATH_FUNCTIONS_NAMESPACE, f'fn:{name}') elif name.startswith('fn:'): qname = QName(XPATH_FUNCTIONS_NAMESPACE, name) name = name[3:] else: prefix, name = name.split(':') try: namespace = self.namespaces[prefix] except KeyError: raise ElementPathKeyError(f"Unknown namespace {prefix!r}") from None else: qname = QName(namespace, f'{prefix}:{name}') if qname.expanded_name in self.symbol_table: token_class = self.symbol_table[qname.expanded_name] elif name in self.symbol_table: token_class = self.symbol_table[name] else: raise ElementPathNameError(f'unknown function {name!r}') if not issubclass(token_class, XPathFunction): raise ElementPathNameError(f'{name!r} is not an XPath function') if token_class.namespace != qname.namespace: raise ElementPathNameError(f'namespace mismatch: {token_class.namespace}') try: func = token_class(self, nargs=arity) except TypeError: msg = f"unknown function {qname.qname}#{arity}" raise xpath_error('XPST0017', msg) from None else: if context is not None: func.context = context return func ### # Unsupported methods in XPath 1.0 @classmethod def constructor(cls, symbol: str, bp: int = 90, nargs: NargsType = 1, sequence_types: Union[Tuple[()], Tuple[str, ...], List[str]] = (), label: Union[str, Tuple[str, ...]] = 'constructor function') \ -> Callable[[Callable[..., Any]], Callable[..., Any]]: """ Statically creates a constructor token class, that is registered in the globals of the module where the method is called. 
""" raise UnsupportedFeatureError("Static definition of schema constructors token " "classes requires an XPath 2.0+ parser") def schema_constructor(self, atomic_type_name: str, bp: int = 90) \ -> Type[XPathFunction]: """Dynamically registers a token class for a schema atomic type constructor function.""" raise UnsupportedFeatureError("Dynamic definition of schema constructors token " "classes requires an XPath 2.0+ parser") def external_function(self, callback: Callable[..., Any], name: Optional[str] = None, prefix: Optional[str] = None, sequence_types: Tuple[str, ...] = (), bp: int = 90) -> Type[XPathFunction]: """Registers a token class for an external function.""" raise UnsupportedFeatureError( "Registration of external functions requires an XPath 2.0+ parser" ) ### # Special symbols XPath1Parser.register('(start)') XPath1Parser.register('(end)') XPath1Parser.literal('(string)') XPath1Parser.literal('(float)') XPath1Parser.literal('(decimal)') XPath1Parser.literal('(integer)') XPath1Parser.literal('(invalid)') XPath1Parser.register('(unknown)') ### # Simple symbols XPath1Parser.register(',') XPath1Parser.register(')', bp=100) XPath1Parser.register(']') XPath1Parser.register('::') XPath1Parser.register('}') # XPath 1.0 definitions continue into module _xpath1_operators sissaschool-elementpath-d3688c7/elementpath/xpath2/000077500000000000000000000000001476131650400224435ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/elementpath/xpath2/__init__.py000066400000000000000000000010011476131650400245440ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from typing import TYPE_CHECKING if TYPE_CHECKING: from .xpath2_parser import XPath2Parser else: from ._xpath2_constructors import XPath2Parser __all__ = ['XPath2Parser'] sissaschool-elementpath-d3688c7/elementpath/xpath2/_xpath2_constructors.py000066400000000000000000000565401476131650400272240ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XPath 2.0 implementation - part 4 (XSD constructors) """ import decimal from typing import cast, Optional, Union from elementpath.aliases import Emptiable from elementpath.exceptions import ElementPathError, ElementPathSyntaxError from elementpath.namespaces import XSD_NAMESPACE from elementpath.datatypes import xsd10_atomic_types, xsd11_atomic_types, \ GregorianDay, GregorianMonth, GregorianMonthDay, GregorianYear10, \ GregorianYear, GregorianYearMonth10, GregorianYearMonth, Duration, \ DayTimeDuration, YearMonthDuration, Date10, Date, DateTime10, DateTime, \ DateTimeStamp, Time, UntypedAtomic, QName, HexBinary, Base64Binary, \ BooleanProxy, AnyURI, AtomicType, NumericType, Notation from elementpath.xpath_context import ContextType, XPathSchemaContext from elementpath.xpath_tokens import XPathConstructor from ._xpath2_functions import XPath2Parser register = XPath2Parser.register unregister = XPath2Parser.unregister method = XPath2Parser.method constructor = XPath2Parser.constructor # Type annotations aliases OtherDateTimeTypes = Union[ Date10, GregorianDay, GregorianMonth, GregorianMonthDay, GregorianYear10, GregorianYear, GregorianYearMonth10, GregorianYearMonth, Time ] ### # Constructors for string-based XSD types @constructor('normalizedString') @constructor('token') @constructor('language') 
@constructor('NMTOKEN') @constructor('Name') @constructor('NCName') @constructor('ID') @constructor('IDREF') @constructor('ENTITY') @constructor('anyURI') def cast_string_based_types(self: XPathConstructor, value: AtomicType) \ -> Union[str, AnyURI]: try: result = xsd10_atomic_types[self.symbol](value) except ValueError as err: raise self.error('FORG0001', err) else: assert isinstance(result, (str, AnyURI)) return result ### # Constructors for numeric XSD types @constructor('decimal') @constructor('double') @constructor('float') def cast_numeric_types(self: XPathConstructor, value: AtomicType) -> NumericType: try: if self.parser.xsd_version == '1.0': result = xsd10_atomic_types[self.symbol](value) else: result = xsd11_atomic_types[self.symbol](value) except ValueError as err: if isinstance(value, (str, UntypedAtomic)): raise self.error('FORG0001', err) raise self.error('FOCA0002', err) except ArithmeticError as err: raise self.error('FOCA0002', err) from None else: assert isinstance(result, (int, float, decimal.Decimal)) return result @constructor('integer') @constructor('nonNegativeInteger') @constructor('positiveInteger') @constructor('nonPositiveInteger') @constructor('negativeInteger') @constructor('long') @constructor('int') @constructor('short') @constructor('byte') @constructor('unsignedLong') @constructor('unsignedInt') @constructor('unsignedShort') @constructor('unsignedByte') def cast_integer_types(self: XPathConstructor, value: AtomicType) -> int: try: result = xsd10_atomic_types[self.symbol](value) except ValueError: msg = 'could not convert {!r} to xs:{}'.format(value, self.symbol) if isinstance(value, (str, bytes, int, UntypedAtomic)): raise self.error('FORG0001', msg) from None raise self.error('FOCA0002', msg) from None except ArithmeticError as err: raise self.error('FOCA0002', err) from None else: assert isinstance(result, int) return result ### # Constructors for datetime XSD types @constructor('date') def cast_date_type(self: XPathConstructor, 
value: AtomicType) -> Date10: cls = Date if self.parser.xsd_version == '1.1' else Date10 if isinstance(value, cls): return value try: if isinstance(value, UntypedAtomic): result = cls.fromstring(value.value) elif isinstance(value, DateTime10): result = cls(value.year, value.month, value.day, value.tzinfo) else: result = cls.fromstring(value) # type: ignore[arg-type] except OverflowError as err: raise self.error('FODT0001', err) from None except ValueError as err: raise self.error('FORG0001', err) else: assert isinstance(result, Date10) return result @constructor('gDay') def cast_gregorian_day_type(self: XPathConstructor, value: AtomicType) -> GregorianDay: if isinstance(value, GregorianDay): return value try: if isinstance(value, UntypedAtomic): result = GregorianDay.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): result = GregorianDay(value.day, value.tzinfo) else: result = GregorianDay.fromstring(value) # type: ignore[arg-type] except ValueError as err: raise self.error('FORG0001', err) else: assert isinstance(result, GregorianDay) return result @constructor('gMonth') def cast_gregorian_month_type(self: XPathConstructor, value: AtomicType) -> GregorianMonth: if isinstance(value, GregorianMonth): return value try: if isinstance(value, UntypedAtomic): return GregorianMonth.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return GregorianMonth(value.month, value.tzinfo) return GregorianMonth.fromstring(value) # type: ignore[arg-type] except ValueError as err: raise self.error('FORG0001', err) @constructor('gMonthDay') def cast_gregorian_month_day_type(self: XPathConstructor, value: AtomicType) \ -> GregorianMonthDay: if isinstance(value, GregorianMonthDay): return value try: if isinstance(value, UntypedAtomic): return GregorianMonthDay.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return GregorianMonthDay(value.month, value.day, value.tzinfo) return GregorianMonthDay.fromstring(value) # type: 
ignore[arg-type] except ValueError as err: raise self.error('FORG0001', err) @constructor('gYear') def cast_gregorian_year_type(self: XPathConstructor, value: AtomicType) \ -> Union[GregorianYear10, GregorianYear]: cls = GregorianYear if self.parser.xsd_version == '1.1' else GregorianYear10 if isinstance(value, cls): return value try: if isinstance(value, UntypedAtomic): return cls.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return cls(value.year, value.tzinfo) return cls.fromstring(value) # type: ignore[arg-type] except OverflowError as err: raise self.error('FODT0001', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('gYearMonth') def cast_gregorian_year_month_type(self: XPathConstructor, value: AtomicType) \ -> Union[GregorianYearMonth10, GregorianYearMonth]: cls = GregorianYearMonth \ if self.parser.xsd_version == '1.1' else GregorianYearMonth10 if isinstance(value, cls): return value try: if isinstance(value, UntypedAtomic): return cls.fromstring(value.value) elif isinstance(value, (Date10, DateTime10)): return cls(value.year, value.month, value.tzinfo) return cls.fromstring(value) # type: ignore[arg-type] except OverflowError as err: raise self.error('FODT0001', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('time') def cast_time_type(self: XPathConstructor, value: AtomicType) -> Time: if isinstance(value, Time): return value try: if isinstance(value, UntypedAtomic): return Time.fromstring(value.value) elif isinstance(value, DateTime10): return Time(value.hour, value.minute, value.second, value.microsecond, value.tzinfo) return Time.fromstring(value) # type: ignore[arg-type] except ValueError as err: raise self.error('FORG0001', err) @method('date') @method('gDay') @method('gMonth') @method('gMonthDay') @method('gYear') @method('gYearMonth') @method('time') def evaluate_other_datetime_types(self: XPathConstructor, context: ContextType = None) \ -> 
Emptiable[OtherDateTimeTypes]: if self.context is not None: context = self.context arg = self.data_value(self.get_argument(context)) if arg is None: return [] try: return cast(OtherDateTimeTypes, self.cast(arg)) except (TypeError, OverflowError) as err: if isinstance(context, XPathSchemaContext): return [] elif isinstance(err, TypeError): raise self.error('FORG0006', err) from None else: raise self.error('FODT0001', err) from None ### # Constructors for time durations XSD types @constructor('duration') def cast_duration_type(self: XPathConstructor, value: AtomicType) -> Duration: if isinstance(value, Duration): return value try: if isinstance(value, UntypedAtomic): return Duration.fromstring(value.value) return Duration.fromstring(value) # type: ignore[arg-type] except OverflowError as err: raise self.error('FODT0002', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('yearMonthDuration') def cast_year_month_duration_type(self: XPathConstructor, value: AtomicType) \ -> YearMonthDuration: if isinstance(value, YearMonthDuration): return value elif isinstance(value, Duration): return YearMonthDuration(months=value.months) try: if isinstance(value, UntypedAtomic): return YearMonthDuration.fromstring(value.value) return YearMonthDuration.fromstring(value) # type: ignore[arg-type] except OverflowError as err: raise self.error('FODT0002', err) from None except ValueError as err: raise self.error('FORG0001', err) @constructor('dayTimeDuration') def cast_day_time_duration_type(self: XPathConstructor, value: AtomicType) \ -> DayTimeDuration: if isinstance(value, DayTimeDuration): return value elif isinstance(value, Duration): return DayTimeDuration(seconds=value.seconds) try: if isinstance(value, UntypedAtomic): return DayTimeDuration.fromstring(value.value) return DayTimeDuration.fromstring(value) # type: ignore[arg-type] except OverflowError as err: raise self.error('FODT0002', err) from None except ValueError as err: raise 
self.error('FORG0001', err) from None @constructor('dateTimeStamp') def cast_datetime_stamp_type(self: XPathConstructor, value: AtomicType) \ -> DateTimeStamp: if isinstance(value, DateTimeStamp): return value elif isinstance(value, DateTime10): value = str(value) try: if isinstance(value, UntypedAtomic): return DateTimeStamp.fromstring(value.value) elif isinstance(value, Date): return DateTimeStamp(value.year, value.month, value.day, tzinfo=value.tzinfo) return DateTimeStamp.fromstring(value) # type: ignore[arg-type] except ValueError as err: raise self.error('FORG0001', err) from None @method('dateTimeStamp') def evaluate_datetime_stamp_type(self: XPathConstructor, context: ContextType = None) \ -> Emptiable[DateTimeStamp]: if self.context is not None: context = self.context arg = self.data_value(self.get_argument(context)) if arg is None: return [] if isinstance(arg, UntypedAtomic): result = self.cast(arg.value) elif isinstance(arg, Date): result = self.cast(arg) else: result = self.cast(str(arg)) assert isinstance(result, DateTimeStamp) return result @method('dateTimeStamp') def nud_datetime_stamp_type(self: XPathConstructor) -> XPathConstructor: if self.parser.xsd_version == '1.0': raise self.wrong_syntax("xs:dateTimeStamp is not recognized unless XSD 1.1 is enabled") if not self.parser.parse_arguments: return self try: self.parser.advance('(') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': msg = 'Too many arguments: expected at most 1 argument' raise self.error('XPST0017', msg) self.parser.advance(')') except SyntaxError as err: raise self.error('XPST0017', str(err)) from None return self ### # Constructors for binary XSD types @constructor('base64Binary') def cast_base64_binary_type(self: XPathConstructor, value: AtomicType) -> Base64Binary: try: return Base64Binary(value, ordered=self.parser.version >= '3.1') # type: ignore[arg-type] except ValueError as err: raise self.error('FORG0001', err) from None except TypeError as 
err: raise self.error('XPTY0004', err) from None @constructor('hexBinary') def cast_hex_binary_type(self: XPathConstructor, value: AtomicType) -> HexBinary: try: return HexBinary(value, ordered=self.parser.version >= '3.1') # type: ignore[arg-type] except ValueError as err: raise self.error('FORG0001', err) from None except TypeError as err: raise self.error('XPTY0004', err) from None @method('base64Binary') @method('hexBinary') def evaluate_binary_types(self: XPathConstructor, context: ContextType = None) \ -> Emptiable[Union[HexBinary, Base64Binary]]: arg = self.data_value(self.get_argument(self.context or context)) if arg is None: return [] try: return cast(Union[HexBinary, Base64Binary], self.cast(arg)) except ElementPathError as err: if isinstance(context, XPathSchemaContext): return [] err.token = self raise @constructor('NOTATION') def cast_notation_type(self: XPathConstructor, value: AtomicType) -> Notation: raise NotImplementedError("No value is castable to xs:NOTATION") @method('NOTATION') def nud_notation_type(self: XPathConstructor) -> None: if not self.parser.parse_arguments: return self.parser.advance('(') if self.parser.next_token.symbol == ')': raise self.error('XPST0017', 'expected exactly one argument') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol != ')': raise self.error('XPST0017', 'expected exactly one argument') self.parser.advance() raise self.error('XPST0017', "no constructor function exists for xs:NOTATION") ### # Multirole tokens (function or constructor function) # # Case 1: In XPath 2.0 the 'boolean' keyword is used both for boolean() function and # for boolean() constructor function. 
unregister('boolean') @constructor('boolean', label=('function', 'constructor function'), sequence_types=('item()*', 'xs:boolean')) def cast_boolean_type(self: XPathConstructor, value: AtomicType) -> bool: try: return cast(bool, BooleanProxy(value)) except ValueError as err: raise self.error('FORG0001', err) from None except TypeError as err: raise self.error('XPTY0004', err) from None @method('boolean') def nud_boolean_type_and_function(self: XPathConstructor) -> XPathConstructor: if not self.parser.parse_arguments: return self self.parser.advance('(') if self.parser.next_token.symbol == ')': msg = 'Too few arguments: expected at least 1 argument' raise self.error('XPST0017', msg) self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': msg = 'Too many arguments: expected at most 1 argument' raise self.error('XPST0017', msg) self.parser.advance(')') return self @method('boolean') def evaluate_boolean_type_and_function(self: XPathConstructor, context: ContextType = None) \ -> Emptiable[bool]: if self.context is not None: context = self.context if self.label == 'function': return self.boolean_value(self[0].select(context)) # xs:boolean constructor arg = self.data_value(self.get_argument(context)) if arg is None: return [] try: return cast(bool, self.cast(arg)) except ElementPathError as err: if isinstance(context, XPathSchemaContext): return [] err.token = self raise ### # Case 2: In XPath 2.0 the 'string' keyword is used both for fn:string() and xs:string(). 
unregister('string') @constructor('string', label=('function', 'constructor function'), nargs=(0, 1), sequence_types=('item()?', 'xs:string')) def cast_string_type(self: XPathConstructor, value: AtomicType) -> str: return self.string_value(value) @method('string') def nud_string_type_and_function(self: XPathConstructor) -> XPathConstructor: if not self.parser.parse_arguments: return self try: self.parser.advance('(') if self.label != 'function' or self.parser.next_token.symbol != ')': self[0:] = self.parser.expression(5), self.parser.advance(')') except ElementPathSyntaxError as err: raise self.error('XPST0017', err) else: return self @method('string') def evaluate_string_type_and_function(self: XPathConstructor, context: ContextType = None) \ -> Emptiable[str]: if self.context is not None: context = self.context if self.label == 'function': if not self: if context is None: raise self.missing_context() return self.string_value(context.item) return self.string_value(self.get_argument(context)) else: item = self.get_argument(context) return [] if item is None else self.string_value(item) # Case 3 and 4: In XPath 2.0 the XSD 'QName' and 'dateTime' types have special # constructor functions so the 'QName' keyword is used both for fn:QName() and # xs:QName(), the same for 'dateTime' keyword. # # In those cases the label at parse time is set by the nud method, in dependence # of the number of args. 
# @constructor('QName', bp=90, label=('function', 'constructor function'), nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:QName')) def cast_qname_type(self: XPathConstructor, value: AtomicType) -> QName: if isinstance(value, QName): return value elif isinstance(value, UntypedAtomic) and self.parser.version >= '3.0': return self.cast_to_qname(value.value) elif isinstance(value, str): return self.cast_to_qname(value) else: raise self.error('XPTY0004', 'the argument has an invalid type %r' % type(value)) @constructor('dateTime', bp=90, label=('function', 'constructor function'), nargs=(1, 2), sequence_types=('xs:date?', 'xs:time?', 'xs:dateTime?')) def cast_datetime_type(self: XPathConstructor, value: AtomicType) \ -> Optional[DateTime10]: cls = DateTime if self.parser.xsd_version == '1.1' else DateTime10 if isinstance(value, cls): return value try: if isinstance(value, UntypedAtomic): result = cls.fromstring(value.value) elif isinstance(value, Date10): result = cls(value.year, value.month, value.day, tzinfo=value.tzinfo) else: result = cls.fromstring(value) # type: ignore[arg-type] except OverflowError as err: raise self.error('FODT0001', err) from None except ValueError as err: raise self.error('FORG0001', err) from None else: return result @method('QName') @method('dateTime') def nud_qname_and_datetime(self: XPathConstructor) -> XPathConstructor: if not self.parser.parse_arguments: return self try: self.parser.advance('(') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': if self.label != 'function': raise self.error('XPST0017', 'unexpected 2nd argument') self.label = 'function' self.parser.advance(',') self[1:] = self.parser.expression(5), elif self.label != 'constructor function' or self.namespace != XSD_NAMESPACE: raise self.error('XPST0017', '2nd argument missing') else: self.label = 'constructor function' self.nargs = 1 self.parser.advance(')') except SyntaxError: raise self.error('XPST0017') from None else: return 
self @method('QName') def evaluate_qname_type_and_function(self: XPathConstructor, context: ContextType = None) \ -> Emptiable[QName]: if self.context is not None: context = self.context if self.label == 'constructor function': arg = self.data_value(self.get_argument(context)) if arg is None: return [] value = self.cast(arg) assert isinstance(value, QName) return value else: uri = self.get_argument(context) qname = self.get_argument(context, index=1) try: return QName(uri, qname) except (TypeError, ValueError) as err: if isinstance(context, XPathSchemaContext): return [] elif isinstance(err, TypeError): raise self.error('XPTY0004', err) else: raise self.error('FOCA0002', err) @method('dateTime') def evaluate_datetime_type_and_function(self: XPathConstructor, context: ContextType = None) \ -> Emptiable[DateTime10]: if self.context is not None: context = self.context if self.label == 'constructor function': arg = self.data_value(self.get_argument(context)) if arg is None: return [] try: result = self.cast(arg) except (ValueError, TypeError) as err: if isinstance(context, XPathSchemaContext): return [] elif isinstance(err, ValueError): raise self.error('FORG0001', err) from None else: raise self.error('FORG0006', err) from None else: assert isinstance(result, DateTime10) return result else: dt = self.get_argument(context, cls=Date10) tm = self.get_argument(context, 1, cls=Time) if dt is None or tm is None: return [] elif dt.tzinfo == tm.tzinfo or tm.tzinfo is None: tzinfo = dt.tzinfo elif dt.tzinfo is None: tzinfo = tm.tzinfo else: raise self.error('FORG0008') if self.parser.xsd_version == '1.1': return DateTime(dt.year, dt.month, dt.day, tm.hour, tm.minute, tm.second, tm.microsecond, tzinfo) return DateTime10(dt.year, dt.month, dt.day, tm.hour, tm.minute, tm.second, tm.microsecond, tzinfo) @constructor('untypedAtomic') def cast_untyped_atomic(self: XPathConstructor, value: AtomicType) -> UntypedAtomic: return UntypedAtomic(value) @method('untypedAtomic') def 
evaluate_untyped_atomic(self: XPathConstructor, context: ContextType = None) \ -> Emptiable[UntypedAtomic]: arg = self.data_value(self.get_argument(self.context or context)) if arg is None: return [] elif isinstance(arg, UntypedAtomic): return arg else: arg = self.cast(arg) assert isinstance(arg, UntypedAtomic) return arg sissaschool-elementpath-d3688c7/elementpath/xpath2/_xpath2_functions.py000066400000000000000000002035341476131650400264610ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XPath 2.0 implementation - part 3 (functions) """ import math import datetime import time import re import os.path import unicodedata from copy import copy from decimal import Decimal, DecimalException from string import ascii_letters from typing import cast, List, Optional, Type, Union from urllib.parse import urlsplit, quote as urllib_quote from elementpath._typing import Iterator from elementpath.aliases import Emptiable, AnyNsmapType from elementpath.exceptions import ElementPathValueError from elementpath.helpers import Patterns, is_idrefs, is_xml_codepoint, round_number from elementpath.datatypes import AtomicType, DateTime10, DateTime, Date10, Date, \ Float10, DoubleProxy, Time, Duration, DayTimeDuration, YearMonthDuration, \ UntypedAtomic, AnyURI, QName, NCName, Id, ArithmeticProxy, NumericProxy, NumericType from elementpath.namespaces import XML_NAMESPACE, get_namespace, split_expanded_name, \ XML_ID, XML_LANG from elementpath.compare import deep_equal from elementpath.sequence_types import match_sequence_type from elementpath.xpath_context import ContextType, ItemType, XPathSchemaContext from elementpath.xpath_nodes import XPathNode, DocumentNode, ElementNode, EtreeElementNode 
from elementpath.xpath_tokens import XPathFunction from elementpath.regex import RegexError, translate_pattern from elementpath.collations import CollationManager from ._xpath2_operators import XPath2Parser __all__ = ['XPath2Parser'] method = XPath2Parser.method function = XPath2Parser.function def is_local_url_scheme(scheme: str) -> bool: return scheme in ('', 'file') or len(scheme) == 1 and scheme in ascii_letters def is_local_dir_url(url: str) -> bool: url_parts = urlsplit(url) return is_local_url_scheme(url_parts.scheme) and os.path.isdir(url_parts.path.lstrip(':')) ### # Sequence types (allowed only for type checking in treat-as/instance-of statements) function('empty-sequence', nargs=0, label='sequence type') @method(function('item', nargs=0, label='sequence type')) def evaluate_item_sequence_type(self: XPathFunction, context: ContextType = None) -> ItemType: if context is None: raise self.missing_context() return context.item @method('item') def nud_item_sequence_type(self: XPathFunction) -> XPathFunction: XPathFunction.nud(self) if self.parser.next_token.symbol in ('*', '+', '?'): self.occurrence = self.parser.next_token.symbol self.parser.advance() return self ### # Function for QNames @method(function('prefix-from-QName', nargs=1, sequence_types=('xs:QName?', 'xs:NCName?'))) def evaluate_prefix_from_qname_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[NCName]: if self.context is not None: context = self.context qname: Optional[QName] = self.get_argument(context) if qname is None: return [] elif not isinstance(qname, QName): raise self.error('XPTY0004', 'argument has an invalid type %r' % type(qname)) return NCName(qname.prefix) if qname.prefix else [] @method(function('local-name-from-QName', nargs=1, sequence_types=('xs:QName?', 'xs:NCName?'))) def evaluate_local_name_from_qname_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[NCName]: if self.context is not None: context = self.context qname: 
Optional[QName] = self.get_argument(context) if qname is None: return [] elif not isinstance(qname, QName): if self.parser.version >= '3.0' and \ isinstance(self.data_value(qname), UntypedAtomic): code = 'XPTY0117' else: code = 'XPTY0004' raise self.error(code, 'argument has an invalid type %r' % type(qname)) return NCName(qname.local_name) @method(function('namespace-uri-from-QName', nargs=1, sequence_types=('xs:QName?', 'xs:anyURI?'))) def evaluate_uri_from_qname_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AnyURI]: if self.context is not None: context = self.context qname: Optional[QName] = self.get_argument(context) if qname is None: return [] elif not isinstance(qname, QName): if self.parser.version >= '3.0' and \ isinstance(self.data_value(qname), UntypedAtomic): code = 'XPTY0117' else: code = 'XPTY0004' raise self.error(code, 'argument has an invalid type %r' % type(qname)) return AnyURI(qname.uri or '') @method(function('namespace-uri-for-prefix', nargs=2, sequence_types=('xs:string?', 'element()', 'xs:anyURI?'))) def evaluate_namespace_uri_for_prefix_function( self: XPathFunction, context: ContextType = None) -> Emptiable[AnyURI]: if self.context is not None: context = self.context elif context is None: raise self.missing_context() prefix = self.get_argument(context=copy(context)) if prefix is None: prefix = '' if not isinstance(prefix, str): raise self.error('FORG0006', '1st argument has an invalid type %r' % type(prefix)) elem = self.get_argument(context, index=1) if not isinstance(elem, ElementNode): raise self.error('FORG0006', '2nd argument %r is not an element node' % elem) if not isinstance(elem, EtreeElementNode): return [] ns_uris = {get_namespace(e.tag) for e in elem.obj.iter() if not callable(e.tag)} for p, uri in self.parser.namespaces.items(): if uri in ns_uris: if p == prefix: if not prefix or uri: return AnyURI(uri) else: msg = 'Prefix %r is associated to no namespace' raise self.error('XPST0081', msg % prefix) 
else: return [] @method(function('in-scope-prefixes', nargs=1, sequence_types=('element()', 'xs:string*'))) def select_in_scope_prefixes_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[str]: if self.context is not None: context = self.context elif context is None: raise self.missing_context() arg = self.get_argument(context, required=True) if not isinstance(arg, ElementNode): raise self.error('XPTY0004', 'argument %r is not an element node' % arg) elem = arg.obj if isinstance(context, XPathSchemaContext): # For schema context returns prefixes of static namespaces for pfx, uri in self.parser.namespaces.items(): if uri: yield pfx or '' elif hasattr(elem, 'nsmap'): # For lxml returns Element nsmap prefixes, replacing None with '' if 'xml' not in elem.nsmap: yield 'xml' for pfx, uri in elem.nsmap.items(): if uri: yield pfx or '' else: # For ElementTree returns module registered prefixes for pfx, uri in self.parser.namespaces.items(): if uri: yield pfx or '' if context.namespaces: yield from (x for x in context.namespaces if x not in self.parser.namespaces) @method(function('resolve-QName', nargs=2, sequence_types=('xs:string?', 'element()', 'xs:QName?'))) def evaluate_resolve_qname_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[QName]: if self.context is not None: context = self.context qname = self.get_argument(context=copy(context)) if qname is None: return [] elif not isinstance(qname, str): raise self.error('FORG0006', '1st argument has an invalid type %r' % type(qname)) if context is None: raise self.missing_context() elem = self.get_argument(context, index=1) if not isinstance(elem, ElementNode): raise self.error('FORG0006', '2nd argument %r is not an element node' % elem) qname = qname.strip() match = QName.pattern.match(qname) if match is None: raise self.error('FOCA0002', '1st argument must be an xs:QName') prefix = match.groupdict()['prefix'] or '' if prefix == 'xml': return QName(XML_NAMESPACE, qname) try: 
nsmap: AnyNsmapType = elem.nsmap except AttributeError: nsmap = self.parser.namespaces if nsmap is not None: for pfx, uri in nsmap.items(): if pfx is None: pfx = '' if pfx == prefix: if pfx: return QName(uri, '{}:{}'.format(pfx, match.groupdict()['local'])) else: return QName(uri, match.groupdict()['local']) if prefix or nsmap is None or '' in nsmap or None in nsmap: raise self.error('FONS0004', f'no namespace found for prefix {prefix!r}') return QName('', qname) ### # Accessor functions @method(function('node-name', nargs=1, sequence_types=('node()?', 'xs:QName?'))) def evaluate_node_name_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[QName]: if self.context is not None: context = self.context arg = self.get_argument(context) if arg is None: return [] elif not isinstance(arg, XPathNode): raise self.error('XPTY0004', 'an XPath node required') name = arg.name if name is None: return [] elif name.startswith('{'): # name is a QName in extended format namespace, local_name = split_expanded_name(name) for pfx, uri in self.parser.namespaces.items(): if uri == namespace: return QName(uri, '{}:{}'.format(pfx, local_name)) raise self.error('FONS0004', 'no prefix found for namespace {}'.format(namespace)) else: # name is a local name return QName(self.parser.namespaces.get('', ''), name) @method(function('nilled', nargs=1, sequence_types=('node()?', 'xs:boolean?'))) def evaluate_nilled_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[bool]: if self.context is not None: context = self.context arg = self.get_argument(context) if arg is None: return [] elif not isinstance(arg, XPathNode): raise self.error('XPTY0004', 'an XPath node required') return [] if arg.nilled is None else arg.nilled @method(function('data', nargs=1, sequence_types=('item()*', 'xs:anyAtomicType*'))) def select_data_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[AtomicType]: yield from self[0].atomization(self.context or context) 
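

###
# Illustrative sketch, not part of the elementpath API: a stdlib-only rendition
# of the prefix lookup that fn:resolve-QName performs above. The helper name, the
# simplified QName pattern and the use of ValueError/LookupError in place of the
# FOCA0002/FONS0004 XPath errors are assumptions made for this example only.
import re
from typing import Dict, Optional, Tuple

_SKETCH_XML_NAMESPACE = 'http://www.w3.org/XML/1998/namespace'
_SKETCH_QNAME_PATTERN = re.compile(
    r'^(?:(?P<prefix>[^\d\W][\w.\-]*):)?(?P<local>[^\d\W][\w.\-]*)$'
)


def _sketch_resolve_lexical_qname(
        qname: str, nsmap: Dict[Optional[str], str]) -> Tuple[str, str]:
    """Return a (namespace URI, lexical QName) pair for a prefixed or local name."""
    qname = qname.strip()
    match = _SKETCH_QNAME_PATTERN.match(qname)
    if match is None:
        # The library function raises FOCA0002 at this point
        raise ValueError(f'{qname!r} is not a lexical QName')

    prefix = match.group('prefix') or ''
    if prefix == 'xml':
        # The 'xml' prefix is always bound to the XML namespace
        return _SKETCH_XML_NAMESPACE, qname

    for pfx, uri in nsmap.items():
        # lxml maps the default namespace to None, ElementTree-style maps to ''
        if (pfx or '') == prefix:
            return uri, qname

    if prefix:
        # The library function raises FONS0004 at this point
        raise LookupError(f'no namespace found for prefix {prefix!r}')
    return '', qname


# Hypothetical usage of the sketch:
#   _sketch_resolve_lexical_qname('p:a', {'p': 'urn:x'}) returns ('urn:x', 'p:a')
#   _sketch_resolve_lexical_qname('a', {}) returns ('', 'a')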
@method(function('base-uri', nargs=(0, 1), sequence_types=('node()?', 'xs:anyURI?'))) def evaluate_base_uri_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AnyURI]: if self.context is not None: context = self.context item = self.get_argument(context, default_to_context=True) if context is None: raise self.missing_context("context item is undefined") elif item is None: return [] elif isinstance(item, XPathNode): uri = item.base_uri return AnyURI(uri if uri is not None else '') else: raise self.error('XPTY0004', "context item is not a node") @method(function('document-uri', nargs=1, sequence_types=('node()?', 'xs:anyURI?'))) def evaluate_document_uri_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AnyURI]: if self.context is not None: context = self.context elif context is None: raise self.missing_context() arg = self.get_argument(context) if isinstance(arg, DocumentNode): uri = arg.document_uri if uri is not None: return AnyURI(uri) elif isinstance(context.root, DocumentNode): if context.documents: for uri, doc in context.documents.items(): if doc and doc.document is context.root.document: return AnyURI(uri) return [] ### # Number functions @method(function('round-half-to-even', nargs=(1, 2), sequence_types=('xs:numeric?', 'xs:integer', 'xs:numeric?'))) def evaluate_round_half_to_even_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[NumericType]: if self.context is not None: context = self.context item = self.get_argument(context) if item is None: return [] elif isinstance(item, float) and (math.isnan(item) or math.isinf(item)): return item elif not isinstance(item, (float, int, Decimal)): code = 'XPTY0004' if isinstance(item, str) else 'FORG0006' raise self.error(code, "invalid argument type {!r}".format(type(item))) precision = 0 if len(self) < 2 else self[1].evaluate(context) try: if isinstance(item, int): return round(item, precision) # type: ignore[arg-type] elif isinstance(item, 
Decimal): return round(item, precision) # type: ignore[arg-type] elif isinstance(item, Float10): return Float10(round(item, precision)) # type: ignore[arg-type] return float(round(Decimal.from_float(item), precision)) # type: ignore[arg-type] except TypeError as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('XPTY0004', err) except (DecimalException, OverflowError): if isinstance(item, Decimal): return Decimal.from_float(round(float(item), precision)) # type: ignore[arg-type] return round(item, precision) # type: ignore[arg-type] @method(function('abs', nargs=1, sequence_types=('xs:numeric?', 'xs:numeric?'))) def evaluate_abs_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[NumericType]: if self.context is not None: context = self.context item = self.get_argument(context) if item is None: return [] elif isinstance(item, float) and math.isnan(item): return item elif isinstance(item, XPathNode): value = self.string_value(item) try: return abs(Decimal(value)) except DecimalException: if isinstance(context, XPathSchemaContext): return [] raise self.error('FOCA0002', "invalid string value {!r} for {!r}".format(value, item)) elif isinstance(item, bool) or not isinstance(item, (float, int, Decimal)): raise self.error('XPTY0004', "invalid argument type {!r}".format(type(item))) else: return cast(NumericType, abs(item)) ### # Aggregate functions @method(function('avg', nargs=1, sequence_types=('xs:anyAtomicType*', 'xs:anyAtomicType'))) def evaluate_avg_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AtomicType]: if self.context is not None: context = self.context values: List[AtomicType] = [] for item in self[0].atomization(context): if isinstance(item, UntypedAtomic): values.append(self.cast_to_double(item.value)) elif isinstance(item, (AnyURI, bool)): raise self.error('FORG0006', 'non numeric value {!r} in the sequence'.format(item)) else: values.append(item) if not values: return [] elif 
isinstance(values[0], Duration): value = values[0] try: for item in values[1:]: value = value + item # type: ignore[operator, assignment] return value / len(values) # type: ignore[operator] except TypeError as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0006', err) elif all(isinstance(x, int) for x in values): result = sum(cast(List[int], values)) / Decimal(len(values)) return int(result) if result % 1 == 0 else result elif all(isinstance(x, (int, Decimal)) for x in values): return sum(cast(List[Decimal], values)) / Decimal(len(values)) elif all(not isinstance(x, DoubleProxy) for x in values): try: return sum( Float10(x) if isinstance(x, Decimal) else x for x in values # type: ignore[misc] ) / len(values) except TypeError as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0006', err) else: try: return sum( float(x) if isinstance(x, Decimal) else x for x in values # type: ignore[misc] ) / len(values) except TypeError as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0006', err) @method(function('max', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:string', 'xs:anyAtomicType?'))) @method(function('min', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:string', 'xs:anyAtomicType?'))) def evaluate_max_min_functions(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AtomicType]: def max_or_min() -> Emptiable[AtomicType]: if not values: return [] elif all(isinstance(x, str) for x in values): if to_any_uri: return AnyURI(aggregate_func( cast(List[str], values) )) elif any(isinstance(x, str) for x in values): if any(isinstance(x, ArithmeticProxy) for x in values): raise self.error('FORG0006', "cannot compare strings with numeric data") elif all(isinstance(x, (Decimal, int)) for x in values): return aggregate_func( cast(List[str], values) ) elif any(isinstance(x, float) and math.isnan(x) for x in values): return float_class('NaN') elif 
all(isinstance(x, (int, float, Decimal)) for x in values): return float_class( aggregate_func(cast(List[NumericType], values)) ) return aggregate_func(values) # type: ignore[type-var] values: List[AtomicType] = [] float_class: Union[Type[Float10], Type[float]] = Float10 to_any_uri = None aggregate_func = max if self.symbol == 'max' else min if self.context is not None: context = self.context for item in self[0].atomization(context): if isinstance(item, UntypedAtomic): values.append(self.cast_to_double(item)) float_class = float elif isinstance(item, float): values.append(item) if float_class is Float10 and not isinstance(item, Float10): float_class = float elif isinstance(item, AnyURI): values.append(item.value) if to_any_uri is None: to_any_uri = True elif isinstance(item, (DayTimeDuration, YearMonthDuration)): values.append(item) elif isinstance(item, (Duration, QName)): raise self.error('FORG0006', "xs:{} is not an ordered type".format(type(item).name)) else: to_any_uri = False values.append(item) if len(self) < 2: collation = self.parser.default_collation else: collation = self.get_argument(context, 1, required=True, cls=str) with CollationManager(collation, self): try: return max_or_min() except TypeError as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0006', err) ### # General functions for sequences @method(function('empty', nargs=1, sequence_types=('item()*', 'xs:boolean'))) @method(function('exists', nargs=1, sequence_types=('item()*', 'xs:boolean'))) def evaluate_empty_and_exists_functions(self: XPathFunction, context: ContextType = None) \ -> bool: return bool(next(iter(self.select(context)))) @method('empty') def select_empty_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[bool]: try: value = next(iter(self[0].select(self.context or context))) except StopIteration: yield True else: yield not value and isinstance(value, list) @method('exists') def select_exists_function(self: XPathFunction, 
context: ContextType = None) \ -> Iterator[bool]: try: value = next(iter(self[0].select(self.context or context))) except StopIteration: yield False else: yield not (not value and isinstance(value, list)) @method(function('distinct-values', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:string', 'xs:anyAtomicType*'))) def select_distinct_values_function(self: XPathFunction, context: ContextType = None)\ -> Iterator[AtomicType]: def distinct_values(case_insensitive: bool = False) -> Iterator[AtomicType]: nan = False results: List[AtomicType] = [] for value in self[0].atomization(context): if case_insensitive and isinstance(value, (str, bytes)): value = value.casefold() if isinstance(value, (float, Decimal)): if math.isnan(value): if not nan: yield value nan = True elif all(not math.isclose(value, x, rel_tol=1E-18, abs_tol=0) for x in results if isinstance(x, (int, Decimal, float))): yield value results.append(value) elif value not in results: yield value results.append(value) if len(self) < 2: collation = self.parser.default_collation else: collation = self.get_argument(self.context or context, 1, required=True, cls=str) with CollationManager(collation, self): yield from distinct_values() @method(function('insert-before', nargs=3, sequence_types=('item()*', 'xs:integer', 'item()*', 'item()*'))) def select_insert_before_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: if self.context is not None: context = self.context position = self.get_argument(context, 1, required=True, cls=int) insert_at_pos = max(0, position - 1) inserted = False for pos, result in enumerate(self[0].select(context)): if not inserted and pos == insert_at_pos: yield from self[2].select(context) inserted = True yield result if not inserted: yield from self[2].select(context) @method(function('index-of', nargs=(2, 3), sequence_types=( 'xs:anyAtomicType*', 'xs:anyAtomicType', 'xs:string', 'xs:integer*'))) def select_index_of_function(self: XPathFunction, 
context: ContextType = None) \ -> Iterator[int]: if self.context is not None: context = self.context value = self[1].get_atomized_operand(copy(context)) if value is None: raise self.error('XPTY0004', "2nd argument cannot be an empty sequence") if len(self) < 3: collation = self.parser.default_collation else: collation = self.get_argument(context, 2, required=True, cls=str) with CollationManager(collation, self) as manager: for pos, result in enumerate(self[0].atomization(context), start=1): if manager.eq(result, value): yield pos @method(function('remove', nargs=2, sequence_types=('item()*', 'xs:integer', 'item()*'))) def select_remove_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: if self.context is not None: context = self.context position = self.get_argument(context, 1) if not isinstance(position, int): raise self.error('XPTY0004', 'an xs:integer required') for pos, result in enumerate(self[0].select(context), start=1): if pos != position: yield result @method(function('reverse', nargs=1, sequence_types=('item()*', 'item()*'))) def select_reverse_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: yield from reversed([x for x in self[0].select(self.context or context)]) @method(function('subsequence', nargs=(2, 3), sequence_types=('item()*', 'xs:double', 'xs:double', 'item()*'))) def select_subsequence_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: if self.context is not None: context = self.context starting_loc = self.get_argument(context, 1, cls=NumericProxy) if not math.isnan(starting_loc) and not math.isinf(starting_loc): starting_loc = float(round_number(starting_loc)) if len(self) == 2: for pos, result in enumerate(self[0].select(context), start=1): if starting_loc <= pos: yield result else: length = self.get_argument(context, 2, cls=NumericProxy) if not math.isnan(length) and not math.isinf(length): length = float(round_number(length)) for pos, result 
in enumerate(self[0].select(context), start=1): if starting_loc <= pos < starting_loc + length: yield result @method(function('unordered', nargs=1, sequence_types=('item()*', 'item()*'))) def select_unordered_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: if self.context is not None: context = self.context yield from sorted([x for x in self[0].select(context)], key=lambda x: self.string_value(x)) ### # Cardinality functions for sequences @method(function('zero-or-one', nargs=1, sequence_types=('item()*', 'item()?'))) def select_zero_or_one_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: results = iter(self[0].select(self.context or context)) try: item = next(results) except StopIteration: return try: next(results) except StopIteration: yield item else: raise self.error('FORG0003') @method(function('one-or-more', nargs=1, sequence_types=('item()*', 'item()+'))) def select_one_or_more_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: results = iter(self[0].select(self.context or context)) try: item = next(results) except StopIteration: raise self.error('FORG0004') from None else: yield item while True: try: yield next(results) except StopIteration: break @method(function('exactly-one', nargs=1, sequence_types=('item()*', 'item()'))) def select_exactly_one_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: results = iter(self[0].select(context)) try: item = next(results) except StopIteration: raise self.error('FORG0005') from None else: try: next(results) except StopIteration: yield item else: raise self.error('FORG0005') ### # Comparing sequences @method(function('deep-equal', nargs=(2, 3), sequence_types=('item()*', 'item()*', 'xs:string', 'xs:boolean'))) def evaluate_deep_equal_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context if len(self) < 3: collation = 
self.parser.default_collation else: collation = self.get_argument(context, 2, required=True, cls=str) return deep_equal( seq1=self[0].select(copy(context)), seq2=self[1].select(copy(context)), collation=collation, ) ### # Regex @method(function('matches', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string', 'xs:string', 'xs:boolean'))) def evaluate_matches_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context input_string = self.get_argument(context, default='', cls=str) pattern = self.get_argument(context, 1, required=True, cls=str) flags = 0 if len(self) > 2: for c in self.get_argument(context, 2, required=True, cls=str): if c in 'smix': flags |= getattr(re, c.upper()) elif c == 'q' and self.parser.version > '2': pattern = re.escape(pattern) else: raise self.error('FORX0001', "Invalid regular expression flag %r" % c) try: python_pattern = translate_pattern(pattern, flags, self.parser.xsd_version) return re.search(python_pattern, input_string, flags=flags) is not None except (re.error, RegexError) as err: if isinstance(context, XPathSchemaContext): return False msg = "Invalid regular expression: {}" raise self.error('FORX0002', msg.format(str(err))) from None except OverflowError as err: if isinstance(context, XPathSchemaContext): return False raise self.error('FORX0002', err) from None @method(function('replace', nargs=(3, 4), sequence_types=( 'xs:string?', 'xs:string', 'xs:string', 'xs:string', 'xs:string'))) def evaluate_replace_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context input_string: str = self.get_argument(context, default='', cls=str) pattern: str = self.get_argument(context, 1, required=True, cls=str) replacement: str = self.get_argument(context, 2, required=True, cls=str) flags = 0 q_flag = False if len(self) > 3: c: str for c in self.get_argument(context, 3, required=True, cls=str): if c in 'smix': flags |= 
getattr(re, c.upper()) elif c == 'q' and self.parser.version > '2': pattern = re.escape(pattern) q_flag = True else: raise self.error('FORX0001', "Invalid regular expression flag %r" % c) try: python_pattern = translate_pattern(pattern, flags, self.parser.xsd_version) re_pattern = re.compile(python_pattern, flags=flags) except (re.error, RegexError): if isinstance(context, XPathSchemaContext): return input_string raise self.error('FORX0002', f"Invalid regular expression {pattern!r}") else: if re_pattern.search(''): msg = f"Regular expression {pattern!r} matches zero-length string" raise self.error('FORX0003', msg) elif q_flag: # use replacement string as is (but inactivating escapes) replacement = replacement.replace('\\', '\\\\') input_string = input_string.replace('\\', '\\\\') return re_pattern.sub(replacement, input_string).replace('\\\\', '\\') elif Patterns.replacement.search(replacement) is None: raise self.error('FORX0004', f"Invalid replacement string {replacement!r}") else: for g in range(re_pattern.groups, -1, -1): if '$%d' % g in replacement: replacement = re.sub(r'(?<!\\)\$%d' % g, r'\\g<%d>' % g, replacement)
return re_pattern.sub(replacement, input_string).replace('\\$', '$') @method(function('tokenize', nargs=(1, 3), sequence_types=('xs:string?', 'xs:string', 'xs:string', 'xs:string*'))) def evaluate_tokenize_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[Union[List[str], str]]: if self.context is not None: context = self.context input_string: Optional[str] = self.get_argument(context, cls=str) if input_string is None: return [] elif self.parser.version >= '3.1' and len(self) == 1: pattern = ' ' input_string = ' '.join(re.split('[ \t\n\r\f\v]+', input_string.strip(' \t\n\r\f\v'))) else: pattern = self.get_argument(context, 1, required=True, cls=str) flags = 0 if len(self) > 2: c: str for c in self.get_argument(context, 2, required=True, cls=str): if c in 'smix': flags |= getattr(re, c.upper()) elif c == 'q' and self.parser.version > '2': pattern = re.escape(pattern) else: raise self.error('FORX0001', "Invalid regular expression flag %r" % c) try: python_pattern = translate_pattern(pattern, flags, self.parser.xsd_version) re_pattern = re.compile(python_pattern, flags=flags) except (re.error, RegexError): if isinstance(context, XPathSchemaContext): return [input_string] raise self.error('FORX0002', f"Invalid regular expression {pattern!r}") from None else: if re_pattern.search(''): msg = f"Regular expression {pattern!r} matches zero-length string" raise self.error('FORX0003', msg) result = [] if input_string: for value in re_pattern.split(input_string): if value is not None and re_pattern.search(value) is None: result.append(value) if len(result) == 1: return result[0] return result ### # Functions on anyURI @method(function('resolve-uri', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:anyURI?'))) def evaluate_resolve_uri_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AnyURI]: if self.context is not None: context = self.context relative = self.get_argument(context, cls=str) if
len(self) == 1: if self.parser.base_uri is None: raise self.error('FONS0005') elif relative is None: return [] elif not AnyURI.is_valid(relative): raise self.error('FORG0002', '{!r} is not a valid URI'.format(relative)) else: return AnyURI(self.get_absolute_uri(relative)) base_uri = self.get_argument(context, index=1, required=True, cls=str) if not AnyURI.is_valid(base_uri): raise self.error('FORG0002', '{!r} is not a valid URI'.format(base_uri)) elif relative is None: return [] elif not AnyURI.is_valid(relative): raise self.error('FORG0002', '{!r} is not a valid URI'.format(relative)) else: return AnyURI(self.get_absolute_uri(relative, base_uri)) ### # String functions @method(function('codepoints-to-string', nargs=1, sequence_types=('xs:integer*', 'xs:string'))) def evaluate_codepoints_to_string_function( self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context result = [] value: Union[ItemType, int] for value in self[0].select(context): if isinstance(value, UntypedAtomic): value = int(value) if not isinstance(value, int): msg = "invalid type {} for codepoint {}".format(type(value), value) if isinstance(value, str): raise self.error('XPTY0004', msg) raise self.error('FORG0006', msg) elif is_xml_codepoint(value): result.append(chr(value)) else: msg = "{} is not a valid XML 1.0 codepoint".format(value) raise self.error('FOCH0001', msg) return ''.join(result) @method(function('string-to-codepoints', nargs=1, sequence_types=('xs:string?', 'xs:integer*'))) def evaluate_string_to_codepoints_function(self: XPathFunction, context: ContextType = None) \ -> List[int]: arg = self.get_argument(self.context or context, cls=str) return [ord(c) for c in arg] if arg else [] @method(function('compare', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:integer?'))) def evaluate_compare_function(self: XPathFunction, context: ContextType = None)\ -> Emptiable[int]: if self.context is not None: context = 
self.context comp1 = self.get_argument(context, 0, cls=str, promote=(AnyURI, UntypedAtomic)) comp2 = self.get_argument(context, 1, cls=str, promote=(AnyURI, UntypedAtomic)) if comp1 is None or comp2 is None: return [] if len(self) < 3: collation = self.parser.default_collation else: collation = self.get_argument(context, 2, required=True) with CollationManager(collation, self) as manager: value = manager.strcoll(comp1, comp2) return 0 if not value else 1 if value > 0 else -1 @method(function('contains', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:boolean'))) def evaluate_contains_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) if len(self) < 3: collation = self.parser.default_collation else: collation = self.get_argument(context, 2, required=True, cls=str) with CollationManager(collation, self) as manager: return manager.contains(arg1, arg2) @method(function('codepoint-equal', nargs=2, sequence_types=('xs:string?', 'xs:string?', 'xs:boolean?'))) def evaluate_codepoint_equal_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[bool]: if self.context is not None: context = self.context comp1 = self.get_argument(context, 0, cls=str) comp2 = self.get_argument(context, 1, cls=str) if comp1 is None or comp2 is None: return [] elif len(comp1) != len(comp2): return False else: return all(ord(c1) == ord(c2) for c1, c2 in zip(comp1, comp2)) @method(function('string-join', nargs=2, sequence_types=('xs:string*', 'xs:string', 'xs:string'))) def evaluate_string_join_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context items = [ self.validated_value(s, cls=str, promote=AnyURI, index=k) for k, s in enumerate(self[0].atomization(context)) ] separator: str = 
self.get_argument(context, 1, required=True, cls=str) return separator.join(items) @method(function('normalize-unicode', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:string'))) def evaluate_normalize_unicode_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context arg: str = self.get_argument(context, default='', cls=str) if len(self) > 1: normalization_form = self.get_argument(context, 1, cls=str) if normalization_form is None: raise self.error('XPTY0004', "2nd argument can't be an empty sequence") else: normalization_form = normalization_form.strip().upper() else: normalization_form = 'NFC' if normalization_form == 'FULLY-NORMALIZED': msg = "%r normalization form not supported" % normalization_form raise self.error('FOCH0003', msg) if not arg: return '' elif not normalization_form: return arg try: return unicodedata.normalize(normalization_form, arg) except ValueError: msg = "unsupported normalization form %r" % normalization_form raise self.error('FOCH0003', msg) from None @method(function('upper-case', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_upper_case_function(self: XPathFunction, context: ContextType = None) -> str: return cast(str, self.get_argument(self.context or context, default='', cls=str)).upper() @method(function('lower-case', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_lower_case_function(self: XPathFunction, context: ContextType = None) -> str: return cast(str, self.get_argument(self.context or context, default='', cls=str)).lower() @method(function('encode-for-uri', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_encode_for_uri_function(self: XPathFunction, context: ContextType = None) -> str: uri_part: Optional[str] = self.get_argument(self.context or context, cls=str) return '' if uri_part is None else urllib_quote(uri_part, safe='~') @method(function('iri-to-uri', nargs=1, 
sequence_types=('xs:string?', 'xs:string'))) def evaluate_iri_to_uri_function(self: XPathFunction, context: ContextType = None) -> str: iri: Optional[str] = self.get_argument(self.context or context, cls=str, promote=AnyURI) return '' if iri is None else urllib_quote(iri, safe='-_.!~*\'()#;/?:@&=+$,[]%') @method(function('escape-html-uri', nargs=1, sequence_types=('xs:string?', 'xs:string'))) def evaluate_escape_html_uri_function(self: XPathFunction, context: ContextType = None) -> str: uri: Optional[str] = self.get_argument(self.context or context, cls=str) if uri is None: return '' return urllib_quote(uri, safe=''.join(chr(cp) for cp in range(32, 127))) @method(function('starts-with', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:boolean'))) def evaluate_starts_with_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context arg1: str = self.get_argument(context, default='', cls=str) arg2: str = self.get_argument(context, index=1, default='', cls=str) if len(self) < 3: collation = self.parser.default_collation else: collation = self.get_argument(context, 2, required=True, cls=str) with CollationManager(collation, self) as manager: return manager.startswith(arg1, arg2) @method(function('ends-with', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:boolean'))) def evaluate_ends_with_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context arg1 = self.get_argument(context, default='', cls=str) arg2 = self.get_argument(context, index=1, default='', cls=str) if len(self) < 3: collation = self.parser.default_collation else: collation = self.get_argument(context, 2, required=True, cls=str) with CollationManager(collation, self) as manager: return manager.endswith(arg1, arg2) @method(function('substring-before', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:string'))) 
@method(function('substring-after', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string?', 'xs:string', 'xs:string'))) def evaluate_substring_functions(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context arg1: str = self.get_argument(context, default='', cls=str) arg2: str = self.get_argument(context, index=1, default='', cls=str) if len(self) < 3: collation = self.parser.default_collation else: collation = self.get_argument(context, 2, required=True, cls=str) with CollationManager(collation, self) as manager: index = manager.find(arg1, arg2) if index < 0: return '' if self.symbol == 'substring-before': return arg1[:index] else: return arg1[index + len(arg2):] ### # Functions on durations, dates and times @method(function('years-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_years_from_duration_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[int]: item: Optional[Duration] = self.get_argument(self.context or context, cls=Duration) if item is None: return [] elif item.months >= 0: return item.months // 12 else: return -(abs(item.months) // 12) @method(function('months-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_months_from_duration_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[int]: item: Optional[Duration] = self.get_argument(self.context or context, cls=Duration) if item is None: return [] elif item.months >= 0: return item.months % 12 else: return -(abs(item.months) % 12) @method(function('days-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_days_from_duration_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[int]: item: Optional[Duration] = self.get_argument(self.context or context, cls=Duration) if item is None: return [] elif item.seconds >= 0: return int(item.seconds // 86400) else: return - 
int(abs(item.seconds) // 86400) @method(function('hours-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_hours_from_duration_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[int]: item: Optional[Duration] = self.get_argument(self.context or context, cls=Duration) if item is None: return [] elif item.seconds >= 0: return int(item.seconds // 3600 % 24) else: return - int(abs(item.seconds) // 3600 % 24) @method(function('minutes-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:integer?'))) def evaluate_minutes_from_duration_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[int]: item: Optional[Duration] = self.get_argument(self.context or context, cls=Duration) if item is None: return [] elif item.seconds >= 0: return int(item.seconds // 60 % 60) else: return - int(abs(item.seconds) // 60 % 60) @method(function('seconds-from-duration', nargs=1, sequence_types=('xs:duration?', 'xs:decimal?'))) def evaluate_seconds_from_duration_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[Union[int, Decimal]]: item: Optional[Duration] = self.get_argument(self.context or context, cls=Duration) if item is None: return [] elif item.seconds >= 0: return item.seconds % 60 else: return -(abs(item.seconds) % 60) @method(function('year-from-dateTime', nargs=1, sequence_types=('xs:dateTime?', 'xs:integer?'))) @method(function('month-from-dateTime', nargs=1, sequence_types=('xs:dateTime?', 'xs:integer?'))) @method(function('day-from-dateTime', nargs=1, sequence_types=('xs:dateTime?', 'xs:integer?'))) @method(function('hours-from-dateTime', nargs=1, sequence_types=('xs:dateTime?', 'xs:integer?'))) @method(function('minutes-from-dateTime', nargs=1, sequence_types=('xs:dateTime?', 'xs:integer?'))) @method(function('seconds-from-dateTime', nargs=1, sequence_types=('xs:dateTime?', 'xs:decimal?'))) def evaluate_from_datetime_functions(self: XPathFunction, context: ContextType 
= None) \ -> Emptiable[Union[int, Decimal]]: cls = DateTime if self.parser.xsd_version == '1.1' else DateTime10 item: Union[DateTime10, DateTime, None] = self.get_argument(self.context or context, cls=cls) if item is None: return [] elif self.symbol.startswith('year'): return item.year elif self.symbol.startswith('month'): return item.month elif self.symbol.startswith('day'): return item.day elif self.symbol.startswith('hour'): return item.hour elif self.symbol.startswith('minute'): return item.minute elif item.microsecond: return Decimal('{}.{:06d}'.format(item.second, item.microsecond)) else: return item.second @method(function('timezone-from-dateTime', nargs=1, sequence_types=('xs:dateTime?', 'xs:dayTimeDuration?'))) def evaluate_timezone_from_datetime_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[DayTimeDuration]: cls = DateTime if self.parser.xsd_version == '1.1' else DateTime10 item: Union[DateTime10, DateTime, None] = self.get_argument(self.context or context, cls=cls) if item is None or item.tzinfo is None: return [] seconds = Decimal.from_float(item.tzinfo.offset.total_seconds()) return DayTimeDuration(seconds=seconds) @method(function('year-from-date', nargs=1, sequence_types=('xs:date?', 'xs:integer?'))) @method(function('month-from-date', nargs=1, sequence_types=('xs:date?', 'xs:integer?'))) @method(function('day-from-date', nargs=1, sequence_types=('xs:date?', 'xs:integer?'))) @method(function('timezone-from-date', nargs=1, sequence_types=('xs:date?', 'xs:dayTimeDuration?'))) def evaluate_from_date_functions(self: XPathFunction, context: ContextType = None) \ -> Emptiable[Union[int, DayTimeDuration]]: cls = Date if self.parser.xsd_version == '1.1' else Date10 item: Union[Date10, Date, None] = self.get_argument(self.context or context, cls=cls) if item is None: return [] elif self.symbol.startswith('year'): return item.year elif self.symbol.startswith('month'): return item.month elif self.symbol.startswith('day'): return item.day 
elif item.tzinfo is None: return [] dt = datetime.datetime(year=max(item.year, 0), month=item.month, day=item.day) seconds = Decimal.from_float(item.tzinfo.utcoffset(dt).total_seconds()) return DayTimeDuration(seconds=seconds) @method(function('hours-from-time', nargs=1, sequence_types=('xs:time?', 'xs:integer?'))) def evaluate_hours_from_time_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[int]: item: Optional[Time] = self.get_argument(self.context or context, cls=Time) return [] if item is None else item.hour @method(function('minutes-from-time', nargs=1, sequence_types=('xs:time?', 'xs:integer?'))) def evaluate_minutes_from_time_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[int]: item: Optional[Time] = self.get_argument(self.context or context, cls=Time) return [] if item is None else item.minute @method(function('seconds-from-time', nargs=1, sequence_types=('xs:time?', 'xs:decimal?'))) def evaluate_seconds_from_time_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[Union[int, Decimal]]: item: Optional[Time] = self.get_argument(self.context or context, cls=Time) return [] if item is None else item.second + item.microsecond / Decimal('1000000.0') @method(function('timezone-from-time', nargs=1, sequence_types=('xs:time?', 'xs:dayTimeDuration?'))) def evaluate_timezone_from_time_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[DayTimeDuration]: item: Optional[Time] = self.get_argument(self.context or context, cls=Time) if item is None or item.tzinfo is None: return [] seconds = Decimal.from_float(item.tzinfo.offset.total_seconds()) return DayTimeDuration(seconds=seconds) ### # Timezone adjustment functions @method(function('adjust-dateTime-to-timezone', nargs=(1, 2), sequence_types=('xs:dateTime?', 'xs:dayTimeDuration?', 'xs:dateTime?'))) def evaluate_adjust_datetime_to_timezone_function( self: XPathFunction, context: ContextType = None) -> 
Emptiable[Union[DateTime10, Date10]]: cls = DateTime if self.parser.xsd_version == '1.1' else DateTime10 result = self.adjust_datetime(self.context or context, cls) return cast(Emptiable[Union[DateTime10, DateTime]], result) @method(function('adjust-date-to-timezone', nargs=(1, 2), sequence_types=('xs:date?', 'xs:dayTimeDuration?', 'xs:date?'))) def evaluate_adjust_date_to_timezone_function( self: XPathFunction, context: ContextType = None) -> Emptiable[Union[Date10, Date]]: cls = Date if self.parser.xsd_version == '1.1' else Date10 result = self.adjust_datetime(self.context or context, cls) return cast(Emptiable[Union[Date10, Date]], result) @method(function('adjust-time-to-timezone', nargs=(1, 2), sequence_types=('xs:time?', 'xs:dayTimeDuration?', 'xs:time?'))) def evaluate_adjust_time_to_timezone_function( self: XPathFunction, context: ContextType = None) -> Emptiable[Time]: return cast(Emptiable[Time], self.adjust_datetime(self.context or context, Time)) ### # Static context functions @method(function('default-collation', nargs=0, sequence_types=('xs:string',))) def evaluate_default_collation_function(self: XPathFunction, context: ContextType = None) -> str: return self.parser.default_collation @method(function('static-base-uri', nargs=0, sequence_types=('xs:anyURI?',))) def evaluate_static_base_uri_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AnyURI]: if self.parser.base_uri is None: return [] return AnyURI(self.parser.base_uri) ### # Dynamic context functions @method(function('current-dateTime', nargs=0, sequence_types=('xs:dateTime',))) def evaluate_current_datetime_function(self: XPathFunction, context: ContextType = None) \ -> Union[DateTime10, DateTime]: if self.context is not None: context = self.context dt = datetime.datetime.now() if context is None else context.current_dt if self.parser.xsd_version == '1.1': return DateTime(dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) return 
DateTime10(dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) @method(function('current-date', nargs=0, sequence_types=('xs:date',))) def evaluate_current_date_function(self: XPathFunction, context: ContextType = None) \ -> Union[Date10, Date]: if self.context is not None: context = self.context dt = datetime.datetime.now() if context is None else context.current_dt if self.parser.xsd_version == '1.1': return Date(dt.year, dt.month, dt.day, tzinfo=dt.tzinfo) return Date10(dt.year, dt.month, dt.day, tzinfo=dt.tzinfo) @method(function('current-time', nargs=0, sequence_types=('xs:time',))) def evaluate_current_time_function(self: XPathFunction, context: ContextType = None) -> Time: if self.context is not None: context = self.context dt = datetime.datetime.now() if context is None else context.current_dt return Time(dt.hour, dt.minute, dt.second, dt.microsecond, dt.tzinfo) @method(function('implicit-timezone', nargs=0, sequence_types=('xs:dayTimeDuration',))) def evaluate_implicit_timezone_function(self: XPathFunction, context: ContextType = None) \ -> DayTimeDuration: if self.context is not None: context = self.context if context is not None and context.timezone is not None: return DayTimeDuration.fromtimedelta(context.timezone.offset) else: return DayTimeDuration.fromtimedelta(datetime.timedelta(seconds=time.timezone)) ### # The root function (Ref: https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#func-root) @method(function('root', nargs=(0, 1), sequence_types=('node()?', 'node()?'))) def evaluate_root_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[XPathNode]: if self.context is not None: context = self.context elif context is None: raise self.missing_context() if isinstance(context, XPathSchemaContext): return [] elif not self: if not isinstance(context.item, XPathNode): raise self.error('XPTY0004') root = context.get_root(context.item) return root if root is not None else [] else: item = 
self.get_argument(context) if item is None: return [] elif not isinstance(item, XPathNode): raise self.error('XPTY0004') root = context.get_root(item) return root if root is not None else [] @method(function('lang', nargs=(1, 2), sequence_types=('xs:string?', 'node()', 'xs:boolean'))) def evaluate_lang_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context if len(self) > 1: item = self.get_argument(context, index=1, default_to_context=True) elif context is None: raise self.missing_context() else: item = context.item if not isinstance(item, ElementNode): raise self.error('XPTY0004') elif isinstance(item, EtreeElementNode): try: attr = item.obj.attrib[XML_LANG] except KeyError: if len(self) > 1 or context is None: return False for elem in context.iter_ancestors(): if isinstance(elem, EtreeElementNode): if XML_LANG in elem.obj.attrib: lang = cast(str, elem.obj.attrib[XML_LANG]) break else: return False else: if not isinstance(attr, str): return False lang = attr.strip() test_lang: str = self.get_argument(context, cls=str) if test_lang is None: test_lang = '' test_lang = test_lang.strip().lower() lang = lang.strip().lower() return lang == test_lang or lang.startswith(test_lang) and lang[len(test_lang)] == '-' else: return False ### # Functions that generate sequences @method(function('element-with-id', nargs=(1, 2), sequence_types=('xs:string*', 'node()', 'element()*'))) @method(function('id', nargs=(1, 2), sequence_types=('xs:string*', 'node()', 'element()*'))) def select_id_function(self: XPathFunction, context: ContextType = None) -> Iterator[ElementNode]: if self.context is not None: context = self.context idrefs = {x for item in self[0].select(copy(context)) for x in self.string_value(item).split() if Id.is_valid(x)} if context is None: raise self.missing_context() if len(self) == 1: node = context.item if node is None: node = context.root else: node = self.get_argument(context, index=1) if not 
isinstance(node, XPathNode): raise self.error('XPTY0004') if isinstance(context, XPathSchemaContext): return assert context is not None root = context.get_root(node) if root is None: return # TODO: PSVI bindings with also xsi:type evaluation for element in root.iter_descendants(): if not isinstance(element, EtreeElementNode): continue if element.obj.text in idrefs: if self.parser.schema is not None: xsd_element = self.parser.schema.find(element.extended_path) if xsd_element is None or not hasattr(xsd_element, 'type') or \ xsd_element.type is None or not xsd_element.type.is_key(): continue idrefs.remove(element.obj.text) if self.symbol == 'id': yield element else: parent = element.parent if isinstance(parent, ElementNode): yield parent continue # pragma: no cover for attr in element.attributes: if not isinstance(attr.obj, str): continue if attr.obj in idrefs: if attr.name == XML_ID: idrefs.remove(attr.obj) yield element break if self.parser.schema is None: continue xsd_element = self.parser.schema.find(element.extended_path) if xsd_element is None or not hasattr(xsd_element, 'attrib'): continue try: xsd_attribute = xsd_element.attrib[attr.name] except KeyError: continue else: if xsd_attribute.type is None or not xsd_attribute.type.is_key(): continue # pragma: no cover idrefs.remove(attr.obj) yield element break @method(function('idref', nargs=(1, 2), sequence_types=('xs:string*', 'node()', 'node()*'))) def select_idref_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[XPathNode]: # TODO: PSVI bindings with also xsi:type evaluation if self.context is not None: context = self.context ids = [x for x in self[0].select(context=copy(context)) if hasattr(x, 'split')] node = self.get_argument(context, index=1, default_to_context=True) if isinstance(context, XPathSchemaContext): return elif not isinstance(node, XPathNode): raise self.error('XPTY0004') elif isinstance(node, (EtreeElementNode, DocumentNode)): for element in node.iter_descendants(): if 
not isinstance(element, EtreeElementNode): continue text = element.obj.text if text and is_idrefs(text) and \ any(v in text.split() for x in ids for v in x.split()): yield element continue if element.attributes: for attr in element.attributes: # pragma: no cover if attr.name != XML_ID and isinstance(attr.obj, str) and \ any(v in attr.obj.split() for x in ids for v in x.split()): yield element break @method(function('doc', nargs=1, sequence_types=('xs:string?', 'document-node()?'))) @method(function('doc-available', nargs=1, sequence_types=('xs:string?', 'xs:boolean'))) def evaluate_doc_functions(self: XPathFunction, context: ContextType = None) \ -> Union[bool, Emptiable[DocumentNode]]: if self.context is not None: context = self.context uri = self.get_argument(context) if uri is None: return [] if self.symbol == 'doc' else False elif isinstance(uri, str): pass elif isinstance(uri, AnyURI): uri = str(uri) elif isinstance(uri, UntypedAtomic): raise self.error('FODC0002') else: raise self.error('XPTY0004') if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return [] if self.symbol == 'doc' else False uri = uri.strip() if uri.startswith(':'): if self.symbol == 'doc' or self.parser.version <= '3.0': raise self.error('FODC0005') return False try: uri = self.get_absolute_uri(uri) except ElementPathValueError as err: if self.symbol == 'doc': raise self.error('FODC0002', err.message) from None return False try: doc = context.documents[uri] # type: ignore[index] except (KeyError, TypeError): if self.symbol == 'doc': if is_local_dir_url(uri): raise self.error('FODC0005', 'document URI is a directory') raise self.error('FODC0002') return False else: if doc is None: raise self.error('FODC0002') try: sequence_type = self.parser.document_types[uri] # type: ignore[index] except (KeyError, TypeError): sequence_type = 'document-node()' if not match_sequence_type(doc, sequence_type, self.parser): msg = f"Type does not match sequence type 
{sequence_type!r}" raise self.error('XPDY0050', msg) return doc if self.symbol == 'doc' else True @method(function('collection', nargs=(0, 1), sequence_types=('xs:string?', 'node()*'))) def evaluate_collection_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[List[XPathNode]]: if self.context is not None: context = self.context uri: Optional[str] = self.get_argument(context, cls=str) if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return [] elif not self or uri is None: if context.default_collection is None: raise self.error('FODC0002', 'no default collection has been defined') collection = context.default_collection sequence_type = self.parser.default_collection_type else: uri = self.get_absolute_uri(uri) try: collection = context.collections[uri] # type: ignore[index] except (KeyError, TypeError): if is_local_dir_url(str(uri)): raise self.error('FODC0004', 'collection URI is a directory') raise self.error('FODC0002', '{!r} collection not found'.format(uri)) from None try: sequence_type = self.parser.collection_types[uri] # type: ignore[index] except (KeyError, TypeError): return collection if not match_sequence_type(collection, sequence_type, self.parser): msg = f"Type does not match sequence type {sequence_type!r}" raise self.error('XPDY0050', msg) return collection ### # The error function # # https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#func-error # https://www.w3.org/TR/xpath-functions/#func-error # @method(function('error', nargs=(0, 3), sequence_types=('xs:QName?', 'xs:string', 'item()*', 'none'))) def evaluate_error_function(self: XPathFunction, context: ContextType = None) -> None: if self.context is not None: context = self.context if not self: raise self.error('FOER0000') elif len(self) == 1: error = self.get_argument(context, cls=QName) if error is None: raise self.error('XPTY0004', "an xs:QName expected") raise self.error(error or 'FOER0000') else: error = 
self.get_argument(context, cls=QName) description = self.get_argument(context, index=1, cls=str) raise self.error(error or 'FOER0000', description) ### # The trace function # # https://www.w3.org/TR/2010/REC-xpath-functions-20101214/#func-trace # @method(function('trace', nargs=2, sequence_types=('item()*', 'xs:string', 'item()*'))) def select_trace_function(self: XPathFunction, context: ContextType = None) -> Iterator[ItemType]: if self.context is not None: context = self.context label = self.get_argument(context, index=1, cls=str) for value in self[0].select(context): self.parser.tracer('{} {}'.format(label, str(value).strip())) yield value # XPath 2.0 definitions continue into module xpath2_constructors sissaschool-elementpath-d3688c7/elementpath/xpath2/_xpath2_operators.py000066400000000000000000001024421476131650400264630ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 2.0 implementation - part 2 (operators, expressions and multi-role tokens) """ import math import operator from copy import copy from decimal import Decimal, DivisionByZero from typing import cast, List, Type, Union from elementpath._typing import Iterator from elementpath.aliases import Emptiable, SequenceType from elementpath.protocols import XsdAttributeProtocol from elementpath.exceptions import ElementPathError from elementpath.helpers import OCCURRENCE_INDICATORS, numeric_equal, numeric_not_equal, \ node_position, get_double from elementpath.namespaces import XSD_NAMESPACE, XSD_NOTATION, XSD_ANY_ATOMIC_TYPE, \ XSD_UNTYPED, get_namespace, get_expanded_name from elementpath.datatypes import UntypedAtomic, QName, AnyURI, \ Duration, Integer, DoubleProxy10, AtomicType, NumericType from elementpath.decoder import get_atomic_sequence from elementpath.xpath_nodes import ElementNode, DocumentNode, XPathNode, AttributeNode from elementpath.sequence_types import is_instance from elementpath.xpath_context import ContextType, ItemType, XPathSchemaContext from elementpath.xpath_tokens import XPathToken, XPathFunction, XPathConstructor from .xpath2_parser import XPath2Parser __all__ = ['XPath2Parser'] COMPARISON_OPERATORS = {'eq', 'ne', 'lt', 'le', 'gt', 'ge'} register = XPath2Parser.register infix = XPath2Parser.infix method = XPath2Parser.method function = XPath2Parser.function @method('then') @method('as') @method('of') @method('else') @method('in') @method('return') @method('satisfies') def nud_auxiliary_symbols(self: XPathToken) -> XPathToken: return self.as_name() ### # Variables @method('$', bp=90) def nud_variable_reference(self: XPathToken) -> XPathToken: self.parser.expected_next('(name)', 'Q{') self[:] = self.parser.expression(rbp=90), return self @method('$') def evaluate_variable_reference(self: XPathToken, context: ContextType = None) \ -> Emptiable[SequenceType[ItemType]]: if context is None: raise 
self.missing_context() varname = self[0].value assert isinstance(varname, str) try: get_expanded_name(varname, self.parser.namespaces) except KeyError as err: raise self.error('XPST0081', "namespace prefix {} not found".format(err)) try: value = context.variables[varname] except KeyError: pass else: return value if value is not None else [] if isinstance(context, XPathSchemaContext): try: sequence_type = self.parser.variable_types[varname].strip() # type: ignore[index] except (TypeError, KeyError): return [] else: if sequence_type[-1] in OCCURRENCE_INDICATORS: sequence_type = sequence_type[:-1] if QName.pattern.match(sequence_type) is not None: try: type_name = get_expanded_name(sequence_type, self.parser.namespaces) except KeyError: pass else: if self.parser.schema is not None: xsd_type = self.parser.schema.get_type(type_name) result = [v for v in get_atomic_sequence(xsd_type)] if len(result) == 1: return result[0] else: return cast(Emptiable[AtomicType], result) return UntypedAtomic('1') raise self.error('XPST0008', 'unknown variable %r' % str(varname)) ### # Node sequence composition XPath2Parser.duplicate('|', 'union') @method(infix('intersect', bp=55)) @method(infix('except', bp=55)) def select_intersect_and_except_operators(self: XPathToken, context: ContextType = None) \ -> Iterator[XPathNode]: if context is None: raise self.missing_context() s1, s2 = set(self[0].select(copy(context))), set(self[1].select(copy(context))) if any(not isinstance(x, XPathNode) for x in s1) \ or any(not isinstance(x, XPathNode) for x in s2): raise self.error('XPTY0004', 'only XPath nodes are allowed') if self.symbol == 'except': yield from cast(List[XPathNode], sorted(s1 - s2, key=node_position)) else: yield from cast(List[XPathNode], sorted(s1 & s2, key=node_position)) ### # 'if' expression @method('if', bp=20) def nud_if_expression(self: XPathToken) -> XPathToken: if self.parser.next_token.symbol != '(': return self.as_name() self.parser.advance('(') self[:] = 
self.parser.expression(5), self.parser.advance(')') self.parser.advance('then') self[1:] = self.parser.expression(5), self.parser.advance('else') self[2:] = self.parser.expression(5), return self @method('if') def evaluate_if_expression(self: XPathToken, context: ContextType = None) \ -> Union[ItemType, List[ItemType]]: if self.boolean_value(self[0].select(copy(context))): if isinstance(context, XPathSchemaContext): self[2].evaluate(copy(context)) return self[1].evaluate(context) else: if isinstance(context, XPathSchemaContext): self[1].evaluate(copy(context)) return self[2].evaluate(context) @method('if') def select_if_expression(self: XPathToken, context: ContextType = None) \ -> Iterator[ItemType]: if self.boolean_value(self[0].select(copy(context))): if isinstance(context, XPathSchemaContext): self[2].evaluate(copy(context)) yield from self[1].select(context) else: if isinstance(context, XPathSchemaContext): self[1].evaluate(copy(context)) yield from self[2].select(context) ### # Quantified expressions @method('some', bp=20) @method('every', bp=20) def nud_quantified_expressions(self: XPathToken) -> XPathToken: del self[:] if self.parser.next_token.symbol != '$': return self.as_name() while True: self.parser.next_token.expected('$') variable = self.parser.expression(5) self.append(variable) self.parser.advance('in') expr = self.parser.expression(5) self.append(expr) for tk in filter(lambda x: x.symbol == '$', expr.iter()): if tk[0].value == variable[0].value: raise tk.error('XPST0008', 'loop variable in its range expression') if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('satisfies') self.append(self.parser.expression(5)) return self @method('some') @method('every') def evaluate_quantified_expressions(self: XPathToken, context: ContextType = None) -> bool: if context is None: raise self.missing_context() context = copy(context) some = self.symbol == 'some' varnames = [cast(str, self[k][0].value) for k in range(0, 
len(self) - 1, 2)] selectors = [self[k].select for k in range(1, len(self) - 1, 2)] for results in copy(context).iter_product(selectors, varnames): context.variables.update(x for x in zip(varnames, results)) if self.boolean_value(self[-1].select(copy(context))): if some: return True elif not some: return False return not some ### # 'for' expressions @method('for', bp=20) def nud_for_expression(self: XPathToken) -> XPathToken: del self[:] if self.parser.next_token.symbol != '$': return self.as_name() while True: self.parser.next_token.expected('$') variable = self.parser.expression(5) self.append(variable) self.parser.advance('in') expr = self.parser.expression(5) self.append(expr) for tk in filter(lambda x: x.symbol == '$', expr.iter()): if tk[0].value == variable[0].value: raise tk.error('XPST0008', 'loop variable in its range expression') if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('return') self.append(self.parser.expression(5)) return self @method('for') def select_for_expression(self: XPathToken, context: ContextType = None) -> Iterator[ItemType]: if context is None: raise self.missing_context() context = copy(context) varnames = [cast(str, self[k][0].value) for k in range(0, len(self) - 1, 2)] selectors = [self[k].select for k in range(1, len(self) - 1, 2)] for results in copy(context).iter_product(selectors, varnames): context.variables.update(x for x in zip(varnames, results)) yield from self[-1].select(copy(context)) ### # Sequence type based @method('instance', bp=60) @method('treat', bp=61) def led_sequence_type_based_expressions(self: XPathToken, left: XPathToken) -> XPathToken: self.parser.advance('of' if self.symbol == 'instance' else 'as') self[:] = left, self.parse_sequence_type() return self @method('instance') def evaluate_instance_expression(self: XPathToken, context: ContextType = None) -> bool: occurs = self[1].occurrence position = None if self[1].symbol == 'empty-sequence': for _ in 
self[0].select(context): return False return True elif self[1].label in ('kind test', 'sequence type', 'function test'): if context is None: raise self.missing_context() context = copy(context) for position, context.item in enumerate(self[0].select(context)): if context.axis is None: context.axis = 'self' result = self[1].evaluate(context) if isinstance(result, list) and not result: return occurs in ('*', '?') elif position and (occurs is None or occurs == '?'): return False else: return position is not None or occurs in ('*', '?') else: type_name = self[1].source.rstrip('*+?') try: qname = get_expanded_name(type_name, self.parser.namespaces) except KeyError as err: raise self.error('XPST0081', "namespace prefix {} not found".format(err)) for position, item in enumerate(self[0].select(context)): try: if not is_instance(item, qname, self.parser): return False except KeyError: msg = f"atomic type {type_name!r} not found in in-scope schema types" raise self.error('XPST0051', msg) from None else: if position and (occurs is None or occurs == '?'): return False else: return position is not None or occurs in ('*', '?') @method('treat') def evaluate_treat_expression(self: XPathToken, context: ContextType = None) \ -> List[ItemType]: occurs = self[1].occurrence position = None castable_expr = [] if self[1].symbol == 'empty-sequence': for _ in self[0].select(context): raise self.error('XPDY0050') elif self[1].label in ('kind test', 'sequence type', 'function test'): for position, item in enumerate(self[0].select(context)): result = self[1].evaluate(context) if isinstance(result, list) and not result: raise self.error('XPDY0050') elif position and (occurs is None or occurs == '?'): raise self.error('XPDY0050', "more than one item in sequence") castable_expr.append(item) else: if position is None and occurs not in ('*', '?'): raise self.error('XPDY0050', "the sequence cannot be empty") else: type_name = self[1].source.rstrip('*+?') try: qname = get_expanded_name(type_name, 
self.parser.namespaces) except KeyError as err: raise self.error('XPST0081', 'prefix {} not found'.format(str(err))) if not qname.startswith('{') and not QName.is_valid(qname): raise self.error('XPST0003') for position, item in enumerate(self[0].select(context)): try: if not is_instance(item, qname, self.parser): msg = f"item {item!r} is not of type {type_name!r}" raise self.error('XPDY0050', msg) except KeyError: msg = f"atomic type {type_name!r} not found in in-scope schema types" raise self.error('XPST0051', msg) from None else: if position and (occurs is None or occurs == '?'): raise self.error('XPDY0050', "more than one item in sequence") castable_expr.append(item) else: if position is None and occurs not in ('*', '?'): raise self.error('XPDY0050', "the sequence cannot be empty") return castable_expr ### # Simple type based @method('castable', bp=62) @method('cast', bp=63) def led_cast_expressions(self: XPathToken, left: XPathToken) -> XPathToken: self.parser.advance('as') self.parser.expected_next('(name)', ':', 'Q{', message='an EQName expected') self[:] = left, self.parser.expression(rbp=85) if self.parser.next_token.symbol == '?': self[1].occurrence = '?' 
self.parser.advance() return self @method('castable') @method('cast') def evaluate_cast_expressions(self: XPathToken, context: ContextType = None) \ -> Emptiable[AtomicType]: type_name = self[1].source.rstrip('+*?') try: atomic_type = get_expanded_name(type_name, self.parser.namespaces) except KeyError as err: raise self.error('XPST0081', 'prefix {} not found'.format(str(err))) if atomic_type in (XSD_NOTATION, XSD_ANY_ATOMIC_TYPE): raise self.error('XPST0080') namespace = get_namespace(atomic_type) if namespace != XSD_NAMESPACE and \ (self.parser.schema is None or self.parser.schema.get_type(atomic_type) is None): msg = f"atomic type {atomic_type!r} not found in the in-scope schema types" raise self.error('XPST0051', msg) result = [res for res in self[0].select(context)] if len(result) > 1: if self.symbol != 'cast': return False raise self.error('XPTY0004', "more than one value in expression") elif not result: if self[1].occurrence == '?': return [] if self.symbol == 'cast' else True elif self.symbol != 'cast': return False else: raise self.error('XPTY0004', "an atomic value is required") arg = self.data_value(result[0]) value: Emptiable[AtomicType] try: if namespace != XSD_NAMESPACE: if self.parser.schema is not None: value = self.parser.schema.cast_as(self.string_value(arg), atomic_type) else: value = [] else: local_name = atomic_type.split('}')[1] try: token_class = cast(Type[XPathConstructor], self.parser.symbol_table[local_name]) except KeyError: msg = f"atomic type {type_name!r} not found in the in-scope schema types" raise self.error('XPST0051', msg) else: if token_class.label != 'constructor function': msg = f"token {type_name!r} is not a constructor" raise self.error('XPST0051', msg) if local_name == 'QName': if isinstance(arg, QName): pass elif self.parser.version < '3.0' and self[0].symbol != '(string)': raise self.error('XPTY0004', "Non literal string to QName cast") token = token_class(self.parser) value = token.cast(arg) except ElementPathError: if 
self.symbol != 'cast': return False elif isinstance(context, XPathSchemaContext): return UntypedAtomic('1') raise except (TypeError, ValueError) as err: if self.symbol != 'cast': return False elif isinstance(context, XPathSchemaContext): return UntypedAtomic('1') elif isinstance(arg, (UntypedAtomic, str)): raise self.error('FORG0001', err) from None raise self.error('XPTY0004', err) from None else: return value if self.symbol == 'cast' else True ### # Comma operator - concatenate items or sequences @method(infix(',', bp=5)) def evaluate_comma_operator(self: XPathToken, context: ContextType = None) \ -> List[ItemType]: results = [] for op in self: result = op.evaluate(context) if isinstance(result, list): results.extend(result) elif result is not None: results.append(result) return results @method(',') def select_comma_operator(self: XPathToken, context: ContextType = None) -> Iterator[ItemType]: for op in self: yield from op.select(context=copy(context)) ### # Parenthesized expression: XPath 2.0 admits the empty case (). 
@method(register('(', lbp=80, rbp=80, label='expression'))
def nud_parenthesized_expression(self: XPathToken) -> XPathToken:
    if self.parser.next_token.symbol != ')':
        self[:] = self.parser.expression(),
    self.parser.advance(')')
    return self


@method('(')
def led_parenthesized_expression(self: XPathToken, left: XPathToken) -> XPathToken:
    if left.symbol == '(name)':
        if left.value in self.parser.RESERVED_FUNCTION_NAMES:
            msg = f"{left.value!r} is not allowed as function name"
            raise left.error('XPST0003', msg)
        else:
            raise left.error('XPST0017', 'unknown function {!r}'.format(left.value))
    elif left.symbol == ':' and left[1].symbol == '(name)':
        if left[1].namespace == XSD_NAMESPACE:
            msg = 'unknown constructor function {!r}'.format(left[1].value)
            raise left[1].error('XPST0017', msg)
        raise left.error('XPST0017', 'unknown function {!r}'.format(left.value))

    if self.parser.next_token.symbol != ')':
        self[:] = left, self.parser.expression()
    else:
        self[:] = left,
    self.parser.advance(')')
    return self


@method('(')
def evaluate_parenthesized_expression(self: XPathToken, context: ContextType = None) \
        -> Union[ItemType, List[ItemType]]:
    return self[0].evaluate(context) if self else []


@method('(')
def select_parenthesized_expression(self: XPathToken, context: ContextType = None) \
        -> Iterator[ItemType]:
    return self[0].select(context) if self else iter(())


###
# Value comparison operators (eq, ne, lt, le, gt, and ge)
#
# Ref: https://www.w3.org/TR/xpath20/#id-value-comparisons
#
@method('eq', bp=30)
@method('ne', bp=30)
@method('lt', bp=30)
@method('gt', bp=30)
@method('le', bp=30)
@method('ge', bp=30)
def led_value_comparison_operators(self: XPathToken, left: XPathToken) -> XPathToken:
    if left.symbol in COMPARISON_OPERATORS:
        raise self.wrong_syntax()
    self[:] = left, self.parser.expression(rbp=30)
    return self


@method('eq')
@method('ne')
@method('lt')
@method('gt')
@method('le')
@method('ge')
def evaluate_value_comparison_operators(self: XPathToken, context: ContextType = None) \
        ->
Emptiable[bool]: operands = [self[0].get_atomized_operand(context=copy(context)), self[1].get_atomized_operand(context=copy(context))] if any(x is None for x in operands): return [] elif any(isinstance(x, XPathFunction) for x in operands): raise self.error('FOTY0013', "cannot compare a function item") elif all(isinstance(x, DoubleProxy10) for x in operands): # Special case of two values: use custom operators if self.symbol == 'eq': return numeric_equal(*cast(List[float], operands)) elif self.symbol == 'ne': return numeric_not_equal(*cast(List[float], operands)) elif numeric_equal(*cast(List[float], operands)): return self.symbol in ('le', 'ge') cls0, cls1 = type(operands[0]), type(operands[1]) if cls0 is cls1 and cls0 is not Duration: pass elif all(isinstance(x, float) for x in operands): pass elif any(isinstance(x, bool) for x in operands): msg = "cannot apply {} between {!r} and {!r}".format(self, *operands) raise self.error('XPTY0004', msg) elif all(isinstance(x, (int, Decimal)) for x in operands): pass elif all(isinstance(x, (str, UntypedAtomic, AnyURI)) for x in operands): pass elif all(isinstance(x, (str, UntypedAtomic, QName)) for x in operands): pass elif all(isinstance(x, (float, Decimal, int)) for x in operands): if isinstance(operands[0], float): operands[1] = get_double(cast(NumericType, operands[1]), self.parser.xsd_version) else: operands[0] = get_double(cast(NumericType, operands[0]), self.parser.xsd_version) elif all(isinstance(x, Duration) for x in operands) and self.symbol in ('eq', 'ne'): pass elif (issubclass(cls0, cls1) or issubclass(cls1, cls0)) and not issubclass(cls0, Duration): pass else: msg = "cannot apply {} between {!r} and {!r}".format(self, *operands) raise self.error('XPTY0004', msg) try: return cast(bool, getattr(operator, self.symbol)(*operands)) except TypeError as err: raise self.error('XPTY0004', err) from None ### # Node comparison @method('is', bp=30) def led_node_comparison(self: XPathToken, left: XPathToken) -> XPathToken: 
if left.symbol == 'is': raise self.wrong_syntax() self[:] = left, self.parser.expression(rbp=30) return self @method('is') @method(infix('<<', bp=30)) @method(infix('>>', bp=30)) def evaluate_node_comparison(self: XPathToken, context: ContextType = None) -> Emptiable[bool]: symbol = self.symbol left = [x for x in self[0].select(context)] if not left: return [] elif len(left) > 1 or not isinstance(left[0], XPathNode): raise self[0].error('XPTY0004', f"left operand of {symbol!r} must be a single node") right = [x for x in self[1].select(context)] if not right: return [] elif len(right) > 1 or not isinstance(right[0], XPathNode): raise self[0].error('XPTY0004', "right operand of %r must be a single node" % symbol) if symbol == 'is': return left[0] is right[0] else: if left[0] is right[0] or context is None: return False documents = [context.root] documents.extend(v for v in context.variables.values() if isinstance(v, DocumentNode)) for root in documents: if root is not None: for item in root.iter_document(): # pragma: no cover if left[0] is item: return True if symbol == '<<' else False elif right[0] is item: return False if symbol == '<<' else True else: raise self.error('FOCA0002', "operands are not nodes of the XML tree!") ### # Range expression @method('to', bp=35) def led_range_expression(self: XPathToken, left: XPathToken) -> XPathToken: if left.symbol == 'to': raise self.wrong_syntax() self[:] = left, self.parser.expression(rbp=35) return self @method('to') def evaluate_range_expression(self: XPathToken, context: ContextType = None) -> List[int]: start, stop = self.get_operands(context, cls=Integer) try: return [x for x in range(start, stop + 1)] except TypeError: return [] @method('to') def select_range_expression(self: XPathToken, context: ContextType = None) -> Iterator[int]: yield from cast(List[int], self.evaluate(context)) ### # Numerical operators @method(infix('idiv', bp=45)) def evaluate_idiv_operator(self: XPathToken, context: ContextType = None) -> 
int: op1, op2 = self.get_operands(context) if op1 is None or op2 is None: raise self.error('XPST0005') try: if math.isinf(op1): raise self.error('FOAR0001' if op2 == 0 else 'FOAR0002') elif math.isnan(op1) or math.isnan(op2): raise self.error('FOAR0002') except TypeError as err: if isinstance(context, XPathSchemaContext): return 1 raise self.error('XPTY0004', err) from None try: result = op1 // op2 except (ZeroDivisionError, DivisionByZero): if isinstance(context, XPathSchemaContext): return 1 raise self.error('FOAR0001') from None else: if result >= 0 or isinstance(op1, Decimal) or \ isinstance(op2, Decimal) or abs(op1) == abs(op2): return int(result) else: return int(result) + 1 # Resolve the intrinsic ambiguity of some infix operators @method('union') @method('intersect') @method('except') @method('eq') @method('ne') @method('lt') @method('gt') @method('le') @method('ge') @method('is') @method('to') @method('idiv') @method('instance') @method('treat') @method('castable') @method('cast') def nud_disambiguation_of_infix_operators(self: XPathToken) -> XPathToken: return self.as_name() ### # Kind tests (sequence types that can appear also in XPath expressions) @method(function('document-node', nargs=(0, 1), label='kind test')) def select_document_node_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[DocumentNode]: if context is None: raise self.missing_context() elif not self: for item in context.iter_children_or_self(): if isinstance(item, DocumentNode): yield item else: elements = [e for e in self[0].select(copy(context)) if isinstance(e, ElementNode)] if isinstance(context.item, DocumentNode): if len(elements) == 1: yield context.item @method('document-node') def nud_document_node_kind_test(self: XPathFunction) -> XPathFunction: self.parser.advance('(') if self.parser.next_token.symbol in ('element', 'schema-element'): self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': msg = 'Too many arguments: expected at 
most 1 argument' raise self.error('XPST0017', msg) elif self.parser.next_token.symbol != ')': raise self.error('XPST0003', 'element or schema-element kind test expected') self.parser.advance(')') return self @method(function('element', nargs=(0, 2), label='kind test')) def select_element_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[ElementNode]: if context is None: raise self.missing_context() elif not self: for item in context.iter_children_or_self(): if isinstance(item, ElementNode): yield item else: for item in self[0].select(context): if len(self) == 1: yield cast(ElementNode, item) # Already selected by sequence type test elif isinstance(item, ElementNode): type_annotation = self[1].name if item.nilled: if self[1].occurrence in ('*', '?'): yield item elif item.type_name == type_annotation: if type_annotation != XSD_UNTYPED: yield item elif self[0].symbol != '*': yield item elif is_instance(item.typed_value, type_annotation, self.parser): yield item @method('element') def nud_element_kind_test(self: XPathFunction) -> XPathFunction: self.parser.advance('(') if self.parser.next_token.symbol != ')': self.parser.expected_next('(name)', ':', '*', message='a QName or a wildcard expected') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': self.parser.advance(',') self.parser.expected_next('(name)', ':', message='a QName expected') self[1:] = self.parser.expression(80), if self.parser.next_token.symbol in ('*', '+', '?'): self[1].occurrence = self.parser.next_token.symbol self.parser.advance() self.parser.advance(')') return self @method(function('schema-attribute', nargs=1, label='kind test')) def select_schema_attribute_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[AttributeNode]: if context is None: raise self.missing_context() attribute_name = self[0].source qname = get_expanded_name(attribute_name, self.parser.namespaces) for _ in context.iter_children_or_self(): if 
self.parser.schema is None: break if self.parser.schema.get_attribute(qname) is None: raise self.error('XPST0008', "attribute %r not found in schema" % attribute_name) if isinstance(context.item, AttributeNode) and context.item.match_name(qname): yield context.item return if not isinstance(context, XPathSchemaContext): raise self.error('XPST0008', 'schema attribute %r not found' % attribute_name) @method(function('schema-element', nargs=1, label='kind test')) def select_schema_element_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[ElementNode]: if context is None: raise self.missing_context() element_name = self[0].source qname = get_expanded_name(element_name, self.parser.namespaces) if self.parser.schema is not None: for _ in context.iter_children_or_self(): if self.parser.schema.get_element(qname) is None \ and self.parser.schema.get_substitution_group(qname) is None: raise self.error('XPST0008', "element %r not found in schema" % element_name) if isinstance(context.item, ElementNode) and context.item.name == qname: yield context.item return if not isinstance(context, XPathSchemaContext): raise self.error('XPST0008', 'schema element %r not found' % element_name) @method('schema-attribute') @method('schema-element') def nud_schema_node_kind_test(self: XPathFunction) -> XPathFunction: self.parser.advance('(') self.parser.expected_next('(name)', ':', 'Q{', message='a QName expected') self[0:] = self.parser.expression(5), self.parser.advance(')') return self ### # Multi role-tokens definition: in XPath 2.0 the 'attribute' keyword is used both for # attribute:: axis and attribute() node type function. # # First the XPath1 token class has to be removed from the XPath2 symbol table. Then the # symbol has to be registered usually with the same binding power (bp --> lbp, rbp), a # multi-value label (using a tuple of values) and a custom pattern. Finally a custom nud # or led method is required. 
XPath2Parser.unregister('attribute') XPath2Parser.register( 'attribute', lbp=90, rbp=90, label=('kind test', 'axis'), pattern=r'\battribute(?=\s*\:\:|\s*\(\:.*\:\)\s*\:\:|\s*\(|\s*\(\:.*\:\)\()' ) @method('attribute') def nud_attribute_kind_test_or_axis(self: XPathToken) -> XPathToken: if self.parser.next_token.symbol == '::': self.label = 'axis' self.parser.advance('::') self.parser.expected_next( '(name)', '*', 'text', 'node', 'document-node', 'comment', 'processing-instruction', 'attribute', 'schema-attribute', 'element', 'schema-element', 'namespace-node' ) self[:] = self.parser.expression(rbp=90), else: self.label = 'kind test' self.parser.advance('(') if self.parser.next_token.symbol != ')': self.parser.next_token.expected('(name)', '*', ':') self[:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': self.parser.advance(',') self.parser.next_token.expected('(name)', ':') self[1:] = self.parser.expression(5), self.parser.advance(')') if self.namespace: msg = f"{self.value!r} is not allowed as function name" raise self.error('XPST0003', msg) return self @method('attribute') def select_attribute_kind_test_or_axis(self: XPathToken, context: ContextType = None) \ -> Iterator[Union[AtomicType, AttributeNode, XsdAttributeProtocol]]: if context is None: raise self.missing_context() elif self.label == 'axis': for _ in context.iter_attributes(): yield from cast(Iterator[AttributeNode], self[0].select(context)) elif not self: for attribute in context.iter_attributes(): yield attribute else: name = self[0].value assert isinstance(name, str) if self.parser.schema is not None and len(self) == 2: assert isinstance(self[1].value, str) type_name = get_expanded_name(self[1].value, namespaces=self.parser.namespaces) else: type_name = None for attribute in context.iter_attributes(): if attribute.match_name(name): if isinstance(context, XPathSchemaContext): continue if type_name == XSD_UNTYPED == attribute.type_name: if name != '*': yield attribute elif not 
type_name or attribute.type_name == type_name or \
                        is_instance(attribute.typed_value, type_name, self.parser):
                    yield attribute

# XPath 2.0 definitions continue into module xpath2_functions

# sissaschool-elementpath-d3688c7/elementpath/xpath2/xpath2_parser.py
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
"""
XPath 2.0 implementation - part 1 (parser class and symbols)
"""
from abc import ABCMeta
import locale
from urllib.parse import urlparse
from typing import cast, Any, ClassVar, Dict, List, Optional, Tuple, Type, Union

from elementpath._typing import Callable, MutableMapping
from elementpath.aliases import NamespacesType, NargsType
from elementpath.helpers import upper_camel_case, is_ncname, ordinal
from elementpath.exceptions import ElementPathError, ElementPathTypeError, \
    ElementPathValueError, MissingContextError, xpath_error
from elementpath.namespaces import XSD_NAMESPACE, XML_NAMESPACE, \
    XPATH_FUNCTIONS_NAMESPACE, XQT_ERRORS_NAMESPACE, \
    XSD_NOTATION, XSD_ANY_ATOMIC_TYPE, get_prefixed_name
from elementpath.collations import UNICODE_COLLATION_BASE_URI, UNICODE_CODEPOINT_COLLATION
from elementpath.datatypes import UntypedAtomic, AtomicType, QName
from elementpath.xpath_tokens import XPathToken, ProxyToken, XPathFunction, XPathConstructor
from elementpath.xpath_context import XPathContext, XPathSchemaContext
from elementpath.sequence_types import is_sequence_type, match_sequence_type
from elementpath.schema_proxy import AbstractSchemaProxy
from elementpath.xpath1 import XPath1Parser


class XPath2Parser(XPath1Parser):
    """
    XPath 2.0 expression parser class. This is the default parser used by XPath selectors.
    A parser instance also represents the XPath static context. With *variable_types*
    you can pass a dictionary with the types of the in-scope variables.
    Provide a *namespaces* dictionary argument for mapping namespace prefixes to URIs
    inside expressions. If *strict* is set to `False` the parser also enables the parsing
    of QNames, like the ElementPath library. There are some additional XPath 2.0
    related arguments.

    :param namespaces: a dictionary mapping namespace prefixes to URIs.
    :param variable_types: a dictionary with the static context's in-scope variable \
    types. It defines the associations between variables and static types.
    :param strict: if strict mode is `False` the parser enables parsing of QNames, \
    like the ElementPath library. Default is `True`.
    :param compatibility_mode: if set to `True` the parser instance works with \
    XPath 1.0 compatibility rules.
    :param default_namespace: the default namespace to apply to unprefixed names. \
    By default no namespace is applied (empty namespace '').
    :param function_namespace: the default namespace to apply to unprefixed function \
    names. By default the namespace "http://www.w3.org/2005/xpath-functions" is used.
    :param schema: the schema proxy instance to use for types, attributes and \
    elements lookups. It must be an instance of `AbstractSchemaProxy`, otherwise \
    a type error is raised. If it's not provided the XPath 2.0 schema's related \
    expressions cannot be used.
    :param base_uri: an absolute URI may be provided, used when necessary in the \
    resolution of relative URIs.
    :param default_collation: the default string collation to use. If not set the \
    environment's default locale setting is used.
    :param document_types: statically known documents, that is a dictionary from \
    absolute URIs onto types. Used for type check when calling the *fn:doc* function \
    with a sequence of URIs. \
The default type of a document is 'document-node()'. :param collection_types: statically known collections, that is a dictionary from \ absolute URIs onto types. Used for type check when calling the *fn:collection* \ function with a sequence of URIs. The default type of a collection is 'node()*'. :param default_collection_type: this is the type of the sequence of nodes that \ would result from calling the *fn:collection* function with no arguments. \ Default is 'node()*'. """ version = '2.0' DEFAULT_NAMESPACES: ClassVar[Dict[str, str]] = { 'xml': XML_NAMESPACE, 'xs': XSD_NAMESPACE, 'fn': XPATH_FUNCTIONS_NAMESPACE, 'err': XQT_ERRORS_NAMESPACE } PATH_STEP_LABELS = ('axis', 'function', 'kind test') PATH_STEP_SYMBOLS = { '(integer)', '(string)', '(float)', '(decimal)', '(name)', '*', '@', '..', '.', '(', '{' } # https://www.w3.org/TR/xpath20/#id-reserved-fn-names RESERVED_FUNCTION_NAMES = { 'attribute', 'comment', 'document-node', 'element', 'empty-sequence', 'if', 'item', 'node', 'processing-instruction', 'schema-attribute', 'schema-element', 'text', 'typeswitch', } function_signatures: Dict[Tuple[QName, int], str] = XPath1Parser.function_signatures.copy() namespaces: Dict[str, str] token: XPathToken next_token: XPathToken def __init__(self, namespaces: Optional[NamespacesType] = None, strict: bool = True, compatibility_mode: bool = False, default_collation: Optional[str] = None, default_namespace: Optional[str] = None, function_namespace: Optional[str] = None, xsd_version: Optional[str] = None, schema: Optional[AbstractSchemaProxy] = None, base_uri: Optional[str] = None, variable_types: Optional[Dict[str, str]] = None, document_types: Optional[Dict[str, str]] = None, collection_types: Optional[NamespacesType] = None, default_collection_type: str = 'node()*') -> None: super(XPath2Parser, self).__init__(namespaces, strict) self.compatibility_mode = compatibility_mode if default_collation is not None: self.default_collation = default_collation else: # Obtain the current 
collation locale using setlocale() with `None`. # Consider only configured UTF-8 encodings, otherwise keep Unicode # Codepoint Collation. _locale = locale.setlocale(locale.LC_COLLATE, None) if '.' in _locale: language_code, encoding = _locale.split('.') if encoding.lower() == 'utf-8': self.default_collation = f'{UNICODE_COLLATION_BASE_URI}?lang={language_code}' self._xsd_version = xsd_version if xsd_version is not None else '1.0' if default_namespace is not None: self.default_namespace = self.namespaces[''] = default_namespace else: self.default_namespace = self.namespaces.get('', '') if function_namespace is not None: self.function_namespace = function_namespace if schema is None: pass elif not isinstance(schema, AbstractSchemaProxy): msg = "argument 'schema' must be an instance of AbstractSchemaProxy" raise ElementPathTypeError(msg) else: schema.bind_parser(self) if not variable_types: self.variable_types = {} elif all(is_sequence_type(v, self) for v in variable_types.values()): self.variable_types = variable_types.copy() else: raise ElementPathValueError('invalid sequence type for in-scope variable types') self.base_uri = None if base_uri is None else urlparse(base_uri).geturl() if document_types: if any(not is_sequence_type(v, self) for v in document_types.values()): raise ElementPathValueError('invalid sequence type in document_types argument') self.document_types = document_types if collection_types: if any(not is_sequence_type(v, self) for v in collection_types.values()): raise ElementPathValueError('invalid sequence type in collection_types argument') self.collection_types = collection_types if not is_sequence_type(default_collection_type, self): raise ElementPathValueError('invalid sequence type for ' 'default_collection_type argument') self.default_collection_type = default_collection_type def __str__(self) -> str: args = [] if self.compatibility_mode: args.append('compatibility_mode=True') if self.default_collation != UNICODE_CODEPOINT_COLLATION: 
args.append(f'default_collation={self.default_collation!r}') if self.function_namespace != XPATH_FUNCTIONS_NAMESPACE: args.append(f'function_namespace={self.function_namespace!r}') if self._xsd_version != '1.0': args.append(f'xsd_version={self._xsd_version!r}') if self.schema is not None: args.append(f'schema={self.schema!r}') if self.base_uri is not None: args.append(f'base_uri={self.base_uri!r}') if self.variable_types: args.append(f'variable_types={self.variable_types!r}') if self.document_types: args.append(f'document_types={self.document_types!r}') if self.collection_types: args.append(f'collection_types={self.collection_types!r}') if self.default_collection_type != 'node()*': args.append(f'default_collection_type={self.default_collection_type!r}') if not args: return super().__str__() repr_string = super().__str__()[:-1] if repr_string.endswith('('): return f"{repr_string}{', '.join(args)})" return f"{repr_string}, {', '.join(args)})" def __getstate__(self) -> Dict[str, Any]: state = self.__dict__.copy() state.pop('symbol_table', None) state.pop('tokenizer', None) return state @property def xsd_version(self) -> str: if self.schema is None: return self._xsd_version try: return self.schema.xsd_version except (AttributeError, NotImplementedError): return self._xsd_version def advance(self, *symbols: str, message: Optional[str] = None) -> XPathToken: super(XPath2Parser, self).advance(*symbols, message=message) if self.next_token.symbol == '(:': # Parses and consumes an XPath 2.0 comment. A comment is delimited # by symbols '(:' and ':)' and can be nested. The current token is # saved and restored after parsing the entire comment. Comments # cannot be inside a prefixed name ':' specification. 
self.token.unexpected(':') token = self.token comment_level = 1 while comment_level: self.advance_until('(:', ':)') if self.next_token.symbol == ':)': comment_level -= 1 else: comment_level += 1 self.advance(':)') self.next_token.unexpected(':') self.token = token return self.token @classmethod def constructor(cls, symbol: str, bp: int = 90, nargs: NargsType = 1, sequence_types: Union[Tuple[()], Tuple[str, ...], List[str]] = (), label: Union[str, Tuple[str, ...]] = 'constructor function') \ -> Callable[[Callable[..., Any]], Callable[..., Any]]: """ Statically creates a constructor token class, that is registered in the globals of the module where the method is called. """ def nud_(self: XPathConstructor) -> XPathConstructor: if not self.parser.parse_arguments: return self try: self.parser.advance('(') self[0:] = self.parser.expression(5), if self.parser.next_token.symbol == ',': msg = 'Too many arguments: expected at most 1 argument' raise self.error('XPST0017', msg) self.parser.advance(')') except SyntaxError: raise self.error('XPST0017') from None else: if self[0].symbol == '?': self.to_partial_function() return self def evaluate_(self: XPathConstructor, context: Optional[XPathContext] = None) \ -> Union[List[None], AtomicType]: if self.context is not None: context = self.context arg = self.data_value(self.get_argument(context)) if arg is None: return [] elif arg == '?' and self[0].symbol == '?': raise self.error('XPTY0004', "cannot evaluate a partial function") try: if isinstance(arg, UntypedAtomic): return self.cast(arg.value) return self.cast(arg) except ElementPathError: raise except (TypeError, ValueError) as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0001', err) from None if not sequence_types: assert nargs == 1 sequence_types = ('xs:anyAtomicType?', 'xs:%s?' 
% symbol) token_class = cls.register(symbol, nargs=nargs, sequence_types=sequence_types, label=label, bases=(XPathConstructor,), lbp=bp, rbp=bp, nud=nud_, evaluate=evaluate_) def bind(func: Callable[..., Any]) -> Callable[..., Any]: method_name = func.__name__.partition('_')[0] if method_name != 'cast': raise ValueError("The function name must be 'cast' or starts with 'cast_'") setattr(token_class, method_name, func) return func return bind def schema_constructor(self, atomic_type_name: str, bp: int = 90) \ -> Type[XPathFunction]: """Dynamically registers a token class for a schema atomic type constructor function.""" if atomic_type_name in (XSD_ANY_ATOMIC_TYPE, XSD_NOTATION): raise xpath_error('XPST0080') def nud_(self_: XPathFunction) -> XPathFunction: self_.parser.advance('(') self_[0:] = self_.parser.expression(5), self_.parser.advance(')') try: self_.evaluate() # for static context evaluation except MissingContextError: pass return self_ def evaluate_(self_: XPathFunction, context: Optional[XPathContext] = None) \ -> Union[List[None], AtomicType]: arg = self_.get_argument(context) if arg is None or self_.parser.schema is None: return [] value = self_.string_value(arg) try: return self_.parser.schema.cast_as(value, atomic_type_name) except (TypeError, ValueError) as err: if isinstance(context, XPathSchemaContext): return [] raise self_.error('FORG0001', err) symbol = get_prefixed_name(atomic_type_name, self.namespaces) token_class_name = "_%sConstructorFunction" % symbol.replace(':', '_') kwargs = { 'symbol': symbol, 'nargs': 1, 'label': 'constructor function', 'pattern': r'\b%s(?=\s*\(|\s*\(\:.*\:\)\()' % symbol, 'lbp': bp, 'rbp': bp, 'nud': nud_, 'evaluate': evaluate_, '__module__': self.__module__, '__qualname__': token_class_name, '__return__': None } token_class = cast( Type[XPathFunction], ABCMeta(token_class_name, (XPathFunction,), kwargs) ) if self.symbol_table is self.__class__.symbol_table: self.symbol_table = dict(self.__class__.symbol_table) 
self.symbol_table[symbol] = token_class self.tokenizer = None return token_class def external_function(self, callback: Callable[..., Any], name: Optional[str] = None, prefix: Optional[str] = None, sequence_types: Tuple[str, ...] = (), bp: int = 90) -> Type[XPathFunction]: """Registers a token class for an external function.""" import inspect symbol = name or callback.__name__ if not is_ncname(symbol): raise ElementPathValueError(f'{symbol!r} is not a name') elif symbol in self.RESERVED_FUNCTION_NAMES: raise ElementPathValueError(f'{symbol!r} is a reserved function name') nargs: NargsType spec = inspect.getfullargspec(callback) if spec.varargs is not None: if spec.args: nargs = len(spec.args), None else: nargs = None elif spec.defaults is None: nargs = len(spec.args) else: nargs = len(spec.args) - len(spec.defaults), len(spec.args) if prefix: namespace = self.namespaces[prefix] qname = QName(namespace, f'{prefix}:{symbol}') else: namespace = XPATH_FUNCTIONS_NAMESPACE qname = QName(XPATH_FUNCTIONS_NAMESPACE, f'fn:{symbol}') class_name = f'{upper_camel_case(qname.qname)}ExternalFunction' lookup_name = qname.expanded_name if self.symbol_table is self.__class__.symbol_table: self.symbol_table = dict(self.__class__.symbol_table) if lookup_name in self.symbol_table: msg = f'function {qname.qname!r} is already registered' raise ElementPathValueError(msg) elif symbol not in self.symbol_table or \ not issubclass(self.symbol_table[symbol], ProxyToken): if symbol in self.symbol_table: token_cls = self.symbol_table[symbol] if not issubclass(token_cls, XPathFunction) \ or token_cls.label == 'kind test': msg = f'{symbol!r} name collides with {token_cls!r}' raise ElementPathValueError(msg) if namespace == token_cls.namespace: msg = f'function {qname.qname!r} is already registered' raise ElementPathValueError(msg) # Move the token class before register the proxy token self.symbol_table[f'{{{token_cls.namespace}}}{symbol}'] = token_cls token_class_name = 
f'{upper_camel_case(qname.local_name)}FunctionProxy' kwargs = { 'class_name': token_class_name, 'symbol': symbol, 'label': 'function', 'lbp': bp, 'rbp': bp, '__module__': self.__module__, '__qualname__': token_class_name, '__return__': None } self.symbol_table[symbol] = cast( Type[ProxyToken], ABCMeta(class_name, (ProxyToken,), kwargs) ) def evaluate_external_function(self_: XPathFunction, context: Optional[XPathContext] = None) -> Any: args = [] for k in range(len(self_)): arg = self_.get_argument(context, index=k) args.append(arg) if sequence_types: for k, (arg, st) in enumerate(zip(args, sequence_types), start=1): if not match_sequence_type(arg, st, self): msg_ = f"{ordinal(k)} argument does not match sequence type {st!r}" raise xpath_error('XPDY0050', msg_) result = callback(*args) if not match_sequence_type(result, sequence_types[-1], self): msg_ = f"Result does not match sequence type {sequence_types[-1]!r}" raise xpath_error('XPDY0050', msg_) return result return callback(*args) kwargs = { 'class_name': class_name, 'symbol': symbol, 'namespace': namespace, 'label': 'external function', 'nargs': nargs, 'lbp': bp, 'rbp': bp, 'evaluate': evaluate_external_function, '__module__': self.__module__, '__qualname__': class_name, '__return__': None } if sequence_types: # Register function signature(s) kwargs['sequence_types'] = sequence_types if self.function_signatures is self.__class__.function_signatures: self.function_signatures = dict(self.__class__.function_signatures) if nargs is None: pass # pragma: no cover elif isinstance(nargs, int): assert len(sequence_types) == nargs + 1 self.function_signatures[(qname, nargs)] = 'function({}) as {}'.format( ', '.join(sequence_types[:-1]), sequence_types[-1] ) elif nargs[1] is None: assert len(sequence_types) == nargs[0] + 1 self.function_signatures[(qname, nargs[0])] = 'function({}, ...) 
as {}'.format( ', '.join(sequence_types[:-1]), sequence_types[-1] ) else: assert len(sequence_types) == nargs[1] + 1 for arity in range(nargs[0], nargs[1] + 1): self.function_signatures[(qname, arity)] = 'function({}) as {}'.format( ', '.join(sequence_types[:arity]), sequence_types[-1] ) token_class = cast( Type[XPathFunction], ABCMeta(class_name, (XPathFunction,), kwargs) ) self.symbol_table[lookup_name] = token_class self.tokenizer = None return token_class def is_schema_bound(self) -> bool: return self.schema is not None and 'symbol_table' in self.__dict__ def check_variables(self, values: MutableMapping[str, Any]) -> None: if self.variable_types is None: return for varname, xsd_type in self.variable_types.items(): if varname not in values: raise xpath_error('XPST0008', "missing variable {!r}".format(varname)) for varname, value in values.items(): try: sequence_type = self.variable_types[varname] except KeyError: sequence_type = 'item()*' if isinstance(value, list) else 'item()' if not match_sequence_type(value, sequence_type, self): message = "Unmatched sequence type for variable {!r}".format(varname) raise xpath_error('XPDY0050', message) ## # Remove symbols that have to be redefined for XPath 2.0. XPath2Parser.unregister(',') XPath2Parser.unregister('(') XPath2Parser.unregister('$') XPath2Parser.unregister('contains') XPath2Parser.unregister('lang') XPath2Parser.unregister('id') XPath2Parser.unregister('substring-before') XPath2Parser.unregister('substring-after') XPath2Parser.unregister('starts-with') ### # Symbols XPath2Parser.register('?') XPath2Parser.register('(:') XPath2Parser.register(':)') # XPath 2.0 definitions continue into module xpath2_operators sissaschool-elementpath-d3688c7/elementpath/xpath3.py000066400000000000000000000007561476131650400230260ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from .xpath30 import XPath30Parser from .xpath31 import XPath31Parser XPath3Parser = XPath31Parser __all__ = ['XPath30Parser', 'XPath31Parser', 'XPath3Parser'] sissaschool-elementpath-d3688c7/elementpath/xpath30/000077500000000000000000000000001476131650400225245ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/elementpath/xpath30/__init__.py000066400000000000000000000010031476131650400246270ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from typing import TYPE_CHECKING if TYPE_CHECKING: from .xpath30_parser import XPath30Parser else: from ._xpath30_functions import XPath30Parser __all__ = ['XPath30Parser'] sissaschool-elementpath-d3688c7/elementpath/xpath30/_translation_maps.py000066400000000000000000000101271476131650400266140ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ Translation maps for XPath 3.0+ format functions. Add languages with pull-requests. 
""" from string import ascii_lowercase ALPHABET_CHARACTERS = { None: ascii_lowercase, 'en': ascii_lowercase, 'it': 'abcdefghilmnopqrstuvz', 'el': 'αβγδεζηθικλμνξοπρςστυφχψω', } OTHER_NUMBERS = ( '\u2070\u00B9\u00B2\u00B3' + ''.join(chr(x) for x in range(0x2074, 0x207A)), # superscript digits (0-9) ''.join(chr(x) for x in range(0x2080, 0x208A)), # subscript digits (0-9) ''.join(chr(x) for x in range(0x2460, 0x2474)), # circled numbers (1-20) ''.join(chr(x) for x in range(0x2474, 0x2488)), # parenthesized numbers (1-20) ''.join(chr(x) for x in range(0x2488, 0x249C)), # full stop numbers (1-20) ) ROMAN_NUMERALS_MAP = { 1000: 'M', 900: 'CM', 500: 'D', 400: 'CD', 100: 'C', 90: 'XC', 50: 'L', 40: 'XL', 10: 'X', 9: 'IX', 5: 'V', 4: 'IV', 1: 'I', } NUM_TO_MONTH_MAPS = { 'en': { 1: 'january', 2: 'february', 3: 'march', 4: 'april', 5: 'may', 6: 'june', 7: 'july', 8: 'august', 9: 'september', 10: 'october', 11: 'november', 12: 'december', }, 'it': { 1: 'gennaio', 2: 'febbraio', 3: 'marzo', 4: 'aprile', 5: 'maggio', 6: 'giugno', 7: 'luglio', 8: 'agosto', 9: 'settembre', 10: 'ottobre', 11: 'novembre', 12: 'dicembre', }, } NUM_TO_WEEKDAY_MAPS = { 'en': { 1: 'monday', 2: 'tuesday', 3: 'wednesday', 4: 'thursday', 5: 'friday', 6: 'saturday', 7: 'sunday', }, 'it': { 1: 'lunedì', 2: 'martedì', 3: 'mercoledì', 4: 'giovedì', 5: 'venerdì', 6: 'sabato', 7: 'domenica', }, } NUM_TO_WORD_MAPS = { 'en': { 10 ** 9: 'billion', 10 ** 6: 'million', 1000: 'thousand', 100: 'hundred', 90: 'ninety', 80: 'eighty', 70: 'seventy', 60: 'sixty', 50: 'fifty', 40: 'forty', 30: 'thirty', 20: 'twenty', 19: 'nineteen', 18: 'eighteen', 17: 'seventeen', 16: 'sixteen', 15: 'fifteen', 14: 'fourteen', 13: 'thirteen', 12: 'twelve', 11: 'eleven', 10: 'ten', 9: 'nine', 8: 'eight', 7: 'seven', 6: 'six', 5: 'five', 4: 'four', 3: 'three', 2: 'two', 1: 'one', 0: 'zero', }, 'it': { 10 ** 9: 'miliardo', 10 ** 6: 'milione', 1000: 'mille', 100: 'cento', 90: 'novanta', 80: 'ottanta', 70: 'settanta', 60: 'sessanta', 50: 
'cinquanta', 40: 'quaranta', 30: 'trenta', 20: 'venti', 19: 'diciannove', 18: 'diciotto', 17: 'diciassette', 16: 'sedici', 15: 'quindici', 14: 'quattordici', 13: 'tredici', 12: 'dodici', 11: 'undici', 10: 'dieci', 9: 'nove', 8: 'otto', 7: 'sette', 6: 'sei', 5: 'cinque', 4: 'quattro', 3: 'tre', 2: 'due', 1: 'uno', 0: 'zero', } } MILITARY_TIME_ZONES = { '+01': 'A', '+02': 'B', '+03': 'C', '+04': 'D', '+05': 'E', '+06': 'F', '+07': 'G', '+08': 'H', '+09': 'I', None: 'J', '+10': 'K', '+11': 'L', '+12': 'M', '-01': 'N', '-02': 'O', '-03': 'P', '-04': 'Q', '-05': 'R', '-06': 'S', '-07': 'T', '-08': 'U', '-09': 'V', '-10': 'W', '-11': 'X', '-12': 'Y', '+00': 'Z', } sissaschool-elementpath-d3688c7/elementpath/xpath30/_xpath30_functions.py000066400000000000000000002143101476131650400266150ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 3.0 implementation - part 3 (functions) """ import sys import decimal import os import re import codecs import math from copy import copy from itertools import zip_longest from typing import cast, Any, Dict, List, Optional, Tuple, Union, Type, Set from urllib.parse import urlsplit from urllib.request import urlopen from urllib.error import URLError from elementpath._typing import Iterator from elementpath.aliases import Emptiable from elementpath.protocols import ElementProtocol from elementpath.exceptions import ElementPathError from elementpath.tdop import MultiLabel from elementpath.helpers import OCCURRENCE_INDICATORS, Patterns, \ is_xml_codepoint, node_position from elementpath.namespaces import get_expanded_name, split_expanded_name, \ XPATH_FUNCTIONS_NAMESPACE, XSD_NAMESPACE from elementpath.datatypes import xsd10_atomic_types, NumericProxy, QName, Date10, \ DateTime10, Time, AnyURI, UntypedAtomic, AtomicType, NumericType, NMToken, \ Idref, Entity from elementpath.sequence_types import is_sequence_type, match_sequence_type from elementpath.etree import defuse_xml from elementpath.xpath_nodes import XPathNode, ElementNode, NamespaceNode, \ DocumentNode, EtreeElementNode, SchemaElementNode from elementpath.tree_builders import get_node_tree from elementpath.xpath_tokens import XPathToken, ValueToken, XPathFunction, XPathConstructor from elementpath.serialization import get_serialization_params, serialize_to_xml, \ serialize_to_json from elementpath.xpath_context import ContextType, ItemType, FunctionArgType, \ XPathContext, XPathSchemaContext from elementpath.regex import translate_pattern, RegexError from ._xpath30_operators import XPath30Parser from .xpath30_helpers import UNICODE_DIGIT_PATTERN, DECIMAL_DIGIT_PATTERN, \ MODIFIER_PATTERN, decimal_to_string, int_to_roman, int_to_alphabetic, \ format_digits, int_to_words, parse_datetime_picture, parse_datetime_marker, \ ordinal_suffix if sys.version_info < (3, 9): zoneinfo 
= None else: import zoneinfo FORMAT_INTEGER_TOKENS = {'A', 'a', 'i', 'I', 'w', 'W', 'Ww'} DECL_PARAM_PATTERN = re.compile(r'([^\d\W][\w.\-\u00B7\u0300-\u036F\u203F\u2040]*)\s*=\s*') EXPONENT_PIC = re.compile(r'\d[eE]\d') register = XPath30Parser.register method = XPath30Parser.method function = XPath30Parser.function ### # 'inline function' expression or 'function test' class _InlineFunction(XPathFunction): symbol = lookup_name = 'function' lbp = 90 rbp = 90 label: Union[str, MultiLabel] = MultiLabel('inline function', 'function test') body: Optional[XPathToken] = None "Body of anonymous inline function." variables: Optional[Dict[str, Any]] = None "In-scope variables linked by let and for expressions and arguments." varnames: Optional[List[str]] = None "Inline function arguments varnames." def __str__(self) -> str: return str(self.label) @property def source(self) -> str: if self.label == 'function test': if len(self.sequence_types) == 1 and self.sequence_types[0] == '*': return 'function(*)' else: return 'function(%s) as %s' % ( ', '.join(self.sequence_types[:-1]), self.sequence_types[-1] ) arguments = [] return_type = '' for var, sq in zip_longest(self, self.sequence_types): if var is None: if sq != 'item()*': return_type = f' as {sq}' elif sq is None or sq == 'item()*': arguments.append(var.source) else: arguments.append(f'{var.source} as {sq}') return '%s(%s)%s {%s}' % ( self.symbol, ', '.join(arguments), return_type, getattr(self.body, 'source', '') ) def __call__(self, *args: FunctionArgType, context: Optional[XPathContext] = None) -> Any: def get_argument(v: Any) -> Any: if isinstance(v, XPathToken) and not isinstance(v, XPathFunction): v = v.evaluate(context) if isinstance(v, XPathFunction) and sequence_type.startswith('function('): if not v.match_function_test(sequence_type, as_argument=True): msg = "argument {!r}: {} does not match sequence type {}" raise self.error('XPTY0004', msg.format(varname, v, sequence_type)) elif not match_sequence_type(v, 
sequence_type, self.parser): _v = self.cast_to_primitive_type(v, sequence_type) if not match_sequence_type(_v, sequence_type, self.parser): msg = "argument '${}': {} does not match sequence type {}" raise self.error('XPTY0004', msg.format(varname, v, sequence_type)) return _v return v sequence_type: str self.check_arguments_number(len(args)) context = copy(context) if self.variables and context is not None: context.variables.update(self.variables) if self.varnames is None: self.varnames = [] assert self.body is not None if self.label == 'inline partial function': k = 0 for varname, sequence_type, tk in zip(self.varnames, self.sequence_types, self): if context is None: raise self.missing_context() if tk.symbol != '?' or tk: context.variables[varname] = tk.evaluate(context) else: context.variables[varname] = get_argument(args[k]) k += 1 result = self.body.evaluate(context) else: if context is None: raise self.missing_context() elif not args and self: if isinstance(context.item, DocumentNode): if isinstance(context.root, DocumentNode): context.item = context.root.getroot() elif context.root is not None: context.item = context.root args = cast(Tuple[FunctionArgType], (context.item,)) partial_function = False if self.variables is None: self.variables = {} for varname, sequence_type, value in zip(self.varnames, self.sequence_types, args): if isinstance(value, XPathToken) and value.symbol == '?': partial_function = True else: context.variables[varname] = get_argument(value) if partial_function: self.to_partial_function() return self result = self.body.evaluate(context) return self.validated_result(result) def nud(self) -> Union[XPathFunction, XPathToken]: # type: ignore[override] def append_sequence_type(tk: XPathToken) -> None: if tk.symbol == '(' and len(tk) == 1: tk = tk[0] sequence_type = tk.source next_symbol = self.parser.next_token.symbol if sequence_type != 'empty-sequence()' and next_symbol in OCCURRENCE_INDICATORS: self.parser.advance() sequence_type += 
next_symbol tk.occurrence = next_symbol if not is_sequence_type(sequence_type, self.parser): if 'xs:NMTOKENS' in sequence_type \ or 'xs:ENTITIES' in sequence_type \ or 'xs:IDREFS' in sequence_type: msg = "a list type cannot be used in a function signature" raise self.error('XPST0051', msg) raise self.error('XPST0003', "a sequence type expected") assert isinstance(self.sequence_types, list) self.sequence_types.append(sequence_type) if self.parser.next_token.symbol != '(': return self.as_name() self.parser.advance('(') self.sequence_types = [] if self.parser.next_token.symbol in ('$', ')'): self.label = 'inline function' self.varnames = [] while self.parser.next_token.symbol != ')': self.parser.next_token.expected('$') variable = self.parser.expression(5) varname = variable[0].value assert isinstance(varname, str) if varname in self.varnames: raise self.error('XQST0039') self.append(variable) self.varnames.append(varname) if self.parser.next_token.symbol == 'as': self.parser.advance('as') token = self.parser.expression(90) append_sequence_type(token) else: self.sequence_types.append('item()*') self.parser.next_token.expected(')', ',') if self.parser.next_token.symbol == ',': self.parser.advance() self.parser.next_token.unexpected(')') self.parser.advance(')') elif self.parser.next_token.symbol == '*': self.label = 'function test' self.append(self.parser.advance('*')) self.sequence_types.append('*') self.parser.advance(')') return self else: self.label = 'function test' while True: token = self.parse_sequence_type() append_sequence_type(token) self.append(token) if self.parser.next_token.symbol != ',': break self.parser.advance(',') self.parser.advance(')') # Add function return sequence type if self.parser.next_token.symbol != 'as': self.sequence_types.append('item()*') else: self.parser.advance('as') if self.parser.next_token.label not in ('kind test', 'sequence type', 'function test'): self.parser.expected_next('(name)', ':') token = self.parser.expression(rbp=90) 
append_sequence_type(token) if self.label == 'inline function': if self.parser.next_token.symbol != '{' and not self: self.label = 'function test' else: self.parser.advance('{') if self.parser.next_token.symbol != '}': self.body = self.parser.expression() elif self.parser.version >= '3.1': self.body = ValueToken(self.parser, value=[]) else: raise self.wrong_syntax("inline function has an empty body") self.parser.advance('}') return self def evaluate(self, context: ContextType = None) -> Union[ItemType, List[ItemType]]: if context is None: raise self.missing_context() elif self.label.endswith('function'): self.variables = context.variables.copy() # like a closure return self # A function test if not isinstance(context.item, XPathFunction): return [] elif self.source == 'function(*)': return context.item elif context.item.match_function_test(self.sequence_types): return context.item else: return [] def to_partial_function(self) -> None: assert self.label != 'function test', "an effective inline function required" nargs = len([tk and not tk for tk in self._items if tk.symbol == '?']) assert nargs, "a partial function requires at least a placeholder token" self._name = None self.label = 'inline partial function' self.nargs = nargs XPath30Parser.symbol_table['function'] = _InlineFunction ### # Mathematical functions @method(function('pi', prefix='math', nargs=0, sequence_types=('xs:double',))) def evaluate_pi_function(self: XPathFunction, context: ContextType = None) -> float: return math.pi @method(function('exp', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_exp_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: NumericType = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] return math.exp(arg) @method(function('exp10', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_exp10_function(self: XPathFunction, context: ContextType = 
None) -> Emptiable[float]: arg: NumericType = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] return float(10 ** arg) @method(function('log', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_log_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] return float('-inf') if not arg else math.nan if arg < 0 else math.log(arg) @method(function('log10', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_log10_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] return float('-inf') if not arg else math.nan if arg < 0 else math.log10(arg) @method(function('pow', prefix='math', nargs=2, sequence_types=('xs:double?', 'xs:numeric', 'xs:double?'))) def evaluate_pow_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: if self.context is not None: context = self.context x = self.get_argument(context, cls=NumericProxy) y = self.get_argument(context, index=1, required=True, cls=NumericProxy) if x is None: return [] elif not x and y < 0: return math.copysign(float('inf'), x) if (y % 2) == 1 else float('inf') try: return float(x ** y) except TypeError: return math.nan @method(function('sqrt', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_sqrt_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] elif arg < 0: return math.nan return math.sqrt(arg) @method(function('sin', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_sin_function(self: XPathFunction, 
context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] elif math.isinf(arg): return math.nan return math.sin(arg) @method(function('cos', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_cos_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] elif math.isinf(arg): return math.nan return math.cos(arg) @method(function('tan', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_tan_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] elif math.isinf(arg): return math.nan return math.tan(arg) @method(function('asin', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_asin_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] elif arg < -1 or arg > 1: return math.nan return math.asin(arg) @method(function('acos', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_acos_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is None: return [] elif arg < -1 or arg > 1: return math.nan return math.acos(arg) @method(function('atan', prefix='math', nargs=1, sequence_types=('xs:double?', 'xs:double?'))) def evaluate_atan_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: arg: Optional[NumericType] = self.get_argument(self.context or context, cls=NumericProxy) if arg is 
None: return [] return math.atan(arg) @method(function('atan2', prefix='math', nargs=2, sequence_types=('xs:double', 'xs:double', 'xs:double'))) def evaluate_atan2_function(self: XPathFunction, context: ContextType = None) -> Emptiable[float]: if self.context is not None: context = self.context x = self.get_argument(context, cls=NumericProxy) y = self.get_argument(context, index=1, required=True, cls=NumericProxy) return math.atan2(x, y) ### # Formatting functions @method(function('format-integer', nargs=(2, 3), sequence_types=('xs:integer?', 'xs:string', 'xs:string?', 'xs:string'))) def evaluate_format_integer_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context value = self.get_argument(context, cls=NumericProxy) picture = self.get_argument(context, index=1, required=True, cls=str) lang = self.get_argument(context, index=2, cls=str) if value is None: return '' if ';' not in picture: fmt_token, fmt_modifier = picture, '' else: fmt_token, fmt_modifier = picture.rsplit(';', 1) if MODIFIER_PATTERN.match(fmt_modifier) is None: raise self.error('FODF1310') if not fmt_token: raise self.error('FODF1310') elif fmt_token in FORMAT_INTEGER_TOKENS: if fmt_token == 'a': result = int_to_alphabetic(value, lang) elif fmt_token == 'A': result = int_to_alphabetic(value, lang).upper() elif fmt_token == 'i': result = int_to_roman(value).lower() elif fmt_token == 'I': result = int_to_roman(value) elif fmt_token == 'w': return int_to_words(value, lang, fmt_modifier) elif fmt_token == 'W': return int_to_words(value, lang, fmt_modifier).upper() else: return int_to_words(value, lang, fmt_modifier).title() else: if UNICODE_DIGIT_PATTERN.search(fmt_token) is None: if any(not x.isalpha() and not x.isdigit() for x in fmt_token): result = str(value) # fallback for invalid pictures else: base_char = '1' for base_char in fmt_token: if base_char.isalpha(): break if base_char.islower(): result = int_to_alphabetic(value, base_char) 
else: result = int_to_alphabetic(value, base_char.lower()).upper() elif DECIMAL_DIGIT_PATTERN.search(fmt_token) is None or ',,' in fmt_token: msg = 'picture argument has an invalid primary format token' raise self.error('FODF1310', msg) else: digits = UNICODE_DIGIT_PATTERN.findall(fmt_token) cp = ord(digits[0]) if any((ord(ch) - cp) > 10 for ch in digits[1:]): msg = "picture argument mixes digits from different digit families" raise self.error('FODF1310', msg) elif fmt_token[0].isdigit(): if '#' in fmt_token: msg = 'picture argument has an invalid primary format token' raise self.error('FODF1310', msg) elif fmt_token[0] != '#': raise self.error('FODF1310', "invalid grouping in picture argument") if digits[0].isdigit(): cp = ord(digits[0]) while chr(cp - 1).isdigit(): cp -= 1 digits_family = ''.join(chr(cp + k) for k in range(10)) else: raise ValueError() if value < 0: result = '-' + format_digits(str(abs(value)), fmt_token, digits_family) else: result = format_digits(str(abs(value)), fmt_token, digits_family) if fmt_modifier.startswith('o'): return f'{result}{ordinal_suffix(value)}' return result @method(function('format-number', nargs=(2, 3), sequence_types=('xs:numeric?', 'xs:string', 'xs:string?', 'xs:string'))) def evaluate_format_number_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context value = self.get_argument(context, cls=NumericProxy) picture = self.get_argument(context, index=1, required=True, cls=str) decimal_format_name = self.get_argument(context, index=2, cls=str) # Check and adapt decimal format name if decimal_format_name is not None: decimal_format_name = decimal_format_name.strip() if decimal_format_name.startswith('Q{'): if decimal_format_name.startswith('Q{}'): decimal_format_name = decimal_format_name[3:] else: decimal_format_name = decimal_format_name[1:] elif ':' in decimal_format_name: try: decimal_format_name = get_expanded_name( name=decimal_format_name, 
namespaces=self.parser.namespaces ) except (KeyError, ValueError): raise self.error('FODF1280') from None try: decimal_format = self.parser.decimal_formats[decimal_format_name] except KeyError: raise self.error('FODF1280') from None pattern_separator = decimal_format['pattern-separator'] sub_pictures = picture.split(pattern_separator) if len(sub_pictures) > 2: raise self.error('FODF1310') decimal_separator = decimal_format['decimal-separator'] if any(p.count(decimal_separator) > 1 for p in sub_pictures): raise self.error('FODF1310') percent_sign = decimal_format['percent'] per_mille_sign = decimal_format['per-mille'] if any(p.count(percent_sign) + p.count(per_mille_sign) > 1 for p in sub_pictures): raise self.error('FODF1310') zero_digit = decimal_format['zero-digit'] optional_digit = decimal_format['digit'] digits_family = ''.join(chr(cp + ord(zero_digit)) for cp in range(10)) if any(optional_digit not in p and all(x not in p for x in digits_family) for p in sub_pictures): raise self.error('FODF1310') grouping_separator = decimal_format['grouping-separator'] adjacent_pattern = re.compile(r'[\\%s\\%s]{2}' % (grouping_separator, decimal_separator)) if any(adjacent_pattern.search(p) for p in sub_pictures): raise self.error('FODF1310') if any(x.endswith(grouping_separator) for s in sub_pictures for x in s.split(decimal_separator)): raise self.error('FODF1310') active_characters = digits_family + ''.join([ decimal_separator, grouping_separator, pattern_separator, optional_digit ]) exponent_pattern = None exponent_separator = 'e' if self.parser.version > '3.0': # Check optional exponent spec correctness in each sub-picture exponent_separator = decimal_format['exponent-separator'] _pattern = re.compile(r'(?<=[{0}]){1}[{0}]'.format( re.escape(active_characters), exponent_separator )) for p in sub_pictures: for match in _pattern.finditer(p): if percent_sign in p or per_mille_sign in p: raise self.error('FODF1310') elif any(c not in digits_family for c in 
p[match.span()[1]-1:]): # detailed check to consider suffix has_suffix = False for ch in p[match.span()[1]-1:]: if ch in digits_family: if has_suffix: raise self.error('FODF1310') elif ch in active_characters: raise self.error('FODF1310') else: has_suffix = True exponent_pattern = _pattern if exponent_pattern is None: if any(EXPONENT_PIC.search(s) for s in sub_pictures): raise self.error('FODF1310') if value is None or math.isnan(value): return f"{decimal_format['NaN']}" elif isinstance(value, float): value = decimal.Decimal.from_float(value) elif not isinstance(value, decimal.Decimal): value = decimal.Decimal(value) minus_sign = decimal_format['minus-sign'] prefix = '' if value >= 0: subpic = sub_pictures[0] else: subpic = sub_pictures[-1] if len(sub_pictures) == 1: prefix = minus_sign for k, ch in enumerate(subpic): if ch in active_characters: prefix += subpic[:k] subpic = subpic[k:] break else: prefix += subpic subpic = '' if not subpic: suffix = '' elif subpic[-1] == percent_sign: suffix = percent_sign subpic = subpic[:-1] if value.as_tuple().exponent < 0: value *= 100 else: value = decimal.Decimal(int(value) * 100) elif subpic[-1] == per_mille_sign: suffix = per_mille_sign subpic = subpic[:-1] if value.as_tuple().exponent < 0: value *= 1000 else: value = decimal.Decimal(int(value) * 1000) else: for k, ch in enumerate(reversed(subpic)): if ch in active_characters: idx = len(subpic) - k suffix = subpic[idx:] subpic = subpic[:idx] break else: suffix = subpic subpic = '' exp_fmt = None if exponent_pattern is not None: exp_match = exponent_pattern.search(subpic) if exp_match is not None: exp_fmt = subpic[exp_match.span()[0]+1:] subpic = subpic[:exp_match.span()[0]] fmt_tokens = subpic.split(decimal_separator) if all(not fmt for fmt in fmt_tokens): raise self.error('FODF1310') if math.isinf(value): return f"{prefix}{decimal_format['infinity']}{suffix}" # Calculate the exponent value if it's in the sub-picture exp_value = 0 if exp_fmt and value: num_digits = 0 for ch 
in fmt_tokens[0]: if ch in digits_family: num_digits += 1 if abs(value) > 1: v = abs(value) while v > 10 ** num_digits: exp_value += 1 v /= 10 # modify empty fractional part to store a digit if not num_digits: if len(fmt_tokens) == 1: fmt_tokens.append(zero_digit) elif not fmt_tokens[-1]: fmt_tokens[-1] = zero_digit elif len(fmt_tokens) > 1 and fmt_tokens[-1] and value >= 0: v = abs(value) * 10 while v < 10 ** num_digits: exp_value -= 1 v *= 10 else: v = abs(value) * 10 while v < 10: exp_value -= 1 v *= 10 if exp_value: value = value * decimal.Decimal(10) ** -exp_value # round the value by fractional part if len(fmt_tokens) == 1 or not fmt_tokens[-1]: exp = decimal.Decimal('1') else: k = -1 for ch in fmt_tokens[-1]: if ch in digits_family or ch == optional_digit: k += 1 exp = decimal.Decimal('.' + '0' * k + '1') try: if value > 0: value = value.quantize(exp, rounding='ROUND_HALF_UP') else: value = value.quantize(exp, rounding='ROUND_HALF_DOWN') except decimal.InvalidOperation: pass # number too large, don't round elementpath.. 
chunks = decimal_to_string(value).lstrip('-').split('.') kwargs = { 'digits_family': digits_family, 'optional_digit': optional_digit, 'grouping_separator': grouping_separator, } result = format_digits(chunks[0], fmt_tokens[0], **kwargs) if len(fmt_tokens) > 1 and fmt_tokens[-1]: has_optional_digit = False for ch in fmt_tokens[-1]: if ch == optional_digit: has_optional_digit = True elif ch.isdigit() and has_optional_digit: raise self.error('FODF1310') if len(chunks) == 1: chunks.append(zero_digit) decimal_part = format_digits(chunks[1], fmt_tokens[-1], **kwargs) for ch in reversed(fmt_tokens[-1]): if ch == optional_digit: if decimal_part and decimal_part[-1] == zero_digit: decimal_part = decimal_part[:-1] else: if not decimal_part: decimal_part = zero_digit break if decimal_part: result += decimal_separator + decimal_part if not fmt_tokens[0] and result.startswith(zero_digit): result = result.lstrip(zero_digit) if exp_fmt: exp_digits = format_digits(str(abs(exp_value)), exp_fmt, **kwargs) if exp_value >= 0: result += f'{exponent_separator}{exp_digits}' else: result += f'{exponent_separator}-{exp_digits}' return prefix + result + suffix function('format-dateTime', nargs=(2, 5), sequence_types=('xs:dateTime?', 'xs:string', 'xs:string?', 'xs:string?', 'xs:string?', 'xs:string?')) function('format-date', nargs=(2, 5), sequence_types=('xs:date?', 'xs:string', 'xs:string?', 'xs:string?', 'xs:string?', 'xs:string?')) function('format-time', nargs=(2, 5), sequence_types=('xs:time?', 'xs:string', 'xs:string?', 'xs:string?', 'xs:string?', 'xs:string?')) @method('format-dateTime') @method('format-date') @method('format-time') def evaluate_format_date_time_functions(self: XPathFunction, context: ContextType = None) \ -> Emptiable[str]: cls: Type[Union[DateTime10, Date10, Time]] if self.symbol == 'format-dateTime': cls = DateTime10 invalid_markers = '' elif self.symbol == 'format-date': cls = Date10 invalid_markers = 'HhPmsf' else: cls = Time invalid_markers = 'YMDdFWwCE' if 
self.context is not None: context = self.context value = self.get_argument(context, cls=cls) picture = self.get_argument(context, index=1, required=True, cls=str) if len(self) not in [2, 5]: raise self.error('XPST0017') language = self.get_argument(context, index=2, cls=str) calendar = self.get_argument(context, index=3, cls=str) place = self.get_argument(context, index=4, cls=str) if value is None: return '' try: literals, markers = parse_datetime_picture(picture) except ElementPathError as err: err.token = self raise if invalid_markers: for mrk in markers: if mrk[1] in invalid_markers: msg = 'Invalid date formatting component {!r}'.format(mrk) raise self.error('FOFD1350', msg) result = [] if language not in ('en', 'it', None): language = 'en' result.append('[Language: en') if calendar is not None: if calendar.startswith('Q{}'): calendar = calendar[3:] if calendar not in ('AD', 'ISO', 'OS'): if context is None or calendar != context.default_calendar: if QName.is_valid(calendar): if ':' not in calendar: msg = f'unknown calendar in no namespace {calendar!r}' raise self.error('FOFD1340', msg) try: _ = get_expanded_name(calendar, self.parser.namespaces) except (KeyError, ValueError) as err: raise self.error('FOFD1340', str(err)) from None elif Patterns.extended_qname.search(calendar) is None: raise self.error('FOFD1340', f'Invalid calendar argument {calendar!r}') else: result.append('[' if not result else ', ') result.append('Calendar: AD') if place is not None and zoneinfo is not None: try: zone = zoneinfo.ZoneInfo(place.strip()) except zoneinfo.ZoneInfoNotFoundError: if not isinstance(context, XPathSchemaContext): raise self.error('FOFD1340', f'Invalid place argument {place!r}') else: value = value.astimezone(zone) if result: result.append(']') for k in range(len(markers)): result.append(literals[k]) try: result.append(parse_datetime_marker(markers[k], value, language)) except ElementPathError as err: err.token = self raise result.append(literals[-1]) return 
''.join(result) ### # String functions that use regular expressions @method(function('analyze-string', nargs=(2, 3), sequence_types=('xs:string?', 'xs:string', 'xs:string', 'element(fn:analyze-string-result)'))) def evaluate_analyze_string_function(self: XPathFunction, context: ContextType = None) \ -> ElementProtocol: if self.context is not None: context = self.context input_string = self.get_argument(context, default='', cls=str) pattern = self.get_argument(context, 1, required=True, cls=str) flags = 0 if len(self) > 2: for c in self.get_argument(context, 2, required=True, cls=str): if c in 'smix': flags |= getattr(re, c.upper()) elif c == 'q' and self.parser.version > '2': pattern = re.escape(pattern) else: raise self.error('FORX0001', "Invalid regular expression flag %r" % c) try: python_pattern = translate_pattern(pattern, flags, self.parser.xsd_version) compiled_pattern = re.compile(python_pattern, flags=flags) except (re.error, RegexError) as err: msg = "Invalid regular expression: {}" raise self.error('FORX0002', msg.format(str(err))) from None except OverflowError as err: raise self.error('FORX0002', err) from None if compiled_pattern.match('') is not None: raise self.error('FORX0003', "pattern matches a zero-length string") if context is None: raise self.missing_context() level = 0 escaped = False char_class = False group_levels = [0] for s in compiled_pattern.pattern: if escaped: escaped = False elif s == '\\': escaped = True elif char_class: if s == ']': char_class = False elif s == '[': char_class = True elif s == '(': group_levels.append(level) level += 1 elif s == ')': level -= 1 lines = ['<analyze-string-result xmlns="{}">'.format(XPATH_FUNCTIONS_NAMESPACE)] k = 0 while k < len(input_string): match = compiled_pattern.search(input_string, k) if match is None: lines.append('<non-match>{}</non-match>'.format(input_string[k:])) break elif not match.groups(): start, stop = match.span() if start > k: lines.append('<non-match>{}</non-match>'.format(input_string[k:start])) lines.append('<match>{}</match>'.format(input_string[start:stop])) k = stop 
else: start, stop = match.span() if start > k: lines.append('<non-match>{}</non-match>'.format(input_string[k:start])) k = start match_items = [] group_tmpl = '<group nr="{}">{}' empty_group_tmpl = '<group nr="{}"/>' unclosed_groups = 0 for idx in range(1, compiled_pattern.groups + 1): _start, _stop = match.span(idx) if _start < 0: continue elif _start > k: if unclosed_groups: for _ in range(unclosed_groups): match_items.append('</group>') unclosed_groups = 0 match_items.append(input_string[k:_start]) if _start == _stop: if group_levels[idx] <= group_levels[idx - 1]: for _ in range(unclosed_groups): match_items.append('</group>') unclosed_groups = 0 match_items.append(empty_group_tmpl.format(idx)) k = _stop elif idx == compiled_pattern.groups: k = _stop match_items.append(group_tmpl.format(idx, input_string[_start:k])) match_items.append('</group>') else: next_start = match.span(idx + 1)[0] if next_start < 0 or _stop < next_start or _stop == next_start \ and group_levels[idx + 1] <= group_levels[idx]: k = _stop match_items.append(group_tmpl.format(idx, input_string[_start:k])) match_items.append('</group>') else: k = next_start match_items.append(group_tmpl.format(idx, input_string[_start:k])) unclosed_groups += 1 for _ in range(unclosed_groups): match_items.append('</group>') match_items.append(input_string[k:stop]) k = stop lines.append('<match>{}</match>'.format(''.join(match_items))) lines.append('</analyze-string-result>') if self.parser.defuse_xml: root = context.etree.XML(defuse_xml(''.join(lines))) else: root = context.etree.XML(''.join(lines)) return cast(ElementProtocol, get_node_tree(root=root, namespaces=self.parser.namespaces)) ### # Functions and operators on nodes @method(function('path', nargs=(0, 1), sequence_types=('node()?', 'xs:string?'))) def evaluate_path_function(self: XPathFunction, context: ContextType = None) -> Emptiable[str]: if self.context is not None: context = self.context elif context is None: raise self.missing_context() if not self: item = context.item else: item = self.get_argument(context) if item is None: return [] if not isinstance(item, XPathNode): return 
[] elif context.root is None or (root_node := item.root_node) is not context.root: # The context has no root or the root is not the root of the item node return [] elif not isinstance(root_node, (DocumentNode, SchemaElementNode)): # It's a fragment: add fn:root() to select the root position path = item.path[len(root_node.path):] return f"Q{{{XPATH_FUNCTIONS_NAMESPACE}}}root(){path}" else: return item.path or [] @method(function('has-children', nargs=(0, 1), sequence_types=('node()?', 'xs:boolean'))) def evaluate_has_children_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context elif context is None: raise self.missing_context() if not self: item = context.item if not isinstance(item, XPathNode): raise self.error('XPTY0004', 'context item must be a node') else: item = self.get_argument(context) if item is None: return False elif not isinstance(item, XPathNode): raise self.error('XPTY0004', 'argument must be a node') return isinstance(item, DocumentNode) or \ isinstance(item, EtreeElementNode) and (len(item.obj) > 0 or item.obj.text is not None) @method(function('innermost', nargs=1, sequence_types=('node()*', 'node()*'))) def select_innermost_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[XPathNode]: if context is None: raise self.missing_context() context = copy(context) nodes = [e for e in self[0].select(context)] if any(not isinstance(x, XPathNode) for x in nodes): raise self.error('XPTY0004', 'argument must contain only nodes') ancestors = {x for context.item in nodes for x in context.iter_ancestors(axis='ancestor')} results = {x for x in nodes if x not in ancestors} yield from cast(List[XPathNode], sorted(results, key=node_position)) @method(function('outermost', nargs=1, sequence_types=('node()*', 'node()*'))) def select_outermost_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[XPathNode]: if context is None: raise self.missing_context() context 
= copy(context) nodes: Union[List[ItemType], Set[ItemType]] nodes = [e for e in self[0].select(context)] if any(not isinstance(x, XPathNode) for x in nodes): raise self.error('XPTY0004', 'argument must contain only nodes') results = set() if len(nodes) > 10: nodes = set(nodes) for item in nodes: context.item = item ancestors = {x for x in context.iter_ancestors(axis='ancestor')} if any(x in nodes for x in ancestors): continue results.add(item) yield from cast(List[XPathNode], sorted(results, key=node_position)) ## # Functions and operators on sequences @method(function('head', nargs=1, sequence_types=('item()*', 'item()?'))) def evaluate_head_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[ItemType]: for item in self[0].select(self.context or context): return item else: return [] @method(function('tail', nargs=1, sequence_types=('item()*', 'item()*'))) def select_tail_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: for k, item in enumerate(self[0].select(self.context or context)): if k: yield item @method(function('generate-id', nargs=(0, 1), sequence_types=('node()?', 'xs:string'))) def evaluate_generate_id_function(self: XPathFunction, context: ContextType = None) -> str: arg: Optional[NumericType] arg = self.get_argument(self.context or context, default_to_context=True) if arg is None: return '' elif not isinstance(arg, XPathNode): if self: raise self.error('XPTY0004', "argument is not a node") raise self.error('XPTY0004', "context item is not a node") else: return f'ID{id(arg)}' @method(function('uri-collection', nargs=(0, 1), sequence_types=('xs:string?', 'xs:anyURI*'))) def evaluate_uri_collection_function(self: XPathFunction, context: ContextType = None) \ -> List[AnyURI]: if self.context is not None: context = self.context uri = self.get_argument(context) if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return [] elif not self or uri is None: if 
context.default_resource_collection is None: raise self.error('FODC0002', 'no default resource collection has been defined') resource_collection = [AnyURI(context.default_resource_collection)] else: try: AnyURI(uri) except ValueError: raise self.error('FODC0004', 'invalid argument to fn:uri-collection') from None if not context.resource_collections: resource_collection = [] else: uri = self.get_absolute_uri(uri) try: resource_collection = [AnyURI(x) for x in context.resource_collections[uri]] except (KeyError, TypeError): url_parts = urlsplit(uri) if url_parts.scheme in ('', 'file') and \ not url_parts.path.startswith(':') and url_parts.path.endswith('/'): raise self.error('FODC0003', 'collection URI is a directory') raise self.error('FODC0002', '{!r} collection not found'.format(uri)) from None if not match_sequence_type(resource_collection, 'xs:anyURI*', self.parser): raise self.error('XPDY0050', "Type does not match sequence type xs:anyURI*") return resource_collection @method(function('unparsed-text', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:string?'))) @method(function('unparsed-text-lines', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:string*'))) def evaluate_unparsed_text_functions(self: XPathFunction, context: ContextType = None) \ -> Union[Emptiable[str], List[str]]: if self.context is not None: context = self.context href: Optional[str] = self.get_argument(context, cls=str) if href is None: return [] elif urlsplit(href).fragment: raise self.error('FOUT1170') encoding: str if len(self) > 1: encoding = self.get_argument(context, index=1, required=True, cls=str) else: encoding = 'UTF-8' try: uri = self.get_absolute_uri(href) except ValueError: raise self.error('FOUT1170') from None try: codecs.lookup(encoding) except LookupError: raise self.error('FOUT1190') from None if context is not None and uri in context.text_resources: text = context.text_resources[uri] else: try: with urlopen(uri) as rp: stream_reader = 
codecs.getreader(encoding)(rp) text = stream_reader.read() except URLError as err: raise self.error('FOUT1170', err) from None except ValueError as err: if len(self) > 1: raise self.error('FOUT1190', err) from None try: with urlopen(uri) as rp: stream_reader = codecs.getreader('UTF-16')(rp) text = stream_reader.read() except URLError as err: raise self.error('FOUT1170', err) from None except ValueError as err: raise self.error('FOUT1190', err) from None if context is not None: context.text_resources[uri] = text if not all(is_xml_codepoint(ord(s)) for s in text): raise self.error('FOUT1190') text = text.lstrip('\ufeff') if self.symbol == 'unparsed-text-lines': lines = Patterns.xml_newlines.split(text) return lines[:-1] if lines[-1] == '' else lines return text @method(function('unparsed-text-available', nargs=(1, 2), sequence_types=('xs:string?', 'xs:string', 'xs:boolean'))) def evaluate_unparsed_text_available_function(self: XPathFunction, context: ContextType = None) \ -> bool: if self.context is not None: context = self.context href = self.get_argument(context, cls=str) if href is None: return False elif urlsplit(href).fragment: return False if len(self) > 1: encoding = self.get_argument(context, index=1, required=True, cls=str) else: encoding = 'UTF-8' try: uri = self.get_absolute_uri(href) except ValueError: return False try: codecs.lookup(encoding) except LookupError: return False try: with urlopen(uri) as rp: stream_reader = codecs.getreader(encoding)(rp) for line in stream_reader: if any(not is_xml_codepoint(ord(s)) for s in line): return False except URLError: return False except ValueError: if len(self) > 1: return False else: return True # Fallback auto-detection with utf-16 try: with urlopen(uri) as rp: stream_reader = codecs.getreader('UTF-16')(rp) for line in stream_reader: if any(not is_xml_codepoint(ord(s)) for s in line): return False except (ValueError, URLError): return False else: return True @method(function('environment-variable', nargs=1, 
sequence_types=('xs:string', 'xs:string?'))) def evaluate_environment_variable_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[str]: if self.context is not None: context = self.context name: str = self.get_argument(context, required=True, cls=str) if context is None: raise self.missing_context() elif not context.allow_environment: return [] else: value = os.environ.get(name) return value if value is not None else [] @method(function('available-environment-variables', nargs=0, sequence_types=('xs:string*',))) def evaluate_available_environment_variables_function( self: XPathFunction, context: ContextType = None) -> List[str]: if self.context is not None: context = self.context elif context is None: raise self.missing_context() if not context.allow_environment: return [] else: return list(os.environ) ### # Parsing and serializing @method(function('parse-xml', nargs=1, sequence_types=('xs:string?', 'document-node(element(*))?'))) def evaluate_parse_xml_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[DocumentNode]: # TODO: resolve relative entity references with static base URI if self.context is not None: context = self.context arg: Optional[str] = self.get_argument(context, cls=str) if arg is None: return [] elif context is None: raise self.missing_context() etree = context.etree try: if self.parser.defuse_xml: root = etree.XML(defuse_xml(arg.encode('utf-8'))) else: root = etree.XML(arg.encode('utf-8')) except etree.ParseError: raise self.error('FODC0006') else: return cast(DocumentNode, get_node_tree(etree.ElementTree(root), self.parser.namespaces)) @method(function('parse-xml-fragment', nargs=1, sequence_types=('xs:string?', 'document-node()?'))) def evaluate_parse_xml_fragment_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[DocumentNode]: if self.context is not None: context = self.context arg: Optional[str] = self.get_argument(context, cls=str) if arg is None or isinstance(context, 
XPathSchemaContext): return [] elif context is None: raise self.missing_context() # Wrap argument in a fake document because an # XML document can have only one root element if arg.startswith('') xml_params = DECL_PARAM_PATTERN.findall(xml_declaration) if 'encoding' not in xml_params: raise self.error('FODC0006', "'encoding' argument is mandatory") for param in xml_params: if param not in ('version', 'encoding'): msg = f'unexpected parameter {param!r} in XML declaration' raise self.error('FODC0006', msg) if arg.lstrip().startswith('{arg}'), namespaces=self.parser.namespaces ) except etree.ParseError: raise self.error('FODC0006', str(err)) from None else: assert isinstance(dummy_element_node, ElementNode) return dummy_element_node.get_document_node(replace=True) else: return cast(DocumentNode, get_node_tree( root=etree.ElementTree(root), namespaces=self.parser.namespaces )) @method(function('serialize', nargs=(1, 2), sequence_types=( 'item()*', 'element(output:serialization-parameters)?', 'xs:string'))) def evaluate_serialize_function(self: XPathFunction, context: ContextType = None) -> str: # TODO full implementation of serialization with # https://www.w3.org/TR/xpath-functions-30/#xslt-xquery-serialization-30 if self.context is not None: context = self.context params = self.get_argument(context, index=1) if len(self) == 2 else None kwargs = get_serialization_params(params, token=self) if context is None: raise self.missing_context() elif isinstance(context, XPathSchemaContext): return '' # not applicable to schemas method_ = kwargs.get('method', 'xml') if method_ in ('xml', 'html', 'text'): etree_module = context.etree if context.namespaces: for pfx, uri in context.namespaces.items(): etree_module.register_namespace(pfx, uri) else: for pfx, uri in self.parser.namespaces.items(): etree_module.register_namespace(pfx, uri) return serialize_to_xml(self[0].select(context), etree_module, **kwargs) elif method_ == 'json': return serialize_to_json(self[0].select(context), 
token=self, **kwargs) else: return '' ### # Higher-order functions @method(function('function-lookup', nargs=2, sequence_types=('xs:QName', 'xs:integer', 'function(*)?'))) def evaluate_function_lookup_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[XPathFunction]: if self.context is not None: context = self.context qname: QName = self.get_argument(context, cls=QName, required=True) arity: int = self.get_argument(context, index=1, cls=int, required=True) if qname.namespace == '': return [] try: cls = self.parser.symbol_table[qname.expanded_name] except KeyError: try: cls = self.parser.symbol_table[qname.local_name] except KeyError: return [] assert issubclass(cls, XPathFunction) try: func = cls(self.parser, nargs=arity) except TypeError: return [] func.namespace = qname.namespace func.context = copy(context) return func @method(function('function-name', nargs=1, sequence_types=('function(*)', 'xs:QName?'))) def evaluate_function_name_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[QName]: if self.context is not None: context = self.context if isinstance(self[0], XPathFunction): func = self[0] else: func = self.get_argument(context) if not isinstance(func, XPathFunction): raise self.error('XPTY0004', "argument is not a function") else: name = func.qname return [] if name is None else name @method(function('function-arity', nargs=1, sequence_types=('function(*)', 'xs:integer'))) def evaluate_function_arity_function(self: XPathFunction, context: ContextType = None) -> int: if isinstance(self[0], XPathFunction): return self[0].arity func: XPathFunction func = self.get_argument(self.context or context, cls=XPathFunction, required=True) return func.arity @method(function('for-each', nargs=2, sequence_types=('item()*', 'function(item()) as item()*', 'item()*'))) def select_for_each_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: if self.context is not None: context = self.context func = 
self[1][1] if self[1].symbol == ':' else self[1] if not isinstance(func, XPathFunction): func = self.get_argument(context, index=1, cls=XPathFunction, required=True) assert isinstance(func, XPathFunction) for item in self[0].select(copy(context)): result = func(item, context=context) if isinstance(result, list): yield from result else: yield result @method(function('filter', nargs=2, sequence_types=('item()*', 'function(item()) as xs:boolean', 'item()*'))) def select_filter_function(self: XPathFunction, context: ContextType = None)\ -> Iterator[ItemType]: func = self[1][1] if self[1].symbol == ':' else self[1] if not isinstance(func, XPathFunction): func = self.get_argument(context, index=1, cls=XPathFunction, required=True) assert isinstance(func, XPathFunction) if func.nargs == 0: raise self.error('XPTY0004', f'invalid number of arguments {func.nargs}') for item in self[0].select(copy(context)): cond = func(item, context=context) if not isinstance(cond, bool): raise self.error('XPTY0004', 'a single boolean value required') if cond: yield item @method(function('fold-left', nargs=3, sequence_types=('item()*', 'item()*', 'function(item()*, item()) as item()*', 'item()*'))) def select_fold_left_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: func = self[2][1] if self[2].symbol == ':' else self[2] if not isinstance(func, XPathFunction): func = self.get_argument(context, index=2, cls=XPathFunction, required=True) assert isinstance(func, XPathFunction) if func.arity != 2: raise self.error('XPTY0004', "function arity must be 2") zero = self.get_argument(context, index=1) result = zero for item in self[0].select(copy(context)): result = func(result, item, context=context) if isinstance(result, list): yield from result else: yield result @method(function('fold-right', nargs=3, sequence_types=('item()*', 'item()*', 'function(item()*, item()) as item()*', 'item()*'))) def select_fold_right_function(self: XPathFunction, context: ContextType 
= None) \ -> Iterator[ItemType]: func = self[2][1] if self[2].symbol == ':' else self[2] if not isinstance(func, XPathFunction): func = self.get_argument(context, index=2, cls=XPathFunction, required=True) assert isinstance(func, XPathFunction) if func.arity != 2: raise self.error('XPTY0004', "function arity must be 2") zero = self.get_argument(context, index=1) result = zero sequence = [x for x in self[0].select(copy(context))] for item in reversed(sequence): result = func(item, result, context=context) if isinstance(result, list): yield from result else: yield result @method(function('for-each-pair', nargs=3, sequence_types=('item()*', 'item()*', 'function(item(), item()) as item()*', 'item()*'))) def select_for_each_pair_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: func = self[2][1] if self[2].symbol == ':' else self[2] if not isinstance(func, XPathFunction): func = self.get_argument(context, index=2, cls=XPathFunction, required=True) if not isinstance(func, XPathFunction): raise self.error('XPTY0004', "invalid type for 3rd argument {!r}".format(func)) elif func.arity != 2: raise self.error('XPTY0004', "function arity of 3rd argument must be 2") for item1, item2 in zip(self[0].select(copy(context)), self[1].select(copy(context))): result = func(item1, item2, context=context) if isinstance(result, list): yield from result else: yield result @method(function('namespace-node', nargs=0, label='kind test')) def select_namespace_node_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[NamespaceNode]: if context is None: raise self.missing_context() elif isinstance(context.item, NamespaceNode): yield context.item elif isinstance(context, XPathSchemaContext): return # deprecated for XP20+ and not needed for schema analysis elif isinstance(context.item, ElementNode): elem = context.item for context.item in elem.namespace_nodes: yield context.item ### # Redefined or extended functions 
XPath30Parser.unregister('data') XPath30Parser.unregister('document-uri') XPath30Parser.unregister('nilled') XPath30Parser.unregister('node-name') XPath30Parser.unregister('string-join') XPath30Parser.unregister('round') @method(function('data', nargs=(0, 1), sequence_types=('item()*', 'xs:anyAtomicType*'))) def select_data_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[AtomicType]: if self.context is not None: context = self.context if self: yield from self[0].atomization(context) elif context is None: raise self.missing_context() else: yield from self.atomize_item(context.item) @method(function('document-uri', nargs=(0, 1), sequence_types=('node()?', 'xs:anyURI?'))) def evaluate_document_uri_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[AnyURI]: if self.context is not None: context = self.context elif context is None: raise self.missing_context() arg: Optional[NumericType] = self.get_argument(context, default_to_context=True) if isinstance(arg, DocumentNode): uri = arg.document_uri if uri is not None: return AnyURI(uri) elif isinstance(context.root, DocumentNode): if context.documents: for uri, doc in context.documents.items(): if doc and doc.document is context.root.document: return AnyURI(uri) return [] @method(function('nilled', nargs=(0, 1), sequence_types=('node()?', 'xs:boolean?'))) def evaluate_nilled_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[bool]: arg: Optional[NumericType] = self.get_argument(self.context or context, default_to_context=True) if arg is None: return [] elif isinstance(arg, XPathNode): result = arg.nilled return result if result is not None else [] else: raise self.error('XPTY0004', 'an XPath node required') @method(function('node-name', nargs=(0, 1), sequence_types=('node()?', 'xs:QName?'))) def evaluate_node_name_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[QName]: arg: Optional[NumericType] arg = 
self.get_argument(self.context or context, default_to_context=True) if arg is None: return [] elif isinstance(arg, XPathNode): name = arg.name if name is None: return [] elif name.startswith('{'): # name is a QName in extended format namespace, local_name = split_expanded_name(name) for pfx, uri in self.parser.namespaces.items(): if uri == namespace: if not pfx: return QName(uri, local_name) return QName(uri, '{}:{}'.format(pfx, local_name)) raise self.error('FONS0004', 'no prefix found for namespace {}'.format(namespace)) else: # name is a local name return QName(self.parser.namespaces.get('', ''), name) else: raise self.error('XPTY0004', 'an XPath node required') @method(function('string-join', nargs=(1, 2), sequence_types=('xs:string*', 'xs:string', 'xs:string'))) def evaluate_string_join_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context items = [ self.validated_value(s, cls=str, promote=AnyURI, index=k) for k, s in enumerate(self[0].atomization(context)) ] if len(self) == 1: return ''.join(items) separator: str = self.get_argument(context, 1, required=True, cls=str) return separator.join(items) @method(function('round', nargs=(1, 2), sequence_types=('xs:numeric?', 'xs:integer', 'xs:numeric?'))) def evaluate_round_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[NumericType]: if self.context is not None: context = self.context arg: Optional[NumericType] = self.get_argument(context) if arg is None: return [] elif isinstance(arg, XPathNode) or self.parser.compatibility_mode: arg = self.number_value(arg) if isinstance(arg, float) and (math.isnan(arg) or math.isinf(arg)): return arg precision: int = self.get_argument(context, index=1, default=0, cls=int) try: if precision < 0: return type(arg)(round(arg, precision)) # type: ignore[call-overload, arg-type] number = decimal.Decimal(arg) exponent = decimal.Decimal('1') / 10 ** precision if number > 0: return 
type(arg)(number.quantize(exponent, rounding='ROUND_HALF_UP')) else: return type(arg)(number.quantize(exponent, rounding='ROUND_HALF_DOWN')) except TypeError as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FORG0006', err) from None except decimal.InvalidOperation: if isinstance(arg, str): if isinstance(context, XPathSchemaContext): return [] raise self.error('XPTY0004') from None return round(arg) except decimal.DecimalException as err: if isinstance(context, XPathSchemaContext): return [] raise self.error('FOCA0002', err) from None # # XSD list-based constructors @XPath30Parser.constructor('NMTOKENS', sequence_types=('xs:NMTOKEN*',)) def cast_nmtokens_list_type(self: XPathConstructor, value: AtomicType) \ -> List[NMToken]: cast_func = xsd10_atomic_types['NMTOKEN'] if isinstance(value, UntypedAtomic): values = value.value.split() or [value.value] elif hasattr(value, 'split'): values = value.split() or [value] else: raise self.error('FORG0001') try: return [cast_func(x) for x in values] except ValueError as err: raise self.error('FORG0001', err) from None @XPath30Parser.constructor('IDREFS', sequence_types=('xs:IDREF*',)) def cast_idrefs_list_type(self: XPathConstructor, value: AtomicType) \ -> List[Idref]: cast_func = xsd10_atomic_types['IDREF'] if isinstance(value, UntypedAtomic): values = value.value.split() or [value.value] elif hasattr(value, 'split'): values = value.split() or [value] else: raise self.error('FORG0001') try: return [cast_func(x) for x in values] except ValueError as err: raise self.error('FORG0001', err) from None @XPath30Parser.constructor('ENTITIES', sequence_types=('xs:ENTITY*',)) def cast_entities_list_type(self: XPathConstructor, value: AtomicType) \ -> List[Entity]: cast_func = xsd10_atomic_types['ENTITY'] if isinstance(value, UntypedAtomic): values = value.value.split() or [value.value] elif hasattr(value, 'split'): values = value.split() or [value] else: raise self.error('FORG0001') try: return 
[cast_func(x) for x in values] except ValueError as err: raise self.error('FORG0001', err) from None ### # In XPath 3.0+ the 'error' keyword has to be used both for fn:error() and xs:error() XPath30Parser.unregister('error') # TODO: apply sequence_types=('xs:anyAtomicType?', 'xs:error?') for xs:error @XPath30Parser.constructor('error', bp=90, label=('function', 'constructor function'), nargs=(0, 3), sequence_types=('xs:QName?', 'xs:string', 'item()*', 'none')) def cast_error_type(self: XPathConstructor, value: AtomicType) -> Emptiable[None]: if value is None or value == []: return [] msg = f"Cast {value!r} to xs:error is not possible" raise self.error('FORG0001', msg) @method('error') def nud_error_type_and_function(self: XPathConstructor) -> XPathConstructor: self.clear() if not self.parser.parse_arguments: return self try: self.parser.advance('(') if self.namespace == XSD_NAMESPACE: self.label = 'constructor function' self.nargs = 1 if self.parser.xsd_version == '1.0': raise self.error('XPST0051', 'xs:error is not defined with XSD 1.0') self.append(self.parser.expression(5)) else: self.label = 'function' for k in range(3): if self.parser.next_token.symbol == ')': break self.append(self.parser.expression(5)) if self.parser.next_token.symbol == ')': break self.parser.advance(',') self.parser.advance(')') except SyntaxError: raise self.error('XPST0017') from None else: return self @method('error') def evaluate_error_type_and_function(self: XPathConstructor, context: ContextType = None) \ -> Emptiable[ElementPathError]: if self.context is not None: context = self.context error: Optional[QName] if self.label == 'constructor function': return cast(ElementPathError, self.cast(self.get_argument(context))) elif not self: raise self.error('FOER0000') elif len(self) == 1: error = self.get_argument(context, cls=QName) if error is None and self.parser.version == '3.0': raise self.error('XPTY0004', "an xs:QName expected") raise self.error(error or 'FOER0000') else: error = 
self.get_argument(context, cls=QName) description: Optional[str] = self.get_argument(context, index=1, cls=str) raise self.error(error or 'FOER0000', description) sissaschool-elementpath-d3688c7/elementpath/xpath30/_xpath30_operators.py000066400000000000000000000217631476131650400266330ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """ XPath 3.0 implementation - part 2 (symbols, operators and expressions) """ from copy import copy from typing import Any, cast, List, Type, Union from elementpath._typing import Iterator from elementpath.aliases import InputType from elementpath.namespaces import XPATH_FUNCTIONS_NAMESPACE, XSD_NAMESPACE from elementpath.xpath_tokens import XPathToken, ValueToken, XPathFunction, \ XPathMap, XPathArray from elementpath.xpath_context import ContextType, ItemType from elementpath.datatypes import QName from .xpath30_parser import XPath30Parser __all__ = ['XPath30Parser'] register = XPath30Parser.register infix = XPath30Parser.infix method = XPath30Parser.method register(':=') ### # Placeholder symbol (used also for optional occurrence) XPath30Parser.unregister('?') register('?', bases=(ValueToken,)) @method('?') def nud_placeholder_symbol(self: ValueToken) -> ValueToken: return self @method('?') def evaluate_placeholder_symbol(self: ValueToken, context: ContextType = None) -> ValueToken: return self ### # Braced/expanded QName(s) XPath30Parser.duplicate('{', 'Q{', pattern=r'Q\{') XPath30Parser.unregister('{') XPath30Parser.unregister('}') register('{') register('}', bp=100) XPath30Parser.unregister('(') @method(register('(', lbp=80, rbp=80, label='expression')) def nud_parenthesized_expression(self: XPathToken) -> XPathToken: if
self.parser.next_token.symbol != ')': self[:] = self.parser.expression(), self.parser.advance(')') return self @method('(') def led_parenthesized_expression(self: XPathToken, left: XPathToken) -> XPathToken: if left.symbol in ('(name)', 'Q{'): if left.value in self.parser.RESERVED_FUNCTION_NAMES: msg = f"{left.value!r} is not allowed as function name" raise left.error('XPST0003', msg) else: raise left.error('XPST0017', 'unknown function {!r}'.format(left.value)) elif left.symbol == ':' and left[1].symbol == '(name)': if left[1].namespace == XSD_NAMESPACE: msg = 'unknown constructor function {!r}'.format(left[1].value) raise left[1].error('XPST0017', msg) raise left.error('XPST0017', 'unknown function {!r}'.format(left.value)) if self.parser.next_token.symbol != ')': self[:] = left, self.parser.expression() else: self[:] = left, self.parser.advance(')') return self @method('(') def evaluate_parenthesized_expression(self: XPathToken, context: ContextType = None) \ -> Union[ItemType, List[ItemType], XPathToken]: if not self: return [] value = self[0].evaluate(context) if isinstance(value, list) and len(value) == 1: value = value[0] if len(self) > 1: if isinstance(value, XPathFunction): func: XPathFunction func = value tokens = self[1].get_argument_tokens() if any(x.symbol == '?' and not x for x in tokens): func.check_arguments_number(len(tokens)) func = copy(func) func[:] = tokens func.to_partial_function() return func arguments: List[InputType[ItemType]] arguments = [tk.evaluate(context) for tk in tokens] if func.label == 'partial function' and func[0].symbol == '?' 
and len(func[0]): if context is None: raise self.missing_context() return func(context.item, *arguments, context=context) return func(*arguments, context=context) elif self[0].symbol == '(': if not isinstance(value, list): return value elif any(not isinstance(x, XPathFunction) for x in value): return value if isinstance(value, XPathToken) and value.symbol == '?': return value raise self.error('XPTY0004', f'an XPath function expected, not {type(value)!r}') if isinstance(value, (XPathMap, XPathArray)) or \ not isinstance(value, XPathFunction) or self[0].span[0] > self.span[0]: return value else: return value(context=context) @method(infix('||', bp=32)) def evaluate_union_operator(self: XPathToken, context: ContextType = None) -> str: return self.string_value(self.get_argument(context)) + \ self.string_value(self.get_argument(context, index=1)) @method(infix('!', bp=72)) def select_simple_map_operator(self: XPathToken, context: ContextType = None) \ -> Iterator[ItemType]: if context is None: raise self.missing_context() for context.item in context.inner_focus_select(self[0]): for result in self[1].select(copy(context)): yield result ### # 'let' expressions @method(register('let', lbp=20, rbp=20, label='let expression')) def nud_let_expression(self: XPathToken) -> XPathToken: del self[:] if self.parser.next_token.symbol != '$': return self.as_name() while True: self.parser.next_token.expected('$') variable = self.parser.expression(5) self.append(variable) self.parser.advance(':=') expr = self.parser.expression(5) self.append(expr) if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('return') self.append(self.parser.expression(5)) return self @method('let') def select_let_expression(self: XPathToken, context: ContextType = None) \ -> Iterator[ItemType]: if context is None: raise self.missing_context() for k in range(0, len(self) - 1, 2): varname = cast(str, self[k][0].value) value = self[k+1].evaluate(context) 
context.variables[varname] = value yield from self[-1].select(context) @method('#', bp=90) def led_function_reference(self: XPathToken, left: XPathToken) -> XPathToken: if not left.label.endswith('function'): left.expected(':', '(name)', 'Q{') self[:] = left, self.parser.expression(rbp=90) self[1].expected('(integer)') return self @method('#') def evaluate_function_reference(self: XPathToken, context: ContextType = None) -> XPathFunction: token_class: Type[Union[XPathFunction, XPathToken]] namespace: Any name: Any arity = self[1].value assert arity is None or isinstance(arity, int) if isinstance(self[0], XPathFunction): token_class = self[0].__class__ namespace = self[0].namespace name = self[0].name if isinstance(name, QName): qname = name else: qname = QName(None, f'anonymous {self[0].label}'.replace(' ', '-')) else: if self[0].symbol == ':': namespace = self[0][1].namespace name = self[0].value elif self[0].symbol == 'Q{': namespace = self[0][0].value name = self[0][1].value elif self[0].value not in self.parser.RESERVED_FUNCTION_NAMES: namespace = XPATH_FUNCTIONS_NAMESPACE name = self[0].value else: msg = f"{self[0].value!r} is not allowed as function name" raise self.error('XPST0003', msg) assert isinstance(name, str) assert isinstance(namespace, str) or namespace is None qname = QName(namespace, name) namespace = qname.namespace local_name = qname.local_name # Generic rule for XSD constructor functions if namespace == XSD_NAMESPACE and arity != 1: raise self.error('XPST0017', f"unknown function {qname.qname}#{arity}") # Special checks for multirole tokens if namespace == XPATH_FUNCTIONS_NAMESPACE and \ local_name in ('QName', 'dateTime') and arity == 1: raise self.error('XPST0017', f"unknown function {qname.qname}#{arity}") try: token_class = self.parser.symbol_table[qname.expanded_name] except KeyError: try: token_class = self.parser.symbol_table[local_name] except KeyError: msg = f"unknown function {qname.qname}#{arity}" raise self.error('XPST0017', msg) 
from None if token_class.symbol == 'function' or not token_class.label.endswith('function'): raise self.error('XPST0003') assert issubclass(token_class, XPathFunction) try: func = token_class(self.parser, nargs=arity) except TypeError: msg = f"unknown function {qname.qname}#{arity}" raise self.error('XPST0017', msg) from None else: if func.namespace is None: func.namespace = namespace elif func.namespace != namespace: raise self.error('XPST0017', f"unknown function {qname.qname}#{arity}") func.context = copy(context) return func sissaschool-elementpath-d3688c7/elementpath/xpath30/xpath30_helpers.py000066400000000000000000000560161476131650400261170ustar00rootroot00000000000000# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import calendar import datetime import decimal import re from typing import Any, List, Optional, Tuple, Union from unicodedata import category from elementpath._typing import Iterator from elementpath.exceptions import xpath_error from elementpath.regex import translate_pattern from ._translation_maps import ALPHABET_CHARACTERS, OTHER_NUMBERS, ROMAN_NUMERALS_MAP, \ NUM_TO_MONTH_MAPS, NUM_TO_WEEKDAY_MAPS, NUM_TO_WORD_MAPS, MILITARY_TIME_ZONES PRESENTATION_FORMATS = {'i', 'I', 'w', 'W', 'Ww', 'a', 'A', 'n', 'N', 'Nn', 'Z'} PICTURE_PATTERN = re.compile(r'\[(?!\[)[^]]+]') UNICODE_DIGIT_PATTERN = re.compile(r'\d') DECIMAL_DIGIT_PATTERN = re.compile(translate_pattern(r'^((\p{Nd}|#|[^\p{N}\p{L}])+?)$')) FMT_MODIFIER_PATTERN = re.compile(r'([co](\(.+\))?)?[at]?$') WIDTH_PATTERN = re.compile(r'^([0-9]+|\*)(-([0-9]+|\*))?$') MODIFIER_PATTERN = re.compile(r'^([co](\(.+\))?)?[at]?$') def decimal_to_string(value: decimal.Decimal) -> str: """ Convert a Decimal value to a string representation that 
does not use exponent notation and keeps its decimal digits. """ exponent: Any sign, digits, exponent = value.as_tuple() if not exponent: result = ''.join(str(x) for x in digits) elif exponent > 0: result = ''.join(str(x) for x in digits) + '0' * exponent else: result = ''.join(str(x) for x in digits[:exponent]) if not result: result = '0' result += '.' if len(digits) >= -exponent: result += ''.join(str(x) for x in digits[exponent:]) else: result += '0' * (-exponent - len(digits)) result += ''.join(str(x) for x in digits) return '-' + result if sign else result def int_to_roman(num: int) -> str: """ Convert an integer to a Roman numeral. """ def roman_num(value: int) -> Iterator[str]: if not value: yield '0' return elif value < 0: yield '-' value = abs(value) for base, roman in ROMAN_NUMERALS_MAP.items(): if value: yield roman * (value // base) value %= base return ''.join(x for x in roman_num(num)) def int_to_alphabetic(num: int, reference: Optional[str] = None) -> str: if not reference or len(reference) > 1: try: alphabet = ALPHABET_CHARACTERS[reference] except KeyError: msg = "formatting for language {!r} is not supported" raise NotImplementedError(msg.format(reference)) elif reference.isdigit(): for alphabet in OTHER_NUMBERS: if reference in alphabet: break else: alphabet = '1234567890' else: for alphabet in ALPHABET_CHARACTERS.values(): if reference.lower() in alphabet: break else: alphabet = '1234567890' base = len(alphabet) if not num: return '0' chars = [] negative = num < 0 num = abs(num) - 1 while num >= 0: chars.append(alphabet[num % base]) num = (num // base) - 1 if negative: chars.append('-') return ''.join(reversed(chars)) def int_to_month(num: int, lang: Optional[str] = None) -> str: if lang is None: lang = 'en' try: months_map = NUM_TO_MONTH_MAPS[lang] except KeyError: months_map = NUM_TO_MONTH_MAPS['en'] return months_map[num] def int_to_weekday(num: int, lang: Optional[str] = None) -> str: if lang is None: lang = 'en' try: weekday_map = NUM_TO_WEEKDAY_MAPS[lang]
except KeyError: weekday_map = NUM_TO_WEEKDAY_MAPS['en'] return weekday_map[num] def week_in_month(dt: datetime.datetime) -> int: month_cal = calendar.monthcalendar(dt.year, dt.month) for k, week_cal in enumerate(month_cal, start=1): if dt.day in week_cal: if month_cal[0][3]: return k elif k > 1: return k - 1 if dt.month > 1: prev_month_cal = calendar.monthcalendar(dt.year, dt.month - 1) else: prev_month_cal = calendar.monthcalendar(dt.year - 1, 12) if prev_month_cal[0][3]: return len(prev_month_cal) else: return len(prev_month_cal) - 1 else: raise ValueError(f'{dt.day} does not match related calendar') def format_digits(digits: str, fmt: str, digits_family: str = '0123456789', optional_digit: str = '#', grouping_separator: Optional[str] = None) -> str: result = [] iter_num_digits = reversed(digits) num_digit = next(iter_num_digits) for fmt_char in reversed(fmt): if fmt_char.isdigit() or fmt_char == optional_digit: if num_digit: result.append(digits_family[ord(num_digit) - 48]) num_digit = next(iter_num_digits, '') elif fmt_char != optional_digit: result.append(digits_family[0]) elif not result or not result[-1].isdigit() and grouping_separator \ and result[-1] != grouping_separator: raise xpath_error('FODF1310', "invalid grouping in picture argument") else: result.append(fmt_char) if num_digit: separator = '' _separator = {x for x in fmt if not x.isdigit() and x != optional_digit} if len(_separator) != 1: repeat = None else: separator = _separator.pop() chunks = fmt.split(separator) if len(chunks[0]) > len(chunks[-1]): repeat = None elif all(len(item) == len(chunks[-1]) for item in chunks[1:-1]): repeat = len(chunks[-1]) + 1 else: repeat = None if repeat is None: while num_digit: result.append(digits_family[ord(num_digit) - 48]) num_digit = next(iter_num_digits, '') else: while num_digit: if ((len(result) + 1) % repeat) == 0: result.append(separator) result.append(digits_family[ord(num_digit) - 48]) num_digit = next(iter_num_digits, '') if grouping_separator: 
return ''.join(reversed(result)).lstrip(grouping_separator) while result and \ category(result[-1]) not in ('Nd', 'Nl', 'No', 'Lu', 'Ll', 'Lt', 'Lm', 'Lo'): result.pop() return ''.join(reversed(result)) def ordinal_suffix(value: int) -> str: value = abs(value) % 100 if 3 < value < 20: return 'th' value %= 10 if value == 1: return 'st' elif value == 2: return 'nd' elif value == 3: return 'rd' else: return 'th' def to_ordinal_en(num_as_words: str) -> str: if num_as_words.endswith('one'): return num_as_words[:-3] + 'first' elif num_as_words.endswith('two'): return num_as_words[:-3] + 'second' elif num_as_words.endswith('three'): return num_as_words[:-5] + 'third' elif num_as_words.endswith('eight'): return num_as_words + 'h' elif num_as_words.endswith('nine'): return num_as_words[:-1] + 'th' elif num_as_words.endswith('y'): return num_as_words[:-1] + 'ieth' elif num_as_words.endswith('e'): return num_as_words[:-2] + 'fth' else: return num_as_words + 'th' def to_ordinal_it(num_as_words: str, fmt_modifier: str) -> str: if '%spellout-ordinal-feminine' in fmt_modifier: suffix = 'a' elif fmt_modifier.startswith('o(-'): suffix = fmt_modifier[3:-1] else: suffix = '' ordinal_map = { 'zero': '', 'uno': 'primo', 'due': 'secondo', 'tre': 'terzo', 'quattro': 'quarto', 'cinque': 'quinto', 'sei': 'sesto', 'sette': 'settimo', 'otto': 'ottavo', 'nove': 'nono', 'dieci': 'decimo', } try: value = ordinal_map[num_as_words] except KeyError: if num_as_words[-1] in 'eo': value = num_as_words[:-1] + 'esimo' else: value = num_as_words + 'esimo' if value and suffix: return value[:-1] + suffix return value def int_to_words(num: int, lang: Optional[str] = None, fmt_modifier: str = '') -> str: def word_num(value: int) -> Iterator[str]: if not value: yield num_map[value] for base, word in num_map.items(): if base >= 1: floor = value // base if not floor: continue elif base >= 100: yield from word_num(floor) yield ' ' yield word value %= base if not value: break elif base < 100: yield '-' elif base 
== 100: if lang == 'en': yield ' and ' else: yield ' ' try: num_map = NUM_TO_WORD_MAPS[lang] # type: ignore[index] except KeyError: lang = 'en' num_map = NUM_TO_WORD_MAPS[lang] if num < 0: result = '-' + ''.join(x for x in word_num(abs(num))) else: result = ''.join(x for x in word_num(num)) if not fmt_modifier.startswith('o'): return result if lang == 'en': return to_ordinal_en(result) elif lang == 'it': return to_ordinal_it(result, fmt_modifier) else: return result def parse_datetime_picture(picture: str) -> Tuple[List[str], List[str]]: """ Analyze a picture argument of XPath 3.0+ formatting functions. :param picture: the picture string. :return: a couple of lists containing the literal parts and markers. """ min_value: Union[int, str] max_value: Union[None, int, str] literals = [] for lit in PICTURE_PATTERN.split(picture): if '[' in lit.replace('[[', ''): raise xpath_error('FOFD1340', "Invalid character '[' in picture literal") elif ']' in lit.replace(']]', ''): raise xpath_error('FOFD1340', "Invalid character ']' in picture literal") else: literals.append(lit.replace('[[', '[').replace(']]', ']')) markers = [x.group().replace(' ', '').replace('\n', '').replace('\t', '') for x in PICTURE_PATTERN.finditer(picture)] assert len(markers) == (len(literals) - 1) msg_tmpl = 'Invalid formatting component {!r}' for value in markers: if value[1] not in 'YMDdFWwHhPmsfZzCE': raise xpath_error('FOFD1340', msg_tmpl.format(value)) if ',' not in value: presentation = value[2:-1] else: presentation, width = value[2:-1].rsplit(',', maxsplit=1) if WIDTH_PATTERN.match(width) is None: raise xpath_error('FOFD1340', f'Invalid width modifier {value!r}') elif '-' not in width: if '*' not in width and not int(width): raise xpath_error('FOFD1340', f'Invalid width modifier {value!r}') elif '*' not in width: min_value, max_value = map(int, width.split('-')) if min_value < 1 or max_value < min_value: raise xpath_error('FOFD1340', msg_tmpl.format(value)) else: min_value, max_value = 
width.split('-') if min_value != '*' and not int(min_value): raise xpath_error('FOFD1340', f'Invalid width modifier {value!r}') if max_value != '*' and not int(max_value): raise xpath_error('FOFD1340', f'Invalid width modifier {value!r}') if len(presentation) > 1 and presentation[-1] in 'atco': presentation = presentation[:-1] if not presentation or presentation in PRESENTATION_FORMATS: pass elif DECIMAL_DIGIT_PATTERN.match(presentation) is None: raise xpath_error('FOFD1340', msg_tmpl.format(value)) else: if value[1] == 'f': if presentation[0] == '#' and any(ch.isdigit() for ch in presentation): msg = 'picture argument has an invalid primary format token' raise xpath_error('FOFD1340', msg) elif presentation[0].isdigit() and '#' in presentation: msg = 'picture argument has an invalid primary format token' raise xpath_error('FOFD1340', msg) # Check digits set uniformity cp = None for ch in reversed(presentation): if not ch.isdigit(): continue elif cp is None: cp = ord(ch) elif abs(ord(ch) - cp) > 10: raise xpath_error('FOFD1340', msg_tmpl.format(value)) return literals, markers def parse_datetime_marker(marker: str, dt: datetime.datetime, lang: Optional[str] = None) -> str: min_width: int max_width: Optional[int] component = marker[1] fmt_token = marker[2:-1] if ',' not in fmt_token: presentation, width = fmt_token, '' else: presentation, width = fmt_token.rsplit(',', maxsplit=1) if not presentation: fmt_modifier = '' if component in 'Hhf': presentation = '1' elif component in 'ms': presentation = '01' elif component in 'Zz': presentation = '01:01' else: presentation = 'n' elif presentation == 'a': fmt_modifier = '' else: _match = FMT_MODIFIER_PATTERN.search(presentation) if _match is None: fmt_modifier = '' else: fmt_modifier = _match.group(0) if fmt_modifier: presentation = presentation[:-len(fmt_modifier)] if presentation.startswith('#') and presentation.endswith('#'): msg_tmpl = 'Invalid formatting component {!r}' raise xpath_error('FOFD1340', 
msg_tmpl.format(component)) for pch in presentation: if pch.isdigit(): zero_cp = ord(pch) - int(pch) zero_ch = chr(zero_cp) break else: zero_cp, zero_ch = ord('0'), '0' digits = sum(c.isdigit() for c in presentation) opt_digits = presentation.count('#') if not width or width == '*': if digits > 1: min_width, max_width = digits, digits + opt_digits else: min_width, max_width = 0, None else: min_width, max_width = parse_width(width) if digits > 1: min_width = max(min_width, digits) if max_width: max_width = max(max_width, digits + opt_digits) if component == 'Y': value = str(abs(dt.year)) elif component == 'M': if presentation.lower().startswith('n') and lang is not None: value = int_to_month(dt.month, lang) else: value = str(dt.month) elif component == 'D': value = str(dt.day) elif component == 'H': value = str(dt.hour) elif component == 'h': if dt.hour == 0: value = '12' elif dt.hour > 12: value = str(dt.hour % 12) else: value = str(dt.hour) elif component == 'P': value = 'a.m.' if dt.hour < 12 else 'p.m.' 
elif component == 'm': value = str(dt.minute) elif component == 's': value = str(dt.second) elif component == 'f': value = str('{:06}'.format(dt.microsecond)) elif component == 'z' or component == 'Z': if presentation == 'N': value = dt.tzname() or '' elif dt.tzinfo is None: value = '+00:00' else: value = str(dt) if value.endswith('Z'): value = '+00:00' else: value = value[-6:] elif component == 'W': value = str(dt.isocalendar()[1]) elif component == 'w': value = str(week_in_month(dt)) elif component == 'F': if presentation.lower().startswith('n') and lang is not None: value = int_to_weekday(dt.isocalendar()[2], lang) else: value = str(dt.isocalendar()[2]) elif component == 'E': if dt.year < 0: value = 'BC' else: value = 'AD' elif component == 'd': delta = dt - type(dt)(dt.year, 1, 1) value = str(1 + delta.days) else: msg_tmpl = 'Invalid formatting component {!r}' raise xpath_error('FOFD1340', msg_tmpl.format(component)) sign = '' left_to_right = component != 'Y' if presentation == 'n': fmt_chunk = value.lower() elif presentation == 'N': fmt_chunk = value.upper() elif presentation == 'Nn': fmt_chunk = value.title() elif presentation == 'I' or presentation == 'i': fmt_chunk = value elif presentation == 'Z' and component == 'Z': if dt.tzinfo is None: fmt_chunk = MILITARY_TIME_ZONES[None] elif value.endswith(':00'): fmt_chunk = MILITARY_TIME_ZONES.get(value[:3], value) else: fmt_chunk = value elif presentation == 'w': fmt_chunk = int_to_words(int(value), lang, fmt_modifier) elif presentation == 'W': fmt_chunk = int_to_words(int(value), lang, fmt_modifier).upper() elif presentation == 'Ww': fmt_chunk = int_to_words(int(value), lang, fmt_modifier).title() elif presentation == 'a': fmt_chunk = int_to_alphabetic(int(value), lang) elif presentation == 'A': fmt_chunk = int_to_alphabetic(int(value), lang).upper() else: left_to_right = False k = 0 pch = '' chars = [] # Extract the sign if value.startswith('-') or value.startswith('+'): sign = value[0] value =
value[1:] if component in 'zZ': if presentation.isdigit(): if len(presentation) <= 2: if value.endswith(':00'): value = value[:-3] left_to_right = True elif len(presentation) == 1: presentation = '#0:01' min_width, max_width = 3, 4 else: presentation = '01:01' min_width = max_width = 4 elif len(presentation) == 3: presentation = '#001' min_width, max_width = 3, 4 elif presentation.replace(':', '', 1).isdigit(): if len(presentation) == 4: presentation = '#0:01' min_width, max_width = 3, 4 if component != 'f': presentation = ''.join(reversed(presentation)) value = ''.join(reversed(value)) for ch in value: try: pch = presentation[k] except IndexError: if ch == '0' and not pch.isdigit(): break else: k += 1 while pch != '#' and not pch.isdigit(): chars.append(pch) min_width += 1 if max_width is not None: max_width += 1 try: pch = presentation[k] except IndexError: break else: k += 1 else: if ch.isdigit(): chars.append(ch) if component != 'f': fmt_chunk = ''.join(reversed(chars)) else: fmt_chunk = ''.join(chars) if 'o' in fmt_modifier: try: fmt_chunk += ordinal_suffix(int(fmt_chunk)) except ValueError: pass else: min_width += 2 if max_width is not None: max_width += 2 if len(fmt_chunk) < min_width and component not in 'PzZ': if component in 'f': fmt_chunk += zero_ch * (min_width - len(fmt_chunk)) else: fmt_chunk = zero_ch * (min_width - len(fmt_chunk)) + fmt_chunk if max_width: if left_to_right or component in 'f': fmt_chunk = fmt_chunk[:max_width] else: fmt_chunk = fmt_chunk[max(0, len(fmt_chunk)-max_width):] if component in 'zZ': if not min_width: fmt_chunk = fmt_chunk.lstrip('0') if not fmt_chunk: return 'Z' if component == 'Z' else 'GMT' + sign + '0' else: try: nz_first = min(k for k in range(len(fmt_chunk)) if fmt_chunk[k] != zero_ch) except ValueError: fmt_chunk = fmt_chunk[max(0, len(fmt_chunk) - min_width):] else: fmt_chunk = fmt_chunk[max(0, min(nz_first, len(fmt_chunk) - min_width)):] elif min_width == 3 and component == 'F': fmt_chunk = fmt_chunk[:3] elif 
min_width or component == 'f': try: nz_last = max(k for k in range(len(fmt_chunk)) if fmt_chunk[k] != zero_ch) except ValueError: nz_last = 0 fmt_chunk = fmt_chunk[:max(min_width, nz_last + 1)] if zero_ch != '0': fmt_chunk = ''.join(chr(zero_cp + int(ch)) if ch.isdigit() else ch for ch in fmt_chunk) if component == 'z': return 'GMT' + sign + fmt_chunk if presentation == 'I': return sign + int_to_roman(int(fmt_chunk)) elif presentation == 'i': return sign + int_to_roman(int(fmt_chunk)).lower() return sign + fmt_chunk def parse_width(width: str) -> Tuple[int, Optional[int]]: min_width: Union[str, int] max_width: Union[str, int, None] if WIDTH_PATTERN.match(width) is None: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') elif '-' not in width: if width == '*': return 0, None min_width = int(width) if not min_width: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') return min_width, None elif '*' not in width: min_width, max_width = map(int, width.split('-')) if not min_width or max_width < min_width: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') return min_width, max_width else: min_width, max_width = width.split('-') if min_width == '*': min_width = 0 else: min_width = int(min_width) if not min_width: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') if max_width == '*': return min_width, None else: max_width = int(max_width) if not max_width: raise xpath_error('FOFD1340', f'Invalid width modifier {width!r}') return min_width, max_width sissaschool-elementpath-d3688c7/elementpath/xpath30/xpath30_parser.py000066400000000000000000000074341476131650400257510ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 3.0 implementation - part 1 (parser class) Refs: - https://www.w3.org/TR/2014/REC-xpath-30-20140408/ - https://www.w3.org/TR/xpath-functions-30/ """ from copy import deepcopy from typing import Any, ClassVar, Dict, Optional, Tuple from elementpath.namespaces import XPATH_MATH_FUNCTIONS_NAMESPACE from elementpath.datatypes import QName from elementpath.xpath2 import XPath2Parser DecimalFormatsType = Dict[Optional[str], Dict[str, str]] class XPath30Parser(XPath2Parser): """ XPath 3.0 expression parser class. Accepts all XPath 2.0 options as keyword arguments, but the *strict* option is ignored because XPath 3.0+ has braced URI literals, with which the expanded name syntax is not compatible. :param args: the same positional arguments as class :class:`elementpath.XPath2Parser`. :param decimal_formats: a mapping of statically known decimal formats, keyed by format name (`None` for the default format). :param defuse_xml: if `True` (the default), defuse XML data before parsing. :param kwargs: the same keyword arguments as class :class:`elementpath.XPath2Parser`. 
""" version = '3.0' DEFAULT_NAMESPACES: ClassVar[Dict[str, str]] = { 'math': XPATH_MATH_FUNCTIONS_NAMESPACE, **XPath2Parser.DEFAULT_NAMESPACES } PATH_STEP_SYMBOLS = { '(integer)', '(string)', '(float)', '(decimal)', '(name)', '*', '@', '..', '.', '(', '{', 'Q{', '$', } # https://www.w3.org/TR/xpath-30/#id-reserved-fn-names RESERVED_FUNCTION_NAMES = { 'attribute', 'comment', 'document-node', 'element', 'empty-sequence', 'function', 'if', 'item', 'namespace-node', 'node', 'processing-instruction', 'schema-attribute', 'schema-element', 'switch', 'text', 'typeswitch', } function_signatures: Dict[Tuple[QName, int], str] = XPath2Parser.function_signatures.copy() decimal_formats: DecimalFormatsType = { None: { 'decimal-separator': '.', 'grouping-separator': ',', 'exponent-separator': 'e', 'infinity': 'Infinity', 'minus-sign': '-', 'NaN': 'NaN', 'percent': '%', 'per-mille': '‰', 'zero-digit': '0', 'digit': '#', 'pattern-separator': ';', } } def __init__(self, *args: Any, decimal_formats: Optional[DecimalFormatsType] = None, defuse_xml: bool = True, **kwargs: Any) -> None: kwargs.pop('strict', None) super(XPath30Parser, self).__init__(*args, **kwargs) if decimal_formats is not None: self.decimal_formats = deepcopy(self.decimal_formats) for k, v in decimal_formats.items(): if k is not None: self.decimal_formats[k] = self.decimal_formats[None].copy() self.decimal_formats[k].update(v) if None in decimal_formats: self.decimal_formats[None].update(decimal_formats[None]) if not defuse_xml: self.defuse_xml = defuse_xml def __str__(self) -> str: args = [] if self.decimal_formats != self.__class__.decimal_formats: args.append(f'decimal_formats={self.decimal_formats!r}') if not self.defuse_xml: args.append('defuse_xml=False') if not args: return super().__str__() repr_string = super().__str__()[:-1] if repr_string.endswith('('): return f"{repr_string}{', '.join(args)})" return f"{repr_string}, {', '.join(args)})" 
sissaschool-elementpath-d3688c7/elementpath/xpath31/000077500000000000000000000000001476131650400225255ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/elementpath/xpath31/__init__.py000066400000000000000000000010031476131650400246300ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from typing import TYPE_CHECKING if TYPE_CHECKING: from .xpath31_parser import XPath31Parser else: from ._xpath31_functions import XPath31Parser __all__ = ['XPath31Parser'] sissaschool-elementpath-d3688c7/elementpath/xpath31/_xpath31_functions.py000066400000000000000000001515641476131650400266320ustar00rootroot00000000000000# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 3.1 implementation - part 3 (functions) """ import json import locale import math import pathlib import random import re from datetime import datetime, timedelta from decimal import Decimal from itertools import product from typing import Any, cast, Dict, List, Optional, Tuple from urllib.request import urlopen from urllib.parse import urlsplit from elementpath._typing import Callable, Iterable, Iterator from elementpath.aliases import SequenceType, Emptiable from elementpath.protocols import ElementProtocol, EtreeElementProtocol from elementpath.datatypes import AnyAtomicType, AbstractBinary, AbstractDateTime, \ DateTime, Timezone, Duration, BooleanProxy, DoubleProxy, DoubleProxy10, \ NumericProxy, UntypedAtomic, Base64Binary, Language, AtomicType, NumericType from elementpath.exceptions import ElementPathTypeError from elementpath.helpers import collapse_white_spaces, is_xml_codepoint, \ escape_json_string, unescape_json_string, not_equal from elementpath.namespaces import XPATH_FUNCTIONS_NAMESPACE, XML_BASE from elementpath.etree import etree_iter_strings, is_etree_element from elementpath.collations import CollationManager from elementpath.compare import get_key_function, same_key from elementpath.tree_builders import get_node_tree from elementpath.xpath_nodes import XPathNode, DocumentNode, EtreeElementNode from elementpath.xpath_tokens import XPathFunction, XPathConstructor, XPathMap, XPathArray from elementpath.xpath_context import ContextType, ItemType, FunctionArgType, XPathSchemaContext from elementpath.validators import validate_json_to_xml from ._xpath31_operators import XPath31Parser method = XPath31Parser.method function = XPath31Parser.function XPath31Parser.unregister('string-join') XPath31Parser.unregister('trace') SAFE_KEY_ATOMIC_TYPES = ( int, Decimal, AbstractBinary, AbstractDateTime, Duration ) TIMEZONE_MAP = { 'UT': '00:00', 'UTC': '00:00', 'GMT': '00:00', 'EST': '-05:00', 'EDT': '-04:00', 'CST': 
'-06:00', 'CDT': '-05:00', 'MST': '-07:00', 'MDT': '-06:00', 'PST': '-08:00', 'PDT': '-07:00', } @XPath31Parser.constructor('numeric') def cast_numeric_type(self: XPathConstructor, value: AtomicType) -> NumericType: if isinstance(value, NumericProxy): return cast(NumericType, value) try: return cast(float, NumericProxy(value)) # type: ignore[arg-type] except ValueError as err: if isinstance(value, (str, UntypedAtomic)): raise self.error('FORG0001', err) raise self.error('FOCA0002', err) @method(function('string-join', nargs=(1, 2), sequence_types=('xs:anyAtomicType*', 'xs:string', 'xs:string'))) def evaluate_string_join_function(self: XPathFunction, context: ContextType = None) -> str: if self.context is not None: context = self.context items = [self.string_value(s) for s in self[0].select(context)] if len(self) == 1: return ''.join(items) separator: str = self.get_argument(context, 1, required=True, cls=str) return separator.join(items) @method(function('size', prefix='map', nargs=1, sequence_types=('map(*)', 'xs:integer'))) def evaluate_map_size_function(self: XPathFunction, context: ContextType = None) -> int: return len(self.get_argument(self.context or context, required=True, cls=XPathMap)) @method(function('keys', prefix='map', nargs=1, sequence_types=('map(*)', 'xs:anyAtomicType*'))) def evaluate_map_keys_function(self: XPathFunction, context: ContextType = None) \ -> List[AtomicType]: if self.context is not None: context = self.context map_: XPathMap = self.get_argument(context, required=True, cls=XPathMap) return [x for x in map_.keys(context)] @method(function('contains', prefix='map', nargs=2, sequence_types=('map(*)', 'xs:anyAtomicType', 'xs:boolean'))) def evaluate_map_contains_function(self: XPathFunction, context: ContextType = None) -> bool: if self.context is not None: context = self.context map_ = self.get_argument(context, required=True, cls=XPathMap) key = self.get_argument(context, index=1, required=True, cls=AnyAtomicType) if isinstance(key, 
float) and math.isnan(key): return any(isinstance(k, float) and math.isnan(k) for k in map_.keys(context)) for k in map_.keys(context): try: if k == key: if isinstance(key, str) or isinstance(k, str): return True elif isinstance(key, UntypedAtomic) ^ isinstance(k, UntypedAtomic): return False else: return True except TypeError: continue else: return False @method(function('get', prefix='map', nargs=2, sequence_types=('map(*)', 'xs:anyAtomicType', 'item()*'))) def evaluate_map_get_function(self: XPathFunction, context: ContextType = None) \ -> SequenceType[ItemType]: if self.context is not None: context = self.context map_: XPathMap = self.get_argument(context, required=True, cls=XPathMap) key: AnyAtomicType = self.get_argument(context, index=1, required=True, cls=AnyAtomicType) return map_(key, context=context) @method(function('put', prefix='map', nargs=3, sequence_types=('map(*)', 'xs:anyAtomicType', 'item()*', 'map(*)'))) def evaluate_map_put_function(self: XPathFunction, context: ContextType = None) -> XPathMap: if self.context is not None: context = self.context map_ = self.get_argument(context, required=True, cls=XPathMap) key = self.get_argument(context, index=1, required=True, cls=AnyAtomicType) value = self[2].evaluate(context) if value is None: value = [] items = {k: v for k, v in map_.items(context) if not_equal(k, key)} items[key] = value return XPathMap(self.parser, items=items) @method(function('remove', prefix='map', nargs=2, sequence_types=('map(*)', 'xs:anyAtomicType*', 'map(*)'))) def evaluate_map_remove_function(self: XPathFunction, context: ContextType = None) -> XPathMap: if self.context is not None: context = self.context map_ = self.get_argument(context, required=True, cls=XPathMap) keys = self[1].evaluate(context) if keys is None: return map_ elif isinstance(keys, list): items = ((k, v) for k, v in map_.items(context) if all(not_equal(k, x) for x in keys)) else: items = ((k, v) for k, v in map_.items(context) if not_equal(k, keys)) return 
XPathMap(self.parser, items=items) @method(function('entry', prefix='map', nargs=2, sequence_types=('xs:anyAtomicType', 'item()*', 'map(*)'))) def evaluate_map_entry_function(self: XPathFunction, context: ContextType = None) -> XPathMap: if self.context is not None: context = self.context key = self.get_argument(context, required=True, cls=AnyAtomicType) value = self[1].evaluate(context) if value is None: value = [] return XPathMap(self.parser, items=[(key, value)]) @method(function('merge', prefix='map', nargs=(1, 2), sequence_types=('map(*)*', 'map(*)', 'map(*)'))) def evaluate_map_merge_function(self: XPathFunction, context: ContextType = None) -> XPathMap: if self.context is not None: context = self.context duplicates = 'use-first' if len(self) > 1: options: XPathMap = self.get_argument(context, index=1, required=True, cls=XPathMap) for opt, value in options.items(context): if opt == 'duplicates': if value in ('reject', 'use-first', 'use-last', 'use-any', 'combine'): duplicates = cast(str, value) else: raise self.error('FOJS0005') items: Dict[Any, Any] = {} for map_ in self[0].select(context): assert isinstance(map_, XPathMap) for k1, v in map_.items(context): # Speed up for certain key types or float values if isinstance(k1, SAFE_KEY_ATOMIC_TYPES) or \ isinstance(k1, float) and not math.isnan(k1): if k1 not in items: items[k1] = v elif duplicates == 'reject': raise self.error('FOJS0003') elif duplicates == 'use-last': items.pop(k1) # remove before to replace the key items[k1] = v elif duplicates == 'combine': try: items[k1].append(v) except AttributeError: items[k1] = [items[k1], v] continue # TODO: too slow. An alternative idea is to couple with the type # or an index for unsafe types, and then unpack after merge. 
for k2 in items: if same_key(k1, k2): if duplicates == 'reject': raise self.error('FOJS0003') elif duplicates == 'use-last': items.pop(k2) # remove before to replace the key items[k1] = v elif duplicates == 'combine': try: items[k2].append(v) except AttributeError: items[k2] = [items[k2], v] break else: items[k1] = v return XPathMap(self.parser, items) @method(function('find', prefix='map', nargs=2, sequence_types=('map(*)', 'xs:anyAtomicType', 'array(*)'))) def evaluate_map_find_function(self: XPathFunction, context: ContextType = None) -> XPathArray: if self.context is not None: context = self.context key = self.get_argument(context, index=1, required=True, cls=AnyAtomicType) items = [] def collect_matching_items(obj: SequenceType[ItemType]) -> None: if isinstance(obj, list): for x in obj: collect_matching_items(x) elif isinstance(obj, XPathArray): for y in obj.items(context): collect_matching_items(y) elif isinstance(obj, XPathMap): for k, v in obj.items(context): if k == key: items.append(v) collect_matching_items(v) for item in self[0].select(context): collect_matching_items(item) return XPathArray(self.parser, items) @method(function('for-each', prefix='map', nargs=2, sequence_types=('map(*)', 'function(xs:anyAtomicType, item()*) as item()*', 'item()*'))) def select_map_for_each_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: if self.context is not None: context = self.context map_: XPathMap = self.get_argument(context, required=True, cls=XPathMap) func: XPathFunction = self.get_argument(context, index=1, required=True, cls=XPathFunction) for k, v in map_.items(context): result = func(k, v, context=context) if isinstance(result, list): yield from result else: yield result @method(function('size', prefix='array', nargs=1, sequence_types=('array(*)', 'xs:integer'))) def evaluate_array_size_function(self: XPathFunction, context: ContextType = None) -> int: return len(self.get_argument(self.context or context, required=True, 
cls=XPathArray)) @method(function('get', prefix='array', nargs=2, sequence_types=('array(*)', 'xs:integer', 'item()*'))) def evaluate_array_get_function(self: XPathFunction, context: ContextType = None) \ -> SequenceType[ItemType]: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) position: int = self.get_argument(context, index=1, required=True, cls=int) return array_(position, context=context) @method(function('put', prefix='array', nargs=3, sequence_types=('array(*)', 'xs:integer', 'item()*', 'array(*)'))) def evaluate_array_put_function(self: XPathFunction, context: ContextType = None) -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) position: int = self.get_argument(context, index=1, required=True, cls=int) member = self[2].evaluate(context) if member is None: member = [] if position <= 0: raise self.error('FOAY0001') items = array_.items(context) try: items[position - 1] = member except IndexError: if isinstance(context, XPathSchemaContext): return array_ raise self.error('FOAY0001') return XPathArray(self.parser, items=items) @method(function('insert-before', prefix='array', nargs=3, sequence_types=('array(*)', 'xs:integer', 'item()*', 'array(*)'))) def evaluate_array_insert_before_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) position: int = self.get_argument(context, index=1, required=True, cls=int) member = self[2].evaluate(context) if member is None: member = [] items = array_.items(context) if position <= 0 or position > len(items) + 1: raise self.error('FOAY0001') try: items.insert(position - 1, member) except IndexError: if isinstance(context, XPathSchemaContext): return array_ raise self.error('FOAY0001') return 
XPathArray(self.parser, items=items) @method(function('append', prefix='array', nargs=2, sequence_types=('array(*)', 'item()*', 'array(*)'))) def evaluate_array_append_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) appendage = self[1].evaluate(context) if appendage is None: appendage = [] items = array_.items(context) items.append(appendage) return XPathArray(self.parser, items=items) @method(function('remove', prefix='array', nargs=2, sequence_types=('array(*)', 'xs:integer*', 'array(*)'))) def evaluate_array_remove_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) positions_ = self[1].evaluate(context) if positions_ is None: return array_ positions: List[int] = [] for p in positions_ if isinstance(positions_, list) else [positions_]: if isinstance(p, int) and 0 < p <= len(array_): positions.append(p) elif isinstance(context, XPathSchemaContext): return array_ elif not isinstance(p, int): raise self.error('XPTY0004') else: raise self.error('FOAY0001') items = (v for k, v in enumerate(array_.items(context), 1) if k not in positions) return XPathArray(self.parser, items=items) @method(function('subarray', prefix='array', nargs=(2, 3), sequence_types=('array(*)', 'xs:integer', 'xs:integer', 'array(*)'))) def evaluate_array_subarray_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) start: int = self.get_argument(context, index=1, required=True, cls=int) if start < 1 or start > len(array_) + 1: if isinstance(context, XPathSchemaContext): return array_ raise self.error('FOAY0001') if len(self) > 2: length 
= self.get_argument(context, index=2, required=True, cls=int) if length < 0: raise self.error('FOAY0002') if start + length > len(array_) + 1: raise self.error('FOAY0001') items = array_.items(context)[start - 1:start + length - 1] else: items = array_.items(context)[start - 1:] return XPathArray(self.parser, items=items) @method(function('head', prefix='array', nargs=1, sequence_types=('array(*)', 'item()*'))) def evaluate_array_head_function(self: XPathFunction, context: ContextType = None) \ -> SequenceType[ItemType]: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) items = array_.items(context) if not items: if isinstance(context, XPathSchemaContext): return array_ raise self.error('FOAY0001') return cast(ItemType, items[0]) @method(function('tail', prefix='array', nargs=1, sequence_types=('array(*)', 'array(*)'))) def evaluate_array_tail_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) items = array_.items(context) if not items: if isinstance(context, XPathSchemaContext): return array_ raise self.error('FOAY0001') return XPathArray(self.parser, items=items[1:]) @method(function('reverse', prefix='array', nargs=1, sequence_types=('array(*)', 'array(*)'))) def evaluate_array_reverse_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray array_ = self.get_argument(context, required=True, cls=XPathArray) items = array_.items(context) return XPathArray(self.parser, items=reversed(items)) @method(function('join', prefix='array', nargs=1, sequence_types=('array(*)', 'array(*)'))) def evaluate_array_join_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context items = [] 
for array_ in self[0].select(context): if not isinstance(array_, XPathArray): raise self.error('XPTY0004') items.extend(array_.items(context)) return XPathArray(self.parser, items=items) @method(function('flatten', prefix='array', nargs=1, sequence_types=('item()*', 'item()*'))) def evaluate_array_flatten_function(self: XPathFunction, context: ContextType = None) \ -> List[ItemType]: if self.context is not None: context = self.context items: List[ItemType] = [] for obj in self[0].select(context): if isinstance(obj, XPathArray): items.extend(obj.iter_flatten(context)) else: items.append(obj) return items @method(function('for-each', prefix='array', nargs=2, sequence_types=('array(*)', 'function(item()*) as item()*', 'array(*)'))) def evaluate_array_for_each_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) func: XPathFunction = self.get_argument(context, index=1, required=True, cls=XPathFunction) items = array_.items(context) return XPathArray(self.parser, items=map(lambda x: func(x, context=context), items)) @method(function('for-each-pair', prefix='array', nargs=3, sequence_types=('array(*)', 'array(*)', 'function(item()*, item()*) as item()*', 'array(*)'))) def evaluate_array_for_each_pair_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array1: XPathArray = self.get_argument(context, required=True, cls=XPathArray) array2: XPathArray = self.get_argument(context, index=1, required=True, cls=XPathArray) func: XPathFunction = self.get_argument(context, index=2, required=True, cls=XPathFunction) items = zip(array1.items(context), array2.items(context)) return XPathArray(self.parser, items=map(lambda x: func(*x, context=context), items)) @method(function('filter', prefix='array', nargs=2, sequence_types=('array(*)', 'function(item()*) 
as xs:boolean', 'array(*)'))) def evaluate_array_filter_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) func: XPathFunction = self.get_argument(context, index=1, required=True, cls=XPathFunction) items = array_.items(context) def filter_function(x: FunctionArgType) -> bool: choice = func(x, context=context) if not isinstance(choice, bool): raise self.error('XPTY0004', f'{func} must return xs:boolean values') return choice return XPathArray(self.parser, items=filter(filter_function, items)) @method(function('fold-left', prefix='array', nargs=3, sequence_types=('array(*)', 'item()*', 'function(item()*, item()) as item()*', 'item()*'))) @method(function('fold-right', prefix='array', nargs=3, sequence_types=('array(*)', 'item()*', 'function(item()*, item()) as item()*', 'item()*'))) def select_array_fold_left_right_functions(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: if self.context is not None: context = self.context func = self[2][1] if self[2].symbol == ':' else self[2] if not isinstance(func, XPathFunction): func = self.get_argument(context, index=2, cls=XPathFunction, required=True) if func.arity != 2: raise self.error('XPTY0004', "function arity must be 2") assert isinstance(func, XPathFunction) array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) zero = self.get_argument(context, index=1) result = zero if self.symbol == 'fold-left': for item in array_.items(context): result = func(result, item, context=context) else: for item in reversed(array_.items(context)): result = func(item, result, context=context) if isinstance(result, list): yield from result else: yield result @method(function('sort', nargs=(1, 3), sequence_types=('item()*', 'xs:string?', 'function(item()) as xs:anyAtomicType*', 'item()*'))) def evaluate_sort_function(self: XPathFunction, 
context: ContextType = None) \ -> SequenceType[ItemType]: if self.context is not None: context = self.context if len(self) < 2: collation = self.parser.default_collation else: collation = self.get_argument(context, 1, cls=str) if collation is None: collation = self.parser.default_collation if len(self) == 3: func = self.get_argument(context, index=2, required=True, cls=XPathFunction) key_function = get_key_function( collation, key_func=lambda x: func(x, context=context), token=self ) else: key_function = get_key_function(collation, token=self) try: return sorted(self[0].select(context), key=key_function) except ElementPathTypeError: raise except TypeError: if isinstance(context, XPathSchemaContext): return [] raise self.error('XPTY0004') @method(function('sort', prefix='array', nargs=(1, 3), sequence_types=('array(*)', 'xs:string?', 'function(item()*) as xs:anyAtomicType*', 'array(*)'))) def evaluate_array_sort_function(self: XPathFunction, context: ContextType = None) \ -> XPathArray: if self.context is not None: context = self.context array_: XPathArray = self.get_argument(context, required=True, cls=XPathArray) if len(self) < 2: collation = self.parser.default_collation else: collation = self.get_argument(context, 1, cls=str) if collation is None: collation = self.parser.default_collation if len(self) == 3: func: XPathFunction func = self.get_argument(context, index=2, required=True, cls=XPathFunction) key_function = get_key_function( collation, key_func=lambda x: func(x, context=context), token=self ) else: key_function = get_key_function(collation, token=self) try: items = sorted(array_.items(context), key=key_function) except ElementPathTypeError: raise except TypeError: if isinstance(context, XPathSchemaContext): return array_ raise self.error('XPTY0004') else: return XPathArray(self.parser, items) @method(function('json-doc', nargs=(1, 2), sequence_types=('xs:string?', 'map(*)', 'item()?'))) @method(function('parse-json', nargs=(1, 2), 
sequence_types=('xs:string?', 'map(*)', 'item()?'))) def evaluate_parse_json_functions(self: XPathFunction, context: ContextType = None) \ -> Emptiable[ItemType]: if self.symbol == 'json-doc': href = self.get_argument(context, cls=str) if href is None: return [] try: if urlsplit(href).scheme: with urlopen(href) as fp: json_text = fp.read().decode('utf-8') else: with pathlib.Path(href).open() as fp: json_text = fp.read() except IOError: raise self.error('FOUT1170') from None else: href = None json_text = self.get_argument(context, cls=str) if json_text is None: return [] def _fallback(*args: Any, context: ContextType = None) -> str: return '\uFFFD' liberal = False duplicates = 'use-first' escape = None fallback: Callable[..., str] = _fallback if len(self) > 1: map_ = self.get_argument(context, index=1, required=True, cls=XPathMap) for k, v in map_.items(context): if k == 'liberal': if not isinstance(v, bool): raise self.error('XPTY0004') liberal = v elif k == 'duplicates': if not isinstance(v, str): raise self.error('XPTY0004') elif v not in ('use-first', 'use-last', 'reject'): raise self.error('FOJS0005') duplicates = v elif k == 'escape': if not isinstance(v, bool): raise self.error('XPTY0004') escape = v elif k == 'fallback': if not isinstance(v, XPathFunction): msg = 'fallback parameter is not a function' raise self.error('XPTY0004', msg) elif v.arity != 1: msg = f'fallback function has arity {v.arity} (must be 1)' raise self.error('XPTY0004', msg) elif escape: msg = "cannot provide both 'fallback' and 'escape' parameters" raise self.error('FOJS0005', msg) fallback = cast(Callable[..., str], v) escape = False def decode_value(value: SequenceType[ItemType]) -> ItemType: if value is None: return [] elif isinstance(value, list): return XPathArray(self.parser, [decode_value(x) for x in value]) elif not isinstance(value, str): return value elif escape: return json.dumps(value, ensure_ascii=True)[1:-1].replace('\\"', '"') return ''.join( x if is_xml_codepoint(ord(x)) 
else fallback(rf'\u{ord(x):04X}', context=context) for x in value ) def json_object_pairs_to_map(obj: Iterable[Tuple[str, SequenceType[ItemType]]]) -> XPathMap: items: Dict[ItemType, SequenceType[ItemType]] = {} for item in obj: key, value = decode_value(item[0]), decode_value(item[1]) if key in items: if duplicates == 'use-first': continue elif duplicates == 'reject': raise self.error('FOJS0003') if isinstance(value, list): values = [decode_value(x) for x in value] items[key] = XPathArray(self.parser, values) if values else values else: items[key] = value return XPathMap(self.parser, items) kwargs: Dict[str, Any] = {'object_pairs_hook': json_object_pairs_to_map} if liberal or escape: kwargs['strict'] = False if liberal: def parse_constant(s: str) -> None: raise self.error('FOJS0001') kwargs['parse_constant'] = parse_constant try: result = json.JSONDecoder(**kwargs).decode(json_text) except json.JSONDecodeError: if href and urlsplit(href).fragment: raise self.error('FOUT1170') from None raise self.error('FOJS0001') from None else: return decode_value(result) @method(function('load-xquery-module', nargs=(1, 2), sequence_types=('xs:string', 'map(*)', 'map(*)'))) def evaluate_load_xquery_module_function(self: XPathFunction, context: ContextType = None) \ -> XPathMap: if self.context is not None: context = self.context try: module_uri = self.get_argument(context, required=True, cls=str) except TypeError: raise self.error('FOQM0006') if not module_uri: raise self.error('FOQM0001') if len(self) > 1: options = self.get_argument(context, index=1, required=True, cls=XPathMap) for k, v in options.items(context): if k == 'xquery-version': if not isinstance(v, (int, float, Decimal)): raise self.error('FOQM0005') elif k == 'location-hints': if not isinstance(v, str) and \ not (isinstance(v, list) and all(isinstance(x, str) for x in v)): raise self.error('FOQM0005') elif k == 'context-item': if isinstance(v, list) and len(v) > 1: raise self.error('FOQM0005') elif k == 'variables' 
or k == 'vendor-options': if not isinstance(v, XPathMap) or \ any(not isinstance(x, str) for x in v.keys(context)): raise self.error('FOQM0006') else: raise self.error('FOQM0005') raise self.error('FOQM0006') # XQuery not available @method(function('transform', nargs=1, sequence_types=('map(*)', 'map(*)'))) def evaluate_transform_function(self: XPathFunction, context: ContextType = None) -> XPathMap: if self.context is not None: context = self.context options = self.get_argument(context, required=True, cls=XPathMap) for k, v in options.items(context): # Check only 'xslt-version' parameter until an effective # XSLT implementation will be loadable. if k == 'xslt-version': if not isinstance(v, (int, float, Decimal)): raise self.error('FOXT0002') raise self.error('FOXT0004') # XSLT transformation has been disabled @method(function('random-number-generator', nargs=(0, 1), sequence_types=('xs:anyAtomicType?', 'map(xs:string, item())'))) def evaluate_random_number_generator_function(self: XPathFunction, context: ContextType = None) \ -> ItemType: if self.context is not None: context = self.context seed = self.get_argument(context, cls=AnyAtomicType) if not isinstance(seed, (int, str)): seed = str(seed) random.seed(seed) class Permute(XPathFunction): nargs = 1 sequence_types = ('item()*', 'item()*') def __call__(self, *args: Any, **kwargs: Any) -> List[ItemType]: if not args: return [] try: seq = [x for x in args[0]] except TypeError: return [args[0]] else: random.shuffle(seq) return seq class NextRandom(XPathFunction): nargs = 0 sequence_types = ('map(xs:string, item())',) def __call__(self, *args: Any, **kwargs: Any) -> XPathMap: items = { 'number': random.random(), 'next': NextRandom(self.parser), 'permute': Permute(self.parser), } return XPathMap(self.parser, items) return NextRandom(self.parser)() @method(function('apply', nargs=2, sequence_types=('function(*)', 'array(*)', 'item()*'))) def evaluate_apply_function(self: XPathFunction, context: ContextType = None) \ -> 
SequenceType[ItemType]: if self.context is not None: context = self.context if isinstance(self[0], XPathFunction): func = self[0] else: func = self.get_argument(context, required=True, cls=XPathFunction) array_ = self.get_argument(context, index=1, required=True, cls=XPathArray) try: return func(*array_.items(context), context=context) except ElementPathTypeError as err: if err.code is None or not err.code.endswith(('XPST0017', 'XPTY0004')): raise raise self.error('FOAP0001') from None @method(function('parse-ietf-date', nargs=1, sequence_types=('xs:string?', 'xs:dateTime?'))) def evaluate_parse_ietf_date_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[DateTime]: if self.context is not None: context = self.context value = self.get_argument(context, cls=str) if value is None: return [] # Normalize the input value = collapse_white_spaces(value) value = value.replace(' -', '-').replace('- ', '-').replace(' +', '+') value = value.replace(' (', '(').replace('( ', '(').replace(' )', ')') # timezone +/-NN:N is invalid if re.search(r'(?<=[+\-])(\d{2}:\d)(?=\D)', value) is not None: raise self.error('FORG0010') # Minutes must be 2 digits if re.search(r' \d{1,2}:\d(?=\D)', value) is not None: raise self.error('FORG0010') # Adjust timezone part value = re.sub(r'(?<=\D)(\d)(?=\D)', '0\\g<1>', value) value = re.sub(r'(?<=\d[+\-])(\d{2}:)(?=($|[ (]))', '\\g<1>00', value) value = re.sub(r'(?<=\d[+\-])(\d{2})(?=($|[ (]))', '\\g<1>:00', value) value = re.sub(r'(?<=\d[+\-])(\d{3})(?=[ (])', '0\\g<1>', value) tzname_regex = r'(?<=[\d( ])(UT|UTC|GMT|EST|EDT|CST|CDT|MST|MDT|PST|PDT)\b' tzname_match = re.search(tzname_regex, value, re.IGNORECASE) if tzname_match is not None: # only to let be parsed by strptime() value = re.sub(tzname_regex, 'UTC', value, flags=re.IGNORECASE) illegal_tzname_regex = r'\b(CET)\b' if re.search(illegal_tzname_regex, value, re.IGNORECASE) is not None: raise self.error('FORG0010', 'illegal timezone name') if value and 
value[0].isalpha(): # Parse dayname part (that is then ignored) try: dayname, _value = value.split(' ', maxsplit=1) except ValueError: raise self.error('FORG0010') from None else: if dayname.endswith(','): dayname = dayname[:-1] for fmt in ['%A', '%a']: try: datetime.strptime(dayname, fmt) except ValueError: pass else: value = _value break # Parse 24:00 cases if ' 24:00 ' in value: value = value.replace(' 24:00 ', ' 00:00 ') day_offset = True elif ' 24:00:00' in value and ' 24:00:00.' not in value: value = value.replace(' 24:00:00', ' 00:00:00') day_offset = True else: day_offset = False # Parsing generating every combination if value and value[0].isalpha(): # Parse asctime rule fmt_alternatives = ( ['%b %d %H:%M', '%b-%d %H:%M'], ['', ':%S', ':%S.%f'], ['', '%Z', ' %Z', '%z', '%z(%Z)'], [' %Y', ' %y'], ) # Adjust 2-digits year value = re.sub(r'(?<= )(\d{2})$', '19\\g<1>', value) else: # Parse datespec rule fmt_alternatives = ( ['%d %b ', '%d-%b-', '%d %b-', '%d-%b '], ['%Y %H:%M', '%y %H:%M'], ['', ':%S', ':%S.%f'], ['', '%Z', ' %Z', '%z', '%z(%Z)'], ) # Adjust 2-digits year value = re.sub(r'(?<=[ \-])(\d{2})(?= \d{2}:\d{2})', '19\\g<1>', value) for fmt_chunks in product(*fmt_alternatives): fmt = ''.join(fmt_chunks) if '%f%Z' in fmt: continue try: dt = datetime.strptime(value, fmt) except ValueError: continue else: if tzname_match is not None and dt.tzinfo is None: tzname = tzname_match.group(0).upper() dt = dt.replace(tzinfo=Timezone.fromstring(TIMEZONE_MAP[tzname])) if dt.tzinfo is not None: offset = dt.tzinfo.utcoffset(None) seconds = offset.days * 86400 + offset.seconds if offset else 0 if abs(seconds) > 14 * 3600: raise self.error('FORG0010') if day_offset: dt = dt + timedelta(seconds=86400) return DateTime.fromdatetime(dt) else: raise self.error('FORG0010') @method(function('contains-token', nargs=(2, 3), sequence_types=('xs:string*', 'xs:string', 'xs:string', 'xs:boolean'))) def evaluate_contains_token_function(self: XPathFunction, context: ContextType = 
None) -> bool: if self.context is not None: context = self.context token_string = self.get_argument(context, index=1, required=True, cls=str) token_string = token_string.strip() if len(self) < 3: collation = self.parser.default_collation else: collation = self.get_argument(context, 2, required=True, cls=str) with CollationManager(collation, self) as manager: for input_string in self[0].select(context): if not isinstance(input_string, str): raise self.error('XPTY0004') if any(x and manager.eq(token_string, x) for x in re.split('[ \t\n\r\f\v]+', input_string)): return True else: return False @method(function('collation-key', nargs=(1, 2), sequence_types=('xs:string', 'xs:string', 'xs:base64Binary'))) def evaluate_collation_key_function(self: XPathFunction, context: ContextType = None) \ -> Base64Binary: if self.context is not None: context = self.context key = self.get_argument(context, required=True, cls=str) if len(self) > 1: collation = self.get_argument(context, index=1, required=True, cls=str) else: collation = self.parser.default_collation try: with CollationManager(collation, self) as manager: base64_key = Base64Binary.encoder(manager.strxfrm(key).encode()) return Base64Binary(base64_key, ordered=True) except locale.Error: raise self.error('FOCH0004') @method(function('default-language', nargs=0, sequence_types=('xs:language',))) def evaluate_default_language_function(self: XPathFunction, context: ContextType = None) \ -> Language: if self.context is not None: context = self.context elif context is None: raise self.missing_context() if context.default_language is not None: return context.default_language lang = locale.getlocale()[0] return Language(lang.replace('_', '-') if lang else lang) NULL_TAG = f'{{{XPATH_FUNCTIONS_NAMESPACE}}}null' BOOLEAN_TAG = f'{{{XPATH_FUNCTIONS_NAMESPACE}}}boolean' NUMBER_TAG = f'{{{XPATH_FUNCTIONS_NAMESPACE}}}number' STRING_TAG = f'{{{XPATH_FUNCTIONS_NAMESPACE}}}string' ARRAY_TAG = f'{{{XPATH_FUNCTIONS_NAMESPACE}}}array' MAP_TAG = 
f'{{{XPATH_FUNCTIONS_NAMESPACE}}}map' BOOLEAN_VALUES = {'true', 'false', '1', '0'} @method(function('xml-to-json', nargs=(1, 2), sequence_types=('node()?', 'map(*)', 'xs:string?'))) def evaluate_xml_to_json_function(self: XPathFunction, context: ContextType = None) \ -> Emptiable[str]: if self.context is not None: context = self.context input_node = self.get_argument(context, cls=XPathNode) if input_node is None: return [] if len(self) > 1: options = self.get_argument(context, index=1, required=True, cls=XPathMap) indent = options(context, 'indent') if indent is not None and not isinstance(indent, bool): raise self.error('FOJS0005') def elem_to_json(elements: Iterable[ElementProtocol]) -> str: chunks = [] def check_attributes(*exclude: str) -> None: for name in child.attrib: if name is None or name in exclude: continue elif name.startswith('{') and \ not name.startswith(f'{{{XPATH_FUNCTIONS_NAMESPACE}}}'): continue raise self.error('FOJS0006', f"{child} has an invalid attribute {name!r}") def check_escapes(s: str) -> None: if re.search(r'(? 
Emptiable[DocumentNode]: if self.context is not None: context = self.context json_text = self.get_argument(context, cls=str) if json_text is None or isinstance(context, XPathSchemaContext): return [] elif context is not None: etree = context.etree else: raise self.missing_context() def _fallback(*args: Any, context: ContextType = None) -> str: return '�' liberal = False validate = False duplicates = None escape = False fallback: Callable[..., str] = _fallback if len(self) > 1: options = self.get_argument(context, index=1, required=True, cls=XPathMap) for key, value in options.items(context): if key == 'liberal': if not isinstance(value, bool): raise self.error('XPTY0004') liberal = value elif key == 'duplicates': if not isinstance(value, str): raise self.error('XPTY0004') elif value not in ('reject', 'retain', 'use-first'): raise self.error('FOJS0005') duplicates = value elif key == 'validate': if not isinstance(value, bool): raise self.error('XPTY0004') validate = value elif key == 'escape': if not isinstance(value, bool): raise self.error('XPTY0004') escape = value elif key == 'fallback': if escape: msg = "'fallback' function provided with escape=True" raise self.error('FOJS0005', msg) if not isinstance(value, XPathFunction): raise self.error('XPTY0004') fallback = cast(Callable[..., str], value) else: raise self.error('FOJS0005') if duplicates is None: duplicates = 'reject' if validate else 'retain' elif validate and duplicates == 'retain': raise self.error('FOJS0005') def escape_string(s: str) -> str: s = re.sub(r'\\(?!/)', r'\\\\', s) s = s.replace('\b', r'\b'). \ replace('\r', r'\r'). \ replace('\n', r'\n'). \ replace('\t', r'\t'). \ replace('\f', r'\f'). 
\ replace('/', r'\/') return ''.join( x if is_xml_codepoint(ord(x)) else rf'\u{ord(x):04X}' for x in s ) def value_to_etree(v: Optional[ItemType], **attrib: str) -> ElementProtocol: if v is None: elem = etree.Element(NULL_TAG, **attrib) elif isinstance(v, list): elem = etree.Element(ARRAY_TAG, **attrib) for item in v: elem.append(value_to_etree(item)) elif isinstance(v, bool): elem = etree.Element(BOOLEAN_TAG, **attrib) elem.text = 'true' if v else 'false' elif isinstance(v, (int, float)): elem = etree.Element(NUMBER_TAG, **attrib) elem.text = str(v) elif isinstance(v, str): if not escape: v = ''.join(x if is_xml_codepoint(ord(x)) else fallback(rf'\u{ord(x):04X}', context=context) for x in v) elem = etree.Element(STRING_TAG, **attrib) else: v = escape_string(v) if '\\' in v: elem = etree.Element(STRING_TAG, escaped='true', **attrib) else: elem = etree.Element(STRING_TAG, **attrib) elem.text = v elif is_etree_element(v): e = cast(EtreeElementProtocol, v) e.attrib.update(attrib) return e else: raise ElementPathTypeError(f'unexpected type {type(v)}') return cast(ElementProtocol, elem) def json_object_to_etree(obj: Iterable[Tuple[str, Optional[ItemType]]]) -> ElementProtocol: keys = set() items = [] for k, v in obj: if k not in keys: keys.add(k) elif duplicates == 'use-first': continue elif duplicates == 'reject': raise self.error('FOJS0003') if not escape: k = ''.join(x if is_xml_codepoint(ord(x)) else fallback(rf'\u{ord(x):04X}', context=context) for x in k) k = k.replace('"', '&#34;') attrib = {'key': k} else: k = escape_string(k) if '\\' in k: attrib = {'escaped-key': 'true', 'key': k} else: attrib = {'key': k} items.append(value_to_etree(v, **attrib)) elem = etree.Element(MAP_TAG) for item in items: elem.append(item) return cast(ElementProtocol, elem) kwargs: Dict[str, Any] = {'object_pairs_hook': json_object_to_etree} if liberal or escape: kwargs['strict'] = False if liberal: def parse_constant(s: Any) -> None: raise self.error('FOJS0001') kwargs['parse_constant'] = 
parse_constant etree.register_namespace('fn', XPATH_FUNCTIONS_NAMESPACE) try: if json_text.startswith('\uFEFF'): # Exclude BOM character result = json.JSONDecoder(**kwargs).decode(json_text[1:]) else: result = json.JSONDecoder(**kwargs).decode(json_text) except json.JSONDecodeError as err: raise self.error('FOJS0001', str(err)) from None if is_etree_element(result): document = etree.ElementTree(result) else: document = etree.ElementTree(value_to_etree(result)) root = document.getroot() if XML_BASE not in root.attrib and self.parser.base_uri: root.set(XML_BASE, self.parser.base_uri) if validate: validate_json_to_xml(document.getroot()) namespaces = {'j': XPATH_FUNCTIONS_NAMESPACE} return cast(DocumentNode, get_node_tree(document, namespaces)) @method(function('trace', nargs=(1, 2), sequence_types=('item()*', 'xs:string', 'item()*'))) def select_trace_function(self: XPathFunction, context: ContextType = None) \ -> Iterator[ItemType]: if self.context is not None: context = self.context if len(self) == 1: for value in self[0].select(context): self.parser.tracer(str(value).strip()) yield value else: label = self.get_argument(context, index=1, cls=str) for value in self[0].select(context): self.parser.tracer('{} {}'.format(label, str(value).strip())) yield value
# sissaschool-elementpath-d3688c7/elementpath/xpath31/_xpath31_operators.py
# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 3.1 implementation - part 2 (operators and constructors) """ from typing import cast, Optional, Union from elementpath._typing import Iterator, Iterable from elementpath.aliases import SequenceType from elementpath.helpers import iter_sequence from elementpath.sequence_types import is_sequence_type, match_sequence_type from elementpath.xpath_tokens import XPathParserType, XPathToken, ProxyToken, \ XPathFunction, XPathMap, XPathArray from elementpath.datatypes import AtomicType from elementpath.xpath_context import ContextType, ItemType, ValueType from .xpath31_parser import XPath31Parser __all__ = ['XPath31Parser'] register = XPath31Parser.register method = XPath31Parser.method function = XPath31Parser.function register('map', bp=90, label=('kind test', 'map'), bases=(XPathFunction,), pattern=r'(? Union[XPathToken, XPathMap, XPathArray]: if self.parser.next_token.symbol == '{': self.parser.token = XPathMap(self.parser).nud() return self.parser.token elif self.parser.next_token.symbol != '(': return self.as_name() self.label = 'kind test' self.parser.advance('(') if self.parser.next_token.label not in ('kind test', 'sequence type', 'function test'): self.parser.expected_next('(name)', ':', '*', message='a QName or a wildcard expected') self[:] = self.parser.expression(45), self[0].parse_occurrence() if self[0].symbol != '*': self.parser.advance(',') if self.parser.next_token.label not in ('kind test', 'sequence type', 'function test'): self.parser.expected_next('(name)', ':', '*', message='a QName or a wildcard expected') self.append(self.parser.expression(45)) self[-1].parse_occurrence() self.parser.advance(')') return self register('array', bp=90, label=('kind test', 'array'), bases=(XPathFunction,), pattern=r'(? 
XPathToken: if self.parser.next_token.symbol == '{': self.parser.token = XPathArray(self.parser).nud() return self.parser.token elif self.parser.next_token.symbol != '(': return self.as_name() self.label = 'kind test' self.parser.advance('(') if self.parser.next_token.label not in ('kind test', 'function test'): self.parser.expected_next('(name)', ':', '*', 'item') self[:] = self.parser.expression(45), if self[0].symbol != '*': self[0].parse_occurrence() self.parser.advance(')') self.parse_occurrence() return self @method('map') @method('array') def select_map_or_array_kind_test(self: XPathFunction, context: ContextType = None) \ -> Iterator[Union[XPathMap, XPathArray]]: if context is None: raise self.missing_context() for item in context.iter_children_or_self(): if match_sequence_type(item, self.source, self.parser): yield cast(Union[XPathMap, XPathArray], item) ### # Square array constructor (pushed lazy) @method('[') def nud_square_array_constructor(self: XPathToken) -> XPathToken: if self.parser.version < '3.1': raise self.wrong_syntax() # Constructs an XPathArray token and returns it instead of the predicate token = XPathArray(self.parser) token.symbol = '[' if token.parser.next_token.symbol not in (']', '(end)'): while True: token.append(self.parser.expression(5)) if token.parser.next_token.symbol != ',': break token.parser.advance() token.parser.advance(']') return token class LookupOperatorToken(XPathToken): """ Question mark symbol is used for XP31+ lookup operator and also for placeholder in XP30+ partial functions and for optional occurrences. """ symbol = lookup_name = '?' lbp = 85 rbp = 85 def __init__(self, parser: XPathParserType, value: Optional[AtomicType] = None) -> None: super().__init__(parser, value) if self.parser.token.symbol in ('(', ','): # It's a placeholder symbol or a unary lookup operator # in a list of function arguments. self.lbp = self.rbp = 0 @property def source(self) -> str: if not self: return '?' 
elif len(self) == 1: return f'?{self[0].source}' else: return f'{self[0].source}?{self[1].source}' def nud(self) -> 'LookupOperatorToken': try: self.parser.expected_next('(name)', '(integer)', '(', '*') except SyntaxError: if self.lbp: raise return self # a placeholder/unary lookup token else: self[:] = self.parser.expression(85), return self def led(self, left: XPathToken) -> Union['LookupOperatorToken', XPathToken]: try: self.parser.expected_next('(name)', '(integer)', '(', '*') except SyntaxError: if is_sequence_type(left.value, self.parser): self.lbp = self.rbp = 0 left.occurrence = '?' return left raise else: self[:] = left, self.parser.expression(85) return self def evaluate(self, context: ContextType = None) -> SequenceType[ItemType]: if not self: return self.value # a placeholder token return [x for x in self.select(context)] def select(self, context: ContextType = None) -> Iterator[ItemType]: # flatten sequences, don't flatten arrays. def flatten(v: ValueType) -> Iterator[ItemType]: if isinstance(v, list): yield from v else: yield v if not self: yield from iter_sequence(self.value) return items: Iterable[ItemType] if len(self) == 1: # unary lookup operator (used in predicates) if context is None: raise self.missing_context() items = (context.item,) else: items = self[0].select(context) for item in items: symbol = self[-1].symbol if isinstance(item, XPathMap): if symbol == '*': for value in item.values(context): yield from flatten(value) elif symbol in ('(name)', '(integer)'): yield from flatten( item(cast(Union[str, int], self[-1].value), context=context) ) elif symbol == '(': for obj in self[-1].select(context): yield from flatten(item(self.data_value(obj), context=context)) elif isinstance(item, XPathArray): if symbol == '*': for value in item.items(context): yield from flatten(value) elif symbol == '(name)': raise self.error('XPTY0004') elif symbol == '(integer)': yield from flatten(item(cast(int, self[-1].value), context=context)) elif symbol == '(': 
for value in self[-1].select(context): yield from flatten(item(self.data_value(value), context=context)) elif not item and isinstance(item, list): continue else: raise self.error('XPTY0004') XPath31Parser.symbol_table['?'] = LookupOperatorToken @method('=>', bp=67) def led_arrow_operator(self: XPathToken, left: XPathToken) -> XPathToken: next_token = self.parser.next_token if next_token.symbol == '$': self[:] = left, self.parser.expression(80) elif isinstance(next_token, ProxyToken): self.parser.parse_arguments = False self[:] = left, next_token.nud() self.parser.parse_arguments = True self.parser.advance() elif isinstance(next_token, XPathFunction): self[:] = left, next_token if next_token.label == 'kind test': raise next_token.wrong_syntax() self.parser.advance() # Skip static evaluation of function arguments else: next_token.expected('(name)', ':', 'Q{', '(') self.parser.parse_arguments = False self[:] = left, self.parser.expression(80) self.parser.parse_arguments = True right = self.parser.expression(67) right.expected('(') self.append(right) return self @method('=>') def evaluate_arrow_operator(self: XPathToken, context: ContextType = None) \ -> SequenceType[ItemType]: tokens = [self[0]] if self[2]: tokens.extend(self[2][0].get_argument_tokens()) func = self[1].get_function(context, arity=len(tokens)) arguments = [tk.evaluate(context) for tk in tokens] return func(*arguments, context=context)
# sissaschool-elementpath-d3688c7/elementpath/xpath31/xpath31_parser.py
# # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # """ XPath 3.1 implementation """ from typing import ClassVar, Dict, Tuple from elementpath.namespaces import XPATH_MAP_FUNCTIONS_NAMESPACE, \ XPATH_ARRAY_FUNCTIONS_NAMESPACE # , XSLT_XQUERY_SERIALIZATION_NAMESPACE from elementpath.datatypes import QName from elementpath.xpath30 import XPath30Parser class XPath31Parser(XPath30Parser): """ XPath 3.1 expression parser class. """ version = '3.1' DEFAULT_NAMESPACES: ClassVar[Dict[str, str]] = { 'map': XPATH_MAP_FUNCTIONS_NAMESPACE, 'array': XPATH_ARRAY_FUNCTIONS_NAMESPACE, **XPath30Parser.DEFAULT_NAMESPACES } # https://www.w3.org/TR/xpath-31/#id-reserved-fn-names RESERVED_FUNCTION_NAMES = { 'array', 'attribute', 'comment', 'document-node', 'element', 'empty-sequence', 'function', 'if', 'item', 'map', 'namespace-node', 'node', 'processing-instruction', 'schema-attribute', 'schema-element', 'switch', 'text', 'typeswitch', } function_signatures: Dict[Tuple[QName, int], str] = XPath30Parser.function_signatures.copy()
# sissaschool-elementpath-d3688c7/elementpath/xpath_context.py
# # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import datetime import importlib from copy import copy from functools import cached_property from types import ModuleType from typing import TYPE_CHECKING, cast, Any, Dict, List, Optional, Set, Union from elementpath._typing import Iterator, Sequence, Callable from elementpath.aliases import NamespacesType, SequenceType, InputType from elementpath.protocols import ElementProtocol, DocumentProtocol from elementpath.exceptions import ElementPathTypeError from elementpath.tdop import Token from elementpath.datatypes import AnyAtomicType, AtomicType, Timezone, Language from elementpath.etree import is_etree_element, is_etree_element_instance, is_etree_document from elementpath.xpath_nodes import ChildNodeType, XPathNode, AttributeNode, NamespaceNode, \ CommentNode, ProcessingInstructionNode, ElementNode, DocumentNode, RootNodeType, \ RootArgType, SchemaElementNode from elementpath.tree_builders import get_node_tree if TYPE_CHECKING: from elementpath.schema_proxy import AbstractSchemaProxy from elementpath.xpath_tokens import XPathToken, XPathAxis, XPathFunction # noqa __all__ = ['XPathContext', 'XPathSchemaContext', 'ContextType', 'ItemType', 'ValueType', 'ItemArgType', 'FunctionArgType'] ### # Type annotations aliases for context and tokens classes ContextType = Union['XPathContext', 'XPathSchemaContext', None] ItemType = Union[XPathNode, AtomicType, 'XPathFunction'] ValueType = SequenceType[ItemType] ItemArgType = Union[ItemType, ElementProtocol, DocumentProtocol] FunctionArgType = Union[InputType[ItemArgType], ValueType] NodeArgType = Union[XPathNode, ElementProtocol, DocumentProtocol] CollectionArgType = Optional[InputType[NodeArgType]] class XPathContext: """ The XPath dynamic context. The static context is provided by the parser. Usually the dynamic context instances are created providing only the root element. Variable values argument is needed if the XPath expression refers to in-scope variables. 
The other optional arguments are needed only if a specific position in the context is required, and should be used only with a clear understanding of their meaning. :param root: the root of the XML document, usually an ElementTree instance or an \ Element. A schema or a schema element can also be provided, or an already built \ node tree. The default is `None`, in which case no XML root is set and you have \ to provide an *item* argument. :param namespaces: a dictionary mapping namespace prefixes to URIs, \ used when namespace information is not available within document and element nodes. \ This can be useful when the dynamic context has additional namespaces and root \ is an Element or an ElementTree instance of the standard library. :param uri: an optional URI associated with the root element or the document. :param fragment: if `True`, the root is considered a fragment: in this \ case, if `root` is an ElementTree instance, it is skipped and its root Element is used. If \ `False`, a dummy document is created when the root is an Element instance; \ the dummy document is not included in results. By default the \ root node kind is preserved. :param item: the context item. A `None` value means that the context is positioned on \ the document node. :param position: the current position of the node within the input sequence. :param size: the number of items in the input sequence. :param axis: the active axis. Used to decide when to apply the default axis ('child' axis). :param schema: an optional schema proxy instance to be applied on XDM root or item. :param variables: dictionary of context variables that maps a QName to a value. :param current_dt: current dateTime of the implementation, including explicit timezone. :param timezone: implicit timezone to be used when a date, time, or dateTime value does \ not have a timezone. :param documents: available documents. This is a mapping of absolute URI \ strings to document nodes. 
Used by the function fn:doc. :param collections: available collections. This is a mapping of absolute URI \ strings onto sequences of nodes. Used by the XPath 2.0+ function fn:collection. :param default_collection: this is the sequence of nodes used when fn:collection \ is called with no arguments. :param text_resources: available text resources. This is a mapping of absolute URI strings \ onto text resources. Used by XPath 3.0+ function fn:unparsed-text/fn:unparsed-text-lines. :param resource_collections: available URI collections. This is a mapping of absolute \ URI strings to sequence of URIs. Used by the XPath 3.0+ function fn:uri-collection. :param default_resource_collection: this is the sequence of URIs used when \ fn:uri-collection is called with no arguments. :param allow_environment: defines if the access to system environment is allowed, \ for default is `False`. Used by the XPath 3.0+ functions fn:environment-variable \ and fn:available-environment-variables. """ _etree: Optional[ModuleType] = None _schema: Optional['AbstractSchemaProxy'] = None root: Optional[RootNodeType] document: Optional[DocumentNode] item: ItemType variables: Dict[str, ValueType] documents: Optional[Dict[str, DocumentNode]] = None collections: Optional[Dict[str, List[XPathNode]]] = None default_collection: Optional[List[XPathNode]] = None __slots__ = ('document', 'root', 'item', 'namespaces', 'size', 'position', 'variables', 'axis', '__dict__') def __init__(self, root: Optional[RootArgType] = None, namespaces: Optional[NamespacesType] = None, uri: Optional[str] = None, fragment: Optional[bool] = None, item: Optional[ItemArgType] = None, position: int = 1, size: int = 1, axis: Optional[str] = None, schema: Optional['AbstractSchemaProxy'] = None, variables: Optional[Dict[str, InputType[ItemArgType]]] = None, current_dt: Optional[datetime.datetime] = None, timezone: Optional[Union[str, Timezone]] = None, documents: Optional[Dict[str, RootArgType]] = None, collections: 
Optional[Dict[str, CollectionArgType]] = None, default_collection: CollectionArgType = None, text_resources: Optional[Dict[str, str]] = None, resource_collections: Optional[Dict[str, List[str]]] = None, default_resource_collection: Optional[str] = None, allow_environment: bool = False, default_language: Optional[str] = None, default_calendar: Optional[str] = None, default_place: Optional[str] = None) -> None: if namespaces: self.namespaces = {k: v for k, v in namespaces.items()} else: self.namespaces = {} if root is not None: self.root = get_node_tree(root, self.namespaces, uri, fragment) if item is not None: self.item = self.get_context_item(item, self.namespaces) else: self.item = self.root if schema is not None and not self.root.is_schema_node: self.root.apply_schema(schema) elif item is not None: self.root = None self.item = self.get_context_item(item, self.namespaces, uri, fragment) else: raise ElementPathTypeError("Missing both the root node and the context item!") if isinstance(self.root, DocumentNode): self.document = self.root elif fragment is None and \ isinstance(self.root, ElementNode) and \ is_etree_element_instance(self.root.obj): # Creates a dummy document that will be not included in results self.document = self.root.get_document_node(replace=False, as_parent=False) else: self.document = None self.position = position self.size = size self.axis = axis if timezone is None or isinstance(timezone, Timezone): self.timezone = timezone else: self.timezone = Timezone.fromstring(timezone) self.current_dt = current_dt or datetime.datetime.now(tz=self.timezone) if documents is not None: # Assume that are all documents because type checking is done by fn:doc(). 
self.documents = { k: cast(DocumentNode, get_node_tree(v, self.namespaces, k)) if v is not None else v for k, v in documents.items() } self.schema = schema self.variables = {} if variables is not None: for varname, value in variables.items(): self.variables[varname] = self.get_value(value, self.namespaces) if collections is not None: self.collections = {k: self.get_collection(v) for k, v in collections.items()} if default_collection is not None: self.default_collection = self.get_collection(default_collection) self.text_resources = text_resources if text_resources is not None else {} self.resource_collections = resource_collections self.default_resource_collection = default_resource_collection self.allow_environment = allow_environment self.default_language = None if default_language is None else Language(default_language) self.default_calendar = default_calendar self.default_place = default_place def __repr__(self) -> str: if self.root is not None: return f'{self.__class__.__name__}(root={self.root.obj})' elif isinstance(self.item, XPathNode): return f'{self.__class__.__name__}(item={self.item.obj})' else: return f'{self.__class__.__name__}(item={self.item!r})' def __copy__(self) -> 'XPathContext': obj: XPathContext = object.__new__(self.__class__) obj.__dict__.update(self.__dict__) obj.document = self.document obj.root = self.root obj.item = self.item obj.size = self.size obj.position = self.position obj.axis = None obj.namespaces = {k: v for k, v in self.namespaces.items()} obj.variables = {k: v for k, v in self.variables.items()} return obj @cached_property def etree(self) -> ModuleType: if isinstance(self.root, (DocumentNode, ElementNode)): module_name = self.root.obj.__class__.__module__ elif isinstance(self.item, (DocumentNode, ElementNode, CommentNode, ProcessingInstructionNode)): module_name = self.item.obj.__class__.__module__ else: module_name = 'xml.etree.ElementTree' if module_name in ('lxml.etree', 'lxml.html'): return 
importlib.import_module('lxml.etree') else: return importlib.import_module('xml.etree.ElementTree') @property def schema(self) -> Optional['AbstractSchemaProxy']: return self._schema @schema.setter def schema(self, schema: Optional['AbstractSchemaProxy']) -> None: self._schema = schema if schema is not None: if self.root is not None: self.root.apply_schema(schema) elif isinstance(self.item, (DocumentNode, ElementNode, AttributeNode)): self.item.apply_schema(schema) def get_root(self, node: Any) -> Optional[RootNodeType]: if isinstance(self.root, (DocumentNode, ElementNode)): if any(node is x for x in self.root.iter_lazy()): return self.root if self.documents is not None: for uri, doc in self.documents.items(): if doc is not None and any(node is x for x in doc.iter_lazy()): return doc return None def is_document(self) -> bool: return isinstance(self.document, DocumentNode) def is_fragment(self) -> bool: return self.document is None and self.root is not None def is_rooted_subtree(self) -> bool: return self.root is not None and isinstance(self.root.parent, ElementNode) def is_principal_node_kind(self) -> bool: if self.axis == 'attribute': return isinstance(self.item, AttributeNode) elif self.axis == 'namespace': return isinstance(self.item, NamespaceNode) else: return isinstance(self.item, ElementNode) def get_context_item(self, item: ItemArgType, namespaces: Optional[NamespacesType] = None, uri: Optional[str] = None, fragment: Optional[bool] = None) -> ItemType: """ Checks the item and returns an item suitable for XPath processing. For XML trees and elements try a match with an existing node in the context. If it fails then builds a new node using also the provided optional arguments. 
""" if isinstance(item, (XPathNode, AnyAtomicType)): return item elif is_etree_document(item): if self.root is not None and item is self.root.obj: return self.root if self.documents: for doc in self.documents.values(): if doc is not None and item is doc.obj: return doc elif is_etree_element(item): try: return self.root.elements[item] # type: ignore[index,union-attr] except (TypeError, KeyError, AttributeError): pass if self.documents: for doc in self.documents.values(): if doc is not None and doc.elements is not None and item in doc.elements: return doc.elements[item] # type: ignore[index] if callable(item.tag): # type: ignore[union-attr] if item.tag.__name__ == 'Comment': # type: ignore[union-attr] return CommentNode(cast(ElementProtocol, item)) else: return ProcessingInstructionNode(cast(ElementProtocol, item)) elif not isinstance(item, Token) or not callable(item): msg = f"Unexpected type {type(item)} for context item" raise ElementPathTypeError(msg) else: return item return get_node_tree( root=cast(Union[ElementProtocol, DocumentProtocol], item), namespaces=namespaces, uri=uri, fragment=fragment ) def get_value(self, item: FunctionArgType, *args: Any, **kwargs: Any) -> ValueType: if item is None: return [] elif not isinstance(item, (list, tuple)): return self.get_context_item(item, *args, **kwargs) return [self.get_context_item(x, *args, **kwargs) for x in item] def get_collection(self, items: CollectionArgType) -> List[XPathNode]: if items is None: return [] elif isinstance(items, (list, tuple)): return [x for x in map(self.get_context_item, items) if isinstance(x, XPathNode)] else: item = self.get_context_item(items) return [item] if isinstance(item, XPathNode) else [] def inner_focus_select(self, token: Union['XPathToken', 'XPathAxis'], predicate: bool = False) \ -> Iterator[ItemType]: """Apply the token's selector with an inner focus.""" status = self.item, self.size, self.position, self.axis if predicate: results: List[ItemType] = [] for item in 
token.select(copy(self)): # With predicate select nodes that have not single list value # must be replaced by typed values. if isinstance(item, (AttributeNode, ElementNode)) and item.is_list: results.extend(v for v in item.iter_typed_values) continue results.append(item) else: results = [x for x in token.select(copy(self))] self.axis = None if token.label == 'axis' and cast('XPathAxis', token).reverse_axis: self.size = self.position = len(results) for self.item in results: yield self.item self.position -= 1 else: self.size = len(results) for self.position, self.item in enumerate(results, start=1): yield self.item self.item, self.size, self.position, self.axis = status def iter_product(self, selectors: Sequence[Callable[[Any], Any]], varnames: Optional[Sequence[str]] = None) -> Iterator[Any]: """ Iterator for cartesian products of selectors. :param selectors: a sequence of selector generator functions. :param varnames: a sequence of variables for storing the generated values. """ if varnames is None: varnames = [] iterators = [x(self) for x in selectors] dimension = len(iterators) prod = [None] * dimension max_index = dimension - 1 k = 0 while True: for value in iterators[k]: try: self.variables[varnames[k]] = value except IndexError: pass prod[k] = value if k == max_index: yield tuple(prod) else: k += 1 break else: if not k: return iterators[k] = selectors[k](self) k -= 1 ## # Context item iterators for axis def iter_self(self) -> Iterator[ItemType]: """Iterator for 'self' axis and '.' 
shortcut.""" if self.item is not None: status = self.axis self.axis = 'self' yield self.item self.axis = status def iter_attributes(self) -> Iterator[AttributeNode]: """Iterator for 'attribute' axis and '@' shortcut.""" status: Any if isinstance(self.item, AttributeNode): status = self.axis self.axis = 'attribute' yield self.item self.axis = status return elif isinstance(self.item, ElementNode): status = self.item, self.axis self.axis = 'attribute' for self.item in self.item.attributes: yield self.item self.item, self.axis = status def iter_children_or_self(self) -> Iterator[ItemType]: """Iterator for 'child' forward axis and '/' step.""" if self.item is not None: if self.axis is not None: yield self.item elif isinstance(self.item, (ElementNode, DocumentNode)): _status = self.item, self.axis self.axis = 'child' if self.item is self.document and self.root is not self.document: if self.root is not None: yield self.root else: for self.item in self.item: yield self.item self.item, self.axis = _status def iter_matching_nodes(self, name: str, default_namespace: Optional[str] = None) \ -> Iterator[Union[AttributeNode, ElementNode]]: """ Iterator for matching elements or attributes. For default uses 'child' forward axis if no axis is active, otherwise tests the current item. """ if self.axis is not None: if isinstance(self.item, (AttributeNode, ElementNode)): if self.item.match_name(name, default_namespace): yield self.item elif isinstance(self.item, (ElementNode, DocumentNode)): _status = self.item, self.axis self.axis = 'child' if self.item is self.document and isinstance(self.root, ElementNode): if self.root.match_name(name, default_namespace): yield self.root else: for self.item in self.item: if self.item.match_name(name, default_namespace): assert isinstance(self.item, ElementNode) yield self.item self.item, self.axis = _status def iter_parent(self) -> Iterator[RootNodeType]: """Iterator for 'parent' reverse axis and '..' 
shortcut."""
        if isinstance(self.item, XPathNode):
            # A stop rule for non-rooted fragments (e.g. root is a schema element)
            if self.document is not None or self.item is not self.root:
                if self.item.parent is not None:
                    status = self.item, self.axis
                    self.axis = 'parent'
                    self.item = self.item.parent
                    yield self.item
                    self.item, self.axis = status

    def iter_siblings(self, axis: Optional[str] = None) -> Iterator[ChildNodeType]:
        """
        Iterator for 'following-sibling' forward axis and 'preceding-sibling' reverse axis.

        :param axis: the context axis, default is 'following-sibling'.
        """
        if isinstance(self.item, XPathNode):
            if self.document is not None or self.item is not self.root:
                item = self.item
                if item.parent is not None:
                    status = self.item, self.axis
                    self.axis = axis or 'following-sibling'

                    if axis == 'preceding-sibling':
                        for child in item.parent:  # pragma: no cover
                            if child is item:
                                break
                            self.item = child
                            yield child
                    else:
                        follows = False
                        for child in item.parent:
                            if follows:
                                self.item = child
                                yield child
                            elif child is item:
                                follows = True
                    self.item, self.axis = status

    def iter_descendants(self, axis: Optional[str] = None) -> Iterator[Union[None, XPathNode]]:
        """
        Iterator for 'descendant' and 'descendant-or-self' forward axes and '//' shortcut.

        :param axis: the context axis, by default no explicit axis is set.
        """
        if isinstance(self.item, (DocumentNode, ElementNode)):
            status = self.item, self.axis
            self.axis = axis
            for self.item in self.item.iter_descendants(with_self=axis != 'descendant'):
                yield self.item
            self.item, self.axis = status
        elif axis != 'descendant' and isinstance(self.item, XPathNode):
            self.axis, axis = axis, self.axis
            yield self.item
            self.axis = axis

    def iter_ancestors(self, axis: Optional[str] = None) -> Iterator[XPathNode]:
        """
        Iterator for 'ancestor' and 'ancestor-or-self' reverse axes.

        :param axis: the context axis, default is 'ancestor'.
""" if isinstance(self.item, XPathNode): status = self.item, self.axis self.axis = axis or 'ancestor' ancestors: List[XPathNode] = [] if axis == 'ancestor-or-self': ancestors.append(self.item) if self.document is not None or self.item is not self.root: parent = self.item.parent while parent is not None: ancestors.append(parent) if parent is self.root and self.document is None: break parent = parent.parent for self.item in reversed(ancestors): yield self.item self.item, self.axis = status def iter_preceding(self) -> Iterator[Union[DocumentNode, ChildNodeType]]: """Iterator for 'preceding' reverse axis.""" ancestors: Set[RootNodeType] item: XPathNode if isinstance(self.item, XPathNode): if self.document is not None or self.item is not self.root: item = self.item if (root := item.parent) is not None: status = self.item, self.axis self.axis = 'preceding' ancestors = {root} while root.parent is not None: if root is self.root and self.document is None: break root = root.parent ancestors.add(root) for self.item in root.iter_descendants(): if self.item is item: break if self.item not in ancestors: yield self.item self.item, self.axis = status def iter_followings(self) -> Iterator[ChildNodeType]: """Iterator for 'following' forward axis.""" if isinstance(self.item, ElementNode): status = self.item, self.axis self.axis = 'following' descendants = set(self.item.iter_descendants()) position = self.item.position root = self.item while isinstance(root.parent, ElementNode) and root is not self.root: root = root.parent for item in root.iter_descendants(with_self=False): if position < item.position and item not in descendants: self.item = item yield cast(ChildNodeType, self.item) self.item, self.axis = status class XPathSchemaContext(XPathContext): """ The XPath dynamic context base class for schema bounded parsers. Use this class as dynamic context for schema instances in order to perform a schema-based type checking during the static analysis phase. 
Don't use this as dynamic context on XML instances. """ root: ElementNode def __init__(self, *args: Any, **kwargs: Any) -> None: super().__init__(*args, **kwargs) if self.schema is None: if isinstance(self.root, SchemaElementNode): try: self._schema = self.root.obj.xpath_proxy except AttributeError: pass @property def schema(self) -> Optional['AbstractSchemaProxy']: return self._schema @schema.setter def schema(self, schema: Optional['AbstractSchemaProxy']) -> None: self._schema = schema def iter_matching_nodes(self, name: str, default_namespace: Optional[str] = None) \ -> Iterator[Union[AttributeNode, ElementNode]]: """ Iterator for matching elements or attributes. For default uses 'child' forward axis if no axis is active, otherwise tests the current item. """ if self.axis is not None: if isinstance(self.item, (AttributeNode, ElementNode)): if self.item.match_name(name, default_namespace): if not self.item.name: if isinstance(self.item, ElementNode): for element_node in self.root: assert isinstance(element_node, ElementNode) if element_node.match_name(name, default_namespace): self.item = element_node break else: for attribute_node in self.root.attributes: if attribute_node.match_name(name, default_namespace): self.item = attribute_node break yield self.item elif isinstance(self.item, ElementNode): _status = self.item, self.axis self.axis = 'child' for self.item in self.item: if self.item.match_name(name, default_namespace): if not self.item.name: for element_node in self.root: if element_node.match_name(name, default_namespace): self.item = element_node break assert isinstance(self.item, ElementNode) yield self.item self.item, self.axis = _status sissaschool-elementpath-d3688c7/elementpath/xpath_nodes.py000066400000000000000000001547361476131650400241430ustar00rootroot00000000000000# # Copyright (c), 2018-2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import importlib from collections import deque from urllib.parse import urljoin from typing import cast, Any, Dict, List, Optional, Tuple, TYPE_CHECKING, Union from xml.etree import ElementTree from elementpath._typing import Deque, Iterator from elementpath.exceptions import ElementPathRuntimeError, \ ElementPathValueError, ElementPathKeyError from elementpath.aliases import NamespacesType, NsmapType, SequenceType from elementpath.datatypes import UntypedAtomic, AtomicType, AnyURI, QName from elementpath.namespaces import XML_NAMESPACE, XML_BASE, XSI_NIL, \ XSD_ANY_TYPE, XSD_ANY_SIMPLE_TYPE, XSD_ANY_ATOMIC_TYPE, XSI_TYPE, \ XML_ID, XSD_IDREF, XSD_IDREFS, XSD_UNTYPED, XSD_UNTYPED_ATOMIC, \ XPATH_FUNCTIONS_NAMESPACE, get_expanded_name from elementpath.protocols import ElementProtocol, XsdElementProtocol, \ XsdAttributeProtocol, XsdTypeProtocol, DocumentType, ElementType, \ SchemaElemType, CommentType, ProcessingInstructionType from elementpath.helpers import match_wildcard, is_absolute_uri from elementpath.decoder import get_atomic_sequence from elementpath.etree import etree_iter_strings, is_etree_element_instance if TYPE_CHECKING: from elementpath.schema_proxy import AbstractSchemaProxy __all__ = ['TypedNodeType', 'ParentNodeType', 'ChildNodeType', 'ElementMapType', 'XPathNode', 'NamespaceNode', 'AttributeNode', 'TextAttributeNode', 'SchemaAttributeNode', 'TextNode', 'CommentNode', 'ProcessingInstructionNode', 'ElementNode', 'EtreeElementNode', 'LazyElementNode', 'SchemaElementNode', 'DocumentNode', 'EtreeDocumentNode', 'RootNodeType', 'RootArgType'] _EMPTY_NAME_PATH = f'*[Q{{{XPATH_FUNCTIONS_NAMESPACE}}}local-name()=""]' _XSD_SPECIAL_TYPES = {XSD_ANY_TYPE, XSD_ANY_SIMPLE_TYPE, XSD_ANY_ATOMIC_TYPE} TypedNodeType = Union['AttributeNode', 'ElementNode'] ParentNodeType = Union['DocumentNode', 'ElementNode'] ChildNodeType = 
Union['TextNode', 'ElementNode', 'CommentNode', 'ProcessingInstructionNode'] ElementMapType = Dict[ElementType, 'ElementNode'] # TODO for v5.0: use an internal shared object for storing same data once. This # will replace position argument and some attributes in element nodes. Position # argument will be kept only for namespace and attribute nodes. class XPathNodeTree: """ Status of the node tree structure, shared between nodes. """ root: ParentNodeType elements: ElementMapType namespaces: NamespacesType schema: Optional['AbstractSchemaProxy'] uri: Optional[str] total: int __slots__ = ('root', 'namespaces', 'uri', 'elements', 'schema', 'total') def __init__(self, root: ParentNodeType, namespaces: Optional[NamespacesType] = None, uri: Optional[str] = None) -> None: self.root = root self.namespaces = namespaces if namespaces is not None else {} self.uri = uri self.elements = {} self.schema = None self.total = 1 ### # XQuery and XPath Data Model: https://www.w3.org/TR/xpath-datamodel/ # # Note: in this implementation empty sequence return value is replaced by None. # # XPath has seven kinds of nodes: # # element, attribute, text, namespace, processing-instruction, comment, document ### class XPathNode: """ The base class of all XPath nodes. In the base class and in other intermediate derivation string and typed values are not implemented. Use these classes only for type checking and for wrapping other types in a custom XPath node types. 
""" __slots__ = ('name', 'obj', 'parent', 'position') ### # XDM accessors @property def attributes(self) -> Optional[List['AttributeNode']]: return None @property def base_uri(self) -> Optional[str]: return self.parent.base_uri if self.parent is not None else None children: Optional[List[ChildNodeType]] @property def document_uri(self) -> Optional[str]: return None @property def is_id(self) -> Optional[bool]: return None @property def is_idrefs(self) -> Optional[bool]: return None @property def namespace_nodes(self) -> Optional[List['NamespaceNode']]: return None @property def nilled(self) -> Optional[bool]: return None @property def node_kind(self) -> str: raise NotImplementedError() @property def node_name(self) -> Optional[QName]: name: Optional[str] = getattr(self, 'name', None) if name is None: return None elif not name.startswith('{'): return QName(None, name) try: namespace, local = name[1:].split('}') except ValueError: raise ElementPathValueError(f'invalid name format for {self!r}') else: if namespace == XML_NAMESPACE: return QName(namespace, f'xml:{local}') if isinstance(self, ElementNode): nsmap = self.nsmap elif isinstance(self.parent, ElementNode): nsmap = self.parent.nsmap else: nsmap = {} for prefix, ns in nsmap.items(): if namespace == ns: if not prefix: return QName(namespace, local) return QName(namespace, f"{prefix}:{local}") raise ElementPathKeyError(f'missing namespace prefix mapping in {self!r}') parent: Optional[ParentNodeType] @property def type_name(self) -> Optional[str]: return None @property def string_value(self) -> str: raise NotImplementedError() @property def typed_value(self) -> SequenceType[AtomicType]: raise NotImplementedError() @staticmethod def unparsed_entity_public_id(name: str) -> Optional[str]: return None @staticmethod def unparsed_entity_system_id(name: str) -> Optional[AnyURI]: return None ### # Other properties and methods name: Optional[str] # node name obj: object # the object wrapped in the node position: int # 
position of the node, for document total order @property def value(self) -> object: """Access to wrapped object using the old API.""" return self.obj @property def root_node(self) -> 'XPathNode': return self if self.parent is None else self.parent.root_node @property def path(self) -> str: """Returns the node path in XPath 3.0+ format.""" return '' @property def extended_path(self) -> str: """Returns the node path in extended format.""" return self.path.replace('Q{}', '').replace('Q{', '{') @property def qname_path(self) -> str: """Returns the node path with names in prefixed QName format.""" path = self.path if isinstance(self, ElementNode): for prefix, namespace in self.nsmap.items(): path = path.replace(f'Q{{{namespace}}}', f'{prefix}:') elif isinstance(self.parent, ElementNode): for prefix, namespace in self.parent.nsmap.items(): path = path.replace(f'Q{{{namespace}}}', f'{prefix}:') path = path.replace('Q{}', '') if 'Q{' not in path: return path raise ElementPathKeyError(f'missing namespace prefix mapping in {path}') @property def iter_typed_values(self) -> Iterator[AtomicType]: raise NotImplementedError() @property def is_schema_node(self) -> Optional[bool]: return None @property def is_typed(self) -> Optional[bool]: return None @property def is_extended(self) -> Optional[bool]: return None @property def is_list(self) -> Optional[bool]: return None def apply_schema(self, schema: 'AbstractSchemaProxy') -> None: """Set XSD types for elements and attribute nodes from schema proxy instance.""" if self.parent is not None: self.parent.apply_schema(schema) def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool: """ Returns `True` if the argument is matching the name of the node, `False` otherwise. Raises a ValueError if the argument is used, but it's in a wrong format. :param name: a fully qualified name, a local name or a wildcard. The accepted \ wildcard formats are '*', '*:*', '*:local-name' and '{namespace}*'. 
        :param default_namespace: the default namespace for matching element names. \
        The default is no-namespace.
        """
        return False

    def get_child_position(self, child: ChildNodeType) -> int:
        pos = 0
        if self.children:
            for c in self.children:
                if isinstance(child, ElementNode):
                    if c.name == child.name:
                        pos += 1
                elif isinstance(c, child.__class__):
                    pos += 1

                if c is child:
                    break
        return pos


###
# NAMESPACE NODES

class NamespaceNode(XPathNode):
    """
    A class for processing XPath namespace nodes.

    :param prefix: the namespace prefix.
    :param uri: the namespace URI.
    :param parent: the parent element node.
    :param position: the position of the node in the document.
    """
    obj: str
    parent: Optional['ElementNode']

    __slots__ = ()

    def __init__(self,
                 prefix: Optional[str],
                 uri: str,
                 parent: Optional['ElementNode'] = None,
                 position: int = 1) -> None:
        self.name = prefix
        self.obj = uri
        self.parent = parent
        self.position = position

    def __repr__(self) -> str:
        return '%s(prefix=%r, uri=%r)' % (self.__class__.__name__, self.name, self.obj)

    @property
    def prefix(self) -> Optional[str]:
        return self.name

    @property
    def uri(self) -> str:
        return self.obj

    value = uri

    def as_item(self) -> Tuple[Optional[str], str]:
        return self.name, self.obj

    @property
    def name_path(self) -> str:
        return self.prefix or _EMPTY_NAME_PATH

    @property
    def path(self) -> str:
        if self.parent is None:
            return f'/namespace::{self.name_path}'
        elif isinstance(self.parent, ElementNode):
            return f"{self.parent.path}/namespace::{self.name_path}"
        return f"/namespace::{self.name_path}"

    @property
    def node_kind(self) -> str:
        return 'namespace'

    @property
    def node_name(self) -> Optional[QName]:
        return None if not self.name else QName(None, self.name)

    @property
    def string_value(self) -> str:
        return self.obj

    @property
    def iter_typed_values(self) -> Iterator[str]:
        yield self.obj


###
# ATTRIBUTE NODES

class AttributeNode(XPathNode):
    """
    Base class for XPath attribute nodes, used only for type checking.
""" name: Optional[str] parent: Optional['ElementNode'] schema: Optional['AbstractSchemaProxy'] xsd_type: Optional[XsdTypeProtocol] __slots__ = ('xsd_type',) def __new__(cls, *args: Any, **kwargs: Any) -> 'AttributeNode': if cls is AttributeNode: return object.__new__(TextAttributeNode) return object.__new__(cls) @property def uri_qualified_name(self) -> Optional[str]: """The URI qualified name of the attribute.""" if not self.name: return self.name elif self.name[0] == '{': return f'Q{self.name}' else: return self.name @property def name_path(self) -> str: return self.uri_qualified_name or _EMPTY_NAME_PATH @property def path(self) -> str: if self.parent is None: return f'/@{self.name_path}' elif isinstance(self.parent, ElementNode): return f"{self.parent.path}/@{self.name_path}" return f"/@{self.name_path}" @property def is_typed(self) -> bool: return self.xsd_type is not None @property def is_list(self) -> bool: return self.xsd_type is not None and self.xsd_type.is_list() def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool: if self.name is None: return False return self.name == name or '*' in name and match_wildcard(self.name, name) @property def base_uri(self) -> Optional[str]: return self.parent.base_uri if self.parent is not None else None @property def is_id(self) -> bool: return self.name == XML_ID or self.xsd_type is not None and self.xsd_type.is_key() @property def is_idrefs(self) -> bool: if self.xsd_type is None: return False root_type = self.xsd_type.root_type return root_type.name == XSD_IDREF or root_type.name == XSD_IDREFS @property def node_kind(self) -> str: return 'attribute' @property def string_value(self) -> str: raise NotImplementedError() @property def type_name(self) -> Optional[str]: return XSD_UNTYPED_ATOMIC if self.xsd_type is None else self.xsd_type.name @property def typed_value(self) -> SequenceType[AtomicType]: values = [v for v in self.iter_typed_values] if len(values) == 1: return values[0] else: return 
values @property def iter_typed_values(self) -> Iterator[AtomicType]: raise NotImplementedError() class TextAttributeNode(AttributeNode): """ Class for processing XPath attribute nodes. :param name: the attribute name. :param value: the string value of the attribute. :param parent: the parent element node. :param position: the position of the node in the document. """ name: str obj: str parent: Optional['EtreeElementNode'] __slots__ = () def __init__(self, name: str, value: str, parent: Optional['EtreeElementNode'] = None, position: int = 1) -> None: self.name = name self.obj = value self.parent = parent self.position = position self.xsd_type = None def __repr__(self) -> str: return '%s(name=%r, value=%r)' % (self.__class__.__name__, self.name, self.obj) @property def value(self) -> str: return self.obj def as_item(self) -> Tuple[str, str]: return self.name, self.obj def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool: return self.name == name or '*' in name and match_wildcard(self.name, name) @property def string_value(self) -> str: return self.obj @property def iter_typed_values(self) -> Iterator[AtomicType]: if self.parent is not None: yield from get_atomic_sequence(self.xsd_type, self.obj, self.parent.nsmap) else: yield from get_atomic_sequence(self.xsd_type, self.obj) def apply_schema(self, schema: 'AbstractSchemaProxy') -> None: if self.parent is not None: self.parent.apply_schema(schema) elif (xsd_attribute := schema.get_attribute(self.name)) is not None: self.xsd_type = xsd_attribute.type else: self.xsd_type = None class SchemaAttributeNode(AttributeNode): """A class for processing XML Schema attribute nodes.""" name: Optional[str] obj: XsdAttributeProtocol parent: Optional['ElementNode'] __slots__ = () def __init__(self, attr: XsdAttributeProtocol, parent: Optional['ElementNode'] = None, position: int = 1): self.name = attr.name self.obj = attr self.parent = parent self.position = position self.xsd_type = attr.type def 
__repr__(self) -> str: return '%s(attr=%r)' % (self.__class__.__name__, self.obj) @property def xsd_attribute(self) -> XsdAttributeProtocol: return self.obj value = xsd_attribute def as_item(self) -> Tuple[Optional[str], object]: return self.name, self.obj @property def string_value(self) -> str: return str(get_atomic_sequence(self.xsd_type)) @property def iter_typed_values(self) -> Iterator[AtomicType]: yield from get_atomic_sequence(self.xsd_type) def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool: if not self.name: return self.obj.is_matching(name, default_namespace) elif '*' in name: return match_wildcard(self.name, name) else: return self.name == name @property def is_schema_node(self) -> bool: return True ### # TEXT NODES class TextNode(XPathNode): """ A class for processing XPath text nodes. An Element's property (elem.text or elem.tail) with a `None` value is not a text node. :param content: a string value. :param parent: the parent element node. :param position: the position of the node in the document. 
""" name: None obj: str parent: Optional['ElementNode'] children: None = None __slots__ = () def __init__(self, content: str, parent: Optional['ElementNode'] = None, position: int = 1) -> None: self.name = None self.obj = content self.parent = parent self.position = position def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, self.obj) @property def content(self) -> str: return self.obj value = content @property def path(self) -> str: if self.parent is None: return '/text()[1]' pos = self.parent.get_child_position(self) if isinstance(self.parent, ElementNode): return f"{self.parent.path}/text()[{pos}]" return f"/text()[{pos}]" ### # Text node accessors @property def base_uri(self) -> Optional[str]: return self.parent.base_uri if self.parent is not None else None @property def node_kind(self) -> str: return 'text' @property def string_value(self) -> str: return self.obj @property def type_name(self) -> Optional[str]: return XSD_UNTYPED_ATOMIC @property def typed_value(self) -> SequenceType[AtomicType]: return UntypedAtomic(self.obj) @property def iter_typed_values(self) -> Iterator[UntypedAtomic]: yield UntypedAtomic(self.obj) ### # COMMENT NODES class CommentNode(XPathNode): """ A class for processing XPath comment nodes. :param content: the wrapped Comment Element or a string. :param parent: the parent node. :param position: the position of the node in the document. 
""" name: None obj: CommentType __slots__ = () def __init__(self, content: Union[CommentType, str], parent: Union[ParentNodeType, None] = None, position: int = 1) -> None: self.name = None if isinstance(content, str): self.obj = ElementTree.Comment(content) else: self.obj = content self.parent = parent self.position = position def __repr__(self) -> str: return '%s(%r)' % (self.__class__.__name__, self.obj.text or '') @property def content(self) -> CommentType: return self.obj elem = value = content @property def path(self) -> str: if self.parent is None: return '/comment()[1]' pos = self.parent.get_child_position(self) if isinstance(self.parent, ElementNode): return f"{self.parent.path}/comment()[{pos}]" return f"/comment()[{pos}]" @property def base_uri(self) -> Optional[str]: return self.parent.base_uri if self.parent is not None else None @property def node_kind(self) -> str: return 'comment' @property def string_value(self) -> str: return self.obj.text or '' @property def typed_value(self) -> str: return self.string_value @property def iter_typed_values(self) -> Iterator[str]: yield self.string_value ### # PROCESSING INSTRUCTION NODES class ProcessingInstructionNode(XPathNode): """ A class for XPath processing instructions nodes. :param target: the wrapped Processing Instruction object or a string. :param content: an optional string, used if *target* is a string. :param parent: the parent element node. :param position: the position of the node in the document. 
    """
    name: str
    obj: ProcessingInstructionType

    __slots__ = ()

    def __init__(self,
                 target: Union[str, ProcessingInstructionType],
                 content: Optional[str] = None,
                 parent: Optional[ParentNodeType] = None,
                 position: int = 1) -> None:
        if isinstance(target, str):
            self.name = target
            self.obj = ElementTree.ProcessingInstruction(self.name, content)
        else:
            if hasattr(target, 'target'):
                self.name = cast(str, target.target)  # lxml PI
            else:
                self.name = (target.text or '').partition(' ')[0]
            self.obj = target

        self.parent = parent
        self.position = position

    def __repr__(self) -> str:
        return '%s(target=%r, content=%r)' % (self.__class__.__name__, self.name, self.content)

    @property
    def target(self) -> str:
        return self.name

    @property
    def content(self) -> str:
        if hasattr(self.obj, 'target'):
            return self.obj.text or ''
        else:
            return (self.obj.text or '').partition(' ')[-1]

    @property
    def elem(self) -> ProcessingInstructionType:
        return self.obj

    value = elem

    @property
    def path(self) -> str:
        if self.parent is None:
            return f'/processing-instruction({self.name})[1]'
        pos = self.parent.get_child_position(self)
        if isinstance(self.parent, ElementNode):
            return f"{self.parent.path}/processing-instruction({self.name})[{pos}]"
        return f"/processing-instruction({self.name})[{pos}]"

    @property
    def base_uri(self) -> Optional[str]:
        return self.parent.base_uri if self.parent is not None else None

    @property
    def node_kind(self) -> str:
        return 'processing-instruction'

    @property
    def node_name(self) -> QName:
        return QName(None, self.name)

    @property
    def string_value(self) -> str:
        return self.content

    @property
    def typed_value(self) -> SequenceType[AtomicType]:
        return self.content

    @property
    def iter_typed_values(self) -> Iterator[str]:
        yield self.content

    text = string_value


###
# ELEMENT NODES

class ElementNode(XPathNode):
    """
    Base class for XPath element nodes, used only for type checking.

    Element nodes use lazy properties to reduce the average load of tree processing.
""" name: Optional[str] obj: object nsmap: Union[NsmapType, NamespacesType] children: List[ChildNodeType] parent: Optional[ParentNodeType] xsd_type: Optional[XsdTypeProtocol] # Lazy protected attributes _uri: str _schema: 'AbstractSchemaProxy' _elements: ElementMapType _namespace_nodes: List[NamespaceNode] _attributes: List[AttributeNode] __slots__ = ('children', 'nsmap', 'xsd_type', '_uri', '_schema', '_elements', '_namespace_nodes', '_attributes') def __new__(cls, *args: Any, **kwargs: Any) -> 'ElementNode': if cls is ElementNode: return object.__new__(EtreeElementNode) return object.__new__(cls) def __repr__(self) -> str: return '%s(elem=%r)' % (self.__class__.__name__, self.obj) def __getitem__(self, i: Union[int, slice]) -> Union[ChildNodeType, List[ChildNodeType]]: return self.children[i] def __len__(self) -> int: return len(self.children) def __iter__(self) -> Iterator[ChildNodeType]: yield from self.children @property def uri_qualified_name(self) -> Optional[str]: """The URI qualified name of the element.""" if not self.name: return self.name elif self.name[0] == '{': return f'Q{self.name}' else: return f'Q{{}}{self.name}' @property def attributes(self) -> List[AttributeNode]: return [] @property def base_uri(self) -> Optional[str]: base_uri = self._uri.strip() if hasattr(self, '_uri') else None if self.parent is None: return base_uri elif base_uri is None: return self.parent.base_uri else: return urljoin(self.parent.base_uri or '', base_uri) @property def is_id(self) -> bool: return self.name == XML_ID or self.xsd_type is not None and self.xsd_type.is_key() @property def is_idrefs(self) -> bool: if self.xsd_type is None: return False root_type = self.xsd_type.root_type return root_type.name == XSD_IDREF or root_type.name == XSD_IDREFS @property def namespace_nodes(self) -> List[NamespaceNode]: if not hasattr(self, '_namespace_nodes'): # Lazy generation of namespace nodes of the element position = self.position + 1 self._namespace_nodes = 
[NamespaceNode('xml', XML_NAMESPACE, self, position)] position += 1 if self.nsmap: for pfx, uri in self.nsmap.items(): if pfx != 'xml': self._namespace_nodes.append(NamespaceNode(pfx, uri, self, position)) position += 1 return self._namespace_nodes @property def nilled(self) -> bool: return False @property def node_kind(self) -> str: return 'element' @property def string_value(self) -> str: raise NotImplementedError() @property def type_name(self) -> Optional[str]: return XSD_UNTYPED if self.xsd_type is None else self.xsd_type.name @property def typed_value(self) -> SequenceType[AtomicType]: values = [v for v in self.iter_typed_values] if len(values) == 1: return values[0] else: return values @property def iter_typed_values(self) -> Iterator[AtomicType]: raise NotImplementedError() @property def is_list(self) -> bool: return self.xsd_type is not None and self.xsd_type.is_list() @property def uri(self) -> Optional[str]: return getattr(self, '_uri', None) @uri.setter def uri(self, uri: str) -> None: self._uri = uri @property def schema(self) -> Optional['AbstractSchemaProxy']: root_node = self while isinstance(root_node.parent, EtreeElementNode): root_node = root_node.parent return getattr(root_node, '_schema', None) @schema.setter def schema(self, schema: 'AbstractSchemaProxy') -> None: root_node = self while isinstance(root_node.parent, EtreeElementNode): root_node = root_node.parent root_node._schema = schema @property def elements(self) -> Optional[ElementMapType]: return getattr(self, '_elements', None) @elements.setter def elements(self, elements: ElementMapType) -> None: self._elements = elements @property def name_path(self) -> str: return self.uri_qualified_name or _EMPTY_NAME_PATH @property def path(self) -> str: if self.parent is None: return f'/{self.name_path}[1]' pos = self.parent.get_child_position(self) if isinstance(self.parent, ElementNode): return f"{self.parent.path}/{self.name_path}[{pos}]" return f"/{self.name_path}[{pos}]" @property def 
default_namespace(self) -> Optional[str]: if None in self.nsmap: return self.nsmap[None] # type: ignore else: return self.nsmap.get('') @property def is_typed(self) -> bool: return self.xsd_type is not None def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool: if self.name is None: return False elif '*' in name: return match_wildcard(self.name, name) elif not name: return not self.name elif name[0] == '{' or not default_namespace: return self.name == name else: return self.name == f'{{{default_namespace}}}{name}' def get_element_node(self, elem: Union[ElementProtocol, SchemaElemType]) \ -> Optional['ElementNode']: if hasattr(self, '_elements'): return self._elements.get(elem) # Fallback if there is not the map of elements but do not expand lazy elements for node in self.iter(): if isinstance(node, ElementNode) and elem is node.obj: return node else: return None def get_document_node(self, replace: bool = True, as_parent: bool = True) -> 'DocumentNode': """ Returns a `DocumentNode` for the element node. If the element belongs to a tree that already has a document root, returns the document, otherwise creates a dummy document. :param replace: if `True` the root element of the tree is replaced by the \ document node. This is usually useful for extended data models (more element \ children, text nodes). :param as_parent: if `True` the root node/s of parent attribute is set with \ the dummy document node, otherwise is set to `None`. 
""" raise NotImplementedError() def iter(self) -> Iterator[XPathNode]: """Iterates the tree building lazy components.""" yield self yield from self.namespace_nodes yield from self.attributes for child in self: if isinstance(child, ElementNode): yield from child.iter() else: yield child iter_document = iter # For backward compatibility def iter_lazy(self) -> Iterator[XPathNode]: """Iterates the tree not including the not built lazy components.""" yield self iterators: Deque[Any] = deque() # slightly faster than list() children: Iterator[Any] = iter(self.children) if hasattr(self, '_namespace_nodes'): yield from self._namespace_nodes if hasattr(self, '_attributes'): yield from self._attributes while True: for child in children: yield child if isinstance(child, ElementNode): if hasattr(child, '_namespace_nodes'): yield from child._namespace_nodes if hasattr(child, '_attributes'): yield from child._attributes if child.children: iterators.append(children) children = iter(child.children) break else: try: children = iterators.pop() except IndexError: return def iter_descendants(self, with_self: bool = True) -> Iterator[ChildNodeType]: if with_self: yield self iterators: Deque[Any] = deque() children: Iterator[Any] = iter(self.children) while True: for child in children: yield child if isinstance(child, ElementNode) and child.children: iterators.append(children) children = iter(child.children) break else: try: children = iterators.pop() except IndexError: return class EtreeElementNode(ElementNode): """ XPath element nodes for wrapping ElementTree elements. :param elem: the wrapped Element or XSD schema/element. :param parent: the parent document node or element node. :param position: the position of the node in the document. :param nsmap: an optional mapping from prefix to namespace URI. 
""" name: str obj: ElementType xsd_element: Optional[XsdElementProtocol] __slots__ = () def __init__(self, elem: ElementType, parent: Optional[ParentNodeType] = None, position: int = 1, nsmap: Union[NsmapType, NamespacesType, None] = None): self.name = elem.tag self.obj = elem self.parent = parent self.position = position self.children = [] self.xsd_type = None if nsmap is not None: self.nsmap = nsmap else: try: self.nsmap = cast(Dict[Any, str], getattr(elem, 'nsmap')) except AttributeError: self.nsmap = {} @property def content(self) -> ElementType: return self.obj elem = value = content @property def attributes(self) -> List[AttributeNode]: if not hasattr(self, '_attributes'): position = self.position + len(self.nsmap) + int('xml' not in self.nsmap) + 1 self._attributes = [ TextAttributeNode(name, value, self, pos) for pos, (name, value) in enumerate(self.obj.attrib.items(), position) ] return self._attributes @property def base_uri(self) -> Optional[str]: base_uri = self.obj.get(XML_BASE) if isinstance(base_uri, str): base_uri = base_uri.strip() elif base_uri is not None: base_uri = '' elif hasattr(self, '_uri'): base_uri = self._uri.strip() if self.parent is None: return base_uri elif base_uri is None: return self.parent.base_uri else: return urljoin(self.parent.base_uri or '', base_uri) @property def nilled(self) -> bool: return self.obj.get(XSI_NIL) in ('true', '1') @property def string_value(self) -> str: if self.xsd_type is not None and self.xsd_type.is_element_only(): # Element-only text content is normalized return ''.join(etree_iter_strings(self.obj, normalize=True)) return ''.join(etree_iter_strings(self.obj)) @property def typed_value(self) -> SequenceType[AtomicType]: values = [v for v in self.iter_typed_values] if len(values) == 1: return values[0] else: return values @property def iter_typed_values(self) -> Iterator[AtomicType]: if self.xsd_type is None or \ self.xsd_type.name in _XSD_SPECIAL_TYPES or \ self.xsd_type.has_mixed_content(): yield 
UntypedAtomic(''.join(etree_iter_strings(self.obj))) elif self.xsd_type.is_element_only(): return elif self.obj.get(XSI_NIL) and getattr(self.xsd_type.parent, 'nillable', None): return elif self.obj.text is not None: yield from get_atomic_sequence(self.xsd_type, self.obj.text, self.nsmap) elif self.obj.get(XSI_NIL) in ('1', 'true'): yield '' else: yield from get_atomic_sequence(self.xsd_type, '') def apply_schema(self, schema: 'AbstractSchemaProxy') -> None: if self.schema is schema and not schema.is_assertion_based(): return self.schema = schema if not schema.is_fully_valid(): element_type = schema.get_type(XSD_ANY_TYPE) attribute_type = schema.get_type(XSD_ANY_SIMPLE_TYPE) for elem in self.iter_descendants(with_self=True): if isinstance(elem, EtreeElementNode): elem.xsd_type = element_type for attr in elem.attributes: attr.xsd_type = attribute_type return if (xsd_element := schema.base_element) is not None: paths = ['./'] children: Iterator[Any] = iter(self) if schema.is_assertion_based(): self.xsd_type = schema.get_type(XSD_ANY_TYPE) else: self.xsd_type = xsd_element.type for attr in self.attributes: if attr.name in xsd_element.attrib: attr.xsd_type = xsd_element.attrib[attr.name].type else: xsd_attribute = schema.cached_find(f'./@{attr.name}') if xsd_attribute is not None and hasattr(xsd_attribute, 'type'): attr.xsd_type = xsd_attribute.type else: attr.xsd_type = None else: root_node: ParentNodeType = self while isinstance(root_node.parent, EtreeElementNode): root_node = root_node.parent paths = ['/'] children = iter((root_node,)) iterators: List[Any] = [] while True: for elem in children: if not isinstance(elem, EtreeElementNode): continue child_path = f'{paths[-1]}{elem.name}/' if isinstance(xsi_type := elem.obj.attrib.get(XSI_TYPE), str): xsd_element = None try: type_name = get_expanded_name(xsi_type, elem.nsmap) except KeyError: elem.clear_types() continue else: elem.xsd_type = schema.get_type(type_name) else: result = 
schema.cached_find(f'{paths[-1]}{elem.name}') if result is not None and hasattr(result, 'type'): elem.xsd_type = cast(XsdElementProtocol, result).type else: elem.clear_types() continue for attr in elem.attributes: if xsd_element is not None and attr.name in xsd_element.attrib: attr.xsd_type = xsd_element.attrib[attr.name].type else: xsd_attribute = schema.cached_find(f'{child_path}@{attr.name}') if xsd_attribute is not None and hasattr(xsd_attribute, 'type'): attr.xsd_type = xsd_attribute.type else: attr.xsd_type = None if len(elem.obj): paths.append(child_path) iterators.append(children) children = iter(elem) break else: try: children = iterators.pop() paths.pop() except IndexError: return def clear_types(self) -> None: """Clear XSD types for element node subtree.""" for elem in self.iter_descendants(with_self=True): if isinstance(elem, EtreeElementNode): elem.xsd_type = None for attr in elem.attributes: attr.xsd_type = None @property def is_typed(self) -> bool: return self.xsd_type is not None def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool: if '*' in name: return match_wildcard(self.obj.tag, name) elif not name: return not self.obj.tag elif hasattr(self.obj, 'type'): return cast(XsdElementProtocol, self.obj).is_matching(name, default_namespace) elif name[0] == '{' or not default_namespace: return self.obj.tag == name else: return self.obj.tag == f'{{{default_namespace}}}{name}' def get_document_node(self, replace: bool = True, as_parent: bool = True) -> 'DocumentNode': """ Returns a `DocumentNode` for the element node. If the element belongs to a tree that already has a document root, returns the document, otherwise creates a dummy document if the element node wraps an Element of an ElementTree structure or return `None`. :param replace: if `True` the root element of the tree is replaced by the \ document node. This is usually useful for extended data models (more element \ children, text nodes). 
:param as_parent: if `True` the root node/s of parent attribute is set with \ the dummy document node, otherwise is set to `None`. """ root_node: ParentNodeType = self while root_node.parent is not None: root_node = root_node.parent if isinstance(root_node, DocumentNode): return root_node if root_node.obj.__class__.__module__ not in ('lxml.etree', 'lxml.html'): etree = ElementTree else: etree = importlib.import_module('lxml.etree') if replace: document = etree.ElementTree() if sum(isinstance(x, ElementNode) for x in root_node.children) == 1: for child in root_node.children: if isinstance(child, ElementNode): document = etree.ElementTree(cast(ElementTree.Element, child.obj)) break document_node = DocumentNode(document, root_node.uri, root_node.position) for child in root_node.children: document_node.children.append(child) child.parent = document_node if as_parent else None if root_node.elements is not None: root_node.elements.pop(root_node, None) # type: ignore[call-overload] document_node.elements = root_node.elements del root_node else: document = etree.ElementTree(cast(ElementTree.Element, root_node.obj)) document_node = DocumentNode(document, root_node.uri, root_node.position - 1) document_node.children.append(root_node) if as_parent: root_node.parent = document_node if root_node.elements is not None: document_node.elements = root_node.elements return document_node ### # Specialized element nodes class LazyElementNode(EtreeElementNode): """ A fully lazy element node, slower but better if the node has not to be used in a document context. The node extends descendants but does not record positions and a map of elements. 
""" __slots__ = () def __iter__(self) -> Iterator[ChildNodeType]: if not self.children: if self.obj.text is not None: self.children.append(TextNode(self.obj.text, self)) if len(self.obj): for elem in self.obj: if not callable(elem.tag): nsmap = cast(Dict[Any, str], getattr(elem, 'nsmap', self.nsmap)) self.children.append(LazyElementNode(elem, self, nsmap=nsmap)) elif elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] self.children.append(CommentNode(elem, self)) else: self.children.append(ProcessingInstructionNode(elem, parent=self)) if elem.tail is not None: self.children.append(TextNode(elem.tail, self)) yield from self.children def iter_descendants(self, with_self: bool = True) -> Iterator[ChildNodeType]: if with_self: yield self for child in self: if isinstance(child, ElementNode): yield from child.iter_descendants() else: yield child class SchemaElementNode(ElementNode): """ An element node class for wrapping the XSD schema and its elements. The resulting structure can be a tree or a set of disjoint trees. With more roots only one of them is the schema node. 
""" ref: Optional['SchemaElementNode'] = None obj: SchemaElemType __slots__ = ('__dict__',) def __init__(self, elem: SchemaElemType, parent: Optional[ParentNodeType] = None, position: int = 1, nsmap: Optional[NsmapType] = None): self.name = elem.tag self.obj = elem self.parent = parent self.position = position self.nsmap = nsmap if nsmap is not None else {} self.children = [] self.xsd_type = getattr(elem, 'type', None) def __iter__(self) -> Iterator[ChildNodeType]: if self.ref is None: yield from self.children else: yield from self.ref.children @property def xsd_element(self) -> Optional[XsdElementProtocol]: if hasattr(self.obj, 'type'): return cast(XsdElementProtocol, self.obj) else: return None @property def content(self) -> SchemaElemType: return self.obj elem = value = content @property def path(self) -> str: if not hasattr(self, 'type'): return '/' return super().path @property def is_schema_node(self) -> bool: return True def match_name(self, name: str, default_namespace: Optional[str] = None) -> bool: if '*' in name: return match_wildcard(self.obj.tag, name) elif not name: return not self.obj.tag elif hasattr(self.obj, 'type'): return self.obj.is_matching(name, default_namespace) else: return self.obj.tag == name # a schema @property def attributes(self) -> List[AttributeNode]: if not hasattr(self, '_attributes'): position = self.position + len(self.nsmap) + int('xml' not in self.nsmap) self._attributes = [ SchemaAttributeNode(attr, self, pos) for pos, (_, attr) in enumerate(self.obj.attrib.items(), position) ] return self._attributes @property def base_uri(self) -> Optional[str]: base_uri = self._uri.strip() if hasattr(self, '_uri') else None if self.parent is None: return base_uri elif base_uri is None: return self.parent.base_uri else: return urljoin(self.parent.base_uri or '', base_uri) @property def type_name(self) -> Optional[str]: if (xsd_type := getattr(self.obj, 'type', None)) is not None: return cast(Optional[str], xsd_type.name) return None 
@property def string_value(self) -> str: if not hasattr(self.obj, 'type'): return '' for item in get_atomic_sequence(self.xsd_type): return str(item) return '' @property def iter_typed_values(self) -> Iterator[AtomicType]: yield from get_atomic_sequence(self.xsd_type) def iter(self) -> Iterator[XPathNode]: yield self iterators: List[Any] = [] children: Iterator[Any] = iter(self.children) if hasattr(self, '_namespace_nodes'): yield from self._namespace_nodes if hasattr(self, '_attributes'): yield from self._attributes elements = {self} while True: for child in children: if child in elements: continue yield child elements.add(child) if isinstance(child, ElementNode): if hasattr(child, '_namespace_nodes'): yield from child._namespace_nodes if hasattr(child, '_attributes'): yield from child._attributes if child.children: iterators.append(children) children = iter(child.children) break else: try: children = iterators.pop() except IndexError: return def iter_descendants(self, with_self: bool = True) -> Iterator[ChildNodeType]: if with_self: yield self iterators: List[Any] = [] children: Iterator[Any] = iter(self.children) elements = {self} while True: for child in children: if child.ref is not None: child = child.ref if child in elements: continue yield child elements.add(child) if child.children: iterators.append(children) children = iter(child.children) break else: try: children = iterators.pop() except IndexError: return ### # DOCUMENT NODES class DocumentNode(XPathNode): """ Base class for all XPath document nodes. 
""" name: None obj: object parent: None uri: Optional[str] children: List[ChildNodeType] elements: Dict[ElementProtocol, ElementNode] __slots__ = ('children', 'uri', 'elements') def __new__(cls, *args: Any, **kwargs: Any) -> 'DocumentNode': if cls is DocumentNode: return object.__new__(EtreeDocumentNode) return object.__new__(cls) def __repr__(self) -> str: return '%s(document=%r)' % (self.__class__.__name__, self.document) def __getitem__(self, i: Union[int, slice]) -> Union[ChildNodeType, List[ChildNodeType]]: return self.children[i] def __len__(self) -> int: return len(self.children) def __iter__(self) -> Iterator[ChildNodeType]: yield from self.children @property def document(self) -> object: return self.obj value = document @property def path(self) -> str: return '/' def getroot(self) -> ElementNode: for child in self.children: if isinstance(child, ElementNode): return child raise ElementPathRuntimeError("Missing document root") def get_element_node(self, elem: ElementProtocol) -> Optional[ElementNode]: return self.elements.get(elem) def iter(self) -> Iterator[XPathNode]: yield self for e in self.children: if isinstance(e, ElementNode): yield from e.iter() else: yield e iter_document = iter def iter_lazy(self) -> Iterator[XPathNode]: yield self for e in self.children: if isinstance(e, ElementNode): yield from e.iter_lazy() else: yield e def iter_descendants(self, with_self: bool = True) \ -> Iterator[Union['DocumentNode', ChildNodeType]]: if with_self: yield self for e in self.children: if isinstance(e, ElementNode): yield from e.iter_descendants() else: yield e @property def is_typed(self) -> bool: for child in self.children: if isinstance(child, ElementNode): return child.is_typed else: return False def apply_schema(self, schema: 'AbstractSchemaProxy') -> None: for child in self.children: if isinstance(child, EtreeElementNode): child.apply_schema(schema) def clear_types(self) -> None: for child in self.children: if isinstance(child, EtreeElementNode): 
child.clear_types() @property def is_extended(self) -> bool: """ Returns `True` if the document node can't be represented with an ElementTree structure, `False` otherwise. """ if not self.children: raise ElementPathRuntimeError("Missing document root") return len(self.children) > 1 or not isinstance(self.children[0], ElementNode) @property def base_uri(self) -> Optional[str]: return self.uri.strip() if self.uri is not None else None @property def document_uri(self) -> Optional[str]: if self.uri is not None and is_absolute_uri(self.uri): return self.uri.strip() else: return None @property def node_kind(self) -> str: return 'document' @property def string_value(self) -> str: raise NotImplementedError() @property def typed_value(self) -> AtomicType: return UntypedAtomic(self.string_value) @property def iter_typed_values(self) -> Iterator[UntypedAtomic]: yield UntypedAtomic(self.string_value) class EtreeDocumentNode(DocumentNode): """ A class for ElementTree document nodes. :param document: the wrapped ElementTree instance. :param uri: the document URI. :param position: the position of the node in the document, usually 1, \ or 0 for lxml standalone root elements with siblings. """ obj: DocumentType __slots__ = () def __init__(self, document: DocumentType, uri: Optional[str] = None, position: int = 1) -> None: self.obj = document self.uri = uri self.name = None self.parent = None self.position = position self.elements = {} self.children = [] @property def document(self) -> DocumentType: return self.obj @property def string_value(self) -> str: if not self.children: # Fallback for not built documents root = self.document.getroot() if root is None: return '' return ''.join(etree_iter_strings(root)) return ''.join(child.string_value for child in self.children) @property def is_extended(self) -> bool: """ Returns `True` if the document node can't be represented with an ElementTree structure, `False` otherwise. 
""" root = self.document.getroot() if root is None or not is_etree_element_instance(root): return True elif not self.children: raise ElementPathRuntimeError("Missing document root") elif len(self.children) == 1: return not isinstance(self.children[0], ElementNode) elif not hasattr(root, 'itersiblings'): return True # an extended xml.etree.ElementTree structure elif any(isinstance(x, TextNode) for x in root): return True else: return sum(isinstance(x, ElementNode) for x in root) != 1 ### # Type annotation aliases XPathNodeType = Union[DocumentNode, NamespaceNode, AttributeNode, TextNode, ElementNode, CommentNode, ProcessingInstructionNode] RootNodeType = Union[DocumentNode, ElementNode] RootArgType = Union[DocumentType, ElementType, SchemaElemType, RootNodeType] sissaschool-elementpath-d3688c7/elementpath/xpath_selectors.py000066400000000000000000000232061476131650400250210ustar00rootroot00000000000000# # Copyright (c), 2018, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
#
# @author Davide Brunato
#
import datetime
from typing import Any, Dict, Optional, Union

from elementpath._typing import Iterator
from elementpath.aliases import NamespacesType, InputType
from elementpath.xpath_nodes import RootArgType
from elementpath.xpath_context import ItemArgType, XPathContext
from elementpath.xpath2 import XPath2Parser
from elementpath.datatypes import Timezone
from elementpath.schema_proxy import AbstractSchemaProxy
from elementpath.xpath_tokens import ParserClassType


def select(root: Optional[RootArgType],
           path: str,
           namespaces: Optional[NamespacesType] = None,
           parser: Optional['ParserClassType'] = None,
           uri: Optional[str] = None,
           fragment: Optional[bool] = None,
           item: Optional[ItemArgType] = None,
           position: int = 1,
           size: int = 1,
           axis: Optional[str] = None,
           schema: Optional[AbstractSchemaProxy] = None,
           variables: Optional[Dict[str, InputType[ItemArgType]]] = None,
           current_dt: Optional[datetime.datetime] = None,
           timezone: Optional[Union[str, Timezone]] = None,
           **kwargs: Any) -> Any:
    """
    XPath selector function that applies a *path* expression on *root* Element.

    :param root: the root of the XML document, usually an ElementTree instance or an \
    Element. A schema or a schema element can also be provided, or an already built \
    node tree. You can also provide `None`, in which case no XML root node is set in \
    the dynamic context, and you have to provide the keyword argument *item*.
    :param path: the XPath expression.
    :param namespaces: a dictionary with mapping from namespace prefixes into URIs.
    :param parser: the parser class to use, which defaults to :class:`XPath2Parser`.
    :param uri: an optional URI associated with the root element or the document.
    :param fragment: if `True` is provided the root is considered a fragment. In this \
    case if `root` is an ElementTree instance skips it and uses the root Element. If \
    `False` is provided creates a dummy document when the root is an Element instance. \
    In this case the dummy document value is not included in results. By default the \
    root node kind is preserved.
    :param item: the context item. A `None` value means that the context is positioned on \
    the document node.
    :param position: the current position of the node within the input sequence.
    :param size: the number of items in the input sequence.
    :param axis: the active axis. Used to choose when to apply the default axis ('child' axis).
    :param schema: an optional schema proxy instance for applying XSD type annotations \
    on element and attribute nodes.
    :param variables: dictionary of context variables that maps a QName to a value.
    :param current_dt: current dateTime of the implementation, including explicit timezone.
    :param timezone: implicit timezone to be used when a date, time, or dateTime value does \
    not have a timezone.
    :param kwargs: other optional parameters for the parser instance.
    :return: a list with XPath nodes or a basic type for expressions based \
    on a function or literal.
    """
    _parser = (parser or XPath2Parser)(namespaces, **kwargs)
    root_token = _parser.parse(path)
    context = XPathContext(root, namespaces, uri, fragment, item, position, size,
                           axis, schema, variables, current_dt, timezone)
    return root_token.get_results(context)


def iter_select(root: Optional[RootArgType],
                path: str,
                namespaces: Optional[NamespacesType] = None,
                parser: Optional['ParserClassType'] = None,
                uri: Optional[str] = None,
                fragment: Optional[bool] = None,
                item: Optional[ItemArgType] = None,
                position: int = 1,
                size: int = 1,
                axis: Optional[str] = None,
                schema: Optional[AbstractSchemaProxy] = None,
                variables: Optional[Dict[str, InputType[ItemArgType]]] = None,
                current_dt: Optional[datetime.datetime] = None,
                timezone: Optional[Union[str, Timezone]] = None,
                **kwargs: Any) -> Iterator[Any]:
    """
    A function that creates an XPath selector generator for applying a *path* expression
    on *root* Element.

    :param root: the root of the XML document, usually an ElementTree instance or an \
    Element. A schema or a schema element can also be provided, or an already built \
    node tree. You can also provide `None`, in which case no XML root node is set in \
    the dynamic context, and you have to provide the keyword argument *item*.
    :param path: the XPath expression.
    :param namespaces: a dictionary with mapping from namespace prefixes into URIs.
    :param parser: the parser class to use, which defaults to :class:`XPath2Parser`.
    :param uri: an optional URI associated with the root element or the document.
    :param fragment: if `True` is provided the root is considered a fragment. In this \
    case if `root` is an ElementTree instance skips it and uses the root Element. If \
    `False` is provided creates a dummy document when the root is an Element instance. \
    In this case the dummy document value is not included in results. By default the \
    root node kind is preserved.
    :param item: the context item. A `None` value means that the context is positioned on \
    the document node.
    :param position: the current position of the node within the input sequence.
    :param size: the number of items in the input sequence.
    :param axis: the active axis. Used to choose when to apply the default axis ('child' axis).
    :param schema: an optional schema proxy instance for applying XSD type annotations \
    on element and attribute nodes.
    :param variables: dictionary of context variables that maps a QName to a value.
    :param current_dt: current dateTime of the implementation, including explicit timezone.
    :param timezone: implicit timezone to be used when a date, time, or dateTime value does \
    not have a timezone.
    :param kwargs: other optional parameters for the parser instance.
    :return: a generator of the XPath expression results.
""" _parser = (parser or XPath2Parser)(namespaces, **kwargs) root_token = _parser.parse(path) context = XPathContext(root, namespaces, uri, fragment, item, position, size, axis, schema, variables, current_dt, timezone) return root_token.select_results(context) class Selector(object): """ XPath selector class. Create an instance of this class if you want to apply an XPath selector to several target data. :param path: the XPath expression. :param namespaces: a dictionary with mapping from namespace prefixes into URIs. :param parser: the parser class to use, that is :class:`XPath2Parser` for default. :param kwargs: other optional parameters for the XPath parser instance. :ivar path: the XPath expression. :vartype path: str :ivar parser: the parser instance. :vartype parser: XPath1Parser or XPath2Parser :ivar root_token: the root of tokens tree compiled from path. :vartype root_token: XPathToken """ def __init__(self, path: str, namespaces: Optional[NamespacesType] = None, parser: Optional['ParserClassType'] = None, **kwargs: Any) -> None: self._variables = kwargs.pop('variables', None) # For backward compatibility self.parser = (parser or XPath2Parser)(namespaces, **kwargs) self.path = path self.root_token = self.parser.parse(path) def __repr__(self) -> str: return '%s(path=%r, parser=%s)' % ( self.__class__.__name__, self.path, self.parser.__class__.__name__ ) @property def namespaces(self) -> Dict[str, str]: """A dictionary with mapping from namespace prefixes into URIs.""" return self.parser.namespaces def select(self, root: Optional[RootArgType], **kwargs: Any) -> Any: """ Applies the instance's XPath expression on *root* Element. :param root: the root of the XML document, usually an ElementTree instance \ or an Element. :param kwargs: other optional parameters for the XPath dynamic context. :return: a list with XPath nodes or a basic type for expressions based on \ a function or literal. 
        """
        if 'schema' not in kwargs:
            kwargs['schema'] = self.parser.schema
        if 'variables' not in kwargs and self._variables:
            kwargs['variables'] = self._variables
        context = XPathContext(root, **kwargs)
        return self.root_token.get_results(context)

    def iter_select(self, root: Optional[RootArgType], **kwargs: Any) -> Iterator[Any]:
        """
        Creates an XPath selector generator for applying the instance's XPath expression
        on *root* Element.

        :param root: the root of the XML document, usually an ElementTree instance \
        or an Element.
        :param kwargs: other optional parameters for the XPath dynamic context.
        :return: a generator of the XPath expression results.
        """
        if 'variables' not in kwargs and self._variables:
            kwargs['variables'] = self._variables
        context = XPathContext(root, **kwargs)
        return self.root_token.select_results(context)
sissaschool-elementpath-d3688c7/elementpath/xpath_tokens.py
#
# Copyright (c), 2018-2022, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
"""
XPathToken class and derived classes for other XPath objects (functions, constructors,
axes, maps, arrays). XPath's error creation and node helper functions are embedded in
XPathToken class, in order to raise errors related to token instances.
"""
import decimal
import math
from copy import copy
from decimal import Decimal
from itertools import product
from typing import TYPE_CHECKING, Any, cast, Dict, List, Optional, SupportsFloat, \
    Tuple, Type, Union
import urllib.parse

from elementpath._typing import Callable, Iterable, Iterator
from elementpath.aliases import NargsType, ClassCheckType, Emptiable
from elementpath.protocols import ElementProtocol, DocumentProtocol, \
    XsdAttributeProtocol
from elementpath.exceptions import ElementPathError, ElementPathValueError, \
    ElementPathTypeError, MissingContextError, xpath_error
from elementpath.helpers import ordinal, get_double, split_function_test
from elementpath.etree import is_etree_element, is_etree_document
from elementpath.namespaces import XSD_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE, \
    XPATH_MATH_FUNCTIONS_NAMESPACE, XSD_DECIMAL, XSD_ANY_TYPE, XSD_ANY_SIMPLE_TYPE, \
    XSD_ANY_ATOMIC_TYPE
from elementpath.tree_builders import get_node_tree
from elementpath.xpath_nodes import XPathNode, ElementNode, DocumentNode, NamespaceNode
from elementpath.datatypes import xsd10_atomic_types, AbstractDateTime, AnyURI, \
    UntypedAtomic, Timezone, DateTime10, Date10, DayTimeDuration, Duration, \
    Integer, DoubleProxy10, DoubleProxy, QName, AtomicType, AnyAtomicType
from elementpath.sequence_types import is_sequence_type_restriction, match_sequence_type
from elementpath.tdop import Token, MultiLabel
from elementpath.xpath_context import ContextType, ItemType, ValueType, ItemArgType, \
    FunctionArgType, XPathSchemaContext, XPathContext

if TYPE_CHECKING:
    from .xpath1 import XPath1Parser  # noqa: F401
    from .xpath2 import XPath2Parser  # noqa: F401
    from .xpath30 import XPath30Parser  # noqa: F401
    from .xpath31 import XPath31Parser  # noqa: F401

XPathParserType = Union['XPath1Parser', 'XPath2Parser', 'XPath30Parser', 'XPath31Parser']
XPath2ParserType = Union['XPath2Parser', 'XPath30Parser', 'XPath31Parser']
ParserClassType = Union[
    Type['XPath1Parser'], Type['XPath2Parser'],
    Type['XPath30Parser'], Type['XPath31Parser']
]

_XSD_SPECIAL_TYPES = {XSD_ANY_TYPE, XSD_ANY_SIMPLE_TYPE, XSD_ANY_ATOMIC_TYPE}

_CHILD_AXIS_TOKENS = {
    '*', 'node', 'child', 'text', '(name)', ':', '[', 'document-node',
    'element', 'comment', 'processing-instruction', 'schema-element'
}

_LEAF_ELEMENTS_TOKENS = {
    '(name)', '*', ':', '..', '.', '[', 'self', 'child', 'parent',
    'following-sibling', 'preceding-sibling', 'ancestor', 'ancestor-or-self',
    'descendant', 'descendant-or-self', 'following', 'preceding'
}

# Type annotations aliases
XPathTokenType = Union['XPathToken', 'XPathAxis', 'XPathFunction', 'XPathConstructor']

_ResultType = Union[
    AtomicType,
    ElementProtocol,
    XsdAttributeProtocol,
    Tuple[Optional[str], str],
    DocumentProtocol,
    DocumentNode,
    'XPathFunction',
    object
]

_MapDictType = Dict[Optional[AtomicType], ValueType]
_SequenceTypesType = Union[str, List[str], Tuple[str, ...]]


class XPathToken(Token[XPathTokenType]):
    """Base class for XPath tokens."""
    parser: XPathParserType
    value: ValueType
    namespace: Optional[str] = None  # for namespace binding of names and wildcards
    occurrence: Optional[str] = None  # occurrence indicator for item types
    concatenated = False  # a flag for infix operators that can be concatenated

    def evaluate(self, context: ContextType = None) -> ValueType:
        """
        Evaluate default method for XPath tokens.

        :param context: The XPath dynamic context.
        """
        return [x for x in self.select(context)]

    def select(self, context: ContextType = None) -> Iterator[ItemType]:
        """
        Select operator that generates XPath results.

        :param context: The XPath dynamic context.
        """
        item = self.evaluate(context)
        if isinstance(item, list):
            yield from item
        else:
            yield item

    def select_flatten(self, context: ContextType = None) -> Iterator[ItemType]:
        """A select that flattens XPath results, including arrays."""
        for item in self.select(context):
            if isinstance(item, list):
                yield from item
            elif isinstance(item, XPathArray):
                yield from item.iter_flatten(context)
            else:
                yield item

    def __str__(self) -> str:
        if self.symbol == '$':
            return '$%s variable reference' % (self[0].value if self._items else '')
        elif self.symbol == ',':
            return 'comma operator' if self.parser.version > '1.0' else 'comma symbol'
        elif self.symbol == '(':
            if not self or self[0].span[0] >= self.span[0]:
                return 'parenthesized expression'
            else:
                return 'function call expression'
        return super(XPathToken, self).__str__()

    @property
    def source(self) -> str:
        symbol = self.symbol
        if self.label == 'axis':
            # For XPath 2.0 'attribute' multirole token ('kind test', 'axis')
            return '%s::%s' % (symbol, self[0].source)
        elif symbol == '/' or symbol == '//':
            if not self:
                return symbol
            elif len(self) == 1:
                return f'{symbol}{self[0].source}'
            else:
                return f'{self[0].source}{symbol}{self[1].source}'
        elif symbol == '(':
            if not self:
                return '()'
            elif len(self) == 2:
                return f'{self[0].source}({self[1].source})'
            elif self[0].span[0] < self.span[0]:
                return f'{self[0].source}()'
            else:
                return f'({self[0].source})'
        elif symbol == '[':
            return '%s[%s]' % (self[0].source, self[1].source)
        elif symbol == ',':
            return '%s, %s' % (self[0].source, self[1].source)
        elif symbol == '$' or symbol == '@':
            return f'{symbol}{self[0].source}'
        elif symbol == '#':
            return '%s#%s' % (self[0].source, self[1].source)
        elif symbol == '{' or symbol == 'Q{':
            return '%s%s}%s' % (symbol, self[0].value, self[1].source)
        elif symbol == '=>':
            if isinstance(self[1], XPathFunction):
                return '%s => %s%s' % (self[0].source, self[1].symbol, self[2].source)
            return '%s => %s%s' % (self[0].source, self[1].source, self[2].source)
        elif symbol == 'if':
            return 'if (%s) then %s else %s' % (self[0].source, self[1].source,
                                                self[2].source)
        elif symbol == 'instance':
            return '%s instance of %s' % (
                self[0].source, ''.join(str(t.source) for t in self[1:])
            )
        elif symbol in ('treat', 'cast', 'castable'):
            return '%s %s as %s' % (
                self[0].source, symbol, ''.join(str(t.source) for t in self[1:])
            )
        elif symbol == 'for':
            return 'for %s return %s' % (
                ', '.join('%s in %s' % (self[k].source, self[k + 1].source)
                          for k in range(0, len(self) - 1, 2)),
                self[-1].source
            )
        elif symbol in ('every', 'some'):
            return '%s %s satisfies %s' % (
                symbol,
                ', '.join('%s in %s' % (self[k].source, self[k + 1].source)
                          for k in range(0, len(self) - 1, 2)),
                self[-1].source
            )
        elif symbol == 'let':
            return 'let %s return %s' % (
                ', '.join('%s := %s' % (self[k].source, self[k + 1].source)
                          for k in range(0, len(self) - 1, 2)),
                self[-1].source
            )
        elif symbol in ('-', '+') and len(self) == 1:
            return symbol + self[0].source
        return super(XPathToken, self).source

    @property
    def name(self) -> str:
        if self.symbol == '@':
            return self[0].name
        if self.symbol != '(name)':
            return ''

        local_name = self.value
        assert isinstance(local_name, str)
        if self.namespace:
            return f'{{{self.namespace}}}{local_name}'
        elif namespace := self.parser.default_namespace:
            return f'{{{namespace}}}{local_name}'
        else:
            return local_name

    @property
    def child_axis(self) -> bool:
        """Is `True` if the token applies the child axis by default, `False` otherwise."""
        if self.symbol not in _CHILD_AXIS_TOKENS:
            return False
        elif self.symbol == '[':
            return self._items[0].child_axis
        elif self.symbol != ':':
            return True
        return not self._items[1].label.endswith('function')

    ###
    # Tokens tree analysis methods
    def iter_leaf_elements(self) -> Iterator[str]:
        """
        Iterates through the leaf elements of the token tree if there are any,
        returning QNames in prefixed format. A leaf element is an element
        positioned at the last path step. Does not consider kind tests and wildcards.
        """
        if self.symbol in ('(name)', ':'):
            yield cast(str, self.value)
        elif self.symbol in ('//', '/'):
            if self._items[-1].symbol in _LEAF_ELEMENTS_TOKENS:
                yield from self._items[-1].iter_leaf_elements()
        elif self.symbol in ('[',):
            yield from self._items[0].iter_leaf_elements()
        else:
            for tk in self._items:
                yield from tk.iter_leaf_elements()

    def parse_sequence_type(self) -> 'XPathToken':
        if self.parser.next_token.label in ('kind test', 'sequence type', 'function test'):
            token = self.parser.expression(rbp=85)
        else:
            if self.parser.next_token.symbol == 'Q{':
                token = self.parser.advance().nud()
            elif self.parser.next_token.symbol != '(name)':
                raise self.wrong_syntax()
            else:
                self.parser.advance()
                if self.parser.next_token.symbol == ':':
                    left = self.parser.token
                    self.parser.advance()
                    token = self.parser.token.led(left)
                else:
                    token = self.parser.token

                if self.parser.next_token.symbol in ('::', '('):
                    raise self.parser.next_token.wrong_syntax()

        next_symbol = self.parser.next_token.symbol
        if token.symbol != 'empty-sequence' and next_symbol in ('?', '*', '+'):
            token.occurrence = next_symbol
            self.parser.advance()

        return token

    def parse_occurrence(self) -> None:
        if self.parser.next_token.symbol in ('*', '+', '?'):
            self.occurrence = self.parser.next_token.symbol
            self.parser.advance()
            self.parser.next_token.unexpected('*', '+', '?')

    ###
    # Dynamic context methods
    def get_argument(self, context: ContextType,
                     index: int = 0,
                     required: bool = False,
                     default_to_context: bool = False,
                     default: Optional[AtomicType] = None,
                     cls: Optional[Type[Any]] = None,
                     promote: Optional[ClassCheckType] = None) -> Any:
        """
        Get the argument value of a function or constructor token. A zero length sequence
        is converted to a `None` value. If the function has no argument returns the context's
        item if the dynamic context is not `None`.

        :param context: the dynamic context.
        :param index: the index of the argument to get, the first by default.
        :param required: if set to `True` missing or empty sequence arguments are not allowed.
        :param default_to_context: if set to `True` then the item of the dynamic context is \
        returned when the argument is missing.
        :param default: the default value returned in case the argument is an empty sequence. \
        If not provided returns `None`.
        :param cls: if a type is provided performs a type checking on item.
        :param promote: a class or a tuple of classes that are promoted to `cls` class.
        """
        item: Optional[ItemType]

        try:
            token = self._items[index]
        except IndexError:
            if default_to_context:
                if context is None:
                    raise self.missing_context() from None
                item = context.item if context.item is not None else context.root
            elif isinstance(context, XPathSchemaContext):
                return default
            elif required:
                msg = "missing %s argument" % ordinal(index + 1)
                raise self.error('XPST0017', msg) from None
            else:
                return default
        else:
            if isinstance(token, XPathFunction) and token.is_reference():
                return token  # It's a function reference

            item = None
            for k, result in enumerate(token.select(copy(context))):
                if k == 0:
                    item = result
                elif self.parser.compatibility_mode:
                    break
                elif isinstance(context, XPathSchemaContext):
                    # Multiple schema nodes are ignored but do not raise. The target
                    # of schema context selection is XSD type association and multiple
                    # node coherency is already checked at schema level.
                    break
                else:
                    msg = "a sequence of more than one item is not allowed as argument"
                    raise self.error('XPTY0004', msg)
            else:
                if item is None:
                    if not required or isinstance(context, XPathSchemaContext):
                        return default
                    ord_arg = ordinal(index + 1)
                    msg = "A not empty sequence required for {} argument"
                    raise self.error('XPTY0004', msg.format(ord_arg))

        if cls is not None:
            return self.validated_value(item, cls, promote, index)
        return item

    def get_argument_tokens(self) -> List['XPathToken']:
        """
        Builds and returns the argument tokens list, expanding the comma tokens.
        """
        tk = self
        tokens = []
        while True:
            if tk.symbol == ',':
                tokens.append(tk[1])
                tk = tk[0]
            else:
                tokens.append(tk)
                return tokens[::-1]

    def get_function(self, context: ContextType, arity: int = 0) -> 'XPathFunction':
        if isinstance(self, XPathFunction):
            func = self
        elif self.symbol in (':', 'Q{') and isinstance(self[1], XPathFunction):
            func = self[1]
        elif self.symbol == '(name)':
            msg = f'unknown function: {self.value}#{arity}'
            raise self.error('XPST0017', msg)
        else:
            item = self.evaluate(context)
            if not isinstance(item, XPathFunction):
                msg = f'unknown function: {item}#{arity}'
                raise self.error('XPST0017', msg)
            func = item

        max_args = func.max_args
        if func.min_args > arity or max_args is not None and max_args < arity:
            msg = f'unknown function: {func.symbol}#{arity}'
            raise self.error('XPST0017', msg)

        return func

    def validated_value(self, item: Any, cls: Type[Any],
                        promote: Optional[ClassCheckType] = None,
                        index: Optional[int] = None) -> Any:
        """
        Type promotion checking (see "function conversion rules" in XPath 2.0 language
        definition)
        """
        if isinstance(item, (cls, ValueToken)):
            return item
        elif promote and isinstance(item, promote):
            return cls(item)

        if self.parser.compatibility_mode:
            if issubclass(cls, str):
                return self.string_value(item)
            elif issubclass(cls, float) or issubclass(float, cls):
                return self.number_value(item)

        if issubclass(cls, XPathToken) or self.parser.version == '1.0':
            code = 'XPTY0004'
        else:
            value = self.data_value(item)
            if isinstance(value, cls):
                return value
            elif isinstance(value, AnyURI) and issubclass(cls, str):
                return cls(value)
            elif isinstance(value, UntypedAtomic):
                try:
                    return cls(value)
                except (TypeError, ValueError):
                    pass

            if value is None or not value and isinstance(value, list):
                code = 'FOTY0012'
            else:
                code = 'XPTY0004'

        if index is None:
            msg = f"item type is {type(item)!r} instead of {cls!r}"
        else:
            msg = f"{ordinal(index+1)} argument has type {type(item)!r} instead of {cls!r}"
        raise self.error(code, msg)

    def atomize_item(self, item: ValueType) \
            -> Iterator[AtomicType]:
        """
        Atomization of a sequence item. Yields typed values, as computed by fn:data().

        Ref: https://www.w3.org/TR/xpath31/#id-atomization
             https://www.w3.org/TR/xpath20/#dt-typed-value
        """
        if item is None:
            return
        elif isinstance(item, XPathNode):
            value = None
            for value in item.iter_typed_values:
                yield value

            if value is None:
                msg = f"argument node {item!r} does not have a typed value"
                raise self.error('FOTY0012', msg)

        elif isinstance(item, list):
            for v in item:
                yield from self.atomize_item(v)

        elif isinstance(item, XPathFunction):
            if not isinstance(item, XPathArray):
                raise self.error('FOTY0013', f"{item.label!r} has no typed value")

            for v in item.iter_flatten():
                if isinstance(v, AnyAtomicType):
                    yield v
                else:
                    yield from self.atomize_item(v)

        elif isinstance(item, AnyAtomicType):
            yield cast(AtomicType, item)
        elif isinstance(item, bytes):
            yield item.decode()
        else:
            msg = f"sequence item {item!r} is not appropriate for the context"
            raise self.error('XPTY0004', msg)

    def atomization(self, context: ContextType = None) -> Iterator[AtomicType]:
        """
        Helper method for value atomization of a sequence.

        Ref: https://www.w3.org/TR/xpath31/#id-atomization

        :param context: the XPath dynamic context.
        """
        for item in self.select(context):
            yield from self.atomize_item(item)

    def get_atomized_operand(self, context: ContextType = None) -> Optional[AtomicType]:
        """
        Get the atomized value for an XPath operator.

        :param context: the XPath dynamic context.
        :return: the atomized value of a single length sequence or `None` if the sequence is empty.
        """
        value = None
        first = True
        for value in self.atomization(context):
            if not first:
                msg = "atomized operand is a sequence of length greater than one"
                raise self.error('XPTY0004', msg)
            first = False
        else:
            if isinstance(value, UntypedAtomic):
                return str(value)
            else:
                return value

    def iter_comparison_data(self, context: ContextType) -> Iterator[Any]:
        """
        Generates comparison data couples for the general comparison of sequences.
        Different sequences may be generated with an XPath 2.0 parser, depending on
        the compatibility mode setting.

        Ref: https://www.w3.org/TR/xpath20/#id-general-comparisons

        :param context: the XPath dynamic context.
        """
        left_values: Any
        right_values: Any

        if self.parser.compatibility_mode:
            left_values = [x for x in self._items[0].atomization(copy(context))]
            right_values = [x for x in self._items[1].atomization(copy(context))]
            # Boolean comparison if one of the results is a single boolean value (1.)
            try:
                if isinstance(left_values[0], bool):
                    if len(left_values) == 1:
                        yield left_values[0], self.boolean_value(right_values)
                        return
                if isinstance(right_values[0], bool):
                    if len(right_values) == 1:
                        yield self.boolean_value(left_values), right_values[0]
                        return
            except IndexError:
                return

            # Converts to float for lesser-greater operators (3.)
            if self.symbol in ('<', '<=', '>', '>='):
                yield from product(map(float, left_values), map(float, right_values))
                return
            elif self.parser.version == '1.0':
                yield from product(left_values, right_values)
                return
        else:
            left_values = self._items[0].atomization(copy(context))
            right_values = self._items[1].atomization(copy(context))

        for values in product(left_values, right_values):
            if any(isinstance(x, bool) for x in values):
                if any(isinstance(x, (str, Integer)) for x in values):
                    msg = "cannot compare {!r} and {!r}"
                    raise TypeError(msg.format(type(values[0]), type(values[1])))
            elif any(isinstance(x, Integer) for x in values) and \
                    any(isinstance(x, str) for x in values):
                msg = "cannot compare {!r} and {!r}"
                raise TypeError(msg.format(type(values[0]), type(values[1])))
            elif any(isinstance(x, float) for x in values):
                if isinstance(values[0], decimal.Decimal):
                    yield float(values[0]), values[1]
                    continue
                elif isinstance(values[1], decimal.Decimal):
                    yield values[0], float(values[1])
                    continue
            yield values

    def select_results(self, context: ContextType) -> Iterator[_ResultType]:
        """
        Generates formatted XPath results.

        :param context: the XPath dynamic context.
        """
        if context is None:
            yield from cast(Iterator[AtomicType], self.select(context))
        else:
            self.parser.check_variables(context.variables)

            for result in self.select(context):
                if not isinstance(result, XPathNode):
                    yield cast(Union[AtomicType, XPathFunction], result)
                elif isinstance(result, NamespaceNode):
                    if self.parser.compatibility_mode:
                        yield result.prefix, result.uri
                    else:
                        yield result.uri
                elif isinstance(result, DocumentNode):
                    if result.is_extended:
                        # cannot represent with an ElementTree: yield the document node
                        yield result
                    elif result is context.root or result is not context.document:
                        yield result.obj
                else:
                    yield result.obj

    def get_results(self, context: ContextType) \
            -> Union[List[_ResultType], AtomicType]:
        """
        Returns results formatted according to XPath specifications.

        :param context: the XPath dynamic context.
        :return: a list or a simple datatype when the result is a single simple type \
        generated by a literal or function token.
        """
        results: List[_ResultType]
        item = None

        if context is None:
            results = [x for x in cast(Iterator[AtomicType], self.select(context))]
        else:
            self.parser.check_variables(context.variables)

            results = []
            for item in self.select(context):
                if not isinstance(item, XPathNode):
                    results.append(item)
                elif isinstance(item, NamespaceNode):
                    if self.parser.compatibility_mode:
                        results.append((item.prefix, item.uri))
                    else:
                        results.append(item.uri)
                elif isinstance(item, DocumentNode):
                    if item.is_extended:
                        results.append(item)
                    elif item is not context.document or item is context.root:
                        results.append(item.obj)
                else:
                    results.append(item.obj)

        if len(results) == 1 and not isinstance(item, (ElementNode, DocumentNode)):
            if isinstance(item, (bool, int, float, Decimal)):
                return item
            elif self.label in ('function', 'literal'):
                return cast(AtomicType, results[0])
        return results

    def get_operands(self, context: ContextType, cls: Optional[Type[Any]] = None) -> Any:
        """
        Returns the operands for a binary operator.
        Float arguments are converted to decimal if the other argument is a `Decimal`
        instance.

        :param context: the XPath dynamic context.
        :param cls: if a type is provided performs a type checking on item.
        :return: a couple of values representing the operands. If any operand \
        is not available returns a `(None, None)` couple.
        """
        op1 = self.get_argument(context, cls=cls)
        if op1 is None:
            return None, None
        elif isinstance(op1, ElementNode):
            op1 = self._items[0].data_value(op1)

        op2 = self.get_argument(context, index=1, cls=cls)
        if op2 is None:
            return None, None
        elif isinstance(op2, ElementNode):
            op2 = self._items[1].data_value(op2)

        if isinstance(op1, AbstractDateTime) and isinstance(op2, AbstractDateTime):
            if context is not None and context.timezone is not None:
                if op1.tzinfo is None:
                    op1.tzinfo = context.timezone
                if op2.tzinfo is None:
                    op2.tzinfo = context.timezone
        else:
            if isinstance(op1, UntypedAtomic):
                op1 = self.cast_to_double(op1.value)
                if isinstance(op2, Decimal):
                    return op1, float(op2)
            if isinstance(op2, UntypedAtomic):
                op2 = self.cast_to_double(op2.value)
                if isinstance(op1, Decimal):
                    return float(op1), op2

        if isinstance(op1, float):
            if isinstance(op2, Duration):
                return Decimal(op1), op2
            if isinstance(op2, Decimal):
                return op1, type(op1)(op2)
        if isinstance(op2, float):
            if isinstance(op1, Duration):
                return op1, Decimal(op2)
            if isinstance(op1, Decimal):
                return type(op2)(op1), op2

        return op1, op2

    def get_absolute_uri(self, uri: str, base_uri: Optional[str] = None) -> str:
        """
        Obtains an absolute URI from the argument and the static context.

        :param uri: a string representing a URI.
        :param base_uri: an alternative base URI, otherwise the base_uri \
        of the static context is used.
        :returns: the argument if it's an absolute URI, otherwise returns the URI
        obtained by joining the base_uri of the static context with the
        argument. Returns the argument if the base_uri is `None`.
        """
        if not base_uri:
            base_uri = self.parser.base_uri

        uri_parts: urllib.parse.ParseResult = urllib.parse.urlparse(uri)
        if uri_parts.scheme or uri_parts.netloc or base_uri is None:
            return uri

        base_uri_parts: urllib.parse.SplitResult = urllib.parse.urlsplit(base_uri)
        if base_uri_parts.fragment or not base_uri_parts.scheme and \
                not base_uri_parts.netloc and not base_uri_parts.path.startswith('/'):
            raise self.error('FORG0002', '{!r} is not suitable as base URI'.format(base_uri))

        if uri_parts.path.startswith('/') and base_uri_parts.path not in ('', '/'):
            return uri
        return urllib.parse.urljoin(base_uri, uri)

    def get_namespace(self, prefix: str) -> str:
        """
        Resolves a prefix to a namespace raising an error (FONS0004) if the
        prefix is not found in the namespace map.
        """
        try:
            return self.parser.namespaces[prefix]
        except KeyError as err:
            msg = 'no namespace found for prefix %r' % str(err)
            raise self.error('FONS0004', msg) from None

    def bind_namespace(self, namespace: str) -> None:
        """
        Bind a token with a namespace. The token has to be a name, a name wildcard,
        a function or a constructor, otherwise a syntax error is raised. Functions
        and constructors must be limited to their namespaces.
        """
        if self.symbol in ('(name)', '*') or isinstance(self, ProxyToken):
            pass
        elif namespace == self.parser.function_namespace:
            if self.label != 'function':
                msg = "a name, a wildcard or a function expected"
                raise self.wrong_syntax(msg, code='XPST0017')
            elif isinstance(self.label, MultiLabel):
                self.label = 'function'
        elif namespace == XSD_NAMESPACE:
            if self.label != 'constructor function':
                msg = "a name, a wildcard or a constructor function expected"
                raise self.wrong_syntax(msg, code='XPST0017')
            elif isinstance(self.label, MultiLabel):
                self.label = 'constructor function'
        elif namespace == XPATH_MATH_FUNCTIONS_NAMESPACE:
            if self.label != 'math function':
                msg = "a name, a wildcard or a math function expected"
                raise self.wrong_syntax(msg, code='XPST0017')
            elif isinstance(self.label, MultiLabel):
                self.label = 'math function'
        elif not self.label.endswith('function'):
            msg = "a name, a wildcard or a function expected"
            raise self.wrong_syntax(msg, code='XPST0017')
        elif self.namespace and namespace != self.namespace:
            msg = "unmatched namespace"
            raise self.wrong_syntax(msg, code='XPST0017')

        self.namespace = namespace

    def adjust_datetime(self, context: ContextType, cls: Type[AbstractDateTime]) \
            -> Emptiable[Union[AbstractDateTime, DayTimeDuration]]:
        """
        XSD datetime adjust function helper.

        :param context: the XPath dynamic context.
        :param cls: the XSD datetime subclass to use.
        :return: an empty list if there is only one argument that is the empty sequence \
        or the adjusted XSD datetime instance.
        """
        timezone: Optional[Any]
        item: Optional[AbstractDateTime]
        _item: Union[AbstractDateTime, DayTimeDuration]

        if len(self) == 1:
            item = self.get_argument(context, cls=cls)
            if item is None:
                return []
            timezone = getattr(context, 'timezone', None)
        else:
            item = self.get_argument(context, cls=cls)
            timezone = self.get_argument(context, 1, cls=DayTimeDuration)

            if timezone is not None:
                try:
                    timezone = Timezone.fromduration(timezone)
                except ValueError as err:
                    if isinstance(context, XPathSchemaContext):
                        timezone = Timezone.fromduration(DayTimeDuration(0))
                    else:
                        raise self.error('FODT0003', str(err)) from None
            if item is None:
                return []

        _item = copy(item)
        _tzinfo = _item.tzinfo
        try:
            if _tzinfo is not None and timezone is not None:
                if isinstance(_item, DateTime10):
                    _item += timezone.offset
                elif not isinstance(item, Date10):
                    _item += timezone.offset - _tzinfo.offset
                elif timezone.offset < _tzinfo.offset:
                    _item -= timezone.offset - _tzinfo.offset
                    _item -= DayTimeDuration.fromstring('P1D')
        except OverflowError as err:
            if isinstance(context, XPathSchemaContext):
                return _item
            raise self.error('FODT0001', str(err)) from None

        if not isinstance(_item, DayTimeDuration):
            _item.tzinfo = timezone
        return _item

    ###
    # XSD types related methods
    def cast_to_qname(self, qname: str) -> QName:
        """Cast a prefixed qname string to a QName object."""
        try:
            if ':' not in qname:
                return QName(self.parser.namespaces.get(''), qname.strip())
            pfx, _ = qname.strip().split(':')
            return QName(self.parser.namespaces[pfx], qname)
        except ValueError:
            msg = 'invalid value {!r} for an xs:QName'.format(qname.strip())
            raise self.error('FORG0001', msg)
        except KeyError as err:
            raise self.error('FONS0004', 'no namespace found for prefix {}'.format(err))

    def cast_to_double(self, value: Union[SupportsFloat, str]) -> float:
        """Cast a value to xs:double."""
        try:
            if self.parser.xsd_version == '1.0':
                return cast(float, DoubleProxy10(value))
            return cast(float, DoubleProxy(value))
        except ValueError as err:
            raise \
                self.error('FORG0001', str(err))  # str or UntypedAtomic

    def cast_to_primitive_type(self, obj: Any, type_name: str) -> Any:
        if obj is None or not type_name.startswith('xs:') or type_name.count(':') != 1:
            return obj

        type_name = type_name[3:].rstrip('+*?')
        token = cast(XPathConstructor, self.parser.symbol_table[type_name](self.parser))

        def cast_value(v: Any) -> Any:
            try:
                if isinstance(v, (UntypedAtomic, AnyURI)):
                    return token.cast(v)
                elif isinstance(v, float) or \
                        isinstance(v, xsd10_atomic_types[XSD_DECIMAL]):
                    if type_name in ('double', 'float'):
                        return token.cast(v)
            except (ValueError, TypeError):
                return v
            else:
                return v

        if isinstance(obj, list):
            return [cast_value(x) for x in obj]
        else:
            return cast_value(obj)

    ###
    # XPath data accessors base functions
    def boolean_value(self, obj: Any) -> bool:
        """
        The effective boolean value, as computed by fn:boolean().
        """
        if isinstance(obj, list):
            if not obj:
                return False
            elif isinstance(obj[0], XPathNode):
                return True
            elif len(obj) > 1:
                message = "effective boolean value is not defined for a sequence " \
                          "of two or more items not starting with an XPath node."
                raise self.error('FORG0006', message)
            else:
                obj = obj[0]

        elif isinstance(obj, Iterator):
            items = obj
            for k, obj in enumerate(items):
                if k:
                    message = "effective boolean value is not defined for a sequence " \
                              "of two or more items not starting with an XPath node."
                    raise self.error('FORG0006', message)
                elif isinstance(obj, XPathNode):
                    return True
            else:
                if obj is items:
                    return False

        if isinstance(obj, (int, str, UntypedAtomic, AnyURI)):  # Include bool
            return bool(obj)
        elif isinstance(obj, (float, Decimal)):
            return False if math.isnan(obj) else bool(obj)
        elif obj is None:
            return False
        elif isinstance(obj, XPathNode):
            return True
        else:
            message = "effective boolean value is not defined for {!r}.".format(type(obj))
            raise self.error('FORG0006', message)

    def data_value(self, obj: Any) -> Optional[AtomicType]:
        """
        Returns the typed value.
        Raises an error if the atomization of the value produces
        more than one typed value.
        """
        value = None
        first = True
        for value in self.atomize_item(obj):
            if not first:
                msg = "atomized value is a sequence of length greater than one"
                raise self.error('XPTY0004', msg)
            first = False
        else:
            return value

    def string_value(self, obj: Any) -> str:
        """
        The string value, as computed by fn:string().
        """
        if obj is None:
            return ''
        elif isinstance(obj, XPathNode):
            return obj.string_value
        elif isinstance(obj, bool):
            return 'true' if obj else 'false'
        elif isinstance(obj, Decimal):
            value = format(obj, 'f')
            if '.' in value:
                return value.rstrip('0').rstrip('.')
            return value

        elif isinstance(obj, float):
            if math.isnan(obj):
                return 'NaN'
            elif math.isinf(obj):
                return str(obj).upper()

            value = str(obj)
            if '.' in value:
                value = value.rstrip('0').rstrip('.')
            if '+' in value:
                value = value.replace('+', '')
            if 'e' in value:
                return value.upper()
            return value

        elif isinstance(obj, XPathFunction):
            if self.symbol in ('concat', '||'):
                raise self.error('FOTY0013', f"an argument of {self} is a function")
            else:
                raise self.error('FOTY0014', f"{obj.label!r} has no string value")

        return str(obj)

    def number_value(self, obj: Any) -> float:
        """
        The numeric value, as computed by fn:number() on each item. Returns a float value.
        """
        try:
            if isinstance(obj, XPathNode):
                return get_double(self.string_value(obj), self.parser.xsd_version)
            else:
                return get_double(obj, self.parser.xsd_version)
        except (TypeError, ValueError):
            return math.nan

    ###
    # Error handling helpers and shortcuts
    def error(self, code: Union[str, QName],
              message_or_error: Union[None, str, Exception] = None) -> ElementPathError:
        return xpath_error(code, message_or_error, self, self.parser.namespaces)

    def expected(self, *symbols: str,
                 message: Optional[str] = None,
                 code: str = 'XPST0003') -> None:
        if symbols and self.symbol not in symbols:
            raise self.wrong_syntax(message, code)

    def unexpected(self, *symbols: str,
                   message: Optional[str] = None,
                   code: str = 'XPST0003') -> None:
        if not symbols or self.symbol in symbols:
            raise self.wrong_syntax(message, code)

    def wrong_syntax(self, message: Optional[str] = None,  # type: ignore[override]
                     code: str = 'XPST0003') -> ElementPathError:
        if self.label == 'function':
            code = 'XPST0017'

        if message:
            return self.error(code, message)

        error = super(XPathToken, self).wrong_syntax(message)
        return self.error(code, str(error))

    def wrong_value(self, message: Optional[str] = None) -> ElementPathValueError:
        return cast(ElementPathValueError, self.error('FOCA0002', message))

    def wrong_type(self, message: Optional[str] = None) -> ElementPathTypeError:
        return cast(ElementPathTypeError, self.error('FORG0006', message))

    def missing_context(self, message: Optional[str] = None) -> MissingContextError:
        return cast(MissingContextError, self.error('XPDY0002', message))


class XPathAxis(XPathToken):
    pattern = r'\b[^\d\W][\w.\-\xb7\u0300-\u036F\u203F\u2040]*(?=\s*\:\:|\s*\(\:.*\:\)\s*\:\:)'
    label = 'axis'
    reverse_axis: bool = False

    def __str__(self) -> str:
        return f'{self.symbol!r} axis'

    def nud(self) -> 'XPathAxis':
        self.parser.advance('::')
        self.parser.expected_next(
            '(name)', '*', '{', 'Q{', 'text', 'node', 'document-node',
            'comment', 'processing-instruction', 'element', 'attribute',
            'schema-attribute',
'schema-element', 'namespace-node', ) self._items[:] = self.parser.expression(rbp=self.rbp), return self @property def source(self) -> str: return '%s::%s' % (self.symbol, self[0].source) class ValueToken(XPathToken): """ A dummy token for encapsulating a value. """ symbol = '(value)' value: AnyAtomicType @property def source(self) -> str: return str(self.value) def evaluate(self, context: ContextType = None) -> AnyAtomicType: return self.value def select(self, context: ContextType = None) -> Iterator[AnyAtomicType]: if isinstance(self.value, list): yield from self.value else: yield self.value class ProxyToken(XPathToken): """ A token class for resolving collisions between other tokens that have the same symbol but are in different namespaces. It also resolves collisions of functions with names. """ def nud(self) -> XPathToken: if self.parser.next_token.symbol not in ('(', '#'): # Not a function call or reference, returns a name. return self.as_name() lookup_name = f'{{{self.namespace or XPATH_FUNCTIONS_NAMESPACE}}}{self.value}' try: token = self.parser.symbol_table[lookup_name](self.parser) except KeyError: if self.namespace == XSD_NAMESPACE: msg = f'unknown constructor function {self.symbol!r}' else: msg = f'unknown function {self.symbol!r}' raise self.error('XPST0017', msg) from None else: if self.parser.next_token.symbol == '#': return token return token.nud() class RootToken(XPathToken): """ A token class that is a proxy for a parsed token tree and act as mediator between the static context (parser) and the dynamic context. 
""" _token: XPathToken def __init__(self, token: XPathToken) -> None: self._token = token self.parser = token.parser self._items = token._items self.value = token.value self.span = token.span self.symbol = token.symbol self.label = token.label def __repr__(self) -> str: return '%s(token=%r)' % (self.__class__.__name__, self._token) def __str__(self) -> str: return self._token.__str__() @property def tree(self) -> str: return self._token.tree @property def source(self) -> str: return self._token.source @property def position(self) -> Tuple[int, int]: return self._token.position def align_schema(self, context: XPathContext) -> None: if self.parser.schema is None: if (schema := context.schema) is not None: self.parser.schema = schema elif context.schema is None: context.schema = self.parser.schema def select(self, context: ContextType = None) -> Iterator[ItemType]: if context is not None: self.align_schema(context) yield from self._token.select(context) def evaluate(self, context: ContextType = None) -> ValueType: if context is not None: self.align_schema(context) return self._token.evaluate(context) def select_results(self, context: ContextType) -> Iterator[_ResultType]: if context is not None: self.align_schema(context) yield from self._token.select_results(context) def get_results(self, context: ContextType) -> Union[List[_ResultType], AtomicType]: if context is not None: self.align_schema(context) return self._token.get_results(context) class XPathFunction(XPathToken): """ A token for processing XPath functions. """ __name__: str _qname: Optional[QName] = None pattern = r'(? 
None:
        super().__init__(parser)
        if isinstance(nargs, int) and nargs != self.nargs:
            if nargs < 0:
                raise self.error('XPST0017', 'number of arguments must be non negative')
            elif self.nargs is None:
                self.nargs = nargs
            elif isinstance(self.nargs, int):
                raise self.error('XPST0017', 'incongruent number of arguments')
            elif self.nargs[0] > nargs or self.nargs[1] is not None and self.nargs[1] < nargs:
                raise self.error('XPST0017', 'incongruent number of arguments')
            else:
                self.nargs = nargs

    def __repr__(self) -> str:
        qname = self.qname
        if qname is None:
            return '<%s object at %#x>' % (self.__class__.__name__, id(self))
        elif not isinstance(self.nargs, int):
            return '<XPathFunction %s at %#x>' % (qname.qname, id(self))
        return '<XPathFunction %s#%r at %#x>' % (qname.qname, self.nargs, id(self))

    def __str__(self) -> str:
        if self.namespace is None:
            return f'{self.symbol!r} {self.label}'
        elif self.namespace == XPATH_FUNCTIONS_NAMESPACE:
            return f"'fn:{self.symbol}' {self.label}"
        else:
            for prefix, uri in self.parser.namespaces.items():
                if uri == self.namespace:
                    return f"'{prefix}:{self.symbol}' {self.label}"
            else:
                return f"'Q{{{self.namespace}}}{self.symbol}' {self.label}"

    def __call__(self, *args: FunctionArgType,
                 context: ContextType = None) -> ValueType:
        self.check_arguments_number(len(args))

        context = copy(self.context or context)
        if self.label == 'partial function':
            for arg, tk in zip(args, filter(lambda x: x.symbol == '?', self)):
                if isinstance(arg, XPathToken) and not isinstance(arg, XPathFunction):
                    tk.value = arg.evaluate(context)
                else:
                    tk.value = self.validated_argument(arg, context)
        else:
            self.clear()
            for arg in args:
                if isinstance(arg, XPathToken):
                    self._items.append(arg)
                else:
                    value = self.validated_argument(arg, context)

                    # Accepts and wraps etree elements/documents, useful for external calls.
                    self._items.append(ValueToken(self.parser, value=value))

            if any(tk.symbol == '?'
and not tk for tk in self._items): self.to_partial_function() return self if isinstance(self.label, MultiLabel): # Disambiguate multi-label tokens if self.namespace == XSD_NAMESPACE and \ 'constructor function' in self.label.values: self.label = 'constructor function' else: for label in self.label.values: if label.endswith('function'): self.label = label break if self.label == 'partial function': result = self._partial_evaluate(context) else: result = self.evaluate(context) return self.validated_result(result) def check_arguments_number(self, nargs: int) -> None: """Check the number of arguments against function arity.""" if self.nargs is None or self.nargs == nargs: pass elif isinstance(self.nargs, tuple): if nargs < self.nargs[0]: raise self.error('XPTY0004', "missing required arguments") elif self.nargs[1] is not None and nargs > self.nargs[1]: raise self.error('XPTY0004', "too many arguments") elif self.nargs > nargs: raise self.error('XPTY0004', "missing required arguments") else: raise self.error('XPTY0004', "too many arguments") def validated_argument(self, arg: FunctionArgType, context: ContextType = None) -> ValueType: def get_arg_item(item: ItemArgType) -> ItemType: if isinstance(item, (XPathNode, XPathFunction, AnyAtomicType)): return item elif not is_etree_document(item) and not is_etree_element(item): raise self.error('XPTY0004', f"unexpected argument type {type(item)}") else: return get_node_tree( cast(Union[ElementProtocol, DocumentProtocol], item), namespaces=self.parser.namespaces, uri=self.parser.base_uri, fragment=None ) if context is not None: return context.get_value(arg, context.namespaces, self.parser.base_uri, None) elif arg is None: return [] elif not isinstance(arg, (list, tuple)): return get_arg_item(arg) return [get_arg_item(x) for x in arg] def validated_result(self, result: ValueType) -> ValueType: if isinstance(result, XPathToken) and result.symbol == '?': return result elif match_sequence_type(result, self.sequence_types[-1], 
self.parser): return result result = self.cast_to_primitive_type(result, self.sequence_types[-1]) if not match_sequence_type(result, self.sequence_types[-1], self.parser): msg = "{!r} does not match sequence type {}" raise self.error('XPTY0004', msg.format(result, self.sequence_types[-1])) return result @property def source(self) -> str: if self.label in ('sequence type', 'kind test', ''): return '%s(%s)%s' % ( self.symbol, ', '.join(item.source for item in self), self.occurrence or '' ) return '%s(%s)' % (self.symbol, ', '.join(item.source for item in self)) @property def qname(self) -> Optional[QName]: if self._qname is not None: return self._qname elif self.symbol == 'function': return None elif self.label == 'partial function': return None elif not self.namespace: self._qname = QName(None, self.symbol) elif self.namespace == XPATH_FUNCTIONS_NAMESPACE: self._qname = QName(XPATH_FUNCTIONS_NAMESPACE, 'fn:%s' % self.symbol) elif self.namespace == XSD_NAMESPACE: self._qname = QName(XSD_NAMESPACE, 'xs:%s' % self.symbol) elif self.namespace == XPATH_MATH_FUNCTIONS_NAMESPACE: self._qname = QName(XPATH_MATH_FUNCTIONS_NAMESPACE, 'math:%s' % self.symbol) else: for pfx, uri in self.parser.namespaces.items(): if uri == self.namespace: self._qname = QName(uri, f'{pfx}:{self.symbol}') break else: self._qname = QName(self.namespace, self.symbol) return self._qname @property def arity(self) -> int: if isinstance(self.nargs, int): return self.nargs return len(self._items) @property def min_args(self) -> int: if isinstance(self.nargs, int): return self.nargs elif isinstance(self.nargs, (tuple, list)): return self.nargs[0] else: return 0 @property def max_args(self) -> Optional[int]: if isinstance(self.nargs, int): return self.nargs elif isinstance(self.nargs, (tuple, list)): return self.nargs[1] else: return None def is_reference(self) -> int: if not isinstance(self.nargs, int): return False return self.nargs and not len(self._items) def nud(self) -> 'XPathFunction': if not 
self.parser.parse_arguments: return self code = 'XPST0017' if self.label == 'function' else 'XPST0003' self.parser.advance('(') if self.nargs is None: del self._items[:] if self.parser.next_token.symbol in (')', '(end)'): raise self.error(code, 'at least an argument is required') while True: self.append(self.parser.expression(5)) if self.parser.next_token.symbol != ',': break self.parser.advance() elif self.nargs == 0: if self.parser.next_token.symbol != ')': if self.parser.next_token.symbol != '(end)': raise self.error(code, '%s has no arguments' % str(self)) raise self.parser.next_token.wrong_syntax() self.parser.advance() return self else: if isinstance(self.nargs, (tuple, list)): min_args, max_args = self.nargs else: min_args = max_args = self.nargs k = 0 while k < min_args: if self.parser.next_token.symbol in (')', '(end)'): msg = 'Too few arguments: expected at least %s arguments' % min_args raise self.error('XPST0017', msg if min_args > 1 else msg[:-1]) self._items[k:] = self.parser.expression(5), k += 1 if k < min_args: if self.parser.next_token.symbol == ')': msg = f'{str(self)}: Too few arguments, expected ' \ f'at least {min_args} arguments' raise self.error(code, msg if min_args > 1 else msg[:-1]) self.parser.advance(',') while max_args is None or k < max_args: if self.parser.next_token.symbol == ',': self.parser.advance(',') self._items[k:] = self.parser.expression(5), elif k == 0 and self.parser.next_token.symbol != ')': self._items[k:] = self.parser.expression(5), else: break # pragma: no cover k += 1 if self.parser.next_token.symbol == ',': msg = 'Too many arguments: expected at most %s arguments' % max_args raise self.error(code, msg if max_args != 1 else msg[:-1]) self.parser.advance(')') if any(tk.symbol == '?' 
and not tk for tk in self._items): self.to_partial_function() return self def match_function_test(self, function_test: _SequenceTypesType, as_argument: bool = False) -> bool: """ Match if function signature satisfies the provided *function_test*. For default return type is covariant and arguments are contravariant. If *as_argument* is `True` the match is inverted. References: https://www.w3.org/TR/xpath-31/#id-function-test https://www.w3.org/TR/xpath-31/#id-sequencetype-subtype """ if isinstance(function_test, (list, tuple)): sequence_types = function_test else: sequence_types = split_function_test(function_test) if not sequence_types or not sequence_types[-1]: return False elif sequence_types[0] == '*': return True signature = [x for x in self.sequence_types[:self.arity]] signature.append(self.sequence_types[-1]) if len(sequence_types) != len(signature): return False if as_argument: iterator = zip(sequence_types[:-1], signature[:-1]) else: iterator = zip(signature[:-1], sequence_types[:-1]) # compare sequence types for st1, st2 in iterator: if not is_sequence_type_restriction(st1, st2): return False else: st1, st2 = sequence_types[-1], signature[-1] return is_sequence_type_restriction(st1, st2) def to_partial_function(self) -> None: """Convert an XPath function to a partial function.""" nargs = len([tk and not tk for tk in self._items if tk.symbol == '?']) assert nargs, "a partial function requires at least a placeholder token" if self.label != 'partial function': def evaluate(context: ContextType = None) -> 'XPathFunction': return self def select(context: ContextType = None) -> Iterator['XPathFunction']: yield self if self.__class__.evaluate is not XPathToken.evaluate: setattr(self, '_partial_evaluate', self.evaluate) if self.__class__.select is not XPathToken.select: setattr(self, '_partial_select', self.select) setattr(self, 'evaluate', evaluate) setattr(self, 'select', select) self._qname = None self.label = 'partial function' self.nargs = nargs def 
as_function(self) -> Callable[..., Any]:
        """
        Wraps the XPath function instance into a standard function.
        """
        def wrapper(*args: FunctionArgType, context: ContextType = None) -> ValueType:
            return self.__call__(*args, context=context)

        qname = self.qname
        if self.is_reference():
            ref_part = f'#{self.nargs}'
        else:
            ref_part = ''

        if qname is None:
            name = f'<anonymous-function{ref_part}>'
        else:
            name = f'<{qname.qname}{ref_part}>'

        wrapper.__name__ = name
        wrapper.__qualname__ = wrapper.__qualname__[:-7] + name
        return wrapper

    def _partial_evaluate(self, context: ContextType = None) -> Any:
        return [x for x in self._partial_select(context)]

    def _partial_select(self, context: ContextType = None) -> Iterator[Any]:
        item = self._partial_evaluate(context)
        if item is not None:
            if isinstance(item, list):
                yield from item
            else:
                if context is not None:
                    context.item = item
                yield item


class XPathConstructor(XPathFunction):
    """
    A token for processing XPath 2.0+ constructors.
    """
    @staticmethod
    def cast(value: Any) -> AtomicType:
        raise NotImplementedError()


class XPathMap(XPathFunction):
    """
    A token for processing XPath 3.1+ maps. Map instances have the double role of
    tokens and of dictionaries, depending on the way they are created (using a map
    constructor or a function). The map is fully set after the protected attribute
    _map is evaluated from tokens or initialized from arguments.
    """
    symbol = 'map'
    label = 'map'
    pattern = r'(?
None: super().__init__(parser) self._values = [] if items is not None: _items = items.items() if isinstance(items, dict) else items _map: _MapDictType = {} for k, v in _items: if k is None: raise self.error('XPTY0004', 'missing key value') elif isinstance(k, float) and math.isnan(k): if self._nan_key is False: raise self.error('XQDY0137') self._nan_key, _map[None] = k, v continue elif k in _map: raise self.error('XQDY0137') if isinstance(v, list): _map[k] = v[0] if len(v) == 1 else v else: _map[k] = v self._map = _map def __repr__(self) -> str: if self._map is not None: return f'<{self.__class__.__name__} object at {hex(id(self))}>' return "<{} object (not evaluated constructor) at {}>".format( self.__class__.__name__, hex(id(self)) ) def __str__(self) -> str: if self._map is None: return f'not evaluated map constructor with {len(self._items)} entries' return f'map{self._map}' def __len__(self) -> int: if self._map is None: return len(self._items) return len(self._map) def __eq__(self, other: Any) -> bool: if isinstance(other, XPathMap): if self._map is None or other._map is None: raise ElementPathValueError("cannot compare not evaluated maps") return self._map == other._map return NotImplemented def nud(self) -> 'XPathMap': self.parser.advance('{') del self._items[:] if self.parser.next_token.symbol not in ('}', '(end)'): while True: key = self.parser.expression(5) self._items.append(key) if self.parser.token.symbol != ':': self.parser.advance(':') self._values.append(self.parser.expression(5)) if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('}') return self @property def source(self) -> str: if self._map is None: items = ', '.join(f'{tk.source}:{tv.source}' for tk, tv in zip(self, self._values)) else: items = ', '.join(f'{k!r}:{v!r}' for k, v in self._map.items()) return f'map{{{items}}}' def evaluate(self, context: ContextType = None) -> 'XPathMap': if self._map is not None: return self return XPathMap( 
parser=self.parser, items=( (k.get_atomized_operand(context), v.evaluate(context)) for k, v in zip(self._items, self._values) ) ) def _evaluate(self, context: ContextType = None) -> _MapDictType: _map: _MapDictType = {} nan_key: Union[bool, float] = False for key, value in zip(self._items, self._values): k = key.get_atomized_operand(context) if k is None: raise self.error('XPTY0004', 'missing key value') elif isinstance(k, float) and math.isnan(k): if nan_key is not False: raise self.error('XQDY0137') nan_key, _map[None] = k, value.evaluate(context) continue elif k in _map: raise self.error('XQDY0137') v = value.evaluate(context) if isinstance(v, list): _map[k] = v[0] if len(v) == 1 else v else: _map[k] = v self._nan_key = nan_key return _map def __call__(self, *args: FunctionArgType, context: ContextType = None) -> ValueType: if len(args) == 1 and isinstance(args[0], list) and len(args[0]) == 1: args = args[0][0], if len(args) != 1 or not isinstance(args[0], AnyAtomicType): if isinstance(context, XPathSchemaContext): return [] raise self.error('XPST0003', 'exactly one atomic argument is expected') _map: _MapDictType key = args[0] if self._map is not None: _map = self._map else: _map = self._evaluate(context) try: if isinstance(key, float) and math.isnan(key): return _map[None] else: return _map[key] except KeyError: return [] def keys(self, context: ContextType = None) -> List[AtomicType]: if self._map is not None: return [self._nan_key if k is None else k for k in self._map.keys()] return [self._nan_key if k is None else k for k in self._evaluate(context).keys()] def values(self, context: ContextType = None) -> List[ValueType]: if self._map is not None: return [v for v in self._map.values()] return [v for v in self._evaluate(context).values()] def items(self, context: ContextType = None) -> List[Tuple[AtomicType, ValueType]]: if self._map is not None: _map = self._map else: _map = self._evaluate(context) return [(self._nan_key, v) if k is None else (k, v) for k, 
v in _map.items()] def match_function_test(self, function_test: _SequenceTypesType, as_argument: bool = False) -> bool: if isinstance(function_test, (list, tuple)): sequence_types = function_test else: sequence_types = split_function_test(function_test) if not sequence_types or not sequence_types[-1]: return False elif sequence_types[0] == '*': return True elif len(sequence_types) != 2: return False key_st, value_st = sequence_types if key_st.endswith(('+', '*')): return False elif value_st != 'empty-sequence()' and not value_st.endswith(('?', '*')): return False else: return any(match_sequence_type(k, key_st, self.parser, False) and match_sequence_type(v, value_st, self.parser) for k, v in self.items()) class XPathArray(XPathFunction): """ A token for processing XPath 3.1+ arrays. """ symbol = 'array' label = 'array' pattern = r'(? None: if items is not None: self._array = [x for x in items] super().__init__(parser) def __repr__(self) -> str: if self._array is not None: return f'<{self.__class__.__name__} object at {hex(id(self))}>' return "<{} object (not evaluated constructor) at {}>".format( self.__class__.__name__, hex(id(self)) ) def __str__(self) -> str: if self._array is not None: return str(self._array) items_desc = f'{len(self)} items' if len(self) != 1 else '1 item' if self.symbol == 'array': return f'not evaluated curly array constructor with {items_desc}' return f'not evaluated square array constructor with {items_desc}' def __len__(self) -> int: if self._array is None: return len(self._items) return len(self._array) def __eq__(self, other: Any) -> bool: if isinstance(other, XPathArray): if self._array is None or other._array is None: raise ElementPathValueError("cannot compare not evaluated arrays") return self._array == other._array return NotImplemented @property def source(self) -> str: if self._array is None: items = ', '.join(f'{tk.source}' for tk in self) else: items = ', '.join(f'{v!r}' for v in self._array) return f'array{{{items}}}' if 
self.symbol == 'array' else f'[{items}]' def nud(self) -> 'XPathArray': self.parser.advance('{') del self._items[:] if self.parser.next_token.symbol not in ('}', '(end)'): while True: self._items.append(self.parser.expression(5)) if self.parser.next_token.symbol != ',': break self.parser.advance() self.parser.advance('}') return self def evaluate(self, context: ContextType = None) -> 'XPathArray': if self._array is not None: return self return XPathArray(self.parser, items=self._evaluate(context)) def _evaluate(self, context: ContextType = None) -> List[ValueType]: if self.symbol == 'array': # A comma in a curly array constructor is the comma operator, not a delimiter. items: List[ValueType] = [] for tk in self._items: items.extend(tk.select(context)) return items else: return [tk.evaluate(context) for tk in self._items] def __call__(self, *args: FunctionArgType, context: ContextType = None) -> ValueType: if len(args) != 1 or not isinstance(args[0], int): raise self.error('XPTY0004', 'exactly one xs:integer argument is expected') position = args[0] if position <= 0: raise self.error('FOAY0001') if self._array is not None: items = self._array else: items = self._evaluate(context) try: return items[position - 1] except IndexError: raise self.error('FOAY0001') def items(self, context: ContextType = None) -> List[ValueType]: if self._array is not None: return self._array.copy() return self._evaluate(context) def iter_flatten(self, context: ContextType = None) -> Iterator[ItemType]: if self._array is not None: items = self._array else: items = self._evaluate(context) for item in items: if isinstance(item, XPathArray): yield from item.iter_flatten(context) elif isinstance(item, list): yield from item else: yield item def match_function_test(self, function_test: _SequenceTypesType, as_argument: bool = False) -> bool: if isinstance(function_test, (list, tuple)): sequence_types = function_test else: sequence_types = split_function_test(function_test) if not sequence_types 
or not sequence_types[-1]: return False elif sequence_types[0] == '*': return True elif len(sequence_types) != 2: return False index_type, value_type = sequence_types if index_type.endswith(('+', '*')): return False return match_sequence_type(1, index_type) and \ all(match_sequence_type(v, value_type, self.parser) for v in self.items()) sissaschool-elementpath-d3688c7/mypy.ini000066400000000000000000000000371476131650400204260ustar00rootroot00000000000000[mypy] show_error_codes = True sissaschool-elementpath-d3688c7/profiling/000077500000000000000000000000001476131650400207205ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/profiling/memray_node_tree.py000077500000000000000000000103541476131650400246160ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # if __name__ == '__main__': import argparse import pathlib import memray import xml.etree.ElementTree as ElementTree from elementpath import DocumentNode, ElementNode, \ CommentNode, ProcessingInstructionNode, TextNode def get_element_tree(source): return ElementTree.XML(source) parser = argparse.ArgumentParser() parser.add_argument('--depth', type=int, default=7, help="the depth of the test XML tree (7 for default)") parser.add_argument('--children', type=int, default=3, help="the number of children for each element (3 for default)") params = parser.parse_args() print('*' * 64) print("*** Memory usage estimation of XPath node trees using memray ***") print('*' * 64) print() chunk = 'lorem ipsum' for k in range(params.depth - 1, 0, -1): chunk = f'{chunk}' * params.children xml_source = f'{chunk}' label = f'{params.depth}x{params.children}' outdir = pathlib.Path(__file__).parent.joinpath('out/') et_file = outdir.joinpath(f'memray-element-tree-{label}.bin') nt_file = outdir.joinpath(f'memray-node-tree-{label}.bin') if et_file.is_file(): et_file.unlink() with memray.Tracker(et_file, memory_interval_ms=1, follow_fork=True): root = get_element_tree(xml_source) if nt_file.is_file(): nt_file.unlink() with memray.Tracker(nt_file, follow_fork=True): namespaces = None position = 1 def build_element_node() -> ElementNode: global position node = ElementNode(elem, parent, position, nsmap) position += 1 position += len(nsmap) if 'xml' in nsmap else len(nsmap) + 1 position += len(elem.attrib) if elem.text is not None: node.children.append(TextNode(elem.text, node, position)) position += 1 return node # Common nsmap nsmap = {} if namespaces is None else dict(namespaces) if hasattr(root, 'parse'): root_node = parent = DocumentNode(root, position) position += 1 elem = root.getroot() child = build_element_node() parent.children.append(child) parent = child else: elem = root parent = None root_node = parent = build_element_node() # elements = {elem: parent} 
# Enable for building a reverse map elem -> node children = iter(elem) iterators = [] ancestors = [] while True: for elem in children: if not callable(elem.tag): child = build_element_node() elif elem.tag.__name__ == 'Comment': # type: ignore[attr-defined] child = CommentNode(elem, parent, position) position += 1 else: child = ProcessingInstructionNode(elem, parent, position) # elements[elem] = child parent.children.append(child) if elem.tail is not None: parent.children.append(TextNode(elem.tail, parent, position)) position += 1 if len(elem): ancestors.append(parent) parent = child iterators.append(children) children = iter(elem) break else: try: children, parent = iterators.pop(), ancestors.pop() except IndexError: break print(f"Number of elements: {len(list(root.iter()))}") print(f"Number of nodes: {len(list(root_node.iter()))}") element_nodes = list(x for x in root_node.iter() if isinstance(x, ElementNode)) print(f"Number of element nodes: {len(element_nodes)}") sissaschool-elementpath-d3688c7/profiling/profile_character_classes.py000077500000000000000000000026351476131650400264740ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
#
# @author Davide Brunato
#
from timeit import timeit
from memory_profiler import profile

from elementpath.regex import CharacterClass


def run_timeit(stmt='pass', setup='pass', number=1000):
    seconds = timeit(stmt, setup=setup, number=number)
    print("{}: {}s".format(stmt, seconds))


@profile
def character_class_objects():
    return [CharacterClass(r'\c') for _ in range(10000)]


if __name__ == '__main__':
    print('*' * 62)
    print("*** Memory and timing profile of CharacterClass class ***")
    print("***" + ' ' * 56 + "***")
    print("*** Note: save ~15% of memory with __slots__ (from v2.2.3) ***")
    print('*' * 62)
    print()

    character_class_objects()

    character_class = CharacterClass(r'\c')
    character_class -= CharacterClass(r'\i')

    SETUP = 'from __main__ import character_class'
    NUMBER = 10000

    run_timeit('"9" in character_class # True ', SETUP, NUMBER)
    run_timeit('"q" in character_class # False', SETUP, NUMBER)
    run_timeit('8256 in character_class # True ', SETUP, NUMBER)
    run_timeit('8257 in character_class # False', SETUP, NUMBER)
sissaschool-elementpath-d3688c7/profiling/profile_null_values.py000077500000000000000000000051551476131650400253540ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2022, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
from timeit import timeit
from memory_profiler import profile

from elementpath import XPath1Parser
from elementpath.xpath_tokens import ValueToken


def run_timeit(stmt='pass', setup='pass', number=1000):
    seconds = timeit(stmt, setup=setup, number=number)
    print("{}: {}s".format(stmt, seconds))


@profile
def xpath_none_null_values():
    null_values = [None for _ in range(50000)]
    return null_values


@profile
def xpath_list_null_values():
    null_values = [[] for _ in range(50000)]
    return null_values


@profile
def xpath_tuple_null_values():
    null_values = [() for _ in range(50000)]
    return null_values


def is_empty_sequence(s):
    return not s and isinstance(s, list)


if __name__ == '__main__':
    print('*' * 68)
    print("*** Memory and timing profile of XPath null values alternatives ***")
    print('*' * 68)
    print()

    NUMBER = 1000
    SETUP = 'from __main__ import obj1, obj2, is_empty_sequence'

    obj1 = []
    obj2 = ['foo', 'bar']

    print("*** Profile evaluation ***\n")

    run_timeit('[None for _ in range(10000)]', number=NUMBER)
    run_timeit('[[] for _ in range(10000)]', number=NUMBER)
    run_timeit('[() for _ in range(10000)]', number=NUMBER)
    print()

    run_timeit('for _ in range(10000): obj1 is None', SETUP, NUMBER)
    run_timeit('for _ in range(10000): obj1 == []', SETUP, NUMBER)
    run_timeit('for _ in range(10000): obj1 == ()', SETUP, NUMBER)
    run_timeit('for _ in range(10000): not obj1 and isinstance(obj1, list)', SETUP, NUMBER)
    run_timeit('for _ in range(10000): not obj1 and isinstance(obj1, tuple)', SETUP, NUMBER)
    run_timeit('for _ in range(10000): is_empty_sequence(obj1)', SETUP, NUMBER)
    print()

    run_timeit('for _ in range(10000): obj2 is None', SETUP, NUMBER)
    run_timeit('for _ in range(10000): obj2 == []', SETUP, NUMBER)
    run_timeit('for _ in range(10000): obj2 == ()', SETUP, NUMBER)
    run_timeit('for _ in range(10000): not obj2 and isinstance(obj2, list)', SETUP, NUMBER)
    run_timeit('for _ in range(10000): not obj2 and isinstance(obj2, tuple)', SETUP, NUMBER)
    run_timeit('for _ in range(10000): is_empty_sequence(obj2)', SETUP, NUMBER)
    print()

    print("*** Profile memory ***\n")

    xpath_none_null_values()
    xpath_list_null_values()
    xpath_tuple_null_values()
sissaschool-elementpath-d3688c7/profiling/profile_unicode_subsets.py000077500000000000000000000025421476131650400262160ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
from timeit import timeit
from memory_profiler import profile

from elementpath.regex import UNICODE_CATEGORIES, UnicodeSubset


def run_timeit(stmt='pass', setup='pass', number=1000):
    seconds = timeit(stmt, setup=setup, number=number)
    print("{}: {}s".format(stmt, seconds))


@profile
def unicode_subset_objects():
    return [UnicodeSubset('\U00020000-\U0002A6D6') for _ in range(10000)]


if __name__ == '__main__':
    print('*' * 62)
    print("*** Memory and timing profile of UnicodeSubset class ***")
    print("***" + ' ' * 56 + "***")
    print("*** Note: save ~28% of memory with __slots__ (from v2.2.3) ***")
    print('*' * 62)
    print()

    unicode_subset_objects()

    subset = UNICODE_CATEGORIES['C']
    SETUP = 'from __main__ import subset'
    NUMBER = 10000

    run_timeit('1328 in subset # True ', SETUP, NUMBER)
    run_timeit('1329 in subset # False', SETUP, NUMBER)
    run_timeit('72165 in subset # True ', SETUP, NUMBER)
    run_timeit('72872 in subset # False', SETUP, NUMBER)
sissaschool-elementpath-d3688c7/profiling/profile_xpath_nodes.py000077500000000000000000000060741476131650400253400ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2022, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import sys from timeit import timeit from memory_profiler import profile import lxml.etree as etree from elementpath import XPathNode, build_node_tree from elementpath.etree import PyElementTree def run_timeit(stmt='pass', setup='pass', number=1000): seconds = timeit(stmt, setup=setup, number=number) print("{}: {}s ({} times, about {}s each)".format(stmt, seconds, number, seconds/number)) @profile def create_element_tree(source): doc = etree.XML(source) return doc @profile def create_py_element_tree(source): doc = PyElementTree.XML(source) return doc @profile def create_xpath_tree(et_root): node_tree = build_node_tree(et_root) return node_tree # ep2.5 node checking function def is_xpath_node(obj): return isinstance(obj, XPathNode) or \ hasattr(obj, 'tag') and hasattr(obj, 'attrib') and hasattr(obj, 'text') or \ hasattr(obj, 'local_name') and hasattr(obj, 'type') and hasattr(obj, 'name') or \ hasattr(obj, 'getroot') and hasattr(obj, 'parse') and hasattr(obj, 'iter') if __name__ == '__main__': import argparse parser = argparse.ArgumentParser() parser.add_argument('--depth', type=int, default=7, help="the depth of the test XML tree (7 for default)") parser.add_argument('--children', type=int, default=3, help="the number of children for each element (3 for default)") parser.add_argument('--speed', action='store_true', default=False, help="run also speed tests (disabled for default)") params = parser.parse_args() print('*' * 60) print("*** Memory and timing profile of XPath node trees ***") print('*' * 60) print() SETUP = 'from __main__ import root, xpath_tree, build_node_tree, is_xpath_node, XPathNode' NUMBER = 5000 chunk = 'lorem ipsum' for k in range(params.depth - 1, 0, -1): chunk = f'{chunk}' * params.children xml_source = f'{chunk}' root = create_element_tree(xml_source) create_py_element_tree(xml_source) xpath_tree = 
create_xpath_tree(root) if not params.speed: print('Speed tests skipped ... exit') sys.exit() run_timeit('build_node_tree(root)', SETUP, 100) print() run_timeit('is_xpath_node(root)', SETUP, NUMBER) run_timeit('is_xpath_node(xpath_tree)', SETUP, NUMBER) run_timeit('isinstance(xpath_tree, XPathNode)', SETUP, NUMBER) print() run_timeit('for e in root.iter(): e', SETUP, NUMBER) run_timeit('for e in xpath_tree.iter(): e', SETUP, NUMBER) print() run_timeit('for e in root.iter(): is_xpath_node(e)', SETUP, NUMBER) run_timeit('for e in xpath_tree.iter(): isinstance(e, XPathNode)', SETUP, NUMBER) sissaschool-elementpath-d3688c7/profiling/profile_xpath_parsers.py000077500000000000000000000043561476131650400257100ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # from timeit import timeit from memory_profiler import profile from elementpath import XPath1Parser, XPath2Parser from elementpath.xpath30 import XPath30Parser def run_timeit(stmt='pass', setup='pass', number=1000): seconds = timeit(stmt, setup=setup, number=number) print("{}: {}s".format(stmt, seconds)) @profile def xpath1_parser_objects(): return [XPath1Parser() for _ in range(10000)] @profile def xpath2_parser_objects(): return [XPath2Parser() for _ in range(10000)] @profile def xpath30_parser_objects(): return [XPath30Parser() for _ in range(10000)] if __name__ == '__main__': print('*' * 62) print("*** Memory and timing profile of XPathParser1/2/3 classes ***") print("***" + ' ' * 56 + "***") print('*' * 62) print() xpath1_parser_objects() xpath2_parser_objects() xpath30_parser_objects() NUMBER = 10000 SETUP = 'from __main__ import XPath1Parser' run_timeit("XPath1Parser().parse('18 - 9 + 10')", SETUP, NUMBER) run_timeit("XPath1Parser().parse('true()')", SETUP, NUMBER) run_timeit("XPath1Parser().parse('contains(\"foobar\", \"bar\")')", SETUP, NUMBER) run_timeit("XPath1Parser().parse('/A/B/C/D')", SETUP, NUMBER) print() SETUP = 'from __main__ import XPath2Parser' run_timeit("XPath2Parser().parse('18 - 9 + 10')", SETUP, NUMBER) run_timeit("XPath2Parser().parse('true()')", SETUP, NUMBER) run_timeit("XPath2Parser().parse('contains(\"foobar\", \"bar\")')", SETUP, NUMBER) run_timeit("XPath2Parser().parse('/A/B/C/D')", SETUP, NUMBER) print() SETUP = 'from __main__ import XPath30Parser' run_timeit("XPath30Parser().parse('18 - 9 + 10')", SETUP, NUMBER) run_timeit("XPath30Parser().parse('true()')", SETUP, NUMBER) run_timeit("XPath30Parser().parse('contains(\"foobar\", \"bar\")')", SETUP, NUMBER) run_timeit("XPath30Parser().parse('/A/B/C/D')", SETUP, NUMBER) print() sissaschool-elementpath-d3688c7/profiling/profile_xpath_tokens.py000077500000000000000000000044451476131650400255330ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright 
(c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from timeit import timeit from memory_profiler import profile from elementpath import XPath1Parser from elementpath.xpath_tokens import ValueToken def run_timeit(stmt='pass', setup='pass', number=1000): seconds = timeit(stmt, setup=setup, number=number) print("{}: {}s".format(stmt, seconds)) @profile def xpath_token_objects(): true_token = XPath1Parser.symbol_table['true'] return [true_token(parser) for _ in range(10000)] if __name__ == '__main__': print('*' * 62) print("*** Memory and timing profile of XPathToken class ***") print("***" + ' ' * 56 + "***") print("*** Note: save ~34% of memory with __slots__ (from v2.2.3) ***") print('*' * 62) print() parser = XPath1Parser() xpath_token_objects() t1 = parser.parse('18 - 9 + 10') t2 = parser.parse('true()') t3 = parser.parse('contains("foobar", "bar")') NUMBER = 30000 print("*** Profile evaluation ***\n") run_timeit('t1.evaluate() # 19 ', 'from __main__ import t1', NUMBER) run_timeit('t2.evaluate() # True ', 'from __main__ import t2', NUMBER) run_timeit('t3.evaluate() # True ', 'from __main__ import t3', NUMBER) print() print("*** Profile MutableSequence operations ***\n") tk = ValueToken(parser, 1) SETUP = 'from __main__ import tk' run_timeit('tk.extend((1, 2, 3))', SETUP, NUMBER) run_timeit('tk._items.extend((1, 2, 3))', SETUP, NUMBER) print() run_timeit('tk[:] = (1, 2, 3)', SETUP, NUMBER) run_timeit('tk._items[:] = (1, 2, 3)', SETUP, NUMBER) print() run_timeit('tk.append(1); tk.append(2); tk.append(3)', SETUP, NUMBER) run_timeit('tk._items.append(1); tk._items.append(2); tk._items.append(3)', SETUP, NUMBER) print() run_timeit('tk.append(1)', SETUP, NUMBER) run_timeit('tk._items.append(1)', SETUP, NUMBER) print() 
run_timeit('tk.append(1); tk[0]', SETUP, NUMBER) run_timeit('tk._items.append(1); tk._items[0]', SETUP, NUMBER) sissaschool-elementpath-d3688c7/publiccode.yml000066400000000000000000000045451476131650400215730ustar00rootroot00000000000000# This repository adheres to the publiccode.yml standard by including this # metadata file that makes public software easily discoverable. # More info at https://github.com/italia/publiccode.yml publiccodeYmlVersion: '0.2' name: elementpath url: 'https://github.com/sissaschool/elementpath' landingURL: 'https://github.com/sissaschool/elementpath' releaseDate: '2025-03-03' softwareVersion: v4.8.0 developmentStatus: stable platforms: - linux - windows - mac softwareType: library inputTypes: - text/XML categories: - data-analytics - data-collection maintenance: type: internal contacts: - name: Davide Brunato email: davide.brunato@sissa.it affiliation: 'Scuola Internazionale Superiore di Studi Avanzati' legal: license: MIT mainCopyrightOwner: Scuola Internazionale Superiore di Studi Avanzati repoOwner: Scuola Internazionale Superiore di Studi Avanzati localisation: localisationReady: false availableLanguages: - en it: countryExtensionVersion: '0.2' riuso: codiceIPA: sissa description: en: genericName: elementpath apiDocumentation: 'https://elementpath.readthedocs.io/en/latest/xpath_api.html' documentation: 'https://elementpath.readthedocs.io/en/latest/' shortDescription: >- Python library that provides XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml longDescription: | This is a library for Python 3.8+ that provides XPath 1.0, 2.0, 3.0 and 3.1 selectors for Python's ElementTree XML data structures, both for the standard **ElementTree** library and for the **lxml** library. For lxml this package can be useful for providing XPath 2.0+ selectors, because lxml already has its own implementation of XPath 1.0. 
## Installation and usage You can install the package with _pip_ in a Python 3.8+ environment: ~~~~ pip install elementpath ~~~~ To use it, import the package and apply the selectors on ElementTree nodes: ~~~~ >>> import elementpath >>> from xml.etree import ElementTree >>> root = ElementTree.XML('<A><B1/><B2><C1/><C2/><C3/></B2></A>') >>> elementpath.select(root, '/A/B2/\*') [<Element 'C1' at ...>, <Element 'C2' at ...>, <Element 'C3' at ...>] ~~~~ features: - XPath 1.0, XPath 2.0, XPath 3.0 and XPath 3.1 implementations sissaschool-elementpath-d3688c7/requirements-dev.txt000066400000000000000000000002401476131650400227630ustar00rootroot00000000000000# Requirements for setting up a development environment setuptools tox>=4.0 coverage lxml xmlschema>=3.3.2 Sphinx memory-profiler memray flake8 mypy lxml-stubs -e . sissaschool-elementpath-d3688c7/scripts/000077500000000000000000000000001476131650400204165ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/scripts/generate_codepoints.py000077500000000000000000000360601476131650400250210ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2024, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # """Codepoints modules generator utility.""" MODULE_HEADER_TEMPLATE = """# # Copyright (c), 2018-{year}, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or https://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # --- Auto-generated code: don't edit this file --- #""" LIST_TEMPLATE = """ {list_name} = [ {indented_items} ] """ DICT_TEMPLATE = """ {dict_name} = {{ {indented_items} }} """ ### # Unicode versions index: https://www.unicode.org/versions/enumeratedversions.html UNICODE_DATA_BASE_URL = "https://www.unicode.org/Public/" UNICODE_VERSIONS = { '16.0.0': ('16.0.0/ucd/UnicodeData.txt', '16.0.0/ucd/Blocks.txt'), '15.1.0': ('15.1.0/ucd/UnicodeData.txt', '15.1.0/ucd/Blocks.txt'), '15.0.0': ('15.0.0/ucd/UnicodeData.txt', '15.0.0/ucd/Blocks.txt'), '14.0.0': ('14.0.0/ucd/UnicodeData.txt', '14.0.0/ucd/Blocks.txt'), '13.0.0': ('13.0.0/ucd/UnicodeData.txt', '13.0.0/ucd/Blocks.txt'), '12.1.0': ('12.1.0/ucd/UnicodeData.txt', '12.1.0/ucd/Blocks.txt'), '12.0.0': ('12.0.0/ucd/UnicodeData.txt', '12.0.0/ucd/Blocks.txt'), '11.0.0': ('11.0.0/ucd/UnicodeData.txt', '11.0.0/ucd/Blocks.txt'), '10.0.0': ('10.0.0/ucd/UnicodeData.txt', '10.0.0/ucd/Blocks.txt'), '9.0.0': ('9.0.0/ucd/UnicodeData.txt', '9.0.0/ucd/Blocks.txt'), '8.0.0': ('8.0.0/ucd/UnicodeData.txt', '8.0.0/ucd/Blocks.txt'), '7.0.0': ('7.0.0/ucd/UnicodeData.txt', '7.0.0/ucd/Blocks.txt'), '6.3.0': ('6.3.0/ucd/UnicodeData.txt', '6.3.0/ucd/Blocks.txt'), '6.2.0': ('6.2.0/ucd/UnicodeData.txt', '6.2.0/ucd/Blocks.txt'), '6.1.0': ('6.1.0/ucd/UnicodeData.txt', '6.1.0/ucd/Blocks.txt'), '6.0.0': ('6.0.0/ucd/UnicodeData.txt', '6.0.0/ucd/Blocks.txt'), '5.2.0': ('5.2.0/ucd/UnicodeData.txt', '5.2.0/ucd/Blocks.txt'), '5.1.0': ('5.1.0/ucd/UnicodeData.txt', '5.1.0/ucd/Blocks.txt'), '5.0.0': ('5.0.0/ucd/UnicodeData.txt', '5.0.0/ucd/Blocks.txt'), '4.1.0': ('4.1.0/ucd/UnicodeData.txt', '4.1.0/ucd/Blocks.txt'), '4.0.1': ('4.0-Update1/UnicodeData-4.0.1.txt', '4.0-Update1/Blocks-4.0.1.txt'), '4.0.0': ('4.0-Update/UnicodeData-4.0.0.txt', '4.0-Update/Blocks-4.0.0.txt'), '3.2.0': ('3.2-Update/UnicodeData-3.2.0.txt', '3.2-Update/Blocks-3.2.0.txt'), '3.1.1': ('3.2-Update/UnicodeData-3.2.0.txt', '3.2-Update/Blocks-3.2.0.txt'), 
'3.1.0': ('3.1-Update/UnicodeData-3.1.0.txt', '3.1-Update/Blocks-4.txt'), '3.0.1': ('3.0-Update1/UnicodeData-3.0.1.txt', '3.0-Update/Blocks-3.txt'), '3.0.0': ('3.0-Update/UnicodeData-3.0.0.txt', '3.0-Update/Blocks-3.txt'), '2.1.9': ('2.1-Update4/UnicodeData-2.1.9.txt', '2.1-Update4/Blocks-2.txt'), '2.1.8': ('2.1-Update3/UnicodeData-2.1.8.txt', '2.0-Update/Blocks-1.txt'), '2.1.5': ('2.1-Update2/UnicodeData-2.1.5.txt', '2.0-Update/Blocks-1.txt'), '2.1.2': ('2.1-Update/UnicodeData-2.1.2.txt', '2.0-Update/Blocks-1.txt'), '2.0.0': ('2.0-Update/UnicodeData-2.0.14.txt', '2.0-Update/Blocks-1.txt') } UNICODE_CATEGORIES = ( 'C', 'Cc', 'Cf', 'Cs', 'Co', 'Cn', 'L', 'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'M', 'Mn', 'Mc', 'Me', 'N', 'Nd', 'Nl', 'No', 'P', 'Pc', 'Pd', 'Ps', 'Pe', 'Pi', 'Pf', 'Po', 'S', 'Sm', 'Sc', 'Sk', 'So', 'Z', 'Zs', 'Zl', 'Zp' ) DEFAULT_CATEGORIES_VERSIONS = ['12.1.0', '13.0.0', '14.0.0', '15.0.0', '15.1.0', '16.0.0'] def version_number(value): numbers = value.strip().split('.') if len(numbers) != 3 or any(not x.isdigit() for x in numbers) or \ any(x != str(int(x)) for x in numbers): raise ValueError(f"{value!r} is not a version number") return value.strip() def version_info(versions): assert isinstance(versions, (tuple, list)) if not versions: return "all versions." if len(versions) == 1: return f"version {versions[0]}" return f"versions {', '.join(versions)}." 
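

# Illustrative sketch, not part of the original script: it restates the
# rule applied by version_number() above as a side-effect-free predicate,
# so the accepted forms are easy to check. The helper name
# `is_canonical_version` is an assumption made for this example only; for
# ASCII digit strings it should accept exactly the values that
# version_number() returns without raising.
def is_canonical_version(value):
    """Return True if value looks like 'X.Y.Z' with no leading zeros."""
    numbers = value.strip().split('.')
    # Three dot-separated fields, each a decimal number in canonical form.
    return len(numbers) == 3 and all(
        x.isdecimal() and x == str(int(x)) for x in numbers
    )


# Usage, mirroring what the -v/--version option is expected to accept:
#   is_canonical_version('16.0.0')   is True
#   is_canonical_version('4.0')      is False (only two fields)
#   is_canonical_version('04.0.0')   is False (leading zero)
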
def get_unicode_data_url(version): try: url = UNICODE_VERSIONS[version][0] except KeyError: url = f'{version}/ucd/UnicodeData.txt' return urljoin(UNICODE_DATA_BASE_URL, url) def get_blocks_url(version): try: url = UNICODE_VERSIONS[version][1] except KeyError: url = f'{version}/ucd/Blocks.txt' return urljoin(UNICODE_DATA_BASE_URL, url) def iter_codepoints_with_category(version): if version == unidata_version: # If requested version matches use Python unicodedata library API for cp in range(maxunicode + 1): yield cp, category(chr(cp)) return with urlopen(get_unicode_data_url(version)) as res: prev_cp = -1 for line in res.readlines(): fields = line.split(b';') cp = int(fields[0], 16) cat = fields[2].decode('utf-8') if cp - prev_cp > 1: if fields[1].endswith(b', Last>'): # Ranges of codepoints expressed with First and then Last for x in range(prev_cp + 1, cp): yield x, cat else: # For default is 'Cn' that means 'Other, not assigned' for x in range(prev_cp + 1, cp): yield x, 'Cn' prev_cp = cp yield cp, cat while cp < maxunicode: cp += 1 yield cp, 'Cn' def get_unicodedata_categories(version): """ Extracts Unicode categories information from unicodedata library or from normative raw data. Each category is represented with an ordered list containing code points and code point ranges. :return: a dictionary with category names as keys and lists as values. 
""" categories = {k: [] for k in UNICODE_CATEGORIES} major_category = 'C' major_start_cp, major_next_cp = 0, 1 minor_category = 'Cc' minor_start_cp, minor_next_cp = 0, 1 for cp, cat in iter_codepoints_with_category(version): if cat[0] != major_category: if cp > major_next_cp: categories[major_category].append((major_start_cp, cp)) else: categories[major_category].append(major_start_cp) major_category = cat[0] major_start_cp, major_next_cp = cp, cp + 1 if cat != minor_category: if cp > minor_next_cp: categories[minor_category].append((minor_start_cp, cp)) else: categories[minor_category].append(minor_start_cp) minor_category = cat minor_start_cp, minor_next_cp = cp, cp + 1 else: if major_next_cp == maxunicode + 1: categories[major_category].append(major_start_cp) else: categories[major_category].append((major_start_cp, maxunicode + 1)) if minor_next_cp == maxunicode + 1: categories[minor_category].append(minor_start_cp) else: categories[minor_category].append((minor_start_cp, maxunicode + 1)) return categories def get_unicodedata_blocks(version): """ Extracts Unicode blocks information from normative raw data. Each block is represented with as string that expresses a range of codepoints for building an UnicodeSubset(). :return: a dictionary with block names as keys and strings as values. 
""" blocks = {} with urlopen(get_blocks_url(version)) as res: for line in res.readlines(): if line.startswith((b'#', b'\n', b'\t')): continue try: block_range, block_name = line.decode('utf-8').split('; ') except ValueError: # old 2.0 format block_start, block_end, block_name = line.decode('utf-8').split('; ') else: block_start, block_end = block_range.split('..') block_name = block_name.strip() if len(block_start) <= 4: block_start = rf"\u{block_start.rjust(4, '0')}" else: block_start = rf"\U{block_start.rjust(8, '0')}" if len(block_end) <= 4: block_end = rf"\u{block_end.rjust(4, '0')}" else: block_end = rf"\U{block_end.rjust(8, '0')}" if block_name not in blocks: blocks[block_name] = f'{block_start}-{block_end}' else: blocks[block_name] += f'{block_start}-{block_end}' return blocks def generate_unicode_categories_module(module_path, versions): print(f"\nSaving raw Unicode categories to {str(module_path)}") with module_path.open('w') as fp: print(f"Write module header and generate categories map for version {versions[0]} ...") fp.write(MODULE_HEADER_TEMPLATE.format_map({ 'year': datetime.datetime.now().year, })) categories = get_unicodedata_categories(versions[0]) categories_repr = pprint.pformat(categories, compact=True) fp.write(LIST_TEMPLATE.format_map({ 'list_name': 'UNICODE_VERSIONS', 'indented_items': '\n '.join(repr(versions)[1:-1].split('\n')) })) fp.write(DICT_TEMPLATE.format_map({ 'dict_name': 'UNICODE_CATEGORIES', 'indented_items': '\n '.join(categories_repr[1:-1].split('\n')) })) for ver in versions[1:]: print(f" - Generate diff category map for version {ver} ...") base_categories = categories categories = get_unicodedata_categories(ver) categories_diff = {} for k, cps in categories.items(): cps_base = base_categories[k] if cps != cps_base: exclude_cps = [x for x in cps_base if x not in cps] insert_cps = [x for x in cps if x not in cps_base] categories_diff[k] = exclude_cps, insert_cps categories_repr = pprint.pformat(categories_diff, compact=True) 
fp.write(DICT_TEMPLATE.format_map({ 'dict_name': f"DIFF_CATEGORIES_VER_{ver.replace('.', '_')}", 'indented_items': '\n '.join(categories_repr[1:-1].split('\n')) })) def generate_unicode_blocks_module(module_path, versions): print(f"\nSaving raw Unicode blocks to {str(module_path)}") with module_path.open('w') as fp: print(f"Write module header and generate blocks map for version {versions[0]} ...") fp.write(MODULE_HEADER_TEMPLATE.format_map({ 'year': datetime.datetime.now().year, })) blocks = get_unicodedata_blocks(versions[0]) blocks_repr = pprint.pformat(blocks, compact=True, sort_dicts=False) fp.write(DICT_TEMPLATE.format_map({ 'dict_name': 'UNICODE_BLOCKS_VER_2_0_0', 'indented_items': '\n '.join( blocks_repr[1:-1].replace('\\\\', '\\').split('\n') ) })) for ver in versions[1:]: print(f" - Generate diff blocks map for version {ver} ...") base_blocks = blocks blocks = get_unicodedata_blocks(ver) blocks_removed = [k for k in base_blocks if k not in blocks] blocks_update = {k: v for k, v in blocks.items() if k not in base_blocks or base_blocks[k] != v} if blocks_removed: removed_repr = pprint.pformat(blocks_removed, compact=True) fp.write(LIST_TEMPLATE.format_map({ 'list_name': f"REMOVED_BLOCKS_VER_{ver.replace('.', '_')}", 'indented_items': '\n '.join(removed_repr[1:-1].split('\n')) })) if blocks_update: update_repr = pprint.pformat(blocks_update, compact=True, sort_dicts=False) fp.write(DICT_TEMPLATE.format_map({ 'dict_name': f"UPDATE_BLOCKS_VER_{ver.replace('.', '_')}", 'indented_items': '\n '.join( update_repr[1:-1].replace('\\\\', '\\').split('\n') ) })) if __name__ == '__main__': import argparse import datetime import pathlib import pprint from sys import maxunicode from unicodedata import category, unidata_version from urllib.request import urlopen from urllib.parse import urljoin description = ( "Generate Unicode codepoints modules. 
Both modules contain dictionaries " "with a compressed representation of the Unicode codepoints, suitable to " "be loaded by the elementpath library using UnicodeSubset class. Multiple " "versions of Unicode database are represented by additional codepoints in " "further dictionaries. For default the generated categories module contains " "the data for supported Python releases and pre-releases. For default the " "generated blocks module includes all Unicode versions (2.0.0+)." ) parser = argparse.ArgumentParser( description=description, usage="%(prog)s [options] dirpath" ) parser.add_argument('-v', '--version', dest='versions', type=version_number, default=[], action='append', help="generates codepoints for specific Unicode version") parser.add_argument('dirpath', type=str, help="directory path for generated modules") args = parser.parse_args() if not args.versions: categories_versions = DEFAULT_CATEGORIES_VERSIONS blocks_versions = list(reversed(UNICODE_VERSIONS)) else: categories_versions = args.versions = sorted(set(args.versions), reverse=False) blocks_versions = list(reversed(args.versions)) print("+++ Generate Unicode categories and blocks modules +++\n") print("Python Unicode data version: {}".format(unidata_version)) ### # Generate Unicode categories module print(f"\nGenerate Unicode Categories for {version_info(args.versions)}") filename = pathlib.Path(args.dirpath).joinpath('unicode_categories.py') if filename.is_file(): confirm = input("Overwrite existing module %r? [Y/Yes to confirm] " % str(filename)) else: confirm = 'Yes' if confirm.strip().upper() not in ('Y', 'YES'): print("\nSkip generation of Unicode categories module ...") else: generate_unicode_categories_module(filename, categories_versions) ### # Generate Unicode blocks module print(f"\nGenerate Unicode Blocks for {version_info(args.versions)}") filename = pathlib.Path(args.dirpath).joinpath('unicode_blocks.py') if filename.is_file(): confirm = input("Overwrite existing module %r? 
[Y/Yes to confirm] " % str(filename)) else: confirm = 'Yes' if confirm.strip().upper() not in ('Y', 'YES'): print("\nSkip generation of Unicode blocks module ...") else: generate_unicode_blocks_module(filename, blocks_versions) sissaschool-elementpath-d3688c7/setup.py000066400000000000000000000045351476131650400204500ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # Copyright (c), 2018-2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # from setuptools import setup, find_packages with open("README.rst") as readme: long_description = readme.read() setup( name='elementpath', version='4.8.0', packages=find_packages(include=['elementpath', 'elementpath.*']), package_data={ 'elementpath': ['py.typed'], 'elementpath.validators': ['analyze-string.xsd', 'schema-for-json.xsd'], }, author='Davide Brunato', author_email='brunato@sissa.it', url='https://github.com/sissaschool/elementpath', keywords=['XPath', 'XPath2', 'XPath3', 'XPath31', 'Pratt-parser', 'ElementTree', 'lxml'], license='MIT', license_file='LICENSE', description='XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml', long_description=long_description, python_requires='>=3.8', extras_require={ 'dev': ['tox', 'coverage', 'lxml', 'xmlschema>=3.3.2', 'Sphinx', 'memory-profiler', 'memray', 'flake8', 'mypy', 'lxml-stubs'] }, classifiers=[ 'Development Status :: 5 - Production/Stable', 'Intended Audience :: Developers', 'Intended Audience :: Information Technology', 'Intended Audience :: Science/Research', 'License :: OSI Approved :: MIT License', 'Operating System :: OS Independent', 'Programming Language :: Python', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3 :: Only', 'Programming Language :: Python :: 3.8', 'Programming 
Language :: Python :: 3.9', 'Programming Language :: Python :: 3.10', 'Programming Language :: Python :: 3.11', 'Programming Language :: Python :: 3.12', 'Programming Language :: Python :: 3.13', 'Programming Language :: Python :: 3.14', 'Programming Language :: Python :: Implementation :: CPython', 'Programming Language :: Python :: Implementation :: PyPy', 'Topic :: Software Development :: Libraries', 'Topic :: Text Processing :: Markup :: XML', ] ) sissaschool-elementpath-d3688c7/tests/000077500000000000000000000000001476131650400200715ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/tests/__init__.py000066400000000000000000000000001476131650400221700ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/tests/execute_w3c_tests.py000077500000000000000000001705321476131650400241160ustar00rootroot00000000000000#!/usr/bin/env python3 # # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Jelte Jansen # @author Davide Brunato # """ Test script for running W3C XPath tests on elementpath. This is a reworking of https://github.com/tjeb/elementpath_w3c_tests project that uses ElementTree for default and collapses the essential parts into only one module. 
""" import argparse import contextlib import datetime import decimal import re import json import html import math import os import sys import traceback from collections import OrderedDict from itertools import zip_longest from pathlib import Path from urllib.parse import urlsplit from xml.etree import ElementTree import lxml.etree import xmlschema from elementpath import ElementPathError, XPath2Parser, XPathContext, XPathNode, \ CommentNode, ProcessingInstructionNode, get_node_tree from elementpath.namespaces import XML_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE, get_expanded_name from elementpath.xpath_tokens import XPathFunction, XPathMap, XPathArray from elementpath.datatypes import AnyAtomicType from elementpath.sequence_types import is_sequence_type, match_sequence_type from elementpath.xpath31 import XPath31Parser from elementpath.xpath_nodes import EtreeElementNode PY38_PLUS = sys.version_info > (3, 8) XML_EXPANDED_PREFIX = f'{{{XML_NAMESPACE}}}' DEPENDENCY_TYPES = {'spec', 'feature', 'calendar', 'default-language', 'format-integer-sequence', 'language', 'limits', 'xml-version', 'xsd-version', 'unicode-version', 'unicode-normalization-form'} SKIP_TESTS = { 'fn-subsequence__cbcl-subsequence-010', 'fn-subsequence__cbcl-subsequence-011', 'fn-subsequence__cbcl-subsequence-012', 'fn-subsequence__cbcl-subsequence-013', 'fn-subsequence__cbcl-subsequence-014', 'prod-NameTest__NodeTest004', # Unsupported collations 'fn-compare__compare-010', 'fn-substring-after__fn-substring-after-24', 'fn-substring-before__fn-substring-before-24', 'fn-deep-equal__K-SeqDeepEqualFunc-57', 'fn-deep-equal__K-SeqDeepEqualFunc-56', # Unsupported language 'fn-format-integer__format-integer-032', 'fn-format-integer__format-integer-032-fr', 'fn-format-integer__format-integer-052', 'fn-format-integer__format-integer-065', # Processing-instructions (tests on env "auction") 'fn-local-name__fn-local-name-78', 'fn-name__fn-name-28', 'fn-string__fn-string-28', # Require XML 1.1 
'fn-codepoints-to-string__K-CodepointToStringFunc-8a', 'fn-codepoints-to-string__K-CodepointToStringFunc-11b', 'fn-codepoints-to-string__K-CodepointToStringFunc-12b', # Require unicode version "7.0" 'fn-lower-case__fn-lower-case-19', 'fn-upper-case__fn-upper-case-19', 'fn-matches.re__re00506', 'fn-matches.re__re00984', # Very large number fault (interpreter crashes or float rounding) 'op-to__RangeExpr-409d', 'fn-format-number__numberformat60a', 'fn-format-number__cbcl-fn-format-number-035', # For XQuery?? 'fn-deep-equal__K2-SeqDeepEqualFunc-43', # includes a '!' symbol # For XP30+ 'fn-root__K-NodeRootFunc-2', # includes a XPath 3.0 fn:generate-id() 'fn-codepoints-to-string__cbcl-codepoints-to-string-021', # Too long ... 'fn-serialize__serialize-xml-015b', # Do not raise, attribute is good 'fn-parse-xml-fragment__parse-xml-fragment-022-st', # conflict with parse-xml-fragment-022 'fn-for-each-pair__fn-for-each-pair-017', # Requires PI and comments parsing 'fn-function-lookup__fn-function-lookup-522', # xs:dateTimeStamp for XSD 1.1 only # Unsupported language (German) 'fn-format-date__format-date-de101', 'fn-format-date__format-date-de102', 'fn-format-date__format-date-de103', 'fn-format-date__format-date-de104', 'fn-format-date__format-date-de105', 'fn-format-date__format-date-de106', 'fn-format-date__format-date-de111', 'fn-format-date__format-date-de112', 'fn-format-date__format-date-de113', 'fn-format-date__format-date-de114', 'fn-format-date__format-date-de115', 'fn-format-date__format-date-de116', # Unicode FULLY-NORMALIZATION not supported in Python's unicodedata 'fn-normalize-unicode__cbcl-fn-normalize-unicode-001', 'fn-normalize-unicode__cbcl-fn-normalize-unicode-006', # 'เจมส์' does not match xs:NCName (maybe due to Python re module limitation) 'prod-CastExpr__K2-SeqExprCast-488', 'prod-CastExpr__K2-SeqExprCast-504', # TODO: unsupported for serialization 'fn-serialize__serialize-xml-110', # TODO: ElementNode serialization with params 
'fn-serialize__serialize-html-001b', # HTML 5 'fn-serialize__serialize-html-002b', # HTML 5 # IMHO incorrect tests 'fn-resolve-uri__fn-resolve-uri-9', # URI scheme names are lowercase 'fn-apply__fn-apply-13', # Error code should be err:FOAP0001 'fn-json-doc__json-doc-032', # 0 is not an instance of xs:double 'fn-json-doc__json-doc-033', # 0 (should be -0) is not an instance of xs:double 'fn-function-lookup__fn-function-lookup-764', # Error code should be FOQM0001 } SKIP_XP20 = { 'fn-subsequence__fn-subsequence-mix-args-026', # fn:tail is XP30+ } # Tests that can be run only with lxml.etree LXML_ONLY = { # parse of comments or PIs required 'fn-string__fn-string-30', 'prod-AxisStep__Axes003-4', 'prod-AxisStep__Axes006-4', 'prod-AxisStep__Axes033-4', 'prod-AxisStep__Axes037-2', 'prod-AxisStep__Axes046-2', 'prod-AxisStep__Axes049-2', 'prod-AxisStep__Axes058-2', 'prod-AxisStep__Axes058-3', 'prod-AxisStep__Axes061-1', 'prod-AxisStep__Axes061-2', 'prod-AxisStep__Axes064-2', 'prod-AxisStep__Axes064-3', 'prod-AxisStep__Axes067-2', 'prod-AxisStep__Axes067-3', 'prod-AxisStep__Axes073-1', 'prod-AxisStep__Axes073-2', 'prod-AxisStep__Axes076-4', 'prod-AxisStep__Axes079-4', 'fn-path__path007', 'fn-path__path009', 'fn-generate-id__generate-id-005', 'fn-parse-xml-fragment__parse-xml-fragment-010', # in-scope namespaces required 'prod-AxisStep__Axes118', 'prod-AxisStep__Axes120', 'prod-AxisStep__Axes126', 'fn-resolve-QName__fn-resolve-qname-26', 'fn-in-scope-prefixes__fn-in-scope-prefixes-21', 'fn-in-scope-prefixes__fn-in-scope-prefixes-22', 'fn-in-scope-prefixes__fn-in-scope-prefixes-24', 'fn-in-scope-prefixes__fn-in-scope-prefixes-25', 'fn-in-scope-prefixes__fn-in-scope-prefixes-26', 'fn-innermost__fn-innermost-017', 'fn-innermost__fn-innermost-018', 'fn-innermost__fn-innermost-019', 'fn-innermost__fn-innermost-020', 'fn-innermost__fn-innermost-021', 'fn-outermost__fn-outermost-017', 'fn-outermost__fn-outermost-018', 'fn-outermost__fn-outermost-019', 
'fn-outermost__fn-outermost-020', 'fn-outermost__fn-outermost-021', 'fn-outermost__fn-outermost-046', 'fn-local-name__fn-local-name-77', 'fn-local-name__fn-local-name-79', 'fn-name__fn-name-27', 'fn-name__fn-name-29', 'fn-string__fn-string-27', 'fn-format-number__numberformat87', 'fn-format-number__numberformat88', 'fn-path__path010', 'fn-path__path011', 'fn-path__path012', 'fn-path__path013', 'fn-function-lookup__fn-function-lookup-262', 'fn-generate-id__generate-id-007', 'fn-serialize__serialize-xml-012', 'prod-EQName__eqname-018', 'prod-EQName__eqname-023', 'prod-NamedFunctionRef__function-literal-262', 'fn-format-number__numberformat83', # XML declaration 'fn-serialize__serialize-xml-029b', 'fn-serialize__serialize-xml-030b', # require external ENTITY parsing 'fn-parse-xml__parse-xml-010', } USE_SCHEMA_FOR_JSON = { 'fn-json-to-xml__json-to-xml-016', 'fn-json-to-xml__json-to-xml-017', 'fn-json-to-xml__json-to-xml-037', 'fn-json-to-xml__json-to-xml-038', } xpath_parser = XPath2Parser ignore_specs = {'XQ10', 'XQ10+', 'XP30', 'XP30+', 'XQ30', 'XQ30+', 'XP31', 'XP31+', 'XQ31', 'XQ31+', 'XT30+'} QT3_NAMESPACE = "http://www.w3.org/2010/09/qt-fots-catalog" namespaces = {'': QT3_NAMESPACE} @contextlib.contextmanager def working_directory(dirpath): orig_wd = os.getcwd() os.chdir(dirpath) try: yield finally: os.chdir(orig_wd) def get_context_result(item): if isinstance(item, XPathNode): raise TypeError("Unexpected XPath node in external results") elif isinstance(item, (list, tuple)): return [get_context_result(x) for x in item] elif hasattr(item, 'tag'): if callable(item.tag): if item.tag.__name__ == 'Comment': return CommentNode(item) else: return ProcessingInstructionNode(item) elif not hasattr(item, 'getroot'): return item return get_node_tree(root=item) def is_equivalent(t1, t2): if t1 == t2 or html.unescape(t1) == html.unescape(t2): return True try: if decimal.Decimal(t1) != decimal.Decimal(t2): return False except (ValueError, decimal.DecimalException): return False 
else: return True def etree_is_equal(root1, root2, strict=True): for e1, e2 in zip_longest(root1.iter(), root2.iter()): if e1 is None or e2 is None: return False if e1.tail != e2.tail: if strict or e1.tail is None or e2.tail is None: return False if e1.tail.strip() != e2.tail.strip(): return False if callable(e1.tag) ^ callable(e2.tag): return False elif not callable(e1.tag): if e1.tag != e2.tag: return False if e1.attrib != e2.attrib: if strict: return False attrib1 = e1.attrib attrib2 = e2.attrib if len(attrib1) != len(attrib2): attrib1 = {k: v for k, v in attrib1.items() if not k.startswith(XML_EXPANDED_PREFIX)} attrib2 = {k: v for k, v in attrib2.items() if not k.startswith(XML_EXPANDED_PREFIX)} if len(attrib1) != len(attrib2): return False for (k1, v1), (k2, v2) in zip(attrib1.items(), attrib2.items()): if not is_equivalent(k1, k2) or not is_equivalent(v1, v2): return False if e1.text != e2.text: if strict or e1.text is None or e2.text is None: return False if e1.text.strip() != e2.text.strip(): if not is_equivalent(e1.text, e2.text): return False else: return True class ExecutionError(Exception): """Common class for W3C XPath tests execution script.""" class ParseError(ExecutionError): """Other error generated by XPath expression parsing and static evaluation.""" class EvaluateError(ExecutionError): """Other error generated by XPath token evaluation with dynamic context.""" class Schema(object): """Represents an XSD schema used in XML environment settings.""" def __init__(self, elem): assert elem.tag == '{%s}schema' % QT3_NAMESPACE self.uri = elem.attrib.get('uri') self.file = elem.attrib.get('file') try: self.description = elem.find('description', namespaces).text except AttributeError: self.description = '' self.filepath = self.file and os.path.abspath(self.file) def __repr__(self): return '%s(uri=%r, file=%s)' % (self.__class__.__name__, self.uri, self.file) class Source(object): """Represents an XML source file as used in environment settings.""" 
namespaces = None def __init__(self, elem, use_lxml=False): assert elem.tag == '{%s}source' % QT3_NAMESPACE self.file = elem.attrib['file'] self.role = elem.attrib.get('role', '') self.uri = elem.attrib.get('uri', self.file) if not urlsplit(self.uri).scheme: self.uri = Path(self.uri).absolute().as_uri() self.key = self.role or self.file try: self.description = elem.find('description', namespaces).text except AttributeError: self.description = '' if use_lxml: iterparse = lxml.etree.iterparse parser = lxml.etree.XMLParser(collect_ids=False) try: self.xml = lxml.etree.parse(self.file, parser=parser) except lxml.etree.XMLSyntaxError: self.xml = None else: iterparse = ElementTree.iterparse if PY38_PLUS: tree_builder = ElementTree.TreeBuilder(insert_comments=True, insert_pis=True) parser = ElementTree.XMLParser(target=tree_builder) else: parser = None try: self.xml = ElementTree.parse(self.file, parser=parser) except ElementTree.ParseError: self.xml = None try: self.namespaces = {} dup_index = 1 for _, (prefix, uri) in iterparse(self.file, events=('start-ns',)): if prefix not in self.namespaces: self.namespaces[prefix] = uri elif prefix: self.namespaces[f'{prefix}{dup_index}'] = uri dup_index += 1 else: self.namespaces[f'default{dup_index}'] = uri dup_index += 1 except (ElementTree.ParseError, lxml.etree.XMLSyntaxError): pass def __repr__(self): return '%s(file=%r)' % (self.__class__.__name__, self.file) class Resource(object): """Represents a remote resource used in environment settings.""" def __init__(self, elem, use_lxml=False): assert elem.tag == '{%s}resource' % QT3_NAMESPACE self.uri = elem.attrib['uri'] self.file = elem.attrib['file'] self.file_uri = f'file://{os.getcwd()}/{self.file}' self.media_type = elem.get('media-type') self.encoding = elem.get('encoding') class Collection(object): """Represents a collection of source files as used in XML environment settings.""" def __init__(self, elem, use_lxml=False): assert elem.tag == '{%s}collection' % QT3_NAMESPACE 
self.uri = elem.attrib.get('uri') self.query = elem.find('query', namespaces) # Not used (for XQuery) self.sources = [Source(e, use_lxml) for e in elem.iterfind('source', namespaces)] def __repr__(self): return '%s(uri=%r)' % (self.__class__.__name__, self.uri) class Environment(object): """ The XML environment definition for a test case. :param elem: the XML Element that contains the environment definition. :param use_lxml: use lxml.etree for loading XML sources. """ collation = None default_collation = False collection = None schema = None static_base_uri = None decimal_formats = None def __init__(self, elem, use_lxml=False): assert elem.tag == '{%s}environment' % QT3_NAMESPACE self.name = elem.get('name', 'anonymous') self.namespaces = { namespace.attrib['prefix']: namespace.attrib['uri'] for namespace in elem.iterfind('namespace', namespaces) } self.decimal_formats = {} for child in elem.iterfind('decimal-format', namespaces): name = child.get('name') if name is not None and ':' in name: if use_lxml: name = get_expanded_name(name, child.nsmap) else: try: name = get_expanded_name(name, self.namespaces) except KeyError: pass self.decimal_formats[name] = child.attrib child = elem.find('collation', namespaces) if child is not None: self.collation = child.get('uri') self.default_collation = child.get('default') == 'true' child = elem.find('collection', namespaces) if child is not None: self.collection = Collection(child, use_lxml) child = elem.find('schema', namespaces) if child is not None: self.schema = Schema(child) child = elem.find('static-base-uri', namespaces) if child is not None: self.static_base_uri = child.get('uri') self.params = [e.attrib for e in elem.iterfind('param', namespaces)] self.sources = {} for child in elem.iterfind('source', namespaces): source = Source(child, use_lxml) self.sources[source.key] = source self.resources = {} for child in elem.iterfind('resource', namespaces): resource = Resource(child, use_lxml) self.resources[resource.uri] = 
resource def __repr__(self): return '%s(name=%r)' % (self.__class__.__name__, self.name) def __str__(self): children = [] for prefix, uri in self.namespaces.items(): children.append('<namespace prefix="{}" uri="{}"/>'.format(prefix, uri)) if self.schema is not None: children.append('<schema uri="{}" file="{}"/>'.format( self.schema.uri or '', self.schema.file or '' )) for role, source in self.sources.items(): children.append('<source role="{}" uri="{}" file="{}"/>'.format( role, source.uri or '', source.file )) return '<environment name="{}">\n {}\n</environment>'.format( self.name, '\n '.join(children) ) def get_namespaces(self): namespaces_ = self.namespaces.copy() for source in self.sources.values(): if source.namespaces: for pfx, uri in source.namespaces.items(): if pfx not in namespaces_: namespaces_[pfx] = uri return namespaces_ class TestSet(object): """ Represents a test-set as read from the catalog file and the test-set XML file itself. :param elem: the XML Element that contains the test-set definitions. :param pattern: the regex pattern for selecting test-cases to load. :param use_lxml: use lxml.etree for loading environment XML sources. :param environments: the global environments. 
""" def __init__(self, elem, pattern, use_lxml=False, environments=None): assert elem.tag == '{%s}test-set' % QT3_NAMESPACE self.name = elem.attrib['name'] self.file = elem.attrib['file'] self.environments = {} if environments is None else environments.copy() self.test_cases = [] self.specs = [] self.features = [] self.xsd_version = None self.use_lxml = use_lxml self.etree = lxml.etree if use_lxml else ElementTree full_path = os.path.abspath(self.file) filename = os.path.basename(full_path) self.workdir = os.path.dirname(full_path) with working_directory(self.workdir): xml_root = self.etree.parse(filename).getroot() self.description = xml_root.find('description', namespaces).text for child in xml_root.findall('dependency', namespaces): dep_type = child.attrib['type'] value = child.attrib['value'] if dep_type == 'spec': self.specs.extend(value.split(' ')) elif dep_type == 'feature': if child.get('satisfied', 'true') in ('true', '1'): self.features.append(value) elif dep_type == 'xsd-version': self.xsd_version = value else: print("unexpected dependency type %s for test-set %r" % (dep_type, self.name)) for child in xml_root.findall('environment', namespaces): environment = Environment(child, use_lxml) self.environments[environment.name] = environment test_case_template = self.name + '__%s' for child in xml_root.findall('test-case', namespaces): if pattern.search(test_case_template % child.attrib['name']) is not None: self.test_cases.append(TestCase(child, self, use_lxml)) def __repr__(self): return '%s(name=%r)' % (self.__class__.__name__, self.name) class TestCase(object): """ Represents a test case as read from a test-set file. :param elem: the XML Element that contains the test-case definition. :param test_set: the test-set that the test-case belongs to. :param use_lxml: use lxml.etree for loading environment XML sources. 
""" # Single value dependencies parser = None calendar = None default_language = None format_integer_sequence = None language = None limits = None unicode_version = None unicode_normalization_form = None xml_version = None def __init__(self, elem, test_set, use_lxml=False): assert elem.tag == '{%s}test-case' % QT3_NAMESPACE self.test_set = test_set self.xsd_version = test_set.xsd_version self.features = [feature for feature in test_set.features] self.use_lxml = use_lxml self.etree = lxml.etree if use_lxml else ElementTree self.name = test_set.name + "__" + elem.attrib['name'] self.description = elem.find('description', namespaces).text self.test = elem.find('test', namespaces).text result_child = elem.find('result', namespaces).find("*") self.result = Result(result_child, test_case=self, use_lxml=use_lxml) self.environment_ref = None self.environment = None self.specs = [] for child in elem.findall('dependency', namespaces): dep_type = child.attrib['type'] value = child.attrib['value'] if dep_type == 'spec': self.specs.extend(value.split(' ')) elif dep_type == 'feature': if child.get('satisfied') == 'false': try: self.features.remove(value) except ValueError: pass else: self.features.append(value) elif dep_type in DEPENDENCY_TYPES: setattr(self, dep_type.replace('-', '_'), value) else: print("unexpected dependency type %s for test-case %r" % (dep_type, self.name)) child = elem.find('environment', namespaces) if child is not None: if 'ref' in child.attrib: self.environment_ref = child.attrib['ref'] else: self.environment = Environment(child, use_lxml) def __repr__(self): return '%s(name=%r)' % (self.__class__.__name__, self.name) def __str__(self): children = [ '{}'.format(self.description or ''), '{}'.format(self.test) if self.test else '', '\n {}\n'.format(self.result), ] if self.environment_ref: children.append(''.format(self.environment_ref)) for dep_type in sorted(DEPENDENCY_TYPES): if dep_type == 'spec': if self.specs: children.extend(''.format(x) for x in 
self.specs) elif dep_type == 'feature': if self.features: children.extend(''.format(x) for x in self.features) else: value = getattr(self, dep_type.replace('-', '_')) if value is not None: children.append('<{} value="{}"/>'.format(dep_type, value)) return '\n {}\n'.format( self.name, self.test_set_file, '\n '.join('\n'.join(children).split('\n')), ) @property def test_set_file(self): return self.test_set.file def get_environment(self): env_ref = self.environment_ref if env_ref: try: return self.test_set.environments[env_ref] except KeyError: msg = "Unknown environment %s in test case %s" raise ExecutionError(msg % (env_ref, self.name)) from None elif self.environment: return self.environment def run(self, verbose=1): if verbose > 4: print("\n*** Execute test case {!r} ***".format(self.name)) print(str(self)) print() return self.result.validate(verbose) def run_xpath_test(self, verbose=1, with_context=True, with_xpath_nodes=False): """ Helper function to parse and evaluate tests with elementpath. 
Errors are not printed here: an ElementPathError is re-raised as is, any other exception is wrapped in ParseError (parsing) or EvaluateError (evaluation). """ environment = self.get_environment() # Create the parser instance (static context) if environment is None: test_namespaces = static_base_uri = schema_proxy = default_collation = None else: test_namespaces = environment.get_namespaces() static_base_uri = environment.static_base_uri default_collation = None if environment.collation is not None: if environment.default_collation: default_collation = environment.collation if environment.schema is None or not environment.schema.filepath: if self.name in USE_SCHEMA_FOR_JSON: xsd_path = Path(__file__).parent.joinpath('resources/schema-for-json.xsd') schema = xmlschema.XMLSchema(xsd_path) schema_proxy = schema.xpath_proxy else: schema_proxy = None else: if verbose > 2: print("Schema %r required for test %r" % (environment.schema.file, self.name)) schema = xmlschema.XMLSchema(environment.schema.filepath) schema_proxy = schema.xpath_proxy if static_base_uri is None: if self.name == "fn-parse-xml__parse-xml-007": # workaround: static-base-uri() must return AnyURI('') for this case static_base_uri = '' else: base_uri = os.path.dirname(os.path.abspath(self.test_set_file)) if os.path.isdir(base_uri): static_base_uri = f'{Path(base_uri).as_uri()}/' elif environment and static_base_uri in environment.resources: static_base_uri = environment.resources[static_base_uri].file_uri elif static_base_uri == 'http://www.w3.org/fots/unparsed-text/': static_base_uri = f'file://{os.getcwd()}/fn/unparsed-text/' kwargs = dict( namespaces=test_namespaces, xsd_version=self.xsd_version, schema=schema_proxy, base_uri=static_base_uri, compatibility_mode='xpath-1.0-compatibility' in self.features, default_collation=default_collation, ) if environment is not None and xpath_parser.version >= '3.0': if environment.decimal_formats: kwargs['decimal_formats'] = environment.decimal_formats kwargs['defuse_xml'] = False self.parser = xpath_parser(**kwargs) if self.test is None: 
xpath_expression = None else: xpath_expression = self.test if environment: for uri, resource in environment.resources.items(): if uri in xpath_expression: xpath_expression = xpath_expression.replace(uri, resource.file_uri) try: root_node = self.parser.parse(xpath_expression) # static evaluation except Exception as err: if isinstance(err, ElementPathError): raise raise ParseError(err) # Create the dynamic context if not with_context: context = None elif environment is None: context = XPathContext( root=self.etree.XML(""), namespaces=test_namespaces, schema=self.parser.schema, timezone='Z', default_language=self.default_language, default_calendar=self.calendar, resource_collections={ os.path.abspath(self.test_set_file): [self.test_set_file] } ) else: kwargs = {'timezone': 'Z'} variables = {} documents = {} if '.' in environment.sources: root = environment.sources['.'].xml root_uri = environment.sources['.'].uri else: root = self.etree.XML("") root_uri = None if any(k.startswith('$') for k in environment.sources): variables.update( (k[1:], v.xml) for k, v in environment.sources.items() if k.startswith('$') ) for param in environment.params: name = param['name'] value = xpath_parser().parse(param['select']).evaluate() variables[name] = value for source in environment.sources.values(): documents[source.uri] = source.xml if environment.collection is not None: uri = environment.collection.uri collection = [source.xml for source in environment.collection.sources] if uri is not None: kwargs['collections'] = {uri: collection} if collection: kwargs['default_collection'] = collection if 'non_empty_sequence_collection' in self.features: kwargs['default_resource_collection'] = uri if test_namespaces: kwargs['namespaces'] = test_namespaces if self.parser.schema: kwargs['schema'] = self.parser.schema if variables: kwargs['variables'] = variables if documents: kwargs['documents'] = documents if self.default_language: kwargs['default_language'] = self.default_language if 
self.calendar: kwargs['default_calendar'] = self.calendar context = XPathContext(root=root, uri=root_uri, **kwargs) try: if with_xpath_nodes: result = root_node.evaluate(context) else: result = root_node.get_results(context) except Exception as err: if isinstance(err, ElementPathError): raise raise EvaluateError(err) if verbose > 4: print("Result of evaluation: {!r}\n".format(result)) return result class Result(object): """ Class for validating the result of a test case. Result instances can be nested for multiple validation options. There are several types of result validators available: * all-of * any-of * assert * assert-count * assert-deep-eq * assert-empty * assert-eq * assert-false * assert-permutation * assert-serialization-error * assert-string-value * assert-true * assert-type * assert-xml * error * not * serialization-matches :param elem: the XML Element that contains the test-case definition. :param test_case: the test-case that the result validator belongs to. """ # Validation helper tokens string_token = XPath31Parser().parse('fn:string($result)') string_join_token = XPath31Parser().parse('fn:string-join($result, " ")') def __init__(self, elem, test_case, use_lxml=False): self.test_case = test_case self.use_lxml = use_lxml self.etree = lxml.etree if use_lxml else ElementTree self.type = elem.tag.split('}')[1] self.value = elem.text self.attrib = {k: v for k, v in elem.attrib.items()} if self.value is None and self.type == 'assert-xml': self.attrib['file'] = os.path.abspath(self.attrib['file']) self.children = [Result(child, test_case) for child in elem.findall('*')] self.validate = getattr(self, '%s_validator' % self.type.replace("-", "_")) def __repr__(self): return '%s(type=%r)' % (self.__class__.__name__, self.type) def __str__(self): attrib = ' '.join('{}="{}"'.format(k, v) for k, v in self.attrib.items()) if self.children: return '<{0} {1}>{2}{3}\n'.format( self.type, attrib, self.value if self.value is not None else '', '\n '.join(str(child) for 
child in self.children), ) elif self.value is not None: return '<{0} {1}>{2}'.format(self.type, attrib, self.value) else: return '<{} {}/>'.format(self.type, attrib) def report_failure(self, verbose=1, **results): if verbose <= 1: return print(f'Fail for test case {self.test_case.name!r}') print(f'Result failed: {self}') if verbose < 4: print(f'XPath expression: {self.test_case.test.strip()}') else: print() print(self.test_case) if results: print() print_traceback = False max_key = max(len(k) for k in results) for k, v in results.items(): if isinstance(v, Exception): v = "Unexpected {!r}: {}".format(type(v), v) if verbose >= 3: print_traceback = True print(' {}: {}{!r}'.format(k, ' ' * (max_key - len(k)), v)) if print_traceback: print() traceback.print_exc() print() def all_of_validator(self, verbose=1): """Valid if all child result validators are valid.""" assert self.children result = True for child in self.children: if not child.validate(verbose): result = False return result def any_of_validator(self, verbose=1): """Valid if any child result validator is valid.""" assert self.children result = False for child in self.children: if child.validate(): result = True if not result and verbose > 1: for child in self.children: child.validate(verbose) return result def not_validator(self, verbose=1): """Valid if the child result validator is not valid.""" assert len(self.children) == 1 result = not self.children[0].validate() if not result and verbose > 1: self.children[0].validate(verbose) if not result: self.report_failure(verbose, expected=False, result=True) return result def assert_eq_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if isinstance(result, list) and len(result) == 1: result = result[0] parser = xpath_parser(xsd_version=self.test_case.xsd_version) root_node = parser.parse(self.value) context = 
XPathContext(root=self.etree.XML("")) expected_result = root_node.evaluate(context) try: if expected_result == result: return True elif isinstance(expected_result, decimal.Decimal) and isinstance(result, float): if float(expected_result) == result: return True elif decimal.Decimal(expected_result) == decimal.Decimal(result): return True except (TypeError, ValueError, decimal.DecimalException): pass self.report_failure(verbose, expected=expected_result, result=result) return False def assert_type_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if isinstance(result, list) and len(result) == 1: result = result[0] parser = xpath_parser(namespaces={'j': XPATH_FUNCTIONS_NAMESPACE}) if self.value == 'function(*)': type_check = isinstance(result, XPathFunction) elif self.value == 'array(*)': type_check = isinstance(result, XPathArray) elif self.value == 'map(*)': type_check = isinstance(result, XPathMap) elif not is_sequence_type(self.value, parser): msg = " test-case {}: {!r} is not a valid sequence type" print(msg.format(self.test_case.name, self.value)) type_check = False else: context_result = get_context_result(result) type_check = match_sequence_type(context_result, self.value, parser) if not type_check: self.report_failure( verbose, expected=self.value, result=result, result_type=type(result) ) return type_check def assert_string_value_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False context = XPathContext(self.etree.XML(""), variables={'result': result}) if isinstance(result, list): value = self.string_join_token.evaluate(context) else: value = self.string_token.evaluate(context) if self.attrib.get('normalize-space'): expected = re.sub(r'\s+', ' ', self.value).strip() value = ' 
'.join(x.strip() for x in value.split('\n')).strip() else: expected = self.value if not value: if expected is None: return True elif value == expected: return True elif isinstance(expected, str): # workaround for typos in some expected values if expected.strip() == value: return True elif expected.replace('v ;', 'v;') == value: return True if value and ' ' not in value: try: dv = decimal.Decimal(value) if math.isclose(dv, decimal.Decimal(expected), rel_tol=1E-7, abs_tol=0.0): return True except decimal.DecimalException: pass self.report_failure( verbose, expected=expected, string_value=value, xpath_result=result ) return False def error_validator(self, verbose=1): code = self.attrib.get('code', '*').strip() err_traceback = '' try: self.test_case.run_xpath_test(verbose, with_context=code != 'XPDY0002') except ElementPathError as err: if code == '*' or code in str(err): return True if verbose > 3: err_traceback = ''.join(traceback.format_exception(None, err, err.__traceback__)) reason = "Unexpected error {!r}: {}".format(type(err), str(err)) except (ParseError, EvaluateError) as err: if verbose > 3: err_traceback = ''.join(traceback.format_exception(None, err, err.__traceback__)) reason = "Not an elementpath error {!r}: {}".format(type(err), str(err)) else: reason = "Error not raised" self.report_failure(verbose, reason=reason, expected_code=code) if err_traceback: print(err_traceback) return False def assert_true_validator(self, verbose=1): """Valid if the result is `True`.""" try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False else: if result is True or isinstance(result, list) and result and result[0] is True: return True self.report_failure(verbose) return False def assert_false_validator(self, verbose=1): """Valid if the result is `False`.""" try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as 
err: self.report_failure(verbose, error=err) return False else: if result is False or isinstance(result, list) and result and result[0] is False: return True self.report_failure(verbose) return False def assert_count_validator(self, verbose=1): """Valid if the number of items of the result matches.""" try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if isinstance(result, (AnyAtomicType, XPathArray, XPathMap)): length = 1 else: try: length = len(result) except TypeError as err: self.report_failure(verbose, error=err) return False if int(self.value) == length: return True self.report_failure( verbose, expected=int(self.value), value=length, xpath_result=result ) return False def assert_validator(self, verbose=1): """ Assert validator contains an XPath expression whose value must be true. The expression may use the variable $result, which is the result of the original test. 
""" try: result = self.test_case.run_xpath_test(verbose, with_xpath_nodes=True) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False variables = {'result': result} root_node = self.test_case.parser.parse(self.value) context = XPathContext( root=self.etree.XML(""), variables=variables ) if root_node.boolean_value(root_node.evaluate(context)) is True: return True self.report_failure(verbose) return False def assert_deep_eq_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if isinstance(result, list) and len(result) == 1: result = result[0] expression = "fn:deep-equal($result, (%s))" % self.value variables = {'result': result} parser = XPath31Parser(xsd_version=self.test_case.xsd_version) root_node = parser.parse(expression) context = XPathContext(root=self.etree.XML(""), variables=variables) if root_node.evaluate(context) is True: return True self.report_failure(verbose, expected=self.value, result=result) return False def assert_empty_validator(self, verbose=1): """Valid if the result is empty.""" try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False else: if result is None or result == '' or result == [] or result == ['']: return True self.report_failure(verbose, result=result) return False def assert_permutation_validator(self, verbose=1): try: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if not isinstance(result, list): result = [result] expected = xpath_parser().parse(self.value).evaluate() if not isinstance(expected, list): expected = [expected] if set(expected) == set(result): return True if len(expected) == len(result): _expected = 
set(expected) for value in result: if value in _expected: _expected.remove(value) continue elif not isinstance(value, (float, decimal.Decimal)): self.report_failure(verbose, result=result, expected=expected) return False dv = decimal.Decimal(value) for ev in _expected: if not isinstance(ev, (float, decimal.Decimal)): continue elif math.isnan(ev) and math.isnan(dv): _expected.remove(ev) break elif math.isclose(dv, decimal.Decimal(ev), rel_tol=1E-7, abs_tol=0.0): _expected.remove(ev) break else: self.report_failure(verbose, result=result, expected=expected) return False return True self.report_failure(verbose, result=result, expected=expected) return False def assert_serialization_error_validator(self, verbose=1): # TODO: this currently succeeds on any error try: self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError): return True else: return False def assert_xml_validator(self, verbose=1): try: if self.test_case.test_set.name == 'fn-parse-xml': with working_directory(self.test_case.test_set.workdir): result = self.test_case.run_xpath_test(verbose) else: result = self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError) as err: self.report_failure(verbose, error=err) return False if result is None: return False if self.use_lxml: fromstring = lxml.etree.fromstring tostring = lxml.etree.tostring else: fromstring = ElementTree.fromstring tostring = ElementTree.tostring environment = self.test_case.get_environment() if environment is not None: for source in environment.sources.values(): if source.namespaces: for prefix, uri in source.namespaces.items(): ElementTree.register_namespace(prefix, uri) for prefix, uri in environment.namespaces.items(): ElementTree.register_namespace(prefix, uri) else: for prefix, uri in xpath_parser.DEFAULT_NAMESPACES.items(): ElementTree.register_namespace(prefix, uri) if self.value is not None: expected = self.value else: with open(self.attrib['file']) as fp: expected = 
fp.read() if isinstance(result, list): parts = [] for item in result: if isinstance(item, EtreeElementNode): tail, item.elem.tail = item.elem.tail, None string_value = tostring(item.elem) # type: ignore parts.append(string_value.decode('utf-8').strip()) item.elem.tail = tail elif isinstance(item, XPathNode): parts.append(str(item.value)) elif hasattr(item, 'tag'): tail, item.tail = item.tail, None parts.append(tostring(item).decode('utf-8').strip()) item.tail = tail elif hasattr(item, 'getroot'): parts.append(tostring(item.getroot()).decode('utf-8').strip()) else: parts.append(str(item)) xml_str = ''.join(parts) else: try: root = result.getroot() except AttributeError: root = result xml_str = tostring(root).decode('utf-8').strip() # Remove character data from result if expected result is serialized if '\n' not in expected: xml_str = '>'.join(s.lstrip() for s in xml_str.split('>\n')) # Strip the tail from serialized result if '>' in xml_str: tail_pos = xml_str.rindex('>') + 1 if tail_pos < len(xml_str): xml_str = xml_str[:tail_pos] if xml_str == expected or xml_str.replace(' />', '/>') == expected: return True # 2nd tentative (expected result from a serialization or comparing trees) try: if xml_str == tostring(fromstring(expected)).decode('utf-8').strip(): return True if etree_is_equal(fromstring(xml_str), fromstring(expected), strict=False): return True except (ElementTree.ParseError, lxml.etree.ParseError): # invalid XML data (maybe empty or concatenation of XML elements) # Last try removing xmlns registrations xmlns_pattern = re.compile(r'\sxmlns[^"]+"[^"]+"') expected_xmlns = xmlns_pattern.findall(expected) if any(xmlns not in expected_xmlns for xmlns in xmlns_pattern.findall(xml_str)): pass elif xmlns_pattern.sub('', xml_str) == xmlns_pattern.sub('', expected): return True self.report_failure(verbose, result=xml_str, expected=self.value or self.attrib['file']) return False def serialization_matches_validator(self, verbose=1): try: result = 
self.test_case.run_xpath_test(verbose) except (ElementPathError, ParseError, EvaluateError): return False regex = re.compile(self.value) return regex.match(result) is not None def main(): global xpath_parser parser = argparse.ArgumentParser() parser.add_argument('catalog', metavar='CATALOG_FILE', help='the path to the main index file of test suite (catalog.xml)') parser.add_argument('pattern', nargs='?', default='.*', metavar='PATTERN', help='run only test cases whose name matches a regex pattern') parser.add_argument('--xpath', metavar='XPATH_EXPR', help="run only test cases that have a specific XPath expression") parser.add_argument('-i', dest='ignore_case', action='store_true', default=False, help="ignore character case for regex pattern matching") parser.add_argument('--xp30', action='store_true', default=False, help="test XPath 3.0 parser") parser.add_argument('--xp31', action='store_true', default=False, help="test XPath 3.1 parser") parser.add_argument('-l', '--lxml', dest='use_lxml', action='store_true', default=False, help="use lxml.etree for environment sources (default is ElementTree)") parser.add_argument('-c', dest='show_test_case', action='store_true', default=False, help="show test case information before execution") parser.add_argument('-v', dest='verbose', action='count', default=1, help='increase verbosity: one option to show unexpected errors, ' 'two to also show unmatched error codes, three for debug') parser.add_argument('-q', '--quiet', action='store_true', default=False, help="run without printing steps or errors") parser.add_argument('-r', dest='report', metavar='REPORT_FILE', help="write a report (JSON format) to the given file") args = parser.parse_args() report = OrderedDict() report["summary"] = OrderedDict() report['other_failures'] = [] report['unknown'] = [] report['failed'] = [] report['success'] = [] catalog_file = os.path.abspath(args.catalog) pattern = re.compile(args.pattern, flags=re.IGNORECASE if args.ignore_case else 0) etree = 
lxml.etree if args.use_lxml else ElementTree if not args.quiet: verbose = args.verbose elif args.verbose > 1: print("Error: quiet and verbose options are mutually exclusive") sys.exit(1) else: verbose = 0 if not os.path.isfile(catalog_file): print("Error: catalog file %s does not exist" % args.catalog) sys.exit(1) start_time = datetime.datetime.now() if args.xp31: from elementpath.xpath31 import XPath31Parser xpath_parser = XPath31Parser Result.parser = xpath_parser() ignore_specs.remove('XP30+') ignore_specs.remove('XP31') ignore_specs.remove('XP31+') ignore_specs.add('XP20') elif args.xp30: from elementpath.xpath30 import XPath30Parser xpath_parser = XPath30Parser Result.parser = xpath_parser() ignore_specs.remove('XP30') ignore_specs.remove('XP30+') ignore_specs.add('XP20') with working_directory(dirpath=os.path.dirname(catalog_file)): catalog_xml = etree.parse(catalog_file) environments = {} for child in catalog_xml.getroot().iterfind("environment", namespaces): environment = Environment(child, args.use_lxml) environments[environment.name] = environment test_sets = {} for child in catalog_xml.getroot().iterfind("test-set", namespaces): test_set = TestSet(child, pattern, args.use_lxml, environments) test_sets[test_set.name] = test_set count_read = 0 count_skip = 0 count_run = 0 count_success = 0 count_failed = 0 count_unknown = 0 count_other_failures = 0 for test_set in test_sets.values(): # ignore by specs of test_set ignore_all_in_test_set = test_set.specs and all( dep in ignore_specs for dep in test_set.specs ) for test_case in test_set.test_cases: count_read += 1 if ignore_all_in_test_set: count_skip += 1 continue # ignore test cases for XML version 1.1 (not yet supported by Python's libraries) if test_case.xml_version == '1.1': count_skip += 1 continue # ignore by specs of test_case if test_case.specs and all(dep in ignore_specs for dep in test_case.specs): count_skip += 1 continue # ignore tests that rely on high level of support for uca collation 
semantics if 'advanced-uca-fallback' in test_case.features: count_skip += 1 continue # ignore tests that require an XQuery processor available if 'fn-load-xquery-module' in test_case.features: count_skip += 1 continue # ignore tests that require an XSLT processor available if 'fn-transform-XSLT' in test_case.features: count_skip += 1 continue # ignore tests that require an XSLT 3.0 processor available if 'fn-transform-XSLT30' in test_case.features: count_skip += 1 continue # ignore tests that rely on DTD parsing (TODO with lxml or a custom parser) if 'infoset-dtd' in test_case.features \ or test_case.environment_ref == 'id-idref-dtd': count_skip += 1 continue # ignore cases where a directory is used as collection uri (not supported # feature, only the case fn-collection__collection-010) if 'directory-as-collection-uri' in test_case.features: count_skip += 1 continue # ignore tests that rely on XQuery 1.0/XPath 2.0 static-typing enforcement if 'staticTyping' in test_case.test_set.features \ or 'staticTyping' in test_case.features: count_skip += 1 continue # ignore tests that rely on processing-instructions and comments if test_case.environment_ref == 'bib2': count_skip += 1 continue # Other test cases to skip for technical limitations if test_case.name in SKIP_TESTS: count_skip += 1 continue if test_case.name in SKIP_XP20 and not args.xp30 and not args.xp31: count_skip += 1 continue if not args.use_lxml and test_case.name in LXML_ONLY: count_skip += 1 continue if args.xpath and test_case.test != args.xpath: count_skip += 1 continue if args.xp30 and not args.xp31 and test_case.test: if 'parse-json' in test_case.test: count_skip += 1 continue elif 'map {' in test_case.test: count_skip += 1 continue count_run += 1 if args.show_test_case: print(f"Run test case {test_case.name!r}", flush=True) elif verbose == 1: print('.', end='', flush=True) try: case_result = test_case.run(verbose) if case_result is True: if args.report: report['success'].append(test_case.name) 
count_success += 1 elif case_result is False: if args.report: report['failed'].append(test_case.name) count_failed += 1 if verbose == 1: print('F', end='', flush=True) else: if args.report: report['unknown'].append(test_case.name) count_unknown += 1 if verbose == 1: print('U', end='', flush=True) except Exception as err: if verbose == 1: print('E', end='', flush=True) elif verbose: print("\nUnexpected failure for test %r" % test_case.name) print(type(err), str(err), flush=True) if verbose >= 4: traceback.print_exc() if args.report: report['other_failures'].append(test_case.name) count_other_failures += 1 elapsed_time = (datetime.datetime.now() - start_time).seconds print("\n*** Totals of W3C XPath tests execution ***\n") print(f"Total elapsed time: {elapsed_time}s\n") print("%d test cases read" % count_read) print("%d test cases skipped" % count_skip) print("%d test cases run\n" % count_run) print(" %d success" % count_success) print(" %d failed" % count_failed) print(" %d unknown" % count_unknown) print(" %d other failures" % count_other_failures) if args.report: report['summary']['read'] = count_read report['summary']['skipped'] = count_skip report['summary']['run'] = count_run report['summary']['success'] = count_success report['summary']['failed'] = count_failed report['summary']['unknown'] = count_unknown report['summary']['other_failures'] = count_other_failures with open(args.report, 'w') as outfile: outfile.write(json.dumps(report, indent=2)) if __name__ == '__main__': sys.exit(main()) sissaschool-elementpath-d3688c7/tests/memory_profiling.py000066400000000000000000000023501476131650400240240ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # # flake8: noqa from memory_profiler import profile # noinspection PyUnresolvedReferences @profile(precision=3) def elementpath_memory_usage(): # Memory-relevant standard library imports import pathlib import decimal import calendar import xml.etree.ElementTree import unicodedata # elementpath imports # # Note: comment out all subpackage imports in elementpath/__init__.py # to highlight the memory consumption of each subpackage. # import elementpath import elementpath.regex import elementpath.datatypes import elementpath.xpath_nodes import elementpath.xpath_context import elementpath.xpath_tokens import elementpath.xpath1 import elementpath.xpath2 # Optional elementpath imports import elementpath.xpath30 import elementpath.xpath31 if __name__ == '__main__': elementpath_memory_usage() sissaschool-elementpath-d3688c7/tests/mypy_tests/000077500000000000000000000000001476131650400223115ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/tests/mypy_tests/advanced.py000077500000000000000000000017671476131650400244460ustar00rootroot00000000000000#!/usr/bin/env python def main() -> None: from io import StringIO from xml.etree import ElementTree from elementpath import XPath2Parser, XPathToken, XPathContext, DocumentNode parser = XPath2Parser() token = parser.parse('/root/(: comment :) child[@attr]') assert isinstance(token, XPathToken) assert token.tree == '(/ (/ (root)) ([ (child) (@ (attr))))' assert token.source == '/root/child[@attr]' root = ElementTree.XML('') context = XPathContext(root) value = token.evaluate(context) print(value) token = parser.parse('concat("foo", " ", "bar")') assert context.root is not None and token.evaluate() == 'foo bar' doc = ElementTree.parse(StringIO('')) context = XPathContext(doc) # error? 
assert isinstance(context.root, DocumentNode) assert context.document is context.root assert context.item is context.root if __name__ == '__main__': main() sissaschool-elementpath-d3688c7/tests/mypy_tests/protocols.py000077500000000000000000000067531476131650400247250ustar00rootroot00000000000000#!/usr/bin/env python def main() -> None: import xml.etree.ElementTree as ElementTree import lxml.etree as etree from typing import Iterator, Union, cast from xmlschema import XMLSchema from xmlschema.validators import XsdSimpleType, XsdComplexType, XsdAnyElement from elementpath.protocols import ElementProtocol, LxmlElementProtocol, \ DocumentProtocol, LxmlDocumentProtocol, XsdTypeProtocol, XsdElementProtocol, \ XsdAttributeProtocol, GlobalMapsProtocol, XsdSchemaProtocol ### # Test protocols for ElementTree and lxml.etree def iter_elements(element: ElementProtocol) -> Iterator[ElementProtocol]: for e in element.iter(): yield e def iter_lxml_elements(element: LxmlElementProtocol) -> Iterator[LxmlElementProtocol]: for e in element.iter(): yield e doc: DocumentProtocol elem: ElementProtocol lxml_doc: LxmlDocumentProtocol lxml_elem: LxmlElementProtocol doc = ElementTree.ElementTree() del doc elem = ElementTree.XML('') elements = list(iter_elements(elem)) elements.clear() lxml_doc = etree.ElementTree() del lxml_doc lxml_elem = etree.XML('') lxml_elem2 = etree.XML('') lxml_elem2 = lxml_elem del lxml_elem2 elem2 = ElementTree.XML('') elem2 = elem del elem2 lxml_elements = list(iter_lxml_elements(lxml_elem)) lxml_elements.clear() ### # Test protocols for XSD type annotations BaseXsdType = Union[XsdSimpleType, XsdComplexType] class Base: xsd_type: XsdTypeProtocol def __init__(self, xsd_type: XsdTypeProtocol) -> None: self.xsd_type = xsd_type class Derived(Base): def __init__(self, xsd_type: BaseXsdType) -> None: super().__init__(xsd_type) def check_elem_type(xsd_element: XsdElementProtocol) -> None: assert xsd_element.type is not None def check_any_elem_type(xsd_element: 
XsdElementProtocol) -> None: assert xsd_element.type is None def check_attr_type(xsd_attribute: XsdAttributeProtocol) -> bool: return xsd_attribute.type is not None def check_simple_type(xsd_type: XsdTypeProtocol) -> bool: return xsd_type.is_simple() def check_maps(maps: GlobalMapsProtocol) -> bool: return maps is not None def check_xsd_schema(s: XsdSchemaProtocol) -> None: assert s is not None schema = XMLSchema(""" """) check_any_elem_type(cast(XsdAnyElement, schema.groups['group1'][0])) check_elem_type(schema.elements['elem1']) check_maps(schema.maps) check_xsd_schema(schema) a = cast(BaseXsdType, schema.types['type1']) check_simple_type(a) b = schema.attributes['attr1'] check_attr_type(b) if __name__ == '__main__': main() sissaschool-elementpath-d3688c7/tests/mypy_tests/selectors.py000077500000000000000000000016631476131650400246770ustar00rootroot00000000000000#!/usr/bin/env python def main() -> None: from xml.etree.ElementTree import XML import elementpath from elementpath import get_node_tree from elementpath.xpath3 import XPath3Parser root = XML('') result = elementpath.select(root, '*') print(result) result = list(elementpath.iter_select(root, '*')) print(result) selector = elementpath.Selector('*') result = selector.select(root) print(result) result = list(selector.iter_select(root)) print(result) result = elementpath.select(root, 'math:atan(1.0e0)', parser=XPath3Parser) print(result) root_node = get_node_tree(root) result = elementpath.select(root_node, '*') print(result) assert result == elementpath.select(root, '*') try: elementpath.select(1, '*') # type: ignore[arg-type] except TypeError: pass if __name__ == '__main__': main() sissaschool-elementpath-d3688c7/tests/resources/000077500000000000000000000000001476131650400221035ustar00rootroot00000000000000sissaschool-elementpath-d3688c7/tests/resources/analyze-string.xsd000066400000000000000000000023011476131650400255660ustar00rootroot00000000000000 
sissaschool-elementpath-d3688c7/tests/resources/external_entity.xml000066400000000000000000000002531476131650400260430ustar00rootroot00000000000000 ]> sissaschool-elementpath-d3688c7/tests/resources/sample.xml000066400000000000000000000001031476131650400241000ustar00rootroot00000000000000 abc àèéìù sissaschool-elementpath-d3688c7/tests/resources/schema-for-json.xsd000066400000000000000000000124151476131650400256210ustar00rootroot00000000000000 sissaschool-elementpath-d3688c7/tests/resources/unparsed_entity.xml000066400000000000000000000004641476131650400260460ustar00rootroot00000000000000 ]> sissaschool-elementpath-d3688c7/tests/resources/unused_external_entity.xml000066400000000000000000000002521476131650400274250ustar00rootroot00000000000000 ]> abc sissaschool-elementpath-d3688c7/tests/resources/unused_unparsed_entity.xml000066400000000000000000000004161476131650400274260ustar00rootroot00000000000000 ]> sissaschool-elementpath-d3688c7/tests/resources/with_entity.xml000066400000000000000000000002011476131650400251650ustar00rootroot00000000000000 ]> &e; sissaschool-elementpath-d3688c7/tests/test_collations.py000066400000000000000000000035101476131650400236500ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import unittest from elementpath import ElementPathError from elementpath.collations import UNICODE_CODEPOINT_COLLATION, \ HTML_ASCII_CASE_INSENSITIVE_COLLATION, CollationManager class CollationsTest(unittest.TestCase): def test_context_manager_init(self): manager = CollationManager(collation=UNICODE_CODEPOINT_COLLATION) self.assertIsInstance(manager, CollationManager) with self.assertRaises(ElementPathError) as ctx: CollationManager(collation=None) self.assertIn('XPTY0004', str(ctx.exception)) self.assertIn('collation cannot be an empty sequence', str(ctx.exception)) # Not raised in __init__() manager = CollationManager(collation='unknown') self.assertIsInstance(manager, CollationManager) def test_context_activation(self): with CollationManager(UNICODE_CODEPOINT_COLLATION) as manager: self.assertFalse(manager.eq('a', 'A')) self.assertIsInstance(manager, CollationManager) with self.assertRaises(ElementPathError) as ctx: with CollationManager(collation='unknown'): pass self.assertIn('FOCH0002', str(ctx.exception)) self.assertIn("Unsupported collation 'unknown'", str(ctx.exception)) def test_html_ascii_case_insensitive_collation(self): with CollationManager(HTML_ASCII_CASE_INSENSITIVE_COLLATION) as manager: self.assertTrue(manager.eq('a', 'A')) if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_compare.py000066400000000000000000000125661476131650400231420ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import unittest from xml.etree import ElementTree from elementpath import XPath2Parser from elementpath.xpath_nodes import EtreeElementNode from elementpath.tree_builders import get_node_tree from elementpath.compare import deep_equal, deep_compare, get_key_function class CompareTest(unittest.TestCase): def test_deep_equal_function(self): parser = XPath2Parser() token = parser.parse('true()') with self.assertRaises(TypeError): deep_equal([token], [1]) with self.assertRaises(TypeError): deep_equal([1], [token]) self.assertTrue(deep_equal([1], [1])) self.assertFalse(deep_equal([1], [2])) self.assertFalse(deep_equal([1, 1], [1])) self.assertFalse(deep_equal([1], [1, 1])) root = ElementTree.Element('root') elem = ElementTree.Element('elem') element = EtreeElementNode(elem) self.assertTrue(deep_equal([element], [element])) self.assertFalse(deep_equal([1], [element])) self.assertFalse(deep_equal([EtreeElementNode(root)], [element])) root = ElementTree.XML('texttail') element = get_node_tree(root) document = get_node_tree(ElementTree.ElementTree(root)) self.assertTrue(deep_equal([element], [element])) self.assertTrue(deep_equal([document], [document])) root1 = ElementTree.XML('texttail') element1 = get_node_tree(root1) document1 = get_node_tree(ElementTree.ElementTree(root1)) self.assertFalse(deep_equal([element], [element1])) self.assertFalse(deep_equal([document], [document1])) root1 = ElementTree.XML('tail') element1 = get_node_tree(root1) document1 = get_node_tree(ElementTree.ElementTree(root1)) self.assertFalse(deep_equal([element], [element1])) self.assertFalse(deep_equal([document], [document1])) root1 = ElementTree.XML('texttail') element1 = get_node_tree(root1) document1 = get_node_tree(ElementTree.ElementTree(root1)) self.assertFalse(deep_equal([element], [element1])) self.assertFalse(deep_equal([document], [document1])) def test_deep_compare(self): parser = XPath2Parser() token = parser.parse('true()') with 
self.assertRaises(TypeError): deep_compare([token], [1]) with self.assertRaises(TypeError): deep_compare([1], [token]) self.assertEqual(deep_compare([1], [1]), 0) self.assertEqual(deep_compare([1], [2]), -1) self.assertEqual(deep_compare([1, 1], [1]), 1) self.assertEqual(deep_compare([1], [1, 1]), -1) root = ElementTree.Element('root') elem = ElementTree.Element('elem') element = EtreeElementNode(elem) self.assertEqual(deep_compare([element], [element]), 0) with self.assertRaises(TypeError): deep_compare([1], [element]) self.assertEqual(deep_compare([EtreeElementNode(root)], [element]), 1) root = ElementTree.XML('texttail') element = get_node_tree(root) document = get_node_tree(ElementTree.ElementTree(root)) self.assertEqual(deep_compare([element], [element]), 0) self.assertEqual(deep_compare([document], [document]), 0) root1 = ElementTree.XML('texttail') element1 = get_node_tree(root1) document1 = get_node_tree(ElementTree.ElementTree(root1)) self.assertEqual(deep_compare([element], [element1]), -1) self.assertEqual(deep_compare([document], [document1]), -1) root1 = ElementTree.XML('tail') element1 = get_node_tree(root1) document1 = get_node_tree(ElementTree.ElementTree(root1)) self.assertEqual(deep_compare([element], [element1]), 1) self.assertEqual(deep_compare([document], [document1]), 1) root1 = ElementTree.XML('texttail') element1 = get_node_tree(root1) document1 = get_node_tree(ElementTree.ElementTree(root1)) self.assertEqual(deep_compare([element], [element1]), -1) self.assertEqual(deep_compare([document], [document1]), -1) def test_key_function(self): key_function = get_key_function() result = sorted([2, 1], key=key_function) self.assertListEqual(result, [1, 2]) result = sorted([2, 1, 0], key=key_function) self.assertListEqual(result, [0, 1, 2]) result = sorted([2, 10, 7], key=key_function) self.assertListEqual(result, [2, 7, 10]) with self.assertRaises(TypeError) as cm: sorted(['2', 1, 0], key=key_function) self.assertIn('XPTY0004', str(cm.exception)) 
result = sorted(['2', '10', '7'], key=key_function) self.assertListEqual(result, ['10', '2', '7']) if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_datatypes.py000066400000000000000000002530531476131650400235100ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import unittest import sys import datetime import math import operator import pickle import platform import random from decimal import Decimal from calendar import isleap from textwrap import dedent from xml.etree import ElementTree try: import xmlschema except ImportError: xmlschema = None from elementpath.helpers import MONTH_DAYS, MONTH_DAYS_LEAP from elementpath.datatypes import AnyAtomicType, DateTime, DateTime10, Date, Date10, \ Time, Timezone, Duration, DayTimeDuration, YearMonthDuration, UntypedAtomic, \ GregorianYear, GregorianYear10, GregorianYearMonth, GregorianYearMonth10, \ GregorianMonthDay, GregorianMonth, GregorianDay, AbstractDateTime, NumericProxy, \ ArithmeticProxy, Id, Notation, QName, Base64Binary, HexBinary, NormalizedString, \ XsdToken, Language, Float, Float10, Integer, Short, NegativeInteger, AnyURI, \ BooleanProxy, DecimalProxy, DoubleProxy10, DoubleProxy, StringProxy, \ xsd10_atomic_types, xsd11_atomic_types from elementpath.datatypes.atomic_types import AtomicTypeMeta from elementpath.datatypes.datetime import OrderedDateTime from elementpath.decoder import get_atomic_sequence class AtomicTypesTest(unittest.TestCase): def test_xsd_atomic_types_maps(self): self.assertEqual(len(xsd10_atomic_types), 45 * 2) self.assertEqual(len(xsd11_atomic_types), 46 * 2) self.assertSetEqual( set(xsd11_atomic_types) - set(xsd10_atomic_types), 
{'{http://www.w3.org/2001/XMLSchema}dateTimeStamp', 'dateTimeStamp'} ) @unittest.skipIf(xmlschema is None, "xmlschema library required.") def test_get_atomic_value(self): schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertEqual( list(get_atomic_sequence(schema.elements['d'].type)), [UntypedAtomic('1')] ) with self.assertRaises(AttributeError): list(get_atomic_sequence(schema)) self.assertEqual(next(iter(get_atomic_sequence(xsd_type=None))), UntypedAtomic(value='')) value = next(iter(get_atomic_sequence(schema.elements['a'].type)), None) self.assertIsInstance(value, UntypedAtomic) self.assertEqual(value, UntypedAtomic(value='1')) value = next(iter(get_atomic_sequence(schema.elements['b'].type)), None) self.assertIsInstance(value, int) self.assertEqual(value, 1) value = next(iter(get_atomic_sequence(schema.elements['c'].type)), None) self.assertIsInstance(value, UntypedAtomic) self.assertEqual(value, UntypedAtomic(value='1')) value = next(iter(get_atomic_sequence(schema.elements['d'].type)), None) self.assertIsInstance(value, float) self.assertEqual(value, 1.0) value = next(iter(get_atomic_sequence(schema.elements['e'].type)), None) self.assertIsInstance(value, str) self.assertEqual(value, ' alpha\t') class AnyAtomicTypeTest(unittest.TestCase): def test_invalid_type_name(self): with self.assertRaises(TypeError): class InvalidAtomicType(metaclass=AtomicTypeMeta): name = b'invalid' def test_validation(self): class AnotherAtomicType(metaclass=AtomicTypeMeta): pass self.assertIsNone(AnotherAtomicType.validate(AnotherAtomicType())) self.assertIsNone(AnotherAtomicType.validate('')) with self.assertRaises(TypeError) as ctx: AnotherAtomicType.validate(10) self.assertIn("invalid type for xyz').text)) self.assertFalse(Id.is_valid(ElementTree.XML('xyz abc').text)) self.assertFalse(Id.is_valid(ElementTree.XML('12345').text)) self.assertTrue(Id.is_valid('alpha')) self.assertFalse(Id.is_valid('alpha beta')) self.assertFalse(Id.is_valid('12345')) def test_new_instance(self): 
self.assertEqual(NormalizedString(' a b\t c\n'), ' a b c ') self.assertEqual(NormalizedString(10.0), '10.0') self.assertEqual(XsdToken(10), '10') self.assertEqual(Language(True), 'true') with self.assertRaises(ValueError) as ctx: Language(10), '10' self.assertEqual("invalid value '10' for xs:language", str(ctx.exception)) def test_isinstance(self): value = NormalizedString('xyz') self.assertIsInstance(value, AnyAtomicType) self.assertIsInstance(value, str) self.assertIsInstance(value, NormalizedString) self.assertNotIsInstance(value, XsdToken) self.assertNotIsInstance(value, bytes) def test_issubclass(self): self.assertTrue(issubclass(NormalizedString, AnyAtomicType)) self.assertTrue(issubclass(NormalizedString, str)) self.assertTrue(issubclass(NormalizedString, StringProxy)) self.assertFalse(issubclass(NormalizedString, XsdToken)) class FloatTypesTest(unittest.TestCase): def test_init(self): self.assertEqual(Float10(10), 10.0) self.assertTrue(math.isnan(Float10('NaN'))) self.assertTrue(math.isinf(Float10('INF'))) self.assertTrue(math.isinf(Float10('-INF'))) with self.assertRaises(ValueError): Float10('+INF') self.assertTrue(math.isnan(Float('NaN'))) self.assertTrue(math.isinf(Float('INF'))) self.assertTrue(math.isinf(Float('-INF'))) self.assertTrue(math.isinf(Float('+INF'))) with self.assertRaises(ValueError): Float10('nan') with self.assertRaises(ValueError): Float10('inf') def test_hash(self): self.assertEqual(hash(Float10(892.1)), hash(892.1)) def test_equivalence(self): self.assertEqual(Float10('10.1'), Float10('10.1')) self.assertEqual(Float10('10.1'), Float('10.1')) self.assertNotEqual(Float10('10.1001'), Float10('10.1')) self.assertFalse(Float10('10.1001') == Float10('10.1')) self.assertNotEqual(Float10('10.1001'), Float('10.1')) self.assertFalse(Float10('10.1') != Float10('10.1')) self.assertEqual(Float10('10.0'), 10) self.assertNotEqual(Float10('10.0'), 11) def test_addition(self): self.assertEqual(Float10('10.1') + Float10('10.1'), 20.2) 
self.assertEqual(Float('10.1') + Float10('10.1'), 20.2) self.assertEqual(10.1 + Float10('10.1'), 20.2) with self.assertRaises(TypeError): '10.1' + Float10('10.1') with self.assertRaises(TypeError): Float10('10.1') + '10.1' def test_subtraction(self): self.assertEqual(Float10('10.1') - Float10('1.1'), 9.0) self.assertEqual(Float('10.1') - Float10('1.1'), 9.0) self.assertEqual(10.1 - Float10('1.1'), 9.0) self.assertEqual(10 - Float10('1.1'), 8.9) with self.assertRaises(TypeError): '10.1' - Float10('10.1') with self.assertRaises(TypeError): Float10('10.1') - '10.1' def test_multiplication(self): self.assertEqual(Float10('10.1') * 2, 20.2) self.assertEqual(Float('10.1') * 2.0, 20.2) self.assertEqual(2 * Float10('10.1'), 20.2) self.assertEqual(2.0 * Float('10.1'), 20.2) with self.assertRaises(TypeError): Float10('10.1') * '2.0' with self.assertRaises(TypeError): '10.1' * Float10('2.0') def test_division(self): self.assertEqual(Float10('20.2') / 2, 10.1) self.assertEqual(Float('20.2') / 2.0, 10.1) self.assertEqual(20.2 / Float10('2'), 10.1) self.assertEqual(20 / Float('2'), 10.0) with self.assertRaises(TypeError): Float10('10.1') / '2.0' with self.assertRaises(TypeError): '10.1' / Float10('2.0') def test_module(self): self.assertEqual(Float10('20.2') % 3, 20.2 % 3) self.assertEqual(Float('20.2') % 3.0, 20.2 % 3.0) self.assertEqual(20.2 % Float10('3'), 20.2 % 3) self.assertEqual(20 % Float('3.0'), 20 % 3.0) with self.assertRaises(TypeError): Float10('20.2') % '3.0' with self.assertRaises(TypeError): True % Float10('3.0') with self.assertRaises(TypeError): () % Float10('3.0') def test_abs(self): self.assertEqual(abs(Float10('-20.2')), 20.2) def test_nan(self): self.assertNotEqual(math.nan, math.nan) # NaN is not equal to itself! 
self.assertIs(math.nan, math.nan) if platform.python_implementation() == 'PyPy': # PyPy uses the same instance for float('nan') and math.nan self.assertIs(float('nan'), float('nan')) else: self.assertIsNot(float('nan'), float('nan')) self.assertTrue(math.isnan(Float10('NaN'))) self.assertTrue(math.isnan(Float10(math.nan))) self.assertNotEqual(Float10('NaN'), Float10('NaN')) self.assertIs(Float10('NaN'), Float10('NaN')) self.assertIs(Float10('NaN'), Float('NaN')) self.assertIs(Float10(math.nan), Float10('NaN')) self.assertIsNot(Float10(math.nan), math.nan) self.assertIsNot(Float10('NaN'), DoubleProxy10('NaN')) # Invalid values for constructor function self.assertRaises(ValueError, Float10, 'NAN') with self.assertRaises(ValueError) as ctx: Float10('nan') self.assertEqual(str(ctx.exception), "invalid value 'nan' for xs:float") def test_isinstance(self): value = Float10(1.0) self.assertIsInstance(value, Float10) self.assertNotIsInstance(value, int) self.assertIsInstance(value, AnyAtomicType) self.assertNotIsInstance(value, Short) self.assertIsInstance(value, float) self.assertNotIsInstance(value, Float) def test_issubclass(self): self.assertTrue(issubclass(Float10, AnyAtomicType)) self.assertTrue(issubclass(Float, AnyAtomicType)) self.assertTrue(issubclass(Float10, float)) class IntegerTypesTest(unittest.TestCase): def test_validate(self): self.assertIsNone(Integer.validate(10)) self.assertIsNone(Integer.validate(Integer(10))) self.assertIsNone(Integer.validate('10')) with self.assertRaises(TypeError): Integer.validate(True) with self.assertRaises(ValueError): Integer.validate('10.1') def test_isinstance(self): value = Integer(1) self.assertIsInstance(value, Integer) self.assertIsInstance(value, int) self.assertIsInstance(value, AnyAtomicType) self.assertNotIsInstance(value, Short) self.assertNotIsInstance(value, float) self.assertNotIsInstance(value, Float10) def test_issubclass(self): self.assertTrue(issubclass(Integer, AnyAtomicType)) 
self.assertTrue(issubclass(Integer, int)) self.assertTrue(issubclass(NegativeInteger, Integer)) self.assertTrue(issubclass(Short, AnyAtomicType)) self.assertFalse(issubclass(bool, Integer)) self.assertFalse(issubclass(float, Integer)) class UntypedAtomicTest(unittest.TestCase): def test_init(self): self.assertEqual(UntypedAtomic(1).value, '1') self.assertEqual(UntypedAtomic(-3.9).value, '-3.9') self.assertEqual(UntypedAtomic('alpha').value, 'alpha') self.assertEqual(UntypedAtomic(b'beta').value, 'beta') self.assertEqual(UntypedAtomic(True).value, 'true') self.assertEqual(UntypedAtomic(UntypedAtomic(2)).value, '2') self.assertEqual(UntypedAtomic(Date.fromstring('2000-02-01')).value, '2000-02-01') with self.assertRaises(TypeError) as err: UntypedAtomic(None) self.assertEqual(str(err.exception), "None is not an atomic value") def test_string_representation(self): self.assertEqual(repr(UntypedAtomic(7)), "UntypedAtomic('7')") self.assertEqual(str(UntypedAtomic(7)), '7') def test_eq(self): self.assertTrue(UntypedAtomic(-10) == UntypedAtomic(-10)) self.assertTrue(UntypedAtomic(5.2) == UntypedAtomic(5.2)) self.assertTrue(UntypedAtomic('-6.09') == UntypedAtomic('-6.09')) self.assertTrue(UntypedAtomic(Decimal('8.91')) == UntypedAtomic(Decimal('8.91'))) self.assertTrue(UntypedAtomic(False) == UntypedAtomic(False)) self.assertTrue(UntypedAtomic(-10) == -10) self.assertTrue(-10 == UntypedAtomic(-10)) self.assertTrue('-10' == UntypedAtomic(-10)) self.assertTrue(UntypedAtomic(False) == bool(False)) self.assertTrue(bool(False) == UntypedAtomic(False)) self.assertTrue(Decimal('8.91') == UntypedAtomic(Decimal('8.91'))) self.assertTrue(UntypedAtomic(Decimal('8.91')) == Decimal('8.91')) self.assertTrue(bool(True) == UntypedAtomic(1)) with self.assertRaises(ValueError) as ctx: _ = bool(True) == UntypedAtomic(10) self.assertEqual(str(ctx.exception), "'10' cannot be cast to xs:boolean") self.assertFalse(-10.9 == UntypedAtomic(-10)) self.assertFalse(UntypedAtomic(-10) == -11) 
self.assertFalse(UntypedAtomic(-10.5) == UntypedAtomic(-10)) self.assertFalse(-10.5 == UntypedAtomic(-10)) self.assertFalse(-17 == UntypedAtomic(-17.3)) def test_ne(self): self.assertTrue(UntypedAtomic(True) != UntypedAtomic(False)) self.assertTrue(UntypedAtomic(5.12) != UntypedAtomic(5.2)) self.assertTrue('29' != UntypedAtomic(5.2)) self.assertFalse('2.0' != UntypedAtomic('2.0')) def test_lt(self): self.assertTrue(UntypedAtomic(9.0) < UntypedAtomic(15)) self.assertTrue(False < UntypedAtomic(True)) self.assertTrue(UntypedAtomic('78') < 100.0) self.assertFalse(UntypedAtomic('100.1') < 100.0) def test_le(self): self.assertTrue(UntypedAtomic(9.0) <= UntypedAtomic(15)) self.assertTrue(False <= UntypedAtomic(False)) self.assertTrue(UntypedAtomic('78') <= 100.0) self.assertFalse(UntypedAtomic('100.001') <= 100.0) def test_gt(self): self.assertTrue(UntypedAtomic(25) > UntypedAtomic(15)) self.assertTrue(25 > UntypedAtomic(15)) self.assertTrue(UntypedAtomic(25) > 15) self.assertTrue(UntypedAtomic(25) > '15') def test_ge(self): self.assertTrue(UntypedAtomic(25) >= UntypedAtomic(25)) self.assertFalse(25 >= UntypedAtomic(25.1)) def test_add(self): self.assertEqual(UntypedAtomic(20) + UntypedAtomic(3), UntypedAtomic(23)) self.assertEqual(UntypedAtomic(-2) + UntypedAtomic(3), UntypedAtomic(1)) self.assertEqual(UntypedAtomic(17) + UntypedAtomic(5.1), UntypedAtomic(22.1)) self.assertEqual(UntypedAtomic('1') + UntypedAtomic('2.7'), UntypedAtomic(3.7)) def test_conversion(self): self.assertEqual(str(UntypedAtomic(25.1)), '25.1') self.assertEqual(int(UntypedAtomic(25)), 25) with self.assertRaises(ValueError): int(UntypedAtomic(25.1)) self.assertEqual(float(UntypedAtomic(25.1)), 25.1) self.assertEqual(bool(UntypedAtomic(True)), True) self.assertEqual(str(UntypedAtomic(u'Joan Miró')), u'Joan Miró') self.assertEqual(bytes(UntypedAtomic(u'Joan Miró')), b'Joan Mir\xc3\xb3') def test_numerical_operators(self): self.assertEqual(0.25 * UntypedAtomic(1000), 250) self.assertEqual(1200 - 
UntypedAtomic(1000.0), 200.0) self.assertEqual(UntypedAtomic(1000.0) - 250, 750.0) self.assertEqual(UntypedAtomic('1000.0') - 250, 750.0) self.assertEqual(UntypedAtomic('1000.0') - UntypedAtomic(250), 750.0) self.assertEqual(UntypedAtomic(0.75) * UntypedAtomic(100), 75) self.assertEqual(UntypedAtomic('0.75') * UntypedAtomic('100'), 75) self.assertEqual(UntypedAtomic('9.0') / UntypedAtomic('3'), 3.0) self.assertEqual(9.0 / UntypedAtomic('3'), 3.0) self.assertEqual(UntypedAtomic('15') * UntypedAtomic('4'), 60) def test_abs(self): self.assertEqual(abs(UntypedAtomic(-10)), 10) def test_mod(self): self.assertEqual(UntypedAtomic(1) % 2, 1) self.assertEqual(UntypedAtomic('1') % 2, 1.0) def test_hashing(self): self.assertEqual(hash(UntypedAtomic(12345)), hash('12345')) self.assertIsInstance(hash(UntypedAtomic('alpha')), int) def test_validate(self): self.assertIsNone(UntypedAtomic.validate(UntypedAtomic('10'))) self.assertRaises(TypeError, UntypedAtomic.validate, '10') self.assertRaises(TypeError, UntypedAtomic.validate, 10) def test_isinstance(self): value = UntypedAtomic('1') self.assertIsInstance(value, UntypedAtomic) self.assertIsInstance(value, AnyAtomicType) self.assertNotIsInstance(value, StringProxy) self.assertNotIsInstance(value, str) def test_issubclass(self): self.assertTrue(issubclass(UntypedAtomic, AnyAtomicType)) self.assertFalse(issubclass(UntypedAtomic, StringProxy)) self.assertFalse(issubclass(UntypedAtomic, str)) class DateTimeTypesTest(unittest.TestCase): def test_abstract_classes(self): self.assertRaises(TypeError, AbstractDateTime) self.assertRaises(TypeError, OrderedDateTime) def test_datetime_init(self): with self.assertRaises(ValueError) as err: DateTime(year=0, month=1, day=1) self.assertIn("0 is an illegal value for year", str(err.exception)) with self.assertRaises(TypeError) as err: DateTime(year=-1999.0, month=1, day=1) self.assertIn("invalid type for year", str(err.exception)) def test_datetime_fromstring(self): dt = 
DateTime.fromstring('2000-10-07T00:00:00') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7)) dt = DateTime.fromstring('-2000-10-07T00:00:00') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(4, 10, 7)) self.assertEqual(dt._year, -2001) dt = DateTime.fromstring('2020-03-05T23:04:10.047') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2020, 3, 5, 23, 4, 10, 47000)) with self.assertRaises(TypeError) as err: DateTime.fromstring(b'00-10-07') self.assertIn("1st argument has an invalid type ", str(err.exception)) with self.assertRaises(TypeError) as err: DateTime.fromstring('2010-10-07', tzinfo='Z') self.assertIn("2nd argument has an invalid type ", str(err.exception)) with self.assertRaises(ValueError) as err: DateTime.fromstring('2000-10-07') self.assertIn("Invalid datetime string", str(err.exception)) with self.assertRaises(ValueError) as err: DateTime.fromstring('00-10-07T00:00:00') self.assertIn("Invalid datetime string", str(err.exception)) with self.assertRaises(ValueError) as err: DateTime.fromstring('2020-03-05 23:04:10.047') self.assertIn("Invalid datetime string", str(err.exception)) dt = DateTime.fromstring('2000-10-07T00:00:00.100000') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7, microsecond=100000)) def test_tzname(self): dt = DateTime.fromstring('2000-10-07T00:00:00') self.assertIsNone(dt.tzname()) dt = DateTime.fromstring('2000-10-07T00:00:00Z') self.assertEqual(dt.tzname(), 'Z') def test_astimezone(self): dt = DateTime.fromstring('2000-10-07T00:00:00') self.assertIsInstance(dt.astimezone(), datetime.datetime) def test_isocalendar(self): dt = DateTime.fromstring('2000-10-07T00:00:00') self.assertEqual( dt.isocalendar(), (2000, 40, 6) ) def test_issue_36_fromstring_with_more_microseconds_digits(self): dt = DateTime.fromstring('2000-10-07T00:00:00.00090001') self.assertIsInstance(dt, DateTime) 
self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7, microsecond=900)) dt = DateTime.fromstring('2000-10-07T00:00:00.0009009999') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7, microsecond=900)) dt = DateTime.fromstring('2000-10-07T00:00:00.1000000') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2000, 10, 7, microsecond=100000)) # Regression test of issue #36 tz = Timezone.fromstring('+01:00') dt = DateTime.fromstring('2021-02-21T21:43:03.1121296+01:00') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(2021, 2, 21, 21, 43, 3, 112129, tz)) # From W3C's XQuery/XPath tests dt = DateTime.fromstring('9999-12-31T23:59:59.9999999') self.assertIsInstance(dt, DateTime) self.assertEqual(dt._dt, datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)) def test_issue_84_error_parsing_midnight_hours(self): dt1 = DateTime.fromstring('9998-12-31T24:00:00') self.assertEqual(str(dt1), '9999-01-01T00:00:00') dt1 = DateTime.fromstring('9999-12-31T24:00:00') self.assertEqual(str(dt1), '10000-01-01T00:00:00') dt2 = DateTime.fromstring('10000-01-01T00:00:00') self.assertEqual(dt1, dt2) def test_date_fromstring(self): self.assertIsInstance(Date.fromstring('2000-10-07'), Date) self.assertIsInstance(Date.fromstring('-2000-10-07'), Date) self.assertIsInstance(Date.fromstring('0000-02-29'), Date) with self.assertRaises(ValueError) as ctx: Date10.fromstring('0000-02-29') self.assertIn("year '0000' is an illegal value for XSD 1.0", str(ctx.exception)) with self.assertRaises(ValueError) as ctx: Date.fromstring('01000-02-29') self.assertIn("when year exceeds 4 digits leading zeroes are not allowed", str(ctx.exception)) dt = Date.fromstring("-0003-01-01") self.assertEqual(dt._year, -4) self.assertEqual(dt._dt.year, 6) self.assertEqual(dt._dt.month, 1) self.assertEqual(dt._dt.day, 1) self.assertTrue(dt.bce) def test_fromdatetime(self): dt = datetime.datetime(2000, 1, 20) 
self.assertEqual(str(DateTime.fromdatetime(dt)), '2000-01-20T00:00:00') with self.assertRaises(TypeError) as err: DateTime.fromdatetime('2000-10-07') self.assertEqual("1st argument has an invalid type ", str(err.exception)) with self.assertRaises(TypeError) as err: DateTime.fromdatetime(dt, year='0001') self.assertEqual("2nd argument has an invalid type ", str(err.exception)) self.assertEqual(str(DateTime.fromdatetime(dt, year=1)), '0001-01-20T00:00:00') def test_iso_year_property(self): self.assertEqual(DateTime(2000, 10, 7).iso_year, '2000') self.assertEqual(DateTime(20001, 10, 7).iso_year, '20001') self.assertEqual(DateTime(-9999, 10, 7).iso_year, '-9998') self.assertEqual(DateTime10(-9999, 10, 7).iso_year, '-9999') self.assertEqual(DateTime(-1, 10, 7).iso_year, '0000') self.assertEqual(DateTime10(-1, 10, 7).iso_year, '-0001') def test_datetime_string_representation(self): dt = DateTime.fromstring('2000-10-07T00:00:00') self.assertEqual(repr(dt), "DateTime(2000, 10, 7, 0, 0, 0)") self.assertEqual(str(dt), '2000-10-07T00:00:00') dt = DateTime.fromstring('-0100-04-13T23:59:59') self.assertEqual(repr(dt), "DateTime(-101, 4, 13, 23, 59, 59)") self.assertEqual(str(dt), '-0100-04-13T23:59:59') dt = DateTime10.fromstring('-0100-04-13T10:30:00-04:00') if sys.version_info >= (3, 7): self.assertEqual( repr(dt), "DateTime10(-100, 4, 13, 10, 30, 0, " "tzinfo=Timezone(datetime.timedelta(days=-1, seconds=72000)))" ) else: self.assertEqual(repr(dt), "DateTime10(-100, 4, 13, 10, 30, 0, " "tzinfo=Timezone(datetime.timedelta(-1, 72000)))") self.assertEqual(str(dt), '-0100-04-13T10:30:00-04:00') dt = DateTime(2001, 1, 1, microsecond=10) self.assertEqual(repr(dt), 'DateTime(2001, 1, 1, 0, 0, 0.000010)') self.assertEqual(str(dt), '2001-01-01T00:00:00.00001') def test_24_hour_datetime(self): dt = DateTime.fromstring('0000-09-19T24:00:00Z') self.assertEqual(str(dt), '0000-09-20T00:00:00Z') def test_date_string_representation(self): dt = Date.fromstring('2000-10-07') 
self.assertEqual(repr(dt), "Date(2000, 10, 7)") self.assertEqual(str(dt), '2000-10-07') dt = Date.fromstring('-0100-04-13') self.assertEqual(repr(dt), "Date(-101, 4, 13)") self.assertEqual(str(dt), '-0100-04-13') dt = Date10.fromstring('-0100-04-13') self.assertEqual(repr(dt), "Date10(-100, 4, 13)") self.assertEqual(str(dt), '-0100-04-13') dt = Date.fromstring("-0003-01-01") self.assertEqual(repr(dt), "Date(-4, 1, 1)") self.assertEqual(str(dt), '-0003-01-01') dt = Date10.fromstring("-0003-01-01") self.assertEqual(repr(dt), "Date10(-3, 1, 1)") self.assertEqual(str(dt), '-0003-01-01') def test_gregorian_year_string_representation(self): dt = GregorianYear.fromstring('1991') self.assertEqual(repr(dt), "GregorianYear(1991)") self.assertEqual(str(dt), '1991') dt = GregorianYear.fromstring('0000') self.assertEqual(repr(dt), "GregorianYear(-1)") self.assertEqual(str(dt), '0000') dt = GregorianYear10.fromstring('-0050') self.assertEqual(repr(dt), "GregorianYear10(-50)") self.assertEqual(str(dt), '-0050') def test_gregorian_day_string_representation(self): dt = GregorianDay.fromstring('---31') self.assertEqual(repr(dt), "GregorianDay(31)") self.assertEqual(str(dt), '---31') dt = GregorianDay.fromstring('---05Z') self.assertEqual(repr(dt), "GregorianDay(5, tzinfo=Timezone(datetime.timedelta(0)))") self.assertEqual(str(dt), '---05Z') def test_gregorian_month_string_representation(self): dt = GregorianMonth.fromstring('--09') self.assertEqual(repr(dt), "GregorianMonth(9)") self.assertEqual(str(dt), '--09') def test_gregorian_month_day_string_representation(self): dt = GregorianMonthDay.fromstring('--07-23') self.assertEqual(repr(dt), "GregorianMonthDay(7, 23)") self.assertEqual(str(dt), '--07-23') def test_gregorian_year_month_string_representation(self): dt = GregorianYearMonth.fromstring('-1890-12') self.assertEqual(repr(dt), "GregorianYearMonth(-1891, 12)") self.assertEqual(str(dt), '-1890-12') dt = GregorianYearMonth10.fromstring('-0050-04') self.assertEqual(repr(dt), 
"GregorianYearMonth10(-50, 4)") self.assertEqual(str(dt), '-0050-04') def test_time_string_representation(self): dt = Time.fromstring('20:40:13') self.assertEqual(repr(dt), "Time(20, 40, 13)") self.assertEqual(str(dt), '20:40:13') dt = Time.fromstring('24:00:00') self.assertEqual(repr(dt), "Time(0, 0, 0)") self.assertEqual(str(dt), '00:00:00') dt = Time.fromstring('15:34:29.000037') self.assertEqual(repr(dt), "Time(15, 34, 29.000037)") self.assertEqual(str(dt), '15:34:29.000037') def test_eq_operator(self): tz = Timezone.fromstring('-05:00') mkdt = DateTime.fromstring self.assertTrue(mkdt("2002-04-02T12:00:00-01:00") == mkdt("2002-04-02T17:00:00+04:00")) self.assertFalse(mkdt("2002-04-02T12:00:00") == mkdt("2002-04-02T23:00:00+06:00")) self.assertFalse(mkdt("2002-04-02T12:00:00") == mkdt("2002-04-02T17:00:00")) self.assertTrue(mkdt("2002-04-02T12:00:00") == mkdt("2002-04-02T12:00:00")) self.assertTrue(mkdt("2002-04-02T23:00:00-04:00") == mkdt("2002-04-03T02:00:00-01:00")) self.assertTrue(mkdt("1999-12-31T24:00:00") == mkdt("2000-01-01T00:00:00")) self.assertTrue(mkdt("2005-04-04T24:00:00") == mkdt("2005-04-05T00:00:00")) self.assertTrue( mkdt("2002-04-02T12:00:00-01:00", tz) == mkdt("2002-04-02T17:00:00+04:00", tz)) self.assertTrue(mkdt("2002-04-02T12:00:00", tz) == mkdt("2002-04-02T23:00:00+06:00", tz)) self.assertFalse(mkdt("2002-04-02T12:00:00", tz) == mkdt("2002-04-02T17:00:00", tz)) self.assertTrue(mkdt("2002-04-02T12:00:00", tz) == mkdt("2002-04-02T12:00:00", tz)) self.assertTrue( mkdt("2002-04-02T23:00:00-04:00", tz) == mkdt("2002-04-03T02:00:00-01:00", tz)) self.assertTrue(mkdt("1999-12-31T24:00:00", tz) == mkdt("2000-01-01T00:00:00", tz)) self.assertTrue(mkdt("2005-04-04T24:00:00", tz) == mkdt("2005-04-05T00:00:00", tz)) self.assertFalse(mkdt("2005-04-04T24:00:00", tz) != mkdt("2005-04-05T00:00:00", tz)) self.assertTrue(Date.fromstring("-1000-01-01") == Date.fromstring("-1000-01-01")) self.assertTrue(Date.fromstring("-10000-01-01") == 
Date.fromstring("-10000-01-01")) self.assertFalse(Date.fromstring("20000-01-01") != Date.fromstring("20000-01-01")) self.assertFalse(Date.fromstring("-10000-01-02") == Date.fromstring("-10000-01-01")) self.assertFalse(Date.fromstring("-10000-01-02") == (1, 2, 3)) # Wrong type self.assertTrue(Date.fromstring("-10000-01-02") != (1, 2, 3)) # Wrong type # Type mismatch: not comparable self.assertFalse(GregorianYearMonth10(1989, 6) == GregorianMonthDay(11, 30)) self.assertTrue(GregorianYearMonth10(1989, 6) != GregorianMonthDay(11, 30)) def test_lt_operator(self): mkdt = DateTime.fromstring mkdate = Date.fromstring self.assertTrue(mkdt("2002-04-02T12:00:00-01:00") < mkdt("2002-04-02T17:00:00-01:00")) self.assertFalse(mkdt("2002-04-02T18:00:00-01:00") < mkdt("2002-04-02T17:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00+02:00") < mkdt("2002-04-02T17:00:00Z")) self.assertTrue(mkdt("2002-04-02T18:00:00+02:00") < mkdt("2002-04-03T00:00:00Z")) self.assertTrue(mkdt("-2002-01-01T10:00:00") < mkdt("2001-01-01T17:00:00Z")) self.assertFalse(mkdt("2002-01-01T10:00:00") < mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-2002-01-01T10:00:00") < mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-12002-01-01T10:00:00") < mkdt("-12001-01-01T17:00:00Z")) self.assertFalse(mkdt("12002-01-01T10:00:00") < mkdt("12001-01-01T17:00:00Z")) self.assertTrue(mkdt("-10000-01-01T10:00:00Z") < mkdt("-10000-01-01T17:00:00Z")) self.assertRaises(TypeError, operator.lt, mkdt("2002-04-02T18:00:00+02:00"), mkdate("2002-04-03")) with self.assertRaises(TypeError): mkdt("2002-04-02T12:00:00-01:00") < "2002-04-02T17:00:00-01:00" def test_le_operator(self): mkdt = DateTime.fromstring mkdate = Date.fromstring self.assertTrue(mkdt("2002-04-02T12:00:00-01:00") <= mkdt("2002-04-02T12:00:00-01:00")) self.assertFalse(mkdt("2002-04-02T18:00:00-01:00") <= mkdt("2002-04-02T17:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00+01:00") <= mkdt("2002-04-02T17:00:00Z")) 
self.assertTrue(mkdt("-2002-01-01T10:00:00") <= mkdt("2001-01-01T17:00:00Z")) self.assertFalse(mkdt("2002-01-01T10:00:00") <= mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-2002-01-01T10:00:00") <= mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-10000-01-01T10:00:00Z") <= mkdt("-10000-01-01T10:00:00Z")) self.assertTrue(mkdt("-190000-01-01T10:00:00Z") <= mkdt("0100-01-01T10:00:00Z")) self.assertRaises(TypeError, operator.le, mkdt("2002-04-02T18:00:00+02:00"), mkdate("2002-04-03")) with self.assertRaises(TypeError): mkdt("2002-04-02T12:00:00-01:00") <= "2002-04-02T17:00:00-01:00" def test_gt_operator(self): mkdt = DateTime.fromstring mkdate = Date.fromstring self.assertFalse(mkdt("2002-04-02T12:00:00-01:00") > mkdt("2002-04-02T17:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00-01:00") > mkdt("2002-04-02T17:00:00-01:00")) self.assertFalse(mkdt("2002-04-02T18:00:00+02:00") > mkdt("2002-04-02T17:00:00Z")) self.assertFalse(mkdt("2002-04-02T18:00:00+02:00") > mkdt("2002-04-03T00:00:00Z")) self.assertTrue(mkdt("2002-01-01T10:00:00") > mkdt("-2001-01-01T17:00:00Z")) self.assertFalse(mkdt("-2002-01-01T10:00:00") > mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("13567-04-18T10:00:00Z") > datetime.datetime.now()) self.assertFalse(mkdt("15032-11-12T23:17:59Z") > mkdt("15032-11-12T23:17:59Z")) self.assertRaises(TypeError, operator.gt, mkdt("2002-04-02T18:00:00+02:00"), mkdate("2002-04-03")) with self.assertRaises(TypeError): mkdt("2002-04-02T12:00:00-01:00") > "2002-04-02T17:00:00-01:00" def test_ge_operator(self): mkdt = DateTime.fromstring mkdate = Date.fromstring self.assertTrue(mkdt("2002-04-02T12:00:00-01:00") >= mkdt("2002-04-02T12:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00-01:00") >= mkdt("2002-04-02T17:00:00-01:00")) self.assertTrue(mkdt("2002-04-02T18:00:00+01:00") >= mkdt("2002-04-02T17:00:00Z")) self.assertFalse(mkdt("-2002-01-01T10:00:00") >= mkdt("2001-01-01T17:00:00Z")) self.assertTrue(mkdt("2002-01-01T10:00:00") >= 
mkdt("-2001-01-01T17:00:00Z")) self.assertFalse(mkdt("-2002-01-01T10:00:00") >= mkdt("-2001-01-01T17:00:00Z")) self.assertTrue(mkdt("-3000-06-21T00:00:00Z") >= mkdt("-3000-06-21T00:00:00Z")) self.assertFalse(mkdt("-3000-06-21T00:00:00Z") >= mkdt("-3000-06-21T01:00:00Z")) self.assertTrue(mkdt("15032-11-12T23:17:59Z") >= mkdt("15032-11-12T23:17:59Z")) self.assertRaises(TypeError, operator.ge, mkdt("2002-04-02T18:00:00+02:00"), mkdate("2002-04-03")) with self.assertRaises(TypeError): mkdt("2002-04-02T12:00:00-01:00") >= "2002-04-02T17:00:00-01:00" def test_fromdelta(self): self.assertIsNotNone(Date.fromstring('10000-02-28')) self.assertEqual(Date.fromdelta(datetime.timedelta(days=0)), Date.fromstring("0001-01-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=31)), Date.fromstring("0001-02-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=59)), Date.fromstring("0001-03-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=151)), Date.fromstring("0001-06-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=153)), Date.fromstring("0001-06-03")) self.assertEqual(DateTime.fromdelta(datetime.timedelta(days=153, seconds=72000)), DateTime.fromstring("0001-06-03T20:00:00")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=365)), Date.fromstring("0002-01-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=396)), Date.fromstring("0002-02-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=-366)), Date.fromstring("-0000-01-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=-1)), Date.fromstring("-0000-12-31")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=-335)), Date.fromstring("-0000-02-01")) self.assertEqual(Date.fromdelta(datetime.timedelta(days=-1)), Date.fromstring("-0000-12-31")) self.assertEqual(Date10.fromdelta(datetime.timedelta(days=-366)), Date10.fromstring("-0001-01-01")) self.assertEqual(Date10.fromdelta(datetime.timedelta(days=-326)), Date10.fromstring("-0001-02-10")) 
self.assertEqual(Date10.fromdelta(datetime.timedelta(days=-1)), Date10.fromstring("-0001-12-31Z")) # With timezone adjusting self.assertEqual(Date10.fromdelta(datetime.timedelta(hours=-22), adjust_timezone=True), Date10.fromstring("-0001-12-31-02:00")) self.assertEqual(Date10.fromdelta(datetime.timedelta(hours=-27), adjust_timezone=True), Date10.fromstring("-0001-12-31+03:00")) self.assertEqual( Date10.fromdelta(datetime.timedelta(hours=-27, minutes=-12), adjust_timezone=True), Date10.fromstring("-0001-12-31+03:12") ) self.assertEqual( DateTime10.fromdelta(datetime.timedelta(hours=-27, minutes=-12, seconds=-5)), DateTime10.fromstring("-0001-12-30T20:47:55") ) def test_todelta(self): self.assertEqual(Date.fromstring("0001-01-01").todelta(), datetime.timedelta(days=0)) self.assertEqual(Date.fromstring("0001-02-01").todelta(), datetime.timedelta(days=31)) self.assertEqual(Date.fromstring("0001-03-01").todelta(), datetime.timedelta(days=59)) self.assertEqual(Date.fromstring("0001-06-01").todelta(), datetime.timedelta(days=151)) self.assertEqual(Date.fromstring("0001-06-03").todelta(), datetime.timedelta(days=153)) self.assertEqual(DateTime.fromstring("0001-06-03T20:00:00").todelta(), datetime.timedelta(days=153, seconds=72000)) self.assertEqual(Date.fromstring("0001-01-01-01:00").todelta(), datetime.timedelta(seconds=3600)) self.assertEqual(Date.fromstring("0001-01-01-07:00").todelta(), datetime.timedelta(seconds=3600 * 7)) self.assertEqual(Date.fromstring("0001-01-01+10:00").todelta(), datetime.timedelta(seconds=-3600 * 10)) self.assertEqual(Date.fromstring("0001-01-02+10:00").todelta(), DayTimeDuration.fromstring("PT14H").get_timedelta()) self.assertEqual(Date.fromstring("-0000-12-31-01:00").todelta(), DayTimeDuration.fromstring("-PT23H").get_timedelta()) self.assertEqual(Date10.fromstring("-0001-12-31-01:00").todelta(), DayTimeDuration.fromstring("-PT23H").get_timedelta()) self.assertEqual(Date.fromstring("-0000-12-31+01:00").todelta(), 
DayTimeDuration.fromstring("-P1DT1H").get_timedelta()) self.assertEqual(Date.fromstring("0002-01-01").todelta(), datetime.timedelta(days=365)) self.assertEqual(Date.fromstring("0002-02-01").todelta(), datetime.timedelta(days=396)) self.assertEqual(Date.fromstring("-0000-01-01").todelta(), datetime.timedelta(days=-366)) self.assertEqual(Date.fromstring("-0000-02-01").todelta(), datetime.timedelta(days=-335)) self.assertEqual(Date.fromstring("-0000-12-31").todelta(), datetime.timedelta(days=-1)) self.assertEqual(Date10.fromstring("-0001-01-01").todelta(), datetime.timedelta(days=-366)) self.assertEqual(Date10.fromstring("-0001-02-10").todelta(), datetime.timedelta(days=-326)) self.assertEqual(Date10.fromstring("-0001-12-31Z").todelta(), datetime.timedelta(days=-1)) self.assertEqual(Date10.fromstring("-0001-12-31-02:00").todelta(), datetime.timedelta(hours=-22)) self.assertEqual(Date10.fromstring("-0001-12-31+03:00").todelta(), datetime.timedelta(hours=-27)) self.assertEqual(Date10.fromstring("-0001-12-31+03:00").todelta(), datetime.timedelta(hours=-27)) self.assertEqual(Date10.fromstring("-0001-12-31+03:12").todelta(), datetime.timedelta(hours=-27, minutes=-12)) def test_to_and_from_delta(self): for month, day in [(1, 1), (1, 2), (2, 1), (2, 28), (3, 10), (6, 30), (12, 31)]: fmt1 = '{:04}-%s' % '{:02}-{:02}'.format(month, day) fmt2 = '{}-%s' % '{:02}-{:02}'.format(month, day) days = sum(MONTH_DAYS[m] for m in range(1, month)) + day - 1 for year in range(1, 15000): if year <= 500 or 9900 <= year <= 10100 or random.randint(1, 20) == 1: date_string = fmt1.format(year) if year < 10000 else fmt2.format(year) dt1 = Date10.fromstring(date_string) delta1 = dt1.todelta() delta2 = datetime.timedelta(days=days) self.assertEqual(delta1, delta2, msg="Failed for %r: %r != %r" % (dt1, delta1, delta2)) dt2 = Date10.fromdelta(delta2) self.assertEqual(dt1, dt2, msg="Failed for year %d: %r != %r" % (year, dt1, dt2)) days += 366 if isleap(year if month <= 2 else year + 1) else 365 def 
test_to_and_from_delta_bce(self): for month, day in [(1, 1), (1, 2), (2, 1), (2, 28), (3, 10), (5, 26), (6, 30), (12, 31)]: fmt1 = '-{:04}-%s' % '{:02}-{:02}'.format(month, day) fmt2 = '{}-%s' % '{:02}-{:02}'.format(month, day) days = -sum(MONTH_DAYS_LEAP[m] for m in range(month, 13)) + day - 1 for year in range(-1, -15000, -1): if year >= -500 or -9900 >= year >= -10100 or random.randint(1, 20) == 1: date_string = fmt1.format(abs(year)) if year > -10000 else fmt2.format(year) dt1 = Date10.fromstring(date_string) delta1 = dt1.todelta() delta2 = datetime.timedelta(days=days) self.assertEqual(delta1, delta2, msg="Failed for %r: %r != %r" % (dt1, delta1, delta2)) dt2 = Date10.fromdelta(delta2) self.assertEqual(dt1, dt2, msg="Failed for year %d: %r != %r" % (year, dt1, dt2)) days -= 366 if isleap(year if month <= 2 else year + 1) else 365 def test_add_operator(self): date = Date.fromstring date10 = Date10.fromstring daytime_duration = DayTimeDuration.fromstring self.assertEqual(date("0001-01-01") + daytime_duration('P2D'), date("0001-01-03")) self.assertEqual(date("0001-01-01") + daytime_duration('-P2D'), date("0000-12-30")) self.assertEqual(date("-0001-01-01") + daytime_duration('P2D'), date("-0001-01-03")) self.assertEqual(date("-0001-12-01") + daytime_duration('P30D'), date("-0001-12-31")) self.assertEqual(date("-0001-12-01") + daytime_duration('P31D'), date("0000-01-01")) self.assertEqual(date10("-0001-12-01") + daytime_duration('P31D'), date10("0001-01-01")) self.assertEqual(date("0001-01-01") + YearMonthDuration(months=12), Date(2, 1, 1)) self.assertEqual(date("-0003-01-01") + YearMonthDuration(months=12), Date(-3, 1, 1)) self.assertEqual(date("-0004-01-01") + YearMonthDuration(months=13), Date(-4, 2, 1)) self.assertEqual(date("0001-01-05") + YearMonthDuration(months=25), Date(3, 2, 5)) with self.assertRaises(TypeError) as err: date("0001-01-05") + date("0001-01-01") self.assertEqual(str(err.exception), "wrong type " "for operand Date(1, 1, 1)") with 
self.assertRaises(TypeError) as err: date("0001-01-05") + 10 self.assertEqual(str(err.exception), "wrong type for operand 10") self.assertEqual(Time(13, 30, 00) + daytime_duration('PT3M21S'), Time(13, 33, 21)) self.assertEqual(Time(21, 00, 00) + datetime.timedelta(seconds=105), Time(21, 1, 45)) with self.assertRaises(TypeError) as err: Time(21, 00, 00) + 105 self.assertEqual(str(err.exception), "wrong type for operand 105") def test_sub_operator(self): date = Date.fromstring date10 = Date10.fromstring daytime_duration = DayTimeDuration.fromstring self.assertEqual(date("2002-04-02") - date("2002-04-01"), DayTimeDuration(seconds=86400)) self.assertEqual(date("-2002-04-02") - date("-2002-04-01"), DayTimeDuration(seconds=86400)) self.assertEqual(date("-0002-01-01") - date("-0001-12-31"), DayTimeDuration.fromstring('-P729D')) self.assertEqual(date("-0101-01-01") - date("-0100-12-31"), DayTimeDuration.fromstring('-P729D')) self.assertEqual(date("15032-11-12") - date("15032-11-11"), DayTimeDuration(seconds=86400)) self.assertEqual(date("-9999-11-12") - date("-9999-11-11"), DayTimeDuration(seconds=86400)) self.assertEqual(date("-9999-11-12") - date("-9999-11-12"), DayTimeDuration(seconds=0)) self.assertEqual(date("-9999-11-11") - date("-9999-11-12"), DayTimeDuration(seconds=-86400)) self.assertEqual(date10("-2001-04-02-02:00") - date10("-2001-04-01"), DayTimeDuration.fromstring('P1DT2H')) self.assertEqual(Time(13, 30, 00) - Time(13, 00, 00), daytime_duration('PT30M')) self.assertEqual(Time(13, 30, 00) - Time(13, 59, 59), daytime_duration('-PT29M59S')) self.assertEqual(Time(13, 30, 00) - daytime_duration('PT3M21S'), Time(13, 26, 39)) self.assertEqual(Time(21, 00, 00) - datetime.timedelta(seconds=105), Time(20, 58, 15)) with self.assertRaises(TypeError) as err: Time(21, 00, 00) - 105 self.assertEqual(str(err.exception), "wrong type for operand 105") def test_hashing(self): dt = DateTime.fromstring("2002-04-02T12:00:00-01:00") self.assertIsInstance(hash(dt), int) def 
test_isinstance(self): dt = DateTime.fromstring("2002-04-02T12:00:00-01:00") self.assertIsInstance(dt, DateTime) self.assertIsInstance(dt, DateTime10) self.assertIsInstance(dt, AnyAtomicType) self.assertNotIsInstance(dt, Date10) self.assertNotIsInstance(dt, StringProxy) self.assertNotIsInstance(dt, str) def test_issubclass(self): self.assertTrue(issubclass(AbstractDateTime, AnyAtomicType)) self.assertTrue(issubclass(OrderedDateTime, AnyAtomicType)) self.assertTrue(issubclass(DateTime10, AnyAtomicType)) self.assertTrue(issubclass(Date10, AnyAtomicType)) self.assertTrue(issubclass(GregorianDay, AnyAtomicType)) self.assertTrue(issubclass(GregorianYearMonth, AnyAtomicType)) self.assertFalse(issubclass(DateTime10, Date10)) self.assertFalse(issubclass(DateTime10, StringProxy)) self.assertFalse(issubclass(DateTime10, str)) class DurationTypesTest(unittest.TestCase): def test_init(self): self.assertIsInstance(Duration(months=1, seconds=37000), Duration) with self.assertRaises(ValueError) as err: Duration(months=-1, seconds=1) self.assertEqual(str(err.exception), "signs differ: (months=-1, seconds=1)") seconds = Decimal('1.0100001') self.assertNotEqual(Duration(seconds=seconds).seconds, seconds) with self.assertRaises(OverflowError): Duration(months=2 ** 32) with self.assertRaises(OverflowError): Duration(seconds=Decimal('1' * 40)) self.assertEqual(DayTimeDuration(300).seconds, 300) self.assertEqual(YearMonthDuration(10).months, 10) def test_init_fromstring(self): self.assertIsInstance(Duration.fromstring('P1Y'), Duration) self.assertIsInstance(Duration.fromstring('P1M'), Duration) self.assertIsInstance(Duration.fromstring('P1D'), Duration) self.assertIsInstance(Duration.fromstring('PT0H'), Duration) self.assertIsInstance(Duration.fromstring('PT1M'), Duration) self.assertIsInstance(Duration.fromstring('PT0.0S'), Duration) self.assertRaises(ValueError, Duration.fromstring, 'P') self.assertRaises(ValueError, Duration.fromstring, 'PT') self.assertRaises(ValueError, 
Duration.fromstring, '1Y') self.assertRaises(ValueError, Duration.fromstring, 'P1W1DT5H3M23.9S') self.assertRaises(ValueError, Duration.fromstring, 'P1.5Y') self.assertRaises(ValueError, Duration.fromstring, 'PT1.1H') self.assertRaises(ValueError, Duration.fromstring, 'P1.0DT5H3M23.9S') self.assertIsInstance(DayTimeDuration.fromstring('PT0.0S'), DayTimeDuration) with self.assertRaises(ValueError) as err: DayTimeDuration.fromstring('P1MT0.0S') self.assertEqual(str(err.exception), "months must be 0 for 'DayTimeDuration'") self.assertIsInstance(YearMonthDuration.fromstring('P1Y'), YearMonthDuration) with self.assertRaises(ValueError) as err: YearMonthDuration.fromstring('P1YT10S') self.assertEqual(str(err.exception), "seconds must be 0 for 'YearMonthDuration'") def test_string_representation(self): self.assertEqual(repr(Duration(months=1, seconds=86400)), 'Duration(months=1, seconds=86400)') self.assertEqual(repr(Duration.fromstring('P3Y1D')), 'Duration(months=36, seconds=86400)') self.assertEqual(repr(YearMonthDuration.fromstring('P3Y6M')), 'YearMonthDuration(months=42)') self.assertEqual(repr(DayTimeDuration.fromstring('P1DT6H')), 'DayTimeDuration(seconds=108000)') def test_as_string(self): self.assertEqual(str(Duration.fromstring('P3Y1D')), 'P3Y1D') self.assertEqual(str(Duration.fromstring('PT2M10.4S')), 'PT2M10.4S') self.assertEqual(str(Duration.fromstring('PT2400H')), 'P100D') self.assertEqual(str(Duration.fromstring('-P15M')), '-P1Y3M') self.assertEqual(str(Duration.fromstring('-P809YT3H5M5S')), '-P809YT3H5M5S') self.assertEqual(str(Duration.fromstring('-PT1H8S')), '-PT1H8S') self.assertEqual(str(Duration.fromstring('PT2H5M')), 'PT2H5M') self.assertEqual(str(Duration.fromstring('P0Y')), 'PT0S') self.assertEqual(str(YearMonthDuration.fromstring('P3Y6M')), 'P3Y6M') self.assertEqual(str(YearMonthDuration.fromstring('-P3Y6M')), '-P3Y6M') self.assertEqual(str(YearMonthDuration.fromstring('P7M')), 'P7M') self.assertEqual(str(YearMonthDuration.fromstring('P2Y')), 
'P2Y') self.assertEqual(str(DayTimeDuration.fromstring('P1DT6H')), 'P1DT6H') def test_eq(self): self.assertEqual(Duration.fromstring('PT147.5S'), (0, 147.5)) self.assertEqual(Duration.fromstring('PT147.3S'), (0, Decimal("147.3"))) self.assertEqual(Duration.fromstring('PT2M10.4S'), (0, Decimal("130.4"))) self.assertEqual(Duration.fromstring('PT5H3M23.9S'), (0, Decimal("18203.9"))) self.assertEqual(Duration.fromstring('P1DT5H3M23.9S'), (0, Decimal("104603.9"))) self.assertEqual(Duration.fromstring('P31DT5H3M23.9S'), (0, Decimal("2696603.9"))) self.assertEqual(Duration.fromstring('P1Y1DT5H3M23.9S'), (12, Decimal("104603.9"))) self.assertEqual(Duration.fromstring('-P809YT3H5M5S'), (-9708, -11105)) self.assertEqual(Duration.fromstring('P15M'), (15, 0)) self.assertEqual(Duration.fromstring('P1Y'), (12, 0)) self.assertEqual(Duration.fromstring('P3Y1D'), (36, 3600 * 24)) self.assertEqual(Duration.fromstring('PT2400H'), (0, 8640000)) self.assertEqual(Duration.fromstring('PT4500M'), (0, 4500 * 60)) self.assertEqual(Duration.fromstring('PT4500M70S'), (0, 4500 * 60 + 70)) self.assertEqual(Duration.fromstring('PT5529615.3S'), (0, Decimal('5529615.3'))) self.assertEqual(Duration.fromstring('P3Y1D'), UntypedAtomic('P3Y1D')) self.assertFalse(Duration.fromstring('P3Y1D') == UntypedAtomic('P3Y2D')) def test_ne(self): self.assertNotEqual(Duration.fromstring('PT147.3S'), None) self.assertNotEqual(Duration.fromstring('PT147.3S'), (0, 147.3)) self.assertNotEqual(Duration.fromstring('P3Y1D'), (36, 3600 * 2)) self.assertNotEqual(Duration.fromstring('P3Y1D'), (36, 3600 * 24, 0)) self.assertNotEqual(Duration.fromstring('P3Y1D'), None) self.assertNotEqual(Duration.fromstring('P3Y1D'), Duration.fromstring('P3Y2D')) self.assertNotEqual(Duration.fromstring('P3Y1D'), YearMonthDuration.fromstring('P3Y')) self.assertNotEqual(Duration.fromstring('P3Y1D'), UntypedAtomic('P3Y2D')) self.assertFalse(Duration.fromstring('P3Y1D') != UntypedAtomic('P3Y1D')) def test_lt(self): 
self.assertTrue(Duration(months=15) < Duration(months=16)) self.assertFalse(Duration(months=16) < Duration(months=16)) self.assertTrue(Duration(months=16) < Duration.fromstring('P16M1D')) self.assertTrue(Duration(months=16) < Duration.fromstring('P16MT1H')) self.assertTrue(Duration(months=16) < Duration.fromstring('P16MT1M')) self.assertTrue(Duration(months=16) < Duration.fromstring('P16MT1S')) self.assertFalse(Duration(months=16) < Duration.fromstring('P16MT0S')) self.assertTrue(Time(20, 15, 0) < Time(21, 0, 0)) self.assertFalse(Time(21, 15, 0) < Time(21, 0, 0)) with self.assertRaises(TypeError) as err: _ = Duration(months=16) < 16 self.assertEqual(str(err.exception), "wrong type for operand 16") def test_le(self): self.assertTrue(Duration(months=15) <= Duration(months=16)) self.assertTrue(Duration(months=16) <= Duration(16)) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16M1D')) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16MT1H')) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16MT1M')) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16MT1S')) self.assertTrue(Duration(months=16) <= Duration.fromstring('P16MT0S')) self.assertTrue(Time(11, 10, 35) <= Time(11, 10, 35)) self.assertFalse(Time(11, 10, 35) <= Time(11, 10, 34)) def test_gt(self): self.assertTrue(Duration(months=16) > Duration(15)) self.assertFalse(Duration(months=16) > Duration(16)) self.assertFalse(Time(23, 59, 59) > Time(23, 59, 59)) self.assertTrue(Time(9, 0, 0) > Time(8, 59, 59)) def test_ge(self): self.assertTrue(Duration(16) >= Duration(15)) self.assertTrue(Duration(16) >= Duration(16)) self.assertTrue(Duration.fromstring('P1Y1DT1S') >= Duration.fromstring('P1Y1D')) self.assertTrue(Time(23, 59, 59) >= Time(23, 59, 59)) self.assertFalse(Time(23, 59, 58) >= Time(23, 59, 59)) def test_incomparable_values(self): self.assertFalse(Duration(1) < Duration.fromstring('P30D')) self.assertFalse(Duration(1) <= Duration.fromstring('P30D')) 
self.assertFalse(Duration(1) > Duration.fromstring('P30D')) self.assertFalse(Duration(1) >= Duration.fromstring('P30D')) def test_add_operator(self): daytime_duration = DayTimeDuration.fromstring year_month_duration = YearMonthDuration.fromstring self.assertEqual(daytime_duration('P2D') + daytime_duration('P1D'), DayTimeDuration(seconds=86400 * 3)) self.assertEqual(daytime_duration('P2D') + Date10(1999, 8, 12), Date10(1999, 8, 14)) self.assertEqual(year_month_duration('P2Y') + year_month_duration('P1Y'), YearMonthDuration(months=36)) self.assertEqual(year_month_duration('P2Y') + Date10(1999, 8, 12), Date10(2001, 8, 12)) with self.assertRaises(TypeError) as err: _ = year_month_duration('P2Y') + daytime_duration('P1D') self.assertIn("cannot add Base64Binary(b'YWxwaGE=') with self.assertRaises(TypeError): _ = HexBinary(b'F859') > HexBinary(b'F859') self.assertGreater(HexBinary(b'F859', ordered=True), HexBinary(b'F858')) self.assertFalse(HexBinary(b'F859', ordered=True) > HexBinary(b'F859')) self.assertFalse(HexBinary(b'F858', ordered=True) > HexBinary(b'F859')) def test_greater_or_equal(self): with self.assertRaises(TypeError): _ = HexBinary(b'F859') >= Base64Binary(b'YWxwaGE=') with self.assertRaises(TypeError): _ = HexBinary(b'F859') >= HexBinary(b'F859') self.assertGreaterEqual(HexBinary(b'F859', ordered=True), HexBinary(b'F859')) self.assertGreaterEqual(HexBinary(b'F859', ordered=True), HexBinary(b'F858')) self.assertFalse(HexBinary(b'F858', ordered=True) >= HexBinary(b'F859')) def test_validate(self): self.assertIsNone(Base64Binary.validate(Base64Binary(b'YWxwaGE='))) self.assertIsNone(Base64Binary.validate(b'YWxwaGE=')) with self.assertRaises(TypeError): Base64Binary.validate(67) self.assertIsNone(Base64Binary.validate(b' ')) with self.assertRaises(ValueError): Base64Binary.validate('FF') self.assertIsNone(HexBinary.validate(HexBinary(b'F859'))) self.assertIsNone(HexBinary.validate(b'F859')) with self.assertRaises(TypeError): HexBinary.validate(67) 
self.assertIsNone(HexBinary.validate(b' ')) with self.assertRaises(ValueError): HexBinary.validate('XY') def test_encoder(self): self.assertEqual(Base64Binary.encoder(b'alpha'), b'YWxwaGE=') def test_decoder(self): try: self.assertEqual(Base64Binary(b'YWxwaGE=').decode(), b'alpha') except TypeError: # Issue #3001 of pypy3.6 with codecs.decode(), fixed with PyPy 7.2.0. if platform.python_implementation() != 'PyPy': raise def test_isinstance(self): value = Base64Binary(b'YWxwaGE=') self.assertIsInstance(value, Base64Binary) self.assertIsInstance(value, AnyAtomicType) self.assertNotIsInstance(value, HexBinary) self.assertNotIsInstance(value, StringProxy) self.assertNotIsInstance(value, bytes) value = HexBinary(b'F859') self.assertIsInstance(value, HexBinary) self.assertIsInstance(value, AnyAtomicType) self.assertNotIsInstance(value, Base64Binary) self.assertNotIsInstance(value, StringProxy) self.assertNotIsInstance(value, bytes) def test_issubclass(self): self.assertTrue(issubclass(Base64Binary, AnyAtomicType)) self.assertFalse(issubclass(Base64Binary, HexBinary)) self.assertFalse(issubclass(Base64Binary, StringProxy)) self.assertFalse(issubclass(Base64Binary, bytes)) self.assertTrue(issubclass(HexBinary, AnyAtomicType)) self.assertFalse(issubclass(HexBinary, Base64Binary)) self.assertFalse(issubclass(HexBinary, StringProxy)) self.assertFalse(issubclass(HexBinary, bytes)) class QNameTypesTest(unittest.TestCase): def test_initialization(self): qname = QName(None, 'foo') self.assertEqual(qname.namespace, '') self.assertEqual(qname.local_name, 'foo') self.assertIsNone(qname.prefix) self.assertEqual(qname.expanded_name, 'foo') self.assertEqual(qname.braced_uri_name, 'Q{}foo') with self.assertRaises(ValueError) as ctx: QName(None, 'tns:foo') self.assertIn('non-empty prefix with no namespace', str(ctx.exception)) with self.assertRaises(TypeError) as ctx: QName(10, 'foo') self.assertIn("invalid type ", str(ctx.exception)) qname = QName('http://xpath.test/ns', 'foo') 
self.assertEqual(qname.namespace, 'http://xpath.test/ns') self.assertEqual(qname.local_name, 'foo') self.assertIsNone(qname.prefix) self.assertEqual(qname.expanded_name, '{http://xpath.test/ns}foo') self.assertEqual(qname.braced_uri_name, 'Q{http://xpath.test/ns}foo') qname = QName('http://xpath.test/ns', 'tst:foo') self.assertEqual(qname.namespace, 'http://xpath.test/ns') self.assertEqual(qname.local_name, 'foo') self.assertEqual(qname.prefix, 'tst') self.assertEqual(qname.expanded_name, '{http://xpath.test/ns}foo') def test_string_representation(self): qname = QName('http://xpath.test/ns', 'tst:foo') self.assertEqual(repr(qname), "QName(uri='http://xpath.test/ns', qname='tst:foo')") qname = QName(uri=None, qname='foo') self.assertEqual(repr(qname), "QName(uri='', qname='foo')") qname = QName(uri='', qname='foo') self.assertEqual(repr(qname), "QName(uri='', qname='foo')") def test_hash_value(self): qname = QName('http://xpath.test/ns', 'tst:foo') self.assertEqual(hash(qname), hash(('http://xpath.test/ns', 'foo'))) def test_equivalence(self): qname1 = QName('http://xpath.test/ns1', 'tst1:foo') qname2 = QName('http://xpath.test/ns1', 'tst2:foo') qname3 = QName('http://xpath.test/ns2', 'tst2:foo') self.assertEqual(qname1, qname2) self.assertNotEqual(qname1, qname3) self.assertNotEqual(qname2, qname3) self.assertEqual(qname1, 'tst1:foo') with self.assertRaises(TypeError) as ctx: _ = qname1 == 1 self.assertIn('cannot compare', str(ctx.exception)) def test_isinstance(self): qname = QName('http://xpath.test/ns', 'tst:foo') self.assertIsInstance(qname, QName) self.assertIsInstance(qname, AnyAtomicType) self.assertNotIsInstance(qname, Notation) self.assertNotIsInstance(qname, StringProxy) self.assertNotIsInstance(qname, str) def test_issubclass(self): self.assertTrue(issubclass(QName, AnyAtomicType)) self.assertFalse(issubclass(QName, Notation)) self.assertFalse(issubclass(QName, StringProxy)) self.assertFalse(issubclass(QName, str)) def test_notation(self): with 
self.assertRaises(TypeError) as ec: Notation(None, 'foo') self.assertEqual(str(ec.exception), "can't instantiate xs:NOTATION objects") class EffectiveNotation(Notation): def __init__(self, uri, qname): super().__init__(uri, qname) notation = EffectiveNotation(None, 'foo') self.assertEqual(notation, QName(None, 'foo')) notation = EffectiveNotation('http://xpath.test/ns1', 'tst1:foo') self.assertEqual(notation, QName('http://xpath.test/ns1', 'tst2:foo')) self.assertEqual(hash(notation), hash(('http://xpath.test/ns1', 'foo'))) self.assertIsInstance(notation, Notation) self.assertIsInstance(notation, AnyAtomicType) self.assertNotIsInstance(notation, StringProxy) self.assertNotIsInstance(notation, str) self.assertTrue(issubclass(Notation, AnyAtomicType)) self.assertFalse(issubclass(Notation, QName)) self.assertFalse(issubclass(Notation, StringProxy)) self.assertFalse(issubclass(Notation, str)) class AnyUriTest(unittest.TestCase): def test_init(self): uri = AnyURI('http://xpath.test') self.assertEqual(uri, 'http://xpath.test') self.assertEqual(AnyURI(b'http://xpath.test'), 'http://xpath.test') self.assertEqual(AnyURI(uri), uri) self.assertEqual(AnyURI(UntypedAtomic('http://xpath.test')), uri) with self.assertRaises(TypeError): AnyURI(1) def test_string_representation(self): self.assertEqual(repr(AnyURI('http://xpath.test')), "AnyURI('http://xpath.test')") self.assertEqual(str(AnyURI('http://xpath.test')), 'http://xpath.test') def test_bool_value(self): self.assertTrue(bool(AnyURI('http://xpath.test'))) self.assertFalse(bool(AnyURI(''))) def test_hash_value(self): self.assertEqual(hash(AnyURI('http://xpath.test')), hash('http://xpath.test')) def test_in_operator(self): uri = AnyURI('http://xpath.test') self.assertIn('xpath', uri) self.assertNotIn('example', uri) def test_comparison_operators(self): uri = AnyURI('http://xpath.test') self.assertTrue(uri != 'http://example.test') self.assertTrue(uri != AnyURI('http://example.test')) with self.assertRaises(TypeError): _ = uri 
== 10 with self.assertRaises(TypeError): _ = uri != 10 self.assertLess(AnyURI('1'), AnyURI('2')) self.assertLess(AnyURI('1'), '2') self.assertLessEqual(AnyURI('1'), AnyURI('1')) self.assertLessEqual(AnyURI('1'), '1') self.assertGreater(AnyURI('2'), AnyURI('1')) self.assertGreater(AnyURI('2'), '1') self.assertGreaterEqual(AnyURI('1'), AnyURI('1')) self.assertGreaterEqual(AnyURI('1'), '1') def test_validate(self): uri = AnyURI('http://xpath.test') self.assertIsNone(AnyURI.validate(uri)) self.assertIsNone(AnyURI.validate(b'http://xpath.test')) self.assertIsNone(AnyURI.validate('http://xpath.test')) with self.assertRaises(TypeError): AnyURI.validate(1) with self.assertRaises(ValueError): AnyURI.validate('http:://xpath.test') with self.assertRaises(ValueError): AnyURI.validate('http://[xpath.test') def test_isinstance(self): uri = AnyURI('http://xpath.test') self.assertIsInstance(uri, AnyURI) self.assertIsInstance(uri, AnyAtomicType) self.assertNotIsInstance(uri, StringProxy) self.assertNotIsInstance(uri, str) def test_issubclass(self): self.assertTrue(issubclass(AnyURI, AnyAtomicType)) self.assertFalse(issubclass(AnyURI, StringProxy)) self.assertFalse(issubclass(AnyURI, str)) class TypeProxiesTest(unittest.TestCase): def test_numeric_proxy(self): self.assertIsInstance(10, NumericProxy) self.assertIsInstance(17.8, NumericProxy) self.assertIsInstance(Decimal('18.12'), NumericProxy) self.assertNotIsInstance(True, NumericProxy) self.assertNotIsInstance(Duration.fromstring('P1Y'), NumericProxy) self.assertEqual(NumericProxy(), 0.0) self.assertEqual(NumericProxy(9), 9.0) self.assertEqual(NumericProxy('49'), 49.0) self.assertFalse(issubclass(bool, NumericProxy)) self.assertFalse(issubclass(str, NumericProxy)) self.assertTrue(issubclass(int, NumericProxy)) self.assertTrue(issubclass(float, NumericProxy)) self.assertTrue(issubclass(Decimal, NumericProxy)) self.assertFalse(issubclass(DateTime10, NumericProxy)) def test_arithmetic_proxy(self): self.assertIsInstance(10, 
        ArithmeticProxy)
        self.assertEqual(ArithmeticProxy(), 0.0)
        self.assertEqual(ArithmeticProxy(8.0), 8.0)
        self.assertEqual(ArithmeticProxy('81.0'), 81.0)
        self.assertFalse(issubclass(bool, ArithmeticProxy))
        self.assertFalse(issubclass(str, ArithmeticProxy))
        self.assertTrue(issubclass(int, ArithmeticProxy))
        self.assertTrue(issubclass(float, ArithmeticProxy))
        self.assertTrue(issubclass(Decimal, ArithmeticProxy))

    def test_boolean_proxy(self):
        self.assertTrue(BooleanProxy(1))
        self.assertFalse(BooleanProxy(float('nan')))
        self.assertIsNone(BooleanProxy.validate(True))
        self.assertIsNone(BooleanProxy.validate('true'))
        self.assertIsNone(BooleanProxy.validate('1'))
        self.assertIsNone(BooleanProxy.validate('false'))
        self.assertIsNone(BooleanProxy.validate('0'))
        with self.assertRaises(TypeError):
            BooleanProxy.validate(1)
        with self.assertRaises(ValueError):
            BooleanProxy.validate('2')
        self.assertIsInstance(False, BooleanProxy)
        self.assertIsInstance(True, BooleanProxy)
        self.assertNotIsInstance(0, BooleanProxy)
        self.assertNotIsInstance(1, BooleanProxy)
        self.assertNotIsInstance('0', BooleanProxy)
        self.assertNotIsInstance('1', BooleanProxy)
        self.assertTrue(issubclass(BooleanProxy, AnyAtomicType))
        self.assertFalse(issubclass(BooleanProxy, int))

    def test_decimal_proxy(self):
        self.assertIsInstance(DecimalProxy(20.0), Decimal)
        self.assertEqual(Decimal('10'), DecimalProxy('10'))
        self.assertEqual(Decimal('10'), DecimalProxy(Decimal('10')))
        self.assertEqual(Decimal('10.0'), DecimalProxy(10.0))
        self.assertEqual(Decimal(1), DecimalProxy(True))
        with self.assertRaises(TypeError):
            DecimalProxy(None)
        with self.assertRaises(ArithmeticError):
            DecimalProxy([])
        with self.assertRaises(ValueError):
            DecimalProxy('false')
        with self.assertRaises(ValueError):
            DecimalProxy('INF')
        with self.assertRaises(ValueError):
            DecimalProxy('NaN')
        with self.assertRaises(ValueError):
            DecimalProxy(float('nan'))
        with self.assertRaises(ValueError):
            DecimalProxy(float('inf'))
        self.assertIsNone(DecimalProxy.validate(Decimal(-2.0)))
        self.assertIsNone(DecimalProxy.validate(17))
        self.assertIsNone(DecimalProxy.validate('17'))
        with self.assertRaises(ValueError):
            DecimalProxy.validate(Decimal('nan'))
        with self.assertRaises(ValueError):
            DecimalProxy.validate('alpha')
        with self.assertRaises(TypeError):
            DecimalProxy.validate(True)
        self.assertIsInstance(1, DecimalProxy)
        self.assertIsInstance(-5, DecimalProxy)
        self.assertIsInstance(Decimal('9.0'), DecimalProxy)
        self.assertIsInstance(Integer(-5), DecimalProxy)
        self.assertNotIsInstance(True, DecimalProxy)
        self.assertNotIsInstance(1.0, DecimalProxy)
        self.assertNotIsInstance('1', DecimalProxy)
        self.assertTrue(issubclass(DecimalProxy, AnyAtomicType))
        self.assertFalse(issubclass(DecimalProxy, int))

    def test_double_proxy(self):
        self.assertIsInstance(DoubleProxy10(20), float)
        self.assertEqual(DoubleProxy10('10'), 10.0)
        self.assertTrue(math.isnan(DoubleProxy10('NaN')))
        self.assertTrue(math.isinf(DoubleProxy10('INF')))
        self.assertTrue(math.isinf(DoubleProxy10('-INF')))
        # noinspection PyTypeChecker
        self.assertTrue(math.isinf(DoubleProxy('+INF')))
        with self.assertRaises(ValueError):
            DoubleProxy10('+INF')
        with self.assertRaises(ValueError):
            DoubleProxy('nan')
        with self.assertRaises(ValueError):
            DoubleProxy('inf')
        self.assertIs(DoubleProxy10('NaN'), DoubleProxy10('NaN'))
        self.assertIs(DoubleProxy10('NaN'), DoubleProxy('NaN'))
        self.assertIsNone(DoubleProxy10.validate(1.9))
        self.assertIsNone(DoubleProxy10.validate('1.9'))
        with self.assertRaises(TypeError):
            DoubleProxy10.validate(Float10('1.9'))
        with self.assertRaises(ValueError):
            DoubleProxy10.validate('six')
        self.assertIsInstance(1.0, DoubleProxy10)
        self.assertIsInstance(-5.9, DoubleProxy10)
        self.assertNotIsInstance(Decimal('9.0'), DoubleProxy10)
        self.assertNotIsInstance(Integer(-5), DoubleProxy10)
        self.assertNotIsInstance(True, DoubleProxy10)
        self.assertNotIsInstance(1, DoubleProxy10)
        self.assertNotIsInstance('1', DoubleProxy10)
        self.assertTrue(issubclass(DoubleProxy10, AnyAtomicType))
        self.assertFalse(issubclass(DoubleProxy10, float))

    def test_string_proxy(self):
        self.assertIsInstance(StringProxy(20), str)
        self.assertIsNone(StringProxy.validate('alpha'))
        with self.assertRaises(TypeError):
            StringProxy.validate(b'alpha')
        self.assertIsInstance('abc', StringProxy)
        self.assertIsInstance(NormalizedString('abc'), StringProxy)
        self.assertNotIsInstance(Decimal('9.0'), StringProxy)
        self.assertNotIsInstance(Integer(-5), StringProxy)
        self.assertNotIsInstance(True, StringProxy)
        self.assertNotIsInstance(1, StringProxy)
        self.assertNotIsInstance(1.0, StringProxy)
        self.assertTrue(issubclass(StringProxy, AnyAtomicType))
        self.assertFalse(issubclass(StringProxy, str))


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_elementpath.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
#
# Note: Many tests in imported modules are built using the examples of the
# XPath standards, published by W3C under the W3C Document License.
#
# References:
#   http://www.w3.org/TR/1999/REC-xpath-19991116/
#   http://www.w3.org/TR/2010/REC-xpath20-20101214/
#   http://www.w3.org/TR/2010/REC-xpath-functions-20101214/
#   https://www.w3.org/Consortium/Legal/2015/doc-license
#   https://www.w3.org/TR/charmod-norm/
#
if __name__ == '__main__':
    import unittest
    import os

    def load_tests(loader, tests, pattern):
        tests_dir = os.path.dirname(__file__)
        tests.addTests(loader.discover(start_dir=tests_dir, pattern=pattern or 'test*.py'))
        return tests

    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_etree.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import sys
import unittest
import platform
import importlib
import io
from pathlib import Path

try:
    import lxml.etree as lxml_etree
except ImportError:
    lxml_etree = None

from elementpath.etree import ElementTree, PyElementTree, \
    SafeXMLParser, defuse_xml, etree_tostring, is_etree_document, \
    is_lxml_etree_element, is_lxml_etree_document, etree_deep_equal, \
    etree_iter_paths

XML_WITH_NAMESPACES = '\n' \
                      ' \n' \
                      ''


class TestElementTree(unittest.TestCase):

    @unittest.skipUnless(platform.python_implementation() == 'CPython', "requires CPython")
    def test_imported_modules(self):
        self.assertIs(importlib.import_module('xml.etree.ElementTree'), ElementTree)
        self.assertIs(importlib.import_module('xml.etree').ElementTree, ElementTree)
        self.assertIsNot(ElementTree.Element, ElementTree._Element_Py,
                         msg="cElementTree is not available!")

    def test_element_string_serialization(self):
        self.assertRaises(TypeError, etree_tostring, '')
        elem = ElementTree.Element('element')
        self.assertEqual(etree_tostring(elem), '')
self.assertEqual(etree_tostring(elem, xml_declaration=True), '') self.assertEqual(etree_tostring(elem, encoding='us-ascii'), b'') self.assertEqual(etree_tostring(elem, encoding='us-ascii', indent=' '), b' ') self.assertEqual(etree_tostring(elem, encoding='us-ascii', xml_declaration=True), b'\n') elem.text = '\t' self.assertEqual(etree_tostring(elem), ' ') self.assertEqual(etree_tostring(elem, spaces_for_tab=2), ' ') self.assertEqual(etree_tostring(elem, spaces_for_tab=0), '') self.assertEqual(etree_tostring(elem, spaces_for_tab=None), '\t') elem.text = '\n\n' self.assertEqual(etree_tostring(elem), '\n\n') self.assertEqual(etree_tostring(elem, indent=' '), ' \n\n ') elem.text = '\nfoo\n' self.assertEqual(etree_tostring(elem), '\nfoo\n') self.assertEqual(etree_tostring(elem, indent=' '), ' \n foo\n ') elem.text = None self.assertEqual(etree_tostring(elem, encoding='ascii'), b"\n") self.assertEqual(etree_tostring(elem, encoding='ascii', xml_declaration=False), b'') self.assertEqual(etree_tostring(elem, encoding='utf-8'), b'') self.assertEqual(etree_tostring(elem, encoding='utf-8', xml_declaration=True), b'\n') self.assertEqual(etree_tostring(elem, encoding='iso-8859-1'), b"\n") self.assertEqual(etree_tostring(elem, encoding='iso-8859-1', xml_declaration=False), b"") self.assertEqual(etree_tostring(elem, method='html'), '') self.assertEqual(etree_tostring(elem, method='text'), '') root = ElementTree.XML('\n' ' text1\n' ' text2\n' '') self.assertEqual(etree_tostring(root, method='text'), '\n text1\n text2') self.assertEqual(etree_tostring(root, max_lines=1), '\n ...\n ...\n') root = ElementTree.XML(XML_WITH_NAMESPACES) result = etree_tostring(root) self.assertNotEqual(result, XML_WITH_NAMESPACES) self.assertNotIn('pxa', result) self.assertNotIn('pxa', result) self.assertRegex(result, r'xmlns:ns\d="http://xpath.test/nsa') self.assertRegex(result, r'xmlns:ns\d="http://xpath.test/nsb') namespaces = { 'pxa': "http://xpath.test/nsa", 'pxb': "http://xpath.test/nsb" } expected 
= '\n' \ ' \n' \ '' self.assertEqual(etree_tostring(root, namespaces), expected) namespaces = { '': "http://xpath.test/nsa", 'pxa': "http://xpath.test/nsa", 'pxb': "http://xpath.test/nsb" } self.assertEqual(etree_tostring(root, namespaces), expected) namespaces = { '': "http://xpath.test/nsa", 'pxb': "http://xpath.test/nsb" } expected = '\n' \ ' \n' \ '' self.assertEqual(etree_tostring(root, namespaces), expected) def test_py_element_string_serialization(self): elem = PyElementTree.Element('element') self.assertEqual(etree_tostring(elem), '') self.assertEqual(etree_tostring(elem, xml_declaration=True), '') self.assertEqual(etree_tostring(elem, encoding='us-ascii'), b'') self.assertEqual(etree_tostring(elem, encoding='us-ascii', indent=' '), b' ') elem.text = '\t' self.assertEqual(etree_tostring(elem), ' ') self.assertEqual(etree_tostring(elem, spaces_for_tab=2), ' ') self.assertEqual(etree_tostring(elem, spaces_for_tab=0), '') self.assertEqual(etree_tostring(elem, spaces_for_tab=None), '\t') elem.text = None self.assertEqual(etree_tostring(elem, encoding='us-ascii'), b'') self.assertEqual(etree_tostring(elem, encoding='us-ascii', xml_declaration=True), b'\n') self.assertEqual(etree_tostring(elem, encoding='ascii'), b"\n") self.assertEqual(etree_tostring(elem, encoding='ascii', xml_declaration=False), b'') self.assertEqual(etree_tostring(elem, encoding='utf-8'), b'') self.assertEqual(etree_tostring(elem, encoding='utf-8', xml_declaration=True), b'\n') self.assertEqual(etree_tostring(elem, encoding='iso-8859-1'), b"\n") self.assertEqual(etree_tostring(elem, encoding='iso-8859-1', xml_declaration=False), b"") self.assertEqual(etree_tostring(elem, method='html'), '') self.assertEqual(etree_tostring(elem, method='text'), '') root = PyElementTree.XML('\n' ' text1\n' ' text2\n' '') self.assertEqual(etree_tostring(root, method='text'), '\n text1\n text2') root = PyElementTree.XML(XML_WITH_NAMESPACES) result = etree_tostring(root) self.assertNotEqual(result, 
XML_WITH_NAMESPACES) self.assertNotIn('pxa', result) self.assertNotIn('pxa', result) self.assertRegex(result, r'xmlns:ns\d="http://xpath.test/nsa') self.assertRegex(result, r'xmlns:ns\d="http://xpath.test/nsb') namespaces = { 'pxa': "http://xpath.test/nsa", 'pxb': "http://xpath.test/nsb" } expected = '\n' \ ' \n' \ '' self.assertEqual(etree_tostring(root, namespaces), expected) @unittest.skipIf(lxml_etree is None, 'lxml is not installed ...') def test_lxml_element_string_serialization(self): elem = lxml_etree.Element('element') self.assertEqual(etree_tostring(elem), '') self.assertEqual(etree_tostring(elem, xml_declaration=True), '') self.assertEqual(etree_tostring(elem, encoding='us-ascii'), b'') self.assertEqual(etree_tostring(elem, encoding='us-ascii', indent=' '), b' ') elem.text = '\t' self.assertEqual(etree_tostring(elem), ' ') self.assertEqual(etree_tostring(elem, spaces_for_tab=2), ' ') self.assertEqual(etree_tostring(elem, spaces_for_tab=0), '') self.assertEqual(etree_tostring(elem, spaces_for_tab=None), '\t') elem.text = None self.assertEqual(etree_tostring(elem, encoding='us-ascii'), b'') self.assertEqual(etree_tostring(elem, encoding='us-ascii', xml_declaration=True), b'\n') self.assertEqual(etree_tostring(elem, encoding='ascii'), b'') self.assertEqual(etree_tostring(elem, encoding='ascii', xml_declaration=True), b'\n') self.assertEqual(etree_tostring(elem, encoding='utf-8'), b'') self.assertEqual(etree_tostring(elem, encoding='utf-8', xml_declaration=True), b'\n') self.assertEqual(etree_tostring(elem, encoding='iso-8859-1'), b"\n") self.assertEqual(etree_tostring(elem, encoding='iso-8859-1', xml_declaration=False), b"") self.assertEqual(etree_tostring(elem, method='html'), '') self.assertEqual(etree_tostring(elem, method='text'), '') root = lxml_etree.XML('\n' ' text1\n' ' text2\n' '') self.assertEqual(etree_tostring(root, method='text'), '\n text1\n text2') root = lxml_etree.XML(XML_WITH_NAMESPACES) self.assertEqual(etree_tostring(root), 
XML_WITH_NAMESPACES) namespaces = { 'tns0': "http://xpath.test/nsa", 'tns1': "http://xpath.test/nsb" } self.assertEqual(etree_tostring(root, namespaces), XML_WITH_NAMESPACES) for prefix, uri in namespaces.items(): lxml_etree.register_namespace(prefix, uri) self.assertEqual(etree_tostring(root), XML_WITH_NAMESPACES) def test_defuse_xml_entities(self): xml_file = Path(__file__).parent.joinpath('resources/with_entity.xml') elem = ElementTree.parse(str(xml_file)).getroot() self.assertEqual(elem.text, 'abc') parser = SafeXMLParser(target=PyElementTree.TreeBuilder()) with self.assertRaises(PyElementTree.ParseError) as ctx: ElementTree.parse(xml_file, parser=parser) self.assertEqual("Entities are forbidden (entity_name='e')", str(ctx.exception)) with self.assertRaises(PyElementTree.ParseError) as ctx: with xml_file.open() as fp: defuse_xml(fp.read()) self.assertEqual("Entities are forbidden (entity_name='e')", str(ctx.exception)) def test_defuse_xml_external_entities(self): xml_file = Path(__file__).parent.joinpath('resources/external_entity.xml') with self.assertRaises(ElementTree.ParseError) as ctx: ElementTree.parse(str(xml_file)) self.assertIn("undefined entity &ee", str(ctx.exception)) parser = SafeXMLParser(target=PyElementTree.TreeBuilder()) with self.assertRaises(PyElementTree.ParseError) as ctx: ElementTree.parse(str(xml_file), parser=parser) self.assertEqual("Entities are forbidden (entity_name='ee')", str(ctx.exception)) with self.assertRaises(PyElementTree.ParseError) as ctx: with xml_file.open() as fp: defuse_xml(fp.read()) self.assertEqual("Entities are forbidden (entity_name='ee')", str(ctx.exception)) def test_defuse_xml_unused_external_entities(self): xml_file = str(Path(__file__).parent.joinpath('resources/unused_external_entity.xml')) elem = ElementTree.parse(xml_file).getroot() self.assertEqual(elem.text, 'abc') parser = SafeXMLParser(target=PyElementTree.TreeBuilder()) with self.assertRaises(PyElementTree.ParseError) as ctx: 
ElementTree.parse(xml_file, parser=parser) self.assertEqual("Entities are forbidden (entity_name='ee')", str(ctx.exception)) with self.assertRaises(PyElementTree.ParseError) as ctx: with open(xml_file) as fp: defuse_xml(fp.read()) self.assertEqual("Entities are forbidden (entity_name='ee')", str(ctx.exception)) def test_defuse_xml_unparsed_entities(self): xml_file = Path(__file__).parent.joinpath('resources/unparsed_entity.xml') parser = SafeXMLParser(target=PyElementTree.TreeBuilder()) with self.assertRaises(PyElementTree.ParseError) as ctx: ElementTree.parse(str(xml_file), parser=parser) self.assertEqual("Unparsed entities are forbidden (entity_name='logo_file')", str(ctx.exception)) with self.assertRaises(PyElementTree.ParseError) as ctx: with xml_file.open() as fp: defuse_xml(fp.read()) self.assertEqual("Unparsed entities are forbidden (entity_name='logo_file')", str(ctx.exception)) def test_defuse_xml_unused_unparsed_entities(self): xml_file = Path(__file__).parent.joinpath('resources/unused_unparsed_entity.xml') elem = ElementTree.parse(str(xml_file)).getroot() self.assertIsNone(elem.text) parser = SafeXMLParser(target=PyElementTree.TreeBuilder()) with self.assertRaises(PyElementTree.ParseError) as ctx: ElementTree.parse(str(xml_file), parser=parser) self.assertEqual("Unparsed entities are forbidden (entity_name='logo_file')", str(ctx.exception)) with self.assertRaises(PyElementTree.ParseError) as ctx: with xml_file.open() as fp: defuse_xml(fp.read()) self.assertEqual("Unparsed entities are forbidden (entity_name='logo_file')", str(ctx.exception)) def test_is_etree_document_function(self): document = ElementTree.parse(io.StringIO('')) self.assertTrue(is_etree_document(document)) self.assertFalse(is_etree_document(ElementTree.XML(''))) def test_is_lxml_etree_document_function(self): document = ElementTree.parse(io.StringIO('')) self.assertFalse(is_lxml_etree_document(document)) if lxml_etree is not None: document = lxml_etree.parse(io.StringIO('')) 
self.assertTrue(is_lxml_etree_document(document)) self.assertFalse(is_lxml_etree_document(lxml_etree.XML(''))) def test_is_lxml_etree_element_function(self): self.assertFalse(is_lxml_etree_element(ElementTree.XML(''))) if lxml_etree is not None: self.assertTrue(is_lxml_etree_element(lxml_etree.XML(''))) def test_etree_deep_equal_function(self): e1 = ElementTree.XML('') e2 = ElementTree.XML('') self.assertTrue(etree_deep_equal(e1, e2)) e2 = ElementTree.XML('') self.assertFalse(etree_deep_equal(e1, e2)) e2 = ElementTree.XML('') self.assertFalse(etree_deep_equal(e1, e2)) e2 = ElementTree.XML('bar') self.assertFalse(etree_deep_equal(e1, e2)) def test_etree_iter_paths_function(self): root = ElementTree.XML('') result = list(etree_iter_paths(root)) self.assertListEqual( result, [(root, '.'), (root[0], './Q{}child[1]')] ) root = ElementTree.XML('') result = list(etree_iter_paths(root)) self.assertListEqual( result, [(root, '.'), (root[0], './Q{http://xpath.test/ns}child[1]')] ) if sys.version_info >= (3, 8): parser = ElementTree.XMLParser( target=ElementTree.TreeBuilder(insert_comments=True) ) root = ElementTree.XML('', parser) result = list(etree_iter_paths(root)) self.assertListEqual( result, [(root, '.'), (root[0], './comment()[1]')] ) parser = ElementTree.XMLParser( target=ElementTree.TreeBuilder(insert_pis=True) ) root = ElementTree.XML( '', parser ) result = list(etree_iter_paths(root)) self.assertListEqual( result, [(root, '.'), (root[0], './processing-instruction(xml-stylesheet)[1]')] ) if lxml_etree is not None: root = lxml_etree.XML('') result = list(etree_iter_paths(root)) self.assertListEqual( result, [(root, '.'), (root[0], './comment()[1]')] ) root = lxml_etree.XML( '' ) result = list(etree_iter_paths(root)) self.assertListEqual( result, [(root, '.'), (root[0], './processing-instruction(xml-stylesheet)[1]')] ) if __name__ == '__main__': header_template = "ElementTree tests for elementpath with Python {} on {}" header = 
        header_template.format(platform.python_version(), platform.platform())
    print('{0}\n{1}\n{0}'.format("*" * len(header), header))
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_exceptions.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest

from elementpath.exceptions import ElementPathError, xpath_error
from elementpath.namespaces import XSD_NAMESPACE
from elementpath.datatypes import QName
from elementpath.xpath1 import XPath1Parser


class ExceptionsTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        cls.parser = XPath1Parser(namespaces={'xs': XSD_NAMESPACE, 'tst': "http://xpath.test/ns"})

    def test_string_conversion(self):
        err = ElementPathError("unknown error")
        self.assertEqual(str(err), 'unknown error')
        err = ElementPathError("unknown error", code='XPST0001')
        self.assertEqual(str(err), '[XPST0001] unknown error')
        token = self.parser.symbol_table['true'](self.parser)
        err = ElementPathError("unknown error", token=token)
        self.assertEqual(str(err), "'fn:true' function at line 1, column 1: unknown error")
        err = ElementPathError("unknown error", code='XPST0001', token=token)
        self.assertEqual(
            str(err),
            "'fn:true' function at line 1, column 1: [XPST0001] unknown error"
        )

    def test_xpath_error(self):
        self.assertEqual(str(xpath_error('XPST0001')),
                         '[err:XPST0001] Parser not bound to a schema')
        self.assertEqual(str(xpath_error('err:XPDY0002', "test message")),
                         '[err:XPDY0002] test message')
        self.assertRaises(ValueError, xpath_error, '')
        self.assertRaises(ValueError, xpath_error, 'error:XPDY0002')
        self.assertEqual(str(xpath_error('{http://www.w3.org/2005/xqt-errors}XPST0001')),
                         '[err:XPST0001] Parser not bound to a schema')
        code = QName('http://www.w3.org/2005/xqt-errors', 'err:XPST0001')
        self.assertEqual(str(xpath_error(code)),
                         '[err:XPST0001] Parser not bound to a schema')
        code = QName('', 'XPST0001')
        self.assertEqual(str(xpath_error(code)),
                         '[Q{}XPST0001] Parser not bound to a schema')
        code = QName('http://xpath.test/errors', 'ce:XPCE0001')
        self.assertEqual(str(xpath_error(code)), '[ce:XPCE0001] custom XPath error')
        with self.assertRaises(ValueError) as err:
            xpath_error('{http://www.w3.org/2005/xpath-functions}XPST0001')
        self.assertEqual(str(err.exception), "[err:XPTY0004] invalid namespace "
                                             "'http://www.w3.org/2005/xpath-functions'")
        with self.assertRaises(ValueError) as err:
            xpath_error('{http://www.w3.org/2005/xpath-functions}}XPST0001')
        self.assertEqual(str(err.exception),
                         "[err:XPTY0004] '{http://www.w3.org/2005/xpath-"
                         "functions}}XPST0001' is not an xs:QName",)
        code = '{http://www.w3.org/2005/xqt-errors}XPST0001'
        namespaces = {'fn': 'http://www.w3.org/2005/xpath-functions',
                      'e': 'http://www.w3.org/2005/xqt-errors'}
        self.assertEqual(str(xpath_error(code, namespaces=namespaces)),
                         '[e:XPST0001] Parser not bound to a schema')
        namespaces = {'fn': 'http://www.w3.org/2005/xpath-functions',
                      '': 'http://www.w3.org/2005/xqt-errors'}
        self.assertEqual(str(xpath_error(code, namespaces=namespaces)),
                         '[XPST0001] Parser not bound to a schema')


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_helpers.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import re
import unittest
import math
from xml.etree import ElementTree

from elementpath.helpers import LazyPattern, days_from_common_era, \
    months2days, round_number, is_idrefs, collapse_white_spaces, escape_json_string, \
    get_double, numeric_equal, numeric_not_equal, equal, not_equal, \
    match_wildcard, unescape_json_string, iter_sequence, split_function_test


class HelperFunctionsTest(unittest.TestCase):

    def test_lazy_pattern(self):
        pattern = LazyPattern(r'^[^\d\W][\w.\-\u00B7\u0300-\u036F\u203F\u2040]*$')
        self.assertIsInstance(pattern, LazyPattern)

        class TestPatterns:
            pattern = LazyPattern(r'^[^\d\W][\w.\-\u00B7\u0300-\u036F\u203F\u2040]*$')

        self.assertIsInstance(TestPatterns.pattern, re.Pattern)
        self.assertIsNotNone(TestPatterns.pattern.match('foo'))
        self.assertIsNone(TestPatterns.pattern.match('foo:bar'))

    def test_node_is_idref_function(self):
        self.assertTrue(is_idrefs(ElementTree.XML('xyz').text))
        self.assertTrue(is_idrefs(ElementTree.XML('xyz abc').text))
        self.assertFalse(is_idrefs(ElementTree.XML('12345').text))
        self.assertTrue(is_idrefs('alpha'))
        self.assertTrue(is_idrefs('alpha beta'))
        self.assertFalse(is_idrefs('12345'))

    def test_days_from_common_era_function(self):
        days4y = 365 * 3 + 366
        days100y = days4y * 24 + 365 * 4
        days400y = days100y * 4 + 1
        self.assertEqual(days_from_common_era(0), 0)
        self.assertEqual(days_from_common_era(1), 365)
        self.assertEqual(days_from_common_era(3), 365 * 3)
        self.assertEqual(days_from_common_era(4), days4y)
        self.assertEqual(days_from_common_era(100), days100y)
        self.assertEqual(days_from_common_era(200), days100y * 2)
        self.assertEqual(days_from_common_era(300), days100y * 3)
        self.assertEqual(days_from_common_era(400), days400y)
        self.assertEqual(days_from_common_era(800), 2 * days400y)
        self.assertEqual(days_from_common_era(-1), -366)
        self.assertEqual(days_from_common_era(-4), -days4y)
        self.assertEqual(days_from_common_era(-5), -days4y - 366)
        self.assertEqual(days_from_common_era(-100), -days100y -
                         1)
        self.assertEqual(days_from_common_era(-200), -days100y * 2 - 1)
        self.assertEqual(days_from_common_era(-300), -days100y * 3 - 1)
        self.assertEqual(days_from_common_era(-101), -days100y - 366)
        self.assertEqual(days_from_common_era(-400), -days400y)
        self.assertEqual(days_from_common_era(-401), -days400y - 366)
        self.assertEqual(days_from_common_era(-800), -days400y * 2)

    def test_months2days_function(self):
        self.assertEqual(months2days(-119, 1, 12 * 319), 116512)
        self.assertEqual(months2days(200, 1, -12 * 320) - 1, -116877 - 2)

        # 0000 BCE tests
        self.assertEqual(months2days(0, 1, 12), 366)
        self.assertEqual(months2days(0, 1, -12), -365)
        self.assertEqual(months2days(1, 1, 12), 365)
        self.assertEqual(months2days(1, 1, -12), -366)

        # xs:duration ordering related tests
        self.assertEqual(months2days(year=1696, month=9, months_delta=0), 0)
        self.assertEqual(months2days(1696, 9, 1), 30)
        self.assertEqual(months2days(1696, 9, 2), 61)
        self.assertEqual(months2days(1696, 9, 3), 91)
        self.assertEqual(months2days(1696, 9, 4), 122)
        self.assertEqual(months2days(1696, 9, 5), 153)
        self.assertEqual(months2days(1696, 9, 12), 365)
        self.assertEqual(months2days(1696, 9, -1), -31)
        self.assertEqual(months2days(1696, 9, -2), -62)
        self.assertEqual(months2days(1696, 9, -12), -366)

        self.assertEqual(months2days(1697, 2, 0), 0)
        self.assertEqual(months2days(1697, 2, 1), 28)
        self.assertEqual(months2days(1697, 2, 12), 365)
        self.assertEqual(months2days(1697, 2, -1), -31)
        self.assertEqual(months2days(1697, 2, -2), -62)
        self.assertEqual(months2days(1697, 2, -3), -92)
        self.assertEqual(months2days(1697, 2, -12), -366)
        self.assertEqual(months2days(1697, 2, -14), -428)
        self.assertEqual(months2days(1697, 2, -15), -458)

        self.assertEqual(months2days(1903, 3, 0), 0)
        self.assertEqual(months2days(1903, 3, 1), 31)
        self.assertEqual(months2days(1903, 3, 2), 61)
        self.assertEqual(months2days(1903, 3, 3), 92)
        self.assertEqual(months2days(1903, 3, 4), 122)
        self.assertEqual(months2days(1903, 3, 11), 366 - 29)
        self.assertEqual(months2days(1903, 3, 12), 366)
        self.assertEqual(months2days(1903, 3, -1), -28)
        self.assertEqual(months2days(1903, 3, -2), -59)
        self.assertEqual(months2days(1903, 3, -3), -90)
        self.assertEqual(months2days(1903, 3, -12), -365)

        self.assertEqual(months2days(1903, 7, 0), 0)
        self.assertEqual(months2days(1903, 7, 1), 31)
        self.assertEqual(months2days(1903, 7, 2), 62)
        self.assertEqual(months2days(1903, 7, 3), 92)
        self.assertEqual(months2days(1903, 7, 6), 184)
        self.assertEqual(months2days(1903, 7, 12), 366)
        self.assertEqual(months2days(1903, 7, -1), -30)
        self.assertEqual(months2days(1903, 7, -2), -61)
        self.assertEqual(months2days(1903, 7, -6), -181)
        self.assertEqual(months2days(1903, 7, -12), -365)

        # Extra tests
        self.assertEqual(months2days(1900, 3, 0), 0)
        self.assertEqual(months2days(1900, 3, 1), 31)
        self.assertEqual(months2days(1900, 3, 24), 730)
        self.assertEqual(months2days(1900, 3, -1), -28)
        self.assertEqual(months2days(1900, 3, -24), -730)

        self.assertEqual(months2days(1000, 4, 0), 0)
        self.assertEqual(months2days(1000, 4, 1), 30)
        self.assertEqual(months2days(1000, 4, 24), 730)
        self.assertEqual(months2days(1000, 4, -1), -31)
        self.assertEqual(months2days(1000, 4, -24), -730)

        self.assertEqual(months2days(2001, 10, -12), -365)
        self.assertEqual(months2days(2000, 10, -12), -366)
        self.assertEqual(months2days(2000, 2, -12), -365)
        self.assertEqual(months2days(2000, 3, -12), -366)

    def test_round_number_function(self):
        self.assertTrue(math.isnan(round_number(float('NaN'))))
        self.assertTrue(math.isinf(round_number(float('INF'))))
        self.assertTrue(math.isinf(round_number(float('-INF'))))
        self.assertEqual(round_number(10.1), 10)
        self.assertEqual(round_number(9.5), 10)
        self.assertEqual(round_number(-10.1), -10)
        self.assertEqual(round_number(-9.5), -9)

    def test_collapse_white_spaces_function(self):
        self.assertEqual(collapse_white_spaces(' ab c '), 'ab c')
        self.assertEqual(collapse_white_spaces(' ab\t\nc '), 'ab c')

    def test_get_double_function(self):
        self.assertEqual(get_double(1), 1.0)
        self.assertEqual(get_double(1.0), 1.0)
        self.assertIs(get_double('NaN'), math.nan)
        self.assertIs(get_double(float('nan')), math.nan)
        self.assertTrue(math.isinf(get_double('INF')))
        self.assertRaises(ValueError, get_double, 'nan')
        self.assertRaises(ValueError, get_double, 'Inf')
        self.assertRaises(ValueError, get_double, 'alfa')

    def test_numeric_equal_function(self):
        self.assertTrue(numeric_equal(1.0, 1))
        self.assertFalse(numeric_equal(1.000001, 1.0))
        self.assertTrue(numeric_equal(1.0000001, 1.0))
        self.assertFalse(numeric_equal(float('nan'), float('nan')))
        self.assertRaises(TypeError, numeric_equal, 'xyz', 1)

    def test_numeric_not_equal_function(self):
        self.assertFalse(numeric_not_equal(1.0, 1))
        self.assertTrue(numeric_not_equal(1.000001, 1.0))
        self.assertFalse(numeric_not_equal(1.0000001, 1.0))
        self.assertTrue(numeric_not_equal(float('nan'), float('nan')))

    def test_equal_function(self):
        self.assertTrue(equal(1.0, 1))
        self.assertFalse(equal(1.000001, 1.0))
        self.assertFalse(equal(1.0000001, 1.0))
        self.assertTrue(equal(float('nan'), float('nan')))
        self.assertTrue(equal('xyz', 'xyz'))

    def test_not_equal_function(self):
        self.assertFalse(not_equal(1.0, 1))
        self.assertTrue(not_equal(1.000001, 1.0))
        self.assertTrue(not_equal(1.0000001, 1.0))
        self.assertFalse(not_equal(float('nan'), float('nan')))
        self.assertFalse(not_equal('xyz', 'xyz'))

    def test_match_wildcard_function(self):
        self.assertTrue(match_wildcard('foo', '*'))
        self.assertTrue(match_wildcard('foo', '*:*'))
        self.assertTrue(match_wildcard('foo', '*:foo'))
        self.assertFalse(match_wildcard('foo', '*:bar'))
        self.assertTrue(match_wildcard('{ns}foo', '*:foo'))
        self.assertFalse(match_wildcard('{ns}foo', '*:bar'))
        self.assertTrue(match_wildcard('tns:foo', 'tns:*'))
        self.assertFalse(match_wildcard('tns:foo', 'bar:*'))
        self.assertTrue(match_wildcard('{ns}foo', '{ns}*'))
        self.assertFalse(match_wildcard('{ns}foo', '{ns}foo'))  # is not a wildcard
        self.assertFalse(match_wildcard('{ns}foo',
                                        '{ns}bar'))

    def test_escape_json_string_function(self):
        self.assertEqual(escape_json_string("\""), '\\"')
        self.assertEqual(escape_json_string("\""), '\\"')
        self.assertEqual(escape_json_string('\\"', escaped=True), '\\"')
        self.assertEqual(escape_json_string('\\u000A', escaped=True), '\\u000A')

    def test_unescape_json_string_function(self):
        self.assertEqual(unescape_json_string('foo'), 'foo')
        self.assertEqual(unescape_json_string('\\n'), '\n')
        self.assertEqual(unescape_json_string('\\u0031'), '1')
        self.assertEqual(unescape_json_string('\\"'), '"')
        self.assertEqual(unescape_json_string('\\\\'), '\\')
        self.assertEqual(unescape_json_string('\\u000a'), '\n')
        self.assertEqual(unescape_json_string('\\U0000000a'), '\n')
        self.assertEqual(unescape_json_string('-\\r-'), '-\r-')
        self.assertEqual(unescape_json_string("-\\t-"), '-\t-')

    def test_iter_sequence_function(self):
        self.assertListEqual(list(iter_sequence(None)), [])
        self.assertListEqual(list(iter_sequence([None, 8])), [8])
        self.assertListEqual(list(iter_sequence([])), [])
        self.assertListEqual(list(iter_sequence([[], 8])), [8])
        self.assertListEqual(list(iter_sequence([[], [], []])), [])
        self.assertListEqual(list(iter_sequence([[], 8, [9]])), [8, 9])

    def test_split_function_test_function(self):
        self.assertListEqual(
            split_function_test('element(*)'), []
        )
        self.assertListEqual(
            split_function_test('function(*)'), ['*']
        )
        self.assertListEqual(
            split_function_test('function(item()) as xs:anyAtomicType'),
            ['item()', 'xs:anyAtomicType']
        )
        self.assertListEqual(
            split_function_test('function(xs:string) as xs:integer*'),
            ['xs:string', 'xs:integer*']
        )
        self.assertListEqual(
            split_function_test('function() as map(xs:string, item())'),
            ['map(xs:string, item())']
        )
        self.assertListEqual(
            split_function_test('function(item()*, item()*, item()*) as item()*'),
            ['item()*', 'item()*', 'item()*', 'item()*']
        )


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_namespaces.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest

from elementpath.namespaces import XSD_NAMESPACE, get_namespace, \
    get_prefixed_name, get_expanded_name, split_expanded_name


class NamespacesTest(unittest.TestCase):
    namespaces = {
        'xs': XSD_NAMESPACE,
        'tst': "http://xpath.test/ns"
    }

    # namespaces.py module

    def test_get_namespace_function(self):
        self.assertEqual(get_namespace('A'), '')
        self.assertEqual(get_namespace('{ns}foo'), 'ns')
        self.assertEqual(get_namespace('{}foo'), '')
        self.assertEqual(get_namespace('{A}B{C}'), 'A')

    def test_qname_to_prefixed_function(self):
        self.assertEqual(get_prefixed_name('{ns}foo', {'bar': 'ns'}), 'bar:foo')
        self.assertEqual(get_prefixed_name('{ns}foo', {'': 'ns'}), 'foo')
        self.assertEqual(get_prefixed_name('Q{ns}foo', {'': 'ns'}), 'foo')
        self.assertEqual(get_prefixed_name('foo', {'': 'ns'}), 'foo')
        self.assertEqual(get_prefixed_name('', {'': 'ns'}), '')
        self.assertEqual(get_prefixed_name('{ns}foo', {}), '{ns}foo')
        self.assertEqual(get_prefixed_name('{ns}foo', {'bar': 'other'}), '{ns}foo')

        with self.assertRaises(ValueError):
            get_prefixed_name('{{ns}}foo', {'bar': 'ns'})

    def test_prefixed_to_qname_function(self):
        self.assertEqual(get_expanded_name('{ns}foo', {'bar': 'ns'}), '{ns}foo')
        self.assertEqual(get_expanded_name('Q{ns}foo', {'bar': 'ns'}), '{ns}foo')
        self.assertEqual(get_expanded_name('bar:foo', {'bar': 'ns'}), '{ns}foo')
        self.assertEqual(get_expanded_name('foo', {'': 'ns'}), '{ns}foo')
        self.assertEqual(get_expanded_name('foo', {None: 'ns'}), '{ns}foo')
        self.assertEqual(get_expanded_name('', {'':
                                               'ns'}), '')

        with self.assertRaises(KeyError):
            get_expanded_name('bar:foo', self.namespaces)

        with self.assertRaises(ValueError):
            get_expanded_name('bar:foo:bar', {'bar': 'ns'})

        with self.assertRaises(ValueError):
            get_expanded_name(':foo', {'': 'ns'})

        with self.assertRaises(ValueError):
            get_expanded_name('foo:', {'': 'ns'})

    def test_split_expanded_name_function(self):
        self.assertEqual(split_expanded_name('{ns}foo'), ('ns', 'foo'))
        self.assertEqual(split_expanded_name('foo'), ('', 'foo'))

        with self.assertRaises(ValueError):
            split_expanded_name('tst:foo')

        with self.assertRaises(ValueError):
            split_expanded_name('{{ns}}foo')


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_package.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
import glob
import fileinput
import os
import re
import platform


class PackageTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        cls.test_dir = os.path.dirname(os.path.abspath(__file__))
        cls.package_dir = os.path.dirname(cls.test_dir)
        cls.source_dir = os.path.join(cls.package_dir, 'elementpath/')
        cls.missing_debug = re.compile(
            r"(\bimport\s+pdb\b|\bpdb\s*\.\s*set_trace\(\s*\)|\bprint\s*\(|\bbreakpoint\s*\()"
        )
        cls.get_version = re.compile(
            r"(?:\bversion|__version__)(?:\s*=\s*)(\'[^\']*\'|\"[^\"]*\")"
        )
        cls.get_python_requires = re.compile(
            r"(?:\bpython_requires\s*=\s*)(\'[^\']*\'|\"[^\"]*\")"
        )
        cls.get_classifier_version = re.compile(
            r"(?:'Programming\s+Language\s+::\s+Python\s+::\s+)(3\.\d+)(?:\s*')"
        )

    @unittest.skipIf(platform.system() == 'Windows', 'Skip on Windows platform')
    def test_missing_debug_statements(self):
        message = "\nFound a leftover debug statement at line %d of file %r: %r"
        filename = None
        source_files = glob.glob(os.path.join(self.source_dir, '*.py')) + \
            glob.glob(os.path.join(self.source_dir, '*/*.py'))

        for line in fileinput.input(source_files):
            if fileinput.isfirstline():
                filename = os.path.basename(fileinput.filename())
                if filename == 'generate_categories.py':
                    fileinput.nextfile()
                    continue

            lineno = fileinput.filelineno()
            match = self.missing_debug.search(line)
            self.assertIsNone(
                match, message % (lineno, filename, match.group(0) if match else None)
            )

    def test_version_matching(self):
        message = "\nFound a different version at line %d of file %r: %r (maybe %r)."
        files = [
            os.path.join(self.source_dir, '__init__.py'),
            os.path.join(self.package_dir, 'setup.py'),
        ]
        version = filename = None
        for line in fileinput.input(files):
            if fileinput.isfirstline():
                filename = fileinput.filename()
            lineno = fileinput.filelineno()

            match = self.get_version.search(line)
            if match is not None:
                if version is None:
                    version = match.group(1).strip('\'\"')
                else:
                    self.assertTrue(
                        version == match.group(1).strip('\'\"'),
                        message % (lineno, filename, match.group(1).strip('\'\"'), version)
                    )

    def test_python_requirement(self):
        files = [
            os.path.join(self.package_dir, 'setup.py'),
            os.path.join(self.package_dir, 'setup.py'),
        ]
        min_version = None

        for line in fileinput.input(files):
            if min_version is None:
                match = self.get_python_requires.search(line)
                if match is not None:
                    min_version = match.group(1).strip('\'\"')
                    self.assertTrue(
                        min_version.startswith('>=3.') and min_version[4:].isdigit(),
                        msg="Wrong python_requires directive in setup.py: %s" % min_version
                    )
                    min_version = min_version[2:]
            else:
                match = self.get_classifier_version.search(line)
                if match is not None:
                    python_version = match.group(1)
                    self.assertEqual(python_version[:2], min_version[:2])
                    self.assertGreaterEqual(int(python_version[2:]), int(min_version[2:]))

        self.assertIsNotNone(min_version, msg="Missing python_requires directive in setup.py")


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_regex.py
#!/usr/bin/env python
#
# Copyright (c), 2016-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
"""
This module runs tests on XML Schema regular expressions.
"""
import unittest
import os
import sys
import re
import string
from collections import Counter
from copy import copy
from itertools import chain
from unicodedata import category, unidata_version

from elementpath.regex import RegexError, CharacterClass, translate_pattern, \
    UnicodeSubset, unicode_category, unicode_block, install_unicode_data, \
    unicode_version, UnicodeData
from elementpath.regex.codepoints import code_point_repr, iter_code_points, \
    iterparse_character_subset

CATEGORIES = (
    'C', 'Cc', 'Cf', 'Cs', 'Co', 'Cn',
    'L', 'Lu', 'Ll', 'Lt', 'Lm', 'Lo',
    'M', 'Mn', 'Mc', 'Me',
    'N', 'Nd', 'Nl', 'No',
    'P', 'Pc', 'Pd', 'Ps', 'Pe', 'Pi', 'Pf', 'Po',
    'S', 'Sm', 'Sc', 'Sk', 'So',
    'Z', 'Zs', 'Zl', 'Zp'
)


class TestCodePoints(unittest.TestCase):

    def test_iter_code_points(self):
        self.assertEqual(list(iter_code_points([10, 20, 11, 12, 25, (9, 21), 21])),
                         [(9, 22), 25])
        self.assertEqual(list(iter_code_points([10, 20, 11, 12, 25, (9, 20), 21])),
                         [(9, 22), 25])
        self.assertEqual(list(iter_code_points({2, 120, 121, (150, 260)})),
                         [2, (120, 122), (150, 260)])
        self.assertEqual(
            list(iter_code_points([10, 20, (10, 22), 11, 12, 25, 8, (9, 20), 21, 22, 9, 0])),
            [0, (8, 23), 25]
        )

        self.assertEqual(
            list(e for e in iter_code_points([10, 20, 11, 12, 25, (9, 21)], reverse=True)),
            [25, (9, 21)]
        )
        self.assertEqual(
            list(iter_code_points([10, 20, (10, 22), 11, 12, 25, 8, (9, 20), 21, 22, 9, 0],
                                  reverse=True)),
            [25, (8, 23), 0]
        )


class TestParseCharacterSubset(unittest.TestCase):

    def test_expand_ranges(self):
        self.assertListEqual(
            list(iterparse_character_subset('a-e', expand_ranges=True)),
            [ord('a'), ord('b'), ord('c'), ord('d'), ord('e')]
        )

    def test_backslash_character(self):
        self.assertListEqual(list(iterparse_character_subset('\\')), [ord('\\')])
        self.assertListEqual(list(iterparse_character_subset('2-\\')),
                             [(ord('2'), ord('\\') + 1)])
        self.assertListEqual(list(iterparse_character_subset('2-\\\\')),
                             [(ord('2'), ord('\\') + 1), ord('\\')])
        self.assertListEqual(list(iterparse_character_subset('2-\\x')),
                             [(ord('2'), ord('\\') + 1), ord('x')])
        self.assertListEqual(list(iterparse_character_subset('2-\\a-x')),
                             [(ord('2'), ord('\\') + 1), (ord('a'), ord('x') + 1)])
        self.assertListEqual(list(iterparse_character_subset('2-\\{')),
                             [(ord('2'), ord('{') + 1)])

    def test_backslash_escapes(self):
        self.assertListEqual(list(iterparse_character_subset('\\{')), [ord('{')])
        self.assertListEqual(list(iterparse_character_subset('\\(')), [ord('(')])
        self.assertListEqual(list(iterparse_character_subset('\\a')), [ord('\\'), ord('a')])

    def test_square_brackets(self):
        self.assertListEqual(list(iterparse_character_subset('\\[')), [ord('[')])
        self.assertListEqual(list(iterparse_character_subset('[')), [ord('[')])

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('[ '))
        self.assertIn("bad character '['", str(ctx.exception))

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('x['))
        self.assertIn("bad character '['", str(ctx.exception))

        self.assertListEqual(list(iterparse_character_subset('\\]')), [ord(']')])
        self.assertListEqual(list(iterparse_character_subset(']')), [ord(']')])

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('].'))
        self.assertIn("bad character ']'", str(ctx.exception))

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('8['))
        self.assertIn("bad character '['", str(ctx.exception))

    def test_character_range(self):
        self.assertListEqual(list(iterparse_character_subset('A-z')),
                             [(ord('A'), ord('z') + 1)])
        self.assertListEqual(list(iterparse_character_subset('\\[-z')),
                             [(ord('['), ord('z') + 1)])

    def test_bad_character_range(self):
        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('9-2'))
        self.assertIn('bad character range', str(ctx.exception))

        with self.assertRaises(RegexError) as ctx:
            list(iterparse_character_subset('2-\\s'))
        self.assertIn('bad character range', str(ctx.exception))

    def test_parse_multiple_ranges(self):
        self.assertListEqual(
            list(iterparse_character_subset('a-c-1-4x-z-7-9')),
            [(ord('a'), ord('c') + 1), ord('-'), (ord('1'), ord('4') + 1),
             (ord('x'), ord('z') + 1), ord('-'), (55, 58)]
        )


class TestUnicodeSubset(unittest.TestCase):

    def test_creation(self):
        subset = UnicodeSubset([(0, 9), 11, 12, (14, 32), (33, sys.maxunicode + 1)])
        self.assertEqual(subset, [(0, 9), 11, 12, (14, 32), (33, sys.maxunicode + 1)])
        self.assertEqual(UnicodeSubset('0-9'), [(48, 58)])
        self.assertEqual(UnicodeSubset('0-9:'), [(48, 59)])

        subset = UnicodeSubset('a-z')
        self.assertEqual(UnicodeSubset(subset), [(ord('a'), ord('z') + 1)])

    def test_repr(self):
        self.assertEqual(code_point_repr((ord('2'), ord('\\') + 1)), r'2-\\')

        subset = UnicodeSubset('a-z')
        self.assertEqual(repr(subset), "UnicodeSubset('a-z')")
        self.assertEqual(str(subset), "a-z")

        subset = UnicodeSubset((50, 90))
        subset.codepoints.append(sys.maxunicode + 10)  # Invalid subset
        self.assertRaises(ValueError, repr, subset)

    def test_modify(self):
        subset = UnicodeSubset()
        for cp in [50, 90, 10, 90]:
            subset.add(cp)
        self.assertEqual(subset, [10, 50, 90])
        self.assertRaises(ValueError, subset.add, -1)
        self.assertRaises(ValueError, subset.add, sys.maxunicode + 1)
        subset.add((100, 20001))
        subset.discard((100, 19001))
        self.assertEqual(subset, [10, 50, 90, (19001, 20001)])
        subset.add(0)
        subset.discard(1)
        self.assertEqual(subset, [0, 10, 50, 90, (19001, 20001)])
        subset.discard(0)
        self.assertEqual(subset, [10, 50, 90, (19001, 20001)])
        subset.discard((10, 100))
        self.assertEqual(subset, [(19001, 20001)])
        subset.add(20)
        subset.add(19)
        subset.add(30)
        subset.add([30, 33])
        subset.add(30000)
        subset.add(30001)
        self.assertEqual(subset, [(19, 21), (30, 33), (19001, 20001), (30000, 30002)])
        subset.add(22)
        subset.add(21)
        subset.add(22)
        self.assertEqual(subset, [(19, 22), 22, (30, 33), (19001, 20001), (30000, 30002)])
        subset.discard((90, 50000))
        self.assertEqual(subset, [(19, 22), 22, (30, 33)])
        subset.discard(21)
        subset.discard(19)
        self.assertEqual(subset, [20, 22, (30, 33)])
        subset.discard((0, 200))
        self.assertEqual(subset, [])

        with self.assertRaises(TypeError):
            subset.discard(None)
        with self.assertRaises(ValueError):
            subset.discard((10, 11, 12))

    def test_update_method(self):
        subset = UnicodeSubset()
        subset.update('\\\\')
        self.assertListEqual(subset.codepoints, [ord('\\')])
        subset.update('\\$')
        self.assertListEqual(subset.codepoints, [ord('$'), ord('\\')])

        subset.clear()
        subset.update('!--')
        self.assertListEqual(subset.codepoints, [(ord('!'), ord('-') + 1)])

        subset.clear()
        subset.update('!---')
        self.assertListEqual(subset.codepoints, [(ord('!'), ord('-') + 1)])

        subset.clear()
        subset.update('!--a')
        self.assertListEqual(subset.codepoints, [(ord('!'), ord('-') + 1), ord('a')])

        with self.assertRaises(RegexError):
            subset.update('[[')

    def test_difference_update_method(self):
        subset = UnicodeSubset('a-z')
        subset.difference_update('a-c')
        self.assertEqual(subset, UnicodeSubset('d-z'))

        subset = UnicodeSubset('a-z')
        subset.difference_update([(ord('a'), ord('c') + 1)])
        self.assertEqual(subset, UnicodeSubset('d-z'))

    def test_iterate(self):
        subset = UnicodeSubset('a-d')
        self.assertListEqual(list(iter(subset)), [ord('a'), ord('b'), ord('c'), ord('d')])
        self.assertListEqual(list(subset.iter_characters()), ['a', 'b', 'c', 'd'])

    def test_reversed(self):
        subset = UnicodeSubset('0-9ax')
        self.assertEqual(list(reversed(subset)),
                         [ord('x'), ord('a'), ord('9'), 56, 55, 54, 53, 52, 51, 50, 49, 48])

    def test_in_operator(self):
        subset = UnicodeSubset('0-9a-z')

        self.assertIn('a', subset)
        self.assertIn(ord('a'), subset)
        self.assertIn(ord('z'), subset)

        self.assertNotIn('/', subset)
        self.assertNotIn('A', subset)
        self.assertNotIn(ord('A'), subset)
        self.assertNotIn(ord('}'), subset)
        self.assertNotIn(float(ord('a')), subset)

        self.assertNotIn('.', subset)
        subset.update('.')
        self.assertIn('.', subset)
        self.assertNotIn('/', subset)
        self.assertNotIn('-', subset)

    def test_complement(self):
        subset = UnicodeSubset((50, 90, 10, 90))
        self.assertEqual(list(subset.complement()),
                         [(0, 10), (11, 50), (51, 90), (91, sys.maxunicode + 1)])
        subset.add(11)
        self.assertEqual(list(subset.complement()),
                         [(0, 10), (12, 50), (51, 90), (91, sys.maxunicode + 1)])
        subset.add((0, 10))
        self.assertEqual(list(subset.complement()),
                         [(12, 50), (51, 90), (91, sys.maxunicode + 1)])

        s1 = UnicodeSubset(chain(
            unicode_category('L').codepoints,
            unicode_category('M').codepoints,
            unicode_category('N').codepoints,
            unicode_category('S').codepoints
        ))
        s2 = UnicodeSubset(chain(
            unicode_category('C').codepoints,
            unicode_category('P').codepoints,
            unicode_category('Z').codepoints
        ))
        self.assertEqual(s1.codepoints, UnicodeSubset(s2.complement()).codepoints)

        subset = UnicodeSubset((50, 90))
        subset.codepoints.append(70)  # Invalid subset (unordered)
        with self.assertRaises(ValueError) as ctx:
            list(subset.complement())
        self.assertEqual(
            str(ctx.exception), "unordered code points found in UnicodeSubset('2ZF')")

        subset = UnicodeSubset((sys.maxunicode - 1,))
        self.assertEqual(list(subset.complement()), [(0, sys.maxunicode - 1), sys.maxunicode])

    def test_equality(self):
        self.assertFalse(UnicodeSubset() == 0.0)
        self.assertEqual(UnicodeSubset('a-z'), UnicodeSubset('a-kl-z'))

    def test_union_and_intersection(self):
        s1 = UnicodeSubset([50, (90, 200), 10])
        s2 = UnicodeSubset([10, 51, (89, 150), 90])
        self.assertEqual(s1 | s2, [10, (50, 52), (89, 200)])
        self.assertEqual(s1 & s2, [10, (90, 150)])

        subset = UnicodeSubset('a-z')
        subset |= UnicodeSubset('A-Zfx')
        self.assertEqual(subset, UnicodeSubset('A-Za-z'))
        subset |= '0-9'
        self.assertEqual(subset, UnicodeSubset('0-9A-Za-z'))
        subset |= [ord('{'), ord('}')]
        self.assertEqual(subset, UnicodeSubset('0-9A-Za-z{}'))

        subset = UnicodeSubset('a-z')
        subset &= UnicodeSubset('A-Zfx')
        self.assertEqual(subset, UnicodeSubset('fx'))
        subset &= 'xyz'
        self.assertEqual(subset, UnicodeSubset('x'))

        with self.assertRaises(TypeError) as ctx:
            subset = UnicodeSubset('a-z')
            subset |= False
        self.assertIn('unsupported operand type', str(ctx.exception))

        with self.assertRaises(TypeError) as ctx:
            subset = UnicodeSubset('a-z')
            subset &= False
        self.assertIn('unsupported operand type', str(ctx.exception))

    def test_max_and_min(self):
        s1 = UnicodeSubset([10, 51, (89, 151), 90])
        s2 = UnicodeSubset([0, 2, (80, 201), 10000])
        s3 = UnicodeSubset([1])

        self.assertEqual((min(s1), max(s1)), (10, 150))
        self.assertEqual((min(s2), max(s2)), (0, 10000))
        self.assertEqual((min(s3), max(s3)), (1, 1))

    def test_subtraction(self):
        subset = UnicodeSubset([0, 2, (80, 200), 10000])
        self.assertEqual(subset - {2, 120, 121, (150, 260)},
                         [0, (80, 120), (122, 150), 10000])

        subset = UnicodeSubset('a-z')
        subset -= UnicodeSubset('a-c')
        self.assertEqual(subset, UnicodeSubset('d-z'))

        subset = UnicodeSubset('a-z')
        subset -= 'a-c'
        self.assertEqual(subset, UnicodeSubset('d-z'))

        with self.assertRaises(TypeError) as ctx:
            subset = UnicodeSubset('a-z')
            subset -= False
        self.assertIn('unsupported operand type', str(ctx.exception))

    def test_xor(self):
        subset = UnicodeSubset('a-z')
        subset ^= subset
        self.assertEqual(subset, UnicodeSubset())

        subset = UnicodeSubset('a-z')
        subset ^= UnicodeSubset('a-c')
        self.assertEqual(subset, UnicodeSubset('d-z'))

        subset = UnicodeSubset('a-z')
        subset ^= 'a-f'
        self.assertEqual(subset, UnicodeSubset('g-z'))

        with self.assertRaises(TypeError) as ctx:
            subset = UnicodeSubset('a-z')
            subset ^= False
        self.assertIn('unsupported operand type', str(ctx.exception))

        subset = UnicodeSubset('a-z')
        subset ^= 'A-Za-f'
        self.assertEqual(subset, UnicodeSubset('A-Zg-z'))


class TestCharacterClass(unittest.TestCase):

    def test_char_class_init(self):
        char_class = CharacterClass()
        self.assertEqual(char_class.positive, [])
        self.assertEqual(char_class.negative, [])

        char_class = CharacterClass('a-z')
        self.assertEqual(char_class.positive, [(97, 123)])
        self.assertEqual(char_class.negative, [])

    def test_char_class_repr(self):
        char_class = CharacterClass('a-z')
        self.assertEqual(repr(char_class),
                         'CharacterClass([a-z])')
        char_class.complement()
        self.assertEqual(repr(char_class), 'CharacterClass([^a-z])')

    def test_char_class_copy(self):
        char_class = CharacterClass('a-z')
        char_class_copy = copy(char_class)
        self.assertEqual(char_class.xsd_version, char_class_copy.xsd_version)
        self.assertEqual(char_class.positive, char_class_copy.positive)
        self.assertEqual(char_class.negative, char_class_copy.negative)
        self.assertEqual(char_class, char_class_copy)

    def test_char_class_contains(self):
        char_class = CharacterClass('a-z')
        self.assertIn('a', char_class)
        self.assertIn(97, char_class)
        self.assertNotIn(97.0, char_class)

    def test_char_class_split(self):
        self.assertListEqual(CharacterClass._re_char_set.split(r'2-\\'), [r'2-\\'])

    def test_complement(self):
        char_class = CharacterClass('a-z')
        self.assertListEqual(char_class.positive.codepoints, [(97, 123)])
        self.assertListEqual(char_class.negative.codepoints, [])

        char_class.complement()
        self.assertListEqual(char_class.positive.codepoints, [])
        self.assertListEqual(char_class.negative.codepoints, [(97, 123)])
        self.assertEqual(str(char_class), '[^a-z]')

        char_class = CharacterClass()
        char_class.complement()
        self.assertEqual(len(char_class), sys.maxunicode + 1)

    def test_isub_operator(self):
        char_class = CharacterClass('A-Za-z')
        char_class -= CharacterClass('a-z')
        self.assertEqual(str(char_class), '[A-Z]')

        char_class = CharacterClass('a-z')
        other = CharacterClass('A-Za-c')
        other.complement()
        char_class -= other
        self.assertEqual(str(char_class), '[a-c]')

        char_class = CharacterClass('a-z')
        other = CharacterClass('A-Za-c')
        other.complement()
        other.add('b')
        char_class -= other
        self.assertEqual(str(char_class), '[ac]')

        char_class = CharacterClass('a-c')
        char_class.complement()
        other = CharacterClass('a-z')
        other.complement()
        char_class -= other
        self.assertEqual(str(char_class), '[d-z]')

        char_class = CharacterClass('a-z')
        with self.assertRaises(TypeError):
            char_class -= 'a'

    def test_in_operator(self):
        char_class = CharacterClass('A-Za-z')
        self.assertIn(100, char_class)
        self.assertIn('d', char_class)
        self.assertNotIn(49, char_class)
        self.assertNotIn('1', char_class)

        char_class.complement()
        self.assertNotIn(100, char_class)
        self.assertNotIn('d', char_class)
        self.assertIn(49, char_class)
        self.assertIn('1', char_class)

    def test_iterate(self):
        char_class = CharacterClass('A-Za-z')
        self.assertEqual(''.join(chr(c) for c in char_class),
                         string.ascii_uppercase + string.ascii_lowercase)

        char_class.complement()
        self.assertEqual(len(''.join(chr(c) for c in char_class)),
                         sys.maxunicode + 1 - len(string.ascii_letters))

    def test_length(self):
        char_class = CharacterClass('0-9A-Z')
        self.assertListEqual(char_class.positive.codepoints, [(48, 58), (65, 91)])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 36)

        char_class.complement()
        self.assertListEqual(char_class.positive.codepoints, [])
        self.assertListEqual(char_class.negative.codepoints, [(48, 58), (65, 91)])
        self.assertEqual(len(char_class), sys.maxunicode + 1 - 36)

        char_class.add('k-m')
        self.assertListEqual(char_class.positive.codepoints, [(107, 110)])
        self.assertListEqual(char_class.negative.codepoints, [(48, 58), (65, 91)])
        self.assertEqual(str(char_class), '[\x00-/:-@\\[-\U0010ffffk-m]')
        self.assertEqual(len(char_class), sys.maxunicode + 1 - 36)

        char_class.add('K-M')
        self.assertListEqual(char_class.positive.codepoints, [(75, 78), (107, 110)])
        self.assertListEqual(char_class.negative.codepoints, [(48, 58), (65, 91)])
        self.assertEqual(len(char_class), sys.maxunicode + 1 - 33)
        self.assertEqual(str(char_class), '[\x00-/:-@\\[-\U0010ffffK-Mk-m]')

        char_class.clear()
        self.assertListEqual(char_class.positive.codepoints, [])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 0)

    def test_add(self):
        char_class = CharacterClass()
        self.assertListEqual(char_class.positive.codepoints, [])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 0)

        char_class.add('0-9')
        self.assertListEqual(char_class.positive.codepoints, [(48, 58)])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 10)

        char_class = CharacterClass()
        char_class.add(ord('0'))
        self.assertListEqual(char_class.positive.codepoints, [48])

        char_class.add(r'\p{Nd}')
        if unidata_version == '12.1.0':
            self.assertEqual(len(char_class), 630)
        elif unidata_version == '15.0.0':
            self.assertEqual(len(char_class), 680)

        with self.assertRaises(RegexError):
            char_class.add(r'\p{}')

        with self.assertRaises(RegexError):
            char_class.add(r'\p{XYZ}')

        char_class.add(r'\P{Nd}')
        self.assertEqual(len(char_class), sys.maxunicode + 1)

        char_class = CharacterClass()
        char_class.add(r'\p{IsFoo}')

    def test_discard(self):
        char_class = CharacterClass('0-9')
        char_class.discard('6-9')
        self.assertListEqual(char_class.positive.codepoints, [(48, 54)])
        self.assertListEqual(char_class.negative.codepoints, [])
        self.assertEqual(len(char_class), 6)

        char_class = CharacterClass('0-9')
        char_class.discard(ord('6'))
        self.assertListEqual(char_class.positive.codepoints, [(48, 54), (55, 58)])

        char_class.add(r'\p{Nd}')
        if unidata_version == '12.1.0':
            self.assertEqual(len(char_class), 630)
        elif unidata_version == '15.0.0':
            self.assertEqual(len(char_class), 680)

        char_class.discard(r'\p{Nd}')
        self.assertEqual(len(char_class), 0)

        with self.assertRaises(RegexError):
            char_class.discard(r'\p{}')

        with self.assertRaises(RegexError):
            char_class.discard(r'\p{XYZ}')

        char_class.add(r'\P{Nd}')
        if unidata_version == '12.1.0':
            self.assertEqual(len(char_class), sys.maxunicode + 1 - 630)
        elif unidata_version == '15.0.0':
            self.assertEqual(len(char_class), sys.maxunicode + 1 - 680)

        char_class.discard(r'\P{Nd}')
        self.assertEqual(len(char_class), 0)

        char_class = CharacterClass('a-z')
        char_class.discard(r'\p{IsFoo}')
        self.assertEqual(len(char_class), 0)

        char_class = CharacterClass()
        char_class.complement()
        char_class.discard('\\n')
self.assertListEqual(char_class.positive.codepoints, [(0, 10), (11, 1114112)]) self.assertListEqual(char_class.negative.codepoints, []) self.assertEqual(len(char_class), sys.maxunicode) char_class.discard('\\s') self.assertListEqual(char_class.positive.codepoints, [(0, 9), (11, 13), (14, 32), (33, 1114112)]) self.assertEqual(len(char_class), sys.maxunicode - 3) char_class.discard('\\S') self.assertEqual(len(char_class), 0) char_class.clear() char_class.negative.codepoints.append(10) char_class.discard('\\s') self.assertListEqual(char_class.positive.codepoints, []) self.assertListEqual(char_class.negative.codepoints, [(9, 11), 13, 32]) char_class = CharacterClass('\t') char_class.complement() self.assertListEqual(char_class.negative.codepoints, [9]) char_class.discard('\\n') self.assertListEqual(char_class.positive.codepoints, []) self.assertListEqual(char_class.negative.codepoints, [(9, 11)]) self.assertEqual(len(char_class), sys.maxunicode - 1) class TestUnicodeData(unittest.TestCase): """Test the UnicodeData installation and its subsets.""" def test_unicode_categories(self): cps_of_categories = Counter( {k: len(unicode_category(k)) for k in CATEGORIES if len(k) > 1} ) expected_cps = Counter(category(chr(cp)) for cp in range(sys.maxunicode + 1)) self.assertEqual(cps_of_categories, expected_cps) if sys.version_info >= (3, 10): self.assertEqual(cps_of_categories.total(), sys.maxunicode + 1) else: self.assertEqual(sum(cps_of_categories.values()), sys.maxunicode + 1) self.assertEqual(min([min(unicode_category(k)) for k in CATEGORIES]), 0) self.assertEqual( max([max(unicode_category(k)) for k in CATEGORIES]), sys.maxunicode ) base_sets = [set(unicode_category(k)) for k in CATEGORIES if len(k) > 1] self.assertFalse(any(s.intersection(t) for s in base_sets for t in base_sets if s != t)) def test_unicodedata_category(self): for key in CATEGORIES: for cp in unicode_category(key): uc = category(chr(cp)) if key == uc or len(key) == 1 and key == uc[0]: continue 
self.fail( "Wrong category %r for code point %d (should be %r)." % (uc, cp, key) ) def test_unicode_block_key(self): self.assertEqual( UnicodeData._unicode_block_key('Latin-1 Supplement'), 'LATIN1SUPPLEMENT') self.assertEqual( UnicodeData._unicode_block_key('Latin Extended-B'), 'LATINEXTENDEDB' ) def test_basic_latin_unicode_block(self): with self.assertRaises(KeyError): unicode_block('Basic Latin') subset = unicode_block('BasicLatin') self.assertEqual(len(subset), 128) for cp in range(0, 0x80): self.assertIn(cp, subset) self.assertNotIn(-1, subset) self.assertNotIn(128, subset) self.assertSetEqual(subset, {x for x in range(0, 0x80)}) def test_latin1_supplement_unicode_block(self): with self.assertRaises(KeyError): unicode_block('Latin-1 Supplement') subset = unicode_block('Latin-1Supplement') self.assertEqual(len(subset), 128) for cp in range(0x80, 0x100): self.assertIn(cp, subset) self.assertNotIn(0x7F, subset) self.assertNotIn(0x100, subset) self.assertSetEqual(subset, {x for x in range(0x80, 0x100)}) def test_latin_extended_a_unicode_block(self): with self.assertRaises(KeyError): unicode_block('Latin Extended-A') subset = unicode_block('LatinExtended-A') self.assertEqual(len(subset), 128) for cp in range(0x100, 0x180): self.assertIn(cp, subset) self.assertNotIn(0xFF, subset) self.assertNotIn(0x180, subset) self.assertSetEqual(subset, {x for x in range(0x100, 0x180)}) def test_latin_extended_b_unicode_block(self): with self.assertRaises(KeyError): unicode_block('Latin Extended-B') subset = unicode_block('LatinExtended-B') self.assertEqual(len(subset), 208) for cp in range(0x180, 0x250): self.assertIn(cp, subset) self.assertNotIn(0x17F, subset) self.assertNotIn(0x250, subset) self.assertSetEqual(subset, {x for x in range(0x180, 0x250)}) def test_others_unicode_blocks(self): self.assertEqual(len(unicode_block('IPAExtensions')), 96) self.assertEqual(len(unicode_block('SpacingModifierLetters')), 80) 
self.assertEqual(len(unicode_block('CombiningDiacriticalMarks')), 112) self.assertEqual(len(unicode_block('GreekandCoptic')), 144) self.assertEqual(len(unicode_block('Cyrillic')), 256) # A block can have unassigned codepoints ncp = len(unicode_block('GreekandCoptic') - unicode_category('Cn')) self.assertEqual(ncp, 135) @unittest.skipIf(unidata_version[:2] >= '16', f"Unicode {unidata_version} is installed") def test_install_unicode_data(self): self.assertEqual(unidata_version, unicode_version()) self.assertNotIn(42971, unicode_category('Ll')) install_unicode_data('16.0.0') self.assertEqual('16.0.0', unicode_version()) self.assertIn(42971, unicode_category('Ll')) install_unicode_data() self.assertEqual(unidata_version, unicode_version()) self.assertNotIn(42971, unicode_category('Ll')) install_unicode_data('16.0.0', 'elementpath.regex.unicode_categories') self.assertEqual('16.0.0', unicode_version()) self.assertIn(42971, unicode_category('Ll')) install_unicode_data() self.assertEqual(unidata_version, unicode_version()) self.assertNotIn(42971, unicode_category('Ll')) with self.assertRaises(ValueError) as ctx: install_unicode_data('14.1.0') self.assertEqual(str(ctx.exception), "argument is not a valid Unicode version") with self.assertRaises(TypeError) as ctx: install_unicode_data(name_or_url='elementpath.regex.unicode_categories') self.assertEqual(str(ctx.exception), "you must specify a version to install") self.assertEqual(unidata_version, unicode_version()) @unittest.skipIf(unidata_version[:2] < '16', f"Unicode {unidata_version} is installed") def test_install_previous_unicode_data(self): self.assertEqual(unidata_version, unicode_version()) self.assertIn(42971, unicode_category('Ll')) install_unicode_data('15.0.0') self.assertEqual('15.0.0', unicode_version()) self.assertNotIn(42971, unicode_category('Ll')) install_unicode_data() self.assertEqual(unidata_version, unicode_version()) self.assertIn(42971, unicode_category('Ll')) install_unicode_data('15.0.0', 
'elementpath.regex.unicode_categories') self.assertEqual('15.0.0', unicode_version()) self.assertNotIn(42971, unicode_category('Ll')) install_unicode_data() self.assertEqual(unidata_version, unicode_version()) self.assertIn(42971, unicode_category('Ll')) with self.assertRaises(ValueError) as ctx: install_unicode_data('14.1.0') self.assertEqual(str(ctx.exception), "argument is not a valid Unicode version") with self.assertRaises(TypeError) as ctx: install_unicode_data(name_or_url='elementpath.regex.unicode_categories') self.assertEqual(str(ctx.exception), "you must specify a version to install") self.assertEqual(unidata_version, unicode_version()) @unittest.skipUnless('TEST_UNICODE_INSTALLATION' in os.environ, "Skip UnicodeData.txt installation") def test_unicode_data_installation_from_source(self): self.assertEqual(unidata_version, unicode_version()) self.assertIn(42998, unicode_category('Ll')) version = os.environ.get('TEST_UNICODE_INSTALLATION') version_info = tuple(map(int, version.split('.'))) self.assertLess(version_info, (13, 0, 0)) install_unicode_data( version, f'https://www.unicode.org/Public/{version}/ucd/UnicodeData.txt' ) self.assertEqual(version, unicode_version()) self.assertNotIn(42998, unicode_category('Ll')) install_unicode_data() self.assertEqual(unidata_version, unicode_version()) self.assertIn(42998, unicode_category('Ll')) class TestPatterns(unittest.TestCase): """ Test of specific regex patterns and their application. 
""" def test_issue_079(self): # Do not escape special characters in character class regex = translate_pattern('[^\n\t]+', anchors=False) self.assertEqual(regex, '^([^\t\n]+)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertIsNone(pattern.search('first\tsecond\tthird')) self.assertEqual(pattern.search('first second third').group(0), 'first second third') def test_dot_wildcard(self): regex = translate_pattern('.+', anchors=False) self.assertEqual(regex, '^([^\\r\\n]+)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertIsNone(pattern.search('line1\rline2\r')) self.assertIsNone(pattern.search('line1\nline2')) self.assertIsNone(pattern.search('')) self.assertIsNotNone(pattern.search('\\')) self.assertEqual(pattern.search('abc').group(0), 'abc') regex = translate_pattern('.+T.+(Z|[+-].+)', anchors=False) self.assertEqual(regex, '^([^\\r\\n]+T[^\\r\\n]+(Z|[\\+\\-][^\\r\\n]+))$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('12T0A3+36').group(0), '12T0A3+36') self.assertEqual(pattern.search('12T0A3Z').group(0), '12T0A3Z') self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('12T0A3Z2')) def test_not_spaces(self): regex = translate_pattern(r"[\S' ']{1,10}", anchors=False) self.assertEqual( regex, "^([\x00-\x08\x0b\x0c\x0e-\x1f!-\U0010ffff ']{1,10})$(?!\\n\\Z)" ) pattern = re.compile(regex) # self.assertIsNone(pattern.search('alpha\r')) self.assertEqual(pattern.search('beta').group(0), 'beta') self.assertIsNone(pattern.search('beta\n')) self.assertIsNone(pattern.search('beta\n ')) self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('over the maximum length!')) self.assertIsNotNone(pattern.search('\\')) self.assertEqual(pattern.search('abc').group(0), 'abc') def test_category_escape(self): regex = translate_pattern('^\\p{IsBasicLatin}*$') self.assertEqual(regex, '^[\x00-\x7f]*$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('').group(0), '') 
self.assertEqual(pattern.search('e').group(0), 'e') self.assertIsNone(pattern.search('è')) regex = translate_pattern('^[\\p{IsBasicLatin}\\p{IsLatin-1Supplement}]*$') self.assertEqual(regex, '^[\x00-\xff]*$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('e').group(0), 'e') self.assertEqual(pattern.search('è').group(0), 'è') self.assertIsNone(pattern.search('Ĭ')) def test_digit_shortcut(self): regex = translate_pattern(r'\d{1,3}\.\d{1,2}', anchors=False) self.assertEqual(regex, r'^(\d{1,3}\.\d{1,2})$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('12.40').group(0), '12.40') self.assertEqual(pattern.search('867.00').group(0), '867.00') self.assertIsNone(pattern.search('867.00\n')) self.assertIsNone(pattern.search('867.00 ')) self.assertIsNone(pattern.search('867.000')) self.assertIsNone(pattern.search('1867.0')) self.assertIsNone(pattern.search('a1.13')) regex = translate_pattern(r'[-+]?(\d+|\d+(\.\d+)?%)', anchors=False) self.assertEqual(regex, r'^([\+\-]?(\d+|\d+(\.\d+)?%))$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('78.8%').group(0), '78.8%') self.assertIsNone(pattern.search('867.00')) def test_character_class_reordering(self): regex = translate_pattern('[A-Z ]', anchors=False) self.assertEqual(regex, '^([ A-Z])$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('A').group(0), 'A') self.assertEqual(pattern.search('Z').group(0), 'Z') self.assertEqual(pattern.search('Q').group(0), 'Q') self.assertEqual(pattern.search(' ').group(0), ' ') self.assertIsNone(pattern.search(' ')) self.assertIsNone(pattern.search('AA')) regex = translate_pattern(r'[0-9.,DHMPRSTWYZ/:+\-]+', anchors=False) self.assertEqual(regex, r'^([\+-\-\.-:DHMPR-TWYZ]+)$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('12,40').group(0), '12,40') self.assertEqual(pattern.search('YYYY:MM:DD').group(0), 'YYYY:MM:DD') self.assertIsNone(pattern.search('')) 
self.assertIsNone(pattern.search('C')) regex = translate_pattern('[^: \n\r\t]+', anchors=False) self.assertEqual(regex, '^([^\t\n\r :]+)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('56,41').group(0), '56,41') self.assertIsNone(pattern.search('56,41\n')) self.assertIsNone(pattern.search('13:20')) regex = translate_pattern(r'^[A-Za-z0-9_\-]+(:[A-Za-z0-9_\-]+)?$') self.assertEqual(regex, r'^[\-0-9A-Z_a-z]+(:[\-0-9A-Z_a-z]+)?$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('fa9').group(0), 'fa9') self.assertIsNone(pattern.search('-x_1:_tZ-\n')) self.assertEqual(pattern.search('-x_1:_tZ-').group(0), '-x_1:_tZ-') self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('+78')) regex = translate_pattern(r'[!%\^\*@~;#,|/]', anchors=False) self.assertEqual(regex, r'^([!#%\*,/;@\^\|~])$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('#').group(0), '#') self.assertEqual(pattern.search('!').group(0), '!') self.assertEqual(pattern.search('^').group(0), '^') self.assertEqual(pattern.search('|').group(0), '|') self.assertEqual(pattern.search('*').group(0), '*') self.assertIsNone(pattern.search('**')) self.assertIsNone(pattern.search('b')) self.assertIsNone(pattern.search('')) regex = translate_pattern('[A-Za-z]+:[A-Za-z][A-Za-z0-9\\-]+', anchors=False) self.assertEqual(regex, '^([A-Za-z]+:[A-Za-z][\\-0-9A-Za-z]+)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('zk:xy-9s').group(0), 'zk:xy-9s') self.assertIsNone(pattern.search('xx:y')) def test_occurrences_qualifiers(self): regex = translate_pattern('#[0-9a-fA-F]{3}([0-9a-fA-F]{3})?', anchors=False) self.assertEqual(regex, r'^(#[0-9A-Fa-f]{3}([0-9A-Fa-f]{3})?)$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('#F3D').group(0), '#F3D') self.assertIsNone(pattern.search('#F3D\n')) self.assertEqual(pattern.search('#F3DA30').group(0), '#F3DA30') self.assertIsNone(pattern.search('#F3')) 
self.assertIsNone(pattern.search('#F3D ')) self.assertIsNone(pattern.search('F3D')) self.assertIsNone(pattern.search('')) def test_or_operator(self): regex = translate_pattern('0|1', anchors=False) self.assertEqual(regex, r'^(0|1)$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('0').group(0), '0') self.assertEqual(pattern.search('1').group(0), '1') self.assertIsNone(pattern.search('1\n')) self.assertIsNone(pattern.search('')) self.assertIsNone(pattern.search('2')) self.assertIsNone(pattern.search('01')) self.assertIsNone(pattern.search('1\n ')) regex = translate_pattern(r'\d+[%]|\d*\.\d+[%]', anchors=False) self.assertEqual(regex, r'^(\d+[%]|\d*\.\d+[%])$(?!\n\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('99%').group(0), '99%') self.assertEqual(pattern.search('99.9%').group(0), '99.9%') self.assertEqual(pattern.search('.90%').group(0), '.90%') self.assertIsNone(pattern.search('%')) self.assertIsNone(pattern.search('90.%')) regex = translate_pattern('([ -~]|\n|\r|\t)*', anchors=False) self.assertEqual(regex, '^(([ -~]|\n|\r|\t)*)$(?!\\n\\Z)') pattern = re.compile(regex) self.assertEqual(pattern.search('ciao\t-~ ').group(0), 'ciao\t-~ ') self.assertEqual(pattern.search('\r\r').group(0), '\r\r') self.assertEqual(pattern.search('\n -.abc').group(0), '\n -.abc') self.assertIsNone(pattern.search('à')) self.assertIsNone(pattern.search('\t\n à')) def test_character_class_shortcuts(self): regex = translate_pattern(r"^[\i-[:]][\c-[:]]*$") pattern = re.compile(regex) self.assertEqual(pattern.search('x11').group(0), 'x11') self.assertIsNone(pattern.search('3a')) regex = translate_pattern(r"^\w*$") pattern = re.compile(regex) self.assertEqual(pattern.search('aA_x7').group(0), 'aA_x7') self.assertIsNone(pattern.search('.')) self.assertIsNone(pattern.search('-')) regex = translate_pattern(r"\W*", anchors=False) pattern = re.compile(regex) self.assertIsNone(pattern.search('aA_x7')) self.assertEqual(pattern.search('.-').group(0), '.-') 
regex = translate_pattern(r"^\d*$") pattern = re.compile(regex) self.assertEqual(pattern.search('6410').group(0), '6410') self.assertIsNone(pattern.search('a')) self.assertIsNone(pattern.search('-')) regex = translate_pattern(r"^\D*$") pattern = re.compile(regex) self.assertIsNone(pattern.search('6410')) self.assertEqual(pattern.search('a').group(0), 'a') self.assertEqual(pattern.search('-').group(0), '-') # Pull Request 114 regex = translate_pattern(r"^[\w]{0,5}$") pattern = re.compile(regex) self.assertEqual(pattern.search('abc').group(0), 'abc') self.assertIsNone(pattern.search('.')) regex = translate_pattern(r"^[\W]{0,5}$") pattern = re.compile(regex) self.assertEqual(pattern.search('.').group(0), '.') self.assertIsNone(pattern.search('abc')) def test_character_class_range(self): regex = translate_pattern('[bc-]') self.assertEqual(regex, r'[\-bc]') def test_character_class_subtraction(self): regex = translate_pattern('[a-z-[aeiuo]]') self.assertEqual(regex, '[b-df-hj-np-tv-z]') # W3C XSD 1.1 test group RegexTest_422 regex = translate_pattern('[^0-9-[a-zAE-Z]]') self.assertEqual(regex, '[^0-9AE-Za-z]') regex = translate_pattern(r'^([^0-9-[a-zAE-Z]]|[\w-[a-zAF-Z]])+$') pattern = re.compile(regex) self.assertIsNone(pattern.search('azBCDE1234567890BCDEFza')) self.assertEqual(pattern.search('BCD').group(0), 'BCD') def test_invalid_character_class(self): with self.assertRaises(RegexError) as ctx: translate_pattern('[[]') self.assertIn("invalid character '['", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('ab]d') self.assertIn("unexpected meta character ']'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('[abc\\1]') self.assertIn("illegal back-reference in character class", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('[--a]') self.assertIn("invalid character range '--'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('[a-z-[c-q') 
self.assertIn("unterminated character class", str(ctx.exception)) def test_empty_character_class(self): regex = translate_pattern('[a-[a-f]]', anchors=False) self.assertEqual(regex, r'^([^\w\W])$(?!\n\Z)') self.assertRaises(RegexError, translate_pattern, '[]') self.assertEqual(translate_pattern(r'[\w-[\w]]'), r'[^\w\W]') self.assertEqual(translate_pattern(r'[\s-[\s]]'), r'[^\w\W]') self.assertEqual(translate_pattern(r'[\c-[\c]]'), r'[^\w\W]') self.assertEqual(translate_pattern(r'[\i-[\i]]'), r'[^\w\W]') self.assertEqual(translate_pattern('[a-[ab]]'), r'[^\w\W]') self.assertEqual(translate_pattern('[^a-[^a]]'), r'[^\w\W]') def test_back_references(self): self.assertEqual(translate_pattern('(a)\\1'), '(a)\\1') self.assertEqual(translate_pattern('(a)\\11'), '(a)\\1[1]') regex = translate_pattern('((((((((((((a))))))))))))\\11') self.assertEqual(regex, '((((((((((((a))))))))))))\\11') with self.assertRaises(RegexError) as ctx: translate_pattern('(a)\\1', back_references=False) self.assertIn("not allowed escape sequence", str(ctx.exception)) def test_anchors(self): regex = translate_pattern('a^b') self.assertEqual(regex, 'a^b') regex = translate_pattern('a^b', anchors=False) self.assertEqual(regex, '^(a\\^b)$(?!\\n\\Z)') regex = translate_pattern('ab$') self.assertEqual(regex, 'ab$(?!\\n\\Z)') regex = translate_pattern('ab$', anchors=False) self.assertEqual(regex, '^(ab\\$)$(?!\\n\\Z)') def test_lazy_quantifiers(self): regex = translate_pattern('.*?') self.assertEqual(regex, '[^\\r\\n]*?') regex = translate_pattern('[a-z]{2,3}?') self.assertEqual(regex, '[a-z]{2,3}?') regex = translate_pattern('[a-z]*?') self.assertEqual(regex, '[a-z]*?') regex = translate_pattern('[a-z]*', lazy_quantifiers=False) self.assertEqual(regex, '[a-z]*') with self.assertRaises(RegexError) as ctx: translate_pattern('.*?', lazy_quantifiers=False) self.assertEqual(str(ctx.exception), "unexpected meta character '?' 
at position 2: '.*?'") with self.assertRaises(RegexError): translate_pattern('[a-z]{2,3}?', lazy_quantifiers=False) with self.assertRaises(RegexError): translate_pattern(r'[a-z]{2,3}?\s+', lazy_quantifiers=False) with self.assertRaises(RegexError): translate_pattern(r'[a-z]+?\s+', lazy_quantifiers=False) def test_invalid_quantifiers(self): with self.assertRaises(RegexError) as ctx: translate_pattern('{1}') self.assertIn("unexpected quantifier '{'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('.{1,2,3}') self.assertIn("invalid quantifier '{'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('*') self.assertIn("unexpected quantifier '*'", str(ctx.exception)) def test_invalid_hyphen(self): with self.assertRaises(RegexError) as ctx: translate_pattern('[a-b-c]') self.assertIn("unescaped character '-' at position 4", str(ctx.exception)) regex = translate_pattern('[a-b-c]', xsd_version='1.1') self.assertEqual(regex, '[\\-a-c]') self.assertEqual(translate_pattern('[-a-bc]'), regex) self.assertEqual(translate_pattern('[a-bc-]'), regex) def test_invalid_pattern_groups(self): with self.assertRaises(RegexError) as ctx: translate_pattern('(?.*)') self.assertIn("invalid '(?...)' extension notation", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('(.*))') self.assertIn("unbalanced parenthesis ')'", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('((.*)') self.assertIn("unterminated subpattern in expression", str(ctx.exception)) def test_extra_escapes(self): self.assertEqual(translate_pattern('^{2}alpha'), '(?:^){2}alpha') self.assertEqual(translate_pattern('alp^+ha'), 'alp(?:^)+ha') def test_verbose_patterns(self): regex = translate_pattern('\\ s*[a-z]+', flags=re.VERBOSE) self.assertEqual(regex, '\\s*[a-z]+') regex = translate_pattern('\\ p{ Is BasicLatin}+', flags=re.VERBOSE) self.assertEqual(regex, '[\x00-\x7f]+') def 
test_backslash_and_escapes(self): regex = translate_pattern('\\') self.assertEqual(regex, '\\') regex = translate_pattern('\\i') self.assertTrue(regex.startswith('[:A-Z_a-z')) regex = translate_pattern('\\I') self.assertTrue(regex.startswith('[^:A-Z_a-z')) regex = translate_pattern('\\c') self.assertTrue(regex.startswith('[-.0-9:A-Z_a-z')) regex = translate_pattern('\\C') self.assertTrue(regex.startswith('[^-.0-9:A-Z_a-z')) def test_block_escapes(self): regex = translate_pattern('\\p{P}') self.assertTrue(regex.startswith('[!-#%-')) regex = translate_pattern('\\P{P}') self.assertTrue(regex.startswith('[^!-#%-')) regex = translate_pattern('\\p{IsBasicLatin}') self.assertEqual(regex, '[\x00-\x7f]') regex = translate_pattern('\\p{IsBasicLatin}', flags=re.IGNORECASE) self.assertEqual(regex, '(?-i:[\x00-\x7f])') with self.assertRaises(RegexError) as ctx: translate_pattern('\\px') self.assertIn("a '{' expected", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('\\p{Pu') self.assertIn("truncated unicode block escape", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('\\p{Unknown}') self.assertIn("'Unknown' doesn't match any Unicode category", str(ctx.exception)) regex = translate_pattern('\\p{IsUnknown}', xsd_version='1.1') self.assertEqual(regex, '[\x00-\U0010fffe]') with self.assertRaises(RegexError) as ctx: translate_pattern('\\p{IsUnknown}') self.assertIn("'IsUnknown' doesn't match any Unicode block", str(ctx.exception)) def test_ending_newline_match(self): # Related to xmlschema's issue #223 regex = translate_pattern( pattern=r"\d{2}:\d{2}:\d{6,7}", back_references=False, lazy_quantifiers=False, anchors=False ) pattern = re.compile(regex) self.assertIsNotNone(pattern.match("38:36:000031")) self.assertIsNone(pattern.match("38:36:000031\n")) def test_possessive_quantifiers(self): # Note: possessive quantifiers (*+, ++, ?+, {m,n}+) are supported in Python 3.11+ with self.assertRaises(RegexError) as ctx: 
translate_pattern('^[abcd]*+$') self.assertIn("unexpected meta character '+' at position 8", str(ctx.exception)) with self.assertRaises(RegexError) as ctx: translate_pattern('^[abcd]{1,5}+$') self.assertIn("unexpected meta character '+' at position 12", str(ctx.exception)) if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_schema_context.py000066400000000000000000000167121476131650400245150ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import unittest from copy import copy from textwrap import dedent from elementpath import XPath2Parser, XPathSchemaContext from elementpath.datatypes import UntypedAtomic try: # noinspection PyPackageRequirements import xmlschema except (ImportError, AttributeError): xmlschema = None @unittest.skipIf(xmlschema is None, "xmlschema library required") class XMLSchemaContextTest(unittest.TestCase): @classmethod def setUpClass(cls): cls.schema1 = xmlschema.XMLSchema(dedent('''\ ''')) cls.schema2 = xmlschema.XMLSchema(dedent('''\ ''')) def test_name_token(self): parser = XPath2Parser(default_namespace="http://xpath.test/ns") schema_context = XPathSchemaContext(self.schema1) elem_a = self.schema1.elements['a'] token = parser.parse('a') context = copy(schema_context) element_node = context.root[0] self.assertIs(element_node.elem, elem_a) self.assertIs(element_node.xsd_type, elem_a.type) result = token.evaluate(context) self.assertListEqual(result, [element_node]) elem_b1 = elem_a.type.content[0] token = parser.parse('a/b1') context = copy(schema_context) element_node = context.root[0][0] self.assertIs(element_node.elem, elem_b1) self.assertIs(element_node.xsd_type, elem_b1.type) result = 
token.evaluate(context) self.assertListEqual(result, [element_node]) def test_colon_token(self): parser = XPath2Parser(namespaces={'tst': "http://xpath.test/ns"}) context = XPathSchemaContext(self.schema1) token = parser.parse('tst:a') self.assertEqual(token.symbol, ':') result = token.evaluate(copy(context)) self.assertListEqual(result, [context.root[0]]) token = parser.parse('tst:a/b1') self.assertEqual(token.symbol, '/') self.assertEqual(token[0].symbol, ':') result = token.evaluate(copy(context)) self.assertListEqual(result, [context.root[0][0]]) token = parser.parse('tst:a/tst:b1') result = token.evaluate(copy(context)) self.assertListEqual(result, []) token = parser.parse('tst:a/tst:b3') self.assertEqual(token.symbol, '/') self.assertEqual(token[0].symbol, ':') result = token.evaluate(copy(context)) self.assertListEqual(result, [context.root[0][2]]) def test_extended_name_token(self): parser = XPath2Parser(strict=False) context = XPathSchemaContext(self.schema1) token = parser.parse('{http://xpath.test/ns}a') self.assertEqual(token.symbol, '{') self.assertEqual(token[0].symbol, '(string)') self.assertEqual(token[1].symbol, '(name)') self.assertEqual(token[1].value, 'a') result = token.evaluate(context) self.assertListEqual(result, [context.root[0]]) def test_wildcard_token(self): parser = XPath2Parser(default_namespace="http://xpath.test/ns") context = XPathSchemaContext(self.schema1) elem_a = self.schema1.elements['a'] elem_b3 = self.schema1.elements['b3'] token = parser.parse('*') self.assertEqual(token.symbol, '*') result = token.evaluate(context) self.assertListEqual([e.value for e in result], [elem_a, elem_b3]) token = parser.parse('a/*') self.assertEqual(token.symbol, '/') self.assertEqual(token[0].symbol, '(name)') self.assertEqual(token[1].symbol, '*') result = token.evaluate(context) self.assertListEqual([e.value for e in result], elem_a.type.content[:]) def test_dot_shortcut_token(self): parser = 
XPath2Parser(default_namespace="http://xpath.test/ns") context = XPathSchemaContext(self.schema1) token = parser.parse('.') result = token.evaluate(context) self.assertListEqual(result, [context.root]) context = XPathSchemaContext(self.schema1, item=self.schema1) token = parser.parse('.') result = token.evaluate(context) self.assertListEqual(result, [context.root]) context = XPathSchemaContext(self.schema1, item=self.schema2) schema2_node = context.item token = parser.parse('.') result = token.evaluate(context) self.assertListEqual(result, [schema2_node]) def test_schema_variables(self): variable_types = {'a': 'item()', 'b': 'xs:integer?', 'c': 'xs:string'} parser = XPath2Parser( default_namespace="http://xpath.test/ns", variable_types=variable_types, schema=self.schema1.xpath_proxy, ) context = XPathSchemaContext(self.schema1) token = parser.parse('$a') result = token.evaluate(context) self.assertIsInstance(result, UntypedAtomic) self.assertEqual(result.value, '1') token = parser.parse('$b') result = token.evaluate(context) self.assertIsInstance(result, int) self.assertEqual(result, 1) token = parser.parse('$c') result = token.evaluate(context) self.assertIsInstance(result, str) self.assertEqual(result, ' alpha\t') token = parser.parse('$z') self.assertListEqual(token.evaluate(context), []) def test_not_applicable_functions(self): parser = XPath2Parser(default_namespace="http://xpath.test/ns") context = XPathSchemaContext(self.schema1) token = parser.parse("fn:collection('filepath')") self.assertListEqual(token.evaluate(context), []) token = parser.parse("fn:doc-available('tns1')") self.assertFalse(token.evaluate(context)) token = parser.parse("fn:root(.)") self.assertListEqual(token.evaluate(context), []) token = parser.parse("fn:id('ID21256')") self.assertListEqual(token.evaluate(context), []) token = parser.parse("fn:idref('ID21256')") self.assertListEqual(token.evaluate(context), []) def test_if_statement(self): parser = 
XPath2Parser(default_namespace="http://xpath.test/ns") context = XPathSchemaContext(self.schema1) token = parser.parse('if ($x > 1) then a/b1 else a/b2') result = token.evaluate(context) self.assertListEqual(result, [context.root[0][1]]) token = parser.parse('if ($x > xs:date("2010-01-01")) then a/b1 else a/b2') result = token.evaluate(context) self.assertListEqual(result, [context.root[0][1]]) if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_schema_proxy.py000066400000000000000000000442341476131650400242120ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import unittest import xml.etree.ElementTree as ElementTree import io from textwrap import dedent try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None from elementpath import AttributeNode, XPathContext, XPath2Parser, MissingContextError, \ get_node_tree, ElementNode from elementpath.namespaces import XML_LANG, XSD_NAMESPACE, XSD_ANY_ATOMIC_TYPE, XSD_NOTATION try: # noinspection PyPackageRequirements import xmlschema from xmlschema.xpath import XMLSchemaProxy except (ImportError, AttributeError): xmlschema = None try: from tests import xpath_test_class except ImportError: import xpath_test_class @unittest.skipIf(xmlschema is None, "xmlschema library required.") class XMLSchemaProxyTest(xpath_test_class.XPathTestCase): @classmethod def setUpClass(cls): cls.schema = xmlschema.XMLSchema(''' ''') def setUp(self): self.schema_proxy = XMLSchemaProxy(self.schema) self.parser = XPath2Parser(namespaces=self.namespaces, schema=self.schema_proxy) def test_abstract_xsd_schema(self): class GlobalMaps: types = {} attributes = {} elements = {} 
substitution_groups = {} class XsdSchema: tag = '{%s}schema' % XSD_NAMESPACE xsd_version = '1.1' maps = GlobalMaps() text = None @property def attrib(self): return {} def __iter__(self): return iter(()) def find(self, path, namespaces=None): return schema = XsdSchema() self.assertEqual(schema.tag, '{http://www.w3.org/2001/XMLSchema}schema') self.assertIsNone(schema.text) def test_schema_proxy_init(self): schema_src = """ """ schema_tree = ElementTree.parse(io.StringIO(schema_src)) self.assertIsInstance(XMLSchemaProxy(), XMLSchemaProxy) self.assertIsInstance(XMLSchemaProxy(xmlschema.XMLSchema(schema_src)), XMLSchemaProxy) with self.assertRaises(TypeError): XMLSchemaProxy(schema=schema_tree) with self.assertRaises(TypeError): XMLSchemaProxy(schema=xmlschema.XMLSchema(schema_src), base_element=schema_tree) with self.assertRaises(TypeError): XMLSchemaProxy(schema=xmlschema.XMLSchema(schema_src), base_element=schema_tree.getroot()) schema = xmlschema.XMLSchema(schema_src) with self.assertRaises(ValueError): XMLSchemaProxy(base_element=schema.elements['test_element']) def test_xmlschema_proxy(self): context = XPathContext( root=self.etree.XML('') ) self.wrong_syntax("schema-element(*)") self.wrong_name("schema-element(nil)") self.wrong_name("schema-element(xs:string)") self.check_value("schema-element(xs:complexType)", MissingContextError) self.check_value("self::schema-element(xs:complexType)", NameError, context) self.check_value("self::schema-element(xs:schema)", [context.item], context) self.check_tree("schema-element(xs:group)", '(schema-element (: (xs) (group)))') attribute = context.item = AttributeNode(XML_LANG, 'en') self.wrong_syntax("schema-attribute(*)") self.wrong_name("schema-attribute(nil)") self.wrong_name("schema-attribute(xs:string)") self.check_value("schema-attribute(xml:lang)", MissingContextError) self.check_value("schema-attribute(xml:lang)", NameError, context) self.check_value("self::schema-attribute(xml:lang)", [context.item], context) 
self.check_tree("schema-attribute(xsi:schemaLocation)", '(schema-attribute (: (xsi) (schemaLocation)))') token = self.parser.parse("self::schema-attribute(xml:lang)") context.item = attribute context.axis = 'attribute' self.assertEqual(list(token.select(context)), [context.item]) def test_bind_parser_method(self): schema_src = dedent(""" """) schema = xmlschema.XMLSchema(schema_src) schema_proxy = XMLSchemaProxy(schema=schema) parser = XPath2Parser(namespaces=self.namespaces) self.assertFalse(parser.is_schema_bound()) schema_proxy.bind_parser(parser) self.assertTrue(parser.is_schema_bound()) self.assertIs(schema_proxy, parser.schema) # To test AbstractSchemaProxy.bind_parser() parser = XPath2Parser(namespaces=self.namespaces) super(XMLSchemaProxy, schema_proxy).bind_parser(parser) self.assertIs(schema_proxy, parser.schema) super(XMLSchemaProxy, schema_proxy).bind_parser(parser) self.assertIs(schema_proxy, parser.schema) def test_schema_constructors(self): schema_src = dedent(""" """) schema = xmlschema.XMLSchema(schema_src) schema_proxy = XMLSchemaProxy(schema=schema) parser = XPath2Parser(namespaces=self.namespaces, schema=schema_proxy) with self.assertRaises(NameError) as ctx: parser.schema_constructor(XSD_ANY_ATOMIC_TYPE) self.assertIn('XPST0080', str(ctx.exception)) with self.assertRaises(NameError) as ctx: parser.schema_constructor(XSD_NOTATION) self.assertIn('XPST0080', str(ctx.exception)) token = parser.parse('stringType("apple")') self.assertEqual(token.symbol, 'stringType') self.assertEqual(token.label, 'constructor function') self.assertEqual(token.evaluate(), 'apple') token = parser.parse('stringType(())') self.assertEqual(token.symbol, 'stringType') self.assertEqual(token.label, 'constructor function') self.assertEqual(token.evaluate(), []) token = parser.parse('stringType(10)') self.assertEqual(token.symbol, 'stringType') self.assertEqual(token.label, 'constructor function') self.assertEqual(token.evaluate(), '10') token = parser.parse('stringType(.)') 
self.assertEqual(token.symbol, 'stringType') self.assertEqual(token.label, 'constructor function') token = parser.parse('intType(10)') self.assertEqual(token.symbol, 'intType') self.assertEqual(token.label, 'constructor function') self.assertEqual(token.evaluate(), 10) with self.assertRaises(ValueError) as ctx: parser.parse('intType(true())') self.assertIn('FORG0001', str(ctx.exception)) def test_get_context_method(self): schema_proxy = XMLSchemaProxy() self.assertIsInstance(schema_proxy.get_context(), XPathContext) self.assertIsInstance(super(XMLSchemaProxy, schema_proxy).get_context(), XPathContext) def test_get_type_api(self): schema_proxy = XMLSchemaProxy() self.assertIsNone(schema_proxy.get_type('unknown')) self.assertEqual(schema_proxy.get_type('{%s}string' % XSD_NAMESPACE), xmlschema.XMLSchema.builtin_types()['string']) def test_xsd_version_api(self): self.assertEqual(self.schema_proxy.xsd_version, '1.0') def test_find_api(self): schema_src = """ """ schema = xmlschema.XMLSchema(schema_src) schema_proxy = XMLSchemaProxy(schema=schema) self.assertEqual(schema_proxy.find('/test_element'), schema.elements['test_element']) def test_get_attribute_api(self): self.assertIs( self.schema_proxy.get_attribute("{http://xpath.test/ns}test_attribute"), self.schema_proxy._schema.maps.attributes["{http://xpath.test/ns}test_attribute"] ) def test_get_element_api(self): self.assertIs( self.schema_proxy.get_element("{http://xpath.test/ns}test_element"), self.schema_proxy._schema.maps.elements["{http://xpath.test/ns}test_element"] ) def test_get_substitution_group_api(self): self.assertIsNone(self.schema_proxy.get_substitution_group('x')) def test_is_instance_api(self): self.assertFalse(self.schema_proxy.is_instance(True, '{%s}integer' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance(5, '{%s}integer' % XSD_NAMESPACE)) self.assertFalse(self.schema_proxy.is_instance('alpha', '{%s}integer' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance('alpha', 
'{%s}string' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance('alpha beta', '{%s}token' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance('alpha', '{%s}Name' % XSD_NAMESPACE)) self.assertFalse(self.schema_proxy.is_instance('alpha beta', '{%s}Name' % XSD_NAMESPACE)) self.assertFalse(self.schema_proxy.is_instance('1alpha', '{%s}Name' % XSD_NAMESPACE)) self.assertTrue(self.schema_proxy.is_instance('alpha', '{%s}NCName' % XSD_NAMESPACE)) self.assertFalse(self.schema_proxy.is_instance('eg:alpha', '{%s}NCName' % XSD_NAMESPACE)) def test_cast_as_api(self): schema_proxy = XMLSchemaProxy() self.assertEqual(schema_proxy.cast_as('19', '{%s}short' % XSD_NAMESPACE), 19) def test_attributes_type(self): parser = XPath2Parser(namespaces=self.namespaces) token = parser.parse("@min le @max") context = XPathContext(self.etree.XML('')) self.assertTrue(token.evaluate(context)) context = XPathContext(self.etree.XML('')) self.assertTrue(token.evaluate(context)) schema = xmlschema.XMLSchema(''' ''') parser = XPath2Parser(namespaces=self.namespaces, schema=XMLSchemaProxy(schema, schema.elements['range'])) token = parser.parse("@min le @max") context = XPathContext( self.etree.XML(''), schema=parser.schema ) self.assertEqual(context.root.type_name, '{http://xpath.test/ns}intRange') self.assertEqual(context.root.attributes[0].type_name, '{http://www.w3.org/2001/XMLSchema}int') self.assertEqual(context.root.attributes[1].type_name, '{http://www.w3.org/2001/XMLSchema}int') self.assertTrue(token.evaluate(context)) context = XPathContext( self.etree.XML(''), schema=parser.schema ) self.assertFalse(token.evaluate(context)) schema = xmlschema.XMLSchema(''' ''') parser = XPath2Parser(namespaces=self.namespaces, schema=XMLSchemaProxy(schema, schema.elements['range'])) self.assertRaises(TypeError, parser.parse, '@min le @max') def test_elements_type(self): schema = xmlschema.XMLSchema(''' ''') parser = XPath2Parser(namespaces={'': "http://xpath.test/ns", 'xs': 
XSD_NAMESPACE}, schema=XMLSchemaProxy(schema))

        root = ElementTree.XML(
            ''
            'foo8true2.0'
        )
        root_node = get_node_tree(root, namespaces={'': "http://xpath.test/ns"})

        for node in root_node.iter():
            if isinstance(node, ElementNode):
                self.assertFalse(node.is_typed)

        root_node.apply_schema(parser.schema)

        for node in root_node.iter():
            if isinstance(node, ElementNode):
                self.assertTrue(node.is_typed)

    def test_elements_and_attributes_type(self):
        schema = xmlschema.XMLSchema(''' ''')
        parser = XPath2Parser(namespaces={'': "http://xpath.test/ns", 'xs': XSD_NAMESPACE},
                              schema=XMLSchemaProxy(schema))
        token = parser.parse("//b/@min lt //b/@max")

        root = self.etree.XML('')
        context = XPathContext(
            root, namespaces={'': "http://xpath.test/ns"}, schema=parser.schema
        )
        self.assertEqual(token.evaluate(context), [])

        root = self.etree.XML('30')
        context = XPathContext(
            root, namespaces={'': "http://xpath.test/ns"}, schema=parser.schema
        )
        self.assertEqual(token.evaluate(context), [])

        root = self.etree.XML(
            '30')
        context = XPathContext(
            root, namespaces={'': "http://xpath.test/ns"}, schema=parser.schema
        )
        self.assertTrue(token.evaluate(context))

        root = self.etree.XML(
            '30')
        context = XPathContext(
            root, namespaces={'': "http://xpath.test/ns"}, schema=parser.schema
        )
        self.assertFalse(token.evaluate(context))

    def test_issue_10(self):
        schema = xmlschema.XMLSchema(''' ''')

        # TODO: this test fails with xmlschema-1.0.17+; namespaces were added as a temporary fix.
        # A fix for xmlschema.xpath.ElementPathMixin._get_xpath_namespaces() is required.
root = schema.find('root', namespaces={'': 'http://xpath.test/ns#'}) self.assertEqual(getattr(root, 'tag', None), '{http://xpath.test/ns#}root') @unittest.skipIf(xmlschema is None or lxml_etree is None, "both xmlschema and lxml required") class LxmlXMLSchemaProxyTest(XMLSchemaProxyTest): etree = lxml_etree if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_selectors.py000066400000000000000000000136371476131650400235170ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import unittest import xml.etree.ElementTree as ElementTree from elementpath import select, iter_select, Selector, XPath2Parser try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None class XPathSelectorsTest(unittest.TestCase): etree = ElementTree @classmethod def setUpClass(cls) -> None: cls.root = cls.etree.XML('Dickens') def test_select_function(self): self.assertListEqual(select(self.root, 'text()'), ['Dickens']) self.assertEqual(select(self.root, '$a', variables={'a': 1}), 1) self.assertEqual( select(self.root, '$a', variables={'a': 1}, variable_types={'a': 'xs:decimal'}), 1 ) def test_iter_select_function(self): self.assertListEqual(list(iter_select(self.root, 'text()')), ['Dickens']) self.assertListEqual(list(iter_select(self.root, '$a', variables={'a': True})), [True]) def test_selector_class(self): selector = Selector('/A') self.assertEqual(repr(selector), "Selector(path='/A', parser=XPath2Parser)") self.assertEqual(selector.namespaces, XPath2Parser.DEFAULT_NAMESPACES) selector = Selector('text()') self.assertListEqual(selector.select(self.root), ['Dickens']) self.assertListEqual(list(selector.iter_select(self.root)), ['Dickens']) 
selector = Selector('$a', variables={'a': 1}) self.assertEqual(selector.select(self.root), 1) self.assertListEqual(list(selector.iter_select(self.root)), [1]) def test_issue_001(self): selector = Selector("//FullPath[ends-with(., 'Temp')]") self.assertListEqual(selector.select(self.etree.XML('')), []) self.assertListEqual(selector.select(self.etree.XML('')), []) root = self.etree.XML('High Temp') self.assertListEqual(selector.select(root), [root]) def test_issue_042(self): selector1 = Selector('text()') selector2 = Selector('sup[last()]/preceding-sibling::text()') root = self.etree.XML('a1b2c3') self.assertListEqual(selector1.select(root), selector2.select(root)) selector2 = Selector('sup[1]/following-sibling::text()') root = self.etree.XML('1b2c3d') self.assertListEqual(selector1.select(root), selector2.select(root)) def test_fragment_argument__issue_081(self): # xml1 contains the xml-stylesheet tag xml1 = b""" value """ # the same as xml1, but without the xml-stylesheet tag xml2 = b""" value """ root1 = self.etree.XML(xml1) root2 = self.etree.XML(xml2) query = "first/second" if hasattr(root1, 'xpath'): self.assertEqual(select(root1, query), []) else: self.assertEqual(select(root1, query), root1[0][:]) self.assertEqual(select(root1, query, fragment=True), root1[0][:]) self.assertEqual(select(root2, query), root2[0][:]) self.assertEqual(select(root1, query, fragment=False), []) self.assertEqual(select(root2, query, fragment=False), []) query = "root/first/second" self.assertEqual(select(root1, query, fragment=False), root1[0][:]) self.assertEqual(select(root2, query, fragment=False), root2[0][:]) @unittest.skipIf(lxml_etree is None, "The lxml library is not installed") class LxmlXPathSelectorsTest(XPathSelectorsTest): etree = lxml_etree def test_issue_058(self): tei = """ """ doc = self.etree.XML(tei.encode()) namespaces = {'': "http://www.tei-c.org/ns/1.0"} k = None for k, p in enumerate(select(doc, '//pb', namespaces), start=1): self.assertEqual(p.attrib['n'], 
f'page{k}') self.assertListEqual(p.xpath('./@n'), [f'page{k}']) self.assertListEqual(select(doc, './@n'), []) self.assertListEqual(select(p, './@n'), [f'page{k}']) self.assertListEqual(select(doc, './@n', item=p), [f'page{k}']) else: self.assertEqual(k, 2) def test_issue_074(self): root = lxml_etree.XML("") result = select(root, "trunk") self.assertListEqual(result, [root[0]]) # [] result = select(root, "/root/trunk") self.assertListEqual(result, [root[0]]) # [] root = lxml_etree.XML("") result = select(root, "trunk") self.assertListEqual(result, []) result = select(root, "root/trunk") self.assertListEqual(result, [root[0]]) # [] result = select(root, "/root/trunk") self.assertListEqual(result, [root[0]]) # [] if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_sequence_types.py000066400000000000000000000361111476131650400245400ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2022, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import unittest import io from textwrap import dedent from xml.etree import ElementTree try: import xmlschema except ImportError: xmlschema = None from elementpath.sequence_types import normalize_sequence_type, is_instance, \ is_sequence_type, match_sequence_type, is_sequence_type_restriction from elementpath import XPath2Parser, XPathContext from elementpath.xpath3 import XPath30Parser, XPath31Parser from elementpath.namespaces import XSD_NAMESPACE, XSD_UNTYPED_ATOMIC, \ XSD_ANY_ATOMIC_TYPE, XSD_ANY_SIMPLE_TYPE, XSI_NIL, XSD_STRING from elementpath.datatypes import UntypedAtomic from elementpath.xpath_nodes import CommentNode class SequenceTypesTest(unittest.TestCase): def test_normalize_sequence_type_function(self): self.assertEqual(normalize_sequence_type(' xs:integer + '), 'xs:integer+') self.assertEqual(normalize_sequence_type(' xs :integer + '), 'xs :integer+') # Invalid self.assertEqual(normalize_sequence_type(' element( * ) '), 'element(*)') self.assertEqual(normalize_sequence_type(' element( *,xs:int ) '), 'element(*, xs:int)') self.assertEqual(normalize_sequence_type(' \nfunction ( * )\t '), 'function(*)') self.assertEqual( normalize_sequence_type(' \nfunction ( item( ) * ) as xs:integer\t '), 'function(item()*) as xs:integer' ) def test_is_sequence_type_restriction_function(self): self.assertTrue(is_sequence_type_restriction('xs:error?', 'none')) self.assertTrue(is_sequence_type_restriction('empty-sequence()', 'none')) self.assertTrue(is_sequence_type_restriction('xs:error*', 'empty-sequence()')) self.assertFalse(is_sequence_type_restriction('xs:integer', 'item()*')) self.assertFalse(is_sequence_type_restriction('xs:integer', 'item()?')) self.assertFalse(is_sequence_type_restriction('xs:integer', 'item()')) self.assertFalse(is_sequence_type_restriction('xs:integer+', 'xs:integer*')) self.assertFalse(is_sequence_type_restriction('xs:integer+', 'xs:integer?')) self.assertTrue(is_sequence_type_restriction('xs:integer+', 
'xs:integer+')) self.assertTrue(is_sequence_type_restriction('xs:integer+', 'xs:integer')) self.assertFalse(is_sequence_type_restriction('xs:integer*', 'xs:integer?')) self.assertFalse(is_sequence_type_restriction('xs:integer*', 'xs:integer+')) self.assertTrue(is_sequence_type_restriction('xs:integer*', 'xs:integer*')) self.assertTrue(is_sequence_type_restriction('xs:integer*', 'xs:integer')) self.assertFalse(is_sequence_type_restriction('xs:integer?', 'xs:integer*')) self.assertFalse(is_sequence_type_restriction('xs:integer?', 'xs:integer+')) self.assertTrue(is_sequence_type_restriction('xs:integer?', 'xs:integer?')) self.assertTrue(is_sequence_type_restriction('xs:integer?', 'xs:integer')) self.assertTrue(is_sequence_type_restriction('node()', 'element()')) self.assertFalse(is_sequence_type_restriction('element()', 'node()')) self.assertTrue(is_sequence_type_restriction('xs:anyAtomicType', 'xs:string')) self.assertFalse(is_sequence_type_restriction('xs:anyAtomicType', 'xs:unknown')) self.assertTrue(is_sequence_type_restriction('xs:string', 'xs:anyAtomicType')) self.assertTrue(is_sequence_type_restriction('xs:string', 'xs:token')) self.assertFalse(is_sequence_type_restriction('xs:string', 'xs:int')) self.assertFalse(is_sequence_type_restriction('xs:string', 'xs:unknown')) self.assertFalse(is_sequence_type_restriction('element()', 'xs:string')) self.assertFalse(is_sequence_type_restriction('function(*)', 'xs:string')) self.assertFalse(is_sequence_type_restriction( 'function(item()+) as xs:boolean', 'function(item()*) as xs:boolean' )) self.assertTrue(is_sequence_type_restriction( 'function(item()) as xs:boolean', 'function(item()*) as xs:boolean' )) self.assertFalse(is_sequence_type_restriction( 'function(item()+) as xs:boolean', 'function(item()) as xs:boolean' )) self.assertFalse(is_sequence_type_restriction( 'function(item()) as xs:boolean', 'function(item()) as xs:string' )) self.assertFalse(is_sequence_type_restriction( 'function(item(), item()) as 
xs:boolean', 'function(item()) as xs:boolean'
        ))

    def test_is_instance_function(self):
        self.assertTrue(is_instance(UntypedAtomic(1), XSD_UNTYPED_ATOMIC))
        self.assertFalse(is_instance(1, XSD_UNTYPED_ATOMIC))
        self.assertTrue(is_instance(1, XSD_ANY_ATOMIC_TYPE))
        self.assertFalse(is_instance([1], XSD_ANY_ATOMIC_TYPE))
        self.assertTrue(is_instance(1, XSD_ANY_SIMPLE_TYPE))
        self.assertTrue(is_instance([1], XSD_ANY_SIMPLE_TYPE))

        self.assertTrue(is_instance('foo', '{%s}string' % XSD_NAMESPACE))
        self.assertFalse(is_instance(1, '{%s}string' % XSD_NAMESPACE))
        self.assertTrue(is_instance(1.0, '{%s}double' % XSD_NAMESPACE))
        self.assertFalse(is_instance(1.0, '{%s}float' % XSD_NAMESPACE))

        # Same checks with an explicit XSD 1.1 parser passed to is_instance()
        parser = XPath2Parser(xsd_version='1.1')
        self.assertTrue(is_instance(1.0, '{%s}double' % XSD_NAMESPACE, parser))
        self.assertFalse(is_instance(1.0, '{%s}float' % XSD_NAMESPACE, parser))
        self.assertRaises(KeyError, is_instance, 'foo', '{%s}unknown' % XSD_NAMESPACE)
        self.assertRaises(KeyError, is_instance, 'foo', '{%s}unknown' % XSD_NAMESPACE, parser)
        self.assertRaises(KeyError, is_instance, 'foo', 'tst:unknown')
        self.assertRaises(KeyError, is_instance, 'foo', 'tst:unknown', parser)

        self.assertTrue(is_instance(None, '{%s}error' % XSD_NAMESPACE))
        self.assertTrue(is_instance([], '{%s}error' % XSD_NAMESPACE))
        self.assertFalse(is_instance(1.0, '{%s}error' % XSD_NAMESPACE))

        self.assertTrue(is_instance(1.0, '{%s}numeric' % XSD_NAMESPACE))
        self.assertFalse(is_instance(True, '{%s}numeric' % XSD_NAMESPACE))
        self.assertFalse(is_instance('foo', '{%s}numeric' % XSD_NAMESPACE))

    def test_is_sequence_type_function(self):
        self.assertTrue(is_sequence_type('empty-sequence()'))
        self.assertTrue(is_sequence_type('xs:string'))
        self.assertTrue(is_sequence_type('xs:float+'))
        self.assertTrue(is_sequence_type('element()*'))
        self.assertTrue(is_sequence_type('item()?'))
        self.assertTrue(is_sequence_type('xs:untypedAtomic+'))

        self.assertFalse(is_sequence_type(10))
        self.assertFalse(is_sequence_type(''))
        self.assertFalse(is_sequence_type('empty-sequence()*'))
        self.assertFalse(is_sequence_type('unknown'))
        self.assertFalse(is_sequence_type('unknown?'))
        self.assertFalse(is_sequence_type('tns0:unknown'))

        self.assertTrue(is_sequence_type(' element( ) '))
        self.assertTrue(is_sequence_type(' element( * ) '))
        self.assertFalse(is_sequence_type(' element( *, * ) '))
        self.assertTrue(is_sequence_type('element(A)'))
        self.assertTrue(is_sequence_type('element(A, xs:date)'))
        self.assertTrue(is_sequence_type('element(*, xs:date)'))
        self.assertFalse(is_sequence_type('element(A, B, xs:date)'))
        self.assertTrue(is_sequence_type('document-node(element(*, xs:date))'))
        self.assertFalse(is_sequence_type('document-node(element(*, xs:date)'))
        self.assertFalse(is_sequence_type('document-node(xs:date)'))

        parser = XPath2Parser()
        self.assertFalse(is_sequence_type('function(*)', parser))
        self.assertFalse(is_sequence_type('function(xs:string)', parser))
        self.assertFalse(is_sequence_type('map(xs:string, xs:string)', parser))
        self.assertFalse(is_sequence_type('array(xs:string)', parser))

        parser = XPath30Parser()
        self.assertTrue(is_sequence_type('function(*)', parser))
        self.assertTrue(is_sequence_type('function(xs:string)', parser))
        self.assertFalse(is_sequence_type('function(xs:string', parser))
        self.assertFalse(is_sequence_type('map(xs:string, xs:string)', parser))
        self.assertFalse(is_sequence_type('array(xs:string)', parser))

        parser = XPath31Parser()
        self.assertTrue(is_sequence_type('function(*)', parser))
        self.assertTrue(is_sequence_type('map(xs:string, xs:string)', parser))
        self.assertFalse(is_sequence_type('map(xs:string, xs:string', parser))
        self.assertTrue(is_sequence_type('array(xs:string)', parser))

        # Without a parser argument, the latest supported XPath version is assumed
        self.assertTrue(is_sequence_type('function(*)'))
        self.assertTrue(is_sequence_type('map(xs:string, xs:string)'))
        self.assertFalse(is_sequence_type('map(xs:string, xs:string'))
        self.assertTrue(is_sequence_type('array(xs:string)'))
self.assertTrue(is_sequence_type('function(xs:int) as xs:int')) self.assertFalse(is_sequence_type('function(xs:unknown) as xs:int')) self.assertFalse(is_sequence_type('function(xs:int) as xs:unknown')) self.assertTrue(is_sequence_type('function(xs:int, ...) as xs:int')) self.assertTrue(is_sequence_type('function(xs:int, function(*)) as xs:int')) self.assertTrue(is_sequence_type('function(function(*), xs:int) as xs:int')) self.assertTrue( is_sequence_type('function(xs:int, function(xs:int) as xs:int) as xs:int') ) self.assertFalse( is_sequence_type('function(function(xs:int) as xs:int, xs:int) as xs:int') ) def test_match_sequence_type_function(self): self.assertTrue(match_sequence_type(None, 'empty-sequence()')) self.assertTrue(match_sequence_type([], 'empty-sequence()')) self.assertFalse(match_sequence_type('', 'empty-sequence()')) self.assertFalse(match_sequence_type('', 'empty-sequence()')) context = XPathContext(ElementTree.XML('1')) root = context.root self.assertTrue(match_sequence_type(root, 'element()')) self.assertTrue(match_sequence_type(root, 'element(root)')) self.assertFalse(match_sequence_type(root, 'element(foo)')) self.assertTrue(match_sequence_type(root, 'element(root, xs:untyped)')) self.assertTrue(match_sequence_type(root, 'element(root, xs:untyped?)')) if xmlschema is not None: schema = xmlschema.XMLSchema(dedent('''\ ''')) root.xsd_type = schema.maps.types[XSD_STRING] self.assertFalse(match_sequence_type(root, 'element(root, xs:untyped)')) root.xsd_type = None root.obj.attrib[XSI_NIL] = 'true' self.assertFalse(match_sequence_type(root, 'element(root, xs:untyped)')) self.assertTrue(match_sequence_type(root, 'element(root, xs:untyped?)')) root.elem.attrib.pop(XSI_NIL) self.assertFalse(match_sequence_type(root, 'element(root, xs:string)')) self.assertFalse(match_sequence_type(root, 'element(root/e1, xs:string)')) self.assertTrue(match_sequence_type(root, 'element(root, xs:untypedAtomic)')) self.assertFalse(match_sequence_type(root, 
'element(xs:root, xs:untypedAtomic)')) with self.assertRaises(NameError): match_sequence_type(root, 'element(root, xs:unknown)') parser = XPath2Parser() self.assertFalse(match_sequence_type(root, 'element(xs:root)', parser)) self.assertFalse(match_sequence_type(root, 'element(tns:root)', parser)) self.assertFalse(match_sequence_type(1.0, 'element()')) self.assertTrue(match_sequence_type([root], 'element()')) self.assertTrue(match_sequence_type(root, 'element()?')) self.assertTrue(match_sequence_type(root, 'element()+')) self.assertTrue(match_sequence_type(root, 'element()*')) self.assertFalse(match_sequence_type(root[:], 'element()')) self.assertFalse(match_sequence_type(root[:], 'element()?')) self.assertTrue(match_sequence_type(root[:], 'element()+')) self.assertTrue(match_sequence_type(root[:], 'element()*')) self.assertTrue(match_sequence_type(root, 'element(*)')) document = ElementTree.parse(io.StringIO('')) context = XPathContext(document) root = context.root self.assertTrue(match_sequence_type(root, 'document-node(element())')) self.assertFalse(match_sequence_type(root, 'document-node(element(A))')) parser = XPath2Parser() self.assertTrue(match_sequence_type(UntypedAtomic(1), 'xs:untypedAtomic')) self.assertFalse(match_sequence_type(1, 'xs:untypedAtomic')) self.assertTrue(match_sequence_type('1', 'xs:string')) self.assertFalse(match_sequence_type(1, 'xs:string')) with self.assertRaises(NameError) as ctx: match_sequence_type('1', 'xs:unknown', parser) self.assertIn('XPST0051', str(ctx.exception)) with self.assertRaises(NameError) as ctx: match_sequence_type('1', 'tns0:string', parser) self.assertIn('XPST0051', str(ctx.exception)) token = parser.parse('true()') self.assertFalse(match_sequence_type(1.0, 'function(*)')) self.assertTrue(match_sequence_type(token, 'function(*)')) self.assertTrue(match_sequence_type(token, 'function() as xs:boolean')) self.assertFalse(match_sequence_type(token, 'function() as xs:int')) parser = XPath31Parser() 
self.assertFalse(match_sequence_type(1.0, 'array(*)')) self.assertFalse(match_sequence_type(1.0, 'map(*)')) token = parser.parse('[1, 2, 3]') self.assertTrue(match_sequence_type(token, 'array(*)')) self.assertTrue(match_sequence_type(token, 'array(xs:integer)')) self.assertFalse(match_sequence_type(token, 'array(xs:string)')) self.assertFalse(match_sequence_type(token, 'map(*)')) token = parser.parse('map{1: 2}') self.assertFalse(match_sequence_type(token, 'array(*)')) self.assertTrue(match_sequence_type(token, 'map(*)')) self.assertTrue(match_sequence_type(token, 'map(xs:integer, xs:integer)')) self.assertFalse(match_sequence_type(token, 'map(xs:string, xs:integer)')) with self.assertRaises(SyntaxError): match_sequence_type(token, 'map(xs:integer+, xs:integer)') self.assertFalse(match_sequence_type('foo', 'xs:anyURI')) self.assertTrue(match_sequence_type('foo', 'xs:anyURI', strict=False)) comment = ElementTree.Comment('foo') comment_node = CommentNode(comment) self.assertTrue(match_sequence_type(comment_node, 'comment()')) self.assertFalse(match_sequence_type(comment_node, 'comment(*)')) if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_serialization.py000066400000000000000000000420711476131650400243630ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2023, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import decimal import sys import unittest from textwrap import dedent from xml.etree import ElementTree try: import xmlschema except ImportError: xmlschema = None from elementpath import get_node_tree from elementpath.datatypes import QName, UntypedAtomic from elementpath.xpath_nodes import TextNode, CommentNode, \ ProcessingInstructionNode from elementpath.xpath_tokens import XPathMap, XPathArray from elementpath.serialization import get_serialization_params, \ serialize_to_xml, serialize_to_json from elementpath.xpath3 import XPath31Parser class SerializationTest(unittest.TestCase): def test_get_serialization_params_from_map(self): parser = XPath31Parser() params = XPathMap(parser, items={}) result = get_serialization_params(params) self.assertEqual(result, {}) params = XPathMap(parser, items={1: 'xml', 'method': None}) result = get_serialization_params(params) self.assertEqual(result, {}) params = XPathMap(parser, items={'method': 'xml'}) result = get_serialization_params(params) self.assertEqual(result, {'method': 'xml'}) params = XPathMap(parser, items={'method': 'json'}) result = get_serialization_params(params) self.assertEqual(result, {'method': 'json'}) params = XPathMap(parser, items={'method': 'adaptive'}) result = get_serialization_params(params) self.assertEqual(result, {'method': 'adaptive'}) params = XPathMap(parser, items={'indent': True}) result = get_serialization_params(params) self.assertEqual(result, {'indent': True}) params = XPathMap(parser, items={'indent': False}) result = get_serialization_params(params) self.assertEqual(result, {'indent': False}) params = XPathMap(parser, items={'indent': UntypedAtomic('true')}) result = get_serialization_params(params) self.assertEqual(result, {'indent': True}) params = XPathMap(parser, items={'indent': UntypedAtomic('false')}) result = get_serialization_params(params) self.assertEqual(result, {'indent': False}) params = XPathMap(parser, items={'indent': 'yes'}) with 
self.assertRaises(TypeError) as ctx: get_serialization_params(params) self.assertIn('XPTY0004', str(ctx.exception)) params = XPathMap(parser, items={'method': 'other'}) with self.assertRaises(ValueError) as ctx: get_serialization_params(params) self.assertIn('SEPM0017', str(ctx.exception)) params = XPathMap(parser, items={'use-character-maps': True}) with self.assertRaises(TypeError) as ctx: get_serialization_params(params) self.assertIn('XPTY0004', str(ctx.exception)) params = XPathMap(parser, items={'omit-xml-declaration': True}) result = get_serialization_params(params) self.assertEqual(result, {'xml_declaration': False}) params = XPathMap(parser, items={'omit-xml-declaration': False}) result = get_serialization_params(params) self.assertEqual(result, {'xml_declaration': True}) params = XPathMap(parser, items={'omit-xml-declaration': 'no'}) with self.assertRaises(TypeError) as ctx: get_serialization_params(params) self.assertIn('XPTY0004', str(ctx.exception)) params = XPathMap(parser, items={'item-separator': ';'}) result = get_serialization_params(params) self.assertEqual(result, {'item_separator': ';'}) params = XPathMap(parser, items={'item-separator': UntypedAtomic(' ')}) result = get_serialization_params(params) self.assertEqual(result, {'item_separator': ' '}) params = XPathMap(parser, items={'item-separator': True}) with self.assertRaises(TypeError) as ctx: get_serialization_params(params) self.assertIn('XPTY0004', str(ctx.exception)) params = XPathMap(parser, items={'encoding': 'ISO-8859-1'}) result = get_serialization_params(params) self.assertEqual(result, {'encoding': 'ISO-8859-1'}) params = XPathMap(parser, items={'encoding': False}) with self.assertRaises(TypeError) as ctx: get_serialization_params(params) self.assertIn('XPTY0004', str(ctx.exception)) params = XPathMap(parser, items={'allow-duplicate-names': True}) result = get_serialization_params(params) self.assertEqual(result, {'allow_duplicate_names': True}) params = XPathMap(parser, 
items={'allow-duplicate-names': False}) result = get_serialization_params(params) self.assertEqual(result, {'allow_duplicate_names': False}) params = XPathMap(parser, items={'allow-duplicate-names': 'false'}) with self.assertRaises(TypeError) as ctx: get_serialization_params(params) self.assertIn('XPTY0004', str(ctx.exception)) params = XPathMap(parser, items={'json-node-output-method': 'xml'}) result = get_serialization_params(params) self.assertEqual(result, {'json-node-output-method': 'xml'}) params = XPathMap(parser, items={'json-node-output-method': True}) with self.assertRaises(TypeError) as ctx: get_serialization_params(params) self.assertIn('XPTY0004', str(ctx.exception)) character_map = XPathMap(parser, {'$': '£'}) params = XPathMap(parser, {'use-character-maps': character_map}) result = get_serialization_params(params) self.assertEqual(result, {'character_map': {'$': '£'}}) character_map = XPathMap(parser, {'$': 1}) params = XPathMap(parser, {'use-character-maps': character_map}) with self.assertRaises(TypeError) as ctx: get_serialization_params(params) self.assertIn('XPTY0004', str(ctx.exception)) character_map = XPathMap(parser, {'$$': '£'}) params = XPathMap(parser, {'use-character-maps': character_map}) with self.assertRaises(ValueError) as ctx: get_serialization_params(params) self.assertIn('SEPM0016', str(ctx.exception)) params = XPathMap(parser, items={'standalone': False}) result = get_serialization_params(params) self.assertEqual(result, {'standalone': False}) params = XPathMap(parser, items={'standalone': True}) result = get_serialization_params(params) self.assertEqual(result, {'standalone': True}) params = XPathMap(parser, items={'standalone': []}) result = get_serialization_params(params) self.assertEqual(result, {}) params = XPathMap(parser, items={'standalone': 'no'}) result = get_serialization_params(params) self.assertEqual(result, {'standalone': False}) params = XPathMap(parser, items={'standalone': 'omit'}) result = 
get_serialization_params(params)
        self.assertEqual(result, {})

        params = XPathMap(parser, items={'standalone': ' no '})
        with self.assertRaises(TypeError) as ctx:
            get_serialization_params(params)
        self.assertIn('XPTY0004', str(ctx.exception))

        cdata = [
            QName(uri='http://xpath.test/ns', qname='a'),
            QName(uri='http://xpath.test/ns', qname='b'),
            QName(uri='', qname='c')
        ]
        params = XPathMap(parser, items={'cdata-section-elements': cdata})
        result = get_serialization_params(params)
        self.assertEqual(result, {'cdata_section': cdata})

        cdata_array = XPathArray(parser, cdata)
        params = XPathMap(parser, items={'cdata-section-elements': cdata_array})
        result = get_serialization_params(params)
        self.assertEqual(result, {'cdata_section': cdata})

        cdata.append('wrong')
        params = XPathMap(parser, items={'cdata-section-elements': cdata})
        with self.assertRaises(TypeError) as ctx:
            get_serialization_params(params)
        self.assertIn('XPTY0004', str(ctx.exception))

        params = XPathMap(parser, items={'suppress-indentation': QName('', 'foo')})
        result = get_serialization_params(params)
        self.assertEqual(result, {'suppress-indentation': QName('', 'foo')})

        params = XPathMap(parser, items={'suppress-indentation': [QName('', 'foo')]})
        result = get_serialization_params(params)
        self.assertListEqual(list(result.values()), [QName('', 'foo')])

        params = XPathMap(parser, items={'suppress-indentation': 'foo'})
        with self.assertRaises(TypeError) as ctx:
            get_serialization_params(params)
        self.assertIn('XPTY0004', str(ctx.exception))

    def test_get_serialization_params_from_element_tree(self):
        namespaces = {'output': "http://www.w3.org/2010/xslt-xquery-serialization"}

        root = ElementTree.XML(dedent("""\ """))
        params = get_node_tree(root, namespaces)
        result = get_serialization_params(params)
        self.assertEqual(result, {'method': 'xml', 'item_separator': '=='})

        root = ElementTree.XML(dedent("""\ """))
        params = get_node_tree(root, namespaces)
        result = get_serialization_params(params)
        self.assertEqual(result, {'character_map':
                                  {'$': '£'}})

        root = ElementTree.XML(dedent("""\ """))
        params = get_node_tree(root, namespaces)
        result = get_serialization_params(params)
        self.assertEqual(result, {'standalone': False, 'xml_declaration': True})

    def test_serialize_to_xml_function(self):
        root = ElementTree.XML("1")
        elements = [get_node_tree(root)]
        result = serialize_to_xml(elements)
        self.assertEqual(result, '1')

        root = ElementTree.XML("1")
        elements = [get_node_tree(root)]
        result = serialize_to_xml(elements, xml_declaration=True)
        if sys.version_info < (3, 8):
            self.assertEqual(result, '1')
        else:
            self.assertEqual(result, '\n1')

        cdata = [
            QName(uri='http://xpath.test/ns', qname='a'),
            QName(uri='http://xpath.test/ns', qname='b'),
            QName(uri='', qname='c')
        ]
        result = serialize_to_xml(elements, cdata_section=cdata)
        self.assertEqual(result, '1')

        root1 = ElementTree.XML("£$")
        root2 = ElementTree.XML("£")
        elements = [get_node_tree(root1), get_node_tree(root2)]
        result = serialize_to_xml(elements, character_map={'$': '£'})
        self.assertEqual(result, '££' '£')

        root1 = ElementTree.XML("")
        root2 = ElementTree.XML("")
        elements = [get_node_tree(root1), get_node_tree(root2)]
        result = serialize_to_xml(elements, item_separator=' ')
        self.assertEqual(result, ' ')

        root = ElementTree.XML("1234")
        root_node = get_node_tree(root)
        elements = [x for x in root_node.children if isinstance(x, TextNode)]
        result = serialize_to_xml(elements, item_separator='-')
        self.assertEqual(result, '1-2-3-4')

        parser = XPath31Parser()
        elements = [
            XPathArray(parser, [1, 2, 3]),
            XPathArray(parser, [4, 5, 6]),
        ]
        result = serialize_to_xml(elements, item_separator=';')
        self.assertEqual(result, '1;2;3;4;5;6')

        elements = list(range(10))
        result = serialize_to_xml(elements, item_separator=',')
        self.assertEqual(result, '0,1,2,3,4,5,6,7,8,9')

    def test_serialize_to_json_function(self):
        result = serialize_to_json([])
        self.assertEqual(result, 'null')

        result = serialize_to_json(["à"])
        self.assertEqual(result, '"\\u00e0"')

        result = serialize_to_json(["à"],
                                   encoding='ascii')
        self.assertEqual(result, '"\\u00e0"')

        root = ElementTree.XML("1")
        elements = [get_node_tree(root)]
        result = serialize_to_json(elements)
        self.assertEqual(result, '"1<\\/root>"')

        document = ElementTree.ElementTree(root)
        elements = [get_node_tree(document)]
        result = serialize_to_json(elements)
        self.assertEqual(result, '"1<\\/root>"')

        root = ElementTree.XML("£$")
        elements = [get_node_tree(root)]
        result = serialize_to_json(elements)
        self.assertEqual(result, r'"\u00a3<\/c1>$<\/c2><\/root>"')

        root = ElementTree.XML("1234")
        elements = [get_node_tree(root)]
        result = serialize_to_json(elements)
        self.assertEqual(result, '"1234<\\/root>"')

        root = ElementTree.XML("foo")
        root_node = get_node_tree(root)
        result = serialize_to_json(root_node.children)
        self.assertEqual(result, '"foo"')

        result = serialize_to_json(root_node.attributes)
        self.assertEqual(result, '"a=\\"bar\\""')

        comment = ElementTree.Comment(' foo')
        comment_node = CommentNode(content=comment)
        result = serialize_to_json([comment_node])
        self.assertEqual(result, '""')

        pi = ElementTree.ProcessingInstruction('foo bar')
        pi_node = ProcessingInstructionNode(target=pi)
        result = serialize_to_json([pi_node])
        self.assertEqual(result, '""')

        parser = XPath31Parser()
        elements = [XPathArray(parser, [1, 2, 3])]
        result = serialize_to_json(elements)
        self.assertEqual(result, '[1,2,3]')

        elements = [XPathArray(parser, [float('nan'), 2, 3])]
        with self.assertRaises(TypeError) as ctx:
            serialize_to_json(elements)
        self.assertIn('SERE0020', str(ctx.exception))

        elements = [XPathArray(parser, [1, 2, float('inf')])]
        with self.assertRaises(TypeError) as ctx:
            serialize_to_json(elements)
        self.assertIn('SERE0020', str(ctx.exception))

        with self.assertRaises(TypeError) as ctx:
            serialize_to_json([set()])
        self.assertIn('SERE0021', str(ctx.exception))

        elements = list(range(10))
        with self.assertRaises(TypeError) as ctx:
            serialize_to_json(elements)
        self.assertIn('SERE0023', str(ctx.exception))

        elements = [[1, 2, 3]]
        result =
serialize_to_json(elements)
        self.assertEqual(result, '[1,2,3]')

        parser = XPath31Parser()
        elements = [XPathMap(parser, [(1, 'one'), (2, 'two'), (3, 'three')])]
        result = serialize_to_json(elements)
        self.assertEqual(result, '{"1":"one","2":"two","3":"three"}')

        elements = [XPathMap(parser, [(1, ['one']), (2, 'two')])]
        result = serialize_to_json(elements)
        self.assertEqual(result, '{"1":"one","2":"two"}')

        elements = [XPathMap(parser, [(1, ['one', 'one']), (2, 'two')])]
        with self.assertRaises(TypeError) as ctx:
            serialize_to_json(elements)
        self.assertIn('SERE0023', str(ctx.exception))

        elements = [XPathMap(parser, [(QName('', 'one'), 1), ('one', 1)])]
        with self.assertRaises(ValueError) as ctx:
            serialize_to_json(elements)
        self.assertIn('SERE0022', str(ctx.exception))

        result = serialize_to_json(elements, allow_duplicate_names=True)
        self.assertEqual(result, '{"one":1,"one":1}')

        elements = [XPathMap(parser, [(QName('', 'one'), 1), ('two', 2)])]
        result = serialize_to_json(elements)
        self.assertEqual(result, '{"one":1,"two":2}')

        self.assertEqual(serialize_to_json([UntypedAtomic('9.0')]), '"9.0"')
        self.assertEqual(serialize_to_json([decimal.Decimal('9.0')]), '9.0')
        self.assertEqual(serialize_to_json([9.0]), '9.0')


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_tdop_parser.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
import re
import sys
from collections import namedtuple

from elementpath.tdop import _symbol_to_classname, ParseError, Token, \
    ParserMeta, Parser, MultiLabel


class TdopParserTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):

        class ExpressionParser(Parser):

            @classmethod
            def create_tokenizer(cls, symbol_table):
                return re.compile(
                    r'INCOMPATIBLE | (\d+) | (UNKNOWN|[+\-]) | (\w+) | (\S)| \s+',
                    flags=re.VERBOSE
                )

        ExpressionParser.literal('(integer)')
        ExpressionParser.register('(name)')

        @ExpressionParser.method(ExpressionParser.infix('+', bp=40))
        def evaluate_plus(self, context=None):
            return self[0].evaluate(context) + self[1].evaluate(context)

        @ExpressionParser.method(ExpressionParser.infix('-', bp=40))
        def evaluate_minus(self, context=None):
            return self[0].evaluate(context) - self[1].evaluate(context)

        cls.parser = ExpressionParser()

    def test_multi_label_class(self):
        label = MultiLabel('function', 'constructor function')
        self.assertEqual(label, 'function')
        self.assertEqual(label, 'constructor function')
        self.assertNotEqual(label, 'constructor')
        self.assertNotEqual(label, 'operator')
        self.assertEqual(str(label), 'function__constructor_function')
        self.assertEqual(repr(label), "MultiLabel('function', 'constructor function')")
        self.assertEqual(hash(label), hash(('function', 'constructor function')))

        self.assertIn(label, ['function'])
        self.assertNotIn(label, [])
        self.assertNotIn(label, ['not a function'])
        self.assertNotIn(label, {'function'})  # compares not equality but hash

        self.assertIn('function', label)
        self.assertIn('constructor', label)
        self.assertNotIn('axis', label)

        self.assertTrue(label.startswith('function'))
        self.assertTrue(label.startswith('constructor'))
        self.assertFalse(label.startswith('operator'))
        self.assertTrue(label.endswith('function'))
        self.assertFalse(label.endswith('constructor'))

    def test_symbol_to_classname_function(self):
        self.assertEqual(_symbol_to_classname('_cat10'), 'Cat10')
        self.assertEqual(_symbol_to_classname('&'), 'Ampersand')
        self.assertEqual(_symbol_to_classname('('), 'LeftParenthesis')
        self.assertEqual(_symbol_to_classname(')'), 'RightParenthesis')

        self.assertEqual(_symbol_to_classname('(name)'), 'Name')
        self.assertEqual(_symbol_to_classname('(name'), 'LeftParenthesisname')

        self.assertEqual(_symbol_to_classname('-'), 'HyphenMinus')
        self.assertEqual(_symbol_to_classname('_'), 'LowLine')
        self.assertEqual(_symbol_to_classname('-_'), 'HyphenMinusLowLine')
        self.assertEqual(_symbol_to_classname('--'), 'HyphenMinusHyphenMinus')

        self.assertEqual(_symbol_to_classname('my-api-call'), 'MyApiCall')
        self.assertEqual(_symbol_to_classname('call-'), 'Call')

    def test_create_tokenizer_method(self):
        FakeToken = namedtuple('Token', 'symbol pattern label')

        tokens = {
            FakeToken(symbol='(name)', pattern=None, label='literal'),
            FakeToken('call', pattern=r'\bcall\b(?=\s+\()', label='function'),
        }

        pattern = Parser.create_tokenizer({t.symbol: t for t in tokens})
        self.assertEqual(pattern.pattern,
                         '(\'[^\']*\'|"[^"]*"|(?:\\d+|\\.\\d+)(?:\\.\\d*)?(?:[Ee][+-]?\\d+)?)|'
                         '(\\bcall\\b(?=\\s+\\())|([A-Za-z0-9_]+)|(\\S)|\\s+')

        tokens = {
            FakeToken(symbol='(name)', pattern=None, label='literal'),
            FakeToken('call', pattern=r'\bcall\b(?=\s+\()', label='function'),
            FakeToken('+', pattern=None, label='operator'),
        }

        pattern = Parser.create_tokenizer({t.symbol: t for t in tokens})
        self.assertEqual(pattern.pattern,
                         '(\'[^\']*\'|"[^"]*"|(?:\\d+|\\.\\d+)(?:\\.\\d*)?(?:[Ee][+-]?\\d+)?)|'
                         '([\\+]|\\bcall\\b(?=\\s+\\())|([A-Za-z0-9_]+)|(\\S)|\\s+')

        # Check fix for issue #10
        tk = FakeToken('{http://www.w3.org/2000/09/xmldsig#}CryptoBinary', None, 'constructor')
        tokens.add(tk)
        pattern = Parser.create_tokenizer({t.symbol: t for t in tokens})
        if sys.version_info >= (3, 7):
            self.assertIn(r"(\{http://www\.w3\.org/2000/09/xmldsig\#\}", pattern.pattern)
        else:
            self.assertIn(r"(\{http\:\/\/www\.w3\.org\/2000\/09\/xmldsig\#\}", pattern.pattern)

    def test_tokenizer_items(self):
        self.assertListEqual(self.parser.tokenizer.findall('5 56'),
                             [('5', '', '', ''), ('', '', '', ''), ('56', '', '', '')])
        self.assertListEqual(self.parser.tokenizer.findall('5+56'),
                             [('5', '', '', ''), ('', '+', '', ''), ('56', '', '', '')])
        self.assertListEqual(self.parser.tokenizer.findall('xy'),
                             [('', '', 'xy', '')])
        self.assertListEqual(self.parser.tokenizer.findall('5x'),
                             [('5', '', '', ''), ('', '', 'x', '')])

    def test_incompatible_tokenizer(self):
        with self.assertRaises(RuntimeError) as ec:
            self.parser.parse('INCOMPATIBLE')
        self.assertIn("incompatible tokenizer", str(ec.exception))

    def test_string_repr(self):
        self.assertEqual(repr(self.parser), f'')
        self.assertEqual(str(self.parser), 'ExpressionParser()')

    def test_expression(self):
        token = self.parser.parse('10 + 6')
        self.assertEqual(token.evaluate(), 16)

    def test_iter_method(self):
        token = self.parser.parse('9 + 7 - 5')
        self.assertListEqual(list(tk.source for tk in token.iter()),
                             ['9', '9 + 7', '7', '9 + 7 - 5', '5'])
        self.assertListEqual(list(tk.source for tk in token.iter('(integer)')),
                             ['9', '7', '5'])
        self.assertListEqual(list(tk.source for tk in token.iter('(name)')), [])

        class SampleParser(Parser):
            pass

        @SampleParser.method(SampleParser.nullary('.'))
        def evaluate_self(_self, _context=None):
            return _self

        parser = SampleParser()
        token = parser.parse('.')
        self.assertListEqual(list(tk.source for tk in token.iter()), ['.'])
        self.assertListEqual(list(tk.source for tk in token.iter('.')), ['.'])
        self.assertListEqual(list(tk.source for tk in token.iter('..')), [])

    def test_syntax_errors(self):
        with self.assertRaises(ParseError) as ec:
            self.parser.parse('x')  # with nud()
        self.assertEqual(str(ec.exception), "unexpected name 'x'")

        with self.assertRaises(ParseError) as ec:
            self.parser.parse('5y')  # with led()
        self.assertEqual(str(ec.exception), "unexpected name 'y'")

        with self.assertRaises(ParseError) as ec:
            self.parser.parse('5 5')  # with expected()
        self.assertEqual(str(ec.exception), "unexpected literal 5")

    def
test_unused_token_helpers(self):
        token = self.parser.parse('10')
        self.assertIsNone(token.unexpected('+', '-'))
        with self.assertRaises(ParseError) as ec:
            token.unexpected('(integer)')
        self.assertEqual(str(ec.exception), "unexpected literal 10")

        self.assertIsInstance(token.wrong_type(), TypeError)
        self.assertIsInstance(token.wrong_value(), ValueError)

    def test_unknown_symbol(self):
        with self.assertRaises(ParseError) as ec:
            self.parser.parse('?')
        self.assertEqual(str(ec.exception), "unknown symbol '?'")

        with self.assertRaises(ParseError) as ec:
            self.parser.parse('UNKNOWN')
        self.assertEqual(str(ec.exception), "unexpected name 'UNKNOWN'")

        parser = self.parser.__class__()
        parser.symbol_table = parser.symbol_table.copy()
        parser.build()
        parser.symbol_table.pop('+')

        with self.assertRaises(ParseError) as ec:
            parser.parse('+')
        self.assertEqual(str(ec.exception), "unknown symbol '+'")

    def test_invalid_source(self):
        with self.assertRaises(ParseError) as ec:
            self.parser.parse(10)
        self.assertIn("invalid source type", str(ec.exception))

    def test_invalid_token(self):
        token = self.parser.symbol_table['(invalid)'](self.parser, '10e')
        self.assertEqual(str(token.wrong_syntax()), "invalid literal '10e'")

    def test_parser_position(self):
        parser = type(self.parser)()
        parser.source = ' 7 +\n 8 '
        parser.tokens = iter(parser.tokenizer.finditer(parser.source))
        self.assertEqual(parser.token.symbol, '(start)')

        parser.advance()
        self.assertEqual(parser.token.symbol, '(start)')
        self.assertEqual(parser.position, (1, 1))
        self.assertTrue(parser.is_source_start())
        self.assertTrue(parser.is_line_start())
        self.assertTrue(parser.is_spaced())

        parser.advance()
        self.assertNotEqual(parser.token.symbol, '(start)')
        self.assertEqual(parser.token.value, 7)
        self.assertEqual(parser.position, (1, 4))
        self.assertTrue(parser.is_source_start())
        self.assertTrue(parser.is_line_start())
        self.assertTrue(parser.is_spaced())

        parser.advance()
        self.assertEqual(parser.token.symbol, '+')
        self.assertEqual(parser.position, (1, 6))
        self.assertFalse(parser.is_source_start())
        self.assertFalse(parser.is_line_start())

        parser.advance()
        self.assertEqual(parser.token.value, 8)
        self.assertEqual(parser.position, (2, 2))
        self.assertFalse(parser.is_source_start())
        self.assertTrue(parser.is_line_start())
        self.assertTrue(parser.is_spaced())

        parser.source = ' 7 +'
        self.assertFalse(parser.is_spaced())

    def test_advance_until(self):
        parser = type(self.parser)()
        parser.source = ''
        parser.tokens = iter(parser.tokenizer.finditer(parser.source))
        parser.advance()

        with self.assertRaises(TypeError) as ec:
            parser.advance_until()
        self.assertEqual(str(ec.exception), "at least a stop symbol required!")

        with self.assertRaises(ParseError) as ec:
            parser.advance_until('+')
        self.assertEqual(str(ec.exception), "source is empty")

        parser = type(self.parser)()
        parser.source = '5 6 7 + 8'
        parser.tokens = iter(parser.tokenizer.finditer(parser.source))
        parser.advance()
        self.assertEqual(parser.next_token.symbol, '(integer)')
        self.assertEqual(parser.next_token.value, 5)
        parser.advance_until('+')
        self.assertEqual(parser.next_token.symbol, '+')

        parser = type(self.parser)()
        parser.source = '5 6 7 + 8'
        parser.tokens = iter(parser.tokenizer.finditer(parser.source))
        parser.advance()
        self.assertEqual(parser.next_token.symbol, '(integer)')
        self.assertEqual(parser.next_token.value, 5)
        parser.advance_until('*')
        self.assertEqual(parser.next_token.symbol, '(end)')

        parser = type(self.parser)()
        parser.source = '5 UNKNOWN'
        parser.tokens = iter(parser.tokenizer.finditer(parser.source))
        parser.advance()
        self.assertEqual(parser.next_token.symbol, '(integer)')
        self.assertEqual(parser.next_token.value, 5)
        with self.assertRaises(ParseError) as ec:
            parser.advance_until('UNKNOWN')
        self.assertEqual(str(ec.exception), "unknown symbol '(unknown)'")

    def test_unescape_helper(self):
        self.assertEqual(self.parser.unescape("'\\''"), "'")
        self.assertEqual(self.parser.unescape('"\\""'), '"')

    def test_invalid_parser_derivation(self):
        globals()['ExpressionParser'] =
self.parser.__class__
        try:
            with self.assertRaises(RuntimeError) as ec:
                class AnotherParser(Parser):
                    pass

                isinstance(AnotherParser, Parser)

            self.assertEqual(str(ec.exception),
                             "Multiple parser class definitions per module are not allowed")
        finally:
            del globals()['ExpressionParser']

    def test_new_parser_class(self):

        class FakeBase:
            pass

        class AnotherParser(FakeBase, metaclass=ParserMeta):
            pass

        self.assertIs(AnotherParser.token_base_class, Token)
        self.assertEqual(AnotherParser.literals_pattern.pattern,
                         r"""'[^']*'|"[^"]*"|(?:\d+|\.\d+)(?:\.\d*)?(?:[Ee][+-]?\d+)?""")

    def test_invalid_registrations(self):

        class AnotherParser(Parser):
            SYMBOLS = {'(integer)', '(name)'}

        with self.assertRaises(ValueError) as ec:
            AnotherParser.register(r'function \(')
        self.assertIn("a symbol can't contain whitespaces", str(ec.exception))

        with self.assertRaises(TypeError) as ec:
            AnotherParser.register(9)
        self.assertIn("A string or a", str(ec.exception))

        with self.assertRaises(ValueError) as ec:
            AnotherParser.register(self.parser.symbol_table['+'])
        self.assertIn("Token class ", str(ec.exception))
        self.assertIn("is not registered", str(ec.exception))

    def test_other_operators(self):

        class ExpressionParser(Parser):
            SYMBOLS = {'(integer)', '+', '++', '-', '*', '(name)'}

        ExpressionParser.prefix('++')
        ExpressionParser.postfix('+')
        ExpressionParser.nullary('(name)')

        @ExpressionParser.method(ExpressionParser.prefix('++', bp=90))
        def evaluate_increment(self_, context=None):
            return self_[0].evaluate(context) + 1

        @ExpressionParser.method(ExpressionParser.postfix('+', bp=90))
        def evaluate_plus(self_, context=None):
            return self_[0].evaluate(context) + 1

        @ExpressionParser.method(ExpressionParser.infixr('-', bp=50))
        def evaluate_minus(self_, context=None):
            return self_[0].evaluate(context) - self_[1].evaluate(context)

        @ExpressionParser.method('*', bp=70)
        def nud_mul(self_):
            for _ in range(3):
                self_.append(self_.parser.expression(rbp=70))
            return self_

        @ExpressionParser.method('*', bp=70)
        def
evaluate_mul(self_, context=None):
            return self_[0].evaluate(context) * \
                self_[1].evaluate(context) * self_[2].evaluate(context)

        ExpressionParser.literal('(integer)')
        ExpressionParser.register('(end)')

        with self.assertRaises(AttributeError) as ec:
            @ExpressionParser.method('*', bp=70)
            def foo_mul(self_):
                return None
        self.assertIn("has no attribute 'foo'", str(ec.exception))

        with self.assertRaises(TypeError) as ec:
            @ExpressionParser.method('*', bp=70)
            def label_mul(self_):
                return None
        self.assertIn("'label' is not a method of ", str(ec.exception))

        parser = ExpressionParser()
        token = parser.parse('foo')
        self.assertEqual(token.evaluate(), 'foo')
        token = parser.parse('++5')
        self.assertEqual(token.source, '++ 5')
        self.assertEqual(token.evaluate(), 6)

        with self.assertRaises(ParseError) as ec:
            parser.parse('1 ++ 5')
        self.assertEqual(str(ec.exception), "unexpected '++' prefix operator")

        token = parser.parse('8 +')
        self.assertEqual(token.source, '8 +')
        self.assertEqual(token.evaluate(), 9)

        token = parser.parse(' 8 - 5')
        self.assertEqual(token.source, '8 - 5')
        self.assertEqual(token.evaluate(), 3)

        token = parser.parse('* 8 2 5')
        self.assertEqual(token.source, '* 8 2 5')
        self.assertEqual(token.evaluate(), 80)


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_tree_builders.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2023, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
import io
import sys
import xml.etree.ElementTree as ElementTree
from textwrap import dedent

try:
    import lxml.etree as lxml_etree
except ImportError:
    lxml_etree = None

try:
    import xmlschema
except ImportError:
    xmlschema = None
else:
    xmlschema.XMLSchema.meta_schema.build()

from elementpath.tree_builders import build_node_tree, \
    build_lxml_node_tree, build_schema_node_tree
from elementpath.xpath_nodes import ElementNode, \
    DocumentNode, TextNode, CommentNode, ProcessingInstructionNode


XML_DATA = """\ child1 text1 child1 text2 """


class TreeBuildersTest(unittest.TestCase):
    namespaces = {'tns': "http://elementpath.test/ns"}

    def test_build_node_tree_with_element(self):
        root = ElementTree.XML(XML_DATA)
        node = build_node_tree(root, self.namespaces)

        self.assertIsInstance(node, ElementNode)
        self.assertEqual(len(node.children), 7)
        self.assertIsInstance(node.children[0], TextNode)
        self.assertIsInstance(node.children[1], ElementNode)
        self.assertIsInstance(node.children[2], TextNode)
        self.assertIsInstance(node.children[3], ElementNode)
        self.assertIsInstance(node.children[4], TextNode)
        self.assertIsInstance(node.children[5], ElementNode)
        self.assertIsInstance(node.children[6], TextNode)

        for k, node in enumerate(node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

    def test_build_node_tree_with_element_tree(self):
        root = ElementTree.parse(io.StringIO(XML_DATA))
        node = build_node_tree(root, self.namespaces)

        self.assertIsInstance(node, DocumentNode)
        self.assertEqual(node.position, 1)
        self.assertEqual(len(node.children), 1)

        self.assertIsInstance(node[0], ElementNode)
        self.assertEqual(node[0].position, 2)
        self.assertEqual(len(node[0].children), 7)
        self.assertIsInstance(node[0].children[0], TextNode)
        self.assertIsInstance(node[0].children[1], ElementNode)
        self.assertIsInstance(node[0].children[2], TextNode)
        self.assertIsInstance(node[0].children[3], ElementNode)
        self.assertIsInstance(node[0].children[4], TextNode)
        self.assertIsInstance(node[0].children[5], ElementNode)
        self.assertIsInstance(node[0].children[6], TextNode)

        for k, node in enumerate(node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

    @unittest.skipIf(sys.version_info <= (3, 8), "Comments not available in ElementTree")
    def test_build_node_tree_with_comments_and_pis(self):
        parser = ElementTree.XMLParser(
            target=ElementTree.TreeBuilder(
                insert_comments=True,
                insert_pis=True
            )
        )
        root = ElementTree.XML(XML_DATA, parser=parser)
        node = build_node_tree(root, self.namespaces)

        self.assertIsInstance(node, ElementNode)
        self.assertEqual(len(node.children), 11)
        self.assertIsInstance(node.children[0], TextNode)
        self.assertIsInstance(node.children[1], CommentNode)
        self.assertIsInstance(node.children[2], TextNode)
        self.assertIsInstance(node.children[3], ElementNode)
        self.assertIsInstance(node.children[4], TextNode)
        self.assertIsInstance(node.children[5], ElementNode)
        self.assertIsInstance(node.children[6], TextNode)
        self.assertIsInstance(node.children[7], ElementNode)
        self.assertIsInstance(node.children[8], TextNode)
        self.assertIsInstance(node.children[9], ProcessingInstructionNode)
        self.assertIsInstance(node.children[10], TextNode)

        for k, node in enumerate(node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

    @unittest.skipIf(lxml_etree is None, "lxml library is not installed")
    def test_build_lxml_node_tree_with_element(self):
        root = lxml_etree.XML(XML_DATA.encode('utf-8'))
        node = build_lxml_node_tree(root)

        self.assertIsInstance(node, DocumentNode)
        self.assertEqual(node.position, 1)
        self.assertIsNotNone(node.document)
        self.assertEqual(len(node.children), 5)
        self.assertIsInstance(node.children[0], ProcessingInstructionNode)
        self.assertIsInstance(node.children[1], CommentNode)
        self.assertIsInstance(node.children[2], ElementNode)
        self.assertIsInstance(node.children[3], ProcessingInstructionNode)
        self.assertIsInstance(node.children[4], CommentNode)

        self.assertIsInstance(node[2], ElementNode)
        self.assertEqual(node[2].position, 4)
        self.assertEqual(len(node[2].children), 11)
        self.assertIsInstance(node[2].children[0], TextNode)
        self.assertIsInstance(node[2].children[1], CommentNode)
        self.assertIsInstance(node[2].children[2], TextNode)
        self.assertIsInstance(node[2].children[3], ElementNode)
        self.assertIsInstance(node[2].children[4], TextNode)
        self.assertIsInstance(node[2].children[5], ElementNode)
        self.assertIsInstance(node[2].children[6], TextNode)
        self.assertIsInstance(node[2].children[7], ElementNode)
        self.assertIsInstance(node[2].children[8], TextNode)

        for k, node in enumerate(node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

        node = build_lxml_node_tree(root[1])
        self.assertIsInstance(node, ElementNode)
        self.assertEqual(len(node.children), 7)
        self.assertIsInstance(node.children[0], TextNode)
        self.assertIsInstance(node.children[1], CommentNode)
        self.assertIsInstance(node.children[2], TextNode)
        self.assertIsInstance(node.children[3], ElementNode)
        self.assertIsInstance(node.children[4], TextNode)
        self.assertIsInstance(node.children[5], ElementNode)
        self.assertIsInstance(node.children[6], TextNode)

        for k, node in enumerate(node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

    @unittest.skipIf(lxml_etree is None, "lxml library is not installed")
    def test_build_lxml_node_tree_with_element_tree(self):
        root = lxml_etree.parse(io.BytesIO(XML_DATA.encode('utf-8')))
        node = build_lxml_node_tree(root)

        self.assertIsInstance(node, DocumentNode)
        self.assertEqual(node.position, 1)
        self.assertIs(node.document, root)
        self.assertEqual(len(node.children), 5)
        self.assertIsInstance(node.children[0], ProcessingInstructionNode)
        self.assertIsInstance(node.children[1], CommentNode)
        self.assertIsInstance(node.children[2], ElementNode)
        self.assertIsInstance(node.children[3], ProcessingInstructionNode)
        self.assertIsInstance(node.children[4], CommentNode)

        self.assertIsInstance(node[2], ElementNode)
        self.assertEqual(node[2].position, 4)
        self.assertEqual(len(node[2].children), 11)
        self.assertIsInstance(node[2].children[0], TextNode)
        self.assertIsInstance(node[2].children[1], CommentNode)
        self.assertIsInstance(node[2].children[2], TextNode)
        self.assertIsInstance(node[2].children[3], ElementNode)
        self.assertIsInstance(node[2].children[4], TextNode)
        self.assertIsInstance(node[2].children[5], ElementNode)
        self.assertIsInstance(node[2].children[6], TextNode)
        self.assertIsInstance(node[2].children[7], ElementNode)
        self.assertIsInstance(node[2].children[8], TextNode)

        for k, node in enumerate(node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

        root = lxml_etree.ElementTree()
        self.assertIsNone(root.getroot())
        node = build_lxml_node_tree(root)

        self.assertIsInstance(node, DocumentNode)
        self.assertEqual(node.position, 1)
        self.assertIs(node.document, root)
        self.assertEqual(len(node.children), 0)

        for k, node in enumerate(node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

    @unittest.skipIf(xmlschema is None, "xmlschema library is not installed!")
    def test_build_schema_node_tree(self):
        schema = xmlschema.XMLSchema(dedent("""\n """))

        root_node = build_schema_node_tree(schema)
        self.assertIs(root_node.elem, schema)
        self.assertIsInstance(root_node.elements, dict)
        for node in root_node.elements.values():
            self.assertIs(node.elements, root_node.elements)
        self.assertEqual(len(root_node.elements), 7)

        global_elements = []
        root_node = build_schema_node_tree(schema, global_elements=global_elements)
        self.assertIs(root_node.elem, schema)
        self.assertIn(root_node, global_elements)
        for node in root_node.elements.values():
            self.assertIs(node.elements, root_node.elements)
        self.assertEqual(len(root_node.elements), 7)

        root_node = build_schema_node_tree(schema.elements['root'])
        self.assertIs(root_node.elem, schema.elements['root'])
        self.assertIsInstance(root_node.elements, dict)
        for node in root_node.elements.values():
            self.assertIs(node.elements, root_node.elements)
        self.assertEqual(len(root_node.elements), 6)

    def test_document_order__issue_079(self):
        xml_source = '1011121314'

        root = ElementTree.XML(xml_source)
        root_node = build_node_tree(root)
        for k, node in enumerate(root_node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

        if lxml_etree is not None:
            root = lxml_etree.XML(xml_source)
            root_node = build_lxml_node_tree(root)
            for k, node in enumerate(root_node.iter(), start=1):
                self.assertEqual(k, node.position, msg=node)

        xml_source = '1011121314'

        root = ElementTree.XML(xml_source)
        root_node = build_node_tree(root)
        for k, node in enumerate(root_node.iter(), start=1):
            self.assertEqual(k, node.position, msg=node)

        if lxml_etree is not None:
            root = lxml_etree.XML(xml_source)
            root_node = build_lxml_node_tree(root)
            for k, node in enumerate(root_node.iter(), start=1):
                self.assertEqual(k, node.position, msg=node)


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_typing.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
"""Tests about static typing of elementpath objects."""

import unittest
import importlib
from pathlib import Path

try:
    from mypy import api as mypy_api
except ImportError:
    mypy_api = None

try:
    lxml_stubs_module = importlib.import_module('lxml-stubs')
except ImportError:
    lxml_stubs_module = None


@unittest.skipIf(mypy_api is None, "mypy is not installed")
@unittest.skipIf(lxml_stubs_module is None, "lxml-stubs is not installed")
class TestTyping(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        cls.cases_dir = Path(__file__).parent.joinpath('mypy_tests')
        cls.config_file = Path(__file__).parent.parent.joinpath('mypy.ini')

    def test_selectors(self):
        result = mypy_api.run([
            '--strict',
            '--config-file', str(self.config_file),
            str(self.cases_dir.joinpath('selectors.py'))
        ])
        self.assertEqual(result[2], 0, msg=result[1] or result[0])

    def test_protocols(self):
        result = mypy_api.run([
            '--strict',
            '--config-file', str(self.config_file),
            str(self.cases_dir.joinpath('protocols.py'))
        ])
        self.assertEqual(result[2], 0, msg=result[1] or result[0])

    def test_advanced(self):
        result = mypy_api.run([
            '--strict',
            '--config-file', str(self.config_file),
            str(self.cases_dir.joinpath('advanced.py'))
        ])
        self.assertEqual(result[2], 0, msg=result[1] or result[0])


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_validators.py
#!/usr/bin/env python
#
# Copyright (c), 2022, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
# # @author Davide Brunato # import unittest from textwrap import dedent from xml.etree import ElementTree try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None try: import xmlschema except ImportError: xmlschema = None from elementpath.validators import validate_analyzed_string, validate_json_to_xml @unittest.skipIf(xmlschema is None, "xmlschema library is not installed") class ValidatorsTest(unittest.TestCase): etree = ElementTree def test_validate_analyzed_string(self): xml_source = dedent("""\ The cat sat on the mat . """) root = self.etree.XML(xml_source) self.assertIsNone(validate_analyzed_string(root)) with self.assertRaises(ValueError): validate_analyzed_string(self.etree.XML('')) def test_validate_json_to_xml(self): xml_source = """\ 1 """ root = self.etree.XML(xml_source) self.assertIsNone(validate_json_to_xml(root)) with self.assertRaises(ValueError): validate_json_to_xml(self.etree.XML('')) @unittest.skipIf(xmlschema is None, "xmlschema library is not installed") @unittest.skipIf(lxml_etree is None, "lxml library is not installed") class LxmlValidatorsTest(ValidatorsTest): etree = lxml_etree if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_xpath1_parser.py000066400000000000000000002467361476131650400243050ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # # Note: Many tests are built using the examples of the XPath standards, # published by W3C under the W3C Document License. 
# # References: # http://www.w3.org/TR/1999/REC-xpath-19991116/ # http://www.w3.org/TR/2010/REC-xpath20-20101214/ # http://www.w3.org/TR/2010/REC-xpath-functions-20101214/ # https://www.w3.org/Consortium/Legal/2015/doc-license # https://www.w3.org/TR/charmod-norm/ # import unittest import io import math import pickle from decimal import Decimal from textwrap import dedent from typing import Optional, List, Tuple from xml.etree import ElementTree try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None from elementpath import datatypes, XPath1Parser, XPathContext, MissingContextError, \ NamespaceNode, TextNode, CommentNode, ProcessingInstructionNode, select, XPathFunction from elementpath.xpath_nodes import TextAttributeNode, EtreeElementNode from elementpath.namespaces import XSD_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE, \ XPATH_MATH_FUNCTIONS_NAMESPACE from elementpath.sequence_types import is_sequence_type try: from tests import xpath_test_class except ImportError: import xpath_test_class XML_GENERIC_TEST = """ some content space space \t . 
""" XML_DATA_TEST = """ 3.4 20 -10.1 alpha true 44 """ # noinspection PyPropertyAccess,PyTypeChecker class XPath1ParserTest(xpath_test_class.XPathTestCase): def setUp(self): self.parser = XPath1Parser(self.namespaces, strict=True) def test_string_representation(self): parser = self.parser.__class__() self.assertEqual( repr(parser), f"<{parser.__class__.__name__} object at {hex(id(parser))}>" ) self.assertEqual(str(parser), f"{parser.__class__.__name__}()") parser = self.parser.__class__(namespaces={'tst': 'http://xpath.test/ns'}) self.assertEqual( str(parser), f"{parser.__class__.__name__}({{'tst': 'http://xpath.test/ns'}})" ) def test_parser_pickling(self): if getattr(self.parser, 'schema', None) is None: obj = pickle.dumps(self.parser) parser = pickle.loads(obj) obj = pickle.dumps(self.parser.symbol_table) symbol_table = pickle.loads(obj) self.assertEqual(self.parser, parser) self.assertEqual(self.parser.symbol_table, symbol_table) def test_xpath_tokenizer(self): # tests from the XPath specification self.check_tokenizer("*", ['*']) self.check_tokenizer("text()", ['text', '(', ')']) self.check_tokenizer("@name", ['@', 'name']) self.check_tokenizer("@*", ['@', '*']) self.check_tokenizer("para[1]", ['para', '[', '1', ']']) self.check_tokenizer("para[last()]", ['para', '[', 'last', '(', ')', ']']) self.check_tokenizer("*/para", ['*', '/', 'para']) self.check_tokenizer("/doc/chapter[5]/section[2]", ['/', 'doc', '/', 'chapter', '[', '5', ']', '/', 'section', '[', '2', ']']) self.check_tokenizer("chapter//para", ['chapter', '//', 'para']) self.check_tokenizer("//para", ['//', 'para']) self.check_tokenizer("//olist/item", ['//', 'olist', '/', 'item']) self.check_tokenizer(".", ['.']) self.check_tokenizer(".//para", ['.', '//', 'para']) self.check_tokenizer("..", ['..']) self.check_tokenizer("../@lang", ['..', '/', '@', 'lang']) self.check_tokenizer("chapter[title]", ['chapter', '[', 'title', ']']) self.check_tokenizer( "employee[@secretary and @assistant]", ['employee', 
'[', '@', 'secretary', '', 'and', '', '@', 'assistant', ']'] ) self.check_tokenizer('/root/a/true()', ['/', 'root', '/', 'a', '/', 'true', '(', ')']) # additional tests from Python XML etree test cases self.check_tokenizer("{http://spam}egg", ['{', 'http', ':', '//', 'spam', '}', 'egg']) self.check_tokenizer("./spam.egg", ['.', '/', 'spam.egg']) self.check_tokenizer(".//spam:egg", ['.', '//', 'spam', ':', 'egg']) # additional tests self.check_tokenizer("substring-after()", ['substring-after', '(', ')']) self.check_tokenizer("contains('XML','XM')", ['contains', '(', "'XML'", ',', "'XM'", ')']) self.check_tokenizer( "concat('XML', true(), 10)", ['concat', '(', "'XML'", ',', '', 'true', '(', ')', ',', '', '10', ')'] ) self.check_tokenizer("concat('a', 'b', 'c')", ['concat', '(', "'a'", ',', '', "'b'", ',', '', "'c'", ')']) self.check_tokenizer("_last()", ['_last', '(', ')']) self.check_tokenizer("last ()", ['last', '', '(', ')']) self.check_tokenizer('child::text()', ['child', '::', 'text', '(', ')']) self.check_tokenizer('./ /.', ['.', '/', '', '/', '.']) self.check_tokenizer('tns :*', ['tns', '', ':', '*']) def test_token_classes(self): # Literals self.check_token( '(string)', 'literal', "'hello' string", value='hello' ) self.check_token( '(integer)', 'literal', "1999 integer", value=1999 ) self.check_token( '(float)', 'literal', "3.1415 float", value=3.1415 ) self.check_token( '(decimal)', 'literal', "217.35 decimal", value=217.35 ) self.check_token( '(name)', 'literal', "'schema' name", value='schema' ) # Variables self.check_token('$', 'operator', "$ variable reference") # Axes self.check_token('self', 'axis', "'self' axis") self.check_token('child', 'axis', "'child' axis") self.check_token('parent', 'axis', "'parent' axis") self.check_token('ancestor', 'axis', "'ancestor' axis") self.check_token( 'preceding', 'axis', "'preceding' axis" ) self.check_token('descendant-or-self', 'axis', "'descendant-or-self' axis") self.check_token('following-sibling', 'axis', 
"'following-sibling' axis") self.check_token('preceding-sibling', 'axis', "'preceding-sibling' axis") self.check_token('ancestor-or-self', 'axis', "'ancestor-or-self' axis") self.check_token('descendant', 'axis', "'descendant' axis") if self.parser.version == '1.0': self.check_token('attribute', 'axis', "'attribute' axis") self.check_token('following', 'axis', "'following' axis") self.check_token('namespace', 'axis', "'namespace' axis") # Functions self.check_token( symbol='position', expected_label='function', expected_str="'fn:position' function", ) # Operators self.check_token('and', 'operator', "'and' operator") if self.parser.version == '1.0': self.check_token(',', 'symbol', "comma symbol") else: self.check_token(',', 'operator', "comma operator") def test_token_tree(self): self.check_tree('child::B1', '(child (B1))') self.check_tree('A/B//C/D', '(/ (// (/ (A) (B)) (C)) (D))') self.check_tree('child::*/child::B1', '(/ (child (*)) (child (B1)))') self.check_tree('attribute::name="Galileo"', "(= (attribute (name)) ('Galileo'))") self.check_tree('1 + 2 * 3', '(+ (1) (* (2) (3)))') self.check_tree('(1 + 2) * 3', '(* (+ (1) (2)) (3))') self.check_tree("false() and true()", '(and (false) (true))') self.check_tree("false() or true()", '(or (false) (true))') self.check_tree("./A/B[C][D]/E", '(/ (/ (/ (.) 
(A)) ([ ([ (B) (C)) (D))) (E))') self.check_tree("string(xml:lang)", '(string (: (xml) (lang)))') self.check_tree("//text/preceding-sibling::text[1]", '(/ (// (text)) ([ (preceding-sibling (text)) (1)))') def test_token_source(self): self.check_source(' child ::B1', 'child::B1') self.check_source('false()', 'false()') self.check_source("concat('alpha', 'beta', 'gamma')", "concat('alpha', 'beta', 'gamma')") self.check_source('1 +2 * 3 ', '1 + 2 * 3') self.check_source('(1 + 2) * 3', '(1 + 2) * 3') self.check_source(' eg:example ', 'eg:example') self.check_source('attribute::name="Galileo"', "attribute::name = 'Galileo'") self.check_source(".//eg:a | .//eg:b", './/eg:a | .//eg:b') self.check_source("/A/B[C]", '/A/B[C]') if self.parser.version < '3.0': try: self.parser.strict = False self.check_source("{tns1}name", '{tns1}name') finally: self.parser.strict = True def test_parser_position(self): self.assertEqual(self.parser.position, (1, 1)) with self.assertRaises(SyntaxError) as ctx: self.parser.parse('child::node())') self.assertIn('line 1, column 14', str(ctx.exception)) with self.assertRaises(SyntaxError) as ctx: self.parser.parse(' child::node())') self.assertIn('line 1, column 15', str(ctx.exception)) with self.assertRaises(SyntaxError) as ctx: self.parser.parse(' child::node( ))') self.assertIn('line 1, column 16', str(ctx.exception)) with self.assertRaises(SyntaxError) as ctx: self.parser.parse(' child::node() )') self.assertIn('line 1, column 16', str(ctx.exception)) with self.assertRaises(SyntaxError) as ctx: self.parser.parse('^)') self.assertIn('line 1, column 1', str(ctx.exception)) with self.assertRaises(SyntaxError) as ctx: self.parser.parse(' ^)') self.assertIn('line 1, column 2', str(ctx.exception)) def test_wrong_syntax(self): self.wrong_syntax('') self.wrong_syntax(" \n \n )") self.wrong_syntax('child::1') self.wrong_syntax("{}egg") self.wrong_syntax("./*:*") self.wrong_syntax('./ /.') self.wrong_syntax(' eg : example ') def test_wrong_nargs(self): 
self.wrong_type("boolean()") # Too few arguments self.wrong_type("count(0, 1, 2)") # Too many arguments self.wrong_type("round(2.5, 1.7)") self.wrong_type("contains('XPath', 'XP', 20)") self.wrong_type("boolean(1, 5)") def test_xsd_qname_method(self): qname = self.parser.xsd_qname('string') self.assertEqual(qname, 'xs:string') parser = self.parser.__class__(namespaces={'xs': XSD_NAMESPACE}) parser.namespaces['xsd'] = parser.namespaces.pop('xs') self.assertEqual(parser.xsd_qname('string'), 'xsd:string') parser.namespaces.pop('xsd') with self.assertRaises(NameError) as ctx: parser.xsd_qname('string') self.assertIn('XPST0081', str(ctx.exception)) def test_check_variables_method(self): self.assertIsNone(self.parser.check_variables({ 'values': [1, 2, -1], 'myaddress': 'info@example.com', 'word': '' })) with self.assertRaises(TypeError) as ctx: self.parser.check_variables({'values': [None, 2, -1]}) error_message = str(ctx.exception) self.assertIn('XPDY0050', error_message) self.assertIn('Unmatched sequence type', error_message) with self.assertRaises(TypeError) as ctx: self.parser.check_variables({'other': None}) error_message = str(ctx.exception) self.assertIn('XPDY0050', error_message) self.assertIn('Unmatched sequence type', error_message) # XPath expression tests def test_node_selection(self): root = self.etree.XML('') self.check_value("mars", MissingContextError) context = XPathContext(root) self.check_value("mars", [], context=context) self.check_value("B1", [context.root[0]], context=context) self.check_value("B2", [context.root[1], context.root[3]], context=context) self.check_value("B4", [], context=context) def test_prefixed_references(self): namespaces = {'tst': "http://xpath.test/ns"} root = self.etree.XML(""" """) # Prefix references self.check_tree('eg:unknown', '(: (eg) (unknown))') self.check_tree('string(eg:unknown)', '(string (: (eg) (unknown)))') # Test evaluate method self.check_value("fn:true()", True) self.check_value("fx:true()", NameError) context 
= XPathContext(root) self.check_value("tst:B1", [context.root[1]], context=context) self.check_value("tst:B2", [context.root[3], context.root[7]], context=context) self.check_value("tst:B1:B2", SyntaxError) self.check_selector("./tst:B1", root, [root[0]], namespaces=namespaces) self.check_selector("./tst:*", root, root[:], namespaces=namespaces) self.wrong_syntax("./tst:1") self.check_value("./fn:A", MissingContextError) self.wrong_type("./xs:true()") # Namespace wildcard works only for XPath > 1.0 if self.parser.version == '1.0': self.check_selector("./*:B2", root, Exception, namespaces=namespaces) else: self.check_selector("./*:B2", root, [root[1], root[3]], namespaces=namespaces) def test_braced_uri_literal(self): root = self.etree.XML(""" """) self.parser.strict = False self.check_tree('{%s}string' % XSD_NAMESPACE, "({ ('http://www.w3.org/2001/XMLSchema') (string))") self.check_tree('string({%s}unknown)' % XSD_NAMESPACE, "(string ({ ('http://www.w3.org/2001/XMLSchema') (unknown)))") self.wrong_syntax("{%s" % XSD_NAMESPACE) self.wrong_syntax("{%s}1" % XSD_NAMESPACE) self.check_value("{%s}true()" % XPATH_FUNCTIONS_NAMESPACE, True) self.check_value("string({%s}true())" % XPATH_FUNCTIONS_NAMESPACE, 'true') context = XPathContext(root) name = '{%s}alpha' % XPATH_FUNCTIONS_NAMESPACE self.check_value(name, [], context) # it's not an error to use 'fn' namespace for a name self.parser.strict = True self.wrong_syntax('{%s}string' % XSD_NAMESPACE) if not hasattr(self.etree, 'LxmlError') or self.parser.version > '1.0': # Do not test with XPath 1.0 on lxml. 
self.check_selector( "./{http://www.w3.org/2001/04/xmlenc#}EncryptedData", root, [], strict=False) self.check_selector("./{http://xpath.test/ns}B1", root, [root[0]], strict=False) self.check_selector("./{http://xpath.test/ns}*", root, root[:], strict=False) def test_node_types(self): element = self.etree.Element('schema') context = XPathContext(element) attribute = TextAttributeNode('id', '0212349350') namespace = NamespaceNode('xs', 'http://www.w3.org/2001/XMLSchema') comment = CommentNode(self.etree.Comment('nothing important')) pi = ProcessingInstructionNode(self.etree.ProcessingInstruction('action')) text = TextNode('aldebaran') self.check_selector("node()", element, []) context.item = attribute self.check_select("self::node()", [attribute], context) context.item = namespace self.check_select("self::node()", [namespace], context) self.check_value("comment()", [], context=context) context.item = comment self.check_select("self::node()", [comment], context) self.check_select("self::comment()", [comment], context) self.check_value("comment()", MissingContextError) self.check_value("processing-instruction()", [], context=context) context.item = pi self.check_select("self::node()", [pi], context) self.check_select("self::processing-instruction()", [pi], context) self.check_select("self::processing-instruction('action')", [pi], context) self.check_select("self::processing-instruction('other')", [], context) self.check_value("processing-instruction()", MissingContextError) context.item = text self.check_select("self::node()", [text], context) self.check_select("text()", [], context) # Selects the children self.check_selector("node()", self.etree.XML('Dickens'), ['Dickens']) self.check_selector("text()", self.etree.XML('Dickens'), ['Dickens']) document = self.etree.parse(io.StringIO('Dickens')) root = document.getroot() if self.etree is not lxml_etree: # self.check_value("//self::node()", [document, root, 'Dickens'], context=context) # Skip lxml test because lxml's 
XPath doesn't include document root self.check_selector("//self::node()", document, [document, root, 'Dickens']) self.check_selector("/self::node()", document, [document]) self.check_selector("/self::node()", root, []) self.check_selector("//self::text()", root, ['Dickens']) context = XPathContext(document) self.check_select("node()", [context.root.getroot()], context) context = XPathContext(root) self.check_value("/self::node()", expected=[], context=context) context.item = 1 self.check_value("self::node()", expected=[], context=context) def test_unknown_function(self): self.wrong_type("unknown('5')", 'XPST0017', 'unknown function') def test_node_set_id_function(self): # XPath 1.0 id() function: https://www.w3.org/TR/1999/REC-xpath-19991116/#function-id root = self.etree.XML('') self.check_selector('id("foo")', root, [root[0]]) context = XPathContext(root) self.check_value('./B/@xml:id[id("bar")]', expected=[], context=context) context.item = CommentNode(self.etree.Comment('a comment')) self.check_value('id("foo")', expected=[], context=context) def test_node_set_functions(self): root = self.etree.XML('') context = XPathContext(root, item=root[1], size=3, position=3) self.check_value("position()", MissingContextError) self.check_value("position()", 3, context=context) self.check_value("position()<=2", MissingContextError) self.check_value("position()<=2", False, context=context) self.check_value("position()=3", True, context=context) self.check_value("position()=2", False, context=context) self.check_value("last()", MissingContextError) self.check_value("last()", 3, context=context) self.check_value("last()-1", 2, context=context) self.check_selector("name(.)", root, 'A') self.check_selector("name(A)", root, '') self.check_selector("name(1.0)", root, TypeError) self.check_selector("local-name(A)", root, '') self.check_selector("namespace-uri(A)", root, '') self.check_selector("name(B2)", root, 'B2') self.check_selector("local-name(B2)", root, 'B2') 
self.check_selector("namespace-uri(B2)", root, '') if self.parser.version <= '1.0': self.check_selector("name(*)", root, 'B1') context = XPathContext(root, item=self.etree.Comment('a comment')) self.check_value("name()", '', context=context) root = self.etree.XML('') self.check_selector("name(.)", root, 'tst:A', namespaces={'tst': "http://xpath.test/ns"}) self.check_selector("local-name(.)", root, 'A') self.check_selector("namespace-uri(.)", root, 'http://xpath.test/ns') self.check_selector("name(tst:B1)", root, 'tst:B1', namespaces={'tst': "http://xpath.test/ns"}) self.check_selector("name(tst:B1)", root, 'tst:B1', namespaces={'tst': "http://xpath.test/ns", '': ''}) def test_string_function(self): self.check_value("string()", MissingContextError) self.check_value("string(10.0)", '10') if self.parser.version == '1.0': self.wrong_syntax("string(())") else: self.check_value("string(())", '') root = self.etree.XML('foo') self.check_value("string()", 'foo', context=XPathContext(root)) def test_string_length_function(self): root = self.etree.XML(XML_GENERIC_TEST) self.check_value("string-length('hello world')", 11) self.check_value("string-length('')", 0) self.check_selector("a[string-length(@id) = 4]", root, [root[0]]) self.check_selector("a[string-length(@id) = 3]", root, []) self.check_selector("//b[string-length(.) = 12]", root, [root[0][0]]) self.check_selector("//b[string-length(.) = 10]", root, []) self.check_selector("//none[string-length(.) 
= 10]", root, []) self.check_value('fn:string-length("Harp not on that string, madam; that is past.")', 45) if self.parser.version == '1.0': self.wrong_syntax("string-length(())") self.check_value("string-length(12345)", 5) else: self.check_value("string-length(())", 0) self.check_value("string-length(('alpha'))", 5) self.check_value("string-length(('alpha'))", 5) self.wrong_type("string-length(12345)") self.wrong_type("string-length(('12345', 'abc'))") self.parser.compatibility_mode = True self.check_value("string-length(('12345', 'abc'))", 5) self.check_value("string-length(12345)", 5) self.parser.compatibility_mode = False root = self.etree.XML('foo') self.check_value("string-length()", 3, context=XPathContext(root)) def test_normalize_space_function(self): root = self.etree.XML(XML_GENERIC_TEST) self.check_value("normalize-space(' hello \t world ')", 'hello world') self.check_selector("//c[normalize-space(.) = 'space space .']", root, [root[0][1]]) self.check_value('fn:normalize-space(" The wealthy curled darlings of our nation. 
")', 'The wealthy curled darlings of our nation.') if self.parser.version == '1.0': self.wrong_syntax('fn:normalize-space(())') self.check_value("normalize-space(1000)", '1000') self.check_value("normalize-space(true())", 'true') else: self.check_value('fn:normalize-space(())', '') self.wrong_type("normalize-space(true())") self.wrong_type("normalize-space(('\ta b c ', 'other'))") self.parser.compatibility_mode = True self.check_value("normalize-space(true())", 'true') self.check_value("normalize-space(('\ta b\tc ', 'other'))", 'a b c') self.parser.compatibility_mode = False def test_translate_function(self): root = self.etree.XML(XML_GENERIC_TEST) self.check_value("translate('hello world!', 'hw', 'HW')", 'Hello World!') self.check_value("translate('hello world!', 'hwx', 'HW')", 'Hello World!') self.check_value("translate('hello world!', 'hw!', 'HW')", 'Hello World') self.check_value("translate('hello world!', 'hw', 'HW!')", 'Hello World!') self.check_selector("a[translate(@id, 'id', 'no') = 'a_no']", root, [root[0]]) self.check_selector("a[translate(@id, 'id', 'na') = 'a_no']", root, []) self.check_selector( "//b[translate(., 'some', 'one2') = 'one2 cnnt2nt']", root, [root[0][0]]) self.check_selector("//b[translate(., 'some', 'two2') = 'one2 cnnt2nt']", root, []) self.check_selector("//none[translate(., 'some', 'two2') = 'one2 cnnt2nt']", root, []) self.check_value('fn:translate("bar","abc","ABC")', 'BAr') self.check_value('fn:translate("--aaa--","abc-","ABC")', 'AAA') self.check_value('fn:translate("abcdabc", "abc", "AB")', "ABdAB") if self.parser.version > '1.0': self.check_value("translate((), 'hw', 'HW')", '') self.wrong_type("translate((), (), 'HW')", 'XPTY0004', '2nd argument', 'empty sequence') self.wrong_type("translate((), 'hw', ())", 'XPTY0004', '3rd argument', 'empty sequence') def test_variable_substitution(self): root = self.etree.XML('' ' 40kW' ' 20kW' ' 30kWXYZ' '') variables = {'ups1': root[0], 'ups2': root[1], 'ups3': root[2]} 
self.check_selector('string($ups1/power)', root, '40kW', variables=variables) context = XPathContext(root, variables=self.variables) self.check_value('$word', 'alpha', context) self.wrong_syntax('${http://xpath.test/ns}word', 'XPST0003') if self.parser.version == '1.0': self.wrong_syntax('$eg:word', 'variable reference requires a simple reference name') else: context = XPathContext(root, variables={'eg:color': 'purple'}) self.check_value('$eg:color', 'purple', context) def test_substring_function(self): root = self.etree.XML(XML_GENERIC_TEST) self.check_value("substring('Preem Palver', 1)", 'Preem Palver') self.check_value("substring('Preem Palver', 2)", 'reem Palver') self.check_value("substring('Preem Palver', 7)", 'Palver') self.check_value("substring('Preem Palver', 1, 5)", 'Preem') self.wrong_type("substring('Preem Palver', 'c', 5)") self.wrong_type("substring('Preem Palver', 1, '5')") self.check_selector("a[substring(@id, 1) = 'a_id']", root, [root[0]]) self.check_selector("a[substring(@id, 2) = '_id']", root, [root[0]]) self.check_selector("a[substring(@id, 3) = '_id']", root, []) self.check_selector("//b[substring(., 1, 5) = 'some ']", root, [root[0][0]]) self.check_selector("//b[substring(., 1, 6) = 'some ']", root, []) self.check_selector("//none[substring(., 1, 6) = 'some ']", root, []) self.check_value("substring('12345', 1.5, 2.6)", '234') self.check_value("substring('12345', 0, 3)", '12') if self.parser.version == '1.0': self.check_value("substring('12345', 0 div 0, 3)", '') self.check_value("substring('12345', 1, 0 div 0)", '') self.check_value("substring('12345', -42, 1 div 0)", '12345') self.check_value("substring('12345', -1 div 0, 1 div 0)", '') else: self.check_value('fn:substring("motor car", 6)', ' car') self.check_value('fn:substring("metadata", 4, 3)', 'ada') self.check_value('fn:substring("12345", 1.5, 2.6)', '234') self.check_value('fn:substring("12345", 0, 3)', '12') self.check_value('fn:substring("12345", 5, -3)', '') 
self.check_value('fn:substring("12345", -3, 5)', '1') self.check_value('fn:substring("12345", 0 div 0E0, 3)', '') self.check_value('fn:substring("12345", 1, 0 div 0E0)', '') self.check_value('fn:substring((), 1, 3)', '') self.check_value('fn:substring("12345", -42, 1 div 0)', ZeroDivisionError) self.check_value('fn:substring("12345", -42, 1 div 0E0)', '12345') self.check_value('fn:substring("12345", -1 div 0E0, 1 div 0E0)', '') self.check_value('fn:substring(("alpha"), 1, 3)', 'alp') self.check_value('fn:substring(("alpha"), (1), 3)', 'alp') self.check_value('fn:substring(("alpha"), 1, (3))', 'alp') self.wrong_type('fn:substring(("alpha"), (1, 2), 3)') self.wrong_type('fn:substring(("alpha", "beta"), 1, 3)') self.parser.compatibility_mode = True self.check_value('fn:substring(("alpha", "beta"), 1, 3)', 'alp') self.check_value('fn:substring("12345", -42, 1 div 0E0)', '12345') self.check_value('fn:substring("12345", -1 div 0E0, 1 div 0E0)', '') self.parser.compatibility_mode = False def test_starts_with_function(self): root = self.etree.XML(XML_GENERIC_TEST) self.check_value("starts-with('Hello World', 'Hello')", True) self.check_value("starts-with('Hello World', 'hello')", False) self.check_selector("a[starts-with(@id, 'a_i')]", root, [root[0]]) self.check_selector("a[starts-with(@id, 'a_b')]", root, []) self.check_selector("//b[starts-with(., 'some')]", root, [root[0][0]]) self.check_selector("//b[starts-with(., 'none')]", root, []) self.check_selector("//none[starts-with(., 'none')]", root, []) self.check_selector("a[starts-with(@id, 'a_id')]", root, [root[0]]) self.check_selector("a[starts-with(@id, 'a')]", root, [root[0]]) self.check_selector("a[starts-with(@id, 'a!')]", root, []) self.check_selector("//b[starts-with(., 'some')]", root, [root[0][0]]) self.check_selector("//b[starts-with(., 'a')]", root, []) self.check_value("starts-with('', '')", True) self.check_value('fn:starts-with("abracadabra", "abra")', True) self.check_value('fn:starts-with("abracadabra", 
"a")', True) self.check_value('fn:starts-with("abracadabra", "bra")', False) if self.parser.version == '1.0': self.wrong_syntax("starts-with((), ())") self.check_value("starts-with('1999', 19)", True) else: self.check_value('fn:starts-with("tattoo", "tat", "http://www.w3.org/' '2005/xpath-functions/collation/codepoint")', True) self.check_value('fn:starts-with ("tattoo", "att", "http://www.w3.org/' '2005/xpath-functions/collation/codepoint")', False) self.check_value('fn:starts-with ((), ())', True) self.wrong_type("starts-with('1999', 19)") self.parser.compatibility_mode = True self.check_value("starts-with('1999', 19)", True) self.parser.compatibility_mode = False def test_concat_function(self): root = self.etree.XML(XML_GENERIC_TEST) self.check_value("concat('alpha', 'beta', 'gamma')", 'alphabetagamma') self.check_value("concat('', '', '')", '') self.check_value("concat('alpha', 10, 'gamma')", 'alpha10gamma') self.check_value("concat('alpha', 'beta', 'gamma')", 'alphabetagamma') self.check_value("concat('alpha', 10, 'gamma')", 'alpha10gamma') self.check_value("concat('alpha', 'gamma')", 'alphagamma') self.check_selector("a[concat(@id, '_foo') = 'a_id_foo']", root, [root[0]]) self.check_selector("a[concat(@id, '_fo') = 'a_id_foo']", root, []) self.check_selector("//b[concat(., '_foo') = 'some content_foo']", root, [root[0][0]]) self.check_selector("//b[concat(., '_fo') = 'some content_foo']", root, []) self.check_selector("//none[concat(., '_fo') = 'some content_foo']", root, []) self.wrong_type("concat()", 'XPST0017') if self.parser.version == '1.0': self.wrong_syntax("concat((), (), ())") else: self.check_value("concat((), (), ())", '') self.check_value("concat(('a'), (), ('c'))", 'ac') self.wrong_type("concat(('a', 'b'), (), ('c'))") self.parser.compatibility_mode = True self.check_value("concat(('a', 'b'), (), ('c'))", 'ac') self.parser.compatibility_mode = False def test_contains_function(self): root = self.etree.XML(XML_GENERIC_TEST) 
self.check_value("contains('XPath','XP')", True) self.check_value("contains('XP','XPath')", False) self.check_value("contains('', '')", True) self.check_selector("a[contains(@id, '_i')]", root, [root[0]]) self.check_selector("a[contains(@id, '_b')]", root, []) self.check_selector("//b[contains(., 'c')]", root, [root[0][0]]) self.check_selector("//b[contains(., ' -con')]", root, []) self.check_selector("//none[contains(., ' -con')]", root, []) if self.parser.version == '1.0': self.wrong_syntax("contains((), ())") self.check_value("contains('XPath', 20)", False) else: self.check_value('fn:contains ( "tattoo", "t", "http://www.w3.org/' '2005/xpath-functions/collation/codepoint")', True) self.check_value('fn:contains ( "tattoo", "ttt", "http://www.w3.org/' '2005/xpath-functions/collation/codepoint")', False) self.check_value('fn:contains ( "", ())', True) self.wrong_type("contains('XPath', 20)") self.check_value('fn:contains(xs:untypedAtomic("abcde"), "bcd")', True) self.check_value('fn:contains(xs:anyURI("http://xpath.test"), "th")', True) self.parser.compatibility_mode = True try: self.check_value("contains('XPath', 20)", False) finally: self.parser.compatibility_mode = False def test_substring_before_function(self): root = self.etree.XML(XML_GENERIC_TEST) self.check_value("substring-before('Wolfgang Amadeus Mozart', 'Wolfgang')", '') self.check_value("substring-before('Wolfgang Amadeus Mozart', 'Amadeus')", 'Wolfgang ') self.check_value('substring-before("1999/04/01","/")', '1999') self.check_selector("a[substring-before(@id, 'a') = '']", root, [root[0]]) self.check_selector("a[substring-before(@id, 'id') = 'a_']", root, [root[0]]) self.check_selector("a[substring-before(@id, 'id') = '']", root, []) self.check_selector("//b[substring-before(., ' ') = 'some']", root, [root[0][0]]) self.check_selector("//b[substring-before(., 'con') = 'some']", root, []) self.check_selector("//none[substring-before(., 'con') = 'some']", root, []) if self.parser.version == '1.0': 
self.check_value("substring-before('2017-10-27', 10)", '2017-') self.wrong_syntax("fn:substring-before((), ())") else: self.check_value('fn:substring-before ( "tattoo", "attoo", "http://www.w3.org/' '2005/xpath-functions/collation/codepoint")', 't') self.check_value('fn:substring-before ( "tattoo", "tatto", "http://www.w3.org/' '2005/xpath-functions/collation/codepoint")', '') self.check_value('fn:substring-before ((), ())', '') self.check_value('fn:substring-before ((), "")', '') self.wrong_type("substring-before('2017-10-27', 10)") self.parser.compatibility_mode = True self.check_value("substring-before('2017-10-27', 10)", '2017-') self.parser.compatibility_mode = False def test_substring_after_function(self): root = self.etree.XML(XML_GENERIC_TEST) self.check_value("substring-after('Wolfgang Amadeus Mozart', 'Amadeus ')", 'Mozart') self.check_value("substring-after('Wolfgang Amadeus Mozart', 'Mozart')", '') self.check_value("substring-after('', '')", '') self.check_value("substring-after('Mozart', 'B')", '') self.check_value("substring-after('Mozart', 'Bach')", '') self.check_value("substring-after('Mozart', 'Amadeus')", '') self.check_value("substring-after('Mozart', '')", 'Mozart') self.check_value('substring-after("1999/04/01","/")', '04/01') self.check_value('substring-after("1999/04/01","19")', '99/04/01') self.check_value("substring-after('Wolfgang Amadeus Mozart', 'Amadeus ')", 'Mozart') self.check_value("substring-after('Wolfgang Amadeus Mozart', 'Mozart')", '') self.check_selector("a[substring-after(@id, 'a') = '_id']", root, [root[0]]) self.check_selector("a[substring-after(@id, 'id') = '']", root, [root[0]]) self.check_selector("a[substring-after(@id, 'i') = '']", root, []) self.check_selector("//b[substring-after(., ' ') = 'content']", root, [root[0][0]]) self.check_selector("//b[substring-after(., 'con') = 'content']", root, []) self.check_selector("//none[substring-after(., 'con') = 'content']", root, []) if self.parser.version == '1.0': 
self.wrong_syntax("fn:substring-after((), ())") else: self.check_value('fn:substring-after("tattoo", "tat")', 'too') self.check_value('fn:substring-after("tattoo", "tattoo")', '') self.check_value("fn:substring-after((), ())", '') self.wrong_type("substring-after('2017-10-27', 10)") self.parser.compatibility_mode = True self.check_value("substring-after('2017-10-27', 10)", '-27') self.parser.compatibility_mode = False def test_boolean_functions(self): self.check_value("true()", True) self.check_value("false()", False) self.check_value("not(false())", True) self.check_value("not(true())", False) self.check_value("boolean(0)", False) self.check_value("boolean(1)", True) self.check_value("boolean(-1)", True) self.check_value("boolean('hello!')", True) self.check_value("boolean(' ')", True) self.check_value("boolean('')", False) self.wrong_type("true(1)", 'XPST0017', "'fn:true' function has no arguments") self.wrong_syntax("true(", 'unexpected end of source') if self.parser.version == '1.0': self.wrong_syntax("boolean(())") else: self.check_value("boolean(())", False) def test_boolean_context_nonempty_elements(self): root = self.etree.XML(""" text """) context = XPathContext(root=root) root_token = self.parser.parse("boolean(node())") self.assertEqual(True, root_token.evaluate(context)) root_token = self.parser.parse("not(node())") self.assertEqual(False, root_token.evaluate(context)) root_token = self.parser.parse("not(not(node()))") self.assertEqual(True, root_token.evaluate(context)) def test_nonempty_elements(self): root = self.etree.XML(" text") context = XPathContext(root=root) root_token = self.parser.parse("normalize-space(text()) = ''") self.assertEqual(True, root_token.evaluate(context)) if self.parser.version > '1.0': with self.assertRaises(TypeError) as ctx: root_token.evaluate( context=XPathContext( root=self.etree.XML(" text ") # Two text nodes ... 
) ) self.assertIn('sequence of more than one item is not allowed', str(ctx.exception)) elements = select(root, "//*") for element in elements: context = XPathContext(root=root, item=element) root_token = self.parser.parse("* or normalize-space(text()) != ''") self.assertEqual(True, root_token.evaluate(context), element) def test_lang_function(self): # From https://www.w3.org/TR/1999/REC-xpath-19991116/#section-Boolean-Functions root = self.etree.XML('') self.check_selector('lang("en")', root, True) root = self.etree.XML('
') document = self.etree.ElementTree(root) self.check_selector('lang("en")', root, True) if self.parser.version > '1.0': self.check_selector('para/lang("en")', root, True) context = XPathContext(root) self.check_value('for $x in . return $x/fn:lang(())', expected=[False], context=context) else: context = XPathContext(document, item=root[0]) self.check_value('lang("en")', True, context=context) self.check_value('lang("it")', False, context=context) root = self.etree.XML('') self.check_selector('lang("en")', root, False) if self.parser.version > '1.0': self.check_selector('b/c/lang("en")', root, False) self.check_selector('b/c/lang("en", .)', root, False) else: context = XPathContext(root, item=root[0][0]) self.check_value('lang("en")', False, context=context) self.check_selector('lang("en")', self.etree.XML(''), True) self.check_selector('lang("en")', self.etree.XML(''), True) self.check_selector('lang("en")', self.etree.XML(''), False) self.check_selector('lang("en")', self.etree.XML('
'), False) document = self.etree.ElementTree(root) context = XPathContext(root=document) if self.parser.version == '1.0': self.check_value('lang("en")', expected=False, context=context) else: self.check_value('lang("en")', expected=TypeError, context=context) context.item = document self.check_value('for $x in /a/b/c return $x/fn:lang("en")', expected=[False], context=context) def test_logical_and_operator(self): self.check_value("false() and true()", False) self.check_value("true() and true()", True) self.check_value("1 and 0", False) self.check_value("1 and 1", True) self.check_value("1 and 'jupiter'", True) self.check_value("0 and 'mars'", False) self.check_value("1 and mars", MissingContextError) context = XPathContext(self.etree.XML('')) self.check_value("1 and mars", False, context) def test_logical_or_operator(self): self.check_value("false() or true()", True) self.check_value("true() or false()", True) def test_logical_expressions(self): root_token = self.parser.parse("(@a and not(@b)) or (not(@a) and @b)") context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is False) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is False) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is True) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is True) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is True) context = XPathContext(self.etree.XML('')) self.assertTrue(root_token.evaluate(context=context) is False) def test_comparison_operators(self): self.check_value("0.05 = 0.05", True) self.check_value("19.03 != 19.02999", True) self.check_value("-1.0 = 1.0", False) self.check_value("1 <= 2", True) self.check_value("5 >= 9", False) self.check_value("5 > 3", True) self.check_value("5 < 20.0", True) self.check_value("2 * 2 = 4", True) 
self.wrong_syntax("5 > 3 < 4", "unexpected '<' operator") if self.parser.version == '1.0': self.check_value("false() = 1", False) self.check_value("0 = false()", True) else: self.wrong_type("false() = 1") self.wrong_type("0 = false()") self.wrong_value('xs:untypedAtomic("1") = xs:dayTimeDuration("PT1S")', 'FORG0001', "'1' is not an xs:duration value") def test_comparison_of_sequences(self): root = self.etree.XML('' ' 50' ' 30' ' 20' ' 40' '
') self.check_selector("/table/unit[2]/cost <= /table/unit[1]/cost", root, True) self.check_selector("/table/unit[2]/cost > /table/unit[position()!=2]/cost", root, True) self.check_selector("/table/unit[3]/cost > /table/unit[position()!=3]/cost", root, False) self.check_selector(". = 'Dickens'", self.etree.XML('Dickens'), True) def test_numerical_expressions(self): self.check_value("9", 9) self.check_value("-3", -3) self.check_value("7.1", Decimal('7.1')) self.check_value("0.45e3", 0.45e3) self.check_value(" 7+5 ", 12) self.check_value("8 - 5", 3) self.check_value("-8 - 5", -13) self.check_value("-3 * 7", -21) self.check_value("(5 * 7) + 9", 44) self.check_value("-3 * 7", -21) self.check_value('(2 + 4) * 5', 30) self.check_value('2 + 4 * 5', 22) # From W3C XQuery/XPath test suite self.wrong_syntax('1.1.1.E2') self.wrong_syntax('.0.1') def test_addition_and_subtraction_operators(self): # '+' and '-' are both prefix and infix operators. The binding # power is equal to 40 but the nud() method is set with rbp=70. 
self.check_value("9 + 1 + 6", 16) self.check_tree("9 - 1 + 6", '(+ (- (9) (1)) (6))') self.check_value("(9 - 1) + 6", 14) self.check_value("9 - 1 + 6", 14) self.check_tree('1 + 2 * 4 + (1 + 2 + 3 * 4)', '(+ (+ (1) (* (2) (4))) (+ (+ (1) (2)) (* (3) (4))))') self.check_value('1 + 2 * 4 + (1 + 2 + 3 * 4)', 24) self.check_tree('15 - 13.64 - 1.36', "(- (- (15) (Decimal('13.64'))) (Decimal('1.36')))") self.check_tree('15 + 13.64 + 1.36', "(+ (+ (15) (Decimal('13.64'))) (Decimal('1.36')))") self.check_value('15 - 13.64 - 1.36', 0) if self.parser.version != '1.0': self.check_tree('(5, 6) instance of xs:integer+', '(instance (, (5) (6)) (: (xs) (integer)))') self.check_tree('- 1 instance of xs:int', "(instance (- (1)) (: (xs) (int)))") self.check_tree('+ 1 instance of xs:int', "(instance (+ (1)) (: (xs) (int)))") self.wrong_type('2 - 1 instance of xs:int', 'XPTY0004') def test_div_operator(self): self.check_value("5 div 2", 2.5) self.check_value("0 div 2", 0.0) if self.parser.version == '1.0': self.check_value("10div 3", SyntaxError) # TODO: accepted syntax in XPath 1.0 else: self.check_value("() div 2") self.check_raise('1 div 0.0', ZeroDivisionError, 'FOAR0001', 'Division by zero') def test_numerical_add_operator(self): self.check_value("3 + 8", 11) self.check_value("+9", 9) self.wrong_syntax("+") root = self.etree.XML(XML_DATA_TEST) if self.parser.version == '1.0': self.check_value("'9' + 5.0", 14) self.check_selector("/values/a + 2", root, 5.4) self.check_value("/values/b + 2", float('nan'), context=XPathContext(root)) self.check_value("+'alpha'", float('nan')) self.check_value("3 + 'alpha'", float('nan')) else: self.check_selector("/values/a + 2", root, TypeError) self.check_value("/values/b + 2", ValueError, context=XPathContext(root)) self.wrong_type("+'alpha'") self.wrong_type("3 + 'alpha'") self.check_value("() + 81") self.check_value("72 + ()") self.check_value("+()") self.wrong_type('xs:dayTimeDuration("P1D") + xs:duration("P6M")', 'XPTY0004') 
self.check_selector("/values/d + 3", root, 47) def test_numerical_sub_operator(self): self.check_value("9 - 5.0", 4) self.check_value("-8", -8) self.wrong_syntax("-") root = self.etree.XML(XML_DATA_TEST) if self.parser.version == '1.0': self.check_value("'9' - 5.0", 4) self.check_selector("/values/a - 2", root, 1.4) self.check_value("/values/b - 1", float('nan'), context=XPathContext(root)) self.check_value("-'alpha'", float('nan')) self.check_value("3 - 'alpha'", float('nan')) else: self.check_selector("/values/a - 2", root, TypeError) self.check_value("/values/b - 2", ValueError, context=XPathContext(root)) self.wrong_type("-'alpha'") self.wrong_type("3 - 'alpha'") self.check_value("() - 6") self.check_value("19 - ()") self.check_value("-()") self.wrong_type('xs:duration("P3Y") - xs:yearMonthDuration("P2Y3M")', 'XPTY0004') self.check_selector("/values/d - 3", root, 41) def test_numerical_mod_operator(self): self.check_value("11 mod 3", 2) self.check_value("4.5 mod 1.2", Decimal('0.9')) self.check_value("1.23E2 mod 0.6E1", 3.0E0) self.check_value("10 mod 0e1", math.isnan) self.check_raise('3 mod 0', ZeroDivisionError, 'FOAR0001') root = self.etree.XML(XML_DATA_TEST) if self.parser.version == '1.0': self.check_selector("/values/a mod 2", root, 1.4) self.check_value("/values/b mod 2", float('nan'), context=XPathContext(root)) else: self.check_selector("/values/a mod 2", root, TypeError) self.check_value("/values/b mod 2", TypeError, context=XPathContext(root)) self.check_value("() mod 2e1") self.check_value("2 mod xs:float('INF')", 2) self.check_selector("/values/d mod 3", root, 2) def test_number_function(self): root = self.etree.XML('15') self.check_value("number()", MissingContextError) self.check_value("number()", 15, context=XPathContext(root)) self.check_value("number()", 15, context=XPathContext(root, item=root.text)) self.check_value("number(.)", 15, context=XPathContext(root)) self.check_value("number(5.0)", 5.0) self.check_value("number('text')", 
math.isnan) self.check_value("number('-11')", -11) self.check_selector("number(9)", root, 9.0) if self.parser.version == '1.0': self.wrong_syntax("number(())") else: self.check_value("number(())", float('nan'), context=XPathContext(root)) root = self.etree.XML(XML_DATA_TEST) self.check_selector("/values/a/number()", root, [3.4, 20.0, -10.1]) results = select(root, "/values/*/number()", parser=self.parser.__class__) self.assertEqual(results[:3], [3.4, 20.0, -10.1]) self.assertTrue(math.isnan(results[3]) and math.isnan(results[4])) self.check_selector("number(/values/d)", root, 44.0) self.check_selector("number(/values/a)", root, TypeError) def test_count_function(self): root = self.etree.XML('') self.check_selector("count(B)", root, 3) self.check_selector("count(.//C)", root, 5) root = self.etree.XML('5') self.check_selector("count(@avg)", root, 0) self.check_selector("count(@max)", root, 1) self.check_selector("count(@min)", root, 1) self.check_selector("count(@min | @max)", root, 2) self.check_selector("count(@min | @avg)", root, 1) self.check_selector("count(@top | @avg)", root, 0) self.check_selector("count(@min | @max) = 1", root, False) self.check_selector("count(@min | @max) = 2", root, True) def test_sum_function(self): root = self.etree.XML(XML_DATA_TEST) context = XPathContext(root, variables=self.variables) self.check_value("sum($values)", 35, context) self.check_selector("sum(/values/a)", root, 13.3) if self.parser.version == '1.0': self.check_selector("sum(/values/*)", root, math.isnan) self.wrong_syntax("sum(())") else: self.check_selector("sum(/values/*)", root, TypeError) self.check_value("sum(())", 0) self.check_value("sum((), ())", []) self.check_value('sum((xs:yearMonthDuration("P2Y"), xs:yearMonthDuration("P1Y")))', datatypes.YearMonthDuration(months=36)) self.wrong_type('sum((xs:duration("P2Y"), xs:duration("P1Y")))', 'FORG0006') self.wrong_type('sum(("P2Y", "P1Y"))', 'FORG0006') self.check_value("sum((1.0, xs:float('NaN')))", math.isnan) def 
test_ceiling_function(self): root = self.etree.XML(XML_DATA_TEST) self.check_value("ceiling(10.5)", 11) self.check_value("ceiling(-10.5)", -10) self.check_selector("//a[ceiling(.) = 10]", root, []) self.check_selector("//a[ceiling(.) = -10]", root, [root[2]]) if self.parser.version == '1.0': self.wrong_syntax("ceiling(())") else: self.check_value("ceiling(())", []) self.check_value("ceiling((10.5))", 11) self.check_value("ceiling((xs:float('NaN')))", math.isnan) self.wrong_type("ceiling((10.5, 17.3))") def test_floor_function(self): root = self.etree.XML(XML_DATA_TEST) self.check_value("floor(10.5)", 10) self.check_value("floor(-10.5)", -11) self.check_selector("//a[floor(.) = 10]", root, []) self.check_selector("//a[floor(.) = 20]", root, [root[1]]) if self.parser.version == '1.0': self.wrong_syntax("floor(())") self.check_selector("//ab[floor(.) = 10]", root, []) else: self.check_value("floor(())", []) self.check_value("floor((10.5))", 10) self.wrong_type("floor((10.5, 17.3))") def test_round_function(self): self.check_value("round(2.5)", 3) self.check_value("round(2.4999)", 2) self.check_value("round(-2.5)", -2) if self.parser.version == '1.0': self.wrong_syntax("round(())") self.check_value("round('foo')", math.isnan) else: self.check_value("round(())", []) self.check_value("round((10.5))", 11) self.wrong_type("round((2.5, 12.2))") self.check_value("round(xs:double('NaN'))", math.isnan) self.wrong_type("round('foo')", 'XPTY0004') self.check_value('fn:round(xs:double("1E300"))', 1E300) def test_context_variables(self): root = self.etree.XML('') context = XPathContext(root, variables={'alpha': 10, 'id': '19273222'}) self.check_value("$alpha", MissingContextError) self.check_value("$alpha", 10, context=context) self.check_value("$beta", NameError, context=context) self.check_value("$id", '19273222', context=context) if self.parser.version == '1.0': self.wrong_type("$id()", 'XPST0017') else: self.check_value("$id()", MissingContextError) def 
test_path_step_operator(self): root = self.etree.XML('') document = self.etree.ElementTree(root) self.check_selector('/', root, []) if self.etree is ElementTree or self.parser.version > '1.0': # Skip lxml's xpath() comparison because it doesn't include document selection self.check_selector('/', document, [document]) self.check_selector('/B1', root, []) self.check_selector('/A1', root, []) self.check_selector('/A', root, [root]) self.check_selector('/A/B1', root, [root[0]]) self.check_selector('/A/*', root, [root[0], root[1], root[2]]) self.check_selector('/*/*', root, [root[0], root[1], root[2]]) self.check_selector('/A/B1/C1', root, [root[0][0]]) self.check_selector('/A/B1/*', root, [root[0][0]]) self.check_selector('/A/B3/*', root, [root[2][0], root[2][1]]) self.check_selector('child::*/child::C1', root, [root[0][0], root[2][0]]) self.check_selector('/A/child::B3', root, [root[2]]) self.check_selector('/A/child::C1', root, []) if self.parser.version == '1.0': self.wrong_type('/true()') self.wrong_type('/A/true()') self.wrong_syntax('/|') else: self.check_value('/true()', [True], context=XPathContext(root)) self.check_value('/A/true()', [True], context=XPathContext(root)) self.wrong_syntax('/|') root = self.etree.XML("") context = XPathContext(root) self.check_value('/A', [context.root], context=context) context = XPathContext(root, item=root[0][0]) self.check_value('/A', [context.root], context=context) def test_path_step_operator_with_duplicates(self): root = self.etree.XML('1011121314') self.check_selector('/A/node()', root, ['10', root[0], '12', root[1], '13', root[2]]) self.check_selector('/A/node() | /A/node()', root, ['10', root[0], '12', root[1], '13', root[2]]) self.check_selector('/A/node() | /A/B/text()', root, ['10', root[0], '11', '12', root[1], '13', root[2], '14']) root = self.etree.XML('') self.check_selector('/A/B1/@a', root, ['2', '3', '4']) self.check_selector('/A/B1/@a | /A/B1/@a', root, ['2', '3', '4']) self.check_selector('/A/B1/@a | /A/@a',
root, ['1', '2', '3', '4']) self.check_selector('/A/B1/@a | /A/B2/@a', root, ['2', '3', '4', '5']) def test_context_item_expression(self): root = self.etree.XML('') self.check_selector('.', root, [root]) self.check_selector('./.', root, [root]) self.check_selector('././.', root, [root]) self.check_selector('./././.', root, [root]) self.check_selector('/', root, []) self.check_selector('/.', root, []) self.check_selector('/./.', root, []) self.check_selector('/././.', root, []) self.check_selector('/A/.', root, [root]) self.check_selector('/A/B1/.', root, [root[0]]) self.check_selector('/A/B1/././.', root, [root[0]]) self.check_selector('1/.', root, TypeError) document = self.etree.ElementTree(root) context = XPathContext(root) self.check_value('.', [context.root], context=context) context = XPathContext(root=document) self.check_value('.', [context.root], context=context) def test_self_axis(self): root = self.etree.XML('A textB1 textB3 text') self.check_selector('self::node()', root, [root]) self.check_selector('self::text()', root, []) def test_child_axis(self): root = self.etree.XML('A textB1 textB3 text') self.check_selector('child::B1', root, [root[0]]) self.check_selector('child::A', root, []) self.check_selector('child::text()', root, ['A text']) self.check_selector('child::node()', root, ['A text'] + root[:]) self.check_selector('child::*', root, root[:]) root = self.etree.XML('') self.check_selector('child::eg:A', root, [], namespaces={'eg': 'http://www.example.com/ns/'}) self.check_selector('child::eg:B1', root, [root[0]], namespaces={'eg': 'http://www.example.com/ns/'}) def test_descendant_axis(self): root = self.etree.XML('') self.check_selector('descendant::node()', root, [e for e in root.iter()][1:]) self.check_selector('/descendant::node()', root, [e for e in root.iter()]) def test_descendant_or_self_axis(self): root = self.etree.XML('') self.check_selector('descendant-or-self::node()', root, [e for e in root.iter()]) 
self.check_selector('descendant-or-self::node()/.', root, [e for e in root.iter()]) def test_double_slash_shortcut(self): root = self.etree.XML('') self.check_selector('//.', root, [e for e in root.iter()]) self.check_selector('/A//.', root, [e for e in root.iter()]) self.check_selector('/A//self::node()', root, [e for e in root.iter()]) self.check_selector('//C1', root, [root[2][1]]) self.check_selector('//B2', root, [root[1]]) self.check_selector('//C', root, [root[0][0], root[2][0]]) self.check_selector('//*', root, [e for e in root.iter()]) self.check_value('/1//*', TypeError, context=XPathContext(root)) # Issue #14 root = self.etree.XML(""" """) self.check_selector('/pm/content/pmEntry/pmEntry//pmEntry[@pmEntryType]', root, []) root = self.etree.XML("") context = XPathContext(root) expected = list(e for e in context.root.iter() if isinstance(e, EtreeElementNode)) self.check_value('//*', expected=expected, context=context) context = XPathContext(root, item=root[0][0]) expected = list(e for e in context.root.iter() if isinstance(e, EtreeElementNode)) self.check_value('//*', expected=expected, context=context) root = self.etree.XML("") context = XPathContext(root) expected = list(e for e in context.root.iter() if isinstance(e, EtreeElementNode)) self.check_value('//A', expected=expected, context=context) def test_double_slash_shortcut_pr16(self): # Pull-Request #16 root = self.etree.XML("""
<html><ul><li><span class='class_a'>a</span></li></ul></html>
""") self.check_selector("//span", root, [root[0][0][0]]) # self.check_selector("//span[concat('', '', 'class_a')='class_a']/text()", root, ['a']) self.check_selector("//span[concat('', '', @class)='class_a']/text()", root, ['a']) def test_following_axis(self): root = self.etree.XML( '') self.check_selector('/A/B1/C1/following::*', root, [ root[1], root[2], root[2][0], root[2][1], root[3], root[3][0], root[3][0][0] ]) self.check_selector('/A/B1/following::C1', root, [root[2][0], root[3][0]]) self.check_value('following::*', MissingContextError) def test_following_sibling_axis(self): root = self.etree.XML('') self.check_selector( '/A/B1/C1/following-sibling::*', root, [root[0][1], root[0][2]]) self.check_selector( '/A/B2/C1/following-sibling::*', root, [root[1][1], root[1][2], root[1][3]]) self.check_selector('/A/B1/C1/following-sibling::C3', root, [root[0][2]]) self.check_selector("/A/B1/C1/1/following-sibling::*", root, TypeError) self.check_selector("/A/B1/C1/@a/following-sibling::*", root, []) self.check_value('following-sibling::*', MissingContextError) def test_attribute_abbreviation_and_axis(self): root = self.etree.XML('' '') self.check_selector('/A/B1/attribute::*', root, ['beta1']) self.check_selector('/A/B1/@*', root, ['beta1']) self.check_selector('/A/B3/attribute::*', root, {'beta2', 'beta3'}) self.check_selector('/A/attribute::*', root, {'1', 'alpha'}) root = self.etree.XML('10') self.check_selector('@choice', root, ['int']) root = self.etree.XML('10') self.check_selector('@choice', root, ['int']) self.check_selector('@choice="int"', root, True) self.check_value('@choice', MissingContextError) self.check_value('@1', SyntaxError, context=XPathContext(root)) def test_namespace_axis(self): root = self.etree.XML('10') namespaces = list(self.parser.DEFAULT_NAMESPACES.items()) \ + [('tst', 'http://xpath.test/ns')] if self.parser.version == '1.0': self.check_selector('/A/namespace::*', root, expected=set(namespaces), namespaces=dict(namespaces[-1:])) 
self.check_selector('/A/namespace::tst', root, expected=[('tst', 'http://xpath.test/ns')], namespaces=dict(namespaces[-1:])) else: self.check_selector('/A/namespace::*', root, expected={'http://www.w3.org/XML/1998/namespace', 'http://xpath.test/ns'}, namespaces=dict(namespaces[-1:])) self.check_selector('/A/namespace::tst', root, expected=['http://xpath.test/ns'], namespaces=dict(namespaces[-1:])) self.check_value('namespace::*', MissingContextError) self.check_value('./text()/namespace::*', [], context=XPathContext(root)) if self.parser.version >= '3.0': self.check_selector('/A/namespace::namespace-node()', root, expected={'http://www.w3.org/XML/1998/namespace', 'http://xpath.test/ns'}, namespaces=dict(namespaces[-1:])) def test_parent_shortcut_and_axis(self): root = self.etree.XML( '') self.check_selector('/A/*/C2/..', root, [root[2]]) self.check_selector('/A/*/*/..', root, [root[0], root[2], root[3]]) self.check_selector('//C2/..', root, [root[2]]) self.check_selector('/A/*/C2/parent::node()', root, [root[2]]) self.check_selector('/A/*/*/parent::node()', root, [root[0], root[2], root[3]]) self.check_selector('//C2/parent::node()', root, [root[2]]) self.check_selector('..', self.etree.ElementTree(root), []) self.check_value('..', MissingContextError) self.check_value('parent::*', MissingContextError) def test_ancestor_axes(self): root = self.etree.XML( '') self.check_selector('/A/B3/C1/ancestor::*', root, [root, root[2]]) self.check_selector('/A/B4/C1/ancestor::*', root, []) self.check_selector('/A/*/C1/ancestor::*', root, [root, root[0], root[1], root[2]]) self.check_selector('/A/*/C1/ancestor::B3', root, [root[2]]) self.check_selector('/A/B3/C1/ancestor-or-self::*', root, [root, root[2], root[2][0]]) self.check_selector('/A/*/C1/ancestor-or-self::*', root, [ root, root[0], root[0][0], root[1], root[1][0], root[2], root[2][0] ]) self.check_value('ancestor-or-self::*', MissingContextError) def test_preceding_axis(self): root = self.etree.XML('') 
self.check_selector('/A/B1/C2/preceding::*', root, [root[0][0]]) self.check_selector('/A/B2/C4/preceding::*', root, [ root[0], root[0][0], root[0][1], root[0][2], root[1][0], root[1][1], root[1][2] ]) root = self.etree.XML("") self.check_tree("/root/e/preceding::b", '(/ (/ (/ (root)) (e)) (preceding (b)))') self.check_selector('/root/e[2]/preceding::b', root, [root[0][0][0], root[0][1][0]]) self.check_value('preceding::*', MissingContextError) root = self.etree.XML('value') self.check_selector('./text()/preceding::*', root, []) def test_preceding_sibling_axis(self): root = self.etree.XML('') self.check_selector('/A/B1/C2/preceding-sibling::*', root, [root[0][0]]) self.check_selector('/A/B2/C4/preceding-sibling::*', root, [root[1][0], root[1][1], root[1][2]]) self.check_selector('/A/B1/C2/preceding-sibling::C3', root, []) def test_default_axis(self): """Tests about when child:: default axis is applied.""" root = self.etree.XML('firstsecond') self.check_selector('/root/a/*', root, [root[0][0]]) self.check_selector('/root/a/b', root, [root[0][0]]) self.check_selector('/root/a/node()', root, ['first', root[0][0], 'second']) self.check_selector('/root/a/text()', root, ['first', 'second']) self.check_selector('/root/a/attribute::*', root, ['1', '2']) if self.parser.version > '1.0': self.check_selector('/root/a/true()', root, [True, True]) self.check_selector('/root/a/attribute()', root, ['1', '2']) self.check_selector('/root/a/element()', root, [root[0][0]]) self.check_selector('/root/a/name()', root, ['a', 'a']) self.check_selector('/root/a/last()', root, [2, 2]) self.check_selector('/root/a/position()', root, [1, 2]) else: # Functions are not allowed after path step in XPath 1.0 self.wrong_type('/root/a/true()') def test_unknown_axis(self): self.wrong_name('unknown::node()', 'XPST0010') self.wrong_name('A/unknown::node()', 'XPST0010') def test_predicate(self): root = self.etree.XML('') self.check_selector('/A/B1[C2]', root, [root[0]]) self.check_selector('/A/B1[1]', 
root, [root[0]]) self.check_selector('/A/B1[2]', root, []) self.check_selector('/A/*[2]', root, [root[1]]) self.check_selector('/A/*[position()<2]', root, [root[0]]) self.check_selector('/A/*[last()-1]', root, [root[0]]) self.check_selector('/A/B2/*[position()>=2]', root, root[1][1:]) root = self.etree.XML("Asimov") self.check_selector("book/author[. = 'Asimov']", root, [root[0][0]]) self.check_selector("book/author[. = 'Dickens']", root, []) self.check_selector("book/author[text()='Asimov']", root, [root[0][0]]) root = self.etree.XML('hello ') self.check_selector("/A/*[' ']", root, root[:]) self.check_selector("/A/*['']", root, []) root = self.etree.XML("") self.check_tree("child::a[b][c]", '([ ([ (child (a)) (b)) (c))') self.check_selector("child::a[b][c]", root, [root[1]]) root = self.etree.XML("") self.check_tree("a[not(b)]", '([ (a) (not (b)))') self.check_value("a[not(b)]", [], context=XPathContext(root, item=root[0])) context = XPathContext(root, item=root[1]) self.check_value("a[not(b)]", [context.root[1][0]], context) self.check_raise('88[..]', TypeError, 'XPTY0020', 'Context item is not a node', context=XPathContext(root)) self.check_tree("preceding::a[not(b)]", '([ (preceding (a)) (not (b)))') self.check_value("a[preceding::a[not(b)]]", [], context=XPathContext(root, item=root[0])) self.check_value("a[preceding::a[not(b)]]", [], context=XPathContext(root, item=root[1])) def test_parenthesized_expression(self): self.check_value('(6 + 9)', 15) if self.parser.version == '1.0': self.check_value('()', SyntaxError) else: self.check_value('()', []) def test_union(self): root = self.etree.XML( '') self.check_selector('/A/B2 | /A/B1', root, root[:2]) self.check_selector('/A/B2 | /A/*', root, root[:]) self.check_selector('/A/B2 | /A/* | /A/B1', root, root[:]) self.check_selector('/A/@min | /A/@max', root, {'1', '10'}) self.check_raise('1|2|3', TypeError, 'XPTY0004', 'only XPath nodes are allowed', context=XPathContext(root)) def test_default_namespace(self): root 
= self.etree.XML('bar') self.check_selector('/foo', root, [root]) if self.parser.version == '1.0': # XPath 1.0 ignores the default namespace self.check_selector('/foo', root, [root], namespaces={'': 'ns'}) # foo --> foo else: self.check_selector('/foo', root, [], namespaces={'': 'ns'}) # foo --> {ns}foo if self.parser.version != '1.0': self.check_selector('/*:foo', root, [root], namespaces={'': 'ns'}) # foo --> {ns}foo root = self.etree.XML('bar') self.check_selector('/foo', root, []) if self.parser.version == '1.0': self.check_selector('/foo', root, [], namespaces={'': 'ns'}) else: self.check_selector('/foo', root, [root], namespaces={'': 'ns'}) root = self.etree.XML('') if self.parser.version > '1.0': self.check_selector("name(tst:B1)", root, 'B1' if self.etree is lxml_etree else 'tst:B1', namespaces={'tst': "http://xpath.test/ns"}) self.check_selector("name(B1)", root, 'B1', namespaces={'': "http://xpath.test/ns"}) else: # XPath 1.0 ignores the default namespace declarations self.check_selector("name(B1)", root, '', namespaces={'': "http://xpath.test/ns"}) def test_function_signatures(self): function_names = [] for tk in self.parser.symbol_table.values(): if issubclass(tk, XPathFunction) and 'function' in tk.label: function_names.append(tk.symbol) for st in tk.sequence_types: if 'dateTimeStamp' in st: self.assertFalse(is_sequence_type(st, self.parser), msg=st) with self.xsd_version_parser('1.1'): self.assertTrue(is_sequence_type(st, self.parser), msg=st) else: self.assertTrue(is_sequence_type(st, self.parser), msg=st) if self.parser.version == '1.0': self.assertEqual(len(self.parser.function_signatures), 36) elif self.parser.version == '2.0': self.assertEqual(len(self.parser.function_signatures), 151) elif self.parser.version == '3.0': self.assertEqual(len(self.parser.function_signatures), 221) for key, value in self.parser.function_signatures.items(): self.assertIsInstance(key, tuple) self.assertEqual(len(key), 2) self.assertIsInstance(key[0], datatypes.QName) 
self.assertIsInstance(key[1], int) try: self.assertIn(key[0].local_name, function_names) except AssertionError: self.assertIn(key[0].expanded_name, function_names) if self.parser.version <= '2.0': self.assertIn(key[0].namespace, XPATH_FUNCTIONS_NAMESPACE) elif self.parser.version == '3.0': self.assertIn(key[0].namespace, {XPATH_FUNCTIONS_NAMESPACE, XPATH_MATH_FUNCTIONS_NAMESPACE}) self.assertIsInstance(value, str) self.assertTrue(value.startswith('function(')) self.assertTrue(is_sequence_type(value), msg=value) def test_descendant_predicate__issue_51(self): root = self.etree.XML(dedent(""" V1 3 foo V3 5 bar V1 3 V2 3 """)) self.check_selector("//target[name=//var/name]", root, expected=[root[0]]) def test_order_with_descendants__issue_079(self): xml_data = ("Achille Compagnoni and " "Lino Lacedelli") root = self.etree.fromstring(xml_data) xpath_expr = '//descendant-or-self::text()' chunks = select(root, xpath_expr) self.assertListEqual(chunks, ['Achille Compagnoni ', ' and ', 'Lino Lacedelli']) xpath_expr = '//text()' chunks = select(root, xpath_expr) self.assertListEqual(chunks, ['Achille Compagnoni ', ' and ', 'Lino Lacedelli']) def test_get_function(self): func = self.parser.get_function('fn:true', 0) self.assertEqual(str(func), "'fn:true' function") self.assertTrue(repr(func).startswith('') namespaces: List[Tuple[Optional[str], str]] = [] namespaces.extend(self.parser.DEFAULT_NAMESPACES.items()) namespaces += [('tst', 'http://xpath.test/ns')] self.check_selector('/A/namespace::*', root, expected=set(namespaces), namespaces=dict(namespaces[-1:])) self.check_selector('/A/namespace::*', root, expected=set(namespaces)) root = self.etree.XML('') namespaces.append((None, 'http://xpath.test/ns')) self.check_selector('/tst:A/namespace::*', root, set(namespaces), namespaces=dict(namespaces[-2:-1])) def test_issue_25_with_count_function(self): root = lxml_etree.fromstring(""" C A P I T O L O I I I """) path = '//text/preceding-sibling::text' self.check_selector(path, 
root, root[:-1]) self.check_tree('//text[7]/preceding-sibling::text[1]', '(/ (// ([ (text) (7))) ([ (preceding-sibling (text)) (1)))') if self.parser.version != '1.0': self.check_tree('//text[7]/(preceding-sibling::text)[1]', '(/ (// ([ (text) (7))) ([ (preceding-sibling (text)) (1)))') path = '//text[7]/(preceding-sibling::text)[2]' self.check_selector(path, root, [root[1]]) path = '//text[7]/preceding-sibling::text[2]' self.check_selector(path, root, [root[4]]) path = 'count(//text[@size="12.482"][not(preceding-sibling::text[1][@size="12.482"])])' self.check_selector(path, root, 3) path = '//text[@size="12.482"][not(preceding-sibling::text[1][@size="12.482"])]' self.check_selector(path, root, [root[0], root[4], root[9]]) if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_xpath2_constructors.py000066400000000000000000001034321476131650400255430ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. 
# # @author Davide Brunato # import unittest import datetime import platform from decimal import Decimal try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None from elementpath import XPathContext, select from elementpath.datatypes import Timezone, DateTime10, DateTime, DateTimeStamp, \ GregorianDay, GregorianMonth, GregorianMonthDay, GregorianYear10, GregorianYearMonth10, \ Duration, YearMonthDuration, DayTimeDuration, Date10, Time, QName, UntypedAtomic, \ Base64Binary, HexBinary from elementpath.xpath_nodes import TextAttributeNode from elementpath.namespaces import XSD_NAMESPACE from elementpath.xpath_tokens import XPathConstructor try: from tests import xpath_test_class except ImportError: import xpath_test_class class XPath2ConstructorsTest(xpath_test_class.XPathTestCase): def test_constructor_class(self): ntk = 0 for token_class in self.parser.symbol_table.values(): if issubclass(token_class, XPathConstructor): self.assertEqual(token_class.label, 'constructor function') self.assertEqual(token_class.lbp, 90) self.assertEqual(token_class.rbp, 90) ntk += 1 self.assertGreaterEqual(ntk, 45) self.assertLessEqual(ntk, 50) def test_unknown_constructor(self): self.wrong_type("xs:unknown('5')", 'XPST0017', 'unknown constructor function') def test_invalid_arguments(self): # Invalid argument types (parsed by null-denotation method) self.wrong_type('xs:normalizedString(()', 'XPST0017') self.wrong_type('xs:normalizedString(5, 2)', 'XPST0017') def test_string_constructor(self): self.check_value("xs:string(5.0)", '5') self.check_value("xs:string(5.2)", '5.2') self.check_value('xs:string(" hello ")', ' hello ') self.check_value('xs:string("\thello \n")', '\thello \n') self.check_value('xs:string(())', []) self.wrong_type('xs:string(()', 'XPST0017') # canonical string representation of xs:hexBinary self.check_value('xs:string(xs:hexBinary("ef"))', 'EF') def test_normalized_string_constructor(self): self.check_value('xs:normalizedString("hello")', "hello") 
self.check_value('xs:normalizedString(" hello ")', " hello ") self.check_value('xs:normalizedString("\thello \n")', " hello ") self.check_value('xs:normalizedString(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:normalizedString(@a)', ' alpha beta ', context=context) def test_token_constructor(self): self.check_value('xs:token(" hello world ")', "hello world") self.check_value('xs:token("hello\t world\n")', "hello world") self.check_value('xs:token(xs:untypedAtomic("hello\t world\n"))', "hello world") self.check_value('xs:token(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:token(@a)', 'hello world', context=context) def test_language_constructor(self): self.check_value('xs:language(" en ")', "en") self.check_value('xs:language(xs:untypedAtomic(" en "))', "en") self.check_value('xs:language(" en-GB ")', "en-GB") self.check_value('xs:language("it-IT")', "it-IT") self.check_value('xs:language("i-klingon")', 'i-klingon') # IANA-registered language self.check_value('xs:language("x-another-language-code")', 'x-another-language-code') self.wrong_value('xs:language("MoreThan8")') self.check_value('xs:language(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:language(@a)', 'en-US', context=context) def test_nmtoken_constructor(self): self.check_value('xs:NMTOKEN(" :menù.09-_ ")', ":menù.09-_") self.check_value('xs:NMTOKEN(xs:untypedAtomic(" :menù.09-_ "))', ":menù.09-_") self.wrong_value('xs:NMTOKEN("alpha+")') self.wrong_value('xs:NMTOKEN("hello world")') self.check_value('xs:NMTOKEN(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:NMTOKEN(@a)', 'tns:example', context=context) def test_name_constructor(self): self.check_value('xs:Name(" :base ")', ":base") self.check_value('xs:Name(xs:untypedAtomic(" :base "))', ":base") self.check_value('xs:Name(" ::level_alpha ")', "::level_alpha") 
self.check_value('xs:Name("level-alpha")', "level-alpha") self.check_value('xs:Name("level.alpha\t\n")', "level.alpha") self.check_value('xs:Name("__init__ ")', "__init__") self.check_value('xs:Name("\u0110")', "\u0110") self.wrong_value('xs:Name("2_values")') self.wrong_value('xs:Name(" .values ")') self.wrong_value('xs:Name(" -values ")') self.check_value('xs:Name(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:Name(@a)', ':foo:', context=context) def test_ncname_constructor(self): self.check_value('xs:NCName(" base ")', "base") self.check_value('xs:NCName(xs:untypedAtomic(" base "))', "base") self.check_value('xs:NCName(" _level_alpha ")', "_level_alpha") self.check_value('xs:NCName("level-alpha")', "level-alpha") self.check_value('xs:NCName("level.alpha\t\n")', "level.alpha") self.check_value('xs:NCName("__init__ ")', "__init__") self.check_value('xs:NCName("\u0110")', "\u0110") self.wrong_value('xs:NCName("2_values")') self.wrong_value('xs:NCName(" .values ")') self.wrong_value('xs:NCName(" -values ")') self.check_value('xs:NCName(())', []) self.wrong_value('xs:NCName("tns:example")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:NCName(@a)', 'foo', context=context) def test_id_constructor(self): self.check_value('xs:ID("xyz")', 'xyz') self.check_value('xs:ID(xs:untypedAtomic("xyz"))', 'xyz') def test_idref_constructor(self): self.check_value('xs:IDREF("xyz")', 'xyz') self.check_value('xs:IDREF(xs:untypedAtomic("xyz"))', 'xyz') def test_entity_constructor(self): self.check_value('xs:ENTITY("xyz")', 'xyz') self.check_value('xs:ENTITY(xs:untypedAtomic("xyz"))', 'xyz') def test_qname_constructor(self): qname = QName(XSD_NAMESPACE, 'xs:element') self.check_value('xs:QName(())', []) self.check_value('xs:QName("xs:element")', qname) self.check_value('xs:QName(xs:QName("xs:element"))', qname) if self.parser.version == '2.0': self.wrong_type('xs:QName(xs:untypedAtomic("xs:element"))', 'XPTY0004') 
else: self.check_value('xs:QName(xs:untypedAtomic("xs:element"))', qname) self.wrong_type('xs:QName(5)', 'XPTY0004', "the argument has an invalid type") self.wrong_value('xs:QName("1")', 'FORG0001', "invalid value") def test_any_uri_constructor(self): self.check_value('xs:anyURI("")', '') self.check_value('xs:anyURI("https://example.com")', 'https://example.com') self.check_value('xs:anyURI("mailto:info@example.com")', 'mailto:info@example.com') self.check_value('xs:anyURI("urn:example:com")', 'urn:example:com') self.check_value('xs:anyURI(xs:untypedAtomic("urn:example:com"))', 'urn:example:com') self.check_value('xs:anyURI("../principi/libertà.html")', '../principi/libertà.html') self.check_value('xs:anyURI("../principi/libert%E0.html")', '../principi/libert%E0.html') self.check_value('xs:anyURI("../path/page.html#frag")', '../path/page.html#frag') self.wrong_value('xs:anyURI("../path/page.html#frag1#frag2")') self.wrong_value('xs:anyURI("https://example.com/index%.html")') self.wrong_value('xs:anyURI("https://example.com/index.%html")') self.wrong_value('xs:anyURI("https://example.com/index.html% frag")') self.check_value('xs:anyURI(())', []) if platform.python_version_tuple() >= ('3', '6') and \ platform.python_implementation() != 'PyPy': self.wrong_value('xs:anyURI("https://example.com:65536")', 'FORG0001', 'Port out of range 0-65535') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:anyURI(@a)', 'https://example.com', context=context) def test_boolean_constructor(self): self.check_value('xs:boolean(())', []) self.check_value('xs:boolean(1)', True) self.check_value('xs:boolean(0)', False) self.check_value('xs:boolean(xs:boolean(0))', False) self.check_value('xs:boolean(xs:untypedAtomic(0))', False) self.wrong_type('xs:boolean(xs:hexBinary("FF"))', 'XPTY0004', "HexBinary") self.wrong_value('xs:boolean("2")', 'FORG0001', "invalid value") def test_integer_constructors(self): self.wrong_value('xs:integer("hello")', 'FORG0001') 
self.check_value('xs:integer("19")', 19) self.check_value('xs:integer(xs:untypedAtomic("19"))', 19) self.check_value("xs:integer('-5')", -5) self.wrong_value("xs:integer('INF')", 'FORG0001') self.check_value("xs:integer('inf')", ValueError) self.wrong_value("xs:integer('NaN')", 'FORG0001') self.wrong_value("xs:integer(xs:float('-INF'))", 'FOCA0002') self.check_value("xs:integer(xs:double('NaN'))", ValueError) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:integer(@a)', 19, context=context) root = self.etree.XML('') context = XPathContext(root, item=float('nan')) self.check_value('xs:integer(.)', ValueError, context=context) self.wrong_value('xs:nonNegativeInteger("-1")') self.wrong_value('xs:nonNegativeInteger(-1)') self.check_value('xs:nonNegativeInteger(0)', 0) self.check_value('xs:nonNegativeInteger(1000)', 1000) self.wrong_value('xs:positiveInteger(0)') self.check_value('xs:positiveInteger("1")', 1) self.wrong_value('xs:negativeInteger(0)') self.check_value('xs:negativeInteger(-1)', -1) self.wrong_value('xs:nonPositiveInteger(1)') self.check_value('xs:nonPositiveInteger(0)', 0) self.check_value('xs:nonPositiveInteger("-1")', -1) def test_limited_integer_constructors(self): self.wrong_value('xs:long("true")') self.wrong_value('xs:long("340282366920938463463374607431768211456")') self.check_value('xs:long("-20")', -20) self.wrong_value('xs:int("-20 91")') self.wrong_value('xs:int("2147483648")') self.wrong_value('xs:int(xs:untypedAtomic("INF"))') self.check_value('xs:int("2147483647")', 2**31 - 1) self.check_value('xs:int("-2147483648")', -2**31) self.wrong_value('xs:short("40000")') self.check_value('xs:short("9999")', 9999) self.check_value('xs:short(-9999)', -9999) self.wrong_value('xs:byte(-129)') self.wrong_value('xs:byte(128)') self.check_value('xs:byte("-128")', -128) self.check_value('xs:byte(127)', 127) self.check_value('xs:byte(-90)', -90) self.wrong_value('xs:unsignedLong("-10")') self.check_value('xs:unsignedLong("3")', 3) 
self.wrong_value('xs:unsignedInt("-4294967296")') self.check_value('xs:unsignedInt("4294967295")', 2**32 - 1) self.wrong_value('xs:unsignedShort("-1")') self.check_value('xs:unsignedShort("0")', 0) self.wrong_value('xs:unsignedByte(-128)') self.check_value('xs:unsignedByte("128")', 128) def test_decimal_constructors(self): self.check_value('xs:decimal("19")', 19) self.check_value('xs:decimal("19")', Decimal) self.check_value('xs:decimal(xs:untypedAtomic("19"))', 19) self.wrong_value('xs:decimal("hello")', 'FORG0001') self.wrong_value('xs:decimal(xs:float("INF"))', 'FOCA0002') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:decimal(@a)', Decimal('10.3'), context=context) def test_double_constructor(self): self.wrong_value('xs:double("world")') self.check_value('xs:double("39.09")', 39.09) self.check_value('xs:double(xs:untypedAtomic("39.09"))', 39.09) self.check_value('xs:double(-5)', -5.0) self.check_value('xs:double(-5)', float) root = self.etree.XML('') context = XPathContext(root) context.item = context.root.attributes[0] self.check_value('xs:double(.)', float, context=context) self.check_value('xs:double(.)', 10.3, context=context) self.check_value('xs:double(1) instance of xs:double', True) self.check_value('xs:double(1) instance of xs:float', False) def test_float_constructor(self): self.wrong_value('xs:float("..")') self.wrong_value('xs:float("ab")', 'FORG0001') self.wrong_value('xs:float("inf")') self.check_value('xs:float(25.05)', 25.05) self.check_value('xs:float(xs:untypedAtomic(25.05))', 25.05) self.check_value('xs:float(-0.00001)', -0.00001) self.check_value('xs:float(0.00001)', float) self.check_value('xs:float("INF")', float('inf')) self.check_value('xs:float("-INF")', float('-inf')) root = self.etree.XML('') context = XPathContext(root) context.item = context.root.attributes[0] self.check_value('xs:float(.)', float, context=context) self.check_value('xs:float(.)', 10.3, context=context) self.parser._xsd_version = '1.1' 
        try:
            self.check_value('xs:float(9.001)', 9.001)
        finally:
            # Restore XSD 1.0, matching the other tests in this class
            self.parser._xsd_version = '1.0'

        self.check_value('xs:float(1) instance of xs:float', True)
        self.check_value('xs:float(1) instance of xs:double', False)

    def test_datetime_constructor(self):
        tz1 = Timezone(datetime.timedelta(hours=5, minutes=24))
        self.check_value('xs:dateTime(())', [])
        self.check_value('xs:dateTime("1969-07-20T20:18:00")',
                         DateTime10(1969, 7, 20, 20, 18))
        self.check_value('xs:dateTime(xs:untypedAtomic("1969-07-20T20:18:00"))',
                         DateTime10(1969, 7, 20, 20, 18))
        self.check_value('xs:dateTime("2000-05-10T21:30:00+05:24")',
                         datetime.datetime(2000, 5, 10, hour=21, minute=30, tzinfo=tz1))
        self.check_value('xs:dateTime("1999-12-31T24:00:00")',
                         datetime.datetime(2000, 1, 1, 0, 0))
        self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime10(1969, 7, 20))
        self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime10)
        with self.assertRaises(AssertionError):
            self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime)

        self.parser._xsd_version = '1.1'
        try:
            self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime(1969, 7, 20))
            self.check_value('xs:dateTime(xs:date("1969-07-20"))', DateTime)
        finally:
            self.parser._xsd_version = '1.0'

        self.wrong_value('xs:dateTime("2000-05-10t21:30:00+05:24")')
        self.wrong_value('xs:dateTime("2000-5-10T21:30:00+05:24")')
        self.wrong_value('xs:dateTime("2000-05-10T21:3:00+05:24")')
        self.wrong_value('xs:dateTime("2000-05-10T21:13:0+05:24")')
        self.wrong_value('xs:dateTime("2000-05-10T21:13:0")')
        self.check_value('xs:dateTime("-25252734927766554-12-31T12:00:00")', OverflowError)
        self.wrong_type('xs:dateTime(50)', 'FORG0006', '1st argument has an invalid type')
        self.wrong_type('xs:dateTime("2000-05-10T21:30:00", "+05:24")', 'XPST0017')

        root = self.etree.XML('')
        context = XPathContext(root)
        self.check_value('xs:dateTime(@a)', DateTime10(1969, 7, 20, 20, 18), context=context)

        context.item = TextAttributeNode('a', str(DateTime10(1969, 7, 20, 20, 18)))
self.check_value('xs:dateTime(.)', DateTime10(1969, 7, 20, 20, 18), context=context) context.item = TextAttributeNode('a', 'true') self.check_value('xs:dateTime(.)', ValueError, context=context) context.item = DateTime10(1969, 7, 20, 20, 18) self.check_value('xs:dateTime(.)', DateTime10(1969, 7, 20, 20, 18), context=context) def test_datetimestamp_constructor(self): tz0 = Timezone(datetime.timedelta(hours=7, minutes=0)) tz1 = Timezone(datetime.timedelta(hours=5, minutes=24)) ts = DateTimeStamp(1969, 7, 20, 20, 18, tzinfo=tz0) self.assertEqual(self.parser.xsd_version, '1.0') self.wrong_syntax('xs:dateTimeStamp("1969-07-20T20:18:00+07:00")') self.parser._xsd_version = '1.1' try: self.check_value('xs:dateTimeStamp(())', []) self.check_value('xs:dateTimeStamp("1969-07-20T20:18:00+07:00")', ts) self.check_value('xs:dateTimeStamp(xs:untypedAtomic("1969-07-20T20:18:00+07:00"))', ts) self.check_value('xs:dateTimeStamp("1969-07-20T20:18:00+07:00") ' 'castable as xs:dateTimeStamp', True) self.check_value('xs:untypedAtomic("1969-07-20T20:18:00+07:00") ' 'castable as xs:dateTimeStamp', True) self.check_value('xs:dateTime("1969-07-20T20:18:00+07:00") ' 'cast as xs:dateTimeStamp', ts) self.check_value('xs:dateTimeStamp("2000-05-10T21:30:00+05:24")', datetime.datetime(2000, 5, 10, hour=21, minute=30, tzinfo=tz1)) self.wrong_value('xs:dateTimeStamp("1999-12-31T24:00:00")') self.wrong_value('xs:dateTimeStamp("2000-05-10t21:30:00+05:24")') self.wrong_type('xs:dateTimeStamp("1969-07-20T20:18:00", "+07:00")', 'XPST0017') self.wrong_type('xs:dateTimeStamp("1969-07-20T20:18:00+07:00"', 'XPST0017') finally: self.parser._xsd_version = '1.0' def test_time_constructor(self): tz = Timezone(datetime.timedelta(hours=5, minutes=24)) self.check_value('xs:time("21:30:00")', datetime.datetime(2000, 1, 1, 21, 30)) self.check_value('xs:time(xs:untypedAtomic("21:30:00"))', datetime.datetime(2000, 1, 1, 21, 30)) self.check_value('xs:time("11:15:48+05:24")', datetime.datetime(2000, 1, 1, 11, 15, 48, 
tzinfo=tz)) self.check_value('xs:time(xs:dateTime("1969-07-20T20:18:00"))', Time(20, 18, 00)) self.wrong_value('xs:time("24:00:01")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:time(@a)', Time(13, 15, 39), context=context) context.item = Time(20, 10, 00) self.check_value('xs:time(.)', Time(20, 10, 00), context=context) def test_date_constructor(self): tz = Timezone(datetime.timedelta(hours=-14, minutes=0)) self.check_value('xs:date("2017-01-19")', datetime.datetime(2017, 1, 19)) self.check_value('xs:date(xs:untypedAtomic("2017-01-19"))', datetime.datetime(2017, 1, 19)) self.check_value('xs:date("2011-11-11-14:00")', datetime.datetime(2011, 11, 11, tzinfo=tz)) self.check_value('xs:date(xs:dateTime("1969-07-20T20:18:00"))', Date10(1969, 7, 20)) self.wrong_value('xs:date("2011-11-11-14:01")') self.wrong_value('xs:date("11-11-11")') self.check_value('xs:date(())', []) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:date(@a)', Date10(2017, 1, 19), context=context) class DummyXsdDateType(xpath_test_class.DummyXsdType): def is_list(self): pass def is_union(self): pass def is_simple(self): return True def decode(self, obj, *args, **kwargs): return Date10.fromstring(obj) def validate(self, obj, *args, **kwargs): Date10.validate(obj) context.item = TextAttributeNode('a', 'true') context.item.xsd_type = DummyXsdDateType() self.check_value('xs:date(.)', ValueError, context=context) context.item = TextAttributeNode('a', str(Date10(2017, 1, 19))) self.check_value('xs:date(.)', Date10(2017, 1, 19), context=context) context.item = TextAttributeNode('a', 'true') self.check_value('xs:date(.)', ValueError, context=context) root = self.etree.XML("2017-10-02") context = XPathContext(root) self.check_value('xs:date(.)', Date10(2017, 10, 2), context=context) root = self.etree.XML("2017-10-02") context = XPathContext(root) self.check_value('xs:date(.)', Date10(2017, 10, 2), context=context) context = XPathContext(root, 
item=Date10(2017, 10, 2)) self.check_value('xs:date(.)', Date10(2017, 10, 2), context=context) def test_gregorian_day_constructor(self): tz = Timezone(datetime.timedelta(hours=5, minutes=24)) self.check_value('xs:gDay("---30")', datetime.datetime(2000, 1, 30)) self.check_value('xs:gDay(xs:untypedAtomic("---30"))', datetime.datetime(2000, 1, 30)) self.check_value('xs:gDay("---21+05:24")', datetime.datetime(2000, 1, 21, tzinfo=tz)) self.check_value('xs:gDay(xs:dateTime("1969-07-20T20:18:00"))', GregorianDay(20)) self.wrong_value('xs:gDay("---32")') self.wrong_value('xs:gDay("--19")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gDay(@a)', GregorianDay(8), context=context) context.item = GregorianDay(10) self.check_value('xs:gDay(.)', GregorianDay(10), context=context) def test_gregorian_month_constructor(self): self.check_value('xs:gMonth("--09")', datetime.datetime(2000, 9, 1)) self.check_value('xs:gMonth(xs:untypedAtomic("--09"))', datetime.datetime(2000, 9, 1)) self.check_value('xs:gMonth("--12")', datetime.datetime(2000, 12, 1)) self.wrong_value('xs:gMonth("--9")') self.wrong_value('xs:gMonth("-09")') self.wrong_value('xs:gMonth("--13")') self.check_value('xs:gMonth(xs:dateTime("1969-07-20T20:18:00"))', GregorianMonth(7)) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gMonth(@a)', GregorianMonth(11), context=context) context.item = GregorianMonth(1) self.check_value('xs:gMonth(.)', GregorianMonth(1), context=context) def test_gregorian_month_day_constructor(self): tz = Timezone(datetime.timedelta(hours=-14, minutes=0)) self.check_value('xs:gMonthDay("--07-02")', datetime.datetime(2000, 7, 2)) self.check_value('xs:gMonthDay(xs:untypedAtomic("--07-02"))', datetime.datetime(2000, 7, 2)) self.check_value('xs:gMonthDay("--07-02-14:00")', datetime.datetime(2000, 7, 2, tzinfo=tz)) self.check_value('xs:gMonthDay(xs:dateTime("1969-07-20T20:18:00"))', GregorianMonthDay(7, 20)) 
self.wrong_value('xs:gMonthDay("--7-02")') self.wrong_value('xs:gMonthDay("-07-02")') self.wrong_value('xs:gMonthDay("--07-32")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gMonthDay(@a)', GregorianMonthDay(5, 20), context=context) context.item = GregorianMonthDay(1, 15) self.check_value('xs:gMonthDay(.)', GregorianMonthDay(1, 15), context=context) def test_gregorian_year_constructor(self): self.check_value('xs:gYear("2004")', datetime.datetime(2004, 1, 1)) self.check_value('xs:gYear(xs:untypedAtomic("2004"))', datetime.datetime(2004, 1, 1)) self.check_value('xs:gYear("-2004")', GregorianYear10(-2004)) self.check_value('xs:gYear("-12540")', GregorianYear10(-12540)) self.check_value('xs:gYear("12540")', GregorianYear10(12540)) self.check_value('xs:gYear(xs:dateTime("1969-07-20T20:18:00"))', GregorianYear10(1969)) self.wrong_value('xs:gYear("84")') self.wrong_value('xs:gYear("821")') self.wrong_value('xs:gYear("84")') self.check_value('"99999999999999999999999999999" castable as xs:gYear', False) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gYear(@a)', GregorianYear10(1999), context=context) context.item = GregorianYear10(1492) self.check_value('xs:gYear(.)', GregorianYear10(1492), context=context) def test_gregorian_year_month_constructor(self): self.check_value('xs:gYearMonth("2004-02")', datetime.datetime(2004, 2, 1)) self.check_value('xs:gYearMonth(xs:untypedAtomic("2004-02"))', datetime.datetime(2004, 2, 1)) self.check_value('xs:gYearMonth(xs:dateTime("1969-07-20T20:18:00"))', GregorianYearMonth10(1969, 7)) self.wrong_value('xs:gYearMonth("2004-2")') self.wrong_value('xs:gYearMonth("204-02")') self.check_value('"99999999999999999999999999999-01" castable as xs:gYearMonth', False) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:gYearMonth(@a)', GregorianYearMonth10(1900, 1), context=context) context.item = GregorianYearMonth10(1300, 10) self.check_value('xs:gYearMonth(.)', 
GregorianYearMonth10(1300, 10), context=context) def test_duration_constructor(self): self.check_value('xs:duration("P3Y5M1D")', (41, 86400)) self.check_value('xs:duration(xs:untypedAtomic("P3Y5M1D"))', (41, 86400)) self.check_value('xs:duration("P3Y5M1DT1H")', (41, 90000)) self.check_value('xs:duration("P3Y5M1DT1H3M2.01S")', (41, Decimal('90182.01'))) self.check_value('xs:untypedAtomic("P3Y5M1D") castable as xs:duration', True) self.check_value('"P8192912991912Y" castable as xs:duration', False) self.wrong_value('xs:duration("P3Y5M1X")') self.assertRaises(ValueError, self.parser.parse, 'xs:duration(1)') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:duration(@a)', Duration(months=17), context=context) context.item = Duration(months=12, seconds=86400) self.check_value('xs:duration(.)', Duration(12, 86400), context=context) root = self.etree.XML('P1Y5M') context = XPathContext(root) self.check_value('xs:duration(.)', Duration(months=17), context=context) def test_year_month_duration_constructor(self): self.check_value('xs:yearMonthDuration("P3Y5M")', (41, 0)) self.check_value('xs:yearMonthDuration(xs:untypedAtomic("P3Y5M"))', (41, 0)) self.check_value('xs:yearMonthDuration("-P15M")', (-15, 0)) self.check_value('xs:yearMonthDuration("-P20Y18M")', YearMonthDuration.fromstring("-P21Y6M")) self.check_value('xs:yearMonthDuration(xs:duration("P3Y5M"))', (41, 0)) self.check_value('xs:untypedAtomic("P3Y5M") castable as xs:yearMonthDuration', True) self.check_value('"P9999999999999999Y" castable as xs:yearMonthDuration', False) self.wrong_value('xs:yearMonthDuration("-P15M1D")') self.wrong_value('xs:yearMonthDuration("P15MT1H")') self.wrong_value('xs:yearMonthDuration("P1MT10H")') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:yearMonthDuration(@a)', Duration(months=17), context=context) context.item = YearMonthDuration(months=12) self.check_value('xs:yearMonthDuration(.)', YearMonthDuration(12), context=context) def 
test_day_time_duration_constructor(self): self.check_value('xs:dayTimeDuration("-P2DT15H")', DayTimeDuration(seconds=-226800)) self.check_value('xs:dayTimeDuration(xs:duration("-P2DT15H"))', DayTimeDuration(seconds=-226800)) self.check_value('xs:dayTimeDuration("PT240H")', DayTimeDuration.fromstring("P10D")) self.check_value('xs:dayTimeDuration("P365D")', DayTimeDuration.fromstring("P365D")) self.check_value('xs:dayTimeDuration(xs:untypedAtomic("PT240H"))', DayTimeDuration.fromstring("P10D")) self.check_value('xs:untypedAtomic("PT240H") castable as xs:dayTimeDuration', True) self.check_value('xs:dayTimeDuration("-P2DT15H0M0S")', DayTimeDuration.fromstring('-P2DT15H')) self.check_value('xs:dayTimeDuration("P3DT10H")', DayTimeDuration.fromstring("P3DT10H")) self.check_value('xs:dayTimeDuration("PT1S")', (0, 1)) self.check_value('xs:dayTimeDuration("PT0S")', (0, 0)) self.wrong_value('xs:dayTimeDuration("+P3DT10H")', 'FORG0001') self.check_value('xs:dayTimeDuration("P999999999999999D")', OverflowError) root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:dayTimeDuration(@a)', DayTimeDuration(496800), context=context) context.item = DayTimeDuration(86400) self.check_value('xs:dayTimeDuration(.)', DayTimeDuration(86400), context=context) def test_hex_binary_constructor(self): self.check_value('xs:hexBinary(())', []) self.check_value('xs:hexBinary("84")', HexBinary(b'84')) self.check_value('xs:hexBinary(xs:hexBinary("84"))', HexBinary(b'84')) self.wrong_type('xs:hexBinary(12)') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:hexBinary(@a)', HexBinary('84'), context=context) context.item = UntypedAtomic('84') self.check_value('xs:hexBinary(.)', HexBinary('84'), context=context) context.item = '84' self.check_value('xs:hexBinary(.)', HexBinary('84'), context=context) context.item = b'84' self.check_value('xs:hexBinary(.)', HexBinary('84'), context=context) context.item = b'XY' self.check_value('xs:hexBinary(.)', ValueError, 
context=context) context.item = b'F859' self.check_value('xs:hexBinary(.)', HexBinary(b'F859'), context=context) def test_base64_binary_constructor(self): self.check_value('xs:base64Binary(())', []) self.check_value('xs:base64Binary("ODQ=")', Base64Binary(b'ODQ=')) self.check_value('xs:base64Binary(xs:base64Binary("ODQ="))', Base64Binary(b'ODQ=')) self.check_value('xs:base64Binary("YWJjZWZnaGk=")', Base64Binary(b'YWJjZWZnaGk=')) self.wrong_value('xs:base64Binary("xyz")') self.wrong_value('xs:base64Binary("\u0411")') self.wrong_type('xs:base64Binary(1e2)') self.wrong_type('xs:base64Binary(1.1)') root = self.etree.XML('') context = XPathContext(root) self.check_value('xs:base64Binary(@a)', Base64Binary(b'YWJjZWZnaGk='), context=context) context.item = UntypedAtomic('YWJjZWZnaGk=') self.check_value('xs:base64Binary(.)', Base64Binary(b'YWJjZWZnaGk='), context=context) context.item = b'abcefghi' # Don't change, it can be an encoded value. self.check_value('xs:base64Binary(.)', Base64Binary(b'abcefghi'), context=context) context.item = b'YWJjZWZnaGlq' self.check_value('xs:base64Binary(.)', Base64Binary(b'YWJjZWZnaGlq'), context=context) def test_untyped_atomic_constructor(self): self.check_value('xs:untypedAtomic(())', []) root = self.etree.XML('1999') context = XPathContext(root) self.check_value('xs:untypedAtomic(.)', UntypedAtomic(1999), context=context) context.item = UntypedAtomic('true') self.check_value('xs:untypedAtomic(.)', UntypedAtomic(True), context=context) def test_notation_constructor(self): self.wrong_type('xs:NOTATION()', 'XPST0017') self.wrong_type('xs:NOTATION(()', 'XPST0017') self.wrong_type('xs:NOTATION(())', 'XPST0017', 'no constructor function exists for xs:NOTATION') self.wrong_name('"A120" castable as xs:NOTATION', 'XPST0080') def test_missing_context_on_namespaced_name__issue_068(self): namespaces = { 'test': "urn:example:names:common-test-names" } root = self.etree.XML(""" 1 """) context_node = select(root, "B", namespaces=namespaces)[0] result 
= select(root, "xs:decimal(./test:number)", namespaces, item=context_node) self.assertEqual(result, 1) result = select(root, "xs:decimal(test:number)", namespaces, item=context_node) self.assertEqual(result, 1) @unittest.skipIf(lxml_etree is None, "The lxml library is not installed") class LxmlXPath2ConstructorsTest(XPath2ConstructorsTest): etree = lxml_etree if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_xpath2_functions.py000066400000000000000000002521571476131650400250140ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # # Note: Many tests are built using the examples of the XPath standards, # published by W3C under the W3C Document License. 
# # References: # http://www.w3.org/TR/1999/REC-xpath-19991116/ # http://www.w3.org/TR/2010/REC-xpath20-20101214/ # http://www.w3.org/TR/2010/REC-xpath-functions-20101214/ # https://www.w3.org/Consortium/Legal/2015/doc-license # https://www.w3.org/TR/charmod-norm/ # import unittest import datetime import io import locale import math import os import platform import time from textwrap import dedent from decimal import Decimal try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None try: import xmlschema except ImportError: xmlschema = None else: xmlschema.XMLSchema.meta_schema.build() from elementpath import XPath2Parser, XPathContext, ElementPathError, \ MissingContextError, select, Selector, datatypes, get_node_tree, \ NamespaceNode, TextNode from elementpath.xpath_nodes import TextAttributeNode from elementpath.namespaces import XSI_NAMESPACE, XML_NAMESPACE, XML_ID from elementpath.datatypes import DateTime10, DateTime, Date10, Date, Time, \ Timezone, DayTimeDuration, YearMonthDuration, QName, UntypedAtomic from elementpath.collations import UNICODE_CODEPOINT_COLLATION try: from tests import test_xpath1_parser except ImportError: import test_xpath1_parser XML_GENERIC_TEST = test_xpath1_parser.XML_GENERIC_TEST XML_POEM_TEST = """ Kaum hat dies der Hahn gesehen, Fängt er auch schon an zu krähen: «Kikeriki! Kikikerikih!!» Tak, tak, tak! - da kommen sie. """ try: from tests import xpath_test_class except ImportError: import xpath_test_class class XPath2FunctionsTest(xpath_test_class.XPathTestCase): def setUp(self): self.parser = XPath2Parser(namespaces=self.namespaces) # Make sure the tests are repeatable. 
        env_vars_to_tweak = 'LC_ALL', 'LANG'
        self.current_env_vars = {v: os.environ.get(v) for v in env_vars_to_tweak}
        for v in self.current_env_vars:
            os.environ[v] = 'en_US.UTF-8'

    def tearDown(self):
        if hasattr(self, 'current_env_vars'):
            for v in self.current_env_vars:
                if self.current_env_vars[v] is not None:
                    os.environ[v] = self.current_env_vars[v]
                else:
                    # The variable was unset before setUp(): drop the override
                    os.environ.pop(v, None)

    def test_boolean_function(self):
        root = self.etree.XML('')
        self.check_selector("boolean(/A)", root, True)
        self.check_selector("boolean((-10, 35))", root, TypeError)  # Sequence with 2 numeric values
        self.check_selector("boolean((/A, 35))", root, True)

    def test_abs_function(self):
        # Test cases taken from https://www.w3.org/TR/xquery-operators/#numeric-value-functions
        self.check_value("abs(10.5)", 10.5)
        self.check_value("abs(-10.5)", 10.5)
        self.check_value("abs(())")

        root = self.etree.XML('-10')
        context = XPathContext(root, item=float('nan'))
        self.check_value("abs(.)", float('nan'), context=context)

        context = XPathContext(root)
        self.check_value("abs(.)", 10, context=context)

        context = XPathContext(root=self.etree.XML('foo'))
        self.wrong_type('abs("10")', 'XPTY0004', 'invalid argument type')

        with self.assertRaises(ValueError) as err:
            self.check_value("abs(.)", 10, context=context)
        self.assertIn('FOCA0002', str(err.exception))
        self.assertIn('invalid string value', str(err.exception))

    def test_round_half_to_even_function(self):
        self.check_value("round-half-to-even(())")
        self.check_value("round-half-to-even(0.5)", 0)
        self.check_value("round-half-to-even(1)", 1)
        self.check_value("round-half-to-even(1.5)", 2)
        self.check_value("round-half-to-even(2.5)", 2)
        self.check_value("round-half-to-even(xs:float(2.5))", 2)
        self.check_value("round-half-to-even(3.567812E+3, 2)", 3567.81E0)
        self.check_value("round-half-to-even(4.7564E-3, 2)", 0.0E0)
        self.check_value("round-half-to-even(35612.25, -2)", 35600)
        self.wrong_type('round-half-to-even(3.5, "2")', 'XPTY0004')
        self.check_value('fn:round-half-to-even(xs:double("1.0E300"))', 1.0E300)
        self.check_value('fn:round-half-to-even(4.8712122, 8328782878)', 4.8712122)

        root = self.etree.XML('')
        context = XPathContext(root, item=float('nan'))
        self.check_value("round-half-to-even(.)", float('nan'), context=context)

        self.wrong_type('round-half-to-even("wrong")', 'XPTY0004', 'invalid argument type')

    def test_sum_function(self):
        self.check_value("sum((10, 15, 6, -2))", 29)

    def test_avg_function(self):
        context = XPathContext(root=self.etree.XML(''), variables={
            'd1': YearMonthDuration.fromstring("P20Y"),
            'd2': YearMonthDuration.fromstring("P10M"),
            'seq3': [3, 4, 5]
        })
        self.check_value("fn:avg($seq3)", 4.0, context=context)
        self.check_value("fn:avg(($d1, $d2))", YearMonthDuration.fromstring("P125M"),
                         context=context)
        root_token = self.parser.parse("fn:avg(($d1, $seq3))")
        self.assertRaises(TypeError, root_token.evaluate, context=context)
        self.check_value("fn:avg(())")
        self.wrong_type("fn:avg('10')", 'FORG0006')
        self.check_value("fn:avg($seq3)", 4.0, context=context)

        self.check_value('avg((xs:float(1), xs:untypedAtomic(2), xs:integer(0)))', 1)
        self.check_value('avg((1.0, 2.0, 3.0))', 2)
        self.wrong_type('avg((xs:float(1), true(), xs:integer(0)))', 'FORG0006')
        self.wrong_type('avg((xs:untypedAtomic(3), xs:integer(3), "three"))',
                        'FORG0006', 'unsupported operand')

        root_token = self.parser.parse("fn:avg((xs:float('INF'), xs:float('-INF')))")
        self.assertTrue(math.isnan(root_token.evaluate(context)))

        root_token = self.parser.parse("fn:avg(($seq3, xs:float('NaN')))")
        self.assertTrue(math.isnan(root_token.evaluate(context)))

        root = self.etree.XML('19')
        self.check_selector('avg(/a/b/number(text()))', root, 5)

    def test_max_function(self):
        self.check_value("fn:max(())", [])
        self.check_value("fn:max((3,4,5))", 5)
        self.check_value("fn:max((3, 4, xs:float('NaN')))", float('nan'))
        self.check_value("fn:max((3,4,5), 'en_US.UTF-8')", 5)
        self.check_value("fn:max((5, 5.0e0))", 5.0e0)
        self.check_value("fn:max((xs:float(1.0E0), xs:double(15.0)))", 15.0)
        self.wrong_type("fn:max((3,4,'Zero'))")

        dt = datetime.datetime.now()
        self.check_value('fn:max((fn:current-date(), xs:date("2001-01-01")))',
                         Date(dt.year, dt.month, dt.day, tzinfo=dt.tzinfo))
        self.check_value('fn:max(("a", "b", "c"))', 'c')

        root = self.etree.XML('19')
        self.check_selector('max(/a/b/number(text()))', root, 9)
        self.check_selector('max(/a/b)', root, 9)

        self.check_value(
            'max((xs:anyURI("http://xpath.test/ns0"), xs:anyURI("http://xpath.test/ns1")))',
            datatypes.AnyURI("http://xpath.test/ns1")
        )
        self.check_value('max((xs:dayTimeDuration("P1D"), xs:dayTimeDuration("P2D")))',
                         datatypes.DayTimeDuration(seconds=3600 * 48))
        self.wrong_type('max(QName("http://xpath.test/ns", "foo"))',
                        'FORG0006', 'xs:QName is not an ordered type')
        self.wrong_type('max(xs:duration("P1Y"))',
                        'FORG0006', 'xs:duration is not an ordered type')

    def test_min_function(self):
        self.check_value("fn:min(())", [])
        self.check_value("fn:min((3,4,5))", 3)
        self.check_value("fn:min((3, 4, xs:float('NaN')))", float('nan'))
        self.check_value("fn:min((5, 5.0e0))", 5.0e0)
        self.check_value("fn:min((xs:float(0.0E0), xs:float(-0.0E0)))", 0.0)
        self.check_value("fn:min((xs:float(1.0E0), xs:double(15.0)))", 1.0)
        self.check_value('fn:min((fn:current-date(), xs:date("2001-01-01")))',
                         Date.fromstring("2001-01-01"))
        self.check_value('fn:min(("a", "b", "c"))', 'a')

        root = self.etree.XML('19')
        self.check_selector('min(/a/b/number(text()))', root, 1)
        self.check_selector('min(/a/b)', root, 1)

        self.check_value(
            'min((xs:anyURI("http://xpath.test/ns0"), xs:anyURI("http://xpath.test/ns1")))',
            datatypes.AnyURI("http://xpath.test/ns0")
        )
        self.check_value('min((xs:dayTimeDuration("P1D"), xs:dayTimeDuration("P2D")))',
                         datatypes.DayTimeDuration(seconds=3600 * 24))
        self.wrong_type('min(QName("http://xpath.test/ns", "foo"))', 'FORG0006')
        self.wrong_type('min(xs:duration("P1Y"))', 'FORG0006')

    ###
    # Functions on strings
    def test_codepoints_to_string_function(self):
        self.check_value("codepoints-to-string((2309, 2358, 2378, 2325))", 'अशॊक')
        self.check_value("codepoints-to-string(2309)", 'अ')
        self.wrong_value("codepoints-to-string((55296))", 'FOCH0001')
        self.wrong_type("codepoints-to-string(('z'))", 'XPTY0004')
        self.wrong_type("codepoints-to-string((2309.1))", 'FORG0006')

    def test_string_to_codepoints_function(self):
        self.check_value('string-to-codepoints("Thérèse")', [84, 104, 233, 114, 232, 115, 101])
        self.check_value('string-to-codepoints(())')
        self.wrong_type('string-to-codepoints(84)', 'XPTY0004')
        self.check_value('string-to-codepoints(("Thérèse"))', [84, 104, 233, 114, 232, 115, 101])
        self.wrong_type('string-to-codepoints(("Thér", "èse"))', 'XPTY0004')

    def test_codepoint_equal_function(self):
        self.check_value("fn:codepoint-equal('abc', 'abc')", True)
        self.check_value("fn:codepoint-equal('abc', 'abcd')", False)
        self.check_value("fn:codepoint-equal('', '')", True)
        self.check_value("fn:codepoint-equal((), 'abc')")
        self.check_value("fn:codepoint-equal('abc', ())")
        self.check_value("fn:codepoint-equal((), ())")

    def test_compare_function(self):
        env_locale_setting = locale.getlocale(locale.LC_COLLATE)

        locale.setlocale(locale.LC_COLLATE, 'C')
        try:
            self.assertEqual(locale.getlocale(locale.LC_COLLATE), (None, None))

            self.check_value("fn:compare('abc', 'abc')", 0)
            self.check_value("fn:compare('abc', 'abd')", -1)
            self.check_value("fn:compare('abc', 'abb')", 1)
            self.check_value("fn:compare('foo bar', 'foo bar')", 0)
            self.check_value("fn:compare('', '')", 0)
            self.check_value("fn:compare('abc', 'abcd')", -1)
            self.check_value("fn:compare('', ' foo bar')", -1)
            self.check_value("fn:compare('abcd', 'abc')", 1)
            self.check_value("fn:compare('foo bar', '')", 1)

            self.check_value('fn:compare("a","A")', 1)
            self.check_value('fn:compare("A","a")', -1)
            self.check_value('fn:compare("+++","++")', 1)
            self.check_value('fn:compare("1234","123")', 1)

            self.check_value("fn:count(fn:compare((), ''))", 0)
            self.check_value("fn:count(fn:compare('abc', ()))", 0)
            self.check_value("compare(xs:anyURI('http://example.com/'), 'http://example.com/')", 0)
            self.check_value(
                "compare(xs:untypedAtomic('http://example.com/'), 'http://example.com/')", 0
            )

            self.check_value('compare("𐀁", "𐀂", '
                             '"http://www.w3.org/2005/xpath-functions/collation/codepoint")', -1)
            self.check_value('compare("𐀁", "￰", '
                             '"http://www.w3.org/2005/xpath-functions/collation/codepoint")', 1)

            # Issue #17
            self.check_value("fn:compare('Strassen', 'Straße')", -1)

            if platform.system() != 'Linux':
                return

            locale.setlocale(locale.LC_COLLATE, 'en_US.UTF-8')
            self.check_value("fn:compare('Strasse', 'Straße')", -1)
            self.check_value("fn:compare('Strassen', 'Straße')", -1)

            try:
                self.check_value("fn:compare('Strasse', 'Straße', "
                                 "'http://www.w3.org/2013/collation/UCA?lang=it_IT.UTF-8')", -1)
                self.check_value("fn:compare('Strasse', 'Straße', 'it_IT.UTF-8')", -1)
            except locale.Error:
                pass  # Skip test if 'it_IT.UTF-8' is an unknown locale setting

            try:
                self.check_value("fn:compare('Strasse', 'Straße', 'de_DE.UTF-8')", -1)
            except locale.Error:
                pass  # Skip test if 'de_DE.UTF-8' is an unknown locale setting

            try:
                self.check_value("fn:compare('Strasse', 'Straße', 'deutsch')", -1)
            except locale.Error:
                pass  # Skip test if 'deutsch' is an unknown locale setting

            with self.assertRaises(locale.Error) as cm:
                self.check_value("fn:compare('Strasse', 'Straße', 'invalid_collation')")
            self.assertIn('FOCH0002', str(cm.exception))

            self.wrong_type("fn:compare('Strasse', 111)", 'XPTY0004')
            self.wrong_type('fn:compare("1234", 1234)', 'XPTY0004')

        finally:
            locale.setlocale(locale.LC_COLLATE, env_locale_setting)

    def test_normalize_unicode_function(self):
        self.check_value('fn:normalize-unicode(())', '')
        self.check_value('fn:normalize-unicode("menù")', 'menù')
        self.wrong_type('fn:normalize-unicode(xs:hexBinary("84"))', 'XPTY0004')

        self.assertRaises(ValueError, self.parser.parse,
                          'fn:normalize-unicode("à", "FULLY-NORMALIZED")')
        self.check_value('fn:normalize-unicode("à", "")', 'à')
        self.wrong_value('fn:normalize-unicode("à", "UNKNOWN")')
        self.wrong_type('fn:normalize-unicode("à", ())', 'XPTY0004', "can't be an empty sequence")

        # https://www.w3.org/TR/charmod-norm/#normalization_forms
        self.check_value("fn:normalize-unicode('\u01FA')", '\u01FA')
        self.check_value("fn:normalize-unicode('\u01FA', 'NFD')", '\u0041\u030A\u0301')
        self.check_value("fn:normalize-unicode('\u01FA', 'NFKC')", '\u01FA')
        self.check_value("fn:normalize-unicode('\u01FA', 'NFKD')", '\u0041\u030A\u0301')

        self.check_value("fn:normalize-unicode('\u00C5\u0301')", '\u01FA')
        self.check_value("fn:normalize-unicode('\u00C5\u0301', 'NFD')", '\u0041\u030A\u0301')
        self.check_value("fn:normalize-unicode('\u00C5\u0301', 'NFKC')", '\u01FA')
        self.check_value("fn:normalize-unicode('\u00C5\u0301', ' nfkd ')", '\u0041\u030A\u0301')

        self.check_value("fn:normalize-unicode('\u212B\u0301')", '\u01FA')
        self.check_value("fn:normalize-unicode('\u212B\u0301', 'NFD')", '\u0041\u030A\u0301')
        self.check_value("fn:normalize-unicode('\u212B\u0301', 'NFKC')", '\u01FA')
        self.check_value("fn:normalize-unicode('\u212B\u0301', 'NFKD')", '\u0041\u030A\u0301')

        self.check_value("fn:normalize-unicode('\u0041\u030A\u0301')", '\u01FA')
        self.check_value("fn:normalize-unicode('\u0041\u030A\u0301', 'NFD')",
                         '\u0041\u030A\u0301')
        self.check_value("fn:normalize-unicode('\u0041\u030A\u0301', 'NFKC')", '\u01FA')
        self.check_value("fn:normalize-unicode('\u0041\u030A\u0301', 'NFKD')",
                         '\u0041\u030A\u0301')

        self.check_value("fn:normalize-unicode('\uFF21\u030A\u0301')", '\uFF21\u030A\u0301')
        self.check_value("fn:normalize-unicode('\uFF21\u030A\u0301', 'NFD')",
                         '\uFF21\u030A\u0301')
        self.check_value("fn:normalize-unicode('\uFF21\u030A\u0301', 'NFKC')", '\u01FA')
        self.check_value("fn:normalize-unicode('\uFF21\u030A\u0301', 'NFKD')",
                         '\u0041\u030A\u0301')

    def test_count_function(self):
        self.check_value("fn:count('')", 1)
        self.check_value("count('')", 1)
        self.check_value("fn:count('abc')", 1)
        self.check_value("fn:count(7)", 1)
        self.check_value("fn:count(())", 0)
        self.check_value("fn:count((1, 2, 3))", 3)
        self.check_value("fn:count((1, 2, ()))", 2)
        self.check_value("fn:count((((()))))", 0)
        self.check_value("fn:count((((), (), ()), (), (), (), ()))", 0)
        self.check_value('fn:count((1, 2 to ()))', 1)
        self.check_value("count(('1', (2, ())))", 2)
        self.check_value("count(('1', (2, '3')))", 3)
        self.check_value("count(1 to 5)", 5)
        self.check_value("count(reverse((1, 2, 3, 4)))", 4)

        root = self.etree.XML('')
        self.check_selector("count(5)", root, 1)
        self.check_value("count((0, 1, 2 + 1, 3 - 1))", 4)

        self.check_value('fn:count((xs:decimal("-999999999999999999")))', 1)
        self.check_value('fn:count((xs:float("0")))', 1)

        self.check_value("count(//*[@name='John Doe'])", MissingContextError)
        context = XPathContext(self.etree.XML(''))
        self.check_value("count(//*[@name='John Doe'])", 0, context)

        with self.assertRaises(TypeError) as cm:
            self.check_value("fn:count()")
        self.assertIn('XPST0017', str(cm.exception))

        with self.assertRaises(TypeError) as cm:
            self.check_value("fn:count(1, ())")
        self.assertIn('XPST0017', str(cm.exception))

        with self.assertRaises(TypeError) as cm:
            self.check_value("fn:count(1, 2)")
        self.assertIn('XPST0017', str(cm.exception))

    def test_lower_case_function(self):
        self.check_value('lower-case("aBcDe01")', 'abcde01')
        self.check_value('lower-case(("aBcDe01"))', 'abcde01')
        self.check_value('lower-case(())', '')
        self.wrong_type('lower-case((10))')

        root = self.etree.XML(XML_GENERIC_TEST)
        self.check_selector("a[lower-case(@id) = 'a_id']", root, [root[0]])
        self.check_selector("a[lower-case(@id) = 'a_i']", root, [])
        self.check_selector("//b[lower-case(.) = 'some content']", root, [root[0][0]])
        self.check_selector("//b[lower-case((.)) = 'some content']", root, [root[0][0]])
        self.check_selector("//none[lower-case((.)) = 'some content']", root, [])

    def test_upper_case_function(self):
        self.check_value('upper-case("aBcDe01")', 'ABCDE01')
        self.check_value('upper-case(("aBcDe01"))', 'ABCDE01')
        self.check_value('upper-case(())', '')
        self.wrong_type('upper-case((10))', 'XPTY0004')

        root = self.etree.XML(XML_GENERIC_TEST)
        self.check_selector("a[upper-case(@id) = 'A_ID']", root, [root[0]])
        self.check_selector("a[upper-case(@id) = 'A_I']", root, [])
        self.check_selector("//b[upper-case(.) = 'SOME CONTENT']", root, [root[0][0]])
        self.check_selector("//b[upper-case((.)) = 'SOME CONTENT']", root, [root[0][0]])
        self.check_selector("//none[upper-case((.)) = 'SOME CONTENT']", root, [])

    def test_encode_for_uri_function(self):
        self.check_value('encode-for-uri("http://xpath.test")', 'http%3A%2F%2Fxpath.test')
        self.check_value('encode-for-uri("~bébé")', '~b%C3%A9b%C3%A9')
        self.check_value('encode-for-uri("100% organic")', '100%25%20organic')
        self.check_value('encode-for-uri("")', '')
        self.check_value('encode-for-uri(())', '')

    def test_iri_to_uri_function(self):
        self.check_value('iri-to-uri("http://www.example.com/00/Weather/CA/Los%20Angeles#ocean")',
                         'http://www.example.com/00/Weather/CA/Los%20Angeles#ocean')
        self.check_value('iri-to-uri("http://www.example.com/~bébé")',
                         'http://www.example.com/~b%C3%A9b%C3%A9')
        self.check_value('iri-to-uri("")', '')
        self.check_value('iri-to-uri(())', '')

    def test_escape_html_uri_function(self):
        self.check_value(
            'escape-html-uri("http://www.example.com/00/Weather/CA/Los Angeles#ocean")',
            'http://www.example.com/00/Weather/CA/Los Angeles#ocean'
        )
        self.check_value("escape-html-uri(\"javascript:if (navigator.browserLanguage == 'fr') "
                         "window.open('http://www.example.com/~bébé');\")",
                         "javascript:if (navigator.browserLanguage == 'fr') "
                         "window.open('http://www.example.com/~b%C3%A9b%C3%A9');")
        self.check_value('escape-html-uri("")', '')
        self.check_value('escape-html-uri(())', '')

    def test_string_join_function(self):
        self.check_value("string-join(('Now', 'is', 'the', 'time', '...'), ' ')",
                         "Now is the time ...")
        self.check_value("string-join(('Blow, ', 'blow, ', 'thou ', 'winter ', 'wind!'), '')",
                         'Blow, blow, thou winter wind!')
        self.check_value("string-join((), 'separator')", '')

        self.check_value("string-join(('a', 'b', 'c'), ', ')", 'a, b, c')
        self.wrong_type("string-join(('a', 'b', 'c'), 8)", 'XPTY0004')

        if self.parser.version < '3.1':
            self.wrong_type("string-join(('a', 4, 'c'), ', ')", 'XPTY0004')
        else:
            self.check_value("string-join(('a', 4, 'c'), ', ')", 'a, 4, c')

        root = self.etree.XML(XML_GENERIC_TEST)
        self.check_selector("a[string-join((@id, 'foo', 'bar'), ' ') = 'a_id foo bar']",
                            root, [root[0]])
        self.check_selector("a[string-join((@id, 'foo'), ',') = 'a_id,foo']",
                            root, [root[0]])
        self.check_selector("//b[string-join((., 'bar'), ' ') = 'some content bar']",
                            root, [root[0][0]])
        self.check_selector("//b[string-join((., 'bar'), ',') = 'some content,bar']",
                            root, [root[0][0]])
        self.check_selector("//b[string-join((., 'bar'), ',') = 'some content bar']",
                            root, [])
        self.check_selector("//none[string-join((., 'bar'), ',') = 'some content,bar']",
                            root, [])

    def test_matches_function(self):
        self.check_value('fn:matches("abracadabra", "bra")', True)
        self.check_value('fn:matches("abracadabra", "^a.*a$")', True)
        self.check_value('fn:matches("abracadabra", "^bra")', False)

        self.wrong_value('fn:matches("abracadabra", "bra", "k")')
        self.wrong_value('fn:matches("abracadabra", "[bra")')
        self.wrong_value('fn:matches("abracadabra", "a{1,99999999999999999999999999}")',
                         'FORX0002')

        self.check_value('fn:matches("1", "\\S")', True)
        self.check_value('fn:matches(" ", "\\S")', False)
        self.check_value('fn:matches("", "\\S")', False)
        self.check_value('fn:matches("\t", "\\S")', False)
        self.check_value('fn:matches(" foo bar", "\\S")', True)

        if platform.python_implementation() != 'PyPy' or self.etree is not lxml_etree:
            poem_context = XPathContext(root=self.etree.XML(XML_POEM_TEST))
            self.check_value('fn:matches(., "Kaum.*krähen")', False, context=poem_context)
            self.check_value('fn:matches(., "Kaum.*krähen", "s")', True, context=poem_context)
            self.check_value('fn:matches(., "^Kaum.*gesehen,$", "m")', True, context=poem_context)
            self.check_value('fn:matches(., "^Kaum.*gesehen,$")', False, context=poem_context)
            self.check_value('fn:matches(., "kiki", "i")', True, context=poem_context)

        root = self.etree.XML(XML_GENERIC_TEST)
        self.check_selector("a[matches(@id, '^a_id$')]", root, [root[0]])
        self.check_selector("a[matches(@id, 'a.id')]", root, [root[0]])
        self.check_selector("a[matches(@id, '_id')]", root, [root[0]])
        self.check_selector("a[matches(@id, 'a!')]", root, [])
        self.check_selector("//b[matches(., '^some.content$')]", root, [root[0][0]])
        self.check_selector("//b[matches(., '^content')]", root, [])
        self.check_selector("//none[matches(., '.*')]", root, [])

    def test_ends_with_function(self):
        self.check_value('fn:ends-with("abracadabra", "bra")', True)
        self.check_value('fn:ends-with("abracadabra", "a")', True)
        self.check_value('fn:ends-with("abracadabra", "cbra")', False)

        root = self.etree.XML(XML_GENERIC_TEST)
        self.check_selector("a[ends-with(@id, 'a_id')]", root, [root[0]])
        self.check_selector("a[ends-with(@id, 'id')]", root, [root[0]])
        self.check_selector("a[ends-with(@id, 'a!')]", root, [])
        self.check_selector("//b[ends-with(., 'some content')]", root, [root[0][0]])
        self.check_selector("//b[ends-with(., 't')]", root, [root[0][0]])
        self.check_selector("//none[ends-with(., 's')]", root, [])

        self.check_value('fn:ends-with ( "tattoo", "tattoo", "http://www.w3.org/'
                         '2005/xpath-functions/collation/codepoint")', True)
        self.check_value('fn:ends-with ( "tattoo", "atto", "http://www.w3.org/'
                         '2005/xpath-functions/collation/codepoint")', False)
        self.check_value("ends-with((), ())", True)

    def test_replace_function(self):
        self.check_value('fn:replace("abracadabra", "bra", "*")', "a*cada*")
        self.check_value('fn:replace("abracadabra", "a.*a", "*")', "*")
        self.check_value('fn:replace("abracadabra", "a.*?a", "*")', "*c*bra")
        self.check_value('fn:replace("abracadabra", "a", "")', "brcdbr")
        self.check_value('fn:replace("abracadabra", "a", "", "i")', "brcdbr")
        self.wrong_value('fn:replace("abracadabra", "a", "", "z")')
        self.wrong_value('fn:replace("abracadabra", "[a", "")')
        self.wrong_type('fn:replace("abracadabra")')

        self.check_value('fn:replace("abracadabra", "a(.)", "a$1$1")', "abbraccaddabbra")
        self.wrong_value('replace("abc", "a(.)", "$x")', 'FORX0004', 'Invalid replacement string')
        self.wrong_value('fn:replace("abracadabra", ".*?", "$1")')
        self.check_value('fn:replace("AAAA", "A+", "b")', "b")
        self.check_value('fn:replace("AAAA", "A+?", "b")', "bbbb")
        self.check_value('fn:replace("darted", "^(.*?)d(.*)$", "$1c$2")', "carted")
        self.check_value('fn:replace("abcd", "(ab)|(a)", "[1=$1][2=$2]")', "[1=ab][2=]cd")

        root = self.etree.XML(XML_GENERIC_TEST)
        self.check_selector("a[replace(@id, '^a_id$', 'foo') = 'foo']", root, [root[0]])
        self.check_selector("a[replace(@id, 'a.id', 'foo') = 'foo']", root, [root[0]])
        self.check_selector("a[replace(@id, '_id', 'no') = 'ano']", root, [root[0]])
        self.check_selector("//b[replace(., '^some.content$', 'new') = 'new']",
                            root, [root[0][0]])
        self.check_selector("//b[replace(., '^content', '') = '']", root, [])
        self.check_selector("//none[replace(., '.*', 'a') = 'a']", root, [])

    def test_tokenize_function(self):
        self.check_value('fn:tokenize("abracadabra", "(ab)|(a)")', ['', 'r', 'c', 'd', 'r', ''])
        self.check_value(r'fn:tokenize("The cat sat on the mat", "\s+")',
                         ['The', 'cat', 'sat', 'on', 'the', 'mat'])
        self.check_value(r'fn:tokenize("1, 15, 24, 50", ",\s*")', ['1', '15', '24', '50'])
        self.check_value('fn:tokenize("1,15,,24,50,", ",")', ['1', '15', '', '24', '50', ''])
        self.check_value(r'fn:tokenize("Some unparsed <br> HTML <BR> text", "\s*<br>\s*", "i")',
                         ['Some unparsed', 'HTML', 'text'])
        self.check_value('fn:tokenize("", "(ab)|(a)")', [])

        self.wrong_value('fn:tokenize("abc", "[a")', 'FORX0002', 'Invalid regular expression')
        self.wrong_value('fn:tokenize("abc", ".*?")', 'FORX0003', 'matches zero-length string')
        self.wrong_value('fn:tokenize("abba", ".?")')
        self.wrong_value('fn:tokenize("abracadabra", "(ab)|(a)", "sxf")')
        self.wrong_type('fn:tokenize("abracadabra", ())')
        self.wrong_type('fn:tokenize("abracadabra", "(ab)|(a)", ())')

    def test_resolve_uri_function(self):
        self.check_value('fn:resolve-uri("dir1/dir2", "file:///home/")',
                         'file:///home/dir1/dir2')
        self.wrong_value('fn:resolve-uri("dir1/dir2", "home/")', '')
        self.wrong_value('fn:resolve-uri("dir1/dir2")')
        self.check_value('fn:resolve-uri((), "http://xpath.test")')

        self.wrong_value('fn:resolve-uri("file:://file1.txt", "http://xpath.test")',
                         'FORG0002', "'file:://file1.txt' is not a valid URI")
        self.wrong_value('fn:resolve-uri("dir1/dir2", "http:://xpath.test")',
                         'FORG0002', "'http:://xpath.test' is not a valid URI")

        self.parser.base_uri = 'http://www.example.com/ns/'
        try:
            self.check_value('fn:resolve-uri("dir1/dir2")',
                             'http://www.example.com/ns/dir1/dir2')
            self.check_value('fn:resolve-uri("/dir1/dir2")', '/dir1/dir2')
            self.check_value('fn:resolve-uri("file:text.txt")', 'file:text.txt')
            self.check_value('fn:resolve-uri(())')
            self.wrong_value('fn:resolve-uri("http:://xpath.test")',
                             'FORG0002', "'http:://xpath.test' is not a valid URI")
        finally:
            self.parser.base_uri = None

    def test_empty_function(self):
        # Test cases from https://www.w3.org/TR/xquery-operators/#general-seq-funcs
        self.check_value('fn:empty(("hello", "world"))', False)
        self.check_value('fn:empty(fn:remove(("hello", "world"), 1))', False)
        self.check_value('fn:empty(())', True)
        self.check_value("empty(() * ())", True)
        self.check_value('fn:empty(fn:remove(("hello"), 1))', True)
        self.check_value('fn:empty((xs:double("0")))', False)

    def test_exists_function(self):
        self.check_value('fn:exists(("hello", "world"))', True)
        self.check_value('fn:exists(())', False)
        self.check_value('fn:exists(fn:remove(("hello"), 1))', False)
        self.check_value('fn:exists((xs:int("-1873914410")))', True)

    def test_distinct_values_function(self):
        self.check_value('fn:distinct-values((1, 2.0, 3, 2))', [1, 2.0, 3])
        context = XPathContext(
            root=self.etree.XML(''),
            variables={
                'x': [UntypedAtomic("foo"), UntypedAtomic("bar"), UntypedAtomic("bar")]
            }
        )
        self.check_value('fn:distinct-values($x)', ['foo', 'bar'], context)

        context = XPathContext(
            root=self.etree.XML(''),
            variables={'x': [UntypedAtomic("foo"), float('nan'), UntypedAtomic("bar")]}
        )
        token = self.parser.parse('fn:distinct-values($x)')
        results = token.evaluate(context)
        self.assertEqual(results[0], 'foo')
        self.assertTrue(math.isnan(results[1]))
        self.assertEqual(results[2], 'bar')

        root = self.etree.XML('')
        self.check_selector(
            "fn:distinct-values((xs:float('NaN'), xs:double('NaN'), xs:float('NaN')))",
            root, math.isnan
        )

        self.check_value('fn:distinct-values((xs:float("0"), xs:float("0")))', [0.0])
        self.check_value(
            'fn:distinct-values("foo", "{}")'.format(UNICODE_CODEPOINT_COLLATION), ['foo']
        )

    def test_index_of_function(self):
        self.check_value('fn:index-of ((10, 20, 30, 40), 35)', [])
        self.wrong_type('fn:index-of ((10, 20, 30, 40), ())', 'XPTY0004')
        self.check_value('fn:index-of ((10, 20, 30, 30, 20, 10), 20)', [2, 5])
        self.check_value('fn:index-of (("a", "sport", "and", "a", "pastime"), "a")', [1, 4])
        self.check_value(
            'fn:index-of (("foo", "bar"), "bar", "{}")'.format(UNICODE_CODEPOINT_COLLATION), [2]
        )

        # Issue #28
        root = self.etree.XML(""" 030 """)

        test1 = "/root/descript[index-of(('030','031'), '030')]"
        test2 = "/root/descript[ancestor::root/incode = '030']"
        test3 = "/root/descript[index-of(('030','031'), ancestor::root/incode)]"
        self.check_selector(test1, root, [root[1]])
        self.check_selector(test2, root, [root[1]])
        self.check_selector(test3, root, [root[1]])

    def test_insert_before_function(self):
        context = XPathContext(root=self.etree.XML(''), variables={'x': ['a', 'b', 'c']})
        self.check_value('fn:insert-before($x, 0, "z")', ['z', 'a', 'b', 'c'], context)
        self.check_value('fn:insert-before($x, 1, "z")', ['z', 'a', 'b', 'c'], context)
        self.check_value('fn:insert-before($x, 2, "z")', ['a', 'z', 'b', 'c'], context)
        self.check_value('fn:insert-before($x, 3, "z")', ['a', 'b', 'z', 'c'], context)
        self.check_value('fn:insert-before($x, 4, "z")', ['a', 'b', 'c', 'z'], context)
        self.wrong_type('fn:insert-before($x, "1", "z")', 'XPTY0004', context=context)

    def test_remove_function(self):
        context = XPathContext(root=self.etree.XML(''), variables={'x': ['a', 'b', 'c']})
        self.check_value('fn:remove($x, 0)', ['a', 'b', 'c'], context)
        self.check_value('fn:remove($x, 1)', ['b', 'c'], context)
        self.check_value('remove($x, 6)', ['a', 'b', 'c'], context)
        self.wrong_type('remove($x, "6")', 'XPTY0004', context=context)
        self.check_value('fn:remove((), 3)', [])

    def test_reverse_function(self):
        context = XPathContext(root=self.etree.XML(''), variables={'x': ['a', 'b', 'c']})
        self.check_value('reverse($x)', ['c', 'b', 'a'], context)
        self.check_value('fn:reverse(("hello"))', ['hello'], context)
        self.check_value('fn:reverse(())', [])

    def test_subsequence_function(self):
        self.check_value('fn:subsequence((), 5)', [])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 1)', [1, 2, 3, 4, 5, 6, 7])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 0)', [1, 2, 3, 4, 5, 6, 7])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), -1)', [1, 2, 3, 4, 5, 6, 7])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 10)', [])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 4)', [4, 5, 6, 7])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 4, 2)', [4, 5])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 3, 10)', [3, 4, 5, 6, 7])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), xs:float("INF"))', [])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), xs:float("-INF"))',
                         [1, 2, 3, 4, 5, 6, 7])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 5, xs:float("-INF"))', [])
        self.check_value('fn:subsequence((1, 2, 3, 4, 5, 6, 7), 5, xs:float("INF"))', [5, 6, 7])

    def test_unordered_function(self):
        self.check_value('fn:unordered(())', [])
        self.check_value('fn:unordered(("z", 2, "3", "Z", "b", "a"))',
                         [2, '3', 'Z', 'a', 'b', 'z'])

    def test_sequence_cardinality_functions(self):
        self.check_value('fn:zero-or-one(())', [])
        self.check_value('fn:zero-or-one((10))', [10])
        self.wrong_value('fn:zero-or-one((10, 20))')

        self.wrong_value('fn:one-or-more(())')
        self.check_value('fn:one-or-more((10))', [10])
        self.check_value('fn:one-or-more((10, 20, 30, 40))', [10, 20, 30, 40])

        self.check_value('fn:exactly-one((20))', [20])
        self.wrong_value('fn:exactly-one(())')
        self.wrong_value('fn:exactly-one((10, 20, 30, 40))')

    def test_qname_function(self):
        self.check_value('fn:string(fn:QName("", "person"))', 'person')
        self.check_value('fn:string(fn:QName((), "person"))', 'person')
        self.check_value('fn:string(fn:QName("http://www.example.com/ns/", "person"))', 'person')
        self.check_value('fn:string(fn:QName("http://www.example.com/ns/", "ht:person"))',
                         'ht:person')
        self.check_value('fn:string(fn:QName("http://www.example.com/ns/", "xs:person"))',
                         'xs:person')
        self.wrong_value('fn:QName("http://www.example.com/ns/", "@person")')

        self.wrong_type('fn:QName(1.0, "person")', 'XPTY0004', '1st argument has an invalid type')
        self.wrong_type('fn:QName("", 2)', 'XPTY0004', '2nd argument has an invalid type')
        self.wrong_value('fn:QName("", "3")', 'FOCA0002', 'invalid value')
        self.wrong_value('fn:QName("", "xs:int")', 'FOCA0002',
                         'cannot associate a non-empty prefix with no namespace')

        self.wrong_type('fn:QName("http://www.example.com/ns/")',
                        'XPST0017', '2nd argument missing')
        self.wrong_type('fn:QName("http://www.example.com/ns/", "person"',
                        'XPST0017', 'Wrong number of arguments')

        if xmlschema is not None:
            schema = xmlschema.XMLSchema(""" """)

            with self.schema_bound_parser(schema.xpath_proxy):
                context = self.parser.schema.get_context()
                self.check_value('fn:QName("http://www.example.com/ns/", "@person")',
                                 expected=ValueError, context=context)

    def test_prefix_from_qname_function(self):
        self.check_value(
            'fn:prefix-from-QName(fn:QName("http://www.example.com/ns/", "ht:person"))', 'ht'
        )
        self.check_value(
            'fn:prefix-from-QName(fn:QName("http://www.example.com/ns/", "person"))', []
        )
        self.check_value('fn:prefix-from-QName(())', [])
        self.check_value('fn:prefix-from-QName(7)', TypeError)
        self.check_value('fn:prefix-from-QName("7")', TypeError)

    def test_local_name_from_qname_function(self):
        self.check_value(
            'fn:local-name-from-QName(fn:QName("http://www.example.com/ns/", "person"))', 'person'
        )
        self.check_value('fn:local-name-from-QName(())')
        self.check_value('fn:local-name-from-QName(8)', TypeError)
        self.check_value('fn:local-name-from-QName("8")', TypeError)

    def test_namespace_uri_from_qname_function(self):
        root = self.etree.XML('' ' ' ' ' '')
        self.check_value(
            'fn:namespace-uri-from-QName(fn:QName("http://www.example.com/ns/", "person"))',
            'http://www.example.com/ns/'
        )
        self.check_value('fn:namespace-uri-from-QName(())')
        self.check_value('fn:namespace-uri-from-QName(1)', TypeError)
        self.check_value('fn:namespace-uri-from-QName("1")', TypeError)
        self.check_selector("fn:namespace-uri-from-QName(xs:QName('p3:C3'))", root, KeyError)
        self.check_selector("fn:namespace-uri-from-QName(xs:QName('p3:C3'))", root, ValueError,
                            namespaces={'p3': ''})

    def test_resolve_qname_function(self):
        root = self.etree.XML('' ' ' ' ' '')
        context = XPathContext(root=root, namespaces=self.namespaces)

        self.check_value("fn:resolve-QName((), .)", context=context)

        if self.etree is lxml_etree:
            self.check_value("fn:string(fn:resolve-QName('eg:C2', .))", KeyError, context=context)
            self.check_selector("fn:resolve-QName('p3:C3', .)", root, KeyError,
                                namespaces={'p3': ''})
        else:
            self.check_value("fn:string(fn:resolve-QName('eg:C2', .))", 'eg:C2', context=context)
            self.check_selector("fn:resolve-QName('p3:C3', .)", root, ValueError,
                                namespaces={'p3': ''})

        self.check_raise("fn:resolve-QName('p3:C3', .)", KeyError, 'FONS0004',
                         "no namespace found for prefix 'p3'", context=context)

        self.check_value("fn:resolve-QName('C3', .)", QName('', 'C3'), context=context)
        self.check_value("fn:resolve-QName(2, .)", TypeError, context=context)
        self.check_value("fn:resolve-QName('2', .)", ValueError, context=context)
        self.check_value("fn:resolve-QName((), 4)", context=context)

        self.wrong_type("fn:resolve-QName('p3:C3', 4)", 'FORG0006',
                        '2nd argument 4 is not an element node', context=context)

        root = self.etree.XML('')
        self.check_selector("fn:resolve-QName('C3', .)", root, [QName('', 'C3')],
                            namespaces={'': ''})

        self.check_selector("fn:resolve-QName('xml:lang', .)", root,
                            [QName(XML_NAMESPACE, 'lang')])

    def test_namespace_uri_for_prefix_function(self):
        root = self.etree.XML('' ' ' ' ' '')
        context = XPathContext(root=root)

        self.check_value("fn:namespace-uri-for-prefix('p1', .)", context=context)
        self.check_value("fn:namespace-uri-for-prefix(4, .)", TypeError, context=context)
        self.check_value("fn:namespace-uri-for-prefix('p1', 9)", TypeError, context=context)
        self.check_value("fn:namespace-uri-for-prefix('eg', .)",
                         'http://www.example.com/ns/', context=context)
        self.check_selector("fn:namespace-uri-for-prefix('p3', .)",
                            root, NameError, namespaces={'p3': ''})

        # Note: default namespace for XPath 2 tests is 'http://www.example.com/ns/'
        self.check_value("fn:namespace-uri-for-prefix('', .)", context=context)
        self.check_value(
            'fn:namespace-uri-from-QName(fn:QName("http://www.example.com/ns/", "person"))',
            'http://www.example.com/ns/'
        )
        self.check_value("fn:namespace-uri-for-prefix('', .)", context=context)
        self.check_value("fn:namespace-uri-for-prefix((), .)", context=context)

    def test_in_scope_prefixes_function(self):
        root = self.etree.XML('' ' ' ' ' '')
        namespaces = {'p0': 'ns0', 'p2': 'ns2'}
        prefixes = select(root, "fn:in-scope-prefixes(.)", namespaces, parser=type(self.parser))
        if self.etree is lxml_etree:
            self.assertIn('p0', prefixes)
            self.assertIn('p1', prefixes)
            self.assertNotIn('p2', prefixes)
        else:
            self.assertIn('p0', prefixes)
            self.assertNotIn('p1', prefixes)
            self.assertIn('p2', prefixes)

        # Provides namespaces through the dynamic context
        selector = Selector("fn:in-scope-prefixes(.)", parser=type(self.parser))
        prefixes = selector.select(root, namespaces=namespaces)

        self.assertIn('p0', prefixes)
        self.assertNotIn('p1', prefixes)
        self.assertIn('p2', prefixes)

        with self.assertRaises(TypeError):
            select(root, "fn:in-scope-prefixes('')", namespaces, parser=type(self.parser))

        root = self.etree.XML(''.format(XML_NAMESPACE))
        namespaces = {'tns': 'ns1', 'xml': XML_NAMESPACE}
        prefixes = select(root, "fn:in-scope-prefixes(.)", namespaces, parser=type(self.parser))
        if self.etree is lxml_etree:
            self.assertIn('tns', prefixes)
            self.assertIn('xml', prefixes)
            self.assertNotIn('fn', prefixes)
        else:
            self.assertIn('tns', prefixes)
            self.assertIn('xml', prefixes)
            self.assertIn('fn', prefixes)

        if xmlschema is not None:
            schema = xmlschema.XMLSchema(""" """)

            with self.schema_bound_parser(schema.xpath_proxy):
                context = self.parser.schema.get_context()
                prefixes = {'xml', 'xs', 'fn', 'err', 'xsi', 'eg', 'tst'}
                if self.parser.version >= '3.0':
                    prefixes.add('math')
                if self.parser.version >= '3.1':
                    prefixes.add('map')
                    prefixes.add('array')

                self.check_value("fn:in-scope-prefixes(.)", prefixes, context)

    def test_datetime_function(self):
        tz = Timezone(datetime.timedelta(hours=5, minutes=24))

        self.check_value('fn:dateTime((), xs:time("24:00:00"))', [])
        self.check_value('fn:dateTime(xs:date("1999-12-31"), ())', [])
        self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))',
                         datetime.datetime(1999, 12, 31, 12, 0))
        self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("24:00:00"))',
                         datetime.datetime(1999, 12, 31, 0, 0))
self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("13:00:00+05:24"))', datetime.datetime(1999, 12, 31, 13, 0, tzinfo=tz)) self.wrong_value('fn:dateTime(xs:date("1999-12-31+03:00"), xs:time("13:00:00+05:24"))', 'FORG0008', 'inconsistent timezones') self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', DateTime10) with self.assertRaises(AssertionError): self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', DateTime) self.parser._xsd_version = '1.1' try: self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', DateTime(1999, 12, 31, 12)) self.check_value('fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00"))', DateTime) finally: self.parser._xsd_version = '1.0' def test_year_from_datetime_function(self): self.check_value('fn:year-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 1999) self.check_value('fn:year-from-dateTime(xs:dateTime("1999-05-31T21:30:00-05:00"))', 1999) self.check_value('fn:year-from-dateTime(xs:dateTime("1999-12-31T19:20:00"))', 1999) self.check_value('fn:year-from-dateTime(xs:dateTime("1999-12-31T24:00:00"))', 2000) self.check_value('fn:year-from-dateTime(())') def test_month_from_datetime_function(self): self.check_value('fn:month-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 5) self.check_value('fn:month-from-dateTime(xs:dateTime("1999-12-31T19:20:00-05:00"))', 12) self.check_value('fn:month-from-dateTime(fn:adjust-dateTime-to-timezone(xs:dateTime(' '"1999-12-31T19:20:00-05:00"), xs:dayTimeDuration("PT0S")))', 1) def test_day_from_datetime_function(self): self.check_value('fn:day-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 31) self.check_value('fn:day-from-dateTime(xs:dateTime("1999-12-31T20:00:00-05:00"))', 31) self.check_value('fn:day-from-dateTime(fn:adjust-dateTime-to-timezone(xs:dateTime(' '"1999-12-31T19:20:00-05:00"), xs:dayTimeDuration("PT0S")))', 1) def test_hours_from_datetime_function(self): 
self.check_value('fn:hours-from-dateTime(xs:dateTime("1999-05-31T08:20:00-05:00")) ', 8) self.check_value('fn:hours-from-dateTime(xs:dateTime("1999-12-31T21:20:00-05:00"))', 21) self.check_value('fn:hours-from-dateTime(fn:adjust-dateTime-to-timezone(xs:dateTime(' '"1999-12-31T21:20:00-05:00"), xs:dayTimeDuration("PT0S")))', 2) self.check_value('fn:hours-from-dateTime(xs:dateTime("1999-12-31T12:00:00")) ', 12) self.check_value('fn:hours-from-dateTime(xs:dateTime("1999-12-31T24:00:00"))', 0) def test_minutes_from_datetime_function(self): self.check_value('fn:minutes-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 20) self.check_value('fn:minutes-from-dateTime(xs:dateTime("1999-05-31T13:30:00+05:30"))', 30) def test_seconds_from_datetime_function(self): self.check_value('fn:seconds-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', 0) self.check_value('seconds-from-dateTime(xs:dateTime("2001-02-03T08:23:12.43"))', Decimal('12.43')) def test_timezone_from_datetime_function(self): self.check_value('fn:timezone-from-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00"))', DayTimeDuration(seconds=-18000)) self.check_value('fn:timezone-from-dateTime(())') def test_year_from_date_function(self): self.check_value('fn:year-from-date(xs:date("1999-05-31"))', 1999) self.check_value('fn:year-from-date(xs:date("2000-01-01+05:00"))', 2000) self.check_value('year-from-date(())') def test_month_from_date_function(self): self.check_value('fn:month-from-date(xs:date("1999-05-31-05:00"))', 5) self.check_value('fn:month-from-date(xs:date("2000-01-01+05:00"))', 1) def test_day_from_date_function(self): self.check_value('fn:day-from-date(xs:date("1999-05-31-05:00"))', 31) self.check_value('fn:day-from-date(xs:date("2000-01-01+05:00"))', 1) def test_timezone_from_date_function(self): self.check_value('fn:timezone-from-date(xs:date("1999-05-31-05:00"))', DayTimeDuration.fromstring('-PT5H')) self.check_value('fn:timezone-from-date(xs:date("2000-06-12Z"))', 
DayTimeDuration.fromstring('PT0H')) self.check_value('fn:timezone-from-date(xs:date("2000-06-12"))') def test_hours_from_time_function(self): self.check_value('fn:hours-from-time(xs:time("11:23:00"))', 11) self.check_value('fn:hours-from-time(xs:time("21:23:00"))', 21) self.check_value('fn:hours-from-time(xs:time("01:23:00+05:00"))', 1) self.check_value('fn:hours-from-time(fn:adjust-time-to-timezone(xs:time("01:23:00+05:00"), ' 'xs:dayTimeDuration("PT0S")))', 20) self.check_value('fn:hours-from-time(xs:time("24:00:00"))', 0) def test_minutes_from_time_function(self): self.check_value('fn:minutes-from-time(xs:time("13:00:00Z"))', 0) self.check_value('fn:minutes-from-time(xs:time("09:45:10"))', 45) def test_seconds_from_time_function(self): self.check_value('fn:seconds-from-time(xs:time("13:20:10.5"))', 10.5) self.check_value('fn:seconds-from-time(xs:time("20:50:10.0"))', 10.0) self.check_value('fn:seconds-from-time(xs:time("03:59:59.000001"))', Decimal('59.000001')) def test_timezone_from_time_function(self): self.check_value('fn:timezone-from-time(xs:time("13:20:00-05:00"))', DayTimeDuration.fromstring('-PT5H')) self.check_value('timezone-from-time(())') def test_years_from_duration_function(self): self.check_value('fn:years-from-duration(())') self.check_value('fn:years-from-duration(xs:yearMonthDuration("P20Y15M"))', 21) self.check_value('fn:years-from-duration(xs:yearMonthDuration("-P15M"))', -1) self.check_value('fn:years-from-duration(xs:dayTimeDuration("-P2DT15H"))', 0) def test_months_from_duration_function(self): self.check_value('fn:months-from-duration(())') self.check_value('fn:months-from-duration(xs:yearMonthDuration("P20Y15M"))', 3) self.check_value('fn:months-from-duration(xs:yearMonthDuration("-P20Y18M"))', -6) self.check_value('fn:months-from-duration(xs:dayTimeDuration("-P2DT15H0M0S"))', 0) def test_days_from_duration_function(self): self.check_value('fn:days-from-duration(())') 
        self.check_value('fn:days-from-duration(xs:dayTimeDuration("P3DT10H"))', 3)
        self.check_value('fn:days-from-duration(xs:dayTimeDuration("P3DT55H"))', 5)
        self.check_value('fn:days-from-duration(xs:yearMonthDuration("P3Y5M"))', 0)

    def test_hours_from_duration_function(self):
        self.check_value('fn:hours-from-duration(())')
        self.check_value('fn:hours-from-duration(xs:dayTimeDuration("P3DT10H"))', 10)
        self.check_value('fn:hours-from-duration(xs:dayTimeDuration("P3DT12H32M12S"))', 12)
        self.check_value('fn:hours-from-duration(xs:dayTimeDuration("PT123H"))', 3)
        self.check_value('fn:hours-from-duration(xs:dayTimeDuration("-P3DT10H"))', -10)

    def test_minutes_from_duration_function(self):
        self.check_value('fn:minutes-from-duration(())')
        self.check_value('fn:minutes-from-duration(xs:dayTimeDuration("P3DT10H"))', 0)
        self.check_value('fn:minutes-from-duration(xs:dayTimeDuration("-P5DT12H30M"))', -30)

    def test_seconds_from_duration_function(self):
        self.check_value('fn:seconds-from-duration(())')
        self.check_value('fn:seconds-from-duration(xs:dayTimeDuration("P3DT10H12.5S"))', 12.5)
        self.check_value('fn:seconds-from-duration(xs:dayTimeDuration("-PT256S"))', -16.0)

    def test_node_accessor_functions(self):
        root = self.etree.XML(''
                              'simple text' % XSI_NAMESPACE)
        self.check_selector("node-name(.)", root, QName('', 'A'))
        self.check_selector("node-name(/A/B1)", root, QName('', 'B1'))
        self.check_selector("node-name(/A/*)", root, TypeError)  # More than one item is not allowed!
self.check_selector("nilled(./B1/C1)", root, False) self.check_selector("nilled(./B1/C2)", root, True) self.check_raise("nilled(.)", MissingContextError) context = XPathContext(root) self.check_value('nilled(())', context=context) self.wrong_type('nilled(8)', 'XPTY0004', 'an XPath node required', context=context) self.check_value('node-name(())', context=context) self.wrong_type('node-name(8)', 'XPTY0004', 'an XPath node required', context=context) self.check_value('node-name(.)', context=XPathContext(self.etree.ElementTree(root))) root = self.etree.XML('') self.check_value('node-name(.)', QName('http://xpath.test/ns', 'root'), context=XPathContext(root)) self.check_value('node-name(./@tst:a)', QName('http://xpath.test/ns', 'a'), context=XPathContext(root)) root = self.etree.XML('') self.check_value('node-name(./@a)', QName('', 'a'), context=XPathContext(root)) root = self.etree.XML('') self.check_raise('node-name(.)', KeyError, 'FONS0004', 'no prefix found for namespace http://xpath.test/ns0', context=XPathContext(root)) def test_string_and_data_functions(self): root = self.etree.XML(' a text, an inner text, a tail, ' 'an ending text ') self.check_selector("/*/string()", root, [' a text, an inner text, a tail, an ending text ']) self.check_selector("string(.)", root, ' a text, an inner text, a tail, an ending text ') self.check_selector("data(.)", root, ' a text, an inner text, a tail, an ending text ') self.check_selector("data(.)", root, UntypedAtomic) self.check_selector("data(())", root, []) self.check_value("string()", MissingContextError) context = XPathContext(root=self.etree.XML('')) parser = XPath2Parser(base_uri='http://www.example.com/ns/') self.assertEqual(parser.parse('data(fn:resolve-uri(()))').evaluate(context), []) @unittest.skipIf(xmlschema is None, "The xmlschema library is not installed") def test_data_function_with_typed_nodes(self): schema = xmlschema.XMLSchema(dedent("""\ """)) self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema) try: 
root = self.etree.XML('') self.wrong_value("data(/root)", 'FOTY0012', 'argument node', 'does not have a typed value', context=XPathContext(root, schema=self.parser.schema)) self.wrong_value("data(.)", 'FOTY0012', 'argument node', 'does not have a typed value', context=XPathContext(root, schema=self.parser.schema)) finally: self.parser.schema = None def test_node_set_id_function(self): root = self.etree.XML('') self.check_selector('element-with-id("foo")', root, [root[0]]) self.check_selector('id("foo")', root, [root[0]]) doc = self.etree.ElementTree(root) root = doc.getroot() self.check_selector('id("foo")', doc, [root[0]]) self.check_selector('id("fox")', doc, []) self.check_selector('id("foo baz")', doc, [root[0], root[3]]) self.check_selector('id(("foo", "baz"))', doc, [root[0], root[3]]) self.check_selector('id(("foo", "baz bar"))', doc, [root[0], root[2], root[3]]) self.check_selector('id("baz bar foo")', doc, [root[0], root[2], root[3]]) # From XPath documentation doc = self.etree.parse(io.StringIO(""" E21256 John Brown """)) root = doc.getroot() self.check_selector("id('ID21256')", doc, [root]) self.check_selector("id('E21256')", doc, [root[0]]) self.check_selector('element-with-id("ID21256")', doc, [root]) self.check_selector('element-with-id("E21256")', doc, [root]) with self.assertRaises(MissingContextError) as err: self.check_value("id('ID21256')") self.assertIn('XPDY0002', str(err.exception)) context = XPathContext(doc, variables={'x': 11}) with self.assertRaises(TypeError) as err: self.check_value("id('ID21256', $x)", context=context) self.assertIn('XPTY0004', str(err.exception)) context = XPathContext(doc, item=11, variables={'x': 11}) with self.assertRaises(TypeError) as err: self.check_value("id('ID21256', $x)", context=context) self.assertIn('XPTY0004', str(err.exception)) context = XPathContext(doc, item=root, variables={'x': root}) self.check_value("id('ID21256', $x)", [context.root.getroot()], context=context) # Id on root element root = 
self.etree.XML("E21256") self.check_selector("id('E21256')", root, [root]) self.check_selector('element-with-id("E21256")', root, []) @unittest.skipIf(xmlschema is None, "xmlschema library is not installed ...") def test_node_set_id_function_with_schema(self): root = self.etree.XML(dedent("""\ E21256 John Brown """)) doc = self.etree.ElementTree(root) # Test with matching value of type xs:ID schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertTrue(schema.is_valid(root)) with self.schema_bound_parser(schema.xpath_proxy): context = XPathContext(doc) self.check_select("id('ID21256')", [context.root.getroot()], context) # self.check_select("id('E21256')", [root[0]], context) # Test with matching value of type xs:string schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertTrue(schema.is_valid(root)) with self.schema_bound_parser(schema.xpath_proxy): context = XPathContext(doc) self.check_select("id('E21256')", [], context) @unittest.skipIf(xmlschema is None, "xmlschema library is not installed ...") def test_node_set_id_function_with_wrong_schema(self): root = self.etree.XML(dedent("""\ E21256 John Brown """)) doc = self.etree.ElementTree(root) schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertFalse(schema.is_valid(root)) with self.schema_bound_parser(schema.xpath_proxy): context = XPathContext(doc) self.check_select("id('ID21256')", [context.root.getroot()], context) self.check_select("id('E21256')", [], context) schema = xmlschema.XMLSchema(dedent("""\ """)) self.assertFalse(schema.is_valid(root)) with self.schema_bound_parser(schema.xpath_proxy): context = XPathContext(doc) self.check_select("id('ID21256')", [context.root.getroot()], context) self.check_select("id('E21256')", [], context) def test_node_set_idref_function(self): doc = self.etree.parse(io.StringIO(""" E21256 John Brown E21257 John Doe """)) root = doc.getroot() self.check_value("idref('ID21256')", MissingContextError) self.check_selector("idref('ID21256')", doc, []) 
self.check_selector("idref('E21256')", doc, [root[0][0]]) self.check_selector("idref('ID21256')", root, []) context = XPathContext(doc, variables={'x': 11}) self.wrong_type("idref('ID21256', $x)", 'XPTY0004', context=context) context = XPathContext(doc, item=root, variables={'x': root}) self.check_value("idref('ID21256', $x)", [], context=context) context = XPathContext(doc, item=root) context.variables = { 'x': TextAttributeNode(XML_ID, 'ID21256', parent=context.root[0]) } self.check_value("idref('ID21256', $x)", [], context=context) context = XPathContext(root, variables={'x': None}) self.wrong_type("idref('ID21256', $x)", 'XPTY0004', context=context) context = XPathContext(root, variables={'x': []}) self.wrong_type("idref('ID21256', $x)", 'XPTY0004', context=context) context = XPathContext(root) self.wrong_type("idref('ID21256', ())", 'XPTY0004', context=context) def test_deep_equal_function(self): root = self.etree.XML(""" """) context = XPathContext(root, variables={'xt': root}) self.check_value('fn:deep-equal($xt, $xt)', True, context=context) self.check_value('deep-equal($xt, $xt/*)', False, context=context) self.check_value('deep-equal($xt/name[1], $xt/name[2])', False, context=context) self.check_value('deep-equal($xt/name[1], $xt/name[3])', True, context=context) self.check_value('deep-equal($xt/name[1], $xt/name[3]/@last)', False, context=context) self.check_value('deep-equal($xt/name[1]/@last, $xt/name[3]/@last)', True, context=context) self.check_value('deep-equal($xt/name[1]/@last, $xt/name[2]/@last)', False, context=context) self.check_value('deep-equal($xt/name[1], "Peter Parker")', False, context=context) root = self.etree.XML("""""") context = XPathContext(root, variables={'xt': root}) self.check_value('deep-equal($xt, $xt)', True, context=context) self.check_value('deep-equal((1, 2, 3), (1, 2, 3))', True) self.check_value('deep-equal((1, 2, 3), (1, (), 3))', False) self.check_value('deep-equal((true(), 2, 3), (1, 2, 3))', False) 
self.check_value('deep-equal((true(), 2, 3), (true(), 2, 3))', True) self.check_value('deep-equal((1, 2, 3), (true(), 2, 3))', False) self.check_value('deep-equal((xs:untypedAtomic("1"), 2, 3), (1, 2, 3))', False) self.check_value('deep-equal((1, 2, 3), (xs:untypedAtomic("1"), 2, 3))', False) self.check_value( 'deep-equal((xs:untypedAtomic("1"), 2, 3), (xs:untypedAtomic("2"), 2, 3))', False ) self.check_value( 'deep-equal((xs:untypedAtomic("1"), 2, 3), (xs:untypedAtomic("1"), 2, 3))', True ) self.check_value('deep-equal((), (1, 2, 3))', False) self.check_value('deep-equal((1, 2, 3), (1, 2, 4))', False) self.check_value("deep-equal((1, 2, 3), (1, '2', 3))", False) self.check_value("deep-equal(('1', '2', '3'), ('1', '2', '3'))", True) self.check_value("deep-equal(('1', '2', '3'), ('1', '4', '3'))", False) self.check_value("deep-equal((1, 2, 3), (1, 2, 3), 'en_US.UTF-8')", True) self.check_value('fn:deep-equal(xs:float("NaN"), xs:double("NaN"))', True) self.check_value('fn:deep-equal(xs:float("NaN"), 1.0)', False) self.check_value('fn:deep-equal(1.0, xs:double("NaN"))', False) self.check_value('deep-equal((1.1E0, 2E0, 3), (1.1, 2.0, 3))', True) self.check_value('deep-equal((1.1E0, 2E0, 3), (1.1, 2.1, 3))', False) self.check_value('deep-equal((1E0, 2E0, 3), (1, 2, 3))', True) self.check_value('deep-equal((1E0, 2E0, 3), (1, 4, 3))', False) self.check_value('deep-equal((1.1, 2.0, 3), (1.1E0, 2E0, 3))', True) self.check_value('deep-equal((1.1, 2.1, 3), (1.1E0, 2E0, 3))', False) self.check_value('deep-equal((1, 2, 3), (1E0, 2E0, 3))', True) self.check_value('deep-equal((1, 4, 3), (1E0, 2E0, 3))', False) self.check_value('deep-equal(3.1, xs:anyURI("http://xpath.test")) ', False) context = XPathContext(root) context.variables = {'a': [TextNode('alpha')], 'b': [TextNode('beta')]} self.check_value('deep-equal($a, $a)', True, context=context) self.check_value('deep-equal($a, $b)', False, context=context) context = XPathContext(root) context.variables = {'a': 
[TextAttributeNode('a', '10')], 'b': [TextAttributeNode('b', '10')]} self.check_value('deep-equal($a, $a)', True, context=context) self.check_value('deep-equal($a, $b)', False, context=context) context.variables = {'a': [NamespaceNode('tns0', 'http://xpath.test/ns')], 'b': [NamespaceNode('tns1', 'http://xpath.test/ns')]} self.check_value('deep-equal($a, $a)', True, context=context) self.check_value('deep-equal($a, $b)', False, context=context) def test_deep_equal_function_on_nested_sequences(self): self.check_value('fn:deep-equal(1, 1)', True) self.check_value('fn:deep-equal(1, (1))', True) self.check_value('fn:deep-equal(1, (1, ()))', True) self.check_value('fn:deep-equal(1, (1, (1)))', False) self.check_value('fn:deep-equal((1, ()), (1, (1)))', False) self.check_value('fn:deep-equal(((1), 1), (1, (1)))', True) def test_adjust_datetime_to_timezone_function(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('-05:00'), variables={'tz': DayTimeDuration.fromstring("-PT10H")}) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00-07:00"))', DateTime.fromstring('2002-03-07T12:00:00-05:00'), context) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00"))', DateTime.fromstring('2002-03-07T10:00:00')) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00"))', DateTime.fromstring('2002-03-07T10:00:00-05:00'), context) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00"), $tz)', DateTime.fromstring('2002-03-07T10:00:00-10:00'), context) self.check_value( 'fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00-07:00"), $tz)', DateTime.fromstring('2002-03-07T07:00:00-10:00'), context ) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00-07:00"), ' 'xs:dayTimeDuration("PT10H"))', DateTime.fromstring('2002-03-08T03:00:00+10:00'), context) 
self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T00:00:00+01:00"), ' 'xs:dayTimeDuration("-PT8H"))', DateTime.fromstring('2002-03-06T15:00:00-08:00'), context) self.check_value('fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00"), ())', DateTime.fromstring('2002-03-07T10:00:00'), context) self.check_value( 'fn:adjust-dateTime-to-timezone(xs:dateTime("2002-03-07T10:00:00-07:00"), ())', DateTime.fromstring('2002-03-07T10:00:00'), context ) self.check_value('fn:adjust-dateTime-to-timezone((), ())') def test_adjust_date_to_timezone_function(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('-05:00'), variables={'tz': DayTimeDuration.fromstring("-PT10H")}) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07"))', Date.fromstring('2002-03-07-05:00'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07-07:00"))', Date.fromstring('2002-03-07-05:00'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07"), $tz)', Date.fromstring('2002-03-07-10:00'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07"), ())', Date.fromstring('2002-03-07'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07-07:00"), ())', Date.fromstring('2002-03-07'), context) self.check_value('fn:adjust-date-to-timezone(xs:date("2002-03-07-07:00"), $tz)', Date.fromstring('2002-03-06-10:00'), context) self.check_value('fn:adjust-date-to-timezone((), ())') self.check_value('adjust-date-to-timezone(xs:date("-25252734927766555-06-07+02:00"), ' 'xs:dayTimeDuration("PT0S"))', OverflowError) def test_adjust_time_to_timezone_function(self): context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('-05:00'), variables={'tz': DayTimeDuration.fromstring("-PT10H")}) self.check_value('fn:adjust-time-to-timezone(())') self.check_value('fn:adjust-time-to-timezone((), ())') 
self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00"))', Time.fromstring('10:00:00-05:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00-07:00"))', Time.fromstring('12:00:00-05:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00"), $tz)', Time.fromstring('10:00:00-10:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00-07:00"), $tz)', Time.fromstring('07:00:00-10:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00"), ())', Time.fromstring('10:00:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00-07:00"), ())', Time.fromstring('10:00:00'), context) self.check_value('fn:adjust-time-to-timezone(xs:time("10:00:00-07:00"), ' 'xs:dayTimeDuration("PT10H"))', Time.fromstring('03:00:00+10:00'), context) def test_default_collation_function(self): default_collation = self.parser.default_collation self.check_value('fn:default-collation()', default_collation) def test_context_datetime_functions(self): context = XPathContext(root=self.etree.XML('')) self.check_value('fn:current-dateTime()', context=context, expected=DateTime10.fromdatetime(context.current_dt)) self.check_value(path='fn:current-date()', context=context, expected=Date10.fromdatetime(context.current_dt.date())) self.check_value(path='fn:current-time()', context=context, expected=Time.fromdatetime(context.current_dt)) self.check_value(path='fn:implicit-timezone()', context=context, expected=DayTimeDuration(seconds=time.timezone)) context.timezone = Timezone.fromstring('-05:00') self.check_value(path='fn:implicit-timezone()', context=context, expected=DayTimeDuration.fromstring('-PT5H')) self.parser._xsd_version = '1.1' try: self.check_value('fn:current-dateTime()', context=context, expected=DateTime.fromdatetime(context.current_dt)) self.check_value(path='fn:current-date()', context=context, expected=Date.fromdatetime(context.current_dt.date())) finally: 
self.parser._xsd_version = '1.0' def test_static_base_uri_function(self): context = XPathContext(root=self.etree.XML('')) self.check_value('fn:static-base-uri()', context=context) parser = XPath2Parser(strict=True, base_uri='http://example.com/ns/') self.assertEqual(parser.parse('fn:static-base-uri()').evaluate(context), 'http://example.com/ns/') def test_base_uri_function(self): context = XPathContext(root=self.etree.XML('')) with self.assertRaises(MissingContextError) as err: self.check_value('fn:base-uri(())') self.assertIn('XPDY0002', str(err.exception)) self.assertIn('context item is undefined', str(err.exception)) self.check_value('fn:base-uri(9)', MissingContextError) self.check_value('fn:base-uri(9)', TypeError, context=context) self.check_value('fn:base-uri()', datatypes.AnyURI(''), context=context) self.check_value('fn:base-uri(())', context=context) context = XPathContext(root=self.etree.XML('')) self.check_value('fn:base-uri()', '/base_path/', context=context) def test_document_uri_function(self): document = self.etree.parse(io.StringIO('')) context = XPathContext(root=document) self.check_value('fn:document-uri(())', context=context) self.check_value('fn:document-uri(.)', context=context) context = XPathContext(root=document.getroot(), item=document, documents={'/base_path/': document}) self.check_value('fn:document-uri(.)', context=context) context = XPathContext(root=document, documents={'/base_path/': document}) self.check_value('fn:document-uri(.)', '/base_path/', context=context) context = XPathContext(root=document, documents={ '/base_path/': self.etree.parse(io.StringIO('')), }) self.check_value('fn:document-uri(.)', context=context) document = self.etree.parse(io.StringIO('')) context = XPathContext(root=document) self.check_value('fn:document-uri(.)', context=context) # xml:base doesn't apply! 
        document_node = get_node_tree(document, uri='/foo/bar.xml')
        context = XPathContext(root=document_node)
        self.check_value('fn:document-uri(.)', '/foo/bar.xml', context=context)

        # Relative URIs don't apply to fn:document-uri
        document_node = get_node_tree(document, uri='foo/bar.xml')
        context = XPathContext(root=document_node)
        self.check_value('fn:document-uri(.)', None, context=context)

    def test_doc_functions(self):
        root = self.etree.XML("")
        doc = self.etree.parse(io.StringIO(""))
        context = XPathContext(root, documents={'tns0': doc})

        self.check_value("fn:doc(())", context=context)
        self.check_value("fn:doc-available(())", False, context=context)
        self.wrong_value('fn:doc-available(xs:untypedAtomic("2"))', 'FODC0002', context=context)
        self.wrong_type('fn:doc-available(2)', 'XPTY0004', context=context)

        self.check_value("fn:doc('tns0')", context.documents['tns0'], context=context)
        self.check_value("fn:doc-available('tns0')", True, context=context)

        self.check_value("fn:doc('tns1')", ValueError, context=context)
        self.check_value("fn:doc-available('tns1')", False, context=context)

        self.parser.base_uri = "/path1"
        self.check_value("fn:doc('http://foo.test')", ValueError, context=context)
        self.check_value("fn:doc-available('http://foo.test')", False, context=context)
        self.parser.base_uri = None

        doc = self.etree.XML("")
        context = XPathContext(root, documents={'tns0': doc})

        self.wrong_type("fn:doc('tns0')", 'XPDY0050', context=context)
        self.wrong_type("fn:doc-available('tns0')", 'XPDY0050', context=context)

        context = XPathContext(root, documents={'file.xml': None})
        self.wrong_value("fn:doc('file.xml')", 'FODC0002', context=context)
        self.wrong_value("fn:doc('unknown')", 'FODC0002', context=context)
        self.check_value("fn:doc-available('unknown')", False, context=context)

        dirpath = os.path.dirname(__file__)
        self.wrong_value("fn:doc('{}')".format(dirpath), 'FODC0005', context=context)

    def test_collection_function(self):
        root = self.etree.XML("")
        doc1 =
self.etree.parse(io.StringIO("")) doc2 = self.etree.parse(io.StringIO("")) context = XPathContext(root, collections={'tns0': [doc1, doc2]}) collection = context.collections['tns0'] self.check_value("fn:collection('tns0')", collection, context=context) self.parser.collection_types = {'tns0': 'node()*'} self.check_value("fn:collection('tns0')", collection, context=context) self.parser.collection_types = {'tns0': 'node()'} self.check_value("fn:collection('tns0')", TypeError, context=context) self.check_value("fn:collection()", ValueError, context=context) context.default_collection = context.collections['tns0'] self.check_value("fn:collection()", collection, context=context) self.parser.default_collection_type = 'node()' self.check_value("fn:collection()", TypeError, context=context) self.parser.default_collection_type = 'node()*' context = XPathContext(root) self.wrong_value("fn:collection('filepath')", 'FODC0002', context=context) self.wrong_value("fn:collection('dirpath/')", 'FODC0002', context=context) def test_root_function(self): root = self.etree.XML("") context = XPathContext(root) self.check_value("root()", context.root, context=context) context = XPathContext(root, item=root[2]) self.check_value("root()", context.root, context=context) with self.assertRaises(TypeError) as err: context = XPathContext(root, item=10) self.check_value("root()", context.root, context=context) self.assertIn('XPTY0004', str(err.exception)) with self.assertRaises(TypeError) as err: self.check_value("root(7)", root, context=XPathContext(root)) self.assertIn('XPTY0004', str(err.exception)) context = XPathContext(root, variables={'elem': root[1]}) self.check_value("fn:root(())", context=context) self.check_value("fn:root($elem)", context.root, context=context) doc = self.etree.XML("") context = XPathContext(root, variables={'elem': doc[1]}) self.check_value("fn:root($elem)", context=context) context = XPathContext(root, variables={'elem': doc[1]}, documents={}) 
self.check_value("fn:root($elem)", context=context) context = XPathContext(root, variables={'elem': doc[1]}, documents={'.': doc}) self.check_value("root($elem)", context.documents['.'], context=context) doc2 = self.etree.XML("") context = XPathContext(root, variables={'elem': doc2[1]}, documents={'.': doc}) self.check_value("root($elem)", context=context) context = XPathContext(root, variables={'elem': doc2[1]}, documents={'.': doc, 'doc2': doc2}) self.check_value("root($elem)", context.documents['doc2'], context=context) if xmlschema is not None: schema = xmlschema.XMLSchema(dedent("""\ """)) with self.schema_bound_parser(schema.xpath_proxy): context = self.parser.schema.get_context() self.check_value("fn:root()", None, context) def test_error_function(self): with self.assertRaises(ElementPathError) as err: self.check_value('fn:error()') self.assertIn('[err:FOER0000] Unidentified error', str(err.exception)) with self.assertRaises(ElementPathError) as err: self.check_value('fn:error("err:XPST0001")') self.assertIn("[err:XPTY0004]", str(err.exception)) with self.assertRaises(ElementPathError) as err: self.check_value( "fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'err:XPST0001'))" ) self.assertIn('[err:XPST0001] Parser not bound to a schema', str(err.exception)) with self.assertRaises(ElementPathError) as err: self.check_value( "fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'err:XPST0001'), " "'Missing schema')" ) self.assertIn('[err:XPST0001] Missing schema', str(err.exception)) def test_trace_function(self): self.check_value('trace((), "trace message")', []) self.check_value('trace("foo", "trace message")', ['foo']) @unittest.skipIf(lxml_etree is None, "The lxml library is not installed") class LxmlXPath2FunctionsTest(XPath2FunctionsTest): etree = lxml_etree if __name__ == '__main__': unittest.main() 
sissaschool-elementpath-d3688c7/tests/test_xpath2_parser.py
#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
#
# Note: Many tests are built using the examples of the XPath standards,
# published by W3C under the W3C Document License.
#
# References:
# http://www.w3.org/TR/1999/REC-xpath-19991116/
# http://www.w3.org/TR/2010/REC-xpath20-20101214/
# http://www.w3.org/TR/2010/REC-xpath-functions-20101214/
# https://www.w3.org/Consortium/Legal/2015/doc-license
# https://www.w3.org/TR/charmod-norm/
#
import unittest
import io
import locale
import os
from decimal import Decimal
from textwrap import dedent
import xml.etree.ElementTree as ET

from elementpath import XPath2Parser, XPathContext, XPathSchemaContext, \
    MissingContextError, ElementNode, select, iter_select, get_node_tree
from elementpath.datatypes import xsd10_atomic_types, xsd11_atomic_types, DateTime, \
    Date, Date10, Time, Timezone, DayTimeDuration, YearMonthDuration, UntypedAtomic, QName
from elementpath.namespaces import XPATH_FUNCTIONS_NAMESPACE
from elementpath.collations import get_locale_category
from elementpath.sequence_types import is_instance
from elementpath.xpath_tokens import ProxyToken, XPathFunction

try:
    from tests import test_xpath1_parser
except ImportError:
    import test_xpath1_parser

try:
    import lxml.etree as lxml_etree
except ImportError:
    lxml_etree = None

try:
    import xmlschema
    from xmlschema.xpath import XMLSchemaProxy
except ImportError:
    xmlschema = None
    XMLSchemaProxy = None
else:
    xmlschema.XMLSchema.meta_schema.build()


def get_sequence_type(value, xsd_version='1.0'):
    """
    Infers the sequence type from a value.
""" if value is None or value == []: return 'empty-sequence()' elif isinstance(value, list): if value[0] is not None and not isinstance(value[0], list): sequence_type = get_sequence_type(value[0], xsd_version) if all(get_sequence_type(x, xsd_version) == sequence_type for x in value[1:]): return '{}+'.format(sequence_type) else: return 'node()+' else: value_kind = getattr(value, 'kind', None) if value_kind is not None: return '{}()'.format(value_kind) elif isinstance(value, UntypedAtomic): return 'xs:untypedAtomic' if QName.is_valid(value) and ':' in str(value): return 'xs:QName' if xsd_version == '1.0': atomic_types = xsd10_atomic_types else: atomic_types = xsd11_atomic_types if atomic_types['dateTimeStamp'].is_valid(value): return 'xs:dateTimeStamp' for type_name in ['string', 'boolean', 'decimal', 'float', 'double', 'date', 'dateTime', 'gDay', 'gMonth', 'gMonthDay', 'anyURI', 'gYear', 'gYearMonth', 'time', 'duration', 'dayTimeDuration', 'yearMonthDuration', 'base64Binary', 'hexBinary']: if atomic_types[type_name].is_valid(value): return 'xs:%s' % type_name raise ValueError("Inconsistent sequence type for {!r}".format(value)) class XPath2ParserTest(test_xpath1_parser.XPath1ParserTest): def setUp(self): self.parser = XPath2Parser(namespaces=self.namespaces) # Make sure the tests are repeatable. 
        env_vars_to_tweak = 'LC_ALL', 'LANG'
        self.current_env_vars = {v: os.environ.get(v) for v in env_vars_to_tweak}
        for v in self.current_env_vars:
            os.environ[v] = 'en_US.UTF-8'

    def tearDown(self):
        if hasattr(self, 'current_env_vars'):
            for v in self.current_env_vars:
                if self.current_env_vars[v] is not None:
                    os.environ[v] = self.current_env_vars[v]
                else:
                    # The variable was unset before setUp(): drop the override
                    # so that it does not leak into the following tests.
                    os.environ.pop(v, None)

    @unittest.skipIf(xmlschema is None, "xmlschema library is not installed!")
    def test_is_instance_function_with_schema(self):
        schema = xmlschema.XMLSchema(""" """)

        self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema)
        try:
            self.assertFalse(is_instance(1.0, 'myInt', self.parser))
            self.assertTrue(is_instance(1, 'myInt', self.parser))
            with self.assertRaises(KeyError):
                is_instance(1.0, 'dType', self.parser)
        finally:
            self.parser.schema = None

    def test_variable_reference(self):
        root = self.etree.XML('')

        token = self.parser.parse('$var1')
        self.assertEqual(token.source, '$var1')
        self.assertEqual(
            repr(token), f"<{token.__class__.__name__} object at {hex(id(token))}>"
        )
        self.assertEqual(str(token), '$var1 variable reference')

        context = XPathContext(root=root, variables={'var1': root[0]})
        self.check_value('$var1', context.root[0], context=context)

        context = XPathContext(root=root, variables={'tns:var1': root[0]})
        self.check_raise('$tns:var1', NameError, 'XPST0081', context=context)

        # Test dynamic evaluation error
        parser = XPath2Parser(namespaces={'tns': 'http://xpath.test/ns'})
        token = parser.parse('$tns:var1')
        parser.namespaces.pop('tns')
        with self.assertRaises(NameError) as ctx:
            token.evaluate(context)
        self.assertIn('XPST0081', str(ctx.exception))

    def test_check_variables_method(self):
        self.parser.variable_types.update(
            (k, get_sequence_type(v)) for k, v in self.variables.items()
        )
        self.assertEqual(self.parser.variable_types,
                         {'values': 'xs:decimal+', 'myaddress': 'xs:string', 'word': 'xs:string'})
        self.assertIsNone(self.parser.check_variables(
            {'values': [1, 2, -1], 'myaddress': 'info@example.com', 'word': ''}
        ))

        with
self.assertRaises(NameError) as ctx: self.parser.check_variables({'values': 1}) self.assertIn("[err:XPST0008] missing variable", str(ctx.exception)) with self.assertRaises(TypeError) as ctx: self.parser.check_variables( {'values': 1.0, 'myaddress': 'info@example.com', 'word': ''} ) self.assertEqual("[err:XPDY0050] Unmatched sequence type for variable 'values'", str(ctx.exception)) with self.assertRaises(TypeError) as ctx: self.parser.check_variables( {'values': 1, 'myaddress': 'info@example.com', 'word': True} ) self.assertEqual("[err:XPDY0050] Unmatched sequence type for variable 'word'", str(ctx.exception)) self.parser.variable_types.clear() def test_xpath_tokenizer(self): super(XPath2ParserTest, self).test_xpath_tokenizer() self.check_tokenizer("(: this is a comment :)", ['(:', '', 'this', '', 'is', '', 'a', '', 'comment', '', ':)']) self.check_tokenizer("last (:", ['last', '', '(:']) def test_token_tree(self): super(XPath2ParserTest, self).test_token_tree() self.check_tree('(1 + 6, 2, 10 - 4)', '(, (, (+ (1) (6)) (2)) (- (10) (4)))') self.check_tree('/A/B2 union /A/B1', '(union (/ (/ (A)) (B2)) (/ (/ (A)) (B1)))') self.check_tree("//text/(preceding-sibling::text)[1]", '(/ (// (text)) ([ (preceding-sibling (text)) (1)))') def test_token_source(self): super(XPath2ParserTest, self).test_token_source() self.check_source("(5, 6) instance of xs:integer+") self.check_source("$myaddress treat as element(*, USAddress)") self.check_source("(10, 1 to 4)") self.check_source("if (true()) then /A/B1 else /A/B2") self.check_source("every $part in /parts/part satisfies $part/@discounted") self.check_source("some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4") self.check_source("-3.5 idiv -2") self.check_source("xs:float('1e0') eq 1e2", "xs:float('1e0') eq 100.0") self.check_source("sum(//price[../available = false()])") self.check_source("self::node()") self.check_source('child (: nasty (:nested :) axis comment :) ::B1', 'child::B1') self.check_source("() cast as 
xs:integer?") self.check_source("() treat as empty-sequence()") self.check_source("'NaN' castable as xs:double") self.check_source("(1, fn:round-half-to-even(()), 7)") def test_xpath_comments(self): self.wrong_syntax("(: this is a comment :)") self.check_value("(: this is a comment :) true()", True) self.check_value("(: comment 1 :)(: comment 2 :) true()", True) self.check_value("(: comment 1 :) true() (: comment 2 :)", True) self.wrong_syntax("(: this is a (: nested :) comment :)") self.check_value("(: this is a (: nested :) comment :) true()", True) self.check_tree('child (: nasty (:nested :) axis comment :) ::B1', '(child (B1))') self.check_tree('child (: nasty "(: but not nested :)" axis comment :) ::B1', '(child (B1))') self.check_value("5 (: before operator comment :) < 4", False) # Before infix operator self.check_value("5 < (: after operator comment :) 4", False) # After infix operator self.check_value("true (:# nasty function comment :) ()", True) self.check_tree(' (: initial comment :)/ (:2nd comment:)A/B1(: 3rd comment :)/ \n' 'C1 (: last comment :)\t', '(/ (/ (/ (A)) (B1)) (C1))') self.wrong_syntax("xs:(: invalid QName :)string") def test_comma_operator(self): self.check_value("1, 2", [1, 2]) self.check_value("(1, 2)", [1, 2]) self.check_value("(1, 2, ())", [1, 2]) self.check_value("(1, fn:round-half-to-even(()), 7)", [1, 7]) self.check_value("(-9, 28, 10)", [-9, 28, 10]) self.check_value("(1, 2)", [1, 2]) root = self.etree.XML('') self.check_selector("(7.0, /A, 'foo')", root, [7.0, root, 'foo']) self.check_selector("7.0, /A, 'foo'", root, [7.0, root, 'foo']) self.check_selector("/A, 7.0, 'foo'", self.etree.XML(''), [7.0, 'foo']) def test_range_expressions(self): # Some cases from https://www.w3.org/TR/xpath20/#construct_seq self.check_value("1 to 2", [1, 2]) self.check_value("1 to 10", list(range(1, 11))) self.check_value("(10, 1 to 4)", [10, 1, 2, 3, 4]) self.check_value("10 to 10", [10]) self.check_value("15 to 10", []) 
self.check_value("fn:reverse(10 to 15)", [15, 14, 13, 12, 11, 10]) self.wrong_syntax("1 to 10 to 20", 'XPST0003') root = self.etree.XML('') self.wrong_type("'1' to '10'", 'XPTY0004', context=XPathContext(root)) self.wrong_type("true() to 10", 'XPTY0004') def test_parenthesized_expressions(self): self.check_value("(1, 2, '10')", [1, 2, '10']) self.check_value("()", []) def test_if_expressions(self): root = self.etree.XML('') token = self.parser.parse("if (1) then 2 else 3") self.assertEqual(len(token), 3) self.assertEqual(token.source, 'if (1) then 2 else 3') self.check_value("if (1) then 2 else 3", 2) self.check_selector("if (true()) then /A/B1 else /A/B2", root, root[:1]) self.check_selector("if (false()) then /A/B1 else /A/B2", root, root[1:2]) token = self.parser.parse("if") self.assertEqual(token.symbol, '(name)') self.assertEqual(token.value, 'if') # Cases from XPath 2.0 examples root = self.etree.XML('') self.check_selector( 'if ($part/@discounted) then $part/wholesale else $part/retail', root, [root[0]], variables={'part': root}, variable_types={'part': 'element()'} ) root = self.etree.XML('' ' 25' ' 10' ' 15' '') self.check_selector( 'if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2', root, [root[2]], variables={'widget1': root[0], 'widget2': root[2]} ) def test_quantifier_expressions(self): # Cases from XPath 2.0 examples root = self.etree.XML('' ' ' ' ' ' ' '') self.check_selector("every $part in /parts/part satisfies $part/@discounted", root, True) self.check_selector("every $part in /parts/part satisfies $part/@available", root, False) root = self.etree.XML('' ' 1000400' ' 1200300' ' 1200200' '') self.check_selector("some $emp in /emps/employee satisfies " " ($emp/bonus > 0.25 * $emp/salary)", root, True) self.check_selector("every $emp in /emps/employee satisfies " " ($emp/bonus < 0.5 * $emp/salary)", root, True) context = XPathContext(root=self.etree.XML('')) self.check_value("some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x 
+ $y = 4", True, context) self.check_value("every $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 4", False, context) self.check_value("some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 7", True, context) self.check_value("some $x in (1, 2, 3), $y in (2, 3, 4) satisfies $x + $y = 8", False, context) self.check_value('some $x in (1, 2, "cat") satisfies $x * 2 = 4', True, context) self.check_value('every $x in (1, 2, "cat") satisfies $x * 2 = 4', False, context) token = self.parser.parse("some") self.assertEqual(token.symbol, '(name)') self.assertEqual(token.value, 'some') # From W3C XQuery/XPath tests context = XPathContext(root=self.etree.XML(''), variables={'result': [43, 44, 45]}) self.check_value('some $i in $result satisfies $i = 44', True, context) self.check_value('every $i in $result satisfies $i = 44', False, context) self.check_raise('some $foo in (1, $foo) satisfies 1', NameError, 'XPST0008') def test_for_expressions(self): # Cases from XPath 2.0 examples context = XPathContext(root=self.etree.XML('')) path = "for $i in (10, 20), $j in (1, 2) return ($i + $j)" self.check_value(path, [11, 12, 21, 22], context) self.check_source(path, path) root = self.etree.XML( """ TCP/IP Illustrated Stevens Addison-Wesley Advanced Programming in the Unix Environment Stevens Addison-Wesley Data on the Web Abiteboul Buneman Suciu """) # Test step-by-step, testing also other basic features. self.check_selector("book/author[1]", root, [root[0][1], root[1][1], root[2][1]]) self.check_selector("book/author[. = $a]", root, [root[0][1], root[1][1]], variables={'a': 'Stevens'}) self.check_tree("book/author[. = $a][1]", '(/ (book) ([ ([ (author) (= (.) ($ (a)))) (1)))') self.check_selector("book/author[. = $a][1]", root, [root[0][1], root[1][1]], variables={'a': 'Stevens'}) self.check_selector("book/author[. 
= 'Stevens'][2]", root, []) self.check_selector("for $a in fn:distinct-values(book/author) return $a", root, ['Stevens', 'Abiteboul', 'Buneman', 'Suciu']) self.check_selector("for $a in fn:distinct-values(book/author) return book/author[. = $a]", root, [root[0][1], root[1][1]] + root[2][1:4]) self.check_selector("for $a in fn:distinct-values(book/author) " "return book/author[. = $a][1]", root, [root[0][1], root[1][1]] + root[2][1:4]) self.check_selector( "for $a in fn:distinct-values(book/author) " "return (book/author[. = $a][1], book[author = $a]/title)", root, [root[0][1], root[1][1], root[0][0], root[1][0], root[2][1], root[2][0], root[2][2], root[2][0], root[2][3], root[2][0]] ) # From W3C XQuery/XPath tests context = XPathContext(root=self.etree.XML(''), variables={'result': [43, 44, 45]}) self.check_value('for $i in $result return $i + 10', [53, 54, 55], context) self.check_raise('for $foo in (1, $foo) return 1', NameError, 'XPST0008') def test_idiv_operator(self): self.check_value("5 idiv 2", 2) self.check_value("-3.5 idiv -2", 1) self.check_value("-3.5 idiv 2", -1) self.check_value('xs:float("-3.5") idiv xs:float("3")', -1) self.check_value("-3.5 idiv 0", ZeroDivisionError) self.check_value("xs:float('INF') idiv 2", OverflowError) self.wrong_value("-3.5 idiv ()", 'XPST0005') self.check_raise('xs:float("NaN") idiv 1', OverflowError, 'FOAR0002') self.wrong_type("5 idiv '2'", 'XPTY0004') def test_comparison_operators(self): super(XPath2ParserTest, self).test_comparison_operators() self.check_value("0.05 eq 0.05", True) self.check_value("19.03 ne 19.02999", True) self.check_value("-1.0 eq 1.0", False) self.check_value("1 le 2", True) self.check_value("1e0 eq 1e2", False) self.check_value("xs:float('1e0') eq 1e2", False) self.check_value("1.0 lt 1e2", True) self.check_value("1e2 lt 1000", True) self.check_value("3 le 2", False) self.check_value("5 ge 9", False) self.check_value("5 gt 3", True) self.check_value("5 lt 20.0", True) self.wrong_type("false() eq 1", 
'XPTY0004') self.wrong_type("0 eq false()", 'XPTY0004') self.check_value("2 * 2 eq 4", True) self.check_value("() * 7") self.check_value("() * ()") self.check_value('xs:string("http://xpath.test") eq xs:anyURI("http://xpath.test")', True) self.check_value("() le 4") self.check_value("4 gt ()") self.check_value("() eq ()") # Equality of empty sequences is also an empty sequence self.wrong_syntax('true() eq true() eq true()', 'XPST0003') # From W3C XQuery/XPath tests self.check_value('xs:duration("P31D") ne xs:yearMonthDuration("P1M")', True) self.wrong_type('QName("", "ncname") le QName("", "ncname")', 'XPTY0004') # From W3C XSD 1.1 tests context = XPathContext(root=self.etree.XML(''), variables={'value': Date(9999, 10, 10)}) self.check_value('$value lt current-date()', False, context=context) def test_comparison_in_expression(self): context = XPathContext(self.etree.XML('false')) self.check_value("(. = 'false') = (. = 'false')", True, context) self.check_value("(. = 'asdf') != (. = 'false')", True, context) def test_boolean_evaluation_in_selector(self): context = XPathContext(self.etree.XML(""" true 10.0 1 10.0 false 5.0 0 5.0 """)) self.check_value("sum(//price)", 30, context) self.check_value("sum(//price[../available = 'true'])", 10, context) self.check_value("sum(//price[../available = 'false'])", 5, context) self.check_value("sum(//price[../available = '1'])", 10, context) self.check_value("sum(//price[../available = '0'])", 5, context) self.check_value("sum(//price[../available = true()])", 20, context) self.check_value("sum(//price[../available = false()])", 10, context) def test_comparison_of_sequences(self): super(XPath2ParserTest, self).test_comparison_of_sequences() self.parser.compatibility_mode = True self.wrong_type("(false(), false()) = 1") self.check_value("(false(), false()) = (false(), false())", True) self.check_value("(false(), false()) = (false(), false(), false())", True) self.check_value("(false(), false()) = (false(), true())", True) 
self.check_value("(false(), false()) = (true(), false())", True) self.check_value("(false(), false()) = (true(), true())", False) self.check_value("(false(), false()) = (true(), true(), false())", True) self.parser.compatibility_mode = False # From XPath 2.0 examples root = self.etree.XML('' ' Kafka' ' Huxley' ' Asimov' '') context = XPathContext(root=root, variables={'book1': root[0]}) self.check_value('$book1 / author = "Kafka"', True, context=context) self.check_value('$book1 / author eq "Kafka"', True, context=context) self.check_value("(1, 2) = (2, 3)", True) self.check_value("(2, 3) = (3, 4)", True) self.check_value("(1, 2) = (3, 4)", False) self.check_value("(1, 2) != (2, 3)", True) # != is not the inverse of = context = XPathContext(root=root, variables={ 'a': UntypedAtomic('1'), 'b': UntypedAtomic('2'), 'c': UntypedAtomic('2.0') }) self.check_value('($a, $b) = ($c, 3.0)', False, context=context) self.check_value('($a, $b) = ($c, 2.0)', True, context=context) self.wrong_type("(1, 2) le (2, 3)", 'XPTY0004', 'sequence of length greater than one') root = self.etree.XML('') context = XPathContext(root=root) self.check_value('@min', [context.root.attributes[0]], context=context) self.check_value('@min le @max', True, context=context) root = self.etree.XML('') self.check_value('@min le @max', False, context=XPathContext(root=root)) self.check_value('@min le @maximum', None, context=XPathContext(root=root)) if xmlschema is not None: schema = xmlschema.XMLSchema(""" """) with self.schema_bound_parser(schema.elements['root'].xpath_proxy): root = self.etree.XML('11') context = XPathContext(root, schema=self.parser.schema) self.check_value('. le 10', False, context) self.check_value('. le 20', True, context) root = self.etree.XML('eleven') context = XPathContext(root, schema=self.parser.schema) self.wrong_value('. 
le 10', 'FORG0001', context=context) root = self.etree.XML('12') context = XPathContext(root, schema=self.parser.schema) with self.assertRaises(TypeError) as err: self.check_value('. le "11"', context) self.assertIn('XPTY0004', str(err.exception)) # Static schema context error self.check_value('. le 10', False, context=context) # Schema information persists on parser (will be removed in v5.0) context = XPathContext(root) self.check_value('. le 10', False, context=context) context = XPathContext(root) with self.assertRaises(TypeError) as err: self.check_value('. le 10', context=context) self.assertIn('XPTY0004', str(err.exception)) # Dynamic context error schema = xmlschema.XMLSchema(""" """) with self.schema_bound_parser(schema.elements['root'].xpath_proxy): root = self.etree.XML('15') self.check_value('. le "11"', False, context=XPathContext(root)) root = self.etree.XML('1103050') self.check_selector("a = (1 to 30)", root, True) self.check_selector("a = (2)", root, False) self.check_selector("a[1] = (1 to 10, 30)", root, True) self.check_selector("a[2] = (1 to 10, 30)", root, True) self.check_selector("a[3] = (1 to 10, 30)", root, True) self.check_selector("a[4] = (1 to 10, 30)", root, False) @unittest.skipIf(xmlschema is None, "xmlschema library is not installed!") def test_namespace_axis_on_schema_context(self): schema = xmlschema.XMLSchema(dedent("""\n """)) context = XPathSchemaContext(schema) token = self.parser.parse('/namespace::*') self.assertListEqual(token.evaluate(context), []) def test_unknown_axis(self): self.wrong_syntax('unknown::node()', 'XPST0003') self.wrong_syntax('A/unknown::node()', 'XPST0003') self.parser.compatibility_mode = True self.wrong_name('unknown::node()', 'XPST0010') self.wrong_name('A/unknown::node()', 'XPST0010') self.parser.compatibility_mode = False def test_predicate(self): super(XPath2ParserTest, self).test_predicate() root = self.etree.XML('') self.check_selector("/(A/*/*)[1]", root, [root[0][0]]) 
        self.check_selector("/A/*/*[1]", root, [root[0][0], root[1][0]])

    def test_subtract_datetimes(self):
        context = XPathContext(root=self.etree.XML(''),
                               timezone=Timezone.fromstring('-05:00'))
        self.check_value('xs:dateTime("2000-10-30T06:12:00") - xs:dateTime("1999-11-28T09:00:00Z")',
                         DayTimeDuration.fromstring('P337DT2H12M'), context)
        self.check_value('xs:dateTime("2000-10-30T06:12:00") - xs:dateTime("1999-11-28T09:00:00Z")',
                         DayTimeDuration.fromstring('P336DT21H12M'))

    def test_subtract_dates(self):
        context = XPathContext(root=self.etree.XML(''), timezone=Timezone.fromstring('Z'))
        self.check_value('xs:date("2000-10-30") - xs:date("1999-11-28")',
                         DayTimeDuration.fromstring('P337D'), context)
        context.timezone = Timezone.fromstring('+05:00')
        self.check_value('xs:date("2000-10-30") - xs:date("1999-11-28Z")',
                         DayTimeDuration.fromstring('P336DT19H'), context)
        self.check_value('xs:date("2000-10-15-05:00") - xs:date("2000-10-10+02:00")',
                         DayTimeDuration.fromstring('P5DT7H'))

        # BCE test cases
        self.check_value('xs:date("0001-01-01") - xs:date("-0001-01-01")',
                         DayTimeDuration.fromstring('P366D'))
        self.check_value('xs:date("-0001-01-01") - xs:date("-0001-01-01")',
                         DayTimeDuration.fromstring('P0D'))
        self.check_value('xs:date("-0001-01-01") - xs:date("0001-01-01")',
                         DayTimeDuration.fromstring('-P366D'))

        self.check_value('xs:date("-0001-01-01") - xs:date("-0001-01-02")',
                         DayTimeDuration.fromstring('-P1D'))
        self.check_value('xs:date("-0001-01-04") - xs:date("-0001-01-01")',
                         DayTimeDuration.fromstring('P3D'))

        self.check_value('xs:date("0200-01-01") - xs:date("-0121-01-01")',
                         DayTimeDuration.fromstring('P116878D'))
        self.check_value('xs:date("-0201-01-01") - xs:date("0120-01-01")',
                         DayTimeDuration.fromstring('-P116877D'))

    def test_subtract_times(self):
        context = XPathContext(root=self.etree.XML(''),
                               timezone=Timezone.fromstring('-05:00'))
        self.check_value('xs:time("11:12:00Z") - xs:time("04:00:00")',
                         DayTimeDuration.fromstring('PT2H12M'), context)
self.check_value('xs:time("11:00:00-05:00") - xs:time("21:30:00+05:30")', DayTimeDuration.fromstring('PT0S'), context) self.check_value('xs:time("17:00:00-06:00") - xs:time("08:00:00+09:00")', DayTimeDuration.fromstring('PT24H'), context) self.check_value('xs:time("24:00:00") - xs:time("23:59:59")', DayTimeDuration.fromstring('-PT23H59M59S'), context) def test_add_year_month_duration_to_datetime(self): self.check_value('xs:dateTime("2000-10-30T11:12:00") + xs:yearMonthDuration("P1Y2M")', DateTime.fromstring("2001-12-30T11:12:00")) def test_add_day_time_duration_to_datetime(self): self.check_value('xs:dateTime("2000-10-30T11:12:00") + xs:dayTimeDuration("P3DT1H15M")', DateTime.fromstring("2000-11-02T12:27:00")) def test_subtract_year_month_duration_from_datetime(self): self.check_value('xs:dateTime("2000-10-30T11:12:00") - xs:yearMonthDuration("P0Y2M")', DateTime.fromstring("2000-08-30T11:12:00")) self.check_value('xs:dateTime("2000-10-30T11:12:00") - xs:yearMonthDuration("P1Y2M")', DateTime.fromstring("1999-08-30T11:12:00")) def test_subtract_day_time_duration_from_datetime(self): self.check_value('xs:dateTime("2000-10-30T11:12:00") - xs:dayTimeDuration("P3DT1H15M")', DateTime.fromstring("2000-10-27T09:57:00")) def test_add_year_month_duration_to_date(self): self.check_value('xs:date("2000-10-30") + xs:yearMonthDuration("P1Y2M")', Date.fromstring('2001-12-30')) def test_subtract_year_month_duration_from_date(self): self.check_value('xs:date("2000-10-30") - xs:yearMonthDuration("P1Y2M")', Date.fromstring('1999-08-30')) self.check_value('xs:date("2000-02-29Z") - xs:yearMonthDuration("P1Y")', Date.fromstring('1999-02-28Z')) self.check_value('xs:date("2000-10-31-05:00") - xs:yearMonthDuration("P1Y1M")', Date.fromstring('1999-09-30-05:00')) def test_subtract_day_time_duration_from_date(self): self.check_value('xs:date("0001-01-05") - xs:dayTimeDuration("P3DT1H15M")', Date.fromstring('0001-01-01')) self.check_value('xs:date("2000-10-30") - 
xs:dayTimeDuration("P3DT1H15M")', Date.fromstring('2000-10-26')) def test_add_day_time_duration_to_time(self): self.check_value('xs:time("11:12:00") + xs:dayTimeDuration("P3DT1H15M")', Time.fromstring('12:27:00')) self.check_value('xs:time("23:12:00+03:00") + xs:dayTimeDuration("P1DT3H15M")', Time.fromstring('02:27:00+03:00')) def test_subtract_day_time_duration_to_time(self): self.check_value('xs:time("11:12:00") - xs:dayTimeDuration("P3DT1H15M")', Time.fromstring('09:57:00')) self.check_value('xs:time("08:20:00-05:00") - xs:dayTimeDuration("P23DT10H10M")', Time.fromstring('22:10:00-05:00')) def test_duration_with_arithmetical_operators(self): self.wrong_type('xs:duration("P1Y") * 3', 'XPTY0004', 'unsupported operand type(s)') self.wrong_value('xs:duration("P1Y") * xs:float("NaN")', 'FOCA0005') self.check_value('xs:duration("P1Y") * xs:float("INF")', OverflowError) self.wrong_value('xs:float("NaN") * xs:duration("P1Y")', 'FOCA0005') self.check_value('xs:float("INF") * xs:duration("P1Y")', OverflowError) self.wrong_type('xs:duration("P3Y") div 3', 'XPTY0004', 'unsupported operand type(s)') def test_year_month_duration_operators(self): self.check_value('xs:yearMonthDuration("P2Y11M") + xs:yearMonthDuration("P3Y3M")', YearMonthDuration(months=74)) self.check_value('xs:yearMonthDuration("P2Y11M") - xs:yearMonthDuration("P3Y3M")', YearMonthDuration(months=-4)) self.check_value('xs:yearMonthDuration("P2Y11M") * 2.3', YearMonthDuration.fromstring('P6Y9M')) self.check_value('xs:yearMonthDuration("P2Y11M") div 1.5', YearMonthDuration.fromstring('P1Y11M')) self.check_value('xs:yearMonthDuration("P3Y4M") div xs:yearMonthDuration("-P1Y4M")', -2.5) self.wrong_value('xs:double("NaN") * xs:yearMonthDuration("P2Y")', 'FOCA0005') self.check_value('xs:yearMonthDuration("P1Y") * xs:double("INF")', OverflowError) self.wrong_value('xs:yearMonthDuration("P3Y") div xs:double("NaN")', 'FOCA0005') self.check_raise('xs:yearMonthDuration("P3Y") div xs:yearMonthDuration("P0Y")', 
ZeroDivisionError, 'FOAR0001', 'Division by zero') self.check_raise('xs:yearMonthDuration("P3Y36M") div 0', OverflowError, 'FODT0002') def test_day_time_duration_operators(self): self.check_value('xs:dayTimeDuration("P2DT12H5M") + xs:dayTimeDuration("P5DT12H")', DayTimeDuration.fromstring('P8DT5M')) self.check_value('xs:dayTimeDuration("P2DT12H") - xs:dayTimeDuration("P1DT10H30M")', DayTimeDuration.fromstring('P1DT1H30M')) self.check_value('xs:dayTimeDuration("PT2H10M") * 2.1', DayTimeDuration.fromstring('PT4H33M')) self.check_value('xs:dayTimeDuration("P1DT2H30M10.5S") div 1.5', DayTimeDuration.fromstring('PT17H40M7S')) self.check_value('3 * xs:dayTimeDuration("P1D")', DayTimeDuration.fromstring('P3D')) self.check_value( 'xs:dayTimeDuration("P2DT53M11S") div xs:dayTimeDuration("P1DT10H")', Decimal('1.437834967320261437908496732') ) def test_document_node_accessor(self): document = self.etree.parse(io.StringIO('')) context = XPathContext(root=document) self.wrong_syntax("document-node(A)") self.wrong_syntax("document-node(*)") self.wrong_syntax("document-node(true())") self.wrong_syntax("document-node(node())") self.wrong_type("document-node(element(A), 1)") self.check_select("document-node()", [], context) self.check_select("self::document-node()", [context.root], context) self.check_selector("self::document-node(element(A))", document, [document]) self.check_selector("self::document-node(element(B))", document, []) context = XPathContext(root=document.getroot()) self.check_select("document-node()", [], context) self.check_select("self::document-node()", [], context) self.check_select("self::document-node(element(A))", [], context) def test_element_accessor(self): element = self.etree.Element('schema') context = XPathContext(root=element) self.wrong_syntax("element('name')") self.wrong_syntax("element(A, 'name')") self.check_select("element()", [], context) self.check_select("self::element()", [context.root], context) self.check_select("self::element(schema)", 
[context.root], context) self.check_select("self::element(schema, xs:string)", [], context) root = self.etree.XML('texttail') context = XPathContext(root) expected = [e for e in context.root if isinstance(e, ElementNode)] self.check_select("element(*)", expected, context) self.check_select("element(B)", expected, context) self.check_select("element(A)", [], context) if xmlschema is not None: schema = xmlschema.XMLSchema(dedent('''\ ''')) root = self.etree.XML('hello') context = XPathContext(root) with self.schema_bound_parser(schema.elements['root'].xpath_proxy): context.root.xsd_type = schema.elements['root'].type self.check_select("self::element(*, xs:string)", [context.root], context) self.check_select("self::element(*, xs:int)", [], context) def test_attribute_accessor(self): root = self.etree.XML('texttail') context = XPathContext(root) a = context.root.attributes[0] b = context.root.attributes[1] self.check_select("attribute()", [a, b], context) self.check_select("attribute(*)", [a, b], context) self.check_select("attribute(a)", [a], context) self.check_select("attribute(a, xs:int)", [a], context) if xmlschema is not None: schema = xmlschema.XMLSchema(""" """) schema_proxy = schema.elements['A'].xpath_proxy with self.schema_bound_parser(schema_proxy): context = XPathContext(root, schema=schema_proxy) a = context.root.attributes[0] b = context.root.attributes[1] self.check_select("attribute(a, xs:int)", [a], context) self.check_select("attribute(*, xs:int)", [a, b], context) self.check_select("attribute(a, xs:string)", [], context) self.check_select("attribute(*, xs:string)", [], context) def test_node_and_node_accessors(self): element = self.etree.Element('schema') element.attrib.update([('id', '0212349350')]) context = XPathContext(root=element) self.check_select("self::node()", [context.root], context) self.check_select("self::attribute()", [context.root.attributes[0]], context) context.item = 7 self.check_select("node()", [], context) context.item = 10.2 
        self.check_select("node()", [], context)

    def test_union_intersect_except_operators(self):
        root = self.etree.XML('')
        self.check_selector('/A/B2 union /A/B1', root, root[:2])
        self.check_selector('/A/B2 union /A/*', root, root[:])

        self.check_selector('/A/B2 intersect /A/B1', root, [])
        self.check_selector('/A/B2 intersect /A/*', root, [root[1]])
        self.check_selector('/A/B1/* intersect /A/B2/*', root, [])
        self.check_selector('/A/B1/* intersect /A/*/*', root, root[0][:])

        self.check_selector('/A/B2 except /A/B1', root, root[1:2])
        self.check_selector('/A/* except /A/B2', root, [root[0], root[2]])
        self.check_selector('/A/*/* except /A/B2/*', root, root[0][:])
        self.check_selector('/A/B2/* except /A/B1/*', root, root[1][:])
        self.check_selector('/A/B2/* except /A/*/*', root, [])

        root = self.etree.XML('')

        # From variables like XPath 2.0 examples
        context = XPathContext(root, variables={
            'seq1': root[:2],  # (A, B)
            'seq2': root[:2],  # (A, B)
            'seq3': root[1:],  # (B, C)
        })
        self.check_select('$seq1 union $seq2', context.root[:2], context=context)
        self.check_select('$seq2 union $seq3', context.root[:], context=context)
        self.check_select('$seq1 intersect $seq2', context.root[:2], context=context)
        self.check_select('$seq2 intersect $seq3', context.root[1:2], context=context)
        self.check_select('$seq1 except $seq2', [], context=context)
        self.check_select('$seq2 except $seq3', context.root[:1], context=context)

        self.wrong_type('1 intersect 1', 'XPTY0004',
                        'only XPath nodes are allowed', context=context)
        self.wrong_type('1 except $seq1', 'XPTY0004',
                        'only XPath nodes are allowed', context=context)
        self.wrong_type('1 union $seq1', 'XPTY0004',
                        'only XPath nodes are allowed', context=context)
        self.wrong_type('$seq1 intersect 1', 'XPTY0004',
                        'only XPath nodes are allowed', context=context)
        self.wrong_type('$seq1 union 1', 'XPTY0004',
                        'only XPath nodes are allowed', context=context)

    def test_node_comparison_operators(self):
        # Test cases from https://www.w3.org/TR/xpath20/#id-node-comparisons
        root = self.etree.XML(''' 1558604820QA76.9 C3845 0070512655QA76.9 C3846 0131477005QA76.9 C3847 ''')
        self.check_selector('/books/book[isbn="1558604820"] is /books/book[call="QA76.9 C3845"]',
                            root, True)
        self.check_selector('/books/book[isbn="0070512655"] is /books/book[call="QA76.9 C3847"]',
                            root, False)
        self.check_selector('/books/book[isbn="not a code"] is /books/book[call="QA76.9 C3847"]',
                            root, [])

        context = XPathContext(root)
        self.check_value('/books/book[isbn="1558604820"] is ()', context=context)
        self.wrong_type('/books/book[isbn="1558604820"] is (1, 2)', 'XPTY0004', context=context)
        self.check_value('/books/book[isbn="1558604820"] << /books/book[isbn="1558604820"]',
                         False, context=context)

        context = XPathContext(root, variables={'a': self.etree.Element('a'),
                                                'b': self.etree.Element('b')})
        self.wrong_value('$a << $b', 'FOCA0002', 'operands are not nodes of the XML tree',
                         context=context)

        root = self.etree.XML(''' 28-451 33-870 15-392 35-530 10-639 10-639 39-729 ''')

        self.check_selector(
            '/transactions/purchase[parcel="28-451"] << /transactions/sale[parcel="33-870"]',
            root, True
        )
        self.check_selector(
            '/transactions/purchase[parcel="15-392"] >> /transactions/sale[parcel="33-870"]',
            root, True
        )
        self.check_selector(
            '/transactions/purchase[parcel="10-639"] >> /transactions/sale[parcel="33-870"]',
            root, TypeError
        )

        self.wrong_type('is ()', 'XPST0017')
        self.wrong_syntax('is B', 'XPST0003')
        self.wrong_syntax('A is B is C', 'XPST0003')

    def test_empty_sequence_type(self):
        self.check_value("() treat as empty-sequence()", [])
        self.check_value("6 treat as empty-sequence()", TypeError)
        self.wrong_syntax("empty-sequence()")

        context = XPathContext(root=self.etree.XML(''))
        self.check_value("() instance of empty-sequence()", expected=True, context=context)
        self.check_value(". instance of empty-sequence()", expected=False, context=context)

    def test_item_sequence_type(self):
        self.check_value("4 treat as item()", MissingContextError)

        context = XPathContext(self.etree.XML(''))
        self.check_value("4 treat as item()", [4], context)
        self.check_value("() treat as item()", TypeError, context)
        self.wrong_syntax("item()")

        context = XPathContext(root=self.etree.XML(''))
        self.check_value(". instance of item()", expected=True, context=context)
        self.check_value("() instance of item()", expected=False, context=context)

        context = XPathContext(root=self.etree.parse(io.StringIO('')))
        self.check_value(". instance of item()", expected=True, context=context)
        self.check_value("() instance of item()", expected=False, context=context)

    def test_static_analysis_phase(self):
        context = XPathContext(self.etree.XML(''), variables=self.variables)
        self.check_value('fn:concat($word, fn:lower-case(" BETA"))', 'alpha beta', context)
        self.check_value('fn:concat($word, fn:lower-case(10))', TypeError, context)
        self.check_value('fn:concat($unknown, fn:lower-case(10))', NameError, context)

    def test_instance_of_expression(self):
        element = self.etree.Element('schema')

        # Test cases from https://www.w3.org/TR/xpath20/#id-instance-of
        self.check_value("5 instance of xs:integer", True)
        self.check_value("5 instance of xs:decimal", True)
        self.check_value("9.0 instance of xs:integer", False)
        self.check_value("(5, 6) instance of xs:integer+", True)

        context = XPathContext(element)
        self.check_value(". instance of element()", True, context)
        context.item = "foo"
        self.check_value(". instance of element()", False, context)

        self.check_value("(5, 6) instance of xs:integer", False)
        self.check_value("(5, 6) instance of xs:integer*", True)
        self.check_value("(5, 6) instance of xs:integer?", False)

        self.check_value("5 instance of empty-sequence()", False)
        self.check_value("() instance of empty-sequence()", True)

        self.wrong_syntax("5 instance of unknown()", 'XPST0003',
                          "unexpected parenthesized expression")
        self.wrong_syntax("5 instance of unknown::node()", 'XPST0003',
                          "unexpected '::' symbol")
        self.wrong_syntax("1e3 instance of empty-sequence()(", 'XPST0003')

        # Test dynamic evaluation error on prefixed name
        parser = XPath2Parser()
        token = parser.parse('5 instance of xs:decimal')
        parser.namespaces.pop('xs')
        with self.assertRaises(NameError) as ctx:
            token.evaluate()
        self.assertIn('XPST0081', str(ctx.exception))

        # From W3C XQuery/XPath tests
        context = XPathContext(element)
        self.check_value("not(1 instance of node())", True, context)
        self.check_value("(1, 2, 3, 4, 5) instance of item()+", True, context)
        self.check_value("(1, 2, 3, 4, 5) instance of item()", False, context)
        self.wrong_name("3 instance of void")

    def test_treat_as_expression(self):
        element = self.etree.Element('schema')
        context = XPathContext(element)

        self.check_value("5 treat as xs:integer", [5])
        self.check_value("5 treat as xs:string", TypeError)
        self.check_value("5 treat as xs:decimal", [5])
        self.check_value("(5, 6) treat as xs:integer+", [5, 6])
        self.check_value(". treat as element()", [context.root], context)

        self.check_value("(5, 6) treat as xs:integer", TypeError)
        self.check_value("(5, 6) treat as xs:integer*", [5, 6])
        self.check_value("(5, 6) treat as xs:integer?", TypeError)

        self.check_value("5 treat as empty-sequence()", TypeError)
        self.check_value("() treat as empty-sequence()", [])
        self.check_value("() treat as xs:integer?", [])
        self.wrong_type("() treat as xs:integer", 'XPDY0050')

        # Test dynamic evaluation error on prefixed name
        parser = XPath2Parser()
        token = parser.parse('5 treat as xs:decimal')
        parser.namespaces.pop('xs')
        with self.assertRaises(NameError) as ctx:
            token.evaluate()
        self.assertIn('XPST0081', str(ctx.exception))

        # From W3C XQuery/XPath tests
        self.check_value("3 treat as item()+", [3], context)
        self.wrong_type("3 treat as node()+", 'XPDY0050', context=context)
        self.check_value("(1, 2, 3) treat as item()+", [1, 2, 3], context)
        self.wrong_type("(1, 2, 3) treat as item()", 'XPDY0050', context=context)
        self.wrong_name("3 treat as xs:doesNotExist")

    def test_castable_expression(self):
        self.check_value("5 castable as xs:integer", True)
        self.check_value("'5' castable as xs:integer", True)
        self.check_value("'hello' castable as xs:integer", False)
        self.check_value("('5', '6') castable as xs:integer", False)
        self.check_value("() castable as xs:integer", False)
        self.check_value("() castable as xs:integer?", True)

        self.wrong_syntax("5 castable as empty-sequence()", 'XPST0003')
        self.wrong_name("5 castable as void", 'XPST0051')
        self.check_value("5 castable as xs:void", False)

        self.check_value("'NaN' castable as xs:double", True)
        self.check_value("'None' castable as xs:double", False)
        self.check_value("'NaN' castable as xs:float", True)
        self.check_value("'NaN' castable as xs:integer", False)

        # From W3C XQuery/XPath tests
        self.check_value("(1E3) castable as xs:double?", True)

    def test_cast_expression(self):
        self.check_value("5 cast as xs:integer", 5)
        self.check_value("'5' cast as xs:integer", 5)
        self.check_value("'hello' cast as xs:integer", ValueError)
        self.check_value("('5', '6') cast as xs:integer", TypeError)
        self.check_value("() cast as xs:integer", TypeError)
        self.check_value("() cast as xs:integer?", [])
        self.check_value('"1" cast as xs:boolean', True)
        self.check_value('"0" cast as xs:boolean', False)
        self.check_value("xs:untypedAtomic('1E3') cast as xs:double", 1E3)
        self.wrong_value("xs:untypedAtomic('x') cast as xs:double", 'FORG0001')

        # Test dynamic evaluation error on prefixed name
        parser = XPath2Parser()
        token = parser.parse("() cast as xs:string?")
        parser.namespaces.pop('xs')
        with self.assertRaises(NameError) as ctx:
            token.evaluate()
        self.assertIn('XPST0081', str(ctx.exception))

    @unittest.skipIf(xmlschema is None, "xmlschema library is not installed!")
    def test_cast_or_castable_with_derived_type(self):
        schema = xmlschema.XMLSchema(dedent("""\n """))

        with self.schema_bound_parser(schema.xpath_proxy):
            root = self.etree.XML('')
            context = XPathContext(root)

            self.check_value("'1E3' castable as floatType", True, context)
            self.check_value("(1E3) castable as floatType", True, context)
            self.check_value("xs:untypedAtomic('1E3') cast as floatType", 1E3)
            self.check_value("xs:untypedAtomic('x') castable as floatType", False)
            self.wrong_value("xs:untypedAtomic('x') cast as floatType", 'FORG0001')
            self.wrong_value("'x' cast as floatType", 'FORG0001')
            self.wrong_type("xs:anyURI('http://xpath.test') cast as floatType", 'XPTY0004')

    def test_logical_expressions_(self):
        super(XPath2ParserTest, self).test_logical_expressions()

        if xmlschema is not None:
            schema = xmlschema.XMLSchema(""" """)

            with self.schema_bound_parser(schema.elements['root'].xpath_proxy):
                root_token = self.parser.parse("(@a and not(@b)) or (not(@a) and @b)")
                context = XPathContext(self.etree.XML(''))
                self.assertTrue(root_token.evaluate(context=context) is False)
                context = XPathContext(self.etree.XML(''))
                self.assertTrue(root_token.evaluate(context=context) is False)
                context = XPathContext(self.etree.XML(''))
                self.assertTrue(root_token.evaluate(context=context) is True)
                context = XPathContext(self.etree.XML(''))
                self.assertTrue(root_token.evaluate(context=context) is False)
                context = XPathContext(self.etree.XML(''))
                self.assertTrue(root_token.evaluate(context=context) is True)

    def test_element_decimal_cast(self):
        root = self.etree.XML(''' 155860482012.50 155860482013.50 1558604820-0.1 ''')
        expected_values = [Decimal('12.5'), Decimal('13.5'), Decimal('-0.1')]
        self.assertEqual(3, len(select(root, "//book")))
        for book in iter_select(root, "//book"):
            context = XPathContext(root=root, item=book)
            root_token = self.parser.parse("xs:decimal(price)")
            self.assertEqual(expected_values.pop(0), root_token.evaluate(context))

    def test_element_decimal_comparison_after_round(self):
        self.check_value('xs:decimal(0.36) = round(0.36*100) div 100', True)

    def test_tokenizer_ambiguity(self):
        # From issue #27
        self.check_tokenizer("sch:pattern[@is-a]", ['sch', ':', 'pattern', '[', '@', 'is-a', ']'])
        self.check_tokenizer("/is-a", ['/', 'is-a'])
        self.check_tokenizer("/-is-a", ['/', '-', 'is-a'])

    def test_operator_ambiguity(self):
        # Related to issue #27
        self.check_tokenizer("/is", ['/', 'is'])
        context = XPathContext(self.etree.XML(''))
        self.check_value('/is', [], context)
        context = XPathContext(self.etree.XML(''))
        self.check_value('/is', [context.root], context)

        self.check_value('/and', [], context)
        context = XPathContext(self.etree.XML(''))
        self.check_value('/and', [context.root], context)

        root = self.etree.XML('')
        context = XPathContext(self.etree.ElementTree(root))
        self.check_value('and', [context.root.getroot()], context)

        root = self.etree.XML('')
        context = XPathContext(self.etree.ElementTree(root))
        self.check_value('eq', [context.root.getroot()], context)

        root = self.etree.XML('')
        context = XPathContext(self.etree.ElementTree(root))
        self.check_value('union', [context.root.getroot()], context)

    def test_statements_ambiguity(self):
        root = self.etree.XML('')
        context = XPathContext(self.etree.ElementTree(root))
        self.check_value('for', [context.root.getroot()], context)

    def test_auxiliary_tokens(self):
        # Tokens are parsed as names, so raise at evaluation if the context is None
        self.check_raise('as', MissingContextError)
        self.check_raise('of', MissingContextError)

        context = XPathContext(self.etree.XML(''))
        self.check_value('as', expected=None, context=context)
        self.check_value('of', expected=None, context=context)

    def test_function_namespace(self):
        function_namespace = "http://xpath.test/fn/xpath-functions"
        parser = self.parser.__class__(
            namespaces={'fn2': function_namespace},
            function_namespace=function_namespace
        )
        token = parser.parse('fn2:true()')
        self.assertTrue(token.evaluate())

    def test_invalid_schema_argument(self):
        schema = dedent("""\ """)

        with self.assertRaises(TypeError) as ctx:
            self.parser.__class__(schema=schema)
        self.assertEqual(str(ctx.exception),
                         "argument 'schema' must be an instance of AbstractSchemaProxy")

        if xmlschema is not None:
            with self.assertRaises(TypeError):
                self.parser.__class__(schema=xmlschema.XMLSchema(schema))

    def test_variable_types_argument(self):
        variable_types = {'a': 'item()', 'b': 'xs:integer'}
        parser = self.parser.__class__(variable_types=variable_types)
        self.assertEqual(variable_types, parser.variable_types)
        self.assertIsNot(variable_types, parser.variable_types)

        with self.assertRaises(ValueError) as ctx:
            self.parser.__class__(variable_types={'a': 'item()', 'b': 'xs:complex'})
        self.assertEqual(str(ctx.exception), "invalid sequence type for in-scope variable types")

    def test_document_types_argument(self):
        document_types = {'doc1': 'node()*', 'doc2': 'element()'}
        parser = self.parser.__class__(document_types=document_types)
        self.assertEqual(document_types, parser.document_types)
        self.assertIs(document_types, parser.document_types)

        with self.assertRaises(ValueError) as ctx:
            self.parser.__class__(document_types={'doc1': 'node()*', 'doc2': 'etree()'})
        self.assertEqual(str(ctx.exception), "invalid sequence type in document_types argument")

    def test_collection_types_argument(self):
        collection_types = {'col1': 'node()*', 'col2': 'element()*'}
        parser = self.parser.__class__(collection_types=collection_types)
        self.assertEqual(collection_types, parser.collection_types)
        self.assertIs(collection_types, parser.collection_types)

        with self.assertRaises(ValueError) as ctx:
            self.parser.__class__(collection_types={'doc1': 'node()*', 'doc2': 'etree()*'})
        self.assertEqual(str(ctx.exception), "invalid sequence type in collection_types argument")

    def test_default_collection_type_argument(self):
        parser = self.parser.__class__(default_collection_type='element()*')
        self.assertEqual(parser.default_collection_type, 'element()*')

        with self.assertRaises(ValueError) as ctx:
            self.parser.__class__(default_collection_type='elem()*')
        self.assertEqual(str(ctx.exception),
                         "invalid sequence type for default_collection_type argument")

    def test_default_collation_argument(self):
        locale_collation = get_locale_category(locale.LC_COLLATE)
        if locale_collation == 'en_US.UTF-8':
            locale_collation = "http://www.w3.org/2005/xpath-functions/collation/codepoint"

        self.assertEqual(self.parser.__class__().default_collation, locale_collation)

        parser = self.parser.__class__(default_collation='it_IT.UTF-8')
        self.assertEqual(parser.default_collation, 'it_IT.UTF-8')

    def test_issue_35_getting_attribute_names(self):
        root = self.etree.XML(dedent("""\ some text T1 T1 T1 T1 T1 T2 T2 T2 T2 T2 """))

        result = ['attrib1', 'attrib2', 'isbn', 'lang', 'isbn', 'lang']
        self.check_selector('//@*/local-name()', root, result)
        self.check_selector('//@*/name()', root, result)

    def test_external_function_registration(self):
        parser = self.parser.__class__()

        def foo(x):
            return str(x)

        self.assertIs(parser.symbol_table, parser.__class__.symbol_table)
        parser.external_function(foo)
        self.assertIsNot(parser.symbol_table, parser.__class__.symbol_table)
        self.assertIn('foo', parser.symbol_table)
        token_class = parser.symbol_table['foo']
        self.assertTrue(issubclass(token_class, ProxyToken))

        symbol = f'{{{XPATH_FUNCTIONS_NAMESPACE}}}foo'
        self.assertIn(symbol, parser.symbol_table)
        token_class = parser.symbol_table[symbol]
        self.assertTrue(issubclass(token_class, XPathFunction))
        assert issubclass(token_class, XPathFunction)
        self.assertEqual(token_class.nargs, 1)

        token = parser.parse('foo(8)')
        self.assertEqual(token.evaluate(), '8')
        token = parser.parse('fn:foo("abc")')
        self.assertEqual(token.evaluate(), 'abc')

        with self.assertRaises(ValueError):
            parser.external_function(foo)

        parser.external_function(foo, name='bar')
        token = parser.parse('bar(99)')
        self.assertEqual(token.evaluate(), '99')

        with self.assertRaises(ValueError) as ctx:
            parser.external_function(foo)
        self.assertEqual(str(ctx.exception), "function 'fn:foo' is already registered")

        with self.assertRaises(ValueError) as ctx:
            parser.external_function(foo, 'concat')
        self.assertEqual(str(ctx.exception), "function 'fn:concat' is already registered")

        if self.parser.version >= '3.0':
            with self.assertRaises(ValueError) as ctx:
                parser.external_function(foo, 'pi', 'math')
            self.assertEqual(str(ctx.exception), "function 'math:pi' is already registered")

        with self.assertRaises(ValueError) as ctx:
            parser.external_function(foo, 'some')
        self.assertIn("'some' name collides with """)
        xml_source = dedent("""\ 3.14 foo true 2018-01-23T12:34:56Z 2018-01-23 """)

        schema = xmlschema.XMLSchema(xsd_source)
        assert schema.is_valid(xml_source)

        root = ET.fromstring(xml_source)
        root_node = get_node_tree(root)
        date_node = root_node.get_element_node(root[4])
        assert date_node.name == 'date_value'
        assert date_node.xsd_type is None
        assert date_node.typed_value == '2018-01-23'

        schema_proxy = XMLSchemaProxy(schema)
        parser = XPath2Parser(schema=schema_proxy)
        assert date_node.xsd_type is None
        assert date_node.typed_value == '2018-01-23'

        root_token = parser.parse('fn:data(//*)')
        assert date_node.xsd_type is None
        assert date_node.typed_value == '2018-01-23'

        context = XPathContext(root_node, schema=schema_proxy)
        result = root_token.get_results(context)
        assert date_node.typed_value == Date10(2018, 1, 23)
        assert len(result) == 5
        assert result[-1] == Date10(2018, 1, 23)

        token = XPath2Parser().parse('fn:data(.)')
        context = XPathContext(root_node, item=date_node)
        result = token.get_results(context)
        assert len(result) == 1
        assert result[-1] == Date10(2018, 1, 23)

    def test_proxy_token_disambiguation__issue_078(self):
        root = self.etree.XML(dedent('''\ Flowers Flower Type Chrysanthemum perennial well drained Gardenia perennial Gerbera annual sandy, well-drained Iris
        '''))
        results = select(root, 'min(.//row/count(entry))', parser=self.parser.__class__)
        self.assertEqual(results, 1)


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath2ParserTest(XPath2ParserTest):
    etree = lxml_etree


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_xpath30.py000066400000000000000000001724631476131650400230060ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2020, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
#
# Note: Many tests are built using the examples of the XPath standards,
#       published by W3C under the W3C Document License.
#
# References:
#   https://www.w3.org/TR/xpath-3/
#   https://www.w3.org/TR/xpath-30/
#   https://www.w3.org/TR/xpath-31/
#   https://www.w3.org/Consortium/Legal/2015/doc-license
#   https://www.w3.org/TR/charmod-norm/
#
import io
import unittest
import os
import re
import math
import pathlib
import platform
import xml.etree.ElementTree as ElementTree
from textwrap import dedent
from typing import cast

try:
    import lxml.etree as lxml_etree
except ImportError:
    lxml_etree = None

try:
    import xmlschema
except ImportError:
    xmlschema = None
else:
    xmlschema.XMLSchema.meta_schema.build()

from elementpath import select, XPathContext, MissingContextError, datatypes, XPathFunction
from elementpath.namespaces import XPATH_FUNCTIONS_NAMESPACE
from elementpath.etree import is_etree_element, is_lxml_etree_document, is_etree_document
from elementpath.xpath_nodes import ElementNode, DocumentNode
from elementpath.xpath3 import XPath30Parser
from elementpath.xpath30.xpath30_helpers import PICTURE_PATTERN, \
    int_to_roman, int_to_alphabetic, int_to_words

try:
    from tests import test_xpath2_parser
    from tests import test_xpath2_functions
    from tests import test_xpath2_constructors
except ImportError:
    import test_xpath2_parser
    import test_xpath2_functions
    import test_xpath2_constructors


ANALYZE_STRING_1 = """ 2008-12-03 """

ANALYZE_STRING_2 = """ A1 , C15 ,, D24 , X50 , """


class XPath30ParserTest(test_xpath2_parser.XPath2ParserTest):

    def setUp(self):
        self.parser = XPath30Parser(namespaces=self.namespaces)

    def test_decimal_formats_argument(self):
        decimal_formats = {None: {'decimal-separator': '|', 'grouping-separator': '.'}}
        parser = self.parser.__class__(decimal_formats=decimal_formats)
        expected = {
            'decimal-separator': '|',
            'grouping-separator': '.',
            'exponent-separator': 'e',
            'infinity': 'Infinity',
            'minus-sign': '-',
            'NaN': 'NaN',
            'percent': '%',
            'per-mille': '‰',
            'zero-digit': '0',
            'digit': '#',
            'pattern-separator': ';'
        }
        self.assertDictEqual(parser.decimal_formats[None], expected)
        self.assertEqual(
            repr(parser),
            f"<{parser.__class__.__name__} object at {hex(id(parser))}>"
        )
        self.assertEqual(
            str(parser),
            f'{parser.__class__.__name__}(decimal_formats={parser.decimal_formats})'
        )

        decimal_formats = {'foo': {'decimal-separator': '|', 'grouping-separator': '.'}}
        parser = self.parser.__class__(decimal_formats=decimal_formats)
        self.assertDictEqual(parser.decimal_formats[None],
                             self.parser.__class__.decimal_formats[None])
        self.assertDictEqual(parser.decimal_formats.get('foo'), expected)

    def test_defuse_xml_argument(self):
        parser = self.parser.__class__(defuse_xml=False)
        self.assertFalse(parser.defuse_xml)
        self.assertEqual(
            repr(parser),
            f'<{parser.__class__.__name__} object at {hex(id(parser))}>'
        )
        self.assertEqual(str(parser), f'{parser.__class__.__name__}(defuse_xml=False)')

        parser = self.parser.__class__({'tst': 'http://xpath.test/ns'}, defuse_xml=False)
        self.assertEqual(
            str(parser),
            f"{parser.__class__.__name__}({{'tst': 'http://xpath.test/ns'}}, defuse_xml=False)"
        )
        self.assertEqual(
            str(parser),
            f"{parser.__class__.__name__}({{'tst': 'http://xpath.test/ns'}}, defuse_xml=False)"
        )

    def test_function_match(self):
        token = self.parser.parse('math:pi()')
        self.assertEqual(
            repr(token),
            f"<{token.__class__.__name__} object at {hex(id(token))}>"
        )
        self.assertEqual(str(token), "'math:pi' function")
        self.assertEqual(token.source, 'math:pi()')

    def test_braced_uri_literal(self):
        expected_lexemes = ['Q{', 'http', ':', '//', 'xpath.test', '/', 'ns', '}', 'ABC']
        self.check_tokenizer("Q{http://xpath.test/ns}ABC", expected_lexemes)
        self.check_tokenizer("/Q{http://xpath.test/ns}ABC", ['/'] + expected_lexemes)
        self.check_tokenizer("Q{###}ABC", ['Q{', '#', '#', '#', '}', 'ABC'])

        expression = '/Q{http://xpath.test/ns}ABC'
        token = self.parser.parse(expression)
        self.assertEqual(token.symbol, '/')
        self.assertEqual(token[0].symbol, 'Q{')
        self.assertEqual(token.source, expression)

        with self.assertRaises(TypeError) as ctx:
            self.parser.parse('/Q{###}ABC')
        self.assertIn('XQST0046', str(ctx.exception))

        expression = 'Q{http://www.w3.org/2005/xpath-functions/math}pi()'
        token = self.parser.parse(expression)
        self.assertAlmostEqual(token.evaluate(), math.pi)
        self.assertEqual(token.source, expression)

        expression = 'Q{}foo'
        token = self.parser.parse(expression)
        self.assertEqual(token.value, 'foo')

        expression = 'Q{ }foo'
        token = self.parser.parse(expression)
        self.assertEqual(token.value, 'foo')

        expression = 'Q{ bar }foo'
        token = self.parser.parse(expression)
        self.assertEqual(token.value, '{bar}foo')

        with self.assertRaises(SyntaxError):
            self.parser.parse('Q{ }')

        with self.assertRaises(SyntaxError):
            self.parser.parse('Q{ } ')

        with self.assertRaises(SyntaxError):
            self.parser.parse('Q{bar} ')

        # '{' is unusable for non-standard braced URI literals
        # because it is used for inline function bodies
        with self.assertRaises(SyntaxError):
            self.parser.parse('{http://www.w3.org/2005/xpath-functions/math}pi()')

    def test_concat_operator(self):
        token = self.parser.parse("10 || '/' || 6")
        self.assertEqual(token.evaluate(), "10/6")
        self.assertEqual(token.source, "10 || '/' || 6")
        self.check_tree('"true" || "false"', "(|| ('true') ('false'))")
        self.check_tree('"true"||"false"', "(|| ('true') ('false'))")

    def test_function_test(self):
        func: XPathFunction

        expression = "function($x as item()) as item() { $x }"
        func = cast(XPathFunction, self.parser.parse(expression))
        self.assertTrue(func.match_function_test('function(*)'))
        self.assertEqual(func.source, expression.replace(" $x ", '$x'))

        func = cast(
            XPathFunction,
            self.parser.parse("function($x as item()) as xs:integer { $x }")
        )
        self.assertTrue(func.match_function_test('function(item()) as item()'))

        func = cast(
            XPathFunction,
            self.parser.parse("function($x as item()) as item() { $x }")
        )
        self.assertTrue(func.match_function_test('function(xs:string) as item()'))

    def test_dynamic_function_call(self):
        token = self.parser.parse("$f(2, 3)")
        self.assertEqual(token.source, "$f(2, 3)")
        with self.assertRaises(MissingContextError):
            token.evaluate()

        root = self.etree.XML('')
        context = XPathContext(root=root, variables={'f': 10})
        with self.assertRaises(TypeError):
            token.evaluate(context)

        context.variables['f'] = self.parser.symbol_table['concat'](self.parser, nargs=2)
        self.assertEqual(token.evaluate(context), '23')

        with self.assertRaises(TypeError):
            self.parser.parse("f(2, 3)")

        with self.assertRaises(MissingContextError):
            token.evaluate()

        token = self.parser.parse('$f[2]("Hi there")')
        self.assertEqual(token.source, "$f[2]('Hi there')")
        with self.assertRaises(MissingContextError):
            token.evaluate()

        context.variables['f'] = self.parser.symbol_table['concat'](self.parser, nargs=2)
        with self.assertRaises(TypeError):
            token.evaluate(context)

        context.variables['f'] = [1, context.variables['f']]
        with self.assertRaises(TypeError):
            token.evaluate(context)

        context.variables['f'] = self.parser.symbol_table['true'](self.parser, nargs=0)
        token = self.parser.parse('$f()[2]')
        self.assertEqual(token.source, "$f()[2]")
        with self.assertRaises(MissingContextError):
            token.evaluate()
        self.assertEqual(token.evaluate(context), [])

        token = self.parser.parse('$f()[1]')
        self.assertTrue(token.evaluate(context))

    def test_let_expression(self):
        expression = 'let $x := 4, $y := 3 return $x + $y'
        token = self.parser.parse(expression)
        self.assertEqual(token.source, expression)
        with self.assertRaises(MissingContextError):
            token.evaluate()

        root = self.etree.XML('')
        context = XPathContext(root=root)
        self.assertEqual(token.evaluate(context), [7])

    def test_picture_pattern(self):
        self.assertListEqual(PICTURE_PATTERN.findall(''), [])
        self.assertListEqual(PICTURE_PATTERN.findall('a'), [])
        self.assertListEqual(PICTURE_PATTERN.findall('[y]'), ['[y]'])
        self.assertListEqual(PICTURE_PATTERN.findall('[h01][m01][z,2-6]'),
                             ['[h01]', '[m01]', '[z,2-6]'])
        self.assertListEqual(PICTURE_PATTERN.findall('[H٠]:[m٠]:[s٠٠]:[f٠٠٠]'),
                             ['[H٠]', '[m٠]', '[s٠٠]', '[f٠٠٠]'])
        self.assertListEqual(PICTURE_PATTERN.split(' [H٠]:[m٠]:[s٠٠]:[f٠٠٠]'),
                             [' ', ':', ':', ':', ''])
        self.assertListEqual(PICTURE_PATTERN.findall('[y'), [])
        self.assertListEqual(PICTURE_PATTERN.findall('[[y]'), ['[y]'])

    def test_int_to_roman(self):
        self.assertRaises(TypeError, int_to_roman, 3.0)
        self.assertEqual(int_to_roman(0), '0')
        self.assertEqual(int_to_roman(3), 'III')
        self.assertEqual(int_to_roman(4), 'IV')
        self.assertEqual(int_to_roman(5), 'V')
        self.assertEqual(int_to_roman(7), 'VII')
        self.assertEqual(int_to_roman(9), 'IX')
        self.assertEqual(int_to_roman(10), 'X')
        self.assertEqual(int_to_roman(11), 'XI')
        self.assertEqual(int_to_roman(19), 'XIX')
        self.assertEqual(int_to_roman(20), 'XX')
        self.assertEqual(int_to_roman(49), 'XLIX')
        self.assertEqual(int_to_roman(100), 'C')
        self.assertEqual(int_to_roman(489), 'CDLXXXIX')
        self.assertEqual(int_to_roman(2999), 'MMCMXCIX')

    def test_int_to_alphabetic(self):
        self.assertEqual(int_to_alphabetic(4), 'd')
        self.assertEqual(int_to_alphabetic(7), 'g')
        self.assertEqual(int_to_alphabetic(25), 'y')
        self.assertEqual(int_to_alphabetic(26), 'z')
        self.assertEqual(int_to_alphabetic(27), 'aa')
        self.assertEqual(int_to_alphabetic(-29), '-ac')
        self.assertEqual(int_to_alphabetic(890), 'ahf')

    def test_int_to_words(self):
        self.assertEqual(int_to_words(1), 'one')
        self.assertEqual(int_to_words(4), 'four')


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath30ParserTest(XPath30ParserTest):
    etree = lxml_etree


class XPath30FunctionsTest(test_xpath2_functions.XPath2FunctionsTest):
    maxDiff = 1024

    def setUp(self):
        self.parser = XPath30Parser(namespaces=self.namespaces)

        # Make sure the tests are repeatable.
        env_vars_to_tweak = 'LC_ALL', 'LANG'
        self.current_env_vars = {v: os.environ.get(v) for v in env_vars_to_tweak}
        for v in self.current_env_vars:
            os.environ[v] = 'en_US.UTF-8'

    def tearDown(self):
        if hasattr(self, 'current_env_vars'):
            for v in self.current_env_vars:
                if self.current_env_vars[v] is not None:
                    os.environ[v] = self.current_env_vars[v]

    def test_pi_math_function(self):
        token = self.parser.parse('math:pi()')
        self.assertEqual(token.evaluate(), math.pi)

    def test_exp_math_function(self):
        token = self.parser.parse('math:exp(())')
        self.assertEqual(token.evaluate(), [])
        self.assertEqual(token.source, 'math:exp(())')

        self.assertAlmostEqual(self.parser.parse('math:exp(0)').evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse('math:exp(1)').evaluate(), 2.718281828459045)
        self.assertAlmostEqual(self.parser.parse('math:exp(2)').evaluate(), 7.38905609893065)
        self.assertAlmostEqual(self.parser.parse('math:exp(-1)').evaluate(), 0.36787944117144233)
        self.assertAlmostEqual(self.parser.parse('math:exp(math:pi())').evaluate(),
                               23.140692632779267)

        expression = 'math:exp(xs:double("NaN"))'
        self.assertTrue(math.isnan(self.parser.parse(expression).evaluate()))
        self.check_source(expression, expression.replace('"', "'"))

        self.assertEqual(self.parser.parse("math:exp(xs:double('INF'))").evaluate(), float('inf'))

        expression = "math:exp(xs:double('-INF'))"
        self.assertAlmostEqual(self.parser.parse(expression).evaluate(), 0.0)
        self.check_source(expression, expression.replace('"', "'"))

    def test_exp10_math_function(self):
        token = self.parser.parse('math:exp10(())')
        self.assertEqual(token.evaluate(), [])
        self.assertEqual(token.source, 'math:exp10(())')

        self.assertAlmostEqual(self.parser.parse('math:exp10(0)').evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse('math:exp10(1)').evaluate(), 10)
        self.assertAlmostEqual(self.parser.parse('math:exp10(0.5)').evaluate(), 3.1622776601683795)
        self.assertAlmostEqual(self.parser.parse('math:exp10(-1)').evaluate(), 0.1)
        self.assertTrue(math.isnan(self.parser.parse('math:exp10(xs:double("NaN"))').evaluate()))
        self.assertEqual(self.parser.parse("math:exp10(xs:double('INF'))").evaluate(),
                         float('inf'))
        self.assertAlmostEqual(self.parser.parse("math:exp10(xs:double('-INF'))").evaluate(), 0.0)

    def test_log_math_function(self):
        token = self.parser.parse('math:log(())')
        self.assertEqual(token.evaluate(), [])
        self.assertEqual(token.source, 'math:log(())')

        self.assertEqual(self.parser.parse('math:log(0)').evaluate(), float('-inf'))
        self.assertAlmostEqual(self.parser.parse('math:log(math:exp(1))').evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse('math:log(1.0e-3)').evaluate(),
                               -6.907755278982137)
        self.assertAlmostEqual(self.parser.parse('math:log(2)').evaluate(), 0.6931471805599453)
        self.assertTrue(math.isnan(self.parser.parse('math:log(-1)').evaluate()))
        self.assertTrue(math.isnan(self.parser.parse('math:log(xs:double("NaN"))').evaluate()))
        self.assertEqual(self.parser.parse("math:log(xs:double('INF'))").evaluate(), float('inf'))
        self.assertTrue(math.isnan(self.parser.parse('math:log(xs:double("-INF"))').evaluate()))

    def test_log10_math_function(self):
        token = self.parser.parse('math:log10(())')
        self.assertEqual(token.evaluate(), [])
        self.assertEqual(token.source, 'math:log10(())')

        self.assertEqual(self.parser.parse('math:log10(0)').evaluate(), float('-inf'))
        self.assertAlmostEqual(self.parser.parse('math:log10(1.0e3)').evaluate(), 3.0)
        self.assertAlmostEqual(self.parser.parse('math:log10(1.0e-3)').evaluate(), -3.0)
        self.assertAlmostEqual(self.parser.parse('math:log10(2)').evaluate(), 0.3010299956639812)
        self.assertTrue(math.isnan(self.parser.parse('math:log10(-1)').evaluate()))
        self.assertTrue(math.isnan(self.parser.parse('math:log10(xs:double("NaN"))').evaluate()))
        self.assertEqual(self.parser.parse("math:log10(xs:double('INF'))").evaluate(),
                         float('inf'))
        self.assertTrue(math.isnan(self.parser.parse('math:log10(xs:double("-INF"))').evaluate()))

    def test_pow_math_function(self):
        self.assertEqual(self.parser.parse('math:pow((), 93.7)').evaluate(), [])
        self.assertAlmostEqual(self.parser.parse('math:pow(2, 3)').evaluate(), 8.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(-2, 3)').evaluate(), -8.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(2, -3)').evaluate(), 0.125)
        self.assertAlmostEqual(self.parser.parse('math:pow(-2, -3)').evaluate(), -0.125)
        self.assertAlmostEqual(self.parser.parse('math:pow(2, 0)').evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(0, 0)').evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse("math:pow(xs:double('INF'), 0)").evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse("math:pow(xs:double('NaN'), 0)").evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse("math:pow(-math:pi(), 0)").evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(0e0, 3)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(0e0, 4)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(-0e0, 3)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(0, 3)').evaluate(), 0.0)
        self.assertEqual(self.parser.parse('math:pow(0e0, -3)').evaluate(), float('inf'))
        self.assertEqual(self.parser.parse('math:pow(0e0, -4)').evaluate(), float('inf'))
        self.assertEqual(self.parser.parse('math:pow(-0e0, -3)').evaluate(), float('-inf'))
        self.assertEqual(self.parser.parse('math:pow(0, -4)').evaluate(), float('inf'))
        self.assertAlmostEqual(self.parser.parse('math:pow(16, 0.5e0)').evaluate(), 4.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(16, 0.25e0)').evaluate(), 2.0)
        self.assertEqual(self.parser.parse('math:pow(0e0, -3.0e0)').evaluate(), float('inf'))
        self.assertEqual(self.parser.parse('math:pow(-0e0, -3.0e0)').evaluate(), float('-inf'))
        self.assertEqual(self.parser.parse('math:pow(0e0, -3.1e0)').evaluate(), float('inf'))
        self.assertEqual(self.parser.parse('math:pow(-0e0, -3.1e0)').evaluate(), float('inf'))
        self.assertAlmostEqual(self.parser.parse('math:pow(0e0, 3.0e0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(-0e0, 3.0e0)').evaluate(), -0.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(0e0, 3.1e0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(-0e0, 3.1e0)').evaluate(), -0.0)
        self.assertAlmostEqual(self.parser.parse("math:pow(-1, xs:double('INF'))").evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse("math:pow(-1, xs:double('-INF'))").evaluate(),
                               1.0)
        self.assertAlmostEqual(self.parser.parse("math:pow(1, xs:double('INF'))").evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse("math:pow(1, xs:double('-INF'))").evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse("math:pow(1, xs:double('NaN'))").evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse('math:pow(-2.5e0, 2.0e0)').evaluate(), 6.25)
        self.assertTrue(math.isnan(self.parser.parse('math:pow(-2.5e0, 2.00000001e0)').evaluate()))

        self.check_source('math:pow(0e0, 3.1e0)', 'math:pow(0.0, 3.1)')

    def test_sqrt_math_function(self):
        self.assertEqual(self.parser.parse('math:sqrt(())').evaluate(), [])
        self.assertAlmostEqual(self.parser.parse('math:sqrt(0.0e0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:sqrt(-0.0e0)').evaluate(), -0.0)
        self.assertAlmostEqual(self.parser.parse('math:sqrt(1.0e6)').evaluate(), 1.0e3)
        self.assertAlmostEqual(self.parser.parse('math:sqrt(2.0e0)').evaluate(), 1.4142135623730951)
        self.assertTrue(math.isnan(self.parser.parse('math:sqrt(-2.0e0)').evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:sqrt(xs:double('NaN'))").evaluate()))
        self.assertEqual(self.parser.parse("math:sqrt(xs:double('INF'))").evaluate(), float('inf'))
        self.assertTrue(math.isnan(self.parser.parse("math:sqrt(xs:double('-INF'))").evaluate()))

        self.check_source('math:sqrt(1.0e6)', 'math:sqrt(1000000.0)')

    def test_sin_math_function(self):
        self.assertEqual(self.parser.parse('math:sin(())').evaluate(), [])
        self.assertAlmostEqual(self.parser.parse('math:sin(0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:sin(-0.0e0)').evaluate(), -0.0)
        self.assertAlmostEqual(self.parser.parse('math:sin(math:pi() div 2)').evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse('math:sin(-math:pi() div 2)').evaluate(), -1.0)
        self.assertAlmostEqual(self.parser.parse('math:sin(math:pi())').evaluate(), 0.0, places=13)
        self.assertTrue(math.isnan(self.parser.parse("math:sin(xs:double('NaN'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:sin(xs:double('INF'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:sin(xs:double('-INF'))").evaluate()))

        expression = 'math:sin(-math:pi() div 2)'
        token = self.parser.parse(expression)
        self.assertEqual(token.source, expression)
        self.assertEqual(
            repr(token[1]),
            f''
        )
        self.assertEqual(str(token[1]), "'math:sin' function")

    def test_cos_math_function(self):
        self.assertEqual(self.parser.parse('math:cos(())').evaluate(), [])
        self.assertAlmostEqual(self.parser.parse('math:cos(0)').evaluate(), 1.0)
        self.assertAlmostEqual(self.parser.parse('math:cos(-0.0e0)').evaluate(), 1.0)
        self.assertAlmostEqual(
            self.parser.parse('math:cos(math:pi() div 2)').evaluate(), 0.0, places=13
        )
        self.assertAlmostEqual(
            self.parser.parse('math:cos(-math:pi() div 2)').evaluate(), 0.0, places=13
        )
        self.assertAlmostEqual(self.parser.parse('math:cos(math:pi())').evaluate(), -1.0)
        self.assertTrue(math.isnan(self.parser.parse("math:cos(xs:double('NaN'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:cos(xs:double('INF'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:cos(xs:double('-INF'))").evaluate()))

        expression = "math:cos(xs:double('INF'))"
        token = self.parser.parse(expression)
        self.assertEqual(token.source, expression)
        self.assertEqual(
            repr(token[1]),
            f''
        )
        self.assertEqual(str(token[1]), "'math:cos' function")

    def test_tan_math_function(self):
        self.assertEqual(self.parser.parse('math:tan(())').evaluate(), [])
        self.assertAlmostEqual(self.parser.parse('math:tan(0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:tan(-0.0e0)').evaluate(), -0.0)
        self.assertAlmostEqual(
            self.parser.parse('math:tan(math:pi() div 4)').evaluate(), 1.0, places=13
        )
        self.assertAlmostEqual(
            self.parser.parse('math:tan(-math:pi() div 4)').evaluate(), -1.0, places=13
        )
        self.assertAlmostEqual(
            self.parser.parse('math:tan(math:pi() div 2)').evaluate(),
            1.633123935319537E16, places=13
        )
        self.assertAlmostEqual(
            self.parser.parse('math:tan(-math:pi() div 2)').evaluate(),
            -1.633123935319537E16, places=13
        )
        self.assertAlmostEqual(self.parser.parse('math:tan(math:pi())').evaluate(), 0.0, places=13)
        self.assertTrue(math.isnan(self.parser.parse("math:tan(xs:double('NaN'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:tan(xs:double('INF'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:tan(xs:double('-INF'))").evaluate()))

        expression = 'math:tan(-0.0e0)'
        token = self.parser.parse(expression)
        self.assertEqual(token.source, 'math:tan(-0.0)')
        self.assertEqual(
            repr(token[1]),
            f""
        )
        self.assertEqual(str(token[1]), "'math:tan' function")

    def test_asin_math_function(self):
        self.assertEqual(self.parser.parse('math:asin(())').evaluate(), [])
        self.assertAlmostEqual(self.parser.parse('math:asin(0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:asin(-0.0e0)').evaluate(), -0.0)
        self.assertAlmostEqual(
            self.parser.parse('math:asin(1.0e0)').evaluate(), 1.5707963267948966e0, places=13
        )
        self.assertAlmostEqual(
            self.parser.parse('math:asin(-1.0e0)').evaluate(), -1.5707963267948966e0, places=13
        )
        self.assertTrue(math.isnan(self.parser.parse("math:asin(2.0e0)").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:asin(xs:double('NaN'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:asin(xs:double('INF'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:asin(xs:double('-INF'))").evaluate()))

    def test_acos_math_function(self):
        self.assertEqual(self.parser.parse('math:acos(())').evaluate(), [])
        self.assertAlmostEqual(
            self.parser.parse('math:acos(0.0e0)').evaluate(), 1.5707963267948966e0, places=13
        )
        self.assertAlmostEqual(
            self.parser.parse('math:acos(-0.0e0)').evaluate(), 1.5707963267948966e0, places=13
        )
        self.assertAlmostEqual(self.parser.parse('math:acos(1.0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:acos(-1.0e0)').evaluate(), math.pi)
        self.assertTrue(math.isnan(self.parser.parse("math:acos(2.0e0)").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:acos(xs:double('NaN'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:acos(xs:double('INF'))").evaluate()))
        self.assertTrue(math.isnan(self.parser.parse("math:acos(xs:double('-INF'))").evaluate()))

    def test_atan_math_function(self):
        self.assertEqual(self.parser.parse('math:atan(())').evaluate(), [])
        self.assertAlmostEqual(self.parser.parse('math:atan(0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:atan(-0.0e0)').evaluate(), -0.0)
        self.assertAlmostEqual(
            self.parser.parse('math:atan(1.0e0)').evaluate(), 0.7853981633974483e0, places=13
        )
        self.assertAlmostEqual(
            self.parser.parse('math:atan(-1.0e0)').evaluate(), -0.7853981633974483e0, places=13
        )
        self.assertTrue(math.isnan(self.parser.parse("math:atan(xs:double('NaN'))").evaluate()))
        self.assertAlmostEqual(
            self.parser.parse("math:atan(xs:double('INF'))").evaluate(),
            1.5707963267948966e0, places=5
        )
        self.assertAlmostEqual(
            self.parser.parse("math:atan(xs:double('-INF'))").evaluate(),
            -1.5707963267948966e0, places=5
        )

    def test_atan2_math_function(self):
        self.assertAlmostEqual(self.parser.parse('math:atan2(+0.0e0, 0.0e0)').evaluate(), 0.0)
        self.assertAlmostEqual(self.parser.parse('math:atan2(-0.0e0, 0.0e0)').evaluate(), -0.0)
        self.assertAlmostEqual(self.parser.parse('math:atan2(+0.0e0, -0.0e0)').evaluate(), math.pi)
        self.assertAlmostEqual(self.parser.parse('math:atan2(-0.0e0, -0.0e0)').evaluate(), -math.pi)
        self.assertAlmostEqual(self.parser.parse('math:atan2(-1, 0.0e0)').evaluate(), -math.pi / 2)
        self.assertAlmostEqual(self.parser.parse('math:atan2(+1, 0.0e0)').evaluate(), math.pi / 2)
        self.assertAlmostEqual(self.parser.parse('math:atan2(-0.0e0, -1)').evaluate(), -math.pi)
        self.assertAlmostEqual(self.parser.parse('math:atan2(+0.0e0, -1)').evaluate(), math.pi)
        self.assertAlmostEqual(self.parser.parse('math:atan2(-0.0e0, +1)').evaluate(), -0.0e0)
        self.assertAlmostEqual(self.parser.parse('math:atan2(+0.0e0, +1)').evaluate(), 0.0e0)

    def test_analyze_string_function(self):
        expression = 'fn:analyze-string("The cat sat on the mat.", "unmatchable")'
        token = self.parser.parse(expression)
        self.assertEqual(token.source, expression.replace('"', "'"))
        self.assertEqual(
            repr(token[1]),
            f''
        )
        self.assertEqual(str(token[1]), "'fn:analyze-string' function")

        context = XPathContext(root=self.etree.XML(''))
        result = token.evaluate(context)
        self.assertIsInstance(result, ElementNode)
        root = result.elem
        self.assertEqual(len(root), 1)
        self.assertEqual(root[0].text, "The cat sat on the mat.")

        token = self.parser.parse(r'fn:analyze-string("The cat sat on the mat.", "\w+")')
        result = token.evaluate(context)
        self.assertIsInstance(result, ElementNode)
        root = result.elem
        self.assertEqual(len(root), 12)
        chunks = ['The', ' ', 'cat', ' ', 'sat', ' ', 'on', ' ', 'the', ' ', 'mat', '.']
        for k in
range(len(chunks)): if k % 2: self.assertEqual(root[k].tag, '{http://www.w3.org/2005/xpath-functions}non-match') else: self.assertEqual(root[k].tag, '{http://www.w3.org/2005/xpath-functions}match') self.assertEqual(root[k].text, chunks[k]) token = self.parser.parse(r'fn:analyze-string("2008-12-03", "^(\d+)\-(\d+)\-(\d+)$")') result = token.evaluate(context) self.assertIsInstance(result, ElementNode) root = result.elem self.assertEqual(len(root), 1) ElementTree.register_namespace('', XPATH_FUNCTIONS_NAMESPACE) self.assertEqual( ElementTree.tostring(root, encoding='utf-8').decode('utf-8'), re.sub(r'\n\s*', '', ANALYZE_STRING_1) ) token = self.parser.parse('fn:analyze-string("A1,C15,,D24, X50,", "([A-Z])([0-9]+)")') result = token.evaluate(context) self.assertIsInstance(result, ElementNode) root = result.elem self.assertEqual(len(root), 8) self.assertEqual( ElementTree.tostring(root, encoding='utf-8').decode('utf-8'), re.sub(r'\n\s*', '', ANALYZE_STRING_2) ) def test_has_children_function(self): with self.assertRaises(MissingContextError): self.parser.parse('has-children()').evaluate() with self.assertRaises(MissingContextError): self.parser.parse('fn:has-children(1)').evaluate() context = XPathContext(root=self.etree.ElementTree(self.etree.XML(''))) self.assertTrue(self.parser.parse('has-children()').evaluate(context)) self.assertTrue(self.parser.parse('has-children(.)').evaluate(context)) context = XPathContext(root=self.etree.XML('')) self.assertFalse(self.parser.parse('has-children()').evaluate(context)) self.assertFalse(self.parser.parse('has-children(.)').evaluate(context)) context.item = ElementNode(self.etree.XML('')) self.assertFalse(self.parser.parse('has-children()').evaluate(context)) self.assertFalse(self.parser.parse('has-children(.)').evaluate(context)) context.variables['elem'] = ElementNode(self.etree.XML('
')) self.assertTrue(self.parser.parse('has-children($elem)').evaluate(context)) self.assertFalse(self.parser.parse('has-children($elem/b1)').evaluate(context)) expression = 'has-children($elem/b1)' token = self.parser.parse(expression) self.assertEqual(token.source, expression) self.assertEqual( repr(token), f'' ) self.assertEqual(str(token), "'fn:has-children' function") def test_innermost_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:innermost(A)').evaluate() root = self.etree.XML('') document = self.etree.ElementTree(root) context = XPathContext(root=document) nodes = self.parser.parse('fn:innermost(.)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIs(nodes[0], context.root) context = XPathContext(root=root) nodes = self.parser.parse('fn:innermost(.)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIs(nodes[0], context.root) context = XPathContext(root=document, variables={'nodes': [root, document]}) nodes = self.parser.parse('fn:innermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIsInstance(nodes[0], ElementNode) self.assertIs(nodes[0].value, root) root = self.etree.XML('') document = self.etree.ElementTree(root) context = XPathContext( root=document, variables={'nodes': [root, document, root[0], root[0]]} ) nodes = self.parser.parse('fn:innermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 1) self.assertIs(nodes[0].value, root[0]) context = XPathContext( root=document, variables={'nodes': [document, root[0][0], root, document, root[0], root[1]]} ) nodes = self.parser.parse('fn:innermost($nodes)').evaluate(context) self.assertIsInstance(nodes, list) self.assertEqual(len(nodes), 2) self.assertIs(nodes[0].value, root[0][0]) self.assertIs(nodes[1].value, root[1]) expression = 'innermost($nodes)' token = 
self.parser.parse(expression)
        self.assertEqual(token.source, expression)
        self.assertEqual(
            repr(token),
            f''
        )
        self.assertEqual(str(token), "'fn:innermost' function")

    def test_outermost_function(self):
        with self.assertRaises(MissingContextError):
            self.parser.parse('fn:outermost(A)').evaluate()

        root = self.etree.XML('')
        document = self.etree.ElementTree(root)
        context = XPathContext(root=document)

        nodes = self.parser.parse('fn:outermost(.)').evaluate(context)
        self.assertIsInstance(nodes, list)
        self.assertEqual(len(nodes), 1)
        self.assertIs(nodes[0], context.root)

        context = XPathContext(root=root)
        nodes = self.parser.parse('fn:outermost(.)').evaluate(context)
        self.assertIsInstance(nodes, list)
        self.assertEqual(len(nodes), 1)
        self.assertIs(nodes[0], context.root)

        context = XPathContext(root=document, variables={'nodes': [root, document]})
        nodes = self.parser.parse('fn:outermost($nodes)').evaluate(context)
        self.assertIsInstance(nodes, list)
        self.assertEqual(len(nodes), 1)
        self.assertIsInstance(nodes[0], DocumentNode)
        self.assertIs(nodes[0].value, document)

        root = self.etree.XML('')
        document = self.etree.ElementTree(root)
        context = XPathContext(
            root=document, variables={'nodes': [root, document, root[0], document]}
        )
        nodes = self.parser.parse('fn:outermost($nodes)').evaluate(context)
        self.assertIsInstance(nodes, list)
        self.assertEqual(len(nodes), 1)
        self.assertIsInstance(nodes[0], DocumentNode)
        self.assertIs(nodes[0].value, document)

        context = XPathContext(
            root=document,
            variables={'nodes': [document, root[0][0], root, document, root[0], root[1]]}
        )
        nodes = self.parser.parse('fn:outermost($nodes)').evaluate(context)
        self.assertIsInstance(nodes, list)
        self.assertEqual(len(nodes), 1)
        self.assertIsInstance(nodes[0], DocumentNode)
        self.assertIs(nodes[0].value, document)

        context = XPathContext(
            root=document,
            variables={'nodes': [root[0][0], root[1], root[0]]}
        )
        nodes = self.parser.parse('fn:outermost($nodes)').evaluate(context)
        self.assertIsInstance(nodes, list)
self.assertEqual(len(nodes), 2) self.assertIs(nodes[0].value, root[0]) self.assertIs(nodes[1].value, root[1]) expression = 'outermost($nodes)' token = self.parser.parse(expression) self.assertEqual(token.source, expression) self.assertEqual( repr(token), f'' ) self.assertEqual(str(token), "'fn:outermost' function") def test_parse_xml_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:parse-xml("abcd")').evaluate() root = self.etree.XML('') context = XPathContext(root=self.etree.ElementTree(root)) document = self.parser.parse('fn:parse-xml("abcd")').evaluate(context) self.assertIsInstance(document, DocumentNode) self.assertTrue(is_etree_element(document.document.getroot())) self.assertEqual(document.document.getroot().tag, 'alpha') self.assertEqual(document.document.getroot().text, 'abcd') if self.etree is lxml_etree: self.assertTrue(is_lxml_etree_document(document.document)) else: self.assertFalse(is_lxml_etree_document(document.document)) self.assertEqual(document.document.getroot().tag, 'alpha') self.assertEqual(document.document.getroot().text, 'abcd') self.assertEqual(self.parser.parse('fn:parse-xml(())').evaluate(), []) with self.assertRaises(ValueError) as ctx: self.parser.parse('fn:parse-xml("abcd")').evaluate(context) self.assertIn('FODC0006', str(ctx.exception)) self.assertIn('not a well-formed XML document', str(ctx.exception)) expression = "parse-xml('abcd')" token = self.parser.parse(expression) self.assertEqual(token.source, expression) self.assertEqual( repr(token), f'' ) self.assertEqual(str(token), "'fn:parse-xml' function") def test_parse_xml_fragment_function(self): root = self.etree.XML('') context = XPathContext(root=self.etree.ElementTree(root)) result = self.parser.parse( 'fn:parse-xml-fragment("abcdabcd")' ).evaluate(context) self.assertIsInstance(result, DocumentNode) document = result.document self.assertTrue(is_etree_element(document.getroot())) self.assertEqual(document.getroot().tag, 'root') 
self.assertEqual(document.getroot()[0].tag, 'alpha') self.assertEqual(document.getroot()[0].text, 'abcd') self.assertEqual(document.getroot()[1].tag, 'beta') self.assertEqual(document.getroot()[1].text, 'abcd') # Fragments that are not valid formal documents result = self.parser.parse( 'fn:parse-xml-fragment("abcdabcd")' ).evaluate(context) self.assertIsInstance(result, DocumentNode) self.assertTrue(result.is_extended) self.assertTrue(is_etree_document(result.document)) self.assertEqual(result[0].elem.tag, 'alpha') self.assertEqual(result[0].elem.text, 'abcd') self.assertEqual(result[1].elem.tag, 'beta') self.assertEqual(result[1].elem.text, 'abcd') result = self.parser.parse( 'fn:parse-xml-fragment("He was so kind")' ).evaluate(context) self.assertIsInstance(result, DocumentNode) if not is_lxml_etree_document(result.document): self.assertTrue(result.is_extended) self.assertTrue(is_etree_document(result.document)) self.assertEqual(result[0].value, 'He was ') self.assertEqual(result[1].elem.tag, 'i') self.assertEqual(result[1].elem.text, 'so') self.assertEqual(result[1].elem.tail, ' kind') result = self.parser.parse('fn:parse-xml-fragment("")').evaluate(context) self.assertTrue(is_etree_document(result.document)) self.assertEqual(len(result.children), 0) result = self.parser.parse('fn:parse-xml-fragment(" ")').evaluate(context) self.assertTrue(is_etree_document(result.document)) self.assertEqual(result[0].value, ' ') with self.assertRaises(MissingContextError): self.parser.parse( 'fn:parse-xml(\'\')' ).evaluate() root = self.etree.XML('') context = XPathContext(root=self.etree.ElementTree(root)) with self.assertRaises(ValueError) as ctx: self.parser.parse( 'fn:parse-xml(\'\')' ).evaluate(context) self.assertIn('FODC0006', str(ctx.exception)) self.assertIn('not a well-formed XML document', str(ctx.exception)) expression = "parse-xml-fragment(' ')" token = self.parser.parse(expression) self.assertEqual(token.source, expression) self.assertEqual( repr(token), f'' ) 
        self.assertEqual(str(token), "'fn:parse-xml-fragment' function")

    def test_serialize_function(self):
        root = self.etree.XML('')
        document = self.etree.ElementTree(root)
        context = XPathContext(
            root=document,
            variables={
                'params': ElementTree.XML(
                    ''
                    ' '
                    ''
                ),
                'data': self.etree.XML("")
            }
        )
        result = self.parser.parse('fn:serialize($data, $params)').evaluate(context)
        self.assertEqual(result.replace(' />', '/>'), '')

    def test_odd_children_serialization__issue_056(self):
        root = self.etree.XML('This is important.')
        context = XPathContext(root)
        expected = 'This is important.'
        self.check_value('fn:serialize(.)', expected, context=context)

        context = XPathContext(root)
        expected = 'This is important.'
        self.check_value('fn:serialize(node())', expected, context=context)

    def test_head_function(self):
        self.assertEqual(self.parser.parse('fn:head(1 to 5)').evaluate(), 1)
        self.assertEqual(self.parser.parse('fn:head(("a", "b", "c"))').evaluate(), 'a')
        self.assertEqual(self.parser.parse('fn:head(())').evaluate(), [])

    def test_tail_function(self):
        self.assertListEqual(self.parser.parse('fn:tail(1 to 5)').evaluate(), [2, 3, 4, 5])
        self.assertListEqual(self.parser.parse('fn:tail(("a", "b", "c"))').evaluate(), ['b', 'c'])
        self.assertListEqual(self.parser.parse('fn:tail(("a"))').evaluate(), [])
        self.assertListEqual(self.parser.parse('fn:tail(())').evaluate(), [])

    def test_generate_id_function(self):
        with self.assertRaises(MissingContextError):
            self.parser.parse('fn:generate-id()').evaluate()

        with self.assertRaises(TypeError) as ctx:
            self.parser.parse('fn:generate-id(1)').evaluate()

        self.assertIn('XPTY0004', str(ctx.exception))
        self.assertIn('argument is not a node', str(ctx.exception))

        root = self.etree.XML('')
        context = XPathContext(root=root)
        result = self.parser.parse('fn:generate-id()').evaluate(context)
        self.assertEqual(result, 'ID{}'.format(id(context.item)))

        result = self.parser.parse('fn:generate-id(.)').evaluate(context)
        self.assertEqual(result, 'ID{}'.format(id(context.item)))
context.item = 1 with self.assertRaises(TypeError) as ctx: self.parser.parse('fn:generate-id()').evaluate(context) self.assertIn('XPTY0004', str(ctx.exception)) self.assertIn('context item is not a node', str(ctx.exception)) def test_unparsed_text_function(self): with self.assertRaises(ValueError) as ctx: self.parser.parse('fn:unparsed-text("alpha#fragment")').evaluate() self.assertIn('FOUT1170', str(ctx.exception)) self.assertEqual(self.parser.parse('fn:unparsed-text(())').evaluate(), []) if platform.system() != 'Windows': filepath = pathlib.Path(__file__).absolute().parent.joinpath('resources/sample.xml') file_lines = ['', 'abc àèéìù'] # Checks before that the resource text file is accessible and its content is as expected with filepath.open() as fp: text = fp.read() self.assertListEqual([x.strip() for x in text.strip().split('\n')], file_lines) path = 'fn:unparsed-text("file://{}")'.format(str(filepath)) text = self.parser.parse(path).evaluate() self.assertListEqual([x.strip() for x in text.strip().split('\n')], file_lines) path = 'fn:unparsed-text("file://{}", "unknown")'.format(str(filepath)) with self.assertRaises(ValueError) as ctx: self.parser.parse(path).evaluate() self.assertIn('FOUT1190', str(ctx.exception)) def test_environment_variable_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:environment-variable("PATH")').evaluate() root = self.etree.XML('') context = XPathContext(root=root) path = 'fn:environment-variable("PATH")' self.assertEqual(self.parser.parse(path).evaluate(context), []) context = XPathContext(root=root, allow_environment=True) try: key = list(os.environ)[0] except IndexError: pass else: path = 'fn:environment-variable("{}")'.format(key) self.assertEqual(self.parser.parse(path).evaluate(context), os.environ[key]) def test_available_environment_variables_function(self): with self.assertRaises(MissingContextError): self.parser.parse('fn:available-environment-variables()').evaluate() root = 
self.etree.XML('') context = XPathContext(root=root) path = 'fn:available-environment-variables()' self.assertEqual(self.parser.parse(path).evaluate(context), []) context = XPathContext(root=root, allow_environment=True) self.assertListEqual(self.parser.parse(path).evaluate(context), list(os.environ)) def test_inline_function_expression(self): expression = "function() as xs:integer+ {2, 3, 5, 7, 11, 13}" token = self.parser.parse(expression) with self.assertRaises(MissingContextError): token.evaluate() self.assertEqual(token.source, expression) self.assertEqual( repr(token), f'<_InlineFunction object at {hex(id(token))}>' ) self.assertEqual(str(token), "inline function") root = self.etree.XML('') context = XPathContext(root=root, variables={'a': 9.0, 'b': 3.0}) self.assertListEqual(token(context=context), [2, 3, 5, 7, 11, 13]) expression = "function($a as xs:double, $b as xs:double) " \ "as xs:double {$a * $b} (9.0, 3.0)" token = self.parser.parse(expression) with self.assertRaises(MissingContextError): token.evaluate() self.assertEqual(token.source, expression.replace('} (', '}(')) root = self.etree.XML('') context = XPathContext(root=root) self.assertAlmostEqual(token.evaluate(context), 27.0) token = self.parser.parse("function($a) { $a } (10)") with self.assertRaises(MissingContextError): token.evaluate() self.assertEqual(token.evaluate(context), 10) def test_function_lookup(self): expression = "fn:function-lookup(xs:QName('fn:substring'), 2)('abcd', 2)" token = self.parser.parse(expression) self.assertEqual(token.evaluate(), "bcd") self.assertEqual(token.source, expression) self.assertEqual( repr(token[0][1]), f'' ) self.assertEqual(str(token[0][1]), "'fn:function-lookup' function") with self.xsd_version_parser('1.1'): token = self.parser.parse("(fn:function-lookup(xs:QName('xs:dateTimeStamp'), 1), " "xs:dateTime#1)[1] ('2011-11-11T11:11:11Z')") with self.assertRaises(MissingContextError): token.evaluate() # Context is required by predicate selector [1] root = 
self.etree.XML('') context = XPathContext(root=root) dts = datatypes.DateTimeStamp.fromstring('2011-11-11T11:11:11Z') self.assertEqual(token.evaluate(context), dts) def test_function_name(self): token = self.parser.parse("fn:function-name(fn:substring#2) ") result = datatypes.QName("http://www.w3.org/2005/xpath-functions", "fn:substring") self.assertEqual(token.evaluate(), result) expression = "fn:function-name(function($node) {count($node/*)})" token = self.parser.parse(expression) self.assertEqual(token.source, expression) self.assertEqual( repr(token[1]), f'' ) self.assertEqual(str(token[1]), "'fn:function-name' function") # Context is not used if the argument is a function self.assertEqual(token.evaluate(), []) root = self.etree.XML('') context = XPathContext(root=root, variables={'node': root}) self.assertEqual(token.evaluate(context), []) def test_function_arity(self): token = self.parser.parse("fn:function-arity(fn:substring#2)") self.assertEqual(token.evaluate(), 2) expression = "fn:function-arity(function($node) {name($node)})" token = self.parser.parse(expression) self.assertEqual(token.source, expression) self.assertEqual( repr(token[1]), f'' ) self.assertEqual(str(token[1]), "'fn:function-arity' function") # Context is not used if the argument is a function self.assertEqual(token.evaluate(), 1) root = self.etree.XML('') context = XPathContext(root=root, variables={'node': root}) self.assertEqual(token.evaluate(context), 1) def test_for_each(self): expression = 'fn:for-each(1 to 5, function($a) {$a * $a})' token = self.parser.parse(expression) self.assertEqual(token.source, expression) self.assertEqual( repr(token[1]), f'' ) self.assertEqual(str(token[1]), "'fn:for-each' function") with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertListEqual(token.evaluate(context), [1, 4, 9, 16, 25]) token = self.parser.parse('fn:for-each(("john", "jane"), fn:string-to-codepoints#1)') 
        self.assertListEqual(token.evaluate(context), [106, 111, 104, 110, 106, 97, 110, 101])

        token = self.parser.parse('fn:for-each(("23", "29"), xs:int#1)')
        self.assertListEqual(token.evaluate(context), [23, 29])

    def test_filter(self):
        expression = 'fn:filter(1 to 10, function($a) {$a mod 2 = 0})'
        token = self.parser.parse(expression)
        self.assertEqual(token.source, expression)
        self.assertEqual(
            repr(token[1]),
            f''
        )
        self.assertEqual(str(token[1]), "'fn:filter' function")

        with self.assertRaises(MissingContextError):
            token.evaluate()

        root = self.etree.XML('')
        context = XPathContext(root=root)
        self.assertListEqual(token.evaluate(context), [2, 4, 6, 8, 10])

    def test_fold_left(self):
        expression = 'fn:fold-left(1 to 5, 0, function($a, $b) {$a + $b})'
        token = self.parser.parse(expression)
        self.assertEqual(token.source, expression)
        self.assertEqual(
            repr(token[1]),
            f''
        )
        self.assertEqual(str(token[1]), "'fn:fold-left' function")

        with self.assertRaises(MissingContextError):
            token.evaluate()

        root = self.etree.XML('')
        context = XPathContext(root=root)
        self.assertListEqual(token.evaluate(context), [15])

        token = self.parser.parse('fn:fold-left((2,3,5,7), 1, function($a, $b) { $a * $b })')
        self.assertListEqual(token.evaluate(context), [210])

        token = self.parser.parse(
            'fn:fold-left((true(), false(), false()), false(), function($a, $b) { $a or $b })')
        self.assertListEqual(token.evaluate(context), [True])

        token = self.parser.parse(
            'fn:fold-left((true(), false(), false()), false(), function($a, $b) { $a and $b })')
        self.assertListEqual(token.evaluate(context), [False])

        token = self.parser.parse(
            'fn:fold-left(1 to 5, (), function($a, $b) {($b, $a)})')
        self.assertListEqual(token.evaluate(context), [5, 4, 3, 2, 1])

        token = self.parser.parse(
            'fn:fold-left(1 to 5, "", fn:concat(?, ".", ?))')
        self.assertListEqual(token.evaluate(context), [".1.2.3.4.5"])

        token = self.parser.parse(
            'fn:fold-left(1 to 5, "$zero", fn:concat("$f(", ?, ", ", ?, ")"))')
self.assertListEqual(token.evaluate(context), ["$f($f($f($f($f($zero, 1), 2), 3), 4), 5)"]) def test_fold_right(self): expression = 'fn:fold-right(1 to 5, 0, function($a, $b) {$a + $b})' token = self.parser.parse(expression) self.assertEqual(token.source, expression) self.assertEqual( repr(token[1]), f'' ) self.assertEqual(str(token[1]), "'fn:fold-right' function") with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertListEqual(token.evaluate(context), [15]) token = self.parser.parse('fn:fold-right(1 to 5, "", fn:concat(?, ".", ?))') self.assertListEqual(token.evaluate(context), ["1.2.3.4.5."]) token = self.parser.parse( 'fn:fold-right(1 to 5, "$zero", concat("$f(", ?, ", ", ?, ")"))') self.assertListEqual(token.evaluate(context), ["$f(1, $f(2, $f(3, $f(4, $f(5, $zero)))))"]) def test_for_each_pair(self): expression = 'fn:for-each-pair(("a", "b", "c"), ("x", "y", "z"), concat#2)' token = self.parser.parse(expression) self.assertEqual(token.source, expression.replace('"', "'")) self.assertEqual( repr(token[1]), f'' ) self.assertEqual(str(token[1]), "'fn:for-each-pair' function") self.assertListEqual(token.evaluate(), ["ax", "by", "cz"]) token = self.parser.parse('fn:for-each-pair(1 to 5, 1 to 5, function($a, $b){10*$a + $b})') with self.assertRaises(MissingContextError): token.evaluate() root = self.etree.XML('') context = XPathContext(root=root) self.assertListEqual(token.evaluate(context), [11, 22, 33, 44, 55]) def test_format_integer(self): self.check_value("format-integer(57, 'I')", 'LVII') self.check_value("format-integer(594, 'i')", 'dxciv') self.check_value("format-integer(7, 'a')", 'g') self.check_value("format-integer(-90956, 'A')", '-EDNH') self.check_value("format-integer(123, 'w')", 'one hundred and twenty-three') self.check_value("format-integer(-8912, 'W')", "-EIGHT THOUSAND NINE HUNDRED AND TWELVE") self.check_value("format-integer(17089674, 'Ww')", "Seventeen Million 
Eighty-Nine Thousand Six Hundred And Seventy-Four") self.check_value("format-integer(123, '0000')", '0123') self.check_source("format-integer(-8912, 'W')") def test_path_function__issue_067(self): xml_sample = dedent(""" item 1 value 1 """) root = self.etree.parse(io.StringIO(xml_sample)) paths = select(root, '//*/path()', parser=self.parser.__class__) expected = [ '/Q{}root[1]', '/Q{}root[1]/Q{}item[1]', '/Q{}root[1]/Q{}item[1]/Q{}name[1]', '/Q{}root[1]/Q{}item[1]/Q{}value[1]' ] self.assertListEqual(paths, expected) root = self.etree.XML(xml_sample) paths = select(root, '//*/path()', parser=self.parser.__class__) expected = [ 'Q{http://www.w3.org/2005/xpath-functions}root()', 'Q{http://www.w3.org/2005/xpath-functions}root()/Q{}item[1]', 'Q{http://www.w3.org/2005/xpath-functions}root()/Q{}item[1]/Q{}name[1]', 'Q{http://www.w3.org/2005/xpath-functions}root()/Q{}item[1]/Q{}value[1]' ] self.assertListEqual(paths, expected) def test_path_function_with_namespaces(self): xml_sample = dedent(""" item 1 value 1 """) root = self.etree.parse(io.StringIO(xml_sample)) paths = select(root, '//*/path()', parser=self.parser.__class__) expected = [ '/Q{http://xpath.test/ns}root[1]', '/Q{http://xpath.test/ns}root[1]/Q{}item[1]', '/Q{http://xpath.test/ns}root[1]/Q{}item[1]/Q{}name[1]', '/Q{http://xpath.test/ns}root[1]/Q{}item[1]/Q{bar}value[1]' ] self.assertListEqual(paths, expected) root = self.etree.XML(xml_sample) paths = select(root, '//*/path()', parser=self.parser.__class__) expected = [ 'Q{http://www.w3.org/2005/xpath-functions}root()', 'Q{http://www.w3.org/2005/xpath-functions}root()/Q{}item[1]', 'Q{http://www.w3.org/2005/xpath-functions}root()/Q{}item[1]/Q{}name[1]', 'Q{http://www.w3.org/2005/xpath-functions}root()/Q{}item[1]/Q{bar}value[1]' ] self.assertListEqual(paths, expected) def test_path_function_with_same_child(self): xml_sample = dedent(""" item 1 value 1 item 2 item 3 """) root = self.etree.parse(io.StringIO(xml_sample)) paths = select(root, '//*/path()', 
                       parser=self.parser.__class__)
        expected = [
            '/Q{http://xpath.test/ns}root[1]',
            '/Q{http://xpath.test/ns}root[1]/Q{}item[1]',
            '/Q{http://xpath.test/ns}root[1]/Q{}item[1]/Q{}name[1]',
            '/Q{http://xpath.test/ns}root[1]/Q{}item[1]/Q{}value[1]',
            '/Q{http://xpath.test/ns}root[1]/Q{}item[1]/Q{}name[2]',
            '/Q{http://xpath.test/ns}root[1]/Q{}item[1]/Q{}name[3]',
            '/Q{http://xpath.test/ns}root[1]/Q{}item2[1]',
            '/Q{http://xpath.test/ns}root[1]/Q{}item[2]',
        ]
        self.assertListEqual(paths, expected)


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath30FunctionsTest(XPath30FunctionsTest):
    etree = lxml_etree


class XPath30ConstructorsTest(test_xpath2_constructors.XPath2ConstructorsTest):

    def setUp(self):
        self.parser = XPath30Parser(namespaces=self.namespaces)


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath30ConstructorsTest(XPath30ConstructorsTest):
    etree = lxml_etree


if __name__ == '__main__':
    unittest.main()
sissaschool-elementpath-d3688c7/tests/test_xpath31.py000066400000000000000000001465521476131650400230070ustar00rootroot00000000000000
#!/usr/bin/env python
#
# Copyright (c), 2018-2022, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
#
# Note: Many tests are built using the examples of the XPath standards,
# published by W3C under the W3C Document License.
# # References: # https://www.w3.org/TR/xpath-3/ # https://www.w3.org/TR/xpath-30/ # https://www.w3.org/TR/xpath-31/ # https://www.w3.org/Consortium/Legal/2015/doc-license # https://www.w3.org/TR/charmod-norm/ # import unittest import os from textwrap import dedent from typing import cast try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None try: import xmlschema except ImportError: xmlschema = None else: xmlschema.XMLSchema.meta_schema.build() from elementpath import XPathContext, select from elementpath.etree import etree_deep_equal from elementpath.datatypes import DateTime, Base64Binary from elementpath.xpath_nodes import DocumentNode from elementpath.xpath3 import XPath31Parser from elementpath.xpath_tokens import XPathMap, XPathArray try: from tests import test_xpath30 except ImportError: import test_xpath30 MAP_WEEKDAYS = """\ map { "Su" : "Sunday", "Mo" : "Monday", "Tu" : "Tuesday", "We" : "Wednesday", "Th" : "Thursday", "Fr" : "Friday", "Sa" : "Saturday" }""" MAP_WEEKDAYS_DE = """\ map{0:"Sonntag", 1:"Montag", 2:"Dienstag", 3:"Mittwoch", 4:"Donnerstag", 5:"Freitag", 6:"Samstag"}""" NESTED_MAP = """\ map { "book": map { "title": "Data on the Web", "year": 2000, "author": [ map { "last": "Abiteboul", "first": "Serge" }, map { "last": "Buneman", "first": "Peter" }, map { "last": "Suciu", "first": "Dan" } ], "publisher": "Morgan Kaufmann Publishers", "price": 39.95 } }""" class XPath31ParserTest(test_xpath30.XPath30ParserTest): def setUp(self): self.parser = XPath31Parser(namespaces=self.namespaces) def test_map_weekdays(self): token = self.parser.parse(MAP_WEEKDAYS) self.assertIsInstance(token, XPathMap) map_value = {'Su': 'Sunday', 'Mo': 'Monday', 'Tu': 'Tuesday', 'We': 'Wednesday', 'Th': 'Thursday', 'Fr': 'Friday', 'Sa': 'Saturday'} self.assertEqual(token.symbol, 'map') self.assertEqual(token.label, 'map') self.assertEqual(token.source, f'map{map_value!r}'.replace(': ', ':')) self.assertEqual( repr(token), f"" ) 
self.assertEqual(str(token), 'not evaluated map constructor with 7 entries') self.assertDictEqual(token.evaluate()._map, map_value) self.assertTrue( repr(token.evaluate()).startswith('')) self.assertEqual(token.evaluate(context), ['Monday']) def test_nested_map(self): token = self.parser.parse(f'{NESTED_MAP}("book")("title")') self.assertEqual(token.evaluate(), 'Data on the Web') self.assertEqual(token.symbol, '(') self.assertEqual(token.label, 'expression') self.assertTrue(token.source.startswith("map{'book':map{'title':'Data on the Web', ")) self.assertTrue(token.source.endswith(", 'price':39.95}}('book')('title')")) self.assertEqual( repr(token), f'<_LeftParenthesisExpression object at {hex(id(token))}>' ) self.assertEqual(str(token), "function call expression") token = self.parser.parse(f'{NESTED_MAP}("book")("author")') self.assertIsInstance(token.evaluate(), XPathArray) token = self.parser.parse(f'{NESTED_MAP}("book")("author")(1)("last")') self.assertEqual(token.evaluate(), 'Abiteboul') def test_map_ambiguity(self): self.parser.namespaces['a'] = 'http://xpath.test/ns' try: with self.assertRaises(SyntaxError): self.parser.parse('map{a:b}') token = cast(XPathMap, self.parser.parse('map{a :b}')) self.assertEqual(token[0].symbol, '(name)') self.assertEqual(token[0].value, 'a') self.assertEqual(token._values[0].symbol, '(name)') self.assertEqual(token._values[0].value, 'b') token = cast(XPathMap, self.parser.parse('map{a: b}')) self.assertEqual(token[0].symbol, '(name)') self.assertEqual(token[0].value, 'a') self.assertEqual(token._values[0].symbol, '(name)') self.assertEqual(token._values[0].value, 'b') token = self.parser.parse('map{a:b:c}') self.assertEqual(token[0].symbol, ':') self.assertEqual(token[0].value, 'a:b') self.assertEqual(token._values[0].symbol, '(name)') self.assertEqual(token._values[0].value, 'c') token = self.parser.parse('map{a:*:c}') self.assertEqual(token[0].symbol, ':') self.assertEqual(token[0].value, 'a:*') 
self.assertEqual(token._values[0].symbol, '(name)') self.assertEqual(token._values[0].value, 'c') token = self.parser.parse('map{*:b:c}') self.assertEqual(token[0].symbol, ':') self.assertEqual(token[0].value, '*:b') self.assertEqual(token._values[0].symbol, '(name)') self.assertEqual(token._values[0].value, 'c') finally: self.parser.namespaces.pop('a') def test_curly_array_constructor(self): token = self.parser.parse('array { 1, 2, 5, 7 }') self.assertIsInstance(token, XPathArray) self.assertEqual(token.symbol, 'array') self.assertEqual(token.label, 'array') self.assertEqual(token.source, 'array{1, 2, 5, 7}') self.assertEqual( repr(token), f'' ) self.assertEqual(str(token), 'not evaluated curly array constructor with 4 items') array = token.evaluate() # Create a new object ... self.assertEqual( repr(token), f'' ) self.assertEqual(str(array), '[1, 2, 5, 7]') def test_square_array_constructor(self): token = self.parser.parse('[ 1, 2, 5, 7 ]') self.assertIsInstance(token, XPathArray) self.assertEqual(token.symbol, '[') self.assertEqual(token.label, 'array') self.assertEqual(token.source, '[1, 2, 5, 7]') self.assertEqual( repr(token), f"" ) self.assertEqual(str(token), 'not evaluated square array constructor with 4 items') array = token.evaluate() self.assertEqual( repr(array), f"<{array.__class__.__name__} object at {hex(id(array))}>" ) self.assertEqual(str(array), '[1, 2, 5, 7]') def test_array_lookup(self): token = self.parser.parse('array { 1, 2, 5, 7 }(4)') self.assertEqual(token.evaluate(), 7) self.assertEqual(token.source, 'array{1, 2, 5, 7}(4)') self.assertEqual(repr(token), f'<_LeftParenthesisExpression object at {hex(id(token))}>') self.assertEqual( repr(token[0]), f"" ) self.assertEqual(str(token), "function call expression") token = self.parser.parse('[ 1, 2, 5, 7 ](4)') self.assertEqual(token.evaluate(), 7) self.assertEqual(token.source, '[1, 2, 5, 7](4)') self.assertEqual(repr(token), f'<_LeftParenthesisExpression object at {hex(id(token))}>') 
self.assertEqual( repr(token[0]), f"" ) self.assertEqual(str(token), "function call expression") def test_map_size_function(self): token = self.parser.parse('map:size(map{})') self.assertEqual(token.evaluate(), 0) self.assertEqual(str(token), "'map:size' function") self.assertEqual( repr(token), f'<_PrefixedReferenceToken object at {hex(id(token))}>' ) self.assertEqual(token.source, 'map:size(map{})') self.check_value('map:size(map{"true":1, "false":0})', 2) def test_map_keys_function(self): token = self.parser.parse('map:keys(map{})') self.assertListEqual(token.evaluate(), []) self.assertEqual(str(token), "'map:keys' function") self.assertEqual( repr(token), f'<_PrefixedReferenceToken object at {hex(id(token))}>' ) self.assertEqual(token.source, 'map:keys(map{})') self.check_value('map:keys(map{1:"yes", 2:"no"})', {1, 2}) def test_map_contains_function(self): self.check_value('map:contains(map{}, 1)', False) self.check_value('map:contains(map{}, "xyz")', False) self.check_value('map:contains(map{1:"yes", 2:"no"}, 1)', True) self.check_value('map:contains(map{"xyz":23}, "xyz")', True) self.check_value('map:contains(map{"abc":23, "xyz":()}, "xyz")', True) self.check_source('map:contains(map{"xyz":23}, "xyz")', "map:contains(map{'xyz':23}, 'xyz')") context = XPathContext(self.etree.XML('')) expression = f"let $x := {MAP_WEEKDAYS_DE} return map:contains($x, 2)" self.check_value(expression, [True], context=context) expression = f"let $x := {MAP_WEEKDAYS_DE} return map:contains($x, 9)" self.check_value(expression, [False], context=context) def test_map_get_function(self): context = XPathContext(self.etree.XML('')) expression = f"let $x := {MAP_WEEKDAYS} return map:get($x, 'Mo')" self.check_value(expression, ['Monday'], context=context) # The source property returns a compacted normalized form expected = expression.\ replace('\n', '').\ replace('\r', '').\ replace(' ', ' ').\ replace('"', "'").\ replace('map {', 'map{').\ replace('{ ', '{').replace(' : ', ':') 
self.check_source(expression, expected) expression = f"let $x := {MAP_WEEKDAYS} return map:get($x, 'Mon')" self.check_value(expression, [], context=context) def test_map_put_function(self): context = XPathContext(self.etree.XML('')) expression = f'let $week := {MAP_WEEKDAYS_DE} return map:put($week, 6, "Sonnabend")' result = XPathMap(self.parser, items={ 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 4: "Donnerstag", 5: "Freitag", 6: "Sonnabend" }) self.check_value(expression, [result], context=context) expected = expression.\ replace('\n', '').\ replace('\r', '').\ replace('"', "'").\ replace(' ', ' ') self.check_source(expression, expected) def test_map_remove_function(self): context = XPathContext(self.etree.XML('')) expression = f'let $week := {MAP_WEEKDAYS_DE} return map:remove($week, 4)' result = XPathMap(self.parser, items={ 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 5: "Freitag", 6: "Samstag" }) self.check_value(expression, [result], context=context) expression = f'let $week := {MAP_WEEKDAYS_DE} return map:remove($week, (0, 6 to 7))' result = XPathMap(self.parser, items={ 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 4: "Donnerstag", 5: "Freitag" }) self.check_value(expression, [result], context=context) expression = f'let $week := {MAP_WEEKDAYS_DE} return map:remove($week, ())' result = XPathMap(self.parser, items={ 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 4: "Donnerstag", 5: "Freitag", 6: "Samstag" }) self.check_value(expression, [result], context=context) expected = expression.\ replace('\n', '').\ replace('\r', '').\ replace('"', "'").\ replace(' ', ' ') self.check_source(expression, expected) expression = f'let $week := {MAP_WEEKDAYS_DE} return map:remove($week, 4)' result = XPathMap(self.parser, items={ 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", # 4: "Donnerstag", 5: "Freitag", 6: "Samstag" }) self.check_value(expression, [result], context=context) def test_map_entry_function(self): context = 
XPathContext(self.etree.XML('')) expression = 'map:entry("M", "Monday")' result = XPathMap(self.parser, items={'M': 'Monday'}) self.check_value(expression, result, context=context) self.check_source(expression, expression.replace('"', "'")) # Alternative low-level, token-based check token = self.parser.parse('map:entry("M", "Monday")') result = token.evaluate(context) self.assertIsInstance(result, XPathMap) self.assertEqual(len(result), 1) self.assertEqual(result('M', context=context), 'Monday') def test_map_merge_function(self): week = {0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 4: "Donnerstag", 5: "Freitag", 6: "Samstag"} context = XPathContext( root=self.etree.XML(''), variables={'week': XPathMap(self.parser, week)} ) expression = 'map:merge(())' result = XPathMap(self.parser, items={}) self.check_value(expression, result, context=context) expression = 'map:merge((map:entry(0, "no"), map:entry(1, "yes")))' result = XPathMap(self.parser, items={0: 'no', 1: 'yes'}) self.check_value(expression, result, context=context) self.check_source(expression, expression.replace('"', "'")) expression = 'map:merge(($week, map{7:"Unbekannt"}))' result = XPathMap(self.parser, items={ 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 4: "Donnerstag", 5: "Freitag", 6: "Samstag", 7: "Unbekannt" }) self.check_value(expression, result, context=context) self.check_source(expression, expression.replace('"', "'")) expression = 'map:merge(($week, map{6:"Sonnabend"}), map{"duplicates":"use-last"})' result = XPathMap(self.parser, items={ 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 4: "Donnerstag", 5: "Freitag", 6: "Sonnabend" }) self.check_value(expression, result, context=context) self.check_source(expression, expression.replace('"', "'")) expression = 'map:merge(($week, map{6:"Sonnabend"}), map{"duplicates":"use-first"}) ' result = XPathMap(self.parser, items={ 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 4: "Donnerstag", 5: "Freitag", 6: 
"Samstag" }) self.check_value(expression, result, context=context) self.check_source(expression, expression.strip().replace('"', "'")) expression = 'map:merge(($week, map{6:"Sonnabend"}), map{"duplicates":"combine"})' result = XPathMap(self.parser, items={ 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch", 4: "Donnerstag", 5: "Freitag", 6: ["Samstag", "Sonnabend"] }) self.check_value(expression, result, context=context) def test_map_find_function(self): map1 = XPathMap(self.parser, {0: 'no', 1: 'yes'}) map2 = XPathMap(self.parser, {0: 'non', 1: 'oui'}) map3 = XPathMap(self.parser, {0: 'nein', 1: ['ja', 'doch']}) context = XPathContext( root=self.etree.XML(''), variables={'responses': XPathArray(self.parser, [map1, map2, map3])} ) expression = 'map:find($responses, 0)' result = XPathArray(self.parser, items=['no', 'non', 'nein']) self.check_value(expression, result, context=context) expression = 'map:find($responses, 1)' result = XPathArray(self.parser, items=['yes', 'oui', ['ja', 'doch']]) self.check_value(expression, result, context=context) expression = 'map:find($responses, 2)' result = XPathArray(self.parser, items=[]) self.check_value(expression, result, context=context) self.check_source(expression, expression) array1 = XPathArray(self.parser, items=[]) map1 = XPathMap(self.parser, {"name": "engine", "id": "YW678", "parts": array1}) array2 = XPathArray(self.parser, items=[map1]) map2 = XPathMap(self.parser, {"name": "car", "id": "QZ123", "parts": array2}) context = XPathContext( root=self.etree.XML(''), variables={'inventory': map2} ) expression = 'map:find($inventory, "parts")' result = XPathArray(self.parser, items=[array2, array1]) self.check_value(expression, result, context=context) self.check_source(expression, expression.replace('"', "'")) expression = 'let $inventory := map{"name":"car", "id":"QZ123", ' \ '"parts": [map{"name":"engine", "id":"YW678", "parts":[]}]} ' \ 'return map:find($inventory, "parts")' token = self.parser.parse(expression) 
self.assertEqual(token.evaluate(context), [result]) def test_map_for_each_function(self): context = XPathContext(self.etree.XML('')) expression = 'map:for-each(map{1:"yes", 2:"no"}, function($k, $v){$k})' self.check_value(expression, [1, 2], context=context) expression = 'distinct-values(map:for-each(map{1:"yes", 2:"no"}, ' \ 'function($k, $v) {$v}))' self.check_value(expression, ['yes', 'no'], context=context) self.check_source(expression, expression.replace('"', "'")) expression = 'map:merge(map:for-each(map{"a":1, "b":2}, ' \ 'function($k, $v){map:entry($k, $v+1)}))' result = XPathMap(self.parser, {'a': 2, 'b': 3}) self.check_value(expression, result, context=context) def test_array_size_function(self): self.check_value('array:size(["a", "b", "c"])', 3) self.check_value('array:size(["a", ["b", "c"]])', 2) self.check_value('array:size([ ])', 0) self.check_value('array:size([[ ]])', 1) self.check_source('array:size(["a", ["b", "c"]])', "array:size(['a', ['b', 'c']])") def test_array_get_function(self): expression = 'array:get(["a", "b", "c"], 2)' self.check_value(expression, 'b') self.check_source(expression, expression.replace('"', "'")) token = self.parser.parse('array:get(["a", ["b", "c"]], 2)') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result._array, ['b', 'c']) def test_array_put_function(self): expression = ' array:put(["a", "b", "c"], 2, "d")' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['a', 'd', 'c']) self.check_source(expression, expression.lstrip().replace('"', "'")) token = self.parser.parse('array:put(["a", "b", "c"], 2, ("d", "e"))') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['a', ['d', 'e'], 'c']) token = self.parser.parse('array:put(["a"], 1, ["d", "e"])') result = token.evaluate() self.assertIsInstance(result, XPathArray) 
self.assertIsInstance(result.items()[0], XPathArray) self.assertListEqual(result.items()[0].items(), ['d', 'e']) def test_array_insert_before_function(self): expression = 'array:insert-before(["a", "b", "c", "d"], 3, ("x", "y"))' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['a', 'b', ['x', 'y'], 'c', 'd']) self.check_source(expression, expression.replace('"', "'")) token = self.parser.parse('array:insert-before(["a", "b", "c", "d"], 5, ("x", "y"))') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['a', 'b', 'c', 'd', ['x', 'y']]) token = self.parser.parse('array:insert-before(["a", "b", "c", "d"], 3, ["x", "y"])') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual( result.items(), ['a', 'b', XPathArray(self.parser, ['x', 'y']), 'c', 'd'] ) def test_array_append_function(self): token = self.parser.parse('array:append(["a", "b", "c"], "d")') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['a', 'b', 'c', 'd']) expression = 'array:append(["a", "b", "c"], ("d", "e"))' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['a', 'b', 'c', ['d', 'e']]) self.check_source(expression, expression.replace('"', "'")) token = self.parser.parse('array:append(["a", "b", "c"], ["d", "e"])') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual( result.items(), ['a', 'b', 'c', XPathArray(self.parser, ['d', 'e'])] ) def test_array_subarray_function(self): token = self.parser.parse('array:subarray(["a", "b", "c", "d"], 2)') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['b', 'c', 'd']) token = self.parser.parse('array:subarray(["a", "b", "c", "d"], 5)') 
result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), []) token = self.parser.parse('array:subarray(["a", "b", "c", "d"], 2, 0)') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), []) token = self.parser.parse('array:subarray(["a", "b", "c", "d"], 2, 1)') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['b']) token = self.parser.parse('array:subarray(["a", "b", "c", "d"], 2, 2)') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['b', 'c']) expression = 'array:subarray(["a", "b", "c", "d"], 5, 0)' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), []) self.check_source(expression, expression.replace('"', "'")) expression = 'array:subarray([ ], 1, 0)' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), []) self.check_source(expression, expression.replace('[ ]', '[]')) def test_array_head_function(self): self.check_value('array:head([5, 6, 7, 8])', 5) self.check_value('array:head([("a", "b"), ("c", "d")])', ['a', 'b']) expression = 'array:head([["a", "b"], ["c", "d"]])' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ['a', 'b']) self.check_source(expression, expression.replace('"', "'")) def test_array_tail_function(self): expression = 'array:tail([5, 6, 7, 8])' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), [6, 7, 8]) self.check_source(expression, expression) token = self.parser.parse('array:tail([5])') result = token.evaluate() self.assertIsInstance(result, XPathArray) 
self.assertListEqual(result.items(), []) def test_array_reverse_function(self): token = self.parser.parse('array:reverse(["a", "b", "c", "d"])') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["d", "c", "b", "a"]) expression = 'array:reverse([("a", "b"), ("c", "d")])' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), [["c", "d"], ["a", "b"]]) self.check_source(expression, expression.replace('"', "'")) expression = 'array:reverse([(1 to 5)])' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), [[1, 2, 3, 4, 5]]) self.check_source(expression, expression) token = self.parser.parse('array:reverse([])') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), []) def test_array_remove_function(self): token = self.parser.parse('array:remove(["a", "b", "c", "d"], 1)') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["b", "c", "d"]) token = self.parser.parse('array:remove(["a", "b", "c", "d"], 2)') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["a", "c", "d"]) token = self.parser.parse('array:remove(["a"], 1)') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), []) expression = 'array:remove(["a", "b", "c", "d"], 1 to 3)' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["d"]) self.check_source(expression, expression.replace('"', "'")) token = self.parser.parse('array:remove(["a", "b", "c", "d"], ())') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["a", "b", "c", "d"]) 
self.wrong_value('array:remove(["a", "b", "c", "d"], 0)', 'FOAY0001') def test_array_join_function(self): token = self.parser.parse('array:join(())') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), []) token = self.parser.parse('array:join([1, 2, 3])') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), [1, 2, 3]) token = self.parser.parse(' array:join((["a", "b"], ["c", "d"]))') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["a", "b", "c", "d"]) token = self.parser.parse('array:join((["a", "b"], ["c", "d"], [ ]))') result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["a", "b", "c", "d"]) expression = 'array:join((["a", "b"], ["c", "d"], [["e", "f"]]))' token = self.parser.parse(expression) result = token.evaluate() self.assertIsInstance(result, XPathArray) self.assertListEqual( result.items(), ["a", "b", "c", "d", XPathArray(self.parser, ['e', 'f'])] ) self.check_source(expression, expression.replace('"', "'")) def test_array_flatten_function(self): token = self.parser.parse('array:flatten([1, 4, 6, 5, 3])') result = token.evaluate() self.assertListEqual(result, [1, 4, 6, 5, 3]) expression = 'array:flatten(([1, 2, 5], [[10, 11], 12], [], 13))' token = self.parser.parse(expression) result = token.evaluate() self.assertListEqual(result, [1, 2, 5, 10, 11, 12, 13]) self.check_source(expression, expression) expression = 'array:flatten([(1, 0), (1, 1), (0, 1), (0, 0)])' token = self.parser.parse(expression) result = token.evaluate() self.assertListEqual(result, [1, 0, 1, 1, 0, 1, 0, 0]) self.check_source(expression, expression) def test_array_for_each_function(self): context = XPathContext(self.etree.XML('')) expression = 'array:for-each(["A", "B", 1, 2], function($z) {$z instance of xs:integer})' token = self.parser.parse(expression) result = 
token.evaluate(context) self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), [False, False, True, True]) expression = 'array:for-each(["the cat", "sat", "on the mat"], fn:tokenize#1)' token = self.parser.parse(expression) result = token.evaluate(context) self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), [["the", "cat"], "sat", ["on", "the", "mat"]]) self.check_source(expression, expression.replace('"', "'")) def test_array_for_each_pair_function(self): context = XPathContext(self.etree.XML('')) expression = 'array:for-each-pair(["A", "B", "C"], [1, 2, 3], ' \ 'function($x, $y) { array {$x, $y}})' token = self.parser.parse(expression) result = token.evaluate(context) self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), [ XPathArray(self.parser, ['A', 1]), XPathArray(self.parser, ['B', 2]), XPathArray(self.parser, ['C', 3]) ]) expected = expression.replace('"', "'").replace('{ array ', '{array') self.check_source(expression, expected) expression = 'let $A := ["A", "B", "C", "D"] ' \ 'return array:for-each-pair($A, array:tail($A), concat#2)' token = self.parser.parse(expression) result = token.evaluate(context) self.assertListEqual(result, [XPathArray(self.parser, ['AB', 'BC', 'CD'])]) self.check_source(expression, expression.replace('"', "'")) def test_array_filter_function(self): context = XPathContext(self.etree.XML('')) expression = 'array:filter(["A", "B", 1, 2], function($x) {$x instance of xs:integer})' token = self.parser.parse(expression) result = token.evaluate(context) self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), [1, 2]) expression = 'array:filter(["the cat", "sat", "on the mat"], ' \ 'function($s) {fn:count(fn:tokenize($s)) gt 1})' token = self.parser.parse(expression) result = token.evaluate(context) self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["the cat", "on the mat"]) self.check_source(expression, 
expression.replace('"', "'")) expression = 'array:filter(["A", "B", "", 0, 1], boolean#1)' token = self.parser.parse(expression) result = token.evaluate(context) self.assertIsInstance(result, XPathArray) self.assertListEqual(result.items(), ["A", "B", 1]) self.check_source(expression, expression.replace('"', "'")) def test_array_fold_left_function(self): context = XPathContext(self.etree.XML('')) expression = 'array:fold-left([true(), true(), false()], true(), ' \ 'function($x, $y){$x and $y})' self.check_value(expression, [False], context=context) expression = 'array:fold-left([true(), true(), false()], false(), ' \ 'function($x, $y){$x or $y})' self.check_value(expression, [True], context=context) expression = 'array:fold-left([1, 2, 3], [], function($x, $y){[$x, $y]})' ar1 = XPathArray(self.parser, []) ar2 = XPathArray(self.parser, items=[ar1, 1]) ar3 = XPathArray(self.parser, items=[ar2, 2]) ar4 = XPathArray(self.parser, items=[ar3, 3]) self.check_value(expression, [ar4], context=context) self.check_source(expression, expression.replace('){', ') {')) def test_array_fold_right_function(self): context = XPathContext(self.etree.XML('')) expression = 'array:fold-right([true(), true(), false()], true(), ' \ 'function($x, $y){$x and $y})' self.check_value(expression, [False], context=context) expression = 'array:fold-right([true(), true(), false()], false(), ' \ 'function($x, $y){$x or $y})' self.check_value(expression, [True], context=context) expression = 'array:fold-right([1,2,3], [], function($x, $y){[$x, $y]})' ar1 = XPathArray(self.parser, []) ar2 = XPathArray(self.parser, items=[3, ar1]) ar3 = XPathArray(self.parser, items=[2, ar2]) ar4 = XPathArray(self.parser, items=[1, ar3]) self.check_value(expression, [ar4], context=context) self.check_source(expression, expression.replace('){', ') {').replace('2,', ' 2, ')) def test_array_sort_function(self): expression = 'array:sort([1, 4, 6, 5, 3])' self.check_value(expression, XPathArray(self.parser, [1, 3, 4, 5, 6])) 
expression = 'array:sort([1, -2, 5, 10, -10, 10, 8], (), fn:abs#1)' self.check_value(expression, XPathArray(self.parser, [1, -2, 5, 8, 10, -10, 10])) self.check_source(expression, expression) expression = 'array:sort([(1,0), (1,1), (0,1), (0,0)])' self.check_value(expression, XPathArray(self.parser, [[0, 0], [0, 1], [1, 0], [1, 1]])) def test_sort_function(self): expression = 'fn:sort((1, 4, 6, 5, 3))' self.check_value(expression, [1, 3, 4, 5, 6]) expression = 'fn:sort((1, -2, 5, 10, -10, 10, 8), (), fn:abs#1)' self.check_value(expression, [1, -2, 5, 8, 10, -10, 10]) self.check_source(expression, expression) def test_parse_json_function(self): expression = 'parse-json(\'{"x":1, "y":[3,4,5]}\')' result = XPathMap(self.parser, {'x': 1, 'y': XPathArray(self.parser, [3, 4, 5])}) self.check_value(expression, result) expression = 'parse-json(\'"abcd"\')' self.check_value(expression, 'abcd') expression = 'parse-json(\'{"x":"\\\\", "y":"\\u0025"}\')' result = XPathMap(self.parser, {"x": "\\", "y": "%"}) self.check_value(expression, result) self.check_source(expression, expression) expression = 'parse-json(\'{"x":"\\\\", "y":"\\u0025"}\', map{\'escape\':true()})' result = XPathMap(self.parser, {"x": "\\\\", "y": "%"}) self.check_value(expression, result) expression = 'parse-json(\'{"x":"\\\\", "y":"\\u0000"}\', ' \ 'map{\'fallback\':function($s){\'[\'||$s||\']\'}})' result = XPathMap(self.parser, {"x": "\\", "y": "[\\u0000]"}) # fallback inline function requires a context for evaluation context = XPathContext(root=self.etree.XML('')) self.check_value(expression, result, context=context) def test_load_xquery_module_function(self): self.wrong_value('load-xquery-module("")', 'FOQM0001') with self.assertRaises(RuntimeError) as ctx: self.check_value('load-xquery-module("./xquery-module")') self.assertIn('FOQM0006', str(ctx.exception)) def test_transform_function(self): with self.assertRaises(RuntimeError) as ctx: self.check_value('transform(map{})') self.assertIn('FOXT0004', 
str(ctx.exception)) def test_random_number_generator_function(self): context = None expression = 'random-number-generator()' token = self.parser.parse(expression) self.assertEqual(token.source, expression) result = token.evaluate() self.assertIsInstance(result, XPathMap) self.assertListEqual(list(result.keys()), ['number', 'next', 'permute']) self.assertTrue(0 <= result('number', context=context) <= 1) seq = result('permute', context=context)(range(10)) _seq = tuple(seq) self.assertNotEqual(seq, list(range(10))) self.assertNotEqual(seq, result('permute', context=context)(seq)) self.assertNotEqual(seq, result('permute', context=context)(range(10))) self.assertListEqual(seq, list(_seq)) expression = 'random-number-generator(1000)' token = self.parser.parse(expression) self.assertEqual(token.source, expression) result = token.evaluate() self.assertNotEqual(seq, result('permute', context=context)(seq)) def test_apply_function(self): expression = 'fn:apply(fn:concat#3, ["a", "b", "c"])' self.check_value(expression, 'abc') self.check_source(expression, expression.replace('"', "'")) expression = 'fn:apply(fn:concat#3, ["a", "b", "c", "d"])' self.wrong_type(expression, 'FOAP0001') expression = 'fn:apply(fn:concat#4, array:subarray(["a", "b", "c", "d", "e", "f"], ' \ '1, fn:function-arity(fn:concat#4)))' self.check_value(expression, 'abcd') self.check_source(expression, expression.replace('"', "'")) def test_parse_ietf_date_function(self): expression = 'fn:parse-ietf-date("Wed, 06 Jun 1994 07:29:35 GMT")' result = DateTime.fromstring('1994-06-06T07:29:35Z') self.check_value(expression, result) self.check_source(expression, expression.replace('"', "'")) expression = 'fn:parse-ietf-date("Wed, 6 Jun 94 07:29:35 GMT")' result = DateTime.fromstring('1994-06-06T07:29:35Z') self.check_value(expression, result) self.check_source(expression, expression.replace('"', "'")) expression = 'fn:parse-ietf-date("Wed Jun 06 11:54:45 EST 2013")' result = 
DateTime.fromstring('2013-06-06T11:54:45-05:00') self.check_value(expression, result) self.check_source(expression, expression.replace('"', "'")) expression = 'fn:parse-ietf-date("Sunday, 06-Nov-94 08:49:37 GMT")' result = DateTime.fromstring('1994-11-06T08:49:37Z') self.check_value(expression, result) self.check_source(expression, expression.replace('"', "'")) expression = 'fn:parse-ietf-date("Wed, 6 Jun 94 07:29:35 +0500")' result = DateTime.fromstring('1994-06-06T07:29:35+05:00') self.check_value(expression, result) self.check_source(expression, expression.replace('"', "'")) def test_contains_token_function(self): expression = 'fn:contains-token("red green blue ", "red")' self.check_value(expression, True) expression = 'fn:contains-token(("red", "green", "blue"), " red ")' self.check_value(expression, True) self.check_source(expression, expression.replace('"', "'")) expression = 'fn:contains-token("red, green, blue", "red")' self.check_value(expression, False) expression = \ 'fn:contains-token("red green blue", "RED", ' \ '"http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive")' self.check_value(expression, True) self.check_source(expression, expression.replace('"', "'")) def test_collation_key_function(self): expression = 'fn:collation-key("foo")' self.check_value(expression, Base64Binary(b'Zm9v')) self.check_source(expression, expression.replace('"', "'")) def test_lookup_unary_operator(self): context = XPathContext(self.etree.XML('')) expression = '([1, 2, 3], [1, 2, 5], [1, 2, 6])[?3 = 5]' result = [XPathArray(self.parser, [1, 2, 5])] self.check_value(expression, result, context=context) self.check_source(expression, expression) def test_lookup_postfix_operator(self): expression = '[1, 2, 5, 7]?*' self.check_value(expression, [1, 2, 5, 7]) self.check_source(expression, expression) expression = '[[1, 2, 3], [4, 5, 6]]?*' result = [ XPathArray(self.parser, [1, 2, 3]), XPathArray(self.parser, [4, 5, 6]) ] self.check_value(expression, 
result) self.check_source(expression, expression) expression = 'map { "first" : "Jenna", "last" : "Scott" }?first' self.check_value(expression, ['Jenna']) self.check_value('[4, 5, 6]?2', [5]) expression = '(map {"first": "Tom"}, map {"first": "Dick"}, ' \ 'map {"first": "Harry"})?first' self.check_value(expression, ['Tom', 'Dick', 'Harry']) expected = expression.\ replace('"', "'").\ replace('map ', 'map').\ replace(': ', ':') self.check_source(expression, expected) expression = '([1,2,3], [4,5,6])?2' self.check_value(expression, [2, 5]) self.check_source(expression, '([1, 2, 3], [4, 5, 6])?2') self.wrong_value('["a","b"]?3', 'FOAY0001') def test_lookup_operator_tree(self): self.check_tree('$a?2?1', '(? (? ($ (a)) (2)) (1))') self.check_tree('$a?2 and $a?3', '(and (? ($ (a)) (2)) (? ($ (a)) (3)))') self.check_tree('$a?2?1 and $a?3?4', '(and (? (? ($ (a)) (2)) (1)) (? (? ($ (a)) (3)) (4)))') self.check_tree('$a[1] eq 1 and $a[2] eq 2', '(and (eq ([ ($ (a)) (1)) (1)) (eq ([ ($ (a)) (2)) (2)))') self.check_tree( '$a[1]?2 eq 1 and $a[2]?2 eq 2', '(and (eq (? ([ ($ (a)) (1)) (2)) (1)) (eq (? 
([ ($ (a)) (2)) (2)) (2)))' ) def test_arrow_operator(self): expression = '"foo" => $f("bar")' self.check_tree(expression, "(=> ('foo') ($ (f)) ('bar'))") self.check_source(expression, expression.replace('"', "'")) expression = '"foo" => $f()' self.check_tree(expression, "(=> ('foo') ($ (f)) ())") expression = '"foo" => upper-case()' # self.check_tree(expression, "(=> ('foo') (upper-case) ())") self.check_value(expression, 'FOO') self.check_source(expression, expression.replace('"', "'")) def test_issue_082(self): root = self.etree.XML('') result = select( root=root, path="'aaa)' => substring-before(')') => string-to-codepoints()", parser=self.parser.__class__ ) self.assertListEqual(result, [97, 97, 97]) def test_issue_083(self): root = self.etree.XML('12.') result = select( root=root, path="term", parser=self.parser.__class__ ) self.assertListEqual(result, root[:]) result = select( root=root, path="term => substring-before('.') => xs:integer()", parser=self.parser.__class__ ) self.assertEqual(result, 12) with self.assertRaises(SyntaxError) as ctx: select( root=root, path="term => substring-before('.') => element()", parser=self.parser.__class__ ) self.assertIn("unexpected 'element' kind test", str(ctx.exception)) def test_xml_to_json_function(self): root = self.etree.XML('' '1is1' '') expression = 'fn:xml-to-json(.)' context = XPathContext(root) result = '[1,"is",true]' self.check_value(expression, result, context=context) self.check_source(expression, expression) root = self.etree.XML('' '12' '') context = XPathContext(root) result = '{"Sunday":1,"Monday":2}' self.check_value(expression, result, context=context) def test_json_to_xml_function(self): context = XPathContext(root=self.etree.XML('')) root = self.etree.XML(dedent("""\ 1 3 4 5 """)) expression = 'json-to-xml(\'{"x": 1, "y": [3,4,5]}\')' token = self.parser.parse(expression) self.check_source(expression, expression) result = token.evaluate(context) self.assertIsInstance(result, DocumentNode) 
self.assertTrue(etree_deep_equal(result.value.getroot(), root)) root = self.etree.XML(dedent("""\ abcd""")) token = self.parser.parse('json-to-xml(\'"abcd"\', map{\'liberal\': false()})') result = token.evaluate(context) self.assertIsInstance(result, DocumentNode) self.assertTrue(etree_deep_equal(result.value.getroot(), root)) root = self.etree.XML(dedent("""\ \\ % """)) expression = 'json-to-xml(\'{"x": "\\\\", "y": "\\u0025"}\')' token = self.parser.parse(expression) self.check_source(expression, expression) result = token.evaluate(context) self.assertIsInstance(result, DocumentNode) self.assertTrue(etree_deep_equal(result.value.getroot(), root)) root = self.etree.XML(dedent("""\ \\\\ % """)) expression = 'json-to-xml(\'{"x": "\\\\", "y": "\\u0025"}\', ' \ 'map{\'escape\':true()})' token = self.parser.parse(expression) self.check_source(expression, expression) result = token.evaluate(context) self.assertIsInstance(result, DocumentNode) self.assertTrue(etree_deep_equal(result.value.getroot(), root)) @unittest.skipIf(lxml_etree is None, "The lxml library is not installed") class LxmlXPath31ParserTest(XPath31ParserTest): etree = lxml_etree def test_regression_ep415_ep420__issue_71(self): import lxml.html as lxml_html xml_source = dedent("""\ 2023-10-10 Christopher Anderson 25 2023-10-11 Christopher Carter 30 Lisa Walker 60 Jessica Walker 32 Jennifer Roberts 50 """) queries = [ 'if (count(//hotel/branch/staff) = 5) then true() else false()', '//hotel/branch/staff', 'if (count(/hotel/branch/staff) = 5) then true() else false()', '(count(/hotel/branch/staff) = 5)', '(count(/hotel/branch/staff))', '/hotel/branch/staff', 'for $i in /hotel/branch/staff return $i/given_name', 'for $i in //hotel/branch/staff return $i/given_name', 'distinct-values(for $i in /hotel/branch/staff return $i/given_name)', 'distinct-values(for $i in //hotel/branch/staff return $i/given_name)', 'date(/hotel/branch[1]/staff[1]/date) instance of xs:date', '/hotel/branch[1]/staff[1]/date cast as 
xs:date',
        ]

        html_parser = lxml_html.HTMLParser()
        xml_parser = lxml_etree.XMLParser(strip_cdata=False)

        xml_data = bytes(xml_source, encoding='utf-8')
        data_trees = {
            'html': lxml_html.fromstring(xml_data, parser=html_parser),
            'xml': lxml_etree.fromstring(xml_data, parser=xml_parser)
        }

        for query in queries:
            results = []
            for doctype, document in data_trees.items():
                try:
                    res = select(document, query, parser=XPath31Parser)
                except Exception as e:
                    results.append(e)
                else:
                    results.append(res)

            if isinstance(results[0], list):
                self.assertIsInstance(results[1], list)
                self.assertEqual(len(results[0]), len(results[1]))
                for e1, e2 in zip(*results):
                    self.assertEqual(getattr(e1, 'tag', e1), getattr(e2, 'tag', e2))
            else:
                self.assertEqual(results[0], results[1])


class XPath31FunctionsTest(test_xpath30.XPath30FunctionsTest):
    maxDiff = 1024

    def setUp(self):
        self.parser = XPath31Parser(namespaces=self.namespaces)

        # Make sure the tests are repeatable.
        env_vars_to_tweak = 'LC_ALL', 'LANG'
        self.current_env_vars = {v: os.environ.get(v) for v in env_vars_to_tweak}
        for v in self.current_env_vars:
            os.environ[v] = 'en_US.UTF-8'

    def tearDown(self):
        if hasattr(self, 'current_env_vars'):
            for v in self.current_env_vars:
                if self.current_env_vars[v] is not None:
                    os.environ[v] = self.current_env_vars[v]


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath31FunctionsTest(XPath31FunctionsTest):
    etree = lxml_etree


class XPath31ConstructorsTest(test_xpath30.XPath30ConstructorsTest):

    def setUp(self):
        self.parser = XPath31Parser(namespaces=self.namespaces)


@unittest.skipIf(lxml_etree is None, "The lxml library is not installed")
class LxmlXPath31ConstructorsTest(XPath31ConstructorsTest):
    etree = lxml_etree


if __name__ == '__main__':
    unittest.main()

sissaschool-elementpath-d3688c7/tests/test_xpath_context.py

#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for
Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
#
# @author Davide Brunato
#
import unittest
from copy import copy
from unittest.mock import patch
import xml.etree.ElementTree as ElementTree

try:
    import lxml.etree as lxml_etree
    import lxml.html as lxml_html
except ImportError:
    lxml_etree = None
    lxml_html = None

from elementpath import XPathContext, DocumentNode, ElementNode, datatypes, \
    select, get_node_tree, TextNode


class DummyXsdType:
    name = local_name = None

    @property
    def root_type(self): return self

    @property
    def simple_type(self): return self

    def is_matching(self, name, default_namespace): pass
    def is_empty(self): pass
    def is_simple(self): pass
    def has_simple_content(self): pass
    def has_mixed_content(self): pass
    def is_element_only(self): pass
    def is_key(self): pass
    def is_qname(self): pass
    def is_notation(self): pass
    def decode(self, obj, *args, **kwargs): pass
    def validate(self, obj, *args, **kwargs): pass


class XPathContextTest(unittest.TestCase):
    root = ElementTree.XML('Dickens')

    def test_invalid_initialization(self):
        self.assertRaises(TypeError, XPathContext, None)

        with self.assertRaises(TypeError):
            XPathContext(item=[1])

    def test_timezone_argument(self):
        context = XPathContext(self.root)
        self.assertIsNone(context.timezone)
        context = XPathContext(self.root, timezone='Z')
        self.assertIsInstance(context.timezone, datatypes.Timezone)

    def test_repr(self):
        self.assertEqual(repr(XPathContext(self.root)),
                         f"XPathContext(root={self.root})")
        self.assertEqual(repr(XPathContext(item=self.root)),
                         f"XPathContext(item={self.root})")
        self.assertEqual(repr(XPathContext(item=9.0)),
                         "XPathContext(item=9.0)")

    def test_copy(self):
        root = ElementTree.XML('')
        context = XPathContext(root)
        self.assertIsInstance(copy(context), XPathContext)
        self.assertIsNot(copy(context), context)

    @unittest.skipIf(lxml_etree
        is None, 'lxml library is not installed')
    def test_etree_property(self):
        root = ElementTree.XML('')
        context = XPathContext(root)
        self.assertEqual(context.etree.__name__, 'xml.etree.ElementTree')
        self.assertEqual(context.etree.__name__, 'xml.etree.ElementTree')  # property caching

        root = lxml_etree.XML('')
        context = XPathContext(root)
        self.assertEqual(context.etree.__name__, 'lxml.etree')
        self.assertEqual(context.etree.__name__, 'lxml.etree')

    def test_context_root_type(self):
        root = ElementTree.XML('')

        context = XPathContext(root)
        self.assertTrue(context.is_document())
        self.assertIsInstance(context.root, ElementNode)
        self.assertIsInstance(context.document, DocumentNode)
        self.assertFalse(context.is_fragment())
        self.assertFalse(context.is_rooted_subtree())

        root = ElementTree.XML('')
        context = XPathContext(root, fragment=True)
        self.assertFalse(context.is_document())
        self.assertIsInstance(context.root, ElementNode)
        self.assertIsNone(context.document)
        self.assertIsNone(context.root.parent)
        self.assertTrue(context.is_fragment())
        self.assertFalse(context.is_rooted_subtree())

        root = ElementTree.XML('')
        context = XPathContext(root[0], fragment=True)
        self.assertFalse(context.is_document())
        self.assertIsInstance(context.root, ElementNode)
        self.assertIsNone(context.root.parent)
        self.assertIsNone(context.document)
        self.assertTrue(context.is_fragment())
        self.assertFalse(context.is_rooted_subtree())

    def test_no_root(self):
        with self.assertRaises(TypeError) as ctx:
            XPathContext()
        self.assertEqual(str(ctx.exception),
                         "Missing both the root node and the context item!")

        context = XPathContext(item=7)
        self.assertIsNone(context.root)
        self.assertEqual(context.item, 7)
        self.assertListEqual(list(context.iter_self()), [7])
        self.assertListEqual(list(context.iter_children_or_self()), [])
        self.assertListEqual(list(context.iter_attributes()), [])
        self.assertListEqual(list(context.iter_descendants()), [])
        self.assertListEqual(list(context.iter_parent()), [])
        self.assertListEqual(list(context.iter_preceding()), [])
        self.assertListEqual(list(context.iter_followings()), [])
        self.assertListEqual(list(context.iter_ancestors()), [])
        self.assertEqual(context.item, 7)

        root = ElementTree.XML('')
        root_node = get_node_tree(root)
        context = XPathContext(item=root_node)
        self.assertEqual(context.item, root_node)
        self.assertListEqual(list(context.iter_self()), [root_node])
        self.assertListEqual(list(context.iter_children_or_self()), root_node[:])
        self.assertListEqual(list(context.iter_attributes()), [])
        self.assertListEqual(list(context.iter_descendants()),
                             [root_node, root_node[0], root_node[1]])
        self.assertListEqual(list(context.iter_parent()), [])
        self.assertListEqual(list(context.iter_preceding()), [])
        self.assertListEqual(list(context.iter_followings()), [])
        self.assertListEqual(list(context.iter_ancestors()), [])
        self.assertEqual(context.item, root_node)

        context = XPathContext(item=root_node[0])
        self.assertEqual(context.item, root_node[0])
        self.assertListEqual(list(context.iter_self()), [root_node[0]])
        self.assertListEqual(list(context.iter_children_or_self()), [])
        self.assertListEqual(list(context.iter_attributes()), [])
        self.assertListEqual(list(context.iter_descendants()), [root_node[0]])
        self.assertListEqual(list(context.iter_parent()), [root_node])
        self.assertListEqual(list(context.iter_preceding()), [])
        self.assertListEqual(list(context.iter_followings()), [root_node[1]])
        self.assertListEqual(list(context.iter_ancestors()), [root_node])
        self.assertEqual(context.item, root_node[0])

        context = XPathContext(item=root_node[1])
        self.assertEqual(context.item, root_node[1])
        self.assertListEqual(list(context.iter_self()), [root_node[1]])
        self.assertListEqual(list(context.iter_children_or_self()), [])
        self.assertListEqual(list(context.iter_attributes()), [])
        self.assertListEqual(list(context.iter_descendants()), [root_node[1]])
        self.assertListEqual(list(context.iter_parent()), [root_node])
        self.assertListEqual(list(context.iter_preceding()), [root_node[0]])
        self.assertListEqual(list(context.iter_followings()), [])
        self.assertListEqual(list(context.iter_ancestors()), [root_node])
        self.assertEqual(context.item, root_node[1])

    def test_default_collection(self):
        node = TextNode('hello world!')

        context = XPathContext(self.root, default_collection=1)
        self.assertListEqual(context.default_collection, [])

        context = XPathContext(self.root, default_collection=[node])
        self.assertListEqual(context.default_collection, [node])

    def test_is_principal_node_kind(self):
        root = ElementTree.XML('')
        context = XPathContext(root)
        self.assertTrue(hasattr(context.item.obj, 'tag'))
        self.assertTrue(context.is_principal_node_kind())
        context.item = context.root.attributes[0]
        self.assertFalse(context.is_principal_node_kind())
        context.axis = 'attribute'
        self.assertTrue(context.is_principal_node_kind())

    def test_iter_product(self):
        context = XPathContext(self.root)

        def sel1(_context):
            yield from range(2)

        def sel2(_context):
            yield from range(3)

        expected = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

        self.assertListEqual(list(context.iter_product([sel1, sel2])), expected)
        self.assertEqual(context.variables, {})

        self.assertListEqual(list(context.iter_product([sel1, sel2], [])), expected)
        self.assertEqual(context.variables, {})

        self.assertListEqual(list(context.iter_product([sel1, sel2], ['a', 'b'])), expected)
        self.assertEqual(context.variables, {'a': 1, 'b': 2})

        context.variables = {'a': 0, 'b': 0}
        self.assertListEqual(list(context.iter_product([sel1, sel2], ['a', 'b'])), expected)
        self.assertEqual(context.variables, {'a': 1, 'b': 2})

        context.variables = {'a': 0, 'b': 0}
        self.assertListEqual(list(context.iter_product([sel1, sel2], ['a'])), expected)
        self.assertEqual(context.variables, {'a': 1, 'b': 0})

        context.variables = {'a': 0, 'b': 0}
        self.assertListEqual(list(context.iter_product([sel1, sel2], ['c', 'b'])), expected)
        self.assertEqual(context.variables, {'a': 0, 'b': 2, 'c': 1})
context.variables = {'a': 0, 'b': 0} self.assertListEqual(list(context.iter_product([sel1, sel2], ['b'])), expected) self.assertEqual(context.variables, {'a': 0, 'b': 1}) def test_iter_attributes(self): root = ElementTree.XML('') context = XPathContext(root) attributes = context.root.attributes self.assertEqual(len(attributes), 2) self.assertListEqual(list(context.iter_attributes()), attributes) context.item = attributes[0] self.assertListEqual(list(context.iter_attributes()), attributes[:1]) with patch.object(DummyXsdType(), 'has_simple_content', return_value=True) as xsd_type: context = XPathContext(root) context.root.xsd_type = xsd_type self.assertListEqual(list(context.iter_attributes()), context.root.attributes) self.assertNotEqual(attributes, context.root.attributes) def test_iter_children_or_self(self): doc = ElementTree.ElementTree(self.root) context = XPathContext(doc) self.assertIsInstance(context.root, DocumentNode) self.assertIsInstance(context.root[0], ElementNode) self.assertListEqual(list(e.obj for e in context.iter_children_or_self()), [self.root]) context.item = context.root[0] # root element self.assertListEqual(list(context.iter_children_or_self()), [context.root[0].children[0]]) context.item = context.root # document node self.assertListEqual(list(e.obj for e in context.iter_children_or_self()), [self.root]) def test_iter_parent(self): root = ElementTree.XML('') context = XPathContext(root, item=None) self.assertListEqual(list(context.iter_parent()), []) context = XPathContext(root) self.assertListEqual(list(context.iter_parent()), []) with patch.object(DummyXsdType(), 'has_simple_content', return_value=True) as xsd_type: context = XPathContext(root, item=root) context.root.xsd_type = xsd_type self.assertListEqual(list(context.iter_parent()), []) root = ElementTree.XML('') context = XPathContext(root, item=None) self.assertListEqual(list(context.iter_parent()), []) context = XPathContext(root, item=root[2][0]) self.assertListEqual(list(e.obj for 
e in context.iter_parent()), [root[2]]) with patch.object(DummyXsdType(), 'is_empty', return_value=True) as xsd_type: context = XPathContext(root, item=root[2][0]) context.root[2][0].xsd_type = xsd_type self.assertListEqual(list(e.obj for e in context.iter_parent()), [root[2]]) def test_iter_siblings(self): root = ElementTree.XML('') context = XPathContext(root) self.assertListEqual(list(context.iter_siblings()), []) context = XPathContext(root, item=root[2]) self.assertListEqual(list(e.obj for e in context.iter_siblings()), list(root[3:])) with patch.object(DummyXsdType(), 'is_element_only', return_value=True) as xsd_type: context = XPathContext(root, item=root[2]) context.root[2].xsd_type = xsd_type self.assertListEqual(list(e.obj for e in context.iter_siblings()), list(root[3:])) context = XPathContext(root, item=root[2]) self.assertListEqual( list(e.obj for e in context.iter_siblings('preceding-sibling')), list(root[:2]) ) with patch.object(DummyXsdType(), 'is_element_only', return_value=True) as xsd_type: context = XPathContext(root, item=root[2]) context.root[2].xsd_type = xsd_type self.assertListEqual( list(e.obj for e in context.iter_siblings('preceding-sibling')), list(root[:2]) ) @unittest.skipIf(lxml_etree is None, 'lxml library is not installed') def test_iter_siblings__issue_44(self): root = lxml_etree.XML('text 1text 2 text 3') result = select(root, 'node()[1]/following-sibling::node()') self.assertListEqual(result, [root[0], 'text 2', root[1], ' text 3']) self.assertListEqual(result, root.xpath('node()[1]/following-sibling::node()')) @unittest.skipIf(lxml_etree is None, 'lxml library is not installed') def test_set_context_root__issue_71(self): root = lxml_etree.XML('') self.assertIsNone(root.getparent()) context = XPathContext(root) self.assertIs(context.root.obj, root) self.assertIsInstance(context.document, DocumentNode) parser = lxml_html.HTMLParser() root = lxml_html.fromstring('', parser=parser) self.assertIsNotNone(root.getparent()) context = 
XPathContext(root) self.assertIs(context.root.obj, root) self.assertIsInstance(context.document, DocumentNode) def test_iter_descendants(self): root = ElementTree.XML('') context = XPathContext(root) attr = context.root.attributes[0] self.assertListEqual(list(e.obj for e in context.iter_descendants()), [root, root[0], root[1]]) context.item = attr self.assertListEqual(list(context.iter_descendants(axis='descendant')), []) context.item = attr self.assertListEqual(list(context.iter_descendants()), [attr]) with patch.object(DummyXsdType(), 'has_mixed_content', return_value=True) as xsd_type: context = XPathContext(root, item=root) context.root.xsd_type = xsd_type self.assertListEqual( list(e.obj for e in context.iter_descendants()), [root, root[0], root[1]] ) def test_iter_ancestors(self): root = ElementTree.XML('') context = XPathContext(root) attr = context.root.attributes[0] self.assertListEqual(list(context.iter_ancestors()), []) context.item = attr self.assertListEqual(list(context.iter_ancestors()), [context.root]) result = list(e.obj for e in XPathContext(root, item=root[1]).iter_ancestors()) self.assertListEqual(result, [root]) with patch.object(DummyXsdType(), 'has_mixed_content', return_value=True) as xsd_type: context = XPathContext(root, item=root[1]) context.root[1].xsd_type = xsd_type self.assertListEqual(list(context.iter_ancestors()), [context.root]) def test_iter_preceding(self): root = ElementTree.XML('') context = XPathContext(root, item=None) self.assertListEqual(list(context.iter_preceding()), []) context = XPathContext(root) self.assertListEqual(list(context.iter_preceding()), []) with patch.object(DummyXsdType(), 'has_simple_content', return_value=True) as xsd_type: context = XPathContext(root, item=root) context.root.xsd_type = xsd_type self.assertListEqual(list(context.iter_preceding()), []) context = XPathContext(root, item='text') self.assertListEqual(list(context.iter_preceding()), []) root = ElementTree.XML('') context = XPathContext(root, 
            item=root[2][1])
        self.assertListEqual(list(e.obj for e in context.iter_preceding()),
                             [root[0], root[0][0], root[1], root[2][0]])

    def test_iter_following(self):
        root = ElementTree.XML('')

        context = XPathContext(root)
        self.assertListEqual(list(context.iter_followings()), [])

        context = XPathContext(root)
        context.item = context.root.attributes[0]
        self.assertListEqual(list(context.iter_followings()), [])

        context = XPathContext(root, item=root[2])
        self.assertListEqual(list(e.obj for e in context.iter_followings()), list(root[3:]))

        context = XPathContext(root, item=root[1])
        result = [root[2], root[2][0], root[3], root[4]]
        self.assertListEqual(list(e.obj for e in context.iter_followings()), result)

        with patch.object(DummyXsdType(), 'has_mixed_content', return_value=True) as xsd_type:
            context = XPathContext(root, item=root[1])
            context.root[1].xsd_type = xsd_type
            self.assertListEqual(list(e.obj for e in context.iter_followings()), result)


if __name__ == '__main__':
    unittest.main()

sissaschool-elementpath-d3688c7/tests/test_xpath_nodes.py

#!/usr/bin/env python
#
# Copyright (c), 2018-2021, SISSA (International School for Advanced Studies).
# All rights reserved.
# This file is distributed under the terms of the MIT License.
# See the file 'LICENSE' in the root directory of the present
# distribution, or http://opensource.org/licenses/MIT.
# # @author Davide Brunato # import unittest from unittest.mock import patch from textwrap import dedent import io import xml.etree.ElementTree as ElementTree try: import lxml.etree as lxml_etree except ImportError: lxml_etree = None try: import xmlschema except ImportError: xmlschema = None else: xmlschema.XMLSchema.meta_schema.build() from elementpath.etree import is_etree_element, etree_iter_strings, \ etree_deep_equal, etree_iter_paths from elementpath.datatypes import UntypedAtomic from elementpath.xpath_nodes import DocumentNode, ElementNode, AttributeNode, TextNode, \ NamespaceNode, CommentNode, ProcessingInstructionNode, EtreeElementNode, TextAttributeNode from elementpath.tree_builders import get_node_tree from elementpath.xpath_context import XPathContext, XPathSchemaContext class DummyXsdType: name = local_name = None def is_matching(self, name, default_namespace): pass def is_empty(self): pass def is_simple(self): pass def has_simple_content(self): pass def has_mixed_content(self): pass def is_element_only(self): pass def is_key(self): pass def is_qname(self): pass def is_notation(self): pass def decode(self, obj, *args, **kwargs): return int(obj) def validate(self, obj, *args, **kwargs): pass class XPathNodesTest(unittest.TestCase): elem = ElementTree.XML('') def setUp(self): root = ElementTree.Element('root') self.context = XPathContext(root) # Dummy context for creating nodes def test_is_etree_element_function(self): self.assertTrue(is_etree_element(self.elem)) self.assertFalse(is_etree_element('text')) self.assertFalse(is_etree_element(None)) def test_elem_iter_strings_function(self): root = ElementTree.XML('text1\ntext2tail1text3tail2') result = ['text1\n', 'text2', 'tail1', 'tail2', 'text3'] self.assertListEqual(list(etree_iter_strings(root)), result) with patch.multiple(DummyXsdType, has_mixed_content=lambda x: True): xsd_type = DummyXsdType() typed_root = EtreeElementNode(elem=root) setattr(typed_root, 'xsd_type', xsd_type) 
self.assertListEqual(list(etree_iter_strings(typed_root.elem)), result) norm_result = ['text1', 'text2', 'tail1', 'tail2', 'text3'] with patch.multiple(DummyXsdType, is_element_only=lambda x: True): xsd_type = DummyXsdType() typed_root = EtreeElementNode(elem=root) setattr(typed_root, 'xsd_type', xsd_type) self.assertListEqual(list(etree_iter_strings(typed_root.elem, True)), norm_result) comment = ElementTree.Comment('foo') root[1].append(comment) self.assertListEqual(list(etree_iter_strings(typed_root.elem, True)), norm_result) self.assertListEqual(list(etree_iter_strings(root)), result) def test_etree_deep_equal_function(self): root = ElementTree.XML('10end') self.assertTrue(etree_deep_equal(root, root)) elem = ElementTree.XML('11end') self.assertFalse(etree_deep_equal(root, elem)) elem = ElementTree.XML('1030end') self.assertFalse(etree_deep_equal(root, elem)) elem = ElementTree.XML('10end') self.assertTrue(etree_deep_equal(root, elem)) elem = ElementTree.XML('10end') self.assertFalse(etree_deep_equal(root, elem)) def test_match_name_method(self): attr = AttributeNode('a1', '10', parent=None) self.assertTrue(attr.match_name('*')) self.assertTrue(attr.match_name('a1')) self.assertTrue(attr.match_name('*:a1')) self.assertFalse(attr.match_name('{foo}*')) self.assertFalse(attr.match_name('foo:*')) self.assertTrue( AttributeNode('{foo}a1', '10').match_name('{foo}*') ) attr = AttributeNode('{http://xpath.test/ns}a1', '10', parent=None) self.assertTrue(attr.match_name('*:a1')) def test_node_base_uri(self): xml_test = '' self.assertEqual(EtreeElementNode(ElementTree.XML(xml_test)).base_uri, '/') document = ElementTree.parse(io.StringIO(xml_test)) self.assertIsNone(DocumentNode(document).base_uri, '/') self.assertIsNone(EtreeElementNode(self.elem).base_uri) self.assertIsNone(TextNode('a text node').base_uri) xml_test = dedent("""\ """) root_node = get_node_tree(ElementTree.XML(xml_test)) self.assertEqual(root_node.base_uri, 'http://example.org/wine/') 
self.assertIsInstance(root_node[0], TextNode) self.assertEqual(root_node[1].base_uri, 'http://example.org/wine/rosé') xml_test = dedent("""\ """) root_node = get_node_tree(ElementTree.XML(xml_test)) self.assertEqual(root_node.base_uri, 'http://example.test/xpath/') self.assertIsInstance(root_node[0], TextNode) self.assertEqual(root_node[0].base_uri, 'http://example.test/xpath/') self.assertEqual(root_node[1].base_uri, 'urn:isbn:0451450523') self.assertIsInstance(root_node[2], TextNode) self.assertEqual(root_node[3].base_uri, 'urn:isan:0000-0000-2CEA-0000-1-0000-0000-Y') self.assertEqual(root_node[5].base_uri, 'urn:ISSN:0167-6423') self.assertEqual(root_node[7].base_uri, 'urn:ietf:rfc:2648') self.assertEqual(root_node[9].base_uri, 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66') def test_node_document_uri_function(self): node = EtreeElementNode(self.elem) self.assertIsNone(node.document_uri) xml_test = '' document = ElementTree.parse(io.StringIO(xml_test)) node = DocumentNode(document) self.assertIsNone(node.document_uri) node = DocumentNode(document, uri=' http://xpath.test/doc.xml ') self.assertEqual(node.document_uri, 'http://xpath.test/doc.xml') xml_test = '' document = ElementTree.parse(io.StringIO(xml_test)) node = DocumentNode(document) self.assertIsNone(node.document_uri) xml_test = '' document = ElementTree.parse(io.StringIO(xml_test)) node = DocumentNode(document, uri="dir1/dir2") self.assertIsNone(node.document_uri) xml_test = '' document = ElementTree.parse(io.StringIO(xml_test)) node = DocumentNode(document, uri="http://[xpath.test") self.assertIsNone(node.document_uri) def test_attribute_nodes(self): parent = self.context.root attribute = TextAttributeNode('id', '0212349350') self.assertEqual(repr(attribute), "TextAttributeNode(name='id', value='0212349350')") self.assertNotEqual(attribute, AttributeNode('id', '0212349350')) self.assertEqual(attribute.as_item(), ('id', '0212349350')) self.assertNotEqual(attribute.as_item(), AttributeNode('id', 
'0212349350')) self.assertNotEqual(attribute, AttributeNode('id', '0212349350', parent)) attribute = AttributeNode('id', '0212349350', parent) self.assertNotEqual(attribute, AttributeNode('id', '0212349350', parent)) self.assertEqual(attribute.as_item(), ('id', '0212349350')) attribute = AttributeNode('value', '10', parent) self.assertEqual(repr(attribute), "TextAttributeNode(name='value', value='10')") with patch.multiple(DummyXsdType, is_simple=lambda x: True): xsd_type = DummyXsdType() attribute.xsd_type = xsd_type self.assertEqual(attribute.as_item(), ('value', '10')) def test_typed_element_nodes(self): element = ElementTree.Element('schema') with patch.multiple(DummyXsdType, is_simple=lambda x: True): xsd_type = DummyXsdType() context = XPathContext(element) context.root.xsd_type = xsd_type self.assertTrue(repr(context.root).startswith( "EtreeElementNode(elem=")) self.assertListEqual(elem.children, [x for x in elem]) document = DocumentNode(ElementTree.parse(io.StringIO(""))) self.assertListEqual(document.children, []) # not built document document.children.append(EtreeElementNode(document.value.getroot(), document)) self.assertListEqual(document.children, [document.getroot()]) self.assertIsNone(TextNode('a text node').children) def test_node_nilled_property(self): xml_test = '' self.assertTrue(EtreeElementNode(ElementTree.XML(xml_test)).nilled) xml_test = '' self.assertFalse(EtreeElementNode(ElementTree.XML(xml_test)).nilled) self.assertFalse(EtreeElementNode(ElementTree.XML('')).nilled) self.assertFalse(TextNode('foo').nilled) def test_node_kind_property(self): document = DocumentNode(ElementTree.parse(io.StringIO(u''))) element = EtreeElementNode(ElementTree.Element('schema')) attribute = AttributeNode('id', '0212349350') namespace = NamespaceNode('xs', 'http://www.w3.org/2001/XMLSchema') comment = CommentNode(ElementTree.Comment('nothing important')) pi = ProcessingInstructionNode( ElementTree.ProcessingInstruction('action', 'nothing to do') ) text = 
TextNode('betelgeuse') self.assertEqual(document.node_kind, 'document') self.assertEqual(element.node_kind, 'element') self.assertEqual(attribute.node_kind, 'attribute') self.assertEqual(namespace.node_kind, 'namespace') self.assertEqual(comment.node_kind, 'comment') self.assertEqual(pi.node_kind, 'processing-instruction') self.assertEqual(text.node_kind, 'text') def test_name_property(self): root = self.context.root attr = AttributeNode('a1', '20') namespace = NamespaceNode('xs', 'http://www.w3.org/2001/XMLSchema') self.assertEqual(root.name, 'root') self.assertEqual(attr.name, 'a1') self.assertEqual(namespace.name, 'xs') def test_path_property(self): root = ElementTree.XML('') context = XPathContext(root) self.assertEqual(context.root.path, '/Q{}A[1]') self.assertEqual(context.root[0].path, '/Q{}A[1]/Q{}B1[1]') self.assertEqual(context.root[0][0].path, '/Q{}A[1]/Q{}B1[1]/Q{}C1[1]') self.assertEqual(context.root[1].path, '/Q{}A[1]/Q{}B2[1]') self.assertEqual(context.root[2].path, '/Q{}A[1]/Q{}B3[1]') self.assertEqual(context.root[2][0].path, '/Q{}A[1]/Q{}B3[1]/Q{}C1[1]') self.assertEqual(context.root[2][1].path, '/Q{}A[1]/Q{}B3[1]/Q{}C2[1]') attr = context.root[2][1].attributes[0] self.assertEqual(attr.path, '/Q{}A[1]/Q{}B3[1]/Q{}C2[1]/@max') document = ElementTree.ElementTree(root) context = XPathContext(root=document) self.assertEqual(context.root[0][2][0].path, '/Q{}A[1]/Q{}B3[1]/Q{}C1[1]') self.assertEqual(context.root[0][2][0].extended_path, '/A[1]/B3[1]/C1[1]') root = ElementTree.XML('10') context = XPathContext(root) with patch.object(DummyXsdType(), 'is_simple', return_value=True) as xsd_type: elem = context.root[0] elem.xsd_type = xsd_type self.assertEqual(elem.path, '/Q{}A[1]/Q{}B1[1]') with patch.object(DummyXsdType(), 'is_simple', return_value=True) as xsd_type: context = XPathContext(root) attr = context.root[1].attributes[0] attr.xsd_type = xsd_type self.assertEqual(attr.path, '/Q{}A[1]/Q{}B2[1]/@min') def test_path_property_with_namespaces(self): 
root = ElementTree.XML('' '') context = XPathContext(root, namespaces={'tns': 'foo'}) self.assertEqual(context.root.path, '/Q{foo}A[1]') self.assertEqual(context.root.qname_path, '/tns:A[1]') self.assertEqual(context.root[0].path, '/Q{foo}A[1]/Q{}B1[1]') self.assertEqual(context.root[0][0].qname_path, '/tns:A[1]/B1[1]/C1[1]') def test_element_node_iter(self): root = ElementTree.XML('text1\ntext2text3') context = XPathContext(root) expected = [ context.root, context.root.namespace_nodes[0], context.root[0], context.root[1], context.root[1].namespace_nodes[0], context.root[1].attributes[0], context.root[1][0], context.root[2], context.root[2].namespace_nodes[0], context.root[3], context.root[3].namespace_nodes[0], context.root[3][0], context.root[3][0].namespace_nodes[0], context.root[3][0][0] ] result = list(context.root.iter()) self.assertListEqual(result, expected) root = ElementTree.XML('') context = XPathContext(root) # iter includes also xml namespace nodes self.assertListEqual( list(e.elem for e in context.root.iter() if isinstance(e, ElementNode)), list(root.iter()) ) def test_document_node_iter(self): root = ElementTree.XML('') doc = ElementTree.ElementTree(root) context = XPathContext(doc) self.assertListEqual( list(e.elem for e in context.root.iter() if isinstance(e, ElementNode)), list(doc.iter()) ) @unittest.skipIf(lxml_etree is None, 'lxml.etree is not installed') def test_lazy_attributes_iter__issue_72(self): xml = lxml_etree.fromstring('') node_tree = get_node_tree(root=xml) nodes = list(node for node in node_tree.iter_lazy()) self.assertListEqual(nodes, [node_tree]) nodes = list(node for node in node_tree.iter()) self.assertListEqual(nodes, [ node_tree, node_tree.namespace_nodes[0], node_tree.attributes[0] ]) nodes = list(node for node in node_tree.iter_lazy()) self.assertListEqual(nodes, [ node_tree, node_tree.namespace_nodes[0], node_tree.attributes[0] ]) def test_is_schema_node(self): root = ElementTree.XML('text') context = XPathContext(root) 
self.assertFalse(context.root.is_schema_node) self.assertFalse(context.root.attributes[0].is_schema_node) self.assertFalse(context.root.children[0].is_schema_node) if xmlschema is not None: schema = xmlschema.XMLSchema(dedent(""" """)) context = XPathSchemaContext(schema) self.assertTrue(context.root.is_schema_node) # Is the schema self.assertTrue(context.root.attributes[0].is_schema_node) self.assertTrue(context.root.children[0].is_schema_node) def test_etree_iter_paths(self): root = ElementTree.XML('') root[2].append(ElementTree.Comment('a comment')) root[2].append(ElementTree.Element('c3')) # duplicated tag items = list(etree_iter_paths(root)) self.assertListEqual(items, [ (root, '.'), (root[0], './Q{}b1[1]'), (root[0][0], './Q{}b1[1]/Q{}c1[1]'), (root[0][1], './Q{}b1[1]/Q{}c2[1]'), (root[1], './Q{}b2[1]'), (root[2], './Q{}b3[1]'), (root[2][0], './Q{}b3[1]/Q{}c3[1]'), (root[2][1], './Q{}b3[1]/comment()[1]'), (root[2][2], './Q{}b3[1]/Q{}c3[2]') ]) self.assertListEqual(list(etree_iter_paths(root, path='')), [ (root, ''), (root[0], 'Q{}b1[1]'), (root[0][0], 'Q{}b1[1]/Q{}c1[1]'), (root[0][1], 'Q{}b1[1]/Q{}c2[1]'), (root[1], 'Q{}b2[1]'), (root[2], 'Q{}b3[1]'), (root[2][0], 'Q{}b3[1]/Q{}c3[1]'), (root[2][1], 'Q{}b3[1]/comment()[1]'), (root[2][2], 'Q{}b3[1]/Q{}c3[2]') ]) self.assertListEqual(list(etree_iter_paths(root, path='/')), [ (root, '/'), (root[0], '/Q{}b1[1]'), (root[0][0], '/Q{}b1[1]/Q{}c1[1]'), (root[0][1], '/Q{}b1[1]/Q{}c2[1]'), (root[1], '/Q{}b2[1]'), (root[2], '/Q{}b3[1]'), (root[2][0], '/Q{}b3[1]/Q{}c3[1]'), (root[2][1], '/Q{}b3[1]/comment()[1]'), (root[2][2], '/Q{}b3[1]/Q{}c3[2]') ]) if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tests/test_xpath_tokens.py000066400000000000000000000534561476131650400242260ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2021, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. 
# See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # import unittest from unittest.mock import patch import io import math import xml.etree.ElementTree as ElementTree from decimal import Decimal try: import xmlschema except ImportError: xmlschema = None else: xmlschema.XMLSchema.meta_schema.build() from elementpath.exceptions import MissingContextError from elementpath.datatypes import UntypedAtomic, Int from elementpath.namespaces import XSD_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE from elementpath.xpath_nodes import ElementNode, AttributeNode, NamespaceNode, \ CommentNode, ProcessingInstructionNode, TextNode, DocumentNode, \ SchemaAttributeNode, TextAttributeNode, EtreeElementNode from elementpath.helpers import ordinal from elementpath.xpath_context import XPathContext, XPathSchemaContext from elementpath.xpath1 import XPath1Parser from elementpath.xpath2 import XPath2Parser from elementpath.xpath3 import XPath30Parser, XPath31Parser class DummyXsdType: name = local_name = None @property def root_type(self): return self @property def simple_type(self): return self def is_matching(self, name, default_namespace): pass def is_empty(self): pass def is_simple(self): pass def has_simple_content(self): pass def has_mixed_content(self): pass def is_element_only(self): pass def is_list(self): pass def is_union(self): pass def is_key(self): pass def is_qname(self): pass def is_notation(self): pass @staticmethod def validate(obj, *args, **kwargs): Int.validate(obj) @staticmethod def decode(obj, *args, **kwargs): return int(obj) class Tagged(object): tag = 'root' def __repr__(self): return 'Tagged(tag=%r)' % self.tag class XPath1TokenTest(unittest.TestCase): @classmethod def setUpClass(cls): cls.parser = XPath1Parser(namespaces={'xs': XSD_NAMESPACE, 'tst': "http://xpath.test/ns"}) def test_ordinal_function(self): self.assertEqual(ordinal(1), '1st') self.assertEqual(ordinal(2), '2nd') 
self.assertEqual(ordinal(3), '3rd') self.assertEqual(ordinal(4), '4th') self.assertEqual(ordinal(11), '11th') self.assertEqual(ordinal(23), '23rd') self.assertEqual(ordinal(34), '34th') def test_arity_property(self): token = self.parser.parse('true()') self.assertEqual(token.symbol, 'true') self.assertEqual(token.label, 'function') self.assertEqual(token.arity, 0) token = self.parser.parse('2 + 5') self.assertEqual(token.symbol, '+') self.assertEqual(token.label, 'operator') self.assertEqual(token.arity, 2) def test_source_property(self): token = self.parser.parse('last()') self.assertEqual(token.symbol, 'last') self.assertEqual(token.label, 'function') self.assertEqual(token.source, 'last()') token = self.parser.parse('2.0') self.assertEqual(token.symbol, '(decimal)') self.assertEqual(token.label, 'literal') self.assertEqual(token.source, '2.0') def test_position(self): parser = XPath2Parser() token = parser.parse("(1, 2, 3, 4)") self.assertEqual(token.symbol, '(') self.assertEqual(token.position, (1, 1)) token = parser.parse("(: Comment line :)\n\n (1, 2, 3, 4)") self.assertEqual(token.symbol, '(') self.assertEqual(token.position, (3, 2)) def test_iter_method(self): token = self.parser.parse('2 + 5') items = [tk for tk in token.iter()] self.assertListEqual(items, [token[0], token, token[1]]) token = self.parser.parse('/A/B[C]/D/@a') self.assertEqual(token.tree, '(/ (/ (/ (/ (A)) ([ (B) (C))) (D)) (@ (a)))') self.assertListEqual(list(tk.value for tk in token.iter()), ['/', 'A', '/', 'B', '[', 'C', '/', 'D', '/', '@', 'a']) self.assertListEqual(list(tk.value for tk in token.iter('(name)')), ['A', 'B', 'C', 'D', 'a']) self.assertListEqual(list(tk.source for tk in token.iter('/')), ['/A', '/A/B[C]', '/A/B[C]/D', '/A/B[C]/D/@a']) def test_iter_leaf_elements_method(self): token = self.parser.parse('2 + 5') self.assertListEqual(list(token.iter_leaf_elements()), []) token = self.parser.parse('/A/B[C]/D/@a') self.assertListEqual(list(token.iter_leaf_elements()), []) token 
= self.parser.parse('/A/B[C]/D') self.assertListEqual(list(token.iter_leaf_elements()), ['D']) token = self.parser.parse('/A/B[C]') self.assertEqual(token.tree, '(/ (/ (A)) ([ (B) (C)))') self.assertListEqual(list(token.iter_leaf_elements()), ['B']) def test_get_argument_method(self): token = self.parser.symbol_table['true'](self.parser) self.assertIsNone(token.get_argument(2)) with self.assertRaises(TypeError): token.get_argument(1, required=True) @patch.multiple(DummyXsdType, is_simple=lambda x: False, has_simple_content=lambda x: True) def test_select_results(self): token = self.parser.parse('.') elem = ElementTree.Element('A', attrib={'max': '30'}) elem.text = '10' xsd_type = DummyXsdType() context = XPathContext(elem) self.assertListEqual(list(token.select_results(context)), [elem]) context = XPathContext(elem, item=elem) setattr(context.root, 'xsd_type', xsd_type) self.assertListEqual(list(token.select_results(context)), [elem]) context = XPathContext(elem) context.item = context.root.attributes[0] self.assertListEqual(list(token.select_results(context)), ['30']) context = XPathContext(elem) context.item = context.root.attributes[0] setattr(context.item, 'xsd_type', xsd_type) self.assertListEqual(list(token.select_results(context)), ['30']) context = XPathContext(elem, item=10) self.assertListEqual(list(token.select_results(context)), [10]) context = XPathContext(elem, item='10') self.assertListEqual(list(token.select_results(context)), ['10']) def test_cast_to_double(self): token = self.parser.parse('.') self.assertEqual(token.cast_to_double(1), 1.0) with self.assertRaises(ValueError) as ctx: token.cast_to_double('nan') self.assertIn('FORG0001', str(ctx.exception)) if self.parser.version != '1.0': self.parser._xsd_version = '1.1' self.assertEqual(token.cast_to_double('1'), 1.0) self.parser._xsd_version = '1.0' def test_atomization_function(self): root = ElementTree.Element('root') token = self.parser.parse('/unknown/.') context = XPathContext(root) 
self.assertListEqual(list(token.atomization(context)), []) if self.parser.version > '1.0': token = self.parser.parse('((), 1, 3, "a")') self.assertListEqual(list(token.atomization()), [1, 3, 'a']) def test_boolean_value_function(self): token = self.parser.parse('true()') elem = ElementTree.Element('A') context = XPathContext(elem) self.assertTrue(token.boolean_value(context.root)) self.assertFalse(token.boolean_value([])) self.assertTrue(token.boolean_value([context.root])) self.assertFalse(token.boolean_value([0])) self.assertTrue(token.boolean_value([1])) with self.assertRaises(TypeError): token.boolean_value([1, 1]) self.assertFalse(token.boolean_value(0)) self.assertTrue(token.boolean_value(1)) self.assertTrue(token.boolean_value(1.0)) self.assertFalse(token.boolean_value(None)) @patch.multiple(DummyXsdType(), is_simple=lambda x: False, has_simple_content=lambda x: True) def test_data_value_function(self): token = self.parser.parse('true()') if self.parser.version != '1.0': xsd_type = DummyXsdType() context = XPathContext(ElementTree.XML('19')) setattr(context.root, 'xsd_type', xsd_type) self.assertEqual(token.data_value(context.root), 19) obj = AttributeNode('age', '19') self.assertEqual(token.data_value(obj), UntypedAtomic('19')) self.assertIsInstance(obj, TextAttributeNode) obj = TextAttributeNode('age', '19') self.assertEqual(token.data_value(obj), UntypedAtomic('19')) obj = NamespaceNode('tns', 'http://xpath.test/ns') self.assertEqual(token.data_value(obj), 'http://xpath.test/ns') obj = TextNode('19') self.assertEqual(token.data_value(obj), UntypedAtomic('19')) obj = ElementTree.XML('abcde') element_node = ElementNode(obj) self.assertEqual(token.data_value(element_node), UntypedAtomic('abcde')) self.assertIsInstance(element_node, EtreeElementNode) element_node = EtreeElementNode(obj) self.assertEqual(token.data_value(element_node), UntypedAtomic('abcde')) obj = ElementTree.parse(io.StringIO('abcde')) document_node = DocumentNode(obj) 
self.assertEqual(token.data_value(document_node), UntypedAtomic('abcde')) obj = ElementTree.Comment("foo bar") comment_node = CommentNode(obj) self.assertEqual(token.data_value(comment_node), 'foo bar') obj = ElementTree.ProcessingInstruction('action', 'nothing to do') pi_node = ProcessingInstructionNode(obj) self.assertEqual(token.data_value(pi_node), 'nothing to do') self.assertIsNone(token.data_value(None)) self.assertEqual(token.data_value(19), 19) self.assertEqual(token.data_value('19'), '19') self.assertFalse(token.data_value(False)) tagged_object = Tagged() with self.assertRaises(TypeError): token.data_value(tagged_object) def test_string_value_function(self): token = self.parser.parse('true()') document = ElementTree.parse(io.StringIO(u'123456789')) element = ElementTree.Element('schema') comment = ElementTree.Comment('nothing important') pi = ElementTree.ProcessingInstruction('action', 'nothing to do') document_node = XPathContext(document).root context = XPathContext(element) element_node = context.root attribute_node = TextAttributeNode('id', '0212349350') namespace_node = NamespaceNode('xs', 'http://www.w3.org/2001/XMLSchema') comment_node = CommentNode(comment) pi_node = ProcessingInstructionNode(pi) text_node = TextNode('betelgeuse') self.assertEqual(token.string_value(document_node), '123456789') self.assertEqual(token.string_value(element_node), '') self.assertEqual(token.string_value(attribute_node), '0212349350') self.assertEqual(token.string_value(namespace_node), 'http://www.w3.org/2001/XMLSchema') self.assertEqual(token.string_value(comment_node), 'nothing important') self.assertEqual(token.string_value(pi_node), 'nothing to do') self.assertEqual(token.string_value(text_node), 'betelgeuse') self.assertEqual(token.string_value(None), '') self.assertEqual(token.string_value(Decimal(+1999)), '1999') self.assertEqual(token.string_value(Decimal('+1999')), '1999') self.assertEqual(token.string_value(Decimal('+19.0010')), '19.001') 
self.assertEqual(token.string_value(10), '10') self.assertEqual(token.string_value(1e99), '1E99') self.assertEqual(token.string_value(1e-05), '1E-05') self.assertEqual(token.string_value(1.00), '1') self.assertEqual(token.string_value(+19.0010), '19.001') self.assertEqual(token.string_value(float('nan')), 'NaN') self.assertEqual(token.string_value(float('inf')), 'INF') self.assertEqual(token.string_value(float('-inf')), '-INF') self.assertEqual(token.string_value(()), '()') tagged_object = Tagged() self.assertEqual(token.string_value(tagged_object), "Tagged(tag='root')") with patch.multiple(DummyXsdType, is_simple=lambda x: True): xsd_type = DummyXsdType() element.text = '10' typed_elem = EtreeElementNode(elem=element) setattr(typed_elem, 'xsd_type', xsd_type) self.assertEqual(token.string_value(typed_elem), '10') self.assertEqual(token.data_value(typed_elem), 10) def test_number_value_function(self): token = self.parser.parse('true()') self.assertEqual(token.number_value("19"), 19) self.assertTrue(math.isnan(token.number_value("not a number"))) def test_compare_operator(self): token1 = self.parser.parse('true()') token2 = self.parser.parse('false()') self.assertEqual(token1, token1) self.assertNotEqual(token1, token2) self.assertNotEqual(token2, 'false()') def test_expected_method(self): token = self.parser.parse('.') self.assertIsNone(token.expected('.')) with self.assertRaises(SyntaxError) as ctx: raise token.expected('*') self.assertIn('XPST0003', str(ctx.exception)) def test_unexpected_method(self): token = self.parser.parse('.') self.assertIsNone(token.unexpected('*')) with self.assertRaises(SyntaxError) as ctx: raise token.unexpected('.') self.assertIn('XPST0003', str(ctx.exception)) with self.assertRaises(SyntaxError) as ctx: raise token.unexpected('.', message="unknown error") self.assertIn('XPST0003', str(ctx.exception)) self.assertIn('unknown error', str(ctx.exception)) with self.assertRaises(TypeError) as ctx: raise token.unexpected('.', 
code='XPST0017') self.assertIn('XPST0017', str(ctx.exception)) def test_xpath_error(self): token = self.parser.parse('.') with self.assertRaises(ValueError) as ctx: raise token.error('xml:XPST0003') self.assertIn('XPTY0004', str(ctx.exception)) self.assertIn("'xml:XPST0003' is not an XPath error code", str(ctx.exception)) with self.assertRaises(ValueError) as ctx: raise token.error('err:err:XPST0003') self.assertIn('XPTY0004', str(ctx.exception)) self.assertIn("'err:err:XPST0003' is not an XPath error code", str(ctx.exception)) with self.assertRaises(ValueError) as ctx: raise token.error('XPST9999') self.assertIn('XPTY0004', str(ctx.exception)) self.assertIn("unknown XPath error code", str(ctx.exception)) def test_xpath_error_shortcuts(self): token = self.parser.parse('.') with self.assertRaises(ValueError) as ctx: raise token.wrong_value() self.assertIn('FOCA0002', str(ctx.exception)) with self.assertRaises(TypeError) as ctx: raise token.wrong_type() self.assertIn('FORG0006', str(ctx.exception)) with self.assertRaises(MissingContextError) as ctx: raise token.missing_context() self.assertIn('XPDY0002', str(ctx.exception)) def test_names_disambiguation(self): ambiguous_names = [ symbol for symbol, tk_cls in self.parser.symbol_table.items() if self.parser.name_pattern.match(tk_cls.symbol) and '{' not in symbol ] path = '/'.join(ambiguous_names) root_token = self.parser.parse(path) for tk in root_token.iter(): self.assertIn(tk.symbol, ('(root)', '/', '(name)'), msg=tk.symbol) for path in ambiguous_names: root_token = self.parser.parse(path) for tk in root_token.iter(): self.assertEqual(tk.symbol, '(name)', msg=tk.symbol) class XPath2TokenTest(XPath1TokenTest): @classmethod def setUpClass(cls): cls.parser = XPath2Parser(namespaces={'xs': XSD_NAMESPACE, 'tst': "http://xpath.test/ns"}) def test_bind_namespace_method(self): token = self.parser.parse('true()') self.assertIsNone(token.bind_namespace(XPATH_FUNCTIONS_NAMESPACE)) with self.assertRaises(TypeError) as ctx: 
token.bind_namespace(XSD_NAMESPACE) self.assertIn('XPST0017', str(ctx.exception)) self.assertIn("a name, a wildcard or a constructor function", str(ctx.exception)) token = self.parser.parse("xs:string(10.1)") with self.assertRaises(TypeError) as ctx: token.bind_namespace(XSD_NAMESPACE) self.assertIn('XPST0017', str(ctx.exception)) self.assertIn("a name, a wildcard or a constructor function", str(ctx.exception)) self.assertIsNone(token[1].bind_namespace(XSD_NAMESPACE)) with self.assertRaises(TypeError) as ctx: token[1].bind_namespace(XPATH_FUNCTIONS_NAMESPACE) self.assertIn("a function expected", str(ctx.exception)) token = self.parser.parse("tst:foo") with self.assertRaises(TypeError) as ctx: token.bind_namespace('http://xpath.test/ns') self.assertIn('XPST0017', str(ctx.exception)) self.assertIn("a name, a wildcard or a function", str(ctx.exception)) @unittest.skipIf(xmlschema is None, "xmlschema library required.") def test_xsd_type_labeling(self): schema = xmlschema.XMLSchema(""" """) self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema) try: context = XPathSchemaContext(root=schema, axis='self', schema=self.parser.schema) self.assertListEqual(list(context.iter_matching_nodes('root')), []) tag = '{%s}schema' % XSD_NAMESPACE self.assertListEqual( list(e.elem for e in context.iter_matching_nodes(tag)), [schema] ) finally: self.parser.schema = None @unittest.skipIf(xmlschema is None, "xmlschema library required.") def test_match_xsd_type(self): schema = xmlschema.XMLSchema(""" """) self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema) try: root_token = self.parser.parse('root') context = XPathSchemaContext(root=schema) obj = list(context.iter_matching_nodes('root')) self.assertIsInstance(obj[0], ElementNode) context.axis = 'self' root_token.xsd_types = None list(context.iter_matching_nodes('root')) self.assertIsNone(root_token.xsd_types) context.axis = None obj = list(context.iter_matching_nodes('root')) self.assertIsInstance(obj[0], ElementNode) context 
= XPathSchemaContext(root=schema.meta_schema) obj = list(context.iter_matching_nodes('root')) self.assertListEqual(obj, []) root_token = self.parser.parse('@a') context = XPathSchemaContext(root=schema.meta_schema, axis='self') xsd_attribute = schema.attributes['a'] context.item = AttributeNode('a', xsd_attribute) setattr(context.item, 'xsd_type', xsd_attribute.type) obj = list(context.iter_matching_nodes('a')) self.assertIsInstance(obj[0], AttributeNode) self.assertIsNotNone(obj[0].xsd_type) root_token.xsd_types = None context = XPathSchemaContext(root=schema) list(context.iter_matching_nodes('a')) self.assertIsNone(root_token.xsd_types) context = XPathSchemaContext(root=schema.meta_schema, axis='self') attribute = context.item = SchemaAttributeNode(schema.attributes['a']) obj = list(context.iter_matching_nodes('a')) self.assertIsInstance(obj[0], AttributeNode) self.assertEqual(obj[0], attribute) self.assertIsInstance(obj[0].value, xmlschema.XsdAttribute) self.assertIsInstance(next(iter(obj[0].iter_typed_values), None), str) finally: self.parser.schema = None def test_string_value_function(self): super(XPath2TokenTest, self).test_string_value_function() if xmlschema is not None: schema = xmlschema.XMLSchema(""" """) token = self.parser.parse('.') self.parser.schema = xmlschema.xpath.XMLSchemaProxy(schema) context = XPathContext(schema) try: root = context.root[0] value = token.string_value(root) # 'root' element self.assertIsInstance(value, str) self.assertEqual(value, '1') finally: self.parser.schema = None class XPath30TokenTest(XPath2TokenTest): @classmethod def setUpClass(cls): cls.parser = XPath30Parser(namespaces={'xs': XSD_NAMESPACE, 'tst': "http://xpath.test/ns"}) class XPath31TokenTest(XPath30TokenTest): @classmethod def setUpClass(cls): cls.parser = XPath31Parser(namespaces={'xs': XSD_NAMESPACE, 'tst': "http://xpath.test/ns"}) if __name__ == '__main__': unittest.main() 
sissaschool-elementpath-d3688c7/tests/xpath_test_class.py000066400000000000000000000300351476131650400240140ustar00rootroot00000000000000#!/usr/bin/env python # # Copyright (c), 2018-2020, SISSA (International School for Advanced Studies). # All rights reserved. # This file is distributed under the terms of the MIT License. # See the file 'LICENSE' in the root directory of the present # distribution, or http://opensource.org/licenses/MIT. # # @author Davide Brunato # # # Note: Many tests are built using the examples of the XPath standards, # published by W3C under the W3C Document License. # # References: # http://www.w3.org/TR/1999/REC-xpath-19991116/ # http://www.w3.org/TR/2010/REC-xpath20-20101214/ # http://www.w3.org/TR/2010/REC-xpath-functions-20101214/ # https://www.w3.org/Consortium/Legal/2015/doc-license # https://www.w3.org/TR/charmod-norm/ # import unittest import math from copy import copy from contextlib import contextmanager from xml.etree import ElementTree from elementpath import ElementPathError, XPath2Parser, XPathContext, \ XPathFunction, select from elementpath.namespaces import XML_NAMESPACE, XSD_NAMESPACE, \ XSI_NAMESPACE, XPATH_FUNCTIONS_NAMESPACE class DummyXsdType: name = local_name = None @property def root_type(self): return self @property def simple_type(self): return self def is_matching(self, name, default_namespace): pass def is_empty(self): pass def is_simple(self): pass def has_simple_content(self): pass def has_mixed_content(self): pass def is_element_only(self): pass def is_key(self): pass def is_qname(self): pass def is_notation(self): pass def decode(self, obj, *args, **kwargs): pass def validate(self, obj, *args, **kwargs): pass # noinspection PyPropertyAccess class XPathTestCase(unittest.TestCase): namespaces = { 'xml': XML_NAMESPACE, 'xs': XSD_NAMESPACE, 'xsi': XSI_NAMESPACE, 'fn': XPATH_FUNCTIONS_NAMESPACE, 'eg': 'http://www.example.com/ns/', 'tst': 'http://xpath.test/ns', } variables = { 'values': [10, 20, 5], 'myaddress': 
'admin@example.com', 'word': 'alpha', } etree = ElementTree def setUp(self): self.parser = XPath2Parser(self.namespaces) # # Helper methods def check_tokenizer(self, path, expected): """ Checks the list of lexemes generated by the parser tokenizer. :param path: the XPath expression to be checked. :param expected: a list with lexemes generated by the tokenizer. """ self.assertEqual([ lit or symbol or name or unexpected for lit, symbol, name, unexpected in self.parser.__class__.tokenizer.findall(path) ], expected) def check_token(self, symbol, expected_label=None, expected_str=None, expected_repr=None, value=None): """ Checks a token class of an XPath parser class. The instance of the token is created using the value argument and then is checked against other optional arguments. :param symbol: the string that identifies the token class in the parser's symbol table. :param expected_label: the expected label for the token instance. :param expected_str: the expected string conversion of the token instance. :param expected_repr: the expected string representation of the token instance. :param value: the value used to create the token instance. """ token = self.parser.symbol_table[symbol](self.parser, value) self.assertEqual(token.symbol, symbol) if expected_label is not None: self.assertEqual(token.label, expected_label) if expected_str is not None: self.assertEqual(str(token), expected_str) if expected_repr is not None: self.assertEqual(repr(token), expected_repr) def check_tree(self, path, expected): """ Checks the tree string representation of a parsed path. :param path: an XPath expression. :param expected: the expected result string. """ token = self.parser.parse(path) self.assertEqual(token.tree, expected) def check_source(self, path, expected=None): """ Checks the source representation of a parsed path. :param path: an XPath expression. :param expected: the expected result string.
""" token = self.parser.parse(path) self.assertEqual(token.source, expected or path) def check_value(self, path, expected=None, context=None): """ Checks the result of the *evaluate* method with an XPath expression. The evaluation is applied on the root token of the parsed XPath expression. :param path: an XPath expression. :param expected: the expected result. Can be a data instance to compare to the result, \ a type to be used to check the type of the result, a function that accepts the result \ as argument and returns a boolean value, an exception class that is raised by running \ the evaluate method. :param context: an optional `XPathContext` instance to be passed to evaluate method. """ context = copy(context) try: root_token = self.parser.parse(path) except ElementPathError as err: if isinstance(expected, type) and isinstance(err, expected): return raise if expected is None: self.assertEqual(root_token.evaluate(context), []) elif isinstance(expected, type): if issubclass(expected, Exception): self.assertRaises(expected, root_token.evaluate, context) else: self.assertIsInstance(root_token.evaluate(context), expected) elif isinstance(expected, float): value = root_token.evaluate(context) if not math.isnan(expected): self.assertAlmostEqual(value, expected) else: if isinstance(value, list): value = [x for x in value if x is not None and x != []] self.assertTrue(len(value) == 1) value = value[0] self.assertIsInstance(value, float) self.assertTrue(math.isnan(value)) elif isinstance(expected, list): self.assertListEqual(root_token.evaluate(context), expected) elif isinstance(expected, set): self.assertEqual(set(root_token.evaluate(context)), expected) elif isinstance(expected, XPathFunction) or not callable(expected): self.assertEqual(root_token.evaluate(context), expected) else: self.assertTrue(expected(root_token.evaluate(context))) def check_select(self, path, expected, context=None): """ Checks the materialized result of the *select* method with an XPath
expression. The selection is applied on the root token of the parsed XPath expression. :param path: an XPath expression. :param expected: the expected result. Can be a data instance to compare to the result, \ a function that accepts the result as argument and returns a boolean value, an exception \ class that is raised by running the select method. :param context: an optional `XPathContext` instance to be passed to the select method. If no \ context is provided the method is called with a dummy context. """ if context is None: context = XPathContext(root=self.etree.Element(u'dummy_root')) else: context = copy(context) root_token = self.parser.parse(path) if isinstance(expected, type) and issubclass(expected, Exception): self.assertRaises(expected, root_token.select, context) elif isinstance(expected, list): self.assertListEqual(list(root_token.select(context)), expected) elif isinstance(expected, set): self.assertEqual(set(root_token.select(context)), expected) elif callable(expected): self.assertTrue(expected(list(root_token.parser.parse(path).select(context)))) else: self.assertEqual(list(root_token.select(context)), expected) # must fail def check_selector(self, path, root, expected, namespaces=None, **kwargs): """ Checks using the selector API, namely the *select* function at package level. :param path: an XPath expression. :param root: an Element or an ElementTree instance. :param expected: the expected result. Can be a data instance to compare to the result, \ a type to be used to check the type of the result, a function that accepts the result \ as argument and returns a boolean value, an exception class that is raised by running \ the evaluate method. :param namespaces: an optional mapping from prefixes to namespace URIs. :param kwargs: other optional arguments for the parser class.
""" if isinstance(expected, type) and issubclass(expected, Exception): self.assertRaises(expected, select, root, path, namespaces, self.parser.__class__, **kwargs) else: results = select(root, path, namespaces, self.parser.__class__, **kwargs) if isinstance(expected, list): self.assertListEqual(results, expected) elif isinstance(expected, set): self.assertEqual(set(results), expected) elif isinstance(expected, float): if math.isnan(expected): self.assertTrue(math.isnan(results)) else: self.assertAlmostEqual(results, expected) elif not callable(expected): self.assertEqual(results, expected) elif isinstance(expected, type): self.assertIsInstance(results, expected) else: self.assertTrue(expected(results)) @contextmanager def schema_bound_parser(self, schema_proxy): # Code to acquire resource, e.g.: self.parser.schema = schema_proxy try: yield self.parser finally: self.parser.schema = None @contextmanager def xsd_version_parser(self, xsd_version): xsd_version, self.parser._xsd_version = self.parser._xsd_version, xsd_version try: yield self.parser finally: self.parser._xsd_version = xsd_version # Wrong XPath expression checker shortcuts def check_raise(self, path, exception_class, *message_parts, context=None): with self.assertRaises(exception_class) as error_context: root_token = self.parser.parse(path) root_token.evaluate(copy(context)) for part in message_parts: self.assertIn(part, str(error_context.exception)) def wrong_syntax(self, path, *message_parts, context=None): with self.assertRaises(SyntaxError) as error_context: root_token = self.parser.parse(path) root_token.evaluate(copy(context)) for part in message_parts: self.assertIn(part, str(error_context.exception)) def wrong_value(self, path, *message_parts, context=None): with self.assertRaises(ValueError) as error_context: root_token = self.parser.parse(path) root_token.evaluate(copy(context)) for part in message_parts: self.assertIn(part, str(error_context.exception)) def wrong_type(self, path, *message_parts, 
context=None): with self.assertRaises(TypeError) as error_context: root_token = self.parser.parse(path) root_token.evaluate(copy(context)) for part in message_parts: self.assertIn(part, str(error_context.exception)) def wrong_name(self, path, *message_parts, context=None): with self.assertRaises(NameError) as error_context: root_token = self.parser.parse(path) root_token.evaluate(copy(context)) for part in message_parts: self.assertIn(part, str(error_context.exception)) if __name__ == '__main__': unittest.main() sissaschool-elementpath-d3688c7/tox.ini000066400000000000000000000070741476131650400202520ustar00rootroot00000000000000# Tox (http://tox.testrun.org/) is a tool for running tests # in multiple virtualenvs. This configuration file will run the # test suite on all supported python versions. To use it, "pip install tox" # and then run "tox" from this directory. [tox] min_version = 4.0 envlist = py{38,39,310,311,312,313,314}, pypy3, docs, flake8, mypy-py{38,39,310,311,312,313,py3}, pytest, coverage, xmlschema{251,302,310,321,332,342,343}, w3c-xsdtests skip_missing_interpreters = true work_dir = {tox_root}/../.tox/elementpath [testenv] deps = lxml lxml-stubs xmlschema>=3.0.2 docs: Sphinx coverage: coverage set_env = py313: TEST_UNICODE_INSTALLATION = 6.2.0 commands = python -m unittest [testenv:py314] deps = elementpath>=4.4.0, <5.0.0 jinja2 [testenv:docs] commands = make -C doc html SPHINXOPTS="-W -n" make -C doc latexpdf SPHINXOPTS="-W -n" make -C doc doctest SPHINXOPTS="-W -n" sphinx-build -W -n -T -b man doc build/sphinx/man allowlist_externals = make [flake8] max-line-length = 100 [testenv:flake8] deps = flake8 commands = flake8 elementpath flake8 tests [testenv:mypy-py38] deps = mypy==1.14.1 xmlschema>=3.1.0 lxml-stubs commands = mypy --strict elementpath python tests/test_typing.py [testenv:mypy-py{39,310,311,312,313,py3}] deps = mypy==1.15.0 xmlschema>=3.1.0 lxml-stubs commands = mypy --strict elementpath python tests/test_typing.py [testenv:coverage] 
commands = coverage run -p -m unittest coverage combine coverage report -m [testenv:pytest] deps = pytest pytest-randomly lxml lxml-stubs xmlschema>=3.0.2 commands = pytest tests -ra [testenv:xmlschema{251,302,310,321,332,342,343}] description = Run xmlschema tests and mypy on xmlschema source (>=3.1.0) platform = (linux|darwin) set_env = xmlschema251: VERSION = 2.5.1 xmlschema302: VERSION = 3.0.2 xmlschema310: VERSION = 3.1.0 xmlschema321: VERSION = 3.2.1 xmlschema332: VERSION = 3.3.2 xmlschema342: VERSION = 3.4.2 xmlschema343: VERSION = 3.4.3 change_dir = {env_tmp_dir} deps = mypy==1.13.0 lxml lxml-stubs jinja2 xmlschema=={env:VERSION} commands = pip download xmlschema=={env:VERSION} --no-deps --no-binary xmlschema tar xzf xmlschema-{env:VERSION}.tar.gz --strip-components=1 bash -c 'if [[ "{env:VERSION}" > "3.4.2" ]]; then mypy --strict --disable-error-code attr-defined xmlschema; fi' sed -i -e "s/Can't pickle/Can't/g" tests/validators/test_schemas.py # Patch the failure python tests/test_all.py allowlist_externals = bash sed tar ignore_outcome = True [testenv:w3c-xsdtests] description = Run W3C XSD 1.0/1.1 tests using xmlschema==3.4.3 platform = (linux|darwin) set_env = VERSION = 3.4.3 COMMIT = 4293d6fb026af778aa7ad381c2a310354578cbe3 CHECKSUM = 3c7a44dbb59553d09ba96fee898255be78966960c22e9a7886c0b426a03255d7 change_dir = {env_tmp_dir} deps = lxml xmlschema=={env:VERSION} commands = pip download xmlschema=={env:VERSION} --no-deps --no-binary xmlschema tar xzf xmlschema-{env:VERSION}.tar.gz curl -L -o w3c-xsdtests.tar.gz https://github.com/w3c/xsdtests/tarball/{env:COMMIT} bash -c "sha256sum w3c-xsdtests.tar.gz | grep {env:CHECKSUM}" mkdir xsdtests tar xzf w3c-xsdtests.tar.gz -C xsdtests --strip-components=1 python xmlschema-{env:VERSION}/tests/test_w3c_suite.py --xml allowlist_externals = bash curl grep tar mkdir sha256sum ignore_outcome = True [testenv:build] deps = setuptools wheel build commands = python -m build