## `soupsieve.select()`
```py3
def select(select, tag, namespaces=None, limit=0, flags=0, **kwargs):
    """Select the specified tags."""
```
`select` returns all tags under the given tag that match the provided CSS selector. You can also limit the number of
tags returned by passing a positive integer via the `limit` parameter (0 means to return all tags).
`select` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces) dictionary,
a `limit`, and `flags`.
```pycon3
>>> import soupsieve as sv
>>> sv.select('p:is(.a, .b, .c)', soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
```
## `soupsieve.iselect()`
```py3
def iselect(select, node, namespaces=None, limit=0, flags=0, **kwargs):
    """Select the specified tags."""
```
`iselect` is exactly like `select` except that it returns a generator instead of a list.
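
As a rough illustration of the difference, the sketch below runs both functions against a small, made-up document. The markup and the choice of `html.parser` are assumptions for the example only.

```python
import bs4
import soupsieve as sv

# Hypothetical markup, used only for illustration.
markup = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# `select` builds the whole list up front; `limit=2` stops after two matches.
first_two = [el.get_text() for el in sv.select('p', soup, limit=2)]

# `iselect` yields matches one at a time, so the search can be abandoned early.
matches = sv.iselect('p', soup)
first = next(matches).get_text()

print(first_two)
print(first)
```
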
## `soupsieve.closest()`
```py3
def closest(select, tag, namespaces=None, flags=0, **kwargs):
    """Match closest ancestor to the provided tag."""
```
`closest` returns the tag closest to the given tag that matches the given selector. The element found must be a direct
ancestor of the tag or the tag itself.
`closest` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces)
dictionary, and `flags`.
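
The three possible outcomes of `closest` can be sketched as follows. The markup is hypothetical and `html.parser` is assumed only to keep the example dependency-free.

```python
import bs4
import soupsieve as sv

# Hypothetical markup, used only for illustration.
markup = '<div id="outer"><section><p class="c">Mouse</p></section></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')
el = sv.select_one('p.c', soup)

nearest_div = sv.closest('div', el)   # an ancestor further up the tree
itself = sv.closest('p', el)          # the tag itself can be the match
missing = sv.closest('article', el)   # nothing matches, so None is returned

print(nearest_div['id'], itself is el, missing)
```
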
## `soupsieve.match()`
```py3
def match(select, tag, namespaces=None, flags=0, **kwargs):
    """Match node."""
```
`match` tests a given tag against a given CSS selector and returns `True` if the tag matches, `False` otherwise.
`match` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces) dictionary,
and `flags`.
```pycon3
>>> nodes = sv.select('p:is(.a, .b, .c)', soup)
>>> sv.match('p:not(.b)', nodes[0])
True
>>> sv.match('p:not(.b)', nodes[1])
False
```
## `soupsieve.filter()`
```py3
def filter(select, nodes, namespaces=None, flags=0, **kwargs):
    """Filter list of nodes."""
```
`filter` takes an iterable containing HTML nodes and will filter them based on the provided CSS selector string. If
given a `Tag`/`BeautifulSoup` object, it will iterate the direct children filtering them.
`filter` accepts a CSS selector string, an iterable containing nodes, an optional [namespace](#namespaces) dictionary,
and `flags`.
```pycon3
>>> sv.filter('p:not(.b)', soup.div)
[<p class="a">Cat</p>, <p class="c">Mouse</p>]
```
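
The two input styles behave slightly differently, which the following sketch tries to show. The markup is made up for the example, and `html.parser` is assumed for simplicity.

```python
import bs4
import soupsieve as sv

# Hypothetical markup: two paragraphs inside a div, one outside of it.
markup = '<div><p class="a">Cat</p><p class="b">Dog</p></div><p class="c">Mouse</p>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# An arbitrary iterable of tags: each item is tested against the selector.
tags = soup.find_all('p')
from_iterable = [el.get_text() for el in sv.filter('p:not(.b)', tags)]

# A single Tag: only its direct children are tested.
from_tag = [el.get_text() for el in sv.filter('p:not(.b)', soup.div)]

print(from_iterable)
print(from_tag)
```
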
## `soupsieve.escape()`
```py3
def escape(ident):
    """Escape CSS identifier."""
```
`escape` is used to escape CSS identifiers. It follows the [CSS specification][cssom] and escapes any character that
would normally cause an identifier to be invalid.
```pycon3
>>> sv.escape(".foo#bar")
'\\.foo\\#bar'
>>> sv.escape("()[]{}")
'\\(\\)\\[\\]\\{\\}'
>>> sv.escape('--a')
'--a'
>>> sv.escape('0')
'\\30 '
>>> sv.escape('\0')
'�'
```
/// new | New in 1.9.0
`escape` is a new API function added in 1.9.0.
///
## `soupsieve.compile()`
```py3
def compile(pattern, namespaces=None, flags=0, **kwargs):
    """Compile CSS pattern."""
```
`compile` will pre-compile a CSS selector pattern returning a `SoupSieve` object. The `SoupSieve` object has the same
selector functions available via the module without the need to specify the selector, namespaces, or flags.
```py3
class SoupSieve:
    """Match tags in Beautiful Soup with CSS selectors."""

    def match(self, tag):
        """Match."""

    def closest(self, tag):
        """Match closest ancestor."""

    def filter(self, iterable):
        """Filter."""

    def select_one(self, tag):
        """Select a single tag."""

    def select(self, tag, limit=0):
        """Select the specified tags."""

    def iselect(self, tag, limit=0):
        """Iterate the specified tags."""
```
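
A minimal sketch of using a compiled pattern is shown below. The markup is hypothetical; the point is only that the selector is supplied once and the resulting object is reused.

```python
import bs4
import soupsieve as sv

# Hypothetical markup, used only for illustration.
markup = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Compile once, then call the methods without repeating the selector.
pattern = sv.compile('p:not(.b)')

selected = [el.get_text() for el in pattern.select(soup)]
first = pattern.select_one(soup).get_text()
dog_matches = pattern.match(soup.find('p', class_='b'))

print(selected, first, dog_matches)
```
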
## `soupsieve.purge()`
Soup Sieve caches compiled patterns for performance. If, for whatever reason, you need to purge the cache, simply call
`purge`.
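
Purging is not expected to change behavior, only to discard cached work. A small sketch, assuming nothing beyond the public `compile` and `purge` functions:

```python
import soupsieve as sv

before = sv.compile('p.a')

# Drop every cached pattern; later calls simply compile again.
sv.purge()

after = sv.compile('p.a')
same_pattern = before.pattern == after.pattern
print(same_pattern)
```
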
## Custom Selectors
The custom selector feature is loosely inspired by the `css-extensions` [proposal][custom-extensions-1]. In its current
form, Soup Sieve allows assigning a complex selector to a custom pseudo-class name. The pseudo-class name must start
with `:--` to avoid conflicts with any future pseudo-classes.
To create custom selectors, you simply need to pass a dictionary containing the custom pseudo-class names (keys) with
the associated CSS selectors that the pseudo-classes are meant to represent (values). It is important to remember that
pseudo-class names are not case sensitive, so even though a dictionary will allow you to specify multiple keys with the
same name (as long as the character cases are different), Soup Sieve will not and will throw an exception if you attempt
to do so.
In the following example, we will define our own custom selector called `#!css :--header` that will be an alias for
`#!css h1, h2, h3, h4, h5, h6`.
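```py3
import soupsieve as sv
import bs4

markup = """
<h1>Header 1</h1>
<h2>Header 2</h2>
<p></p>
<p><span>child</span></p>
"""

soup = bs4.BeautifulSoup(markup, 'html5lib')
custom = {":--header": "h1, h2, h3, h4, h5, h6"}
# Expected output: [<h1>Header 1</h1>, <h2>Header 2</h2>]
print(sv.select(':--header', soup, custom=custom))
```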
Custom selectors can also be dependent upon other custom selectors. You don't have to worry about the order in the
dictionary as custom selectors will be compiled "just in time" when they are needed. Be careful, though: if you create
a circular dependency, you will get a `SelectorSyntaxError`.
Assuming the same markup as in the first example, we will now create a custom selector that finds any element that
has child elements; we will call the selector `:--parent`. Then we will create another selector called
`:--parent-paragraph` that uses the `:--parent` selector to find `#!html <p>` elements that are also parents:
```py3
custom = {
    ":--parent": ":has(> *|*)",
    ":--parent-paragraph": "p:--parent"
}
print(sv.select(':--parent-paragraph', soup, custom=custom))
```
The above code will yield the only paragraph that is a parent:
```
[<p><span>child</span></p>]
```
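
The circular dependency case mentioned earlier can be sketched like this. The selector names `:--first` and `:--second` are hypothetical and exist only to refer to each other.

```python
import bs4
import soupsieve as sv

soup = bs4.BeautifulSoup('<p><span>child</span></p>', 'html.parser')

# Hypothetical custom selectors that depend on each other.
circular = {':--first': ':--second', ':--second': ':--first'}

try:
    sv.select(':--first', soup, custom=circular)
    error = None
except sv.SelectorSyntaxError as exc:
    # The dependency can never be resolved, so compilation fails.
    error = exc

print(type(error).__name__)
```
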
## Namespaces
Many of Soup Sieve's selector functions take an optional namespace dictionary. Namespaces, just like CSS, must be
defined for Soup Sieve to evaluate `ns|tag` type selectors. This is analogous to CSS's namespace at-rule:
```css
@namespace url("http://www.w3.org/1999/xhtml");
@namespace svg url("http://www.w3.org/2000/svg");
```
A namespace dictionary should have keys (prefixes) and values (namespaces). An empty string for a key denotes the
default namespace. An empty value essentially represents a null namespace. To represent the above CSS example for
Soup Sieve, we would configure it like so:
```py3
namespace = {
    "": "http://www.w3.org/1999/xhtml",  # Default namespace is for XHTML
    "svg": "http://www.w3.org/2000/svg",  # The SVG namespace defined with prefix of "svg"
}
```
Prefixes used in the namespace dictionary do not have to match the prefixes in the document. The provided prefix is
never compared against the prefixes in the document, only the namespaces are compared. The prefixes in the document are
only there for the parser to know which tags get which namespace. And the prefixes in the namespace dictionary are only
defined in order to provide an alias for the namespaces when using the namespace selector syntax: `ns|name`.
Tags do not necessarily have to have a prefix for Soup Sieve to recognize them either. For instance, in HTML5, SVG
*should* automatically get the SVG namespace. Depending how namespaces were defined in the document, tags may inherit
namespaces in some conditions. Namespace assignment is mainly handled by the parser and exposed through the Beautiful
Soup API. Soup Sieve uses the Beautiful Soup API to then compare namespaces for supported documents.
# Beautiful Soup Differences
Soup Sieve is the official CSS "select" implementation of Beautiful Soup 4.7.0+. While the inclusion of Soup Sieve fixes
many issues and greatly expands CSS support in Beautiful Soup, it does introduce some differences which may surprise
some who've become accustomed to the old "select" implementation.
Beautiful Soup's old select method had numerous limitations and quirks that do not align with the actual CSS
specifications. Most are insignificant, but there are a couple differences that people over the years had come to rely
on. Soup Sieve, which aims to follow the CSS specification closely, does not support these differences.
## Attribute Values
Beautiful Soup was very relaxed when it came to attribute values in selectors: `#!css [attribute=value]`. Beautiful
Soup would allow almost anything for a valid unquoted value. Soup Sieve, on the other hand, follows the CSS
specification and requires that a value be a valid identifier, or it must be quoted. If you get an error complaining
about a malformed attribute, you may need to quote the value.
For instance, if you previously used a selector like this:
```py3
soup.select('[attr={}]')
```
You would need to quote the value as `{}` is not a valid CSS identifier, so it must be quoted:
```py3
soup.select('[attr="{}"]')
```
You can also use the [escape](./api.md#soupsieveescape) function to escape dynamic content:
```py3
import soupsieve
soup.select('[attr=%s]' % soupsieve.escape('{}'))
```
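
The three variants can be compared side by side in a short sketch. The `data-key` attribute and the markup are hypothetical, chosen only so that the value `{}` actually appears in the document.

```python
import bs4
import soupsieve as sv

# Hypothetical markup with an attribute value that is not a valid identifier.
markup = '<div data-key="{}">braces</div><div data-key="plain">plain</div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

value = '{}'  # stands in for dynamic content

try:
    sv.select('[data-key=' + value + ']', soup)
    unquoted_error = None
except sv.SelectorSyntaxError as exc:
    # Unquoted, the value makes the attribute selector malformed.
    unquoted_error = exc

quoted = [el.get_text() for el in sv.select('[data-key="' + value + '"]', soup)]
escaped = [el.get_text() for el in sv.select('[data-key=' + sv.escape(value) + ']', soup)]

print(quoted, escaped)
```
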
## CSS Identifiers
Since Soup Sieve follows the CSS specification, class names, id names, tag names, etc. must be valid identifiers. Since
identifiers, according to the CSS specification, cannot *start* with a number, some users may find that their old class,
id, or tag name selectors that started with numbers will not work. To specify such selectors, you'll have to use CSS
escapes.
So if you used to use:
```py3
soup.select('.2class')
```
You would need to update with:
```py3
soup.select(r'.\32 class')
```
Numbers in the middle or at the end of a class will work as they always did:
```py3
soup.select('.class2')
```
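
Rather than writing the escape by hand, `escape` can produce it. The following sketch uses made-up markup and is meant only to show that the escaped form and the hand-written form agree.

```python
import bs4
import soupsieve as sv

# Hypothetical markup: one class starts with a digit, the other ends with one.
markup = '<p class="2class">numbered</p><p class="class2">plain</p>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

escaped = sv.escape('2class')  # the leading digit becomes a hex escape
numbered = [el.get_text() for el in sv.select('.' + escaped, soup)]
plain = [el.get_text() for el in sv.select('.class2', soup)]

try:
    sv.select('.2class', soup)
    raw_error = None
except sv.SelectorSyntaxError as exc:
    # An identifier cannot start with an unescaped digit.
    raw_error = exc

print(repr(escaped), numbered, plain)
```
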
## Relative Selectors
Whether on purpose or by accident, Beautiful Soup used to allow relative selectors:
```py3
soup.select('> div')
```
The above is not a valid CSS selector according to the CSS specifications. Relative selector lists have only recently been
added to the CSS specifications, and they are only allowed in a `#!css :has()` pseudo-class:
```css
article:has(> div)
```
But, in the level 4 CSS specifications, the `:scope` pseudo-class has been added which allows for the same feel as using
`#!css > div`. Since Soup Sieve supports the `:scope` pseudo-class, it can be used to produce the same behavior as the
legacy select method.
In CSS, the `:scope` pseudo-class represents the element that the CSS select operation is called on. In supported
browsers, the following JavaScript example would treat `:scope` as the element that `el` references:
```js
el.querySelectorAll(':scope > .class')
```
Just like in the JavaScript example above, Soup Sieve would also treat `:scope` as the element that `el` references:
```py3
el.select(':scope > .class')
```
In the case where the element is the document node, `:scope` would simply represent the root element of the document.
So, if you used to have selectors such as:
```py3
soup.select('> div')
```
You can simply add `:scope`, and it should work the same:
```py3
soup.select(':scope > div')
```
While this will generally give you what is expected for relative descendant selectors, this will not work for
sibling selectors; the reasons why are covered in more detail in [Out of Scope Selectors](#out-of-scope-selectors).
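
As a small sketch of how `:scope` narrows a query to direct children, consider the nested, hypothetical markup below (`html.parser` is assumed only for convenience).

```python
import bs4

# Hypothetical markup with three nested div elements.
markup = '<div id="outer"><div id="inner"><div id="deep"></div></div></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')
outer = soup.select_one('#outer')

# `:scope` refers to `outer`, so only its direct div children match.
children = [el['id'] for el in outer.select(':scope > div')]

# Without `:scope`, every div underneath `outer` matches.
descendants = [el['id'] for el in outer.select('div')]

print(children)
print(descendants)
```
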
## Out of Scope Selectors
In a browser, when requesting a selector via `querySelectorAll`, the element that `querySelectorAll` is called on is
the *scoped* element. So in the following example, `el` is the *scoped* element.
```js
el.querySelectorAll('.class')
```
This same concept applies to Soup Sieve, where the element that `select` or `select_one` is called on is also the
*scoped* element. So in the following example, `el` is also the *scoped* element:
```py3
el.select('.class')
```
In browsers, `querySelectorAll` and `querySelector` only return elements under the *scoped* element. They do not return
the *scoped* element itself, its parents, or its siblings. Only when `querySelectorAll` or `querySelector` is called on
the document node will it return the *scoped* selector, which would be the *root* element, as the query is being called
on the document itself and not the *scoped* element.
Soup Sieve aims to essentially mimic the browser functions such as `querySelector`, `querySelectorAll`, `matches`, etc.
In Soup Sieve `select` and `select_one` are analogous to `querySelectorAll` and `querySelector` respectively. For this
reason, Soup Sieve also only returns elements under the *scoped* element. The idea is to provide a familiar interface
that behaves, as close as possible, to what people familiar with CSS selectors are used to.
So while Soup Sieve will find elements relative to `:scope` with `>` or the descendant combinator (whitespace):
```py3
soup.select(':scope > div')
```
It will not find elements relative to `:scope` with `+` or `~` as siblings to the *scoped* element are not under the
*scoped* element:
```py3
soup.select(':scope + div')
```
This is by design and is in line with the behavior exhibited in all web browsers.
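
The difference can be seen in a short sketch. The markup is hypothetical: one element with a child, followed by a sibling.

```python
import bs4

# Hypothetical markup: #a has a child, and #b is its next sibling.
markup = '<div id="a"><p id="child"></p></div><div id="b"></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')
el = soup.select_one('#a')

# The child is under the scoped element, so it is found.
inside = [tag['id'] for tag in el.select(':scope > p')]

# The sibling is not under the scoped element, so nothing is returned.
siblings = list(el.select(':scope + div'))

# Called on the document, the same sibling is reachable.
from_document = [tag['id'] for tag in soup.select('#a + div')]

print(inside, siblings, from_document)
```
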
## Selected Element Order
Another quirk of Beautiful Soup's old implementation was that it returned the HTML nodes in the order of how the
selectors were defined. For instance, Beautiful Soup, if given the pattern `#!css article, body` would first return
`#!html <article>` and then `#!html <body>`.
Soup Sieve does not, and frankly cannot, honor Beautiful Soup's old ordering convention due to the way it is designed.
Soup Sieve returns the nodes in the order they are defined in the document as that is how the elements are searched.
This is much more efficient and provides better performance.
So, given the earlier selector pattern of `article, body`, Soup Sieve would return the element `#!html <body>` and then
`#!html <article>` as that is how it is ordered in the HTML document.
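
A quick way to see the document ordering is the sketch below, which uses a minimal, made-up document and prints the tag names in the order they are returned.

```python
import bs4
import soupsieve as sv

# Hypothetical document: the article is nested inside the body.
markup = '<html><body><article>Story</article></body></html>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# The selector lists `article` first, but results follow document order.
order = [el.name for el in sv.select('article, body', soup)]
print(order)
```
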
# Frequently Asked Questions
## Why do selectors not work the same in Beautiful Soup 4.7+?
Soup Sieve is the official CSS selector library in Beautiful Soup 4.7+, and with this change, Soup Sieve introduces a
number of changes that break some of the expected behaviors that existed in versions prior to 4.7.
In short, Soup Sieve follows the CSS specifications fairly closely, and this broke a number of non-standard behaviors.
These non-standard behaviors were not allowed according to the CSS specifications. Soup Sieve has no intentions of
bringing back these behaviors.
For more details on specific changes, and the reasoning why a specific change is considered a good change, or simply a
feature that Soup Sieve cannot/will not support, see [Beautiful Soup Differences](./differences.md).
## How does `iframe` handling work?
In web browsers, CSS selectors do not usually select content inside an `iframe` element if the selector is called on an
element outside of the `iframe`. Each HTML document is usually encapsulated and CSS selector leakage across this
`iframe` boundary is usually prevented.
In its current iteration, Soup Sieve is not aware of the origin of the documents in the `iframe`, and Soup Sieve will
not prevent selectors from crossing these boundaries. Soup Sieve is not used to style documents, but to scrape
documents. For this reason, it seems to be more helpful to allow selector combinators to cross these boundaries.
Soup Sieve isn't entirely unaware of `iframe` elements though. It was noticed that some pseudo-classes behaved in
unexpected ways without awareness of `iframes`; this was fixed in Soup Sieve 1.9.1. Pseudo-classes such
as [`:default`](./selectors/pseudo-classes.md#:default), [`:indeterminate`](./selectors/pseudo-classes.md#:indeterminate),
[`:dir()`](./selectors/pseudo-classes.md#:dir), [`:lang()`](./selectors/pseudo-classes.md#:lang),
[`:root`](./selectors/pseudo-classes.md#:root), and [`:contains()`](./selectors/pseudo-classes.md#:contains) were
given awareness of `iframes` to ensure they behaved properly and returned the expected elements. This doesn't mean that
`select` won't return elements in `iframes`, but it won't allow something like `:default` to select a `button` in an
`iframe` whose parent `form` is outside the `iframe`. Or better put, a default `button` will be evaluated in the context
of the document it is in.
With all of this said, if your selectors have issues with `iframes`, it is most likely because `iframes` are handled
differently by different parsers. `html.parser` will usually parse `iframe` elements as it sees them. `lxml` parser will
often remove `html` and `body` tags of an `iframe` HTML document. `lxml-xml` will simply ignore the content in an XHTML
document. And `html5lib` will HTML escape the content of an `iframe` making traversal impossible.
In short, Soup Sieve will return elements from all documents, even `iframes`. But certain pseudo-classes may take into
consideration the context of the document they are in. But even with all of this, a parser's handling of `iframes` may
make handling its content difficult if it doesn't parse it as HTML elements, or augments its structure.
# Quick Start
## Overview
Soup Sieve is a CSS selector library designed to be used with [Beautiful Soup 4][bs4]. It aims to provide selecting,
matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from the CSS level 1
specifications up through the latest CSS level 4 drafts and beyond (though some are not yet implemented).
Soup Sieve was written with the intent to replace Beautiful Soup's builtin select feature, and as of Beautiful Soup
version 4.7.0, it now is :confetti_ball:. Soup Sieve can also be imported in order to use its API directly for
more controlled, specialized parsing.
Soup Sieve has implemented most of the CSS selectors up through the latest CSS draft specifications, though there are a
number that don't make sense in a non-browser environment. Selectors that cannot provide meaningful functionality simply
do not match anything. Some of the supported selectors are:
- `#!css .classes`
- `#!css #ids`
- `#!css [attributes=value]`
- `#!css parent child`
- `#!css parent > child`
- `#!css sibling ~ sibling`
- `#!css sibling + sibling`
- `#!css :not(element.class, element2.class)`
- `#!css :is(element.class, element2.class)`
- `#!css parent:has(> child)`
- and [many more](./selectors/index.md)
## Installation
You must have Beautiful Soup already installed:
```
pip install beautifulsoup4
```
In most cases, assuming you've installed version 4.7.0 or later, that should be all you need to do, but if you've installed via
some alternative method, and Soup Sieve is not automatically installed, you can install it directly:
```
pip install soupsieve
```
If you want to manually install it from source, first ensure that [`build`][build] is installed:
```
pip install build
```
Then navigate to the root of the project and build the wheel and install (replacing `<ver>` with the current version):
```
python -m build -w
pip install dist/soupsieve-<ver>-py3-none-any.whl
```
## Usage
To use Soup Sieve, you must create a `BeautifulSoup` object:
```pycon3
>>> import bs4
>>> text = """
... <div>
... <!-- These are animals -->
... <p class="a">Cat</p>
... <p class="b">Dog</p>
... <p class="c">Mouse</p>
... </div>
... """
>>> soup = bs4.BeautifulSoup(text, 'html5lib')
```
For most people, using the Beautiful Soup 4.7.0+ API may be more than sufficient. Beautiful Soup offers two methods that employ
Soup Sieve: `select` and `select_one`. Beautiful Soup's select API is identical to Soup Sieve's, except that you don't
have to hand it the tag object, the calling object passes itself to Soup Sieve:
```pycon3
>>> soup = bs4.BeautifulSoup(text, 'html5lib')
>>> soup.select_one('p:is(.a, .b, .c)')
<p class="a">Cat</p>
```
You can also use the Soup Sieve API directly to get access to the full range of possibilities that Soup Sieve offers.
You can select a single tag:
```pycon3
>>> import soupsieve as sv
>>> sv.select_one('p:is(.a, .b, .c)', soup)
<p class="a">Cat</p>
```
You can select all tags:
```pycon3
>>> import soupsieve as sv
>>> sv.select('p:is(.a, .b, .c)', soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
```
You can select the closest ancestor:
```pycon3
>>> import soupsieve as sv
>>> el = sv.select_one('.c', soup)
>>> sv.closest('div', el)
<div>
<!-- These are animals -->
<p class="a">Cat</p>
<p class="b">Dog</p>
<p class="c">Mouse</p>
</div>
```
You can filter a tag's children (or an iterable of tags):
```pycon3
>>> sv.filter('p:not(.b)', soup.div)
[<p class="a">Cat</p>, <p class="c">Mouse</p>]
```
You can match a single tag:
```pycon3
>>> els = sv.select('p:is(.a, .b, .c)', soup)
>>> sv.match('p:not(.b)', els[0])
True
>>> sv.match('p:not(.b)', els[1])
False
```
Or even just extract comments:
```pycon3
>>> sv.comments(soup)
[' These are animals ']
```
Selectors do not have to be constrained to one line either. You can span selectors over multiple lines just like you
would in a CSS file.
```pycon3
>>> selector = """
... .a,
... .b,
... .c
... """
>>> sv.select(selector, soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
```
You can even use comments to annotate a particularly complex selector.
```pycon3
>>> selector = """
... /* This isn't complicated, but we're going to annotate it anyways.
... This is the a class */
... .a,
... /* This is the b class */
... .b,
... /* This is the c class */
... .c
... """
>>> sv.select(selector, soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
```
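
The same idea works outside of an interactive session. The sketch below is a standalone variant with hypothetical markup, combining a multi-line selector with comments.

```python
import bs4
import soupsieve as sv

# Hypothetical markup, used only for illustration.
markup = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Whitespace, newlines, and CSS comments are all allowed inside the selector.
selector = """
/* Paragraphs that carry the first class */
p.a,
/* or the last one */
p.c
"""

texts = [el.get_text() for el in sv.select(selector, soup)]
print(texts)
```
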
If you've ever used Python's `re` library for regular expressions, you may know that it is often useful to pre-compile a
regular expression pattern, especially if you plan to use it more than once. The same is true for Soup Sieve's
matchers, though it is not required. If you have a pattern that you want to use more than once, it may be wise to
pre-compile it early on:
```pycon3
>>> selector = sv.compile('p:is(.a, .b, .c)')
>>> selector.filter(soup.div)
[
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select('[href]'))
[Internal link, Example link, Insensitive internal link, Example org link]
```
////
///
/// define
`[attribute=value]`
- Represents elements with an attribute named **attribute** that also has a value of **value**.
//// tab | Syntax
```css
[attr=value]
[attr="value"]
```
////
//// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select('a[href!="#internal"]'))
[Example link, Insensitive internal link, Example org link]
```
////
///
/// define
`[attribute operator value i]`:material-flask:{: title="Experimental" data-md-color-primary="purple" .icon}
- Represents elements with an attribute named **attribute** and whose value, when the **operator** is applied, matches
**value** *without* case sensitivity. In general, attribute comparison is insensitive in normal HTML, but not XML.
`i` is most useful in XML documents.
//// tab | Syntax
```css
[attr=value i]
[attr="value" i]
```
////
//// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select('[href="#INTERNAL" s]'))
[]
>>> print(soup.select('[href="#internal" s]'))
[Internal link]
```
////
///
## Namespace Selectors
Namespace selectors are used in conjunction with type and universal selectors as well as attribute names in attribute
selectors. They are specified by declaring the namespace and the selector separated with `|`: `namespace|selector`.
`namespace`, in this context, is the prefix defined via the [namespace dictionary](../api.md#namespaces). The prefix
defined for the CSS selector does not need to match the prefix name in the document as it is the namespace associated
with the prefix that is compared, not the prefix itself.
The universal selector (`*`) can be used to represent any namespace just as it can with types.
By default, type selectors without a namespace selector will match any element whose type matches, regardless of
namespace. But if a CSS default namespace is declared (one with an empty key: `{"": "http://www.w3.org/1999/xhtml"}`),
all type selectors will assume the default namespace unless an explicit namespace selector is specified. For example,
if the default namespace were defined to be `http://www.w3.org/1999/xhtml`, the selector `a` would only match `a` tags that
are within the `http://www.w3.org/1999/xhtml` namespace. The one exception is within pseudo classes (`:not()`, `:has()`,
etc.) as namespaces are not considered within pseudo classes unless one is explicitly specified.
If the namespace is omitted (`|element`), any element without a namespace will be matched. In HTML documents that
support namespaces (XHTML and HTML5), HTML elements are counted as part of the `http://www.w3.org/1999/xhtml` namespace,
but attributes usually do not have a namespace unless one is explicitly defined in the markup.
Namespaces can be used with attribute selectors as well, except that when `[|attribute]` is used, it is equivalent to
`[attribute]`.
/// tab | Syntax
```css
ns|element
ns|*
*|*
*|element
|element
[ns|attr]
[*|attr]
[|attr]
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
...
...
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select('svg|a', namespaces={'svg': 'http://www.w3.org/2000/svg'}))
[MDN Web Docs]
>>> print(soup.select('a', namespaces={'svg': 'http://www.w3.org/2000/svg'}))
[Soup Sieve Docs, MDN Web Docs]
>>> print(soup.select('a', namespaces={'': 'http://www.w3.org/1999/xhtml', 'svg': 'http://www.w3.org/2000/svg'}))
[Soup Sieve Docs]
>>> print(soup.select('[xlink|href]', namespaces={'xlink': 'http://www.w3.org/1999/xlink'}))
[MDN Web Docs]
>>> print(soup.select('[|href]', namespaces={'xlink': 'http://www.w3.org/1999/xlink'}))
[Soup Sieve Docs]
```
///
--8<--
selector_styles.md
--8<--
# Combinators and Selector Lists
CSS employs a number of tokens in order to represent lists or to provide relational context between two selectors.
## Selector Lists
Selector lists use the comma (`,`) to join multiple selectors in a list. When presented with a selector list, any
selector in the list that matches an element will return that element.
/// tab | Syntax
```css
element1, element2
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
## Descendant Combinator
Descendant combinators combine two selectors with whitespace (such as a single space) in order to signify that the second
element is matched if it has an ancestor that matches the first element.
/// tab | Syntax
```css
parent descendant
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/Descendant_combinator
///
## Child Combinator
Child combinators combine two selectors with `>` in order to signify that the second element is matched if it has a
parent that matches the first element.
/// tab | Syntax
```css
parent > child
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator
///
## General Sibling Combinator
General sibling combinators combine two selectors with `~` in order to signify that the second element is matched if it
has a sibling that precedes it that matches the first element.
/// tab | Syntax
```css
prevsibling ~ sibling
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/General_sibling_combinator
///
## Adjacent Sibling Combinator
Adjacent sibling combinators combine two selectors with `+` in order to signify that the second element is matched if it
has an adjacent sibling that precedes it that matches the first element.
/// tab | Syntax
```css
prevsibling + nextsibling
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/Adjacent_sibling_combinator
///
--8<--
selector_styles.md
--8<--
# General Details
## Implementation Specifics
The CSS selectors are based on the CSS specification and include not only stable selectors, but may also include
selectors currently under development from the draft specifications. Support has primarily been added for selectors that
were feasible to implement and most likely to get practical use. In addition to the selectors in the specification,
Soup Sieve also supports a couple non-standard selectors.
Soup Sieve aims to allow users to target XML/HTML elements with CSS selectors. It implements many pseudo classes, but it
does not currently implement any pseudo elements and has no plans to do so. Soup Sieve also will not match anything for
pseudo classes that are only relevant in a live, browser environment, but it will gracefully handle them if they've been
implemented; such pseudo classes are non-applicable in the Beautiful Soup environment and are noted in [Non-Applicable
Pseudo Classes](./unsupported.md#non-applicable-pseudo-classes).
When speaking about namespaces, they only apply to XML, XHTML, or when dealing with recognized foreign tags in HTML5.
Currently, Beautiful Soup's `html5lib` parser is the only parser that will return the appropriate namespaces for an HTML5
document. If you are using XHTML, you have to use Beautiful Soup's `lxml-xml` parser (or `xml` for short) to get the
appropriate namespaces in an XHTML document. In addition to using the correct parser, you must provide a dictionary of
namespaces to Soup Sieve in order to use namespace selectors. See the documentation on
[namespaces](../api.md#namespaces) to learn more.
While an effort is made to mimic CSS selector behavior, there may be some differences or quirks. Please report issues if
any are found.
## Selector Context Key
Some selectors are very specific to HTML and either have no meaningful representation in XML, or such functionality has
not been implemented. Selectors that are HTML only will be noted with :material-language-html5:{: data-md-color-primary="orange"},
and will match nothing if used in XML.
Soup Sieve has implemented a couple non-standard selectors. These can contain useful selectors that were rejected
from the official CSS specifications, selectors implemented by other systems such as JQuery, or even selectors
specifically created for Soup Sieve. If a selector is considered non standard, it will be marked with
:material-star:{: title="Custom" data-md-color-primary="green"}.
All selectors that are from the current working draft of CSS4 are considered experimental and are marked with
:material-flask:{: title="Experimental" data-md-color-primary="purple"}. Additionally, if there are other immature selectors, they may be marked as experimental as
well. Experimental may mean we are not entirely sure if our implementation is correct, that things may still be in flux
as they are part of a working draft, or even both.
If at any time a working draft drops a selector from the current draft, it will most likely also be removed here,
most likely with a deprecation path, except where there may be a conflict that requires a less graceful transition.
One exception is in the rare case that the selector is found to be far too useful despite being rejected. In these
cases, we may adopt them as "custom" selectors.
/// tip | Additional Reading
If usage of a selector is not clear in this documentation, you can find more information by reading these
specification documents:
[CSS Level 3 Specification](https://www.w3.org/TR/selectors-3/)
: Contains the latest official document outlining official behaviors of CSS selectors.
[CSS Level 4 Working Draft](https://www.w3.org/TR/selectors-4/)
: Contains the latest published working draft of the CSS level 4 selectors which outlines the experimental new
selectors and experimental behavioral changes.
[HTML5](https://www.w3.org/TR/html50/)
: The HTML 5.0 specification document. Defines the semantics regarding HTML.
[HTML Living Standard](https://html.spec.whatwg.org/)
: The HTML Living Standard document. Defines semantics regarding HTML.
///
## Selector Terminology
Certain terminology is used throughout this document when describing selectors. In order to fully understand the syntax
a selector may implement, it is important to understand a couple of key terms.
### Selector
Selector is used to describe any selector whether it is a [simple](#simple-selector), [compound](#compound-selector), or
[complex](#complex-selector) selector.
### Simple Selector
A simple selector represents a single condition on an element. It can be a [type selector](#type-selectors),
[universal selector](#universal-selectors), [ID selector](#id-selectors), [class selector](#class-selectors),
[attribute selector](#attribute-selectors), or [pseudo class selector](#pseudo-classes).
### Compound Selector
A [compound](#compound-selector) selector is a sequence of [simple](#simple-selector) selectors. It does not contain any
[combinators](#combinators-and-selector-lists). If a universal or type selector is used, it must come first, and only
one instance of either can be used; both cannot be used at the same time.
### Complex Selector
A complex selector consists of multiple [simple](#simple-selector) or [compound](#compound-selector) selectors joined
with [combinators](#combinators-and-selector-lists).
### Selector List
A selector list is a list of selectors joined with a comma (`,`). A selector list is used to specify that a match is
valid if any of the selectors in a list matches.
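
The terms above can be illustrated with a short sketch. The markup and the `ids` helper below are hypothetical and exist only for this illustration; the example assumes Beautiful Soup and Soup Sieve are installed and uses the `html.parser` parser.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Hypothetical markup used only for this illustration.
html = """
<div id="main">
  <p id="one" class="note">First</p>
  <p id="two">Second</p>
  <span id="three" class="note">Third</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def ids(selector):
    """Return the ids of all elements matching the selector, in document order."""
    return [el["id"] for el in sv.select(selector, soup)]


simple = ids(".note")                # simple selector: a single condition (a class)
compound = ids("p.note")             # compound selector: type + class, no combinator
complex_ = ids("div > span")         # complex selector: compounds joined by a combinator
selector_list = ids("p.note, span")  # selector list: matches if any selector matches

print(simple, compound, complex_, selector_list)
```

Note that in the compound selector the type selector (`p`) comes first, as required.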
--8<--
selector_styles.md
--8<--
soupsieve-2.7/docs/src/markdown/selectors/pseudo-classes.md
# Pseudo-Classes
## Overview
These are pseudo classes that are either fully or partially supported. Partial support is usually due to limitations of
not being in a live, browser environment. Pseudo classes that cannot be implemented are found under
[Non-Applicable Pseudo Classes](./unsupported.md/#non-applicable-pseudo-classes). Any selectors not found here or under
the non-applicable list are either under consideration, have not yet been evaluated, or are too new and viewed as a
risk to implement as they might not stick around.
## `:any-link`:material-language-html5:{: title="HTML" data-md-color-primary="orange" .icon} {:#:any-link}
Selects every `#!html <a>` or `#!html <area>` element that has an `href` attribute, independent of
whether it has been visited.
/// tab | Syntax
```css
:any-link
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select(':any-link'))
[click]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/:any-link
///
/// new | New in 2.2
The CSS specification was recently updated to no longer include `#!html <link>` in the definition; therefore, Soup
Sieve has removed it as well.
///
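
As a minimal sketch of the behavior described for `:any-link`, the following uses hypothetical markup with the `html.parser` parser and the Soup Sieve API directly. It assumes both Beautiful Soup and Soup Sieve are installed; only the anchor that carries an `href` is expected to match.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Hypothetical markup: one real link, one anchor without `href`,
# and one non-link element that happens to carry an `href` attribute.
html = """
<div>
  <a id="1" href="http://example.com">click</a>
  <a id="2">no href</a>
  <span id="3" href="http://example.com">not a link element</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect the ids of every element matched by `:any-link`.
matched = [el["id"] for el in sv.select(":any-link", soup)]
print(matched)
```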
## `:checked`:material-language-html5:{: title="HTML" data-md-color-primary="orange" .icon} {:#:checked}
Selects any `#!html `, `#!html `, or `#!html
This is open.An open details element.
"""
    def test_open(self):
        """Test open."""

        self.assert_selector(
            self.MARKUP,
            ":open",
            ['2', '3'],
            flags=util.HTML
        )

    def test_targeted_open(self):
        """Test targeted open."""

        self.assert_selector(
            self.MARKUP,
            "details:open",
            ['2'],
            flags=util.HTML
        )

        self.assert_selector(
            self.MARKUP,
            "dialog:open",
            ['3'],
            flags=util.HTML
        )

    def test_not_open(self):
        """Test not open."""

        self.assert_selector(
            self.MARKUP,
            ":is(dialog, details):not(:open)",
            ["1", "4"],
            flags=util.HTML
        )
soupsieve-2.7/tests/test_level4/test_optional.py
"""Test optional selectors."""
from .. import util


class TestOptional(util.TestCase):
    """Test optional selectors."""

    MARKUP = """
    """

    def test_optional(self):
        """Test optional."""

        self.assert_selector(
            self.MARKUP,
            ":optional",
            ['3', '4', '5'],
            flags=util.HTML
        )

    def test_specific_optional(self):
        """Test specific optional."""

        self.assert_selector(
            self.MARKUP,
            "input:optional",
            ['3'],
            flags=util.HTML
        )
soupsieve-2.7/tests/test_level4/test_out_of_range.py
"""Test out of range selectors."""
from .. import util


class TestOutOfRange(util.TestCase):
    """Test out of range selectors."""

    def test_out_of_range_number(self):
        """Test out of range number."""

        markup = """
        """

        self.assert_selector(
            markup,
            ":out-of-range",
            ['9', '10', '11'],
            flags=util.HTML
        )

    def test_out_of_range_range(self):
        """Test out of range range."""

        markup = """
        """

        self.assert_selector(
            markup,
            ":out-of-range",
            ['9', '10'],
            flags=util.HTML
        )

    def test_out_of_range_month(self):
        """Test out of range month."""

        markup = """
        """

        self.assert_selector(
            markup,
            ":out-of-range",
            ['7', '8', '9', '10'],
            flags=util.HTML
        )

    def test_out_of_range_week(self):
        """Test out of range week."""

        markup = """
        """

        self.assert_selector(
            markup,
            ":out-of-range",
            ['8', '9', '10', '11'],
            flags=util.HTML
        )

    def test_out_of_range_date(self):
        """Test out of range date."""

        markup = """
        """

        self.assert_selector(
            markup,
            ":out-of-range",
            ['7', '8', '9', '10', '11', '12'],
            flags=util.HTML
        )

    def test_out_of_range_date_time(self):
        """Test out of range date time."""

        markup = """
        """

        self.assert_selector(
            markup,
            ":out-of-range",
            ['7', '8', '9', '10', '11', '12', '13', '14', '15', '16'],
            flags=util.HTML
        )

    def test_out_of_range_time(self):
        """Test out of range time."""

        markup = """
        """

        self.assert_selector(
            markup,
            ":out-of-range",
            ['8', '9', '10', '11', '12', '13', '14'],
            flags=util.HTML
        )
soupsieve-2.7/tests/test_level4/test_past.py
"""Test past selectors."""
from .. import util
import soupsieve as sv


class TestPast(util.TestCase):
    """Test past selectors."""

    MARKUP = """
    """

    def test_scope_is_root(self):
        """Test scope is the root when a specific element is not the target of the select call."""

        # Scope is root when applied to a document node
        self.assert_selector(
            self.MARKUP,
            ":scope",
            ["root"],
            flags=util.HTML
        )

        self.assert_selector(
            self.MARKUP,
            ":scope > body > div",
            ["div"],
            flags=util.HTML
        )

    def test_scope_cannot_select_target(self):
        """Test that scope, the element which scope is called on, cannot be selected."""

        for parser in util.available_parsers(
                'html.parser', 'lxml', 'html5lib', 'xml'):
            soup = self.soup(self.MARKUP, parser)
            el = soup.html

            # Scope is the element we are applying the select to, and that element is never returned
            self.assertTrue(len(sv.select(':scope', el, flags=sv.DEBUG)) == 0)

    def test_scope_is_select_target(self):
        """Test that scope is the element which scope is called on."""

        for parser in util.available_parsers(
                'html.parser', 'lxml', 'html5lib', 'xml'):
            soup = self.soup(self.MARKUP, parser)
            el = soup.html

            # Scope here means the current element under select
            ids = [el.attrs['id'] for el in sv.select(':scope div', el, flags=sv.DEBUG)]
            self.assertEqual(sorted(ids), sorted(['div']))

            el = soup.body
            ids = [el.attrs['id'] for el in sv.select(':scope div', el, flags=sv.DEBUG)]
            self.assertEqual(sorted(ids), sorted(['div']))

            # `div` is the current element under select, and it has no `div` elements.
            el = soup.div
            ids = [el.attrs['id'] for el in sv.select(':scope div', el, flags=sv.DEBUG)]
            self.assertEqual(sorted(ids), sorted([]))

            # `div` does have an element with the class `.wordshere`
            ids = [el.attrs['id'] for el in sv.select(':scope .wordshere', el, flags=sv.DEBUG)]
            self.assertEqual(sorted(ids), sorted(['pre']))
soupsieve-2.7/tests/test_level4/test_target_within.py
"""Test target within selectors."""
from .. import util
import soupsieve as sv


class TestTargetWithin(util.TestCase):
    """Test target within selectors."""

    MARKUP = """
    Jump
    """

    def test_amp_is_root(self):
        """Test ampersand is the root when a specific element is not the target of the select call."""

        # Scope is root when applied to a document node
        self.assert_selector(
            self.MARKUP,
            "&",
            ["root"],
            flags=util.HTML
        )

        self.assert_selector(
            self.MARKUP,
            "& > body > div",
            ["div"],
            flags=util.HTML
        )

    def test_amp_cannot_select_target(self):
        """Test that ampersand, the element which scope is called on, cannot be selected."""

        for parser in util.available_parsers(
                'html.parser', 'lxml', 'html5lib', 'xml'):
            soup = self.soup(self.MARKUP, parser)
            el = soup.html

            # Scope is the element we are applying the select to, and that element is never returned
            self.assertTrue(len(sv.select('&', el, flags=sv.DEBUG)) == 0)

    def test_amp_is_select_target(self):
        """Test that ampersand is the element which scope is called on."""

        for parser in util.available_parsers(
                'html.parser', 'lxml', 'html5lib', 'xml'):
            soup = self.soup(self.MARKUP, parser)
            el = soup.html

            # Scope here means the current element under select
            ids = [el.attrs['id'] for el in sv.select('& div', el, flags=sv.DEBUG)]
            self.assertEqual(sorted(ids), sorted(['div']))

            el = soup.body
            ids = [el.attrs['id'] for el in sv.select('& div', el, flags=sv.DEBUG)]
            self.assertEqual(sorted(ids), sorted(['div']))

            # `div` is the current element under select, and it has no `div` elements.
            el = soup.div
            ids = [el.attrs['id'] for el in sv.select('& div', el, flags=sv.DEBUG)]
            self.assertEqual(sorted(ids), sorted([]))

            # `div` does have an element with the class `.wordshere`
            ids = [el.attrs['id'] for el in sv.select('& .wordshere', el, flags=sv.DEBUG)]
            self.assertEqual(sorted(ids), sorted(['pre']))
soupsieve-2.7/.gitignore
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# Patches
*.patch
soupsieve-2.7/LICENSE.md
MIT License
Copyright (c) 2018 - 2025 Isaac Muse
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
soupsieve-2.7/README.md
[![Donate via PayPal][donate-image]][donate-link]
[![Build][github-ci-image]][github-ci-link]
[![Coverage Status][codecov-image]][codecov-link]
[![PyPI Version][pypi-image]][pypi-link]
[![PyPI Downloads][pypi-down]][pypi-link]
[![PyPI - Python Version][python-image]][pypi-link]
[![License][license-image-mit]][license-link]
# Soup Sieve
## Overview
Soup Sieve is a CSS selector library designed to be used with [Beautiful Soup 4][bs4]. It aims to provide selecting,
matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from the CSS level 1
specifications up through the latest CSS level 4 drafts and beyond (though some are not yet implemented).
Soup Sieve was written with the intent to replace Beautiful Soup's built-in select feature, and as of Beautiful Soup
version 4.7.0, it now is :confetti_ball:. Soup Sieve can also be imported in order to use its API directly for
more controlled, specialized parsing.
Soup Sieve has implemented most of the CSS selectors up through the latest CSS draft specifications, though there are a
number that don't make sense in a non-browser environment. Selectors that cannot provide meaningful functionality simply
do not match anything. Some of the supported selectors are:
- `.classes`
- `#ids`
- `[attributes=value]`
- `parent child`
- `parent > child`
- `sibling ~ sibling`
- `sibling + sibling`
- `:not(element.class, element2.class)`
- `:is(element.class, element2.class)`
- `parent:has(> child)`
- and [many more](https://facelessuser.github.io/soupsieve/selectors/)
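
A few of the selectors listed above can be tried with a short sketch. The markup below is hypothetical, and the example assumes Beautiful Soup 4.7.0 or later (so that `soup.select` is backed by Soup Sieve) together with the `html.parser` parser.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Hypothetical markup used only for this illustration.
html = """
<ul id="menu">
  <li id="a" class="item">One</li>
  <li id="b" class="item special"><em>Two</em></li>
  <li id="c">Three</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Through Beautiful Soup's own `select` method.
via_bs4 = [el["id"] for el in soup.select("li.item")]

# Through the Soup Sieve API directly.
has_child = [el["id"] for el in sv.select("li:has(> em)", soup)]
not_item = [el["id"] for el in sv.select("li:not(.item)", soup)]
siblings = [el["id"] for el in sv.select("#a ~ li", soup)]

print(via_bs4, has_child, not_item, siblings)
```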
## Installation
You must have Beautiful Soup already installed:
```
pip install beautifulsoup4
```
In most cases, assuming you've installed version 4.7.0 or later, that should be all you need to do, but if you've
installed via some alternative method, and Soup Sieve is not automatically installed, you can install it directly:
```
pip install soupsieve
```
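
To check whether both packages ended up installed, something like the following sketch may help. It relies only on the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, not part of Soup Sieve.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the distribution names used in the pip commands above.
for name in ("beautifulsoup4", "soupsieve"):
    print(name, installed_version(name) or "not installed")
```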
If you want to manually install it from source, first ensure that [`build`](https://pypi.org/project/build/) is
installed:
```
pip install build
```
Then navigate to the root of the project and build the wheel and install (replacing `<ver>` with the current version):
```
python -m build -w
pip install dist/soupsieve-<ver>-py3-none-any.whl
```
## Documentation
Documentation is found here: https://facelessuser.github.io/soupsieve/.
## License
MIT
[bs4]: https://beautiful-soup-4.readthedocs.io/en/latest/#
[github-ci-image]: https://github.com/facelessuser/soupsieve/workflows/build/badge.svg
[github-ci-link]: https://github.com/facelessuser/soupsieve/actions?query=workflow%3Abuild+branch%3Amain
[codecov-image]: https://img.shields.io/codecov/c/github/facelessuser/soupsieve/master.svg?logo=codecov&logoColor=aaaaaa&labelColor=333333
[codecov-link]: https://codecov.io/github/facelessuser/soupsieve
[pypi-image]: https://img.shields.io/pypi/v/soupsieve.svg?logo=pypi&logoColor=aaaaaa&labelColor=333333
[pypi-down]: https://img.shields.io/pypi/dm/soupsieve.svg?logo=pypi&logoColor=aaaaaa&labelColor=333333
[pypi-link]: https://pypi.python.org/pypi/soupsieve
[python-image]: https://img.shields.io/pypi/pyversions/soupsieve?logo=python&logoColor=aaaaaa&labelColor=333333
[license-image-mit]: https://img.shields.io/badge/license-MIT-blue.svg?labelColor=333333
[license-link]: https://github.com/facelessuser/soupsieve/blob/main/LICENSE.md
[donate-image]: https://img.shields.io/badge/Donate-PayPal-3fabd1?logo=paypal
[donate-link]: https://www.paypal.me/facelessuser
soupsieve-2.7/hatch_build.py
"""Dynamically define some metadata."""
import os
from hatchling.metadata.plugin.interface import MetadataHookInterface


def get_version_dev_status(root):
    """Get version_info without importing the entire module."""

    import importlib.util

    path = os.path.join(root, "soupsieve", "__meta__.py")
    spec = importlib.util.spec_from_file_location("__meta__", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.__version_info__._get_dev_status()


class CustomMetadataHook(MetadataHookInterface):
    """Our metadata hook."""

    def update(self, metadata):
        """See https://ofek.dev/hatch/latest/plugins/metadata-hook/ for more information."""

        metadata["classifiers"] = [
            f"Development Status :: {get_version_dev_status(self.root)}",
            'Environment :: Console',
            'Intended Audience :: Developers',
            'License :: OSI Approved :: MIT License',
            'Operating System :: OS Independent',
            'Programming Language :: Python :: 3',
            'Programming Language :: Python :: 3.8',
            'Programming Language :: Python :: 3.9',
            'Programming Language :: Python :: 3.10',
            'Programming Language :: Python :: 3.11',
            'Programming Language :: Python :: 3.12',
            'Programming Language :: Python :: 3.13',
            'Topic :: Internet :: WWW/HTTP :: Dynamic Content',
            'Topic :: Software Development :: Libraries :: Python Modules',
            'Typing :: Typed'
        ]
soupsieve-2.7/pyproject.toml
[build-system]
requires = [
"hatchling>=0.21.1",
]
build-backend = "hatchling.build"
[project]
name = "soupsieve"
description = "A modern CSS selector implementation for Beautiful Soup."
readme = "README.md"
license = "MIT"
requires-python = ">=3.8"
authors = [
{ name = "Isaac Muse", email = "Isaac.Muse@gmail.com" },
]
keywords = [
"CSS",
"HTML",
"XML",
"selector",
"filter",
"query",
"soup"
]
dynamic = [
"classifiers",
"version",
]
[project.urls]
Homepage = "https://github.com/facelessuser/soupsieve"
[tool.hatch.version]
source = "code"
path = "soupsieve/__meta__.py"
[tool.hatch.build.targets.wheel]
include = [
"/soupsieve",
]
[tool.hatch.build.targets.sdist]
include = [
"/docs/src/markdown/**/*.md",
"/docs/src/markdown/**/*.gif",
"/docs/src/markdown/**/*.png",
"/docs/src/markdown/dictionary/*.txt",
"/docs/theme/**/*.css",
"/docs/theme/**/*.js",
"/docs/theme/**/*.html",
"/requirements/*.txt",
"/soupsieve/**/*.py",
"/soupsieve/py.typed",
"/tests/**/*.py",
"/.pyspelling.yml",
"/.coveragerc",
"/mkdocs.yml"
]
[tool.mypy]
files = [
"soupsieve"
]
strict = true
show_error_codes = true
[tool.hatch.metadata.hooks.custom]
[tool.ruff]
line-length = 120
lint.select = [
"A", # flake8-builtins
"B", # flake8-bugbear
"D", # pydocstyle
"C4", # flake8-comprehensions
"N", # pep8-naming
"E", # pycodestyle
"F", # pyflakes
"PGH", # pygrep-hooks
"RUF", # ruff
# "UP", # pyupgrade
"W", # pycodestyle
"YTT", # flake8-2020,
"PERF" # Perflint
]
lint.ignore = [
"E741",
"D202",
"D401",
"D212",
"D203",
"N802",
"N801",
"N803",
"N806",
"N818",
"RUF012",
"RUF005",
"PGH004",
"RUF100",
"RUF022",
"RUF023"
]
[tool.tox]
legacy_tox_ini = """
[tox]
isolated_build = true
envlist =
    py{38,39,310,311,312},
    lint, nolxml, nohtml5lib

[testenv]
passenv = *
deps =
    -rrequirements/tests.txt
commands =
    mypy
    pytest --cov soupsieve --cov-append {toxinidir}
    coverage html -d {envtmpdir}/coverage
    coverage xml
    coverage report --show-missing

[testenv:documents]
passenv = *
deps =
    -rrequirements/docs.txt
commands =
    mkdocs build --clean --verbose --strict
    pyspelling -j 8

[testenv:lint]
passenv = *
deps =
    -rrequirements/lint.txt
commands =
    "{envbindir}"/ruff check .

[testenv:nolxml]
passenv = *
deps =
    -rrequirements/tests-nolxml.txt
commands =
    pytest {toxinidir}

[testenv:nohtml5lib]
passenv = *
deps =
    -rrequirements/tests-nohtml5lib.txt
commands =
    pytest {toxinidir}

[pytest]
filterwarnings =
    ignore:\nCSS selector pattern:UserWarning
"""
[tool.pytest.ini_options]
filterwarnings = [
"ignore:The 'strip_cdata':DeprecationWarning"
]
soupsieve-2.7/PKG-INFO
Metadata-Version: 2.4
Name: soupsieve
Version: 2.7
Summary: A modern CSS selector implementation for Beautiful Soup.
Project-URL: Homepage, https://github.com/facelessuser/soupsieve
Author-email: Isaac Muse
License-Expression: MIT
License-File: LICENSE.md
Keywords: CSS,HTML,XML,filter,query,selector,soup
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown