File size: 10,077 Bytes
8a58cf3
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
"""Integration code for CSS selectors using Soup Sieve (pypi: soupsieve)."""

import warnings
try:
    import soupsieve
except ImportError as e:
    soupsieve = None
    warnings.warn(
        'The soupsieve package is not installed. CSS selectors cannot be used.'
    )


class CSS(object):
    """A proxy object against the soupsieve library, to simplify its
    CSS selector API.

    Acquire this object through the .css attribute on the
    BeautifulSoup object, or on the Tag you want to use as the
    starting point for a CSS selector.

    The main advantage of doing this is that the tag to be selected
    against doesn't need to be explicitly specified in the function
    calls, since it's already scoped to a tag.
    """

    def __init__(self, tag, api=soupsieve):
        """Constructor.

        You don't need to instantiate this class yourself; instead,
        access the .css attribute on the BeautifulSoup object, or on
        the Tag you want to use as the starting point for your CSS
        selector.

        :param tag: All CSS selectors will use this as their starting
        point.

        :param api: A plug-in replacement for the soupsieve module,
        designed mainly for use in tests.
        """
        if api is None:
            raise NotImplementedError(
                "Cannot execute CSS selectors because the soupsieve package is not installed."
            )
        self.api = api
        self.tag = tag

    def escape(self, ident):
        """Escape a CSS identifier.

        This is a simple wrapper around soupselect.escape(). See the
        documentation for that function for more information.
        """
        if soupsieve is None:
            raise NotImplementedError(
                "Cannot escape CSS identifiers because the soupsieve package is not installed."
            )
        return self.api.escape(ident)

    def _ns(self, ns, select):
        """Normalize a dictionary of namespaces."""
        if not isinstance(select, self.api.SoupSieve) and ns is None:
            # If the selector is a precompiled pattern, it already has
            # a namespace context compiled in, which cannot be
            # replaced.
            ns = self.tag._namespaces
        return ns

    def _rs(self, results):
        """Normalize a list of results to a Resultset.

        A ResultSet is more consistent with the rest of Beautiful
        Soup's API, and ResultSet.__getattr__ has a helpful error
        message if you try to treat a list of results as a single
        result (a common mistake).
        """
        # Import here to avoid circular import
        from bs4.element import ResultSet
        return ResultSet(None, results)

    def compile(self, select, namespaces=None, flags=0, **kwargs):
        """Pre-compile a selector and return the compiled object.

        :param selector: A CSS selector.

        :param namespaces: A dictionary mapping namespace prefixes
           used in the CSS selector to namespace URIs. By default,
           Beautiful Soup will use the prefixes it encountered while
           parsing the document.

        :param flags: Flags to be passed into Soup Sieve's
            soupsieve.compile() method.

        :param kwargs: Keyword arguments to be passed into SoupSieve's
           soupsieve.compile() method.

        :return: A precompiled selector object.
        :rtype: soupsieve.SoupSieve
        """
        return self.api.compile(
            select, self._ns(namespaces, select), flags, **kwargs
        )

    def select_one(self, select, namespaces=None, flags=0, **kwargs):
        """Perform a CSS selection operation on the current Tag and return the
        first result.

        This uses the Soup Sieve library. For more information, see
        that library's documentation for the soupsieve.select_one()
        method.

        :param selector: A CSS selector.

        :param namespaces: A dictionary mapping namespace prefixes
           used in the CSS selector to namespace URIs. By default,
           Beautiful Soup will use the prefixes it encountered while
           parsing the document.

        :param flags: Flags to be passed into Soup Sieve's
            soupsieve.select_one() method.

        :param kwargs: Keyword arguments to be passed into SoupSieve's
           soupsieve.select_one() method.

        :return: A Tag, or None if the selector has no match.
        :rtype: bs4.element.Tag

        """
        return self.api.select_one(
            select, self.tag, self._ns(namespaces, select), flags, **kwargs
        )

    def select(self, select, namespaces=None, limit=0, flags=0, **kwargs):
        """Perform a CSS selection operation on the current Tag.

        This uses the Soup Sieve library. For more information, see
        that library's documentation for the soupsieve.select()
        method.

        :param selector: A string containing a CSS selector.

        :param namespaces: A dictionary mapping namespace prefixes
            used in the CSS selector to namespace URIs. By default,
            Beautiful Soup will pass in the prefixes it encountered while
            parsing the document.

        :param limit: After finding this number of results, stop looking.

        :param flags: Flags to be passed into Soup Sieve's
            soupsieve.select() method.

        :param kwargs: Keyword arguments to be passed into SoupSieve's
            soupsieve.select() method.

        :return: A ResultSet of Tag objects.
        :rtype: bs4.element.ResultSet

        """
        if limit is None:
            limit = 0

        return self._rs(
            self.api.select(
                select, self.tag, self._ns(namespaces, select), limit, flags,
                **kwargs
            )
        )

    def iselect(self, select, namespaces=None, limit=0, flags=0, **kwargs):
        """Perform a CSS selection operation on the current Tag.

        This uses the Soup Sieve library. For more information, see
        that library's documentation for the soupsieve.iselect()
        method. It is the same as select(), but it returns a generator
        instead of a list.

        :param selector: A string containing a CSS selector.

        :param namespaces: A dictionary mapping namespace prefixes
            used in the CSS selector to namespace URIs. By default,
            Beautiful Soup will pass in the prefixes it encountered while
            parsing the document.

        :param limit: After finding this number of results, stop looking.

        :param flags: Flags to be passed into Soup Sieve's
            soupsieve.iselect() method.

        :param kwargs: Keyword arguments to be passed into SoupSieve's
            soupsieve.iselect() method.

        :return: A generator
        :rtype: types.GeneratorType
        """
        return self.api.iselect(
            select, self.tag, self._ns(namespaces, select), limit, flags, **kwargs
        )

    def closest(self, select, namespaces=None, flags=0, **kwargs):
        """Find the Tag closest to this one that matches the given selector.

        This uses the Soup Sieve library. For more information, see
        that library's documentation for the soupsieve.closest()
        method.

        :param selector: A string containing a CSS selector.

        :param namespaces: A dictionary mapping namespace prefixes
            used in the CSS selector to namespace URIs. By default,
            Beautiful Soup will pass in the prefixes it encountered while
            parsing the document.

        :param flags: Flags to be passed into Soup Sieve's
            soupsieve.closest() method.

        :param kwargs: Keyword arguments to be passed into SoupSieve's
            soupsieve.closest() method.

        :return: A Tag, or None if there is no match.
        :rtype: bs4.Tag

        """
        return self.api.closest(
            select, self.tag, self._ns(namespaces, select), flags, **kwargs
        )

    def match(self, select, namespaces=None, flags=0, **kwargs):
        """Check whether this Tag matches the given CSS selector.

        This uses the Soup Sieve library. For more information, see
        that library's documentation for the soupsieve.match()
        method.

        :param: a CSS selector.

        :param namespaces: A dictionary mapping namespace prefixes
            used in the CSS selector to namespace URIs. By default,
            Beautiful Soup will pass in the prefixes it encountered while
            parsing the document.

        :param flags: Flags to be passed into Soup Sieve's
            soupsieve.match() method.

        :param kwargs: Keyword arguments to be passed into SoupSieve's
            soupsieve.match() method.

        :return: True if this Tag matches the selector; False otherwise.
        :rtype: bool
        """
        return self.api.match(
            select, self.tag, self._ns(namespaces, select), flags, **kwargs
        )

    def filter(self, select, namespaces=None, flags=0, **kwargs):
        """Filter this Tag's direct children based on the given CSS selector.

        This uses the Soup Sieve library. It works the same way as
        passing this Tag into that library's soupsieve.filter()
        method. More information, for more information see the
        documentation for soupsieve.filter().

        :param namespaces: A dictionary mapping namespace prefixes
            used in the CSS selector to namespace URIs. By default,
            Beautiful Soup will pass in the prefixes it encountered while
            parsing the document.

        :param flags: Flags to be passed into Soup Sieve's
            soupsieve.filter() method.

        :param kwargs: Keyword arguments to be passed into SoupSieve's
            soupsieve.filter() method.

        :return: A ResultSet of Tag objects.
        :rtype: bs4.element.ResultSet

        """
        return self._rs(
            self.api.filter(
                select, self.tag, self._ns(namespaces, select), flags, **kwargs
            )
        )