Python web scraping: parsel, Scrapy's built-in parsing library, for parsing XML and HTML with CSS and XPath

Documentation

  • https://pypi.org/project/parsel/
  • https://github.com/scrapy/parsel

Installation

pip install parsel

Code example

from parsel import Selector

selector = Selector(text="""<html>
    <body>
        <h1>Hello, Parsel!</h1>
        <ul>
            <li><a href="http://example.com">Link 1</a></li>
            <li><a href="http://scrapy.org">Link 2</a></li>
        </ul>
    </body>
</html>""")

selector.css('h1::text').get()
# 'Hello, Parsel!'

selector.xpath('//h1/text()').re(r'\w+')
# ['Hello', 'Parsel']

for li in selector.css('ul > li'):
    print(li.xpath('.//@href').get())
# http://example.com
# http://scrapy.org