beautifulsoup¶

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

element

pip install beautifulsoup4

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_text)
links = [element.get('href') for element in soup.find_all('a')]

BeautifulSoup()¶

class beautifulsoup.BeautifulSoup(html_text: str, parser: str = 'html.parser')¶

Парсер html

soup = BeautifulSoup(some_html_string)
soup = BeautifulSoup(some_html_string, 'html.parser')
soup = BeautifulSoup(some_html_string, 'lxml')
soup = BeautifulSoup(some_html_string, 'lxml-xml')
soup = BeautifulSoup(some_html_string, 'html5lib')

body¶: Возвращает beautifulsoup.element.Tag

head¶: Возвращает beautifulsoup.element.Tag

title¶: Возвращает beautifulsoup.element.Tag

get_test()¶: Возвращает строку, весь текст, без html страницы

find(name=None, attributes={}, recursive=True, text=None, *kwargs)¶

name = None
attributes = {}
recursive = True
text = None
id
string

Возвращает первый найденный элемент, beautifulsoup.element.Tag

elem = soup.find(id='myId')
elem = soup.find('h2', string='Python')
elem = soup.find('h2', string=lambda text: 'Python' in text)

findAll(name=None, attributes={}, recursive=True, text=None, limit=None, *kwargs) → :py:class:`beautifulsoup.element.ResultSet`¶

Поиск элементов на странице

span_list = bs_obj.findAll('span', {'class': 'green'})
for span in span_list:
    print(span.get_text())

hs = bs_obj.findAll({'h1', 'h2', 'h3', 'h4', 'h5', 'h6'})

id_text_elem = bs_obj.findAll(id='text')

imgs = bs_obj.findAll('img', {'src': re.compile('\.\.\/img\/*\.jpg')})

imgs = bs_obj.findAll(lambda tag: len(tag.attrs) == 2)

prettify() → str¶

Возвращает строку, отформатированныую строку содержимого

print(soup.prettify())

Comment()¶

class beautifulsoup.Comment¶: Коментарии

beautifulsoup¶

BeautifulSoup()¶

Comment()¶

Table of Contents

Previous topic

Next topic

This Page