Don't put html, head and body tags automatically into beautifulsoup content

I was writing a parser with BeautifulSoup recently, and the last step was rendering the contents of the edited text, like so:

    from bs4 import BeautifulSoup

    text = "<h1>Test text tag</h1>"
    soup = BeautifulSoup(text, "html5")
    text = soup.renderContents()
    print(text)

The rendered result comes back wrapped in <html>, <head> and <body> tags, so the printed output looks like this:

    '<html><head></head><body><h1>Test text tag</h1></body></html>'

That's a feature of the html5lib library: it fixes incomplete HTML, for example by adding back missing required elements.

The workaround is simple enough:

    text = soup.body.renderContents()

This returns only what is inside the <body> tag. The result is:

    text = '<h1>Test text tag</h1>'
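For reference, here is a minimal sketch of the same workaround wrapped in a small helper. The helper name `inner_html` and the default parser `"html.parser"` are my own assumptions, not part of the original post. It uses `decode_contents()`, which in bs4 returns a text string, where the older `renderContents()` name returns encoded bytes. If the parser did not add a <body> tag (Python's built-in `html.parser` leaves fragments alone), the helper falls back to the whole soup.

```python
from bs4 import BeautifulSoup


def inner_html(markup, parser="html.parser"):
    """Return the markup without any <html>/<head>/<body> wrapper.

    `inner_html` is a hypothetical helper name used for illustration.
    """
    soup = BeautifulSoup(markup, parser)
    # html5lib always builds a full document, so soup.body exists there.
    # html.parser keeps a bare fragment as-is, so soup.body may be None.
    container = soup.body if soup.body is not None else soup
    # decode_contents() renders only the children of the tag, as a str.
    return container.decode_contents()


if __name__ == "__main__":
    print(inner_html("<h1>Test text tag</h1>"))
```

Passing `"html5lib"` as the second argument should give the same result as the `soup.body` workaround above, provided the html5lib package is installed.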