Stop BeautifulSoup from automatically adding html, head and body tags to the content

I was recently writing a parser with BeautifulSoup. The final step was rendering the contents of the edited text, like so:

from bs4 import BeautifulSoup

text = "<h1>Test text tag</h1>"
soup = BeautifulSoup(text, "html5")  # "html5" selects the html5lib parser

text = soup.renderContents()
print(text)

It renders the contents wrapped in <html>, <head> and <body> tags, so the printed output looks like this:

'<html><head></head><body><h1>Test text tag</h1></body></html>'

That's a feature of the html5lib library: it repairs incomplete HTML, for example by adding back required elements that are missing.

The workaround is simple enough:

text = soup.body.renderContents()

This returns only what is inside the <body> tag, without the surrounding <html> wrapper.
The result is:

text = '<h1>Test text tag</h1>'
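
For what it's worth, here is a minimal sketch of the same workaround wrapped in a small helper. The name render_fragment and its parser argument are my own and not part of BeautifulSoup. It assumes the bs4 and html5lib packages are installed, and it uses decode_contents() so the result is text rather than bytes.

```python
from bs4 import BeautifulSoup


def render_fragment(markup, parser="html5lib"):
    """Parse markup and return only what ended up inside <body>, as text.

    render_fragment is a hypothetical helper name used for illustration.
    """
    soup = BeautifulSoup(markup, parser)
    # html5lib always builds a full document, so soup.body exists here.
    # Other parsers may not add <body>; fall back to the whole soup then.
    root = soup.body if soup.body is not None else soup
    return root.decode_contents()


fragment = render_fragment("<h1>Test text tag</h1>")
print(fragment)
```

The fallback to the whole soup is only there so the helper also behaves sensibly with a parser such as "html.parser", which does not add the missing tags in the first place.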


Programmer blog
