In this tutorial I will show you how to remove all special characters and punctuation, except spaces, from a string in Python.
The following program extracts data from a URL using the BeautifulSoup package. If the title tag contains special characters, I want to remove them.
CODE:
import string
import urllib.request

from bs4 import BeautifulSoup
from docx import Document


def remove_symbols(title):
    # Build a translation table that deletes every character in string.punctuation
    trans = str.maketrans("", "", string.punctuation)
    cleaned_title = title.translate(trans)
    return cleaned_title


# Send a User-Agent header with the request
hdr = {"User-Agent": "My Agent"}
request = urllib.request.Request(
    url='https://tensix.com/oracle-bi-publisher-installation-error-inst-05058-a-lookup-of-the-address-for-this-machine/',
    headers=hdr,
)
f = urllib.request.urlopen(request)
myfile = f.read()

# Parse the page and read the text of its <title> tag
soup = BeautifulSoup(myfile, 'html.parser')
title = soup.title.text.strip()

doc = Document()
doc.add_heading(title, 1)

cleaned_title = remove_symbols(title)
print(cleaned_title)
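But the above code does not remove every unwanted character. string.punctuation contains only ASCII punctuation (the full stop is included), so symbols outside that set, such as typographic dashes and quotes, are left in the title. I am going to use a regular expression (regex) to remove the unwanted characters.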
CODE:
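import re


def remove_symbols(title):
    # Replace each run of characters that are not ASCII letters or digits
    # (including newlines) with a single space, then trim the ends
    return re.sub(r"[^a-zA-Z0-9]+", " ", title).strip()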
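To see how the two approaches differ, here is a small side-by-side sketch. The function names remove_with_translate and remove_with_regex and the sample title are made up for illustration; the sample is not the real title of the page fetched above.

```python
import re
import string


def remove_with_translate(title):
    # Deletes only the ASCII punctuation listed in string.punctuation
    return title.translate(str.maketrans("", "", string.punctuation))


def remove_with_regex(title):
    # Replaces each run of characters other than ASCII letters and digits
    # with one space, then trims leading and trailing spaces
    return re.sub(r"[^a-zA-Z0-9]+", " ", title).strip()


# Hypothetical sample title; \u2013 is an en dash, which is not in string.punctuation
sample = "Error: INST-05058 \u2013 A lookup of the address, for this machine!"

print(remove_with_translate(sample))
print(remove_with_regex(sample))
```

With this sample, the translate version keeps the en dash and joins "INST-05058" into "INST05058", while the regex version drops the dash and leaves a space between "INST" and "05058". Note that the regex pattern also removes accented and other non-ASCII letters, so it may need adjusting if your titles are not plain English.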