def sanitize_html(
    html_string: str,
    *,
    remove_scripts: bool = True,
    remove_styles: bool = True,
    remove_svgs: bool = True,
    remove_comments: bool = True,
    remove_long_attributes: bool = True,
    max_attribute_length: int = 500,
    preserve_attributes: list[str] | None = None,
    remove_empty_tags: bool = True,
    preserve_empty_tags: list[str] | None = None,
    minify_whitespace: bool = True,
) -> str
Sanitizes and cleans HTML content by removing unwanted elements, attributes, and whitespace.
Provides fine-grained control over each cleaning operation through configurable options.
Examples
Basic Sanitization
from intuned_browser import sanitize_html

async def automation(page, params, **_kwargs):
    dirty_html = '''
    <div>
        <script>alert('xss')</script>
        <p style="color: red;">Hello World</p>
        <span></span>
    </div>
    '''
    # Uses the default options for every cleaning step
    sanitized_html = sanitize_html(dirty_html)
    # Output: '<div><p>Hello World</p></div>'
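The example above relies on the library's defaults. As a rough illustration of the element, comment, and long-attribute steps described under Arguments, here is a minimal sketch built on Python's standard `html.parser`. It is not intuned_browser's implementation: the names `strip_unwanted` and `_SketchSanitizer` are hypothetical, the sketch always applies the default `remove_scripts`, `remove_styles`, `remove_svgs`, and `remove_comments` behaviour, and it drops an attribute only when its value is longer than `max_attribute_length` and the name is not in `preserve_attributes`. Because of that, it keeps short attributes such as `style`, whereas the documented output above shows `style` being stripped, so the library's own attribute handling evidently goes further.

```python
from html import escape
from html.parser import HTMLParser

# Elements dropped together with everything inside them
# (the default remove_scripts / remove_styles / remove_svgs behaviour).
DROP_TAGS = {"script", "style", "svg"}
# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _SketchSanitizer(HTMLParser):
    """Hypothetical helper: re-serializes HTML while skipping unwanted parts."""

    def __init__(self, max_attribute_length, preserve_attributes):
        super().__init__(convert_charrefs=True)
        self.max_len = max_attribute_length
        self.preserve = set(preserve_attributes)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _open_tag(self, tag, attrs, self_closing=False):
        kept = []
        for name, value in attrs:
            value = "" if value is None else value
            # Drop over-long attribute values unless the name is preserved.
            if len(value) > self.max_len and name not in self.preserve:
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        return f"<{tag}{''.join(kept)}{' /' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or tag in DROP_TAGS:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <path/> or <br/>: no depth change.
        if self.skip_depth or tag in DROP_TAGS:
            return
        self.out.append(self._open_tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        pass  # remove_comments=True: comments are simply not re-emitted


def strip_unwanted(html_string, max_attribute_length=500,
                   preserve_attributes=("class", "src")):
    parser = _SketchSanitizer(max_attribute_length, preserve_attributes)
    parser.feed(html_string)
    parser.close()
    return "".join(parser.out)
```

Re-serializing through a parser, rather than deleting substrings, keeps the output well-formed even when a dropped element such as `<svg>` contains nested children.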
Arguments
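html_string
str
The HTML content to sanitize.
remove_scripts
bool
Remove all <script> elements. Defaults to True.
remove_styles
bool
Remove all <style> elements. Defaults to True.
remove_svgs
bool
Remove all <svg> elements. Defaults to True.
remove_comments
bool
Remove HTML comments. Defaults to True.
remove_long_attributes
bool
Remove attributes longer than max_attribute_length. Defaults to True.
max_attribute_length
int
Maximum attribute length; longer attributes are removed when remove_long_attributes is True. Defaults to 500.
preserve_attributes
list[str] | None
List of attribute names to always preserve. Passing None (the signature default) uses ["class", "src"].
remove_empty_tags
bool
Remove empty tags, except those listed in preserve_empty_tags. Defaults to True.
preserve_empty_tags
list[str] | None
List of tag names to keep even when empty. Passing None (the signature default) uses ["img"].
minify_whitespace
bool
Remove extra whitespace between tags and drop empty lines. Defaults to True.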
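The last three arguments (remove_empty_tags, preserve_empty_tags, minify_whitespace) describe text-level clean-up passes. The sketch below approximates them with regular expressions, under stated assumptions: it is not the library's code, the helper names `drop_empty_elements` and `collapse_whitespace` are hypothetical, and regexes are not a full HTML parser. Collapsing all whitespace between tags can also remove a meaningful space, such as the one in `<b>a</b> <i>b</i>`. Void elements like `<img>` have no closing tag, so this particular pattern never matches them; in the sketch, `preserve_empty_tags` only matters for tags that do close.

```python
import re

# An element whose content is empty or whitespace-only: <tag ...>   </tag>.
# The backreference \1 requires the closing tag to match the opening one.
_EMPTY_ELEMENT = re.compile(r"<([a-zA-Z][\w-]*)\b[^>]*>\s*</\1\s*>")


def drop_empty_elements(html, preserve_empty_tags=("img",)):
    preserve = {t.lower() for t in preserve_empty_tags}

    def _drop(match):
        # Leave the element untouched if its tag name is preserved.
        return match.group(0) if match.group(1).lower() in preserve else ""

    # Removing an inner empty element can leave its parent empty,
    # so repeat until a pass changes nothing.
    while True:
        new_html = _EMPTY_ELEMENT.sub(_drop, html)
        if new_html == html:
            return html
        html = new_html


def collapse_whitespace(html):
    html = re.sub(r">\s+<", "><", html)  # whitespace between tags
    lines = (line.strip() for line in html.splitlines())
    return "\n".join(line for line in lines if line)  # drop empty lines
```

A typical composition is `collapse_whitespace(drop_empty_elements(html))`; for example, `<b><i></i></b>` disappears after two passes of the loop, because the outer `<b>` only becomes empty once `<i></i>` is gone.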
Returns: str
The sanitized HTML string