html-pipeline
html-pipeline is a Ruby gem that provides a simple and powerful way to process and transform HTML documents. It is designed to be highly modular and extensible, allowing you to easily add custom filters and processors to transform the HTML in any way you need. It is widely used in the Ruby ecosystem, particularly in the context of Markdown processing and other forms of text-to-HTML conversion.
The core idea of html-pipeline is that a document passes through an ordered chain of filters, each of which receives the output of the previous one.
Filters can perform a wide variety of tasks, such as converting Markdown or plain text to HTML, linking @mentions,
and sanitizing HTML to remove unwanted elements or attributes.
Because each filter does one small job, complex transformations are built by combining filters, and the order in which they are listed determines the result.
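The chaining idea can be sketched in a few lines of plain Ruby. This is a simplified illustration and not html-pipeline's actual implementation: the `TinyPipeline` class and the two lambdas are hypothetical names invented for this example, and they operate on strings rather than on parsed documents.

```ruby
# Simplified illustration of a filter chain; NOT html-pipeline's real code.
# A "filter" here is any object responding to #call(html, context) that
# returns the transformed string.
class TinyPipeline
  def initialize(filters, context = {})
    @filters = filters
    @context = context
  end

  # Feed the output of each filter into the next one, in list order.
  def call(html)
    @filters.reduce(html) { |current, filter| filter.call(current, @context) }
  end
end

# Hypothetical filter: turns @name into a link under context[:base_url].
link_mentions = lambda do |html, context|
  html.gsub(/@(\w+)/, %(<a href="#{context[:base_url]}/\\1">@\\1</a>))
end

# Hypothetical filter: wraps the text in a paragraph element.
wrap_paragraph = ->(html, _context) { "<p>#{html}</p>" }

pipeline = TinyPipeline.new([link_mentions, wrap_paragraph], { base_url: "https://example.com" })
output = pipeline.call("Hello @alice")
puts output
```

Each filter sees only what the previous one produced, so reordering the list changes which input a given filter works on.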
In the 2.x releases used in the example below, html-pipeline parses HTML with Nokogiri, and each filter receives the parsed document, which it can search and modify. Features such as character-encoding handling, memory-efficient parsing of large documents, and validation against a DTD or schema come from Nokogiri rather than from html-pipeline itself.
Example Use
# Written against the html-pipeline 2.x API; SanitizationFilter also needs the sanitize gem.
require 'html/pipeline'
html_string = '<p>Hello <b>World!</b> <a href="javascript:alert(1)">Link</a></p>'
# Create a pipeline with the desired filters
pipeline = HTML::Pipeline.new [
  # Built-in filters include:
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::MentionFilter
]
# Process the HTML string
result = pipeline.call(html_string)
puts result[:output].to_s
# Output: <p>Hello <b>World!</b> <a>Link</a></p>
# Alternatively, we can write our own filter:
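class MyFilter < HTML::Pipeline::Filter
  def call
    # Open every link in a new tab
    doc.search('a').each do |a|
      a.set_attribute('target', '_blank')
    end
    # Return the modified document so the next filter can use it
    doc
  end
end
pipeline = HTML::Pipeline.new [
  MyFilter
]
# Run the custom pipeline on the same input
result = pipeline.call(html_string)
puts result[:output].to_s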