html-pipeline vs loofah

MIT 1 18 2,269

280.7 thousand (month) Nov 07 2012 3.2.2(2 months ago)

942 4 18 MIT

Aug 18 2009 2.9 million (month) 2.24.0(a month ago)

html-pipeline is a Ruby gem for processing and transforming HTML documents. It is modular and extensible: you can add custom filters to transform the HTML however you need. It is widely used in the Ruby ecosystem, particularly for Markdown processing and other forms of text-to-HTML conversion.

The core idea of html-pipeline is to process a document through a pipeline of filters. Each filter performs one task, such as converting text to HTML, linking mentions and hashtags, or sanitizing HTML to remove unwanted elements and attributes. The output of one filter becomes the input of the next, so chaining several filters builds a complex transformation out of simple steps.
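The chaining idea can be sketched in plain Ruby without the gem. This is only an illustration of the pattern, not html-pipeline's implementation: `ToyPipeline`, `ESCAPE_FILTER`, `MENTION_FILTER`, and the `base_url` context key are hypothetical names made up for this sketch.

```ruby
# A toy pipeline: each filter is any object that responds to call(html, context).
# These are illustrative stand-ins, not html-pipeline's real classes.
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Turns plain text into an escaped HTML paragraph.
ESCAPE_FILTER = lambda do |text, _context|
  "<p>#{text.gsub(/[&<>]/, HTML_ESCAPES)}</p>"
end

# Links @mentions, using a base URL taken from the shared context hash.
MENTION_FILTER = lambda do |html, context|
  html.gsub(/@(\w+)/) do
    name = Regexp.last_match(1)
    %(<a href="#{context[:base_url]}/#{name}">@#{name}</a>)
  end
end

class ToyPipeline
  def initialize(filters)
    @filters = filters
  end

  # Feeds the output of each filter into the next one, in order.
  def call(input, context = {})
    @filters.reduce(input) { |html, filter| filter.call(html, context) }
  end
end

pipeline = ToyPipeline.new([ESCAPE_FILTER, MENTION_FILTER])
puts pipeline.call('Hi @alice <3', base_url: 'https://example.com')
# Output: <p>Hi <a href="https://example.com/alice">@alice</a> &lt;3</p>
```

Filter order matters here: escaping runs first so that the markup added by the mention filter is not itself escaped.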

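Beyond running the filters, html-pipeline itself stays small. In the 2.x API used in the example below, each filter receives the parsed document together with a shared context hash, and filters can record data they discover (such as the usernames found by the mention filter) in a result hash. Lower-level concerns such as character encoding handling, memory-efficient parsing of large documents, and validation against a DTD or schema are features of the underlying parser, Nokogiri, rather than of html-pipeline.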

Loofah is a general library for manipulating and transforming HTML/XML documents and fragments, built on top of Nokogiri.

Loofah excels at HTML sanitization (XSS prevention). It includes some nice HTML sanitizers, which are based on HTML5lib's safelist, so it most likely won't make your codes less secure. (These statements have not been evaluated by Netexperts.)

Features:

- Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's safelists).
- Common HTML sanitizing tasks are built-in:
  - Strip unsafe tags, leaving behind only the inner text.
  - Prune unsafe tags and their subtrees, removing all traces that they ever existed.
  - Escape unsafe tags and their subtrees, leaving behind lots of &lt; and &gt; entities.
  - Whitewash the markup, removing all attributes and namespaced nodes.
- Common HTML transformation tasks are built-in:
  - Add the nofollow attribute to all hyperlinks.
- Format markup as plain text, with or without sensible whitespace handling around block elements.
- Replace Rails's strip_tags and sanitize view helper methods.

Example Use

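# html-pipeline
# Note: this example uses the 2.x API (HTML::Pipeline). The 3.x series renamed
# the namespace to HTMLPipeline and changed how pipelines and filters are written.
# SanitizationFilter also needs the sanitize gem to be installed.
require 'html/pipeline'

html_string = '<p>Hello <b>World!</b> <a href="javascript:alert(1)">Link</a></p>'

# Create a pipeline with the desired filters
pipeline = HTML::Pipeline.new [
  # Built-in filters include:
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::MentionFilter
]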

# Process the HTML string
result = pipeline.call(html_string)
puts result[:output].to_s
# Output: <p>Hello <b>World!</b> <a>Link</a></p>


# Alternatively, write a custom filter:
class MyFilter < HTML::Pipeline::Filter
  def call
    # Open every link in a new tab
    doc.search('a').each do |a|
      a.set_attribute('target', '_blank')
    end
    doc
  end
end
pipeline = HTML::Pipeline.new [
  MyFilter
]

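# loofah
require 'loofah'

html_string = '<p>Hello <b>World!</b> <a href="javascript:alert(1)">Link</a></p>'

# Sanitize the HTML string. The built-in :strip scrubber removes unsafe tags
# (keeping their text) and drops unsafe attributes such as the javascript: href.
scrubbed_html = Loofah.fragment(html_string)
scrubbed_html.scrub!(:strip)

# scrub! takes a single scrubber, so removing the (safe) <b> tag needs a custom one.
unwrap_bold = Loofah::Scrubber.new do |node|
  next unless node.name == 'b'
  node.before(node.children) # keep the text that was inside the tag
  node.remove
  Loofah::Scrubber::STOP
end
scrubbed_html.scrub!(unwrap_bold)

puts scrubbed_html.to_s
# Output: <p>Hello World! <a>Link</a></p>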
