
loofah vs html-pipeline

MIT 17 4 982
3.4 million (month) Aug 18 2009 2.25.1(2026-03-17 17:35:19 ago)
2,346 19 1 MIT
Nov 07 2012 317.6 thousand (month) 3.2.4(2026-01-06 03:55:22 ago)

Loofah is a general library for manipulating and transforming HTML/XML documents and fragments, built on top of Nokogiri.

Loofah excels at HTML sanitization (XSS prevention). It includes some nice HTML sanitizers, which are based on HTML5lib's safelist, so it most likely won't make your codes less secure. (These statements have not been evaluated by Netexperts.)

Features:

  • Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's safelists).
  • Common HTML sanitizing tasks are built-in:
      • Strip unsafe tags, leaving behind only the inner text.
      • Prune unsafe tags and their subtrees, removing all traces that they ever existed.
      • Escape unsafe tags and their subtrees, leaving behind lots of &lt; and &gt; entities.
      • Whitewash the markup, removing all attributes and namespaced nodes.
  • Common HTML transformation tasks are built-in:
      • Add the nofollow attribute to all hyperlinks.
  • Format markup as plain text, with or without sensible whitespace handling around block elements.
  • Replace Rails's strip_tags and sanitize view helper methods.

html-pipeline is a Ruby gem for processing and transforming HTML documents. It is modular and extensible: each processing step is a filter, and you can add custom filters alongside the built-in ones. It is widely used in the Ruby ecosystem, particularly for Markdown processing and other forms of text-to-HTML conversion.

html-pipeline processes a document by running it through a pipeline of filters in order, with each filter receiving the output of the previous one. Filters can perform a wide variety of tasks, such as converting text to HTML, linking mentions and hashtags, and sanitizing HTML to remove unwanted elements or attributes.
Chaining several filters together builds a complex transformation out of small, single-purpose steps.
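
The chaining idea can be sketched in plain Ruby, without the gem. This is not html-pipeline's API: `run_pipeline`, the three filter lambdas, and the `:base_url` context key are hypothetical names chosen for illustration.

```ruby
# Hypothetical filters: each takes the current text plus a shared context hash
# and returns the transformed text.
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

ESCAPE_HTML = ->(text, _ctx) { text.gsub(/[&<>"]/, ESCAPES) }

LINK_MENTIONS = lambda do |text, ctx|
  text.gsub(/@(\w+)/) do
    name = Regexp.last_match(1)
    %(<a href="#{ctx[:base_url]}/#{name}">@#{name}</a>)
  end
end

WRAP_PARAGRAPH = ->(text, _ctx) { "<p>#{text}</p>" }

# Run the filters in order, feeding each filter's output into the next.
def run_pipeline(filters, input, context = {})
  filters.inject(input) { |acc, filter| filter.call(acc, context) }
end

puts run_pipeline(
  [ESCAPE_HTML, LINK_MENTIONS, WRAP_PARAGRAPH],
  'hi @bob <script>',
  base_url: 'https://example.com'
)
# prints: <p>hi <a href="https://example.com/bob">@bob</a> &lt;script&gt;</p>
```

Order matters in a chain like this: escaping runs first so that the markup added by the later filters is not escaped as well.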

In addition to its processing capabilities, html-pipeline passes a shared context hash to every filter and returns a result hash, so filters can share settings and report metadata alongside the output. Lower-level concerns such as character-encoding handling and parsing are left to the underlying HTML parser (Nokogiri in the 2.x series); html-pipeline does not itself validate documents against a DTD or schema.

Example Use


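```ruby
require 'loofah'

# Illustrative input; the original markup was lost when this page was extracted.
html_string = '<p><b>Hello</b> World! <a href="https://example.com" onclick="alert(1)">Link</a></p>'

# Sanitize the HTML string: :strip removes unsafe tags and attributes
# (here, the onclick handler) and keeps the inner text.
scrubbed_html = Loofah.fragment(html_string)
scrubbed_html.scrub!(:strip)

# Loofah has no built-in scrubber for removing a specific tag or attribute,
# so use a custom scrubber: unwrap <b> tags and drop href from links.
custom_scrubber = Loofah::Scrubber.new(direction: :bottom_up) do |node|
  node.remove_attribute('href') if node.name == 'a'
  if node.name == 'b'
    node.before(node.children) # keep the contents
    node.remove                # drop the tag itself
  end
end
scrubbed_html.scrub!(custom_scrubber)

puts scrubbed_html.to_s
puts scrubbed_html.text # Output: Hello World! Link
```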
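```ruby
# Note: this uses the html-pipeline 2.x API (require 'html/pipeline', HTML::Pipeline).
# The 3.x series uses a different namespace (HTMLPipeline).
require 'html/pipeline'

# Illustrative input; the original markup was lost when this page was extracted.
html_string = '<p><b>Hello</b> World! <a href="https://example.com">Link</a></p>'

# Create a pipeline with the desired filters.
# Built-in filters include SanitizationFilter (needs the sanitize gem) and MentionFilter.
pipeline = HTML::Pipeline.new [
  HTML::Pipeline::SanitizationFilter,
  HTML::Pipeline::MentionFilter
]

# Process the HTML string; the result is a hash with the document under :output.
result = pipeline.call(html_string)
puts result[:output].to_s

# Alternatively, we can write our own filter:
class MyFilter < HTML::Pipeline::Filter
  def call
    # Custom logic: open every link in a new tab.
    doc.search('a').each do |a|
      a.set_attribute('target', '_blank')
    end
    doc
  end
end

pipeline = HTML::Pipeline.new [MyFilter]
puts pipeline.call(html_string)[:output].to_s
```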


