selectolax vs embed

selectolax: MIT · 10 · 1 · 1,607 · 4.5 million (month) · Mar 01 2018 · 0.4.7 (2026-03-06 09:23:35)
embed: MIT · 6 · 71 · 2,103 · 5.2 thousand (month) · Oct 26 2013 · v4.4.15 (2025-01-02 16:53:09)

selectolax is a fast and lightweight library for parsing HTML documents in Python. It is often used as a faster alternative to the popular BeautifulSoup library, although its API is different and it is not a drop-in replacement.

selectolax is written in Cython and wraps the Modest and Lexbor HTML engines, both implemented in C, which lets it parse and navigate HTML documents quickly. It provides a simple API for working with the document's structure, built around CSS selectors.

To use selectolax, first install it with pip by running `pip install selectolax`. Once it is installed, pass an HTML string to the `HTMLParser` class from `selectolax.parser` to parse the document. For example:

```python
from selectolax.parser import HTMLParser

html_string = "Hello, World!"
root = HTMLParser(html_string).root
print(root.tag)  # html
```

`HTMLParser` accepts both `str` and `bytes` input. To parse a file, read its contents first and pass them to `HTMLParser`. selectolax focuses on HTML and does not ship a separate XML parser.

Once you have a parsed document, you can use the `css()` method to search for elements using CSS selectors, similar to BeautifulSoup's `select()`. For example, `body = root.css("body")[0]` selects the body element, and `print(body.text())` prints "Hello, World!".

Like BeautifulSoup's `find` and `find_all` methods, selectolax offers two lookups: the `css_first()` method, which returns the first matching element (or `None` if nothing matches), and the `css()` method, which returns a list of all matching elements.

embed is a PHP library for getting information from any web page (using oEmbed, Open Graph, Twitter Cards, scraping the HTML, etc.). It is compatible with any web service (YouTube, Vimeo, Flickr, Instagram, etc.) and has adapters for some sites (archive.org, GitHub, Facebook, etc.).

Example Use


```python
from selectolax.parser import HTMLParser

html_string = "Hello, World!"
root = HTMLParser(html_string).root
print(root.tag)  # html

# use css selectors:
body = root.css("body")[0]
print(body.text())  # "Hello, World!"

# find first matching element:
body = root.css_first("body")
print(body.text())  # "Hello, World!"

# or all matching elements:
html_string = "<p>paragraph1</p><p>paragraph2</p>"
root = HTMLParser(html_string).root
for el in root.css("p"):
    print(el.text())
# will print:
# paragraph1
# paragraph2
```
```php
use Embed\Embed;

$embed = new Embed();

//Load any url:
$info = $embed->get('https://www.youtube.com/watch?v=PP1xn5wHtxE');

//Get content info
$info->title; //The page title
$info->description; //The page description
$info->url; //The canonical url
$info->keywords; //The page keywords

$info->image; //The thumbnail or main image

$info->code->html; //The code to embed the image, video, etc
$info->code->width; //The exact width of the embed code (if exists)
$info->code->height; //The exact height of the embed code (if exists)
$info->code->ratio; //The aspect ratio (width/height)

$info->authorName; //The resource author
$info->authorUrl; //The author url

$info->cms; //The cms used
$info->language; //The language of the page
$info->languages; //The alternative languages

$info->providerName; //The provider name of the page (Youtube, Twitter, Instagram, etc)
$info->providerUrl; //The provider url
$info->icon; //The big icon of the site
$info->favicon; //The favicon of the site (an .ico file or a png with up to 32x32px)

$info->publishedTime; //The published time of the resource
$info->license; //The license url of the resource
$info->feeds; //The RSS/Atom feeds
```
