myhtml

Fast HTML5 parser that includes CSS selectors
0.12 released
kostya/myhtml
Kostya M

MyHTML


Crystal wrapper for MyHTML, the HTML5 parser written in C: https://github.com/lexborisov/myhtml

Installation

Add this to your application's shard.yml:

dependencies:
  myhtml:
    github: kostya/myhtml
    branch: master

Then run crystal deps (on newer Crystal versions, use shards install instead).

Development Setup:

  git clone https://github.com/kostya/myhtml.git
  cd myhtml
  make

  crystal spec

Usage

# Example: print the whole HTML tree

require "myhtml"

def walk(node, level = 0)
  puts "#{" " * level}#{node.tag_name}#{node.attributes}(#{node.tag_text.strip})"
  node.children.each { |child| walk(child, level + 1) }
end

str = if filename = ARGV[0]?
        File.read(filename, "UTF-8", invalid: :skip)
      else
        "<html><Div><span class='test'>HTML</span></div></html>"
      end

parser = Myhtml::Parser.new(str)
walk(parser.root!)

Output:

html{}()
 head{}()
 body{}()
  div{}()
   span{"class" => "test"}()
    -text{}(HTML)

More Examples

See the examples and specs in the repository.

CSS Selectors

CSS selectors are provided by the separate shard modest: https://github.com/kostya/modest

Benchmark

Comparison with Ruby Nokogiri (libxml) and Crystal Crystagiri (libxml): each parses a Google results page 1000 times. Code: https://github.com/kostya/modest/tree/master/bench

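require "modest"

page = File.read("./google.html")
s = 0                # total number of links found over all iterations
links = [] of String
1000.times do
  myhtml = Myhtml::Parser.new(page)
  # select result links with a CSS selector and collect their href attributes
  links = myhtml.css("div.g h3.r a").map(&.attribute_by("href")).to_a
  s += links.size
  myhtml.free        # explicitly release the parser's native memory
end
p links.last
p s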

| Lang     | Package            | Time, s | Memory, MiB |
| -------- | ------------------ | ------- | ----------- |
| Crystal  | modest(myhtml)     | 2.62    | 9.8         |
| Crystal  | Crystagiri(LibXML) | 19.89   | 11.5        |
| Ruby 2.2 | Nokogiri(LibXML)   | 45.82   | 136.2       |
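To put the table in proportion, the ratios below divide the Crystagiri and Nokogiri figures by the modest(myhtml) figures. This is only arithmetic on the numbers reported in the table, not an additional measurement. The ratios are rounded to one decimal place.

$$
\begin{aligned}
\text{time:}\quad & \tfrac{19.89}{2.62} \approx 7.6 \ (\text{Crystagiri}), && \tfrac{45.82}{2.62} \approx 17.5 \ (\text{Nokogiri}) \\
\text{memory:}\quad & \tfrac{11.5}{9.8} \approx 1.2 \ (\text{Crystagiri}), && \tfrac{136.2}{9.8} \approx 13.9 \ (\text{Nokogiri})
\end{aligned}
$$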

License: MIT