While working at an upcoming blogpost, I encountered the problem of extracting some plain text from HTML. If I was interested the whole plain text, I could just run html2text
in bash and feed it with the HTML, but what I needed was just a specific part of the plain text between two certain comments. As it was hard to google a simple solution for this I decided to share mine.
Nokogiri
From http://nokogiri.org/ :
Nokogiri (?) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.
XML is like violence – if it doesn’t solve your problems, you are not using enough of it.