
Nokogiri

From http://nokogiri.org/ :

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.

XML is like violence – if it doesn’t solve your problems, you are not using enough of it.

How to extract plain text from HTML with Nokogiri

While working on an upcoming blog post, I ran into the problem of extracting plain text from HTML. If I had wanted the whole plain text, I could simply have run html2text in bash and fed it the HTML, but what I needed was only a specific part of the plain text between two particular comments. Since it was hard to google a simple solution for this, I decided to share mine.


March 12, 2011 · NilsH · nokogiri, Ruby · 1 Comment