Displaying posts published in March 2011

How to apply Naive Bayes Classifiers to document classification problems

Over the last decades it has turned out that it is often much easier to tell a computer how to learn a specific task than to tell it exactly how to perform that task. A generic term for this approach is Machine Learning, which Wikipedia summarizes as: Machine learning, a [...]
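Below is a minimal sketch of a multinomial Naive Bayes document classifier in plain Ruby, using only the standard library. The class name `NaiveBayes`, the letters-only tokenizer, the add-one (Laplace) smoothing and the toy training sentences are all assumptions made for illustration and are not taken from the post. For a document with words w, it picks the class c that maximizes log P(c) + Σ log P(w | c). Working with logarithms avoids the underflow caused by multiplying many small probabilities.

```ruby
# Illustrative multinomial Naive Bayes text classifier (hypothetical sketch).
class NaiveBayes
  def initialize
    @doc_counts  = Hash.new(0)                            # documents seen per class
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # word frequencies per class
    @total_words = Hash.new(0)                            # total tokens per class
    @vocabulary  = {}                                     # every distinct word seen
  end

  # Assumed tokenizer: lowercase runs of ASCII letters only.
  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end

  # Adds one labelled training document.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @total_words[label] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the label with the highest log-posterior score.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.keys.max_by do |label|
      score = Math.log(@doc_counts[label] / total_docs)   # log P(class)
      denominator = @total_words[label] + @vocabulary.size
      words.each do |word|
        # log P(word | class) with add-one smoothing, so unseen words never give log(0)
        score += Math.log((@word_counts[label][word] + 1).to_f / denominator)
      end
      score
    end
  end
end

nb = NaiveBayes.new
nb.train(:sports, "the team won the match")
nb.train(:sports, "a great goal in the final match")
nb.train(:tech, "the new laptop has a fast processor")
nb.train(:tech, "software update improves the processor speed")

puts nb.classify("who won the final match")
puts nb.classify("a faster laptop processor")
```

With these toy sentences, the first query is classified as `sports` and the second as `tech`. Real document collections would need a better tokenizer, for example with stop-word removal or stemming.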

How to extract plain text from HTML with Nokogiri

While working on an upcoming blog post, I ran into the problem of extracting plain text from HTML. Had I been interested in the whole plain text, I could simply have run html2text in bash and fed it the HTML, but what I needed was only the part of the plain text between two specific comments. [...]