Cross-validation predictions for the 20 Newsgroups dataset. The positive (blue) class is Christianity; the negative class is Atheism.

The model is logistic regression over a case-sensitive bag-of-words representation. Each word is drawn at a size proportional to its weight coefficient in the logistic regression. Hovering over a word shows summary statistics for that word in the training data (frequency and class distribution).
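As a rough illustration of the setup described above, here is a minimal pure-Python sketch. It is not the code behind this demo. The toy corpus, the names `tokenize`, `train`, `word_stats`, and `font_size`, the use of absolute weight for sizing, and the learning-rate and regularization values are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the two newsgroups.
# Label 1 = Christianity (positive), 0 = Atheism (negative).
docs = [
    ("God and the Church teach faith", 1),
    ("Christ is risen, the Church rejoices", 1),
    ("faith in God and Christ", 1),
    ("there is no evidence for a god", 0),
    ("atheists ask for evidence, not faith", 0),
    ("no god is needed to explain evidence", 0),
]

def tokenize(text):
    # Case-sensitive: "God" and "god" remain distinct features.
    return re.findall(r"[A-Za-z]+", text)

vocab = sorted({w for text, _ in docs for w in tokenize(text)})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    # Bag of words: one count per vocabulary entry, word order ignored.
    counts = Counter(tokenize(text))
    return [counts.get(w, 0) for w in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.2, epochs=500, l2=0.01):
    # L2-regularized logistic regression fitted by batch gradient descent.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

X = [vectorize(text) for text, _ in docs]
y = [label for _, label in docs]
w, b = train(X, y)
weights = {word: w[i] for word, i in index.items()}

def word_stats(word):
    # Number of training documents containing the word, split by class.
    pos = sum(1 for text, label in docs if label == 1 and word in tokenize(text))
    neg = sum(1 for text, label in docs if label == 0 and word in tokenize(text))
    return {"frequency": pos + neg, "positive": pos, "negative": neg}

def font_size(word, max_px=40.0):
    # Assumption: size scales with the absolute weight, largest word = max_px.
    biggest = max(abs(v) for v in weights.values())
    return max_px * abs(weights[word]) / biggest

print(round(weights["God"], 3), round(weights["god"], 3))
print(word_stats("faith"), round(font_size("evidence"), 1))
```

In this toy corpus "God" occurs only in positive documents and "god" only in negative ones, so the two spellings receive weights of opposite sign. A case-insensitive representation would merge them into a single feature and lose that distinction.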

The "bi-histogram" at the bottom shows the overall performance of the model. Correctly classified examples are shown above the solid horizontal line, and incorrectly classified examples are shown below it. The histogram is interactive: clicking a document in it loads that document into the text editor.
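One way to compute the counts behind such a bi-histogram is sketched below. The help text does not say what the horizontal axis is; this sketch assumes documents are binned by predicted probability of the positive class, with a 0.5 decision threshold. The function names `bi_histogram` and `render`, and the sample probabilities, are hypothetical.

```python
def bi_histogram(probs, labels, n_bins=10):
    # Assumption: the horizontal axis is the predicted probability of the
    # positive class, and 0.5 is the decision threshold.
    correct = [0] * n_bins
    incorrect = [0] * n_bins
    for p, y in zip(probs, labels):
        bin_idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        predicted = 1 if p >= 0.5 else 0
        if predicted == y:
            correct[bin_idx] += 1
        else:
            incorrect[bin_idx] += 1
    return correct, incorrect

def render(correct, incorrect):
    # Text rendering: correct counts stacked above the line, incorrect below.
    rows = []
    for level in range(max(correct, default=0), 0, -1):
        rows.append("".join("#" if c >= level else " " for c in correct))
    rows.append("-" * len(correct))
    for level in range(1, max(incorrect, default=0) + 1):
        rows.append("".join("#" if c >= level else " " for c in incorrect))
    return "\n".join(rows)

# Made-up cross-validation outputs for eight documents.
probs = [0.05, 0.15, 0.45, 0.55, 0.62, 0.95, 0.91, 0.35]
labels = [0, 0, 1, 1, 0, 1, 1, 0]
correct, incorrect = bi_histogram(probs, labels, n_bins=5)
print(render(correct, incorrect))
```

Under this binning, errors near the middle of the axis are low-confidence mistakes, while bars below the line at either end would indicate confident misclassifications.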
