Cross-validation predictions for the 20 Newsgroups dataset. The positive (blue) class is Christianity; the negative class is
Atheism.
We use a case-sensitive bag-of-words representation with logistic regression. The size of
a word is proportional to its weight coefficient in the logistic regression model.
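A setup like this could be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn and a tiny made-up corpus standing in for the two newsgroups; the fold count and all variable names are assumptions, not the code behind this page.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; 1 = Christianity (positive), 0 = Atheism (negative).
docs = [
    "God and the church", "Jesus Christ is Lord",
    "prayer in the church", "the Bible and God",
    "no evidence for gods", "atheists reject faith claims",
    "evidence and reason matter", "atheists value reason",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# lowercase=False keeps the bag of words case sensitive ("God" != "god").
vectorizer = CountVectorizer(lowercase=False)
classifier = LogisticRegression()
model = make_pipeline(vectorizer, classifier)

# Each document is predicted by a model that never saw it during training.
predictions = cross_val_predict(model, docs, labels, cv=2)

# Fit on all documents to read off one weight coefficient per word.
model.fit(docs, labels)
weights = {word: classifier.coef_[0][index]
           for word, index in vectorizer.vocabulary_.items()}
```

Here `weights` maps each vocabulary word to its coefficient, which is the quantity a display could scale word sizes by.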
Hovering the mouse over a word gives summary statistics for that word in the
training data (frequency and class distribution).
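The per-word summary could be computed along these lines. This sketch assumes that "frequency" means the total number of occurrences and that tokens are split on whitespace; the helper `word_stats` and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict

def word_stats(docs, labels):
    """Return {word: (frequency, {label: count})} over the training documents."""
    per_class = defaultdict(Counter)
    for text, label in zip(docs, labels):
        # Case-sensitive whitespace tokenization (an assumption for this sketch).
        for word in text.split():
            per_class[word][label] += 1
    return {word: (sum(counts.values()), dict(counts))
            for word, counts in per_class.items()}

docs = ["God is love", "God does not exist", "love and reason"]
labels = ["Christianity", "Atheism", "Atheism"]
stats = word_stats(docs, labels)
print(stats["God"])  # (2, {'Christianity': 1, 'Atheism': 1})
```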
The "bi-histogram" at the bottom shows the overall performance of the model. Correctly classified examples are shown
above the solid horizontal line, and incorrectly classified examples are shown below it. The histogram is interactive: you can
click on a document in it to load that document into the text editor.
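One way to build the counts behind such a bi-histogram is sketched below. It assumes the horizontal axis bins documents by predicted probability of the positive class and that 0.5 is the decision threshold; neither detail is stated above, and `bi_histogram` is a hypothetical helper.

```python
def bi_histogram(probs, labels, n_bins=10):
    """Return per-bin counts (correct, incorrect) of predicted P(positive)."""
    correct = [0] * n_bins    # drawn above the horizontal line
    incorrect = [0] * n_bins  # drawn below the horizontal line
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        predicted = 1 if p >= 0.5 else 0      # assumed decision threshold
        if predicted == y:
            correct[b] += 1
        else:
            incorrect[b] += 1
    return correct, incorrect

probs = [0.95, 0.85, 0.30, 0.10, 0.65]
labels = [1, 1, 1, 0, 0]
correct, incorrect = bi_histogram(probs, labels, n_bins=5)
print(correct)    # [1, 0, 0, 0, 2]
print(incorrect)  # [0, 1, 0, 1, 0]
```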