Voting Feedback

Story: Using Mathematical Probability to improve Linux news aggregation
Total Replies: 8
Author Content
bstadil

Apr 27, 2004
12:08 PM EDT
This is very interesting. Have you implemented any method so that the voting is incorporated into the selection process?
dave

Apr 27, 2004
12:13 PM EDT
Not yet, but I am planning to do just that, and it's not going to be very hard at all.

I have written several functions to assist with this: basically, I call a function with the id of the story and a status ("good" or "bad"), and the function takes it from there.
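A rough sketch of what such a call could look like, in Python. This is not the site's actual code: the name train_story, the in-memory STORIES table, and the whitespace tokenizer are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stand-ins for the site's story table and the filter's state.
STORIES = {
    101: "Linux kernel release adds new scheduler",
    102: "Analyst claims open source is doomed",
}
TOKEN_COUNTS = {"good": Counter(), "bad": Counter()}
STORY_COUNTS = Counter()


def train_story(story_id, status):
    """Feed one story into the filter as a "good" or "bad" example."""
    if status not in TOKEN_COUNTS:
        raise ValueError('status must be "good" or "bad"')
    text = STORIES[story_id]  # raises KeyError if the id is unknown
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    TOKEN_COUNTS[status].update(text.lower().split())
    STORY_COUNTS[status] += 1


# The caller only supplies an id and a verdict.
train_story(101, "good")
train_story(102, "bad")
```

Because the caller passes nothing but an id and a status, any source of verdicts (an editor's decision, a member's vote) can use the same entry point.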

So, bringing data from other sources (like these votes) is going to be very easy!

Which, of course, will result in the users having a very real impact on the editorial selection.

dave
chappaquachap

Apr 27, 2004
12:16 PM EDT
Go for it.
peragrin

Apr 27, 2004
12:27 PM EDT
Training a filter in this manner might be hard. Spam is generally rejected based on its content. Just because a story has the word linux in it doesn't mean that it is good. For your first few months you will have to work out which way to go.

For example, Enderle or Laura Dio might write an article without any facts that mentions Open Source and Linux. Not every article by those people will be posted, but you might want a few of them (hey, you never know). Teaching the filter the difference between the content of the words and the content of the meaning will be the hard part.

Good Luck.
dave

Apr 27, 2004
12:31 PM EDT
One of the nice things about bogofilter and spambayes is that they can be immediately useful to you if you already have a corpus of saved mail.

Suppose you've been saving your mail for the past 6 months: good mail in one folder and bad mail in another. You can just throw all that stuff at the filter and bingo - you've got your filter already nicely populated.

This is the case here: I have already thrown my 10,000 deleted and approved stories into the bayes filter, so I can already see how it works today and roughly how it will work in the future.
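Seeding a filter from an existing archive can be sketched like this. It is a minimal Python illustration under stated assumptions: seed_filter and the sample stories are hypothetical, and the real bogofilter and spambayes tools have their own tokenizers and storage formats.

```python
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def seed_filter(approved, deleted):
    """Build per-class token counts from two piles of saved stories."""
    counts = {"good": Counter(), "bad": Counter()}
    totals = {"good": len(approved), "bad": len(deleted)}
    for label, stories in (("good", approved), ("bad", deleted)):
        for text in stories:
            # set() so each story counts a given token at most once.
            counts[label].update(set(tokenize(text)))
    return counts, totals


# Tiny made-up corpus standing in for the saved approved/deleted stories.
approved = ["new linux kernel released", "debian ships security update"]
deleted = ["buy cheap software now", "analyst says linux is doomed"]

counts, totals = seed_filter(approved, deleted)
```

After one pass over the archive the filter already has counts for every token it has seen, so it can start scoring new stories right away.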

You are right that just because the word "Linux" is there, doesn't mean that it'll be posted. Just like in E-mail, the word "viagra" might not immediately disqualify it, either.

Consider this: I post almost every single article that Steve Shankland at CNET writes. So, if a story comes through and my bayes sees that it is written by Shankland, it knows that I will probably like it.

But, suppose a story by Enderle comes through, and the bayes sees that I have rejected several stories from that author, then it'll think that I might not like this one, either, and will penalize the story appropriately.

That's how bayesian inference works. It's not keyword based, but probability based. It looks at the entire message as a whole and makes decisions.
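To make the Shankland/Enderle example concrete, here is a small Python sketch. It uses a plain naive Bayes combination with add-one smoothing, which is simpler than the combining schemes bogofilter and spambayes use, and it is not the site's actual code. The StoryFilter class, the "author:" token prefix, and the training stories are all invented for illustration.

```python
import math
from collections import Counter


def tokens(author, text):
    # Treat the author as just another token (assumption), next to the words.
    return {"author:" + author.lower()} | set(text.lower().split())


class StoryFilter:
    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.totals = {"good": 0, "bad": 0}

    def train(self, author, text, status):
        self.counts[status].update(tokens(author, text))
        self.totals[status] += 1

    def p_good(self, author, text):
        """Probability that a story is 'good', combining all of its tokens."""
        good, bad = self.totals["good"], self.totals["bad"]
        # Start from the prior odds, then add evidence from every token.
        log_odds = math.log((good + 1) / (bad + 1))
        for tok in tokens(author, text):
            p_tok_good = (self.counts["good"][tok] + 1) / (good + 2)
            p_tok_bad = (self.counts["bad"][tok] + 1) / (bad + 2)
            log_odds += math.log(p_tok_good / p_tok_bad)
        return 1 / (1 + math.exp(-log_odds))


f = StoryFilter()
f.train("Shankland", "linux servers gain ground", "good")
f.train("Shankland", "ibm backs linux desktop", "good")
f.train("Enderle", "linux is risky for business", "bad")
f.train("Enderle", "open source will fail", "bad")

# Same headline, different author: only the author token differs.
p_shankland = f.p_good("Shankland", "linux news roundup")
p_enderle = f.p_good("Enderle", "linux news roundup")
```

With this toy data the same headline scores above 0.5 under the first author and below 0.5 under the second. No single word decides the outcome; every token nudges the odds one way or the other.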

dave
plovs

Apr 27, 2004
10:31 PM EDT
If you will be using voting to filter which stories should appear on your site, then maybe you should expand the voting system a little. Right now we can vote "worth reading" yes or no, but some stories are great to read even though I totally disagree with the writer's viewpoint. It looks like some people vote no just to disagree with the content, not because the article isn't worth reading.
dave

Apr 29, 2004
4:45 AM EDT
This morning I integrated all the member votes (going back to the beginning) into the bayesian editor's filter, and it had surprising results. The quality of the results has now improved incredibly. Stories are now actually reordered and scores are much higher. Further, I personally agree with the queue much more now.

From now on, every time you vote, you will be having a direct impact on the editorial selection of this site! Cool! :-)

dave
plovs

Apr 29, 2004
9:45 AM EDT
Great work, Dave! Your site is getting better and better, thanks a lot.
dave

Apr 29, 2004
10:59 AM EDT
If anyone wants to take a look at the queue, I set it up so you can do just that:

http://lxer.com/module/newswire/stories/bayes.php

You MUST be logged in before you can access the queue (I don't want search engines and other robots hitting this page). Take a look and check it out. There may be only a few stories when you look (or a lot, or none at all) depending on when I last hit the queue, but anyway it'll be fun to see it and check out the scores. Keep in mind that this is still a work in progress!

dave
