Linux News
The world is talking about GNU/Linux and Free/Open Source Software

Dynamic Bayesian driven Story listing

Forum: LXer Meta Forum

Total Replies: 6

Author	Content
bstadil Jul 09, 2004 7:03 AM CST	Dave, I think your system for selecting news with a Bayesian filter is very interesting, and I think it should get some coverage elsewhere. I submitted a story to Slashdot a while back, but it was rejected. I was thinking that the news stories on your site, and on any other news site, once entered just sit passively in the sequence and fall off once the length of the line has been reached. Why not try using Bayesian filtering to determine the fall-off rate? Highly rated stories, or stories with a high score, get to stay longer. Some sites have a "most popular" section, but I think what I am proposing would be better, maybe flagged with an "hours since posted" time stamp or something.
dave Jul 09, 2004 7:08 AM CST	Hey, that's a very good idea. I hadn't considered using Bayesian algorithms to determine the front page "drop off" rate. The main problem is that my Bayes code is not at all perfect, and I'm not satisfied with it. I'm getting a lot of "in the middle" type returns, like between 30% and 70%, but very few ~0% and ~99%, which is what I want. I wonder, if I put the code up somewhere, whether anyone else would be interested in taking it and actually making it work. What is really needed is a Bayes class (PHP) that really works well (developerWorks had one, but it's not generic enough). I emailed Eric Raymond (author of bogofilter) and suggested he make this tool and, while he expressed some interest, he hasn't acted on it. The Bayes stuff here needs to be much better before I'll put it in production use for the users. bstadil: are you a mathematician? dave
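
For readers unfamiliar with the kind of Bayes class being asked for here, the following is a minimal sketch of a two-class naive Bayes text scorer. It is written in Python rather than PHP and is not dave's code or the developerWorks class; the class name `NaiveBayesScorer`, the labels "good" and "bad", and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayesScorer:
    """Two-class naive Bayes over word counts (hypothetical illustration)."""

    LABELS = ("good", "bad")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one training text under the given label."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def score(self, text):
        """Return the estimated probability that text belongs to 'good'."""
        vocab = set(self.word_counts["good"]) | set(self.word_counts["bad"])
        total_docs = sum(self.doc_counts.values())
        log_post = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed log prior for the class.
            lp = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in text.lower().split():
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                lp += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            log_post[label] = lp
        # Normalize in log space to avoid underflow on long texts.
        peak = max(log_post.values())
        weights = {label: math.exp(lp - peak) for label, lp in log_post.items()}
        return weights["good"] / (weights["good"] + weights["bad"])
```

A scorer like this returns a value between 0 and 1 for each story, which is the sort of percentage discussed in this thread; with no training data it returns exactly 0.5.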
dave Jul 09, 2004 8:30 AM CST	Interesting. I found this: http://www.phpgeek.com/pragmacms/index.php?layout=main&cslot... It's a GPL generic Bayesian filter, which is EXACTLY what I've been looking for. This wasn't available several months ago, but it sure is now. I'm playing with it at the moment and we'll see how it goes. If I can get this perfected, then I'll definitely be incorporating it all over the place, including in user preferences (your own personal newswire, with the stuff that the system knows you enjoy). dave
bstadil Jul 09, 2004 2:49 PM CST	I have been using POPFile for a year or so and it is extremely accurate. I just checked my accuracy and it's at 99.54%, so the generic Bayesian filter is probably an excellent choice. You asked whether I am a mathematician. No, not as such, but I have an advanced degree in Operations Research: you know, model building and optimization stuff. Maybe you want to make something dead simple for the variable "drop off" rate, in case it turns out to be confusing or not something people want. Second, I have mixed feelings about having your own ranking and selection of stories. I kind of like the idea that you benefit from others' views and that the selection is a bit of a group effort.
dave Jul 09, 2004 6:04 PM CST	The front page, I think, would always be available as it is right now, but I'm thinking of giving another sorting option, which would be a personal sorter. Not a lot of success with the naivebayesian (sic) tool today. I'll play with it again tomorrow, if I find time between gardening sessions. :) I'll bounce any ideas off you as I get 'em. dave
sbergman27 Jul 11, 2004 9:37 AM CST	I think it would be good if there were more explicit guidelines as to just what "Worth Reading" means. I just read "The Five Top Objections to Open Source". It was an excellent article... for someone considering moving toward open source software. It would certainly be "worth reading" for that category of reader. That category of reader would probably be in the minority here, and also less likely to vote than the regulars here, who might not find it "worth reading". I consider those readers to be quite important. I would like to see more articles like that on the site, since I see LXer as being an important portal for people just starting to investigate open source, even if their presence is transitory. (OSS is not a passion for those people, after all.) As discussed in a previous thread, some people may vote negatively simply because an article points out a problem with OSS as opposed to certain proprietary software, or positively for an article which is mainly a fluffy Linux "love fest" without much substance. Without clear guidelines, that's perfectly understandable. Bayesian filtering is a statistical method, and as with any statistical method, it is subject to Garbage In->Garbage Out. The more consistent the voting criteria, the more effective the filter will be. Another point that comes to mind regards voting frequency. If the Bayesian filter does its job well, then many people will be seeing articles which are "worth reading" and only a few which are not. So, do they vote positively for even the average articles (which are theoretically "worth reading" but not especially so), or do they vote only for the exceptionally good ones and against the exceptionally bad ones? Remember, in spam filtering, it doesn't take that much "spam in the ham" or vice versa to throw off the filter. Or at least that's my understanding.
bstadil Jul 12, 2004 6:05 AM CST	You make some interesting points, but I do not think they are major. If the Bayesian filter determines the Fall-Off rate and the Cut-Off rate, then the lesser ranked stories will still have a time span in which they can be voted on, albeit a shorter one than the initially highly ranked stories. Once implemented, a positive vote could result in a Kicker of some kind, with the Kicker value to be determined by how the system in general performs.
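
One possible reading of the fall-off idea discussed in this thread can be sketched as follows. This is an illustration only, not anything implemented on LXer: it assumes each story carries a filter score between 0 and 1, that front-page lifetime grows linearly with that score, and that each positive vote adds a fixed Kicker. The names `Story`, `lifetime_hours`, and `front_page`, and all three constants, are hypothetical.

```python
from dataclasses import dataclass

# All three constants are assumed values chosen for illustration.
BASE_HOURS = 6.0     # lifetime of a story with score 0 and no votes
EXTRA_HOURS = 42.0   # additional lifetime granted to a story with score 1.0
KICKER_HOURS = 0.5   # bonus lifetime per positive vote


@dataclass
class Story:
    title: str
    score: float               # filter output, assumed to lie in [0, 1]
    hours_since_posted: float
    positive_votes: int = 0


def lifetime_hours(story):
    """Hours a story may stay listed before it falls off."""
    return (BASE_HOURS
            + EXTRA_HOURS * story.score
            + KICKER_HOURS * story.positive_votes)


def front_page(stories):
    """Return stories still within their lifetime, newest first."""
    live = [s for s in stories if s.hours_since_posted < lifetime_hours(s)]
    return sorted(live, key=lambda s: s.hours_since_posted)
```

Under these assumptions a low-scored story still gets at least BASE_HOURS in which to collect votes, and each positive vote extends its stay, which matches the "lesser time span" and Kicker described above. How large the Kicker should be is left open, as in the thread.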

