The biggest problem with newsreaders, IME, has been managing large numbers of feeds. Most user time is spent handling redundant stories - e.g., if you have feeds from many major news sources, for each major event you get one or more stories on each feed, saying mostly the same things.
I haven't seen a newsreader solve that problem. Has anyone tried an LLM?
The best solution I know is grouping redundant stories together, possibly hierarchically: e.g., Sports > Olympics > Figure skating > Jones performance. (Fewer feeds require fewer levels, possibly just one.)
That roughly deduplicates the stories and, because they are displayed together, lets you compare them, choose the coverage you like, and delete the rest. Otherwise, IME most user time is spent sorting through redundant stories one at a time.
But as I said, I haven't seen a newsreader do that well. It seems like a good fit for LLMs. Or maybe there's another solution besides grouping?
PaulHoule
today at 8:26 PM
My YOShInOn RSS reader uses an SBERT model for classification (will I upvote this or not?) and large-scale clustering (20 k-means clusters, then show me the top N in each cluster so I get a diverse set of articles).
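A minimal sketch of the "k-means for diversity" idea, not YOShInOn's actual code. It assumes each article already has an embedding vector and a relevance score; here both are random stand-ins rather than real SBERT embeddings or classifier output, and the helper name `diverse_top_n` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def diverse_top_n(embeddings, scores, n_clusters=20, top_n=3, seed=0):
    """Cluster articles, then keep the top_n highest-scoring ones per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(embeddings)
    picks = {}
    for cluster in range(n_clusters):
        members = np.flatnonzero(labels == cluster)
        # Sort this cluster's members by score, best first.
        ranked = members[np.argsort(-scores[members])]
        picks[cluster] = ranked[:top_n].tolist()
    return picks


# Stand-in data (assumption): random vectors instead of SBERT embeddings,
# random numbers instead of a real "will I upvote this?" score.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
scores = rng.random(200)

picks = diverse_top_n(embeddings, scores, n_clusters=5, top_n=2)
for cluster, article_ids in picks.items():
    print(cluster, article_ids)
```

Taking the best few per cluster, instead of the best few overall, keeps one popular topic from crowding out everything else.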
For duplicate detection I am using DBSCAN
https://scikit-learn.org/stable/modules/generated/sklearn.cl...
and found some parameters where I get almost no false positives, but a lot of duplicates get missed. When I lowered the threshold to make clusters, I started getting false positives fast. I don't find duplicates are a big problem in my system with the 110 feeds I have and the subjects I am interested in, but insofar as they are a problem there tend to be structured relationships between articles: that is, site A syndicates articles from site B, but for some reason articles from site A usually get selected and site B articles don't. An article from site A often links to one or more other articles, often from sites I don't have a feed for, and it would be nice if the system looked at the whole constellation. Stuff like that.
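A rough sketch of DBSCAN-based duplicate detection with scikit-learn, under stated assumptions: TF-IDF vectors of made-up titles stand in for the SBERT embeddings mentioned above, and `eps=0.3` is an illustrative value, not the parameters described in this comment.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up titles for illustration only.
titles = [
    "Jones wins gold in figure skating final",
    "Figure skating final: Jones wins gold",
    "Central bank raises interest rates again",
    "Central bank raises interest rates again, markets fall",
    "New telescope images reveal distant galaxy",
]

# Assumption: TF-IDF instead of sentence embeddings, to keep this self-contained.
vectors = TfidfVectorizer().fit_transform(titles).toarray()

# eps is the cosine-distance threshold; min_samples=2 lets a pair form a cluster.
db = DBSCAN(eps=0.3, min_samples=2, metric="cosine")
labels = db.fit_predict(vectors)

# Label -1 means "noise", i.e. no near-duplicate was found for that title.
for label, title in zip(labels, titles):
    print(label, title)
```

Raising `eps` merges more articles into each cluster (fewer missed duplicates, more false positives); lowering it does the opposite.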
Effective clustering is the really interesting technology Google News has had for a long time.
I have been attempting this exact sort of clustering solution for a few years now (on and off as a side project). Do you have source code available, or more detailed explanations/resources of how to approach this?
Edit: I just looked around for your YOShInOn RSS reader code and couldn't find it. I did find a number of references you appear to have made to it on various forums, etc. over the years.
PaulHoule
today at 8:55 PM
The technical report on YOShInOn is about 2 years overdue!
You mean the k-means for diversity or DBSCAN for duplicates? Either way it is about 10 lines of scikit-learn code. Send me an email.
Both. Just sent an email. Thanks!
That was partially the original promise of Fever, which is the API many RSS services still support and that somehow lives on.
Nuzzel did something similar for Twitter but shut down (https://daringfireball.net/linked/2021/05/05/nuzzel).
That would be a good addition to feed readers, especially for news feeds.
emschwartz
today at 9:27 PM
You should try Scour (https://scour.ing)!
You specify your interests as free-form text, it ranks articles by how closely they match, and you can consume your Scour feed as an RSS feed to read it in NNW.
Disclaimer: I’m the developer
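For readers who want to experiment with the general idea of ranking articles against a free-form interest description, here is a minimal sketch. It is not how Scour works internally (the thread does not say); it assumes plain TF-IDF and cosine similarity, and the function name `rank_by_interest` and the sample texts are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_by_interest(interest, articles):
    """Return (score, article) pairs, best match first."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on articles plus the interest text so they share one vocabulary.
    matrix = vectorizer.fit_transform(articles + [interest])
    article_vecs, interest_vec = matrix[:-1], matrix[-1]
    sims = cosine_similarity(interest_vec, article_vecs).ravel()
    return sorted(zip(sims.tolist(), articles), reverse=True)


# Made-up sample data for illustration.
interest = "rust compilers and programming languages"
articles = [
    "A new borrow checker for the Rust compiler",
    "Olympic figure skating results",
    "Designing programming languages for beginners",
]

ranked = rank_by_interest(interest, articles)
for score, article in ranked:
    print(f"{score:.2f}  {article}")
```

Word-overlap scoring like this misses synonyms and inflections ("compilers" vs. "compiler"), which is one reason to prefer embedding models for real use.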
cosmic_cheese
today at 7:37 PM
I haven't used it much but I think Iconfactory's Tapestry[0] does some of this.
[0]: https://usetapestry.com/