New Yanoff User Manual

TERMINOLOGY:
HISTORY:
THE DEAL:
HOW IT WORKS:
(Re)Defining Servers:
NewsArts.pdb; message storage:
NewsGroup*.pdb; Newsgroup DBs:
DB Synchronicity
Unindexed Messages
Unsubscribed Articles
Bogus Message References
Indexing and Thread Caching
Crossposting and the History DB

TERMINOLOGY:

Throughout New Yanoff and its documentation we use the following conventions: "Yanoff" refers to all versions of Yanoff; "GPL Yanoff" refers to the original open-source Yanoff; "Yanoff-" refers to the totally free (demoware) version of our private-branch Yanoff; "Yanoff+" refers to the trialware (pay-to-keep) version of our private-branch Yanoff; and "New Yanoff" refers to both "Yanoff-" and "Yanoff+". In addition, "article" refers to a Usenet/NNTP message, "email" refers to an email/SMTP message, and "message" is generic, covering NNTP and/or SMTP messages. The word "Newsgroup" is usually abbreviated as "NG", and "Message-ID" as "MID".

HISTORY:

In mid-June 1999, pioneer Matthias Jordan released a mostly-complete, relatively-stable, perfectly usable, open-source (under the GPL) newsreader called Yanoff, in the hope that others would carry on his work and mature the app. Unfortunately, that never really happened (the great exception being the wonderful Conduit created by Jan-Pascal van Best, but that was an original work, not an expansion of Yanoff itself). Several years passed, and while Yanoff's popularity increased, it never spurred any further development. Then a "power user" decided to jump in and fix some of the things that bugged him. After much discussion and a brief experiment with "Ransom-Ware" as a way to generate revenue from his improvements, all parties agreed to a licensing plan that would benefit everyone.

THE DEAL:

In exchange for an unconditional license to a single-point-in-time fork of the GPL source code, we agreed to pay royalties to the Free Software Foundation (FSF) on behalf of Matthias. In addition, we agreed to pass bug reports (and fixes) along to Matthias so that anyone interested in fixing the GPL release may do so. Beyond these contractual obligations, we have decided to release two updated versions. The first, Yanoff+, is a super-powered update with all of the known bugs of GPL (original) Yanoff fixed. Yanoff+ is "Trialware": after 15 days, it disables many features until it is unlocked by installing a license key. The second, Yanoff-, is TOTALLY free and includes most (if not all) of the features of GPL Yanoff plus all the bug fixes and many of the features of Yanoff+. However, the most powerful and tempting features are left out to give people an incentive to purchase a license for Yanoff+ (a complete list of differences is in the FAQ).

This way everybody wins. The GPL version lives on, and anybody who wants to jump in and fix it up still can. Meanwhile, EVERYBODY has a bug-fixed, feature-enhanced version (Yanoff-) that is totally FREE. Those who want all the bells and whistles can buy the super-enhanced version (Yanoff+). The FSF gets some revenue, which benefits everyone (they do GREAT work and we use their tools). And lastly, we, the developers, get compensated for the work and cost of adding value to the product.

Yanoff+ may be used for 15 days with all but a small handful of features enabled (those few are only for registered users; see the FAQ), after which it reverts to the same functionality as Yanoff-. If you are new to Yanoff, we suggest you start with Yanoff- and, once you are familiar with how it works, try out the expanded features of Yanoff+. That way you won't waste any of your 15-day trial period on the initial learning curve.

HOW IT WORKS:

Here's how all versions of Yanoff work:

(Re)Defining Servers:

The user must (re)define at least 1 SMTP (email) server, as server #0, and at least 1 NNTP (Usenet) server. The first time Yanoff is run it defines 2 "dummy" servers, which the user must modify to match his own servers. Then, using some other method to find out which newsgroups exist (Outlook Express or groups.google.com are 2 good ways), the user "Subscribe"s to the newsgroups he wants.

NewsArts.pdb; message storage:

All messages (or portions thereof) are stored in a single message database (NewsArts.pdb). This database contains both polled (incoming) and user-created (outgoing) messages; they are all stored in the same format and in the same place (with a flag reflecting which type it is). Each subscribed newsgroup (and each of the !Drafts, !!Outgoing, !!!Sent, and !!!!Lost newsgroups) has its own private database (NewsGroup-<#>.pdb) whose every entry contains a reference to an entry in the article database. More than 1 newsgroup may reference the same article (each article has a multi-index counter indicating the number of newsgroups which reference it).
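The storage model above, one shared article store plus per-newsgroup lists of references, can be pictured with the following minimal Python sketch. It is illustrative only: the names `Article`, `MessageStore`, `outgoing` and `index_count` are hypothetical and say nothing about the real PDB record layout.

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    # Hypothetical stand-in for one NewsArts.pdb record.
    message_id: str
    body: str
    outgoing: bool = False  # flag: True if user-created, False if polled
    index_count: int = 0    # "multi-index counter": NGs referencing this article

@dataclass
class MessageStore:
    articles: list = field(default_factory=list)  # plays the role of NewsArts.pdb
    groups: dict = field(default_factory=dict)    # NG name -> list of article numbers

    def add_article(self, art: Article) -> int:
        """Store a message once and return its article number."""
        self.articles.append(art)
        return len(self.articles) - 1

    def reference(self, group: str, art_no: int) -> None:
        """Add an entry to a newsgroup DB and bump the article's counter."""
        self.groups.setdefault(group, []).append(art_no)
        self.articles[art_no].index_count += 1

# One crossposted article, stored once, referenced by two newsgroups.
store = MessageStore()
n = store.add_article(Article("<1@example.invalid>", "Hello"))
store.reference("alt.test", n)
store.reference("misc.test", n)
print(store.articles[n].index_count)  # 2
```

The counter is what lets a single stored copy serve several newsgroups: it tells the program how many NG databases still point at the article.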

NewsGroup*.pdb; Newsgroup DBs:

Messages are first saved into the article database (whether polled or created), and a second stage (called indexing) adds entries referencing the article to the appropriate newsgroup database(s) and increments the article's multi-index counter. When a message is created, both stages happen together immediately; when polling, all articles are polled first and then all articles are indexed.
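The difference between the two orderings can be sketched in Python as follows, assuming each article is a dict whose "Newsgroups:" header has already been split into a list, and using plain lists and dicts in place of the PDB files. The function names are hypothetical, and giving a created message a `newsgroups` list of `["!!Outgoing"]` is a simplification for illustration.

```python
def index_one(articles, groups, counts, art_no, subscribed):
    """Stage 2 (indexing): reference the article from each subscribed NG it names."""
    for ng in articles[art_no]["newsgroups"]:
        if ng in subscribed:
            groups.setdefault(ng, []).append(art_no)
            counts[art_no] += 1  # the article's multi-index counter

def poll_batch(articles, groups, counts, fetched, subscribed):
    """Polling: stage 1 for every fetched article, then stage 2 for all of them."""
    first = len(articles)
    for art in fetched:
        articles.append(art)  # stage 1: save into the article DB
        counts.append(0)
    for art_no in range(first, len(articles)):
        index_one(articles, groups, counts, art_no, subscribed)

def create_message(articles, groups, counts, art, subscribed):
    """Creating: both stages run immediately, back to back."""
    articles.append(art)
    counts.append(0)
    index_one(articles, groups, counts, len(articles) - 1, subscribed)

articles, groups, counts = [], {}, []
subscribed = {"alt.test", "!!Outgoing"}
poll_batch(articles, groups, counts,
           [{"newsgroups": ["alt.test"]}, {"newsgroups": ["alt.test", "misc.test"]}],
           subscribed)
create_message(articles, groups, counts, {"newsgroups": ["!!Outgoing"]}, subscribed)
print(groups)  # {'alt.test': [0, 1], '!!Outgoing': [2]}
```

Note that `misc.test` is not subscribed, so the crossposted second article is referenced only once.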

DB Synchronicity

These 2 (sets of) databases should remain in sync at all times, but there is always the potential for them to fall out of sync. There are 3 types of asynchronicity: unindexed messages, unsubscribed articles, and bogus message references, and there are several tools to handle them. The best and most time-consuming is "Re/Index-All", which destroys all newsgroup databases and recreates them by reindexing every single article.
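Conceptually, "Re/Index-All" throws the newsgroup databases away and rebuilds them purely from the article database. A minimal Python sketch, assuming articles are dicts with a parsed `newsgroups` list; `reindex_all` is a hypothetical name, and threading and the "!!!!Lost" fallback are deliberately left out.

```python
def reindex_all(articles, subscribed_order):
    """Discard every NG DB and rebuild them all by re-indexing each article.

    articles: list of dicts with a parsed "newsgroups" list.
    subscribed_order: subscribed newsgroup names.
    Returns (groups, counts): NG name -> article numbers, plus the rebuilt
    multi-index counter for each article.
    """
    groups = {name: [] for name in subscribed_order}  # fresh, empty NG DBs
    counts = [0] * len(articles)                       # counters start over
    for art_no, art in enumerate(articles):            # every single article
        for ng in art["newsgroups"]:
            if ng in groups:
                groups[ng].append(art_no)
                counts[art_no] += 1
    return groups, counts

arts = [{"newsgroups": ["alt.test"]},
        {"newsgroups": ["alt.test", "misc.test"]},
        {"newsgroups": ["comp.gone"]}]
groups, counts = reindex_all(arts, ["misc.test", "alt.test"])
print(groups)  # {'misc.test': [1], 'alt.test': [0, 1]}
print(counts)  # [1, 2, 0]
```

Because nothing from the old NG databases survives, both bogus references and unindexed messages disappear; the cost is one pass over every stored article, which is why the operation is so slow on a PDA.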

One additional benefit of "Re/Index-All" is that it reassigns the numbers at the end of the "NewsGroup-<#>" DB names so that they match the order in which the newsgroups appear on the first, main screen (the first one in the list gets #1, and so on). Initially, numbers are assigned in the order the newsgroups were subscribed (e.g. !Drafts got #1 and !!Outgoing got #2 because they are created automatically, the first user-subscribed newsgroup got #3, and so on). After a "Re/Index-All", the newsgroup list maps directly to the "NewsGroup-<#>" DBs. If the order is later changed (with "Rearrange NGs"), the NewsGroup-<#> numbers do not change, so they will once again not map directly.
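The numbering rule can be sketched in a few lines of Python. The "next unused number" rule in `subscribe` is an assumption made for illustration (the text only says numbers are assigned in subscription order), and both function names are hypothetical.

```python
def subscribe(db_numbers, name):
    """Hypothetical rule: a new subscription takes the next unused number."""
    db_numbers[name] = max(db_numbers.values(), default=0) + 1

def renumber_by_display_order(display_order):
    """What Re/Index-All does: #1 for the first NG on the main screen, and so on."""
    return {name: i for i, name in enumerate(display_order, start=1)}

numbers = {}
for name in ["!Drafts", "!!Outgoing", "alt.test"]:
    subscribe(numbers, name)
print(numbers)  # {'!Drafts': 1, '!!Outgoing': 2, 'alt.test': 3}

# "Rearrange NGs" changes only the display order, not the DB numbers...
display = ["alt.test", "!Drafts", "!!Outgoing"]
# ...until the next Re/Index-All brings them back in line.
numbers = renumber_by_display_order(display)
print(numbers)  # {'alt.test': 1, '!Drafts': 2, '!!Outgoing': 3}
```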

A "Re/Index-All" is a very long operation, and often a shorter, quicker operation will suffice if you are certain which problems do and do not exist.

Unindexed Messages

An unindexed message is one which exists in the article database but is not referenced by any newsgroup database. The "Set Next Re/Index" option looks for a group of messages in the article database which are not referenced by any newsgroup database. It does this by scanning all newsgroup databases while keeping track of the highest message number referenced. If this number is less than the number of messages in the article database, then the messages above that number are, obviously, not referenced. Once "Set Next Re/Index" has run, a "Re/Index-Unindexed" operation can be started to index the unindexed messages into their newsgroups. The "search" button on the "Re/Index" dialog combines these 2 steps into 1: if unindexed messages are found, an automatic "Re/Index-Unindexed" is performed; if not, it does nothing.
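The scan can be sketched in Python as follows, assuming article numbers run from 1 up to the number of stored messages and each NG DB is a list of those numbers. The names `find_unindexed` and `search_button` are hypothetical, and `reindex_unindexed` stands in for whatever re-indexing routine the caller supplies.

```python
def find_unindexed(group_dbs, article_count):
    """Sketch of "Set Next Re/Index".

    Article numbers are assumed to run from 1 to article_count. Every NG DB
    is scanned for the highest article number it references; anything above
    that is referenced by no NG at all. Unreferenced articles below the
    highest reference are not detected by this check.
    """
    highest = 0
    for refs in group_dbs.values():
        for art_no in refs:
            highest = max(highest, art_no)
    return range(highest + 1, article_count + 1)

def search_button(group_dbs, article_count, reindex_unindexed):
    """Find unindexed articles and re-index them only if there are any."""
    todo = find_unindexed(group_dbs, article_count)
    if len(todo) > 0:
        reindex_unindexed(todo)
    return len(todo)

dbs = {"alt.test": [1, 3], "!Drafts": [2]}
print(list(find_unindexed(dbs, 5)))  # [4, 5]
```

The check is cheap because it only needs the maximum reference, but that is also its limit: it finds a block of unindexed messages at the end of the article database, which is exactly what an interrupted poll-then-index cycle leaves behind.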

Unsubscribed Articles

An unsubscribed article is one which somehow exists in the article database but whose "Newsgroups:" header does not contain any subscribed newsgroup. In New Yanoff, such an article is indexed into the "!!!!Lost" newsgroup. This special newsgroup is auto-subscribed when it is needed (if it does not already exist) and auto-unsubscribed when all articles inside it have been deleted. In GPL Yanoff, such an article never appears and is inaccessible, with 1 exception: it is still purged appropriately by the "Purge Old Articles" function. This situation can be created with the following steps:

Subscribe to a newsgroup and download some articles.
Abort the index operation (for GPL Yanoff, this can only be done by resetting the device).
Unsubscribe the newsgroup and continue indexing (for GPL Yanoff, this means a "Re/Index-All").
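New Yanoff's fallback for such articles can be sketched in Python as follows. It assumes each NG DB is a list of article numbers keyed by newsgroup name and that the "Newsgroups:" header has already been split into names; `index_article` and `delete_from_lost` are hypothetical helpers.

```python
LOST = "!!!!Lost"

def index_article(groups, art_no, newsgroups_header):
    """Index one article; fall back to !!!!Lost if no subscribed NG matches.

    groups maps each subscribed newsgroup name to its list of article numbers.
    """
    targets = [ng for ng in newsgroups_header if ng in groups and ng != LOST]
    if not targets:
        groups.setdefault(LOST, [])  # auto-subscribe !!!!Lost when needed
        targets = [LOST]
    for ng in targets:
        groups[ng].append(art_no)
    return targets

def delete_from_lost(groups, art_no):
    """Delete an article from !!!!Lost; auto-unsubscribe once it is empty."""
    groups[LOST].remove(art_no)
    if not groups[LOST]:
        del groups[LOST]

groups = {"alt.test": []}
print(index_article(groups, 0, ["alt.test", "misc.test"]))  # ['alt.test']
print(index_article(groups, 1, ["comp.gone"]))              # ['!!!!Lost']
delete_from_lost(groups, 1)
print(groups)  # {'alt.test': [0]}
```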

Bogus Message References

A bogus message reference is a reference in a newsgroup database to a message which does not exist in the article database. The "Fix NG Corruption" operation looks for and eliminates such references. If a newsgroup is found to contain any bogus references and it uses thread caching, this operation deletes the thread cache and flags it to be recreated automatically. If the re-cache operation is canceled, the newsgroup's caching preference is turned off. This is necessary because we have no way to know which thread the missing message belonged to (we can't check its Subject, its Message-ID, or its References because it no longer exists), so the thread cache is also corrupt. A thread cache must be 100% accurate or it is useless and will cause threading mistakes.
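The clean-up can be sketched in Python, assuming each NG DB is a list of article IDs and that the set of IDs still present in the article database is known. `fix_ng_corruption` and its parameters are hypothetical names; the sketch only reports which caches must be rebuilt and leaves the rebuild (and the "caching off if cancelled" rule) to the caller.

```python
def fix_ng_corruption(groups, existing, thread_cached):
    """Drop references to articles that no longer exist.

    groups: NG name -> list of article IDs it references.
    existing: set of article IDs actually present in the article DB.
    thread_cached: set of NG names that use thread caching.
    Returns the NGs whose thread cache must be deleted and rebuilt; the
    caller would turn caching off for any NG whose rebuild is cancelled.
    """
    needs_recache = set()
    for name, refs in list(groups.items()):
        valid = [ref for ref in refs if ref in existing]
        if len(valid) != len(refs):  # found at least one bogus reference
            groups[name] = valid
            if name in thread_cached:  # cache can no longer be trusted
                needs_recache.add(name)
    return needs_recache

groups = {"alt.test": [1, 2, 9], "misc.test": [1]}
print(fix_ng_corruption(groups, {1, 2}, {"alt.test", "misc.test"}))  # {'alt.test'}
print(groups)  # {'alt.test': [1, 2], 'misc.test': [1]}
```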

Indexing and Thread Caching

So what is thread caching? A better question to start with is: "What is a thread?" A thread is a time-sequenced conversation whose messages are presented in genealogical order, and threading is the operation of ordering messages into such a sequence. Every article has a unique "Message-ID". Furthermore, if it is a followup to an earlier message, it should also (but many times does not) have a "References" header which, at a minimum, should identify its root ancestor (the first message in the thread) and also its parent. Armed with this data (assuming it is correct, which is not always the case), we should be able to order the messages in a thread so that responses are indexed after the messages they respond to.
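Recovering that genealogy from "References" can be sketched in Python as below. This is a generic illustration, not Yanoff's own algorithm: it assumes the header has already been split into Message-IDs (oldest ancestor first), that each message carries a sortable date for ordering siblings, and that the references contain no cycles. `thread_order` is a hypothetical name.

```python
from collections import defaultdict

def thread_order(messages):
    """Return Message-IDs ordered so each reply follows its ancestors.

    messages: dicts with "mid", "refs" (References split into a list,
    oldest first) and "date". A message whose parent is missing attaches to
    the nearest ancestor that is present; with none present it starts a thread.
    """
    known = {m["mid"] for m in messages}
    children = defaultdict(list)
    roots = []
    for m in sorted(messages, key=lambda m: m["date"]):
        parent = next((r for r in reversed(m["refs"])
                       if r in known and r != m["mid"]), None)
        if parent is None:
            roots.append(m["mid"])
        else:
            children[parent].append(m["mid"])

    order = []
    def walk(mid):  # depth-first: a message, then its replies
        order.append(mid)
        for child in children[mid]:
            walk(child)
    for root in roots:
        walk(root)
    return order

msgs = [
    {"mid": "<a>", "refs": [], "date": 1},
    {"mid": "<c>", "refs": ["<a>", "<b>"], "date": 3},
    {"mid": "<b>", "refs": ["<a>"], "date": 2},
    {"mid": "<d>", "refs": [], "date": 4},
    {"mid": "<e>", "refs": ["<a>"], "date": 5},
]
print(thread_order(msgs))  # ['<a>', '<b>', '<c>', '<e>', '<d>']
```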

Obviously, threading requires that every message already in a newsgroup be examined to see if it is related to any newly arriving (being-indexed) message. A message that starts a "new" thread (one with no currently indexed messages) must be checked against every currently indexed message before we can determine that it is not related to any of them. This is the most costly case because all messages must be examined. Even if the new message turns out to be a member of a pre-existing thread, we still don't know where that thread begins, so every message must be checked until we find the first relative. The more messages already indexed in a newsgroup, the longer these examinations take, and very quickly this becomes intolerably slow.
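A rough count shows why. Assuming the worst case described above, where every newly indexed message starts a new thread and must be compared with every message already in the newsgroup, indexing n messages costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons, so doubling the size of a newsgroup roughly quadruples the work. This is a back-of-the-envelope model, not a measurement of Yanoff.

```python
def uncached_comparisons(n):
    """Worst case without a thread cache: each of n messages starts a new
    thread, so the k-th one must be compared with all k already indexed."""
    return sum(range(n))  # 0 + 1 + ... + (n - 1) == n * (n - 1) // 2

for n in (250, 500, 1000, 2000):
    print(n, uncached_comparisons(n))
# 250 31125
# 500 124750
# 1000 499500
# 2000 1999000
```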

This is where thread caching saves the day. For a relatively small cost in RAM, Yanoff+ (not Yanoff- nor GPL Yanoff) maintains a sorted list of the threads which currently exist in the newsgroup, along with the position of the bottom-most message and the total number of messages in each thread. When a message is indexed, a single search into the sorted thread cache tells us not only whether its thread exists (if not, just add the message at the bottom and add a new thread to the cache) but, if so, also where the thread begins (or rather, where it ends) and how long it is. Thread caching turns a threading operation whose total cost grows roughly with the square of the number of indexed messages into a nearly linear one.
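A thread cache of the kind described can be sketched in Python with the standard `bisect` module. This is an illustration under stated assumptions, not Yanoff+'s data layout: the `ThreadCache` class is hypothetical, and the thread "key" is whatever identifies a thread (for example a normalized Subject or a root Message-ID).

```python
import bisect

class ThreadCache:
    """Hypothetical per-newsgroup thread cache.

    Thread keys are kept sorted so a lookup is a binary search. For each
    thread it records the position of its bottom-most message and its size.
    """

    def __init__(self):
        self.keys = []      # sorted thread keys
        self.bottom = {}    # key -> position of the thread's last message
        self.count = {}     # key -> number of messages in the thread
        self.messages = []  # the newsgroup's index, in threaded order

    def index(self, key, mid):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            pos = self.bottom[key] + 1        # right after the thread's end
            self.messages.insert(pos, mid)
            for k, b in self.bottom.items():  # later threads shift down by one
                if b >= pos:
                    self.bottom[k] = b + 1
            self.bottom[key] = pos
            self.count[key] += 1
        else:                                 # new thread: append at the bottom
            self.keys.insert(i, key)
            self.messages.append(mid)
            self.bottom[key] = len(self.messages) - 1
            self.count[key] = 1

cache = ThreadCache()
cache.index("palm v battery", "a1")
cache.index("conduit setup", "b1")
cache.index("palm v battery", "a2")
print(cache.messages)  # ['a1', 'a2', 'b1']
```

The point of the design is that no stored message header has to be read to place a new message; adjusting a few integer positions held in RAM is cheap compared with examining every indexed article.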

There are 2 types of thread caching: Subject and Reference. The real question is, "What constitutes a thread?" This is an issue because many users and programs do not properly set (or even deliberately discard or corrupt) the "References" header. This carelessly anti-social behavior is shockingly common. In that case, how can one tell whether a message is part of a thread? One must compare the "Subject" headers. The best threading results when both types of caching are used, but Subject-only caching is usually quite acceptably accurate (Reference-only caching will probably never do as good a job, but it is an option regardless).
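The two kinds of thread key can be sketched in Python as follows. Stripping leading "Re:" prefixes and ignoring case is an assumption about how subjects would be normalized, not a description of Yanoff's exact rule; `subject_key` and `reference_key` are hypothetical names.

```python
import re

_RE_PREFIX = re.compile(r"^\s*(re\s*:\s*)+", re.IGNORECASE)

def subject_key(subject):
    """Thread key from the Subject: drop leading "Re:" prefixes, fold case."""
    return _RE_PREFIX.sub("", subject).strip().casefold()

def reference_key(message_id, references):
    """Thread key from References: the root ancestor, else the message itself."""
    return references[0] if references else message_id

# A reply whose newsreader discarded the References header:
original = ("<a@example.invalid>", [], "Palm V battery")
reply = ("<b@example.invalid>", [], "Re: Palm V battery")
print(reference_key(*original[:2]) == reference_key(*reply[:2]))  # False
print(subject_key(original[2]) == subject_key(reply[2]))          # True
```

The example shows why Subject caching rescues threads that Reference caching loses; the trade-off is that unrelated posts which happen to share a subject can be grouped together.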

Crossposting and the History DB

Now that we've broached the topic of threading and Message-IDs, let's talk about crossposting. Crossposting is when an article's "Newsgroups:" header contains more than 1 newsgroup. If a user is subscribed to 10 newsgroups and a single article lists all 10 of them in its "Newsgroups:" header, it is silly to download (wasting time) or store (wasting RAM) 10 copies of that same article. Instead, before an article is downloaded, its Message-ID is checked against an alphabetized list of previously downloaded articles stored in another database (NewsHistory.pdb); the "Check MIDs" Poll Preference controls whether this is done. If the article is not found, it is downloaded and its Message-ID is added to the database; the "Store MIDs" Poll Preference controls whether this is done. Each user must decide which wastes more RAM: duplicated articles or the history database. If RAM is not a concern, do use the history facility, as it saves time during both polling and threading.
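The check-then-store logic can be sketched in Python with the standard `bisect` module keeping the list alphabetized. The `History` class stands in for NewsHistory.pdb, and the `check_mids` and `store_mids` parameters merely mirror the two Poll Preferences; all of these names are hypothetical, and `download` is a caller-supplied function.

```python
import bisect

class History:
    """Hypothetical stand-in for NewsHistory.pdb: an alphabetized list of
    Message-IDs plus the day each one was stored."""

    def __init__(self):
        self.mids = []   # kept sorted so lookups are binary searches
        self.dates = {}  # Message-ID -> day it was stored

    def seen(self, mid):
        i = bisect.bisect_left(self.mids, mid)
        return i < len(self.mids) and self.mids[i] == mid

    def store(self, mid, day):
        if not self.seen(mid):
            bisect.insort(self.mids, mid)
            self.dates[mid] = day

def poll(mids_on_server, history, download, day, check_mids=True, store_mids=True):
    """Download each article unless the history says we already have it."""
    fetched = []
    for mid in mids_on_server:
        if check_mids and history.seen(mid):
            continue  # crosspost already downloaded: skip it
        download(mid)
        fetched.append(mid)
        if store_mids:
            history.store(mid, day)
    return fetched

history, downloaded = History(), []
print(poll(["<b@x>", "<a@x>"], history, downloaded.append, day=1))  # ['<b@x>', '<a@x>']
# The next newsgroup lists the same crossposted article: it is skipped.
print(poll(["<b@x>", "<c@x>"], history, downloaded.append, day=1))  # ['<c@x>']
```

With `check_mids=False` every listed article is downloaded again, which is exactly the duplication the preference exists to prevent.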

Obviously, the history database will grow ever larger unless old entries are purged. The "Purge MID" operation does exactly this and MUST be performed regularly to keep RAM waste to a minimum. Turning off both poll preferences mentioned above completely disables this functionality, which eliminates the history database but allows the article database to bloat (if crossposting is common in the subscribed newsgroups).

An interesting feature of purging is that it also purges any entries which (will) "happen" in the future. So if you want to remove some entries from the history database so that you can go back and repoll recently polled articles, just set the date on the PDA back to the appropriate day and purge. All entries with birthdays later than "today" will be purged in addition to those older than the specified timeframe. A related function is "Repopulate NewsHistory": if the entire MID history DB somehow gets purged or deleted, it can partially recreate it by taking every currently existing article and inserting its MID into the history DB.
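Both operations can be sketched in Python, assuming the history is a dict from Message-ID to the day number on which it was stored. `purge_history` and `repopulate_history` are hypothetical names, and dating repopulated entries "today" is an assumption of this sketch.

```python
def purge_history(history, today, keep_days):
    """Remove entries older than keep_days days and any dated after today.

    history maps Message-ID -> day number on which it was stored.
    Returns the number of entries removed.
    """
    doomed = [mid for mid, day in history.items()
              if day < today - keep_days or day > today]
    for mid in doomed:
        del history[mid]
    return len(doomed)

def repopulate_history(history, articles, today):
    """Re-add the Message-ID of every article still present (dated today)."""
    for art in articles:
        history.setdefault(art["mid"], today)

history = {"<old>": 1, "<recent>": 18, "<future>": 25}
print(purge_history(history, today=20, keep_days=7))  # 2
print(history)  # {'<recent>': 18}

# Setting the clock back to day 15 forgets everything stored after it.
recent = {"<a>": 14, "<b>": 16, "<c>": 17}
print(purge_history(recent, today=15, keep_days=30))  # 2
```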

Note that for conduit users, NONE of this threading/history discussion applies, because those articles arrive on the PDA pre-threaded by the software on the host computer. Also note that the host computer does a nearly perfect job of threading (since it has, essentially, unlimited battery and CPU power), whereas threading by Yanoff on the PDA takes several reasonable shortcuts that improve speed but also allow some errors. Yanoff's threading algorithm is very much a work in progress. Whenever any process other than New Yanoff (e.g. the conduit) updates the articles, the thread caches become invalid and are automatically deleted.


Go back to the SonLight Software HomePage.