Summary


Enter and verify results for an event.


Contents


	Summary
	Contents
	Introduction
	Quick start
	Getting started
	Selecting emails
	Selecting text from emails
	A Trivial League
	A Real League - Southampton League
	A Real League - Portsmouth District League
	A Tabular Reported League - Portsmouth District League
	Monthly Rating


Introduction


This program extracts results from documents in emails sent by sources.

The menu bar has four items: Sources, Documents, Tools, and Help.

Sources and Documents provide access to the main functions of the program in the order they are used when dealing with an event.

Sources is used to specify selection rules for emails and extract emails from an email client's mailbox or mailboxes.

Documents is used to extract data relevant to chess game results from emails, validate the data, save any changes made to remove errors or resolve incorrect interpretation of data, and update a database to be processed using the Results item.

Tools provides access to an editor for the URLs to access the ECF database.  Also a font selector which is not implemented.

Help provides access to this file and other files describing the program.

The program is designed to extract results from emails, but does allow direct input by 'copy and paste' or even typing.


Quick start


If you just want to press buttons and see what happens.

Use sample_league_1.txt rather than the files used in Getting started: there are eight players to deal with, not around fifty.  Add the line:

competition=Division 1

to the results extraction configuration file that gets created.

Right-clicking the pointer will show a popup menu if relevant.

The most important actions are:

Email selection in the Sources menu
Results extraction in the Sources menu
Open in the Documents menu
Generate on the Edit tab
Save on the Edit tab


Getting started


This section describes using the program with realistic, but contrived, results in the 'Help | Samples' menu item.

Create a new event directory using 'Sources | Result extraction'.

The directory name must end with two digits conventionally indicating the year in which the event starts.  'Anytown12' matches the sample used first in this description.

A new window is opened showing the default configuration for the event.

Add the competition names within the event using 'Actions | Option editor', and 'Update' in the Text editor dialogue.

In this case, referring to sample_anytown12.txt, that is two lines:

competition=Open
competition=Major

Save the configuration by 'File | Save'.

That is all the configuring for this event so close the window by 'File | Quit'.

Open the event documents using 'Documents | Open' on the Anytown12 directory.

Several informative messages will appear saying what has been created: if the thing exists the message does not appear.  After accepting the messages increase the size of the application's window to see the three panes within the window.

Cut and Paste sample_anytown12.txt into the area between the highlighted lines.

The two dates after the event name, 'Anytown Chess Congress', are the start, '12 may 2012', and end, '14 may 2012', dates of the event.  Game dates given later must be on or between these dates.  (When results are extracted from emails you will have to type the event name and start and end dates into this area.)

The dates on the lines starting 'Open' and 'Major' are the dates in round order of the games in the six rounds of the two swiss system tournaments.

The other lines are copy-typed from the pairing cards.

ECF grading codes are notable by their absence, even though they are usually on the pairing cards.

Validate the results by 'Generate'.

A register of player identities appears in the top-right pane and a list of game results appears in the bottom-right pane.  There are no errors in either pane so saving a tabular version of the source document is allowed: note this does not imply the results are correct, just that they are valid.

The player identities are sorted by name.  For example 'Alf' is a name but the player identity is 'Alf Open 1 in the Anytown Chess Congress from 12 may 2012 to 14 may 2012'.

It is possible to arrange things so that the grading code appears within the brackets reserved for the club or place assocation hint.

Save the event using 'Save' if you have done editing to the results in the pane in the left-hand column: otherwise you will have to do them again next time you open the event documents.

The only way results can be modified, added. or deleted is by editing the data in the pane in the left-hand column.

Use 'Toggle' to see the data extracted from emails and the data after all edits ever done side-by-side.  The former is an empty pane in this example.

Quit using 'Sources | Quit'.


Selecting emails


The rules for selecting emails are like:

# pdl.ems email selection rules
mailboxstyle mbox
mboxmailstore ~/SelectMbox12/PDLFix2012-2013.mbs
mboxmailstore ~/SelectMbox12/PDL2013-06-04supplement.mbs
earliestfromdate 2012-09-01
mostrecentfromdate 2013-06-30
emailsfrom no.body@invented.isp.name
emailsfrom a.n.other@invented.isp.name
outputdirectory ~/SelectMbox12/mailbox
exclude 20130423256364+0130no.body@invented.isp.name

It says:

Get all emails sent by no.body@invented.isp.name or a.n.other@invented.isp.name between 2012-09-01 and 2013-06-30 inclusive, excluding any named in exclude lines, from the mbox style mailboxes ~/SelectMbox12/PDL2013-06-04supplement.mbs and ~/SelectMbox12/PDLFix2012-2013.mbs.  Put them in the directory named ~/SelectMbox12/mailbox, one email per file.  The file names are like the one in the exclude line, concatenating the date and time sent and the sender's email address.

The rules above are in sample_emails_mbox.txt

Email clients should be able to export emails in mbox format even if they do not use the format otherwise.

Manage the selection rules as follows:

Open the event's email selection configuration file using 'Sources | Email selection' and 'File | Open'.  The default name is collected.conf.

Display the selected emails using 'Actions | Show selection'.

A summary of the selected emails is displayed in the bottom-left pane and the full email text is displayed in the right-hand pane.

Right-click in the bottom-left pane will scroll the right-hand pane to show that email in full.

Right-click in the right-hand pane will add an exclude line in the top-left pane.

Right-click in the top-left pane will remove an exclude line.

'Actions | Option editor' will display a window for editing the file by typing.

'Actions | Apply selection' will update the output directory with the selected emails.

Note that 'Apply selection' will not change any files already in the output directory, nor will it allow emails dated between the oldest and newest ones already there to be added to the output directory.

Day-to-day management of emails using email client filters may place emails in a way that upsets 'Apply selection'.  The exclude line is useful to take no notice of the addition of emails to a mailbox.


Selecting text from emails


The results extraction configuration file has two types of rules:

to select emails and attachments from which text will be extracted
to extract text from the selected emails and attachments

The extract text rules will give competition names, and possibly some team name translations and, or, sets of regular expressions to extract text.  All these are ignored in this section.

The Trivial League section shows a reporting style which does not need regular expressions.  Things are a lot easier when regular expressions are not needed in the configuration file. (Repeat the 'lot' qualification as many times as you like.)

The select email rules are like:

# pdl conf file
collected mailbox
extracted extracts
textentry textentry
text_content_type text/plain
earliestdate 2012-09-01
mostrecentdate 2013-06-30

# competition names removed

# team name translation removed

# match results regular expressions removed

# fixture list regular expressions removed

ignore 20121030256364+0130no.body@invented.isp.name

It says:

Extract text from all files in directory mailbox whose email was sent between 2012-09-01 and 2013-06-30 inclusive, picking the parts of each email with content type text/plain.  Put the extracted text in files with the same name in directory named extracts.  Pasted or typed text that is not associated with any particular email can be put in a file named textentry.  Files names in ignore lines will not be put in the directory named extracts.

The rules above are derived from sample_league_ports.txt

Manage the text selection rules as follows:

Open the event's results extraction configuration file using 'Sources | Results extraction'.  The default name is event.conf.

Display the selected emails using 'Actions | Source emails' or 'Actions | Extracted text'.

A summary of the selected emails is displayed in the bottom-left pane and the full email text, or the extracted text, is displayed in the right-hand pane.

Right-click in the bottom-left pane will scroll the right-hand pane to show that email, or it's extracted text, in full.

Right-click in the right-hand pane will add an ignore line in the top-left pane.

Right-click in the top-left pane will remove an ignore line.

'Actions | Option editor' will display a window for editing the file by typing.

'Actions | Update' will update the extracts directory with the text extracted from the selected emails.

Note that 'Update' will not add any files to, or change files already in, the extracts directory if the requested update implies changing files already in the extracts directory.  The assessment of change ignores any editing done to the extracted text since it was put in the extracts directory.  In other words, the test is one of consistency between the email managed by the email client and the text in the extracts directory.


A Trivial League


Create the Anytown League for 2011-2012 using 'Documents | Open' and paste the contents of sample_league_1.txt into the textentry area.

Create the results extraction configuration file for this event using 'Sources | Result extraction', and add the line:

competition Division 1

to the configuration file using 'Actions | Option editor'.

Save the configuration file and quit using 'File | Quit'.

Validate the data using 'Generate'.

It should look perfect.

Try some, preferably all, the other sample_league_n.txt files, to see what does and does not go wrong with some variations related to valid reports when some or all of the games have not been completed.

Note in particular the cases where 'Frank Colin Jones unfinished' is got right (sample_league_4.txt) and wrong (sample_league_3.txt).

Change the line in sample_league_3.txt to 'Frank unfinished Colin Jones' and see it is got right now.

The program knows 'Frank Colin Jones unfinished' should contain two names but in sample_league_3.txt there is insufficient context to decide what the names are.  So the program guesses.  It gets two common possibilities right but, in general, it will guess wrong.

Back to sample_league_1.txt.

Add explicit board numbers to boards 2 and 3 in 'Yourstreet 2 Ourstreet 2', and to boards 1, 3, and 4 in 'Ourstreet 3 1 Yourstreet'.  For example:

Zoe 0.5 George

becomes

4 Zoe 0.5 George

and so forth.

Validate the data using 'Generate'.  It should look perfect.

Adding explicit board numbers to any of the other games makes the validation response look a real mess.

The 0.5 in '4 Zoe 0.5 George' always means drawn game so the 4 must be a board number.

The '½' in '4 George Zoe ½' is interpreted as part of a possibly mis-typed match result, rather than a game result on board 4 of a match.

Imagine the game result interpretation with the mis-type in the 'Ourstreet 3 1 Yourstreet' match result.  Then all the games look like part of the 'Yourstreet 2 Ourstreet 2' match.  Remove the explicit board numbers which made the mess: now all validation says is 'match score inconsistent' suggesting the problem is less severe than it actually is.


A Real League - Southampton League


The match results are now reported match-by-match in emails generated by the league's website but each report is very similar in structure to that used in the weekly articles.  The configurations for both versions of weekly emails remain in the samples_* files but are commented out as needed.

Southampton League match results are reported for grading using the weekly articles published in the Evening Echo.

The reports have a well-defined structure which is handled using the regular expressions included in the result extraction configuration file copied to sample_league_soton.txt.  The dates are for the 2011-2012 season which is one of the seasons used to test them.  (The regular expressions replaced Python code in use for eleven years since 2003.)

The fixture list was sent in an Excel spreadsheet attachment so the line:

ss_content_type application/vnd.ms-excel

was added.

The results and fixtures are tied together by the sets of lines like:

competition Div 1
section_name :division 1:Div 1
replace :1:Div 1

for each competition.

At time of writing, September 2014, the articles for the 2013-2014 season are available at:

www.sotonchessleague.org.uk/content/detailed-league-reasults-2013-14

but location and format are liable to change at any time.

The main problem is typing errors which transpose, add, or remove characters vital to the structure such as full-stops and commas.

The regular expressions deal with the web page content as successfully as they deal with the content of the emails used to send each article.  But locating the errors in the extracted text is a lot, lot, easier with the emails because each email gets it's own area delimited by the highlighted lines: right-click on the right-hand panes will scroll the left-hand pane to show the start of the source email if a unique email is associated with the pointer location.  You may care to paste the web page into the left-hand pane and try it a little.


A Real League - Portsmouth District League


The match results are now reported match-by-match in emails generated by the league's website but each report is very similar in structure to that used in the emails for the Southampton League (above).  The configurations for both versions of weekly emails remain in the samples_* files but are commented out as needed.

Portsmouth District League match results are reported for grading weekly using a *.txt file which was introduced a few years ago to assist populating the Hampshire Chess Association website with the league's results.  I happily took this feed for grading to avoid doing a Microsoft Word to *.txt conversion manually on the half-season reports used previously.

The reports have a well-defined structure which is handled using the regular expressions included in the result extraction configuration file copied to sample_league_ports.txt.  The dates are for the 2012-2013 season which is the season used to test them.  (The regular expressions replaced Python code in use for five years since 2009.)

The results and fixtures are tied together by the sets of lines like:

competition Div 1
section_name @div: 1@Div 1
section_name @division 1@Div 1

for each competition.

These source documents are not publically available.  Sample match reports and the fixture list, as they appear in the left-hand pane, are copied to sample_report_ports_match.txt and sample_report_ports_fix.txt with permission but I have changed all people's names anyway.

The match report is basically unreadable by eye once it has been through all the software but the structure has been preserved and the program has no problem with it.  Blame 'universal newlines' for transposing one row per game to twelve rows per game.

Typing errors which affect the structure are very rare indeed so it's horrible look does not matter.

You may care to paste the samples into the left-hand pane and try it.  You will need to have selected two emails, whose content will be replaced by the sample fixture list and sample match report: pasting it all into the textentry area will not work.

At time of writing, September 2014, the Division One results for the 2013-2014 season are available at:

www.bognorandarunchessclub.co.uk/division-1.html

but location and format are liable to change at any time.

You may care to paste the web page into the left-hand pane and try it.  The textentry area is fine for this source.  But you will have to edit the game results from:

4.A.N.Other 0-1  N.O.Body

to:

A.N.Other 0-1  N.O.Body

to get the validation to succeed.  Remove all the board numbers in other words.

At time of writing, September 2014, results for earlier seasons, are available at:

www.hampshirechess.co.uk

but location and format are liable to change at any time.

You may care to paste a web page of results into the left-hand pane and try it.  The textentry area is fine for this source.  The page of 2009-2010 results for Division One looks like it needs less editing than most:

Make sure all the board one game results are on the line following their match result.

Remove all the grades, including the 'u's meaning ungraded.

Remove the '(' and ')' characters surrounding the match dates.

Validition is successful but there is a team called 'Fareham A 22nd Jan' for example.  The date format is not recognised and the program has taken '2010' to be a round number.  Remove the 'nd' and the date gets recognised and is used as the match date.


A Tabular Reported League - Portsmouth District League


Support for reporting results and fixtures in tabular form was introduced when the Portsmouth District League and Southampton League websites gained the ability to generate csv downloads.

The feature has never been used for these leagues because error correction is more convenient with the match-by-match emails: it is easier to link and locate a detected error with the source emails than in one csv file.

The sample_tabular_* files are an example, visible in 'Help | Samples', based on a Portsmouth District League season.  The configurations for both versions of weekly emails remain in these files but are commented out as needed.

The feature may never be used for these two leagues because both websites are able to include ECF codes.  The switch to monthly rating, from grading every six months, is a strong incentive to have the websites generate the ECF submission files directly.


Monthly Rating


Assumptions may be invalidated by the introduction of monthly rating and effects are mentioned here as met.

Swiss tournament tables may be submitted when only one round has been played but a line with <name> <date> cannot be taken as the round date of a swiss tournament.  This affects tournaments where a round may take place over several weeks.  If the swiss tournament table provides the game results after one round only, remember to add a currently redundant second round date to force swiss tournament table processing.
