Photo of Al Kugel (right) Receiving an Award 2001
Al supplied many of his exhibits and articles for posting on the MPHS website. Since his passing, the MPHS has started a project to publish scans of all available exhibits that he created.
For just a taste of what we hope to offer, see an example of the posting of one of Al's exhibits at:
The Expansion of Italy Following World War I 1918-1926.
The 'Kugel Room' main page is here.
After Al's passing, Dann Mayo visited Al's Illinois home and scanned thousands of exhibit pages in color, most with the original postal items still mounted on the pages. In addition, a large number of older exhibits had been photocopied (in black and white). Dann auto-scanned these into PDF format, and they are being published on the website as well.
Exhibits are unlike other published products that appear on web pages. In the U.S., the usual physical exhibit is made up of 8 1/2 x 11 inch sheets of paper, usually placed 16 at a time in a glass-fronted frame. Visitors to philatelic events are encouraged to walk around the frames, viewing the exhibit content.
To bring this experience to the viewer's computer, tablet, or phone, it is necessary to display each page's contents as a whole, one page at a time.
Historically, the vast majority of philatelic exhibits were hand-created by the author, with the actual philatelic items (covers, photos, illustrations, etc.) physically positioned on the page. Text was originally hand-written, later typewritten, and finally created with word processing software.
At this point in time, then, we must work with ONLY images of the physical pages. In many cases, after the pages have been scanned, the exhibit is "broken down" and the postal items are sold off.
Much experimentation has brought us to the point where exhibits are best displayed as:
Note that several potential display formats are NOT included above:
Ideally, of course, it would be great if we had individual scans of all the original postal items, with the accompanying text and flow supplied separately. In that case, it would be possible to "re-create" the exhibit as a series of web pages, as well as the rejected formats described above. Alas, the huge majority of exhibits simply don't exist in that format.
Only a rough outline will be provided at this time.
After some experimentation, it has proved very useful to shrink the bandwidth requirements of the embedded exhibit page images. Using JPG images with quality 50 worked well enough, but the bandwidth savings of the newer WEBP format are remarkable. It may take some experimentation to determine the quality setting that is best for a particular exhibit in the WEBP format. Size savings of 1/3 to 1/5 have been observed. At a certain point, lower WEBP quality causes color "blotching", particularly in the yellow colors that we associate with older postal history paper items.
ImageMagick is the "go-to" tool for converting the original JPG scan images to WEBP. Again, you should review the viewability of the images before finishing this conversion and quality reduction. These WEBP images can then be used in the web page version of the exhibit. The loading speed of WEBP is, again, quite remarkable.
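As a rough illustration of the conversion step, the sketch below builds one ImageMagick command per JPG scan and reports how much smaller each WEBP file is. It assumes ImageMagick 7 (the "magick" command; older installs use "convert" instead), and the function names and the default quality of 50 are placeholders for illustration only, not settings taken from the project.

```python
import subprocess
from pathlib import Path


def webp_command(src: Path, quality: int = 50) -> list[str]:
    """Build an ImageMagick command that converts one JPG scan to WEBP."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    dst = src.with_suffix(".webp")
    # Assumption: ImageMagick 7, whose entry point is "magick".
    return ["magick", str(src), "-quality", str(quality), str(dst)]


def convert_folder(folder: Path, quality: int = 50) -> None:
    """Convert every .jpg in a folder and print the size of each result."""
    for src in sorted(folder.glob("*.jpg")):
        subprocess.run(webp_command(src, quality), check=True)
        dst = src.with_suffix(".webp")
        ratio = dst.stat().st_size / src.stat().st_size
        print(f"{src.name}: WEBP is {ratio:.0%} of the JPG size")
```

Running convert_folder at two or three quality values and viewing the results side by side is one way to find the point where "blotching" starts for a given exhibit.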
It is imperative that we DO NOT upload any PDF (or other web content) that does not contain "metadata". If we store and allow download of "bare" files from the website, they can easily become "orphans". Search engines are much more sophisticated these days, and they may well be able to extract keywords and other search terms from PDF files. The difficulty is compounded by the fact that many of the PDF files from the Kugel exhibits are PDFs made of scanned images of exhibit pages. So the "web crawlers" have to extract content from PDF files, and then extract possible text from the images.
We can help search engines find our material, such as the Kugel exhibits, if we insert metadata into the PDF files before they are posted.
More CRITICAL is the necessity of identifying the author, title, etc. of PDF files that are downloaded from our website. It is easy to estimate that many millions of PDF files are present on the web that have NO identifying information whatsoever (and are, therefore, orphans). Users of such a stand-alone PDF file would not know even the most basic information, including such critical matters as copyright, ownership, etc.
Below is an example of the most minimal type of metadata that we MUST insert into PDF files, before they are uploaded to the website. There is a keyword at the front of each line to indicate the metadata item. The content of each field is on a single line of text.
Subject;This exhibit illustrates through contemporary postal material the decline and eventual collapse of the once-great Ottoman Empire in the first two decades of the 20th Century. It does this by showing a collection of postmarks used in former Ottoman territories that became independent or were annexed by other countries as a result of Turkey being on the losing side in three consecutive conflicts that occurred between 1911 and 1918. It should be viewed as a survey of examples of markings from as many different locations as feasible rather than trying to show all of the different types of markings from a limited number of places. It is organized both chronologically and geographically.
Author;Al Kugel
Keywords;ottoman,ottoman empire,20th century,postmarks,war,military postal history,postal history, military postal history society,MPHS,Turkey,collapse,collapse of empire,1911,1918
Title;Ottoman Forerunner Postmarks of the 20th Century
Creator;Military Postal History Society
While the syntax shown above will work with some products, the person preparing Kugel exhibit scans will most likely use the "pdftk" toolkit, since it is universal. [Bob Swanson has written some Java code (which will run on any computer) that uses a free product called PDFBox. That product allows deep setting of metadata within the PDF file.] The syntax used by PDFTK is very similar to the above.
As an illustration, the text below is the metadata setup for PDFTK, in this case used to mark up a PDF file of an auction listing. This text is placed in a file, and that filename is passed to PDFTK as input to the process of metadata annotation.
InfoBegin
InfoKey: Subject
InfoValue: Lot listing for Military Postal History Society Auction 218, closing 28 February 2021. For more information, see: https://militaryphs.org/auctions.html.
InfoBegin
InfoKey: Author
InfoValue: Kelly Horn, Military Postal History Society
InfoBegin
InfoKey: Keywords
InfoValue: war military MPHS military postal history society postalhistory auction lots forsale stamp postal history conflict
InfoBegin
InfoKey: Title
InfoValue: Lot Listing for MPHS Auction 218, Closing 28 February 2021
InfoBegin
InfoKey: Rights
InfoValue: Property of the Military Postal History Society; contents may be used for research, purchasing, and educational purposes only
InfoBegin
InfoKey: Creator
InfoValue: Military Postal History Society
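Because the "Keyword;content" layout and the PDFTK layout carry the same information, one can be generated from the other. The following is only a sketch of that idea: the function name to_pdftk_info is made up for this illustration, and it assumes one field per line with the keyword separated from the content by the first semicolon.

```python
def to_pdftk_info(simple_text: str) -> str:
    """Convert 'Keyword;content' lines into pdftk's InfoBegin/InfoKey/InfoValue layout."""
    out = []
    for line in simple_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the FIRST semicolon only, so content may contain semicolons.
        key, sep, value = line.partition(";")
        if not sep or not key.strip():
            raise ValueError(f"not a 'Keyword;content' line: {line!r}")
        out += ["InfoBegin", f"InfoKey: {key.strip()}", f"InfoValue: {value.strip()}"]
    return "\n".join(out) + "\n"
```

The returned text can be saved with a plain text editor or a short script and then handed to PDFTK as its metadata description file.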
One of the powerful features of pdftk is that it can read in a PDF file, insert metadata into that file, and write it back out into a new PDF file.
Given the text file described above (the first line is "InfoBegin") the following command line will insert that file's metadata description into the original "auction.pdf", yielding a new file "auction_with_metadata.pdf". The metadata description file in this example has been named "meta.txt". It was built with a simple text-only editor, and should contain the lines shown in the description above.
pdftk auction.pdf update_info meta.txt output auction_with_metadata.pdf
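Before uploading, it may be worth confirming that the metadata actually landed in the new file. pdftk's "dump_data" operation prints a PDF's metadata in the same InfoKey/InfoValue layout, so a short script can look for missing fields. This is a sketch only: the list of required keys is an assumption made for illustration, and the function names are hypothetical.

```python
import subprocess

# Assumption: the minimum set of fields the website wants in every PDF.
REQUIRED_KEYS = ("Title", "Author", "Subject", "Keywords")


def info_keys(dump_text: str) -> dict[str, str]:
    """Collect the InfoKey/InfoValue pairs from `pdftk file.pdf dump_data` output."""
    info, key = {}, None
    for line in dump_text.splitlines():
        if line.startswith("InfoKey:"):
            key = line[len("InfoKey:"):].strip()
        elif line.startswith("InfoValue:") and key is not None:
            info[key] = line[len("InfoValue:"):].strip()
            key = None
    return info


def missing_keys(dump_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    info = info_keys(dump_text)
    return [k for k in required if not info.get(k)]


def check_pdf(path: str) -> list[str]:
    """Run pdftk on a PDF file and list its missing metadata keys."""
    result = subprocess.run(
        ["pdftk", path, "dump_data"], capture_output=True, text=True, check=True
    )
    return missing_keys(result.stdout)
```

For example, check_pdf("auction_with_metadata.pdf") would return an empty list when all of the assumed required fields are present.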
Not only can the command-line version of pdftk be used to insert metadata, it can also be used to merge and/or re-order existing PDF files.
For instance, the following command line combines PDF files "one.pdf", "two.pdf", and "three.pdf", creating a new file "all.pdf" that contains the content of the three input files.
pdftk one.pdf two.pdf three.pdf cat output all.pdf
The input files can be placed in any order, if the user so wishes. The following command line combines the files "one.pdf", "two.pdf", and "three.pdf" as above, but creates a new file "all_different.pdf" in which the content of "three.pdf" comes first.
pdftk three.pdf two.pdf one.pdf cat output all_different.pdf
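When an exhibit arrives as many separate PDF files, typing the merge command by hand becomes error-prone. The sketch below builds the same "cat" command line from a list of file names, in whatever order the list is given. The function name is hypothetical, and the refusal to reuse an input name as the output name is simply a precaution added for this example.

```python
def cat_command(inputs: list[str], output: str) -> list[str]:
    """Build a pdftk command that concatenates the inputs, in list order."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    if output in inputs:
        # Precaution: never let the merged file overwrite one of its sources.
        raise ValueError("output must not be one of the input files")
    return ["pdftk", *inputs, "cat", "output", output]
```

The resulting list can be passed to subprocess.run(..., check=True) on a machine where pdftk is installed.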
There is a command-line product, "ocrmypdf", that will perform OCR on existing PDF files. The enhanced PDF file can then be supplied to the website as a "searchable PDF". The text content from the searchable PDF can be extracted with TIKA, yet another powerful tool used in processing exhibit files.
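The two steps described here can be sketched as a pair of command lines. This example assumes that the OCR product is "ocrmypdf" (which this page mentions by name) and that TIKA is run from its stand-alone "tika-app" jar. The jar file name, the "_searchable" suffix, and the function names are placeholders for illustration, not the project's actual settings.

```python
from pathlib import Path


def ocr_command(src: str, language: str = "eng") -> list[str]:
    """Build an ocrmypdf command that writes a searchable copy next to the input."""
    p = Path(src)
    dst = p.with_name(p.stem + "_searchable.pdf")  # hypothetical naming scheme
    # "-l" selects the OCR language pack.
    return ["ocrmypdf", "-l", language, str(p), str(dst)]


def tika_text_command(pdf: str, tika_jar: str = "tika-app.jar") -> list[str]:
    """Build a TIKA command that prints the plain text of a PDF to standard output."""
    # Assumption: tika_jar points at a downloaded tika-app jar file.
    return ["java", "-jar", tika_jar, "--text", pdf]
```

Each list can be run with subprocess.run, redirecting the TIKA output to a text file if the extracted text needs to be kept.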
When only page images are available, it is still easy to extract the text from them. The tool to use here is called "Tesseract". Again, it is command-line callable and available for a wide variety of computer systems. The extracted text for an exhibit is inserted into a web page and placed on our website for use by: (1) researchers, and (2) search engines. Since the text-only exhibit content is formatted and structured as a normal web page, it should qualify for inclusion in search engine results. Note that many search engines can read the text from images, as well as extract text directly from PDF files of exhibit page images. This is all done by OCR processing, in some cases by Tesseract itself. (Note that "ocrmypdf" uses Tesseract internally.)
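As a rough sketch of this workflow, the example below builds the Tesseract command for one page image and then wraps the extracted text of all pages in a very plain web page. The page layout (one heading and one preformatted block per exhibit page) and the function names are assumptions made for this illustration; the actual MPHS pages may be structured differently.

```python
import html
from pathlib import Path


def tesseract_command(image: Path) -> list[str]:
    """Build a Tesseract command; the text is written to <image name>.txt."""
    # Tesseract appends ".txt" to the output base name on its own.
    return ["tesseract", str(image), str(image.with_suffix(""))]


def text_page(title: str, pages: list[str]) -> str:
    """Wrap the OCR text of each exhibit page in a simple HTML document."""
    safe_title = html.escape(title)
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        f'<head><meta charset="utf-8"><title>{safe_title}</title></head>',
        "<body>",
        f"<h1>{safe_title}</h1>",
    ]
    for number, text in enumerate(pages, start=1):
        parts.append(f"<h2>Page {number}</h2>")
        # Escape the OCR text so stray "<" or "&" cannot break the markup.
        parts.append(f"<pre>{html.escape(text.strip())}</pre>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts) + "\n"
```

OCR output is rarely perfect, so a quick read-through of the generated page before posting is still advisable.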
Page Layout Design Made Possible by: water css
Webpage design by The Swanson Group
Updated: 22 December 2024