GWR Journal Image Index

Andy Keane · February 18, 2023

This thread is for those who would like to engage in a community effort to collect together all the picture and figure captions in the GWRJ into a searchable index.

If we manage to do the GWRJ we could move on to other sources of photos, drawings, plans etc.

To begin with I would like to try and agree on the software we might use to store the efforts of those who are disposed to help.

All I imagine people doing is keying in (or perhaps using some form of OCR scan) the captions along with GWRJ issue number, date and page number.

This might sound like a very tedious task but actually taking the time to study the photos in the Journal can of itself be quite interesting and throw up many thoughts.

If we managed who did what, a group could make steady progress and then we could make it more widely available.

Andy

Andy Keane · February 18, 2023

Those who have expressed an interest at the outset are @longchap @Harlequin @Neal Ball @Mikkel

If you would like to try and do this please join with your thoughts on this thread.

Andy

Andy Keane · February 18, 2023

By my count there are 103 issues of GWRJ plus the initial Bumper Preview and Cornish Special.

There are then just shy of 600 articles in them!

Andy Keane · February 18, 2023

What I would suggest is dividing up the issues and then each doing some before we try to merge them together.

This leaves people free to use whatever tool they like best while we plan how to house the results.

I will start just entering into Excel.
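If people do work separately at first, the eventual merge could be as simple as the rough sketch below. It assumes everyone can export their sheet to CSV with the same three column headings (`issue`, `page`, `caption`); those headings are only placeholders, since no layout has been agreed on this thread yet.

```python
import csv
import io

# Hypothetical column names; nothing has been agreed yet.
FIELDS = ["issue", "page", "caption"]


def _as_number(text):
    """Sort numeric values first, in numeric order; anything else afterwards, alphabetically."""
    return (0, int(text), "") if text.isdigit() else (1, 0, text)


def merge_caption_files(files):
    """Merge several open CSV files into one sorted list of rows, skipping exact repeats."""
    seen = set()
    merged = []
    for f in files:
        for row in csv.DictReader(f):
            record = {name: (row.get(name) or "").strip() for name in FIELDS}
            # Two rows count as the same entry if issue, page and caption match
            # (ignoring upper/lower case in the caption).
            key = (record["issue"], record["page"], record["caption"].lower())
            if key in seen:
                continue
            seen.add(key)
            merged.append(record)
    merged.sort(key=lambda r: (_as_number(r["issue"]), _as_number(r["page"])))
    return merged


# Two made-up contributor files, held in memory for the example.
file_a = io.StringIO("issue,page,caption\n2,10,Castle at Paddington\n1,5,Star at Bristol\n")
file_b = io.StringIO("issue,page,caption\n1,5,star at bristol\n10,3,Track plan\n")

for row in merge_caption_files([file_a, file_b]):
    print(row["issue"], row["page"], row["caption"])
```

Issues without a plain number (the Bumper Preview and Cornish Special, say) would simply sort to the end under this scheme.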

Harlequin · February 18, 2023

I suggest using one of the online collaboration tools, which are designed to allow teams of people to all work on the same documents, avoiding all the hassle of maintaining and aggregating separate copies.

The prime candidates would be Google Docs and Open Office (and I guess there's an Apple equivalent) but Google Docs is the platform-neutral choice and has very good OCR built-in.

All contributors would need is a browser and Andy would assign them permission to view, comment or edit the shared document as required.

Andy Keane · February 18, 2023

Seems like a sound plan if people are happy with it.

I am also wondering if there is any pen-like device that can be run over the text to read it.

If such a thing was not too costly it might help a great deal. Phil, do you know what the Google OCR will talk to?

Now that I look, I see they cost about £75.

hmm

Edited February 18, 2023 by Andy Keane

Harlequin · February 18, 2023

17 minutes ago, Andy Keane said:

Seems like a sound plan if people are happy with it.

I am also wondering if there is any pen-like device that can be run over the text to read it.

If such a thing was not too costly it might help a great deal. Phil, do you know what the Google OCR will talk to?

Now that I look, I see they cost about £75.

hmm

You just need a smartphone. Snap the page, upload it to Google Docs, then <remember how to use their UI> and it says, "Would you like to extract the text from this photo?".

Andy Keane · February 18, 2023

3 minutes ago, Harlequin said:

You just need a smartphone. Snap the page, upload it to Google Docs, then <remember how to use their UI> and it says, "Would you like to extract the text from this photo?".

Can you photo a whole page with a mixture of plain text, images and captions or do you need to crop down to the bit you want?

Harlequin · February 18, 2023

Just now, Andy Keane said:

Can you photo a whole page with a mixture of plain text, images and captions or do you need to crop down to the bit you want?

The whole thing. It will give you the text in blocks and ignore the images. I must admit it can get confused by tables and lists - the text is all extracted OK but it doesn't interleave the parts in the right order sometimes.

I think we need to do a test! If you photograph a sample page and post it here (as high res as you can), I'll upload it to Google Docs and post some screenshots of the OCR process and the end results.

Andy Keane · February 18, 2023

Like this?

Neal Ball · February 18, 2023

Are we documenting EVERY image?

Neal Ball · February 18, 2023

Plus!

We need to include the issues of Western Times beyond GWRJ as well, as it seems they are continuing in the same vein.

ikcdab · February 18, 2023

Sounds a useful project, but two things spring to mind.

1. Don't just start. You do need to agree what metadata you are capturing. Date of picture, issue number, page, loco number, location etc. If we all start capturing different data then it won't merge together and won't be useful.

2. The boring point: if you are capturing the exact text and recording it in a database, then is that OK under copyright? Most magazines, books etc have some sort of statement starting "no part of this publication may be replicated or stored......." I've no idea if GWRJ does this. For your own home use you would be OK, but if you start making it publicly available, then it might infringe it.
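To make point 1 concrete, here is a minimal sketch of what one agreed record might look like. The field names are only a suggestion built from the items listed above (date of picture, issue number, page, loco number, location), plus the caption itself and a `kind` field that is my own assumption; the example values are placeholders, not a real entry.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ImageRecord:
    """One indexed image. Field names are hypothetical, not an agreed standard."""
    issue: str                           # issue number, or a name for the specials
    page: int
    caption: str
    kind: str = "photo"                  # assumed categories: photo, drawing, plan
    picture_date: Optional[str] = None   # free text, since many dates are approximate
    location: Optional[str] = None
    loco_number: Optional[str] = None

    def __post_init__(self):
        # Reject rows that could not be merged or looked up later.
        if not self.issue.strip():
            raise ValueError("issue is required")
        if self.page < 1:
            raise ValueError("page must be 1 or greater")
        if not self.caption.strip():
            raise ValueError("caption is required")


# Placeholder entry; the issue label is made up for the example.
record = ImageRecord(
    issue="EXAMPLE",
    page=307,
    caption="Old Oak 'Castle' No. 4037 at the head of a train for Paddington",
    picture_date="c.1938",
    loco_number="4037",
)
print(asdict(record))
```

Keeping the optional fields as free text avoids forcing a format on things like "c.1938" before anyone has agreed how to record them.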

Ian

Neal Ball · February 18, 2023

12 minutes ago, ikcdab said:

Sounds a useful project, but two things spring to mind.

1. Don't just start. You do need to agree what metadata you are capturing. Date of picture, issue number, page, loco number, location etc. If we all start capturing different data then it won't merge together and won't be useful.

2. The boring point: if you are capturing the exact text and recording it in a database, then is that OK under copyright? Most magazines, books etc have some sort of statement starting "no part of this publication may be replicated or stored......." I've no idea if GWRJ does this. For your own home use you would be OK, but if you start making it publicly available, then it might infringe it.

Ian

Sadly yes it’s probably the boring bit… Sadly also it’s probably true….

I take the view that posting the odd photo on here from book “x” dated 1975 is kind of OK as the book is possibly out of copyright… The chances are that the photo isn’t, as it’s now owned by the NRM et al. But that’s a chance we take.

However, if we are just compiling an index with names, dates etc. I really can’t see that we are hurting anyone.

As has been said further up the page, we need to collate what we have rather than re-inventing the wheel.

I've got a busy few days ahead, but will post a screenshot of what I have done so far to compare it with others.

Andy Keane · February 18, 2023

This is part of my full index of articles in GWRJ - happy to share with anyone who wants the full thing (just short of 600 entries).

Andy Keane · February 18, 2023

8 minutes ago, Neal Ball said:

Sadly yes it’s probably the boring bit… Sadly also it’s probably true….

I take the view that posting the odd photo on here from book “x” dated 1975 is kind of OK as the book is possibly out of copyright… The chances are that the photo isn’t, as it’s now owned by the NRM et al. But that’s a chance we take.

However, if we are just compiling an index with names, dates etc. I really can’t see that we are hurting anyone.

As has been said further up the page, we need to collate what we have rather than re-inventing the wheel.

I've got a busy few days ahead, but will post a screenshot of what I have done so far to compare it with others.

I have written a couple of textbooks with John Wiley. The full text was dumped onto the web within days, and when I asked Wiley what they planned to do they just shrugged! So no publisher is going to say anything about an index - especially for something no longer available in print.

Harlequin · February 18, 2023

Google Docs OCR Test

The method is:

1. Upload the photo to Google Drive.
2. Right click on the photo in Google Drive and select "Open with Google Docs".

That does the OCR and creates a doc containing both the image and the transcription of the text. Here's the result for the sample image:

Quote

307

e vicinity

cr.

The 'tops

located a

paration fireman

he 'tops the foot

ler, just side valve el feeder % cut-off quadrant

uld easily

um pump placed a mountings,

sed waste

1, produc-

the bucket

= just right had a few

moving up bled us to 5 we might

as so much ight up to

month.

Departure from No. 9 platform, with Old Oak 'Castle' No. 4037 The South Wales Borderers at the head of a train for Paddington c.1938. This engine was rebuilt from 'Star' class engine Queen Phillipa, and was renamed after the famous Welsh regiment in March 1937. The official renaming was carried out by the regiment's Colonel at a ceremony at Paddington station the following G. H. SOOLE following 9.30 a.m. service to Paddington (7.50 a.m. Taunton) was a very different beast, as an extra vehicle was added at

box, and then reversed down towards the waiting train. As we drew level with the Weston engine, the driver shouted across. "12 for 408 tone" be old

Obviously, the text clipped by the edges of the image is incomplete and pretty senseless (but it is a pretty accurate transcription!).

I highlighted the image caption in Bold and you can see that it's nearly perfect. It just got confused by the final word "month" standing alone.
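Once a caption has been transcribed like this, some of the index fields could probably be picked out automatically. The sketch below is only an illustration under two assumptions of mine: that loco numbers appear as "No." followed by two to four digits, and that dates of interest are four-digit years beginning 18 or 19. Anything it finds would still need checking by eye.

```python
import re

# Assumed patterns, not a tested rule set for GWRJ captions.
LOCO_RE = re.compile(r"\bNo\.\s*(\d{2,4})\b")   # e.g. "No. 4037"; skips single digits such as platform numbers
YEAR_RE = re.compile(r"\b(?:18|19)\d{2}\b")      # e.g. "1938", including "c.1938"


def extract_fields(caption):
    """Return candidate loco numbers and years found in a caption."""
    return {
        "loco_numbers": LOCO_RE.findall(caption),
        "years": YEAR_RE.findall(caption),
    }


caption = (
    "Departure from No. 9 platform, with Old Oak 'Castle' No. 4037 "
    "The South Wales Borderers at the head of a train for Paddington c.1938. "
    "This engine was renamed after the famous Welsh regiment in March 1937."
)
print(extract_fields(caption))
# {'loco_numbers': ['4037'], 'years': ['1938', '1937']}
```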

Miss Prism · February 18, 2023

Just to point out that John Dolan's excel sheet (although it only goes up to issue 67) is probably a good base starter for this project.

Compound2632 · February 18, 2023

A number of bookshops carry back numbers of GWRJ and other WSP titles - in other words, a good few issues are not yet out of print - so there might be real discontent at digital page images becoming freely available. Furthermore, you have to consider that the copyright in the photographs and some other material will lie with others.

An index to images, however, sounds a useful thing but what information are you going to include in each image caption? Often the thing I'm hunting for is not the subject of the photo but some detail lurking in the margin.

Have you seen the searchable index hosted on Western Thunder?

https://www.westernthunder.co.uk/gwrji/index.php?s=stations&t=tags

Edited February 18, 2023 by Compound2632

Andy Keane · February 18, 2023

I have a complete index of all the articles, nearly 600 in all but as far as I can tell nobody has indexed the images.

Clearly the captions don't tell all the story but they do contain lots of info that would be useful in an index.

And I completely agree about not posting page scans etc.

3 minutes ago, Compound2632 said:

A number of bookshops carry backnumbers of GWRJ and other WSP titles - in other words, a good few issues are not yet out of print - so might be very discontent at digital page images becoming freely available. Furthermore, you have to consider that the copyright in the photographs and some other material will lie with others.

An index to images, however, sounds a useful thing but what information are you going to include in each image caption? Often the thing I'm hunting for is not the subject of the photo but some detail lurking in the margin.

Have you seen the searchable index hosted on Western Thunder?

https://www.westernthunder.co.uk/gwrji/index.php?s=stations&t=tags

22 minutes ago, Miss Prism said:

Just to point out that John Dolan's excel sheet (although it only goes up to issue 67) is probably a good base starter for this project.

I can't get your link to work @Miss Prism but the spreadsheet sounds interesting

Andy Keane · February 18, 2023

I have now found John Dolan's index and it is certainly useful but not quite what I have in mind which is focused on photos, drawings and track-plans and their captions.

Maybe the two could be merged.

longchap · February 18, 2023

3 minutes ago, Andy Keane said:

27 minutes ago, Miss Prism said:

I can't get your link to work @Miss Prism but the spreadsheet sounds interesting

I received the following message when attempting to download:

File not downloaded: Potential security risk.

GWR_Modeller · February 18, 2023

Hi, Are you aware of this site?

https://www.steamindex.com/gwrj/gwrj1.htm

All 13 volumes of GWRJ fully indexed, captions etc. The only issue is that each volume is a separate file.

Regards, Paul

kevinlms · February 19, 2023

5 hours ago, Andy Keane said:

I have now found John Dolan's index and it is certainly useful but not quite what I have in mind which is focused on photos, drawings and track-plans and their captions.

Maybe the two could be merged.

There used to be an index of various photos and the like, from a range of books and magazines. IIRC it covered the GWR only, but I could be wrong on that.

However, I haven't seen it for a decade or more. I assume that it is now lost?
