GWR Journal Image Index


Andy Keane


  • RMweb Premium

This thread is for those who would like to engage in a community effort to collect together all the picture and figure captions in the GWRJ into a searchable index.

If we manage to do the GWRJ we could move on to other sources of photos, drawings, plans etc.

To begin with I would like to try and agree on the software we might use to store the efforts of those who are disposed to help.

All I imagine people doing is keying in (or perhaps using some form of OCR scan) the captions along with GWRJ issue number, date and page number.

This might sound like a very tedious task but actually taking the time to study the photos in the Journal can of itself be quite interesting and throw up many thoughts.

If we managed who did what, a group could make steady progress and then we could make it more widely available.

Andy
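For anyone wondering what "searchable" could mean in practice, here is a minimal sketch, assuming each caption is stored as an (issue, page, caption) record. It builds a simple word-to-location lookup so a search returns the issue and page of every caption containing all the search words. The issue and page numbers below are invented placeholders, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample records: (issue, page, caption). The issue and page
# numbers are invented placeholders, not real GWRJ references.
CAPTIONS = [
    (101, 12, "'Castle' No. 4037 The South Wales Borderers at Paddington"),
    (102, 45, "Pannier tank shunting in the yard at Taunton"),
    (103, 7, "Departure from No. 9 platform at Paddington c.1938"),
]


def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word to the set of (issue, page) pairs whose caption contains it."""
    index = defaultdict(set)
    for issue, page, caption in records:
        for word in tokenize(caption):
            index[word].add((issue, page))
    return index


def search(index, query):
    """Return the (issue, page) pairs whose captions contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_index(CAPTIONS)
print(sorted(search(index, "Paddington")))   # [(101, 12), (103, 7)]
print(sorted(search(index, "castle 4037")))  # [(101, 12)]
```

A spreadsheet filter does much the same job for a few hundred rows; a word index like this only starts to matter once the captions from hundreds of issues are combined.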


  • RMweb Premium

What I would suggest is dividing up the issues and each doing some before we try to merge them together.

This leaves people free to use whatever tool they like best while we plan how to house the results.

I will start by just entering them into Excel.
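If everyone works in their own spreadsheet, the merge step could look something like this minimal sketch. It assumes each sheet is exported to CSV with the same agreed header row; the column names, file names and sample rows are all hypothetical placeholders. A sheet with different columns is rejected rather than silently merged.

```python
import csv
import io

# Hypothetical agreed column order; every contributor's sheet must match it.
EXPECTED_COLUMNS = ["issue", "date", "page", "caption"]


def merge_csv_sources(sources):
    """Merge (name, csv_text) pairs into one list of rows sorted by issue and page.

    Raises ValueError if any file's header differs from EXPECTED_COLUMNS, so
    mismatched sheets are caught before they end up in the combined index.
    """
    merged = []
    for name, text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if reader.fieldnames != EXPECTED_COLUMNS:
            raise ValueError(f"{name}: unexpected columns {reader.fieldnames}")
        for row in reader:
            row["source"] = name  # remember which contributor's sheet the row came from
            merged.append(row)
    merged.sort(key=lambda row: (int(row["issue"]), int(row["page"])))
    return merged


# Two hypothetical contributors' exports (placeholder values, not real data).
sheet_a = "issue,date,page,caption\n2,(date),14,Caption B\n1,(date),3,Caption A\n"
sheet_b = "issue,date,page,caption\n1,(date),20,Caption C\n"

for row in merge_csv_sources([("sheet_a.csv", sheet_a), ("sheet_b.csv", sheet_b)]):
    print(row["issue"], row["page"], row["caption"], row["source"])
```

With real files you would read each export with `open(path, newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding strips the byte-order mark that Excel may add to UTF-8 CSV files.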


  • RMweb Gold

I suggest using one of the online collaboration tools, which are designed to allow teams of people to all work on the same documents, avoiding all the hassle of maintaining and aggregating separate copies.

 

The prime candidates would be Google Docs and Open Office (and I guess there's an Apple equivalent) but Google Docs is the platform-neutral choice and has very good OCR built-in.

 

All contributors would need is a browser and Andy would assign them permission to view, comment or edit the shared document as required.

 


  • RMweb Premium

Seems like a sound plan if people are happy with it.

I am also wondering if there is any pen-like device that can be run over the text to read it.

If such a thing was not too costly it might help a great deal. Phil, do you know what the Google OCR will talk to?

Now I look, I see they cost about £75.

Hmm.

Edited by Andy Keane

  • RMweb Gold
You just need a smartphone. Snap the page, upload it to Google Docs, then <remember how to use their UI> and it says, "Would you like to extract the text from this photo?".

 


  • RMweb Premium
Can you photo a whole page with a mixture of plain text, images and captions or do you need to crop down to the bit you want?

 


  • RMweb Gold
The whole thing. It will give you the text in blocks and ignore the images. I must admit it can get confused by tables and lists - the text is all extracted OK but it doesn't interleave the parts in the right order sometimes.

 

I think we need to do a test! If you photograph a sample page and post it here (as high res as you can), I'll upload it to Google Docs and post some screenshots of the OCR process and the end results.

 


  • RMweb Gold

Sounds a useful project, but two things spring to mind.

1. Don't just start. You do need to agree what metadata you are capturing: date of picture, issue number, page, loco number, location etc. If everyone starts capturing different data, it won't merge together and won't be useful.

2. The boring point: if you are capturing the exact text and recording it in a database, is that OK under copyright? Most magazines, books etc. have some sort of statement starting "no part of this publication may be replicated or stored......." I've no idea if GWRJ does this. For your own home use you would be OK, but if you start making it publicly available, it might infringe it.

Ian
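To illustrate point 1, here is a minimal sketch of what an agreed record layout might look like, assuming the fields listed above (date of picture, issue, page, loco number, location) plus the caption itself. The class and field names are hypothetical. Validating entries when they are typed in means everyone's data has the same shape when it comes to merging.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ImageRecord:
    """One indexed picture, drawing or plan; field names are a hypothetical agreed schema."""
    issue: int
    page: int
    caption: str
    picture_date: Optional[str] = None  # free text, since captions often say "c.1938"
    loco_numbers: List[str] = field(default_factory=list)
    location: Optional[str] = None

    def __post_init__(self):
        # Reject obviously bad entries at the point of capture, not at merge time.
        if self.issue < 1 or self.page < 1:
            raise ValueError("issue and page must be positive")
        if not self.caption.strip():
            raise ValueError("caption must not be empty")


# Example using a caption quoted in this thread; the issue and page values
# are placeholders, not real references.
record = ImageRecord(
    issue=101,
    page=12,
    caption=("Departure from No. 9 platform, with Old Oak 'Castle' No. 4037 "
             "The South Wales Borderers at the head of a train for Paddington c.1938."),
    picture_date="c.1938",
    loco_numbers=["4037"],
    location="Paddington",
)
print(asdict(record))
```

Storing loco numbers as text rather than integers is a deliberate choice in this sketch, as it leaves room for numbers with letter prefixes or suffixes.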


  • RMweb Premium
Sadly, yes, it’s probably the boring bit… Sadly, it’s also probably true….

 

I take the view that posting the odd photo on here from book “x” dated 1975 is kind of OK, as the book is possibly out of copyright…. The chances are that the photo isn’t, as it’s now owned by the NRM et al. But that’s a chance we take.

 

However, if we are just compiling an index with names, dates etc. I really can’t see that we are hurting anyone.

 

As has been said further up the page, we need to collate what we have rather than re-inventing the wheel.

 

I've got a busy few days ahead, but will post a screenshot of what I have done so far so we can compare it with others.


  • RMweb Premium
I have written a couple of textbooks with John Wiley. They were dumped full-text onto the web within days, and when I asked Wiley what they planned to do, they just shrugged! So no publisher is going to say anything about an index, especially for something no longer available in print.


  • RMweb Gold

Google Docs OCR Test

 

The method is:

  • Upload photo to Google Drive
  • Right click on photo in Google Drive and select "Open with Google Docs"

That does the OCR and creates a doc containing both the image and the transcription of the text. Here's the result for the sample image:

[Screenshot: Google Docs document showing the sample page photo and its OCR transcription]

Quote

307

e vicinity

cr.

The 'tops

located a

paration fireman

he 'tops the foot

ler, just side valve el feeder % cut-off quadrant

uld easily

um pump placed a mountings,

sed waste

1, produc-

the bucket

= just right had a few

moving up bled us to 5 we might

as so much ight up to

month.

Departure from No. 9 platform, with Old Oak 'Castle' No. 4037 The South Wales Borderers at the head of a train for Paddington c.1938. This engine was rebuilt from 'Star' class engine Queen Phillipa, and was renamed after the famous Welsh regiment in March 1937. The official renaming was carried out by the regiment's Colonel at a ceremony at Paddington station the following G. H. SOOLE following 9.30 a.m. service to Paddington (7.50 a.m. Taunton) was a very different beast, as an extra vehicle was added at

box, and then reversed down towards the waiting train. As we drew level with the Weston engine, the driver shouted across. "12 for 408 tone" be old

 

Obviously, the text clipped by the edges of the image is incomplete and pretty senseless (but it is a pretty accurate transcription!).

 

I highlighted the image caption in bold and you can see that it's nearly perfect. It just got confused by the final word "month" standing alone.
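If a lot of pages go through this route, picking the caption out of the OCR text could be partly automated. The sketch below is only a heuristic and an assumption on my part, not anything Google Docs does: it splits the OCR output at blank lines, takes the longest block as the probable caption, and pulls out loco numbers written as "No. 4037". In the test above the caption ran on into body text ("following 9.30 a.m. service..."), so a human still needs to check and trim each result.

```python
import re


def split_blocks(ocr_text):
    """Split OCR output into non-empty blocks separated by blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", ocr_text) if b.strip()]


def guess_caption(ocr_text):
    """Heuristic: treat the longest block as the caption (an assumption, not a rule)."""
    blocks = split_blocks(ocr_text)
    return max(blocks, key=len) if blocks else ""


def loco_numbers(caption):
    """Find locomotive numbers written as 'No. 4037' (three to five digits)."""
    return re.findall(r"\bNo\.\s*(\d{3,5})\b", caption)


# A trimmed excerpt of the OCR output from the test.
sample = """e vicinity

The 'tops

Departure from No. 9 platform, with Old Oak 'Castle' No. 4037 The South Wales Borderers at the head of a train for Paddington c.1938.

month."""

caption = guess_caption(sample)
print(caption)
print(loco_numbers(caption))  # ['4037']
```

The three-digit minimum stops platform numbers such as "No. 9" being mistaken for locos, at the cost of missing any genuine one- or two-digit engine numbers.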

 


  • RMweb Premium

A number of bookshops carry back numbers of GWRJ and other WSP titles; in other words, a good few issues are not yet out of print, so the publisher might be very unhappy at digital page images becoming freely available. Furthermore, you have to consider that the copyright in the photographs and some other material will lie with others.

 

An index to images, however, sounds a useful thing but what information are you going to include in each image caption? Often the thing I'm hunting for is not the subject of the photo but some detail lurking in the margin.

 

Have you seen the searchable index hosted on Western Thunder?

https://www.westernthunder.co.uk/gwrji/index.php?s=stations&t=tags

Edited by Compound2632

  • RMweb Premium

I have a complete index of all the articles, nearly 600 in all, but as far as I can tell nobody has indexed the images.

Clearly the captions don't tell all the story but they do contain lots of info that would be useful in an index.

And I completely agree about not posting page scans etc.

22 minutes ago, Miss Prism said:

Just to point out that John Dolan's excel sheet (although it only goes up to issue 67) is probably a good base starter for this project.

 

I can't get your link to work @Miss Prism but the spreadsheet sounds interesting


  • RMweb Gold
I received the following message when attempting to download:

 

File not downloaded: Potential security risk.


  • RMweb Premium
5 hours ago, Andy Keane said:

I have now found John Dolan's index and it is certainly useful but not quite what I have in mind which is focused on photos, drawings and track-plans and their captions.

Maybe the two could be merged.

There used to be an index on various photos and the like, from a range of books and magazines. IIRC it was on the GWR only, but I could be wrong on that.

However, I haven't seen it for a decade or more. I assume that it is now lost?

