RMweb Premium Andy Keane Posted February 18, 2023
This thread is for those who would like to engage in a community effort to collect all the picture and figure captions in the GWRJ into a searchable index. If we manage to do the GWRJ we could move on to other sources of photos, drawings, plans etc. To begin with I would like to try to agree on the software we might use to store the efforts of those who are disposed to help. All I imagine people doing is keying in (or perhaps using some form of OCR scan) the captions along with the GWRJ issue number, date and page number. This might sound like a very tedious task, but actually taking the time to study the photos in the Journal can in itself be quite interesting and throw up many thoughts. If we managed who did what, a group could make steady progress and then we could make it more widely available.
Andy
RMweb Premium Andy Keane Posted February 18, 2023
Those who have expressed an interest at the outset are @longchap @Harlequin @Neal Ball @Mikkel. If you would like to try and do this please join with your thoughts on this thread.
Andy
RMweb Premium Andy Keane Posted February 18, 2023
By my count there are 103 issues of GWRJ plus the initial Bumper Preview and Cornish Special. There are then just shy of 600 articles in them!
RMweb Premium Andy Keane Posted February 18, 2023
What I would suggest is dividing up the issues and then each doing some before we try to merge them together. This leaves people free to use whatever tool they like best while we plan how to house the results. I will start by just entering into Excel.
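If people do work on separate files first, the later merge could be sketched roughly as below. This is only an illustration, assuming each contributor exports a CSV with an identical header row; the function name `merge_indexes` and the column names in the usage example are hypothetical and not something agreed in this thread.

```python
import csv
import io


def merge_indexes(handles):
    """Merge CSV files that share one header row, dropping exact duplicate rows.

    `handles` is an iterable of open text files (or StringIO objects).
    Returns (header, rows), where rows is a list of dicts.
    """
    header = None
    seen = set()
    merged = []
    for handle in handles:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None:
            continue  # completely empty file: nothing to merge
        if header is None:
            header = list(reader.fieldnames)
        elif list(reader.fieldnames) != header:
            # Different columns cannot be merged safely, so stop early.
            raise ValueError(f"header mismatch: {reader.fieldnames} != {header}")
        for row in reader:
            key = tuple(row[column] for column in header)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return header, merged


# Usage with two in-memory files (made-up rows, for illustration only).
first = io.StringIO("issue,page,caption\r\n1,5,Alpha\r\n")
second = io.StringIO("issue,page,caption\r\n1,5,Alpha\r\n2,9,Beta\r\n")
header, rows = merge_indexes([first, second])
print(header, len(rows))
```

The header check is deliberately strict: a file with different columns raises an error rather than being merged silently, which is one way to catch contributors drifting apart on what they record.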
RMweb Gold Harlequin Posted February 18, 2023
I suggest using one of the online collaboration tools, which are designed to allow teams of people to all work on the same documents, avoiding all the hassle of maintaining and aggregating separate copies. The prime candidates would be Google Docs and Open Office (and I guess there's an Apple equivalent) but Google Docs is the platform-neutral choice and has very good OCR built in. All contributors would need is a browser, and Andy would assign them permission to view, comment or edit the shared document as required.
RMweb Premium Andy Keane Posted February 18, 2023 (edited)
Seems like a sound plan if people are happy with it. I am also wondering if there is any pen-like device that can be run over the text to read it. If such a thing were not too costly it might help a great deal. Phil, do you know what the Google OCR will talk to?
Now I look, I see they cost about £75, hmm.
RMweb Gold Harlequin Posted February 18, 2023
17 minutes ago, Andy Keane said:
Seems like a sound plan if people are happy with it. I am also wondering if there is any pen-like device that can be run over the text to read it. If such a thing were not too costly it might help a great deal. Phil, do you know what the Google OCR will talk to? Now I look, I see they cost about £75, hmm.

You just need a smartphone. Snap the page, upload to Google Docs, then <remember how to use their UI> and it says, "Would you like to extract the text from this photo?".
RMweb Premium Andy Keane Posted February 18, 2023
3 minutes ago, Harlequin said:
You just need a smartphone. Snap the page, upload to Google Docs, then <remember how to use their UI> and it says, "Would you like to extract the text from this photo?".

Can you photograph a whole page with a mixture of plain text, images and captions, or do you need to crop down to the bit you want?
RMweb Gold Harlequin Posted February 18, 2023
Just now, Andy Keane said:
Can you photograph a whole page with a mixture of plain text, images and captions, or do you need to crop down to the bit you want?

The whole thing. It will give you the text in blocks and ignore the images. I must admit it can get confused by tables and lists - the text is all extracted OK but it sometimes doesn't interleave the parts in the right order.
I think we need to do a test! If you photograph a sample page and post it here (as high res as you can), I'll upload it to Google Docs and post some screenshots of the OCR process and the end results.
RMweb Premium Andy Keane Posted February 18, 2023
Like this?
RMweb Premium Neal Ball Posted February 18, 2023
Are we documenting EVERY image?
RMweb Premium Neal Ball Posted February 18, 2023
Plus! We need to include the issues of Western Times beyond GWRJ as well, as it seems they are continuing in the same vein.
RMweb Gold ikcdab Posted February 18, 2023
Sounds a useful project, but two things spring to mind.
1. Don't just start. You do need to agree what metadata you are capturing: date of picture, issue number, page, loco number, location etc. If everyone starts capturing different data then it won't merge together and won't be useful.
2. The boring point: if you are capturing the exact text and recording it in a database, then is that OK under the copyright? Most magazines, books etc. have some sort of statement starting "no part of this publication may be replicated or stored......." I've no idea if GWRJ does this. For your own home use you would be OK, but if you start making it publicly available, then it might infringe it.
Ian
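To make the metadata point concrete, here is a minimal sketch of what an agreed record layout might look like if written down in Python. The field names are assumptions based on the items listed above (issue number, page, date of picture, loco number, location) plus the caption text and issue date mentioned in the opening post; nobody in the thread has settled on this layout, and `CaptionRecord` and `write_records` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class CaptionRecord:
    """One picture or figure caption. All field names are illustrative only."""
    issue: str              # GWRJ issue number, or a name such as "Cornish Special"
    issue_date: str         # cover date of the issue, as printed
    page: int               # page the picture appears on
    caption: str            # caption text as keyed in or OCR'd
    picture_date: str = ""  # date of the picture, if the caption gives one
    loco_number: str = ""   # loco number, if any
    location: str = ""      # location, if stated


# A single shared column order that every contributor would use.
COLUMNS = [f.name for f in fields(CaptionRecord)]


def write_records(records, handle):
    """Write records as CSV with the shared header row."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Usage: the issue and issue date are left as "?" because they are not known here.
example = CaptionRecord(issue="?", issue_date="?", page=307,
                        caption="Departure from No. 9 platform ...",
                        picture_date="c.1938", loco_number="4037")
buffer = io.StringIO()
write_records([example], buffer)
print(buffer.getvalue())
```

Fixing the column order in one place is the design point: whatever tool each person prefers, an export with exactly these columns should combine cleanly with everyone else's.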
RMweb Premium Neal Ball Posted February 18, 2023
12 minutes ago, ikcdab said:
Sounds a useful project, but two things spring to mind. 1. Don't just start. You do need to agree what metadata you are capturing: date of picture, issue number, page, loco number, location etc. If everyone starts capturing different data then it won't merge together and won't be useful. 2. The boring point: if you are capturing the exact text and recording it in a database, then is that OK under the copyright? Most magazines, books etc. have some sort of statement starting "no part of this publication may be replicated or stored......." I've no idea if GWRJ does this. For your own home use you would be OK, but if you start making it publicly available, then it might infringe it. Ian

Sadly yes, it’s probably the boring bit… Sadly also it’s probably true….
I take the view that posting the odd photo on here from book “x” dated 1975 is kind of OK as the book is possibly out of copyright…. The chances are that the photo isn’t, as it’s now owned by the NRM et al. But that’s a chance we take.
However, if we are just compiling an index with names, dates etc., I really can’t see that we are hurting anyone.
As has been said further up the page, we need to collate what we have rather than re-inventing the wheel. I’ve got a busy few days ahead, but will post a screenshot of what I have done so far to compare it with others.
RMweb Premium Andy Keane Posted February 18, 2023
This is part of my full index of articles in GWRJ - happy to share with anyone who wants the full thing (just short of 600 entries).
RMweb Premium Andy Keane Posted February 18, 2023
8 minutes ago, Neal Ball said:
Sadly yes, it’s probably the boring bit… Sadly also it’s probably true…. I take the view that posting the odd photo on here from book “x” dated 1975 is kind of OK as the book is possibly out of copyright…. The chances are that the photo isn’t, as it’s now owned by the NRM et al. But that’s a chance we take. However, if we are just compiling an index with names, dates etc., I really can’t see that we are hurting anyone. As has been said further up the page, we need to collate what we have rather than re-inventing the wheel. I’ve got a busy few days ahead, but will post a screenshot of what I have done so far to compare it with others.

I have written a couple of textbooks with John Wiley. The full text was dumped onto the web within days, and when I asked Wiley's what they planned to do they just shrugged! So I doubt any publisher is going to say anything about an index - especially for something no longer available in print.
RMweb Gold Harlequin Posted February 18, 2023
Google Docs OCR Test
The method is:
1. Upload photo to Google Drive.
2. Right click on photo in Google Drive and select "Open with Google Docs".
That does the OCR and creates a doc containing both the image and the transcription of the text.
Here's the result for the sample image:
Quote
307 e vicinity cr. The 'tops located a paration fireman he 'tops the foot ler, just side valve el feeder % cut-off quadrant uld easily um pump placed a mountings, sed waste 1, produc- the bucket = just right had a few moving up bled us to 5 we might as so much ight up to month. Departure from No. 9 platform, with Old Oak 'Castle' No. 4037 The South Wales Borderers at the head of a train for Paddington c.1938. This engine was rebuilt from 'Star' class engine Queen Phillipa, and was renamed after the famous Welsh regiment in March 1937. The official renaming was carried out by the regiment's Colonel at a ceremony at Paddington station the following G. H. SOOLE following 9.30 a.m. service to Paddington (7.50 a.m. Taunton) was a very different beast, as an extra vehicle was added at box, and then reversed down towards the waiting train. As we drew level with the Weston engine, the driver shouted across. "12 for 408 tone" be old

Obviously, the text clipped by the edges of the image is incomplete and pretty senseless (but it is a pretty accurate transcription!). I highlighted the image caption in bold and you can see that it's nearly perfect. It just got confused by the final word "month" standing alone.
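Once a caption has been picked out of the OCR text, some index fields could perhaps be suggested automatically. The sketch below is purely illustrative: the function name `extract_fields` and both patterns are assumptions (loco numbers written as "No. 1234", four-digit years starting 18 or 19), and anything it finds would still need checking by eye against the page.

```python
import re


def extract_fields(caption):
    """Suggest loco numbers and years found in an OCR'd caption (a rough heuristic)."""
    # Collapse line breaks and repeated spaces left over from the OCR.
    text = " ".join(caption.split())
    # "No. 4037" style numbers, skipping things like "No. 9 platform".
    loco_numbers = re.findall(r"No\.\s*(\d{1,4})\b(?!\s+platform)", text)
    # Four-digit years from the 1800s or 1900s, including "c.1938".
    years = re.findall(r"\b((?:18|19)\d{2})\b", text)
    return {"caption": text, "loco_numbers": loco_numbers, "years": years}


# Usage with part of the caption from the sample page above.
sample = ("Departure from No. 9 platform, with Old Oak 'Castle' No. 4037 "
          "The South Wales Borderers at the head of a train for Paddington "
          "c.1938. This engine was rebuilt from 'Star' class engine Queen "
          "Phillipa, and was renamed after the famous Welsh regiment in "
          "March 1937.")
result = extract_fields(sample)
print(result["loco_numbers"], result["years"])
```

Note that the year pattern cannot tell the date of the picture (c.1938) from other dates in the caption (March 1937), which is one reason the output is best treated as a prompt for whoever is keying in the record rather than as finished data.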
Miss Prism Posted February 18, 2023
Just to point out that John Dolan's Excel sheet (although it only goes up to issue 67) is probably a good starting base for this project.
RMweb Premium Compound2632 Posted February 18, 2023 (edited)
A number of bookshops carry back numbers of GWRJ and other WSP titles - in other words, a good few issues are not yet out of print - so might be very discontent at digital page images becoming freely available. Furthermore, you have to consider that the copyright in the photographs and some other material will lie with others.
An index to images, however, sounds a useful thing, but what information are you going to include in each image caption? Often the thing I'm hunting for is not the subject of the photo but some detail lurking in the margin.
Have you seen the searchable index hosted on Western Thunder? https://www.westernthunder.co.uk/gwrji/index.php?s=stations&t=tags
RMweb Premium Andy Keane Posted February 18, 2023
I have a complete index of all the articles, nearly 600 in all, but as far as I can tell nobody has indexed the images. Clearly the captions don't tell all the story but they do contain lots of info that would be useful in an index. And I completely agree about not posting page scans etc.

3 minutes ago, Compound2632 said:
A number of bookshops carry back numbers of GWRJ and other WSP titles - in other words, a good few issues are not yet out of print - so might be very discontent at digital page images becoming freely available. Furthermore, you have to consider that the copyright in the photographs and some other material will lie with others. An index to images, however, sounds a useful thing, but what information are you going to include in each image caption? Often the thing I'm hunting for is not the subject of the photo but some detail lurking in the margin. Have you seen the searchable index hosted on Western Thunder? https://www.westernthunder.co.uk/gwrji/index.php?s=stations&t=tags

22 minutes ago, Miss Prism said:
Just to point out that John Dolan's Excel sheet (although it only goes up to issue 67) is probably a good starting base for this project.

I can't get your link to work @Miss Prism but the spreadsheet sounds interesting.
RMweb Premium Andy Keane Posted February 18, 2023
I have now found John Dolan's index and it is certainly useful, but not quite what I have in mind, which is focused on photos, drawings and track plans and their captions. Maybe the two could be merged.
RMweb Gold longchap Posted February 18, 2023
3 minutes ago, Andy Keane said:
27 minutes ago, Miss Prism said:
I can't get your link to work @Miss Prism but the spreadsheet sounds interesting

I received the following message when attempting to download:
File not downloaded: Potential security risk.
GWR_Modeller Posted February 18, 2023
Hi,
Are you aware of this site? https://www.steamindex.com/gwrj/gwrj1.htm
All 13 volumes of GWRJ fully indexed, captions etc. The only issue is that each volume is a separate file.
Regards, Paul
RMweb Premium kevinlms Posted February 19, 2023
5 hours ago, Andy Keane said:
I have now found John Dolan's index and it is certainly useful, but not quite what I have in mind, which is focused on photos, drawings and track plans and their captions. Maybe the two could be merged.

There used to be an index of various photos and the like, from a range of books and magazines. IIRC it covered the GWR only, but I could be wrong on that. However, I haven't seen it for a decade or more. I assume that it is now lost?