
Cambrian Line Radio Signalling failure - RAIB investigating


I have always understood that all signalling systems had to fail safe. Does this fancy system do that under all situations? If it does not fail safe under all circumstances then it has no place on our railway system.

Not really the same thing as the main signalling itself; TSR boards fall over or go missing in the physical world too. It's still up to the drivers to notice and report them there as well, so route knowledge is very important.

It's not ideal but, as per the updates Jim mentioned, this has highlighted a problem that can now be solved, and there's an easy solution in the meantime by cautioning drivers to check for them after this reboot process. We get speeds issued on our patch so the info should be available anyway, but it will cause a little delay to the first train in each direction. I suspect it's just a case of putting in a check that all data sets have uploaded before handing back. It can be done with a 3187 form or a special form, much like the form for trains and OTM in axle counter areas.
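Purely to illustrate the sort of check being suggested (a minimal sketch with made-up data set names, not how the real system behaves), a post-reboot comparison of what should be in force against what has actually re-uploaded could be as simple as:

```python
# Minimal sketch, invented identifiers: after the reboot, compare the
# temporary speed restriction (TSR) data sets that have actually re-uploaded
# against those that should be in force, and refuse to hand back until
# nothing is missing.

EXPECTED_TSRS = {"TSR-0142", "TSR-0147", "TSR-0153"}   # from the published notice

def safe_to_hand_back(loaded_tsrs):
    """Return True only if every expected TSR data set has re-uploaded."""
    missing = EXPECTED_TSRS - set(loaded_tsrs)
    if missing:
        print(f"Do NOT hand back - missing data sets: {sorted(missing)}")
        return False
    print("All TSR data sets present - hand back can proceed.")
    return True

# Example: one restriction failed to reload after the reboot.
safe_to_hand_back({"TSR-0142", "TSR-0153"})
```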

The Rule Book was written as they found out what can go wrong. People miss some possibilities, and thankfully this one was found through staff questioning things because they were doing their job properly, not through an actual incident.


I have always understood that all signalling systems had to fail safe. Does this fancy system do that under all situations? If it does not fail safe under all circumstances then it has no place on our railway system.

I think this depends on what you mean by safe.

 

When BRR was trying to get approval to install the first Solid State Interlockings in the UK, it ran into a problem: it could only demonstrate that the chance of wrong side failure was infinitesimally small, whereas signal engineers believed (wrongly) that the chance of wrong side failure of a mechanical or relay based system was zero. In the end common sense and logical argument prevailed. SSI is safe but does not have zero chance of causing an accident: just like other systems.

 

ETCS has probably been through more approval processes than any other signalling system and, provided that it is operated and maintained in accordance with the Safety Related Application Conditions, is as safe as any other system.

 

That doesn't mean that things cannot go wrong in certain circumstances. Consider for example the recent incident at Waterloo, probably the most serious safety lapse on UK rail in a decade judging from the interim report. No system is fail safe under all circumstances including when humans do something unexpected or against the rules. I think it important to wait for the RAIB report before judging.


Simon - Russ works on the railway (I've photographed him and his steeds many times) so has good experience from which to comment.

 

"Fewer failures" is always relative: one power supply failure can take out a huge area, as has been demonstrated several times. Maybe disruption would be a better comparison.

 

I'm not ganging up on anyone on this forum, but I'm certainly up for ganging up on various things which seem to happen on the railway today - and this wrong side failure is definitely one of them. Any system is only going to be proven if it is tried properly on a real railway, just like Class 800 trains can only really be tried out once they're in traffic. BUT, and yes it is a very big BUT, while there is nothing wrong with trying stuff, it has to be done against a background where operational safety remains paramount, and something in the Cambrian trial, or something associated with that trial, has shown that not to be the case. It is an extremely serious situation and to be honest I'm amazed that various people had not taken the necessary steps to ensure the chance of it happening would be reduced to as near zero as possible, as was the case with previous systems.

 

I suspect that part of today's problem on the railway is a lack of experienced middle management (and probably senior management): people who are not aware of the old idea of a railway safety culture (even if it was imperfect in some ways) and who assume that things are safe because some even more remote person has written it into a Rule Book (as it happens I know who initially drafted the ERTMS related changes to the Rule Book; he had virtually no practical operating experience but years of experience in writing Rules and Instructions in proper English). But going beyond that, a lot of the latest developments in signal control seem to have created a situation where comparatively minor failures can bring huge swathes of railway to a stand and then nobody seems to have any particular interest or sense of urgency in getting trains moving again. The fault there is not really with the signal engineers but with those who have told them what they want, no doubt at as low a price as possible, and with those who don't give a tuppenny damn about getting the railway working when something very simple has gone wrong - and most of that lack of responsibility and disinterest lies within Network Rail, although in some respects they have been suitably misled by DafT.

 

ERTMS/ETCS or whatever you care to call it (the Rule Book calls it ERTMS as I've said before, so as far as those using it are concerned it is ERTMS) is a Europe-wide system development with masses of integration problems into rolling stock and operating methodology for most of the railways who are likely to take it up. But if any one of them doesn't - within its area of control - ensure its past standards of safety and operational flexibility are recognised, then that responsibility is wholly with that railway administration and not the system. And if you don't know or understand the front end of railway operating in operational safety terms and from a Driver's and Signalman's perspective, you ain't necessarily going to get it right unless you are very lucky.

 

That is not the fault of the signal engineer but is very much down to poor overall management and lack of understanding of the railway you are managing. And it very definitely isn't the fault of Simon, who has been building his skills within his technical area and using them in the way the present day railway wants them to be used - oh, and he's good at standing his ground on here as well.


...But going beyond that, a lot of the latest developments in signal control seem to have created a situation where comparatively minor failures can bring huge swathes of railway to a stand and then nobody seems to have any particular interest or sense of urgency in getting trains moving again. The fault there is not really with the signal engineers but with those who have told them what they want, no doubt at as low a price as possible, and with those who don't give a tuppenny damn about getting the railway working when something very simple has gone wrong - and most of that lack of responsibility and disinterest lies within Network Rail, although in some respects they have been suitably misled by DafT...

 

 

 

 

Not ERTMS related, but I hope all the systems that support Network Rail's Rail Operating Centres (ROC) are/were suitably specified, robust and with adequate effective back-up and testing, considering the route miles that they cover.

 

A local track circuit or axle counter failure is certainly an inconvenience, albeit a temporary one, for operators and passengers. A major failure associated with a Network Rail ROC could potentially be of an entirely different magnitude.


What is perhaps more surprising (or disturbing?) is that this particular incident occurred in October 2017, but it has only just been "made public" now - 4 months later. I hope, as a conscientious railway professional, that someone hasn't been "sitting on this" hoping it would "go away naturally", and that it was reported to the authorities last October and not last week.

 

Regards, Ian.


What is perhaps more surprising (or disturbing?) is that this particular incident occurred in October 2017, but it has only just been "made public" now - 4 months later. I hope, as a conscientious railway professional, that someone hasn't been "sitting on this" hoping it would "go away naturally", and that it was reported to the authorities last October and not last week.

 

Regards, Ian.

 

It would have been reported immediately, it's too big to 'cover up', it is probably just down to everybody trying to work out if it should be an NR, ORR, RAIB or DfT investigation.

 

Simon


What is perhaps more surprising (or disturbing?) is that this particular incident occurred in October 2017, but it has only just been "made public" now - 4 months later. I hope, as a conscientious railway professional, that someone hasn't been "sitting on this" hoping it would "go away naturally", and that it was reported to the authorities last October and not last week.

 

RAIB says "The RAIB has decided to undertake an independent investigation because to date, the signalling system supplier has not identified the cause of the failure."

 

It would seem the delay is because the system manufacturer was given time to explain what went wrong. Obviously if they had come back with an explanation and fix, there would be no need for much more investigation on site. It would be up to them to explain the failure of their design and testing.

 

Martin.


What is perhaps more surprising (or disturbing?) is that this particular incident occurred in October 2017, but it has only just been "made public" now - 4 months later. I hope, as a conscientious railway professional, that someone hasn't been "sitting on this" hoping it would "go away naturally", and that it was reported to the authorities last October and not last week.

 

Regards, Ian.

What you see in public is usually a little bit down the line. We get urgent operating notices internally straight away where appropriate; that doesn't affect transparency, as it's visible to TOCs, unions and regulating bodies. It's just giving time to investigate what actually happened before putting it in the public domain - the press would turn it into a froth fest if there weren't some facts to rein them in.

It would have been reported immediately, it's too big to 'cover up', it is probably just down to everybody trying to work out if it should be an NR, ORR, RAIB or DfT investigation.

 

Simon

I hope you are right Simon, but it wouldn't be the first time that what turns out to be a "significant incident" was delayed in being reported/escalated to an appropriate higher authority because someone didn't realise how potentially serious it could be, or was insufficiently trained/experienced in recognising a potential/actual "wrong side failure" - a point alluded to by others above.

 

Regards, Ian.


RAIB says "The RAIB has decided to undertake an independent investigation because to date, the signalling system supplier has not identified the cause of the failure."

 

It would seem the delay is because the system manufacturer was given time to explain what went wrong. Obviously if they had come back with an explanation and fix, there would be no need for much more investigation on site. It would be up to them to explain the failure of their design and testing.

 

Martin.

Hi Martin,

 

Good point, I'd overlooked that statement. Thanks for highlighting it to me.

 

Regards, Ian.


What you see in public is usually a little bit down the line. We get urgent operating notices internally straight away where appropriate; that doesn't affect transparency, as it's visible to TOCs, unions and regulating bodies. It's just giving time to investigate what actually happened before putting it in the public domain - the press would turn it into a froth fest if there weren't some facts to rein them in.

Hi Paul,

 

You are quite right. My excuse is I've been off work for a year with illness (returning next month) so have missed out on the official notifications and unofficial (but usually very reliable) "bush telegraph" to find out the details of such incidents.

 

Regards, Ian.


From my limited knowledge of ERTMS, it has the typical danger of trying to be all things to all people; it's a classic case of complexity. Computerising vast quantities of special cases, embedding local knowledge, different corporate cultures, engineering standards etc. can be a recipe for a system that becomes almost impossible to prove, both operationally and in safety terms.

 

I follow it with interest, but certainly, with modern communications it makes no sense to have trackside signalling.


On the face of things, the incident sounds similar to the situation where an SSI / Smartlock / Westlock interlocking has required a 'hard reboot' (either through a power outage, or the signaller pressing the 'Emergency All Signals In' button that has the same effect) and where all the technicians' controls (that's route bars, aspect restrictions, points locked one way, etc.) have to be re-entered manually and which do not appear on an NX control panel.
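As a purely illustrative sketch (invented names throughout, and nothing to do with how SSI or any real interlocking actually stores anything), the manual re-entry problem is essentially that these controls live outside the state that survives the reboot, so the defence is a record kept elsewhere that can be turned into a checklist afterwards:

```python
import json

# Sketch only, with invented names: keep a record of applied technicians'
# controls somewhere that survives the interlocking reboot, so it can be
# turned into a checklist of items to re-enter manually afterwards.

RECORD_FILE = "technicians_controls.json"

def save_record(controls):
    """Write the list of applied controls to a file outside the interlocking."""
    with open(RECORD_FILE, "w") as f:
        json.dump(controls, f, indent=2)

def checklist_after_reboot():
    """Print the controls that must be re-applied after a hard reboot."""
    with open(RECORD_FILE) as f:
        for i, item in enumerate(json.load(f), 1):
            print(f"{i}. re-apply: {item}")

save_record(["route bar AB123 to AB125",
             "aspect restriction: AB127 held at caution",
             "points 617A locked normal"])
checklist_after_reboot()
```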


In my 40+ years involved in the railway industry I saw an alarming decline in the understanding of what was required by opposite sides in putting things together. Before Sectorisation we had District Signalling Inspectors who had all been long-term signalmen, Footplate Inspectors who had all been drivers, and the people who worked in the Divisional Manager's signalling works office had largely started as signalmen or in places like the timetable section. Signalling designers and testers started as either probationers in the wages grades or engineering students in the salaried grades. The latter was no boil-in-the-bag job: we spent five years before being let loose unsupervised, starting with learning how to put up a signal (as the junior in the gang you got to dig the hole if you wanted to get respect in future), set up points, wire circuits, cut locking for frames etc.

 

In the early 1980s I was involved in producing documents which became a regional standard on specifying operational and signalling design requirements for projects which were later translated into BR standards. By 1992 we even had a manual on how to specify and implement infrastructure projects. Railtrack binned it on day one.

 

Privatisation of operations only made things worse. I had a traction man from a FOC at a Junction Risk Workshop I was running for a 125mph line. He said he wasn't interested in route knowledge as his trains only ran at 60mph, so if we were sighting for 125mph it wasn't necessary. He wasn't in the job by the second meeting; wonder who put a spoke in his wheels.

 

I was only too glad to get out and just act as a casual hired hand as soon as I was able to take an early pension.

The travails of the UK's private railway experiment are well documented.

 

However, there are generic issues also at play. In the past, many "trades" were largely learnt "on the job"; the system employed enough people that you could "afford" many grades, and hence the time to transition people from grade to grade as they acquired knowledge and experience, in addition to passing theory exams etc.

 

This method of training wasn't unique to railways; it also existed in the maritime industry.

 

However, two things have changed, in the western world anyway. One is the extraordinary cost of labour, especially skilled or specialist knowledge labour. The second is the requirement for very specialist knowledge in itself: take "applied IT" like embedded systems - when I entered electronics this was a fairly small branch of the main profession, and many EE engineers could easily transition into the sector.

 

Today, embedded systems is a huge sector of electronics, requiring a mix of electronics and software skills; it is very specialised, with good people commanding high salaries.

 

The net effect of this is that companies all over the developed world have labour cost reduction as a core policy. You simply have to in order to survive.

 

Hence, as more and more technology is deployed, largely to reduce dependence on human labour, so the old ways of learning "on the job" become largely untenable. Companies hiring skilled people can't afford to have them digging holes for signal posts, hence the tendency to get "experts" with less than enough "hands on" knowledge.

 

It's the same reason the apprentice system has largely collapsed.

 

Of course, as a system designer, I will always argue that I am charged with the "big picture", i.e. my design must deliver the expected productivity, cost savings, operational advantages, lower maintenance, etc., blah, etc.

In some cases, in fact in many cases, that means "telling" the operational layer "it will now be done like this".

 

Ultimately, technology will remove many of these operational level posts, so the human at the coalface will be increasingly ignored, until ultimately replaced. Hence the push towards increasing automation and driverless cars, trains, planes etc. What you've seen to date is only a shadow of what's "coming down the tracks". A computer doesn't mind being delayed at 2am in the snow!!!

 

Of course, any system designer must take user feedback into consideration. In the case of the Cambrian ERTMS, my understanding is that it's a pilot installation, so it's not surprising it has issues - that's the whole point of a pilot; if it was perfect, you wouldn't need a pilot.


RAIB says "The RAIB has decided to undertake an independent investigation because to date, the signalling system supplier has not identified the cause of the failure."

 

It would seem the delay is because the system manufacturer was given time to explain what went wrong. Obviously if they had come back with an explanation and fix, there would be no need for much more investigation on site. It would be up to them to explain the failure of their design and testing.

 

Martin.

 

So the system supplier hasn't answered; right, perfectly clear, and obviously a need for further outside involvement. BUT the most important thing surely is the fact that a wrong side failure was allowed to happen in the first place. I can't find, and haven't received, any recent amendments to the Rule Book which reflect that particular procedural hole being stopped up (maybe someone else, Paul perhaps, has?) and that strikes me as umpteen times more important than the system supplier giving an answer (important though that is).

 

PS Just in case anybody asks I do have a full copy of the national Rule Book and am on the distribution list for all amendments.


On the face of things, the incident sounds similar to the situation where an SSI / Smartlock / Westlock interlocking has required a 'hard reboot' (either through a power outage, or the signaller pressing the 'Emergency All Signals In' button that has the same effect) and where all the technicians' controls (that's route bars, aspect restrictions, points locked one way, etc.) have to be re-entered manually and which do not appear on an NX control panel.

 

 

The issue boils down to hot or cold standby systems. Hot standbys maintain an exact duplicate of the state of the primary system; cold standbys merely have a snapshot in time, and anything added between snapshots must be reapplied when the standby is activated.

 

This still leaves the issue of how to update the original primary source when the hot standby takes over and has changes applied to it; the original unit may not even be functional at this point. Database designers deal with this issue on a regular basis, for example with transaction logging, roll-forward and roll-back systems etc. It's a very complex area. Flight computers largely deal with it by a form of "voting".
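As a rough illustration of the snapshot-plus-log idea (a generic sketch, not how any real interlocking, RBC or database product does it), a cold standby holding only a snapshot can be brought up to date by rolling forward through a log of the changes made since that snapshot:

```python
# Generic sketch of roll-forward recovery: restore the last snapshot, then
# replay every logged change made after it. Nothing here is specific to any
# real signalling product; all names and values are invented.

snapshot = {"tsr_list": ["TSR-0142"], "version": 41}

change_log = [
    (42, "add_tsr", "TSR-0147"),
    (43, "add_tsr", "TSR-0153"),
    (44, "remove_tsr", "TSR-0142"),
]

def roll_forward(state, log):
    """Apply, in order, every logged change newer than the snapshot."""
    for version, action, tsr in log:
        if version <= state["version"]:
            continue                      # already captured in the snapshot
        if action == "add_tsr":
            state["tsr_list"].append(tsr)
        elif action == "remove_tsr":
            state["tsr_list"].remove(tsr)
        state["version"] = version
    return state

recovered = roll_forward(dict(snapshot, tsr_list=list(snapshot["tsr_list"])), change_log)
print(recovered)   # {'tsr_list': ['TSR-0147', 'TSR-0153'], 'version': 44}
```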

 

At the end of the day, these complex systems are difficult to engineer and hence cost money, so decisions to lower cost tend to affect the ability of systems to respond comprehensively to failures or errors. It's all a trade off.


So the system supplier hasn't answered; right, perfectly clear, and obviously a need for further outside involvement. BUT the most important thing surely is the fact that a wrong side failure was allowed to happen in the first place. I can't find, and haven't received, any recent amendments to the Rule Book which reflect that particular procedural hole being stopped up (maybe someone else, Paul perhaps, has?) and that strikes me as umpteen times more important than the system supplier giving an answer (important though that is).

 

PS Just in case anybody asks I do have a full copy of the national Rule Book and am on the distribution list for all amendments.

Wrong side failures have been happening for a long time; see the Abbots Ripton accident.


There should never really be a methodology for restarting computer systems, because there is always the possibility of a power cut that will not respect any methodology, most especially in linked systems.

 

Safety critical systems should be robust enough to withstand such things and recover safely, and nowadays, thanks to data regulation (which you don't really want to fall foul of), even trivial systems can be deemed safety critical.

 

Standard testing for large systems: hit them with the off switch then see what happens. You'd be amazed how often developers have neglected to consider such eventualities.

I'm afraid what you suggest is pie in the sky; many complex systems have to be restarted in a certain order, and power failures have to be mitigated so that they do not occur when the system cannot handle them (this is as true of a Raspberry Pi as anything else).

 

Particularly in distributed systems, sequencing restarts can be a huge technical issue.
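To make the sequencing point concrete (a generic sketch, with made-up component names that have nothing to do with the real Cambrian equipment), restart order in a distributed system is essentially a dependency problem, solvable with a topological sort of "must be running first":

```python
from graphlib import TopologicalSorter

# Generic sketch with made-up component names: each item lists what must be
# running before it can be restarted; graphlib yields one valid restart order.

depends_on = {
    "object_controllers": {"power"},
    "comms_network":      {"power"},
    "interlocking":       {"power", "object_controllers"},
    "radio_block_centre": {"interlocking", "comms_network"},
    "signaller_terminal": {"radio_block_centre"},
}

restart_order = list(TopologicalSorter(depends_on).static_order())
print(" -> ".join(restart_order))
# e.g. power -> object_controllers -> comms_network -> interlocking -> ...
```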


It may be that I have no idea what I'm talking about, but it seems to me from Big Jim's [and others'] descriptions of ERTMS that it could readily be modified to allow driverless trains to run on all programmed mainlines. After all, if the system knows what the stopping, starting and braking characteristics of any given set of wheels on a given set of rails are, then presumably drivers would become redundant idc. Such a constrained system would also probably mean the end of non-ERTMS fitted prime movers/rolling stock such as steam locos presumably?
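For what it's worth, the premise about braking characteristics is genuine - ETCS does supervise braking curves - but the sketch below is a deliberately over-simplified illustration of the idea (constant deceleration, no gradients, no brake build-up time or safety margins), not the real ETCS braking model:

```python
# Back-of-envelope sketch only: with a known braking rate, the distance needed
# to come down from the current speed to a target speed is v^2/(2a) territory.
# Real ETCS braking curves are far more involved, so treat this purely as an
# illustration of "the system knows the braking characteristics".

def braking_distance_m(speed_kmh, target_kmh, decel_ms2):
    v0 = speed_kmh / 3.6          # convert km/h to m/s
    v1 = target_kmh / 3.6
    return (v0**2 - v1**2) / (2 * decel_ms2)

# A train at 120 km/h braking at 0.7 m/s^2 to a stand:
print(round(braking_distance_m(120, 0, 0.7)))   # roughly 794 m
```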


Wrong side failures have been happening for a long time; see the Abbots Ripton accident.

 

Agreed - but then everything in the machine/human interface on the railway in Britain has for many years been planned to reduce that to the minimum and eliminate it wherever possible. In this case, with a near brand new system, I get the impression that the procedural check to minimise the potential for certain types of wrong side failure (such as this one) does not exist; it certainly isn't in the relevant module of the Rule Book, which is where anything concerning operational safety should be. There are very clear Rules which relate to the disconnection and reconnection of just about every sort of signalling equipment that can impact on operational safety, yet as far as I can trace they don't apply to some parts of ERTMS, despite the potential safety impact being almost as great as that relating to signals (or their equivalent) themselves.


It may be that I have no idea what I'm talking about, but it seems to me from Big Jim's [and others'] descriptions of ERTMS that it could readily be modified to allow driverless trains to run on all programmed mainlines.

No, it really couldn't.


I'm afraid what you suggest is pie in the sky; many complex systems have to be restarted in a certain order, and power failures have to be mitigated so that they do not occur when the system cannot handle them (this is as true of a Raspberry Pi as anything else).

 

Particularly in distributed systems, sequencing restarts can be a huge technical issue.

 

Horses for courses; I would argue that for safety critical systems something better than a checklist for switching outed systems back on is required.

 

Perhaps a bank could get away with that approach (I know some who do), but that is only until it causes, say, a security breach and the s**t hits the fan. OK, no one is going to die, but after the horror show known as national media headlines there will be more than a few left staring at the carnage who maybe wish they had.


It may be that I have no idea what I'm talking about, but it seems to me from Big Jim's [and others'] descriptions of ERTMS that it could readily be modified to allow driverless trains to run on all programmed mainlines. After all, if the system knows what the stopping, starting and braking characteristics of any given set of wheels on a given set of rails are, then presumably drivers would become redundant idc. Such a constrained system would also probably mean the end of non-ERTMS fitted prime movers/rolling stock such as steam locos presumably?

 

They will not get rid of the driver; it would breach a fundamental principle, the principle of having someone to blame when it all goes wrong!


They will not get rid of the driver; it would breach a fundamental principle, the principle of having someone to blame when it all goes wrong!

Nah, blame the programmer.

 

The driver will go some day.

 

A lot of "Improvements" these days have a feel of wanting high tech for the sake of it than to solve any problems that really need solving (I'm talking entirely generically here). The bile towards those unimpressed with them from some quarters is disturbing.

