Innovation, complexity, and system engineering

morreale, Monday, November 22, 2010
The November 1/8, 2010 Aviation Week & Space Technology issue on innovation, and its article on revamping system engineering, struck a nerve with me. The article describes high-profile program failures in aerospace and defense, such as the integration issues with the F-35 and the manufacturing issues with the 787, as symptoms of present methods of system engineering. According to the article, systems have become so complex that the Apollo-era approach of breaking a system into subsystems and defining the interfaces between those subsystems is no longer adequate. Further, complexity measurement, elegant design, and graceful performance outside the performance envelope are now important criteria in modern systems design. DARPA has funded the META program to address these issues by developing a modeling language so that companies can model a system before building it. It’s an ambitious project, and I hope it becomes widely available across many industries.

Star Trek and Systems Engineering
So I have to admit that I’m a long-time Trekkie, or Trekker, or whatever it’s called nowadays. I watch Star Trek any time it’s on; I can’t look away. It gave me a constructive model of how all kinds of different creatures can work together, which has influenced my professional work life. Expect this I did not. It was socially progressive for its time, too. The engineering and adventure are my favorite aspects of the show. So, I love the show, but parts of it frustrate me. Scotty, Geordi, and Data, for example, could rig vastly different parts and systems together to perform new functions that the original pieces were never designed to do in the first place. The team could create these systems in almost no time and under high-pressure conditions. The new system would work flawlessly, precisely, and save the planet without any apparent testing beforehand. Anyone on this planet who has done any kind of design, from flower arranging to painting to circuit design to aircraft or auto design, knows that it’s not that easy. Today, the materials or components have to be just right. The resulting systems have a narrow range of operation, and it takes several prototype iterations to get a system to function reliably and precisely. Outside of the intended operating range, present-day systems don’t perform well and are more likely to fail outright than merely underperform. They also don’t perform well when a component fails, and the fault behavior isn’t graceful either (It’s dead, Jim). Designers may spend a lot of time modeling and simulating a design, but models don’t capture everything. The first prototype gives you an idea of the general performance and a start at understanding the dynamic behavior of the system.

Real-World Integration Issue
In the mid-1990s, for example, I worked on a multi-gigabit-per-second multiplexer for a telecom system. The system went into production and then into service. Then we discovered that, every once in a while, bits would get dropped when switching from the operating input to the backup input after the operating input failed (at least 1 out of ~10,000 switches). Since the system rarely switched, thanks to its inherent reliability, it took a while to see this problem. It took about a year of tuning firmware, adjusting Phase-Locked Loop (PLL) bandwidths, and adding a PLL before the problem was solved within our equipment. The solution transcended the subsystems and interfaces that are typically the root of most problems. Then the interoperability tests began with equipment from two other companies. Strangely, the ITU specifications on SONET equipment and timing do not adequately define the dynamic switching behavior of these types of systems. At the beginning of the project, I did not expect this issue, or that it would lead me to spend seven wonderful weeks in France working with an international team to identify the root cause of the dynamic switching failure and find a fix that allowed all three systems to interoperate without fault (no faults in many hundreds of thousands of switches). Even now, it’s not clear how to model, simulate, prototype, integrate, and test for dynamic (or, now, emergent) behaviors in complex systems when you don’t know what those behaviors might be or how they arise across subsystems and interfaces. At the time, the simulation tools and models needed to model the whole system might have cost more than the system itself.
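
To give a flavor of what modeling even one slice of that dynamic behavior looks like, here is a minimal sketch (mine, not the original design): a discrete-time, second-order PLL tracking a reference clock that gets switched to a backup input with a small phase offset, followed by a check of whether the transient exceeds a hypothetical buffer margin, the condition under which bits would slip. The loop bandwidth, damping, phase offset, and margin are all illustrative numbers, not values from the real equipment.

```python
# Minimal sketch: second-order PLL response to a reference switchover.
# All parameters are illustrative, not the real system's values.
import numpy as np

fs = 1e6                      # simulation rate, samples/s
dt = 1.0 / fs
t = np.arange(0, 0.02, dt)    # 20 ms window

# Reference phase: primary input, then backup input with a 0.3 UI offset
t_switch = 0.005
phase_in = np.where(t < t_switch, 0.0, 0.3)   # unit intervals (UI)

# Second-order PLL (proportional + integral loop filter)
wn = 2 * np.pi * 500          # ~500 Hz loop bandwidth (hypothetical)
zeta = 0.7                    # damping factor
Kp = 2 * zeta * wn
Ki = wn ** 2

phase_out = np.zeros_like(t)
integ = 0.0
for k in range(1, len(t)):
    err = phase_in[k - 1] - phase_out[k - 1]
    integ += Ki * err * dt
    phase_out[k] = phase_out[k - 1] + (Kp * err + integ) * dt

# Phase transient seen by downstream logic
transient = phase_in - phase_out
margin_ui = 0.25              # hypothetical elastic-store margin before bits slip
print("peak transient: %.3f UI" % np.max(np.abs(transient)))
print("bits at risk:", np.max(np.abs(transient)) > margin_ui)
```

Even this toy version hints at the trouble: the peak transient depends on quantities (loop bandwidth, buffer depth, switchover phase offset) that live in different subsystems owned by different teams and vendors, which is exactly where the specifications were silent.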

Operating Outside the Envelope
The other challenging issue is designing systems to work gracefully outside of their operating envelope and to fail gracefully. I worked on a hi-rel pump laser power supply around 2004. The circuits were modeled in SPICE to get a first cut at the design. I could not get SPICE models for all the semiconductor devices, so simple substitute models were created as a workaround. Once you identify candidate devices for a design, you study the datasheets. Each vendor has its own way of writing datasheets, and it takes some experience and testing to understand what a datasheet is telling you. The datasheet doesn’t always tell you the sensitivity to various parameters, so the full “personality” of the device can’t be understood from the datasheet alone. Simulating the circuit can include some tests for component failures, but conclusive results can be difficult to obtain because the component models may not capture the fault conditions under test, or SPICE may not converge. I tested the board by examining the behavior of the design when passive components fail open or short, as sketched below. For designs with under a hundred components this may be reasonable, but not for designs with hundreds or thousands of components. The open and short failure-mode tests used the same single-failure methodology that the reliability analysis is usually computed with. No attempt was made to determine what happens if a resistor or capacitor value drifts over time; the number of combinations quickly becomes too large to test in a reasonable time.
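
As a rough illustration of why the open/short approach stops scaling, here is a minimal sketch (hypothetical component counts, not the actual board) that enumerates the single-fault cases and then counts how quickly exhaustive multi-fault or parameter-drift testing blows up.

```python
# Minimal sketch of fault-case enumeration; component names and counts are hypothetical.
from itertools import product
from math import comb

passive_parts = [f"R{i}" for i in range(1, 41)] + [f"C{i}" for i in range(1, 21)]
fault_modes = ("open", "short")

# Single-fault cases: one component at a time, open or short
single_faults = list(product(passive_parts, fault_modes))
print("single-fault cases:", len(single_faults))      # 60 parts x 2 modes = 120

# Exhaustive k-fault cases grow combinatorially
for k in (1, 2, 3):
    cases = comb(len(passive_parts), k) * len(fault_modes) ** k
    print(f"{k}-fault cases: {cases}")

# Parameter drift is worse still: sweeping each of 40 resistors over just
# 5 tolerance points already gives 5**40 combinations for the resistors alone.
print("drift sweep (5 points x 40 resistors):", 5 ** 40)
```

Single faults are tractable, which is why the bench testing stayed within the same single-failure assumption the reliability analysis makes; anything beyond that is out of reach for exhaustive testing.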

Nano Quantum System Engineering
As someone working on a nanodevice, I am thinking about the device design, tools, techniques, methods, verification, and integration of the device into a system. I fall back on the subsystem-interface approach as a matter of reflex, but since nanodevices function at the quantum level, these techniques do not seem adequate or even applicable. I wonder what system design tools that incorporate hundreds or thousands of nanostructures and devices would look like. Perhaps my favorite Star Trek engineers were able to succeed because their tools were so good at making all the pieces work together. Perhaps someone like DARPA will create programs to produce quantum-mechanics-based system design tools for designing complex nanoscale systems. That might get us closer to the future we envisioned in 1966.

Permalink: https://www.p-brane.com/nano/blogpost80-Innovation-complexity-and-system-engineering