Energy managers (and quality managers) know the infamous PDCA cycle that forms the basis of ISO 50001. The idea is to continuously improve how a given facility is operated. You plan improvements (PLAN), carry them out (DO), measure their impact (CHECK), and then take management decisions (ACT). In a digitized world, one would expect data to be at the core of this process, and most decisions and actions within this cycle to be automated. In reality, we could not be further from that idea.
Over the last year, we talked to energy managers, consultants, and innovators working with energy providers and in the packaging, paper, steel, and even automotive industries. When asked what their main challenge was, they all named one thing:
“Working with IoT data is frustrating, time-consuming and hardly leads to results.”
To carry out the PDCA process efficiently, one needs data from within one’s company: the energy demands of plants, machines, and transport processes, as well as production volumes and quality standards. All of that is necessary to monitor a company’s (energy) performance. In reality, however, this data still does not seem to be available to those who need it.
When asked about their real-life experiences, they describe a dire situation. Depending on how familiar you are with your company’s structure and the people within it, you might find data and information fast or never. The data that you do receive is often of poor quality, which makes data-driven decision-making a gamble.
How to spend two months calculating the CO2 footprint of a machine park
Here is one specific use case to illustrate all that. We worked together with a local energy provider who told us how they failed at a seemingly simple task. They tried to use power measurements stored in their database to calculate the CO2 footprint of their machines. From a domain perspective, this is straightforward: integrate all power measurements over a given period (e.g. the year 2020) to arrive at the overall energy demand, then multiply this number by the specific CO2 emission factor (the CO2 emitted per unit of energy).
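To make the calculation concrete, here is a minimal sketch of the two steps in Python. It is an illustration and not the energy provider's actual method: the function names, the sample values, and the emission factor of 0.4 kg CO2 per kWh are assumptions chosen for this example.

```python
from datetime import datetime, timedelta


def energy_kwh(samples):
    """Integrate (timestamp, power_kw) samples using the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        hours = (t1 - t0).total_seconds() / 3600.0
        total += 0.5 * (p0 + p1) * hours
    return total


def co2_footprint_kg(samples, emission_factor_kg_per_kwh):
    """Multiply the integrated energy demand by the emission factor."""
    return energy_kwh(samples) * emission_factor_kg_per_kwh


# Hypothetical data: a constant 100 kW load, sampled every 15 minutes for one hour.
start = datetime(2020, 1, 1)
samples = [(start + timedelta(minutes=15 * i), 100.0) for i in range(5)]

# The emission factor is an assumed value for illustration only.
footprint_kg = co2_footprint_kg(samples, emission_factor_kg_per_kwh=0.4)
print(f"{energy_kwh(samples):.1f} kWh, {footprint_kg:.1f} kg CO2")
```

The arithmetic is the easy part. The sketch only works because the unit (kW) and the sampling interval are known, and because every value is plausible.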
In reality, however, this was much harder than anticipated. First, Martin, the domain expert tasked with this analysis, had to retrieve the data from the database. He had to find the person who had (technical) access to it and explain to this person what kind of data he was looking for. Doing this himself was not an option: he was neither allowed to access the database, nor did he have the necessary IT knowledge (SQL) to do so.
A few weeks and several e-mails later, he finally received the data he was looking for in the form of .csv files: 360 columns representing temperatures, thermal power measurements, electrical power measurements, and so on, sampled every 15 minutes for the past 3 years (~100,000 rows). No, this is not big data, but it is way more than the tool he was using (MS Excel) was designed for.
Thanks to a relatively expressive naming scheme, he found the power measurements he was looking for. Any other necessary information, such as the unit (was the value measured in megawatts, kilowatts, or kilowatt-hours per 15 minutes?), was missing, however. Again, Martin had to pick up the phone and call his IT colleague. Unfortunately, the colleague could not answer Martin’s questions. He was not from the domain and forwarded the request to another person who was familiar with the sensors themselves.
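Why the missing unit matters so much can be shown with a short sketch. The unit labels and the helper `to_kw` are hypothetical, and the 15-minute interval is taken from the sampling rate mentioned above. The same raw number leads to very different power values depending on the unit:

```python
INTERVAL_HOURS = 0.25  # 15-minute sampling interval


def to_kw(value, unit):
    """Convert a raw reading to average power in kW (hypothetical unit labels)."""
    if unit == "kW":
        return value
    if unit == "MW":
        return value * 1000.0
    if unit == "kWh/15min":
        # Energy per interval divided by the interval length gives average power.
        return value / INTERVAL_HOURS
    raise ValueError(f"unknown unit: {unit}")


raw_reading = 50.0
for unit in ("kW", "MW", "kWh/15min"):
    print(unit, to_kw(raw_reading, unit))
```

A reading of 50 could mean 50 kW, 50,000 kW, or 200 kW. Without the metadata, the final footprint can be off by a factor of four or even a thousand.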
Almost there, he thought. But when he plotted the data, he saw values that could not possibly be right.
These are called “outliers”. The data indicated that some machines had, at some point, not only provided electrical energy instead of consuming it (negative values), but had also provided more than an average power plant would. This is not only unrealistic but would also render the results of Martin’s calculation pointless. So, Martin cleaned the data manually. In Excel.
In the last step, Martin integrated the power measurements manually to finally arrive at the result he was looking for: 466 t of CO2 per year. But after all that, he had no trust whatsoever in this result. The question was not IF this number was wrong, but rather HOW wrong it was.
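A simple plausibility check of this kind could look like the sketch below. It is not what Martin did (he worked in Excel). The function name, the readings, and the rated power of 250 kW are assumptions for illustration. The rule it encodes is the one described above: a consumer cannot have negative demand, and it cannot draw more than its rated power.

```python
def split_outliers(power_kw, rated_power_kw):
    """Separate plausible readings from outliers, keeping their positions."""
    plausible, outliers = [], []
    for index, value in enumerate(power_kw):
        if 0.0 <= value <= rated_power_kw:
            plausible.append((index, value))
        else:
            outliers.append((index, value))
    return plausible, outliers


# Hypothetical readings in kW; the rated power is an assumed machine property.
readings = [120.0, 118.5, -40.0, 121.0, 950000.0, 119.5]
plausible, outliers = split_outliers(readings, rated_power_kw=250.0)
print("outliers:", outliers)
```

Keeping the positions of the rejected values makes it possible to report how much of the series was discarded, which is exactly the information Martin lacked when he had to judge how wrong his result might be.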
What we miss out on: saving energy and improving efficiency. Another example.
This example is taken from the research project BaMa (Balanced Manufacturing). One of the many use cases that this project covered was a collaboration between Infineon Technologies and researchers from TU Wien. Their common goal was to optimize the operation strategy of industrial chillers at one of Infineon’s manufacturing plants.
Semiconductor manufacturing takes place in cleanroom conditions. In cleanrooms, temperature and humidity need to be kept within narrow limits. To maintain these conditions, industrial chillers are used. These machines are similar to your fridge at home: they use electrical energy to provide chilled water. Using IoT data such as temperature and power measurements, we could reduce the overall energy demand by 20%, which translated to ~250k€ in cost savings. No additional hardware was required.
For each chiller, historic sensor data was used to estimate characterizing functions and in turn describe the behavior of each of those machines. They built a digital twin of the facility.
Using this digital twin, for each day, they could simply simulate which chiller configuration would need the least electrical energy to be operated. By switching on the best performing chillers first and the worst performing last, we came up with significant saving potential.
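The selection step can be sketched as follows. This is a strongly simplified illustration and not the BaMa project's actual model: the three chillers, their capacities, the linear COP curves (coefficient of performance as a function of part-load ratio), and the assumption that running chillers share the load in proportion to their capacity are all invented for this example.

```python
from itertools import combinations


def linear_cop(full_load_cop):
    """Hypothetical characteristic function: COP drops at low part load."""
    return lambda part_load_ratio: full_load_cop * (0.6 + 0.4 * part_load_ratio)


# Hypothetical chillers: name -> (cooling capacity in kW, COP function).
CHILLERS = {
    "A": (500.0, linear_cop(6.0)),
    "B": (500.0, linear_cop(4.0)),
    "C": (300.0, linear_cop(5.0)),
}


def electrical_power_kw(config, cooling_demand_kw):
    """Electrical power of a configuration, or None if capacity is too small."""
    total_capacity = sum(CHILLERS[name][0] for name in config)
    if total_capacity < cooling_demand_kw:
        return None
    # Assumption: all running chillers operate at the same part-load ratio.
    ratio = cooling_demand_kw / total_capacity
    return sum(
        CHILLERS[name][0] * ratio / CHILLERS[name][1](ratio) for name in config
    )


def best_configuration(cooling_demand_kw):
    """Simulate every feasible configuration and return the cheapest one."""
    candidates = []
    names = sorted(CHILLERS)
    for size in range(1, len(names) + 1):
        for config in combinations(names, size):
            power = electrical_power_kw(config, cooling_demand_kw)
            if power is not None:
                candidates.append((power, config))
    return min(candidates)  # raises ValueError if no configuration suffices


power_kw, config = best_configuration(cooling_demand_kw=400.0)
print(config, round(power_kw, 1))
```

With these invented numbers, running the most efficient chiller alone beats every combination. In a real facility, the characteristic functions would be fitted to historic sensor data, which is where most of the effort went.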
As you might expect by now, however, this also took ages: 5,000 hours, to be precise. The researchers relied heavily on monitoring data and ended up spending almost 70% of their time on accessing and exchanging data between researchers, domain experts, and the IT department.
They encountered the very same problems as Martin in the example above:
- no direct data access for domain experts
- data exchange via .csv files
- domain-specific knowledge (e.g. units, sampling rates, exact locations of sensors) had to be communicated via e-mail
- Excel as the main “analysis tool”, as no other tool was both usable by non-IT people AND up for the challenge at hand
Remember, this was a research project. They had to spend a lot of time defining methods and goals, yet all of this was done in just 30% of the overall time. If they wanted to do the same thing at a different location (e.g. a different manufacturing plant), they would therefore only save those 30%. The vast majority of the work would have to be done again, with no chance to accelerate it whatsoever. So, despite a rather significant saving potential, they collectively came to the conclusion that they would not roll out this approach.
The reason: Introducing the Semantic Chasm
These are just two of many examples. After a decade of digitization efforts, data is abundantly available, yet still hardly ever used. 70% of IoT data remains unused. 80% of data science project costs are spent on data access, cleaning, and preprocessing. With an average annual investment volume of 250,000 € in “Industrie 4.0”, this costs industrial companies dearly.
So what, exactly, is the problem at hand? Why did Martin have to spend so much time to arrive at a result that he could not trust in the end? At its core, the problem is not a technical one. It is much more about communication. Just like in the story of the Tower of Babel, where different languages prevented mankind from finishing an ambitious project, the problem here is that IT people, domain experts from different fields, and data scientists speak different “languages”. Often without realizing it.
There lies a gaping chasm between those who have data and those who want to use it. This chasm exists due to an inability to communicate efficiently between different domains: a Semantic Chasm.
The left-hand side of the chasm represents those who have data. In our first example, this was Martin’s IT department. In the second example, it was the semiconductor manufacturer. Here, an abundance of data is available, distributed across different silos. Whenever anybody wants to use it, they need to access these silos, understand them, and connect them in a useful manner. Remember, in our examples, this data was IoT data, but similar things can be said about data from ERP, MES, or other enterprise data sources.
On the right-hand side, there are those who know what to do with data. In our first example, this was Martin, who knew how to calculate the CO2 footprint based on power measurements. In our second example, these were the researchers fitting data models to historic IoT data. They all had an abstract notion of what they were looking for (power measurements, chilled water temperatures, …) but had to connect it with actual data to provide value. There are myriads of solutions like this: in-house scripts that some motivated person developed, but also third-party solutions sold by software vendors.
On the left-hand side of the chasm, where industrial companies have data, the following points keep the chasm alive:
- Lack of IT knowledge: Use cases like those that we discuss here are motivated by engineers, not IT experts. They are the ones that know what could be feasible and worth doing. They know what to do with available data but hardly ever have the necessary knowledge to access and process it. So they have to involve (internal or external) IT experts to execute their ideas.
- Inflexible infrastructure: Both examples mentioned here illustrate how data that was stored with a particular use case in mind was used in a different way a few years after the original system was first set up. This causes a lot of friction with current industrial IT infrastructure, which is often a monolith built with a single, defined goal in mind. But just like manufacturing itself, IT in manufacturing needs to become more flexible in order to react to faster-changing user requirements.
- Heterogeneous data sources: The examples mentioned here are rather simple. All data used in both cases was always stored in just a single database. But in reality, for sufficiently complex use cases, data sources are heterogeneous and distributed. Just think about the semiconductor manufacturer in example two. If they could combine data from all their chillers worldwide, they would potentially end up with much better digital twins and therefore even better savings. But to this end, data from various data sources at different locations would have to be integrated.
On the right-hand side of the chasm are tools and their developers. On their side, the following factors keep the chasm open.
- Lack of domain knowledge: Developers and data scientists are trained to process data, to train machine learning models, or to visualize data in order to make sense of it. They do, however, often lack the domain-specific knowledge required to correctly interpret data and the results of their analysis. This makes it hard for them to improve their tools and, more importantly, address the challenges of their users.
- User-unfriendliness as a business model: This is a hard one, but both consultants and large industrial software vendors seem not to be interested in empowering their customers to develop their own solutions. Consultants are paid to spend time with their customers. Large software vendors lock their customers into proprietary solutions and charge premium rates for necessary changes.
- Standardization is rarely an option: Integrating a new solution into a brownfield environment would already be incredibly challenging with broadly accepted standards in place. But such standards do not seem to be anywhere near. No two factories are really the same. On the research side, RAMI 4.0 is a rather popular approach towards a common understanding of the manufacturing domain, but as of now, it does not seem to be much more than a few presentation slides.
So what can we do to cross the chasm? For a problem this big, there have to be solutions, right?
How (not to) cross the chasm
Let’s have a closer look at the value chain of using data. It starts with generating it with sensors. After that, it is transmitted and stored. Lastly, data is mapped, cleaned, and preprocessed to eventually be used in some sort of data usage step.
Let’s illustrate this with Martin’s case. First, data is generated by a system of sensors distributed throughout his facility. The data is transmitted and then stored in a central, relational database along with thousands of other signals. Our story starts when Martin wants to access this data. Asking the IT department for an export, cleaning outliers, understanding the naming scheme: all of this is part of the “Mapping and Cleaning” stage of this chain. Finally, he calculated the CO2 footprint, a very simple instance of “Data Usage”. The Semantic Chasm as described above seems to lie between the “Mapping and Cleaning” stage and the “Data Usage” stage of this chain.
In a space as crowded as industrial data analytics, there have to be solutions to address this, right? Indeed, there are offers on the market for all of these steps. But there is a significant difference between those that lie within the Semantic Chasm and those on the left-hand and right-hand side of it. To understand this better, and as is good practice, here is a little (non-representative) landscape of these solutions, mapped to the steps that they are supposed to help with.
Businesses on the left-hand side of the Semantic Chasm provide great service for their customers and are getting better at connecting new solutions. But only if these solutions are a part of their respective ecosystem. Extending this ecosystem is, however, only an option for sufficiently large use cases. Neither Martin nor the researchers from our initial examples would have liked to first become partners of any of the large industrial software vendors before being able to experiment with their data.
Businesses on the right-hand side of the Semantic Chasm that want to “scale” fast without relying on intensive consultancy activities need to focus on a particular vertical. Once chosen, they provide a “full-stack” solution (data generation, transmission, and usage, often skipping the mapping and cleaning step altogether) for a very particular niche. They provide everything necessary and don’t really need to integrate existing data. This is great but is only applicable to use cases that can exist isolated in a silo. A very successful example of this approach is Toolsense.
What remains are the logos that are located within the Semantic Chasm. While most logos on this landscape represent software tools of some sort, this part is still dominated by consultancies. We see companies specializing in translating between the data world and the domain. This is a painstakingly slow and expensive business. The chasm is not really closed but rather filled with thousands of hours of manual labor, either by employees within a company or by external experts. But always at a high price.
Just recently, however, No-Code and Low-Code Tools have started gaining exposure in this field. Mendix is one of the few solutions that try to empower domain experts themselves.
Energy management: from nuisance to an enabler of digitization
Energy efficiency and energy management have never had a high priority for most manufacturers. Every once in a while, a CSR report had to be published and sustainability managers, auditors, or consultants tried gathering data and measures from all their colleagues.
This might, however, change faster than most would think. A third of global CO2 emissions come from industrial facilities. If global societies really want to stay true to their commitment to the Paris Agreement, they will not only have to increase penalties for emissions significantly but also be much stricter on companies in this sector.
This is a huge challenge for these companies, and they will not be able to neglect the potential that lies in using already available IoT data. These companies will finally have to cross the Semantic Chasm, and energy management will play a crucial role in it.
Unlike popular “Industry 4.0” use cases such as predictive maintenance, business intelligence, or quality control, energy management applications do not allow for shortcuts. Moreover, they might be a unique way to cross the Semantic Chasm for good. Here is why.
- Energy management needs to be holistic. For energy management to be successful AND data-driven, the Semantic Chasm needs to truly disappear. There is no “energy management department” in any company. Energy management does not have its own silo, but always uses data from other departments. More importantly, energy management needs to combine that data. Energy demand per product sold? There you go: you already need to combine data from facility management, production, and your ERP.
- Energy management has no single “killer application”. One-time consultancy solutions are great if there is a single thing that you want to achieve. Connecting all your quality data with a machine learning application to reduce the necessary number of product tests. Great! But for energy management, there is no single thing that should be done with data. It is rather a huge number of small things like the analysis Martin did in our first example. To significantly reduce the energy demand of an industrial facility, you need many small use cases instead of just a single huge one.
- Energy management might be a common language. As stated above, at its core, the Semantic Chasm is rather a communication problem and not a technical problem. As a consequence, if we want to cross it, industrial facilities need to find a common language. And energy management might be just that. A language that is understood by engineers from different domains as well as business managers.
The bottom line
The most important takeaway of this post for you, dear reader, is the following. If you are…
- a struggling engineer trying to use IoT data: You are not alone. And it’s not you. Current tools are just not built to be easy to use and you have not been prepared to do this.
- an innovation or digitization manager: AI, blockchain, and predictive maintenance are all great. But without fast access to high-quality data, they will not work.
- working on a solution to cross the Semantic Chasm: get in touch. We might have something in common.
Currently, there are only two groups in the industry: those that know about this problem and those that haven’t found out yet. If you made it here, you are most likely part of the first group and might want to consider signing up.