IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 36, NO. 1, JANUARY 2006


Process-Data-Warehousing-Based Operator Support System for Complex Production Technologies


Ferenc Peter Pach, Balazs Feil, Sandor Nemeth, Peter Arva, and Janos Abonyi


Abstract—Process manufacturing is increasingly being driven by market forces, customer needs, and perceptions, resulting in more and more complex multiproduct manufacturing technologies. The increasing automation and tighter quality constraints related to these processes make the operator's job more and more difficult. This makes decision support systems (DSSs) for the operator more important than ever before. A traditional operator support system (OSS) focuses only on the specific tasks that are performed. In the case of complex processes, the design of an integrated information system is extremely important. The proposed data-warehouse-based OSS makes it possible to link complex and isolated production units by integrating the heterogeneous information collected from the production units of a complex production process. The developed OSS is based on a data warehouse designed by following the proposed focus-on-process data-warehouse-design approach, which places a stronger focus on the material and information flow through the entire enterprise. The resulting OSS follows the process through the organization instead of focusing on the separate tasks of the isolated process units. For human–computer interaction, front-end tools have been worked out, in which exploratory data analysis and advanced multivariate statistical models are applied to extract the most informative features of the operation of the technology. The concept is illustrated by an industrial case study, where the OSS is designed for the monitoring and control of a high-density polyethylene (HDPE) plant.

Index Terms—Data mining, data warehousing, decision support system (DSS), heterogeneous data integration, human–system interaction, Kalman filter, neural network (NN), process monitoring.

I. INTRODUCTION


PROCESS manufacturing is increasingly being driven by market forces, customer needs, and perceptions, resulting in more and more complex multiproduct manufacturing technologies. It is globally accepted that information is a very powerful asset that can provide significant benefits and a competitive advantage to any organization, including complex production technologies. Since these complex production processes consist of several physically isolated distributed production units, and these production units represent heterogeneous information sources (e.g., real-time process data, product-quality data, financial data, etc.), for the effective operation and improvement of complex technological systems, an integrated information system, including advanced process automation and operator support systems (OSSs), is essential. Such an information system must be able to handle heterogeneities in terms of:

Manuscript received xxxx; revised xxxx. This paper was recommended by Associate Editor xxxx. This paper was supported by the Cooperative Research Center (VIKKK) (Project 2001-II-1), by the Hungarian Ministry of Education (FKFP-0073/2001), and by the Hungarian Science Foundation (T037600). The work of J. Abonyi was supported by the Janos Bolyai Research Fellowship of the Hungarian Academy of Sciences. The authors are with the Department of Process Engineering, University of Veszprem, H-8201, Hungary (e-mail: [email protected]; http://www.fmt.vein.hu/softcomp). Digital Object Identifier 10.1109/TSMCA.2006.XXXXXX


1) type of information, like:
   a) prior knowledge arising from natural sciences and engineering, formulated by mathematical equations;
   b) heuristic empirical knowledge expressed by linguistic rules, stored and handled by expert systems;
   c) sampled and calculated process data;
2) data format:
   a) manually logged data (reports reside in many different file and database structures developed by different vendors);
   b) databases on different platforms and in different formats, including the historical databases of a distributed process control system (DCS);
3) content:
   a) product features, measured in the laboratory;
   b) process features, measured by the DCS.



These items illustrate that complex organizations have vast amounts of heterogeneous data but have found it increasingly difficult to access these data and make use of them. Thus, large organizations have had to write and maintain perhaps hundreds of programs that are used to extract, prepare, and consolidate data for use by many different applications for analysis and reporting. Also, decision makers often want to dig deeper into the data once initial findings are made. This typically requires a more intensive and effective integration of the information sources.

Several works deal with the problem of heterogeneous data(base) integration. Zhao and Ram [42] dealt with the problem of detecting semantically corresponding records from heterogeneous data sources, because this can be a critical step in integrating the data sources. Much research focuses on how data structured in different ways can be handled. When databases, XML files, other structured text files, or web services serve as information suppliers, the complexity of integrating information described by such diverse models is not easy to handle. Different solution methods have been worked out (e.g., in [10] and [38]). In [6], a new object-oriented language, with an underlying description logic, was introduced for information extraction from both structured and semistructured data sources based on tool-supported techniques. Paton et al. [31] presented a framework for the comparison of systems that can exploit

1083-4427/$20.00 © 2006 IEEE


knowledge-based techniques to assist with information integration. Integration of heterogeneous data sources is also related to knowledge discovery and data mining; see, e.g., [15] and [35].

Besides the database integration within a particular production unit, there is a need for information integration at the level of the whole enterprise for the purpose of optimal operation. This task cannot be fully automated; there is a need for permanently improved methods and approaches for the creation, storage, and dissemination of the experience, know-how, and judgment embedded in an organization [40]. Since this solution cannot be fully automated, it is costly, inefficient, and very time consuming.

The aim of this paper is to illustrate that data warehousing offers a better approach. Data warehousing implements the process to access heterogeneous data sources; clean, filter, and transform the data; and store the data in a structure that is easy to access, understand, and use. The data are then used for query, reporting, and data analysis to extract relevant information about the current state of the production, and to support the decision-making process related to the control and optimization of the operating technology.

Since the increasing automation and tighter quality constraints related to production technologies make the employees' (e.g., the operators') job more and more difficult, the main function of the information-integration methods must not only be data analysis and mining but also the support of human–system interaction. Hence, the goal of the research and design project is the development of an information system, termed OSS, that can handle heterogeneous information sources and process models, and that contains several data-mining methods to make the operators' job easier, reducing the cost of operation and production.

The contribution of this paper is the development of a cost-effective OSS based on a data warehouse designed with the help of enterprise and process-modeling tools. For human–computer interaction, front-end tools have been worked out, which support the monitoring of the process and the prediction of product quality.

The traditional OSS focuses only on specific tasks that are performed. In the case of complex processes, the design of an integrated information system is extremely important. The proposed focus-on-process development approach means a stronger focus on the material and information flow through the entire enterprise, where the OSS follows the process through the organization instead of focusing on the separate tasks of the isolated process units. This also means that most of the information moves horizontally within the organization, thus requiring a higher degree of cooperation and communication across the different divisions in the plant [24], [27].

This paper proposes the integration of heterogeneous historical data taken from the various production units into a data warehouse, with a focus on the specialties of the technology.

The paper is organized as follows. In Section II-A, the main issues of the problem description and motivation are described. Section II presents the main features and the structure of the proposed information system. It is shown that the proposed data-warehouse-based OSS and the traditional DCS-based OSS are not alike; the proposed OSS has data-warehouse features:
124 The contribution of this paper is the development of a cost125 effective OSS based on a data warehouse designed with the help 126 of enterprise and process-modeling tools. For human–computer 127 interaction, front-end tools have been worked out, which sup128 port the monitoring of the process and prediction of product 129 quality. 130 The traditional OSS focuses only on specific tasks that are 131 performed. In the case of complex processes, the design of an 132 integrated information system is extremely important. The pro133 posed focus on process-development approach means stronger 134 focus on the material and information flow through the entire 135 enterprise, where the OSS follows the process through the 136 organization instead of focusing separate tasks of the isolated 137 process units. This also means that most of the information 138 moves horizontally within the organization, thus requiring a 139 higher degree of cooperation and communication across the 140 different divisions in the plant [24], [27]. 141 This paper proposes the integration of heterogeneous histor142 ical data taken from the various production units into a data 143 warehouse, with focus on the specialties of the technology. 144 The paper is organized as follows. In Section II-A, main 145 issues of problem description and motivation are described. 146 Section II presents the main features and the structure of the 147 proposed information system. It is presented that the proposed 148 data-warehouse-based OSS and the traditional DCS-based OSS 149 are not alike; the proposed OSS has data-warehouse features: 92

93 94

Fig. 1. Three-level model of the performance of a skilled human operator.


It contains only consistent, nonvolatile, preprocessed, historical, and integrated data for analysis, and it operates separately from the databases of the original information system. This section also shows that the design of this data warehouse should be based on the synchronization of the events related to the heterogeneous information sources, which requires understanding of the material, energy, and information flows between the production units of the technology. In Section III, we show a detailed case study where the goal of the OSS is the support of polyethylene production in a factory of TVK Ltd. The designed process data warehouse can be used not only for generating reports and executing queries, but it also supports the analysis of historical data, process monitoring, and data-mining applications. As an example of how these functions can be used in real-time process management, the developed semimechanistic model of the polymerization unit, which integrates first-principle models and neural networks (NNs) for product-quality estimation, is presented. The results confirm the effectiveness of the proposed structure. Conclusions can be found in Section IV.

II. PROCESS-DATA-WAREHOUSING AND MINING-BASED OSS

A. Motivation


The increasing automation and tighter quality constraints related to production processes make the operator's job more and more difficult. The operators in the process industry have many tasks, such as keeping the process conditions as close as possible to a given operating point, preserving optimality, detecting failures, and maintaining safety. The more heterogeneous the units are, the less transparent the system is. Hence, there is a need for an integrated information system that solves these problems and supports the operators' work.

Fig. 1 shows a three-level model of the performance of skilled operators. As this scheme suggests, there is a need for an OSS that presents intuitive and essential information about what is happening, in order to avoid operator mental overload, and that gives suggestions according to the operator's experience and skills [16], [23], [24]. Hence, the OSS of complex processes should be a combination of information systems, mathematical models, and algorithms aimed at extracting relevant information (signs, e.g., process trends and symbols) to "ease" the operator's work.


In the following, the main elements of this kind of system are described.

Many works have dealt with OSSs, whose aims can be the assessment of system reliability [11], real-time process monitoring [12], or fault diagnosis [39]. Human trust plays an important role in influencing the operator's strategies toward the use of automated systems. This is confirmed in [19], where a study was conducted to measure the effect of human trust in a hybrid inspection system given different types of errors (i.e., false alarms and misses).

In modern industrial technologies, the existence of a DCS is a basic requirement. This system is responsible for the safe operation of the technology at the local level. At the coordination level of the DCS, many complex tasks are handled, such as controller tuning, process optimization, model identification, and error diagnostics. These tasks are based on process models. As new products are required to be introduced to the market over a short time scale to ensure competitive advantage, the development of process models necessitates the use of empirical techniques as opposed to first-principle models, since phenomenological model development is unrealizable in the time available [23]. Hence, the mountains of data that computer-controlled plants generate must be effectively used. For this purpose, most DCS systems are able to store operational process data. However, the DCS has limited storage capacity because this is not its main function; only data logged in the last one or two months are stored in these computers.
Since data measured over a longer time period have to be used for sophisticated process analysis, quality control, and model building, it is expedient to store data in a historical database that makes it easy to query, group, and analyze the data related to the production of different products and different grade transitions. Today, several software products on the market provide the capability of integrating the historical process data of DCSs, e.g., Intellution i-Historian [8], Siemens SIMATIC [36], the Fisher–Rosemount PlantWeb system [14], and the Wonderware FactorySuite 2000 MMI software package [3].

As will be described in depth in the next section and in the case study, there are several heterogeneous information sources that have to be integrated to support the work of operators with relevant, accurate, and useful information. In the case of process systems, standard data-warehousing and online analytical processing (OLAP) techniques are not always suitable for this purpose, because the operating units in the process industry have significant dynamical behavior that requires special attention, contrary to the classical static business models. The source of this dynamical behavior is the dynamical effect of the transportation and mixing of material flows. The residence times of the process units used for the mixing and transportation of the raw materials and products represent a time-varying time delay that should be handled in the synchronization of data taken from different process units. Furthermore, in the case of the chemical industry, special attention should be given to the effect of the transformation of the materials, i.e., the model of the chemical reaction should also be incorporated into the "business model" of the data warehouse.
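To make the delay-synchronization problem concrete, the following sketch (a toy example under simplifying assumptions, not part of the described system) aligns an upstream composition measurement with a downstream quality measurement by back-shifting the time stamp by one residence time, estimated as vessel volume divided by the current volumetric flow rate. All names and values (`align_with_delay`, the sample records) are hypothetical.

```python
# Toy sketch: synchronize upstream and downstream time series when the
# transport delay is time varying (delay = vessel volume / current flow).
# All names and numbers are illustrative assumptions, not plant data.

def residence_time(volume_m3, flow_m3_per_h):
    """Approximate plug-flow residence time in hours."""
    return volume_m3 / flow_m3_per_h

def align_with_delay(upstream, downstream, volume_m3):
    """For each downstream record, look up the upstream value that was
    measured one residence time earlier (nearest-sample lookup).

    upstream:   list of (time_h, value)
    downstream: list of (time_h, flow_m3_per_h, quality)
    returns:    list of (upstream_value, quality) pairs ready for analysis
    """
    pairs = []
    for t, flow, quality in downstream:
        t_source = t - residence_time(volume_m3, flow)
        # nearest upstream sample to the back-shifted time stamp
        value = min(upstream, key=lambda s: abs(s[0] - t_source))[1]
        pairs.append((value, quality))
    return pairs

upstream = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0), (3.0, 16.0)]
downstream = [(2.0, 5.0, 0.91), (3.0, 10.0, 0.93)]  # flow doubles, delay halves
print(align_with_delay(upstream, downstream, volume_m3=10.0))
# -> [(10.0, 0.91), (14.0, 0.93)]
```

The point of the sketch is only that the join key between process units is not the raw time stamp but a flow-dependent, back-shifted one.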
Since process-engineering systems rarely operate in a steady-state manner (process transitions and product changes occur frequently), and the control and monitoring of these dynamical transitions are the most critical tasks of the process operators, the synchronization of data taken from the heterogeneous information sources of process systems requires dynamical process models.

These dynamic qualities of the process units and the related data sources make it unsuitable to simply apply standard data-warehousing and OLAP techniques. Hence, as will be presented in the following section, the integration of the historical databases of DCSs into OSSs is not only a technical problem; in this process, the special features of the technology have to be taken into account.

B. Data Warehouse in OSS


The traditional OSS focuses only on specific tasks that are performed. In the case of complex processes, the design of the integrated information system is extremely important. This is especially true at the higher levels of control of the process and enterprise. For example, to monitor and control the quality of the production (better product quality, less environmental pollution, less off-specification product, etc.), data taken from different units of the production line (e.g., the quality of the raw materials, the operating parameters of the reaction system, and product-quality measurements) have to be analyzed. Since there is a strong but complex and dynamically changing dependence among these data, there is a need to improve the classical functions and models of standard data warehouses to help the work of operators.

The proposed focus-on-process approach means a stronger focus on the material and information flow through the entire enterprise, where the OSS follows the process through the organization instead of focusing on the separate tasks of the isolated process units. This also means that most of the information moves horizontally within the organization, thus requiring a higher degree of cooperation and communication across the different divisions in the plant [24], [27].

This paper proposes the integration of heterogeneous historical data taken from the various production units into a data warehouse, with a focus on the specialties of the technology. The resulting model-based information system contains only consistent, nonvolatile, preprocessed historical data [17], and it works independently of the DCS. This type of data warehouse has the features presented in Table I, and it is called a process data warehouse.
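The consolidation step described above can be sketched as a small extract-transform-load routine (a hypothetical illustration, not the actual warehouse loader): a DCS process sample and a laboratory quality record are joined on a shared batch identifier, the unit-of-measure difference is reconciled, and one consistent fact row is emitted. All field names, units, and values are assumptions.

```python
# Toy ETL sketch: consolidate a DCS process sample and a laboratory
# quality record into one consistent "fact" row of the process data
# warehouse. Field names, units, and values are illustrative assumptions.

def to_celsius(value, unit):
    """Reconcile the unit-of-measure differences between source systems."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError("unknown temperature unit: " + unit)

def build_fact_row(dcs_record, lab_record):
    """Join records on batch id and emit a cleaned, unified row."""
    assert dcs_record["batch"] == lab_record["batch"], "batch ids must match"
    return {
        "batch": dcs_record["batch"],
        # process feature from the DCS, converted to a common unit
        "reactor_temp_C": round(to_celsius(dcs_record["temp"],
                                           dcs_record["temp_unit"]), 2),
        # product feature measured in the laboratory
        "melt_index": lab_record["MI"],
    }

dcs = {"batch": "B-101", "temp": 185.0, "temp_unit": "F"}
lab = {"batch": "B-101", "MI": 0.35}
print(build_fact_row(dcs, lab))
# -> {'batch': 'B-101', 'reactor_temp_C': 85.0, 'melt_index': 0.35}
```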
The fact that the integration of historical process data taken from various production units requires data-warehousing functions has already been realized by the developers of modern DCSs, e.g., in the Honeywell process-history-database module or the i-Historian module of Intellution. This process data warehouse can be the basis of the supporting information systems (e.g., the OSS) that give the most relevant information about the production process from the integrated data sources.

Compared to the traditional data-warehousing strategy [21], in the case of an OSS, not only historical data have to be integrated; it is also important to effectively handle real-time data that represent the current status of the process. Real-time data are typically used by operational applications to run the business and are constantly changing as operational transactions are processed. To use real-time data in a data warehouse, typically,



TABLE I
MAIN DIFFERENCES BETWEEN THE DCS RELATIONAL DATABASE AND THE PROCESS DATA WAREHOUSE

it first must be cleansed to ensure appropriate data quality, perhaps summarized, and transformed into a format more easily understood and manipulated by business analysts. This is because the real-time data contain all the individual, transactional, and detailed data values, as well as other data valuable only to the operational systems, which must be filtered out. In addition, because they may come from multiple different systems, real-time data may not be consistent in representation and meaning. As an example, the units of measure may differ among systems. These anomalies must be reconciled before loading into the data warehouse.

A typical example of the need for dynamic models in data integration is data reconciliation, which is often needed to handle the uncertainties and inaccuracies of process measurements. Data reconciliation is the adjustment of a set of data so that the quantities derived from the data obey natural laws, such as material and energy balances. The adjustments are made using redundancies in the measurements. After adjustment, the material and, if considered, the energy balances are satisfied exactly. Data reconciliation may be performed on a set of steady-state data, using a steady-state model of the process, or it may be applied to dynamic data, using a dynamic model of the process [4], [7], [13], [20].

Since there is no need to provide summarized and integrated data with a sampling time in the range of seconds (the smallest sampling time of the process units), the basic real-time functions that are needed for the control and monitoring of the separate process units remain at the level of the DCS, while the data warehouse is used only for the summarization of these data. Since the calculated and summarized data are needed less frequently (in the range of minutes and hours), and relatively simple models are used to calculate the dynamical aspects of the material flows and transformations, the integration of real-time data into the data warehouse is not critical in terms of running time and computational costs.
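As a minimal illustration of steady-state data reconciliation (a sketch assuming equally weighted measurements, not the plant's actual procedure), consider a stream splitter whose measured flows should satisfy the mass balance F1 = F2 + F3. The least-squares adjustment distributes the balance residual over the three measurements along the constraint direction:

```python
# Minimal steady-state data reconciliation sketch for a stream splitter
# with balance F1 - F2 - F3 = 0. Equal measurement variances are assumed,
# so the least-squares correction spreads the residual r as r/3 along the
# balance coefficient vector a = (1, -1, -1). Flow values are illustrative.

def reconcile_splitter(f1, f2, f3):
    """Return adjusted flows that satisfy the mass balance exactly."""
    residual = f1 - f2 - f3          # how badly the raw data violate the balance
    # least-squares projection onto the constraint, with a.a = 3
    return (f1 - residual / 3.0,
            f2 + residual / 3.0,
            f3 + residual / 3.0)

g1, g2, g3 = reconcile_splitter(100.0, 60.0, 38.0)
print(round(g1, 4), round(g2, 4), round(g3, 4))   # adjusted flows
print(abs(g1 - g2 - g3) < 1e-9)                   # balance now holds
```

After the adjustment, the derived quantities obey the material balance exactly, as the text requires; with unequal measurement variances, the correction would simply be weighted accordingly.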


C. Enterprise and Process Modeling



Actually, data are simply a record of all the business activities, resources, and results of the organization. The data model is a well-organized abstraction of those data. Therefore, it is quite natural that the data model has become the best method to understand and manage the business of the organization. Without a data model, it would be very difficult to organize the structure and contents of the data in the data warehouse [5].

The application of data and enterprise modeling (EM) is extremely important, as these models describe the organization, map the work processes, and thereby identify the needs of the OSS.

Usually, two basic data-modeling techniques are considered: entity-relationship (ER) modeling and dimensional modeling. ER modeling produces a data model of the specific area of interest, using two basic concepts: entities and the relationships between those entities. The ER model is an abstraction tool because it can be used to understand and simplify the ambiguous data relationships in the business world and in complex systems environments. Dimensional modeling uses three basic concepts: measures, facts, and dimensions. Dimensional modeling is powerful in representing the requirements of the business user in the context of database tables. Both ER and dimensional modeling can be used to create an abstract model of a specific subject. However, each has its own limited set of modeling


concepts and associated notation conventions. Consequently, the techniques look different, and they are indeed different in terms of semantic representation.

The data model plays the role of a guideline, or plan, for implementing the data warehouse. Traditionally, ER modeling has primarily focused on eliminating data redundancy and keeping consistency among the different data sources and applications. Consolidating the data models of each business area before the real implementation can help assure that the result will be an effective data warehouse and can help reduce the cost of implementation. Although ER models can be used to support a data-warehouse environment, there is now an increased interest in dimensional modeling for that task.

The design of a process data warehouse is based on the synchronization of the events related to the different information sources, which requires understanding of the material, energy, and information flow between the units of the plant. For this purpose, not only the abovementioned classical data-modeling techniques have to be used, but also models related to the nonlinear functional relationships of the process and product variables, as well as dynamic models that represent the dynamical behavior of these variables.

Data mining is a relatively new data-analysis technique. It is very different from query and reporting and from multidimensional analysis in that it uses what is called a discovery technique. That is, the users do not ask a particular question of the data but rather use specific algorithms that analyze the data and report what they have discovered. This discovery could take the form of finding significance in relationships between certain data elements, a clustering together of specific data elements, or other patterns in the usage of specific sets of data elements.

Data mining is most typically used for statistical data analysis and knowledge discovery. Statistical data analysis detects unusual patterns in data and applies statistical and mathematical modeling techniques to explain the patterns. The models are then used to forecast and predict. These types of statistical data-analysis techniques include linear and nonlinear analysis, regression analysis [2], multivariate analysis, and time-series analysis [1].

In the following, the role of advanced data-mining tools and dynamical models is presented through two illustrative problems that will be solved in the case study of this paper.

Formulated products (plastics, polymer composites) are generally produced from many ingredients, and a large number of interactions between the components and the processing conditions all have an effect on the final product quality [22]. When a reliable nonlinear model that is able to estimate the quality of the product is available, it can be inverted to obtain the suitable operating conditions required for achieving the target product quality [25]. If such a model is incorporated into the OSS, significant economic benefits can be realized.

Advanced control and monitoring algorithms of the OSS are based on state variables that are not always measurable, or that are measured offline. Hence, for the effective application of these tools, there is a need for state-estimation algorithms that are based on the model of the monitored and/or controlled process. In the presence of additive white Gaussian noise, a Kalman filter provides optimal estimates of the states of a linear dynamical system. For nonlinear processes, extended Kalman filtering (EKF) should be used [30]. The dynamic model of the EKF can be a first-principle model formulated by a set of nonlinear differential equations or a black-box model, e.g., an NN.

Generally, the models used in the state estimation of process systems are formulated by macroscopic balance equations, for instance, mass or energy balances. In general, not all of the terms in these equations are exactly or even partially known. In semimechanistic modeling, black-box models, like NNs, are used to represent the otherwise difficult-to-obtain parts of the model. Usually, in the modeling phase, it turns out which parts of the first-principle model are easier and which are more laborious to obtain, and often we can get the so-called hybrid model structure that integrates a first-principle model with an NN model, which serves as an estimator of unmeasured process parameters that are difficult to model from first principles [32]. Since the seminal paper of Psichogios, many industrial applications of these semimechanistic models have been reported, and it has been proven that this kind of model has better properties than stand-alone NN applications, e.g., in the pyrolysis of ethane [41], in industrial polymerization [29], and in bioprocess optimization [34]. The aim of the case study of this paper is the examination of the applicability of such semimechanistic models in an industrial environment, namely, how this model structure can be identified and applied for state estimation in an OSS.
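The hybrid structure can be sketched as follows. This is a deliberately simplified, hypothetical illustration (not the plant model of the case study): a first-principles component balance for a continuous stirred tank, in which the hard-to-model reaction-rate coefficient is supplied by a black-box surrogate standing in for a trained NN. All parameter values are assumptions.

```python
# Semimechanistic (hybrid) model sketch: a first-principles component
# balance  dx/dt = (F/V) * (x_in - x) - k(T) * x  for a continuous stirred
# tank, where the rate coefficient k(T) comes from a black-box surrogate.
# The surrogate below is a hardcoded stand-in for a trained neural network;
# all parameter values are illustrative assumptions.
import math

def k_blackbox(temperature_K):
    """Stand-in for an NN estimating the hard-to-model rate coefficient."""
    return 0.4 / (1.0 + math.exp(-(temperature_K - 360.0) / 10.0))

def simulate(x0, x_in, dilution, temperature_K, dt=0.1, steps=2000):
    """Explicit-Euler integration of the hybrid balance equation."""
    x = x0
    for _ in range(steps):
        dxdt = dilution * (x_in - x) - k_blackbox(temperature_K) * x
        x += dt * dxdt
    return x

# the concentration should settle at dilution / (dilution + k)
x_final = simulate(x0=1.0, x_in=1.0, dilution=0.1, temperature_K=360.0)
print(round(x_final, 3))   # -> 0.333
```

In an EKF setting, the same hybrid right-hand side would serve as the process model whose states are corrected by the incoming measurements; here only the prediction half is shown.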

D. Front-End Tools


Complex process technologies are multivariable, exhibit nonlinear characteristics, and often have significant time delays. In this case, the operator cannot easily follow and visualize what is happening in the process, so the computer should aid the visualization of the process states and their relation to the quality of the final product. As the final product quality is measured in the quality-control laboratory, not only what-you-see-is-what-you-want (WYSIWYW) interfaces between the operator and the console are important, but also what-you-see-is-what-I-see (WYSIWIS) interfaces between the operators (the operators at the reactor, at the product-formation process, and at the laboratory) are needed to share the information horizontally in the organization. A data warehouse provides the basis for the powerful data-analysis techniques that are available today, such as data mining and multidimensional analysis, as well as the more traditional query and reporting. Making use of these techniques along with process data warehousing can result in easier access to the information the operators need for more informed decision making.

Plant operators are skilled in the extraction of real-time patterns of process data and in the identification of distinguishing features (see Fig. 1). Hence, the correct interpretation of measured process data is essential for the satisfactory execution of many computer-aided intelligent decision support systems (DSSs) that modern processing plants require.

The aim of the incorporation of multivariate statistical approaches into the OSS is to reduce the dimensionality of the correlated process data by projecting them down onto a lower dimensional latent variable space where the operation can


Fig. 2. Web-based data-warehouse structure.

be easily visualized. These approaches use the techniques of principal component analysis (PCA) or projection to latent structures (PLS). Besides process performance monitoring, these tools can be used for system identification [25], [37], for ensuring consistent production, and for product design [28]. The potential of the existing approaches has been limited by their inability to handle more than one recipe/grade. There is, therefore, a need for methodologies from which process representations can be developed that simultaneously handle a range of products, grades, and recipes [23].

In supervisory control, the detection and diagnosis of faults, product-quality control, and recovery from large operation deviations, determining the mapping from process trends to operating conditions is the pivotal task. Query and reporting analysis is the process of posing a question to be answered, retrieving the relevant data from the data warehouse, transforming them into the appropriate context, and displaying them in a readable format. It is driven by analysts who must pose those questions to receive an answer. These tasks are quite different from data mining, which is data driven.
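The dimensionality-reduction idea can be sketched for two correlated process variables (a toy example with made-up data, not plant measurements): the 2x2 covariance matrix is diagonalized in closed form, and the share of the total variance captured by the first latent variable indicates how well a single principal component summarizes the operation.

```python
# Toy PCA sketch for two correlated process variables (e.g., a temperature
# and a pressure that move together). Data are made up for illustration.
import math

temp = [float(i) for i in range(10)]
pres = [2.0 * t + (0.1 if i % 2 == 0 else -0.1) for i, t in enumerate(temp)]

def covariance(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# 2x2 covariance matrix [[s11, s12], [s12, s22]] and its eigenvalues,
# computed in closed form for the two-variable case
s11 = covariance(temp, temp)
s22 = covariance(pres, pres)
s12 = covariance(temp, pres)
mean_l = (s11 + s22) / 2.0
half_gap = math.sqrt(((s11 - s22) / 2.0) ** 2 + s12 ** 2)
lam1, lam2 = mean_l + half_gap, mean_l - half_gap

explained = lam1 / (lam1 + lam2)   # variance captured by the 1st component
print(explained > 0.95)            # one latent variable nearly suffices
```

For more than two variables, the same idea applies with a full eigendecomposition; monitoring charts are then built on the scores in the low-dimensional latent space.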


E. Web Access of Process Data Warehouse

The leading idea of the incorporation of web technology into the process data warehouse is that all data recording, monitoring, analysis, and mining functions should be accessible through a single web browser. In the spirit of cost effectiveness, this solution has been developed using the free Apache web server and the PHP (hypertext preprocessor) scripting language. PHP is a server-side scripting language; therefore, the software and code responsible for the functions of the OSS run on a web server, which is the traditional and most frequent application form of PHP. Three components are needed (see Fig. 2) to use this form:

1) a PHP interpreter [in common gateway interface (CGI) or in server-module format];
2) a web server;
3) web browsers (on the client computers).

The server (back end) is a database engine that ensures the connection between the data tables and the users' applications, and handles the commands originating from the web server. The client side (front end) is a set of web browsers through which the users can enter and edit their query requests. Since, in this structure, the server contains the data and performs the computations, the system does not use the computational resources of the client computers. This allows the cost-effective distribution of the information in the process data warehouse, and its implementation and management are easy and cost effective.
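This division of labor, queries executed on the server and results returned to the browser as ready-made HTML, can be illustrated in a few lines. The sketch below is hypothetical: it uses Python with an in-memory SQLite database in place of PHP and the plant database, and the `powder_analysis` table and its columns are invented for the example.

```python
import sqlite3

def render_report(conn, sql, params=()):
    # Run a read-only query and render the result set as an HTML table,
    # mimicking what the server-side script would send to the client browser.
    cur = conn.execute(sql, params)
    header = "".join(f"<th>{d[0]}</th>" for d in cur.description)
    rows = ["<tr>" + header + "</tr>"]
    for rec in cur:
        rows.append("<tr>" + "".join(f"<td>{v}</td>" for v in rec) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

# Hypothetical laboratory table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE powder_analysis (batch_id TEXT, mi REAL, density REAL)")
conn.execute("INSERT INTO powder_analysis VALUES ('B-101', 0.31, 0.948)")
html = render_report(conn, "SELECT batch_id, mi, density FROM powder_analysis")
print(html)
```

Because all computation happens on the server, the client needs nothing beyond a browser, which is exactly the property the text exploits.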

III. CASE STUDY


We begin this section with the presentation of the business problem to be solved. We then define our data-warehouse project and the business needs on which it is based. The structure (the ER model) of the source data is provided as a starting point. We close the case study with the details of the proposed model-based dynamic data-integration approach, and the process-state and product-quality estimation system developed for effective process monitoring. The steps of the design and implementation of this OSS will also be presented.


A. Problem Description


Formulated products (plastics, polymer composites) are generally produced from many ingredients, and a large number of interactions between the components and the processing conditions all have an effect on the final product quality. If these effects are detected, significant economic benefits can be realized. The major aims of monitoring plant performance are the reduction of off-specification production, the identification of important process disturbances, and the early warning of process malfunctions or plant faults. Furthermore, when a reliable model that is able to estimate the quality of the product is available, it can be inverted to obtain the suitable operating conditions required for achieving the target product quality. The above considerations led to the foundation of the "Optimization of Operating Processes" project of the VIKKK Research Center at the University of Veszprem, supported by the largest Hungarian polymer production company (TVK Ltd.).

TVK Ltd. produces medium-density polyethylene (MDPE) and high-density PE (HDPE) with the technology of Phillips Petroleum Co., which is divided into three separate units from the viewpoint of information sources:
1) polymerization production unit;
2) granulation production unit;
3) PE quality-control laboratory.
In Fig. 3, the information flow between these units is depicted. The production of the polymer powder in the polymerization production unit is the most important step of the process (see Fig. 4). The melting point of HDPE is approximately 135 °C.


Fig. 3. Information connections between the production units.

Therefore, slurry polymerization takes place at a temperature below 135 °C, so the polymer formed is in the solid state. The Phillips process takes place at a temperature between 85 and 110 °C. The catalyst and the inert solvent are introduced into the loop reactor, where ethylene and an α-olefin (hexene) are circulating. The inert solvent (isobutane) is used to dissipate heat, as the reaction is highly exothermic; a cooling jacket is also used to dissipate heat. The reactor consists of a folded loop containing four long runs of pipe 1 m in diameter, connected by short horizontal lengths of 5 m. The slurry of HDPE and catalyst particles circulates through the loop at a velocity of 5–12 m/s. The polymer is concentrated in settling legs to about 60–70% slurry by weight and is continuously removed. The solvent is recovered by hot flashing. The polymer is dried and pelletized. The conversion of ethylene to polyethylene is very high (95–98%), which eliminates ethylene recovery. The molecular weight of HDPE is controlled by the temperature of catalyst preparation. The main properties of the polymer products [melt index (MI) and density] are controlled by the reactor temperature and by the monomer, comonomer, and chain-transfer-agent concentrations.

An interesting problem with the process is that it requires the production of about ten product grades according to market demand. Hence, there is a clear need to minimize the changeover time, because off-specification product may be produced during a transition. The difficulty of the problem comes from the fact that there are more than ten process variables to consider. The problem arises not only from the wide product palette and the frequent product changes, but also from the heterogeneity of the measurement data in terms of time horizons and formats.

1) A Honeywell DCS operates in the polymerization production unit, which serves the data via the so-called process history database (PHD) module. This database contains the most important process variables and some technological variables calculated by the advanced process control (APC) module of the DCS. In the following, the most important of these process variables, used for process modeling and monitoring, are mentioned. Measurements are available every 15 s on process variables, which consist of input and output variables: uk,(1,...,8)—the comonomer, the monomer, the solvent, and the chain-transfer-agent inlet flowrate and temperature; uk,9—the polymer production rate; uk,10—the flowrate of the catalyst; uk,(11,...,13)—the cooling-water flowrate and its inlet and outlet temperature.
2) The devices of the granulation production unit are controlled by programmable logic controllers (PLCs). In this unit, not all data are stored electronically; the data are mostly logged manually in reports related to the events that happen every 1 or 2 h.
3) In the PE quality-control laboratory, the measured data are stored in reports (e.g., polymer powder and granulate classification reports, batch qualification reports, and product change reports).
The product quality yk is determined by offline laboratory analysis after drying the polymer, which causes a 1-h time delay. The most important quality variables are the MI and the density, whose sampling-time intervals are between 0.5 and 5 h. While the sampling and the measurement of the quality of the polymer powder and the granulate are made every 1 or 2 h, the time of the qualification of the batches strongly depends on the technology.

Fig. 5 shows the relation between these information sources and their sampling frequencies, and the time horizons of the measured data.

Since it would be useful to know if the product is good before testing it, the monitoring and the estimation of the state and product-quality variables would help in the early detection of a poor-quality product. There are other reasons why monitoring the process is advantageous. Only a few properties of the product are measured, and sometimes these are not sufficient to define the product quality entirely. For example, if only the rheological properties of a polymer are measured (MI), any variation in end-use performance that arises due to variations of the chemical structure (branching, composition, etc.) will not be captured by following only these product properties. In these cases, the process data may contain more information about events with special causes that may affect the product quality [18].

B. Process Data Warehouse


The data-warehouse project was implemented in three steps, depicted in Fig. 6.

Besides the operational database of the DCS, the information sources are the standard data sheets and reports, which often include redundant information. Unfortunately, the comparison of these reports collected from different process units proved that these separated sources of information include contradictions as well. Consequently, electronic forms have been created to avoid these problems. The following aspects were kept in mind in the design of these forms and the related data tables: data should be inserted into only one table; "everybody" should be able to access the necessary information; the rights for data upload, query, and change of the data should be clarified; and the identification of the users responsible for the data, as well as data security, should be solved.


Fig. 4. Scheme of the polymerization production unit (Phillips loop-reactor process).

According to the above aspects, the following tables were defined:
1) PE laboratory:
a) results of powder analysis;
b) results of granulate analysis;
c) batch qualification results;
d) other analyses.
2) Granulation production unit:
a) silo data (powder, granulate, mixer);
b) data of extruders;
c) chargeman reports.
These tables include the measurements of the laboratory and the events that are recorded by the chargeman (and by the operators). The technological variables of the polymerization reactors are stored via the PHD module of the Honeywell system (reactor data, cleaning system, etc.), as well as the features calculated by the APC module: some state variables and variables of the input streams (e.g., temperature, pressure, concentrations), and other data (e.g., catalyst activation).

Besides the web-based front-end tools, applications based on MS-Excel, MS-Access, and Visual Basic have been worked out. This proved practical at the beginning of the project because the employees in the production unit and in the laboratory were experienced users of these simple tools.


C. Dynamic Model for Data Integration


Fig. 5. Time horizons of the measured data.

To detect and analyze causal relationships among the process and quality variables taken from several heterogenous information sources, the laboratory measurements and the operating variables of the reactors and extruders have to be synchronized based on the model of the main process elements (e.g., pipes, silos, flash tanks). For this purpose, based on the models of the material and information flows, MATLAB scripts were written to collect all the logged events of the whole production line, and to arrange and recalculate the time of these events according to the "life" of the product from the reactor to the final product storage. In this section, the basic considerations behind this dynamic data integration are presented. The general theoretical issues behind this implementation step are presented in Section II-C.
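This time-rearrangement step can be illustrated with a small sketch (Python here rather than MATLAB; the unit names and transport delays below are invented for the example). Each logged event is shifted back by the cumulative delay between the reactor and the unit that produced the record, so that all records refer to the same batch of polymer:

```python
from datetime import datetime, timedelta

def synchronize_events(events, delays):
    # Shift each (unit, timestamp, record) event back by the transport
    # delay of its unit, i.e. recalculate the event time according to the
    # "life" of the product, counted from the moment it left the reactor.
    shifted = [(ts - delays[unit], unit, rec) for unit, ts, rec in events]
    return sorted(shifted)

# Illustrative delays: powder needs ~3 h to reach the silo, ~4 h to the lab.
delays = {"reactor": timedelta(0), "silo": timedelta(hours=3), "lab": timedelta(hours=4)}
events = [
    ("lab",     datetime(2005, 3, 1, 14, 0), "MI = 0.32"),
    ("reactor", datetime(2005, 3, 1, 10, 0), "T = 97 C"),
    ("silo",    datetime(2005, 3, 1, 13, 0), "level = 60%"),
]
for t, unit, rec in synchronize_events(events, delays):
    print(t, unit, rec)  # all three records align at 10:00, the reactor time
```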

Fig. 6. Main tasks of the project.

The connection between the polymerization production unit and the granulation unit is determined by the input flow from the polymerization production unit into the top of the silos and the output flow from the bottom of the silos into the granulation unit (Fig. 7). Because of the dynamical behavior of the silos, the integration of the information about these units is not realizable by static structured query language (SQL) queries and OLAP functions. The solution should be based on the dynamic model of the silos

Mi(T) = ∫[t0,T] (Fin,i(t) − Fbatch,i(t) − Fadd,i(t)) dt + M0,i    (1)

where Fin,i(t) is the input mass flow, which can be calculated based on the transport report and the productivity estimated by the DCS; Fbatch,i(t) and Fadd,i(t) are the mass flows of the feeding container and of the premixed additive ingredients, calculated from the reports that include the details of the extrusion process; and M0,i is the mass calculated using the previously measured levels of the silo. This simple model is the base of the calculation of the actual polymer mass Mi(T) in the ith silo.

From a technological viewpoint, the age of the polymer powder leaving the silo is more important than the actual mass of the polymer in the silo, because the measured polymer quality can be retrieved based on the age of the polymer. A modification of (1) gives the answer of how old the polymer leaving the silo is (time T*)

0 = ∫[t0,T*] Fin,i(t) dt − ∫[t0,T] (Fbatch,i(t) + Fadd,i(t)) dt + M0,i.    (2)

The combination of (1) and (2) gives the age of the polymer powder leaving the silo

Mi(T) = ∫[T*,T] Fin,i(t) dt.    (3)

Based on this model, the data warehouse is able to answer the following questions.
1) What is the content of the silos? How much powder is there with a given feature in a particular silo at an arbitrary time instant? The answer to this question is very useful in the scheduling of powder processing.
2) What kind of polymer will be granulated? Polymer powder with what property is being processed from which silos at a given time instant? This is important for the estimation of the powder quality before the extrusion, because the operation of the extrusion could be controlled in a feedforward manner based on this estimated feature, before the granulate product qualification. For this purpose, the effect of the mixing of polymer powders and of the operating parameters of the extrusion (e.g., temperature and power consumption) should be explored.
The density of the mixed polymer powder is calculated from the properties of the "raw" polymers in proportion to their quantities. In the calculation of the MI, it is assumed that the average molecular mass (M̄) can be calculated from the average molecular masses of the polymer powders in proportion to their total masses

F1 M̄1 + F2 M̄2 = (F1 + F2) M̄    (4)

while the relationship between the average molecular mass and the MI is the following [9]

M̄2 = M̄1 (MI2 / MI1)^0.294.    (5)

In the case of two polymer powders, the MI is calculated in the following way

log MI = (1/0.294) log[(F1 + F2 (MI2/MI1)^0.294) / (F1 + F2)] + log MI1.    (6)
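The mixing rules (4)–(6) translate directly into code. A minimal sketch, assuming only the exponent 0.294 from [9]; the masses and MI values below are illustrative:

```python
def blend_mi(f1, mi1, f2, mi2, xi=0.294):
    # Melt index of a two-powder blend, per (4)-(6): the average molecular
    # masses mix in proportion to the masses, and M ~ MI^0.294 links the
    # molecular mass to the melt index.
    ratio = (f1 + f2 * (mi2 / mi1) ** xi) / (f1 + f2)
    return mi1 * ratio ** (1.0 / xi)

print(blend_mi(1.0, 0.30, 1.0, 0.30))  # identical powders -> 0.3
print(blend_mi(1.0, 0.20, 1.0, 0.40))  # blend lies between 0.2 and 0.4
```

Note that the blend MI is not the arithmetic mean of the component MIs; the 1/0.294 exponent makes the rule strongly nonlinear.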


Fig. 7. Dynamic behavior of the polymer powder silos.

3) What will be the quality of the end product? How the end-product features are related to the measurements of the laboratory samples used for the quality control of the process, and to the final batch qualification measured after the homogenization (mixing) of the product, is an important question. For this purpose, the mass of the batch calculated based on the extruding data sheets should be compared to its measured value [see Fig. 8(a)]. The next step of the validation is the pre-estimation of the batch features from the hourly sampled mass rate. These features are measured after the homogenization of the product, which takes 8 h. Fig. 8(b) shows the accuracy of the estimation.
4) Product retrieval: Due to the dynamical behavior of the complex production technology, it is not trivial to assign in which period of the operation of a given process unit a given final product was produced. Hence, the process (silo) model shown above makes possible the calculation of the time delays represented by the flows among the distributed process units, and the use of these time delays to synchronize the events of the production of a given product, which makes possible the retrieval of the details of the production process (process and product variables).
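A discrete-time sketch of the silo balance (1) and the age relation (3) (Python, with hourly flow samples; the flows are illustrative, and Fbatch + Fadd are lumped into a single outflow):

```python
def silo_state(f_in, f_out, m0, dt=1.0):
    # Mass balance (1): integrate inflow minus outflow on top of M0.
    mass = m0 + sum((fi - fo) * dt for fi, fo in zip(f_in, f_out))
    # Age relation (3): walk back through the inflow history until the
    # accumulated inflow equals the current content; that distance is the
    # age T - T* of the powder now leaving the silo.
    acc, age = 0.0, 0.0
    for fi in reversed(f_in):
        if fi > 0 and acc + fi * dt >= mass:
            return mass, age + (mass - acc) / fi
        acc += fi * dt
        age += dt
    return mass, age  # content is older than the logged history

# 10 h of filling at 10 t/h; emptying at 10 t/h starts after 5 h.
mass, age = silo_state([10.0] * 10, [0.0] * 5 + [10.0] * 5, 0.0)
print(mass, age)  # 50 t in the silo; the powder leaving now is 5 h old
```

Exactly this backward walk through the inflow history is what makes product retrieval (question 4) possible.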

D. Analysis of Grade Transitions


Based on the historical data of a half-year operation, all of the productions and grade transitions have been preprocessed by the abovementioned models. As Fig. 9 shows, based on

Fig. 8. Validation and estimation of the final product quantity and quality.

the database of the grade transitions, we have designed special plots that can be used by the operators as patterns of recent control strategies. Not only tools for the visualization of time series have been developed, but also plots illustrating the safety constraints [see Fig. 10(a)]. In some cases, the difficulty of the analysis of these plots comes from the fact that there are more than ten process variables to consider. As Fig. 10(b) shows, the proposed OSS includes PCA-based visualization tools to solve this problem, since the two-dimensional space of the transformed variables, the plotted Hotelling T², and the model-error measure Q are able to give an insight into the process behavior and to detect faults.
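The two monitoring statistics are easy to reproduce on synthetic data. The sketch below builds a two-component PCA model by power iteration (pure Python for self-containment; production code would use an SVD). The data are constructed so that the third variable is an exact linear combination of the first two; a sample that violates this correlation structure is flagged by a large residual Q even though its individual values look ordinary:

```python
import math

def pca_monitor(X, n_pc=2):
    # Fit an n_pc-component PCA model to X and return a function that
    # computes (Hotelling T^2, residual Q) for a new sample.
    n, m = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - mu[j] for j in range(m)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(m)] for i in range(m)]
    loads, lams = [], []
    for _ in range(n_pc):  # leading eigenpairs by power iteration + deflation
        v = [1.0] * m
        for _ in range(200):
            w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
            nrm = math.sqrt(sum(x * x for x in w))
            v = [x / nrm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(m)) for i in range(m))
        loads.append(v)
        lams.append(lam)
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]

    def stats(x):
        xc = [x[j] - mu[j] for j in range(m)]
        scores = [sum(p[j] * xc[j] for j in range(m)) for p in loads]
        t2 = sum(t * t / l for t, l in zip(scores, lams))       # Hotelling T^2
        recon = [sum(scores[i] * loads[i][j] for i in range(n_pc)) for j in range(m)]
        q = sum((xc[j] - recon[j]) ** 2 for j in range(m))      # model error Q (SPE)
        return t2, q

    return stats

# Synthetic "process data": x3 = x1 + x2 exactly.
X = [[i - 9.5, ((7 * i) % 13) - 6.0, 0.0] for i in range(20)]
for row in X:
    row[2] = row[0] + row[1]
stats = pca_monitor(X)
t2_ok, q_ok = stats([1.0, 2.0, 3.0])    # consistent with the correlation structure
t2_bad, q_bad = stats([1.0, 2.0, 9.0])  # breaks it: large Q signals a fault
print(q_ok, q_bad)
```

A large T² flags a sample that is far from the center but still inside the model plane; a large Q flags a sample that has left the plane altogether, which is the typical signature of a sensor fault or an abnormal operating event.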


E. Modeling and Monitoring of a Polyethylene Plant

Advanced control and monitoring algorithms are based on state variables that are not always measurable, or are measured offline. Hence, for the effective application of these tools, there is a need for state-estimation algorithms that are based on the model of the monitored and/or controlled process.

MI and density are the product-quality variables that need to be estimated, because the interval between the product samples is between 0.5 and 5 h. Since it would be useful to know if the product is good before testing it, the monitoring of the process helps in the early detection of a poor-quality product. This study focuses on MI prediction. MI depends on the state variables that describe the behavior of the dynamic system; therefore, for the development of a soft sensor, it is necessary to estimate these variables.

The state variables are: xk,(1,2)—the mass of the fluid and of the polymer in the reactor; xk,(3,...,6)—the chain-transfer-agent, monomer, comonomer, and catalyst concentrations in the loop reactor; and xk,7—the reactor temperature. Measurements are available on the chain-transfer-agent, monomer, and comonomer concentrations, the reactor temperature, and the density of the slurry in the reactor (which is connected with the mass of the fluid and the polymer in the reactor). There are additional state


Fig. 9. Grade transition from product A to product E (left: reactor process variables, right: quality-related variables, laboratory measurements).

variables that must be identified: xk,(8,...,10)—the reaction-rate coefficients, because they are not known precisely. The concentration of the comonomer in the instantly formed polyethylene is xk,11, and the MI can also be seen as a state variable, xk,12.

Generally, models used in the state estimation of process systems are formulated by macroscopic balance equations, for instance, mass or energy balances. In general, not all of the terms in these equations are exactly or even partially known. In semimechanistic modeling, black-box models, like NNs, are used to represent the otherwise difficult-to-obtain parts of the model. Usually, in the modeling phase, it turns out which parts of the first-principle model are easier and which are more laborious to obtain, and often, we can get the following hybrid model structure

xk+1 = f(xk, uk, vk, fNN(xk, uk), θ),    yk = g(xk, wk)    (7)

where xk, yk, and uk represent the states, the outputs, and the inputs of the system, and vk and wk are noise variables. fNN = [fNN,1, ..., fNN,n]^T represents the black-box elements of the model (NNs), and θ is the parameter set of fNN, represented by feedforward multi-input single-output NNs with one hidden layer and one output neuron: fNN,i(z, θ) = w2 tanh(W1 z + b1) + b2, where nn represents the number of hidden neurons,


z = [z1, ..., zni]^T is the input of the network (ni × 1), W1 is the weight matrix of the hidden layer (nn × ni), b1 is the bias of the hidden layer (nn × 1), w2 is the weight vector of the output layer (1 × nn), and b2 is the bias of the output layer (1 × 1); so, θ denotes the set of parameters: θ = {W1, w2, b1, b2}.

Fig. 10. OSS plots based on PCA.
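A direct transcription of fNN,i(z, θ) = w2 tanh(W1 z + b1) + b2 in plain Python (the weight values below are arbitrary illustrative numbers, not parameters identified from plant data):

```python
import math

def f_nn(z, W1, b1, w2, b2):
    # One tanh hidden layer (nn neurons), one linear output neuron.
    hidden = [math.tanh(sum(w * zj for w, zj in zip(row, z)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# nn = 2 hidden neurons, ni = 3 inputs.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]   # (nn x ni)
b1 = [0.0, 0.1]                              # (nn x 1)
w2 = [1.2, -0.7]                             # (1 x nn)
b2 = 0.05                                    # (1 x 1)
print(f_nn([1.0, 0.0, 0.0], W1, b1, w2, b2))
```

Because tanh saturates, the output is always bounded by the sum of the absolute output-layer weights plus the output bias, which is one reason such small networks behave well inside a state estimator.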

The MI of the instantly produced polyethylene is mainly dependent on the current ethylene concentration in the loop reactor (x4), the reactor temperature (x7), and the concentration of the hexene in the instantly formed polyethylene (x11). These three variables and the other state variables can be calculated by a nonlinear state-estimation algorithm based on the first-principle model of the system (see FP1 in Fig. 11). The BB1 box contains a black box in which an NN calculates the MI of the instantaneously produced polyethylene (fNN). Since the polyethylene that leaves the reactor is the mixture of the previously produced products, the evolving MI can be calculated by the first-principle model of the mixing (see FP2 in Fig. 11):

d(x12^ξ)/dt = (1/x2) [R fNN^ξ(x4, x7, x11) − Fout x12^ξ − x12^ξ (dx2/dt)]    (8)

where R = x8 x3 x1 x6 x2 + x9 x4 x1 x6 x2 + x10 x5 x1 x6 x2 represents the instantaneously produced mass of the polyethylene, Fout is the polymer mass leaving the reactor, and ξ = −0.294 is an empirical coefficient.

Fig. 11. Semimechanistic model of the system.

To train the NN parts of the previously presented semimechanistic process model, pairs of input/output data should be used to determine the θ parameter set (the weights of the NN) in such a way that the sum of squared deviations VN = (1/2N) Σk (yk − ŷk)² between the predicted output of the network and the corresponding training data becomes minimal. The usual way to minimize VN is to use gradient procedures, like the Gauss–Newton algorithm. The weights in the ith step of this iterative process are changed in the direction of the gradient

θi+1 = θi − μ Ri⁻¹ V′N    (9)

where Ri = (1/N) Σk jk,θ jk,θ^T, V′N = −(1/N) Σk (yk − ŷk) jk,θ, and jk,θ = ∂ŷk,θ/∂θ = (∂gk/∂x)(∂xk,θ/∂θ).

The key problem in the application of this approach is the determination of ∂x/∂θ, because in semimechanistic models, the NN's output does not appear explicitly in the above expression, as it is part of the differential-equation system. In this case, the NN can be trained by the integration of the sensitivity equations, by a nonlinear programming technique, by using an EKF for state and parameter estimation, or by a spline-smoothing approach [34].

In this paper, the Hermite spline-smoothing method (presented in the Appendix) has been applied to interpolate between the measured data (shown in Fig. 12) and to estimate the corresponding derivatives in the rearranged (8) to obtain the desired outputs of the NN

fNN(x4, x7, x11) = [(1/R) (x2 d(x12^ξ)/dt + Fout x12^ξ + x12^ξ (dx2/dt))]^(1/ξ).    (10)
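The damped Gauss–Newton update (9) can be demonstrated on a toy nonlinear least-squares problem. The model y = a(1 − e^(−bt)) below is an arbitrary two-parameter stand-in for the network, and a step-halving safeguard on μ is added so that VN decreases at every iteration:

```python
import math

def model(theta, t):
    a, b = theta
    return a * (1.0 - math.exp(-b * t))

def jac(theta, t):  # d model / d theta
    a, b = theta
    e = math.exp(-b * t)
    return [1.0 - e, a * t * e]

def v_n(theta, data):  # V_N = (1/2N) sum (y - yhat)^2
    return sum((y - model(theta, t)) ** 2 for t, y in data) / (2.0 * len(data))

def gauss_newton(theta, data, iters=50):
    n = len(data)
    for _ in range(iters):
        R = [[1e-9, 0.0], [0.0, 1e-9]]  # tiny ridge keeps R invertible
        g = [0.0, 0.0]
        for t, y in data:
            j = jac(theta, t)
            r = y - model(theta, t)
            for p in range(2):
                g[p] += r * j[p] / n                 # g = -V_N'
                for q in range(2):
                    R[p][q] += j[p] * j[q] / n       # R_i of eq. (9)
        det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
        d = [(R[1][1] * g[0] - R[0][1] * g[1]) / det,
             (R[0][0] * g[1] - R[1][0] * g[0]) / det]
        mu, base = 1.0, v_n(theta, data)
        while mu > 1e-6:  # halve the step until V_N actually decreases
            cand = [theta[0] + mu * d[0], theta[1] + mu * d[1]]
            if v_n(cand, data) < base:
                theta = cand
                break
            mu /= 2.0
    return theta

# Noise-free data generated with a = 2.0, b = 0.5; start from (1.0, 1.0).
data = [(k / 2.0, model((2.0, 0.5), k / 2.0)) for k in range(1, 11)]
theta = gauss_newton([1.0, 1.0], data)
print(theta)
```

In the semimechanistic case the Jacobian jk,θ is not available in closed form, which is exactly why the spline-smoothing step below is needed to produce explicit input/output pairs for the network.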

In the applied cubic splines, which are piecewise third-order polynomials, the polynomials are defined such that their values and first derivatives are continuous at so-called knots, where the individual polynomials are interconnected. When such splines are identified, a continuous function is fitted to the available measured data x = [x1, ..., xN]^T given at time instants t = [t1, ..., tN]^T (see Fig. 12).

For the identification of the NN, four data sets that include the same product transition have been used. Each data set contains 80 h of operation, in which the product change starts around the 50th hour. Among the four data sets, three were used for the training of the semimechanistic model and one for validation. This ratio fits the ratio proposed in [33], and it could improve the accuracy of the NN. The Levenberg–Marquardt algorithm was used for training the NN. The number of hidden neurons, nn = 4, was estimated by fourfold cross-validation. The results were compared with those of a linear model identified on the same data sets. The average validation mean square error (mse) is 0.0037 in the case of the NN with four hidden neurons, but much worse, 0.0204, in the case of the linear model. It can be concluded that the linear model gives acceptable but worse results than the NN under "normal" operating conditions, i.e., without product changes. However, it cannot handle and accurately predict product changes and unusual operations, like changes in the catalyst activity caused by switching from one catalyst tank to another. The difference between the estimation performances of the linear and neural models can be seen on the right side of Fig. 12.

The identified hybrid model was used in nonlinear state estimation.
The EKF is based on the Taylor linearization of the state-transition and output model equations. Instead of this solution, a more advanced state-estimation tool, the DD1 filter, has been used, which is based on approximations of the model equations with a multivariable extension of Stirling's interpolation formula. This filter is simple to implement, as no derivatives of the model equations are needed, yet it provides excellent accuracy [30]. For the feedback of the filter, the yk outputs of the system were chosen to be measured variables connected to the reactor and the product. Measurements are available on the yk,1 chain-transfer-agent, yk,2 monomer, and yk,3 comonomer concentrations every 8 min, and on the yk,4 reactor temperature and the yk,5 density of the slurry in the reactor (which is connected with the mass of the fluid and the polymer in the reactor) every 15 s. The measurement of the yk,6 MI was mentioned above. As Fig. 13 shows, the resulting soft sensor gives an excellent performance.
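The DD1 filter itself is too long for an inline example, but the predict/correct feedback loop it shares with the whole Kalman-filter family can be shown on a scalar system (a random-walk state observed through noise; the variances and measurement values below are illustrative only):

```python
def kalman_scalar(measurements, x0=0.0, p0=1.0, q=0.01, r=1.0):
    # Scalar Kalman filter for x[k+1] = x[k] + w (variance q),
    # y[k] = x[k] + v (variance r); returns the estimate sequence.
    x, p, est = x0, p0, []
    for y in measurements:
        p = p + q               # predict: random-walk state
        k = p / (p + r)         # Kalman gain
        x = x + k * (y - x)     # correct with the measurement feedback
        p = (1.0 - k) * p
        est.append(x)
    return est

# Constant true level 5.0, noise-free readings: the estimate converges to it.
est = kalman_scalar([5.0] * 50)
print(est[0], est[-1])
```

In the plant application the same loop runs with the semimechanistic model (7) in the predict step and the reactor measurements in the correct step; the DD1 filter merely replaces the linearization with its Stirling-interpolation-based approximations.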


Fig. 12. (Left) Spline interpolation of MI (solid line) and current MI (dashed line). (Right) Comparison of the NN (solid line) and linear model (dashed line).


IV. CONCLUSION

In this paper, the structure of a data-driven OSS has been proposed for the monitoring and control of complex multiproduct processes. As the proposed approach extensively uses process data, the OSS is based on a data warehouse designed with the help of enterprise and process-modeling tools.

The traditional OSS focuses only on specific tasks that are performed. In the case of complex processes, the design of an integrated information system is extremely important. The proposed data-warehouse-based OSS makes possible the linking of complex and isolated production units based on the integration

of the heterogenous information collected from the production units of a complex production process. The developed OSS is based on a data warehouse designed by following the proposed focus-on-process data-warehouse-design approach, which means a stronger focus on the material and information flow through the entire enterprise. The resulting OSS follows the process through the organization instead of focusing on separate tasks of the isolated process units. For human–computer interaction, front-end tools have been worked out, where advanced multivariate statistical models (PCA) are applied to extract the most informative features of the process.


Fig. 13. (Left) Estimated MI given by the DD1 algorithm. (Right) Estimated state variables used by the fNN model.

The concept is illustrated by an industrial case study, where the OSS is designed for the monitoring and control of an HDPE plant. When we attempted to use first-principle modeling, standard data mining, and multivariate statistical tools for these industrial problems, we realized that the studied multiproduct production systems are typically ill-defined systems that are difficult to model and have large-scale solution spaces. Furthermore, the relevant available information is usually in the form of empirical prior knowledge and historical process data. To detect and analyze causal relationships among the process and quality variables taken from several heterogenous information sources, the laboratory measurements and the operating variables of the reactors and extruders have to be synchronized based on the model of the main process elements (e.g., pipes, silos, flash tanks). For this purpose, based on the models of the material and information flows, MATLAB scripts were written to collect all the logged events of the whole production line, and to arrange and recalculate the time of these events according to the "life" of the product from the reactor to the final product storage.

To estimate the product quality, an approximate reasoning system is needed, which is capable of handling imperfect information. In the proposed structure, with the integration of modeling and monitoring functions, a new method, which is

based on semimechanistic modeling and nonlinear state estimation, was proposed for this purpose. For the identification of an NN, a spline-smoothing approach has been followed, where splines have been used to extract the desired outputs of the NN from infrequent and noisy measurements.

The results show that the proposed process-data-warehousing and data-mining methods are efficient and useful tools for data integration, decision support, and state and product-quality estimation, which can increase the productivity of complex technological processes.


APPENDIX

To formulate the spline-smoothing algorithm (see [26] for more details), let us define a cubic spline for a knot sequence t1 = k1 < k2 < ··· < kn−1 < kn = tN. The cubic spline is a sequence of cubic polynomials defined for each interval, [k1, k2], [k2, k3], ..., [kn−1, kn], by the combination of the function values and the first-order derivatives at the knots

S(t) = s′i ai(t) + s′i+1 bi(t) + si ci(t) + si+1 di(t),    ki ≤ t < ki+1

where si = S(ki), s′i = (dS(t)/dt)|t=ki, and, with hi = ki+1 − ki,

ai(t) = (ki+1 − t)²(t − ki)/hi²,    bi(t) = −(ki+1 − t)(t − ki)²/hi²,
ci(t) = (ki+1 − t)²(2(t − ki) + hi)/hi³,    di(t) = (t − ki)²(2(ki+1 − t) + hi)/hi³.

As can be seen, the spline is linear in the parameters φ = [s1, s′1, s2, s′2, ..., sn, s′n]^T. Hence, the φ parameter vector can be determined by minimizing the quadratic cost function min over φ of Q(φ), where Q(φ) = (1/N) Σi=1..N (xi − S(ti))². This optimization problem can be solved analytically by the ordinary linear least-squares (LS) method. The advantage of the utilized cubic spline is that the integral and the derivative of the spline are also linear in the parameters of the spline, e.g., dS(t)/dt = s′i a′i(t) + s′i+1 b′i(t) + si c′i(t) + si+1 d′i(t), where the prime on the basis functions denotes the derivative with respect to time.
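One segment of this spline translates directly into code (Python; the knot positions, values, and slopes below are illustrative). Because S(t) is linear in (si, s′i), stacking the segment equations for all measurement times ti gives exactly the linear LS problem described above:

```python
def spline_segment(t, k0, k1, s0, s1, ds0, ds1):
    # Cubic Hermite segment on [k0, k1]:
    # S(t) = ds0*a(t) + ds1*b(t) + s0*c(t) + s1*d(t)
    h = k1 - k0
    a = (k1 - t) ** 2 * (t - k0) / h ** 2
    b = -(k1 - t) * (t - k0) ** 2 / h ** 2
    c = (k1 - t) ** 2 * (2.0 * (t - k0) + h) / h ** 3
    d = (t - k0) ** 2 * (2.0 * (k1 - t) + h) / h ** 3
    return ds0 * a + ds1 * b + s0 * c + s1 * d

# Knots at 0 and 2, values 1 and 3, slopes 0.5 and -1 (illustrative).
print(spline_segment(0.0, 0.0, 2.0, 1.0, 3.0, 0.5, -1.0))  # -> 1.0 (value at k0)
print(spline_segment(2.0, 0.0, 2.0, 1.0, 3.0, 0.5, -1.0))  # -> 3.0 (value at k1)
```

The same basis evaluated with differentiated coefficients yields dS/dt, which is what supplies the derivative terms needed in the rearranged mixing model.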

ACKNOWLEDGMENT

The authors would like to acknowledge the support of their industrial partners at TVK Ltd., especially M. Nemeth and Dr. G. Nagy.

R EFERENCES

[1] J. Abonyi, B. Feil, S. Nemeth, and P. Arva, "Modified Gath-Geva clustering for fuzzy segmentation of multivariate time-series," Fuzzy Sets Syst., Data Mining Special Issue, vol. 149, no. 1, pp. 39-56, Jan. 2005.
[2] J. Abonyi, S. Nemeth, C. Vincze, and P. Arva, "Process analysis and product quality estimation by self-organizing maps with an application to polyethylene production," Comput. Ind., Special Issue on Soft Computing in Industrial Applications, vol. 52, no. 3, pp. 221-234, Dec. 2003.
[3] I. Ajtonyi and A. Ballagi, "Integration of DCS in the complex producing system with Wonderware FactorySuite 2000 MMI software package," in Distributed Control Systems 7th Meeting, Miskolc, Hungary, 2001.
[4] M. J. Bagajewicz and Q. Jiang, "Comparison of steady state and integral dynamic data reconciliation," Comput. Chem. Eng., vol. 24, no. 11, pp. 2367-2383, Nov. 2000.
[5] C. Ballard, D. Herreman, D. Schau, R. Bell, E. Kim, and A. Valencic, Data Modeling Techniques for Data Warehousing. White Plains, NY: Int. Tech. Support Org., IBM, 1998, p. 25. [Online]. Available: http://www.redbooks.ibm.com
[6] S. Bergamaschi, S. Castano, M. Vincini, and D. Beneventano, "Semantic integration of heterogeneous information sources," Data Knowl. Eng., vol. 36, no. 3, pp. 215-249, Mar. 2001.
[7] T. Binder, L. Blank, W. Dahmen, and W. Marquardt, "On the regularization of dynamic data reconciliation problems," J. Process Control, vol. 12, no. 4, pp. 557-567, 2002.
[8] G. Capocaccia, "Intellution production is the heart of manufacturing e-business, i-Historian," in Distributed Control Systems 7th Meeting, Miskolc, Hungary, 2001.
[9] N. P. Cheremisinoff, Encyclopedia of Engineering Materials Part A: Polymer Science and Technology, vol. 1. New York: Marcel Dekker, 1989.
[10] S. R. Collins, S. Navathe, and L. Mark, "XML schema mappings for heterogeneous database access," Inf. Softw. Technol., vol. 44, no. 4, pp. 251-257, 2002.
[11] A. G. de Araujo Goes, M. A. B. Alvarenga, and P. F. F. Frutuoso E Melo, "NAROAS: A neural network-based advanced operator support system for the assessment of systems reliability," Reliab. Eng. Syst. Saf., vol. 87, no. 1, pp. 149-161, 2005.
[12] F. Doymaz, J. Chen, A. Romagnoli, and A. Palazoglu, "A robust strategy for real-time process monitoring," J. Process Control, vol. 11, no. 4, pp. 343-359, 2001.
[13] Z. H. Abu el zeet, V. M. Becerra, and P. D. Roberts, "Combined bias and outlier identification in dynamic data reconciliation," Comput. Chem. Eng., vol. 26, no. 6, pp. 921-935, 2002.
[14] S. Fle, "Integration of distributed and enterprise control systems," in Distributed Control Systems 5th Meeting, Miskolc, Hungary, 1999.
[15] N. Giannadakis, A. Rowe, M. Ghanem, and Y.-K. Guo, "InfoGrid: Providing information integration for knowledge discovery," Inf. Sci., vol. 155, no. 3-4, pp. 199-226, Oct. 2003.
[16] S.-H. Huang, J.-X. Qian, and H.-H. Shao, "Human-machine cooperative control for ethylene production," Artif. Intell. Eng., vol. 9, no. 3, pp. 203-209, 1995.
[17] W. H. Inmon, Building the Data Warehouse, 3rd ed. New York: Wiley, 2002.
[18] C. M. Jeackle and J. F. MacGregor, "Product design through multivariate statistical analysis of process data," AIChE J., vol. 44, no. 5, pp. 1105-1118, May 1998.
[19] X. Jiang, M. T. Khasawneh, R. Master, S. R. Bowling, A. K. Gramopadhye, B. J. Melloy, and L. Grimes, "Measurement of human trust in a hybrid inspection system based on signal detection theory measures," Int. J. Ind. Ergon., vol. 34, no. 5, pp. 407-419, 2004.
[20] V. P. Barbosa, Jr., M. R. M. Wolf, and R. M. Fo, "Development of data reconciliation for dynamic nonlinear system: Application to the polymerization reactor," Comput. Chem. Eng., vol. 24, no. 2, pp. 501-506, Jul. 2000.
[21] R. Kimball, The Data Warehouse Toolkit. New York: Wiley, 1996.
[22] S. Lakshminarayanan, H. Fujii, B. Grosman, E. Dassau, and D. R. Lewin, "New product design via analysis of historical databases," Comput. Chem. Eng., vol. 24, no. 2-7, pp. 671-676, 2000.
[23] S. Lane, E. B. Martin, R. Kooijmans, and A. J. Morris, "Performance monitoring of a multi-product semi-batch process," J. Process Control, vol. 11, no. 1, pp. 1-11, Feb. 2001.
[24] C. Lindheim and K. M. Lien, "Operator support systems for new kinds of process operation work," Comput. Chem. Eng., vol. 21, no. 6, pp. 113-118, May 1997.
[25] J. F. MacGregor and T. Kourti, "Statistical process control of multivariate processes," Control Eng. Pract., vol. 3, no. 3, pp. 403-414, 1995.
[26] J. Madar, J. Abonyi, H. Roubos, and F. Szeifert, "Incorporating prior knowledge in a cubic spline approximation—Application to the identification of reaction kinetic models," Ind. Eng. Chem. Res., vol. 42, pp. 4043-4049, 2003.
[27] A. Mjaavatten and B. A. Foss, "A modular system for estimation and diagnosis," Comput. Chem. Eng., vol. 21, no. 11, pp. 1203-1218, 1997.
[28] Y. Moteki and Y. Arai, "Operation planning and quality design of a polymer process," in Proc. IFAC Symp. Dynamics and Control Reactors and Distillation Columns (DYCORD), Bournemouth, U.K., 1986, pp. 159-165.
[29] C. A. O. Nascimento, R. Giudici, and N. Scherbakoff, "Modeling of industrial nylon-6,6 polymerization process in a twin-screw extruder reactor. II. Neural networks and hybrid models," J. Appl. Polym. Sci., vol. 723, pp. 905-912, 1999.
[30] M. Norgaard, N. Poulsen, and O. Ravn, "New developments in state estimation for nonlinear systems," Automatica, vol. 36, no. 11, pp. 1627-1638, Nov. 2000.
[31] N. W. Paton, C. A. Goble, and S. Bechhofer, "Knowledge based information integration systems," Inf. Softw. Technol., vol. 42, no. 5, pp. 299-312, Apr. 2000.
[32] D. C. Psichogios and L. H. Ungar, "A hybrid neural network-first principles approach to process modeling," AIChE J., vol. 38, no. 10, pp. 1498-1511, Oct. 1992.
[33] R. J. Schalkoff, Artificial Neural Networks. New York: McGraw-Hill, 1997.
[34] J. Schubert, R. Simutis, M. Dors, I. Havlik, and A. Lübbert, "Bioprocess optimization and control: Application of hybrid modeling," J. Biotechnol., vol. 35, pp. 51-68, 1994.
[35] B. Scotney and S. McClean, "Efficient knowledge discovery through the integration of heterogeneous data," Inf. Softw. Technol., vol. 41, no. 9, pp. 569-578, Jun. 1999.
[36] U. Seidl, "SIMATIC PCS 7: Efficient integration for tomorrow's DCS applications," in Distributed Control Systems 5th Meeting, Miskolc, Hungary, 1999.
[37] X. Z. Wang, Data Mining and Knowledge Discovery for Process Monitoring and Control. London, U.K.: Springer-Verlag, 1999.
[38] H. Wehr, "Integrating heterogeneous data sources into federated information systems," in Proc. 4th Eur. GCSE Young Researchers Workshop, Erfurt, Germany, Oct. 2002, pp. 1-11. IESE-Rep. 053.02/E, Fraunhofer IESE.
[39] R. Wennersten, R. Narfeldt, A. Granfors, and S. Sjokvist, "Process modelling in fault diagnosis," Comput. Chem. Eng., vol. 20, pp. 665-670, 1996.
[40] D. Zahay, A. Griffin, and E. Fredericks, "Sources, uses, and forms of data in the new product development process," Ind. Mark. Manage., vol. 33, pp. 657-666, 2004.
[41] H.-J. Zander, R. Dittmeyer, and J. Wagenhuber, "Dynamic modeling of chemical reaction systems with neural networks and hybrid models," Chem. Eng. Technol., vol. 22, no. 7, pp. 571-574, Jul. 1999.
[42] H. Zhao and S. Ram, "Entity identification for heterogeneous database integration—A multiple classifier system approach and empirical evaluation," Inf. Syst., vol. 30, no. 2, pp. 119-132, Apr. 2005.

Ferenc Peter Pach received the M.A. degree in information technology from the Faculty of Information Technology of the University of Veszprem, Hungary, in 2004. Since September 2004, he has been working towards the Ph.D. degree at the Department of Process Engineering at the University of Veszprem. His current research interests include data warehousing, decision support, knowledge discovery, and data mining (association-rule mining and rule-based classification).

Balazs Feil received the M.Eng. degree in chemical engineering from the University of Veszprem, Hungary, in 2003. Currently, he is working towards the Ph.D. degree at the Department of Process Engineering at the University of Veszprem. His research interests include data mining [especially (fuzzy) clustering techniques] and its applications in process engineering.

Sandor Nemeth received the M.Eng. and Dr.Univ. degrees in chemical engineering from the University of Veszprem, Hungary, in 1988 and 1996, respectively. From 1993 to 1994, he was a Research Fellow at the Department of Material Science and Processes (Unit of Processes) at the Universite Catholique de Louvain, Belgium. Currently, he is an Associate Professor at the Department of Process Engineering at the University of Veszprem. He has coauthored about 15 journal papers. His research interests include modeling of polymerization processes and computer-aided process engineering.

Peter Arva received the M.Eng. and Ph.D. degrees in chemical engineering from the University of Veszprem, Hungary. Currently, he is a Professor at the Department of Process Engineering at the University of Veszprem. He has coauthored about 50 journal papers. His research interests include modeling and simulation of chemical and biotechnological processes, and application of artificial-intelligence (AI) methods in process modeling and design.

Janos Abonyi received the M.Eng. and Ph.D. degrees in chemical engineering from the University of Veszprem, Hungary, in 1997 and 2000, respectively. From 1999 to 2000, he was a Research Fellow at the Control Laboratory at Delft University of Technology, The Netherlands. Currently, he is the Head of the Department of Process Engineering at the University of Veszprem. He has coauthored more than 50 journal papers and chapters in books and has published the research monograph Fuzzy Model Identification for Control (Boston, MA: Birkhauser, 2003). His research interests include process engineering, data mining, and the use of fuzzy models, genetic algorithms, and neural networks (NNs) in nonlinear system identification and control.

