2009 International Conference on Intelligent Human-Machine Systems and Cybernetics

FEDC: A Framework for Field Ecological Data Collection and Management
Binglin Wang1,2, Yuanchun Zhou1, Jie Cheng1,2, Xuezhi Wang1, Jianhui Li1, Baoping Yan1
1 Computer Network Information Center, Chinese Academy of Sciences
2 Graduate University of Chinese Academy of Sciences
[email protected], {yczhou,chengjie,wxz,lijh,ybp}@sdb.cnic.cn

978-0-7695-3752-8/09 $25.00 © 2009 IEEE. DOI 10.1109/IHMSC.2009.110

Abstract— Field ecological data collection generates large volumes of heterogeneous data from equipment comprising hundreds to thousands of sensors and cameras, and collecting and managing these data is a major challenge. In this paper, we propose a five-layer framework named FEDC for field ecological data collection and management. FEDC is a hybrid, partly distributed framework that is designed to be scalable, real-time and self-adaptive. It has been used successfully in the ChinaFLUX [1] project, and the results show that the FEDC framework performs well in managing and transmitting field ecological data.

Keywords: semi-distributed transportation; data collection; data management.

I. INTRODUCTION
The Chinese Terrestrial Ecosystem Flux Research Network (ChinaFLUX) [1] is a long-term national network of micrometeorological flux measurement sites that measure the net exchange of carbon dioxide, water vapor, and energy between the biosphere and the atmosphere. The ChinaFLUX network includes 8 observation sites (10 ecosystem types) and encompasses a large range of latitudes (21°57′N to 44°30′N), altitudes, climates and species. It relies on the existing Chinese Ecosystem Research Network (CERN), fills an important regional gap, and increases the number of ecosystem types in FLUXNET.

The 8 observation sites are located in different parts of China, and their network connectivity is generally poor. Each site hosts many pieces of equipment that collect different kinds of data. One of the biggest challenges in managing such a system is the human effort needed to configure, deploy and monitor thousands of heterogeneous devices. The heterogeneous data generated by the sensor equipment must also be managed and transmitted to the central server. Traditionally these data are handled manually: when experts need data, the files are mailed to them over the Internet for preservation and analysis, which is inefficient and often means that the experts do not receive the data in time.

We propose a framework named FEDC to collect, transport, manage and analyze the data generated in field ecological data collection. It is a semi-distributed architecture that is also stable for ecological data collection and management. The framework uses a self-registry method to add new metadata to the system: all metadata related to the equipment can be extracted and registered on the central server. FEDC uses the SRB technique to manage the data distributed across the field stations. Part of the data in each field station is transmitted to the central server, and for the remaining files only the metadata are registered there. We use GIS to visualize the status of the equipment, so that we can see on the map whether all the equipment is working normally. We believe that this framework will be applicable to many other large-scale sensor network projects.

The structure of this paper is as follows. Section II reviews related projects similar to ChinaFLUX. Section III describes the framework and related techniques, and Section IV gives conclusions.

II. RELATED WORKS
In recent years, many projects have addressed data collection and environmental monitoring, such as the Real-time Observatories, Applications, and Data management Network (ROADNet) [2], the National Ecological Observatory Network (NEON), the Ocean Research Interactive Observatory Networks (ORION) [3], and USArray [4]. Some of these projects use data grid technology to transmit and manage data such as images, videos, and other data generated in field ecological data collection. In the ChinaFLUX project, however, some field stations have poor network conditions, and part of the data must be transmitted to the central server in Beijing over a long-distance Internet link, so data grid technology alone cannot solve all the problems or deliver good performance. We therefore propose the semi-distributed FEDC framework, and our results show that it improves the efficiency and performance of field data collection and management.

III. OVERVIEW OF FEDC
The FEDC framework is composed of five layers: the data collection layer, the data transportation layer, the data management layer, the data analysis layer, and the data visualization layer.

Authorized licensed use limited to: THE LIBRARY OF CHINESE ACADEMY OF SCIENCES. Downloaded on December 9, 2009 at 21:12 from IEEE Xplore. Restrictions apply.

A. Data Collection
There are two kinds of data in the ChinaFLUX project: flux data generated by many sensors, including CO2 and H2O sensors, and video data generated by cameras installed in the field station. All the flux sensors in the field station are connected to LoggerNet [5]. The LoggerNet server stores the data in a cache and writes them to ASCII files. A datalogger acquires and processes the data collected from the various sensors.

Every field station has several cameras that monitor the growth of selected plants and environmental change in the station. The video data fall into two categories: real-time video, which is transmitted to the central server, and backup data, which are stored on the field station server. Several objects need to be monitored, but the number of cameras is limited, and some plants change so slowly that there is no need to store every video frame. In FEDC, we therefore set a timer for every object. When the timer fires, the data collection module sends a command to the camera, which rotates to focus on the object, takes a picture of its current status, and stores the picture on the field station server.
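The per-object timers described above can be sketched as a small priority queue of due times. This is only an illustration under assumptions: the object names, the capture intervals and the notion of a camera "preset" are hypothetical and do not come from the FEDC implementation, and simulated time is used instead of a real clock.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Due:
    at: int                                # simulated time (seconds) when the timer fires
    name: str = field(compare=False)       # monitored object (hypothetical name)
    interval: int = field(compare=False)   # seconds between two pictures of this object
    preset: int = field(compare=False)     # hypothetical camera position for this object


def run_schedule(targets, horizon):
    """Return the (time, name, preset) captures due within `horizon` seconds.

    `targets` maps an object name to (interval_seconds, camera_preset).
    """
    queue = [Due(interval, name, interval, preset)
             for name, (interval, preset) in targets.items()]
    heapq.heapify(queue)
    captures = []
    while queue and queue[0].at <= horizon:
        due = heapq.heappop(queue)
        # A real collector would rotate the camera and store a picture here.
        captures.append((due.at, due.name, due.preset))
        heapq.heappush(queue, Due(due.at + due.interval, due.name,
                                  due.interval, due.preset))
    return captures


captures = run_schedule({"seedling_plot": (600, 1), "canopy": (1800, 2)},
                        horizon=1800)
```

Slowly changing objects simply get a longer interval, so one camera can be shared among several objects without storing every frame.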

Figure 1. The FEDC framework overview

Data Collection Layer. This layer is responsible for collecting the data generated by sensors and cameras and storing them on the field station server. In the ChinaFLUX project, these data are CO2, H2O and heat fluxes between vegetation and the atmosphere in typical Chinese ecosystems, together with video data generated by cameras. We use the application programming interfaces provided by the equipment vendors to connect to the equipment, retrieve the data, and store them on the field station server.

Data Transportation Layer. Much of the data must be transmitted to the central server for preservation and analysis, and data management also builds on this layer, so the data transportation layer is the core of the FEDC framework. It is divided into two parts: one is responsible for real-time transportation, and the other for data transportation between the field stations.

B. Data Transportation
We define C as the central server located in Beijing, S as a sensor, camera or other piece of equipment, and T as the field station server located in the field station.

Central Server (C). In the ChinaFLUX project, the central server is located in Beijing and is responsible for data analysis and visualization. The data transmitted to it consist of sensor data and a huge amount of video data. Because most of these data are real-time, processing them, and especially streaming the video to web users, places a heavy load on the central server. The central server is the core of the FEDC framework: if it fails, the whole process of field ecological data collection stops working. To increase stability, the central server therefore consists of several servers with different responsibilities.

Data Management Layer. Only part of the data collected from the sensors and cameras needs to be transmitted to the central server. We need a layer to manage the data stored on both the observation servers and the central server. These data are semi-distributed across the field stations, with the central server as the central node of the semi-distributed network. To manage them efficiently, the FEDC framework uses the SRB technique.

Control Server. The control server distributes incoming data to the data server and the streaming server, and sends control commands to the observation server to control the sensors and cameras.

Data Server. The data server performs data backup. It hosts a central database in which all the data transmitted from the field station servers are stored.

Streaming Server. Because the ecological environment and the equipment in the field stations must be observed in real time, a streaming server streams the real-time data transmitted from the field station servers to users, including web users and scientists.
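As a rough illustration of how the control server could split incoming data between the data server and the streaming server, the sketch below archives every record and additionally queues records flagged as real-time. The record layout and the `realtime` flag are assumptions made for this example; the paper does not specify the actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ControlServer:
    archived: list = field(default_factory=list)  # stands in for the data server database
    streamed: list = field(default_factory=list)  # stands in for the streaming server queue

    def dispatch(self, record):
        """Archive every record and also stream the ones flagged as real-time."""
        self.archived.append(record)
        if record.get("realtime"):
            self.streamed.append(record)


server = ControlServer()
server.dispatch({"id": 1, "kind": "video", "realtime": True})
server.dispatch({"id": 2, "kind": "flux", "realtime": False})
```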

Data Analysis Layer. The data are collected in order to be analyzed and visualized. This layer applies data mining and mathematical methods to the data, and the results are presented to scientists and other users in ecological research.

Data Visualization Layer. Large amounts of flux data and video data are collected in real time. We use web and streaming media technologies to visualize the data and present them to web users and other users who need the results.


Web Server. Many of our techniques, including GIS and the real-time visualization of video and flux data, are based on this server.

Observation Server (T). In the ChinaFLUX project, the sensors and cameras are managed mainly by the field station server, also called the observation server, located in the field station. It connects to all the equipment through a local area network (LAN), which ensures that the data reach the observation server steadily and quickly. Because not all data need to be transmitted to the central server, much of the data is stored on the observation server for future use.

The Data Transportation. The observation server T collects the flux data and video data generated by the sensors and cameras (S) over the LAN. If the data are real-time data that must be transmitted to the central server, T connects to the central server and sends them. Data that need not be transmitted are stored on T for future use. The equipment can also be controlled from the central server: the central server sends a command to T, T forwards it to the equipment S, and S responds according to the kind of command it receives.
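The flow between S, T and C described above can be sketched as follows. This is a minimal in-memory model, not the FEDC code: the class names, the `uplink` callable that stands in for the connection to C, and the command strings are all hypothetical.

```python
class Equipment:
    """S: a sensor or camera that reacts to commands."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def handle(self, command):
        self.received.append(command)
        return f"{self.name}: {command} ok"


class ObservationServer:
    """T: the field station server between the equipment and the central server."""

    def __init__(self, equipment, uplink):
        self.equipment = {item.name: item for item in equipment}
        self.uplink = uplink        # callable that delivers a record to C (assumed)
        self.local_store = []       # data kept at the station for future use

    def on_data(self, record, realtime):
        """Forward real-time data to C and keep everything else locally."""
        if realtime:
            self.uplink(record)
        else:
            self.local_store.append(record)

    def relay(self, target, command):
        """Pass a command received from C on to the named equipment S."""
        return self.equipment[target].handle(command)


central_inbox = []                  # stands in for the central server C
station = ObservationServer([Equipment("cam1")], central_inbox.append)
station.on_data({"frame": 1}, realtime=True)
station.on_data({"frame": 2}, realtime=False)
reply = station.relay("cam1", "rotate")
```

Keeping the relay on T means the central server never has to reach individual devices on the station LAN, which matches the C to T to S command path in the text.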

Figure 2. Data Management Architecture in FEDC

1) Metadata Registry and Location
In the FEDC framework, all metadata are registered in the metadata database on the central server, so every query only needs to be sent to the central server, which returns the results very quickly. To register all the metadata [8] on the central server, we propose a metadata self-registry method consisting of the three steps below.
a) Extract the metadata. Every data file has its own metadata, which is found in its file name, its file attributes, and other related objects. A module in the FEDC framework extracts all the metadata related to the file.
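A minimal sketch of the extraction step might look like the following, assuming a hypothetical file naming convention of the form `<station>_<instrument>_<YYYYMMDD>.<ext>`. The paper does not state the real convention, so the pattern, the field names and the sample file name are illustrative only.

```python
import os
import re
import tempfile

# Hypothetical naming convention: <station>_<instrument>_<YYYYMMDD>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<station>[A-Za-z]+)_(?P<instrument>[A-Za-z0-9]+)_(?P<date>\d{8})\.\w+$"
)


def extract_metadata(path):
    """Collect metadata from the file name and from file-system attributes."""
    match = NAME_PATTERN.match(os.path.basename(path))
    if match is None:
        raise ValueError(f"unrecognised file name: {path}")
    info = os.stat(path)
    metadata = match.groupdict()
    metadata["size_bytes"] = info.st_size
    metadata["modified"] = info.st_mtime
    return metadata


# Usage with a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "CBS_co2_20090101.dat")
    with open(sample, "w") as handle:
        handle.write("1,2,3")
    metadata = extract_metadata(sample)
```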

C. Data Management
There are two kinds of data: data stored on the central server and data stored on the observation servers. The data collection module collects the camera and sensor data from the equipment. Data that need not be transmitted to the central server are stored in the field station server database. To manage these heterogeneous data located in different field stations, we use the SDSC Storage Resource Broker (SRB) [6] technique. SRB is client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and accessing unique or replicated data objects. In conjunction with the Metadata Catalog (MCAT), it provides a way to access data sets and resources based on their logical names or attributes rather than their names and physical locations. iRODS [7] is middleware based on the SRB technique, and we use it to manage the heterogeneous data distributed across the field stations in the FEDC framework.

We need to manage two kinds of data in the ChinaFLUX project: equipment status monitoring data, and data generated by sensors and cameras. An iRODS iCAT server runs on the central server node, and each field station has an iRODS data server that indexes all the data on the field station server. The iCAT is the iRODS catalog, stored in a database using a database management system (DBMS). Figure 2 shows the basic data management architecture of FEDC.
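The idea of addressing data by logical name or attributes instead of physical location can be illustrated with a toy catalog. This is not the iRODS iCAT or the SRB MCAT interface; the class, its methods, the station names and the paths are hypothetical and only show the kind of mapping that a central catalog maintains.

```python
class Catalog:
    """Toy central catalog that maps logical names to physical locations."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, station, physical_path, **attributes):
        """Record where a data object lives and which attributes describe it."""
        self._entries[logical_name] = {
            "station": station,
            "path": physical_path,
            "attributes": attributes,
        }

    def locate(self, logical_name):
        """Return (station, physical path) for a logical name."""
        entry = self._entries[logical_name]
        return entry["station"], entry["path"]

    def find(self, **criteria):
        """Return the logical names whose attributes match all criteria."""
        return sorted(
            name for name, entry in self._entries.items()
            if all(entry["attributes"].get(key) == value
                   for key, value in criteria.items())
        )


catalog = Catalog()
catalog.register("flux/2009/co2.dat", "station_a", "/data/co2.dat", kind="flux")
catalog.register("video/2009/plot1.jpg", "station_b", "/data/plot1.jpg", kind="video")
flux_files = catalog.find(kind="flux")
location = catalog.locate("flux/2009/co2.dat")
```

A user who knows only the logical name or an attribute can ask the central catalog first and then fetch the file from the station it names, so files that stay at the field stations remain discoverable.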

b) Ingest the metadata. The metadata must be organized into a specific format so that it can be registered on the central server.
c) Commit the metadata. Once the metadata is in a format that the iRODS server accepts, we use micro-services, which are small, well-defined procedures or functions that each perform a certain task in iRODS, to register the metadata in the iRODS iCAT server.

2) Resource Location
Information about all the data on the central server and in the field stations is stored in the metadata database of the central iRODS server. A query proceeds as follows.
Step 1, the user asks for data. The user sends a query to the central server C to obtain information about the data, including its location and metadata.
Step 2, the data request goes to the SRB server. The central server C processes the query sent by the user and redirects it to the SRB server to get the results.
Step 3, the SRB server looks up the information in its database. The SRB server queries its metadata database for the metadata of the requested data and tells the central server C which field station server holds the data.


Step 4, get the data from the field station server. After receiving the query results, the central server returns them to the user U, who can then send a request to the field station server T indicated in the results. If the user has sufficient rights, T sends the data to the user.

D. Data Analysis and Visualization
After the data have been transmitted to the central server, we analyze the data stored in the central server database. In the FEDC framework, we use GIS to visualize the monitoring data and Flash charts to visualize the real-time flux data. Figure 3 shows Flash charts of the real-time wind speed and CO2 flux data, and Figure 4 shows a map that visualizes the status of the local area network in the field station.

IV. CONCLUSIONS
In this paper, we proposed FEDC, a framework for large-scale field ecological data collection in the ChinaFLUX project. The framework manages and transmits heterogeneous data generated by different kinds of collection equipment. We implemented the data collection, transportation, management, analysis and visualization layers, each of which has its own tasks in field ecological data collection, and the results show that the framework performs well in field ecological data collection and management.

ACKNOWLEDGEMENTS
We would like to thank Professor Honglin He and Doctor Xuefa Wen of the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, for their open ideas, discussion, cooperation, and contribution. This work was supported by the Knowledge Innovation Program of the Chinese Academy of Sciences (No.O815021108).

Figure 3. The real-time visualization of flux data.

Figure 4. The visualization of monitor data.

REFERENCES
[1] GR Yu, XF Wen, XM Sun, BD Tanner, X Lee, JY Chen. Overview of ChinaFLUX and evaluation of its eddy covariance measurement. Agricultural and Forest Meteorology, 2006.
[2] F Vernon, T Hansen, K Lindquist, B Ludaescher. ROADNET: A Realtime Data Aware System for Earth, Oceanographic, and Environmental Applications. Eos Transactions (American Geophysical Union fall meeting
[3] BM Howe, T McGinnis. Sensor networks for cabled ocean observatories. Underwater Technology, 2004.
[4] C Cotofana, L Ding, P Shin, S Tilak, T Fountain. An SOA-based Framework for Instrument Management for Large-scale Observing Systems (USArray Case Study). Proceedings of the IEEE International Conference on Web
[5] MT Ritsche, DJ Holdridge, R Pearson. New and Improved Data Logging and Collection System for Atmospheric Radiation Measurement Climate Research Facility, Tropical Western Pacific, and North Slope of Alaska Sky Radiation, Ground Radiation, and MET Systems. Fifteenth Atmospheric Radiation Measurement, 2005.
[6] C Baru, R Moore, A Rajasekar, M Wan. The SDSC storage resource broker. Proceedings of the 1998 conference of the Centre for Advanced Studies on Collaborative research, 1998.
[7] A Rajasekar, R Moore, F Vernon. iRODS: A Distributed Data Management Cyberinfrastructure for Observatories. American Geophysical Union, Fall Meeting 2007.
[8] S Weibel. Metadata: the foundations of resource description. portal.acm.org, 1995.

