Choi - 2010 - Adaptive Sensing Technique to Maximize Spectrum ...



IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 59, NO. 2, FEBRUARY 2010

optimizes the link-layer PDR and the physical-layer PER to improve the overall PLR and, hence, the system performance. Furthermore, it distributes the excess packet losses according to predefined weights to maintain fairness more effectively and flexibly. The simulation results show that the CEPS algorithm can greatly decrease the PLR experienced by users, particularly when the traffic demand is heavy, and can support more users in the system while meeting a specified PLR objective. The simulation results also show that CEPS is effective and more flexible in maintaining fairness among different users.

Adaptive Sensing Technique to Maximize Spectrum Utilization in Cognitive Radio

Kae Won Choi

Abstract—A cognitive radio (CR) system exploits spectrum bands that primary users (PUs) are licensed to use. The CR performs channel sensing to find spectrum opportunities. Conventional periodic sensing schemes require a long sensing time to detect a weak signal from a PU whose channel usage varies rapidly. Since the CR network must be quiet during a sensing period, a long sensing time results in low spectrum utilization. To improve spectrum utilization, we propose a novel sensing scheme that adaptively decides whether to sense the channel or to transmit user data based on previous sensing results. The simulation results show that the proposed scheme significantly outperforms the conventional scheme.

Index Terms—Cognitive radio (CR), energy detection, opportunistic spectrum access, partially observable Markov decision process (POMDP).

I. INTRODUCTION

The cognitive radio (CR) opportunistically accesses spectrum bands that primary users (PUs) are licensed to use, under the condition that the CR does not interfere with the PU [1]. To avoid interference between the CR and the PU, the CR exploits spectrum opportunities, each defined as a frequency channel that is temporarily not used by the PU. The CR performs channel sensing to find spectrum opportunities. Generally, the CR adopts the periodic sensing strategy depicted in Fig. 1(a) (e.g., [2]–[8]). Using this strategy, the CR periodically senses the current operating channel to monitor the PU activity. If the CR detects the PU on the operating channel, then it switches to another channel to find a new spectrum opportunity.

The periodic sensing scheme may work well in a favorable environment; however, its performance degrades significantly under certain adverse conditions. For example, if the CR is required to detect very weak PU signals to provide strong protection to the PUs, then the sensing time must be prolonged to guarantee a sufficiently low detection error probability. Since the CR is not allowed to transmit any signal during a sensing period, a long sensing time results in low spectrum utilization. Moreover, the CR may intend to access very short spectrum opportunities caused by the bursty arrival of PU application traffic. Recently, there have been studies on exploiting temporal spectrum opportunities that last on the order of milliseconds (e.g., [6]–[9]). In this environment, the CR must sense very frequently to keep up with the variations in the PU state. Therefore, fast PU state variation, combined with weak PU signals, can cause the CR to waste a large portion of its time on channel sensing.

In this paper, we propose an “adaptive sensing CR,” as depicted in Fig. 1(b), which significantly reduces the sensing overhead in such adverse environments. In contrast with the “periodic sensing CR” shown in Fig. 1(a), the adaptive sensing CR determines whether to sense the channel or to transmit data at consecutive decision epochs. By means of this adaptive decision process, the CR performs channel sensing only when it is needed, and unnecessary sensing is therefore avoided. The adaptive sensing CR makes each decision based on the previous sensing results. At a decision epoch after sensing, the CR can immediately stop sensing and transmit data if the sensing results strongly indicate that the channel is vacant. More sensing is required only when the sensing results are inconclusive.

Fig. 1. Periodic sensing CR and adaptive sensing CR. (a) Periodic sensing CR. (b) Adaptive sensing CR.

We consider energy detection [10] as the channel-sensing method. After each sensing period, the energy detector produces a real-valued test statistic, which is the total estimated energy during the sensing period. Typically, the periodic sensing CR makes a hard decision on whether the PU is active by comparing the test statistic with a threshold. Therefore, the sensing period of the periodic sensing CR must be long enough that the decision regarding the PU activity is reliable. In contrast, we set the sensing period of the adaptive sensing CR to be much shorter than that of the periodic sensing CR, since a shorter sensing period enhances the adaptability of the CR by increasing the frequency of decision epochs. Because of the short sensing period, the adaptive sensing CR cannot accurately determine the PU activity from a single test statistic. Therefore, the adaptive sensing CR does not make a hard decision from each test statistic but instead takes the test statistic as a soft “sensing result.” Since such a sensing result is noisy, the adaptive sensing CR jointly takes into account multiple sensing results and combines them to generate reliable information regarding the PU activity.

At each decision epoch, a “decision-making algorithm” decides whether to sense the channel or to transmit data. The aim of the decision-making algorithm is to maximize spectrum utilization while restricting interference to the PU. To design the optimal algorithm that achieves this goal, we use the partially observable Markov decision process (POMDP) framework [11], [12]. In summary, the proposed CR has the following distinctive features in comparison with the conventional periodic sensing CR:
1) By adopting the adaptive sensing structure, the proposed CR avoids unnecessary sensing.
2) The proposed CR combines multiple soft sensing results from short sensing periods to enhance adaptability.
3) Adaptive decisions are made by the optimal decision-making algorithm, which is designed using the POMDP framework.

We present simulation results showing that the proposed CR significantly outperforms the periodic sensing CR in terms of channel utilization. This performance gain is attributable to the features listed above.

The rest of this paper is organized as follows: Some related works are briefly described in Section II. Section III introduces the adaptive sensing CR. In Section IV, we design the decision-making algorithm by using the POMDP framework. Section V presents some numerical results, and Section VI concludes this paper.

Manuscript received March 15, 2009; revised July 4, 2009. First published November 13, 2009; current version published February 19, 2010. The review of this paper was coordinated by Dr. S. Wei.

The author was with the Telecommunication Business, Samsung Electronics, Suwon 443-742, Korea. He is now with the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB R3T 5V6, Canada.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVT.2009.2036631

0018-9545/$26.00 © 2010 IEEE

Authorized licensed use limited to: UNIVERSITY OF MANITOBA. Downloaded on April 27,2010 at 17:59:45 UTC from IEEE Xplore. Restrictions apply.

II. RELATED WORK

Several studies have investigated adaptive spectrum sensing and medium access control protocols for CR networks (e.g., [3]–[8], [13], and [14]). In [3], the authors proposed an algorithm that finds the optimal sensing parameters for each frequency channel and chooses the best frequency channels to maximize capacity. Zhao and Chen [4] developed a frequency channel-selection strategy by formulating the CR as a POMDP. In [5], they also showed that a simple round-robin channel-selection strategy is close to optimal. The authors of [6]–[8] suggested a statistical model of the PU activity based on empirical data and proposed channel-access strategies that maximize the channel utilization while limiting interference to the PU. In [13] and [14], the authors proposed adaptive sensing methods that select the frequency channels based on historical information about the PU channel usage. All these studies deal with the frequency channel-selection problem.

These existing works and our work address very different problems arising in CR design. While the existing schemes aim to find the frequency channels most likely to be vacant, the proposed scheme focuses on efficiently utilizing the sensing results and adaptively allocating the sensing periods to minimize the sensing overhead. The existing schemes can only select the frequency channel; they cannot adaptively choose whether to perform channel sensing or to transmit data. Therefore, the existing schemes constitute periodic sensing CRs, and they cannot reduce the sensing overhead. Among the existing works, [4] and [5] also make use of the POMDP framework. Although these works take the same mathematical approach as ours, their focus is quite different, as noted above.

In addition to the energy detector, the cyclostationary feature detector [15] is also considered as a candidate sensing method for CR networks.
Although the cyclostationary feature detector has a long detection time and high computational complexity, it can detect a much weaker PU signal than the energy detector, owing to its robustness to noise uncertainty. In [16], we applied the sequential detection framework [17] to the cyclostationary feature detector to reduce the required detection time. Note that the adaptive sensing CR can also be implemented on top of the cyclostationary feature detector by using some of the techniques proposed in [16].

III. ADAPTIVE SENSING COGNITIVE RADIO

A. System Model

Consider N frequency channels that the PU is licensed to use. The CR network is allowed to access a channel when it is not occupied by the PU. A collision occurs when the CR network transmits a signal on a channel currently used by the PU, and the collision probability should be restricted to a certain level. For the CR, we consider a small-scale network such as a wireless personal area network. The CR network has a star topology, in which a “master node (MN)” is located at the center of the network and “slave nodes (SNs)” are attached to the MN. Only the MN performs channel sensing; the SNs do not. Based on the sensing results, the MN decides the next action at each decision epoch, and it orders the SNs to follow its decision by sending a control signal. Upon receiving the control signal, the SNs follow the order it contains. All the CR nodes are tuned to the same one of the N frequency channels, which is designated as the “operating channel.”


The CR network utilizes the operating channel until it is reclaimed by the PU. On the operating channel, the CR nodes exchange user data. In addition, the MN senses the operating channel to monitor the PU activity. If the PU is detected, then the MN switches the operating channel to another channel and directs the SNs to move to the new operating channel.

B. Adaptive Sensing Structure

As illustrated in Fig. 1(b), the MN chooses its next action at each decision epoch, which occurs at the end of each action. The decision epochs are indexed by $t = 1, 2, \ldots$. At each decision epoch, the MN selects the next action among data transmission, sensing, and channel switching. This decision is made by the decision-making algorithm described in Section IV. We now explain the operation of the CR network when each action is selected.

1) Data Transmission: If the MN is convinced that the operating channel is vacant, then it selects data transmission. In this case, the CR nodes exchange user data by using the time-division multiple-access scheme during a data-transmission period of length $T_D$. At the beginning of the data-transmission period, the MN sends the SNs a control signal containing the time allocation, and the SNs transmit or receive user data according to this allocation.

2) Sensing: When the MN is not sure whether the PU is present on the operating channel, it chooses sensing and performs energy detection on the operating channel during a sensing period of length $T_S$. The MN sends no control signal in this case, and the SNs must remain quiet whenever they do not receive a control signal. During the sensing period immediately after the tth decision epoch, the energy detector generates a soft sensing result $\xi_t$ from the input signal. If the bandwidth of a frequency channel is denoted by $W$, the energy detector takes $W \cdot T_S$ baseband complex signal samples during a sensing period. Let $y_{t,i}$ denote the ith signal sample in the sensing period immediately after the tth decision epoch. It is assumed that the MN knows the noise spectral density $N_o$. The energy detector estimates the energy in the signal samples and normalizes it by $N_o/2$ to derive the sensing result $\xi_t$ as follows:

$$\xi_t = \frac{1}{N_o/2} \sum_{i=1}^{W \cdot T_S} |y_{t,i}|^2. \tag{1}$$

To process the sensing result efficiently, the proposed scheme quantizes $\xi_t$ to produce the quantized sensing result $\zeta_t$. Let $M$ denote the number of quantization levels, and let $\tau_1 < \cdots < \tau_m < \cdots < \tau_{M+1}$ be the quantization thresholds. The MN finds the $m$ such that $\tau_m \le \xi_t < \tau_{m+1}$ and sets $\zeta_t = m$. To closely approximate the real-valued sensing result, the number of quantization levels $M$ should be sufficiently large. Fig. 2 shows examples of the probability mass functions (pmf's) of the quantized sensing result when the PU is inactive and when it is active. In this figure, the number of quantization levels is 20, and the thresholds are $\tau_1 = 0$, $\tau_m = 120 + 10 \cdot (m - 2)$ for $m = 2, \ldots, 20$, and $\tau_{21} = \infty$.

Fig. 2. Example pmf's of the quantized sensing result $\zeta_t$ when the PU is inactive and active. The bandwidth of a frequency channel W is 1 MHz, the length of a sensing period $T_S$ is 0.1 ms, and the SNR of the PU signal is −10 dB.

3) Channel Switching: If it is highly probable that a PU is present on the operating channel, then the MN selects channel switching and sends a control signal that orders the SNs to switch the operating channel. We assume that it takes $T_C$ to complete the channel-switching process and be ready to choose another action, since the CR nodes must retune their frequency band and perform synchronization. When channel switching is selected, the CR network simply moves to the next adjacent frequency channel; if the current operating channel is the last channel, it moves to the first frequency channel. Although this channel-selection strategy seems very simple, it corresponds to the suboptimal channel-selection strategy proposed in [5], which was shown there to be close to optimal.

Fig. 3. Example of the quantized sensing results over time.

C. Operation of Adaptive Sensing CR

We now describe the operation of the adaptive sensing CR and explain how it reduces the sensing overhead. Fig. 2 shows that the variance of the sensing result is very large because of the short sensing period. Therefore, even when the PU activity does not change, the CR observes a different random sensing result in each sensing period, as shown in the example of quantized sensing results over time in Fig. 3. In this figure, the PU is inactive throughout. The lengths of a data-transmission period and a sensing period are both 0.1 ms, and a time interval with no sensing result corresponds to a data-transmission period. In the example in Fig. 3, the CR senses the channel from 0.1 to 0.2 ms and obtains a quantized sensing result of 12. From Fig. 2, a quantized sensing result of 12 indicates that the channel is more likely to be occupied than vacant; therefore, the CR cannot be certain that the PU is inactive, and it continues sensing. The CR starts to transmit data at 0.6 ms after obtaining two consecutive low sensing results in the two sensing periods ending at 0.6 ms. In total, it takes five sensing periods, from 0.1 to 0.6 ms, to start data transmission. After one data-transmission period, the CR decides at 0.7 ms that more sensing is needed. From 0.7 to 0.8 ms, the CR fortunately obtains a low sensing result, i.e., 9. Thus, the CR quickly determines that the PU is inactive after only one sensing period, and it resumes data transmission at 0.8 ms. After 0.8 ms, a variable number of sensing periods is likewise required before data transmission resumes, depending on the sensing results.

This example shows that the proposed CR determines the appropriate sensing time from the sensing results, whereas the periodic sensing CR senses the channel for a fixed time period. Therefore, the proposed CR avoids unnecessary sensing and reduces the sensing overhead. This gain is similar to that achieved by the sequential detection algorithm [17], which is known to outperform conventional fixed-time detection algorithms by a wide margin, generally requiring one-half to one-third of their sensing time on average. A similar performance gain is expected for the proposed CR. However, the sequential detection algorithm is purely a detection method and makes no particular provision for the CR. For example, it does not take into account possible changes in the PU activity during the sensing periods, which the proposed CR explicitly considers. Moreover, the proposed CR adaptively decides when to stop data transmission for more sensing and when to switch the operating channel, as shown in the example in Fig. 3. The proposed CR thus integrates all the CR functionalities into one unified decision-theoretic framework, combining the known advantages of the sequential detection algorithm with the gains from adaptive decisions on data transmission and channel switching.

IV. DECISION-MAKING ALGORITHM

A. POMDP Formulation

The decision-making algorithm decides the next action among data transmission, sensing, and channel switching at each decision epoch.
In this section, we model the adaptive sensing CR as a POMDP to design the optimal decision-making algorithm. In [6] and [7], it is shown that the PU activity can be modeled as a two-state Markov process: The PU is active in one state and inactive in the other. In addition, the CR does not know the true state (i.e., the PU activity) but can only infer it from the noisy sensing results. This setting matches the POMDP framework very well. A POMDP is defined by states, actions, state transition probabilities, observations, and rewards, which we define one by one below. See [11] and [12] for more information about the POMDP framework.

1) State and Action: Let $s_t$ be the state at the tth decision epoch. The state $s_t$ indicates the PU activity on the current operating channel: $s_t$ is 0 if the operating channel is vacant at the tth decision epoch and 1 if it is occupied by the PU. We do not consider the PU activities on frequency channels other than the operating channel, because they are needed only for frequency-channel selection, which is not the focus of this paper. The action selected at the tth decision epoch is denoted by $a_t$ and takes one of the values D, S, and C, which stand for data transmission, sensing, and channel switching, respectively.

2) State Transition Probability: It is assumed that the PU is activated on a channel according to a Markov process with rate $\lambda$ and stays on the channel for an exponentially distributed time with mean $1/\mu$. We also assume that the MN knows $\lambda$ and $\mu$. From these parameters, the state transition probabilities can be calculated as follows. Let $p^a_{i,j}$ be the probability of a transition from state $i$ to state $j$ when action $a$ is taken, that is, $p^a_{i,j} := \Pr\{s_{t+1} = j \mid s_t = i, a_t = a\}$.
We assume that the state of the PU can change at most once during a data-transmission period or a sensing period, since the interarrival and sojourn times of the PU are very long compared with $T_D$ and $T_S$. Based on this assumption, the state transition probabilities when the action is data transmission or sensing (i.e., $a$ = D or S) are $p^a_{0,1} = 1 - e^{-\lambda T_a}$, $p^a_{1,0} = 1 - e^{-\mu T_a}$, $p^a_{0,0} = e^{-\lambda T_a}$, and $p^a_{1,1} = e^{-\mu T_a}$. If the selected action is channel switching, then the CR moves to the next frequency channel. We assume that there are so many frequency channels that it takes a very long time to revisit the same channel. When the CR revisits a channel that it visited long ago, the probability distribution of the state on that channel has already converged to the stationary distribution. Therefore, if the action is channel switching, the state transition probabilities are $p^C_{i,0} = \mu/(\lambda + \mu)$ and $p^C_{i,1} = \lambda/(\lambda + \mu)$ for $i = 0$ and 1.

3) Observation Model: Let $r_t$ denote the observation that the MN receives after the tth decision epoch. When the MN chooses sensing at the tth decision epoch (i.e., $a_t = S$), it performs energy detection on the operating channel, calculates the sensing result $\xi_t$ from (1), and quantizes $\xi_t$ to generate $\zeta_t$. In this case, the MN takes the quantized sensing result as the observation; that is, if $a_t = S$, then $r_t = \zeta_t$. If the MN selects data transmission or channel switching, it obtains no sensing result. In this case, the MN receives a null observation ∅, and therefore $r_t = \varnothing$ for all $t$ such that $a_t$ = D or C. We define $q^a_{i,r}$ as the probability that the MN receives $r$ as the observation when action $a$ is taken and the next state is $i$, that is, $q^a_{i,r} := \Pr\{r_t = r \mid s_{t+1} = i, a_t = a\}$. If the action selected at the tth decision epoch is data transmission or channel switching, then the observation $r_t$ is always ∅. Therefore, if $a$ = D or C, we simply have $q^a_{i,r} = 1$ for $r = \varnothing$ and $q^a_{i,r} = 0$ for $r = 1, \ldots, M$, regardless of the state $i$. On the other hand, the observation $r_t$ equals the quantized sensing result when sensing is selected at the tth decision epoch.
Therefore, $q^S_{i,r} = 0$ for $r = \varnothing$, and $q^S_{i,r} = \Pr\{\tau_r \le \xi_t < \tau_{r+1} \mid s_{t+1} = i, a_t = S\}$ for $r = 1, \ldots, M$. We can calculate $q^S_{0,r}$ and $q^S_{1,r}$ for $r = 1, \ldots, M$ from the probability density functions of the sensing result $\xi_t$ conditioned on the PU being inactive and active, respectively. According to [10], $\xi_t$ follows the chi-square distribution with $2 \cdot W \cdot T_S$ degrees of freedom if the PU is inactive. If the PU is active, then $\xi_t$ follows the noncentral chi-square distribution with $2 \cdot W \cdot T_S$ degrees of freedom and noncentrality parameter $2 \cdot P \cdot T_S / N_o$, where $P$ denotes the received PU signal power.

4) Objective Function and Reward Model: In the POMDP framework, the objective function is the total expected discounted reward

$$E\left[\sum_{t=1}^{\infty} \beta^{t} R(s_t, a_t)\right] \tag{2}$$

where $0 < \beta < 1$ is a discount factor, and $R(s, a)$ is the reward when action $a$ is selected in state $s$. The decision-making algorithm tries to maximize this objective function. Thus, the reward $R(s, a)$ should be defined so that the channel utilization is maximized while the collision probability is minimized. The reward $R(0, D)$ should be positive, since user data are transmitted successfully without collision when data transmission is selected in state 0. On the other hand, if data transmission is selected in state 1, a collision occurs, and therefore $R(1, D)$ should be negative. If sensing or channel switching is chosen, time is consumed without transmitting any data, and therefore $R(i, S)$ and $R(i, C)$ for $i = 0, 1$ should be less than or equal to zero. Within these constraints, the reward values can be varied to control the tradeoff between channel utilization and collision probability. For example, decreasing $R(0, D)$ reduces the collision probability at the expense of channel utilization.


B. Solution to POMDP To choose an appropriate action, the decision-making algorithm calculates the “belief vector” at each decision epoch. In [11], it is shown that the belief vector contains all the necessary information for making an optimal decision. Thus, the decision-making algorithm selects the next action based on the belief vector. The “policy” is a function that maps the belief vector to the next action. Among the policies, we aim to find the optimal policy that maximizes the objective function. From now on, we will explain how to calculate the belief vector and how to find the optimal policy. 1) Belief Vector: The belief vector at the tth decision epoch is denoted by π t = (πt0 , πt1 ), where πti is the probability of the state i at the tth decision epoch, which is inferred by the CR on the basis of the previous actions and observations. Since the CR does not have any knowledge of the channel when it is first switched on, the initial probability π 1 is given as the stationary probability vector γ := (μ/(λ + μ), λ/(λ + μ)). After the tth decision epoch, the decisionmaking algorithm updates π t to π t+1 on the basis of the action at and the observation rt . If the action at is D, then the decision-making algorithm receives a null observation ∅, regardless of the true state. This means that the decision-making algorithm obtains no information about the true state. In this case, the belief vector evolves according to the state transition probability. That is, the algorithm updates the belief vector as π t+1 = η(π t ), where η(π) :=

1

j pD j,0 π ,

j=0

1

j pD j,1 π

.

(3)

θ(r, π) :=

σ(r, π)

S q0,r

S := q0,r

1

1

j S q1,r pS pS π j j,0 π j=0 j,1 , σ(r, π) σ(r, π)

1 j=0

j=0

j pS j,0 π

adopted. According to [11], the optimal value function of our POMDP model satisfies V ∗ (π) = maxa∈{D,S,C} Ua∗ (π), where

j=0

In this equation, we define π := (π 0 , π 1 ). If the action at is S, then the MN performs energy detection, and the decision-making algorithm receives a quantized sensing result as an observation. In addition to the state transition, the quantized sensing result is also taken into account by using Bayes’ theorem. The belief vector is calculated as π t+1 = θ(rt , π t ), where

Fig. 4. Example of the optimal value function for each action Ua∗ (π) and the optimal policy as a function of the probability that the channel is occupied π 1 . The bandwidth of a frequency channel W is 1 MHz, and the SNR of the PU signal is −10 dB. The other parameters are as follows: β = 0.99, TD = 0.1 ms, TS = 0.1 ms, TC = 1 ms, 1/λ = 10 ms, 1/μ = 10 ms, R0D = 1, R1D = −1, R0S = R1S = 0, and R0C = R1C = −0.1.

+

S q1,r

1

(4)

∗ UD (π) =

1

πj R(j, D) + βV ∗ (η(π))

(6)

j=0

US∗ (π) =

1

πj R(j, S) + β

j=0

UC∗ (π) =

1

M

σ(r, π)V ∗ (θ(r, π))

(7)

r=1

πj R(j, C) + βV ∗ (γ).

(8)

j=0

From the optimal value function, we can derive the optimal policy as j pS j,1 π

(5)

δ ∗ (π) = argmax Ua∗ (π).

j=0

(9)

a∈{D,S,C}

for r = 1, . . . , M . From (4) and (5) and Fig. 2, we can see that the 1 ) increases as the quantized value belief that the state is 1 (i.e., πt+1 of the sensing result increases. This corresponds to the fact that a high value of the sensing result indicates a high probability that the channel is occupied. Thus, the soft sensing result is well taken into account in updating the belief vector. If the action at is C, then the CR network moves to the next channel. In this case, the belief vector becomes the stationary probability vector, that is, π t+1 = γ. 2) Optimal Policy: The proposed CR makes a decision on the basis of the belief vector. The policy δ: Π → {D, S, C} is a function mapping a belief vector to an action, where Π := {(π 0 , π 1 )|π 0 +π 1 = 1, π 0 ≥ 0, π 1 ≥ 0}. At the tth decision epoch, the decision-making algorithm selects δ(π t ) as an action. Among all the possible policies, the optimal policy δ ∗ is the one that maximizes the objective function (2). Our target is to find the optimal policy. To this end, we define the optimal value function. The optimal value function V ∗ : Π → maps the current belief vector to the total expected reward that will be earned when the optimal policy is

We can calculate the optimal value function and the optimal policy by dynamic programming, specifically the fixed-grid method in [12]. The optimal policy $\delta^*$ should be calculated and stored in the MN prior to real-time operation, during which the MN uses $\delta^*$ to select actions. Although the optimal policy can easily be calculated by dynamic programming, we can further simplify the calculation by exploiting its threshold structure. In Fig. 4, we show an example of the optimal value function and the corresponding optimal policy. In this figure, we can see that the optimal policy can be represented by a lower threshold L and an upper threshold H. We found that optimal policies computed with different parameters have a structure similar to this example. Based on these observations, we can empirically reduce the optimal policy to

$$\delta^*(\pi) = \begin{cases} D, & \text{if } \pi^1 < L \\ S, & \text{if } L \le \pi^1 < H \\ C, & \text{if } \pi^1 \ge H. \end{cases} \tag{10}$$
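The dynamic-programming step described before (10) can be illustrated with a small fixed-grid value iteration. The Python sketch below is only an approximation in the spirit of [12], not the author's implementation: it discretizes $\pi^1$ on a uniform grid, evaluates (6)–(8) with $V^*$ replaced by a linear interpolation of the grid values, applies (9), and reads off empirical thresholds L and H. Apart from the Fig. 4 values (β, $T_D$, $T_S$, 1/λ, 1/μ, and the rewards), everything is an assumption: the PU is modeled as a two-state continuous-time Markov chain (rate λ from idle to busy, μ from busy to idle), $\eta(\pi)$ is taken to be the one-step prediction over $T_D$, the quantizer table `Q` with M = 4 levels is hypothetical rather than derived from energy detection at −10 dB, and all function names are illustrative.

```python
import math

# Illustrative parameters. BETA, T_D, T_S, 1/lambda, 1/mu and the rewards follow
# the Fig. 4 caption; the two-state Markov PU model and the quantizer table Q are
# hypothetical assumptions made only for this sketch.
LAM = 0.1          # assumed idle->busy rate in 1/ms (1/lambda = 10 ms)
MU = 0.1           # assumed busy->idle rate in 1/ms (1/mu = 10 ms)
T_D = T_S = 0.1    # ms
BETA = 0.99
R = {"D": (1.0, -1.0), "S": (0.0, 0.0), "C": (-0.1, -0.1)}  # R(j, a) for j = 0, 1
Q = [[0.60, 0.25, 0.10, 0.05],   # hypothetical q_{0,r} = P(result r | idle)
     [0.05, 0.10, 0.25, 0.60]]   # hypothetical q_{1,r} = P(result r | busy)


def trans(t):
    """Transition matrix p_{j,i} of the assumed two-state chain over t ms."""
    s = LAM + MU
    e = math.exp(-s * t)
    p01, p10 = LAM / s * (1.0 - e), MU / s * (1.0 - e)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


P_D, P_S = trans(T_D), trans(T_S)
GAMMA1 = LAM / (LAM + MU)   # gamma^1, the stationary probability of "busy"


def predict(p1, P):
    """pi^1 after one transition with no observation (assumed eta for P = P_D)."""
    return (1.0 - p1) * P[0][1] + p1 * P[1][1]


def observe(p1):
    """List of (sigma(r, pi), theta^1(r, pi)) for r = 1..M, following (4)-(5)."""
    b1 = predict(p1, P_S)
    out = []
    for q0, q1 in zip(Q[0], Q[1]):
        a0, a1 = (1.0 - b1) * q0, b1 * q1
        out.append((a0 + a1, a1 / (a0 + a1)))
    return out


def interp(V, p1):
    """Linear interpolation of grid values V on a uniform grid over [0, 1]."""
    x = p1 * (len(V) - 1)
    k = min(int(x), len(V) - 2)
    w = x - k
    return (1.0 - w) * V[k] + w * V[k + 1]


def action_values(V, p1):
    """U_D, U_S, U_C of (6)-(8), with V* replaced by the interpolated grid V."""
    imm = {a: (1.0 - p1) * r0 + p1 * r1 for a, (r0, r1) in R.items()}
    return {
        "D": imm["D"] + BETA * interp(V, predict(p1, P_D)),
        "S": imm["S"] + BETA * sum(s * interp(V, th) for s, th in observe(p1)),
        "C": imm["C"] + BETA * interp(V, GAMMA1),
    }


def solve(n=51, tol=1e-6, max_iter=10_000):
    """Fixed-grid value iteration; returns the grid, V, and the policy (9)."""
    grid = [k / (n - 1) for k in range(n)]
    V = [0.0] * n
    for _ in range(max_iter):
        new = [max(action_values(V, p).values()) for p in grid]
        done = max(abs(a - b) for a, b in zip(new, V)) < tol
        V = new
        if done:
            break
    policy = []
    for p in grid:
        u = action_values(V, p)
        policy.append(max(("D", "S", "C"), key=u.get))  # ties resolve to D, then S
    return grid, V, policy


def thresholds(grid, policy):
    """Empirical L (first non-D grid point) and H (first C grid point)."""
    low = next((p for p, a in zip(grid, policy) if a != "D"), None)
    high = next((p for p, a in zip(grid, policy) if a == "C"), None)
    return low, high


grid, V, policy = solve()
L, H = thresholds(grid, policy)
print(f"L ~ {L:.2f}, H ~ {H:.2f}")
```

In this made-up setting the grid policy is D at $\pi^1 = 0$, S at $\pi^1 = 0.5$, and C at $\pi^1 = 1$, in line with the ordering in (10); the particular values of L and H depend entirely on the assumed `Q` and grid resolution and should not be compared with Fig. 4.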


Fig. 5. Algorithm for the adaptive sensing CR.
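The algorithm listing of Fig. 5 is not reproduced in this text. As a rough, hypothetical illustration of how the threshold rule (10) drives the CR, the Python sketch below simulates one CR moving between channels and reports the channel utilization, collision probability, and channel-switching time proportion as defined in Section V. It is not the author's simulator: the two-state Markov PU model, the made-up quantizer table `Q`, drawing a new channel's state from the stationary distribution, and charging each epoch to the PU state at the end of that epoch are all simplifying assumptions, and `run` and `choose_action` are hypothetical names.

```python
import math
import random

# Hypothetical simulation of the threshold rule (10). Assumptions (not from the
# paper): two-state Markov PU, made-up quantizer table Q, a new channel's state is
# drawn from the stationary distribution, and each epoch is charged to the PU
# state at the end of the epoch.
LAM, MU = 0.1, 0.1             # assumed rates in 1/ms (1/lambda = 1/mu = 10 ms)
T_D, T_S, T_C = 0.1, 0.1, 1.0  # ms, as in Section V
Q = [[0.60, 0.25, 0.10, 0.05],  # hypothetical P(r | idle)
     [0.05, 0.10, 0.25, 0.60]]  # hypothetical P(r | busy)


def trans(t):
    """Transition matrix of the assumed two-state chain over t ms."""
    s = LAM + MU
    e = math.exp(-s * t)
    p01, p10 = LAM / s * (1.0 - e), MU / s * (1.0 - e)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


P_D, P_S = trans(T_D), trans(T_S)
GAMMA1 = LAM / (LAM + MU)      # stationary probability that a channel is busy


def predict(p1, P):
    """Belief update with no observation (data transmission)."""
    return (1.0 - p1) * P[0][1] + p1 * P[1][1]


def posterior(p1, r):
    """theta^1(r, pi) after sensing, following (4)-(5)."""
    b1 = predict(p1, P_S)
    a0, a1 = (1.0 - b1) * Q[0][r], b1 * Q[1][r]
    return a1 / (a0 + a1)


def choose_action(p1, low, high):
    """Threshold rule (10): D below L, S between L and H, C at or above H."""
    if p1 < low:
        return "D"
    if p1 < high:
        return "S"
    return "C"


def run(low, high, epochs, seed=1):
    """Return the time fractions spent in successful data, collisions, switching."""
    rng = random.Random(seed)
    state, p1 = 0, 0.0             # start on an idle channel, as in Fig. 6
    t_ok = t_hit = t_sense = t_switch = 0.0
    for _ in range(epochs):
        a = choose_action(p1, low, high)
        if a == "C":
            t_switch += T_C
            state = 1 if rng.random() < GAMMA1 else 0  # state of the new channel
            p1 = GAMMA1
            continue
        P = P_D if a == "D" else P_S
        state = 1 if rng.random() < P[state][1] else 0  # PU evolves during epoch
        if a == "D":
            if state == 0:
                t_ok += T_D
            else:
                t_hit += T_D
            p1 = predict(p1, P_D)
        else:
            t_sense += T_S
            r = rng.choices(range(len(Q[state])), weights=Q[state])[0]
            p1 = posterior(p1, r)
    total = t_ok + t_hit + t_sense + t_switch
    return {"utilization": t_ok / total,
            "collision": t_hit / total,
            "switching": t_switch / total}


print(run(0.02, 0.6, 100_000))
```

Here `run(0.02, 0.6, ...)` uses the thresholds of Fig. 6. Sweeping `low` and `high` is one way to explore the threshold tradeoffs discussed for Fig. 7 under the assumed model; its numbers are not the paper's results.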


Fig. 6. Example variation of the probability that the channel is occupied by the PU, $\pi_t^1$, over time. The parameters are as follows: L = 0.02, H = 0.6, 1/λ = 10 ms, and 1/μ = 10 ms.
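As a rough check of the early behavior in Fig. 6, the following worked example assumes (this is an assumption, not stated in this section) that the PU state is a two-state continuous-time Markov chain with rates λ and μ, and that during data transmission the belief simply evolves as $\eta^1(\pi) = \pi^0 p_{0,1} + \pi^1 p_{1,1}$ over $T_D$ = 0.1 ms, the value used in Section V. Because 1/λ = 1/μ = 10 ms, the two crossover probabilities coincide regardless of which rate is assigned to which direction:

```latex
\begin{aligned}
p_{0,1} = p_{1,0} &= \tfrac{1}{2}\bigl(1 - e^{-(\lambda+\mu)T_D}\bigr) = \tfrac{1}{2}\bigl(1 - e^{-0.02}\bigr) \approx 0.0099,\\
\eta^1(\pi) &= \pi^0 p_{0,1} + \pi^1 p_{1,1} = p_{0,1} + (1 - p_{0,1} - p_{1,0})\,\pi^1 \approx 0.0099 + 0.9802\,\pi^1,\\
\pi_0^1 = 0 \;\Rightarrow\; \pi_1^1 &\approx 0.0099,\quad \pi_2^1 \approx 0.0196,\quad \pi_3^1 \approx 0.0291 \ge L = 0.02.
\end{aligned}
```

Under these assumptions, a CR starting on a channel known to be idle transmits for three epochs (0.3 ms) before $\pi_t^1$ reaches L and sensing is triggered. After a sensing result that indicates an idle channel, $\pi_t^1$ is small but nonzero, so later transmission bursts can be shorter.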

From (10), the optimal policy can be specified by just two thresholds, L and H, instead of being computed by dynamic programming. These thresholds determine the performance of the CR network, e.g., the channel utilization and the collision probability. We explain how to choose the thresholds in Section V. We summarize the algorithm for the adaptive sensing CR in Fig. 5. In this algorithm, the CR selects actions by comparing $\pi_t^1$ with the thresholds L and H. This can be explained intuitively as follows. To minimize collisions, the CR transmits data only if the channel is likely to be vacant (i.e., if $\pi_t^1 < L$). If the CR is uncertain about the channel (i.e., if $L \le \pi_t^1 < H$), then it performs channel sensing to learn the channel state. The CR switches the operating channel if it is highly probable that the channel is occupied by the PU (i.e., if $\pi_t^1 \ge H$).

V. NUMERICAL RESULTS

In this section, we present simulation results on the performance of the proposed CR. There are ten frequency channels, each with a 1-MHz bandwidth. Unless otherwise noted, the SNR of the PU signal is −10 dB. For the adaptive sensing CR, we set $T_D = T_S$ = 0.1 ms. The time required to complete a channel-switching process (i.e., $T_C$) is set to 1 ms.

Fig. 6 shows an example of how $\pi_t^1$ varies over time. In this example, we set L = 0.02 and H = 0.6. At the beginning, the operating channel is free of the PU. Whenever $\pi_t^1$ exceeds L = 0.02, the CR performs energy detection, and data transmission resumes once $\pi_t^1$ falls below L. Thus, the adaptive decision rule performs only as much sensing as necessary. By avoiding unnecessary sensing in this way, the proposed CR can outperform the periodic sensing CR. At 5.2 ms, the PU is activated. At 5.8 ms, $\pi_t^1$ exceeds H = 0.6, and the CR changes the operating channel.

In Fig. 7, we show the channel utilization, the collision probability, and the channel-switching time proportion of the adaptive sensing CR as functions of the lower and upper thresholds. The “channel utilization” is defined as the proportion of time in which the CR nodes successfully exchange data without interrupting the PU. The “collision probability” is defined as the proportion of time in which the CR nodes transmit data while the operating channel is occupied by the PU. The “channel-switching time proportion” is defined as the proportion of time that the CR nodes spend performing channel-switching processes.

Fig. 7. Channel utilization, collision probability, and channel-switching time proportion of the adaptive sensing CR according to the lower and upper thresholds.

In Fig. 7, it can be seen that the lower threshold L controls two important performance measures, i.e., the channel utilization and the collision probability. By increasing L, we can increase the channel utilization at the cost of a higher collision probability. We can also see that the channel-switching time proportion is affected by the upper threshold H. From the simulation results, we can determine the lower and upper thresholds that allow the CR network to achieve a given target performance. For example, suppose that the objective is to maximize the channel utilization while keeping the collision probability no greater than 0.01. Based on Fig. 7, choosing L = 0.015 achieves a channel utilization of 0.75 with a collision probability of 0.01. In addition, we can choose H = 0.9 if less-frequent channel switching is preferred for stability.

In Figs. 8 and 9, we compare the adaptive sensing CR with the periodic sensing CR in terms of channel utilization and collision probability. The periodic sensing CR is described in Fig. 1(a). Note that the periodic sensing CR is virtually the same as the myopic sensing policy proposed in [5], since it also switches to the next channel in a circular order. For the periodic sensing CR, the time required for channel switching is also set to 1 ms. The periodic sensing CR requires three parameters for operation: 1) the length of a data transmission period; 2) the length of a sensing period; and 3) the detection threshold. The performance curves of the periodic sensing CR are plotted by varying these parameters, whereas those of the adaptive sensing CR are plotted by varying the thresholds L and H. In addition to these thresholds, the adaptive sensing CR requires information about the PU, including λ, μ, and the SNR of the PU signal. To obtain the simulation results in Fig. 8, we assume that λ, μ, and the SNR are fixed and known to the adaptive sensing CR. In this figure, we can see that the proposed CR achieves much higher channel utilization than the periodic sensing CR for the same collision probability, and the performance gap is greater under adverse conditions, i.e., fast PU state variation and a stringent (low) collision probability requirement.

Fig. 8. Tradeoff between channel utilization and collision probability of the periodic and the adaptive sensing CRs when λ, μ, and the SNR of the PU signal are fixed and known to the adaptive sensing CR.

To show that the adaptive sensing CR operates well even without exact information about the PU, we present the simulation results in Fig. 9 under the assumption that λ and μ are unknown to the CR and the SNR is random. The adaptive sensing CR estimates that both 1/λ and 1/μ are 50 ms, whereas the real values are 10 or 100 ms. The SNR of the PU signal differs for each channel occupation and is drawn from the uniform distribution on [−10 dB, −7 dB]. The adaptive sensing CR's estimate of the SNR is −10 dB. In Fig. 9, we can see that the adaptive sensing CR still outperforms the periodic sensing CR, even when these estimates are incorrect.

Fig. 9. Tradeoff between channel utilization and collision probability of the periodic and the adaptive sensing CRs when λ and μ are unknown to the CR and the SNR of the PU signal is random.

VI. CONCLUSION

In this paper, we have proposed the adaptive sensing CR. The performance results show that the proposed CR is robust to fast PU state variation. Thus, the proposed CR provides an efficient way to exploit the temporary spectrum opportunities caused by bursty PU data traffic.

ACKNOWLEDGMENT

The author would like to thank the associate editor and the anonymous reviewers for their valuable comments, which greatly improved this paper.

REFERENCES

[1] S. Haykin, “Cognitive radio: Brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 201–220, Feb. 2005.
[2] Y. C. Liang, Y. Zeng, E. Peh, and T. Hoang, “Sensing-throughput tradeoff for cognitive radio networks,” IEEE Trans. Wireless Commun., vol. 7, no. 4, pp. 1326–1337, Apr. 2008.
[3] W. Y. Lee and I. F. Akyildiz, “Optimal spectrum sensing framework for cognitive radio networks,” IEEE Trans. Wireless Commun., vol. 7, no. 10, pp. 3845–3857, Oct. 2008.
[4] Q. Zhao and Y. Chen, “Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework,” IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600, Apr. 2007.
[5] Q. Zhao, B. Krishnamachari, and K. Liu, “On myopic sensing for multichannel opportunistic access: Structure, optimality, and performance,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 5431–5440, Dec. 2008.
[6] S. Geirhofer, L. Tong, and B. M. Sadler, “Dynamic spectrum access in the time domain: Modeling and exploiting white space,” IEEE Commun. Mag., vol. 45, no. 5, pp. 66–72, May 2007.
[7] S. Geirhofer, L. Tong, and B. M. Sadler, “Cognitive medium access: Constraining interference based on experimental models,” IEEE J. Sel. Areas Commun., vol. 26, no. 1, pp. 95–105, Jan. 2008.
[8] Q. Zhao, S. Geirhofer, L. Tong, and B. M. Sadler, “Opportunistic spectrum access via periodic channel sensing,” IEEE Trans. Signal Process., vol. 56, no. 2, pp. 785–796, Feb. 2008.
[9] S. D. Jones, N. Merheb, and I. J. Wang, “An experiment for sensing-based opportunistic spectrum access in CSMA/CA networks,” in Proc. DySPAN, Baltimore, MD, Nov. 2005, pp. 593–596.
[10] H. Urkowitz, “Energy detection of unknown deterministic signals,” Proc. IEEE, vol. 55, no. 4, pp. 523–531, Apr. 1967.
[11] G. E. Monahan, “A survey of partially observable Markov decision processes: Theory, models, and algorithms,” Manage. Sci., vol. 28, no. 1, pp. 1–16, Jan. 1982.
[12] W. S. Lovejoy, “A survey of algorithmic methods for partially observable Markov decision processes,” Ann. Oper. Res., vol. 28, no. 1, pp. 47–66, Dec. 1991.
[13] M. Wellens, A. de Baynast, and P. Mahonen, “Exploiting historical spectrum occupancy information for adaptive spectrum sensing,” in Proc. IEEE WCNC, Las Vegas, NV, Mar. 2008, pp. 717–722.
[14] D. Datla, R. Rajbanshi, A. M. Wyglinski, and G. J. Minden, “Parametric adaptive spectrum sensing framework for dynamic spectrum access network,” in Proc. DySPAN, Dublin, Ireland, Apr. 2007, pp. 482–485.
[15] W. A. Gardner, “Signal interception: A unifying theoretical framework for feature detection,” IEEE Trans. Commun., vol. 36, no. 8, pp. 897–906, Aug. 1988.
[16] K. W. Choi, W. S. Jeon, and D. G. Jeong, “Sequential detection of cyclostationary signal for cognitive radio systems,” IEEE Trans. Wireless Commun., vol. 8, no. 9, pp. 4480–4485, Sep. 2009.
[17] T. Kailath and H. V. Poor, “Detection of stochastic process,” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2230–2259, Nov. 1998.

