Natural Language Watermarking
Yalda Mohsenzadeh
29th Aban 1387

The 1st Workshop on Info. Hiding
Student Branch of the Iranian Society of Cryptology

Outline

• Introduction
  ○ Text watermarking vs. language watermarking
  ○ Natural language watermarking requirements
  ○ A general language watermarking system
• Language watermarking techniques
  ○ Synonym substitution
  ○ Syntactical transformations
  ○ Semantical transformations
• Conclusion & Future Work

Introduction

Text Watermarking (TW) vs. Natural Language Watermarking (NLW)

• Text watermarking
  ○ Modifies the appearance of text elements
  ○ Limited to changing line or word spacing, substituting fonts or font sizes, and similar changes
• Language watermarking
  ○ Embeds in the text without changing its meaning or appearance
  ○ Semantic and syntactic transformations, lexical substitutions, typographical alterations, …

Value of the Text

NLW requirements: embedding should preserve the text's
• Meaning
• Style
• Grammaticality
• Fluency

Language Watermarking Categories

• Synonym substitutions
• Semantic transformations
• Syntactic transformations
• Translation
• Punctuation modification
• Simulated typographical methods

A Generic NLW System

Inputs:
• Original document
• Watermark message
• Secret key
• User parameters: distortion threshold

Processing steps:
• Linguistic analysis: part-of-speech tagging, parsing, …
• Selection of the information-carrying words or sentences
• Linguistic transformation: applying linguistic transformations to embed the watermark
• Verification of embedding: success of embedding and distortion threshold

Outputs:
• Watermarked document, or
• Failure message
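The flow on this slide can be sketched as a small Python driver. This is only a minimal sketch under stated assumptions, not the system presented in the talk: the synonym table `PAIRS`, the function names, the keyed-hash ordering of carriers and the "fraction of changed tokens" distortion measure are all hypothetical stand-ins for real linguistic components.

```python
import hashlib
from typing import List, Optional

# Toy synonym table (an assumption): index 0 encodes bit 0, index 1 encodes bit 1.
PAIRS = {
    "big": ("big", "large"), "large": ("big", "large"),
    "quick": ("quick", "fast"), "fast": ("quick", "fast"),
}


def analyze(document: str) -> List[str]:
    """Stand-in for linguistic analysis: plain whitespace tokenization."""
    return document.split()


def select_carriers(tokens: List[str], key: bytes) -> List[int]:
    """Pick information-carrying positions and order them with a keyed hash."""
    carriers = [i for i, token in enumerate(tokens) if token in PAIRS]
    return sorted(
        carriers,
        key=lambda i: hashlib.sha256(key + str(i).encode()).digest(),
    )


def distortion(before: List[str], after: List[str]) -> float:
    """Toy distortion measure: fraction of tokens that were changed."""
    changed = sum(a != b for a, b in zip(before, after))
    return changed / max(len(before), 1)


def embed_watermark(document: str, bits: List[int], key: bytes,
                    threshold: float) -> Optional[str]:
    """Return the watermarked document, or None as the failure message."""
    tokens = analyze(document)                      # linguistic analysis
    carriers = select_carriers(tokens, key)         # selection
    if len(carriers) < len(bits):
        return None                                 # not enough carriers
    marked = list(tokens)
    for position, bit in zip(carriers, bits):       # linguistic transformation
        marked[position] = PAIRS[tokens[position]][bit]
    if distortion(tokens, marked) > threshold:      # verification of embedding
        return None
    return " ".join(marked)


print(embed_watermark("the big dog is quick", [1, 1], b"secret", 0.5))
# the large dog is fast
print(embed_watermark("the big dog is quick", [1, 1], b"secret", 0.1))
# None
```

The second call fails because two of five tokens change (distortion 0.4), which exceeds the user-set threshold of 0.1.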

Synonym Substitution

Equimark

• A lexical watermarking system
• Builds a weighted undirected graph G
  ○ Node: a (word, sense) pair
  ○ Edge: joins two nodes that are synonyms
  ○ Edge weight: a measure of the similarity between the two nodes

[Figure: graph G with nodes w1, w2, w3, each a (word, sense) pair]

Algorithm

• Select a subgraph GW of G using a secret key
• Color GW using three colors
• D1: distortion of the meaning of the text caused by the embedding transformations
• D2: estimated distortion that the adversary's transformations will cause
• The candidate message-carrying word that maximizes D2 while keeping D1 below a user-set threshold is picked as the embedding replacement

[Figure: subgraph GW]
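The graph and the selection rule can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the system's actual implementation: the words, senses, similarity weights and the D1/D2 numbers are invented for illustration, and the keyed subgraph selection and three-coloring steps are not shown.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Optional, Tuple

Node = Tuple[str, int]  # a (word, sense) pair


def build_graph(
    synonym_pairs: Iterable[Tuple[Node, Node, float]],
) -> Dict[FrozenSet[Node], float]:
    """Weighted undirected graph: each edge joins two synonymous nodes."""
    return {frozenset((a, b)): weight for a, b, weight in synonym_pairs}


def neighbours(graph: Dict[FrozenSet[Node], float], node: Node) -> List[Node]:
    """All nodes that share an edge with `node`, i.e. its synonyms."""
    return sorted(other for edge in graph if node in edge
                  for other in edge - {node})


class Candidate(NamedTuple):
    node: Node
    d1: float  # distortion of the meaning caused by this replacement
    d2: float  # estimated distortion the adversary would have to cause


def pick_replacement(candidates: Iterable[Candidate],
                     d1_threshold: float) -> Optional[Candidate]:
    """Maximize D2 among candidates whose D1 stays below the threshold."""
    allowed = [c for c in candidates if c.d1 < d1_threshold]
    return max(allowed, key=lambda c: c.d2, default=None)


# Invented example data (assumption, not taken from any real lexicon).
G = build_graph([
    (("bank", 1), ("shore", 1), 0.80),
    (("shore", 1), ("coast", 1), 0.85),
    (("bank", 2), ("depository", 1), 0.90),
])
print(neighbours(G, ("shore", 1)))
# [('bank', 1), ('coast', 1)]

options = [Candidate(("bank", 1), d1=0.20, d2=0.70),
           Candidate(("coast", 1), d1=0.15, d2=0.30)]
print(pick_replacement(options, d1_threshold=0.25).node)
# ('bank', 1)
print(pick_replacement(options, d1_threshold=0.18).node)
# ('coast', 1)
```

Lowering the threshold from 0.25 to 0.18 rules out the first candidate, so the replacement with the smaller D2 is chosen instead.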

Syntactical Transformations

Syntactic Watermarking

• Syntactic rules: the structure of sentences
• The semantic component of the grammar: the meaning
• In theory:
  ○ A given linguistic structure has a particular meaning defined by the related syntax–semantics combination.
• In practice:
  ○ All languages possess forms that carry very similar or identical semantic interpretations and yet have different syntactic structures.
• Each syntactic structure is assigned a particular bit combination

The Syntactic Tools for NLW

• “Active” / “Passive” structures
  ○ Workers carried the sand.
  ○ The sand was carried by workers.
• Possession: “of” / “’s”
  ○ John’s book
  ○ The book of John
• subject-predicate-DIR / predicate-subject-DIR
  ○ Watermarking is one of the important areas of study.
  ○ One of the important areas of study is watermarking.
• Conjunct order change
  ○ John and Jane
  ○ Jane and John
• Adverb displacement
  ○ Ali will go to Tehran tomorrow.
  ○ Tomorrow Ali will go to Tehran.
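As an illustration of how one such tool can carry a bit, the conjunct order change can be sketched in Python. This is a toy under stated assumptions: a regular expression replaces the syntactic parser, only the first "X and Y" pair of single words is handled, and calling the alphabetical order the "forward" direction is an arbitrary convention chosen here.

```python
import re
from typing import Optional

# Matches the first "X and Y" pair of single words (a crude stand-in for parsing).
CONJUNCTION = re.compile(r"\b(\w+) and (\w+)\b")


def set_conjunct_order(sentence: str, forward: bool) -> str:
    """Rewrite the conjuncts in forward (alphabetical) or backward order."""
    match = CONJUNCTION.search(sentence)
    if match is None:
        return sentence  # the tool is not applicable to this sentence
    first, second = sorted(match.groups(), key=str.lower)
    if not forward:
        first, second = second, first
    return (sentence[:match.start()] + f"{first} and {second}"
            + sentence[match.end():])


def is_forward(sentence: str) -> Optional[bool]:
    """Report the direction of the tool, or None if it is not applicable."""
    match = CONJUNCTION.search(sentence)
    if match is None:
        return None
    left, right = match.groups()
    return left.lower() <= right.lower()


print(set_conjunct_order("John and Jane went home.", forward=True))
# Jane and John went home.
print(set_conjunct_order("John and Jane went home.", forward=False))
# John and Jane went home.
print(is_forward("Jane and John went home."))
# True
```

A realistic tool would operate on the parse tree and would refuse to swap conjuncts whose order matters, such as fixed expressions.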

Watermark Embedding

• Assume:
  ○ Watermark: m bits
  ○ Text: n sentences, n > m
• Parse the input text
• Find the available syntactic tools
• Select one of the available tools randomly
• Implement the tool
• Skip watermarking occasionally
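The steps above can be sketched as a planning routine in Python. It is a minimal sketch under stated assumptions: the parser output is replaced by a ready-made list of applicable tools per sentence, the randomness comes from a pseudo-random generator seeded with an integer key, and the name `plan_embedding` and the `skip_probability` parameter are hypothetical.

```python
import random
from typing import List, Tuple


def plan_embedding(available_tools: List[List[str]], bits: List[int],
                   key: int, skip_probability: float = 0.25
                   ) -> List[Tuple[int, str, int]]:
    """Return (sentence index, chosen tool, bit) triples for m bits in n sentences."""
    rng = random.Random(key)  # keyed, so the same plan can be regenerated
    usable = [i for i, tools in enumerate(available_tools) if tools]
    if len(usable) < len(bits):
        raise ValueError("not enough watermarkable sentences for the message")

    plan = []
    next_bit = 0
    for seen, index in enumerate(usable):
        if next_bit == len(bits):
            break  # whole message embedded
        remaining_sentences = len(usable) - seen
        remaining_bits = len(bits) - next_bit
        # Skipping is only allowed while enough sentences remain for the rest.
        can_skip = remaining_sentences > remaining_bits
        if can_skip and rng.random() < skip_probability:
            continue  # skip watermarking occasionally
        tool = rng.choice(available_tools[index])  # select a tool randomly
        plan.append((index, tool, bits[next_bit]))
        next_bit += 1
    return plan


# Hypothetical parser output: the tools applicable to each of n = 6 sentences.
tools_per_sentence = [
    ["passive", "conjunct_order"],
    [],
    ["adverb_displacement"],
    ["possession", "passive"],
    ["conjunct_order"],
    ["adverb_displacement", "possession"],
]
for sentence_index, tool, bit in plan_embedding(tools_per_sentence, [1, 0, 1], key=2008):
    print(sentence_index, tool, bit)
```

Because skipping is permitted only while more usable sentences than message bits remain, every bit is guaranteed a carrier sentence.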

Sentence-level Preprocessor for Watermarking

[Figure: Raw sentences → Annotator → Treebank sentences → Transformer → Input for watermarking]

Text-level Watermark Embedding

• Watermark testing
• Watermark selecting
  ○ Stylistic concerns
  ○ Security concerns
• Watermark embedding

The Watermarking Algorithm at the Text Level

[Figure: each sentence (S1, S2, S3) passes through a watermark tool tester that consults WordNet and a dictionary. The testers output WS1,1, WS1,3, WS1,5, WS1,6 for S1; WS2,2 for S2; and WS3,1, WS3,4 for S3. A randomized bit embedder takes these outputs together with the message bits and the secret key and produces the watermarked text.]

Watermark Extracting

• Input the possibly marked sentence and run the syntactic parser
• Obtain the parsed syntactic tree
• Estimate the pool of potentially applicable tools and determine the one(s) that must have been applied (secret key or round-robin scheme)
• Check the direction of the tool:
  ○ Forward direction: decide for “1”
  ○ Backward direction: decide for “0”
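The extraction steps can be sketched in Python for the round-robin case. This is only a sketch under stated assumptions: the parser is replaced by ready-made dictionaries that map each applicable tool to its observed direction, and the tool names, their order in `TOOL_ORDER` and the exact round-robin rule (take the next applicable tool after the one used last) are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed fixed tool order shared by embedder and extractor.
TOOL_ORDER = ["passive", "possession", "conjunct_order", "adverb_displacement"]


def extract_bits(sentences: List[Dict[str, str]],
                 tool_order: Sequence[str] = TOOL_ORDER) -> List[int]:
    """Decode one bit per sentence that has an applicable tool."""
    bits = []
    pointer = 0  # where the round-robin search starts for the next sentence
    for applicable in sentences:
        for step in range(len(tool_order)):
            tool = tool_order[(pointer + step) % len(tool_order)]
            if tool in applicable:
                # Forward direction decides "1", backward direction decides "0".
                bits.append(1 if applicable[tool] == "forward" else 0)
                pointer = (pointer + step + 1) % len(tool_order)
                break
    return bits


# Hypothetical parser output for four sentences.
parsed = [
    {"conjunct_order": "forward"},
    {"passive": "backward", "adverb_displacement": "forward"},
    {},  # no applicable tool: carries no bit
    {"possession": "backward"},
]
print(extract_bits(parsed))
# [1, 1, 0]
```

In the second sentence the search starts after `conjunct_order`, so `adverb_displacement` is read rather than `passive`; the embedder would have to follow the same rule for the bits to match.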


Semantical Transformations

NLW based on Presuppositional Analysis

• Based on the linguistic semantic phenomenon called presupposition
• Example:
  ○ Bill regrets that Jane is married.
  ○ Presupposition: Jane is married.
  ○ Jane has no husband. Bill regrets that Jane is married. (unacceptable, because the first sentence contradicts the presupposition)

Presupposition Triggers

• Definite NPs
• Factives
• Implicative verbs
• Aspectual verbs and modifiers

Definite NPs and Proper Names

• Identification
  ○ Definite article: the
  ○ Demonstrative pronouns: this, that, these, those
• Possibilities for transformations
  ○ Removing the determiner altogether
  ○ Using an indefinite article instead
  ○ Replacing one trigger with another
• Example:
  ○ It seems pretty clear to me that the/a great majority among the Revolutionary generation wanted a national Union...

Factives

• Identification
  ○ Factive predicates: know that, realize that, regret that, discover that, …
  ○ Non-factive predicates: believe, think, suppose, …
• Possibilities for transformations
  ○ Replacing the factive construction in a sentence with a non-factive one
• Example
  ○ Many Americans do not realize/believe that George Washington crossed and re-crossed the Delaware River a total of four times in the waning days of 1776.

Implicative Verbs

• Identification
  ○ Semantically rich verbs: manage, forget, happen, avoid, etc.
  ○ X forgot to V = “X ought to have Ved, or intended to V”; result: X did not V
  ○ X managed to V = “X tried to V”; result: X did V
• Possibility for transformations
  ○ Preserving the “result” introduced by the implicative verb while removing the implicative verb itself
• Example
  ○ Somehow I managed to wrench/wrenched myself out of the dream, but not into a state of waking; it was like the screen went blank.
  ○ = “I tried to wrench myself out of the dream”

Aspectual Verbs and Modifiers

• Identification
  ○ Verbs and adverbs denoting the beginning, end or continuation of an action, such as begin, start, continue, stop, finish, …
  ○ X + start/begin VERB-ing = “X didn't VERB at a time before t”
• Possibilities for transformations
  ○ Removing the aspectual modifier while introducing the information presupposed by it
• Example:
  ○ Demand for the Community Services Board's assistance will continue to grow/grow as before.

Watermarking Method

• Transforming any sentence containing n presuppositions into a sentence containing (n-1) presuppositions
• Assume:
  ○ The sentences are numbered from 1 to N.
• Group the sentences into subsets of k sentences
• Secret key: the choice of the particular arrangement of k sentences among N (C(N, k) possibilities)
• For each subset of k sentences:
  ○ bit 1: force the number of presuppositions in the group to be even
  ○ bit 0: force the number of presuppositions in the group to be odd

Watermarking Method

• Make only one transformation per k sentences in order to:
  ○ Keep the fluency of the text
  ○ Keep the sentences from sounding too heavy, artificial and hence obviously modified
• If the text is long enough:
  ○ Use a cyclic code in order to resist cropping of a part of the text
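The parity rule of this method can be sketched in Python. It is a minimal sketch under stated assumptions: each sentence is reduced to its presupposition count, the transformation is modeled as decrementing one count (n to n-1), and the keyed grouping by a seeded shuffle is a hypothetical way to realize the secret choice of k sentences among N. Error-correction coding against cropping is not shown.

```python
import math
import random
from typing import List


def keyed_groups(n_sentences: int, k: int, key: int) -> List[List[int]]:
    """Partition sentence indices into key-dependent groups of k (assumed scheme)."""
    order = list(range(n_sentences))
    random.Random(key).shuffle(order)
    return [order[i:i + k] for i in range(0, n_sentences - k + 1, k)]


def embed(counts: List[int], bits: List[int], k: int, key: int) -> List[int]:
    """Force each group's presupposition total to be even (bit 1) or odd (bit 0)."""
    marked = list(counts)
    groups = keyed_groups(len(counts), k, key)
    if len(groups) < len(bits):
        raise ValueError("text too short for the message")
    for group, bit in zip(groups, bits):
        total = sum(marked[i] for i in group)
        if (total % 2 == 0) != (bit == 1):
            # At most one transformation per group: n -> n-1 presuppositions.
            donor = next((i for i in group if marked[i] > 0), None)
            if donor is None:
                raise ValueError("group has no presupposition to remove")
            marked[donor] -= 1
    return marked


def extract(counts: List[int], n_bits: int, k: int, key: int) -> List[int]:
    """Read the bits back from the parity of each keyed group."""
    groups = keyed_groups(len(counts), k, key)
    return [1 if sum(counts[i] for i in group) % 2 == 0 else 0
            for group in groups[:n_bits]]


# Hypothetical presupposition counts for N = 9 sentences, groups of k = 3.
original = [2, 1, 0, 3, 1, 1, 2, 0, 1]
message = [1, 0, 1]
marked = embed(original, message, k=3, key=7)
print(extract(marked, n_bits=3, k=3, key=7) == message)
# True
print(math.comb(9, 3))  # C(N, k): ways to choose one group of k = 3 among N = 9
# 84
```

Extraction needs only the key and the presupposition counts of the received text, not the original document.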

Conclusion & Future Work

• Equimark system
• Syntactical watermarking
• Semantical watermarking
• Increasing capacity
• Different languages, different genres

