Natural Language Watermarking
Yalda Mohsenzadeh
29th Aban 1387 (19 November 2008)
The 1st Workshop on Info. Hiding
Outline
Introduction
Text watermarking vs. language watermarking
Natural language watermarking requirements
A general language watermarking system
Language watermarking techniques
   Synonym substitution
   Syntactical transformations
   Semantical transformations
Conclusion & future work
Student Branch of the Iranian Society of Cryptology
13:38
Introduction
TW vs. NLW
Text Watermarking
• Modifies the appearance of text elements
• Limited to changing line or word spacing, substituting fonts (or font sizes), and similar formatting changes
Language Watermarking
• Embeds the watermark in the text without changing its meaning or appearance
• Uses semantic and syntactic transformations, lexical substitutions, typographical alterations, …
Value of The Text
• Meaning
• Style
NLW requirements
• Grammaticality
• Fluency
Language Watermarking Categories
• Synonym substitutions
• Semantic transformations
• Syntactic transformations
• Translation
• Punctuation modification
• Simulated typographical methods
A Generic NLW System
Inputs: the original document, the watermark message, a secret key, and user parameters (distortion threshold).
1. Linguistic analysis: part-of-speech tagging, parsing, …
2. Selection: choose the information-carrying words or sentences.
3. Linguistic transformation: apply linguistic transformations to embed the watermark.
4. Verification of embedding: if embedding succeeds within the distortion threshold, output the watermarked document; otherwise, output a failure message.
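The four stages above can be illustrated with a toy pipeline. This is a minimal sketch, not any published system: the synonym table `SYNONYM_PAIRS`, the bit convention (the word at index b of its pair encodes bit b), whitespace tokenization in place of real linguistic analysis, and "share of changed words" as the distortion measure are all assumptions made for illustration.

```python
import random

# Hypothetical toy lexicon: each tuple is a set of interchangeable synonyms.
# Assumed convention for this sketch: the word at index b in its tuple encodes bit b.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]
LOOKUP = {word: pair for pair in SYNONYM_PAIRS for word in pair}


def analyze(document):
    # Stand-in for linguistic analysis (POS tagging, parsing): whitespace tokens.
    return document.split()


def select(tokens, key):
    # Information-carrying words, visited in an order derived from the secret key.
    positions = [i for i, tok in enumerate(tokens) if tok in LOOKUP]
    random.Random(key).shuffle(positions)
    return positions


def embed(document, bits, key, threshold):
    """Return (watermarked_text, None) on success, or (None, failure_message)."""
    tokens = analyze(document)
    positions = select(tokens, key)
    if len(positions) < len(bits):
        return None, "failure: not enough information-carrying words"
    marked = list(tokens)
    for pos, bit in zip(positions, bits):
        marked[pos] = LOOKUP[tokens[pos]][bit]  # the linguistic transformation
    # Verification of embedding: share of changed words vs. the user's threshold.
    distortion = sum(a != b for a, b in zip(tokens, marked)) / max(len(tokens), 1)
    if distortion > threshold:
        return None, f"failure: distortion {distortion:.2f} exceeds {threshold}"
    return " ".join(marked), None


def extract(text, n_bits, key):
    # Same key gives the same carrier order: synonym swaps keep every carrier position.
    tokens = analyze(text)
    return [LOOKUP[tokens[p]].index(tokens[p]) for p in select(tokens, key)[:n_bits]]
```

For example, `embed("a big dog and a quick cat begin to play", [1, 0, 1], key=42, threshold=0.3)` changes two of ten words and succeeds; tightening the threshold makes the verification step return a failure message instead.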
Synonym Substitution
Equmark
A lexical watermarking system.
Builds a weighted undirected graph G:
• Node: a (word, sense) pair
• Edge: connects two nodes that are synonyms
• Edge weight: a measure of the similarity between the two nodes
[Figure: example graph G with (word, sense) nodes w1, w2, w3]
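A small data structure makes the graph definition concrete. This is a hedged sketch only: the sense label `"size"` and the similarity weights are invented for illustration and do not come from Equmark or from any lexical database.

```python
from collections import defaultdict


class SynonymGraph:
    """Weighted undirected graph: an edge joins two synonymous (word, sense) nodes."""

    def __init__(self):
        self.adj = defaultdict(dict)  # node -> {neighbour: similarity}

    def add_synonyms(self, u, v, similarity):
        # Undirected: store the edge in both directions with the same weight.
        self.adj[u][v] = similarity
        self.adj[v][u] = similarity

    def synonyms(self, node):
        # Neighbours of a node, most similar first (unknown nodes have none).
        return sorted(self.adj.get(node, {}).items(), key=lambda item: -item[1])


# Hypothetical nodes and similarity weights.
g = SynonymGraph()
g.add_synonyms(("big", "size"), ("large", "size"), 0.9)
g.add_synonyms(("big", "size"), ("huge", "size"), 0.6)
g.add_synonyms(("large", "size"), ("huge", "size"), 0.7)
```

Keying nodes on (word, sense) rather than on the word alone keeps synonyms of different senses of an ambiguous word from being joined in the same neighbourhood.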
Algorithm
• Select a subgraph GW of G using a secret key.
• Color GW using three colors.
• D1: the distortion of the text's meaning caused by the transformations.
• D2: the estimated distortion that an adversary would cause.
• The candidate message-carrying word that maximizes D2 while keeping D1 below a user-set threshold is picked as the embedding replacement.
[Figure: colored subgraph GW]
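The key-based subgraph choice and the constrained selection rule can be sketched as below. Both function names are hypothetical; the keyed node sample only stands in for however Equmark actually derives GW from the key, the three-coloring step is not modelled, and the D1/D2 values would come from a distortion model (here they are hand-picked numbers).

```python
import random


def select_subgraph(nodes, key, fraction=0.5):
    """Hypothetical keyed choice of the subgraph GW: a key-seeded sample of G's nodes."""
    rng = random.Random(key)
    size = max(1, int(len(nodes) * fraction))
    return set(rng.sample(sorted(nodes), size))


def pick_replacement(candidates, d1, d2, threshold):
    """Maximize D2 over the candidates whose D1 stays below the user-set threshold."""
    feasible = [c for c in candidates if d1[c] < threshold]
    if not feasible:
        return None  # no candidate keeps the meaning distortion low enough
    return max(feasible, key=lambda c: d2[c])
```

With D1 = {large: 0.1, huge: 0.4, great: 0.2}, D2 = {large: 0.3, huge: 0.9, great: 0.5} and a threshold of 0.25, "huge" is excluded by D1 and "great" wins on D2.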
Syntactical Transformations
Syntactic Watermarking
Syntactic rules → structure of sentences
The semantic component of the grammar → meaning
In theory:
A given linguistic structure has a particular meaning, defined by the corresponding syntax–semantics combination.
In practice:
All languages have forms with very similar or identical semantic interpretations but different syntactic structures.
Each syntactic structure → a particular bit combination
The Syntactic Tools for NLW
• "Active" / "passive" structures: Workers carried the sand. ↔ The sand was carried by workers.
• Possession with "of" / "'s": John's book ↔ The book of John
• Subject-predicate-DIR / predicate-subject-DIR: Watermarking is one of the important areas of study. ↔ One of the important areas of study is watermarking.
• Conjunct order change: John and Jane ↔ Jane and John
• Adverb displacement: Ali will go to Tehran tomorrow. ↔ Tomorrow Ali will go to Tehran.
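As an illustration of "each syntactic structure → a particular bit", the conjunct-order tool can carry one bit per conjunction. This is a toy sketch under stated assumptions: the regex `CONJ` stands in for finding conjuncts in a parse tree, and the convention "alphabetical order = 1, reversed = 0" is hypothetical (it lets the extractor read the bit without the original text).

```python
import re

# Toy detector for a conjunction of two capitalized names, "X and Y".
# A real system would locate conjuncts in the parse tree; this regex is an assumption.
CONJ = re.compile(r"\b([A-Z][a-z]+) and ([A-Z][a-z]+)\b")


def apply_conjunct_tool(sentence, bit):
    """Encode one bit in conjunct order (hypothetical convention: alphabetical = 1, reversed = 0)."""
    def reorder(match):
        first, second = sorted([match.group(1), match.group(2)])
        if bit == 0:
            first, second = second, first
        return f"{first} and {second}"
    return CONJ.sub(reorder, sentence, count=1)


def read_conjunct_bit(sentence):
    """Recover the bit from the order of the first conjunction, or None if there is none."""
    match = CONJ.search(sentence)
    if match is None:
        return None
    return 1 if match.group(1) <= match.group(2) else 0
```

Reordering is only safe when the conjuncts are semantically symmetric, as in "John and Jane"; sequences such as "wash and dry" would change meaning.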
Watermark Embedding
Assume:
• Watermark: m bits
• Text: n sentences, with n > m
Steps:
• Parse the input text.
• Find the available syntactic tools.
• Select one of the available tools at random.
• Apply the tool.
• Occasionally skip watermarking a sentence.
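The embedding loop can be sketched as a planner that decides, sentence by sentence, which tool carries which bit. Assumptions, all hypothetical: parsing has already produced the list of applicable tools per sentence, the random choices come from a key-seeded generator, and `skip_prob` controls how often a usable sentence is skipped.

```python
import random


def plan_embedding(available_tools, bits, key, skip_prob=0.2):
    """
    available_tools[i]: names of the syntactic tools applicable to sentence i.
    Returns a list of (sentence index, chosen tool, bit) assignments.
    The keyed RNG and skip_prob are assumptions of this sketch.
    """
    rng = random.Random(key)
    plan = []
    remaining = list(bits)
    for i, tools in enumerate(available_tools):
        if not remaining:
            break  # the whole watermark has been placed
        if not tools:
            continue  # no syntactic tool applies to this sentence
        if rng.random() < skip_prob:
            continue  # skip watermarking occasionally
        tool = rng.choice(sorted(tools))  # select one available tool at random
        plan.append((i, tool, remaining.pop(0)))
    if remaining:
        raise ValueError("text too short to carry the whole watermark")
    return plan
```

Occasional skipping spreads the m bits over more of the n sentences, which is one reason the slide requires n > m.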
Sentence-level Preprocessor for Watermarking
Raw sentences → Annotator → Treebank sentences → Transformer → Input for watermarking
Text-level Watermark Embedding
Watermark testing → Watermark selecting (stylistic concerns, security concerns) → Watermark embedding
The Watermarking Algorithm at The Text Level
[Figure: each sentence (S1, S2, S3) passes through a watermark tool tester that uses WordNet and a dictionary, yielding candidate watermarked sentences (WS1,1, WS1,3, WS1,5, WS1,6 for S1; WS2,2 for S2; WS3,1, WS3,4 for S3). A randomized bit embedder, driven by the message bits and a secret key, produces the watermarked text.]
Watermark Extracting
• Input the possibly marked sentence and run the syntactic parser.
• Obtain the parsed syntactic tree.
• Estimate the pool of potentially applicable tools and determine the one(s) that must have been applied (using the secret key or a round-robin scheme).
• Check the direction of the tool:
   Forward: decide "1"
   Backward: decide "0"
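The decision rule can be sketched with the round-robin option. Assumptions, all hypothetical: a fixed tool order `TOOLS` shared by embedder and extractor, and a parser step that reports, for each sentence, every detected tool and its direction.

```python
from typing import Optional

# Hypothetical fixed order of the syntactic tools (shared by embedder and extractor).
TOOLS = ["passive", "possessive", "subject-predicate", "conjunct", "adverb"]


def tool_for(sentence_index, applicable) -> Optional[str]:
    """Round-robin choice among the tools applicable to a sentence."""
    ordered = [t for t in TOOLS if t in applicable]
    if not ordered:
        return None
    return ordered[sentence_index % len(ordered)]


def extract_bits(parsed):
    """
    parsed[i] maps each tool detected in sentence i to its direction,
    "forward" or "backward" (a hypothetical format for the parser output).
    Forward -> bit 1, backward -> bit 0.
    """
    bits = []
    for i, directions in enumerate(parsed):
        tool = tool_for(i, directions)
        if tool is None:
            continue  # no applicable tool: the sentence carries no bit
        bits.append(1 if directions[tool] == "forward" else 0)
    return bits
```

The scheme only works if embedding used the same round-robin rule over the same applicable-tool pool; that is why the slide stresses estimating the pool before reading the direction.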
Semantical Transformations
NLW based on Presuppositional Analysis
Based on the linguistic semantic phenomenon called presupposition.
Example: "Bill regrets that Jane is married."
Presupposition: Jane is married.
If Jane has no husband, "Bill regrets that Jane is married." is unacceptable.
Presupposition Triggers
• Definite NPs
• Factives
• Implicative verbs
• Aspectual verbs and modifiers
Definite NPs and Proper Names
Identification
• Definite article: the
• Demonstrative pronouns: this, that, these, those
Possibilities for transformations
• Removing the determiner altogether
• Using an indefinite article instead
• Replacing one trigger with another
Example: It seems pretty clear to me that the/a great majority among the Revolutionary generation wanted a national Union...
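The "indefinite article instead" option can be shown as a string rewrite. This naive sketch is an assumption, not the method of the slides: it swaps only the first "the", picks "a"/"an" by the next letter (so "the hour" or "the university" come out wrong), and does not check whether the NP is really a presupposition trigger.

```python
import re


def indefinitize_first(sentence):
    """Replace the first 'the' with 'a'/'an' (naive vowel rule); return (new_sentence, changed)."""
    match = re.search(r"\b([Tt])he (\w)", sentence)
    if match is None:
        return sentence, False
    article = "an" if match.group(2).lower() in "aeiou" else "a"
    if match.group(1) == "T":
        article = article.capitalize()  # keep sentence-initial capitalization
    return sentence[:match.start()] + article + " " + sentence[match.start(2):], True
```

Applied to the slide's example, it turns "the great majority" into "a great majority" and leaves the later "the Revolutionary generation" untouched.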
Factives
Identification
• Factive predicates: know that, realize that, regret that, discover that, …
• Non-factive predicates: "believe", "think", "suppose", …
Possibilities for transformations
• Replacing the factive construction in a sentence with a non-factive one
Example
Many Americans do not realize/believe that George Washington crossed and re-crossed the Delaware River a total of four times in the waning days of 1776.
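The factive-to-non-factive replacement can be sketched with a lookup table. The table `FACTIVE_TO_NONFACTIVE` is a hypothetical, deliberately small mapping with inflected forms listed explicitly; a real system would identify factive predicates from the parse and handle morphology properly.

```python
import re

# Hypothetical factive -> non-factive table (inflected forms listed explicitly).
FACTIVE_TO_NONFACTIVE = {
    "realize": "believe", "realizes": "believes", "realized": "believed",
    "know": "believe", "knows": "believes", "knew": "believed",
}
PATTERN = re.compile(r"\b(" + "|".join(FACTIVE_TO_NONFACTIVE) + r") that\b")


def defactivize(sentence):
    """Replace the first factive predicate before 'that' with its non-factive counterpart."""
    return PATTERN.sub(lambda m: FACTIVE_TO_NONFACTIVE[m.group(1)] + " that", sentence, count=1)
```

Each replacement removes one presupposition (the complement clause is no longer presented as true), which is the unit of change the watermarking method counts.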
Implicative verbs
Identification
• Semantically rich verbs: manage, forget, happen, avoid, etc.
• "X forgot to V" presupposes "X ought to have Ved, or intended to V"; result: X did not V.
• "X managed to V" presupposes "X tried to V"; result: X did V.
Possibility for transformations
• Preserving the "result" introduced by the implicative verb while removing the implicative verb
Example
Somehow I managed to wrench/wrenched myself out of the dream, but not into a state of waking; it was like the screen went blank.
(Presupposition of "managed to wrench": "I tried to wrench myself out of the dream")
Aspectual Verbs and Modifiers
Identification
• Verbs and adverbs denoting the beginning, end, or continuation of an action, such as begin, start, continue, stop, finish, …
• "X started/began VERB-ing (at time t)" presupposes "X didn't VERB before time t"
Possibilities for transformations
• Removing the aspectual modifier, but introducing the information it presupposes
Example: Demand for the Community Services Board's assistance will continue to grow/grow as before.
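The implicative and aspectual transformations can both be sketched as rewrite rules. These are naive illustrations under loudly stated assumptions: `PAST` is a tiny hypothetical past-tense lexicon, and the "as before" rule assumes an intransitive verb as in the slide's example ("continue to raise prices" would come out ungrammatical).

```python
import re

# Tiny hypothetical past-tense lexicon; a real system would use a morphological generator.
PAST = {"wrench": "wrenched", "finish": "finished", "escape": "escaped"}


def drop_implicative(sentence):
    """'managed to V' -> 'Ved': keep the result (X did V), drop the implicative verb."""
    def repl(match):
        return PAST.get(match.group(1), match.group(0))  # unchanged if the past form is unknown
    return re.sub(r"\bmanaged to (\w+)", repl, sentence, count=1)


def drop_aspectual(sentence):
    """'continue to V' -> 'V as before': remove the aspectual verb, restate what it presupposes."""
    return re.sub(r"\bcontinue to (\w+)", r"\1 as before", sentence, count=1)
```

Both rewrites reduce the sentence's presupposition count by one, which is the transformation the watermarking method relies on.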
Watermarking Method
Transformation: turn a sentence containing n presuppositions into a sentence containing n − 1 presuppositions.
Assume:
• The sentences are numbered from 1 to N.
• The sentences are grouped into subsets of k sentences.
• Secret key: the particular choice of k sentences out of N (C(N, k) possibilities).
For each subset of k sentences:
• Bit 1: force the number of presuppositions in the group to be even.
• Bit 0: force the number of presuppositions in the group to be odd.
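The parity rule can be sketched on presupposition counts alone. Assumptions, all hypothetical: `make_groups` partitions the sentences with a key-seeded shuffle as a stand-in for the secret choice of k sentences, and the actual n → n − 1 rewrite is represented only by decrementing a count. Fixing a wrong parity never needs more than one transformation per group.

```python
import random


def make_groups(n_sentences, k, key):
    """Keyed partition of sentence indices into groups of k (stand-in for the secret choice)."""
    order = list(range(n_sentences))
    random.Random(key).shuffle(order)
    return [sorted(order[i:i + k]) for i in range(0, n_sentences - k + 1, k)]


def embed_parity(counts, bits, k, key):
    """
    counts[i]: number of presuppositions in sentence i.
    Bit 1 -> group total even, bit 0 -> group total odd.
    Returns the new counts and the sentences that must be transformed (n -> n-1).
    """
    groups = make_groups(len(counts), k, key)
    if len(bits) > len(groups):
        raise ValueError("not enough sentence groups for the watermark")
    new_counts = list(counts)
    to_transform = []
    for group, bit in zip(groups, bits):
        is_even = sum(new_counts[i] for i in group) % 2 == 0
        if is_even != (bit == 1):
            candidates = [i for i in group if new_counts[i] > 0]
            if not candidates:
                raise ValueError(f"group {group} has no presupposition to remove")
            new_counts[candidates[0]] -= 1  # one transformation flips the parity
            to_transform.append(candidates[0])
    return new_counts, to_transform


def extract_parity(counts, n_bits, k, key):
    groups = make_groups(len(counts), k, key)
    return [1 if sum(counts[i] for i in g) % 2 == 0 else 0 for g in groups[:n_bits]]
```

Because each group is fixed with at most one transformation, the sketch also respects the "one transformation per k sentences" constraint.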
Watermarking Method (cont.)
Only one transformation is made per k sentences, in order to:
• keep the text fluent;
• keep sentences from sounding too heavy, artificial, and hence obviously modified.
If the text is long enough, a cycling code is used to resist cropping of part of the text.
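One possible reading of the cycling code is sketched below, and it is an interpretation rather than the slide's specification: the watermark is repeated cyclically over all available groups, and recovery takes a majority vote per message position. The sketch assumes the extractor knows the cycle offset of the first surviving group (offset 0 when only the end of the text is cropped).

```python
from collections import Counter


def cycle_message(bits, n_slots):
    """Repeat the watermark cyclically to fill all available slots (e.g. sentence groups)."""
    return [bits[i % len(bits)] for i in range(n_slots)]


def recover(stream, m, offset=0):
    """Majority vote per message position; offset is the cycle position of the first surviving slot."""
    votes = [Counter() for _ in range(m)]
    for j, bit in enumerate(stream):
        votes[(offset + j) % m][bit] += 1
    if any(not v for v in votes):
        raise ValueError("too little text left to recover every bit")
    return [v.most_common(1)[0][0] for v in votes]
```

For a 4-bit message spread over 14 groups, dropping the first 5 groups still leaves every bit position covered at least twice, and the vote also absorbs an occasional misread bit.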
Conclusion & Future Work
• Equmark system
• Syntactical watermarking
• Semantical watermarking
• Increasing capacity
• Different languages, different genres