1. The names and emails of members in your group
  a. Sneha Rudra ([email protected])
  b. Shantanu Singhal ([email protected])
  c. Zirui Tao ([email protected])

2. List the schema of the two tables

This was the original schema of the two tables:

TableA: SNo, Album, SongName, Artist, Time, Price, Genres, Released, Rights
TableB: Id, Album, Genres, ASIN, Label, Time, Rating, TrackName, Price, Artist, Released

The schema of the two tables was then transformed as described in the next step. The updated CSV files are available under Project Stage 1 on the Project Website.

3. List the attributes in the set S

S = {'Id', 'Album', 'Genre', 'Label', 'Time', 'Track', 'Price', 'Artist', 'Released'}

The common schema has 9 attributes.

4. For each attribute X in S, consider only Table A and discuss the following in the report

  a. The missing values (as percentages and fractions), the classification of each attribute, and the min, max, and average lengths of the textual attributes are given in the table below.

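Attribute | Type        | Missing (fraction) | Missing (%) | Avg length | Min length | Max length
Id        | numeric     |                    |             |            |            |
Album     | textual     |                    |             | 31.86      |            | 90
Genres    | categorical |                    |             |            |            |
Label     | textual     | 54 / 3019          | 1.79        | 31.26      | 13         | 141
Time      | time        |                    |             |            |            |
Track     | textual     |                    |             | 26.59      |            | 104
Price     | money       | 34 / 3019          | 1.13        |            |            |
Artist    | textual     |                    |             | 20.15      |            | 85
Released  | date        | 1 / 3019           | 0.03        |            |            |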
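The figures in the table above can be reproduced with a short profiling pass over Table A. The sketch below is a minimal standard-library version, not necessarily how those figures were computed; it assumes a value is missing when its field is empty or whitespace only, and the sample rows are hypothetical. The percentage is the fraction times 100, e.g. 54 / 3019 ≈ 0.0179, i.e. 1.79%.

```python
import csv
import io
from statistics import mean


def profile_column(rows, column):
    """Missing-value and length statistics for one column of a list of dict rows.

    Assumption: a value counts as missing when it is empty or whitespace only;
    lengths are measured after stripping surrounding whitespace.
    """
    values = [(row.get(column) or "").strip() for row in rows]
    present = [v for v in values if v]
    missing = len(values) - len(present)
    lengths = [len(v) for v in present]
    return {
        "missing_fraction": f"{missing} / {len(values)}",
        "missing_pct": round(100 * missing / len(values), 2) if values else 0.0,
        "avg_len": round(mean(lengths), 2) if lengths else None,
        "min_len": min(lengths, default=None),
        "max_len": max(lengths, default=None),
    }


# Hypothetical sample in the common schema (not real rows from Table A).
SAMPLE_CSV = """Id,Album,Label
1,Nevermind,DGC
2,Abbey Road,
3,Blue,Reprise Records
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("Album", "Label"):
    print(column, profile_column(rows, column))
```

With pandas, the same numbers come from `df[col].isna().mean() * 100` and `df[col].str.len().agg(["mean", "min", "max"])`, since `read_csv` turns empty fields into NaN by default.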

  b. Proposed solutions for missing values: matrix completion and/or replacing missing entries with flag values or average values.

  c. Show at least two histograms that your team has created. Find and report possible outliers and anomalies among the attribute values.

We picked two attributes for the histograms: Time and Album. Since Album is a textual attribute, we plotted the distribution of its value lengths.
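The binning behind such histograms can be sketched as follows. This is a minimal illustration, assuming track times are strings in mm:ss (three-part values are treated as hh:mm:ss) and using hypothetical sample values; it only computes the bin counts, which is what a plotting call would draw.

```python
from collections import Counter


def duration_seconds(value):
    """Convert a 'mm:ss' (or, assumed, 'hh:mm:ss') string to seconds."""
    seconds = 0
    for part in value.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def histogram(values, bin_width):
    """Count values into fixed-width bins keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical values; 00:34:29 is one of the durations the report flags as an outlier.
times = ["03:45", "04:10", "03:30", "00:34:29"]
albums = ["Nevermind", "Abbey Road", "Blue"]

print(histogram([duration_seconds(t) for t in times], bin_width=60))
print(histogram([len(a) for a in albums], bin_width=5))
```

To draw the plots, the same lists of seconds and lengths can be passed to `matplotlib.pyplot.hist`, which does its own binning.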

 

Based on the histograms we created for track time and album name length, we identified the following outliers:

Track duration
  1. 00:34:29 (id = 2786)
  2. 00:34:25 (id = 2911)

Album name length
  1. 90 (id = 2980)
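The outliers above were identified by inspecting the histograms. As an alternative, and not the method used in this report, a numeric rule such as Tukey's interquartile-range fences can flag the same kind of values automatically. The sample durations and the choice of k = 1.5 are assumptions for illustration.

```python
from statistics import quantiles


def iqr_outliers(values_by_id, k=1.5):
    """Ids whose value falls outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(list(values_by_id.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(i for i, v in values_by_id.items() if v < low or v > high)


# Hypothetical track durations in seconds keyed by id; 2069 s = 00:34:29.
durations = {1: 200, 2: 210, 3: 220, 4: 230, 5: 240, 6: 2069}
print(iqr_outliers(durations))
```

The same function works for album name lengths by passing `{id: len(name)}` instead of durations.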

  d. If the attribute value is supposed to follow a certain format (e.g., dates), discuss whether all values follow the same format, or whether there is a problem with the format that will require standardizing the values later.

Attribute | Expected format
Genres    | Comma-separated list of genres
Time      | mm:ss
Price     | \$[0-9]+(\.[0-9]{1,2})
Released  | dd-MMM-yy

The values of the Released attribute should follow "dd-MMM-yy", but in some cases only the year ("yy") is given. Possible solutions:
  1. Gather more information and update the affected tuples in the database.
  2. Split the Released attribute into "released day", "released month", and "released year", and make the first two optional.
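The expected formats above can be checked mechanically, and the proposed split of Released can be sketched the same way. This is a minimal sketch, assuming months are three-letter abbreviations and year-only values are two digits; the sample values are hypothetical. Note that in Python's `re`, a literal dollar sign must be written `\$`, because a bare `$` is an end-of-string anchor.

```python
import re

# Expected formats; re.fullmatch anchors both ends of the value.
TIME_RE = re.compile(r"\d{1,2}:\d{2}")                              # mm:ss
PRICE_RE = re.compile(r"\$[0-9]+(\.[0-9]{1,2})")                    # e.g. $1.29
RELEASED_FULL_RE = re.compile(r"(\d{1,2})-([A-Za-z]{3})-(\d{2})")   # dd-MMM-yy
RELEASED_YEAR_RE = re.compile(r"\d{2}")                             # yy only


def nonconforming(values, pattern):
    """Return the values that do not fully match the expected pattern."""
    return [v for v in values if not pattern.fullmatch(v.strip())]


def split_released(value):
    """Split a Released value into (day, month, year); day and month may be None."""
    value = value.strip()
    match = RELEASED_FULL_RE.fullmatch(value)
    if match:
        day, month, year = match.groups()
        return int(day), month.title(), year
    if RELEASED_YEAR_RE.fullmatch(value):
        return None, None, value
    raise ValueError(f"unrecognized Released value: {value!r}")


print(nonconforming(["03:45", "00:34:29"], TIME_RE))
print(split_released("07-Oct-16"), split_released("16"))
```

Running `nonconforming` over each column gives the list of values that would need standardizing before matching.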

  e. Are there synonyms among attribute values?

No, not in our datasets.

  f. Sometimes attribute values are "sprinkled" all over the item. Do you have this problem with this attribute?

The Artist attribute is sprinkled into other attributes such as Track. For example, in the track 'Weight Of The World [feat. Blaque Keyz]' by Jon Bellion, the featured artist Blaque Keyz appears in the track name but is not captured as a separate artist value.

  g. Do you see any other data quality problems with this attribute?

Yes. The data we extracted in Stage 1 was not encoded in UTF-8, which led to IO errors when reading it into the database for analysis, so we converted the file to UTF-8 before loading it.

5. List any software tools that you have used to do the above data understanding and cleaning.

Packages used: pandas, Jupyter Notebook, matplotlib, NumPy, SciPy.
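Both the sprinkled-artist problem in f and the encoding problem in g lend themselves to small cleaning steps. The sketch below pulls featured artists out of bracketed "feat." tags in track names and re-encodes raw bytes to UTF-8. The tag patterns handled and the source encoding (cp1252) are assumptions; the report does not say which encoding the Stage 1 files used.

```python
import re

# Matches "[feat. X]", "(ft. X)" or "(featuring X)" tags in a track name.
FEAT_RE = re.compile(
    r"\s*[\[(](?:feat\.|ft\.|featuring)\s+([^\])]+)[\])]", re.IGNORECASE
)


def split_featured(track):
    """Return (track name without feat. tags, list of featured artists)."""
    featured = []
    for match in FEAT_RE.finditer(track):
        # Assumption: several featured artists are separated by "," or "&".
        featured += [n.strip() for n in re.split(r"[,&]", match.group(1)) if n.strip()]
    return FEAT_RE.sub("", track).strip(), featured


def to_utf8(raw, source_encoding="cp1252"):
    """Re-encode bytes from an assumed legacy encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


print(split_featured("Weight Of The World [feat. Blaque Keyz]"))
```

The extracted names could then be stored in an additional artist column so that matching on Artist also sees featured performers.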
