Week 3 Video 3 Feature Engineering

Feature Engineering ◻

Up until this point in the class, we’ve talked about building and validating prediction models



Models that infer a predicted variable from predictor variables

Where the Predicted Variable Comes From ◻

A couple lectures ago, we went into a little more detail about where the predicted variable can come from

Where the Predictor Variables Come From ◻

Where do the predictor variables come from?



Do they fall out of the sky?



Do they come from the Office for Predictor Variables in Washington, DC?

Feature Engineering ◻

The art of creating predictor variables



A major topic in its own right



At Teachers College, I teach a semester-long design studio in Feature Engineering http://www.columbia.edu/~rsb2162/FES2015/



Why a whole class? ◻

Feature engineering is the least well-studied part of the process of developing prediction models

But it’s arguably the most important part

Your model will never be any good if your features (predictors) aren’t very good

Why a whole class? ◻

It is an art, it is human-driven design

It involves lore rather than well-known and validated principles

It is hard!

The Big Idea ◻

How can we take the voluminous, ill-formed, and yet under-specified data that we now have in education



And shape it into a reasonable set of variables



In an efficient, effective, and predictive way?

A process in its own right

1. Brainstorming features
2. Deciding what features to create
3. Creating the features
4. Studying the impact of features on model goodness
5. Iterating on features if useful
6. Go to 3 (or 1)

Brainstorming Features ◻

Can be more or less formal

IDEO tips for Brainstorming

1. Defer judgment
2. Encourage wild ideas
3. Build on the ideas of others
4. Stay focused on the topic
5. One conversation at a time
6. Be visual
7. Go for quantity

http://www.openideo.com/fieldnotes/openideo-teamnotes/seven-tips-on-better-brainstorming

Building on the Ideas of Others ◻

Doesn’t just have to be people nearby



There’s a huge literature out there of features people have tried and what has worked, or failed to work, for a range of problems



Read papers from researchers working on similar problems, and see what you can use



Some folks have also tried crowd-sourcing (Veeramachaneni et al., 2014)

Brainstorming Features ◻

On hard projects, my research group often meets as a team over pizza and beer to brainstorm



On easier projects, one person brainstorms solo

And then often discusses their features with another person, who offers further suggestions

Deciding what features to create ◻

There is never infinite time

There is a trade-off between the effort to create a feature and how likely it is to be useful

“How likely it is to be useful” – the best you can do is to:

Look at whether similar features have been useful for similar problems

Use your best intuition

Worth biasing in favor of features that are different from anything else you’ve tried before

Doing so explores a different part of the space

Creating features ◻

Excel – Really good for prototyping features

Google Refine/OpenRefine – Some alternate features that are nice

Distillation Code – The scalable solution… but harder to check yourself or explore

Some useful tools in Excel

Pivot Tables – great for aggregating data, and getting the average, min, max, stdev

Vlookup – great for translating from aggregations (student-level data, for instance) back to action-level data
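For anyone who prefers code to spreadsheets, the same two steps (aggregate, then look the aggregate back up for each action) can be sketched in Python. This is only an illustration, not the Walkthrough's method: the field names `student` and `time_sec` and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical action-level log: one row per student action.
actions = [
    {"student": "s1", "time_sec": 12.0},
    {"student": "s1", "time_sec": 30.0},
    {"student": "s2", "time_sec": 8.0},
    {"student": "s2", "time_sec": 10.0},
    {"student": "s2", "time_sec": 45.0},
]

# "Pivot table" step: aggregate action-level rows up to the student level.
times_by_student = defaultdict(list)
for row in actions:
    times_by_student[row["student"]].append(row["time_sec"])

student_stats = {
    student: {
        "avg_time": mean(times),
        "min_time": min(times),
        "max_time": max(times),
        "sd_time": pstdev(times),  # population standard deviation
    }
    for student, times in times_by_student.items()
}

# "Vlookup" step: copy each student's aggregates back onto every action row.
for row in actions:
    row.update(student_stats[row["student"]])
```

After this runs, every action row carries its student's average, min, max, and standard deviation, which is the shape most action-level models expect.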



Example in this week’s Walkthrough



Further resources ◻

http://www.howtogeek.com/howto/13780/using-vlookup-in-excel/



http://www.excel-easy.com/data-analysis/pivot-tables.html



http://spreadsheets.about.com/od/datamanagementinexcel/ss/8912pivot_table.htm

Other useful things you can do in Excel ◻

Counts-so-far

Counts-last-n-actions

Differentiating first and subsequent attempts

Ratios between events of interest

Cut-off based features
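The feature types listed above can also be computed in a single pass over the log. The sketch below is a rough Python equivalent under stated assumptions: the log fields (`student`, `problem`, `correct`, `time_sec`), the window size `N`, and the 20-second `SLOW_CUTOFF` are all hypothetical choices for illustration, and the log is assumed to be sorted by time within each student.

```python
from collections import defaultdict, deque

# Hypothetical log, already sorted by time within each student.
log = [
    {"student": "s1", "problem": "p1", "correct": 0, "time_sec": 25.0},
    {"student": "s1", "problem": "p1", "correct": 1, "time_sec": 6.0},
    {"student": "s1", "problem": "p2", "correct": 0, "time_sec": 40.0},
    {"student": "s1", "problem": "p3", "correct": 1, "time_sec": 9.0},
]

N = 2               # window size for "last n actions" (assumed)
SLOW_CUTOFF = 20.0  # seconds; threshold for a "slow" action (assumed)

errors_so_far = defaultdict(int)
actions_so_far = defaultdict(int)
recent_errors = defaultdict(lambda: deque(maxlen=N))
seen_problems = set()

for row in log:
    s = row["student"]
    # Each feature describes the student's history BEFORE the current action.
    row["errors_so_far"] = errors_so_far[s]                          # count-so-far
    row["errors_last_n"] = sum(recent_errors[s])                     # count-last-n-actions
    row["first_attempt"] = int((s, row["problem"]) not in seen_problems)
    row["error_ratio_so_far"] = (
        errors_so_far[s] / actions_so_far[s] if actions_so_far[s] else 0.0
    )                                                                # ratio between events
    row["is_slow"] = int(row["time_sec"] > SLOW_CUTOFF)              # cut-off based feature

    # Update the running history with the current action.
    is_error = 1 - row["correct"]
    errors_so_far[s] += is_error
    actions_so_far[s] += 1
    recent_errors[s].append(is_error)
    seen_problems.add((s, row["problem"]))
```

Computing each feature before updating the running counts keeps the current action's own outcome out of its features, which matters if that outcome is what the model predicts.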

GoogleRefine (now OpenRefine) ◻

Functionality to make it easy to regroup and transform data

Find similar names

Connect names

Bin numerical data

Mathematical transforms showing resultant graphs

Text transforms and column creation
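To give a feel for what "find similar names" involves, here is a small Python sketch loosely modeled on key-collision ("fingerprint") clustering, one of the clustering approaches OpenRefine offers. It is a simplified illustration and not OpenRefine's actual implementation; the school names are invented.

```python
import re
from collections import defaultdict

def fingerprint(name):
    """Reduce a name to a normalized key: lowercase, no punctuation,
    unique tokens in sorted order."""
    cleaned = re.sub(r"[^\w\s]", "", name.strip().lower())
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

# Hypothetical messy values for the same field.
names = [
    "Lincoln High School",
    "lincoln high school.",
    "High School, Lincoln",
    "Roosevelt Middle",
]

# Names that share a fingerprint are candidates for merging.
clusters = defaultdict(list)
for name in names:
    clusters[fingerprint(name)].append(name)
```

Names that land in the same cluster are only candidates; a person should still confirm that they refer to the same thing before connecting them.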

GoogleRefine (now OpenRefine) ◻

Functionality for finding anomalies/outliers

GoogleRefine (now OpenRefine) ◻



Functionality for automatically repeating the same process on a new data set

*Really* nice for cases where you complete a complex process and want to repeat it

GoogleRefine (now OpenRefine) ◻

Some videos you may want to watch later



http://www.youtube.com/watch?v=B70J_H_zAWM

http://www.youtube.com/watch?v=cO8NVCs_Ba0

http://www.youtube.com/watch?v=5tsyz3ibYzk





Feature Iteration ◻

Sometimes when a feature looks like it might be good



It’s worth iterating on that feature, trying close variants to see if they do better

Example ◻

You have a feature “slow actions after hints” (cf. Shih, Koedinger, & Scheines, 2008)



You define “slow action” as an action taking over 20 seconds



What if 30 seconds is a better cut-off?
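One way to try close variants of a cut-off is to recompute the feature at several thresholds and compare how well each relates to the outcome. The sketch below is a minimal illustration under stated assumptions: the student data, the `outcome` values, the candidate cut-offs, and the use of a simple Pearson correlation as the measure of "better" are all hypothetical choices, not the method from Shih, Koedinger, & Scheines (2008).

```python
from math import sqrt

# Hypothetical data: for each student, durations (seconds) of actions that
# immediately followed a hint, plus an outcome to predict.
students = {
    "s1": {"post_hint_times": [5, 8, 25, 31], "outcome": 0.9},
    "s2": {"post_hint_times": [3, 4, 22, 6], "outcome": 0.5},
    "s3": {"post_hint_times": [2, 3, 4, 21], "outcome": 0.4},
    "s4": {"post_hint_times": [35, 40, 38, 33], "outcome": 1.0},
}

def slow_after_hint(times, cutoff):
    """Feature: proportion of post-hint actions slower than `cutoff` seconds."""
    return sum(t > cutoff for t in times) / len(times)

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

outcomes = [s["outcome"] for s in students.values()]

# Recompute the feature at each candidate cut-off and score it.
scores = {}
for cutoff in (20, 25, 30):
    feature = [slow_after_hint(s["post_hint_times"], cutoff) for s in students.values()]
    scores[cutoff] = pearson(feature, outcomes)

best_cutoff = max(scores, key=scores.get)
```

Choosing the cut-off this way fits it to the data at hand, so the chosen value should be checked on data that played no part in the search.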

Ways to accomplish this… ◻

By hand

Programming (Java? Matlab?)

Excel Equation Solver

Excel Equation Solver Tutorials ◻

http://office.microsoft.com/en-us/excel-help/define-and-solve-a-problem-by-using-solver-HP010072691.aspx



http://www.youtube.com/watch?v=K4QkLA3sT1o



One tip: multistart option avoids local minima (that can sometimes block the solver from even getting started)

A few thoughts

Does feature engineering overfit? ◻

It can

Which is why it’s useful to remember

The true test of a model is whether it works on entirely unseen data

If you iterate a lot and use cross-validated goodness

Then the true test of your model will be either a held-out data set or newly-collected data later on
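A held-out set only works as a test if it is set aside before any feature iteration begins. The sketch below shows one way to do that in Python. Splitting at the student level (so no student appears in both sets) is an assumption added here, as are the function name `split_by_student`, the 20% fraction, and the `student` field.

```python
import random

def split_by_student(rows, holdout_fraction=0.2, seed=0):
    """Split action-level rows so that no student appears in both sets."""
    students = sorted({r["student"] for r in rows})
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(students)
    n_holdout = max(1, round(len(students) * holdout_fraction))
    holdout_students = set(students[:n_holdout])
    development = [r for r in rows if r["student"] not in holdout_students]
    holdout = [r for r in rows if r["student"] in holdout_students]
    return development, holdout

# Hypothetical log: 10 students with 3 actions each.
rows = [{"student": f"s{i}", "action": j} for i in range(10) for j in range(3)]
development, holdout = split_by_student(rows)
```

All brainstorming, feature iteration, and cross-validation would then use only `development`; `holdout` is scored once at the end.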

Feature Engineering ◻

Your features come from somewhere



You can take a standard set of variables or pre-existing variables



No question it’s faster

But thinking about your variables is likely to lead to better models

There is actually evidence for this; see Sao Pedro et al. (2012)

Next Lecture ◻

Automated feature generation and selection
