Recently I went to StrataConf (http://strataconf.com/stratany2012) to learn more about this crazy world of data I'm slowly slipping further into. I made several mind maps that I've posted at the end of this blog post.
Key Takeaways
- Hadoop is huge in the data mining space. Like HUGE.
- Data scientists get overly fixated on playing with their data, the same way programmers get fixated on coding. It wasn't caught until the company was basically about to die. BOTH times. Three solutions:
- Appoint a kind of "canary" who isn't emotionally involved. Listening to the canary becomes the next hard problem.
- Have a hypothesis before diving into data!
- Data scientists need to be "Scrappy". For coders "Hacker" is a synonym.
- Steps
- Analyze: Take the time to understand your model and look at the data. No black boxes.
- Anticipate: Build a data viewer and proactively look for bugs. Bugs are the enemy. STOP THEM.
- Improvise: "Don't indulge in any unnecessary, sophisticated moves..." -Bruce Lee
- Adapt: Error data is GREAT data. Don't just give up... Understand.
- What's a "Data Scientist"?
- The Venn diagram:
- Steps
- Real-time data: Event-oriented queries via Esper. Your algorithms shouldn't require rerunning the whole calculation every time new data arrives.
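To make that last point concrete, here's a minimal sketch of what "don't rerun the whole calculation" can look like. This is plain Python, not Esper or its query language. The `RunningStats` class and the latency numbers are made up for illustration, and I'm assuming the thing you care about is a mean and variance over a stream of events. It uses Welford's method, which folds each new value into the existing totals instead of looping over the full history again.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mean and variance updated one event at a time (Welford's method)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Fold a single new event into the existing state.
        # No earlier events are stored or revisited.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one event has arrived.
        return self.m2 / self.count if self.count else 0.0


stats = RunningStats()
for latency_ms in [120.0, 80.0, 100.0]:  # hypothetical event stream
    stats.update(latency_ms)

print(stats.count, stats.mean, stats.variance)
```

Each `update` call does a constant amount of work no matter how many events came before it, which is the property you want when data shows up in real time.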
And last but not least! Mind maps!