## Saturday, December 31, 2016

### 2016 Review and 2017 Goals

In last year's review I dubbed 2016 the Year of Learning. I had already fixed my finances, and my body and health were in good places. I had just gotten into a new role at my job, so I decided 2016 would be about rapid learning. The second major goal was to create passive income streams. The last goal was to continue learning languages.

Here was what I had written down a year ago:

Start

Attempt 1 yoga session
Studying Mandarin
Join a social group
Write an ebook
Blogging on a weekly quota

### 2016 Notable Events

My brother passed away on Christmas Eve. One of the saddest days of my life
I quit my job, a scary but necessary major life change
The company I worked for was bought, so I should expect a little cashout in 2017 from stocks
I did a ton of interview prep and got significantly better. I got an offer, but decided not to go that direction
Started the habit of making my bed each morning
Gained the habit of flossing every night
Decluttered. Got rid of almost everything I had, ready for moving out

### 2017 Goals

With that said, I have dubbed 2017 the Age of Exploration. I have it set up so that I will move out in January, which will ensure I do one more round of minimizing and decluttering. From there I will be doing some traveling until a wedding requires me to be back. The best plan is probably to work on my book and minor project ideas during travel downtime. When I get back I can decide to explore more places or to focus on exploring new business careers. Regardless, I am treading into new and scary territory, but I'm going excitedly and without regrets.

Start

Work on a static muscle up
Work on the handstand
Toastmasters
Write the ebook on a regular schedule
Earn $1 in passive income
Wake up early (again)

Continue

Blogging on a minimum weekly basis
Morning/gratitude journal
Diet, exercise
Expense tracking

Stop

Watching sports
Focusing on the efficiency of the workout

## Thursday, December 22, 2016

During his podcast, Jocko reads his highlighted notes from The Armed Forces Officer. I've noticed that there isn't really much difference between military command and software engineering command; leadership skills are totally transferable between crafts. The book also stands the test of time: it was written so long ago but still has so many golden nuggets.

### Notes:

These qualities are the epitome of strength not of softness.
1. Strong belief in human rights
2. Respect for dignity in every other person
3. The golden rule attitude (treat people as you would want to be treated)
4. Interest in human welfare
5. Dealing with every man as if he were a blood relative

#### 13 pitfalls of a leader

1. To attempt to setup your own standard of right and wrong
2. Try to measure the enjoyment of others by your own
3. To expect uniformity of opinions in the world
4. To fail to make allowance for inexperience
5. To endeavor to mold all dispositions alike
6. Not to yield on unimportant trifles
7. To look for perfection in our own actions
8. To worry ourselves and others about what can't be remedied
9. Not to help everyone wherever, however, whenever
10. To consider impossible what we cannot do ourselves
11. To believe only what our finite minds can grasp
12. Not to make allowances for the weakness of others
13. To estimate by some outside quality when it is that within that makes the man

• Don't make snap judgments of people
• The ones who talk moderately will receive the respect of others. Not the ones who brag.
• The man who will not listen will never develop wits enough to differentiate between a bore and a sage, and therefore cannot pick the best company
• Remember the little things like names, titles, etc.
• Success means leisure and focusing on what is important.
• Lack of common purpose is the main downfall of groups
• A leader doesn't need to understand everything but needs to know whether something is being done right or wrong
• Refuse kindly rather than acquiesce with bad manners
• There is so much worthwhile in living dangerously. Risk nothing, gain nothing
• What is the main test of human character? Being patient through tough circumstances. Can you push on?

## Tuesday, December 20, 2016

### Muscle Up

It took me 3 months to get to this point (not seriously working at it, but going out and training it every other day). When I started I was able to do 15-20 pullups and 25 dips. I started with explosive pullups to my chest, then some kipping, and finally worked on strength at the transition point between the pullup and the dip.

I was able to get 2 (ugly) muscleups. My next goal is to refine this to strict non-kipping muscleups and eventually (on parallel bars) transition to a handstand.

Why do this? Recently my gym membership expired, and since my whereabouts might be unknown in the next year, I've decided against re-upping for another year. In the meantime I'm exploring some creative ways to stay fit without it. MMM has a great article on not needing gyms. I do miss doing deadlifts, but for now I'm going to avoid the gym completely and work on balance and core strength. I'll mainly be working on handstands and muscle ups, not because they are necessarily the most efficient way to stay in shape, but because they are fun and I'd like to see how far I can go in that arena.

## Monday, December 12, 2016

### Visual Books

Audio books seem to be more popular than ever. When I listen to podcasts, which are like short-form audio books, I often hear Audible.com as a sponsor. Amazon, which owns Audible, also gives a free trial, so I decided to give it a shot. Here's what I've found so far.

The voice actor gives the book a bit more spice, which is enjoyable, but after a while I end up thinking that the audio is too slow and speed it up to 1.5-2x anyway, which sort of kills that. I guess it's hard for me to just sit back and enjoy a book when I know I can do it in half the time. There are certainly pros and cons to this attitude. I also found that one of the main selling points of audio books doesn't work well for me. The pitch is that you can listen anywhere, such as during the commute or while working out. This sounds like it would work, but what I find for myself is that I am so focused on my workout, or even on noticing things on my commute, that I don't tune in to the book that well. Although I am hearing the words, the meaning is not registering in my head when I get the least bit distracted. I end up hitting the 15-second back button often. I don't find this surprising, as I know that I am terrible at multi-tasking; unable to walk and listen at the same time.

So in order to listen to an audio book or podcast and get anything out of it I need to not be exercising or walking or commuting. Just sitting and giving it my complete attention. But if I were to do this, then I'd usually rather just be skimming a book instead. You can't really skim an audio book as fast as you would a book since you are really only limited to the jump forward button and the audio throughput is much slower than that of a visual scan.

I'm not saying that audio books are worse than visual books. I think if you polled the general population they would enjoy audio books more than visual books. I'm just pointing out some disadvantages of them, and noting that in most cases I have a general preference for speed of processing. For me the ideal audio book case is when I want to listen to a fiction story with a specific voice actor that I enjoy. Otherwise I'll be processing my books in the old-style visual format.

## Wednesday, December 7, 2016

I did a personality test at https://www.16personalities.com/ and here is what it came up with. Now, like any other personality test on the internet, I know to take the results with a grain of salt. Just because I answered some questions and they gave me this result, it does not necessarily describe who I am. I wonder if I had answered one question differently, or happened to be in a different mood, whether I would have gotten something completely different. I think it's likely. When I read the results it sounds like it hit the nail on the head, but I'll keep the Forer Effect in the back of my mind when doing the analysis.

#### In General...

INFJs tend to see helping others as their purpose in life, but while people with this personality type can be found engaging in rescue efforts and doing charity work, their real passion is to get to the heart of the issue so that people need not be rescued at all.
INFJs indeed share a very unique combination of traits: though soft-spoken, they have very strong opinions and will fight tirelessly for an idea they believe in. They are decisive and strong-willed, but will rarely use that energy for personal gain – INFJs will act with creativity, imagination, conviction and sensitivity not to create advantage, but to create balance. Egalitarianism and karma are very attractive ideas to INFJs, and they tend to believe that nothing would help the world so much as using love and compassion to soften the hearts of tyrants.
INFJs are often perfectionistic, looking for ultimate compatibility, and yet also look for someone with whom they can grow and improve in tandem. Needless to say, this is a tall order, and INFJs should try to remember that they are a particularly rare personality type, and even if they find someone compatible in that sense, the odds that they will also share every interest are slim. If they don't learn to meet others halfway and recognize that the kind of self-improvement and depth they demand is simply exhausting for many types, INFJs are likely to end up abandoning healthy friendships in their infancy, in search of more perfect compatibilities.

#### On Friendships...

Really the only way to be counted among INFJs’ true friends is to be authentic, and to have that authenticity naturally reflect their own.
INFJs don’t require a great deal of day-to-day attention – for them, quality trumps quantity every time, and over the years they will likely end up with just a few true friendships, built on a richness of mutual understanding that forges an indelible link between them.

#### On Careers...

First and foremost, INFJs need to find meaning in their work, to know that they are helping and connecting with people – an INFJ Ferrari salesperson is a non-sequitur. This desire to help and connect makes careers in healthcare, especially the more holistic varieties, very rewarding for INFJs – roles as counselors, psychologists, doctors, life coaches and spiritual guides are all attractive options.
For INFJs, money and Employee of the Month simply won’t cut it compared to living their values and principles.
These needs are hard to meet in a corporate structure, where INFJs will be forced to manage someone else’s policies alongside their own. For this reason, people with the INFJ personality type are more likely to, despite their aversion to controlling others, establish their independence by either finding a leadership position, or simply starting their own practice. As independents, sole proprietors in the parlance of business, INFJs are free to follow their hearts, applying their personal touch, creativity and altruism to everything they do.
This is the most rewarding option for INFJs, as they will step out of the overly humble supporting and noncompetitive roles they are often drawn to, and into positions where they can grow and make a difference. INFJs often pursue expressive careers such as writing, elegant communicators that they are, and author many popular blogs, stories and screenplays. Music, photography, design and art are viable options too, and they all can focus on deeper themes of personal growth, morality and spirituality.
Where INFJs fall flat is in work focusing on impersonal concerns, mundanity, and high-profile conflict. Accounting and auditing, data analysis and routine work will leave people with the INFJ personality type fidgety and unfulfilled, and they will simply wilt under the scrutiny, criticism and pressure of courtroom prosecution and defense, corporate politics and cold-call sales. INFJs are clever, and can function in any of these fields, but to be truly happy, they need to be able to exercise their insightfulness and independence, learn and grow alongside the people they are helping, and contribute to the well-being of humanity on a personal level.

Multiple sites have claimed that INFJs are extremely rare (< 1%). I had always assumed that I was an INTJ based on what I have read about them, as it is pretty obvious that I am introverted. I think most people will easily know whether they are I (introvert) or E (extrovert). The S (sensing) vs. N (intuition) is a bit trickier. This quiz says that I am slightly more N, but I took another test that said I was much more towards S.

The F(feeling) vs. T(thinking) is the biggest surprise for me.  But in looking back, I can certainly see some of the traits in action. As a manager, I was always very people first, and made extremely strong connections with those that I had managed. Even as an engineer, I'd consider how people felt and try to address those issues along with the raw technical issues.

I am more J(judging) than P(perceiving). I like to approach life in a structured way and make decisions early rather than keep all my options open. If I get an email that needs a decision, I'll make one and reply right away. Everything is acted upon right away (although sometimes that decision is to delay). Although I like to keep my general life options as wide open as possible, usually when I am faced with smaller decisions, I have no problem making them and accepting the consequences.

I am more T(turbulent) than A(assertive), which means that I am more self-conscious and perfectionistic: more eager to improve but more sensitive to stress. The opposite would be someone who is more worry-free but not pushing as hard toward their goals.

Overall I think this result makes sense. It explains why I have a few high quality friends and am perfectly happy with that. It explains how work is not just an exchange for money and has much deeper meaning; and why I would walk away from a job with such great pay. It explains why I am attracted to working and living independently.

Again, I'm taking this with a grain of salt. I think I could have answered so many of the questions either way, and to most of them I'd respond with "well, it depends on the situation". But it was a fun exercise anyway.

## Tuesday, November 29, 2016

### Don't Settle for the Default

There are a lot of places in life where there is a normal curriculum to follow. For example, college is usually a 4-year program, high school is 4 years, and you pay off a house in 30 or 15 years. If you follow these plans, it's probably not bad, but you have to keep one thing in mind: these plans are often catered to the lowest common denominator. In other words, they are designed so that even the worst students can pace themselves and pass. Yes, of course there are exceptions; I'm sure there are people who fail high school or don't pay off their house. But the point is that they are designed so that the masses will be successful.

So the next line of reasoning is that if you are average or above average and still follow the same course as the person who is below average, then you are probably being sub-optimal about it. You aren't learning as much and aren't pushing yourself hard enough. In the case of the house, there are tremendous savings on the overall cost for someone who can pay off the loan quicker. If you can graduate college or high school in 2 years, you may want to consider it. I say consider it because there are reasons to follow the normal course (social reasons, perhaps) and not charge ahead. But at least don't forget to look at the default that is being given to you and question whether those settings are optimal for you. If not, then feel free to poke at the walls.
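To put a rough number on the mortgage example, here's a quick sketch comparing total interest on a 30-year vs. a 15-year loan. The loan amount and rates are made up for illustration, not quotes; the formula is the standard fixed-rate amortization payment.

```python
def total_interest(principal, annual_rate, years):
    """Total interest paid on a fixed-rate mortgage with monthly payments."""
    r = annual_rate / 12                            # monthly interest rate
    n = years * 12                                  # number of monthly payments
    payment = principal * r / (1 - (1 + r) ** -n)   # standard amortization formula
    return payment * n - principal

# Hypothetical $300,000 loan; 15-year loans usually carry a slightly lower rate
interest_30 = total_interest(300_000, 0.04, 30)
interest_15 = total_interest(300_000, 0.035, 15)

print(f"30-year total interest: ${interest_30:,.0f}")
print(f"15-year total interest: ${interest_15:,.0f}")
```

With these made-up numbers the 15-year loan saves well over $100,000 in interest, which is the "tremendous savings" in concrete terms.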

## Saturday, November 26, 2016

### It Won't Make You Happy

Black Friday and Cyber Monday are here again, and there are millions of products marked down. Even if you didn't want to contend with the traffic and long lines, you could go online or use your mobile app and buy that product with a simple click. But before buying anything, let's remind ourselves again why we do what we do.

Why do we buy any product? At its simplest, it is because it makes us happy. This might be through making our lives easier (a tool) so we can do more of the things we love. Or it might be to make us look more attractive, or to generate a feeling of respect. It's helpful to take a look at Maslow's Hierarchy, since different products address different levels of needs.

Basic food and water would satisfy our physiological needs. I assume you aren't doing your Black Friday shopping at Trader Joe's, so we know these products aren't satisfying your physiological needs. So where do they fall on the triangle?

Yesterday was Black Friday and I was asked to go shopping with some friends. I hadn't seen them in a while, so I decided to come hang out. We headed over to the San Francisco Outlets (which are actually in Livermore). Along the drive we came to a standstill of cars, bumper to bumper, 30 minutes from the mall's parking lot. This was amazing to me: not only were people rushing to the mall, they had to wait in line just to do so. After we waited again to find parking and started walking around, I noticed still more lines. Lines that stretched outside the stores, 2 or 3 store-lengths long. Many of these were high-end stores such as Kate Spade, Gucci, Ferragamo, etc. So after all this, people were presented the opportunity to buy expensive items at a cheaper price. Why would people do this?

I think consumerism really targets the Esteem and sometimes Love/Belonging levels of the pyramids. The promise is that if you buy this product, you will get confidence and respect from yourself and others. On its surface this sounds like a great deal. But at what cost?

Well, if you look two levels down you see Safety, which entails security and resources. The act of buying something means trading one resource for another. When you buy a high-end luxury item you are trading some Safety for Esteem. Now, if you have a very wide base and plenty of resources, then this seems like a pretty reasonable trade (I'd argue that there are different ways to build Esteem to consider, but we'll leave it at that for today). The much bigger issue is when you don't have an excess of resources and you still make the trade. You are sacrificing lower-level needs for higher-level ones, possibly inverting the middle level of your pyramid:

*(image: Maslow's Fish, a pyramid with its middle levels inverted)*

"Maslow's theory suggests that the most basic level of needs must be met before the individual will strongly desire (or focus motivation upon) the secondary or higher level needs." This might still be true, but Maslow's Fish might represent some people not building up such a strong foundation of Safety before reaching for Esteem.

Ok, maybe that looks a bit ridiculous, but consider the number of people who hate their jobs but aren't able to change because they say they can't make ends meet. In the meantime they drive their BMW and buy expensive Gucci bags to build their confidence and show the world that they are indeed successful. The allure of the higher rungs of the pyramid is so powerful that in many cases we even let it erode our physiological-level needs. That might be an example of trading Physiological needs for Safety, but it's possible that it is just a reaction to the prior trade of Safety for Esteem. Our resources are depleted, so we trade our time and energy to work harder for more resources to keep our pyramid standing.

I'm not hating on consumerism; everyone is free to choose whatever they want. But I think it's a good idea to take a step back and look at the big picture. See our wants and needs as a whole and how other levels of needs might be affected. Think about whether that product really does have the effect that is promised, and what trade-offs are being made.

## Wednesday, November 23, 2016

### Giving Thanks

Thanksgiving is a tradition where we get together with loved ones and appreciate each other. It's a good time to reflect on our lives. We all have so much to be thankful for. I think it tends to be one of the most joyful holidays because (as the name implies) it kind of pushes you to practice gratitude. And as Tony Robbins puts it, as humans we are wired so that we cannot experience negative emotions like anger and sadness at the same time as gratitude. So in order to go through life happier and with more appreciation, in addition to reflecting on things we are grateful for during Thanksgiving, we should try to make it a daily habit.

This Thanksgiving I've decided to create a list of 100 things I am thankful for:

1. Immediate and extended family including (Mom, Dad, Brother) as well as those cousins that I only see once a year.
2. My friends that have had my back through thick and thin
3. All the people that I have crossed paths with whether that might be a random conversation with a stranger or learning something from a coworker
4. This laptop that I am writing on that is so reliable and still going strong and works so great after 7 years!
5. My Nexus 5 phone with a crack in the screen
6. The podcasts that I listen to on it (particularly Jocko, Tim Ferriss, Security Now)
8. These Bose bluetooth headphones that were given to me from hired.com (as a gift even though I was not hired through them)
9. Hiking up mission peak. I can walk from my parents house.
11. Häagen-Dazs ice cream
12. Baking breakfast muffins with my mom
13. A nice home cooked meal
14. Coffee - this list isn't in order, but this would be high up there
15. My amazing place in the heart of San Francisco
16. Amazingly good health
17. My latest job and everyone who contributed to my growth and development
18. Myself being a minimalist simple person who needs very little to be happy - lots of credit to my parents for raising me this way
19. Sports. Both playing and watching competition
20. Assorted nuts (from Costco)
21. My other bluetooth sports headphones that I can run with or take to the gym and not have to deal with wires
22. Having the opportunity to take a 3 month South America trip
23. Get togethers with friends
24. Chicken wings
25. Baths
26. Going camping in the desert and getting away from it all
27. Writing code (and being able to make a living on it)
28. Design discussions with a particular coworker
29. Not being perfect and having a bunch of failures but being able to bounce back
31. Blogging
32. The library
33. Seth Godin - learned a lot from his ideas
34. Hacker news
36. Coding contests
37. Slack (the chat service)
38. Rock climbing
39. Running
40. A glass of wine
41. That feeling of cool wind blowing against my skin
42. A warm shower
43. Believing that I am a badass
44. Doing a pullup
45. Chicken
46. Intellij IDEA
47. Trader Joe's - particularly the one a couple blocks from my house
48. Sparkling water
49. Blueberries
50. Music
51. Sunny days, sunrises
52. My camera - storing memories
53. Wifi
54. Beaches
55. Electricity
57. Life being hard, challenges
58. Good eyesight
59. My other senses (touch, smell, taste, hear)
60. My thoughts
61. Exercise
62. Sunsets
63. Anxiety
64. The feeling of confidence
65. Fear
66. Pets - cats and dogs
67. Women of all kind
68. Fruits
69. Electricity
70. Fire
71. A warm blanket
72. Laughter
73. Motivation
74. Habits
75. Smiling
76. Laughing
77. Crying
78. Both good and bad things in life
79. Nature
80. Having a good conversation with someone
81. All the memories that I have
82. Helping others
83. Receiving help from others
84. The rain
85. Puzzles
86. Running shorts
87. The feeling of sweating during a good workout
88. Traveling
89. The feeling of getting stronger
90. Still having hair left
91. Being born beautiful
92. Myself
93. A good beer
94. Being very comfortable by myself
95. Getting a good night's sleep
96. Snow and snowboarding
97. Brunch
98. A good pair of shoes
99. Being up early and enjoying the morning
100. Putting in work and getting after it

## Monday, November 21, 2016

### Derek Sivers on the Tim Ferriss Podcast

Love the Tim Ferriss podcast. So many gold nuggets. I particularly like it when Derek Sivers is on, as he explains things so simply and elegantly:
• It's not what you know, it's what you do and what you practice every day.
• Would we still call Richard Branson successful if his goal was to live a quiet life but as a compulsive gambler he keeps creating companies?
• Early in your career, say yes to everything. You don't know what the lottery tickets are
• The standard pace is for chumps. Go as fast as you want. Why would you graduate in 4 years if you can do it in 2? The pace of 4 years was geared for the lowest common denominator.
• Don't be a donkey. Think long term. That means you can decide something now. (The fable is that a donkey stands between hay and water, unable to choose, and eventually dies; a donkey can't think long term enough to figure out that it can have both.)
• Hell yes or no. We say yes too much and let too many mediocre things fill our lives
• Busy implies out of control. If you keep saying "I'm too busy, I have no time for this", it seems like you lack control over your time.
• When making a business, sometimes we focus on the big things but sometimes something small (like a special email to make someone smile) will have a big impact.

## Sunday, November 20, 2016

### Take the Training Wheels Off

When you are struggling or learning something, it is sometimes useful to introduce some structure or positive constraints. This might mean working in 30-minute pomodoros, tracking every dollar spent, or calorie counting. However, as you develop the habit or skill, sometimes this structure becomes a limit on further growth. Think of training wheels. In the beginning we need them while we train our coordination, balance, and confidence. It doesn't take long for us to outgrow them. Soon the support turns into a crutch. We could still ride just fine with those training wheels, but they would limit our speed and turning ability.

It's important to build discipline and good habits, and most of us probably never do enough. But it's also important to step back and look at the reasons for setting up these systems in the first place. Sometimes the original reasons are no longer applicable and we need to adapt again to continue to grow.

I've personally felt this with pomodoros. At first I felt undisciplined, always getting distracted, so I decided to do focused work in 30-minute intervals. It certainly helped me be more productive, but I also felt it limited me, as I would get pulled out of my flow state at each interval. Sometimes I can work for hours straight and not feel distracted or bored. So I decided to take off the training wheels. Things feel more natural and I am as productive as ever. However, I still feel there are good situational uses for pomodoros; if I'm ever feeling lazy or distracted I might go back to this technique for a spark.

## Monday, November 14, 2016

### Limits To Growth

I love thinking about systems. The Fifth Discipline has some great insight into understanding common systems that occur in different contexts in our lives. One of them is the concept that in order to grow, you can't just push growth. The world has underlying systems, and these systems have limits. At some point, if you push too much, the system resists and pushes back.

For example, let's look at growing the engineering org at a typical company. You will do so by making more hires. This in turn makes the system more complex to manage. The senior engineers get pushed into management and thus pulled away from developing product. At first there is a very noticeable increase in productivity. This is good, so you add more people. At some point the system becomes pretty complex, and the increase in hierarchy and difficulty of communication means that the rate of getting things done slows to a crawl. You can't just keep growing in the same way. The system has reached a limit here.

Another example is dieting. If you go on a diet you will probably see immediate results, but in not so long your rate of weight loss will decrease and you may hit a limit on how much weight you can lose.

A naive way of thinking is to push harder into the system: add more workers, or push the diet even harder. But this will solve nothing, as the limits are already reached. More workers will cause morale issues as workers realize it is impossible to get things done and are overburdened by complexity. Dieters will be overcome by hunger pangs and unable to cut calories any further.

In each of these situations the solution is not to push growth but to really take a step back and see what the limits are on the system. Focus on removing the limits. This might mean decentralizing the hierarchy or starting an exercise plan in addition to the diet. It might mean keeping the management hierarchy flat and introducing a new software management methodology. It might mean changing the diet completely; maybe your body responds differently to the Paleo diet than the Ketogenic.

Whenever you see a situation where there is growth at first but then the growth starts to slow to a halt, it is likely that the system is hitting a limit. The way past that limit is not more of what has been successful before but instead changing the system in a way that raises the limit.
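The engineering-org example can be sketched as a toy model (my own illustration, not from The Fifth Discipline): each person adds a unit of work, but every pair of people adds a small communication overhead, so output climbs at first, flattens, and eventually falls as the overhead dominates.

```python
def team_output(n, coordination_cost=0.01):
    """Toy model: n people each produce 1 unit of work, but every pair
    of people incurs a small communication overhead."""
    pairs = n * (n - 1) / 2          # number of communication channels
    return n - coordination_cost * pairs

# Growth helps early, then the limit kicks in and more hires hurt
for n in (10, 50, 100, 150):
    print(n, round(team_output(n), 1))
```

With these made-up numbers output peaks around 100 people; adding the 150th person makes the team less productive than it was at 100. Pushing harder (hiring more) lowers output, while raising the limit (cutting the coordination cost) moves the whole curve up.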

## Sunday, November 13, 2016

### Dangerous vs Scary

On NPR's How I Built This podcast, Jim Koch, the founder of Sam Adams, talks about the difference between scary and dangerous. Many things that are scary to us are not dangerous. Conversely, many things that are dangerous are not scary.

He gives the example that rappelling off a cliff is scary, but you are held by a belay rope that could hold up a car, so it is not really dangerous. Walking near a snowy mountain when the weather heats up probably isn't scary, but it is really dangerous, as it could cause an avalanche. I think not wearing sunscreen at the beach might also be not scary, but dangerous.

Jim then explains that his staying at Boston Consulting Group would not have been scary but would have been very dangerous: one day, at 65, he'd look back at his life and see that he had wasted it by not doing something that made him happy.

Although I can honestly say that I loved my job, made great friendships, and learned so much from the experience, I knew it was time to leave. Towards the end the winds changed. There was a change in management, the community broke down, and the people I had shed blood and sweat with started leaving. My rate of growth slowed down a lot. Although I made good money there, I realized that it would be really dangerous to stay. Quitting my job would be scary, but I'd ultimately be fine, and I'd gain new skills and confidence. I'd be more antifragile.

Whenever something is scary, we should also ask if it is dangerous. If not, then don't be afraid to take the leap.

## Wednesday, November 9, 2016

### Hyperfocus

I've noticed a pattern across a few notably successful people. When they are very interested in something they get into a flow and hyper-focus on it.

For example, when Elon Musk got his first computer, it came with a BASIC programming workbook. It was supposed to take 6 months to finish, but he got OCD about it and spent the next 3 days coding straight until it was done.

Derek Sivers, founder of CD Baby, says that he doesn't have morning rituals because he gets so focused on what he does. For example, he spent 5 months programming SQL every waking hour, stopping for only an hour or two. After the project was over he hyper-focused on the next thing. When he started CD Baby he did almost nothing else but work on his company from 7am to midnight.

In addition he mentions that when people are asked about their general happiness, those who have been in the flow more often will report higher happiness.

Personally, I've always followed a to-do-list style of approach, ticking off 5-10 tasks a day. However, I'm now going to focus on doing just one thing, to put myself in an environment where I experience flow more often, stay in tune with my interests, and not stop myself from pursuing them.

I'm interested in seeing how this goes compared to my current busy (but possibly running-in-place?) schedule. So for the next month I'll run an experiment: I'll stay off the Trello board and skip all daily planning completely. Curious to see how it goes.

## Tuesday, November 8, 2016

### Machine Learning Introduction

*Interested in Machine Learning? I recommend starting with Google's free course. It is very detailed and will make this post a little easier to follow. My notes on it are here.*

I had the privilege to join a workshop on Machine Learning hosted by Galvanize and CrowdFlower using the Scikit-learn toolkit. Below are my notes using Jupyter Notebook.

You can get the course material at https://github.com/lukas/scikit-class. Follow the README there to download the code and set up the libraries, including Scikit-learn, the library that contains all the neat machine learning tools we'll be using.

Start by running scikit/feature-extraction-1.py
In [1]:
# First attempt at feature extraction
# Leads to an error, can you tell why?

import pandas as pd
import numpy as np

# load the workshop's tweet dataset (tweets.csv from the course repo)
df = pd.read_csv('tweets.csv')

target = df['is_there_an_emotion_directed_at_a_brand_or_product']
text = df['tweet_text']

from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()
count_vect.fit(text)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
12
13 count_vect=CountVectorizer()
---> 14 count_vect.fit(text)

/home/jerry/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit(self, raw_documents, y)
794         self
795         """
--> 796         self.fit_transform(raw_documents)
797         return self
798

/home/jerry/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit_transform(self, raw_documents, y)
822
823         vocabulary, X = self._count_vocab(raw_documents,
--> 824                                           self.fixed_vocabulary_)
825
826         if self.binary:

/home/jerry/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in _count_vocab(self, raw_documents, fixed_vocab)
750         for doc in raw_documents:
751             feature_counter = {}
--> 752             for feature in analyze(doc):
753                 try:
754                     feature_idx = vocabulary[feature]

/home/jerry/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in <lambda>(doc)
239
240             return lambda doc: self._word_ngrams(
--> 241                 tokenize(preprocess(self.decode(doc))), stop_words)
242
243         else:

/home/jerry/anaconda2/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in decode(self, doc)
119
120         if doc is np.nan:
--> 121             raise ValueError("np.nan is an invalid document, expected byte or "
122                              "unicode string.")
123

ValueError: np.nan is an invalid document, expected byte or unicode string.
There's an exception already?! This throws an exception because there is missing data on line 8 of tweets.csv: that row has no tweet text at all. You'll find this is extremely common in data science; half the battle is massaging the data into something you can use. To fix it, the pandas library provides a convenient notnull() function. Here's an example of how it works:
In [40]:
import pandas as pd
#pandas has a special type of object called a Series object
s = pd.Series(['apple','banana','cat','dog','elephant','fish'])
print type(s)
print
print s
print

# you can pass a list of booleans to this series object to include or exclude an index.
print s[[True,False,True]]
print

# in our example above the extracted tweet_text is also in the same Pandas Series object
text = df['tweet_text']
print type(text)
print

# pandas.notnull returns a boolean array with False values where values are null
print pd.notnull(['apple','banana', None, 'dog',None,'fish'])
print

#Thus combining the Series datatype and pandas.notnull, you can exclude null values.
print s[pd.notnull(['apple','banana', None, 'dog',None,'fish'])]
print

<class 'pandas.core.series.Series'>

0       apple
1      banana
2         cat
3         dog
4    elephant
5        fish
dtype: object

0    apple
2      cat
dtype: object

<class 'pandas.core.series.Series'>

[ True  True False  True False  True]

0     apple
1    banana
3       dog
5      fish
dtype: object


In [41]:
# scikit/feature-extraction-2.py
# second attempt at feature extraction

import pandas as pd
import numpy as np

target = df['is_there_an_emotion_directed_at_a_brand_or_product']
text = df['tweet_text']

# what did we do here?
fixed_text = text[pd.notnull(text)]
fixed_target = target[pd.notnull(text)]

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
count_vect.fit(fixed_text)

# print the number of words in the vocabulary
print(count_vect.vocabulary_)

{u'unscientific': 9042, u'hordes': 4175, u'pickmeupanipad2': 6385, u'yellow': 9608, u'four': 3434, u'prices': 6652, u'woods': 9501, u'hanging': 3940, u'16mins': 70, u'looking': 5143, u'html5': 4215, u'gad': 3543, u'eligible': 2846, u'gadgetoverload': 3546, u'insertion': 4461, u'lori': 5154, u'sxswdad': 8340, u'lord': 5152, u'newmusic': 5809, u'dynamic': 2743, u'bergstrom': 1065, u'dell': 2351, u'rancewilemon': 6892, u'leisurely': 4985, u'bringing': 1305, u'basics': 971, u'prize': 6675, u'customizable': 2213, u'wednesday': 9356, u'oooh': 6028, ... output truncated, its quite long ... }

CountVectorizer will convert the text to token counts. The fit() function applies our tweet data to the CountVectorizer. If you look at the vocabulary_ of count_vect you'll see each word lowercased and assigned to an index.
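To see exactly what fit() builds, here's a minimal sketch on a tiny made-up corpus (separate from the workshop data, and using the current Python 3 scikit-learn API):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Tiny corpus standing in for the tweet data.
docs = ["The cat sat", "the cat ran away"]
cv = CountVectorizer()
cv.fit(docs)

# Each distinct lowercased word is assigned an integer index;
# the indices follow the alphabetical order of the vocabulary.
print(sorted(cv.vocabulary_.items(), key=lambda kv: kv[1]))
# [('away', 0), ('cat', 1), ('ran', 2), ('sat', 3), ('the', 4)]
```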
Before you take a look at scikit/feature-extraction-3.py, it's worth looking at this next example, as it's a simplified version.
In [44]:
import pandas as pd
import numpy as np

target = df['is_there_an_emotion_directed_at_a_brand_or_product']
text = df['tweet_text']

fixed_text = text[pd.notnull(text)]
fixed_target = target[pd.notnull(text)]

from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer(lowercase=True) # this lowercase=True is not necessary because the default is True
count_vect.fit(fixed_text)

transformed = count_vect.transform(["I love my iphone!!!"])
print transformed

vocab = count_vect.vocabulary_
for v in transformed.indices:
    print vocab.keys()[vocab.values().index(v)]

  (0, 4573) 1
(0, 5170) 1
(0, 5700) 1
iphone
love
my

By calling transform() on a given text such as "I love my iphone!!!", a matrix is returned with the counts of each vocabulary word found. The vocabulary we originally fitted to the CountVectorizer is used: "iphone", "love", and "my" are each found once in our "I love my iphone!!!" text. In (0, 4573), the 0 means the first sentence, since we only passed one; if you added another sentence you would see rows with a 1 for the second sentence. 4573 is the index of "iphone", which you can verify by finding it in the count_vect.vocabulary_ printed in the previous example. Note that "I" is not found because, by default, only tokens of two or more characters are included in the vocabulary, while the exclamation points in "iphone!!!" are ignored since punctuation is always treated as a token separator.
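You can check that tokenization behaviour directly with build_analyzer(), which exposes the preprocessing + tokenization step a CountVectorizer applies (a small sketch, not part of the workshop code):

```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
analyze = cv.build_analyzer()

# The default token_pattern keeps only tokens of two or more word
# characters, so the one-letter "I" is dropped and the punctuation
# in "iphone!!!" acts purely as a separator.
print(analyze("I love my iphone!!!"))
# ['love', 'my', 'iphone']
```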
In [48]:
# scikit/feature-extraction-3.py
import pandas as pd
import numpy as np

target = df['is_there_an_emotion_directed_at_a_brand_or_product']
text = df['tweet_text']

fixed_text = text[pd.notnull(text)]
fixed_target = target[pd.notnull(text)]

from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer(lowercase=True) # this lowercase=True is not necessary because the default is True
count_vect.fit(fixed_text)

# turns the text into a sparse matrix
counts = count_vect.transform(fixed_text)

print(counts)

  (0, 168) 1
(0, 430) 1
(0, 774) 2
(0, 2291) 1
(0, 3981) 1
(0, 4210) 1
(0, 4573) 1
(0, 4610) 1
(0, 4678) 1
(0, 5767) 1
(0, 6479) 1
(0, 7233) 1
(0, 8077) 1
(0, 8324) 1
(0, 8703) 1
(0, 8921) 1
(0, 9063) 1
(0, 9304) 1
(0, 9374) 1
(1, 313) 1
(1, 527) 1
(1, 644) 1
(1, 677) 1
(1, 774) 1
(1, 876) 1
: :
(9090, 5802) 1
(9090, 5968) 1
(9090, 7904) 1
(9090, 8324) 1
(9090, 8563) 1
(9090, 8579) 1
(9090, 8603) 1
(9090, 8617) 1
(9090, 8667) 1
(9090, 9159) 1
(9090, 9358) 1
(9090, 9372) 1
(9090, 9403) 1
(9090, 9624) 1
(9091, 774) 1
(9091, 1618) 1
(9091, 3741) 1
(9091, 4374) 1
(9091, 5058) 1
(9091, 5436) 1
(9091, 5975) 1
(9091, 7295) 1
(9091, 8324) 1
(9091, 8540) 1
(9091, 9702) 1

In this example all 9,092 valid tweets are transformed (the row indices run from 0 to 9091 because they are 0-indexed).
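What transform() returns is a SciPy sparse matrix, and each printed (row, column) pair above is one nonzero entry in it. A toy illustration of the same structure (not the workshop data):

```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
counts = cv.fit_transform(["good day", "good good grief"])

# Two documents, three vocabulary words: day=0, good=1, grief=2.
print(counts.shape)      # (2, 3)
print(counts.toarray())  # [[1 1 0]
                         #  [0 2 1]]
```

Storing only the nonzero entries is what makes a 9,092 x ~9,700 matrix of word counts practical to work with.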
In the next step we get to apply algorithms to our data. How do we decide which algorithm to use? One simple way is to use this cheat sheet: http://scikit-learn.org/stable/tutorial/machine_learning_map/

We'll use a classifier in this next step. Classification is like Shazam (the music discovery app): the app was told what songs to identify, and when it hears a song it tries to match it to one of them. In this example we'll train the program on what happy and sad look like, so that when it sees a new sentence it can try to figure out whether to smile or not.
In [103]:
# classifier.py

counts = count_vect.transform(fixed_text)
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb.fit(counts, fixed_target)

print nb.predict(count_vect.transform(["I love my iphone!!!"]))
print nb.predict(count_vect.transform(["I hate my iphone!!!"]))

['Positive emotion']
['Negative emotion']

You can see that we have fed in our target data as well as all of our token count data. Using the Naive Bayes algorithm we are able to make some predictions. But how do we know how well the algorithm is working?
In [97]:
predictions = nb.predict(counts)
print sum(predictions == fixed_target) / float(len(fixed_target))

0.795094588649

Here we see that almost 80% of the predictions we made are correct. That's pretty good, right? But we made a rookie mistake: we tested with the same data we trained on. That tells us little, since a model could just parrot back what it has already seen and get a 100% prediction rate. What we really want is to use our trained model on yet-unseen data. So let's do it again, but this time train on the first 6,000 rows and test on the rest (~3,000).
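As an aside, the cell below does this split by slicing rows manually. In the current scikit-learn API the same idea is usually expressed with train_test_split, which also shuffles so the test set isn't just the tail of the file (a sketch on synthetic data, not the tweets):

```python
from sklearn.model_selection import train_test_split

X = list(range(100))
y = [i % 2 for i in X]

# Hold out a third of the data for evaluation; random_state makes
# the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

print(len(X_train), len(X_test))  # 67 33
```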
In [107]:
nb.fit(counts[0:6000], fixed_target[0:6000])

predictions = nb.predict(counts[6000:])
print sum(predictions == fixed_target[6000:]) / float(len(fixed_target[6000:]))

0.611254851229

That's a much more honest estimate. But this number means more if we compare it to some baseline. Let's compare it to a simple dummy 'most frequent' classifier, which just blindly returns the most frequent label (in this case, "No emotion toward brand or product").
In [106]:
from sklearn.dummy import DummyClassifier

nb = DummyClassifier(strategy='most_frequent')
nb.fit(counts[0:6000], fixed_target[0:6000])
predictions = nb.predict(counts[6000:])

print sum(predictions == fixed_target[6000:]) / float(len(fixed_target[6000:]))

0.611254851229

So it turns out that on this split our Naive Bayes classifier scores exactly the same as a classifier that blindly predicts the most frequent label. We need a fairer way to compare them.

# Cross Validation

Cross validation gives us a more accurate gauge of accuracy. It partitions the data into a certain number of pieces, then runs many rounds, rotating which partitions are used for training and which are used for validation.
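The rotation can be seen directly with the KFold splitter (shown here with the current scikit-learn API on ten dummy samples; the post's own code uses the older sklearn.cross_validation module):

```python
import numpy as np
from sklearn.model_selection import KFold

# 10 samples split into 5 folds: each round holds out a different
# 2-sample slice for validation and trains on the remaining 8.
kf = KFold(n_splits=5)
for train_idx, test_idx in kf.split(np.arange(10)):
    print(list(test_idx))
# [0, 1]
# [2, 3]
# [4, 5]
# [6, 7]
# [8, 9]
```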
In [110]:
nb = MultinomialNB()
from sklearn import cross_validation
scores = cross_validation.cross_val_score(nb, counts, fixed_target, cv=10)
print scores
print scores.mean()

[ 0.65824176  0.63076923  0.60659341  0.60879121  0.64395604  0.68901099
0.70077008  0.66886689  0.65270121  0.62183021]
0.648153102333

In the above example we split the data into 10 pieces (cv=10) and use a KFold cross validator: in each round, one piece is used for validation while the other nine pieces are used for training. You can see the results of each round and the mean across all rounds. Once again, we'll run the same cross validation with the baseline 'most_frequent' classifier.
In [128]:
nb = DummyClassifier(strategy='most_frequent')
scores = cross_validation.cross_val_score(nb, counts, fixed_target, cv=10)
print scores
print scores.mean()

[ 0.59230769  0.59230769  0.59230769  0.59230769  0.59230769  0.59230769
0.5929593   0.5929593   0.59316428  0.59316428]
0.592609330138


# Pipelines

Pipelines are useful plumbing for chaining together multiple transformers. Notice that our code to create a CountVectorizer and apply Naive Bayes becomes much more compact:
In [138]:
from sklearn.pipeline import Pipeline

p = Pipeline(steps=[('counts', CountVectorizer()),
                    ('multinomialnb', MultinomialNB())])

p.fit(fixed_text, fixed_target)
print p.predict(["I love my iphone!"])

['Positive emotion']


# N-Grams

In the previous examples we've only built our vocabulary one word at a time. But there's a difference between someone saying "Great" and "Oh, great". To get more accurate results we can include both 1-gram and 2-gram combinations:
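A quick sketch of what ngram_range=(1, 2) actually adds to the vocabulary, on a one-sentence toy corpus (not the tweets):

```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(ngram_range=(1, 2))
cv.fit(["oh great just great"])

# Both single words and adjacent word pairs become features,
# so "oh great" is distinguishable from a bare "great".
print(sorted(cv.vocabulary_))
# ['great', 'great just', 'just', 'just great', 'oh', 'oh great']
```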
In [156]:
p = Pipeline(steps=[('counts', CountVectorizer(ngram_range=(1, 2))),
                    ('multinomialnb', MultinomialNB())])

p.fit(fixed_text, fixed_target)
print p.named_steps['counts'].vocabulary_.get(u'garage sale')
print p.named_steps['counts'].vocabulary_.get(u'like')
print len(p.named_steps['counts'].vocabulary_)

18967
28693
59616

Notice that the vocabulary is much larger than before.
In [140]:
scores = cross_validation.cross_val_score(p, fixed_text, fixed_target, cv=10)
print scores
print scores.mean()

[ 0.68351648  0.66593407  0.65384615  0.64725275  0.68021978  0.69120879
0.73267327  0.70517052  0.68026461  0.64829107]
0.678837748442

And our result using both 1-grams and 2-grams is a bit more accurate.

# Feature Selection

Feature selection means keeping only the features or attributes that are the most predictive, either to boost performance or to make results more accurate.
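Here's a minimal sketch of how SelectKBest with the chi-squared test picks features, on tiny made-up data (current Python 3 scikit-learn API, not the workshop code):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Six samples, three features; only the first feature tracks the label.
X = np.array([[5, 1, 1],
              [4, 0, 1],
              [6, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [0, 1, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

# Keep the k features with the highest chi-squared score vs. the label.
selector = SelectKBest(chi2, k=1)
X_new = selector.fit_transform(X, y)

print(selector.get_support())  # [ True False False]
print(X_new.shape)             # (6, 1)
```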
In [175]:
# feature_selection.py
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

p = Pipeline(steps=[('counts', CountVectorizer(ngram_range=(1, 2))),
                    ('feature_selection', SelectKBest(chi2, k=10000)),
                    ('multinomialnb', MultinomialNB())])

p.fit(fixed_text, fixed_target)

from sklearn import cross_validation

scores = cross_validation.cross_val_score(p, fixed_text, fixed_target, cv=10)
print scores
print scores.mean()

[ 0.67032967  0.66813187  0.62087912  0.64285714  0.64945055  0.67912088
0.67876788  0.6809681   0.66041896  0.63947078]
0.659039495078

In this case we kept only the 10,000 most predictive tokens. You can see that this actually lowered the accuracy.
In [177]:
p = Pipeline(steps=[('counts', CountVectorizer()),
                    ('feature_selection', SelectKBest(chi2)),
                    ('multinomialnb', MultinomialNB())])

parameters = {
    'counts__max_df': (0.5, 0.75, 1.0),
    'counts__min_df': (1, 2, 3),
    'counts__ngram_range': ((1, 1), (1, 2))
}

# GridSearchCV lived in sklearn.grid_search in this pre-0.18 version
# of scikit-learn (it has since moved to sklearn.model_selection)
from sklearn.grid_search import GridSearchCV

grid_search = GridSearchCV(p, parameters, n_jobs=1, verbose=1, cv=10)

grid_search.fit(fixed_text, fixed_target)

print("Best score: %0.3f" % grid_search.best_score_)
print("Best parameters set:")
best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))

Fitting 10 folds for each of 18 candidates, totalling 180 fits

[Parallel(n_jobs=1)]: Done 180 out of 180 | elapsed:  1.9min finished

Best score: 0.605
Best parameters set:
counts__max_df: 0.5
counts__min_df: 3
counts__ngram_range: (1, 1)

This last step shows how to do a grid search, which tries out all possible combinations of the given parameters and returns the combination that gives the best fit. In the example above there are 3 max_df options, 3 min_df options, and 2 ngram_range options; multiplying them together gives 3x3x2 = 18 candidates. All 18 are tried, and the best score and best parameters are given.
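The same candidate-counting logic can be seen on a small scale with the current scikit-learn API (a sketch on a toy corpus; the corpus, labels, and parameter values here are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus standing in for the tweet data.
docs = ["i love it", "love this thing", "love love love",
        "i hate it", "hate this thing", "hate hate hate"]
labels = [1, 1, 1, 0, 0, 0]

p = Pipeline([("counts", CountVectorizer()), ("nb", MultinomialNB())])

# 2 ngram_range options x 2 min_df options = 4 candidates,
# each scored with 2-fold cross validation.
params = {"counts__ngram_range": [(1, 1), (1, 2)],
          "counts__min_df": [1, 2]}

gs = GridSearchCV(p, params, cv=2)
gs.fit(docs, labels)

print(len(gs.cv_results_["params"]))  # 4
print(gs.best_params_)
```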