Sunday, June 9, 2013

JIT compiling

Java is frikin fast now and in some cases it can be faster than C/C++. A large part of this is due to the Just-in-Time (JIT) compiler. As the name suggests, the Java JIT compiler does not compile all of the bytecode at once. Instead it analyzes the code while it is running and figures out which chunks of code are executed frequently. Then it goes ahead and compiles and optimizes that code. Because it doesn't need to spend resources compiling code that is not run often, it can devote them to compiling and optimizing the stuff that matters.
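Here is a rough sketch of how you might watch this happen. The class and method names are made up for illustration, and the timings depend entirely on your JVM and machine, so treat it as an experiment rather than a benchmark. The idea is that the same small method is called in several batches; on a typical HotSpot JVM the later batches tend to run faster once the method has been identified as hot and compiled.

```java
// Hypothetical demo class; all names here are illustrative only.
class JitWarmupDemo {
    // A small method that becomes "hot" when it is called many times.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Calls the hot method repeatedly and returns the elapsed nanoseconds.
    static long timeBatch(int calls, int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int c = 0; c < calls; c++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the work cannot be discarded as dead code.
        if (sink == 42) {
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Later batches are usually faster than the first one, but this is not guaranteed.
        for (int batch = 1; batch <= 5; batch++) {
            System.out.println("batch " + batch + ": " + timeBatch(10_000, 1_000) + " ns");
        }
    }
}
```

Running it with the HotSpot option -XX:+PrintCompilation should also log methods as they get compiled, which makes the "compile only what is hot" behavior visible.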

Why doesn't Python use curly braces (for blocks of code)?

If you've used Python at all you know that it uses indentation to denote blocks instead of curly braces. There are a couple of reasons for this, but it boils down to the Don't Repeat Yourself (DRY) principle.

How is using braces repeating yourself?

Well, you've probably worked in a language that relies on braces for scope, since most of the common languages do this (Java, C, C++, JavaScript, PHP, etc.). When working with one of these languages, have you ever had to refactor legacy code that wasn't indented properly? Although the code has braces, we can't easily figure out what is going on. Where does one method end and the other one begin? The first thing you probably did was use the auto-indent feature in your trusty IDE so you could at least get a high-level idea of the format of the code.

So it ends up that those curly braces didn't help you much. And they shouldn't because they really aren't for you. They are for the compiler. You see blocks of code in terms of indentation. The compiler interprets blocks of code based on braces. There's no reason to have both.
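To make that concrete, here is a small Java sketch (the class and method names are invented for this post) where the indentation and the compiler's view of the block disagree. A human reading the indentation sees two statements inside the if; the compiler only sees one.

```java
// Hypothetical example; the names are illustrative and not from any real codebase.
class MisleadingIndentation {
    // Without braces, only the first statement belongs to the if.
    // The second increment is indented like part of the block, but the
    // compiler ignores indentation and always executes it.
    static int countWhen(boolean flag) {
        int count = 0;
        if (flag)
            count++;
            count++; // looks conditional to a human, is unconditional to the compiler
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWhen(true));  // prints 2
        System.out.println(countWhen(false)); // prints 1
    }
}
```

With only one way to mark a block, the human view and the compiler view cannot drift apart like this.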

Monday, May 27, 2013

Scala Highlights

I went to a Scala meetup which was about introducing features of Scala to Java developers. The speaker was Dr. Venkat Subramaniam and his talk was very informative, easy to follow, and engaging. He showed examples of how to do common tasks in Java and then showed how it is better with Scala.

It was recorded so I'll link to the video when it gets posted. Some of my notes:

Scala is hybrid functional - you have the option of functional programming or Java-like imperative programming. Functional programming is beautiful as we want to focus on telling the computer what to do instead of exactly how to do it.

Scala is more statically typed than Java - the Scala compiler does type inference so you don't need to be explicit about specifying the type (although there are circumstances when the compiler cannot infer the type). This inference all happens at compile time.
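The talk's examples were in Scala, but the "what versus how" distinction can be sketched in plain Java too, assuming Java 8 or later for streams. The class and method names below are my own and were not part of the talk. Both methods compute the same thing; the first spells out the loop, the second just describes the result.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example; names are illustrative only.
class DeclarativeVsImperative {
    // Imperative: says exactly how to walk the list and accumulate.
    static int sumOfEvenSquaresImperative(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {
                total += n * n;
            }
        }
        return total;
    }

    // Functional style: says what we want (even numbers, squared, summed).
    static int sumOfEvenSquaresFunctional(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .mapToInt(n -> n * n)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfEvenSquaresImperative(numbers)); // prints 20
        System.out.println(sumOfEvenSquaresFunctional(numbers)); // prints 20
    }
}
```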

The concept of ceremony - ceremony is basically all the extra code/things you need to write in order to do what you want. This includes writing getters and setters and boilerplate code. Java has high ceremony -- imagine trying to explain "hello world" in Java to a newbie. Explaining each of the parts of "public static void main(String args[])" would take quite a while. Scala automatically creates classes, main methods, getters/setters so you can write less code. Whenever you need to use the IDE to generate code, it is usually a language smell.

From the Java book "Effective Java", everything should be explicitly declared final if the reference is never going to change. It is very easy to find all the places where you put final but very hard to find all the places where you forgot to put final. This is addressed in Scala with the val keyword so that one has to think about immutability when declaring. Also, method parameters are always immutable in Scala. Because Effective Java was such a huge part in improving the quality of Java code, there are many examples of Scala taking concepts from that book and implementing them directly as part of the language.
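As a small illustration of the Java side of this, here is a sketch of an immutable class (Person is a made-up example, not something from the talk). Every final has to be typed by hand, and nothing warns you if you forget one. In Scala the rough equivalent is the one-liner class Person(val name: String).

```java
// Hypothetical class for illustration only.
final class Person {
    // Forgetting "final" here still compiles; the field just silently becomes mutable.
    private final String name;

    // Parameters are reassignable in Java unless marked final explicitly.
    Person(final String name) {
        this.name = name;
    }

    // The getter is part of the ceremony: written (or IDE-generated) by hand.
    String getName() {
        return name;
    }

    public static void main(String[] args) {
        Person person = new Person("Ada");
        System.out.println(person.getName()); // prints Ada
    }
}
```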

Scala IDE/Eclipse has a cool REPL called Scala worksheet that helps quickly evaluate and test code.

Sunday, May 26, 2013

Piping to diff

The usual use of the Linux diff program is to diff two files:
diff file1.txt file2.txt

But what if I want to diff one file against the output of another program instead of a second file? I could send the output of the program to a file and then diff the two files, but there is a much more efficient way:

./program | diff file1.txt -

This will diff the output of program with file1.txt. The - tells diff to read from standard input.

Now what if I don't have any files but just want to diff the outputs of two programs?

The solution is process substitution (supported by shells such as bash):
diff <(./command1) <(./command2)

Much nicer than creating intermediate files.

Friday, May 24, 2013

Design Pattern: Facade

 Let's say that I want to make a pb&j sandwich. I have bread, peanut butter and jelly. Now I can take two slices of bread, put peanut butter on one slice and jelly on the other. Then I can put the two slices together. Great.

Now let's look at how we might do this in code. Here we have some PBJ sandwich related classes.
 public class PeanutButter implements Spreadable {...}
 public class Jelly implements Spreadable {...}
 public class Bread {...}
 public class Knife {...}
 public class Jar {...}

So a client could make a sandwich like so:
 public static void main(String[] args) {
     Bread slice1 = new Bread();
     Bread slice2 = new Bread();
     PeanutButter pb = new PeanutButter();
     Jelly jelly = new Jelly();
     // ...spread pb on slice1 and jelly on slice2, then put the slices together
 }

But that's a little too detailed for me. I don't care about all the complexities. As a client, all I know is that I am hungry and want a sandwich. So let's provide a facade:
 public class PBJSandwich {
   public void make() {...}
 }

 public static void main(String[] args) {
    PBJSandwich sandwich = new PBJSandwich();
    sandwich.make();
 }

The make method is a facade because it hides underlying complexities and provides a simplified interface for the client.
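For anyone who wants something that actually compiles, here is one minimal way the pieces could fit together. The method names (name, spread, toppings, layers) are my own assumptions, and I left out Knife and Jar to keep it short, so treat it as a sketch of the pattern rather than the definitive design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fleshed-out version of the sandwich classes; method names are assumptions.
interface Spreadable {
    String name();
}

class PeanutButter implements Spreadable {
    public String name() { return "peanut butter"; }
}

class Jelly implements Spreadable {
    public String name() { return "jelly"; }
}

class Bread {
    private final List<String> toppings = new ArrayList<>();

    void spread(Spreadable spread) { toppings.add(spread.name()); }

    List<String> toppings() { return toppings; }
}

// The facade: the client calls make() and never touches Bread, PeanutButter, or Jelly.
class PBJSandwich {
    private final List<String> layers = new ArrayList<>();

    void make() {
        layers.clear(); // start fresh so calling make() twice does not stack sandwiches
        Bread slice1 = new Bread();
        Bread slice2 = new Bread();
        slice1.spread(new PeanutButter());
        slice2.spread(new Jelly());
        // Put the two slices together, spreads facing inward.
        layers.add("bread");
        layers.addAll(slice1.toppings());
        layers.addAll(slice2.toppings());
        layers.add("bread");
    }

    List<String> layers() { return layers; }
}

class FacadeDemo {
    public static void main(String[] args) {
        PBJSandwich sandwich = new PBJSandwich();
        sandwich.make();
        System.out.println(sandwich.layers()); // prints [bread, peanut butter, jelly, bread]
    }
}
```

The client code in FacadeDemo stays two lines long no matter how complicated the sandwich-making steps become.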

Thursday, May 23, 2013

HTTP Preflighting

I was aware of the same origin policy, but am new to the concept of preflighting. Basically, when making cross-domain AJAX requests (meaning requests to domains that are different from the one the JavaScript was served from), the browser does a preflight check first to make sure that it is okay to make the original request.

What this means is that first it does an HTTP OPTIONS request. Along with the regular headers it will include these special ones referring to the original request:

Access-Control-Request-Method: POST
Access-Control-Request-Headers: X-PINGOTHER

The server can then look at these headers and respond with whether the request is allowed:

Access-Control-Allow-Origin: http://foo.example
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Allow-Headers: X-PINGOTHER
Access-Control-Max-Age: 1728000

This response says to allow the POST with the header X-PINGOTHER only from the http://foo.example origin.

What I noticed is that if any extra non-standard headers are sent in the original/preflighted request that are not allowed by the server, the POST will not be sent.
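The decision being made here can be modeled in a few lines of Java. This is a simplified sketch under my own assumptions (the class name, the hard-coded allow lists, and the isAllowed method are all invented), not a full CORS implementation, but it shows why one extra header the server does not allow is enough to stop the POST.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified model of a preflight decision; not a complete CORS implementation.
class PreflightCheck {
    static final String ALLOWED_ORIGIN = "http://foo.example";
    static final Set<String> ALLOWED_METHODS = new HashSet<>(Arrays.asList("POST", "GET", "OPTIONS"));
    // Header names are compared case-insensitively, so store them in lower case.
    static final Set<String> ALLOWED_HEADERS = new HashSet<>(Arrays.asList("x-pingother"));

    // origin:         value of the Origin header
    // requestMethod:  value of Access-Control-Request-Method
    // requestHeaders: comma-separated value of Access-Control-Request-Headers
    static boolean isAllowed(String origin, String requestMethod, String requestHeaders) {
        if (!ALLOWED_ORIGIN.equals(origin)) {
            return false;
        }
        if (!ALLOWED_METHODS.contains(requestMethod)) {
            return false;
        }
        for (String header : requestHeaders.split(",")) {
            String name = header.trim().toLowerCase(Locale.ROOT);
            // A single requested header that is not on the allow list fails the whole preflight.
            if (!name.isEmpty() && !ALLOWED_HEADERS.contains(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("http://foo.example", "POST", "X-PINGOTHER"));          // prints true
        System.out.println(isAllowed("http://foo.example", "POST", "X-PINGOTHER, X-Extra")); // prints false
        System.out.println(isAllowed("http://bar.example", "POST", "X-PINGOTHER"));          // prints false
    }
}
```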
More detailed information can be found here:

Friday, May 3, 2013

What is Data Science?

I'm taking the Introduction to Data Science course on Coursera so I'll post a few tidbits on some things I learn over the next many weeks.

First topic, what is Data Science?

Well, the term is quite fuzzy so it might depend on who you ask, but here's Drew Conway's Data Science Diagram - one commonly referred to when describing data science.
Since a lot of data is electronic nowadays, you need to be able to somewhat speak the language. You do not need to be a CS major or programmer, but more specific skills for working with data are important, from using the command line to put a text file in the right format to programming in R.

The substantive expertise part of it means being able to explore, discover, and create hypotheses and tests. Basically, ask and find the right questions and answers.

Conway points out the danger zone because this is the part where people "know enough to be dangerous". Without grounded statistics, one might misinterpret data (when doing data science). Thinking about it, I think the danger zone might be called "Computer Science".

The difference between data science and business intelligence is that in business intelligence, a data warehouse is often created to do specific analysis and answer particular questions, which takes a lot of effort up front to build. This is usually more specific than data science, and BI is not as adaptable when requirements change. In short, BI is about building a particular tool to answer particular questions, whereas data science is more general. It was also noted that a lot of the time the BI engineers do not consume or do analysis on the system they build.

Friday, April 26, 2013

Courage is not the absence of fear

Julian Assange from WikiLeaks and Google CEO Eric Schmidt got together to talk about various topics of technology and information. A great listen.

The audio and transcript are here:

A couple brilliant nuggets from Julian:

"I mean, people often say, you are tremendously courageous in doing what you are doing, and I say, no no you misunderstand what courage is. Courage is not the absence of fear. Only fools have no fear. Rather courage is the intellectual mastery of fear by understanding the true risks and opportunities of the situation. And in keeping these things in balance. And not simply having prejudice about what the risks are. But actually testing them. There are all sorts of myths that go around about what can be done and what cannot be done. It is important to test. You don't test by jumping off a bridge. You test by jumping off a footstool, and then jumping off something a bit higher and a bit higher."

"and because we all only live once, we all suffer the continuous risk of not having lived our life well. Every year. Every year that is not used is 100% wasted, it's not a risk of that, it is a dead bet."