Showing posts from 2013

Creating a simple TCP server in Java and using it from Python – to use the Stanford POS tagger

With advanced network protocols flooding the market, we forget how easy it is to create a simple server and a client when we want to integrate two different platforms – Java and Python in this case. The standard TCP socket library is available on practically every platform, so connecting to a simple TCP server, or actually creating one, takes just a couple of lines even in C. If we need to, we can build more complex protocols and use more complex document formats, but it's important to remember that we don't need them all the time.
POS tagging
For one of my projects, I needed to use the Stanford POS tagger to parse a large text corpus. Even though there are Python POS taggers, my favourite one is by far the Java-based Stanford implementation. Usually I use it directly from Java, but in this case the input file was a bit tricky to parse and Python did it very well, so I just wanted to do the POS tagging in Java and everything else in Python. My first thought was to create a file base
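As a rough illustration of how little code a plain TCP exchange needs, here is a minimal Python sketch. It does not reproduce the post's Java server or the Stanford POS tagger: the upper-casing handler stands in for the real work, and the one-line-per-request protocol and the names `UpperCaseHandler`, `ask`, and `demo` are assumptions made up for this example.

```python
import socket
import socketserver
import threading


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Hypothetical stand-in for the tagger: reads one line, replies with it upper-cased."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def ask(host, port, text):
    """Client side: send one line of text and return the one-line reply."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        with conn.makefile("rb") as reply:
            return reply.readline().decode("utf-8").rstrip("\n")


def demo():
    """Start the server on a free local port, send one request, then shut down."""
    with socketserver.TCPServer(("127.0.0.1", 0), UpperCaseHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        try:
            return ask(host, port, "hello tagger")
        finally:
            server.shutdown()
```

The client half (`ask`) is the part that would talk to a server written in another language, as long as both sides agree on the line-based framing assumed here.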

Drawing a circle and calculating the sine function without trigonometry, power, or root

The formula of a circle is quite straightforward ( r=sqrt(x^2+y^2) ) but it's not trivial how to draw a circle or calculate the trigonometric functions without advanced math. However, an interesting finding from 1972 makes it really easy. Minsky discovered (by mistake) that the following loop will draw an almost perfect circle on the screen:
loop:
    x = x - epsilon * y
    y = y + epsilon * x # NOTE: the x is the new x value here
Turns out that epsilon in the equation is practically the rotation angle in radians, so the above loop will gradually rotate the x and y coordinates in a circle.
Calculating sine
If we can draw this circle, we can easily estimate the sine values: for the current angle, which is basically the sum of epsilons so far, we have a height (y), which just needs to be normalized by the radius of the circle to get the actual sine for the angle. The smaller the steps (epsilon) are, the more accurate the formula will be. However, because it's not a p
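The loop and the sine estimate described above can be sketched in Python as follows. The starting point (1, 0), the default step size, and the function names `minsky_circle` and `estimate_sine` are assumptions chosen for illustration; starting on the unit circle means the height y needs no further normalization.

```python
import math


def minsky_circle(epsilon, steps, x=1.0, y=0.0):
    """Return the points visited by Minsky's loop, starting from (x, y)."""
    points = []
    for _ in range(steps):
        x = x - epsilon * y
        y = y + epsilon * x  # deliberately uses the already updated x
        points.append((x, y))
    return points


def estimate_sine(angle, epsilon=0.001):
    """Estimate sin(angle) for a non-negative angle given in radians.

    Each step rotates by roughly epsilon radians, so about
    angle / epsilon steps are needed to reach the requested angle.
    """
    steps = round(angle / epsilon)
    if steps == 0:
        return 0.0
    _, y = minsky_circle(epsilon, steps)[-1]
    return y


if __name__ == "__main__":
    for angle in (0.5, 1.0, math.pi / 2):
        print(angle, estimate_sine(angle), math.sin(angle))
```

Only addition, subtraction, and multiplication appear inside the loop; `math.sin` is used in the demo purely as a reference value to compare against.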

Dutch national flag problem - performance in Python and C (two pivot partition, three way partition)

One of the typical interview questions is three-way partitioning, also known as the Dutch national flag problem: given an array with three different values, sort it so that all equal values are grouped together (like a three-coloured flag) in linear time without extra memory. The problem was first described by Edsger Dijkstra. As it's a typical sorting problem, any general-purpose sorting algorithm would do, but a comparison sort needs O(n*log(n)) time. There are two good solutions to it:
- do a counting sort, which requires two passes over the array;
- place the elements in their correct location using two pointers.
The latter uses a reader, a lower, and an upper pointer to do the sorting. The reader and lower pointers start from one end of the array, the upper pointer starts at the other end. The algorithm goes like this:
- If it's a small value, swap it with the lower pointer and step lower pointer one up
- If it's a middle value, do nothing with it and step reader pointer one up
- If it's a larger value, swap
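A minimal Python sketch of the pointer-based approach follows. The excerpt is cut off before the third rule is finished, so the handling of large values below follows the standard textbook formulation (swap with the upper pointer and move that pointer down), not necessarily the post's exact code. The function name and the `mid` parameter are assumptions.

```python
def dutch_flag_partition(items, mid):
    """Group items in place into values < mid, == mid, and > mid.

    Runs in a single pass and uses no extra memory beyond three indices.
    """
    lower = 0               # next slot for a small value
    reader = 0              # element currently being inspected
    upper = len(items) - 1  # next slot for a large value

    while reader <= upper:
        value = items[reader]
        if value < mid:
            # Small value: move it to the front block.
            items[lower], items[reader] = items[reader], items[lower]
            lower += 1
            reader += 1
        elif value > mid:
            # Large value: move it to the back block. The reader stays put
            # because the element swapped in has not been inspected yet.
            items[reader], items[upper] = items[upper], items[reader]
            upper -= 1
        else:
            # Middle value: already in the right region.
            reader += 1
    return items
```

With exactly three distinct values, the partitioned array is also fully sorted, which makes the result easy to check against the built-in `sorted`.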

Introsort algorithm in .NET 4.5 – internal implementation details

Even though introspective sort was invented in 1997 (by David Musser) and was adopted by many frameworks not too long after, Microsoft decided to add it to the .NET platform only in 2012. However, it is worth having a look at the implementation, it’s quite smartly written.
Introsort
In practice quicksort is usually the fastest general-purpose sorting algorithm, but it’s got a couple of edge cases where it performs really poorly – in O(n^2) time instead of the typical O(n*log(n)) runtime. Even though a smart pivot selection makes it practically impossible to hit the O(n^2) runtime by accident, “evil” input can still force the sorting into it, making the running time much longer than expected. Musser had two ideas to speed up the typical quicksort algorithm:
- If the algorithm on any sub-array runs longer than expected (recursion is deeper than log(array size) ) then let’s switch to heapsort which is guaranteed to finish in O(n*log(n)) time
- If the number of elements is small enough don’t use quicksort, simply
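The two ideas can be sketched in Python as follows. This is not the .NET implementation. The excerpt is cut off before it names the small-array fallback, so insertion sort (the usual choice for introsort) is an assumption, as are the threshold of 16 elements, the depth limit of 2 * log2(n), the median-of-three pivot, and all function names. The heapsort fallback uses the standard `heapq` module on a copy of the slice for brevity, so unlike a real introsort it is not strictly in place.

```python
import heapq
import math

SMALL_SIZE = 16  # assumed threshold below which insertion sort takes over


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place."""
    for i in range(lo + 1, hi + 1):
        value = a[i]
        j = i - 1
        while j >= lo and a[j] > value:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = value


def heap_sort(a, lo, hi):
    """Sort a[lo..hi] via a binary heap; guaranteed O(n*log(n))."""
    heap = a[lo:hi + 1]
    heapq.heapify(heap)
    for i in range(lo, hi + 1):
        a[i] = heapq.heappop(heap)


def partition(a, lo, hi):
    """Lomuto partition around a median-of-three pivot; returns the pivot index."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    a[mid], a[hi] = a[hi], a[mid]  # park the median at the end as the pivot
    pivot = a[hi]
    store = lo
    for i in range(lo, hi):
        if a[i] < pivot:
            a[store], a[i] = a[i], a[store]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    return store


def introsort_range(a, lo, hi, depth_limit):
    """Quicksort a[lo..hi], falling back to heapsort or insertion sort."""
    while hi - lo + 1 > SMALL_SIZE:
        if depth_limit == 0:
            heap_sort(a, lo, hi)  # recursion got too deep: bail out
            return
        depth_limit -= 1
        p = partition(a, lo, hi)
        introsort_range(a, p + 1, hi, depth_limit)
        hi = p - 1  # continue with the left part in this loop
    insertion_sort(a, lo, hi)


def introsort(a):
    """Sort the list in place and return it."""
    if len(a) > 1:
        introsort_range(a, 0, len(a) - 1, 2 * int(math.log2(len(a))))
    return a
```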

Async-Await in .NET is not for performance – it’s for readability

While I see LINQ on the .NET platform as one of the best language features, if not the best, I think the async-await idea is more smoke and mirrors than anything else. Actually, it’s even worse: as the underlying operations are not trivial, in some cases we are better off using linear code.
Idea based on desktop apps
The whole async-await idea mostly comes from the cumbersomeness of desktop GUI applications and UI threads: only a single thread can modify the UI, so when a long-running process finishes we need to ask the UI thread to display the result for us, because the operation itself cannot update the UI. It’s not a language problem; it’s a Windows/framework problem. However, Microsoft decided to “solve” it with a language feature, because:
- Concurrent programming is a hot topic and Microsoft wanted to “be there”
- C# language evolves fast so adding a new feature is quite easy
While the latter is a great thing (hey Java, can you hear me?), adding a language feature b

Damn cool machines – Stirling engine, ram pump, and valveless pulsejet engine

Once in a while it is worth looking around to see what kind of software or, in this case, hardware solutions were invented that did not get enough attention. This post is about three hardware designs that amaze me.
Stirling engine
The Otto engine is by far the most successful engine we know these days, but it requires a special kind of fuel like gasoline or gas to operate. In some cases such fuel is not available but a cheap heat source is, like burning solid materials or focusing solar energy with mirrors. In this case a Stirling engine can convert heat into mechanical movement in an easy way. The idea is based on the rules of thermodynamics: the temperature of the gas is always proportional to its pressure multiplied by its volume. This basically means that if we heat up the gas, either it expands or the pressure needs to rise. Similarly, if we decrease the pressure, the volume has to rise or the temperature has to drop (or both). The Stirling engine is basically a two cylinder
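The relation the paragraph leans on is the ideal gas law. Written out, with n the amount of gas and R the gas constant (symbols that do not appear in the original text), it reads:

```latex
P V = n R T
\quad\Longrightarrow\quad
T = \frac{P V}{n R}
```

For a sealed cylinder n is fixed, so T is proportional to the product PV: heating the gas forces the pressure, the volume, or both to rise, which is the effect the engine turns into movement.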

Hosted Git (GitLab) in 5 minutes on Azure Virtual Machine

Finally Microsoft realized they are not the only software vendor out there, so they started to support other suppliers on their cloud platform, Windows Azure. Apart from the stock images from Microsoft, BitNami publishes their own high quality virtual machines as well – mostly based on some kind of Linux system. One of their images is a completely set up and ready to use Git source control solution using GitLab. This could really save a lot of time for us, as the installation guide seems to be a little longer than anyone would prefer spending on setting up Git. To install the image, simply browse VM Depot, select the GitLab image and create a new virtual machine from it. Microsoft went so far with supporting Linux systems that we can even enter our own username and password for the box during setup, no need to use any default logins on the fresh machine. GitLab setup isn’t particularly resource hungry, so we are perfectly fine with the extra small (A0) instance to host our git/wik

Languages and databases - a step back a day

I've started to learn the Go language recently and to be frank I'm horrified. Horrified by what the future of programming looks like, and the NoSQL movement is just making it worse.
Languages - a step back
If you haven't read any Fortran code yet, do so by all means. It's a language that was invented in 1957 when 'computers' were practically non-existent. However, if you look at it more than 50 years later it's still readable and not really ugly - especially compared to some C++ code... Then we had Python, Ruby, Java, C# and all sorts of modern languages. In fact, we have a new language born almost every day (have you heard about Rust for instance?). The only issue with them is that the programming model hasn't really changed in the past 50 years. We write text files, line by line, instructing the computer what to do step after step. The processors didn't really change much either unfortunately. And the horrifying part? Google comes along,

The next big thing on the web – custom HTML elements

If you are either an HTML component vendor or ever wanted to use someone else's HTML module, you have already faced the same issue: components may not work together, and even if they do, they put a lot of noise in your pure HTML structure. This is going to change very soon. Large component vendors like Kendo or Telerik are creating amazing HTML components that are, in theory, just plug and play: whatever framework you are working with, it should be really easy to use those enhanced modules. However, even if they work as intended, they change and litter the HTML DOM, so any kind of debugging can be a bit of a pain.
Noisy HTML
The problem is that even if we start with nice and clean HTML code that looks like this: <select id="size"> <option>S - 6 3/4"</option> <option>M - 7 1/4"</option> <option>L - 7 1/8"</option> <opti

Simulating crowds and animal swarms

If you've ever looked at an ant farm I'm sure you were surprised by how organised the ants were. But the truth is they all follow a very simple logic which “magically” adds up to a large scale organised colony. How do they do that? Swarm behaviour is not really observable on one entity but very visible on the large scale. Interestingly it works with us humans as well: we don't think about how we board a train or plane, but a simulated environment can predict very well how we will move and how efficient the crowd movement will be.
Boids
Studying crowd movement is not new by any means: Craig Reynolds in 1986 came up with three simple rules that can describe large swarm movement quite accurately. He named the entities “boids” after birds (it's supposed to be the New York accent style bird?). The three rules are:
- separation : the entities prefer to keep some distance from each other
- alignment : try to go the same way as the group or the local flock-mates
- cohesion : t
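The three rules can be sketched as a single update step in Python. The excerpt is cut off in the middle of the cohesion rule, so the code follows the usual description of Reynolds' model (steer towards the average position of nearby flock-mates). The tuple layout, the neighbourhood radius, the minimum distance, and the rule weights are all arbitrary values assumed for illustration.

```python
import math


def step(boids, radius=5.0, min_dist=1.0,
         w_separation=0.05, w_alignment=0.05, w_cohesion=0.01):
    """Advance every boid by one tick. Each boid is a tuple (x, y, vx, vy)."""
    updated = []
    for i, (x, y, vx, vy) in enumerate(boids):
        # Only flock-mates within `radius` influence this boid.
        neighbours = [b for j, b in enumerate(boids)
                      if j != i and math.hypot(b[0] - x, b[1] - y) < radius]
        ax = ay = 0.0
        if neighbours:
            n = len(neighbours)
            # Cohesion: steer towards the neighbours' average position.
            ax += w_cohesion * (sum(b[0] for b in neighbours) / n - x)
            ay += w_cohesion * (sum(b[1] for b in neighbours) / n - y)
            # Alignment: steer towards the neighbours' average velocity.
            ax += w_alignment * (sum(b[2] for b in neighbours) / n - vx)
            ay += w_alignment * (sum(b[3] for b in neighbours) / n - vy)
            # Separation: push away from neighbours that are too close.
            for bx, by, _, _ in neighbours:
                if math.hypot(bx - x, by - y) < min_dist:
                    ax += w_separation * (x - bx)
                    ay += w_separation * (y - by)
        vx, vy = vx + ax, vy + ay
        updated.append((x + vx, y + vy, vx, vy))
    return updated
```

Calling `step` repeatedly on a list of boids produces the flock movement; each boid only looks at its local neighbours, which is what lets the large-scale pattern emerge from simple rules.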

Ubuntu loses network connection after do-release-upgrade - Ignoring unknown interface eth0=eth0

Over the weekend I decided to upgrade one of my Linux servers, running an Ubuntu desktop. The Ubuntu do-release-upgrade tool is pretty reliable; even over SSH the whole process completed without any errors (try that from Windows 2003 - 2008). The only issue was that after the reboot the machine disappeared from the net. I had to hack into the console to have a look at what was going on. Turns out my eth0 interface wasn't configured, so it didn't get any IP addresses:
> ifup eth0
 * Reconfiguring network interfaces...
Ignoring unknown interface eth0=eth0.
The fix was pretty easy, I just had to edit the /etc/network/interfaces file and try again:
auto lo
iface lo inet loopback
# For DHCP only
auto eth0
# Static IP assignment
iface eth0 inet static
address
netmask
gateway
# Google's DNS - fast and reliable
dns-nameservers
ifup eth0 was much friendlier this time, the box was back on the net again.

Static constructors and sealed classes - how to make your code untestable

In almost all programming languages it is possible to put some application logic in a static constructor. However, it's usually up to the platform to decide when to execute the code we put there - usually it happens when the environment runs a piece of code that refers to that class. Recently I was reading through the Windows Azure Training Kit (written by Microsoft) and this is a piece of code from the kit:
public class GuestBookDataSource
{
    ...
    static GuestBookDataSource()
    {
        storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString"));
        CloudTableClient cloudTableClient = storageAccount.CreateCloudTableClient();
        CloudTable table = cloudTableClient.GetTableReference("GuestBookEntry");
        table.CreateIfNotExists();
    }
    ...
}
This code will open or create a cloud storage table before accessing anything around it. As the code runs only once, it seems like an efficient way to do the initialization of the

One of my projects uses a self-hosted executable that accepts incoming connections on port 80. The package has an embedded Jetty so I didn't need to deal with any low level TCP stuff. That is the good part. The bad part is that the hosting environment (Ubuntu Linux), for security reasons, by default does not allow a non-root user to use any TCP port below 1024 - so the first free-for-all port is 1024.
One easy solution would be to set up Nginx and forward the traffic it receives on port 80 to my self-hosted app on a different port. Quite easy to set up, but why waste a network hop when I didn't really need any feature Nginx could offer?
Installing authbind
Luckily I'm not the only one with this kind of

Are we developers the romance writers of the science world?

As developers we tend to do repetitive, sometimes tedious labour every day. We stand at a scrum/kanban board every morning pretending the problems we are facing are interesting and hard, and that we are the only solution to them. Maybe we think we are the smartest in our team. Maybe everyone in that team thinks he is the smartest. Anyhow, the job we have to deliver as an everyday developer relies roughly on the same technologies. Whether we like it or not, most of our work is based on someone else's work, someone else's idea or methodology. It's so rare that we, as everyday developers, can add something brand new that wasn't there before. And no, an HTML5 progress bar is not all that new. And no, creating an iPhone app that uses Facebook login is not new either. But if we are not really in the invention part of the game, surely the people at the big fancy IT companies are, right?
What developers at Google/Amazon/eBay/Microsoft/

Using Vim on Mac OS - basic settings

In every programmer's life comes a moment when they need to deal with Vim somehow - some people just need to close it once accidentally opened, some people need to start using it for work. For a long while I could get away without using it, but then I had to start editing Octave scripts and a simple notepad did not seem to be too useful for that.
Ancient roots
Vim was released 21 years ago so most of us can argue that it is an ancient editor. An editor from an era when even Java hadn't been invented (it was released in 1995). This reason alone could invalidate its usage as so many things have changed since then. Well, things may not be that different. We are still using some kind of text editor to write source code and the problems around code writing are roughly the same: quickly edit and reformat specially formatted texts. At first glance Vim is an editor that does not even support a mouse, and even deleting a line is troublesome - not to mention exiting the editor. However