What I've Learnt Today: December 2011

Monday 19 December 2011

A little bit of physics

It's been a day of videos. I watched the BBC programme A Night with the Stars, in which Brian Cox talks about quantum physics (http://www.bbc.co.uk/iplayer/episode/b018nn7l/A_Night_with_the_Stars/). He explains why there is so much empty space in matter, how atoms bond and how dying stars form diamond planets. I think this link to a clip may be available outside the UK: http://www.bbc.co.uk/news/science-environment-16200089.

I also watched a Minute Physics video which explains that neutrinos are all left-handed and that there isn't a corresponding right-handed version.

Sunday 18 December 2011

Microbes

Between thinking about the AI exam, shopping and Java exercises, I've been watching the following video by the Open University about the wonders of the microbe world.

It briefly covers

yeast in beer making in ancient Egypt
the Black Death
preserving food
microbes fixing nitrogen in the soil by combining it with hydrogen or oxygen
microbes helping digest food
antibiotics produced by one microbe to kill other microbes
using E. coli to produce, for example, insulin and treatments for AIDS

Friday 16 December 2011

End of a Chapter

This year has certainly brought about a lot of changes for me. Today was another one. The end of a chapter.

From September 2003 I've taught a craft class every Friday morning from September until somewhere around April. Today was the last one. There have been ups and downs, and quite a few people have come and gone as circumstances changed. I've learnt a lot and developed my skills as the people in the class have developed theirs. I've been pushed to be creative and come up with new ideas for cards, and sometimes other projects, every week. It's been challenging but I'm now ready to move on to other things. I don't know quite what but I'm sure time will tell. From a craft point of view, I want to go in a slightly different direction and look into fabric and fibres instead of paper which is the medium I've mostly worked with.

Not so much learnt today

So these are the things I learnt today.

Natural Language Processing

Today I've been reading about the structure of languages in the book Speech and Language Processing by Daniel Jurafsky and James H. Martin, which is recommended for the Natural Language Processing course running in the New Year. Turkish has 40,000 possible forms of a verb, excluding derivational suffixes. This means that storing all the words in Turkish is impossible, as the derivational suffixes allow infinitely many words.

Java

Progress is slow, but at least I'm picking up bits and pieces. Today, I tried to write a loop using

for (int item : nums)
{
    item = 5;
}

to change the values in the nums array. This doesn't work, because item is only a copy of each element. It does the equivalent of

for (int i = 0; i < nums.length; i++)
{
    int item = nums[i];
    item = 5;
}

So it was back to the normal for loop

for (int i = 0; i < nums.length; i++)
{
    nums[i] = 5;
}

to do what I wanted to do.
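The difference between the two loops can be checked with a small test program. This is only a sketch: the class name ForEachDemo and the two helper methods are made up for illustration.

```java
import java.util.Arrays;

class ForEachDemo {
    // Tries to overwrite every element through the loop variable.
    // Only the local copy `item` changes; the array itself is untouched.
    static int[] fillWithForEach(int[] nums, int value) {
        for (int item : nums) {
            item = value; // assigns to the copy, not to the array element
        }
        return nums;
    }

    // Overwrites every element by index, which does change the array.
    static int[] fillWithIndex(int[] nums, int value) {
        for (int i = 0; i < nums.length; i++) {
            nums[i] = value;
        }
        return nums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillWithForEach(new int[] {1, 2, 3}, 5))); // [1, 2, 3]
        System.out.println(Arrays.toString(fillWithIndex(new int[] {1, 2, 3}, 5)));   // [5, 5, 5]
    }
}
```

The for-each form is still fine for reading values; it is only assignment to the loop variable that has no effect on the array.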

Wednesday 14 December 2011

The end draws near

Today, it's a bit harder to summarise exactly what I've learnt as it consisted of the material from 2 hours of lectures on machine learning and about an hour on language processing. I also learnt about how == works in Java between Strings and objects compared with primitive types. It might be a slow process, but hopefully I'm improving.
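The post doesn't record the details of the == lesson, so the following is just a sketch of the standard Java behaviour, with made-up class and method names: for primitive types == compares values, while for Strings and other objects it checks whether two variables refer to the same object, and equals() is what compares the contents.

```java
class EqualsDemo {
    // For primitives, == compares the values themselves.
    static boolean samePrimitive(int a, int b) {
        return a == b;
    }

    // For objects (including Strings), == asks whether both variables
    // refer to the very same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // equals() compares the characters the two Strings contain.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String first = new String("hello");
        String second = new String("hello"); // a separate object with the same text
        System.out.println(samePrimitive(5, 5));          // true
        System.out.println(sameReference(first, second)); // false
        System.out.println(sameContents(first, second));  // true
    }
}
```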

I feel a mixture of sadness and relief today as I've finished the second of the three free Stanford courses running this term. It's been interesting looking behind the scenes at how machine learning is being used all around us, from news sites recommending news we might be interested in, to variable pricing on websites depending on acceptance or rejection of previous offers, to predicting whether a tumour might be cancerous or not. The course was quite mathematical and the programming involved a high level of vectorisation so that built-in algebra libraries could be used. Today I finally understood why we used the summation sign rather than just working with the vectorised version all the time. It's because in order to split the work over multiple computers or cores, we give each computer/core part of the summation to do, then combine the results.
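The idea of splitting a summation can be sketched in Java as follows. The names are invented for illustration, and the chunks here are summed one after another in a single thread; in a real setup each call to partialSum would be handed to a different core or machine before the results are combined.

```java
class SplitSum {
    // Sums values[from..to), the share of the work one core would get.
    static double partialSum(double[] values, int from, int to) {
        double total = 0.0;
        for (int i = from; i < to; i++) {
            total += values[i];
        }
        return total;
    }

    // Splits the summation into `parts` chunks (parts must be at least 1),
    // sums each chunk separately, then combines the partial results.
    static double chunkedSum(double[] values, int parts) {
        int chunk = (values.length + parts - 1) / parts; // ceiling division
        double total = 0.0;
        for (int p = 0; p < parts; p++) {
            int from = Math.min(p * chunk, values.length);
            int to = Math.min(from + chunk, values.length);
            total += partialSum(values, from, to); // the combine step
        }
        return total;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(chunkedSum(values, 3)); // 55.0
    }
}
```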

It's really quite amazing what can be done, and is being done in this field. The course has given me quite a few "so that's how they do it" moments.

The course runs again from January for about 10 weeks. For details see http://jan2012.ml-class.org/. From looking at the other course dates, my guess is it will start around the last week in January.

Three things for today

Today I learnt that not only can you use compression to determine the language of a passage, but you can also use it to tell which of a series of photographs is the sharpest. A blurry image will compress more than a sharp one.
In Java, StringBuilder can be used instead of, and is preferable to, StringBuffer when not using multiple threads.
We have brains, first and foremost, to control movement. There is a creature called a sea squirt which as a juvenile swims around but then finds a place to anchor itself. Once it has done this, it digests its own brain and nervous system for food. A brain isn't needed when you're not going to move.

Photo: Wikimedia Commons, Nhobgood

For an interesting talk on the complexity of movement, and the purpose of the brain see the following TED talk.

Tuesday 13 December 2011

Gzip and Determining Language

While watching the last set of videos for the free Stanford Artificial Intelligence course, the following video completely blew me away. It was so completely unexpected.

This is the third thing I learnt on Monday. The Unix command gzip can be used to recognise which language a passage is written in. Suppose you have passages in different languages, and want to know which language a new passage is written in. What you can do is concatenate the new passage onto each of the other passages, compress each result and check which one has compressed the best. This method works because compression shortens the representation of common language patterns, such as "is " in English, to as little as a single byte. These patterns are different for each language, and thus the best compression should be achieved for the concatenation with the passage in the same language.
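A rough Java version of the same trick is sketched below, using the standard java.util.zip.GZIPOutputStream class in place of the gzip command. The class and method names and the two sample sentences are made up, and one choice here is an assumption rather than something from the video: "compressed the best" is measured as the number of extra bytes the new passage adds to each compressed sample.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipLanguageGuess {
    // Number of bytes the text occupies after gzip compression.
    static int compressedSize(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Extra bytes needed for the passage once the sample has been seen
    // (an assumed way of measuring how well the pair compresses).
    static int extraBytes(String sample, String passage) {
        return compressedSize(sample + passage) - compressedSize(sample);
    }

    // Index of the sample that the passage compresses best with.
    static int closestSample(String[] samples, String passage) {
        int best = 0;
        for (int i = 1; i < samples.length; i++) {
            if (extraBytes(samples[i], passage) < extraBytes(samples[best], passage)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample passages: index 0 is English, index 1 is Dutch.
        String[] samples = {
            "the quick brown fox jumps over the lazy dog and the cat is in the garden ",
            "de snelle bruine vos springt over de luie hond en de kat is in de tuin "
        };
        System.out.println(closestSample(samples, "the dog is in the house"));
    }
}
```

With samples this short the answer is not reliable; the method needs reasonably long passages in each language before the compression differences become meaningful.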

Finding Components in a Directed Graph

The second thing I've learnt today is about finding strongly connected components in a directed graph. A strongly connected component is one in which there exists a directed path between every pair of vertices in each direction.

The algorithm involves starting with an ordering (any ordering) of the vertices and running a depth first search on the graph with all arcs reversed, while keeping note of the position at which each vertex is removed from the stack. This position is then used to order the vertices in the original graph. A second depth first search is run, this time on the original graph with the new ordering of the vertices from largest to smallest. Whichever vertices are generated from the same leader are in the same component.

Example

For the directed graph above, I'll run through the algorithm. First we reverse all the arcs like this:

Starting at 1, we do a depth first search on the graph. From 1 we can go to 2 or 4, and we arbitrarily pick 4. From 4 we can go to 10 or 6 so we'll go for 10. From 10 we have no choice but to go to 7, then 8 and from there to 9 and then 3. From 3 we have no unvisited vertex to add so 3 pops off the stack and gets new label 1. I'll use f(3)=1 to denote the new label of vertex 3.

1->4->10->7->8->9->3 dead end so f(3)=1.
1->4->10->7->8->9 dead end so 9 pops off next giving vertex 9 the new label 2, that is, f(9)=2.
1->4->10->7->8 dead end so 8 pops off giving f(8)=3.
1->4->10->7 dead end so 7 pops off giving f(7)=4.
1->4->10 dead end so 10 pops off giving f(10)=5.
1->4 this time it isn't a dead end so we take the other option at 4 which is 6, then 5.
1->4->6->5 dead end since 1 has already been visited so 5 pops off giving f(5)=6.
1->4->6 dead end so f(6)=7.
1->4 dead end so f(4)=8.
1 not a dead end as we haven't yet visited 2.
1->2 but since 10 has already been visited this is a dead end. Thus f(2)=9.
1 dead end and finally 1 pops off giving f(1)=10.
Since all vertices have been visited, we're done with the first DFS. The original graph with the new ordering of vertices is shown below.

Starting at 10, we do a depth first search again, taking note of the leader vertex for each vertex, that is, the vertex which starts the path containing that vertex.

10->6->7->8. Now 8 pops off then 7 then 6 and finally 10. All of these have leader vertex 10 and thus are in the same strongly connected component as 10.
The next unvisited vertex is 9, so we start the next path with 9. There is no unvisited vertex accessible from 9 so it's in a component on its own.
Now we move on to 5.
5->1->2->3->4. Each of these vertices pops off, 4 then 3, 2, 1, 5. All of them have leader vertex 5 so are in the same strongly connected component.

All vertices have been visited so we are done with the second DFS. This leaves us with the following three components. Note that vertex 9 is in a component on its own and not the same component as 1,2,3,4,5.
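The two passes walked through above can be sketched in Java like this. It is a minimal version under a few assumptions: vertices are numbered from 0 rather than 1, the depth first searches are written recursively, and the class name, method names and five-vertex test graph are invented for illustration (it is not the graph from the example).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StronglyConnected {
    // Returns leader[v] for every vertex v of a graph with vertices 0..n-1
    // and directed arcs given as {from, to} pairs. Vertices with the same
    // leader are in the same strongly connected component.
    static int[] leaders(int n, int[][] arcs) {
        List<List<Integer>> forward = new ArrayList<>();
        List<List<Integer>> reversed = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            forward.add(new ArrayList<>());
            reversed.add(new ArrayList<>());
        }
        for (int[] arc : arcs) {
            forward.get(arc[0]).add(arc[1]);
            reversed.get(arc[1]).add(arc[0]);
        }

        // First pass: DFS on the reversed graph, noting the order in which
        // the vertices finish (pop off the stack).
        List<Integer> finishOrder = new ArrayList<>();
        boolean[] visited = new boolean[n];
        for (int v = 0; v < n; v++) {
            if (!visited[v]) {
                firstPass(v, reversed, visited, finishOrder);
            }
        }

        // Second pass: DFS on the original graph, starting from the vertex
        // that finished last and working back to the one that finished first.
        int[] leader = new int[n];
        Arrays.fill(leader, -1); // -1 means not yet visited
        for (int i = n - 1; i >= 0; i--) {
            int start = finishOrder.get(i);
            if (leader[start] == -1) {
                secondPass(start, start, forward, leader);
            }
        }
        return leader;
    }

    static void firstPass(int v, List<List<Integer>> graph, boolean[] visited,
                          List<Integer> finishOrder) {
        visited[v] = true;
        for (int next : graph.get(v)) {
            if (!visited[next]) {
                firstPass(next, graph, visited, finishOrder);
            }
        }
        finishOrder.add(v); // v is a dead end now, so it pops off
    }

    static void secondPass(int v, int start, List<List<Integer>> graph, int[] leader) {
        leader[v] = start;
        for (int next : graph.get(v)) {
            if (leader[next] == -1) {
                secondPass(next, start, graph, leader);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical graph: a cycle 0->1->2->0, an arc 2->3, and a cycle 3->4->3.
        int[][] arcs = {{0, 1}, {1, 2}, {2, 0}, {2, 3}, {3, 4}, {4, 3}};
        System.out.println(Arrays.toString(leaders(5, arcs))); // [0, 0, 0, 3, 3]
    }
}
```

For large graphs the recursion could overflow the call stack, so an explicit stack would be the safer choice there.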

Source
CS 161 - Design and Analysis of Algorithms Lectures 6

Monday 12 December 2011

Kaprekar's Constant

I've been watching the University of Nottingham videos by Brady Haran on physics and chemistry for a while now, but I hadn't come across this channel, Numberphile, before. It is, rather unsurprisingly, about numbers.

So the first thing I've learnt today is Kaprekar's constant. Take any four digit number, excluding numbers where all the digits are the same (0000, 1111, 2222, etc). Make the largest number and the smallest number out of these digits. Subtract the smaller from the larger and repeat the process until it converges, that is, until you get the same number again. That number will always be 6174.

For example, take 2395. Rearrange the digits to make 9532 and 2359.
9532-2359 = 7173, and repeat
7731-1377 = 6354
6543-3456 = 3087
8730-0378 = 8352
8532-2358 = 6174
7641-1467 = 6174, and as we had this on the line above, we'll always get 6174 so the process has converged.
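The process is easy to try out in code. Here is a small Java sketch (the class and method names are made up) that performs one subtraction step and repeats it until the number stops changing.

```java
import java.util.Arrays;

class Kaprekar {
    // One step: the largest arrangement of the four digits minus the smallest.
    static int step(int number) {
        char[] digits = String.format("%04d", number).toCharArray(); // keep leading zeros
        Arrays.sort(digits); // ascending order spells the smallest number
        String ascending = new String(digits);
        int smallest = Integer.parseInt(ascending);
        int largest = Integer.parseInt(new StringBuilder(ascending).reverse().toString());
        return largest - smallest;
    }

    // Repeats the step until the number stops changing and returns that number.
    static int converge(int number) {
        int next = step(number);
        while (next != number) {
            number = next;
            next = step(number);
        }
        return number;
    }

    public static void main(String[] args) {
        System.out.println(step(2395));     // 7173
        System.out.println(converge(2395)); // 6174
    }
}
```

For the excluded numbers with four identical digits, the first subtraction gives 0 and the loop simply stops there.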

For more details, and an expansion to numbers with a different number of digits, see http://plus.maths.org/content/mysterious-number-6174.

Sunday 11 December 2011

It's been a while...

The last three months have been quite busy for me with three online courses to study. I've had a wonderful time learning lots of new things. When I started I thought I was going to struggle and not really be able to manage them. I've found that I could do more than I expected. I've found the three courses quite different.

Introduction to Databases

When I watched the first lecture on databases I thought I was crazy and didn't know what was going on. I had to pause and rewind the video multiple times. I felt completely in over my head. After a very shaky start, it became my favourite of the three courses. There were many additional exercises that meant I could really get my teeth into the material. I did everything. It brought back the joy and excitement of learning which made university such a wonderful time for me. I've missed the intellectual challenge.

Machine Learning

This was by far the most mathematical of the courses which meant that I could follow fairly easily. The programming exercises in Octave were frustrating at times as I seem to have a great ability to make the program crash. My poor laptop found it a little tough at times to process everything such as a load of spam emails to predict whether another one was spam or not. It's been great to see the sort of techniques used behind the scenes to analyse spam, categorise news or predict whether someone has a tumour or not.

Artificial Intelligence

This has been the most frustrating and yet most interesting course. I think if I'd watched the course from a casual point of view, I would have loved it. There have been hiccups over precision in asking questions which have led to confusion. The lecturers are enthusiastic and you can really see how keen they are to explain their subject. It's a shame about the problems but the potential for a great course is there.

What's Next?

I'm trying to learn Java over Christmas so that I can take part in some more courses from January to April. A list of the available courses can be found at the bottom of this page: http://www.anatomy-class.org/. There are 16 courses at the moment, one of which is the machine learning course mentioned above. I don't know which ones I will do yet as they all sound interesting. I think I might just watch some of the courses without being active since I am sure my coding will not be good enough by then and it's also impossible for me to do 15 courses at once.

Java

So, in order to do these courses I've been working my way through http://codingbat.com/java which I'm finding quite helpful. It's a basic enough level that I can work quite quickly while learning some of the classes. I'll need to find something else later on to drag myself up to a higher level but I'll look for that later. I did start off with http://docs.oracle.com/javase/tutorial/java/TOC.html but it went too fast for me so I've paused that for a while and will come back to it later.

Regex

I've started reading one of the recommended books for Natural Language Processing and it looks like I need to get to grips with regular expressions. I've been intending to learn this for several years so finally I'm going to do it.

Other Aims

I have quite a few other things I want to get done next year, and I've started to collect them together at a new site called Schemer. The following link is an invitation to the site: http://schemer.com/invite/9vidub2r7rqn6 (It's good for 20 invitations.) You can see other people's goals as well so there is lots of inspiration. Some of the things I'm hoping to do are

become proficient at sewing - I have a sewing machine (well several) but I am not really confident in using it. I'd like to be able to alter clothes, make clothes and do patchwork on it.
make a patchwork quilt
finish some of the many half done projects I have
learn some yoga
learn three new things a day
learn some more Dutch
learn to recognise the constellations of the stars
improve my knowledge of physics
cook my way through a cookery book
walk a few miles every week
grow some vegetables

I'd rather have too many aims than too few - more chances to succeed! I don't know how I'm going to fit everything in but it's going to be fun anyway.