Archives for posts with tag: programming

control a computer with text messages

I did a video a while back about using Siri, the virtual assistant on the iPhone, to control my computer and the lights in my room. The ability to control things with a phone is a great concept, but the main problem with remote control solutions like SiriProxy is that not everyone has an iPhone 4S. Also, most people who do have the new iPhone don’t have the technical knowledge to set up a proxy server on their computer. This realization led me to write a Perl script that allows you to text commands to your computer through a Google Voice number, which is free and accessible to everyone.

A while back, I came across a Perl module called Google::Voice that uses LWP (the Library for WWW in Perl) to connect to Google Voice. The module allows you to integrate the power of Voice with Perl, my favorite scripting language. For those who have never heard of Google Voice, it is a service operated by Google that gives you a free phone number, the ability to make free calls and send free text messages, and the ability to unify all your other phone numbers under it. Google Voice can do a lot of great things, and it is, by far, one of my favorite services that Google offers.

A while back, I started developing a Perl script that integrates the Google::Voice module into a basic message parser. Over time, the message parser got increasingly advanced, and it has reached the point where it can control iTunes and my lights, and is easily extensible to other things as well. The script, named TextDaemon (daemon is the operating-system term for a program that runs constantly in the background), binds to a Google Voice account and pulls any text messages sent to that account. Here’s how to get it set up.
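
To give a feel for the message-parser idea, here is a rough sketch in Python. It is not TextDaemon itself (which is a Perl script built on Google::Voice), and every command name and handler below is a hypothetical stand-in. It assumes each incoming text arrives as a plain string and that the first word picks the action:

```python
# Sketch of a text-command dispatcher. All command names and handlers are
# hypothetical stand-ins, not TextDaemon's actual commands.

def itunes(args):
    # A real handler would talk to iTunes; this one only describes the action.
    return "itunes: " + (" ".join(args) or "status")

def lights(args):
    # Default to "toggle" when the text does not say on or off.
    state = args[0] if args else "toggle"
    return "lights: " + state

# Adding a new controllable thing means adding one entry here.
HANDLERS = {"itunes": itunes, "lights": lights}

def handle_message(text):
    """Parse one incoming text message and run the matching handler."""
    words = text.strip().lower().split()
    if not words:
        return "empty message"
    command, args = words[0], words[1:]
    handler = HANDLERS.get(command)
    if handler is None:
        return "unknown command: " + command
    return handler(args)
```

Keeping the handlers in a lookup table is one simple way to get the "easily extensible" property: the parser itself never has to change when a new device is added.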

Read the rest of this entry »


Generating ASCII art in Mathematica

My favorite set of tools in Mathematica is the image processing functionality, for good reason. Image processing in Mathematica can be used to find Waldo or control a robotic turret. Here’s another neat example, where the imaging functions are used to generate ASCII art.

The function ASCIIimage generates an ASCII art version of a regular image that is passed into the function. ASCII images are just images created with text, where lines of text seen from afar create the illusion of an image.

The function can be applied to any image – this example imports an image from a URL and passes it into the function.

It’s hard to see from the image, but each line in that image is actually a string of characters. The image processing function works by first converting the image to grayscale, then comparing each pixel’s intensity against a set of thresholds. It then replaces each possible pixel value with a string of two characters, which are chosen based on how much of the page they cover. For example, a light gray pixel is converted to a “.:”, while a dark gray pixel is converted to a “pq”. After tweaking the mapping from pixel intensity to text, a pretty good set of thresholds can be found that accurately represents the spectrum of grays.
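
The grayscale-to-text step can be sketched outside Mathematica too. The Python below is only an illustration of the idea, not the ASCIIimage function: it assumes the image has already been converted to a 2D list of grayscale values from 0 (black) to 255 (white), and the character ramp is made up except for the “.:” and “pq” pairs mentioned above.

```python
# Sketch: map grayscale intensities (0 = black, 255 = white) to character
# pairs. The ramp is an assumed mapping; only ".:" and "pq" come from the post.

RAMP = ["  ", ".:", "+=", "pq", "@#"]  # lightest to darkest

def pixel_to_chars(value):
    """Quantize one 0-255 intensity into a two-character string."""
    value = max(0, min(255, value))      # clamp out-of-range input
    darkness = 255 - value               # darker pixels get denser characters
    index = darkness * len(RAMP) // 256  # evenly spaced thresholds
    return RAMP[index]

def ascii_image(pixels):
    """Convert a 2D list of grayscale values into a list of text lines."""
    return ["".join(pixel_to_chars(v) for v in row) for row in pixels]
```

Printing the result with `print("\n".join(ascii_image(pixels)))` gives the text version. Tweaking the mapping, as described above, amounts to editing `RAMP` or replacing the evenly spaced thresholds with hand-picked ones.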

Here’s an example image which shows what the underlying text looks like for another sample image.

If we look at these same strings of random text from afar, we can see something a bit more beautiful.


how to make a universal remote with an Arduino

Yes, I know you can just buy a universal remote at RadioShack. It would probably look a hell of a lot better than my contraption. But can a universal remote control those awkward low-tech remotes that rely on a line-of-sight IR beam? What about all those remotes you own where all you really do is press the on/off button? A universal remote is great, but at the end of the day, it’s still a remote. Remotes get lost. Remotes like to hide in the last crevice you would think of checking.

An Arduino, on the other hand, is a wonderful little device that you can control from your computer. In this post, I’ll talk about hooking up an Arduino to relays, which opens up a world of possibilities. It’s a significantly cheaper alternative to a universal remote – my Arduino setup costs a grand total of $37, not including the remotes that I stripped down and attached to it. The cool part about the Arduino remote is that I can use the built-in voice recognition on OS X (reference to my previous post on Jarvis) to trigger a serial message to the Arduino whenever I say “projector on” or “screen down”. This is a great project if you’re trying to remotely control a device, be it a coffee maker, remote control, or pretty much anything that you’re willing to take apart.
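
On the computer side, the phrase-to-serial step might look something like the Python sketch below. The one-byte protocol is purely an assumption for illustration (the post does not say what the serial messages contain), and an in-memory buffer stands in for the real serial port. With the pyserial package installed, a `serial.Serial` object could be passed in instead, since it also has a `write` method.

```python
import io

# Hypothetical one-byte protocol: each spoken phrase maps to a byte that the
# Arduino sketch would interpret as "trigger this relay".
COMMANDS = {
    "projector on": b"A",
    "projector off": b"B",
    "screen down": b"C",
    "screen up": b"D",
}

def send_command(phrase, port):
    """Write the byte for a recognized phrase to a serial-like object.

    Returns True if the phrase was recognized and sent, False otherwise.
    """
    code = COMMANDS.get(phrase.strip().lower())
    if code is None:
        return False
    port.write(code)
    return True

if __name__ == "__main__":
    fake_port = io.BytesIO()  # stand-in for a real serial connection
    send_command("screen down", fake_port)
    print(fake_port.getvalue())
```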

Read the rest of this entry »

Adding custom commands to Siri on the iPhone 4S

When I first heard about Siri, my initial reaction was to make a snide comparison to The Machine Stops (a fantastic short story written in 1909 by E. M. Forster that basically predicted the present day, where humans are overly and inescapably reliant on technology). Siri allows you to use your voice to accomplish all the basic functions of a phone, and some pretty advanced ones too – calling, texting, emailing, reading, and looking up information. However, it can’t really have too great of an impact on your life aside from what your phone can accomplish – that is, until a developer by the name of “plamoni” released SiriProxy, a way for developers to add custom commands to the phrases that Siri will respond to. As an enthusiast for home automation and voice control, I quickly found that Siri could be very useful in a situation like mine, where I have something like Jarvis to control my lights. Check out the video of the proxy in action:

How it works

Siri works by recording your voice, encoding it, and sending it over the Internet to servers run by Apple. The server does the voice recognition and returns the result to your iPhone. A while back, the awesome developer(s) that go by applidium released a bunch of Ruby scripts for reverse-engineering Siri. I won’t go into too much detail about what they did – for the curious soul, check out the GitHub link. This paved the way for SiriProxy, a user-extensible proxy server written by plamoni (another awesome developer on GitHub) that can intercept traffic to and from Siri and use it for, well, anything.

In my case, a while back I wrote a shell script that could control my lights through a web power switch. All I had to do was create a simple plugin for the proxy server that called that script, and I was good to go. If you don’t feel like writing a bunch of Ruby scripts and modifying Gemfiles, you can just grab plugins from GitHub and load them into your proxy. The number of plugins is steadily increasing – a quick GitHub search shows plugins to control a thermostat, remotely control an iTunes library, and even turn on a car!

Read the rest of this entry »

Symbolic programming of Arduino devices in Mathematica

To give you a better idea of what this is, I thought I’d preface this post with a video I took of my robotic turret pointing a laser pointer. At about 1:45 in the video I start talking about the technical details, so if that doesn’t interest you, skip ahead to the rest of the post.

What is that Arduino contraption?

An Arduino is a cheap, powerful tool for prototyping electronics quickly and efficiently. Arduinos can be used for any number of things, from simple circuits that light LEDs and do simple on/off actions, to robotic turrets and full-scale robots that can move, talk, and think. They are the Lego NXT brick on steroids, and they are typically programmed in C. They come in a variety of sizes, shapes, and capabilities, from the ATtiny, a $2.50 IC, to the Mega, with over 60 I/O ports. The board I use in the video is called the Uno, and it’s just the right size and has enough port capacity for pretty much any application.

Arduino microcontrollers are essentially cheap, simple ways to integrate electronic control into everyday things. Jarvis could easily be created with an Arduino board hooked up to a bunch of relays that turn AC power lines on and off – you’d still need the voice recognition on the computer, but an Arduino would be a cheap replacement (about $80 cheaper) for the expensive web power switch. The DIY website has thousands of Arduino how-to’s, from making gardening robots to 3D LED cubes. The best part about using Arduino boards with electronic hardware is that everyone is doing it, so there is a huge community to provide ideas and help to novices and experts alike.

How were you controlling them in Mathematica?

The video is a demonstration of a package that I developed for Mathematica called ArduinoLink. I can’t release too many details about the package just yet, but what I can say is that it will soon be available (for free) to all Mathematica users. It uses symbolic code generation, context prediction, and low-level backend functions to vastly simplify programming and communication with Arduino microcontrollers. The entire video demonstration above was programmed with about 10 lines of Mathematica code.

Read the rest of this entry »

a first look at natural language translation

I embarked on an arduous project a while back, to program a translator to convert English sentences to their Swahili equivalents. I would later realize that I had chosen to make a translator for one of the most difficult languages to form, both syntactically and semantically. In Swahili, nouns, verbs, adjectives, and sentence formation are all modified by the context of the sentence. Prefixes, suffixes, and infixes can all be added to words to modify them, and almost all of the conjugation rules have various irregular exceptions.

Take, for example, the simple English sentence “I am going to the store.” In Swahili, the equivalent sentence is “Mimi ninaenda dukani.” The subject, “I”, literally translates to “mimi” in Swahili. The verb, however, translates to “ninaenda”, a conjugation of the infinitive “kuenda”. Verbs generally have two prefixes attached to them – the subject prefix, which in this case would be “ni” (I), and a tense prefix, which in this case is “na” (present tense). The noun “store” translates to “duka”, but it becomes “dukani” because the subject is going to a place. Unlike English, with its simple conjugation rules, Swahili has distinct forms that pack much more context into a single word.

Also, there is no equivalent to “the”, or for that matter, the words “a” and “an”. If I want to say “I have a book”, I say “Mimi nina kitabu.” If I want to say “I have books”, it is “Mimi nina vitabu.” The noun changes depending on whether it is singular or plural, and it modifies other words according to the “noun class” that it belongs to (more on this below).
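
The regular patterns in these examples can be written down in a few lines. The Python sketch below is only an illustration, not the translator itself (which is written in Mathematica and backed by a database). Only “ni”, “na”, “kuenda”, “dukani”, and “kitabu/vitabu” come from the examples above; the other prefixes in the tables are standard Swahili forms added to fill out the tables, and the function names are made up. It deliberately ignores every irregular exception.

```python
# Sketch of the regular Swahili patterns from the examples above.
# Irregular verbs and the other noun classes are not handled.

SUBJECT_PREFIX = {"I": "ni", "you": "u", "he/she": "a", "we": "tu", "they": "wa"}
TENSE_PREFIX = {"present": "na", "past": "li", "future": "ta"}

def conjugate(subject, tense, infinitive):
    """Build subject prefix + tense prefix + verb stem, e.g. ni + na + enda."""
    # The infinitive marker "ku" is dropped to get the stem.
    stem = infinitive[2:] if infinitive.startswith("ku") else infinitive
    return SUBJECT_PREFIX[subject] + TENSE_PREFIX[tense] + stem

def locative(noun):
    """Mark a noun as a place someone is going to or is at: duka -> dukani."""
    return noun + "ni"

def pluralize_ki_vi(noun):
    """Pluralize a noun of the ki-/vi- class: kitabu -> vitabu."""
    if not noun.startswith("ki"):
        raise ValueError("not a ki-/vi- class noun: " + noun)
    return "vi" + noun[2:]
```

For instance, `conjugate("I", "present", "kuenda")` gives “ninaenda”, and `locative("duka")` gives “dukani”, which together with “Mimi” rebuild the sentence above.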

The rules of Swahili are complex, and the grammar is very different from that of English. However, with an HSQL database and a recursive descent parser, I was able to parse subject-predicate sentences and return a fairly accurate Swahili equivalent. Since this project is still in progress (and nowhere near completion), I’ve made a project page so that anyone interested in this can track its progress. I’ll periodically be uploading the source code through GitHub, so you can pull it from there if you have Mathematica and want to try it out.

Read the rest of this entry »

This is part 2 of my Where’s Waldo algorithm. Here’s part 1.

So after numerous tweaks to my Where’s Waldo algorithm, I finally settled on one that takes threshold values in order to locate the elusive striped character of Waldo.

This is essentially the same idea as what I did in my last post, with the added parameter of threshold values and a revised algorithm.

With this function, I tested out various Waldo puzzles. For example,

Read the rest of this entry »

Almost everyone can remember a time when they had their eyes fixed on one of these:

… a “Where’s Waldo” puzzle, where one has to find the brightly clad “Waldo”. Waldo wears a striped red shirt, and is primarily clad in red clothing. When your eye scans across this image, your brain is looking at regions of the image and looking for the distinguishing red-striped pattern of Waldo.

But let’s say you had to find Waldo in less than 2 seconds. Obviously, such a situation is incredibly unlikely to exist, but for the sake of experimentation we’ll consider it. The defining characteristics of Waldo could quite easily be encapsulated in a pair of image processing functions: one that looks for stripey things, and one that looks for red things. Defining an automated parser that could look for these characteristics would allow us to highlight regions of the image where Waldo is more likely to be.

Why would one do this? Well, with great computational abilities comes great ideas.
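
As a toy version of the “red things” and “stripey things” detectors, here is a Python sketch. It is not the Mathematica algorithm from this series: it assumes the image is a 2D list of (r, g, b) tuples, the color thresholds are guesses, and “stripey” is reduced to counting red/white alternations going down a single column.

```python
# Toy detectors for "red" and "stripey". The thresholds are assumed values,
# not the ones used in the actual Waldo algorithm.

def is_red(pixel):
    r, g, b = pixel
    return r > 150 and g < 100 and b < 100

def is_white(pixel):
    return min(pixel) > 200

def stripe_score(image, x):
    """Count adjacent red/white changes going down column x."""
    labels = []
    for row in image:
        pixel = row[x]
        if is_red(pixel):
            labels.append("R")
        elif is_white(pixel):
            labels.append("W")
        else:
            labels.append("O")  # anything else breaks the stripe pattern
    return sum(1 for a, b in zip(labels, labels[1:]) if {a, b} == {"R", "W"})

def best_column(image):
    """Return the index of the column with the most red/white alternations."""
    width = len(image[0])
    return max(range(width), key=lambda x: stripe_score(image, x))
```

Columns (or, in a fuller version, small windows) with a high score are the regions worth highlighting. A real puzzle would also need some tolerance for stripe thickness and shading.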

Read the rest of this entry »

I read a very interesting math riddle recently, which inspired this blog post.

Let’s say I give you a fair coin and offer you a bet on the outcome of its flips. If I bet on heads and you bet on tails, we have an equal chance of winning or losing. If I bet on the sequence heads-tails, and you bet on heads-heads, we both have a 1/4 chance of winning and a 3/4 chance of losing. The pattern continues from there – every combination of heads and tails is as likely to win as any other combination of the same length. However, as the combinations get longer, the probability that any particular one occurs shrinks exponentially.

But what if I place a bet on which sequence will appear first? If I bet on heads-heads, and you bet on tails-heads, we keep flipping the coin until one of our sequences shows up. What is the probability of each of us winning now?

If you are thinking 50-50, you are wrong. Once we enter the realm of sequences, we are no longer dealing with a series of independent events: each flip affects how likely a certain outcome becomes. Let’s observe heads-heads (denoted HH) versus tails-heads (denoted TH), and look at all the possible sequences of flips (generated in Mathematica by executing With[{length = 4}, Map[StringJoin, Tuples[{"H", "T"}, length]]]).

  • 2 flips: {HH, HT, TH, TT} -> HH wins on 1 sequence, TH wins on 1.
  • 3 flips: {HHH,HHT,HTH,HTT,THH,THT,TTH,TTT} -> Some of these games are already over after 2 flips (HHH and HHT went to HH, THH and THT went to TH). Of the rest, TH wins 2 more (HTH, TTH), and HH wins none.
  • 4 flips: {HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT, THHH, THHT, THTH, THTT, TTHH, TTHT, TTTH, TTTT} -> Excluding games that are already over, TH wins 2 more (HTTH, TTTH), and HH again wins none.

As you can see, it isn’t a 50-50 game: some sequences are more likely to show up first than others. Of the 16 equally likely outcomes of 4 flips, HH has won 4 (everything that starts with HH), TH has won 10, and 2 (HTTT and TTTT) are still undecided. Both undecided games end in a tail, so whenever the next head arrives, TH wins them too. So, you are 3 times more likely to win if you pick TH than if you pick HH (75% odds are in your favor). These odds are specific to HH vs. TH; other pairs of sequences work out differently. We’ll look at where the probabilities converge to later in this post.
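
For anyone without Mathematica, the same brute-force count can be done in Python. This is a sketch rather than the code used for this post. The function names are made up, and it assumes the two patterns are different and have the same length, so they can never complete on the same flip.

```python
from itertools import product

def first_winner(flips, a, b):
    """Return whichever of patterns a and b appears first in flips, else None."""
    for end in range(1, len(flips) + 1):
        seen = flips[:end]  # the game as it looks after `end` flips
        if seen.endswith(a):
            return a
        if seen.endswith(b):
            return b
    return None

def tally(n, a, b):
    """Over all 2**n equally likely sequences of n flips, count who has won."""
    counts = {a: 0, b: 0, None: 0}  # None = nobody has won yet
    for flips in product("HT", repeat=n):
        counts[first_winner("".join(flips), a, b)] += 1
    return counts
```

Calling `tally(4, "HH", "TH")` returns 4 wins for HH, 10 for TH, and 2 undecided games out of 16. Raising n leaves the undecided count at 2 while both win counts keep doubling (up to those 2 games), so the split heads toward 1/4 and 3/4. The same function works for longer pairs such as “HHHTHTH” vs. “HTHHTHH”, although the number of sequences doubles with every extra flip.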

But what about “HHHTHTH” vs. “HTHHTHH”?

This nifty little coin trick is something that people instinctively believe is a 50-50 chance. So if you’re ever down on your luck, strike a bet for HH vs. TH on a series of coin flips and take TH: the Law of Large Numbers says that over enough games, you will come out ahead. This post should allow readers of this blog to exploit the general population that doesn’t read this blog, but in how many ways?

Read the rest of this entry »

Depth and Image Processing with the Kinect

After writing my last post, I got down to business on the Kinect and on processing the Kinect output. The Java wrapper made it pretty easy to input the video and depth readout into a byte buffer, and ofxkinect let me verify that after processing, the Kinect actually does have the ability to output a 3D image. However, a personal desire to experiment and the lack of a robust processing library led me to work on my own methods to process the Kinect camera input. (Correction: Daniel Shiffman released a great open source Processing library for OS X.)

Point cloud image of myself – ofxkinect debug screenshot

Image and depth processing

I want to first say that the way I processed input is almost certainly not correct. If you look at my screenshots below, you can see for yourself how weird the RGB input, and even more so the depth input, looks. Whether this is a processing problem or a problem in the way I read in the information is anyone’s guess. I will soon be implementing Shiffman’s library to take care of the processing, although admittedly it was pretty interesting to do it myself first.

Read the rest of this entry »