Fun with nokogiri: Screen-scraping ikea.com

I recently discovered that ikea.com is really scraping-friendly: the categories and the products belonging to each category are linked very cleanly, and the product pages themselves include, as JSON, the complete product data necessary to display a product somewhere else ( say, on an iPad ). They also link the assembly instructions as PDF, and more fun stuff. Just open any product page and browse the source; you will discover a field named “jProductData”, which is just what you want.

So I decided it’s time for me to try nokogiri, a beautiful and fast library for parsing HTML and searching through it. I managed to write a working scraper in only a few lines. Since the JSON is embedded in regular JavaScript, I had to use rkelly to parse the JavaScript part and extract the right data.
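To illustrate the extraction step, here is a rough stdlib-only sketch. It deliberately does not use nokogiri or rkelly: it finds the `jProductData` assignment with a search and cuts out the object literal by counting braces. The sample HTML and its fields are made up, the helper name `extract_embedded_json` is my own, and the whole thing only works when the embedded value is strict JSON.

```ruby
require 'json'

# Made-up stand-in for a product page; the real markup will differ.
SAMPLE_HTML = <<~HTML
  <html><body>
  <script type="text/javascript">
    var jProductData = {"product": {"name": "Example shelf", "note": "brace } inside a string"}};
    var other = 1;
  </script>
  </body></html>
HTML

# Returns the parsed object assigned to `var_name`, or nil if it is not found.
# Assumes the assigned value is strict JSON (double-quoted keys and strings).
def extract_embedded_json(html, var_name)
  start = html.index(/\b#{Regexp.escape(var_name)}\s*=\s*\{/)
  return nil unless start

  open_at = html.index('{', start)
  depth = 0
  in_string = false
  escaped = false

  html[open_at..-1].each_char.with_index do |ch, i|
    if in_string
      # Inside a string literal, braces do not count.
      if escaped
        escaped = false
      elsif ch == '\\'
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == '{'
      depth += 1
    elsif ch == '}'
      depth -= 1
      # The matching closing brace ends the literal; parse exactly that slice.
      return JSON.parse(html[open_at, i + 1]) if depth.zero?
    end
  end
  nil
end

data = extract_embedded_json(SAMPLE_HTML, 'jProductData')
puts data['product']['name']
```

A real JavaScript parser like rkelly is the more robust choice as soon as the embedded value is not plain JSON ( single quotes, unquoted keys, comments ).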

For the data model, I used ohm with redis, which takes next to no time to set up and works as advertised.

If you are curious, the source is available at github.

Flattr, a great idea, a great video, great people behind

Flattr is exactly what I dreamed of when I wrote my post about the donation-button dilemma on open-source projects. Flattr, by the creators of The Pirate Bay, is a service that lets you specify an amount you want to spend monthly ( I don’t know what they plan, but let’s say 10€ ). You can then flattr any project you like, very much like digging, and each of them will get a share of your money. Of course, this only works because the money adds up.

There are some unknown variables to me so far. I don’t know how much of the money actually reaches the designated receiver, how privacy and security are ensured, how money can be transferred to Flattr and so on. Yet this idea opens up new horizons for people who want to invest time in software they don’t mean to sell, but still be able to live a life worth living.

I mentioned the video, it’s a well-made one, see below

( note: I’m using the logo without permission, hope the pirate-bay guys won’t mind that. )

impressive, prezi and other ways to kill PowerPoint

I have to admit, I haven’t used PowerPoint for 4 years, mostly because no one wanted me to present anything. Times have changed: I had to do 4 presentations in one week lately. Being forced to communicate through slides, I experimented a bit, and here’s my conclusion.

The first presentation was built using Keynote, which is part of iWork, Apple’s Office counterpart. Creation was really simple, the look is just amazing ( especially the transitions ), but there is one problem: I don’t have a mobile Mac at the moment, so I needed to use a friend’s MacBook to do the presentation. This is definitely a bit of a pain, considering the fact that he’s most likely not always around, and exporting to PowerPoint kills all the USPs. There is a workaround though, you can export presentations as .mov files, movies that halt on every transition and continue only after you press a key.

Second presentation: LaTeX Beamer producing PDFs. The ugly part is the creation, certainly. Unless you are fluent with a text editor, the command line and the reading of technical documentation, this might not be your first choice. After you’ve gracefully managed to create some slides, the output looks great, and the best point about it is that any netbook will do just fine for the presentation. I used my Ubuntu-equipped machine and it worked well using the default PDF viewer.

Sven showed me impressive today, a great way of tuning your PDF-only presentations by adding transitions, overview pages and other handy accessories to finished PDF presentations. It’s written in Python, has some dependencies ( nothing that easy_install couldn’t fix ) and works really well.

The third presentation kicked off using Google Docs. The presentation ( about Google Go ) had no style, no transitions, was built in about 10 minutes and.. worked. Nothing more, nothing less. It clearly did its job, yet there are, at least by now, a bunch of more pleasing ways to get the job done.

Because a lot of people are really excited about Prezi at the moment, I decided to give it a shot. The presentations it creates are really different from what presentations have been in the past: it looks.. amazing, is easy to use, and features a free plan. It depends on Flash, which is somewhat of a drawback, and it is, at least for me, not a choice at the moment. Why? Well, first of all, there is no way to create custom styles, you are basically limited to what Prezi offers. The next problem is the lack of a free offline editor ( Google Docs is able to do that ). And of course, what you create looks great, but either I missed the feature that says “copy region from one Prezi to another” or it’s not there. Either way, without it, reuse seems impossible. Still, if you have only 3 minutes and need to deliver a stunning performance, you should go with Prezi. ( Edit: Benny just posted something about Prezi a few days ago, too )

iType Demo Video online – finally getting real

iType, my project, is finally getting close to release state. But talking is boring, so here is a short demo clip. If you have any questions, please write to itypeapp @ gmail.com

I am looking for beta testers, so if you want to test it, please let me know!

iType Demo Video from Moritz Haarmann on Vimeo.

Google Docs Privacy, Latex knowledge, project update

Hey! I haven’t been writing here in a while, so let’s have some updates.

First of all, I noticed something TechCrunch also wrote about some time ago: Google doesn’t protect embedded images in private documents in _any_ way. So basically, everything you upload there can be treated as public.

Next thing: LaTeX, the typesetting system, seems to be stuck in a time when character encodings were something rather bizarre, and for that reason has its share of problems with them. There are inputenc options, like utf8 and utf8x, to work around this issue, but it’s not perfect. What I experienced was a strange error message telling me that

Package inputenc Error: Unicode char u8:  not set up for use with LaTeX.

And my way to get around it was to simply add the package ucs alongside utf8 in the preamble..

\usepackage{ucs}
\usepackage[utf8]{inputenc}

and everything is working fine now. great.

Last topic today is a project update. What’s going on?

  • A virtual keyboard for android with some nice effects, I’m doing this one together with Marc Seeger.
  • I just participated in the foundation of the Google Technology Usergroup Neckar-Alb ( a region in southern Germany around Stuttgart ), and we hope to be able to provide the community here with some cool events in the future! If you want to participate or are just curious, be sure to check out our Twitter.
  • I’m currently also writing a paper on distributed contact management, a really interesting field, stay tuned on that one.
  • Some hacking involved, but nothing releasable actually.

If you want some nice reading, I can really recommend two papers: the first one is on API design, and some kind of a .NET-rant, it’s called “API Design Matters” and is written by Michi Henning. The other one was written by Marc Seeger and is a practical overview of available Key-Value stores ( applause for NoSQL ). Get it on his blog.

The next weekend I’m going to spend in Hamburg, really looking forward!

Finally: ActiveResource with Service Discovery and Authentication

Yay. In case you’re looking for a release, there is none. Not yet, we are maybe going to release one, but it’s a question of time rather than a lack of good will.

Why. In my current position, I am building a set of applications ( most of them Rails-based ) communicating with each other in a RESTful manner. This is, well, just continue reading my ActiveResource rant here. It’s not really nice to use the official ActiveResource thing. It involves a lot of hardcoding ( e.g. you have to set the remote service’s URL in the model, not in some kind of configuration file, which makes switching from development to testing and production a pain ) and has other shortcomings. It’s a good idea, yet far from being perfect. And the two things that bothered me most were service discovery, meaning the ability to easily resolve a service by its name rather than by its URL, and authentication. Both of them are crucial for a system exceeding the hello world boundaries. That is what I’m doing. So, utilizing all of Ruby’s beauty, a) a Rails plugin and b) a standalone server acting as a Central Authentication Service and Service Discovery instance were developed. And from what I can tell, it’s beautiful ( not the codebase, at the moment, but the functionality ).

What is this thing able to do? Well, for the simple parts, it handles all your authentication needs. No more password juggling, just do it in one place, and nowhere else. OpenID compatibility is on the way, both as consumer and provider. It’s nice to have this kind of functionality by only installing a plugin and creating a before_filter.

The next big thing is the service discovery. The entry point for customizations wasn’t ActiveResource itself but HyperactiveResource. I extended it to provide the ability to connect to the above-mentioned central instance ( the address of this instance is defined in a configuration file, by the way ) to retrieve a service’s address. A simple thing, yet it makes life so much easier.
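To make the idea of name-based lookup a bit more concrete, here is a minimal sketch in plain Ruby. It is not the plugin’s code: the `ServiceRegistry` class, the service names and the URLs are all invented for illustration, and an in-memory hash stands in for the central instance.

```ruby
# Hypothetical stand-in for the central discovery instance: maps a service
# name to a base URL per environment. A real setup would ask a server.
class ServiceRegistry
  class UnknownService < StandardError; end

  def initialize(entries)
    @entries = entries
  end

  # Resolve a service by name instead of hardcoding its URL in a model.
  def resolve(name, environment)
    url = @entries.dig(environment.to_s, name.to_s)
    raise UnknownService, "no #{name} service in #{environment}" unless url

    url
  end
end

# Invented example data; in practice this would live in a configuration file.
REGISTRY = ServiceRegistry.new({
  'development' => { 'address_book' => 'http://localhost:3001' },
  'production'  => { 'address_book' => 'https://addressbook.example.com' }
})

puts REGISTRY.resolve(:address_book, :development)
```

Switching from development to production then only changes the environment argument, not the model.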

Is there a punchline? Yes. Bundling the two features above, you are able to allow and disallow communication between two services at will. This works per direction, so assuming you have an E-Mail service and an AddressBook service, you can now allow the E-Mail service to access your AddressBook without allowing the other direction. Authentication is handled completely transparently for the developer, and the rest of the usage is like HyperactiveResource. Works like a charm.
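The per-direction permission idea can be sketched in a few lines of plain Ruby. Again, this is only an illustration under my own assumptions, not the real server: `ServicePermissions` and the service names are made up.

```ruby
require 'set'

# Hypothetical access table: each allowed call is stored as a
# [caller, target] pair, so permission in one direction says nothing
# about the reverse direction.
class ServicePermissions
  def initialize
    @allowed = Set.new
  end

  def allow(caller_name, target_name)
    @allowed << [caller_name, target_name]
    self
  end

  def disallow(caller_name, target_name)
    @allowed.delete([caller_name, target_name])
    self
  end

  def allowed?(caller_name, target_name)
    @allowed.include?([caller_name, target_name])
  end
end

PERMISSIONS = ServicePermissions.new.allow(:email, :address_book)

puts PERMISSIONS.allowed?(:email, :address_book)   # true
puts PERMISSIONS.allowed?(:address_book, :email)   # false
```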

And for me? Fun is back :-)

Using ActiveResource in the wild

My last posts haven’t been very technical, and they haven’t been very recent either. I’m sorry, I just don’t have time. As a brief follow-up: it has been overwhelming in San Francisco ( except for the 300$ dentist bill ), and I’m so looking forward to coming back.

And, tada, I’ve been spending the last few weeks just entering Rails ( a Ghetto? maybe ) a bit more than I did before. As always, there are cool things to tell. And not so cool things, too. Start with the good ones.

Rails has proven to work extremely well in pre-production mode. No framework-sourced flaws, bugs, whatever. Just nice, and with some caching salt, speed hasn’t been an issue; on the contrary, it has been amazing.

That’s the good part of it. The bad part is that Rails claims to be capable of working in a distributed environment, providing services ( they don’t call them Web Services anymore ) addressable via REST ( buzzzz ) over HTTP, of course. Well, that’s not quite true. If you stick with the example given in the docs, and are in fact happy with a remote object capable of experiencing an unauthenticated change of some first name, this is nothing but true. But when it comes to some other features, e.g. associations or caching, one question came up: are they serious?

Let’s do some demos. In case you want to include objects of a has-many relationship, the way of choice is ( according to the ActiveRecord::Serialization docs ) either to call to_xml with really ugly parameters ( such as @ship.to_xml :include => [:passengers, :sailors] ) in a non-DRY manner or to override to_xml. Neither is exactly Rails-like; both are ugly, hard to maintain and only loosely coupled with real-world requirements.

The other part, the consuming part where ActiveResource is indeed responsible for handling everything, has been consistently kept free of useful stuff. This could on the one hand be a pro for people hating Rails’ magic, but it’s just quite far away from being usable. And it has nothing in common with the way ActiveRecord behaves. Any options? Sure.

At the moment, I’m sticking with HyperactiveResource, the funniest named plugin ever. It extends ActiveResource to play a bit more nice, yet it’s not the ultimate answer, but a great extension, still. If you’re into some serious Rails stuff with some distributed thingies, you should give it a try.

So what’s missing? The feeling to work with something that has been designed by someone who actually uses it doing more than changing first names. I’m really disappointed by having to deal with the framework, something I wasn’t used to with Rails before in that intensity. And while Java EE may be a pain, for distributed stuff, it’s still a very good way if you need reliable and consistent, proven-to-work solutions.

In my next post, I’ll talk about my android experiences so far. Love it :-)

First Impressions from Google I/O

Hey! I’m currently located in session room 09 and learning how to code for Android. Back to the headline.

After being opened by Eric Schmidt, the whole Keynote thingy mainly concentrated on HTML5 features. These features are really impressive, be it the video tag, the canvas element ( max :-) ), background worker processes and other related stuff that is quite handy and really capable of improving every developer’s life. It’s also nice to see that every major browser vendor, excluding Microsoft, seems to be well-prepared for the upcoming changes.

The clear development towards web-based everything that is not distinguishable from “normal” locally run applications is also supported by new caching mechanisms and the resulting possibility of using e.g. Gmail even with no connection, a highly useful feature that will certainly lead to a broader acceptance of web-based applications.

Another remarkable thing is Google’s view on Android and its wide market adoption. In contrast to popular tech opinion, they seem to be quite satisfied, and predict a strong year to come for Android. An indicator may be that this session room I’m sitting in is overcrowded. Another is the wide range of apps presented, including gmote, a remote control application working on all platforms that provides a simple-to-use remote control feature for your desktop, including touchscreen-to-mouse translation, remote media control and even the possibility to watch movies and pictures ( in fact, everything renderable ) stored on your computer on your handset. Nice.

I’m looking forward to the rest of this day, especially, tada, for my free htc magic. Stay tuned.

Web Development and MVC: a pain somewhere, cont’d.

Our beloved model-view-controller pattern seems to act as an excuse for every damn web-framework out there. Yet there is one problem. From my point of view, no framework to date has ever managed to slice a web application in a useful way. Really. Why?

First of all, it should be clear that patterns like MVC can’t be adapted from desktop environments to the web without major changes. Why? Well, desktops are a safe harbor compared to every browser. The operating system provides resources, screen real estate, some kind of event handling and the promise that once you build your app around a certain system, it will work there. Of course this is also the major drawback of desktop applications.

Web applications are not built for just one system; they are mostly built with the target of total platform independence. That includes that nothing can be taken for granted, be it screen real estate, resolution, browser software etc.

And then there is this huge limiting factor called HTTP. HTTP is a simple, beautiful protocol that does a great job at what it was designed for. That is, delivering web pages, stateless, happiness. Which introduces just another detail that clearly separates desktop from the web: the web just doesn’t know anything about state. Once a server has delivered a page to you, it just doesn’t care. Really. We introduced sessions and stuff, yet there is this design-based limitation, and for the time being, we’re just working around it, whereas desktop applications are stateful: ever heard of RAM?
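Here is a small sketch of that session workaround, in plain Ruby and without any web server. `SessionStore` and `handle_request` are invented names, and the cookie is reduced to a plain string passed around by hand. The point is that the only link between two requests is the id the client sends back.

```ruby
require 'securerandom'

# In-memory stand-in for server-side session storage. HTTP itself carries
# no state; the server only recognizes a client by the id it sends back.
class SessionStore
  def initialize
    @sessions = {}
  end

  # Returns [session_id, session_hash], creating a new session when the
  # client sent no id or an unknown one.
  def fetch(session_id)
    unless session_id && @sessions.key?(session_id)
      session_id = SecureRandom.hex(16)
      @sessions[session_id] = {}
    end
    [session_id, @sessions[session_id]]
  end
end

STORE = SessionStore.new

# Hypothetical request handler: takes the cookie value (or nil) and returns
# the id to send back as a cookie plus a response body.
def handle_request(cookie_session_id)
  session_id, session = STORE.fetch(cookie_session_id)
  session[:visits] = session.fetch(:visits, 0) + 1
  [session_id, "visit number #{session[:visits]}"]
end

id, first_body = handle_request(nil)   # first request: no cookie yet
_, second_body = handle_request(id)    # second request: client returns the id
puts first_body
puts second_body
```

Drop the id and the server starts counting from one again, which is exactly the statelessness described above.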

So the problem we are actually facing is that all the well-engineered patterns of the desktop world just can’t be adapted. Sorry. There may be certain areas where the impression of adaptability may occur, yet these impressions vanished as AJAX appeared.

Why? The traditional web application, let’s call it Web 1.0, was based on that well-known request-response-leavemyalone cycle. Maybe some session sugar enabled a stateful behaviour, whatever. And we had a way to adapt MVC: put everything that is persistent or part of the “business logic” in a model class ( aka bean ). Go ahead and search for your embedded HTML. Try to put it in a separate file, and tell all your geek friends that it’s the view. And finally, the glue was the controller. And yes, it worked out really well. That was then.
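That Web 1.0 slicing fits into a few lines of plain Ruby, using ERB from the standard library for the view. This is only a sketch of the pattern, not of any particular framework: `Product`, `ProductsController` and the sample data are made up.

```ruby
require 'erb'

# Model: holds data and business logic, knows nothing about HTML.
class Product
  attr_reader :name, :price_cents

  def initialize(name, price_cents)
    @name = name
    @price_cents = price_cents
  end

  def price
    format('%.2f', price_cents / 100.0)
  end
end

# View: the HTML that used to be embedded in code, moved into a template.
PRODUCT_VIEW = ERB.new(
  '<h1><%= ERB::Util.html_escape(product.name) %></h1><p><%= product.price %> EUR</p>'
)

# Controller: the glue. Takes request parameters, asks the model, renders the view.
class ProductsController
  def initialize(catalog)
    @catalog = catalog
  end

  # Returns [status, body] for one request; nothing survives between calls.
  def show(params)
    product = @catalog[params['id']]
    return [404, 'Not found'] unless product

    [200, PRODUCT_VIEW.result_with_hash(product: product)]
  end
end

controller = ProductsController.new({ '1' => Product.new('Chair & Table', 4999) })
status, html = controller.show({ 'id' => '1' })
puts status
puts html
```

One request comes in, one page goes out, and that is the whole cycle. As soon as parts of the page update themselves via AJAX, this neat split stops describing what actually happens.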

Since the unfortunate launch of GMail everything has changed. There is no longer anything like “impossible”, since someone clearly demonstrated that virtually no restrictions in terms of usability and functionality exist.

Again, from my point of view, Google didn’t reinvent the wheel, they just used available techniques in a new way.

So, with all this “new” technology, web application creators were forced to adapt, which led to a wide variety of 1) AJAX frameworks like Prototype and jQuery and 2) smart helper functions for server-side components, meant to hide the fact that none of them is able to encapsulate that new technology behind some well-designed layer of abstraction.

Welcome to my world. I’m doing web stuff daily. Someone even pays me for doing it. But I’m really, really frustrated. There is simply no single well-designed framework, enabling me to really focus on what I want to do, not on how.

Today, business logic isn’t the problem. There are some very good approaches making it a breeze to brew out even complex logic.

But there is still too much code here ( in my repositories, maybe I’m just doing it wrong ) that’s just dealing with requests. The amount itself wouldn’t be a big deal, since the average line count has already dropped massively over the years. The problem is that there is absolutely no beauty. Nada. No language can hide, not even Ruby, that it’s ugly to deal with requests on a per-request level.

I don’t have a solution, though I’m thinking about it a looottt. And that’s just my opinion, expressed in some kind of english. And I’d really appreciate any single comment.